Assessment and Learning
Edited by John Gardner
SAGE Publications
London • Thousand Oaks • New Delhi
1
5
6
12
15
Item Text
Staff draw on good practice from others as a means to further their own
professional development.
Staff read research reports as one source of useful ideas for improving their
practice.
Staff use the web as one source of useful ideas for improving their practice.
Students are consulted about how they learn most effectively.
Staff relate what works in their own practice to research findings.
Staff modify their practice in the light of published research.
Staff carry out joint research/evaluation with one or more colleagues as a way of
improving practice.
Table 2.5 'Building Social Capital' (factor B2) items
Factor B2: Building social capital: learning, working, supporting and talking with each other
(alpha = 0.7476)
Item No
16
19
20
21
22
23
24
Item Text
Staff regularly collaborate to plan their teaching.
If staff have a problem with their teaching they usually turn to colleagues
for help.
Staff suggest ideas or approaches for colleagues to try out in class.
Teachers make collective agreements to test out new ideas.
Teachers discuss openly with colleagues what and how they are learning.
Staff frequently use informal opportunities to discuss how students learn.
Staff offer one another reassurance and support.
variables was accounted for by our selected teacher learning variables. Separate
models were tested for teachers without managerial responsibility and for
middle and senior managers.
Results for teachers without managerial responsibility
As Table 2.8 shows, the teacher learning practices, taken together, accounted for
a rather low proportion of the variance in all three variables of classroom
assessment practices.
Table 2.6 'Critical and Responsive Learning' (factor B3) items
Factor B3: Critical and responsive learning: through reflection, experimentation
and by responding to feedback (alpha = 0.7573)
Item No
,
,
10
13
14
Item Text
Staff are able to see how practices that work in one context might be adapted to
other contexts.
Staff reflect on their practice as a way of identifying professional learning needs.
Staff experiment with their practice as a conscious strategy for improving
classroom teaching and learning.
Staff modify their practice in the light of feedback from their students.
Staff modify their practice in the light of evidence from self-evaluations of their
classroom practice.
Staff modify their practice in the light of evidence from evaluations of their
classroom practice by managers or other colleagues.
Table 2.7 'Valuing Learning' (factor B4) items
Factor B4: Valuing learning: believing that all students are capable of learning; the provision by
teachers of an affective environment in which students can take risks with their learning, and
teachers contributing to the learning orientation of their schools by identifying themselves as
well as their students as learners (alpha = 0.6252)
1
25
26
27
Staff as well as students learn in this school.
Staff believe that all students are capable of learning.
Students in this school enjoy learning.
Pupil success is regularly celebrated.
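The alpha values quoted in the factor captions above (0.7476, 0.7573, 0.6252) are internal-consistency reliability coefficients. For readers unfamiliar with the statistic, the following is a minimal sketch of how such a figure (Cronbach's alpha) is computed; the function and the ratings data are invented for illustration and are not the project's questionnaire responses.

```python
# Illustrative sketch only: how an internal-consistency coefficient like
# the alpha values in the factor captions (e.g. alpha = 0.7476) is
# calculated. The formula is Cronbach's alpha; the ratings below are
# invented data, not Learning How to Learn questionnaire responses.

def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(item_scores):
    """item_scores: one list per questionnaire item, each holding the
    ratings given by the same respondents in the same order."""
    k = len(item_scores)
    sum_item_vars = sum(variance(item) for item in item_scores)
    totals = [sum(ratings) for ratings in zip(*item_scores)]  # per respondent
    return (k / (k - 1)) * (1 - sum_item_vars / variance(totals))

# Three hypothetical five-point items answered by five respondents
items = [
    [3, 4, 2, 5, 3],
    [2, 4, 2, 5, 4],
    [3, 5, 1, 4, 3],
]
print(round(cronbach_alpha(items), 2))
```

Values in the 0.6-0.8 range, as reported for factors B2-B4, are conventionally read as acceptable internal consistency for research scales of this kind.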
Table 2.8 The proportion of variance in each of the three dimensions of classroom
assessment accounted for by teacher learning practices (teachers' responses)
Variables                        Variance (%)
Making learning explicit         13.0
Promoting learning autonomy      ?.0
Performance orientation          0.1
However, when we compared the strength of association between each of the
teacher learning variables and each of the classroom assessment variables we
found that 'inquiry' (teachers' uses of and responses to evidence, and their col-
laboration with colleagues in joint research and evaluation activity) was most
Professional Learning as a Condition for Assessment for Learning
strongly associated with 'making learning explicit' and the 'promotion of learn-
ing autonomy'. 'Building social capital' and 'critical and responsive learning'
had only weak and non-significant associations with all three classroom assess-
ment variables. None of the teacher learning variables was significantly related
to the 'performance orientation' variable.
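The 'proportion of variance accounted for' figures reported in Tables 2.8 and 2.9 are R-squared statistics from regression models. As an illustration only (the scores and variable names below are invented, and the project's actual models entered several teacher learning predictors jointly rather than one), a simple least-squares sketch shows how such a proportion is derived:

```python
# Illustrative sketch only: the 'proportion of variance accounted for'
# figures in Tables 2.8 and 2.9 are R-squared values from regression
# models. This single-predictor version shows the idea; the scores below
# are invented, and the project's models combined several predictors.

def r_squared(x, y):
    """R^2 for an ordinary least-squares fit of y on one predictor x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot  # share of variance in y explained by x

# Hypothetical factor scores: 'inquiry' vs. 'making learning explicit'
inquiry = [2.1, 3.4, 2.8, 4.0, 3.1, 2.5]
making_explicit = [2.0, 3.0, 2.6, 3.9, 3.3, 2.2]
print(round(100 * r_squared(inquiry, making_explicit), 1))  # as a percentage
```

The percentages in the tables play exactly this role: the closer a figure is to zero, as for 'performance orientation', the less of that assessment dimension the teacher learning variables explain.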
Results for middle and senior managers
As Table 2.9 shows, when we analysed the school managers' responses, the
three teacher learning variables accounted for much more of the variance in
each of the three classroom assessment practices than was the case with data
from teachers without managerial responsibility.
Table 2.9 The proportion of variance in each of the three dimensions of classroom
assessment accounted for by teacher learning practices (managers' responses)
Dependent variables              Variance (%)
Making learning explicit         36
Promoting learning autonomy      29
Performance orientation          7
Independent variables: inquiry, building social capital, critical and responsive learning
All teacher learning variables were significantly associated with 'making learning
explicit'. 'Critical and responsive learning' and 'inquiry' had the strongest associ-
ations. 'Inquiry' was the only teacher learning variable that was significantly
related to the 'promotion of learning autonomy' and here the relationship was
strong. 'Building social capital' and 'critical and responsive learning' had only
weak and non-significant relationships with 'performance orientation'.
Discussion
A number of themes of considerable interest arise from this analysis. First, there
are the differences between the responses of teachers and managers. There is a
stronger relationship between managers' perceptions of teacher learning prac-
tices and classroom assessment practices in their schools than there is between
'ordinary' teachers' perceptions of learning by teachers in their school and their
own classroom practices. On the surface this might be expected because man-
agers with some responsibility for the work of others are likely to perceive these
aspects more clearly, or may even want to 'talk them up'. However, we have
been cautious about coming to any such conclusion because teachers and man-
agers were asked different questions about classroom assessment (related to
teachers' own practices but managers' perceptions of others' practices), which
render direct comparisons problematic. Also we know that a few managers,
including one head teacher, either through choice or misunderstanding, actu-
ally completed the teachers' version of the questionnaire.
More interesting are the areas where results for both teachers and managers
are similar, although the strength of association differs. Four points are worthy
of note. First, the three teacher learning variables account only poorly for the
variance in 'performance orientation' in classroom assessment. This suggests a
weak association and might reasonably lead to a conclusion that any perform-
ance orientation in classroom assessment derives very little from teachers'
learning practices, and probably owes more to structural constraints in the
environment such as curriculum prescriptions and performance management.
Second, and in contrast, teacher learning variables do seem to account for
variance in the two classroom assessment variables most closely allied with
assessment for learning: 'making learning explicit' and 'promoting learning
autonomy'. The fact that the strongest associations are with 'making learning
explicit' is not surprising. Although this 'baseline' questionnaire was
administered in 2002, before project development work was properly begun in
Learning How to Learn schools, there was already considerable activity in
England under the banner of assessment for learning: some stimulated by
national initiatives such as the DfES Key Stage 3 and Primary Strategies, and
some promoted by researchers, such as the King's College, London assessment
group (see Chapter 1), or by consultants (Sutton, 1995; Clarke, 1998, 2001). The
national strategies put particular emphasis on making learning objectives and
success criteria explicit, as does Clarke in her advice to primary schools.
Third, the relationship of 'inquiry' to 'promoting learning autonomy' is par-
ticularly interesting. The strength of the relationship is not as strong for teach-
ers as it is for managers but the clear association suggests that teachers' uses of,
and responses to, different sources of evidence (from more formal research and
their own inquiries) together with their collaboration with colleagues in joint
research and evaluation activity, are important for the development of assess-
ment practices that lead to autonomous, independent and active learning
among their students. This insight may be a key finding because other analyses
of Section A data (to be reported in detail in other publications) suggest that
'promoting learning autonomy' (for example, giving students opportunities to
decide their own learning objectives and to peer and self-assess) was the dimen-
sion of classroom assessment practice that teachers were having the greatest
difficulty implementing, despite believing it to be important or crucial.
Finally, and perhaps surprisingly, 'building social capital' does not appear to
be strongly related to change in classroom assessment practice, at least not in a
straightforward, linear kind of way. This implies that teacher learning practices
focused on building social capital through, for example, team building,
networking, building trust and mutual support, may be of limited value
without a clear focus on specific changes to be brought about in classrooms. In
other words, processes need content (a point made in Chapter 1). So, we may
need to be cautious about allocating time, energy and resources to building
social capital if it lacks an explicit classroom focus. Indeed, teachers might
develop and use social capital as a 'polite' way of protecting their classroom
privacy. By agreeing to collaborate with colleagues in 'safe' aspects of their
work, such as giving and receiving moral support, exchanging resources and
the like, they can effectively keep colleagues at 'arm's length' from the class-
room issues that really need attention. In particular, classroom-based modes of
collaboration can be avoided.
Conclusion: the centrality of learning by teachers for the
development of assessment for learning
These interpretations of results from the Learning How to Learn Project carry
three strong messages for teachers' professional learning if it is intended to
support the development of assessment for learning. First, classroom assess-
ment for learning practices are underpinned most strongly by teachers' learn-
ing in the contexts of their classrooms with a clear focus on change in teacher
and learner roles and practices and on interactions between assessment, cur-
riculum and pedagogy. Insights from work in other projects, as described in
Chapters 1 and 5, are thus corroborated by evidence from the Learning How to
Learn project's survey of 1,000 teachers. This implies that programmes of pro-
fessional development, whether school-based or course-based, should be
focused on classrooms and classroom practice.
The growth of interest in 'research lessons' (Stigler and Hiebert, 1999) offers
one possible approach. The idea derives from Japan where teams of teachers
identify an aspect of their teaching which is likely to have an impact on an area
of need in students' learning. They spend between one and three years working
in groups, planning interventions which may prove effective, closely observing
these 'research lessons' and deconstructing and writing up what they learn -
from failures as well as successes. At the end of a cycle of studies they may
teach a 'public research lesson' before an audience of peers from local schools
and colleges in order to share the practice and widen the critique. These studies
are widely read by Japanese teachers who contribute more than 50 per cent of
the educational research literature produced in the country (Fernandez, 2002).
Lesson study has been developed in a number of locations in the USA over the
past seven years. It is also used in the National College for School Leadership's
Networked Learning Communities projects in England and is the particular
focus of a research training fellowship linked to the Learning How to Learn
Project (Dudley, 2004). There are other possible approaches, of course, and the
Learning How to Learn Project will report some of these in other publications.
Second, as the above account of research lessons demonstrates, both indi-
vidual and social processes of teacher learning are to be valued. Our survey
findings indicate that both are important conditions for the promotion of
assessment for learning in classrooms. This justifies the approach taken in the
KMOFAP and the Learning How to Learn Project in providing, or encouraging,
opportunities for teachers to learn together in in-school teams, departmental
teams and across-school groups.
Third, if 'promoting learning autonomy' is the ultimate goal but the greatest
challenge, as our evidence suggests it is, and if 'inquiry' approaches to teacher
learning are productive in this respect, then more emphasis needs to be placed
on providing opportunities and encouragement to teachers to engage with and
use research relevant to their classroom interests, and recognizing the value of,
and supporting, teachers' collaborative inquiries into their own practices. The
first was a strength of the way that the findings of the 1998 Black and Wiliam
review of classroom assessment research was disseminated to teachers and
used as a basis for in-service work (see Chapter 1). The second reflects
Stenhouse's (1975) belief that 'it is teachers who, in the end, will change the
world of the school by understanding it' and that a 'research tradition which is
accessible to teachers and which feeds teaching must be created if education is
to be significantly improved' (p. 208). He argued for teacher research on the
grounds that, 'It is not enough that teachers' work should be studied, they
need to study it themselves' (p. 208). In the thirty years since he wrote these
words, many forms of teacher research and inquiry have flourished, some
more focused on student learning than others. Our research suggests that
classroom-based teacher research and inquiry is not only an important strand
of teachers' continuing learning, as Stenhouse argued, but also an important
factor in helping students develop independence and autonomy in their
learning. The explanation for this might be quite simple, yet profound. If
teachers are prepared and committed to engage in the risky business of
problematizing their own practice, seeking evidence to evaluate in order to
judge where change is needed, and then to act on their decisions, they are thus
engaging in assessment for learning with respect to their own professional
learning. Helping students to do the same with respect to their learning
becomes less challenging because teachers are familiar with the principles and
processes through inquiry into their own practices. In other words, they are
well on the way to conceptualizing, developing and valuing expanded roles
for themselves and their students in teaching and learning.
Assessment for learning and inquiry-based learning by
teachers as parallel processes
In a presentation on the theme of 'Conditions for Lifelong Learning', to an audi-
ence of policy makers from the UK Department for Education and Skills and
related agencies in July 2003, we argued that teachers and schools needed to
develop the processes and practices of learning how to learn if they are to create
the conditions for students to learn and to learn how to learn. We saw assess-
ment for learning at the heart of this. For teachers this implied that they need
to: value learning and engage with innovation; draw on a wide range of evi-
dence, for example peer observation, student consultation, research results and
web resources; reflect critically on and modify their practice; and engage in
both individual and collective learning, in an atmosphere of confidence that
they can help students improve. We made reference to the Assessment Reform
Group's (ARG, 2002a) definition of assessment for learning and argued that:
'Whether learners are students, teachers or schools, learning how to learn is
achieved when they make sense of where they are in their learning, decide
where they need to go, and how best to get there'. We hypothesized that the
processes are parallel for both students' learning and teachers' learning. The
evidence of analysis of responses to the Learning How to Learn Project's teacher
questionnaire suggests that this hypothesis was well founded.
Implications
In the context of this book, the implications of this study are substantial
because there is still evidence that classroom assessment practices need to be
improved. The Annual Report of Her Majesty's Chief Inspector of Schools for
2003/04 states that teaching and learning could be improved by the better use
of assessment in primary schools. In secondary schools the situation is even
worse: 'The use of assessment in meeting individual students' needs remains a
weakness generally and is unsatisfactory in well over a tenth of schools' (see
OFSTED, 2005). If this situation is to be remedied, our evidence suggests that
proper attention needs to be given to teachers' professional development in
this area. However, current approaches to 'rolling out' the lessons from
effective small-scale research and development may not meet the need. As the
Learning How to Learn Project has discovered, whilst some changes in
practice, such as sharing learning objectives with students, have been achieved
by many teachers there is a danger that these can remain at the level of surface
changes in procedures. Deeper changes, and those that are perhaps even more
fundamental, such as promoting independent and autonomous learning,
remain more difficult. These changes do not happen through the agency of
consultants and the distribution of ring-binders full of material, although
these can have a role. In the end, change can only be embedded if teachers
actively engage with the ideas and principles underpinning the advocated
practices and if the environments in which they work are supporting such
engagement.
Our evidence suggests that the encouragement and support required to
make classrooms the focus and site of individual and collective professional
development are vital. In this way teachers can, in the context of reflective prac-
tice, develop the principles for action that provide frames of reference when
confronted with unexpected circumstances. For this reason, an inquiry-based
approach is also vital, although it has profound implications for school leader-
ship and policy more widely. Not only do teachers need access to relevant
knowledge and skills for drawing on research and developing their own
inquiries, they also need permission to experiment, and occasionally to fail, and
then to learn from these failures. In the current climate of accountability this is
difficult - but not impossible. As one infants' teacher involved in the Learning
How to Learn Project claimed:
The focus on learning how to learn enabled professional dialogue to flourish, pro-
moted collaborative learning opportunities for children and adults and developed a
clearer understanding of some of the elements that contribute to successful
learning. It has been one of the most powerful professional development opportu-
nities in my career and has [enhanced] my teaching and learning.
Likewise, a secondary school head teacher:
Assessment for learning has been a joy. It is intellectually profound, yet eminently
practical and accessible. It has enhanced the learning of us all. I have no doubt
that our children are now better taught than ever before. It has been the best edu-
cational development of my career.
What is particularly interesting about these two statements is that neither
clearly distinguishes between the learning of students and the learning of teach-
ers. The two are closely associated. Moreover, the classroom and school become
the environment for effective learning by both groups, and assessment for
learning is at the heart of this.
Notes
1 The Learning How to Learn - in Classrooms, Schools and Networks Project
was a four-year development and research project funded from January 2001
to June 2005 by the UK Economic and Social Research Council as part of
Phase II of the Teaching and Learning Research Programme (see
http://www.tlrp.org ). The Project (ref: L139 25 1020) was directed by Mary
James (Institute of Education, University of London) and co-directed by
Robert McCormick (Open University). Other members of the research team
were Patrick Carmichael, Mary-Jane Drummond, John MacBeath, David
Pedder, Richard Procter and Sue Swaffield (University of Cambridge), Paul
Black, Bethan Marshall (King's College, London), Leslie Honour (University
of Reading) and Alison Fox (Open University). Past members of the team
were Geoff Southworth, University of Reading (until March 2002), Colin
Conner and David Frost, University of Cambridge (until April 2003 and April
2004 respectively) and Dylan Wiliam and Joanna Swann, King's College,
London (until August 2003 and January 2005 respectively). Further details
are available at http://www.learntolearn.ac.uk.
2 Forty-three schools were initially recruited to the project from five local edu-
cation authorities (Essex, Hertfordshire, Medway, Oxfordshire, Redbridge)
and one virtual education action zone (Kent and Somerset VEAZ). During the
lifetime of the project five schools withdrew. The criteria used for selection of
schools were:
A willingness by schools to be involved in the project for the duration,
and actively to contribute to project ideas;
Six schools to be chosen from each main LEA (fewer for Redbridge and
the VEAZ) with the proportion of one secondary school to two primary
schools, preferably in cluster groups;
A range of contexts to be represented in the overall sample: urban/rural;
small/large; mono-ethnic/multi-ethnic;
Schools' performance at one or two key stages to have been allocated a 'C'
benchmark grade, in the Office for Standards in Education Performance
and Assessment (PANDA) Report at the beginning of the project, that is,
based on their results in 2000. This is a crude measure of 'averagely' per-
forming schools. Not all schools in Redbridge, which were added to the
sample in response to a special request from the LEA, conformed to this
criterion;
Schools to be within a reasonable distance from the university
bases of researchers.
3 The dual-scale format adopted for all three sections of this questionnaire was
shaped by assumptions similar to those that informed the design of the
'teacher questionnaire' used in the Improving School Effectiveness Project
(ISEP) (MacBeath and Mortimore, 2001).
Part II Theory
Chapter 3
Assessment, Teaching and Theories of Learning
Mary James
The discussion of formative assessment practice and implications for teachers'
professional learning, in Chapters 1 and 2, draws attention to the close rela-
tionship between assessment and pedagogy. Indeed, the argument in both
chapters is that effective assessment for learning is central and integral to teach-
ing and learning. This raises some theoretical questions about the ways in
which assessment and learning are conceptualized and how they articulate.
This chapter considers the relationship between assessment practice and the
ways in which the processes and outcomes of learning are understood, which
also has implications for the curriculum and teaching.
Starting from an assumption that there should be a degree of alignment
between assessment and our understanding of learning, a number of different
approaches to the practice of classroom assessment are described and analysed
for the perspectives on learning that underpin them. Three clusters of theories
of learning are identified and their implications for assessment practice are dis-
cussed. The point is made that learning theorists themselves rarely make state-
ments about how learning outcomes within their models should be assessed.
This may account for the lack of an adequate theoretical base for some assess-
ment practices and, conversely, for a lack of development of assessments
aligned with some of the most interesting new learning theory. The chapter con-
cludes with a discussion of whether eclectic or synthetic models of assessments
matched to learning are feasible. The intention here is to treat the concepts
broadly and to provide a basis for more specific consideration of particular
issues in the two chapters following this one, and indeed in the rest of the book.
Thus Chapter 4 focuses on the role of assessment in motivation for learning and
Chapter 5 on the theory of formative assessment.
Alignment between assessment and learning?
The alignment (Biggs, 1996; Biggs and Tang, 1997) of assessment with learning,
teaching and content knowledge is a basis for claims for the validity of assess-
ments (see Chapter 8), but the relationship is not straightforward and cannot be
taken for granted. Indeed there are plenty of examples of assessment practices
that have only tenuous or partial relationships with current understanding of
learning within particular domains. Take, for instance, short answer tests in
science that require a recall of facts but do not begin to tap into the under-
standing of concepts or the investigative processes, which are central to the
'ways of thinking and doing' (Entwistle, 2005) that characterize science as a
subject-discipline. Nor do assessment practices always take sufficient account of
current understanding of the ways in which students learn subject matter, the
difficulties they encounter and how these are overcome.
Historically, much assessment practice was founded on the content and
methods of psychology, the kind of psychology especially that deals with
mental traits and their measurement. Thus classical test theory has primarily
been concerned with differentiating between individuals who possess certain
attributes, or in determining the degree to which they do so. This 'differential-
ist' perspective is still very evident in popular discourse (see, for example,
Phillips, 1996). The focus tends to be on whether some behaviour or quality can
be detected rather than the process by which it was acquired. However, during
the twentieth century our understanding of how learning occurs has developed
apace. It is no longer seen as a private activity dependent largely, if not wholly,
on an individual's possession of innate and usually stable characteristics such
as general intelligence. Interactions between people, and mediating tools such
as language, are now seen to have crucial roles in learning. Thus the assessment
of learning outcomes needs to take more account of the social as well as the
individual processes through which learning occurs. This requires expansion of
perspectives on learning and assessment that take more account of insights
from the disciplines of social psychology, sociology and anthropology.
Similarly, insofar as assessments are intended to assess 'something', that is,
some content, account needs to be taken also of the way the subject domain of
relevance is structured, the key concepts or 'big ideas' associated with it, and
the methods and processes that characterize practice in the field. This is an
important basis for construct validity without which assessments are valueless
(see Chapter 8). This requirement implies some engagement with ideas from
the branch of philosophy that deals with the nature of knowledge, that is, epis-
temology. Thus psychological, social-psychological, sociological and epistemo-
logical dimensions all need to be taken into consideration at some level in the
framing of assessment practice. This is no easy task for assessment experts and
may seem far too great an expectation of classroom teachers; yet one might
expect their training to provide them minimally with pedagogical content
knowledge (Shulman, 1987), a basic understanding of how people learn (learn-
ing theory), and some assessment literacy (Earl et al., 2000) in order to put these
things together. The difficulty, in the climate that has developed around initial
teacher training over the last fifteen years, has been the reduction of teaching to
a fairly atomistic collection of technical competences. This is antithetical to the
synoptic and synthetic approach that teachers may need to acquire in order to
align their teaching and assessment practice with their understanding of learn-
ers, learning and subject knowledge.
Teachers are not helped by the fact that formal external assessments - often
with high stakes attached to them - are often not well aligned either. Whilst
exciting new developments in our understanding of learning unfold, develop-
ments in assessment systems and technology sometimes lag behind. Even some
of the most innovative and novel developments, say, in e-assessment, are
underpinned by models of learning that are limited or, in some cases, out-of-
date. This is understandable too because the development of dependable
assessments - always an important consideration in large-scale testing - is asso-
ciated with an elaborate technology which takes much time and the skills of
measurement experts, many of whom have acquired their expertise in
the very specialist field of psychometrics. This is especially true in the USA
which has a powerful influence on other anglophone countries (see Chapter 10).
In this book we are primarily interested in classroom assessment by teachers,
but research tells us that teachers' assessment practice is inevitably influenced
by external assessment (Harlen, 2004) and teachers often use these assessments
as models for their own, even if they do not use them directly. By using models
of assessment borrowed from elsewhere, teachers may find themselves sub-
scribing, uncritically or unwittingly, to the theories of learning on which they
are based. Some teachers do have clear and internally consistent theories of
learning to underpin their assessment practice, and they are able to articulate
them, as teachers involved in the KMOFAP (Black et al., 2003; see Chapter 1)
and others investigated by Harlen (2000) illustrate. But some disjunction
between 'espoused theory' and 'theory-in-practice' (Schon, 1983) is common, as
is a lack of theoretical coherence. This raises a question about whether it really
matters which conceptions of learning underpin classroom assessment prac-
tices if they are deemed to 'work' well enough, and whether the need for con-
sistency between teaching, learning and assessment might be overrated.
My view is that it does matter because some assessment practices are very
much less effective than others in promoting the kinds of learning outcomes that
are needed by young people today and in the future (see James and Brown, 2005,
for a discussion of questions for assessment arising from different conceptions of
learning outcomes). As Chapter 4 makes clear, the most valuable learning out-
comes in enabling human flourishing - as citizens, as workers, as family and
community members and as fulfilled individuals - are those that allow continued
learning, when and where it is required, in a rapidly changing information- and
technology-rich environment. There is a need, therefore, for teachers to have a
view about the kinds of learning that are most valuable for their students and to
choose and develop approaches to teaching and assessment accordingly.
Helping teachers to become more effective may therefore mean both changes
in their assessment practice and changes in their beliefs about learning. It will
entail development of a critical awareness that change in one will, and should,
inevitably lead to the need for change in the other. So, for instance, implement-
ing assessment for learning/formative assessment may require a teacher to
rethink what effective learning is, and his or her role in bringing it about. Sim-
ilarly, a change in their view of learning is likely to require assessment practice
to be modified. While the focus of this book is mainly on formative assessment,
a good deal is relevant to classroom-based summative assessment by which
teachers summarize what has been achieved at certain times.
Assessment and Learning
Examples of different classroom assessment practices
So, what might classroom assessment practices, aligned with different theories
of learning, look like? Consider the following examples. They are written as car-
icatures of particular approaches in order to provide a basis for subsequent dis-
cussion. In reality, the differences are unlikely to be so stark and teachers often
blend approaches. The focus of the examples is a secondary school teacher who
has just received a new student into her English class. He has recently arrived
in the country and English is an additional language for him although he
speaks it reasonably well. The teacher wants to assess his writing. If she chooses
one of the following approaches, what would it say about her model of knowl-
edge and learning?

Example 1
She sits him in a quiet room by himself and sets him a timed test that consists
of short answer questions asking him, without recourse to reference material or
access to other students, to: identify parts of given sentences (nouns, verbs, arti-
cles, connectives); make a list of adjectives to describe nouns; punctuate sen-
tences; spell a list of ten words in a hierarchy of difficulty; write three sentences
describing a favourite animal or place; write the opening paragraph of a story.
She then marks these using a marking scheme (scoring rubric) which enables
her to identify incorrect answers or weaknesses and compare his performance
with others in the class. As a result she places him in a group with others at a
similar level and then provides this group with additional exercises to practise
performance in areas of weakness. When he shows improvement she is liberal
with her praise and then moves on to the next set of skills to be learnt. Learn-
ing by rote and practice is a dominant feature of this approach.
Example 2
As part of her class teaching, she has been covering work on 'genre' in the pro-
gramme of study. Her current focus is narrative and especially the aspect of
temporal sequencing. The class has been reading Tolkien's The Hobbit and she
used this as a stimulus for their own writing of stories of journeys in search of
treasure. The students discuss the qualities of The Hobbit that make it a good
story, including structure, plot, characterization, use of language and dramatic
tension (all key concepts to be understood). These they note as things to con-
sider in their own writing. Using a writing frame they first plan their stories and
then try out opening paragraphs. They write their stories over a series of
lessons. At draft stages they review their work, individually with the teacher
and through peer discussion using the criteria they have developed. Then they
redraft to improve their work using the feedback they have received.
The teacher monitors this activity throughout and observes that her new
student has a rich experience of travel to draw on, although some of those expe-
riences have been negative and need to be handled sensitively. With English as
Assessment, Teaching and Theories of learning
an additional language he knows more than he can say and needs to be helped
to acquire a wider vocabulary. He also has problems with sequencing which she
thinks could indicate a specific learning difficulty or a different cultural con-
ception of time. She makes a mental note to observe this in future activities. In
the meantime she decides to provide lots of opportunities for him to engage in
classroom talk to help with the first difficulty. To help with the sequencing dif-
ficulty, she suggests that he writes topic sentences on card and cuts them out so
that he can physically move them round his table until he gets them in a satis-
factory order. When his story is complete, the student is asked to record his own
self-evaluation and the teacher makes comments on this and his work which
they discuss together to decide next steps. She does not make much use of
praise or numerical scores or grades because, by making learning explicit, he
understands the nature and substance of the progress he has made.
Example 3
The teacher regards one of her main aims as helping to develop her students as
writers. To this end she constructs her classroom as a writing workshop. The
new student is invited to join this workshop and all participants including the
teacher and any learning support assistants are involved, on this occasion, in
writing stories for children of a different age to themselves. Although their own
writing or that of others including established authors is used to stimulate
thinking and writing, all members in the group, from the most expert to the
most novice, are encouraged to set their own goals and to choose an individual
or group task that will be challenging but achievable with the help of the
knowledge and skill of others in the group. There is no concept of a single spe-
cific goal to be achieved or a performance 'gap' to be closed but rather a
'horizon of possibilities' to be reached. The broad learning goal is for all
members of the group to develop their identities as writers.
By participating together in the activity of writing, each member of the group
has the opportunity to learn from the way others tackle the tasks (rather than
being told how to do things). Different members of the group take on the role
of student and teacher according to the particular challenges of a given activity.
For example, if the teacher wants to write a story for young people she might
need to learn about street language from her students. Thus they become her
teachers. At intervals the members of the group read their work to the rest and
the group appraises it, drawing on the criteria they use to judge what counts as
good work. These criteria may be those shared by writers more generally (as in
Examples 1 and 2 above) but the dynamic of the group might allow new crite-
ria to emerge and be accepted as norms for this group. For example, the intro-
duction of a new student member with a different cultural background could
encourage more experimental work in the group as a whole.
The model is in some respects similar to apprenticeship models, although
these tend to be associated with the preservation and maintenance of guild
knowledge. In other respects it goes beyond this and, like the University of East
Anglia's well-known creative writing course, it seeks to foster creativity. Our new
student begins by being a peripheral participant in this writing workshop,
observing and learning from what others do, but gradually he is brought into the
group and becomes a full participating member. Assessment in this context is
ongoing, continuous and shared by all participants (not just the preserve of the
teacher) but linked very specifically to the particular activity. There is often less
concern to make general statements about competence and more concern to
appraise the quality of the particular performance or artefact, and the process of
producing it. It is considered especially important to evaluate how well the
student has used the resources (tools) available to him, in terms of materials, tech-
nology, people, language and ideas, to solve the particular problems he faced.
The learning is focused on an authentic project so one of the most important
indicators of success will be whether the audience for the stories produced
(other children) responds to them positively. Their response will also provide
key formative feedback to be used by the individual student and the group in
future projects. The role of the English teacher is therefore not as final arbiter of
quality, but as 'more expert other' and 'guide on the side'. Learning outcomes
are best recorded and demonstrated to others through portfolios of work, rather
like those produced by art students, or through the vehicle of the 'masterpiece'
(the 'piece for the master craftsman' designed to be a demonstration of the best
of which an apprentice is capable - also a model for the doctoral thesis).
Each of these examples looks very different as a model of teaching, learning
and assessment, yet each is internally consistent and demonstrates alignment
between: a conception of valued knowledge in the sub-domain (writing in
English); a view of learning as a process and its implications for teaching; and
an appropriate method for assessing the process and product of such learning.
Of course, each of these elements may be contested, as are the theories on which
they are founded. These theories are elaborated in the next section.
The theoretical foundations of learning and
assessment practice
In this section I will consider three views of learning, identifying their manifes-
tation in classroom practice and the role of assessment in each. The three exam-
ples given in the previous section were attempts to portray what each of these
might look like in the real world of schools: to put flesh on theoretical bones. In
reality, however, teachers combine these approaches by, for instance, incorpo-
rating elements of Example 1 into Example 2, or combining elements of
Example 2 with Example 3. Thus boundaries are blurred. Similarly, the per-
spectives on learning considered in this section are broad clusters or families of
theories. Within each cluster there is a spectrum of views that sometimes over-
laps with another cluster, therefore it is difficult to claim exclusivity for each
category. For example, constructivist rhetoric can be found in behaviourist
approaches and the boundary between cognitivist constructivism and social
constructivism is indistinct. This may be helpful because, in practice, teachers
often 'cherry-pick'. Whilst theorists can object that this does violence to the
coherence of their theories and their intellectual roots, I will argue, in the next
section of this chapter, that teachers may have grounds for combining
approaches.
In the US literature (Greeno et al., 1996; Bredo, 1997; Pellegrino et al., 2001)
the three perspectives are often labelled 'behaviorist', 'cognitive' and 'situated',
but within the UK, drawing more on European literature, the labels 'behav-
iourist', 'constructivist' and 'socio-cultural' or 'activist' are sometimes pre-
ferred. These two sets of labels are combined in the descriptions below because
they are roughly equivalent. Each of these perspectives is based on a view of
what learning is and how it takes place; it is in respect to these key questions
that they differ. However - and this is an important point - they do not neces-
sarily claim to have a view about the implications for the construction of learn-
ing environments, for teaching or assessment. This has sometimes created
problems for learning theorists because practitioners and policy makers usually
expect them to have a view on these matters, and if this is not the case then
there are those who will try to fill the gap; some successfully and others less so.
The Learning Working Group set up in 2004 by David Miliband, the then
Minister for School Standards in England, noted this with respect to Gardner's
theory of multiple intelligences:
In the case of multiple intelligences there have undoubtedly been consequences in
education that Gardner did not intend, and soon he began to distance himself from
some of the applications in his name that he witnessed in schools:

'... I learned that an entire state in Australia had adopted an educational program
based in part on MI theory. The more I learned about this program, the less com-
fortable I was. While parts of the program were reasonable and based on research,
much of it was a mishmash of practices, with neither scientific foundation nor clin-
ical warrant. Left-brain and right-brain contrasts, sensory-based learning styles,
neuro-linguistic and MI approaches commingled with dazzling
promiscuity.' (Hargreaves, 2005: 15)
The theory of multiple intelligences is not a theory of learning, strictly speak-
ing, but a theory of mental traits. The point is an important one because the
scholarship of learning theorists is, by definition, focused on learning per se and
not necessarily the implications and application of their ideas for pedagogic
practice. To take this second step requires applications to be equally rigorously
investigated if they are to be warranted (see James et al., 2005). In Gardner's
case this was the reason for his key role in Harvard's Project Zero, which applied
his ideas to practice (Project Zero, 2005).
Bearing these cautions in mind, the following account summarizes, in a
schematic and necessarily brief way, the key ideas associated with each of the
three families of learning theories: first, how learning takes place (the process
and environment for learning) and second, how achievement (the product of
learning) is construed. This is as far as some theories go. However, and very
tentatively, I will also extract some implications for teaching and assessment
that would seem to be consistent with the theory, as illustrated in the examples
in the section above.
Behaviourist theories of learning
Behaviourist theories emerged strongly in the 1930s and are most popularly
associated with the work of Pavlov, Watson, Skinner and Thorndike. Behav-
iourism remained a dominant theoretical perspective into the 1960s and 1970s,
when some of today's teachers were trained, and can still be seen in behaviour
modification programmes as well as everyday practice. Bredo (1997), who is
particularly interesting on the subject of the philosophical and political move-
ments that provide the background to these developments, notes the associa-
tion with the political conservatism that followed the end of World War I and
the growth of positivism, empiricism, technicism and managerialism.
According to these theories the environment for learning is the determining
factor. Learning is viewed as the conditioned response to external stimuli.
Rewards and punishments, or at least the withholding of rewards, are power-
ful ways of forming or extinguishing habits. Praise may be part of such a
reward system. These theories also take the view that complex wholes are
assembled out of parts, so learning can best be accomplished when complex
performances are deconstructed and when each element is practised, reinforced
and subsequently built upon. These theories have no concept of mind, intelli-
gence or ego; there is 'no ghost in the machine'. This is not to say that
such theorists deny the existence of human consciousness but that they do not
feel that this is necessary to explain learning; they are only interested in observ-
able behaviour and claim that this is sufficient. From this perspective, achieve-
ment in learning is often equated with the accumulation of skills and the
memorization of information (facts) in a given domain, demonstrated in the for-
mation of habits that allow speedy performance.
Implications for teaching construe the teacher's role as being to train people
to respond to instruction correctly and rapidly. In curriculum planning, basic
skills are introduced before complex skills. Positive feedback, often in the form
of non-specific praise, and correction of mistakes are used to make the connec-
tions between stimulus and response. As for the environment for learning,
these theories imply that students are best taught in homogeneous groups
according to skill level or individually according to their rate of progress
through a differentiated programme based on a fixed hierarchy of skill acquisi-
tion. Computer-based typing 'tutors' are paradigm examples of this, although
the approach is also evident in vocational qualifications post-16 (for example,
the UK General National Vocational Qualification or GNVQ) where learning
outcomes are broken down into tightly specified components. In the early days
of the national curriculum the disaggregation of attainment levels into atom-
ized statements of attainment reflected this approach. The current widespread
and frequent use of Key Stage 2 practice tests to enhance scores on national tests
in England also rests on behaviourist assumptions about learning.
Implications for assessment are that progress is measured through unseen
timed tests with items taken from progressive levels in a skill hierarchy. Perfor-
mance is usually interpreted as either correct or incorrect and poor performance
is remedied by more practice in the incorrect items, sometimes by deconstruct-
ing them further and going back to even more basic skills. This would be the
only feasible interpretation of formative assessment according to these theories.
Example 1 in the previous section comes close to this characterization.
Cognitive, constructivist theories of learning
As with behaviourist and socio-cultural theories, these derive from a mix of
intellectual traditions including positivism, rationalism and humanism. Noted
theorists include linguists such as Chomsky, computer scientists such as Simon,
and cognitive scientists such as Bruner (who in his later writing moved towards
socio-cultural approaches; see Bruner, 1996). Recently, neuroscientists have
joined these ranks and are offering new perspectives on theories that began
their real growth in the 1960s alongside and often in reaction to behaviourism.
Learning, according to these theories, requires the active engagement of
learners and is determined by what goes on in people's heads. As the reference
to 'cognition' makes clear, these theories are interested in 'mind' as a function
of 'brain'. A particular focus is on how people construct meaning and make
sense of the world through organizing structures, concepts and principles in
schema (mental models). Prior knowledge is regarded as a powerful determi-
nant of a student's capacity to learn new material. There is an emphasis on
'understanding' (and eliminating misunderstanding) and problem solving is
seen as the context for knowledge construction. Processing strategies, such as
deductive reasoning from principles and inductive reasoning from evidence,
are important. Differences between experts and novices are marked by the way
experts organize knowledge in structures that make it more retrievable and
useful. From this perspective, achievement is framed in terms of understanding
in relation to conceptual structures and competence in processing strategies.
The two components of metacognition - self-monitoring and self-regulation -
are also important dimensions of learning.
This perspective on learning has received extensive recent attention for its
implications in relation to teaching and assessment. The two companion volumes
produced by the US National Research Council (Bransford et al., 2000; Pellegrino
et al., 2001) are perhaps the best examples of the genre currently available. With
the growth of neuroscience and brain research, there are no signs that interest will
diminish. The greatest danger seems to be that the desire to find applications will
rush ahead of the science to support them (see the quote from Gardner above).
Cognitivist theories are complex and differentiated and it is difficult to summa-
rize their overall implications. However, in essence, the role of the teacher is to
help 'novices' to acquire 'expert' understanding of conceptual structures and pro-
cessing strategies to solve problems by symbolic manipulation with 'less search'.
In view of the importance of prior learning as an influence on new learning, form-
ative assessment emerges as an important integral element of pedagogic practice
because it is necessary to elicit students' mental models (through, for example,
classroom dialogue, open-ended assignments, thinking-aloud protocols and
concept-mapping) in order to scaffold their understanding of knowledge struc-
tures and to provide them with opportunities to apply concepts and strategies in
novel situations. In this context teaching and assessment are blended towards the
goals of learning, particularly the goal of closing the gap between current under-
standing and the new understandings sought. Example 2 in the previous section
illustrates some aspects of this approach. It is not surprising therefore that many
formulations of formative assessment are associated with this particular theoret-
ical framework (see Chapter 5). Some experimental approaches to summative
assessment are also founded on these theories of learning, for example, the use
of computer software applications for problem-solving and as
a measure of students' learning of knowledge structures (see Pellegrino et al.,
2001; and Bevan, 2004, for a teacher's use of these applications). However, these
assessment technologies are still in their infancy and much formal testing still
relies heavily on behavioural approaches, or on psychometric or 'differentialist'
models. As noted earlier, these are often not underpinned by a theory of learning
as such because they regard individual ability to learn as being related to innate
mental characteristics such as the amount of general intelligence possessed.
Socio-cultural, situated and activity theories of learning
The socio-cultural perspective on learning is often regarded as a new develop-
ment but Bredo (1997) traces its intellectual origins back to the conjunction of
functional psychology and philosophical pragmatism in the work of James,
Dewey and Mead at the beginning of the twentieth century. Associated also
with social democratic and progressivist values, these theoretical approaches
actually stimulated the conservative backlash of behaviourism. Watson, the
principal evangelist of behaviourism, was a student of Dewey at Chicago but
admitted that he never understood him (cited in Bredo, 1997: 17). The interac-
tionist views of the Chicago school, which viewed human development as a
transaction between the individual and the environment (actor and structure),
derived from German and British (Darwin) thought but also had some-
thing in common with the development of cultural psychology in Russia, asso-
ciated with Vygotsky (1978) and derived from the dialectical materialism of
Marx (see Edwards, 2005, for an accessible account). Vygotsky was in fact
writing at the same time as Dewey and there is some evidence that they actu-
ally met (Glassman, 2001).
Vygotsky's thinking has subsequently influenced theorists such as Bruner
(1996) in the USA and Engestrom (1999) in Finland. Bruner has been interested
in the education of children but Engestrom is known principally for reconfig-
uring Russian activity theory as an explanation of how learning happens in the
workplace. Other key theorists who regard individual learning as situated in
the social environment include Rogoff (1990) and Lave and Wenger (Lave and
Wenger, 1991; Wenger, 1998) who draw on anthropological work to character-
ize learning as 'cognitive apprenticeship' in 'communities of practice'. Given
the intellectual roots - deriving as much from social theory, sociology and
anthropology as from psychology - the language and concepts employed in
socio-cultural approaches are often quite different. For example, 'agency', 'com-
munity', 'rules', 'roles', 'division of labour', 'artefacts' and 'contradictions'
feature prominently in the discourse.
According to this perspective, learning occurs in an interaction between the
individual and the social environment. (It is significant that Vygotsky's seminal
work is entitled Mind in Society.) Thinking is conducted through actions that
alter the situation and the situation changes the thinking; the two constantly
interact. Especially important is the notion that learning is a mediated activity
in which cultural artefacts have a crucial role. These can be physical artefacts
such as books and equipment but they can also be symbolic tools such as lan-
guage. Since language, which is central to our capacity to think, is developed in
relationships between people, social relationships are necessary for, and
precede, learning (Vygotsky, 1978). Thus learning is by definition a social and
collaborative activity in which people develop their thinking together. Group
work is not an optional extra. Learning involves participation and what is
learned is not necessarily the property of an individual but shared within the
social group, hence the concept of 'distributed cognition' (Salomon, 1993) in
which the collective knowledge of the group, community or organization is
regarded as greater than the sum of the knowledge of individuals. The out-
comes of learning that are most valued are engaged participation in ways that
others find appropriate, for example, seeing the world in a particular way and
acting accordingly. The development of identities is particularly important; this
involves the learner shaping and being shaped by a community of practice.
Knowledge is not abstracted from context but seen in relation to it, thus it is dif-
ficult to judge an individual as having acquired knowledge in general terms,
that is, extracted from practice.
These theories provide very interesting descriptions and explanations of
learning in communities of practice but the newer ones are not yet well worked
out in terms of their implications for teaching and assessment, particularly in
the case of the latter and especially in school contexts. Example 3 in the section
above is my attempt to extrapolate from the theory. According to my reading,
socio-cultural approaches imply that the teacher needs to create an environ-
ment in which people can be stimulated to think and act in authentic tasks (like
apprentices) beyond their current level of competence (but in what Vygotsky
calls their 'zone of proximal development'). Access to, and use of, an appropri-
ate range of tools are important aspects of such an expansive learning environ-
ment. It is important to find activities that learners can complete with assistance
but not alone so that the 'more expert other', in some cases the teacher but often
a peer, can 'scaffold' their learning (a concept shared with cognitivist
approaches) and remove the scaffold when they can cope on their own. Tasks
need to be collaborative and students must be involved both in the generation
of problems and of solutions. Teachers and students jointly solve problems and
all develop their skill and understanding.
Assessment within this perspective is weakly conceptualized at present.
Since the model draws extensively on anthropological concepts one might
expect forms of ethnographic observation and inference to have a role.
However, Pellegrino et al. (2001: 101) devote only one paragraph to this possi-
bility and make a single reference to 'in vivo' studies of complex situated
problem solving as a model. In the UK, Filer and Pollard (2000) provide an
ethnographic account of the way children build learning identities and the role
assessment plays in this. As they show, learning can be inferred from active par-
ticipation in authentic (real-world) activities or projects. The focus here is on
how well people exercise 'agency' in their use of the resources or tools (intel-
lectual, human, material) available to them to formulate problems, work pro-
ductively, and evaluate their efforts. Learning outcomes can be captured and
reported through various forms of recording, including audio and visual
media. The portfolio has an important role in this although attempts to 'grade'
portfolios according to 'scoring rubrics' seem to be out of alignment with the
socio-cultural perspective. Serafini (2000) makes this point about the state-man-
dated Arizona Student Assessment Program, a portfolio-based system, which
reduced the possibilities for 'assessment as inquiry' largely to 'assessment as
procedure' or even 'assessment as measurement'. Biggs and Tang (1997) argue
that judgement needs to be holistic to be consistent with a socio-cultural or sit-
uated approach. Moreover, if a key goal of learning is to build learning identi-
ties then students' own self-assessments must be central. However, this raises
questions about how to ensure the trustworthiness of such assessments when
large numbers of students are involved and when those who are interested in
the outcomes of such learning cannot participate in the activities that generate
them. Clearly, more work needs to be done to develop approaches to assess-
ment coherent with a socio-cultural perspective on learning.
Possibilities for eclecticism or synthesis
The previous two sections have attempted to show the potential to develop
consistency between assessment practice and beliefs about learning and to
provide a basis for arguing that change in one almost always requires a change
in the other. I have noted, however, that assessment practice is sometimes out
of step with developments in learning theory and can undermine effective
teaching and learning because its washback effect is so powerful, especially in
high stakes settings. It would seem, therefore, that alignment between assess-
ment practice and learning theory is something to strive for. But is this realistic
and how can it be accomplished? Teachers are very interested in 'what works'
for them in classrooms and will sometimes argue that a blend or mix of practi-
cal approaches works best. They will wonder if this is acceptable or whether
they have to be purist about the perspective they adopt. They might ask: Do I
have to choose one approach to the exclusion of others? Can I mix them? Or is
there a model that combines elements of all? These questions are essentially
about purism, eclecticism or synthesis. An analogy derived from chemistry
might help to make these distinctions clear.
The paradigm purist might argue that, like oil and water, these theories do
not mix. A theory, if it is a good theory, allempts to provide as complete an
account as possible of the phenomenil in question. Therefore one good theory
should be sufficient. Howe\'er, if the bounds around a set of phenomena are
drJwn slightly differently, as they can be with respect to teaching ,md learning
beciluse it is a wide and complex field of study, then a number of theories may
overlap. Thus behaviourist approaches seem to work perfectly well when the
focus is on the development of some basic skills or habitual behaviours. In these
contexts, too much thought might actually get in the way of execution. On the
other hand, cognitivist approaches seem to be best when deep understanding
of conceptual structures within subject domains is the desired outcome. Thus,
'fitness for purpose' is an important consideration in making such judgements
and a blending of approaches, like a mixture of salt and bicarbonate of soda as
a substitute for toothpaste, might work well. Such a combination would consti-
tute an eclectic approach. Nonetheless, there are practices that contradict each
other and to employ them both could simply confuse students. The use of non-
specific praise is a case in point. Whilst the use of such praise to reinforce the
desired behaviour may be effective in one context, in another context it can be
counter-productive to the development of understanding (see Chapter 4 for
more discussion).
The nature of the subject domain might also encourage consideration of
whether priority should be given to one approach in preference to another. For
example, subject disciplines such as science and mathematics, with hierarchi-
cally-ordered and generally-accepted conceptual structures, may lend them-
selves to constructivist approaches better than broader 'fields' of study with
contested or multiple criteria of what counts as quality learning (Sadler, 1987),
such as in the expressive arts. It is perhaps no surprise that teaching and assess-
ment applications from a constructivist perspective draw on an overwhelming
majority of examples from science and mathematics (see Bransford et al., 2000,
and Pellegrino et al., 2001). Many elaborations of formative assessment also do
so (Black et al., 2003) although accounts of applications in other subjects are
being developed (Marshall and Hodgen, 2005) with a resulting need to critique
and adapt earlier models (see Chapter 5).
Most importantly, the constructivist approach in both theory and practice
has taken on board the importance of the social dimension of learning: hence
the increasing use of the term 'social constructivism'. Similarly, there is now
evidence that socio-cultural and activity theory frameworks are involved in a
'discursive shift' to recognize the cognitive potential to explain how we learn
new practices (Edwards, 2005). This seems to suggest possibilities for synthesis
whereby a more complete theory can emerge from blending and bonding key
elements of previous theories. The analogy with chemistry would be the cre-
ation of a new compound (for example, a polymer) through the combining of
elements in a chemical reaction. Thus synthesis goes further than eclecticism
towards creating a new alignment. Could it be that one day we will have a more
complete meta-theory which synthesizes the insights from what now appear to
be rather disparate perspectives? Could such a theory permit a range of assess-
ment practices to fit different contexts and purposes whilst still maintaining an
internal consistency and coherence? Chapter 5 goes some way to meeting this
challenge with respect to formative assessment/assessment for learning. Cer-
tainly, the possibility for a more complete and inclusive theory of learning to
guide the practice of teaching and assessment seems a goal worth pursuing. In
the end, however, decisions about which assessment practices are most
appropriate should flow from educational judgements as to preferred learning
outcomes. This forces us to engage with questions of value - what we consider
to be worthwhile, which in a sense is beyond both theory and method.
Chapter 4
The Role of Assessment in Developing
Motivation for Learning
Wynne Harlen
This chapter is about motivation for learning and how assessment for different
purposes, used in various ways, can affect it, both beneficially and detrimen-
tally. It begins with a brief discussion of some key components of motivation for
learning and some of the theories relevant to it. This is followed by reference to
research evidence relating to the impact of summative assessment on motiva-
tion for learning. Despite the great range and variety in the research studies,
their findings converge in providing evidence that some summative assessment
practices, particularly high stakes tests, have a negative impact. At the same
time, the evidence points towards ways of avoiding such impact. Not surpris-
ingly, these actions suggest classroom practices that reflect many of the features
of 'formative assessment', or 'assessment for learning', these two terms being
used interchangeably here to describe assessment when it has the purpose and
effect of enabling students to make progress in their learning. The chapter ends
by drawing together implications for assessment policy at the school, local and
national levels.
The importance of motivation for learning
Motivation has been described as 'the conditions and processes that account for
the arousal, direction, magnitude, and maintenance of effort' (Katzell and
Thompson, 1990: 144), and motivation for learning as the 'engine' that drives
teaching and learning (Stiggins, 2001: 36). It is a construct of what impels learn-
ers to spend the time and effort needed for learning and solving problems
(Bransford et al., 2000). It is clearly central to learning, but is not only needed as
an input into education. It is also an essential outcome of education if students
are to be able to adapt to changing conditions and problems in their lives
beyond formal schooling. The more rapid the change in these conditions, the
more important is strong motivation to learn new skills and to enjoy the chal-
lenge.
Consequently, developing motivation for learning is seen as an important
outcome of education in the twenty-first century and it is essential to be aware
of what aspects of teaching and learning practice act to promote or inhibit it.
Assessment is one of the key factors that affect motivation. Stiggins claims that
teachers can enhance or destroy students' desires to learn more quickly and
more permanently through their use of assessment than through any other
tools at their disposal (2001: 36). In this chapter we look at this association and
take it further to suggest ways of using assessment to enhance motivation for
learning. However, it is first necessary to consider the nature of motivation in
some detail, for it is not a single or simple entity. By recognizing some of its
complexity we can see how assessment interacts with it.
The concept of motivation for learning
In some sense all actions are motivated, as we always have some reason for
doing something, even if it is just to fill an idle hour, or to experience the sense
of achievement in meeting a challenge, or to avoid the consequences of taking
no action.
People read, or even write books, climb mountains or take heroic risks for
these reasons. We may undertake unpleasant and apparently unrewarding
tasks because we know that by doing so we avoid the even more unpleasant
consequences of inaction or, in other circumstances, achieve the satisfaction of
helping others. In tasks that we enjoy, the motivation may be in the enjoyment
of the process or in the product; a person might take a walk because he or she
enjoys the experience or because the destination can only be reached on foot, or
because of the knowledge that the exercise will be good for the health. In such
cases the goals are clear and the achievement, or non-achievement, of them is
made evident in a relatively short time. In relation to learning, however, the
value of making an effort is not always apparent to the student. This underlines
the importance of understanding how learning contexts and conditions, and
particularly the crucial role of assessment, impact on motivation.
Extrinsic and intrinsic motivation
There is a well-established distinction between intrinsic and extrinsic motiva-
tion. When applied to motivation for learning it refers to the difference between
the learning process being a source of satisfaction itself or the potential gains
from learning being the driving force. In the latter case, extrinsic motivation, the
benefit derived may be a result of achieving a certain level of attainment but is
not related to what is learned; learning is a means to an end, not an end in itself.
On the other hand intrinsic motivation describes the situation in which learners
find satisfaction in the skills and knowledge that result and find enjoyment in
learning them. Intrinsic motivation is seen as the ideal, since it is more likely to
lead to a desire to continue learning than learning motivated extrinsically by
rewards such as stars, certificates, prizes or gifts in the absence of such external
incentives. Most teachers have come across students who constantly ask 'Is it
for the examination?' when asked to undertake a new task. This follows years
of being told how important it is to pass the examination rather than to become
aware of the usefulness and interest in what is being learned.
The distinction between intrinsic and extrinsic motivation for learning is a
useful one when one considers the extremes. There are times when effort is made
in undertaking a task because of enjoyment in the process and satisfaction in the
knowledge or skills that result. There are also times when the effort is made
because either there are penalties for not accomplishing a task according to expec-
tations or there are rewards that have little connection with the learning task
(such as a new bicycle for passing an examination). However, there is a large area
between the extremes where it is difficult to characterize a reward as providing
extrinsic or intrinsic motivation. For example, the desire to gain a certificate
which enables a learner to pass on to the next stage of learning could be regarded
as extrinsic motivation, but on the other hand the certificate can be seen as sym-
bolic of the learning achieved. Similarly praise can be a confirmation that one has
achieved something worthwhile or a reason for expending effort.
Furthermore, to regard all extrinsic sources of motivation as 'bad' and all
intrinsic motivation as 'good' ignores the reality of the variety of learning, of
learning contexts and of goals for learning. Hidi (2000) suggests that what may
apply to short-term or simple tasks may not apply to long-term and complex
activities. She contends that 'a combination of intrinsic rewards inherent in
interesting activities and external rewards, particularly those that provide per-
formance feedback, may be required to maintain individuals' engagement
across complex and often difficult - perhaps painful - periods of learning' (Hidi
and Harackiewicz, 2000: 159). Nevertheless, there is strong evidence, reviewed
by Deci et al. (1999), that external rewards undermine intrinsic motivation
across a range of activities, populations and types of reward. Kohn has written
extensively about the destructive impact of external rewards, such as money, on
student learning. From experimental studies comparing rewarded and non-
rewarded students he concludes that those students offered external rewards:
choose easier tasks, are less efficient in using the information available to solve
novel problems, and tend to be answer-orientated and more illogical in their
problem-solving strategies. They seem to work harder and produce more activity,
but the activity is of a lower quality, contains more errors, and is more stereotyped
and less creative than the work of comparable nonrewarded subjects working on
the problems. (1993: 471-2)
Although the quality of this particular research by Kohn has been criticized
(Kellaghan et al., 1996), the findings are supported by similar studies and Kel-
laghan et al. (1996) themselves report evidence that intrinsic motivation is asso-
ciated with levels of engagement in learning that lead to conceptual
understanding and higher level thinking skills. The review by Crooks (1988)
also drew attention to research that indicates the problems associated with
extrinsic motivation in tending to lead to 'shallow' rather than 'deep' learning.
'Intrinsic' and 'extrinsic' are descriptions of overall forms of motivation but
to understand how to promote intrinsic motivation in individual learners it is
necessary to consider some underlying factors. Rewards and punishments are
only one way of influencing motivation and people vary in their response to
them; the reward has to be valued if it is to promote the effort needed to achieve
it. The effort required for learning is influenced by interest, goal-orientation,
locus of control, self-esteem, self-efficacy and self-regulation. These are inter-
connected components of motivation for learning and there is a good deal of
evidence that assessment has a key role in promoting or inhibiting them and
hence affects the nature of the learning achieved in particular circumstances.
Components of motivation for learning
Interest
Interest is the result of an interaction between an individual and certain aspects
of the environment. It has a powerful impact on learning. Hidi and Harackiewicz
suggest that 'it can be viewed as both a state and a disposition of a person, and it
has a cognitive, as well as an affective, component' (2000: 152). As it depends on
the individual as well as on the environment, studies have identified two aspects:
individual or personal interest, and 'situational' interest, residing in contextual
factors of the environment. Individual interest is considered to be a relatively
stable response to certain experiences, objects or topics that develops over time as
knowledge increases and enhances pleasure in the activity. Situational interest
resides in certain aspects of the environment that attract attention and may or
may not last. Not surprisingly, those with personal interest in particular activities
persist in them for longer, learn from them and enjoy the activities more than
those with less personal interest. Where personal interest is absent, situational
interest is particularly important for involvement in learning. Features of learn-
ing activities such as novelty, surprise and links to existing experience provide a
meaningful context and can therefore help to engage students' interest. Some
potentially boring activities can be made interesting through, for example,
making them into games. It has also been found that changing the social envi-
ronment can encourage interest; for instance, some students show more interest
when working with others than by themselves (Isaac et al., 1999).
The aim of creating situational interest is to get students to participate in
learning tasks that they do not initially find interesting, in the hope that per-
sonal interest may develop, at the same time as some learning taking place. This
is more likely to happen if students are encouraged to see the purpose of their
involvement as learning. Thus the development of interest that leads to learn-
ing is connected with goal orientation and with the type of feedback they
receive, both of which are closely connected with assessment as discussed later.
Goal orientation
How learners see the goals of engaging in a learning task determines the direc-
tion in which effort will be made and how they will organize and prioritize (or
not) time spent for learning. The nature of the goal that is adopted is clearly crit-
ical. Goals will only be selected if they are understood, appear achievable, and
are seen as worthwhile. As Henderson and Dweck (1990) point out, if students
do not value the goals of academic achievement they are unlikely to be moti-
vated to achieve them.
The relationship between the goals embraced by a learner and how they
respond to learning tasks is expressed in terms of two main types of goal. These
are described as 'learning (or mastery) goals' and 'performance (or ego) goals'
(Ames, 1992). Those motivated by goals identified in terms of learning apply
effort in acquiring new skills, seek to understand what is involved rather than
just committing information to memory, persist in the face of difficulties, and
generally try to increase their competence. Those oriented towards goals iden-
tified as a level of performance seek the easiest way to meet requirements and
achieve the goals, compare themselves with others, and consider ability to be
more important than effort.
A good deal of research evidence supports the superiority of goals as learn-
ing over goals as performance. For example, Ames and Archer (1988) found
those who hold with goals as learning seek challenging tasks and Benmansour
(1999) found a particularly strong association between goal orientation and the
use of active learning strategies. The use of more passive learning strategies and
avoidance of challenge by those who see goals as performance is particularly
serious for lower achieving students. Indeed Butler (1992) found that the effects
of different goal orientations are less evident among high achieving students or
those perceiving themselves as performing well than among those performing
less well. But the extent to which goal orientation is a dichotomy has been chal-
lenged by evidence that goals as learning and goals as performance are uncor-
related (McInerney et al., 1997) and that there may be students who endorse one
or other, both or neither. The fact that researchers have set up experimental sit-
uations that induce different goal orientations in order to investigate their effect
(as in the study by Schunk, 1996, outlined later) indicates that they are subject
to change and manipulation and so can be influenced by classroom culture.
The evident value for school work of goals as learning leads to the question
of how students can be oriented or re-oriented towards these rather than goals
as performance. This question of how individuals come to embrace goals is dis-
cussed by Kellaghan et al. (1996). They cite evidence of the need to ensure that
goals are understood, that they are challenging but achievable, seen to be ben-
eficial to the learner and are valued by them, and that the social and cultural
context facilitates opportunities for learning. In relation to the last of these con-
ditions they comment:
Social and cultural factors are important aspects of motivation because they
can influence students' perceptions of self, their beliefs about achievement, and the
selection of goals. Thus a student may, or may not, adopt achievement goals to gain
or keep the approval of others. ... If academic achievement is not valued in a
student's neighbourhood, peer group, or family, the student will be affected by this
in considering whether or not to adopt academic goals. Even if academic achieve-
ment and the rewards associated with it are perceived to have value, a student may
decide that home and school support are inadequate to help him or her succeed.
(1996: 13-14)
This further underlines the interrelatedness of the components of motivation
chosen for discussion here. It also draws attention to the extent to which learn-
ers feel themselves to be in control of their learning, the 'locus of control', the
point to which we now turn.
Locus of control
As just suggested, 'locus of control' refers to whether learners perceive the
cause of their success or failure to be under their control (internal locus) or to
be controlled by others (external locus). Locus of control is a central concept in
attribution theory (Weiner, 1979). A sense of internal control is evident in those
who recognize that their success or failure is due to factors within themselves,
either their effort or their ability. They see themselves as capable of success and
are prepared to invest the necessary effort to meet challenges. Those with a
sense of external control attribute their success or failure to external fadors,
such as their teacher or luck. They have less motivation to make an effort to
overcome problems and prefer to keep to tasks where they can succeed.
In addition, the beliefs of learners about whether their ability is something
that can or cannot be changed by effort affects their response to challenging
tasks (Dweck, 1999). Those with a view that their effort can improve their ability
will not be deterred by failure, but will persist and apply more effort. Those
with a view of their ability as fixed find, in success, support for their view. But
failure casts doubt on the ability they regard as fixed. So risk of failure is to be
avoided; when not confident of success, they are likely to avoid challenge. As
in the case of goal orientation, the consequences are most serious for those who
perceive their ability to be low, for the chance of failure is higher and they learn
to expect it. The implication for their feeling of self-worth as a learner, and self-
esteem more generally, is clear.
Self-esteem
Self-esteem refers to how people value themselves both as people and as learn-
ers. It shows in the confidence that the person feels in being able to learn. Those
who are confident in their ability to learn will approach a learning task with an
expectation of success and a determination to overcome problems. By contrast,
those who have gained a view of themselves as less able to succeed are likely to
be tentative in attempting new tasks and deterred by problems encountered. As
a result they appear to make less effort to learn and find less and less enjoyment
in the learning situation. As noted, this is related to their view of whether they
have control over their performance and whether effort can improve it.
Self-efficacy
Self-efficacy is closely related to self-esteem and to locus of control, but is more
directed at specific tasks or subjects. It refers to how capable the learner feels
of succeeding in a particular task or type of task. It is characterized as 'I can'
"
The Role of Assessment in Developing Motivation for Learning
versus 'I can't' by Anderson and Bourke (2000: 35) who state that it is a learned
response, the learning taking place over time through the student's various
experiences of success and failure. Clearly, the more a student experiences
failure in relation to a type of task the more likely it is that they will become
convinced of not being able to succeed. The student develops a condition
described as 'learned helplessness', characterized by a lack of persistence with
a task or even an unwillingness to put enough effort into it to have a chance of
success. Assessment must have a key role in this development, so it is impor-
tant for learning that the assessment is conducted so as to build self-efficacy.
Self-regulation
Self-regulation in learning refers to the will to act in ways that bring about
learning. It refers to learners' consciously controlling their attention and actions
so that they are able to solve problems or carry out tasks successfully. Self-reg-
ulated learners select and use strategies for learning and evaluate their success.
They take responsibility for their own learning and make choices about how to
improve. Those not able to regulate their own learning depend on others to tell
them what to do and to judge how well they have done it. Young children are
able to regulate their learning by adopting simple strategies relevant to learn-
ing, such as focusing their attention on key features to detect changes or 'clus-
tering' to aid their memory. Bransford et al. (2000) quote the example of third
year school students outperforming college students in memorizing a list of 30
items. The younger students grouped the items into dusters with meaning for
them which aided recall. It would appear from examples such as this that learn-
ing depends on a control of strategies and not just on an increase in experience
and information.
Consciously selecting relevant strategies is a step towards students reflecting
on learning and becoming aware of their own thinking, leading to meta-cogni-
tion. For this they need a language to use when talking about learning and about
themselves as learners. Developing and using this language, in a context where
each person is valued, were found by Deakin Crick et al. (2002) to be central in
developing students' strategic awareness of their learning. Promoting self-regu-
lation and meta-cognition enables effort to be directed to improve performance.
Assessment and motivation for learning
How learning is assessed is intimately related to views of learning. Behaviourist
views of learning, which continue to permeate classrooms and indeed to influ-
ence education policy decisions, are based on reinforcing required behaviour
with rewards and deterring unwanted behaviour with punishments. Student
assessment is generally the vehicle for applying these rewards and punish-
ments. Constructivist views of learning focus attention on the processes of
learning and the leamer's role. Teachers engage students in self-assessment and
use their own assessment to try to identify the learner's current understanding
and level of skills. These are matters discussed in detail in Chapter 3. Our focus
here is on how assessment affects each of the components of motivation dis-
cussed in the last section. As we will see there are both negative and positive
effects and by considering both we can draw out, in the next section, the ways
in which assessment can promote motivation for learning.
The research studies of how assessment impacts on motivation for learning
are variable in design, population studied, and in quality. A systematic review
of research on this impact, conducted by Harlen and Deakin Crick (2002, 2003)
identified 183 potentially relevant studies, of which 19 remained after succes-
sive rounds of applying inclusion and exclusion criteria, and making judgments
on the weight of evidence each study provided for the questions addressed. The
research discussed here draws heavily on this review, mainly on the 12 studies
that provided evidence of high weight for the review questions. The focus was
on the impact of summative assessment, some conducted by teachers and some
by external agencies. These are the most common forms of assessment encoun-
tered by students for, as Black and Wiliam (1998a) point out, current practice of
assessment lacks many of the features that are required for assessment to be
formative. The findings indicate how assessment can be practised so that, even
though its purpose is summative, it can support rather than detract from moti-
vation for learning.
Motivation, as we have seen, is too complex a concept for it to be studied as
a single dependent variable. Rather, research studies have concerned one or
more of the components indicated in the last section, underlining their inter-
relatedness. The studies do not fit neatly into categories identified by the com-
ponents of motivation as dependent variables. Thus the approach taken here is
to outline the findings from some key studies grouped according to the inde-
pendent variable, the assessment being studied, and then to draw together the
motivation-related themes emerging from them.
Studies of the impact of the national testing and assessment in
England and Wales
Several studies were able to take advantage of the introduction into England
and Wales of formal tests and teachers' assessments from the beginning of the
1990s in order to explore the changes associated with the innovation. In primary
schools the national curriculum tests represented a considerable change from
previous practice and a unique opportunity to compare students' experiences
before and after this innovation. Part of one such study was reported by Pollard
et al. (2000). The research was one element of a larger longitudinal study, which
mapped the educational experiences of a cohort of students as they passed
through primary school beginning just one year before the introduction of the
national tests and assessment in England and Wales. Over the eight years of the
study, personal interviews with head teachers, teachers and students were
some of the most important sources of data. Other procedures included ques-
tionnaires for teachers, observation in classrooms using systematic quantitative
procedures and qualitative approaches, open-ended or partially structured
field notes, and children's cartoon bubble completions. Sociometric data on chil-
dren's friendship patterns and tape recordings of teachers' interactions with
children were also collected.
The study found that in the initial stages of national testing the teachers tried
to 'protect' students from the effects of the new assessment requirements, which
they saw as potentially damaging. But as time went on, teachers became more
accepting of a formal structured approach to student assessment. As the stu-
dents became older they were aware of assessment only as a summative activ-
ity. They used criteria of neatness, correctness, quantity, and effort when
commenting on their own and others' work. There was no evidence from stu-
dents that teachers were communicating any formative or diagnostic assess-
ment to them. Feelings of tension, uncertainty and test anxiety were reported.
The researchers concluded that pressure of external assessment had had an
impact on students' attitudes and perceptions. Students became less confident
in their self-assessments and more likely to attribute success and failure to
innate characteristics. They were less positive about assessment interactions
that revealed their weaknesses. The assessment process was intimately associ-
ated with their developing sense of themselves as learners and as people. They
incorporated their teachers' evaluation of them into the construction of their
identity as learners.
Another study of the impact of the national curriculum tests in England and
Wales focused specifically on students' self-esteem. Davies and Brember (1998,
1999) conducted a study beginning two years before the introduction of
national tests and extending for several years afterwards, using successive
cohorts of Year 2 (7-year-old) and Year 6 (11-year-old) students. They adminis-
tered measures of self-esteem and some standardized tests in reading and
mathematics. For Year 2 children, self-esteem dropped with each year, with the
greatest drop coinciding with the introduction of the national curriculum tests.
Although there was a small upturn for the fifth cohort, the level still remained
lower than the third and very much below the second cohort. Mean levels of
self-esteem for the pre-national test cohorts were significantly higher than for
the post-national test cohorts. The difference in self-esteem across cohorts was
highly significant for Year 2 children but not for Year 6 children. Before the
introduction of the national tests there was no overall relationship between self-
esteem and achievement in reading and maths on the standardized tests.
However, there was a positive correlation between self-esteem and perform-
ance after the introduction of national curriculum tests. The authors suggested
that the lack of correlation between achievement and self-esteem before the
national curriculum tests meant that the children's view of themselves was
apparently less affected by their attainments than in the case of the post-
national test group.
A small-scale study by Reay and Wiliam (1999) concerned the experiences of
Year 6 (11-year-old) students in one primary school in the term before taking the
national tests. The researchers observed in the class for over 60 hours and inter-
viewed students in groups. They described the class as being at 'fever pitch'
because of the impending tests. The results of these tests had in fact little conse-
quence for the students, but because the school was held responsible for the levels
that they reached and was charged to make improvements in scores from one
year to another, the tests had high stakes for the teachers involved. In the
observed class, the teacher's anxieties were evident in the way he berated the chil-
dren for poor performance in the practice tests. Even though the students recog-
nized that the tests were about how well they had been taught they still worried
about their performance and about possible consequences for their own future.
They were beginning to view themselves and others differently in terms of test
results, equating cleverness with doing well in the tests, and increasingly refer-
ring to the levels they expected themselves and others to achieve.
Studies of selection tests in Northern Ireland
While the tests for 11-year-old students in England and Wales were not used for
selection until 2003, tests of 11-year-olds in Northern Ireland were used for the
highly competitive selection for admission to grammar school. Two studies of
contrasting design reported different kinds of evidence about the impact of the
tests on aspects of students' motivation for learning. Johnston and McClune
(2000) investigated the impact on teachers, students and students' learning
processes in science lessons through interviews, questionnaires and classroom
observations. Leonard and Davey (2001) reported the students' perspectives of
the process of preparing for, taking, and coming to terms with the results of these
tests, generally known as 11-plus tests.
Johnston and McClune (2000) used several instruments to measure students'
learning dispositions, self-esteem, locus of control and attitude to science and
related these to the transfer grades obtained by the students in the 11-plus
examination. They found four main learning dispositions, using the Learning
Combination Inventory (Johnston, 1996). These were described as:
'Precise processing' (preference for gathering, processing and utilizing lots of
data, which gives rise to asking and answering many questions and a pref-
erence for demonstrating learning through writing answers and factual
reports);
'Sequential processing' (preference for clear and explicit directions in
approaching learning tasks);
'Technical processing' (preference for hands-on experience and problem-
solving tasks; willingness to take risks and to be creative);
'Confluent processing' (typical of creative and imaginative thinkers, who
think in terms of connections and links between ideas and phenomena and
like to see the 'bigger picture').
Classroom observation showed that teachers were teaching in ways that gave
priority to sequential processing and which linked success and ability in science
to precise/sequential processing. The statistical analysis showed a positive cor-
relation between precise/sequential learning dispositions and self-esteem. The
more positive a student's disposition towards precise/sequential or technical
processing, the higher is their self-esteem and the more internal their locus of
control. Conversely, the more confluent the student's learning orientation, the
more external their locus of control and the lower is their self-esteem. Inter-
views with teachers indicated that they felt the need to teach through highly
structured activities and transmission of information on account of the nature
of the selection tests. However, the learning dispositions of students showed a
preference for technical processing, that is, through first-hand exploration and
problem solving. Thus teachers appeared to be valuing precise/sequential pro-
cessing approaches to learning more than other approaches and in doing so
were discriminating against and demoralizing students whose preference was
to learn in other ways.
A study by Leonard and Davey (2001), funded by Save the Children, was
specifically designed to reveal students' views on the 11-plus tests. Students
were interviewed in focus groups on three occasions, and they wrote stories
and drew pictures about their experiences and feelings. The interviews took
place just after taking the test, then in the week before the results were
announced, and finally a week after the results were known. Thus the various
phases of the testing process and its aftermath could be studied at times when
these were uppermost in the students' minds. As well as being the cause of
extreme test anxiety, the impact on the self-esteem of those who did not meet
their own or others' expectations was often devastating. Despite efforts by
teachers to avoid value judgements being made on the basis of grades achieved,
it was clear that among the students those who achieved grade A were per-
ceived as smart and grade D students were perceived as stupid. The self-esteem
of those receiving a grade D plummeted. What makes this impact all the more
regrettable is that the measures are so unreliable that many thousands of stu-
dents are misgraded (see Chapter 7).
Studies of regular classroom assessment in North
America
Brookhart and DeVoge (1999) studied US third grade students' perceptions of
assessment 'events' taking place in the course of regular classroom work. They
collected data by questionnaire from students about their perceptions of a task
(as 'easy' or 'difficult', and so on) before attempting it. After the event they
asked students about how much effort they felt they had applied. Selected stu-
dents were then interviewed about their perceptions of the assessment. The
results were used to test a model of the role of classroom assessment in student
motivation and achievement. The findings indicated that students' self-efficacy
judgements about their ability to do particular classroom assessments were
based on previous experiences with similar kinds of classroom assessments.
Results of previous spelling tests, for example, were offered as evidence of how
students expected to do on the current spelling test. Judgemental feedback from
previous work was used by students as an indication of how much effort they
needed to invest. Students who were sure that they would succeed in the work
might not put effort into it. However, this would depend on their goal orienta-
tion. Those seeing goals as performance might apply effort, if this was how they
would be judged, in order to gain approval.
The authors also found that teachers' explicit instructions and how they pre-
sented and treated classroom assessment events affected the way students
approached the tasks. When a teacher exhorted a student to work towards a good
grade that teacher was, on the one hand, motivating students and on the other
was setting up a performance orientation that may have decreased motivation.
Duckworth et al. (1986) also studied the impact of normal classroom grading
procedures but in this case with high school students in the USA across differ-
ent subjects. Their aim was to understand the relationship between effort, effi-
cacy and futility in relation to types of teacher feedback at the individual
student level, at the class level, and at the school level. Questionnaires were
administered to a cross-section of students in 69 schools to provide indices of
effort, efficacy and futility. At the individual level they found efficacy positively
correlated with effort across all ability levels and subjects. These same relation-
ships were stronger at class level. However, there was only scattered support
for the hypothesis that the fit between the tests and what had been studied
would be positively associated with efficacy and negatively associated with
futility. At the school level, collegiality (amount of constructive talk about
testing) among teachers was related to students' perceptions of desirable testing
practices and students' feelings of efficacy and effort. School leadership was
needed to develop and foster such collegial interaction.
Some of the detailed findings anticipated those of Brookhart and DeVoge
(1999). In particular, Duckworth et al. (1986) found students' perceptions of the
communication, feedback and helpfulness of their teachers to be strongly
related to their feelings of the efficacy versus futility of study and of their own
efforts to study. The authors suggested that the difference found between
results for specific events and the more general reactions was possibly due to
the informal culture of expectations, built up over the year by teachers'
remarks and reactions, that had operated independently of the specific
practices studied. This may be part of a 'halo' effect from desirable class testing
practices. They therefore argued that increasing student perceptions of desirable
class testing practices may increase feelings of efficacy and levels of effort.
Students' understanding of the grades they were given by their teachers was
the subject of a study by Evans and Engelberg (1988). Data were collected by
questionnaire from students in grades 4 to 11 in the USA, about understanding
of grades, attitude to grades, and attribution. In terms of understanding of
grades the authors found, as hypothesized, that older students understood
simple grades more than younger ones, but even the older students did not
understand complex systems of grades in which judgments about effort and
behaviour were combined with academic achievement. The experience of being
given a grade, or label, without knowing what it meant seemed likely to lead to
a feeling of helplessness. In terms of attitudes to grades, not surprisingly,
higher-achieving students were more likely to regard grades as fair and to like
being graded more than lower-achieving students. Clearly, receiving low
grades was an unpleasant experience which gave repeated confirmation of low
personal value rather than help in making progress. It was found that younger stu-
dents perceived grades as fair more than older ones, but they also attached less
importance to them. Evans and Engelberg also looked at attribution and found
that lower achieving and younger students made more external attributions
than higher achieving and older students, who used more ability attributions.
This suggested that low-achieving students attempted to protect their self-
esteem by attributing their relative failure to external factors.
In her study of self-regulated learning conducted in Canada, Perry (1998)
divided teachers of grade 2 and 3 students into two groups based on a survey
of their classroom activities in teaching writing. One group was of teachers
whose natural teaching style encouraged self-regulated learning. In these high
self-regulated classrooms teachers provided complex activities, and they offered
students choices, enabling them to control the amount of challenge, to collabo-
rate with peers, and to evaluate their work. The other group was of teachers
who were more controlling, who offered few choices, and students' assessments
of their own work were limited to mechanical features (spelling, punctuation
and so on). These were described as 'low self-regulated classrooms'. Question-
naires were administered to students in these two groups of classes and a
sample of students in each group was observed in five sessions of writing.
Although there were some limitations to this study, the findings were of
interest. There was a difference between the responses of children in high and
low self-regulated classrooms to being asked what they would want the
researcher to notice about their writing whilst looking through their work.
Although a large proportion of students in both contexts indicated that the
mechanical aspects of writing were a focus for them, many more students in
high self-regulated classrooms alluded to the meaningful aspects and intrinsic
value of their work. Students in the low self-regulated classrooms also were
more likely to respond 'I don't know' or suggest that they did not care. Simi-
larly, in interviews, the students observed in the high self-regulated classrooms
indicated an approach to learning that reflected intrinsic motivation. They
showed a task focus when choosing topics or collaborators for their writing and
focused on what they had learned about a topic and how their writing had
improved when they evaluated their writing products. In contrast, the students
in the low self-regulated classrooms were more focused on their teacher's eval-
uations of their writing and how much they got right on a particular assign-
ment. Both the high and low achievers in the low self-regulated classes were
concerned with getting 'a good mark'.
Studies of experimental manipulation of feedback and goal orientation
A study by Butler (1988) of the effect of different forms of feedback, involving
fifth and sixth grade students in Israel, is widely cited for its results relating to
changes in levels of achievement. However, the study also reported on the
interest shown in the tasks used in the study following different forms of feed-
back. The students, fifth and sixth graders, were randomly allocated to groups
and were given both convergent and divergent tasks. After working on these
tasks they received feedback on their performance and answered an interest
questionnaire. Three feedback conditions were applied to different groups:
Comments only: feedback consisted of one sentence, which related specifi-
cally to the performance of the individual student (task involving);
Grades only: these were based on the scores after conversion to follow a
normal distribution with scores ranging from 40 to 99 (ego-involving);
Grades plus comments.
High achieving students expressed similar interest in all feedback conditions,
whilst low achieving students expressed most interest after comments only. The
combined interest of high achieving students receiving grades and grades plus
comments was higher than that of the lower achieving students in these condi-
tions. However, the interest of high and low achieving students in the com-
ments only condition did not differ significantly. The author concluded that the
results indicated that the ego-involving feedback, whether or not combined with
task-involving feedback, induced ego-involving orientation, that is, a motiva-
tion to achieve high scores rather than promoting interest in the task. On the
other hand, promoting task involvement by giving task-related, non-ego-involv-
ing feedback may promote the interest and performance of all students, with
particular value for the lower achieving students.
In the experimental study of goal orientation and self-assessment by Schunk
(1996) in the USA, fourth grade students were randomly assigned to one of
four experimental conditions: goals as learning with self-assessment; goals as
learning without self-assessment; goals as performance with and without self-
assessment. The students studied seven packages of material, covering six
major types of skill in dealing with fractions and a revision package, for 45
minutes a day over seven days. The difference between the goal instructions
lay in a small change in wording in presenting each package. Self-assessment
was undertaken by the relevant groups at the end of each session. Measures of
goal orientation, self-efficacy, and skills in the tasks (addition of fractions) were
administered as pre- and post-tests. The result of this study was that the effect
of goal orientation on achievement was only apparent when self-assessment
was absent. Self-evaluation appeared to swamp any effect of goal orientation.
Therefore, in a second study all students engaged in self-assessment but only
at the end of the programme rather than in each session, to equalize and
reduce its effect. With self-assessment held constant, the results showed
significant effects of goal orientation for self-efficacy and for skill in the
addition of fractions. The scores of the group working towards learning-goals
were significantly higher than those of the performance-goals group on both
measures.
Of relevance here are several studies, not included in the systematic review,
reported by Dweck (1999). When Elliott and Dweck (1988) introduced some
tasks to different groups of fifth grade students in the USA, they did this in a
way whereby some regarded the goal as performance and others as learning.
The two groups performed equally well when they experienced success, but
there was some difference in the groups' response to difficult problems. Many
of those given goals as performance began to show patterns of behaviour
reflecting helplessness, and their problem-solving strategies deteriorated, whilst
most of those who saw goals as learning remained engaged and continued to
use effective strategies.
Dweck and Leggett (1988) found a relationship between students' theories
about their general ability (intelligence) and goal orientation. This was one of a
series of investigations into the effects of believing, on the one hand, that intel-
ligence is innate and fixed, and on the other, that intelligence can be improved
by effort. The view of intelligence held by some eighth grade students was
identified by asking for their agreement or disagreement with statements such
as 'you can learn new things but you can't really change your basic intelligence'
(Dweck, 1999: 21). The students were then offered a series of tasks, some of
which were described in terms of 'goals as performance' and some in terms of
'goals as learning'. They found a significant relationship between beliefs about
their ability and the students' choice of task, with those holding a fixed view of
their ability choosing a performance goal task.
These findings suggest that students who are encouraged to see learning as
their goal feel more capable, apply effort, and raise their performance. This is
less likely to happen where students are oriented to performance, which other
research shows inevitably follows in the context of high stakes summative
assessment. For instance, Pollard et al. (2000) found that after the introduction
of national tests, teachers increasingly focused on performance outcomes rather
than the learning process. Schunk's (1996) findings, however, suggest that
student self-assessment has a more important role in learning than goal orien-
tation, but when it is combined with goals as learning it leads to improved per-
formance and self-efficacy.
Using assessment to promote motivation for learning
In the foregoing sections we have discussed various forms and components of
motivation and considered some evidence of how it is affected by assessment.
As a start in bringing these together, it is useful to restate the reasons for being
concerned with motivation for learning. In plain terms, these are because we
want, and indeed society needs, students who:
Want to learn and value learning;
Know how to learn;
Feel capable of learning;
Understand what they have to learn and why;
Enjoy learning.
How does assessment affect these outcomes? We will first bring together the
features of assessment practice that need to be avoided. Then we will look at the
more positive side of the relationship.
Impacts of assessment to be avoided
Assessment, particularly when high stakes art' attached to the results, creates a
strong reason for learning. But this reason is, for the vast majority of srudents, to
pass the test/examination at thl' necessary level to achieve the reward. Students
who are extrinsically motivated in this way see their gools as perfonnance rather
than as learning.. and the evidence shows that this is associated with seeking the
easiest route to the necessary perfonnance. Students with such goal orientation
use passive rather than active learning strategies and avoid challenges; their
learning is described as 'shallow' rather than 'deep' (Ames and Archer, 1988;
Benmansour, \999; Crooks, 1988; Harlen and James, 1997). Students are encour-
aged, sometimes unWittingly, by their teachers in this approach to their work. The
way in which teachers introduce tasks to students can orientate stud<!llts to goals
as performance rather than goals as learning (Brookhart and [)('Voge, 2000;
Schunk, 1996). Repeated tests, in which are they encouraged to perform well to
get high scores, teaches students that performance is what mailers. This perme-
ales throughout classroom transactions, affecting students' approach to thl'ir
work (Pollard et aI., 2000; Reay and Wiliam, 1999).
Pollard et al. (2000) suggest that making teachers accountable for test scores
but not for effective teaching encourages the administration of practice tests.
Many teachers also go further and actively coach students in passing tests
rather than spending time in helping them to understand what is being tested
(Gordon and Reese, 1997; Leonard and Davey, 2001). Thus the scope and depth
of learning are seriously undermined. As discussed in Chapter 8, this may also
affect the validity of the tests if coaching in test-taking enables students to
perform well even when they do not have the required knowledge, skills and
understanding.
Even when not directly teaching to the tests, teachers change their approach.
Johnston and McClune (2000) reported that teachers adjusted their teaching
style in ways they perceived as necessary because of the tests. They spent
most time in direct instruction and less in providing opportunities for students
to learn through enquiry and problem solving. This impairs learning, and the
feeling of being capable of learning, for those students who prefer to do this in
a more active way.
The research confirms that feedback to students has a key role in determin-
ing their feeling of being capable of learning, of tackling their classroom activi-
ties and assessment tasks successfully. Feedback can come from several sources:
from the reactions of the teachers to their work, from others, including their
peers, and from their own previous performance on similar tasks. In relation to
teachers' feedback, there is strong evidence that, in an atmosphere dominated
by high stakes tests, teachers' feedback is largely judgemental and rarely form-
ative (Pollard et al., 2000). Butler's (1988) experimental study of different kinds
of feedback indicated that such feedback encourages interest in performance
rather than in learning and is detrimental to interest in the work, and achieve-
ment, of lower achieving students.
The feedback that students obtain from their own previous performance in
similar work is a significant element in their feeling of being able to learn in a
particular situation (Brookhart and DeVoge, 1999). Consequently, if this is gen-
erally judgemental in nature it has a cumulative impact on their self-efficacy.
The opportunity for past experience to help further learning is lost.
Feedback from these different directions adds to the general impression that
students have of their teachers' helpfulness and interest in them as learners.
Indeed, Roderick and Engel (2001) reported on how a school providing a high
level of support was able to raise the effort and test performance of very low
achieving and disaffected students to a far greater degree than a comparable
school providing low level support for similar students. High support meant
creating an environment of social and educational support, working hard to
increase students' sense of self-efficacy, focusing on learning related goals,
making goals explicit, using assessment to help students succeed and creating
cognitive maps which made progress evident. They also displayed a strong
sense of responsibility for their students. Low teacher support meant teachers
not seeing the target grades as attainable, not translating the need to work
harder into meaningful activities, not displaying recognition of change and
motivation on the part of students, and not making personal connections with
students in relation to goals as learning. There are implications here and in
Duckworth et al.'s (1986) study for school management. Pollard et al. (2000) and
Hall and Harding (2002) also found that the assessment discourse and quality
of professional relationships teachers have with their colleagues outside the
classroom influence the quality of teaching and learning inside the classroom.
In summary, assessment can have a negative impact on student motivation
for learning by:
Creating a classroom culture which favours transmission teaching and
undervalues variety in ways of learning;
Focusing the content of teaching narrowly on what is tested;
Orienting students to adopt goals as performance rather than goals as learn-
ing;
Providing predominantly judgmental feedback in terms of scores or grades;
Favouring conditions in which summative judgements permeate all teach-
ers' assessment transactions.
Assessment practices that preserve student motivation
Each item in the above list indicates consequences to be avoided and so sug-
gests what not to do. However, the research evidence also provides more posi-
tive implications for practice. One of the more difficult changes to make is to
convince teachers that levels of achievement can be raised by means other than
by teaching to the tests. Certainly students will have to be prepared for the tests
they are required to take, but this best takes the form of explaining the purpose
and nature of the test and spending time, not on practising past test items, but
on developing understanding and skills by using assessment to help learning.
The work of Black et al. (2003) in development of practical approaches to using
assessment for learning has added to the evidence of the positive effect of form-
ative assessment on achievement (see Chapter 1). Since the measures of change
in achievement used in this work are the same statutory tests as are used in all
schools, the results show that improvement can be brought about by attention
to learning without teaching to the test.
The particularly serious impact of summative assessment and tests on lower
achieving students results from their repeated experience of failure in compar-
ison with more successful students. There are implications here for two kinds
of action that can minimize the negative impact for all students. The first is to
ensure that the demands of a test are consistent with the capability of the stu-
dents, that is, that students are not faced with tests that are beyond their reach
(Duckworth et al., 1986). The notion of 'testing when ready' is relevant here. It
is practised in the Scottish national assessment programme, where students are
given a test at a certain level when the teachers are confident, based on their
professional judgement, that they will be able to succeed. Thus all students can
experience success, which preserves their self-esteem and feeling of self-efficacy.
The result also helps students to recognize the progress they are making in their
learning, noted as important in the research (Roderick and Engel, 2001; Duck-
worth et al., 1986). The second action is for teachers actively to promote this
awareness of progress that each student is making and to discourage students
from comparing themselves with each other in terms of the levels or scores that
they have attained.
The research also underlines the value of involving students in self-assess-
ment (Schunk, 1996) and in decisions about tests (Leonard and Davey, 2001;
Perry, 1998). Both of these necessitate helping students to understand the
reasons for the tests and the learning that will be assessed, thus helping to
promote goals as learning. These practices are more readily applied to those
tests that teachers control rather than to external tests. However, there is abun-
dant evidence that by far the majority of tests that students undergo are
imposed by teachers, either as part of regular checking or in practising for exter-
nal tests. Thus a key action that can be taken is to minimize the explicit prepa-
ration for external tests and use feedback from regular classwork to focus
students on the skills and knowledge that will be tested.
If teachers are to take these actions, they need support at the school level in
the form of an ethos and policy that promotes the use of assessment to help
learning as well as serving summative purposes. There are implications for the
management of schools in establishing effective communication about assess-
ment and developing and maintaining collegiality through structures and
expectations that enable teachers to avoid the negative impact of assessment on
motivation for learning. These school procedures and policies have also to be
communicated to parents.
Finally, there are of course implications for local and national assessment
policies. The force driving teachers to spend so much time on direct preparation
for tests derives from the high stakes attached to the results. The regular
national or state-wide tests for all students throughout primary and secondary
school have greater consequences for teachers and schools than for students.
But whether the stakes are high for the student (as when the results are used for
certification or selection) or for the teacher and school (as when aggregated
student tests or examination results are used as a measure of teacher or school
effectiveness), the consequence is that teaching and learning are focused on
what is tested with all the consequences for motivation for learning that have
been discussed here.
The irony is that, as an outcome of the high stakes use, the tests do not
provide the valid information required for their purposes. In particular, tests
taken by all students can only cover a narrow sample (and the most reliably
marked sample) of student attainment; teaching how to pass tests means that
students may be able to pass even when they do not have the skills and under-
standing which the test is intended to measure (Gordon and Reese, 1997).
Further, the reliability of the tests as useful indicators of students' attainment is
undermined by the differential impact of the testing procedures on a significant
proportion of students. Girls and lower achieving students are likely to have
high levels of test anxiety that influence their measured performance (Evans
and Engelberg, 1988; Benmansour, 1999; Reay and Wiliam, 1999). Older lower
achieving students are likely to minimize effort and may even answer ran-
domly since they expect to fail anyway (Paris et al., 1991). Thus results may be
unreliable and may exaggerate the difference between the higher and lower
achieving students.
To avoid these pitfalls, the Assessment Reform Group (ARG), as a result of
consultation with policy makers and practitioners on the implications of the
research, concluded that designers and users of assessment systems and tests
should:
Be more actively aware of the limited validity of the information about pupil
attainment that is being obtained from current high stakes testing pro-
grammes;
Reduce the stakes of such summative assessments by using, at national and
local levels, the performance indicators derived from them more selectively
and more sensitively. They should take due account of the potential for those
indicators to impact negatively on learning, on teaching and on the curricu-
lum;
Be more aware of the true costs of national systems of testing, in terms of
teaching time, practice tests and marking. This in turn should lead policy
makers to come to reasoned conclusions about the benefits and costs of each
element in those systems;
Consider that for tracking standards of attainment at national level it is
worth testing a sample of pupils rather than a full age cohort. This would
reduce both the negative impacts of high stakes tests on pupil motivation
and the costs incurred;
Use test development expertise to create forms of tests and assessments that will make it possible to assess all valued outcomes of education, including for example creativity and problem solving;
Develop a broader range of indicators to evaluate the performance of schools. Indicators that are derived from summative assessments should therefore be seen as only one element in a more broadly-based judgment. This would diminish the likely impact of public judgments of school performance on those pupils whose motivation is most 'at risk' (ARG, 2002b: 11-12).
This chapter has discussed evidence that the way in which assessment is used, both inside the classroom by teachers and outside by others, has a profound impact on students' motivation for learning. It is evident that motivation has a key role in the kind of learning in which students engage; a central concern of this book.
It is natural for students and teachers to aim for high performance, but when this is measured by external tests and when the results are accompanied by penalties for low performance, the aim becomes to perform well in the tests and this is often not the same as to learn well. Moreover, when there are high stakes attached to the test results the tests are inevitably designed to have high reliability and focus on what can be tested in this way. Although the reliability of these tests may not be as high as assumed (see Chapter 7), the attempt to aspire to 'objectivity' is generally to the detriment of the validity of the test. The inevitable consequence, as the research shows, is to narrow the learning experiences of the students. However, the impact of high stakes testing may well have longer-term consequences than the narrowness of curriculum experience. Further learning and continued learning throughout life depend on how people view themselves as learners, whether they feel they can achieve success through effort, whether they gain satisfaction from learning: all aspects of motivation for learning.
The impact that assessment can have on students can be either positive, as discussed in Chapter 1, or negative as set out in this chapter. What happens depends on how the teacher mediates the impact of assessment on students. Chapter 3 showed that teachers' views of learning affect their pedagogy. When teachers see their role as helping students to pass tests, by whatever means, their teaching methods and the experiences of the students are distorted. The alignment of assessment, curriculum and pedagogy is most easily upset by changes in assessment and this has to be taken into account in designing assessment policy.
Chapter 5
Developing a Theory of Formative Assessment
A model for classroom transactions
Whilst previous chapters have described the development of formative assessment practices, and have explored various specific aspects of these and their operation, the aim in this chapter is both more holistic and more ambitious. We will attempt to set out a theory of formative assessment. Such a theory should help interrelate the discussion so far within a single comprehensive framework and thereby provide a basis for further exploration. It would be extravagant to claim that it achieves this purpose, not least because its limited basis is our findings from the King's-Medway-Oxfordshire Formative Assessment Project, the KMOFAP example as described in Chapter 1.
That project was designed to enhance learning through the development of formative assessment. The basic assumptions that informed the design of the work were in part pragmatic, arising from the evidence that formative assessment work did enhance students' performance, and in part theoretical. One theoretical basis was to bring together evidence about classroom questioning practices (for example research on optimal 'wait time') with the general principle that learning work must start from the learner's existing ideas. The other was provided by arguments from Sadler that self-assessment and peer assessment were essential to the effective operation of formative assessment, a view that was supported in some of the research evidence, notably the work of White and Frederiksen (1998).
However, these are too narrow a basis for making sense of our project's outcomes. The need to expand the theoretical base was signalled in the response made by Perrenoud to our review:

This [feedback] no longer seems to me to be the central issue. It would seem more important to concentrate on theoretical models of learning and its regulation and their implementation. These constitute the real systems of thought and action, in which feedback is only one element. (1998: 86)
By 'regulation', he meant the whole process of planning, classroom implementation, and adaptation, by which teachers achieve their learning intentions for their students. In what follows, we will try to link the ideas
expressed in this statement with an expanded theoretical perspective. The principal aim is to provide a framework within which we can make sense of what it was that changed in those classrooms where teachers were developing their use of formative assessment.
It is obvious that a diverse collection of issues is relevant to the understanding of classroom assessment and so it follows that, if there is to be a unifying framework, it will have to be eclectic yet selective in eliciting mutually consistent messages from different perspectives. As one study expresses it:
… an attempt to understand assessment must involve a critical combination and co-ordination of insights derived from a number of psychological and sociological standpoints, none of which by themselves provide a sufficient basis for analysis. (Torrance and Pryor, 1998: 105)
However, if such a framework is to be more than a mere collection, it will have to serve to interrelate the collection in a way that illuminates and enriches its components. It should also suggest new interpretations of evidence from classrooms, and new ideas for further research and development work.
In what follows, we will develop our theory on the basis of the work
described in Chapter 1. However, other approaches are mentioned throughout,
and near the end we shall use the framework to make comparisons between this
and other projects which were also designed to study or change teaching and
learning in classrooms.
Starting points
We will begin by considering the classroom as a 'community of practice' (Lave
and Wenger, 1991; Wenger, 1998) or as a 'figured world' (Holland et al., 1998).
In both these perspectives, the focus is not so much on 'what is' but rather on
what the various actors involved take things to be:
By 'figured world', then, we mean a socially and culturally constructed realm of interpretation in which particular characters and actors are recognized, significance is attached to certain acts, and particular outcomes are valued over others. Each is a simplified world populated by a set of agents … who engage in a limited range of meaningful acts or changes of state … as moved by a specific set of forces. (Holland et al., 1998: 52)
The focus of the approach is a careful delineation of the constraints and affordances (Gibson, 1979) provided by the 'community of practice' or 'figured world', combined with a consideration of how the actors or agents, in this case the teacher and the students, exercise agency within these constraints and affordances. Their actions are to be interpreted in terms of their perceptions of the structure in which they have to operate, in particular the significance they attach to beliefs or actions through which they engage, that is, the ways in
which they as agents interact with the other agents and forces. These ways serve to define the roles that they adopt. Many of the changes arising in our project can be interpreted as changes in the roles adopted, both by teachers and students. However, these perspectives proved inadequate as explanatory or illuminative mechanisms.
This was because although the notions of communities of practice and figured worlds accounted well for the ways in which the actions of agents are structured (and that of the figured world in particular accounts for the differing degrees of agency exhibited), neither conceptual framework provides for the activities of agents to change the structure. In Wenger's example people learn to become claims processors, and are changed in the process, but the world of claims processing is hardly changed at all by the enculturation of a new individual. Similarly, in the examples used by Holland et al., agents develop their identities by exercising agency within the figured worlds of, for example, college sororities, or of Alcoholics Anonymous, but the figured worlds remain substantially unaltered. In contrast, the agency of teachers and students, both as individuals and as groups within the classroom, can have a substantial impact on what the 'world of that classroom' looks like. Furthermore, our particular interest here is more in the changes that occurred in teachers' practices, and in their classrooms, than in the continuities and stabilities.
For this reason, we have found it more productive to think of the subject classroom as an 'activity system' (Engeström, 1987). Unlike communities of practice and figured worlds, which emphasize continuity and stability, '… activity systems are best viewed as complex formations in which equilibrium is an exception and tensions, disturbances and local innovations are the rule and the engine of change' (Salomon, 1993: 8-9).
For Engeström the key elements of an activity system are defined as follows:
The subject refers to the individual or subgroup whose agency is chosen as the point of view in the analysis. The object refers to the 'raw material' or 'problem space' at which the activity is directed and which is moulded or transformed into outcomes with the help of physical and symbolic, external and internal tools (mediating instruments and signs). The community comprises multiple individuals and/or subgroups who share the same object. The division of labour refers to both the horizontal division of tasks between the members of the community and to the vertical division of power and status. Finally the rules refer to the explicit and implicit regulations, norms and conventions that constrain actions and interactions within the activity system. (Engeström, 1993: 67)
These elements form two interconnected groups. The first group constitutes the
sphere of production - the visible actions undertaken within the system directed
towards achieving the desired goals - but these are merely the 'tip of the
iceberg'. Underlying these elements are the social, cultural and historic conditions
within which the goals are sought, and these two groups of elements and the
dialectic between them together constitute an activity system.
As noted above, we believe that the most useful starting point for analysis is to analyse the classroom as an activity system. It would, of course, be possible to consider the whole school or even the wider community as an activity system, but such an analysis would necessarily ignore the particularities of the features of individual classrooms and would in our view paint too simplistic a picture. At the other extreme, we could view small groups of students in classrooms as an activity system, with the classroom as the wider context in which they act, but such groups are not well defined in most of the classrooms we observed and thus would be rather artificial. Adopting the classroom as the activity system allows other sources of influence to be taken into account. The students' motivations and beliefs are strongly shaped by their lives outside the school, whilst the classroom is itself embedded in the context of a particular school.
How teachers act, and how their students participate, in classrooms studying particular subjects will be influenced by their experiences in other subject classrooms, by the ethos of the school and by the wider community. Therefore, we believe that it is important that the activity system is the subject classroom. There are important differences between a group of students and a teacher gathering in a particular place for the learning of mathematics and those meeting to learn science or English. Whilst this view derives in part from the initial emphasis of our work on classrooms in secondary/high schools, our more recent experiences with primary schools also suggest that, in primary classrooms also, the subject being taught at the time exerts a strong influence on the way that formative practices are implemented.
Before considering the implications of treating the subject classroom as an activity system, we need to discuss in more detail the changes in the practice of the KMOFAP teachers. We shall do this in terms of four key aspects, which we will suggest provide the minimal elements of a theory of formative assessment. First, we discuss changes in the relationship between the teacher's role and the nature of the subject discipline. Second, we discuss changes in the teachers' beliefs about their role in the regulation of the learning process (derived from their implicit theories of learning). Third, we discuss the student-teacher interaction, focusing specifically on the role of feedback in this process, which involves discussion of the levels of feedback, the 'fine-grain of feedback', and a brief discussion of the relevance of Vygotsky's notion of the 'zone of proximal development' (ZPD) to the regulation of learning. The fourth element of the model is the role of the student.
While a theory that focuses on these four components and the way that they play out in the classroom may not have sufficient explanatory power to be useful, we do not believe that any attempt to understand the phenomena that we are describing without taking these factors into account is likely to be successful. We have formulated these components because we believe, on the basis of the data available to us, that they form key inputs for the formulation of any theory. Our intention is also to show that these four components form a framework which can be incorporated in, and illuminated by, a treatment of the subject classroom as an activity system.
First component: teachers, learners and the subject discipline
As the project teachers became more thoughtful about the quality, both of the questions they asked and of their responses to students' answers, it became evident that the achievement of this quality depended both on the relevance of questions and responses in relation to the conceptual structure of the subject matter, and on their efficacy in relation to the learning capacities of the recipients. Thus there was a need to analyse the interplay between teachers' views of the nature of the subject matter, particularly of appropriate epistemology and ontology, and the selection and articulation of goals and subject matter that followed on the one hand, and their models of cognition and of learning (new theories of cognition could well be central here - see Pellegrino et al., 1999) on the other. The types of classroom interaction entailed in the learning contexts of different subject matters will not necessarily have a great deal in common with one another.
Comparisons between our experiences of work with teachers of English, science and mathematics respectively have strengthened our view that the subject disciplines create strong differences between both the identities of teachers and the conduct of learning work in their classes (Grossman and Stodolsky, 1994; Hodgen and Marshall, 2005). One clear difference between the teaching of English and the teaching of mathematics and science is that in the latter there is a body of subject matter that teachers tend to regard as giving the subject unique and objectively defined aims. It is possible to 'deliver' the subject matter rather than to help students to learn it with understanding, and even where help with understanding is given priority, this is often simply designed to ensure that every student achieves the 'correct' conceptual goal.
In the teaching of writing, there is little by way of explicit subject matter to 'deliver', except in the case of those teachers who focus only on the mechanics of grammar, spelling and punctuation. So there is no single goal appropriate for all. Thus most teachers of this subject are naturally more accustomed to giving individual feedback to help all students to improve the quality of their individual efforts at written communication. There is a vast range of types of quality writing: the goal can be any point across an entire horizon rather than one particular point. These inter-subject differences might be less defined if the aims of the teaching were to be changed. For example, open-ended investigations in mathematics or science, or critical study of the social and ethical consequences of scientific discoveries, are activities that have more in common with the production of personal writing or critical appreciation in English.
It is also relevant that many teachers of English, at least at high-school level, are themselves writers, and students have more direct interaction with the 'subject' through their own reading and writing than they do with (say) science. Nevertheless, whilst teachers of English might naturally engage more with the use of feedback than many of their science colleagues, the quality of the feedback that they provide and the overall strategies in relation to the metacognitive quality of that feedback still need careful, often radical, development.
While much research into teacher education and teacher development has focused on the importance of teachers' subject knowledge, such research has
rarely distinguished between abstract content knowledge and pedagogical content knowledge (Shulman, 1986). A study of elementary school teachers conducted for the UK's Teacher Training Agency in 1995-1996 (Askew et al., 1997) found no relationship between learners' progress in mathematics and their teachers' level of qualification in mathematics, but a strong positive correlation existed regarding their pedagogical content knowledge. This would suggest that it is important to conceptualize the relationship between teacher and subject matter as a two-way relationship, in that the teacher's capacity to explore and reinterpret the subject matter is important for effective pedagogy.
What is less clear is the importance of change in the interaction between students and the subjects they are studying. In the main, most middle and high school students seem to identify a school subject with the subject teacher: this teacher generally mediates the student's relationship with the subject, and there cannot be said to be any direct subject-student interaction. However, one aim of the teacher could well be to enhance the learner's capacity to interact directly with the subject's productions, which would involve a gradual withdrawing from the role of mediator. The meaning to be attached to such a change, let alone the timing and tactics to achieve this end, will clearly be different between different subjects. In subjects that are even more clearly performance subjects, notably physical education and musical performance, feedback is even less problematic in that its purpose can be evident to both teacher and student, and it is clear that the learning is entirely dependent on it. The students-as-groups aspect may also emerge more clearly insofar as students work together to reproduce, or at least to simulate, the community practices of the subject areas, for example as actors in a stage drama, or as a team in a science investigation.
Second component: the teacher's role and the regulation of
learning
The assessment initiatives of our project led many teachers to think about their teaching in new ways. Two of them described the changes as follows:
I now think more about the content of the lesson. The influence has shifted from 'what am I going to teach and what are the pupils going to do?' towards 'how am I going to teach this and what are the pupils going to learn?'

There was a definite transition at some point, from focusing on what I was putting into the process, to what the pupils were contributing. It became obvious that one way to make a significant sustainable change was to get the pupils doing more of the thinking. I then began to search for ways to make the learning process more transparent to the pupils. Indeed I now spend my time looking for ways to get pupils to take responsibility for their learning at the same time making the learning more collaborative. This inevitably leads to more interactive learning activities in the classroom.
These teachers' comments suggested a shift from the regulation of activity ('what are the students going to do?') to the regulation of learning ('what are the students going to learn?'). In considering such regulation, Perrenoud (1998) distinguishes two aspects of teacher action. The first involves the way a teacher plans and sets up any lesson. For this aspect, we found that a teacher's aim of improving formative assessment led them to change the ways in which they planned lessons, with a shift towards creating 'didactic situations' - in other words, they specifically designed their questions and tasks so that they generated 'teachable moments' - occasions when a teacher could usefully intervene to further learning. The second involves teacher action during the implementation of such plans, determined by the fine detail of the way they interact with students. Here again teachers changed, using enhanced wait time and altering their roles from simple presentation to encouraging dialogue.
Overall, it is also clear from these two quotations that the teachers were engaged in 'interactive regulation' by their emphasis on the transfer to the students of responsibility for their learning. This transfer led teachers to give enhanced priority to the need to equip students with the cognitive strategies required to achieve transition to the new understandings and skills potentially accessible through the subject matter. This implied giving more emphasis to cognitive and metacognitive skills and strategies than is usually given in schools. Such changes were evident in the shifts in questioning, in the skilful use of comments on homework, and particularly in the new approach to the use of tests as part of the learning process. It is significant that, a few months into the project, the teachers asked the research team to give them a talk on theories of learning, a topic that we would have judged too theoretical at the start of the project.
Some teachers have seemed quite comfortable with this transfer of responsibility to the student, and the implications for change in the student's role and in the character of the teacher-student relationship are clear. However, some other teachers found such changes threatening rather than exciting. Detailed exploration of the trajectories of development for different teachers (see for example, Lee, 2000, and Black et al., 2003) showed that the changes have been seen as a loss of control of the learning by some who were trying seriously to implement them. Although one can argue that, objectively, teacher control was going to be just as strong and just as essential, subjectively it did not feel like that to these particular teachers, in part because it implied a change in their conception of how learning is mediated by a teacher. Such a shift alters the whole basis of 'interactive regulation', which is discussed in more detail in the following section.
Third component: feedback and the student-teacher interaction
The complex detail of feedback
It emerges from the above discussion that in the four-component model that we would propose, the crucial interaction is that between teacher and student, and this is clearly a central feature in any study of formative assessment. As already
pointed out, our starting position was based in part on the seminal paper by Sadler (1989) on formative assessment. One main feature of his model was an argument that the learner's task is to close the gap between the present state of understanding and the learning goal, and that self-assessment is essential if the learner is to be able to do this. The teacher's role, then, is to communicate appropriate goals and to promote self-assessment as students work towards them. In this process, feedback in the classroom should operate both from teacher to students and from students to the teacher.
Perrenoud (1998) criticized the treatment of feedback in our 1998 review. Whilst we do not accept some of his interpretations of that paper, his plea that the concept of feedback be treated more broadly, as noted earlier, is a valuable comment. The features to which he drew attention were:
The relationship of feedback to concepts of teaching and learning;
The degree of individualization (or personalization) of the feedback;
The way the nature of the feedback affects the cognitive and the socio/affective perspectives of the pupils;
The efficacy of the feedback in supporting the teachers' intentions for the pupils' learning;
The synergies between feedback and the broader context of the culture of classroom and school, and the expectations of the pupils.
Some aspects of these points have already been alluded to above. However, a more detailed discussion is called for, which will be set out here under three headings: the different levels of feedback; the fine-grained features of feedback; the relevance of Vygotsky's notion of the zone of proximal development (and in particular the importance of differentiation).
Levels of feedback
The enactment of a piece of teaching goes through a sequence of stages as follows:
a) A design with formative/feedback opportunities built in;
b) Implementation in which students' responses are evoked;
c) Reception and interpretation of these responses by a teacher (or by peers);
d) Further teaching action based on the interpretation of the responses;
e) Reception and interpretation of these responses by the student;
f) Moving on to the next part of the design.
This is set out to make clear that the students in (b) and (e) and the teachers in (c) and (d) are involved in feedback activities. Feedback can involve different lengths of loop, from the short-term loops (c) to (d) to (e) and back to (c), to longer-term loops around the whole sequence, that is, from (a) to (e) and then back again when the whole sequence may be redesigned. The concept of regulation involves all of these.
Two points made by Perrenoud are relevant here. One is to emphasize that the mere presence of feedback is insufficient in judging the guidance of learning (see Deci and Ryan, 1994). The other is that learning is guided by more than the practice of feedback. In particular, not all regulation of learning processes uses formative assessment. If, for example, the teaching develops metacognitive skills in the students, they can then regulate their own learning to a greater extent and thus become less dependent on feedback from others. More generally, it is important to look broadly at the 'regulation potential' of any learning activity, noting however that this depends on the context, on what students bring, on the classroom culture that has been forged 'upstream' (that is, the procedures whereby a student comes to be placed in a context, a group, a situation), and on ways in which students invest themselves in the work. Several of the project teachers have commented that when they now take a class in substitution for an absent teacher, the interactive approaches that they have developed with their own classes cannot be made to work.
The fine-grain of feedback
Whilst the inclusion in our framework of models of learning, of teachers' perceptions of the subject matter and of their pedagogical content knowledge deals in principle with the necessary conditions for effective feedback, these are but bare bones and in particular may mislead in paying too little attention to the complexity of what is involved. The complexities are discussed in some detail by Perrenoud, and some of his main points are briefly summarized here.
The messages given in feedback are useless unless students are able to do something with them. So the teacher needs to understand the way students think and the way in which they take in new messages, both at general (subject discipline) and specific (individual) levels. The problem is that this calls for a theory relating to the mental processes of students which does not yet exist (although some foundations have been laid: see Pellegrino et al., 2001). Teachers use intuitive rudimentary theories, but even if good theory were to be available, applying it in any specific context would be a far from straightforward undertaking.
For both the teacher, and any observer or researcher, it follows that they can only draw conclusions from situations observed in the light of theoretical models. As Perrenoud argues:

Without a theoretical model of the mediations through which a situation influences cognition, and in particular the learning process, we can observe thousands of situations without being able to draw any conclusions. (1998: 95)
In framing and guiding classroom dialogue, judgments have to be grounded in activity but must achieve detachment from it (that is, to transcend it) in order to focus on the knowledge and the learning process. A teacher's intervention to regulate the learning activity has to involve:

… an incursion into the representation and thought of the pupil to create a breakthrough in understanding, a new point of view or the shaping of a notion which can immediately become operative. (1998: 97)
Torrance and Pryor (1998) studied the fine grain of feedback through video recordings of episodes in primary school classrooms. Many of their findings echo those of our study, albeit as an analysis of the variations in practice between teachers rather than as part of an intervention. What they are keen to emphasize is the complexity of the social interaction in a classroom, which leads them to look closely at issues of power, mainly as exercised by teachers at different levels, for example exerting power over students with closed questioning, or sharing power with students (Kreisberg, 1992) using more open questioning. Torrance and Pryor also give an example of how feedback, which does no more than guide the group discussion that a teacher is mainly trying to observe, transfers power. However, this is then unevenly distributed amongst the students.
The zone of proximal development and differentiation
Sadler's emphasis on a teacher's task in defining the gap between what the learner can achieve without help and what may be achieved with suitable help, and the fact that this lays emphasis on the social and language aspects of learning, might seem to connect directly with a common interpretation of Vygotsky's concept of a Zone of Proximal Development (Vygotsky, 1986). Also relevant are the concepts of scaffolding as developed by Wood et al. (1976), and Rogoff's (1990) broader notion of guided participation, which serve to emphasize and clarify the role of a teacher.
However, discussions of the ZPD are difficult to interpret without knowing
precisely how the authors interpret the concept. Here we draw on the analysis
of Chaiklin (2005), who points out that for Vygotsky the zone has to be defined
in terms of a model of development. These different 'ages' of development are
defined as a sequence of coherent structures for interacting intellectual functions.
A learner will have achieved a particular 'age' of development, and
possess immature but maturing functions which will lead to the next 'age'. In
an interactive situation, one which may be aimed at diagnosis rather than at
specific teaching purposes, the learner may be able to share, in collaboration,
only the mature functions: 'the area of immature, but maturing, processes
makes up the child's zone of proximal development' (Vygotsky, 1998: 202).
Teaching should then focus on those maturing functions which are needed to
complete the transition to the next age period. Whilst the age periods are
objectively defined, the ZPD of each learner will be subjectively defined.
Interventions such as those by the thinking skills programmes (Shayer and Adey, 1993)
may succeed because they focus on maturing processes of general importance.
It follows that what is needed is those learning tasks in which a learner is
involved in interaction with others, and these will serve to identify the particular
areas of intellectual function which, in relation to achieving the next 'age' of
development for that learner, are still immature. This has to be done in the light
of a comprehensive model of 'ages' of intellectual development.
This is clearly a task of immense difficulty, one that is far more complex than
that implied by the notion of a 'gap', which many see as implied by Sadler's
analysis. It is probably true that less sophisticated notions of a 'gap', and of
scaffolding interventions to close such gaps, are of practical value. However, they cannot
be identified with Vygotsky's concept of a ZPD, and they will not attend to the
real complexity of the obstacles that learners encounter in advancing the maturity
of their learning.
This argument serves to bring out the point that success in fostering and
making use of enhanced teacher-student interactions must depend on the
capacity to adapt to the different ZPDs in a class, that is, on the capacity of a
teacher to handle differentiation at a rather subtle level of understanding of
each learner. However, it does not follow that the problem reduces to a one-on-one
versus whole-class dichotomy, for social learning is a strong component of
intellectual development and capacity to learn in interaction is an essential
diagnostic tool. Peer assessment, peer teaching and group
learning in general have all been enhanced in our project's work, and the way
that the need for differentiation is affected by these practices remains to be
studied. The fact that in some research studies enhanced formative assessment
has produced the greatest gains for those classified initially as 'low-achievers'
may be relevant here.
The overall message seems to be that in order to understand the determinants
of effective feedback, or broaden the perspective whilst detecting and
interpreting indicators of effective regulation, we will need theoretical models
that acknowledge the situated nature of learning (Greeno et al., 1998) and the
operation of teaching situations. We have to understand the context of schemes
of work by teachers and we have to study how they might plan for and interact
on the spot to explore and meet the needs of different students. This sets a
formidable task for any research study of formative work in the classroom.
Fourth component: the student's role in learning
The perceptions of our teachers, as reported above, are that their students have
changed role from being passive recipients to being active learners who can
take responsibility for and manage their own learning. Another teacher
reported this as follows:

They feel that the pressure to succeed in tests is being replaced by the need to understand
the work that has been covered and the test is just an assessment along the
way of what needs more work and what seems to be fine ... They have commented
on the fact that they think I am more interested in the general way to get to an
answer than a specific solution and when Clare [a researcher] interviewed them they
decided this was so that they could apply their understanding in a wider sense.
Other, albeit very limited, interviews with students have also produced evidence
that students saw a change in that their teacher seemed really interested
in what they thought and not merely in whether they could produce the right
answer. Indeed, one aspect of the project has been that students responded very
positively to the opportunities and the stimulus to take more responsibility for
their own learning.
These changes can be interpreted in terms of two aspects. One, already mentioned
in an earlier section, is the development of meta-cognition, involving as
it must some degree of reflection by the student about his or her own learning
(Hacker et al., 1998). Of significance here also is the concept of self-regulated
learning as developed by Schunk (1996) and Zimmerman and Schunk (1989),
and the findings of the Melbourne Project for Enhanced Effective Learning
(PEEL) summarized in Baird and Northfield (1992).
Analysis of our work may be taken further along these lines, by relating it
to the literature on 'meta-learning' (Watkins et al., 2001). Many of the activities
described in our first section could readily be classified as meta-cognitive, on
the part of both teachers and their students. The distinction, emphasized by
Watkins et al., between 'learning orientation' and 'performance orientation'
(see Dweck, 1986, 1999) is also intrinsic to our approach. The achievement of
meta-learning is less clear, for what would be required is that students would
reflect on the new strategies in which they had been involved, and would seek
to deploy these in new contexts. The practice of active revision in preparation
for examinations, or the realization that one needs to seek clarity about aims if
one is to be able to evaluate the quality of one's own work, may well be
examples of meta-learning, but evidence about students' perceptions of and
responses to new challenges would be needed to support any claims about
outcomes of this type.
A second aspect, involving conative and affective dimensions, is reflected in
changes in the students' perceptions of their teacher's personal interest in them.
Mention has been made above, in the report on the abandonment of giving
marks or grades on written work, of Butler and Neuman's (1995) account of the
importance of such a change. It is not merely that a numerical mark or grade is
ineffective for learning because it does not tell you what to do; it also affects
your self-perception. If the mark is high, you are pleased but have no impetus
to do better. If it is low, it might confirm your belief that you are not able to learn
the subject. Many other studies have explored the negative effects not only on
learning but also on self-concept, self-efficacy and self-attribution of the classroom
culture in which marks and grades come to be a dominant currency of
classroom relationships (see for example, Ames, 1992; Cameron and Pierce,
1994; Butler and Winne, 1995; Vispoel and Austin, 1995). In particular, as long
as students believe that efforts on their part cannot make much difference
because of their lack of 'ability', efforts to enhance their capability as learners
will have little effect.
The importance of such issues is emphasized by Cowie's (2004) study which
explored students' reactions to formative assessment. One of her general
findings was that students are in any activity balancing three goals
simultaneously, namely, completion of work tasks, effective learning and
social-relationship goals. When these conflict they tend to prioritize the
social-relationship goals at the expense of learning goals; so, for example,
many will limit disclosure of their ideas in the classroom for fear of harm to
their feelings and reputation. The way in which the teacher deals with such
disclosures is crucial. The respect shown them by a teacher and their trust in
that teacher affect students' responses to any feedback - they need to feel safe
if they are to risk exposure. Cowie also found that the students' responses to
formative feedback cannot be assumed to be uniform. Some prioritize learning
goals and so look for thoughtful suggestions, preferably in one-to-one
exchanges, whilst others pursue performance goals and so want help to
complete their work without the distraction of questions about their
understanding. Sadly, many felt that the main responsibility for their learning
rested with the teacher and not with themselves. In an activity theory
representation, as exemplified later in this chapter (Figures 5.1 and 5.2), all of
the issues raised by such work are represented by the element labelled
'community'; the connections of this element with the other elements of the
diagram are both important and complex.
Much writing about classroom learning focuses on the learner as an individual
or on learning as a social process. Our approach has been to treat the social-individual
interaction as a central feature, drawing on the writings of Bredo
(1994) and Bruner (1996). Thus, feedback to individuals and self-assessment has
been emphasized, but so have peer assessment, peer support in learning and
class discussion about learning.
For the work of students in groups, the emphasis by Sadler (1989, 1998) and
others that peer assessment is a particularly valuable way of implementing
formative assessment has been amply borne out in the work reported here.
Theoretically, this perspective ought to be evaluated in the broader context of the
application to classrooms and schools of analyses of the social and communal
dimensions of learning as developed, for example, in Wenger's (1998) study of
communities of practice. These points are illustrated by the following extract
from an interview with a student in the KMOFAP, discussing peer marking of
his investigation:
After a pupil marking my investigation, I can now acknowledge my mistakes
easier. I hope that it is not just me who learnt from the investigation but the pupil
who marked it did also.

Next time I will have to make my explanations clearer, as they said 'It is hard to
understand', so I must next time make my equation clearer. I will now explain my
equation again so it is clearer.
This quotation also bears out Bruner's (1996) emphasis on the importance of
externalizing one's thoughts by producing objects or oeuvres which, being
public, are accessible to reflection and dialogue, leading to enrichment through
communal interaction. He points out that awareness of one's own thinking, and
a capacity to understand the thinking of others, provide an essential reasoned
base for interpersonal negotiation that can enhance understanding.
The importance of peer assessment may be more fundamental than is
apparent in accounts by teachers of their work. For self-assessment, each
student has to interact mainly with text; interactions with the teacher, insofar as
they are personal, must be brief. Discussing the work of Palincsar and Brown
(1984) on children's reading, Wood states:
This work, motivated by Vygotsky's theory of development and by his writings on
literacy, started from the assumption that some children fail to advance beyond the
initial stages of reading because they do not know how to 'interact' with text, that
is, they do not become actively engaged in attempts to interpret what they read.
The intervention techniques involved bringing into the open, making
public and audible, ways of interacting with text that skilled readers usually
undertake automatically and soundlessly. (1998: 220-1)
Thus if a student's interpretation of aims and of criteria of quality of performance
is to be enriched, such enrichment may well require 'talk about text', and,
given that it is impracticable to achieve this through teacher-student interactions,
the interactions made possible through peer assessment may meet an
essential need.
Overall, it is clear that these changes in a student's role as learner are a significant
feature in the reform of classroom learning, that our formative assessment
initiative has been effective in its impact on these features, and that
changes in a student's own beliefs and implicit models of learning also underlie
the developments involved.
Applying activity theory
In considering the interpretation of these four components in terms of a representation
of the subject classroom as an activity system, we have concentrated
mainly on the 'tip of the iceberg': subjects, objects and cultural resources, and
the relationships between these three elements. As will be clear in our exposition
of these ideas, the nature of these relations is strongly influenced by the
other elements of activity systems, that is, rules, community, and division of
labour. The discussion of these relationships will be brief; a full exploration
would require a far longer treatment than is possible here.
In the activity system of the subject classroom, the tools or cultural resources
that appear to be particularly important in the development of formative
assessment are:

Views and ideas about the nature of the subject, including pedagogical
content knowledge;
Methods for enhancing the formative aspects of interaction, such as rich
questions, ideas about what makes feedback effective and techniques such as
'traffic lights' and so on;
Views and ideas about the nature of learning.
The subjects are, as stated earlier, the teacher and the students, although it is
important to acknowledge that it is useful to distinguish between students as
individuals and students in groups in the classroom (Ball and Bass, 2000).
The object in most of the subject classrooms we studied was increased student
success, either in terms of better quality learning or simply better scores on
state-mandated tests. Many teachers spoke of their interest in participating in
the project because of the promise of better results. However, as well as this
object which, as noted above, was secured by most of the participating teachers,
the outcomes of the projects included changes in the expectations that teachers
had of their students, and also changes in the kinds of assessments that these
teachers used in their routines. The most important change in the teachers' own
assessments was a shift towards using those that provided information for the
teacher not only about who had learnt what, but also proffered some insights
into why this was, in particular - when interpreted appropriately - those that
gave some idea as to what to do about it. In other words, a shift towards assessments
that could be formative for the teacher.
[Figure: diagram linking the elements Tools, Subjects, and Objects/outcomes]
Figure 5.1: Patterns of influence in the KMOFAP and BEAR projects (solid-headed arrows
represent influences in KMOFAP; open-headed arrows represent influences in BEAR)
Figure 5.1 uses the context of the KMOFAP and a US example, the Berkeley
Evaluation and Assessment Research project (BEAR; see Wilson and Sloane,
2000), to illustrate the various components of the theoretical framework outlined
above and their interrelationships. Components 1, 2 and 4 are represented
as tools, while component 3 is represented in the links between the teacher and
the students (both individually and in groups). Solid-headed arrows are used
to represent the key influences in the KMOFAP project while the open-headed
arrows represent influences in the BEAR project. Using this framework, the
course of the KMOFAP can be seen as beginning with tools (in particular findings
related to the nature of feedback and the importance of questions) which
prompt changes in the relationship between the subjects (that is, in the relationship
between the teacher and the students) which in turn prompt changes in the
subjects themselves (that is, changes in the teacher's and students' roles). These
changes then trigger further changes in other tools such as the nature of the
subject and the view of learning. In particular, the changes prompted in the
teacher's classroom practices involved moving from simple associationist views
of learning to embracing constructivism and taking responsibility for learning
linked to self-regulation of learning, metacognition and social learning.
Figure 5.1 does not represent an activity system in the canonical way. This
more common representation, using the nested triangles, is shown in Figure
5.2. Here the relationships are brought out more clearly by placing tools at the
apex with subjects, and objects and outcomes on the base of the upper triangle.
Thus it would be possible in principle to map Figure 5.1 into this part of Figure
5.2 but much of the detail would either be lost or appear confusingly complex.
Figure 5.2: Elements of activity systems (Engeström, 1987)
However, what the canonical representation makes more explicit are the elements
in the lowest row of Figure 5.2 and their links with the rest. Whilst the
community, deemed here to be the subject classroom, is a given, both the rules and the
division of labour are changed by a formative innovation. For the rules, if teachers
cease to give grades or marks on homework in order to focus on feedback
through comments, they may be in conflict with management rules and
parental expectations in many schools. Yet in two of the KMOFAP schools such
rules were eventually changed, the new rule being, for the whole school, that
marks and grades were not to be given as feedback on written homework. The
more pervasive 'rule' - that schools are under pressure to produce high grades
in national tests - did limit some formative developments, and it is clear that
synergy between teachers' formative practices and their responsibilities for
summative assessments would be hard to achieve without some room for
manoeuvre in relation to high-stakes testing.
The division of labour is a feature that is radically transformed, as made clear in
the second component for changes in the teacher's role, and in the fourth component
for changes in the student's role. One aspect of the transfer of power and
responsibility that is involved here is that the students begin to share ownership
of the tools, for example by involvement in summative testing processes, and by
becoming less dependent on the teacher for their access to subject knowledge.
What is obvious from this discussion is that there are strong interactions
between the various elements of the system. This suggests that any attempt to
record and interpret the dynamics of change in an innovation, notably in formative
assessment, could do well to adopt and adapt an activity theory approach
along the lines sketched here.
Strategies for development
The KMOFAP and BEAR projects
It is useful at this point to contrast the approach adopted in our project with an
alternative strategy, clearly exemplified in the BEAR project (Wilson and
Sloane, 2000) and impressive in the evidence of learning gains associated with
an emphasis on formative assessment. This differed from the work described in
the first part of this paper in the following ways:
It was part of a curriculum innovation into which were 'embedded' new
formative assessment practices;
An important aim was to secure and establish the reliability and validity of
assessment practices so that assessment by teachers could withstand
public scrutiny and claim equal status with the external standardized
tests which have such negative effects on education in the USA;
The aims were formulated as a profile of a few main components, with each
component being set out as a sequence of levels to reflect the expected progression
of learning within each;
The assessment instruments were written tests provided externally, some to
be used as short-term checks on progress, some of greater length to be used
as medium-term checks;
Whilst formative use was emphasized, there was very little account of the
ways in which feedback was deployed or received by students.
To over-simplify, it could be said that the apparent weakness of the BEAR
project lies in those aspects in which our project was strong. At the same time
its strengths, in the quality of the assessment instruments and the rigour in their
use and interpretation, throw into sharp relief the weakness of our project, for
the cognitive quality of the questions used by our teachers and of their feedback
comments, whether oral or written, still needs further attention. Whilst the two
approaches may be seen as complementary, and each may have been the
optimum approach for the particular context and culture in which it was
designed to operate, there remains the issue of whether some aspects of either
could be incorporated, albeit at a later stage in implementation, in the other.
In terms of our model, the BEAR project imports theories of the subject and
of learning and requires teachers to work to these models, but is not explicit on
the nature of the teacher-student interactions or the change in roles of either
teachers or students. Thus the project does not seem to have affected the classroom
community through any significant shift in the division of labour. Similar,
although not identical, contrasts could be drawn by analysis of many of the
research initiatives described in the 1998 review by Black and Wiliam.
The contrast between our work and that of the BEAR project is brought out clearly
in Figure 5.1, which shows the patterns of influence in the two projects.
This comparison can help to draw attention to the options available in any
programme of teacher development. The partial successes of our own approach
have a peculiar significance in that they have led to changes transcending the
boundaries envisaged by our initial concentration on formative assessment.
This expansion may in part have arisen because of our emphasis on the responsibility
of the teachers as partners with us, sharing responsibility for the direction
of change. It might have been predictable that their initiatives would
broaden the scope, because their work has to marry into the full reality of classroom
work and cannot be limited to one theoretically abstracted feature. Indeed
we have come to think of formative assessment as a 'Trojan Horse' for more
general innovation in pedagogy - a point to which we shall return in the
concluding section below.
Other related research and development studies
The BEAR study was similar in many respects to our own, so it is particularly
interesting to explore the comparison in detail. However, we have developed
the view that what is at issue is a theory of classroom pedagogy, and from this
perspective the number of relevant studies becomes far too great for any synthesis
to be attempted here.
Three examples of related studies may suffice to indicate possibilities. The
first is the cognitive acceleration work associated with Shayer (1999). In comparison
with the cognitive acceleration initiative, our formative intervention
did not target specific reasoning skills and so does not call for ad hoc teaching,
although within the 'set piece' lessons of that initiative many of the practices
have much in common with the formative practices. In terms of the scheme of
Figure 5.1, the work involves very specific tools and is characterized by a more
explicit - and thereby less eclectic - learning analysis which impacts directly on
the role of the teacher. It resembles the BEAR project in these respects, but it does
not resemble it in respect of the direct link to externally set tests and criteria.
A second example is the work on 'Talk Lessons' developed by Neil Mercer
and his colleagues (Mercer, 2000; Mercer et al., 2004). These lessons could
indeed be seen as a powerful way of strengthening the development of peer
assessment practices in enhancing students' capacity to learn. This initiative
develops different specific tools but it also, in terms of Figure 5.1, works to make
direct links between the learning analysis, the interaction methods and the division
of labour by focusing its effort on the role of the student in a group.
The third example is related to the second, but is the broader field summarized
in Alexander's (2004) booklet Towards Dialogic Teaching, which draws on a
range of studies of classroom dialogue. The main argument here starts from the
several studies that have shown that classroom dialogue fails to develop students'
active participation, reducing dialogue to a ritual of superficial questions
in a context of 'delivery' teaching whereby thoughtful participation cannot
develop. His arguments call for an emphasis on all three of the tools areas in
Figure 5.1, and put extra emphasis on the community element represented
directly in Figure 5.2 but only indirectly in the connecting arrows between
teacher role and student roles in Figure 5.1.
Conclusions and Implications
We have focused the discussion in this chapter on our own study, in part
because our approach to theory was grounded in that work, in part because we
do not know of any other study which is grounded in a comparably comprehensive
and sustained development with a group of teachers. Whilst we regard
the theory as a promising start there is clearly further work to be done in both
developing it and relating empirical evidence to it.
If we consider the potential value of the four-component model that we have
explored and discussed, an obvious outcome is that it could be used to suggest
many questions which could form the starting point for further empirical
research, many of which would require fine-grained studies of teacher-student
interactions (see for example, Torrance and Pryor, 1998; Cowie, 2004).
However, the more ambitious target for this chapter is more fundamental - to
help guide the direction and interpretation of further research through the theoretical
framework that is proposed.
We have explored above, very briefly, the possibility of developing the
theory through attempting new interpretations of initiatives already published.
This exploration, which involves attempting to embed the formative aspect in a
broader view of pedagogy, reflects the point made by Perrenoud quoted at the
beginning of this chapter that it is necessary to consider formative feedback in
the wider context of 'models of learning and its regulation and their implementation'.
This may seem to be over-ambitious in attempting a complete
theory of pedagogy rather than only that particular aspect of pedagogy which
is labelled 'formative assessment'. However, such an attempt seems inevitable
given our experience of the initially limited aim of developing formative assessment
leading to much more radical changes.
One function of a theoretical framework should be to guide the optimum
choice of strategies to improve pedagogy, by identifying those key determinants
that have to be evaluated in making such choices and in learning lessons
from experiences in other contexts. It follows that the framework might be used
to evaluate, retrospectively or prospectively, the design of any initiative in
teaching and learning. In the case of the KMOFAP initiative, it should help
answer the question of whether it was the optimum way of devoting effort and
resources towards the improvement of classroom pedagogy. This would seem
a very difficult question to answer in the face of the potential complexity of a
comprehensive theory of pedagogy that might provide the basis for an answer.
However, some significant insight could be distilled in a way that would at least
help resolve the puzzle of the project's unexpected success, represented by the
metaphor of the Trojan Horse mentioned in the previous section.
The argument starts by pointing out that the examples of change which the
teachers described seemed to confirm that working to improve the teacher-student
interaction through formative assessment could serve to catalyse changes
in both the teacher's role and those adopted by that teacher's students. The
changes motivate, perhaps demand, an alteration in the various interactions of
both students and teachers with their theories of learning, and with the ways in
which they perceive and relate to the subject matter that they are teaching. Thus
whilst we cannot argue that development of formative assessment is the only
way, or even the best way, to open up a broader range of desirable changes in
classroom learning, we can see that it may be peculiarly effective, in part because
the quality of interactive feedback is a critical feature in determining the quality
of learning activity, and is therefore a central feature of pedagogy.
We might also speculate that a focus on innovation in formative assessment
may be productive because many teachers, regardless of their perceptions of their
teaching role and of the learning roles of their students, can see the importance of
working on particular and limited aspects of feedback but might then have their
perspectives shifted as they undertake such work. In the project, the tools provided
led teachers to think more deeply - about their pedagogical content knowledge,
about their assumptions on learning and about interactions with their
students; hence activating all of the components of our framework for them.
Given that a development of formative assessment has this peculiar potential
to catalyse more radical change, a theory that helps design and track such change
would be an important resource. The approach sketched out here may help such
tracking, inasmuch as the components of our model, interpreted in terms of an
activity system framework, do seem to interact strongly and dynamically, and
would help in interpreting any change process. A central feature may be that
inconsistencies between the various elements of the classroom system are hard
for the actors to tolerate. The interaction lines in the frameworks of Figures 5.1
and 5.2 are all-important for they signal that any innovation that succeeds in
changing one element might well destabilize the existing equilibrium, so that the
whole pattern of pedagogy is affected to achieve a new equilibrium.
Acknowledgements
We would like to acknowledge the support given by the Nuffield Foundation in
funding the first phase of the KMOFAP, and by the National Science Foundation
for funding the subsequent phase through their support of our partnership
with the Stanford CAPITAL project (NSF Grant REC-9909370). This present
paper reports the findings of our work in England to date; comparative and
synthesized findings with the Stanford partners will be subjects for later study.
We are grateful to Sue Swaffield from Medway and Dorothy Kavanagh from
Oxfordshire who, on behalf of their authorities, helped to create and nurture our
links with their respective schools. The teachers in this project have been the main
agents of its success. Their willingness to take risks with our ideas was essential,
and their voices are an important basis for the main message of this paper.
[Figure: goals at two levels of detail. Fine-grained: progressive criteria for the skill or
understanding developed in different lessons, e.g. planning investigations. Coarse-grained:
criteria for reporting levels of achievement, e.g. enquiry skills.]
Figure 6.3: Goals at various levels of detail
These considerations show that it is quite possible to use evidence gathered
as part of teaching both to help learning and for reporting purposes. But, as
with the use of summative data for formative purposes, there are limitations to
the process. By definition, in this context, the judgement is made by the teacher
and for summative purposes there needs to be some assurance of dependabil-
ity. Thus some quality assurance procedures need to be in place. The more
weight that is given to the summative judgement, the more stringent the quality
assurance needs to be, possibly including some inter-school as well as intra-
school moderation in judgements of evidence. This is difficult in relation to out-
comes where the evidence is ephemeral and a consequence of this can be that
the end-use of the evidence influences what evidence is gathered and how. It
could result in a tick-list approach to gathering evidence or a series of special
tasks that give concrete evidence, making formative assessment into a succes-
sion of summative assessments.
A further limitation is that if summative assessment is based solely on evi-
dence gathered within the context of regular classroom activities, this evidence
will be limited by the range and richness of the educational provision and the
efficiency of the teachers in gathering evidence. In some circumstances the evi-
dence required to summarize learning may need to be supplemented by intro-
ducing special tasks if, for instance, a teacher has been unable for one reason or
another to collect all that is necessary to make judgements about all the stu-
dents. For formative purposes it is often appropriate to consider the progress of
groups rather than of individual students. Additional evidence may then be
needed when making a judgement on the achievement of individual students.
On the Relationship Between Assessment for Formative and Summative Purposes
In summary, limitations on using evidence gathered for formative assess-
ment, if it is to meet the requirements of summative assessment, are:

It is essential to reinterpret the evidence in relation to the same criteria for
each student;
It is important for teachers to be very clear about when level-based criteria
are appropriate and not to use them for grading students when the purpose
of the assessment is formative;
Since the formative use of evidence depends on teachers' judgements, addi-
tional quality assurance procedures will be needed when the information is
used for a different purpose;
Teachers may need to supplement evidence from regular classroom events
with special tasks to ensure that all necessary evidence is collected for all stu-
dents;
The difficulty of dealing with ephemeral evidence could lead to a tick-list
approach or a series of summative tasks;
This may well change the nature of formative assessment, making it more
formal.
Revisiting the relationship
A dichotomy or a dimension?
The discussion in the previous two parts of this chapter indicates that there is
no sharp discontinuity between assessment for learning and assessment to
report learning. In particular, it is possible to view the judgements of evidence
against the progressive criteria in the middle column of Figure 6.3 both as form-
ative, in helping decisions about next steps, and as summative in indicating
where students have reached. This suggests that the relationship between form-
ative and summative assessment might be described as a 'dimension' rather
than a 'dichotomy'. Some points along the dimension are indicated in Figure 6.4
(derived from Harlen, 1991).
At the extremes are the practices and uses that most typify assessment for
learning and assessment of learning. At the purely formative end is assessment
that is integral to student-teacher interaction and is also part of the student's
role. The teacher and student consider work in relation to the goals that are
appropriate for the particular learner and so the judgements are essentially
student-referenced. The central purpose is to enable teacher and students to
identify the next steps in learning and to know how to take these. At the purely
summative end of the dimension the purpose is to give an account of what has
been achieved at certain points. For this purpose, the assessment should result
in a dependable report on the achievements of each individual student.
Although self-assessment may be part of the process, the ultimate responsibil-
ity for giving a fair account of how each student's learning compares with the
criteria or standards rests with the teacher.
Assessment and Learning
Between these ends it is possible to identify a range of procedures having
various roles in teaching and learning. For instance, many teachers would begin
a new topic by finding out what the students already know, the purpose being
to inform the teaching plans rather than to identify the point of development of
each individual. Similarly, at the end of a section of work teachers often give an
informal test (or use 'traffic-lighting' - see Chapter 1) to assess whether new
ideas have been grasped or need consolidation.
Figure 6.4: A possible dimension of assessment purposes and practices

Columns, from the formative to the summative extreme: informal formative;
formal formative; informal summative; formal summative.

Major focus: What are the next steps in learning? (informal and formal
formative) / What has been achieved to date? (informal and formal summative)
Purpose: To inform next steps in learning / To inform next steps in teaching /
To monitor progress against plans / To record achievement of individuals
How is evidence collected? As normal part of class work / Introduced into
normal class work / Introduced into normal class work / Separate task or test
Basis of judgement: Student referenced / Student and criterion referenced /
Criterion referenced / Criterion referenced
Judged by: Student and teacher / Teacher / Teacher / Teacher or external marker
Action taken: Feedback to students and teacher / Feedback into teaching plans /
Feedback into teaching / Report to student, parent, other teachers etc.
Epithet: Assessment for learning / Matching / Dipstick / Assessment of learning
There is some parallel here with intermediate purposes for assessment iden-
tified by others. Cowie and Bell (1999) interpreted their observation of the
assessment practices of ten teachers in New Zealand as indicating two forms of
formative assessment: planned and interactive. Planned formative assessment
concerns the whole class and the teacher's purpose is to find out how far the
learning has progressed in relation to what is expected in the standards or cur-
riculum. Information gathering, perhaps by giving a brief class test or special
task, is planned and prepared ahead; the findings are fed back into teaching.
This is similar to 'informal summative'. Interactive formative assessment is not
planned ahead in this way; it arises from the learning activity. Its function is to
help the learning of individuals and it extends beyond cognitive aspects of
learning to social and personal learning; feedback is both to the teacher and the
learners and is immediate. It has the attributes of 'informal formative'.
Cowie and Bell's interactive formative assessment is similar to the classroom
assessment that Glover and Thomas (1999) describe as 'dynamic'. Like Black
and Wiliam (1998b), they emphasize the involvement of students in learning
and indeed speak of 'devolving power to the learners' and suggest that without
this, dynamic assessment is not possible. Unlike Cowie and Bell, however, they
claim that all assessment must be planned.
There are also different degrees of formality towards the summative end.
What is described as 'informal summative' may involve similar practice to
'formal formative', as is illustrated in Figure 6.4. However, the essential difference
is the use made of the evidence. If the cycle is closed, as in Figure 6.1, and the evi-
dence is used in adapting teaching, then it is formal formative. If there is no feed-
back into teaching, as in Figure 6.2, then it falls into the category of 'informal
summative', even though the evidence may be the same classroom test.
Yet rather than trying to make even more distinctions among assessment
procedures, this analysis perhaps ought to be taken as indicating no more than
that there are different ways of practising and using formative assessment and
summative assessment. If this is so, do we then need the distinction at all?
Is there formative and summative assessment or just good
assessment?
Some of those involved in developing assessment have argued that the forma-
tive/summative distinction itself is not helpful and that we should simply strive
for 'good assessment'. Good formative assessment will support good judge-
ments by teachers about student progress and levels of attainment and good
summative assessment will provide feedback that can be used to help learning.
Maxwell (2004) describes progressive assessment as blurring the boundary
between formative and summative assessment.
The discussion of Figure 6.4 certainly indicates a blurred boundary. Added
to this, the recognition of how evidence can be used for both purposes would at
first sight seem to add to the case against retaining the distinction between
formative and summative assessment. In both cases there are limitations in the
dual use of the evidence, but on closer inspection these are seen to be of rather
different kinds. The limitation of using evidence which has initially been gath-
ered for a summative purpose to help learning bears on the validity of the evi-
dence; it is just not sufficiently rich and readily available to be adequate for
formative use. The limitation of using evidence which has initially been gath-
ered to help learning, to report on learning, bears on the reliability of the evi-
dence. In this case there are steps that can be taken to address the limitation and
increase reliability; training can ensure that teachers collect evidence systemat-
ically and with integrity whilst moderation can optimize comparability.
When procedures are in place to assure quality in this way, then evidence
gathered by teachers at a level of detail suitable for helping learning can also be
used for dependable assessment of learning. This is what happens in the
Queensland Senior Certificate, where all the information needed to grade students
comes from evidence collected and judged by teachers. However, the
reverse situation cannot be found; there are no examples of all the needs of
assessment for learning being provided from evidence collected for summative
purposes. Of course, it is not logical that this could be so.
This asymmetry in dual use seems to be a strong argument for maintaining
the distinction in purposes. We need to know for what purpose the evidence
was gathered and for what purpose it is used. Only then can we evaluate
whether it is 'good' or not. One can conduct the same assessment and use it for
different purposes just as one can travel between two places for different pur-
poses. As the purpose is the basis for evaluating the success of the journey, so
the purpose of assessment enables us to evaluate whether the purpose has been
achieved. If we fuse or confuse formative and summative purposes, experience
strongly suggests that 'good assessment' will mean good assessment of learn-
ing, not for learning.
Conclusion
The notion of progression is one of two key elements in the discussion in this
chapter. To develop students' understanding and skills teachers need to have in
mind some developmental criteria in order to see how the goals of specific
lessons are linked to the progression of more general concepts and skills. For
formative purposes, these criteria do not need to be linked to levels; they just
provide a guide to the next steps in learning. For summative assessment pur-
poses, where a summary is required, the use of levels, standards or grades is a
way of communicating what a student has achieved in terms of the criteria rep-
resented by the levels, standards or grades. This process condenses evidence and
necessarily means a loss of detail. The uses to which summative information is
put require that the levels and their like mean the same for all students. That is,
putting aside for the moment the possibility of mis-grading discussed in Chapter
7, a level or grade X for student A means this student has achieved roughly the
same as student B who also achieved a level or grade X. It is only for convenience
that we use levels; in theory we could report performance in terms of a profile
across a succession of progressive criteria, but this would probably provide far
too much detail for most purposes where a concise summary is required.
It is the different purposes of the information, the second key feature of this
chapter, that create a distinction between formative and summative assessment.
We have argued that evidence gathered as part of teaching and learning can be
used for both formative and summative purposes. It is used to help learning
when interpreted in terms of individuals' progress towards lesson goals. The
same evidence can also be interpreted against the general criteria used in
reporting achievement in terms of levels or grades. However, we have noted
that evidence gathered in a form that is already a summary, as from a test or
examination, generally lacks the detail needed to identify and inform next steps
in learning. This means that we cannot use any evidence for just any purpose.
It argues for maintaining a clear distinction between formative and summative
purposes in terms of the use made of the evidence.
Although there are shades of formality in the ways of conducting formative
assessment and summative assessment, as indicated in Figure 6.4, the differ-
ence in purpose remains. We cannot make an assumption that the way in which
evidence is gathered will determine its use in learning; a classroom test can be
used to inform teaching without any reference to levels or it can be used to
provide a grade or level for end of stage reporting. The asymmetrical relation-
ship - that evidence collected for formative assessment can be used for sum-
mative assessment but not vice versa - means that removing the labels and
referring only to 'assessment' would inevitably favour summative purposes.
It is both a weakness and a strength that summative assessment derived by
reinterpreting formative evidence means that both are in the hands of the
teacher. The weakness arises from the known bias and errors that occur in
teachers' judgements. All assessment involves judgement and will therefore be
subject to some error and bias. While this aspect has been given attention in the
context of teachers' assessment for summative uses, it no doubt exists in teach-
ers' assessment for formative purposes. Although it is not necessary to be over-
concerned about the reliability of assessment for this purpose (because it occurs
regularly and the teacher will be able to use feedback to correct for a mistaken
judgement), the more carefully any assessment is made the more value it will
have in helping learning. The strength, therefore, is that the procedures for
ensuring more dependable summative assessment, which need to be in place in
a system using teachers' judgements, will benefit the formative use, the
teacher's understanding of the learning goals and the nature of progression in
achieving them. Experience shows that moderation of teachers' judgements,
necessary for external uses of summative assessment, can be conducted so that
this not only serves a quality control function but also has a quality assurance
function, with an impact on the process of assessment by teachers (ASF, 2004).
This will improve the collection and use of evidence for a formative as well as
a summative purpose.
This chapter has sought to explore the relationship between formative
assessment and summative assessment with a view to using the same evidence
for both purposes. We have seen that there are potential dangers for formative
assessment in assuming that evidence gathered for summative assessment can
serve formative purposes. Similarly, additional measures need to be put in
place if summative assessment based on evidence gathered and used for form-
ative assessment is to be adequately reliable. These issues are key to protecting
the integrity of assessment and in particular to protecting the integrity of forma-
tive assessment so that assessment has a positive impact on learning, which is
the central concern of this book.
Chapter 7
The Reliability of Assessments
Paul Black and Dylan Wiliam
The discussion at the end of the previous chapter raises the question of the part
that teachers play in the summative assessment of their students. For many of
the decisions made within schools, summative assessments made by teachers
play an important role and affect the progress of students. For summative
assessments that are used outside the school, whether for progress to employ-
ment, further stages of education or for accountability purposes, the stakes are
even higher. The question of the extent to which such assessments should be
entrusted to teachers and schools is a key issue in assessment policy.
Any assessments should be so designed that the users of the results, be they
the students, their parents, their teachers or the gatekeepers for further stages of
education or employment, can have confidence in the results. There are two main
criteria of quality of an examination result that should be a basis for such confi-
dence: reliability and validity. This chapter is concerned only with the first of
these, although there are areas of overlap between them. The term 'dependabil-
ity' is used to signify the overall judgement of quality for an assessment which
may be influenced by both reliability and validity, and by other features also.
It is not possible to optimize the systems for producing summative assess-
ments, either for use within schools or for more general public use, unless both
the reliability and the validity of the various methods available are carefully
appraised. Both qualities are essential. However, the public in general and policy
makers in particular do not understand or pay attention to reliability. They
appear to have faith in the dependability of the results of short tests when they
are in fact ignorant of the sizes of the inescapable errors that accompany this and
any other measure. This is a serious failing. Decisions which will have an impor-
tant effect on a student's future may be taken by placing more trust in a test score
than in other evidence about that student, when such trust is not justified.
In this chapter, the first section discusses what is meant by the reliability of
the score obtained from a summative test and the second examines published
evidence about test reliabilities. The third section then looks at decision consis-
tency, that is, the effects of limited reliability on the errors that ensue in assign-
ing candidates to specific grades or levels on the basis of test scores. The scope
of the considerations then broadens in the next three sections, which discuss in
turn the overlap between reliability and validity, the reliability of formative
assessments and the broader issue of dependability. The leading issues are then
highlighted in a closing summary.
Threats to reliability
No test is perfectly reliable. It is highly unlikely that the score that someone gets
on one occasion would be exactly the same as on another occasion, even on the
same test. However, if they took the same or similar tests on a number of occa-
sions then the average of all those scores would, in general, be a good indicator
of their capability on whatever it was that the test was measuring. This average
is sometimes called the 'true score'. Thus, the starting point for estimating the
reliability of a test is to hypothesize that each student has a 'true score' on a par-
ticular test - this does not mean that we believe that a student has a true 'ability'
in (say) reading, nor that the reading score is in any sense fixed.
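The idea of a 'true score' as the long-run average over hypothetical re-sittings can be illustrated with a short simulation. Everything here is invented for illustration - the true score of 62 and the error spread of 5 marks are assumptions, not figures from any real test:

```python
import random

random.seed(1)

TRUE_SCORE = 62.0   # hypothetical candidate's true score (assumed)
ERROR_SD = 5.0      # hypothetical occasion-to-occasion error spread (assumed)

def observed_score():
    # One sitting: the observed score is the true score plus random error.
    return TRUE_SCORE + random.gauss(0, ERROR_SD)

# A single sitting can easily miss the true score by several marks...
single = observed_score()

# ...but the mean over many hypothetical re-sittings converges on it.
mean_of_many = sum(observed_score() for _ in range(10_000)) / 10_000
print(round(single, 1), round(mean_of_many, 1))
```

The candidate never actually re-sits ten thousand times, of course; the true score is a theoretical construct, which is exactly the point the chapter makes next.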
The main sources of error that can threaten the reliability of an examination
result are:

Any particular student may perform better or worse depending on the actual
questions chosen for the particular administration of the test;
The same student may perform better or worse from day-to-day;
Different markers may give different marks for the same piece of work.
(Black and Wiliam, 2002)
The first of these three is a problem of question sampling. On any syllabus,
there will be a very large number of questions that can be set. Questions can
differ both in their content (for example, force, light, electricity in physics) and
in the type of attainment that they test (for example, knowledge of definitions,
solution of routine short problems, design of an experiment to test a hypothe-
sis). Those who set the UK General Certificate of Secondary Education (GCSE)
and Advanced-level examinations usually work with a two-dimensional grid
with (say) content topics as the rows and types of attainment as the columns.
The relative weights to be given to the cells of this grid are usually prescribed
in the syllabus (but they can vary between one syllabus and another); so across
any one examination, the examiners must reflect this in the distribution of the
questions (for example, one on the definition of force, two on applying the
concept in simple quantitative problems, and so on). In addition, they may
deploy different types of questions, for example using a set of 40 multiple-
choice questions to test knowledge and simple applications so as to cover many
cells, and then having a small number of longer problems to test application of
concepts and synthesis of ideas (for example, design of an experiment involv-
ing detection of light with devices which give out electrical signals).
What the examples here demonstrate is that the composition of an examina-
tion is a delicate balancing act. There is a huge number of possible questions
that can be set on any one syllabus: the examiners have to select a tiny propor-
tion and try to make their selection a fair sample of the whole.1 If the time
allowed for the test or tests is very short, the sample will be very small. The
smaller the sample, the less confidence one can have that the result for any one
candidate would be the same as that which would be given on another sample
composed in the same way. Thus, any examination can become more reliable if
it can be given a longer time.
No examination can produce a perfect, error-free result. The size of the errors
due to the first of the sources of error listed above can be estimated from the inter-
nal consistency of a test's results. If, for a test composed of several items, candi-
dates are divided according to their overall score on the test, then one can look at
each component question (or item) to see whether those with a high overall score
have high scores on this question, and those with low overall scores have low
scores on this question. If this turns out to be the case, then the question is said to
have 'high discrimination'. If most of the questions have high discrimination, then
they are consistent with one another in putting the candidates in more or less the
same order. The reliability coefficient that is often quoted for a test is a measure of
the internal consistency between the different questions that make up the test. Its
value will be a number between zero and one. The measures usually employed
are the Kuder-Richardson coefficient (for multiple-choice tests) or Cronbach's
alpha (for other types of test); the principle underlying these two is the same.
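As a concrete illustration (a minimal sketch, not the procedure of any particular examining body), Cronbach's alpha can be computed directly from a matrix of item scores; for right/wrong items scored 0/1, as here, the same formula gives the Kuder-Richardson coefficient. The candidate data below are invented:

```python
# Each row: one (invented) candidate's scores on the test's items
# (1 = correct, 0 = wrong).
scores = [
    [1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1],
    [1, 0, 1, 0, 1],
]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

k = len(scores[0])                                  # number of items
item_vars = [variance([row[i] for row in scores]) for i in range(k)]
total_var = variance([sum(row) for row in scores])  # variance of total scores

# Cronbach's alpha: internal consistency of the item set.
alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(round(alpha, 2))
```

When items discriminate consistently, total-score variance dominates the summed item variances and alpha approaches one; uncorrelated items drive it towards zero.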
If this internal consistency is high, then it is likely that a much longer test
sampling more of the syllabus will give approximately the same result.
However, if checks on internal consistency reveal (say) that the reliability of a
test is at the level of 0.85, then in order to increase it to 0.95 with questions of
the same type, it would be necessary to more than triple the length of the test.
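The arithmetic behind that claim follows from the Spearman-Brown prophecy formula, a standard classical-test-theory result (not specific to this chapter) for a test lengthened by a factor k with questions of comparable quality:

```python
def lengthened_reliability(r, k):
    # Spearman-Brown prophecy formula: reliability of a test lengthened
    # by a factor k:  r_k = k * r / (1 + (k - 1) * r)
    return k * r / (1 + (k - 1) * r)

def required_factor(r_old, r_new):
    # Inverting the formula: how much longer must a test of reliability
    # r_old be made to reach reliability r_new?
    return (r_new * (1 - r_old)) / (r_old * (1 - r_new))

k = required_factor(0.85, 0.95)
print(round(k, 2))   # about 3.35, i.e. more than tripling the test's length
```

So a one-hour paper with reliability 0.85 would need to grow to well over three hours of comparable questions to reach 0.95, which is why the alternative of replacing weak items is attractive.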
Reliability could be increased in another way - by removing from the test all
those questions which had low discrimination and replacing them with ques-
tions with high discrimination. This can only be done if questions are pre-
tested, and might have the effect of narrowing the diversity of issues
represented in the test in order to homogenize it.
Indices based on such checks are often claimed to give the reliability of an
examination result. Such a claim is not justified, however, for it takes no account
of other possible sources of error. For example, the second source of error means
that the actual score achieved by a candidate on a given day could vary sub-
stantially from day to day. Again, this figure could be improved, but only by
setting the test in sections with each taken on different days. Data on this source
are hard to find so it is usually not possible to estimate its effect. It would seem
hard to claim a priori that it is negligible.
The third source - marker error - is dealt with in part by careful selection and
training of markers, in part by rigorous rules of procedure laid down for
markers to follow and in part by careful checks on samples of marked work.
Whilst errors due to this source could be reduced by double marking of every
script, this would also lead to very large increases both in the cost of examina-
tions and in the time taken to determine results. Particular cases of marker error
justifiably attract public concern, yet overall the errors due to this source are
probably small in comparison with the effects of the other sources listed here.
It is important to note, therefore, that the main limitations on the accuracy of
examination results are not the fault of testing agencies. All of the sources could
be tackled, but only if increases in costs, examining times and times taken to
produce results were to be accepted by the educational system. Such acceptance
seems most unlikely; in this, as in many other situations, the public gets what it
is prepared to pay for.
Evidence about reliability
Because there are few published studies relating to the reliability of public
examinations, the proportions of candidates awarded the 'wrong' grade on any
one occasion are not known. It is very surprising that there are no serious
attempts to research the effects of error in public examinations, let alone publish
the results.
The crucial criterion is therefore how close the score we get on a particular
testing occasion is to the 'true score', and given possible error in a final mark,
there follows the possibility that a candidate's grade, which is based on an inter-
pretation of that mark, will also be in error.
Thus this criterion is concerned with the inevitable chance of error in any
examination result. Four studies serve to illustrate the importance of this crite-
rion. The first is a study by Rogosa (1999) of standardized tests used in the state
of California. This shows that even for tests with apparently high indices of reli-
ability, the chances of a candidate being mis-classified are high enough to lead
to serious consequences for many candidates. His results were expressed in
terms of percentiles, a measure of position in the rank order of all candidates. If
a candidate is on (say) the 40th percentile this means that 40 per cent of all can-
didates have marks at or below the mark achieved by that candidate. His results
showed, for example, that in grade 9 mathematics there is only a 57 per cent
probability that candidates whose 'true score' would put them in the middle of
the rank order of candidates, that is, on the 50th percentile, will actually be clas-
sified as somewhere within the range 40th to 60th percentile, so that the other
43 per cent of candidates will be mis-classified by over ten percentile points. For
those under-classified, this could lead to a requirement to repeat a grade or to
attend a summer school. It could also result in assignment to a lower track in
school, which would probably prejudice future achievement. Of the three
sources of error listed above, this study explored the effects of the first only, that
is, error due to the limited sample of all possible questions.
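The scale of this effect is easy to reproduce with a rough simulation under classical test-theory assumptions. Everything below is illustrative, not drawn from Rogosa's analysis: the reliability of 0.9 is an assumed figure, yet even that generous value leaves only a modest chance of a median candidate being reported within ten percentile points of their true position:

```python
import math
import random

random.seed(0)

RELIABILITY = 0.9   # assumed for illustration; not Rogosa's figure
N = 100_000

# Classical model: observed = true + error. With true scores scaled to
# variance 1, an error variance of (1/r - 1) gives reliability r.
error_sd = math.sqrt(1 / RELIABILITY - 1)
observed_sd = math.sqrt(1 + error_sd ** 2)

def percentile(score):
    # Position in the rank order of all candidates (normal population).
    return 100 * 0.5 * (1 + math.erf(score / (observed_sd * math.sqrt(2))))

# Candidates whose true score is exactly the median (50th percentile):
within = sum(
    1 for _ in range(N)
    if 40 <= percentile(random.gauss(0, error_sd)) <= 60
)

share = within / N   # chance of being reported in the 40th-60th band
print(round(share, 2))
```

With these assumptions the share comes out in the high 50s per cent, the same order of magnitude as the 57 per cent Rogosa reports for grade 9 mathematics.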
The second is a report by Black (1963) of the use of two parallel forms of tests
for first-year physics undergraduates, the two being taken within a few days of
one another and marked by the same markers to common criteria. The tests
were designed to decide who should proceed to honours study. Out of 100 can-
didates, 26 failed the first paper and 26 failed the second, but only 13 failed
both. Half of those who would be denied further access to the honours course
on the one paper would have passed on the second, and vice versa. Until that
year decisions about proceeding to different courses had been taken on the
results of a single paper. The effects illustrated by this study could have arisen
from the first two sources of error listed above.
The third study has provided results which are more detailed and compre-
hensive. Gardner and Cowan (2000) report an analysis of the 11-plus selection
examination in Northern Ireland, where each candidate sits two parallel forms
of test with each covering English, mathematics and science. They were able to
examine both the internal consistency of each test and the consistency between
them. Results are reported on a six-grade scale and each selective (grammar)
school admits its applicants on the basis of their grade, starting with the
highest, and working down the grades until all the places are filled. Their
analysis shows that if one expects to infer a candidate's true grade from the
reported grade and one wants to be correct in this inference 95 per cent of the
time, then for a candidate in a middle grade one can say only that the true score
lies somewhere between the highest and the lowest grades (the '95 per cent con-
fidence interval' thus ranges from the lowest to the highest grade). For a candi-
date just in the highest grade the true score may be anywhere within the top
four grades; given that this is the 95 per cent confidence interval, 5 per cent of
students will be mis-classified by an even greater margin. Of course, for stu-
dents close to the threshold, even a small mis-classification might lead to the
wrong decision. Given that 6-7000 candidates secure selective places, it is likely
that around 3000 will be mis-classified to the extent that either secures or denies
acceptance of their entry to grammar school due to the unreliability of the test.
This study reflects the effects of all three possible sources of error.
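Confidence intervals of this kind follow from the standard error of measurement of classical test theory, SEM = SD x sqrt(1 - reliability). The figures below are purely illustrative (they are not the Northern Ireland test's published statistics), but they show how a respectable-looking reliability still yields a wide 95 per cent band:

```python
import math

def sem(sd, reliability):
    # Standard error of measurement (classical test theory).
    return sd * math.sqrt(1 - reliability)

# Illustrative values only - assumed, not taken from any real test:
sd, reliability = 15.0, 0.85
s = sem(sd, reliability)
half_width = 1.96 * s   # half-width of an approximate 95% interval

# A reported mark m is consistent, at the 95% level, with true scores
# anywhere in m +/- half_width - a band wide enough to straddle several
# grade boundaries on a six-grade scale.
print(round(s, 1), round(half_width, 1))
```

Here the SEM is about 5.8 marks, so the 95 per cent band spans roughly 23 marks in all, which makes the Gardner and Cowan finding of multi-grade uncertainty unsurprising.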
The fourth source of evidence was provided by an analysis carried out by
Wiliam (2001) of the key stage tests used in England at ages 7, 11 and 14 respec-
tively. He concluded that the chances of a student's level result being wrong by
one level were around 20-30 per cent - this being an underestimate as it was
based only on the variability revealed by the internal consistency of perform-
ances on the single test occasion. This example is discussed in more detail
below. This study is similar to that by Rogosa quoted above, in that it explores
only the effects of errors due to the limited sample of all possible questions.
One can note that three of the reliability studies quoted above were carried
out by critics outside the systems criticized. There have been no formal attempts
by governments or their agencies to conduct thorough research to establish the
reliabilities of high-stakes examinations. If this were to be done, it seems likely
that the resulting probabilities of mis-grading would be large enough to cause
some public concern. Thus it is essential that such research be undertaken and
the data made public. The following conclusion of Gardner and Cowan about
the Northern Ireland test applies with equal force to all of our public testing:
The published information on the test does not meet the requirements of the interna-
tional standards on educational testing, both generally in the provision of standard
reliability and validity information and particularly, for example, in the validation of
the Test outcomes in relation to its predictive power (for 'potential to
benefit from a grammar school education'), establishing norms, providing informa-
tion on potential mis-classification, and accommodating disability. (2000: 9)
Estimating the consequences: decision consistency
As noted above, an individual's true score on a test is simply the average score
that the individual would get over repeated takings of the same or a very
similar test. The issue to be explored in this section is the possible effect of
errors in their actual test scores on decisions taken about the classification of
123
Copyrighted Material
Assessment and Learning
candidates in grades or levels (for the examples quoted in the previous section
it was these consequences which were the focus of attention).
Knowing a student's mark on a test is not very informative unless we know
how difficult the test is. Because calibrating the difficulty of tests is complex, the
results of many standardized tests are reported on a standard scale, which
allows the performance of individuals to be compared with the performance of
a representative group of students who took the test at some point in the past.
When this is done, it is conventional to scale the scores so that the average score
is 100 and the standard deviation of the scores is 15. This means that:
68 per cent (that is, roughly two-thirds) of the population score between 85
and 115;
For the other 32 per cent, 16 per cent score below 85 and 16 per cent score
above 115;
96 per cent score between 70 and 130.
So we can say that the level of performance of someone who scores 115 on a
reading test would be achieved or surpassed by 16 per cent of the population,
or that this level of performance is at the 84th percentile.
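On a normal distribution these percentile statements follow directly from the standard scale; a minimal Python sketch (the mean of 100 and SD of 15 are the values given in the text):

```python
from statistics import NormalDist

# Standardized-score scale described in the text: mean 100, SD 15.
scale = NormalDist(mu=100, sigma=15)

def percentile(score: float) -> float:
    """Percentage of the population scoring at or below `score`."""
    return 100 * scale.cdf(score)

print(round(percentile(115)))  # 84 - the 84th percentile; 16% score above 115
print(round(percentile(85)))   # 16 - the mirror-image point one SD below the mean
```
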
From this it would be tempting to conclude that someone who scored 115 on
the test really is in the top 16 per cent of the population, but this may not be the
case because of the unreliability of the test. To explore the consequences of error
in the score of any candidate, the first step is to examine the internal consisten-
cies amongst the test's scores. This can be used to calculate the conventional
measure known as the 'reliability coefficient'. A value for this coefficient of 1.0
means that the errors are zero, so there is no error and the test is perfectly reli-
able. A coefficient of 0.0 means that the errors are very variable and the spread
in their likely values is the same as that of the observed scores, that is, the scores
obtained by the individuals are all error so there is no information about the
individuals at all! When a test has a reliability of zero the result of the test is
completely random.
The reliability of tests produced in schools is typically around 0.7 to 0.8 while
that for commercially produced educational tests ranges from 0.8 to 0.9, and can
be over 0.9 for specialist psychological tests (a reputable standardized test will
provide details of the reliability and how it was calculated). To see what this
means in practice, it is useful to look at some specific kinds of tests.
If we assume a value for the reliability of a test, then we can estimate how far
the observed score is likely to be from the true score. For example, if the relia-
bility of a test is 0.75, then the standard deviation (SD) of the errors (a measure
of the spread in the errors) turns out to be 7.5 (see note 2). The consequences of this for the
standardized test will be that:
For 68 per cent of the candidates their actual scores will be within 7.5 (that is,
one 5D) of their true scores;
For 96 per cent of the candidates their actual scores will be within 15 (that is,
two SDs) of their true scores;
124
For 4 per cent of the candidates their actual score will be at least 15 away
from their true score.
For most students in a class of 30, their actual score will be close to their true
score (that is, what they 'should' have got), but it is likely that for at least one
the score will be 'wrong' by 15 points (but of course we do not know who this
student is, nor whether the score they got was higher or lower than their true
score). For a test with a reliability of 0.75, this means that someone who scores
115 (who we might think is in the top sixth of the population) might on another
occasion score just 100, making them appear average, or as high as 130, putting
them in the top 2 per cent (often used as the threshold for considering a student
'gifted'). If the reliability were higher then this spread in the errors would be
smaller - for a reliability of 0.85, the above value of 7.5 for the SD would be
replaced by a value of 6.
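The error calculation used here can be sketched in a few lines; the formula SEM = SD × √(1 − reliability) is the 'key formula' referred to in note 2, and the SD of 15 and the two reliability values come from the text:

```python
import math

SD = 15  # standard deviation of the standardized score scale

def error_sd(reliability: float, sd: float = SD) -> float:
    """Standard deviation of the errors (the standard error of measurement):
    SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

print(round(error_sd(0.75), 1))  # 7.5 - the value used in the text
print(round(error_sd(0.85), 1))  # 5.8 - 'just under 6'
```
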
Because the effects of unreliability operate randomly, the averages across
groups of students, however, are quite accurate. For every student whose actual
score is lower than their true score there is likely to be one whose actual score
is higher than their true score, so the average observed score across a class of
students will be very close to the average true score. But just as the person with
one foot in boiling water and one foot in ice is quite comfortable 'on average',
we must be aware that the results of even the best tests can be wildly inaccurate
for a few individual students, and therefore high-stakes decisions should never
be based solely on the results of single tests.
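A small simulation illustrates both points at once: individual scores can be badly wrong while the class average stays close to its true value. The class size of 30 and the SEM of 7.5 (reliability 0.75) follow the text; the assumption of normally distributed true scores and errors is ours:

```python
import random

random.seed(1)
SD_TRUE, SEM = 15, 7.5  # reliability 0.75 on the standardized scale

# One class of 30: individual errors can be large...
true_scores = [random.gauss(100, SD_TRUE) for _ in range(30)]
observed = [t + random.gauss(0, SEM) for t in true_scores]
worst = max(abs(o - t) for o, t in zip(observed, true_scores))

# ...but the errors largely cancel out across the group.
avg_gap = abs(sum(observed) / 30 - sum(true_scores) / 30)
print(round(worst, 1), round(avg_gap, 1))  # worst individual error vs. gap in averages
```

On any run the worst individual error is many times larger than the error in the class average, which is the chapter's point about groups versus individuals.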
Making sense of reliability for the key stage tests used in England is harder
because these are used to assign levels rather than marks, for good reason. It is
tempting to regard someone who gets 15 per cent in a test as being better than
someone who gets 14 per cent, even though the second person might actually have
a higher true score. In order to avoid unwarranted precision, therefore, we often
just report levels. The danger, however, is that in avoiding unwarranted precision
we end up falling victim to unwarranted accuracy - while we can see that a mark
of 15 per cent is only a little better than 14 per cent, it is tempting to conclude that
level 2 is somehow qualitatively better than level 1. Firstly, the difference in per-
formance between someone who scored level 2 and someone who scored level 1
might be only a single mark, and secondly, because of the unreliability of the test,
the person scoring level 1 might actually have had a higher true score.
Only limited data have been published about the reliability of national cur-
riculum tests, although it is likely that the reliability of national curriculum tests
is around 0.80 - perhaps slightly higher for mathematics and science. Assum-
ing this reliability value, it is possible to calculate the proportion of students
who would be awarded the 'wrong' levels at each key stage of the national cur-
riculum. The proportion varies as a result of the unreliability of the tests as
shown in Table 7.1.
It is clear that the greater the precision (that is, the more different levels into
which students are to be classified as they move from KS1 to KS3) the lower the
accuracy. What is also clear is that although the proportion of mis-classifications
declines steadily as the reliability of a test increases, the improvement is very slow.
125
Table 7.1: Changes in the proportion of students misclassified in national curriculum tests with test reliability

Reliability of Test     0.60  0.65  0.70  0.75  0.80  0.85  0.90  0.95

Percentage (%) of Students Misclassified at Each Key Stage
KS1                       27    25    23    23     …     …     …     …
KS2                        …     …     …    36     …    27    23     …
KS3                       55    53     …     …     …     …     …     …
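A rough decision-consistency simulation in the spirit of Table 7.1 reproduces both patterns described in the text: finer classifications are misclassified more often, and raising reliability improves accuracy only slowly. The level boundaries and score scale below are illustrative assumptions of ours, not the actual key stage cut-scores:

```python
import math
import random

random.seed(42)

def misclassified_pct(reliability: float, n_levels: int, trials: int = 20_000) -> float:
    """Share of simulated students whose observed level differs from their
    true level, with true scores N(100, 15) and error SD = 15*sqrt(1 - rel)."""
    sem = 15 * math.sqrt(1 - reliability)
    # Illustrative equal-width level boundaries between 70 and 130.
    cuts = [70 + 60 * i / n_levels for i in range(1, n_levels)]

    def level(score: float) -> int:
        return sum(score > c for c in cuts)

    wrong = 0
    for _ in range(trials):
        true = random.gauss(100, 15)
        observed = true + random.gauss(0, sem)
        wrong += level(true) != level(observed)
    return 100 * wrong / trials

# More levels (finer classification) -> more misclassification...
print(round(misclassified_pct(0.80, 3)), round(misclassified_pct(0.80, 6)))
# ...and a large gain in reliability buys only a modest gain in accuracy.
print(round(misclassified_pct(0.90, 6)))
```
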
We can make tests more reliable by improving the items included in the tests
and by making the marking more consistent, but in general the effect of such
changes is small. There are only two ways of achieving a significant increase in
the reliability of a test: make the scope of the test narrower so you ask more
questions on fewer topics, or make the test longer so you ask more questions on
all of the topics.
It turns out (see note 3) that if we have a test with a reliability of 0.75, and we want to
make it into a test with a reliability of 0.85, we would need a test 1.9 times as
long. In other words, doubling the length of the test would reduce the propor-
tion of students misclassified by only 8 per cent at Key Stage 1, by 9 per cent at
Key Stage 2 and by 4 per cent at Key Stage 3. It is clear that increasing the reli-
ability of the test has only a small effect on the accuracy of the levels. In fact, if
we wanted to improve the reliability of Key Stage 2 tests so that only 10 per cent
of students were awarded the incorrect level, we should need to increase the
length of the tests in each subject to over 30 hours.
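The lengthening calculation above uses the Spearman-Brown relation given in note 3; a one-function sketch:

```python
def lengthening_factor(r: float, R: float) -> float:
    """Spearman-Brown: the factor n by which a test of reliability r must be
    lengthened to reach reliability R, n = R(1 - r) / (r(1 - R))."""
    return R * (1 - r) / (r * (1 - R))

print(round(lengthening_factor(0.75, 0.85), 2))  # 1.89 - 'a test 1.9 times as long'
```
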
Now it seems unlikely that even the most radical proponents of school tests
would countenance 30 hours of testing for each subject. In survey tests, which
use only limited samples of students to obtain an overall evaluation of students'
achievement, it is possible by giving different tests to different sub-samples to
use 30 hours of testing (see, for example, the UK Assessment of Performance
Unit surveys - Black, 1990). However, the reliability of the overall test perform-
ance of a group will be far higher than that for any one individual, so that in
optimizing the design of any such survey the extra testing time available has
been used mainly to increase the variety of performance outcomes assessed,
that is, to enhance validity.
Fortunately, there is another way of increasing the effective length of a test,
without increasing testing time, and that is through the use of teacher assess-
ment. By doing this we would, in effect, be using assessments conducted over
tens if not hundreds of hours for each student so that there would be the poten-
tial to achieve a degree of reliability that has never been achieved in any system
of timed written examinations. This possibility has to be explored in the light of
evidence about the potential reliability of teachers' summative assessments. A
review of such evidence (see Harlen, 2004; ASF, 2004) does show that it is pos-
sible to achieve high reliability if the procedures by which teachers arrive at
summative judgments are carefully designed and monitored.
The overlap between reliability and validity
There are several issues affecting the interpretation of assessment results that
involve overlap between the concepts of reliability and validity. One such issue
bears on whether or not questions have been so composed and presented that
the student's response will give an authentic picture of the capability being
tested - a feature which may be called the 'disclosure of a question' (Wiliam,
1992). Good disclosure is not easy to attain. For example, several research
studies have established that in multiple-choice tests in science many of those
making a correct choice among the alternatives had made their selection on the
basis of incorrect reasoning, whilst others had been led to a wrong choice by
legitimate reasoning combined with unexpected interpretations of the question.
It would seem that in such tests approximately one third of students are incor-
rectly evaluated on any one question (Tamir, 1990; Towns and Robinson, 1993;
Yarroch, 1991). It has also been shown, for open-ended questions, that misin-
terpretation frequently leads candidates to fail to display what they know and
understand (Gauld, 1980). This source of error might arise from a random
source, for example careless reading by the student, and might have less impact
if the student were to attempt a larger number of questions; it would then
become a reliability issue. However, it might reflect a systematic weakness in
the reading and/or interpretation of questions, which is not relevant to the per-
formance that the test is designed to measure; it would then be a validity issue.
A similar ambiguity of overlap arises in considering the use of tests for pre-
diction. For example, we might, like most secondary schools in the UK, want to
use the results of IQ or aptitude tests taken at the age of 11 to predict scores on
GCSE examinations taken at 16, or use such tests at the end of high school to
predict performance in tertiary level work (Choppin and Orr, 1976). What we
would need to do would be to compare the GCSE scores obtained by students
at age 16 with those scores which the same students obtained on the IQ tests
five years earlier, when they were 11. In general we would find that those who
got high scores in the IQ tests at 11 get high grades in GCSE, and low scorers
get lower grades. However, there will also be some students getting high scores
on the IQ tests who do not go on to do well at GCSE and vice versa. How good
the prediction is - often called the 'predictive validity' of the test - is usually
expressed as a correlation coefficient. A correlation of one means the correlation
is perfect, while a correlation of zero would mean that the predictor tells us
nothing at all about the criterion. Generally, in educational testing, a correlation
of 0.7 between predictor and criterion is regarded as good.
In interpreting these coefficients, care is often needed because they are fre-
quently reported after 'correction for unreliability'. The validity of IQ scores as
predictors of GCSE is usually taken to mean the correlation between true scores
on the predictor and true scores on the criterion. However, as we have seen, we
127
never know the true scores - all we have are the observed scores and these are
affected by the unreliability of the tests. When someone reports a validity coef-
ficient as being corrected for unreliability, they are quoting the correlation
between the true scores on the predictor and criterion by applying a statistical
adjustment to the correlation between the observed scores, which will appear to
be much better than we can actually do in practice because the effects of unre-
liability are inescapable. For example, if the correlation between the true scores
on a predictor and a criterion - that is, the validity 'corrected for unreliability'
- is 0.7, but each of these is measured with tests of reliability 0.9, the correlation
between the actual values on the predictor and the criterion will be less than 0.6.
A decline from 0.7 to 0.6 might seem small, but it should be pointed out that the
proportion of the common variance in the results depends on the square of the
correlation coefficient, so that in this case there will be a decrease from 49 per
cent to 35 per cent in the variance in the scores that is common to the two tests.
A similar issue arises in the common practice of using test results to select
individuals. If we use a test to group a cohort of 100 students into four sets for
mathematics, with, say, 35 in the top set, 30 in set 2, 20 in set 3 and 15 in set 4,
how accurate will our setting be? If we assume that our selection test has a pre-
dictive validity of 0.7 and a reliability of 0.9, then of the 35 students that we
place in the top set, only 23 should actually be there - the other 12 should be in
sets 2 or 3. Perhaps more importantly, given the rationale used for setting, 12
students who should be in set 1 will actually be placed in set 2 or even set 3.
Only 12 of the 30 students in set 2 will be correctly placed there - nine should
have been in set 1 and nine should have been in sets 3 and 4. The complete sit-
uation is shown in Table 7.2.
Table 7.2: Accuracy of setting with a test of validity of 0.7

                           Number of Students that Should be Placed in Each Set
                           Set 1    Set 2    Set 3    Set 4
Set in which Students
Are Actually Placed
  Set 1                      23        9        3        0
  Set 2                       9       12        6        3
  Set 3                       3        6        7        4
  Set 4                       0        3        4        8
In other words, because of the limitations in the reliability and validity of the test,
only half of the students are placed where they 'should' be. Again, it is worth
noting that these are not weaknesses in the quality of the tests but fundamental
limitations of what tests can do. If anything, the assumptions made here are
rather conservative - reliabilities of 0.9 and predictive validities of 0.7 are at the
limit of what we can achieve with current methods. As with national curriculum
126
testing, the key to improved reliability lies with increased use of teacher assess-
ment, standardized and moderated to minimize the potential for bias.
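The setting example can be checked with a simulation. This sketch simplifies the chapter's assumptions by treating 0.7 directly as the correlation between the test score and the criterion ability; the set sizes of 35/30/20/15 follow the text:

```python
import math
import random

random.seed(7)
N = 10_000
CUTS = (0.35, 0.65, 0.85)  # cumulative set sizes: 35/30/20/15 per cent
VALIDITY = 0.7             # assumed test-criterion correlation

# Bivariate-normal sketch: a criterion ability, and a test score built to
# correlate 0.7 with it.
ability = [random.gauss(0, 1) for _ in range(N)]
score = [VALIDITY * a + math.sqrt(1 - VALIDITY**2) * random.gauss(0, 1)
         for a in ability]

def to_sets(values):
    """Assign each student to a set (1 = top) by rank on `values`."""
    order = sorted(range(N), key=lambda i: values[i], reverse=True)
    set_of = [0] * N
    for rank, i in enumerate(order):
        set_of[i] = 1 + sum(rank / N >= c for c in CUTS)
    return set_of

should_be, placed = to_sets(ability), to_sets(score)
correct = sum(s == p for s, p in zip(should_be, placed)) / N
print(round(100 * correct))  # roughly half are placed in the 'right' set
```
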
A different issue in the relationship between reliability and validity is the
'trade-off' whereby one may be enhanced at the expense of the other. An
example here is the different structures in the UK GCSE (see note 5) papers for different
subjects. A typical 90-minute paper in science includes approximately 12 struc-
tured questions giving a total of about 50 sub-sections. For each of these, the
space allowed on the examination paper for the response is rarely more than
four lines. The large number of issues so covered, and the homogeneity in the
type of response demanded, help to enhance reliability but at the expense of
validity, because there is no opportunity for candidates to offer a synthesis or
comprehensive discussion in extended prose. By contrast, a paper in (say) a
social science may require answers to only two or three questions. This makes
the test valid in prioritizing modes of connected thinking and writing, but
undermines reliability in that some candidates will be 'unlucky' because these
particular questions, being a very small sample of the work studied, are based
on topics that they have not studied thoroughly.
An extreme example of this trade-off is the situation where reliability is dis-
regarded because validity is all important. A PhD examination is an obvious
example; the candidate has to show the capability to conduct an in-depth explo-
ration of a chosen problem. The inference that may be made is that someone
who can do this is able to 'do research', that is, to do work of similarly good
quality on other problems. This will be a judgment by the examiners: there is
only a single task, and no possibility of looking for consistency of performance
across many tasks, so that estimates of reliability are impossible. Here, validity
does not depend on reliability. This is quite unlike the case of inferring that a
GCSE candidate is competent in mathematics on the basis of an aggregate score
over the several questions attempted in a written test. Here, reliability is a prior
condition - necessary but not sufficient - for achieving validity. There are more
complex intermediate cases, for example if a certificate examination is com-
posed of marks on a written paper and an assessment of a single substantial
project; in such a case, the automatic addition of a test paper score and a project
score may be inappropriate.
Reliability for formative assessments
A different arena of overlap is involved in the consideration of formative assess-
ment, given that all of the above discussion arises in relation to summative assess-
ments. The issues here are very different from the summative issues. Any
evidence here is collected and interpreted for the purpose of guiding learning on
the particular task involved, and generalization across a range of tasks to form an
overall judgment is irrelevant. Furthermore, inadequate disclosure which intro-
duces irrelevant variation in a test and thereby reduces reliability (as well as
validity) is less important if the teacher can detect and correct for it in continuing
interaction with the learner. However, some formative assessment takes place
over longer time intervals than that of interactions in (say) classroom dialogue
129
and involves action in response to a collection of several pieces of evidence. One
example would be response to a class test, when a teacher has to decide how
much time to give to remedial work in order to tackle weaknesses revealed by
that test: here, both the reliability and validity of the test will be at issue, although
if short-term interactions are to be part of any 'improvement' exercise any short-
comings in the action taken should become evident fairly quickly and can be cor-
rected immediately. This issue was expanded more fully in Wiliam and Black
(1996), where the argument was summed up as follows:
As noted above, summative and formative functions are, for the purpose of this dis-
cussion, characterized as the ends of a continuum along which assessment can be
located. At one extreme (the formative) the problems of creating shared meanings
beyond the immediate setting are ignored; assessments are evaluated by the extent
to which they provide a basis for successful action. At the other extreme (the sum-
mative) shared meanings are much more important, and the considerable distor-
tions and undesirable consequences that arise are often justified by appeal to the
need to create consistency of interpretation. Presenting this argument somewhat
starkly, when formative functions are paramount, meanings are often validated by
their consequences, and when summative functions are paramount, consequences
are validated by meanings. (1996: 544)
Conclusion
This chapter ends with incomplete arguments because of the overlaps between
reliability and validity. Of importance here is dependability, which is essentially
an overall integrating concept in which both reliability and validity are sub-
sumed. It follows that a comprehensive consideration of this overall issue
belongs to the next chapter on validity.
However, the issues discussed here are clearly of great importance and ought
to be understood both by designers of assessment systems and by users of test
results. One arena in which this has importance is the use of assessment results
within schools as guidance for students and decisions about them. The fact that
teachers may be unaware of the limited reliability of their own tests is thus a
serious issue. Where assessment results are used for decisions beyond schools,
knowledge of reliability is also important. The fact that, at least for public exam-
inations in the UK, reliability is neither researched nor discussed is a serious
weakness. Data on reliability must be taken into account in designing test
systems for optimum 'trade-off' between the various constraints and criteria
that determine dependability. In the absence of such data, optimum design is
hardly possible because it is not possible to evaluate fully alternative design
possibilities. As emphasized at the beginning of this chapter, this absence is also
serious because all users can be seriously misled. For example, decisions that
have an important effect on a student's future may be taken by placing more
trust in a test score than in other evidence about that student, when such trust
is clearly not justified.
130
Overall, one consequence of the absence of reliability data is that most teach-
ers, the public in general, and policy makers in particular do not understand or
attend to test reliability as an issue, and some are indeed reluctant to promote
research into reliability because of a fear that it will undermine public confi-
dence in examinations. Of course, it may well do so to an unreasonable degree
where the media and the public generally do not understand concepts of uncer-
tainty and error in data. A debate that promotes the development of such
understanding is long overdue.
Notes
1 In examination jargon, the whole collection of possible questions is called a
'domain', and the issue just discussed is called 'domain sampling'. In any
subject, it is possible to split the subject domain (say physics) into several sub-
domains, either according to content or to types of attainment (say under-
standing of concepts, application in complex problems, design of experi-
ments). One might then test each domain separately with its own set of ques-
tions and report on a student's attainment in each of these domains separately,
so giving a profile of attainments instead of a single result. However, if this is
done by splitting up but not increasing the total testing time, then each
domain will be tested by a very small number of questions so that the score
for each element of the profile will be far less reliable than the overall score.
2 Since the standard deviation of the scores is 15, for a reliability of 0.85, from
our key formula we can say that the standard deviation of the errors is:
√(1 − 0.85) × 15
which is just under 6. Similarly, for a reliability of 0.75, the same formula will
give a value of 7.5.
3 In general, if we have a test of reliability r and we want a reliability of R, then
we need to lengthen the test by a factor of n given by:
n = R(1 − r) / r(1 − R)
4 The classification consistency increases broadly as the fourth root of the test
length, so a doubling in classification consistency requires increasing the
test length 16 times.
5 GCSE is the General Certificate of Secondary Education which comprises a
set of subject examinations, from which most students in secondary schools
in England, Wales and Northern Ireland at age 16, that is, at the end of com-
pulsory education, choose to take a few (generally between 4 and 7) subjects.
131
Chapter 8
The Validity of Formative Assessment
Gordon Stobart
The deceptively simple claim of this chapter is that for formative assessment to be
valid it must lead to further learning. The validity argument is therefore about the
consequences of assessment. The assumption is that formative assessment gener-
ates information that enables this further learning to take place - the 'how to get
there' of our working definition of assessment for learning (see the Introduction).
One implication of this is that assessments may be formative in intention but are
not so in practice because they do not generate further learning.
This 'consequential' approach differs from how the validity of summative
assessments is generally judged. Here the emphasis is on the trustworthiness of
the inferences drawn from the results. It is about the meaning attached to an
assessment and will vary according to purpose. Reliability is more central to
this because if the results are unreliable, then the inferences drawn from them
will lack validity (see Chapter 6).
This chapter examines current understandings of validity in relation to both
summative and formative assessment. It then explores the conditions that
encourage assessment for learning and those which may undermine it. Two key
factors in this are the context in which learning takes place and the quality of
feedback. The learning context includes the socio-cultural and policy environ-
ment as well as what goes on in the classroom. Feedback is seen as a key
element in the teaching and learning relationship. These factors relate directly
to the treatment of 'making learning explicit' in Chapter 2, to motivation in
Chapter 4 and the formative-summative relationship in Chapter 6. Reliability in
formative assessment is discussed in Chapter 7.
Validity
Most of the theorizing about validity relates to testing and this will be used as
the basis for looking at validity in formative assessment. In relation to testing,
validity is no longer simply seen as a static property of an assessment, which is
something a test has, but is based on the inferences drawn from the results of
an assessment. This means that each time a test is given, the interpretation of
the results is part of a 'validity argument'. For example, if a well-designed
mathematics test is used as the sole selection instrument for admission to art
133
school we may immediately judge it as an invalid assessment. If it is used to
select for mathematics classes then it may be more valid. It is how the assess-
ment information is understood and used that is critical. At the heart of current
understandings of validity are assumptions that an assessment effectively
samples the construct that it claims to assess. Is the assessment too restricted in
what it covers or does it actually assess different skills or understandings to
those intended? This is essentially about fitness-for-purpose. It is a property of
the test scores rather than the test itself. The 1985 version of the American Edu-
cational Research Association's Standards for Educational and Psychological Testing
was explicit on this: 'validity always refers to the degree to which ... evidence
supports the inferences that are made from the scores' (1985: 9).
This approach was championed by Messick:
Validity is an integrated evaluative judgment of the degree to which empirical evi-
dence and theoretical rationales support the adequacy and appropriateness of infer-
ences and actions based on test scores or other modes of assessment. (1989: 13)
The validity of summative assessment is therefore essentially about trustwor-
thiness, how well the construct has been assessed and the results interpreted.
This brings into play both the interpretation of the construct and the reliability
of the assessment. Any unreliability in an assessment weakens confidence in the
inferences that can be drawn. If there is limited confidence in the results as a
consequence of how the test was marked or how the final grade was decided,
then its validity is threatened.
However, a test may be highly reliable yet sample only a part of a con-
struct. We can then have only limited confidence in what it tells us about a
student's overall understanding. Take the example of a reading test. To be valid
this test must assess competence in reading. But what do we mean by reading?
Thirty years ago a widely used reading test in England was the Schonell
Graded Word Reading Test, which required readers to pronounce correctly
single decontextualized words of increasing difficulty (for example, tree - side-
real). The total of correctly read words, with the test stopping after ten consec-
utive failures, was then converted into a 'Reading Age'. By contrast, the current
national curriculum English tests for 11-year-olds in England are based on a
construct of reading that focuses on understanding of, and making inferences
from, written text. Responses are written and there is no 'reading out loud'
involved. Clearly a key element in considering validity is to agree the construct
that is being assessed. Which of the above provides a more valid reading score
or do both suffer from 'construct under-representation' (Messick, 1989)?
This approach links to the previous two chapters. Because the purpose of the
assessment is a key element in validity - 'it does what it claims to do' - then the
validity argument differs for formative and summative assessment (see
Chapter 6). In formative assessment it is about consequences - has further
learning taken place as a result of the assessment? In summative assessment it
is the trustworthiness of the inferences that are drawn from the results - does
our interpretation of students' results do justice to their understanding? On this
134
The Validity of Formative Assessment
basis much of what is discussed as reliability in Chapter 7 is subsumed into
validity arguments.
Threats to validity
This chapter uses the idea of 'threats to validity' (Crooks et al., 1996) to explore
where the validity argument may be most vulnerable. Crooks et al. use an
approach which sees the validity process as a series of linked stages. The
weakest link in the chain is the most serious threat to validity. If there is limited
confidence in the results as a consequence of a highly inconsistent administra-
tion of a test, then this may be the most important threat to validity. Or it could
be that a test was fairly marked and graded but the interpretation of the results,
and the decisions made as a consequence, were misguided and therefore under-
mine its validity.
One of the key threats to validity in test-based summative assessment is that,
in the quest for highly reliable assessment, only the more easily and reliably
assessed parts of a construct are assessed. So speaking and listening may be left
out of language tests because of reliability issues, and writing may be assessed
through multiple-choice tests. While reliability is necessary for validity, a highly
reliable test may be less valid because it sampled only a small part of the construct
– so we cannot make confident generalizations from the results. One of the strong
arguments for summative teacher assessment is that it does allow a construct to
be more fully, and repeatedly, sampled. So even if it may seem less reliable
because it cannot be standardized as precisely as examination marking, it may be
a more dependable assessment of the construct being measured (see Chapter 7).
A second major threat to validity is what Messick obscurely calls 'construct
irrelevant variance'. If a test is intended to measure reasoning skills but
students can do well on it by rote learning of prepared answers, then it is not
doing what it claims it is doing; it therefore lacks validity. This is because
success has come from performance irrelevant to the construct being assessed
(Frederiksen and Collins, 1989).
This is important because it means that learning cannot simply be equated
with performance on tests. Good scores do not necessarily mean that effective
learning has taken place. This was evidenced by Gordon and Reese's
conclusions on their study of the Texas Assessment of Academic Skills, which
students were passing

even though the students have never learned the concepts on which they are being
tested. As teachers become more adept at this process, they can even teach students
to correctly answer test items intended to measure students' ability to apply, or
synthesize, even though the students have not developed application, analysis or
synthesis skills. (1997: 364)
If increasing proportions of 11-year-olds reach level 4 on the national curricu-
lum assessment in England, have educational standards risen? There is limited
public recognition that there may be an 'improved test taking' factor that
accounts for some of this, and there may not be the same degree of improvement
on other, similar, measures for which there has not been extensive preparation
(Tymms, 2004; Linn, 2000).
This brief review of how validity is being interpreted in summative assessment
provides a framework for considering the validity of formative assessments. Here
validity arguments go beyond the focus on the inferences drawn from the results
to consider the consequences of an assessment, a contested approach in relation
to summative assessment (see Shepard, 1997; Popham, 1997).
Valid formative assessment
It is consequential validity which is the basis for validity claims in formative
assessment. By definition, the purpose of formative assessment is to lead to
further learning. If it fails in this then, while the intention was formative, the
process was not (Wiliam and Black, 1996; Wiliam, 2000). This is a strict, and cir-
cular, definition which implies that validity is central to developing or practis-
ing formative assessment.
Validity in formative assessment hinges on how effectively this learning
takes place. What gets in the way of this further learning can be treated as a
threat to the validity of formative assessment. This parallels processes for
investigating the validity of tests (Crooks et al., 1996) and of teacher-based
performance assessments (Kane et al., 1999). In the following sections some of
the key factors that may support or undermine formative assessment are
briefly considered. The learning context in which formative assessment takes
place is seen as critical. This includes what goes on outside the classroom, the
social and political environment, as well as expectations about what and how
teachers teach and learners learn within the classroom. At a more individual
level, feedback has a key role in formative assessment. What we know about
successful feedback is discussed, along with why some feedback practices may
undermine learning.
The learning context
If validity is based on whether learning takes place as a consequence of an
assessment, what can encourage or undermine this learning? Perrenoud (1998)
has argued that formative assessment is affected by what goes on 'upstream' of
specific teacher-learner interactions and that this context is often neglected,
partly because it is so complex. Some of the cultural assumptions on which
assessment for learning is based are largely a product of developed anglophone
cultures (particularly Australia, New Zealand, the UK and the USA), with their
'whole child' approaches, individualism and attitude to motivation. It is there-
fore worth briefly considering how some different social and cultural factors
may affect what goes on in the classroom, since these are likely to provide dif-
fering threats to effective formative assessment.
Outside the classroom
At the macro level, the role and status of education within a society will impact
on students' motivation to learn and the scope for formative assessment. In a
society where high value is placed on education, for example in Chinese edu-
cation, student motivation to learn may be a 'given' (Watkins, 2000) rather than
having to be fostered by schools, as is often the assumption in many UK and
North American schools (Hidi and Harackiewicz, 2000; ARG, 2002a). Similarly,
the emphasis on feedback being task-related rather than self-related may sit
more comfortably in cultures which see the role of the teacher as to instruct (for
example, France) rather than as to care for the 'whole child' (for example the
UK; see Raveaud, 2004). There are also cultural differences around the extent to
which education may be seen as a collective activity, with group work as a
natural expression of this, or as an individualistic activity in which peer assess-
ment may seem alien (Watkins, 2000).
The curriculum and how it is assessed is another key 'outside' contextual
factor. The opportunities for formative assessment in a centralized curriculum
with high-stakes national testing will be different from those for teachers who
enjoy more autonomy over what they have to cover and how they assess it. In
Chapter 6 the question is raised as to whether an outcomes-based/criterion-
related curriculum, with its attempts to make learning goals and standards
explicit, provides improved opportunities for formative assessment.
Inadequate training and resources are obvious threats to formative
assessment. In many large and badly resourced classrooms, ideas of individual
feedback or of regular group work are non-starters. In some countries very
large classes may be taught by teachers with limited subject knowledge who
are teaching an unfamiliar curriculum – all of which will limit potential (Meier,
2000).
The culture of schooling will also impact on the effectiveness of
formative assessment. Entrenched views on teaching and learning may
undermine or support formative assessment, as might deeply embedded
assessment practices. For example, in a culture where the dominant model of
teaching is didactic, moves towards peer and self-assessment by learners may
involve radical, and managerially unpopular, changes to the classroom ethos
(Carless, 2005; Black et al., 2003). Similarly, where continuous assessment by
teachers is high stakes, determines progression and is well understood by
parents, any move by a classroom teacher to provide feedback through
comments rather than marks or grades is likely to meet resistance (Carless,
2005). This may come from both outside and inside the school – parents may
see the teachers as not doing their jobs properly, and the students may not co-
operate because work that does not receive a mark does not count, and so is
not worth doing.
These are just a few examples of the ways in which the social and educational
context will shape and control what is possible inside the classroom and in indi-
vidual teacher-student interactions. These social and cultural factors will con-
dition how effective formative assessment in the classroom will be.
Inside the classroom: learning context and feedback
For formative assessment to lead to learning, the classroom context has to be
supportive and the feedback to the learner productive. If the conditions militate
against learning they become threats to validity.
Crooks (2001) has outlined what he considers are the key issues that influ-
ence the validity of formative assessment in the classroom. He groups these into
four main factors: affective, task, structural and process. To reflect the themes of
this book they can be organized in terms of trust and motivation; explicit learn-
ing; and the formative/summative relationship.
Trust and motivation
Chapter 4 demonstrates that learning involves trust and motivation. For
Crooks, trust implies supportive classroom relationships and attitudes where
the student feels safe to admit difficulties and the teacher is constructive and
encouraging. Motivation involves both teacher commitment to the student's
learning and the student's own wish to learn and improve. There may also be a
strong contextual element in this, so that it will also fluctuate across different
situations – I may be much more willing to learn in drama than I am in maths.
Crooks's approach to affective factors may reflect the cultural assumptions of
English-speaking industrialized societies. In other cultures, motivation and trust
may be differently expressed. For example, commentaries on Russian schooling
suggest a very different attitude to praise and to building self-esteem. Alexander
observes that while there was only a handful of praise descriptors in Russian, 'the
vocabulary of disapproval is rich and varied' (2000: 375). Yet in this more critical
climate, there is strong evidence of Russian students pursuing mastery goals and
being willing to risk mistakes. Hufton and Elliott report from their comparative
study that 'it was not unusual for students who did not understand something,
to request to work in front of the class on the blackboard, so that teacher and
peers could follow and correct their working' (2001: 10). Raveaud's (2004) account
of French and English primary school classes offers some similar challenges to
anglophone understandings of fostering trust and motivation.
It may be that we need a more robust view of trust to make sense of such find-
ings. The Russian example of trust may be a lot less likely to occur in cultures in
which teachers seek to minimize the risk of error and to protect learners' self-
esteem. The trust seems to be based on the assumption that the teacher is there to
help them learn but is not necessarily going to rescue them immediately from mis-
takes or misunderstandings. It is this kind of trust that makes the idiosyncratic and
unplanned formative interactions in the classroom powerful; there is confidence in
the teacher who has, in turn, confidence in the student's capacity to learn.
Explicit learning
An element of this trust is that the teacher knows what is to be learned. Explicit
learning incorporates the teacher's knowledge and understanding of the task,
the criteria and standards that are to be met and how effectively these are com-
municated to the learners. Clarke (2001, 2005) has continued to draw attention
to the importance of being explicit about 'learning intentions'. The importance
of subject knowledge may have been underplayed in some of the earlier writ-
ings on assessment for learning, though there is now an increased recognition
of the importance of pedagogical content knowledge (see Chapter 2).
The reason for being more explicit about learning intentions is, in part, to
engage the student in understanding what is required. One of the threats to
validity is that students do not understand what they are supposed to be learn-
ing and what reaching the intended standard will involve. A recent survey of
13-year-olds in England (Stoll et al., 2003), which involved 2,000 students, asked
'what helps you learn in school?'. The largest group of responses involved the
teacher making clear what was being learned:
My science teacher and English teacher writes out aims for the lesson, which
helps understand what we are going to do in lesson. I don't like teachers
who just give you a piece of work and expect us to know what to do, they have to
explain the piece of work. (2003: 62-3)
Other work with students has also brought home how bewildered some are
about what is being learned:

It's not that I haven't learnt much. It's just that I don't really understand what
I'm doing. (15-year-old student, Harris et al., 1995)
Understanding 'where they need to get to in their learning' is a key element in the
definition of assessment for learning (ARG, 2002a). For Sadler (1989) it is this
understanding of where they need to get to ('the standard') that is critical to suc-
cessful feedback. When we are not sure what is needed, it is hard to make sense
of feedback. At a more theoretical level this can be linked to construct validity –
what is to be learned and how does this relate to the domain being studied?
Dilemmas in 'making explicit'
How explicit should learning intentions be? How do we strike a balance which
encourages deep learning processes and mastery learning? If the intentions are
too general the learner may not be able to appreciate what is required. If they
are too specific this may lend itself to surface learning of 'knowledge in bits'.
The level descriptions used in the assessment of the national curriculum in
England, and other comparable 'outcomes-based' approaches around the
world, run the risk of being either too general or too 'dense' to be self-evident.
They may need considerable further mediation by teachers in order for learners
to grasp them. For example, this is the level description for the performance in
writing expected of 11-year-olds in England:
Level 4 Writing
Pupils' writing in a range of forms is lively and thoughtful. Ideas are often sus-
tained and developed in interesting ways and organized appropriately for the
purpose of the reader. Vocabulary choices are often adventurous and words are used
for effect. Pupils are beginning to use grammatically complex sentences, extending
meaning. Spelling, including that of polysyllabic words that conform to regular
patterns, is generally accurate. Full stops, capital letters and question marks are
used correctly, and pupils are beginning to use punctuation within the sentence.
Handwriting style is fluent, joined and legible. (QCA, 2005)
While this may provide a good basis for explicit learning intentions, the practice
of many schools to 'level' (that is, to award a level to) each piece of work is likely
to be unproductive in terms of understanding the standard. This is especially so
as the descriptions are used in a 'best fit' rather than in a criterion-referenced way,
allowing the student to gain a level 4 without fully meeting all the requirements.
In a criterion-referenced system, in which the student must meet every state-
ment at a level to gain that level, the threat is that the standard may become too
detailed and mechanistic. This may encourage a surface learning approach in
which discrete techniques are worked on in a way that may inhibit 'principled'
understanding. For example, some of the occupational qualifications in
England have been made so specific that 'learning' consists of meeting hun-
dreds of competence statements, leading to a 'tick-box' approach in which stu-
dents are 'hunters and gatherers of information without deep engagement in
either content or process' (Ecclestone, 2002: 36). This approach is paralleled in
highly detailed 'assessment objectives' in national tests and examinations, which
may encourage micro-teaching on how to gain an extra mark, rather than a
broader understanding.
The formative/summative relationship
Crooks (2001) identifies 'connections' and 'purposes' as structural factors which
influence the validity of formative assessment. His concern is the relationship
of formative assessment to the end product and to its summative assessment
(see Chapter 6). His assumption is that formative assessment is part of work-in-
progress in the classroom. How does the final version benefit from formative
assessment? While the salience of feedback on a draft version of a piece of in-
class coursework may be obvious, it is less clear if the summative element is an
external examination. The threat to validity in the preparing-for-tests classroom
is that the emphasis may shift from learning to test-taking techniques, encour-
aging 'construct-irrelevant' teaching and learning.
When the work is strongly criterion-related, so that the standard of perform-
ance required to reach a certain level is specified, then this has implications for
the teacher's role. The formative element in this process involves feedback on
how the work relates to the criteria and how it can be improved to reach a par-
ticular standard. The summative judgement is whether, at a given point, the
work meets the standard. The dilemma here for many teachers is how to play
the roles of both facilitator and examiner. There is evidence from portfolio-
based vocational qualifications that some teachers have found this problemati-
cal (Ecclestone, 2002).
In summary, within the classroom, factors such as trust and motivation,
clarity about what is being learned and the relationship of formative assessment
to the summative goals will all affect the validity of formative assessments. This
leads to the recognition that the possibilities for formative assessment are better
in some learning contexts than others. The task is then to improve the learning
context so as to increase the validity of formative assessment.
Validity and feedback
Feedback is one of the central components of assessment for learning. If feed-
back is defined in terms of 'closing the gap' between actual and desired per-
formance, then the key consequential validity issue is whether this has occurred.
What the research evidence makes clear, however, is just how complex the role
of feedback in learning is. While we can give feedback that is intended
to help the learner to close the gap, this may not happen. It is not
just that feedback may not improve learning; it may even interfere with it.
Kluger and DeNisi conclude from their meta-analysis of the psychological
research that:
In over one third of the cases Feedback Interventions reduced performance ... I
believe that researchers and practitioners confuse their feelings that feedback
is desirable with the question of whether Feedback Intervention benefits perform-
ance. (1996: 275, 277)
For feedback in the classroom, the following play an important role in the estab-
lishment of valid feedback:

It is clearly linked to the learning intention;
The learner understands the success criteria/standard;
It gives cues at appropriate levels on how to bridge the gap:
a) self-regulatory/metacognitive
b) process/deep learning
c) task/surface learning;
It focuses on the task rather than the learner (self/ego);
It challenges, requires action, and is achievable.
The first two points refer back to the task factors and reinforce the relationship
between clarity about what is being learned and those assessment criteria
which relate directly to it.
'Cues at appropriate levels' is derived from psychological constructs used by
Kluger and DeNisi and needs some decoding. The thrust of their argument is
that if feedback is pitched at a particular level then the response to it is likely to
be at that level. For example, if feedback is in terms of encouraging persever-
ance with a task ('self-regulation') the response will be in terms of more effort.
While this in itself will not lead to new learning, it may provide the context for
seeking further feedback at the process or task level. Feedback is most power-
ful when it is provided at the process level and seeks to make connections and
grasp underlying principles. Feedback at the task level is productive when it
deals with incorrect or partial information, though less so when the
task/concept is not understood.
This line of reasoning means that feedback at the self/ego level will focus atten-
tion at this level. The gap to be closed is then less about students' learning than
their self-perception. Kluger and DeNisi discuss this in terms of reducing 'self-
related discrepancy', which may involve switching to other tasks 'that would
signal attainment of self-view' (1996: 266), a process which depletes the
cognitive resources available for the task. If I am given feedback that my work has
disappointed my teacher, who knows I could do better, I will seek ways of rec-
onciling these judgements to my own self-understanding. I may attribute the
quality of my work to lack of effort, protecting my view of myself as having the
ability to do it (a favoured male technique?). However, if the teacher's judgement
was on a task I had done my best on, I may begin to doubt my ability – a process
which, if continuously repeated, may lead to a state of 'learned helplessness'
(Dweck, 1999). In this I declare 'I am no good at this' and may avoid any further
exposure, for example by dropping that subject and finding an easier one.
Research into classroom assessment (Gipps et al., 2000) has shown that even
with expert teachers relatively little of this process or task-focused 'descriptive'
feedback takes place. Rather, most feedback is 'evaluative' and takes the form
of the teacher signalling approval or disapproval, with judgements about the
effort made. While evaluative feedback may have a role in terms of motivation
and effort, it is unlikely to lead directly to learning and so is not valid formative
assessment.
Marks and grades as threats to valid formative assessment. Treating marks and
grades as threats to valid formative assessment is one of the most provocative
issues in assessment for learning. It is not a new claim. Thorndike, one of the
founding fathers of behaviourism, claimed that grades can impede learning
because as a feedback mechanism 'its vice was its relativity [comparison to others]
and indefiniteness [low level of specificity]' (1913: 286). Building on the work of
Butler (1988), which showed significant student learning gains from comment-
only marking when compared with marking which used grades and comments,
there has been encouragement to move to 'comment-only' marking (Black et al.,
2003; Clarke, 2001). The rationale for this is that grades, marks and levels do not
provide information about how to move forward; any information is too deeply
encoded. For many students they will have a negative effect because:
Learning is likely to stop on the task when a summative grade is awarded for
it (Kohn, 1993);
The level of response may shift to a self/ego level in which the learners' ener-
gies go into reconciling the mark with their view of themselves as learners;
They may encourage a performance orientation in which the focus is success
in relation to others rather than learning. This in turn may have negative
motivational and learning consequences for those who get low grades (ARG,
2002b; Reay and Wiliam, 1999).
For many this is an area in which social and political expectations make any
such move problematic, as evidenced by the press furore when Inside the Black
Box (Black and Wiliam, 1998b) was launched, with national newspaper head-
lines such as:
DON'T MARK HOMEWORK – It upsets children, says top education expert
(Daily Mirror, 06/02/98);

and

TWO OUT OF TEN – For educationalists who want the world to be a different
place (The Times, editorial, 06/02/98)
Smith and Gorard (2005) also provide a cautionary example of where 'comment
only' was introduced with some negative learning consequences. This was the
result of teachers simply providing the evaluative comments they usually made
alongside their marks ('try and improve'), rather than providing feedback on
'where ... to go and how best to get there'. For the students it meant they were
confused about the standard they had achieved, as marks at least provide an
indication of the relative merit of the work.
Praise as a threat to valid formative assessment. This is another highly sensitive
area. The logic behind this claim is that praise is unable to directly improve
learning. What it may do is motivate or encourage future learning, but this does
not constitute formative assessment. The threat is that it may even get in the
way of learning. Praise is essentially self, rather than task, focused (or will be
treated that way by the recipient). Kluger and DeNisi (1996) suggest that while
praise may help when a task is relatively simple, it impairs performance on cog-
nitively demanding tasks, partly because it shifts attention away from the task.
What Gipps et al. (2001) have shown is that praise was one of the predomi-
nant forms of feedback in the classrooms they observed, with task-focused feed-
back infrequent. While we might understand this as busy teachers keeping
students motivated, with detailed feedback a luxury under normal classroom
conditions, the teachers would probably consider that they were giving forma-
tive feedback. This is a misunderstanding that has regularly been noted (ARG,
1999). An unpublished study by Bates and Moller (2000) has supported
this. Their research involved examining, as part of a local authority review of
schools' assessment policies, the marking comments over a seven-month period
across 12 subjects in one 11-year-old's work books. Over 40 per cent of the 114
written comments were praise unaccompanied by feedback. A further 25 per
cent were presentational comments: 'don't squash up your work'; 'please take
care with spelling'; 'very good – always write in pen'. The highly generalized
nature of the feedback was borne out by it being impossible to determine to
which subjects the majority of the feedback related. In only 23 per cent of the
cases was there specific process level feedback, for example: 'what parts of her
character do you like?'; 'why did you do 2 different tests?'; 'Why is this? What
do you think you really learned?'.
Dweck (1999) and Kohn (1993) have cited the negative impact of praise and
rewards on conceptions of learning. Dweck's experiments have shown how
those receiving constant praise and rewards are likely to attribute their success
to their ability. This is perceived as a fixed entity, as opposed to 'incrementalists'
who take a more situational and effort-based view of successful learning. The
consequence of this can be that 'straight A' students will do all they can to pre-
serve their reputation, including taking easier courses and avoiding any risk of
failure. The emphasis is then on performance – gaining good grades ('proving
competence') – rather than on mastery with its attendant risks of set-backs and
even failure ('improving competence'; Watkins et al., 2001). Dweck goes on to
show the negative impact on 'top students' (particularly females) when they
progress to colleges where success may be more elusive, generating self-
doubt about whether they really had the 'ability' they had been conditioned
into thinking they possessed.
This approach raises questions about the use of merits, gold stars and smiley
faces in the classroom. These are not about learning so much as motivation, and
this form of motivation can undermine deep learning and encourage a per-
formance motivation (see Chapter 2). Clarke has taken a practical look at some
of the 'sticky issues' of external rewards (2001: 120) in relation to formative
assessment in the primary classroom.
Valid feedback challenges, requires action and is achievable. Clarke (2001) has also
observed that one problem area of classroom feedback is that while students are
given feedback on a piece of work they are often not required to do anything
active with it; in effect it is ignored. This is particularly unproductive when the
comment is made repeatedly (for example, 'you must improve your presenta-
tion'). Research from the LEARN project (Weeden et al., 2002) found that
written feedback was sometimes undermined by teachers being unclear, both
because the handwriting was hard to read and because the language used was
difficult to understand.
While too little challenge in feedback does not directly encourage learning,
too much can make the gap seem impossible to bridge. Most of us will have
experienced 'killer feedback' which makes such huge, or numerous, demands
that we decide it is not worth the effort.
A further, and salutary, factor in feedback not leading to learning is that the
learner has a choice as to what to do with the feedback. If 'learners must ulti-
mately be responsible for their learning since no-one else can do it for them'
(ARG, 1999: 7) then the manner in which they use feedback is part of this. The
risk of only making limited use increases when feedback is given in the form of
a 'gift' – handed over by the giver to the recipient – rather than as part of a dia-
logue (Askew and Lodge, 2000). Kluger and DeNisi (1996) also show how the
learner has options when faced with feedback, and can choose to:

Increase their effort rather than lower the standard;
Modify the standard;
Abandon the standard ('retire hurt');
Reject the feedback/messenger.
The first response is more likely when 'the goal is clear, when high commitment is
secured for it and when belief in eventual success is high' (1996: 260). We see the
other three options being exercised (and exercise them ourselves) when students
settle for 'all I want to do is pass', having started with more ambitious goals; when
they declare they are 'rubbish at ...' and make no further effort; and when, to
punish a teacher they do not like, they deliberately make no effort in that subject.
Other sources of feedback
Feedback in this chapter has been treated largely in terms of the teacher-learner
relationship. It can, however, come from a variety of sources. What is increas-
ingly being recognized is that peer and self-assessment have a significant role
to play in valid formative assessment. The logic of this is that, for these forms
of assessment to be effective, students have to be actively aware of the learning
intention and the standard that has to be met. Sadler argued that the ultimate
aim of formative assessment is:

to download that evaluative [assessment] knowledge so that students eventually
become independent of the teacher and intelligently engage in and monitor their
own development. If anything, the guild knowledge of teachers should consist less
in knowing how to evaluate student work and more in knowing ways to download
evaluative knowledge to students. (1989: 141)
While the aim of feedback is to reduce trial and error in the learning process, it
is not intended to completely exclude them. Kluger and DeNisi make the point
that 'even when FI [feedback intervention] is accompanied by useful cues, they
may serve as crutches, preventing learning from errors (natural feedback)
which may be a superior learning mode' (1996: 265). One has only to watch
skateboarders practising techniques (no manuals, no adults to offer feedback)
to see the point of this claim.
Feedback has been a key element in this discussion of validity because of its
critical role in leading to further learning, the concept at the heart of conse-
quential validity in formative assessment. What has to be recognized is the
complexity of feedback processes, and how activities that pass for feedback
may not be valid. The challenge is whether the consequence of the feedback is
further learning, rather than improved motivation or changes to self-esteem.
These may have a place, but are not themselves formative assessment. A further
thought is that some forms of feedback may sometimes undermine the deep
learning we claim to encourage.

What has not been considered so far is the role of reliability in these forma-
tive assessment processes. Does it pose a threat to validity in the same way as it
does in summative assessment?
Reliability and formative assessment
Unlike summative assessments, conventional concepts of reliability such as
marker consistency do not play a part in this particular validity argument. For
formative purposes judgements are essentially student-referenced rather than
needing to be consistently applied, since a variety of students with similar out-
comes may need different feedback to 'close the gap' in their learning. This is a
strength rather than a problem.
How is reliability to be interpreted in relation to formative assessment? Re-
conceptualizing reliability in terms of the trustworthiness of the teacher's
assessment has potential in relation to formative assessment. Wiliam (1992: 13)
has proposed the useful concepts of disclosure ('the extent to which an assess-
ment produces evidence of attainment from an individual in the area being
assessed') and fidelity ('the extent to which evidence of attainment that has
been disclosed is recorded faithfully') as alternative ways of thinking about reli-
ability. The concept of disclosure is useful in thinking about the reliability of
formative assessment. Has the formative assessment gathered the quality of
evidence needed to understand where the learner is? Would a different task
have led to a different understanding? While, ideally, feedback is repeated and
informal and errors in interpretation will be self-correcting, the intention is to
provide relevant feedback. 'Unreliable', in the sense of the limited dependabil-
ity of the quality of interpretation and feedback, may have some salience. This
is particularly relevant when feedback is being given in relation to any form of
criterion-related standards, since any misinterpretation of these by the teacher
could lead to feedback that misdirects learning.
Conclusion
Validity is central to any assessment. It is directly related to the purpose, fonn and
context of an assessment; as these vary, so do the key threats to the validity of an
assessment.1be validation process involves judgements about the inferences and
consequences of an assessment and what may undermine confidence in these.
Reliability issues are part of this process in relation to summative assessment, but
less so to formative assessment since unreliable results will undermine the
dependability of the inferences that are made. So too will a failure to sample effec-
tively the construct being assessed. even if the assessment is reliable.
In formative assessment validity is about consequences. Did further learning
take place as a result of formative assessmenl? 1lle threats 10 validity are those
things that get in the way of this learning. 1hese may be related to the classroom
context, itself affected by the larger sodo-cultural context. and the conditions for
learning. nus is exemplified in how feedback. a key concept in formative assess-
ment, is used in classroom interactions. Many current feedback practices may nol
lead to further learning, and therefore may not be valid formative assessment.
Part IV Policy
Chapter 9
Constructing Assessment for Learning in the
UK Policy Environment
Richard Daugherty and Kathryn Ecclestone
The rise of interest in assessment for learning in the UK has, as earlier chapters
show, produced a parallel increase in theoretical and technical activity in rela-
tion to teachers' assessments of their own students and in mechanisms to
promote the validity and reliability of such assessments. All of these dimen-
sions have important policy implications for national assessment systems. The
chapters on teachers' practice show that there was also, over the same period, a
growing professional interest in assessment for learning. However, despite
attempts in the late 1980s to include notions of assessment for learning within
national curriculum assessment in England and Wales, UK policy makers have
only recently taken an interest in this crucial aspect of the assessment of stu-
dents' attainments. For example in England, where the policy environment had
appeared to be unfavourable, policy makers have at the time of writing linked
assessment's role in support of learning to the 'personalization' of learning,
which is a central plank in the current Labour government's approach to 'per-
sonalized public services' (see Leadbetter, 2004).
This chapter explores the rise of assessment for learning as a feature of edu-
cation policy in the four countries of the UK and shows how assessment poli-
cies are a pivotal element in the distinct, and increasingly divergent, policy
environments in England, Scotland, Wales and Northern Ireland. Each of the
four countries of the UK is evolving its own education system and this process
has accelerated since 1999 when structural changes in the constitution of the UK
resulted in increased policy-making powers for the Scottish parliament and for
assemblies in Wales and in Northern Ireland. The chapter considers the rising
prominence of assessment for learning within the broader education policy
scene as one of the major ways in which governments aim to alter professional
and public expectations of assessment systems.
We take as our starting point Broadfoot's (1996) reminder that assessment
practices and discourses are embedded in and emanate from cultural, social
and political traditions and assumptions. These affect policies and teachers'
practices in subtle, complex and often contradictory ways. In relation to
assessment, the past thirty years have seen fundamental changes in
expectations about the social, political and educational purposes that
assessment systems must serve. Growing political interest in assessment for
learning has occurred partly in response to a shift from norm-referenced
systems engineered to select the highest achieving students, towards various
forms of criterion-based systems that aim to be both meritocratic and inclusive.
At the same time, attempts to introduce more holistic approaches to
assessment in post-14 and post-compulsory education and training, such as
records of achievement and portfolio assessment, aim to expand the range of
outcomes that can be certificated and recognized formally (see Broadfoot,
1986; Hargreaves, 1995; Jessup, 1991).
The broader background of changing ideas about what counts as legitimate,
educational and useful assessment forms the context for considering debates
and policy shifts around assessment for learning. This chapter will explore
debates inside policy processes and among academic and professional con-
stituencies about assessment for learning in the compulsory school system. In
the first part we will outline some theoretical tools that are useful for analysing
these debates and processes. In the second part we will show how in England
ideas about assessment for learning were debated and contested amongst
policy makers, academics and professional constituencies as national curricu-
lum assessment was developed and implemented. In the third part we will
explain how policies and politics in Scotland set up a quite different context for
debate and practice in relation to assessment for learning. In the fourth part we
will refer to policy developments in Wales and Northern Ireland and also
review a range of recent policy initiatives across all four countries. Finally, we
shall summarize the main shifts in conceptions and enactments of assessment
for learning in order to show how its educational potential for making learning
deeper and more motivating can be subverted.
Analysing assessment policy
It is important to define 'policy' and while Ball acknowledges that this is
fraught with conceptual confusion, he offers a useful working definition:

[Policies] are, pre-eminently, statements about practice - the way things could or
should be - which rest upon, derive from, statements about the world - about the
way things are. They are intended to bring about individual solutions to diagnosed
problems. (1990: 22)
Further clarification about what we mean here by 'policy' is offered by Dale,
who differentiates between the 'politics of education' as the broader agenda for
education, created through particular processes and structures, and 'education
policy' as processes that operate inside official government departments and
agencies and through engagement with other interested groups. These
processes are convoluted, often contentious and opaque to those outside them,
but they work to translate a political agenda into proposals to which institu-
tions and practitioners respond (Dale, 1994). He argues that a focus on educa-
tion politics makes little sense unless there is 'a more or less explicit reference
to, and appreciation of, the politics of education' (1994: 35).
Following these broad notions of policy and politics, one approach to analy-
sis in this chapter would be to locate debates about assessment for learning in a
broader structural analysis of the ways in which the economy, education and
culture interact. Or, we could analyse how various groups, individuals and
interested constituencies interact both within formal policy processes and
broader advocacy and the 'epistemic communities' that contribute ideas and
information to policy makers. We could also undertake a discursive analysis of
how a particular notion, in this case assessment for learning, is symbolized and
then enacted through political conceptualization, formation and transmission.
Combining all three approaches enables an analysis of assessment for learning
as a prominent theme in education policy and the politics of education to be
traced to previous problems and debates (see for example, Ecclestone, 2002).

We recognize here the need to remember broader structural and cultural
influences on debates about assessment for learning. We also acknowledge the
need to know more about the effects of debates about assessment for learning
at macro-, meso- and micro-levels of policy and practice and how these connect
national policy, institutional responses to policy, and the shaping of individual
identity and social actions in classrooms. However, for reasons of space and
clarity, we will confine our analysis of assessment for learning in recent assess-
ment policy to two notions offered by Ball and other colleagues, namely 'policy
as text' and 'policy as discourse' (Ball, 1990, 1994; Bowe et al., 1992).
Policy as text
Key texts, such as acts of parliament, are translated at various levels of the policy
process into other official texts, such as national curriculum policy statements
and regulations, and then into what Bowe et al. call 'secondary texts', such as
non-statutory guidelines and advice on practice. At all stages of the policy
process official positions about assessment emerge in subtle and often contra-
dictory ways through various texts and discussions. Texts, therefore, represent
policy and encode it in complex ways through the struggles, compromises and
public interpretation of political intentions. Texts are then decoded through
implementations and new interpretations, by individuals and constituencies,
moving in and out of policy processes. As Ball points out, attempts to present
policy may spread confusion as various mediators of policy try to relate their
understandings of policy to particular contexts. It is therefore crucial to recog-
nize that texts are not

clear or closed or complete [but] the products of compromises at various stages (at
points of initial influence, in the micropolitics of legislative formation, in the par-
liamentary process and in the politics and micropolitics of interest group articula-
tion). (1994: 16)
Interest in assessment for learning at all levels of the UK's education system has
generated a deluge of texts that follow on from the official key texts: draft and
final assessment specifications; guidance to specification writers; advice to
teachers; guidelines to awarding body officers; decisions and debates recorded
in minutes of policy meetings and public documents such as policy papers and
textbooks. In addition, interest groups and professional bodies offer their own
interpretations of assessment for learning while the speeches of policy makers,
official videos and websites add further layers of meaning. These texts can all
be seen as '... cannibalized products of multiple (but circumscribed) influences
and agendas. There is ad hocery, negotiation and serendipity within the state,
within policy formation' (Ball, 1994: 16).

In addition, as Bowe et al. argue, texts vary in the extent to which they are
'readerly' and offer minimum opportunities for interpretation by readers, or
'writerly', where they invite the reader to join in, to co-operate and feel some
ownership of the ideas. Making sense of new texts leads people into a '...
process of trying to translate and make familiar the language and attendant
embedded logics' (1992: 11).
For teachers, parents, professional bodies, policy makers and implementers
of policy, such as inspectors, a plurality of texts produces a plurality of read-
ings. Such complexity means that we need to bear in mind constantly, as we
review policy debates about assessment for learning, that '... the expression of
policy is fraught with the possibility of misunderstandings, texts are general-
ized, written in relation to idealizations of the "real world" and can never be
exhaustive' (1992: 21).
Assessment for learning may therefore be robustly and overtly defined, or it
may emerge in subtle and more implicit ways. Its various representations
reflect, again in overt and implicit ways, beliefs about desirable educational
goals and practices. In addition, further negotiation and understanding come
from a very diverse range of bodies and individuals who make use of policy
texts. These include awarding body officers, inspectors, staff development
organizers, unions and professional organizations, local education authority
advisers, teachers, students, parents and employers. All create and amend the
official texts and offer competing interpretations of policy aims. Exploration of
different texts can therefore reveal the influences and agendas viewed as legit-
imate both inside policy processes and within institutions. It also reveals how
these change over time as key actors move on or are removed from processes
and debates. Charting how policy texts have evolved enables us to understand
more about how teachers and students interpret their intentions and turn them
into 'interactive and sustainable practice' within particular social, institutional
and cultural contexts (Ball, 1994: 19).
Policy as discourse
Despite the importance of texts for understanding policy debates and
processes, focusing analysis too heavily on them can produce an over-rational
and linear account of debates about assessment for learning. The notion of
'policy as discourse' is therefore a crucial parallel notion because it enables
researchers, practitioners and implementers of policy to see how discourses in
policy construct and legitimize certain possibilities for thinking and acting
while tacitly excluding others. Through language, symbols and codes and their
presentation by different authors, discourses embody subtle fusions of particu-
lar meanings of truth and knowledge through the playing out of power strug-
gles inside and outside policy. They construct our responses to policy through
the language, concepts and vocabulary that they make available to us, and legit-
imize some voices and constituencies as legitimate definers of problems and
solutions whilst silencing others (Ball, 1994). Focusing on discourse encourages
analysts of texts to pay close attention to the language and to its adequacy as a
way of thinking about and organizing how students learn. It also reminds us
how texts reflect shifts in the locus of power between different groups and indi-
viduals in struggles to maintain or change views of schooling.
However, analysis of the ways in which particular discourses legitimize
voices, problems and solutions must also take account of the 'silences' in the
text, namely the voices and notions that it leaves out. Silences operate within a
text to affect how we view educational problems and policies, but they also
come from other discourses and the policy processes that produce them. For
example, a discourse about assessment for learning needs to be interpreted in
the light of discourses in other texts about accountability or the need for nation-
ally reliable assessment.
In this chapter we will focus on selected texts in order to identify the interac-
tions and goals of different bodies in the production of texts and the discourses
of assessment for learning that permeate them, either overtly or more subtly.
Assessment for learning in England
The introduction of national curriculum assessment
The transformation of education policy brought about by the Education Reform
Act of 1988 included within it, for the first time in the modern era, provision for
a statutory national curriculum and associated 'assessment arrangements' cov-
ering the years of compulsory schooling (ages 5 to 16). With relatively little
prior thought seemingly having been given to what form such arrangements
might take, the Minister for Education remitted an expert group chaired by an
academic, Professor Paul Black, to draw up proposals.
The Task Group on Assessment and Testing (TGAT), working to an open-
ended all-purpose remit from government, chose to place assessment of stu-
dents by their teachers at the centre of its framework for assessment. The
group's recommendations (DES/WO, 1988a) drew on experience in the 1970s
and 1980s, in particular in relation to graded tests, of teachers' assessments con-
tributing both to students' learning and to periodic summative judgments
about their attainments. In the earliest paragraphs of the report - 'our starting
point' - the formative purpose of assessment is identified as a central feature of
the work that teachers undertake:
Promoting children's learning is a principal aim of schooling. Assessment lies at
the heart of this process. (para. 3)

... the results [of national assessments] should provide a basis for decisions about
pupils' further learning needs: they should be formative. (para. 5)
The initial formal response of ministers to the TGAT recommendations was to
signal acceptance of what would clearly be an innovative system for assessing
students' attainments and their progress. The minister responsible, Kenneth
Baker, echoed TGAT's focus on the individual student in his statement to par-
liament accepting the group's main and supplementary reports and most of its
recommendations:

The results of tests and other assessments should be used both formatively to help
better teaching and to inform next steps for a pupil, and summatively at ages 7, 11,
14 and 16 to inform parents about their child's progress. (quoted by Black, 1997: 37)
Yet it is clear, from the memoirs of the politicians in key roles at the time (Baker,
1993; Thatcher, 1993) as well as from the work of academics (Ball, 1990;
Callaghan, 1995; Taylor, 1995), that formal government acceptance of most of
the TGAT recommendations did not mean acceptance either of a discourse of
formative assessment or of its translation into ideas for practice. From a very
early stage of policy development, though only the proposals for the consis-
tency of teachers' assessments to be enhanced by group moderation (DES/WO,
1988b) had actually been formally rejected by the minister, the discourse
amongst policy makers concerned test development at each of the first three
'key stages'. The ideas that had shaped the TGAT blueprint for national cur-
riculum assessment quickly came to be 'silences' in the policy discourse and in
the texts about assessment that emanated from government and its agencies.
As Black's own account of Whatever happened to TGAT makes clear, several
factors including the growing influence of the 'New Right' had the effect of
undermining TGAT and transforming national curriculum assessment into a
very differently oriented set of assessment policies (Black, 1997). The need for
national assessments to supply indices of school performance for accountability
purposes, an aspect of the policy that had been downplayed when the govern-
ment was enlisting support for the Education Act's passage through parliament,
came to the fore once the legislation was in place. The ideological nature of the
debates about TGAT within the governing party is evident from the characteris-
tically blunt comments of the then Prime Minister in her memoirs:

The fact that it [the TGAT Report] was welcomed by the Labour Party, the
National Union of Teachers and the Times Educational Supplement was enough to
confirm for me that its approach was suspect. It proposed an elaborate and complex
system of assessment - teacher dominated and uncosted. (Thatcher, 1993: 594)
In short, TGAT was perceived as the work of an insidious left-leaning 'education
establishment' intent upon subverting the government's best intentions to raise
educational standards. This neo-conservative discourse, epitomized by the lan-
guage used in Marsland and Seaton's The Empire Strikes Back (1993), was in the
ascendancy amongst education policy makers in England in the early 1990s (see
Black, 1995, for a fuller discussion of this). Even though its influence was beginning
to wane by the time of the Dearing Review of the national curriculum and its
assessment in the middle of the decade, the voice of the Centre for Policy Studies
(Lawlor, 1993) was still a prominent feature of the policy discourse at national level.
As detailed policies for each element in the new assessment system were
developed, the national curriculum assessment arrangements emerged as a
system of time-limited and end-of-stage tests in the 'core' subjects only, the
main purpose of which was to supply data that would place each student on a
10 (later 8+) level scale (Daugherty, 1995). It also became increasingly obvious
that such data would be aggregated and published as indicators of the per-
formance of teachers, schools, local education authorities and the system as a
whole. Although TGAT had envisaged arrangements that would focus on the
formative use of data on individual students, the evaluative use of aggregate
data coloured the multifarious texts spawned by national curriculum assess-
ment. In parallel, policy discourses associated with those texts reinforced this
performance indicator and target-driven view of assessment. Without ever
being superseded by a revised policy, the TGAT recommendations were dis-
torted and then abandoned, thereby illustrating how policy texts are reworked
at the whole system level as policies move from an initial blueprint through
development and implementation. In this respect the discourse of assessment
for learning carried the ominous silences, of accountability and concerns about
the reliability of teacher assessment, from other parallel discourses.
Over the same period, agencies and individuals responsible for implementa-
tion were interpreting and mediating those policies. There is substantial research
evidence about the ways in which national curriculum assessment came to be
understood and operationalized, both by officials in departments and agencies of
government and by the headteachers and teachers in schools on whose practices
the system depended. This was happening in spite of government antipathy to
teachers' practices as biased and too student-centred and the low profile of
'teacher assessment' in the policy texts of the time. Evidence of the impact of those
policies can be found in findings both from large-scale longitudinal studies in the
primary curriculum (Osborn et al., 2000; Pollard et al., 2000 - see also Chapter 4)
and from many other empirical studies reporting on assessment practices in
schools (for example, Tunstall and Gipps, 1996; Torrance and Pryor, 1998; Reay
and Wiliam, 1999). Taken together, these studies show the effects of a target-led
approach to assessment and the subtle changes to teachers' and students' roles
and perceptions of the purposes and outcomes of assessment.
The potential for local education authorities to have a significant role in
implementing, moderating and monitoring national curriculum assessment
was never fully developed because policy makers at the national level, more
often implicitly than explicitly, acted as if what was decreed in London would
be accepted and acted upon in every classroom in every school in every LEA in
the country. Local education authorities were also perceived by some policy
activists on the political right as being prime movers in a malign influence of
the 'education establishment' on the education system. However, as agencies
that provided training for teachers and therefore mediated the texts published
at the centre, their influence on schools and teachers would be considerable. As
Conner and James have shown, some local authorities went beyond the
'attempt to accommodate state policy within a broad framework of local values
and practice' (1996: 164), developing local initiatives such as procedures for
moderating teachers' assessments of their students. Local moderation as a nec-
essary component in any national system that makes use of teachers' judgments
had, once TGAT's proposals in this respect had been rejected, been neglected by
policy makers at the national level.
Education policies in general, including assessment policies, were being
shaped by what Broadfoot (2000) and Ball (2000), using Lyotard's term, have
characterized as a culture of 'performativity'. Performativity came to dominate
the thinking of policy makers in government during that period to such an
extent that

the clear policy emphasis [of the 1990s was] on assessment as a measurement
device, the results of which are used to goad students, teachers and institutions as
a whole to try harder. It is not surprising that, faced with these pressures, schools
have typically succumbed to them. (Broadfoot, 2000: 143)
As both Broadfoot (2000) and Ball (2000) have argued, the setting and regula-
tion of political targets influence teachers in subtle and profound ways. Sum-
mative assessment by teachers of their own students, a residual feature from
the original TGAT framework, was still given a notional status in the overall
framework, for example in the Dearing Review of curriculum and assessment
in 1993/4. But the policy texts and associated discourses of that period showed
that recognition of the teacher's role in using assessment to guide and support
learning disappeared from sight. Black and Wiliam conclude that

... by 1995 nothing was left of the advances made in the previous decades. Gov-
ernment was lukewarm or uninterested in formative assessment; the systems to
integrate it with the summative had gone, and the further development of tools was
only weakly supported. (2003: 626)
The strengthening of academic and professional discourses
In contrast, academic and professional assessment discourses retained forma-
tive assessment as a crucial aspect of assessment practice in educational insti-
tutions. In addition, such discourses presented formative assessment as a
necessary component of any assessment policy that sought to meet several pur-
poses, which might be served by student data. For example, Torrance (1993)
was writing about the 'theoretical problems' and 'empirical questions' associ-
ated with formative assessment. Contributions by academics to wider theoreti-
cal debates about assessment such as by Gipps (1994), and texts written by
academics for practitioners such as by Stobart and Gipps (1990 and subsequent
editions), also recognized and promoted the importance of formative assess-
ment. Crooks's (1988) review of the impact of assessment practices on students
and Sadler's (1989) seminal paper on formative assessment helped fuel contin-
uing debates in academic circles that were in contrast to the preoccupation
amongst policy makers in England with the summative and evaluative uses of
assessment data.
In this context, and with the explicit aim of influencing assessment policy
discourses and texts, a small group of academics was established in 1989 as one
of several British Educational Research Association policy task groups and con-
tinued as an unaffiliated Assessment Reform Group after 1997. Among its early
publications was a critical commentary on the development of national cur-
riculum assessment which reiterated the role of assessment in 'the improve-
ment of education' (Harlen et al., 1992). The group then obtained funding for a
survey of the research literature on formative assessment, undertaken by Paul
Black and Dylan Wiliam. The outcomes of that review, published both in the
form of a full report in an academic journal (Black and Wiliam, 1998a) and in a
pamphlet for wider circulation to practitioners and policy makers (Black and
Wiliam, 1998b), would in time be acknowledged as a major contribution to
reorienting the discourse associated with assessment policies in the UK (see
also Chapter 1).
This initial optimistic foray by a group of academics hoping to influence
policy was supplemented by later publications from the team working with
Black and Wiliam at King's College, London and from the Assessment Reform
Group. As part of a strategy for communicating more effectively with policy
makers, making use of pamphlets and policy seminars, the Assessment Reform
Group chose the more accessible term 'assessment for learning' rather than
using the technical terminology of 'formative assessment'. Assessment for
learning also became increasingly prominent in the professional discourse
about assessment, supported by organizations such as the AAIA, with its mem-
bership mainly drawn from assessment inspectors and advisers in local gov-
ernment, and by other advocates of formative assessment operating from a base
in higher education such as Clarke (2001).
New government, continuing discourse
Government policy on curriculum and assessment in England during the late
1990s remained strongly wedded to the notion that the 'raising of standards' of
attainment in schools should be equated with improvement in the aggregate
scores of successive cohorts of students as they passed through the 'key stages'
of the national curriculum. This applies at least as much to the 'Blairite' educa-
tion policies of the Labour administrations since 1997 as to the policies of the
Conservative governments of the 1980s and the early to mid 1990s. In some
respects, the education policies of the incoming government in 1997 gave a
fresh impetus to the culture that had dominated policy texts and discourses
earlier in the decade, endorsing rather than seeking to change the ideological
stance that had underpinned education policies:
Assessment and Learning
... many of New Labour's changes to the Conservative agenda were largely cos-
metic. In some of its manifestations New Labour's so-called Third Way looked
remarkably similar to quasi-markets. (Whitty, 2002: 127)
Reinforcing a general perception that the most important role for data from
national curriculum assessments was to fuel performance indicators, the new
government's first major policy paper Excellence in Schools (DfEE, 1997) sig-
nalled that schools and local authorities would be expected to set and meet
'benchmarking' targets. The then Secretary of State for Education, David Blun-
kett, raised the public profile of benchmarking by stating that he would resign
if the government's national targets, based on national curriculum test data,
were not met. At the school and local authority level these policies were policed
by an inspection agency, OFSTED, whose head revelled in his public image as
the scourge of 'low standards' in classrooms and in schools. This discourse was
dominated by the role of assessment in relation to accountability and, alongside
this, centrally-driven national strategies emerged, first in literacy (from 1998)
and then in numeracy (from 1999). Both were underpinned by the publication
of pedagogical prescriptions in which the formative functions of classroom
assessment had no official role.
The performativity culture was thus retaining its hold on policy makers and
also on practitioners whose performance, individually and collectively, was
being judged in those terms. Looking back on the first five years of Labour edu-
cation policies in England, Reynolds sums up in these terms:
[The Labour government] kept in place virtually entirely the 'market-based' educa-
tional policies introduced by the government from 1988 to 1997,
involving the systematic tightening of central control on the nature of the cur-
riculum and on assessment outcomes, combined with devolution to schools of the
determination of the 'means', at school and classroom level, to determine outcomes.
(2002: 97)
In such circumstances, there was no place in the official policy discourse in
England for assessment's role as an informal and personal source of support
for the learner or as a key element in learners' genuine engagement with
learning. Instead, the silences in both texts and discourses in relation to
formative assessment as integral to meaningful learning led to an implicit
presentation of it as an instrumental adjunct to the goal of raising formal levels
of achievement. In policy discourse and text, 'achievement' and 'learning'
became synonymous. This image of assessment prevailed amongst English
policy makers into the middle years of the next decade, with no
acknowledgment that an assessment culture which geared every aspect of the
classroom experience to test performance - 'a SATurated model of pupildom'
(Hall et al., 2004) - was not conducive to fostering effective assessment for
learning. And the research into primary and secondary school students'
attitudes to assessment and learning, cited above, showed just how strong an
influence a summative image of assessment was.
Constructing Assessment for Learning in the UK Policy Environment
Scotland - distinct politics, distinctive policies
From guidelines to national survey
Scotland did not experience the kind of major transformation of its schools that
the Education Reform Act brought about in England and Wales, relying instead
on a series of directive but not statutory national 'guidelines'. In relation to assess-
ment policy, it retained a national system of periodic sampling of student attain-
ments introduced in 1983 - the Assessment of Achievement Programme (AAP) -
and did not follow England in introducing a statutory curriculum backed up by
a system of external tests. Assessment for learning as a major official policy pri-
ority emerged in the early years of the twenty-first century as a product of a dif-
ferent political environment and distinctive policy processes in Scotland (Humes,
1997, 1999) that predated the establishment of a Scottish Parliament in 1999
(Bryce, 1999; Bryce and Humes, 1999; Paterson, 2003; Finlay, 2004).
Concurrently with the passage of the Education Reform Act through the UK
parliament, the Scottish Office published in November 1987 a policy text, Cur-
riculum and Assessment in Scotland: A Policy for the 1990s, which set out aims for
the education of the 5 to 14 age group. In terms of policy process some Scottish
Office ministers favoured statutory regulation. However, after strong opposi-
tion from parents and teachers to proposals for statutory testing, the guidelines
published in the early 1990s were to be the Scottish response to the broader
trend, within the UK and internationally, towards greater central government
control of the curriculum and assessment. Although this was a seemingly softer
approach to regulation of the curriculum and assessment in schools than was
found elsewhere in the UK, Finlay has argued that too much should not be
made of the use of the word 'guidelines':
Her Majesty's Inspectorate of Education ... uses the guidelines as the basis of inspec-
tions of primary schools and secondaries, and the expectation is that they will
find a close correspondence between the guidelines and practice. (2004: 6)
According to Hayward et al. (2004: 398) the guidelines on assessment, Assess-
ment 5-14 (SOED, 1991), ensured that there were 'clear principles, advocating
the centrality of formative assessment'. And yet, in spite of a supportive policy
discourse, it became apparent from both academic studies (Swann and Brown,
1997) and from a report by the schools inspectorate (HMI, 1999) that the impact
of Assessment 5-14 on the ground in schools was patchy. A national consulta-
tion on future assessment policies for Scottish schools, prompted by the find-
ings of the 1999 HMI survey, was undertaken by the Scottish Executive
Education Department (SEED) (Hayward et al., 2004). The consultation
revealed 'clear, almost unanimous support for the principles of Assessment
5-14' (Hayward et al., 2004). However, by the late 1990s it was clear that the
overall assessment system was fragmented and fulfilling none of its purposes
particularly effectively.
Among the issues to emerge from the consultation report were the difficulties
teachers faced in establishing 'assessment for learning' practices in their class-
rooms and the tensions within an education culture where teachers were
expected to be able to reconcile expectations that assessment practices should
serve both formative and accountability purposes. Interestingly, the seeds of
subsequent developments were already in evidence in that report with overtly
supportive reference being made to Black and Wiliam's review of research. The
academic discourse developing from that review came to have a more direct
influence on government policy over the next few years in Scotland than was
the case in England.
There were other assessment policy initiatives during the 1990s from the
Scottish Executive, which was still answerable at that time, via the Secretary of
State for Scotland, to the UK parliament. These had their roots in the priorities
of the UK government but took a different form from parallel developments in
the three other UK countries. 'Neither the National Test system nor the AAP
survey had been designed to meet the new data requirements; the 'test when
ready' system did not provide conventional test scores that could readily be
used for monitoring and accountability purposes' (Hutchinson, 2005). The per-
ceived need to make available evidence about students' attainments in key cur-
riculum areas, the driver behind publication of school performance tables in
England, led to the introduction of the National 5-14 Survey of Achievement in
1998. Up until that point, the national tests in Scotland had been offered to
schools as test units in reading, writing and maths, devised by what was to
become the Scottish Qualifications Authority, to be used by teachers to confirm
their judgments as to the attainment levels (A to F) which students had reached.
After 1998, SEED collected aggregate attainment information from every school
in those curriculum areas and there was an associated expectation that the
reported levels would have been confirmed by national tests. Taken together,
these moves represented a considerable raising of the stakes because test data
became an overt part of the accountability policy discourse.
The 'Assessment is for Learning' Project
Despite these influences, politics and policy making in Scotland had given rise
to a distinctive set of assessment policies during the 1990s, and the establish-
ment of a Scottish Parliament in 1999 gave fresh impetus to a wide range of poli-
cies on education. A flurry of activity in education - a major responsibility of
the new devolved legislature - led to the passing by parliament of the Stan-
dards in Scotland's Schools Act with its five 'National Priorities'. The Minister
for Education and Young People initiated a 'National Debate on Education' in
2001 and published Educating for Excellence in 2003, setting out the Executive's
response to views expressed in the National Debate. Assessment policy for
Scottish schools was the subject of a major parliamentary debate in 2003. Polit-
ical power was in the hands of a Labour/Liberal Democrat coalition, with influ-
ential individuals such as Jack McConnell, initially the minister responsible for
schools policy and subsequently First Minister, to the fore. National agencies,
such as Learning and Teaching Scotland (LTS) and Her Majesty's Inspectorate
of Education (HMIE), were drawn into policy development but the shaping of
education policy in the early years of the new century was driven from within
the Executive and strongly supported by key politicians.
The establishment of an Assessment Action Group was the next stage in
assessment policy development, drawing in a range of interest groups includ-
ing national agencies, representatives of teachers, parents and researchers. Its
role was to oversee a programme that developed assessment for students from
3 to 14. This programme was subsequently, and significantly, entitled 'Assess-
ment is for Learning' (AifL). The AifL programme had considerable resources
invested in it, mainly to allow teachers time away from the classroom to engage
in developing their practice. And yet, though strongly led and guided from the
centre, the developmental model adopted was for teachers at school level across
Scotland being recruited to a series of parallel projects and given opportunities
to shape and to share their practice.
One of those projects focused on 'Support for Professional Practice in For-
mative Assessment' and was explicitly based on the work of Black and Wiliam,
involving a team from King's College London led by them as consultants. The
report of its external evaluation (Hallam et al., 2004) is positive in tone: 'rela-
tively few difficulties in implementation', 'dramatic improvement in pupils'
learning skills', 'a shift from teacher-centred pedagogy'. And yet, whilst recog-
nizing that progress had been made in the pilot schools, the evaluators also
highlighted the challenges ahead if the project's gains in terms of improving
learning through formative assessment were to be sustained and disseminated
more widely. Perceived obstacles to further successful development included
the tensions between formative assessment strategies and what was required of
teachers in relation to summative assessment. Some teachers reported that time
pressures militated against being able to cover required curriculum content.
Evaluators of the programme also argued that it was crucial to continue teacher
'ownership' of the policy development process:
The project has had a promising start ... [but] successful dissemination requires
continued funding to enable new participants to have sufficient time to develop
and implement their ideas and reflect upon and evaluate their progress. (Hallam et
al., 2004: 13)
The evaluators' conclusions, together with insights from other studies of the
programme such as Hayward et al.'s (2004), offer helpful pointers to the issues
that need to be addressed in any policy context which aspires to the major ped-
agogical innovation that the widespread adoption of assessment for learning
classroom practices entails.
In November 2004, Scotland's policy journey from the generalities of the 1991
guidelines through reformulation and reinvigoration of policy priorities
reached the point where, in Assessment, Testing and Reporting 3-14: Our Response
(SEED, 2004a), ministers adopted most of the main policy recommendations of
the consultation earlier that year on the AifL programme. By early 2005, officials
in the SEED were embarking on plans for implementing this latest policy text
on assessment in Scotland in parallel with the equivalent official published text,
A Curriculum for Excellence: Ministerial Response (SEED, 2004b), setting priorities
and targets for the curriculum.
How had Scotland come to take this particular route to rethinking assess-
ment policy? In his review of Scottish education policies Finlay (2004) argues
that almost all of the major policy developments in education would have been
possible in the pre-1999 era of 'administrative devolution'; indeed many of
those developments were initiated and had progressed prior to 1999:
The contribution of political devolution in Scotland has been the realized opportu-
nity to engage the demos much more widely in democratic processes. Inviting
people to contribute at early stages to identifying long term political priorities is
quite different from giving people the freedom to choose. (2004: 8, emphasis
in original)
With Scotland being the first of the four UK countries to identify assessment for
learning as a policy priority and to move, from 2005, into whole system imple-
mentation it will be interesting to see the extent to which that distinctive polit-
ical ideology continues to colour the realization of assessment for learning in
the day-to-day practices of schools and classrooms.
Multiplying policies - proliferating discourses
Whilst assessment policies in Scotland and England diverged during the 1990s,
Wales, operating within the same legislative framework as England, used the
limited scope allowed by 'administrative devolution' to be more positive about
the value of teachers' assessments and to adopt a less aggressive accountability
regime (Daugherty, 2000). The absence of primary school performance tables, a
different approach to the inspection of schools (Thomas and Egan, 2000) and an
insignificant Wales-based daily press all meant that the media frenzy about
'failing' teachers and schools was not a feature of the Welsh public discourse.
There is evidence (see, for example, Daugherty and Elfed-Owens, 2003) from
the pre-1999 era of administrative devolution that a distinctive policy environ-
ment in Wales resulted in the recontextualizing of London-based policies.
However, after 1999 political devolution undoubtedly speeded up the pace of
change as Wales became an arena for policy formulation as well as policy
implementation. The rhetoric of The Learning Country, published by the Welsh
Assembly government in 2001, was followed up by, amongst other policy deci-
sions, the abolition of national testing for 7-year-olds from 2002. Assessment for
learning did not become a feature of the policy agenda in Wales until the
Daugherty Assessment Review Group, established by the Assembly Minister
for Education in 2003, published its recommendations (Daugherty, 2004).
Encouraged by the minister's espousal of 'evidence-informed policy-making',
the group was influenced by research evidence from the Black and Wiliam
review, from the Assessment Reform Group and from other assessment spe-
cialists. One of its main recommendations was that 'The development of assess-
ment for learning practices should be a central feature of a programme for
development in Wales of curriculum and assessment' (2004: 31).
The discourse of assessment policy in Wales, without the steady evolution
since 1991 that had led to the Scottish Executive's endorsement of assessment
for learning, had thus changed markedly after 1999. By 2004 assessment for
learning was an uncontested aspect of official policy in Wales. As part of its
more broadly based recommendations the agency with statutory responsibility
for advising the Welsh Assembly government also quoted the Assessment
Reform Group's definition of assessment for learning in its advocacy of policy
change: 'ACCAC recommends that it should be remitted to establish a pro-
gramme to develop assessment for learning' (ACCAC, 2004: 41).
The minister in Wales, Jane Davidson, announced in July 2004 that she would
be implementing these recommendations from the Review Group and from
ACCAC and her support for a new assessment policy framework was unequiv-
ocal:
There is clear evidence ... that change is needed if we are to get the best from our
pupils, the curriculum and our teachers. I propose, therefore, to move away over
the next four years from the current testing regime to a system which is more
geared to the pupil, more focused on skills and puts teacher assessment at its heart.
(Davidson, 2004: 2)
In Wales, as in Scotland, political devolution had given a fresh impetus to the
rethinking of education policies in general (Rees, 2005).
Northern Ireland, since its establishment as a state in 1922, had developed
what McKeown refers to as a tradition of
adoption (sometimes with minor adaptation) of policy from GB so as to obtain
parity of provision, the non-implementation of GB policy when deemed inappro-
priate, and the development of policy initiatives, where feasible, which are relevant
specifically to Northern Ireland. (2004: 3)
Any review of assessment policies in that part of the UK can be framed in those
terms although England, rather than Wales or Scotland, was usually the source
for policy borrowing rather than 'GB'. Thus the Northern Ireland framework is
recognizably a first cousin of that to be found in England (and in Wales), but the
system of mainly faith-based schooling and the continued existence of aca-
demic selection for secondary schooling at age 11 are the product of the
country's distinctive social context.
A culture in which 'assessment' is closely associated with the testing of 11-
year-olds would not appear to be favourable to the development, politically or
professionally, of assessment for learning. And yet, even in a policy environ-
ment where deep and longstanding political conflicts stalled the establishment
of a devolved legislative assembly, initiatives taken by the agency responsible
for advising on curriculum and assessment brought assessment for learning
into the policy discourse (CCEA, 2003; Montgomery, 2004). As White's com-
mentary on the CCEA 'Pathways' proposals notes:
The major priority is that assessment should help pupils to learn, teachers to teach,
and parents - all co-educators - to support and supplement what goes on in
schools. This is why the emphasis ... is on assessment for learning rather than
assessment of learning. (White, 2004: 14)
The emphasis in the policy discourses in Northern Ireland has now been con-
solidated with the key assessment for learning approaches established within
the Key Stage 1 and 2 curriculum as 'Ongoing Integrated Assessment' (CCEA,
2004: 10). At Key Stage 3 CCEA's Pathways consultation document proposed
that the '... research carried out by the Assessment Reform Group and others
has produced substantial evidence to show that by adopting these approaches
[assessment for learning], the progress and achievement of pupils in all ability
ranges and the professional competences of teachers can be significantly
enhanced' (2003: 103). The then minister for education, Barry Gardiner, gave the
final go-ahead for all of the curriculum proposals in June 2004.
In England there were, during the second term of the Blair government (from
2001), two notable patterns of development relating to assessment for learning.
The first was the increasing attention to it in publications by England-based
organizations representing teachers (SHA, Swaffield and Dudley, 2002;
NUT, 2004; GTC(E), 2004). A joint publication from three of the teacher associa-
tions, directed at ministers in England, made the case for assessment for learn-
ing to become a significant part of the policy agenda for 'raising
standards' whilst also airing doubts about the way in which that term had
become part of the official policy discourse:
The model of assessment for learning now being promoted, relying
mainly on teachers' analysis and management of data to diagnose and target
pupils' learning needs, is in direct contradiction to, for example, the effec-
tiveness of the highly successful approach to assessment for learning adopted by
King's College, London and by the national Assessment Reform Group. (ATL,
NUT and PAT, 2004: 4)
Within these texts not only are the well-established differences on policy
between teacher representatives and government apparent but there are also
substantial variations in the ways in which the teacher organizations define
assessment for learning and how its potential might be realized.
The second significant series of developments during Blair's second term
brought 'assessment for learning' for the first time into the official discourses
associated with education policy in England. The second term of a Blair-led
government was as wedded as the first had been to 'strategies', accompanied
by documentation and an infrastructure of agencies implementing
national directives. The language of target-setting and the discourse of perfor-
mativity remained, but attempts were also made in some policy texts to leaven
the discourse with a 'softer' message in which the individual student's needs
were acknowledged. The 2004 Primary Strategy (for students aged 5 to 11)
included guidance materials for schools and teachers in which assessment for
learning figured prominently. And yet, as a critique of the materials by an
organization representing assessment professionals points out, the materials
are 'problematic' and based on a model of assessment that is one of 'frequent
summative assessment not formative assessment' (AAIA, 2005a: para 4.3). The
Key Stage 3 Strategy (for students aged 11 to 14), whilst drawing more directly
on the work of Black and Wiliam and the Assessment Reform Group, also
appeared trapped in a mindset that sees target-setting by learners and schools
as the only route to higher achievement. That same uneasy mix of disparate dis-
courses was apparent in a ministerial speech early in 2004 which placed 'per-
sonalized learning' at the centre of the government's new policy agenda for
schools. Assessment for learning would be one of five 'key processes' in realiz-
ing this ambition to 'personalize' learning: 'Assessment for Learning that feeds
into lesson planning and teaching strategies, sets clear targets, and clearly iden-
tifies what pupils need to do to get there' (Miliband, 2004: 4).
Developing student autonomy through self- and peer assessment, which is
central to the view of assessment for learning that its academic advocates had
been promoting, is nowhere to be seen in this teacher-led and target-dominated
usage of the term.
The report in 2005 of a 'Learning Working Group', commissioned by the
Minister for Schools in England but without any official status in the policy
process, refers both to the growing awareness of assessment for learning and to
the proliferation of discourses when it comments that
Assessment for learning is spreading rapidly, in part because it, or more accurately
a version of it (some would argue a perversion), contributes to the Key Stage 3
Strategy in England, and in part because teachers find that it works - the scien-
tific evidence and the practice evidence are aligned and mutually supportive. (Har-
greaves, 2005: 9)
Assessment for learning had, by 2005, been incorporated into the official policy
discourses in the other three UK countries and the term was increasingly fea-
tured in policy-related texts and associated discourses in England. But doubts
remained about the commitment to it of policy makers and ministers:
... 'assessment for learning' is becoming a catch-all phrase, used to refer to a range
of practices. In some versions it has been turned into a series of ritualized proce-
dures. In others it is taken to be more concerned with monitoring and record-
keeping than with using information to help learning. (James, 2004: 2)
Conclusion
Within less than a decade assessment for learning became established as an
element in the official policy discourses in each of the four countries of the UK.
It did so in ways that reflected the distinctive cultures, policy environments and
policy processes of each country. Whilst there are some common roots dis-
cernible in discourses across the UK there are also aspects of the process of
policy development that are fundamentally different and can be expected to
give rise to differences in the impact of policy on practice.
The academic discourse within the UK as a whole, though mainly based on
the English experience, evolved during the 1990s. By the end of the century
there was an enriched literature, drawing on a growing body of empirical evi-
dence. The work of the PACE project is significant in this respect (Osborn et al.,
2000; Pollard et al., 2000) as is that of Torrance and Pryor (1998) with its theo-
rizing in terms of 'convergent' and 'divergent' modes of assessment. But it was
the review by Black and Wiliam (1998a), supported by the ways in which its
authors and the Assessment Reform Group targeted policymakers in advocat-
ing formative assessment, which was increasingly recognized and reinterpreted
in the professional and policy discourses. The new post-devolution administra-
tions in Scotland and Wales made overt commitments to 'evidence-informed
policy'. This contributed to the evidence and argument from Black and Wiliam's
review of research, advocated by the review's authors in Scotland and by
Daugherty in Wales, becoming an explicit influence on the assessment
policy decisions announced during 2004.
In contrast, the dominant policy discourses and the main official policy texts
in England seemed at first to be largely unaffected by the evidence from
research or the advocacy of academics. Instead, it was developments in certain
localities, such as the KMOFAP and Learning How to Learn Project (see Chap-
ters 1 and 2), plus the work with groups of teachers by others such as Shirley
Clarke, that fuelled a groundswell of interest in schools. Only when that growth
in interest amongst teachers found echoes in some of the continuing centrally-
driven policy initiatives of the DfES from 2003 onwards did the language of
assessment for learning enter the policy discourses at national level in England.
Yet, as is evident from the examples quoted above, this infiltration into the
official discourses brought with it sometimes worryingly disparate versions of
both the 'why' and the 'how' of assessment for learning. 'Personalized learning',
with assessment for learning as one of five key components, was highlighted in
England in the run-up to elections in 2005 as the educational dimension of the
government's new drive to 'personalize' public services. Pollard and James
welcome its potential to re-orientate policies for schools towards the needs of
learners whilst also warning of the dangers of 'slipping back into over-simpli-
fied consideration of teaching provision and systems' (2005: 5). But
an accountability culture so strongly coloured by performativity has meant
that, for many English professionals working at the school level, discourses asso-
ciated with student assessments are linked to parental choice of schools and
public measures of school performance.
It is here that the concept of 'policy as discourse' is powerful. The 'silences'
of an enthusiastic policy rhetoric about assessment for learning come from
another, seemingly separate discourse - that of performativity. Other silences
lie within the new discourse of personalized learning, which suggests an
individualized approach to assessment that is far from the constructivist and
social learning notions that underpin assessment for learning (see Chapter 3).
Such silences speak louder to many than do the official policy texts in England
which now refer routinely to 'learning', displacing the discourse of 'what must
be taught' that had been dominant in the 1990s.
The distinctive social and political cultures of these four countries are thus
increasingly apparent. For example, the assessment policy texts and discourses
at national level in Scotland and Wales acknowledge the reality of the tensions
created by trying to use evidence from assessment directly in support of learn-
ing whilst also using data, both about individuals and on cohorts of students,
for summative and evaluative purposes. The Scottish Executive's approach to
accountability was through the active involvement of representatives of all
stakeholder interests in developing policy. This is in marked contrast to the
English approach which seeks to empower people by offering them choice in a
supposed market for schools, with aggregate data from student assessments as
the main indicator of school performance. The emphasis on school self-evalua-
tion in the Scottish policy framework is another contribution to changing how
assessment data are used, thereby changing the perceived role of assessment in
the school system. Wales has been distancing itself from the inheritance of an
'England and Wales' assessment policy. But it is not yet clear whether those
responsible for policy in Wales, whether at national level or in schools and local
education authorities, realize how much of a culture shift is needed for teach-
ers to be able to develop their assessment practices in ways that ensure assess-
ment for learning is a major priority.
Understanding the social and political context is therefore crucial for an
understanding of the current status of assessment for learning in each of the
four countries. Understanding the interplay of discourse and text within each
country is also crucial for any judgement about the prospects for assessment
that supports learning and fosters student autonomy becoming embedded in
the routine practices of thousands of schools and tens of thousands of teachers.
It will be evident from this chapter that there are four 'policy trajectories' to
be found within the UK. At one level those who have long argued that fonna-
tive assessment has been neglected as a policy priority can be encouraged by
the fact that assessment for learning has moved up the official policy agenda in
all four countries over the past decade. But it must also be remembered that the
policy developments reviewed here have all been located at the early stages of
the policy cycle, namely those concerned with initiating policy at the national
level and articulating broad policy intentions for the system as a whole.
For a short time in the late 1980s, the TGAT Report put formative assessment
at the centre of a framework for national curriculum assessment for England
and Wales; the stages of policy development and implementation which fol-
lowed that Report ensured that, in 'policy as practice', it disappeared without
trace during the 1990s. A markedly more favourable social and political context,
at least in Scotland and Wales, now offers better prospects for the ambitions of
recent policy texts in those countries being implemented in ways that, while
inevitably mediated by practitioners, do not lose sight of the original policy
aims. In all four countries the policy cycle is only now beginning to unfold.
Chapter 10
Assessment for Learning: Why no Profile in US
Policy?
Dylan Wiliam
The aim of this chapter is not to provide an overview of assessment for learn-
ing in US schools' policy - given the lack of good evidence on this point, such a
chapter would either be very short, or highly speculative. Instead, it is to
attempt to account for the current position with regard to assessment for learn-
ing in the USA in the light of the history of assessment more generally. In the
broadest terms, the expectation of high reliability and objectivity in the assess-
ment of students' learning within a culture of accountability and litigation
when things go wrong, has tended to deflect policy developments from any
consideration of improving learning through assessment.
The main story of this chapter, therefore, is how one highly specialized role
for assessment, the selection of students for higher education, and a very spe-
cialized solution to the problem, the use of an aptitude test, gained wide
acceptance and usage. By eventually dominating other methods of selecting
students for university and ultimately influencing the methods of assessment
used for other purposes, such approaches to assessment have eclipsed the use
of and to some extent discourse on formative assessment; that is, assessment
designed to support learning.
The chapter begins with a brief account of the creation of the College
Entrance Examination Board and its attempts to bring some coherence to the
use of written examinations in university admissions. The criticisms that were
made of the use of such examinations led to explorations of the use of
intelligence tests, which had originally been used to diagnose learning
difficulties among Parisian school students but which had been modified in
the USA to enable blanket testing of army recruits in the closing stages of the
First World War. Subsequent sections detail how the army intelligence test was
developed into the Scholastic Aptitude Test and how this test came to
dominate university admissions in the USA. The final sections discuss how
assessment in schools developed over the latter part of the twentieth century,
including some of the alternative methods of assessment such as portfolios
which were explored in the 1980s and 1990s. These methods, with clear links
to assessment for learning, were ultimately eradicated by the press for cheap,
scalable methods of testing for accountability - a role that the technology of
aptitude testing was to fill.
Assessment and Learning
Assessment in US schools
For at least the last hundred years, the experience of US school students has
been that assessment means grading. From the third or fourth grade (age 8 to
9), and continuing into graduate studies, almost all work that is assessed is eval-
uated on the same literal grade scale: A, B, C, D or F (fail). Scores on tests or
other work that is expressed on a percentage scale are routinely converted to a
letter grade, with cut-offs for A typically ranging from 90 to 93, B from 80 to 83,
C from 70 to 73 and D from 60 to 63. Scores below 60 are generally graded as F.
In high schools (and sometimes earlier) these grades are then cumulated by
assigning 'grade-points' of 4, 3, 2, 1 and 0 to grades of A, B, C, D and F respec-
tively, and then averaged to produce the grade-point average (GPA). Where stu-
dents take especially demanding courses, such as Advanced Placement courses
that confer college credit, the grade-point equivalences may be scaled up, so
that an A might get 5. However, despite the extraordinary consistency in this
practice across the USA, exactly what the grade represents and what factors
teachers take into account in assigning grades and assessing students in general
are far from clear (Madaus and Kellaghan, 1992; Stiggins et al., 1986), and there
are few empirical studies on what really goes on in classrooms.
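The conversion and averaging described above are mechanical enough to sketch in a few lines of code. This is a hypothetical illustration only, assuming the common 90/80/70/60 cut-offs and the unweighted 4-point scale (no Advanced Placement bonus):

```python
# Sketch of the grading arithmetic described in the text: a percentage
# maps to a letter grade, letters map to grade-points, and the GPA is
# the mean of those points. Cut-offs of 90/80/70/60 are one common
# choice; as noted, the A cut-off varies between about 90 and 93.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

def letter_grade(percent: float) -> str:
    """Convert a percentage score to a letter grade."""
    if percent >= 90:
        return "A"
    if percent >= 80:
        return "B"
    if percent >= 70:
        return "C"
    if percent >= 60:
        return "D"
    return "F"

def grade_point_average(grades: list[str]) -> float:
    """Average the grade-points (4, 3, 2, 1, 0) for a list of letter grades."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)
```

With these cut-offs, a score of 85 earns a B, and a transcript containing one each of A to F averages to a GPA of 2.0.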
Several studies conducted in the 1980s found that while teachers were
required to administer many tests, they had relied on their own observations or
tests they had constructed themselves in making decisions about students (Stig-
gins and Bridgeford, 1985; Herman and Dorr-Bremme, 1983; Dorr-Bremme et al.,
1983; Dorr-Bremme and Herman, 1986). Crooks (1988) found that such teacher-
produced tests tended to emphasize low-order skills such as factual recall rather
than complex thinking. Stiggins et al. (1989) showed that the use of grades both
to communicate to students and parents about student learning on the one hand,
and to motivate students on the other, was in fundamental conflict.
Perhaps because of this internal conflict, it is clear that the grade is rarely a
pure measure of attainment and will frequently include how much effort the
student put into the assignment, attendance and sometimes even behaviour in
class. The lack of clarity led Dressel to define a grade as 'an inadequate report
of an inaccurate judgment by a biased and variable judge of the extent to which
a student has attained an undefined level of mastery of an unknown proportion
of an indefinite material' (Chickering, 1983).
Inconsistency in the meanings of grades from state to state and even district to
district may not have presented too many problems when the grades were to be
used locally, but at the beginning of the twentieth century as students applied to
higher education institutions increasingly further afield, and as universities
switched from merely recruiting to selecting students, methods for comparing
grades and other records from different schools became increasingly necessary.
Written examinations
Written examinations were introduced into the Boston public school system in
1845 when the superintendent of instruction, Mann, decided that the 500 most
able 14-year-olds should take an examination on the same day (Travers, 1983).
The idea was quickly taken up elsewhere and the results were frequently used
to make 'high-stakes' decisions about students such as promotion and gradua-
tion. The stultifying effects of the examinations were noted by the superinten-
dent of schools for Cincinnati:
... they have occasioned and made well-nigh imperative the use of mechanical and
rote methods of teaching; they have occasioned cramming and the most vicious
habits of study; they have caused much of the overpressure charged upon schools,
some of which is real; they have tempted both teachers and pupils to dishonesty;
and last but not least, they have permitted a mechanical method of school supervi-
sion. (White, 1888: 519)
Admission to higher education institutions in the USA at the time was a rather
informal process. Most universities were recruiting rather than selecting stu-
dents; quite simply there were more places than applicants, and at times,
admission decisions appear to have been based on financial as much as aca-
demic criteria (Levine, 1986).
In the period after the civil war, universities had begun to formalize their
admissions procedures. In 1865 the New York Board of Regents, which was
responsible for the supervision of higher education institutions, put in place a
series of examinations for entry to high school. In 1878, they added to these
examinations for graduation from high schools which were used by universities
in the state to decide whether students were ready for higher education. Stu-
dents who did not pass the Regents examinations were able to obtain 'local'
high school diplomas if they met the requirements laid down by the district.
Another approach, pioneered by the University of Michigan, was to accredit
high schools so that they were able to certify students as being ready for higher
education (Broome, 1903) and several other universities adopted similar mech-
anisms. Towards the end of the century, however, the number of higher educa-
tion institutions to which a school might send students and the number of
schools from which a university might draw its students both grew. In order to
simplify the accreditation process, a large number of reciprocal arrangements
were established. Although attempts to co-ordinate these were made (see Krug,
1969), particularly in the elite institutions, it appears that university staff resis-
ted that loss of control over admissions decisions. The validity of the Michigan
approach was also weakened by accumulating evidence that teachers' grading
of student work was not particularly reliable. Not only did different teachers
give the same piece of work different grades, but even the grades awarded by
a particular teacher were inconsistent over time (Starch and Elliott, 1912, 1913).
As an alternative, the Ivy League universities (Brown, Columbia, Cornell, Dart-
mouth, Harvard, Pennsylvania, Princeton and Yale) proposed the use of common
written entrance examinations. Many universities were already using written
entrance examinations, for example Harvard and Yale since 1851 (Broome, 1903),
but each university had its own system with its own distinctive focus. The purpose
behind the creation of the College Entrance Examination Board in 1899 was to
establish a set of common examinations scored uniformly that would bring some
coherence to the high school curriculum, while at the same time allowing indi-
vidual institutions to make their own admission decisions. Although the idea of
a common high school curriculum and associated examinations was resisted by
many institutions, the College Boards, as the examinations came to be known,
gained increasing acceptance after their introduction in 1901.
The original College Boards were highly predictable tests - even the specific
passage of Homer or Virgil that would be tested was made public - and so there
was concern that the tests assessed the quality of coaching rather than the talent
of the student. For this reason the College Board introduced its New Plan exam-
inations in 1916, focusing on just four subjects and placing greater emphasis on
higher-order skills. Originally the New Plan examinations were taken almost
exclusively by students applying for Harvard, Princeton or Yale. However,
other universities quickly began to see the benefits of the 'New Plan' examina-
tions and for two reasons. Firstly, they provided information about the capabil-
ity of applicants to reason critically as opposed to regurgitating memorized
answers, and secondly, they freed schools from having to train students on a
narrow range of content. Although there was also some renewed interest in
models of school accreditation (for example in New England), the New Plan
examinations became increasingly popular and were quickly established as the
dominant assessment for university admission.
However, these examinations were still a compromise between a test of
school learning and a test of 'mental power'; more focused on the latter than the
original College Boards, but still an assessment that depended strongly on the
quality of preparation received by the student. It is hardly surprising, therefore,
that the predominance of the 'College Boards' was soon to be challenged by the
developing technology of intelligence testing.
The origins of intelligence testing
The philosophical tradition known as British empiricism held that all knowl-
edge comes from experience (in contrast to the continental rationalist tradition
which emphasized the role of reason and innate ideas). Therefore, when Galton
sought to define measures of intellectual functioning as part of his arguments
on 'hereditary genius' it is not surprising that he focused on measures of
sensory acuity rather than knowledge (Galton, 1869). Building on this work, in
1890 Cattell published a list of ten mental tests that he proposed might be used
to measure individual differences in mental processes. To a modern eye,
Cattell's tests look rather odd. They measured grip strength, speed of move-
ment of the arm, sensitivity to touch and pain, the ability to judge weights, time
taken to react to sound and to name colours, accuracy of judging length and
time and memory for random strings of letters.
In contrast, Binet had argued throughout the 1890s that intellectual func-
tioning could not be reduced to sensory acuity. In collaboration with Simon he
produced a series of 30 graduated tests that focused on attention, communica-
tion, memory, comprehension, reasoning and abstraction. Through extensive
field trials, the tests were adjusted so as to be appropriate for students of a par-
ticular age. If a child could answer correctly those items in the Year 4 tests, but
not the Year 5 tests, then the child could be said to have a mental age of four.
However, the results were interpreted as classifications of children's abilities,
rather than measurements, and were used in particular to identify those stu-
dents who would require additional teaching to make adequate progress. In
fact, Binet stated explicitly:
I do not believe that one may measure one of the intellectual aptitudes in the sense
that one measures a length or a capacity. Thus, when a person studied can retain
seven figures after a single audition, one can class him, from the point of his
memory for figures, after the individual who retains eight figures under the same
conditions, and before those who retain six. It is a classification, not a measure-
ment. (cited in Varon, 1936: 47)
Binet's work was brought to the USA by Goddard who translated the tests into
English and administered them to the children at the New Jersey Training
School in Vineland. He was somewhat surprised to discover that the classifica-
tion of children on the basis of the tests agreed with the informal assessments
made by Vineland teachers; 'It met our needs. A classification of our children
based on the Scale agreed with the Institution experience' (1916: 5).
In the same year, Terman (1916) adopted the structure of the Binet-Simon
tests, but discarded items he felt were inappropriate for US contexts. He added
40 new items, which enabled him to increase the number of items per test to six.
The resulting tests, known as the Stanford-Binet tests, were then developed
in multiple-choice versions for use with army recruits. Known as Army Alpha
and Army Beta tests, the US Army trials proved successful, providing scores
that correlated highly with officers' judgments about the capabilities of their
men. This resulted in their full adoption and by the end of January 1919, the
tests had been administered to 1,726,966 men (Zenderland, 2000).
Intelligence tests in university admissions
The Army Alpha test results demonstrated the feasibility of large-scale, group-
administered intelligence tests and shortly after the end of the First World War,
many universities began to explore the utility of intelligence tests for a range of
purposes.
In 1919, both Purdue University and Ohio University administered the Army
Alpha to all their students and, by 1924, the use of intelligence tests was wide-
spread in US universities. In some, the intelligence tests were used to identify
students who appeared to have greater ability than their work at university
indicated; in others, the results were used to inform placement decisions both
between programmes and within programmes (that is, to 'section' classes to
create homogeneous ability groups). Perhaps inevitably, the tests were also
used as performance indicators: to compare the ability of students in different
departments within the same university and to compare students attending dif-
ferent universities. In an early example of an attempt to manipulate 'league
table' standings, Terman, still at Stanford, which was at the time regarded as a
'provincial' university, suggested selecting students on the basis of intelligence
test scores in order to improve the university's position in the reports of uni-
versity merit then being produced (Terman, 1921).
Around this time, many universities began to experience difficulties in
meeting demand. The number of high school graduates had more than doubled
from 1915 to 1925 and although many universities had tried to expand their
intake to meet demand, some were experiencing substantial pressure on places.
As Levine noted '... a small but critical number of liberal arts colleges enjoyed
the luxury of selecting their student bodies for the first time' (1986: 136). In
order to address this issue, in 1920 the College Board established a commission
'... to investigate and report on general intelligence examinations and other
new types of examinations offered in several secondary school subjects'. The
task of developing 'new types of examinations' of content was given to
Thorndike and Wood of Columbia Teachers College, who presented the first
'objective examinations' (in algebra and history) to the College Board in 1922.
Four years earlier, some of the leading public universities had founded the
American Council on Education (ACE) to represent their interests. In 1924 ACE
asked Thurstone, a psychologist at the Carnegie Institute of Technology, to
develop a series of intelligence tests. Thurstone had hoped that his work would
be embraced by the College Board but they in turn set up their own Committee
of Experts to investigate the use of 'psychological tests'. Although the commit-
tee included notable psychologists, no-one from Teachers College was invited,
despite the foundational work of Thorndike and Wood in both intelligence
testing and the development of 'objective' tests of subject knowledge. This was
to have severe and far-reaching implications for the development of the test that
came to be known as the Scholastic Aptitude Test. As Hubin notes '... from its
inception, the Scholastic Aptitude Test was isolated from advances in education
and learning theory and ultimately isolated from the advances in a field that
later would be called cognitive psychology' (1988: 198).
The Scholastic Aptitude Test
The first version of the Scholastic Aptitude Test was produced in 1926 and
administered to 8026 students. As Brigham wrote in the introduction to the
manual that accompanied the test:
The term 'scholastic aptitude test' has reference to the type of examination now in
current use and variously called 'psychological tests', 'intelligence tests', 'mental
ability tests', 'alertness tests' et cetera. The Committee uses the term 'apti-
tude' to distinguish such tests from tests of training in school. Any claims
that aptitude tests now in use really measure 'general intelligence' or 'general
ability' may or may not be substantiated. It has, however, been very generally
established that high scores in such tests usually indicate ability to do a high order
of scholastic work. The term 'scholastic aptitude' makes no stronger claim for such
tests than that there is a tendency for individual differences in scores in tests
to be associated positively with individual differences in subsequent academic
attainment. (1926: 1)
Initially, the acceptance of the SAT was slow. Over the first eleven years, the
number of test takers grew only 1.5 per cent per year. Most members of the
College Board (including Columbia, Princeton and Yale) required students to
take the examination but two (Harvard and Bryn Mawr) did not, although since
most students applied to more than one institution both Harvard and Bryn
Mawr did have SAT scores on many of their students, which provided evidence
that could be used in support of the SAT's validity, and this evidence was crucial
when Conant, appointed as president of Harvard in 1933, began his attempts to
make Harvard more meritocratic.
One of Conant's first acts was to establish a new scholarship programme and
he determined that the SAT, together with school transcripts and recommenda-
tions, should form the basis of the Harvard National Scholarships administered
in 1934-6. The SAT proved to be an immediate success. Students awarded
scholarships on the basis of SAT scores did well at Harvard; indeed the 1981
Nobel Prize winner (Economic Science), James Tobin, was one of the early recip-
ients of a Harvard scholarship. Emboldened by the success of the SAT, Conant
persuaded 14 of the College Board universities to base all scholarship decisions
on objectively scored multiple-choice tests from 1937 onwards.
From its first use in 1926, the outcomes on the SAT had been reported on the
familiar 200 to 800 scale, by scaling the raw scores to have a mean of 500 and a
standard deviation of 100. From 1926-40, this norming was based on the stu-
dents who took the SAT each year, so that the meaning of a score might change
from year to year according to the scores of the students who took the test. Since
the early period of the SAT was one of experimentation with different sorts of
items and formats, the difference in meaning from year to year may have been
quite large even if the population of test-takers did not change much. Respond-
ing to complaints from administrators, in 1941 the College Board introduced a
system of equating tests so that each form of the verbal test was equated to the
version administered in April 1941 (Angoff, 1971) and the mathematics test to
that administered in April 1942. At the same time, the traditional College Board
written examinations were withdrawn.
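The pre-1941 norming just described can be sketched in a few lines. This is an illustration only, assuming a simple linear rescaling against the cohort's own mean and (population) standard deviation with clipping to the reporting range; the equating procedure adopted in 1941 was quite different:

```python
import statistics

def norm_to_sat_scale(raw_scores):
    """Rescale a cohort's raw scores to mean 500 and standard deviation
    100, then clip to the familiar 200-800 reporting range. Because the
    reference population is the cohort itself, the meaning of a given
    scaled score shifts from year to year, as the text notes."""
    mean = statistics.mean(raw_scores)
    sd = statistics.pstdev(raw_scores)
    scaled = (500 + 100 * (x - mean) / sd for x in raw_scores)
    return [min(800, max(200, round(s))) for s in scaled]
```

Since the cohort itself is the reference population, the same raw score can map to different scaled scores in different years - precisely the drift that administrators complained about.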
At the time of these changes, the test was taken by fewer than 20,000 students
but by 1951, three years after the Educational Testing Service began to admin-
ister them, the number of SAT takers had grown to 81,000 and by 1961 to
805,000. In 2004, the SAT was taken by 1,419,007 students (College Board, 2004).
While the SAT itself remained substantially unchanged for over sixty years, its
name did not. In 1990, the College Board changed the name to the Scholastic Assess-
ment Test, and in 1996, it decided that the letters did not stand for anything. It
was just the SAT.
The most serious and enduring challenge to the predominance of the SAT
came in 1959, when Lindquist and McCarrell of Iowa University established Amer-
ican College Testing (now called simply ACT). Lindquist was an acknowledged
leader in the field of psychometrics and had edited the first edition of the field's
'bible', Educational Measurement (Lindquist, 1951). ACT was strong where the
College Board was weak. They had very strong links with public universities,
especially in the mid-west, and had a strong interest in measuring school
achievement. And where the College Board was interested in helping the elite
universities in selecting students, ACT was much more interested in placement -
helping universities decide which programmes would suit an individual. In
reality, however, the differences between the ACT and the SAT are not that clear
cut. Despite its origins in the idea of assessing intelligence, the SAT has always
been a test of skills that are developed at school; students with higher levels of
reasoning skills find mastering the material for the ACT easier. In fact, the corre-
lation between the scores on the SAT and the ACT is 0.92 (Dorans et al., 1997;
Dorans, 1999). To all intents and purposes, the two are measuring the same thing.
Nevertheless, many students take both tests in order to maximize their chances
of getting into their chosen university, and almost as many students take the ACT
each year (1,171,460 in 2004) as take the SAT (ACT, 2004).
Ever since its introduction, the SAT has been subjected to much critical
scrutiny (again, see Lemann, 1999 for a summary), but things came to a head in
2001 when Richard Atkinson, president of the University of California,
announced that he had asked the senate of the university not to require SAT
reasoning test scores in considering applicants. In doing so, he said:
All too often, universities use SAT scores to rank applicants in determining
who should be admitted. This use of the SAT is not compatible with the US view
on how merit should be defined and opportunities distributed. The strength of US
society has been its belief that actual achievement should be what matters most.
Students should be judged on the basis of what they have made of the opportuni-
ties available to them. In other words, in America, students should be judged on
what they have accomplished during four years of high school, taking into account
their opportunities. (Atkinson, 2001)
Because the SAT and the ACT are, as noted above, essentially measuring the
same thing, these criticisms are not well-founded in terms of the quality of deci-
sions made on the basis of test scores. The criticism is really one about the
message that is sent by calling something 'general reasoning' rather than
'school achievement' - essentially an issue of value implications (Messick,
1980). Nevertheless, the threatened loss of income was enough to make the
College Board change the SAT to focus more on achievement and to include a
writing test. The new test was administered for the first time in March 2005.
The SAT therefore appears set to dominate the arena of admissions to US
universities for years to come. No-one really understands what the SAT is meas-
uring, nor how a test is able to predict college grades almost as well
as the high-school grade point average (GPA) which is built up from hundreds
of hours of assessed work. Nevertheless, the SAT works. It works partly because
it is attuned to the US higher education system. In most European uni-
versities, selection to university is combined with placement into a specific pro-
gramme, so information is needed on the applicant's aptitude for a particular
programme of study. In US universities, students do not select their 'major'
until the second or third year, so at admission information on specific aptitudes
is not needed. The SAT works also because it is well-suited to a society with a
propensity to litigate. The reliability of the SAT is extremely high (over 0.9) and
there is little evidence of bias (minority students get lower scores on the test, but
also do less well at college).
In terms of what it sets out to do, therefore, the SAT is a very effective assess-
ment. The problem is that it set the agenda for what kinds of assessment are
acceptable or possible. As the demand to hold schools accountable grew during
the final part of the twentieth century, the technology of multiple-choice testing
that had been developed for the SAT was easily pressed into service for the
assessment of younger children.
The rise and rise in assessment for accountability
One of the key principles of the constitution of the USA is that anything that is
not specified as a federal function is 'reserved to the states', and this notion (that
has within the European Union been given the inelegant name of 'subsidiarity')
is also practised within most states. Education in particular has always been a
local issue in the US, so that for example decisions about curricula, teachers' pay
and conditions of service and organizational structures are not made at the state
level but in 17,000 school districts. Most of the funding for schools is raised
in the form of taxes on local residential and commercial property. Since the
school budget is generally determined by locally elected Boards of Education
there is a very high degree of accountability, and the annual surveys produced
by the Phi Delta Kappan organization indicate that most communities are
happy with their local schools.
From the 1960s, however, state and federal sources had become greater and
greater net contributors (Corbett and Wilson, 1991: 25), which led to demands
that school districts become accountable beyond the local community, and the
state has thus played a greater and greater role in education policy and funding.
For example, in 1961 California introduced a programme of achievement
testing in all its schools although the nature of the tests was left to the districts.
In 1972, the California Assessment Program was introduced which mandated
multiple-choice tests in language arts and mathematics in grades 2, 3, 6 and 12
(tests for grade 8 were added in 1983). Subsequent legislation in 1991, 1994 and
1995 enacted new state-wide testing initiatives that were only partly imple-
mented. However, in 1997 new legal requirements for curriculum standards
were passed which in 1998 led to the Standardized Testing and Reporting
(STAR) Program. Under this programme, all students in grades 2 to 11 take the
Stanford Achievement Test - a battery of norm-referenced tests - every year.
Those in grades 2 to 8 are tested in reading, writing, spelling and mathematics,
and those in grades 9, 10 and 11 are tested in reading, writing, mathematics,
science and social studies. In 1999 further legislation introduced the Academic
Performance Index (API), a weighted index of scores on the Stanford Achieve-
ment Tests, with awards for high-performing schools and a combination of
sanctions and additional resources for schools with poor performance. The
same legislation also introduced requirements for passing scores on the tests for
entry into high school, and for the award of a high-school diploma.
Portfolios
Many states experimented with alternatives to standardized tests for mon-
itoring the quality of education and for attesting to the achievements of indi-
vidual students. In 1974, the National Writing Project (NWP) had been
established at the University of California, Berkeley. Drawing inspiration from
the practices of professional writers, the National Writing Project emphasized
the importance of repeated redrafting in the writing process and so, to assess
the writing process properly, one needed to see the development of the final
piece through several drafts. In judging the quality of the work, the degree of
improvement across the drafts was as important as the quality of the final draft.
The emphasis on the process by which a piece of work was created, rather
than the resulting product, was also a key feature of the Arts-PROPEL project -
a collaboration between the Project Zero research group at Harvard University
and the Educational Testing Service. The idea was that students would '... write
poems, compose their own songs, paint portraits and tackle other "real-life"
projects as the starting point for exploring the works of practising artists'
(Project Zero, 2005). Originally, it appears that the interest in portfolios was
intended to be primarily formative but many writers also called for perform-
ance or authentic assessments to be used instead of standardized tests (Berlak
et al., 1992; Gardner, 1992).
Two states in particular, Vermont and Kentucky, did explore whether
portfolios could be used in place of standardized tests to provide evidence for
accountability, and some districts also developed systems in which
portfolios were used for summative assessments of individual students. How-
ever, the use of portfolios was attacked on several grounds such as being
'... costly indeed, and slow and cumbersome' and '... its biggest flaw as an
external assessment is its subjectivity and unreliability' (Finn, cited in
Mathews, 2004).
In 1994, the RAND Corporation released a report on the use of portfolios in
Vermont (Koretz et 31., 1994), which is regarded by many as a turning point in
the uS(' of portfolios (_ fur example, Mathews, 20(4). Koretz and his team
found that the meanings of grades or scores on portfolios were rarely compa-
rable from schoo1 to school because !here witS liltle agreement about what sorts
of elements should be included. The standards for reliability that had been set
by the SAT simply cou.ld not be matchrd with portfolios. While advocates
might claim that the latter were more valid measures of learning, the fact that
the same portfolio would gel diffel'1"J'll scores according to who dKi the scoring
made their use for summative purposes impossible in the US context.
Assessment for Learning: Why no Profile in US Policy?
In fact, even if portfolios had been able to attain high levels of reliability, it is
doubtful that they would have gained acceptance. Teachers did feel that the use
of portfolios was valuable, although the time needed to produce worthwhile
portfolios detracted from other priorities. Mathematics teachers in particular
complained that portfolio activities took time away from basic skills and com-
putation. Furthermore, even before the RAND report, the portfolio movement
was being eclipsed by the push for 'standards-based' education and assessment
(Mathews, 2004).
Standards
In 1989, President Bush convened the first National Education Summit in Charlottesville, Virginia, led by (the then) Governor Clinton of Arkansas. Those attending the summit, mostly state governors, were perhaps not surprisingly able to agree on the importance of involving all stakeholders in the education process, of providing schools with the resources necessary to do the job, and of holding schools accountable for their performance. What was not so obvious was the agreement that all states should establish standards for education and that they should aspire to having all students meet those standards. In many ways this harked back to the belief that all students would learn if taught properly, a belief that underpinned the 'payment by results' culture of the first half of the nineteenth century (Madaus and Kellaghan, 1992).
The importance attached to 'standards' may appear odd to Europeans, but the idea of national or regional standards has been long established in Europe. Even in England, which lacked a national curriculum until 1989, there was substantial agreement about what should be in, say, a mathematics curriculum, since all teachers were preparing students for similar sets of public examinations.
Prominent in the development of national standards was the National Council of Teachers of Mathematics (NCTM), which published its Curriculum and Evaluation Standards for Mathematics in 1989 and Professional Standards for Teaching Mathematics two years later (NCTM, 1989, 1991). Because of the huge amount of consultation which the NCTM had undertaken in constructing the standards, they quickly became a model for states to follow, and over the next few years every state in the USA except Iowa adopted state-wide standards for the major school subjects. States gradually aligned their high-stakes accountability tests with the state standards, although the extent to which written tests could legitimately assess the high-order goals contained in most state standards is questionable (Webb, 1999).
Texas had introduced a state-wide high-school graduation test in 1984. In 1990, the graduation tests were subsumed within the Texas Assessment of Academic Skills (TAAS), a series of untimed standards-based achievement tests for grades 3 to 10 in reading, writing, mathematics and social studies. Apart from writing, these tests are in multiple-choice format. Massachusetts introduced state-wide testing in 1986. The original aim of the assessment was to provide information about the quality of schools across the state, much in the same way as the National Assessment of Educational Progress (NAEP) had done for the country as a whole (Jones and Olkin, 2004). Students were tested in reading, mathematics and science at grade 4 and grade 8 in alternate years until 1996, and only scores for the state as a whole were published. In 1998, however, the state introduced the Massachusetts Comprehensive Assessment System (MCAS), which tests students at grades 4, 8 and 10 in English, mathematics, science and technology, social studies and history (the last two in grade 8 only). The tests use a variety of formats, including multiple-choice and constructed-response items.
In reviewing the development of state-wide testing programmes, Bolon suggests that many states appeared to be involved in a competition which might be called 'Our standards are stiffer than yours' (2000: 11). Given that political timescales tend to be very short, it is perhaps not surprising that politicians have been anxious to produce highly visible responses to the challenge of raising student achievement. However, the wisdom of setting such challenging standards was called into question when, in January 2002, President Bush signed into law the No Child Left Behind (NCLB) Act of 2001.
No Child Left Behind
Technically, NCLB is a reauthorization of the Elementary and Secondary Education Act originally passed in 1965 (in the USA much legislation expires unless reauthorized) and is a complex piece of legislation, even by US standards. The main requirement of the act is that, in order to receive federal funds, each state must propose a series of staged targets for achieving the overall goal of all students in grades 3-8 being proficient in reading and mathematics by 2014 (although the definition of 'proficient' is left to each state). Each school is judged to be making 'adequate yearly progress' (AYP) towards this goal if the proportion of students being judged as 'proficient' on annual state-produced standards-based tests exceeds the target percentage for the state for that year. Furthermore, the AYP requirements apply not only to the totality of students in a grade but also to specific sub-groups of students (for example, ethnic minority groups), so that it is not possible for good performance by some student sub-groups to offset poor performance in others. Among the many sanctions that the NCLB mandates, if schools fail to make AYP then parents have the right to have their child moved to another school at the district's expense.
It has been claimed by some (see, for example, Robson, 2004) that NCLB was designed by Republicans to pave the way for mass school privatization by showing the vast majority of public schools to be failing. In fact, the act had strong bipartisan support. Indeed, some of the most draconian elements of the legislation, such as the definition of 'adequate yearly progress', were insisted on by Democrats, because they did not want schools to be regarded as successful if low performance by some students (for example, those from minority ethnic communities) were offset by high performance by others. However, it is clear that the way the legislation was actually put into practice appears to be very different from what was imagined by some of its original supporters, and an increasing number of both Republican and Democratic politicians are calling for substantial changes in the operation of the Act.
Failure to make AYP has severe consequences for schools, and as a result many schools and districts have invested both time and money in setting up systems for monitoring what teachers are teaching and what students are learning. In order to ensure that teachers cover the curriculum, most districts have devised 'curriculum pacing guides' that specify which pages of the set texts are to be covered every week (and sometimes each day). With such rigid pacing, there are few opportunities for teachers to use information on student performance to address learning needs.
Very recently, there has also been a huge upsurge of interest in systems that monitor student progress through the use of regular formal tests that are designed to predict performance on the annual state tests - some reports suggest that this may be the fastest growing sector of the education market. The idea of such regular testing is that students who are likely to fail the state test, and may therefore prevent a school from reaching its AYP target, can be identified early and given additional support. For this reason these systems are routinely described in the USA as 'formative assessment', even though the results of the assessments rarely impact on learning and as such might be better described as 'early-warning summative'. In many districts such tests are given once a week, on a Friday. Thursdays are consumed with preparation for the test, and Mondays with reviews of the incorrect answers, leaving only 40 per cent of the available subject time for teaching. While the pressure on schools to improve the performance of all students means that schools in the USA are now more than ever in need of effective formative assessment, the conditions for its development seem even less promising than ever.
Conclusion
In Europe, for most of the twentieth century, education beyond the age of 15 or 16 was intended only for those going on to university. The consequence of this has been that the alignment between school and university curricula is very high - indeed, it can be argued that the academic curriculum for 16- to 19-year-olds in Europe has been determined by the universities, with consequent implications for the curriculum during the period of compulsory schooling. In the USA, however, despite the fact that for most of the twentieth century a greater proportion of school leavers went on to higher education, the high-school curriculum has always been an end in itself and determined locally. The advantage of this is that schools are able to serve their local communities well. The disadvantage is that high-school curricula are often poorly aligned with the demands of higher education, and this has persisted even with the introduction of state standards (Standards for Success, 2003).
When higher education was an essentially local undertaking, the problems caused by lack of alignment could be addressed reasonably easily, but the growth of national elite universities rendered such local solutions unworkable. At the time of its 'ossification' in 1941, the SAT was being taken by less than 20,000 students each year (Hubin, 1988), and it is entirely possible that it would have remained a test required only of those students applying for the most selective universities, with a range of alternatives, including achievement tests, also in use. It would be unfair to blame the SAT for the present condition of assessment in US schools, but it does seem likely that the dominance of the SAT and the prevalence of multiple-choice testing in schools are both indications of the same convictions, deeply and widely held in the USA, about the importance of objectivity in assessment.
Once multiple-choice tests were established (and, not long afterwards, the machine marking of tests - see Hubin, 1988), it was probably also inevitable that any form of 'authentic' assessment, such as examinations that required extended responses, let alone portfolios, would have been found wanting in comparison. This is partly due to such assessments tending to have lower reliability than multiple-choice items because of the differences between raters, although this can be addressed by having multiple raters. A more important limitation, within the US context, is the effect of student-task interaction - the fact that, with a smaller number of items, the particular set of items included may suit some students better than others. In Europe, such variability is typically not regarded as an aspect of reliability - it is just 'the luck of the draw'. However, in the USA, the fact that a different set of items might yield a different result for a particular student would open the possibility of expensive and inconvenient litigation.
Once the standards-based accountability movement began to gather momentum, in the 1980s, the incorporation of the existing technology of machine-scored multiple-choice tests was also probably inevitable. Americans had got used to testing students for less than $10 per test, and to spend $30 or more for a less reliable test, as is commonplace in Europe, whatever the advantages in terms of validity, would be politically very difficult.
However, even with annual state-mandated multiple-choice testing, it could be argued that there was still space for the development of effective formative assessment. After all, one of the key findings of the research literature in the field was that attention to formative assessment raises scores even on state-mandated tests (Crooks, 1988; Black and Wiliam, 1998a). Nevertheless, the prospects for the development of effective formative assessment within US education seem more remote than ever. The reasons for this are of course complex, but two factors appear to be especially important.
The first is the extraordinary belief in the value of grades, both as a device for communication between teachers on the one hand and students and parents on the other, and also as a way of motivating students, despite the large and mounting body of evidence to the contrary (see Chapter 4).
The second is the effect of an extraordinary degree of local accountability in the USA. Most of the 17,000 district superintendents in the USA are appointed by directly-elected boards of education, which are anxious to ensure that the money raised in local property taxes is spent efficiently. Under NCLB, the superintendents are required to ensure that their schools make 'adequate yearly progress'. The adoption of 'early-warning summative' testing systems therefore represents a highly visible response to the task of ensuring that the district's schools will meet their AYP targets.
There are districts where imaginative leaders can see that the challenge of raising achievement, and reducing the very large gaps in achievement between white and minority students that exist in the USA, requires more than just 'business as usual, but with greater intensity'. But political timescales are short and educational change is slow. A superintendent who is not re-elected will not change anything. Balancing the political press for quick results with the long-term vision needed to produce effective long-term improvement is an extraordinarily difficult and perhaps impossible task. There has never been a time when the USA needed effective formative assessment more but, perversely, never have the prospects for its successful development looked so bleak.
"3
Chapter 11
Policy and Practice in Assessment for Learning:
the Experience of Selected OECD Countries
Judy Sebba
Studies of assessment for learning in countries other than the USA or those in the UK potentially provide a rich and stimulating source of evidence for understanding practices in assessment for learning. In Chapter 9, Daugherty and Ecclestone provide an analysis of policy developments in the four countries of the UK and, in Chapter 10, Wiliam outlines reasons for assessment for learning not featuring in policy in the USA. This chapter draws on illustrative examples of practice from selected countries that participated in an OECD study of formative assessment. Daugherty and Ecclestone quoted Ball (1990) as suggesting that policies are pre-eminently statements about practice, intended to bring about solutions to problems identified by individual teachers and schools. Classroom practice can thus be seen as a key measure of policy implementation, and it is examples of classroom practice from different countries that are presented and analysed in this chapter. Some brief comments are made about policies in these countries, but a comprehensive analysis of educational policies in these countries is beyond the scope of this chapter.
In 2005, the Centre for Educational Research and Innovation (CERI) at OECD published research (OECD, 2005) on 'formative assessment' in lower secondary education, drawing on case studies involving eight countries: Canada, Denmark, England, Finland, Italy, New Zealand, Australia and Scotland. I undertook the case study of Queensland, Australia, which was written up with Graham Maxwell, a senior manager working in the locality. This chapter draws heavily on examples from the OECD study, including those case studies in Canada, Denmark, New Zealand and, in particular, Queensland. There were no significant differences between these countries and the others not mentioned - it is simply that these case studies provided illustrations of some emerging themes. It is important to acknowledge that the eight countries were from Europe, North America and Australasia and did not include a country that by any definition could be described as 'developing'. Furthermore, the chapter draws only on illustrative examples of policy and practice and the analysis cannot therefore be claimed to provide a comprehensive or definitive picture of the countries involved.
The OECD study provides a basis for identifying some common themes in assessment for learning policy and practice that can be compared across countries. These involve, at the most basic level, what is included in assessment for learning ('formative assessment' as it is called in the study), the nature and role of feedback, self- and peer-assessment, the relationship between student grouping strategies and assessment for learning, and teacher development. In addition to these classroom-level issues, there are contextual factors in schools and beyond which enhance or inhibit the implementation of assessment for learning strategies. These factors, such as the role of leadership, schools as learning organizations and students as agents of change, are not specific to developing assessment for learning and might more appropriately be viewed as strategies for school improvement. They do, however, vary within and across different countries, thus influencing the capacity for assessment for learning strategies to be effective. The links between assessment for learning at the classroom level, teacher development and school improvement are further explored in Chapter 2.
Before considering these themes, four underlying tensions are acknowledged which need to be taken into account when drawing interpretations and inferences from comparisons of assessment for learning across countries. These are: focusing on difference at the expense of similarities; the influence of cultural contexts; problems of transferability of strategies across countries; and the methodological limitations of research involving short visits to a small sector of provision in other countries. These issues have been more extensively debated in the literature on comparative education (for example, Vulliamy et al., 1990), but need to be mentioned here to urge caution in drawing generalized inferences from the examples presented.
Focusing on difference at the expense of similarities
In undertaking any comparative analysis, there is a danger of focusing exclusively on differences and ignoring or under-acknowledging similarities. Throughout this chapter, an attempt is made to draw parallels between the experiences of assessment for learning across different countries and to seek multiple interpretations of both the similarities and differences observed. It is important to attempt to distinguish between differences in terminology, definitions and meaning and real differences in policy and practice. International comparisons are frequently hampered by a lack of agreed, consistent terminology, which acts as a barrier to communication, development of understanding and the drawing of conclusions. For example, much greater emphasis was put in some countries on the use of test data to inform teaching as a component of formative assessment, whereas others saw this as distinctively separate from formative assessment as such.
The influence of cultural contexts
A second underlying tension concerns the need to acknowledge the cultural contexts within which assessment for learning strategies are being implemented. Attempting to understand the cultural differences between countries and, indeed, between different areas within countries, is a considerable challenge extensively debated in comparative education (for example, Vulliamy et al., 1990). Broadfoot et al. (1993) provided strong evidence of the influence of culture on the educational organization and processes in different countries. The interaction between national cultures and educational policies and structures adds further complexity to this. For example, national policies prescribing curricula, assessment and accountability systems provide structural contexts which may reflect, or indeed create, particular cultural contexts. Vulliamy (2004) argues that the increasing knowledge and information associated with globalization are in danger of strengthening those positivist approaches that threaten the centrality of culture in comparative education.
Problems of transferability
"Thirdly, and partly related to issues of cultural context, the transferability of
strategies and innovations from one country to another is a further underlying
tension. The assumption that educational innovations that have an effect in one
context will have the same or any effect in another has been challenged by
many writers (for example Crossley, 1984). The Black and Wiliam research
review (1998a), which indicated the substantial impact of assessmenl for learn-
ing strategies on students' learning and generated extensiw international inter-
est, drew on a large number of studies from different countries but was
ultimately limited to those written in English. The OECD study included
appendices of reviews of the literature in English. French and German.
However, more often the findings of a single study undertaken in one country
are taken as evidence for the efficacy of that strategy in another country. Fur-
thermore, as Fielding et al. (2005) have demonstrated, the concept of 'best prac-
tice' is contested and its transfer from one individual, institution or group to
another is much more complicated than national policies tend to acknowledge.
Methodological limitations of research based on short 'expert' visits
A final underlying tension is the methodological limitations of what Vulliamy et al. (1990) referred to as 'jetting in' experts or researchers to other countries for short periods. The OECD study involved 'experts' from one country visiting another country for a short time (1-2 weeks) and, in partnership with 'experts' from the host country, visiting schools, analysing documentation and interviewing those individuals and groups involved as consumers or providers of assessment for learning. While many comparative studies are probably similarly limited, the methodological weaknesses in research design, data collection, analysis and interpretation, and of trying to identify appropriate research questions, construct a research design and implement it in a short period with limited understanding of the context, raise serious questions. Drawing inferences from a necessarily partial picture, in contexts where others have determined what documents are accessed, who is interviewed and observed and perhaps the views that are conveyed, is problematic. For example, it is clear that observations undertaken in two schools in Queensland cannot be assumed to be representative of that state, let alone typical of Australia as a whole. Generalizations should therefore be minimized and interpretations treated as illustrative rather than indicative. Vulliamy et al. argue that in-depth qualitative research undertaken prior to the full study can inform the subsequent study questions and design, thereby increasing relevance, validity (for example, in interpreting terminology) and understanding. They also discuss ways in which research capacity can be developed with local populations.
Despite these limitations, the OECD study provided rich descriptions of a variety of forms of practice in assessment for learning, in a range of different contexts, which offered interesting insights and suggested that some classroom practices and challenges may share more similarities than differences.
What is included in assessment for learning/formative assessment?
In the OECD study, 'formative assessment' is defined as:

... frequent, interactive assessments of student progress and understanding to identify learning needs and adjust teaching appropriately. (2005: 21)

This definition differs significantly from that which is provided in the Introduction (ARG, 2002a) and which has been adopted in the national primary and secondary strategies in the UK. The Assessment Reform Group definition puts considerably greater emphasis on the use to be made by learners of the assessment information. The OECD definition instead stresses the adjusting of teaching in the light of the assessment.
Similarities in the strategies encompassed in formative assessment across the eight countries included: establishing learning goals and tracking of individual students' progress towards these goals; ensuring student understanding (rather than just skills or knowledge) is adequately assessed; providing feedback that influences subsequent teaching; and the active involvement of students in the learning process. But the emphasis given to each of these, and the additional strategies which were included under the formative assessment umbrella, varied considerably. For example, in the Danish case study schools there was greater emphasis on developing self-confidence and verbal competence. In New Zealand, formative assessment was linked to the Maori Mainstream Programme, within which the importance of culture is emphasized through group work, co-construction of knowledge and peer solidarity (Bishop and Glynn, 1999). Several of the case studies included use of summative assessment data as part of their formative strategies, even where the use has been for whole-school improvement rather than individual learning, which arguably falls outside the ARG definition given in the Introduction.
The nature and role of feedback
At one of the two schools in Queensland (Sebba and Maxwell, 2005), students were working on their individual assignments in the library, using books, articles and the internet to research globalization in the context of a company they had chosen, for example Nike or McDonald's. The teacher individually saw about half of the 25 students in the group to review their progress. She asked challenging open questions to encourage them to extend and deepen their investigations and gave specific feedback on what they needed to target for improvement.
In each of the two schools, I analysed more than 20 students' files from across year groups and ability ranges (as identified by the teachers) in order to check if and how grades were used. I collected examples of comment marking and looked for evidence that students had acted upon the comments. One of the schools used no grades at all and the comment marking was characterized as very specific and almost always included targets for improvement. What distinguished it in particular from the comment marking I have experienced in England was that even positive comments were elaborated to ensure that students were left in no doubt about why a piece of work was so good. For example:

You girls have done a fantastic job!! Not only is your information accurate and well-researched, but you have also successfully completed the extension tasks! Try and keep an eye on the difference between 'endangered' and 'extinct' and watch your spelling. But please keep up this brilliant effort! You have all gone above and beyond in this activity!! Well done!!
The comments, in particular for less high-achieving students, were additionally often characterized by empathy and humour:

L, an excellent effort, I would like to join you in your mission to Mars.

At this school, 11-year-old students said that grades or marks were never given. They felt that this helped them work to their own standard and not worry about comparing themselves to other people. They all claimed to read and act upon the comments and suggested that the teacher was always willing to discuss them. In both schools, students claimed to read and act upon comments written on work, and there was some evidence of this, though there were no specific consistent strategies as observed in a few schools elsewhere, such as keeping a list of comments in the front of a book and expecting students to indicate when and where (indicated by a page reference) these have been acted upon. However, teachers and students identified lessons in which time was allocated to making revisions in response to comments given.
In Denmark (Townshend et al., 2005), one case study school put great emphasis on verbal competencies. Goal-setting and oral feedback were strong features of the formative assessment work. Oral assessment was preferred because it was quick and flexible and allowed an immediate response from the student, enabling misunderstandings to be clarified rapidly. Individual student interviews took place several times a year in order to assess progress and set new goals focusing on subject outcomes, work attitudes and social skills. As in the Queensland schools, the lessons often incorporated periods of reflective feedback from students to teachers which resulted in adjustments to teaching.
Students used logbooks to record their reflections and these were used for teacher and student to enter into a dialogue. Forster and Masters (1996-2001) provide further examples of this in their materials developed in Australia on developmental assessment.
Effective feedback seems to be characterized by specific comments that focus on students' understanding rather than on quality of presentation or behaviour. Oral feedback may allow for greater exploration of understanding and be more immediate, but written comments allow teachers greater flexibility to reflect on students' work and allocate more time to this process, though the dialogue may be delayed and stilted. In contexts with a strong oral tradition, such as the Danish case study school, the balance was more in favour of oral than written feedback. Effective oral questioning and feedback seem to require the teacher to be confident and competent in that subject area and to have the flexibility to 'try another way' in their questioning and feedback strategies in order to ensure that the message has been understood. A much fuller account of this issue can be found in Black et al. (2003).
Self- and peer-assessment
In the PROTIC programme in Quebec (Sliwka et al., 2005), teaching is organized around interdisciplinary projects with a strong emphasis on collaborative group exploration. All projects in the programme make extensive use of ICT for research, reporting and assessing. At the start of each project, students identify their individual learning targets and at regular intervals they are given time for writing reflections on their own learning, their team learning and the achievement of their targets. These written reports are the basis for future target setting and choices. Peer assessment is used to give feedback on each other's regular presentations of work and on teamwork skills. Students reported needing to adjust to the level of autonomy expected compared to their previous schools: 'You understand that you are responsible, you are in charge' (Sliwka et al., 2005: 102).
In Saskatchewan (Sliwka et al., 2005), one school uses electronic learning
portfolios with younger children to record their own learning. They keep exemplary
pieces of work, scan in art work and are taught how to assess their own
work. The same researchers noted that in a school in western Newfoundland
portfolios are similarly used by students to record their best work alongside
reflective journals. In pairs, they use criteria provided by the teacher to give
each other feedback on ways of improving their quality of writing in English.
These practices are used formatively to support continuous learning but may
also contribute to summative purposes of assessment.
Teachers and students in Queensland have had to adapt to the development
of an outcomes-based assessment system in which there is no testing. Teachers
are trying to ensure that the students are aware of and understand the outcome-based
statements and can assess themselves against the standards. When interviewed,
the students described reflection time as a feature of most lessons. This
involved the use of their learning journals in which questions to be addressed
Policy and Practice in Assessment for Learning
included 'what do you understand about ...?' They gave examples of marking
each other's work and giving each other feedback on written work. Self- and
peer-assessment was a strong feature of the lessons observed in Queensland.
Every week there was an allocated time for Year 8 and some Year 9 students to
reflect on their learning, working with others and their experiences, and to
write comments about it in their learning journals. Teachers were allowed to
read these but not allowed to write in them.
In another lesson at this school, at the end of each activity, the students were
invited to assess it on difficulty; student feedback determined whether the
teacher moved on to the next activity or gave a further explanation of the previous
one. Students and other staff interviewed confirmed that reflections of
this type were regularly built into lessons.
In Queensland, peer assessment was less well developed than self-assessment
in the lessons observed. This may reflect the additional demands on teachers
and students of peer assessment and the skills that we have noted elsewhere
needing to be taught (for details of the skills, see Chapters 1 and 5; also Kutnick
et al., 2005). Feedback between students tended to be at the level of whether an
outcome was correct or not, rather than indicating how to improve it. Students
were encouraged to reflect on how effectively they had worked as a group, as
well as how well they had completed the task. One student had entered the following
comment into her journal:
Yesterday my group and I made different shapes of a certain size out of newspaper.
I got frustrated when nobody would listen to me. But we finished a square and two
rectangles. Listen. None of our group members listened to each other. We all had
ideas but wouldn't explain them. Then it would all end up in a mess.
In both schools in Queensland there was a strong ethos of developing lifelong
learners rather than only getting students through school together with the
Senior Certificate they received on leaving. This was reflected in a strong focus on
learning to learn and on students taking responsibility for their own actions. Self-
and peer assessment were seen by students and teachers to be contributing to this
but it was acknowledged that they required skills that had to be taught.
The relationship between student grouping strategies
and assessment for learning
Assessment for learning encourages students to develop greater responsibility for
their own learning but also to regard their peers as a potential resource for learning
and thus to become less dependent on the teacher. Assessment for learning
requires effective group work, in particular in the area of peer assessment. Well-developed
peer assessment is very demanding on students' social and communication
skills, in particular listening, turn-taking, clear and concise verbal and
written expression, empathy and sensitivity. There is substantial evidence that
group work skills need to be taught (for example, see Kutnick et al., 2005 for a
review) and that highly effective group work takes a long time to develop. Teaching
students to self-reflect was observed as follows in a school in Queensland:
Sixteen Year 9 pupils in a PSHE [personal and social health education] lesson
worked in self-selected groups of four. The school had a 'buddying' scheme for
incoming Year 7 pupils whereby they had an identified 'buddy' from Year 10 to help
settle them into school. Most pupils in this Year 9 class had applied to be 'buddies'
and were to be interviewed to see if they are suitable for this role. The teacher asked
the groups to identify what characteristics 'buddies' need. She gave them time
to discuss this and draw up a list. She invited the groups to feed back. She
then invited them to spend 10 minutes working out what questions the interviewers
would ask them to draw out whether they had these skills and how they would
answer these questions. At the end of the third activity and feedback she asked them
to assess how they worked in their groups and invited feedback.
Despite the challenges of group work, students and teachers alike in the two
Queensland schools reported beneficial outcomes of using these strategies. In
interviews in one school the students claimed that they worked in groups for
about half the lessons and that this helped them to develop their understanding
through testing out their ideas, examples and explanations on others. They suggested
that the disadvantages of working in groups included having '... to
work with people you don't like, who hold you back or mess about'. Overall,
they felt that the advantages outweighed the disadvantages and favoured the
mixed-ability groups that they usually experienced:
I reckon it's important to have people working together at different levels, then the
people at higher levels can teach the people at lower levels in their own way. In the
real world you work with different people, you can't choose who you work
with and working with other people you don't know helps ... (Sebba and
Maxwell, 2005: 202-3)
In a school in Denmark (Townshend et al., 2005), 'core groups' provided opportunities
for reflection on goals, effort and outcomes. The students gave each
other oral feedback, recorded their views in their logbooks and evaluated one
another's academic achievements and presentational skills. This was done for
each project. School leaders reported that students were
more competent at reflecting on their own learning, identifying their targets
and engaging in social interaction.
Teacher development
The definition of formative assessment in the OECD study refers to adjusting
teaching in response to feedback about learning and in this sense, formative
assessment and teacher development are inextricably linked, as emphasized in
Chapter 2. Many of the case studies refer to evaluation of teaching. For
example, in one school in Denmark Townshend et aI. (2005) noted that teachers
evaluated subject courses in teams as part of their departmental meetings. This
enabled them to compare the same student's progress in different subjects. The
focus of these evaluations was decided by prior agreement between teacher and
student. In this way, teaching as well as students' progress were assessed.
In Quebec, the university teacher educators acknowledged that in the
PROTIC programme (Sliwka et al., 2005) teachers have to adapt to a shift in
control and responsibility for learning from teacher to student, as noted in
earlier chapters. They are also required to recognize that they are not the only
source of knowledge in the classroom. There is evidence that the PROTIC
approach has had an impact on the teaching approaches used by other teachers,
such as those in Saskatchewan who reported that whereas previously they
had worked in complete isolation, they are now much more interested in
working together and sharing resources and have developed a clear focus on
how students learn since introducing formative assessment. This professional
development issue is developed further in Chapter 2.
Teachers at one school in Queensland (Sebba and Maxwell, 2005) shared
pieces of work and discussed comments they had made on them as well as the
work itself. They reported that this challenged their thinking and developed
their practice. Heads of department saw this as professional behaviour for mod-
eration purposes rather than monitoring of marking for accountability pur-
poses. It was seen as relatively easy to do in a small school in which
departments are not isolated.
In the case study school in Denmark (Townshend et al., 2005) teacher development
is supported through a centre for teaching innovation based in Copenhagen,
established specifically to develop innovatory practices and share these
with schools across Denmark. They plan and provide in-service courses for
teachers, support school improvement in other schools and publish materials in
accessible professional journals. Teachers are challenged by self-evaluation in
that they are concerned they may be insufficiently 'objective' but there is evidence
of a continuing drive for more secure and effective teaching approaches.
School leaders in the case study schools acknowledged the cultural change
required to implement formative assessment strategies effectively.
School improvement contextual factors
The role of leadership
It was a feature of a number of the schools in the OECD studies that the head
teacher or principal had recently changed, and that the school had been restructured
and significant new whole-school initiatives introduced. New managers
often provided the impetus for change or were appointed to provide this, but
frequent changes to senior management teams can be a threat to longer-term
sustainability of new initiatives, and assessment for learning strategies are not
exempt from this. There was evidence in one of the schools in Queensland that
the changes brought about through assessment for learning, including student
self-reflection, group work and comment marking, had become embedded in
the infrastructure of the school and could thereby 'survive' some degree of
senior management change.
Developing schools as learning organizations
The schools where significant progress had been made were characterized by
an ongoing commitment to further development. They did not express views
suggesting they thought they had reached their targets and could reduce their
energies. Consistent with research on developing schools as learning communities
(for example, McMahon et al., 2004), these schools recognized the importance
of engaging with the wider community beyond teachers, with many, for
example, having well-developed mechanisms for ongoing dialogue with
parents about formative assessment. The emphasis on teachers working in
teams and helping to develop one another seems to have been another feature
of these schools, which is an issue considered in Chapter 2.
Students as agents of change
There was some evidence of students' awareness that their school was different
to others and of linking this to aspects of formative assessment. For example, in
one school in Queensland students reported that teaching strategies compared
very favourably to those used in other schools attended by their friends, suggesting
that other schools relied more heavily on worksheets and students
received less full explanations from teachers. Students in one of the Denmark
schools noted that they enjoyed better relationships with teachers and that
instead of just 'getting grades', they were engaged in a process with their teachers
which enabled them to get to know them better and to discuss expectations.
There was, however, little evidence from the OECD case studies of well-developed
examples of students acting as change agents in their schools in the
realm of formative assessment. Fielding (2001) has proposed levels of engagement
for students in schools that enable them to become 'true' contributors to
or even leaders of change. For example, students might be expected to challenge
the school on why peer assessment opportunities provided in one subject
were not created in other subjects. This would seem to be a potential next development
for the schools.
The impact and relevance of policy
Australia has a national curriculum framework based on agreed national goals
stated as providing young Australians with the knowledge, skills, attitudes and
values relevant to social, cultural and economic needs in local, national and international
settings. This includes a commitment to an outcomes-based approach
across eight key learning areas and an emphasis on developing lifelong learners.
The Queensland Studies Authority published principles (QSA, 2005) that emphasize
that assessing students is an integral part of the teaching and learning
process and that opportunities should be provided for students to take responsibility
for their own learning and self-monitoring. The syllabus documents recommend
that assessment be continuous and ongoing and be integrated into the
learning cycle - that is, provide the basis for planning, monitoring student
progress, providing feedback on teaching and setting new learning targets. One
of the principles developed by the Assessment Reform Group (ARG, 2002a), and
presented in the Introduction, relates to the need for assessment for learning to be
part of effective planning for teaching and learning.
There is no external testing or examining in secondary schools
in Queensland. Reporting in Years 1-10 is currently a school responsibility and
is unmoderated. School-based assessments for the Senior Certificate (Year 12)
are currently moderated by subject-based panels of expert teachers, providing
advice to schools on the quality of their assessment and judgments based on
sample portfolios.
There are important contextual policy factors that seem likely to support the
practices observed in the two Queensland schools. Perhaps the national curriculum
framework, less prescriptive than that in England, is a useful contextual
factor enabling a strong focus on formative assessment by reducing the
necessity for teachers to determine what to teach. The lack of external tests and
examinations passed without comment in the teacher interviews in the Queensland
schools, yet as a systematic review of research on this has shown (Harlen
and Deakin Crick, 2003), high-stakes testing and publication of results are associated
with teachers adopting a teaching style that favours transmission of
knowledge. This not only reduces the use of teaching approaches consistent
with assessment for learning but is likely to consume energy that
teachers could instead direct into assessment for learning. Finally, the status
ascribed to teacher summative assessment in the Queensland system suggests
that formative assessment is better recognized than in some other systems, for
both its contribution to learning and to summative assessment. Its role as a key
professional skill for teachers, as advocated in the principles in the Introduction,
is recognized in this system.
In Denmark, the 2003 Education Act introduced an outcomes-based curriculum
defining competencies for all students. Schools are required to publish
annually the results of average grades on their websites, though these seemed
not to take account of prior attainment and were therefore regarded by those
interviewed as reflecting intake and not effectiveness. There was no evidence
that this accountability framework was inhibiting the developments in formative
assessment in the schools. At the time the case studies were conducted
(2002-3) the Ministry of Education had just changed the definition of the role of
headteachers, so in one of the case study schools the head had used formative
assessment as part of the change strategy adopted.
Educational policy in Canada is set at province/territory level. At federal
level, monitoring across the provinces takes place but curricular guidelines are
produced in the provinces and territories and these often emphasize learning to
learn skills. In western Newfoundland, the mandate from the Department of
Education and school district that required test (attainment) data to be the basis
for school improvement has influenced the developing focus on analysing
progress and addressing the needs of 'weaker' students, partly through forma-
tive assessment. Initial resistance was followed by a gradual change in culture
and analysing data is now a key focus of staff development activities, closely
linked to evaluation by teachers with individual students. This reflects the
Assessment Reform Group principle (ARG, 2002a) of promoting a commitment
to a shared understanding of the criteria by which students are assessed. The
tension for teachers remains how to reconcile the demands of summative
testing with formative assessment, though this does not seem to be severely
limiting developments in assessment for learning.
Conclusion
Despite considerable differences in cultural contexts across the OECD case
study schools, what teachers do at the classroom level may be surprisingly
similar. For example, the feedback provided to students on their work, the
development of self- and peer-assessment and the implications of formative
assessment for group work have overlapping practices across countries. Perceptions
of these, however, by students, teachers, senior school managers and
teacher educators may differ as a result of the considerable differences in
national policy contexts. The national assessment and curriculum framework,
accountability mechanisms and underlying values reflected in these about the
purposes of education, lifelong learning skills and skills for employability, will
enhance or inhibit to different degrees the teachers' capacity to adopt, implement
and sustain formative assessment practices. Furthermore, as has been the
experience of school improvement in general, schools that are effective at implementing
a specific strategy, in this case formative assessment, are often those
which can redirect the requirements and additional resources made available
through national policies to support their established plans.
Note
1 I would like to acknowledge the extensive writing and editing that Janet
Looney of OECD contributed to the report.
Assessment for Learning: A Compelling
Conceptualization
John Gardner
At a seminar in 1998, hosted by the Nuffield Foundation at their London headquarters,
the Assessment Reform Group launched the Black and Wiliam review
pamphlet Inside the Black Box. The review itself, and the pamphlet, immediately
attracted critical acclaim and have continued to enjoy significant impact on
assessment thinking throughout the UK and further afield to the present day.
However, one moment in the event sticks out clearly in my memory. After the
main presentation, a senior educational policy maker stood up and declared
that he had heard it all before; we had nothing new to offer. Indicating, with a
glance at his watch, that he had commitments elsewhere, he promptly left the
seminar before the discussion proper got underway. My immediate urge was to
rush after him and say 'Yes, you are absolutely right! But it seems to us that,
powerful as it might be, formative assessment is actually off the schools' and
policy-makers' radar! Surely we need to do something quite urgently if we are
to reap the benefits we know are there?' I resisted the urge and instead a year
later, at the same venue and with the same sponsors, we injected the urgency
we all felt was needed. We launched the pamphlet Assessment for Learning:
Beyond the Black Box. This pamphlet deliberately and directly challenged official
complacency and inertia.
Six years on, the Assessment Reform Group can now record an impressive
list of dissemination successes and official endorsements of assessment for
learning from, for example, the Scottish and Welsh governments, the
curriculum and assessment agencies of England, Scotland, Wales and
Northern Ireland, and from overseas jurisdictions as diverse as Hong Kong
and the Canadian province of Alberta. However, in contrast to the situation in
Scotland, Wales and Northern Ireland, the policy agenda in England remains
somewhat hamstrung, with an accountability focus driving assessment policy
and specifically with schools being evaluated on the basis of the performance
of their students on external assessments. The ensuing and controversial
'league tables', which purport to indicate the relative quality of education in
the schools concerned, arguably increase the emphasis on 'teaching to the test'
as schools focus on raising their students' performance in external tests and
assessments. There is evidence that the richness of the delivered curriculum
suffers and that the pedagogic techniques associated with assessment for
learning are neglected.
Paradoxically, assessment for learning's central message, prompted by the
research review of Black and Wiliam (1998a) and disseminated vigorously by
the Assessment Reform Group, is that overall standards and individual performance
may be improved by actually emphasizing formative assessment
techniques such as student self-assessment, negotiation of learning goals and
feedback to identify next steps. This message is now squarely on the 'radar' of
English schools as it continues to generate interest at the grassroots level,
attracting official endorsement in major areas such as the Key Stage 3 national
curriculum and in the professional development publications of the Qualifications
and Curriculum Authority (QCA, 2004).
Much progress is therefore being made, but let me return for a moment to the
observations made by our disappointed seminar guest above. I readily concede
that the principles and processes of assessment for learning are not novel in any
real sense; indeed they have a fairly lengthy pedigree in curriculum and assessment
developments in the UK. I could reflect on Harry Black's work with teachers
in the early 1980s (Black, 1986) or I could cite the work by Harlen that led to
the publication of professional development materials under the title Match and
Mismatch (Harlen et al., 1977), to illustrate the point. Such sources would be in
keeping with the book's primary focus on schools but I will illustrate the
breadth of recognition of the principles we espouse with an example from post-compulsory
(vocational) education. The quotation that follows could conceivably
have appeared at any time in the last seven years since the publication of
Inside the Black Box (Black and Wiliam, 1998b) and the subsequent Assessment
Reform Group outputs: Assessment for Learning: Beyond the Black Box (ARG,
1999); Assessment for Learning: 10 Principles (ARG, 2002a) and Testing, Motivation
and Learning (ARG, 2002b).
However, the quotation I reproduce below was actually written in 1986 by
Pring as part of an analysis of developments in vocational curricula, initially
sponsored by the 1979 publication of the Department of Education and
Science's Further Education Unit's A Basis for Choice. He argued that a number
of implications for assessment had begun to emerge in the wake of the various
initiatives in post-compulsory qualifications and summarized them as follows:
First, what had to be assessed was different. A curriculum that stresses personal
development, social awareness, cooperative learning, problem solving, is seeking to
assess different qualities from those in traditional forms of examination.
Secondly, the purpose of assessment was different ... the main purpose of
assessment was the diagnosis of learning needs with a view to promoting the
process of learning. It is difficult to provide well-informed guidance, and consequent
negotiation of further learning experiences, without some assessment of
what students know or can do. Therefore, it was recommended that the assessment
should be part of a continuous, formative profile of the experiences and
achievements of the student. Furthermore, it was envisaged that this profile would
be the basis of regular teacher/student discussion and guidance of educational
progress. ... The radical difference lies not only in the content of what is taught but
also in the processes of learning and thus the demands upon assessment. In its
Resources Sheet ... the Joint Board [City and Guilds of London Institute and the
Business and Technician Education Council] says:
'If the individual student is to be enabled to make the most of his/her programme,
the quality of the assessment system and its link with supportive guidance will be
critical. Most of the assessing will be formative; that is, a regular feedback on performance
to the students from all those involved ... '
Assessment is thus tied to guidance, negotiation, and the assumption of responsibility
for one's own learning. (Pring, 1986: 13-14, emphases in original)
There are many such examples, over time, of the acceptance that the classroom
assessment techniques comprising assessment for learning are broadly 'good
things' to do. However, the specific intention of this book has been to ground
this 'goodness' in a credible argument that draws its authority and explanatory
power from sound empirical and theoretical contexts. The central arguments
have emerged in various ways throughout the chapters, using research evidence
and theory to explain and support the points made. We attempted
to address the specific education-related aspects of assessment for learning but
clearly there are many more contextual issues that have a bearing on practice,
policy and indeed perception. These include the dominance in some quarters of
summative assessment and the use of data from student assessments for
accountability purposes. The various educational and contextual key issues
may be categorized as follows:
Classroom pedagogy;
The essence of assessment for learning;
Motivation of learners;
A theory of assessment for learning;
Assessment myths, misunderstandings and tensions;
The complexity of influencing policy and policy makers.
Classroom pedagogy
In Chapters 1 and 2, Black and Wiliam, and James and Pedder, respectively,
relate insights gained from major research projects into the practice of assessment
for learning in the classroom. Chapter 1 offered examples of
techniques for feedback, for self-assessment and classroom questioning which
were developed by the teachers, but it particularly struck a chord in illustrating
how '... applying research in practice is much more than a simple process of
translating the findings into the classroom'. In true developmental fashion the
application in practice led directly to more research insights, especially in the
context of teachers' professional development. In Chapter 2, James and Pedder
took up the professional development baton by conceptualizing it in terms of
the ten principles for assessment for learning, with a specific focus on the principle
that it should be regarded as a key professional skill for teachers.
However, they issue stern warnings that radical changes are needed in teaching
and learning roles, that teachers need to learn new practices and that the
various changes need to be encouraged by a supportive culture of continuous
professional development. Their warnings continue: teachers' learning is not
straightforward and there are serious dangers of much of the assessment for
learning gains translating to superficial practice if teachers do not engage
actively with the ideas and practices, and if the environments in which they
work do not actively encourage inquiry-based modes of professional development
in the classroom. To paraphrase one of their messages, a true assessment
for learning context for teachers is one in which they take responsibility for all
aspects of their professional development, giving new meaning to the old
expression 'self-taught'.
The essence of assessment for learning
Almost every chapter in this book addresses at least some of the core issues of
assessment for learning, including practice and theory and what some might
term its antithesis - assessment of learning or summative assessment. However,
it is the policy chapter by Sebba (Chapter 11) that particularly focuses on the
commonality of practice in formative assessment across several national and
cultural boundaries. Key ingredients of assessment for learning including peer
and self-assessment, feedback to support learning and effective questioning are
all in evidence, but here too Sebba issues several warnings.
These include an echo of Black and Wiliam's and James and Pedder's identification
of the crucial need to ensure appropriate teachers' professional development.
In terms of the students themselves, she also identifies the need to
teach peer (specifically group) assessment skills in areas such as verbal expression,
sensitivity, turn-taking and listening. Sebba's chapter demonstrates a
welcome harmony in aspirations and understanding relating to formative
assessment across a variety of cultures and organizational contexts.
Motivation of learners
A theme that plays out through every successful instance of assessment for
learning is the motivation to learn that it generates among students. It is
arguably uncontentious that students are motivated to learn if they participate
in developing their learning activities, if they know how their work will be
assessed and if they are involved in assessing it with their peers. Perhaps it is
also unnecessary to point out that students' motivation is enhanced by the
ability to engage readily with their teacher, to receive feedback that supports
the next steps in their learning or by being involved in drawing up the criteria
against which they will be assessed.
Why then are these classroom processes not more commonplace? Harlen's
Chapter 4 does not seek to answer this question but she focuses research
evidence on the types of circumstances in which assessment has deleterious
effects on students' motivation and provides the Assessment Reform Group's
conclusions on how the worst effects can be avoided and the motivation of
learners enhanced.
A theory of assessment for learning
A central aim of the book has been to explore a theoretical understanding of
assessment for learning and James's Chapter 3 provides an accessible foray into
the main learning theories from which a putative theory of formative assessment
might spring. This daunting task is then taken up by Black and Wiliam in
Chapter 5. Grounding their tentative theory on the basis of the experiences of
the KMOFAP project, and supported by Engeström's Activity System theory,
they make no bones about the complexity of the task. That said, the four-component
model they expound in this chapter offers an approach to theorizing
assessment for learning. Black and Wiliam are, however, the first to concede
that much more needs to be done in terms of understanding practice before a
more comprehensive theory can be achieved.
Assessment myths, misunderstandings and tensions
I see the offerings in this category as addressing the contextual factors that impinge on assessment for learning and its practices. First there is the relationship between assessment for learning and assessment of learning, or put another way, between formative and summative assessment. Harlen's Chapter 6 is about purposes and the manner in which the purpose ordained for any specific assessment activity will ultimately distinguish it as serving learning (assessment for learning) or as providing a measure of learning (summative assessment). This leads to a variety of tensions, not least about whether one assessment can provide evidence for both of these purposes. The optimistic view, that perhaps it can, is unpicked in some detail by Harlen, who concludes that there is an asymmetrical relationship between evidence gathered for summative and formative purposes, and that this curtails the opportunity for dual usage.
Purposes are also key determinants in the arguments for reliability and validity and this link leads to these underpinning concepts being examined in Chapters 7 (Black and Wiliam) and 8 (Stobart) respectively. Black and Wiliam's chapter provides a grounding in reliability theory, illustrated by examples, which debunks the myth that external summative testing is de facto reliable. If there exists criticism that assessment for learning is inherently unreliable because it involves teachers with their subjective biases, for example, then the antithesis, that summative assessment is reliable because it is more 'objective', is seriously undermined. Stobart's message is equally unequivocal. He draws out the purpose of the assessment as the main driver in determining its validity. Formative assessment is by definition designed to support learning. To be considered valid, then, it must lead to further learning. It is as simple as that. Or is it?
Stobart's chapter is liberally sprinkled with caveats and cautions. The 'upstream' factors of national culture and social context can create circumstances in which some of the key assessment for learning features might actually threaten the learning support they are otherwise designed to deliver. For example, in cultures where it is the norm, individual feedback may enhance
learning, while in others where task-related feedback is expected it may impact negatively on the learners. Peer assessment may simply be unacceptable in some cultures. Where high-stakes transitions exist (for example, entry to university) comment-only feedback may be seriously contentious and may struggle to achieve its aim of supporting anxious students (or their parents!) in the next steps in their learning. Nevertheless, as a first approximation, the basic tenet of successfully supporting learning remains the main validity check for assessment for learning.
The complexity of influencing policy-makers
Herein lies the rub. It often matters little if academic researchers 'know what's right' since the resources and other support, which they or practitioners might need, may not be forthcoming if the prevailing policy is blind to the issues, or worse, understands them and deliberately ignores them. Even if the research is irrefutable, Rist warns that:

We are well past the time when it is possible to argue that good research will, because it is good, influence the policy process. That kind of linear relationship of research to action is not a viable way to think about how knowledge can inform decision making. The relation is both more subtle and more tenuous. (2000: 1002)
It certainly helps if the professionals find ways of doing what they think is 'right' anyway and in the UK a variety of bottom-up pressures from grassroots practitioners, not least through the activities of the Assessment Reform Group and our many collaborators, has brought considerable success in influencing policy development. But the process is exceedingly complex.
Daugherty and Ecclestone's Chapter 9 spells out the complexity underlying the successes of assessment for learning to date, with theory-based explanation where appropriate and with factual detail in terms of the four countries of the UK. In essence they argue that the process of change is slow but once it grips, governments can enact quite radical policies. Wiliam's Chapter 10 paints an entirely different picture of the assessment agenda in the USA. Here change is more or less limited to refinements of long-established variations of summative assessment, much of it geared to high-stakes selection. Assessment for learning is barely on the horizon, queued behind other learning-oriented activities such as portfolio assessment. Where formative assessment does appear, it is more or less a collation of frequent or continuous assessments (for example, in a portfolio) that constitute a form of summative assessment, albeit perhaps in a more valid manner than an end-of-year test.
Assessment for learning: the concept
Any book covering the practice, theory and policy relating to a given educational concept might conceivably claim to provide a comprehensive analysis of
that concept. We do not make such a claim for this book on assessment for learning because the extent of existing knowledge and understanding of such a complex process and set of techniques is still in its early stages. We might claim, however, to have assembled an authoritative account of what is known today, however inadequate the extent of this knowledge and understanding might be. Drawing as it does on the work of many researchers and practitioners, as well as our own, this is not an unreasonable claim. We will leave this for others to judge. What we can say categorically about assessment for learning, however, is that it is more often than not a fundamental element of any successful learning context.
A deep appreciation of this fact was brought home to me very clearly in a recent presentation I attended on assessment for learning. The presenters were two teachers, Margo Aksalnik and Bev Hill, from a Rankin Inlet school in the Nunavut Territory, a new territory established in northern Canada in 1999. The main illustration in the talk was of the national symbol of the Inuit people, the Inukshuk. An Inukshuk is a person-like construction of medium-sized rocks, which has been used by the Inuit people for millennia as a means of guiding wayfarers in the treeless and landmark-less expanses of northern Canada. Their various uses include giving directions to good fishing waters or simply reassuring the wayfarer that others have passed the same way, and that they are on the right path. A reproduction of the illustrative model used by the two teachers is presented in Figure 12.1.
Figure 12.1 The Inukshuk model of a school with a culture of success
As can be seen, they placed assessment for learning squarely in the set of main ingredients designed to create a school with a culture of success. The other elements included teachers, their planning of the learning activities, their teaching and assessment strategies, their capacity to reflect about their own and their students' learning, and the resources they bring to the learning environment. Outside of the classroom, additional elements include professional development and team support for the teachers while outside of the school, the positive involvement of parents adds to the recipe for success.
It is arguable that other aspects of a successful school could be found to populate the Inukshuk's frame; successful sporting programmes or a students' council for example. No doubt they and other features of successful schools are also somewhere within the model, but the community-based context in which the two teachers introduced assessment for learning to their school dispelled any notion that its inclusion in the Inukshuk was either whimsical or contrived for the event (a seminar on assessment for learning on Vancouver Island). They recounted that:
rlXountcd that:
The Elders mel 10 consider tlrese new approaches and had the conapt of assessment
for learning e.tplailled to them. They then came lip with a word to identify tire
dynamic - the rCSOllanCe - behllCen teaching, learning and assessment. (AkSlllnik
and Hill, 2(04)
This new word, in the Inuktitut language of the Inuit, is written in Roman form as Ilitaunikuliriniq (or in sound form: nee-qu-lee-ree-nee-kay). Most non-Inuit educationalists will have difficulty articulating this word but they will not fail to empathize with the assessment for learning aspirations of this small community in Canada's frozen north.
Conclusion
Throughout all of the text in this book, the aim has been to argue the case for the importance of assessment as a means of enhancing learning. The argument has been backed up by reasoned explanation, empirical evidence and theoretical analysis. Borrowing Fullan et al.'s phrase, we offer what we hope is a 'compelling conceptualization' (2004: 43) of a type of assessment that is specifically designed to serve learning - and which impacts positively on three key areas of education: classroom pedagogy, the quality of students' learning experiences and the insights that underpin assessment policy formation.
References
AAIA (2005a) A Critique of the Assessment for Learning Materials in 'Excellence and Enjoyment: Learning and Teaching in the Primary Years'. Birmingham: Association for Achievement and Improvement through Assessment.
AAIA (2005b) Managing Assessment for Learning. Birmingham: Association for Achievement and Improvement through Assessment.
ACCAC (2004) Review of the School Curriculum and Assessment Arrangements 5-16: A report to the Welsh Assembly Government. Cardiff: Qualifications, Curriculum and Assessment Authority for Wales.
ACT (2004) Available at http://www.act.org/news/data/04/data.html
AERA/APA/NCME (1985) Standards for Educational and Psychological Testing. American Educational Research Association/American Psychological Association/National Council on Measurement in Education. Washington, DC: American Psychological Association.
Aksalnik, M. and Hill, B. (2004) Oral presentation to 'Making Connections', Assessment for Learning Symposium, Vancouver Island: Classroom Connections, Courtenay, July.
Alexander, R. (2000) Culture and Pedagogy: International comparisons in primary education. Oxford: Blackwell.
Alexander, R. (2004) Towards Dialogic Teaching: Rethinking classroom talk. 2nd edn. Cambridge: Dialogos.
Ames, C. (1984) 'Achievement attributions and self-instructions under competitive and individualistic goal structures'. Journal of Educational Psychology, 76: 478-87.
Ames, C. (1992) 'Classrooms: goals, structures and student motivation'. Journal of Educational Psychology, 84 (3): 261-71.
Ames, C. and Archer, J. (1988) 'Achievement goals in the classroom: students' learning strategies and motivation processes'. Journal of Educational Psychology, 80: 260-67.
Anderson, L. W. and Bourke, S. F. (2001) Assessing Affective Characteristics in the Schools. 2nd edn. Mahwah, NJ: Lawrence Erlbaum.
Angoff, W. H. (ed.) (1971) The College Board Admissions Testing Program: A technical report on research and development activities relating to the Scholastic Aptitude Test and achievement tests. 2nd edn. New York: College Entrance Examination Board.
ARG (1999) Assessment for Learning: Beyond the black box. University of Cambridge: Assessment Reform Group.
ARG (2002a) Assessment for Learning: 10 principles. University of Cambridge: Assessment Reform Group.
ARG (2002b) Testing, Motivation and Learning. University of Cambridge: Assessment Reform Group.
ASF (2004) Working Paper 2 (draft): Summative Assessment by Teachers: Evidence from research and its implications for policy and practice. Assessment Systems for the Future Project (see ARG website: http://www.assessment-reform-group.org).
Askew, M., Brown, M., Rhodes, V., Johnson, D. C. and Wiliam, D. (1997) Effective Teachers of Numeracy: Final report. London: King's College, School of Education.
Askew, S. and Lodge, C. (2000) 'Gifts, ping-pong and loops - linking feedback and learning', in S. Askew (ed.), Feedback for Learning. London: RoutledgeFalmer.
Atkinson, R. C. (2001) The 2001 Robert H. Atwell Distinguished Lecture. Paper presented at the 83rd Annual Meeting of the American Council on Education, Washington, DC. Oakland, CA: University of California.
ATL, NUT and PAT (2004) Assessment for Learning: The future of national curriculum assessment: A way forward. London: Association of Teachers and Lecturers/National Union of Teachers/Professional Association of Teachers.
Ayres, L. P. (1918) 'History and present status of educational measurements', in S. A. Courtis (ed.), The Measurement of Educational Products. The Seventeenth Yearbook of the National Society for the Study of Education. Bloomington, IL: Public School Publishing Company. pp. 9-15.
Baird, J. R. and Northfield, J. R. (1992) Learning from the PEEL Experience. Melbourne: Monash University.
Baker, K. (1993) The Turbulent Years: My life in politics. London: Faber and Faber.
Ball, D. L. and Bass, H. (2000) 'Interweaving content and pedagogy in teaching and learning to teach: knowing and using mathematics', in J. Boaler (ed.), Multiple Perspectives on Teaching and Learning. Westport, CT: Ablex. pp. 83-104.
Ball, S. (1990) Politics and Policy-Making in Education: Explorations in policy sociology. London: Routledge.
Ball, S. (1994) Education Reform: A critical and post-structural approach. Buckingham: Open University Press.
Ball, S. (2000) 'Performativities and fabrications in the education economy: towards the performative society'. Australian Educational Researcher, 27 (2): 1-24.
Bates, R. and Moller, J. (2003) Feedback to a Year 7 Pupil. Unpublished research, Stockport Local Education Authority.
Benmansour, N. (1999) 'Motivational orientations, self-efficacy, anxiety and strategy use in learning high school mathematics in Morocco'. Mediterranean Journal of Educational Studies, 4: 1-15.
Bennis, W. G., Benne, K. D. and Chin, R. (1961) The Planning of Change. London: Holt, Rinehart and Winston.
Berlak, H., Newmann, F. M., Adams, E., Archbald, D. A., Burgess, T., Raven, J. and Romberg, T. A. (1992) Towards a New Science of Educational Testing and Assessment. Albany, NY: State University of New York Press.
Bevan, R. (2004) 'From black boxes to glass boxes: the application of computerised concept-mapping in schools'. Paper presented at the Teaching and Learning Research Programme Annual Conference, Cardiff, November.
Biggs, J. (1996) 'Enhancing teaching through constructive alignment'. Higher Education, 32.
Biggs, J. (1999) 'What the student does: teaching for enhanced learning'. Higher Education Research and Development, 18 (1): 57-75.
Biggs, J. and Tang, C. (1997) 'Assessment by portfolio: constructing learning and designing teaching'. Paper presented at the annual conference of the Higher Education Research and Development Society of Australasia, Adelaide, July.
Binet, A. and Simon, T. (1911) 'La mesure du développement de l'intelligence chez les enfants'. Bulletin de la Société libre pour l'étude psychologique de l'enfant, 70-1.
Bishop, R. and Glynn, T. (1999) Culture Counts: Changing power relations in education. New Zealand: Dunmore Press.
Black, H. (1986) 'Assessment for learning', in D. J. Nuttall (ed.), Assessing Educational Achievement. London: Falmer. pp. 7-18.
Black, P. (1963) Bulletin of the Institute of Physics and the Physical Society, 202-3.
Black, P. (1990) 'APU Science - the past and the future'. School Science Review, 72 (258): 13-28.
Black, P. (1993) 'Formative and summative assessment by teachers'. Studies in Science Education, 21: 49-97.
Black, P. (1995) 'Ideology, evidence and the raising of standards'. Annual Education Lecture. London: King's College.
Black, P. (1997) 'Whatever happened to TGAT?', in C. Cullingford (ed.), Assessment vs. Evaluation. London: Cassell. pp. 24-50.
Black, P. and Wiliam, D. (1998a) 'Assessment and classroom learning'. Assessment in Education, 5: 7-71.
Black, P. and Wiliam, D. (1998b) Inside the Black Box: Raising standards through classroom assessment. London: King's College (see also Phi Delta Kappan, 80: 139-48).
Black, P. and Wiliam, D. (2002) Standards in Public Examinations. London: King's College, School of Education.
Black, P. and Wiliam, D. (2003) 'In praise of educational research: formative assessment'. British Educational Research Journal, 29 (5): 623-37.
Black, P. and Wiliam, D. (2005) 'Changing teaching through formative assessment: research and practice: the King's-Medway-Oxfordshire Formative Assessment Project', in OECD, Formative Assessment: Improving learning in secondary classrooms. Paris: OECD.
Black, P., Harrison, C., Lee, C., Marshall, B. and Wiliam, D. (2002) Working Inside the Black Box: Assessment for learning in the classroom. London: NFER Nelson.
Black, P., Harrison, C., Lee, C., Marshall, B. and Wiliam, D. (2003) Assessment for Learning: Putting it into practice. Buckingham: Open University Press.
Black, P., Harrison, C., Osborne, J. and Duschl, R. (2004) Assessment of Science Learning 14-19. London: Royal Society (www.royalsoc.ac.uk/education).
Bolon, C. (2000) 'School-based standard testing'. Education Policy Analysis Archives, 8 (23) at http://epaa.asu.edu/epaa/v8n23/
Bowe, R., Ball, S. with Gold, A. (1992) Reforming Education and Changing Schools: Case studies in policy sociology. London: Routledge.
Bransford, J. D., Brown, A. L. and Cocking, R. R. (2000) How People Learn: Brain, mind, experience and school. Washington, DC: National Academies Press.
Bredo, E. (1994) 'Reconstructing educational psychology'. Educational Psychologist, 29 (1): 23-45.
Bredo, E. (1997) 'The social construction of learning', in G. D. Phye (ed.), Handbook of Academic Learning: Construction of knowledge. San Diego, CA: Academic Press.
Brigham, C. C. (ed.) (1926) Scholastic Aptitude Tests: A manual for the use of schools. New York, NY: College Entrance Examination Board.
Broadfoot, P. (1986) Profiles and Records of Achievement: A review of issues and practice. London: Holt, Rinehart and Winston.
Broadfoot, P. (1996) Education, Assessment and Society. Buckingham: Open University Press.
Broadfoot, P. (2000) 'Empowerment or performativity? Assessment policy in the late twentieth century', in R. Phillips and J. Furlong (eds), Education, Reform and the State: Twenty-five years of politics, policy and practice. London: RoutledgeFalmer. pp. 136-55.
Broadfoot, P., Osborn, M., Gilly, M. and Bucher, A. (1993) Perceptions of Teaching: Primary school teachers in England and France. London: Cassell.
Brookhart, S. and DeVoge, J. (1999) 'Testing a theory about the role of classroom assessment in student motivation and achievement'. Applied Measurement in Education, 12: 409-25.
Broome, E. C. (1903) A Historical and Critical Discussion of College Admission Requirements. New York, NY: Macmillan.
Brousseau, G. (1984) 'The crucial role of the didactical contract in the analysis and construction of situations in teaching and learning mathematics', in H. G. Steiner (ed.), Theory of Mathematics Education: ICME 5 topic area and miniconference. Bielefeld, Germany: Institut für Didaktik der Mathematik der Universität Bielefeld. pp. 110-19.
Bruner, J. (1996) The Culture of Education. Cambridge, MA: Harvard University Press.
Bryce, T. (1999) 'Could do better? Assessment in Scottish schools', in T. Bryce and W. Humes (eds), Scottish Education. Edinburgh: Edinburgh University Press. pp. 709-20.
Bryce, T. and Humes, W. (1999) Scottish Education. Edinburgh: Edinburgh University Press.
Butler, D. L. and Winne, P. H. (1995) 'Feedback and self-regulated learning: a theoretical synthesis'. Review of Educational Research, 65 (3): 245-81.
Butler, R. (1988) 'Enhancing and undermining intrinsic motivation: the effects of task-involving and ego-involving evaluation on interest and performance'. British Journal of Educational Psychology, 58: 1-14.
Butler, R. (1992) 'What young people want to know when: effects of mastery and ability goals on interest in different kinds of social comparison'. Journal of Personality and Social Psychology, 62: 934-43.
Butler, R. and Neuman, O. (1995) 'Effects of task and ego-achievement goals on help-seeking behaviours and attitudes'. Journal of Educational Psychology, 87 (2): 261-71.
Callaghan, D. (1995) 'The believers: politics, personalities in the making of the 1988 Education Act'. History of Education, 24 (4): 369-85.
Cameron, J. and Pierce, D. P. (1994) 'Reinforcement, reward, and intrinsic motivation: a meta-analysis'. Review of Educational Research, 64 (3): 363-423.
Carless, D. (2005) 'Prospects for the implementation of assessment for learning'. Assessment in Education, 12 (1): 39-54.
Carter, C. R. (1997) 'Assessment: shifting the responsibility'. The Journal of Secondary Gifted Education, 9 (2): 68-75.
Cattell, J. M. (1890) 'Mental tests and measurements'. Mind, 15: 373-81.
Cattell, J. M. and Farrand, L. (1896) 'Physical and mental measurements of the students of Columbia University'. Psychological Review, 3: 618-48.
CCEA (2003) Pathways - Proposals for Curriculum and Assessment at Key Stage 3. Belfast: Council for the Curriculum, Examinations and Assessment.
CCEA (2004) The Revised Northern Ireland Primary Curriculum: Stages 1 and 2. Belfast: Council for the Curriculum, Examinations and Assessment.
Chadwick, E. (1864) 'Statistics of educational results'. The Museum, 3: 479-84.
Chaiklin, S. (2005) 'The zone of proximal development in Vygotsky's analysis of learning and instruction', http://www.education.miami.edu/blantonw/mainsite/componentsfromclmer/Components/ChaiklinTheZoneOfProximalDevelopmentInVygotsky.html.
Chickering, A. W. (1983) 'Grades: one more tilt at the windmill'. American Association for Higher Education Bulletin, 35: 10-13.
Choppin, B. and Orr, L. (1976) Aptitude Testing at 18+. Windsor: NFER Publishing.
Clarke, S. (1998) Targeting Assessment in the Primary School. London: Hodder and Stoughton.
Clarke, S. (2001) Unlocking Formative Assessment. London: Hodder and Stoughton.
Clarke, S. (2005) Formative Assessment in the Secondary Classroom. London: Hodder Murray.
College Board (2004) College-bound Seniors: A profile of SAT program test-takers. New York, NY: College Entrance Examinations Board.
Conant, J. B. (1940) 'Education for a classless society: the Jeffersonian tradition'. The Atlantic, 165 (5): 593-602.
Conner, C. and James, M. (1996) 'The mediating role of LEAs in the interpretation of government assessment policy at school level in England'. Curriculum Journal, 7 (2): 153-66.
Corbett, H. D. and Wilson, B. L. (1991) Testing, Reform and Rebellion. Hillsdale, NJ: Ablex.
Cowie, B. (2004) Commentary on formative assessment. Paper presented at the annual meeting of the National Association for Research in Science Teaching, Vancouver, March.
Cowie, B. and Bell, B. (1999) 'A model of formative assessment in science education'. Assessment in Education, 6 (1).
Crooks, T. J. (1988) 'The impact of classroom evaluation practices on students'. Review of Educational Research, 58: 438-81.
Crooks, T. J. (2001) 'The validity of formative assessments'. Paper presented at the Annual Conference of the British Educational Research Association, Leeds.
Crooks, T. J., Kane, M. T. and Cohen, A. S. (1996) 'Threats to the valid use of assessments'. Assessment in Education, 3 (3): 265-85.
Crossley, M. (1984) 'Strategies for curriculum change and the question of international transfer'. Journal of Curriculum Studies, 16: 75-88.
Cumming, J. and Maxwell, G. S. (2004) 'Assessment in Australian schools: current practice and trends'. Assessment in Education, 11 (1): 89-108.
Dale, R. (1994) 'Applied education politics or political sociology of education: contrasting approaches to the study of recent education reform in England and Wales', in D. Halpin and B. Troyna (eds), Researching Education Policy: Ethical and methodological issues. London: Falmer.
Daugherty, R. (1995) National Curriculum Assessment: A review of policy 1987-1994. London: Falmer.
Daugherty, R. (2000) 'National Curriculum assessment policies in Wales: administrative devolution or indigenous policy development?'. The Welsh Journal of Education, 9 (2): 4-17.
Daugherty, R. (2004) Learning Pathways through Statutory Assessment: Final report of the Daugherty Assessment Review Group. Cardiff: Welsh Assembly Government.
Daugherty, R. and Elfed-Owens, P. (2003) 'A national curriculum for Wales: a case study of policy-making in the era of administrative devolution'. British Journal of Educational Studies, 51 (3): 233-53.
Davidson, J. (2004) Statement in plenary session of the National Assembly for Wales, July. Cardiff: Welsh Assembly Government.
Davies, J. and Brember, I. (1998) 'National curriculum testing and self-esteem in Year 2. The first five years: a cross-sectional study'. Educational Psychology, 18: 365-75.
Davies, J. and Brember, I. (1999) 'Reading and mathematics attainments and self-esteem in Years 2 and 6: an eight year cross-sectional study'. Educational Studies, 25: 145-57.
Deakin Crick, R., Broadfoot, P. and Claxton, G. (2002) Developing ELLI: the Effective Lifelong Learning Inventory in Practice. Bristol: University of Bristol Graduate School of Education.
Deci, E. L. and Ryan, R. M. (1994) 'Promoting self-determined education'. Scandinavian Journal of Educational Research, 38 (1): 3-14.
Deci, E. L., Koestner, R. and Ryan, R. M. (1999) 'A meta-analytic review of experiments examining the effects of extrinsic rewards on intrinsic motivation'. Psychological Bulletin, 125: 627-88.
DES/WO (1988a) Task Group on Assessment and Testing: A report. London: Department of Education and Science and the Welsh Office.
DES/WO (1988b) Task Group on Assessment and Testing: Three supplementary reports. London: Department of Education and Science and the Welsh Office.
DfEE (1997) Education for Excellence. London: Department for Education and Employment.
Dorans, N. J. (1999) Correspondence between ACT and SAT I Scores. Princeton, NJ: Educational Testing Service.
Dorans, N. J., Lyu, C. F., Pommerich, F. and Houston, W. M. (1997) 'Concordance between ACT Assessment and recentered SAT I sum scores'. College and University, 73 (2): 24-34.
Dorr-Bremme, D. W. and Herman, J. L. (1986) Assessing Student Achievement: A profile of classroom practices. Los Angeles, CA: University of California, Center for the Study of Evaluation.
Dorr-Bremme, D. W., Herman, J. L. and Doherty, V. W. (1983) Achievement Testing in American Public Schools: A national perspective. Los Angeles, CA: University of California, Center for the Study of Evaluation.
Duckworth, K., Fielding, G. and Shaughnessy, J. (1986) The Relationship of High School Teachers' Class Testing Practices to Students' Feelings of Efficacy and Efforts to Study. Eugene, OR: University of Oregon.
Dudley, P. (2004) 'Lessons for learning: research lesson study, innovation, transfer and metapedagogy: a design experiment?' Paper presented at the ESRC TLRP Annual Conference, Cardiff, November. Available at http://www.tlrp.org/dspace/retrieve/289/Dudley+full+paper+Nov04+for+conferencev5271004.doc.
Dweck, C. S. (1986) 'Motivational processes affecting learning'. American Psychologist, 41: 1040-48.
Dweck, C. S. (1992) 'The study of goals in psychology'. Psychological Science, 3: 165-7.
Dweck, C. S. (1999) Self-Theories: Their role in motivation, personality and development. Philadelphia: Psychology Press.
Dweck, C. S. and Leggett, E. L. (1988) 'A social-cognitive approach to motivation and personality'. Psychological Review, 95: 256-73.
Earl, L., Fullan, M., Leithwood, K. and Watson, N. (2000) Watching and Learning: Evaluation of the implementation of the national literacy and numeracy strategies: First annual report. London: Department for Education and Employment.
Ecclestone, K. (2002) Autonomy in Post-16 Education. London: RoutledgeFalmer.
Ecclestone, K., Swann, J., Greenwood, M., Vobar, J. and Eldred, J. (2004) Improving Formative Assessment in Vocational Education and Basic Skills. Research project in progress - see www.exeter.ac.uk.
Edwards, A. (2005) 'Let's get beyond community and practice: the many meanings of learning by participating'. The Curriculum Journal, 16 (1): 49-65.
Elliott, E. S. and Dweck, C. S. (1988) 'Goals: an approach to motivation and achievement'. Journal of Personality and Social Psychology, 54: 5-12.
Engeström, Y. (1987) Learning by Expanding: An activity-theoretical approach to developmental research. Helsinki, Finland: Orienta-Konsultit Oy.
Engeström, Y. (1993) 'Developmental studies of work as a testbench of activity theory: the case of primary care in medical education', in S. Chaiklin and J. Lave (eds), Understanding Practice: Perspectives on activity and context. Cambridge, UK: Cambridge University Press. pp. 64-103.
Engeström, Y. (1999) 'Activity theory and individual and social transformation', in Y. Engeström, R. Miettinen and R.-L. Punamäki (eds), Perspectives on Activity Theory. Cambridge: Cambridge University Press.
Entwistle, N. (2005) 'Learning outcomes and ways of thinking across contrasting disciplines and settings in higher education'. The Curriculum Journal, 16 (1): 67-82.
Evans, E. and Engelberg, R. (1988) 'Students' perceptions of school grading'. Journal of Research and Development in Education, 21: 44-54.
Fernandez, C. (2002) 'Learning from Japanese approaches to professional development: the case of lesson study'. Journal of Teacher Education, 53 (5): 393-405.
Fielding, M. (2001) 'Students as radical agents of change'. Journal of Educational Change, 2: 123-41.
Fielding, M., Bragg, S., Craig, J., Cunningham, I., Eraut, M., Gillinson, S., Horne, M., Robinson, C. and Thorp, J. (2005) Factors Influencing the Transfer of Good Practice. London: DfES.
Filer, A. and Pollard, A. (2000) The Social World of Pupil Assessment: Processes and contexts of primary schooling. London: Continuum.
Finlay, I. (2004) 'Evolution or devolution? Distinctive education policies in Scotland'. Paper presented at the Annual Conference of the British Educational Research Association, Manchester.
Flyvbjerg, B. (2001) Making Social Science Matter: Why social inquiry fails and how it can succeed again. Cambridge: Cambridge University Press.
Foos, P. W., Mora, J. J. and Tkacz, S. (1994) 'Student study techniques and the generation effect'. Journal of Educational Psychology, 86: 567-76.
Forster, M. and Masters, G. (1996-2001) Assessment Resource Kit (complete series). Camberwell: Australian Council for Educational Research.
Frederiksen, J. R. and Collins, A. (1989) 'A systems approach to educational testing'. Educational Researcher, 18 (9): 27-32.
Frederiksen, J. R. and White, B. Y. (2004) 'Designing assessment for instruction and accountability: an application of validity theory to assessing scientific inquiry', in M. Wilson (ed.), Towards Coherence between Classroom Assessment and Accountability: 103rd Yearbook of the National Society for the Study of Education, Part II. Chicago: National Society for the Study of Education. pp. 74-104.
Fullan, M., Bertani, A. and Quinn, J. (2004) 'New lessons for districtwide reform'. Educational Leadership, April: 42-6.
Galton, F. (1869) Hereditary Genius: An inquiry into its laws and consequences. London: Macmillan.
Gardner, H. (1989) 'Zero-based arts education: an introduction to Arts PROPEL'. Studies in Art Education: A Journal of Issues and Research, 30 (2): 71-83.
Gardner, H. (1992) 'Assessment in context: the alternative to standardised testing', in B. R. Gifford and M. C. O'Connor (eds), Changing Assessments: Alternative views of aptitude, achievement and instruction. Boston, MA: Kluwer Academic Publishers. pp. 77-117.
Gardner, J. and Cowan, P. (2000) Testing the Test: A study of the reliability and validity of the Northern Ireland transfer procedure test in enabling the selection of pupils for grammar school places. Belfast: Queen's University of Belfast. (See also Assessment in Education, 14 (2): 145-65.)
Gauld, C. F. (1980) 'Subject oriented test construction'. Research in Education, 10: 77-82.
Gibson, J. J. (1979) The Ecological Approach to Visual Perception. London: Houghton Mifflin.
Gipps, C. (1994) Beyond Testing: Towards a theory of educational assessment. London: Falmer.
Gipps, C. and Murphy, P. (1994) A Fair Test? Assessment, achievement and equity. Buckingham: Open University Press.
Gipps, C., McCallum, B. and Hargreaves, E. (2001) What Makes a Good Primary School Teacher? Expert classroom strategies. London: Falmer.
Glassman, M. (2001) 'Dewey and Vygotsky: society, experience and inquiry in educational practice'. Educational Researcher, 30 (4): 3-14.
Glover, P. and Thomas, R. (1999) 'Coming to grips with continuous assessment'. Assessment in Education, 4 (3): 365-80.
Goddard, H. H. (1916) Publication of the Vineland Training School (no. 11). Vineland, NJ: Vineland Training School.
Goldstein, H. (1996) 'Group differences and bias in assessment', in H. Goldstein and T. Lewis (eds), Assessment: Problems, developments and statistical issues. Chichester: John Wiley. pp. 85-93.
Gordon, S. and Reese, M. (1997) 'High stakes testing: worth the price?'. Journal of School Leadership, 7: 345-68.
Graduate Record Examinations Board (2004) Guide to the Use of Scores 2004-2005. Princeton, NJ: Educational Testing Service.
Greeno, J. G. and the Middle-School Mathematics Through Applications Project Group (1998) 'The situativity of knowing, learning and research'. American Psychologist, 53 (1): 5-26.
Greeno, J. G., Pearson, P. D. and Schoenfeld, A. H. (1996) Implications for NAEP of Research on Learning and Cognition: Report of a study by the National Academy of Education. Panel on the NAEP Trial State Assessment, conducted by the Institute for Research on Learning. Stanford, CA: National Academy of Education.
Grossman, P. L. and Stodolsky, S. S. (1994) 'Considerations of content and the circumstances of secondary school teaching'. Review of Research in Education, 4: 179-221.
GTC(E) (2004) Internal and External Assessment: What is the right balance for the future? Advice to the Secretary of State for Education and Skills. London: General Teaching Council for England.
Hacker, D. J., Dunlosky, J. and Graesser, A. C. (1998) Metacognition in Educational Theory and Practice. Mahwah, NJ: Lawrence Erlbaum Associates.
Hall, C. and Harding, A. (2002) 'Level descriptions and teacher assessment in England: towards a community of assessment practice'. Educational Research, 44: 1-15.
Hall, K., Collins, J., Benjamin, S., Nind, M. and Sheehy, K. (2004) 'SATurated models of pupildom: assessment and inclusion/exclusion'. British Educational Research Journal, 30 (6): 801-18.
Hallam, S., Kirton, A., Peffers, J., Robertson, P. and Stobart, G. (2004) Evaluation of Project 1 of the Assessment is for Learning Development Programme: Support for professional practice in formative assessment. Edinburgh: Scottish Executive.
Hargreaves, A. (1995) Curriculum and Assessment Reform. Buckingham: Open University Press.
Hargreaves, D. (1999) 'The knowledge creating school'. British Journal of Educational Studies, 47: 122-44.
Hargreaves, D. (2005) About Learning: Report of the Learning Working Group. London: Demos.
Harlen, W. (1998) 'Classroom assessment: a dimension of purposes and procedures'. Paper presented at the annual conference of the New Zealand Association for Research in Education, Dunedin, December.
Harlen, W. (2000) Teaching, Learning and Assessing Science 5-12. 3rd edn. London: Paul Chapman Publishing.
Harlen, W. (2004) 'A systematic review of the reliability and validity of assessment by teachers used for summative purposes', in Research Evidence in Education Library, Issue 1. London: EPPI-Centre, Social Science Research Unit, Institute of Education.
Harlen, W. (2005) 'Teachers' summative practices and assessment for learning: tensions and synergies'. The Curriculum Journal, 16 (2): 207-23.
Harlen, W. and Deakin Crick, R. (2002) 'A systematic review of the impact of summative assessment and tests on students' motivation for learning (EPPI-Centre Review)', in Research Evidence in Education Library, Issue 1. London: EPPI-Centre, Social Science Research Unit, Institute of Education. Available at: http://eppi.ioe.ac.uk/EPPIWeb/home.aspx?page=/reel/review_groups/assessment/review_one.htm.
Harlen, W. and Deakin Crick, R. (2003) 'Testing and motivation for learning'. Assessment in Education, 10 (2): 169-208.
Harlen, W. and James, M. (1997) 'Assessment and learning: differences and relationships between formative and summative assessment'. Assessment in Education, 4 (3): 365-80.
Harlen, W., Darwin, A. and Murphy, M. (1977) Leader's Guide: Match and Mismatch: Raising Questions and Match and Mismatch: Finding Answers. Edinburgh: Oliver and Boyd.
Harlen, W. et al. (1992) 'Assessment and the improvement of education'. The Curriculum Journal, 3 (3): 215-30.
Harris, S., Wallace, G. and Rudduck, J. (1995) '"It's not that I haven't learnt much. It's just that I don't really understand what I'm doing": metacognition and secondary school students'. Research Papers in Education, 10 (2): 253-71.
Hayward, L., Priestley, M. and Young, M. (2004) 'Ruffling the calm of the ocean floor: merging practice, policy and research in assessment in Scotland'. Oxford Review of Education, 30 (3): 397-415.
Henderson, V. L. and Dweck, C. S. (1990) 'Motivation and achievement', in S. S. Feldman and G. R. Elliott (eds), At the Threshold: The developing adolescent. Cambridge, MA: Harvard University Press. pp. 308-29.
Herman, J. L. and Dorr-Bremme, D. W. (1983) 'Uses of testing in the schools: a national profile'. New Directions for Testing and Measurement, 19: 7-17.
Hidi, S. (2000) 'An interest researcher's perspective: the effects of extrinsic and intrinsic factors on motivation', in C. Sansone and J. M. Harackiewicz (eds), Intrinsic and Extrinsic Motivation: The search for optimal motivation and performance. New York: Academic Press.
Hidi, S. and Harackiewicz, J. M. (2000) 'Motivating the academically unmotivated: a critical issue for the 21st century'. Review of Educational Research, 70 (2): 151-79.
HMI (1999) HM Inspectors of Schools Review of Assessment in Pre-School and 5-14. Available online at: http://www.scotland.gov.uk/3-14assessment/rapm-00.htm.
Hodgen, J. and Marshall, B. (2005) 'Assessment for learning in mathematics and English: a comparison'. The Curriculum Journal, 16 (2): 153-76.
Holland, D., Lachicotte Jr, W., Skinner, D. and Cain, C. (1998) Identity and Agency in Cultural Worlds. Cambridge, MA: Harvard University Press.
Hubin, D. R. (1988) The Scholastic Aptitude Test: Its development and introduction, 1900-1948. Unpublished University of Oregon PhD thesis. Retrieved from http://darkwing.uoregon.edu/~hubin on 13.11.04.
Hufton, N. and Elliott, J. (2001) 'Achievement motivation: cross-cultural puzzles and paradoxes'. Paper presented at the British Educational Research Association Conference, Leeds.
Humes, W. (1997) 'Analysing the policy process'. Scottish Educational Review, 29 (1).
Humes, W. (1999) 'Policy-making in Scottish education', in T. Bryce and W. Humes (eds), Scottish Education. Edinburgh: Edinburgh University Press. pp. 74-85.
Hutchinson, C. and Hayward, L. (2005) 'The journey so far: assessment for learning in Scotland'. The Curriculum Journal (forthcoming).
Intercultural Development Research Association (1999) Longitudinal Attrition Rates in Texas Public High Schools, 1985-1986 to 1998-1999. San Antonio, TX: Intercultural Development Research Association.
Isaac, J., Sansone, C. and Smith, J. L. (1999) 'Other people as a source of interest in an activity'. Journal of Experimental Social Psychology, 35: 239-65.
James, M. (2004) 'Assessment of learning, assessment for learning and personalised learning: synergies and tensions'. Paper presented at the Goldman Sachs US/UK Conference on Urban Education, London, December.
James, M. and Brown, S. (2005) 'Grasping the TLRP nettle: preliminary analysis and some enduring issues surrounding the improvement of learning outcomes'. The Curriculum Journal, 16 (1): 7-30.
James, M., Pedder, D. and Swaffield, S. with Conner, C., Frost, D. and MacBeath, J. (2003) 'A servant of two masters: designing research to advance knowledge and practice'. Paper presented at the annual meeting of the American Educational Research Association, Chicago, in the symposium 'Talking, Working and Learning with Teachers and School Leaders: the Cambridge Symposium'. Available at: http://www.learntolearn.ac.uk/home/009_public_papers/conf-papers/000/121-aera master2003.doc.
James, M., Pollard, A., Rees, G. and Taylor, C. (2005) 'Researching learning outcomes: building confidence in our conclusions'. The Curriculum Journal, 16 (1).
Jessup, G. (1991) Outcomes: NVQs and the emerging model of education and training. London: Falmer.
Johnston, C. (1996) Unlocking the Will to Learn. Thousand Oaks, CA: Corwin Press.
Johnston, J. and McClune, W. (2000) Selection Project Sel 5.1: Pupil motivation and attitudes - locus of control, learning disposition and the impact of selection on teaching and learning. Belfast: Department of Education for Northern Ireland.
Jones, L. V. and Olkin, I. (2004) The Nation's Report Card: Evolution and perspectives. Bloomington, IN: Phi Delta Kappa Educational Foundation.
Kane, M. T., Crooks, T. and Cohen, A. (1999) 'Validating measures of performance'. Educational Measurement: Issues and Practice, 18 (2): 5-17.
Katzell, R. A. and Thompson, D. E. (1990) 'Work motivation: theory and practice'. American Psychologist, 45: 144-53.
Kellaghan, T., Madaus, G. and Raczek, A. (1996) The Use of External Examinations to Improve Student Motivation. Washington, DC: AERA.
King, A. (1992) 'Facilitating elaborative learning through guided student-generated questioning'. Educational Psychologist, 27: 111-26.
Kluger, A. N. and DeNisi, A. (1996) 'The effects of feedback interventions on performance: a historical review, a meta-analysis, and a preliminary feedback intervention theory'. Psychological Bulletin, 119: 252-84.
Kohn, A. (1993) Punished by Rewards: The trouble with gold stars, incentive plans, A's, praise, and other bribes. Boston: Houghton Mifflin.
Koretz, D. M. (1998) 'Large-scale portfolio assessments in the US: evidence pertaining to the quality of measurement'. Assessment in Education: Principles, Policy and Practice, 5 (3): 309-34.
Koretz, D. M., Stecher, B. M., Klein, S. P., McCaffrey, D. and Deibert, E. (1994) Can Portfolios Assess Student Performance and Influence Instruction? The 1991-92 Vermont experience. Santa Monica, CA: RAND Corporation.
Kreisberg, S. (1992) Transforming Power: Domination, empowerment and education. New York: State University of New York Press.
Krug, E. A. (1969) The Shaping of the American High School: 1880-1920. Madison, WI: University of Wisconsin Press.
Kutnick, P., Sebba, J., Blatchford, P. and Galton, M. (2005) The Effects of Pupil Grouping: An extended literature review for DfES (submitted).
Lave, J. and Wenger, E. (1991) Situated Learning: Legitimate peripheral participation. Cambridge: Cambridge University Press.
Lawlor, S. (ed.) (1993) The Dearing Debate: Assessment and the national curriculum. London: Centre for Policy Studies.
Leadbetter, C. (2004) Learning about Personalisation. London: DfES/Demos.
Lee, C. (2000) 'Studying changes in the practice of two teachers'. Paper presented at a symposium entitled 'Getting Inside the Black Box: Formative Assessment in Practice', British Educational Research Association 26th Annual Conference, Cardiff University. London: King's College School of Education.
Lemann, N. (1999) The Big Test: The secret history of the American meritocracy. New York, NY: Farrar, Straus and Giroux.
Leonard, M. and Davey, C. (2001) Thoughts on the 11 Plus. Belfast: Save the Children Fund.
Levine, D. O. (1986) The American College and the Culture of Aspiration, 1915-1940. Ithaca, NY: Cornell University Press.
Lindquist, E. F. (ed.) (1951) Educational Measurement. 1st edn. Washington, DC: American Council on Education.
Linn, R. L. (1989) 'Current perspectives and future directions', in R. L. Linn (ed.), Educational Measurement. 3rd edn. London: Collier Macmillan. pp. 1-10.
Linn, R. L. (2000) 'Assessment and accountability'. Educational Researcher, 29 (2): 4-16.
MacBeath, J. and Mortimore, P. (eds) (2001) Improving School Effectiveness. Buckingham: Open University Press.
Madaus, G. F. and Kellaghan, T. (1992) 'Curriculum evaluation and assessment', in P. W. Jackson (ed.), Handbook of Research on Curriculum. New York, NY: Macmillan. pp. 119-54.
Marshall, B. and Hodgen, J. (2005) Formative Assessment in English. Private communication, in preparation for publication.
Marsland, D. and Seaton, N. (1993) The Empire Strikes Back: Creative subversion of the National Curriculum. York: Campaign for Real Education.
Masters, G. and Forster, M. (1996) Progress Maps. Victoria, Australia: Australian Council for Educational Research.
Mathews, J. (2004) Portfolio Assessment. Retrieved on 30.3.05 from http://www.educationnext.org/20043172.html.
Maxwell, G. S. (2004) 'Progressive assessment for learning and certification: some lessons from school-based assessment in Queensland'. Paper presented at the third Conference of the Association of Commonwealth Examination and Assessment Boards, March, Nadi, Fiji.
McDonald, A. S., Newton, P. E., Whetton, C. and Benefield, P. (2001) Aptitude Testing for University Entrance: A literature review. Slough: National Foundation for Educational Research in England and Wales.
McInerney, D. M., Roche, L., McInerney, V. and Marsh, H. (1997) 'Cultural perspectives on school motivation: the relevance and application of goal theory'. American Educational Research Journal, 34: 207-36.
McKeown, P. (2004) 'Exploring education policy in Northern Ireland'. Paper presented at the annual conference of the British Educational Research Association, Manchester, September.
McMahon, A., Thomas, S., Greenwood, A., Stoll, L., Bolam, R., Hawkey, K., Wallace, M. and Ingram, M. (2004) 'Effective professional learning communities'. Paper presented at the ICSEI conference, Rotterdam.
Meier, C. (2000) 'The influence of educational opportunities on assessment results in a multicultural South Africa'. Paper presented at the 26th IAEA conference, Jerusalem.
Mercer, N. (2000) Words and Minds. London: Routledge.
Mercer, N., Dawes, L., Wegerif, R. and Sams, C. (2004) 'Reasoning as a scientist: ways of helping children to use language to learn science'. British Educational Research Journal, 30 (3): 359-77.
Messick, S. (1980) 'Test validity and the ethics of assessment'. American Psychologist, 35 (11): 1012-27.
Messick, S. (1989) 'Validity', in R. L. Linn (ed.), Educational Measurement. 3rd edn. New York, NY: American Council on Education and Macmillan. pp. 13-103.
Miliband, D. (2004) Speech to the North of England Education Conference, Belfast, January. Available at: http://www.dfes.gov.uk/speeches.
Montgomery, M. (2004) 'Key features of assessment in Northern Ireland'. Paper presented at a seminar of the Assessment Systems for the Future project, Cambridge, March.
National Committee on Science Education Standards and Assessment (1995) National Science Education Standards. Washington, DC: National Academies Press.
Natriello, G. (1987) 'The impact of evaluation processes on students'. Educational Psychologist, 22: 155-75.
NCTM (1989) Curriculum and Evaluation Standards for School Mathematics. Reston, VA: National Council of Teachers of Mathematics.
NCTM (1991) Professional Standards for Teaching Mathematics. Reston, VA: National Council of Teachers of Mathematics.
NUT (2004) The NUT Approach to Assessment for England: Foundation stage and primary. London: National Union of Teachers.
OECD (2005) Formative Assessment: Improving learning in secondary classrooms. Paris: OECD.
OFSTED (2005) The Annual Report of Her Majesty's Chief Inspector of Schools for 2003/04. Office for Standards in Education. Available at: http://www.ofsted.gov.uk/publications/annual_report0304/annual_report.htm.
Osborn, M., McNess, E., Broadfoot, P., Pollard, A. and Triggs, P. (2000) What Teachers Do: Changing policy and practice in primary education. London: Continuum.
Palincsar, A. S. and Brown, A. L. (1984) 'Reciprocal teaching of comprehension-fostering and monitoring activities'. Cognition and Instruction. Hillsdale, NJ: Erlbaum.
Paris, S., Lawton, T., Turner, J. and Roth, J. (1991) 'A developmental perspective on standardised achievement testing'. Educational Researcher, 20: 12-20.
Paterson, L. (2003) Scottish Education in the Twentieth Century. Edinburgh: Edinburgh University Press.
Pedder, D., James, M. and MacBeath, J. (2005) 'How teachers value and practise professional learning'. Research Papers in Education, 20 (3): (forthcoming).
Pellegrino, J. W., Baxter, G. P. and Glaser, R. (1999) 'Addressing the "Two Disciplines" problem: linking theories of cognition with assessment and instructional practice'. Review of Research in Education, 24: 307-53.
Pellegrino, J. W., Chudowsky, N. and Glaser, R. (2001) Knowing What Students Know: The science and design of educational assessment. Washington, DC: National Academies Press.
Perrenoud, P. (1991) 'Towards a pragmatic approach to formative evaluation', in P. Weston (ed.), Assessment of Pupils' Achievement: Motivation and school success. Amsterdam: Swets and Zeitlinger. pp. 79-101.
Perrenoud, P. (1998) 'From formative evaluation to a controlled regulation of learning processes. Towards a wider conceptual field'. Assessment in Education, 5 (1): 85-102.
Perry, N. (1998) 'Young children's self-regulated learning and contexts that support it'. Journal of Educational Psychology, 90: 715-29.
Phillips, M. (1996) All Must Have Prizes. London: Little, Brown and Company.
Pollard, A. and James, M. (eds) (2005) Personalised Learning: A commentary by the Teaching and Learning Research Programme. Swindon: Economic and Social Research Council.
Pollard, A., Triggs, P., Broadfoot, P., McNess, E. and Osborn, M. (2000) What Pupils Say: Changing policy and practice in primary education. London: Continuum.
Popham, W. J. (1997) 'Consequential validity: right concern - wrong concept'. Educational Measurement: Issues and Practice, 16 (2): 9-13.
Pring, R. (1986) 'The developing 14-18 curriculum and changes in assessment', in T. Staden and P. Preece (eds), Issues in Assessment 23. Exeter: University of Exeter. pp. 12-21.
Project Zero (2005) History of Project Zero. Retrieved on 30.03.05 from http://www.pz.harvard.edu/History/History.htm.
QCA (2004) Assessment for Learning: Research into practice. London: Qualifications and Curriculum Authority (CD-ROM package).
QCA (2005) www.nc.uk.net (accessed 22.02.05).
QSA (2005) Queensland Studies Authority, at http://www.qsa.qld.edu.au.
Ramaprasad, A. (1983) 'On the definition of feedback'. Behavioral Science, 28: 4-13.
Raveaud, M. (2004) 'Assessment in French and English infant schools: assessing the work, the child or the culture?'. Assessment in Education, 11 (2): 193-211.
Reay, D. and Wiliam, D. (1999) '"I'll be a nothing": structure, agency and the construction of identity through assessment'. British Educational Research Journal, 25: 343-54.
Rees, G. (2005) 'Democratic devolution and education policy in Wales: the emergence of a national system?'. Contemporary Wales, 17: 28-43.
Reeves, J., McCall, J. and MacGilchrist, B. (2001) 'Change leadership: planning, conceptualization and perception', in J. MacBeath and P. Mortimore (eds), Improving School Effectiveness. Buckingham: Open University Press. pp. 122-37.
Reynolds, D. (2002) 'Developing differently: educational policy in England, Wales, Scotland and Northern Ireland', in J. Adams and P. Robinson (eds), Devolution in Practice: Public policy differences within the UK. London: Institute of Public Policy Research. pp. 93-103.
Rist, R. C. (2000) 'Influencing the policy process with qualitative research', in N. K. Denzin and Y. S. Lincoln (eds), Handbook of Qualitative Research. Thousand Oaks, CA: Sage. pp. 1001-17.
Robson, B. (2004) 'Built to fail: every child left behind, Minneapolis/St Paul'. City Pages, 25 (1214). Retrieved from http://citypages.com/databank/25/1214/article11955.asp on 31.03.05.
Roderick, M. and Engel, M. (2001) 'The grasshopper and the ant: motivational responses of low achieving pupils to high stakes testing'. Educational Evaluation and Policy Analysis, 23: 197-228.
Rogoff, B. (1990) Apprenticeship in Thinking: Cognitive development in social context. Oxford: Oxford University Press.
Rogosa, D. (1999) How Accurate Are the STAR National Percentile Rank Scores for Individual Students? An interpretive guide. CSE Technical Report 509a. Los Angeles, CA: CRESST. Published on website: http://www.cse.ucla.edu/products/reports_set.htm.
Rousseau, J-J. (1762/1961) Emile. London: Dent.
Rowe, M. B. (1974) 'Wait time and rewards as instructional variables, their influence on language, logic and fate control'. Journal of Research in Science Teaching, 11: 81-94.
Sacks, P. (1999) Standardized Minds: The high price of America's testing culture and what we can do to change it. Cambridge, MA: Perseus Books.
Sadler, D. R. (1987) 'Specifying and promulgating achievement standards'. Oxford Review of Education, 13: 191-209.
Sadler, D. R. (1989) 'Formative assessment and the design of instructional systems'. Instructional Science, 18: 119-44.
Sadler, D. R. (1998) 'Formative assessment: revisiting the territory'. Assessment in Education, 5: 77-84.
Salomon, G. (ed.) (1993) Distributed Cognitions: Psychological and educational considerations. Cambridge: Cambridge University Press.
Schon, D. (1983) The Reflective Practitioner. New York: Basic Books.
Schunk, D. H. (1996) 'Goal and self-evaluative influences during children's cognitive skill learning'. American Educational Research Journal, 33 (2): 359-82.
Scriven, M. (1967) 'The methodology of evaluation', in R. W. Tyler (ed.), Perspectives of Curriculum Evaluation. Chicago: Rand McNally. pp. 39-83.
Sebba, J. and Maxwell, G. (2005) 'Queensland, Australia: an outcomes-based curriculum', in Formative Assessment: Improving learning in secondary classrooms. Paris: OECD.
SEED (2004a) Assessment, Testing and Reporting 3-14: Our response. Edinburgh: Scottish Executive.
SEED (2004b) A Curriculum for Excellence: Ministerial response. Edinburgh: Scottish Executive.
Serafini, F. (2000) 'Three paradigms of assessment: measurement, procedure, and inquiry'. The Reading Teacher, 54 (4): 384-93.
Sfard, A. (1998) 'On two metaphors for learning and the dangers of choosing just one'. Educational Researcher, 27 (2): 4-13.
SHA (2002) Examinations and Assessment. Leicester: Secondary Heads' Association.
Shayer, M. (1999) 'Cognitive acceleration through science education II: its effects and scope'. International Journal of Science Education, 21 (5): 883-902.
Shayer, M. and Adey, P. (1993) 'Accelerating the development of formal thinking in middle and high-school students IV: 3 years after a 2-year intervention'. Journal of Research in Science Teaching, 30.
Shepard, L. A. (1997) 'The centrality of test use and consequences for test validity'. Educational Measurement: Issues and Practice, 16 (2): 5-8, 13.
Shulman, L. (1986) 'Those who understand: knowledge growth in teaching'. Educational Researcher, 15 (1): 4-14.
Shulman, L. (1987) 'Knowledge and teaching: foundations of the new reform'. Harvard Educational Review, 57 (1): 1-22.
Sliwka, A., Fushell, M., Gauthier, M. and Johnson, R. (2005) 'Canada: encouraging the use of summative data for formative purposes', in Formative Assessment: Improving learning in secondary classrooms. Paris: OECD.
Smith, E. and Gorard, S. (2005) '"They don't give us our marks": the role of formative feedback in student progress'. Assessment in Education, 12 (1): 21-38.
SOED (1991) Curriculum and Assessment in Scotland: Assessment 5-14. Edinburgh: HMSO.
Starch, D. and Elliott, E. C. (1912) 'Reliability of grading high school work in English'. School Review, 20: 442-57.
Starch, D. and Elliott, E. C. (1913) 'Reliability of grading high school work in mathematics'. School Review, 21: 254-9.
Standards for Success (2003) Mixed Messages: What state high school tests communicate about student readiness for college. Eugene, OR: Association of American Universities.
Stenhouse, L. (1975) An Introduction to Curriculum Research and Development. London: Heinemann Educational Books.
Stiggins, R. J. (2001) Student-Involved Classroom Assessment. 3rd edn. Upper Saddle River, NJ: Merrill Prentice Hall.
Stiggins, R. J. and Bridgeford, N. J. (1985) 'The ecology of classroom assessment'. Journal of Educational Measurement, 22 (4): 271-86.
Stiggins, R. J., Conklin, N. F. and Bridgeford, N. J. (1986) 'Classroom assessment: a key to effective education'. Educational Measurement: Issues and Practice, 5 (2): 5-17.
Stiggins, R. J., Frisbie, D. A. and Griswold, P. A. (1989) 'Inside high-school grading practices: building a research agenda'. Educational Measurement: Issues and Practice, 8 (2): 5-14.
Stigler, J. and Hiebert, J. (1999) The Teaching Gap: The best ideas from the world's teachers for improving education in the classroom. New York, NY: Free Press.
Stobart, G. and Gipps, C. (1990) Assessment: A teacher's guide to the issues. 1st edn. London: Hodder and Stoughton.
Stoll, L., Stobart, G., Martin, S., Freeman, S., Freedman, E., Sammons, P. and Smees, R. (2003) Preparing for Change: Evaluation of the implementation of the Key Stage 3 strategy pilot. London: DfES.
Sutton, R. (1995) Assessment for Learning. Manchester: Ruth Sutton Publications.
Swaffield, S. and Dudley, P. (2002) Assessment Literacy for Wise Decisions. London: Association of Teachers and Lecturers.
Swann, J. and Brown, S. (1997) 'The implementation of a National Curriculum and teachers' classroom thinking'. Research Papers in Education, 12: 91-114.
Tamir, P. (1990) 'Justifying the selection of answers in multiple choice items'. International Journal of Science Education, 12 (5): 563-73.
Taylor, T. (1995) 'Movers and shakers: high politics and the origins of the National Curriculum'. The Curriculum Journal, 6 (2): 161-84.
Terman, L. M. (1916) The Measurement of Intelligence. Boston, MA: Houghton Mifflin.
Terman, L. M. (1921) 'Intelligence tests in colleges and universities'. School and Society (April 28): 482.
Thatcher, M. (1993) The Downing Street Years. London: HarperCollins.
Thomas, G. and Egan, D. (2000) 'Policies on school inspection in Wales and England', in R. Daugherty, R. Phillips and G. Rees (eds), Education Policy-Making in Wales. Cardiff: University of Wales Press. pp. 149-70.
Thorndike, E. L. (1913) Educational Psychology. Volume I: The original nature of man. New York: Columbia University Teachers College.
Torrance, H. (1993) 'Formative assessment - some theoretical problems and empirical questions'. Cambridge Journal of Education, 23 (3): 333-43.
Torrance, H. and Pryor, J. (1998) Investigating Formative Assessment: Teaching, learning and assessment in the classroom. Buckingham: Open University Press.
Toulmin, S. (2001) Return to Reason. Cambridge, MA: Harvard University Press.
Towns, M. H. and Robinson, W. R. (1993) 'Student use of test-wiseness strategies in solving multiple-choice chemistry examinations'. Journal of Research in Science Teaching, 30 (7): 709-22.
Townshend, J., Moos, L. and Skov, P. (2005) 'Denmark: building on a tradition of democracy and dialogue in schools', in Formative Assessment: Improving learning in secondary classrooms. Paris: OECD.
Travers, R. M. W. (1983) How Research Has Changed American Schools: A history from 1840 to the present. Kalamazoo, MI: Mythos Press.
Tunstall, P. and Gipps, C. (1996) 'Teacher feedback to young children in formative assessment: a typology'. British Educational Research Journal, 22: 389-404.
Tymms, P. (2004) 'Are standards rising in English primary schools?'. British Educational Research Journal, 30 (4): 477-94.
Varon, E. J. (1936) 'Alfred Binet's concept of intelligence'. Psychological Review, 43: 32-49.
Vispoel, W. P. and Austin, J. R. (1995) 'Success and failure in junior high school: a critical incident approach to understanding students' attributional beliefs'. American Educational Research Journal, 32 (2): 377-412.
Vulliamy, G. (2004) 'The impact of globalisation on qualitative research in comparative and international education'. Compare, 34: 261-84.
Vulliamy, G., Lewin, K. and Stephens, D. (1990) Doing Educational Research in Developing Countries: Qualitative strategies. London: Falmer.
Vygotsky, L. S. (1978) Mind in Society: The development of higher psychological processes. Cambridge, MA: Harvard University Press.
Vygotsky, L. S. (1986) Thought and Language. Cambridge, MA: Harvard University Press.
Vygotsky, L. S. (1998 [1933/4]) 'The problem of age', in R. W. Rieber (ed.), The Collected Works of L. S. Vygotsky: Vol. 5. Child Psychology (trans. by M. Hall). New York: Plenum Press. pp. 187-205.
Watkins, C., Carnell, E., Lodge, C., Wagner, P. and Whalley, C. (2000) Learning about Learning. London: Routledge.
Watkins, C., Carnell, E., Lodge, C., Wagner, P. and Whalley, C. (2001) NSIN Research Matters No. 13: Learning about learning enhances performance. London: Institute of Education.
Watkins, D. (2000) 'Learning and teaching: a cross-cultural perspective'. School Leadership and Management, 20 (2): 161-73.
Webb, N. L. (1999) Alignment of Science and Mathematics Standards and Assessments in Four States. Washington, DC: Council of Chief State School Officers.
Weeden, P., Winter, J. and Broadfoot, P. (2002) Assessment: What's in it for schools? London: RoutledgeFalmer.
Weiner, B. (1979) 'A theory of motivation for some classroom experiences'. Journal of Educational Psychology, 71: 3-25.
Wenger, E. (1998) Communities of Practice: Learning, meaning and identity. Cambridge: Cambridge University Press.
White, B. Y. and Frederiksen, J. R. (1998) 'Inquiry, modelling and metacognition: making science accessible to all students'. Cognition and Instruction, 16 (1): 3-118.
White, E. E. (1888) 'Examinations and promotions'. Education, 8: 519-22.
White, J. (2004) Unpublished report on the CCEA 'Pathways' proposals. London: University of London Institute of Education.
Whitty, G. (2002) Making Sense of Education Policy. London: Paul Chapman Publishing.
Wiliam, D. (1992) 'Some technical issues in assessment: a user's guide'. British Journal for Curriculum and Assessment, 2 (3): 11-20.
Wiliam, D. (2000) 'Recent developments in educational assessment in England: the integration of formative and summative functions of assessment'. Paper presented at SweMaS, Umeå, Sweden, May.
Wiliam, D. (2001) 'Reliability, validity and all that jazz'. Education 3-13, 29 (3): 17-21.
Wiliam, D. (2003) 'The impact of educational research on mathematics education', in A. Bishop, M. A. Clements, C. Keitel, J. Kilpatrick and F. K. S. Leung (eds), Second International Handbook of Mathematics Education. Dordrecht, Netherlands: Kluwer Academic Publishers. pp. 469-88.
Wiliam, D. and Black, P. (1996) 'Meanings and consequences: a basis for distinguishing formative and summative functions of assessment'. British Educational Research Journal, 22 (5): 537-48.
Wiliam, D., Lee, C., Harrison, C. and Black, P. (2004) 'Teachers developing assessment for learning: impact on student achievement'. Assessment in Education, 11: 49-65.
Wilson, M. (1990) 'Measurement of developmental levels', in T. Husen and T. N. Postlethwaite (eds), International Encyclopedia of Education: Research and studies (Supplementary volume). Oxford: Pergamon Press.
Wilson, M. and Sloane, K. (2000) 'From principles to practice: an embedded assessment system'. Applied Measurement in Education (forthcoming).
Wilson, M., Kennedy, C. and Draney, K. (2004) GradeMap (Version 4.0) [computer program]. Berkeley, CA: University of California, BEAR Center.
Wilson, S. M. and Berne, J. (1999) 'Teacher learning and the acquisition of professional knowledge: an examination of research on contemporary professional development', in A. Iran-Nejad and P. D. Pearson (eds), Review of Research in Education. Washington, DC: American Educational Research Association. pp. 173-209.
Wood, D. (1998) How Children Think and Learn: The social contexts of cognitive development. 2nd edn. Oxford: Blackwell.
Wood, D., Bruner, J. S. and Ross, G. (1976) 'The role of tutoring in problem solving'. Journal of Child Psychology and Psychiatry and Allied Disciplines, 17: 89-100.
Yarroch, W. L. (1991) 'The implications of content versus item validity on science tests'. Journal of Research in Science Teaching, 28 (7): 619-29.
Zenderland, L. (2000) Measuring Minds: Henry Herbert Goddard and the origins of American intelligence testing. Cambridge: Cambridge University Press.
Zimmerman, B. J. and Schunk, D. H. (eds) (1989) Self-Regulated Learning and Academic Achievement: Theory, research, and practice. New York: Springer.
U!!.. 136, m 142.lU.
I..H..!.M.. 157, 166,
I.l!2. ll!Z. m 198
"""" C ""
Bourke. S,f, 6Z
eo-. R. 151, 152
Brolluford. ).0. &L 6Z
Bredo, E.,5,1 56. 'll
Bremoo. L fa
Bridgeford. N,J. lZD.
Brign'lIn. c.c. 174
Bro.odoot. P. g lJi. l.t2..
l>l. I.IO. IBZ
Brookh.art,. S. Z1. Z2. 12
Bl'OOlTIt', E.C. 111
G. 20
Brown. A.L M. 'll
Brown. S. rI. l.59.
Bm........, J. 'll
Bry<oe, T. l.59.
Butler. D.L!l2
Butler, R,1S..6S.. 'R,142
D. J.S,1
J.!l2
Carless, D. ill
Carter, c.R.1lll
CCEA 164
Cllaiklin, S. 'lIl
Olklo:ering. A, W, l.ZD:
Olin.. R, 22
Choppin. B. 127
OlUdowsky, N. 5J..
Clarke, S. 142, 144, 151
Oronan. G. 6Z
Cocking. R.R, 52. &L 6Z
Cohen. AS m.. 136
lZ5.
Collins. A. 135
Conklin. N.F.l.ZD:
eon......... C.1S6
Corbett. I:LD. lZ'l
Cowan. P. rn. 123
B. 'R.. ill
Crooks, T,J. l.ll.. 6J. Z6. rn...
Ul! 140, 157, 170,
'" CT0551ey, M, l.BZ
Cumming.). I.OZ
R. lSIl
Darwin. A,
Daugherty, R. ll!.. 155, 1.fl2"J
Davey; C. ZIl.. Z1 Z6. 18
David5(ln, J.163
Da\o'ies, I. 69:
Deakin Crick. R. fiL M.. l.'lS
Di. E.L. 1I9
DeNi5l A. U. 141. 142, 14J,
144, ill
DES/WO.lQ,!?. 1().l, 153, J.5,I
DeVog.., J. Z12b. <'6.11
DfEE 1Sl
Donerty, V, W, lZD.
Dorans, N.J.1Zti
DorrBremm.., D.W. lZD.
Draney, K. 110
Duckwortn. K. Z2. 71. Z8
Dudley, P.:H. 164
Dunlo61<y, J. !l2
Ow!<, C.s. & Oi,
'R. 142. 1M
Earl LM!
Ecdel;tone, K. M. 140, ill
Edwards, A. S:9'
0.162
Elfed..Qwens, P. 162
Elliot. E.C.1Zl
Elliott. .S. ZS
Elli<;l!t. J. 138
M, 71. Z8
Eng.. R. z:z. Z!I'
Enge$tJVm. Y. !!,1 !l6
N. f.8
E. z:z. Z!I'
Fielding. G. "lL ZlI.
Fil:'lding. M. l8Z
Fill:'1, A. sa
Finlay, L l5'l. 162
Flyvbj<ors. B. II
Foes. P. W. 1.6
Forsll:'r, M. 110, l.!!l1
Frederiksen. J.R. 106,
135
FrisbII:'. D.A. 110
Fullan, M. "lOt
Calton, F. 112
Gardner. H. 118
Gardner, J. 123
Gauld, Cf. 127
Gibson, J.I.II2
Gipps, C 142, 143. 155, 156.
Glaser, R. i!J.. .22-
"'-'"
Glas.sman, M. 56
Glover. I>. I.li.
Glynn, T. 1.88
Coddard.l:i.H.123
Cold. A.1.a ill
Corard,S. ill
Cordon. S. Z2. Z"!. ill
Graesser, A.C. 22
Gn.-eno, I.G. 'll
Griswold, P.A. 1ZO
Grossman, P:L as
cre(E) 1601
Hacker, 0.1. '12
HaiL C. Tl
HalL K. l5iI
Hallam, S. 2Z.. ill
Har.tCkil:'wia. I.M. & M..lJZ
Harding, A. T1
Hargreavt'5. A.1Sll
HargreaV1:'5. D. 20.
Hargreaves, E. 142,. W
Harfen, W. 012. & l.O5..
106.l.llZ, l.ll2. l1Q. 113.
127, 157, JjS. I.9ll
Harris, S. 1J9.
Hayward. L m ill
Hmdel'$On, V.L. M
Hennan, J.L 110
Hid;' S. & 6i.l3Z
Hiebert,. I. .l'.!
HilL B. 203, 204
HMII59
Hodg..... J. ill 52. 85
Holland. D.!l2 83
Hubin, D.R. 174, ll!1182
Hufton.. N. Llll.
Humes. W.lS'l
HUlrnil\llOf\, Cl.6ll
Author index
baac, J. M
Jamel. M. J!).. ,ll .u. tl. 5J...
Zb.l.l!'L l.56. 166
lessup. G. l.Sll.
lohnston, c. 2ll
Iohn!;ton. J. ZQ, Z6
Jones,. L.V. 1811
Ka.ne. M.T. m 1.36
R.A. lil
K<!llaghan. T. & & !.TI!. l1'1
Kenrll:'dy, C lliI
King. A.1.6
Kluger, A.N.!1.. 141, 142,
143,1#, lAS:
KOl!$lner, R. 6)
Kohn.. A. & 142, ill
Koretz, D.M. 118
Kmsberg. 5. !Ill
KUlnick, P.191
Lawlor, 5. l55
Leadbetter, C. lfi
r-, CllZ
E.L.Z5
u.m.ann. N. l16.
lronard, M. Z!!. Z1 & ZlI.
u.vilW!, D.O. I1.L 174
Lewin, K.186-7, 1.88
UndquiSl. E.f.l16.
linn, R.L illl.36
Lodg... C. ill
MacBeath, I. 3.L 31n, J2
McCalL J. l'l
McCallum. B. 142, W
McC1u..... W. Z!.!.. Z6
MacCilchrist B. l.'.l
Mcllll:'fTle)'. D.M. 6S.
McKo'Own.. P. L6.1
McMahon, A. ill
Madaus. G.f. 6J.. 6S. J.Zll. l1'5!
Marshall, B. l.ll,. 52. 85
Masters, G. W!. l!IO
J. 118, l19.
Muwt'll, C.5. l.ll6::Z.llS..
1M. 192, l.'.l1
Ml'iI:'r, C. 1JZ
M"l'CI:'r, N. 9Il
Messick, 5. m l16.
Miliband, D. 165.
Moller Boller, J. 1.0
Mootgomery, M. 1601
MOOII, l. 139, \92, 19J
Mora, J.I. 1.6.
Mortimore, P:31n
Murphy. M.m
N"triello, G. 10
NCTM122
Neuman. O. '12
Northfil:'ld, I.R. 22
NUT ".
OfSTED"
Olkin, L 1.80.
Orr, L 127
Osborn, M. 155, 166
Palirlc:Nr, A.5. 'll
Pari!!, S. Z':I
Patl'rson, L. lS'l
Pe"1'!OI\, P.O. 53
Pt'<lder, D.;n. 32
l'ellegrino, J. W. !IS
l'ellegrino. P. SJ., 56. 5B.
59,39
Pel'Tet\Olld, P. w... l!.L !!Z
lIlL"''''
Perry, N. Z8
I't.illips, M. ill
PiI:'1t'e.. D.P. 22
Pollard, A. ft!.. Z2.. & "lL
""'66
Popham. W.J. l.36.
Prietley, M. 159, ill
l'ring. R. 1'.l'l
ProjKt Zero 178
Pryor, I. &. 2!1 15S..
'66
QCA 140, 198
Q5A '"
Quinn. J. 2l}4
RDCl.<!k, A. 65
Ramaprasad, A. U
Rave3ud, M. m 1J8.
Ruy, D.@.&a 142,l.55
RPe$, G. L6.1
Reege. M. Z6. Z!!.1l5
Reeves. J. l.'.l
R."...,.o:b, D. 1511
Rist, R.C. 202
RobiMon,. W.R. 121
Robson. B. 1.80.
Rod...nck, M."!L ZlI.
Rogoff, B. 56. !Ill
Rogoeoa, D. 122
ROIl&, G.2lJ:
RouSllc.llu. I-J lIJ
RoW\', M.B. 14.
Ruddutk, J. l.3'l
Ryan, R.M. 8'.l
Sadler, D.R. 102, 145, 157
Salomon, G. Eo. IIJ.
s.m..on.... c. M
Schoenffold, A..H. 53
Schil<\ D. ti
Schunk, D.U. Z1. Zft.
"""
ScriVl'1\, M.l
Sl!bba.], l1\f!. 192, ill
SEED l.6b2
Serafini, F.
Sfard, A. 2'.1
SHA 164
Shaughnessy, J. Z1.. 'l1.1ll
Sharer, M. '!!.. ':!lI
Shevard, LA. 1J6
Shulman, L,l& 1l6.
Skov, P. 189. 192, ill
Sliwka, A. l'lll
Sloane, K. 2Z
Smith. E. ill
Smith, J_l. >I
Standanb for Success 1l!.l
Starch, DJ ill
L.:ill
Stephens, D. 1116-7, l8lI.
StiAAiru;, RI_ lZll
Stigler, j. J2
Stobart, G. l.56
Assessment and learning
$lodolsl<y. 55,!l5
Stol1, L rn
SUlt..... R. 38
Swaffield, S. 164
Swann,. j. J.52
Tam,r, P. 127
Tang. C. , 51!
Taylor. T. l5l
Terman. LM, m 174
Thatcher, M, l2i
Thomas, G. 162
Thomas. R. ill
Thompson, D.E. 6.1
Thorndike, E.L 142
TkaCz. S. lli
Torrance. H. R.!it \l!1
l26. 166
Toulmin, S. 2.1
Towns, MJ:L 127
Townshend, J. 189, 192,.l2.1
Travers. R,M.W.W
Tunstall, P. ill
Tymms, P. l.J6
Varon, E.J, ill
Vispoel W.P. 12. 22.
Vul1iam)'. G. lM::Z l88
Vygotsky. L5, g:!lJ:
Wallace, G. 13'l
Watkins. C. 'll. 1..
Walkins, D.R1J2
Webb, N,L 1.Z9.
Wl.'eden, P. Hi
W...mer, B. 66.
Wenger, E. 5fl..ll2. '.l.3
White, B. Y. l..2. !!l. l.ll6.
While, E.E. ill
While, J. 1M
l'IIhitty, G. l5S.
Wiliam. 0.11 L1ll. H. 'll.
&'!:Q.f!B.f!2.Z6.l2.'lZ.
114--15,120,123.127, 1:l(l,
136,142, HJ. J..i6. 155,
157. 166, In. 187.
m
Wilson. B.L 1ZZ
WilliQn, M. 97. lll!.
Wilson, S.M. 20
Winne, Cl:L on
Win\cr, I. ill
Wood, D. 'Kt. 2bl.
Yarroch, W.L.127
Young, M. 159, ill
ZenJerland, L.ID
Zimml'rman, B.).
Subject index
Added to the page reference, 'f' denotes a figure.
activity systems, subject classrooms as !!Hz
2H
activity theory S9.
American College Test (ACT) 176
ARG (Assessment Reform Group) 157, 197
'Assessment is for Learning' (AifL) programme 160-62
assessment for learning 2-3, 197-201
'compelling conceptualization' 204
concept 202-4
as a c)"cle of eV\'rlts J.l:Y,,5
definition 2
distinctions belwee" summalive
assessment and l. J..(),l"ji
educational and rontexlual ;HUes 1!l9 2m
and inquiry-base<! learning by le..chers
bZ
OECD study see OECD study
principles J. 2LI. Z!!. 1lJ8
and professional learning see professional development/learning
and student grooping strategies 191-2
UK policy see UK policy
US policy see US policy
see also formative assessment
for learning in the d3$room
""
ful"re issues 2bS
KMOFAP see KMOFAP (King's-Medway-Oxfordshire Formative Assessment Project)
research review 9-12
assessment of learning
distinctions between assessment for learning and 103-6
see also summative assessment
Assessment Reform Group (ARG) 157, 197
attribution theory
and locus of control 66
Australia
impact and of policy
_ ulso QueeMland
BEAR (Berkeley Evaluation and ....
Research) pro;oo u.o
rontr.u;l with KMOF.... I'(King..Medway
Oxfordshin1 Formati..... Assessment
I'rojec1) '1Z=8
paltems of influ.ence '!5:
bdlaviourist theories Si. 62
unada
impact and of polley JjH
professional development/le,ming 19.1
self- and peer 'sseslimcnl l!llI.
cla$Skalle:lt !IS
classroom di,logue 1.4
framing and guiding 89
sludies2Y
classroom di5C'O\.lfW 11
clusroom practice 2-3. 11199-200
application of researdll1l::2L l'B
linb with dewk>pmenlll
examples 5lb2
,nd KMOfAI' (King's-Medway-
Oxfordshi", Formati..... Assessment
classroom roles
changes l.ld.ll
implications of assessment for le,ming
"='"
"" also sl\.Idenl's role; leacher's role
c1assruornli
assessmenl for learning In _ assessmenl
for learning in the classroom
a. communit;"" of p.mce lI.2:oJ
as figured worldsllb3
_ also subject cIass:rooms
cogniti..... lIoCCeleralion initi,\;w 28
cogni\lvist theories as
_ also COf\5lruclivisllht.'Ories
comment marking a 142-3
in QlIl'Cn5land 189
Gardner's theory of multiple intelligen!l53
goal orientation M::6. '.l'b3.
experimental manipulation ZbS
goals al different levels lQ9...-11 112f
grades
negative effects 11.. 22-
students' underilanding of Z2.,,3.
5.00 tests
Exallnra in SdIoolJ LS!I
explidlleaming l.Jlb!!
dilemll'la5
external 'l. .&&'l
"ffect of f8.
fonnative use 1.llB
s also nationAl curriculum; n.. lional
testing and assessment; ponfolio
llelection tests
exterT\lll control 66.
exlerNll rewards 63.
exlrinsk motivation
distinction between intrinsic and 62=3
negatiw impact 63. Z6
reedback J.lli"S
and C\Iltural diff..>.....-.as ill
experimental manipulation of ZJ,,5
impact'1it:::Z
in the OECD study J..8B",9O
researdllileTature 1L 12
and student-leacher interaction 8Z=!ll
through mar\cing Md5
validity and 141...fl
s I/so or.>l fle.edback; J"!'.'r ..-menl;
self'il5Se5$fIlent
figunod worlds, classrooms as 82:oJ.
fonn,ati"" assessment 2,,28
as a cydo:> of e-=b J..(I,b5
innovation in 100
learning contexl 5 learning context
relationship with summatlw ..-ment
_ relationship between fonnative and
summltiw a5lleSliment
reliability 129-JQ, I.!Cb6
""5'earm review 'bI1
andstudmts
involwmenl in 11
reactions 10 !l2:ol
lension between summative demands and
IB
theory 5 theory of formative assessment
all I Trojan Hol'R I!ll.. 99d.OO
validity 133=46
thr@at:'l to Li6
Ii IJJso _menl for learning
formative as!Ie55ment infonnation. use for
!ummati"", assessment IQ9...-J3. liZ
four componmt model of formati""
assessment 4. M:::2:i.!i'2. 201
fonmtive 1-.nefII 1JZ
national roniculum
__\ _
commltil!s of practices, as
"'"
...lidity 136
'ronstrJft 1rn'levanl variance' J..35,,6
'consttlifl undef"repraelltltion' 1.Y.
oonstrufivisl: lMorieI !L 6Zdl.
J.jQ
infIUC'n<\l! of U6. m !..J!!...
curricuta
between school and university
""""",
192-
relevance of policy l.95
natu and role oHeedback 189--90
development/leaming 00
studera IlS ag.....1'1 of dlang<! lit
of .-ments lli.l3ll
develo tal IndicalOn l..ll:b1.1
assessment 1ll4.
'dldactlc.Jltuarions'. creation of s:z
differenlittion ilL !llhl
di!IIClosurf 1.46.
of a question' 127
,_
policy III 166--7
S QIsolia5llfOOm disrol.IIX
'dislribu.;c. cognltlon'.5Z
dynamic _mentlll::J.5
....
pol.ltiG' l5Il
Ellgl.nd W::6. 166--7, 19'1=8
and dis<.vur-.
Labour tducatkln polldeflIS7-$, 164-6
nationafcurriculum 5
naoMal curriculum
""""",:!observation 58
evaluation
ofte 192-3
evaluatiw J..GI
evalWlliw 142
ct.....ginl bltsis of of
III-Ii
distincti'l" between evidence and
in:tt:tion of I.D'.l.
and formative
_1IlSO kft:natiVl' lIS5e5m'IeOt information;
evidelce
examinatio&s
formati purpose J.ll!b'j
as !twats to valid fomuotive iWeSSment
142-3
use in US schools 1m, l.82.
lTUlrking
group work 200
between a5Sol'SSment for
learning and 191-2
guided participation 911
individual interest 64.
inqulry-ba5l'd learning !lll::1
IlI5idt 1M B/;rd Boz 21 I.a. m
intelligena:
cffl on goal orientation of views
0'15
SIt 1I1t;</ multiple
intelligence testing If!!.. lZ2::Z
Scholastic Aptitude Test (SAT) 17.....7.
""'"
use in university admissions J.1.3"i
interactive formative assessment ill
'interaeli"" ""Sulation'lIZ
interest 6t
intemal ('OfItrol 66
intrinsic motivation
distinction bo!t_..."lrins;c and 62.=J.
promotion of 6.l::i
judgement of teachers llZ.l.46
Key Stage 1 initiative 22. I.li.2. l28.
reliability IUt
key stage tests, reliability 121125-6
KMOFAP (King's-Medway-Oxfordshire
Formative Assessment Project)
XL 8.L !H. 166
change in teacher practiC'e
contrast with the BAR (Berkeley
Evaluation and Research)
project 2Z::d
and impact 2b.'l
outcomes l.6d1I
patterns of influenC'e 2S::6
practices that developed Hd..6
research and practiC'e l.Hl
setting up l2.=l!
leadership, role of 1.'!'6L
learning
assumptions aboutll
motivation for sa motivation for learning
regulation of!it't r<>gUlalion of learning
thl'Oli<os of SIt theories of learning
sa also assessment for learnin.ll;
assessment of learning.: expHcit
learning.: inquiry-based learning.: meta-
learning.: penonaHzed learning.:
profes.olional development/leaming.:
self-""&,,lat..d learning; teacher learning
learning ronte>ct1.J6"t1
insid... th... classroom l3lb41
eJ<plicilleaming In explicit learning
formalivefsummative relatiomhip
Hf),j
trust and motivation l.1B
outside !he classroom ill
sa al5tl cultural contexts
learning environmentlZ::l8
'learning goals' 6S
Learning Huw to learn (l2l1 projecl:L
2J::i.?:L.:K!. 166
responses to staff questionnaire 1lb3
dimensions of classroom a55es5fTlmt
and teacher learnill& J.l::4.
relationships between lISIIe5Sment
practices and teacher learning
practices :Y=Z
learning organiuliOf\ll, developing schools
'learning orientation', distinction betwe.m
'perlurmanceorier\talion' and 92
learning tasks 11
lesson study.19
local asso.'SSment policies Zlb!Ifl
locus of control 66
lower ..chieving students il
impact of summative aS8l'SSment and
lests Z8
management of schooU s school
management
marker error ill
marking
h;oedback through Lfd.5
negative dfeC'ts 92
as a threat 10 valid fonnati\'e :ueessment
142-3
!it't comment marking.: grades; peer
marking
'mastery goals' 6S
mastery learning programmes 12
develOpment of !1.. 92
meta-learning 92
motivation for leaming 61-110, 200
components M::Z
concept 62,,;1
impaClll of assessment 6Z=Z.S
negali''I'1b=l
researdl fiIl::Z.S
ImporlanC'e 61-2. 25
and the role and statUS of e<luClltion
within a society ill
using assessment to promote Z5dlfl
mullipre intelligences, Gardner's 53
multiple-choice leSt!' 121
use in the US l.8l
national curriculum
introduction l.S.lo6
1eveI de:scriptiom J..l9,,4Q
5. Key Stage J initiative; key Stl>ge
'6'
national testing and.' ...mt 2B=8ll
No Child Left Behind (NCLB) Act (2001) 180-81
North America
classroom assessment Zb3
5 fIlSQ US policy
Nllf'lhem Ireland
policy environment 1.6.Y
"",Iection examinations Z!!::l..l.22.=3
OECD study J.8.S",96
definition of formative assessmmt188.
focusing on differences at the expeMe of
similariticsl86.
imp"d and ....lev;once of policy J.2,Y,
influence of cultural coolel<ts 1..ll6=Z
methodologicallimitl>tlons 1.8ZdI
nature and role of feedbKk I.8lb'lfl
problems of trafl!iferability 1BZ
profe$$ionaJ developm<!rlt/leaming 192..,3
khool impwvement a,mtextual factors
,.".
'''''If- and peer J.'lfb.I
student grouping sWllegics and
lIS6C55mmt for learning 191-2
oroll_ment 189-9ll
0I1l1 r..edbad: l!.. 189, 00
oral qul'l'itionir>g lS. 190
peer Z n:i.. 200
and cultural differences 132
in the OECD study J.'lfb.I
role In formative Il$6t'$6ment llL as.
pe.!r ....."'ing l1i
'performance goals' 65
pt':rformanc:e motivation 1.
'performance orientation', distinction
between 'leaming orientation' and!12
performativity l56, L2B... 166
per.;ooal interest 6!1.
per.;ooalized Ieaming H'L l.6.2. 166
plannrd formative assessment ill
policy H. 200
analysing lSlbl
impad and ",levance J.9,b6
influencing 202
5 IlISQ UK policy; US policy
policy '" 152::.l 166-7
policy as te,,1 151=2
'poillial of education' 150
portfolio ..-menl i. 58.
in Canada 00
in the US 178-9
5"/SQ o.-rwand portfolio _ment
ponfolio-ba-t YOClltionlll qualificatioM 1.4ll
po_r, i!5ues of !lIJ
practice 5 dassroom practin>
praise J..U,4
predlctioo\ use of tests for 127...,
profeMional d"""lopmmt/lO'amlng
27-42, 199-.200
links with the application of l'l'lIearm to
practice 2J
in theOECD study 192-3
questionnaire and findings 3lI:::9.
indkalO" for lIo-lI,m
progressive -.nentlO6::Z ill
llft-17, 201
and .....lidity J.Ji
""""'''"''
group work 192
Impact and ....levance of policy
nature and role of feedback 188-9
professional deve:lopmmtlJeaming 1.9.1
role of leadefship I.9.Y
self and peer -.nent l.flOd
students IS agents of ch.ange J.9,i
Queoensland portfolio I.l!Il...
ill. ill. ""
question sampling 12lb1
in c\asIlroom$ _ tlaM-room
dialogue
regulation of learning fl9'
teadlt'f'. role and 86=Z
....lationship between fonnatiYf and
summative 103..JZ WI
didlotomy Qr dimension 113-15
distinctions i. .lDJ::6... U5=l6
efft on validity u.o
fonnalive use of IUmmaliYf
,.."
using fonnative ilIl!leSIlment information
lor iUmmalive as5eIlII"JWI1t lQ9.-JJ. 111
reliability of..-nmts i. I INI, 201
deci5ion romlstency l2.b3O
and the owrlap between validity and
reliability 127-9
and the fonnalive
-,
evidence about l.22=l. J.3lb.1
threat512lb1
_....
applkation in'" pradke J..!!:,1L l.'B
. finq with prof".iona\ 2J
_11I50 lolacher research
re9l'ardllessons J9
reJOllf'l:eI, effec:t on formatiw ..-ment
ill
rewards
negative impact UI
sa also l'lCternal rewards
roil'S sa classroom roll.."
schooling l38
20
Scholastic Aptitude Test (SAT) 174-7, 181-2
school '!L ZIl
relalionship bel.ween pe=ptions of
teacher leamillS practic._", alld
classroom aiISCSSrm.'r11 practices lldI
Srotl.nd 2l. ZIl. 159-62. 166, lbZ
5eledioll, u:;e of lesl ......ults for 12&09:
:;electiolllest"
an.lysis of ",liability U2.::3
impact on for le.millg i'D=l
5elf..o;.:;essment 12 :lli.. 2J
effeet of e.rem.1 assessment 69:
goal and 14.
in the OECD study l..'.llhl
role in formalive asscssm""Il!.L LIS
scll...,fficacy 66=1
dfeCI 0/ cbssroom assessment 1.1.=2
5e1f-<$Il",m 66
impact of national 1'-'515 in England alld
Wak", ff!.
sclf.regulal.'<l le.ming fil. 'l2
effect on lor learning Z1
silual,-d Ih''Ori"" 5fd!
situ.tional inten",l 6:l
conslructivism' S'l
SO<ial,,'lationship gOilI" 'l2.::cJ
wcilKUltural tht.-ori,,,, 56-8, 5':1
inter..."i"n
feedback and IiHl
impru,oing lhrough fonnalin' assessm,."l
100
slud,."ts
as agents of change 1.2:1
a",umptions about leamin)\ 2J
f... tu illS
and formative .....'SSment
im'olvemenl in lJ
noactions to!U.::3
involvement in doosiuns about Z!I
perCl'plions of teacher personal illle"'St 21
understanding of Z2:...l
s alS<J lower achieving stodents; subjl..:t-
interaction
stodenr. role
chanS''''8Z
in learning 2.ld
subjc<:1 classrooms, activity syslems!!H
9H
subjl",t discipline, relalionship I>o."""",n Ih,'
'e""her's role and
subj..",t k""",k"'g", importan,"" uf leach..,s'
""" LN
subjol",t-studenl inleraction, importan"" of
chang., in t!6
summative as,,-'SSme"',t
impact on lower achil'Ving students ZIl
nolationship with formative assessment
sa ndalionship betwem formative and
"""sment
l..1b5
Ihreats 10 & US::::!l
sa of teachers'
summative assessm''fll formali"e
"""
summali"" 1'-'5'"
formalive Us<' 15-16, l.!!. l.J:lZ"li
impact On lower achil'Ving students ZIl
",Iiabilily ll'.b2'l
'Support for in
Furmalive AS!lCSSrm.'flt' 22=J
'Talk Les.suns':!B
Task Group on A......sment Tcstin,o;
(TeAl) report lQi, 153-5, lbZ
leacher d.. profes.sional
de"elop"",nl{1eaming
t..acher W. 22=JO.
Cl'ntralily J2=ll1
relalionship betw,,,,n <loISsroom
oISses,menl prJcti.,.. and
teach...
t.'aclwrs
assumptions Jbout leaming 2J
to l.ll:l::5
importanU' uf subjl"C1 knowll-dge ll5-{"
LN
inquiry-bJs<>d lI!aming br ll!=2
ludg,>m''''1 117, l:l6
I,'vels of support from Zl
assessn"",t by l.l.!:>ol.
StY ul", studenltncher interaction
rul"
and Ihe r"l\ulation of ll6=Z
relatiunship betw"",n subjo.."CI disciplill<'
Jnd 8S=6
t,-,ching
192-3
"'-"Iu"nce of Slag'''; 8ll
teJching 10 the t,,,,1 & 77-8, IO!\, l!!Z
't.,;1 analrsis' l1lil
t.... 1qu,,,,lion5. preparation 1&
't,,,,ling when ready' ZIl
t.'SIS
J.1lIb9.
impact on lower achieving Slud..nlS ZIl
sl"d""ls ill d,'Ci,ions Zl!
use fur pn'<lictic'" 127-8
use for :;election
5 01", t,,,,ls; summati"I!
Ic.;ls; 1<> 111,- tt'St
lexl, poiicy l.5.b2
TC/ T O"',k Group on Asso.o;smmt and
Tc"ing) ,..,port lJ.!:l.l..S.b:!. 1St>, lbZ
theories of leamlng H .iZ::6O
alignment bt'tweet\ aS5eSlill'letlt and
e>,ampres of classroom assessment
practices Sl!::2
beIla""""rist 6Z
cognitive, constructivist 1L :!'11i7-S,
"
possibilities for edectidsm Of synthesis
"""
5OCio-mltural, situated and aaivity
IN.1so activity theory
theory of fol'TT\ative aS$$Sment ill_1m, 201
application of activity theory <y"z
four component model!. 2'1
'"
model for cla5.'lroom transactions
I!k2
",lated """"arm and development studift
"'"
strategies for development 'l2=8.
thinking skills programm<'5 'lll
To...",rd. Dialog'c '.1.8
'traffic lighl!!' 15
training. efkct on formative aS6eSSment ill
tratli;ferabiUty, problenu of lllZ
Trojan Horw. formative as 28..
<"MOO
tru.tlJ8
UJ( policy 149-67,202
in England _ England
in Northern Ireland I..ft.Y
in Srot1and 22 Z!l. 159-62, 166,16Z
in Wall'S 166, lfiZ
US poIi')' 169--83 202
assessment for lKOOUntabilHy l77-11I,
l!b1
No Child Left Behind (NCLB) Act (2001) 180-81
portfo1iOOli 178-9
standards l.Z2::,/!O
assesliment in schools 1ZO
studies of z:b1. ZH
and intelligtmee testing _ inh'lligence
SchoIasllc Aptitude Test (SAT) 174-7
written exmlinations 1.Zlb2
validity .u=!I
of formative D)...;Mi,2fi1-2
and feedback 1414>
and the learning rontext IN learning
conte>.1
overlap between rrliability and 127-9
of summaliV1! .s_m<'Tlt J...l:bS.
ttm.ats to '& ll50di
waitlilN' """"arm. LI.
lmpa.ct of nation;ol teMing and
""""
policy environment 162--.). 166, 16Z
Working BillCk Box 22
Zone of Proximal Development (ZPD)
O<hl