
REVIEW
published: 10 April 2018
doi: 10.3389/feduc.2018.00022

Appropriate Criteria: Key to Effective Rubrics

Susan M. Brookhart*
Department of Educational Foundations and Leadership, Duquesne University, Pittsburgh, PA, United States

True rubrics feature criteria appropriate to an assessment’s purpose, and they describe
these criteria across a continuum of performance levels. The presence of both criteria and
performance level descriptions distinguishes rubrics from other kinds of evaluation tools
(e.g., checklists, rating scales). This paper reviewed studies of rubrics in higher education
from 2005 to 2017. The types of rubrics studied in higher education to date have been
mostly analytic (considering each criterion separately), descriptive rubrics, typically with
four or five performance levels. Other types of rubrics have also been studied, and some
studies called their assessment tool a “rubric” when in fact it was a rating scale. Further,
for a few (7 out of 51) rubrics, performance level descriptions used rating-scale language
or counted occurrences of elements instead of describing quality. Rubrics using this kind
of language may be expected to be more useful for grading than for learning. Finally, no
relationship was found between type or quality of rubric and study results. All studies
described positive outcomes for rubric use.
Keywords: criteria, rubrics, performance level descriptions, higher education, assessment expectations

Edited by: Anders Jönsson, Kristianstad University College, Sweden
Reviewed by: Eva Marie Ingeborg Hartell, Royal Institute of Technology, Sweden; Robbert Smit, University of Teacher Education St. Gallen, Switzerland
*Correspondence: Susan M. Brookhart, [email protected]
Specialty section: This article was submitted to Assessment, Testing and Applied Measurement, a section of the journal Frontiers in Education
Received: 01 February 2018; Accepted: 27 March 2018; Published: 10 April 2018
Citation: Brookhart SM (2018) Appropriate Criteria: Key to Effective Rubrics. Front. Educ. 3:22. doi: 10.3389/feduc.2018.00022

A rubric articulates expectations for student work by listing criteria for the work and performance level descriptions across a continuum of quality (Andrade, 2000; Arter and Chappuis, 2006). Thus, a rubric has two parts: criteria that express what to look for in the work and performance level descriptions that describe what instantiations of those criteria look like in work at varying quality levels, from low to high.

Other assessment tools, like rating scales and checklists, are sometimes confused with rubrics. Rubrics, checklists, and rating scales all have criteria; the scale is what distinguishes them. Checklists ask for dichotomous decisions (typically has/doesn't have or yes/no) for each criterion. Rating scales ask for decisions across a scale that does not describe the performance. Common rating scales include numerical scales (e.g., 1–5), evaluative scales (e.g., Excellent-Good-Fair-Poor), and frequency scales (e.g., Always-Usually-Sometimes-Never). Frequency scales are sometimes useful for ratings of behavior, but none of the rating scales offer students a description of the quality of their performance they can easily use to envision their next steps in learning. The purpose of this paper is to investigate the types of rubrics that have been studied in higher education.
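This structural distinction can be made concrete with a short sketch. The following Python snippet is illustrative only; the criterion and the wording of the level descriptions are invented for this example, not drawn from any study in this review. The point is that only the rubric pairs each performance level with a description of quality:

# Illustrative sketch: the same criterion expressed as a checklist,
# a rating scale, and a true rubric (all invented examples).

# Checklist: one dichotomous decision per criterion.
checklist = {"Uses relevant textual evidence": ["yes", "no"]}

# Rating scale: a judgment on a scale that does not describe the work.
rating_scale = {"Uses relevant textual evidence":
                ["Excellent", "Good", "Fair", "Poor"]}

# True rubric: each performance level carries a description of quality
# that a student can use to envision next steps in learning.
rubric = {
    "Uses relevant textual evidence": {
        3: "Uses relevant textual evidence to support conclusions "
           "about a character throughout the essay.",
        2: "Uses some relevant evidence, but conclusions are not "
           "consistently supported by it.",
        1: "Mentions evidence without connecting it to conclusions "
           "about the character.",
    }
}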
Rubrics have been analyzed in several different ways. One important characteristic of rubrics is whether they are general or task-specific (Arter and McTighe, 2001; Arter and Chappuis, 2006; Brookhart, 2013). General rubrics apply to a family of similar tasks (e.g., persuasive writing prompts, mathematics problem solving). For example, a general rubric for an essay on characterization might include a performance level description that reads, "Used relevant textual evidence to support conclusions about a character." Task-specific rubrics specify the specific facts, concepts, and/or procedures that students' responses to a task should contain. For example, a task-specific rubric for the characterization essay might specify which pieces of textual evidence the student should have located and what conclusions the student should have drawn from this evidence.


The generality of the rubric is perhaps the most important characteristic, because general rubrics can be shared with students and used for learning as well as for grading.

The prevailing hypothesis about how rubrics help students is that they make explicit both the expectations for student work and, more generally, describe what learning looks like (Andrade, 2000; Arter and McTighe, 2001; Arter and Chappuis, 2006; Bell et al., 2013; Brookhart, 2013; Nordrum et al., 2013; Panadero and Jonsson, 2013). In this way, rubrics play a role in the formative learning cycle (Where am I going? Where am I now? Where to next? Hattie and Timperley, 2007) and support student agency and self-regulation (Andrade, 2010). Some research has borne out this idea, showing that rubrics do make expectations explicit for students (Jonsson, 2014; Prins et al., 2016) and that students do use rubrics for this purpose (Andrade and Du, 2005; Garcia-Ros, 2011). General rubrics should be written with descriptive language, as opposed to evaluative language (e.g., excellent, poor), because descriptive language helps students envision where they are in their learning and where they should go next.

Another important way to characterize rubrics is whether they are analytic or holistic. Analytic rubrics consider criteria one at a time, which means they are better for feedback to students (Arter and McTighe, 2001; Arter and Chappuis, 2006; Brookhart, 2013; Brookhart and Nitko, 2019). Holistic rubrics consider all the criteria simultaneously, requiring only one decision on one scale. This means they are better for grading, for times when students will not need to use feedback, because making only one decision is quicker and less cognitively demanding than making several.

Rubrics have also been characterized by the number of criteria and number of levels they use. The number of criteria should be linked to the intended learning outcome(s) to be assessed, and the number of levels should be related to the types of decisions that need to be made and to the number of reliable distinctions in student work that are possible and helpful.

Dawson (2017) recently summarized a set of 14 rubric design elements that characterize both the rubrics themselves and their use in context. His intent was to provide more precision to discussions about rubrics and to future research in the area. His 14 areas included: specificity, secrecy, exemplars, scoring strategy, evaluative criteria, quality levels, quality definitions, judgment complexity, users and uses, creators, quality processes, accompanying feedback information, presentation, and explanation. In Dawson's terms, this study focused on specificity, evaluative criteria, quality levels, quality definitions, quality processes, and presentation (how the information is displayed).

Four recent literature reviews on the topic of rubrics (Jonsson and Svingby, 2007; Reddy and Andrade, 2010; Panadero and Jonsson, 2013; Brookhart and Chen, 2015) summarize research on rubrics. Brookhart and Chen (2015) updated Jonsson and Svingby's (2007) comprehensive literature review. Panadero and Jonsson (2013) specifically addressed the use of rubrics in formative assessment and the fact that formative assessment begins with students understanding expectations. They posited that rubrics help improve student learning through several mechanisms (p. 138): increasing transparency, reducing anxiety, aiding the feedback process, improving student self-efficacy, or supporting student self-regulation.

Reddy and Andrade (2010) addressed the use of rubrics in post-secondary education specifically. They noted that rubrics have the potential to identify needs in courses and programs, and have been found to support learning (although not in all studies). They found that the validity and reliability of rubrics can be established, but this is not always done in higher education applications of rubrics. Finally, they found that some higher education faculty may resist the use of rubrics, which may be linked to a limited understanding of the purposes of rubrics. Students generally perceive that rubrics serve purposes of learning and achievement, while some faculty members think of rubrics primarily as grading schemes (p. 439). In fact, rubrics are not as easy to use for grading as some traditional rating or point schemes; the reason to use rubrics is that they can support learning and align learning with grading.

Some criticisms and challenges for rubrics have been noted. Nordrum et al. (2013) summarized words of caution from several scholars about the potential for the criteria used in rubrics to be subjective or vague, or to narrow students' understandings of learning (see also Torrance, 2007). In a backhanded way, these criticisms support the thesis of this review, namely, that appropriate criteria are the key to the effectiveness of a rubric. Such criticisms are reasonable and get their traction from the fact that many ineffective or poor-quality rubrics exist that do have vague or narrow criteria. A particularly dramatic example of this happens when the criteria in a rubric are about following the directions for an assignment rather than describing learning (e.g., "has three sources" rather than "uses a variety of relevant, credible sources"). Rubrics of this kind misdirect student efforts and mis-measure learning.

Sadler (2014) argued that codification of qualities of good work into criteria cannot mean the same thing in all contexts and cannot be specific enough to guide student thinking. He suggests instantiation instead of codification, describing a process of induction where the qualities of good work are inferred from a body of work samples. In fact, this method is already used in classrooms when teachers seek to clarify criteria for rubrics (Arter and Chappuis, 2006) or when teachers co-create rubrics with students (Andrade and Heritage, 2017).

PURPOSE OF THE STUDY

A number of scholars have published studies of the reliability, validity, and/or effectiveness of rubrics in higher education and provided the rubrics themselves for inspection. This allows for the investigation of several research questions, including:

(1) What are the types and quality of the rubrics studied in higher education?
(2) Are there any relationships between the type and quality of these rubrics and reported reliability, validity, and/or effects on learning and motivation?


Question 1 was of interest because, after doing the previous review (Brookhart and Chen, 2015), I became aware that not all of the assessment tools in studies that claimed to be about rubrics were characterized by both criteria and performance level descriptions, as for true rubrics (Andrade, 2000). The purpose of Research Question 1 was simply to describe the distribution of assessment tool types in a systematic manner.

Question 2 was of interest from a learning perspective. Various types of assessment tools can be used reliably (Brookhart and Nitko, 2019) and be valid for specific purposes. An additional claim, however, is made about true rubrics. Because the performance level descriptions describe performance across a continuum of work quality, rubrics are intended to be useful for students' learning (Andrade, 2000; Brookhart, 2013). The criteria and performance level descriptions, together, can help students conceptualize their learning goal, focus on important aspects of learning and performance, and envision where they are in their learning and what they should try to improve (Falchikov and Boud, 1989). Thus I hypothesized that there would not be a relationship between type of rubric and conventional reliability and validity evidence. However, I did expect a relationship between type of rubric and the effects of rubrics on learning and motivation, expecting true descriptive rubrics to support student learning better than the other types of tools.

METHOD

This study is a literature review. Study selection began with the database of studies selected for Brookhart and Chen (2015), a previous review of literature on rubrics from 2005 to 2013. Thirty-six studies from that review were done in the context of higher education. I conducted an electronic search for articles published from 2013 to 2017 in the ERIC database. This yielded 10 additional studies, for a total of 46 studies. The 46 studies have the following characteristics: (a) conducted in higher education, (b) studied the rubrics (i.e., did not just use the rubrics to study something else, or give a description of "how-to-do-rubrics"), and (c) included the rubrics in the article.

There are two reasons for limiting the studies to the higher education context. One, most published studies of rubrics have been conducted in higher education. I do not think this means fewer rubrics are being used in the K-12 context; I observe a lot of rubric use in K-12. Higher education users, however, are more likely to do a formal review of some kind and publish their results. Thus the number of available studies was large enough to support a review. Two, given that more published information on rubrics exists in higher education than K-12, limiting the review to higher education holds constant one possible source of complexity in understanding rubric use, because all of the students are adult learners. Rubrics used with K-12 students must be written at an appropriate developmental or educational level. The reason for limiting the studies to ones that included a copy of the rubrics in the article was that the analysis for this review required classifying the type and characteristics of the rubrics themselves.

Information about the 46 studies was entered into a spreadsheet.

TABLE 1 | Types of rubrics used in studies of rubrics in higher education.

General rubrics, analytic (total 35)
  Descriptive plds (29): 1 level – 1; 3 levels – 3; 4 levels – 14; 5 levels – 8; 6 levels – 2; 8 levels – 1
  Rating-scale and/or counting plds (6): 4 levels – 4; 5 levels – 1; 7 levels – 1

General rubrics, holistic (total 5)
  Descriptive plds (4): 4 levels – 3; 5 levels – 1
  Rating-scale and/or counting plds (1): 5 levels – 1

Task-specific rubrics, analytic (total 1)
  Descriptive plds (1): 2 levels – 1

Task-specific rubrics, holistic (total 2)
  Descriptive plds (2): 1 level – 2

Rating scales, analytic (total 5)
  Rating-scale language (5)

Point schemes, holistic (total 3)
  Point-based scoring (3)

Total: descriptive plds – 36; rating-scale and/or counting plds – 15; overall – 51

Number of rubrics does not equal number of studies because some studies had more than one rubric.
General rubrics are general enough to apply to a family of similar tasks and can be shared with students. Task-specific rubrics apply to just one task and cannot be shared with students.
Analytic rubrics consider each criterion separately. Holistic rubrics consider all criteria simultaneously.
Rating scales require ratings on criteria using a judgmental scale. Examples include numeric scales (e.g., 1–5), frequency scales (e.g., always-usually-sometimes-never), and evaluative scales (e.g., excellent-good-fair-poor).
Point schemes are schemes to score tasks by assigning points to various aspects of students' responses.
plds, performance level descriptions.


TABLE 2 | Reliability evidence for rubrics.

Avanzino, 2010. Undergraduate. Oral communication: analytic rubric with 3 criteria, 3 levels with mostly descriptive plds. Sample: 230 speeches (112 individual, 118 group). Reliability: κ = 0.92.

Britton et al., 2017. Undergraduate. Team-Q Rubric for individual teamwork skills; final version: 5 criteria, each with behavioral descriptions, rated with a 5-level frequency scale (never to always). Sample: 70 students in a theater history and literature course, 24 of whom gave full consent. Reliability: external rater ICC 0.76; research assistants ICC 0.77; peers (4–5 per group) ICC 0.79; for revised rubric, internal consistency of self-ratings α = 0.91 and of peer-ratings α = 0.97.

Chasteen et al., 2012. Undergraduate. Physics, electromagnetism: detailed task-specific point schemes for each task. Sample: 103 students in 3 courses (final version), 432 students in 14 courses during test development. Reliability: κ = 0.41; consistency between criteria α = 0.82.

Cho et al., 2006. Undergraduate, graduate. SWoRD Writing Rubrics: analytic rubric with 3 criteria and 7 levels; plds were somewhat descriptive but relied on counting (e.g., "all but one argument…") or rating-scale language. Sample: 708 students in 16 courses over 3 years from 4 universities. Reliability: untrained raters; single rater ICCs 0.17–0.56; multiple rater ICCs 0.45–0.88; compared reliability from student and instructor perspectives.

Ciorba and Smith, 2009. Undergraduate. Music, instrumental and vocal performance: analytic rubric with 3 criteria and descriptive plds at 5 levels. Sample: 28 panels of judges, 359 music students' performances. Reliability: inter-judge consistency, median α = 0.89.

DeWever et al., 2011. Undergraduate. Group work: analytic rubric with 4 criteria and descriptive plds at 4 levels. Sample: 659 students in 2 years, in groups of 8–9 (81 groups). Reliability: untrained raters; single rater ICCs 0.33–0.50 (individual criteria), 0.50–0.60 (total score).

Garcia-Ros, 2011. Undergraduate. Oral presentation: 14 criteria organized into 4 areas, 4 levels (0–3) with descriptive plds. Sample: 64 educational psychology students. Reliability: exact agreement = 66%; adjacent agreement = 98%; κ = 0.36 exact agreement; κ = 0.80 adjacent agreement; median r = 0.89.

Kocakülah, 2010. Undergraduate. Newton's Laws of Motion problem solving: rubric-style point scheme; analytic rubric with 6 criteria and descriptive plds at 5 levels, but points vary depending on the criterion. Sample: 153 physics students in 4 classes. Reliability: untrained raters; single rater ICCs 0.14, 0.38; multiple rater ICCs 0.93, 0.98; instructor's consistency between 2 forms, median α = 0.76.

Lewis et al., 2008. Undergraduate. Acute care treatment planning: analytic rubric with 4 criteria and descriptive plds at 4 levels. Sample: 22 students, 5 clinical educators, 1 academic faculty. Reliability: expert raters; single rater ICC = 0.32.

Menéndez-Varela and Gregori-Giralt, 2016. Undergraduate. Service learning projects: 2 analytic rubrics; content: 4 criteria, 4 levels each, with descriptive plds; oral presentation: 5 criteria, 4 levels, descriptive plds except for time. Sample: 84 history of art students. Reliability: project content α increased from 0.67 (at stage 2 of study) to 0.93 (at stage 3 of study); α for oral presentation skills was 0.77.

Newman et al., 2009. Graduate faculty. Peer assessment of faculty teaching: rating scale, 1–5 (excellent through does not demonstrate criterion), on 11 criteria. Sample: 14 resource faculty. Reliability: expert raters; single rater ICC = 0.27 (total score).

Nicholson et al., 2009. Undergraduate. Nurse clinical performance in operating suite: analytic rubric with 12 criteria and descriptive plds at 4 levels; descriptions required inferences (e.g., "would require some prompting and assistance," p. 75). Sample: 40 pre-op nurses rating 3 videos. Reliability: expert raters; single rater ICCs 0.51–0.61; multiple rater ICC = 0.98.

Pagano et al., 2008. Undergraduate. Writing (college composition): analytic rubric, 6 levels with descriptive plds at 3 of the levels (1–2, 3–4, 5–6). Sample: 6 institutions year 1, 5 institutions year 2. Reliability: adjacent agreement = 74%.

Reddy, 2011. Graduate. Business cases, business projects: business case study rubric (4 dimensions) and business project rubric (7 dimensions), each with descriptive plds at 4 levels. Sample: 35 instructors, 95 business students, 2 institutions. Reliability: exact agreement 0.61–0.99; single rater ICCs 0.90–0.95; multiple rater ICCs 0.71–0.99.

Rochford and Borchert, 2011. Graduate. Business case analysis: analytic rubric, 10 criteria organized into 4 "subobjectives," using a 1–5 scale with descriptive plds for 1, 3, and 5. Sample: case analysis assignments in MBA program capstone course. Reliability: multiple rater ICC = 0.96.

Schamber and Mahoney, 2006. Undergraduate. Critical thinking: 5 criteria (for each section of the paper) based on Facione and Facione (1996), with descriptive plds at 5 levels. Sample: 2002, 30 papers; 2003, 30 papers. Reliability: median r = 0.90.

Schreiber et al., 2012. Undergraduate. Public Speaking Competence Rubric: analytic rubric with 9 criteria (+2 optional), with descriptive plds at 5 levels. Sample: study 1, 5 coders, 45 speeches; study 2, 3 undergraduate + 1 faculty coder, 50 speeches. Reliability: expert raters; multiple rater ICCs 0.91, 0.93.

Stellmack et al., 2009. Undergraduate. Writing APA-style introductions: analytic rubric with 8 criteria with descriptive plds at 4 levels. Sample: 40 papers, 3 researcher/graders. Reliability: interrater agreement exact = 0.37, adjacent = 0.90; intrarater agreement exact = 0.78, adjacent = 0.98; κ = 0.33.

Timmerman et al., 2011. Undergraduate. Science writing: analytic rubric with 15 criteria and descriptive plds at 4 levels. Sample: 142 lab reports, 9 trained and 8 'natural' graduate student raters. Reliability: generalizability for relative decisions = 0.85.

Wald et al., 2012. Graduate. Reflective writing: analytic rubric with 5 criteria (+1 optional) and descriptive plds at 4 levels. Sample: 10–60 narratives over 5 trials. Reliability: single-rater ICCs 0.51–0.75; inter-judge consistency, median α = 0.77.

Wallace et al., 2011. Undergraduate. Astronomy, cosmology: task-specific, holistic rubrics for each test item, with 5 levels. Sample: 65 responses from 21 students, 9 items. Reliability: exact agreement, overall score = 83%; κ = 0.76, weighted κ = 0.82.

plds, performance level descriptions.

Information noted about the studies included country, level (undergraduate or graduate), type (rubric, rating scale, or point scheme), how the rubric considered criteria (analytic or holistic), whether the performance level descriptors were truly descriptive or used rating scale and/or numerical language in the levels, type of construct assessed by the rubrics (cognitive or behavioral), whether the rubrics were used with students or just by instructors for grading, sample, study method (e.g., case study, quasi-experimental), and findings. Descriptive and summary information about these classifications and study descriptions was used to address the research questions.
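As a minimal sketch of how such a coding sheet can be summarized (the file name and column names below are assumptions for illustration, not the actual spreadsheet used in this review), counts like those in Table 1 could be produced by tallying one row per rubric:

# Minimal sketch with hypothetical file and column names:
# tallying coded rubric characteristics into Table 1-style counts.
import csv
from collections import Counter

with open("rubric_studies.csv", newline="") as f:  # hypothetical coding sheet
    rows = list(csv.DictReader(f))

# One count per combination of tool type (rubric, rating scale, point
# scheme), criterion handling (analytic, holistic), and performance
# level description language (descriptive vs. rating-scale/counting).
counts = Counter((r["type"], r["criteria"], r["pld_language"]) for r in rows)

for (tool, criteria, pld), n in sorted(counts.items()):
    print(f"{tool} | {criteria} | {pld} | {n}")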
As an example of what is meant by descriptive language in a rubric, consider this excerpt from Prins et al. (2016). This is the performance level description for Level 3 of the criterion Manuscript Structure from a rubric for research theses (p. 133):

    All elements are logically connected and keypoints within sections are organized. Research questions, hypotheses, research design, results, inferences and evaluations are related and form a consistent and concise argumentation.

Notice that a key characteristic of the language in this performance level description is that it describes the work. Thus for students who aspire to this high level, the rubric depicts for them what their work needs to look like in order to reach that goal.

In contrast, if performance level descriptions are written in evaluative language (for example, if the performance level description above had read, "The paper shows excellent manuscript structure"), the rubric does not give students the information they need to further their learning. Rubrics written in evaluative language do not give students a depiction of work at that level and, therefore, do not provide a clear description of the learning goal. An example of evaluative language used in a rubric can be found in the performance level descriptions for one of the criteria of an oral communication rubric (Avanzino, 2010, p. 109). This is the performance level description for Level 2 (Adequate) on the criterion of Delivery:

    Speaker's delivery style/use of notes (manuscript or extemporaneous) is average; inconsistent focus on audience.

Notice that the key word in the first part of the performance level description, "average," does not give any information to the student about what average delivery looks like in regard to style and use of notes. The second part of the performance level description, "inconsistent focus on audience," is descriptive and gives students information about what Level 2 performance looks like in regard to audience focus.

RESULTS AND DISCUSSION

The 46 studies yielded 51 different rubrics because several studies included more than one rubric. The two sections below take up results for each research question in turn.

Type and Quality of Rubrics

Table 1 displays counts of the type and quality of rubrics found in the studies. Most of the rubrics (29 out of 51, 57%) were analytic, descriptive rubrics. This means they considered the criteria separately, requiring a separate decision about work quality for each criterion.


TABLE 3 | Validity evidence for rubrics.

Avanzino, 2010. Undergraduate. Oral communication: analytic rubric with 3 criteria, 3 levels with mostly descriptive plds. Sample: 230 speeches (112 individual, 118 group). Validity: based on student learning outcomes; subject expert review.

Bauer and Cole, 2012. Undergraduate. Chemistry guided-inquiry activities: rating scale, 0–3, on 15 indicators of POGIL (process oriented guided inquiry learning). Sample: 60 science faculty, 4 manipulated versions of the task. Validity: rubric was sensitive enough to distinguish four versions of the activity.

Britton et al., 2017. Undergraduate. Team-Q Rubric for individual teamwork skills; final version: 5 criteria, each with behavioral descriptions, rated with a 5-level frequency scale (never to always). Sample: 70 students in a theater history and literature course, 24 of whom gave full consent. Validity: factor analysis yielded a one-factor solution.

Chasteen et al., 2012. Undergraduate. Physics, electromagnetism: detailed task-specific point schemes for each task. Sample: 103 students in 3 courses (final version), 432 students in 14 courses during test development. Validity: expert feedback; student interviews; student results differed by course (could differentiate types of instruction); criterion-related evidence (to physics grades).

Cho et al., 2006. Undergraduate, graduate. Writing: analytic rubric with 3 criteria and 7 levels; plds were somewhat descriptive but relied on counting (e.g., "all but one argument…") or rating-scale language. Sample: 708 students in 16 courses over 3 years from 4 universities. Validity: correlations of student ratings with instructor and expert ratings.

Ciorba and Smith, 2009. Undergraduate. Music, instrumental and vocal performance: analytic rubric with 3 criteria and descriptive plds at 5 levels. Sample: 28 panels of judges, 359 music students' performances. Validity: scores rose by year (freshman-sophomore-junior-senior); scale intercorrelations (internal validity evidence).

Garcia-Ros, 2011. Undergraduate. Oral presentation: 14 criteria organized into 4 areas, 4 levels (0–3) with descriptive plds. Sample: 64 educational psychology students. Validity: students' perceptions.

Hancock and Brundage, 2010. Graduate. Graduate Student Development Profile for Speech-Language Pathology students. Sample: pilot with 26 first year students, then applied whole-program. Validity: demonstrated student growth over time; faculty perceptions.

Jonsson, 2014. Graduate. 3 rubrics. Survey construction rubric in epidemiology: analytic, general rubric, 2 criteria, 4 levels with plds for each. House inspection rubric in real estate program: more like a checklist, with multiple criteria and a tally of facts and reasoning for each. Patient communication rubric in dental program: indicators for each of several criteria. Sample: 13 statistics students in an epidemiology program, 105 real estate students, 48 dental students. Validity: students found the rubrics transparent and useful; criteria were aligned with assignments, "thereby inviting the students to use the rubrics as guides to performance, as well as tools for self-assessment and reflection" (p. 849); results were interpreted to mean that rubrics made assessment expectations explicit for students.

Kocakülah, 2010. Undergraduate. Physics, Newton's Laws of Motion problems: rubric-style point scheme; analytic rubric with 6 criteria and descriptive plds at 5 levels, but points vary depending on the criterion. Sample: 153 physics students in 4 classes. Validity: students' mean peer scores were the same as instructor scores.

Latifa et al., 2015. Undergraduate. Practical Rating Rubric of Speaking Test: holistic grading rubric with 5 levels (0–4), 5 criteria, mostly counting (e.g., percentage of errors). Sample: 12 English speaking lecturers in several institutions in Indonesia. Validity: lecturers found the grading scale easy to use; authors asserted they compared it with analytic scoring.

Menéndez-Varela and Gregori-Giralt, 2016. Undergraduate. Service learning projects: 2 analytic rubrics; content: 4 criteria, 4 levels each, with descriptive plds; oral presentation: 5 criteria, 4 levels, descriptive plds except for time. Sample: 84 history of art students. Validity: three factors: project content, oral presentation skills, and difficulty.

Moni et al., 2005. Undergraduate. Concept maps, physiology: study was done using original "rubric," which was a point scheme for the concept map task; revised rubric was an analytic rubric, 3 criteria, 5 levels, descriptive plds, based on student and faculty feedback. Sample: 62 students, 2 faculty (plus 1 faculty advisor). Validity: student perceptions; faculty perceptions.

Pagano et al., 2008. Undergraduate. Writing (college composition): analytic rubric, 6 levels with descriptive plds at 3 of the levels (1–2, 3–4, 5–6). Sample: 6 institutions year 1, 5 institutions year 2. Validity: scores increased from early to late in the semester.

Prins et al., 2016. Undergraduate. Research theses in education: analytic rubric, 6 criteria, 3 levels, descriptive plds for levels 2 "must have" and 3 "nice to have" (where 1 was assumed to be "does not have"). Sample: 105 students. Validity: studied student use and perceptions via questionnaire; students felt rubrics had 4 functions (based on a factor analysis of the questionnaire); students who got lower grades on the task reported beginning to apply the rubric's criteria later; faculty wanted another level to distinguish good from excellent work.

Reddy, 2011. Graduate. Business cases, business projects: business case study rubric (4 dimensions) and business project rubric (7 dimensions), each with descriptive plds at 4 levels. Sample: 35 instructors, 95 business students, 2 institutions. Validity: expert review; student perceptions.

Rezaei and Lovorn, 2010. Graduate. Writing: analytic rubrics with 5 criteria and descriptive plds at 4 levels; descriptions somewhat inferential (e.g., "limited understanding"). Sample: 467 graduate students. Validity: quasi-experiment investigating influence of construct-irrelevant factors.

Schreiber et al., 2012. Undergraduate. Public Speaking Competence Rubric: analytic rubric with 9 criteria and 2 optional criteria, with descriptive plds at 5 levels. Sample: study 1, 5 coders, 45 speeches; study 2, 3 undergraduate + 1 faculty coder, 50 speeches. Validity: factor analysis (internal structure evidence); criterion-related evidence (correlation of rubric scores for speeches with grades assigned to the speeches using different scoring schemes during the semester).

Stellmack et al., 2009. Undergraduate. Writing APA-style introductions: analytic rubric with 8 dimensions with descriptive plds at 4 levels. Sample: 40 papers, 3 researcher/graders. Validity: criterion-related evidence (Spearman correlation with independent judge).

Timmerman et al., 2011. Undergraduate. Science writing: analytic rubric with 15 criteria and descriptive plds at 4 levels. Sample: 142 lab reports, 9 trained and 8 'natural' graduate student raters. Validity: grader (graduate student) perceptions; faculty (expert) review.

Urios et al., 2015. Undergraduate. Teamwork and oral and written communication skills, in a chemical engineering degree: 3 main criteria and subcriteria, with rating-scale language in 2 to 4 levels under each, mostly about surface features. Sample: 2 groups, 30 students in each, 1 teacher and teaching assistant in each. Validity: validation questionnaire; students lacked knowledge of the use of rubrics, lacked adaptability and were somewhat resistant; also "lack of commitment and proactivity in the teaching/learning process" (p. 147).

Wald et al., 2012. Graduate. Reflective writing: analytic rubric with 5 criteria (+1 optional) and descriptive plds at 4 levels. Sample: 10–60 narratives over 5 trials. Validity: rubric content based on literature.

Wallace et al., 2011. Undergraduate. Astronomy, cosmology: task-specific, holistic rubrics for each test item, with 5 levels. Sample: 65 responses from 21 students, 9 items. Validity: rubric content based on student responses to tasks.

Young, 2013. Undergraduate. Physiotherapy clinical demonstrations: holistic proforma used mostly rating-scale language, 5 levels, with some highly inferential description, 1/2 page; analytic rubric was very complicated, more of a point scheme, 5 criteria (+safety pass/fail), 5 levels to rate that required counting behaviors listed from the standards, 3 pages. Sample: 67 students. Validity: students' self-efficacy to grade was greater for the proforma than the rubric; students felt the rubric aided evaluation more than the proforma at first (when they needed the behaviors listed explicitly) but changed in perception of competence to use the proforma by the end of the semester; the rubric was more useful for learning, but the proforma was easier to use to score.

plds, performance level descriptions.

In addition, it means that the performance level descriptions used descriptive, as opposed to evaluative, language, which is expected to be more supportive of learning. Most commonly, these rubrics described four (14) or five (8) performance levels.

Four of the 51 rubrics (8%) were holistic, descriptive rubrics. This means they considered the criteria simultaneously, requiring one decision about work quality across all criteria at once. In addition, the performance level descriptions used the desired descriptive language.

Three of the rubrics were descriptive and task-specific. One of these was an analytic rubric and two were holistic rubrics. None of the three could be shared with students, because they would "give away" answers. Such rubrics are more useful for grading than for formative assessment supporting learning. This does not necessarily mean the rubrics were not of quality, because they served well the grading function for which they were designed. However, they represent a missed opportunity to support learning as well as grading.

A few of the rubrics were not written in a descriptive manner. Six of the analytic rubrics and one of the holistic rubrics used rating scale language and/or listed counts of occurrences of elements in the work, instead of describing the quality of student learning and performance.


TABLE 4 | Descriptive case studies about developing and using rubrics.

Bissell and Lemons, 2006. Undergraduate. Introductory biology paper-and-pencil tasks: detailed task-specific point schemes for grading biology paper-and-pencil tasks. Sample: 150 students in 1 introductory biology course.

Bowen, 2017. Undergraduate. Visual literacy competency: holistic rubric with 5 levels based on the SOLO taxonomy. Sample: 2 courses, popular culture and visual rhetoric; applied rubric to 1 assignment in each course.

Davidowitz et al., 2005. Undergraduate. Rubric for flow diagrams in chemistry labs: analytic rubric with plds using mostly rating-scale language (some descriptive) in 4 levels. Sample: 133 flow diagrams from 16 students.

Dinur and Sherman, 2009. Undergraduate. Business case study presentation: 3 rubrics, 2 of which were true rubrics; content rubric was a 1–5 rating scale on 9 criteria; oral presentation rubric was an analytic rubric with plds using frequency-scale language on 4 levels of 4 criteria; written assignment rubric was an analytic rubric with 8 criteria (only 1 of which was about content) and descriptive plds at 4 levels. Sample: 159 business students.

Fraser et al., 2005. Undergraduate. Business writing: analytic rubric with 6 criteria and descriptive plds at 5 levels. Sample: results summarized, sample size not given.

Knight, 2006. Undergraduate. Information literacy (annotated bibliographies): analytic rubric with 5 criteria and descriptive plds at 3 levels, but the descriptions include a lot of counting elements. Sample: 260 bibliographies with 10 citations in each.

plds, performance level descriptions.

Thus 7 out of 51 (14%) of the rubrics were not of the quality that is expected to be best for student learning (Arter and McTighe, 2001; Arter and Chappuis, 2006; Andrade, 2010; Brookhart, 2013).

Finally, eight of the 51 rubrics (16%) were not rubrics but rather rating scales (5) or point schemes for grading (3). It is possible that the authors were not aware of the more nuanced meaning of "rubric" currently used by educators and used the term in a more generic way to mean any scoring scheme.

As the heart of Research Question 1 was about the potential of the rubrics used to contribute to student learning, I also coded the studies according to whether the rubrics were used with students or whether they were just used by instructors for grading. Of the 46 studies, 26 (56%) reported using the rubrics with students and 20 (43%) did not use rubrics with students but rather used them only for grading.

Relation of Rubric Type to Reliability, Validity, and Learning

Different studies reported different characteristics of their rubrics. I charted studies that reported evidence for the reliability of information from rubrics (Table 2) and the validity of information from rubrics (Table 3). For the sake of completeness, Table 4 lists six studies that presented their work with rubrics in a descriptive case-study style that did not fit easily into Table 2, Table 3, or Table 5 (below) about the effects of rubrics on learning. With the inclusion of Table 4, readers have descriptions of all 51 rubrics in all 46 studies reported under Research Question 1.

Reliability was most commonly studied as inter-rater reliability, arguably the most important for rubrics because judgment is involved in matching student work with performance level descriptions, or as internal consistency among criteria. Construct validity was addressed with a variety of methods, from expert review to factor analysis; some studies also addressed consequential evidence for validity with student or faculty questionnaires. No discernible patterns were found that indicated one form of rubric was preferable to another in regard to reliability or validity. Although this conforms to my hypothesis, this result is also partly because most of the studies' reported results and experience with rubrics were positive, no matter what type of rubric was used.
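For reference, the agreement indices most often reported in Table 2 follow standard definitions; the worked numbers below are hypothetical, not taken from any study in this review. Cohen's kappa corrects exact percent agreement for chance, and Cronbach's alpha summarizes internal consistency across criteria:

\[ \kappa = \frac{p_o - p_e}{1 - p_e}, \qquad \alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} \sigma_i^2}{\sigma_X^2}\right), \]

where \(p_o\) is the observed proportion of exact agreements, \(p_e\) is the agreement expected by chance from the raters' marginal distributions, \(k\) is the number of criteria, \(\sigma_i^2\) is the variance of scores on criterion \(i\), and \(\sigma_X^2\) is the variance of total scores. For example, if two raters agree exactly on 80 of 100 papers (\(p_o = 0.80\)) and chance agreement is \(p_e = 0.60\), then \(\kappa = (0.80 - 0.60)/(1 - 0.60) = 0.50\). This is why a rubric can show high percent agreement but only a moderate kappa.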
Table 5 describes 13 studies of the effects of rubrics on learning or motivation, all with positive results. Learning was most commonly operationalized as improvement in student work. Motivation was typically operationalized as student responses to questionnaires. In these studies as well, no discernible pattern was found regarding type of rubric. Despite the logical and learning-based arguments made in the literature and summarized in the introduction to this article, rubrics with both descriptive and evaluative performance level descriptions led to at least some positive results for students. Eight of these studies used descriptive rubrics and five used evaluative rubrics. It is possible that the lack of association of type of rubric with study findings is a result of publication bias, because most of the studies had good things to say about rubrics and their effects. The small sample size (13 studies) may also be an issue.

CONCLUSIONS

Rubrics are becoming more and more evident as part of assessment in higher education. Evidence for that claim is simply the number of published studies investigating rubrics and the assertions made in those studies about rising interest in rubrics.


TABLE 5 | Studies of the effects of rubric use on student learning and motivation to learn.

Andrade and Du, 2005. Undergraduate. Educational Psychology Learning Vignettes Performance Rubric: analytic rubric with 6 weighted criteria and descriptive plds at 4 levels. Sample: 14 teacher education students who had used rubrics in educational psychology. Design: focus groups. Findings: students used rubrics to determine the teacher's expectations, plan production, check their work in progress, and guide and reflect on feedback; some students only checked the A and B levels of the rubric, and some saw rubrics as a way to "give teachers what they want."

Ash et al., 2005. Undergraduate. Service learning objectives, critical thinking: holistic rubric for service learning objectives, listed according to level of thinking, 0–4 (0 not described), so the learning objectives formed the descriptions; holistic critical thinking rubric, 4 levels, 8 simultaneous criteria, descriptive plds. Sample: 14 students in 2 classes. Design: pre-experimental. Findings: improvement across drafts was noted, with the Academic criterion being the most difficult for students; improvement in first drafts across the semester was also noted, but smaller, and again the Academic criterion was the hardest.

Britton et al., 2017. Undergraduate. Team-Q Rubric for individual teamwork skills; final version: 5 criteria, each with behavioral descriptions, rated with a 5-level frequency scale (never to always). Sample: 70 students in a theater history and literature course, 24 of whom gave full consent. Design: instrument development. Findings: significant improvement in teamwork skills from first time to second time in both self-ratings and peer ratings; external ratings improved from Time 1 to Time 2 but not significantly so.

Howell, 2011. Undergraduate. Juvenile delinquency course assignment rubric: holistic grading rubric, somewhat task-specific, plds for each of 4 levels, which were then converted to points for grading. Sample: 80 students in 2 sections of the instructor's own course. Design: quasi-experimental. Findings: controlling for college year, criminal justice major (vs. not), pretest score and gender, being in the treatment group (having rubrics provided with the assignment) predicted achievement (β = 0.488); the only other large predictor was college year; student achievement was higher when rubrics were used.

Howell, 2014. Undergraduate. Juvenile delinquency course assignment rubric: holistic grading rubric, somewhat task-specific, plds for each of 4 levels, which were then converted to points for grading. Sample: 76 students in 2 sections of the instructor's own course. Design: quasi-experimental. Findings: treatment group (completed an assignment using a grading rubric) scored higher than comparison group (same assignment, no rubric); regression showed rubric use contributed significantly after controlling for baseline course knowledge and GPA.

Kerby and Romine, 2010. Undergraduate and graduate. Oral communications and presentation: analytic rubric with 8 criteria and descriptive plds at 3 levels. Sample: 1 business accounting program. Design: case study. Findings: oral presentation skills improved from sophomore to senior years, did not further improve at the graduate level, which the researchers attributed to more complex material to present.

Kocakülah, 2010. Undergraduate. Newton's Laws of Motion problem solving: rubric-style point scheme; analytic rubric with 6 criteria and descriptive plds at 5 levels, but points vary depending on the criterion. Sample: 153 physics students in 4 classes. Design: quasi-experimental. Findings: students who took part in the designing and using of a rubric performed better in solving problems than those who had the same instruction but no rubric.

McCormick et al., 2007. Undergraduate. Self-assessment of executive leadership: analytic rubric with 6 criteria and 8 levels (0–7), with descriptive plds at levels 2, 4, and 6. Sample: 44 seniors in a leadership education course. Design: pre-experimental. Findings: student perceived competence increased over the semester; half of the students accurately estimated their competence (based on final exam), the other half underestimated their competence.

Menéndez-Varela and Gregori-Giralt, 2016. Undergraduate. Service learning projects: 2 analytic rubrics; content: 4 criteria, 4 levels each, with descriptive plds; oral presentation: 5 criteria, 4 levels, descriptive plds except for time. Sample: 84 history of art students. Design: validity study. Findings: significant increase in scores (quality of projects) from stage 1 to stage 3 of the study, overall and for each of 5 raters individually; work quality increased as rubric use was repeated.

Petkov and Petkova, 2006. Undergraduate. Business projects: 13 criteria grouped into 4 areas, with rating-scale language at 4 levels. Sample: 20 students fall (rubric), 20 students spring (no rubric). Design: pre-experimental. Findings: rubrics group achievement was higher than the comparison group.

Reynolds-Keefer, 2010. Undergraduate. Writing: analytic rubric with 5 criteria and descriptive plds for 6 levels. Sample: 45 educational psychology students. Design: open-ended questionnaire. Findings: pre-service teachers who used rubrics as students reported being more likely to use rubrics in their own teaching.

Ritchie, 2016. Undergraduate. Oral presentations in biology: "rubric" was really a rating scale with 15 criteria organized under content, organization, and delivery, scored 1–5, "poor/absent" to "no change needed." Sample: 39 students in 2 sections (1 with rubric self-assessment and 1 without); each gave 2 presentations. Design: pre-experimental. Findings: students in the self-assessment with rubrics group improved more in the 2nd presentation, with less variability; all viewed their videotaped presentation (cf. 47% of control group); peer assessment was accurate (compared with instructor), self-assessment was not.

Vandenberg et al., 2010. Undergraduate. Financial analysis project: analytic rubric with 5 criteria and descriptive plds for 5 levels. Sample: 49 students in 3 sections of the course. Design: pre-experimental. Findings: students who used rubrics scored significantly higher on two of three sections of the project; students with rubrics felt the requirements of the assignment were more clearly communicated than those without.

plds, performance level descriptions.

Research Question 1 asked about the type and quality of rubrics published in studies of rubrics in higher education. The number of criteria varies widely depending on the rubric and its purpose. Three, four, and five are the most common numbers of levels. While most of the rubrics are descriptive (the type of rubrics generally expected to be most useful for learning), many are not. Perhaps most surprising, and potentially troubling, is that only 56% of the studies reported using rubrics with students. If all that is required is a grading scheme, traditional point schemes or rating scales are easier for instructors to use. The value of a rubric lies in its formative potential (Panadero and Jonsson, 2013), where the same tool that students can use to learn and monitor their learning is then used for grading and final evaluation by instructors.

Research Question 2 asked whether rubric type and quality were related to measurement quality (reliability and validity) or effects on learning and motivation to learn. Among studies in this review, reported reliability and validity were not related to type of rubric. Reported effects on learning and/or motivation were not related to type of rubric. The discussion above speculated that part of the reason for these findings might be publication bias, because only studies with good effects, whatever the type of rubric they used, were reported.

However, we should not dismiss all the results with a hand-wave about publication bias. All of the tools in the studies of rubrics (true rubrics, rating scales, checklists) had criteria. The differences were in the type of scale and scale descriptions used. Criteria lay out for students and instructors what is expected in student work and, by extension, what it looks like when evidence of intended learning has been produced. Several of the articles stated explicitly that the point of rubrics was to make assignment expectations explicit (e.g., Andrade and Du, 2005; Fraser et al., 2005; Reynolds-Keefer, 2010; Vandenberg et al., 2010; Jonsson, 2014; Prins et al., 2016). The criteria are the assignment expectations: the qualities the final work should display. The performance level descriptions instantiate those expectations at different levels of competence. Thus, one firm conclusion from this review is that appropriate criteria are the key to effective rubrics. Trivial or surface-level criteria will not draw learning goals for students as clearly as substantive criteria. Students will try to produce what is expected of them. If the criterion is simply having or counting something in their work (e.g., "has 5 paragraphs"), students need not pay attention to the quality of what their work has. If the criterion is substantive (e.g., "states a compelling thesis"), attention to quality becomes part of the work.

It is likely that appropriate performance level descriptions are also key for effective rubrics, but this review did not establish that fact. A major recommendation for future research is to design studies that investigate how students use the performance level descriptions as they work, in monitoring their work, and in their self-assessment judgments. Future research might also focus on two additional characteristics of rubrics (Dawson, 2017): users and uses, and judgment complexity. Several studies in this review established that students use rubrics to make expectations explicit. However, in only 56% of the studies were rubrics used with students, thus missing the opportunity to take advantage of this important rubric function. Therefore, it seems important to seek additional understanding of users and uses of rubrics. In this review, judgment complexity was a clear issue for one study (Young, 2013). In that study, a complex rubric was found more useful for learning, but a holistic rating scale was easier to use once the learning had occurred. This hint from one study suggests that different degrees of judgment complexity might be more useful at different stages of learning.

Rubrics are one way to make learning expectations explicit for learners. Appropriate criteria are key. More research is needed that establishes how performance level descriptions function during learning and, more generally, how students use rubrics for learning, not just that they do.

AUTHOR CONTRIBUTIONS

The author confirms being the sole contributor of this work and approved it for publication.


REFERENCES DeWever, B., Van Keer, H., Schellens, T., and Valke, M. (2011). Assessing
collaboration in a wiki: the reliability of university students’ peer assessment.
Andrade, H. G. (2000). Using rubrics to promote thinking and learning. Internet High. Educ. 14, 201–206. doi: 10.1016/j.iheduc.2011.07.003
Educational Leadership 57, 13–18. Available online at: http://www.ascd.org/ Dinur, A., and Sherman, H. (2009). Incorporating outcomes assessment and
publications/educational-leadership/feb00/vol57/num05/Using-Rubrics-to- rubrics into case instruction. J. Behav. Appl. Manag. 10, 291–311.
Promote-Thinking-and-Learning.aspx Facione, N. C., and Facione, P. A. (1996). Externalizing the critical thinking in
Andrade, H., and Du, Y. (2005). Student perspectives on rubric-referenced knowledge development and clinical judgment. Nurs. Outlook 44, 129–136.
assessment. Pract. Assess. Res. Eval. 10, 1–11. Available online at: http:// doi: 10.1016/S0029-6554(06)80005-9
pareonline.net/pdf/v10n3.pdf Falchikov, N., and Boud, D. (1989). Student self-assessment in higher education: a
Andrade, H., and Heritage, M. (2017). Using Assessment to Enhance Learning, meta-analysis. Rev. Educ. Res. 59, 395–430.
Achievement, and Academic Self-Regulation. New York, NY: Routledge. Fraser, L., Harich, K., Norby, J., Brzovic, K., Rizkallah, T., and Loewy, D. (2005).
Andrade, H. L. (2010). “Students as the definitive source of formative assessment: Diagnostic and value-added assessment of business writing. Bus. Commun. Q.
Andrade, H. L. (2010). "Students as the definitive source of formative assessment: academic self-assessment and the self-regulation of learning," in Handbook of Formative Assessment, eds H. L. Andrade and G. J. Cizek (New York, NY: Routledge), 90–105.
Arter, J. A., and Chappuis, J. (2006). Creating and Recognizing Quality Rubrics. Boston: Pearson.
Arter, J. A., and McTighe, J. (2001). Scoring Rubrics in the Classroom: Using Performance Criteria for Assessing and Improving Student Performance. Thousand Oaks, CA: Corwin.
Ash, S. L., Clayton, P. H., and Atkinson, M. P. (2005). Integrating reflection and assessment to capture and improve student learning. Mich. J. Comm. Serv. Learn. 11, 49–60. Available online at: http://hdl.handle.net/2027/spo.3239521.0011.204
Avanzino, S. (2010). Starting from scratch and getting somewhere: assessment of oral communication proficiency in general education across lower and upper division courses. Commun. Teach. 24, 91–110. doi: 10.1080/17404621003680898
Bauer, C. F., and Cole, R. (2012). Validation of an assessment rubric via controlled modification of a classroom activity. J. Chem. Educ. 89, 1104–1108. doi: 10.1021/ed2003324
Bell, A., Mladenovic, R., and Price, M. (2013). Students' perceptions of the usefulness of marking guides, grade descriptors and annotated exemplars. Assess. Eval. High. Educ. 38, 769–788. doi: 10.1080/02602938.2012.714738
Bissell, A. N., and Lemons, P. R. (2006). A new method for assessing critical thinking in the classroom. BioScience 56, 66–72. doi: 10.1641/0006-3568(2006)056[0066:ANMFAC]2.0.CO;2
Bowen, T. (2017). Assessing visual literacy: a case study of developing a rubric for identifying and applying criteria to undergraduate student learning. Teach. High. Educ. 22, 705–719. doi: 10.1080/13562517.2017.1289507
Britton, E., Simper, N., Leger, A., and Stephenson, J. (2017). Assessing teamwork in undergraduate education: a measurement tool to evaluate individual teamwork skills. Assess. Eval. High. Educ. 42, 378–397. doi: 10.1080/02602938.2015.1116497
Brookhart, S. M. (2013). How to Create and Use Rubrics for Formative Assessment and Grading. Alexandria, VA: ASCD.
Brookhart, S. M., and Chen, F. (2015). The quality and effectiveness of descriptive rubrics. Educ. Rev. 67, 343–368. doi: 10.1080/00131911.2014.929565
Brookhart, S. M., and Nitko, A. J. (2019). Educational Assessment of Students, 8th Edn. Boston, MA: Pearson.
Chasteen, S. V., Pepper, R. E., Caballero, M. D., Pollock, S. J., and Perkins, K. K. (2012). Colorado Upper-Division Electrostatics diagnostic: a conceptual assessment for the junior level. Phys. Rev. Spec. Top. Phys. Educ. Res. 8:020108. doi: 10.1103/PhysRevSTPER.8.020108
Cho, K., Schunn, C. D., and Wilson, R. W. (2006). Validity and reliability of scaffolded peer assessment of writing from instructor and student perspectives. J. Educ. Psychol. 98, 891–901. doi: 10.1037/0022-0663.98.4.891
Ciorba, C. R., and Smith, N. Y. (2009). Measurement of instrumental and vocal undergraduate performance juries using a multidimensional assessment rubric. J. Res. Music Educ. 57, 5–15. doi: 10.1177/0022429409333405
Davidowitz, B., Rollnick, M., and Fakudze, C. (2005). Development and application of a rubric for analysis of novice students' laboratory flow diagrams. Int. J. Sci. Educ. 27, 43–59. doi: 10.1080/0950069042000243754
Dawson, P. (2017). Assessment rubrics: towards clearer and more replicable design, research and practice. Assess. Eval. High. Educ. 42, 347–360. doi: 10.1080/02602938.2015.1111294
Fraser, L., Harich, K., Norby, J., Brzovic, K., Rizkallah, T., and Loewy, D. (2005). Diagnostic and value-added assessment of business writing. Bus. Commun. Q. 68, 290–305. doi: 10.1177/1080569905279405
Garcia-Ros, R. (2011). Analysis and validation of a rubric to assess oral presentation skills in university contexts. Electr. J. Res. Educ. Psychol. 9, 1043–1062.
Hancock, A. B., and Brundage, S. B. (2010). Formative feedback, rubrics, and assessment of professional competency through a speech-language pathology graduate program. J. All. Health 39, 110–119.
Hattie, J., and Timperley, H. (2007). The power of feedback. Rev. Educ. Res. 77, 81–112. doi: 10.3102/003465430298487
Howell, R. J. (2011). Exploring the impact of grading rubrics on academic performance: findings from a quasi-experimental, pre-post evaluation. J. Excell. Coll. Teach. 22, 31–49.
Howell, R. J. (2014). Grading rubrics: hoopla or help? Innov. Educ. Teach. Int. 51, 400–410. doi: 10.1080/14703297.2013.785252
Jonsson, A. (2014). Rubrics as a way of providing transparency in assessment. Assess. Eval. High. Educ. 39, 840–852. doi: 10.1080/02602938.2013.875117
Jonsson, A., and Svingby, G. (2007). The use of scoring rubrics: reliability, validity and educational consequences. Educ. Res. Rev. 2, 130–144. doi: 10.1016/j.edurev.2007.05.002
Kerby, D., and Romine, J. (2010). Develop oral presentation skills through accounting curriculum design and course-embedded assessment. J. Educ. Bus. 85, 172–179. doi: 10.1080/08832320903252389
Knight, L. A. (2006). Using rubrics to assess information literacy. Ref. Serv. Rev. 34, 43–55. doi: 10.1108/00907320610640752
Kocakülah, M. (2010). Development and application of a rubric for evaluating students' performance on Newton's Laws of Motion. J. Sci. Educ. Technol. 19, 146–164. doi: 10.1007/s10956-009-9188-9
Latifa, A., Rahman, A., Hamra, A., Jabu, B., and Nur, R. (2015). Developing a practical rating rubric of speaking test for university students of English in Parepare, Indonesia. Engl. Lang. Teach. 8, 166–177. doi: 10.5539/elt.v8n6p166
Lewis, L. K., Stiller, K., and Hardy, F. (2008). A clinical assessment tool used for physiotherapy students—is it reliable? Physiother. Theory Pract. 24, 121–134. doi: 10.1080/09593980701508894
McCormick, M. J., Dooley, K. E., Lindner, J. R., and Cummins, R. L. (2007). Perceived growth versus actual growth in executive leadership competencies: an application of the stair-step behaviorally anchored evaluation approach. J. Agric. Educ. 48, 23–35. doi: 10.5032/jae.2007.02023
Menéndez-Varela, J., and Gregori-Giralt, E. (2016). The contribution of rubrics to the validity of performance assessment: a study of the conservation-restoration and design undergraduate degrees. Assess. Eval. High. Educ. 41, 228–244. doi: 10.1080/02602938.2014.998169
Moni, R. W., Beswick, E., and Moni, K. B. (2005). Using student feedback to construct an assessment rubric for a concept map in physiology. Adv. Physiol. Educ. 29, 197–203. doi: 10.1152/advan.00066.2004
Newman, L. R., Lown, B. A., Jones, R. N., Johansson, A., and Schwartzstein, R. M. (2009). Developing a peer assessment of lecturing instrument: lessons learned. Acad. Med. 84, 1104–1110. doi: 10.1097/ACM.0b013e3181ad18f9
Nicholson, P., Gillis, S., and Dunning, A. M. (2009). The use of scoring rubrics to determine clinical performance in the operating suite. Nurse Educ. Today 29, 73–82. doi: 10.1016/j.nedt.2008.06.011
Nordrum, L., Evans, K., and Gustafsson, M. (2013). Comparing student learning experiences of in-text commentary and rubric-articulated feedback: strategies for formative assessment. Assess. Eval. High. Educ. 38, 919–940. doi: 10.1080/02602938.2012.758229

Pagano, N., Bernhardt, S. A., Reynolds, D., Williams, M., and McCurrie, M. (2008). An inter-institutional model for college writing assessment. Coll. Composition Commun. 60, 285–320.
Panadero, E., and Jonsson, A. (2013). The use of scoring rubrics for formative assessment purposes revisited: a review. Educ. Res. Rev. 9, 129–144. doi: 10.1016/j.edurev.2013.01.002
Petkov, D., and Petkova, O. (2006). Development of scoring rubrics for IS projects as an assessment tool. Issues Informing Sci. Inform. Technol. 3, 499–510. doi: 10.28945/910
Prins, F. J., de Kleijn, R., and van Tartwijk, J. (2016). Students' use of a rubric for research theses. Assess. Eval. High. Educ. 42, 128–150. doi: 10.1080/02602938.2015.1085954
Reddy, M. Y. (2011). Design and development of rubrics to improve assessment outcomes: a pilot study in a master's level Business program in India. Qual. Assur. Educ. 19, 84–104. doi: 10.1108/09684881111107771
Reddy, Y., and Andrade, H. (2010). A review of rubric use in higher education. Assess. Eval. High. Educ. 35, 435–448. doi: 10.1080/02602930902862859
Reynolds-Keefer, L. (2010). Rubric-referenced assessment in teacher preparation: an opportunity to learn by using. Pract. Assess. Res. Eval. 15, 1–9. Available online at: http://pareonline.net/getvn.asp?v=15&n=8
Rezaei, A., and Lovorn, M. (2010). Reliability and validity of rubrics for assessment through writing. Assess. Writing 15, 18–39. doi: 10.1016/j.asw.2010.01.003
Ritchie, S. M. (2016). Self-assessment of video-recorded presentations: does it improve skills? Act. Learn. High. Educ. 17, 207–221. doi: 10.1177/1469787416654807
Rochford, L., and Borchert, P. S. (2011). Assessing higher level learning: developing rubrics for case analysis. J. Educ. Bus. 86, 258–265. doi: 10.1080/08832323.2010.512319
Sadler, D. R. (2014). The futility of attempting to codify academic achievement standards. High. Educ. 67, 273–288. doi: 10.1007/s10734-013-9649-1
Schamber, J. F., and Mahoney, S. L. (2006). Assessing and improving the quality of group critical thinking exhibited in the final projects of collaborative learning groups. J. Gen. Educ. 55, 103–137. doi: 10.1353/jge.2006.0025
Schreiber, L. M., Paul, G. D., and Shibley, L. R. (2012). The development and test of the public speaking competence rubric. Commun. Educ. 61, 205–233. doi: 10.1080/03634523.2012.670709
Stellmack, M. A., Konheim-Kalkstein, Y. L., Manor, J. E., Massey, A. R., and Schmitz, J. P. (2009). An assessment of reliability and validity of a rubric for grading APA-style introductions. Teach. Psychol. 36, 102–107. doi: 10.1080/00986280902739776
Timmerman, B. E. C., Strickland, D. C., Johnson, R. L., and Payne, J. R. (2011). Development of a 'universal' rubric for assessing undergraduates' scientific reasoning skills using scientific writing. Assess. Eval. High. Educ. 36, 509–547. doi: 10.1080/02602930903540991
Torrance, H. (2007). Assessment as learning? How the use of explicit learning objectives, assessment criteria and feedback in post-secondary education and training can come to dominate learning. Assess. Educ. 14, 281–294. doi: 10.1080/09695940701591867
Urios, M. I., Rangel, E. R., Tomàs, R. B., Salvador, J. T., García, F. C., and Piquer, C. F. (2015). Generic skills development and learning/assessment process: use of rubrics and student validation. J. Technol. Sci. Educ. 5, 107–121. doi: 10.3926/jotse.147
Vandenberg, A., Stollak, M., McKeag, L., and Obermann, D. (2010). GPS in the classroom: using rubrics to increase student achievement. Res. High. Educ. J. 9, 1–10. Available online at: http://www.aabri.com/manuscripts/10522.pdf
Wald, H. S., Borkan, J. M., Taylor, J. S., Anthony, D., and Reis, S. P. (2012). Fostering and evaluating reflective capacity in medical education: developing the REFLECT rubric for assessing reflective writing. Acad. Med. 87, 41–50. doi: 10.1097/ACM.0b013e31823b55fa
Wallace, C. S., Prather, E. E., and Duncan, D. K. (2011). A study of general education Astronomy students' understandings of cosmology. Part II. Evaluating four conceptual cosmology surveys: a classical test theory approach. Astron. Educ. Rev. 10:010107. doi: 10.3847/AER2011030
Young, C. (2013). Initiating self-assessment strategies in novice physiotherapy students: a method case study. Assess. Eval. High. Educ. 38, 998–1011. doi: 10.1080/02602938.2013.771255

Conflict of Interest Statement: The author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Copyright © 2018 Brookhart. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
