Educational Assessments in the COVID-19 Era and Beyond
[Concerning] the children of this pandemic … [t]he models no longer apply, the
benchmarks are no longer valid, the trend analyses have been interrupted….
When the children return to school, they will have returned with a new history
that we will need to help them identify and make sense of…. There is no
assessment that applies to who they are or what they have learned.
Dr. Teresa Thayer Snyder,
Retired Superintendent, Voorheesville Central School District, NY1
[G]iven a shortage of testing data for Black, Hispanic and poor children, it could
well be that these groups have fared worse in the pandemic than their white or
more affluent peers…. Given these realities, the new education secretary …
should resist calls to put off annual student testing.
Editorial Board,
The New York Times2
INTRODUCTION
As the COVID-19 pandemic continues to spread across the United States and around the
world, school systems everywhere are in crisis management, with education leaders and
teachers struggling to provide continuous instruction via combinations of in-person,
virtual, and hybrid learning modes. In this uncertain and fluid environment, the regular
challenges of assessing what and how students are learning have become even more
complex: teachers need information to guide classroom-level learning—no matter which
instructional mode—and states, school districts, schools, teachers, parents and caregivers,
students, and communities need evidence of how COVID-19 is affecting historically
marginalized, disadvantaged, and underserved students.
The two quotes above reflect diverse opinions about what information regarding student
learning is most needed, the critical audiences for that information, and the most
appropriate ways to obtain it in the remainder of the current school year and for the next
school year beginning in fall 2021. Although reliable data are necessary to inform future
educational goals and resource allocations, how these data are gathered, and ultimately
used, is contested. The persistent debate about fair uses of assessment for instructional
improvement and accountability has become more heated, as educators and policy
makers weigh the benefits and risks of suspending mandatory assessment requirements
under the federal Every Student Succeeds Act of 2015 (ESSA), which requires annual
summative assessments in grades 3–8 and once in grades 10–12. At the crux of this argument is the balance between the fairness of holding schools, teachers, and students accountable under such disrupted conditions and the need for reliable data to document and address educational inequities.
A fundamental and familiar question, therefore, centers on the rationale for assessment.
What are its goals and, in particular, can assessment advance teaching and learning and
reduce educational inequities? In the near term, what are the best “uses” of assessment in
2021?
To address these questions, the National Academy of Education (NAEd) convened a group
of scholars, policy leaders, and educators (see the list attached to this summary report)
for a focused discussion of the “how” and “why” of testing, both in the special circumstances of 2021 and beyond. This online roundtable built on NAEd’s prior
work addressing COVID-19 as well as its historical3 and recent4 work addressing
educational assessments. Presented here are some of the overarching themes of the
conversation to stimulate further discussion among educators, researchers, policy
makers, and the general public.
This summary report begins where the roundtable conversation kept returning, with a
description of the purposes and intended users of different types of assessments. Next, it
discusses inequities in education and implications for the appropriate uses of assessment.
Then the report addresses the 2021 end-of-year “summative” assessments: assuming that
school districts will administer such assessments, it points out caveats to keep in mind
regarding test administration, interpretation, and intended and unintended uses of test
results. Finally, looking beyond 2020–2021 end-of-year assessments, the report discusses
themes that emerged, including ensuring that assessment systems are balanced and
equitable, reframing accountability from a deficit lens to an improvement perspective,
and expanding assessment literacy.
No single test can serve all of these purposes with the requisite validity and reliability.7
Critically, the intended purposes and uses of a test should be defined and explicitly
addressed both when the test is designed and when its results are interpreted.8 More precisely, uses and interpretations of test results should be validated for each intended purpose.
Another way of thinking of the uses of assessment would be to categorize them as follows:
assessments for learning, assessments as learning, and assessments of learning.
Assessments for learning enable teachers to use information about students’ knowledge
and skills to inform teaching and to provide feedback to students to help them monitor
and improve their learning. Assessments as learning occur when participating in an
assessment not only tracks learning but also affects it. Assessments of learning monitor
knowledge and understanding, as demonstrated by performance on the tests, often in
terms of progress toward defined learning goals.
Additionally, assessments should measure not only outcomes (i.e., what students have
learned) but also processes (i.e., how teaching and learning are occurring) and “opportunity
to learn” constructs. The COVID-19 pandemic in many ways brings to the forefront the
importance of understanding and documenting the processes and contexts of learning and
the need to account for them in the design and interpretation of assessments.
Use Context: Classrooms and Schools
• Frequency and Purpose: Formative, ongoing during the course of instruction; periodic summative, at the end of a unit, semester, and/or year
• Intended Uses: Inform instruction; provide feedback to students; as input to grading
• Intended Users: Teachers; students; parents and caregivers; principals

Use Context: School Districts
• Frequency and Purpose: Periodic summative, administered monthly, quarterly, or semi-annually, as desired
• Intended Uses: Feedback and guidance to principals and teachers for improved instruction; feedback and guidance for school and district leaders on the effectiveness of certain programs, instructional approaches, and curricula; monitoring of school and district progress; informing choices for resource allocation
• Intended Users: Teachers; students; parents and caregivers; principals; school district leaders; general public
In mid-March 2020, because of the pandemic, the U.S. Department of Education (ED)
granted temporary waivers to all 50 states, the District of Columbia, the Commonwealth of
Puerto Rico, and the Bureau of Indian Education of the U.S. Department of the Interior,
relieving them of the mandate to administer standardized tests and the associated
reporting requirements at the end of the 2019–2020 school year. However,
on September 3, 2020, the ED stated that it would not grant waivers of the summative
testing requirements for the 2020–2021 school year, citing research12 showing that school closures
affected the most vulnerable students disproportionately and widened disparities. The
ED’s policy is based on the argument that assessment data are needed to document
learning and educational disparities and to guide decision-making.13
Administration
Test administration procedures are developed for an assessment program in large part to
reduce measurement error and increase the validity and reliability of the inferences
drawn from the assessment. These procedures address numerous factors, such as the
timing of test administration, test format (e.g., paper and pencil or digital, multiple choice
or other item forms), location and conditions of testing (e.g., remote, in school, in school
wearing masks), and implementation of accommodations for test-takers, such as students
with disabilities or English learners. The ability of testing sites to adhere to test
administration procedures must be examined and contextualized prior to interpreting or
using the resulting data. When looking at the 2021 end-of-year, large-scale summative
assessments, key test administration procedures to consider are:
• Conditions and contexts of administration. This year, states and districts will likely
vary the contexts of administration contemplated for state- or district-level
assessments. Although most states are preparing for in-school testing, given that many students remain in remote or hybrid instruction, some may also permit remote administration, and even those remote examinations will likely take different forms.
Interpretation
The interpretation of assessment results necessarily requires some type of
comparison of scores or other summaries of data. For individual, subgroup, or even school- or district-level interpretations, assessments need to be referenced or compared to prior years or past performance, or to an absolute standard such as a cut point.16 Under pre-
COVID-19 conditions, comparability concerns were already prevalent and critical to
examine. In fact, in early 2020, the NAEd produced a volume, The Comparability of Large-
Scale Educational Assessments: Issues and Recommendations, addressing how to ensure (or
improve) comparability to better interpret test results. Given the disruptions to society
and the educational system since March 2020, making valid interpretations17 from 2020–
2021 summative assessment data will be even more difficult. It will be crucial to provide
as much contextual information as possible when interpreting such data. In addition to the
“normal” comparability concerns outlined in the 2020 NAEd volume, there are other high-
level considerations that should be addressed in reporting the results of the 2020–2021
assessments:
• Content of instruction. Validity and reliability of inferences from test scores hinge,
in most cases, on the extent to which the test is designed to align with standards,
curricula, and instruction. Given the pandemic, states, districts, schools, and
teachers were forced to prioritize the educational content taught to students. In both
the last and the current school year, some standards and curricula were modified, set aside, or
delayed to a later date or grade level. However, year-end summative assessments
in most cases were likely not similarly modified. It is critical for schools and
districts to determine what content and skills were actually taught and to provide
this contextualization to the test scores.
• Modes of instruction. How content was delivered to students varied not only by
states, districts, schools, and classrooms, but also within these contexts. It
also varied over time—some students might have started with remote learning,
attended school for a short period of time, and then returned to the remote
modality. Moreover, what “remote learning” meant varied significantly: for some
students it was delivered through paper learning packets, for others through computer-based videos,
and for still others through a mix of synchronous and asynchronous learning. Experts in digital
and online pedagogy are quick to emphasize the differences between “emergency
remote instruction” and high-quality virtual teaching and learning. Within these
diverse environments, instruction varied widely, and the “what” and “how” of this instruction need to be documented and considered when interpreting assessment results.
• Conditions and contexts of administration. While most states are planning for in-
person administration of assessments, some may permit remote examinations; as
described above, even these examinations will likely take different forms. The
conditions and contexts of administration are likely nonrandom and could affect
claims concerning comparability and other important components of assessment
interpretation.
• Participation rates. Not all students will take end-of-year assessments. While we
have yet to see a national opt-out movement, some research already reports that a
majority of parents support cancelling the 2020–2021 end-of-year summative
assessments.18 It is likely that some parents, caregivers, and students will choose
not to have their children (or themselves) return to campus—if they are in remote
learning—simply to enable testing. Additionally, if those opting out are
nonrandomly distributed and include a larger percentage of historically
marginalized or disadvantaged students and others who are relatively less engaged
in schooling because of the pandemic, the interpretation of assessment results will
become more challenging, with test results not supporting valid or reliable
inferences about performance (see the illustrative sketch after this list). On a final note, ESSA requires that 95% of all
students and 95% of all student subgroups participate in the end-of-year state
assessments. With likely lower participation rates, this factor is another
comparability and interpretive dimension that must be contextualized (i.e., who
did and did not get tested, and why?).
• Social and emotional well-being. While assessment conditions for some students
can be stressful and anxiety-producing in “normal” times, the pandemic is likely to
make things even worse. Some students will have concerns about health risks and
physical safety in school buildings. Moreover, unusual testing conditions, such as
mask wearing, distancing, and physical barriers to prevent the spread of the virus, can
increase stress, which further compromises the validity and reliability of
assessment data. Students’ social and emotional well-being prior to and during
testing will likely be reflected in their scores but is difficult to account for
accurately. Again, interpretations of assessment results must be sensitive to
contextual determinants of student well-being.
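The selection problem noted in the participation-rates bullet above is, at bottom, statistical. The short sketch below is purely illustrative (the group labels, score scale, and opt-out rates are hypothetical assumptions, not data from the roundtable or the report); it shows how nonrandom opt-out can raise an observed average even though no student’s underlying performance changed, whereas random opt-out of the same overall size leaves the average roughly intact.

```python
# Illustrative sketch only: all numbers are hypothetical assumptions.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Two hypothetical groups of "true" scale scores; group B is assumed to have
# been more affected by pandemic disruptions than group A.
group_a = rng.normal(260, 30, size=n // 2)
group_b = rng.normal(240, 30, size=n // 2)
scores = np.concatenate([group_a, group_b])
true_mean = scores.mean()

# Scenario 1: a random 25% of students opt out of testing.
random_mask = rng.random(n) > 0.25
random_mean = scores[random_mask].mean()

# Scenario 2: the same overall opt-out rate, but concentrated in group B
# (40% opt out) versus group A (10% opt out).
opt_out_prob = np.concatenate([np.full(n // 2, 0.10), np.full(n // 2, 0.40)])
nonrandom_mask = rng.random(n) > opt_out_prob
nonrandom_mean = scores[nonrandom_mask].mean()

print(f"true mean:              {true_mean:.1f}")
print(f"random opt-out mean:    {random_mean:.1f}")    # close to the true mean
print(f"nonrandom opt-out mean: {nonrandom_mean:.1f}")  # biased upward
```

Under these assumed rates, the nonrandom scenario overstates the population average by roughly two scale-score points, which is why the report stresses contextualizing who did and did not get tested, and why.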
Use Cases
Above we address the caveats needed to make any interpretations from the 2020–2021
summative assessments. Here, we address specific “use” cases of the 2020–2021 end-of-
year summative assessments. Assessments are designed and validated for specific uses.
ESSA-mandated tests are meant to be designed to measure student achievement, and these
data are then used for ESSA-mandated accountability.21 ESSA also requires, for
accountability purposes, that data be disaggregated to the subgroup level. In addition to
federal requirements,22 states also have various other mandates surrounding the uses of
educational assessments, including for grade promotion, teacher evaluations, high school
graduation, certain student grading, and ranking or rating schools.
• Educational inequities. The No Child Left Behind Act (NCLB) was the first federal law to require states to
report and account for subgroup assessment results. As a result of this NCLB
mandate, inequities among subgroups at the school building level were identified
and highlighted for the first time. ESSA, NCLB’s successor, requires, as did NCLB,
that school-level accountability data be reported by the following subgroups:
economically disadvantaged students; racial/ethnic groups; students with
disabilities as defined by the Individuals with Disabilities Education Act (IDEA); and English learners.
• Opportunity to learn. As noted above, for valid and reliable interpretations and
uses of assessments, the assessments must be appropriately contextualized. For
instance, it must be understood how much of the curriculum was covered, how the
material was imparted (in person, remotely, synchronously, asynchronously), and the
composition of students’ learning environments. The instructional delivery mode
needs to be parsed further; for example, synchronous online learning likely varies
widely, and these differences should be examined to the extent possible. It is
important to understand whether students were engaging in remote learning and at the
same time caring for younger siblings, suffering from food insecurity, struggling
socially and emotionally, sharing limited technology devices, struggling to access
Wi-Fi, or learning in abusive environments. If the 2020–2021 exams were to be
administered, this contextual information would be necessary for interpretation.26
Moreover, quite apart from contextualizing the summative examinations, gathering
opportunity-to-learn (OTL) data is critical to highlight educational inequities and to allocate resources and
supports accordingly.27
As we continue to define what to measure, we must ensure that assessments reliably and
validly measure the “what.” Below are goals for assessments that emerged during the
roundtable conversation to consider as we think about assessments and their uses to help
improve teaching and learning by informing decisions about students; teachers, curricula,
programs, and schools; funding; and policy.
Communicate Clearly (and Often) the Intended Purposes and Uses of Particular
Assessments as Well as Any Relevant Context. People want tests to provide simple
answers to complex questions. Instead, we must continue to emphasize the intended
interpretations and uses of each particular assessment. We also must contextualize
assessment results, emphasize measurement error and uncertainty, and warn against
unwarranted causal attributions. For example, state leaders should meet with staff from
media outlets well before test results are produced to provide them with a framework for
interpreting the results.
Expand Assessment Literacy. Assessments are only useful if those who could benefit
from the information can access, interpret, and use the information to improve teaching
and learning. Recognizing that appropriately educating all who interpret and make use of
educational testing data is no small task, we offer a few suggestions. First, we need to
ensure that the right people quickly gain access to and use testing data. Second, we need
to ensure that teachers, administrators, parents and caregivers, and students are educated
in how to interpret and use assessments to further teaching and learning and create
equitable educational opportunities. For teachers, this may result in professional
development and in-service opportunities. As in all aspects of education, parents and
caregivers need to be seen as integral partners in using assessments to further learning.
Finally, it is critical that policy makers and media outlets are provided with a framework
and context to understand, interpret, and report results.
Examine the Equity Concerns Inherent in Other Assessments. While the NAEd
roundtable focused on formative and summative assessments, there are additional
assessments, particularly those used for diagnostic and classification purposes, that are
rife with equity concerns. Assessments are used to diagnose (e.g., disabilities), classify
(e.g., English learners), place and assign (e.g., gifted and talented, advanced placement),
promote and demote, and certify and graduate students. Prior to the COVID-19 pandemic,
some such assessments raised validity and equity concerns. We anticipate that, because of
the pandemic, these assessments may compound inequities: students who have
historically benefited from such assessments are more likely to benefit now, and those
historically harmed are likely to be harmed further. Thus, these assessments need to be
closely examined and refined to ensure that they are valid measures of their intended
purposes and do not instead further exacerbate educational inequities.
Encourage Innovation and Flexibility. Ideally, systems of assessment would serve the
improvement of institutions, the improvement of teaching and learning, the improvement
of teachers, and the improvement of students. To accomplish this, we need to encourage
appropriate, mindful, and documented flexibility and innovation to see what works and
when, and then the results can be used to encourage some degree of uniformity at a more
macro level. At the federal level, through the reauthorization of ESSA, the federal
government should encourage innovation in assessment and accountability, including
approaches that look beyond testing per se. ESSA should provide waivers and fund pilots
for states to experiment appropriately with assessment techniques and provide feedback on what works.
Address Ongoing COVID-19 Implications. The implications of COVID-19 will be felt for years,
and we must continue to measure their effects on both academic and
social and emotional learning and provide supports to address them. Moreover, we will
have a generation of children affected by the COVID-19 pandemic who will lack
benchmark assessments, have inconsistent measures, or, for the reasons stated
above, have summative assessment results shaped by OTL and other contextual
variables. We must remain vigilant in monitoring and addressing the COVID-19 legacy, particularly
for our historically disadvantaged children.
CONCLUSION
Assessments, if used properly, can help us to mitigate the impacts of the COVID-19
pandemic for years to come. If used improperly, assessments may waste precious
instructional time and resources, worsen inequities, reinforce misperceptions as to
sources of inequity, and impede sound education policy. While most people agree that
critical data are needed to measure academic knowledge, the “what” and “how” remain
contested. Thus, we encourage further discussions among educators, researchers, policy
makers, and the general public to work toward making sure educational assessments are
part of a system to further teaching and learning and to further the pursuit of equity.
AERA (American Educational Research Association), APA (American Psychological Association), &
NCME (National Council on Measurement in Education). (2014). Standards for educational and
psychological testing. Washington, DC: AERA.
Alexander, L., James, H. T., & Glaser, R. (1987). The nation's report card: Improving the assessment of
student achievement. Washington, DC: National Academy of Education.
Baghian, J. (2021). Assessment data can help us build back better. Education Next Forum.
Berman, A. I., Feuer, M. J., & Pellegrino, J. W. (2019). What use is educational assessment?
The ANNALS of the American Academy of Political and Social Science, 683(1), 8–20. https://
doi.org/10.1177/0002716219843871.
Berman, A. I., Haertel, E. H., & Pellegrino, J. W. (Eds.). (2020). Comparability of large-scale educational
assessments: Issues and recommendations. Washington, DC: National Academy of Education. https://
doi.org/10.31094/2020/1.
Black, P., & Wiliam, D. (2010). Inside the black box: Raising standards through classroom
assessment. Phi Delta Kappan, 92(1), 81–90. https://doi.org/10.1177/003172171009200119.
Boyer, M., Dadey, N., & Keng, L. (2020, September). Statewide summative assessment in
spring 2021: A workbook to support planning and decision-making. Dover, NH: National Center for the
Improvement of Educational Assessment.
Connecticut State Department of Education. (2020, June 29). Sensible assessment practices for 2020-21
and beyond.
Council of Chief State School Officers. (2020). Restart & recovery: Assessments in spring 2021.
DePascale, C., & Gong, B. (2020). Comparability of individual students’ scores on the “same test”. In A. I.
Berman, E. H. Haertel, & J. W. Pellegrino (Eds.), Comparability of large-scale educational assessments:
Issues and recommendations (pp. 25–48). Washington, DC: National Academy of Education. https://
doi.org/10.31094/2020/1.
DeVos, B. (letter, September 3, 2020). Key policy letters signed by the Education Secretary or Deputy
Secretary.
Dorn, E., Hancock, B., Sarakatsannis, J., & Viruleg, E. (2020, June). COVID-19 and student
learning in the United States: The hurt could last a lifetime. McKinsey & Company.
Glaser, R., Linn, R., & Bohrnstedt, G. (1997). Assessment in transition: Monitoring the nation’s educational
progress. Washington, DC: National Academy of Education.
Gordon, E. W. (1995). Toward an equitable system of assessment. The Journal of Negro Education, 64(3),
360–372.
Haertel, E., & Ho, A. (2016). Fairness using derived scores. In N. J. Dorans & L. L. Cook (Eds.), Fairness in
educational assessment and measurement (1st ed.). New York: Routledge. https://
doi.org/10.4324/9781315774527.
Kane, M. T. (2016). Explicating validity. Assessment in Education: Principles, Policy & Practice, 23(2), 198–
211. https://doi.org/10.1080/0969594X.2015.1060192.
Keng, L., Boyer, M., & Marion, S. F. (2020). Into the unknown: Assessment considerations for spring
2021. Educational Measurement: Issues and Practice, 39(3), 53–59. http://dx.doi.org/10.1111/
emip.12362.
Kuhfeld, M., Soland, J., Tarasawa, B., Johnson, A., Ruzek, E., & Liu, J. (2020, December 3). How is COVID-19
affecting student learning? Initial findings from fall 2020. Brookings, Brown Center Chalkboard.
Kuhfeld, M., Soland, J., Tarasawa, B., Johnson, A., Ruzek, E., & Liu, J. (2020). Projecting the potential
impact of COVID-19 school closures on academic achievement. Educational Researcher, 49(8), 549–565.
https://doi.org/10.3102/0013189X20965918.
Marion, S. (2020, October). Using opportunity-to-learn data to support educational equity. Dover, NH:
National Center for the Improvement of Educational Assessment.
Marion, S., Gong, B., Lorie, W., & Kockler, R. (2020, July). Restart & recovery: Assessment consideration
for fall 2020. Council of Chief State School Officers.
Marion, S. F., Gonzales, D., Wiener, R., & Peltzman, A. (2020). This is not a test, this is an emergency:
Special considerations for assessing and advancing equity in school-year 2020–21. National Center for the
Improvement of Educational Assessment (www.nciea.org) and The Aspen Institute
(www.aspeninstitute.org/education).
Marion, S., & Shepard, L. (2021). Focus on instruction and intervention, not testing, in 2021. Education
Next Forum.
Moss, P. A., Pullin, D. C., Gee, J., Haertel, E. H., & Young, L. J. (2008). Assessment, equity, and opportunity to
learn. Cambridge, UK: Cambridge University Press.
National Academy of Education. (2009). Education policy white paper on standards, assessments, and
accountability. L. Shepard, J. Hannaway, & E. Baker (Eds.). Washington, DC: Author.
National Academies of Sciences, Engineering, and Medicine. (2019). Monitoring educational equity.
Washington, DC: The National Academies Press. https://doi.org/10.17226/25389.
National Education Association. (2003). Balanced assessment: The key to accountability and improved
student learning. Washington, DC: Author.
National Research Council. (2011). Incentives and test-based accountability in education. Washington,
DC: The National Academies Press. https://doi.org/10.17226/12521.
National Research Council. (2001). Knowing what students know: The science and design of educational
assessment. Committee on the Foundations of Assessment. J. Pellegrino, N. Chudowsky, & R. Glaser
(Eds.). Board on Testing and Assessment, Center for Education. Division of Behavioral and Social
Sciences and Education. Washington, DC: National Academy Press.
National Research Council & National Academy of Education. (2010). Getting value out of value-added:
Report of a workshop. Committee on Value-Added Methodology for Instructional Improvement, Program
Evaluation, and Educational Accountability. H. Braun, N. Chudowsky, & J. Koenig (Eds.). Center for
Education, Division of Behavioral and Social Sciences and Education. Washington, DC: The National
Academies Press.
Shepard, L. A. (2020, December 16). Testing students this spring would be a mistake. Education Week.
Shepard, L. A. (2000). The role of assessment in a learning culture. Educational Researcher, 29(7), 4–14.
Silver, D., & Polikoff, M. (2020, November 16). Getting testy about testing—K–12 parents support
canceling standardized testing this spring. That might not be a good idea. The 74.
Singer, J. D., Braun, H. I., & Chudowsky, N. (Eds.). (2018). International education assessments: Cautions,
conundrums, and common sense. Washington, DC: National Academy of Education.
Soland, J., Kuhfeld, M., Tarasawa, B., Johnson, A., Ruzek, E., & Liu, J. (2020, May 27). The impact of COVID-
19 on student achievement and what it may mean for education. Brookings, Brown Center Chalkboard.
U.S. Congress, Office of Technology Assessment (1992). Testing in American schools: Asking the right
questions. Washington, DC: U.S. Government Printing Office.
Kent McGuire
Program Director of Education
William and Flora Hewlett Foundation
NAEd Staff
Amy I. Berman
Deputy Director
Dian Dong
Senior Program Officer
1. This excerpt comes from a “Dear Friends and Colleagues” letter that Dr. Teresa Thayer Snyder posted to Facebook on December 6, 2020. It was widely shared, including in full on Diane Ravitch’s blog on December 12, 2020.

2. The Editorial Board (2021, January 2). The wreckage Betsy DeVos leaves behind: The Education Department lies in ruins right when it’s needed most. The New York Times.

3. See National Research Council & National Academy of Education. (2010). Getting value out of value-added: Report of a workshop. Committee on Value-Added Methodology for Instructional Improvement, Program Evaluation, and Educational Accountability. H. Braun, N. Chudowsky, & J. Koenig (Eds.). Center for Education, Division of Behavioral and Social Sciences and Education. Washington, DC: The National Academies Press; National Academy of Education. (2009). Education policy white paper on standards, assessments, and accountability. L. Shepard, J. Hannaway, & E. Baker (Eds.). Washington, DC: Author; Glaser, R., Linn, R., & Bohrnstedt, G. (1997). Assessment in transition: Monitoring the nation’s educational progress. Washington, DC: National Academy of Education; Alexander, L., James, H. T., & Glaser, R. (1987). The nation's report card: Improving the assessment of student achievement. Washington, DC: National Academy of Education.

4. See Berman, A. I., Haertel, E. H., & Pellegrino, J. W. (Eds.). (2020). Comparability of large-scale educational assessments: Issues and recommendations. Washington, DC: National Academy of Education. https://doi.org/10.31094/2020/1; Berman, A. I., Feuer, M. J., & Pellegrino, J. W. (2019). What use is educational assessment? The ANNALS of the American Academy of Political and Social Science, 683(1), 8–20. https://doi.org/10.1177/0002716219843871; Singer, J. D., Braun, H. I., & Chudowsky, N. (Eds.). (2018). International education assessments: Cautions, conundrums, and common sense. Washington, DC: National Academy of Education.

5. The words assessment and test are used throughout this report, and though to some extent they are interchangeable, they do have different meanings. Assessment is more general, conveying the idea of a process providing evidence of quality. Assessment covers a broad range of procedures to measure teaching and learning. A test is one product that measures a particular set of objectives or behavior. See Berman, A. I., Haertel, E. H., & Pellegrino, J. W. (2020). Introduction: Framing the issues. In A. I. Berman, E. H. Haertel, & J. W. Pellegrino (Eds.), Comparability of large-scale educational assessments: Issues and recommendations (pp. 9–24). Washington, DC: National Academy of Education. https://doi.org/10.31094/2020/1.

6. While not all uses of testing are justifiable, it is important to recognize them so that they can be addressed when attempting to expand assessment literacy. (For more information see “Beyond 2020–2021 Assessments” below.)

7. Berman, A. I., Haertel, E. H., & Pellegrino, J. W. (Eds.). (2020). Comparability of large-scale educational assessments: Issues and recommendations. Washington, DC: National Academy of Education. https://doi.org/10.31094/2020/1; Berman, A. I., Feuer, M. J., & Pellegrino, J. W. (2019). What use is educational assessment? The ANNALS of the American Academy of Political and Social Science, 683(1), 8–20. https://doi.org/10.1177/0002716219843871; Connecticut State Department of Education. (2020, June 29). Sensible assessment practices for 2020–21 and beyond; National Research Council. (2001). Knowing what students know: The science and design of educational assessment. Committee on the Foundations of Assessment. J. Pellegrino, N. Chudowsky, & R. Glaser (Eds.). Board on Testing and Assessment, Center for Education. Division of Behavioral and Social Sciences and Education. Washington, DC: National Academy Press; U.S. Congress, Office of Technology Assessment. (1992). Testing in American schools: Asking the right questions. Washington, DC: U.S. Government Printing Office.

8. Berman, A. I., Haertel, E. H., & Pellegrino, J. W. (Eds.). (2020). Comparability of large-scale educational assessments: Issues and recommendations. Washington, DC: National Academy of Education. https://doi.org/10.31094/2020/1; AERA (American Educational Research Association), APA (American Psychological Association), & NCME (National Council on Measurement in Education). (2014). Standards for educational and psychological testing. Washington, DC: AERA.

9. Haertel, E., & Ho, A. (2016). Fairness using derived scores. In N. J. Dorans & L. L. Cook (Eds.), Fairness in educational assessment and measurement (1st ed.). New York: Routledge. https://doi.org/10.4324/9781315774527.

10. There are of course additional contexts where assessments are used. For example, the federal government administers the National Assessment of Educational Progress (NAEP), which is given to a representative sample of students across the country to garner national, state, and some urban district measures of what students know across various subject areas. There also are various international assessments in which the United States participates.

11. For a discussion of the different meanings of inequity and inequality, see, e.g., National Academies of Sciences, Engineering, and Medicine. (2019). Monitoring educational equity. Washington, DC: The National Academies Press. https://doi.org/10.17226/25389.

12. Dorn, E., Hancock, B., Sarakatsannis, J., & Viruleg, E. (2020, June). COVID-19 and student learning in the United States: The hurt could last a lifetime. McKinsey & Company.

13. DeVos, B. (letter, September 3, 2020). Key policy letters signed by the Education Secretary or Deputy Secretary.

14. In December 2020, Education Week published an opinion piece in which Lorrie Shepard presented many of the concerns with administering and using 2020–2021 end-of-year summative assessments. See Shepard, L. A. (2020, December 16). Testing students this spring would be a mistake. Education Week. Shepard also responded to numerous civil rights advocacy groups urging 2020–2021 summative exams. See Civil Rights Organizations. (letter, November 20, 2020). Letter to Deputy Assistant Secretary Ryder, U.S. Department of Education. See also Baghian, J. (2021). Assessment data can help us build back better. Education Next Forum; Marion, S., & Shepard, L. (2021). Focus on instruction and intervention, not testing, in 2021. Education Next Forum.

15. Boyer, M., Dadey, N., & Keng, L. (2020, September). Statewide summative assessment in spring 2021: A workbook to support planning and decision-making. Dover, NH: National Center for the Improvement of Educational Assessment; Council of Chief State School Officers. (2020). Restart & recovery: Assessments in spring 2021; Keng, L., Boyer, M., & Marion, S. F. (2020). Into the unknown: Assessment considerations for spring 2021. Educational Measurement: Issues and Practice, 39(3), 53–59. http://dx.doi.org/10.1111/emip.12362.

16. Bennett, R. (2020). Interpreting test-score comparisons. In A. I. Berman, E. H. Haertel, & J. W. Pellegrino (Eds.), Comparability of large-scale educational assessments: Issues and recommendations (pp. 227–235). Washington, DC: National Academy of Education. https://doi.org/10.31094/2020/1.

17. “The validation of a score interpretation involves an investigation of whether the scores mean what they are supposed to mean, and the interpretation is said to be valid if claims inherent in the interpretation are supported by appropriate evidence.” Kane, M. T. (2016). Explicating validity. Assessment in Education: Principles, Policy & Practice, 23(2), 198–211. https://doi.org/10.1080/0969594X.2015.1060192.

18. Silver, D., & Polikoff, M. (2020, November 16). Getting testy about testing—K–12 parents support canceling standardized testing this spring. That might not be a good idea. The 74.

19. DePascale, C., & Gong, B. (2020). Comparability of individual students’ scores on the “same test”. In A. I. Berman, E. H. Haertel, & J. W. Pellegrino (Eds.), Comparability of large-scale educational assessments: Issues and recommendations (pp. 25–48). Washington, DC: National Academy of Education (citing Moss, P. A., Pullin, D. C., Gee, J., Haertel, E. H., & Young, L. J. (2008). Assessment, equity, and opportunity to learn. Cambridge, UK: Cambridge University Press.). https://doi.org/10.31094/2020/1.

20. Id.

21. Bennett, R. (2020). Interpreting test-score comparisons. In A. I. Berman, E. H. Haertel, & J. W. Pellegrino (Eds.), Comparability of large-scale educational assessments: Issues and recommendations (pp. 227–235). Washington, DC: National Academy of Education. https://doi.org/10.31094/2020/1; Berman, A. I., Feuer, M. J., & Pellegrino, J. W. (2019). What use is educational assessment? The ANNALS of the American Academy of Political and Social Science, 683(1), 8–20. https://doi.org/10.1177/0002716219843871.

22. For many reasons identified in this section, including concerns about representativeness, comparability, reliability, and validity, the National Center for Education Statistics (NCES) and the National Assessment Governing Board (NAGB) postponed the 2021 administration of the National Assessment of Educational Progress (NAEP) in reading and mathematics. “At the Governing Board’s Nov. 19-20 meeting, NCES presented compelling data, which convinced Board members that COVID-19 related conditions prevent NCES from administering NAEP safely to a sufficient and representative sample, and reporting results in a valid and reliable manner consistent with NCES’ statistical standards and the NAEP Authorization Act. Thus, the Governing Board believes a 2022 administration of NAEP reading and mathematics at grades 4 and 8 would be more likely to provide valuable—and valid—data about student achievement in the wake of COVID-19 to support effective policy, research, and resource allocation” (NAGB. (2020, November 25). Governing Board statement on postponement of NAEP 2021.).

23. Shepard, L. A. (2020, December 16). Testing students this spring would be a mistake. Education Week; Marion, S. F., Gonzales, D., Wiener, R., & Peltzman, A. (2020). This is not a test, this is an emergency: Special considerations for assessing and advancing equity in school-year 2020–21. National Center for the Improvement of Educational Assessment (www.nciea.org) and The Aspen Institute (www.aspeninstitute.org/education).

24. Kuhfeld, M., Soland, J., Tarasawa, B., Johnson, A., Ruzek, E., & Liu, J. (2020). Projecting the potential impact of COVID-19 school closures on academic achievement. Educational Researcher, 49(8), 549–565. https://doi.org/10.3102/0013189X20965918.

25. Additionally, there are significant logistical and operational challenges to administering end-of-year assessments, including staffing needs and concerns, distancing requirements, protective equipment requirements, in-school device availability (e.g., due to providing in-home devices), safe handling of materials (e.g., papers, pencils), security, and remote proctoring requirements and costs.

26. We recognize that OTL data can be difficult to collect and coherently and uniformly report, and that the COVID-19 pandemic has made such collection more difficult.

27. Marion, S. (2020, October). Using opportunity-to-learn data to support educational equity. Dover, NH: National Center for the Improvement of Educational Assessment.

28. National Research Council. (2001). Knowing what students know: The science and design of educational assessment. Committee on the Foundations of Assessment. J. Pellegrino, N. Chudowsky, & R. Glaser (Eds.). Board on Testing and Assessment, Center for Education. Division of Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.

29. See, e.g., New Mexico Public Education Department (2020) documents “Using Multiple Measures and Formative Practice to Identify Learning Needs,” “Reentry Guidance,” and “Instructional Acceleration.”

30. Mislevy, R. J. (2019). Advances in measurement and cognition. In A. I. Berman, M. J. Feuer, & J. W. Pellegrino (Eds.), What use is educational assessment? The ANNALS of the American Academy of Political and Social Science, 683(1), 164–182. https://doi.org/10.1177/0002716219843816.

31. National Research Council. (2011). Incentives and test-based accountability in education. Washington, DC: The National Academies Press. https://doi.org/10.17226/12521.

32. Marion, S. F., Gonzales, D., Wiener, R., & Peltzman, A. (2020). This is not a test, this is an emergency: Special considerations for assessing and advancing equity in school-year 2020–21. National Center for the Improvement of Educational Assessment (www.nciea.org) and The Aspen Institute (www.aspeninstitute.org/education); National Academies of Sciences, Engineering, and Medicine. (2019). Monitoring educational equity. Washington, DC: The National Academies Press. https://doi.org/10.17226/25389.
Suggested Citation:
National Academy of Education. (2021). Educational assessments in the COVID-19 era and beyond.
Washington, DC: Author.
For inquiries, contact Amy Berman, Deputy Director ([email protected]), or Dian Dong, Senior
Program Officer ([email protected]).
This project was supported by a grant from the Spencer Foundation. Any opinions, findings, conclusions, or
recommendations expressed in this publication are those of the National Academy of Education and do not
necessarily reflect the views of the Spencer Foundation.