Scaffolding Feedback To Maximize Long-Term Error Correction: Memory & Cognition 2010, 38 (7), 951-961
Janet Metcalfe
Columbia University, New York, New York
Scaffolded feedback was tested against three other feedback presentation methods (standard corrective feed-
back, minimal feedback, and answer-until-correct multiple-choice feedback) over both short- and long-term
retention intervals in order to assess which method would produce the most robust gains in error correction.
Scaffolded feedback was a method designed to take advantage of the benefits of retrieval practice by providing
incremental hints until the correct answer could be self-generated. In Experiments 1 and 3, on an immediate
test, final memory for the correct answer was lowest for questions given minimal feedback, moderate for the
answer-until-correct condition, and equally high in the scaffolded feedback condition and the standard feedback
condition. However, tests of the maintenance of the corrections over a 30-min delay (Experiment 2) and over a
1-day delay (Experiment 3) demonstrated that scaffolded feedback gave rise to the best memory for the correct
answers at a delay.
Memory errors are common. People fail to retrieve information that they have learned, and they retrieve flawed or even false information, often judging that it is correct with great confidence (Butterfield & Metcalfe, 2001, 2006; Roediger & McDermott, 1995, 2000). What, then, is the most effective way to correct memory errors, and what method results in the stability of those corrections over time? One approach that has proven successful has been to present corrective feedback following an error. However, the resilience of the feedback over time and the type of feedback that is most effective in correcting performance have not been extensively tested. Our purpose in the present study was to investigate the durability and effectiveness of error correction following corrective feedback and to contrast a variety of formats of providing feedback to surmise whether there is a method that is more effective than simply presenting the correct answer. Scaffolded feedback, in which incremental hints were given until the correct answer could be self-generated, was contrasted with other methods, over both short- and long-term retention intervals, to assess which method would produce the most effective, long-lasting gains in error correction.
Feedback has been shown to have considerable positive benefits for memory performance (R. C. Anderson, Kulhavy, & Andre, 1971; Butler & Roediger, 2007, 2008; Lhyle & Kulhavy, 1987; Pashler, Cepeda, Wixted, & Rohrer, 2005). However, the kind of feedback that is given matters. The first and most elementary finding concerning feedback is that it is usually not sufficient to simply tell the learner whether they were right or wrong (Bangert-Drowns, Kulik, Kulik, & Morgan, 1991; Moreno, 2004; Pashler et al., 2005). Pashler et al. (2005) showed that for feedback to be processed effectively, it is crucial that the correct answer be conveyed. Their results showed that feedback that only relayed a “correct” or “incorrect” message was ineffectual. Only after getting the correct answer as feedback did the participants show an increase in retention. In a similar set of experiments, Hancock, Stock, and Kulhavy (1992) found that participants spent more time processing feedback that relayed the correct answer than they did for feedback simply indicating whether they had been right or wrong. If time spent is a measure of effort, Hancock et al.’s findings suggest that people allocate more effort and processing resources to feedback when it contains the correct response. Together, these results indicate that feedback is more constructive when it relays the correct answer.
When the correct answer is made available after an error, people are able to integrate that information into memory, as is illustrated by an increased probability of answering correctly on a follow-up test (R. C. Anderson et al., 1971). Corrective feedback appears to work well for errors of commission, errors of omission (Metcalfe & Kornell, 2007; Pashler et al., 2005), and high-confidence errors (Butterfield & Metcalfe, 2001, 2006). Feedback can also strengthen correct answers that were given with low confidence (Butler, Karpicke, & Roediger, 2008).
B. Finn, [email protected]
All corrective feedback, however, is not created equal. A few studies have shown that there are differential benefits depending on how the corrective feedback is presented (Butler, Karpicke, & Roediger, 2007; Lhyle & Kulhavy, 1987; Pashler et al., 2005). These studies draw on depth-of-processing research (Craik & Lockhart, 1972; Craik & Tulving, 1975) showing that memory benefits accompany more active, elaborate processing. The rationale behind these studies was that if there is more elaborative processing of the answer during the presentation of the feedback, a stronger representation of the correct answer should result. Lhyle and Kulhavy wrote, “If feedback functions primarily to correct errors, then it follows that any design characteristic that leads students to process, study, or apprehend the feedback more closely should increase the amount of correction that takes place and ultimately improve criterion performance” (p. 320). In their study, participants read a text, answered a multiple-choice question about the content, and were then given feedback about the correct answer. When the feedback was scrambled and the participants were required to unscramble it, more errors were corrected on the subsequent test than in a condition in which the feedback was presented intact. This outcome occurred in only one of two of the experiments reported in the study, however. According to Lhyle and Kulhavy, rearranging the feedback produced better performance because it was effortful, took more time, and made use of semantic processing, all characteristics of deep, elaborate processing.
Another possibility might be to try to promote retrieval practice (Bjork, 1975) or the use of self-generation of the answer (Jacoby, 1978; Slamecka & Graf, 1978) at the time of feedback, rather than to passively present the answer. Retrieval practice is thought by many to be at the core of the testing effect, the finding that taking a test has benefits for memory retention that go beyond the gains obtained from mere presentation of the material (for a review, see Roediger & Karpicke, 2006). The testing effect is closely related to the generation effect (Jacoby, 1978; Slamecka & Graf, 1978), the parallel finding that self-generating or retrieving a response leads to better retention and recognition performance for the generated item than does a presentation of the same item (see Carrier & Pashler, 1992, for a discussion of how the generation and testing effects can differ). A recent meta-analysis of over 86 generation-effect studies, in which over 17,000 participants were tested, demonstrated that the benefits of generation to memory are robust and consistent (Bertsch, Pesta, Wiscott, & McDaniel, 2007).
How do the benefits of self-generation arise? One possibility is that when items are generated by the participants, the participants are simply given an additional learning opportunity (Thompson, Wenger, & Bartling, 1978), the so-called amount-of-processing hypothesis (Dempster, 1996, 1997; Roediger & Karpicke, 2006). However, much evidence has shown that an additional study presentation does not enhance retention as much as generating, even when processing time is matched (Allen, Mahler, & Estes, 1969; Carrier & Pashler, 1992; Hogan & Kintsch, 1971; Tulving, 1967; Wenger, Thompson, & Bartling, 1980). A more widely accepted proposal is that the process of retrieval itself aids memory. In contrast to reading a response, retrieval of a response from memory requires more effort and may engender deeper processing and more elaborate or variable encoding, and may strengthen or increase the number of semantic cues or routes available for retrieval of the item from memory (Bjork, 1975; Craik & Lockhart, 1972; Craik & Tulving, 1975; Jacoby, 1978; McDaniel & Masson, 1985; Melton, 1967; Whitten & Bjork, 1977).
So, we might expect that getting the person to generate the correct response after they have made an error would be an effective method of presenting feedback. There are two problems with this method. The first is that having just generated the wrong answer, people are unlikely to be able to generate anything, and generating nothing will not help later recall. The second is that if they do generate something, it is likely to be wrong, which may, in turn, result in enhanced memory for the wrong answer. If it were possible to circumvent these two problems, self-generated or active feedback might be more effective than standard feedback, in which the person is simply given the answer.
Accordingly, Butler et al. (2007) explored the possibility that feedback involving active selection of the correct answer might enhance learning. They tested an answer-until-correct feedback format. In this paradigm, originally developed by Pressey (1926), participants answered multiple-choice questions by selecting from the response options until they chose the correct option. An “incorrect” message followed incorrect responses. Final retention was tested 1 day later. Butler et al. (2007) proposed that in comparison to a passive presentation of the feedback, the answer-until-correct feedback format would serve as a kind of self-generation of the response through self-selection and would thereby enhance learning. The participants would also have the benefit of knowing the correct answer by the end of each question trial. However, in contrast to their predictions, Butler et al.’s (2007) results showed no advantage for items for which the answer-until-correct feedback was given over those items for which the standard correct-answer presentation was given.
There are several reasons that the answer-until-correct procedure may not have been the most favorable format to showcase the benefits of generation-enhanced feedback. First, selection of the correct answer may not require the generation of a response from memory. Selection could be based on familiarity and may not engage the deep memory-enhancing retrieval processes that accompany recollection (Jacoby, 1991; Yonelinas, 2002). Second, multiple-choice tests expose people to incorrect information at the same time as the correct information is presented, similarly to the classic A–B A–C interference paradigms (Barnes & Underwood, 1959) and misinformation effect paradigms (Loftus & Palmer, 1974). Interference from the incorrect responses may compromise memory for the correct answer. Though overall a testing benefit has been shown with multiple-choice tests, the selection of the response lures, or indeed, even the mere exposure of the lures before the correct answer is ultimately chosen, can interfere with memory for the final, correct item (Butler, Marsh, Goode, & Roediger, 2006; Huelser & Marsh, 2006; Marsh, Roediger, Bjork, & Bjork, 2007; Roediger & Marsh, 2005;
Schooler, Foster, & Loftus, 1988). Roediger and Marsh, for example, showed that multiple-choice lures are frequently offered as answers on a follow-up cued-recall test if the lures were chosen during the initial test. These findings suggest that the most advantageous format for presenting feedback may not be multiple choice.
Our interest lay in exploiting the fact that self-generating an answer, in contrast to either reading the answer or selecting it from a set of alternatives, might provide a considerable boost to memory. We wanted to contrast a method for presenting feedback that would make use of the benefits of self-generation with other methods that required less elaborative processing. Accordingly, we tested a method that we call scaffolded feedback. Our intention was to use a method that required self-generation, while still ensuring that the correct answer would be produced.
With scaffolded feedback, participants made retrieval attempts that were guided by incremental hints. For example, a participant might be asked, “What was the crime committed by those in Dante’s lowest level of hell in ‘The Inferno’?” If they could not answer the question or if they provided the wrong response, they were given another opportunity. If the next response that they gave was incorrect, they were given the first letter of the answer (e.g., b) and another chance to answer. If they still could not answer, they were given the next letter (e.g., e), and so on, until they answered correctly or the whole answer (e.g., betrayal) had been revealed. Because the participants could use the hints to manage their own memory retrieval, we hoped to engage more active retrieval processes and better attendant memory than those that would be utilized either during an answer-until-correct procedure or during the standard correct-answer presentation.
The scaffolded feedback method borrows from the domain of educational psychology, in which the scaffolding approach has long been regarded as an effective support of learning. In general, scaffolding involves a process of helping students reach goals and solve problems that they could not work out independently but that with some assistance, they are often able to (Wood, Bruner, & Ross, 1976). Typically, the scaffolding process involves more than just presenting the correct solution to a problem. Instead, students may be given hints about the correct response or a new suggestion about how to think about the problem, with the ultimate goal of solving the problem correctly themselves.
Carpenter and DeLosh (2006) used a procedure similar to our scaffolding condition to explore the question of how intervening tests, as compared to repeated study of a list, benefited later free recall tests. Single words were given an initial study, followed by an intervening test or an intervening study trial and then an immediate final free recall test. In the test condition that was similar to our scaffolding condition, the participants were asked to recall the studied items (on the intervening test) and were prompted with one, two, three, or four of the item’s first letters. Final free recall was better with test rather than with study as the intervening task. Furthermore, fewer rather than more letter prompts resulted in better final recall. A procedure similar to the scaffolding that we used has also been used to enhance learning in special populations. In the studies of the method of vanishing cues, Glisky and Schacter (1989) and Glisky, Schacter, and Tulving (1986) had amnesic patients first learn new computer terms by seeing the definition (e.g., to store a program on a disk) with a fragment of the target (e.g., s____). Increasing letters were given until the patient was able to guess the correct term (e.g., save). On later trials, after they were able to generate correctly with all letter cues, the letters vanished one by one from the target item (so long as the patient maintained perfect performance) until the patient could retrieve the answer with the cue alone. These studies showed that amnesiac patients could learn and retain new information if the number of errors that they produced was minimized.
A possibly unfavorable consequence of scaffolding feedback, as we will use it, is the possibility of unsuccessful retrieval attempts, which could make the conditions of learning errorful, rather than errorless. There are advantages to errorless learning in individuals with memory impairments or learning disabilities (N. D. Anderson & Craik, 2006; Baddeley & Wilson, 1994; Hayman, Macdonald, & Tulving, 1993; Jones & Eayrs, 1992; Sidman & Stoddard, 1967; and see Clare & Jones, 2008, and Kessels & de Haan, 2003, for reviews). But it is not clear that such errorful learning conditions have a detrimental effect in healthy young participants (Kornell, Hays, & Bjork, 2009; Metcalfe & Kornell, 2007; Pashler, Zarow, & Triplett, 2003). Several recent studies indicate that as long as the participant ultimately receives feedback of the correct answer, unsuccessful attempts at retrieving may not harm memory (Kornell et al., 2009; Richland, Kornell, & Kao, 2009).
In the present set of experiments, we contrasted four methods of presenting feedback: (1) standard feedback, in which the correct answer was presented immediately following an error; (2) scaffolded feedback, in which participants were given increasing hints until they answered the question correctly; (3) answer-until-correct multiple-choice feedback; and (4) minimal feedback, in which participants knew the answer was wrong, and they were also given one additional chance to provide the correct answer. We measured error correction over short- and long-term test delays. We tested short-term recall performance with an immediate test in Experiment 1 and longer-term recall performance with a 30-min test delay in Experiment 2. In Experiment 3, we sought to extend and replicate our findings by comparing results from an immediate and a 1-day delayed test, in a within-participants design.
Experiment 1
Method
Participants. The participants were 24 undergraduates at Columbia University and Barnard College. They participated for course credit or cash.
Materials. The questions were 191 general information questions (e.g., “What is the name of the unit of measure that refers to a six-foot depth of water?”; answer, fathom). These items were composed of a subset of the published questions from Nelson and Narens (1980). A number of questions that were in the original pool were no longer relevant or correct and were eliminated from the pool. All correct answers were a single word.
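As an illustration only, the scaffolded feedback procedure described earlier (one further attempt with no hint, then one additional letter of the answer after each failed attempt) could be sketched in Python as follows. The function name `scaffolded_feedback`, the `respond` callback, and the decision to stop once the whole word has been shown are assumptions made for this sketch; they are not details of the software used in the experiments.

```python
from typing import Callable, Tuple


def scaffolded_feedback(answer: str, respond: Callable[[str], str]) -> Tuple[int, float]:
    """Run one hypothetical scaffolded-feedback trial after an initial error.

    `respond` is an assumed callback: it receives the letters revealed so far
    (an empty string on the first retry) and returns the typed response.
    Returns (letters_revealed, proportion_of_answer_revealed) at success.
    """
    target = answer.strip().lower()
    # Attempt 0 shows no letters; each failed attempt reveals one more letter.
    for revealed in range(len(target) + 1):
        hint = target[:revealed]
        attempt = respond(hint).strip().lower()
        if attempt == target:
            return revealed, revealed / len(target)
    # Assumption: the trial ends once the entire answer has been shown.
    return len(target), 1.0
```

With the Dante example, a participant who answers correctly after seeing "be" would be scored as needing two letters, that is, a proportion of .25 of the eight-letter answer "betrayal".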
954 Finn and Metcalfe
Procedure. The participants were tested individually on computers. The experiment had two test phases: an initial test and a surprise final recall test. During both test phases, the participants answered general information questions. During the final recall test, the participants only answered questions that they had answered incorrectly during the initial test. At the beginning of the experiment, the participants were instructed that they would answer general information questions and indicate their confidence in their answer. They were encouraged to guess if they did not know the answer. The participants were not told about the retest that was to follow their initial test and confidence ratings. In both test phases, a general information question was presented, and the participants typed in their response. There were no restrictions on the amount of time that they could take to answer each question.
During the initial test phase, the participants entered their response and were then asked to indicate their confidence in their response by using a horizontal slider that ranged from “very unsure” on the left end to “very sure” on the right end. The slider bar was set to the middle of the slider at the onset of each question. Confidence ratings were coded along a scale from 0 to 100, with 0 indicating a selection of the lowest limit of the slider, at the “very unsure” end, and 100 indicating a selection of the highest limit, at the “very sure” end.
When the participants’ answer was correct, a chime would sound, and the next general information question was presented. If their answer was incorrect, there was no chime, and one of four feedback conditions immediately occurred. The set of four feedback conditions were rerandomized after every four incorrect answers. The four conditions were as follows: standard feedback, scaffolded feedback, answer-until-correct multiple-choice feedback, and minimal feedback. The standard feedback was a presentation of the correct response immediately following the error. The participants could study the feedback for as long as they liked. In the scaffolded feedback condition, the participants were given an opportunity to provide another answer. If the new answer that they provided was not correct, the first letter of the correct answer was presented, and they were given another opportunity to enter the correct answer. This process continued, with one additional letter of the answer presented after each answer attempt, until the participants answered correctly. In the answer-until-correct multiple-choice feedback condition, an array of six options, including the correct answer, was presented, and the participants could choose a new response. The experiment program randomly selected the six options from a set of nine potential options. If the participants’ original error was included in the list of six options, that option was replaced with one of the remaining three options, ensuring that there were always six novel alternatives. Upon selection, if the item was incorrect, it turned red for 500 msec, and the participants were asked to try again. All six options remained on the screen. When the correct answer was selected, it turned green for 500 msec, and the experiment moved on. In the minimal feedback condition, after making an error, the participants were given one opportunity to provide another answer. A chime sounded if they answered correctly. They then moved on to the next question.
Immediately following each feedback response, the participants used a slider to specify whether they knew the answer all along. The slider ranged from “That’s new to me” on the left end to “I actually knew it all along” on the right end. These judgments were not of focal interest for the present investigation and will not be discussed further. After the participants made this judgment, the next general knowledge question was presented. This process continued until the participants had answered 36 questions incorrectly and received feedback in one of the four conditions. Then, for the final recall phase, those 36 incorrect questions were randomized by the computer, and each cue was presented for test.
Results
Basic data. The participants’ mean recall performance (proportion correct) was .28 (SE = .02) during the initial test. The initial confidence in the answers was .36 (SE = .03). Because the feedback condition was determined only after the incorrect response had been made, there was no possible effect of feedback condition on initial test performance, in this or in the experiments that follow. The participants’ confidence ratings were postdictive of their initial test performance. The mean gamma correlation between initial confidence ratings and initial recall performance was γ = .77 (SE = .03), which was significantly greater than 0 [t(23) = 26.76, p < .05].
Performance at feedback. Without corrective feedback, in the minimal feedback condition, error correction rarely occurred. When the participants were asked to supply a new answer after having been told that their answer was incorrect, but without any additional support, they were able to correct only a few of the errors of their own accord, resulting in a proportion of .09 (SE = .02) of errors corrected at feedback in this condition. This was, however, significantly greater than 0 [t(23) = 4.39, p < .01]. In the scaffolded condition, 82% (SE = 3) of the items were answered correctly before the entire answer had been revealed with the successive letter hints. To examine the average number of hints needed to answer each item correctly for each participant, we computed the average proportion of the answer revealed, since the answers were made up of different numbers of letters. On average, the participants needed a proportion of .56 (SE = .03) of the word revealed before they were able to answer correctly. In the answer-until-correct condition, a proportion of .88 (SE = .03) of the correct answers were selected before they were the only item not yet selected. On average, the participants selected 2.88 (SE = 0.14) incorrect items from the six alternatives presented before they picked the correct answer.
Figure 1. Mean final test performance on an immediate test as a function of type of feedback given to original errors in Experiment 1. [Bar graph; y-axis: mean final test performance, 0 to 1; x-axis: type of feedback (scaffolded feedback, standard feedback, answer until correct, minimal feedback).]
Final test performance. There was a significant effect of feedback condition on final test performance [F(3,69) = 111.39, MSe = 0.02, p < .05, ηp² = .83]. As is shown in Figure 1, final test performance was best in the scaffolded condition (M = .77, SE = .03) and in the standard feedback condition (M = .73, SE = .04), followed by
the answer-until-correct condition (M = .61, SE = .04). The worst performance was found in the minimal feedback condition (M = .08, SE = .02). All feedback conditions were significantly different from 0 (all ts > 1, all ps < .05). Post hoc pairwise comparisons (which in this and subsequent experiments were Bonferroni corrected) showed that final test performance results for the scaffolded and standard feedback conditions were not significantly different from one another (t < 1). Performance in the scaffolded condition was significantly better than that in the answer-until-correct condition [t(23) = 3.37, p < .05]. The performance difference between the standard feedback and the answer-until-correct conditions was marginally significant [t(23) = 2.71, p = .07]. Finally, performance in the minimal feedback condition was significantly worse than performance in each of the three other conditions (all ps < .05).
Final test performance as a function of the amount of the cue revealed in the scaffolded condition. With the following analyses, we explored the relationship between final retention and the amount of cue revealed. It is possible that performance differences favoring fewer letters could reflect the fact that those items that are recalled with the need for fewer cues during the scaffolding procedure were the easier items, and hence, differences in final recall might be due to that fact alone. (Note that in Carpenter & DeLosh’s [2006] third experiment, they varied the number of intervening test cues, rather than allowing them to be participant controlled, and still found an advantage in final free recall for items given fewer cues. An item-selection effect cannot account for this result.) On the other hand, in our experiment, items that required more letters to be revealed were almost certainly studied considerably longer (although we did not formally measure the time) than those that required only a few letters to be remembered correctly, since the letter cues were given one at a time. This study time factor would predict that memory should be better for items that required many cues. In Carpenter and DeLosh’s experiment, study time probably followed the opposite pattern, since the participants were asked to retrieve the correct answer when given one, two, three, or four cues, and the effort to retrieve almost certainly took longer with one than with four cues. Thus, time on task in their experiment (unlike in ours) would predict a final recall advantage for fewer cues, which is what they found.
Final test performance for items in which the whole word had to be revealed before it was answered correctly (M = .41, SE = .10) was lower than was final test performance for items that were correctly answered with only part of the word having been revealed (M = .86, SE = .03) [t(18) = 4.84, p < .01]. The degrees of freedom in this and subsequent analyses may differ from the total number of participants, because there were some who always answered before the whole word hint had been revealed. Having the whole word revealed in the scaffolded condition resulted in much lower final recall than did seeing the whole word in the standard feedback condition (M = .71, SE = .04) [t(19) = 2.95, p < .01], a result undoubtedly due to the fact that items that required the whole word to be revealed in the scaffolded condition were much more difficult than the random selection of items given whole word feedback in the standard feedback condition.
In a second analysis, similar to that given by Carpenter and DeLosh (2006), items were split into bins of 0%, 20%, 40%, 60%, 80%, and 100% on the basis of the proportion of the answer that had been revealed before a correct guess resulted. Although there were too few participants with a value in each bin to conduct a one-way repeated measures ANOVA, in this or in the analyses that follow, the means are presented in Table 1 for archival purposes. Final test performance showed the first drop when 80% of the word had been revealed. The steepest drop occurred when the entire word had been revealed. Our data, like those of Carpenter and DeLosh, revealed the worst performance when the greatest number of hints were given.
Table 1
Mean Final Test Performance As a Function of Percentage of Answer Revealed in Scaffolded Condition
Percentage of                                        Experiment 3    Experiment 3
Answer             Experiment 1    Experiment 2      Immediate       Delayed
Revealed           M      SE       M      SE         M      SE       M      SE
0                  .92    .08      .81    .13        .83    .17      1.00   .00
20                 .95    .05      .99    .01        1.00   .00      1.00   .00
40                 .92    .05      .85    .09        .89    .11      .78    .15
60                 .90    .04      .75    .07        .97    .03      .86    .08
80                 .68    .08      .73    .10        .71    .11      .46    .14
100                .44    .09      .45    .07        .60    .11      .46    .11
Final performance as a function of the number of multiple-choice options needed. Would a similar pattern to that in the scaffolded condition appear in the answer-until-correct multiple-choice condition, namely, that selection of a correct option before it was the only remaining option not yet selected would lead to better final test performance than selection of the correct option when it was the last possible option? Again, we present these results for archival purposes and emphasize that the results of this analysis should be interpreted with caution, given the possible item-selection artifacts, as items answered correctly only once all the other items had been chosen were undoubtedly more difficult, a priori, than those items that could be answered before they were the only item not yet selected. The mean final recall performance for questions in which the correct answer had been selected last was .37
Table 2
Mean Final Test Performance As a Function of Order of Correct Selection in Multiple Choice
                                                     Experiment 3    Experiment 3
Ordinal            Experiment 1    Experiment 2      Immediate       Delayed
Position           M      SE       M      SE         M      SE       M      SE
First              .80    .07      .65    .07        .83    .09      .76    .08
Second             .66    .10      .48    .09        .96    .04      .46    .14
Third              .45    .11      .55    .10        .80    .13      .29    .16
Fourth             .48    .11      .36    .10        .62    .24      .50    .17
Fifth              .56    .11      .57    .14        .44    .18      .39    .16
Last               .37    .11      .51    .12        .44    .15      .33    .21
(SE = .11), in comparison to .63 (SE = .04) for questions being tested has large beneficial effects on long-term
in which the correct answer had been selected before it was tests (Carpenter, Pashler, Wixted, & Vul, 2008; Carrier &
the only remaining item. This difference was significant Pashler, 1992; Cull, 2000; Roediger & Karpicke, 2006;
[t(16) = 2.11, p = .05]. Again, we created bins on the basis Thompson et al., 1978; Wenger et al., 1980; Wheeler,
of the number of items that were selected until the correct Ewers, & Buonanno, 2003), consistent with what Bjork
item had been chosen, giving first, second, third, fourth, (1994) called desirable difficulties. The retrieval practice
fifth, and sixth selection bins. As can be seen in Table 1, the involved in testing may make items more resistant to for-
benefit of selecting the correct item appeared to be confined getting (Carpenter et al., 2008; Hogan & Kintsch, 1971;
to its having been selected early. Performance dropped to Roediger & Karpicke, 2006). Carpenter et al. compared a
around 45% on and after the third selection (see Table 2). test with feedback with a study presentation over a range
of retention intervals ranging from 5 min to 42 days. They
Discussion found that the rate of forgetting was less following testing
The results of this experiment indicate that providing than following restudy. If the retrieval practice in scaf-
corrective feedback is important. When no corrections folded feedback is similar to retrieval practice that may be
were given, but the participants simply had to try again to operative in testing, we might find performance benefits
come up with the answers, their eventual performance was to scaffolded feedback over standard feedback, when the
very poor. Performance was better when the participants criterion test is delayed rather than immediate.
received the correct answer by successively guessing in a
multiple-choice test, until they got the right answer. But Experiment 2
the benefits of this answer-until-correct procedure were
not as great as those either when the participants were In Experiment 2, we investigated test performance at a
simply given the answer or when their self-retrieval of the delay. This allowed us to explore feedback-related differ-
correct answer was scaffolded. However, scaffolding and ences in the maintenance of the correct information over
simply being given the answer did not result in different a longer retention interval. The hypothesis was that scaf-
performance levels. When feedback was scaffolded, the folded feedback would benefit retention more at a delay
time needed to present the feedback probably increased, than would standard feedback.
as did the effort needed to do the task. But the result was
not better performance on the immediate final test. Given Method
that the same proportion of errors were corrected follow- The participants were 25 undergraduates at Columbia University
and Barnard College. They participated for course credit or cash.
ing scaffolded and standard feedback, if the test is imme-
The design, materials, and procedure in Experiment 2 were identical
diate, these results indicate that there is no advantage to those in Experiment 1, except that the final test came after a half-
using the more laborious scaffolding methodology. hour delay instead of immediately following questions and feedback.
One caveat to this conclusion is that the similarity in The half-hour delay was filled with an unrelated experiment.
the effectiveness of the scaffolded and standard feedback
conditions might be constrained to an immediate test. One Results
method or the other might have differential long-term Basic data. The participants’ mean recall performance
consequences. If there were a longer term advantage to was a proportion of .22 (SE 5 .03) correct during the ini-
the scaffolded method, that might provide a compelling tial test, and their initial confidence in their answers was
reason to switch to the more intensive method. .37 (SE 5 .03). The participants’ confidence ratings were
There are some indications, from the literature on test- postdictive of their initial test performance. The mean
ing effects, that memorial advantages of retrieval practice gamma correlation between initial confidence ratings and
may not be different from direct study or simply being initial recall performance was g 5 .72 (SE 5 .04) and was
presented in the immediate term but may have large ef- significantly greater than 0 [t(23) 5 18.56, p , .05].
fects when testing is delayed. Although being tested rather Performance at feedback. As in Experiment 1, in the
than restudying can make no difference or can even pro- minimal feedback condition, error correction was rare,
duce worse performance on an immediate follow-up test, resulting in a proportion of .04 (SE 5 .02) of errors cor-
Scaffold to Maximize Error Correction 957
.9 Standard feedback
after 100% of the word had been revealed (see Table 1).
.8 Answer until correct
Final performance as a function of the number of
.7 Minimal feedback multiple-choice options needed. Recall performance
.6 was the same for questions in which the correct answer
had been selected last (M 5 .51, SE 5 .07) and for ques-
.5
tions in which the correct answer had been selected be-
.4 fore it was the only remaining item (M 5 .51, SE 5 .12)
.3 (t , 1). An analysis of performance using bins created on
the basis of the number of items that had been selected
.2
until the correct item had been chosen showed that there
.1 appeared to be no performance benefit on the delayed test
0 for selecting the correct item early (see Table 2).
Type of Feedback
Discussion
Figure 2. Mean final test performance on a test delayed by half When a follow-up recall test was given at a delay,
an hour as a function of type of feedback given to original errors performance differences between the standard feedback
in Experiment 2.
condition and the scaffolded condition emerged. There
were more items answered correctly following the scaf-
rected at the time of feedback (t . 1, p , .01). In the folded feedback than following the standard feedback,
scaffolded condition, a proportion of .59 (SE 5 .05) of which did not show significantly better performance than
the items were answered correctly before the entire answer the answer-until-correct format. As in Experiment 1, the
had been revealed with hints. On average, the participants minimal feedback condition showed the lowest rate of
needed a proportion of .68 (SE 5 .03) of the word re- error correction.
vealed before they were able to answer correctly. In the Because Experiments 1 and 2 were run on separate
answer-until-correct multiple-choice condition, a propor- groups of participants at different times in the academic
tion of .91 (SE 5 .05) of the correct answers were selected year, it was not appropriate to contrast results from the
before they were the only item not yet selected. On aver- immediate and delayed tests. To directly compare par-
age, the participants selected 2.68 (SE 5 0.13) incorrect ticipants’ results from an immediate and a delayed test
items until they picked the correct answer. and to expand our results by using an extended delay, we
Final test performance. There was a significant effect conducted a final experiment. In Experiment 3, we used
of feedback condition on final test performance [F(3,72) 5 a within-participants design to contrast the magnitude of
49.46, MSe 5 0.03, p , .05, η 2p 5 .67]. The mean final error correction following each of the feedback conditions
test performance for each of the conditions was as fol- on an immediate test and on a 1-day delayed test.
lows and is shown in Figure 2: The scaffolded condition
(M 5 .66, SE 5 .05) was followed by the answer-until- Experiment 3
correct condition (M 5 .56, SE 5 .04), then the standard
feedback condition (M 5 .53, SE 5 .05), and finally the Method
minimal feedback condition (M 5 .09, SE 5 .03). Planned The participants were 18 undergraduates at Columbia University
comparisons revealed significant differences between the and Barnard College. They participated for course credit or cash.
The experiment was a 2 (test delay: immediate vs. delayed test) 3
scaffolded and the standard feedback conditions [t(24) 5 4 (feedback condition) within-participants design. The materials and
2.46, p , .05] and between the scaffolded and the answer- procedure of Experiment 3 were identical to those in Experiments 1
until-correct conditions [t(24) 5 2.31, p , .05], with the and 2, except for two procedural changes that were implemented so
scaffolded condition showing superior performance across that we could test both immediately and at a delay. The first differ-
the delay. Performance following answer-until-correct and ence was that all of the participants were tested immediately on half
standard feedback conditions was equivalent (t , 1). All of the items and came back either 1 or 2 days later for a delayed test
on the remaining half. The mean delay between feedback and the
feedback condition comparisons with the minimal feed- final delayed test was 1.22 days. The items were assigned randomly
back condition showed significant differences (all ps , in equal numbers into the immediate and delayed test conditions.
.05). The performance scores in all of the feedback condi- The second change was that the participants answered questions
tions were significantly different from 0 (all ps , .05). until they had attained 40 incorrect answers.
Final test performance as a function of the amount
of the cue revealed in the scaffolded condition. When Results
the participants were given the entire word, performance Basic data. The participants’ mean recall performance
on the delayed test was significantly lower (M 5 .47, on the initial test was a proportion of a proportion of .30
SE 5 .07) than when they were able to answer it with (SE 5 .02) correct during the initial test. Initial confidence
only partial cues (M 5 .72, SE 5 .06) [t(23) 5 2.99, p , in answers given was .39 (SE 5 .03). The participants’
.05]. Delayed final test performance for items given the confidence ratings were postdictive of their initial test
whole word answer was not significantly worse than the performance. The mean gamma correlation between ini-
958 Finn and Metcalfe
tial confidence ratings and initial recall performance was Performance for all feedback conditions was significantly
g 5 .76 (SE 5 .03) and was significantly greater than 0 different from 0 (all ps , .05).
[t(17) 5 29.18, p , .05]. The main result of interest was the significant time of
Performance at feedback. As in Experiments 1 and 2, test 3 feedback condition interaction [F(3,51) 5 4.85,
error correction at feedback was rare in the minimal feed- MSe 5 0.03, p , .05, η 2p 5 .22]. Performance following
back condition (M 5 .09, SE 5 .02; t . 1, p , .05). There the standard feedback and the scaffolded conditions was
were no significant differences between the immediate our central focus. Post hoc tests revealed that there was
and delayed conditions in any of the following feedback no significant difference between the standard feedback
performance analyses ( ps . .05), which was as expected, (M 5 .76, SE 5 .07) and scaffolded (M 5 .77, SE 5 .06)
since the test delay manipulation had not yet been intro- conditions (t , 1, p . .05) on the immediate test. Impor-
duced at the time of feedback. In the scaffolded condition, tantly, however, the benefits of scaffolded feedback over
a proportion of .68 (SE 5 .04) of the items were answered standard feedback were shown at the delay. Performance
correctly before the entire answer had been revealed with on the delayed test showed a significant (M 5 .16) per-
hints. On average, the participants needed a proportion of formance advantage for items given scaffolded feedback
.61 (SE 5 .03) of the word revealed before they were able (M 5 .67, SE 5 .06) over items given standard feedback
to answer correctly. In the answer-until-correct multiple- (M 5 .51, SE 5 .06) [t(17) 5 2.61, p , .05].
choice condition, a proportion of .90 (SE 5 .02) of the On the immediate test, performance following answer-
correct answers were selected before they were the only until-correct feedback (M 5 .78, SE 5 .04) was not dif-
item not yet selected. On average, the participants selected ferent from performance following either the standard or
2.73 (SE 5 0.12) incorrect items until they picked the cor- the scaffolded feedback (all ts , 1, all ps . .05). This re-
rect answer. sult differed from those of Experiment 1, in which perfor-
Final test performance on the immediate and de- mance in the answer-until-correct condition was different
layed tests. Mean test performance for each condition from that in the standard and scaffolded conditions. On
on the immediate and delayed tests can be seen in Fig- the delayed test, performance in the answer-until-correct
ure 3. There was a main effect of time of test [F(1,17) 5 condition (M 5 .52, SE 5 .06) was not different from
25.30, MSe 5 0.03, p , .05, η 2p 5 .60] showing an ex- that in the standard feedback condition (t , 1, p . .05).
pected delay-related drop in performance (immediate, The difference between performance in the answer-until-
M 5 .60, SE 5 .04; delayed, M 5 .45, SE 5 .04). There correct condition and that in the scaffolded condition on
was also a main effect of feedback condition [F(3,51) 5 the delayed test was at significance [t(17) 5 2.06, p 5
68.19, MSe 5 0.04, p , .05, η 2p 5 .80]. The lowest per- .05]. Performance following the minimal feedback condi-
formance was shown in the minimal feedback condition tion was the worst and was significantly different from
(M 5 .11, SE 5 .02), which was significantly different that in all other conditions on both the immediate and de-
from that in all other conditions (all ts . 1, all ps , .05). layed tests (all ts . 1, all ps , .05).
There was no overall test performance difference between Final test performance as a function of the amount
the other feedback conditions (scaffolded, M 5 .72, SE 5 of the cue revealed in the scaffolded condition. Per-
.05; answer-until-correct, M 5 .65, SE 5 .04; standard formance on the final test was lower (M 5 .52, SE 5 .11)
feedback, M 5 .63, SE 5 .05) (all ts , 1, all ps . .05). when the whole word was revealed than when only partial
cues were revealed (M 5 .83, SE 5 .05) [F(1,13) 5 8.56,
MSe 5 0.03, p , .05, η 2p 5 .40]. Neither the effect of time
of test nor the time of test 3 hint amount interaction was
1 Scaffolded feedback
significant ( p . .05). Items given standard feedback were
Mean Final Test Performance
Standard feedback
.9
Answer until correct
recalled significantly better than items given the whole
.8 Minimal feedback
word answer in scaffolded feedback (t . 1, p , .05). Final
test performance for the scaffolded condition showed the
.7
first drop when about 80% of the word had been revealed
.6 (see Table 1).
.5 Final performance as a function of the number of
.4 multiple-choice options needed. We could not compute
the full 2 (time of test: immediate vs. delayed) 3 2 (hint:
.3
partial vs. whole) repeated measures ANOVA for the
.2 answer-until-correct multiple-choice condition, because
.1 there were only a few participants who had selected the
0 correct answer as their last possible selection in the imme-
Immediate Test Delayed Test diate and delayed test conditions. Collapsing over time of
test, we found that recall performance was worse for items
Type of Feedback in which the correct answer had been selected last (M 5
Figure 3. Mean final test performance on an immediate test .42, SE 5 .12) than for questions in which the correct an-
and on a test delayed by 1 day as a function of type of feedback swer had been selected before it was the only remaining
given to original errors in Experiment 3. item (M 5 .72, SE 5 .16) ( p , .05). There appeared to
be some benefit at final test for selecting the correct item early (see Table 2).

Discussion
In Experiment 3, we replicated and extended the findings of Experiments 1 and 2. Final test performance was not different between the standard and scaffolded conditions on an immediate test. When the test was delayed over 24 h, however, scaffolded feedback produced more long-lasting gains in error correction than did standard feedback.

General Discussion
In the set of experiments presented here, we contrasted four methods of presenting corrective feedback. Scaffolding feedback, by giving successive hints but requiring that the participant generate the answer him or herself, took advantage of the benefits of retrieval practice and generation. This method was designed to utilize the deep retrieval processes that are engaged in the intentional generation of a response from memory (Jacoby, 1991; Yonelinas, 2002). In contrast to standard feedback, scaffolded feedback capitalized on the benefits of retrieval attempts on memory, while making certain that the correct answer would be produced. We found on a test conducted immediately after study that errors in the scaffolded feedback condition were corrected at a rate equally high as that in the standard feedback condition, in which the answers were simply provided to participants. However, when the test was delayed for either a half hour or slightly more than a day, scaffolded feedback led to greater recall than did standard feedback or the answer-until-correct feedback.
With scaffolding, feedback can be flexible and dynamic, allowing calibration of feedback to the knowledge and skills of each student. Student A may not need to be exposed to as many clues as Student B to answer a particular question or to retrieve relevant information from memory. Each student will have different memories, experiences, and domain knowledge, and therefore, each will have different feedback requirements, which can be dynamically adjusted on the basis of the student's current state of knowledge. Items just at the boundary of what the person knows (or what Metcalfe and colleagues have called the region of proximal learning; Metcalfe, 2002, in press; Metcalfe & Kornell, 2003, 2005) might be those items that were initially answered incorrectly but, with the benefit of a small amount of scaffolding, could be self-generated correctly. Effective learning can occur because the instructor (even when that instructor is a computer) and student are coordinated (Pea, 2004; Wood et al., 1976).
One limitation of this particular instantiation of scaffolded feedback was the narrow scope of our hints, which only displayed additional letters of the correct answer. A more sophisticated scaffolding system using semantic cues, for example, might produce even better results, although this needs to be tested. Even the very simple scaffolding system that we used, which is very easy to implement, had considerable favorable effects.
Much research has shown that providing students with corrective feedback can improve performance on a follow-up test. Here, we showed that scaffolded corrective feedback resulted in corrections that were more resilient to a delay interval than corrections following either the standard feedback or answer-until-correct multiple-choice formats. Standard feedback is the most effective and efficient method to use if a student only has a few minutes to correct their errors before a test. However, if the goal is long-term knowledge retention, the results presented here indicate that the student will be best served by scaffolded feedback.

Author Note
This research was supported by NIMH Grant RO1MH60637 and by Grant 220020166 from the James S. McDonnell Foundation. We thank the scholars from MetaLab for their help and comments. Correspondence concerning this article should be addressed to B. Finn, Department of Psychology, Washington University, St. Louis, MO 63130 (e-mail: [email protected]).

References
Allen, G. A., Mahler, W. A., & Estes, W. K. (1969). Effects of recall tests on long-term retention of paired associates. Journal of Verbal Learning & Verbal Behavior, 8, 463-470. doi:10.1016/S0022-5371(69)80090-3
Anderson, N. D., & Craik, F. I. M. (2006). The mnemonic mechanisms of errorless learning. Neuropsychologia, 44, 2806-2813. doi:10.1016/j.neuropsychologia.2006.05.026
Anderson, R. C., Kulhavy, R. W., & Andre, T. (1971). Feedback procedures in programmed instruction. Journal of Educational Research, 62, 148-156. doi:10.1037/h0030766
Baddeley, A., & Wilson, B. A. (1994). When implicit learning fails: Amnesia and the problem of error elimination. Neuropsychologia, 32, 53-68. doi:10.1016/0028-3932(94)90068-X
Bangert-Drowns, R. L., Kulik, C.-L. C., Kulik, J. A., & Morgan, M. (1991). The instructional effect of feedback in test-like events. Review of Educational Research, 61, 213-238.
Barnes, J., & Underwood, B. (1959). “Fate” of first learned associations in transfer theory. Journal of Experimental Psychology, 58, 97-105.
Bertsch, S., Pesta, B. J., Wiscott, R., & McDaniel, M. A. (2007). The generation effect: A meta-analytic review. Memory & Cognition, 35, 201-210.
Bjork, R. A. (1975). Retrieval as a memory modifier: An interpretation of negative recency and related phenomena. In R. L. Solso (Ed.), Information processing and cognition: The Loyola Symposium (pp. 123-144). Hillsdale, NJ: Erlbaum.
Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185-205). Cambridge, MA: MIT Press.
Butler, A. C., Karpicke, J. D., & Roediger, H. L., III (2007). The effect of type and timing of feedback on learning from multiple-choice tests. Journal of Experimental Psychology: Applied, 13, 273-281.
Butler, A. C., Karpicke, J. D., & Roediger, H. L., III (2008). Correcting a metacognitive error: Feedback enhances retention of low confidence correct responses. Journal of Experimental Psychology: Learning, Memory, & Cognition, 34, 918-928. doi:10.1037/0278-7393.34.4.918
Butler, A. C., Marsh, E. J., Goode, M. K., & Roediger, H. L., III (2006). When additional multiple-choice lures aid versus hinder later memory. Applied Cognitive Psychology, 20, 941-956. doi:10.1002/acp.1239
Butler, A. C., & Roediger, H. L., III (2007). Testing improves long-term
retention in a simulated classroom setting. European Journal of Cognitive Psychology, 19, 514-527. doi:10.1080/09541440701326097
Butler, A. C., & Roediger, H. L., III (2008). Feedback enhances the positive effects and reduces the negative effects of multiple-choice testing. Memory & Cognition, 36, 604-616. doi:10.3758/MC.36.3.604
Butterfield, B., & Metcalfe, J. (2001). Errors committed with high confidence are hypercorrected. Journal of Experimental Psychology: Learning, Memory, & Cognition, 27, 1491-1494. doi:10.1037/0278-7393.27.6.1491
Butterfield, B., & Metcalfe, J. (2006). The correction of errors committed with high confidence. Metacognition & Learning, 1, 1556-1623. doi:10.1007/s11409-006-6894-z
Carpenter, S. K., & DeLosh, E. L. (2006). Impoverished cue support enhances subsequent retention: Support for the elaborative retrieval explanation of the testing effect. Memory & Cognition, 34, 268-276.
Carpenter, S. K., Pashler, H., Wixted, J. T., & Vul, E. (2008). The effects of tests on learning and forgetting. Memory & Cognition, 36, 438-448. doi:10.3758/MC.36.2.438
Carrier, M., & Pashler, H. (1992). The influence of retrieval on retention. Memory & Cognition, 20, 632-642.
Clare, L., & Jones, R. S. P. (2008). Errorless learning in the rehabilitation of memory impairment: A critical review. Neuropsychology Review, 18, 1-23. doi:10.1007/s11065-008-9051-4
Craik, F. I. M., & Lockhart, R. S. (1972). Levels of processing: A framework for memory research. Journal of Verbal Learning & Verbal Behavior, 11, 671-684. doi:10.1016/S0022-5371(72)80001-X
Craik, F. I. M., & Tulving, E. (1975). Depth of processing and the retention of words in episodic memory. Journal of Experimental Psychology: General, 104, 268-294.
Cull, W. L. (2000). Untangling the benefits of multiple study opportunities and repeated testing for cued recall. Applied Cognitive Psychology, 14, 215-235. doi:10.1002/(SICI)1099-0720(200005/06)14:3<215::AID-ACP640>3.3.CO;2-T
Dempster, F. N. (1996). Distributing and managing the conditions of encoding and practice. In E. C. Carterette & M. P. Friedman (Series Eds.) & E. L. Bjork & R. A. Bjork (Vol. Eds.), Handbook of perception and cognition: Vol. 10. Memory (2nd ed., pp. 317-344). San Diego: Academic Press.
Dempster, F. N. (1997). Using tests to promote classroom learning. In R. F. Dillon (Ed.), Handbook on testing (pp. 332-346). Westport, CT: Greenwood Press.
Glisky, E. L., & Schacter, D. L. (1989). Extending the limits of complex learning in organic amnesia: Computer training in a vocational domain. Neuropsychologia, 27, 107-120. doi:10.1016/0028-3932(89)90093-6
Glisky, E. L., Schacter, D. L., & Tulving, E. (1986). Computer learning by memory-impaired patients: Acquisition and retention of complex knowledge. Neuropsychologia, 24, 313-328. doi:10.1016/0028-3932(86)90017-5
Hancock, T. E., Stock, W. A., & Kulhavy, R. W. (1992). Predicting feedback effects from response-certitude estimates. Bulletin of the Psychonomic Society, 30, 173-176.
Hayman, C. A. G., Macdonald, C. A., & Tulving, E. (1993). The role of repetition and associative interference in new semantic learning in amnesia: A case experiment. Journal of Cognitive Neuroscience, 5, 375-389. doi:10.1162/jocn.1993.5.4.375
Hogan, R. M., & Kintsch, W. (1971). Differential effects of study and test trials on long-term recognition and recall. Journal of Verbal Learning & Verbal Behavior, 10, 562-567. doi:10.1016/S0022-5371(71)80029-4
Huelser, B. J., & Marsh, E. J. (2006, November). Does guessing on a multiple-choice test affect later cued recall? Poster presented at the 47th Annual Meeting of the Psychonomic Society, Houston.
Jacoby, L. L. (1978). On interpreting the effects of repetition: Solving a problem versus remembering a solution. Journal of Verbal Learning & Verbal Behavior, 17, 649-668. doi:10.1016/S0022-5371(78)90393-6
Jacoby, L. L. (1991). A process dissociation framework: Separating automatic from intentional uses of memory. Journal of Memory & Language, 30, 513-541. doi:10.1016/0749-596X(91)90025-F
Jones, R. S. P., & Eayrs, C. B. (1992). The use of errorless learning procedures in teaching people with a learning disability: A critical review. Mental Handicap Research, 5, 204-212.
Kessels, R. P. C., & de Haan, E. H. F. (2003). Implicit learning in memory rehabilitation: A meta-analysis on errorless learning and vanishing cues methods. Journal of Clinical & Experimental Neuropsychology, 25, 805-814. doi:10.1076/jcen.25.6.805.16474
Kornell, N., Hays, M. J., & Bjork, R. A. (2009). Unsuccessful retrieval attempts enhance subsequent learning. Journal of Experimental Psychology: Learning, Memory, & Cognition, 35, 989-998. doi:10.1037/a0015729
Lhyle, K. G., & Kulhavy, R. W. (1987). Feedback processing and error correction. Journal of Educational Psychology, 79, 320-322. doi:10.1037/0022-0663.79.3.320
Loftus, E. F., & Palmer, J. C. (1974). Reconstruction of automobile destruction: An example of the interaction between language and memory. Journal of Verbal Learning & Verbal Behavior, 13, 585-589. doi:10.1016/S0022-5371(74)80011-3
Marsh, E. J., Roediger, H. L., III, Bjork, R. A., & Bjork, E. L. (2007). The memorial consequences of multiple-choice testing. Psychonomic Bulletin & Review, 14, 194-199.
McDaniel, M. A., & Masson, M. E. J. (1985). Altering memory representations through retrieval. Journal of Experimental Psychology: Learning, Memory, & Cognition, 11, 371-385. doi:10.1037/0278-7393.11.2.371
Melton, A. W. (1967). Repetition and retrieval from memory. Science, 158, 532. doi:10.1126/science.158.3800.532-b
Metcalfe, J. (2002). Is study time allocated selectively to a region of proximal learning? Journal of Experimental Psychology: General, 131, 349-363. doi:10.1037/0096-3445.131.3.349
Metcalfe, J. (in press). Desirable difficulties and study in the region of proximal learning. In A. S. Benjamin (Ed.), Successful remembering and successful forgetting: A festschrift in honor of Robert A. Bjork. New York: Psychology Press.
Metcalfe, J., & Kornell, N. (2003). The dynamics of learning and allocation of study time to a region of proximal learning. Journal of Experimental Psychology: General, 132, 530-542. doi:10.1037/0096-3445.132.4.530
Metcalfe, J., & Kornell, N. (2005). A region of proximal learning model of study time allocation. Journal of Memory & Language, 52, 463-477. doi:10.1016/j.jml.2004.12.001
Metcalfe, J., & Kornell, N. (2007). Principles of cognitive science in education: The effects of generation, errors and feedback. Psychonomic Bulletin & Review, 14, 225-229.
Moreno, R. (2004). Decreasing cognitive load for novice students: Effects of explanatory versus corrective feedback in discovery-based multimedia. Instructional Science, 32, 99-113. doi:10.1023/B:TRUC.0000021811.66966.1d
Nelson, T. O., & Narens, L. (1980). Norms of 300 general-information questions: Accuracy of recall, latency of recall, and feeling-of-knowing ratings. Journal of Verbal Learning & Verbal Behavior, 19, 338-368. doi:10.1016/S0022-5371(80)90266-2
Pashler, H., Cepeda, N. J., Wixted, J. T., & Rohrer, D. (2005). When does feedback facilitate learning of words? Journal of Experimental Psychology: Learning, Memory, & Cognition, 31, 3-8. doi:10.1037/0278-7393.31.1.3
Pashler, H., Zarow, G., & Triplett, B. (2003). Is temporal spacing of tests helpful even when it inflates error rates? Journal of Experimental Psychology: Learning, Memory, & Cognition, 29, 1051-1057.
Pea, R. D. (2004). The social and technological dimensions of scaffolding and related theoretical concepts for learning, education, and human activity. Journal of the Learning Sciences, 13, 423-451. doi:10.1207/s15327809jls1303_6
Pressey, S. L. (1926). A simple apparatus which gives tests and scores and teaches. School & Society, 23, 373-376.
Richland, L. E., Kornell, N., & Kao, L. S. (2009). The pretesting effect: Do unsuccessful retrieval attempts enhance learning? Journal of Experimental Psychology: Applied, 15, 243-257. doi:10.1037/a0016496
Roediger, H. L., III, & Karpicke, J. D. (2006). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181-210. doi:10.1111/j.1745-6916.2006.00012.x
Roediger, H. L., III, & Marsh, E. J. (2005). The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, & Cognition, 31, 1155-1159. doi:10.1037/0278-7393.31.5.1155
Roediger, H. L., III, & McDermott, K. B. (1995). Creating false memories: Remembering words not presented in lists. Journal of Experimental Psychology: Learning, Memory, & Cognition, 21, 803-814. doi:10.1037/0278-7393.21.4.803
Roediger, H. L., III, & McDermott, K. B. (2000). Distortions of memory. In F. I. M. Craik & E. Tulving (Eds.), The Oxford handbook of memory (pp. 149-164). Oxford: Oxford University Press.
Schooler, J. W., Foster, R. A., & Loftus, E. E. (1988). Some deleterious consequences of the act of recollection. Memory & Cognition, 16, 243-251.
Sidman, M., & Stoddard, L. T. (1967). The effectiveness of fading in programming during a simultaneous form discrimination for retarded children. Journal of the Experimental Analysis of Behavior, 10, 3-15.
Slamecka, N. J., & Graf, P. (1978). The generation effect: Delineation of a phenomenon. Journal of Experimental Psychology: Human Learning & Memory, 4, 592-604. doi:10.1037/0278-7393.4.6.592
Thompson, C. P., Wenger, S. K., & Bartling, C. A. (1978). How recall facilitates subsequent recall: A reappraisal. Journal of Experimental Psychology: Human Learning & Memory, 4, 210-221. doi:10.1037/0278-7393.4.3.210
Tulving, E. (1967). The effects of presentation and recall of material in free-recall learning. Journal of Verbal Learning & Verbal Behavior, 6, 175-184. doi:10.1016/S0022-5371(67)80092-6
Wenger, S. K., Thompson, C. P., & Bartling, C. A. (1980). Recall facilitates subsequent recognition. Journal of Experimental Psychology: Human Learning & Memory, 6, 135-144. doi:10.1037/0278-7393.6.2.135
Wheeler, M. A., Ewers, M., & Buonanno, J. F. (2003). Different rates of forgetting following study versus test trials. Memory, 11, 571-580. doi:10.1080/09658210244000414
Whitten, W. B., II, & Bjork, R. A. (1977). Learning from tests: Effects of spacing. Journal of Verbal Learning & Verbal Behavior, 16, 465-478. doi:10.1016/S0022-5371(77)80040-6
Wood, D., Bruner, J. S., & Ross, G. (1976). The role of tutoring and problem solving. Journal of Child Psychology & Psychiatry, 17, 89-100. doi:10.1111/j.1469-7610.1976.tb00381.x
Yonelinas, A. P. (2002). The nature of recollection and familiarity: A review of 30 years of research. Journal of Memory & Language, 46, 441-517.

(Manuscript received December 9, 2009; revision accepted for publication March 7, 2010.)
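Supplementary sketch. The Basic data paragraphs of Experiments 2 and 3 report mean gamma correlations between initial confidence ratings and initial recall accuracy. The following is a minimal illustration of how one such correlation might be computed for a single participant, assuming the Goodman-Kruskal gamma, (C - D) / (C + D), where C and D are the numbers of concordant and discordant item pairs and tied pairs are ignored. The function name and the example data are hypothetical and are not taken from the experiments; the reported means would presumably be obtained by averaging such values over participants.

```python
from itertools import combinations


def goodman_kruskal_gamma(confidence, correct):
    """Goodman-Kruskal gamma between confidence ratings and accuracy.

    confidence: per-item confidence ratings (any ordered numeric scale).
    correct: per-item accuracy, 1 for a correct answer and 0 for an error.
    Returns None when every pair is tied, since gamma is then undefined.
    """
    concordant = 0
    discordant = 0
    for (c1, r1), (c2, r2) in combinations(zip(confidence, correct), 2):
        product = (c1 - c2) * (r1 - r2)
        if product > 0:
            concordant += 1  # higher confidence goes with the correct item
        elif product < 0:
            discordant += 1  # higher confidence goes with the error
        # product == 0 means a tie on confidence or accuracy; skip the pair
    if concordant + discordant == 0:
        return None
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical data for one participant (not from the experiments).
example_confidence = [0.9, 0.2, 0.7, 0.4]
example_correct = [1, 0, 0, 1]
example_gamma = goodman_kruskal_gamma(example_confidence, example_correct)
```

In this made-up example there are three concordant pairs and one discordant pair, so the function returns 0.5. A participant who answered every question correctly (or every question incorrectly) would have no untied pairs, in which case the sketch returns None rather than a number.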