Process Assessment in Dynamic Testing Using Electronic Tangibles
ORIGINAL ARTICLE
DOI: 10.1111/jcal.12318

1 Faculty of Social Sciences, Department of Developmental and Educational Psychology, Leiden University, Leiden, The Netherlands
2 Zwijsen BV, Tilburg, The Netherlands

Correspondence
Wilma C. M. Resing, Faculty of Social Sciences, Department of Psychology, Section Developmental and Educational Psychology, Leiden University, P.O. Box 9555, 2300 RB Leiden, The Netherlands.
Email: [email protected]

Abstract
Task solving processes and changes in these processes have long been expected to provide valuable information about children's performance in school. This article used electronic tangibles (concrete materials that can be physically manipulated) and a dynamic testing format (pretest, training, and posttest) to investigate children's task solving processes and changes in these processes as a result of training. We also evaluated the value of process information for the prediction of school results. Participants were N = 253 children with a mean age of 7.8 years. Half of them received a graduated prompts training; the other half received repeated practice only. Three process measures were used: grouping behaviour, verbalized strategies, and completion time. Different measures showed different effects of training, with verbalized strategies showing the largest difference on the posttest between trained and untrained children. Although process measures were related to performance on our dynamic task and to math and reading performance in school, the amount of help provided during training provided the most predictive value to school results. We concluded that children's task solving processes provide valuable information, but the interpretation requires more research.

KEYWORDS
dynamic testing, inductive reasoning, log file analysis, process assessment, series completion, tangible user interface

This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
© 2018 The Authors. Journal of Computer Assisted Learning published by John Wiley & Sons, Ltd.
acquired knowledge and skills have to be assessed, but also their potential to learn when the opportunity is presented (Grigorenko & Sternberg, 1998; Sternberg & Grigorenko, 2002). These criticisms led to the development of dynamic testing, which involves testing procedures in which a training session is incorporated to assess the child's response to a learning opportunity (e.g., Kozulin, 2011; Lidz, 2014; Resing, 2013; Sternberg & Grigorenko, 2002; Stringer, 2018). To improve the predictive validity of traditional tests, some researchers argued that an additional analysis of the task solving process would provide valuable information regarding cognitive potential (Resing & Elliott, 2011; Resing, Xenidou-Dervou, Steijn, & Elliott, 2012; Sternberg & Grigorenko, 2002). Both the assessment of the child's progression in task solving, including the use of electronic tangibles, and the evaluation of this task solving process were the foci of the process-oriented dynamic testing procedures used in the current study. In the current paper, task solving processes were defined as the task-oriented behaviours children employed during inductive reasoning task solving.

1.1 | Dynamic testing and graduated prompts procedure

Whereas static tests do not include training beyond repeated instruction or, in most cases, do not contain explanations or feedback regarding the correctness of answers, dynamic testing incorporates an instruction moment in the form of feedback, training, or scaffolding. Dynamic testing can be utilized not only to measure progression in task solving, in terms of accuracy scores on the task considered, but also to assess the processes involved in learning how to solve these problems (Elliott, Resing, & Beckmann, 2018; Haywood & Lidz, 2007; Resing & Elliott, 2011; Sternberg & Grigorenko, 2002). Over the years, several different formats have been developed for dynamic testing (Haywood & Lidz, 2007; Lidz, 2014; Sternberg & Grigorenko, 2002). Formats range from relatively unstructured, with a great emphasis on the examiners' possibility to provide unique individualized instruction at any point the examiner deems necessary, to completely standardized (e.g., Campione, Brown, Ferrara, Jones, & Steinberg, 1985; Resing, 1998). Dynamic tests have been implemented in a variety of domains including academic subjects and language development (Elliott et al., 2018), with a range of available testing instruments to target the domain of interest (Haywood & Lidz, 2007; Sternberg & Grigorenko, 2002).

In some of the more structured formats, for example, a pretest, training, and posttest design, children are provided with graduated prompts as part of the instruction moment (Campione et al., 1985; Fabio, 2005; Ferrara, Brown, & Campione, 1986; Sternberg & Grigorenko, 2002). This procedure provides standardized help, in the form of hints and prompts, which are presented to children if they cannot solve a problem independently. The graduated prompts approach was originally designed to assess individual differences in the amount and type of instruction needed to elicit the solving of tasks and was further refined to find the degree of help a child needed to complete a task successfully (Campione et al., 1985; Resing, 1993, 2000). Hints are hierarchically ordered, from general, metacognitive prompts to concrete, cognitive scaffolds. The method of training was found to lead to greater improvement in task success than regular feedback, especially for the children who had low initial scores (Stevenson, Hickendorff, Resing, Heiser, & de Boeck, 2013). More importantly, both the number of prompts and posttest scores were found to be good predictors of future school success as well as an indicator of learning potential (e.g., Caffrey, Fuchs, & Fuchs, 2008).

1.2 | Inductive reasoning and series completion

In many static and dynamic testing procedures, inductive reasoning tasks are extensively used. The process of inductive reasoning requires one to detect and formulate a general rule within a specific set of elements (Klauer & Phye, 2008). Inductive reasoning ability is considered a core component of children's cognitive and scholastic development (Molnár, Greiff, & Csapó, 2013; Perret, 2015; Resing & Elliott, 2011), and can be measured with a variety of tasks, such as analogies, categorization, and series completion (Perret, 2015; Sternberg, 1985). In the current study, schematic picture series completion tasks were used, in which pictorial series had to be completed by inducing and implementing solving rules. Simon and Kotovsky (1963) identified three central components of the inductive reasoning task solving process: (a) the detection of relations/transformations in the material, (b) the identification of periodicity, and (c) the completion of the pattern.

Series completion tasks can be constructed with a range of contents such as letters, numbers, and pictures. Letters and numbers have a fixed, often familiar relationship to each other. Pictures and colours, on the other hand, do not and, therefore, require more analysis of the sequence to determine the relationship(s) and, in doing so, solve the tasks (Resing & Elliott, 2011). Schematic pictures, as used in the current study, can consist of several combined sets of transformations, which are not necessarily related (e.g., Sternberg & Gardner, 1983), and have a constructed response format. As opposed to multiple-choice items, constructed response items were found to be more difficult to solve and to elicit more advanced and overt task problem solving processes on a dynamic test of analogical reasoning in 5- and 6-year-old children (Stevenson, Heiser, & Resing, 2016).

1.3 | Process-oriented testing

When children or adults are first presented with a problem to solve, they, in principle, attempt to understand it by creating an initial problem representation. According to Robertson (2001), the efficiency and accuracy of the task solving process are determined by the quality of this representation. As argued by many researchers, this initial representation is a crucial aspect of performance (Hunt, 1980; Pretz, Naples, & Sternberg, 2003). As problem representation is said to determine the strategies that are chosen to try and solve a problem, an incorrect representation may result in the use of inaccurate strategies (Alibali, Phillips, & Fischer, 2009; Pretz et al., 2003). The problem representation of a solver can potentially be improved as the result of learning to use new solving strategies. Often, the extent to which improvement is successful is believed to be dependent on the availability and organization of the requested knowledge (Pretz et al., 2003).

Moreover, the notion of "problem space" was introduced by Newell and Simon (1972), as a conceptualization of the problem definition and representation that contain all possible routes to a solution. According to these authors, a problem space can be reduced by
restructuring the problem into a set of smaller problems, which is also called "means-ends analysis." This approach is thought to be particularly helpful if no clear solving strategy is available (Robertson, 2001; Weisberg, 2015). The ways in which a solver structures a problem, for example, by analysing the sequence of solving steps or grouping these answering steps in meaningful units, are thought to provide valuable information about individual differences in problem solving. However, most standard cognitive tests have not been constructed to reveal this process information (Richard & Zamani, 2003).

Process-oriented dynamic testing originated from an intention to detect (individual) changes in strategy use as a result of training (Resing & Elliott, 2011) and from the idea that examining strategy use would enable an examiner to assess how a person's solving of a task progresses. Examination of an individual's use of strategies, offering information on which specific strategies might be used more effectively, may provide valuable insight into what a person needs to improve specific task performance (Greiff, Wüstenberg, & Avvisati, 2015). The pivotal role of strategy use in task performance has also been highlighted by Siegler (2004, 2007). He found not only that instability in strategy use over a short period of time is associated with improvement in task performance (Siegler, 2004, 2007) but also that this improvement seems connected to a person's ability to adapt strategy use to the requirements of the situation (Hunt, 1980; Siegler, 1996). He concluded, however, that an individual's global strategy pattern displayed throughout learning situations could be characterized by a shift from less to more advanced strategy use (Siegler, 1996; Siegler & Svetina, 2006). Nevertheless, although more expert reasoners appear to use more advanced strategies more frequently, both simple and advanced strategies can produce accurate task outcomes (Klauer & Phye, 2008). Recent studies have stressed that the relationship between performance and strategy use could be mediated by task difficulty (Goldhammer et al., 2014; Tenison, Fincham, & Anderson, 2014).

In practice, however, process-oriented testing has proven to be challenging, because the sequential solving steps involved can quickly become too much to analyse or are often difficult to interpret (Zoanetti & Griffin, 2017). With the emergence of computers in the educational and cognitive testing domains, it has become easier to collect data regarding children's process of task solving. Computers allow for monitoring an individual's progress, while providing individual learning experiences (Price, Jewitt, & Crescenzi, 2015; Verhaegh, Fontijn, & Hoonhout, 2007). Although the opportunity to analyse problem solving behaviour from digital log files has been praised since the early days of computer-based assessment, interpreting these files in a meaningful way has proven to be difficult (Greiff et al., 2015; Zoanetti & Griffin, 2017). As a result, the advantages offered by computerized assessment appear to have hardly been exploited optimally.

1.4 | Aims and research questions

The current study sought to investigate the possibilities for process-oriented dynamic testing, using various ways of process measurement. By combining these outcomes, we aimed to study the predictive validity of dynamic testing with regard to academic performance. We used a dynamic testing format in which half the participating children were subjected to training between pretest and posttest, to investigate children's potential for learning in both the outcome and the process of solving inductive reasoning tasks. In addition, we tested a rule-based automated scoring method developed to measure changes in problem representation in children's inductive problem solving.

We first expected (Hypothesis 1) children's problem solving processes and outcomes in series completion to progress to a more sophisticated level. We expected (Hypothesis 1a) children to show more accuracy in their series completion solving skills as a result of a graduated prompts training than as a result of repeated practice (Resing & Elliott, 2011; Resing et al., 2012). Further, we anticipated that (Hypothesis 1b) training would lead children to show more grouping activities (separating groups of task elements) to make completion of the series easier and that (Hypothesis 1c) training would lead to more sophisticated verbalized strategy use (Resing et al., 2012). We also expected (Hypothesis 1d) a decrease in the time spent on the task as a result of more familiarity with the type and structure of the tasks as a result of training (Tenison et al., 2014).

Second, we investigated children's shifts in the process of solving the series completion tasks as a result of repeated practice and training, by distinguishing subgroups of children based on their initial task solving processes. It was expected that the distribution of children over the subgroups would change from pretest to posttest and that trained children would move towards more sophisticated categories of grouping behaviour than nontrained children (Hypothesis 2a). We also expected trained children to move towards more advanced verbalized strategy categories than nontrained children (Hypothesis 2b).

Third, we expected (Hypothesis 3a) process measures to be related to accuracy on the series completion task and to children's academic performance on mathematics and reading comprehension. The process measures were expected to provide explanatory value for academic performance on mathematics (Hypothesis 3b) and on reading comprehension (Hypothesis 3c). In line with previous research (Elliott, 2000; Greiff et al., 2013; Zoanetti & Griffin, 2017), we also expected (Hypothesis 3d) dynamic test measures (scores) to provide superior prediction over static measures regarding school performance (Caffrey et al., 2008; Resing, 1993).

2 | METHOD

2.1 | Participants

The study employed 253 children, 134 boys and 119 girls (M = 7.8 years; SD = 0.61 years). The children were recruited from 12 second grade classes in nine primary schools, all located in middle-class socio-economic status regions in the Netherlands. Informed consent was obtained from both the teachers and the parents before testing started. The research was approved by the ethics board of the university. Fifteen children were not able to attend all sessions, and therefore, their data were not included in the data for analysis.

2.2 | Design

A pretest–posttest control group design was used (see Table 1 for an overview). A randomized blocking procedure was used to assign
TABLE 1 Overview of the design

           Raven's Standard Progressive Matrices   Pretest   Training 1   Training 2   Posttest
Training   X                                       X         X            X            X
Control    X                                       X         dots         dots         X

Note. dots = visual–spatial dot-completion control task.
children to either the training (N = 126) or the control (N = 127) condition. Blocking in pairs was, per school, based on children's scores on the Raven's Standard Progressive Matrices (Raven, Raven, & Court, 1998), collected prior to the pretest session. Per pair, children were randomly assigned to a condition and, then, were individually tested during four sessions. Children who were assigned to the training condition received a pretest, two training sessions, and a posttest. Control group children received the same pretest and posttest but spent an equal amount of time on visual–spatial dot-completion tasks, instead of receiving training sessions. Each session lasted approximately 30 min. Sessions took place weekly.

2.3 | Materials

2.3.1 | Raven's Standard Progressive Matrices

To assess the children's level of inductive reasoning ability before testing, Raven's Standard Progressive Matrices was used (Raven et al., 1998). The test consists of 60 items, progressing in difficulty. It requires the children to detect which piece is missing and choose the correct answer out of six to eight options based on the characteristics and relationships in the item. The Raven test has an internal consistency coefficient of α = 0.83 and a split-half coefficient of r = 0.91.

2.3.2 | Scholastic achievement

The scores of the Dutch standardized, norm-referenced tests of scholastic achievement (Cito Math, Janssen, Hop, & Wouda, 2015, and Cito Reading Comprehension, Jolink, Tomesen, Hilte, Weekers, & Engelen, 2015) were provided by the participating schools. These tests have been developed with the purpose of monitoring children's progress on the school subjects. Children's achievement on the test is scored on a scale that ranges from "A" to "E," with "A" scores representing the highest (25%) performance and "D" (15%) and "E" representing the lowest (10%), compared with the average performance of Dutch children of the same age (Janssen et al., 2015; Jolink et al., 2015; Keuning et al., 2015). For two children, a Cito Math score was not available; for 63 children, a Cito Reading Comprehension score was not provided because their schools did not administer this test. The reliability for mathematics (M4 [Grade 2]), defined in terms of measurement accuracy, is MAcc = 0.93 (Janssen et al., 2015). For reading comprehension (M4 [Grade 2]), the reliability in terms of measurement accuracy is MAcc = 0.86 (Jolink et al., 2015).

2.3.3 | TagTiles console

A tangible user interface (TUI), TagTiles (Serious Toys, 2011), was utilized for administering the dynamic test. The console consisted of an electronic grid with 12 × 12 fields, which included sensors to detect activity on its surface. The console was equipped with multicolour LEDs, providing visual feedback, and audio playback, used for instructions and prompts during the pretest and posttest and the training.

TUIs were developed to use the functionality of computer systems in monitoring behaviour and providing automated responses, without being restricted to the regular computer interface such as a mouse and keyboard (Verhaegh, Resing, Jacobs, & Fontijn, 2009). These physical objects allow for natural manipulation and have electronic sensors built in to use some of the functionality of computers (Ullmer & Ishii, 2000). TUIs allow for monitoring the task solving process through the physical manipulations of the solver (Verhaegh, Fontijn, et al., 2007). They are easier for children to use, because the physical tangibles do not require any interpretation or representation as PC interfaces do (Verhaegh et al., 2009), thereby allowing for more accurate measurement for assessment purposes (Verhaegh, Fontijn, Aarts, & Resing, 2013; Verhaegh, Fontijn, & Resing, 2013). The console enabled children to work independently (Verhaegh, Hoonhout, & Fontijn, 2007), because it was programmed not only to provide standardized instruction and assistance as a response to the child's actions (Verhaegh, Fontijn, Aarts, Boer, & van de Wouw, 2011), but also to record children's task solving processes step by step (Henning, Verhaegh, & Resing, 2010).

2.3.4 | Dynamic test of schematic picture series completion

To assess children's task solving process, a dynamic test version of a pictorial (puppets) series completion task was used (Resing & Elliott, 2011; Resing, Touw, Veerbeek, & Elliott, 2017; Resing, Tunteler, & Elliott, 2015; Resing et al., 2012). The puppet task has been designed as a schematic picture series completion task with a constructed response answering format. Each series consists of six puppet figures, and the child has to provide the seventh (Figure 1). To solve the task, the child has to detect the changes in the series, by looking for transformations in the task characteristics and the periodicity of the transformations. From this, the rule(s) underlying these changes has (have) to be induced before the task can be solved (Resing & Elliott, 2011). The child has to solve each series on the console, using coloured blocks with RFID tags. Each puppet consists of seven body pieces,
FIGURE 1 Example item of the puppet series completion task [Colour figure can be viewed at wileyonlinelibrary.com]
differing in colour (yellow, blue, green, and pink), pattern (plain, stripes, and dots), and head (male and female). The task has varying levels of difficulty, with gradually more changes in the periodicity and number of transformations. The items were presented in a booklet, which displayed one item per page.

2.3.5 | Pretest and posttest

The pretest and posttest both consist of 12 items and are equivalently constructed. Each item on the pretest has a parallel item on the posttest with the same transformations and periodicity (but, e.g., different colours, patterns, or heads). Both the pretest and the posttest sessions started with an example item presented and instructed by the console. The two training sessions consisted of six items each. Scoring was based on the accuracy of solving the items on the test. The score consisted of the number of correctly solved items on the test, which could range between 0 and 12. The overall Pearson correlation between pretest and posttest was r = 0.54, p < 0.001, which, as expected, was slightly higher for the control condition (r = 0.59, p < 0.001) than for the training condition (r = 0.51, p < 0.001).

answering patterns. A human test leader was present to escort the children from and to the classroom. During testing, the test leader recorded the placement of pieces and verbalizations given by the child, providing a backup in case the electronic console would malfunction.

2.4 | Scoring

The variables recorded in the log files included the time of placement for each piece and the identity and placement location of each piece placed on the console surface. In addition, for each item, the log files contained the number of correctly placed pieces, completion time, and whether or not the answer that was provided was accurate. The log files were cleared of irrelevant data, such as accidental movement of pieces, or motoric difficulty in the correct placement of the pieces. The relevant data were then imported into SPSS for further analysis. In case of a computer malfunction, data were retrieved from the manually scored hardcopies. Additionally, the manually scored hardcopies included a written record of children's explanations of their solutions. These explanations were also recorded on audio, for which explicit consent was given by the children's parents.
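To make this concrete, the sketch below shows one way per-item records of this kind could be aggregated before analysis. It is an illustration only, not the authors' pipeline (the cleaned log files in the study were imported into SPSS), and all file and column names are hypothetical.

```python
# Illustrative sketch only (not the authors' pipeline); all column names
# are hypothetical stand-ins for the console log fields described above.
import pandas as pd

def score_items(events: pd.DataFrame) -> pd.DataFrame:
    """Aggregate placement events into one row per child and item.

    Assumed (hypothetical) columns: child_id, item, piece_id, field,
    timestamp_ms, correct_field (True if the piece ended on the right field).
    """
    events = events.sort_values("timestamp_ms")
    # Keep only the final placement of each piece; earlier placements are
    # treated here as accidental movements or motoric corrections.
    final = events.groupby(["child_id", "item", "piece_id"], as_index=False).last()

    def summarise(item_events: pd.DataFrame) -> pd.Series:
        return pd.Series({
            # Number of correctly placed pieces (a puppet has seven pieces).
            "pieces_correct": int(item_events["correct_field"].sum()),
            # An item counts as accurate when all seven pieces are correct.
            "accurate": bool(item_events["correct_field"].all()
                             and len(item_events) == 7),
            # Time between the first and the last retained placement.
            "completion_time_ms": int(item_events["timestamp_ms"].max()
                                      - item_events["timestamp_ms"].min()),
        })

    return final.groupby(["child_id", "item"]).apply(summarise).reset_index()
```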
FIGURE 2 Prompts provided by the electronic console during the training procedure of the dynamic test
the GAP was defined as full analytical, if all of the expected groups in that item were placed; partial analytical, if between 50% and 99% of the expected groups for the item were placed; and nonanalytical, if 50% or less of the expected groups for the item were placed. Children were allocated to a strategy class based on the frequency of GAP scores over all test items. If a single strategy category was used on more than 33% of the items, the child was allocated to the corresponding strategy class. Mixed strategy classes were used if children used two types of GAP in more than 33% of the cases. More information on the categories and classes, and the criteria applied for them, can be found in Appendix B.

when children were required to click on the bottom right corner of the console. From the completion times, average completion times were calculated over the full test. For some children (N = 18), for whom the completion times for one or two items were missing, average time scores were calculated with the remaining items. If the completion times of more than two items were missing, the children (one at pretest, three at posttest) were excluded from the time analyses (N = 4).
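The category and class rules above, and the handling of missing completion times, are simple enough to express as a rule-based scoring routine. The sketch below is our illustration of those rules (the 50% cut-offs for the GAP categories, the 33% criterion for class allocation, and the exclusion rule for completion times); the function names and data shapes are not taken from the study.

```python
# Minimal sketch of the rule-based scoring described above; the thresholds
# follow the text, all names and data shapes are illustrative.
from statistics import mean
from typing import List, Optional

def gap_category(groups_placed: int, groups_expected: int) -> str:
    """Classify one item's grouping of answer pieces (GAP)."""
    share = groups_placed / groups_expected
    if share == 1.0:
        return "full analytical"        # all expected groups placed
    if share > 0.5:
        return "partial analytical"     # between 50% and 99% of the groups
    return "nonanalytical"              # 50% or less of the groups

def gap_class(item_categories: List[str]) -> str:
    """Allocate a child to a strategy class over all test items."""
    n = len(item_categories)
    frequent = [c for c in ("nonanalytical", "partial analytical", "full analytical")
                if item_categories.count(c) / n > 1 / 3]
    if len(frequent) == 1:
        return frequent[0]
    if len(frequent) == 2:
        return "mixed: " + " and ".join(frequent)
    return "unclassified"               # pattern not covered by the quoted rules

def mean_completion_time(times: List[Optional[float]]) -> Optional[float]:
    """Average item completion times, tolerating up to two missing items."""
    observed = [t for t in times if t is not None]
    if len(times) - len(observed) > 2:
        return None                     # child excluded from the time analyses
    return mean(observed)
```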
3 | RESULTS

Overview of the hypotheses and results:

1a. Higher accuracy in series completion solving as a result of graduated prompts training.
    Result: Significant effects found for session, condition, and Session × Condition; children who received training made more progress from pretest to posttest.
1b. More grouping activities in series completion solving as a result of graduated prompts training.
    Result: Significant effect found for session, but not for condition or Session × Condition; all children progressed from pretest to posttest.
1c. More sophisticated verbalized strategy use in series completion solving as a result of graduated prompts training.
    Result: Significant effects found for session, condition, and Session × Condition; sharper increase in the use of full inductive verbal strategies for trained children.
1d. Decreased time spent on task.
    Result: Significant effect found for session, but not for condition or Session × Condition; completion times became shorter from pretest to posttest for all children.
2a. More sophisticated categories of grouping behaviour used as a result of training.
    Result: Significant relationship found between condition and the use of GAP on the posttest; trained children made more use of more advanced grouping behaviour on the posttest.
2b. More advanced verbalized strategy categories used as a result of training.
    Result: Significant relationship found between condition and verbalized strategy class on the posttest; trained children made more use of more advanced verbal strategies on the posttest.
3a. Process measures related to accuracy on the series completion task and academic performance.
    Result: On the pretest, all process measures were related to accuracy on the series completion task; on the posttest, there were different patterns of correlations between conditions.
3b. Process measures provide explanatory value for academic performance on mathematics.
    Result: On the pretest, GAP and accuracy were significant predictors of math performance; on the posttest, process measures did not add to the prediction.
3c. Process measures provide explanatory value for academic performance on reading comprehension.
    Result: On the pretest, process measures did not add to the prediction of reading comprehension, although accuracy was a significant predictor; on the posttest, accuracy, number of prompts, and completion time were all predictors of reading comprehension scores.
3d. Dynamic test measures provide superior prediction over static measures regarding school performance.
    Result: For math, posttest scores provided more explained variance than pretest scores; for reading comprehension, number of prompts provided more explained variance than pretest accuracy, but posttest accuracy did not.
TABLE 3 Means and standard deviations for accuracy, GAP categories, verbal strategy categories, and completion time
condition (training/control) as between-subjects factor. Multivariate effects were found for session, Wilks' λ = 0.619, F(2, 250) = 76.87, p < 0.001, η² = 0.38, but not for condition, Wilks' λ = 0.994, F(2, 250) = 0.791, p = 0.455, η² = 0.01, or Session × Condition, Wilks' λ = 0.991, F(2, 250) = 1.155, p = 0.317, η² = 0.01. Univariate analyses (see Table 4 and Figure 4) per GAP category revealed a significant main effect for session for nonanalytical, partial analytical, and full analytical GAP. These results showed that the use of GAP changed from pretest to posttest. Children used nonanalytical GAP less frequently and partial and full analytical GAP more frequently. However, the graduated prompts training did not result in a faster progression towards more advanced GAP than repeated practice did.

Third, we expected that training would lead to more sophisticated verbalized strategy use. A multivariate repeated measures ANOVA was conducted with session (pretest/posttest) as within-subjects factor, condition (dynamic testing/control) as between-subjects factor, and the number of verbal explanations per strategy category (noninductive, partial inductive, and full inductive) as dependent variables.
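Analyses of this session (within-subjects) by condition (between-subjects) type can be reproduced with standard statistical software. As one possible, hypothetical implementation (the reported analyses were run in SPSS), the sketch below uses the pingouin package on a long-format data frame; the file and column names are made up.

```python
# One possible way to run a session x condition mixed ANOVA on long-format
# data (illustrative only; the reported analyses were carried out in SPSS).
import pandas as pd
import pingouin as pg

# Hypothetical long format: one row per child and session, with columns
# child_id, condition ("training"/"control"), session ("pretest"/"posttest"),
# and full_inductive (count of full inductive verbal explanations).
scores = pd.read_csv("verbal_strategies_long.csv")

aov = pg.mixed_anova(data=scores, dv="full_inductive",
                     within="session", subject="child_id",
                     between="condition")
print(aov[["Source", "F", "p-unc", "np2"]])
```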
TABLE 4 Results of the repeated measures ANOVAs for Accuracy (N = 253), GAP categories (N = 253), Verbal strategy categories (N = 253), and Completion Time (N = 249)
FIGURE 4 Mean pretest and posttest scores and standard deviations for accuracy, completion time, grouping of answer pieces, and verbalized strategies
Multivariate effects were found for session, Wilks' λ = 0.799, F(3, 249) = 20.89, p < 0.001, η² = 0.20; condition, Wilks' λ = 0.965, F(3, 249) = 2.99, p = 0.031, η² = 0.04; and Session × Condition, Wilks' λ = 0.934, F(3, 249) = 5.83, p = 0.001, η² = 0.07. Univariate analyses (see Table 4 and Figure 4) revealed significant main effects for session for the noninductive and the full inductive strategy category, but not for the partial inductive strategy category. A significant effect for condition was found for the full inductive strategy category, but not for the noninductive and partial inductive strategy categories. Similarly, a significant interaction effect was found for Session × Condition for the full inductive strategy category, but not for the noninductive or the partial inductive strategy category. From pretest to posttest, there was a reduction in the use of noninductive verbal strategies and an increase in the use of full inductive verbal strategies. More importantly, the trained children showed a sharper increase in the use of full inductive verbal strategies from pretest to posttest than did children in the control condition.

Finally, a repeated measures ANOVA with session (pretest/posttest) as within-subjects factor, condition (training/control) as between-subjects factor, and completion time as dependent variable revealed a significant main effect for session, but not for condition or Session × Condition. Children's completion times became shorter from pretest to posttest, but the training did not lead to a significant difference compared with repeated practice.

analyses (chi-square tests) were employed to evaluate how children's behaviour and verbal solving processes changed over time (Table 5). We analysed the predicted shifts in GAP by analysing the relationship between condition (training/control) and GAP class: (1) nonanalytical; (2) mixed 1 and 3; (3) partial analytical; (4) mixed 3 and 5; and (5) full analytical. These classes have been described in Appendix B. On the pretest, no significant relationship was found between condition and the use of GAP, χ²pretest(n = 253) = 6.39, p = 0.172 (40% of the cells had an expected count less than 5). On the posttest, a significant relationship was found between condition and the use of GAP, χ²posttest(n = 253) = 8.28, p = 0.041 (25% of the cells had an expected count less than 5). As we expected, trained children made more use of more advanced grouping behaviour on the posttest than children who had not received training.

Using comparable analyses, we examined the shifts in children's verbal strategy classes, (1) noninductive; (2) mixed 1 and 3; (3) partial inductive; (4) mixed 3 and 5; and (5) full inductive, in relation to the condition (training/control). The pretest data showed, as expected, no significant effect for condition on the verbalized strategy class, χ²pretest(n = 252) = 4.49, p = 0.344 (40% of the cells had an expected count less than 5). However, on the posttest, a significant effect for condition was revealed, χ²posttest(n = 253) = 14.58, p = 0.006 (0% of the cells had an expected count less than 5). In line with our hypothesis, trained children made more use of more advanced verbal strategies than those who did not receive training.
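The crosstab comparisons reported above amount to chi-square tests of independence between condition and class membership. The fragment below illustrates that computation with scipy on a hypothetical per-child table; it is not the authors' code, and the usual caveat about small expected cell counts (noted above for several classes) applies.

```python
# Illustrative chi-square test of condition against posttest GAP class,
# mirroring the crosstab analyses reported above (hypothetical input file).
import pandas as pd
from scipy.stats import chi2_contingency

children = pd.read_csv("gap_classes.csv")   # columns: condition, gap_class_posttest

table = pd.crosstab(children["condition"], children["gap_class_posttest"])
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.3f}")
# Cells with expected counts below 5 (as reported for some classes above)
# make the chi-square approximation less reliable.
```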
TABLE 5 Results for the crosstabs analyses for grouping of pieces and verbalized strategies

Grouping of pieces
Condition  Session   Measure      1. Nonanalytical  2. Mixed 1 and 3  3. Partial analytical  4. Mixed 3 and 5  5. Full analytical  Missing  Total
Training   Pretest   Frequency    32                2                 40                     1                 51                           126
Training   Pretest   Percentage   25.4              1.6               31.7                   0.8               40.5                         100
Training   Posttest  Frequency    6                 0                 16                     2                 102                          126
Training   Posttest  Percentage   4.8               0.0               12.7                   1.6               81.0                         100
Control    Pretest   Frequency    46                1                 25                     1                 54                           127
Control    Pretest   Percentage   36.2              0.8               19.7                   0.8               42.5                         100
Control    Posttest  Frequency    18                0                 9                      3                 97                           127
Control    Posttest  Percentage   14.2              0.0               7.1                    2.4               76.4                         100

Verbal explanation
Condition  Session   Measure      1. Noninductive   2. Mixed 1 and 3  3. Partial inductive   4. Mixed 3 and 5  5. Full inductive   Missing  Total
Training   Pretest   Frequency    54                10                56                     4                 1                   1        126
Training   Pretest   Percentage   43.2              8.0               44.8                   3.2               0.8                          100
Training   Posttest  Frequency    40                7                 51                     10                18                           126
Training   Posttest  Percentage   31.7              5.6               40.5                   7.9               14.3                         100
Control    Pretest   Frequency    57                18                50                     1                 1                            127
Control    Pretest   Percentage   44.9              14.2              39.4                   0.8               0.8                          100
Control    Posttest  Frequency    49                15                51                     9                 3                            127
Control    Posttest  Percentage   38.6              11.8              40.2                   7.1               2.4                          100
scores on mathematics and reading comprehension. To answer the question whether dynamic measures would provide more predictive value than static (pretest) measures, multiple linear regression analyses were carried out. Math and reading comprehension achievement scores were included as the respective dependent variables, and accuracy scores, GAP scores, verbalization class, completion times, and number of prompts as predictor variables, for pretest and posttest, respectively. Table 6 shows the correlation structure of all variables involved in the various regression analyses.

Hierarchical regression analyses were run on the data of children in the training condition. The results are displayed in Table 7. A first hierarchical regression analysis was conducted with math achievement score as the dependent variable and the GAP pretest score as the independent variable. This analysis led to a significant model, which explained 4.4% of variance in math. In a second model, the pretest GAP, verbalization, and completion time were entered as predictors. This model was significant but did not provide a significant improvement upon the first model. Pretest GAP was the only significant predictor in this model. A third model, in which the pretest accuracy score was added as predictor, led to a significantly better explanation of the variance in math achievement, with an explained variance in math of 9.6%. Accuracy on the pretest of the series completion test and pretest GAP were the only significant predictors in this third model.

A second hierarchical regression was run to analyse the predictive value of the posttest scores regarding the math achievement scores. The results are shown in Table 8. Model 1, with the posttest GAP as predictor, did not show significance. Adding the posttest verbalization and completion time scores as predictors did not lead to a significant model. In a third model, posttest accuracy was added as a predictor, which led to a significant model that explained 12.7% of variance in math scores. In this model, posttest accuracy was the only significant predictor. An additional model was used, in which the number of prompts provided during training was included as a predictor instead of posttest accuracy. This model significantly explained 12.8% of the variance in math scores. The number of prompts provided during the training condition was the only significant predictor in this model. In line with our expectations, dynamic (posttest) measures provided more explained variance in math scores (12.7% and 12.8%, respectively) than static (pretest) measures (9.6%).
TABLE 6 Correlations for process and outcome measures on the puppet task and mathematics and reading comprehension

                 Pretest (N = 253)               Posttest (N = 253)
                                                 Dynamic testing (n = 127)       Control (n = 126)
                 Accuracy   Math     Reading     Accuracy   Math     Reading     Accuracy   Math     Reading
Accuracy                    0.28**   0.36**                 0.37**   0.31**                 0.26**   0.31**
GAP              0.31**     0.20**   0.21**      0.07       −0.06    −0.10       0.35**     0.07     0.16
Verbalization    0.45**     0.11     0.22**      0.37**     0.22*    0.15        0.41**     0.10     0.14
Time             0.22**     −0.03    0.02        0.30*      0.06     −0.11       0.07       −0.11    0.07
Prompts                                          −0.72**    −0.37**  −0.35**
TABLE 7 Regression analyses for the prediction of school results for the dynamic testing group on the pretest
TABLE 8 Regression analyses for the prediction of school results for the dynamic testing group on the posttest

Math (n = 124)
                   Model 1                  Model 2                    Model 3                     Model 4
                   (F = 0.397, R2 = 0.00)   (F = 2.20, R2 = 0.05;      (F = 5.46**, R2 = 0.16;     (F = 5.53**, R2 = 0.16;
                                            FΔ = 3.09*, R2Δ = 0.05)    FΔ = 14.53**, R2Δ = 0.10)   FΔ = 14.78**, R2Δ = 0.11)
                   B        SE     β        B         SE     β         B         SE     β          B         SE     β
Constant           4.45     1.04            3.94      1.06             3.48      1.02              5.74      1.11
GAP                −0.82    1.30   −0.06    −1.08     1.30   −0.08     −1.23     1.23   −0.09      −1.58     1.23   −0.11
Verbalization                               0.20      0.08   0.21*     0.09      0.08   0.10       0.05      0.09   0.05
Completion time                             2.79E−6   0.00   0.04      −3.77E−6  0.00   −0.05      −2.08E−6  0.00   −0.03
Accuracy                                                               0.18      0.05   0.36**
No. of prompts                                                                                     −0.05     0.01   −0.38**

Reading comprehension (n = 94)
                   Model 1                  Model 2                    Model 3                     Model 4
                   (F = 1.00, R2 = 0.01)    (F = 1.68, R2 = 0.05;      (F = 4.22**, R2 = 0.16;     (F = 4.87**, R2 = 0.18;
                                            FΔ = 2.01, R2Δ = 0.04)     FΔ = 11.27**, R2Δ = 0.11)   FΔ = 13.72**, R2Δ = 0.13)
                   B        SE     β        B         SE     β         B         SE     β          B         SE     β
Constant           4.73     1.22            4.92      1.26             4.39      1.21              6.72      1.28
GAP                −1.56    1.56   −0.10    −1.57     1.55   −0.11     −1.65     1.47   −0.11      −1.95     1.46   −0.13
Verbalization                               0.17      0.10   0.18      0.07      0.10   0.07       0.01      0.10   0.01
Completion time                             −1.02E−5  0.00   −0.13     −1.69E−5  0.00   −0.22*     −1.52E−5  0.00   −0.20*
Accuracy                                                               0.18      0.05   0.36**
No. of prompts                                                                                     −0.06     0.02   −0.41**

Note. The FΔ and R2Δ of Model 4 are based on the change from Model 2.
*p < 0.05. **p < 0.01.
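The hierarchical analyses in Tables 7 and 8 enter predictors in blocks and judge each block by the change in explained variance. A minimal sketch of that logic, using ordinary least squares in statsmodels with hypothetical file and column names, is given below; it is not the authors' code (note also that, per the table note, Model 4 is compared against Model 2 rather than Model 3).

```python
# Sketch of hierarchical (blockwise) regression: fit nested models and compare
# explained variance, as in Tables 7 and 8. File and column names hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.read_csv("posttest_predictors.csv")
# assumed columns: math, gap, verbalization, time, accuracy, prompts

models = {
    "Model 1": "math ~ gap",
    "Model 2": "math ~ gap + verbalization + time",
    "Model 3": "math ~ gap + verbalization + time + accuracy",
    "Model 4": "math ~ gap + verbalization + time + prompts",  # prompts instead of accuracy
}

fits = {name: smf.ols(formula, data=data).fit() for name, formula in models.items()}
for name, fit in fits.items():
    print(f"{name}: R2 = {fit.rsquared:.3f}, F = {fit.fvalue:.2f}, p = {fit.f_pvalue:.3f}")

# The gain of a larger nested model can be tested with an incremental F test,
# for example Model 3 against Model 2:
print(fits["Model 3"].compare_f_test(fits["Model 2"]))
```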
These differential effects for the process measures can be understood in the light of core differences in children's solving processes on the series completion task. On the one hand, verbalizations can be seen as rather task-specific processing, as they are descriptions of the rules underlying the series completion items, representing specific strategies for series completion problem solving. The graduated prompts method most likely provided the children, if necessary, with detailed task knowledge, which would mean that the more general problem solving structures that are used to solve unfamiliar problems would become less relevant. This notion was supported by the patterns of relations between task success and process measures for the trained children, versus those who had received repeated practice only and children's untrained performance on the pretest. This would be in line with the model proposed by Weisberg (2015), which states that, when solving a problem, the first stage is to search for any available knowledge that could be used for solving the problem. The graduated prompts procedure provided specific knowledge and methods for solving the series completion task. This knowledge was likely not previously available to the children on the pretest, nor did they acquire it through repeated practice. As a result, untrained performance was dependent on the second and third stages of the model, being domain-general methods and the restructuring of the problem, respectively (Weisberg, 2015). Grouping behaviour, on the other hand, was thought to be a general measure of how children are able to restructure the problem representation, by dividing the task into smaller subproblems, a form of means-ends analysis (Newell & Simon, 1972; Pretz et al., 2003; Robertson, 2001; Weisberg, 2015). Our data show that most children already used an elementary form of grouping behaviour at the pretest and progressed in doing so when tested twice. This would also explain why GAP, as a measure for restructuring of the problem representation, was no longer related to performance after training. Robertson (2001) distinguished between strong and weak methods of problem solving. Strong methods were described as learned scripts that provide a reasonable certainty of solving the problem correctly. In contrast, weak methods would be methods for the solver to use when no clear method of solving is available. These do not guarantee a correct solution (Newell & Simon, 1972; Robertson, 2001). The graduated prompts training will likely have provided children with strong methods, rendering the use of these weak methods less important to attain a correct solution to the task.

The process measures were weakly to moderately related to accuracy in solving the series completion task. In line with expectations previously voiced in the literature (e.g., Elliott, 2000; Greiff et al., 2013; Zoanetti & Griffin, 2017), the process measures used in this study provided explanatory information on task performance. The rule-based log file analysis was instrumental in uncovering process information, particularly in relation to the restructuring of the problem representation, through the analysis of the GAP. The predictive value of GAP extended beyond the series completion task performance, to school performance on mathematics and reading comprehension. This supports the notion that process measures, such as GAP, could provide us with more understanding of reasons for not correctly solving the tasks and subsequently might provide information for intervention (Elliott, 2000; Greiff et al., 2013; Yang, Buckendahl, Juszkiewicz, & Bhola, 2002; Zoanetti & Griffin, 2017). The meaning of the process information, however, seems to differ for each type of process measure. For the grouping behaviour, it was found that after training and repeated practice with the task, the majority of children progressed towards the most advanced grouping category. This might indicate that low grouping scores could be interpreted as a warning signal. For the verbalizations, on the other hand, even after training, a substantial number of children still provided verbalizations that were classified in the lowest category, because a large group of children were not able to explain how the series should be solved. Only very few children were able to consistently provide complete explanations and could be identified as the top performers. With regard to completion time, more time spent on the task was associated with better performance. Fast performance would be an indicator that children do not take enough time to acquire information and to control and monitor their actions (Scherer, Greiff, & Hautamäki, 2015).

Previous research has shown superior predictive qualities of dynamic testing for school performance compared with static testing (Caffrey et al., 2008; Elliott et al., 2018), and our findings seem mostly in line with this trend. The dynamic (trained posttest) performance showed a higher predictive relationship for mathematics than did the static (pretest) task performance, as it did in previous research (e.g., Stevenson, Bergwerff, Heiser, & Resing, 2014). For the prediction of reading comprehension, the amount of help provided during training provided more prediction than static test measures, but trained (posttest) performance did not. Furthermore, on the dynamic test, completion time was the only process measure that was related to reading comprehension. Surprisingly, here, faster performance was predictive of better reading comprehension scores. The potential relation between completion time and reading comprehension has, however, been analysed in a linear way. In future, a curvilinear analysis method, as has been reported in other domains (e.g., Greiff, Niepel, Scherer, & Martin, 2016), might further confirm or disconfirm this perceived change. The other process measures no longer contributed to the prediction of school performance beyond the prediction offered by accuracy. For both math and reading comprehension, the number of prompts children needed during training provided more predictive value than outcome scores.

Of course, this study had some limitations. The use of a constructed response answering format enabled the measurement of process indicators, as well as analysis of children's actions through rule-based log file analysis, in a manner that would not have been possible in a multiple-choice answering format. This poses a limitation to the applicability of the GAP measure and may prove to be an issue when applying this measure to a more diverse set of tests. We nevertheless would like to encourage future test makers to make use of constructed response answering formats, as they seem to provide useful information that cannot be obtained from traditional multiple-choice tests (Kuo, Chen, Yang, & Mok, 2016; Stevenson et al., 2016; Yang et al., 2002).

It should be taken into account that the current findings were obtained using a series completion task and therefore cannot readily be generalized to other domains. Similarly, this research was conducted with a single, specific age group, for which inductive reasoning ability is still in full development. Using other age groups in future research could provide us with information on which processes transcend these age limits.
In evaluating the processes involved in solving the series completion tasks, this research used only three separate process measures, which all appeared to measure different aspects of the series completion solving process. Despite using metacognitive prompts during training, this study did not include any measures for level of metacognitive functioning. Future research might identify other factors involved in series completion performance and the training of series completion solving ability. These would include not only cognitive factors such as strategy use and knowledge but also factors such as metacognitive skills and emotional and motivational factors. Also, as the task solving process has been shown to interact with item characteristics such as item difficulty (Dodonova & Dodonov, 2013; Goldhammer et al., 2014; Tenison et al., 2014), future research should take these item characteristics into account, to gain more detailed insights into the factors that are at play in successfully solving series completion tasks.

Additionally, although this research revealed some indications that process measurement can provide information on both reasons for failure and possible interventions, no clear framework yet exists to interpret these process measures or connect them to practical and evidence-based interventions. Future research could provide guidelines regarding process data to inform practitioners on the usability of process measures in assessment and intervention. Previous research (e.g., Greiff et al., 2016) concluded that completion time and complex problem solving showed a curvilinear instead of a linear relationship. Future research could focus on non-linear relationships between process measures and performance to provide more detailed information.

In conclusion, this research revealed some information concerning the potential value of process-oriented dynamic testing in predicting school results and the value of process measures for indicating the underlying causes of success or failure on the dynamic series completion task. Dynamic measures could be utilized to provide increased predictive value for school performance. Through the use of a constructed response answering format, rule-based log file analysis could successfully be applied to provide measures for the restructuring of the problem representation in children. This measure of children's grouping behaviour in solving a series completion task provided predictive value both for performance on the series completion task itself and for mathematics performance in school.

Training was found to result in changes in the processes involved in solving the series completion task. Instead of using domain-general methods of solving the tasks, children appeared to make more use of different, learned scripts after graduated prompts training. The various processes involved in solving series completion tasks played different roles in task success and were influenced differently by training. These factors should all be taken into account when interpreting children's processes in solving tasks and may need different interventions to remediate. Indeed, the picture that arises from the different processes involved in solving these problems appears to become more complex as we learn more about them, rendering the possibilities for measurement offered by the use of computers more and more necessary in interpreting these measurements.

CONFLICT OF INTEREST

There is no conflict of interest.

ORCID

Bart Vogelaar http://orcid.org/0000-0002-5131-2480
Wilma C. M. Resing http://orcid.org/0000-0003-3864-4517

REFERENCES

Alibali, M. W., Phillips, K. M. O., & Fischer, A. D. (2009). Learning new problem-solving strategies leads to changes in problem representation. Cognitive Development, 24(2), 89–101. https://doi.org/10.1016/j.cogdev.2008.12.005
Caffrey, E., Fuchs, D., & Fuchs, L. S. (2008). The predictive validity of dynamic assessment: A review. The Journal of Special Education, 41(4), 254–270. https://doi.org/10.1177/0022466907310366
Campione, J. C. (1989). Assisted assessment: A taxonomy of approaches and an outline of strengths and weaknesses. Journal of Learning Disabilities, 22(3), 151–165.
Campione, J. C., Brown, A. L., Ferrara, R. A., Jones, R. S., & Steinberg, E. (1985). Breakdowns in flexible use of information: Intelligence-related differences in transfer following equivalent learning performance. Intelligence, 9(4), 297–315. https://doi.org/10.1016/0160-2896(85)90017-0
Dodonova, Y. A., & Dodonov, Y. S. (2013). Faster on easy items, more accurate on difficult ones: Cognitive ability and performance on a task of varying difficulty. Intelligence, 41(1), 1–10. https://doi.org/10.1016/j.intell.2012.10.003
Elliott, J. G. (2000). The psychological assessment of children with learning difficulties. British Journal of Special Education, 27(2), 59–66. https://doi.org/10.1111/1467-8527.00161
Elliott, J. G., Grigorenko, E. L., & Resing, W. C. M. (2010). Dynamic assessment. In P. Peterson, E. Baker, & B. McGaw (Eds.), International Encyclopedia of Education (Vol. 3, pp. 220–225). Oxford: Elsevier.
Elliott, J. G., Resing, W. C. M., & Beckmann, J. F. (2018). Dynamic assessment: A case of unfulfilled potential? Educational Review, 70(1), 7–17. https://doi.org/10.1080/00131911.2018.1396806
Ericsson, K. A., & Simon, H. A. (1980). Verbal reports as data. Psychological Review, 87(3), 215–251. https://doi.org/10.1037/h0021465
Fabio, R. A. (2005). Dynamic assessment of intelligence is a better reply to adaptive behavior and cognitive plasticity. The Journal of General Psychology, 132(1), 41–66. https://doi.org/10.3200/GENP.132.1.41-66
Ferrara, R. A., Brown, A. L., & Campione, J. C. (1986). Children's learning and transfer of inductive reasoning rules: Studies of proximal development. Child Development, 57(5), 1087–1099. https://doi.org/10.2307/1130433
Fiorello, C. A., Hale, J. B., Holdnack, J. A., Kavanagh, J. A., Terrell, J., & Long, L. (2007). Interpreting intelligence test results for children with disabilities: Is global intelligence relevant? Applied Neuropsychology, 14(1), 2–12. https://doi.org/10.1080/09084280701280379
Goldhammer, F., Naumann, J., Stelter, A., Tóth, K., Rölke, H., & Klieme, E. (2014). The time on task effect in reading and problem solving is moderated by task difficulty and skill: Insights from a computer-based large-scale assessment. Journal of Educational Psychology, 106(3), 608–626. https://doi.org/10.1037/a0034716
Greiff, S., Niepel, C., Scherer, R., & Martin, R. (2016). Understanding students' performance in a computer-based assessment of complex problem solving: An analysis of behavioral data from computer-generated log files. Computers in Human Behavior, 61, 36–46. https://doi.org/10.1016/j.chb.2016.02.095
Greiff, S., Wüstenberg, S., & Avvisati, F. (2015). Computer-generated log-file analyses as a window into students' minds? A showcase study based on the PISA 2012 assessment of problem solving. Computers and Education, 91, 92–105. https://doi.org/10.1016/j.compedu.2015.10.018
Greiff, S., Wüstenberg, S., Molnár, G., Fischer, A., Funke, J., & Csapó, B. (2013). Complex problem solving in educational contexts – Something beyond g: Concept, assessment, measurement invariance, and construct validity. Journal of Educational Psychology, 105(2), 364–379. https://doi.org/10.1037/a0031856
Grigorenko, E. L., & Sternberg, R. J. (1998). Dynamic testing. Psychological Bulletin, 124(1), 75–111.
Haywood, C. H., & Lidz, C. S. (2007). Dynamic assessment in practice: Clinical and educational applications. New York: Cambridge University Press.
Henning, J. R., Verhaegh, J., & Resing, W. C. M. (2010). Creating an individualised learning situation using scaffolding in a tangible electronic series completion task. Educational & Child Psychology, 28(2), 85–100.
Hunt, E. (1980). Intelligence as an information-processing concept. British Journal of Psychology, 71, 449–474.
Janssen, J., Hop, M., & Wouda, J. (2015). Wetenschappelijke verantwoording Rekenen-Wiskunde 3.0 voor groep 4 [Scientific report of Arithmetic-Mathematics 3.0 for grade 2]. Arnhem, The Netherlands: Cito BV.
Jolink, A., Tomesen, M., Hilte, M., Weekers, A., & Engelen, R. (2015). Wetenschappelijke verantwoording Begrijpend lezen 3.0 voor groep 4 [Scientific report of Reading Comprehension 3.0 for grade 2]. Arnhem, The Netherlands: Cito BV.
Keuning, J., Boxtel, H. van, Lansink, N., Visser, J., Weekers, A., & Engelen, R. (2015). Actualiteit en kwaliteit van normen: Een werkwijze voor het normeren van een leerlingvolgsysteem [Actuality and quality of norms: Method of determining norms of a student tracking system]. Arnhem, The Netherlands: Cito BV.
Kirk, E. P., & Ashcraft, M. H. (2001). Telling stories: The perils and promise of using verbal reports to study math strategies. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27(1), 157–175. https://doi.org/10.1037/0278-7393.27.1.157
Klauer, K. J., & Phye, G. D. (2008). Inductive reasoning: A training approach. Review of Educational Research, 78(1), 85–123. https://doi.org/10.3102/0034654307313402
Kozulin, A. (2011). Learning potential and cognitive modifiability. Assessment in Education: Principles, Policy and Practice, 18(2), 169–181. https://doi.org/10.1080/0969594X.2010.526586
Kuo, B.-C., Chen, C.-H., Yang, C.-W., & Mok, M. M. C. (2016). Cognitive diagnostic models for tests with multiple-choice and constructed-response items. Educational Psychology, 36(6), 1115–1133. https://doi.org/10.1080/01443410.2016.1166176
Lidz, C. S. (2014). Leaning toward a consensus about dynamic assessment: Can we? Do we want to? Journal of Cognitive Education and Psychology, 13(3), 292–307.
Molnár, G., Greiff, S., & Csapó, B. (2013). Inductive reasoning, domain specific and complex problem solving: Relations and development. Thinking Skills and Creativity, 9, 35–45. https://doi.org/10.1016/j.tsc.2013.03.002
Neisser, U., Boodoo, G., Bouchard, T. J., Boykin, A. W., Ceci, S. J., Loehlin, J. C., & Sternberg, R. J. (1996). Intelligence: Knowns and unknowns. American Psychologist, 51(2), 77–101.
Newell, A., & Simon, H. A. (1972). Human problem solving. Englewood Cliffs, NJ: Prentice-Hall.
Perret, P. (2015). Children's inductive reasoning: Developmental and educational perspectives. Journal of Cognitive Education and Psychology, 14(3), 389–408.
Pretz, J. E., Naples, A. J., & Sternberg, R. J. (2003). Recognizing, defining, and representing problems. In J. E. Davidson, & R. J. Sternberg (Eds.), The psychology of problem solving (pp. 3–30). New York: Cambridge University Press.
Resing, W. C. M. (1993). Measuring inductive reasoning skills: The construction of a learning potential. In J. H. M. Hamers, K. Sijtsma, & A. J. J. M. Ruijssenaars (Eds.), Learning potential assessment: Theoretical, methodological and practical issues (pp. 219–241). Amsterdam/Berwyn, PA: Swets & Zeitlinger Inc.
Resing, W. C. M. (1998). Intelligence and learning potential: Theoretical and research issues. In J. Kingma, & W. Tomic (Eds.), Conceptual issues in research on intelligence. Vol. 5 of Advances in cognition and educational practice (pp. 227–259). Stamford, Connecticut: JAI Press.
Resing, W. C. M. (2000). Assessing the learning potential for inductive reasoning in young children. In C. S. Lidz, & J. G. Elliott (Eds.), Dynamic assessment: Prevailing models and applications (pp. 229–262). New York: Elsevier.
Resing, W. C. M. (2013). Dynamic testing and individualized instruction: Helpful in cognitive education? Journal of Cognitive Education and Psychology, 12(1), 81–95. https://doi.org/10.1891/1945-8959.12.1.81
Resing, W. C. M., & Elliott, J. G. (2011). Dynamic testing with tangible electronics: Measuring children's change in strategy use with a series completion task. The British Journal of Educational Psychology, 81, 579–605. https://doi.org/10.1348/2044-8279.002006
Resing, W. C. M., Xenidou-Dervou, I., Steijn, W. M. P., & Elliott, J. G. (2012). A "picture" of children's potential for learning: Looking into strategy changes and working memory by dynamic testing. Learning and Individual Differences, 22(1), 144–150. https://doi.org/10.1016/j.lindif.2011.11.002
Resing, W. C. M., Tunteler, E., & Elliott, J. G. (2015). The effect of dynamic testing with electronic prompts and scaffolds on children's inductive reasoning: A microgenetic study. Journal of Cognitive Education and Psychology, 14(2), 231–251.
Resing, W. C. M., Touw, K. W. J., Veerbeek, J., & Elliott, J. G. (2017). Progress in the inductive strategy use of children from different ethnic backgrounds: A study employing dynamic testing. Educational Psychology, 37(2), 173–191. https://doi.org/10.1080/01443410.2016.1164300
Richard, J.-F., & Zamani, M. (2003). A problem-solving model as a tool for analyzing adaptive behavior. In R. J. Sternberg, J. Lautrey, & T. I. Lubart (Eds.), Models of intelligence (pp. 213–226). Washington: American Psychological Association.
Richardson, K., & Norgate, S. H. (2015). Does IQ really predict job performance? Applied Developmental Science, 19(3), 153–169. https://doi.org/10.1080/10888691.2014.983635
Robertson, S. I. (2001). Problem solving. East Sussex, UK: Psychology Press Ltd.
Scherer, R., Greiff, S., & Hautamäki, J. (2015). Exploring the relation between time on task and ability in complex problem solving. Intelligence, 48, 37–50. https://doi.org/10.1016/j.intell.2014.10.003
Serious Toys (2011). Serious Toys. Retrieved from www.serioustoys.com
Siegler, R. S. (1996). Emerging minds: The process of change in children's thinking. New York: Oxford University Press.
Siegler, R. S. (2004). Learning about learning. Merrill-Palmer Quarterly, 50(3), 353–368. https://doi.org/10.1353/mpq.2004.0025
Siegler, R. S. (2007). Cognitive variability. Developmental Science, 10(1), 104–109. https://doi.org/10.1111/j.1467-7687.2007.00571.x
Siegler, R. S., & Svetina, M. (2006). What leads children to adopt new strategies? A microgenetic/cross-sectional study of class inclusion. Child Development, 77(4), 997–1015.
Simon, H. A., & Kotovsky, K. (1963). Human acquisition of concepts for sequential patterns. Psychological Review, 70(6), 534–546. https://doi.org/10.1037/h0043901
Sternberg, R. J. (1985). Beyond IQ: A triarchic theory of human intelligence. New York: Cambridge University Press.
University Press. Sternberg, R. J. (1997). The concept of intelligence and its role in lifelong
Price, S., Jewitt, C., & Crescenzi, L. (2015). The role of iPads in pre‐school learning and success. American Psychologist, 52(10), 1030–1037.
children's mark making development. Computers & Education, 87, https://doi.org/10.1037/0003‐066X.52.10.1030
131–141. https://doi.org/10.1016/j.compedu.2015.04.003 Sternberg, R. J., & Gardner, M. K. (1983). Unities in inductive reasoning.
Raven, J., Raven, J. C., & Court, J. H. (1998). Raven manual: Standard Journal of Experimental Psychology: General, 112(1), 80–116. https://
progressive matrices. Oxford: Oxford Psychologists Press. doi.org/10.1037/0096‐3445.112.1.80
Sternberg, R. J., & Grigorenko, E. L. (2002). Dynamic testing: The nature and measurement of learning potential. New York: Cambridge University Press.
Stevenson, C. E., Hickendorff, M., Resing, W. C. M., Heiser, W. J., & de Boeck, P. A. L. (2013). Explanatory item response modeling of children's change on a dynamic test of analogical reasoning. Intelligence, 41(3), 157–168. https://doi.org/10.1016/j.intell.2013.01.003
Stevenson, C. E., Bergwerff, C. E., Heiser, W. J., & Resing, W. C. M. (2014). Working memory and dynamic measures of analogical reasoning as predictors of children's math and reading achievement. Infant and Child Development, 23, 51–66. https://doi.org/10.1002/icd
Stevenson, C. E., Heiser, W. J., & Resing, W. C. M. (2016). Dynamic testing of analogical reasoning in 5‐6 year olds: Multiple choice versus constructed‐response training. Journal of Psychoeducational Assessment, 34(6), 550–565. https://doi.org/10.1177/0734282915622912
Stringer, P. (2018). Dynamic assessment in educational settings: Is potential ever realised? Educational Review, 70(1), 18–30. https://doi.org/10.1080/00131911.2018.1397900
Tenison, C., Fincham, J. M., & Anderson, J. R. (2014). Detecting math problem solving strategies: An investigation into the use of retrospective self‐reports, latency and fMRI data. Neuropsychologia, 54, 41–52. https://doi.org/10.1016/j.neuropsychologia.2013.12.011
Ullmer, B., & Ishii, H. (2000). Emerging frameworks for tangible user interfaces. IBM Systems Journal, 39, 915–931.
Verhaegh, J., Fontijn, W., & Hoonhout, J. (2007). TagTiles: Optimal challenge in educational electronics. In TEI'07: First International Conference on Tangible and Embedded Interaction (pp. 187–190). New York: ACM Press. https://doi.org/10.1145/1226969.1227008
Verhaegh, J., Hoonhout, J., & Fontijn, W. (2007). Effective use of fun with a tangible interactive console. In Proceedings of the 4th International Symposium on Pervasive Gaming Applications (pp. 177–178).
Verhaegh, J., Resing, W. C. M., Jacobs, A. P. A., & Fontijn, W. F. J. (2009). Playing with blocks or with the computer? Solving complex visual‐spatial reasoning tasks: Comparing children's performance on tangible and virtual puzzles. Educational & Child Psychology, 26(3), 18–39.
Verhaegh, J., Fontijn, W., Aarts, E. H. L., Boer, L., & van de Wouw, D. (2011). A development support bubble for children. Journal of Ambient Intelligence and Smart Environments, 3(1), 27–35. https://doi.org/10.3233/AIS-2011-0092
Verhaegh, J., Fontijn, W. F. J., Aarts, E. H. L., & Resing, W. C. M. (2013). In‐game assessment and training of nonverbal cognitive skills using TagTiles. Personal and Ubiquitous Computing, 17(8), 1637–1646. https://doi.org/10.1007/s00779-012-0527-0
Verhaegh, J., Fontijn, W. F. J., & Resing, W. C. M. (2013). On the correlation between children's performances on electronic board tasks and nonverbal intelligence test measures. Computers & Education, 69, 419–430. https://doi.org/10.1016/j.compedu.2013.07.026
Weisberg, R. W. (2015). Toward an integrated theory of insight in problem solving. Thinking & Reasoning, 21(1), 5–39. https://doi.org/10.1080/13546783.2014.886625
Yang, Y., Buckendahl, C. W., Juszkiewicz, P. J., & Bhola, D. S. (2002). A review of strategies for validating computer‐automated scoring. Applied Measurement in Education, 15(4), 391–412. https://doi.org/10.1207/S15324818AME1504
Zoanetti, N., & Griffin, P. (2017). Log‐file data as indicators for problem‐solving processes. In B. Csapó, & J. Funke (Eds.), The nature of problem solving: Using research to inspire 21st century learning (pp. 177–191). Paris: OECD Publishing.

How to cite this article: Veerbeek J, Vogelaar B, Verhaegh J, Resing WCM. Process assessment in dynamic testing using electronic tangibles. J Comput Assist Learn. 2019;35:127–142. https://doi.org/10.1111/jcal.12318
APPENDIX A
GROUPING OF ANSWER PIECES, GROUPS PER ITEM
For each item, we determined which pieces were considered adaptive when grouped together. The number of groups per item, and the groups that applied to each item, are displayed below.
Item 1. Pretest (3 groups): 1. Arms; 2. Legs; 3. Body. Posttest (3 groups): 1. Arms; 2. Legs; 3. Body.
Item 2. Pretest (2 groups): 1. Arms + Legs; 2. Body. Posttest (2 groups): 1. Arms + Legs; 2. Body.
Item 3. Pretest (4 groups): 1. Arms + Legs; 2. ArmLeft + LegLeft; 3. ArmRight + LegRight; 4. Body. Posttest (4 groups): 1. Arms; 2. Legs; 3. Body; 4. Arms + Legs.
Item 4. Pretest (2 groups): 1. Arms + Legs; 2. Body. Posttest (2 groups): 1. Arms + Legs; 2. Body.
Item 5. Pretest (4 groups): 1. Arms; 2. Legs; 3. Body; 4. Arms + Body. Posttest (4 groups): 1. Arms; 2. Legs; 3. Body; 4. Arms + Body.
Item 6. Pretest (3 groups): 1. ArmLeft + LegLeft; 2. ArmRight + LegRight; 3. Body. Posttest (3 groups): 1. Arms; 2. Legs; 3. Body.
Item 7. Pretest (5 groups): 1. ArmLeft + LegLeft; 2. ArmRight + LegRight; 3. Body; 4. ArmRight + BodyRight + LegRight; 5. ArmRight + Body + LegRight. Posttest (5 groups): 1. ArmLeft + LegLeft; 2. ArmRight + LegRight; 3. Body; 4. ArmRight + BodyRight + LegRight; 5. ArmRight + Body + LegRight.
Item 8. Pretest (2 groups): 1. Arms + Legs; 2. Body. Posttest (2 groups): 1. Arms + Legs; 2. Body.
Item 9. Pretest (2 groups): 1. Arms + Legs; 2. Body. Posttest (2 groups): 1. Arms + Legs; 2. Body.
Item 10. Pretest (4 groups): 1. Arms; 2. Legs; 3. Body; 4. Arms + Body. Posttest (4 groups): 1. Arms; 2. Legs; 3. Body; 4. Arms + Body.
Item 11. Pretest (3 groups): 1. ArmLeft + LegLeft; 2. ArmRight + LegRight; 3. Body. Posttest (4 groups): 1. Arms; 2. Legs; 3. Body; 4. BodyLeft + BodyRight + Legs.
Item 12. Pretest (5 groups): 1. Arms + Legs; 2. Body; 3. ArmRight + BodyRight + LegRight; 4. BodyLeft + BodyMiddle; 5. Arms + BodyRight + Legs. Posttest (5 groups): 1. Arms + Legs; 2. Body; 3. ArmRight + BodyRight + LegRight; 4. BodyLeft + BodyMiddle; 5. Arms + BodyRight + Legs.
APPENDIX B
CATEGORIES OF GROUPING BEHAVIOUR AND VERBAL STRATEGIES
Scoring of the different categories per item for grouping behaviour and verbal strategies, and assignment to classes based on the use of these
strategies during the test session.
Grouping behaviour Description of category per item
Full analytical Based on a GAP score of >99% for an item, which indicates adaptive grouping of the puppet parts, based on the transformations in the item (pieces that go through similar transformations are grouped together) and similarity in other characteristics such as colour, pattern, or anatomy (arms, legs, and body).
Partial analytical Based on a GAP score of 51–99% for an item, which indicates some use of adaptive grouping, but not yet consistently using all of the transformations and characteristics of the item to structure the solving process.
Nonanalytical Based on a GAP score of 50% or lower, which indicates idiosyncratic solving that is not based on an analysis of the item characteristics but instead reflects an unplanned or inflexible approach to solving the task.
Verbal strategy Description of category per item
Full inductive An inductive description of all the transformations in the task is provided, which could be completely verbal or
partially verbal with support of implicit explanation components such as pointing.
Partial inductive The child is able to provide some inductive explanation of the transformations in the series but does not explain all
transformations that are necessary to successfully complete the task.
Noninductive No inductive explanation is provided; the explanation is either lacking ("I don't know") or based on information other than the relevant item characteristics ("I like pink").
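For illustration, the GAP-score cut-offs for grouping behaviour given above can be read as a simple mapping from an item's GAP score to a category. The sketch below is a minimal illustration only; the function name and the assumption that the GAP score is expressed as a percentage from 0 to 100 are ours, not part of the original coding scheme.

```python
def gap_category(gap_score):
    """Map an item's GAP score (assumed to be a percentage, 0-100) to a
    grouping-behaviour category, following the cut-offs described above.
    Illustrative sketch; the function name is not taken from the article."""
    if gap_score > 99:
        return "full analytical"       # >99%: fully adaptive grouping of the puppet parts
    if gap_score > 50:
        return "partial analytical"    # 51-99%: some adaptive grouping, not yet consistent
    return "nonanalytical"             # 50% or lower: idiosyncratic, unplanned or inflexible solving
```

For example, under these assumptions a GAP score of 100 maps to "full analytical", 75 to "partial analytical", and 40 to "nonanalytical".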
Based on their most frequently used categories of grouping behaviour and verbal strategies, children were allocated to classes that reflected their predominant style of solving the items (a sketch of these allocation rules follows the table below).
Grouping behaviour class Verbal strategy class Criterion
1. Nonanalytical 1. Noninductive Nonanalytical/noninductive behaviour was used most often and in more than 33% of the items in the testing session (pretest/posttest).
2. Mixed 1 and 3 2. Mixed 1 and 3 Both nonanalytical/noninductive and partial analytical/partial inductive strategies were used in more than 33% of the items.
3. Partial analytical 3. Partial inductive Partial analytical/partial inductive behaviour was used most often and in more than 33% of the items in the testing session. Also included in this class were children who used both nonanalytical/noninductive and full analytical/full inductive strategies in more than 33% of the items, and children who used all three categories equally often.
4. Mixed 3 and 5 4. Mixed 3 and 5 Both partial analytical/partial inductive and full analytical/full inductive strategies were used in more than 33% of the items.
5. Full analytical 5. Full inductive Full analytical/full inductive behaviour was used most often and in more than 33% of the items in the testing session.
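The allocation rules above can likewise be read as a small decision procedure over a child's per-item categories. The sketch below is one possible reading under stated assumptions: categories are coded 1 (nonanalytical/noninductive), 3 (partial), and 5 (full) to mirror the class labels, and both the precedence between overlapping rules (e.g., a child who satisfies the criteria for classes 1 and 2) and the handling of children who clear no 33% threshold are our assumptions, since the appendix does not spell these out.

```python
def allocate_class(categories):
    """Allocate a child to a solving-style class (1-5) from per-item categories.

    `categories` holds one code per item: 1 = nonanalytical/noninductive,
    3 = partial analytical/inductive, 5 = full analytical/inductive.
    One possible reading of the Appendix B rules; rule precedence and
    unclassified cases are assumptions, not the authors' exact procedure."""
    n = len(categories)
    share = {c: categories.count(c) / n for c in (1, 3, 5)}
    frequent = {c for c, s in share.items() if s > 1 / 3}   # categories used in >33% of items

    # Mixed classes: two neighbouring category levels each used in >33% of the items.
    if {1, 3} <= frequent and 5 not in frequent:
        return 2
    if {3, 5} <= frequent and 1 not in frequent:
        return 4
    # Class 3 also covers children mixing the two extremes (>33% each)
    # and children using all three categories equally often.
    if {1, 5} <= frequent or len(set(share.values())) == 1:
        return 3
    # Otherwise: the single most frequently used category, if it clears 33%.
    top = max(share, key=share.get)
    if share[top] > 1 / 3:
        return top
    return None  # no rule applies; not covered explicitly by the appendix
```

With 12 items and categories [1, 3, 3, 1, 3, 3, 5, 3, 3, 3, 3, 3], for example, the partial category dominates and the sketch returns class 3.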