A Framework for Controlling Non-Symbolic Numerical Stimuli
Yoel Shilat1, 2* ∙ Avishai Henik1, 2 ∙ Hanit Gallili2, 3 ∙ Shir Wasserman4 ∙ Moti Salti5
1 Department of Psychology, Ben-Gurion University of the Negev, Beer Sheva 84105, Israel
2 The Zelman Center for Brain Science Research, Ben-Gurion University of the Negev, Beer-Sheva, Israel
3 School of Brain Sciences and Cognition, Ben-Gurion University of the Negev, Beer-Sheva, Israel
4 Behavioral Sciences Program, Ben-Gurion University of the Negev, Beer-Sheva, Israel
5 Brain Imaging Research Center, Ben-Gurion University of the Negev, Beer-Sheva, Israel
* Corresponding author:
[email protected]; Phone: +972-8-6477209
Abstract
Non-symbolic numerical stimuli play a crucial role in numerical cognition. Physical properties such as
surface area, density, and item circumference are inherently correlated with quantity. The correlations
between physical properties and quantity mask the mechanism underlying numerical perception. Non-symbolic stimuli are generated using different generation methods (GMs) aimed at controlling physical
properties. The way a GM controls physical properties affects numerical judgments. Here, using a
novel data-driven approach, we provide a methodological review of non-symbolic stimuli GMs
developed since 2000. Annotators tagged the GMs’ control over physical properties. Next, GMs were
qualitatively analyzed for different property controls, terminology, and definitions. The tagging and
qualitative analysis provided data suitable for quantitative analysis of GMs. We found that the field
thrives with a wide variety of GMs aimed at tackling new methodological and theoretical ideas.
However, the field lacks a common language and a method to incorporate new ideas in the existing
literature. Furthermore, these shortcomings impair the comparison, replication, and reanalysis of
previous studies in light of new ideas. We present guidelines for GMs that will hopefully contribute to
the field. First, researchers should define controlled properties explicitly and consistently. Second,
researchers should provide the code package used to generate stimuli. Third, researchers should also
provide the actual stimuli, coupled with the behavioral and neuronal responses. This last guideline
would enable researchers to reanalyze previously obtained data and to incorporate new ideas in
the context of prior research.
Keywords: Numerical cognition ∙ Numerosity ∙ Physical properties ∙ Reproducibility ∙ Comparability ∙ Reanalysis
Introduction
Recent numerical comparison studies struggle to control the physical properties of non-symbolic
arrays. An array of items can be described by its number of items or by its physical properties. The
physical properties of any item array have a natural and inherent correlation with the array’s quantity
(Dehaene, 1999; Leibovich et al., 2017; Mehler & Bever, 1967). Increasing or decreasing the number of
items necessarily changes at least one of the array’s physical properties (see Figure 1). Consider three
large apples lying on the kitchen counter. Adding a fourth large apple would increase the total surface
area and circumference of the stack of apples (the array).
Early studies show an association between performance in numerical judgments and the
physical properties of the presented stimuli (e.g., French, 1953; Frith & Frith, 1972; Piaget, 1968). In the
contemporary study of numerical cognition, the physical properties of a numerical array are considered
in two opposing manners. Some treat them as a biasing factor distracting us from the thing we want to
study—the ability to judge numerosity (e.g., Halberda et al., 2008; Piazza et al., 2004). Others treat them
as a key to understanding numerical capacity (Gebuis & Reynvoet, 2011a, 2011b; Leibovich et al., 2017).
Researchers from all views create non-symbolic numerical stimuli to study numerical cognition.
Regardless of their view of numerical cognition, researchers must address the natural correlation
between the array’s physical properties and its quantity to prevent confounds. Importantly, there are
abundant ways to create non-symbolic numerical stimuli relying on different theoretical and
methodological prisms (DeWind et al., 2015; Gebuis & Reynvoet, 2011a, 2011b; Katzin et al., 2019;
Piazza et al., 2004; Salti et al., 2017). The different ways the stimuli are created change the correlations
between numerical and physical properties. The different correlations between numerical and physical
properties bias behavioral and neuronal results (Clayton et al., 2015; DeWind & Brannon, 2016; Gebuis
& Reynvoet, 2011a, 2011b; Kuzmina & Malykh, 2022; Leibovich et al., 2017; Salti et al., 2017; Smets et
al., 2015). The different correlations between numerical and physical properties make it harder to
compare different studies (Clayton et al., 2015; Smets et al., 2015), and limit the ability to examine
previous studies through new theoretical prisms.
Figure 1
A Natural Correlation between Physical Properties and Numerosity
Note. Increasing or decreasing the number of items necessarily changes at least one of the array’s
physical properties. The left panel depicts three items, and the right one displays four items. A list of
some of the arrays’ physical properties is displayed on the left and right of the arrays. Adding one item
will necessarily change one or more of the arrays’ physical properties. Increasing the number of items
to four in the array in the top row would increase the total surface area and circumference of the array.
In contrast, increasing the number of items to four in the array in the bottom row, while keeping the
array’s total surface area and circumference constant, will result in decreasing the array’s average
diameter and area (denoted as the item surface area).
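The trade-off described in this note can be sketched numerically. The following is a minimal illustration, assuming equal-sized circular dots and a hypothetical helper named `radius_for_constant_total_area`; it is not taken from any of the reviewed GMs. It holds only the total surface area constant when a fourth dot is added.

```python
import math


def radius_for_constant_total_area(r_old: float, n_old: int, n_new: int) -> float:
    """Radius of n_new equal dots whose summed area equals that of n_old dots of radius r_old.

    Hypothetical helper for illustration; assumes all dots are equal-sized circles.
    """
    total_area = n_old * math.pi * r_old ** 2
    return math.sqrt(total_area / (n_new * math.pi))


# Three dots of radius 1 (arbitrary units) versus four dots with the same total area.
r3 = 1.0
r4 = radius_for_constant_total_area(r3, n_old=3, n_new=4)

item_area_3 = math.pi * r3 ** 2        # item surface area with three dots
item_area_4 = math.pi * r4 ** 2        # item surface area with four dots (smaller)
total_circ_3 = 3 * 2 * math.pi * r3    # total circumference with three dots
total_circ_4 = 4 * 2 * math.pi * r4    # total circumference with four dots

print(f"radius: {r3:.3f} -> {r4:.3f}")
print(f"item surface area: {item_area_3:.3f} -> {item_area_4:.3f}")
print(f"total circumference: {total_circ_3:.3f} -> {total_circ_4:.3f}")
```

In this equal-dot sketch, fixing the total surface area shrinks each item while the total circumference still grows, so at least one physical property necessarily changes with numerosity.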
Non-symbolic numerical stimuli are multifaceted and can be classified according to different
taxonomies; for example, according to a global-local distinction (Guy & Medioni, 1993; Navon, 1977),
or intrinsic-extrinsic distinction (Gebuis & Reynvoet, 2011a; Salti et al., 2017). Individual physical
properties can also be defined in multiple ways. For example, there are seven different definitions for
density in the literature (see our Qualitative analysis). The diverse ways to address, control, manipulate,
and analyze physical properties make it harder to achieve a common scientific ground (Stalnaker, 2002).
The lack of common ground hinders efficient comparisons of results coming from different studies that
used different methods to generate stimuli. Hereafter, we refer to the methods used to
generate non-symbolic numerical stimuli as generation methods (GMs). A GM is defined as a unique algorithm for creating
visual non-symbolic stimuli according to predefined numerical and physical parameters. Different
GMs use different experimental controls and manipulations that generate different correlations
between numerical and physical properties (Gebuis & Reynvoet, 2011a; Katzin et al., 2019; Salti et al.,
2017). The different correlations might bias the results of studies. Smets et al. (2015) found that two
different GMs suggested by Piazza and Dehaene and by Gebuis and Reynvoet led to different Weber
fractions and performance accuracy (Gebuis & Reynvoet, 2011a; Piazza et al., 2004). Clayton et al.
(2015) have also found that two different GMs led to different accuracy and congruity effects. When the
stimuli were created using Gebuis and Reynvoet’s (2011b) GM, participants exhibited the canonical
congruency effect, while the Panamath GM (Halberda et al., 2008) led to the opposite congruency
effect. Using a stimuli set created by the Panamath GM, the participants’ accuracy was higher in
incongruent trials than in congruent trials. When Clayton et al. (2015) compared the stimuli produced by
Gebuis and Reynvoet’s (2011a) GM to Panamath GM stimuli (Halberda et al., 2008), they discovered
that the relationships among the arrays’ convex hull area, total surface area, and quantity
differed between the two stimuli sets (Clayton et al., 2015). To conclude, the ability to compare
results from different experiments whose stimuli were generated by different GMs is limited.
The variety of GMs also limits the ability to examine previous studies through new theoretical
prisms. For example, Yousif and Keil (2019) suggested that arrays’ additive area has a role in numerical
judgments. Previous studies have not recorded data on arrays’ additive area, nor have they provided
their stimuli. Therefore, it is impossible to test this idea retrospectively because there is no way to
extract the relevant information from previous studies. Another example comes from our own
experience. We recently suggested that the shape of the convex hull is important for numerical judgments
(Katzin et al., 2019; Shilat et al., 2021). Previous studies have not recorded data on the shape of the
convex hull and have not provided their stimuli. Therefore, it is impossible to test this idea
retrospectively. Science evolves and new ideas constantly emerge. To prioritize new ideas, it is best to
test them using diverse methods. Testing new ideas on previously published data could facilitate the
“future scientist” and enable more rapid progression.
The current review
The purpose of the current review is to create common ground for future studies and to provide tools to
test new ideas on already existing data. This review starts by describing the current state of the field in
terms of a common language, followed by mapping the different controls employed by the various
GMs. For each GM we examined how the physical properties were defined and how they were
controlled. This was followed by a quantitative analysis of the distribution of the frequency of these
different controls. Describing the field and mapping it provided a data-driven review of the different
GMs.
Methods
Procedure overview
We studied GMs of non-symbolic numerical stimuli, designed to control the physical properties of
non-symbolic numerical stimuli. We started by identifying and collecting the different GMs. This
provided qualitative raw data on the different ways to control physical properties. Human annotators
tagged the different controls in each of the identified methods, making the qualitative data suitable for
quantitative analysis of the distributions and trends in the GMs’ control over physical properties.
Figure 2 presents an overview of the structure and pipeline of this review.
Figure 2
The Structure and Pipeline of this Review
Note. The workflow of constructing the current review.
We focused on GMs published between the years 2000 and 2021. Focusing on these years
allowed a broad and exhaustive look on one hand, and on the other, narrowed it to relevant GMs used
in contemporary research. Figure 3 reflects the growing number of relevant publications related to
GMs since the year 2000. The interest in GMs has been steadily increasing since the year 2000. Because
the term “Generation Methods” is not commonly used, we ran a search for the term “non-symbolic
numerical stimuli” in Semantic Scholar (semanticscholar.org). Figure 3 presents the results of this
search.
Figure 3
The Number of Publications Related to GMs has Increased Throughout the Years
Note. The figure displays the number of publications related to the GMs field between the years 1990 and 2021. The number of publications related to GMs has constantly grown over the years. Publication year
linearly predicted the number of publications related to GMs (see Supplementary Information).
Although this review deals with GMs from the year 2000, we wanted to provide a wider scope of
results. The results were obtained by running the query “non-symbolic numerical stimuli” in Semantic
Scholar. The x-axis presents the article’s publication year. The y-axis presents the number of
publications. The year 2000 is marked with a gray arrow. The blue line denotes a regression fit line
predicting the number of numerical cognition publications according to their publication year. The
gray shaded mark denotes a 95% confidence interval for the regression line.
GMs database
The identification of the reviewed GMs list (see Figure 2) started by collecting a list of initial methods
based on our knowledge of the literature (N = 9). The initial search was followed by a search for
additional GMs (N = 13). We used the names of the publications in the initial methods list as input for
searching for additional methods. The additional GMs search was conducted in three different ways,
providing data on previous and/or follow-up GMs: (1) previous GMs were gathered by searching within
the references of each of the identified GMs publications; (2) follow-up GMs were found by
performing a search for the citations of the nine initial GMs via the Google Scholar academic search
engine (scholar.google.com); (3) previous and follow-up GMs were searched via the Connected Papers
platform (connectedpapers.com) – a free web tool that builds a visual network graph of the papers
related to the “origin paper” that was used as the search query input. The network graph is created by
aggregating similar articles according to their semantic similarity and overlapping citations (Kaur et al.,
2022). The Connected Papers platform provides data on prior and future (i.e., “derivative”) works
related to the “origin paper”. The graph’s data is retrieved from Semantic Scholar – a free semantic
search engine for multiple academic disciplines (Fricke, 2018; Gusenbauer, 2019; Gusenbauer &
Haddaway, 2020; Jones, 2015). All in all, our GMs Database included twenty-two GMs that we have
identified.
Identifying the different controls
Different GMs were designed within different theoretical and methodological frameworks.
Accordingly, they approach the problem of property control from different perspectives. The
complexity of the different GMs makes it very hard to create a natural language processing (NLP)
algorithm that can be used to tag the properties controlled. Moreover, the diversity of contexts in which
the data can be tagged makes it impossible to create an a priori set of overarching general tagging
rules and requires manual tagging. Therefore, human annotators manually tagged the different controls
used by the GMs. To keep our tagging process as data-driven as possible we designed a straightforward
and parsimonious annotation scheme comprised of three steps (see Figure 4). First, annotators searched
for physical properties that were stated by the authors as the properties they controlled for. Second,
annotators marked the property definitions in the GM articles and tested their consistency throughout
the text. A definition was tagged as inconsistent if there was a discrepancy between the property
definition within the manuscript and the way this property was controlled for. We did not impose
previous definitions on the identified controls and relied on the authors’ property definitions
whenever they were available. Each annotator independently read all GM manuscripts, and both
compiled an initial list of possible controls. In the three cases in which the annotators disagreed, they discussed
the issue and reached a joint decision. Whenever the annotators identified a new control that did not
appear in the initial list, they added it to the tagging list if and only if the following two criteria were
met: (1) they could not find an equivalent to the given property under a different name, and (2) the
property synonyms were not found under a thesaurus entry (Danner, 2014). The results of the tagging
process are available online (see Supplementary Information).
Figure 4
Annotation Scheme for Controlled Properties
Note. The annotation scheme was used for identifying and tagging the experimental control used by
the different GMs. It was designed in a data-driven manner, so that the GMs’ control was tagged based
on the publications themselves and not on a priori assumptions. The process started by identifying
possible properties that were controlled by the authors; first, by searching the manuscripts for explicit
statements on properties controlled, and then making sure these properties were defined. Then, the
annotators searched and tagged the control types used by the GMs and the properties they controlled
for.
Control types.
As is clear by now, the close relationships between physical and numerical properties in non-symbolic
stimuli cannot be overlooked because using different controls is associated with different behavioral
and neuronal responses (Clayton et al., 2015; DeWind & Brannon, 2016; Gebuis & Reynvoet, 2011a,
2011b; Kuzmina & Malykh, 2022; Piazza et al., 2004; Salti et al., 2017; Smets et al., 2015). Accordingly,
it is required to design experiments in which certain properties (e.g., total surface area) will not predict
other properties (usually, numerosity). There is also a need to control physical properties such that they
will have similar discriminability. The methods to control these factors will be denoted as Control
Types (N = 5). Because numerical comparison stimuli comprise two arrays, we
divided the tagged control types into Between Arrays Controls (N = 3), defined as a control on both
arrays, and Within Arrays Controls (N = 2), defined as a control for each of the separated arrays. We
discuss the different aspects of the different control types in Table 1.
Table 1
Control Types

Between arrays control

Congruency
Definition: Two arrays are congruent when they display a high degree of perceptual similarity in one or more of their dimensions (Egner, 2007).
Rationale: Congruency is used to prevent the observers from predicting one property using another one, mainly using a physical magnitude to predict numerosity.
Example: A congruent condition in which the more numerous array also covers a larger area, and an incongruent condition in which the more numerous array covers a smaller area. When congruent and incongruent stimuli appear in equal proportions, the two dimensions (e.g., total area and numerosity) cannot predict one another.

Saliency
Definition: Two stimuli are controlled for saliency when the ratios of the controlled properties are similar-to-equal. A property has a higher degree of saliency when the ratio of this property in both arrays is considerably lower than the ratios of other properties.
Rationale: A salient property may be used by the observers to perform the task even when it is not relevant to the task at hand (Salti et al., 2017).
Example: Two arrays, one with three dots and the other with four dots. Also, the item surface area of the more numerous array is ten times larger than that of the less numerous one. The items’ surface area is considerably more salient than numerosity as its ratio is much higher. Equalizing the items’ surface area ratio to the numerical ratio (a ratio of 3/4 instead of 1/10) will match the saliency of these properties.

Shapes heterogeneity between
Definition: Two array shapes are heterogenous-between when the array’s items’ shapes vary between the two arrays. Two array shapes are homogenous-between when both arrays are comprised of the same shapes.
Rationale: When the items’ shapes vary between the arrays, the arrays are more dissociable. Furthermore, it is harder for the observers to compute some of the arrays’ physical properties using the same algorithm, and observers must use different algorithms to compute stimuli properties.
Example: Two arrays. One is comprised of triangles and the other is comprised of crescents (Sophian & Chu, 2008).

Within arrays control

Shapes heterogeneity within
Definition: An array is heterogeneous within shapes when its items are in different shapes. An array is homogenous within shapes when it is comprised of items of the same shape.
Rationale: A shape heterogeneous array can make it harder to compute some of the stimuli properties to predict numerosity. In contrast, a homogenous array will make it easier to compute the total or average items’ sizes and circumferences and use it to predict numerosity when compared to items that vary in their shape (Aulet & Lourenco, 2021).
Example: A heterogenous within shapes array will be comprised of dots, triangles, and crescents. In contrast, in a homogenous within shapes array all items will be dots.

Sizes heterogeneity within
Definition: An array is heterogenous within sizes when its elements vary in their sizes and is homogenous within sizes when its elements are the same size.
Rationale: Using size heterogenous within arrays eliminates some of the associations between items’ individual and collective sizes to numerosity and avoids related confounds (Gebuis & Reynvoet, 2011a; Guillaume et al., 2020; Marchant et al., 2013). When the items are size heterogeneous within, it is harder for participants to assess the average items’ area and to use it for numerical decisions. In contrast, when all items are homogenous, it increases some of the natural correlations between quantity and physical properties related to the items’ spacing, sizes, and the items’ coverage of the display (Guillaume et al., 2020; Rousselle et al., 2004; but see also DeWind et al., 2015).
Example: In a heterogenous within sizes array the items in the same array have a different surface area, but two arrays of different quantities can have the same total surface area. In contrast, in a homogenous within sizes array the items in the same array have the same surface area, and two arrays of different quantities will have a different total surface area.
Note. A list of the tagged control types. The control types were divided according to within arrays control and between arrays control.
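The congruency and saliency controls in Table 1 can be illustrated with a short sketch. It assumes that a property's ratio is expressed as the smaller value divided by the larger one and that each trial is a tuple of left and right numerosities and physical magnitudes; the function names and the trial values are hypothetical and are not drawn from any reviewed GM.

```python
def ratio(a: float, b: float) -> float:
    """Smaller-to-larger ratio of one property across two arrays (1.0 means identical)."""
    return min(a, b) / max(a, b)


def is_congruent(num_left: float, num_right: float,
                 phys_left: float, phys_right: float) -> bool:
    """True when the more numerous array also has the larger physical magnitude."""
    return (num_left - num_right) * (phys_left - phys_right) > 0


# Saliency example from Table 1: three versus four dots, tenfold item surface area.
numerosity_ratio = ratio(3, 4)        # 0.75
item_area_ratio = ratio(1.0, 10.0)    # 0.1, far from the numerical ratio

# Hypothetical trial list: (numerosity left, numerosity right, total area left, total area right).
trials = [
    (3, 4, 30.0, 40.0),   # congruent: more dots, larger area
    (3, 4, 40.0, 30.0),   # incongruent: more dots, smaller area
    (4, 3, 40.0, 30.0),   # congruent
    (4, 3, 30.0, 40.0),   # incongruent
]
n_congruent = sum(is_congruent(*t) for t in trials)
n_incongruent = len(trials) - n_congruent

print(numerosity_ratio, item_area_ratio)
print(n_congruent, n_incongruent)
```

With equal counts of congruent and incongruent trials, total area carries no information about which array is more numerous in this toy set; matching the two ratios would be the corresponding saliency control.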
Controlled properties.
The final controlled properties list included thirteen properties. Table 2 details all the physical
properties, alternative terms, and definitions. The different properties were grouped according to the
previously suggested taxonomy of intrinsic and extrinsic properties (Gebuis & Reynvoet, 2011a; Salti et
al., 2017). Notably, Piazza et al. (2004) also discussed intrinsic and extrinsic properties but used the
terms intensive and extensive parameters. Eventually, we used Shilat et al.’s (2021) definition for the
intrinsic-extrinsic taxonomy. Accordingly, intrinsic properties describe information extracted from
individual array items. In contrast, extrinsic properties describe information extracted from the array as
a whole (Shilat et al., 2021; similar to Xu & Spelke, 2000). Intrinsic and extrinsic properties can be
relatively independent of one another, especially when using different-sized items. For example,
increasing the extrinsic property of the array’s convex hull area by increasing the distance between the
items will not affect the intrinsic property of the array’s total surface area as they are independent. The
intrinsic-extrinsic distinction reflects refined nuances of stimuli control indicating an increased focus
of a GM on methodological issues.
Table 2
Physical Property Definitions and Alternative Terms

Intrinsic properties

Item surface area
Formula: $\frac{1}{n}\sum_{i=1}^{n} \pi r_i^2$
Definition: Individual items’ average surface area
Alternative terms: Item size, surface individual, average dots size, average surface area

Total surface area
Formula: $\sum_{i=1}^{n} \pi r_i^2$
Definition: The sum of the items’ surface areas
Alternative terms: True area, total filled area, summed continuous extended, cumulated surface area, aggregate surface, total occupied area, cumulative area, overall surface

Luminance
Formula: $\frac{cd}{m^2}$
Definition: The items’ brightness or intensity relative to the background.
Alternative terms: Contrast, total brightness
Note: The GMs inconsistently defined luminance and referred to it in different manners. Some of the GMs consider luminance homologous to the total surface area (Guillaume et al., 2020; Piazza et al., 2004; Rousselle et al., 2004; Soltész et al., 2010; Yousif & Keil, 2019). Another portion of the GMs referred to luminance in a more nuanced and complex manner (Dakin et al., 2011; Lourenco et al., 2012; Ross, 2003). When using items of the same color and a homogenous background, luminance can be operationally defined and calculated in the same manner as the total surface area, for example, when using black dots with a white background. Nevertheless, luminance has a different theoretical meaning than the total surface area (Kadosh et al., 2008; Mareschal & Baker, 1998; Pinel et al., 2004).

Average diameter
Formula: $\frac{1}{n}\sum_{i=1}^{n} 2r_i$
Definition: Individual items’ average diameter
Note: Average radius ($\frac{1}{n}\sum_{i=1}^{n} r_i$) was considered as homologic to average diameter

Total circumference
Formula: $\sum_{i=1}^{n} 2\pi r_i$
Definition: The sum of the items’ circumferences
Alternative terms: Contour length, sum of the items’ perimeters

Additive area
Formula: $\sum_{i=1}^{n} (2r_i + 2r_i)$
Definition: The sum of the items’ dimensions (Yousif & Keil, 2019)

Inter distance
Formula: $\frac{1}{n-1}\sum_{i=1}^{n-1} \min\left(\sqrt{(x_{i+1}-x_i)^2 + (y_{i+1}-y_i)^2}\right)$
Definition: The average distance between dot centers, calculated as the average of the shortest open path connecting all the array’s dots
Alternative terms: Inter-item spacing

Extrinsic properties

Density
Definition: There are several definitions (see Qualitative analysis 2nd section)
Alternative terms: Coverage, element spacing

Open space
Definition: Similar to density but was inconsistently defined (see Qualitative analysis 2nd section). Also, open space has a different theoretical meaning than density but might be considered as a proxy of density (Sophian & Chu, 2008)

Apparent closeness
Definition: The stimulus overall scaling. Increasing the apparent closeness is equivalent to zooming in on a stimulus, such that it subtends a larger visual angle without changing its relative proportions (DeWind et al., 2015)

Convex hull area
Definition: The area of the smallest convex polygon that contains all objects in the array
Alternative terms: Area extended, total envelope, field area, global occupied field
Note: Convex hull area can be calculated by dividing the polygon into triangles, calculating their areas, and summing them

Average occupancy
Formula: $\frac{\text{Convex hull}}{n}$
Definition: The average space individual dots sustain within and around their physical size
Alternative terms: Sparsity, average field area per item, the inverse of the density

Spatial frequency
Definition: The number of spatial cycles of a visual event (such as an object or color code) within a given image area, usually measured in pixels (Boreman, 2021; De Valois & Switkes, 1980; Efford, 2000)
Note. The table lists the textual definitions of the tagged physical properties and the properties’ alternative terms, divided according to the taxonomy of intrinsic and extrinsic
properties. Whenever applicable, we also provide the formula defining each property. As most of the GMs use dots, the relevant equations refer to circles.
In the formulas, 𝑛 denotes the number of dots, 𝑟 denotes the dot radii, 𝑖 denotes the dot indexes, 𝑐𝑑 denotes candela units (Trezona, 2000), 𝑚 denotes meters, and
𝑥 and 𝑦 denote the Cartesian coordinates of the dot centers. We referred to specific GMs if the GMs used a unique definition of a property or if a property
appeared only in one GM. The properties are ordered so that similar properties that rely on the same variable or constant are grouped together.
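Several of the formulas in Table 2 can be sketched in code for circular dots. The following is a minimal illustration under stated assumptions: the `Dot` class and all function names are hypothetical, the convex hull is computed over dot centers only (a GM may instead include the dots' edges), and the hull area is obtained by fanning the polygon into triangles, as the table's note describes.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dot:
    """A circular item: center coordinates and radius (hypothetical structure)."""
    x: float
    y: float
    r: float


def total_surface_area(dots):
    return sum(math.pi * d.r ** 2 for d in dots)


def item_surface_area(dots):
    return total_surface_area(dots) / len(dots)


def average_diameter(dots):
    return sum(2 * d.r for d in dots) / len(dots)


def total_circumference(dots):
    return sum(2 * math.pi * d.r for d in dots)


def additive_area(dots):
    # Sum of each item's two dimensions (width + height), both 2r for a circle.
    return sum(2 * d.r + 2 * d.r for d in dots)


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def convex_hull_area(dots):
    """Area of the hull of the dot centers, summed over a fan of triangles."""
    hull = convex_hull([(d.x, d.y) for d in dots])
    if len(hull) < 3:
        return 0.0
    x0, y0 = hull[0]
    area = 0.0
    for (x1, y1), (x2, y2) in zip(hull[1:], hull[2:]):
        area += abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)) / 2
    return area


def average_occupancy(dots):
    return convex_hull_area(dots) / len(dots)


# Five unit-radius dots: four on the corners of a 4 x 3 rectangle and one in the middle.
dots = [Dot(0, 0, 1), Dot(4, 0, 1), Dot(4, 3, 1), Dot(0, 3, 1), Dot(2, 1.5, 1)]
# Spreading the dots apart changes the extrinsic hull area but not the intrinsic sizes.
spread = [Dot(2 * d.x, 2 * d.y, d.r) for d in dots]

print(total_surface_area(dots), total_surface_area(spread))
print(convex_hull_area(dots), convex_hull_area(spread))
print(average_occupancy(dots))
```

Doubling the distances between dots quadruples the convex hull area while the total surface area stays the same, which matches the independence of intrinsic and extrinsic properties described in the text.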
Results
Qualitative analysis
Generation methods list analysis.
Figure 5 provides an overview of the different GMs (N = 22), the properties they controlled (N = 13),
and the ways the properties were controlled (i.e., control types, N = 5). The properties were divided
according to the intrinsic-extrinsic distinction. We named the GMs according to their respective
publication authors. Some of the GM publications were co-authored by authors participating in more
than one publication.
Provided data. To fully understand the results obtained using different GMs there is a need to
understand the difference between the different stimuli sets and compare them in the context of a
theoretical perspective. To gain knowledge regarding stimuli reproduction and the comparability of the
different GMs, we tagged the different types of data supplied by the GMs. We found three types of
data-sharing in the literature providing knowledge of the stimuli sets and GMs: (1) methods that shared
their stimuli sets as pictures, such that they could be reanalyzed after publication (e.g., Shilat et al.,
2021, which is not a GM); (2) methods that provided a way to reproduce their stimuli by providing a
software package or graphical user interface (GUI) that enables stimuli reproduction (e.g., Guillaume et
al., 2020); and (3) methods that shared a detailed report on the stimuli features (such as the ratio of
physical properties in each picture) but did not share the pictures themselves (Yousif & Keil, 2019). We
found that none of the GMs shared their stimuli as pictures. In addition, only one GM
provided detailed data without a software package or GUI to reproduce the stimuli. Therefore, we
merged these tags into a single tag, which we named reproducibility. Figure 5 displays
the status of the GMs’ reproducibility (N = 12).
Control Type is not related to the controlled properties. We found that the types of controls used by
the GMs and the controlled properties had a low-to-no dependency on one another. For example, a GM
can control for the congruency of the average diameter to the arrays’ numerosity. However, the same
method can control the stimuli saliency by imposing it on other physical properties, such that the
arrays’ convex hull area will be equated to their numerosity. Any type of control the GMs impose on
the different physical properties does not entail the use of other control types.
Figure 5
List of All GMs, their Control Types, and Controlled Properties
Note. The figure presents a list of all GMs and the results of tagging their control types, controlled
properties, and reproducibility. The GMs controlled a wide variety of properties and a wide variety of
property combinations. Each row depicts a different GM (N = 22). The GMs are ordered according to
their respective publication year and alphabetical order. The rightmost column presents data on the
GMs reproducibility, marked with an open lock. The rest of the columns present data on the GMs’
controls, marked using circles. The controls were divided into control types (N = 5) and controlled
properties (N = 13). Whenever a control type was used, a blue circle appears. The controlled properties
were divided into intrinsic and extrinsic properties and are marked in dark and light pink circles,
respectively.
Generation methods properties definitions and terminology.
Property definitions and the case of density. The definitions of the different properties are
inconsistent or nonexistent. Density, for example, is inconsistently defined in the literature (Dakin et al.,
2011; De Marco & Cutini, 2020). We chose to focus on density because it is the extrinsic property most
of the GMs have attempted to control for (~68% of the GMs, 15/22; see Table 3). The different
definitions for density rely on a combination of three characteristics of the non-symbolic array: (1) the
items’ number; (2) the area on which the items are scattered; and (3) the items’ distances. Notably,
density was not defined in ~33.3% (5/15) of the GMs that stated that they controlled for it (Halberda et
al., 2008; Halberda & Feigenson, 2008; Huntley-Fenner & Cannon, 2000; Piazza et al., 2004; Rousselle
et al., 2004).
Importantly, there were inconsistencies in definition, even within a manuscript. Thirty percent
(3/10) of the GMs that stated that they controlled for density and also provided a definition of it were
not consistent in the way it was calculated. For example, Sophian and Chu (2008) discussed two
definitions of density. The first definition is based on the total surface area. The second definition is
based on individual items. However, they eventually controlled for a third definition, namely, the
amount of open space in the array. We could not find a concrete definition of the term open space.
Instead, we take open space to be the aggregated space unoccupied by the array’s items. Open space might be
considered as a proxy of density. Open space was manipulated by using different array configurations
or different groupings that provide different levels of open space. The GMs that stated that they controlled
for density but did not define it, or referred to it inconsistently within a paper, were
among the earlier GMs. In later years, more GMs that stated that they controlled for density also
defined it.
Table 3
Different Density Definitions of the GMs that Controlled Density
Generation Method | Control Statement | Definition Availability | Definition Consistency | Definition
Huntley-Fenner & Cannon, 2000 | ✓ | ✗ | ✗ | Not defined
Piazza et al., 2004 | ✓ | ✗ | ✗ | Not defined
Rousselle et al., 2004 | ✓ | ✗ | ✗ | Not defined
Halberda et al., 2008 | ✓ | ✗ | ✗ | Not defined
Halberda & Feigenson, 2008 | ✓ | ✗ | ✗ | Not defined
Ross, 2003 | ✓ | ✓ | ✗ | Number / Display area, when the display area is defined as the area in which the items can be scattered
Dakin et al., 2011 | ✓ | ✓ | ✗ | The authors regarded density as element spacing but did not define it. Otherwise, density was defined as Number / Convex hull area
Sophian & Chu, 2008 | ✓ | ✓ | ✗ | Total surface area / Display area. Density was also defined as the amount of space individual items occupy. Eventually, they controlled for the open space in the array, a proxy of density (see above).
Guillaume et al., 2020
✓
✓
✓
✓
✓
✓
Zanon et al., 2021
Gebuis & Reynvoet,
2011a
Gebuis & Reynvoet,
2011b
Total surface area
Convex hull area
Salti et al., 2017
De Marco & Cutini,
2020
DeWind et al., 2015
✓
✓
✓
Item size
√
Spacing
Spacing was defined according to the distance between a fixed number of items. However, in
Number
future work DeWind and Brannon (2019) defined density as: Convex hull area
Note. The table displays an analysis of the different definitions of density. All GMs in the table stated that they controlled density. Eight of the 15 GMs (~53%)
either defined density inconsistently or did not define it at all. The second column from the left
indicates the existence of a density control statement. The third column denotes the availability of a definition of density in the text. The fourth column indicates
whether the definition of density was consistent throughout the text. Trivially, if density was not defined, its definition could not be consistent. A definition was
marked inconsistent if the annotators found a discrepancy between the definition of density and the way it was actually controlled for. The seven different
definitions of density appear in the last column. A green checkmark represents a condition encoded as true, that is, the annotators
found a control statement on density, found a definition of density, or found the definition consistent. Whenever one of these conditions was not met, it was marked with
a red x-mark. The GMs are ordered according to the following categories: (1) GMs that did not define density; (2) GMs that defined density
inconsistently; and (3) GMs that properly defined density. Within each category, the GMs are arranged in the following order: (1) definitions of density based on
the items’ number; (2) definitions of density relying on the convex hull area; and (3) definitions of density relying on the items’ distances. Otherwise, the GMs
are ordered chronologically.
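To make the divergence between definitions concrete, the following minimal sketch computes three of the ratio-style definitions that appear in Table 3 (number / display area, number / convex hull area, and total surface area / convex hull area) for one dot array. The array, the display size, the function names, and the choice to compute the convex hull over item centers (ignoring item radii) are illustrative assumptions and are not taken from any of the reviewed GMs.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Polygon area via the shoelace formula."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def density_definitions(centers, radii, display_area):
    """Return density under three different definitions for the same array."""
    number = len(centers)
    total_surface_area = sum(math.pi * r ** 2 for r in radii)
    # Assumption: the hull is computed over item centers only.
    hull_area = polygon_area(convex_hull(centers))
    return {
        "number / display area": number / display_area,
        "number / convex hull area": number / hull_area,
        "total surface area / convex hull area": total_surface_area / hull_area,
    }


# Hypothetical array: five dots on a 10 x 10 display.
centers = [(2, 2), (8, 2), (8, 8), (2, 8), (5, 5)]
radii = [0.5, 0.5, 1.0, 0.5, 0.5]
densities = density_definitions(centers, radii, display_area=100.0)
for name, value in densities.items():
    print(f"{name}: {value:.4f}")
```

The three values differ for the very same array, so two GMs that both state that they "controlled density" may in practice have equated different quantities. An array whose centers are collinear has a hull area of zero and would need separate handling.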
Inconsistent terminology. We found that the 13 controlled properties were referred
to by a total of 35 alternative terms (Table 2). The same property might be referred to using
synonyms that share the same meaning, with some more similar than others (Danner, 2014; Lea, 2008).
A property can also be referred to by different homologous terms, although these terms are not
straightforward synonyms of the same property. In such cases, the annotators reviewed the GM text again and
discussed whether these terms refer to properties that already exist in our properties list. Total
circumference provides an example of a property that could be replaced by various synonyms. For
instance, the word “circumference” can be replaced with the word “perimeter” or any other synonym.
The word “total” can be replaced with “sum” or any other synonym, and the two substitutions can be combined.
We found that many GMs used different synonyms for the total circumference (De Marco & Cutini,
2020; DeWind et al., 2015; Halberda & Feigenson, 2008; Lyons et al., 2014; Price et al., 2012; Rousselle
& Noël, 2008; Rousselle et al., 2004; Salti et al., 2017; Yousif & Keil, 2019; Zanon et al., 2021). Some
properties were referred to by terms that are not direct synonyms and can steer the reader in a
different theoretical direction. For instance, the term “area extended”, used as an alternative term for
convex hull area, can be misinterpreted as related to the items’ surface area. For some terms, it was
hard to know whether the different authors referred to the same property. For example, some used the term
“contour length” as homologous to the term “total circumference” (DeWind et al., 2015; Gebuis &
Reynvoet, 2011a, 2011b; Halberda & Feigenson, 2008; Rousselle & Noël, 2008; Rousselle et al., 2004;
Soltész et al., 2010; Sophian & Chu, 2008; Yousif & Keil, 2019; Zanon et al., 2021). The situation was
even less clear for extrinsic properties that do not depend on the items’ radii. For example, the convex
hull area was also referred to as “field area” (DeWind et al., 2015), “area extended”
(Gebuis & Reynvoet, 2011a), or “global occupied” area. The convex hull area was also referred to as
“total envelope” (Halberda & Feigenson, 2008), a term that Soltész et al. (2010) used to
refer to the items’ “total circumference”.
Quantitative analysis
Stimuli reproducibility.
We counted the number of GMs that provided their stimuli, a way to reproduce the stimuli, or
a detailed report on the stimuli features.
Importantly, none of the GMs provided the stimuli themselves. Ten of the 22 GMs provided a way to reproduce their
stimuli. Only one GM provided detailed data on its stimuli but did not provide a way to reproduce
them (Guillaume et al., 2020). The chance that a GM will provide options for stimuli
reproducibility is higher in recent years than in the earlier years of this review. We found that the year
of publication explains 43% of the variance in the proportion of publications that provided means for
stimuli reproducibility, F(1, 11) = 10.129, p = .009, r = 0.692, 95% CI: [0.229, 0.9], with moderate-to-strong Bayesian evidence supporting this effect, BF10 = 7.566.
Controlled properties.
The number of controlled properties. The average number of properties controlled by each GM was
3.227, SEM = 0.271. The number of controlled intrinsic properties (MeanIntrinsic = 2.318, SEM = 0.179)
was higher than the number of controlled extrinsic properties (MeanExtrinsic = 0.909, SEM = 0.236). The
diversity in the number of controlled properties was calculated using Gini-Simpson's index of
diversity. This index is usually denoted D or G, but for clarity it is
denoted here as Diversity-index (Keylock, 2005; Lande, 1996; Simpson, 1949; Tran et
al., 2021). In the current context, Diversity-index measures the probability that two GMs control a
different number of properties: the higher the index, the higher the
probability that two compared GMs control a different number of properties. Diversity in the number
of controlled properties was medium-to-high, Diversity-index = .757.
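As an illustration of how such an index behaves, the following minimal sketch computes the Gini-Simpson index in its standard form, one minus the sum of squared category proportions. The input list is hypothetical and is not the review's data. Whether the reported values were computed with this "with replacement" form or with a finite-sample correction is not stated in the text, so the choice made here is an assumption.

```python
from collections import Counter


def gini_simpson(values):
    """Probability that two observations drawn with replacement
    fall into different categories: 1 - sum(p_i ** 2)."""
    n = len(values)
    counts = Counter(values)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


# Hypothetical numbers of controlled properties for eight GMs.
n_controlled = [2, 2, 3, 3, 3, 4, 5, 5]
diversity = gini_simpson(n_controlled)
print(round(diversity, 3))
```

An index of 0 means that all GMs control the same number of properties, and the index approaches 1 as the GMs spread evenly over many different values.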
Property control has increased throughout the years. As seen in Figure 6, the
average number of controlled properties has steadily increased over the years. Publication year explains 39.7% of
the variance in the number of controlled properties, F(21) = 14.828, p < .001, r = 0.652, 95% CI:
[0.319, 0.942], with strong Bayesian evidence supporting this effect, BF10 = 30.344. Over the
years, the number of controlled intrinsic properties increased, but the number of controlled extrinsic
properties did not. Publication year explains 16.5% of the variance in the number of
controlled intrinsic properties, F(1, 21) = 5.137, p = .034, r = 0.452, 95% CI: [0.038, 0.734], with weak-to-moderate Bayesian evidence supporting this effect, BF10 = 2.144. In contrast, publication year did not
explain the number of controlled extrinsic properties, r = 0.311, 95% CI: [-0.128, 0.647], p = .159.
However, there was not enough Bayesian support for a null effect, BF01 = 0.812. Therefore, the increase in the
number of controlled properties appears to be driven by the increase in the control of intrinsic properties
rather than by a change in the control of extrinsic properties. There was also an increase in the diversity of
the number of properties controlled by GMs. Comparing the Gini-Simpson's index of diversity of the
GMs before and after 2011, Diversity-indexBefore = .666 and Diversity-indexAfter = .833, shows that the diversity of
the number of controlled properties increased by a factor of 1.25 after 2011.
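The relation between the reported correlations and the reported percentages of explained variance can be checked with a short worked computation. The sketch below applies the standard adjusted R² formula for a regression with one predictor. The sample sizes are assumptions: n = 13 follows from the F(1, 11) reported for the reproducibility analysis, and n = 22 assumes one data point per GM. That the reported percentages are adjusted values is an inference from the arithmetic and is not stated in the analysis.

```python
def adjusted_r2(r, n, k=1):
    """Adjusted R^2 for a linear regression with k predictors and n observations."""
    r2 = r ** 2
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


# r values as reported in the text; n values are assumptions (see lead-in).
reproducibility = adjusted_r2(0.692, n=13)
all_properties = adjusted_r2(0.652, n=22)
intrinsic_only = adjusted_r2(0.452, n=22)

print(round(reproducibility, 3), round(all_properties, 3), round(intrinsic_only, 3))
```

Under these assumptions the results are close to the reported 43%, 39.7%, and 16.5%, with small differences attributable to rounding of the reported r values.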
Figure 6
The Number of Controlled Properties Throughout the Years
Note. The figure depicts the number of controlled properties (y-axis) as a function of the year in which
the GMs were published. The number of controlled properties has steadily grown over the years.
Each black circle depicts one GM. If two GMs were published in the same year and controlled for the
same number of properties, their circles are overlaid on top of each other. For example, in 2012 two
GMs (Lourenco et al., 2012; Price et al., 2012) controlled for two properties, and they are represented by
one circle only. Linear regression shows that the number of controlled properties has steadily
increased throughout the years, r = 0.652, 95% CI: [0.319, 0.942], p < .001, BF10 = 30.344, YProperties =
0.652Xyear – 274.915, R² = 0.397. The blue line denotes the regression fit line, and the SEM is
represented by the shaded gray area.
Total surface area is the most commonly controlled property. The different methods controlled
for 13 intrinsic and extrinsic properties (see Table 2). Figure 7 displays the relative frequency of GMs
that controlled specific properties. No single property was controlled by all methods. The average
relative frequency of all controlled properties was ~25%, SEM = 6.274. We measured the asymmetry of
the distribution of the GMs’ controlled properties by calculating its skewness. Skewness is
a measure of the asymmetry of the distribution of a random variable around its mean
(Groeneveld & Meeden, 1984; MacGillivray, 1986; Pearson, 1895). Ordering the properties from the
largest frequency to the smallest results in a high positive skewness, Skewness = 1.142,
SEskewness = 0.616. The high positive skewness indicates that only a small number of physical
properties were controlled for in most GMs, whereas most properties had a low probability of being
controlled for. Notably, the total surface area was controlled in 77% of the GMs, and the item surface
area was controlled in 55% of the GMs.
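The skewness computation can be sketched as follows. The code uses the adjusted Fisher-Pearson coefficient and its usual standard error, which is the form reported by common statistics packages. That this estimator was the one used here is an assumption, although the standard error formula does give 0.616 for 13 properties, matching the reported value. The frequencies in the example are hypothetical and are not the review's data.

```python
import math


def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness coefficient (G1)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    g1 = m3 / m2 ** 1.5
    return g1 * math.sqrt(n * (n - 1)) / (n - 2)


def skewness_se(n):
    """Standard error of G1; depends only on the sample size."""
    return math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


# Hypothetical relative frequencies (%) with which 13 properties are controlled.
frequencies = [80, 50, 40, 30, 25, 20, 15, 15, 10, 10, 5, 5, 5]
skew = sample_skewness(frequencies)
se = skewness_se(len(frequencies))
print(round(skew, 3), round(se, 3))
```

A positive value reflects a long right tail: a few properties are controlled by many GMs, while most properties are controlled by few.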
Figure 7
The Relative Frequency of the Controlled Properties
Note. The figure depicts the distribution of different controlled properties. No property was controlled
for by all GMs, but the total surface area was controlled for by most of the GMs. Intrinsic properties
were more frequently controlled for than extrinsic properties. The list of properties is presented on the
x-axis. Bars represent the relative frequency of the controlled properties between the GMs. The
controlled properties were grouped into intrinsic and extrinsic properties and are marked in dark and
light pink, respectively. The properties within the intrinsic and extrinsic groups were ordered from the
most frequently controlled property to the least frequently controlled property. When two properties
were equally controlled by the GMs, they were ordered in the figure according to the order provided in
Table 2.
High diversity in the controlled properties. There was a high diversity in the properties controlled by
the GMs, Diversity-index = .876, such that there is a very high probability that two different GMs will
control for different properties. Importantly, the high diversity in the controlled properties persists
even when the two most commonly controlled properties are excluded. The Gini-Simpson's index of diversity
was calculated for all controlled properties and stayed almost the same when we removed the
total surface area and item surface area, Diversity-indexWithout = .878.
All GMs controlled for intrinsic properties. GMs can control for intrinsic
properties, defined at the level of individual items, for extrinsic properties, defined at the
level of the whole array, or for both. All GMs controlled for at least one intrinsic property. Half of the methods
controlled only for intrinsic properties, and the other half controlled for both intrinsic and extrinsic
properties. Notably, not a single GM controlled only for extrinsic properties.
Low-to-medium similarity in some of the controlled properties. As seen in Figure 7, the two most
commonly controlled properties were intrinsic properties: total surface area and item surface area
were controlled for in 77% and 55% of the methods, respectively. Examining the union and intersection
of the GMs that controlled for the total surface area, the item surface area, or both
shows that approximately 36% of the GMs controlled for both properties, and 95%
controlled for at least one of them. Therefore, there is a medium similarity in the controlled intrinsic
properties. The most frequently controlled extrinsic properties were density and convex hull area. Only
18% of the methods controlled both density and convex hull area. Therefore, there is almost no
similarity among GMs in the controlled extrinsic properties.
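The union and intersection analysis can be sketched with plain sets. The GM identifiers below are hypothetical, and the set sizes (17, 12, and 8 of 22) are back-computed from the reported percentages, which is an assumption. The Jaccard index is added here only as one convenient summary of overlap; it is not a measure used in the analysis above.

```python
def overlap_summary(set_a, set_b, n_total):
    """Share of GMs controlling both properties, at least one, and their Jaccard overlap."""
    both = set_a & set_b
    either = set_a | set_b
    return {
        "both": len(both) / n_total,
        "at_least_one": len(either) / n_total,
        "jaccard": len(both) / len(either) if either else 0.0,
    }


# Hypothetical GM identifiers 0..21, arranged to match the reported percentages.
total_surface_area = set(range(0, 17))   # 17 of 22 GMs (~77%)
item_surface_area = set(range(9, 21))    # 12 of 22 GMs (~55%)
summary = overlap_summary(total_surface_area, item_surface_area, n_total=22)
print({name: round(value, 3) for name, value in summary.items()})
```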
Control types.
Most GMs used congruency, saliency, and sizes heterogeneity within as control types. The
proportion of the methods that used the various control types (see Methods section) is displayed in Figure
8. No single control type was used by all methods. The average relative frequency of the different
control types was approximately 46%, SEM = 13.667. The same three control types (congruency, sizes
heterogeneity within, and saliency) were each used by approximately 60-80% of the GMs. Fifty-nine
percent of the GMs used congruency, 68% used sizes heterogeneity within, and 77% used
saliency. Notably, 91% of the methods used at least one of
these control types, and 82% of the methods used a combination of them.
Therefore, there is a high similarity in the control types used by the different methods.
Figure 8
GMs Use of Various Control Types
Note. The figure depicts the relative frequency of use of the different control types by the GMs.
Ninety-one percent of the GMs used at least one of the following control types: congruency, saliency, and sizes
heterogeneity within. The x-axis presents the list of control types. The y-axis presents the relative
frequency of the GMs that used these control types. The control types are ordered from the most
frequently used to the least frequently used.
GMs employ a stable number of control types throughout the years. Throughout the years, the
average number of control types employed by GMs did not significantly increase, p = .863, with weak
evidence for the null hypothesis, BF01 = 2.573. The difference between the average number of control
types used by the GMs before and after 2011 was tested using the Mann-Whitney U test. The
Mann-Whitney U test (also known as the Wilcoxon rank sum test) is a non-parametric test for the
difference between two independent samples, denoted using the statistic U (Mann & Whitney, 1947;
McKnight & Najab, 2010; Wilcoxon, 1945). All in all, there was no significant difference in the number of
control types used by the GMs before and after 2011, U(13,9) = 45, p = .353, with weak Bayesian
support for the null hypothesis, BF01 = 1.962.
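For readers unfamiliar with the statistic, the following minimal sketch computes U by direct pair counting, with ties counted as one half. The two samples are hypothetical and are not the review's data, and the sketch stops at the statistic itself: it does not compute the p value or the Bayes factor.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b): U_a counts the pairs in which a value from sample_a
    exceeds a value from sample_b, with ties counted as 0.5."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


# Hypothetical numbers of control types used by GMs before and after a cutoff year.
before = [2, 3, 3, 2, 4]
after = [3, 4, 2, 3]
u_before, u_after = mann_whitney_u(before, after)
print(u_before, u_after)
```

The two U values always sum to the product of the sample sizes, and statistical software may report either of them, so a reported U is interpretable only together with the group sizes.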
Discussion
The current review examined various GMs with the following objectives in mind. First, we wanted to
describe the current state of the field in terms of its common language and ground. Second, we wanted
to accurately map the control types and properties controlled by the different GMs. Third, we wanted to
test whether the field is limited in its ability to compare studies and to examine previous studies
through new theoretical prisms. Finally, we wanted to provide guidelines for GMs.
Using a combination of automatic tools and human annotators who inspected the literature, we
found 22 GMs (Figure 5). These GMs used five different control types (Table 2) to control 13 physical
properties (Table 2). Some of the control types can be imposed on all properties and some on only a
portion of them. The control type has a low-to-no dependency on which properties were controlled for.
Consequently, this leads to a variety of GMs and inevitably to different stimuli sets and results
(Clayton et al., 2015; DeWind & Brannon, 2016; Gebuis & Reynvoet, 2011a; Katzin et al., 2019;
Kuzmina & Malykh, 2022; Salti et al., 2017; Smets et al., 2015).
The ability to compare and replicate previous results relies on proper definitions of the
controlled properties. A large portion of the GM publications lack proper definitions of the properties
they chose to control for. Our analysis showed that some GMs stated that they controlled certain properties
but did not define these properties. While some of the properties do not need definitions (mainly
those stemming from Euclidean geometry), other properties, such as density or the convex hull area,
do need an exact definition. Another problem is that some of the properties were inconsistently defined
between GMs. Even worse, some were inconsistently defined within the same publication. The lack of
consistency in property definitions between GMs is more harmful than the lack of definitions, as it
makes the GMs that controlled for the property incomparable. Inconsistent property definition within an
article not only makes it less replicable but also makes it impossible to understand the prism through which
the researchers conducted their study and the logic underlying the GM design. All in all, the lack of
proper definitions creates a lack of common language, prevents proper reproduction or reanalysis of
the results, and makes GMs incomparable to one another.
On the surface, the majority of GMs have used one of a few controls. All GMs but one
controlled one of two intrinsic properties: the individual items’ surface area or the items’ total surface
area (Figures 5 and 7). Notably, the items’ individual and total surface areas are similar properties that depend
on the squared radius of the items (as most of the GMs use dot-shaped items). In addition, most GMs
used the same control types (Figure 8), and the types of control have not changed throughout the years.
Nevertheless, although most of the GMs controlled for similar properties and used similar control
types, our mapping shows that the different GMs are incomparable. This is because each GM used
additional controls that were highly diverse. In fact, most properties had a low probability of being
controlled for. Finally, the number and diversity of the controlled properties increased throughout the
years.
There is the question of whether new ideas changed the way GMs were designed. Scientific
progress occurs when scientific ideas feed the scientific domain and contribute to its expansion and
development (Bird, 2000; Kuhn, 1970). Notably, throughout the years the number of controlled
properties has increased (Figure 6). The increase in the number of controlled properties provides
evidence for the assimilation of new ideas and findings into the field. Furthermore, during the second
decade of the reviewed period, new physical properties were added by new GMs. Yet, the high
diversity of controlled properties and the low probability that two GMs will control the same properties
suggests that the ideas coming from previous GMs are usually not adopted by later GMs.
It is worth focusing on two different properties that reflect opposing trends in the field. On one
hand, the adoption of the convex hull area shows that new ideas can be successfully implemented into
the field. In 2011, Gebuis and Reynvoet showed how the convex hull area biased numerical
comparisons, and since then others have highlighted the great need to control convex hull area in
numerical comparison tasks (Clayton et al., 2015; Rodríguez & Ferreira, 2023). We found that since the
convex hull area was introduced to the literature, the majority of the GMs have implemented it in their
design (De Marco & Cutini, 2020; DeWind & Brannon, 2019; Guillaume et al., 2020; Salti et al., 2017;
Zanon et al., 2021). On the other hand, the case of density provides a different picture. Although there
were attempts to control it in a substantial portion of the GMs (~59%), it was not carefully defined and
therefore poorly controlled for.
Taken together, the field shows a partial capability to improve GMs’ control and/or to
assimilate new ideas. We think that the progress of the numerical cognition field will be more efficient
if it is possible to compare previous ideas and test novel ideas in light of previous findings. The
ability to do so relies on the detail and
precision of the methods section. Additionally, providing the original study stimuli along with the
data would make it possible to inspect the results through the prisms of new GMs. However, we found
that only half of the GMs provided a way to reproduce their stimuli. Moreover, none of the GMs
provided their actual stimuli, let alone stimuli coupled with results.
This review inspects the GMs through the prism of the physical properties they controlled for. The
choice to focus on physical properties rests on two motivations. First, physical properties are
intertwined with numerosity, and therefore the initial control choices have major effects on the
experimental results. Second, we identified that the choice to control certain physical properties makes
studies incomparable, as it creates unique correlations between different physical properties that make
the stimuli incomparable. The guiding principle of this review was to conduct an inductive, data-driven
review, as opposed to a narrative-driven review. However, our focus on physical properties necessarily
limited our scope and created priors for the rest of the process. There are other aspects that should be
considered when designing a GM but were not reviewed here.
Throughout this review, we considered all non-symbolic numerical comparative judgments as
equal, regardless of the stimuli presentation mode. However, there is a variety of ways to present the
stimuli. For instance, the two compared arrays can be displayed simultaneously or sequentially. Some
have shown that when the same arrays were presented using either simultaneous, sequential, or
intermixed modes of presentation, participants exhibited different reaction times, accuracy, and Weber
fraction patterns (Kuzmina & Malykh, 2022; Norris & Castronovo, 2016; Price et al., 2012; Smets et
al., 2016). In an intermixed presentation mode, two arrays in different colors are overlaid in
the same display (Halberda et al., 2008). Figure 9 presents simultaneous and intermixed displays. The
results obtained using these different presentation modes are not comparable (but see Smets et al., 2016),
because different presentation modes might measure different types of cognitive processes, such as
serial or parallel processing (Townsend, 1990).
Figure 9
Two Stimuli Presentation Modes Involve Different Stimuli Control
Note. The stimuli can be presented using simultaneous (Panel A) and intermixed (Panel B) presentation
modes. Importantly, these modes of presentation involve different control of the stimuli’s physical
properties.
Top-down strategic effects and learning effects might bias participants to attend to different
physical properties according to task goals (Salti et al., 2019). Specifically, using the same stimuli but
changing the task goals influences the results. Changing the task from a numerical task in which the
participants' goal is to choose the more numerous array, to a physical task in which the participants'
goal is to choose the larger array, changes the effect of physical properties on performance (Katzin et
al., 2019; Leibovich et al., 2015; Leibovich-Raveh et al., 2018; Salti et al., 2017). Avitan et al. (2022)
found that changing the task goal to choose a smaller magnitude (quantity or area) instead of the larger
magnitude modulated participants' performance and interacted with task type (numerical or physical).
Leibovich-Raveh et al. (2018) manipulated participants' emphasis on accuracy or speed during a
numerical comparison task and discovered that the effect of different physical properties was
dependent on participants' emphasis on accuracy or speed. Furthermore, the experimental block design
should be considered when designing a numerical comparison experiment. Pekár and Kinder (2020)
generated their stimuli using Gebuis and Reynvoet’s (2011a) GM and found increased congruency
effects when the different stimuli sets were mixed in the same block in comparison to when the
different stimuli sets were displayed in different blocks. Tokita and Ishiguchi (2010) found that the
effect of physical properties was modulated by practice. Thus, different block designs and amounts of
practice induce different effects and should be carefully compared to one another.
This review focused on the control of physical properties. We suggest paying attention to, and
carefully controlling for, the stimuli presentation mode and strategic and learning effects when
designing GMs and running a numerical comparison study. A robust body of evidence supports a
relationship between stimuli design and top-down strategic effects, and therefore future studies should
pursue this direction. The problem of controlling physical properties in non-symbolic stimuli is
relevant regardless of the stimuli presentation mode, as different stimuli sets create different
correlations between the numerical and physical dimensions. In fact, the discrepancy between different
stimuli sets and the theoretical prisms guiding their design is apparent when comparing simultaneous
and intermixed displays. For example, it is not clear whether the intermixed stimuli (Figure 9, Panel B)
consist of two different arrays with different convex hull areas or whether we should account for only one
convex hull that encompasses both the yellow and the blue dot arrays.
Guidelines
Taken together, the variety of different GMs, properties, and control types reflects a theoretical and
methodological wealth. This wealth makes it hard to maintain a common language within the field, but
its advantages are apparent. The field is thriving with an increasing number of articles (Figure 3), and it
has a large number of avenues and ideas that could be pursued.
We see the diversity in the field as a strength, and we do not advocate using one GM or any
other way to impose a common language (but see De Marco & Cutini, 2020; Zanon et al., 2021, for a
different opinion). Yet, currently, it is very hard to compare GMs. Accordingly, we recommend that
authors provide an explicit and consistent definition of all controlled properties. Providing definitions
will improve the replicability, interpretability, and explainability of previous studies (Broniatowski,
2021), such that previous ideas will be translatable in light of newer views (Almaatouq et al., 2022;
Rocca & Yarkoni, 2021).
We also suggest providing access to the code package used to generate the stimuli. Providing
access to the original code will allow testing further questions and issues not tested in the original
work. It will also provide a way to assess how hard or unnatural producing the stimuli was, as the
control of some properties requires imposing numerous constraints on the stimuli. It is very hard to
predict how controlling for one or more properties might affect other properties because different
properties are intercorrelated with one another in many ways (De Marco & Cutini, 2020; Salti et al.,
2017). Accordingly, we suggest that authors should examine the correlations among the different
properties after the stimuli are produced. In addition, we suggest providing the actual stimuli alongside
the behavioral and neuronal responses (if applicable) to each stimulus. New ideas could be tested by
reanalyzing previous studies when both the stimuli and their corresponding responses are provided
(e.g., Shilat et al., 2021). These recommendations are in line with open-source practices, which increase
study reliability and provide community-based knowledge-sharing (AlMarzouq et al., 2005). Moreover,
these recommendations will hopefully enable researchers to better understand results obtained using
different methods and provide more generality and reproducibility (Schooler, 2014) to the field. Finally,
these recommendations allow diversity on one hand and an ability to examine old data through new
prisms on the other hand. Suggested guidelines for GMs appear in Box 3 below.
Box 3
Guidelines
1. Property definitions
Provide explicit and consistent definitions of all controlled properties or control types.
2. Providing stimuli generation code
Provide the code package used to generate the stimuli.
3. Post-production analysis
Examine the correlations among different properties after producing the stimuli.
4. Providing stimuli and responses
Provide the actual stimuli alongside the corresponding results to each stimulus.
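The post-production analysis in guideline 3 can be sketched as follows. The sketch builds a hypothetical stimulus set in which total surface area is held constant across numerosities and then reports the pairwise Pearson correlations among number, item surface area, and total circumference. The generation rule, the property names, and the target area are illustrative assumptions and do not correspond to any specific GM.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe_array(radii):
    """Physical properties of one array of dot-shaped items, given the item radii."""
    return {
        "number": len(radii),
        "item_surface_area": sum(math.pi * r ** 2 for r in radii) / len(radii),
        "total_circumference": sum(2 * math.pi * r for r in radii),
    }


def correlation_table(arrays):
    """Pairwise correlations among all described properties across arrays."""
    rows = [describe_array(radii) for radii in arrays]
    names = list(rows[0])
    table = {}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            table[(first, second)] = pearson_r(
                [row[first] for row in rows], [row[second] for row in rows]
            )
    return table


# Hypothetical GM: total surface area is equated, so item size shrinks as number grows.
TARGET_TOTAL_SURFACE_AREA = 50.0
arrays = []
for number in range(5, 31):
    radius = math.sqrt(TARGET_TOTAL_SURFACE_AREA / (number * math.pi))
    arrays.append([radius] * number)

table = correlation_table(arrays)
for pair, r in table.items():
    print(pair, round(r, 3))
```

In this toy set, equating total surface area leaves number strongly positively correlated with total circumference and negatively correlated with item surface area, which is the kind of side effect that only becomes visible when the correlations are examined after the stimuli are produced.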
Acknowledgments The authors thank Dr. Tali Leibovich-Raveh for her valuable insights.
Furthermore, we thank Ms. Adi Gabzu for her important insights and help with the data curation and
tagging. Finally, we wish to thank the lovely Mrs. Desiree Meloul for her enlightening insights and
editing the drafts of the article.
Supplementary Information The online version contains supplementary material.
Authors' Contributions Conceptualization - AH, MS, YS; Methodology - YS; Investigation - HG,
SW, YS; Visualization - SW, YS; Supervision - AH, MS; Writing original draft - MS, YS; Reviewing
and editing - AH, HG, MS, SW, YS.
Funding This research did not receive any specific grant from funding agencies in the public,
commercial, or not-for-profit sectors.
Data and Code Availability The datasets generated and analyzed during this study are publicly
available as a part of the Supplementary Information.
Declarations
Conflicts of Interest The authors declare that there is no conflict of interest.
Ethics Approval Not applicable.
Consent to Participate Not applicable.
Consent for Publication Not applicable.
Competing Interests The authors declare no competing interests.
References
Almaatouq, A., Griffiths, T. L., Suchow, J. W., Whiting, M. E., Evans, J., & Watts, D. J. (2022). Beyond
Playing 20 Questions with Nature: Integrative experiment design in the social and behavioral
sciences. Behavioral and Brain Sciences, 1-55. https://doi.org/10.1017/S0140525X22002874
AlMarzouq, M., Zheng, L., Rong, G., & Grover, V. (2005). Open source: Concepts, benefits, and
challenges. Communications of the Association for Information Systems, 16, 756-784.
https://doi.org/10.17705/1CAIS.01637
Aulet, L. S., & Lourenco, S. F. (2021). The relative salience of numerical and non-numerical dimensions
shifts over development: A re-analysis of Tomlinson, DeWind, and Brannon (2020). Cognition,
210, 104610. https://doi.org/10.1016/j.cognition.2021.104610
Avitan, A., Galili, H., & Henik, A. (2022). Less is more? Instructions modulate the way we interact with
continuous features in non-symbolic dot-array comparison tasks. SSRN.
https://doi.org/10.2139/ssrn.4065685
Bird, A. (2000). Thomas Kuhn (1st ed.). Acumen.
Boreman, G. D. (2021). Modulation transfer function in optical and electro-optical systems (2nd ed.).
SPIE Press.
Broniatowski, D. A. (2021). Psychological Foundations of Explainability and Interpretability in
Artificial Intelligence. National Institute of Standards and Technology.
https://doi.org/10.6028/NIST.IR.8367
Clayton, S., Gilmore, C., & Inglis, M. (2015). Dot comparison stimuli are not all alike: The effect of
different visual controls on ANS measurement. Acta Psychologica, 161, 177–184.
https://doi.org/10.1016/j.actpsy.2015.09.007
Dakin, S. C., Tibber, M. S., Greenwood, J. A., Kingdom, F. A. A., & Morgan, M. J. (2011). A common
visual metric for approximate number and density. Proceedings of the National Academy of
Sciences, 108(49), 19552–19557. https://doi.org/10.1073/pnas.1113195108
Danner, H. G. (2014). A thesaurus of English word roots. Rowman & Littlefield.
De Marco, D., & Cutini, S. (2020). Introducing CUSTOM: A customized, ultraprecise, standardization-oriented, multipurpose algorithm for generating nonsymbolic number stimuli. Behavior
Research Methods, 52(4), 1528–1537. https://doi.org/10.3758/s13428-019-01332-z
De Valois, K. K., & Switkes, E. (1980). Spatial frequency specific interaction of dot patterns and
gratings. Proceedings of the National Academy of Sciences, 77(1), 662–665.
https://doi.org/10.1073/pnas.77.1.662
Dehaene, S. (1999). The number sense: How the mind creates mathematics (1st ed.). Oxford University
Press.
DeWind, N. K., Adams, G. K., Platt, M., & Brannon, E. (2015). Modeling the approximate number
system to quantify the contribution of visual stimulus features. Cognition, 142, 247–265.
https://doi.org/10.1016/j.cognition.2015.05.016
DeWind, N. K., & Brannon, E. M. (2016). Significant inter-test reliability across approximate number
system assessments. Frontiers in Psychology, 7, Article 310.
https://doi.org/10.3389/fpsyg.2016.00310
DeWind, N. K., & Brannon, E. M. (2019, January 7). Measuring congruence effects in nonsymbolic
number comparison: The importance of the degree of congruence. Methods in Numerical
Cognition Workshop, Budapest, Hungary. https://osf.io/ds2h7
Efford, N. (2000). Digital image processing: A practical introduction using Java (1st ed.). Addison-Wesley.
Egner, T. (2007). Congruency sequence effects and cognitive control. Cognitive, Affective, &
Behavioral Neuroscience, 7(4), 380–390. https://doi.org/10.3758/CABN.7.4.380
French, R. S. (1953). The discrimination of dot patterns as a function of number and average separation
of dots. Journal of Experimental Psychology, 46(1), 1–9. https://doi.org/10.1037/h0059543
Fricke, S. (2018). Semantic Scholar. Journal of the Medical Library Association, 106(1), 145-147.
https://doi.org/10.5195/jmla.2018.280
Frith, C. D., & Frith, U. (1972). The solitaire illusion: An illusion of numerosity. Perception &
Psychophysics, 11(6), 409–410. https://doi.org/10.3758/BF03206279
Gebuis, T., & Reynvoet, B. (2011a). Generating nonsymbolic number stimuli. Behavior Research
Methods, 43(4), 981–986. https://doi.org/10.3758/s13428-011-0097-5
Gebuis, T., & Reynvoet, B. (2011b). The interplay between nonsymbolic number and its continuous
visual properties. Journal of Experimental Psychology: General, 141(4), 642–648.
https://doi.org/10.1037/a0026218
Groeneveld, R. A., & Meeden, G. (1984). Measuring skewness and kurtosis. The Statistician, 33(4), 391–399. https://doi.org/10.2307/2987742
Guillaume, M., Schiltz, C., & Rinsveld, A. V. (2020). NASCO: A New Method and Program to
Generate Dot Arrays for Non-Symbolic Number Comparison Tasks. Journal of Numerical
Cognition, 6(1), 129–147. https://doi.org/10.5964/JNC.V6I1.231
Gusenbauer, M. (2019). Google Scholar to overshadow them all? Comparing the sizes of 12 academic
search engines and bibliographic databases. Scientometrics, 118(1), 177–214.
https://doi.org/10.1007/s11192-018-2958-5
Gusenbauer, M., & Haddaway, N. R. (2020). Which academic search systems are suitable for systematic
reviews or meta‐analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26
other resources. Research Synthesis Methods, 11(2), 181–217. https://doi.org/10.1002/jrsm.1378
Guy, G., & Medioni, G. (1993). Inferring global perceptual contours from local features. Proceedings of
IEEE Conference on Computer Vision and Pattern Recognition, 786–787.
https://doi.org/10.1109/CVPR.1993.341175
Halberda, J., & Feigenson, L. (2008). Developmental change in the acuity of the “number sense”: The
approximate number system in 3-, 4-, 5-, and 6-year-olds and adults. Developmental
Psychology, 44(5), 1457–1465. https://doi.org/10.1037/a0012682
Halberda, J., Mazzocco, M. M. M., & Feigenson, L. (2008). Individual differences in non-verbal number
acuity correlate with maths achievement. Nature, 455(7213), 665–668.
https://doi.org/10.1038/nature07246
Huntley-Fenner, G., & Cannon, E. (2000). Preschoolers’ magnitude comparisons are mediated by a
preverbal analog mechanism. Psychological Science, 11(2), 147–152.
https://doi.org/10.1111/1467-9280.00230
Jones, N. (2015). Artificial-intelligence institute launches free science search engine. Nature, 10.
https://doi.org/10.1038/nature.2015.18703
Kadosh, R. C., Kadosh, K. C., & Henik, A. (2008). When brightness counts: The neuronal correlate of
numerical–luminance interference. Cerebral Cortex, 18(2), 337-343.
https://doi.org/10.1093/cercor/bhm058
Katzin, N., Katzin, D., Rosen, A., Henik, A., & Salti, M. (2020). Putting the world in mind: The case of
mental representation of quantity. Cognition, 195, Article 104088.
https://doi.org/10.1016/j.cognition.2019.104088
Katzin, N., Salti, M., & Henik, A. (2019). Holistic processing of numerical arrays. Journal of
Experimental Psychology: Learning, Memory, and Cognition, 45(6), 1014–1022.
https://doi.org/10.1037/xlm0000640
Kaur, A., Sharma, R., Mishra, P., Sinhababu, A., & Chakravarty, R. (2022). Visual research discovery
using connected papers: A use case of blockchain in libraries. The Serials Librarian, 83(2), 186–196.
Keylock, C. J. (2005). Simpson diversity and the Shannon-Wiener index as special cases of a
generalized entropy. Oikos, 109(1), 203–207. https://doi.org/10.1111/j.0030-1299.2005.13735.x
Kuhn, T. S. (1970). The structure of scientific revolutions (2nd ed.). University of Chicago Press.
Kuzmina, Y., & Malykh, S. (2022). The effect of visual parameters on nonsymbolic numerosity
estimation varies depending on the format of stimulus presentation. Journal of Experimental
Child Psychology, 224, Article 105514. https://doi.org/10.1016/j.jecp.2022.105514
Lande, R. (1996). Statistics and partitioning of species diversity, and similarity among multiple
communities. Oikos, 76(1), 5-13. https://doi.org/10.2307/3545743
Lea, D. (Ed.). (2008). Oxford learner’s thesaurus: A dictionary of synonyms (1st ed.). Oxford University
Press.
Leibovich, T., Henik, A., & Salti, M. (2015). Numerosity processing is context driven even in the
subitizing range: An fMRI study. Neuropsychologia, 77, 137–147.
https://doi.org/10.1016/j.neuropsychologia.2015.08.016
Leibovich, T., Katzin, N., Harel, M., & Henik, A. (2017). From “sense of number” to “sense of
magnitude”: The role of continuous magnitudes in numerical cognition. Behavioral and Brain
Sciences, 40, e164. https://doi.org/10.1017/S0140525X16000960
Leibovich-Raveh, T., Stein, I., Henik, A., & Salti, M. (2018). Number and continuous magnitude
processing depends on task goals and numerosity ratio. Journal of Cognition, 1(1), Article 19.
https://doi.org/10.5334/joc.22
Lourenco, S. F., Bonny, J. W., Fernandez, E. P., & Rao, S. (2012). Nonsymbolic number and cumulative
area representations contribute shared and unique variance to symbolic math competence.
Proceedings of the National Academy of Sciences, 109(46), 18737–18742.
https://doi.org/10.1073/pnas.1207212109
Lyons, I. M., Price, G. R., Vaessen, A., Blomert, L., & Ansari, D. (2014). Numerical predictors of
arithmetic success in grades 1-6. Developmental Science, 17(5), 714–726.
https://doi.org/10.1111/desc.12152
MacGillivray, H. L. (1986). Skewness and asymmetry: Measures and orderings. The Annals of
Statistics, 14(3), 994-1011. https://doi.org/10.1214/aos/1176350046
Mann, H. B., & Whitney, D. R. (1947). On a test of whether one of two random variables is
stochastically larger than the other. The Annals of Mathematical Statistics, 18(1), 50–60.
http://www.jstor.org/stable/2236101
Marchant, A. P., Simons, D. J., & de Fockert, J. W. (2013). Ensemble representations: Effects of set size
and item heterogeneity on average size perception. Acta Psychologica, 142(2), 245–250.
https://doi.org/10.1016/j.actpsy.2012.11.002
Mareschal, I., & Baker, C. L. (1998). A cortical locus for the processing of contrast-defined contours.
Nature Neuroscience, 1(2), 150–154. https://doi.org/10.1038/401
McKnight, P. E., & Najab, J. (2010). Mann‐Whitney U Test. In I. B. Weiner & W. E. Craighead (Eds.),
The Corsini encyclopedia of psychology (1st ed.). Wiley.
https://doi.org/10.1002/9780470479216.corpsy0524
Mehler, J., & Bever, T. G. (1967). Cognitive capacity of very young children. Science, 158(3797), 141–
142. https://doi.org/10.1126/science.158.3797.141
Navon, D. (1977). Forest before trees: The precedence of global features in visual perception. Cognitive
Psychology, 9(3), 353–383. https://doi.org/10.1016/0010-0285(77)90012-3
Norris, J., & Castronovo, J. (2016). Dot display affects approximate number system acuity and
relationships with mathematical achievement and inhibitory control. PLoS ONE, 11(5),
e0155543. https://doi.org/10.1371/journal.pone.0155543
Pearson, K. (1895). X. Contributions to the mathematical theory of evolution.—II. Skew variation in
homogeneous material. Philosophical Transactions of the Royal Society of London (A), 186,
343–414. https://doi.org/10.1098/rsta.1895.0010
Pekár, J., & Kinder, A. (2020). The interplay between non-symbolic number and its continuous visual
properties revisited: Effects of mixing trials of different types. Quarterly Journal of
Experimental Psychology, 73(5), 698–710. https://doi.org/10.1177/1747021819891068
Piaget, J. (1968). Quantification, conservation, and nativism: Quantitative evaluations of children aged
two to three years are examined. Science, 162(3857), 976–979.
https://doi.org/10.1126/science.162.3857.976
Piazza, M., Izard, V., Pinel, P., Bihan, D. L., & Dehaene, S. (2004). Tuning curves for approximate
numerosity in the human intraparietal sulcus. Neuron, 44(3), 547–555.
https://doi.org/10.1016/j.neuron.2004.10.014
Pinel, P., Piazza, M., Le Bihan, D., & Dehaene, S. (2004). Distributed and overlapping cerebral
representations of number, size, and luminance during comparative judgments. Neuron, 41(6),
983-993. https://doi.org/10.1016/S0896-6273(04)00107-2
Price, G. R., Palmer, D., Battista, C., & Ansari, D. (2012). Nonsymbolic numerical magnitude
comparison: Reliability and validity of different task variants and outcome measures, and their
relationship to arithmetic achievement in adults. Acta Psychologica, 140(1), 50–57.
https://doi.org/10.1016/j.actpsy.2012.02.008
Rocca, R., & Yarkoni, T. (2021). Putting psychology to the test: Rethinking model evaluation through
benchmarking and prediction. Advances in Methods and Practices in Psychological Science,
4(3), 1-24. https://doi.org/10.1177/25152459211026864
Rodríguez, C., & Ferreira, R. A. (2023). To what extent is dot comparison an appropriate measure of
approximate number system? Frontiers in Psychology, 13, Article 1065600.
https://doi.org/10.3389/fpsyg.2022.1065600
Ross, J. (2003). Visual discrimination of number without counting. Perception, 32(7), 867–870.
https://doi.org/10.1068/p5029
Rousselle, L., & Noël, M.-P. (2008). The development of automatic numerosity processing in
preschoolers: Evidence for numerosity-perceptual interference. Developmental Psychology,
44(2), 544–560. https://doi.org/10.1037/0012-1649.44.2.544
Rousselle, L., Palmers, E., & Noël, M.-P. (2004). Magnitude comparison in preschoolers: What counts?
Influence of perceptual variables. Journal of Experimental Child Psychology, 87(1), 57–84.
https://doi.org/10.1016/j.jecp.2003.10.005
Salti, M., Harel, A., & Marti, S. (2019). Conscious perception: Time for an update? Journal of Cognitive
Neuroscience, 31(1), 1–7. https://doi.org/10.1162/jocn_a_01343
Salti, M., Katzin, N., Katzin, D., Leibovich, T., & Henik, A. (2017). One tamed at a time: A new
approach for controlling continuous magnitudes in numerical comparison tasks. Behavior
Research Methods, 49(3), 1120–1127. https://doi.org/10.3758/s13428-016-0772-7
Schooler, J. W. (2014). Metascience could rescue the ‘replication crisis.’ Nature, 515(7525), 9–9.
https://doi.org/10.1038/515009a
Shilat, Y., Salti, M., & Henik, A. (2021). Shaping the way from the unknown to the known: The role of
convex hull shape in numerical comparisons. Cognition, 217, Article 104893.
https://doi.org/10.1016/j.cognition.2021.104893
Simpson, E. H. (1949). Measurement of diversity. Nature, 163(4148), 688–688.
https://doi.org/10.1038/163688a0
Smets, K., Moors, P., & Reynvoet, B. (2016). Effects of presentation type and visual control in
numerosity discrimination: Implications for number processing? Frontiers in Psychology, 7,
Article 66. https://doi.org/10.3389/fpsyg.2016.00066
Smets, K., Sasanguie, D., Szücs, D., & Reynvoet, B. (2015). The effect of different methods to construct
non-symbolic stimuli in numerosity estimation and comparison. Journal of Cognitive
Psychology, 27(3), 310–325. https://doi.org/10.1080/20445911.2014.996568
Soltész, F., Szűcs, D., & Szűcs, L. (2010). Relationships between magnitude representation, counting
and memory in 4- to 7-year-old children: A developmental study. Behavioral and Brain
Functions, 6, Article 13. https://doi.org/10.1186/1744-9081-6-13
Sophian, C., & Chu, Y. (2008). How do people apprehend large numerosities? Cognition, 107(2), 460–
478. https://doi.org/10.1016/j.cognition.2007.10.009
Stalnaker, R. (2002). Common ground. Linguistics and Philosophy, 25(5/6), 701–721.
https://doi.org/10.1023/A:1020867916902
Tokita, M., & Ishiguchi, A. (2010). How might the discrepancy in the effects of perceptual variables on
numerosity judgment be reconciled? Attention, Perception & Psychophysics, 72(7), 1839–1853.
https://doi.org/10.3758/APP.72.7.1839
Townsend, J. T. (1990). Serial vs. parallel processing: Sometimes they look like Tweedledum and
Tweedledee but they can (and should) be distinguished. Psychological Science, 1(1), 46–54.
https://doi.org/10.1111/j.1467-9280.1990.tb00067.x
Tran, U. S., Lallai, T., Gyimesi, M., Baliko, J., Ramazanova, D., & Voracek, M. (2021). Harnessing the
fifth element of distributional statistics for psychological science: A practical primer and shiny
app for measures of statistical inequality and concentration. Frontiers in Psychology, 12,
Article 716164. https://doi.org/10.3389/fpsyg.2021.716164
Trezona, P. W. (2000). Luminance: Its use and misuse. Color Research & Application, 25(2), 145–147.
https://doi.org/10.1002/(SICI)1520-6378(200004)25:2<145::AID-COL9>3.0.CO;2-0
Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80-83.
https://doi.org/10.2307/3001968
Xu, F., & Spelke, E. S. (2000). Large number discrimination in 6-month-old infants. Cognition, 74(1),
B1–B11. https://doi.org/10.1016/S0010-0277(99)00066-9
Yousif, S. R., & Keil, F. C. (2019). The additive-area heuristic: An efficient but illusory means of visual
area approximation. Psychological Science, 30(4), 495–503.
https://doi.org/10.1177/0956797619831617
Zanon, M., Potrich, D., Bortot, M., & Vallortigara, G. (2021). Towards a standardization of non-symbolic numerical experiments: GeNEsIS, a flexible and user-friendly tool to generate
controlled stimuli. Behavior Research Methods, 54, 146–157. https://doi.org/10.3758/s13428-021-01580-y