1 s2.0 S0749596X16300730 Main
Article info

Article history:
Received 14 December 2015
Revision received 24 August 2016
Available online 12 September 2016

Keywords:
Reading
Frequency effects
Computational modelling
Individual differences
Bilingualism
Lifespan development

Abstract

Individuals show differences in the extent to which psycholinguistic variables predict their responses for lexical processing tasks. A key variable accounting for much variance in lexical processing is frequency, but the size of the frequency effect has been demonstrated to reduce as a consequence of the individual’s vocabulary size. Using a connectionist computational implementation of the triangle model on a large set of English words, where orthographic, phonological, and semantic representations interact during processing, we show that the model demonstrates a reduced frequency effect as a consequence of amount of exposure to the language, a variable that was also a cause of greater vocabulary size in the model. The model was also trained to learn a second language, Dutch, and replicated behavioural observations that increased proficiency in a second language resulted in reduced frequency effects for that language but increased frequency effects in the first language. The model provides a first step to demonstrating causal relations between psycholinguistic variables in a model of individual differences in lexical processing, and the effect of bilingualism on interacting variables within the language processing system.

© 2016 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
http://dx.doi.org/10.1016/j.jml.2016.08.003
0749-596X/© 2016 The Authors. Published by Elsevier Inc.
2 P. Monaghan et al. / Journal of Memory and Language 93 (2017) 1–21
The frequency effect is typically treated in analyses as a random effect as if variance across participants is random. Hence, until very recently, frequency effects have tended to be related to mean group responses to individual words, rather than appraised in terms of individuals responding to individual words. However, in the first study on the phenomenon it was already reported that the frequency effect differed between participants who had small and large vocabularies. In a largely overlooked paper, Preston (1935) was the first to examine the word frequency effect. She measured the “speed of word perception” for familiar and unfamiliar words of the same length. The stimulus words consisted of 50 familiar and 50 unfamiliar six-letter two-syllable words chosen on the basis of Thorndike’s (1931) 20,000 Word List. The familiar words were selected from the 1500 highest words of the list (i.e., those used most frequently in printed matter). The unfamiliar words were selected from the 19th and the 20th thousand lowest words. Speed of word perception was “measured by the time between the exposing of a stimulus word and the verbal reading of it” (nowadays called a word naming task). Eighty-one members of elementary psychology classes at the University of Minnesota served as participants. Their average “perception time” for the familiar words was 578 ms; that for the unfamiliar words 691 ms.

A second purpose of Preston’s study was “the study of the relation of various measures of reading ability to speed of word perception.”¹ The reading ability of the participants was determined by the administration of the Vocabulary Test of the Minnesota Reading Examination, the Chapman Cook Speed of Reading Test, and Test V of the Iowa Silent Reading tests. The first test contained 100 words with five possible definitions from which examinees had to select the correct definition. In the Chapman Cook Speed of Reading Test participants were presented with 25 short paragraphs in which one word spoiled (sic) the paragraphs. Participants had to find as many intruder words as possible in 2.5 min and cross out these words. Test V of the Iowa Silent Reading tests was a paragraph comprehension test, in which 12 paragraphs had to be read and 3 questions answered per paragraph. Preston observed significant negative correlations between the language proficiency test scores and the word perception response times, with the highest correlation between vocabulary size and word perception response times, and the lowest correlation between text comprehension and word perception response times. The correlation was higher for the unfamiliar words than the familiar words (e.g., the correlation between vocabulary size and word perception response time was .508 for the unfamiliar words, and .412 for the familiar words). In other words, the relation between vocabulary size and response times was greater for low- than high-frequency words, suggesting that individual differences in reading responses may reduce as a consequence of exposure.

Preston’s (1935) paper was not mentioned in Howes and Solomon’s (1951) article examining the relationship between word frequency and visual duration thresholds in a word identification task. This publication is (erroneously) considered to be the start of word frequency research by many researchers. In two experiments, Howes and Solomon presented evidence that the visual duration threshold in word identification decreased as a function of the logarithm of word frequency (also based on Thorndike’s counts). Importantly, and unfortunately, no individual differences were examined and the word frequency effect was presented as a group effect, assumed to be observed to the same degree in all participants. Howes and Solomon’s view has dominated the literature, even though occasionally differences in the frequency effect between groups have been investigated (e.g., Chateau & Jared, 2000; Lewellen, Goldinger, Pisoni, & Greene, 1993; Sears, Siakaluk, Chow, & Buchanan, 2008).

Our own interest in individual differences in the word frequency effect arose from a series of experiments published by Yap, Balota, Tse, and Besner (2008).² In this article the authors presented data from three different universities on the same lexical decision task. Table 1 gives a summary of the finding that caught our attention. As in Preston’s (1935) study, students with a smaller vocabulary size had longer reaction times and, more importantly, showed a larger frequency effect.

The influence of vocabulary size on the frequency effect was later replicated in a large-scale analysis of individual differences in the English Lexicon Project (Yap, Balota, Sibley, & Ratcliff, 2012).

At first sight, it seems surprising that people with a larger vocabulary are more efficient at activating the correct representation than those with a smaller vocabulary, given that they have to select among more candidates in the vocabulary (Lewellen et al., 1993). Still, there are at least four mechanisms that may contribute to the effect. The first is that a larger frequency effect may be a side-effect of longer reaction times (RTs; Faust, Balota, Spieler, & Ferraro, 1999): Comparing the data from Yap et al. (2008) shown in Table 1, 678 ms is 11% longer than 612 ms, and 844 ms is 15% longer than 732 ms. If we assume that part of the RT to words is not due to word processing but to constant durations such as those involved in stimulus transmission and action planning and performance, it could even be possible that the proportional increase between low and high frequency words is the same across the groups. For the example at hand, this would be the case when the constant time period for stimulus transmission and action is around 438 ms, as then for the lowest vocabulary group the stimulus processing time would be 240 ms [678–438], and 174 ms for the highest vocabulary group, which is a difference of 38%. For the low frequency words, the processing times for the lowest and highest vocabulary groups would be 406 ms [844–438] and 294 ms [732–438], which is again 38% more. Thus, it is feasible that vocabulary size affects word processing speed generally, rather than affecting the variance associated with the frequency effect.

¹ There was also a third purpose: To determine the test-retest reliability of the speed of word perception measure by asking participants to name the words twice with six days or more in-between. The reliability was .93.

² Just like many other researchers, we were until recently unaware of the Preston (1935) paper. We thank Andy Ellis for pointing it out to us.
Table 1
Frequency effect of 3 groups of students with different vocabulary sizes on the same lexical decision task, based on Yap et al. (Experiments 2–4, clear
presentation condition).
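As a quick check of the proportional-slowing argument, the four mean RTs quoted in the text can be re-expressed after subtracting the assumed 438 ms constant. The sketch below is illustrative only: the pairing of 678/612 ms with high frequency words and 844/732 ms with low frequency words is inferred from the arithmetic in the text rather than read from Table 1, and the dictionary keys and function names are hypothetical.

```python
# Mean lexical decision RTs (ms) quoted in the text for the lowest- and
# highest-vocabulary groups. The frequency labels are an inference from the
# text's arithmetic, not values copied from Table 1.
RTS_MS = {
    "low_vocab": {"high_freq": 678, "low_freq": 844},
    "high_vocab": {"high_freq": 612, "low_freq": 732},
}

# Constant, non-lexical portion of the RT assumed in the text's example.
BASELINE_MS = 438


def raw_frequency_effect(group):
    """Low- minus high-frequency RT, in ms."""
    return group["low_freq"] - group["high_freq"]


def proportional_frequency_effect(group, baseline=BASELINE_MS):
    """Ratio of low- to high-frequency processing time once the baseline is removed."""
    return (group["low_freq"] - baseline) / (group["high_freq"] - baseline)


if __name__ == "__main__":
    for name, group in RTS_MS.items():
        print(name, raw_frequency_effect(group),
              round(proportional_frequency_effect(group), 2))
```

Under this assumption the raw frequency effects differ between the groups (166 ms versus 120 ms), whereas the proportional effects are nearly identical, which is the point of the first mechanism.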
A second explanation for individual differences in the frequency effect could be that the more efficient retrieval operation in people with large vocabulary sizes is due to their higher intelligence. Indeed, vocabulary tests are used as a part of measures of intelligence, and load on g (Wechsler, 2008), and g in turn relates to processing speed (Salthouse, 1996). So, the relation between the frequency effect and vocabulary size could be an artefact of intelligence. However, this interpretation received a serious setback when it was observed that exactly the same function accounts for the relation between vocabulary size and frequency effects in second language (L2) processing as in first language (L1) processing (Brysbaert, Lagrou, & Stevens, in press; Diependaele, Lemhöfer, & Brysbaert, 2013). The frequency effect is larger in L2 than L1, but this difference disappears when vocabulary size is taken into account. The apparently larger effect of frequency in L2 is thus because people generally know fewer words in L2 than in L1. It is difficult to maintain that people would be less intelligent in L2.

A third possible contribution to the correlation between vocabulary size and the frequency effect relates to differences in the type of input. Some people may be exposed to more varied input than others. For instance, it is well established that written language comprises a more varied vocabulary than spoken language (for reviews, see Kuperman & Van Dyke, 2013; Pfost, Dörfler, & Artelt, 2013), at least partially because word repetition is tolerated in speech but not in writing. However, even when modality of input is controlled, Kuperman and Van Dyke (2013) showed that a larger input is associated with relatively more exposure to low frequency words.

Finally, it could be the case that higher exposure by itself is enough to explain the smaller word frequency effect, without any need for extra variables. In that scenario, both the small frequency effect and the large vocabulary size would be consequences of language exposure, which has a larger effect on the efficiency of word retrieval than on the cost of interword competition. Such a view would be by far the simplest interpretation and, hence, it is worthwhile to examine whether it can be observed in computational models of word processing.

The type of computational model best suited to investigate learning effects consists of the distributed connectionist models (Chang, Furber, & Welbourne, 2012; Harm & Seidenberg, 2004; Monaghan & Ellis, 2010; Plaut, McClelland, Seidenberg, & Patterson, 1996; Welbourne & Lambon Ralph, 2007). In these models, words are not represented as localist representations (nodes in a network), but as activation patterns across orthographic, phonological and semantic layers. The connection weights between the layers determine the efficiency with which one representation can activate the other. These depend on a number of factors, including the number of times an item has been presented to the model. Stimuli that are often presented succeed in a greater accumulation of adaptation of the weights in the network, so that the output they generate resembles the desired output to a closer extent. In contrast, stimuli with a low presentation probability have less impact on the organisation of the network and take more time to be effectively learned, resulting in larger error as the model attempts to produce phonological or semantic representations from a given orthographic input. As a result, distributed networks are able to simulate frequency effects without any requirement of the researcher to introduce a frequency dependent parameter (see, e.g., Harm & Seidenberg, 1999; Seidenberg & McClelland, 1989). In these models, high frequency words are processed more accurately than low frequency words because the connections supporting learning the mapping between orthographic, phonological, and semantic representations have undergone more adjustment to reduce error within the system for the higher frequency words. Thus, the model processes words to which it has been exposed with greater fidelity. Accuracy of production of phonological (for word naming) or semantic (for lexical decision) representations has been taken to reflect response times in behavioural lexical processing in previous models (Plaut et al., 1996; Seidenberg & McClelland, 1989).

The triangle model refers to the connectionist model where orthographic, phonological, and semantic representations interact in word processing (Harm & Seidenberg, 2004; Seidenberg & McClelland, 1989). This model has been tested on a range of group level effects, such as word frequency, yet it also has the potential to reflect individual differences in performance. In particular, the various theories about the relation between vocabulary knowledge, first and second language facility, and exposure can be tested for the extent to which they give rise to frequency effects within the model.

There are alternative models that could also potentially be used to test these individual differences in performance. The dual route cascaded (DRC) model implements two routes for mapping from orthography to phonology, a sublexical route that maps letters to sounds via a set of grapheme-phoneme correspondence rules, and a lexical route containing word units which directly, and simultaneously, activate the phonology corresponding to the whole word (Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001). Such models implement word frequency effects by adding an inhibitory bias that is inversely proportional to the log of the frequency of the word. Adelman and Brown (2008)
showed how variables within this model could be systematically varied to test fit of the model to data, and Ziegler et al. (2008) tested the extent to which adjusting variables in the DRC model could simulate individual variation in reading as a consequence of visual letter, word-level phonological, and segmental phonological skills. In the case of the frequency effect, this could be adjusted within the DRC model by varying the gradient of the frequency bias inhibitory function, or by varying the frequencies of words in the model’s input, as a proxy to adjusting the model’s environment. A third alternative would be to vary the relative contribution of the lexical and sublexical routes to word naming performance. As the sublexical route is not affected by individual word frequency, word frequency effects would be reduced if the sublexical route contributes more to the model’s response. However, these effects would have to be implemented in the model, rather than be an emergent consequence of the way the reading system interacts with the environment. A more recent instantiation of a dual route model, comprising lexical and sublexical routes, is the CDP+ model (Perry, Ziegler, & Zorzi, 2007). For this model, the lexical route is similar to that of the DRC, but the sublexical route learns to adjust weights between particular letters and phonemes according to their relative frequencies. Consequently, frequency effects at the word level are again implemented within the lexical route, but the overall size of the frequency effect could again be altered by varying the relative contribution of lexical and sublexical routes to performance. In Adelman, Sabatos-DeVito, Marquis, and Estes’ (2014) test of individual differences within the CDP+ framework, they interpreted frequency effects as emerging only from the former variable: via adjustment of the frequency inhibitory bias in the lexical route.

Our aim in this paper is to determine the extent to which quantitative changes in exposure to words can affect the frequency effect in word naming. We report the results from a series of simulations systematically examining the size of frequency effects during training of the connectionist triangle model of reading (Harm & Seidenberg, 2004; Seidenberg & McClelland, 1989). Examining the triangle model enables us to ascertain the extent to which exposure alone has an effect on frequency effects, without imposing adaptations to the system, as would be the case using the DRC or CDP+ models as starting points. We do not view the precise characteristics of the model as the critical issue; rather, we provide an exploration of the principle of how environment can impact on psycholinguistic factors affecting word representation.

In Simulation 1, we determined whether the behavioural observation of the reduced frequency effect relating to vocabulary size may be a consequence of greater exposure to the vocabulary in the model. We tested whether exposure results in decreasing frequency effects for both naming (simulated by orthography to phonology mappings within the model) and lexical decision (simulated by orthography to semantics mappings). We anticipate that reductions in the frequency effect may result from increasing the efficiency of mappings in the model, as a consequence of extended exposure to the vocabulary. We further tested whether the model’s performance is due to a linear improvement in responding to all words, or whether a reduced frequency effect may be caused by improved fidelity of low frequency word mappings.

Simulations 2 and 3 teased apart the relative contribution of vocabulary exposure and vocabulary size, by training the model on different vocabulary sizes. We predicted that vocabulary exposure would be the key factor resulting in changes in the frequency effect. Finally, Simulation 4 tested the effect of learning a second language on frequency effects in the model, and whether increasing proficiency in the second language resulted in reduced frequency effects in this second language and increased frequency effects in the first language, as a result of vocabulary size differences, in turn resulting from differences in exposure to the two languages. For this simulation, we introduced a second language, Dutch, to the triangle model in order to investigate the relative frequency effects within the model for its reading of English and Dutch words, as exposure to each language varied.

Simulation 1: frequency effects in the triangle model of reading

Method

Architecture

The model was based on the connectionist triangle model of Harm and Seidenberg (2004), and is shown in Fig. 1. The model comprised three representational layers, where orthographic, phonological, and semantic representations of words were presented. It was limited to monosyllabic words.

The phonological layer was connected to and from a set of 50 cleanup units to enable the model to develop stable phonological representations for words. The phonological layer was connected to the semantic layer via a set of 300 hidden units. The semantic layer was connected to and from a set of 50 semantic cleanup units. The semantic layer was connected to the phonological layer via another set of 300 hidden units.

A 4 unit context layer was connected to the semantic layer via a set of 10 hidden units. This context layer enabled the model to disambiguate homophones using context. For each homophone, a different context unit was active. Which unit was active for each set of homophones was selected randomly, such that each context unit was active with approximately the same frequency across the training set. For words which were not homophones, all context layer units were inactive.

The orthographic layer was connected to the phonological layer via a set of 100 hidden units, and to the semantic layer via a set of 300 units. A different number of units was required for successfully learning the mapping from orthography to phonology than for orthography to semantics (see Plaut et al., 1996, for requirements of learning pseudo-regular and arbitrary mappings).

Training set

Written forms of monosyllabic words were presented at the orthographic layer, which comprised 10 letter slots,
[Figure 1 diagram: Orthography (10 x 26 units), Phonology (8 x 25 units), and Semantics (2446 units) layers, linked through hidden layers of 300, 300, 300, 100, and 10 units.]
Fig. 1. Architecture of the triangle model of reading used in the current simulations.
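To make the layer sizes in Fig. 1 and the slot-based orthographic input concrete, here is a minimal sketch in Python. It is not the authors' implementation. The function and constant names are hypothetical, only a, e, i, o, and u are treated as vowels (an assumption), and the placement of post-vowel letters follows the two worked examples in the Training set section ("plane" and "aunt"), which put them directly after the two vowel slots.

```python
VOWELS = "aeiou"  # assumption: 'y' is not treated as a vowel in this sketch

# Layer sizes as reported in the text and Fig. 1.
LAYER_UNITS = {
    "orthography": 10 * 26,  # 10 letter slots x 26 letters
    "phonology": 8 * 25,     # 8 phoneme slots x 25 features
    "semantics": 2446,
    "context": 4,
}

# Hidden-layer sizes for each (source, target) pathway reported in the text.
HIDDEN_UNITS = {
    ("orthography", "phonology"): 100,
    ("orthography", "semantics"): 300,
    ("phonology", "semantics"): 300,
    ("semantics", "phonology"): 300,
    ("context", "semantics"): 10,
}


def orthographic_slots(word):
    """Return the 10-slot, vowel-centred letter string for a monosyllable."""
    word = word.lower()
    if not (word.isascii() and word.isalpha()):
        raise ValueError("expected letters a-z only, got %r" % word)
    first_vowel = next((i for i, ch in enumerate(word) if ch in VOWELS), None)
    if first_vowel is None:
        raise ValueError("no vowel found in %r" % word)
    onset = word[:first_vowel]
    # Up to two consecutive vowels occupy slots 4 and 5.
    following = word[first_vowel + 1:first_vowel + 2]
    n_vowels = 2 if following and following in VOWELS else 1
    vowels = word[first_vowel:first_vowel + n_vowels]
    rest = word[first_vowel + n_vowels:]
    if len(onset) > 3 or len(rest) > 5:
        raise ValueError("%r does not fit the 10-slot template" % word)
    # Onset is right-aligned against the vowel; the rest is left-aligned after it.
    return onset.rjust(3, "_") + vowels.ljust(2, "_") + rest.ljust(5, "_")


def orthographic_vector(word):
    """One unit per filled slot is set to 1 (10 x 26 = 260 units in total)."""
    vector = [0] * LAYER_UNITS["orthography"]
    for slot, ch in enumerate(orthographic_slots(word)):
        if ch != "_":
            vector[slot * 26 + (ord(ch) - ord("a"))] = 1
    return vector
```

Calling `orthographic_slots("plane")` and `orthographic_slots("aunt")` reproduces the two slot patterns given in the text, and `orthographic_vector` turns either pattern into a 260-unit input with one active unit per letter.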
within which each letter was represented as one unit active from a set of 26. Words were vowel-centred, such that the first vowel in the word was presented at the fourth letter slot, with two slots available for up to two consecutive vowels in the orthography. Consonants preceding the vowel were presented across slots 1–3, with these onset consonants in adjacent slots to the vowel. The remaining consonants and following vowels were presented in slots commencing at slot 7 and filled slots adjacent to the two vowel slots. Thus, for the word “plane”, the orthographic representation was presented across the slots _ p l a _ n e _ _ _, and for “aunt”, the orthographic representation was _ _ _ a u n t _ _ _. A letter present in each position was represented as the unit in the slot associated with that letter having activity 1.

Phonological forms of words were presented at the phonological layer, which comprised 8 phoneme slots, with each slot composed of a set of 25 phonological features. Phonological features were exactly those used by Harm and Seidenberg (2004). Phonological representations of words were presented with three slots for the onset, one slot for the vowel, and four slots for the coda. Onset and coda consonants were presented across slots directly adjacent to the vowel. Diphthongs, and long and short vowels were all represented as a set of features active in a single vowel slot. So, for the word “plane”, the phonological representation was _ p l eI n _ _ _. Phoneme features had activity 1 in phoneme slots that were present in the input.

The semantic representations of each word were acquired from Wordnet (Miller, 1990), using the same algorithm described by Harm and Seidenberg (2004). The semantic representation for each word comprised an activated subset of 2446 semantic features. Presence of a feature was represented with activity 1.

There were a total of 6229 words, which comprised all monosyllabic words in English which had both a phonological representation in the CMU pronouncing dictionary (Weide, 1998) and a semantic representation listed in Wordnet (Miller, 1990). This set of words was slightly greater than that used in Harm and Seidenberg (2004) because in their simulations they only included word forms with their most frequent inflected form, whereas we included all monosyllabic inflected versions of the word.

Frequency of words was derived from the Wall Street Journal corpus (Marcus, Santorini, & Marcinkiewicz, 1993), and frequency was log-compressed prior to training of the model. This measure of frequency was that employed in the first implementation of the triangle model (Harm & Seidenberg, 2004), and is included here for comparison with this earlier version. Note that this compression maintains the relative frequency order of words, but substantially reduces the range of frequencies for the model. The model therefore applies a stringent test of the extent to which the changing frequency effects in behaviour can be simulated with this smaller distinction between word frequencies.

Training and testing

Five versions of the model were trained as separate simulations, with different randomised starting weights, and different random orderings of training patterns selected according to frequency. This ensured that the observed results were not due to particular starting configurations of the model.

Pretraining. The model was first trained to learn to map between phonological and semantic representations, as well as to develop stable phonological to phonological mappings, and semantic to semantic mappings.

For the phonological to phonological mapping trials, a phonological representation of a word was presented at the phonological layer. Then, the activity in the model was allowed to cycle for 6 time steps, and for time steps
7 and 8 the model was required to reproduce the phonological representation of the word. Similarly, for the semantic to semantic trials, the model was required to reproduce the semantic representation presented at the semantic layer in time steps 7 and 8. For the phonological to semantic mappings, the phonological representation and the context representation were presented to the model for a word for all 8 time steps, and the model was required to produce the semantic representation of the word in time steps 7 and 8. For semantics to phonological mappings, the semantic representation was presented at all time steps, and the model was required to produce the phonological representation of the word at time steps 7 and 8. As the semantic representation was unambiguous with respect to producing the phonological form of the word, the context layer was not necessary in order to form this mapping.

The model was trained using recurrent backpropagation, with cross-entropy error computed between the target and the model’s actual production for each word’s representation. The learning rate was set at 0.05. The pretraining comprised 2 million word presentations, with words selected according to their log-compressed frequency, in the range [0.05, 1]. 10% of trials were the phonological to phonological mapping, 10% were semantics to semantics, 40% of trials mapped from semantics to phonology, and the remaining 40% mapped from phonology to semantics.

Reading training. Following pretraining, the model then learned to map from orthographic forms onto phonological and semantic representations. The orthographic representation of a word was presented at the orthographic layer, and simultaneously the context layer representation was also presented. Then, from time steps 7 to 12, the model was required to produce the phonological and the semantic representation for that word. Cross-entropy error was backpropagated through the model, and the learning rate was set at 0.01. The model was trained for 1 million presentations.

Testing. The pretraining model was tested on both phonological to semantic trials, and semantic to phonological trials. For the phonological to semantic trials, the phonological representation of each word was presented, and then the model’s production at the semantic layer at the end of the 8 time steps of activation was recorded. The closeness of the model’s semantic production was determined by measuring the sum squared error over the semantic layer. The accuracy of the model’s semantic production was measured by computing the cosine of the model’s actual semantic representation against the semantic representations of each of the 6229 words in the training set. If the cosine distance was lowest for the target representation then the model was judged to be accurate.

For the semantic to phonological trials, the semantic representation was presented and the phonological production was compared to the target phonological representation after 8 time steps; the closeness of the model’s production was then determined by measuring sum squared error. Accuracy of the model was measured by determining for each phoneme slot the closest phoneme to the model’s actual production. If the closest phonemes matched the target in all positions then the model’s phonological production was judged to be accurate.

For the reading trials, the model was presented with the orthographic representation of each word, and closeness and accuracy of the model’s actual production at both the semantic and the phonological layers were recorded. As with behavioural studies of reading, we distinguish accuracy of responses from response time measures. The model may produce an accurate response (closer to the target than any other representation in the training set) but to varying degrees of closeness in terms of the actual versus target representation. Closeness of the model’s phonological production to the target phonology was taken to relate to response time measures of naming, in accordance with previous connectionist models of reading (e.g., Harm & Seidenberg, 2004; Monaghan & Ellis, 2010; Plaut et al., 1996) as it provides an indication of the ease with which the model can generate the phonological form of the word from its orthographic input. Similarly, the closeness of the semantic production was related to response times in lexical tasks involving generation of a semantic representation, as again the closeness reflects the ease with which the model can produce a meaning representation from orthographic input.

An alternative measure of accuracy of semantics would be to determine whether each feature was activated above or below a given threshold, rather than to measure accuracy based on relative distance to other patterns in the training set. To determine whether taking a unit threshold of 0.5 at the semantic output layer resulted in a different reflection of accuracy, we compared the model’s performance for the nearest neighbour and threshold function accuracy measures. At the end of training, the model was able to solve the task to a high degree of accuracy for both accuracy measures (for nearest neighbour: mean = 99.7%, SD = .05%; for threshold: mean = 98.3%, SD = .07%). There was a high degree of correspondence between the threshold measure of accuracy and the nearest semantic representation measure: mean agreement = 98.5% of patterns, SD = .6%, χ²(1) = 3828.3, p < .0000001. Thus, the model was able to solve the mapping task to a high degree of accuracy regardless of the precise measure of accuracy.

Results

The model’s performance for accuracy was assessed using generalized linear mixed effects models, and measures for frequency effects were assessed on the model’s error. The significance of individual and interacting factors was assessed by determining whether the model fit improved significantly by applying a likelihood ratio test comparison between models with and without the factor or interaction of interest.

Pretraining

Pretraining was halted after 2 million patterns, and at this point the model achieved mean accuracy of 96.0% (SD = 1.9%) for mapping from semantic to phonological representations, and 87.8% (SD = 1.2%) for mapping from phonological to semantic representations (see Fig. 2). To
Fig. 2. Performance of the triangle model during pretraining between phonological and semantic representations (S → P is semantics to phonology mappings, P → S is phonology to semantics mappings). Error bars show ±1 SEM of mean accuracy by simulation.
Fig. 3. Performance of the triangle model during training on orthography to phonological (O → P) and orthography to semantic (O → S) representations. Error bars show ±1 SEM of mean accuracy by simulation.
test whether semantic representations were slower to acquire than phonological representations during learning, we compared the fit of binary logistic linear mixed effects models. As a baseline, we constructed a model with simulation (simulation one to five) and word (each of the 6229 vocabulary items) as random effects, and log of training epoch as a fixed effect, with accuracy (correct or incorrect) of the model as the dependent variable. We then tested whether adding mapping type (semantics to phonology, or phonology to semantics) to this model resulted in a significant improvement of fit. We found that it did, χ²(1) = 28,851, p < .001; thus, the computational model learned to map from semantics to phonology more accurately than from phonology to semantics. This was likely because the semantic input representations were more distinct, enabling greater differentiation of input patterns during training.
Reading accuracy
For the full reading model, accuracy for mapping from orthography to phonology and to semantics is shown in Fig. 3. By the end of 1 million patterns of training, the model was able to accurately produce the phonological (mean = 99.9%, SD = .03%) and the semantic representations (mean = 99.8%, SD = .05%). The fit of a binary logistic mixed effects model with simulation and word as random effects, and log of training epoch as fixed effect, was improved by adding a fixed effect of mapping type (orthography to phonology, or orthography to semantics), χ²(1) = 47,542, p < .001. Adding an interaction between training epoch and mapping type also improved fit significantly, χ²(1) = 244.24, p < .001, indicating that phonological representations were learned more accurately than semantic representations, especially in the early stages of training.
Frequency effects
To determine the extent to which frequency effects varied as a consequence of exposure, the correlation between frequency and the closeness of the model’s output production compared to the target representation, as measured by mean square error, for phonological and semantic representations is shown in Fig. 4. Frequency effects can then be determined by the extent to which the frequency of a word improves the fit of the statistical model to the computational model error data. Changes in the frequency effect can then be determined by examining the interaction of frequency with other fixed factors in the model.
To compare frequency effects across the phonological and semantic representations, a mixed effects model with simulation and word as random effects, and log of training epoch as fixed effect, was constructed as a baseline. Adding mapping (orthography to phonology, or orthography to semantics) as a fixed effect improved model fit, χ²(1) = 246,635, p < .001, as did adding word frequency, χ²(1) = 1920.6, p < .001. This indicated that, overall, there was a frequency effect in the triangle model’s performance. Adding the interaction between frequency and mapping also improved fit, χ²(1) = 70,012, p < .001. This indicated that, as anticipated, the frequency effect was larger for the semantic representations than for the phonological representations. This is consistent with a greater effect of item-level properties for arbitrary than for consistent mappings, both within mappings, such as in the frequency by consistency effect for single word naming tasks (Taraban & McClelland, 1987), and across mappings, such as the larger frequency effect as a predictor of lexical decision response times (which has been proposed to involve semantic representations) compared to naming times for single words (Ghyselinck, Lewis, & Brysbaert, 2004).
In general, the frequency effect for both semantic and phonological representations declined with length of training. For instance, for the semantic representations change
Fig. 5. Mean square error of the model’s productions by word frequency for all 6229 words in the vocabulary, for orthography to phonology (O → P) and orthography to semantic (O → S) mappings at different stages of training (10K, 100K, and 1M). Solid lines show the linear regression fit.
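As an illustration of how a frequency effect of the kind plotted in Fig. 5 could be quantified, the following minimal sketch computes the correlation between word frequency and the model’s error, together with a least-squares line. The log transformation of frequency and all function names are assumptions made for the example; the source text specifies only that frequency is related to mean square error.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def frequency_effect(frequencies, errors):
    """Correlation between log frequency (assumed transform) and error.

    A negative value means higher-frequency words have lower error.
    """
    log_freq = [math.log(f) for f in frequencies]
    return pearson_r(log_freq, errors)


if __name__ == "__main__":
    toy_freq = [1, 10, 100, 1000]          # hypothetical word frequencies
    toy_error = [0.40, 0.30, 0.20, 0.10]   # hypothetical mean square errors
    print(frequency_effect(toy_freq, toy_error))
    print(linear_fit([math.log(f) for f in toy_freq], toy_error))
```

A frequency effect that shrinks with training would appear as this correlation moving towards zero across the training stages.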
Method
Architecture
The architecture was the same as in Simulation 1.
Results
tested the effect of vocabulary size on the frequency effect for the phonological and semantic representations separately, by first constructing a baseline linear mixed effects model with the closeness of the model’s production to the target as the dependent variable, random effects of simulation and word, and fixed effects of log of training epoch, frequency and vocabulary size. The effect of vocabulary size on the frequency effect is determined by examining the interactions between the fixed effects.
For the phonological representations, adding the interaction between frequency and log of training epoch resulted in a significant improvement in fit, χ²(1) = 11,711, p < .001, thus confirming the effect of frequency changing with training that was also observed for the full set of 6229 words. Adding the interaction between frequency and vocabulary size significantly improved model fit, χ²(1) = 266.13, p < .001, with the magnitude of the frequency effect greater for 4000 words than 2000 words, t = 6.38, and the frequency effect for 2000 words greater than that for 1000 words, t = 12.09, both p < .001. Adding the three-way interaction between log of training epoch, frequency and vocabulary size to a model with all main effects and two-way interactions also resulted in a significant improvement in fit, χ²(1) = 4.5263, p = .034. The decline in the frequency effect with training was greater for the 2000 word vocabulary than the 4000 word vocabulary, t = 7.11, and the 4000 vocabulary decline was greater than the 1000 word vocabulary, t = 8.64, both p < .001. Thus, the change in the frequency effect was affected by vocabulary size, but was not monotonically related to vocabulary size: a larger vocabulary resulted in a smaller reduction in the frequency effect than a medium vocabulary. Overall, controlling for vocabulary size, the observation that frequency effects declined with training exposure was highly reliable.
We further tested whether the observation from Simulation 1 that the frequency effect changed direction as a consequence of training also held for the varying vocabulary sizes. We compared models with a linear and a quadratic interaction effect of frequency and log epoch, and found that the quadratic improved fit of the model over all three vocabulary sizes combined, χ²(2) = 30,394, p < .001, indicating that, overall, there was a quadratic effect of frequency against exposure similar to Simulation 1. However, the three-way interaction between vocabulary size, frequency, and quadratic function of log epoch improved fit further, χ²(1) = 7312.1, p < .001, indicating that the quadratic effect decreased with smaller vocabulary sizes. Investigating the vocabulary sizes individually, the quadratic effect improved model fit for all vocabulary sizes: for 1000 words, χ²(2) = 7974.7; for 2000 words, χ²(2) = 12,300; for 4000 words, χ²(2) = 15,510, all p < .001. Though Fig. 8 illustrates an initial increase for the 1000 word vocabulary for phonological representations, the quadratic fit indicates that the change in direction occurs at an early point in training. Thus, the change of direction in the frequency effect is greater for larger vocabulary sizes, but the effect is still discernible for smaller vocabulary sizes. We interpret this as being due to the difficulties in developing high-fidelity representations when the vocabulary size is greater, resulting in a larger initial increase in frequency effects with the larger vocabularies.
For the semantic representations, the same series of models were tested as for the phonological representations. The interaction between frequency and log of training epoch improved model fit significantly, χ²(1) = 58,475, p < .001. Frequency by vocabulary size also improved model fit, χ²(1) = 23,879, p < .001, with the frequency effect largest for 4000 words, then 2000 words, then 1000 words, t = 73.2, t = 29.1, both p < .001. Adding the three-way interaction also significantly improved fit, χ²(1) = 3851.7, p < .001. In this case, there was a monotonic relation between vocabulary size and change in the frequency effect, such that the rate of change was higher for 4000 words than 2000 words, which was in turn higher than for 1000 words, t = 36.60, t = 28.36, respectively, both p < .001. However, importantly it remained the case that, when controlling for vocabulary size, frequency effects reduced as exposure increased.
The change in frequency effect with exposure was again found to be improved by a quadratic fit over the three vocabulary sizes, χ²(2) = 20,977, p < .001. However, as with the phonological representations, the interaction between vocabulary size and frequency and the quadratic of log epoch also significantly improved fit, χ²(2) = 11,275, p < .001. For each vocabulary size individually, the quadratic improved fit: 1000 words: χ²(2) = 11,834; 2000 words: χ²(2) = 17,899; 4000 words: χ²(2) = 8657.6, all p < .001. All vocabulary sizes demonstrated the change in direction of the frequency effect, though this was largest for the 2000 word condition.
All in all, there is little evidence that larger vocabulary sizes lead to smaller frequency effects. If anything, they induce stronger overall frequency effects. Furthermore, at least in the case of orthography to phonology mappings, a larger vocabulary is even protective against a change in frequency effects as a consequence of additional training. Thus, the behavioural effects relating to frequency effect changes are not simulated in the model by vocabulary size increasing, but are due instead to exposure. Furthermore, our interpretation of the frequency effect change as being driven by two processes (an initial increase in the frequency effect as representational fidelity improves, then a decrease with exposure to items) is shown to be generalizable across these vocabulary sizes.
However, in Simulation 2 the selection of subsets of words was random, which may not perfectly reflect the situation of actual acquisition, where smaller vocabularies are likely to comprise the most frequent words. In order to test whether vocabulary size might affect frequency effects if smaller vocabularies constitute the subset of higher-frequency words, we conducted Simulation 3.
Simulation 3: frequency effects in the triangle model trained with varying vocabulary size
This simulation was similar to that of Simulation 2, except that the subsets of 1000, 2000, and 4000 words comprised the most frequent words from the larger vocabulary, in order to simulate the greater likelihood of smaller
Method
Architecture
The architecture was the same as in Simulation 1.
Results
Fig. 9. Orthography to phonology and orthography to semantics mappings accuracy for the model trained with different vocabulary sizes, selected as the most frequent words.
The triangle model’s performance was assessed in the same way as for Simulation 2 by constructing mixed effects models and testing individual factors and interactions for their improvement to model fit.
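The model-comparison logic used throughout these analyses, testing whether adding a factor or interaction improves fit, can be sketched as a likelihood ratio test on two nested models. This is a minimal illustration under stated assumptions: the log-likelihood values are hypothetical, the function names are invented for the example, and the p-value uses closed forms of the chi-square upper tail that hold only for 1 or 2 degrees of freedom (the cases reported in the text). It is not the software used for the original analyses.

```python
import math


def chi_square_upper_tail(statistic, df):
    """P(X >= statistic) for a chi-square variable with df = 1 or 2."""
    if statistic < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        # Chi-square(1) is a squared standard normal.
        return math.erfc(math.sqrt(statistic / 2.0))
    if df == 2:
        # Chi-square(2) is exponential with mean 2.
        return math.exp(-statistic / 2.0)
    raise ValueError("this sketch supports df = 1 or 2 only")


def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Return (chi-square statistic, p-value) for two nested models.

    `df` is the number of extra parameters in the fuller model.
    """
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi_square_upper_tail(statistic, df)


if __name__ == "__main__":
    # Hypothetical log-likelihoods for a baseline model and a model
    # with one additional interaction term.
    stat, p = likelihood_ratio_test(-1002.26315, -1000.0, df=1)
    print(round(stat, 4), p < 0.05)
```

For other degrees of freedom a general chi-square survival function from a statistics library would be needed in place of the closed forms.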
Fig. 9 shows the accuracy of the model for mapping from orthography to phonology and orthography to semantics during learning for the 1000, 2000, and 4000 word sets. As for Simulation 2, increasing the size of the vocabulary resulted in a reduction in accuracy during training: A generalized linear mixed effects model adding vocabulary size as a fixed factor improved model fit compared to a model with just random effects of simulation and word and fixed effect of log of training epoch, χ²(1) = 6514.8, p < .001. Again, like Simulation 2, the effect of vocabulary size was significantly different for mapping to semantics than mapping to phonology: Adding an interaction between mapping and vocabulary size increased model fit compared to the model containing just main effects, χ²(1) = 1149.8, p < .001. Also similar to Simulation 2, the effect of vocabulary size was greater in the earlier stages of training: adding an interaction between log of epoch training and vocabulary size significantly increased model fit, χ²(1) = 1568.6, p < .001.
Fig. 10 shows the frequency effect for the model during training for semantic and the phonological output for the different vocabulary sizes in Simulation 3. There was a reduction in the frequency effect as vocabulary size reduced. As in Simulation 2, the effect of vocabulary size on the frequency effect was determined by examining the interactions between the fixed effects, by testing the improvement of fit over a baseline model containing only random effects and main effects.
Fig. 10. Frequency effects for orthography to phonology and semantics mappings, for the triangle model trained with different vocabulary sizes for the most frequent 1000, 2000, or 4000 words in the corpus. Error bars show ±1 SEM of mean correlation between word frequency and error by simulation.
As for Simulations 1 and 2, frequency effects were found to be larger for semantic than phonological representations, χ²(1) = 41,545, p < .001. As for Simulation 2, the interaction between frequency, mapping, and vocabulary size improved fit, χ²(1) = 5197.5, p < .001. The difference in frequency effect was greater for 4000 than 2000 words, t = 60.3, and greater for 2000 than 1000, t = 6.0, both p < .001, consistent with an enhanced difference for a model required to learn a larger versus a smaller set of harder arbitrary (semantic) versus easier quasi-systematic (phonological) mappings.
For the phonological representations, adding the interaction between frequency and log of training epoch resulted in a significant improvement in fit, χ²(1) = 12,613, p < .001; the effect of frequency changed with training in the same way as for Simulations 1 and 2. Adding the interaction between frequency and vocabulary size significantly improved model fit, χ²(1) = 33.037, p = .001, with the magnitude of the frequency effect greater for
4000 words than 2000 words, t = 2.31, p = .021, which was greater than 1000 words, t = 2.52, p = .012. Adding the three-way interaction between log of training epoch, frequency and vocabulary size to a model with all main effects and two-way interactions also resulted in a significant improvement in fit, χ²(1) = 176.93, p < .001. The effects were similar to those for Simulation 2: the larger vocabulary related to a larger frequency effect. When vocabulary size was controlled, the frequency effect was found to decrease as a consequence of extended training.
As for Simulation 2, the change in frequency effect with exposure was found to be improved by a quadratic fit over the three vocabulary sizes, χ²(2) = 46,623, p < .001. Also as for Simulation 2, the interaction between vocabulary size, frequency and quadratic of log epoch improved fit, χ²(2) = 32,477, p < .001. For each vocabulary size individually, the quadratic again improved fit: 1000 words: χ²(2) = 5098.5; 2000 words: χ²(2) = 20,332; 4000 words: χ²(2) = 26,010, all p < .001. Again, all vocabulary sizes demonstrate the change in direction of the frequency effect.
For the semantic representations, the interaction between frequency and log of training epoch improved model fit significantly, χ²(1) = 63,787, p < .001. Frequency by vocabulary size also significantly improved model fit, χ²(1) = 4.927, p = .026. Adding the three-way interaction also significantly improved fit, χ²(1) = 4268.9, p < .001. As with the phonological effects, the larger vocabularies resulted in a larger frequency effect, and demonstrated that, when controlling for vocabulary size, the frequency effect reduced with exposure.
For the quadratic fit of log epoch, the interaction with frequency improved fit over all three vocabulary sizes for the semantic representations, χ²(2) = 56,523, p < .001. There was a significant improvement in fit with the interaction between vocabulary size, frequency and the quadratic of log epoch, χ²(2) = 37,148, p < .001. As with Simulation 2, the quadratic improved fit for each vocabulary size: 1000 words: χ²(2) = 11,834; 2000 words: χ²(2) = 17,899; 4000 words: χ²(2) = 8657.6, all p < .001. The results show that, as for Simulation 2, there is a change in direction of the frequency effect with training, with the size of the effect changing, but the qualitative nature of this change unaffected by vocabulary size.
Thus, the results of Simulations 2 and 3 indicate that a larger vocabulary was protective against reduced frequency effects, rather than the cause of frequency effect changes with training as could be expected given the stronger competition possible from a larger vocabulary. Therefore, the smaller frequency effect for people with large vocabularies found in lexical decision tasks cannot be explained by vocabulary size itself. At the same time, Simulations 2 and 3 confirmed the finding of Simulation 1 that extra exposure undoes the larger frequency effect related to the knowledge of more words. Towards the end of the training, the frequency effect was similar for all vocabulary sizes tested. After 1 million training trials the frequency effect on the O → S mappings was smaller for the model trained on 4000 words than for the model trained on 1000 words after 20K trials, even when the latter 1000 words were the most frequent ones (Fig. 10).
Simulation 4: frequency effects in first and second languages
Simulations 1, 2, and 3 established that, in the triangle model, the frequency effect in learning to read a single language can relate to exposure. In bilinguals, who map between orthographic, phonological, and semantic representations in two languages, frequency effects have been shown to be stronger compared to monolinguals (Gollan, Montoya, Cera, & Sandoval, 2008; Ransdell & Fischler, 1987). An explanation for this has been in terms of frequency of usage (Gollan et al., 2008): As bilinguals have less exposure to each language, they have “weaker-links” between orthographic, phonological, and semantic representations and this will be particularly harmful for accessing low frequency words.
An alternative account of reduced frequency effects is increased interference between languages: there is greater competition amongst a vocabulary that is almost twice as large in bilinguals than monolinguals, reducing the psycholinguistic effects influencing lexical access in a single language (Costa, 2005; Peterson & Savoy, 1998). Such influences across languages are well-attested, with L2 acquisition resulting in slower lexical access to L1 (Kroll, Michael, Tokowicz, & Dufour, 2002; Linck, Kroll, & Sunderman, 2009) and a larger frequency effect, even in the dominant language (Gollan et al., 2008).
In terms of comparison of frequency effects within bilingual speakers, the frequency effect is typically larger in L2 than in L1 (Cop, Keuleers, Drieghe, & Duyck, 2015; de Groot, Borgwaldt, Bos, & van den Eijnden, 2002; Duyck, Vanderelst, Desmet, & Hartsuiker, 2008; Van Wijnendaele & Brysbaert, 2002; Whitford & Titone, 2012). In a mega-study, Lemhöfer et al. (2008) tested English word identification in English monolingual and bilingual Dutch, French, and German speakers, and found a larger L2 frequency effect than L1 in English, which was due principally to greater slowing of low-frequency words in the L2 speakers. Diependaele et al. (2013) argued that this difference disappears when vocabulary size in each language is taken into account, and in a more recent mega-study Brysbaert et al. (in press) confirmed that most of the difference in frequency effects between L1 and L2 was due to vocabulary size, taken as a proxy for exposure to each language.
In the present simulation, we investigated whether the triangle model can simulate these effects by examining relative exposure to two languages. We tested two hypotheses: (1) that exposure is the main determinant of the difference in frequency effect between L1 and L2, and (2) that knowledge of another language increases the frequency effect in L1. We tested whether these hypotheses were consistent with the triangle model’s performance when trained on a second language. We chose Dutch as the second language, as this language has a high degree of orthographic overlap with English and was one of the languages tested by Diependaele et al. (2013). We implemented sequential acquisition (L2 introduced after some time learning L1), as this is the typical state-of-affairs for participants in research on bilingualism (Li & Zhao, 2013).
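One way to picture sequential acquisition with varying relative exposure is as a sampling schedule over training trials: only L1 patterns before the second language is introduced, and a fixed proportion of L2 patterns afterwards. The sketch below is purely illustrative. The onset point, trial counts, exposure proportion, and function names are hypothetical choices for the example and are not taken from the description of the simulation.

```python
import random


def sample_language(trial, l2_onset, l2_proportion, rng):
    """Choose the language of the pattern presented on a given trial.

    Before `l2_onset` only L1 is presented; afterwards L2 is chosen
    with probability `l2_proportion` (both values are assumptions).
    """
    if trial < l2_onset:
        return "L1"
    return "L2" if rng.random() < l2_proportion else "L1"


def exposure_counts(total_trials, l2_onset, l2_proportion, seed=0):
    """Count how many trials each language receives under the schedule."""
    rng = random.Random(seed)
    counts = {"L1": 0, "L2": 0}
    for trial in range(total_trials):
        counts[sample_language(trial, l2_onset, l2_proportion, rng)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical schedule: L2 introduced halfway, 25% L2 thereafter.
    print(exposure_counts(total_trials=10_000, l2_onset=5_000,
                          l2_proportion=0.25))
```

Under such a schedule, conditions with different L2 proportions reach the same cumulative L2 exposure at different points in training, which is what makes it possible to separate amount of exposure from intensity of exposure.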
Method
Architecture
The architecture of the model was the same as in Sim-
ulation 1.
Fig. 14. Frequency effect affected by exposure to second language. (A) Effect of Dutch exposure on English orthography to phonology; (B) effect on Dutch
orthography to phonology; (C) effect on English orthography to semantics; (D) effect on Dutch orthography to semantics. Notice that as the curves go higher
in this figure, they approach a frequency effect of 0; lower values mean a stronger frequency effect.
Dutch exposure, t = 14.33, both p < .001. Adding the interaction between frequency, Dutch exposure, and log epoch training exposure improved fit compared to a model containing main effects and two-way effects, χ²(2) = 68.286, p < .001, indicating a greater change of frequency effect with training time for the 25% Dutch exposure than 50% Dutch exposure, t = 3.311, and smaller change still for the 75% Dutch exposure, t = 4.902, both p < .001 (see Fig. 14B).
For semantics, increase in exposure to Dutch also resulted in a reduced effect of frequency for Dutch (Fig. 14D). Adding the interaction between word frequency and proportion of Dutch improved model fit, χ²(1) = 899.47, p < .001, with the magnitude of the effect significantly larger for 25% Dutch than 50% Dutch exposure, t = 41.09, and smaller still for 75% Dutch, t = 31.08, both p < .001. Adding the three-way interaction to the model also improved fit, χ²(1) = 118.17, p < .001, indicating that the effect declined at a greater rate with further training for 25% compared to 50%, t = 8.60, and 50% compared to 75% Dutch exposure, t = 11.28, both p < .001.
All in all, the results of the simulations agree rather well with the behavioural findings: (1) The English frequency effects become stronger with more use of Dutch, but (2) decrease as the training continues. We also see (3) a stronger frequency effect in L2 when it is used less frequently (i.e., for the Dutch 25% exposure). However, from Fig. 14, there is a suggestion that the frequency effect in the 75% Dutch condition was very small (panels B and D). This was partially a consequence of measuring the frequency effect only after 100,000 training presentations to the model. Comparing to the different vocabulary conditions of Simulation 2 (Fig. 8), after 100,000 epochs the frequency effect for phonological and semantic representations in the
1000 word simulation had already substantially declined. Investigating the Dutch model at earlier training stages, we found that frequency effects were initially higher than those observed after 100,000 bilingual training trials for the semantic representations: after 540,000 trials, the frequency effect peaked at .093 (SD = .020). Yet, the frequency effect for phonology remained small, but significantly different than chance, at these earlier training stages (e.g., after 560,000 trials, the frequency effect peaked at mean = .025, SD = .024). It could be that the small frequency effect in Dutch was due to optimising the merging of the statistics of the mappings for Dutch and English orthography to phonology mappings when sufficient exposure to Dutch was available, thereby resulting in Dutch words being processed with similar levels of ease regardless of their individual frequencies. As the overlap between orthography and semantics is only very low between these languages, we do not observe a reduced frequency effect for the semantic representations.
However, note that the simultaneous exposure to the two languages exacerbates the frequency effect: the 25% Dutch exposure model has had the same exposure to Dutch at 700,000 epochs of training as the 50% Dutch exposure model has had at 600,000 epochs, and yet the frequency effects appear to still be enhanced in this second language. To test this possible enhancement from learning in another language, independent of exposure to the language in which frequency effects are to be tested, we analysed a subset of the model data equating the exposure to each language, and comparing the frequency effect across exposure conditions. Thus, for English, we compared the frequency effect of the model for the 25% Dutch exposure training at 600,000 epochs, the 50% Dutch exposure training at 700,000 epochs, and the 75% Dutch exposure training at 800,000 epochs. For Dutch, we measured the frequency effect for the 25% Dutch exposure training condition at 800,000 epochs, the 50% Dutch exposure training at 700,000 epochs, and the 75% Dutch exposure training at 600,000 epochs. The results for frequency effects in phonology and in semantics are summarised in Fig. 15.
Fig. 15. Frequency effect according to exposure to second language, controlling for exposure in the first language. (A) English orthography to phonology; (B) Dutch orthography to phonology; (C) English orthography to semantics; (D) Dutch orthography to semantics.
Baseline linear mixed effects models on the frequency effect were first constructed, with simulation and word as random effects and frequency and exposure condition as factors. Then, the improvement in fit when the interaction between frequency and exposure condition was added was determined.
For English, the intensity of Dutch exposure had a significant effect for orthography to phonology mappings, χ²(1) = 58.435, p < .001, and for orthography to semantics, χ²(1) = 811.99, p < .001. Similarly, for Dutch, intensity of exposure was significant for orthography to phonology, χ²(1) = 12.147, p < .001, and for orthography to semantics, χ²(1) = 83.464, p < .001. The effects of intensity affected both languages in a similar way: there was greater reduction of the frequency effect if exposure to Dutch was more intense, which applied both to English words and Dutch words in the bilingual model.
A further analysis of the 25%, 50%, and 75% Dutch exposure simulations, controlling for accuracy of Dutch reading instead of exposure to Dutch, resulted in a similar pattern of effects. At 600,000 epochs, the 75% Dutch exposure simulations reached 93.6% (SD = 24.4%) for phonology and 89.9% (SD = 30.1%) for semantics. At 700,000 epochs, the 50% Dutch exposure simulations reached similar accuracy (phonology mean = 92.9%, SD = 25.7%; semantics mean = 88.5%, SD = 31.9%). At 1,000,000 epochs, the 25% Dutch exposure was similarly accurate (phonology mean = 92.5%, SD = 26.3%; semantics mean = 88.1%, SD = 32.4%), so these simulations at these training epochs were compared. Intensity of Dutch exposure influenced frequency effects in English for both orthography to phonology, χ²(1) = 198.44, p < .001, and orthography to semantics, χ²(1) = 3241, p < .001. Similarly, intensity of Dutch exposure influenced frequency effects in Dutch in phonology, χ²(1) = 145.51, p < .001, and semantics, χ²(1) = 478.56, p < .001. As with the simulations controlling for exposure to Dutch, when controlling for accuracy of performance in Dutch, increased intensity of Dutch exposure resulted in a smaller frequency effect in both languages.
Thus, frequency effects were not entirely independent in first and second language, and therefore cannot be completely accounted for by exposure within a language in the model’s performance, as Diependaele et al. (2013) have proposed. Instead the results seem consistent with the weaker-links hypothesis of Gollan et al. (2008), who proposed that learning a second language can reduce the strength of mapping between orthography and phonology and semantics in a first language. This weaker links property of the model is an emergent result of training the model on multiple languages, and such weakening of links does not have to be explicitly included in the model.
General discussion
Individual differences in performance for language tasks are a topic of growing interest (Andrews, 2015; Yap et al., 2012). Such variation can provide insight into the processing parameters that underlie behaviour. In word naming and lexical decision tasks, a key observation is that psycholinguistic effects may vary across participants. Individual differences in the variance in response times and accuracy explained by psycholinguistic variables can be partially accounted for by age (Morrison, Hirsh, Chappell, & Ellis, 2002), by language proficiency (Chateau & Jared, 2000; Diependaele et al., 2013; Lewellen et al., 1993; Preston, 1935; Sears et al., 2008; Yap et al., 2008, 2012), or as a consequence of language exposure (Brysbaert et al., in press; Kuperman & Van Dyke, 2013). Of particular interest to us was to examine the potential causes of the frequency effect, because it accounts for a large proportion of variance in lexical processing accuracy and response times in behavioural studies. Our simulations were able to replicate observed differences in frequency effects for lexical processing tasks that principally involve mapping from orthography to phonology and those that map from orthography to semantics (Ghyselinck et al., 2004).
We considered four possible explanations for the observation that participants with larger vocabularies have lower frequency effects. First, the relation between size of the frequency effect and vocabulary size may be a mere side-effect of quicker response times in those with greater language proficiency. In this case, the frequency effect may be reduced in those with higher language proficiency because of a floor effect in response times. Our Simulation 1 demonstrated that greater proficiency could be related to frequency effects, but went further than previous behavioural studies by demonstrating a potential cause of this relation: the amount of exposure to language by the reading system. Furthermore, the reduced frequency effect was primarily due to reduction of error variance for lower frequency words in the triangle model. This change in the model’s mappings between representations is a consequence of error-driven learning in the model, such that those patterns that contribute most error contribute most change to weights on connections within the model. As error from low-frequency words is greater than that for high-frequency words, the low-frequency words are contributing most to reconfiguring the model’s structure by reducing the model’s error for those low-frequency patterns. Thus, the reduced frequency effect was not entirely due to a general improvement in response fidelity across all stimuli, in contrast to this first explanation. However, the overall reduction in error for the model’s representations of phonological and semantic forms of words is consistent with a contribution of frequency effect reduction relating to response variation associated with psycholinguistic processes of lexical access associated with generation of the decision making response (e.g., Norris, 2009). Furthermore, we established in Simulations
2 and 3 that vocabulary size was not the key variable underlying changes in frequency effects, but rather amount of exposure was the critical driver behind efficient processing of mappings between representations.

The second explanation for variation in frequency effects was that language proficiency is related to intelligence, and intelligence is underwritten by greater speed of processing, which could again compress frequency effects for those with higher intelligence. Here, the data and computational modelling of bilingual participants is crucial. Diependaele et al. (2013) showed that frequency effects were not person-dependent but rather dependent on the individual’s proficiency in the language being tested. We showed that the triangle model can be extended to learn to read words in second language, and that varying the exposure of the model to first and second language could predict the pattern of frequency effects for L1 and L2 speakers. Simulation 4 demonstrated that increased exposure to L2, with a concomitant increase in proficiency in L2, resulted in increased frequency effects in L1 and reduced frequency effects in L2. However, the model’s performance was not wholly accounted for by amount of exposure within a language, as there was evidence that intensity of exposure also affected the size of frequency effects. In both first and second languages, greater intensity of second language exposure reduced the size of frequency effects when total exposure within each language was controlled. Indeed, increased intensity of exposure to a second language could be hypothesised to result in increasing the noise in mappings for the first learned language, thereby increasing the frequency effect in that first language, due to reduction in compression. However, the opposite was the case: the increase of the L1 frequency effect was largest in the 25% Dutch exposure situation. The finding that our model predicts an increase in the L1 word frequency effect when another language is learned, is a finding consistent with studies of interference effects across languages (e.g., Costa, 2005; Gollan et al., 2008; Linck et al., 2009), where acquisition of an L2 can increase response times in L1. Such effects may be observed at both the lexical access stage of language processing (as demonstrated in our model) as well as affecting decision making processes, as reflected in the subtle effects of L2 revealed in the diffusion model simulations of behavioural results in Brysbaert et al. (in press).

The third explanation for reduced frequency effects is that greater exposure to a language results in proportionally more exposure to lower frequency words (Kuperman & Van Dyke, 2013). However, the frequency compression used for sampling of input to the model meant that even all the low frequency words were highly likely to occur even in small samples. For instance, by 200,000 random samples, the point at which frequency effects tend to decrease in magnitude, 99.9% of words will have been sampled. Furthermore, sampled word frequencies at 100,000, 200,000, and 300,000 epochs were correlated at 1.00. Thus, sampling biases are not sufficient to explain the triangle model’s performance.

The fourth explanation we considered was that exposure is the key factor underlying the relation between language proficiency and size of frequency effects between individuals. Training a model that learns to map between orthographic and phonological and semantic representations with increasing efficiency demonstrated the same effects as those observed in participants. Furthermore, size of vocabulary was not sufficient to explain the model’s performance. The triangle model therefore tests the adequacy of a theory based on language exposure resulting in greater efficiency of accessing representations of words. This theoretical principle was shown to account also for individual difference effects observed in reading in L1 and L2, and these data are critical for distinguishing exposure effects from other individual variation in cognitive processing that could affect performance. For instance, efficiency of mappings between representations can be the result of the amount of resources serving mappings in a computational model, or by the learning function – faster learning relates to a higher learning rate parameter in the model, or by increasing the speed with which information can pass within the network (e.g., Faust et al., 1999; Plaut & Booth, 2000). All these parameters are potentially adjustable in the model, but none would explain the apparent interaction between size of frequency effects in L1 and L2. Adjustments to resources, rate of learning, or speed of processing would result in similar effects in both first and second language, whereas, the size of the frequency effects are shown to be inversely related to proficiency for each language. We thus contend that additional factors contributing to individual differences in the frequency effect are not necessary to explain the data, and that an explanation based on exposure is the most parsimonious explanation for the observed effects.

Similarities between first and second languages may influence the extent to which multiple languages influence processing in the other language. Kaushanskaya, Yoo, and Marian (2011) found that for English-Spanish bilinguals, proficiency in Spanish reading was associated with proficiency in English reading. However, for English-Mandarin bilinguals, self-reported Mandarin reading proficiency was associated with lower English reading skills. The simulations of bilingual reading we have performed have involved two closely-related languages, with overlapping orthographic and phonological mappings (consider the bad/bath example, above). In a behavioural study on naming responses in English, Lemhöfer et al. (2008) found only small differences in responses on English words between English monolingual, Dutch-English, French-English, or German-English bilinguals, apart from the enhanced frequency effect for L2 speakers. So, such closely-related languages may not result in a strong interference effect. Yet, simulating a wider range of languages, with varying degrees of similarity among orthographic, phonological, and semantic representations would enable us to determine the computational consequences of overlaying overlapping versus distinct mappings in the reading system.

Critically, the model predicted that changes in frequency effects were not linear as a consequence of exposure. Rather, frequency effects increased during early stages of language processing, as the model develops an accurate representation of words, and discrimination between phonological and semantic forms, akin to development of lexical quality (Perfetti, 2007). However, after
these representations have become well-formed (from about 100,000 to 200,000 epochs of training) the frequency effect then begins to reduce, as a consequence of increasing efficiency of the mappings. Thus, the triangle model generates the prediction that individual differences in lexical processing are likely to reflect both this fidelity of representation and efficiency of mapping, and can potentially explain why frequency effects are less prominent in children than young adults (Ellis, 2002; Garlock, Walley, & Metsala, 2001), because frequency effects are reduced by poorer quality of representation. However, our simulations predict that with extensive exposure, frequency effects can in principle fall below those of learners in early stages of acquiring the language (see, e.g., Fig. 4), especially for lower-frequency words (Fig. 6). Comparisons between children and older adults would be one way to assess this prediction.

Adelman et al. (2014) examined a range of psycholinguistic factors, including length, consistency and frequency, in terms of parameter variation in DRC and CDP+ models. Their interest was the extent to which these models were sufficient to explain observed inter-individual covariation in psycholinguistic variables derived from behavioural mega-studies. Our aim for the current simulations was different: to distinguish the relative contributions of language proficiency and the size of the frequency effect in a computational model of reading that can learn mappings as a consequence of exposure to the vocabulary of a language. However, there are possibilities for investigating the extent to which variation in training the triangle model can reflect behavioural observations for other psycholinguistic variables. For instance, Adelman et al. (2014) co-located length and consistency effects in the sublexical route of the DRC and CDP+ models, and located frequency effects in the lexical route of these models. This constrains the extent to which these variables are likely to covary – length and consistency effects should have similar coefficients for individuals, but may have different coefficients to that of frequency. In the triangle model, we anticipate that variables such as length and consistency should be related to exposure in a similar way to frequency. This is because the effect of exposure on the model is to increase efficiency of the mappings, and compress the size of the difference between mappings that are initially difficult and those that are easier. Longer words tend to contain more information in orthography and in phonology and so are more complex to map than shorter words. Inconsistent words are harder to map because they benefit less than consistent words from learning mappings for other words with similar orthographic forms. Future investigation of the triangle model could determine the interplay between these factors and the extent to which they are explained by exposure, or require additional reconfiguring of architectural parameters.

The computational modelling approach demonstrated here enables isolation and control of various contributors to behavioural performance. In this respect it provides a useful accompaniment to approaches that demonstrate the observed correlations among various psycholinguistic variables. The computational modelling means that causal relations among these variables can be tested. In particular, we varied vocabulary size and exposure to measure how frequency effects vary between individuals. In the triangle model, exposure is the cause of variation in both vocabulary learning and frequency effects, in both first and second languages.

Acknowledgments

This research was supported by ESRC grant RES-000-22-4049.

References

Adelman, J. S., & Brown, G. D. A. (2008). Methods of testing and diagnosing model error: Dual and single route cascaded models of reading aloud. Journal of Memory and Language, 59, 524–544.
Adelman, J. S., Brown, G. D. A., & Quesada, J. F. (2006). Contextual diversity, not word frequency, determines word naming and lexical decision times. Psychological Science, 17, 814–823.
Adelman, J. S., Sabatos-DeVito, M. G., Marquis, S. J., & Estes, Z. (2014). Individual differences in reading aloud: A mega-study, item effects, and some models. Cognitive Psychology, 68, 113–160.
Andrews, S. (2015). Individual differences among skilled readers: The role of lexical quality. In A. Pollatsek & R. Treiman (Eds.), The Oxford handbook of reading (pp. 129–138).
Balota, D. A., Cortese, M. J., Sergent-Marshall, S. D., Spieler, D. H., & Yap, M. (2004). Visual word recognition of single-syllable words. Journal of Experimental Psychology: General, 133(2), 283–316.
Brysbaert, M., Buchmeier, M., Conrad, M., Jacobs, A. M., Bölte, J., & Böhl, A. (2011). The word frequency effect: A review of recent developments and implications for the choice of frequency estimates in German. Experimental Psychology, 58, 412–424.
Brysbaert, M., Lagrou, E., & Stevens, M. (in press). Visual word recognition in a second language: A test of the lexical entrenchment hypothesis with lexical decision times. Bilingualism: Language and Cognition (in press).
Brysbaert, M., Stevens, M., Mandera, P., & Keuleers, E. (2016). The impact of word prevalence on lexical decision times: Evidence from the Dutch Lexicon Project 2. Journal of Experimental Psychology: Human Perception and Performance, 42, 441–458.
Chang, Y. N., Furber, S., & Welbourne, S. (2012). “Serial” effects in parallel models of reading. Cognitive Psychology, 64(4), 267–291.
Chateau, D., & Jared, D. (2000). Exposure to print and word recognition processes. Memory & Cognition, 28(1), 143–153.
Coltheart, M., Rastle, K., Perry, C., Langdon, R., & Ziegler, J. (2001). DRC: A dual route cascaded model of visual word recognition and reading aloud. Psychological Review, 108, 204–256.
Cop, U., Keuleers, E., Drieghe, D., & Duyck, W. (2015). Frequency effects in monolingual and bilingual natural reading. Psychonomic Bulletin & Review, 20, 963–972.
Cortese, M. J., & Khanna, M. M. (2007). Age of acquisition predicts naming and lexical-decision performance above and beyond 22 other predictor variables: An analysis of 2342 words. Quarterly Journal of Experimental Psychology, 60, 1072–1082.
Costa, A. (2005). Lexical access in bilingual production. In J. F. Kroll & A. M. B. de Groot (Eds.), Handbook of bilingualism: Psycholinguistic approaches (pp. 308–328). New York: Oxford University Press.
de Groot, A. M. B., Borgwaldt, S., Bos, M., & van den Eijnden, E. (2002). Lexical decision and word naming in bilinguals: Language effects and task effects. Journal of Memory and Language, 47, 91–124.
Diependaele, K., Lemhöfer, K., & Brysbaert, M. (2013). The word frequency effect in first and second language word recognition: A lexical entrenchment account. Quarterly Journal of Experimental Psychology, 66, 843–863.
Duyck, W., Vanderelst, D., Desmet, T., & Hartsuiker, R. J. (2008). The frequency effect in second-language visual word recognition. Psychonomic Bulletin & Review, 15(4), 850–855.
Ellis, N. C. (2002). Frequency effects in language processing. Studies in Second Language Acquisition, 24, 143–188.
Faust, M. E., Balota, D. A., Spieler, D. H., & Ferraro, F. R. (1999). Individual differences in information-processing rate and amount: Implications for group differences in response latency. Psychological Bulletin, 125(6), 777–799.
Forster, K. I., & Chambers, S. M. (1973). Lexical access and naming time. Journal of Verbal Learning and Verbal Behavior, 12, 627–635.
Garlock, V. M., Walley, A. C., & Metsala, J. L. (2001). Age-of-acquisition, word frequency, and neighborhood density effects on spoken word recognition by children and adults. Journal of Memory and Language, 45(3), 468–492.
Ghyselinck, M., Lewis, M. B., & Brysbaert, M. (2004). Age of acquisition and the cumulative-frequency hypothesis: A review of the literature and a new multi-task investigation. Acta Psychologica, 115, 43–67.
Gollan, T. H., Montoya, R. I., Cera, C., & Sandoval, T. C. (2008). More use almost always means a smaller frequency effect: Aging, bilingualism, and the weaker links hypothesis. Journal of Memory and Language, 58(3), 787–814.
Gomez, P., & Perea, M. (2014). Decomposing encoding and decisional components in visual-word recognition: A diffusion model analysis. The Quarterly Journal of Experimental Psychology, 67(12), 2455–2466.
Harm, M. W., & Seidenberg, M. S. (1999). Phonology, reading acquisition, and dyslexia: Insights from connectionist models. Psychological Review, 106, 491–528.
Harm, M. W., & Seidenberg, M. S. (2004). Computing the meaning of words in reading: Cooperative division of labor between visual and phonological processes. Psychological Review, 111, 662–720.
Howes, D. H., & Solomon, R. L. (1951). Visual duration threshold as a function of word-probability. Journal of Experimental Psychology, 41(6), 401–410.
Kaushanskaya, M., Yoo, J., & Marian, V. (2011). The effect of second-language experience on native-language processing. Vigo International Journal of Applied Linguistics, 8, 54–77.
Keuleers, E., Stevens, M., Mandera, P., & Brysbaert, M. (2015). Word knowledge in the crowd: Measuring vocabulary size and word prevalence in a massive online experiment. Quarterly Journal of Experimental Psychology, 68(8), 1665–1692.
Kroll, J. F., Michael, E., Tokowicz, N., & Dufour, R. (2002). The development of lexical fluency in a second language. Second Language Research, 18, 137–171.
Kuperman, V., & Van Dyke, J. A. (2013). Reassessing word frequency as a determinant of word recognition for skilled and unskilled readers. Journal of Experimental Psychology: Human Perception and Performance, 39(3), 802.
Lemhöfer, K., Dijkstra, T., Schriefers, H., Baayen, R. H., Grainger, J., & Zwisterlood, P. (2008). Native language influences on word recognition in a second language: A megastudy. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 12–31.
Lewellen, M. J., Goldinger, S. D., Pisoni, D. B., & Greene, B. G. (1993). Lexical familiarity and processing efficiency: Individual differences in naming, lexical decision, and semantic categorization. Journal of Experimental Psychology: General, 122(3), 316–330.
Li, P., & Zhao, X. (2013). Self-organizing map models of language acquisition. Frontiers in Psychology, 4, 828.
Linck, J. A., Kroll, J. F., & Sunderman, G. (2009). Losing access to the native language while immersed in a second language: Evidence for the role of inhibition in second-language learning. Psychological Science, 20(12), 1507–1515.
Marcus, M., Santorini, B., & Marcinkiewicz, M. A. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19, 313–330.
Miller, G. A. (1990). WordNet: An on-line lexical database. International Journal of Lexicography, 3, 235–312.
Monaghan, P., & Ellis, A. W. (2010). Modeling reading development: Cumulative, incremental learning in a computational model of word naming. Journal of Memory and Language, 63, 506–525.
Morrison, C. M., Hirsh, K. W., Chappell, T., & Ellis, A. W. (2002). Age and age of acquisition: An evaluation of the cumulative frequency hypothesis. European Journal of Cognitive Psychology, 14(4), 435–459.
Norris, D. (2009). Putting it all together: A unified account of word recognition and reaction-time distributions. Psychological Review, 116, 207–219.
Perfetti, C. (2007). Reading ability: Lexical quality to comprehension. Scientific Studies of Reading, 11(4), 357–383.
Perry, C., Ziegler, J. C., & Zorzi, M. (2007). Nested incremental modeling in the development of computational theories: The CDP+ model of reading aloud. Psychological Review, 114, 273–315.
Peterson, R. R., & Savoy, P. (1998). Lexical selection and phonological encoding during language production: Evidence for cascaded processing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 539–557.
Pfost, M., Dörfler, T., & Artelt, C. (2013). Students’ extracurricular reading behavior and the development of vocabulary and reading comprehension. Learning and Individual Differences, 26, 89–102.
Plaut, D. C., & Booth, J. R. (2000). Individual and developmental differences in semantic priming: Empirical and computational support for a single-mechanism account of lexical processing. Psychological Review, 107(4), 786–823.
Plaut, D. C., McClelland, J. L., Seidenberg, M. S., & Patterson, K. (1996). Understanding normal and impaired reading: Computational principles in quasi-regular domains. Psychological Review, 103, 56–115.
Preston, K. A. (1935). The speed of word perception and its relation to reading ability. Journal of General Psychology, 13, 199–203.
Ransdell, S. E., & Fischler, I. (1987). Memory in a monolingual mode: When are bilinguals at a disadvantage? Journal of Memory and Language, 26, 392–405.
Ratcliff, R., Gomez, P., & McKoon, G. (2004). A diffusion model account of the lexical decision task. Psychological Review, 111, 159–182.
Salthouse, T. A. (1996). The processing-speed theory of adult age differences in cognition. Psychological Review, 103(3), 403–428.
Sears, C. R., Siakaluk, P. D., Chow, V. C., & Buchanan, L. (2008). Is there an effect of print exposure on the word frequency effect and the neighborhood size effect? Journal of Psycholinguistic Research, 37(4), 269–291.
Seidenberg, M. S., & McClelland, J. L. (1989). A distributed, developmental model of word recognition and naming. Psychological Review, 96(4), 523–568.
Shipley, W. C. (1940). A self-administering scale for measuring intellectual impairment and deterioration. The Journal of Psychology, 9(2), 371–377.
Spieler, D. H., & Balota, D. A. (1997). Bringing computational models of word naming down to the item level. Psychological Science, 411–416.
Taraban, R., & McClelland, J. L. (1987). Conspiracy effects in word pronunciation. Journal of Memory and Language, 25, 608–631.
Thorndike, E. L. (1931). A teacher’s word book of twenty thousand words. New York: Teacher College, Columbia University.
Van Wijnendaele, I., & Brysbaert, M. (2002). Visual word recognition in bilinguals: Phonological priming from the second to the first language. Journal of Experimental Psychology: Human Perception and Performance, 28, 616–627.
Wechsler, D. (2008). Wechsler adult intelligence scale (4th ed.). San Antonio, TX: Pearson.
Weide, R. L. (1998). The CMU pronouncing dictionary <http://www.speech.cs.cmu.edu/cgibin/cmudict>.
Welbourne, S. R., & Lambon Ralph, M. A. (2007). Using parallel distributed processing models to simulate phonological dyslexia: The key role of plasticity-related recovery. Journal of Cognitive Neuroscience, 19(7), 1125–1139.
Whitford, V., & Titone, D. (2012). Second-language experience modulates first- and second-language word frequency effects: Evidence from eye movement measures of natural paragraph reading. Psychonomic Bulletin & Review, 19(1), 73–80.
Yap, M. J., & Balota, D. A. (2009). Visual word recognition in multisyllabic words. Journal of Memory and Language, 60, 502–529.
Yap, M. J., Balota, D. A., Sibley, D. E., & Ratcliff, R. (2012). Individual differences in visual word recognition: Insights from the English Lexicon Project. Journal of Experimental Psychology: Human Perception and Performance, 38(1), 53–79.
Yap, M. J., Balota, D. A., Tse, C. S., & Besner, D. (2008). On the additive effects of stimulus quality and word frequency in lexical decision: Evidence for opposing interactive influences revealed by RT distributional analyses. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34(3), 495–513.
Ziegler, J. C., Castel, C., Pech-Georgel, C., George, F., Alario, F. X., & Perry, C. (2008). Developmental dyslexia and the dual route model of reading: Simulating individual differences and subtypes. Cognition, 107, 151–178.