Modelling Mental Lexicon
Abstract
Since the introduction of the term mental lexicon, several efforts have been
made to describe, develop and understand this system responsible for the
mental representations of words. Such efforts have followed several
approaches, ranging from theoretical and empirical methods to computa-
tional models. Recently, Recurrent Neural Networks (RNNs) have received
particular attention due to their capacity in several applications, including
language processing; likewise, realistic models and networks based on neu-
rophysiological data have contributed to the understanding of the brain's
actions and behaviors during language processing. This work aims to re-
view and present some considerations on mental lexicon modelling, as well as
the current approaches based on RNNs, their alternatives and the cortical-like
RNN model.
1 Mental Lexicon
In this section, several considerations related to modelling the mental lexicon are
presented, following the work by Weronika Szubko-Sitarek [1], where
a full review and discussion can be found.
Since the first discussions regarding the mental lexicon, psycholinguists
have questioned many aspects of the nature of this system, which is respon-
sible for the mental representation of words and their meanings [2]. One
of the main discussions has concerned the representation of lexical entries, namely, their
internal structure. Levelt [3] proposed that such items should be separated into
two components: a semantic component (lemma) and a formal component (lex-
eme). The first is related to the word's meaning and its syntax; the second
includes its morphology, phonology and orthography. The community has
widely accepted this theory, but there are still discussions around semantic
representations and how meaning is represented [4]. Likewise, issues remain
regarding the number of lexicons and the modality of input and output (listening and
reading).
On the other hand, discussions about how lexical entries should be stored
were also raised. Two main hypotheses were built to face this problem: the Full
Listing Hypothesis [5] and the Decompositional Hypothesis [3]. In the first case,
words are seen as whole and independent lexical entities, as in a written dic-
tionary; there is no distinction between words and their variations (go and
goes are stored separately). Unlike this approach, the Decompositional Hypothesis
establishes that words are stored as bundles of morphemes (the smallest meaningful
units of language). This theory has received experimental support from
priming tasks, lexical decision tasks, speech error analysis and experiments with
brain-damaged subjects.
Semantic representation has also been discussed in the literature. In order to
explain how conceptual features are stored and retrieved from memory,
several models have been postulated:
line with two important categorization theories, and at this point arises the
main discussion of the model. On one side stands the classical view, coming from
ancient Greece, namely the Aristotelian model of categorization (essence
and accidents); on the other side, prototype theory, derived from cogni-
tivism, suggests that people acquire the meaning of words by reference
to a highly typical example. Several authors agree with this theory, since
it has empirical evidence (for a complete discussion see [1]).
Another key concept within the mental lexicon is lexical access, the mechanism
responsible for searching for and recognizing a word very quickly (about 200 ms
after its onset). Moreover, its complexity depends on the modality of the word
(phonological or orthographic) which has to be recognized so that its meaning
can be accessed. In the opposite direction, the production of language starts from
the meaning of the desired concept, which has to be translated into a phonological
or orthographic representation. Several models have been proposed to explain
and analyze lexical access:
1. Serial search model: In this model, it is proposed that lexical access occurs
by sequentially scanning lexical entries one at a time. In Forster's
model [9], which compares the lexicon to a library, access infor-
mation (orthographic, phonological and semantic/syntactic) is used to
locate the lexical item. Independently of the modality (visual, auditory),
each item can be accessed one at a time. Once the word's location is
found (based on its access information), the search for the word entry
in the lexicon proceeds until the correct lexical entry is retrieved from the
lexicon (the master lexicon in Forster's model).
2. The Logogen model [10]: Unlike the previous model, the Logogen model
uses a threshold for accessing words instead of a determinate location.
That is, a word is activated when its detector receives enough evidence
(reaching a threshold) to access the lexical entry. Similarly to Forster's model, the input
can come from orthographic, phonological or semantic information. Mor-
ton suggests that each entry is associated with a detector (logogen) that shares
some features with a targeted stimulus. Thus, when the input arrives, the
number of matching features for each logogen is summed up, and if
the threshold is reached, the word is activated. This model has
received special attention because it describes how the nervous system re-
sponsible for lexical processing works, integrating both acoustic and visual
inputs.
3. Cohort model: Inspired by the Spreading Activation Model, Marslen-Wilson
[11] proposes that instead of words with similar semantics being primed,
the priming is done by words with similar sound. Comprising three
stages, the model starts with the activation of a set of words
(the cohort) sharing a similar sound onset (access stage); afterwards, the lexical items in the
cohort are progressively suppressed based on their inconsistency with the input and the con-
text (selection stage), until a single item remains (integration stage). A minimal
sketch of this narrowing process is given after this list.
4. Computational models: So far, all of the models presented here have been
determined by "high-level theoretical principles," that is, without car-
ing about what is going on inside the boxes. In contrast, computational reading models can
explain realistic data and simulate data from the traditional ex-
periments (lexical decision, masked priming, eye movements). Thanks to
current computing power, models can perform large-scale simulations
using a huge number of words; of course, it is necessary to highlight that
such models only tackle a limited range of phenomena or tasks to be simu-
lated. Several models have been proposed depending on the task, word
frequency, letter order, the word-superiority effect, RT distributions, among
others. They are discussed in [1] and are visualized in Figure 1.
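To make the narrowing of the cohort concrete, the following minimal Python sketch (with a hypothetical toy lexicon; it illustrates only the selection stage, not Marslen-Wilson's full model) shows how the candidate set shrinks as more of the word onset is heard.

# Toy illustration of cohort narrowing (hypothetical lexicon and input).
LEXICON = ["candle", "candy", "candid", "cattle", "camera", "table"]

def cohort(prefix, lexicon=LEXICON):
    """Return the lexical candidates consistent with the heard onset."""
    return [word for word in lexicon if word.startswith(prefix)]

heard = ""
for segment in "candl":          # the input arrives one segment at a time
    heard += segment
    candidates = cohort(heard)
    print(heard, candidates)
    if len(candidates) == 1:     # integration stage: a single item remains
        break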
Figure 1: Mental lexicon considerations. Source: the author.
2 Recurrent Neural Networks
2.1 Introduction
Recurrent Neural Networks (RNNs) are obtained from feed-forward networks
by connecting the neurons' outputs back to their inputs [15]. Their name reflects the fact that they
perform the same task for every element of a sequence; thus, the output of
the network depends on the previous computations. The short-term time
dependency is modelled by the hidden-to-hidden connections without using any
time delay-taps. They are usually trained iteratively via a procedure known
as back-propagation through time (BPTT). Figure 2 shows an RNN being
unfolded into a full network [16]. In other words, the current state (s) expands
out over time into a sequence of an n-layer neural network. Thus, for each
input x at time step t, there is a state s and an output o for the same time step.
The fact that RNNs share parameters (U, V, W) across layers reflects that each
step carries out the same task with different inputs.
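As a minimal sketch of the unfolded computation (assuming a vanilla tanh recurrence with a softmax output; the sizes and variable names are illustrative, not taken from [15] or [16]):

import numpy as np

def rnn_forward(xs, U, W, V, s0):
    """Unfold a vanilla RNN over an input sequence xs.

    The same parameters (U, W, V) are reused at every time step:
        s_t = tanh(U x_t + W s_{t-1})
        o_t = softmax(V s_t)
    """
    s, states, outputs = s0, [], []
    for x in xs:
        s = np.tanh(U @ x + W @ s)          # new state depends on input and previous state
        z = V @ s
        o = np.exp(z - z.max()); o /= o.sum()   # softmax over the output classes
        states.append(s)
        outputs.append(o)
    return states, outputs

# Illustrative sizes: 10-dimensional inputs, 16 hidden units, 10 output classes.
rng = np.random.default_rng(0)
U, W, V = rng.normal(size=(16, 10)), rng.normal(size=(16, 16)), rng.normal(size=(10, 16))
xs = [rng.normal(size=10) for _ in range(5)]
states, outputs = rnn_forward(xs, U, W, V, s0=np.zeros(16))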
Because RNNs share parameters across layers when unfolded in time, training them
with BPTT suffers from the problem of vanishing gradients. Long short-term memory (LSTM) [17]
was proposed to resolve this problem. An LSTM memory cell is composed of
four main elements: an input gate and an output gate that control the impact of
the input and output values on the state of the memory cell; a self-recurrent
connection that controls the evolution of the state of the memory cell; and a
forget gate that determines how much of the prior memory value should be passed into
the next time step. Depending on the states of these gates, an LSTM can represent
long-term or short-term dependencies in sequential data.
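In the standard formulation of these gates (generic notation, not necessarily the exact variant discussed in [17]), the cell update can be written as:

\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(self-recurrent cell state)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden output)}
\end{aligned}

where $\sigma$ is the logistic sigmoid and $\odot$ denotes element-wise multiplication; the forget gate $f_t$ controls how much of the prior cell state $c_{t-1}$ is carried into the next time step.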
Figure 2: RNN unfolding over time. Source: [16]
Recently, LSTM RNNs have won several international competitions and set nu-
merous benchmark records. A stack of bidirectional LSTM RNNs broke a fa-
mous TIMIT speech (phoneme) recognition record, achieving a 17.7% test set er-
ror rate [18]. LSTM-based systems also set benchmark records in language iden-
tification [19], medium-vocabulary speech recognition [20], and text-to-speech
synthesis [21]. LSTMs have also been applied to handwriting recognition [22], voice
activity detection [23] and machine translation [24].
Moreover, RNNs have been widely used for modelling various neurobiological phe-
nomena, considering anatomical, electrophysiological and computational con-
straints. The computational power of RNNs comes from the fact that each neu-
ron's activity is affected not only by the current stimulus (input) to the network
but also by the current state of the network, which means that the network
keeps traces of past inputs [25]. Thus, RNNs are ideally suited for
computations that unfold over time, such as holding items in working memory
or accumulating evidence for decision-making.
For example, Barak [25] presents RNNs as a versatile tool for explaining neural
phenomena under several constraints. He shows how combining trained
RNNs with reverse engineering can represent an alternative framework for neu-
roscience modelling. Moreover, Rajan et al. [26] present RNN models of neural
sequences in memory-based decision-making tasks, generated by minimally
structured networks. They suggest that neural sequence activation may pro-
vide a dynamic mechanism for short-term memory, which emerges from mostly
unstructured network architectures. In the same way, Güçlü and van Gerven
[27] show how RNNs are a well-suited tool for modelling the dynamics of human
brain activity. In their approach, they investigated how the internal memories
of RNNs can be used to predict feature-evoked response sequences,
which are commonly measured using fMRI. Likewise, Sussillo et al. [28] use RNNs
to generate muscle activity signals (electromyography, EMG) to explain how
neural responses in motor cortex work. They started with the hypothesis
that motor cortex implements a dynamical system used for generating temporal
commands. Thus, the RNNs are used to transform simple inputs into temporally
and spatially complex patterns of muscle activity.
The previous works show that RNNs are a powerful tool for modelling several
kinds of neural dynamics in the brain. The next section will focus on the use of
recurrent networks in language modelling.
2.2 RNN in language modelling
Since the first implementation of the Simple Recurrent Network (SRN)
[29] and its later modification [30], several updates and new approaches
have been created. This first model has three simple layers (input, hidden
and output), and its task was to predict the next letter in a sequence (one
letter presented at a time). The model also has a contextual representation,
which keeps the activation pattern of the hidden layer generated
at the previous step. Thus, this model was able to predict symbol sequences
based on the preceding (n-1) stimuli and the context data. These pioneering studies
showed that the SRN is a suitable model for artificial grammar learning (AGL)
using sequential data as input. Consequently, newer approaches include
some of the SRN's features; indeed, one of these is its capability of creating
grammatical and semantic categories as well as relationships between them. In
effect, as Elman points out [31], there is no mental lexicon in the SRN (in the
sense presented before). Instead, lexical knowledge is implicit in the internal states
evoked by words (defined by the dynamics of the network).
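A minimal sketch of the SRN idea (one-hot letter coding, a context layer holding the previous hidden state, prediction of the next letter; training by back-propagation is omitted, and all sizes are illustrative rather than Elman's original settings):

import numpy as np

rng = np.random.default_rng(1)
alphabet = list("abcd")                      # toy alphabet (illustrative)
V, H = len(alphabet), 8                      # vocabulary size, hidden units
Wxh = rng.normal(0, 0.1, (H, V))             # input  -> hidden
Whh = rng.normal(0, 0.1, (H, H))             # context (previous hidden) -> hidden
Why = rng.normal(0, 0.1, (V, H))             # hidden -> next-letter prediction

def one_hot(i):
    v = np.zeros(V); v[i] = 1.0; return v

def srn_step(x, context):
    """One SRN step: the context layer is simply the previous hidden activation."""
    h = np.tanh(Wxh @ x + Whh @ context)
    z = Why @ h
    p = np.exp(z - z.max()); p /= p.sum()    # softmax over the next letter
    return h, p

# Predict each next letter of a toy sequence, carrying the context forward.
context = np.zeros(H)
for cur, nxt in zip("abca", "bcab"):
    context, p = srn_step(one_hot(alphabet.index(cur)), context)
    print(cur, "->", alphabet[int(p.argmax())], "(target:", nxt + ")")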
Alternative methods based on the SRN (or at least on its recurrent definition)
have been proposed. Fitz [32] describes a liquid-state machine (LSM), a re-
current neural network inspired by the characteristics of cerebellar infor-
mation processing. The model uses sparse ("liquid") and random connections
between neuron-like units, which turn a time-varying input signal into a spatio-
temporal pattern; it acts as a working memory model which processes past inputs
through transient states (like the SRN). The author emphasizes the robustness of this
implementation, mainly with respect to small changes in the parameters. He also used
the model to make novel predictions for different conditions (more frames in the
languages and a larger distance between dependencies), demonstrating a
U-shaped pattern for the first condition (shifted towards lower variability) and
an improvement in performance for the three-frames, one-filler condition.
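The reservoir idea behind such models can be sketched roughly as follows (an echo-state-style, rate-based illustration with a fixed sparse random recurrent pool and a separate readout; this is a generic approximation, not Fitz's implementation):

import numpy as np

rng = np.random.default_rng(2)
n_in, n_res = 3, 100                          # illustrative sizes

# Sparse, random, fixed recurrent "liquid": only the readout would be trained.
W_in = rng.normal(0, 0.5, (n_res, n_in))
W_res = rng.normal(0, 1.0, (n_res, n_res)) * (rng.random((n_res, n_res)) < 0.1)
W_res *= 0.9 / max(abs(np.linalg.eigvals(W_res)))   # scale spectral radius below 1

def run_reservoir(inputs):
    """Map a time-varying input into a spatio-temporal pattern of reservoir states."""
    x = np.zeros(n_res)
    states = []
    for u in inputs:
        x = np.tanh(W_in @ u + W_res @ x)     # transient states carry traces of past inputs
        states.append(x.copy())
    return np.array(states)

states = run_reservoir([rng.normal(size=n_in) for _ in range(20)])
# A linear readout (e.g., least squares on `states`) would then be fit to the task targets.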
Furthermore, the same author developed a dual-path model [33] which uses an
SRN as a sequencing pathway in order to learn syntactic representations from
two category layers (CCOMPRESS and COMPRESS). Unlike a model that only makes a
single prediction, this model computes an error (the difference between the
actual next word and the model's prediction) to adjust the weights in the
learning algorithm and thus make more accurate predictions. Meanwhile, the
meaning pathway encodes the target message in fast-changing links between
the Role and Concept layers; the Role layer is composed of four thematic role
variables (Action, Agent, Theme, Recipient). The two pathways meet
at the hidden layer and then converge again at the output
layer, so that the model can learn which word is associated with a specific
concept. This model presents several advantages, for instance the encoding of
complex utterances, handling of the input distribution, and high sentence accuracy.
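The error-driven aspect highlighted above can be illustrated on its own, independently of the full dual-path architecture: a minimal sketch (hypothetical sizes, a delta-rule update restricted to the output weights) of adjusting weights from the difference between the actual next word and the prediction:

import numpy as np

rng = np.random.default_rng(3)
V, H = 6, 12                                  # illustrative vocabulary and hidden sizes
W_out = rng.normal(0, 0.1, (V, H))            # hidden -> next-word prediction weights
lr = 0.1

def predict(hidden):
    z = W_out @ hidden
    p = np.exp(z - z.max()); return p / p.sum()

hidden = rng.normal(size=H)                   # stands in for the model's hidden state
target = np.eye(V)[2]                         # the word that actually occurred next (one-hot)

prediction = predict(hidden)
error = target - prediction                   # prediction error drives learning
W_out += lr * np.outer(error, hidden)         # delta-rule weight adjustment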
Despite their versatility and wide use, RNNs still have learning issues due to
the limitations of back-propagation (as explained in the previous section). As
an alternative, Self-Organizing Feature Maps (SOFMs) have been
proposed for language modelling. The DevLex-II word learning model intro-
duced by Li et al. [34] uses three local maps to process and organize
linguistic information: auditory, semantic and articulatory maps. The first one
takes the phonological information; the semantic map is responsible for orga-
nizing the meaning representations; and, finally, the articulatory map integrates
the output phonemic sequences of words. These maps are inter-connected with
associative links trained by Hebbian learning, which allows the network to use-
fully model comprehension and production processes. Ferro et al. [35] de-
veloped an improved model called the Temporal Hebbian Self-Organizing Map (THSOM).
This approach keeps the temporal serial information of the network using a predic-
tive activation chain which encodes both the spatial and the temporal information of
the input. The model tackles some issues related to lexical organization
and morphological processing. Likewise, this model has revealed some dynam-
ics among short-term memory (activation), long-term memory (learning) and the
morphological organization of stored word forms (topology).
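The general mechanism shared by these map-based models, a self-organizing map plus Hebbian associative links between maps, can be sketched generically as follows (made-up sizes and random data; this is not the DevLex-II or THSOM implementation):

import numpy as np

rng = np.random.default_rng(4)
n_units, dim_form, dim_meaning = 25, 10, 8     # two tiny maps (illustrative)

form_map = rng.random((n_units, dim_form))     # e.g., phonological map prototypes
meaning_map = rng.random((n_units, dim_meaning))
assoc = np.zeros((n_units, n_units))           # Hebbian links: form map -> meaning map

def bmu(som, x):
    """Index of the best-matching unit for input x."""
    return int(np.argmin(((som - x) ** 2).sum(axis=1)))

def som_update(som, x, lr=0.1):
    """Move the winner (and, in a full SOM, its neighbours) towards the input."""
    w = bmu(som, x)
    som[w] += lr * (x - som[w])
    return w

for _ in range(200):                           # co-present form/meaning pairs
    form, meaning = rng.random(dim_form), rng.random(dim_meaning)
    i = som_update(form_map, form)
    j = som_update(meaning_map, meaning)
    assoc[i, j] += 1.0                         # Hebbian: co-active units strengthen their link

# Comprehension sketch: a word form activates its map, and the associative
# links point to the most strongly linked meaning unit.
test_form = rng.random(dim_form)
predicted_meaning_unit = int(np.argmax(assoc[bmu(form_map, test_form)]))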
Nevertheless, none of the previous models have network dynamics con-
sistent with the range of neurophysiological findings in learning, language and
memory tasks. Thus, the next section presents a hybrid, cortical-like
network which combines both of the previous approaches (RNN and SOM), called
the Self-Organizing Recurrent Neural Network (SORN).
SORN is prominent due to its simplicity and the biological plausibility of
its recurrent plastic layer. Indeed, the authors report that the synaptic weights
of the network follow a log-normal distribution, matching experimen-
tal findings [37]. Likewise, with its three plasticity mechanisms, it is possible to
create an adequate representation of the input, allowing the network to outper-
form randomly initialized non-plastic networks (like the SRN). Additionally, the fluc-
tuation patterns of the connection strengths were consistent with those found in
the dynamics of dendritic spines in rat hippocampus. Thus, SORN offers
the possibility of studying the plasticity mechanisms present in the brain with
manageable and straightforward networks [37]. SORN has been successfully
applied to tasks involving prediction [36], recall and non-linear computation [38], and
artificial grammar learning [39].
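As an illustration only, the core loop of a SORN-style network can be sketched as below, assuming the three mechanisms are spike-timing-dependent plasticity, synaptic normalization and intrinsic plasticity as described in [36]; units are binary and all sizes and constants are arbitrary:

import numpy as np

rng = np.random.default_rng(5)
N = 50                                        # excitatory units (toy size)
eta_stdp, eta_ip, target_rate = 0.001, 0.001, 0.1

W = rng.random((N, N)) * (rng.random((N, N)) < 0.1)   # sparse recurrent excitatory weights
np.fill_diagonal(W, 0.0)
T = rng.random(N) * 0.5                       # per-unit firing thresholds
x = (rng.random(N) < 0.5).astype(float)       # binary activity

def step(x, u_ext):
    """One SORN-style update: binary threshold units driven by recurrence plus input."""
    return (W @ x + u_ext - T > 0).astype(float)

for t in range(1000):
    u_ext = (rng.random(N) < 0.05).astype(float) * 0.5   # toy external drive
    x_new = step(x, u_ext)

    # 1) STDP: strengthen synapses whose pre-unit fired before the post-unit, weaken the reverse.
    W += eta_stdp * (np.outer(x_new, x) - np.outer(x, x_new)) * (W > 0)
    W = np.clip(W, 0.0, None)

    # 2) Synaptic normalization: keep each unit's total incoming excitation constant.
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=W, where=row_sums > 0)

    # 3) Intrinsic plasticity: adapt thresholds so each unit approaches a target firing rate.
    T += eta_ip * (x_new - target_rate)

    x = x_new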
Undoubtedly, SORN is a feasible tool for assessing and modelling some, or
perhaps all, aspects of the mental lexicon. Using a network which exhibits
neurophysiologically plausible behavior can be useful for understanding language
processing phenomena.
References
[1] Weronika Szubko-Sitarek. Modelling the Lexicon: Some General Consider-
ations, pages 33–66. Springer Berlin Heidelberg, Berlin, Heidelberg, 2015.
[2] John Field. Psycholinguistics: The Key Concepts. Routledge, London, United
Kingdom, 2004.
[3] W. J. M. Levelt. Lexical Access in Speech Production. Blackwell, Oxford,
United Kingdom, 1993.
[4] Manfred Bierwisch and Robert Schreuder. From concepts to lexical items.
Cognition, 42(1):23 – 60, 1992.
[5] B Butterworth. Language Production Volume 2: Development, Writing
and Other Language Processes. Academic Press, United Kingdom, London,
1983.
[6] Allan M. Collins and M. Ross Quillian. Does category size affect catego-
rization time? Journal of Verbal Learning and Verbal Behavior, 9(4):432 –
438, 1970.
[7] A. M. Collins and E. F. Loftus. A spreading activation theory of semantic
processing. Psychological Review, 82:407–428, 1975.
[8] E.J. Shoben Smith, E.E. and L.J. Rips. Structure and process in semantic
memory: A featural model for semantic decisions. Psychological Review,
81:214–241, 1974.
[9] K. I. Forster. Accessing the Mental Lexicon. In New Approaches to Lan-
guage Mechanisms. A Collection of Psycholinguistic Studies, pages 257–
287. North-Holland Publishing Company, 1976.
[10] J. Morton and K. Patterson. A new attempt at an interpretation, or an
attempt at a new interpretation. In Deep Dyslexia, pages 91–118. Routledge,
London, 1998.
[11] William D. Marslen-Wilson. Functional parallelism in spoken word-
recognition. Cognition, 25(1):71 – 102, 1987. Special Issue Spoken Word
Recognition.
[12] J Fodor. The Modularity of Mind: an Essay on Faculty Psychology. MIT
Press, Cambridge, MA, 1983.
[15] Michael Husken and Peter Stagge. Recurrent neural networks for time
series classification. Neurocomputing, 50:223 – 235, 2003.
[16] WildML: Artificial Intelligence, Deep Learning, and NLP. Recurrent neural
networks tutorial, part 1 – introduction to RNNs.
[17] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neu-
ral Comput., 9(8):1735–1780, November 1997.
[18] Alex Graves, Abdel-rahman Mohamed, and Geoffrey E. Hinton. Speech
recognition with deep recurrent neural networks. CoRR, abs/1303.5778,
2013.
[19] J. Gonzalez-Dominguez, I. Lopez-Moreno, H. Sak, J. Gonzalez-Rodriguez,
and P. J. Moreno. Automatic language identification using long short-term
memory recurrent neural networks, 2014.
[20] Jürgen T. Geiger, Zixing Zhang, Felix Weninger, Björn Schuller, and Ger-
hard Rigoll. Robust speech recognition using long short-term memory re-
current neural networks for hybrid acoustic modelling, 2014.
[21] Y. Fan, Y. Qian, F. Xie, and F. K. Soong. TTS synthesis with bidirectional
LSTM based recurrent neural networks, 2014.
[22] Alex Graves and Jürgen Schmidhuber. Offline handwriting recognition
with multidimensional recurrent neural networks. In D. Koller, D. Schuur-
mans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information
Processing Systems 21, pages 545–552. Curran Associates, Inc., 2009.
[23] F. Eyben, F. Weninger, S. Squartini, and B. Schuller. Real-life voice ac-
tivity detection with lstm recurrent neural networks and an application to
hollywood movies. In 2013 IEEE International Conference on Acoustics,
Speech and Signal Processing, pages 483–487, May 2013.
[24] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learn-
ing with neural networks. CoRR, abs/1409.3215, 2014.
[25] Omri Barak. Recurrent neural networks as versatile tools of neuroscience
research. Current Opinion in Neurobiology, 46:1 – 6, 2017. Computational
Neuroscience.
[26] Kanaka Rajan, Christopher D. Harvey, and David W. Tank. Recurrent
network models of sequence generation and memory. Neuron, 90(1):128 –
142, 2016.
[27] Umut Guclu and Marcel A. J. van Gerven. Modeling the dynamics of
human brain activity with recurrent neural networks. Frontiers in Compu-
tational Neuroscience, 11:7, 2017.
[28] David Sussillo, Mark M. Churchland, Matthew T. Kaufman, and Kr-
ishna V. Shenoy. A neural network that finds a naturalistic solution for the
production of muscle activity. Nature Neuroscience, 18:1025–1033, 2015.
[29] Jeffrey L. Elman. Finding structure in time. Cognitive Science, 14(2):179
– 211, 1990.
[30] Jeffrey L. Elman. Distributed representations, simple recurrent networks,
and grammatical structure. Machine Learning, 7(2):195–225, Sep 1991.
[33] Hartmut Fitz and Franklin Chang. Meaningful questions: The acquisi-
tion of auxiliary inversion in a connectionist model of sentence production.
Cognition, 166:225 – 250, 2017.
[34] Ping Li, Xiaowei Zhao, and Brian MacWhinney. Dynamic self-organization
and early lexical development in children. Cognitive Science, 31(4):581–612, 2007.
[35] Marcello Ferro, Giovanni Pezzulo, and Vito Pirrelli. Morphology, memory
and the mental lexicon. 2012.
[36] Andreea Lazar, Gordon Pipa, and Jochen Triesch. Sorn: a self-organizing
recurrent neural network. Frontiers in Computational Neuroscience, 3:23,
2009.
[37] Witali Aswolinskiy and Gordon Pipa. Rm-sorn: a reward-modulated self-
organizing recurrent neural network. Frontiers in Computational Neuro-
science, 9:36, 2015.
[38] Hazem Toutounji and Gordon Pipa. Spatiotemporal computations of an
excitable and plastic brain: Neuronal plasticity leads to noise-robust and
noise-constructive computations. PLOS Computational Biology, 10(3):1–
20, 03 2014.
[39] Renato Carlos Farinha Duarte, Peggy Seriès, and Abigail Morrison. Self-
organized artificial grammar learning in spiking neural networks. In CogSci,
2014.