A Modular Approach To Language Production Models A

See discussions, stats, and author profiles for this publication at: https://www.researchgate.
net/publication/235673694
A modular approach to language production: Models and facts
Article in Cortex · February 2013

DOI: 10.1016/j.cortex.2013.02.005.
CITATIONS READS
10 759
5 authors, including:
Juan Valle Lisboa Andrés Pomi

Universidad de la República de Uruguay Facultad de Ciencias
23 PUBLICATIONS 248 CITATIONS 16 PUBLICATIONS 228 CITATIONS
SEE PROFILE SEE PROFILE
Alvaro Cabana Eduardo Mizraji

Universidad de la República de Uruguay Universidad de la República de Uruguay
17 PUBLICATIONS 124 CITATIONS 72 PUBLICATIONS 683 CITATIONS
SEE PROFILE SEE PROFILE
Some of the authors of this publication are also working on these related projects:
Semantics View project
Revista Psicología, Conocimiento y Sociedad View project
All content following this page was uploaded by Alvaro Cabana on 09 July 2018.
The user has requested enhancement of the downloaded file.

This article appeared in a journal published by Elsevier. The attached
copy is furnished to the author for internal non-commercial research
and education use, including for instruction at the authors institution
and sharing with colleagues.
Other uses, including reproduction and distribution, or selling or
licensing copies, or posting to personal, institutional or third party
websites are prohibited.
In most cases authors are permitted to post their version of the
article (e.g. in Word or Tex form) to their personal website or
institutional repository. Authors requiring further information
regarding Elsevier’s archiving and manuscript policies are
encouraged to visit:
http://www.elsevier.com/authorsrights
Author's personal copy
c o r t e x 5 5 ( 2 0 1 4 ) 6 1 e7 6
Available online at www.sciencedirect.com
ScienceDirect
Journal homepage: www.elsevier.com/locate/cortex
Special issue: Research report
A modular approach to language production:

Models and facts
Juan C. Valle-Lisboa a, Andrés Pomi a, Álvaro Cabana a, Brita Elvevåg b,c

and Eduardo Mizraji a,*
a
Group of Cognitive Systems Modeling, Biophysics Section, Facultad de Ciencias, Universidad de la República,
Montevideo, Uruguay
b
Psychiatry Research Group, Department of Clinical Medicine, University of Tromsø, Tromsø, Norway
c
Norwegian Centre for Integrated Care and Telemedicine (NST), University Hospital of North Norway, Tromsø,
Norway
article info abstract
Article history: Numerous cortical disorders affect language. We explore the connection between the
Received 11 July 2012 observed language behavior and the underlying substrates by adopting a neurocomputa-
Reviewed 10 October 2012 tional approach. To represent the observed trajectories of the discourse in patients with
Revised 19 December 2012 disorganized speech and in healthy participants, we design a graphical representation for
Accepted 7 February 2013 the discourse as a trajectory that allows us to visualize and measure the degree of order in
Published online 19 February 2013 the discourse as a function of the disorder of the trajectories. Our work assumes that many
of the properties of language production and comprehension can be understood in terms of
Keywords: the dynamics of modular networks of neural associative memories. Based upon this
Language assumption, we connect three theoretical and empirical domains: (1) neural models of
Neural models language processing and production, (2) statistical methods used in the construction of
Model neuroimaging functional brain images, and (3) corpus linguistic tools, such as Latent Semantic Analysis
Schizophrenia (henceforth LSA), that are used to discover the topic organization of language. We show
Discourse trajectories how the neurocomputational models intertwine with LSA and the mathematical basis of
functional neuroimaging. Within this framework we describe the properties of a context-
dependent neural model, based on matrix associative memories, that performs goal-
oriented linguistic behavior. We link these matrix associative memory models with the
mathematics that underlie functional neuroimaging techniques and present the “func-
tional brain images” emerging from the model. This provides us with a completely
“transparent box” with which to analyze the implication of some statistical images. Finally,
we use these models to explore the possibility that functional synaptic disconnection can
lead to an increase in connectivity between the representations of concepts that could
explain some of the alterations in discourse displayed by patients with schizophrenia.
ª 2013 Elsevier Ltd. All rights reserved.
* Corresponding author. Facultad de Ciencias, Universidad de la República, Iguá 4225, Montevideo 11400, Uruguay.
E-mail addresses: [email protected], [email protected] (E. Mizraji).
http://dx.doi.org/10.1016/j.cortex.2013.02.005
0010-9452/ª 2013 Elsevier Ltd. All rights reserved.
62 c o r t e x 5 5 ( 2 0 1 4 ) 6 1 e7 6
empirical domains currently used to analyze language,

1. Introduction namely (1) neural models of language processing and pro-
duction, (2) statistical methods that are used to create func-
Research investigating the neural substrates of language
tional brain images, and (3) corpus linguistic tools such as
production and comprehension is a complex and rapidly
Latent Semantic Analysis (henceforth LSA) that are used to
evolving field (Pulvermüller, 2010, 2012). A clear understand-
discover the topic organization of language. To this end we
ing of these neurocognitive processes is also relevant for a
demonstrate how these neurocomputational models inter-
variety of psychiatric and neurological conditions where lan-
twine with LSA and the mathematical basis of functional
guage is often a crucial operational indicator of cognitive
neuroimaging. Subsequently, we apply these methods to
function, as well as the clinical course and state of the illness
some aspects of discourse disorganization.
(Andreasen and Grove, 1986; Snowdon et al., 1996; DeLisi,
We have recently presented an empirical analysis of
2001; McKenna and Oh, 2005).
discourse productions in patients with schizophrenia, where
One important approach to the study of the neural basis of
we introduce an entropy measure in order to quantify
cognition relies heavily on mathematical models of neural
discourse disorganization (Cabana et al., 2011a). Here we show
networks. These models try to reveal the essential workings of
that this disorganization can be captured in our models of
the almost inextricably complex real neural networks of the
language production. We use our three-part approach (stated
brain. One of the main goals of neural modeling is to translate
above) in order to evaluate the consequences of functional
some basic empirical facts into transparent mathematical
disconnection at the level of discourse disorganization and
devices that could be used to explicitly establish some prop-
neuroimaging, solving the apparent paradox that disconnec-
erties of network topologies and neuronal connections,
tion at the synapse (and neuroimaging level) can nevertheless
including synaptic transmission (McCulloch and Pitts, 1943;
produce a certain level of semantic hyper-connection in the
Anderson, 1995; Arbib, 1995; Elman et al., 1997).
discourse of some patients with schizophrenia.
The language faculty includes several functional modules
The organization of this paper is as follows: in Section 2 we
and so it is natural to try and devise explicit theoretical ac-
present a graphical representation of linguistic productions
counts that connect all the modules and explain how they
that enables the measurement of the degree of order in
function and how they breakdown. We suggest that adoption
discourse. This representation of empirical data was inspired
of mathematical and computational approaches may nurture
by results obtained from neural models of associative mem-
a more explicit and unified account of language by demanding
ories, and motivates the exploration of the usage of neuro-
explicitly stated hypotheses. Such an approach does, of
computational models in the investigation of the nature of
course, run the risk of naively ignoring important details, and
language anomalies in psychosis. In Sections 3 and 4 we re-
in light of current knowledge this approach will be unable to
view previous theoretical work that provides the foundation
do justice to the richness of current verbal or symbolic char-
for the development of our neural models, and Sections 5e7
acterizations of the relevant phenomena. However, we believe
document the results from our lesioned models in terms of
that these limitations are at least partially outweighed by the
the deterioration of discourse production. In Section 3 we re-
heuristic value of a computationally explicit (and hence test-
view the fundamentals of associative memory models and
able and readily falsifiable) model. Indeed, we suggest that the
present a multiplicative model that we have been developing
diversity of current models points to the need for a more
for several years, and that is at the core of all the models we
constrained form of theorizing.
employ in this paper. Then, in Section 4 we describe a multi-
A core assumption underlying the present work is that
modular neural network that reproduces the thematic pack-
many of the properties of language production and compre-
ing of lexical strings in a way that is closely related to LSA. In
hension can be understood in terms of the dynamics of
Section 6 we present a new context-dependent neural model
modular networks of neural associative memories. In our
that performs goal-oriented linguistic behavior. In Section 5
computational modeling that we present below each module
we link matrix associative memory models with the mathe-
has a core matrix associative memory. This representation is
matics that underlies functional neuroimaging techniques
ample enough to accept connectionist approaches à la
and thus show the “functional brain imaging” that would be
AndersoneKohonen or McClellandeRumelhart (Anderson,
generated from the model described in Section 6. Finally, in
1972; Kohonen, 1977; McClelland et al., 1986; Rumelhart
Section 7 we use the numerical results emerging from this
et al., 1986a) where learning is essential, and also some
model to explore a possible explanation for the disconnection-
eventual a priori neural circuits created by evolution and
hyperconnectivity paradox that is reported in some patients
developed during brain embryogenesis.
with schizophrenia.
The aim of this paper is to present this modular network of
associative memories as a convenient theoretical tool to unify
diverse aspects of cognitive neuroscience of language pro-
2. Order and disorder in the topology of
duction, comprehension and distortion, and exemplify its
language
potential with the analysis of some expected consequences of
functional disconnection and its explicative power of empir-
In addition to being an important human trait on its own,
ical observations of language disorganization in some patients
language is an observable that can point to underlying pa-
with schizophrenia.
thologies or disorders. The fact that language shows regular-
Therefore, as a contribution toward a neurocomputational
ities at many levels allows for empirical quantitative
‘theory’ of language we connect three theoretical and
descriptions to be devised that capture an operational
c o r t e x 5 5 ( 2 0 1 4 ) 6 1 e7 6 63
measure of putative pathological processes. A valuable levels of thought disorder in patients with schizophrenia
objective is to find measures that are related to the underlying (Cabana et al., 2011a).
mechanisms involved in language production, and thus The graphical representations thus far presented, where
whose change we can model. the semantic space spanned by the discourse is projected in
We have recently described the use of graphical repre- thematic subspaces, has been inspired by a class of context-
sentations for the trajectories of discourse (Cabana et al., dependent associative memory models in which the con-
2011a) comparing the linguistic productions from patients texts allow the dissection of the conceptual structure of the
with schizophrenia to healthy comparison participants. In the memory (Pomi and Mizraji, 2004). In the next section we
present paper, we are interested in establishing connections introduce the basic ideas underlying these models in terms of
between the graphical representations and neural models. We their potential usefulness to explore some aspects of the
first present an application of our method as an illustrative pathophysiological hypotheses in language disorganization.
example. To do so, we selected a brief passage from the per-
sonal diaries of the famous Russian ballet dancer Vaslav
Nijinsky (1890e1950) that was written during a period of 3. Modeling cognitive neural processing
illness (1919) and that is particularly lacking in coherence with context-dependent matrix memories
(taken from the unexpurgated edition, published in 1999). In
the most comprehensive modern examination of Nijinsky’s The classical models of associative memories (Anderson,
medical notes and life in general, Joseph H. Stephens, M.D. 1972; Cooper, 1973; Kohonen, 1977) are based on matrix
and Peter Ostwald, M.D. conclude: “Thus, our final diagnosis algebra, and large dimensional vectors are used as a natural
according to DSM1 III of the tragic genius Vaslav Nijinsky must mathematical representation of neural data. A landmark in
be Schizoaffective Disorder in a Narcissistic Personality” neural modeling was heralded by the emergence of the
(Ostwald, 1991; p. 350). Naturally there are many problems “Parallel Distributed Processing (PDP) Group” (Rumelhart
and risks with retrospective psychiatric diagnosis, but we et al., 1986b; McClelland et al., 1986), as well as the pioneer-
nonetheless present this case information simply to show ing research previously conducted by Rumelhart and
that the writing samples from Nijinsky’s diary were generated McClelland which developed the learning and representation
by a person who most certainly struggled with a form of abilities of original neural models. This work established the
psychosis. As a contrasting sample, we used the first para- very foundation for subsequent investigations aimed at
graphs of “A Study in Scarlet” about the adventures of the exploring deep psychological questions using neural models,
fictional detective Sherlock Holmes written by Arthur Conan specifically language and development (see Elman et al.,
Doyle in 1887 that features a first-person introduction of 1997). Indeed, language processing has motivated some
Dr. John Watson’s experiences in the British army (Doyle, extremely influential neural models such as the one devel-
2005). The clearly written and narrative nature of the text oped by Elman (1990), a multimodular neural model con-
presupposes few complex metaphors or other literary devices sisting of a perceptual neural device, a working memory and
that could complicate analysis and hence obscure compari- an associative memory.
son. Both passages are displayed in Table 1 (see Fig. 1). It is interesting to emphasize that one of the main advan-
Both samples were subjectively evaluated in order to tages provided by the kind of neural models we are employing
identify a small set of topics that were present in the text. in this work is the universality of vector coding of neural ac-
Here, we illustrate a graphical representation of the trajectory tivity. If we consider the receptive aspects of language, we
of the discourse where the visits to different topics are used to note that the phonetic reception of speech, the visual
quantitatively evaluate the degree of order of discourse. In perception of written language or the tactile detection of texts
Fig. 1, the text is represented as a line that traverses over the coded in Braille involve different physical signals (acoustic
top of relevant nodes of an underlying semantic graph, and waves, photons and mechanical pressures). In these three
each topic is represented as a distinct layer on the graph. A cases, each kind of sensory receptor transduces its associated
text with a more disordered trajectory would show a greater signal into a bioelectric neuronal signal that neurons carry to a
degree of oscillation and change between topic layers. variety of neural processing relays. Finally, in each case, it is
As mentioned earlier, we have recently developed a highly plausible that the neural modules responsible for
quantitative measure to index the degree of disorder, namely decoding language receive large sets of electrochemical and
“topic entropy” (see Cabana et al., 2011a). Specifically, this neurochemical activities that are, by definition, neural vectors
entropy results in higher values for samples in which one or independent of the original sensory modality.
more topics are “visited several times”, and in lower values Similarly, in the case of language production, the large sets
when each topic is “visited” only once. When we apply this of neuronal signals that emerge during the cognitive and lin-
measure to the literary examples in Fig. 1, crucially we obtain guistic processes involved, also define neural vectors. The
a higher topic entropy for the Nijinsky sample (S ¼ 1.75) than ensemble of neural vectors produced in this way, will after a
for the Doyle sample (S ¼ 0). This difference is what would be number of preprocessing stages command the motor pro-
expected given that topic entropy measures the disorganiza- cesses that make language communicable and observable. For
tion of discourse, and we have previously shown that this these reasons, models involving large dimensional vectors as
metric correlates systematically with the ratings of clinical inputs and outputs are potentially apt to represent the
neuronal dynamics underlying linguistic processing and pro-
duction. The models of the pathophysiology of schizophrenia
1
Diagnostic and Statistical Manual of Mental Disorders by Hoffman and McGlashan (1997) and Valle-Lisboa et al (2005)
64 c o r t e x 5 5 ( 2 0 1 4 ) 6 1 e7 6
Table 1 e The two texts used for the illustration of the notion of the underlying thematic structure of language production.
The first text is extracted from “A study in scarlet” (Doyle, 2005) and the other from the diaries of Nijinsky (1999).
“A Study in Scarlet”, Chapter 1, A. Conan Doyle

In the year 1878 I took my degree of Doctor of Medicine of the University of London, and proceeded to Netley to go through the course prescribed
for surgeons in the army.
Having completed my studies there, I was duly attached to the Fifth Northumberland Fusiliers as Assistant Surgeon. The regiment was
stationed in India at the time, and before I could join it, the second Afghan war had broken out. On landing at Bombay, I learned that my corps
had advanced through the passes, and was already deep in the enemy’s country. I followed, however, with many other officers who were in
the same situation as myself, and succeeded in reaching Candahar in safety, where I found my regiment, and at once entered upon my new
duties. The campaign brought honors and promotion to many, but for me it had nothing but misfortune and disaster. I was removed from my
brigade and attached to the Berkshires, with whom I served at the fatal battle of Maiwand. There I was struck on the shoulder by a Jezail
bullet, which shattered the bone and grazed the subclavian artery. I should have fallen into the hands of the murderous Ghazis had it not
been for the devotion and courage shown by Murray, my orderly, who threw me across a pack-horse, and succeeded in bringing me safely to
the British lines. Worn with pain, and weak from the prolonged hardships which I had undergone, I was removed, with a great train of
wounded sufferers, to the base hospital at Peshawar. Here I rallied, and had already improved so far as to be able to walk about the wards, and
even to bask a little upon the verandah, when I was struck down by enteric fever, that curse of our Indian possessions. For months my life was
despaired of, and when at last I came to myself and became convalescent, I was so weak and emaciated that a medical board determined that
not a day should be lost in sending me back to England. I was dispatched, accordingly, in the troopship “Orontes,” and landed a month later
on Portsmouth jetty, with my health irretrievably ruined, but with permission from a paternal government to spend the next 9 months in
attempting to improve it.
Nijinsky’s Book I, On Life, pp 3e4 (Unexpurgated Edition).
I have had a good lunch, for I ate two soft-boiled eggs and fried potatoes and beans. I like beans, only they are dry. I do not like dry beans, because
there is no life in them. Switzerland is sick because it is full of mountains. In Switzerland people are dry because there is no life in them. I have
a dry maid because she does not feel. She thinks a lot because she has been dried out in another job that she had for a long time. I do not like
Zurich, because it is a dry town. It has a lot of factories and many business people. I do not like dry people, and therefore I do not like business
people.
The maid was serving lunch to my wife, to my first cousin (this, if I am not mistaken, is how someone related to me by being my wife’s sister is
called), and to Kyra, together with the Red Cross nurse. She wears crosses, but she does not realize their significance. A cross is something
that Christ bore. Christ bore a large cross, but the nurse wears a small cross on a little ribbon that is attached to her headdress, and the
headdress has been moved back so as to show the hair. Red Cross nurses think that it is prettier this way and have therefore abandoned the
practice that doctors wanted to instill in them. The nurses do not obey doctors, because they do not understand the instructions they have to
carry out. The nurse does not understand the purpose she is here for, because when the little one was eating, she wanted to tear her away
from her food, thinking that the little one wanted dessert. I told her that “she would get dessert when she had eaten what was on the plate”.
The little one was not offended, because she knew I loved her, but the nurse felt otherwise. She thought I was correcting her. She is not getting
any better, because she likes eating meat. I have said many times that it is bad to eat meat. They don’t understand me. They think that meat is
an essential thing. They want a lot of meat. After eating lunch they laugh. I am heavy and stale after eating, because I feel my stomach. They
do not feel their stomachs, but feel blood playing up. They get excited after eating. Children also get excited. They are put to bed because
people think they are weak creatures. Children are strong and do not need help. I cannot write, my wife disturbs me. She is always thinking
about things I have to do.
illustrate a scenario with phonetic inputs and conceptual use the letter k to indicate this particular friend) and we
outputs. However, it should be emphasized that we employ no observe her from different angles, each visual experience is
specific phonetic particularities and we assume that the coded by the neural system as a set of neural vectors f1(k),
output of the phonological module is a vector representing a f2(k), f3(k), f4(k), ., fn(k), corresponding to the different views
word. This does not imply that the phonological processes of the face of this friend (different perceptual angles, for
cannot be the locus of or the basis for language associated example). In this case, what the memory stores is an average
pathologies. Indeed, a strong case has been made for the role pattern f(k), a large n-dimensional vector whose dimension n
of phonological aspects of working memory in language pro- depends on anatomical connectivity. A new pattern similar to
cessing (see Baddeley, 2007 for a review), and consequently in f(k) produced by an unknown photo from our friend k, say f*(k),
various language disorders (Gathercole and Baddeley, 1990). is finally projected onto f(k), and due to the large dimension-
Matrix models of distributed associative memories are, ality, it is in fact confounded with f(k) (the small differences
naturally, only approximations of real memories and thus are made null) and this pattern is identified as the face of our
they have obvious limitations. Nevertheless, these models are friend. This is the basic reason that enables these matrices to
capable of capturing important aspects of collective neuronal act as statistical identifiers. As was demonstrated by Kohonen
behaviors, especially considering neural networks composed (1972) and Anderson (1972) large dimensionalities enable the
of a very large number of neurons. In his early work on the sharing of the same matrix support with a number of other
biological plausibility of these matrix models, Cooper (1973) different patterns (e.g., the faces of different friends k0 , k00 , etc.)
analyzed the manner in which similar patterns correspond- We illustrate this central property of associative matrix
ing to the same percept could be identified with a prototype. models in Fig. 2. Note that these averages behave in the
This observation arises because similar patterns processed framework of the model in a similar manner to prototypical
during learning generate a type of statistical average, and this concepts used in cognition. In this way the emergence of
average is by definition the prototype of the experienced concepts, a high level cognitive brain ability, can be traced
pattern. For instance, if we are introduced to a new friend (we back to the imperfection of memory and its propensity to
c o r t e x 5 5 ( 2 0 1 4 ) 6 1 e7 6 65
Fig. 1 e Topic graphs obtained from manual label assignment for (A) Dr. Watson’s “speech” and (B) Vaslav Nijinsky’s diary.
Discourse trajectories are shown as lines over the graphs. From visual inspection alone (and supported by the
corresponding entropies, see text) it can be seen that the latter sample is more disordered with a more recurrent trajectory.
The yellow “dry” node in (B) indicates perseveration of a concept over different topics.
confound similar patterns (for an analysis of this issue, see each association is weighted by its frequency which leads
Cooper, 1973, and also Levi-Montalcini, 1989). An interesting directly to the Singular Value Decomposition (henceforth
consequence of the capacity to project onto prototypes is the SVD) of the matrix. SVD is a technique that decomposes a
ability of these matrix memories to correctly identify patterns matrix as a sum of matrices in order of decreasing impor-
that have deteriorated, a remarkable phenomenon beautifully tance. In matrix memories each term of the SVD is a
illustrated by the classical numerical experiments of Kohonen particular associative memory and the highest weighted
and coworkers (Kohonen, 1977; Kohonen et al., 1977). terms retain the most important associations. In this way
It is noteworthy that matrix memories can retrieve pat- the effect of learning can be seen as an enhancement of
terns according to the frequency of presentation of the prototypical associations and a reduction of the importance
patterns during the learning process, something which can of infrequent or underrepresented patterns, effectively
be interpreted as a statistical procedure (Pomi and Mizraji, performing a kind of dimensionality reduction (see below,
1999; Mizraji, 2008; see also Oja, 1982). In its simplest form, and Mizraji, 2008).
66 c o r t e x 5 5 ( 2 0 1 4 ) 6 1 e7 6
Fig. 2 e Example of an associative memory matrix whose input and output vectors are images of human faces [the images
we used in these simulations were adapted from the set of Samaria and Harter (1994)]. (A) For each grayscale image, the
edges were enhanced and the pixel values stored in a single column vector. (B) An associative memory matrix was built
with two faces of the same individual used as inputs and outputs. (C) The resulting memory was tested for its ability to
restore the corresponding output given an incomplete input (a noisy input) and the destruction of half the synaptic weights
stored in the matrix.
The potential of the matrix memory models can be largely the existence of coincidence detectors, or due to the display of
extended by employing multiplicative vector contexts. Multi- AND functions at the level of small neural networks capable of
plicative interactions have been postulated in different acting as units in a larger neural model (for further detail on
cognitive contexts (Humphreys et al., 1989; Mizraji, 1989; these multiplicative performances, see Koch and Poggio, 1992;
Smolensky, 1990). In our models (Pomi and Mizraji, 2004; Peña and Konishi, 2001).
Mizraji et al., 2009), the context vectors allow the storage in An alternative to multiplicative contexts is to assume that
memories of a large variety of potentially adaptive behaviors. contexts are part of a network that interacts additively with
This kind of contextualization requires performing some form the input. This is precisely the approach adopted in classical
of multiplication of signals at the synaptic level, or through multilayer perceptron models trained by backpropagation
Fig. 3 e Schematic description of LSA. (A) A term-by-document matrix (TD-matrix) is constructed, where each element is a
function of the frequency of a word in a document from the corpus. Then, SVD is applied to the matrix, but only k singular
vectors and values are retained (with k heuristically selected in the order of several hundreds), resulting in a truncated TD-
matrix. (B) Pairwise word (cosine) similarities for “cat”, “dog” and “bird” before versus after SVD truncation of the TD-
matrix. This procedure enhances the similarity between “cat” and “dog” while dramatically reducing the similarity
between the other two. A depiction of the vector positions in the semantic space enabled by LSA is also shown.
c o r t e x 5 5 ( 2 0 1 4 ) 6 1 e7 6 67
(Rumelhart et al., 1986a, 1986b). This approach is of neuro-

computational interest despite the fact that the biological
basis of backpropagation training remains uncertain,
although there are recent proposals that aim to solve this
lack of biological realism while preserving the desirable
computational properties (Grüning, 2007). Instead, the multi-
plicative networks, that we adopt here in the simulations
displayed in Figs. 5e8, can be trained with simpler Hebbian-
like learning procedures. A caveat is that potential biophysi-
cal mechanisms that can support multiplication have not yet
been definitively established. This limitation notwith-
standing, a great advantage of a neurocomputational
Fig. 4 e The neural architecture underlying the topic approach based on multiplicative contexts is that it provides a
detection model. (A) Neural processing of language clear theoretical foundation for exploring neural networks
requires phonetic and phonological processing achieved and allows a straightforward vector-matrix representation of
by module (P), a Sketchpad where integration of some of the networks; this very fact presents a unique op-
information is produced (S), and several memory modules portunity to exploit powerful mathematical instruments to
(corresponding to different modalities or association evaluate the potential of the modeled neural systems (Pomi
cortices) that are queried for an interpretation of the and Mizraji, 2004; Mizraji et al., 2009).
representation present in the sketchpad. (B) Our version of
this model connects LSA and neural models by collapsing
the model in (A). The module (P) produces phonological 4. A model for the thematic packing of
representations. The integration is achieved through the lexical strings and its relation to LSA
Kronecker product (symbolized by “5”) and it is used to
query memory systems that produce a vector of activities A second approach that can bridge the gap between neurobi-
conceived as the interpretation. Therefore, each pair (word, ological and cognitive levels is research in the domain of
interpretation) produces a new interpretation which is the machine learning and information retrieval. These fields have
“gist” of the linguistic input. It is in this sense that the created a set of methodologies to automatically retrieve in-
model detects the “topic” of a processed discourse. In the formation, extract topics and organize information derived
same way that a string of words is mapped to a vector that from stored textual documents. At least one of these methods
represents the aggregated meaning of the words in LSA, e LSA (Deerwester et al., 1990) e shows similarities with
here the model gives the composite meaning as a topic procedures plausibly employed by the human brain to acquire
vector. and decode linguistic productions (Mizraji, 2008; Pulvermüller,
2012). LSA is interesting since although its original application
was as a method for information retrieval (Deerwester et al.,
1990), early on it was proposed as an influential psychologi-
cal model of language acquisition and analysis (Landauer and
Dumais, 1997; Landauer et al., 1998; Kintsch, 2001; Dumais,
2003; Landauer et al., 2007). LSA exploits well-understood
and powerful mathematical methods based on matrix and
vector algebra particularly apt for computer implementations.
In LSA documents are coded as large dimensional (column)
vectors, whose components correspond to some defined
function of the frequency in each document of particular
words in the vocabulary. The whole set of documents defines
a matrix, the “term-by-document matrix” which is processed
using SVD (Berry and Browne, 2005). This method captures
indirect links between different documents (tens of thou-
sands) and thus reveals “latent” semantic relationships be-
Fig. 5 e Graphical representation of the associations stored tween them. The resulting matrices can be huge, but easily
in the simple model. Two examples of the pathway to available computational power makes it possible to compute a
North. In both cases the associations are trained between term-by-document matrix of hundreds of millions of com-
neighboring neural (cognitive) states but with a projected ponents using a simple desktop computer.
overall design such that the system can achieve the We have already mentioned the implicit connection be-
explicitly stated goal. The global design could be to tween matrix memories and SVD, but in LSA the SVD is
perform only clockwise steps (A) or to reach the goal in explicitly employed. The processing of the matrix through
only one step if the initial point is adjacent to it (B). The SVD leads to a set of non-negative numbers of decreasing
successive states of the system will be blind local steps but magnitude namely the “singular values”, and a set of associ-
they are globally organized as actions aimed toward ated vectors called “singular vectors”. The final result of the
reaching the overall goal. SVD is that the original term-by-document matrix is
68 c o r t e x 5 5 ( 2 0 1 4 ) 6 1 e7 6
Fig. 6 e A model of language production as an ordered trajectory through topics. (A) The installed graph of concepts. Every
color represents a topic, where each node refers to a concept. Full lines between nodes are context specific links; dashed
lines represent connections between concepts in different topics. The table under the graph describes every topic and every
associated concept. (B) Output of the model when asked to “describe a typical day, starting in the morning” (i) or (ii) “going
out during a weekend”. The graph represents the first situation. Below the graph some example trajectories are depicted.
Note that there is some repetition of the concepts in the target topic given that we did not include a stopping criterion. The
imposition of the topic selects a set of links between concepts in such a way that from every concept stored there is at least a
route to the target topic (see Appendix for the mathematical details). Whenever there are multiple routes the system decides
probabilistically which route to take.
expressed as a sum of matrices that result from the product of et al., 1998). Additionally, it has been shown to be useful in
singular vectors, with each one of these new matrices understanding some of the unique features of language in
weighted by the corresponding singular values (Berry and Broca’s aphasia (Roll et al., 2011), and also in accounting for
Browne, 2005). Usually (but not always) raw word fre- word frequency differences in semantic aphasia and semantic
quencies are not used in LSA, but are substituted by one dementia (Hoffman et al., 2011). Moreover, it has been
particular function. The functions weight differently each demonstrated that it is possible to evaluate patients with
term according to its distribution in a corpus of texts (see schizophrenia based on open-ended verbalizations and using
Salton and Lesk, 1965; Dumais, 1991; Landauer et al., 2007). For LSA automatically derived language scores to accurately
instance highly frequent words (which are not informative distinguish patients from controls, patients from other pa-
about the topic of the document) are weakly weighted and tients, and also from their family members (Elvevåg et al.,
some other words (e.g., articles or prepositions) are discarded. 2007, 2010). In this sense, besides being successful for the
Words that appear only in a single document are also dis- description of putatively “normal” language use and acquisi-
carded. This defines a meaningful vocabulary, and the tion (Landauer and Dumais, 1997), LSA also provides a
dimension of the document vector is the length of the framework with which to examine its breakdown.
meaningful vocabulary used in the full set of documents. As a Inspired by LSA and its relationships with neural models,
consequence, the vector dimension is usually on the order of we present below a novel neural model that can make these
thousands (Landauer et al., 1998). The central idea of LSA is connections explicit. As our main concern is the exploration
the retention of hundreds (out of thousands) of those terms of of possible links between memories, neural imaging and LSA,
the decomposition that are associated with the largest sin- we assume that the input of our model is a sequence of vec-
gular values (see Fig. 3 for an illustration of this idea). tors, each consisting of a particular pattern of activity of a
It was empirically discovered that the retained singular neural population that represents each word in the input.
vectors can usually be regarded as “topic markers” and that Each of these internally represented words operates as a
the singular vectors involved act as symbolic conceptual query to the memory systems of the “brain” in such a way that
vectors that produce a form of averaging of the real docu- together with the current understanding of the previous lin-
ments involved in a topic (Hofmann, 1999; Papadimitriou guistic input, it elicits new conceptual understanding (see
et al., 2000; Valle-Lisboa and Mizraji, 2007). Fig. 4 for a general outline of the model).
The relevance of LSA to language processing models has The various memory systems that are queried by the
been discussed previously (e.g., Landauer et al., 1998; Foltz internally represented word can be the modality specific
c o r t e x 5 5 ( 2 0 1 4 ) 6 1 e7 6 69
Fig. 7 e (A) Structure of the neural blocks involved in the simulations. Each block corresponds to a vector in the model. The
noisy blocks vectors do not participate in the computations and each of their cells is assumed to have Gaussian noise. The task
memory buffer is the vector that codes for the goal or target of the discourse. The phonetic input buffer has a special coding of
the input, assumed to be different to the input network which represents the activity that goes into the computational module.
Finally the output network is the output activity of the computational module. (B) Time averaging and matrix construction.
Each unit in the model e including unrelated blocks e has an associated activity that varies with time. In order to have a more
realistic time resolution, we use a moving window of width d that moves s time steps, in order to average the activity of each
unit. If the whole set of tasks takes T units of time we end up with T/s time epochs. In each epoch the average activity of each
unit is placed in a row vector as shown. (C) SVD of the matrix that results from B yields a set of V and U vectors. The first V vector
is the (first) eigenimage (D) projecting the eigenimages obtained from the model of Fig. 6. The models were first subjected to the
task of going from topics 3 / 1, then paused (p), followed by 1 / 3, p, 1 / 3, p, 3 / 1. The images are the mean eigenvectors
obtained over twenty different simulation sets. The colorbar in the bottom of Panel (B) shows the deviation of activity levels
from baseline; in red the highest activity increase and in blue the biggest decrease.
cortex or even category specific areas (Pulvermüller, 2010). this simple and approximate conception the interpreter acts
The data retrieved from this query is a vector of activities as a naive Bayesian classifier (Valle-Lisboa, 2007).
representing the various pieces of information that each word Richer diagnosis capabilities can be obtained by subtle
in the specific context elicits in each memory system. We can modifications such as having a short-term memory that keeps
formally represent this network by a distributed memory track of several inputs (see Pomi and Olivera, 2006; for an
whose output is an interpretation of the context and which application to medical diagnosis) leading to enhancements of
depends both on the previous interpretation and the current recognition capabilities in artificial models aimed at Natural
input. This module is an example of a multiplicative context- Language Processing tasks (Valle-Lisboa, 2007; Cabana and
dependent memory as described above that associates each Valle-Lisboa, in preparation). What is of critical relevance here
word with a subset of interpretations. The strength of the is the fact that a linguistic input consisting of a sequence of
association reflects the likelihood of the word given the words is transformed into a sequence of internally represented
interpretation. In the absence of input, the interpretation is a words and associated concepts, which can thus be processed
combination of possible interpretations weighted by the prior further. But what is unique about this inner conceptual repre-
probability. Therefore, it can be shown that the output of this sentation, as opposed to the bare (original) input? The impor-
context-dependent module at each time step is a combination tance of the process depicted in Fig. 4 lies in the fact that in order
of interpretations weighted by the a posteriori probability. In to understand an utterance, individuals first need to generalize
70 c o r t e x 5 5 ( 2 0 1 4 ) 6 1 e7 6
Fig. 8 e The effect of synaptic weight elimination in the workings of the simplified neural model of Fig. 6. The performance
is evaluated on one of the tasks as in Fig. 7. (A) Example of a trajectory (first task) of an “affected” individual (simulation).
Notice the jump between different topics. Under the discourse trajectory we show the calculation that leads to the entropy of
topic 1, S(1). As there are six words belonging to topic 1 every term has a denominator equal to 6. Each term adds the
contribution of each segment of topic 1; the first segment has two words, the second and third have one word, and the last
segment belonging to topic 1 has two words. Thus the topic 1 entropy is 1.33. The mean entropy for the discourse is
calculated by averaging the entropy over the four topics, and this mean entropy is 1.19. (B) Mean entropy as a function
of synaptic weight elimination. For each level of destruction we ran 100 simulations and calculated the mean entropy of the
output (red diamonds). The blue line shows the grand average of entropy at each level of synaptic elimination. (C) The
average eigenimage of the simulated individuals with 80% of the network’s weights pruned. (D) “Voxel” to “voxel”
difference in activity between the average eigenimage of the non-pruned networks and the average eigenimage of the
pruned networks. The first block was used as normalization and then the images were subtracted. Notice the predominance
of the non-pruned network image.
from what they encountered to novel situations. In our be termed “topics” (Griffiths and Steyvers, 2002). Naturally this
extremely simplified model this is accomplished by the is only a small part of the relevant processes, but in the next
sequence of interpretation vectors that produce a form of section we demonstrate that motor activities related to lan-
contextual interpretation of each term. In this sense, our model guage can also be modeled in our neural networks and that the
is closely related to LSA (Mizraji, 2008), in which the domain or destruction of part of the network results in a confusion of
topic that a linguistic expression refers to is the final output of discourse targets that can be used to model certain aspects of
the interpreter module after the sequence of words has been psychopathology as expressed via language.
presented. In LSA a string of words is projected onto a semantic
space and the resulting semantic vector can be said to represent
the interpretation of that string. Other corpus based methods
5. Functional imaging
have the same objectives, despite different technical imple-
mentations (Hofmann and Puzicha, 1999; Griffiths and Steyvers,
In recent years substantial advances in technologies designed
2002). Our model’s interpretations are what in other models can
to capture and process brain images have greatly expanded
c o r t e x 5 5 ( 2 0 1 4 ) 6 1 e7 6 71
available classical neuroanatomical data concerning neural with which to explore aspects of language production and
connectivity. Neuroimages obtained using positron emission additionally to model its breakdown due to anatomical and
tomography (PET) and functional magnetic resonance imag- functional disconnections as may be the case in psychopa-
ing (fMRI) open up new avenues to investigate the relationship thology. This is the issue that we address in the next two
between the connectivity of different neural networks and the sections.
functional behaviors reliant upon these networks (Sporns,
2010). Within this framework, three different empirical ap-
proaches to connectivity can be considered: (1) structural or 6. Context-dependent associative models for
anatomical connectivity, defined by the specific physical links goal-oriented linguistic behaviors
between neurons; (2) functional connectivity, that explores
the statistical correlations of different networks during The organization of goal-directed sequences of behavior is a
behavioral tasks performed during a defined time interval; basic operation of neural systems. This ranges from the
and (3) effective connectivity, that goes beyond the functional simplest animals and motor actions to the most complex ab-
correlations and looks for the network of causal dependencies stract human behavior such as navigation, language produc-
underlying a given behavior (Friston, 1995, 2011; Sporns, 2010). tion or the searching for solutions in the presence of
Recently the resting state activity (namely activity not due to constraints (which naturally is the foundation of scientific
any particular experimental task) has been shown to be inquiry).
organized according to similar functional networks as those Context-dependent matrix memories enable a neural sys-
revealed by conventional activation paradigms. In brief, there tem to store (eventually in the same neural substrate)
is a dramatic explosion of studies designed to understand different sequences (also rhythmic sequences) that can be
those networks in terms of neural models (Bienenstock and accessed depending upon the neural context (Mizraji, 1989). In
Lehmann, 1998; Rubinov and Sporns, 2010; Tagliazucchi the model presented above (Fig. 4) the context was the topic or
et al., 2011). interpretation arrived at by the model. Here we illustrate the
Functional connectivity is a method that enables the generation of sequences using a geographical metaphor: a
exploration of how neural activities correlate during particular context-dependent memory trained in such a way that start-
tasks and has been applied to a variety of lexical and linguistic ing from any one of the cardinal points it is able to reach any
data (Xiang et al., 2010). During the execution of a defined other point. This can be thought of as a minimalistic system
verbal production, a sequence of “functional” brain images is displaying a goal-directed behavior, an essential feature of
captured. These images are represented mathematically as a any model of language production. In our simple example
very large vector, each component of this vector being a voxel presented previously (Mizraji et al., 1994), given a cardinal
spatially located in the brain at a specific point and time. The destination (a neural activity acting as context), a particular
“intensity” of a voxel is a measure of the level of the neural memory was trained to assign a step clockwise or counter-
activity in the corresponding spatial location. Thus, the clockwise in the compass rose, starting from each possible
resulting data from a verbal task would generate a sequence of cardinal point (see Fig. 5).
vectors that are subsequently arranged to form a rectangular Many aspects of language production are naturally goal-
matrix (Friston, 1995). This matrix is processed using SVD and directed behaviors. To describe a daily routine such as the
the first few singular vectors are retained. Each singular vector act of dressing, or how to reach some address in a city, or
is thus an image, usually called an “eigenimage”. The eigen- simply to tell a story, requires the ability to organize a tra-
image is one of the singular vectors of this SVD associated with jectory in discourse so as to meet the objective. In those cases,
the larger singular values. Given what SVD does, this is effec- the discourse develops a trajectory of associations within a
tively retaining most of the variability in the data. theme or a topic and smoothly crosses to a neighboring topic
Here again, as is the case in LSA and in matrix memories, to continue its associative “navigation”. A thematic ambient
SVD applied to the appropriate matrix, produces vectors that can be represented by a vector context in the model. Within
code for a strategic synthesis of the data under investigation. each topic, different chains of associations can be produced
Obviously, in each one of the three different situations, the depending on another contextual signal, namely the goal to be
nature of the matrices is unique. In LSA the matrices store reached. This context marks the transition between the
abstract versions of the documents, in the neural memories, different themes or topics of the discourse and the sequence
the matrices store associative information in their synaptic of associations within each theme.
weights, and in the case of neuroimages the matrices are Consider for instance the following extension of the
constructed from time sequences of large voxel vectors. geographic model in which patients’ flow of speech is evalu-
An appealing possibility is the creation of a model system ated by simple sequential “script-like” questions, such as
that allows the viewing of the same kind of linguistic task describing daily routines. The speaker aims to communicate
using the three approaches in combination. However, at this the routine starting with their morning activities, then going
point it is not possible to completely reach this objective with to work, and so on. A simple neural model of these activities is
real data, primarily because there is as of yet no universally devised below using a single context-dependent memory
acceptable neural model for linguistic decoding and produc- module that is fed by a neural activity representing the target
tion and also because the spatial resolution of imaging does endpoint in the discourse (e.g., having dinner) and a starting
not reach the level of individual neurons. Thus, in order to state (e.g., waking up). During the operation of the model, the
illustrate how this connection potentially could be achieved memory module outputs the words concerning any interme-
we present a miniature system. This model is a framework diate themes (for instance going to work).
72 c o r t e x 5 5 ( 2 0 1 4 ) 6 1 e7 6
The behavior of the model is illustrated in Fig. 6, with every The activity of each neuron is recorded and averaged over
topic imposing associative links between concepts, even be- time during the execution of each task, in this case using a
tween terms that are not directly related to the topic. The set sliding window. Averaging over sets of units is also possible
of concepts associated with the topic that is the target of the but it is not shown, basically because it adds complexity but
discourse acts as an “attractor”, in such a way that every no additional insights. The resulting succession of vectors of
concept trajectory ends in the intended domains. The links averaged activity is used to build a matrix [which is processed
imposed on concepts associated with other topics are un- as detailed in Friston (2011)]. After obtaining the first eigen-
specified in the sense of being context-independent (in Fig. 6 vector of this matrix (the ‘eigenimage’ in Friston nomencla-
we describe the workings of the model when it is asked to ture) each unit activity is color-coded and placed in the
describe the activities of a typical day starting in the morning corresponding cell in the anatomical model. A scheme of the
and then progressing through work and up until dinner). process is shown in Fig. 7.
When the input target is noise (assumed to be a combination The results of the pruning simulations are illustrated in
of all topics) every concept within each topic is associated to Fig. 8. To quantify language disorder we use the “topic entropy
each of the concepts belonging to the same topic and there are measure” (introduced in Cabana et al., 2011a), which essen-
special links between all topics (see the dashed lines in the tially measures the tendency of words belonging to the same
Fig. 6A) that connect some concepts in one topic with nodes in topics to be produced separately. Elimination of 80% of the
other topics. Notice that the target imposes that every the- synaptic weights resulted in the majority of the simulation
matic network organizes in such a way that trajectories will go sets producing trajectories that were either disordered or
in the direction of the relevant topic. During each step, the deviated from their target when the models were asked to
output activity of the module enters a probabilistic decision reconstruct “what they did during the day” (notice the
process which selects only one concept; the probability of dispersion of entropies in Fig. 8 panel B). In Fig. 8 panels C and
selecting each concept being proportional to the weight of D we present the average eigenimage of a pruned model with
each concept in the output. 80% of its connections set to zero, and in Fig. 8 panel D the
difference between the images of an intact network and a
pruned network.
7. Using neurocomputational models to Although seemingly large, elimination of 80% of the
solve the disconnectionehyper-connection model’s synapses is not unrealistic, since this percentage is
paradox calculated by only taking into account the context-dependent
memory module, and not other parts of the system which
A very elegant and influential model of the pathophysiology of despite being part of the eigenimage are modeled as noise.
schizophrenia states that part of its neural substrate is the Moreover, as argued in Valle-Lisboa et al. (2005; see also
reduction of synapses, possibly as an overshoot of the normal Mizraji et al., 1994) multiplicative models are idealized in the
pruning process and importantly this has consequences at the sense that they assume that the full Kronecker product is
behavioral-symptom level, specifically concerning hallucina- implemented. We suggest that this only approximates reality
tions (Hoffman et al., 1995; Hoffman and McGlashan, 1997). and we have shown previously that removal of a certain
The authors proposed a standard connectionist model to number of synapses does not significantly affect the model’s
show that when this overpruning was applied to a language behavior (Mizraji et al., 1994). However, it is important to
processing module spontaneous activity resulted, something emphasize that, as is the case in the physical sciences and in
that they linked to the resulting “hallucinated voices”. This is some engineering applications, a rigorous application of the
because the intact model forms linguistic expectations in the quantitative conclusions emerging from neurocomputational
sense that the recognition of words depends on the previous models to real neural systems requires scaling. These scaling
linguistic context and when it is pruned it generalizes these laws determine the manner in which many properties scale
expectations to spontaneous neural activity, in a sense with size, and are very specific to each system. In the case of
confusing noise with input. Essentially the same results are neuronal models, these laws have only just begun to be
obtained when using the multiplicative context models that explored and likely will become crucial topics in the near
constitute the building blocks of our models (Valle-Lisboa future (see Bassett et al., 2010).
et al., 2005). The deterioration of neural connectivity can be due to
In line with this previous work, here we analyze the effect physical disconnection or to modulatory disturbances. We
of synaptic pruning in discourse production by applying a have shown previously that in language processing networks
disconnection regime to the neural model depicted in Fig. 6. the effect of synaptic pruning can be mimicked by changing
Synaptic pruning involves turning to zero a certain percentage each unit’s threshold and that these changes are reversible
of randomly selected weights, a manipulation that can be (Valle-Lisboa et al., 2005). This deterioration in synaptic con-
conceived of as the result of anatomical disconnection but nectivity does not imply a disconnection of the conceptual
that could also result from neurochemical alterations. We also semantic network; the way information is coded could make
obtain the imaging counterpart of the neural model. In order synaptic disconnection into conceptual over-linkage, leading
to do so, we devise the anatomical model system as shown in to confusion, or strange associations. Crucially, associative
Fig. 7, where each square represents a specific “brain” region, memory models can clarify this issue and resolve the paradox.
and within the squares, each cell represents units in the Distinguishable concepts in the model are coded by orthog-
model. To calculate the unit’s activity, the model is run to onal or quasi-orthogonal vectors. The effect of the physical or
execute a specific set of tasks that involve many time steps. functional disconnection of synaptic elements provokes the
c o r t e x 5 5 ( 2 0 1 4 ) 6 1 e7 6 73
loss of the mathematical orthogonality among vectors coding et al., 2011a) we also need to obtain entropy measures from
some concepts, creating confusion in the associations and more speech samples from patients and control participants,
resulting in the linking of concepts that were not previously something that will contribute to further improvements in the
associated. This linkage does not depend on the semantic methods used to quantify disorder. Toward this end, we are
similarity but on the arbitrary neural coding of the concepts, developing a combination of machine learning techniques and
which is not pre-determined. LSA in order to obtain automatically the graphical represen-
The presence of rare associations induces “jumps” in the tations and the associated analysis (Cabana et al., 2011b).
thematic sequence of the discourse. The possibility that the Some further issues merit comment: First, the type of
synaptic pruning also affects the separateness of the concept models we apply here are quite basic. In recent years it has
or theme acting as a target (these are also coded by vectors become possible to perform large scale simulations of a huge
that could have lost their orthogonality) can create a loss of number of realistic, or at least rich, spiking neurons
polarity in the trajectory, and even result in the random (Izhikevich, 2006). Although this line of work is increasingly
“wandering” through the semantic landscape. Thus, in the important and likely will figure prominently in future at-
model, the synaptic disconnection is evident at the same time tempts to understand the relationship between brain asso-
as a disconnection in the neuroimages, and as an increase in ciative networks and emergent activities such as language,
semantic connectivity leading to conceptual confusion (for a the number of parameters needed to be set can be extremely
full explanation of the differences between anatomical net- large and difficult to determine. Potentially this can lead to a
works and semantic networks see Pomi and Mizraji, 2004). combinatorial explosion of possible parameter values, some-
thing that is highly impractical and leaves several in-
terpretations open. More importantly, even if the correct
8. Discussion parameter set can be found and the model performs well, the
basis of its functioning can be hard to discern. In contrast, the
The relationship between language and the brain has tradi- model type we have employed in this paper aims to balance
tionally been studied by a combination of computational and realism and interpretability.
linguistic tools as well as by functional neuroimaging. The Second, the models we have presented above aim to cap-
addition of neural network models has contributed to new ture the assumed basis of mental functioning, which is that
perspectives but also introduced new challenges. We suggest mental activities are the result of the concurrent operation of
that neural models are the key to connect these approaches a large number of interconnected units. Central to this
and provide a unified theory of normal brain functioning and approach, the nature of the local computations together with
the ways in which language can be disrupted. The present the wiring diagram and the plasticity rules determine mental
work is a step in this direction, but much remains to be done. activities. Thus, we have sacrificed realism because we have
The neural models based on multiplicative contextualization employed context-dependent matrix memories as the core
that we use in this work considerably simplify the imple- computational devices within each module. Obviously, the
mentation of modulatory effects of context in associations. level of required detail in any model is determined by the
Instead of using multiple hidden layers and being trainable by problem that needs to be addressed. In particular, language
a powerful but biologically hard to sustain algorithm (as it is phenomena belong to a level of organization that requires
the case of backpropagation), our multiplicative models can highly collective neural behaviors to be implemented: lan-
be based on coincidence detection at the synaptic level, and guage manages abstract symbolic syntactic or logical struc-
learning can be achieved using simple Hebbian processes. In tures, organizes the received and produced conceptual data
addition, the mathematical structure implied by multiplica- and interfaces with many other cognitive capacities.
tive contextualization via tensor products gives us a powerful These limitations notwithstanding, we propose that the
symbolic representation that allows deep theoretical insights, models and methods based on matrix algebra provide us with
capable for instance of guiding the design of computer ex- an appropriate level of description that can be used as the
periments. This formalism reveals links between the statistics foundation for more detailed models (see for example Cooper,
of discourse organization displayed by LSA and the mathe- 2000; Lee et al., 2006), or at the very least as building blocks of
matics that construct the functional neuroimages. This fact complex multimodular systems (Mizraji et al., 2009; Mizraji
gives rise to a challenging perspective: the possibility of and Lin, 2011).
creating a mutual feedback process that improves our un- The ubiquity of matrix formalisms evident in several of the
derstanding in the three domains. In this way, we can imagine neural models of language, in the statistical procedures
that a deep understanding of linguistic structures in close developed to capture the properties of functional brain images
correspondence with refined functional neuroimages, can and in procedures such as LSA that discover topic organiza-
help to improve the neural models that describe the under- tion in language corpora, is not a coincidence. There are two
lying (and usually non observable) neuronal dynamics. core properties that these approaches have in common that
The model presented in Sections 5e7 can be used to un- require the use of matrix techniques. One obvious property is
derstand the processes underlying the disorganization of the large number of interacting parts, be it voxels, terms or
discourse as determined by the entropy we presented in Sec- neurons. The other, more subtle, property is the inclusion, in
tion 2. To obtain quantitative agreement, we should employ a all the aforementioned approaches, steps of dimensionality
detailed model of language production, for instance one reduction as a fundamental procedure. Indeed, Friston eige-
including a larger vocabulary than what we have used here by nimages, LSA topics and associative memory prototypes, all
way of illustration. As we have discussed elsewhere (Cabana emerge from procedures such as SVDs and retention of only
74 c o r t e x 5 5 ( 2 0 1 4 ) 6 1 e7 6
part of the latent structure that the decomposition produces. network trajectories. During the putatively “normal” opera-
In image processing SVD is a standard procedure to eliminate tion of the model the sequence of outputs ends within the
noise. In LSA dimensionality reduction discovers underlying desired target of the discourse. This models the task where
hidden links between terms that are not superficially related. patients with schizophrenia or healthy participants are asked
The rules of synaptic change in neural networks can produce a for example to “describe a typical day”. Interestingly, dis-
similar dimensionality reduction that ends up storing pro- connecting the synapses in our model leads to a confusion of
totypes of the experienced activities. targets and in some cases to random “jumps” between topics,
These similarities should not be taken to imply that the as if the model has lost its target-oriented capabilities. A
connection between the approaches is straightforward. On paradoxical feature of these simulations is the fact that
the contrary, as we have demonstrated here, together with the increasing the level of disconnection produces an increase in
many similar properties there are many important differ- statistical connectivity of regions in the neural model. More-
ences. For instance, each of the voxels that fMRI or PET reg- over, targets that were originally far apart (due to vector
ister involve the activity of tens of thousands of neurons that orthogonality) become closer after damage (due to the loss of
within models are usually (but not always, see Poirazi et al., orthogonality) and this leads to target confusion.
2003) represented by single units. Language production relies The results obtained from our model suggest new explo-
on hidden structures and variables (in addition to word pro- rations at the level of language production, and in particular in
nunciation) and so a neural instantiation of latent structures some cases of schizophrenia. For instance, can the functional
is expected, but that does not necessarily imply that the disconnection replicate the paradoxical “shrinking” (hyper-
eigenimages, or our vector topics, coincide with latent vari- connectivity) at the level of semantic space postulated in
ables. However, we do think that the similarity of the ap- some cases of psychosis? Many studies have been conducted
proaches and tools calls for a principled integration of the to explore natural semantic spaces (Jones et al., 2006; Griffiths
different levels involved. In fact, neural models such as the et al., 2007), and thus we have available many methods to test
one we described in Section 4 (Mizraji et al., 2009) can produce our hypotheses. Indeed, our explorations are related to several
a type of neural LSA and in this way establish contact with the recent computational attempts to understand how semantic
underlying mathematics of classical LSA (and the results of knowledge is stored and represented in the brain, how it is
Landauer and Dumais, 1997). learned through development, and how this knowledge is
We have explored here the other part of the connection, affected and degraded by acquired injury and illness (for a
namely the relationship between neural models and functional review, see Rogers and McClelland, 2004). A core premise in
neuroimaging (see Sporns, 2010). Each latent variable in our this neurocomputational work is that explicit theorizing helps
model implies relationships between concepts but there is not a clarify ambiguity and it motivates very specific hypotheses.
simple one-to-one mapping between eigenimages and either Additionally there is the promise of being able to resolve
concepts or latent variables. Naturally there are many other clinical paradoxes and thereby provide unified accounts of
possible ways of implementing an imaging output from the language and its breakdown.
model. In our case we opted to accumulate the units of neural
activity as a measure of voxel activity but we did not include any
spatial coarsening. In this sense our data is more detailed than
Acknowledgments
what standardly would be obtained experimentally. Neverthe-
less, there are clear differences between the images obtained in
AP, EM and JCVL acknowledge the partial financial support by
a damaged model compared to the intact model. In particular
PEDECIBA and CSIC-UdelaR. AC was supported by PEDECIBA.
the difference between the average eigenimage of the intact
BE was supported by the Northern Norwegian Regional Health
models and that of the damaged models shows a clear decrease
Authority (Helse Nord RHF).
in correlation between areas in the lesioned models.
The relevance of our experiments that damage the synapses
Appendix
of the model is the relationship to phenomena that previously
have been postulated as the basic pathophysiological mecha-
Context-dependent associative matrix memories (CDAMM)
nisms underlying some symptoms in psychosis. As mentioned,
can be used to implement two-argument vector-valued
a well known example is the neural model of Hoffman and
functions.
McGlashan that aims to connect the neurodevelopmental hy-
In general a CDAMM has the following matrix structure:
pothesis of schizophrenia with the subsequent experience of
hallucinated voices (Hoffman and McGlashan, 1997). Following M ¼ Output1 ðInput1 5Context1 ÞT þ Output2 ðInput1 5Context2 ÞT
on from this pioneering work, we have previously shown that
þ Output3 ðInput2 5Context1 ÞT þ /
context-dependent matrix associative memories, adopting
sigma-pi neurons and multiplicative contexts, can also be used where Output1, Output2, ., are different output vectors, Input1,
to investigate the same problem with very similar results Input2, . are different input vectors, Context1, Context2, . are
(Valle-Lisboa et al., 2005). different contexts and 5 represents the Kronecker product
Here we adopt a simple geographical model based on between vectors. When both an input, say Input2, and a
multiplicative contexts, to explore aspects of language pro- context, say, Context1 are presented to the memory, the output
duction with the view that this production is a target-oriented will be Output3.
neural activity. The deterioration of the connectivity of parts The core of the model depicted in Fig. 6 is a context-
of the neural modules results in a disorganization of the dependent matrix memory consisting of four contexts (one for
c o r t e x 5 5 ( 2 0 1 4 ) 6 1 e7 6 75
each topic) and five concepts in each topic that are associated Cabana Á, Valle-Lisboa JC, Elvevåg B, and Mizraji E. Detecting
both as inputs and as outputs. All concepts of the same topic are orderedisorder transitions in discourse: Implications for
associated with each other regardless of context in the sense schizophrenia. Schizophrenia Research, 131: 157e164, 2011a.
Cabana Á, Valle-Lisboa JC, Elvevåg B, and Mizraji E. Using Machine
that the matrix has a term such that for every context, every
Learning Techniques to Study Discourse Alterations in Patients with
concept in the input elicits a concept of the same topic. If the Schizophrenia. São Paulo, Brazil: Schizophrenia International
sum of all topic vectors is All topics, then the matrix Research Society (SIRS) South America Meeting. Abstract
X X T published in Revista de Psiquiatria Clı́nica, http://www.hcnet.
M1 ¼ Conceptstopic 1 Inputtopic 1 5All topics usp.br/ipq/revista/vol38/s1/index.html; 2011b (downloaded
X X T December 18, 2012).
þ Conceptstopic 2Inputtopic 2 5All topics Cooper LN. A possible organization of animal memory and
X X T learning. In Proceedings of the Nobel Symposium on
þ Conceptstopic 3 Inputtopic 3 5All topics Collective Properties of Physical Systems. New York:
X X T Academic Press, 1973.
þ Conceptstopic 4 Inputtopic 4 5All topics Cooper LN. Memories and memory: A physicist’s approach to the
brain. International Journal of Modern Physics A, 15: 4069e4082,
incorporates this part of the model. 2000.
Associations that connect topics are also stored in such a Deerwester S, Dumais S, Furnas G, Landauer T, and Harshman R.
way that when a given context is active it is always possible to Indexing by latent semantic analysis. Journal of the American
Society of Information Science, 41: 391e407, 1990.
travel from one concept in another topic to concepts corre-
Doyle AC. A study in scarlet. In The Complete Sherlock Holmes by
sponding to the active topic. This is achieved by terms in the Sir Arthur Conan Doyle(Collector’s Library Editions). London:
equation associating a concept in one topic to a concept in CRW Publishing Limited, 2005 [First published by London:
another topic, in such a way that the concepts associated to Ward Lock & Co, 1887].
the target topic can be reached by a chain of associations. Dumais S. Data-driven approaches to information access.
Thus the term: Cognitive Science, 27(3): 491e524, 2003.
Dumais S. Improving the retrieval of information from external
T sources. Behavior Research Methods: Instruments and Computers,
Concept2topic1 Concept1topic2 5topic1
T 23: 229e236, 1991.
þ Concept3topic2 Concept1topic3 5topic1 DeLisi LE. Speech disorder in schizophrenia: Review of the literature
and exploration of its relation to the uniquely human capacity
where Concepti topic k denotes a concept belonging to topic k, for language. Schizophrenia Bulletin, 27(3): 481e496, 2001.
together with the previous matrix M1 ensures that when the Elman JL, Bates EA, Johnson MH, Karmiloff-Smith A, Parisi D, and
topic 1 is active starting from any concept in topic 3 there is a Plunkett K. Rethinking Innateness: A Connectionist Perspective on
trajectory that goes from concept 1 in topic 3, to concept 3 in Development. Cambridge, MA: The MIT Press, 1997.
topic 2. When this reaches concept 1 in topic 2 the trajectory Elman J. Finding structure in time. Cognitive Science, 14: 179e211, 1990.
can jump to topic 1. Elvevåg B, Foltz PW, Rosenstein M, and DeLisi LE. An automated
method to analyze language use in patients with
The construction of the model implies that from every
schizophrenia and their first-degree relatives. Journal of
concept several outputs can be reached. The particular path Neurolinguistics, 23: 270e284, 2010.
followed is based on a probabilistic decision using the same Elvevåg B, Foltz PW, Weinberger DR, and Goldberg TE.
probability for each available association. Quantifying incoherence in speech: An automated
methodology and novel application to schizophrenia.
Schizophrenia Research, 93(1e3): 304e316, 2007.
references Foltz PW, Kintsch W, and Landauer TK. The measurement of
textual coherence with Latent Semantic Analysis. Discourse
Processes, 25: 285e307, 1998.
Anderson JA. An Introduction to Neural Networks. Cambridge, MA: Friston KJ. Functional and effective connectivity in neuroimaging:
MIT Press, 1995. A synthesis. Human Brain Mapping, 2: 56e78, 1995.
Anderson JA. A simple neural network generating an interactive Friston KJ. Functional and effective connectivity: A review. Brain
memory. Mathematical Biosciences, 14: 197e220, 1972. Connectivity, 1(1): 13e36, 2011.
Andreasen NC and Grove WM. Thought, language and Gathercole SE and Baddeley AD. Phonological memory deficits in
communication in schizophrenia: Diagnosis and prognosis. language disordered children: Is there a causal connection?
Schizophrenia Bulletin, 12: 348e359, 1986. Journal of Memory & Language, 22: 103e127, 1990.
Arbib MA (Ed), The Handbook of Brain Theory and Neural Networks. Griffiths TL, Steyvers M, and Tenenbaum JB. Topics in semantic
Cambridge, MA: MIT Press, 1995. representation. Psychological Review, 114(2): 211e244, 2007.
Baddeley A. Working Memory, Thought, and Action. In Oxford Griffiths TL and Steyvers M. A Probabilistic Approach to Semantic
Psychology Series. New York, NY: Oxford University Press, Representation. Proceedings of the 24th Annual Meeting of the
2007. Cognitive Science Society.
Bassett DS, Greenfield DL, Meyer-Lindenberg A, Weinberger D, Grüning A. Elman backpropagation as reinforcement for simple
Moore SW, and Bullmore ET. Efficient physical embedding of recurrent networks. Neural Computation, 19(11): 3108e3131, 2007.
topologically complex information processing networks in Hoffman P, Rogers T, and Lambon Ralph MA. Semantic diversity
brains and computer circuits. PLoS Computational Biology, 6: accounts for the “missing” word frequency effect in stroke
e1000748, 2010. aphasia: Insights using a novel method to quantify contextual
Berry MW and Browne M. Understanding Search Engines: Mathematical variability in meaning. Journal of Cognitive Neuroscience, 23(9):
Modeling and Text Retrieval. Philadelphia: SIAM, 2005. 2432e2446, 2011.
Bienenstock E and Lehmann D. Regulated criticality in the brain? Hoffman RE and McGlashan TH. Synaptic elimination,
Advances in Complex Systems, 1: 361e384, 1998. neurodevelopment, and the mechanism of hallucinated
76 c o r t e x 5 5 ( 2 0 1 4 ) 6 1 e7 6
"Voices" in Schizophrenia. American Journal of Psychiatry, 154: Ostwald P. Vaslav Nijinsky: A Leap into Madness. London: Robson
1683e1689, 1997. Books Ltd, 1991.
Hoffman RE, Rapaport J, Ameli R, McGlashan TH, Harcherik D, Papadimitriou C, Raghavan P, Tamaki H, and Vempala S. Latent
and Servan-Schreiber D. A neural network simulation of semantic indexing: A probabilistic analysis. Journal of Computer
hallucinated "voices" and associated speech perception and System Sciences, 61(2): 217e235, 2000.
impairments in schizophrenia patients. Journal of Cognitive Peña JL and Konishi M. Auditory spatial receptive fields created by
Neuroscience, 7: 479e497, 1995. multiplication. Science, 292(5515): 249e252, 2001.
Hofmann T. Probabilistic Latent Semantic Analysis. Proceedings of Poirazi P, Brannon T, and Mel BW. Pyramidal neuron as two-layer
Uncertainty in Artificial Intelligence, UAI’99. neural network. Neuron, 37(6): 989e999, 2003.
Hofmann T and Puzicha J. Latent class models for collaborative Pomi A and Mizraji E. Memories in context. Biosystems, 50:
filtering. In Proceedings of the Sixteenth International Joint 173e188, 1999.
Conference on Artificial Intelligence: 688e693. Pomi A and Mizraji E. Semantic graphs and associative memories.
Humphreys MS, Bain LD, and Pike R. Different ways to cue a Physical Review E, 70(6): 066136, 2004.
coherent memory system: A theory for episodic, semantic, and Pomi A and Olivera F. Context-sensitive autoassociative
procedural tasks. Psychological Review, 96: 208e233, 1989. memories as expert systems in medical diagnosis. BMC
Izhikevich E. Dynamical Systems in Neuroscience: The Geometry of Medical Informatics and Decision Making, 6: 39, 2006.
Excitability and Bursting. Cambridge, MA: The MIT Press, 2006. Pulvermüller F. Brain-language research: Where is the progress?
Jones M, Kintsch W, and Mewhort D. High-dimensional semantic Biolinguistics, 4(2e3): 255e288, 2010.
space accounts of priming. Journal of Memory and Language, Pulvermüller F. Meaning and the brain: The neurosemantics of
55(4): 534e552, 2006. referential, interactive, and combinatorial knowledge. Journal
Kintsch W. Predication. Cognitive Science, 25: 173e202, 2001. of Neurolinguistics, 25: 423e459, 2012.
Koch C and Poggio T. Single Neuron Computation. Cambridge, MA: Rogers TT and McClelland JL. Semantic Cognition. A Parallel Distributed
Academic Press, 1992. Processing Approach. Cambridge, MA: The MIT Press, 2004.
Kohonen T. Correlation matrix memories. IEEE Transactions on Roll M, Mårtensson F, Sikström S, Apt P, Arnling-Bååth R, and
Computers, C-21: 353e359, 1972. Horne M. Atypical associations to abstract words in Broca’s
Kohonen T. Associative Memory: A System-Theoretical Approach. aphasia. Cortex, 48(8): 1068e1072, 2011.
New York: Springer-Verlag, 1977. Rubinov M and Sporns O. Complex network measures of brain
Kohonen T, Lehtiö P, Rovamo J, Hyvärinen J, Bry K, and Vainio L. connectivity: Uses and interpretations. NeuroImage, 52:
A principle of neural associative memory. Neuroscience, 2(6): 1059e1069, 2010.
1065e1076, 1977. Rumelhart DE, Hinton GE, and Williams RJ. Learning
Landauer TK, Foltz P, and Laham D. An introduction to latent representations by back-propagating errors. Nature, 323(6088):
semantic analysis. Discourse Processes, 25: 259e284, 1998. 533e536, 1986a.
Landauer T and Dumais S. A solution to Plato’s problem: The Rumelhart DE, McClelland JL, and the PDP Research Group. Parallel
latent semantic analysis theory of acquisition, induction and Distributed Processing: Explorations in the Microstructure of
representation of knowledge. Psychological Review, 104: Cognition. In Foundations. Cambridge, MA: MIT Press, 1986b.
211e240, 1997. Salton G and Lesk ME. The SMART automatic document retrieval
Landauer TK, McNamara DS, Dennis S, and Kintsch W (Eds), systems e An illustration. Communications of the ACM, 8:
Handbook of Latent Semantic Analysis. Mahwah, NJ: Lawrence 391e398, 1965.
Erlbaum Associates, 2007. Samaria F and Harter A. Parameterisation of a Stochastic Model for
Lee L, Friston K, and Horwitz B. Large-scale neural models and Human Face Identification. Sarasota (Florida): 2nd IEEE
dynamic causal modelling. NeuroImage, 30(4): 1243e1254, 2006. Workshop on Applications of Computer Vision, 1994.
Levi-Montalcini R. Praise of Imperfection: My Life and Work. New Smolensky P. Tensor product variable binding and the
York: Basic Books, 1989. representation of symbolic structures in connectionist
McClelland JL, Rumelhart DE, and the PDP Research Group. systems. Artificial Intelligence, 46: 159e216, 1990.
Parallel Distributed Processing: Explorations in the Snowdon DA, Kemper SJ, Mortimer JA, Greiner LH, Wekstein DR,
Microstructure of Cognition. In Psychological and Biological and Markesbery WR. Linguistic ability in early life and
Models. Cambridge, MA: MIT Press, 1986. cognitive function and Alzheimer’s disease in late life.
McCulloch WS and Pitts W. A logical calculus of the ideas Findings from the Nun Study. The Journal of the American
immanent in nervous activity. Bulletin of Mathematical Medical Association, 275(7): 528e532, 1996.
Biophysics, 5: 115e133, 1943. Sporns O. Networks of the Brain. Cambridge: The MIT Press, 2010.
McKenna P and Oh T. Schizophrenic Speech: Making Sense of Tagliazucchi E, Balenzuela P, Fraiman D, Montoya P, and
Bathroots and Ponds that Fall in Doorways. Cambridge, UK: Chialvo DR. Spontaneous BOLD event triggered averages for
CambridgeUniversity Press, 2005. estimating functional connectivity at resting state.
Mizraji E, Pomi A, and Alvarez F. Multiplicative contexts in Neuroscience Letters, 488(2): 158e163, 2011.
associative memories. BioSystems, 32: 145e161, 1994. Valle-Lisboa JC. Las redes neuronales y el procesamiento del lenguaje
Mizraji E. Neural memories and search engines. International natural. PhD Thesis. Montevideo, Uruguay: PEDECIBA-
Journal of General Systems, 37(6): 715e732, 2008. Universidad de la República, 2007.
Mizraji E. Context-dependent associations in linear distributed Valle-Lisboa JC and Mizraji E. The uncovering of hidden
memories. Bulletin of Mathematical Biology, 51: 195e205, 1989. structures by latent semantic analysis. Information Sciences,
Mizraji E and Lin J. Logic in a dynamic brain. Bulletin of 177: 4122e4147, 2007.
Mathematical Biology, 73: 373e397, 2011. Valle-Lisboa JC, Reali F, Anastası́a H, and Mizraji E. Elman
Mizraji E, Pomi A, and Valle-Lisboa JC. Dynamic searching in the topology with sigma-pi units: An application to the modeling
brain. Cognitive Neurodynamics, 3: 401e414, 2009. of verbal hallucinations in schizophrenia. Neural Networks, 18:
Nijinsky V. In Acocella JR (Ed), The Diary of Vaslav Nijinsky 863e877, 2005.
(Unexpurgated). New York: Farrar, Straus, and Giroux, 1999. Xiang H, Fonteijn HM, Norris DG, and Hagoort P. Topographical
Oja E. Simplified neuron model as a principal component functional connectivity pattern in the perisylvian language
analyzer. Journal of Mathematical Biology, 15: 267e273, 1982. networks. Cerebral Cortex, 20: 549e560, 2010.
View publication stats

A Modular Approach To Language Production Models A

Uploaded by

Copyright:

Available Formats

A Modular Approach To Language Production Models A

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

A Modular Approach To Language Production Models A

Uploaded by

Copyright:

Available Formats

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

A modular approach to language production: Models and facts

Article in Cortex · February 2013

Juan Valle Lisboa Andrés Pomi

SEE PROFILE SEE PROFILE

Alvaro Cabana Eduardo Mizraji

SEE PROFILE SEE PROFILE

Semantics View project

Revista Psicología, Conocimiento y Sociedad View project

The user has requested enhancement of the downloaded file.

Available online at www.sciencedirect.com

Special issue: Research report

A modular approach to language production:

Juan C. Valle-Lisboa a, Andrés Pomi a, Álvaro Cabana a, Brita Elvevåg b,c

article info abstract

empirical domains currently used to analyze language,

“A Study in Scarlet”, Chapter 1, A. Conan Doyle

(Rumelhart et al., 1986a, 1986b). This approach is of neuro-

View publication stats

You might also like