WORDS, CORPUS
AND BACK TO WORDS
QUADERNS DE FILOLOGIA
ESTUDIS LINGÜÍSTICS XXII
Edited by
MIGUEL FUSTER MÁRQUEZ
MOISÉS ALMELA
FACULTAT DE FILOLOGIA, TRADUCCIÓ I COMUNICACIÓ
UNIVERSITAT DE VALÈNCIA
2017
QUADERNS DE FILOLOGIA DE LA UNIVERSITAT DE VALÈNCIA
ESTUDIS LINGÜÍSTICS
Volume XXII: Words, Corpus and back to Words
Quaderns de Filologia is the regular publication of the Facultat de Filologia, Traducció i Comunicació of the Universitat de València. It was founded in 1980 under the name Cuadernos de Filología, and in 1995 it began a second phase as Quaderns de Filologia. It has two annual series (Estudis lingüístics and Estudis literaris).
Each issue of Quaderns de Filologia is monographic and is edited by lecturers of the Facultat de Filologia, Traducció i Comunicació who specialise in the subject. In each issue, these editors are responsible for the selection of the articles. Nevertheless, since issue IX, the published articles have been subject to two reviews, one internal and one external. Articles from outside the Universitat de València currently account for 80% of the contributions to each volume.
Quaderns de Filologia also publishes a collection of studies entitled Anejos de Quaderns de Filologia.
Published by:
Universitat de València
Exchanges and subscriptions:
Vicedeganat de Cultura, Igualtat i Comunicació. Facultat de Filologia,
Traducció i Comunicació
Avda. Blasco Ibáñez, 32. 46010 València
Distribution:
Publicacions de la Universitat de València
C/ Arts Gràfiques, 13. 46010 València / Tel.: 963 937 174 - Fax: 963 617 051
© of the texts: the authors
© of this edition: Universitat de València, 2017
© of the cover: reproduction of a fragment of Pieter Brueghel’s oil painting The Tower of Babel (1563) (Kunsthistorisches Museum Wien).
Cover design: Celso Hernández de la Figuera (PUV).
Typesetting and layout: Communico. Letras y Píxeles, S. L.
Legal deposit: V.229-1995
ISSN: 1135-416X
Printed by: Arts Gràfiques Soler, S. L.
Honorary Directors: Ángel López García and Joan Oleza
Director: Begonya Pozo Sánchez
Editorial Secretary: Sergio Maruenda Bataller
Editorial Office:
Vicedeganat de Cultura, Igualtat i Comunicació. Facultat de Filologia, Traducció
i Comunicació
Editorial Board:
Mikel Labiano (Dept. Filologia Clàssica, UVEG)
Julia Sanmartín (Dept. Filologia Espanyola, UVEG)
Cesáreo Calvo (Dept. Filologia Francesa i Italiana, UVEG)
Begoña Clavel (Dept. Filologia Anglesa i Alemanya, UVEG)
Emili Casanova (Dept. Filologia Catalana, UVEG)
Monserrat Veyrat (Dept. Tª dels Llenguatges i CC, UVEG)
Guillermo Montes Cala (Dept. Filología Griega, Universidad de Cádiz)
Humberto López Morales (RAE y Asoc. de Academias de la Lengua Española)
Lorenzo Renzi (Dept. di Studi Linguistici e Letterari, Università di Padova)
Michael McCarthy (School of English, University of Nottingham)
Jordi Ginebra (Dept. Filologia Catalana, Universitat Rovira i Virgili)
José del Valle (Graduate Center, City University of New York (CUNY))
Elia Hernández Socas (Universität Leipzig)
Scientific Committee:
Jean-Claude Anscombre (CNRS-Paris XII, France)
Manuel Carrera Díaz (U. of Seville, Spain)
Nelson Cartagena (U. of Heidelberg, Germany)
Germà Colón (U. of Basel, Switzerland)
Emilio Crespo (U. Autónoma de Madrid, Spain)
Perfecto E. Cuadrado (U. of the Balearic Islands, Spain)
Luis Fernando Lara (Colegio de México)
Jacek Fisiak (U. of Poznań, Poland)
Humberto López Morales (U. of Puerto Rico)
Elena Rojas (U. of Tucumán, Argentina)
Eustaquio Sánchez Salor (U. of Extremadura, Spain)
Barbara Wotjak (U. of Leipzig, Germany)
CONTENTS
Introduction ... 9
Bautista Zambrana, María Rosario
Corpus analysis of phraseology in an A1 level textbook of German as a foreign language ... 13
Bestgen, Yves
Getting rid of the Chi-square and Log-likelihood tests for analysing vocabulary differences between corpora ... 33
Chierichetti, Luisa
“El criado pesado”: La caracterización en la serie Águila Roja ... 57
Garofalo, Giovanni
Persiguiendo con imparcialidad “el total desprecio a la Constitución”: el léxico valorativo en la Querella del Fiscal de Cataluña contra Carme Forcadell i Lluís ... 79
Giménez-Moreno, Rosa & Ivorra-Pérez, Francisco Miguel
The malleability behind terms referring to common professional roles: the current meaning of “boss” in British newspapers ... 105
Hennecke, Inga & Baayen, Harald
A quantitative survey of N Prep N constructions in Romance languages and prepositional variability ... 129
Mansilla, Ana
Lingüística de corpus y fraseología contrastiva (alemán-español): Las combinaciones usuales de estructura [PREP + S]. El caso de entre lágrimas y unter Tränen ... 147
Marín, María José & Rea Rizzo, Camino
Assessing EPAP lexical features: A corpus-based study ... 165
Mattioli, Virginia
Translator’s creativity in cultural elements transposition: a corpus-based study ... 187
Sánchez-Moya, Alfonso
Corpus-driven insights into the discourse of women survivors of Intimate Partner Violence ... 215
Castaño Castaño, Emilia, Laso Martín, Natalia Judith & Verdaguer Clavera, Isabel
Immigration metaphors in a corpus of legal English: an exploratory study of EAL learners’ metaphorical production and awareness ... 245
Editorial guidelines ... 273
General index of publications ... 301
ojs.uv.es/index.php/qfilologia/index
WORDS, CORPUS AND BACK TO WORDS:
FROM LANGUAGE TO DISCOURSE
Miguel Fuster Márquez
Moisés Almela
Fuster Márquez, Miguel & Almela, Moisés. 2017. “Words, Corpus and back to Words: from language to discourse”. Quaderns de Filologia: Estudis Lingüístics 22: 9-12. doi: 10.7203/qf.22.11297
The revolution in computer technology during the last century has also brought with it some changes in the way we conceive of language, changes which are partly, though not entirely, due to that revolution. Technological advances in the field of information and communication have made compiling and processing large amounts of data a remarkably easy and fast task. Until quite recently, compiling large amounts of text required an enormous effort from researchers. At present, the process has become far more feasible and certainly less time-consuming, giving researchers more freedom to think about interesting ways of exploring the data.
However, other important ‘revolutions’ have taken place in linguistics, and these have in various ways been favoured by the same technological developments. One such revolution concerns linguistic theorisation. Linguists in the past would have been happy to decide on language matters simply by asking themselves how the grammar of their mother tongue worked since, as native speakers, they felt competent enough to make such decisions. This mentalistic approach (we are, of course, oversimplifying considerably) relied on the introspective power of well-educated speakers, and for most of the insightful decisions they made on the matter at hand they did not need to observe the authentic language produced by other speakers. All they needed was their own knowledge and analytical power. In terms of the famous Saussurean dichotomy between ‘langue’ and ‘parole’, these linguists were on the side of ‘langue’; ‘parole’ was of little or no interest.
However, an important change was taking place in linguistics: other linguists started to give priority to the manifestations of ‘parole’, that is, to how language was actually used by speakers in their communities, in order to theorise more accurately about ‘langue’, or linguistic competence. Various significant developments are related to this more empirical movement. One of them was the acknowledgement of spoken language as a legitimate part of language: twentieth-century lexicographers started to collect examples of informal or conversational registers and to include them in the dictionaries they produced. No less significant in this new approach was the thrust of sociolinguistics, a broad research field with many branches and fuzzy boundaries, which viewed languages as heterogeneous entities. Sociolinguists observed that variation was the rule rather than the exception in speech communities. They brought with them empirical methodologies that enabled them to analyse how real speakers produced language in real settings in order to build their theories of variation and change, and they also made use of quantification in their methodologies. This is part of the context for the emergence of corpus linguistics as a new approach to language. The new framework relied on the examination of real data originating in language use to build convincing linguistic arguments. Both variation and usage have been essential arguments in corpus approaches.
However, a corpus should not be confused with a database. In the words of Sinclair (1996: 2.1), “[a] corpus is a collection of pieces of language that are selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language.” Unlike an arbitrary collection of data, as any corpus linguist would insist, a corpus must contain a representative sample of language if the researcher is to draw relevant conclusions about language. Broadly speaking, and unlike essentially mentalistic approaches, corpus research is empirical, with a preference for induction, that is, for the careful analysis of data in representative corpora.
However, most practitioners would agree that corpus linguistics is not a theory but a methodology, albeit a rather special one. In fact, this methodology may be applied to one language, to several languages, or to different varieties or registers of a language, by means of small, medium or large corpora, and it may adopt different approaches in order to test different theories. Current interest in corpus linguistics covers areas such as the quality of corpus compilation, lexis and phraseology, grammar, variation and change, discourse and stylistics, among others. Corpus linguistics has been of interest to both theoretical and applied linguistics; there is abundant applied research, for example, in the fields of lexicography, second language acquisition and translation. Indeed, it is difficult to think of a research area in which corpus linguistics has no room and nothing important to offer.
Corpus methodology regularly combines quantitative and qualitative approaches, with one approach feeding the other. Analyses that were formerly purely qualitative have in many cases been superseded by approaches in which quantification and statistics are increasingly prominent. Nevertheless, many convinced corpus linguists would also claim to favour triangulation and convergent evidence as a more acceptable approach.
Very frequently, a corpus linguist’s procedure takes a word or a word list as its starting point. The close examination of a word’s behaviour is therefore crucial for practically any kind of research that relies on language use. It is also well known that the most significant advances in contemporary lexicography have been driven by the inspection of reference corpora of variable size and scope, which have given researchers a more thorough understanding of real usage. Likewise, the compilation of comparable corpora has provided the basis for establishing parallels, differences and nuances when comparing or contrasting languages. In addition, the possibility of compiling more specialized ad hoc corpora has allowed the detailed analysis of vocabulary in different types of discourse, either to determine its value in specialized languages or to gain a better understanding of its social or ideological implications, which are revealed through the evaluation of linguistic preferences. Finally, corpus approaches have revealed the existence of linguistic units that lie beyond the scope of more traditional lexicological approaches. Extensive research on phraseology and corpus-based lexicography in recent decades has brought to light the frequency in discourse of meaningful co-occurring lexical patterns and of lexical-grammatical co-selection.
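The usual starting point described above, a frequency word list followed by a close look at how one word behaves in context, can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the procedure of any particular concordancer: the tokeniser (lower-cased runs of letters) and the helper names `tokenize`, `word_list` and `kwic` are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: a token is a run of letters (accented letters included), lower-cased.
    return re.findall(r"[^\W\d_]+", text.lower())

def word_list(tokens, top=10):
    # Frequency-ranked word list; Counter keeps first-seen order for ties.
    return Counter(tokens).most_common(top)

def kwic(tokens, node, span=3):
    # Key Word In Context lines: `span` tokens on either side of each hit.
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append(f"{left:>25} [{tok}] {right}")
    return lines

tokens = tokenize("The end of the day and the end of the week.")
print(word_list(tokens, top=3))
for line in kwic(tokens, "end", span=2):
    print(line)
```

Even on a toy sentence, the concordance lines make the recurrent sequence *the end of the* visible, which is the kind of co-occurring pattern the paragraph above refers to.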
The aim of this issue is to bring together research into the lexicon of a variety of languages and in a diversity of manifestations, both at the word level and beyond it, from a variety of perspectives, including not only those which focus on how the vocabulary is internally organized but also those which deal with the role that lexical units and lexical relations play in the organization of other language levels, particularly in the organization of discourse. These issues are addressed through developments not only in several disciplines of theoretical and descriptive linguistics, particularly lexicology, phraseology, word formation and discourse analysis, but also in diverse applied disciplines such as translation, foreign language teaching, English for specific purposes and critical discourse analysis. Linguistic diversity was another criterion employed in compiling the volume: in total, six languages are investigated in the selected studies, namely English, German, Spanish, French, Portuguese and Italian. Without claiming exhaustiveness, we consider that the variety of contributions presented here offers an insight into the vigour of current corpus research into phenomena related to the lexicon. Admittedly, the full range of topics, approaches and methodologies developed in this area of research could not fit in a single volume, but a careful selection of studies representing a variety of interesting advances can be representative of significant developments taking place in the field.
References
Sinclair, John McH. 1996. EAGLES. Preliminary Recommendations on Corpus Typology. http://www.ilc.pi.cnr.it/EAGLES96/corpustyp/corpustyp.html.
Corpus analysis of phraseology in an A1 level textbook
of German as a foreign language
María Rosario Bautista Zambrana
Universidad de Málaga.
Received: 25/05/2017. Accepted: 06/10/2017
Abstract: This paper aims to analyse the extent to which the textbook for German as a foreign language DaF kompakt A1 (Sander et al., 2011) complies with the recommendations of the Common European Framework of Reference for Languages (Council of Europe, 2001) (hereafter CEFR) with respect to lexical competence and sociolinguistic competence in receptive and productive activities, specifically with regard to phraseological units. In this respect, we have focused on the sentential formulae and fixed frames present in a corpus containing the textbook materials, and we have checked whether those fixed expressions correspond to the phraseological and sociolinguistic competences that the Framework expects of an A1 level student of German. To this end, we have compiled a corpus of the textbook’s receptive and productive materials, made up of three subcorpora: one for the written texts, one for the oral texts, and a third containing exercises. We have performed a quantitative analysis (by means of AntConc 3.4.4 [Anthony, 2016] and kfNgram [Fletcher, 2007]) and a qualitative one. Our results suggest that the textbook complies with the recommendations of the CEFR.
Keywords: corpus; phraseology; German as a foreign language; Common European Framework of Reference for Languages; A1 level.
Bautista Zambrana, María Rosario. 2017. “Corpus analysis of phraseology in an A1 level textbook of German as a foreign language”. Quaderns de Filologia: Estudis Lingüístics 22: 13-32. doi: 10.7203/qf.22.11298
1. Introduction
This paper rests on the premise that much of the language we use is made up of ready-made multi-word combinations, in line with Sinclair’s idiom principle (Sinclair, 1991: 110):
the principle of idiom is that a language user has available to him a large
number of semi-preconstructed phrases that constitute single choices,
even though they might appear to be analysable into segments.
A considerable amount of literature has been published following
this approach, as well as resources such as the Academic Phrasebank
(Morley, 2017), which draws on the above-mentioned insight:
It is now accepted that much of the language we use is phraseological in nature; that it is acquired, stored and retrieved as pre-formulated constructions (Bolinger, 1976; Pawley and Syder, 1983). These insights began to be supported empirically as computer technology permitted the identification of recurrent phraseological patterns in very large corpora of spoken and written English using specialised software (e.g. Sinclair, 1991). (Morley, 2017: 5)
This insight has important implications for language teaching and learning. We consider that learning phraseological units is essential for basic level language learners, and that their teaching should start from the very beginning, at the basic levels. As O’Keeffe et al. (2007: 46) state for the case of chunks or clusters¹:
(...) the vocabulary syllabus for the basic level is incomplete without due attention being paid to the most frequent chunks, since many of them are as frequent as or more frequent than single items which everyone would agree must be taught.
1
As O’Keeffe et al. (2007: 63) explain, there are many terms to describe the phenomena of multi-word vocabulary or chunks: some of these terms are lexical phrases (Nattinger and DeCarrico, 1992), routine formulae (Coulmas, 1979), formulaic sequences (Wray, 2000, 2002), chunks (De Cock, 2000), as well as (restricted) collocations, fixed expressions, or multi-word units/expressions. Throughout this paper we will use the generic terms phraseological units and fixed expressions, and, when referring to our specific object of study, sentential formulae or fixed frames.
Bearing this in mind, this paper aims to analyse the extent to which the textbook for German as a foreign language DaF kompakt A1 (Sander et al., 2011) complies with the recommendations of the Common European Framework of Reference for Languages (Council of Europe, 2001) (hereafter CEFR) with respect to lexical competence and sociolinguistic competence in receptive and productive activities, specifically with regard to phraseological units. This textbook was selected because we have been using it in several courses at our university since the academic year 2013/2014, with good results and wide acceptance among lecturers and students.
The CEFR describes lexical competence as the knowledge of, and ability to use, the vocabulary of a language; it consists of lexical elements and grammatical elements. According to the CEFR, the lexical elements comprise single word forms and fixed expressions, the latter consisting of several words that are used and learnt as wholes (CEFR, 2001: 111). Fixed expressions include sentential formulae, phrasal idioms, fixed frames, phrasal verbs, compound prepositions and fixed collocations. In this paper we focus on sentential formulae and fixed frames.
Sentential formulae are not defined explicitly in the CEFR, but are described as including three kinds of expressions: direct exponents of language functions such as greetings (e.g. Eng. How do you do?, Good morning! and Ger. Guten Morgen!, Nett, Sie kennenzulernen), proverbs and relict archaisms. We have focused on the first type, direct exponents of language functions, and have looked for minimal communicative units that can function as autonomous sequences². The language functions involved are presented in the CEFR (2001: 126) as part of functional competence³:
1.1 imparting and seeking factual information:
• identifying
• reporting
• correcting
• asking
• answering
1.2 expressing and finding out attitudes:
• factual (agreement/disagreement)
• knowledge (knowledge/ignorance, remembering, forgetting, probability, certainty)
• modality (obligations, necessity, ability, permission)
• volition (wants, desires, intentions, preference)
• emotions (pleasure/displeasure, likes/dislikes, satisfaction, interest, surprise, hope, disappointment, fear, worry, gratitude)
• moral (apologies, approval, regret, sympathy)
1.3 suasion:
• suggestions, requests, warnings, advice, encouragement, asking help, invitations, offers
1.4 socialising:
• attracting attention, addressing, greetings, introductions, toasting, leave-taking
1.5 structuring discourse⁴:
• (28 microfunctions: opening, turntaking, closing, etc.)
1.6 communication repair:
• (16 microfunctions)
2
In this sense, we consider that sentential formulae are phraseological statements (‘enunciados fraseológicos’), as defined by Corpas Pastor (1996): they are autonomous speech sequences, minimal communicative units, uttered with a distinct intonation.
3
These language functions are specifically called microfunctions and are defined as “categories for the functional use of single (usually short) utterances, usually as turns in an interaction.” (CEFR, 2001: 125)
Fixed frames, on the other hand, are described as expressions “learnt and used as unanalysed wholes, into which words or phrases are inserted to form meaningful sentences” (CEFR, 2001: 111), e.g. Eng. Please may I have ... or Ger. Könnte ich bitte ... haben? Fixed frame is another name for phrase frame, which Römer (2009: 150) defines as “sets of n-grams which are identical except for one word, e.g. at the end of, at the beginning of, and at the turn of would all be part of the p[hrase]-frame at the * of.”
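Römer’s definition translates directly into a grouping rule: replace one position of each n-gram with a wildcard and collect the n-grams that share the resulting pattern. The sketch below is a hypothetical helper of our own (`phrase_frames`), not the implementation of kfNgram or of any other tool; requiring at least two distinct fillers per frame is our assumption.

```python
from collections import defaultdict

def phrase_frames(ngrams, min_variants=2):
    """Group equal-length n-grams that differ in exactly one word.

    Hypothetical helper: each n-gram is a string; the varying slot is shown as '*'.
    """
    frames = defaultdict(set)
    for gram in set(ngrams):
        words = gram.split()
        for i in range(len(words)):
            frame = " ".join(words[:i] + ["*"] + words[i + 1:])
            frames[frame].add(gram)
    # Keep only frames filled by at least `min_variants` different n-grams.
    return {f: sorted(v) for f, v in frames.items() if len(v) >= min_variants}

examples = ["at the end of", "at the beginning of", "at the turn of"]
print(phrase_frames(examples))
# {'at the * of': ['at the beginning of', 'at the end of', 'at the turn of']}
```

Every other wildcard position (e.g. `* the end of`) has only one filler in this input, so only Römer’s frame survives.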
Lexical competence is associated in the CEFR with the “Vocabulary range” scale, whose descriptor for the A1 level also points to phraseological competence: “Has a basic vocabulary repertoire of isolated words and phrases related to particular concrete situations.”
4
The complete lists of microfunctions for structuring discourse and for communication repair can be found in Threshold Level 1990 (van Ek and Trim, 1991).
Sociolinguistic competence, on the other hand, is concerned with the knowledge and skills required to deal with the social dimension of language use, as the CEFR (2001: 118) explains. Two of its areas are closely related to phraseology: linguistic markers of social relations and politeness conventions. The former comprises the following types of expressions, many of which are fixed (CEFR, 2001: 118):
• use and choice of greetings:
  - on arrival, e.g. Hello! Good morning!
  - introductions, e.g. How do you do?
  - leave-taking, e.g. Good-bye . . . See you later
• use and choice of address forms:
  - frozen, e.g. My Lord, Your Grace
  - formal, e.g. Sir, Madam, Miss, Dr, Professor (+ surname)
  - informal, e.g. first name only, such as John! Susan!
  - informal, e.g. no address form
  - familiar, e.g. dear, darling; (popular) mate, love
  - peremptory, e.g. surname only, such as Smith! You (there)!
  - ritual insult, e.g. you stupid idiot! (often affectionate)
• conventions for turntaking
• use and choice of expletives (e.g. Dear, dear!, My God!, Bloody Hell!, etc.)
Politeness conventions, for their part, include the following types of expressions, many of which are also phraseological in nature (CEFR, 2001: 119):
1. ‘positive’ politeness, e.g.:
• showing interest in a person’s well being;
• sharing experiences and concerns, ‘troubles talk’;
• expressing admiration, affection, gratitude;
• offering gifts, promising future favours, hospitality;
2. ‘negative’ politeness, e.g.:
• avoiding face-threatening behaviour (dogmatism, direct orders, etc.);
• expressing regret, apologising for face-threatening behaviour (correction, contradiction, prohibitions, etc.);
• using hedges, etc. (e.g. ‘I think’, tag questions, etc.);
3. appropriate use of ‘please’, ‘thank you’, etc.;
4. impoliteness (deliberate flouting of politeness conventions), e.g.:
• bluntness, frankness;
• expressing contempt, dislike;
• strong complaint and reprimand;
• venting anger, impatience;
• asserting superiority.
There is a scale related to sociolinguistic competence, “Sociolinguistic appropriateness”, and it includes a descriptor for the A1 level which mentions phraseological aspects: “Can establish basic social
contact by using the simplest everyday polite forms of: greetings and
farewells; introductions; saying please, thank you, sorry, etc.”
This study focuses specifically on receptive activities (reception) and productive activities (production). The former include reading and listening activities (CEFR, 2001: 65-71). For the A1 level there are no descriptors for listening activities that refer to fixed expressions, but we do find some descriptors for reading that mention phraseology: under “Overall reading comprehension” it is recommended that the A1 level learner can “understand very short, simple texts a single phrase at a time, picking up familiar names, words and basic phrases and rereading as required”. In the section “Reading for orientation” we find that the learner “Can recognise familiar names, words and very basic phrases on simple notices in the most common everyday situations.”
Production, on the other hand, includes speaking and writing activities. With respect to oral production, there is one descriptor for the
A1 level that mentions phraseology: in “Overall oral production” it is
proposed that the learner “can produce simple mainly isolated phrases about people and places.” As for writing activities, the descriptor
“Overall written production” includes the recommendation that the A1
level learner “can write simple isolated phrases and sentences”, while
the descriptor “Creative writing” mentions that the learner “can write
simple phrases and sentences about themselves and imaginary people,
where they live and what they do.”
The specific objective of this paper is to study the sentential formulae and fixed frames present in a corpus containing the receptive and productive materials of the textbook DaF kompakt A1, and to check whether those fixed expressions correspond to the phraseological and sociolinguistic competences that the Framework expects of an A1 level student of German. The remainder of the paper proceeds as follows: Section 2 presents the methodology we followed to carry out this study, and Section 3 lays out the results of the quantitative and qualitative corpus analysis. Finally, Section 4 discusses the results, and Section 5 offers some concluding remarks.
2. Methodology
We have combined a quantitative and a qualitative methodology. In order to perform the linguistic analysis that we set out to do, we compiled a corpus of the DaF kompakt A1 textbook materials, made up of three subcorpora: one for the written texts (letters, e-mails, advertisements, text messages, biographies, news…), one for the oral texts (transcriptions of conversations and monologues, mostly voice messages), and one for the exercises. All of these texts were taken from both the Kursbuch (‘coursebook’) and the Übungsbuch (‘workbook’). For the spoken and the written subcorpora, we decided to include only complete texts, while for the exercise subcorpus we selected those activities that contained sentences or at least some type of fixed expression; in this way, exercises focusing exclusively on single word forms or morphology were left out. The wording and instructions of the exercises, as well as the grammar reference sections and vocabulary lists, were also left out.
The textbook is a compact method, containing relatively few written texts, a moderate number of oral texts, and a substantial number of exercises. Thus, the written subcorpus includes 26 texts, containing 2,620 tokens and 929 types (type-token ratio 35.46%); the oral subcorpus comprises 81 texts, containing 7,936 tokens and 1,449 types (type-token ratio 18.26%); and the exercise subcorpus is made up of 215 texts (each one representing a different task), containing 10,250 tokens and 1,620 types (type-token ratio 15.8%). As we can see, there is greater lexical variety in the written subcorpus, whereas the exercise subcorpus has the lowest ratio, which means that many of its words occur repeatedly.
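The type-token ratio is simply the number of distinct word forms divided by the number of running words. The short sketch below (the helper names are our own) recomputes the three percentages from the type and token counts reported above.

```python
def type_token_ratio(tokens):
    # Proportion of distinct word forms (types) among all running words (tokens).
    return len(set(tokens)) / len(tokens)

def ttr_percent(n_types, n_tokens):
    # Same ratio from reported counts, as a percentage rounded to two decimals.
    return round(100 * n_types / n_tokens, 2)

# Counts reported above for the three subcorpora: (types, tokens).
subcorpora = {"written": (929, 2620), "oral": (1449, 7936), "exercises": (1620, 10250)}
for name, (n_types, n_tokens) in subcorpora.items():
    print(f"{name}: {ttr_percent(n_types, n_tokens)}%")
```

This reproduces 35.46%, 18.26% and 15.8%. Note that raw TTR tends to fall as a text grows longer, because frequent words keep recurring. The ranking may therefore partly reflect the different sizes of the subcorpora (2,620 vs 10,250 tokens) as well as their lexical variety.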
We performed the quantitative analysis by means of AntConc 3.4.4 (Anthony, 2016) and kfNgram (Fletcher, 2007). We used the Cluster/N-Gram function of AntConc to extract all 2-, 3-, 4- and 5-word n-grams from each subcorpus. We established a normalised threshold of 250 occurrences per million words for each subcorpus, which resulted in a minimum absolute frequency of two occurrences for the spoken subcorpus (7,936 tokens) and of only one occurrence for the written subcorpus (2,620 tokens). Even though this might seem a very low absolute threshold, it is actually a high normalised threshold, which can be justified by the fact that we are dealing with very frequent word combinations that are relevant for basic level language learners. The exercise subcorpus, on the other hand, was used for comparison purposes, so all the n-grams extracted in the previous steps were later searched for in this subcorpus.
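The arithmetic behind the thresholds, and the subsequent lookup in the exercise subcorpus, can be sketched as follows. This is not AntConc’s implementation. It assumes whitespace-separated tokens and that the normalised value is rounded up to a whole number of occurrences, and the helpers `min_abs_freq`, `frequent_ngrams` and `attested_in` are hypothetical.

```python
import math
from collections import Counter

def min_abs_freq(per_million, n_tokens):
    # Assumption: the normalised threshold is converted to a raw count by rounding up.
    return math.ceil(per_million * n_tokens / 1_000_000)

def ngrams(tokens, n):
    # All contiguous sequences of n tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def frequent_ngrams(tokens, sizes=(2, 3, 4, 5), per_million=250):
    # Keep 2- to 5-grams whose raw frequency reaches the size-adjusted threshold.
    threshold = min_abs_freq(per_million, len(tokens))
    counts = Counter(g for n in sizes for g in ngrams(tokens, n))
    return {g: c for g, c in counts.items() if c >= threshold}

def attested_in(candidates, other_tokens):
    # Frequency of each candidate n-gram in another subcorpus (0 if absent).
    lengths = {len(g) for g in candidates}
    other = Counter(g for n in lengths for g in ngrams(other_tokens, n))
    return {g: other[g] for g in candidates}

print(min_abs_freq(250, 7936), min_abs_freq(250, 2620))  # spoken, written
```

For the spoken subcorpus, 250 × 7,936 / 1,000,000 = 1.984, which rounds up to 2. For the written subcorpus the result is 0.655, which rounds up to 1. Both values match the thresholds reported above.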
Afterwards, we employed kfNgram to extract all 2- to 6-word phrase
frames, i.e. sets of n-grams that are identical except for a single word,
from each corpus. We extended the maximum number of words (n) to 6, as
we noticed that this allowed some more relevant frames to be extracted.
As for the options specified, it is worth noting that, in order to generate
lists of phrase frames, the programme relies on previously produced lists of
wordgrams (n-grams) with values of n of 2 or greater; that is why we
first generated as many n-grams as possible, by setting the
minimum frequency of occurrence to 1.
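A short sketch makes the notion of a phrase frame concrete. Every n-gram is turned into n frames by replacing one position with a wildcard, and frames realised by at least two different fillers are kept. The function below is a hypothetical illustration of that definition, not kfNgram's actual algorithm. The total frequency of the fillers and the number of distinct fillers appear to correspond to the "Total freq." and "Nr. of varieties" columns used in the fixed-frame tables below.

```python
from collections import Counter, defaultdict


def phrase_frames(tokens, n, min_variants=2):
    """Group n-grams that differ in exactly one position.

    Returns {frame: Counter(filler -> frequency)}, where the variable slot of
    each frame is written as "*".
    """
    tokens = [t.lower() for t in tokens]
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    frames = defaultdict(Counter)
    for gram, freq in grams.items():
        for slot in range(n):
            frame = gram[:slot] + ("*",) + gram[slot + 1:]
            frames[frame][gram[slot]] += freq
    # keep only frames with several distinct fillers ("varieties")
    return {frame: fillers for frame, fillers in frames.items() if len(fillers) >= min_variants}


text = "bis morgen ! bis später ! bis Montag !".split()
for frame, fillers in phrase_frames(text, 2).items():
    print(" ".join(frame), "| total freq.:", sum(fillers.values()), "| varieties:", len(fillers))
```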
As for the qualitative methodology, we examined all n-grams and
phrase frames extracted from the oral and the written subcorpora to
see which ones complied with the definition of sentential formulae and
fixed frames as proposed by the CEFR, and then compared the results
with the n-grams and phrase frames extracted from the exercise subcorpus, so as to check whether the phraseological units laid out in the
receptive materials were later practised in the productive sections. In
this sense, our work can be defined as corpus-based, as described by Storjohann
(2005: 8-9):
From this repository, appropriate material is extracted to support intuitive knowledge, to verify expectations, to allow linguistic phenomena
to be quantified, and to find proof for existing theories or to retrieve
illustrative samples. It is a method where the corpus is interrogated and
data is used to confirm linguistic pre-set explanations and assumptions.
It acts, therefore, as additional supporting material.
Thus, we have used the corpus to find pre-defined linguistic structures: sentential formulae and fixed frames. As we mentioned above,
both are types of fixed expressions, which consist of several words and
are used and learnt as wholes (CEFR, 2001: 111). Accordingly, we
selected those n-grams which fulfilled the conditions to be a sentential
formula and corresponded to any of the language functions listed above.
As for the fixed frames, we followed the same approach, focusing
on those that corresponded to minimal communicative units and
fulfilled any of the language functions cited above.
The study of sociolinguistic competence, on the other hand, was
carried out by reviewing all the sentential formulae that we had previously extracted from the spoken and the written subcorpora, and by
determining which ones could meet the criteria to constitute a linguistic
marker of social relations or an expression of politeness. The results
were then compared with the expressions found in the exercise subcorpus.
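The comparison with the exercise subcorpus amounts to looking up each receptive expression among the n-grams of the exercises and counting how often it recurs. The sketch below works under simplifying assumptions: whitespace tokenisation, case folding, and expressions given as plain strings. The helper names are hypothetical and are not part of AntConc or kfNgram.

```python
from collections import Counter


def count_ngrams(tokens, sizes):
    """Count every contiguous n-gram of the requested sizes (case-insensitive)."""
    tokens = [t.lower() for t in tokens]
    counts = Counter()
    for n in sizes:
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


def exercise_coverage(expressions, exercise_tokens, min_repeats=3):
    """How many receptive expressions recur in the exercises.

    Returns (found at least once, found min_repeats times or more,
    percentage found at least once).
    """
    phrases = [tuple(e.lower().split()) for e in expressions]
    counts = count_ngrams(exercise_tokens, {len(p) for p in phrases})
    found = [p for p in phrases if counts[p] >= 1]
    frequent = [p for p in phrases if counts[p] >= min_repeats]
    return len(found), len(frequent), 100 * len(found) / len(phrases)


exercises = "Vielen Dank ! vielen Dank ! vielen Dank ! Auf Wiedersehen".split()
print(exercise_coverage(["vielen Dank", "auf Wiedersehen", "guten Tag"], exercises))
```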
3. Results
We extracted n-grams and phrase frames following the criteria mentioned above, and classified the results into two groups: those related to
lexical competence, and those related to sociolinguistic competence.
3.1. Lexical competence
We explored the spoken and the written subcorpora separately, in order
to detect differences between spoken and written discourse; we therefore
present the results separately.
3.1.1. Spoken subcorpus
From the spoken subcorpus of DaF kompakt A1 we extracted 60 sentential formulae and 23 fixed frames. We classified the sentential formulae according to the number of words in the n-grams, and noted down
which language function (LF) was being fulfilled. Here are some examples of 2-, 3- and 4-grams[5]:

Rank   Freq.   N-gram                                                        LF
2-grams
5      20      guten Tag[6] (‘good morning/afternoon’)                       1.4
6      18      vielen Dank (‘thank you very much’)                           1.2
25     10      auf Wiedersehen (‘goodbye’)                                   1.4
31     9       auf Wiederhören (‘goodbye’ [telephone])                       1.4
47     7       das geht (‘it is possible’)                                   1.1/1.2
3-grams
1      4       wie geht’s? (‘how are things?’)                               1.4
23     3       das ist alles (‘that’s everything’)                           1.1
57     3       weißt du was? (‘you know what?’)                              1.2
105    2       das klingt gut (‘that sounds good’)                           1.2
134    2       es geht so (‘so-so’)                                          1.2
4-grams
1      4       wie geht’s dir? (‘how are you?’)                              1.2
4      3       kann ich Ihnen helfen? (‘can I help you?’)                    1.3
21     2       das geht leider nicht (‘unfortunately that is not possible’)  1.1/1.2
69     2       können Sie mir helfen? (‘can you help me?’)                   1.3
108    2       wie geht es dir? (‘how are you?’)                             1.2

[5] We did not find any relevant 5-grams.
The 60 sentential formulae that we have found in the oral corpus fulfil the following language functions, as described by the CEFR (2001). Since some formulae fulfil two functions, the counts add up to more than 60:
1.1 imparting and seeking factual information: 11
1.2 expressing and finding out attitudes: 36
1.3 suasion: 5
1.4 socialising: 10
1.5 structuring discourse: 2
1.6 communication repair: 1
Out of the 60 sentential formulae extracted from the oral subcorpus,
45 occur at least once in the exercise subcorpus, and 22 of them occur
three or more times.

[6] Our search was not case-sensitive, but we have capitalized the nouns in these tables of results.

With respect to the fixed frames, we classified them according to the
number of words and noted down their language function. Below
are some fixed frames of 2-, 3-, 4- and 5-grams.
Fixed frame                                 Total freq.   Nr. of varieties   LF
bis *[7] (‘see you *’)                      13            6                  1.4
soll ich *[8] (‘shall I *’)                 4             4                  1.3
das macht * (‘that’s [price]’)              3             3                  1.1
wie geht’s *? (‘how are *?’)                7             4                  1.2
was ist mit *? (‘what about *?’)            3             3                  1.1
ich hätte gern * (‘I’d like *’)             3             2                  1.2
mir geht es * (‘I am *’)                    2             2                  1.2
wie komme ich zum *[9]                      3             3                  1.1
The 23 fixed frames that we have found in the spoken corpus fulfil the following language functions:
1.1 imparting and seeking factual information: 9
1.2 expressing and finding out attitudes: 9
1.3 suasion: 2
1.4 socialising: 3
1.5 structuring discourse: 0
1.6 communication repair: 0
As we can observe, most of the fixed frames are used to impart and seek
factual information, or are related to expressing and finding out attitudes. As with the sentential formulae, expressions for structuring
discourse or repairing communication are scarce: we found no fixed frames fulfilling either function.
Out of the 23 fixed frames found in the oral subcorpus, 13 appear in the exercises.

[7] Only with nouns or adverbs expressing a point of time in the future, for instance: bis Montag (‘see you on Monday’), bis später (‘see you later’).
[8] Strictly speaking, this phrase frame can be completed by more than one word, but we decided to include it given its function: to propose something.
[9] In English: ‘how do I get to *?’.

3.1.2. Written subcorpus
From the written subcorpus of DaF kompakt A1 we extracted 25 sentential formulae and four fixed frames. We classified the sentential formulae according to the number of words in the n-grams, and noted down
which language function (LF) was being fulfilled. Here are some examples of 2-, 3-, 4- and 5-grams:
Rank   Freq.   N-gram                                                   LF
2-grams
6      8       liebe Grüße (‘kind regards’)                             1.5
38     3       du weißt (‘you know’)                                    1.2
180    2       sehr gern (‘I’d love to’)                                1.3
3-grams
3      3       hast du Lust? (‘do you feel like it/doing it?’)          1.3
873    1       Gott sei Dank (‘thank God’)                              1.2
1589   1       mit freundlichen Grüßen (‘yours sincerely’)              1.5
4-grams
2390   1       wie geht es dir? (‘how are you?’)                        1.2/1.4
2423   1       wir grüßen euch herzlich (‘we send our best wishes’)     1.5
5-grams
950    1       hast du Zeit und Lust?[10]                               1.3
2011   1       so geht es nicht weiter (‘it cannot go on like this’)    1.2
The 25 sentential formulae that we have found in the written subcorpus fulfil the following language functions, as described by the CEFR
(2001). Again, some formulae fulfil two functions, so the counts add up to more than 25:
1.1 imparting and seeking factual information: 2
1.2 expressing and finding out attitudes: 6
1.3 suasion: 8
1.4 socialising: 4
1.5 structuring discourse: 7
1.6 communication repair: 0
As we can see, the sentential formulae fulfil a variety of functions,
suasion and structuring discourse being the most common.
Out of the 25 sentential formulae detected, 12 are also found in the
oral subcorpus, and 15 in the exercise subcorpus (nine of which occur three or more times).

[10] In English: ‘do you have time and feel like it?’.

With regard to the fixed frames, we classified them according to the
number of words and noted down their language function. Below
are the fixed frames that we were able to extract (2-, 3- and 5-grams):
Fixed frame                      Total freq.   Nr. of varieties   LF
liebe * (‘dear *’)               9             8                  1.5
lieber * (‘dear *’)              5             5                  1.5
danke für * (‘thanks for *’)     2             2                  1.2
* gefällt mir sehr gut[11]       2             2                  1.2
As we can see, two of the fixed frames fulfil the function of structuring discourse, and the other two are used to express attitudes.
Two of the fixed frames found in this subcorpus are present in the
exercises: liebe * and lieber *.
3.2. Sociolinguistic competence
We explored the spoken and the written subcorpora separately, so we
will offer differentiated results.
3.2.1. Spoken subcorpus
We analysed the sentential formulae that we extracted from the corpus
in order to establish which ones could meet the criteria to act as linguistic markers of social relations or as politeness conventions. We found
that 10 expressions can be considered linguistic markers of social relations, and all of them are 2-grams. Below are some examples:
Rank   Frequency   Expression
128    4           bis später (‘see you later’)
245    3           grüß Gott (‘hello’)
248    3           guten Morgen (‘good morning’)
252    3           herzlich willkommen (‘welcome’)
721    2           oh je (‘oh dear’)
Most of these expressions are related to the use and choice of greetings (on arrival and leave-taking). We also find one expletive (oh je). Eight
of these expressions are also present in the exercise subcorpus.

[11] In English: ‘I like * very much’.

As for the politeness conventions, we detected 34 expressions
among the sentential formulae that we had previously extracted. Below
are some examples (2-, 3- and 4-grams):
Rank   Frequency   Expression
65     6           hier bitte (‘here you are’)
67     6           ja, gern (‘with pleasure’)
70     6           kein Problem (‘no problem’)
92     5           gern geschehen (‘my pleasure’)
143    4           freut mich (‘pleased to meet you’)
282    2           tut mir leid (‘sorry’)
324    2           wie Sie wollen (‘as you like’)
1      4           wie geht’s dir? (‘how are you?’)
4      3           kann ich Ihnen helfen? (‘can I help you?’)
Most of these expressions are related to positive politeness (wie geht’s dir, freut mich), while a few correspond to negative politeness (tut
mir leid). We also find clusters for expressing ‘please’ or ‘thank you’
(vielen Dank; nein, danke). Of these 34 expressions, 26 are also found in
the exercise subcorpus.
3.2.2. Written subcorpus
We determined that nine sentential formulae from the written subcorpus can be considered linguistic markers of social relations. Below are
some examples:
Rank   Frequency   Expression
899    1           grüß dich (‘hello’)
46     2           viele liebe Grüße (‘lots of love’)
1935   1           seid herzlich gegrüßt (‘best wishes’)
Three of these linguistic markers are found in the oral subcorpus,
while six are also present in the exercise subcorpus.
On the other hand, we found eight expressions that qualify as politeness conventions, such as the following:
Rank   Frequency   Expression
915    1           guten Appetit (‘enjoy your meal’)
1862   1           stimmt’s? (‘right?’)
2390   1           wie geht es dir? (‘how are you?’)
Five of these politeness conventions have been detected also in the
oral subcorpus, whereas six are present in the exercise subcorpus.
4. Discussion
With respect to lexical competence, we have divided our results into two
groups, oral texts and written texts, and have compared them with those
of the exercise subcorpus.
The spoken subcorpus contains 60 sentential formulae and 23 fixed
frames, of which 45 formulae (75%) and 13 frames (56,5%)
are practised in the exercises.
As for the written subcorpus, we have found 25 sentential
formulae, 15 of which (60%) appear in the exercise subcorpus. Only four fixed
frames have been extracted, two of which (50%) are practised
in the exercise section. Some of the sentential formulae, 12 (48%), are also
present in the oral subcorpus.
If we consider the results of the oral and the written subcorpora as
a whole, counting the 12 formulae shared by both subcorpora only once, we obtain a total of 73 sentential formulae and 27
fixed frames. Of these, 51 sentential formulae (69,86%) and 15 fixed frames (55,56%) are found in the exercise
subcorpus.
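The combined figures can be checked with a few lines of arithmetic. In the sketch below, the formulae found in both subcorpora are counted once (60 + 25 − 12), and the 27 fixed frames are taken as 23 + 4. Percentages are rounded to two decimals here, whereas the text uses decimal commas.

```python
def pct(part, whole):
    """Share of `part` in `whole` as a percentage."""
    return 100 * part / whole


oral_formulae, written_formulae, shared_formulae = 60, 25, 12
total_formulae = oral_formulae + written_formulae - shared_formulae  # shared formulae counted once
total_frames = 23 + 4

print(total_formulae, f"{pct(51, total_formulae):.2f}%")  # formulae also in the exercises
print(total_frames, f"{pct(15, total_frames):.2f}%")      # frames also in the exercises
```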
Given these results, and taking into account the number and variety
of sentential formulae and fixed frames that we have encountered, we
can state that both the oral and the written subcorpora sufficiently comply with the recommendations of the CEFR with respect to lexical
competence: “Has a basic vocabulary repertoire of isolated words and
phrases related to particular concrete situations.” Even though the written component is quite small and therefore probably not representative
enough of the German language at a basic level, this shortcoming is
offset by the fact that some of its fixed expressions are also present in
the oral and exercise subcorpora. Considering this fact, we may state
that this subcorpus contributes to compliance with the reading descriptors in the CEFR: the learner “can understand very short, simple
texts a single phrase at a time, picking up familiar names, words and
basic phrases and rereading as required” and the learner “can recognise
familiar names, words and very basic phrases on simple notices in the
most common everyday situations.”
Regarding productive activities, although we have used the exercise
subcorpus mainly for comparison purposes, we can draw some interesting conclusions: a majority of the fixed expressions from the receptive
materials are present in this subcorpus, as we stated above, and their
frequency is relatively high. The oral and written subcorpora contain 261 and 43 sentential-formula tokens, respectively, while the exercise section contains 198; likewise, there are 161 fixed-frame tokens in the spoken component, 18
in the written one and 125 in the exercise subcorpus. These figures allow
us to state that the productive component sufficiently complies with
the descriptors “Overall written production” (the A1 level learner “can
write simple isolated phrases and sentences”) and “Creative writing”
(the learner “can write simple phrases and sentences about themselves
and imaginary people, where they live and what they do”).
Another significant finding is that we have detected a noticeable difference between the written and the oral subcorpora, in spite of the
overlaps that we have mentioned above. In the written subcorpus
there are more phraseological units for structuring discourse (mit freundlichen Grüßen) and for expressing suasion (hast du Zeit und Lust?),
while in the oral subcorpus there are more fixed expressions for expressing and finding out attitudes (das klingt gut), for socialising (wie
geht’s?) and for imparting and seeking factual information (das geht).
Regarding sociolinguistic competence, we have also distinguished
two groups, oral texts and written texts, and have compared the results
with those of the exercise subcorpus.
In the oral subcorpus there are 10 linguistic markers of social relations and 34 politeness conventions, of which 8 [80%] and 26
[76,47%], respectively, appear in the exercises. In the written subcorpus, on the other hand,
there are nine linguistic markers of social relations and eight politeness
conventions, of which six [66,67%] and six [75%], respectively, appear in the exercises.
Given the number and variety of expressions extracted from the oral
subcorpus (and, to a lesser extent, from the written subcorpus), we can
state that DaF kompakt A1 complies with the descriptor of sociolinguistic appropriateness for the A1 level, as formulated in the CEFR (2001:
122): “Can establish basic social contact by using the simplest everyday
polite forms of: greetings and farewells; introductions; saying please,
thank you, sorry, etc.” The written subcorpus has yielded very limited
results, but this is partially compensated for by the fact that some of its phraseological units are also present in the oral subcorpus (33% of linguistic
markers and 62,5% of politeness conventions) and in the exercises of
the book, so we can consider that they are repeated often enough
for students to be exposed to them.
5. Conclusions
This paper has offered an overview of two competences of the CEFR
that are directly related to phraseology: lexical and sociolinguistic competence. We set out to analyse a corpus of the receptive and productive
materials of the textbook DaF kompakt A1, in order to check whether it
contained fixed expressions that corresponded to the recommendations
of the CEFR. After analysing the corpus in search of sentential formulae
and fixed frames, we can conclude that both the spoken and the written
subcorpora (as well as the exercise component) comply with the descriptors laid out in the CEFR. The written subcorpus was very limited in size
and therefore did not yield many results in the form of fixed expressions,
but these can still be considered sufficient, taking into account that a fair
number of the fixed expressions listed also occur in the exercises and in
the oral subcorpus.
Apart from the obvious difference in the number of detected fixed
expressions, we also determined that the spoken subcorpus contains
more expressions related to imparting and seeking factual information, to expressing and finding out attitudes and to socialising, while
the phraseological units found in the written subcorpus deal more with
suasion and structuring discourse. This is in line with the well-known
differences between oral and written discourse.
With respect to sociolinguistic competence, we found numerous expressions that comply with the minimum recommendations for the A1
level. With regard to linguistic markers of social relations, there are no
significant differences between the spoken and the written subcorpora,
but we have observed that politeness conventions are much more prevalent in oral discourse.
Further research could compare the results
with a general reference corpus of the German language, to determine whether
these fixed expressions are indeed the most frequent and widely used
by German speakers. However, we might encounter the limitation that,
to the best of our knowledge, there is no general reference corpus
of spoken German that allows the user to create lists of
n-grams or clusters.
6. Bibliography
Anthony, Laurence. 2016. AntConc (Version 3.4.4) [Computer Software]. Tokyo, Japan: Waseda University. http://www.laurenceanthony.net/
Bolinger, Dwight. 1976. Meaning and memory. Forum Linguisticum 1: 1-14.
Corpas Pastor, Gloria. 1996. Manual de fraseología española. Madrid: Gredos.
Coulmas, Florian. 1979. On the sociolinguistic relevance of routine formulae.
Journal of Pragmatics 3: 239-266.
Council of Europe. 2001. Common European Framework of Reference for
Languages: Learning, teaching, assessment. Cambridge: Cambridge
University Press.
De Cock, Sylvie. 2000. Repetitive phrasal chunkiness and advanced EFL
speech and writing. In Mair, C. & Hundt, M. (eds.) Corpus Linguistics
and Linguistic Theory. Papers from ICAME 20. Amsterdam: Rodopi,
51-68.
Fletcher, William H. 2007. kfNgram [Computer Software]. Annapolis, MD:
USNA. http://www.kwicfinder.com/kfNgram/kfNgramHelp.html
Morley, John. 2017. Academic Phrasebank. Manchester: The University of
Manchester. http://www.phrasebank.manchester.ac.uk/ [Accessed
05/03/2017].
Nattinger, James & DeCarrico, Jeanette. 1992. Lexical Phrases and Language
Teaching. Oxford: Oxford University Press.
O’Keeffe, Anne; McCarthy, Michael & Carter, Ronald. 2007. From Corpus
to Classroom: language use and language teaching. Cambridge: Cambridge University Press.
Pawley, Andrew & Syder, Frances Hodgetts. 1983. Two puzzles for linguistic
theory: nativelike selection and nativelike fluency. In Richards, J. C. &
Schmidt, R. W. (eds.) Language and Communication. New York: Longman,
191-226.
Römer, Ute. 2009. The inseparability of lexis and grammar: Corpus linguistic
perspectives. Annual Review of Cognitive Linguistics 7: 141-163.
Sander, Ilse; Braun, Birgit; Doubek, Margit; Frater-Vogel, Andrea; Trebesius-Bensch, Ulrike; Vitale, Rossana; Behnes, Sibylle; Kotas, Ondrej &
Marquardt-Langermann, Martina. 2011. DaF kompakt A1. Deutsch als
Fremdsprache für Erwachsene. Stuttgart: Klett.
Sinclair, John. 1991. Corpus, Concordance and Collocation. Oxford: Oxford
University Press.
Storjohann, Petra. 2005. Corpus-driven vs. corpus-based approach to the
study of relational patterns. In Conference e-journal, Corpus Linguistics 2005 conference. Birmingham: University of Birmingham, 1-20.
http://www.birmingham.ac.uk/research/activity/corpus/publications/conference-archives/2005-conf-e-journal.aspx [Accessed 05/03/2017].
van Ek, Jan Ate & Trim, John Leslie Melville. 1991. Threshold Level 1990.
Cambridge: Cambridge University Press.
Wray, Alison. 2000. Formulaic sequences in second language teaching: principle and practice. Applied Linguistics 21(4): 463-489.
Wray, Alison. 2002. Formulaic Language and the Lexicon. Cambridge: Cambridge University Press.
ojs.uv.es/index.php/qfilologia/index
Getting rid of the Chi-square and Log-likelihood tests
for analysing vocabulary differences between corpora
Yves Bestgen
Université catholique de Louvain.
[email protected]
Received: 20/04/2017. Accepted: 11/10/2017
Abstract: Log-likelihood and Chi-square tests are probably the most popular statistical
tests used in corpus linguistics, especially when the research aims to describe the
lexical variation between corpora. However, because this specific use of the Chi-square
test is not valid, it produces far too many significant results. This paper explains the
source of the problem (i.e., the non-independence of the observations), why the usual
solutions are not acceptable and which kinds of statistical test should be used instead.
A corpus analysis of the lexical differences between American and British English is
then reported, in order to demonstrate the problem and to confirm the adequacy of the
proposed solution. The last section presents the commands that can be used with
WordSmith Tools, a very popular corpus-processing program, to obtain the data needed
for the adequate tests, as well as a very easy-to-use procedure in R, a free and
easy-to-install statistical package, that performs these tests.
Keywords: lexical differences between corpora; resampling test; WordSmith Tools;
British and American English.
Bestgen, Yves. 2017. “Getting rid of the Chi-square and Log-likelihood tests for
analysing vocabulary differences between corpora”. Quaderns de Filologia:
Estudis Lingüístics 22: 33-56. doi: 10.7203/qf.22.11299
1. Introduction
Many studies in corpus linguistics aim to analyse lexical differences
between corpora representing different genres (Tribble, 2000), regional and
diatypic varieties (Oakes & Farrow, 2007), oral or written modalities (Rayson, Leech & Hodges, 1997), periods of writing (Laviosa,
Pagano, Kemppanen & Ji, 2017) or certain sociological characteristics
of the speaker or writer, such as gender, age and socio-economic status
(Brezina & Meyerhoff, 2014; Marquilhas, 2015), to cite a few examples. This kind of study immediately raises the question of how to decide whether a difference observed when comparing two given corpora
(e.g., more occurrences of towards or male in an American English corpus
than in a British English corpus) is purely accidental, or whether
it reflects a real difference in the way English is used. The answer is
typically provided by Pearson’s Chi-square (Chi2)
test or its close neighbour, the log-likelihood (LL) test (Biber & Jones,
2008; Rayson & Garside, 2000).
These statistical tests are applied to a contingency table made up of
the frequency of a word in the two corpora to be compared and the total
number of words in each corpus. Table 1 shows the contingency tables
for the words towards and male in the British English corpus FLOB
and in the American English corpus FROWN, which are used in the
empirical analyses reported in section 4.
                British     American
Towards              17          293
~Towards        1016832      1018360

                British     American
Male                 89          177
~Male           1016760      1018476

Table 1. Frequency counts for two words in the FLOB and FROWN corpora (~Towards and ~Male: all other words in each corpus)
The null hypothesis tested is that the difference between the frequencies of use in the two corpora is only the result of random variation,
the two samples compared being randomly extracted from a single population. The statistics used are:

$$\chi^2 = \sum_{i,j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad LL = 2 \sum_{i,j} O_{ij} \ln \frac{O_{ij}}{E_{ij}}$$

in which O represents the observed frequency and E the expected
frequency, computed on the basis of the marginal totals ($E_{ij} = R_i C_j / N$, where $R_i$ and $C_j$ are the row and column totals and $N$ is the grand total), and the summation is over the four cells (and not over the first two as in Brezina and
Meyerhoff (2014)).
Under H0, these statistics are approximately distributed as a Chi-square with one degree of freedom, which makes it possible to calculate
the probability of obtaining, if the differences were due to chance alone,
a statistic at least as high as the one observed. Applied
to the words towards and male, these two tests return probabilities of
less than 0.0000001.
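The computation described above can be reproduced on Table 1 with a short, self-contained sketch using only Python's standard library. The expected counts are derived from the marginal totals, and both statistics are summed over the four cells. The p-value for one degree of freedom uses the identity P(χ²₁ ≥ x) = erfc(√(x/2)). The function names are illustrative and do not come from any corpus tool.

```python
import math


def expected_counts(table):
    """Expected cell counts under H0: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi2_and_ll(table):
    """Pearson Chi2 and log-likelihood (LL) summed over all cells of the table."""
    chi2 = ll = 0.0
    for obs_row, exp_row in zip(table, expected_counts(table)):
        for o, e in zip(obs_row, exp_row):
            chi2 += (o - e) ** 2 / e
            if o > 0:  # a zero cell contributes 0 to LL (0 * ln 0 is taken as 0)
                ll += 2 * o * math.log(o / e)
    return chi2, ll


def p_value_df1(statistic):
    """P(X >= statistic) for a Chi-square variable X with one degree of freedom."""
    return math.erfc(math.sqrt(max(statistic, 0.0) / 2))


# Table 1: first row = the word, second row = all other words;
# columns = FLOB (British) and FROWN (American)
tables = {
    "towards": [[17, 293], [1016832, 1018360]],
    "male": [[89, 177], [1016760, 1018476]],
}
for word, table in tables.items():
    chi2, ll = chi2_and_ll(table)
    print(f"{word}: Chi2 = {chi2:.2f} (p = {p_value_df1(chi2):.1e}), LL = {ll:.2f} (p = {p_value_df1(ll):.1e})")
```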
As noted by Sampson (2003), the use of these tests by Hofland and Johansson (1982), for example,
has expanded our understanding of the lexical differences between British
and American English. These authors
showed that masculine words, such as he, boy and man, are significantly more frequent in American English, while feminine words are
significantly more frequent in British English.
The popularity of these tests has undoubtedly been reinforced by
their implementation in software as widely used in corpus linguistics as WordSmith Tools (Scott, 1997), one of whose main functions is the identification of keywords, i.e., all words that pass
the Chi2 or LL test at a probability threshold of 0.000001. The same
function is also available in other software, such as AntConc (Anthony,
2012). These two tests are also very frequently used to test specific
hypotheses in corpus linguistics (Lee & Chen, 2009; Lubbers Quesada
& Blackwell, 2009; see Gablasova, Brezina & McEnery (2017) for illustrations and a discussion). For example, Siyanova-Chanturia (2015)
used the Chi2 test to confirm that Chinese beginner learners of Italian
used more strongly associated collocations at the end of an intensive
course than they did at the beginning.
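A keyword list of the kind described above can be sketched as one LL test per word, with the probability threshold of 0.000001. This is a hypothetical illustration of the procedure, not the code or the default settings of WordSmith Tools or AntConc. The toy frequency lists, including the "other" entry standing for the rest of the vocabulary, are invented for the example.

```python
import math


def log_likelihood(a, b, size_a, size_b):
    """LL for a word seen a times in corpus A and b times in corpus B, summed over the four cells."""
    n = size_a + size_b
    word_total, other_total = a + b, n - a - b
    cells = [
        (a, word_total * size_a / n),
        (b, word_total * size_b / n),
        (size_a - a, other_total * size_a / n),
        (size_b - b, other_total * size_b / n),
    ]
    # cells with a zero observed count contribute 0 (0 * ln 0 is taken as 0)
    return 2 * sum(o * math.log(o / e) for o, e in cells if o > 0)


def keywords(freq_a, freq_b, threshold=1e-6):
    """Return (word, LL, p, corpus where it is relatively more frequent), highest LL first."""
    size_a, size_b = sum(freq_a.values()), sum(freq_b.values())
    found = []
    for word in set(freq_a) | set(freq_b):
        a, b = freq_a.get(word, 0), freq_b.get(word, 0)
        ll = max(log_likelihood(a, b, size_a, size_b), 0.0)  # guard against tiny negative rounding
        p = math.erfc(math.sqrt(ll / 2))  # Chi-square upper tail, one degree of freedom
        if p < threshold:
            found.append((word, ll, p, "A" if a / size_a > b / size_b else "B"))
    return sorted(found, key=lambda item: item[1], reverse=True)


corpus_a = {"colour": 200, "the": 5000, "other": 94800}
corpus_b = {"color": 200, "the": 5000, "other": 94800}
for word, ll, p, side in keywords(corpus_a, corpus_b):
    print(word, round(ll, 1), side)
```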
However, as they are used to analyse lexical differences between corpora, these tests are inadequate, as several authors have already
pointed out, and should no longer be used (Bestgen,
2012, 2014; Brezina & Meyerhoff, 2014; Kilgarriff, 1996, 2005; Lijffijt, Nevalainen, Säily, Papapetrou, Puolamäki & Mannila, 2016). The
aim of this paper is to help researchers abandon them by explaining
in detail the problem they pose and its origin, by showing why several
possible solutions are ineffective and by recommending two valid and
efficient statistical tests. To make the use of these adequate tests as simple as possible, the last section provides the commands needed to obtain the
data with WordSmith Tools, together with a very easy-to-use
script that performs the tests in R, a free, easy-to-install, multi-platform statistical
package.
2. The problem
The use of these two tests in corpus linguistics has been criticized
for the very large number of significant differences they claim to detect (Baker, 2004; Gries, 2005; Kilgarriff, 1996, 2005). For example,
Paquot and Bestgen (2009) observed, when comparing a literary corpus
and an academic corpus of 15 million words each, that more than 90%
of the 10,333 words tested were significantly more frequent in one of
the two corpora at a probability threshold of 0.000001. This
problem has most often been attributed to the very large sample sizes
under analysis (Kilgarriff, 2005) or to the large number of tests performed (Gries, 2005). The problem is, in fact, much deeper and does
not arise only in linguistics. It was mentioned by Lewis and Burke as
early as 1949 as the main misuse of the Chi2 test in psychology, and
has been repeatedly emphasized since then: “Chi-square may be correctly used only if all N observations are made independently” (Kurtz
& Mayo, 1979: 366); that is, each observation must be “taken from the
population at random, and the selection of each member of the sample
is independent from the next” (Wallis, 2013: 352). In other words, for
the test to be valid[1], the unit analysed must be the sampling unit (Bestgen, 2014; Gablasova et alii., 2017). This is (almost) never the case in
corpus linguistics. The unit analysed is often a word, or sometimes a
sentence, while the sampling unit used to construct the corpus is a text
(or an extract from a text).
Why does this discrepancy between the sampling unit and the unit
of analysis so strongly affect the number of significant words in corpus
comparison? It has long been known that the frequency of word occurrences varies greatly between texts (Church, 2000). It follows that the
presence of some very specific texts, or even a single one, in a corpus
may be sufficient to increase the frequency of certain words and thus
to modify the words considered as being significantly more frequent
in this corpus according to the Chi2 and LL tests. This phenomenon is
perfectly illustrated in the following example reported in Oakes and
Farrow (2007). These authors observed that one of the most typical
words in British English, according to the Chi2 test, is thalidomide.
They note, however, that all of the 55 occurrences of this word in the
British corpus appear in one single text. Contrary to what the Chi2 test
seems to indicate, thalidomide is not typical of British English, just of
one text in the British corpus. It is because this text has been selected
in its entirety for inclusion in the corpus that thalidomide appears as
typical. If the sampling unit had coincided with the unit of analysis
(the word), thalidomide would have had (virtually) no chance of being
declared typical. Thus, each selected text may cause a series of false
positives. It is important to note that it is not just such extreme cases
that invalidate the Chi2 and LL tests. The simple fact that the probability
of a word occurring in a text for a second time is far higher than that of
its occurring there for the first time shows that non-independence is general and
not occasional (Church, 2000).

[1] This problem arises for all statistical tests that can be applied to a contingency table,
including Fisher’s exact test, which also requires the observations to be independent.
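A small simulation can illustrate the consequence of this non-independence. It is offered as a hypothetical sketch and is not part of the analyses reported in this paper. Two corpora of 50 texts each are repeatedly drawn from the same population of texts, so the null hypothesis is true by construction, and the Chi2 test is applied at the 0.05 level. For a word used at the same rate in every text, the proportion of rejections should stay close to 5%. For a word with the same average frequency but concentrated in a minority of texts, it is expected to be much higher. All parameter values (text length, rates, number of simulations) are arbitrary assumptions.

```python
import math
import random


def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for the small rates used here)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def chi2_p(a, b, size_a, size_b):
    """Pearson Chi2 p-value (1 df) for a word seen a times in corpus A and b times in corpus B."""
    n, word_total = size_a + size_b, a + b
    if word_total == 0:
        return 1.0
    chi2 = 0.0
    for observed, size in ((a, size_a), (b, size_b)):
        for o, row_total in ((observed, word_total), (size - observed, n - word_total)):
            e = row_total * size / n
            chi2 += (o - e) ** 2 / e
    return math.erfc(math.sqrt(chi2 / 2))


def false_positive_rate(text_rate, n_sims=200, n_texts=50, text_len=2000, alpha=0.05, seed=1):
    """Share of simulated comparisons declared significant although H0 is true.

    text_rate(rng) returns the word's relative frequency in one randomly drawn text.
    """
    rng = random.Random(seed)
    size = n_texts * text_len
    rejections = 0
    for _ in range(n_sims):
        a = sum(poisson(text_rate(rng) * text_len, rng) for _ in range(n_texts))
        b = sum(poisson(text_rate(rng) * text_len, rng) for _ in range(n_texts))
        rejections += chi2_p(a, b, size, size) < alpha
    return rejections / n_sims


def steady(rng):
    return 0.00109  # same rate in every text


def bursty(rng):
    # same average rate, but concentrated in 10% of the texts (a "thalidomide"-like word)
    return 0.01 if rng.random() < 0.1 else 0.0001


print("steady word:", false_positive_rate(steady))
print("bursty word:", false_positive_rate(bursty))
```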
3. The solutions
A first solution consists of disregarding the probability derived from the
inferential test (Bestgen, 2014; Gabrielatos & Marchi, 2011; Leedham,
2012). The Chi2 (or LL) values (called Keyness in WST) are interpreted
as indicators of the potential interest of each of the numerous vocabulary differences between the corpora: the larger the value, the more interesting the word. This solution has the major drawback of only masking
the problem without solving it, because there is an inverse monotonic
relationship between the p-value and the test statistic: ranking words by
their Chi2 or LL value is the same as ranking them by their p-value. A word such as
thalidomide is extremely significant because it has a very high Chi2
value. Pretending to look only at the Chi2 or LL scores does not
solve anything.
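Because a 2x2 table has one degree of freedom, the p-value of the Chi2 or LL statistic x can be written as p = erfc(sqrt(x/2)), a strictly decreasing function of x. The Python sketch below, with hypothetical keyness scores, illustrates why ranking words by keyness amounts to ranking them by p-value.

```python
import math


def chi2_pvalue_1df(stat):
    """Upper-tail p-value of a chi-square (or LL) statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Hypothetical keyness scores (Chi2 or LL values) for four words
keyness = {"word_a": 10.2, "word_b": 55.0, "word_c": 3.1, "word_d": 24.0}

by_score = sorted(keyness, key=keyness.get, reverse=True)                # largest keyness first
by_pvalue = sorted(keyness, key=lambda w: chi2_pvalue_1df(keyness[w]))  # smallest p first
print(by_score)   # ['word_b', 'word_d', 'word_a', 'word_c']
print(by_pvalue)  # ['word_b', 'word_d', 'word_a', 'word_c']
```

Whatever threshold is applied to the keyness score therefore corresponds exactly to some threshold on the p-value.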
A second solution is to use a dispersion measure to eliminate words
that only occur in a part of a corpus (Baker, 2004; Oakes & Farrow,
2007). The first problem with this solution is that the threshold used to
decide that a word is insufficiently dispersed is necessarily arbitrary,
which is all the more annoying since the main measures proposed in
the literature are difficult to interpret (Oakes & Farrow, 2007). Moreover,
Bestgen (2014) showed that taking dispersion into account made
it possible to reduce the problem posed by very badly dispersed words
(like thalidomide), but not to eliminate it. The LL and Chi2 tests remain
inadequate.
The only acceptable solution is to use an inferential test that reconciles the sampling units and the units of analysis and that is therefore
based on the frequency of the words[2] not in the corpus, but in the texts
making up the corpus. Several statistical tests are possible. The most
obvious choice is Student's t-test for comparing two means. This
test, however, is problematic, because it is based on an assumption of normality which is very difficult to sustain for data made of
word frequencies. For this reason, Kilgarriff (1996) and several authors
after him (Brezina & Meyerhoff, 2014; Lijffijt et alii, 2016; Paquot &
Bestgen, 2009) proposed the use of a distribution-free test (also called
a nonparametric test). The test recommended by Kilgarriff is the Wilcoxon-Mann-Whitney test (WMW), which is carried out on the relative
frequency of each word in each text after these frequencies have been transformed
into ranks. Put simply, it calculates the probability of obtaining, under the null hypothesis that the two corpora were drawn at random from identical populations of texts, a difference between the average ranks at least as
large as the one observed.
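The following Python sketch illustrates this principle on per-text relative frequencies. The pooled values are converted to ranks (tied values share their average rank), and the difference between the average ranks of the two corpora is compared with the differences obtained when the texts are randomly reassigned to the two corpora. It is a simplified Monte Carlo version written for illustration, not the coin implementation called by the appendix script; the function names, the seed and the default number of resamplings are assumptions.

```python
import random


def midranks(values):
    """Ranks 1..n; tied values receive the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def wmw_monte_carlo(x, y, niter=10_000, seed=1):
    """Two-sided Monte Carlo p-value for the difference between mean ranks.

    x, y: per-text relative frequencies of one word in corpus 1 and corpus 2."""
    n1 = len(x)
    ranks = midranks(list(x) + list(y))  # ranks of all texts, pooled

    def stat(r):
        return abs(sum(r[:n1]) / n1 - sum(r[n1:]) / (len(r) - n1))

    observed = stat(ranks)
    rng = random.Random(seed)
    hits = 0
    for _ in range(niter):
        rng.shuffle(ranks)  # random reassignment of the texts to the two corpora
        if stat(ranks) >= observed - 1e-12:  # small tolerance for rounding error
            hits += 1
    return hits / niter
```

Ranks are computed once on the pooled texts, since relabelling the texts does not change them. A returned value of 0 should be read as "below 1/niter".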
This proposal was strongly criticized by Rayson and colleagues
(Rayson, Berridge & Francis, 2004; Rayson & Garside, 2000) because,
by transforming the frequencies into ranks, this test discards some
important information available in the data. However, it is easy to
remedy this problem because there is a
WMW-equivalent test that can be applied to the non-ranked values: the
Fisher-Pitman (FP) test (Berry, Mielke & Mielke, 2002; Neuhauser &
Manly, 2004). It calculates the probability, under the same null hypothesis, of obtaining a difference between the mean frequencies in texts as
large as the difference actually observed. The only difference between
these two tests is therefore that one is calculated on the basis of ranked
data and the other on raw data.
[2] Since the texts in a corpus are rarely of exactly the same length, the analyses must be
carried out on the relative frequencies (number of occurrences divided by the length of
the text).
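The Fisher-Pitman test follows the same resampling logic on the raw per-text relative frequencies instead of their ranks. The minimal Monte Carlo sketch below is an illustration under stated assumptions, not the coin implementation used in the appendix script; the function name is hypothetical and the per-text profile is invented to mimic a word occurring 55 times in a single 2000-word text.

```python
import random


def fisher_pitman_monte_carlo(x, y, niter=10_000, seed=1):
    """Two-sided Monte Carlo p-value for the difference between the mean
    per-text relative frequencies of two corpora (raw values, no ranks)."""
    n1 = len(x)
    pooled = list(x) + list(y)

    def stat(v):
        return abs(sum(v[:n1]) / n1 - sum(v[n1:]) / (len(v) - n1))

    observed = stat(pooled)
    rng = random.Random(seed)
    hits = 0
    for _ in range(niter):
        rng.shuffle(pooled)  # random reassignment of the texts to the two corpora
        if stat(pooled) >= observed - 1e-12:  # small tolerance for rounding error
            hits += 1
    return hits / niter


# Hypothetical thalidomide-like profile: 500 texts per corpus; the word occurs
# 55 times in a single 2000-word text of corpus 1 and nowhere else.
corpus1 = [55 / 2000] + [0.0] * 499
corpus2 = [0.0] * 500
print(fisher_pitman_monte_carlo(corpus1, corpus2, niter=1000))  # 1.0
```

Every random reassignment reproduces the observed difference, because that difference only depends on which corpus the single text falls into; the p-value is therefore 1 and the word is not declared typical. Feeding the ranks of the pooled values instead of the raw values would turn this sketch into the WMW variant.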
These two tests have some properties which are important to know
in order to use them adequately. First, because they free us from the
normality assumption, they test a more general null hypothesis than
that tested by the Student’s t-test. They also detect differences in the
variability and even in the shape of the distributions. However, they are
particularly sensitive to differences in means or medians (Howell, 2008;
Hesterberg, Moore, Monaghan, Clipson & Epstein, 2006).
Second, the p-value that they provide when analysing large samples, as is almost always the case in corpus linguistics, is obtained using a Monte Carlo resampling procedure[3]. This type of test is gaining
more and more attention in statistics (Good, 2005) as well as in corpus linguistics (Gries, 2006). However, its weakness is that the degree
of precision of the probability depends on the number of resamplings
performed and it is therefore time-consuming to obtain probability estimates for many words. This limitation is especially important when
estimating extremely small probabilities, since they cannot be smaller
than one divided by the number of resamplings done.
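The resolution problem can be made concrete with two small formulas: with niter resamplings, an estimated p-value moves in steps of 1/niter, and, under the usual binomial approximation, its standard error is sqrt(p(1 - p)/niter). A short Python sketch (the function names are illustrative):

```python
import math


def smallest_nonzero_pvalue(niter):
    """With niter resamplings, estimated p-values move in steps of 1/niter."""
    return 1 / niter


def monte_carlo_standard_error(p, niter):
    """Binomial standard error of a Monte Carlo p-value estimate."""
    return math.sqrt(p * (1 - p) / niter)


for niter in (1_000, 10_000, 1_000_000):
    print(niter, smallest_nonzero_pvalue(niter))

# Relative precision of an estimate of p = 0.001 obtained with 10,000 resamplings
p, niter = 0.001, 10_000
print(monte_carlo_standard_error(p, niter) / p)  # about 0.32, i.e. roughly +/-32%
```

Resolving a threshold of 0.000001 thus requires at least one million resamplings per word, which explains the computational cost.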
Finally, replacing the relative frequencies by ranks in the case of
the WMW has the consequence that the corpus containing fewer
occurrences of a word may be the one whose texts have the higher
average rank. This will be the case, for instance, if in one of the two corpora the word occurs many times but in a single text,
while the other corpus contains a sufficiently large number of
texts each containing a small number of occurrences of it. The first corpus
will have the higher frequency, but the second will have the higher
average rank. This difference between the two tests is not a defect: it
points out words showing an atypical profile.
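A toy illustration of this reversal, with hypothetical per-text relative frequencies for five texts per corpus. Corpus A has the higher mean relative frequency, but corpus B has the higher average rank, because the texts of corpus A that lack the word share low tied ranks. The midranks helper is an illustrative assumption that gives tied values the average of their positions.

```python
def midranks(values):
    """Ranks 1..n; tied values receive the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def mean(v):
    return sum(v) / len(v)


# Hypothetical per-text relative frequencies of one word
corpus_a = [0.05, 0.0, 0.0, 0.0, 0.0]  # a single text with many occurrences
corpus_b = [0.002] * 5                 # every text with a few occurrences

ranks = midranks(corpus_a + corpus_b)
mean_rank_a = mean(ranks[:5])
mean_rank_b = mean(ranks[5:])
print(mean_rank_a, mean_rank_b)  # 4.0 7.0

# Flag the atypical profile: frequency and average rank point to different corpora
atypical = (mean(corpus_a) > mean(corpus_b)) != (mean_rank_a > mean_rank_b)
print(atypical)  # True
```

The flag computed at the end plays the same role as the asterisk printed by the R script described in section 5.3.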
4. Empirical evaluation of the different tests
So far, studies which have stressed the inadequacy of the Chi2 and LL
tests for analysing lexical differences between corpora have based their
arguments on the fact that these tests declare too many words to be
significant even when extremely strict probability thresholds are used
(Bestgen, 2014; Brezina & Meyerhoff, 2014; Kilgarriff, 2005; Lijffijt
et alii, 2016). Such demonstrations have obviously not been sufficient,
since these tests continue to be used in corpus linguistics and they are
still the only statistical tests available in WST and AntConc. We
therefore propose another demonstration of the problem. We will evaluate the effectiveness of a statistical test on the basis of what it is really used for, that
is, the conclusion derived from a significant difference. If a test claims
that a given word is more frequent in one variety of English than in
another because it finds a significant difference between the frequency
of this word in two corpora, then that difference is expected to be
observed again when two other corpora that differ on the same dimension
are analysed. One can immediately see the problematic consequences
of a test that performs poorly according to this criterion:
nobody can trust the conclusions to which it leads. This evaluation procedure is used in the analyses reported below, which were conducted on
the distinction between American and British English. The statistically
significant differences were determined on the basis of two corpora of
one million words each, and the verification on the basis of two very
large corpora, frequently used as reference corpora for the varieties in
question.

[3] Lijffijt et alii (2016) proposed an ad hoc resampling procedure of the bootstrap type
that differs from the usual practices in statistics, since the resampling is done in a manner that is not consistent with the null hypothesis (Hesterberg et alii, 2010) and since,
when the two samples are unequal in size, the smaller sample size is used in the resampling procedure (see Efron and Tibshirani [1993, Chap. 16] for a significance test
based on bootstrap).
4.1. Materials
4.1.1. Corpora for finding the significant differences
We made use of the FLOB (Freiburg LOB Corpus of British English)
and the FROWN corpus (Freiburg Brown Corpus of American English), both compiled at the University of Freiburg to be as similar as
possible except, of course, in terms of the variety of English. Each corpus contains a million words, corresponding to 500 extracts from texts[4]
published in the early 1990s. Each extract contains approximately 2000 words,
and the extracts cover 15 genres of written texts, such as press texts, scientific writing, romantic fiction and science fiction. They are available
on the ICAME CD-ROM (Hofland, Lindebjerg & Thunestvedt, 1999).
[4] The resampling tests do not require both corpora to contain the same number of texts.
4.1.2. Corpora for evaluating the test decisions
Two large reference corpora for these varieties of English were used:
• The British National Corpus (BNC), a 100-million-word collection of samples of written and spoken language designed to represent a wide cross-section of British English from the late 20th
century.
• The Corpus of Contemporary American English (COCA) is a
very large and balanced corpus of American English. The version
we used contains more than 425 million words of text (20 million words for each year between 1990 and 2011) and is equally
divided between speech, fiction, popular magazines, newspapers
and academic texts.
In the following analyses, a word is considered typical of an English
variety according to the reference corpora when its relative frequency is
higher in the reference corpus of that variety (BNC for British English,
COCA for American English) than in the other one.
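This validation criterion amounts to comparing two relative frequencies. A minimal Python sketch, in which the default corpus sizes are the approximate figures given above (100 million words for the BNC, 425 million for COCA) and the counts in the example are hypothetical; the function name is illustrative.

```python
def validated(variety, bnc_count, coca_count, bnc_size=100_000_000, coca_size=425_000_000):
    """True if the reference corpora confirm the variety predicted by a test.

    variety: "BrE" if the word was found typical of British English, "AmE" otherwise.
    Corpus sizes are approximate figures (assumption)."""
    bnc_per_million = bnc_count / bnc_size * 1_000_000
    coca_per_million = coca_count / coca_size * 1_000_000
    if variety == "BrE":
        return bnc_per_million > coca_per_million
    return coca_per_million > bnc_per_million


# Hypothetical counts: 20 per million in the BNC vs about 9.4 per million in COCA
print(validated("BrE", 2_000, 4_000))  # True
```

Normalising per million words is only for readability; any common base gives the same decision.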
4.2. Procedure
A series of pre-treatments had to be applied to the texts, such as word
segmentation and special character removal. The same pre-processing
steps were carried out on the analysed corpora (FLOB and FROWN)
and on the reference corpora (BNC and COCA).
The Chi2, LL, WMW and FP tests were applied to all words with a
total frequency of at least 10 in the two corpora, so as to analyse only
words with a sufficient expected frequency (a requirement for using the
Chi2 test). To estimate the p-values for the two resampling tests, one
million permutations were made. The probability threshold for deciding
that a word is significantly more frequent in one of the two compared
corpora was set at 0.000001, which is the default value in WST.
The analyses were carried out twice: the first time without taking
into account the dispersion criterion and the second time only considering words occurring in at least 5% of the texts of the corpus in which
they have the higher relative frequency. This dispersion criterion, the range, is the one used in WST, and it is set to its default value
in this software. This threshold of 5% corresponds to 25 texts in these
corpora and therefore implies a minimum frequency of 25 occurrences
of the word in the corpus. An advantage of the range over many other
measures of dispersion is that it is easily interpretable. It is important
to compare the performance of the tests with and without a dispersion
threshold because few studies use such thresholds, whereas Oakes and Farrow
(2007) have shown that they are useful for filtering uninteresting words
when using the Chi2 test.
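The range criterion can be sketched as follows in Python. The function name and the input layout (per-text counts and per-text lengths for each corpus) are assumptions. The choice of corpus follows the appendix script, which keeps the corpus in which the word is relatively more frequent; its test "observed >= expected" on the word cell of corpus 1 is equivalent to comparing the two relative frequencies.

```python
def range_check(counts1, sizes1, counts2, sizes2, minrange=0.05):
    """Range criterion (sketch).

    counts*/sizes*: per-text occurrences of the word and per-text lengths.
    The range is the number of texts containing the word in the corpus where its
    relative frequency is higher, compared with minrange as a proportion of texts."""
    rel1 = sum(counts1) / sum(sizes1)
    rel2 = sum(counts2) / sum(sizes2)
    counts = counts1 if rel1 >= rel2 else counts2  # ties go to corpus 1, as in the script
    n_texts_with_word = sum(1 for c in counts if c > 0)
    return n_texts_with_word, n_texts_with_word / len(counts) >= minrange


# 500 texts of 2000 words per corpus; the word occurs once in 25 texts of corpus 1
sizes = [2000] * 500
print(range_check([1] * 25 + [0] * 475, sizes, [0] * 500, sizes))  # (25, True)
```

With 500 texts, 25 texts is exactly the 5% threshold, and since each of these texts contains at least one occurrence, the word must occur at least 25 times.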
4.3. Results
Table 2 summarizes the main results of the analyses. For each statistical test, with and without the dispersion criterion, it gives the number of words considered significant at the threshold
of 0.000001, the proportion of these words validated
by the reference corpora and the number of words not validated. As can
be seen, many more words are selected by the Chi2 and LL tests than
by the two adequate tests, confirming the criticism raised by Kilgarriff
(2005). Without a control on dispersion, a non-negligible percentage
of these words (about 17% for the Chi2 test and 19% for the LL test) is not validated by the reference corpora. When the dispersion threshold is taken into account, 8% of the words selected by
the two inappropriate statistical tests are still rejected. For both appropriate
statistical tests, the results are very different. These tests clearly select
fewer words, but all of them are validated when dispersion is taken into
account, and only one word is not validated when this criterion is not
considered.
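Test   Without range                  With range
       Nbr. Sig   %OK      Nbr. KO    Nbr. Sig   %OK      Nbr. KO
CH     577        83.36    96         280        92.14    22
G      805        81.24    151        288        92.01    23
WMW    122        99.18    1          113        100.00   0
FP     104        99.04    1          99         100.00   0
Table 2: Results for the four statistical tests. CH = Chi2 test; G = LL test; Nbr. Sig = number of words declared significant at the 0.000001 threshold; %OK = percentage of these words validated by the reference corpora; Nbr. KO = number of words not validated.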
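The %OK column can be recomputed from the other two: it corresponds to (Nbr. Sig - Nbr. KO) / Nbr. Sig x 100. The short Python check below, using the counts of Table 2, reproduces every %OK value to two decimals.

```python
# Counts from Table 2: (Nbr. Sig, Nbr. KO) without and with the range criterion
table2 = {
    "CH": ((577, 96), (280, 22)),
    "G": ((805, 151), (288, 23)),
    "WMW": ((122, 1), (113, 0)),
    "FP": ((104, 1), (99, 0)),
}


def percent_ok(n_sig, n_ko):
    """Percentage of significant words validated by the reference corpora."""
    return 100 * (n_sig - n_ko) / n_sig


for test, (without_range, with_range) in table2.items():
    print(f"{test:4s} {percent_ok(*without_range):6.2f} {percent_ok(*with_range):6.2f}")
```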
From a qualitative point of view, the words selected by the Chi2 and
the LL tests that were not validated by the corpus of reference when applying
the dispersion criterion are as follows (ordered according to their
keyness score): t, i, japan, have, st, m, ai, children, opera, last, male,
stress, performance, poll, has, relations, okay, legal, mental, d, yeah
and prison. The LL test adds the word patient to this list. This list includes
the word male, which has been used as an example in Table 1 and which
is considered by the inadequate tests as being typical of American English. It is interesting to compare this list with the 25 words validated by
the corpus of reference that have the highest keyness scores: percent, which,
cent, labour, toward, program, clinton, bush, president, programs, towards, american, uk, per, states, london, labor, british, was, defense,
centre, center, britain, united and washington. This list includes the
other example in Table 1 (towards). There is no doubt that the words
on the second list are much more easily interpretable, in the sense
that it is easy to guess the variety of English in which they occur most
frequently, whereas it is much more difficult for the first list. The term
selected by the WMW and FP tests that is not validated by the corpus of
reference is DC, which is more frequent in the FROWN corpus than the
FLOB corpus, but less frequent in the COCA than in the BNC, where
it appears not only as expected after Washington, but also as the abbreviation of direct current and in an extract of The Dickens Index book.
The objective of this analysis is to illustrate the problems posed by
the classical Chi2 and LL tests and to show that the proposed tests do
not encounter these difficulties. It is not possible here to analyse the two
inappropriate tests in detail in order to determine whether they could be made more efficient by using more extreme probability
thresholds or other dispersion measures. Such analyses would
require varying the size of the corpora to determine whether or
not an efficient solution for comparing two one-million-word corpora
is also appropriate for smaller and larger corpora, or for corpora of different sizes.
5. How can the adequate test be obtained?
The previous section very concretely shows the problems caused by
using inadequate statistical tests when analysing lexical differences between corpora. However, to persuade researchers to adopt the adequate
tests, it is necessary to simplify their use as much as possible. This
section presents instructions for both WST and R, which make it easy
to use these tests.
5.1. Getting the necessary data with WST
The first step is to create a wordlist for each corpus containing the frequencies of all of the words in all of the texts by supplying one file per text
to WST (after, if necessary, using the Split function in File Utilities). In
the Wordlist function, use the Make a batch now option with One file
with all individual results in it in zip format. Then, use the Detailed
consistency function, where you select the zip file containing all Wordlists (one per file). Finally, save the results displayed on the screen in
a .txt file with tab as column separator and uncheck the Separate thousands box. These steps must be performed separately for each corpus.
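Once the two files are available, the appendix script merges the two word lists on the word, keeps every word, replaces missing counts by 0, and computes the length of each text as the sum of its counts. A Python sketch of that step, assuming each word list has already been parsed into a dictionary mapping each word to its list of per-text counts (parsing the WST file itself is not shown, and the function name build_matrix is hypothetical):

```python
def build_matrix(list1, n1, list2, n2):
    """Merge two word lists into a text x word count matrix.

    list1, list2: dict word -> list of per-text counts (n1 and n2 texts).
    Returns the vocabulary, the corpus id of each text, text lengths and the rows."""
    vocab = sorted(set(list1) | set(list2))  # keep all words of both corpora
    rows = []
    for counts_by_word, n_texts in ((list1, n1), (list2, n2)):
        for t in range(n_texts):
            # a word absent from a corpus gets a count of 0 in all its texts
            rows.append([counts_by_word[w][t] if w in counts_by_word else 0 for w in vocab])
    corpus = [1] * n1 + [2] * n2
    lengths = [sum(row) for row in rows]  # number of words in each text
    return vocab, corpus, lengths, rows


vocab, corpus, lengths, rows = build_matrix(
    {"colour": [2, 0], "the": [10, 8]}, 2,
    {"color": [1, 3, 0], "the": [9, 7, 6]}, 3,
)
print(vocab, lengths)
```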
5.2. The R script for computing the statistical tests
The R (R Core Team, 2013) script provided in the appendix requires a
complementary package, called coin (Hothorn, Hornik, van de Wiel &
Zeileis, 2008), which performs the resampling tests. If it is not already
installed, the script tries to install it. To use this script, just copy the whole
code (the CorpLexTests function), paste it into the R console window and press Enter. Then, adapt the command line
provided below to the files to be analysed and the parameter values to
be used.
CorpLexTests(file1="E:/FLOBList.txt", file2="E:/FROWNList.txt",
minfreq = 10, minrange = 0.05, maxpll = 0.0001, niter1 = 10000, maxpperm =
0.0003, niter2 = 1000000)
The parameters are as follows:
• file1 and file2 provide the paths and filenames for the two files
obtained from WST (on Windows, "E:\\FLOBList.txt" works as
well).
• minfreq indicates the minimum total frequency of the word in
both corpora for the analysis to be conducted. The default value
is 0 and corresponds to no threshold. However, it seems meaningless to try to determine whether a very rare word is more frequent in one corpus than in another. Moreover, in addition to the
problem of non-independence described above, it is known that,
in order to be valid, the Chi-square test imposes a condition on
the expected frequencies in the contingency table cells (usually
at least five). It should be noted that this condition does not apply
to the permutation tests, but that a rare word is unlikely to be significant enough to merit a thorough linguistic analysis.
• minrange gives the minimum threshold of the number of texts
in which the analysed word should occur. This value is given in
proportion to the number of texts in the corpus containing the
most occurrences (in terms of relative frequency) of this word.
The default value, taken from WST, is 0.05.
• maxpll indicates the maximum p-value from the LL test for the
analysis to be conducted. It must be between 0 and 1, 0 allowing
only a very small number of tests and 1 allowing all tests. This
option makes it possible to reduce the duration of the analysis
by only performing the resampling tests on words which would
have been declared significant by the usual (but problematic) LL
test. The default value is set to 0.000001, as in WST.
• niter1 indicates the number of resamplings to be performed for
any word that successfully passes the three conditions (minfreq,
minrange and maxpll). It is desirable not to go below 1000. The
chosen value will determine the smallest probability that can be
given to a word by the resampling tests. For example, 1000 corresponds to a probability of 0.001. The greater the number of
resamplings requested, the more time the analyses will take. For
this reason, it is possible to request additional resamplings for the
most significant words using the last two parameters. This option is activated by the parameter maxpperm, which gives the maximum p-value for performing a series of complementary resamplings. It is applied independently to each of the two resampling
tests. Thus, for each of these tests, if the p-value resulting from
the first niter1 resamplings is less than or equal to this parameter,
a total of niter2 resamplings is performed. niter2 must necessarily be greater than niter1 (since it includes these iterations). The default value of niter2, 1000000, yields probabilities as small
as 0.000001, the default threshold for WST. The default value of
maxpperm is 0, so this option is not used by default.
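The interplay of niter1, maxpperm (the name used in the appendix code) and niter2 can be sketched in Python as below. This illustrates the logic described in the list above; it is not the appendix code. The function names are hypothetical, the statistic is the Fisher-Pitman difference in means, and reusing the first niter1 resamplings in the niter2 total is one reading of "since it includes these iterations".

```python
import random


def count_hits(x, y, niter, rng):
    """Number of random reassignments whose |difference in means| reaches the observed one."""
    n1 = len(x)
    pooled = list(x) + list(y)

    def stat(v):
        return abs(sum(v[:n1]) / n1 - sum(v[n1:]) / (len(v) - n1))

    observed = stat(pooled)
    hits = 0
    for _ in range(niter):
        rng.shuffle(pooled)
        if stat(pooled) >= observed - 1e-12:  # small tolerance for rounding error
            hits += 1
    return hits


def two_stage_pvalue(x, y, niter1=10_000, maxpperm=0.0, niter2=1_000_000, seed=1):
    """Run niter1 resamplings; if the p-value is <= maxpperm (and maxpperm > 0),
    continue up to a total of niter2 resamplings."""
    if maxpperm >= 1 / niter1 and niter2 <= niter1:  # same check as the appendix script
        raise ValueError("niter2 must be greater than niter1")
    rng = random.Random(seed)
    hits = count_hits(x, y, niter1, rng)
    if maxpperm > 0 and niter2 > niter1 and hits / niter1 <= maxpperm:
        hits += count_hits(x, y, niter2 - niter1, rng)  # niter2 includes the first niter1
        return hits / niter2
    return hits / niter1
```

With the default maxpperm of 0, the second stage is never run, which matches the behaviour described above.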
The only required parameters are the two file paths, since all other
parameters have acceptable default values. This script works both on
Windows and Mac OS X (whereas WST does not run on Mac OS X).
5.3. R script output
The results are displayed on the R console and saved in a file named
CorpLexTestsRes.txt in the folder where corpus 1 is located. The first
four lines give general information about the analyses performed. The
first two lines show the file path and name of each corpus, as well as
the number of texts and the number of words they contain. These lines
thus serve as a reminder of which corpus corresponds to corpus 1 in
the results. The third line contains the values of the parameters used in
the analysis. The fourth line gives the names of the variables provided
in the results.
Figure 1: Output of the R script for the FLOB vs. FROWN comparison (excerpt)
The printed results are as follows:
• The analysed word.
• FreqC1 gives the frequency of the word in corpus 1 and FreqC2
its frequency in corpus 2. The relative frequencies can be calculated using the total number of words in each corpus (NbrWord), given in the
first two lines.
• Chi2 gives the Chi-square statistic and Chi2_p the corresponding
p-value. LL and LL_p do the same for the LL test.
• Range gives the number of texts containing this word in the corpus in which the word is the most frequent (in terms of relative
frequency).
• WMW_p gives the obtained p-value from the WMW test and
FP_p does the same for the FP test.
As can be seen in the above extract of a comparative analysis of the
FLOB and FROWN corpora, only those words which pass the minfreq,
the minrange and the maxpll thresholds are printed. The word BABY
is preceded by an asterisk because the corpus that contains the most
occurrences of this word is not the one whose texts have the higher
average rank. In this case, caution should be taken when interpreting
the results, as explained in section 3. However, it is unlikely that this
kind of result will be observed for words that are very significant for
both the WMW and the FP tests.
6. Conclusion
This paper deals with statistical tests used in corpus linguistics for analysing lexical differences between corpora. Its most important contributions are the following:
• To explain in detail why the Chi2 and LL tests are inadequate
in this research field. It is important to emphasize again that the
problem raised applies as much to the statistics resulting from
the tests as it does to the probabilities derived from them,
and therefore also affects the keyness score or any other effect
size measure such as that proposed by Gabrielatos and Marchi
(2011). The problem raised affects any use of these tests to analyse corpora regardless of the linguistic unit counted: words,
but also lexical bundles, collocations, syntactic constituents, and so on. It
follows that, for instance, using these tests to analyse the use of
the passive voice in different corpora is also inappropriate.
• To concretely demonstrate the seriousness of the erroneous conclusions reached when they are used.
• To propose two tests that are adequate and effective.
• To provide a concrete solution, which we hope is easy to put into
practice, to use the appropriate tests.
An important question that has so far gone unanswered is which of
the two appropriate tests is preferable. The main difference between
them is that the WMW test, based as it is on ranks, is more sensitive
than the FP test to small differences in frequency within a relatively
large number of texts, whereas the FP test is more sensitive to the presence of a relatively small number of texts containing relatively high
frequencies. Ideally, both tests should be significant. If only one of them
is clearly not significant at the chosen probability threshold, it is necessary to be very careful in interpreting the results and, in any case,
to analyse the distribution of this word in the texts of the two corpora
using WST.
7. Bibliography
Anthony, Laurence. 2012. AntConc Version 3.3.5. [Computer Software]. Tokyo: Waseda University. http://www.antlab.sci.waseda.ac.jp/.
Baker, Paul. 2004. Querying keywords: questions of difference, frequency and
sense in keywords analysis. Journal of English Linguistics 32(4): 346-359.
Berry, Kenneth J.; Mielke, Paul W. & Mielke, Howard W. 2002. The Fisher-Pitman permutation test: an attractive alternative to the F test. Psychological Reports 90: 495-502.
Bestgen, Yves. 2012. Analyse des différences lexicales entre des corpus : test
ou distance du Khi-2? In Actes de JADT 2012 : 11es Journées internationales d'Analyse statistique des Données Textuelles, 150-161.
Bestgen, Yves. 2014. Inadequacy of the chi-squared test to examine vocabulary differences between corpora. Literary and Linguistic Computing
29: 164-170.
Biber, Doug & Jones, James. 2009. Quantitative methods in corpus linguistics.
In Lüdeling, Anke & Kytö, Merja (ed.) Corpus Linguistics. An International Handbook. Berlin: Mouton de Gruyter, 1286-1304.
Brezina, Vaclav & Meyerhoff, Miriam. 2014. A critical review of sociolinguistic generalisations based on large corpora. International Journal of
Corpus Linguistics 19(1): 1-28.
Church, Kenneth W. 2000. Empirical estimates of adaptation: The chance of
two Noriegas is closer to p/2 than p². In Proceedings of the 17th International Conference on Computational Linguistics, 180-186.
Efron, Brad & Tibshirani, Rob. 1993. An introduction to the bootstrap. New
York: Chapman & Hall.
Gablasova, Dana; Brezina, Vaclav & McEnery, Tony. 2017. Exploring learner
language through corpora: Comparing and interpreting corpus frequency information. Language Learning (advance access). doi: 10.1111/lang.12226.
Gabrielatos, Costas & Marchi, Anna. 2011. Keyness: Matching metrics to definitions. Paper presented at the Corpus Linguistics in the South: Theoretical-methodological challenges in corpus approaches to discourse
studies - and some ways of addressing them. Portsmouth: 5th November 2011.
Good, Phillip I. 2005. Permutation, parametric and bootstrap tests of hypotheses (Third Edition). New York: Springer.
Gries, Stefan Th. 2005. Null hypothesis significance testing of word frequencies: a follow-up on Kilgarriff. Corpus Linguistics and Linguistic Theory 1: 277-294.
Gries, Stefan Th. 2006. Exploring variability within and between corpora:
some methodological considerations. Corpora 1(2): 109-151.
Hesterberg, Tim; Moore, David S.; Monaghan, Shaun; Clipson, Ashley & Epstein, Rachel. 2006. Bootstrap methods and permutation tests. Supplemental chapter for Moore, David S. & McCabe, George P. Introduction
to the Practice of Statistics. New York: W H Freeman.
Hofland, Knut & Johansson, Stig. 1982. Word frequencies in British and American English. Bergen: The Norwegian Computing Centre for the Humanities.
Hofland, Knut; Lindebjerg, Anne & Thunestvedt, Jorn. 1999. ICAME collection of English language corpora. [CD-ROM]. Bergen: The HIT Centre, University of Bergen.
Hothorn, Torsten; Hornik, Kurt; van de Wiel, Mark A. & Zeileis, Achim. 2008.
Implementing a class of permutation tests: the coin package. Journal of
Statistical Software 28(8): 1-23.
Howell, David C. 2008. Méthodes statistiques en sciences humaines. Bruxelles: De Boeck Université.
Kilgarriff, Adam. 1996. Comparing word frequencies across corpora: Why
Chi-square doesn’t work, and an improved LOB-Brown comparison. In
Proceedings of ALLC-ACH Conference, 169-172.
Kilgarriff, Adam. 2005. Language is never, ever, ever random. Corpus Linguistics and Linguistic Theory 1: 263-275.
Kurtz, Albert K. & Mayo, Samuel T. 1979. Statistical methods in psychology
and education. New York: Springer.
Laviosa, Sara; Pagano, Adriana; Kemppanen, Hannu & Ji, Meng. 2017. Textual and contextual analysis in empirical translation studies. Singapore:
Springer.
Lee, David Y. W. & Chen, Sylvia Xiao. 2009. Making a bigger deal of the
smaller words: Function words and other key items in research writing
by Chinese learners. Journal of Second Language Writing 18: 149-165.
Leedham, Maria. 2012. Review of: “New trends in corpora and language
learning” and “Keyness in texts”. System 40(1): 162-165.
Lewis, Don & Burke, C. J. 1949. The use and misuse of the chi-square test.
Psychological Bulletin 46(6): 433-489.
Lijffijt, Jefrey; Nevalainen, Terttu; Säily, Tanja; Papapetrou, Panagiotis; Puolamäki, Kai & Mannila, Heikki. 2016. Significance testing of word
frequencies in corpora. Literary and Linguistic Computing 31(2): 374-397.
Lubbers Quesada, Margaret & Blackwell, Sarah E. 2009. The L2 acquisition
of null and overt Spanish subject pronouns: A pragmatic approach. In
Collentine, Joseph (ed.) Selected Proceedings of the 11th Hispanic Linguistics Symposium. Somerville, MA: Cascadilla Proceedings Project,
117-130.
Marquilhas, Rita. 2015. Non-anachronism in the historical sociolinguistic
study of Portuguese. Journal of Historical Sociolinguistics 1(2): 213-242.
Neuhauser, Markus & Manly, Bryan F. J. 2004. The Fisher-Pitman permutation test when testing for differences in mean and variance. Psychological Reports 94: 189-194.
Oakes, Michael & Farrow, Malcolm. 2007. Use of the chi-squared test to examine vocabulary differences in English language corpora representing
seven different countries. Literary and Linguistic Computing 22: 85-99.
Paquot, Magali & Bestgen, Yves. 2009. Distinctive words in academic writing:
A comparison of three statistical tests for keyword extraction. In Jucker,
Andreas H.; Schreier, Daniel & Hundt, Marianne (ed.) Corpora: Pragmatics and Discourse. Amsterdam: Rodopi, 247-269.
R Core Team. 2013. R: A language and environment for statistical computing.
Vienna: R Foundation for Statistical Computing. http://www.R-project.org/.
Rayson, Paul; Leech, Geoffrey & Hodges, Mary. 1997. Social differentiation
in the use of English vocabulary: Some analyses of the conversational component of the British National Corpus. International Journal of
Corpus Linguistics 2: 133-152.
Rayson, Paul; Berridge, Damon & Francis, Brian. 2004. Extending the Cochran
rule for the comparison of word frequencies between corpora. In Proceedings of the 7th International Conference on Statistical analysis of
textual data, 926-936.
Rayson, Paul & Garside, Roger. 2000. Comparing corpora using frequency
profiling. In Kilgarriff, Adam & Sardinha, Tony B. (ed.) Proceedings of
the Comparing Corpora Workshop, 1-6.
Sampson, Geoffrey. 2003. Statistical linguistics. In Frawley, William J. (ed.) International Encyclopedia of Linguistics (2nd ed.). New York: Oxford University Press.
Scott, Mike. 1997. PC analysis of key words - and key key words. System
25(2): 233-245.
Siyanova-Chanturia, Anna. 2015. Collocation in beginner learner writing: A
longitudinal study. System 53: 148-160.
Tribble, Chris. 2000. Genres, keywords, teaching: towards a pedagogic account of the language of project proposals. In Burnard, Lou & McEnery, Tony (ed.) Rethinking language pedagogy from a corpus perspective: papers from the third international conference on teaching and
language corpora. Bern: Peter Lang, 75-90.
Wallis, Sean. 2013. z-squared: The origin and application of Chi-square. Journal of Quantitative Linguistics 20(4): 350-378.
Appendix: The R script for computing the statistical tests
CorpLexTests <- function(file1="no-file",file2="no-file",
minfreq=0,minrange=0.05,maxpll=0.000001,niter1=10000,
maxpperm=0,niter2=1000000) {
#parameter checks
if (maxpll<0 | maxpll>1) {cat(sprintf("\nParameter error : maxpll= %f not between 0 and 1\n",maxpll)); stop("Please change this parameter value")}
if (maxpperm<0 | maxpperm>1) {cat(sprintf("\nParameter error : maxpperm= %f not between 0 and 1\n",maxpperm)); stop("Please change this parameter value")}
if (minrange<0 | minrange>1) {cat(sprintf("\nParameter error : minrange= %f not between 0 and 1\n",minrange)); stop("Please change this parameter value")}
if (minfreq<0) {cat(sprintf("\nParameter error : minfreq= %d must be >= 0\n",minfreq)); stop("Please change this parameter value")}
#the values are printed in the order of the message (niter2 first, then niter1)
if (maxpperm>=1/niter1 & niter2<=niter1) {cat(sprintf("\nParameter error : niter2= %d must be > niter1= %d\n",niter2,niter1)); stop("Please change this parameter value")}
cat("Loading coin package\n")
if(!require(coin)){ #Try to install the coin package if not already installed
install.packages("coin")
}
library("coin")
cat("Reading first file\n")
d1=read.table(file1, header = FALSE,skip=1,comment.char=".",row.names = 1,fileEncoding="UTF-16LE",sep = "\t",dec = ",", quote="\"")
cat("Reading second file\n")
d2=read.table(file2, header = FALSE,skip=1,comment.char=".",row.names = 1,fileEncoding="UTF-16LE",sep = "\t",dec = ",", quote="\"")
#fnout = paste(dirname(file1),"Res.txt",sep="/") if we are sure of the separator...
#output file in the folder of corpus 1
fnout = paste(substr(file1,1,nchar(file1)-nchar(basename(file1))),"CorpLexTestsRes.txt",sep="")
cat("Preparing data for processing\n")
#Delete some columns
d1=d1[,-(2:5)]
d2=d2[,-(2:5)]
#nbr of texts
ncol1 <- ncol(d1)-1 #the first column is the word
ncol2 <- ncol(d2)-1
#merge by words, keeping all of them
da=merge(d1,d2,by.x=1,by.y=1,all=TRUE)
#transpose the data, but not the word
tda=t(da[,-1])
#remplace NA by 0
tda[is.na(tda)] <- 0
#Corpus id
corpus=c(rep(1, ncol1), rep(2, ncol2))
#number of mots in each text
rs=rowSums(tda)
#add these variables to the data
mydata=cbind(corpus,rs,tda)
lastword <- ncol(mydata) #last column number
cat(“Start computing the statistical tests\n”)
sink(fnout, append=FALSE, split=TRUE) #to print the output in a ile
#print irst lines
cat(sprintf(“# Corpus 1: File=%s NbrText=%d NbrWord=%d\n# Corpus 2:
File=%s NbrText=%d NbrWord=%d\n”,
ile1,ncol1,sum(mydata[mydata[,’corpus’] ==
1,2]),ile2,ncol2,sum(mydata[mydata[,’corpus’] == 2,2])))
cat(sprintf(“# minfreq=%d maxpll=%f niter1=%d maxpperm=%f niter2=%d\
n”,minfreq,maxpll,niter1,maxpperm,niter2))
cat(sprintf(“%-25s %10s %10s %10s %13s %10s %13s %10s %10s
%10s\n”,”Word”, “FreqC1”,”FreqC2”,”Chi2”,”Chi2_p”,”LL”,”LL_p”
,”Range”,”WMW_p”,”FP_p”))
#Loop on the words
for (myidx in 3:lastword) {
if (sum(mydata[,myidx])>=minfreq) {
#compute the number of other words in the texts
otherwords <- mydata[,2]-mydata[,myidx]
#Compute Chi2 and LL
ori <- as.table(rbind(tapply(mydata[,myidx], list(mydata[,’corpus’]),
FUN=sum), tapply(otherwords, list(mydata[,’corpus’]), FUN=sum)))
oriXsq <- chisq.test(ori,correct=FALSE)
LL<2*sum(oriXsq$observed[oriXsq$observed>0]*log(oriXsq$observed
[oriXsq$observed>0]/oriXsq$expected[oriXsq$observed>0]))
LL_pval=1-pchisq(LL,1)
#Compute range in the corpus in which this word is the most frequent (in
relative frequency)
Getting rid of the Chi-square and Log-likelihood tests...
55
if (oriXsq$observed[1]>=oriXsq$expected[1]) {
whichcor=1
range<-sum(mydata[mydata[,’corpus’] == 1,myidx] > 0)
rm<-range/ncol1
}
else {
whichcor=2
range<-sum(mydata[mydata[,’corpus’] == 2,myidx] > 0)
rm<-range/ncol2
}
if (rm>=minrange & LL_pval<=maxpll) { #No output if range is
insuficient or the p-value for LL is to large
#Relative frequency (by overwriting the original data)
mydata[,myidx] <- mydata[,myidx]/mydata[,2]
wt<-wilcox_test(mydata[,myidx] ~ factor(mydata[,’corpus’]),distribution
= “asymptotic”)
pwta<-pvalue(wilcox_test(mydata[,myidx] ~ factor(mydata[,’corpus’]),dis
tribution = approximate(B = niter1-1)))
if ((pwta*(niter1-1)+1)/niter1<=maxpperm) {
pwta<-(1+pwta+pvalue(wilcox_test(mydata[,myidx] ~ factor(mydata[
,’corpus’]),distribution = approximate(B = niter2-niter1)))*(niter2niter1))/niter2
}
else pwta<-(pwta*(niter1-1)+1)/niter1
pfpa<-pvalue(oneway_test(mydata[,myidx] ~ factor(mydata[,’corpus’]),
distribution = approximate(B = niter1-1)))
if ((pfpa*(niter1-1)+1)/niter1<=maxpperm) {
pfpa<-(1+pfpa+pvalue(oneway_test(mydata[,myidx] ~ factor(mydata
[,’corpus’]),distribution = approximate(B = niter2-niter1)))*(niter2niter1))/niter2
}
else pfpa<-(pfpa*(niter1-1)+1)/niter1
if ((statistic(wt)>0 & whichcor==2) | (statistic(wt)<0 & whichcor==1))
{ #For discordances between ranks and frequencies
cat(sprintf(“*%-24s %10.0f %10.0f %10.2f %13.4e %10.2f %13.4e
%10.0f %10.8f %10.8f\n”,
da[myidx2,1],oriXsq$observed[1],oriXsq$observed[3],oriXsq$statistic,oriXsq
$p.value,LL,LL_pval,range,pwta,pfpa))
}
else { #For the normal case
56
Yves Bestgen
cat(sprintf(“%-25s %10.0f %10.0f %10.2f %13.4e %10.2f %13.4e
%10.0f %10.8f %10.8f\n”,
da[myidx-2,1],oriXsq$observed[1],oriXsq$observed[3],oriXsq$statistic,o
riXsq$p.value,LL,LL_pval,range,pwta,pfpa))
}
}
}#End of the range and the maxpll conditions
} #Loop end
sink() #End of output in a ile
}
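When the first Monte Carlo p-value is small enough, the script draws a second batch of resamples and combines it with the first. The short Python sketch below spells out that pooling arithmetic. It assumes that each coin p-value is the proportion of resamples at least as extreme as the observed statistic, and that the observed statistic counts as one extra resample. The function names single_batch_pvalue and pooled_pvalue are illustrative and not part of the script.

```python
def single_batch_pvalue(p1: float, niter1: int) -> float:
    """Monte Carlo p-value from one batch of niter1 - 1 resamples.

    p1 is the proportion of resamples at least as extreme as the observed
    statistic; the observed value itself counts as one more resample.
    """
    return (p1 * (niter1 - 1) + 1) / niter1


def pooled_pvalue(p1: float, niter1: int, p2: float, niter2: int) -> float:
    """Pool a first batch (niter1 - 1 resamples, proportion p1) with a
    second batch (niter2 - niter1 resamples, proportion p2).

    Total count: (niter1 - 1) + (niter2 - niter1) + 1 observed = niter2.
    """
    n_extreme = p1 * (niter1 - 1) + p2 * (niter2 - niter1)
    return (1 + n_extreme) / niter2


# With niter1=10000 and niter2=1000000 (the script's defaults), the smallest
# reachable p-value drops from 1/10000 to 1/1000000.
print(single_batch_pvalue(0.0, 10_000))
print(pooled_pvalue(0.0, 10_000, 0.0, 1_000_000))
```

The pooled value equals a single batch of niter2 - 1 resamples whenever both batches give the same proportion. This is why the first batch's proportion must be re-weighted by niter1 - 1 before it is added.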
Quaderns de Filologia: Estudis Lingüístics (ojs.uv.es/index.php/qfilologia/index)
“El criado pesado”: La caracterización en la serie Águila Roja
“The annoying servant”: Characterization in the TV series Águila Roja
Luisa Chierichetti
Università degli Studi di Bergamo.
[email protected]
Received: 14/05/2017. Accepted: 10/10/2017
Summary: Drawing on the most recent studies of telecinematic discourse, this article aims to contribute to research in linguistics applied to television series, which are among the most influential popular cultural products in contemporary society. The study corpus consists of the complete scripts of the successful series Águila Roja, broadcast by Radio Televisión Española between 2009 and 2016. Combining techniques from corpus linguistics and discourse analysis, this study examines the characterization of Sátur, one of the main characters of the fiction, through the co-construction of meaning carried out by the television audience. The results suggest that the character's discourse is marked by the use of the contemporary colloquial register and by incongruities and anachronisms, features that create humor and familiarity with viewers.
Key words: television series; telecinematic discourse; characterization; corpus linguistics; discourse analysis.
Abstract: Drawing on the most recent studies of telecinematic dialogue, this article contributes to linguistic research on television series, one of the most influential popular cultural products in contemporary society. The work is based on the complete scripts of the successful Spanish series Águila Roja, aired on Radio Televisión Española between 2009 and 2016. Combining techniques of corpus linguistics and discourse analysis, this study examines the characterization of Sátur, one of the main characters of this fiction, through the co-construction of meaning by the television audience. The results suggest that Sátur's discourse is characterized by the use of contemporary colloquial language and by incongruity; such features create humor and familiarity with the audience.
Keywords: television series; telecinematic discourse; characterization; corpus linguistics; discourse analysis.
Chierichetti, Luisa. 2017. “‘El criado pesado’: La caracterización en la serie Águila Roja”. Quaderns de Filologia: Estudis Lingüístics 22: 57-78. doi: 10.7203/qf.22.11301
1. Introduction
In this article we examine the language of television series, and especially how it is used to create the expressive identity of characters. Following Bednarek (2010, 2011a, 2011b, 2012a, 2012b, 2015a, 2015b) and Culpeper's work on characterization in dramatic texts (2001, 2009), we use the tools of corpus linguistics and discourse analysis to examine the original scripts of a highly successful Spanish series, Águila Roja, investigating the construction of one of its main characters, Sátur.
The linguistic characterization of characters in audiovisual fiction has attracted some interest, especially with reference to North American television series, which currently enjoy overwhelming success worldwide. Without claiming to be exhaustive, we may recall some notable studies, starting with Baker's volume (2005), which combines sociolinguistics and corpus linguistics in the study of the construction of gay identity and devotes a chapter to the gay characters of the series Will and Grace. Wodak (2009), studying the complex and changing relationship between politics and the media, examines the discursive construction of the hero in the series The West Wing. Bednarek, in her volume on the language of television fiction (2010) and in several later essays (2011a, 2011b, 2012a, 2012b) on the characterization of series characters, reflects on the features of small-screen fiction, linking multimodality, genre and audience with the creation of a “character's expressive identity”, using corpus analysis techniques together with ‘manual’ discourse analysis. Bubel (2006) devotes a doctoral thesis to the discursive construction of relationships between characters, focusing on friendship in Sex and the City. The linguistic characterization of series characters is investigated by Bubel & Spitz (2006), Gregori Signes (2007) and Richardson (2010), who focus on verbal humor and impoliteness, and by Mandala (2007, 2008, 2011), who concentrates on relevant discourse resources in English, such as the use of adjectives in -y, code-switching from English to Chinese, and linguistic politeness.
To our knowledge, research on the creation of characters in Spanish series has so far been confined to communication studies and sociology rather than linguistics, as in the work of Galán Fajardo (2007) and González de Garay (2009) on the construction of gender, and of López & Cuenca (2005), Galán Fajardo (2006), Igartua, Barrios & Ortega (2012) and Marcos & Igartua (2014) on social stereotypes. Águila Roja has attracted academic interest as an audience phenomenon (Barrientos Bueno, 2011, 2012) and for its transmedia development (Costa Sánchez & Piñeiro Otero, 2012; Guerrero, 2014).
Through an analysis of the discourse of one of the main and most popular characters of Águila Roja, the servant Sátur, we aim to show that the colloquial features, the occasional incongruity and the tendency toward anachronism in his dialogue are far from being the product of carelessness or clumsiness on the part of the series' writers. On the contrary, we argue that they result from a characterization aimed at creating a contemporary character close to the audience, a figure who escapes the fictional frame of the Spanish Golden Age to approach the world of the television audience. To this end, we first briefly outline the premise and the most salient features of the series and define the contextual framework of our analysis, telecinematic discourse. We then present the corpus and the methodology, before setting out our analysis and drawing our conclusions.
2. Águila Roja
Águila Roja is a contemporary television series created by Daniel Écija, Pilar Nadal, Ernesto Pozuelo and Juan Carlos Cueto, produced by Globomedia for Radio Televisión Española and broadcast on La 1 from 19 February 2009 to 27 October 2016. It comprises nine seasons and 116 episodes. From its first episode it achieved great audience success, with a 30 % audience share and more than five million viewers. Its quality has been recognized by the television industry: the series has won a total of 37 awards, in addition to 17 further nominations¹. The series has been broadcast in its original or an adapted version in dozens of countries². Building on the conventional audiovisual production, a transmedia product has been developed, which also won an award for “Best multiplatform content” (Costa Sánchez & Piñeiro Otero, 2012: 107). Fans of the series have access to a microsite on the RTVE website³ that carries the official information on the series, offers on-demand viewing of all its episodes, encourages interaction with its different audiences (polls, forums, social networks) and links to the website miaguilaroja.com, centered on a video game. We therefore consider it a product that has attracted considerable interest not only within, but also beyond, the Spanish cultural landscape.
¹ It won the Premio Ondas (2010) for best national series, the TP de Oro (2009, 2010 and 2011), the Silver Medal at the New York Television Festival, the award for Best Series at the Vitoria TV Festival and six awards from the Academia de TV (www.globomedia.es/2005-2009) [Accessed 29/3/2017].
² https://es.wikipedia.org/wiki/Águila_Roja [Accessed 29/3/2017].
The series blends several television genres (adventure, historical, comedy, romance) (Barrientos Bueno, 2011: 5); it is a cloak-and-dagger “dramedy”, that is, a plausible mix of dramatic and comic elements (Guerrero, 2014: 241), aimed at a family audience. The RTVE website presents it as follows:
Televisión Española enters fully into the period adventure genre with Águila Roja, a Globomedia production for the whole family set in seventeenth-century Spain. With Águila Roja we enter a series of adventure and intrigue about courage, nobility, friendship and love. The protagonist, played by David Janer, is an anonymous seventeenth-century avenger (known by the name Águila Roja) who helps the weak and is determined to unmask the conspiracy behind the murder of his young wife and to discover his origins⁴.
3. Telecinematic dialogue
Following Piazza, Bednarek & Rossi (2011a), we use the adjective telecinematic to refer to features of fictional and narrative language shared by film and television. Although there are intrinsic differences between the two media, it is particularly significant that both discourses are governed by the double level of communication that characterizes all screen discourse (and, to some extent, theatrical discourse as well) (Piazza, Bednarek & Rossi, 2011b: 1):
At the outer level there is a relationship between dramatist(s) and audience(s); within that are the displayed relationships between characters (Richardson 2010: 188).
³ http://www.rtve.es/television/aguila-roja/ [Accessed 29/3/2017].
⁴ www.rtve.es/television/aguila-roja/serie [Accessed 29/3/2017].
Telecinematic dialogue is carefully designed for unratified listeners⁵, so that they can reconstruct the knowledge shared by the participants in the conversation (Bubel, 2008: 69). To understand telecinematic discourse, the audience “co-constructs” it using a base of shared knowledge activated by cognitive models, or frames, that arise from knowledge of the world; this includes not only reality but also the fictional world of television and film characters (Richardson, 2010: 127, 143). Like bystanders or eavesdroppers in the real world, film viewers have no conversational rights or responsibilities, nor can they negotiate meaning with the speakers by taking part in the exchange. These two disadvantages make the design of telecinematic dialogue a challenge for the production team (Bubel, 2008: 63-64).
Telecinematic dialogue is never realistic, because it is always designed with an audience in mind; above all, in the case of series, it is there to “hook” viewers (Bednarek, 2010: 64; 2012a: 57). This professional dialogue requires each character to have a voice of their own, but its main function is “to move the plot forward, give information, reveal psychology or facts, set up conflicts, provide background context”. Dialogue is not only exchanged between characters: it also appears in voice-over narration, in thoughts expressed in voice-over and in monologues (Fernández Tubau, 2012: 180).
It is important to point out that the essence of characters undoubtedly goes beyond their use of language, and not only because they are television characters. Character schemata are cognitive constructs, and their interpretation lies at the intersection of the dramatic world and the real world, as these exist in the viewer's mind (Richardson, 2010: 149). Nevertheless, one of the functions of dialogue is to reveal the character and to let viewers discover information about the character's mental states and personality (Bednarek, 2010: 101), since, in Culpeper's words (2009: 31), “it is the speech of each character that partly determines the different characters we perceive”.
⁵ Goffman's classification of hearers (1981: 124-159) is well known. It distinguishes, on the one hand, ratified participants, divided into addressed recipients and unaddressed recipients, and, on the other, unratified or circumstantial listeners (bystanders), divided in turn into overhearers and eavesdroppers.
The meaning of fictional dialogue arises from collaboration between scriptwriters and audience. On the one hand, scriptwriters rely on the audience's schematic knowledge and make assumptions about the beings that already populate the cognitive worlds of their ideal target audience. On the other hand, the audience collaborates by supplying appropriate schemata and thereby comes closer to the audience imagined by the authors (Richardson, 2010: 150).
In television series, the long duration of the fiction allows the audience to develop a special attachment to certain characters and an in-depth knowledge of the events narrated (Richardson, 2010: 57). Characters, partly through the way their speech is characterized, build audience loyalty, since viewers/listeners establish a special relationship with them (Bednarek, 2012a: 201). Today this phenomenon can be documented at least in part: in the Web 2.0 era, the public enjoys expressing its tastes and emotions through genres such as social networks, blogs and fandoms, in a transmedia environment into which the multiplatform developments of Águila Roja mentioned above readily fit.
Our analysis aims to show that the audience of Águila Roja, after recognizing Sátur as a seventeenth-century servant on the basis of the explicit setting of the series, recategorizes him as ‘contemporary’. This happens through his discourse, which is close to present-day colloquial speech, and because viewers realize that the dialogue goes beyond ‘normal’ diegetic exchange and is addressed directly to the eavesdropper. In this way, the illusion that the viewer is merely eavesdropping on the characters is broken, and it becomes evident that the dialogue is addressed to the audience, violating the principle of the suspension of disbelief and at the same time “catching” the viewer in the act of eavesdropping (Kozloff, 2000: 57). This effect, which creates emotion and a sense of closeness in the audience, is compounded by a humorous effect arising from the interruption of narrative convention (Ruiz Gurillo, 2012: 26).
4. Study corpus and methodology
4.1. Corpus
Our corpus consists of scripts, that is, texts written to be spoken as if not written, according to the classification of Gregory & Carroll (1978: 47).
We had access to the 116 original scripts of the series in their final versions, which, according to the cover pages of the documents, are version four or five (version six in three cases). In television fiction it is normal to write several drafts, which are then modified on the basis of joint rereadings and revisions. Writing series scripts is a collective task carried out by writers working as a team, under specific social conditions of production within the television industry (Richardson, 2010: 63-64).
We analyze the complete corpus of the Águila Roja scripts, from which we extracted two subcorpora: Sátur, consisting of this character's dialogue, and Otros (‘Others’), which brings together the dialogue of all the other characters. The quantitative data are summarized in Table 1:
Corpus           Number of types   Number of tokens
Full corpus      36,009            1,575,600
Sátur            12,404            163,686
Otros            24,123            578,481
Table 1: Corpus and subcorpus data
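Type and token figures such as those in Table 1 depend on how a token is defined. The following Python sketch shows how the two counts are derived. It assumes a simple letters-only, lowercased tokenizer rather than AntConc's configurable token definition. Tokens are all running words, and types are the distinct ones.

```python
import re

# Illustrative tokenizer (an assumption, not AntConc's exact token definition):
# a token is a run of letters, accented Spanish letters included; text is lowercased.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its word tokens, dropping punctuation and digits."""
    return TOKEN_RE.findall(text.lower())


def types_and_tokens(text: str) -> tuple[int, int]:
    """Return (number of types, number of tokens) for a text."""
    tokens = tokenize(text)
    return len(set(tokens)), len(tokens)


print(types_and_tokens("Sí, amo. No, amo. ¡Amo! ¡Amo!"))
```

On this toy line of dialogue the function finds six tokens but only three types (sí, amo, no). Repetitions of a vocative therefore raise the token count without adding types.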
The full corpus comprises the whole set of scripts. These include not only the dialogue but also the stage directions, the descriptions, and the voice and scene-heading indications, that is, all the information the scriptwriters consider necessary to build the series. The data in Table 1 also show the quantitative importance of Sátur's dialogue, which on its own accounts for about a third of the dialogue of the thirty characters in the series.
The same kind of information can be seen in Table 2 below, which lists the fifty most frequent words in the full corpus. After the grammatical words, the word “Sátur” is the most frequent lexical word. This figure, 23,068 occurrences, shows how heavily the scripts refer to this character and, therefore, his relevance and discursive presence in the series. Only the protagonist has a greater presence: 17,128 occurrences of “Gonzalo”, plus 6,118 of “Águila”, “Águila Roja”, “AR” and “A.R”, for a total of 23,246 occurrences.
la, de, a, que, el, y, se, no, en, sátur, un, con, gonzalo, lo, marquesa, le, los,
una, por, mira, al, comisario, es, su, está, margarita, día, las, pero, me, qué,
catalina, del, alonso, para, águila, te, roja, ha, cipri, si, va, yo, muy, ese, ya,
nuño, cardenal, más, mi
Table 2: The 50 most frequent words in the full corpus
To build the Otros and Sátur subcorpora used to define Sátur's characterization, we removed the scene headings and descriptions. Besides the dialogue, we kept only the parenthetical stage directions and, in the Otros subcorpus, the names of the characters. The stage directions provide contextual elements, while the character names serve to tell apart each character's lines. Both subcorpora were checked and corrected manually, since the complex layout of the scripts does not allow a fully automatic selection. The corpus and the subcorpora were analyzed with AntConc, version 3.4.4w (Anthony, 2014).
4.2. Methodology
Our analysis starts from the broad notion of the expressive identity of television fiction characters formulated by Bednarek (2011a). She describes it as the set of “those character traits that concern emotions, attitudes, values, and ideologies, which all have a strong element of subjectivity”. This identity is constructed through verbal and non-verbal expressive resources in a given context and co-text (Bednarek, 2011a: 9-10; 13).
We use AntConc's Keywords and N-grams searches to compare Sátur's discourse with that of the other characters. We then disambiguate the results with the Concordances function, checking their different uses and analyzing the specific functions they perform (Bednarek, 2012a: 59).
According to Culpeper (2001: 199), a character's frequent words can be considered stylistic markers when they are compared with an appropriate norm, that is, a reference corpus, for whose construction “there are no magic rules”. Keywords relate directly to characterization, since their frequency, or repetition, differs significantly from a norm. By comparing the Sátur subcorpus with the larger Otros subcorpus, used as the reference corpus, we obtain the unusually frequent (or unusually infrequent) words, that is, the keywords (Culpeper, 2009: 33). Table 3 lists Sátur's first thirty keywords. Alongside a few grammatical words, the character's discourse shows a high frequency of nouns, personal pronouns, connectives and discourse markers, first- and third-person singular verbs, and dysphemisms related to taboo words.
RANK  FREQ   KEYNESS   KEYWORD
1     2974   8602.229  amo
2     1467   1834.199  usted
3     353    937.742   joder
4     8689   833.119   que
5     688    398.991   pues
6     1895   371.289   pero
7     182    363.695   mire
8     1689   349.958   le
9     3464   334.952   y
10    2252   274.910   se
11    737    266.226   va
12    1547   251.253   yo
13    100    249.276   cojones
14    437    233.306   dios
15    225    229.851   digo
16    119    227.778   Cipriano
17    94     219.229   cago
18    439    215.131   porque
19    1490   186.500   si
20    295    181.947   sabe
21    2374   179.284   me
22    80     176.593   leches
23    78     136.472   chiquillo
24    48     123.792   coño
25    302    121.998   ahí
26    56     118.142   parió
27    480    117.885   ni
28    181    117.295   eh
29    612    109.965   esto
30    521    102.156   tiene
Table 3: The first 30 keywords of the Sátur subcorpus
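Keyness values like those in Table 3 come from comparing each word's frequency in the study corpus (Sátur) with its frequency in the reference corpus (Otros). The Python sketch below assumes the log-likelihood (G²) statistic computed over a 2×2 contingency table, one of the keyness measures AntConc offers. It is an illustration only. Its output is not expected to reproduce Table 3 exactly, since token counts and tool settings may differ. The sign distinguishes unusually frequent words (positive) from unusually infrequent ones (negative).

```python
import math


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood (G2) for a word occurring a times in a study corpus of
    c tokens and b times in a reference corpus of d tokens.

    The 2x2 table crosses 'this word / all other tokens' with
    'study / reference'; empty cells contribute 0 (0 * log 0 taken as 0).
    """
    n = c + d
    observed = [a, b, c - a, d - b]
    expected = [c * (a + b) / n, d * (a + b) / n,
                c * (n - a - b) / n, d * (n - a - b) / n]
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def keywords(study: dict[str, int], reference: dict[str, int], top: int = 30):
    """Rank words by signed keyness: positive for words unusually frequent in the
    study corpus, negative for words unusually infrequent in it."""
    c, d = sum(study.values()), sum(reference.values())
    rows = []
    for word in set(study) | set(reference):
        a, b = study.get(word, 0), reference.get(word, 0)
        sign = 1 if a / c >= b / d else -1
        rows.append((word, a, sign * log_likelihood(a, b, c, d)))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:top]


# Toy frequency lists (hypothetical, not taken from the corpus)
print(keywords({"amo": 30, "de": 70}, {"amo": 1, "de": 199}))
```

Applied to real frequency lists, the first rows of the output correspond to the kind of ranking shown in Table 3. The last rows hold the negative keywords, the words a character avoids.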
In our analysis we consider that the statistical significance of n-grams, understood as sets of words that may appear together in a text in a given consecutive order, can be relevant to the characterization of a character (Bednarek, 2012: 205). Finally, concordance analysis allows us to examine lists of all the occurrences of given words in the corpus, together with their co-text (to the left and to the right).
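A concordance of this kind is usually displayed as key-word-in-context (KWIC) lines. The minimal Python sketch below produces them. The tokenizer, the window width and the function name kwic are illustrative assumptions, not AntConc's internals.

```python
import re

# Illustrative tokenizer: runs of letters, accented letters included
TOKEN_RE = re.compile(r"[^\W\d_]+")


def kwic(text: str, node: str, width: int = 4) -> list[str]:
    """Key-word-in-context lines: each occurrence of `node` (case-insensitive)
    with up to `width` tokens of co-text on the left and on the right."""
    tokens = TOKEN_RE.findall(text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            # Right-align the left co-text so that the node words line up
            lines.append(f"{left:>30} [{token}] {right}")
    return lines


for line in kwic("Sí, amo. Espere, amo, que voy.", "amo", width=2):
    print(line)
```

Aligning the node word in a single column is what makes it quick to check manually whether each hit is, for instance, a vocative.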
Through corpus analysis we set out to show how the character of Sátur is constructed implicitly (Culpeper, 2001) through some discourse patterns typical of the oral colloquial register (Briz Gómez, 2011). Among these we highlight the use of vocatives, intensification through dysphemisms, and articulatory relaxation (in its written representation, of course). These features are partly shared by other plebeian characters, as opposed to those of higher estates. The repeated patterns create a relatively stable expressive identity, and stability is a well-attested feature of serial fiction, since it helps build audience loyalty over an extended period of time (Bednarek, 2011b: 187-197). We then focus on some incongruities and anachronisms and argue that all these features help create a funny, familiar character who is close to the television audience. We support this claim with production documents (the series bible⁶) and a comparison of comments posted on the social network Twitter.
5. Analysis
We investigate Sátur's speech by applying the methodology set out in the previous section to explore the implicit textual patterns that shape the character, that is, those inferred from data such as lexical and syntactic features or conversational structure (Culpeper, 2001: 172).
In Table 3 we find “amo” (‘master’) as the first keyword, “usted” (formal ‘you’) as the second, and “yo” (‘I’) in position 12. The bigram search in the Sátur subcorpus (Table 4) shows the especially significant recurrence of the vocative “amo” in certain word sequences, which also reveals its predominant use as a form of direct address:
⁶ “The bible is a written document in which all the important aspects of a television series are detailed and explained in separate sections […] It is usually a document drawn up beforehand, although it undergoes changes during the process. It should serve as the basis for future writers, actors and directors who join the project once it has started” (Ríos San Martín & Olivares, 2012: 45).
Bigrams. Total number of cluster types: 665. Total number of cluster tokens: 1070.
Freq  Bigram
44    no, amo
30    joder, amo
28    sí, amo
26    el amo
17    mi amo
15    pero, amo
14    ver, amo
12    amo, amo
11    ¡amo! ¡amo!
11    espere, amo
8     eso, amo
8     nada, amo
7     esto, amo
7     oiga, amo
7     pasa, amo
7     siento, amo
7     yo, amo
6     al amo
6     dios, amo
6     verdad, amo
Table 4: The first 20 bigrams with amo in the Sátur subcorpus
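The bigram counts in Table 4 come from AntConc's N-grams search. The Python sketch below approximates that search under stated assumptions. It uses a lowercased, letters-only tokenizer, so punctuation is discarded, whereas Table 4 displays it. It also lets bigrams cross sentence boundaries, which a dedicated n-gram tool may handle differently. The function name bigrams_with is illustrative.

```python
import re
from collections import Counter

# Illustrative tokenizer: runs of letters, accented letters included
TOKEN_RE = re.compile(r"[^\W\d_]+")


def bigrams_with(text: str, node: str) -> Counter:
    """Count the bigrams (pairs of adjacent tokens) that contain `node`."""
    tokens = TOKEN_RE.findall(text.lower())
    pairs = zip(tokens, tokens[1:])
    return Counter(" ".join(pair) for pair in pairs if node in pair)


counts = bigrams_with("No, amo. Sí, amo. No, amo. ¡Amo! ¡Amo!", "amo")
for bigram, freq in counts.most_common():
    print(freq, bigram)
```

Sorting by frequency with most_common gives the same kind of ranked list as Table 4.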
We interpret the conspicuous frequency of the vocative as one of the characteristic patterns of the oral colloquial register: the “you” appears alongside the “I”, almost always directly, and both can serve as a rhetorical strategy of intensification (Briz Gómez, 2011: 84). The vocative, which serves to open the communication channel (Vigara Tauste, 1997), has an affective impact and gives a fairly precise indication of Sátur's character and of his close relationship with Águila Roja. As Bednarek (2011a: 13) points out, expressive resources may be exclusive to one character or shared by others, so we also searched for them in the Otros subcorpus. There, 34 bigrams contain the noun “amo” (manually disambiguated from the verb form amo, ‘I love’). However, “amo” is used as a form of address only 3 times, and precisely in notes written by Sátur (e.g. “Amo, me han secuestrado, y no sé quién. Sátur”, ‘Master, I've been kidnapped, and I don't know by whom. Sátur’). In the Sátur subcorpus, by contrast, the same search yielded 665 occurrences (Table 4), of which only 26 are not forms of address, according to a manual check. The use of “amo” as a vocative is therefore distinctive of Sátur and of his way of addressing Águila Roja.
We read the presence of “usted” as the second keyword of the subcorpus as part of the same strategy of highlighting the role of the interlocutor in the utterance (Briz Gómez, 2011: 85): the servant, consistent with his role, addresses his master with the polite form. Using the concordance tool and a manual check, we found that Sátur uses “usted” to address Gonzalo in 1,434 of its 1,467 occurrences (about 98 %), as in the following examples:
This overwhelming predominance is, on the one hand, a clue that underlines the servant's regard for his master. On the other, it confirms the close interpersonal relationship between the two, grounded in the life they share. The high frequency of use helps characterize Sátur (Bednarek, 2011b: 202) as especially talkative and insistent, a trait we also noted when interpreting Table 1. The scriptwriters consider this trait central, as the bible shows: there Sátur is called “el criado pesado” (‘the annoying servant’) and described as follows:
25 years old. Converted Jew, hustler, annoying, funny and a blunderer, but faithful unto death to his master Gonzalo, who rescues him from prison. He serves him as squire, servant, adviser… He is the only one who knows his boss's secret, and he helps him in his adventures. Model: Donkey in Shrek.
Among the keywords of the Sátur subcorpus, Table 3 shows dysphemisms related to taboo words: “joder” with 353 occurrences, “cojones” with 100, “cago” with 94, “leches” with 79⁷ and “coño” with 48. Their presence leads us to attitudinal intensification in the oral colloquial register (Briz Gómez, 2011: 98) (examples 1 and 2), together with the syntactic intensifier “la madre que me/los/etc. parió” (example 3):
(1) Amo, que sigo vivo porque me debió ver esmirriao o algo… ¡Que lo vi con mis propios ojos, me cago en la leche! (Águila Roja, episode 51)
(2) ¡Joder, por aquí ya he pasado tres veces! (Lloriqueando) ¡Me cago en las ratas, me cago en el pan y me cago en todo lo cagable! ¡Amooo! (Águila Roja, episode 12)
(3) ¡Mire! ¡Mire lo que hemos conseguido! Estamos en una jaula, como animales pa la cena… Porque esto es una jaula, ¿no? (irónico) Aunque igual es sólo mi imaginación de ignorante. (Mordiendo las palabras) ¡La madre que me parió! (Águila Roja, episode 20)
⁷ Manual disambiguation through concordances showed that one occurrence is not a dysphemism.
This emphatic strategy is key for the character of Sátur. However, when we search the Otros subcorpus for concordances of joder (31 occurrences), cag* (24) and parió (10), a manual check shows that the same emphatic features are shared by the characters from the common people (Cipriano, Catalina, Sancho), but not by those belonging to higher estates (the Comisario and the nobles, such as the Marquesa, the King, etc.). This result suggests that the characterization takes into account not only each character's individual dimension but also their social condition.
The third and last pattern that, in our view, brings Sátur’s speech back to the colloquial register is the written reproduction of careless or popular pronunciation (Briz Gómez, 2011: 95; Díaz Castañón, 1975: 115). Examples are pa/pa’ for para (see also example 3) and the reduction of the suffix -ado to -ao (see also example 1). The Concordance tool makes this feature visible (108 occurrences of pa and 181 of *ao, checked manually); below we present the first 20 results for pa and pa’:
A concordance search in the Otros subcorpus returns 89 occurrences of pa/pa’ and 134 of *ao. These are likewise confined to the speech of the humblest characters, which confirms Sátur’s membership of this social group.
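The wildcard searches reported in this section (cag*, *ao, pa/pa’) can be approximated outside AntConc, together with the comparison between the Sátur and Otros subcorpora. The following Python sketch is not the author’s procedure. It assumes that each subcorpus is available as plain text and that * stands for zero or more word characters, as with AntConc’s default wildcard. The function names and the toy sentences are hypothetical. As in the study, raw hits still need manual checking: *ao, for instance, also matches words such as cacao, which are not reduced -ado forms.

```python
from __future__ import annotations

import re
from collections import Counter

# A token is a run of word characters, optionally followed by an apostrophe,
# so that the apocopated form "pa’" is kept as a single token.
TOKEN_RE = re.compile(r"\w+[’']?")


def wildcard_to_regex(query: str) -> re.Pattern:
    """Translate a query such as 'cag*' or '*ao' into a compiled regex.

    Assumption: '*' means zero or more word characters.
    """
    body = re.escape(query).replace(r"\*", r"\w*")
    return re.compile(body, re.IGNORECASE)


def concordance(text: str, query: str, width: int = 4) -> tuple[int, list[str]]:
    """Return the number of hits and one KWIC line per token matching query."""
    regex = wildcard_to_regex(query)
    tokens = TOKEN_RE.findall(text)
    lines = []
    for i, token in enumerate(tokens):
        if regex.fullmatch(token):  # whole-token match, as in a word search
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left:>35} [{token}] {right}")
    return len(lines), lines


def compare_subcorpora(subcorpora: dict[str, str], queries: list[str]) -> dict[str, Counter]:
    """Count the raw hits of every query in every subcorpus."""
    return {
        name: Counter({q: concordance(text, q)[0] for q in queries})
        for name, text in subcorpora.items()
    }


if __name__ == "__main__":
    # Toy sentences, not data from the study.
    toy = {
        "Sátur": "¡Me cago en la leche! Amo, que me debió ver esmirriao. Vamos pa casa.",
        "Otros": "Tira pa’ la casa, que el pan está cocinao.",
    }
    print(compare_subcorpora(toy, ["cag*", "*ao", "pa", "pa’"]))
    for line in concordance(toy["Sátur"], "*ao")[1]:
        print(line)
```

Raw counts are only comparable across subcorpora of similar size. With subcorpora of different lengths, normalizing each count by the number of tokens would be the natural next step.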
In short, three features characterize Sátur. The first is the sheer amount of speech he produces, reflected in the quantitative weight of his dialogue and in the abundance of utterances addressed to his master. The second is the close bond that ties him to Águila Roja, visible in his very frequent use of the vocative “amo” and of the polite pronoun “usted”. The third is his use of a colloquial register, marked by careless pronunciation and by intensification through dysphemisms related to taboo words. The colloquial register also places him in a lower social stratum than the most powerful characters. In Bednarek’s words (2010: 125), expressive identity thus combines the individual and the social. It is at once a way of expressing a character’s unique identity and of aligning that character with a group that displays similar expressive identities.
The mismatch between the communicative situation and the contemporary colloquialism of Sátur’s speech creates a humorous effect. This effect rests on breaking a narrative convention (Ruiz Gurillo, 2012: 26) set in the seventeenth century. The device is especially striking in two cases. In the first, Sátur inserts references that are wholly incongruous with the historical context in which the characters find themselves, and obviously intelligible only to the audience. In the second, he uses unmistakably present-day colloquial lexical and phraseological units, as in the following examples:
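(4) (Descubriéndose) Decir criado es simplificar mucho mi condición. En realidad soy ayuda de cámara, postillón, paje, cocinero. Vamos que ordeno a las personas y las cosas, se podría decir que soy un ordenador personal. (Águila Roja, capítulo 1)
[‘(Taking off his hat) Calling me a servant greatly oversimplifies my position. In fact I am valet, postilion, page, cook. In short, I put people and things in order; you could say I am a personal computer.’ The pun rests on ordenador, both ‘one who puts things in order’ and, in Peninsular Spanish, ‘computer’.]
(5) Que… como el gorro es talla única, me queda grande. Y los ojos que… que no veo. (Águila Roja, capítulo 24)
[‘It’s just that… since the hat is one size fits all, it’s too big for me. And my eyes… I can’t see.’]
(6) Como nos ataquen aquí, no lo contamos. Dicen que en este bosque los bandoleros primero disparan y luego... ni preguntan ni nada... te rematan y no te ponen mirando para Cuenca de milagro. (Águila Roja, capítulo 43)
[‘If they attack us here, we won’t live to tell the tale. They say that in this forest the bandits shoot first and then... they don’t even ask questions or anything... they finish you off, and it’s a miracle they don’t also “put you facing Cuenca” (a vulgar colloquial idiom).’]
(7) Amo, por una vez, póngase en modo disfrute, no en modo justiciero, haga el favor. (Águila Roja, capítulo 112)
[‘Master, for once, please switch to enjoy mode, not vigilante mode.’]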
The humorous effect can be explained through the audience’s co-construction of the series’ meaning, discussed in section 3. Following Culpeper (2001), we consider that viewers initially activate a top-down strategy, placing Sátur in the seventeenth-century world on the basis of the series’ setting. Later, as they get to know the character better through exposure to the fictional dialogue, they activate bottom-up interpretive strategies. They then reach a fuller understanding of the character on the basis of a series of textual cues, including those highlighted here. Two oppositions thus become evident. The first is between two frames, the Golden Age and the present day. The second is between two levels: the direct addressees of Sátur’s utterances and the indirect addressees, namely the audience of the telecinematic dialogue. This double conflict creates humour and irony in the speech of the gracioso (the stock comic servant of Golden Age drama) and is one reason for the character’s success. Following Bednarek (2012b: 201), this success can be observed in some of the secondary texts produced by television viewers. On Twitter, for instance, the hashtags #grandesatur and #PerlasSatur have been created, and some posts using them are reproduced below⁸:
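Ana Barahona @92arcoiris 18 Sep 2014
Que gracia @CarmenMartin11: Amo está en modo heroe no en modo maestro asi que no se vaya por las ramas #PerlasSatur #GrandeSatur
[‘How funny @CarmenMartin11: Master, you’re in hero mode, not teacher mode, so don’t beat around the bush’]
Ramos @jesusilloramos 15 Jun 2016
“La vidas felices solo pasan en los cuentos, aqui es una hostia sobre otra” #GrandeSatur
[‘Happy lives only happen in fairy tales; here it’s one punch after another’]
Lorena @LoreVdF 29 Sep 2016
Tira pa la casa y aprovecha que no estamos pa hacer limpieza general jajajjaaja #perlassatur #ÁguilaRoja112
[‘Head home and make the most of us being out to do a big clean-up hahaha’]
“Este traje es... Amo, que si se lo pone usted se van a cagar por la pata abajo” jajajajajaaaaa #PerlasSatur #ÁguilaRoja114 @aguilaroja_tve
[‘This suit is... Master, if you put it on they’ll be scared shitless’ hahaha]
8
The posts are reproduced in their original spelling.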
In these posts, users quote and share the character’s “best” lines. They recognize as sources of humour and of affiliation within the online community precisely the colloquial, incongruous and anachronistic features that we examined through corpus analysis and discourse analysis.
6. Conclusions
This study has combined quantitative and qualitative methods to analyse how discourse is used, within the collaborative work of scriptwriting, to give a character an expressive characterization.
We have used a number of corpus-linguistic techniques to bring out the most salient features that make up the identity of the character Sátur in the television series Águila Roja. The search for keywords and n-grams was followed up by the exploration of concordances. On the one hand, this confirmed Sátur’s abundant discursive presence, the discursive counterpart of the “tiresome servant” quality attributed to him in the series bible. On the other hand, we found that certain recurrent patterns mark the colloquial register that characterizes Sátur: insistent use of the vocative “amo” and of the polite pronoun “usted”, dysphemisms used as intensifying devices, and careless pronunciation. These same patterns align him with the group of socially less favoured characters. The use of taboo words and of incongruities (the latter not detected by corpus techniques) firmly places the character in a contemporary dimension that contrasts with the series’ historical setting. The results of the corpus study were then placed within the specific world of telecinematic discourse and examined from the perspective of the audience’s co-construction of meaning. Drawing on Culpeper (2001), we argued that humour and irony (Ruiz Gurillo, 2012) arise from a gap. On one side are the audience’s expectations about Sátur, based on the features of the series. On the other is the subsequent discovery of contemporary colloquial speech and of blatant incongruities and anachronisms. Viewers who interact on social networks value these effects positively. Corpus-linguistic techniques, combined with ‘manual’ discourse analysis, have allowed us to outline the main features of Sátur’s expressive identity that account for his popularity as part of the success of Águila Roja.
Acknowledgements
This work would not have been possible without the help of Andrés Cuenca Lillo, film and television casting director.
I am deeply grateful to Roberto Bernasconi for his indispensable IT assistance in automating the data selection process.
References
Anthony, Laurence. 2014. AntConc (Version 3.4.4w) [Computer Software]. Tokyo, Japan: Waseda University. http://www.laurenceanthony.net/ [Accessed 21/3/2017].
Baker, Paul. 2005. Public Discourses of Gay Men. London/New York: Routledge.
Barrientos Bueno, Mónica. 2011. Águila Roja, un espectáculo de masas (de espectadores). Comunicación 9(1): 4-18.
Bednarek, Monika. 2010. The language of fictional television: Drama and identity. London/New York: Continuum.
Bednarek, Monika. 2011a. Expressivity and televisual characterization. Language & Literature 20(1): 1-19.
Bednarek, Monika. 2011b. The stability of the televisual character: A corpus stylistic case study. In Piazza, Roberta; Bednarek, Monika & Rossi, Fabio (eds.), Telecinematic discourse: Approaches to the language of film and television series. Amsterdam/Philadelphia: John Benjamins, 185-204.
Bednarek, Monika. 2012a. Constructing “nerdiness”: Characterization in The Big Bang Theory. International Journal of Corpus Linguistics 17(1): 35-63.
Bednarek, Monika. 2012b. Get us the hell out of here. Key words and trigrams in fictional television series. Multilingua 31: 199-229.
Bednarek, Monika. 2015a. “Wicked” women in contemporary pop culture: “Bad” language and gender in Weeds, Nurse Jackie, and Saving Grace. Text & Talk 35(4): 431-451.
Bednarek, Monika. 2015b. Corpus-assisted multimodal discourse analysis of television and film narratives. In Baker, Paul & McEnery, Tony (eds.), Corpora and Discourse Studies. Basingstoke/New York: Palgrave Macmillan, 63-87.
Briz Gómez, Antonio. 2011. El español coloquial en la conversación. Esbozo de pragmagramática. Barcelona: Ariel.
Bubel, Claudia. 2006. The linguistic construction of character relations in TV drama: Doing friendship in Sex and the City. Saarbrücken, Germany: Universität des Saarlandes dissertation. http://scidok.sulb.uni-saarland.de/volltexte/2006/598/ [Accessed 21/3/2017].
Bubel, Claudia. 2008. Film audiences as overhearers. Journal of Pragmatics 40: 55-71.
Bubel, Claudia & Spitz, Alice. 2006. One of the last vestiges of gender bias. The characterization of women through the telling of dirty jokes in Ally McBeal. Humor 19(1): 71-104.
Costa Sánchez, Carmen & Piñeiro Otero, Teresa. 2012. Nuevas narrativas audiovisuales: multiplataforma, crossmedia y transmedia. El caso de Águila Roja. ICONO 14 10(2): 102-125.
Culpeper, Jonathan. 2001. Language and characterisation: People in plays and other texts. London: Longman.
Culpeper, Jonathan. 2009. Keyness: Words, part-of-speech and semantic categories in the character-talk of Shakespeare’s Romeo and Juliet. International Journal of Corpus Linguistics 14(1): 29-59.
Díaz Castañón, Carmen. 1975. Sobre la terminación “-ado” en el español de hoy. Revista española de lingüística 5(1): 111-120.
Fernández Tubau, Valentín. 2012. Diálogos en el guion. Arte y técnica. In Ríos San Martín, Manuel, El guion para series de televisión. Madrid: Instituto RTVE, 169-207.
Galán Fajardo, Elena. 2006. La representación de los inmigrantes en la ficción televisiva en España. Propuesta para un análisis de contenido. El Comisario y Hospital Central. Revista Latina de comunicación social 61. http://www.ull.es/publicaciones/latina/200608galan.htm [Accessed 21/3/2017].
Galán Fajardo, Elena. 2007. Construcción de género y ficción televisiva en España. Comunicar: Revista científica iberoamericana de comunicación y educación 28: 229-236. http://www.revistacomunicar.com/index.php?contenido=detalles&numero=28&articulo=28-2007-28 [Accessed 21/3/2017].
Goffman, Erving. 1981. Forms of Talk. Oxford: Blackwell.
González de Garay Domínguez, Beatriz. 2009. Ficción online frente a ficción televisiva en la nueva sociedad digital. Diferencias de representación del lesbianismo entre las series españolas para televisión generalista y las series para Internet. Actas ICONO 14. http://eprints.ucm.es/9856/ [Accessed 21/3/2017].
Gregori Signes, Carmen. 2007. What do we laugh at? Gender representations in 3rd Rock from the Sun. In Santaemilia, José; Bou, Patricia; Maruenda, Sergio & Zaragoza, Gora (eds.), International Perspectives on Gender and Language. Valencia: Universitat de Valencia, 726-750.
Gregory, Michael & Carroll, Suzanne. 1978. Language and situation: Language varieties and their social contexts. London: Routledge and Kegan Paul.
Guerrero, Mar. 2014. Webs televisivas y sus usuarios: un lugar para la narrativa transmedia. Los casos de Águila Roja y Juego de Tronos en España. Comunicación y sociedad 21: 239-267.
Igartua, Juan José; Barrios, Isabel M. & Ortega, Félix. 2012. Analysis of immigration image in the prime time television fiction. Comunicación y Sociedad 25(2): 5-28.
López, José Antonio & Cuenca, Francisco Antonio. 2005. Ficción televisiva y representación generacional: modelos de tercera edad en las series nacionales. Comunicar 25. http://www.revistacomunicar.com/index.php?contenido=detalles&numero=25&articulo=25-2005-147 [Accessed 21/3/2017].
Mandala, Susan. 2007. Solidarity and the Scoobies: An analysis of the -y suffix in the television series Buffy the Vampire Slayer. Language and Literature 16(1): 53-73.
Mandala, Susan. 2008. Representing the future: Chinese and codeswitching in Firefly. In Wilcox, Rhonda V. & Cochran, Tanya R. (eds.), Investigating Firefly and Serenity: Science fiction on the frontier. London/New York: I. B. Tauris, 31-40.
Mandala, Susan. 2011. Star Trek: Voyager’s Seven of Nine: A case study of language and character in a televisual text. In Piazza, Roberta; Bednarek, Monika & Rossi, Fabio (eds.), Telecinematic discourse: Approaches to the language of film and television series. Amsterdam/Philadelphia: John Benjamins, 205-223.
Marcos, María & Igartua, Juan José. 2014. Análisis de las interacciones entre personajes inmigrantes/extranjeros y nacionales/autóctonos en la ficción televisiva española. Disertaciones: Anuario electrónico de estudios en Comunicación Social 7(2): 136-159.
Piazza, Roberta; Bednarek, Monika & Rossi, Fabio (eds.). 2011a. Telecinematic discourse: Approaches to the language of film and television series. Amsterdam/Philadelphia: John Benjamins.
Piazza, Roberta; Bednarek, Monika & Rossi, Fabio. 2011b. Introduction: Analysing telecinematic discourse. In Piazza, Roberta; Bednarek, Monika & Rossi, Fabio (eds.), Telecinematic discourse: Approaches to the language of film and television series. Amsterdam/Philadelphia: John Benjamins, 1-17.
Richardson, Kay. 2010. Television dramatic dialogue: A sociolinguistic study. Oxford: Oxford University Press.
Ríos San Martín, Manuel & Olivares, Javier. 2012. De la idea a la emisión. In Ríos San Martín, Manuel, El guion para series de televisión. Madrid: Instituto RTVE, 169-207.
Ruiz Gurillo, Leonor. 2012. La lingüística del humor en español. Madrid: Arco/Libros.
Vigara Tauste, Ana María. 1997. Miau: El lenguaje coloquial (humano) en Galdós. Espéculo 5. https://pendientedemigracion.ucm.es/info/especulo/numero5/miau_vig.htm [Accessed 18/8/2017].
Wodak, Ruth. 2009. The discourse of politics in action. Basingstoke/New York: Palgrave Macmillan.
ojs.uv.es/index.php/qilologia/index
Qf
Lingüístics
Persiguiendo con imparcialidad “el total desprecio
a la Constitución”: el léxico valorativo en la Querella
del Fiscal de Cataluña contra Carme Forcadell i Lluís
Impartially prosecuting “the total contempt for the Constitution”: Evaluative lexis in the criminal complaint filed by the Public Prosecutor of Catalonia against Carme Forcadell i Lluís
Giovanni Garofalo
Università degli Studi di Bergamo.
Received: 29/04/2017. Accepted: 16/10/2017
Resumen: Se propone un estudio semántico-discursivo de las dos querellas presentadas por la Fiscalía Superior de Cataluña contra D.ª Carme Forcadell i Lluís, presidenta
del Parlamento de Cataluña, y contra los miembros de la Mesa del Parlamento catalán
por los delitos de desobediencia y prevaricación. Compaginando las metodologías del
análisis de sentimiento, de la lingüística del corpus y de la teoría de la valoración, este
estudio desmiente la idea de que la querella solicita de forma imparcial la aplicación de
normas generales a casos concretos. Lejos de ser fácticos o ideacionales, los enunciados del fiscal están cargados de significados interpersonales y manifiestan implicación
subjetiva con una vehemencia aledaña de la invectiva política.
Palabras clave: querella; análisis de sentimiento; polaridad textual; subjetividad; teoría de la valoración.
Abstract: This paper proposes a semantic-discursive study of the two criminal complaints filed by the Public Prosecutor of Catalonia against Mrs. Carme Forcadell i Lluís, President of the Catalan Parliament, and against key members of the Catalan Parliament’s Bureau for the crimes of disobedience and misconduct. Combining sentiment analysis, corpus linguistics and appraisal theory, this study challenges the assumption that a criminal complaint seeks to apply general norms to concrete cases in an objective fashion. Far from being factual or ideational, the Prosecutor’s utterances are laden with interpersonal meanings and reveal a subjective involvement whose vehemence is reminiscent of political invective.
Keywords: criminal complaint; sentiment analysis; text polarity; subjectivity; appraisal theory.
Garofalo, Giovanni. 2017. “Persiguiendo con imparcialidad ‘el total desprecio a la Constitución’: el léxico valorativo en la Querella del Fiscal de Cataluña contra Carme Forcadell i Lluís”. Quaderns de Filologia: Estudis Lingüístics 22: 79-103. doi: 10.7203/qf.22.11302
1. The discursive setting of the complaints
On 19 October 2016 the Chief Public Prosecutor of Catalonia filed a criminal complaint before the High Court of Justice of Catalonia (TSJC) against Ms Carme Forcadell i Lluís, President of the Parlament. The complaint accused her of misconduct in public office (prevaricación) and of disobedience to the Constitutional Court (TC), for having allowed the Catalan chamber to vote on the pro-independence roadmap.
The prosecutor’s text stated that on 20 July 2016 the Bureau (Mesa) of the regional Parliament, after consulting the Board of Spokespersons (Junta de Portavoces), had taken note of the conclusions of the Study Commission on the Constituent Process (CEPC). These conclusions set out the path towards disconnection. They had already been declared unconstitutional by an order (auto) of the TC, duly served on the accused. The chamber’s legal services had advised against the CEPC’s recommendations on the grounds that they were not constitutionally admissible. Nevertheless, in the Prosecutor’s words, Forcadell,
despite being aware that such a decision flatly contravened [...] the Order of 19 July 2016, agreed to put to the vote the alteration of the agenda so as to include the vote on the conclusions of the Study Commission on the Constituent Process, and the alteration of the agenda and the inclusion of the new item were approved.
On 23 February 2017 the Public Prosecutor’s Office filed a second complaint for disobedience and misconduct against the President of the Parlament and the pro-sovereignty members of the Bureau: Lluís Corominas, Anna Simó and Ramona Barrufet. They were charged with allowing the approval of two resolutions, tabled by Junts pel Sí and the CUP, that called for a unilateral secession referendum.
Common sense would dictate that both judicial documents keep to the documented facts, using statements that are neutral from the interpersonal point of view and therefore “factual” or “objective”. We would thus expect the Public Prosecutor’s Office to act as the inanimate mouth of the law (bouche de la loi). Its statements should convey meanings that systemic-functional grammar defines as ideational (Halliday & Hasan 1985: 20), that is, meanings related to the mere representation of the facts of the world as we understand them through experience. Such verbal behaviour would be the most consistent with the principles of impartiality and objectivity that should guide the Public Prosecutor’s Office. These principles are set out in art. 124.1 of the Spanish Constitution and developed in art. 7 of the Organic Statute of the Public Prosecutor’s Office (EOMF) in the following terms:
By the principle of impartiality, the Public Prosecutor’s Office shall act with full objectivity and independence in defence of the interests entrusted to it.
*
This work is part of the project Discurso jurídico y claridad comunicativa. Análisis contrastivo de sentencias españolas y de sentencias en español del Tribunal de Justicia de la Unión Europea (Reference FFI2015-70332-P), funded by the Spanish Ministry of Economy and Competitiveness and by FEDER funds.
Accordingly, the prosecutor should be neutral in assessing the facts and evidence underlying the case, without prejudicing any party to the proceedings. His conduct must be disinterested and dispassionate and must adhere solely to objective reality.
On closer inspection, however, the sender’s lexical and grammatical choices turn out to be laden with interpersonal meanings. They do not merely represent reality but engage with the opposing party through a wide range of subjective evaluations of negative polarity. In this way, the authorial voice of the text ends up influencing the decision of the TSJC. It favours the use of a particular subset of values of the appraisal system while discarding others (White, 2001: 9). This strategy has two specific aims. On the one hand, it seeks to tear down the ethos of the accused and the pro-sovereignty ideology. On the other, it seeks to legitimize the attitudes, beliefs and assumptions that underpin constitutionalist discourse.
As a social action, that is, human conduct oriented towards the actions of others (Weber, 1921), the prosecutor’s complaint is framed within an institutionalized interactive framework defined by the conventions of the genre. This framework admits variants depending on whether the accuser is a court procurator (Garofalo, 2009) or the Public Prosecutor. The variants concern the organization of the scene of enunciation and the distribution of discursive roles among the participants.
The concept of scene of enunciation is understood as ‘the inside of discourse’, where speech is staged (Maingueneau, 1993). On this basis, three distinct scenes can be distinguished in the prosecutor’s complaint:
1. The encompassing scene (escena englobante) is the general type of scene in which one must place oneself to understand the sender’s rhetorical and pragmatic purposes and how the text interpellates the addressee. In the case at hand, the encompassing scene or event corresponds to lodging a complaint before a court.
2. The generic scene is imposed by the specific genre. Its most salient components are the roles played by the participants and the sender’s main purpose. From Goffman’s (1981) sociolinguistic perspective, the sender acts as author of his own statements. At the same time, he acts as animator or reformulator of a point of view shared by the whole Public Prosecutor’s Office of Catalonia and by the State Attorney General. The latter used its hierarchical superiority to require that a complaint be filed against Forcadell, and it therefore plays the role of principal (responsable) in the interaction. The addressee, by contrast, splits in two. The direct addressee is the TSJC (known, ratified and addressed), for whom the text is specifically constructed. The indirect addressee is the President of the Parlament together with the other pro-independence members of the Govern. The sender’s main aim is to become the accusing party by asking the TSJC to admit both complaints for processing.
3. The scenography, understood as the scene constructed within the text, legitimizes the statements selected by the prosecutor and allows him to introduce specific evaluative stances to interpellate and persuade the TSJC. According to Charaudeau and Maingueneau (2002: 222), the scenography is not merely the frame of the text. Rather, it refers to a specific cognitive schema that is nested in, and gradually consolidated within, the generic scene: “as it emerges, speech implies a certain scene of enunciation which, in reality, is progressively validated through that very enunciation”. In light of the above, the scenography the prosecutor activates when accusing Forcadell and the other defendants is that of a pamphlet¹ defending the 1978 Constitution and the rule of law. This scenography legitimizes and encourages evaluative grammatical and lexical choices. In this sense, to borrow the words of the same authors (2002: 222, emphasis added),
the passionate defence of constitutionalism, turned into scenography, is both where the discourse comes from and what this discourse engenders: it legitimizes an utterance that must in turn legitimize it, that must establish that this scenography from which speech comes is precisely the scenography required to tell a story, denounce an injustice, etc.
Within this scenography, the specific evaluations the prosecutor makes in the narrative and argumentative sequences of both texts must fit a conceptual frame or script. This script is imposed by articles 410.1 (offence of disobedience) and 404 (offence of continued misconduct) of the Criminal Code². Taranilla (2012: 101) defines such a script as a “temporally ordered schema of common events and situations”. Drawing on Schank and Abelson (1987) and on the theory of the legal philosopher Nerhot (1990), Taranilla stresses that the legal norm contains a model that constructs reality. The norm assigns relevance to certain particular situations and standardizes them, so that interlocutors who hold the same schema or frame in their encyclopaedia recognize them immediately. The quantitative and qualitative analysis that follows will show that the specific scripts activated by these articles of the Criminal Code are decisive for the selection of the lexico-grammatical elements of axiological value observable in both texts.
1
The term pamphlet (panfleto) is used here in the sense of “a short work of an aggressive nature” (DLE 2014, s.v.) or “a leaflet or sheet of propaganda, political or of ideas of any kind” (María Moliner 1992, s.v.), that is, as a discourse of passionate defence of constitutional principles.
2
The articles in question read as follows:
Art. 404: “An authority or public official who, knowing it to be unjust, issues an arbitrary decision in an administrative matter shall be punished with special disqualification from public employment or office and from exercising the right to stand for election for a period of nine to fifteen years.”
Art. 410.1: “Authorities or public officials who openly refuse to duly comply with judicial decisions or with decisions or orders of a higher authority, issued within the scope of their respective competence and meeting the legal formalities, shall incur a fine of three to twelve months and special disqualification from public employment or office for a period of six months to two years.”
Indeed, because the complaint genre has an accusatory function, these elements have negative polarity and an affective charge that tends towards intensification. When the accuser is a procurator, previous work has observed a certain degree of involvement of the sender’s subjectivity in the client’s personal circumstances, depending on the seriousness of the offence attributed to the accused. This involvement comes with some expression of empathy between the procurator and the client (Garofalo, 2009: 175). This intersubjective dynamic is unsurprising. The lawyer who drafts the text and the complainant have a professional relationship between private parties, in which a legal professional is paid to represent the client’s interests with due forcefulness. The subgenre “complaint by the Public Prosecutor” is somewhat different. There the accusing party is an institutional figure. As the defender of legality, it is called to act with greater balance and impartiality and to keep equidistant from political controversies. Pursuing this line, the following sections will show that the defence of constitutionalism from the courts can be carried out with an intensity similar to the impetus of a political attack.
2. Analytical methodology and research aims
This study belongs to research on evaluation and emotion in specialized discourse (López Ferrero, 2008; Díaz Rojo, 2010; Serpa, 2011, among others). It seeks to integrate different methodological approaches, both computational and semantic, to gain a deeper understanding of the axiological resources deployed by the Public Prosecutor. The analysis is structured in three interrelated stages.
First, the evaluative markers in both complaints are measured quantitatively. The aim is for the lexico-grammatical elements that express the sender’s subjectivity to “emerge on their own” from the study corpus (Biber, 2009), providing empirical data to guide the analysis. The first stage of this research therefore follows an inductive, corpus-driven approach. It measures the polarity and intensity of textual sentiment with the sentiment analysis tool Lingmotif v.1.0 (Moreno-Ortiz, 2016). Sentiment analysis (or opinion mining) is understood as the computational processing of the sender’s opinions, judgements and emotions and, in general, of his subjectivity (Liu, 2010). It makes it possible to measure the affective charge of texts. Despite its limitations, it has two advantages: it quantifies the force of the prosecutor’s attacks numerically, and it allows that force to be compared with the intensity of sentiment in a reference corpus of the criminal complaint genre.
As will be seen, Lingmotif’s results are based on isolated evaluative words, single-word or multi-word, contained in the application’s dictionary, which is configured for the standard language. Automatic detection inevitably makes errors in assigning textual polarity to highly context-sensitive items. To mitigate these errors, a specific complementary dictionary (plugin) was built manually, containing axiological lexical items from the domain of the offences of disobedience and misconduct.
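The section does not describe how Lingmotif computes its scores. The following Python sketch therefore only illustrates the general mechanism of lexicon-based sentiment analysis with a domain plugin, under illustrative assumptions:

- valences lie on a -5 to 5 scale;
- plugin entries override base entries with the same key;
- multi-word items are found by longest-match lookup;
- polarity and intensity are normalized by text length.

The lexicon entries, their values and the function names are hypothetical. They are not taken from Lingmotif or from the study’s plugin.

```python
from __future__ import annotations

import re

# Hypothetical general-language lexicon: item -> valence on an assumed -5..5 scale.
BASE_LEXICON = {
    "desprecio": -3,
    "desobediencia": -2,
}

# Hypothetical domain plugin for disobedience/misconduct. Its entries override
# base entries with the same key and may add multi-word items.
PLUGIN_LEXICON = {
    "desobediencia": -3,
    "prevaricación": -4,
    "total desprecio": -5,
}


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def merge_lexicons(base: dict[str, int], plugin: dict[str, int]) -> dict[str, int]:
    merged = dict(base)
    merged.update(plugin)  # domain entries take precedence
    return merged


def score(text: str, lexicon: dict[str, int]) -> dict[str, float]:
    """Longest-match lexicon lookup; returns hit count, polarity and intensity."""
    tokens = tokenize(text)
    max_len = max((len(item.split()) for item in lexicon), default=1)
    hits: list[int] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, so "total desprecio" beats "desprecio".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in lexicon:
                hits.append(lexicon[candidate])
                i += n
                break
        else:
            i += 1  # no lexicon item starts at this token
    size = len(tokens) or 1
    return {
        "hits": len(hits),
        "polarity": sum(hits) / size,  # sign gives the overall orientation
        "intensity": sum(abs(v) for v in hits) / size,  # strength regardless of sign
    }


if __name__ == "__main__":
    sentence = "El total desprecio a la Constitución y la desobediencia al TC."
    print(score(sentence, BASE_LEXICON))
    print(score(sentence, merge_lexicons(BASE_LEXICON, PLUGIN_LEXICON)))
```

Comparing the two calls shows how a domain plugin can shift both polarity and intensity for the same sentence. That is the kind of correction the manually built plugin is meant to bring to a general-language dictionary.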
The second stage deepens the sentiment analysis and sets out the criteria used to select the evaluative markers added to the plugin. The aim was to prevent Lingmotif’s supplementary vocabulary from becoming a mere list of impressionistically chosen words. A twofold semantic-statistical criterion was therefore adopted. A word was added to the plugin only if it was a keyword (a word of unusual frequency) and also belonged to one of the three domains of Appraisal Theory: attitude, engagement or graduation (see Martin, 2000 and 2003; Martin and White, 2005; Martin and Rose, 2007). To this end, the keyword list was extracted with AntConc (Anthony, 2014), and the concordances and lexical clusters of the words in that list were examined.
Finally, the third stage builds on the results of the first two and offers a brief qualitative analysis of each of the three appraisal domains in the two complaints, combining the corpus-based and corpus-driven approaches (Tognini-Bonelli, 2001). The qualitative analysis refines the sentiment analysis and probes its results more deeply, since existing software tools cannot adequately interpret the semantic relations between a piece of information and neighbouring concepts, and the analyst does not always have models of specialised knowledge representation (e.g. fine-grained ontologies or semantic networks of the criminal-law domain) that could guarantee a rigorous automated analysis of evaluative meanings. It was therefore considered necessary to ‘polish’ the results obtained with Lingmotif through a qualitative assessment of sentiment, given that the prosecutor's attitude is often expressed implicitly and cannot be treated as a feature or property of individual words, but rather of whole utterances or texts (White, 2001).
Persiguiendo con imparcialidad “el total desprecio a la Constitución”...
This methodological approach has both a fundamental limitation and an implicit advantage. On the one hand, it is unusual for a sentiment analysis application designed for the automatic analysis of standard-language texts to be applied to a judicial genre such as the Public Prosecutor's criminal complaint and, in particular, to a rather small corpus (only 23,826 words). Indeed, adding a plugin lexicon necessarily tends to shift polarity values; for this reason, it would have been preferable to base the analysis on a sufficiently large corpus in order to determine which lexico-grammatical items should go into a plugin useful for automatically analysing texts in this criminal domain.
On the other hand, we believe that the selection of axiological items and the determination of their polarity are variables that depend heavily not only on the size of the study corpus and on the text genre, but also on the cognitive framing (scenography and crime scripts) of the judicial texts to be analysed. The plugin built for this study includes only keywords consistent with the specific framing imposed by the prosecutor. For this reason, we believe that the results obtained, despite their limitations, may be significant for the automatic processing of other texts sharing the same frame, oriented towards a passionate defence of constitutionalism and the condemnation of disobedient and malfeasant conduct, currently attributed by the courts in Catalonia to more than one figure of the independence movement.
In short, the proposed study is a case of triangulation (McNeill, 1990: 22), since it seeks to combine multiple methodological, discursive and computational approaches to offer an empirically grounded description of the markers of subjectivity of the prosecutor, in whose voice judicial discourse hybridises with political discourse.
3. Measuring the polarity of the complaints with Lingmotif v. 1.0
The prosecutor's degree of subjective involvement in both complaints was first quantified with Lingmotif (Moreno-Ortiz, 2016), a sentiment analysis application that identifies emotionally charged words and phrases contained in the program's dictionaries and applies context rules (inversion, intensification and attenuation) to account for possible sentiment modifiers (Moreno-Ortiz, 2017: 133).
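The effect of the three kinds of context rule can be illustrated with a minimal Python sketch. It is not Lingmotif's implementation, whose rules are not described here: the toy lexicon, the trigger words (no, muy, algo), the scope rule and the multipliers are all hypothetical assumptions, chosen only to show how inversion, intensification and attenuation modify a word's base valence.

```python
# Hypothetical toy lexicon: word -> valence on a -5..5 scale.
# These entries are illustrative assumptions, not Lingmotif data.
LEXICON = {"bueno": 3, "grave": -3, "desprecio": -4}

INVERTERS = {"no"}                  # inversion: flips the sign
INTENSIFIERS = {"muy": 1.5}         # intensification: scales the valence up
ATTENUATORS = {"algo": 0.5}         # attenuation: scales the valence down
PUNCTUATION = {".", ",", ";", ":"}  # pending modifiers are dropped here


def clamp(value: float, low: float = -5.0, high: float = 5.0) -> float:
    """Keep an adjusted valence inside the assumed -5..5 range."""
    return max(low, min(high, value))


def score_tokens(tokens):
    """Return (word, adjusted valence) for each sentiment word in `tokens`.

    Modifiers accumulate until the next sentiment word, which consumes them;
    punctuation discards any pending modifier (a hypothetical scope rule).
    """
    results = []
    factor = 1.0
    for token in tokens:
        word = token.lower()
        if word in INVERTERS:
            factor *= -1
        elif word in INTENSIFIERS:
            factor *= INTENSIFIERS[word]
        elif word in ATTENUATORS:
            factor *= ATTENUATORS[word]
        elif word in PUNCTUATION:
            factor = 1.0
        elif word in LEXICON:
            results.append((word, clamp(LEXICON[word] * factor)))
            factor = 1.0
    return results


print(score_tokens("no es bueno".split()))  # [('bueno', -3.0)]
print(score_tokens("muy grave".split()))    # [('grave', -4.5)]
```

Real systems use richer scope detection (syntactic or window-based); the single pending multiplier here only keeps the example short.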
Lingmotif's output comprises two measures: the TSI (Text Sentiment Intensity), i.e. the ratio between items that express sentiment and items with no emotional value, and the TSS (Text Sentiment Score), the overall sentiment value of the text, expressed as an average of the positive, negative and neutral items it contains. Both are measured on a graded scale, conceived as a continuum from 0 to 100, which for the TSS runs from extremely negative (< 20) to extremely positive (> 80) and for the TSI from extremely factual (< 55) to extremely intense (> 85). The program assigns each lexical item (except function words) a positive valence (between 2 and 5), a negative valence (between -5 and -2) or a neutral one, and the TSI values reflect the proportion of assigned valences, taking the length of each text into account.
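The reading of the two scales can be made explicit with two small helpers. The sketch below encodes only the endpoint bands stated above; the intermediate labels used by Lingmotif are not given in this section, so every value in between (including the exact boundaries 20, 80, 55 and 85, since the text uses strict inequalities) is simply reported as intermediate. The function names are hypothetical.

```python
def describe_tss(tss: float) -> str:
    """Label a TSS value using only the endpoint bands given in the text."""
    if tss < 20:
        return "extremely negative"
    if tss > 80:
        return "extremely positive"
    return "intermediate"  # finer labels are not specified in this section


def describe_tsi(tsi: float) -> str:
    """Label a TSI value using only the endpoint bands given in the text."""
    if tsi < 55:
        return "extremely factual"
    if tsi > 85:
        return "extremely intense"
    return "intermediate"  # finer labels are not specified in this section


# TSI values reported in this study: 61 for the complaints without the plugin,
# 65 for the reference-corpus baseline and 92 for the complaints with the plugin.
for label, value in (("complaints, no plugin", 61),
                     ("baseline", 65),
                     ("complaints, plugin", 92)):
    print(f"{label}: TSI {value} -> {describe_tsi(value)}")
```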
Both complaints (23,826 words in total) were analysed together, since both seek to prosecute identical offences committed by the same people, with a slightly different chronology and course of events. The decision to merge the two texts into a single corpus also has functional and structural grounds. First, the Public Prosecutor's Office itself requests that the second complaint be joined to the preliminary proceedings opened by the first one and pending before the High Court of Justice of Catalonia (TSJC). Second, the two texts were compared with the computer-assisted translation tool SDL Trados Studio, and this comparison showed that the first complaint contains 9,587 words and the second 14,239 words, of which 7,649 are taken over and repeated from the first text.
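A quick arithmetic check of these figures, written in Python for consistency with the other sketches: the two complaints add up to the 23,826 words of the study corpus, and the 7,649 repeated words make up roughly 54% of the second complaint. The count of non-repeated text is our own derivation from the reported figures, not a number given in the study.

```python
first_complaint = 9_587
second_complaint = 14_239
repeated_from_first = 7_649  # words of the second complaint taken from the first

total = first_complaint + second_complaint
share_repeated = repeated_from_first / second_complaint
# Derived figure: text that is not a repetition of the first complaint.
non_repeated_total = first_complaint + (second_complaint - repeated_from_first)

print(total)                    # 23826
print(f"{share_repeated:.1%}")  # 53.7%
print(non_repeated_total)       # 16177
```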
It should be noted that Lingmotif's dictionary (which, for Spanish, includes 207,000 words and 300 context rules; Moreno-Ortiz, 2017: 137) is designed to analyse sentiment in standard-register texts, although it allows users to add their own specific lexicons as a supplementary lexicon (plugin), which makes it possible to analyse the affective load of specialised genres. Since semantic orientation depends on the specialised domain (Moreno-Ortiz & Fernández-Cruz, 2015: 332), and in the absence of a statistical extractor able to identify term candidates in a corpus of complaints with reasonable reliability, the analysis required manually building a specific plugin able to detect the polarity of the prosecution's vocabulary in both documents. The criteria for building the plugin are detailed in the next section; what matters here is that the sentiment of both texts was first analysed with and without the supplementary dictionary, with the following results:
Fig. 1. Sentiment analysis of both complaints without the plugin
Fig. 2. Sentiment analysis of both complaints with a specific plugin
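How a user plugin can complement a base dictionary is sketched below under explicit assumptions: both lexicons are modelled as plain Python dictionaries mapping a (possibly multiword) expression to a valence, and plugin entries override base entries when the same key occurs in both. This illustrates the idea of a supplementary lexicon, not Lingmotif's actual file format or precedence rules. The base entries are invented; the plugin valences (-2 and -5) follow the scheme described in § 4.3.

```python
from collections import ChainMap

# Hypothetical base dictionary (standard-language entries, invented values).
base_lexicon = {
    "desprecio": -3,
    "voluntad": 0,
    "permitir el debate y la votación": 0,
}

# Hypothetical domain plugin, using the -2 / -5 valences described in the study.
plugin_lexicon = {
    "permitir el debate y la votación": -2,
    "con total desprecio de la constitución": -5,
}

# ChainMap looks keys up in plugin_lexicon first and then in base_lexicon,
# so plugin entries take precedence without modifying either dictionary.
combined = ChainMap(plugin_lexicon, base_lexicon)

for expression in ("desprecio",
                   "permitir el debate y la votación",
                   "con total desprecio de la constitución"):
    print(expression, combined[expression])
```

Keeping the two lexicons separate and layering them at lookup time makes it easy to rerun an analysis with and without the plugin, as was done for Figs. 1 and 2.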
At first sight, once the specific lexicon is added, the application returns fairly similar TSS values (with a slight shift from neutral to slightly negative) but very different TSI values, reaching an extremely intense overall sentiment index (92) in the second analysis (Fig. 2).
To interpret these data correctly in relation to the genre under study, a baseline was calculated, i.e. the values that express the statistical norm of TSS and TSI in a reference corpus of 63 complaints filed for a wide range of offences (629,893 words in total):
Fig. 3. TSS and TSI baseline in the reference corpus
The indicators in Fig. 3, obtained with Lingmotif's dictionary, show that the complaint genre, whose primary function is to bring an accusation against someone, tends to be characterised by fairly negative vocabulary and by an affective load that tends towards intensification. Depending on the seriousness of the offence attributed to the defendant, the involvement of the subjectivity of the procurador (the party's legal representative) in the personal circumstances of the client, together with the display of a certain empathy between the animator and the author of the text, are discursive strategies already observed in Spanish complaints and generally absent from parallel Italian texts (Garofalo, 2009: 175). It is worth noting that, in the 63 complaints of the reference corpus, the procurador who drafts the text and the complainant are in a professional relationship between private parties, in which the lawyer is paid to represent the client's interests with due forcefulness. The case at hand is somewhat different: here the accusing party is the Public Prosecutor's Office, an institutional figure that acts as guardian of legality and usually intervenes with greater balance and with the impartiality that befits (or should befit) its functions. Hence the average of positive, negative and neutral lexical items (TSS) in both texts of the Catalan Public Prosecutor's Office (Fig. 1) tends towards greater neutrality, lying four points above the baseline (Fig. 3). This does not mean, however, that the prosecutor argues the case any less harshly, as the fairly intense TSI value (Fig. 1), just four points below the baseline, appears to show.
It should be stressed that the results discussed so far do not change significantly if the sentiment analysis of the reference corpus is repeated with the plugin built for the two complaints against Forcadell: the TSS merely rises by one point (38), while the TSI drops by one unit (64). This result is easily explained: the plugin is designed to detect the affective load of vocabulary related to the offences of disobedience and malfeasance, and the reference corpus contains no cases falling under the same legal classification.
Finally, it is worth noting that, when the reference corpus is analysed with Lingmotif, two texts show a TSI of 100: one is a complaint concerning an offence in which the Partido Popular was the injured party, and the other concerns offences arising from a bill promoted by the same party.³ Just below the maximum TSI in the reference corpus are a complaint for insult (injurias) filed by the Basque president (lehendakari) Ibarretxe against a journalist from El País (TSI = 97) and another filed by a group of Spanish citizens against the former Partido Popular prime minister José María Aznar for offences against persons and property protected in the event of armed conflict (TSI = 94). The quantitative data thus seem to indicate that overall textual sentiment becomes markedly intense when judicial and political discourse hybridise in ideologically polarised texts.
3
In the first case, the complainant is the Partido Popular, against the secretary general of the Federación Socialista Madrileña, for the offences of insult, slander, coercion and threats. In the second case, the complaint (for insult) was filed by the feminist association Tertulia Feminista ‘Les Comadres’ against the bishop of Alcalá de Henares, following a protest against the reform of the abortion law promoted by the Minister of Justice Alberto Ruiz Gallardón.
4. Building the plugin and analysis of the selected terms
The 328 terms of positive or negative polarity related to the offences of disobedience and malfeasance that were included in the plugin were selected manually after a careful reading of both texts. Despite the margin of error inherent in any manual analysis, the construction of the plugin followed this hybrid approach, at once statistical and semantic:
1. Creation of a list of keywords ranked by keyness;
2. Identification of single-word and multiword terms of positive and negative polarity, following an onomasiological criterion. Examination of the concordances of the key terms and of their respective collocates and clusters;
3. Verification that the identified terms appear in the keyword list and are not already included in Lingmotif's dictionary. Determination of the polarity these items take on in general language and in the criminal-law domain;
4. Classification of the resulting evaluative vocabulary into four subgroups: a) terms expressing affective load towards the allegedly unlawful conduct of the defendants; b) terms evaluating normative products (e.g. resolutions passed by the Catalan Parliament that derive from the criminogenic conduct); c) lexical and grammatical resources through which the prosecutor's voice positions itself intersubjectively (e.g. polyphonic structures, modal verbs, negations and evidentials); d) scalar evaluations. All the axiological items identified in this way proved compatible with the discourse scenography and with the scripts of the offences of disobedience and continued malfeasance (§ 1).
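Step 3 of the procedure amounts to a pair of set checks, and step 4 to a manual labelling of the items that survive. The Python sketch below assumes that the keyword list, the manually identified candidates and the base dictionary are available as sets of lower-case strings. All three sets are toy stand-ins: the sample items come from examples discussed in this paper, but which of them appear in Lingmotif's dictionary is a hypothetical assumption. For multiword candidates we also assume that the term qualifies if any of its words is a keyword; the subgroup labels are illustrative.

```python
# Toy stand-ins; real inputs would come from AntConc and Lingmotif.
keywords = {"desobediencia", "voluntad", "eludir", "ignorar", "torticeramente"}
base_dictionary = {"ignorar", "bueno", "malo"}  # hypothetical base entries

# Manually identified candidates (step 2), single-word or multiword.
candidates = {
    "voluntad inequívoca e irreversible",
    "eludir",
    "ignorar",
    "torticeramente",
    "reunión",  # not a keyword: should be discarded
}


def passes_step3(term: str) -> bool:
    """Keep a term if any of its words is a keyword (assumed criterion for
    multiword terms) and the whole term is not already in the base dictionary."""
    in_keyword_list = any(word in keywords for word in term.split())
    return in_keyword_list and term not in base_dictionary


selected = sorted(t for t in candidates if passes_step3(t))

# Step 4: manual assignment to the four subgroups a) to d).
SUBGROUPS = {
    "a": "conduct of the defendants",
    "b": "normative products",
    "c": "intersubjective positioning",
    "d": "scalar evaluations",
}
manual_labels = {  # illustrative labels assigned by the analyst
    "voluntad inequívoca e irreversible": "a",
    "eludir": "a",
    "torticeramente": "a",
}

for term in selected:
    print(term, "->", SUBGROUPS[manual_labels[term]])
```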
The first stage of this methodology therefore consists of extracting the 1,965 relevant words of both complaints, obtained with the Keyword List function of AntConc (Anthony, 2014), which compares the two texts of the Catalan Public Prosecutor's Office with the reference corpus. Semantically empty lexical items (function words) and acronyms typical of the judicial domain in question (e.g. LOTC, Ley Orgánica del Tribunal Constitucional; CE, Constitución española; etc.) were then removed from these words. After this clean-up, the top 50 keywords of both complaints, ranked by keyness, are the following:
Cataluña, parlamento, constituyente, resolución, constitucional, proceso,
Constitución, presidenta, desobediencia, votación, estudio, parlamentaria,
mandatos, referéndum, incidente, paralizar, resoluciones, tribunal, mandato, poderes, catalán, desconexión, pleno, parlamentarios, parlamentarias,
cumplimiento, propuestas, boletín, parlamentario, julio, eludir, Carme,
Forcadell, suponga, ordenamiento, ignorar, voluntad, parlament, providencia, creación, inviolabilidad, democrático, iniciativa, conclusiones, inconstitucional, impugnación, negativa, decisiones, suspensión, soberanía
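Keyness scores of this kind can be computed with the log-likelihood (G²) statistic that is widely used for keyword extraction. The sketch below is a generic implementation, not AntConc's code: the study does not report which keyness statistic or settings were used, and the stopword and acronym sets are hypothetical stand-ins for the clean-up step described above. Because G² is unsigned, a separate check keeps only positive keywords, i.e. words relatively more frequent in the study corpus than in the reference corpus.

```python
import math


def log_likelihood(freq_study: int, freq_ref: int,
                   size_study: int, size_ref: int) -> float:
    """Log-likelihood (G2) keyness of a word in a study vs a reference corpus."""
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    g2 = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # the term 0 * ln(0) is taken as 0
            g2 += observed * math.log(observed / expected)
    return 2 * g2


def is_positive_keyword(freq_study, freq_ref, size_study, size_ref) -> bool:
    """True if the word is relatively more frequent in the study corpus."""
    return freq_study / size_study > freq_ref / size_ref


# Hypothetical clean-up sets (function words and judicial acronyms).
STOPWORDS = {"de", "la", "que", "el", "en", "y"}
ACRONYMS = {"lotc", "ce", "stc", "atc"}


def rank_keywords(study_counts, ref_counts, size_study, size_ref):
    """Return positive keywords sorted by descending keyness, after clean-up."""
    scored = []
    for word, f_study in study_counts.items():
        if word in STOPWORDS or word in ACRONYMS:
            continue
        f_ref = ref_counts.get(word, 0)
        if is_positive_keyword(f_study, f_ref, size_study, size_ref):
            g2 = log_likelihood(f_study, f_ref, size_study, size_ref)
            scored.append((word, g2))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Corpus sizes are those reported in the study; the word counts are invented.
study = {"de": 500, "desobediencia": 40, "votación": 30, "stc": 20, "tribunal": 5}
ref = {"de": 30000, "desobediencia": 2, "votación": 10, "stc": 100, "tribunal": 800}
for word, score in rank_keywords(study, ref, 23_826, 629_893):
    print(f"{word}\t{score:.3f}")
```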
As expected, the list thus obtained contains two classes of words (Baker, 2006: 127): proper nouns that identify the spatio-temporal setting, the discourse domain (specifically, the scenography and crime scripts described in § 1) and the main protagonist of the events under prosecution (Cataluña, julio, Constitución, Carme, Forcadell), plus a series of keywords related to the central topic (aboutness keywords). Starting from the latter, the actual polarity of the term candidates to be included in the plugin is assessed by analysing the concordances of each of them, their collocates and their right-hand clusters.
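Right-hand clusters are n-grams that begin with the node word. A minimal Python sketch, assuming a naive lower-case whitespace tokenisation (AntConc's own tokeniser and settings are not reproduced here); the sample string is constructed from phrases quoted in this paper, not taken from the corpus.

```python
from collections import Counter


def right_clusters(text: str, node: str, size: int = 3) -> Counter:
    """Count n-grams of length `size` that start with `node` (right-hand clusters)."""
    tokens = text.lower().split()
    node = node.lower()
    clusters = Counter()
    for i, token in enumerate(tokens):
        if token == node and i + size <= len(tokens):
            clusters[" ".join(tokens[i:i + size])] += 1
    return clusters


sample = ("voluntad de incumplir los mandatos y voluntad de desobedecer "
          "con voluntad de incumplir las providencias")
print(right_clusters(sample, "voluntad", size=3).most_common(2))
# [('voluntad de incumplir', 2), ('voluntad de desobedecer', 1)]
```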
The semantic classification of these key terms draws on the three domains of Appraisal Theory (Martin, 2000, 2003; Martin & Rose, 2007; White, 2001, 2003; Martin & White, 2005), namely attitude, engagement and graduation. As is well known, attitude covers the meanings through which the speaker attributes a value or an intersubjective evaluation to the behaviour of the defendants in relation to criminal law and to the products of their actions. Engagement covers the linguistic resources used to position the prosecutor's voice in relation to the various proposals or initiatives of the pro-independence parties mentioned in the text. Finally, graduation represents a scalar semantic space related to the way in which the prosecutor intensifies or attenuates the force of the statements or grades the focus of the semantic categorisations.
An exhaustive analysis of every single term selected for the plugin would far exceed the space available here; the following subsections therefore illustrate some representative cases for each semantic domain.
4.1. Lexical items expressing attitude
Within the domain of attitude, the prosecution confines itself to evaluating the behaviour of the defendants (the subdomain of judgement) and the products of their actions, namely the disconnection process and the Parlament's legislative output aimed at holding a binding referendum in Catalonia (the subdomain of appreciation).
As regards judgement, from the prosecutor's perspective a series of behaviours constitute offences and are negatively evaluated because, ignoring the repeated warnings of the Constitutional Court (TC), they infringe art. 410.1 (disobedience) and art. 404 (continued malfeasance) of the Criminal Code (CP). It is therefore unsurprising that in the keyword list desobediencia (‘disobedience’, keyness 186.754) comes immediately after presidenta (‘president’, keyness 200.417). Indeed, it is enough to look at the clusters within a span of 15 words to the right of the president's first name and surname to find forceful evaluations such as the following (key evaluative items given in Spanish in brackets):
The President of the Parliament of Catalonia, Ms Carme Forcadell i Lluís, displaying an unequivocal and irreversible will [voluntad inequívoca e irreversible] to carry forward [llevar adelante] her political project by way of faits accomplis [por la fuerza de los hechos consumados], in total contempt [con total desprecio] of the 1978 Constitution, of the legal order deriving from it, and of the rulings of the Constitutional Court judgment (STC) of 2 December 2015 and the Constitutional Court order (ATC) of 19 July 2016, proceeded to give impetus to the constituent process laid down in Resolution 1/XI.
The conduct of Ms Carme Forcadell, who with her vote allowed the debate and vote [permitió el debate y votación] on the proposals registered under numbers [...] shows even more clearly [aún más] her stubborn and obstinate will [contumaz y obstinada voluntad] to breach the constitutional mandates [...].
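The 15-word right-hand span can be sketched as a window search: for every occurrence of a node (here the surname Forcadell), collect the words among the following 15 tokens that belong to a set of evaluative items. The evaluative set below is a hypothetical subset drawn from the first quotation above, and the tokenisation is deliberately naive (lower case, punctuation stripped).

```python
import re


def right_window_hits(text: str, node: str, evaluative: set, span: int = 15) -> list:
    """Return evaluative words found within `span` tokens to the right of each node."""
    tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware for str patterns
    node = node.lower()
    hits = []
    for i, token in enumerate(tokens):
        if token == node:
            window = tokens[i + 1:i + 1 + span]
            hits.extend(word for word in window if word in evaluative)
    return hits


quote = ("La Sra. Presidenta del Parlamento de Cataluña, Carme Forcadell i Lluís, "
         "manifestando una voluntad inequívoca e irreversible de llevar adelante "
         "su proyecto político por la fuerza de los hechos consumados, con total "
         "desprecio de la Constitución")
EVALUATIVE = {"inequívoca", "irreversible", "desprecio", "contumaz", "obstinada"}

print(right_window_hits(quote, "Forcadell", EVALUATIVE))  # ['inequívoca', 'irreversible']
```

With the 15-token span, desprecio falls just outside the window; widening the span (e.g. `span=30`) also captures it, which shows how sensitive such counts are to the chosen span.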
Note that the defendant's criminal behaviour is expressed through lexical items that carry no affective load in standard language (e.g. permitir el debate y la votación ‘allow the debate and the vote’, llevar adelante su proyecto político ‘carry forward her political project’) and that, from the pro-sovereignty perspective, are the very essence of the Catalan chamber's right to decide. These expressions have nevertheless been included in the plugin, not only because they are the core criminogenic actions that prompt the complaints, but also because they co-occur with adjectival and adverbial modalisers of unequivocally negative polarity that convey the prosecutor's sharpest reproach (voluntad inequívoca e irreversible, aún más, con total desprecio, por la fuerza de los hechos consumados, su contumaz y obstinada voluntad).
Moreover, in most concordance lines the keyword voluntad (‘will’) refers to the defendants' intentions and therefore has a markedly negative semantic prosody (e.g. voluntad obstativa, rebelde, de incumplir los mandatos, de desobedecer, de no dar cumplimiento a las decisiones, i.e. an obstructive or rebellious will, a will to breach the mandates, to disobey, not to comply with the decisions, etc.). Likewise, the adverb constitucionalmente (‘constitutionally’) is almost always used to evaluate Forcadell's conduct critically (ser constitucionalmente ilegítimo, ilícito; no resultar constitucionalmente admisible) and helps to build the underlying conceptual metaphor (Lakoff & Johnson, 2003) “the sovereignty movement breaks legality”.
While the words with the highest keyness represent the thematic nodes of both texts and express ideational meanings, less frequent ones may encapsulate connotative or metaphorical interpersonal meanings. For example, the noun ardid (‘ruse’), the adjectives camuflada (‘camouflaged’, in camuflada retórica) and voluntarioso (‘wilful’), the verb enmascarar (‘to mask’) and the adverb torticeramente (‘deviously’), which rank 1,136th, 684th, 428th, 599th and 1,041st in the keyword list respectively, contain lexicalised metaphors that discredit the defendant's ethos and call her institutional integrity into question:
It is these acts of the Presidency, deviously [torticeramente] using the Rules of Procedure of the Chamber, that harm the protected legal interest.
[Forcadell replaces enforcement of the Constitutional Court judgment] with a wilful [voluntarioso] exchange of arguments with which to mask [enmascarar] the disobedient conduct […].
The alleged ruse [pretendido ardid] devised to avoid the intervention of the Bureau (Mesa) and to shift any responsibility onto an irresponsible Plenary is nothing but camouflaged rhetoric [camuflada retórica] serving non-compliance.
On several occasions it was necessary to invert the polarity that Lingmotif assigns by default to some sentiment-bearing items, e.g. to the expression ardid elucubrado (‘devised ruse’). In fact, the program assigns positive polarity to any logical subject of an implicative verb such as evitar (‘to avoid’) (Sbisà, 2007: 59-62), which usually triggers the presupposition that the avoided consequence is bad and the cause that avoids it is good.
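One common way for a multiword plugin entry such as ardid elucubrado to take precedence over the valences of its parts is greedy longest-match lookup: at each position, the longest expression found in the lexicon wins. The Python sketch below illustrates that technique with a hypothetical toy lexicon. It does not describe how Lingmotif resolves conflicts between its own rules and a plugin.

```python
def longest_match(tokens, lexicon, max_len=4):
    """Greedy left-to-right lookup that prefers the longest lexicon expression."""
    matches = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            expression = " ".join(tokens[i:i + n])
            if expression in lexicon:
                matches.append((expression, lexicon[expression]))
                i += n
                break
        else:
            i += 1  # no entry starts at this token; move on
    return matches


# Hypothetical valences: one part looks positive on its own,
# while the plugin entry for the whole expression is negative.
toy_lexicon = {"ardid": -2, "elucubrado": 1, "ardid elucubrado": -4}

tokens = "el pretendido ardid elucubrado para evitar la intervención".split()
print(longest_match(tokens, toy_lexicon))  # [('ardid elucubrado', -4)]
```

Without the multiword entry, the same lookup would return the two parts separately, with their conflicting valences.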
The subdomain of appreciation, by contrast, covers the set of evaluations of the Parlament's ‘products’ (e.g. Resolution 1/XI of the Parliament of Catalonia on the start of the political process in Catalonia) or of the disconnection process. Note that both resolución (‘resolution’) and proceso (‘process’) have very high keyness values (526.149 and 421.294), and their right-hand clusters contain lexical items that display an openly censorious attitude:
The resolution [...] is not the result of applying the Constitution but, purely and simply, the product of [the Parlament's] freedom, unreasonably [irrazonablemente] turned into a source of particular norms.
By ratifying and adopting as its own the conclusions approved by the said parliamentary committee, the Parliament of Catalonia evades [elude] the rulings of STC 259/2015 and ignores [ignora] the warnings of ATC 141/2016, since it seeks [pretende] to give continuity and support to the so-called “constituent process in Catalonia” aimed at its disconnection [desconexión] from the Spanish State.
Concordance analysis reveals that resolución and proceso, neutral words in standard Spanish, become negatively charged, showing a marked semantic preference for co-occurring with lexical items that point to improper or unlawful conduct (irrazonablemente, eludir, ignorar, pretender, desconexión).
4.2. Lexical items expressing engagement
The semantics of engagement presupposes a heteroglossic reading of both texts of the prosecution, whose argumentative scaffolding is built on the voice of the opponent, with which the speaker argues in continuous dialectical tension. From a polyphonic and interactionist perspective, by using a modal verb such as deber (‘must’), the prosecutor does not merely express a logical-deontic meaning but also conveys rejection of and hostility towards the defendants' position. Note, for instance, how the modal debe proves useful for attacking the position of Forcadell, who appeals to her inviolability and invokes an elastic interpretation of the law:
Constitutional Court judgment STC no. 51/1985 of 10 April established that everything affecting parliamentary prerogatives must [debe] be interpreted strictly, since inviolability does not cover every action of a parliamentarian, even one of political relevance.
Thus, while frequent direct quotations from the case law of the Constitutional Court or the Supreme Court function as mechanisms that back up the thesis being defended, the defendants' point of view can be evaluated and neutralised more indirectly.
Turning to the interpersonal meaning of modal verbs, poder (‘can’) appears very frequently (in 19 of its 26 occurrences) in counter-arguments that point to the prosecutor's complete unwillingness to negotiate with the opponent's view. For this reason, it tends to combine with the negative adverb no or with negative-polarity words (en ningún caso, ‘in no case’):
It cannot [No puede] be argued, in order to deny the disobedience, that the defendant or her advisers came to the conclusion that what was done did not breach the orders of the Constitutional Court […].
Inviolability cannot [no puede] be conceived as a shelter for arbitrariness; rather, parliamentary acts remain subject to the Spanish Constitution.
The legal order, with the Constitution at its apex, can in no case [en ningún caso puede] be regarded as a limit on democracy, but rather as its very guarantee.
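The proportion reported above (19 of 26 occurrences of poder in negative counter-arguments, about 73%) could be approximated automatically by checking whether a form of the verb is preceded, within a short window, by a negator. The sketch below assumes a small hypothetical list of forms of poder, a hypothetical list of negators including the multiword en ningún caso, and a three-token look-back. It would miss longer scopes and other constructions, which is one reason why the manual reading of concordances remains necessary.

```python
import re

PODER_FORMS = {"puede", "pueden", "podrá", "podría", "poder"}  # hypothetical subset
NEGATORS = {"no", "nunca", "en ningún caso"}                   # hypothetical list


def negated_poder(text: str, lookback: int = 3):
    """Return (negated, total) counts for forms of 'poder' in `text`."""
    tokens = re.findall(r"\w+", text.lower())
    negated = total = 0
    for i, token in enumerate(tokens):
        if token in PODER_FORMS:
            total += 1
            left = " ".join(tokens[max(0, i - lookback):i])
            if any(re.search(rf"\b{re.escape(neg)}\b", left) for neg in NEGATORS):
                negated += 1
    return negated, total


sample = ("No puede alegarse para negar la desobediencia. "
          "La inviolabilidad no puede concebirse como cobijo. "
          "El ordenamiento en ningún caso puede ser considerado como límite. "
          "El Parlamento puede debatir.")
print(negated_poder(sample))  # (3, 4)
print(f"{19 / 26:.0%}")       # 73%, the proportion reported in the text
```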
The semantic relevance of these negations is confirmed not only by the presence among their constituents of keywords such as alegarse, concebirse or considerado (ranked 1,328th, 218th and 1,299th in the keyword list respectively), but also by their proximity to core concepts expressed by words with higher keyness, e.g. desobediencia, constitución, inviolabilidad, ordenamiento, which are fully consistent with the crime scripts activated by the prosecutor and rank among the top 50 keywords.
Likewise, the opponent's voice is evoked and negatively evaluated by some lexico-grammatical items with evidential value and by certain orthographic resources such as quotation marks (e.g. “el denominado ‘proceso constituyente’ en Cataluña”, ‘the so-called “constituent process” in Catalonia’; “una supuesta legitimidad democrática”, ‘an alleged democratic legitimacy’; “el pretendido ardid elucubrado para evitar la intervención de la Mesa”, ‘the alleged ruse devised to avoid the Bureau's intervention’).
The evaluative items through which the prosecutor refers to the defendants' position are, in their overwhelming majority, “proclamations”, that is, implicitly polyphonic utterances by which the speaker strengthens the commitment to the propositional content of the assertions. This is an option of “closed intra-vocalisation” (White, 2001: 25), which evokes the opponent's voice in order to discredit and suppress it, limiting the scope for interaction with ideological diversity:
The constitutional text reflects the manifestations of the democratic principle, which cannot be exercised outside it [STC 42/2014]. Therefore, the legal order, with the Constitution at its apex, can in no case be regarded as a limit on democracy, but rather as its very guarantee (FJ 50). […].
4.3. Values indicating graduation
Evaluations expressed on a scale of degree serve to emphasise the interpersonal force the prosecutor attributes to the statements or to sharpen the focus of the evaluations. The affective load is amplified, for example, through focusing or measuring adverbs in -mente (Pinuer Rodríguez & Oteíza Silva, 2015: 112-116). Focusing adverbs (e.g. estrictamente, precisamente, meramente) make explicit that the entity singled out is ranked among several possible ones and establish a relation “between their focus and the set of possible alternatives with which they are explicitly or tacitly contrasted” (NGLE, 2009: 2992). The focus-sharpening adverb with the highest keyness (8.464) is claramente (‘clearly’): it occurs 10 times and has a negative prosody in 9 of them (contravenir claramente los mandatos, adoptar acuerdos claramente contrarios, lesionar claramente el bien jurídico, desbordar claramente los estrechos márgenes de la excusa absolutoria). The interpersonal meaning of this item is to ‘narrow the focus of the evaluation’, assigning the narrated facts to typical conduct defined and punished in the Criminal Code.
Measuring adverbs, on the other hand, are scalar or presuppositional quantifiers that increase the force of the intersubjective positioning, since they are formed from axiological adjectives (absoluto, sobrado). The quantification they express places an item within a set, where it is distinguished by its scalar position, which is usually established on pragmatic grounds that depend on the speaker's subjectivity. The prosecutor opts for measuring adverbs that mark the highest degree on the negative scale; those in the keyword list include, e.g., absolutamente (‘absolutely’, 0.742) and sobradamente (‘amply’, 0.953):
The activity of the committee created is absolutely [absolutamente] unviable unless it is understood to be conditional on compliance with the requirements of the Constitution.
Resolution 1/XI […] amply [sobradamente] exceeds the limits that [the Constitutional Court] imposed on the Study Committee.
As can be seen, the principle of graduation of force operates intrinsically in attitudinal values, in the sense that each attitudinal meaning represents a specific point on a scale of intensity from lower to higher. In building the plugin, terms that simply identify a criminal behaviour (e.g. celebrar el referéndum ‘hold the referendum’, ejecutar la acción típica ‘carry out the typified act’, permitir la alteración del orden del día ‘allow the agenda to be altered’) were assigned a valence of -2, whereas items expressing the highest degree of reproach or the greatest risk to the constitutional order (con total desprecio de la Constitución ‘in total contempt of the Constitution’, creación de un Estado catalán ‘creation of a Catalan State’) were assigned a valence of -5.
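The two-level valence scheme just described (-2 for expressions that merely name a criminal behaviour, -5 for expressions of maximal reproach or risk) can be written as a small table-driven step, with a guard that rejects values outside the negative range of -5 to -2 if the table is ever edited. The category names and the tab-separated output are assumptions for illustration; they are not Lingmotif's documented plugin format.

```python
# Category -> valence, following the scheme described in the text.
VALENCE_BY_DEGREE = {
    "names_offence": -2,     # e.g. celebrar el referéndum
    "maximal_reproach": -5,  # e.g. con total desprecio de la Constitución
}


def build_plugin_lines(entries):
    """Turn (expression, degree) pairs into 'expression<TAB>valence' lines."""
    lines = []
    for expression, degree in entries:
        valence = VALENCE_BY_DEGREE[degree]
        # Guard against editing mistakes in VALENCE_BY_DEGREE.
        if not -5 <= valence <= -2:
            raise ValueError(f"valence {valence} outside the negative range -5..-2")
        lines.append(f"{expression}\t{valence}")
    return lines


entries = [
    ("celebrar el referéndum", "names_offence"),
    ("ejecutar la acción típica", "names_offence"),
    ("con total desprecio de la Constitución", "maximal_reproach"),
    ("creación de un Estado catalán", "maximal_reproach"),
]
for line in build_plugin_lines(entries):
    print(line)
```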
5. Conclusions
This paper has offered a quantitative and qualitative analysis of the sentiment vocabulary used by the Catalan Public Prosecutor's Office in the two complaints against Carme Forcadell i Lluís, President of the Catalan Parliament. At first sight, the quantitative data obtained with Lingmotif indicate that in both texts the prosecutor displays a fairly intense affective load, with a TSI value (61) close to the baseline (65) calculated on a reference corpus of 63 complaints (629,893 words in total). A closer semantic analysis of the lexico-grammatical items, however, shows that Lingmotif's vocabulary, designed for standard-language analysis, fails to detect several negative-polarity lexical items that punctuate the prosecutor's argument. It was therefore necessary to build a specific supplementary vocabulary (plugin), which the application allows users to add. The single-word and multiword terms in the plugin were selected manually, taking into account their consistency with the cognitive framing of the text (scenography and crime scripts) and their keyness, and their axiological value was systematised according to the three semantic domains of Appraisal Theory (attitude, engagement and graduation). With the specific plugin, a surprisingly high sentiment intensity value (92) was obtained, similar to the TSI of some complaints in the reference corpus in which judicial discourse hybridises with political discourse, which suggests that both complaints against Forcadell are examples of the politicisation of justice.
Despite the small size of the study corpus, building the plugin made it possible to quantify the evaluative load of lexical items that have neutral polarity in general language but take on a clearly negative meaning within the scenography and script activated by the prosecutor (e.g. permitir el debate y la votación; llevar adelante un proyecto político; proceso constituyente en Cataluña). In this way, the specific cognitive dimension of this criminal domain could be integrated into the sentiment analysis carried out with Lingmotif. In addition, the qualitative analysis proved essential for selecting the items included in the plugin according to a semantic-functional criterion. In particular, the three categories that underpin Appraisal Theory provided the classification scheme for the items of the supplementary lexicon, making it possible, for example, to include items that define not only the unlawful conduct of the defendants but also the prosecutor's intersubjective positioning (negation and counter-argument values, among others), graduation values and implicitly polyphonic utterances.
The extraction of term candidates could have been attempted with a statistical extractor based on machine-learning algorithms that compare word frequencies in a specific domain with those in a general corpus, combining statistical measures with various heuristic techniques (Moreno-Ortiz & Fernández-Cruz, 2015: 333). It is quite likely, however, that such a statistical approach would not have produced fully satisfactory results. As noted above, most of the items added to the plugin belong to the semi-technical vocabulary, made up "of lexical units of the common language that have acquired one or more new meanings within legal Spanish" (Alcaraz Varó & Hughes, 2002: 59) through a process of resemanticisation. Moreover, the affective load associated with the lexicogrammatical resources analysed appears to depend largely on the argumentative context of use. Hence the need to develop increasingly fine-grained legal ontologies that support the automatic management and sentiment analysis of procedural documents.
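The statistical extractors mentioned above rank candidate terms by comparing word frequencies in a domain corpus with those in a general reference corpus. One common way of doing this is keyness scored with Dunning's log-likelihood (G2). The sketch below is our illustration, not the algorithm of Moreno-Ortiz & Fernández-Cruz (2015), and the two toy corpora are invented. A frequency-based score of this kind only captures items that are unusually frequent in the domain corpus. It has no access to the context-dependent resemanticisation discussed above.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Dunning's log-likelihood (G2) for a word with frequency a in a study
    corpus of c tokens and frequency b in a reference corpus of d tokens."""
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    total = 0.0
    if a > 0:  # the term 0 * log(0) is taken as 0
        total += a * math.log(a / expected_study)
    if b > 0:
        total += b * math.log(b / expected_ref)
    return 2 * total


def keywords(study_tokens, reference_tokens, min_freq=2):
    """Rank words that are relatively more frequent in the study corpus
    (positive keywords) by log-likelihood, highest first."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    c, d = sum(study.values()), sum(ref.values())
    scored = []
    for word, a in study.items():
        b = ref[word]  # Counter returns 0 for unseen words
        if a >= min_freq and a / c > b / d:
            scored.append((word, round(log_likelihood(a, b, c, d), 2)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented toy corpora, for illustration only.
study = "querella fiscal querella desobediencia parlamento fiscal".split()
reference = "el parlamento aprobó la ley el fiscal habló".split()
print(keywords(study, reference))
```

In real use the reference corpus would be far larger than the study corpus, and a significance threshold (for instance 3.84 for p < 0.05 with one degree of freedom) would typically be applied before candidates are inspected manually.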
References
Alcaraz Varó, Enrique & Hughes, Brian. 2002. El español jurídico. Barcelona:
Ariel Derecho.
Baker, Paul. 2006. Using Corpora in Discourse Analysis. London/New York:
Continuum.
Biber, Douglas. 2009. A corpus-driven approach to formulaic language: Multi-word patterns in speech and writing. International Journal of Corpus
Linguistics 14: 275-311.
Charaudeau, Patrick & Maingueneau, Dominique. (2002) 2005. Diccionario
de análisis del discurso. Buenos Aires: Amorrortu.
Díaz Rojo, José Antonio. 2010. El lenguaje valorativo en noticias periodísticas
españolas sobre avances médicos. Tonos 20. https://www.um.es/tonosdigital/znum20/secciones/estudios-5-el_lenguaje_valorativo_en_noticias.htm [Accessed 10/08/2017].
Garofalo, Giovanni. 2009. Géneros discursivos de la justicia penal. Milano:
FrancoAngeli.
Goffman, Erving. 1981. Forms of Talk. Philadelphia: University of Pennsylvania Press.
Halliday, M.A.K. & Hasan, Ruqaiya. 1985. Language, Context and Text: Aspects of Language in a Social-Semiotic Perspective. Oxford: Oxford
University Press.
Lakoff, George & Johnson, Mark. 2003. Metaphors We Live By. Chicago/London: The University of Chicago Press.
Liu, Bing. 2012. Sentiment Analysis and Opinion Mining. San Rafael, CA:
Morgan & Claypool Publishers.
López Ferrero, Carmen. 2008. La valoración y la emoción en español en discursos especializados. In Moreno Sandoval, Antonio (ed.) El valor de
la diversidad (meta)lingüística: Actas del VIII congreso de Lingüística General. http://www.lllf.uam.es/clg8/actas/index.html [Accessed
10/08/2017].
Maingueneau, Dominique. 1993. Le contexte de l’oeuvre littéraire. Énonciation, écrivain, société. Paris: Dunod.
Martin, James R. & Rose, David. 2007. Working with Discourse. London/New
York: Continuum.
Martin, James R. & White, Peter R.R. 2005. The Language of Evaluation: Appraisal in English. London/New York: Palgrave Macmillan.
Martin, James R. 2000. Beyond Exchange: APPRAISAL Systems in English.
In Hunston, S. & Thompson, G. (eds), Evaluation in Text. Oxford:
Oxford University Press.
Martin, James R. 2003. Introduction. Text 23(2): 171-181.
McNeill, Patrick. 1990. Research Methods. London: Routledge.
Moreno-Ortiz, Antonio & Fernández-Cruz, Javier. 2015. Identifying polarity
in financial texts for sentiment analysis: a corpus-based approach. Procedia. Social and Behavioral Sciences 198: 330-338.
Moreno-Ortiz, Antonio. 2017. Lingmotif: A User-focused Sentiment Analysis
Tool. Procesamiento del Lenguaje Natural, Revista 58: 133-140.
Nerhot, Patrick. 1990. The law and its reality. In Nerhot, P. (ed.) Law, interpretation and reality. Dordrecht/Boston/London: Kluwer, 50-69.
Pinuer Rodríguez, Claudio & Oteíza Silva, Teresa. 2015. Los adverbios en
-mente como factor de valoración en el discurso de la historia. Verba
42: 99-134.
Real Academia Española & Asociación de Academias de la Lengua Española. 2009. Nueva Gramática de la lengua española. Madrid: Espasa.
(NGLE).
Sbisà, Marina. 2007. Detto e non detto. Roma/Bari: Laterza.
Schank, Roger & Abelson, Robert. 1987. Guiones, planes, metas y entendimiento: un estudio de las estructuras del conocimiento humano. Barcelona: Paidós [1977].
Serpa, Cecilia. 2011. Significados interpersonales en los géneros legislativos:
el texto como macropropuesta. Pragmalingüística 19: 96-114.
Taranilla, Raquel. 2012. La Justicia Narrante. Cizur Menor: Aranzadi.
Tognini-Bonelli, Elena. 2001. Corpus Linguistics at Work. Amsterdam/Philadelphia: John Benjamins.
Weber, Max. (1921) 1977. Economía y sociedad. México: Fondo de Cultura
Económica.
White, Peter R.R. 2001. An introductory tour through appraisal theory. The
Appraisal Website. http://www.grammatics.com/appraisal/ [Accessed
29/03/2017].
White, Peter R.R. 2003. Beyond modality and hedging: A dialogic view of the
language of intersubjective stance. Text 23(2): 259-284.
Anthony, Laurence. 2014. AntConc (Version 3.4.3) [Computer Software]. Tokyo: Waseda University. http://www.laurenceanthony.net/
Moreno-Ortiz, Antonio. 2016. Lingmotif 1.0 [Computer Software]. Málaga:
Universidad de Málaga. http://tecnolengua.uma.es/lingmotif
ojs.uv.es/index.php/qfilologia/index
The malleability behind terms referring to common professional
roles: the current meaning of “boss” in British newspapers
The malleability of terms referring to common professional roles:
the current meaning of boss in the British press
Rosa Giménez-Moreno (a) & Francisco Miguel Ivorra-Pérez (b)
(a) Universitat de València.
(b) Universitat de València.
Received: 19/04/2017. Accepted: 10/10/2017
Summary: The aim of this research is to address the variation and ductility of apparently clear and unambiguous concepts related to common professional roles. The study focuses on the semantic structures, and subsequent cognitive models, associated with the term boss, as they are currently expressed and transmitted through the major British media. The qualitative and quantitative linguistic analysis of a significant corpus of texts in which this term appears shows clear differences in its meaning, depending on key factors such as the socio-political and ideological orientation of the publishing platform.
Keywords: cognitive semantics; corpus linguistics; mental models; professional roles; British press.
Abstract: The aim of the present research is to examine the current variation in, and vulnerability to manipulation of, apparently clear and unambiguous concepts related to common professional roles. The study concentrates on the semantic frames, and the subsequent cognitive models, associated with the term ‘boss’ as they are expressed and transmitted through large-scale British media. The qualitative and quantitative linguistic analysis of a substantial corpus of texts in which this term appears shows clear differences in its meaning, depending on key factors such as the socio-political and ideological orientation of the medium of publication.
Keywords: cognitive semantics; corpus linguistics; mental models; professional roles;
British press.
Giménez-Moreno, Rosa & Ivorra-Pérez, Francisco Miguel. 2017. “The malleability
behind terms referring to common professional roles: the current meaning of
‘boss’ in British newspapers”. Quaderns de Filologia: Estudis Lingüístics 22:
105-128. doi: 10.7203/qf.22.11303
1. Introduction
There are a number of relational identities and communicative roles (Sluss & Ashforth, 2007) used daily by a great majority of speakers (e.g. father, neighbour, colleague, employee, etc.). These identities are named through widespread standard terms that are usually defined briefly and simply; for example, the identity of a “boy or a man in relation to either or both of his parents” is generally referred to as “son” and can be simply defined as “a male descendant” (Oxford English Dictionary). However, despite their apparent simplicity, these generic terms reflect complex mental constructs that are highly sensitive to cultural, socio-political and inter-generational variation, among other factors (van Dijk, 2006, 2008). Depending on each of these parameters of variation, the mental models attached to these terms, which help speakers infer their pragmatic meaning, are configured according to different stereotypes, connotations and socio-cognitive standards, and therefore belong to various semantic fields and frames (Lehrer & Kittay, 1992).
From this variation-sensitive perspective on words denoting communicative roles, the present study focuses on the term “boss”, defined as “a person who is in charge of a worker or organization” (OED), and also on its closest synonyms: CEO, chairman, chief, chief executive, director, employer, head, leader and top. Our aim is to observe the ductility of this concept in today’s mass media, paying particular attention to how it adapts to the different socio-political ideologies and perspectives that underlie these media.
The research is framed within the field of corpus-based cognitive semantics applied to professional communication. After outlining the essential background on semantic fields and frames, the paper summarises the range of definitions, synonyms and characteristic expressions associated with the term “boss” in the major dictionaries in use. The target terms are then analysed in a corpus of texts from the British mass media, using both quantitative and qualitative methods in the lexical and semantic description that leads to the results.
2. Semantic frames and lexical fields associated with a company’s structure
Interest in lexical fields and semantic frames has grown exponentially since the 1970s (Habermas, 1970; Lehrer, 1974), and especially since the 1990s (Lehrer & Kittay, 1992), in parallel with the development of other complementary disciplines such as artificial intelligence, computational linguistics, cognitive psychology and interdisciplinary linguistics. According to Fillmore and Atkins (1992: 76), semantic field theories study, characterise and catalogue “systems of paradigmatic and syntagmatic relationships connecting members of selected sets of lexical items”. From the perspective of cognitive frames, or “knowledge schemata”, a word’s meaning can only be approached “with reference to a structured background of experience, beliefs, or practices, constituting a kind of conceptual prerequisite for understanding the meaning” (p. 77). The meaning of a word can thus only be fully understood “by first understanding the background frames that motivate the concept that the word encodes”.
Cognitive frame analysis and semantic parsing have become very popular and productive, especially within computer science, with the development of language processing applications based on lexical resources such as FrameNet, VerbNet or WordNet (Shi & Mihalcea, 2005). However, the notion of “semantic frame” was originally proposed by Fillmore (1977, 1985) and has also become central in cognitive linguistics, together with closely related and interdependent concepts such as “domain” and “cognitive model” (Lakoff, 1987; van Dijk, 2006, 2008). These concepts have been essential to the development of research areas such as critical discourse analysis (CDA) and to the study of knowledge structures such as metaphor, metonymy and other communicative figures.
The basic assumption of frame analysis is that understanding and interpreting the meaning of a word requires recognising the relevant contextual background information within which that word is expressed, which constitutes its semantic frame. According to Fillmore and Baker (2011: 317), frame analysis implies a thorough methodological procedure that makes it possible to identify the essential frame elements and lexical units needed to make accurate and objective interpretative observations.
In the present study we adapt this context-based procedure to words related to professions, particularly the word “boss”. As discussed in the following section, the term “boss” has historically been associated with a number of key concepts that make up its lexical field. Today’s interpretation of the term, however, seems to depend on other mental models and experiential constructs developed by current speakers, with their present interpretative criteria, concerns, habits, values and ways of understanding the reality that currently surrounds the concept of “boss”.
Although our study is limited to this concept, there is evidence that this semantic fluctuation affects many other terms in business English (e.g. Nelson, 2005). Traditional terms referring to a company’s structure (e.g. president, advisor, administrator, officer, supervisor, etc.) are adapting their semantic and pragmatic coverage. This is due not only to the evolution of socio-economic trends and political ideologies, but also to the implementation of new technologies and the modernisation of corporate cultures aimed at fostering innovation, motivation and effectiveness in companies (Camisón & Villar-López, 2014).
3. Defining the lexical field of the word “boss”
According to the most popular and prestigious monolingual dictionaries of English (e.g. Cambridge Dictionary, Merriam-Webster, Macmillan Dictionary or Collins Dictionary), the general meaning of the noun “boss” is “a person who exercises control or authority” or “a person who makes decisions, exercises authority, dominates, etc.” According to the Merriam-Webster Dictionary, the term goes back to the Dutch word baas, meaning “master”, used in the Dutch colonies settled in North America during the 17th century. The word became popular as a free-labour alternative to “master”, a term associated with slave labour. This original dual, positive-negative meaning has persisted to the present day.
The word is also polysemous. In fact, its first recorded use, in the 13th century, traces its origin to the Old French and Middle English word boce, which belonged to the fields of architecture and geology and referred to a circular ornamental decoration (Macmillan Dictionary). According to Dictionary.com, it also refers to a young cow or calf in biology, a round growth or protuberant part of the body in medicine, a form of protection for a book, and a projecting part of a ship’s hull. The term is also used as an adjective in English slang, meaning “very good, excellent, incredibly awesome, great” (Internet Slang Dictionary and The Urban Dictionary).
These meanings are set aside in the present study, which concentrates on the meaning of the word in business and politics. Within this lexical field we find specific definitions, such as the person “who directs or supervises workers” (Merriam-Webster), “the person who is in charge of an organization and who tells others what to do” or “the manager, the person who employs or superintends workers” (Dictionary.com), as well as more elaborate and complete descriptions:
An individual that is usually the immediate supervisor of some number
of employees and has certain capacities and responsibilities to make
decisions. The term itself is not a formal title, and is sometimes used to
refer to any higher level employee in a company, including a supervisor, manager, director, or the CEO (Online Business Dictionary).
Its adaptation to political contexts generates more clear-cut definitions, such as “the head of a group (as a political organization)” or the person “who controls votes in a party organization or dictates appointments or legislative measures” (Merriam-Webster), or “a politician who controls the party organization, as in a particular district” (Dictionary.com).
As most of the dictionaries cited show, this neutral or positive meaning of the word, as part of professional hierarchies and responsibilities, seems to be the most widely accepted. It is also expressed through other synonymous terms such as superior, manager, director, president, managing director, CEO, chief, supervisor, head, foreman, overseer, founder, governor, magnate, taskmaster, master, captain, superintendent, commander, employer, trainer, wield power, authority, etc. Nevertheless, the negative, derogatory and sarcastic version of its meaning still persists and is increasingly rooted in today’s society.
This negative side of the term can be clearly observed in the phrasal use “to boss someone around”, defined as “to give orders to, especially in an arrogant, authoritative, or domineering manner” (Free Dictionary and Dictionary.com). This adverse meaning is evident in the definitions found in slang dictionaries: “someone who runs shit in his/her hood or city” or “bosses are like diapers: full of shit and all over your ass” (Urban Dictionary). It can also be observed in the additional set of metaphorical, hyperbolic and derogatory synonyms, listed in most of the above dictionaries, that currently replace or alternate with “boss”, especially in slang and casual registers of English, and that accentuate three negative dimensions of the term:
• Oppressive and despotic (e.g. padrone, Goliath, fuhrer, dictator, king, etc.)
• Old-fashioned and obsolete (e.g. overlord, skipper, warlord, the powers that be, wear the pants or trousers, etc.)
• Sarcastic and ridiculing (e.g. big cheese, top dog, top cat, head honcho, big shot, etc.)
The first negative dimension of the term has developed from the natural duality that gave rise to it in the 17th century, and it is still a focus of concern within the professional community, as seen in research articles such as “The boss is watching your every click …” (Newitz, 2006), “Privacy in electronic communication: watch your e-mail, your boss is snooping!” (Kierkegaard, 2005) and “In nomine patris: discursive strategies and ideology in the Cosa Nostra family discourse” (Indio et al., 2017). The second dimension is also latent, as seen in “Being the boss is not what it used to be!” (Muller-Smith, 1998) or “Why are there bosses?” (Hess, 1983). Finally, specialists warned as early as the early 1990s about the third negative trend in its meaning, in publications such as “When the boss is away” (Clarck & Riddick, 1991), and have continued to do so, e.g. “Think your boss is incompetent? You’re probably right” (Buchanan, 2009). This phenomenon has accelerated considerably over the last ten years, together with the global economic, social and ethical crisis. The way in which society and the media approach the values, attitudes and mental models related to this term (e.g. courage, control, respect, authority, etc.) is affecting its present and probably future meaning and use (Uhl-Bien & Carsten, 2007).
On this basis, our aim here is to study the current semantic frames and subsequent cognitive models associated with the term “boss” as they are expressed and transmitted through large-scale British media.
4. Methodology and corpus analysis
A sample of 40 articles from two acclaimed British digital newspapers, The Guardian and The Telegraph, has been compiled and analysed. The corpus contains about 50,000 words: 20 articles from each newspaper, with each subcorpus totalling approximately 25,000 words. Both newspapers are representative of the British mass media and, more importantly, reflect British bipartisan politics through their different socio-political leanings, which is of great interest for our research purposes (The Guardian has traditionally been associated with a centre-left political ideology, while The Telegraph has a more centre-right, conservative orientation). As we are interested in examining the different mental models attached to the meaning of the term “boss” in the context of Brexit and the global socio-economic crisis, we have focused in particular on articles from the “business” section, such as those on finance, retail or the economy, published during 2016.
As far as the method of analysis is concerned, we have found it convenient to adapt Fillmore and Baker’s (2011) frame analysis to our study. As this is a preliminary study of the variation in meaning of the term “boss” in the British mass media, we have only focused on the first three steps that these authors establish for the FrameNet process (pp. 321-22). First, we characterised the frames making up the sample of analysis; second, we described and named the elements that belong to those frames; finally, we selected the main lexical units frequently included in the frames.
Both qualitative and quantitative analyses are carried out. To do so, we have used the corpus manager and analysis software Sketch Engine (2003). The main findings are presented in two parts. The first describes and discusses the results of a qualitative overview based on concordance searches. The second focuses on the quantitative results drawn from tools such as word lists and frequencies, collocations and word sketches.
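Sketch Engine's internals are not described here, so the following is only a generic sketch of two of the operations named above: a keyword-in-context (KWIC) concordance and a window-based collocation score. We use logDice, the association score commonly associated with Sketch Engine word sketches. Note, however, that word sketches rely on grammatical relations rather than the simple token window assumed below. The function names, window sizes and the toy sentence (adapted from a Tesco headline quoted later in the article) are our own assumptions.

```python
import math
from collections import Counter


def kwic(tokens, node, width=4):
    """Keyword-in-context lines: (left context, node, right context)."""
    lines = []
    for i, token in enumerate(tokens):
        if token == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


def log_dice(f_xy, f_x, f_y):
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y))."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def collocates(tokens, node, span=3, min_cooc=2):
    """Score words occurring within `span` tokens of the node. Each token
    position is counted once, even if it falls in two overlapping windows."""
    freq = Counter(tokens)
    positions = set()
    for i, token in enumerate(tokens):
        if token == node:
            for j in range(max(0, i - span), min(len(tokens), i + span + 1)):
                if j != i:
                    positions.add(j)
    cooc = Counter(tokens[j] for j in positions if tokens[j] != node)
    scored = [(word, log_dice(f, freq[node], freq[word]))
              for word, f in cooc.items() if f >= min_cooc]
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


tokens = "the former tesco bosses are charged the tesco bosses deny fraud".split()
for line in kwic(tokens, "bosses", width=2):
    print(line)
print(collocates(tokens, "bosses"))
```

On a real corpus, function words such as "the" would normally be filtered out or handled through grammatical relations, and a much larger minimum co-occurrence threshold would be used.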
5. Results and discussion
5.1. A qualitative overview
The findings obtained from the concordance search analysis of the term “boss” indicate important differences between the two data sets. In The Guardian, the term is embedded in a major distinctive frame that semantically connotes a person who adopts a pessimistic and uncertain attitude towards the economic situation the UK may face after the Brexit vote, as well as someone who is not free from corruption and enjoys unfair privileges over employees or the rest of the population. By contrast, the findings drawn from The Telegraph data set show that the semantic frame in which the term “boss” is included differs considerably from that of The Guardian. In this case, the semantic connotation of the term points towards a person who has a more encouraging attitude towards the Brexit vote results and can give hope and improve the economic situation of the UK despite the difficulties the country may face. Additionally, less importance is given to cases of corruption committed by those at the top. To illustrate these two apparently contrasting semantic frames, a small selection of the most representative terms extracted from the concordance search analysis, in both The Guardian and The Telegraph, is shown in Table 1:
Noun / adjective + noun
  The Guardian: hard Brexit, abuse of position, low-paid insecure jobs, fraud, false accounting, significant economic damages, corrosive impact, charges, problems, prison, serious implications
  The Telegraph: Brexit era, expertise, job creation, respected boss, investment, strong economy, new opportunities, reassurance, growth, success, sense of calm
Adjective
  The Guardian: cautious, dumb, fat, lazy, stupid
  The Telegraph: confident, reliable, able, dynamic, clear
Verb
  The Guardian: accused, criticized, sabotage, spend, raided, seized, sentenced, suffer
  The Telegraph: carry on, keep calm, committed to maximizing, contribute, commits to help, create, ensure, get on
Table 1. Examples from the concordance search analysis: The Guardian and The Telegraph
To shed some light on these observational findings, a few extracts from The Guardian are reproduced below. It is important to observe that the connotations of the term “boss” discussed above are also interrelated with grammatical features such as the use of supporting data (£5.5m, 10%), the inclusion of boosters (significant, pretty) and specific collocations and idiomatic expressions (false accounting, abuse of position). These seem to be included with the intention of reinforcing the more negative attributions of the term:
• “Loans boss paid hackers to attack consumer websites, court told… was sentenced to four months in prison…the businessman’s home was raided and his computer equipment seized…There is a low risk of him committing further offences of this nature”.
• “Pay ratio between bosses and employees will be ‘2016’s hot topic’… UK’s top bosses received 10% pay rise in 2015 as average salary hit £5.5m…The bosses of Britain’s largest public companies earned an average of £5.5m last year, and have enjoyed a 10% pay rise while wages in the rest of the economy lag far behind…”.
• “Britain will end up looking stupid over Brexit, says Ryanair boss… The UK is going to suffer some significant economic damage when they get into the entrails of the Brexit decision…The UK will end up looking pretty stupid, he said”.
• “Ex-Tesco bosses to appear in court on fraud and false accounting charges… The former Tesco bosses are all charged with one count of fraud by abuse of position and one count of false accounting”.
In the examples selected from The Telegraph, the semantic connotations of the term “boss” are likewise interrelated with particular grammatical elements. For instance, it is worth noting the presence of hedged expressions realised through probability adverbs, verbs or linking words of contrast (unlikely, almost, predicted, despite, etc.), which somewhat tone down the more positive connotations of the term under analysis:
• “The boss of Britain’s biggest business group said it was vital policymakers worked closely with companies to set out a clear plan to ensure the UK remained a top investment destination… He also urged policymakers to maintain a ‘sense of calm’ regarding the millions of EU workers and pensioners who are currently living in the UK …”.
• “Brexit is unlikely to lead to a sudden decline in London’s status as one of the leading centres for the global capital markets, the boss of Barclays has predicted”.
• “British bosses are more upbeat about business prospects this year than almost every other major advanced economy, as companies ‘keep calm and carry on’, despite domestic and global uncertainty”.
Although these results were obtained from a qualitative overview, more convincing evidence requires an analysis of a more quantitative nature. The next subsection therefore describes and discusses the quantitative findings emerging from our analysis.
5.2. Quantitative analysis
5.2.1. Word lists and frequencies
The word list and frequency analysis for the term “boss” and its plural form “bosses” yields interesting findings for both samples. To start with, the overall raw frequency of the term is noticeably higher in The Guardian (153)¹ than in The Telegraph (96), which may suggest that the term is more likely to appear in newspapers with a more left-wing political orientation such as The Guardian.
Since our preliminary observational analysis revealed that “boss” co-occurs with different synonyms, we also took these into account in the quantitative analysis: words such as CEO, chairman, chief, chief executive, director, employer, executive and top, along with their plural forms. The results show that, except for “director”, whose frequency is practically the same in both corpora (G22/T20)², The Guardian uses these synonyms more frequently. The most widely used in both data sets is “chief” (G115/T69), either alone or in combination with “executive” (G90/T42) or “executives” (G60/T35). In The Guardian, these are followed, in terms of frequency of use, by “top” (49), “chairman” (37), “leaders” (27), “director” (22), “CEO” (21), “directors” (16), “employers” (16), “CEOs” (15), “head” (15) and “employer” (6). In The Telegraph, apart from the higher frequencies obtained for “chief”, alone or in combination with “executive” or “executives”, “chairman” (21) is the most frequent term, followed by “director” (20), “top” (20), “head” (13), “CEO” (9), “CEOs” (8) and “leaders” (7). It is noteworthy that no instance of “employer” or “employers” is found in The Telegraph sample.
¹ From now onwards, the numbers in brackets refer to raw frequencies.
² From now onwards, G is the abbreviation for The Guardian and T for The Telegraph.
A relevant observation is that “boss” or “bosses” and their synonyms are frequently replaced by pronouns performing an anaphoric function in the text. The analysis reveals that the frequencies of these pronouns are also higher in The Guardian than in The Telegraph, perhaps in keeping with this newspaper’s characteristically freer style of expression: “He” (G79/T37), “His” (G13/T5), “he” (G203/T137), “him” (G15/T7), “his” (G105/T67), “they” (G131/T61), “their” (G142/T79), “them” (G42/T18), “themselves” (G8/T0). There is also a noticeable tendency to use more plural forms of this term in The Guardian data set.
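Because the raw counts come from two subcorpora of roughly equal size, they can be compared directly, but normalising them and applying a significance test makes the comparison explicit. The worked example below uses the counts for “boss”/“bosses” (G153/T96) and assumes, for simplicity, that each subcorpus contains exactly 25,000 words (the article gives this only as an approximation). The choice of the log-likelihood statistic (G²) is ours, not the authors’:

```latex
\begin{aligned}
\hat f_{G} &= \frac{153}{25\,000}\times 10\,000 = 61.2
  \quad\text{and}\quad
\hat f_{T} = \frac{96}{25\,000}\times 10\,000 = 38.4
  \quad\text{(occurrences per 10,000 words)}\\[4pt]
E_{G} &= E_{T} = \frac{25\,000 \times (153+96)}{25\,000 + 25\,000} = 124.5\\[4pt]
G^{2} &= 2\left(153\,\ln\frac{153}{124.5} + 96\,\ln\frac{96}{124.5}\right)
  \approx 2\,(31.54 - 24.96) \approx 13.2
\end{aligned}
```

Under this assumption the value exceeds 10.83, the χ² critical value for one degree of freedom at p < 0.001, so the difference in rates is unlikely to be due to chance; with the exact word counts the figure would shift slightly. The test shows only that the two newspapers use the word at different rates, not why they do so.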
Remarkable differences have also been found in the frequency of the words surrounding the term “boss” and its main synonyms. These words configure different semantic frames, which reflect a significant degree of variation in the current mental models that conceptualise the term. This contrast can be observed in Table 2, which shows a small sample of the most distinctive word categories obtained in the word list and frequency analysis:
Nouns
  The Guardian: benefits (12), Brexit (91), change (29), charges (13), consequences (7), costs (29), court (11), crisis (16), data (20), decline (15), economy (68), employee (177), evidence (11), executive (150), figures (15), findings (7), growth (62), London (54), losses (9), measure (77), pressure (15), productivity (22), remuneration (20), risk (13), roles (7), salaries (9), source (8), staff (49), strategy (16), success (16), survey (32), UK (234), uncertainty (23), vote (59), wage (25), warning (11), wellness (12), workers (32), etc.
  The Telegraph: benefits (0), Brexit (49), change (11), charges (5), consequences (0), costs (9), court (0), crisis (0), data (7), decline (6), economy (37), employee (0), evidence (0), executive (77), figures (0), findings (0), growth (46), London (39), losses (0), measure (0), pressure (8), productivity (8), remuneration (0), risk (0), roles (0), salaries (0), source (0), staff (23), strategy (7), success (9), survey (13), UK (137), uncertainty (19), vote (34), wage (0), warning (0), wellness (0), workers (0), etc.
Adjectives
  The Guardian: cautious (15), chief (115), clear (21), committed (5), false (6), fat (5), financial (66), global (54), hard (13), living (24), low (16), minimum (8), national (24), new (85), possible (12), significant (22), worry (6), worth (9), wrong (5), etc.
  The Telegraph: cautious (0), chief (69), clear (13), committed (7), false (0), fat (0), financial (44), global (35), hard (5), living (6), low (8), minimum (0), national (6), new (65), possible (69), significant (10), worry (0), worth (0), wrong (0), etc.
Verbs
  The Guardian: accused (5), believe (8), change (29), charged (6), committed (5), earn (8), employs (10), encourage (11), face (13), found (17), help (17), hit (13), improve (13), pay (117), reduce (9), reform (11), shows (12), solve (6), suffer (6), tackle (9), think (45), trying (9), voted (10), want (26), warned (29), etc.
  The Telegraph: accused (0), believe (0), change (11), charged (0), committed (7), earn (0), employs (0), encourage (0), face (0), found (7), help (10), hit (0), improve (9), pay (10), reduce (0), reform (0), shows (0), solve (0), suffer (0), tackle (0), think (27), trying (0), voted (0), want (19), warned (13), etc.
Adverbs
  The Guardian: actually (8), already (27), even (26), increasingly (6), less (20), likely (17), many (40), more (173), most (56), not (195), probably (7), really (18), etc.
  The Telegraph: actually (0), already (17), even (13), increasingly (0), less (10), likely (9), many (21), more (100), most (36), not (96), probably (0), really (11), etc.
against (7), by (118), forward (0), over (44), under
(11), up (74), etc.
He (37), he (137), his (67),
him (7), I (86), me (7), they
(10), their (79), them (18),
themselves (0), this (23),
when (28), where (20), who
(57), you (7), your (6), etc.
do (45), does (28), can (43),
could (32), had (42), has
(156), have (107), might
(0), should (18), will (147),
would (80), etc.
But (34), Despite (5), However (16), If (13), also (78),
and (578), as (212), because
(17), but (75), despite (15),
if (30), like (17), must (8), or
(40), than (63), though (6),
while (22), etc.
against (22), by (247), forward
(5), over (78), under (23), up
(106), etc.
Pronouns He (79), he (203), his (105),
him (15), I (102), me (13), they
(21), their (142), them (42),
themselves (8), this (41), when
(49), where (33), who (103),
you (14), your (11), etc.
Auxiliary do (84), does (40), can (61),
and modal could (89), had (93), has (230),
verbs
have (208), might (17), should
(43), will (221), would (170),
etc.
Connectors But (64), Despite (11), However (27), If (31), also (95), and
(962), as (328), because (29),
but (137), despite (24), if (66),
like (36), must (17), or (81),
than (120), though (12), while
(35), etc.
Prepositions
Table 2. Comparison of word categories from the word lists and frequencies analysis
of The Guardian with The Telegraph. Raw frequencies
The above findings corroborate the results obtained from the concordance search analysis discussed in the previous subsection. As for The Guardian, the results reinforce the mental model of “boss” as a person who seems to hold a distrustful attitude towards the United Kingdom’s withdrawal from the European Union and its future consequences for the UK economy (e.g. Brexit, cautious, consequences, face, hit, vote, voted, worry, wrong, etc.) and who feels insecure and uncertain about the economic situation of the country if it finally leaves the EU (e.g. crisis, decline, economy, employs, hard, hit, losses, pressure, productivity, risk, suffer, uncertainty, etc.). In the same vein, words that present the “boss” and synonymous expressions as someone involved in cases of corruption and enjoying more privileges than the staff or the rest of the population are more frequent (e.g. accused, benefits, costs, court, earn, false, fat, hit, pay, remuneration, salaries, wages, etc.). These negative connotations of the term and its synonyms are also reflected in a higher frequency of prepositions connoting strong opposition, as seen in “against” (G22/T7).
However, the mental model in this part of our corpus is not entirely negative. Words relating “boss” to someone who can provide solutions despite the uncertainty and insecurity surrounding the Brexit vote are also frequently used (e.g. change, encourage, forward, improve, measure, reform, solve, strategy, success, tackle, wellness, etc.).
We can also remark that, in the sample analysed, the term “boss” or “bosses” and their synonymous expressions are surrounded by words connoting a person who, in order to justify his or her opinions on the economic and financial situation of the UK, constantly resorts to evidence demonstrating the veracity of these views (e.g. data, evidence, figures, findings, source, survey, shows, etc.). These words are accompanied by passive sentences in which the agent performing the action is introduced by the preposition “by”, whose frequency is also much higher in The Guardian (G247/T118). These viewpoints are frequently communicated through a higher use of emphasising adverbs, first person singular pronouns and additive linking words that reinforce bosses’ opinions on the problems associated with the UK (e.g. actually, already, also, and, even, I, increasingly, many, more, most, really, etc.). Nonetheless, despite this insistence on the veracity of their opinions and reflections, these are frequently mitigated by means of cognitive verbs as well as modal verbs and adverbs of probability acting as hedges (e.g. believe, can, could, likely, might, should, think, would, etc.). This understatement is also conveyed through the high frequency of contrastive linking words (e.g. but, despite, however, if, or, though, while, etc.).
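The raw frequencies in Table 2 come from word lists in which case is preserved (“He” and “he” are listed separately). As a minimal sketch of how such a list can be produced, assuming a simple letter-based tokenizer rather than the tool the authors actually used, token counts can be obtained with Python’s collections.Counter; the function name word_frequencies and the sample sentence are hypothetical.

```python
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not the authors' tool): a token is a run of
# letters, optionally joined to further letters by an internal hyphen or apostrophe.
# Case is preserved, so "He" and "he" are counted as different word forms.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:[-'’][^\W\d_]+)*")


def word_frequencies(text: str) -> Counter:
    """Return raw, case-sensitive frequencies of the tokens in text."""
    return Counter(TOKEN_RE.findall(text))


if __name__ == "__main__":
    sample = "He said bosses would not cut pay. He thinks he will stay."
    freqs = word_frequencies(sample)
    for form in ("He", "he", "bosses", "pay"):
        print(form, freqs[form])
```

Running the script prints He 2, he 1, bosses 1 and pay 1. Dedicated corpus tools also handle lemmatisation, numbers and multi-word units, which this sketch ignores.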
When the results drawn from The Guardian are compared with those obtained from The Telegraph, a partially different picture seems to emerge. In this sub-corpus, the frequency rates and the semantic frame related to “boss” seem to revolve around terms such as chief, new, Brexit, growth, financial, economy, executive, global, London, etc. The mental model attached to these words differs considerably between the two samples: unlike the dark and discouraging attitude that their meaning connotes in The Guardian, in The Telegraph their semantic connotations revolve around someone closely associated with power centres (both locally and globally), who has a more optimistic attitude towards the Brexit vote and reassures the UK population that Brexit is not going to change the economic situation of the country in the future.
Moreover, some terms that appear in The Guardian are completely absent from The Telegraph. This may portray an image of the “boss” and related synonymous terms as someone who, despite being associated with cases of corruption and unfair privilege, has the capacity to act as an adviser and expert, trying to reassure UK citizens with solutions and promising a good forecast for the country.
Firstly, we observe that, despite their awareness of the Brexit vote and the consequences it may have for the UK’s economy, bosses’ attitude is not as pessimistic and doubtful as the one revealed in The Guardian. This is shown, on the one hand, by the lower frequencies obtained for terms such as Brexit, costs, crisis, decline, economy, hard, national, pressure, productivity, staff, uncertainty, vote, workers, etc. and, on the other, by the complete absence of terms such as cautious, consequences, employee, face, hit, losses, risk, suffer, worry, wrong, etc.
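The comparison above rests on raw counts, and this section does not report the sizes of the two sub-corpora, so whether a gap such as cautious (G15/T0) reflects more than chance depends on those sizes. One widely used test in corpus linguistics, swapped in here as an illustration rather than as the authors’ procedure, is the log-likelihood statistic G2. The sketch below assumes hypothetical sub-corpus sizes of one million tokens each; the function name log_likelihood is likewise hypothetical.

```python
import math


def log_likelihood(freq1: int, freq2: int, size1: int, size2: int) -> float:
    """Log-likelihood (G2) for one word observed freq1 times in a corpus of
    size1 tokens and freq2 times in a corpus of size2 tokens."""
    total = freq1 + freq2
    # Expected frequencies if the word were equally likely in both corpora.
    expected1 = size1 * total / (size1 + size2)
    expected2 = size2 * total / (size1 + size2)
    g2 = 0.0
    for observed, expected in ((freq1, expected1), (freq2, expected2)):
        if observed > 0:  # by convention, 0 * ln(0) contributes nothing
            g2 += observed * math.log(observed / expected)
    return 2 * g2


if __name__ == "__main__":
    # Hypothetical sub-corpus sizes: the real sizes are not given in this section.
    SIZE_G = SIZE_T = 1_000_000
    for word, g, t in (("cautious", 15, 0), ("Brexit", 91, 49), ("committed", 5, 7)):
        ll = log_likelihood(g, t, SIZE_G, SIZE_T)
        print(f"{word}: G2 = {ll:.2f}", "(p < 0.05)" if ll > 3.84 else "")
```

The threshold 3.84 is the chi-square critical value for p < 0.05 with one degree of freedom. With unequal sub-corpus sizes the same raw counts give a different G2, which is why the actual sizes matter for this kind of comparison.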
Likewise, words presenting the “boss” and synonymous related terms as someone involved in bribery and fraud, and in an advantageous position with respect to employees or the rest of the citizens, are also less frequent (e.g. costs, economy, hard, low, over, pay, staff, etc.). In addition, there are null frequencies for significant terms such as accused, benefits, court, earn, employee, face, false, hit, remuneration, roles, salaries, suffer, wage, workers, etc. Prepositions connoting negative meanings, like “against” (G22/T7), are also much less frequent than in The Guardian data set.
Regarding the concept of “boss” as a person able to provide solutions despite the setbacks Britain faces as a result of the Brexit vote, the findings reveal that the terms semantically connoting this meaning are also less frequent than in The Guardian sample (e.g. change, clear, help, improve, new, strategy, success, warned, etc.). Furthermore, no instances have been found for terms such as encourage, forward, measure, reform, solve, tackle, trying, etc.
Whereas in The Guardian we have found terms that semantically evoke veracity so as to justify bosses’ opinions on the economic and financial situation of the UK, the frequencies of these terms in The Telegraph are much lower (e.g. data, like, survey, when, where, etc.), and no instances have been found for terms such as evidence, figures, findings, source and shows. Passive sentences in which the agent performing the action is introduced by the preposition “by” are also much less frequent than in The Guardian (G247/T118). The same applies to the use of emphasising adverbs and additive connectors reinforcing bosses’ views on the economic problems of the UK (e.g. already, also, and, many, more, most, really, etc.), and no instances are found for “actually” or “increasingly”. In keeping with this line of thought, the frequency of cognitive verbs, modal verbs and adverbs of probability functioning as hedges to tone down bosses’ statements is lower too (e.g. can, could, likely, should, think, would, etc.), and others like “believe” or “might” are absent. Finally, contrastive linking words used to understate bosses’ viewpoints are not as frequent as in The Guardian (e.g. but, despite, however, if, or, though, while, etc.).
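The contrasts drawn in this subsection (lower raw frequencies in The Telegraph, and terms with null frequencies there) can be read directly off Table 2. A minimal sketch, using a small excerpt of the raw frequencies in Table 2 and hypothetical function names, lists the terms absent from The Telegraph and the Guardian-to-Telegraph ratio for the rest. Like the discussion above, it compares raw counts and therefore assumes sub-corpora of comparable size.

```python
# Excerpt of the raw frequencies in Table 2: term -> (The Guardian, The Telegraph).
TABLE2_EXCERPT = {
    "Brexit": (91, 49),
    "benefits": (12, 0),
    "cautious": (15, 0),
    "chief": (115, 69),
    "committed": (5, 7),
    "against": (22, 7),
    "by": (247, 118),
    "actually": (8, 0),
    "increasingly": (6, 0),
    "believe": (8, 0),
    "might": (17, 0),
}


def absent_in_telegraph(table: dict[str, tuple[int, int]]) -> list[str]:
    """Terms attested in The Guardian but with a null frequency in The Telegraph."""
    return sorted(term for term, (g, t) in table.items() if g > 0 and t == 0)


def guardian_to_telegraph_ratio(table: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Raw G/T ratio for terms attested in both papers (undefined when T is 0)."""
    return {term: round(g / t, 2) for term, (g, t) in table.items() if t > 0}


if __name__ == "__main__":
    print(absent_in_telegraph(TABLE2_EXCERPT))
    print(guardian_to_telegraph_ratio(TABLE2_EXCERPT))
```

For instance, “by” gives a ratio of 2.09 (247/118), while “committed” is one of the few cases in which The Telegraph count is higher (5/7, about 0.71).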
5.2.2. Collocation analysis
The collocation analysis for the term “boss” also yields interesting findings for both data sets. The words that co-occur with the term “boss” in the two samples show divergent frequencies, as seen in Table 3.
The data shown in this table corroborate the trends and contrasts already indicated in the previous findings. The words co-occurring with the term “boss” in The Guardian data set semantically connote someone who has many doubts and hesitations regarding the economic problems UK citizens may face after the Brexit vote, as observed in their higher frequencies compared with those found in The Telegraph (e.g. bank, Britain, company, customer, cut, crisis, staff, warn, etc.). In fact, co-occurring terms such as company, customer, crisis, cut, not, price and staff have not been found for the word “boss” in The Telegraph sample at all. As The Guardian texts frequently refer to the future consequences of the Brexit vote, it is not surprising to find the preposition “after” frequently there (G81/T0), with no instance of it in The Telegraph data set.
The Guardian: after (81), bank (51), benefit (18), big (52), Britain (80), British (49), chief (118), company (219), customer (81), cut (25), could (89), crisis (17), deal (66), employee (3), executive (4), find (41), he (282), high (41), insist (0), London (54), more (176), most (58), new (86), not (273), over (81), pay (151), price (66), receive (13), rise (68), say (419), staff (50), tell (30), than (120), their (142), them (42), they (152), top (55), UK (233), want (37), warn (44), we (237), will (0), year (214), etc.
The Telegraph: after (0), bank (26), benefit (0), big (0), Britain (37), British (23), chief (0), company (0), customer (0), cut (0), could (0), crisis (0), deal (0), employee (0), executive (0), find (0), he (174), high (0), insist (9), London (0), more (101), most (73), new (66), not (0), over (44), pay (0), price (0), receive (0), rise (0), say (337), staff (0), tell (0), than (0), their (0), them (0), they (0), top (0), UK (137), want (0), warn (21), we (0), will (154), year (120), etc.
Table 3. Comparison of words from the collocation analysis of The Guardian with The Telegraph. Raw frequencies
As for the words co-occurring with “boss” that semantically connote a corrupt person enjoying more benefits than the rest of the people, their frequency is also higher in The Guardian (e.g. big, company, high, more, over, pay, receive, rise, than, their, them, they, top). Moreover, terms such as big, chief, company, employee, executive, high, pay, rise, than, their, them, they, top are not found in The Telegraph.
The veracity and truthfulness of bosses’ opinions are shown in the frequent use of the verb “find” in The Guardian, whereas this verb does not appear as a collocate of the word “boss” in The Telegraph. Additionally, the use of the modal verb “could” in The Guardian and its absence as a collocate in The Telegraph may imply, as observed in previous analyses, that the views held by bosses tend to be understated in the former. Aside from that, the higher frequency of co-occurring terms such as deal, new, want or we may convey the idea that the term “boss” is related to someone who, despite his or her gloomy attitude towards the economic and financial difficulties the UK may have, has the ability to act as an adviser and expert, promoting initiatives and solutions in collaboration with the rest of the citizens to sort out the current shortcomings.
One final point to be made is that in The Guardian the verbs “say” and “tell” co-occur with the word “boss” more often than in The Telegraph. The verb “tell” deserves particular attention, since it does not occur at all in The Telegraph. This verb is frequently used in neutral or informal registers, which could mean that the register of The Guardian fluctuates between neutral and informal, while that of The Telegraph is more formal.
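This section does not state how collocates were counted. A common approach, assumed here purely for illustration, counts every token within a fixed window around each occurrence of the node word. The sketch below uses a hypothetical window of two tokens in its example, treats “boss” and “bosses” as the node, and does no lemmatisation (Table 3 lists lemmas such as say and warn, so a real pipeline would lemmatise first); the function name window_collocates is hypothetical.

```python
from collections import Counter


def window_collocates(tokens: list[str], node_forms: set[str], span: int = 5) -> Counter:
    """Count tokens occurring within +/- span positions of any node form.

    Node matching is case-insensitive; collocates are counted as they appear.
    """
    counts: Counter = Counter()
    nodes = {form.lower() for form in node_forms}
    for i, token in enumerate(tokens):
        if token.lower() not in nodes:
            continue
        start = max(0, i - span)
        end = min(len(tokens), i + span + 1)
        for j in range(start, end):
            if j != i:  # skip the node occurrence itself
                counts[tokens[j]] += 1
    return counts


if __name__ == "__main__":
    text = "the boss said the bank boss will cut pay after the vote"
    collocates = window_collocates(text.split(), {"boss", "bosses"}, span=2)
    print(collocates.most_common(3))
```

The window size is a design choice: narrow windows favour grammatical neighbours, wider ones pick up topical associations such as Brexit or economy.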
5.2.3. Word sketch analysis
The Word Sketch analysis has allowed us to identify the different types of modifiers that go with the word “boss”, the nouns and verbs modified by “boss”, the verbs taking “boss” as subject or object, and the adjective predicates accompanying the term “boss”. The findings stemming from this analysis also show important differences between both data sets. These are shown in Table 4 below:
Modifiers of “boss”
The Guardian: UK (11.08), top (10.33), bank (9.97), British (9.74), Deutsche (9.59), factory (9.05), industry (8.96), new (8.33), respected (8.09), quietly-spoken (8.09), go-ahead (8.09), stripping (8.09)
The Telegraph: UK (10.62), new (10.18), Deutsche (10.16), retail (9.94), finance (9.94), female (9.67), factory (9.66), bank (9.64), British (9.48), business (9.19), respected (8.69), supermarket (8.69), quietly-spoken (8.69), economy (8.69)
Nouns/verbs modified by “boss”
The Guardian: skyscraper (10.6), fight (10.6), class (10.54), matter (10.54)
The Telegraph: skyscraper (11.0), fight (11.0), Britain (10.68), intelligence (10.24)
Verbs with “boss” as object
The Guardian: falter (10.82), lead (10.75), appoint (10.75), choose (10.68), charge (10.74), allow (9.91), show (9.87), tell (9.67), do (9.32), be (8.19)
The Telegraph: terrify (11.19), falter (11.19), appoint (11.09), poach (11.00), choose (10.91), say (10.64), show (10.54), find (10.47), lead (10.24), be (7.31)
Verbs with “boss” as subject
The Guardian: warn (10.54), remain (10.1), have (9.6), go (9.38), say (9.16), waive (8.89), care (8.89), spy (8.89), land (8.89), acknowledge (8.89), press (8.89), shy (8.89), shrug (8.87), respond (8.87), know (8.87), pledge (8.87), shock (8.87), cite (8.85), appear (8.85), accuse (8.85), receive (8.85), insist (8.82)
The Telegraph: remain (10.88), insist (10.47), say (10.38), warn (10.3), have (9.75), shy (9.61), pledge (9.61), respond (9.61), cite (9.61), press (9.61), promote (9.61), slash (9.53), shrug (9.53), argue (9.53), enjoy (9.5), plan (9.48), choose (9.48), predict (9.48), want (9.44), believe (9.41), find (9.38), show (9.38), be (9.09)
Adjective predicates of “boss”
The Guardian: fat (12.83), upbeat (12.41), cautious (11.83), optimistic (11.54), such (9.83)
The Telegraph: upbeat (13.41), optimistic (12.41)
Table 4. Examples from the word sketch analysis of “boss” in The Telegraph and The Guardian. Raw frequencies ordered from highest to lowest
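Word sketches are a feature of the Sketch Engine corpus tool, which, to our understanding, ranks collocates by the logDice association score rather than by raw frequency. The values in Table 4 are decimals, whereas raw frequencies are whole numbers, so they are consistent with an association score such as logDice (theoretical maximum 14), although the caption describes them as raw frequencies and this section does not name the statistic. A minimal sketch of the score follows; f_xy is the frequency of the pair and f_x, f_y are the frequencies of each word, and the function name log_dice and the example counts are hypothetical.

```python
import math


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice association score: 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy: co-occurrence frequency of the pair (e.g. "boss" + "upbeat")
    f_x, f_y: frequencies of each word on its own. The maximum, 14, is reached
    only when the two words always occur together (f_xy == f_x == f_y).
    """
    if f_xy <= 0 or f_xy > min(f_x, f_y):
        raise ValueError("need 0 < f_xy <= min(f_x, f_y)")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


if __name__ == "__main__":
    # Hypothetical counts, not taken from the corpus.
    print(round(log_dice(f_xy=12, f_x=900, f_y=40), 2))
```

Because the score is normalised by the frequencies of both words, it is largely independent of corpus size, which makes rankings from sub-corpora of different sizes easier to compare than raw counts.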
The findings reveal that, in The Guardian, the highest-ranking terms in the word sketch of “boss” connote someone who is more aware of the problems the UK currently faces regarding important social issues, such as the Brexit vote, inequality between social classes (particularly the gap between the salaries earned by bosses and those earned by staff), and cases of fraud and corruption on the part of bosses (e.g. accuse, bank, British, cautious, charge, class, falter, fat, lead, matter, receive, shock, skyscraper, spy, top, UK). They also connote someone who acts as an adviser, encouraging citizens to improve the current social situation, as observed in allow, care, fight, go-ahead, insist, new, remain, show, warn.
By contrast, in The Telegraph sample, this same term is accompanied by words that tend to connote a person who, despite being concerned about the economic situation the UK population may face as a consequence of Brexit, has a more optimistic and confident attitude towards this social issue. In particular, a boss is perceived as an adviser who calms citizens down by assuring them that the UK has always been a rich and prosperous nation that cannot be affected, under any circumstances, by the Brexit vote (e.g. be, believe, business, fight, finance, find, insist, new, plan, UK, warn). Added to that, the negative connotations associated with the concept of “boss”, such as cases of corruption or standing in a more powerful position than the rest of the population, receive scant attention, as seen in the complete absence of terms such as accuse, cautious, charge, class, fat, matter, shock, top. The term “boss” is rather conceptualised as a person who deserves respect (e.g. respected, quietly-spoken), as he or she is chosen and promoted for his or her expertise, intelligence and skills (e.g. appoint, be, choose, find, intelligence, promote). Therefore, he or she can be the perfect guide to assure workers that the UK is a rich country and that nothing can alter that, even if the UK leaves the EU (e.g. believe, Britain, British, business, economy, enjoy, optimistic, plan, predict, remain, show, upbeat, want). Likewise, “bosses” are regarded as persons who worry about society’s negative perception of them regarding cases of bribery and abuse of power, as observed in the frequent use of the verb “terrify”.
In addition to these insights, the corpus and the analysis could allow for many more findings and interpretations, which would go beyond the scope of the present research.
6. Conclusions
The present study shows that the concept of “boss” most commonly transmitted in current British society, and reinforced through its press, implies certain intrinsic defining elements that foster a solid generic interpretative basis of this professional role: a person who has the ability to act as an adviser and an expert in his or her field, promoting initiatives and solutions despite the surrounding setbacks and uncertainty. This generic interpretative model is also reinforced by widely accepted synonyms such as executive, director, head and leader. These persistent semantic components support socially shared and accepted mental models which seem to be fairly objective, operative and useful in many professional contexts.
Notwithstanding this, our analysis also shows that today this concept entails another set of defining and interpretive parameters of a more variable and subjective nature, which are highly dependent on the context and make the concept very vulnerable to the socio-political ideology or orientation of the speakers who use it and of the media through which it is transmitted. Because of this, in our corpus the semantic frames of “boss” connote both a cautious and gloomy professional who is concerned about, and sometimes adversely involved in, hot socio-economic issues such as Brexit, unfair salaries, inequality, fraud and corruption, and, by contrast, a hope-inspiring and optimistic expert who seems to be above all these issues and is more concerned about predicting a promising and prosperous future for the UK.
The study suggests that the definition and meaning of terms referring to usual professional roles differ remarkably depending on the socio-political and contextual factors surrounding their use, which also affects other associated concepts, such as authority, hierarchy, power, immunity to criticism, company organisation, etc. This dual and malleable conceptual nature can significantly influence the correct understanding, translation, acquisition and use of these words, and of their associated cognitive/mental models, in today’s professional and educational communication.
References
Buchanan, Mark. 2009. Think your boss is incompetent? You’re probably
right. New Scientist 204(2739): 68-69.
Camisón, César & Villar-López, Ana. 2014. Organizational innovation as an enabler of technological innovation capabilities and firm performance. Journal of Business Research 67(1): 2891-2902.
Clark, Mary Elizabeth & Riddick, John F. 1991. When the boss is away. Serials Review 17(1): 69-72.
Fillmore, Charles J. 1977. Scenes-and-frames semantics. In Zampolli, Antonio (ed.) Linguistic Structures Processing. Amsterdam: North Holland
Publishing, 55-88.
Fillmore, Charles J. 1985. Frames and the Semantics of Understanding. Quaderni di Semantica 6(2): 222-254.
Fillmore, Charles J. & Atkins, Beryl T. 1992. Towards a frame-based lexicon: the semantics of RISK and its neighbors. In Lehrer, Adrienne & Kittay, Eva Feder (eds.) Frames, Fields and Contrasts. New Essays in Semantic and Lexical Organization. New York: Routledge, 75-102.
Fillmore, Charles J. & Baker, Collin F. 2010. A frames approach to semantic analysis. In Heine, Bernd & Narrog, Heiko (eds.) The Oxford Handbook of Linguistic Analysis. Oxford: OUP, 313-339.
Habermas, Jürgen. 1974. Towards a theory of communicative competence. Inquiry 13(1-4): 360-375.
Hess, James D. 1983. Why are there bosses? In Hess, James D. (ed.) The Economics of Organization. Oxford: North-Holland Publishing Company,
87-97.
Indio, Fabio; Poppi, Massimo & Di Piazza, Salvatore. 2017. In nomine patris:
discursive strategies and ideology in the Cosa Nostra family discourse.
Discourse, Context & Media 15: 45-53.
Kierkegaard, Sylvia. 2005. Privacy in electronic communication: watch your
e-mail, your boss is snooping! Computer Law & Security Review 21(3):
226-236.
Lakoff, George. 1987. Women, Fire, and Dangerous Things: What Categories
Reveal about the Mind. Chicago: The University of Chicago Press.
Lehrer, Adrienne. 1974. Semantic Fields and Lexical Structure. New York: American Elsevier.
Lehrer, Adrienne & Kittay, Eva Feder (eds.). 1992. Frames, Fields and Contrasts. New Essays in Semantic and Lexical Organization. New York: Routledge.
Muller-Smith, Patricia. 1998. Being the boss is not what it used to be! Journal of PeriAnesthesia Nursing 13(5): 317-319.
Nelson, Mike. 2005. Semantic associations in Business English: a corpus-based analysis. English for Specific Purposes 25(2): 217-234.
Newitz, Annalee. 2006. The boss is watching your every click… New Scientist
191(2571): 30-31.
Oxford English Dictionary. 2016. Oxford Living Dictionaries. Oxford: Oxford University Press. https://en.oxforddictionaries.com [Accessed 12/12/2016].
Shi, Lei & Mihalcea, Rada. 2005. Putting pieces together: combining
FrameNet, VerbNet and WordNet for robust semantic parsing. In Gelbukh, Alexander (ed.) CICLing 2005. Berlin: Springer-Verlag, 100-111.
Sluss, David M. & Ashforth, Blake E. 2007. Relational identity and identification: defining ourselves through work relationships. Academy of Management Review 32(1): 9-32.
Uhl-Bien, Mary & Carsten, Melissa K. 2007. Being ethical when the boss is not. Organizational Dynamics 36(2): 187-201.
Van Dijk, Teun A. 2006. Discourse, context and cognition. Discourse Studies
8(1): 159-177.
Van Dijk, Teun A. 2008. Discourse and Context. A Sociocognitive Approach.
Cambridge: Cambridge University Press.
Quaderns de Filologia: Estudis Lingüístics. ojs.uv.es/index.php/qfilologia/index
A quantitative survey of N Prep N constructions in Romance
languages and prepositional variability
A quantitative study of N Prep N constructions in the Romance languages and prepositional variability
Inga Hennecke(a) & Harald Baayen(b)
(a) Universität Tübingen.
(b) Universität Tübingen.
Received: 24/04/2017. Accepted: 10/10/2017
Abstract: The distinction between syntagmatic compounds of the type N Prep N, such as Fr. jouet d’enfant, and nominal syntagms of the type N Prep N, such as the partially equivalent Fr. jouet pour enfants, remains unclear and vague. This is mainly because the lexical and syntactic status of syntagmatic compounds is still controversial. In some cases, as in jouet d’enfant and jouet pour enfants, partially equivalent syntagmatic compounds and nominal syntagms may coexist and be subject to a specific variation and alternation. In other cases, such as Pt. bracelete de aço and bracelete em aço, two variants of a syntagmatic compound may alternate and coexist.
The first part of this paper provides an overview of the current discussion on these two types of constructions. The second part addresses the alternation and variation of syntagmatic compounds and nominal syntagms by means of an analysis of large-scale corpus data from the French, Spanish and Portuguese corpora of the TenTen family. Here, the focus lies on the variation of the internal prepositional element of these constructions as well as on a comparison of different word formation patterns.
Keywords: Compounds; quantitative corpus linguistics; lexicon-syntax interface; Romance.
Summary: The distinction between syntagmatic compounds of the type N Prep N, such as Fr. jouet d’enfant, and nominal syntagms of the type N Prep N, such as Fr. jouet pour enfants, remains unclear. This is mainly because there is no consensus on the lexical and syntactic categorisation of syntagmatic compounds. In some cases, as in jouet d’enfant and jouet pour enfants, these are partial equivalents that can coexist and be subject to a specific variation and alternation. In others, as in Pt. bracelete de aço and bracelete em aço, the possible variants can alternate and coexist in practically all contexts.
The first part of this contribution offers a brief summary of the recent discussion on these two types of constructions. The second section discusses the alternation and variation of syntagmatic compounds and nominal syntagms through the analysis of several large corpora: the Spanish, French and Portuguese corpora of the TenTen family. The analysis focuses especially on the variation of the internal prepositional element of compounds and syntagms, and on the comparison between the different types of word formation that take place in them.
Key words: compound words; quantitative corpus linguistics; lexicon-syntax interface; Romance languages.
Hennecke, Inga & Baayen, Harald. 2017. “A quantitative survey of N Prep N constructions in Romance languages and prepositional variability”. Quaderns de Filologia: Estudis Lingüístics 22: 129-146. doi: 10.7203/qf.22.11305
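The abstract centres on variation in the internal preposition (e.g. Pt. bracelete de aço vs bracelete em aço). One simple way to quantify that variation, offered here as an illustrative assumption rather than the measure used in the paper, is the Shannon entropy of the preposition distribution for each (N1, N2) pair: 0 bits when one preposition is always used, higher values when several alternate. The function name preposition_entropy and the counts in the example are invented.

```python
import math
from collections import Counter, defaultdict


def preposition_entropy(triples: list[tuple[str, str, str]]) -> dict[tuple[str, str], float]:
    """Entropy (in bits) of the preposition used with each (N1, N2) pair.

    Each triple is (head noun, preposition, non-head noun), e.g. as extracted
    from a corpus query for N Prep N sequences.
    """
    by_pair = defaultdict(Counter)
    for head, prep, nonhead in triples:
        by_pair[(head, nonhead)][prep] += 1
    entropies = {}
    for pair, preps in by_pair.items():
        total = sum(preps.values())
        # sum of p * log2(1/p) over the prepositions seen with this pair
        entropies[pair] = sum((c / total) * math.log2(total / c) for c in preps.values())
    return entropies


if __name__ == "__main__":
    # Invented counts for illustration only.
    sample = ([("bracelete", "de", "aço")] * 3 + [("bracelete", "em", "aço")] * 3
              + [("moinho", "de", "vento")] * 4)
    for pair, h in preposition_entropy(sample).items():
        print(pair, round(h, 2))
```

With these invented counts, bracelete/aço, split evenly between de and em, gets 1 bit, while moinho/vento, always used with de, gets 0 bits.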
1. State of the Art
Terminological uncertainty and inconsistent classifications dominate the scientific debate on syntagmatic compounds of the type N Prep N in Romance languages. Currently, possible denominations include terms such as phrasal compounds (Bisetto & Scalise, 2005), syntactic compounds (Rio-Torto & Ribeiro, 2009), improper compounds (Kornfeld, 2009), phrasal lexemes (Masini, 2007, 2009; Masini & Scalise, 2012), “frozen” multiword units (Guevara, 2012), lexicalized syntactic constructions (Villoing, 2012), lexicalized phrases (Fradin, 2009), syntactic words (DiSciullo & Williams, 1987) or even syntactic syntagms or prepositional syntagms. The heterogeneous terminology goes along with a diverse delimitation and integration of different types of lexical and syntagmatic units. In the same way, depending on the underlying terminology, syntagmatic compounds of the type N Prep N may or may not be included in the group of compounds.
Moyna (2011) includes in her definition of syntagmatic compounds different combinations of substantives and adjectives, which may or may not show orthographic union:
[N PREP N]N: dulce de leche “caramel”
[N PREP Art N]N: árbol de la cera “wax myrtle”
[N + A]N: hierbabuena “mint”
[A + N]N: malasombra “evil person”
(Moyna 2011: 38)
In contrast, Masini (2009) does not include orthographically unified combinations, such as hierbabuena, but she adds constructions of the type N Prep VINF, such as salle à manger ‘dining room’.
Traditional grammars and dictionaries generally classify nominal syntagmatic compounds of the type Sp. bicicleta de montaña ‘mountain bike’, Fr. brosse à dents ‘toothbrush’ or Pt. moinho de vento ‘windmill’ as lexical units and therefore as compounds. However, Kabatek & Pusch (2009) point out that it is not always clear how to differentiate between lexical items of the type perro de caza and more syntactic items such as libro para niños (Kabatek & Pusch, 2009: 93f.). According to de Bustos Gisbert, syntagmatic compounds consist of at least two etymological words and are formally not distinguishable from nominal phrases (de Bustos Gisbert, 1986: 69). In the same line of argumentation, Masini notes that syntagmatic compounds of the type N Prep N follow the normal syntactic pattern of head modification of the nominal phrase by the prepositional phrase (2009: 257). N Prep N constructions in Romance languages therefore tend to be left-headed, and inflectional processes are performed at the head constituent (ibid.).
According to Val Álvaro (1999), the main distinctive feature between syntagmatic compounds and free nominal syntagms is the absence of a compositional meaning in syntagmatic compounds (Val Álvaro, 1999: 4827). Therefore, they can be interpreted as complex nominals and not as nominal phrases. In the same line of argumentation, Štekauer (2001b: 39) classifies ‘syntax-based word formations’ such as son-in-law or stuff-leaver as onomasiological naming units that have an internal structure and resort to the same word formation processes as other naming units. Furthermore, syntagmatic compounds generally differ from nominal syntagms in that they form an accentual unit (de Bustos Gisbert, 1986).
Still, a main concern of past research on syntagmatic compounds was their delimitation, especially by introducing new delimitation tests (e.g. Bouvier, 2000; Buenafuentes de la Mata, 2006; Bisetto & Scalise, 2005; Lieber & Scalise, 2007; Masini, 2009; Masini & Scalise, 2012). These tests generally include criteria such as the modification of the constituents (e.g. modification of the constituent order, insertion or omission of elements) via topicalization, intensification or the insertion of modifying adjectives. For Portuguese, the last two tests can be exemplified by Rio-Torto and Ribeiro (2012: 125):
moinho de vento “windmill”
moinho *antigo de vento “*wind old mill”
moinho de *muito vento “*wind much mill”
These delimitation tests are of major importance for studies taking
a lexicological, semantic or morphological perspective. These studies
generally follow Benveniste (1966) in his claim that syntagmatic
compounds are the real word-formation process in French. From this perspective, syntagmatic compounds are commonly perceived as lexical
structures that may show signs of internal syntactic patterns (e.g. Bisetto & Scalise, 1999, 2005; Rio-Torto & Ribeiro, 2012). In contrast,
studies that focus on syntax, such as Kornfeld (2003) or Lieber (1992),
A quantitative survey of N Prep N constructions in Romance languages... 133
generally regard syntagmatic compounding as a clearly syntactic process. Still other studies do not focus on the delimitation of lexicon
and syntax at all. From a construction grammar or construction
morphology perspective, syntagmatic compounds and (partially) equivalent nominal syntagms are both considered constructions lying on
a continuum between lexicon and morphosyntax (e.g. Masini 2009).
Still, these studies also aim at a description and classification of different constructions, such as syntagmatic compounds, phrases and other
types of compounds (Masini 2009). In the present account, we argue
that there is no clear line between syntagmatic compounds and syntactic constructions, but that they lie on a continuum between a lexicalized
and a syntactic pole.
A second major concern in research on syntagmatic compounds is
whether these constructions are lexicalized syntactic
constructions or whether they emerge from productive word-formation
patterns. Rainer (2016) clearly opts for classifying syntagmatic
compounds as productive lexical patterns:
Formations of this kind [syntagmatic compounds] are not, as often
stated erroneously, the result of the lexicalization of regular syntactic
sequences, but constitute very productive lexical patterns (…) (Rainer
2016: 2624).
In contrast, Guevara (2012) excludes syntagmatic compounds of
the type fin de semana ‘weekend’ from his description of Spanish compounds, along with cases such as sabelotodo ‘know-it-all’. He justifies
this decision by arguing that “they are clearly not formed by any rule of the language, they are ‘frozen’ multiword units arising as the result of processes of lexicalization and fossilization and do not belong in the core
of word-formation” (Guevara, 2012: 179). Along similar lines,
Villoing excludes “lexicalized syntactic constructions that behave like
lexical units” (Villoing, 2012: 35), such as fil de fer ‘wire’ and brosse à
dents ‘toothbrush’, but also sous verre ‘coaster’, sans-papier ‘illegal
immigrant’ and boit-sans-soif ‘boozehound’, from her delimitation of
compounds. By contrast, in the same volume on Romance compounds,
Rio-Torto & Ribeiro (2012) propose a classification of phrasal compounds, such as caminho de ferro ‘railway’ in Portuguese, which are
classified as involving “word sequences whose internal structure obeys
the syntax rules typical of phrases” (Rio-Torto & Ribeiro, 2012: 7).
This short introduction to the current discussion strikingly demonstrates the terminological uncertainty as well as the problematic delimitation and classification of syntagmatic compounds (for an overview
see e.g. Bisetto & Scalise, 2005; Lieber & Scalise, 2007). By far the most
prominent problem in this debate is whether syntagmatic
compounds should be considered part of the lexicon or part of syntax. Furthermore, in most cases, the discussion comes
down to the crucial question of whether syntagmatic compounding is
a process of lexicalization or a process of productive word formation.
In the present paper, we assume that syntagmatic compounding is a
productive and rule-governed process of word formation in Romance
languages. Furthermore, we assume that there is no clear boundary between lexicalized and syntactic constructions of the type N Prep N.
The aim of the present work is to take a closer look at syntagmatic
compounding of the type N Prep N in corpora of written French, Spanish, and Portuguese, focusing on the internal variation of N Prep N constructions, on their frequency and productivity, and on potential
differences across these three languages.
2. Internal alternation and variation in syntagmatic compounds
The above review of the theoretical status of syntagmatic compounds in
Romance languages does not yield a unified perspective. Nevertheless, syntagmatic compounds appear to be at least partially lexicalized
constructions. Their degree of lexicalization may vary along with
other factors such as semantic opacity/idiomaticity, entrenchment, fixedness of the internal constituents, frequency of occurrence and productivity. Whatever their degree of lexicalization, syntagmatic compounds still appear to preserve at least some of their syntactic
characteristics. The at least partially syntactic character of syntagmatic
compounds is apparent from the internal lexical and inflectional variation of these constructions. Rio-Torto and Ribeiro (2012) consider
the possibility of internal change in N Prep N constructions as a test
of compound status. From this perspective, constructions
in which the preposition can be replaced without a change in meaning
would be syntactic rather than lexical. Thus,
the Portuguese pair forno a microondas and forno de microondas ‘microwave
oven’, where no clear semantic difference is discernible, would suggest
we are dealing with a syntactic construction; conversely, the
French pair flûte de champagne ‘glass of champagne’ and flûte à champagne ‘champagne glass’, where the meaning changes, would
indicate that word formation is at issue. However, the phenomenon of internal prepositional alternation appears to be more complex than this.
Internal alternation of the preposition is not uncommon in
Romance languages. Whether it is possible depends to a large
extent on factors such as the semantic function of N2 and on
the fixedness and idiomaticity of the whole construction. Consider the
following examples:
1a. Sp. esmalte de uñas – esmalte para uñas (Pacagnini 2003)
“nail polish” – “polish for nails”
b. Pt. água de lavagem – água para lavagem (ptTenTen)
“wash water” – “water for washing”
c. Fr. jouet d’enfant – jouet pour enfants (frTenTen)
“toy” – “toy for kids”
2a. Sp. motor(es) de gasolina – motores a gasolina (esTenTen)
“gas engine”
b. Fr. épingle de nourrice – épingle à nourrice
“safety pin”
c. Pt. fogão de lenha – fogão a lenha (ptTenTen)
“wood stove”
3a. Fr. chemise de coton – chemise en coton (frTenTen)
“cotton shirt” – “shirt of cotton”
b. Pt. bracelete de aço – bracelete em aço (ptTenTen)
“steel bracelet” – “bracelet of steel”
c. Sp. ciclismo de pista – ciclismo en pista (esTenTen)
“track cycling” – “cycling on track”
In example 1, we see internal variation of the linking preposition
de/para and de/pour. While the constructions containing de are clearly
lexicalized, the combinations containing para/pour count as syntactic
constructions. The use of pour/para intensifies the semantic relation between
the two nominal items in the construction, in this case ‘function’ (see
Kornfeld 2009: 442 ff.). In 1a. and 1b., N2 designates the object
(1a.) or the process (1b.) of use of N1, whereas in 1c. the user of N1
is specified.
Example 2 illustrates the alternation between the prepositions de
and à/a. Here, both variants have lexical status; the alternation does not trigger
a change from lexical to syntactic status. The same applies to the examples in 3, where we cannot identify a change in lexical status,
but clearly a certain discrepancy in the degree of lexicalization and in the
semantic relation between N1 and N2. That is to say, the constructions shown in examples 1–3 can only be considered partial equivalents,
as they may also differ from each other in their actual usage frequency,
their productivity and their opacity.
Some authors, such as Kampers-Manhé (2001), argue that the internal
preposition has purely connecting properties (“opérateurs de couplage”, ‘coupling operators’)
(Kampers-Manhé 2001: 107) and that such prepositions “ne sont pas porteuses de sens” (‘do not carry meaning’) (ibid.).
The above examples suggest that the preposition is not completely semantically inert, even though, as we shall see below, some noun pairs show
considerable variation with respect to the choice of the internal preposition. Furthermore, the possibility of internal variation in the above examples indicates that these constructions may not be completely lexicalized:
they still allow internal modification that appears to be syntactically motivated.
The following quantitative corpus survey aims to provide further evidence on the frequency and productivity of internal prepositional
variation in syntagmatic compounds in Romance languages.
3. Corpus survey
3.1. Data
The present corpus linguistic investigation is based on three web corpora from the TenTen corpus family available through Sketch Engine¹, namely frTenTen12 (French), esTenTen11 (Spanish) and
ptTenTen11 (Portuguese). Their word counts range from roughly 4 to 10 billion
and their token counts from roughly 5 to 11 billion (see General Corpus
Information on sketchengine.co.uk):
¹ https://www.sketchengine.co.uk
              frTenTen          esTenTen          ptTenTen
Tokens        11,444,973,582    10,994,616,207    4,626,584,246
Words          9,889,689,889     9,497,402,122    3,900,501,097
Sentences        456,065,104       407,205,587      190,221,913
Paragraphs       188,079,362       213,364,685       91,248,976
Documents         20,400,411        22,287,566       10,216,060
Table 1. Corpus information for the TenTen corpora for French, Spanish and Portuguese
(https://the.sketchengine.co.uk)
The corpora ptTenTen and esTenTen can furthermore be divided into
an American and a European part, with the majority of the data representing American varieties of Spanish (79% of the esTenTen data) and
Portuguese (76% of the ptTenTen data). We used normalized
samples of 100 million tokens each, provided to us by Sketch Engine.
Language      Types      Tokens
French        284,432    1,301,850
Spanish       385,162    1,949,941
Portuguese    642,022    3,204,462
Table 2. Type and token counts of N Prep N sequences in the TenTen corpora
for French, Spanish, and Portuguese
Table 2 lists type and token counts for all N Prep N sequences in
the three corpora. In Portuguese, the construction appears to be
particularly frequent compared to French and Spanish,
which show relatively similar frequencies. The frequent occurrence of
the N Prep N construction is partly due to the existence in Portuguese
of hybrid forms of the type Prep + Art (do(s), da(s), na(s), no(s)) as well
as Prep + Pron (daquela(s)/e(s), naquela(s)/e(s); deste(s)/a(s), neste(s)/a(s)). The equivalent constructions in French and Spanish would
be of the form N Prep Article N. In order to obtain a syntactically homogeneous dataset, these N Prep Article N constructions were not included in the
present analysis. In what follows, we refer to the complete set of N
Prep N sequences extracted from the corpora as dataset 1. This dataset
is noisy and contains instances in which the N Prep N sequence is not a
syntactic or onomasiological unit, that is to say a naming unit (Štekauer
2001b). Removing these irrelevant cases from a list of more than 6
million examples was beyond the scope of the present study. Despite
this noise, dataset 1 was included in the quantitative survey in order to
obtain an overview of the occurrence and productivity of the construction type N Prep N in the languages under investigation. Furthermore,
the results from the analysis of dataset 1 offer a first point of comparison for the analysis of dataset 2.
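The first step, pulling candidate N Prep N triplets out of a tagged corpus, can be sketched as a pattern match over three consecutive tokens. This is a minimal illustration under stated assumptions, not the extraction procedure actually run on the TenTen corpora: it assumes each sentence is a list of (word, tag) pairs and that nouns and prepositions carry Universal Dependencies-style tags NOUN and ADP; the tag names, the function name extract_n_prep_n and the example sentence are all hypothetical. Like dataset 1, such a naive match also captures sequences that are not naming units.

```python
from collections import Counter

def extract_n_prep_n(tagged_sentence, noun_tag="NOUN", prep_tag="ADP"):
    """Yield (N1, Prep, N2) triplets from a list of (word, tag) pairs.

    The tag names are assumptions (Universal Dependencies style);
    real corpora may use a different tagset.
    """
    # Slide a window of three consecutive tokens over the sentence
    for (w1, t1), (w2, t2), (w3, t3) in zip(tagged_sentence,
                                            tagged_sentence[1:],
                                            tagged_sentence[2:]):
        if t1 == noun_tag and t2 == prep_tag and t3 == noun_tag:
            yield (w1.lower(), w2.lower(), w3.lower())

# Hypothetical tagged Spanish sentence
sentence = [("Compré", "VERB"), ("un", "DET"), ("motor", "NOUN"),
            ("de", "ADP"), ("gasolina", "NOUN"), ("y", "CCONJ"),
            ("una", "DET"), ("copa", "NOUN"), ("para", "ADP"),
            ("vino", "NOUN")]

# Token counts per triplet type for this sentence
triplets = Counter(extract_n_prep_n(sentence))
print(triplets)
```

Summing such counters over all sentences would give token (N) and type (V) counts per language; filtering out the noise would still require a further step, done manually in this study.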
From dataset 1, a second dataset was derived by manually removing word triplets that did not instantiate the N Prep N construction. This second dataset, henceforth dataset 2, focuses on the internal preposition of the constructions. In a first step, all constructions
sharing their N1 and N2 but differing in their preposition were
selected (e.g. livre pour/d’enfants). In a second step, the data were manually inspected and the following constructions were excluded: grammaticalized constructions (frente a, jusqu’à, en dehors), partitive constructions or spatial, temporal or mass nouns (kilo de, lunes a viernes,
visita a Roma, journées par semaine), binominal pairs (dia a dia, instant après instant), antonyms (chien sans/avec laisse, personnes avec/
sans emploi), prepositional phrases (N1 à base de, par hasard de), verb
phrases (mettre N1 en danger, donner N1 à N2), and hybrid forms of
the above.
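The automatic first step of building dataset 2, keeping noun pairs that share N1 and N2 but occur with different prepositions, amounts to a grouping operation; the manual exclusions listed above are not modelled. The sketch below assumes triplet counts are already available as a mapping from (N1, Prep, N2) to frequency, with elided forms such as d’ normalized to de beforehand. The function name variable_pairs and the counts are hypothetical.

```python
from collections import Counter, defaultdict

def variable_pairs(triplet_counts, min_preps=2):
    """Group (N1, Prep, N2) counts by noun pair and keep only pairs
    attested with at least `min_preps` distinct prepositions."""
    preps_by_pair = defaultdict(Counter)
    for (n1, prep, n2), freq in triplet_counts.items():
        preps_by_pair[(n1, n2)][prep] += freq
    return {pair: dict(preps) for pair, preps in preps_by_pair.items()
            if len(preps) >= min_preps}

# Hypothetical French counts (elided d' already normalized to de)
counts = {("livre", "de", "enfants"): 12, ("livre", "pour", "enfants"): 30,
          ("verre", "à", "vin"): 50, ("verre", "de", "vin"): 40,
          ("moulin", "à", "vent"): 25}

pairs = variable_pairs(counts)
# Largest number of distinct prepositions observed for one noun pair
max_preps = max(len(preps) for preps in pairs.values())
print(pairs, max_preps)
```

Here moulin à vent drops out because it occurs with only one preposition. On real data, max_preps would correspond to the maximum number of prepositions per noun pair discussed in Section 3.3.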
Language      Types   Tokens
French        1062    6991
Spanish       547     10219
Portuguese    6795    58932
Table 3. Type and token counts for dataset 2, which includes all pairs of nouns
that are attested with at least two different internal prepositions
Table 3 lists type and token counts for dataset 2. As with dataset 1, the
counts for Portuguese outnumber those for French and Spanish.
Both datasets were further analysed by considering, in addition to
the counts of tokens (N) and types (V), the counts of hapax legomena
(V1, the formations occurring only once), the productivity measure P =
V1/N, which assesses the probability that an additional N Prep N token
represents a novel, previously unobserved type, and an estimate S of
the potential number of formations in use in the text type sampled by
the corpus. Note that S = V + V0, where V0 is the number of formations
that do not appear in the sample. S can be estimated from the numbers
of types Vk that occur once, twice, three times, etc., provided that these
counts Vk decrease in a regular way. If so, V0 can be estimated, and
given V0, an estimate of S = V + V0 follows immediately. For further
mathematical detail on these measures, see Baayen (2009), and for the
estimation of S, Baayen (2001, 2008).
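The measures N, V, V1 and P = V1/N follow directly from the frequency spectrum, i.e. the numbers of types Vk occurring k times. A minimal sketch, assuming each N Prep N token has already been reduced to a type label (a string); the function name productivity_stats and the toy sample are hypothetical, not the authors' code.

```python
from collections import Counter

def productivity_stats(tokens):
    """Return N (tokens), V (types), V1 (hapax legomena) and P = V1/N."""
    freqs = Counter(tokens)               # type -> frequency
    n = sum(freqs.values())               # N: number of tokens
    v = len(freqs)                        # V: number of distinct types
    spectrum = Counter(freqs.values())    # frequency spectrum: k -> Vk
    v1 = spectrum.get(1, 0)               # V1: types occurring exactly once
    p = v1 / n if n else 0.0              # P: chance that the next token is a new type
    return {"N": n, "V": v, "V1": v1, "P": p, "spectrum": dict(spectrum)}

# Toy sample of N Prep N types (hypothetical data)
sample = ["moinho de vento", "moinho de vento", "fogão a lenha",
          "fogão de lenha", "copo de vinho", "copo de vinho", "copo de vinho"]
stats = productivity_stats(sample)
print(stats)
```

For this sample, N = 7, V = 4, two types occur once (V1 = 2), and P = 2/7.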
Thus, we have three estimates, each highlighting a different aspect
of productivity: the number of types V, for the extent to which a head
or modifier position is used in the corpus; the probability P that new
types will be sampled when the corpus is increased; and the limiting
number of types S that one might sample if the corpus size were increased
to infinity.
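The estimate of S in this study relies on modelling the frequency spectrum (Baayen 2001, 2008). As a much simpler stand-in that illustrates the same idea of inferring the unseen types V0 from the counts of rare types, the sketch below uses Chao's non-parametric lower-bound estimator, which depends only on V1 and V2. This is a swapped-in technique, not the procedure used by the authors, and the function name chao1_estimate is hypothetical.

```python
from collections import Counter

def chao1_estimate(tokens):
    """Rough estimate of S = V + V0 with Chao's lower-bound estimator.

    Swapped-in illustration only: V0 is approximated from the numbers of
    types seen once (V1) and twice (V2), not from a fitted LNRE model.
    """
    spectrum = Counter(Counter(tokens).values())   # k -> Vk
    v = sum(spectrum.values())                     # V: observed types
    v1, v2 = spectrum.get(1, 0), spectrum.get(2, 0)
    if v2 > 0:
        v0 = v1 * v1 / (2 * v2)                    # classic form V1^2 / (2 V2)
    else:
        v0 = v1 * (v1 - 1) / 2                     # bias-corrected form when V2 = 0
    return v + v0, v0

# Hypothetical sample: types a and d occur twice, b, c and e once
s_hat, v0_hat = chao1_estimate(["a", "a", "b", "c", "d", "d", "e"])
print(s_hat, v0_hat)
```

With V = 5, V1 = 3 and V2 = 2, the sketch yields V0 = 9/4 = 2.25 and S = 7.25. When V1 = 0, it adds no unseen types at all, so S collapses to V.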
3.2. Analysis dataset 1
Table 4 summarizes the frequency and productivity statistics for dataset
1, focusing on the productivity of the nominal slots in the N Prep N
construction.
The upper subtable documents the counts when types are defined
by the first noun of the construction. The lower subtable gives the
corresponding counts for the second noun. On the basis of the numbers
of tokens N, types V, potential types S, and hapax legomena V1, the N
Prep N construction appears least productive in French, of medium productivity in Spanish, and most productive in Portuguese. This ordering
holds for both the first and the second noun.
The ranking of the three languages by P is different, with Portuguese
having the lowest values. It should be kept in mind, however, that P is itself a function of N and decreases as N (and V)
increase: as we read through a text, the rate at which new words are
encountered decreases steadily. Given that N is much larger for
Portuguese, the value of P is actually surprisingly large. Comparing
Spanish and French, the similar values of P are surprising given that N
is substantially larger for Spanish than for French. The P values therefore provide further support for the ranking based on the other statistics.
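The dependence of P on N can be illustrated by computing P = V1/N on increasingly long initial stretches of one and the same synthetic text. This sketch assumes a hypothetical Zipf-like population of 5,000 types with probabilities proportional to 1/rank and an arbitrary random seed; it shows the general trend only and is not fitted to the TenTen data.

```python
import random
from collections import Counter

def p_at(tokens):
    """P = V1/N for a list of tokens."""
    freqs = Counter(tokens)
    v1 = sum(1 for f in freqs.values() if f == 1)
    return v1 / len(tokens)

# Hypothetical Zipf-like population: type r has weight 1/r
rng = random.Random(2017)
types = range(1, 5001)
weights = [1 / r for r in types]
text = rng.choices(types, weights=weights, k=20000)

# P on growing prefixes of the same text
curve = [(n, p_at(text[:n])) for n in (1000, 5000, 20000)]
for n, p in curve:
    print(f"N = {n:>6}: P = {p:.4f}")
```

P falls steadily as N grows even though the underlying population never changes, which is why a lower P for the language with the largest N need not signal lower productivity.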
Noun1         P        S        V        N          V1
French        0.0023   20147    13719    1301850    2994
Spanish       0.0023   28755    18407    1949941    4485
Portuguese    0.0017   36624    23409    3204462    5448

Noun2         P        S        V        N          V1
French        0.0028   24688    16174    1301850    3645
Spanish       0.0031   39037    23245    1949941    6045
Portuguese    0.0023   49079    28545    3204462    7370
Table 4. Frequency and productivity statistics for dataset 1. The upper part
of the table defines types on the basis of the first noun, the lower part on
the basis of the second noun
Table 4 also indicates that the second noun position of the construction is used more productively than the first noun position: all measures
except N, which is identical for both slots, assume larger values in the lower part of the table. The greater productivity of the modifier position makes sense from an onomasiological
perspective, as the second noun slot is typically used to differentiate
between subcategories of the head noun, which in Romance languages
generally occupies the first noun slot.
The large numbers of hapax legomena, together with the fact that S is
well above V, indicate, within the limits of dataset 1, that the N Prep N construction is solidly productive in the three Romance languages under
consideration here.
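The P column of Table 4 can be re-derived from V1 and N, and the difference S − V gives the estimated number of unseen types V0 for each slot. The short check below uses only the numbers printed in Table 4; the variable names are illustrative.

```python
# Values copied from Table 4 (dataset 1): (V1, N, S, V) per language and noun slot
TABLE4 = {
    ("French", "Noun1"): (2994, 1301850, 20147, 13719),
    ("French", "Noun2"): (3645, 1301850, 24688, 16174),
    ("Spanish", "Noun1"): (4485, 1949941, 28755, 18407),
    ("Spanish", "Noun2"): (6045, 1949941, 39037, 23245),
    ("Portuguese", "Noun1"): (5448, 3204462, 36624, 23409),
    ("Portuguese", "Noun2"): (7370, 3204462, 49079, 28545),
}

derived = {}
for (lang, slot), (v1, n, s, v) in TABLE4.items():
    p = v1 / n   # productivity measure P = V1/N
    v0 = s - v   # estimated number of unseen types, V0 = S - V
    derived[(lang, slot)] = (round(p, 4), v0)
    print(f"{lang:<10} {slot}: P = {p:.4f}, V0 = {v0}")

# Does the modifier slot (Noun2) exceed the head slot (Noun1) on V1, S and V?
modifier_higher = {
    lang: all(TABLE4[(lang, "Noun2")][i] > TABLE4[(lang, "Noun1")][i] for i in (0, 2, 3))
    for lang in ("French", "Spanish", "Portuguese")
}
print(modifier_higher)
```

The recomputed P values match the table to four decimals, and the comparison confirms that the second noun slot scores higher on V1, S and V in all three languages.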
Further informal surveys of the prepositions de, en/em, à/a, pour/para and avec/con/com, again using dataset 1, indicated that French
N Prep N constructions containing the prepositions avec and pour are
less frequent and less productive than equivalent constructions in Portuguese and Spanish containing the prepositions con/com and para.
French appears to resort to other types of word formation, such as NN
or NA constructions, instead of N Prep N constructions containing
avec, as in:
4a. Fr. personne handicapée “handicapped person”
b. Sp. persona con discapacidad física/mental “handicapped person”
c. Pt. pessoa com necessidades especiais “handicapped person”
French also shows a preference for constructions with de instead of
pour. At the same time, constructions with the preposition à/a appear
to be more productive and frequent in French than in Spanish and Portuguese. Semantic relations that are expressed via à in French tend to
require other prepositions, such as de or para, in Spanish or Portuguese:
5a. Fr. verre à vin “wine glass”
b. Sp. copa de vino / copa para vino “wine glass”
c. Pt. copo de vinho “wine glass”
3.3. Analysis dataset 2
Table 5 summarizes the frequency and productivity measures for data
set 2, which includes only those (manually veriied) examples of N Prep
N constructions in which the irst and second noun co-occur with at
least two different prepositions. For this analysis, each combination of
irst and second noun and preposition counted as a separate type.
              P        S          V      N       V1
French        0.0594   1748.232   1062   6991    415
Spanish       0        n/a        547    10219   0
Portuguese    0.0464   13378.57   6795   58932   2733
Table 5. Frequency and productivity statistics for dataset 2, which comprises
all instances of noun pairs that occur with at least two different prepositions
As in the analysis of dataset 1, Portuguese again shows the highest
type (V) and token (N) frequencies, the largest number of hapax legomena (V1), the highest estimate of possible types (S) and, given the
large number of tokens, a surprisingly high degree of productivity P.
Although the numbers are smaller for French, the construction, as evaluated on the basis of dataset 2, remains solidly productive, as evidenced
by the large number of types missed in the sample (V0 = S − V ≈ 1748 − 1062
= 686).
Spanish, by contrast, shows a very different pattern. There are no
hapax legomena in dataset 2 for Spanish; hence P is zero, and S
cannot even be estimated (it is expected to be only slightly larger than
V, if at all). The number of types (547) is roughly half of that observed
for French, and less than 10% of that observed for Portuguese. In other
words, internal variation of the preposition for fixed head and modifier
nouns is not productive in Spanish, whereas it is productive in French
and especially in Portuguese. In Portuguese, we find noun
pairs occurring with up to 5 different prepositions; in French, the maximum is
4, and in Spanish, 3.
Thus, when we consider the productivity of internal variation of the
preposition, the ranking of the languages places French above Spanish.
Inspection of the Spanish examples suggests a strong tendency to make
use of the highly frequent preposition de and to restrict prepositional variation to a relatively small set of lexicalized compounds.
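The figures quoted for dataset 2 can be checked against Table 5 with a few lines of arithmetic: P = V1/N for each language, the French estimate V0 = S − V, and the Spanish type count relative to French and Portuguese. The check below uses only the values printed in Table 5, with S recorded as None for Spanish because it cannot be estimated; the variable names are illustrative.

```python
# Values copied from Table 5 (dataset 2); S is None where it cannot be estimated
TABLE5 = {
    "French":     {"P": 0.0594, "S": 1748.232, "V": 1062, "N": 6991,  "V1": 415},
    "Spanish":    {"P": 0.0,    "S": None,     "V": 547,  "N": 10219, "V1": 0},
    "Portuguese": {"P": 0.0464, "S": 13378.57, "V": 6795, "N": 58932, "V1": 2733},
}

for lang, row in TABLE5.items():
    # P = V1/N, rounded as in the table
    assert round(row["V1"] / row["N"], 4) == row["P"], lang

v0_french = TABLE5["French"]["S"] - TABLE5["French"]["V"]      # unseen French types
es_vs_fr = TABLE5["Spanish"]["V"] / TABLE5["French"]["V"]      # "roughly half"
es_vs_pt = TABLE5["Spanish"]["V"] / TABLE5["Portuguese"]["V"]  # "less than 10%"
print(f"V0 (French) = {v0_french:.0f}; "
      f"Spanish/French V = {es_vs_fr:.2f}; Spanish/Portuguese V = {es_vs_pt:.2f}")
```

The French V0 comes out at about 686, and the Spanish type count is about 52% of the French and about 8% of the Portuguese one.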
4. Discussion
The present study sheds new light on the vexed question of the status
of N Prep N constructions in Romance languages. First, the survey of N
Prep N sequences in the TenTen corpora of French, Spanish, and Portuguese clearly shows that this construction contributes substantially
to the lexicon (in the onomasiological sense) of these languages. In
all three languages, the construction is realized in tens of thousands
of types (dataset 1). Admittedly, dataset 1 includes many instances
that do not conform to the N Prep N construction. Nevertheless, even if
half of the tokens and types were to be discarded, the counts of legitimate constructions would still portray this construction as the most productive onomasiological process in Romance, mirroring evidence
from Germanic languages suggesting that derivational word formation is
less productive than compounding by several orders of magnitude. It
is therefore unlikely that N Prep N constructions in Romance languages are merely lexicalized or fossilized syntactic constructions without
the support of a productive process of word formation (pace Guevara 2012
and Villoing 2012). On the contrary, for all three languages, large numbers of novel types are expected to be observable in larger samples of
language use, as indicated by the (tentative) estimates of the population
numbers of types (S).
An analysis of a hand-curated subset of dataset 1, comprising all
attestations of N1 Prep N2 constructions in which N1 and N2 co-occur
with at least two different prepositions (dataset 2), brought to light an
unexpected difference between Portuguese and French on the one hand,
and Spanish on the other. Portuguese, and to a lesser extent
French, exhibit productive internal variation of the preposition. Spanish, by contrast, appears not to allow its speakers the same flexibility in
the choice of preposition. In the absence of hapax legomena for Spanish noun pairs, Spanish emerges as a language that avoids both “free”
variation of the preposition for approximately the same meaning and
the use of different prepositions to differentiate between shades
of meaning for a given modifier and head noun (as instantiated in
French, for instance, by the pair ‘verre à vin’ and ‘verre de vin’).
An informal survey of which prepositions are favored revealed that
French shows a stronger preference for constructions containing
the preposition à than Spanish or Portuguese, which use de
or para more productively. The relative scarcity of avec in French N Prep N
constructions is likely due to a preference for NA constructions.
In French, pour emerged as slightly more productive than de (e.g. livre
d’enfant and livre pour enfants).
5. Conclusions
The present quantitative survey of N Prep N constructions in Spanish,
French and Portuguese contributes new empirical evidence to the discussion
on Romance word formation. The two main points addressed in this
study concern the lexical or syntactic status of syntagmatic compounds
and their productivity and degree of lexicalization or fossilization.
The analysis indicates that these constructions are indeed formed
according to productive processes of Romance word formation. That
is to say, syntagmatic compounds are naming units that form part of
the lexicon. N Prep N constructions are not merely fossilized syntactic
constructions; rather, the construction type N Prep N is an important
and frequently used mechanism of word formation. Still, it is important
to highlight that it is neither possible nor necessary to draw a clear line
between lexical onomasiological units of the type N Prep N and syntactic constructions of the type N Prep N. Here, different criteria, such
as the degree of fixedness, idiomaticity and compositionality, play an
important role.
Furthermore, the present quantitative analysis shows that internal prepositional variation is possible in N Prep N constructions in
Romance languages, but that this variation displays different characteristics in the three Romance languages under investigation. Portuguese
shows the highest frequency and productivity of internal prepositional
variation, across a large number of different semantic contexts. In contrast,
the Spanish data show no productivity in the internal prepositional variation
of N Prep N constructions. Along the same lines, Spanish shows the strongest
tendency to employ de as the internal preposition in N
Prep N constructions.
In conclusion, syntagmatic compounds of the
type N Prep N form a productive and frequent part of Romance word
formation. Still, their frequency and productivity as a word-formation
type vary across the three Romance languages, as does their propensity
for internal prepositional variation. Further studies on this subject need
to consider the qualitative characteristics of internal prepositional variation, notably the semantic relation between N1 and N2.
6. References
Anshen, Frank & Aronoff, Mark. 1997. Morphology in real time. In Booij, Geert
& van Marle, Jaap (eds.) Yearbook of Morphology 1996. Dordrecht: Kluwer Academic Publishers, 9-12.
Aronoff, Mark. 1976. Word formation in generative grammar. Cambridge,
MA: MIT Press.
Baayen, R. Harald. 2001. Word Frequency Distributions. Dordrecht: Kluwer.
Baayen, R. Harald. 2008. Analyzing Linguistic Data. A Practical Introduction to
Statistics Using R. Cambridge: Cambridge University Press.
Baayen, R. Harald. 2009. Corpus linguistics in morphology: morphological productivity. In Lüdeling, Anke & Kytö, Merja (eds.) Corpus Linguistics. An
International Handbook. Berlin: Mouton de Gruyter, 900-919.
Baayen, Harald & Lieber, Rochelle. 1991. Productivity and English derivation: a corpus-based study. Linguistics 29(5): 801-844.
Bauer, Laurie. 2001. Morphological productivity. Cambridge: Cambridge University Press.
Benveniste, Émile. 1966. Problèmes de linguistique générale (Bibliothèque des sciences humaines 1). Paris: Gallimard.
Bisetto, Antonietta & Scalise, Sergio. 1999. Compounding: morphology and/
or syntax? In Mereu, Lunella (ed.) Boundaries of Morphology and Syntax (Amsterdam Studies in the Theory and History of Linguistic Science 4). Amsterdam/Philadelphia: Benjamins, 31-49.
Bisetto, Antonietta & Scalise, Sergio. 2005. The classification of compounds.
Lingue e Linguaggio 4(2): 319-332.
Bouvier, Yves F. 2000. Définir les composés par opposition aux syntagmes. In
Haeberli, Eric & Laenzlinger, Christopher (eds.) Generative Grammar
in Geneva, 165-187.
Buenafuentes de la Mata, Cristina. 2006/04. Entre la morfología, la sintaxis
y el léxico: la delimitación de la composición sintagmática en español
(VII Congrés de Lingüística General). Barcelona.
Di Sciullo, Anne-Marie & Williams, Edwin. 1987. On the definition of word
(Linguistic Inquiry Monographs 14). Cambridge, MA: MIT Press.
Faria, André. 2010. Formação de compostos nominais de base livre do PB. In
Almeida, Maria L.; Ferreira, Rosangela & Pinheiro, Diogo (eds.) Linguística cognitiva em foco: morfologia e semântica do português. Rio
de Janeiro: Soluções Editoriais.
Fradin, Bernard. 2009. IE, Romance: French. In Lieber, Rochelle & Štekauer,
Pavol (eds.) The Oxford Handbook of Compounding. Oxford: Oxford University
Press, 417-435.
Guevara, Emiliano R. 2012. Spanish compounds. Probus. International Journal of Latin and Romance Linguistics 24(1): 175-195.
Kabatek, Johannes & Pusch, Claus D. 2009. Spanische Sprachwissenschaft:
Eine Einführung. Tübingen: Narr Franke Attempto Verlag.
Kampers-Manhé, Brigitte. 2001. Le statut de la préposition dans les mots composés. Travaux de Linguistique 42-43(1): 83-95.
Kornfeld, Laura M. 2003. Compounds N+N as formally lexicalized appositions in Spanish. In Booij, Geert; De Cesaris, Janet; Ralli, Angela &
Scalise, Sergio (eds.) Topics in Morphology: Selected Papers from the
Third Mediterranean Morphology Meeting. Barcelona: Universitat
Pompeu Fabra, 211-225.
Kornfeld, Laura M. 2009. IE, Romance: Spanish. In Lieber, Rochelle & Štekauer,
Pavol (eds.) The Oxford Handbook of Compounding. Oxford: Oxford University Press, 436-453.
Lieber, Rochelle. 1992. Deconstruction Morphology: Word Formation in Syntactic Theory. Chicago/London: University of Chicago Press.
Lieber, Rochelle & Scalise, Sergio. 2007. The lexical integrity hypothesis in
a new theoretical universe. In Booij, Geert; Ducceschi, Luca; Fradin,
Bernard; Guevara, Emiliano R.; Ralli, Angela & Scalise, Sergio (eds.)
On-line Proceedings of the Fifth Mediterranean Morphology Meeting,
1-25.
Masini, Francesca. 2009. Phrasal lexemes, compounds and phrases: A constructionist perspective. Word Structure 2(2): 254-271.
Masini, Francesca & Scalise, Sergio. 2012. Italian compounds. Probus. International Journal of Latin and Romance Linguistics 24(1): 61-91.
Masini, Francesca & Thornton, Anna. 2007. Italian VEV lexical constructions.
In Booij, Geert; Ralli, Angela & Scalise, Sergio (eds.) Morphology and
Dialectology 6: 148-189.
Pacagnini, Ana M. J. 2003. Compuestos sintagmáticos y alternancia preposicional. Moenia 9: 159-172.
Rainer, Franz. 2016. Italian. In Müller, Peter O.; Ohnheiser, Ingeborg; Olsen, Susan & Rainer, Franz (eds.) Word Formation: An International Handbook of the Languages of Europe. Berlin/Boston: Mouton de Gruyter,
2712-2731.
Rainer, Franz. 2016. Spanish. In Müller, Peter O.; Ohnheiser, Ingeborg; Olsen,
Susan & Rainer, Franz (eds.) Word Formation: An International Handbook of the Languages of Europe. Berlin/Boston: Mouton de Gruyter,
2620-2640.
Rio-Torto, Graça & Ribeiro, Sílvia. 2009. Compounds in Portuguese. Lingue e
Linguaggio 8(2): 271-291.
Rio-Torto, Graça & Ribeiro, Sílvia. 2012. Portuguese compounds. Probus. International Journal of Latin and Romance Linguistics 24(1): 119-145.
Schlechtweg, Marcel & Härtl, Holden. 2015. Compound versus phrase: Evidence from a learning study (10th Mediterranean Morphology Meeting). Haifa.
Štekauer, Pavol. 2001b. Fundamental principles of an onomasiological theory
of English word-formation. Onomasiology Online 2: 1-42.
van Goethem, Kristel. 2009. Choosing between A+N compounds and lexicalized A+N phrases: The position of French in comparison to Germanic
languages. Word Structure 2(2): 241-253.
Villoing, Florence. 2012. French compounds. Probus. International Journal of
Latin and Romance Linguistics 24(1): 29-60.
ojs.uv.es/index.php/qfilologia/index
Qf
Lingüístics
Lingüística de corpus y fraseología contrastiva (alemán-español):
Las combinaciones usuales de estructura [PREP + S].
El caso de entre lágrimas y unter Tränen
Corpus linguistics and contrastive phraseology (German-Spanish):
The multi-word units of [PREP+N] structure.
The case of entre lágrimas and unter Tränen
Ana Mansilla
Universidad de Murcia.
Received: 14/05/2017. Accepted: 25/10/2017
Resumen: En el presente artículo analizamos la fijación externa e interna que presenta
el binomio [entre + S] con artículo cero en español, y sus equivalentes en alemán en
las combinaciones usuales entre lágrimas y unter Tränen. Los datos son extraídos del
corpus DeReKo (Das Deutsche Referenzkorpus) y del Sketch Engine, deTenTen 13 y
eseuTenTen11. En primer lugar, presentamos los objetivos del proyecto en el que este
trabajo está enmarcado. En un segundo punto exponemos las diferentes aplicaciones de
la lingüística de corpus en el ámbito de la fraseología. Por último, en base a los corpus
consultados comentamos convergencias o divergencias más notables de las combinaciones usuales objeto de estudio.
Palabras clave: fraseología; lingüística de corpus; combinaciones usuales; alemán;
español.
Abstract: This article analyses the external and internal fixation of the binomial pattern [entre + NN] with zero article in Spanish and its German equivalents, focusing on the expressions entre lágrimas/unter Tränen. The data have been extracted from the DeReKo corpus (Das Deutsche Referenzkorpus) and from the Sketch Engine corpora deTenTen13 and eseuTenTen11. The article first presents the objectives of the research project of which this study forms part. It then addresses some applications of corpus linguistics in the field of phraseology. Finally, it analyses similarities and differences between the expressions investigated on the basis of the evidence obtained from the corpora.
Keywords: phraseology; corpus linguistics; common multi-word units; German; Spanish.
Mansilla, Ana. 2017. "Lingüística de corpus y fraseología contrastiva (alemán-español): Las combinaciones usuales de estructura [PREP + S]. El caso de entre lágrimas y unter Tränen". Quaderns de Filologia: Estudis Lingüístics 22: 147-164. doi: 10.7203/qf.22.11306
1. Introduction
This article examines recent developments in phraseology from the perspective of corpus linguistics. It also analyses what language technologies can offer for resolving open questions such as the meaning, function or form of phraseological units. In our view, this emerging area, corpus linguistics applied to bilingual (German-Spanish) phraseology, remains largely unexplored, and this is what prompted us to write this paper. We focus on usual combinations (CU) with the structure [PREP + S] (S = noun), drawing on the theory of Kathrin Steyer (2013). In particular, we examine the usual combination entre lágrimas and its German equivalent unter Tränen. The aim is threefold. First, we detect convergences and divergences between the different CU. Second, we examine their syntagmatic profiles in both phraseological systems, covering both external fixation (the noun and verb collocates that serve as the head of the prepositional modifier) and internal fixation (internal slots). Third, we ask whether these profiles can be systematised semantically and phraseologically.
2. Project framework: a brief description
This study forms part of the ongoing research project Combinaciones fraseológicas del alemán de estructura [PREP + S]: patrones sintagmáticos, descripción lexicográfica y correspondencias en español ('German phraseological combinations with the structure [PREP + S]: syntagmatic patterns, lexicographic description and Spanish correspondences'), carried out by the FRASESPAL group¹. The project's aim is to extract, inventory and describe usual combinations (CU) with the structure [PREP + S] from different corpora. We follow an inductive method: we use the tools and statistical data provided by the corpus to extract information that is relevant to our study and that would remain invisible without a corpus. The project excludes CU that form part of support-verb constructions (Funktionsverbgefüge). It also excludes prepositional complements governed by verbs, nouns or adjectives (padecer de, tener que ver con, sucumbir a, etc.). We also use tools developed at the IDS (Institut für Deutsche Sprache). These include COSMAS II (Corpus Search, Management and Analysis System) and Lexpan (Lexical Patterns Analyzer). Working from KWIC (Key Word in Context) lists, Lexpan facilitates the analysis of the internal slots (Lückenfüller) of the CU.
¹ Project funded by the Spanish Ministerio de Ciencia e Innovación (FFI2013-45769-P) under the title Combinaciones fraseológicas del alemán de estructura [PREP + S]: patrones sintagmáticos, descripción lexicográfica y correspondencias en español. It is promoted by the FRASESPAL research team led by Carmen Mellado Blanco (Universidad de Santiago de Compostela), in collaboration with Kathrin Steyer, director of the "Usuelle Wortverbindungen" project at the IDS (http://www1.ids-mannheim.de/lexik/uwv.html). The results of the project will be published on the IDS online platform OWID (http://www.owid.de/wb/uwv/start.html).
We also compiled data from Sketch Engine. For German we used the corpus deTenTen13; for Spanish, the corpora European Spanish Web 2011 and eseuTenTen11.
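A short Python sketch can illustrate the KWIC (Key Word in Context) lists that Lexpan works from. It is not the IDS implementation: COSMAS II and Lexpan operate on annotated corpora with their own query languages. Instead, `kwic` is a hypothetical helper. It assumes plain, whitespace-tokenised text and aligns every occurrence of a multi-word node with a fixed number of tokens of left and right context.

```python
from typing import Iterator, List, Tuple


def kwic(tokens: List[str], node: Tuple[str, ...],
         width: int = 5) -> Iterator[Tuple[str, str, str]]:
    """Yield (left context, node, right context) for each occurrence of the
    multi-word node, matched case-insensitively (hypothetical helper)."""
    target = tuple(word.lower() for word in node)
    lowered = [t.lower() for t in tokens]
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tuple(lowered[i:i + n]) == target:
            left = " ".join(tokens[max(0, i - width):i])
            hit = " ".join(tokens[i:i + n])
            right = " ".join(tokens[i + n:i + n + width])
            yield left, hit, right


sentence = "Dabei verabschiedete er sich unter Tränen von den Zuschauern"
for left, hit, right in kwic(sentence.split(), ("unter", "Tränen"), width=3):
    # Right-align the left context so that all nodes line up in one column.
    print(f"{left:>30} | {hit} | {right}")
```

Sorting such rows by their left or right context is one simple way to make recurrent slot fillers visible at a glance.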
Our analysis rests on two parameters. The first is external fixation. It concerns the verb and noun collocates that appear to the right and to the left of the node, that is, of the word or words under study (the preceding or following co-text). These collocates may or may not be in direct contact with the node [X (...) PREP + S (…) X], and the aim is to detect recurrent patterns that tend towards lexicalisation (Mellado Blanco, 2015). The second is internal fixation [PREP + X + S]. It concerns the kinds of internal slot fillers that appear in discourse between the preposition and the noun: [con X dolores] (con fuertes dolores); [unter X Schmerzen] (unter starken Schmerzen).
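A small, hypothetical Python function, `slot_fillers`, can make the internal slot [PREP + X + S] concrete by counting what appears between a given preposition and a given noun. It makes three simplifying assumptions: raw text, a naive regular-expression tokeniser, and at most `max_slot` intervening tokens. A real study would work on lemmatised, POS-tagged corpus output instead.

```python
import re
from collections import Counter


def slot_fillers(text: str, prep: str, noun: str, max_slot: int = 1) -> Counter:
    """Count the fillers of slot X in [prep + X + noun] (hypothetical helper).

    The key "" stands for an empty slot, i.e. the bare sequence [prep + noun].
    """
    tokens = re.findall(r"\w+", text.lower())  # \w is Unicode-aware in Python 3
    prep, noun = prep.lower(), noun.lower()
    fillers: Counter = Counter()
    for i, token in enumerate(tokens):
        if token != prep:
            continue
        # Look for the noun right after the preposition or after 1..max_slot tokens.
        for gap in range(max_slot + 1):
            j = i + 1 + gap
            if j < len(tokens) and tokens[j] == noun:
                fillers[" ".join(tokens[i + 1:j])] += 1
                break
    return fillers


print(slot_fillers("Er sprach unter starken Schmerzen. Sie lachte unter Schmerzen.",
                   "unter", "Schmerzen"))
print(slot_fillers("Llegó con fuertes dolores", "con", "dolores"))
```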
It is essential to consider the semantic function of the prepositions and nouns in the CU under study [PREP + S], and whether each lexical unit retains its lexical meaning, that is, whether it is used literally or figuratively. The reason is that the interaction between form and meaning directly determines the degree of idiomaticity, and hence of lexicalisation, of the CU.
The usual combinations we study are word combinations with both fixed and variable constituents. Although the variable constituents are free slots, they are subject to certain semantic and combinatorial restrictions: [unter Vorbehalt X zustimmen] (X = dem Abkommen/einem Beitritt), [con X de satisfacción] (X = tono/mueca/sonrisa). Steyer defines the concept of usual combination (usuelle Wortverbindung) as follows:
Usual word combinations (UWV) are to be understood as polylexical, habitualised linguistic signs that are subject to specific restrictions. These restrictions can affect all levels of language. However, they do not arise primarily from the language system, for instance as a result of transformational defects or semantic selection restrictions, but from the recurrent use of these multi-part entities (Steyer, 2013: 16).
From a semantic point of view, these constructions carry an "additional" pragmatic component that cannot be deduced from the sum of the partial meanings of their constituents. Steyer (2013) grounds her theory in corpus statistics and sets out several parameters for assessing the degree of fixation of the usual combinations under study:
• The degree of fixation of the preposition and the noun.
• The degree of possible saturation between preposition and noun.
• The interaction between form and meaning in the CU.
• The absence of a determiner between preposition and noun.
Broadly speaking, Steyer's (2013) theory is built on phraseological units that have long been located on the periphery of German phraseology and have therefore not been treated systematically: patterns, schemata and templates (Muster, Schemata, Schablonen). Two earlier terms are relevant here. Häusermann (1977: 30) coined the term Modellbildung. Fleischer (1997: 130) proposed Phraseoschablone, which refers to schemata that carry a fixed semantic interpretation and are subject to a kind of "syntactic idiomaticity". Some of the examples Fleischer cites follow the schema "X ist X" (e.g. sicher ist sicher, Urlaub ist Urlaub, geschenkt ist geschenkt). Along the same lines, Burger (2015: 45) and Dobrovol’skij (2011) favour the term Schema. It denotes a syntactic schema with irregular semantics whose slots are filled by free lexical components that are nevertheless subject to certain semantic restrictions: [PRON (der/die) und X (infinitive)], as in der und singen!, der und diktieren! ('him, singing!', 'him, imposing himself!'), among others.
3. Corpus linguistics
The concept of corpus linguistics is inseparably associated with computational linguistics (or language informatics). According to Chantal Pérez & Antonio Moreno (2009: 68), computational linguistics is "an interdisciplinary scientific field, linked to linguistics and computer science, whose fundamental aim is to build computational models that reproduce different aspects of human language and facilitate the computer processing of languages".
The arrival of new technologies has greatly enriched our overall view of language. The many applications of corpus linguistics include the following:
• the study of word frequencies within a given semantic field;
• the construction of linguistic models (generalised phrase structure grammar);
• the description of different levels of language (syntax, semantics, pragmatics);
• speech technologies (speech recognition, speech synthesis);
• machine or computer-assisted translation.
Corpus linguistics can also be applied to phraseology when the object of study is the collocational behaviour of words, that is, the co-occurrence frequency (simultaneous appearance) of several words. Among other things, frequencies can be extracted and sorted alphabetically, and statistical indices can be computed for the words that appear to the right and to the left of the node.
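The extraction of words to the left and right of a node, sorted alphabetically or by frequency, can be sketched as follows. `window_collocates` is a hypothetical helper, not a Sketch Engine function. It assumes a pre-tokenised text and counts every token within `span` positions of each occurrence of the node, with no association statistic applied.

```python
from collections import Counter
from typing import List, Tuple


def window_collocates(tokens: List[str], node: Tuple[str, ...],
                      span: int = 5) -> Tuple[Counter, Counter]:
    """Return (left, right) co-occurrence counts for tokens within `span`
    positions of each occurrence of a (multi-word) node."""
    n = len(node)
    left: Counter = Counter()
    right: Counter = Counter()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == node:
            left.update(tokens[max(0, i - span):i])
            right.update(tokens[i + n:i + n + span])
    return left, right


tokens = "sie dankte unter Tränen und sie weinte unter Tränen".split()
left, right = window_collocates(tokens, ("unter", "Tränen"), span=2)
print(sorted(left.items()))   # left collocates in alphabetical order
print(right.most_common())    # right collocates by descending frequency
```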
Beyond the computational strand, the work of Sinclair (1991: 170)² deserves mention. For Sinclair, the co-occurrence frequency of the units that make up a collocation or word combination is essential, in line with his idiom principle. According to this principle, speakers use some words more frequently than others, which gives rise to "semi-preconstructed" lexical combinations. Lexical co-occurrences can also reveal the collocational profile, colligational patterns or semantic preferences of a word.
In lexicography, electronic corpora have become practically indispensable because they provide relevant information from a pragmatic point of view (Sánchez & Almela, 2010: 5). The meaning of a word is its use in context, an idea that Wittgenstein formulated more than six decades ago in Philosophische Untersuchungen (1953): "Die Bedeutung eines Wortes ist sein Gebrauch in der Sprache" ('the meaning of a word is its use in the language').
² Sinclair uses the notions of node, collocates and span (the stretch of text considered) to clarify what is meant by collocation. The distance between two or more words is particularly relevant.
4. Analysis of the CU with the structure [UNTER + S]/[ENTRE + S]: the case of unter Tränen and entre lágrimas
Drawing on searches in the Sketch Engine corpora deTenTen and eseuTenTen11, we aim to show the combinatorial behaviour of the co-text of the usual combination entre lágrimas and its German equivalent unter Tränen. Our searches take into account whether or not the CU appears at the beginning of the sentence. We observed that the German CU occurs more often than the Spanish one (unter Tränen: 10,299 occurrences; entre lágrimas: 710), which is related to syntactic properties intrinsic to each language.
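The German and Spanish counts come from different corpora, so one common way to compare them is to normalise each raw frequency per million tokens. The minimal sketch below assumes that the token counts of both corpora are known. They are not given in this article, so no corpus sizes are filled in here. The functions `per_million` and `frequency_ratio` are hypothetical helpers.

```python
def per_million(freq: int, corpus_tokens: int) -> float:
    """Relative frequency per million tokens: freq / corpus_tokens * 10**6."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return freq * 1_000_000 / corpus_tokens


def frequency_ratio(freq_a: int, size_a: int, freq_b: int, size_b: int) -> float:
    """How many times more frequent item A is than item B once corpus size is factored out."""
    return per_million(freq_a, size_a) / per_million(freq_b, size_b)
```

Usage would be `frequency_ratio(10299, size_de, 710, size_es)`, with `size_de` and `size_es` set to the actual token counts of the German and Spanish corpora consulted.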
As noted above, our project addresses both external fixation [X entre lágrimas X] and internal fixation [entre X lágrimas], whose slot X is usually filled by adjectives. A high frequency of particular verb or noun collocates can give rise to patterns or syntactic constructions with a more or less idiomatic meaning (Phraseoschablonen).
Of the five senses that the DRAE records for the lexeme lágrima, the first is the one relevant to our study: (1) "Each of the drops secreted by the lacrimal gland (used more in the plural)". For its part, the Digitales Wörterbuch der deutschen Sprache (DWDS) records one sense for the lexeme Träne: (1) "von den Tränendrüsen im Auge abgesonderte, klare Flüssigkeit" ('clear fluid secreted by the lacrimal glands of the eye').
In the CU unter Schmerzen, the preposition unter denotes a Begleitumstand ('accompanying circumstance'; Helbig & Buscha, 1996: 439), that is, a circumstance (with modal value) that goes hand in hand with the main action. The following examples, taken from Helbig & Buscha (1996: 439), illustrate this:
(1) Unter großem Beifall wurde der Redner vorgestellt. ('The speaker was introduced to loud applause.')
(2) Unter Jubel und Gelächter fiel der Vorhang. ('The curtain fell amid cheering and laughter.')
In this respect, the work of Tibor Kiss et al. (2014) deserves mention: it treats prepositions in detail from different perspectives (syntactic, pragmatic and semantic). For the preposition unter, the authors present eleven senses. The third is the one that best fits the meaning of unter in the CU unter Tränen (3b. Begleitumstand/2 Vorgänge, 'accompanying circumstance/two processes'):
Accompanying circumstances comprise both external accompanying circumstances (such as applause) and feelings and states of mind that accompany an action. This meaning occurs only when events, actions or states are modified. The reading is often triggered by inanimate or absent agents. The semantics involves two identifiable processes, one of which accompanies the other. The second process concerns the circumstances of the action, which are often not brought about intentionally (Kiss et al., 2014: 188).
As for Spanish, the preposition entre has a locative meaning. Etymologically, it derives from inter 'inside something discontinuous' (Martínez García, 2012: 25), which in turn comes from the prepositional phrase intro usque ('as far as inside'). In Classical Latin, the accusative preposition inter expressed both a locative value (inter multitudinem 'in the midst of the crowd') and a temporal one (inter noctem 'during the night'). In the view of Cabezas Holgado (2013: 17), the lexical properties of the predicative head entre can be reduced to two: a locative value and a collective value.
A large group of CU follows the schema [unter/entre + S (plural: feelings, bodily movements, cries, parts of discourse)], for example entre protestas – unter Protesten; entre gemidos – unter Seufzern; entre lágrimas – unter Tränen; entre risas – unter Gelächter, etc. Semantically, the nouns that fill slot S belong to the domains of verbal communication or bodily expression (Mellado Blanco & López Meirama, in press).
In the CU entre lágrimas, the preposition entre expresses two values: a temporal value of simultaneity and a modal value. A state unfolds in parallel with the action expressed by the main verb. The simultaneity of the two events stems from the semantic nature of the nouns that accompany the preposition, most of which are deverbal (entre llantos, sollozos, risas, etc.). Starting from the meaning of the noun, we can thus, to some extent, arrive at the actual meaning of the preposition:
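(3) Unter Tränen sagt er in einem Interview: "Ich wollte immer nur ein ganz normaler Junge sein. Aber das Schicksal hat es anders gewollt." [http://casting.mattschiibe.ch/2009/06/]
('In tears, he says in an interview: "All I ever wanted was to be a completely normal boy. But fate decided otherwise."')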
As example (3) shows, the CU unter Tränen usually corresponds to a gerund in Spanish (unter Tränen sagt er – dice llorando). Nouns with nominalising suffixes, which morphologically reveal their relationship with a verb, are included among deverbal nouns even when they do not describe a process sensu stricto: e.g. llanto, lloriqueo, suspiro, aullido, etc. Similar cases are found in other Spanish CU with deverbal nouns. For example, entre sollozos is, depending on the context, synonymous with sollozando ('sobbing'). The nouns llanto, lágrima and suspiro therefore denote the unfolding of an event semantically and reveal an argument structure syntactically. In the light of example (3), the preposition unter is no longer a mere functional link, since it is closely bound to the noun Träne.
4.1. External and internal fixation: [X entre lágrimas X] and [X unter Tränen X]; [entre X lágrimas X] and [unter X Tränen]
In terms of combinatorics, or external fixation, both CU show a clear preference for verb collocates that are verbs of communication (verba dicendi). Since these CU contain an emotional (stative) component, it is worth noting that entre lágrimas and unter Tränen co-occur with verbs that express social relations or emotions.
Table 1 lists the 25 most frequent collocates occurring within a span of five words to the left and right of the CU unter Tränen. They include both nouns and verbs, the latter in various inflected forms. The data suggest that the top-ranked verbs by logDice, the statistical index we consider most reliable for lexicographic purposes, are directive or exhortative verbs such as bitten and flehen, which have a high frequency of occurrence. In Spanish, we found no exhortative verbs equivalent to flehen or anflehen (suplicar, implorar). Among many other things, the combinatorics of the verb collocates show which contexts may predominate. When flehen and bitten combine with unter Tränen, there is a clear tendency towards the elevated register typical of literary or religious contexts:
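(4) Turribius erschrak sehr, als er davon hörte, er weigerte sich, diese Würde anzunehmen und flehte unter Tränen zu Gott, ihm diese Last weg zu nehmen. [http://www.heiligenlegenden.de/monate/maerz/23/turribius/home.html]
('Turribius was greatly alarmed when he heard of it; he refused to accept this dignity and implored God in tears to take this burden from him.')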
Table 1. Sample collocates of the CU unter Tränen
Example (4) shows that the addressee of the plea is Gott (zu X flehen, X = Gott/Herrn/Jesus).
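The logDice index used to rank Table 1 is an association measure proposed by Rychlý (2008) and available in Sketch Engine. It is defined as logDice = 14 + log2(2·f(x,y) / (f(x) + f(y))), where f(x,y) is the co-occurrence frequency of node and collocate and f(x), f(y) are their individual frequencies. The Python sketch below reproduces only this formula. It does not model how Sketch Engine counts f(x,y) (window versus grammatical relation), and the frequencies in the example are invented toy values, not corpus data.

```python
import math
from typing import Dict, List, Tuple


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """logDice = 14 + log2(2 * f_xy / (f_x + f_y)); the theoretical maximum is 14."""
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("frequencies must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


def rank_by_log_dice(cooc: Dict[str, int], node_freq: int,
                     word_freq: Dict[str, int]) -> List[Tuple[str, float]]:
    """Rank candidate collocates of a node by logDice, highest first."""
    scores = {w: log_dice(f, node_freq, word_freq[w]) for w, f in cooc.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy frequencies: a rare verb that often co-occurs with the node
# outranks a very frequent verb with the same raw co-occurrence count.
ranking = rank_by_log_dice({"flehen": 5, "sagen": 5}, node_freq=10,
                           word_freq={"flehen": 10, "sagen": 1000})
print(ranking)
```

Because logDice depends only on the three frequencies, and not on corpus size, scores are comparable across corpora of different sizes. This is one reason the measure is popular in lexicography.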
Verba dicendi form an equally large group, particularly German verbs such as beichten, berichten, sich entschuldigen, erzählen and gestehen. Gestehen is the most significant verb in the group by co-occurrence frequency. Its meaning is also close to that of unter Tränen, since admitting guilt (gestehen) is a moment of great emotional intensity (unter Tränen).
Table 2. Sample collocates of the CU entre lágrimas
As Table 2 shows, the most representative cases fall into three groups:
• verba dicendi (repetir, confesar, pedir);
• verbs of social interaction (agradecer);
• verbs of contact (abrazar).
Both languages also show a recurrent group of verbs expressing social relations: sich verabschieden and Abschied nehmen in German, and despedirse or decir adiós in Spanish. These mostly describe retirement in a sporting context:
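(5) Dabei verabschiedete er sich unter Tränen von den fast 24.000 Zuschauern im Arthur Ashe Stadium in New York. [http://www.whoswho.de/bio/andre-agassi.html]
('In doing so, he bade a tearful farewell to the almost 24,000 spectators in the Arthur Ashe Stadium in New York.')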
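(6) pero que sirvió como perfecto colofón a la carrera de Joseba Etxeberría, quién se despidió de San Mamés entre lágrimas con más de 400 partidos como bagaje. [http://agendapolitica.es/deportes/elgetafe-regresa-a-europa-por-la-puerta-grande.html]
('but which served as the perfect finishing touch to the career of Joseba Etxeberría, who said goodbye to San Mamés in tears with more than 400 matches behind him.')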
Spanish and German differ in how they use the verb collocates abrazar and sich umarmen. In Spanish, abrazar frequently appears in the present tense, third person plural, with reciprocal meaning, whereas German favours the past tense, also in the third person plural. In the corpus, the lemmas and their inflected forms reveal different strategies for involving the interlocutor. With the present indicative in the third person plural (sie umarmen sich, se abrazan), the verb lemmas have a descriptive function.
Before turning to internal fixation, we briefly describe the rightward expansion of the CU [unter Tränen X] and [entre lágrimas X]. The lists of collocates to the right of the CU contain some interesting elements (unter Tränen des/der/und and entre lágrimas de/y). The expansion [unter Tränen NP(genitive)], which modifies the lexeme Träne, mostly involves nouns from the domain of emotions (Rührung, Mitleid). The Spanish sequence [entre lágrimas NP(genitive)], by contrast, shows nouns of more varied semantic types (rabia, tristeza, emoción). The corpus analysis also shows that some noun collocates have a marked tendency towards rightward expansion: entre lágrimas y sollozos, entre lágrimas y abrazos, entre lágrimas y aplausos. In German, the most prototypical nouns reinforce the emotional component of crying because they are synonyms of Tränen (unter Tränen und Bluttränen, unter Tränen und Schluchzen, unter Tränen und Seufzern). As a result, some of these binomials show a certain degree of lexicalisation (Mellado Blanco, 2015).
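(7) Wenn die Kinder dieser Welt meine Worte nicht annehmen und verachten – die ich unter Tränen und Bluttränen laut herausschreie – weiterhin den Vergnügungen und dem Müßiggang frönen, wird der Verfall wie ein Dieb in der Nacht hereinbrechen. [http://www.kommherrjesus.de/index.php]
('If the children of this world do not accept my words and despise them (words that I cry out aloud amid tears and tears of blood), and go on indulging in pleasure and idleness, ruin will break in like a thief in the night.')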
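A simple pattern query can count the rightward binomial expansions [unter Tränen und X] and [entre lágrimas y X]. The hypothetical `binomial_expansions` function below works on raw text with a regular expression. It does not lemmatise, so inflected variants (Seufzern/Seufzer) are counted separately.

```python
import re
from collections import Counter


def _phrase(words: str) -> str:
    """Regex for a multi-word string, tolerating any whitespace between words."""
    return r"\s+".join(re.escape(w) for w in words.split())


def binomial_expansions(text: str, head: str, conj: str) -> Counter:
    """Count the word X in 'head conj X', e.g. 'unter Tränen und X'."""
    pattern = re.compile(rf"\b{_phrase(head)}\s+{re.escape(conj)}\s+(\w+)",
                         re.IGNORECASE)
    return Counter(m.group(1).lower() for m in pattern.finditer(text))


text_de = ("Sie schrie unter Tränen und Schluchzen. Er bat unter Tränen und Seufzern. "
           "Unter Tränen und Schluchzen ging sie.")
print(binomial_expansions(text_de, "unter Tränen", "und"))
print(binomial_expansions("Se despidió entre lágrimas y abrazos.", "entre lágrimas", "y"))
```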
We must also bear in mind that languages differ in their morphology. In German, where compounding is much more frequent than in Spanish, the equivalent of lágrimas de emoción or lágrimas de cocodrilo is a compound written as a single orthographic word (Gefühlstränen, Krokodilstränen). This is partly because the German lexicon is more prone to compounds of this kind. Spanish, by contrast, expands to the right with a noun complement (lágrimas de N, N = emoción/cocodrilo/alegría). The difference lies in the lexical morphology of each language. German prefers compounds written as one orthographic word (Lachtränen), whereas Spanish compound formations are not written together (lágrimas de emoción). This feature is crucial when working with corpora and querying the frequencies of word groups, as we will see below in relation to internal fixation.
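Two hypothetical helpers can illustrate the practical consequence for corpus queries. A search for the token Tränen does not retrieve one-word German compounds such as Lachtränen or Krokodilstränen, so these have to be collected with a suffix pattern. The Spanish equivalents, by contrast, remain multi-word sequences [lágrimas de N]. Both regular expressions are simplifications that assume raw, untagged text.

```python
import re
from collections import Counter


def german_tear_compounds(text: str) -> Counter:
    """One-word compounds headed by -tränen (e.g. Lachtränen). The bare noun
    Tränen is not matched: at least one letter must precede 'tränen'."""
    return Counter(m.group(0)
                   for m in re.finditer(r"\b\w+tränen\b", text, re.IGNORECASE))


def spanish_de_complements(text: str) -> Counter:
    """Fillers N of the multi-word sequence [lágrimas de N]."""
    return Counter(m.group(1).lower()
                   for m in re.finditer(r"\blágrimas\s+de\s+(\w+)", text, re.IGNORECASE))


print(german_tear_compounds("Er vergoss Lachtränen, sie Krokodilstränen und echte Tränen."))
print(spanish_de_complements("lágrimas de cocodrilo y lágrimas de emoción"))
```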
In German, we observed that the compound Krokodilstränen is used as a CU with both mit and unter (unter Krokodilstränen, mit Krokodilstränen). Interestingly, Spanish shows no instance of lágrimas de cocodrilo with the preposition entre, only with con. This rightward-expanded CU frequently appears in negative imperative sentences (no me vengas con lágrimas de cocodrilo) or in affirmative sentences:
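(8) Lo que me parece patética es la actitud de algunos, que vienen con lágrimas de cocodrilo diciendo que se están cargando el ciclismo, sois vosotros los que os lo habéis cargado, pandilla de golfos y maleantes, iros a llorar con vuestra p... madre. [http://blogs.abc.es/]
('What I find pathetic is the attitude of some people who come along with crocodile tears saying that cycling is being destroyed; you are the ones who have destroyed it, you bunch of crooks and thugs, go and cry to your p... mother.')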
Lágrimas de risa is frequent in German with the preposition unter (unter Lachtränen), whereas Spanish favours the combination with con (con lágrimas de risa). We found only one instance of entre lágrimas de risa in the corpus.
As regards internal fixation [unter + X + Tränen] and [entre + X + lágrimas], the corpus analysis shows that internal expansion is less frequent in Spanish than in German. The reason is that Spanish adjectives tend to occur postnominally, yielding sequences such as lágrimas amargas. At this point it is essential to stress the position of adjectives in the two languages. The attributive (prenominal) position is much stronger in German than in Spanish, because in Spanish the meaning of an adjective can change with its position (un pobre hombre 'a pitiable man' vs. un hombre pobre 'a penniless man'). Seco (1975: 5) notes that "the literal sense is always preserved in the postposed adjective, whereas in the preposed one it is more or less distorted". This brief digression helps explain why, in sequences of the type [PREP + X + S], the internal slot is, as a rule, filled more often in German than in Spanish. One way of gauging how fixed a CU is in discourse is to observe how such sequences behave with respect to internal fixation: the higher the saturation of the internal slots, the lower the fixation, and therefore the lower the degree of lexicalisation. A case in point is de repente, whose internal slot X [de X repente] is, according to the corpora consulted, practically never filled.
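The relation stated above suggests a simple operationalisation: the more often the internal slot is filled, the lower the fixation, so one can compute the share of [PREP + X + S] occurrences whose slot X is filled. The hypothetical `slot_saturation` function below is an illustrative measure, not one defined in this article or by Steyer (2013). It takes a mapping from slot content to frequency, where the empty string stands for the bare sequence [PREP + S]. The example counts are invented.

```python
from collections import Counter
from typing import Mapping


def slot_saturation(fillers: Mapping[str, int]) -> float:
    """Share of [PREP + X + S] hits whose internal slot X is filled.

    `fillers` maps slot content to frequency; "" is the empty slot (bare PREP + S).
    Returns 0.0 when there are no hits at all.
    """
    total = sum(fillers.values())
    if total == 0:
        return 0.0
    return (total - fillers.get("", 0)) / total


# Invented counts: a never-filled slot (as reported for [de X repente]) scores 0.0,
# while a pattern with adjectival fillers scores higher.
print(slot_saturation(Counter({"": 1000})))
print(slot_saturation(Counter({"": 3, "bitteren": 1})))
```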
The most frequent adjectival fillers in German fall into three groups. Some stress the intensity of crying (unter vielen Tränen). Others establish a cognitive link with the concept of water (unter strömenden Tränen) or with taste sensations (unter bitteren Tränen). One example resembles unter Tränen und Bluttränen, mentioned above: unter blutigen Tränen again intensifies the act of crying. It occurs mostly in literary contexts and has a clearly figurative meaning:
(9) Die Wirtin wurde immer aufmerksamer, als er aber daran kam, wie er die schöne Jungfrau aus dem Loche erlöst und sich mit ihr verlobt hatte, da schloss sie ihn in ihre Arme und rief unter blutigen Tränen: [http://internet-maerchen.de/]
('The landlady grew more and more attentive, but when he came to how he had freed the beautiful maiden from the hole and become engaged to her, she clasped him in her arms and cried out, weeping tears of blood:')
The CU unter heißen Tränen appears in elevated registers together with verbs in past tenses. Like unter blutigen Tränen, it involves a metaphorical shift:
Table 3. Sample internal slots of the CU unter X Tränen
Most of the internal slot fillers of the CU unter X Tránen intensify the lexeme Träne (unter heißen/bitteren/strömenden/heftigen Tränen). In Spanish, by contrast, the internal (adjectival) slots of the CU entre X lágrimas are barely filled. This is linked to the equivalence between entre lágrimas and the gerund llorando, which is why the corpus yields hardly any internal slot fillers with modal and temporal value. When a determiner (las, mis, tus) is inserted, the preposition takes on a locative or distributive meaning:
(10) Pero el dios sigue susurrando entre las lágrimas. [http://avisos.realbiblioteca.es/?p=article&aviso=43&art=879&lang=es]
('But the god keeps whispering among the tears.')
As example (10) shows, determiners play a key role in cancelling the idiomatic or phraseological meaning of entre lágrimas. The modal and temporal values give way to the locative or distributive value, or to verbs that govern a prepositional complement (escoger entre las lágrimas o la risa). When the slot contains several nouns, Spanish coordinates them with a conjunction (entre sonrisas y lágrimas, entre aplausos y lágrimas, entre risas y lágrimas, entre un mar de lágrimas). In many cases, the two nouns form an antithesis or contrast (sonrisa y lágrima, risa y lágrima) that softens the negative effect or feeling of crying (lágrima).
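An intervening determiner blocks the idiomatic reading. A first-pass filter can therefore separate bare [entre + lágrimas] from [entre + DET + lágrimas] before any semantic analysis. The hypothetical `classify_entre` function and its small determiner list are assumptions for illustration. The list is not exhaustive, and only one intervening token is considered, so coordinated sequences such as entre sonrisas y lágrimas are not captured.

```python
import re
from typing import List, Tuple

# Illustrative, non-exhaustive list of Spanish plural determiners (an assumption).
DETERMINERS = {"las", "los", "mis", "tus", "sus", "unas", "unos", "estas", "esas", "aquellas"}


def classify_entre(text: str, noun: str = "lágrimas") -> List[Tuple[str, str]]:
    """Label each 'entre (X) noun' hit as 'bare' (candidate for the modal/temporal
    reading), 'determiner' (locative/distributive reading expected) or 'other'."""
    pattern = re.compile(rf"\bentre\s+(?:(\w+)\s+)?{re.escape(noun)}\b", re.IGNORECASE)
    labels: List[Tuple[str, str]] = []
    for m in pattern.finditer(text):
        slot = (m.group(1) or "").lower()
        if not slot:
            labels.append(("bare", m.group(0)))
        elif slot in DETERMINERS:
            labels.append(("determiner", m.group(0)))
        else:
            labels.append(("other", m.group(0)))
    return labels


print(classify_entre("Lo dijo entre lágrimas. Pero el dios sigue susurrando entre las lágrimas."))
```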
5. Final remarks
Corpora give linguists access to large volumes of data drawn from real discourse contexts, as opposed to data that derive from the linguist's intuitive and introspective judgement. Working with corpora is not without difficulties, however. One limitation of our study is that the texts in the corpora we used come mostly from journalistic language. Another has been the difficulty of obtaining usage information at the supra-sentential level. This makes it hard to identify aspects of the wider context that provide valuable information about the immediate co-text of the CU under study. Nonetheless, linguistic corpora clearly provide revealing (statistical) data grounded in empirical evidence and allow a more exhaustive analysis in the field of phraseology.
As regards the combinatorics or external fixation of the CU entre lágrimas and unter Tränen, there is a clear preference for verb collocates that are verba dicendi and verbs of contact. In Spanish, entre lágrimas is closely linked to the gerund of llorar (llorando), whereas German makes up for the lack of a gerund by expressing the modal value. The strong emotional and expressive component in the pragmatic meaning of the binomials entre lágrimas y sollozos and unter Tränen und Bluttränen is also striking. Finally, consider the slots (X) in the internal fixation patterns [entre X lágrimas] and [unter X Tränen]. Because Spanish adjectives tend to occur postnominally, slot X is less saturated in Spanish than in German (unter bitteren Tränen – entre lágrimas amargas).
6. Bibliography
Burger, Harald. 2010. Phraseologie: eine Einführung am Beispiel des Deutschen. Berlin: Erich Schmidt Verlag.
Cabezas Holgado, Emilio. 2013. La predicación: Las construcciones en abanico. Aplicaciones al español. http://eprints.ucm.es/22365/1/T34644.pdf [Accessed 27/03/2017].
Dobrovol’skij, Dmitrij. 2011. Phraseologie und Konstruktionsgrammatik. In Lasch, Alexander & Ziem, Alexander (eds.) Konstruktionsgrammatik III. Aktuelle Fragen und Lösungsansätze. Tübingen: Stauffenburg, 111-130.
Fleischer, Wolfgang. 1997. Phraseologie der deutschen Gegenwartssprache.
Tübingen: De Gruyter.
Häusermann, Jürg. 1977. Phraseologie. Hauptprobleme der deutschen Phraseologie auf der Basis sowjetischer Forschungsergebnisse. (=Linguistische Arbeiten 47). Tübingen: Max Niemeyer.
Helbig, Gerhard & Buscha, Joachim. 1996. Deutsche Grammatik. Ein Handbuch für den Ausländerunterricht. Leipzig: Langenscheidt Verlag Enzyklopädie.
Kiss, Tibor; Müller, Antje; Roch, Claudia; Stadtfeld, Tobias; Börner, Katharina & Duzyein, Monika (eds.). 2014. Handbuch für die Bestimmung und Annotation von Präpositionsbedeutungen im Deutschen (Bochumer Linguistische Arbeiten 14). Bochum: Sprachwissenschaftliches Institut, Ruhr-Universität Bochum.
Martínez García, Hortensia. 2012. Viejos y nuevos valores de las preposiciones españolas. Verba 39: 7-34.
Mellado Blanco, Carmen. 2015. Phrasem-Konstruktionen und lexikalische Idiom Varianten: der Fall der komparativen Phraseme des Deutschen. In Engelberg, Stefan; Meliss, Meike; Proost, Kristel & Winkler, Edeltraud (eds.) Argumentstruktur – Valenz – Konstruktionen. Tübingen: Narr, 217-235.
Mellado Blanco, Carmen & López Meirama, Belén. In press. Esquemas sintácticos de preposición + sustantivo: el caso de [entre + S(plural/corporal)]. In Mellado Blanco, Carmen; Berty, Katrin & Olza, Inés (eds.) Discurso repetido y fraseología textual (español y español-alemán). Frankfurt am Main: Vervuert / Iberoamericana.
Pérez Hernández, Chantal & Moreno Ortiz, Antonio. 2009. Lingüística Computacional y Lingüística de Corpus. In Rodríguez Ortega, Nuria (ed.) Teoría y literatura artística en la sociedad digital: construcción y aplicabilidad de colecciones textuales informatizadas. Gijón: Trea, 67-96.
Sánchez, Aquilino & Almela, Moisés. 2010. A Mosaic of Corpus Linguistics.
Frankfurt am Main: Peter Lang.
Seco, Manuel. 1975. Manual de gramática española. Madrid: Alfaguara.
Sinclair, John. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Steyer, Kathrin. 2013. Usuelle Wortverbindungen. Zentrale Muster des Sprachgebrauchs aus korpusanalytischer Sicht. Tübingen: Narr.
Wittgenstein, Ludwig. 1953. Philosophische Untersuchungen. Oxford: Blackwell.
Assessing EPAP lexical features: A corpus-based study
Análisis de los rasgos léxicos de IFE: un estudio de corpus (Analysis of the lexical features of EPAP: a corpus study)
María José Marín (a) & Camino Rea Rizzo (b)
(a) Universidad de Murcia. [email protected]
(b) Universidad Politécnica de Cartagena. [email protected]
Received: 10/02/2017. Accepted: 12/09/2017
Resumen (Summary): The features of specialised languages have been described extensively in the specialised literature. Enrique Alcaraz’s work stands out among others for its exhaustive and meticulous description of EPAP (IFE) at all levels: lexical, syntactic, semantic and pragmatic. This study aims to verify that description from a perspective based on the analysis of two corpora of legal and telecommunications English. The results corroborate what Alcaraz had already observed regarding the use of specialised terms, the relevance of subtechnical vocabulary, the peculiarities of Latin terms in legal English and the significant presence of abbreviations in telecommunications English.
Palabras clave (Keywords): EPAP (IFE); legal English; telecommunications English; corpus linguistics.
Abstract: The features of specialised languages have been extensively described by
scholars in the literature. Amongst them, Enrique Alcaraz’s work stands out as an exhaustive and comprehensive description of EPAP at all linguistic levels: lexical, syntactic, semantic and pragmatic. This research aims to provide a bottom-up assessment
of his description on a lexical level through the implementation of corpus-based techniques on two specialised corpora of legal and telecommunications English. The results
support Alcaraz’s portrayal as regards term usage, the relevance of sub-technical vocabulary, the peculiarities of Latin single and multi-word terms in legal English and the
significant presence and usage of abbreviations in telecommunications English.
Keywords: EPAP; ESP; Corpus Linguistics; Legal English; Telecommunications English.
Marín, María José & Rea Rizzo, Camino. 2017. “Assessing EPAP lexical features: A
corpus-based study”. Quaderns de Filologia: Estudis Lingüístics 22: 165-186.
doi: 10.7203/qf.22.11307
1. Introduction
Specialised languages have traditionally been regarded as functional varieties or registers (Biber, 1988; Halliday, 1988), defined in terms of how often particular linguistic features recur in comparison with general language or other registers. Cabré (1993) considers special languages a set of sub-codes of general language which are characterised by their own special features and pragmatically determined by the variables of topic, user and communicative act.
Focusing on the definition of the language of science and technology, Sager et al. (1980) provide a comprehensive description of specialised languages. Their definition, use and function are synthesised as follows: “Special languages are semi-autonomous, complex systems based on and derived from general languages; their use presupposes special education and is restricted to communication among specialists in the same or closely related fields” (1980: 69). Similarly, Tiersma (1999) asserts that law practitioners depend upon language in their profession; according to this author, the special features of their jargon undeniably reveal their membership of the same community.
Alcaraz’s (2000) definition is in line with all of the above, as he states that so-called special languages refer to the specific language that professionals and specialists use in order to transmit information and negotiate terms, concepts and knowledge in a particular field. In El inglés profesional y académico (2000), Alcaraz describes the most relevant features of English for Professional and Academic Purposes (EPAP), a term he coins to refer to the specialised language which professionals and specialists employ to communicate. EPAP embraces many different branches or varieties associated with different professional or scientific fields such as medicine, law, engineering or business, amongst many others.
This research was conceived as an appraisal of Alcaraz’s fundamental work through the analysis of the lexicon of two specialised corpora, TC (Telematics Corpus: 1.2 million words) (Rea, 2008) and UKSCC (United Kingdom Supreme Court Corpus: 2.6 million words) (Marín, 2014; Marín & Rea, 2012a), in search of linguistic evidence supporting some of the most relevant characteristics which scholars (Mellinkoff, 1963; Tiersma, 1999; Sager et al., 1980; Alcaraz, 2000, 2002) have portrayed in the literature. The reason for singling out such different EPAP varieties as legal and telecommunications English was related to the major objective of this research, that is, to provide a bottom-up characterisation of specialised lexicons based on the general portrayals provided by scholars in the literature, specifically Enrique Alcaraz’s (2000; 2002). In principle, one would expect legal and telecommunications English terminology to differ considerably owing to their very nature and origins: the former belongs to the field of humanities and social sciences and shows Latin and French influence (often being archaic and redundant) (Mellinkoff, 1963; Tiersma, 1999; Alcaraz, 2000, 2002), whereas the latter comes from the realm of engineering and science and is highly specific and accurate. However, with regard to the statistical data associated with these lexical units, our main hypothesis was that both technical and subtechnical terms would behave similarly in both EPAP varieties, confirming the general descriptions made by scholars.
Owing to the size of both corpora and, above all, to our wish to carry out a fully automatic analysis with the aim of processing as much data as possible, only some of the features described by Alcaraz were considered in this appraisal, namely: the ratio and distribution of highly specialised terms in both corpora; the relevance of subtechnical vocabulary; the use of Latin words and phrases in the legal corpus; and the presence and significance of abbreviations and acronyms in the telecommunications corpus.
2. Literature review
Following from the above, this research concentrates on four major lexical features of EPAP which have been assessed applying a bottom-up
corpus-based methodology, that is, by observing the statistical behaviour of the lexicon found in two specialised corpora, TC and UKSCC.
The literature devoted to the study of such features highlights the usage of specialised terminology as one of the most noteworthy aspects of EPAP, as regards both its frequency of use and its distribution across text collections. Specialised terms could be defined as conceptual vehicles employed to transmit specialised knowledge amongst scientists, researchers or professionals in all specialised areas, hence their relevance in EPAP. As Cabré (2000: 62) puts it, terms are “form and content units which, used in different discursive conditions, acquire a specialised value”. According to Alcaraz (2000), terms tend to be univocal, and understanding them is key to a proper comprehension of specialised texts, both oral and written. In other words, terms encapsulate specific concepts and must be understood and mastered by specialists; otherwise, communication will fail.
Still within the lexical level, Alcaraz (2000) underlines the significance of semitechnical or subtechnical vocabulary as another relevant feature of EPAP. Subtechnical vocabulary is defined as those lexical units present in general language which acquire one or several specific meanings within a field of knowledge (Alcaraz, 2000: 43). In addition, subtechnical vocabulary is also understood as a collection of general words which are shared by both general and specialised fields without any change in meaning. Numerous authors have approached this question and defined subtechnical terms from different angles (Cowan, 1974; Baker, 1988; Flowerdew, 2001; Chung & Nation, 2003; Wang & Nation, 2004), agreeing on their ambiguous character and on the difficulties that such obscurity causes EPAP learners. In order to delimit the concept clearly, Marín (2016) attempts to define it taking into consideration both qualitative and quantitative criteria.
Another relevant feature of EPAP, specifically of legal English, is the strong influence of Latin on its terminology, something that does not happen in telecommunications English. Although common law bears almost no resemblance to Roman law (on which civil/continental law systems are based), the presence of Latin in its terminology is more than merely anecdotal. Alcaraz (2000: 78) distinguishes between pure Latin borrowings like obiter dictum or ratio decidendi, which were imported directly from Latin without being adapted into English, and cognates such as exonerate or presumption, which reflect English orthography although their meaning and form remain closely linked to their etymological origin. In the present research we concentrate on the former and attempt to support these observations with evidence obtained from UKSCC, our legal corpus.
Within EPAP, the area of science, technology and computing is also characterised by the constant creation of new lexical units using the linguistic resources of the corresponding language (Alcaraz, 2000: 50). The creation of new words responds to the need to give each concept a unique name. According to Sager et al. (1980) and Alcaraz (2000), the principal method of designation in general, and even more so in special reference, is the modification of existing resources by means of concatenative processes, which add some morphological material to a given form, namely, derivation and compounding. Nevertheless, there are also word formation processes that do not follow the principle of concatenation, so that new items are formed by deleting linguistic material instead of adding it. Amongst them, abbreviation refers to any kind of word which has undergone a shortening process, that is, any compressed form in general. Abbreviation is an umbrella term which covers initials (also called initialisms), acronyms (also called letter words) and clippings (Sager et al., 1980; Alcaraz, 2000; Plag et al., 2007).
Despite the relevance of the features depicted above, to the best of
our knowledge, there are no corpus-based studies which can contribute
to a bottom-up characterisation of specialised lexicons, apart from the
ones carried out by Marín (2014; 2016), Rea (2008) and Marín & Rea
(2012a; 2012b; 2014), hence the need to develop further research along
these lines.
3. Methodology, results and discussion
Alcaraz’s (2000) work presents a comprehensive portrayal of the major
features of EPAP, which comprises the lexical, semantic, syntactic and
pragmatic levels of the language and focuses on the use of specialised
terminology, the features of major phrase types, the presence of polysemic words and metaphors in specialised texts or the communicative
dimension of these texts, amongst many others.
For practical reasons, and given that this study was intended to be carried out automatically, a selection of these features was made so as to concentrate on the lexicon of legal and telecommunications English by applying corpus linguistics techniques, that is, by adopting a bottom-up perspective for the analysis of the two corpora. The selected features are: the use of specialised terminology in EPAP; the relevance of subtechnical terms; the significance of Latin terms and phrases in legal English; and the use of compressed forms or abbreviations resulting from word formation processes in telecommunications English.
3.1. Specialised terminology in TC and UKSCC
Specialised terms can be mined automatically from large text collections like UKSCC or TC using Automatic Term Recognition (ATR) methods. There is a whole plethora of them, some of which were validated on both corpora (Marín & Rea, 2014). The methods selected for evaluation were: TF-IDF (term frequency-inverse document frequency) (Sparck Jones, 1972); TermoStat (Drouin, 2003); C-Value (Frantzi & Ananiadou, 1999) and Terminus (Nazar & Cabré, 2012). They were assessed by comparing the output lists of candidate terms produced by each method with two specialised glossaries of legal and telecommunications terms¹. The percentage of overlap between the two vocabulary inventories showed the precision achieved by each method and therefore led to the selection of the most efficient one.
Out of the four methods tested by Marín & Rea (2014), Terminus
(Nazar & Cabré, 2012) excelled in comparison with the other three,
managing to extract 71.5% true terms (terms which coincided with the
ones in the glossaries used as gold standard) from UKSCC and 60%
from TC on average. Precision was even higher for the top 200 candidate terms, reaching 84.5% for the former corpus and 69.5% for the
latter.
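The precision figures above come from checking each ranked list of candidate terms against a glossary used as gold standard. The Python sketch below illustrates that comparison, both over the whole list and over the top N candidates. The function name `precision`, the case-folding step and the toy word lists are illustrative assumptions; the sketch does not reproduce Terminus or the actual glossaries.

```python
def precision(candidates, gold, top_n=None):
    """Share of candidate terms that also appear in the gold-standard glossary.

    `candidates` is a ranked list (best candidate first); `top_n` restricts the
    evaluation to the first N candidates, e.g. 200 as in the study.
    """
    ranked = candidates[:top_n] if top_n is not None else candidates
    if not ranked:
        return 0.0
    gold_set = {g.lower() for g in gold}  # case-insensitive matching (assumption)
    hits = sum(1 for c in ranked if c.lower() in gold_set)
    return hits / len(ranked)


# Hypothetical toy data, not taken from UKSCC or the legal glossary.
candidates = ["appeal", "claimant", "the", "tort", "judgment"]
glossary = {"appeal", "claimant", "tort", "judgment", "estoppel"}
print(precision(candidates, glossary))           # 0.8
print(precision(candidates, glossary, top_n=3))  # about 0.667
```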
As regards the legal corpus, implementing Terminus as the selected ATR method yielded a list of 1,787 terms, which represented 6.6% of the total 27,060 types identified by WordSmith 5.0 (Scott, 2008)². These terms displayed an average frequency of 1,037 (that is, on average each term occurs 1,037 times in the corpus) and appeared in 27 texts on average (out of 193). Compared with the average distribution of all the word types in UKSCC (19.8 texts), excluding hapax and dis legomena³, the distribution of the specialised terms extracted by Terminus could be deemed considerably high, roughly 1.4 times that of all the types in the corpus. Not only were legal terms well distributed, but their frequency was also much higher than the average frequency of all the word types: 1,037 occurrences as opposed to 169.45 for all types, about six times lower (again, hapax and dis legomena were excluded from this count).
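The comparison above relies on three figures per corpus: the share of types extracted as terms, and the mean frequency and mean text distribution of all types once hapax and dis legomena are removed. A minimal Python sketch of these calculations follows, assuming a hypothetical table that maps each word type to its frequency and the number of texts it occurs in; the function names and toy data are illustrative, not taken from WordSmith or the study.

```python
from statistics import mean


def corpus_averages(type_table, min_freq=3):
    """Mean frequency and mean text distribution of the types in `type_table`.

    `type_table` maps each word type to (frequency, number_of_texts).
    Types below `min_freq` are skipped; with the default of 3 this leaves out
    hapax legomena (frequency 1) and dis legomena (frequency 2).
    """
    kept = [v for v in type_table.values() if v[0] >= min_freq]
    return mean(f for f, _ in kept), mean(n for _, n in kept)


def term_share(terms, type_table):
    """Percentage of all corpus types that were extracted as terms."""
    found = [t for t in set(terms) if t in type_table]
    return 100 * len(found) / len(type_table)


# Hypothetical toy data: type -> (frequency, number of texts containing it).
toy = {"appeal": (50, 10), "court": (30, 8), "estoppel": (4, 2),
       "obiter": (1, 1), "nil": (2, 1)}
print(corpus_averages(toy))                               # mean frequency 28, mean texts about 6.67
print(term_share(["appeal", "estoppel", "obiter"], toy))  # 60.0

# The UKSCC proportion reported above: 1,787 terms out of 27,060 types.
print(round(100 * 1787 / 27060, 1))                       # 6.6
```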
¹ The automatic validation of the lists was performed by resorting to two specialised electronic glossaries of legal English (10,054 terms) and telecommunications English (5,102 terms).
² The term type refers to each distinct word form present in a corpus, without counting its repetitions; each of these repetitions is labelled a token.
³ Types which occur once or twice, respectively, in the corpus.
The number of terms identified in the telematics corpus by Terminus was smaller: 888 terms out of 25,774 types, which represented 3.44% of the whole list. Their frequency counts were also lower than in the legal corpus: the terms identified in TC occurred 38.62 times on average, whereas the mean frequency of the whole type list was more than twice as high, namely 89.93. Nevertheless, they were well distributed in the corpus, being present in 30.14 texts out of 272 (the whole text collection) as opposed to an average of 14.59 texts for the whole type list.
Judging by these figures, although term frequency counts were not as high in the telematics corpus as in the legal one, it could be affirmed that Alcaraz’s observation about the significance of terminology in specialised texts is confirmed from a bottom-up perspective with regard to both the frequency and the distribution of legal and telematics terms.
3.2. Quantifying the relevance of subtechnical terms in TC and UKSCC
Concerning UKSCC and TC, the presence of subtechnical vocabulary was measured using Heatley & Nation’s (2002) software Range. This software allows the user to obtain the percentage of running words in a text or text collection covered by a given word list included in the software package. Both term lists obtained from our corpora were processed using the British National Corpus (BNC) list of the 3,000 most frequent words of English as the base list for comparison. The resulting percentage reflects the proportion of specialised terms from our lists which can be found amongst the 3,000 most frequent words of English, comprising words like father, bank, the or water, amongst many others. Such overlap signals the percentage of subtechnical words present in both corpora, given that these words were identified as specialised terms by Terminus, validated as such against a specialised glossary and also found as general vocabulary amongst the 3,000 most frequent words of English.
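The overlap described above can be illustrated with a simple set intersection between a validated term list and a general high-frequency list. The Python sketch below is a simplified stand-in for that step, not a reimplementation of Range (which reports coverage of running words); the function name `subtechnical_share` and the toy lists are hypothetical.

```python
def subtechnical_share(terms, general_list):
    """Share (%) of validated terms that also occur in a general-language
    high-frequency list, plus the shared items themselves."""
    general = {w.lower() for w in general_list}
    unique_terms = {t.lower() for t in terms}
    shared = sorted(unique_terms & general)  # candidate subtechnical terms
    return 100 * len(shared) / len(unique_terms), shared


# Hypothetical stand-ins for a validated term list and the BNC top-3,000 list.
legal_terms = ["claim", "decision", "estoppel", "tort"]
bnc_top = ["the", "father", "bank", "water", "claim", "decision"]
print(subtechnical_share(legal_terms, bnc_top))  # (50.0, ['claim', 'decision'])
```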
The overlap percentages differed between the two corpora, legal English being the variety with the higher proportion of terms coinciding with the general English vocabulary from the BNC: 47.35% of the terms identified by Terminus were also present in the list of the 3,000 most frequent words of English. Such frequent words as action, claim, decision or criminal were included in our term inventory. Apart from their high frequency counts in the general field, they could be labelled as subtechnical owing to the fact that they acquire a technical meaning in the legal context.
About one third (35.55%) of the terms mined from TC, our telematics corpus, could also be found in the BNC list. The words processor and controller, which have a specialised meaning in both the general and the telematics fields, were found amongst that third. Other terms like backbone, also in the list of subtechnical telematics terms, acquire a specialised sense in the technical environment, referring not to the human spine but to the main high-capacity part of a computer network. Table 1 below illustrates the top 25 subtechnical terms obtained from both corpora.
Subtechnical legal terms      Subtechnical telematic terms
Ability                       Application
Absolute                      Backbone
Acceptable                    Bit
Action                        Box
Admit                         Call
Complaint                     Client
Consistent                    Controller
Creditor                      Depend
Criminal                      Entry
Damages                       File
Debt                          Logic
Employ                        Mapping
Evidence                      Model
Excuse                        Neighbor
Exercise                      Noise
Expense                       Object
Fact                          Operate
Form                          Packet
Privilege                     Path
Proof                         Programme
Reveal                        Refer
Signature                     Resource
Suicide                       Route
Suspend                       Server
Terminate                     Site
Table 1. Top 25 subtechnical terms obtained from UKSCC and TC
Once more, from a bottom-up perspective, corpus evidence corroborates Alcaraz’s observation about the relevance of subtechnical vocabulary in specialised English.
3.3. Latin terms in UKSCC: a corpus-based assessment
This section presents the study of legal terms which are employed in
legal English without being adapted to the English orthographic or
phonetic system, that is, they are pure Latin borrowings, as deined by
Alcaraz (2000: 78). These must be distinguished from cognates, which
are adapted to the English language system although their meaning and
form still remain close to their etymological origin. The data and discussion offered below revisit and update the study by Marín & Rea (2012b).
As a preliminary step, a list of Latin terms was obtained from textbooks and academic books⁴, which served as a reference for the identification of these lexical units in UKSCC. Such identification was carried out automatically using an Excel spreadsheet to compare the type list produced by WordSmith (Scott, 2008) with the Latin term list obtained from the books cited in note 4. Once the single-word Latin units had been extracted (187 in total), it was found that the top 10 most frequent ones were mostly function words, as is the case in general English, namely: versus (v), per, de, inter or re. Other forms which resemble English words (e.g. in, sub or ex) were excluded from these considerations, since they might produce misleading results. However, compared with the whole UKSCC type list, the frequency of the Latin terms was considerably low, standing between the 400th and 1,800th positions of the frequency rank. As a matter of fact, only 17 of these single-word Latin terms fell within the top 2,000 word types identified by WordSmith. Other Latin terms within this frequency range were affidavit, quantum, jure or incapax.
⁴ See Mellinkoff, 1963; Alcaraz, 1994; Borja, 2000 and Orts, 2006, for academic references on Latin vocabulary in legal English and Fernández, 1994; Rice, 2007; Krois-Lindner & Firth, 2008; Frost, 2009; Callanan, 2010 and Orts, 2010 for textbook references.
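The spreadsheet comparison described above amounts to looking up each Latin reference word in the frequency-ranked type list and recording its rank, while skipping forms that coincide with English words. The Python sketch below shows that lookup under stated assumptions: `latin_ranks` is a hypothetical helper, the ranked list and Latin list are toy data, and the default excluded forms are the three examples mentioned in the text.

```python
def latin_ranks(ranked_types, latin_list, ambiguous=("in", "sub", "ex")):
    """Map each Latin word found in a frequency-ranked type list to its rank
    (1 = most frequent), skipping forms that coincide with English words."""
    latin = {w.lower() for w in latin_list} - {a.lower() for a in ambiguous}
    return {t: rank for rank, t in enumerate(ranked_types, start=1)
            if t.lower() in latin}


# Hypothetical frequency-ranked type list and Latin reference list.
ranked = ["the", "of", "in", "v", "court", "per", "affidavit"]
latin_ref = ["v", "per", "in", "affidavit", "quantum"]
ranks = latin_ranks(ranked, latin_ref)
print(ranks)                                        # {'v': 4, 'per': 6, 'affidavit': 7}
print(sum(1 for r in ranks.values() if r <= 2000))  # Latin types within the top 2,000
```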
Text range was also considered in this study as an indicator of a term’s representativeness. Nation (2001) affirms that the higher this value is for a given word in a corpus, the greater its relevance within that corpus. In this study, text range refers to the percentage of running words in a text covered by a given term or word list. For the sake of comparison, a sample list of 35 crime nouns (also regarded as specialised terms) was mined from the list of word types, confirming the low frequency counts associated with Latin terms. Nevertheless, as regards text range, the figures differed: the list of 187 Latin terms covered 0.0059% of the words in UKSCC, whereas crime nouns covered only 0.00095%, about six times less. Therefore, it could be stated that Latin terms, although not especially frequent, present higher text coverage values than other specialised terms like murder, abduction, threats or battery, bearing in mind that Latin terms represent only 0.69% of the total types identified in the corpus.
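Coverage in the sense used above is the share of running words (tokens) accounted for by a word list. A minimal Python sketch follows, assuming whitespace-tokenised input and single-word list entries; the function name `coverage` and the toy sentence are hypothetical, and a phrase such as per se is counted here as two separate tokens.

```python
from collections import Counter


def coverage(tokens, word_list):
    """Percentage of running words (tokens) accounted for by `word_list`."""
    if not tokens:
        return 0.0
    wanted = {w.lower() for w in word_list}
    counts = Counter(t.lower() for t in tokens)
    covered = sum(n for w, n in counts.items() if w in wanted)
    return 100 * covered / len(tokens)


# Hypothetical ten-token text and two small word lists.
tokens = "the court held that the affidavit was void per se".split()
print(coverage(tokens, ["affidavit", "per", "se"]))  # 30.0
print(coverage(tokens, ["murder", "battery"]))       # 0.0
```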
In a similar fashion, keyness was computed with the aim of determining the level of representativeness of single-word Latin units within the legal text collection. According to Scott (2008: 184), “a word is considered key if it is unusually frequent (or unusually infrequent) in comparison with what one would expect on the basis of the larger wordlists”. Keyness can be calculated automatically with WordSmith by comparison with a general English corpus. Using again the list of crime nouns as a reference for comparison with our Latin word inventory, the results showed that, despite their lower frequency, Latin terms could be considered as relevant as crime nouns, standing only three points below the latter with a keyness of 94.3. This value is also fairly close to the average keyness of the whole list produced by WordSmith, namely 116.08.
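Keyness scores of this kind are commonly computed with Dunning’s log-likelihood statistic, which compares observed frequencies in the study and reference corpora with the frequencies expected if the word were equally common in both. The Python sketch below implements that standard statistic; whether it matches the exact WordSmith settings used in this study is an assumption. Note that the score is non-negative for both over- and under-use, so direction must be read from the relative frequencies.

```python
import math


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Dunning's log-likelihood (G2) keyness for one word.

    freq_*: occurrences of the word; size_*: total tokens in each corpus.
    """
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    # A zero observed frequency contributes 0 (the limit of x * ln x).
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


# Hypothetical counts: equal relative frequencies give a keyness of 0.
print(log_likelihood(10, 1000, 20, 2000))
print(round(log_likelihood(10, 100, 0, 100), 2))  # 13.86
```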
Finally, the level of specialisation of these terminological units was also measured in an attempt to substantiate Alcaraz’s observation on their relevance in legal English. In this case, Chung’s (2003) ratio ATR method was applied to rank the Latin terms according to their degree of specificity. Chung’s method is based on corpus comparison, classifying a word type as a term only “if it occurs 50 times more often in the technical text than in the comparison corpus, or if it only occurs in the technical text” (2003: 53). This termhood ratio can be easily calculated by first dividing a word’s frequency of occurrence in each corpus by the number of tokens in that corpus, and then dividing the result obtained for the specialised corpus by the result obtained for the general one⁵. The value obtained should be above 50 for a word type to be regarded as a specialised term. The Latin terms in our list were therefore ranked and filtered according to Chung’s method, which resulted in an inventory including terms such as affidavit, caveat, proviso, extempore, quantum, lex or subpoena.
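The termhood ratio just described can be sketched in a few lines of Python. The function names below are hypothetical; the threshold of 50 and the infinite ratio for words absent from the general corpus follow the description in the text, and the corpus sizes in the usage example (2.6 million and 20 million words) are the UKSCC and LACELL sizes given in this article, while the word frequencies are invented.

```python
import math


def chung_ratio(freq_spec, size_spec, freq_gen, size_gen):
    """Relative frequency in the specialised corpus divided by relative
    frequency in the general corpus (after Chung, 2003).

    Words absent from the general corpus get an infinite ratio.
    """
    if freq_spec == 0:
        return 0.0
    if freq_gen == 0:
        return math.inf
    # (freq_spec / size_spec) / (freq_gen / size_gen), rearranged to limit
    # floating-point error.
    return (freq_spec * size_gen) / (size_spec * freq_gen)


def is_specialised(ratio, threshold=50):
    """Chung's criterion: a ratio above the threshold marks a technical term."""
    return ratio > threshold


# Hypothetical word: 40 occurrences in a 2.6M-word corpus, 2 in a 20M-word one.
r = chung_ratio(40, 2_600_000, 2, 20_000_000)
print(round(r, 2), is_specialised(r))  # ratio of about 154, so a term
```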
Nevertheless, most of these forms either belong to general or academic vocabulary and therefore cannot be regarded as legal terms proper (for instance plus, nil, persona, memorandum, caveat or alibi), or they simply do not occur in isolation but rather as part of phrases. This is why the study of their specificity level was extended to Latin phrases, displayed in Table 2.
Type                    Frequency (UKSCC)   Distribution (texts)   Ratio
Ex turpi causa          129                 3                      ∞
Doli incapax            36                  1                      ∞
Quantum meruit          27                  5                      ∞
Mutatis mutandis        24                  18                     ∞
Alter ego               21                  5                      ∞
Forum non conveniens    13                  3                      ∞
Actus reus              10                  5                      ∞
Ad litem                10                  3                      ∞
Usque ad coelum         8                   1                      ∞
Pari delicto            7                   1                      ∞
Ratione personae        6                   3                      ∞
Doli capax              5                   1                      ∞
Debet esse              4                   1                      ∞
Ad factum               4                   1                      ∞
Res iudicata            4                   2                      ∞
De novo                 4                   3                      ∞
Praesumptio juris       3                   1                      ∞
Jus cogens              3                   1                      ∞
In pari materia         3                   2                      ∞
De jure                 52                  5                      145.6
Pari passu              28                  4                      117.6
Ex parte                115                 26                     96.6
Ultra vires             79                  16                     82.95
Et seq                  29                  17                     81.2
A fortiori              32                  28                     67.2
Table 2. Top 25 Latin phrases and their level of specialisation
⁵ The general English corpus used in this case was LACELL, a 20 million-word corpus of general English texts compiled and owned by the LACELL research group from the English Department at the University of Murcia, Spain.
As shown in Table 2, as with single-word Latin terms, the average frequency of these phrases is far from the mean value for the whole corpus: the former is 27.66, whereas the latter is seven times higher. These data clearly point to a high level of specialisation, which is reinforced by the ratio values: 22 of the 53 phrases mined from UKSCC do not occur in the general English corpus and are therefore assigned an infinite ratio value, standing at the top of the specificity rank, e.g. mutatis mutandis, quantum meruit or actus reus, amongst others.
In spite of their low frequency, their distribution across the corpus is quite high. Phrases like de facto, inter alia, prima facie or pro rata occur in approximately a quarter of the texts in the corpus. Furthermore, while the average text distribution of all the word types in the corpus (excluding hapax and dis legomena) is 25.82, an eighth of the texts in it, Latin terms appear in 14.97 texts on average (under the same conditions), quite a high value given their degree of specialisation.
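Frequency and text distribution for multi-word Latin phrases can be obtained by scanning each text for the token sequence of the phrase. The Python sketch below shows one way to do this; the helper `phrase_stats`, the simple `\w+` tokenisation and the three toy texts are assumptions for illustration, not the procedure used to build Table 2.

```python
import re


def phrase_stats(texts, phrase):
    """Frequency of a multi-word phrase and its distribution (number of
    texts containing it at least once), matched on lower-cased word tokens."""
    target = re.findall(r"\w+", phrase.lower())
    n = len(target)
    freq = distribution = 0
    for text in texts:
        tokens = re.findall(r"\w+", text.lower())
        hits = sum(1 for i in range(len(tokens) - n + 1)
                   if tokens[i:i + n] == target)
        freq += hits
        if hits:
            distribution += 1
    return freq, distribution


# Hypothetical mini-corpus of three texts.
texts = ["The rule applies mutatis mutandis to appeals.",
         "Mutatis mutandis, the same test applies; mutatis mutandis again.",
         "No Latin here."]
print(phrase_stats(texts, "mutatis mutandis"))  # (3, 2)
```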
Summing up, term distribution together with specificity may be considered two key factors in determining the relevance and representativeness of a word or group of words within a corpus, whereas frequency simply indicates how many times a word is repeated. Thus, the low frequency rates associated with Latin terms in UKSCC should not be taken as a sign that they are of little significance within the corpus. On the contrary, their level of specialisation, coupled with their considerably high text distribution, clearly signals their keyness within the variety, supporting, once again, the observations of Alcaraz (2000) as well as those of other scholars like Mellinkoff (1963), Tiersma (1999) or Borja (2000).
3.4. Abbreviations in TC: major findings and discussion
As stated in the literature review, the term abbreviation is an umbrella term which covers initials (also called initialisms), acronyms (also called letter words) and clippings (Sager et al., 1980; Alcaraz, 2000; Plag et al., 2007). First, initialisms are formed by combining only the initial letters of multi-word combinations, giving rise to a sequence of letters which are pronounced individually, as the letters are spelt in the alphabet, e.g. TNT, DVD, IP, GPS, etc. However, when the combination of initial letters is pronounced as a regular word following the regular reading rules of English, it becomes an acronym, e.g. NASA, LASER, NATO, etc. Clippings, in turn, are usually monosyllabic or disyllabic forms in which the first part of the word base is kept, e.g. doc from doctor, sec from second, etc. Sometimes an initial or middle element of the word can also be omitted, as in gbyte from gigabyte (Sager et al., 1980; Jackson, 1988; Alcaraz, 2000; Plag et al., 2007).
Corpus analysis corroborates Alcaraz’s description of EPAP, particularly in telecommunications English, where compressed forms play a crucial role: they account for 16% of the terms included in the Telecommunications Engineering Word List (TEWL) (Rea, 2008). This lexical repertoire includes the most salient, central and typical specialised lexical units in the domain. They are all found within the 1,000 most statistically significant word families in the domain, as determined by comparison with the general language corpus LACELL. Their specialisation index is obtained by applying Chung’s method (2003), and their keyness index is given by the likelihood test in WordSmith (Scott, 2008), mirroring the procedure applied to the study of Latin terms in legal English.
Rank   TEWL    F. TEC   F. LACELL   Ratio      Keyness
1      IP      5,239    20          994.85     16,182
2      TCP     1,717    12          543.41     5,248
3      ATM     1,639    35          177.85     4,817
4      LAN     1,481    27          208.32     4,387
5      OSPF    1,284    0           ∞          4,027
6      QOS     1,155    0           ∞          3,622
7      VHDL    1,150    0           ∞          3,607
8      MPLS    1,112    0           ∞          3,487
9      GSM     1,109    4           1052.96    3,427
10     VPN     1,007    5           764.89     3,097
11     IEEE    1,002    9           422.83     3,044
12     LSAS    858      1           3258.58    2,676
13     DSP     906      41          83.92      2,523
14     LSA     804      0           ∞          2,521
15     CDMA    805      1           3057.29    2,510
16     CISCO   840      14          227.87     2,498
17     MHZ     792      18          167.11     2,319
18     GHZ     734      2           1393.82    2,275
19     FPGA    713      0           ∞          2,236
20     SCTP    703      0           ∞          2,205
21     RF      716      8           339.91     2,161
22     DB      774      36          81.65      2,149
23     WLAN    677      0           ∞          2,123
24     ISDN    699      14          189.62     2,061
25     HTTP    801      96          31.69      1,946
Table 3. Top 25 abbreviations in TEWL
As Table 3 illustrates, the relevance of abbreviations is evidenced by their quantitative behaviour both within TEC, the main telecommunications corpus, and within TC, the telematics subcorpus. As already stated, a total of 443 abbreviations make up 16% of the word forms in the specialised repertoire.
Considering the total number of abbreviations appearing in the term
inventory extracted from TC and the ratio yielded by Chung’s method, there are 237 forms (53%) which are not found in the general corpus, hence they are assumed not to be typical of general language but
characterised by a high degree of specialisation. Such highly technical terms display a keyness index which ranges from 4,027 (OSPF) to
12.5 (VDMS). The higher their frequency in the specialised corpus, the
higher their keyness index. The next group comprises the abbreviations
whose ratio is > 50, which amount to 119 forms (27%), and also occur
in the general English corpus LACELL. They are characterised by their
high frequency in TC and their low frequency in the general corpus,
their keyness being also dependent on their frequency in the former
corpus. The most signiicant abbreviation, TCP, belongs to this group,
being 543 times more frequent in the telecommunications domain than
in general language, and scoring 5,248 in keyness. Finally, the remaining 87 abbreviations (20%) are also used to a greater or lesser extent in
LACELL so that their ratio is < 50. This does not mean that they are not
specialised terms but their use has been extended to general language,
thus being subtechnical. Therefore, their frequencies in both corpora do
not differ that much, although their keyness might vary considerably.
Assessing EPAP lexical features: A corpus-based study
181
The most significant unit in this group is HTTP (keyness 1,946), and the lowest score is yielded by GUIS (11). Other forms in this category include RADAR, PC, ID, MAC, WAN and WWW.
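The three-way grouping above can be illustrated with a minimal Python sketch. It assumes that Chung's ratio is the frequency per million words in the specialised corpus divided by the frequency per million words in the general corpus. Forms absent from the general corpus receive an infinite ratio, shown as ∞ in Table 3. The function names are our own, as is the choice to place a ratio of exactly 50 in the lower band. The corpus sizes used in checks are hypothetical.

```python
import math

THRESHOLD = 50.0  # boundary between the second and third groups (assumed inclusive below)


def chung_ratio(freq_spec: int, size_spec: int, freq_gen: int, size_gen: int) -> float:
    """Ratio of normalised frequencies (per million words), as assumed here.

    freq_spec, size_spec: frequency of the form and token count of the specialised corpus
    freq_gen, size_gen:   the same for the general (reference) corpus
    Returns math.inf when the form never occurs in the general corpus.
    """
    if freq_gen == 0:
        return math.inf
    per_million_spec = freq_spec / size_spec * 1_000_000
    per_million_gen = freq_gen / size_gen * 1_000_000
    return per_million_spec / per_million_gen


def ratio_band(ratio: float) -> str:
    """Place a form in one of the three groups discussed in the text."""
    if math.isinf(ratio):
        return "absent from general corpus"
    if ratio > THRESHOLD:
        return "ratio > 50"
    return "ratio <= 50"


# Published ratios taken from Table 3.
for form, ratio in [("FPGA", math.inf), ("RF", 339.91), ("HTTP", 31.69)]:
    print(form, ratio_band(ratio))
```

With the published ratios, HTTP (31.69) falls into the third group, which matches its description above as the most significant unit of that group.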
A final perspective is gained by examining the quantitative behaviour of abbreviations in relation to the whole telecommunications word list (TEWL). If the values that define the lexical behaviour of the terms in the list are taken as reference, the performance of abbreviations can be contrasted to show how closely they approach the top and bottom scores. The most relevant term in TEWL is network (F.TEC: 16,649; F.LACELL: 1,686; R: 37.50; K: 41,784), whereas microchips obtains the lowest keyness score (F.TEC: 9; F.LACELL: 6; R: 5.69; K: 10). These reference points highlight the terminological character of abbreviations and their relevance in the specific domain. This applies particularly to those that rank highest, such as IP (the fifth most relevant term in TEWL), TCP, ATM, LAN and OSPF. Moreover, the top 100 words of the specific list include 14 abbreviations, whose keyness is considerably high, ranging from 16,182 (IP) to 2,521 (LSA). Of these, OSPF, QOS, VHDL, MPLS and LSA cannot be found in the general language corpus, while IP, TCP, ATM, LAN, GSM, VPN, IEEE, LSAS and DSP have a ratio > 50.
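Keyness values like these come from comparing a word's frequency in the specialised corpus with its frequency in a reference corpus. The text does not state which keyness statistic was used. WordSmith's KeyWords tool offers log-likelihood among its options, so the sketch below assumes a Dunning-style log-likelihood score. The function names and the corpus sizes in the example are hypothetical.

```python
import math


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning-style log-likelihood keyness (a sketch).

    a: frequency of the word in the study corpus, which has c tokens
    b: frequency of the word in the reference corpus, which has d tokens
    """
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    ll = 0.0
    # A zero observed frequency contributes nothing (0 * log 0 is taken as 0).
    if a > 0:
        ll += a * math.log(a / expected_study)
    if b > 0:
        ll += b * math.log(b / expected_ref)
    return 2 * ll


def is_positive_key(a: int, b: int, c: int, d: int) -> bool:
    """True when the word is relatively more frequent in the study corpus."""
    return a / c > b / d


# Hypothetical sizes: 1M-token specialised corpus, 4M-token reference corpus.
print(log_likelihood(100, 0, 1_000_000, 4_000_000))
print(log_likelihood(200, 0, 1_000_000, 4_000_000))
```

Log-likelihood is unsigned, so it is paired here with a relative-frequency check to separate overuse from underuse. When a word is absent from the reference corpus, the score grows linearly with its frequency in the study corpus. This fits the observation that higher frequency in the specialised corpus goes with higher keyness.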
With respect to the different shortening processes that abbreviations undergo, initialisms (360) clearly stand out from the rest, representing 81% of the total. Most of the abbreviations found in the specific corpus are formed from the initial letters of multi-word units and are pronounced as a sequence of letters, such as IP, TCP, ATM, GPRS, SNMP, BGP, DCE, GPS, IGRP, PBX or BS. As for acronyms, there are 74 in the list, accounting for 17% of the abbreviations. In this case, the combination of letters is pronounced as a regular word, as in RADARS, FIFO, VOIP, RIP, IPSEC, PAC, QOS, MAC, CISCO, OSI, LABVIEW, LDAP and SPICE. Finally, there are only 11 clippings, in which the first or last part of the word base has been kept.
Some abbreviations, particularly acronyms, have been lexicalised and accepted as full words capable of undergoing compounding, derivation and conversion. Clear evidence of this behaviour can be observed directly in the list of abbreviations, where pairs of singular and plural forms are found, for example LAN/S, VLAN/S, RAM/S, COMSAT/S, RADAR/S, FIFO/S and PAC/S. The metal-oxide semiconductor (MOS) family neatly illustrates compounding. It also shows how compounding forms multi-word units that again undergo a shortening process and become longer acronyms: CMOS (complementary metal-oxide semiconductor), NMOS (n-channel metal-oxide semiconductor), PMOS (p-channel metal-oxide semiconductor), BICMOS (bipolar complementary metal-oxide semiconductor) and MOSFETs (metal-oxide semiconductor field-effect transistors).
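Singular/plural pairs of the kind listed above can be spotted mechanically in an abbreviation list. The sketch below is a simple heuristic, not the authors' procedure. It pairs every form with the same form plus a final S. The results still need a manual check, since an initialism may legitimately end in S. The sample list is illustrative.

```python
def plural_pairs(abbreviations):
    """Return sorted (singular, plural) pairs where the plural adds a final 'S'."""
    forms = {a.upper() for a in abbreviations}
    return [(f, f + "S") for f in sorted(forms) if f + "S" in forms]


sample = ["LAN", "LANS", "RAM", "RAMS", "IP", "FIFO", "FIFOS", "TCP"]
print(plural_pairs(sample))
```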
In short, both the quantitative behaviour and the lexicalisation of abbreviations demonstrate their terminological character and their typicality in the subject field, as pointed out by Alcaraz (2000). In addition, all these compressed forms are linguistic labels that stand for definitions and carry a special reference within telecommunications, even those that have been integrated into general language. Standardised abbreviations are therefore also terms that enable complete and effective communication in the specialised language and set it apart from general language.
4. Conclusion
Corpus linguistics techniques can automatically detect what is usual or unusual in a sublanguage. The comparison can be made with general language, which establishes a reference norm, or with other sublanguages. In this research, a corpus-based approach has made it possible to identify the typical behaviour of the lexicons of legal and telecommunications English. It has provided a bottom-up depiction of some of their most relevant characteristics and corroborated the portrayal offered by authors such as Alcaraz (2000, 2002). ATR methods were applied together with quantitative parameters intended to measure how vocabulary performs in UKSCC, the legal corpus, and TC, the telematics corpus. This made it possible to depict the use of specialised terminology (including subtechnical terms). It also revealed the outstanding use of Latin terms and phrases in legal English and of abbreviations in telecommunications English.
Our initial hypothesis was that specialised terminology would behave similarly across EPAP varieties, following Alcaraz's (2000, 2002) portrayal. This hypothesis was confirmed. However, certain differences were also observed between the two varieties selected for this research, namely legal and telecommunications English.
Concerning the use of specialised terms in both varieties, the results vary, particularly regarding the frequency of these lexical items in the field of telecommunications. In the legal corpus, terms occurred on average about 6 times as often (1,037) as the whole list of identified types (169.45). In TC, the telecommunications corpus, their average frequency (38.62) was less than half the average for the whole type list (89.93). Nevertheless, terms were well distributed throughout both corpora. They appeared in 13.98% of the legal texts and 11.08% of the telecommunications texts, and represented 6.6% and 3.44%, respectively, of the whole list of types identified in each text collection.
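The figures in this paragraph combine three simple measures for a word list: its mean frequency, the share of texts in which its members appear, and its share of all types. A minimal Python sketch of these measures follows. The toy corpus and the function name are hypothetical. We also assume that text coverage is averaged over the words in the list.

```python
from collections import Counter


def profile(texts, words):
    """Summarise a word list against a corpus given as a list of token lists.

    Returns (mean frequency, mean % of texts containing each word,
    % of all corpus types that the list represents).
    """
    if not words:
        raise ValueError("the word list is empty")
    totals = Counter()      # corpus frequency of each type
    doc_freq = Counter()    # number of texts in which each type occurs
    for tokens in texts:
        counts = Counter(tokens)
        totals.update(counts)
        doc_freq.update(counts.keys())
    mean_freq = sum(totals[w] for w in words) / len(words)
    mean_coverage = 100 * sum(doc_freq[w] for w in words) / (len(words) * len(texts))
    type_share = 100 * len(set(words)) / len(totals)
    return mean_freq, mean_coverage, type_share


toy = [["court", "appeal", "the", "of"], ["court", "the", "the"], ["the", "of", "a"]]
print(profile(toy, ["court", "appeal"]))
```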
The literature also points to the significance of subtechnical terms in specialised languages. These are terms that can be found in both specialised and general language contexts, either retaining their technical meaning or activating it in contact with the specialised environment. Testing showed that a large proportion of legal and telecommunications terms overlapped with the list of the 3,000 most frequent words of English in the BNC. In fact, almost half of the terms in the legal corpus (47.35%) and about one third (35.55%) of the telecommunications terms were found among these general words.
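In principle, the overlap figures can be reproduced with a set intersection between the term list and a general high-frequency list. The sketch below assumes exact word-form matching after lowercasing. Lists organised by word families or lemmas would first require a lemmatisation step. The inputs shown are hypothetical.

```python
def overlap_share(terms, general_words):
    """Percentage of distinct terms that also occur in a general word list."""
    term_set = {t.lower() for t in terms}
    general = {w.lower() for w in general_words}
    if not term_set:
        raise ValueError("the term list is empty")
    return 100 * len(term_set & general) / len(term_set)


print(overlap_share(["court", "appeal", "estoppel", "tort"], ["court", "appeal", "the"]))
```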
Within the field of legal English, Alcaraz (2000) particularly underlines the relevance of Latin words and phrases, a claim that was also tested from a bottom-up perspective. The results showed that their frequency was not as high as expected. Compared with the whole type list, they ranked between the 400th and the 1,800th positions in the frequency list. However, when only Latin phrases were considered, both their level of specialisation and their distribution throughout the text collection proved much higher. Despite their low frequency, they stood at the top of the specificity rank and appeared in 14.97 of the texts in the corpus on average.
Finally, the use of abbreviations was also assessed in the field of telecommunications English. Abbreviations accounted for 16% of the terms in TC (roughly one sixth of the whole list). They displayed very high levels of specialisation, since 53% of them were not found at all in the general corpus. In fact, when the telecommunications corpus was processed with the KeyWords tool of WordSmith Tools (Scott, 2008), abbreviations obtained an average keyness value of 1,634, compared with 237.26 for the whole term list. This clearly points to their specificity and relevance in the corpus.
5. References
Alcaraz Varó, Enrique. 2000. El inglés profesional y académico. Madrid:
Alianza Editorial.
Alcaraz Varó, Enrique. 2002. El inglés jurídico: textos y documentos. Madrid:
Ariel Derecho.
Baker, Mona. 1988. Subtechnical vocabulary and the ESP teacher: An analysis
of some rhetorical items in medical journal articles. Reading in a Foreign Language 4(2): 91-105.
Biber, Douglas. 1988. Variation across Speech and Writing. Cambridge: Cambridge University Press.
Borja Albí, Anabel. 2000. El texto jurídico en inglés y su traducción. Barcelona: Ariel.
Cabré, María Teresa. 1993. La terminología. Teoría, metodología, aplicaciones.
Barcelona: Antártida/Empúries.
Cabré, María Teresa. 2000. Terminologie et linguistique: la théorie des portes.
Terminologies nouvelles. Terminologie et diversité culturelle 21: 10-15.
Callanan, Helen & Edwards, Linda. 2010. Absolute Legal English. London:
Delta.
Chung, Teresa M. & Nation, Paul. 2003. Technical Vocabulary in Specialised
Texts. Reading in a Foreign Language 15(2): 103-116.
Cowan, Ronayne. 1974. Lexical and syntactic research for the design of EFL.
TESOL Quarterly 8: 389-399.
Drouin, Patrick. 2003. Term extraction using non-technical corpora as a point
of leverage. Terminology 9(1): 99-117.
Flowerdew, John. 2001. Concordancing as a tool in course design. In Ghadessy,
Mohsen; Henry, Alex & Roseberry, Robert (eds.) Small Corpus Studies
and ELT: Theory and Practice. Amsterdam: John Benjamins.
Frantzi, Katerina T. & Ananiadou, Sophia. 1999. The C-value/NC-value domain independent method for multi-word term extraction. Journal of Natural
Language Processing 3(2): 115-127.
Halliday, Michael. 1988. On the language of physical science. In Ghadessy,
Mohsen (ed.) Registers of Written English: Situational Factors and
Linguistic Features. London: Pinter.
Heatley, Andrew & Nation, Paul. 2002. Range, computer software. Wellington, New Zealand: Victoria University of Wellington.
Jackson, Howard. 1988. Words and their Meaning. London: Longman.
Krois-Lindner, Amy & Firth, Matt. 2008. Introduction to International Legal
English: A Course for Classroom or Self-study Use. Cambridge: Cambridge University Press.
Marín Pérez, María José. 2014. Evaluation of five single-word term recognition methods on a legal corpus. Corpora 9(1): 83-107.
Marín Pérez, María José. 2016. Measuring the Degree of Specialisation of
Sub-Technical Legal Terms through Corpus Comparison: a Domain-Independent Method. Terminology 22(1): 80-102.
Marín Pérez, María José & Rea Rizzo, Camino. 2012a. Structure and design of
the BLRC: a legal corpus of judicial decisions from the UK. Journal of
English Studies 10: 131-145.
Marín Pérez, María José & Rea Rizzo, Camino. 2012b. How relevant are Latin
wordforms and clusters in legal English? A corpus-based study on the
representativeness and specificity of such elements in UKSCC: an ad
hoc legal corpus. ES. Revista de Filología Inglesa 33: 161-182.
Marín Pérez, María José & Rea Rizzo, Camino. 2014. Assessing four automatic term recognition methods: Are they domain-dependent? English for
Specific Purposes World 42: 1-27.
Mellinkoff, David. 1963. The Language of the Law. Boston: Little, Brown &
Co.
Nation, Paul. 2001. Learning Vocabulary in Another Language. Cambridge:
Cambridge University Press.
Nazar, Rogelio & Cabré, María Teresa. 2012. Supervised Learning Algorithms Applied to Terminology Extraction. In Aguado de Cea, Guadalupe; Suárez-Figueroa, Mari Carmen; García-Castro, Raul & Montiel-Ponsoda, Elena (eds.) Proceedings of the 10th Terminology and
Knowledge Engineering Conference (TKE 2012). Madrid: Ontology Engineering Group, Association for Terminology and Knowledge
Transfer, 209-217.
Orts, María Ángeles. 2006. Aproximación al discurso jurídico en inglés. Madrid: Edisofer Libros Jurídicos S.L.
Plag, Ingo; Arndt-Lappe, Sabine; Braun, Maria & Schramm, Maria. 2007. Introduction to English Linguistics. Berlin: Mouton de Gruyter.
Rea Rizzo, Camino. 2008. El inglés de las telecomunicaciones: estudio léxico
basado en un corpus específico (Doctoral thesis). Universidad de Murcia.
Rice, Sally. 2007. Professional English in Use: Law. Cambridge: Cambridge
University Press.
Sager, Juan; Dungworth, David & McDonald, Peter F. 1980. English Special
Languages. Principles and Practice in Science and Technology. Wiesbaden: Brandstetter Verlag KG.
Scott, Mike. 2008. WordSmith Tools version 5. Liverpool: Lexical Analysis
Software.
Sparck Jones, Karen. 1972. A statistical interpretation of term specificity and
its application in retrieval. Journal of Documentation 28: 11-21.
Tiersma, Peter. 1999. Legal Language. Chicago: The University of Chicago
Press.
Wang, Karen & Nation, Paul. 2004. Word Meaning in Academic English:
Homography in the Academic Word List. Applied Linguistics 25(3):
291-314.
ojs.uv.es/index.php/qfilologia/index
Qf
Lingüístics
Translator’s creativity in cultural elements transposition:
a corpus-based study
La creatividad del traductor en la transposición de elementos culturales:
un estudio de corpus
Virginia Mattioli
Universitat Jaume I.
[email protected]
Received: 20/04/2017. Accepted: 9/11/2017
Resumen (English translation): This article presents a corpus-based study aimed at determining the level of creativity (as opposed to conventionalism) in the translation of cultural elements. Creativity is regarded as the use of those strategies that manipulate the lexical material of the source text. On this basis, corpus linguistics methodology was used to examine a trilingual corpus (Spanish, English, Italian) made up of 50 novels (25 original works and their 25 corresponding translations). The methodology adopted is structured in three phases: (a) identification of cultural elements, (b) determination of translation strategies and (c) distinction between creative and conventional techniques. The results show that, as far as the transposition of culturemes is concerned, translators tend towards the most creative techniques.
Palabras clave (English translation): corpus linguistics; translation studies; cultural elements; creativity; translation techniques.
Abstract: This article presents a corpus-based study developed to determine the degree of creativity (as opposed to conventionalism) in the translation of cultural elements. Creativity is considered as the use of those strategies that manipulate the lexical material of the source language. On this basis, a literary corpus consisting of 50 novels (25 translations and their 25 corresponding originals) was examined through corpus linguistics. Firstly, culture-specific elements were identified; secondly, translation strategies were determined; and finally, these strategies were placed in conventional or creative groups. The results show that the transposition of culture-specific elements is closely related to creativity.
Keywords: corpus linguistics; translation studies; cultural elements; creativity; translation techniques.
Mattioli, Virginia. 2017. “Translator’s creativity in cultural elements transposition: a
corpus-based study”. Quaderns de Filologia: Estudis Lingüístics 22: 187-213.
doi: 10.7203/qf.22.11308
Translator’s creativity in cultural elements transposition...
1. Introduction
This article aims to assess translators' creativity in relation to culture-specificity in a corpus of fiction novels by comparing translations with their corresponding original works. Indeed, the morphology of culture-specific elements suggests that the cultural nature of such lexical items determines which translation techniques are adopted to transpose them from one language to another.
After a brief introduction to the theoretical framework and the methodology adopted, this paper describes the analysis carried out to demonstrate the existence of such a relationship and translators' tendency towards creativity.
Section 2 first presents corpus linguistics and justifies it as the methodology chosen for the research. It then introduces cultural elements through a chronological overview of previous authors' attempts to recognize and translate them. Finally, it defines the concept of creativity and compares it with that of conventionalism. Section 3 describes the case study: it presents the specific hypothesis and objective of the research, describes the corpus used and explains the various phases of the analysis. Lastly, the outcomes are presented and the results are discussed. The article ends with some concluding remarks and suggestions for future research.
2. Theoretical frame
2.1. Corpus linguistics in translation studies
Since ancient times, a corpus has been defined as a collection of texts used to study common textual features. In the 1990s, Sinclair (1991: 171) underlined the nature of those texts, which should be natural (produced by human beings) and authentic (produced for real contexts). In the following years, several authors proposed definitions of corpus that took into account all the characteristics of this set of texts. Given the multiplicity of features dealt with in this study, we have adopted the definition proposed by Sánchez (1995: 8-9). Sánchez considers a corpus's origin, purpose, composition, representativeness and extension. On this basis, he defines a corpus as a collection of linguistic data systematized according to certain criteria, wide enough in range and depth to be representative of the whole language or of some of its varieties. Moreover,
he highlights the value of electronic processing in providing data which
yield varied and useful results for description and analysis.
Corpus linguistics was born at the beginning of the 20th century with the objective of studying language through real examples (Sinclair 1991: 171). Its effectiveness increased from the 1960s onwards thanks to developments in computing. In this study, corpus linguistics methodology has been used to observe translators' behavior with respect to specific lexical elements. The main interest here therefore lies in the application of this methodology to translation studies and lexicology. According to translation studies scholars, corpus linguistics is very useful in two ways. First, it can be used to analyze the relationship between source and target text, in particular to describe the translation techniques chosen by translators (Lepinette, 2004: 2-3). Second, it can be used to investigate regularities and behaviors of translated language by observing translation processes, products and functions (Xiao and Ming, 2009: 237; Toury, 1995: 265, cited in Xiao and Ming, 2009: 237). From a lexicological perspective, in turn, corpus linguistics makes it possible to study word frequency, presence, use, characteristics, distribution and collocations (Procházková, 2006: 7-8).
In recent decades, there has been much debate regarding the nature of corpus linguistics. Considering its definition and objectives, several authors have questioned whether it should be treated as a discipline or a methodology.
In this study, priority has been given to the multiplicity of applications of corpus linguistics. Concurring with numerous authors (Leech, 1992: 105; McEnery and Wilson, 1996: 2, among others), we have considered it a methodology. More specifically, it is treated as an empirical methodology based on the fact that language is a probabilistic system in which distinct features appear with different frequencies. Considering both the advantages and the shortcomings of corpus linguistics, its application seems appropriate for this research on two fronts. On the one hand, it facilitates the identification of culture-specific elements. On the other, because it allows us to analyze a great variety of texts, it guarantees a broad variety of authors, topics and translators.
2.2. Cultural elements
Since the 1960s, several translation studies scholars have shown an increasing interest in cultural elements. Following Nida's first approach in 1964, many other authors focused their attention on these lexical items and the challenge they represent during the translation process. Such studies can be divided into two groups according to their main objectives. In the first group, authors attempt to define and classify cultural elements, proposing various definitions without apparently reaching any agreement about their nature and identification. In the second, authors propose different techniques to transfer such elements from one language to another.
2.2.1. Definition and classification of cultural elements
Among the former, Nida (1945) recognizes cultural elements as a problem in translation and classifies them into five basic categories. Some years later, Newmark (1988) calls these elements cultural words. He also introduces the concept of cultural language, which refers to the specific language of a given culture, within which a wide variety of culture-specific vocabulary can be found (Newmark, 1988: 94). After him, Mayoral Asensio (1994: 76) labels certain elements of discourse as cultural references (referencias culturales). Because these elements refer to the original culture, members of the target culture misunderstand them completely or partly. Aixelá (1996), for his part, focuses on the absence of these elements in the target culture. Christiane Nord (1997) adopts Vermeer's term and definition of cultureme as a “social phenomenon of a culture X that is regarded as relevant by members of this culture and, when compared with a corresponding social phenomenon in a culture Y, is found to be specific to culture X” (Vermeer, 1980; cf. Nord 1997: 34). Finally, in this century, Santamaria (2001) defines and organizes cultural references in a detailed classification consisting of numerous categories and subcategories.
As this diachronic overview of studies on culture-specific items suggests, the authors do not seem to have reached any agreement, and none of them explains clearly how to recognize a culture-specific element within a text. Moreover, some scholars focus on the changeable nature of cultural elements over time and as languages change. In this sense, Molina Martínez (2001) considers that they exist only in situations characterized by a cultural transfer, that is, in a translational context.
As a result, the most commonly accepted characteristics of culture-specific elements seem to be their specificity with respect to the original culture, their absence in the target culture and their connotative value. The authors' divergent positions and the lack of a proper definition of culture-specific elements make it impossible to determine systematically whether or not a word has a cultural nature. For this reason, in this study such elements have been identified through their morphological structure (i.e. the formation process, construction and origin of a word). Indeed, the use in a language X of words borrowed from other languages implies the absence of such terms in the native vocabulary of language X. According to Delwey (1950: 60-61, cited in Molina Martínez 2001: 23), language is a product of culture. Hence, the absence in language X of a word to name an object or concept denotes the absence of that object or concept in culture X. Consequently, words imported from a language Y into a language X designate objects or concepts that originally belong to culture Y. These words can therefore be considered culturally specific to culture Y. In this paper, therefore, culture-specific elements are taken to be all those words whose morphological structure is alien to the word formation rules of the language of the analyzed text. Such words have been imported from a different language, and thus from a different culture. Some examples of culture-specific elements identified in the novels translated into Italian and analyzed in the study are “bistrot”, “whisky” and “sari”, which represent the French, Scottish and Indian cultures respectively. These words were borrowed from foreign languages to designate objects that did not exist in contemporary Italian culture, hence the lack of an Italian word to label them.
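Because identification here rests on morphological structure alien to Italian word formation, part of the screening could in principle be approximated automatically. The Python sketch below is a hypothetical heuristic, not the procedure used in the study. It flags Italian tokens that contain letters outside the traditional Italian alphabet (j, k, w, x, y) or that end in a consonant. A small, incomplete list of native consonant-final words is treated as exceptions. The heuristic misses vowel-final loans such as “sari”, so any candidates would still need manual review.

```python
import re

# Letters outside the traditional 21-letter Italian alphabet (assumption:
# in modern Italian text they occur mostly in loanwords).
FOREIGN_LETTERS = re.compile(r"[jkwxy]")
VOWELS = set("aeiouàèéìíòóù")
# A small, incomplete list of native Italian words ending in a consonant.
NATIVE_CONSONANT_FINAL = {"il", "un", "in", "per", "con", "non", "del", "al",
                          "nel", "dal", "col", "sul", "bel", "quel", "gran"}


def looks_borrowed(word: str) -> bool:
    """Heuristic: does the word's form look alien to Italian word formation?"""
    w = word.lower()
    if not w.isalpha():
        return False
    if FOREIGN_LETTERS.search(w):
        return True
    return w[-1] not in VOWELS and w not in NATIVE_CONSONANT_FINAL


def candidate_items(tokens):
    """Sorted, de-duplicated candidate culture-specific items."""
    return sorted({t.lower() for t in tokens if looks_borrowed(t)})


sentence = "Entrò nel bistrot e ordinò un whisky prima di salire sulla jeep"
print(candidate_items(sentence.split()))
```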
2.2.2. Treatment and translation of cultural elements
As regards the treatment of cultural elements, Nida (1964) initially proposes three basic methods to translate these references (addition, omission and conversion), to which he later adds some other solutions. His attempt is followed by Vázquez Ayora (1977: 251-384), Newmark (1988: 103-104) and Molina Martínez (2006), among many others. With the same purpose, some scholars prefer to organize translation techniques along a continuum instead of classifying them into categories. Among these, Mangiron (2006) distributes translation techniques along a line. The techniques range from the most faithful to the source language and culture (transposition) to the most adapted to the target culture (cultural adaptation).
Following the terms coined by Venuti (1995), these two opposite extremes can be called foreignization and domestication, respectively.
Other authors, instead, study the factors that influence the choice of the most appropriate translation technique among those suggested. In this sense, the precursor is Newmark (1988: 103), who focused on six factors: the purpose of the text, the readers' motivation and cultural level, the importance of the cultural reference in the original text, the area of use, the novelty of the term and its future.
All the authors analyzed seem to agree on the factors to be taken into account at the moment of linguistic transfer. Nevertheless, there are still many discordant proposals regarding possible translation solutions to overcome the problems created by cultural differences. To resolve these controversies and to cover the widest possible range of techniques, this study merges the two main kinds of proposals into a new taxonomy of translation techniques. The proposal, shown in figure 1, consists of 15 techniques ordered along a continuum from the most exotic to the most domesticated. Each technique is defined and exemplified below.
Fig. 1. Continuum of translation techniques used in the present study
• Transposition: maintenance of the original foreign word (Fish
and Chips > Fish and Chips)
• Transposition of proper name: maintenance of the original proper name (Victoria street > Victoria street)
194
Virginia Mattioli
• Borrowing: maintenance of an original foreign word recognized
by the dictionary of the target language (Web > Web)
• Naturalization: adaptation to the target language phonetics
(school bus > scuolabus)
• Literal translation: literal translation of the culture-specific element (email > posta elettronica)
• Neutralization: explication by means of words that explain the
function or the characteristics of the culture-specific element
(turf > tappeto erboso del giardino)
• Hyperonym or hyponym: generalization or specification (respectively: bus station > stazione and knife > machete)
• Accepted standard translation: non-literal translation accepted by
the dictionaries and grammars of the target language (conference committee > commissione congiunta)
• Paraphrase: addition of explication within the text (gondola >
gondola, a narrow Venetian boat)
• Footnote: addition of information in a footnote (prega Santa Lucia per recuperare la vista > she prays to Saint Lucy to recover her sight¹, with footnote 1: “Saint Lucy is considered the protector of sight because of her name, Lucia, from the Latin word lux, which means light”)
• Omission: omission of a culture-specific element (watching
Friends on the TV > guardare la televisione)
• Functional or cultural equivalent: the use of a different element
with the same cultural value as the original one (BA degree >
laurea triennale)
• Addition: addition of information absent in the source text (they
drove back > tornarono indietro con la jeep)
• Lack of semantic or formal correspondence: translation presents
a divergence of meaning or style with respect to the source text
(respectively: on the corner of Sloane Street > all’angolo di piazza Sloane and snatching from street urchins > furti dei bambini
di strada)
• Autonomous creation: introduction of a cultural element that was
absent in the source text (he sat and ate calmly > si sedette e mangiò con calma le sue tagliatelle)
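Because the taxonomy is an ordered continuum, it can be encoded as an ordered sequence. This makes it easy to place a translation choice between the foreignizing and domesticating poles. The Python sketch below is illustrative only, and its function names are hypothetical. The position score assumes equal spacing between adjacent techniques, which the taxonomy itself does not claim.

```python
# The 15 techniques in the order given above, from most exotic to most domesticated.
CONTINUUM = (
    "transposition",
    "transposition of proper name",
    "borrowing",
    "naturalization",
    "literal translation",
    "neutralization",
    "hyperonym or hyponym",
    "accepted standard translation",
    "paraphrase",
    "footnote",
    "omission",
    "functional or cultural equivalent",
    "addition",
    "lack of semantic or formal correspondence",
    "autonomous creation",
)


def position(technique: str) -> float:
    """0.0 = foreignizing end, 1.0 = domesticating end (equal spacing assumed)."""
    index = CONTINUUM.index(technique.strip().lower())
    return index / (len(CONTINUUM) - 1)


def mean_position(techniques) -> float:
    """Average position of a set of annotated translation choices."""
    return sum(position(t) for t in techniques) / len(techniques)


print(position("Footnote"), mean_position(["transposition", "autonomous creation"]))
```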
2.3. Creativity vs. conventionalism
Gil-Bardají (2003: 96), adopting Toury's (1974, cited in Gil-Bardají, 2003) definition, considers norms to be a set of regularities in a translator's behavior determined by a given socio-cultural situation.
Kenny (2001: 66) transfers the concept of normalization to corpus-based translation studies. She defines it as the use of conventional target-language solutions, as opposed to the adoption of unusual source-text features. The author adds that normalization can be applied at any language level. She calls the application of such techniques to individual words or collocations lexical normalization. Kenny (2001: 66) thus relates the idea of normalization to that of conventionalism.
According to Corpas Pastor (2001), traditional (hence conventional)
translation techniques are those that maintain a sort of equivalence between source and target text.
Despite the debatable nature of the concept of equivalence, in this study equivalence is observed from both a formal and a semantic perspective. Items are therefore considered equivalent (hence conventional) only when they present both formal and semantic correspondence, that is, correspondence in terms of signifier and meaning respectively. Accordingly, all techniques characterized by some kind of omission, addition, manipulation or alteration of the original lexical material are considered non-equivalent (see section 2.2.2 for the taxonomy of techniques adopted in this study). They are thus considered not conventional, and consequently creative.
On this basis, this paper divides translation strategies into conventional and creative ones. The first group includes only literal translations, as they are the only ones that present a complete level of equivalence, from both a formal and a semantic point of view. At the opposite end, all the other techniques in the range presented in figure 1 are characterized by some kind of modification of the original material. They thus involve some sort of non-equivalence (lexical, semantic or both) and are therefore assigned to the group of creative strategies.
This division enables us to observe and classify translators’ behavior regarding culture-specific elements in terms of creativity: do they
tend to maintain equivalence with the original elements (using literal
translations) or do they prefer a more creative approach, modifying and
manipulating the original items (using one of the techniques included in
the creative strategies group)?
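The binary rule just described can be expressed in a few lines: literal translation is conventional, and every other technique in the continuum is creative. This minimal Python sketch assumes that each culture-specific item has already been annotated with one technique label from figure 1. The labels in the example are hypothetical annotations.

```python
CONVENTIONAL = {"literal translation"}


def strategy_group(technique: str) -> str:
    """Only literal translation counts as conventional; all others are creative."""
    return "conventional" if technique.strip().lower() in CONVENTIONAL else "creative"


def creative_share(techniques) -> float:
    """Proportion of annotated items translated with a creative technique."""
    labels = [strategy_group(t) for t in techniques]
    if not labels:
        raise ValueError("no annotated items")
    return labels.count("creative") / len(labels)


annotated = ["literal translation", "borrowing", "omission", "footnote"]
print(creative_share(annotated))
```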
3. Case study
3.1. Hypothesis and objectives
The objective of this research is to assess translators' creativity in relation to culture-specific elements. To this end, corpus linguistics methodology was used to observe this feature in a set of translated novels. The starting hypothesis was that translators prefer creative techniques to transpose culture-specific items, according to the division between creative and conventional techniques proposed in the previous section. Indeed, the relation between culture-specificity and foreign morphological structure (explained in section 2.2) seems to support this assumption. To test this hypothesis, three semantic classes of culture-specific items were considered: (a) food and drinks, such as “curry”, “bistrot” or “cognac”; (b) communication and transportation, like “jeep”, “parkway” or “roulotte”; and (c) clothes and body care, e.g. “tweed”, “gilet” or “sari”. Once the items had been identified in a balanced and representative corpus, the techniques used to translate them were established by comparing the aligned originals with their translations. Finally, the results were examined to establish whether translators prefer creative or conventional behavior.
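Relating techniques to the three semantic classes amounts to grouping aligned records and computing a proportion per class. In this hypothetical Python sketch, each record is an (item, semantic class, technique) tuple. The placeholder items and annotations are invented for illustration. Reading “prefer” as “more than half of the items in every class” is our own assumption.

```python
from collections import defaultdict


def creative_share_by_class(records):
    """records: iterable of (item, semantic_class, technique) tuples."""
    totals = defaultdict(int)
    creative = defaultdict(int)
    for _item, sem_class, technique in records:
        totals[sem_class] += 1
        # Same binary rule as in section 2.3: only literal translation is conventional.
        if technique.strip().lower() != "literal translation":
            creative[sem_class] += 1
    return {c: creative[c] / totals[c] for c in totals}


def prefers_creative(shares) -> bool:
    """Assumed reading of the hypothesis: creative techniques exceed 50% in every class."""
    return all(share > 0.5 for share in shares.values())


records = [
    ("item_a", "food and drinks", "transposition"),
    ("item_b", "food and drinks", "literal translation"),
    ("item_c", "communication and transportation", "borrowing"),
    ("item_d", "clothes and body care", "paraphrase"),
]
shares = creative_share_by_class(records)
print(shares, prefers_creative(shares))
```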
3.2. Corpus used
The corpus used in the study, named LIT_TRAD, consists of two parallel subcorpora of award-winning fiction novels published between 2000 and 2014 and translated from English and Spanish into Italian. The two sets of novels are named LIT_TRAD_EN_IT and LIT_TRAD_ES_IT. LIT_TRAD_EN_IT includes 26 novels (13 English originals and 13 Italian translations), and LIT_TRAD_ES_IT consists of 24 novels (12 Spanish originals and 12 Italian translations). Table 1 shows the details of the works included (original and translated versions) and their distribution within the two subcorpora:
Translator’s creativity in cultural elements transposition...
Originals form the subcorpora Original_en and Original_es; translations form Target_en_it and Target_es_it.

Linguistic pair | Original author | Original title | Year of publication | Translator | Italian title | Year of publication
EN>IT | Atwood, Margaret | Oryx and Crake | 2003 | Belletti, Raffaella | L’ultimo degli uomini | 2003
EN>IT | Auster, Paul | The Brooklyn follies | 2005 | Bocchiola, Massimo | Le follie di Brooklyn | 2005
EN>IT | Banville, John | The sea | 2005 | Kampmann, Eva | Il mare | 2006
EN>IT | Coetzee, John Maxwell | Elizabeth Costello | 2003 | Baiocchi, Maria | Elizabeth Costello | 2003
EN>IT | Cunningham, Michael | Specimen days | 2005 | Cotroneo, Ivan | Giorni memorabili | 2005
EN>IT | DeLillo, Don | Cosmopolis | 2003 | Pareschi, Silvia | Cosmopolis | 2003
EN>IT | Desai, Anita | The artist of disappearance | 2011 | Nadotti, Anna | L’artista della sparizione | 2013
EN>IT | Ghosh, Amitav | The hungry tide | 2004 | Nadotti, Anna | Il paese delle maree | 2005
EN>IT | Lessing, Doris | Alfred and Emily | 2008 | Pareschi, Monica | Alfred e Emily | 2010
EN>IT | Morrison, Toni | Home | 2012 | Fornasiero, Silvia | A casa | 2012
EN>IT | Potok, Chaim | Old men at midnight | 2001 | Muzzarelli, Mara | Vecchi a mezzanotte | 2002
EN>IT | Roth, Philip | The plot against America | 2004 | Mantovani, Vincenzo | Il complotto contro l’America | 2005
EN>IT | Naipaul, Vidiadhar Surajprasad | Half a life | 2001 | Cavagnoli, Franca | La metà di una vita | 2002
ES>IT | Bryce Echenique, Alfredo | El huerto de mi amada | 2002 | Bovaia, Roberta | Il giardino della mia amata | 2003
ES>IT | Cercas, Javier | Soldados de Salamina | 2001 | Cacucci, Pino | Soldati di Salamina | 2002
ES>IT | Marías, Javier | Los enamoramientos | 2011 | Felici, Glauco | Gli innamoramenti | 2012
ES>IT | Montero, Rosa | La loca de la casa | 2003 | Finassi Parolo, Michela | La pazza di casa | 2004
ES>IT | Muñoz Molina, Antonio | El viento de la luna | 2006 | Nicola, Maria | Il vento della luna | 2008
ES>IT | Piglia, Ricardo | Blanco nocturno | 2011 | Cacucci, P. | Bersaglio notturno | 2011
ES>IT | Restrepo, Laura | Delirio | 2004 | Simini, D. | Delirio | 2005
ES>IT | Rosa, Isaac | El vano ayer | 2005 | Cardinali, Annabella | Il vano ieri | 2007
ES>IT | Skármeta, Antonio | El baile de la victoria | 2003 | Collo, Paolo | Il ballo della vittoria | 2005
ES>IT | Vargas Llosa, Mario | Travesuras de la niña mala | 2006 | Felici, Glauco | Le avventure della ragazza cattiva | 2006
ES>IT | Vázquez Montalbán, Manuel | El hombre de mi vida | 2000 | Hado, Lyria | L’uomo della mia vita | 2000
ES>IT | Vila-Matas, Enrique | El viaje vertical | 2001 | Cattaneo, S. | Il viaggio verticale | 2006
Table 1. Composition of LIT_TRAD
After a close study of the literature on corpus compilation, the works
to be included in the collection were chosen according to the following
criteria:
• Representativeness (from a qualitative and a quantitative point of
view). Firstly, all the novels selected had been awarded international literary prizes, to satisfy the qualitative representativeness
criterion. Then, once the corpus had been compiled, its quantitative representativeness was assessed using ReCor (Corpas
Pastor, Seghiri, Maggi 2006), a statistical program specifically
developed to evaluate the quantitative representativeness of a
corpus a posteriori, according to the number of words and of
texts that it includes.
• Inclusion of whole texts, in order to identify as many culture-specific elements as possible, as the aim of the study requires.
• Balance: the two subcorpora include a comparable number of works
(13 and 12 source novels, respectively) and, despite the inclusion of entire texts, they are still comparable
as regards the number of words.
• Variability: the original novels selected are written in different
varieties of English and Spanish to guarantee a high level of variability.
• Authenticity: the texts included are literary works written for real
contexts by native authors.
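The idea of checking representativeness after compilation can be illustrated with a minimal sketch. It is not ReCor's actual procedure, which is described in the cited patent; it only shows the general a posteriori approach of tracking how many new word types each added text contributes, on the assumption that a corpus approaches lexical saturation when new texts add few new types. The function names and the 5% threshold are hypothetical.

```python
def type_growth(texts: list[list[str]]) -> list[tuple[int, int, int]]:
    """Add texts one at a time and record
    (texts so far, cumulative distinct types, new types in this text)."""
    seen: set[str] = set()
    curve = []
    for i, tokens in enumerate(texts, start=1):
        new_types = {t.lower() for t in tokens} - seen
        seen |= new_types
        curve.append((i, len(seen), len(new_types)))
    return curve


def looks_saturated(curve, window: int = 3, max_new_ratio: float = 0.05) -> bool:
    """Hypothetical rule of thumb: each of the last `window` texts added no more
    than `max_new_ratio` times the vocabulary accumulated so far."""
    if len(curve) < window:
        return False
    return all(new <= max_new_ratio * total for _, total, new in curve[-window:])


# Toy example with three tiny invented "texts".
curve = type_growth([["a", "b"], ["b", "c"], ["c"]])
print(curve)                              # [(1, 2, 2), (2, 3, 1), (3, 3, 0)]
print(looks_saturated(curve, window=1))   # True: the last text added no new types
```

On a real corpus the texts would be the tokenized novels, and the shape of the whole curve matters more than any single threshold.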
To facilitate the identification of the culture-specific elements, the
corpus was semantically tagged using USAS (UCREL Semantic Analysis System), developed by the UCREL research group at Lancaster University (Piao et al., 2016). This tagging system adds after each
word an underscore followed by a code formed of letters and numbers
(e.g. _F1 for food-related words).
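Because the tag is attached to the word with an underscore, tagged text can be split back into word and tag with a few lines of code. The sketch below assumes one space-separated `word_TAG` token per word, as described above. Real USAS output can carry more complex tags (for example several alternative codes), and this simplification ignores them. The sample sentence and its tags are invented for illustration.

```python
from collections import Counter


def parse_tagged(text: str) -> list[tuple[str, str]]:
    """Split space-separated 'word_TAG' tokens into (word, tag) pairs.

    Splits on the last underscore so the word itself may contain '_';
    tokens without any underscore get an empty tag.
    """
    pairs = []
    for token in text.split():
        word, sep, tag = token.rpartition("_")
        pairs.append((word, tag) if sep else (token, ""))
    return pairs


def words_with_tag(pairs: list[tuple[str, str]], tag: str) -> Counter:
    """Frequency of each (lower-cased) word carrying exactly `tag`."""
    return Counter(word.lower() for word, t in pairs if t == tag)


# Invented sample; the tags are illustrative, not real tagger output.
sample = "Lei_Z8 mangiava_F1 uno_Z5 yogurt_F1 e_Z5 un_Z5 altro_Z5 yogurt_F1"
print(words_with_tag(parse_tagged(sample), "F1"))
# Counter({'yogurt': 2, 'mangiava': 1})
```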
3.3. Analysis
The two subcorpora were analyzed separately and, at the end, the results
were compared. The analytic process can be divided into four steps:
1. Selection of the culture-specific elements
2. Comparison of the translated culture-specific elements with the
corresponding original items
3. Determination of the translation technique used in each case
4. Comparison between the results obtained from the two subcorpora.
Various programs and tools were used to analyze the texts.
In the first phase, a word list was created for each target corpus
(TARGET_EN_IT and TARGET_ES_IT) using AntConc (Anthony,
2014). Then, the terms related to the three semantic categories considered in this study (see section 3.1) were identified in the lists. This
process was facilitated by the format of the semantic tagging used:
when each tag is searched for in the concordance tool, every output line
shows the searched node in the middle (in blue in the screenshot
in figure 2 below) and, to its left, a term belonging to the related
semantic category (in red in figure 2).
Fig. 2. Extract from the results of the search for the semantic tag F1 (food)
in TARGET_EN_IT in AntConc (Anthony, 2014)
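A tag-based concordance of the kind shown in figure 2 can be approximated as follows. This is a simplified keyword-in-context (KWIC) sketch, not AntConc's implementation. It takes (word, tag) pairs, such as those obtained by splitting `word_TAG` tokens, and prints each hit with its left context, the tagged word, the tag as the node, and the right context. The function name, the context width and the sample pairs are hypothetical.

```python
def kwic_by_tag(pairs: list[tuple[str, str]], tag: str, width: int = 3) -> list[str]:
    """Return one KWIC line per token whose tag equals `tag`.

    Each line shows up to `width` words of left context, the tagged word,
    the tag itself in brackets (the node) and up to `width` words of right context.
    """
    words = [w for w, _ in pairs]
    lines = []
    for i, (word, t) in enumerate(pairs):
        if t == tag:
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            # Right-align the left context so the nodes line up vertically.
            lines.append(f"{left:>30} {word} [{t}] {right}".rstrip())
    return lines


# Invented (word, tag) pairs for illustration.
pairs = [("Lei", "Z8"), ("mangiava", "F1"), ("uno", "Z5"), ("yogurt", "F1")]
for line in kwic_by_tag(pairs, "F1", width=1):
    print(line)
```

Reading down the column of words next to the node then gives the list of candidate terms for the category, from which the culture-specific ones are picked by hand.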
Among the words belonging to each semantic field considered,
only culture-specific elements were selected, manually, according to
their morphological structure (only words with a foreign morphology were chosen). To follow the example given in figure 2, among the
words related to the semantic field of food (in red), identified by means
of the search for tag F1, only those with a foreign morphological
structure were chosen: in this case, only the word “yogurt”.
Alongside the elements specific to foreign cultures (which present a
foreign morphological structure), items specific to Italian
culture were also considered, to observe their treatment in the transfer
from the source languages studied into Italian: are they
present in the foreign novels, or are they added by Italian translators?
And if they are present in the source text, which techniques does the
translator use to transpose them into Italian without losing their exotic,
Italian-style function (if any)? With this objective, words
with an Italian morphological structure that are frequently used in foreign languages (like “panini”, “vespa” or “spaghetti”) were also included.
Finally, the elements of the resulting lists were subjected to a further selection in order to ensure a high level of representativeness and
to exclude from the study the terms that do not represent any specific
culture. This selection excluded the following elements
from the analysis:
• items with a frequency lower than 10 occurrences;
• items that appear in fewer than three different novels;
• items that could not be considered culture-specific elements despite presenting a morphological structure external to
Italian grammar, because of their complete assimilation into Italian daily life and language, as shown by a high frequency
in general Italian corpora (e.g. jeans, computer, internet, etc.).
As a result, only the elements that satisfied these criteria were analyzed.
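The three exclusion criteria can be expressed as a simple filter. In this sketch each candidate item maps to its per-novel occurrence counts. The list of fully assimilated words (such as “jeans” or “computer”) is supplied by hand, because the study judged assimilation by consulting general Italian corpora rather than by a fixed rule. The data structure, the names and the counts are hypothetical.

```python
def select_items(occurrences: dict[str, dict[str, int]],
                 assimilated: set[str],
                 min_freq: int = 10,
                 min_novels: int = 3) -> list[str]:
    """Keep items with at least `min_freq` occurrences in total, attested in
    at least `min_novels` different novels, and not listed as assimilated."""
    selected = []
    for item, per_novel in occurrences.items():
        total = sum(per_novel.values())
        novels = sum(1 for count in per_novel.values() if count > 0)
        if (total >= min_freq and novels >= min_novels
                and item.lower() not in assimilated):
            selected.append(item)
    return sorted(selected)


# Hypothetical counts per novel (item names and novel IDs are placeholders).
occurrences = {
    "item_a": {"n1": 30, "n2": 10, "n3": 9},  # 49 hits in 3 novels: kept
    "item_b": {"n1": 12},                     # one novel only: excluded
    "item_c": {"n1": 4, "n2": 3, "n4": 2},    # only 9 hits: excluded
    "jeans": {"n1": 20, "n2": 15, "n3": 8},   # assimilated: excluded
}
print(select_items(occurrences, assimilated={"jeans", "computer", "internet"}))
# ['item_a']
```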
The complete lists of the culture-specific elements resulting from
this selection process, which were analyzed in the present study, are presented in tables 2 and 3:
Culture-specific element | Frequency | Semantic class | Original language
Avenue | 157 | Communication and transportation | FR
Street | 157 | Communication and transportation | EN
Taxi | 67 | Communication and transportation | FR
Sari | 49 | Clothing and body care | HI
Camion | 43 | Communication and transportation | FR
Autobus | 42 | Communication and transportation | FR
Garage | 40 | Communication and transportation | FR
Station | 37 | Communication and transportation | EN
Road | 31 | Communication and transportation | EN
Square | 29 | Communication and transportation | EN
Pullman | 25 | Communication and transportation | EN
Jeep | 23 | Communication and transportation | EN
Scotch | 23 | Food and drink | EN
Picnic | 18 | Food and drink | EN
Vodka | 18 | Food and drink | RU
Whisky | 16 | Food and drink | EN
Toast | 17 | Food and drink | EN
Champagne | 15 | Food and drink | FR
Brandy | 14 | Food and drink | EN
Parkway | 14 | Communication and transportation | EN
Pizza | 14 | Food and drink | IT
Sandwich | 14 | Food and drink | EN
Slogan | 14 | Communication and transportation | EN
Mais | 13 | Food and drink | ES
Berretto da baseball | 12 | Clothing and body care | EN
Roulotte | 12 | Communication and transportation | FR
Curry | 10 | Food and drink | HI
Tunnel | 10 | Communication and transportation | FR
Tweed | 10 | Clothing and body care | EN
TOTAL | 944 | |
Table 2. Culture-specific elements selected for the analysis in LIT_TRAD_EN_IT
Culture-specific element | Frequency | Semantic class | Original language
Calle | 163 | Communication and transportation | ES
Taxi | 105 | Communication and transportation | FR
Avenida | 58 | Communication and transportation | ES
Autobus | 51 | Communication and transportation | FR
Champagne | 49 | Food and drink | FR
Whisky | 39 | Food and drink | EN
Bistrot | 29 | Food and drink | FR
Camion | 23 | Communication and transportation | FR
Panini | 20 | Food and drink | IT
Gin | 16 | Food and drink | EN
Sandwich | 16 | Food and drink | EN
Reportage | 14 | Communication and transportation | FR
Tunnel | 13 | Communication and transportation | EN
Gilet | 12 | Clothing and body care | FR
Cognac | 11 | Food and drink | FR
Dessert | 10 | Food and drink | FR
TOTAL | 629 | |
Table 3. Culture-specific elements selected for the analysis in LIT_TRAD_ES_IT
Tables 4 and 5 show the number of elements identified in each step
of this first phase of analysis for each subcorpus (the number of elements included in the three semantic classes chosen, the culture-specific elements identified among them, and the most representative ones
selected for the analysis):
Elements | Unit | Total | % | Food and drink | % | Clothes and body care | % | Transportation and communication | %
Semantic elements | Tokens | 8969 | -- | 2248 | 25% | 2213 | 24% | 4507 | 50%
Semantic elements | Types | 972 | -- | 385 | 40% | 267 | 27% | 320 | 33%
Culture-specific elements identified | Tokens | 1945 | 22%* | 493 | 22%** | 253 | 11%** | 1199 | 27%**
Culture-specific elements identified | Types | 285 | 36%* | 138 | 36%** | 57 | 21%** | 90 | 28%**
Culture-specific elements analyzed | Tokens | 944 | 49%*** | 172 | 35%**** | 71 | 28%**** | 701 | 58%****
Culture-specific elements analyzed | Types | 29 | 10%*** | 11 | 8%**** | 3 | 5%**** | 15 | 17%****
* % of the total semantic elements; ** % of the total semantic elements of the category; *** % of the total culture-specific elements; **** % of the total culture-specific elements of the category
Table 4. Culture-specific elements identified in LIT_TRAD_EN_IT
Elements | Unit | Total | % | Food and drink | % | Clothes and body care | % | Transportation and communication | %
Semantic elements | Tokens | 7599 | -- | 2234 | 29% | 1923 | 25% | 3442 | 45%
Semantic elements | Types | 997 | -- | 443 | 44% | 260 | 26% | 294 | 29%
Culture-specific elements identified | Tokens | 1594 | 21%* | 588 | 26%** | 168 | 9%** | 838 | 24%**
Culture-specific elements identified | Types | 348 | 35%* | 184 | 42%** | 56 | 22%** | 108 | 37%**
Culture-specific elements analyzed | Tokens | 629 | 39%*** | 190 | 32%**** | 12 | 7%**** | 427 | 50%****
Culture-specific elements analyzed | Types | 16 | 5%*** | 8 | 4%**** | 1 | 1%**** | 7 | 6%****
* % of the total semantic elements; ** % of the total semantic elements of the category; *** % of the total culture-specific elements; **** % of the total culture-specific elements of the category
Table 5. Culture-specific elements identified in LIT_TRAD_ES_IT
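The four kinds of percentage marked with asterisks in tables 4 and 5 differ only in their denominator. The sketch below recomputes the token percentages of table 4 from its raw counts, rounding to whole numbers as the tables do. It is an illustrative check, not part of the original analysis pipeline, and the short category keys are introduced here.

```python
# Token counts copied from table 4 (LIT_TRAD_EN_IT).
TOTAL_SEMANTIC = 8969
semantic = {"food": 2248, "clothes": 2213, "transport": 4507}
identified = {"food": 493, "clothes": 253, "transport": 1199}
analyzed = {"food": 172, "clothes": 71, "transport": 701}


def pct(part: int, whole: int) -> int:
    """Share of `whole` as a whole-number percentage."""
    return round(100 * part / whole)


def starred_percentages() -> dict:
    total_identified = sum(identified.values())  # 1945
    total_analyzed = sum(analyzed.values())      # 944
    return {
        # *    identified elements / all semantic elements
        "*": pct(total_identified, TOTAL_SEMANTIC),
        # **   identified elements / semantic elements of the same category
        "**": {c: pct(identified[c], semantic[c]) for c in semantic},
        # ***  analyzed elements / all identified culture-specific elements
        "***": pct(total_analyzed, total_identified),
        # **** analyzed elements / identified elements of the same category
        "****": {c: pct(analyzed[c], identified[c]) for c in identified},
    }


print(starred_percentages())
# {'*': 22, '**': {'food': 22, 'clothes': 11, 'transport': 27}, '***': 49,
#  '****': {'food': 35, 'clothes': 28, 'transport': 58}}
```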
The second and third phases aimed to establish the translation
technique used in each case, starting from the target and the original
text respectively. These steps were carried out using the AntPConc program (Anthony, 2013), which searches for an item in one of two
aligned corpora and shows the resulting concordances in both of them.
The second phase, in which the culture-specific elements identified were searched for in the target corpus, revealed the corresponding
original form of each item. The screenshot in figure 3 shows the search
for the item “roulotte” as an example of this step.
In the example in figure 3, the culture-specific element “roulotte”
was searched for in the target corpus. By comparing the outcomes
shown in the upper and lower parts of the screen (the results of the search in the target and in the source corpus, respectively), it was possible to
determine the corresponding original terms in the source corpus, in this
case “trailer” and “caravan”. The comparison also revealed whether the
translator had added a culture-specific element absent from
the source text (a case that would imply a high degree of translator’s
creativity).
Fig. 3. Search for the culture-specific element “roulotte” (caravan) in the Italian target corpus TARGET_EN_IT
and its comparison with the source language aligned corpus LIT_TRAD_EN_IT
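The bidirectional search performed with AntPConc can be sketched over a list of aligned segment pairs. This is not AntPConc's implementation. It assumes the corpus is already aligned into (source, target) sentence pairs, and the function name and example sentences are invented. Searching the target side corresponds to the second phase (finding the original forms of an Italian item), and searching the source side corresponds to the third.

```python
import re


def parallel_concordance(aligned: list[tuple[str, str]], query: str,
                         side: str = "target") -> list[tuple[str, str]]:
    """Return the aligned (source, target) pairs in which `query` occurs
    as a whole word, case-insensitively, on the chosen side."""
    if side not in ("source", "target"):
        raise ValueError("side must be 'source' or 'target'")
    idx = 0 if side == "source" else 1
    pattern = re.compile(rf"\b{re.escape(query)}\b", re.IGNORECASE)
    return [pair for pair in aligned if pattern.search(pair[idx])]


# Invented aligned segments (English source, Italian target).
aligned = [
    ("They slept in the trailer.", "Dormivano nella roulotte."),
    ("An old caravan stood by the road.", "Una vecchia roulotte era ferma sulla strada."),
    ("He drove a jeep.", "Guidava una jeep."),
]
for src, tgt in parallel_concordance(aligned, "roulotte"):
    print(f"{tgt}  <=  {src}")
```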
In the third phase, the original forms of each culture-specific element were searched for in the source corpus (following the example
in figure 3, the words “caravan” and “trailer” were searched for in the
source corpus). Through this search, it was possible to establish which
of the translation techniques included in the taxonomy presented in figure 1 had been used. The results obtained from the second
and third phases of the analysis applied to each subcorpus are detailed
in table 6:
Technique | LIT_TRAD_EN_IT occurrences | % | LIT_TRAD_ES_IT occurrences | %
Transposition | 13 | 33% (together with transposition of proper name) | 0 | 31% (together with transposition of proper name)
Transposition of proper name | 389 | | 222 |
Borrowing | 630 | 53% | 306 | 43%
Naturalization | 5 | < 1% | 2 | < 1%
Literal translation | 60 | 5% | 89 | 12%
Neutralization | 0 | 0% | 0 | 0%
Hyperonym | 8 | < 1% | 8 | 1%
Hyponym | 2 | < 1% | 1 | < 1%
Standard accepted translation | 12 | 1% | 0 | 0%
Paraphrase | 0 | 0% | 0 | 0%
Footnote | 0 | 0% | 0 | 0%
Omission | 21 | 2% | 8 | 1%
Cultural or functional equivalent | 5 | < 1% | 6 | < 1%
Addition | 15 | 1% | 27 | 4%
Lack of semantic or formal equivalence | 13 | 1% | 9 | 1%
Autonomous creation | 3 | < 1% | 0 | 0%
Other techniques | 13 | 1% | 5 | < 1%
Table 6. Translation techniques used in LIT_TRAD
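Once every occurrence has been assigned a technique, the counts and percentages of table 6 follow from a simple tally. The sketch assumes a flat list of technique labels, one per analysed occurrence. Following the creative/conservative division described earlier, it also reports the share of creative techniques, treating only literal translation as conservative; that choice is a simplifying assumption. The labels and counts in the example are invented.

```python
from collections import Counter


def technique_distribution(labels: list[str]) -> dict[str, tuple[int, float]]:
    """Map each technique to (count, percentage of all occurrences),
    ordered from most to least frequent."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {t: (n, round(100 * n / total, 1)) for t, n in counts.most_common()}


def creative_share(labels: list[str],
                   conservative: frozenset = frozenset({"literal translation"})) -> float:
    """Percentage of occurrences NOT translated with a conservative technique.
    Treating only literal translation as conservative is an assumption."""
    creative = sum(1 for label in labels if label not in conservative)
    return round(100 * creative / len(labels), 1)


# Invented annotations for ten occurrences.
labels = ["borrowing"] * 6 + ["transposition"] * 3 + ["literal translation"]
print(technique_distribution(labels))
# {'borrowing': (6, 60.0), 'transposition': (3, 30.0), 'literal translation': (1, 10.0)}
print(creative_share(labels))  # 90.0
```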
Considering the wide use of borrowings and transpositions, a further
analysis was carried out to explore the origin of the foreign terms. In
this case, translators’ creativity was assessed according to the originality of such items with respect to the source text. To this end, different
values were attributed to the elements adopted, depending on:
• whether a word had been adopted from the source language although
it was absent from the original text (high level of creativity) (e.g.
making his pitch from his knees > lanciando i suoi slogan in
ginocchio);
• whether a source word had been transferred to the target text through a borrowing or a transposition from a language other than the source one (mid level of creativity) (e.g.
buttering corn bread > imburrava pane di mais);
• whether the foreign word used in the translation was the same
one used in the source text (low level of creativity) (e.g. a cheap
printed sari > un modesto sari di tessuto stampato).
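The three-way scale can be written as a small decision function. The sketch assumes that the language of origin of each foreign word is known (as in the “Original language” column of tables 2 and 3) and that the aligned source passage is available. Whole-word matching in the source passage is a simplification of the manual comparison carried out in the study. The function name is hypothetical; the examples are the three given above.

```python
import re


def creativity_level(word: str, word_lang: str,
                     source_passage: str, source_lang: str) -> str:
    """Classify a foreign word found in a translation by its origin:
    'low'  - the same foreign word already appears in the source passage;
    'high' - it comes from the source language but is absent from the passage;
    'mid'  - it comes from a third language and is absent from the passage."""
    in_source = re.search(rf"\b{re.escape(word)}\b", source_passage, re.IGNORECASE)
    if in_source:
        return "low"
    if word_lang == source_lang:
        return "high"
    return "mid"


# The three examples given in the text (English source, Italian target).
print(creativity_level("slogan", "EN", "making his pitch from his knees", "EN"))  # high
print(creativity_level("mais", "ES", "buttering corn bread", "EN"))               # mid
print(creativity_level("sari", "HI", "a cheap printed sari", "EN"))               # low
```

The presence check comes first so that a word like “sari”, which is foreign to both English and Italian, still counts as low creativity when the translator simply keeps it.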
The detailed results of this comparison are presented in tables 7 and
8 below:
Creativity level | Translator’s behavior | Occurrences | Type of technique
More creativity | Foreign elements added by the translator | 32 | Borrowing: 31; Other techniques: 1
Intermediate | Foreign elements added from languages other than the source one | 156 | Borrowing: 156
Less creativity | Foreign elements directly transposed from the original text | 806 | Transposition: 413; Borrowing: 393
Table 7. Translator’s creativity in the use of borrowings in LIT_TRAD_EN_IT
Creativity level | Translator’s behavior | Occurrences | Type of technique
More creativity | Foreign elements added by the translator | 1 | Borrowing: 1
Intermediate | Foreign elements added from languages other than the source one | 96 | Borrowing: 96
Less creativity | Foreign elements directly transposed from the original text | 420 | Transposition: 221; Borrowing: 199
Table 8. Translator’s creativity in the use of borrowings in LIT_TRAD_ES_IT
As a last step, the results obtained from the two subcorpora analyzed
were compared.
3.4. Discussion
The outcomes of the analysis show that the number of culture-specific elements identified in the two subcorpora (LIT_TRAD_EN_IT and
LIT_TRAD_ES_IT) is similar for both language pairs. However,
the ratio of tokens to types is higher in the subcorpus of novels
translated from English (1945 tokens and 285 types) than in the one
composed of translations from Spanish (1594 tokens and 348 types). This
difference indicates that the translations from Spanish present a greater variety of
culture-specific elements, each with fewer occurrences.
On the other hand, regarding the items analyzed, there is a considerable
quantitative difference between the two subcorpora. After the selection according to the representativeness and culture-specificity criteria
(see section 3.3), 29 culture-specific elements were analyzed in LIT_TRAD_EN_IT
(10% of the total), but only 16 (5%) in LIT_TRAD_ES_IT (see tables 4 and 5). This difference is consistent with the results obtained
for the total culture-specific elements explained above: LIT_TRAD_ES_IT
contains a greater variety of items with lower frequencies, so that
only a few of them met the representativeness criteria (appearing in
at least three novels, with at least 10 occurrences) and were
selected for the analysis. With regard to culture-specificity, the two
subcorpora hardly differ: 4 elements were eliminated from the English-Italian subcorpus
because of their assimilation into the target culture, and 5 from the
Spanish-Italian one. Interestingly, the eliminated elements are the same in the two subcorpora (“film”, “computer”, “jeans” and “internet”
were eliminated in both groups of texts, and “yoghurt” was also eliminated in LIT_TRAD_ES_IT). These results
also suggest that the words most assimilated into the Italian language and
culture are English ones, regardless of the source language of the
texts. Because of the different number of elements, all comparisons
between the outcomes obtained in the two subcorpora are expressed as
percentages.
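The token/type comparison can be made explicit with the figures from tables 4 and 5. Dividing tokens by types gives the mean number of occurrences per distinct culture-specific element, so a lower value means more distinct items, each occurring less often. This is a small arithmetic illustration; the helper name is introduced here.

```python
def tokens_per_type(tokens: int, types: int) -> float:
    """Mean number of occurrences per distinct culture-specific element."""
    return tokens / types


# Identified culture-specific elements (tables 4 and 5).
for name, tokens, types in [("LIT_TRAD_EN_IT", 1945, 285), ("LIT_TRAD_ES_IT", 1594, 348)]:
    print(f"{name}: {tokens_per_type(tokens, types):.2f} tokens per type")
# LIT_TRAD_EN_IT: 6.82 tokens per type
# LIT_TRAD_ES_IT: 4.58 tokens per type
```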
Regarding the translation techniques used, as shown in table 6, the
most commonly used strategies are borrowings (53% of the
cases in the novels translated from English and 43% in those translated
from Spanish) and transpositions (33% and 31% respectively, counting both transposition and transposition of proper names). By contrast,
literal translation was used in only 5% of the occurrences
in LIT_TRAD_EN_IT and 12% in LIT_TRAD_ES_IT. These results
confirm the initial hypothesis and show that, in transposing culture-specific elements, translators lean more towards creativity than towards conventionalism.
Comparing the two language pairs considered, Spanish-into-Italian
translators seem to be more faithful to the original text, thus presenting
a lower level of creativity (taking creativity, as opposed to conventionalism, to be any manipulation of the source text that causes
some sort of nonequivalence: see section 2.3). Indeed, LIT_TRAD_ES_IT
presents a lower percentage of borrowings than LIT_TRAD_EN_IT
(43% as opposed to 53%) and a higher percentage of literal translations (12%
versus 5%). The origin of these borrowings also points to a higher level
of creativity in translations from English than in those from
Spanish. In LIT_TRAD_EN_IT, although most
foreign words are transposed directly from the source text (81%) or
come from a language other than the source one, usually French
(15%), in 3% of the borrowings translators chose to add an
English word that was absent from the original texts, showing a higher
level of initiative and creativity. In LIT_TRAD_ES_IT, by contrast,
translators almost always opted either to use the same terms as the original
text or to substitute them with words from other languages (in 81% and 19% of the uses of foreign words, respectively); in just one case
(0.2%) was a Spanish word absent from the original
text added to the translation (see tables 7 and 8). These results
could be interpreted as being related to the socio-cultural prestige of the
languages analyzed. English is a prestigious language at the centre of
the polysystem (according to the polysystem theory proposed by Even-Zohar, 1990), so it is less translated, and translators tend to maintain
English words in the target texts. Spanish, on the other hand, is a marginal language in the polysystem with a low degree of socio-cultural
prestige; consequently, translators are less interested in maintaining
items from this language in the target texts and frequently exchange
them for terms adopted from other languages which are external to the
linguistic pair but more prestigious.
These outcomes also suggest that the techniques used depend not on the similarity or difference between the source and the target
language, but on the degree of socio-cultural prestige of a language.
Specifically, the greater use of literal translation in Spanish-Italian
translations does not seem to depend on the affinity between Spanish
and Italian: in translating into Italian from Spanish (a language closer
to Italian than English is), translators frequently opt to add
English words that are completely different from both the source and
the target language, instead of maintaining a Spanish term more similar
to the target language.
4. Conclusions
The aim of this study was to assess translators’ creativity in the transposition of culture-specific elements. To reach this objective, a corpus-based analysis was applied to a set of 25 translated novels, focusing
on the techniques chosen by translators to transpose culture-specific
elements from certain semantic fields (food and drink, clothing and
body care, and transportation and communication). The results of the
analysis show that the most commonly used techniques are borrowings
and transpositions. These outcomes corroborate the initial hypothesis,
showing that translators do indeed prefer to adopt creative techniques to transpose culture-specific items, and suggest that translation
helps to enlarge target-language lexis from two perspectives. On the
one hand, translators’ choices tend to enlarge the vocabulary of the target language by importing terms from other languages and helping to
increase their frequency of use. On the other, translation, as linguistic
and cultural transfer, contributes to multiculturalism by enriching the
target culture with words and concepts from the source language as well
as from other languages and cultures.
From a methodological perspective, the choice of a corpus linguistics method made it possible to reach the initial goal, and it proved a useful
approach to identify culture-specific elements in a wide range of texts
and to analyze their translation techniques electronically, thanks to the
use of several tools suited to each phase and goal. These
results suggest two considerations regarding corpus-based
methods: firstly, this methodology can be successfully applied to literary texts, and specifically to literary translation; secondly, the application of this method to lexical and terminological research proves
highly effective.
This article is only a first approach to the study of creativity in the
translation of lexical elements, focusing on culture-specific items. It
could be followed by further research into the role of translation in the
adoption of new lexical units and in the extension of vocabulary. There
is ample scope for continued investigation of translators’ creativity in
relation to culture-related items from both perspectives: translation
process (observing their transposition) and product (analyzing their
form in the target language). In this sense, further analysis could focus on other semantic categories of culture-specific items,
on lexical elements related to the discourse of a specific culture, or on
lexical elements that represent the culture of specific social classes or groups. Moreover, the methodology proposed in this paper could
be replicated to observe the characteristics of other lexical units, not
necessarily linked to culture-specificity, from the same translational perspective.
5. References
Aixelá, Javier Franco. 1996. Culture-specific Items in Translation. In Álvarez,
Román & Vidal, M. Carmen-África (eds.) Translation, Power, Subversion. Clevedon, Philadelphia, Adelaide: Multilingual Matters, 52-78.
Anthony, Laurence. 2014. AntConc (Version 3.4.1) [Computer Software]. Tokyo, Japan: Waseda University. http://www.antlab.sci.waseda.ac.jp/
[Accessed 01/12/2015].
Anthony, Laurence. 2013. AntPConc (Version 1.0.3) [Computer Software].
Tokyo, Japan: Waseda University. http://www.antlab.sci.waseda.ac.jp/
[Accessed 01/12/2015].
Corpas Pastor, Gloria. 2001. La creatividad fraseológica: efectos semántico-pragmáticos y estrategias de traducción. Paremia 10: 67-78.
Corpas Pastor, Gloria; Seghiri Domínguez, Miriam & Romano, Maggi. 2006.
ReCor: método para la determinación de la representatividad de un
corpus, patente n. ES2320511 de la Universidad de Málaga, http://umapatent.uma.es/es/patent/metodo-para-la-determinacion-de-la-representa4b0/ [Accessed 01/06/2016].
Even-Zohar, Itamar. 1999. La posición de la literatura traducida en el polisistema literario. Translated by Montserrat Iglesias Santos and revised by the
author. In Iglesias Santos, Montserrat (ed.) Teoría de los Polisistemas.
Madrid: Arco [Bibliotheca Philologica, Serie Lecturas], 223-231.
Gil-Bardají, Anna. 2003. Procedimientos, técnicas, estrategias: operadores del
proceso traductor. Recercat, Universitat Autònoma de Barcelona. http://hdl.handle.net/2072/8998 [Accessed 16/04/2017].
Kenny, Dorothy. 2001. Lexis and Creativity in Translation: A Corpus-based
Study. Manchester: St. Jerome Publishing.
Leech, Geoffrey. 1992. Corpora and theories of linguistic performance. In
Svartvik, Jan (ed.) Directions in Corpus Linguistics. Proceedings of
Nobel Symposium 82, Stockholm, 4-8 August 1991. Berlin/New York:
De Gruyter.
Lepinette, Brigitte. 2004. La historia de la traducción. Metodología. Apuntes
bibliográficos. HISTAL. http://www.histal.ca/wp-content/uploads/2011/08/La-historia-de-la-traduccion-metodologia-apuntes-bibliograficos.pdf [Accessed 20/04/2017].
Mangiron i Hevia, Carme. 2006. El tractament dels referents culturals a les
traduccions de la novel·la Botxan: la interacció entre els elements textuals i extratextuals (PhD thesis). Barcelona: Universitat Autònoma de
Barcelona, Departamento de Traducción e Interpretación. http://hdl.handle.net/10803/5270 [Accessed 30/10/2014].
Mayoral Asensio, Roberto. 1994. La explicitación de la información en la traducción intercultural. In Hurtado, A. (ed.) Estudis sobre la traducció.
Castellón de la Plana: Publicacions de la Universitat Jaume I.
McEnery, Tony & Wilson, Andrew. 1996. Corpus Linguistics. Edinburgh: Edinburgh University Press.
Molina Martínez, Lucía. 2001. Análisis descriptivo de la traducción de los
culturemas árabe-español. Barcelona: Universitat Autònoma de Barcelona, Departamento de Traducción e Interpretación. http://hdl.handle.net/10803/5263 [Accessed 30/10/2014].
Molina Martínez, Lucía. 2006. El otoño del pingüino. Castellón de la Plana:
Publicacions de la Universitat Jaume I.
Newmark, Paul. 1988. A Textbook of Translation. New York: Prentice Hall.
Nida, Eugene. 1945. Linguistics and Ethnology in Translation Problems.
Word 1.
Nida, Eugene. 1964. Toward a Science of Translating: With Special Reference
to Principles and Procedures Involved in Bible Translating. Leiden:
Brill Archive.
Nord, Christiane. 1997. Translating as a Purposeful Activity. Manchester: St.
Jerome.
Piao, Scott et al. 2016. Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages. Proceedings of the
10th edition of the Language Resources and Evaluation Conference
(LREC2016). Portorož, Slovenia. 2614-2619. USAS Italian Semantic
Tagger. http://ucrel.lancs.ac.uk/usas/gui/ [Accessed 20/04/2017].
Procházková, Petra. 2006. Fundamentos de la lingüística de corpus. Concepción de los corpus y métodos de investigación con corpus. www.prochazkova.de/fundamentos_de_la_ling%C3%BC%C3%adstica_de_corpus.pdf [Accessed 06/08/2014].
Sánchez, Aquilino. 1995. Definición e historia de los corpus. In Sánchez,
A.; Sarmiento, R.; Cantos, P. & Simón, J. (eds.) CUMBRE. Corpus
lingüístico del español contemporáneo: fundamentos, metodología y
análisis. Madrid: SGEL (Sociedad General Española de Librería).
Santamaria Guinot, Laura. 2001. Subtitulació i referents culturals. La traducció com a mitjà d’adquisició de representacions mentals (PhD thesis).
Barcelona: Universitat Autònoma de Barcelona, Departamento de Traducción e Interpretación. http://hdl.handle.net/10803/5249 [Accessed
30/10/2014].
Sinclair, John. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Vázquez-Ayora, Gerardo. 1977. Introducción a la traductología: curso básico
de traducción. Washington D.C.: Georgetown University Press.
Venuti, Lawrence. 1995. The Translator’s Invisibility: A History of Translation. London: Routledge.
Xiao, Richard & Ming, Yue. 2009. Using corpora in translation studies: the
state of the art. In Baker, P. (ed.) 2012. Contemporary Corpus Linguistics. London: A&C Black.
ojs.uv.es/index.php/qilologia/index
Qf
Lingüístics
Corpus-driven insights into the discourse of women survivors
of Intimate Partner Violence
The discourse of women survivors of gender violence:
linguistic forays based on a corpus analysis
Alfonso Sánchez-Moya
Universidad Complutense de Madrid/Vrije Universiteit Amsterdam.
[email protected]
Received: 30/04/2017. Accepted: 11/10/2017
Abstract: Despite its ubiquity, Intimate Partner Violence (IPV) is still under-researched
from a Critical Discourse Studies (CDS) perspective. This paper therefore investigates the
discourse of women survivors of IPV, focusing on a corpus-driven examination of the
data. This is done by applying the text-analysis software tool LIWC (Linguistic Inquiry and Word Count) to a 120,000-word corpus collected from an anonymised, public online forum available to IPV survivors. I contrast a range of linguistic phenomena in three online communities embedded within this forum (“Is it Abuse?”, “Getting
out” and “Life after abuse”) in an attempt to sketch out how the discursive output
varies across these three stages. This paper shows how pronominal distribution plays
a role in the forging of collective identity. Differences in emotional tone across the
three groups explored are also identified. Useful though these corpus-driven pointers
may be, this study also cautions that findings deriving solely
from quantitative analyses need to be treated with care.
Keywords: intimate partner violence (IPV); digital discourse; CDS; corpus; LIWC.
Abstract (Spanish version): Despite its ubiquity, gender violence is a field still little explored from the perspective of Critical Discourse Studies. This article investigates
the discourse of women survivors of gender violence, focusing on an
analysis based on the study of a corpus. This is done by applying LIWC (Linguistic
Inquiry and Word Count) to a 120,000-word corpus from a public,
anonymised online forum available to survivors of gender violence. Several
linguistic phenomena are contrasted in three digital communities of this forum (“Is it abuse?”, “Leaving an abusive relationship” and “Life after abuse”) in an attempt to sketch
how the discursive output varies across these stages. This study shows how
pronominal distribution is relevant to the forging of collective identity. Differences in the emotional tone of these three groups are also identified. Despite its usefulness,
this research warns of the caution needed when dealing with results deriving
solely from quantitative analyses.
Keywords (Spanish version): gender violence; digital discourse; Critical Discourse Studies;
corpus; LIWC.
Sánchez-Moya, Alfonso. 2017. “Corpus-driven insights into the discourse of women
survivors of Intimate Partner Violence”. Quaderns de Filologia: Estudis Lingüístics 22: 215-243. doi: 10.7203/qf.22.11309
Corpus-driven insights into the discourse of women survivors...
217
1. Introduction
Based on the intersection of critical discourse studies (CDS), corpus-driven analysis and the discursive exploration of a sociological phenomenon such as Intimate Partner Violence (IPV henceforth), this article seeks to provide insights into the discourse used by women in a publicly accessible online forum that fosters the exchange of posts about this type of violence. Given the affordances of the site under scrutiny, and by employing corpus-assisted research, this study seeks to gain a better understanding of IPV as a social phenomenon by evaluating the linguistic choices made by users of this forum. To wit, I shall investigate the differences in language use among three of the online communities nested within this site: ‘Is it abuse?’, ‘Getting Out’, and ‘Life after an abusive relationship’. In doing so, I posit a correlation between these three communities and different stages within an abusive relationship, in an attempt to sketch out how the discursive output varies across these three stages. This is operationalised by running a LIWC analysis on a corpus of 120,000 words (40,000 words for each of the above-mentioned communities) and then contrasting the distribution of words across the LIWC linguistic categories (%) that characterise the three online communities.
This paper is organised as follows: Section 2 offers a succinct overview of the core concepts in this paper and how they are understood (namely IPV, discourse, CDS and Corpus Linguistics (CL)). Section 3 sets out the most salient methodological considerations, placing an emphasis on LIWC, the text-analysis software tool used to carry out my analysis. Section 4 presents the findings and discusses their implications. Finally, Section 5 gives concluding remarks, identifies limitations and outlines possible lines for future research.
2. Theoretical preliminaries: the exploration of IPV from CDS
Asserting that violence is widespread across most societies and cultures is certainly unproblematic, especially when violence is regarded as one of the most salient global public health problems nowadays (WHO, 2016). Providing a definition of both the phenomenon and the many related issues around it, however, is not at all cut and dried. This is partly rooted in the difficulty of conceptualising violence per se. In fact, as suggested by sociological research along these lines, determining the boundaries between violence and non-violence is hard, especially in practice (Krug, Dahlberg, Mercy, Zwi and Lozano, 2002; Walby, Towers et al., 2017). The reasons for this are multiple and relate to, inter alia, whether violence is actual, intended or threatened, the diverse interpretations of concepts such as harm, or the repetition of violent events (Walby, Towers et al., 2017). Not surprisingly, this conceptual fuzziness has led to methodological divergences when trying to provide reliable accounts of violent events (Walby, Towers et al., 2017).
Nonetheless, it can arguably be stated that Intimate Partner Violence (IPV) is one of the most salient types of abuse directed against women (Heise, 1998). In contrast with more collective and multi-layered forms of violence against women (VAW), IPV is characterised by its interpersonal nature, in the sense that violence largely takes place between family members and intimate partners in a wide range of settings, mostly in private contexts (Krug et al., 2002: 6). Straightforward though this may seem, even the attempt to provide a single definition of IPV as a phenomenon is far from reaching agreement, which gives an idea of how slippery this endeavour might be. In fact, although I adhere to the understanding of IPV as a gendered phenomenon, many scholarly voices have challenged the assumption that IPV is gender-driven. According to these views, this assumption rests on higher victimisation rates among women (Nicholls and Dutton, 2001), which may be related to conservative ideas around manhood and a consequent under-reporting of abuse by male victims (Dutton and Nicholls, 2005), or to the tendency to believe that violence initiated by women is treated differently because it results in less serious physical harm to male partners than vice versa (Ross and Babcock, 2009). Although I believe that the rather ill-defined boundaries of some violent acts play a significant role in what counts as violence (especially when it comes to psychological abuse, as argued by Winstok and Sowan-Basheer, 2015), there is solid evidence to claim that IPV is strongly influenced by the gender variable (Harris et al., 2012). Global institutions have widely observed that “the overwhelming global burden of IPV is borne by women” (WHO, 2016), so much so that 1 in 3 (35%) women worldwide can be said to have experienced IPV in their lifetime (WHO, 2016). To give a closer picture of this situation in the context in which this research is framed, it is worth noting that 46% of female homicide victims in England and Wales in 2013-2014 were killed by a male partner or ex-partner, compared with 7% of male victims killed by a female partner during the same period (Office for National Statistics, 2015).
Interesting though discussions around these concepts may be, further engagement with them would fall outside the scope of this article.¹ Notwithstanding the controversies around this type of violence, and based on previous studies (Crowell and Burgess, 1996; Heise and García-Moreno, 2002), I understand IPV as multiple, non-mutually exclusive acts of controlling, coercive, threatening, degrading or violent behaviour within an intimate relationship, perpetrated by a partner or ex-partner, that cause physical, psychological or sexual harm to those in the relationship. As may be noted, I refrain from using a gender-based definition of IPV. By no means does this imply that I do not recognise the gender dimension of IPV. Rather, the main motivation is that this approach is better suited to also dealing with this type of violence in same-sex partnerships, where the application of gender standards is not always so straightforward. Nonetheless, this piece of research concentrates on heterosexual relationships in which violence is exerted on women by their male partners.
Awareness-raising around IPV came about partly in the aftermath of the second wave of feminism back in the 1980s. Since then, there have been serious attempts to tackle this issue from a multiplicity of angles. From an institutional standpoint, after the United Nations Declaration on the Elimination of Violence Against Women in 1993, efforts to define gender violence as a particular type of violence crystallised, providing a taxonomy of its different types and a systematic encouragement to eradicate it in all its possible manifestations. Not unexpectedly, academic work has similarly contributed to a more accurate understanding of IPV in a plethora of ways, a small sample of which I briefly mention now. Although research on the sociological (and worldwide) dimensions of IPV is extensive (Dobash and Dobash, 2015), many other studies have also examined the connections between IPV and physical (Campbell, 2002), psychological (Kumar et al., 2013) and reproductive health (Dartnall and Jewkes, 2013). Furthermore, as a positive outcome of the institutional claims, the legal facets of IPV have been widely investigated too (Walker, 2015).

¹ For a brief illustration of the multiple attempts to understand IPV, albeit advocating that no single theory can fully explain the phenomenon of IPV, see Ali and Naylor (2013).
Interestingly, a great proportion of studies on IPV state that their main motivation is to contribute to deeper insights into this social phenomenon, thereby implying that there is still much to be done along these lines. Research from the language sciences has also echoed this pressing need, giving rise to a growing body of work investigating the ways in which linguistic issues and IPV are intertwined. One observable trend deals with discourses of/about IPV, mostly focussing on recontextualised representations of both IPV and the key social actors typically involved in it (namely abused women and their abusive male partners) in media discourse (Santaemilia and Maruenda, 2014) or online environments (Bou-Franch, 2013). Necessary though these studies are, attempts to examine discourses produced by social actors in IPV contexts are somewhat less frequent to date. This may be related to the complexity of gathering data, given the sensitive nature of this issue. Nonetheless, explorations of the macro-level of discourse in IPV contexts have drawn thought-provoking conclusions that can help to gain a richer comprehension of IPV and the social actors therein (Baly, 2010). Boonzaier (2008), for example, identifies the traces of a “femininity discourse” in narratives of abused women, which underpins the loving, caring and nurturing roles of women and partly shapes these women’s self-construction as the ones to blame for the situation. This paucity of research becomes even more remarkable when it comes to studies on the micro-level of discourse. In fact, although studies relying on a more detailed linguistic operationalisation have analysed an array of discursive structures in the representation of IPV episodes (Stokoe, 2010), I would argue that discourse-driven approaches to women suffering from IPV and their self-reported experiences of it are still under-researched.
In fact, it is striking to observe that IPV has not gained sufficient attention from Critical Discourse Studies, a field that has traditionally been characterised, inter alia, by analysing “opaque as well as transparent structural relationships of dominance, discrimination, power and control as manifested in language” (Wodak and Meyer, 2009: 10). This view is partly grounded in the conceptualisation of discourse as socially constitutive as well as socially conditioned (Fairclough and Wodak, 2004), which turns discourse into a “potential and arguably actual agent of social construction” (Sunderland and Litosseliti, 2002: 13) with a crucial role in creating, sustaining and/or transforming the social status quo (Hart and Cap, 2014). These principles underlie the many social issues that have been explored through the CDS lens, dealing with power issues in contexts related to political discourse (Marín-Arrese, 2011), racism (Van Dijk, 2015) and gender and sexualities (Baker, 2008), to name just a few. In fact, this motivation of redressing power inequalities is a priority for CDS analysts. Similarly, CDS is also characterised by presupposing a political stance on the part of researchers who seek to bring about social change (Hart and Cap, 2014). For this to be accomplished, a permanent recursivity is needed between linguistic mechanisms (especially at the micro-level of discourse) and the ways in which these are interwoven in the fabric of macro-(social) structures (KhosraviNik, 2010).
Although the investigation of IPV from CDS now seems justified, the outcome of this study would surely differ depending on the perspective within CDS I were to adopt when examining this social issue. As thoroughly depicted by one of the latest compilations dealing with CDS (Hart and Cap, 2014), different theoretical and methodological approaches to the study of discourse have prompted the development of multiple toolboxes from which to provide discourse-based insights into socially driven concerns. More traditional approaches (Wodak and Meyer, 2009) have been widely criticised on the grounds of researcher bias and data representativeness (Stubbs, 1997; Widdowson, 2004). This has triggered interesting reactions within the field in response to this criticism. Both the socio-cognitive and the corpus linguistic approach can be seen as two consistent and systematic attempts to tackle some of the above-mentioned weaknesses. This article is situated at the intersection of these two approaches, as I try to justify in what follows.
As Teun Van Dijk puts it,
most earlier and contemporary theories in CDS assume a direct link
between discourse and society (or culture), [but] the problem is that
the nature of these causal or similar direct relationships is not made
explicit but taken for granted or reduced to unexplained correlations
(2014: 121).
It is the unexplained nature of these correlations that Van Dijk attempts to address by endorsing the socio-cognitive approach to the understanding of discourse (Van Dijk, 2014). While providing an accurate picture of this approach to discourse would exceed the space constraints of this paper, it is worth mentioning some of its key tenets. In short, it is claimed that the ways in which individual language users frame text and talk are based on socially shared representations of individual social actors as members of various social collectivities, thus implying that personal and social dimensions in discourse processing are inextricably intertwined (Van Dijk, 2014). In other words, “our ongoing experience and understanding of the events and situations of our environment take place in terms of mental models that segment, interpret and define reality as we ‘live it’” (Shipley and Zacks, 2008; Van Dijk, 2014). Mental models are therefore regarded as the “interface between discourse and the social or natural environment” (Van Dijk, 2014: 124) and are given a potentially fundamental role in the production and comprehension of discourse. Accordingly, this approach defends a mutually constitutive relationship between discourse and social cognition, where discourse is instantiated in texts that project and transform socio-cognitive representations (SCRs), both the discourse producers’ and the recipients’ (Koller, 2014: 152).
What is more, socio-cognitive representations (SCRs) are “not individually held mental models, but cognitive structures shared by members of a particular group” (Koller, 2014). Consequently, they are “socially and discursively constructed in the course of … communication […], and are subject to ‘continual transformation […] through the ebb and flow of intergroup relations’” (Augoustinos et al., 2006: 258-259). As will be specified in the next section, this view of discourse gains more prominence if the communicative context this article pays attention to is taken into account. Rather than analysing discourse by isolated language users, I investigate how online users of an IPV forum engage in the construction of their online collective identity and the ways in which this is instantiated in their discursive production. This seems to fit nicely into the motivations of this approach since, as Koller (2014: 153) indeed suggests,
[a] socio-cognitive approach to critical discourse studies is well suited
to analysing collective identities and is especially relevant at the interpretation stage of analysis, which addresses the questions as to why
text producers have selected a range of linguistic devices to construct
groups in a particular way.
As anticipated above, CDS research has been criticised for a lack of rigour in both collecting and analysing data, with studies in the field accused of cherry-picking and questioned on issues of representativeness and randomness in data selection (Widdowson, 1998; 2004). In an attempt to neutralise these arguments, CDS has gradually drifted towards a reliance on the corpus linguistic approach, which is well suited to identifying ideological patterns in texts that would otherwise remain unnoticed (Baker, 2006). Another interesting contribution of the corpus linguistic approach is that it enables researchers to examine the texts under analysis without preconceived notions regarding the content of the selected data (Baker et al., 2008). Despite its multiple strengths, it is also important to bear in mind that over-dependence on the corpus linguistic approach may also have undesirable consequences for a CDS-oriented study. As pointed out by Fairclough, corpus linguistics (CL) can arguably be criticised for a positivist reduction of the ‘actual’ to the ‘empirical’ or ‘the observable’ (2015: 22), exposing CDS research to the risk of losing its character and purpose and of being too constrained by the capacities of CL (2015: 23). This is of particular significance in CDS, since many power imbalances are discursively crafted in ways that are not textually explicit, therefore becoming invisible to CL software (Fairclough, 2015). As far as this article is concerned, I use a text-analysis software tool to provide a solid starting point for my research purposes. On no account should this be regarded as a definitive exploration of my data, which would require a more in-depth qualitative investigation.
Overall, this section has sought to lay the theoretical foundations of this study, which sits at the crossroads of CDS, IPV and CL. As already discussed, given the motivations behind CDS research, the exploration of a social phenomenon such as IPV from a socio-cognitive approach to discourse is deemed feasible. My analysis is assisted by a software tool (LIWC) and therefore falls within CL, although I understand this application as a very initial procedure that needs to be complemented by a closer examination of the data.
3. Methodological issues
3.1. Data and data collection
This article is based on data collected from a publicly accessible online forum, hosted by a British charity with an outstanding determination to provide support and resources of many sorts to women and their children undergoing IPV. Although this type of data can be regarded as sensitive due to its content, the corpus analysed here is believed to respect principles of research ethics and ethical treatment of persons as promulgated by key documents in this area (Markham and Buchanan, 2012). The data under investigation were collected from an online forum whose users are warned of the live, public character of the site. Posts were therefore collected without the need to register on the site. Although my research interests align with the socio-cognitive character of this type of discourse and are less concerned with individual discourse usage per se, users are completely anonymised and posts are moderated online, ensuring that the disclosure of personal details does not become a potential risk to the human being behind the online persona. Nonetheless, discussions around internet-based data are still lively and currently being developed (Nissenbaum, 2010).
The analysis presented here is based on a corpus collected in two different time spans to guarantee a richer discursive outcome (December 2014 – March 2015 and December 2015 – May 2016). Although studying the interaction generated by the exchange of messages would surely yield interesting data, this corpus consists only of posts that are the first in the thread they belong to. The reason for this lies in the primary purpose of my research, which is concerned with how the perpetrator is referred to in these posts for the first time. On the assumption that the activation of the perpetrator in the first post of a thread would likely influence the mechanisms used in subsequent posts, cross-post interaction has not yet been considered. In order to contrast the discursive production within the online forum, the corpus was drawn from three of the many online communities in the same site. Accordingly, 40,000 words were collected from each of ‘Is it abuse?’ (122 unique posts), ‘Getting Out’ (163 posts) and ‘Life after an abusive relationship’ (187 posts), resulting in a total of 120,000 words. These three communities are hereafter referred to as SB1, SB2 and SB3 respectively. Full posts from the three communities are included in Table 1 below to illustrate the type of discourse under investigation.
SB1 ‘Is it abuse?’:
What is abusive? Is it when they constantly need u around his relative is v I’ll and he’s saying he needs me someone close however I need to work night shifts so I’m knackered and I’m stressed myself [sad_emoji] I feel bad I’m not with him after my nights but he can’t sleep and he’s snappy cos he’s upset he tells me I’m selfish sometimes wen I don’t come over I just feel like a realty bad girlfriend I can’t take time off cos I’ve taken time off not so long ago for a death in my own family and I was sick few times plus my work he has been violently abusive towards me before snd actually gave me somewhere to live so it’s not a good look.... Advice and suggestions

SB2 ‘Getting out’:
i’m having to flee again
need to pack up & start again as he crushed my life again
this time trying to do it all with laughter
anyone got any practical tips on the subject ov securing permanent housing as feel 2 mentally unstable to mix with people but need new start & 2 rocky to think practically

SB3 ‘Life after abuse’:
Its been almost (information removed by moderator) months and i can honestly say i’ve broken the seal he used to brainwash me to the point i stopped drinking and going out socially although i have not mastered the going out to town with the gorls bit yet i finally felt confident safe and unguilty to enjoy myself over the new year and with friends i aint seen in ages! Massive sigh of relief! I had my first few drinks in a year! X

Table 1. Illustrative posts collected from the three forum communities
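As a quick arithmetic check on the collection figures reported in Section 3.1, the average length of a thread-initial post in each community can be derived from the word and post counts. The short Python sketch below assumes that each community contributes exactly 40,000 words (the stated collection size); the names `WORDS_PER_COMMUNITY`, `POSTS` and `mean_post_length` are illustrative and not part of any tool used in the study.

```python
# Figures reported in Section 3.1. Each community is assumed to contribute
# exactly 40,000 words; the post counts are those given in the text.
WORDS_PER_COMMUNITY = 40_000
POSTS = {"SB1": 122, "SB2": 163, "SB3": 187}


def mean_post_length(total_words: int, n_posts: int) -> float:
    """Average number of words per thread-initial post in a subcorpus."""
    if n_posts <= 0:
        raise ValueError("n_posts must be positive")
    return total_words / n_posts


# Print the derived average post length for each community.
for community, n_posts in POSTS.items():
    avg = mean_post_length(WORDS_PER_COMMUNITY, n_posts)
    print(f"{community}: {n_posts} posts, ~{avg:.1f} words per post")
```

Under that assumption, posts average roughly 328, 245 and 214 words in SB1, SB2 and SB3 respectively; this is a derived figure rather than one produced by LIWC.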
3.2. Applying LIWC to the analysis of discourse by IPV survivors
Given the pressing need to counteract claims of cherry-picking in CDS (Hart and Cap, 2014), there has been a gradual increase in the use of software tools to scrutinise texts within the field in particular and in applied linguistics in general. Although not as widespread as software used for similar purposes (such as AntConc, WMatrix or Sketch Engine), Linguistic Inquiry and Word Count (LIWC henceforth) is one such tool; it was developed by a team of social psychologists led by James Pennebaker at the University of Texas. In short, LIWC is a programme for quantitative text analysis that relies on word-count strategies to investigate issues concerned with content analysis and style. It is based on the assumption that the lexical choices people make convey psychological information over and above their literal meaning and independent of their semantic context (Pennebaker et al., 2007), information that can in turn be used to make inferences about dimensions of individuals’ personalities (Tausczik and Pennebaker, 2010).
This tool processes text samples by identifying and classifying their words according to the three internal dictionaries of the LIWC2015 version, which consist of almost 6,400 words, word stems and selected emoticons (LIWC, 2017).² LIWC provides the percentage-use indices of 80 standard linguistic categories of different types as they are represented in the texts submitted by the user. Apart from the word count of each file, this data record includes 4 summary language variables (analytical thinking, clout, authenticity and emotional tone), 21 standard categories identifying function words (% of pronouns, articles, auxiliary verbs, etc.) and 41 semantic categories dealing with psychological constructs (such as affect, cognition and biological processes). Additionally, although not so central to the motivation of this study, information is supplied regarding informal language markers (assents, fillers, swear words) and punctuation categories (periods, commas). Broadly speaking, these output measures have been correlated with both personality and real-world outcome measures, and arguably capture people’s social and psychological states as represented in their discursive production.
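The word-count principle described above can be illustrated with a minimal sketch in Python. It is not LIWC itself: the mini-dictionary, its category labels and the tokenisation rule are hypothetical stand-ins (the LIWC2015 dictionary is far larger and distributed with the software), and LIWC’s own matching rules may differ in detail. The sketch only shows the general technique of dictionary-based word counting: each token is checked against every category, a trailing asterisk marks a word stem, and the output is the percentage of tokens falling into each category.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary for illustration only; it is NOT the LIWC2015
# dictionary. A trailing '*' marks a word stem matching any word that starts with it.
MINI_DICTIONARY = {
    "i": ["i", "me", "my", "mine", "myself", "i'm", "i've"],
    "we": ["we", "us", "our", "ours", "ourselves"],
    "posemo": ["happ*", "relie*", "safe", "enjoy*"],
    "negemo": ["sad*", "upset*", "stress*", "hurt*"],
}

# Simplified tokeniser: lower-cased runs of ASCII letters and straight apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def matches(token: str, entry: str) -> bool:
    """Exact match, or prefix match when the entry ends with '*'."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def category_percentages(text: str, dictionary: dict[str, list[str]]) -> dict[str, float]:
    """Percentage of tokens in `text` that fall into each category.

    A token may count towards several categories, but at most once per category.
    """
    tokens = tokenize(text)
    if not tokens:
        return {category: 0.0 for category in dictionary}
    counts = Counter()
    for token in tokens:
        for category, entries in dictionary.items():
            if any(matches(token, entry) for entry in entries):
                counts[category] += 1
    return {category: 100 * counts[category] / len(tokens) for category in dictionary}


sample = "I feel safe and happy, we finally enjoy our life."
print(category_percentages(sample, MINI_DICTIONARY))
# {'i': 10.0, 'we': 20.0, 'posemo': 30.0, 'negemo': 0.0}
```

Because every token is checked against every category, a single word can contribute to several categories at once, which is why category percentages need not sum to 100.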
² This paper is based on the LIWC2015 version. More details on its development and psychometric properties can be found in Pennebaker et al. (2015).
LIWC has been applied in language-driven research addressing more socially oriented issues. Generally speaking, Pennebaker (2011) has suggested that the frequency with which people use certain word categories can be directly linked with issues of power and social class or with people’s degree of social connectedness. More specifically, LIWC has been used in educational settings to predict final course performance from differences in thinking styles, by comparing high-performing students with low-performing ones (Robinson, Navea and Ickes, 2014). Additionally, and perhaps closer to my research interests, LIWC has also been employed to scrutinise political discourse, using the software tool to try to measure aspects of personality dimensions in political speeches (Slatcher et al., 2007; Kangas, 2014).
Nevertheless, to my knowledge, the investigation of online accounts of IPV using LIWC has not been attempted yet. Rather than focussing on how LIWC categories would reflect individuals’ real-world measures, I was interested in observing whether the distribution of word categories would vary when the above-mentioned communities within the same online forum were contrasted. To this end, I submitted each set of 40,000 words to LIWC, obtaining as a result the percentage of words belonging to each of the categories provided by LIWC. Although some limitations to this procedure can be identified (which shall be explored in the final section of this article), by doing so I sought to shed light on the bigger discursive picture of these three online sub-communities. My main aim was therefore to obtain a preliminary approximation to the discursive character of these three groups based on the LIWC categories, observations that would certainly need to be considered from a more contextualised perspective on the analysed discourse via a qualitative exploration of the data. All things considered, this piece of research is guided by the following research questions:
1. How can the application of text-analysis software tools such as LIWC contribute to a better understanding of the online discourse of women undergoing IPV-related experiences?
2. How can LIWC-provided categories shed light on the discursive characterisation of the three communities nested in the IPV online forum under scrutiny?
4. Analysis and discussion
This section presents the percentage-use indices of the LIWC categories deemed most pertinent for the purposes of this study. In fact, the output of these LIWC categories is used to organise this section into several subsections, which present and discuss the implications of those percentages for the social issue under investigation. It is worth pointing out that statistical treatment of these figures is complex, given that this study does not account for individuals’ discursive production but rather understands the language production in each of the three analysed communities as embedded in the socio-cognitive approach to discourse. Nonetheless, note that the number of words in each of them is always the same (40,000), thus enabling comparison between them. For discussion purposes, I normally take the online community of users writing about life after abuse (SB3) as a reference, paying special attention to increasing or decreasing patterns when the other two are considered.
4.1. Language variables: analytical thinking, clout, authenticity and emotional tone
Among the many categories LIWC uses to classify words, six of them fall within the group of “summary language variables” (Pennebaker et al., 2015). It is possible to obtain information about the words per sentence (WPS), the percentage of words with more than six letters (Sixltr) or the percentage of words that appear in the LIWC dictionaries (Dic). Although research has used these categories to trace correlations with the complexity of thinking styles, in this article I focus on the remaining four: analytical thinking (Analytic), clout (Clout), authenticity (Authentic) and emotional tone (Tone). Notably, these four categories are based on findings from previous research carried out by the developers of the tool, to which references will be given accordingly.
It may be useful to briefly explain these four categories. First, the category analytical thinking is thought to capture the extent to which words may indicate formal, logical and hierarchical thinking patterns (LIWC, 2017; Pennebaker et al., 2014). Results from educational contexts have suggested that a low percentage in this category may imply using language in more narrative ways, focussing on the here-and-now and leaving more room for personal experiences (Pennebaker et al., 2014). Second, clout refers to “the relative social status, confidence, or leadership that people display through their writing” (LIWC, 2017; Kacewicz et al., 2013), a high number suggesting a more expert and confident style, whereas a low number would indicate a more tentative or even anxious style (Pennebaker et al., 2014). Third, the algorithm for authenticity derives from a series of studies indicating that people who reveal themselves in authentic or honest ways are prone to be more personal, humble and vulnerable (LIWC, 2017; Newman et al., 2003). Fourth, emotional tone seems to be more straightforward to interpret, since the higher the percentage, the more positive the tone (LIWC, 2017; Cohn et al., 2004).
Having considered these categories and what they stand for, it seems timely to present the outcome measures (%) for the analysed corpus. As illustrated in Table 2 below, two different tendencies can be observed when the forum communities are compared on the summary language variables. On the one hand, especially if the first and third stages are compared, there is an increase in analytical thinking (+2.88 percentage points), emotional tone (+7.84 points) and authenticity (+13.76 points). On the other hand, the clout category behaves differently, with a higher percentage in the first community than in the third (-8.38 points).
Forum communities          Summary language variables (LIWC)
                           Analytical   Clout   Authenticity   Tone
SB1 ‘Is it abuse?’         17.61        48.21   62.72          6.23
SB2 ‘Getting out’          18.45        41.42   70.50          10.59
SB3 ‘Life after abuse’     20.49        39.83   76.48          14.07
Table 2. LIWC summary language variables (in %)
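The contrasts quoted above are plain subtractions between the third and the first stage in Table 2, so they are differences in percentage points rather than relative changes. A minimal Python sketch, using only the values transcribed from the table, reproduces them; the dictionary layout and the `point_change` helper are illustrative assumptions, not part of LIWC.

```python
# Table 2 values transcribed from the article, one row per summary variable.
TABLE_2 = {
    "Analytical": {"SB1": 17.61, "SB2": 18.45, "SB3": 20.49},
    "Clout": {"SB1": 48.21, "SB2": 41.42, "SB3": 39.83},
    "Authenticity": {"SB1": 62.72, "SB2": 70.50, "SB3": 76.48},
    "Tone": {"SB1": 6.23, "SB2": 10.59, "SB3": 14.07},
}


def point_change(table, variable, start="SB1", end="SB3"):
    """Difference end - start, rounded to two decimals (percentage points)."""
    row = table[variable]
    return round(row[end] - row[start], 2)


for variable in TABLE_2:
    print(f"{variable}: {point_change(TABLE_2, variable):+.2f}")
```

Run as is, it prints +2.88 for Analytical, -8.38 for Clout, +13.76 for Authenticity and +7.84 for Tone, the figures cited in the text; passing other `start`/`end` labels compares any two stages.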
If the brief considerations above are taken into account, one of the most notable contrasts across these three communities concerns emotional tone (+7.84%). LIWC output suggests that women writing in ‘Life after abuse’ show a more positive emotional tone than those contributing to ‘Is it abuse?’, an observation which was somewhat expected. Furthermore, it can be argued that discourse in the ‘Life after abuse’ subcorpus follows a more analytical pattern than discourse in ‘Is it abuse?’. The lower percentage in the latter may therefore point to a stronger focus on the here-and-now and on personal experiences, together with a tendency to offer more narrative accounts of these users’ experiences with IPV. A tendency to express oneself in more personal and humble ways is, in turn, suggested by the higher authenticity score found in the third community, which additionally represents the most noticeable contrast among these four LIWC categories (+13.76%). Surprisingly, though, this would also suggest a greater degree of vulnerability among users of this community. Nor had I foreseen a lower percentage for the category clout in the third subcorpus, especially because this could be seen as a characteristic of a more tentative, humble or even anxious style. These results would be at odds with my original expectations and would also contradict results in some other categories discussed later in this paper.
4.2. Pronominal distribution
It goes without saying that a critical approach to the study of pronouns has traditionally been central to CDS research, since pronouns can convey key information concerning issues of power and dominance (Van Dijk, 1993). As a result, given that they are frequently used as remote sensors of group dynamics (Kacewicz et al., 2012), pronouns are at the core of studies seeking to draw conclusions on the discursive construction of collective identities (Koller, 2008), since they can be used to identify focus, priorities and intentions (Tausczik and Pennebaker, 2010). Not unexpectedly, LIWC caters for this need in language-driven inquiry and provides percentages for a wide range of pronominal categories.
A number of interesting findings derive from the application of LIWC to social issues. For instance, there seems to be a correlation between undergoing physical or emotional pain and a higher tendency to use first-person singular pronouns (Rude, Gortner & Pennebaker, 2004). In a similar vein, studies have also shown that couples using the first-person plural pronoun assessed the quality of their marriage more positively than those who did not (Simmons, Gordon and Chambless, 2005). More broadly, research combining LIWC and pronouns has also explored political (Gunsch et al., 2000) and academic discourses (Kowalski, 2000).
Table 3 below shows the percentages offered by LIWC once the three subcorpora under scrutiny were processed. It is worth mentioning, though, that the figures indicating the percentage of he had to be obtained with AntConc (Anthony, 2014), since the version of LIWC used for this analysis does not distinguish between he and she. This can arguably be seen as one of the major shortcomings of the tool. Notwithstanding this limitation, LIWC can provide interesting insights into the way pronouns are used across the three subcorpora. Based on the data, the general use of personal pronouns is less salient in ‘Life after abuse’ (-1.24%), especially if the first and third stages are juxtaposed. A similar pattern is observed in the case of he (-1.62%) and we (-0.14%). However, the use of the first-person pronoun I (+0.39%), the pronoun you (+0.12%) and instances of they (+0.25%) increase in SB3 relative to SB1.
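Because he had to be counted outside LIWC, it may help to spell out what such a measurement involves: the number of tokens matching the pronoun, divided by the total number of tokens in the subcorpus, times 100. The Python sketch below is a hypothetical approximation of that calculation; its tokeniser (`tokenise`) and helper (`percent_of_words`) are assumptions and do not reproduce AntConc’s or LIWC’s own tokenisation rules, so its figures would differ slightly from those in Table 3.

```python
import re
from collections import Counter

# Hypothetical tokeniser: lowercased word forms, keeping a straight internal
# apostrophe ("he's" stays one token). AntConc and LIWC use their own rules.
TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def tokenise(text):
    """Split text into lowercase word tokens."""
    return TOKEN_RE.findall(text.lower())


def percent_of_words(text, targets):
    """Share of tokens (in %) whose exact form is one of `targets`."""
    tokens = tokenise(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    hits = sum(counts[form] for form in targets)
    return 100 * hits / len(tokens)


sample = "He said he would change. I believed him, but he never did."
print(round(percent_of_words(sample, {"he"}), 2))
```

In the sample sentence, 3 of the 12 tokens are he, so the function returns 25.0; forms such as him or his are only counted if they are added to `targets`.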
Forum communities          Pronouns (LIWC)
                           PPRON   I      WE     YOU    HE     THEY
SB1 ‘Is it abuse?’         17.47   9.53   0.86   0.48   3.84   0.45
SB2 ‘Getting out’          16.77   9.91   0.78   0.41   2.82   0.43
SB3 ‘Life after abuse’     16.23   9.92   0.72   0.60   2.22   0.70
Table 3. Pronominal distribution (in %)
The way in which pronouns are used across these forum communities is open to several interpretations, as suggested by the variation in percentages shown in Table 3 above. As far as the first-person pronoun is concerned, users tend to use it more when posting in ‘Life after abuse’. Based on similar studies (Rude, Gortner & Pennebaker, 2004), this would suggest a higher index of psychological and emotional distress in stages where abuse has been somewhat internalised, since many posts in ‘Is it abuse?’ show frequent instances of hesitation as users try to work out whether the rest of the online community would consider their particular situations abusive. Interesting information can also be obtained by observing the distribution of the pronoun you. Although the nature of the pronoun you in English makes it difficult to tell whether reference is being made to singular or plural entities, the increasing tendency of you in ‘Life after abuse’ can be interpreted as a more consistent attempt to address potential readers of the post directly (women in similar situations). In fact, many posts in this final stage adopt a more encouraging tone when providing support.
Besides, the pronoun he, undoubtedly one of the most common mechanisms to refer to the perpetrator in these online communities, appears to become less and less central in these users’ discourse when they post in ‘Life after abuse’. It could be hypothesised that this is because the perpetrator is given less discursive prominence in the final phase, when abuse seems to be a past event (note the use of the preposition after in the very name of the community) and the social actor responsible for it is gradually displaced. Nonetheless, a rather different interpretation is also feasible if attention is paid to the evolution of the pronoun they. Given the prominence that the third-person plural pronoun gains in SB3 compared with SB1, this could also be understood as a discursive collectivisation of the perpetrator. To put it differently, there may be a discursive drift from representing the perpetrator in individual terms (he) to collective ones (they), which may have been partly influenced by the mere use of the forum itself and by the process of generating a stronger bond (favouring references to us, as women users, against them, the perpetrators). However, as usually happens when working with decontextualised instances of data, a more qualitative exploration of the text would be required to pin down the social actors behind these referential devices (since he could refer to a male child and they could also stand for my friends).
4.3. Analysing emotionality: positive and negative emotions
Studies combining linguistic analyses and psychological processes within major social phenomena have shown that LIWC is capable of accurately identifying emotion in language use (Tausczik and Pennebaker, 2004; Kahn et al., 2007). This research is driven by the assumption that the different degrees and mechanisms with which people express their emotions are fundamental to understanding how they experience the world (Tausczik and Pennebaker, 2004). Not surprisingly, LIWC has been applied to the exploration of emotionality in trauma and health discourses in different contexts, such as cancer (Bantum and Owen, 2009) or relationship narratives (Boals and Klein, 2005). Moreover, there have been attempts to examine narratives by IPV survivors (Holmes et al., 2007). Although that analysis was based on 32 volunteers in non-CMC contexts, a LIWC scan found that using more positive and negative emotion words to talk about their experiences of violence prompted increased feelings of physical pain over the writing sessions, leading to the conclusion that the higher the use of emotion words, the greater the perceived immersion in the traumatic event.
LIWC measurements for emotionality in the corpus under inspection are shown in Table 4 below. Broadly speaking, LIWC identifies emotions along two broad spectra: positive and negative emotions. More specifically, it can detect three subtypes of negative emotion (anxiety, anger and sadness). As Table 4 shows, the share of positive emotion words increases in ‘Life after abuse’ compared with ‘Is it abuse?’ (+0.71%). Conversely, the percentage measuring negative emotions decreases in the third community compared with the first (-0.22%), although the decrease is larger if the second community is considered (-0.34%). Curious results emerge when the types of negative emotion are compared. Words measuring ‘anxiety’ increase from SB1 to SB3 (+0.10%), although a higher peak is observed in SB2 (+0.15%). With regard to ‘anger’, however, percentages decline when ‘Life after abuse’ and ‘Is it abuse?’ are compared (-0.24%). Interestingly, the percentages measuring sadness show a more stable distribution across the three communities, with only a slight deviation between the final and the initial stage (-0.01%).
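In LIWC’s category hierarchy, anxiety, anger and sadness are subcategories of negative emotion, so a word counted as anger is also counted as negative emotion. This is consistent with the figures in Table 4, where NEGEMO exceeds the sum of ANX, ANGER and SAD in every subcorpus (3.81 against 2.78 in SB1, for instance). The Python sketch below illustrates this nested counting with tiny, hypothetical word lists; they are not the LIWC2015 dictionary, which is far larger, and `emotion_profile` is an illustrative name.

```python
from collections import Counter

# Tiny hypothetical word lists, NOT the LIWC2015 dictionary.
SUBCATEGORIES = {
    "anx": {"afraid", "nervous", "worried"},
    "anger": {"angry", "furious", "hate"},
    "sad": {"cry", "lonely", "sad"},
}
# Negative words assigned to no subcategory (also an assumption).
NEGEMO_ONLY = {"awful", "hurt"}
POSEMO = {"happy", "hope", "safe"}


def emotion_profile(tokens):
    """Percent of tokens per category; subcategory hits also count as negemo."""
    counts = Counter()
    for tok in tokens:
        if tok in POSEMO:
            counts["posemo"] += 1
        in_subcategory = False
        for name, words in SUBCATEGORIES.items():
            if tok in words:
                counts[name] += 1
                in_subcategory = True
        if in_subcategory or tok in NEGEMO_ONLY:
            counts["negemo"] += 1  # each token adds at most 1 to the parent
    total = len(tokens)
    categories = ["posemo", "negemo", *SUBCATEGORIES]
    return {c: (100 * counts[c] / total if total else 0.0) for c in categories}


tokens = "i was afraid and angry but now i feel safe".split()
print(emotion_profile(tokens))
```

Here 2 of the 10 tokens are negative (afraid, angry), each also counted in its own subcategory, so negemo is 20.0 while anx and anger are 10.0 each and posemo is 10.0.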
These results yield thought-provoking interpretations. On the one hand, judging especially from the deviation in percentages found between SB1 and SB3, the emotional tone of these communities seems to be distinctive. Thus, whereas lexical choices categorised as positive are more salient in ‘Life after abuse’, a more negative nuance is perceived in ‘Is it abuse?’. This is a somewhat expected finding, since a more optimistic sort of narrative was more likely to permeate the overall discursive scheme within ‘Life after abuse’. This gains more prominence if posts within this community are analysed in qualitative terms, as they are generally characterised by very supportive messages that seek to encourage other users at this stage. By contrast, a more negative emotional tone prevails in the first community, which again is understandable bearing in mind that many of these users take advantage of this community to share their experiences so that other users can give their views on the abusive character of these situations.
Forum communities          Emotionality (LIWC)
                           POSEMO   NEGEMO   ANX    ANGER   SAD
SB1 ‘Is it abuse?’         2.03     3.81     0.67   1.32    0.79
SB2 ‘Getting out’          2.27     3.47     0.82   1.11    0.72
SB3 ‘Life after abuse’     2.74     3.59     0.77   1.08    0.78
Table 4. Analysing emotionality with LIWC (in %)
On the other hand, the evolution of more nuanced negative emotions is worth alluding to. Unlike the more even distribution across the three communities of lexical items belonging to the LIWC category ‘sad’, a somewhat divergent tendency is perceived for ‘anxiety’ and ‘anger’. In fact, based on the results illustrated in Table 4 above, lexical choices suggesting a higher degree of anxiety reach their peak in ‘Getting out’. This may imply that women undergoing IPV feel more anxious when, having acknowledged that they are being abused, they are in the process of leaving the abusive relationship. However, traces of ‘anger’ in the corpus under scrutiny seem to be most present at the initial stage (SB1), decreasing gradually towards the final phase (SB3).
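The observations about peaks and trends above amount to simple comparisons across the three stages of Table 4. The following Python sketch, over values transcribed from the table, makes them explicit: the stage with the highest value per category, whether a category declines at every step, and its spread (maximum minus minimum). The helper names are illustrative assumptions.

```python
# Table 4 values transcribed from the article (% of words), in stage order.
STAGES = ("SB1", "SB2", "SB3")
TABLE_4 = {
    "posemo": (2.03, 2.27, 2.74),
    "negemo": (3.81, 3.47, 3.59),
    "anx": (0.67, 0.82, 0.77),
    "anger": (1.32, 1.11, 1.08),
    "sad": (0.79, 0.72, 0.78),
}


def peak_stage(values):
    """Stage label with the highest value (first one wins on ties)."""
    best = max(range(len(values)), key=lambda i: values[i])
    return STAGES[best]


def declines_at_each_step(values):
    """True if every stage is strictly lower than the previous one."""
    return all(a > b for a, b in zip(values, values[1:]))


def spread(values):
    """Max minus min, rounded to two decimals (percentage points)."""
    return round(max(values) - min(values), 2)


for category, values in TABLE_4.items():
    print(category, peak_stage(values), declines_at_each_step(values), spread(values))
```

Among the negative subcategories, anxiety peaks in SB2, anger peaks in SB1 and is the only one that declines at each step, and sadness has the smallest spread (0.07), which matches the reading given above.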
4.4. Acting in particular ways: the drives behind these forum users
Although slightly less covered by previous studies using LIWC, another interesting set of categories is the one grouped under the umbrella term ‘drives’ (Pennebaker et al., 2015). Broadly speaking, these categories attempt to offer insights into the feelings that make language users act in particular ways. Five subcategories are considered for these purposes, relying upon lexical items such as the following: affiliation (ally, friend, social), achievement (win, success, better), power (superior), reward (take, prize, benefit) and risk (danger, doubt) (Pennebaker et al., 2015).
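LIWC-style dictionaries match entries either as whole words or, when an entry ends in an asterisk, as word stems. The Python sketch below applies that matching rule to the example words listed above for the five drives. The asterisked stems are hypothetical additions for illustration, the real LIWC2015 lists are far longer, and `matches` and `drives_profile` are illustrative names, so the percentages it returns say nothing about the forum data.

```python
# Example words quoted in the article; the "*" stems are hypothetical additions.
DRIVES = {
    "affiliation": ["ally", "friend*", "social"],
    "achieve": ["win", "success*", "better"],
    "power": ["superior"],
    "reward": ["take", "prize", "benefit*"],
    "risk": ["danger*", "doubt*"],
}


def matches(token, entry):
    """Whole-word match, or prefix match when the entry ends in '*'."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def drives_profile(tokens):
    """Percent of tokens matching at least one entry in each category."""
    total = len(tokens)
    profile = {}
    for category, entries in DRIVES.items():
        hits = sum(1 for tok in tokens if any(matches(tok, e) for e in entries))
        profile[category] = 100 * hits / total if total else 0.0
    return profile


tokens = "my friends saw the danger and the benefits of leaving".split()
print(drives_profile(tokens))
```

In this ten-token example, friends, danger and benefits each match one stem, giving 10.0 for affiliation, risk and reward and 0.0 for achieve and power; an exact entry such as ally would not match allyship.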
The LIWC analysis of these forum users’ drives is summarised in Table 5 below. As the table shows, lexical items measuring the degree of affiliation decrease in ‘Life after abuse’ compared with ‘Is it abuse?’ (-0.08%), although not remarkably. The difference is equally slight for ‘achieve’, with a marginally higher percentage in SB3 than in SB1 (+0.07%). Clearer divergences are found, however, in the remaining three categories. When it comes to quantifying levels of ‘power’ as encapsulated by the lexical choices across the three communities, a larger difference is found in SB3 compared with the two previous stages (+0.30% compared with SB1, +0.44% compared with SB2). A higher percentage is also observed in ‘Life after abuse’ as far as ‘reward’ is concerned (+0.26%). Contrary to this tendency, lexical choices measuring ‘risk’ seem to be less noticeable in SB3 than in SB1 (-0.24%).
Forum communities          Drives (LIWC)
                           AFFILIATION   ACHIEVE   POWER   REWARD   RISK
SB1 ‘Is it abuse?’         2.54          1.06      2.30    1.18     0.95
SB2 ‘Getting out’          2.32          1.13      2.16    1.34     0.79
SB3 ‘Life after abuse’     2.46          1.13      2.60    1.44     0.71
Table 5. Forum users’ drives according to LIWC (in %)
As suggested by the percentages above, some of these areas show a degree of divergence that may point to notable alterations in the discursive characterisation of the online communities explored in this study. One of the categories that particularly catches my attention is the one linked to power. As pointed out in the first subsection of this section, the lower score in the category ‘clout’ in the third subcorpus can be interpreted as a characteristic of a more tentative, humble or even anxious style in SB3. This interpretation seems to be at odds with the evolution of lexical choices fitting the ‘power’ category. In fact, such a firm increase (+0.30% from SB1 to SB3, +0.44% from SB2 to SB3) would respond to my original expectations, which presumed traces of empowered discourse in ‘Life after abuse’. In a similar vein, this trend would also be reinforced by the evolution of risk. Lexical choices connected to risk are less prominent in SB3 than in SB1, which would again match my original expectation of equating ‘Is it abuse?’ with a stage characterised by a higher presence of risks and challenges for women undergoing IPV.
5. Concluding remarks
This article has sought to demonstrate how a software tool for quantitative text analysis (LIWC) can effectively be employed to provide corpus-driven insights into the micro-level of discourse produced by an online community of women who have at some point experienced IPV in their lives. This study has made use of some of the most relevant linguistic categories measured by LIWC to investigate the discursive frames that characterise three communities within an online forum that offers its users the chance to engage in narratives that seek to provide assistance and help to other users participating in this online environment. It is worth recalling that a key research question in this article was to explore the ways in which text-analysis software such as LIWC can contribute to a better understanding of the online discourse of women undergoing IPV-related experiences. For this purpose, and by scrutinising the output measures provided by LIWC (in percentages), the results have shown how collective identity is forged within these three online communities and the ways in which this permeates the discourse they use. These findings are of particular relevance if framed within the socio-cognitive approach to discourse, the tenets of which are also highlighted.
Exploring the possible ways in which LIWC-provided categories can shed light on the discursive characterisation of the three online communities investigated in this paper was another central research question. With this in mind, the analysis provided by LIWC has yielded interesting observations. As shown above, users in ‘Is it abuse?’ are remarkably characterised by a negative emotional tone, which becomes more positive in ‘Life after abuse’. Additionally, users within ‘Life after abuse’ seem to express themselves in more personal and humble ways, as indicated by a higher percentage in the category measuring authenticity. The fact that LIWC offers a detailed account of pronominal distribution in a given corpus paves the way for reaching fascinating conclusions based on the usage of pronouns. Although more qualitative explorations would be crucial to reinforce the validity of these arguments, the decreasing use of the third-person singular pronoun (he) in ‘Life after abuse’ may indicate that discourses around the perpetrator weaken in the third community. Nevertheless, based again on the percentage that LIWC offers for the third-person plural pronoun (they), another feasible interpretation would view this changing pattern as a process of collectivisation of the perpetrator. Thus, influenced by exposure to socio-cognitive representations of the perpetrator in the forum, users can be said to move from an individualised referential strategy (he) to a collective one (they). Furthermore, LIWC can also assist in measuring lexical emotionality. As argued above, negative emotions seem to evolve differently across the three subcorpora. Whereas lexical indicators of sadness prove to be more uniform across the three communities, indicators of anxiety seem to be more pervasive at the intermediate stage (‘Getting out’) than at the first, while those measuring anger are more likely to occur at the outset. Relatedly, LIWC can also be used to suggest a gradual discursive empowerment in users writing in ‘Life after abuse’, which may to some extent mirror a behavioural change at this final stage.
Useful though these pointers may be for building bridges between the micro and the macro levels of discourse, results deriving solely from quantitative explorations need to be treated with due caution. As stated by the main developers of LIWC themselves, “the study of word use as a reflection of psychological state is in its early stages” (Tausczik and Pennebaker, 2010:30). This is one of the reasons why future research in this field could incorporate similar text-analysis tools such as Lingmotif (Moreno-Ortiz, 2016), since investigating these tools and their different affordances may yield complementary results. In any case, although the incorporation of corpus-driven approaches to discourse analysis has proved efficient for building language analyses upon more empirically based findings, the limitations of corpus linguistics need to be considered and addressed. As already mentioned, making strong claims on the basis of pronoun usage may lead to misleading interpretations of any discursive event. Software tools are still not well equipped to deal with context, figurative language, or ironic and sarcastic references. Consequently, studies aiming to provide a holistic view of a discursive phenomenon should always leave room for qualitative examination, which can usually account for many of the drawbacks mentioned above.
Acknowledgements
This research is funded by the Spanish Ministry of Education (FPU1304471). I would also like to thank both reviewers for their insightful comments and observations, which I have incorporated into the final version of this paper. Likewise, my wholehearted gratitude goes to the editors of this volume for their editorial initiative and their admirable hard work in compiling all the contributions into a comprehensive and harmonious whole.
References
Ali, Parveen Azam & Naylor, Paul. 2013. Intimate partner violence: A narrative review of the feminist, social and ecological explanations for
its causation. Aggression and Violent Behavior 18(6): 611-619. doi:
https://doi.org/10.1016/j.avb.2013.01.003
Anthony, Lawrence. 2011. AntConc (Version 3.2.2) [Computer Software]. Tokyo: Waseda University.
Augoustinos, Martha; Walker, Iain & Donaghue, Ngaire. 2006. Social Cognition: An Integrated Introduction (2nd ed.). London: Sage.
Baker, Paul. 2006. Using Corpora in Discourse Analysis. London: Continuum.
Baker, Paul. 2008. Sexed texts: Language, Gender and Sexuality. London:
Equinox.
Baker, Paul; Gabrielatos, Costas; Khosravinik, Majid; Krzyżanowski, Michał;
McEnery, Tony & Wodak, Ruth. 2008. A useful methodological synergy? Combining critical discourse analysis and corpus linguistics to
examine discourses of refugees and asylum seekers in the UK press.
Discourse & Society 19(3): 273-306. doi:10.1177/0957926508088962
Bantum, Erin O’Carroll & Owen, Jason. 2009. Evaluating the validity of computerized content analysis programs for identification of emotional expression in cancer narratives. Psychological Assessment 21(1): 79. doi: 10.1037/a0014643
Boals, Adriel, & Klein, Kitty. 2005. Word use in emotional narratives
about failed romantic relationships and subsequent mental health.
Journal of Language and Social Psychology 24(3): 252-268. doi:
10.1177/0261927X05278386
Boonzaier, Floretta. 2008. “If the man says you must sit, then you must sit.”
The relational construction of woman abuse: Gender, subjectivity and
violence. Feminism & Psychology 18(2): 183-206. doi: https://doi.
org/10.1177/0959353507088266
Baly, Andrew. 2010. Leaving abusive relationships: Constructions of self and
situation by abused women. Journal of Interpersonal Violence 25(12):
2297-2315. doi: https://doi.org/10.1177/0886260509354885
Bou-Franch, Patricia. 2013. Domestic Violence and Public Participation in the
Media: The Case of Citizen Journalism. Gender and Language 7(3):
275-302. doi: 10.1558/genl.v7i3.275
Burgess, Anne & Crowell, Nancy. 1996. Understanding violence against women. Washington: National Academy Press.
Campbell, Jacquelyn. 2002. Health consequences of intimate partner violence.
The Lancet 359(9314): 1331-1336. doi: http://dx.doi.org/10.1016/
S0140-6736(02)08336-8
Cohn, Michael; Mehl, Matthias & Pennebaker, James. 2004. Linguistic Markers of Psychological Change Surrounding September 11, 2001. Psychological Science 15: 687-693. doi: 10.1111/j.0956-7976.2004.00741.x
Dartnall, Elizabeth & Jewkes, Rachel. 2013. Sexual violence against women:
the scope of the problem. Best Practice & Research Clinical Obstetrics
& Gynaecology 27(1): 3-13. doi: 10.1016/j.bpobgyn.2012.08.002
Dobash, Rebecca & Dobash, Russell. 2015. Domestic Violence: Sociological
Perspectives. International Encyclopedia of the Social & Behavioral
Sciences (2nd ed.). Elsevier, 632-635. doi: 10.1016/B0-08-0430767/03935-8
Dutton, Donald & Nicholls, Tonia. 2005. The gender paradigm in domestic violence research and theory: Part 1, The conflict of theory and data. Aggression and Violent Behavior 10(6): 680-714. doi: 10.1016/j.avb.2005.02.001
Fairclough, Norman. 2015. Language and Power (3rd ed.). London: Routledge.
Gunsch, Mark; Brownlow, Sarah & Mabe, Zachary. 2000. Differential linguistic content of various forms of political advertising. Journal of Broadcasting & Electronic Media 44(1): 27-42. doi: http://dx.doi.org/10.1207/s15506878jobem4401_3
Harris, Kate; Palazzolo, Kellie, & Savage, Matthew. 2012. “I’m not sexist,
but...”: How ideological dilemmas reinforce sexism in talk about intimate partner violence. Discourse & Society 23(6): 643-656. doi:
10.1177/0957926512455382
Hart, Christopher & Cap, Piotr (ed.). 2014. Contemporary Critical Discourse
Studies. London: Bloomsbury Publishing.
Heise, Lori. 1998. Violence against women: An integrated, ecological framework. Violence Against Women 4(3): 262-290. doi: 10.1177/1077801298004003002
Heise, Lori & García-Moreno, Claudia. 2002. Violence by intimate partners. In
Krug, E.; Dahlberg, L.L.; Mercy, J. A.; Zwi, A. B. & Lozano, R. (ed.)
World Report on Violence and Health. Geneva: World Health Organization, 88-121.
Holmes, Danielle; Alper, Georg; Ismailji, Tasneem; Classen, Catherine; Wales,
Talor; Cheasty, Valerie; Miller, Andrew & Koopman, Cheryl. 2007.
Cognitive and emotional processing in narratives of women abused
by intimate partners. Violence Against Women 13(11): 1192-1205. doi:
10.1177/1077801207307801
Kacewicz, Ewa; Pennebaker, James; Davis, Matthew; Jeon, Moongee & Graesser, Arthur. 2013. Pronoun use reflects standings in social hierarchies. Journal of Language and Social Psychology 33(2): 125-143. doi: 10.1177/0261927X13502654
Kahn, Jeffrey; M. Tobin, Renee; Massey, Audra & Anderson, Jennifer. 2007.
Measuring emotional expression with the Linguistic Inquiry and
Word Count. The American Journal of Psychology: 263-286. doi:
10.2307/20445398
Kangas, Sara. 2014. What can software tell us about political candidates?: A critical analysis of a computerized method for political discourse. Journal of Language and Politics 13(1): 77-97. doi: 10.1075/jlp.13.1.04kan
KhosraviNik, Majid. 2010. Actor descriptions, action attributions, and argumentation: towards a systematization of CDA analytical categories in the representation of social groups. Critical Discourse Studies 7(1): 55-72. doi: 10.1080/17405900903453948
Koller, Veronika. 2008. Lesbian Discourses: Images of a Community. London:
Routledge.
Koller, Veronika. 2014. Applying Social Cognition Research to Critical Discourse Studies: The Case of Collective Identities. In Hart, Christopher
& Cap, Piotr (ed.) Contemporary Critical Discourse Studies. London:
Bloomsbury, 147-166.
Kowalski, Robin. 2000. “I was only kidding!”: Victims’ and perpetrators’ perceptions of teasing. Personality and Social Psychology Bulletin 26(2):
231-241. doi: 10.1177/0146167200264009
Krug, Etienne; Dahlberg, L.L; Mercy, James; Zwi, Anthony & Lozano, Rafael
(ed.). 2002. World Report on Violence and Health. Geneva, Switzerland: World Health Organization.
Kumar, Anant; Nizamie, S. Haque & Srivastava, Naveen. 2013. Violence
against women and mental health. Mental Health & Prevention 1(1):
4-10. doi: https://doi.org/10.1016/j.mhp.2013.06.002
LIWC. 2017. Where do the numbers come from? How are they calculated? https://liwc.wpengine.com/interpreting-liwc-output/ [Accessed
22/03/2017].
Markham, Annette & Buchanan, Elizabeth. 2012. Ethical decision-making and
internet research: Recommendations from the AOIR ethics working
committee (version 2.0). http://www.dphu.org/uploads/attachements/
books/books_5612_0.pdf [Accessed 21/03/2017].
Marín Arrese, Juana Isabel. 2011. Effective vs. epistemic stance and subjectivity in political discourse: Legitimising strategies and mystification of responsibility. Critical Discourse Studies in Context and Cognition. Amsterdam: John Benjamins, 193-224.
Moreno-Ortiz, A. 2016. Lingmotif 1.0 [Computer Software]. Málaga, Spain: Universidad de Málaga. http://tecnolengua.uma.es/lingmotif.
Newman, Matthew; Pennebaker, James; Berry, Diane & Richards, Jane.
2003. Lying words: Predicting deception from linguistic styles.
Personality and Social Psychology Bulletin 29: 665-675. doi:
10.1177/0146167203029005010
Nicholls, Tonia & Dutton, Donald. 2001. Abuse committed by women against
male intimates. Journal of Couples Therapy 10(1): 41-57. doi: 10.1300/
J036v10n01_04
Nissenbaum, Helen. 2010. Privacy in context: Technology, policy, and the integrity of social life. Stanford: Stanford University Press.
Office for National Statistics. 2015. Intimate partner violence and partner abuse. https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/compendium/focusonviolentcrimeandsexualoffences/yearendingmarch2015/chapter4intimatepersonalviolenceandpartnerabuse [Accessed 03/03/2017].
Pennebaker, James; Booth, Roger & Francis, Martha. 2007. Linguistic Inquiry
and Word Count: LIWC [Computer software]. Austin: LIWC.net
Pennebaker, James. 2011. The Secret Life of Pronouns: What Our Words Say
About Us. New York: Bloomsbury.
Pennebaker, James; Chung, Cindy; Frazee, Joey; Lavergne, Gary & Beaver, David. 2014. When small words foretell academic success: The case of college admissions essays. PLoS ONE 9(12): e115844. doi: https://doi.org/10.1371/journal.pone.0115844
Pennebaker, James; Boyd, Ryan; Jordan, Kayla & Blackburn, Kate. 2015. The
development and psychometric properties of LIWC2015. Texas: The
University of Texas. doi: 10.15781/T29G6Z
Robinson, Rebecca; Navea, Reanelle & Ickes, William. 2013. Predicting final course performance from students’ written self-introductions: A LIWC analysis. Journal of Language and Social Psychology 32(4): 469-479. doi: 10.1177/0261927X13476869
Rude, Stephanie; Gortner, Eva-Maria & Pennebaker, James. 2004. Language
use of depressed and depression-vulnerable college students. Cognition
& Emotion 18(8): 1121-1133. doi: 10.1080/02699930441000030
Santaemilia, José & Maruenda, Sergio. 2014. The linguistic representation of gender violence in (written) media discourse. Journal of Language Aggression and Conflict 2(2): 249-273.
Shipley, Thomas & Zacks, Jeffrey (ed.). 2008. Understanding Events. From
Perception to Action. Oxford: Oxford University Press.
Simmons, Rachel; Gordon, Peter & Chambless, Dianne. 2005. Pronouns in Marital Interaction: What Do “You” and “I” Say About Marital Health? Psychological Science 16(12): 932-936. doi: 10.1111/j.1467-9280.2005.01639
Slatcher, Richard; Chung, Cindy; Pennebaker, James & Stone, Lori. 2007.
Winning words: Individual differences in linguistic style among US
presidential and vice presidential candidates. Journal of Research in
Personality 41(1): 63-75. doi: 10.1016/j.jrp.2006.01.006
Stokoe, Elizabeth. 2010. “I’m not gonna hit a lady”: Conversation analysis, membership categorization and men’s denials of violence towards women. Discourse & Society 21(1): 59-82. doi: http://dx.doi.
org/10.1177/0957926509345072
Stubbs, Michael. 1997. Whorf’s children: critical comments on critical discourse analysis. In Ryan, Ann & Wray, Alison (ed.) Evolving Models of Language. Clevedon: Multilingual Matters, 100-116.
Sunderland, Jane & Litosseliti, Lia. 2002. Gender identity and discourse analysis: Theoretical and empirical considerations. Gender Identity and Discourse Analysis. Amsterdam: John Benjamins, 1-39.
Tausczik, Yla & Pennebaker, James. 2010. The psychological meaning
of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology 29: 24-54. doi: https://doi.
org/10.1177/0261927X09351676
Van Dijk, Teun. 1993. Elite Discourse and Racism. London: Sage.
Van Dijk, Teun. 2014. Discourse-Cognition-Society. Current state and prospects of the Socio-Cognitive Approach to Discourse. In Hart, Christopher & Cap, Piotr (ed.) Contemporary Critical Discourse Studies.
London: Bloomsbury, 121-146.
Van Dijk, Teun. 2015. Racism and the Press. London: Routledge.
Walby, Sylvia; Towers, Jude; Balderston, Susan; Corradi, Consuelo; Francis,
Brian; Heiskanen, Markku; Helweg-Larsen, Karin et alii (ed.) 2017.
The Concept and Measurement of Violence against Women and Men.
Bristol: Policy Press.
Walker, Lenore. 2015. Looking back and looking forward: Psychological and
legal interventions for domestic violence. Ethics, Medicine and Public
Health 1(1): 19-32. doi: 10.1016/j.jemep.2015.02.002
Widdowson, Henry. 2004. Text, Context, Pretext: Critical Issues in Discourse
Analysis. Oxford: Blackwell.
Winstok, Zeev & Sowan-Basheer, Wafa. 2015. Does psychological violence contribute to partner violence research? A historical, conceptual and critical review. Aggression and Violent Behavior 21: 5-16. doi:
10.1016/j.avb.2015.01.003
Wodak, Ruth & Fairclough, Norman. 2004. Critical discourse analysis. Qualitative Research Practice: Concise Paperback Edition. 185-202.
Wodak, Ruth & Meyer, Michael. 2009. Methods for Critical Discourse Analysis. London: Sage.
World Health Organization. 2016. Violence Against Women. Geneva: World
Health Organization. http://www.who.int/mediacentre/factsheets/
fs239/en/ [Accessed 07/03/2017].
ojs.uv.es/index.php/qfilologia/index
Qf
Lingüístics
Immigration metaphors in a corpus of legal English:
an exploratory study of EAL learners’ metaphorical production
and awareness
Metáforas sobre inmigración en un corpus de inglés jurídico:
un estudio preliminar de la producción y conciencia metafórica
de estudiantes de inglés como lengua adicional (EAL)
Emilia Castaño Castañoᵃ, Natalia Judith Laso Martínᵇ
& Isabel Verdaguer Claveraᶜ
ᵃ University of Barcelona. e.castano@ub
ᵇ University of Barcelona. [email protected]
ᶜ University of Barcelona. [email protected]
Received: 20/04/2017. Accepted: 31/10/2017
Abstract: Metaphor is central to human understanding and communication. It pervades our everyday language and also abounds in specialized discourse, legal language being no exception. This is particularly relevant since metaphors are powerful framing tools that can affect our worldview. With the aim of exploring the use that EAL law undergraduate students make of metaphorical expressions, as well as their awareness of the connotations of these expressions, a learner corpus was compiled and qualitatively analyzed. Results show that learners, like native speakers, rely on conceptual metaphors such as MIGRATION IS A NATURAL FORCE, STATES ARE CONTAINERS or IMMIGRANTS ARE A THREAT to describe immigration issues. This exploratory study has also revealed that learners are not always conscious of the negative slant that metaphors may convey and that raising their awareness is key to enhancing critical thinking.
Keywords: corpus linguistics; conceptual metaphor; metaphorical awareness; legal
discourse; EAL learners.
Resumen: La metáfora es un elemento central de la comunicación y la comprensión humana. Abunda en el lenguaje cotidiano y también en el de especialización, no siendo una excepción el discurso legal. Este hecho es relevante ya que las metáforas nos permiten enmarcar la realidad desde diversas perspectivas que condicionan nuestra percepción del mundo. Con el objetivo de explorar el uso que los estudiantes de Derecho con inglés como lengua adicional (EAL) hacen de las metáforas y de determinar si son conscientes de sus connotaciones, se compiló y analizó cualitativamente un corpus de aprendices. Los resultados han demostrado que los aprendices, al igual que los hablantes nativos, utilizan metáforas conceptuales tales como LA INMIGRACIÓN ES UNA FUERZA NATURAL, LOS ESTADOS SON CONTENEDORES o LOS INMIGRANTES SON UNA AMENAZA para describir el fenómeno de la inmigración. Este estudio exploratorio también subrayó la importancia de que los aprendices sean conscientes de la carga negativa de algunas metáforas para promover el pensamiento crítico.
Palabras clave: metáfora conceptual; conciencia metafórica; discurso legal; aprendices de EAL; lingüística de corpus.
Castaño Castaño, Emilia; Laso Martín, Natalia Judith & Verdaguer Clavera, Isabel. 2017. “Immigration metaphors in a corpus of legal English: an exploratory study of EAL learners’ metaphorical production and awareness”. Quaderns de Filologia: Estudis Lingüístics 22: 245-272. doi: 10.7203/qf.22.11310
Immigration metaphors in a corpus of legal English...
247
1. Corpus linguistics and metaphor
Word meanings are multifaceted and vary depending on the perspective from which they are viewed. Words in isolation are ambiguous, but their ambiguity is lost or reduced when they are put in context: they have a meaning potential that is activated in a given context (Hanks, 2007). In addition to their literal meanings, many words have metaphorical meanings, which often reflect the cognitive operations whereby we understand complex concepts (Lakoff & Johnson, 1980, 1999). Thus, rise in (1) has a concrete meaning referring to motion, whereas in (2) it is used metaphorically and refers to quantity.
(1) By the time the plane rose in to the air it was dark (British National Corpus)
(2) Aluminium recycling in the UK rose to 9.5 last year (British National Corpus)
Corpus linguistics, which has made it possible to analyze real language in context, first approached the analysis of the syntagmatic patterns of language. More recently, however, it has also been applied to the analysis of figurative language, offering a way to carry out quantitative and qualitative studies of metaphorical expressions as a phenomenon of language in use. The availability of electronic corpora has enabled the systematic search for metaphorical expressions in authentic texts and has provided empirical evidence for the claims of Conceptual Metaphor Theory (Cameron & Deignan, 2003; Stefanowitsch, 2007).
The identification and analysis of metaphorical expressions is methodologically much more complex than, for example, the study of lexical items, since metaphorical mappings have different lexical realizations and cannot be extracted from texts in a straightforward way. For this reason, a number of procedures for metaphor extraction have been devised, among them manual searching; searching for source-domain vocabulary; searching for target-domain vocabulary; searching for both source- and target-domain vocabulary; and searching based on ‘markers of metaphor’, that is to say, linguistic devices that may indicate the presence of a metaphor (see Stefanowitsch, 2007 for an account of the problems encountered in automatic metaphor extraction).
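To make the source-domain search strategy more concrete, the following is a minimal Python sketch of a key-word-in-context (KWIC) search for source-domain vocabulary. It is not the procedure followed in this study, which relied on manual identification (Section 6.3). The function name `kwic`, the regular-expression tokenizer and the `WATER_TERMS` word list are hypothetical choices made purely for illustration. As noted above, such a search only retrieves candidates, and each hit still has to be checked by hand to decide whether the use is metaphorical.

```python
import re
from typing import Iterable, List, Set, Tuple

# Hypothetical source-domain lexicon for a WATER / NATURAL FORCE mapping.
# This word list is illustrative only; it is not the authors' lexicon.
WATER_TERMS = {"flood", "flow", "flows", "inflow", "influx", "wave", "waves", "tide"}

# Simple assumed tokenizer: alphabetic words, optionally with an apostrophe part.
TOKEN_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def kwic(texts: Iterable[Tuple[str, str]], terms: Set[str],
         window: int = 4) -> List[Tuple[str, str, str, str]]:
    """Return (text_id, left context, node word, right context) for each hit.

    Hits are only candidate metaphors: each one must still be checked
    manually (e.g. with MIP) to decide whether the use is metaphorical.
    """
    hits = []
    for text_id, text in texts:
        tokens = TOKEN_RE.findall(text)
        for i, tok in enumerate(tokens):
            if tok.lower() in terms:
                left = " ".join(tokens[max(0, i - window):i])
                right = " ".join(tokens[i + 1:i + 1 + window])
                hits.append((text_id, left, tok, right))
    return hits


sample = [
    ("ML11", "It's just a large migratory flow"),    # learner Example (18)
    ("invented", "The flow of the river was slow"),  # invented literal use
]
for hit in kwic(sample, WATER_TERMS, window=3):
    print(hit)
```

Both sentences are retrieved, although only the learner's “migratory flow” is metaphorical. This is why a lexical search can narrow down the data but cannot replace manual analysis.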
248
Emilia Castaño, Natalia Judith Laso & Isabel Verdaguer
In addition, the literature has shown that figurative language is used not only in general language but also across different genres and registers (Deignan et al., 2013). Recent research (Caballero, 2003, 2006; Deignan et al., 2013; Herrmann & Sardinha, 2015) has stressed the need to take into account register and the specific linguistic characteristics of a discourse community when exploring figurative language, given the relevance of shared knowledge to the production, recognition and interpretation of metaphors. Finally, corpus linguistics has also greatly aided critical metaphor analysis by providing attested linguistic evidence of the framing and evaluative power of metaphor (Charteris-Black, 2005).
2. Metaphor and Framing
The advent of Cognitive Metaphor Theory in the early 1980s shifted the locus of metaphor from language to thought and advanced the claim that abstract concepts are metaphorically grounded in experiences arising from our embodied interactions with the environment, which “[is] at once physical, social, cultural, economic, moral, legal, gendered, and racialized” (Johnson, 2007). In this respect, metaphor, far from being considered an ornamental device, is conceived of as a cognitive operation whereby abstract domains (target domains) are understood in terms of concrete experiential domains (source domains) through mappings of the form TARGET DOMAIN IS SOURCE DOMAIN, which allow us to understand, reason, and talk about abstract concepts and subjective or complex experiences in terms of more concrete ones (Lakoff & Johnson, 1980, 1999; Semino, 2008). This property makes metaphor a powerful framing tool, able to shape the way we perceive a situation or event by evoking particular worldviews and highlighting certain aspects of a phenomenon while downplaying others (Lakoff & Johnson, 1980, 1999; Lakoff, 2004; Charteris-Black, 2005; Johnson, 2007). Thus, for example, choosing metaphors related to either sports or war to describe a country’s foreign policy frames the topic in different and contrasting ways: while sports metaphors depict foreign countries as opponents, war metaphors depict them as enemies, foregrounding the notion of hostility. This framing effect goes beyond language, helping “to promote a particular problem definition, causal interpretation, moral evaluation and/or treatment recommendation for the item
described” (Entman, 1993: 52). Hence, metaphor becomes an exceptional instrument for analyzing the conventional understanding of some of the most controversial topics on the journalistic, political and legal agenda, such as immigration.
3. Immigration Metaphors in Public Discourse
A large body of studies has recently analyzed the metaphorical expressions that have shaped European and American discourse on immigration in recent history, as reflected in mass media, blogs and political speeches (O’Brien, 2003; Charteris-Black, 2006; Wodak, 2006; Cisneros, 2008; Biria, 2012; Musolff, 2015; Saiz de Lobado, 2015; among others). Their results show that, despite cross-cultural variation, the portrayal of immigration that has dominated public discourse since the early 20th century has, at one point or another, revolved around a network of metaphors that dehumanize immigrants and/or describe them as a threat to host countries. For example, several studies have shown that immigration is often described as a natural force, a flood, with an uncontrollable power and disastrous consequences for recipient communities (Santa Ana, 2002; O’Brien, 2003; Charteris-Black, 2006; Chavez & Hoewe, 2012; Strom & Alcock, 2017). Similarly devastating effects have been found to be attributed to immigration in metaphors that equate immigrants with MOBILE TOXIC WASTES (Cisneros, 2008) or weeds that infest the land (Deignan, 2005).
Research has also provided evidence that subhuman metaphors such as IMMIGRANTS ARE ANIMALS (Santa Ana, 1999; Deignan, 2005), OBJECTS or COMMODITIES (El Refaie, 2001; O’Brien, 2003) have coexisted, at least since the 1990s, with metaphors that endowed nations with human qualities and led to their conceptualization as a BODY or ORGANISM whose wellbeing is endangered by immigrants, who are then seen as a BURDEN (Santa Ana, 2002; Cisneros, 2008; Crespo-Fernández, 2013), INDIGESTIBLE FOOD, INFECTIOUS ORGANISMS (O’Brien, 2003) or PARASITES (Musolff, 2015). These metaphors seem to have now been partially displaced by those that depict immigrants as INVADERS, CRIMINALS or ILLEGAL ALIENS (Flores, 2003; Binotto, 2015), against whom a heroic fighter, the government, must act to protect the country’s integrity (Santa Ana, 2002; O’Brien, 2003; Musolff, 2011; Binotto,
2015). These metaphors rest on the conceptualization of the NATION AS A HOUSE or FORTRESS (Charteris-Black, 2006; Cisneros, 2008; Biria, 2012), whose boundaries, either physical or symbolic, serve to draw a dividing line between ‘us’ and ‘them’ (Van Dijk, 2000) and reinforce the sense of otherness that seems to pervade not only public discourse on immigration, as seen in this section, but also legal discourse, as will be discussed in the following section.
4. Immigration metaphors in legal discourse
In spite of the preconceived idea that specialized registers are largely free from figurative expressions, a growing body of research shows that metaphor is widely used in specific text types (Deignan et al., 2013): in politics and economics (Musolff, 2004; Charteris-Black, 2005), medicine (Salager-Meyer, 1990; Faber & Márquez, 2004) and biology (Ureña, 2012; Knudsen, 2015), to name just a few. In this respect, legal discourse is no exception. It is highly metaphorical, to the extent that conceptual metaphor and radial categories are argued to shape legal language and, to a certain extent, to determine which arguments are valid in legal reasoning (Winter, 2001, 2006; Ebbesson, 2008). This is unsurprising if we consider that law is “an ideological artifact” (Orts, 2015: 30), a product of human understanding, which is essentially metaphorical (Lakoff & Johnson, 1999; Johnson, 2007). Metaphors act as framing instruments able not only to convey legal concepts but also to influence thought and policies. Beyond theoretical assertions, the ubiquity of metaphor in law has been extensively attested both in general (Winter, 2006) and in specific legal domains such as corporate and criminal law (Duncan, 1994; Berger, 2004), constitutional and administrative law (Noah, 2000; Jackson, 2006) or intellectual property regulation (Loughlan, 2006; Larsson, 2013). In the case of immigration law, several studies have shown that metaphor also plays an important role in the legal construction of immigration. For instance, according to Cunningham-Parmeter (2011), the analysis of American Supreme Court texts shows that for decades immigration has commonly been conceptualized as a FLOOD, an AVALANCHE or an INVASION, and immigrants as ALIEN OUTSIDERS or ILLEGALS that threaten the country’s stability. The same images recur in European legislation, where immigration is also depicted as an UNCONTROLLABLE FLUID, and the NATION AS A CONTAINER metaphor grounds the proliferation of exclusion metaphors such as IMMIGRANTS ARE ALIENS or ENEMIES TO BE FOUGHT (Rosello, 1999; Incelli, 2013). In this context, border protection is given priority, which leads to the adoption of a CLOSED-DOOR POLICY towards immigrants. Only occasionally is the metaphor of hospitality, closely connected to the former, invoked; immigrants are then presented as guests who enjoy the generosity of a host whose borders are now seen as an OPEN DOOR (Rosello, 1999). Finally, the legal system also seems to draw on the metaphor IMMIGRANTS ARE OBJECTS (Incelli, 2013), in which immigrants are conceived of as entities that can be relocated.
A close reading of the dominant metaphorical construction of immigration described above reveals a tight connection with what Lakoff called the strict father model (Lakoff, 1996, 2006; Lakoff & Wehling, 2012). Framing immigration as a security problem and immigrants as illegal aliens or invaders (Lakoff & Ferguson, 2006) appeals to governments’ duty to protect their citizens, just as a father would, and reinforces the treatment of immigration as a threat in public and legal discourse.
5. The conceptualization of immigration in a learner corpus of legal English
Metaphors are a fundamental part of linguistic competence and need to be addressed in second and foreign language learning and teaching. Early research in this area (Boers, 2000; Littlemore & Low, 2006) has mostly focused on the importance of raising learners’ awareness of metaphorical thought during the language learning process. In particular, it has examined students’ learning and understanding of metaphors, as well as the metaphorical extension of word meanings as a means of facilitating vocabulary learning, since it has been demonstrated that making learners aware of the relationship between the literal and figurative meanings of lexical items (Boers, 2000; Charteris-Black, 2000) aids the comprehension and retention of new vocabulary.
Learners, however, in addition to being familiar with conceptual metaphor and the metaphorical extension of the meaning of certain expressions, need to learn and use the linguistic instantiations of metaphorical thought in the target language (Charteris-Black, 2000). As Boers (2000) points out, knowledge of metaphorical thought does not guarantee command of its linguistic realizations. Thus, metaphoric competence, that is, the ability to recognize and use figurative language effectively and appropriately, is an important step in the language learning process.
Yet, to date, few studies have examined students’ actual production of metaphorical expressions in the L2. Despite the great expansion of learner corpus research in the last decade, following Granger’s pioneering work and the compilation of the International Corpus of Learner English (ICLE) (Granger et al., 2002) and of many other learner corpora, little research has been published on learners’ actual use of metaphor (Littlemore & Low, 2006; Chapetón et al., 2012; Golden, 2012; Nacey, 2013; Littlemore et al., 2014). As shown earlier, several studies have dealt with metaphor in both public and legal discourse. A few have also approached metaphors in the language of learners or non-experts and, very recently, the metaphors used by migrant students in accounts of their own experiences (Catalano, 2016). However, to our knowledge, no study has analyzed the use of immigration metaphors in a learner corpus of legal English.
This paper, which approaches the metaphorical conceptualization of immigration in an EAL¹ learner corpus of legal English, aims to fill a gap in learner corpus research. Its objective is to analyze the use of figurative language in a corpus of texts on migration law written in English by Spanish undergraduates of Law and to test their awareness of the evaluative power of metaphor. Although a few corpus-based studies on learners’ use of metaphor (mentioned above) claim that learners do use metaphorical expressions, they focus on students’ general argumentative writing, not on a specialized register. To our knowledge, no other learner corpus of legal English has been compiled and analyzed so far.
As immigration is a highly debated and controversial topic and an important social issue in Western society, metaphorical language is expected to play a major role in learners’ production. If so, can university students of law recognize the metaphors used in legal discourse as well as their connotations? And do they reproduce metaphors charged with negative associations without, perhaps, even being aware of them?
¹ The term EAL was preferred to EFL/ESL here because the population under study has been instructed in English.
6. Data and Method
6.1. Learner corpus data and learner profile used in this study
With the aim of exploring how learners use metaphors in their written production in legal English and which types of conceptualization they draw on, twenty-five unrevised written assignments (circa 25,000 tokens) on European immigration and asylum, produced by thirty Spanish undergraduate students of Law who use English as an Additional Language (EAL), were selected. Admittedly, this is a small corpus, which will be enlarged in the future; however, given that some previous studies on metaphor in learner corpora are based on small datasets (Nacey, 2013), we consider it sufficient for an exploratory qualitative study that can provide valid conclusions about the use of figurative language and the patterns followed in this type of discourse. This collection of texts has been constructed to inform the VESPA (“Varieties of English for Specific Purposes dAtabase”) learner corpus project, aimed at building a large corpus of ESP texts written by L2 writers from various mother-tongue backgrounds.
This group of undergraduates was enrolled in a 6-ECTS optional course on Migration Law and Citizenship, which examines the rules and policies developed by the European Union and its Member States to manage migration flows. In addition to describing this phenomenon at both EU and national level, the course focuses on the EU and national powers that govern the entry, removal and status of non-nationals. The difference between Union citizens and third-country nationals is also analysed and compared, as is the status of family members of Union citizens. The acquisition of citizenship by former migrants and the special situation of asylum seekers are also addressed.
Regarding the learning outcomes of the course, learners are expected, on the one hand, to gain knowledge of the basic concepts of migration law, asylum and citizenship, as well as of the rules that govern migration at both EU and national level, and, on the other, to acquire a better command of migration terminology and associated phraseology in English.
No level of language proficiency is required to enrol in the course, but learners sit a placement test during the first week, and their English proficiency ranges from B1 to C2 on the CEFR scale.
6.2. Survey
In order to assess learners’ awareness of the metaphorical expressions usually associated with the role and functions of government and the law in immigration policy, a survey was handed out to the study participants (see Appendix). In this study, we present qualitative results for Part 3, in which respondents were given some information about metaphors as mechanisms used to understand one concept in terms of another and were asked whether they were aware that legal discourse is highly metaphorical and that certain terms carry negative connotations. They were also asked to justify their answers.
6.3. Metaphor extraction
Following the “metaphor identification procedure” (MIP; Pragglejaz Group, 2007; Steen et al., 2010), a 25,000-word sample from a learner corpus of legal English was analysed manually in order to identify the most salient metaphorically used expressions commonly found in the learners’ essays, that is, expressions whose contextual (metaphorical) meaning can be understood in comparison with a more basic (literal) meaning. Each of these expressions was classified according to its source domain.
With the aim of ensuring accuracy and consistency, three analysts were involved in the metaphor identification process, all of them linguists and researchers specialised in discourse analysis. Their individual results were discussed, and only those metaphorical expressions agreed upon by all three analysts were selected for the present study. Finally, these expressions were compared against those already identified in native production (O’Brien, 2003; Charteris-Black, 2006; among others).
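The selection criterion described in this subsection (an expression is retained only if all three analysts identified it) amounts to a set intersection. The Python sketch below assumes, hypothetically, that each analyst's output is stored as a set of (text ID, expression) pairs; the function name `consensus` and the sample annotations are invented for illustration and do not reproduce the study's data.

```python
from typing import Dict, Set, Tuple

# Hypothetical record format: (text ID, metaphorically used expression).
Annotation = Tuple[str, str]


def consensus(annotations: Dict[str, Set[Annotation]]) -> Set[Annotation]:
    """Return only the expressions identified by every analyst.

    An expression flagged by just one or two analysts is discarded,
    mirroring a unanimous-agreement selection criterion.
    """
    if not annotations:
        return set()
    return set.intersection(*annotations.values())


# Invented annotations for three analysts (not the study's data).
analysts = {
    "analyst_1": {("ML10", "mass influx"), ("ML11", "migratory flow"), ("ML22", "combat")},
    "analyst_2": {("ML10", "mass influx"), ("ML11", "migratory flow")},
    "analyst_3": {("ML10", "mass influx"), ("ML11", "migratory flow"), ("ML19", "host")},
}

agreed = consensus(analysts)
print(sorted(agreed))
```

Replacing the intersection with a count-based rule (e.g. keeping expressions flagged by at least two analysts) would implement a looser criterion. The study, however, reports using agreement among all three analysts.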
7. Results and Discussion
7.1. Metaphor analysis
Corpus data reveal that learners use a large number of metaphorical expressions and that the most frequent conceptual metaphors used by learners in our corpus of legal English to depict immigration and its actors can be grouped as follows: NATIONS ARE CONTAINERS; IMMIGRATION IS A THREAT/PROBLEM; IMMIGRATION CONTROL IS A BATTLE; IMMIGRATION IS A NATURAL FORCE; IMMIGRANTS ARE OBJECTS.
7.1.1. NATIONS ARE CONTAINERS
The analysis of the examples found in the learner corpus dataset has shown that learners also conceptualize nations as bounded spaces of limited capacity (Example (3)), vulnerable to collapse in the event of a large-scale increase in immigration (Example (4)) (Rosello, 1999). In this context, governments and institutions become guarantors of protection, and border security becomes essential for the stability of the country (Example (5)) (Castan Pinos, 2008). Hence, borders are metaphorically conceived of as gates or doors that can be sealed (Example (6)) or selectively opened to people based on criteria of desirability and need (Example (7)) (Zaiotti, 2007).
(3) Relocation as a concept which emphasizes distribution of persons
in clear need of international protection among Member States,
will be used when the volume of arrivals is already full (ML 19).
(4) With 600.000 people applying for asylum in 2014, the European
Union is under a lot of pressure, the system is overwhelmed.
These distribution criteria reflect the capacity of the Member
States to absorb and integrate refugees (ML 20).
(5) What European institutions have tried to do so far is find a balance
between these two apparently opposing obligations: the humanitarian one of saving those in peril and the one of protecting Europe
(ML 19).
(6) One of the main arguments of those who believe we should “close
our borders” is the fact that we simply do not know if everyone
coming in is an actual asylum seeker, or a member of a terrorist
group who is just taking advantage of the situation and making
his way to Europe (ML 21).
(7) The EEUU or the UK’s primes ministers openly said that they
would open the borders for those highly-educated professionals
(with no less than a College degree) to seek for opportunities in
their countries (ML 22).
The container metaphor implies both an inside and an outside and
therefore in relation to immigration discourse it requires both the ‘us’
and the ‘them’ referred to by Van Dijk (2000): “the penetration of the
boundary of a container implies the ‘them’ symbolically entering the
‘us’” (Charteris-Black, 2006: 577). In this sense, borders simultaneously serve the purpose of setting categorization lines that help to
distinguish citizens from non-citizens, often referred to as applicants
(Examples (8), (9)) to whom states can offer temporary or permanent
protection. The metaphor of hospitality is invoked here, and nations are presented as hosts (Examples (10) and (11)).
(8) Agenda did not provide Member States with instructions how to
do this seperation between applicants if they are facing higher
number of applications than expected (ML 19).
(9) Member States shall prevent secondary movements of relocated
applicants during the period of the examination of application for
international protection (ML08).
(10) Very commendable is that Europol and Eurojust are ready to assist the host Member State with investigations to dismantle the
smuggling and trafficking networks (ML 19)
(11) It is important to make progress when it comes to relocation and
resettlement with respect to the Member States and third countries which host large numbers of refugees (ML21).
All in all, the examples above provide a picture that frames immigration as a problem mainly related to having to cope with more than a fair share of refugees and migrants, which has justified the adoption of a protection policy aimed at controlling the porosity of Europe’s borders by establishing tight selection criteria.
7.1.2. IMMIGRATION IS A THREAT/PROBLEM
As shown in the literature, the social phenomenon of immigration is often portrayed as dangerous in immigration discourse (Santa Ana, 2002; Charteris-Black, 2006; Cisneros, 2008, to name but a few). Cisneros (2008: 569) points out that “[t]hough the degree of popular obsession with immigrants rises and falls, there is always an awareness that these strangers potentially bring with them monumental and threatening changes”. In this scenario, immigrants are seen as threatening enemies (invaders, troublemakers) and as a danger to the stability of Member States.
(12) They are considered to be a threat to public policy, internal security, public health and international relations (ML03)
(13) (…) to check (…) the identity of any person, irrespective of his
behaviour and of specific circumstances giving rise to a risk of
breach of public order (ML03)
(14) A Member State can apply for temporary protection in the event
of a mass influx of displaced persons from third countries who
are unable to return to their country of origin and to promote a
balance of effort between Member States in receiving and bearing
the consequences of receiving such persons (ML05)
(15) EU states tend to view any large-scale international migration as
a threat to the sovereignity of their national and regional borders, their economies and their societies (ML21)
As illustrated in the examples above, many expressions of negative evaluation, such as “a threat to public policy, internal security” (Example (12)), “a risk of breach of public order” (Example (13)), “bearing the consequences of” (Example (14)) and “a threat to the sovereignity of their national and regional borders” (Example (15)), can be found in the corpus data, in line with the expressions identified in native production.
7.1.3. IMMIGRATION IS A NATURAL FORCE
Immigration is conceptualized as a flow of water or a tidal wave (SOURCE DOMAIN) that is difficult to control, and it is thus portrayed as a social catastrophe: “(…) by their nature, liquids –tides, rivers, waves, etc. – move around; they can therefore be related to a more primary conceptual metaphor: CHANGES ARE MOVEMENTS” (Kövecses, 2002: 134). In this respect, Charteris-Black (2006: 572) highlights that “lack of control over change is lack of control over movement”, which reinforces the idea that the use of disaster metaphors to describe migratory flows implies that immigration is perceived as a danger (IMMIGRATION IS A THREAT). In addition, within this frame, immigrants are no longer seen as individuals. On the contrary, they are depicted as an undifferentiated, anonymous and hence dehumanized mass: “understanding migrants as molecules in a liquid depersonalizes and dehumanizes them.” (Dervinyte, 2009: 53).
(16) In the event of a mass influx of displaced persons. (ML10)
(17) It means that this situation of a mass influx or imminent mass influx of displaced people from third countries has to be recognised
internationally (ML13)
(18) It’s just a large migratory flow (ML11)
(19) The complexity of the migrant inflow has put enormous strain on
the asylum system. Some countries (…) have reached breaking
point in their ability to manage the unplanned inflow and meet
EU standards for receiving and processing applicants (ML22)
Expressions such as “mass influx” (Examples (16) and (17)), “flow” (Example (18)) and “inflow” (Example (19)) relate to the image of water. It is also worth noting that, as immigration is here represented in terms of a natural force (e.g., flow, influx), it is often described by means of gradable adjectives, such as large or big (Example (18)).
7.1.4. IMMIGRANTS ARE OBJECTS
Immigrants are often presented as impersonal or interchangeable objects, that is, as materials depicted as cheap labour that can easily be replaced or moved from one place to another. As pointed out in the literature (O’Brien, 2003; Charteris-Black, 2006), images of immigrants as quantifiable goods (Examples (20) and (21)) discourage empathy with incomers, who become associated with a fear of destruction:
(20) The controversial issue is the proposal to introduce fixed amounts
distribution of refugees among Member States tabled by Germany and rejected by the countries of Central Europe (ML21)
(21) Some countries are now challenging the EU proposals by introducing the number of asylum seekers they are willing to take
(ML22)
The IMMIGRANTS ARE OBJECTS metaphor tends to appear in combination with the NATIONS ARE CONTAINERS metaphor. States are viewed as containers, and immigrants are perceived as dehumanized entities that can easily be relocated from one place to another (Examples (22) and (23)) and even exploited as cheap labour (Example (24)), which turns them into a source of PROFIT (Examples (24) and (25)), while, as cheap labour, they are also seen as a threat:
(22) The member state of relocation shall take back the person as they
are a threat to the country (ML19)
(23) Some countries are now challenging the EU proposals by introducing the number of asylum seekers they are willing to take
(ML22)
(24) Supporting effective management of labour migration to tackle
exploitation and support migrant workers (ML23)
(25) Migrants in an irregular situation are also more vulnerable to labour and other forms of exploitation (ML24)
7.1.5. IMMIGRATION CONTROL IS A BATTLE
Metaphors instantiating IMMIGRATION CONTROL IS A BATTLE are also very frequent in immigration discourse. Immigrants are often portrayed
as an invading enemy threatening the stability of the state, so that
dealing with them requires military action (Biria, 2012: 37):
(26) Europe is facing the biggest wave of refugees after decades. In
this moment there is no possibility for Member States to combat
illegal pathways of reaching Europe alone (ML22)
(27) To fight the migration massive influx, the EU is trying to solve
the problem from the bottom looking for the main reasons that
made people move from one country to another (ML20)
In the examples above, the EU is conceptualized as a container which
must be protected and kept secure from external dangers such as “the
biggest wave of refugees” and/or “the migration massive influx”. Yet
again, immigration is understood as a threat to the
stability and integrity of the nation.
The data above show that metaphors are an inextricable part of learners’ descriptions and analyses of migration law, which
raises the question of whether the construct of immigration reflected in their essays was deliberately built upon metaphors. As Lakoff &
Turner (1989) point out, “Metaphor is a tool so ordinary that we use
it unconsciously and automatically, with so little effort that we hardly
notice it” (p. xi). Thus, we are not always conscious of the
metaphors we use and the connotations they evoke. To test
learners’ metaphorical awareness and assess how conscious they were
of the evaluative slant conveyed by most of the metaphors in their written
production, participants were asked to complete a survey that explicitly
addressed these two aspects.
7.2. Survey
After a brief introduction on the pervasiveness of metaphor in language and on the fact that metaphor can influence our perception of the world and our attitudes to it (see Appendix for the complete
survey), students were asked to answer two questions:
• Q1. Were you aware that legal language was highly metaphorical?
• Q2. Had you ever realized that the use of terms such as those
mentioned above evoke negative connotations? Explain briefly.
Almost 40% of the law students had not previously realized that metaphorical expressions are pervasive in this register. This high percentage confirms the importance of training students to become aware of
metaphorical expressions and the associations they carry. The first step
is thus to raise students’ awareness of figurative language, since only if
they can recognize the metaphorical use of a specific word will they be
able to understand the associations it may carry.
Students were then asked whether they had ever realized that the
terms included in the survey evoked negative connotations, and to
explain why. It is in these open comments that participants’ awareness
of metaphors and their connotations can be seen most clearly,
since the comments reveal different degrees of perception of both the
metaphors and their associations. Learners’ responses indicate that
72% of the students who said they were conscious of metaphorical language also perceived the negative connotations of
the metaphors associated with migration. As the following student’s
comment conveys, words may not be neutral and can carry different
connotations:
(28) Yes because the language that you use is not neutral and depending
on how you use the words you can transmit different messages.
(ML12).
The pejorative connotations of a word in its literal usage are transferred to its metaphorical use, which thus also carries a negative message, as shown in the following comment:
(29) Flood in everyday’s language is rather a negative word, so connecting “flood” with immigration would always evoke negative
connotations. (ML05).
Some students also recognize that these metaphors serve to communicate
a particular view of immigration that justifies some governments’
policies:
(30) Yes, the use of this kind of expressions may contain discriminatory
clauses, such as even racist connotations; indirectly. Then, it justifies some policies to restrict immigrants rights, as it’s happening in
a great part of Europe. The law is written by people, so it’s obvious
it may contain metaphors and political matter. (ML13).
The remaining answers (28%) were more ambiguous as far as negative connotations are concerned. As connotative meaning is subjective,
not all students recognized pejorative implications:
(31) For me words such as flows, influx and curbs don’t necessarily
have a negative connotation. Matter fact all those words have different uses, and even as metaphors I don’t think you can get a universal definition because they have different meanings. (ML02).
(32) Probably because the use of legal terms in a metaphorical way
seems easier to understand for laymen from a shallow perspective.
While in fact these terms have a very profound and unforseeable
load of specific meaning behind them. That’s why every word in a
legal context is important. (ML06).
However, most students who were previously unaware of the presence of metaphorical language in legal English realized the negative
associations implied in these expressions:
(33) I had never realized before that metaphorical uses of words such as
the ones indicated above evoke negative connotations. However, it
is obvious once I have seen in that example that they totally evoke
negative connotations by subliminally (?) impressing people’s subconscious with negative meaning which leads then to think of that
phenomenon as something dangerous or harmful. (ML01)
(34) No, I’ve never realized it, but I think metaphors are useful in this
sense because they can bring up different reactions. (ML04).
As reflected in the comments above, making learners conscious of
the fact that legal language is highly metaphorical in nature helps raise their awareness of the “strategic dimension” (Damele,
2016: 175) of metaphorical expressions in influencing people’s
views. Since the use of metaphors can be unconscious and automatic,
raising learners’ metaphorical awareness is crucial not only to foster their communicative competence and proficiency but also to
enhance critical thinking. Arguments in favor of integrating metaphor
competence (i.e., the ability to acquire, produce and interpret metaphor;
Littlemore & Low, 2006) into second, foreign and specific-purpose
language curricula have multiplied over the last 30 years (Danesi, 1993,
2008; Littlemore & Low, 2006; Boers, 2013). This seems logical if we
consider that the development of metaphor competence also contributes
to improving textual, grammatical, illocutionary, strategic and sociolinguistic competence (Littlemore & Low, 2006). For example,
explicit metaphor instruction has been shown to enhance vocabulary expansion and
retention (Boers, 2004), to improve the understanding and recall
of polysemous senses and idioms (Kövecses, 2001), and to reduce
negative transfer errors derived from cross-cultural differences in metaphor usage and wording (Boers, 2003, 2004; Campos-Pardillos, 2016).
In the case of legal ESP and law studies, an explicit approach to conceptual metaphors and their lexicogrammatical instantiations may help
learners not only to detect differences in the metaphorical models that
each legal system selects to deal with issues such as immigration but
also to develop the pragmatic skills needed to uncover the inference
patterns and evaluative slant that these models evoke.
8. Conclusion
Our results have shown that the metaphorical construct of immigration
in learners’ legal discourse builds upon a web of interrelated metaphors
(Ponterotto, 2000; Semino, 2008) that seem to have the metaphor NATIONS ARE CONTAINERS as their core constituent. The fact that countries are conceptualized as bounded areas vulnerable to the irregular
entry of third-country nationals renders immigration a risk to their internal
welfare, which licenses the use of the metaphors IMMIGRATION IS A
THREAT and IMMIGRATION CONTROL IS A BATTLE to protect the country’s interests. The threat that immigration is thought to pose is often
described as a natural hazard, a natural force with catastrophic
consequences for the recipient countries. This metaphor contributes to
dehumanizing immigrants by equating them with overwhelming flows
of water, as does their depiction as OBJECTS whose relocation
or expulsion represents a relief for the recipient countries. Only when
the perspective shifts away from the devastating effects of immigration
on nations and focuses instead on immigrants does the figure of nations as
protective hosts emerge. Our analysis has also revealed that public,
legal and learners’ discourses on immigration are largely shaped by a
common metaphorical model, which is so deeply entrenched that its use
and negative slant often go unnoticed by learners. Finally, this study has
shown that bringing to the fore the connotations and power relationships implicit in a given choice of words helps learners
take a critical stance towards seemingly neutral terms and realize that
words always matter.
9. Acknowledgements
We acknowledge the support of the Agència de Gestió d’Ajuts Universitaris i de Recerca (2014 SGR 1374) and of the research and teaching training grant (beca de formación en investigación y docencia) of the Fundación Obra Social y Universidad de
Barcelona (grant held by Emilia Castaño). The authors are also grateful to two anonymous reviewers for their comments.
10. References
Berger, Linda L. 2004. What is the sound of a corporation speaking? How the
cognitive theory of metaphor can help lawyers shape the law. Journal
of the Association of Legal Writing Directors 2: 169-208.
Binotto, Marco. 2015. Invaders, Aliens and Criminals: Metaphors and Spaces in the Media Definition of Migration and Security Policies. In Bond,
Emma; Bonsaver, Guido & Faloppa, Federico (eds.) Destination Italy:
Representing Migration in Contemporary Media and Narrative. Oxford: Peter Lang, 31-58.
Biria, Ensieh. 2012. Figurative Language in the Immigration Debate: Comparing Early 20th Century and Current U.S. Debate with the Contemporary European Debate. (Thesis). http://pdxscholar.library.pdx.edu/open_access_etds (234).
Boers, Frank. 2000. Metaphor awareness and vocabulary retention. Applied
Linguistics 21(4): 553-571.
Boers, Frank. 2003. Applied linguistics perspectives on cross-cultural variation in Conceptual Metaphor. Metaphor and Symbol 18(4): 231-238.
Boers, Frank. 2004. Expanding learners’ vocabulary through metaphor awareness: What expansion, what learners, what vocabulary. In Achard,
Michel & Niemeier, Susanne (eds.) Cognitive Linguistics, Second Language Acquisition and Foreign Language Teaching. Berlin/New York:
De Gruyter, 211-232.
Boers, Frank. 2013. Cognitive Linguistic approaches to teaching vocabulary:
Assessment and integration. Language Teaching 46(2): 208-224.
Caballero, Rosario. 2003. Metaphor and genre: the presence and role of metaphor in the building review. Applied Linguistics 24(2): 145-167.
Caballero, Rosario. 2006. Re-Viewing Space. Figurative Language in Architects’ Assessment of Built Space. Berlin/New York: Mouton de Gruyter.
Cameron, Lynne & Deignan, Alice. 2003. Combining large and small corpora to
investigate tuning devices around metaphor in spoken discourse. Metaphor and Symbol 18(3): 149-160.
Campos-Pardillos, Miguel A. 2016. Increasing Metaphor Awareness in Legal
English Teaching. ESP Today 4(2): 165-183.
Castan Pinos, Jaume. 2008. Building Fortress Europe? Schengen and the cases
of Ceuta and Melilla. CIBR/WP10. Belfast: CIBR Working Papers in
Border Studies.
Castaño, Emilia; Verdaguer, Isabel; Laso, Natalia Judith & Ventura, Aaron.
2014. Economy is a living organism. Metaphorical expressions in
a learner corpus of English. Spanish Journal of Applied Linguistics
27(2): 323-337.
Catalano, Theresa. 2016. Talking About Global Migration: Implications for
Language Teaching. Bristol: Multilingual Matters.
Chapetón, Marcela & Verdaguer, Isabel. 2012. Researching linguistic metaphor in native, non-native and expert writing. In MacArthur, Fiona;
Oncins-Martínez, José Luis; Sánchez-García, Manuel & Piquer-Píriz,
Ana María (eds.) Metaphor in Use: Context, Culture, and Communication. Amsterdam/Philadelphia: John Benjamins Publishing Company,
149-174.
Charteris-Black, Jonathan. 2000. Metaphor and vocabulary teaching in ESP
economics. English for Specific Purposes 19: 149-165.
Charteris-Black, Jonathan. 2004. Corpus Approaches to Critical Metaphor
Analysis. London: Palgrave Macmillan.
Charteris-Black, Jonathan. 2005. Politicians and Rhetoric: The Persuasive
Power of Metaphor. New York: Palgrave Macmillan.
Charteris-Black, Jonathan. 2006. Britain as a container: immigration metaphors in the 2005 election campaign. Discourse & Society 17(5): 563-581.
Chavez, Manuel & Hoewe, Jennifer. 2012. National perspectives on state turmoil. Characteristics of elite U.S. newspaper coverage of Arizona SB
1070. In Santa Ana, Otto & González de Bustamante, Celeste (eds.) Arizona Firestorm. Global Immigration Realities, National Media, and
Provincial Politics. New York: Rowman & Littlefield Publishers, 189-202.
Cisneros, David. 2008. Contaminated Communities: The Metaphor of “Immigrant as Pollutant” in Media Representations of Immigration. Rhetoric
& Public Affairs 11(4): 569-601.
Crespo-Fernández, Eliecer. 2013. The treatment of immigrants in the current
Spanish and British right-wing press: A cross-linguistic study. In Martínez-Lirola, M. (ed.) Discourses on Immigration in Times of Economic
Crisis: A Critical Perspective. UK: Cambridge Scholars Publishing,
86-112.
Cunningham-Parmeter, Keith. 2011. Alien Language: Immigration Metaphors
and the Jurisprudence of Otherness. Fordham Law Review 79: 1545-1598.
Damele, Giovanni. 2016. Adventures of a metaphor: Apian imagery in the history of political thought. In Gola, Elisabetta & Ervas, Francesca (eds.)
Metaphor and Communication. Amsterdam/Philadelphia: John Benjamins Publishing Company, 173-188.
Danesi, Marcel. 1993. Metaphorical competence in second language acquisition and second language teaching. The neglected dimension. In Alatis,
James (ed.) Georgetown University Round Table on Language and Linguistics. Washington DC: Georgetown University Press, 489-515.
Danesi, Marcel. 2008. Conceptual errors in second-language learning. In de
Knop, Sabine & de Rycker, Teun (eds.) Cognitive Approaches to Pedagogical Grammar. Berlin/New York: Mouton de Gruyter, 231-256.
Deignan, Alice. 2005. Metaphor and Corpus Linguistics. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Deignan, Alice; Littlemore, Jeannette & Semino, Elena. 2013. Figurative Language, Genre and Register. Cambridge: Cambridge University Press.
Dervinyte, Inga. 2009. Conceptual emigration and immigration metaphors in
the language of the press: A contrastive analysis. Studies about Languages 14: 49-55.
Duncan, Martha G. 1994. In slime and darkness: The metaphor of filth in
criminal justice. Tulane Law Review 68: 725-802.
Ebbesson, Jonas. 2008. Law, Power and Language: Beware of Metaphors.
Scandinavian Studies in Law 53: 259-269.
El Refaie, Elisabeth. 2001. Metaphors we discriminate by: Naturalised themes
in Austrian newspaper articles about asylum seekers. Journal of Sociolinguistics 5(3): 352-371.
Entman, Robert. 1993. Framing: Toward Clarification of a Fractured Paradigm. Journal of Communication 43(4): 51-58.
Faber, Pamela & Márquez Linares, Carlos. 2004. The role of imagery in specialized communication. In Lewandowska-Tomaszczyk, Barbara &
Kwiatkowska, Alina (eds.) Imagery in Language. Frankfurt: Peter Lang,
585-560.
Flores, Lisa. 2003. Constructing Rhetorical Borders: Peons, Illegal Aliens, and
Competing Narratives of Immigration. Critical Studies in Media Communication 20(4): 362-387.
Golden, Anne. 2012. Metaphorical expressions in L2 production: The importance of text topic in corpus research. In MacArthur, Fiona; Oncins-Martínez, José Luis; Sánchez-García, Manuel & Piquer-Píriz, Ana
María (eds.) Metaphor in Use. Amsterdam/Philadelphia: John Benjamins Publishing Company, 135-148.
Granger, Sylviane; Dagneaux, Estelle & Meunier, Fanny. 2002. The International Corpus of Learner English. Handbook and CD-ROM. Louvain-la-Neuve: Presses universitaires de Louvain.
Hanks, Patrick. 2007. Metaphoricity is gradable. In Stefanowitsch, Anatol &
Gries, Stefan Th. (eds.) Corpus-Based Approaches to Metaphor and Metonymy. Berlin/New York: Mouton de Gruyter, 17-35.
Herrmann, J. Berenike & Sardinha, Tony Berber (eds.). 2015. Metaphor in
Specialist Discourse. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Incelli, Ersilia. 2013. Shaping reality through metaphorical patterns in legislative texts on immigration: a corpus-assisted approach. In Williams,
Christopher & Tessuto, Girolamo (eds.) Language in the Negotiation of
Justice: Contexts, Issues and Applications. Series: Law, Language and
Communication. Farnham: Ashgate, 235-256.
Jackson, Vicki C. 2006. Constitutions as “living trees”? Comparative constitutional law and interpretive metaphors. Fordham Law Review 75:
921-960.
Johnson, Mark. 2007. Mind, Metaphor, Law. Mercer Law Review 58(3): 845-868.
Knudsen, Sanne. 2015. Framings of the concept of metaphor in biological specialist communication. In Herrmann, J. Berenike & Sardinha, Tony
Berber (eds.) Metaphor in Specialist Discourse. Amsterdam/Philadelphia: John Benjamins Publishing Company, 191-214.
Kövecses, Zoltan. 2001. A cognitive linguistic view of learning idioms in an
FLT context. In Pütz, Martin; Niemeier, Susanne & Dirven, René (eds.)
Applied Cognitive Linguistics II: Language Pedagogy. Berlin: Mouton
de Gruyter, 87-115.
Kövecses, Zoltan. 2002. Metaphor: A Practical Introduction. New York/Oxford: Oxford University Press.
Lakoff, George & Johnson, Mark. 1980. Metaphors We Live By. Chicago: University of Chicago Press.
Lakoff, George & Johnson, Mark. 1999. Philosophy in the Flesh. The Embodied Mind and its Challenge to Western Thought. New York: Basic
Books.
Lakoff, George. 1996. Moral Politics. Chicago: University of Chicago Press.
Lakoff, George. 2004. Don’t Think of an Elephant: Know your Values and
Frame the Debate: The Essential Guide for Progressives. White River
Junction, Vt: Chelsea Green Pub. Co.
Lakoff, George. 2006. Whose Freedom? The Battle Over America’s Most Important Idea. New York: Picador.
Lakoff, George. 2008. The Political Mind. New York: Viking.
Lakoff, George & Ferguson, Sam. 2006. The Framing of Immigration. The
Rockridge Institute. http://www.rockridgeinstitute.org/research/rockridge/immigration.
Lakoff, George & Turner, Mark. 1989. More Than Cool Reason: A Field Guide
to Poetic Metaphor. Chicago: University of Chicago Press.
Lakoff, George & Wehling, Elisabeth. 2012. The Little Blue Book: The Essential Guide to Thinking and Talking Democratic. New York: Free Press.
Larsson, Stefan. 2013. Metaphors, Law and Digital Phenomena: The Swedish
Pirate Bay Court Case. International Journal of Law and Information
Technology. Advance Access 21(4): 354-379.
Littlemore, Jeannette & Low, Graham. 2006. Metaphoric competence, second
language learning, and communicative language ability. Applied Linguistics 27(2): 268-294.
Littlemore, Jeannette; Krennmayr, Tina; Turner, James & Turner, Sarah. 2014.
An investigation into metaphor use at different levels of second language writing. Applied Linguistics 35(2): 117-144.
Loughlan, Patricia. 2006. Pirates, parasites, reapers, sowers, fruits, foxes …
The metaphors of intellectual property. Sydney Law Review 28: 211-226.
Musolff, Andreas. 2004. Metaphor and Political Discourse. New York: Palgrave Macmillan.
Musolff, Andreas. 2011. Migration, media and “deliberate” metaphors. metaphorik.de 21: 7-19.
Musolff, Andreas. 2015. Dehumanizing metaphors in UK immigrant debates
in press and online media. Journal of Language Aggression and Conflict 3(1): 41-56.
Nacey, Susan. 2013. Metaphors in Learner English. Amsterdam/Philadelphia:
John Benjamins Publishing Company.
Noah, Lars. 2000. Interpreting agency enabling acts: Misplaced metaphors in
administrative law. William & Mary Law Review 41(5): 1463-1530.
O’Brien, Gerald. 2003. Indigestible Food, Conquering Hordes, and Waste Materials: Metaphors of Immigrants and the Early Immigration Restriction
Debate in the United States. Metaphor and Symbol 18(1): 33-47.
Orts Llopis, María Ángeles. 2015. Legal English and Legal Spanish: The Role
of Culture and Knowledge in the Creation and Interpretation of Legal
Texts. ESP Today 3(1): 1-134.
Ponterotto, Diane. 2000. The cohesive role of cognitive metaphor in discourse
and conversation. In Barcelona, Antonio (ed.) Metaphor and Metonymy
at the Crossroads. Berlin: Mouton de Gruyter, 283-298.
Pragglejaz Group. 2007. MIP: A Method for Identifying Metaphorically Used
Words in Discourse. Metaphor and Symbol 22(1): 1-39.
Rosello, Mireille. 1999. Fortress Europe and its Metaphors. Immigration and
Law. Madison: European Studies Program.
Saiz de Lobado, María Ester. 2015. Análisis de la información y análisis
metafórico desde una perspectiva estadístico-lingüística (Doctoral thesis). Alcalá: Universidad de Alcalá, Departamento de Filología.
Salager-Meyer, Françoise. 1990. Metaphors in medical English prose: A comparative study with French and Spanish. English for Specific Purposes
9(2): 145-159.
Santa Ana, Otto. 1999. Like an animal I was treated: Anti-immigrant metaphor
in U.S. public discourse. Discourse and Society 10(2): 191-224.
Santa Ana, Otto. 2002. Brown Tide Rising: Metaphors of Latinos in Contemporary American Public Discourse. Texas: University of Texas Press.
Semino, Elena. 2008. Metaphor in Discourse. Cambridge: Cambridge University Press.
Steen, Gerard; Dorst, Aletta; Herrmann, Berenike; Kaal, Anna; Krennmayr,
Tina & Pasma, Trijntje. 2010. A Method for Linguistic Metaphor Identification. Amsterdam: John Benjamins Publishing Company.
Stefanowitsch, Anatol. 2007. Corpus-Based Approaches to Metaphor and
Metonymy. In Stefanowitsch, Anatol & Gries, Stefan Th. (eds.) Corpus-Based Approaches to Metaphor and Metonymy. Berlin/New York:
Mouton de Gruyter, 1-16.
Strom, Megan & Alcock, Emily. 2017. Floods, waves, and surges: the representation of Latin@ immigrant children in the United States mainstream media. Critical Discourse Studies. doi: 10.1080/17405904.2017.1284137.
Ureña Gómez-Moreno, José Manuel. 2012. Conceptual types of terminological metaphor in marine biology. In MacArthur, Fiona; Oncins-Martínez,
José Luis; Sánchez-García, Manuel & Piquer-Píriz, Ana María (eds.)
Metaphor in Use. Amsterdam/Philadelphia: John Benjamins Publishing Company, 239-260.
Van Dijk, Teun A. 2000. Ideology and Discourse: A Multidisciplinary Introduction. Barcelona: Pompeu Fabra University.
Winter, Steven L. 2001. A Clearing in the Forest: Law, Life, and Mind. Chicago/London: University of Chicago Press.
Winter, Steven L. 2006. Re-embodying Law. Mercer Law Review 58: 869-892.
Wodak, Ruth. 2006. Mediation between discourse and society: Assessing cognitive approaches in CDA. Discourse Studies 8: 179-190.
Zaiotti, Ruben. 2007. Of Friends and Fences: Europe’s Neighbourhood Policy and the Gated Community Syndrome. European Integration 29(2):
143-162.
Appendix
Part 1
1. List THREE adjectives that describe the noun immigrants
1.
2.
3.
2. List THREE verbs that combine with the noun immigration
1.
2.
3.
3. Tick the expressions that best describe the function(s) of the government/state
and order them from most significant to least significant:
Tick
Order
To prevent problems
To punish
To tell right from wrong
To empower citizens
To propose reforms
To be empathic
To demand responsibility
To nurture
To impose limits
To protect the country’s interests
To look after citizens
4. Tick the expressions that best describe the function(s) of laws:
Tick
Order
To protect
To set limits
To control people
To enforce rights
To guarantee freedom
To favour distinctions
To promote equality
To enjoy public support
To reinforce diversity
Part 2
1. Use the following response scale to rate how well the statements below describe immigration:
This does not describe immigration adequately 1 2 3 4 5 This describes immigration perfectly
(1 = very poorly; 2 = poorly; 3 = moderately well; 4 = well; 5 = perfectly well)
1. The Freedom of Movement is a great possibility to connect people, learn from each other and of course it makes travelling so much easier. 1 2 3 4 5
2. France is also arming itself in preparation for a wave of refugees. 1 2 3 4 5
1. Countries have turned to immigrants to contribute to economic growth. 1 2 3 4 5
2. Britain is facing a nightly tidal wave of asylum seekers from Cherbourg, France’s second biggest port. 1 2 3 4 5
3. We are at a point in this nation’s history where we cannot afford to keep our borders porous in order to provide employers with cheap labor. 1 2 3 4 5
4. State members of the European Union have agreed to develop a common immigration policy in order to ensure an efficient management of migration. 1 2 3 4 5
5. This global approach to migration aims to encourage mobility, to ensure coherent policy making. 1 2 3 4 5
6. The same authority that protects the borders can decide on who is crossing them seeking for protection. 1 2 3 4 5
7. America is not the only country wrestling with immigration. 1 2 3 4 5
8. There are almost no measures today to cope with the problem of people’s outflow. 1 2 3 4 5
Part 3
Metaphors are conceived as mechanisms used to understand one concept
in terms of another (e.g., expressions such as lead someone step by step
through an argument, follow an argument or get lost instantiate THINKING IS MOVING). They are an inextricable part of our everyday language and, despite
the preconceived idea that legal language is largely free from metaphors, the
use of metaphorical expressions is also pervasive in legal discourse. This fact
is particularly relevant because metaphors evoke particular worldviews and
highlight certain aspects of a phenomenon while downplaying others.
Thus, for example, when terms such as flows, influx or curbs are used to describe immigration laws, IMMIGRATION is conceived of as a FLOOD, which
brings about negative connotations and dehumanizes immigrants. Likewise,
when national borders are defined as areas whose security must be protected
against immigration, immigrants are seen as a source of security problems
and a threat to the stability of the country. Metaphors are more than figures of
speech: the choice of one metaphor over another can profoundly affect
legal thought (Berger, 2002; Cunningham-Parmeter, 2011;
Santa Ana, 1997).
1. Were you aware that legal language was highly metaphorical?
YES
NO
2. Had you ever realized that the use of terms such as those mentioned above
evoke negative connotations? Explain briefly.
QF Lingüístics
ojs.uv.es/index.php/qfilologia/index
QUADERNS DE FILOLOGIA
EDITORIAL GUIDELINES
Quaderns de Filologia is the name given to the publications financed and/or
managed directly by the Facultat de Filologia, Traducció i Comunicació
of the Universitat de València. These publications consist of:
1. The periodical Quaderns de Filologia, which has been published
as a scholarly journal since 1995 and in digital format
since 2014 through Open Journal Systems (OJS). The journal
Quaderns de Filologia (QF) has two series:
a. QF Estudis Lingüístics: ojs.uv.es/index.php/qfilologia/index
b. QF Estudis Literaris: ojs.uv.es/index.php/qdef/index
2. The monograph collection Anejos de Quaderns de Filologia.
The editorial guidelines for all these publications are as follows:
1. Page format
Page settings in Word (on an A4 page)
Text block: distance from the edge of the paper
Top margin: 6.6 cm
Left margin: 5.0 cm
Bottom margin: 5.5 cm
Right margin: 5.0 cm
Inner margin: 0.0 cm
2. Presentation of the volume
2.1. The first page
The first page of the volume will include (in this order), in Times 11 with
1.5 line spacing:
1. In capitals and centred (leave 2 lines below the page header):
QUADERNS DE FILOLOGIA
[SERIES NAME] [NUMBER WITHIN THE SERIES, in Roman numerals]
Example:
QUADERNS DE FILOLOGIA
ESTUDIS LINGÜÍSTICS XVIII
2. In capitals and centred (leave 4 blank lines below the series name):
VOLUME TITLE
Example:
LENGUA Y CIENCIA.
RECEPCIÓN DEL DISCURSO CIENTÍFICO
3. In italics and centred (leave 4 lines below the volume title, unless the
title takes up more than one line, in which case reduce the spacing accordingly):
Edició de
4. In capitals and centred (with no preceding space):
FIRST NAME AND SURNAME(S) OF EDITOR 1
FIRST NAME AND SURNAME(S) OF EDITOR 2
(Add as many lines as needed)
Example:
JULIA PINILLA MARTÍNEZ
VIRGINIA GONZÁLEZ GARCÍA
CECILIO GARCÍA ESCRIBANO
5. At the bottom of the page (leave 5 blank lines):
FACULTAT DE FILOLOGIA, TRADUCCIÓ I COMUNICACIÓ
UNIVERSITAT DE VALÈNCIA
YEAR OF PUBLICATION
The following two pages contain information about the publication Quaderns de Filologia. This information is already predefined, as can be seen in
the template. Editors will only add the volume title
and its number within the series.
2.2. The table of contents
The table of contents of the volume will appear on the next odd-numbered page, in the following format:
Section heading in capitals, Times 11, centred:
ÍNDEX
Font: Times 11. Authors’ surname(s) in small capitals, followed by a
comma. Authors’ first name(s) in lowercase roman type. Article title in
lowercase roman type on the next line, indented 0.75 cm. Page
number at the end of the dot leader, set with a right-aligned tab. Editors are advised to follow
the QF template.
Example:
One author:
surname(s), First name
Article title....................................................................... xx
Two authors:
surname(s), First name & surname(s), First name
Article title....................................................................... xx
More than two authors:
surname(s), First name; surname(s), First name & surname(s), First name
Article title....................................................................... xx
The table of contents will give the full names of all authors, regardless of their number.
Note: The guidelines for article submission and the general index of all
volumes will be added later by Quaderns de Filologia.
2.3. Page headers
The header of the first page of each chapter, article or section is
predefined in the QF template.
For all other headers, the font size will be Times 10.
Content of the headers on the remaining pages of the article:
• Even-page header: page number on the left (in roman type) and
author’s name in italics, aligned right.
• Odd-page header: title of the article or volume in italics (short
version, not exceeding one line), aligned left, and page number on the right (in roman type).
3. Article submission guidelines
3.1. General points
Authors will send the editors two files in digital form, in Word 97 or later (or in RTF format if Word is not available):
1. On a single page: the author's name, the article title, an abstract of no more than 10 lines, and keywords (no more than 5), both in the language of the article and in English. The author's postal address, e-mail address and contact telephone number must also be included.
2. The text of the article (following the editorial guidelines in this document).
Articles may be no longer than 15 pages. Their content must be original and must not have been published previously.
3.2. General text format
The article text is set in Times 11 with single line spacing.
The article opens with its title in lower case, bold and centred. After a blank line, the next line gives the first name and surname of the author or authors in lower-case roman. The following line gives the name of the academic institution to which the author or authors belong, in lower-case italics. If there is more than one author and they do not all belong to the same institution, each additional author's name and institution go on further lines. On the next line, give the author's e-mail address in lower-case roman, without hyperlink underlining. Follow the example below and the template:
Article title
Author 1 name
Name of the academic institution
[email protected]
Author 2 name
Name of the academic institution
[email protected]
The footer of the first page of each article gives the citation of the article, with its doi, following the QF guidelines (see section 3.6.2.2). It is set in Times 9:
Basic form:
Surname(s), First name. Year. Article title. Publication title volume(issue): first page-last page. doi: http://dx.doi.org/xx.xxxx.xxxx.xx
Example:
Escandell, Dari & Marcillas, Isabel. 2011. Els límits de l’espai autobiogràfic en la narrativa breu de Mercè Rodoreda. Quaderns de Filologia. Estudis Literaris XVI: 101-123. doi: http://dx.doi.org/10.1037/qf234
Next, include the abstracts of the article in the spaces provided in the template: one abstract in Catalan or Spanish and another in English. If the article is written in a different language, include one abstract in the language of the article and another in English.
The abstract is set in Times 10, lower-case roman, justified, with the word resum (abstract) in bold. The abstract must not exceed 10 lines. Leave a blank line after the abstract and add the keywords (no more than 5), separated by semicolons (;) and ending with a full stop (.). Follow the example below and the template:
Resum
Text del resum. Text del resum. Text del resum. Text del resum. Text del resum. Text del resum. Text del resum. Text del resum. Text del resum. Text del
resum. Text del resum. Text del resum. Text del resum. Text del resum. Text
del resum. Text del resum. Text del resum. Text del resum. Text del resum.
Paraules clau: paraula 1; paraula 2; paraula 3; paraula 4; paraula 5.
Abstract
Text of the abstract. Text of the abstract. Text of the abstract. Text of the
abstract. Text of the abstract. Text of the abstract. Text of the abstract. Text
of the abstract. Text of the abstract. Text of the abstract. Text of the abstract.
Text of the abstract. Text of the abstract. Text of the abstract. Text of the
abstract.
Keywords: keyword 1; keyword 2; keyword 3; keyword 4; keyword 5.
The text of the article begins on the next odd-numbered page.
Section and subsection headings are set in lower-case roman, on a line separate from the preceding and following paragraphs, and numbered with Arabic numerals (1., 1.1., etc.). The format of the different levels is as follows:
1. Level 1 heading [Times 11, bold]
1.1. Level 2 subheading [Times 11, italics]
1.1.1. Level 3 subheading [Times 11, roman]
1.1.1.1. (subsequent levels [Times 11, roman])
(Note that the numbering is always in roman.)
The body text is set in Times 11, with a first-line indent of 0.5 cm except in the first paragraph of each section or subsection. In general, texts follow the typographic conventions of the language in which they are written:
Example:
1.3. La reformulación como actividad metadiscursiva
La reformulación ha sido estudiada desde diferentes ópticas. Para Fuchs
(1994) la paráfrasis es un tipo de reformulación: hablar de reformulación es
hablar de sentido. La paráfrasis es una estrategia discursiva y cognitiva, que
reposa sobre la idea de invariante semántico (remitiéndonos a la lógica), y
sobre la noción de equivalencia (…)
Ahora bien, la reformulación ha seguido ocupando la atención de los
lingüistas que han distinguido la auto-reformulación y la hetero-reformulación (reformulación de nuestro propio discurso vs la reformulación del
otro, que predomina en la conversación) (…).
3.3. Quotations
Short quotations (one or two lines) appear within the text in English double quotation marks (“...”):
... “L’any d’edició de l’obra se sap actualment que és el 1872” (Rubio, 1994:
23).
Tal com observa Rubio (1994: 23), “l’any d’edició de l’obra...”.
Quotations of more than two lines are set as a separate paragraph, without quotation marks, separated by one blank line from the preceding and following paragraphs. They are set in Times 10 with a left indent of 1 cm. See the example or follow the template:
Article text. Article text. Article text. Article text. Article text. Article text. Article text. Article text. Article text. Article text. Article text.
Quoted text. Quoted text. Quoted text. Quoted text. Quoted text. Quoted text. Quoted text. Quoted text. Quoted text. Quoted text. Quoted text. Quoted text. Quoted text. Quoted text. Quoted text. Quoted text. Quoted text. Quoted text. Quoted text. Quoted text. Quoted text. Quoted text. Quoted text. Quoted text.
Article text. Article text. Article text. Article text. Article text. Article text. Article text. Article text. Article text. Article text. Article text.
Omissions are indicated by three dots in square brackets: [...].
3.4. Footnotes
Critical (non-bibliographic) notes appear as footnotes, single-spaced, justified and in Times 9.
Note calls are marked with superscript Arabic numerals placed after the word they refer to. If that word is followed by a punctuation mark, the call goes before the punctuation mark and after any closing quotation mark:
...varietats orientals d’aquesta llengua¹: amosta², escalfar i “gallofa”³.
3.5. In-text bibliographic references
In-text bibliographic references follow these models:
• If the author is not part of the sentence:
Basic form:
text of the article (author surname, year: page)
Example:
... com ja s’ha observat en altres treballs (Rubio, 1994: 23).
• If the reference covers a whole work, or the author is named within the sentence, the page numbers may be omitted:
Basic form:
text of the article author surname (year) text of the article (…)
Example:
... es la oposición que mencionaba Lliteras (2002) como característica del
momento.
• If the cited work has more than two authors, the first mention gives all the surnames. Subsequent mentions give only the first author's surname followed by et alii.
Example:
El término inteligencia emocional lo utilizaron por primera vez Salovey y
Mayer en 1990 (Álvarez Manilla, Valdés Krieg & Curiel de Valdés, 2006).
(…)
En cuanto al desempeño escolar, Álvarez Manilla et alii (2006) encontraron
que la inteligencia emocional no incide en el mismo.
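The naming rules for in-text references can be sketched in code. The following minimal Python function is an illustration only, not a tool supplied by Quaderns de Filologia. The name `in_text_citation` and its inputs (a list of surnames, a year, a first-mention flag and an optional page) are hypothetical. It builds only the parenthetical form. The narrative form, as in Álvarez Manilla et alii (2006), would need a separate variant.

```python
def in_text_citation(surnames, year, first_mention, page=None):
    """Return a parenthetical in-text citation such as "(Rubio, 1994: 23)".

    surnames: author surnames in the order they appear in the work.
    first_mention: True the first time the work is cited in the article.
    page: optional page reference; omit it when citing a whole work.
    """
    if len(surnames) > 2 and not first_mention:
        # Later mentions of works with more than two authors:
        # first author's surname followed by "et alii".
        authors = f"{surnames[0]} et alii"
    elif len(surnames) == 1:
        authors = surnames[0]
    else:
        # Every surname: commas between them, "&" before the last one.
        authors = ", ".join(surnames[:-1]) + " & " + surnames[-1]
    citation = f"{authors}, {year}"
    if page is not None:
        citation += f": {page}"
    return f"({citation})"


print(in_text_citation(["Rubio"], 1994, True, page=23))
# → (Rubio, 1994: 23)
print(in_text_citation(["Álvarez Manilla", "Valdés Krieg", "Curiel de Valdés"], 2006, True))
# → (Álvarez Manilla, Valdés Krieg & Curiel de Valdés, 2006)
print(in_text_citation(["Álvarez Manilla", "Valdés Krieg", "Curiel de Valdés"], 2006, False))
# → (Álvarez Manilla et alii, 2006)
```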
• If more than one author is cited outside the sentence, the references are separated by semicolons:
Basic form:
(…) text of the article (surname, year; surname, year; surname & surname, year). Text of the article (...)
Example:
... parece demostrado a través de numerosos trabajos (Haverkate, 1994; Briz, 1995; Calsamiglia & Tusón, 2002; Ballesteros, 2002).
• If several works by the same author are cited in the same reference, the years of publication are separated by commas:
Basic form:
(...) text of the article (author surname(s), year, year, year). Text of the article (...)
Example:
... com es recull a les versions més recents de la teoria (Wilson, 2007, 2010,
2012).
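Combining several references in one parenthesis involves two separators: commas between years of the same author, and semicolons between different authors. The following minimal Python sketch uses a hypothetical `combined_citation` function. It assumes it receives (author label, year) pairs already in the intended order.

```python
def combined_citation(works):
    """Merge several (author, year) references into one parenthetical citation.

    works: (author label, year) pairs in the order they should be cited.
    Consecutive works by the same author share the author label and have
    their years separated by commas; different authors are separated by
    semicolons.
    """
    groups = []  # each item: [author label, list of years as strings]
    for author, year in works:
        if groups and groups[-1][0] == author:
            groups[-1][1].append(str(year))
        else:
            groups.append([author, [str(year)]])
    parts = [f"{author}, {', '.join(years)}" for author, years in groups]
    return "(" + "; ".join(parts) + ")"


print(combined_citation([("Wilson", 2007), ("Wilson", 2010), ("Wilson", 2012)]))
# → (Wilson, 2007, 2010, 2012)
print(combined_citation([("Haverkate", 1994), ("Briz", 1995), ("Ballesteros", 2002)]))
# → (Haverkate, 1994; Briz, 1995; Ballesteros, 2002)
```

The function only merges consecutive entries, so it leaves the citation order entirely to the author.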
3.6. Bibliography
3.6.1. General considerations
• The font is Times 10.
• The works cited appear at the end, in alphabetical order by the surname of the first author.
• Works by the same author are listed chronologically, repeating the surname and first name in each reference. Works by the same author from the same year are distinguished by adding a lower-case roman letter to the year [e.g. (1999a) (1999b)] and are listed in chronological order.
• Do not use the abbreviation et alii for co-authored works, however many authors there are. (As noted above, it may be used in the body of the article for in-text references, but not in the bibliography.)
• To cite works that are forthcoming or in preparation, give the corresponding indication instead of the year (e.g. in press, in preparation).
• If the date of the edition used does not match that of the original, or the edition used is not the first and you wish to record its date, use square brackets with the original date first: ([1972] 1998).
• If you wish to record which edition of a work you used, add it after the title, in parentheses, in lower-case roman and abbreviated (e.g. 2ª ed.).
• Each reference is formatted with a hanging indent of 1 cm.
• To indicate which volume of a multi-volume work was used, add it in Roman numerals after the title, separated by a space (for example: Història de la Literatura II).
• If you wish to record how many volumes a work consists of, give the number in Arabic numerals after the publisher, without marking the plural (for example: (3 vol.)).
• Co-editions are indicated with a slash (for example: Barcelona/València: PAM/IIFV).
• If desired, a list of the texts under study or used as the corpus of analysis may be added after the bibliography. These texts are referenced following the criteria set out below.
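The ordering rules for the reference list can be sketched as follows: alphabetical by author, chronological within each author, and letters added to repeated years. This is a hypothetical Python helper, not part of the QF template. It assumes each entry is a dict with "author" and "year" keys. Alphabetical order is approximated by ignoring case and accents. This is not full Spanish or Catalan collation: ñ, for instance, is sorted as n. Works in press or in preparation are not handled.

```python
import unicodedata
from collections import Counter

LETTERS = "abcdefghijklmnopqrstuvwxyz"


def _sort_key(text):
    """Case- and accent-insensitive key (an approximation of alphabetical order)."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def order_bibliography(entries):
    """Sort reference entries and add year letters such as 1999a, 1999b.

    entries: dicts with hypothetical keys "author" (the author string as it
    will be printed) and "year" (an int). Entries by the same author in the
    same year must already be in the intended order; the sort is stable, so
    that order is kept.
    """
    ordered = sorted(entries, key=lambda e: (_sort_key(e["author"]), e["year"]))
    totals = Counter((e["author"], e["year"]) for e in ordered)
    used = Counter()
    result = []
    for e in ordered:
        key = (e["author"], e["year"])
        label = str(e["year"])
        if totals[key] > 1:
            label += LETTERS[used[key]]  # a, b, c... for repeated author-year pairs
            used[key] += 1
        result.append({**e, "year_label": label})
    return result


# Hypothetical sample entries, for illustration only.
refs = [
    {"author": "Wilson, A.", "year": 2010},
    {"author": "Álvarez, B.", "year": 2006},
    {"author": "Wilson, A.", "year": 1999},
    {"author": "Wilson, A.", "year": 1999},
]
for ref in order_bibliography(refs):
    print(ref["author"], ref["year_label"])
```

With the sample data, the loop prints the Álvarez entry first, followed by the Wilson entries labelled 1999a, 1999b and 2010.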
Electronic documents
• Except for theses and electronic books, do not give the name of the database in which the resource was found.
• Include the date on which the article was accessed or downloaded.
• Do not put a full stop (.) after the web address (URL).
• Do not underline hyperlinks.
Digital Object Identifier (doi)
• A doi is an alphanumeric string assigned to an electronic document by the institution that manages its publication.
• The doi identifies the content of the article as a unique digital object.
• It provides a permanent link for locating the article on the internet.
• Quaderns de Filologia requires the doi to be included in the reference of any cited resource that has one (see the guidelines below).
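The doi format required in references can be produced mechanically. The following minimal Python sketch uses a hypothetical `doi_link` helper. It accepts either a bare doi or one copied as a doi.org URL and returns it in the "doi: http://dx.doi.org/..." form used in the models below.

```python
URL_PREFIXES = ("https://doi.org/", "http://doi.org/",
                "https://dx.doi.org/", "http://dx.doi.org/")


def doi_link(doi):
    """Format a doi as it appears in QF references: "doi: http://dx.doi.org/<doi>".

    Accepts a bare doi ("10.1016/...") or one already written with a "doi:"
    label or as a doi.org / dx.doi.org URL.
    """
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[len("doi:"):].strip()
    for prefix in URL_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not doi.startswith("10."):
        # Every doi begins with the directory indicator "10."
        raise ValueError(f"not a doi: {doi!r}")
    # No full stop after the URL, as the guidelines require for web addresses.
    return f"doi: http://dx.doi.org/{doi}"


print(doi_link("10.1016/j.pragma.2012.03.009"))
# → doi: http://dx.doi.org/10.1016/j.pragma.2012.03.009
```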
3.6.2. Articles in journals and periodicals
3.6.2.1. Printed article
One author:
Basic form:
Surname(s), First name. Year. Article title. Publication title volume(issue): first page-last page.
Example:
Chillón, Lluís-Albert. 1995. Discurs periodístic i fraseologia. Caplletra 18:
165-176.
Two authors:
Basic form:
Surname(s), First name & Surname(s), First name. Year. Article title. Publication title volume(issue): first page-last page.
Example:
Bishop, John E. & Brousseau, Kevin. 2011. The end of the Jesuit lexicographic tradition in Nêhirawêwin: Jean-Baptiste de la Brosse and his
compilation of the Radicum Montanarum Silva (1766–1772). Historiographia Linguistica 38(3): 293-324.
Three authors:
Basic form:
Surname(s), First name; Surname(s), First name & Surname(s), First name. Year. Article title. Publication title volume(issue): first page-last page.
Example:
Vila, Ignasi; Oller, Judith & Fresquet, Montserrat. 2008. Una anàlisi comparativa del coneixement de català de l’alumnat castellanoparlant autòcton i l’alumnat hispà en finalitzar l’educació infantil a Catalunya. Caplletra 45: 203-228.
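The author lists in these models all follow one pattern: "Surname, First name", semicolons between authors, and an ampersand before the last one. The following minimal Python sketch implements that pattern. It uses a hypothetical `bibliography_authors` function that takes (surname, first name) pairs.

```python
def bibliography_authors(authors):
    """Join (surname, first name) pairs as in the reference-list models.

    Each author is written "Surname, First name"; authors are separated by
    semicolons, and the last one is introduced by " & ". The abbreviation
    et alii is never used here, whatever the number of authors.
    """
    names = [f"{surname}, {first}" for surname, first in authors]
    if len(names) == 1:
        return names[0]
    return "; ".join(names[:-1]) + " & " + names[-1]


print(bibliography_authors([("Vila", "Ignasi"), ("Oller", "Judith"), ("Fresquet", "Montserrat")]))
# → Vila, Ignasi; Oller, Judith & Fresquet, Montserrat
```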
3.6.2.2. Electronic articles with doi:
One author:
Basic form:
Surname(s), First name. Year. Article title. Publication title volume(issue): first page-last page. doi: http://dx.doi.org/xx.xxxx.xxxx.xx
Example:
Sifianou, Maria. 2012. Disagreements, face and politeness. Journal of Pragmatics 44(12): 1554-1564. doi: http://dx.doi.org/10.1016/j.pragma.2012.03.009
Two authors:
Basic form:
Surname(s), First name & Surname(s), First name. Year. Article title. Publication title volume(issue): first page-last page. doi: http://dx.doi.org/xx.xxxx.xxxx.xx
Example:
De Wit, Astrid & Brisard, Frank. 2013. A cognitive grammar account of the semantics of the English present progressive. Journal of Linguistics 45(7): 1-42. doi: http://dx.doi.org/10.1017/S0022226713000169
Three authors:
Basic form:
Surname(s), First name; Surname(s), First name & Surname(s), First name. Year. Article title. Publication title volume(issue): first page-last page. doi: http://dx.doi.org/xx.xxxx.xxxx.xx
Example:
Noh, Eun-Ju; Hyeree, Choo & Sungryong, Koh. 2013. Processing metalinguistic negation: Evidence from eye-tracking experiments. Journal of Pragmatics 57: 1-18. doi: http://dx.doi.org/10.1016/j.pragma.2013.07.005
3.6.2.3. Online articles (without doi):
One author:
Basic form:
Surname(s), First name. Year. Article title. Publication title volume(issue): first page-last page. http://www.aaaaa.com [Accés dd/mm/yyyy].
Example:
Martínez Lirola, María. 2008. La importancia de los nuevos modos de evaluación en el EEES. Una aproximación a las ventajas del uso del Portfolio. Revista de Enseñanza Universitaria 31: 62-72. http://rua.ua.es/dspace/bitstream/10045/17235/1/6MartinezLirola.pdf
Two authors:
Basic form:
Surname(s), First name & Surname(s), First name. Year. Article title. Publication title volume(issue): first page-last page. http://www.aaaaa.com [Accés dd/mm/yyyy].
Example:
Bustelo Ruesta, Carlota & García-Morales Huidobro, Elisa. 2000. La consultoría en la organización de la información. El Profesional de la Información 9: 4-10. http://publishersnet.swets.nl/direct/issue?/title=2246163 [Accés 10/05/2009].
Three authors:
Basic form:
Surname(s), First name; Surname(s), First name & Surname(s), First name. Year. Article title. Publication title volume(issue): first page-last page. http://www.aaaaa.com [Accés dd/mm/yyyy].
Example:
Pozo Muñoz, Carmen; Giménez Torres, Mª Luisa & Bretones Nieto, Blanca. 2009. La evaluación de la calidad docente en el nuevo marco del EEES: Un estudio sobre la encuesta de opinión del programa Docentia-Andalucía. Revista Educación 11: 43-64. http://rabida.uhu.es/dspace/bitstream/handle/10272/4905/b15643773.pdf?sequence=3
3.6.2.4. Article in a periodical publication:
Magazine:
Basic form:
Surname(s), First name. Year (day and month). Article title. Publication title volume(issue): first page-last page.
Example:
Viadero, Daniel. 2009 (12 de setembre). Social-skills programs found to
yield gains in academic subjects. Education Week 27(16): 1-15.
Newspaper:
Basic form:
Surname(s), First name. Year (day and month). Article title. Publication title, p. xx.
Example:
Patarroyo, Manuel. 2011 (19 de juny). El parásito de la malaria es mi confidente. El País, p. 64.
Online newspaper:
Basic form:
Surname(s), First name. Year (day and month). Article title. Publication title. http://www.xxx.com
Example:
Martínez, Francesc. 2013 (3 d’octubre). Crisi i futur de la televisió pública.
El Punt-Avui. http://www.elpuntavui.cat/noticia/article/5-cultura/19cultura/682407-crisi-i-futur-de-la-televisio-publica.html
3.6.3. Books
3.6.3.1. Printed book
One author:
Basic form:
Surname(s), First name. Year. Book title(: Subtitle)*. Place of publication: Publisher.
(* the parentheses mark the subtitle as optional, for publications that have one).
Example:
Spang, Kurt. 2003. Géneros literarios. Madrid: Síntesis.
Two authors:
Basic form:
Surname(s), First name & Surname(s), First name. Year. Book title(: Subtitle). Place of publication: Publisher.
Example:
Allan, Keith & Burridge, Kate. 2006. Forbidden words: Taboo and censoring of language. Cambridge: Cambridge University Press.
Three authors:
Basic form:
Surname(s), First name; Surname(s), First name & Surname(s), First name. Year. Book title(: Subtitle). Place of publication: Publisher.
Example:
Wagner, Emma; Bech, Svend & Martínez, Jesús M. 2002. Translating for
the European Union institutions. Manchester: St. Jerome Publishing.
3.6.3.2. Electronic book with doi:
One author:
Basic form:
Surname(s), First name. Year. Book title(: Subtitle). Place of publication: Publisher. [Database]. doi: http://dx.doi.org/xx.xxxx.xxxx.xx
Example:
Rapaport, Herman. 2011. The literature theory toolkit: A compendium of
concepts and methods. West Sussex: Wiley-Blackwell. [Versió de
Wiley-Online]. doi: http://dx.doi.org/10.1002/9781444395693
Two authors:
Basic form:
Surname(s), First name & Surname(s), First name. Year. Book title(: Subtitle). Place of publication: Publisher. [Database]. doi: http://dx.doi.org/xx.xxxx.xxxx.xx
Example:
Montero, Maritza & Sonn, Christopher C. 2009. Psychology of liberation: Theory and applications. New York: Springer Science & Business Media. [Versió de Springer.com]. doi: http://dx.doi.org/10.1007/978-0-387-85784-8
Three authors:
Basic form:
Surname(s), First name; Surname(s), First name & Surname(s), First name. Year. Book title(: Subtitle). Place of publication: Publisher. [Database]. doi: http://dx.doi.org/xx.xxxx.xxxx.xx
Example:
Hardcastle, William J.; Laver, John & Gibbon, Fionna E. 2010. The handbook of phonetic sciences. Oxford: Blackwell. [Versió de Wiley-Online]. doi: http://dx.doi.org/10.1002/9781444317251
3.6.3.3. Online book (without doi):
[Note that the basic forms refer to online books accessed institutionally through a digital platform. For other kinds of access, omit the identification data (docID: identification number). See the example in 3.6.4.3.]
One author:
Basic form:
Surname(s), First name. Year. Book title(: Subtitle). Place of publication: Publisher. docID: document identification number. http://www.aaaaa.com [Accés dd/mm/yyyy].
Example:
Silva, Reinaldo F. 2011. Portuguese American literature. Penrith: Humanities E-books, LLP. docID: 1056727. http://site.ebrary.com [Accés
19/09/2013].
Two authors:
Basic form:
Surname(s), First name & Surname(s), First name. Year. Book title(: Subtitle). Place of publication: Publisher. docID: document identification number. http://www.aaaaa.com [Accés dd/mm/yyyy].
Example:
Valsalobre, Pep & Rossich, Albert. 2007. Literatura i cultura catalanes (segles xvii-xviii). Barcelona: Editorial UOC. docID: 10566824. http://
site.ebrary.com [Accés 29/08/2013].
Three authors:
Basic form:
Surname(s), First name; Surname(s), First name & Surname(s), First name. Year. Book title(: Subtitle). Place of publication: Publisher. docID: document identification number. http://www.aaaaa.com [Accés dd/mm/yyyy].
Example:
Benito, Jesús; Manzanas, Anna M. & Simal, Begoña. 2009. Critical approaches to ethnic American literature. Uncertain mirrors: Magical realism in US ethnic literatures. Amsterdam: Rodopi. docID: 10380441. http://site.ebrary.com [Accés 10/10/2011].
3.6.4. Book with editor or coordinator:
3.6.4.1. Printed book
One editor/coordinator:
Basic form:
Surname(s), First name (ed./coord.)*. Year. Book title. Place of publication: Publisher.
[*Other options: Director (dir.) or Compiler (comp.)]
Example:
Bou Franch, Patricia (ed.). 2006. Ways into discourse. Granada: Comares.
Monereo, Carles (coord.). 2000. Estrategias de aprendizaje. Madrid: Visor.
Two editors/coordinators:
Basic form:
Surname(s), First name & Surname(s), First name (ed./coord.)*. Year. Book title. Place of publication: Publisher.
[*Other options: Director (dir.) or Compiler (comp.). Never mark the plural.]
Example:
Bravo, Diana & Briz, Antonio (ed.). 2004. Pragmática sociocultural: Estudios sobre el discurso de la cortesía en español. Barcelona: Ariel.
Carranza, José A. & Ato, Esther (coord.). 2010. Manual de prácticas de
psicología del desarrollo. Murcia: Ediciones de la Universidad de
Murcia.
Three editors/coordinators:
Basic form:
Surname(s), First name; Surname(s), First name & Surname(s), First name (ed./coord.). Year. Book title. Place of publication: Publisher.
Example:
Blas Arroyo, José Luis; Casanovas, Manuela & Velando, Mónica (ed.). 2006.
Discurso y sociedad: Contribuciones al estudio de la lengua en el
contexto social. Castellón: Universitat Jaume I.
Oltramari, Alessandro; Vossen, Piek & Qin, Lu (coord.). 2013. New trends
of research in ontologies and lexical resources: Ideas, projects and
systems. Heidelberg: Springer.
3.6.4.2. Electronic book with doi:
One editor/coordinator:
Basic form:
Surname(s), First name (ed./coord.). Year. Book title. Place of publication: Publisher. [Database]. doi: http://dx.doi.org/xx.xxxx.xxxx.xx
Example:
Romero-Trillo, Jesús (ed.). 2012. Pragmatics and prosody in English language teaching. Netherlands: Springer. [Versió de Springer.com]. doi: http://dx.doi.org/10.1007/978-94-007-3883-6
Two editors/coordinators:
Basic form:
Surname(s), First name & Surname(s), First name (ed./coord.). Year. Book title. Place of publication: Publisher. [Database]. doi: http://dx.doi.org/xx.xxxx.xxxx.xx
Example:
Boehmer, Elleke & Morton, Stephen (ed.). 2009. Terror and the postcolonial: A concise companion. West Sussex: Wiley-Blackwell. [Versió de Wiley-Online]. doi: http://dx.doi.org/10.1002/9781444310085
Three editors/coordinators:
Basic form:
Surname(s), First name; Surname(s), First name & Surname(s), First name (ed./coord.). Year. Book title. Place of publication: Publisher. [Database]. doi: http://dx.doi.org/xx.xxxx.xxxx.xx
Example:
Clark, Andy; Ezquerro, Jesús & Larrazábal, Jesús M. (ed.). 1996. Philosophy and cognitive science: Categories, consciousness and reasoning. Netherlands: Springer. [Versió de Springer.com]. doi: http://dx.doi.org/10.1007/978-94-015-8731-0
3.6.4.3. Online book (without doi):
One editor/coordinator:
Basic form:
Surname(s), First name (ed./coord.). Year. Book title. Place of publication: Publisher. docID: document identification number. http://www.xxx.com [Accés dd/mm/yyyy].
Example:
Ciapuscio, Guiomar E. (ed.). 2009. De la palabra al texto: Estudios lingüísticos del español. Buenos Aires: Eudeba. http://core.cambeiro.com.ar/0-4222-5.pdf [Accés 12/04/2010].
Two editors/coordinators:
Basic form:
Surname(s), First name & Surname(s), First name (ed./coord.). Year. Book title. Place of publication: Publisher. docID: document identification number. http://www.aaaaa.com [Accés dd/mm/yyyy].
Example:
Barletta, Norma & Chamorro, Diana (ed.). 2011. El texto escolar y el aprendizaje: Enredos y desenredos. Barranquilla: Universidad del Norte.
docID: 10485834. http://site.ebrary.com [Accés 12/06/2013].
Three editors/coordinators:
Basic form:
Surname(s), First name; Surname(s), First name & Surname(s), First name (ed./coord.). Year. Book title. Place of publication: Publisher. docID: document identification number. http://www.aaaaa.com [Accés dd/mm/yyyy].
Example:
Newman, John; Baayen, Harald R. & Rice, Sally (ed.). 2010. Corpus-based studies in language use, language learning and language documentation. Amsterdam & New York: Rodopi. http://bvbr.bibbvb.de:8991/F?func=service&doc_library=BVB01&doc_number=024531245&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA [Accés 13/09/2013].
3.6.5. Book chapters or entries in reference works:
3.6.5.1. Book chapters
One author and one or more editors of the book:
(Add as many authors and/or editors as needed, following the models for journal articles and books.)
Basic form:
Surname(s), First name. Year. Chapter title. In Surname(s), First name (ed./coord.) Book title. Place of publication: Publisher, first page-last page.
Surname(s), First name. Year. Chapter title. In Surname(s), First name & Surname(s), First name (ed./coord.) Book title. Place of publication: Publisher, first page-last page.
Surname(s), First name. Year. Chapter title. In Surname(s), First name; Surname(s), First name & Surname(s), First name (ed./coord.) Book title. Place of publication: Publisher, first page-last page.
Example:
Schegloff, Emmanuel. 1982. Discourse as an interactional achievement. In
Tannen, Deborah (ed.) Analysing discourse: Text and talk. Washington DC: Georgetown University Press, 73-93.
Kerbrat-Orecchioni, Catherine. 2004. ¿Es universal la cortesía? En Bravo,
Diana & Briz, Antonio (ed.) Pragmática sociocultural: Estudios sobre el discurso de cortesía en español. Barcelona: Ariel, 39-54.
Note: The word that introduces the volume containing the work (e.g. “In”, “En”, etc.) depends on the language of the article.
3.6.5.2. Entry with an author in a printed reference work:
Basic form:
Surname(s), First name. Year. Entry title. In Surname(s), First name (ed.) Title of the reference work. Place of publication: Publisher.
Example:
Isidore, Ian. 1998. African-American literature: Central and South America. Encyclopedia of Latin American Literature. Chicago, IL: Fitzroy Dearborn Publishers.
3.6.5.3. Entry with an author in an online reference work:
Basic form:
Surname(s), First name. Year. Entry title. In Surname(s), First name (ed.) Title of the reference work. http://www.aaaaa.com [Accés dd/mm/yyyy].
Surname(s), First name. Year. Entry title. In Surname(s), First name (ed.) Title of the reference work. doi: http://dx.doi.org/xxx.xxxx.xxxxx
Example:
Graham, George. 2008. Behaviourism. En Zalta, Edward N. (ed.) The Stanford Encyclopedia of Philosophy. http://plato.stanford.edu/entries/behaviorism [Accés 23/11/2009].
Palfreyman, Mark & Jorgensen, Erik. 2009. In vivo analysis of membrane fusion. En Wiley InterScience Encyclopedia of Life Sciences. doi: http://dx.doi.org/10.1002/9780470015902.a0020891
3.6.5.4. Entry without an author in an online reference work:
Basic form:
Entry name. (s.a.)*. In Title of the reference work. http://www.aaaaa.com
(*s.a.: no year given)
Example:
Feminism. (s.a.). En Encyclopaedia Britannica. http://global.britannica.com/EBchecked/topic/724633/feminism
3.6.6. Doctoral theses, research projects or master's dissertations
Printed doctoral theses:
Basic form:
Surname(s), First name. Year. Thesis title (Doctoral thesis). Place: University/Institution – Department/Faculty/Institute.
Example:
Vegara Fabregat, Laura. 2013. La metáfora en los textos jurídicos y su traducción (Tesis Doctoral). Alacant: Universitat d’Alacant - Departament de Filologia Anglesa.
Online doctoral theses:
Basic form:
Surname(s), First name. Year. Thesis title (Doctoral thesis). Place: University/Institution – Department/Faculty/Institute. [Database]. http://www.aaaaa.com
Example:
Solís García, Inmaculada. 2011. La utilidad del concepto de Referencia en la didáctica del Español Lengua Extranjera (Tesis Doctoral). Oviedo: Universidad de Oviedo - Departamento de Filología Española. [Base de dades Teseo]. https://www.educacion.gob.es/teseo/imprimirFicheroTesis.do?fichero=29251
Doctoral theses in commercial databases:
Basic form:
Surname(s), First name. Year. Thesis title (Doctoral thesis). [Database] (identification number).
Example:
Santini Rivera, Manuel. 1998. The Effects of Various Kinds of Verbal Feedback on the Performance of the Selected Motor Development Skills
in Adolescent Males with Down Syndrome (Tesis Doctoral). [Bases de
dades ProQuest Dissertations & Theses] (AAT 9832765).
For research projects, minor theses or master's dissertations, follow the model for theses, changing the document type given in parentheses after the title of the work.
3.6.7. Technical and research reports
With author(s):
Basic form:
Surname(s), First name. Year. Report title (assigned number). Institution that commissioned the report. Place of publication: Publisher. http://www.aaaaa.com [Accés dd/mm/yyyy].
Example:
González García, Maria del Mar & Corredera González, Azucena. 2004. Evaluación de la enseñanza y aprendizaje de la lengua inglesa: Educación secundaria obligatoria 2001 – Informe final. Ministerio de Educación y Ciencia. Instituto Nacional de Evaluación y Calidad del Sistema Educativo (INECSE). Madrid: Subdirección General de Información y Publicaciones.
With a corporate author, institution or organisation:
Basic form:
Name of the institution. Year. Report title (assigned number). Place of publication: Publisher. http://www.aaaaa.com [Accés dd/mm/yyyy].
Example:
Instituto Nacional de Evaluación Educativa. 2013. Panorama de la educación. Indicadores de la OCDE 2013. Informe español. Madrid: Ministerio de Educación, Cultura y Deporte. http://www.mecd.gob.es/dctm/inee/internacional/panoramadelaeducacion2013informe-espanol.pdf?documentId=0901e72b816996b6 [Accés 12/09/2013].
3.6.8. Contributions to conferences and meetings
3.6.8.1. Publication in proceedings
Basic form:
Surname(s), First name. Year. Title of the contribution. In Surname, First name (ed.) Title of the conference proceedings. Place of publication: Publisher, first page-last page.
Example:
Yates, Alan. 1998. Sobre les característiques (sub)genèriques de la novel·la curta o nouvelle. En Alonso, Vicent; Bernal, Assumpció & Gregori, Carme (ed.) Actes del I Simposi Internacional de Narrativa Breu. Barcelona: Publicacions de l’Abadia de Montserrat, 9-40.
(...)
Bernal, María. 2005. Hacia una categorización sociopragmática de la cortesía, descortesía y anticortesía en conversaciones españolas de registro coloquial. En Bravo, Diana (ed.) Actas del Primer Coloquio Edice: La perspectiva no etnocentrista de la cortesía. Estocolmo: Universidad de Estocolmo, 365-398.
If three or more works from the same volume (a book or conference proceedings) are cited, the citation may be simplified as follows, provided that the list also includes a full reference to the volume, following the rules for citing books:
Example:
Yates, Alan. 1998. Sobre les característiques (sub)genèriques de la novel·la curta o nouvelle. En Vicent Alonso, Assumpció Bernal i Carme Gregori (ed.), 9-40.
(...)
Alonso, Vicent; Bernal, Assumpció & Gregori, Carme (ed.). 1998. Actes del I Simposi Internacional de Narrativa Breu. Barcelona: Publicacions de l’Abadia de Montserrat.
3.6.8.2. Unpublished papers
Basic form:
Surname(s), First name. Year. Title of the contribution. Paper/Talk presented at the Name of the Conference. Venue: dates of the conference.
Example:
Gil-Bardají, Ana & Minett-Wilkinson, Jacqueline. 2011. Traducción e interpretación en los servicios públicos de Cataluña: Resultados de un
estudio empírico. IV Congreso Internacional de Traducción e Interpretación en los Servicios Públicos. Alcalá de Henares: Universidad
de Alcalá, 13-15 de abril.
3.6.9. Other sources and electronic resources
3.6.9.1. Web pages
Basic form:
Surname, First name / Editor. Year of last update. Title of the web page. Place of publication: Publisher*. http://www.aaaaa.com [Accés dd/mm/yyyy].
(*if available)
Example:
Modern Language Association. 2003. MLA Style. http://www.mla.org/style
[Accés 01/05/2012].
3.6.9.2. CD-ROM i DVD
Forma bàsica:
Cognom, Nom / Editor. Any. Títol del recurs. [Tipus de suport]. Lloc de
publicació: Editorial (si està disponible).
Exemple:
Real Academia Española (RAE). 2001. Nuevo tesoro lexicográico de la lengua española. [DVD-ROM]. Madrid: Espasa.
For film productions, short films, documentaries, etc., the director and the screenwriter (guió) may be included, as follows:
Basic form:
Surname, First name (director) & Surname, First name (guió). Year. Title of the production. [Type of medium]. Place of distribution: Studio (if available).
Example:
Parker, Oliver (director) & Finlay, Toby (guió). 2010. El retrato de Dorian
Gray. [DVD]. Madrid: Aurum producciones.
3.6.9.3. Other online sources
Use the following basic form for other types of online sources or resources. To specify the type of resource, give it in square brackets after the title, e.g. [Àudio podcast] (audio podcast), [comentari en línia de fòrum] (online forum comment), [missatge de llista de discussió] (discussion-list message), etc.
Basic form:
Surname(s), First name. Year (day and month). Title of the resource. [Type of resource]. En
Name of the programme or medium. http://www.aaaaa.com [Accés dd/mm/aaaa].
Example:
Marcé, Xavier. 2013 (2 d’octubre). Radiografia del panorama cultural
català. [Àudio podcast]. En Catalunya Ràdio El Cafè de la República. http://www.catradio.cat/audio/758840/Xavier-Marce [Accés 03/10/2013].
3.7. Typographic requirements
Italics may be used (in addition to publication titles) to highlight a term or to set off words or short phrases in a language other than that of the article. They must not be used for quotations.
The hyphen (-) is used where spelling requires it, and the en dash (–) is used to set off a parenthetical phrase within a sentence. In that case, if the parenthetical phrase ends at a full stop, the closing dash is omitted.
Double quotation marks (“...”) are used. When internal distinctions are needed within a quotation, single quotation marks (‘...’) are used. Where single and double quotation marks occur together, a space is left between them (...’ ”).
3.8. Graphic elements
Tables are centred, with a maximum width equal to that of the text block.
Figures must have a maximum resolution of 300 dpi. Authors must not embed them in the text of the article they submit to the editor; figures are supplied in a separate file. In the article, the author marks the position of each figure with a placeholder in square brackets and capital letters, separated from the surrounding text by a space before and after:
Example:
Article text. Article text. Article text. Article text. Article text.
Article text. Article text. Article text. Article text.
Article text. Article text. Article text. Article text.
[INSERTAR FIGURA 1]
Article text. Article text. Article text. Article text. Article text.
Article text. Article text. Article text. Article text.
Article text. Article text. Article text.
In both cases (tables and figures), the caption goes below the element, in 9-pt Times roman, centred, with 6 pt of space above it, stating the type of element and its number.
Example:
table data
table data
table data
table data
table data
table data
table data
table data
Table 1. Table caption
(For tables, the data should preferably be set in 10-pt Times.)
Example:
Figure 1. Figure caption
ojs.uv.es/index.php/qfilologia/index
REVISTA QUADERNS de FILOLOGIA
ESTUDIS LINGÜÍSTICS
Volum I (1995): Aspectes de la reflexió i de la praxi interlingüística. Ed. de
Carlos Hernández, Brigitte Lépinette i Manuel Pérez Saldanya.
Volum II (1997): Sobre l’oral i l’escrit. Ed. d’Antonio Briz, Maria Josep
Cuenca i Enric Serra.
Volum III (1998): Pragmàtica intercultural. Ed. d’Antonia Sánchez, Vicent
Salvador i Josep-Ramon Gómez.
Volum IV (1999): El contacto lingüístico en el desarrollo de las lenguas
occidentales. Ed. de Milagros Aleza, Miguel Fuster i Brigitte Lépinette.
Volum V (2000): Aprendizaje y enseñanza de una segunda lengua. Ed. de M.ª
José Coperías, Jordi Redondo i Julia Sanmartín.
Volum VI (2001): La pragmática de los conectores y las partículas modales.
Ed. de Hang Ferrer i Salvador Pons.*
Volum VII (2002): Sexe i llenguatge: la construcció lingüística de les identitats
de gènere. Ed. de José Santaemilia, Beatriz Gallardo i Julia Sanmartín.*
Volum VIII (2003): Historia de la traducción. Ed. de Brigitte Lépinette i
Antonio Melero.
Volum IX (2004): Lingüística diacrónica contrastiva. Ed. de Cesáreo Calvo,
Emili Casanova i Fco. Javier Satorre.
Volum X (2005): Les llengües d’especialitat: noves perspectives d’investigació.
Ed. de M.ª Amparo Olivares Pardo i Francisca Suau Jiménez.
Volum XI (2006): Critical Discourse Analysis. Ed. de Júlia Todolí, María
Labarta i Rosanna Dolón.
Volum XII (2007): Pragmática, discurso y sociedad. Ed. de Patricia Bou
Franch, A. Emma Sopeña Balordi i Antonio Briz.
Volum XIII (2008): Historiografía lingüística hispánica. Ed. de Brigitte
Lépinette, María José Martínez Alcalde i Emili Casanova.
Volum XIV (2009): Nuevas perspectivas en lingüística cognitiva / New
perspectives in cognitive linguistics. Ed. de M.ª Amparo Olivares i Eusebio
Llácer.
Volum XV (2010): Lexicografía en el ámbito hispánico. Ed. de Cesáreo Calvo,
Brigitte Lépinette i Jean-Claude Anscombre.
Volum XVI (2011): La comunicación escrita en el siglo XXI. Ed. de Nicolás
Estévez, José Ramón Gómez i María Carbonell.
Volum XVII (2012): Lengua y ciencia. Recepción del discurso científico.
Ed. de Julia Pinilla Martínez, Virginia González García i Cecilio Garriga
Escribano.
Volum XVIII (2013): Theoretical and empirical advances in word-formation.
Ed. de Manuel Pruñonosa-Tomás, Jesús Fernández-Domínguez i Vincent
Renner.
Volum XIX (2014): La fonética como ámbito interdisciplinar. Estudios de
fonopragmática, fonética aplicada y otras interfaces. Ed. de Antonio
Hidalgo Navarro, Carlos Hernández Sacristán i Francisco José Cantero
Serena.
Volum XX (2015): Toponímia romànica. Ed. de Germà Colón, Dieter Kremer
i Emili Casanova.
Volum XXI (2016): La figura del traductor a través de los tiempos. Ed. de
Jordi Sanchis, María Elena Jiménez i Nicolás Antonio Campos Plaza.
Volum XXII (2017): Words, Corpus and back to Words. Ed. de Miguel Fuster
Márquez i Moisés Almela.
ESTUDIS LITERARIS
Volum I (1995): Homenatge a Amelia García-Valdecasas. Volums I i II. Ed. de
Ferran Carbó, Juan Vicente Martínez, Evelio Miñano i Carmen Morenilla.
Volum II (1996): Funció didàctica i persuasió en la literatura. Ed. de Ferran
Carbó Aguilar, Evelio Miñano i Carmen Morenilla.
Volum III (1997): Dona i literatura. Ed. de Ferran Carbó, Sonia Mattalía,
Evelio Miñano i Carmen Morenilla.
Volum IV (1999): Les avantguardes i la renovació teatral. Ed. de Juan Vte.
Martínez Luciano, Carmen Morenilla, Ramon X. Rosselló i Josep Lluís
Sirera.
Volum V (2000): Homenatge a César Simón. Ed. d’Antònia Cabanilles, José
Vicente Bañuls i Arcadio López.
Volum VI (2001): Humor i literatura. Ed. de Carme Gregori, Dolores Jiménez
i Juan Vicente Martínez.
Volum VII (2002): Narrativa i història. Ed. d’Assumpció Bernal, María José
Coperías i Nuria Girona.
Volum VIII (2003): Traducción y práctica literaria en la Edad Media
Románica. Ed. de Rosanna Cantavella, Marta Haro i Elena Real.
Volum IX (2004): Tropos del cuerpo. Ed. de Nuria Girona i Manuel Asensi
Pérez.
Índex de publicacions
Volum X (2005): La recepción de los clásicos. Ed. de Rafael Beltrán Llavador,
Purificación Ribes Traver i Jorge L. Sanchis Llopis.
Volum XI (2006): Poesia i silenci. Ed. d’Antònia Cabanilles, Ferran Carbó i
Evelio Miñano.
Volum XII (2007): Cruzando la frontera. Ed. d’Ana Calero Valera, Domingo
Pujante i Miguel Teruel Pozas.
Volum XIII (2008): Traducció creativa. Ed. de Cecilia López i Jesús Tronch.
Volum XIV (2009): La ciencia ficción en los discursos culturales y medios
de expresión contemporáneos. Ed. de Adela Cortijo, Guillermo López i
Antonio Altarriba.
Volum XV (2010): La recepció del teatre contemporani. Ed. de Ramon X.
Rosselló, Josep Lluís Sirera i John London.
Volum XVI (2011): Escrituras del yo. Ed. de Brigitte Jirku, Begoña Pozo i
Ursula Schneider.
Volum XVII (2012): Las mujeres, la escritura y el poder. Ed. de Júlia Benavent
Benavent, Elena Moltó Hernández i Silvia Fabrizio-Costa.
Volum XVIII (2013): El relat: literatura, lectura i escriptura. Ed. de Gemma
Lluch, Lluís Quintana i Carmen Gregori.
Volum XIX (2014): Teatro de excepción: experiencias escénicas no
institucionales en la Europa de los siglos XX y XXI. Ed. de Juan Carlos de
Miguel y Canuto, Mireia Aragay Sastre i Juan Vicente Martínez Luciano.
Volum XX (2015): Traducción y censura: Nuevas perspectivas. Ed. de Gora
Zaragoza Ninet, Juan José Martínez Sierra i José Javier Ávila-Cabrera.
Volum XXI (2016): El universo concentracionario: escribir para no olvidar.
Ed. de Javier Lluch-Prats, Evelio Miñano Martínez i Javier Sánchez
Zapatero.
Volum XXII (2017): Revisión crítica de ediciones y traducciones de textos
en el siglo XIX. Ed. de María José Bertomeu Masiá, María José Coperías
Aguilar i Sondra Dall’Oco.
ESTUDIS DE COMUNICACIÓ
Volum I (2002): La cultura mediàtica. Modes de representació i estratègies
discursives. Ed. de Josep V. Gavaldà, Carmen Gregori i Ramon X. Rosselló.
Volum II (2004): Periodisme de complexitat: ciència, tecnologia i societat. Ed.
de Carolina Moreno Castro, Josep Lluís Gómez Mompart i Xavier Gómez
Font.
Volum III (2008): El discurs del còmic. Ed. de Pelegrí Sancho Cremades,
Carmen Gregori Signes i Santiago Renard Álvarez.
Col·lecció Anejos de Quaderns de Filologia
Anejo I. Carlos Hernández (1985): Oraciones reflejas y estructuras
actanciales en español.
Anejo II. Julio Calvo Pérez (1986): El adjetivo puro. Estructura léxica y
topología.
Anejo III. Milagros Aleza Izquierdo (1987): SER con participio de perfecto
en construcciones activas no oblicuas (español medieval).
Anejo IV. Antonio Briz Gómez (1989): Sustantivación y lexicalización en
español (La incidencia del artículo).*
Anejo V. Milagros Aleza Izquierdo (Con la colaboración de Salvador Pons
Bordería e Isabel García Izquierdo) (1992): Americanismos léxicos
en la narrativa de José María Arguedas.*
Anejo VI. Rosario Peñaranda Medina (1994): La novela modernista
hispanoamericana: estrategias narrativas.*
Anejo VII. Carme Manuel Cuenca (1994): Mito e innovación en la narrativa
estadounidense del Nuevo Sur (1879-1918).
Anejo VIII. Paul Scott Derrick (1994): Thinking for a change. Gravity’s
Rainbow and symptoms of the paradigm shift in occidental culture.
Anejo IX. Mercedes Román Fernández (1994): El español dominicano en
el siglo XVIII. Análisis lingüístico de la ‘Historia de la conquista de la isla
española de Sto. Domingo’ de L. J. Peguero.*
Anejo X. Juan Pedro Sánchez Méndez (1994): Aproximación al léxico
venezolano del siglo XVIII a través de la ‘Descripción exacta de la provincia
de Benezuela’, de J. L. Cisneros.*
Anejo XI. Francisco José López Alonso (1995): César Vallejo, Las Trazas
del narrador.*
Anejo XII. Amparo Ricós (1995): Uso, función y evolución de las
construcciones pasivas en español medieval.*
Anejo XIII. Joaquín García-Medall (1995): Casi un siglo de formación de
palabras del español (1900-1994): Guía bibliográfica.*
Anejo XIV. Marta Haro (1995): Los compendios de castigos del XIII:
estructuras narrativas y mecanismos adoctrinadores.
Anejo XV. Mercedes Román Fernández (1995): Aportaciones a los estudios
sobre el caló en España.*
Anejo XVI. Antonio Briz Gómez (coord.) (1995): La conversación coloquial.
Materiales para su estudio.
Anejo XVII. Nuria Girona Fibla (1995): Escrituras de la historia. La novela
argentina de los años 80.
Anejo XVIII. Karen Andresen et alii (eds.) (1995): Ilustración y modernidad.
La crítica de la modernidad en la Literatura alemana.
Anejo XIX. M.ª José Martínez Alcalde (1996): Morfología histórica de
los posesivos españoles.*
Anejo XX. Eusebio V. Llácer (1997): Introducción a los estudios sobre
traducción. Historia, teoría y análisis descriptivos.*
Anejo XXI. Antonio Hidalgo Navarro (1997): La entonación coloquial.
Función demarcativa y unidades de habla.*
Anejo XXII. Javier García Gibert (1997): La imaginación amorosa en la
poesía del Siglo de Oro.
Anejo XXIII. Roger González Martell y Maribel Cruz González
(1997): Adivinanzas en La Habana.
Anejo XXIV. Leonor Ruiz Gurillo (1997): Aspectos de fraseología teórica
española.*
Anejo XXV. Julia Sanmartín Sáez (1998): Lenguaje y cultura marginal. El
argot de la delincuencia.*
Anejo XXVI. Rosana Dolón (1998): La negociación como tipo discursivo.
Anejo XXVII. Salvador Pons Bordería (1998): Conexión y conectores.
Estudio de su relación en el registro informal de la lengua.
Anejo XXVIII. José Ramón Gómez Molina (1998): Actitudes lingüísticas
en una comunidad bilingüe y multilectal. Área metropolitana de Valencia.
Anejo XXIX. Juan Gómez Capuz (1998): El préstamo lingüístico. Conceptos,
problemas y métodos.
Anejo XXX. Brigitte E. Jirku, Cecilia López Roig y Herta Schulze
Schwarz (eds.) (1998): El cuerpo en la lengua y literatura alemanas: Ein
Weites Feld.
Anejo XXXI. Rosa Álvarez Sellers (ed.) (1999): Literatura portuguesa y
literatura española. Influencias y relaciones.
Anejo XXXII. Elena Ortells Montón (1999): Ficción y no ficción: La
unidad literaria en la obra de Truman Capote.*
Anejo XXXIII. Berta Raposo Hernández (ed.) (1999): Textos alemanes
primitivos. La Edad Media alemana temprana en sus testimonios literarios.
Anejo XXXIV. Mercedes Quilis Merín (1999): Orígenes históricos de la
Lengua Española.
Anejo XXXV. Javier Satorre Grau (1999): Los posesivos en español.
Anejo XXXVI. Adela García Valle (1999): El notariado hispánico
medieval: Consideraciones histórico-diplomáticas y filológicas.
Anejo XXXVII. Francisca Suau Jiménez (2000): La inferencia léxica como
estrategia cognitiva. Aplicación al discurso escrito en lengua inglesa.
Anejo XXXVIII. Fernando Martín Polo (coord.) y Eduardo Tello Torres
(eds.) (2000): Historia civil, eclesiástica de Titaguas de D. Simón Rojas
Clemente y Rubio.
Anejo XXXIX. Paloma Arroyo Vega (2001): Expresión y contenido de las
oposiciones diatéticas en el castellano del siglo XV de la Corona de Aragón.
Anejo XL. Luis Veres Cortés (2001): La narrativa del indio en la revista
Amauta.
Anejo XLI. Marcial Terrádez Gurrea (2001): Frecuencias léxicas del
español coloquial: Análisis cuantitativo y cualitativo.
Anejo XLII. Carmen Morenilla Talens y M.ª Julia Jiménez Fiol (eds.)
(2001): Desde las tierras de José Martí. Estudios lingüísticos y literarios.
Anejo XLIII. Ricardo Hernández Pérez (2001): Poesía latina sepulcral de
la Hispania romana: Estudio de los tópicos y sus formulaciones.
Anejo XLIV. Cristina Matute y Azucena Palacios (2001): El indigenismo
americano II.*
Anejo XLV. Vicente Revert Sanz (2001): Entonación y variación geográfica
en el español de América.
Anejo XLVI. José Ramón Gómez Molina (coord.) (2001): El español
hablado de Valencia. Materiales para su estudio (PRESEEA). I Nivel
sociocultural alto.
Anejo XLVII. José María García Martín (2001): La formación de los
tiempos compuestos del verbo en español medieval y clásico.*
Anejo XLVIII. Azucena Palacios y Ana Isabel García (2001): El
Indigenismo americano III.*
Anejo XLIX. Dolores Jiménez y Evelio Miñano (2002): Homenaje a Josefa
María Castellví.
Anejo L. Rafael Beltrán, Marta Haro, Josep Lluís Sirera y Antoni
Tordera (2002): Homenaje a Luis Quirante, 2 vol.
Anejo 51. Rosario Navarro Gala (2003): Lengua y cultura en la “Nueua
corónica y buen gobierno”. Aproximación al español de los indígenas en
el Perú de los siglos XVI-XVII.
Anejo 52. Jesús Peris Llorca (2003): Gauchos en el mundo del 80. Leyendo
a Eduardo Gutiérrez y Eugenio Cambaceres.
Anejo 53. Beatriz Ferrús Antón (2004): Discursos cautivos: convento, vida,
escritura.
Anejo 54. Marta Inigo Ros (2004): Cultural terms in King Alfred’s Translation
of the Consolatio Philosophiae.
Anejo 55. Guillermo López García (2004): Comunicación electoral y
formación de la opinión pública: las elecciones generales de 2000 en la
prensa española.
Anejo 56. José Ramón Gómez Molina y M.ª Begoña Gómez Devís (2004):
La disponibilidad léxica de los estudiantes preuniversitarios valencianos.
Estudio de estratificación sociolingüística.
Anejo 57. Antonio Torres Torres (2004): Procesos de americanización del
léxico hispánico.
Anejo 58. José Ramón Gómez Molina (coord.) (2005): El español hablado
de Valencia. Materiales para su estudio (PRESEEA). II Nivel sociocultural
medio.
Anejo 59. Maria Josep Marín Jordà (2005): Marcadors discursius procedents
de verbs de percepció. Argumentació implícita en el debat electoral.
Anejo 60. Dolors Palau Sampio (2005): Els estils periodístics. Maneres
diverses de veure i construir la realitat.
Anejo 61. José Ramón Gómez Molina (coord.) (2007): El español hablado
de Valencia. Materiales para su estudio (PRESEEA). III Nivel sociocultural
bajo.
Anejo 62. Hang Ferrer Mora, Herbert Josef Holzinger y Berta Raposo
Fernández (eds.) (2007): Homenaje a Herta Schulze Schwarz.
Anejo 63. Juan Carlos Tordera Yllescas (2008): Introducción a la
Gramática Léxico-Funcional.
Anejo 64. Jaume Peris Blanes (2008): Historia del testimonio chileno. De
las estrategias de denuncia a las políticas de memoria.
Anejo 65. Claude Benoit, Dolores Bermúdez, Juli Leal y Elena Real
(eds.) (2009): Homenaje a Dolores Jiménez Plaza. Escrituras del amor y
del erotismo.
Anejo 66. Virginia González García (2009): Mayans y la lexicografía del
XVIII: Un modelo de diccionario universal aplicado a la jurisprudencia.
Anejo 67. Adrián Cabedo Nebot (2009): La segmentación prosódica en
español coloquial.
Anejo 68. Eduardo España Palop (2009): Construcciones con cuantificador
en el ámbito panhispánico: norma y uso.
Anejo 69. Ferran Grau Codina, José María Maestre Maestre y Jordi
Pérez Durá (2009): Litterae Humaniores. Del Renacimiento a la
Ilustración. Homenaje al profesor José María Estellés.
Anejo 70. María Estornell Pons (2009): Neologismos en la prensa:
criterios para reconocer y caracterizar las unidades neológicas.
Anejo 71. Brigitte Lépinette y Brisa Gómez-Ángel (2009): Études de
linguistique française.
Anejo 72. Maria Josep Marín, Llum Brancho, Josep À. Mas i Anna I.
Montesinos (eds.) (2010): Discurs polític i identitats (trans)nacionals.
Anejo 73. Miguel Martínez López, Purificación Ribes Traver y Santiago
González y Fernández-Corugedo (2010): La lengua y la literatura
inglesa en sus textos: aproximación crítica. Homenaje al profesor
Francisco Fernández.
Anejo 74. Juan Carlos Tordera Yllescas (2010): Lingüística computacional.
Teorías del habla.
Anejo 75. Antonio Hidalgo, Yolanda Congosto i Mercedes Quilis (eds.)
(2011): El estudio de la prosodia en el siglo XXI: perspectivas y ámbitos.
Anejo 76. Santiago Vicente Llavata (2011): Estudio de las locuciones en
la obra literaria de Don Íñigo López de Mendoza (Marqués de Santillana).
Hacia una fraseología histórica del español.
Anejo 77. Esteban T. Montoro del Arco (ed.) (2012): Neología y
creatividad lingüística.
Anejo 78. Nuria Girona Fibla (ed.ª) (2012): La cultura en tiempos de
desarrollo: violencias, contradicciones y alternativas.
Anejo 79. Vicente Álvarez Vives (2012): Estudio fraseológico contrastivo
de las locuciones adverbiales en los diccionarios de Vicente Salvá y de
Esteban Pichardo.
Anejo 80. Francisco Pedro Pla Colomer (2012): Métrica, rima y oralidad en
el ‘Libro de Buen Amor’.
Anejo 81. Nicolás Estévez Fuertes y Begoña Clavel Arroitia (2013):
Adquisición de Segundas Lenguas (L2) en el marco del Nuevo Milenio:
Homenaje a la profesora María del Mar Martí Viaño.
Anejo 82. Jorge Martí Contreras (2016): Estudio contrastivo gramatical
de campo en español como lengua extranjera.
Anejo 83. Violeta Martínez-Paricio (ed.) (2017): Cien años después del
Cours de Linguistique Générale.
Anejo 84. Carles Padilla Carmona (ed.) (2017): Llull, Cervantes,
Shakespeare. Imágenes literarias de la locura.
* Issues marked with an asterisk are out of print.
Distribution: Publicacions de la Universitat de València.
C/ Arts Gràfiques, 13; 46010-València; Tfn.: 963 937 174 - Fax: 963 617 051