
Co-editors (2017): Words, Corpus and Back to Words

2017, Quaderns de Filologia: Estudis Lingüístics

https://doi.org/10.7203/qf.22.11297

The aim of this issue is to bring together investigation into the lexicon in a variety of languages, in a diversity of manifestations (both at the word level and beyond the word level) and from a variety of perspectives, including not only those which focus on how the vocabulary is internally organized, but also those which deal with the role that lexical units and lexical relations play in the organization of other language levels, particularly in the organization of discourse. These issues are approached from a variety of perspectives that include not only developments in several disciplines of theoretical and descriptive linguistics, particularly lexicology, phraseology, word formation and discourse analysis, but also developments in diverse applied disciplines such as translation, foreign language teaching, English for specific purposes and critical discourse analysis. One of the criteria employed in the compilation of the volume was also the coverage of linguistic diversity. In total, six different languages are investigated in the studies selected for this volume: English, German, Spanish, French, Portuguese and Italian. Without claiming exhaustiveness, we consider that the variety of contributions presented here offers an insight into the vigour of current corpus research into phenomena related to the lexicon. Admittedly, the full range of topics, approaches and methodologies developed in this area of research could not fit in a single volume, but a careful selection of studies representing a variety of interesting advances can be representative of significant developments taking place in the field.

WORDS, CORPUS AND BACK TO WORDS

QUADERNS DE FILOLOGIA: ESTUDIS LINGÜÍSTICS XXII

Edited by Miguel Fuster Márquez and Moisés Almela

Facultat de Filologia, Traducció i Comunicació, Universitat de València, 2017

Quaderns de Filologia de la Universitat de València. Estudis Lingüístics. Volume XXII: Words, Corpus and back to Words

Quaderns de Filologia is the regular publication of the Facultat de Filologia, Traducció i Comunicació of the Universitat de València. It was founded in 1980 under the name Cuadernos de Filología, and in 1995 it began a second phase as Quaderns de Filologia. It has two series, each published annually (Estudis lingüístics and Estudis literaris). Each issue of Quaderns de Filologia is monographic and is edited by lecturers of the Facultat de Filologia, Traducció i Comunicació who are specialists in the subject. In each issue, these editors are responsible for selecting the articles. Nevertheless, articles published from issue IX onwards undergo two reviews, one internal and one external. At present, 80% of the contributions to each volume come from outside the Universitat de València. Quaderns de Filologia also has a collection of studies entitled Anejos de Quaderns de Filologia.

Published by: Universitat de València
Exchange and subscriptions: Vicedeganat de Cultura, Igualtat i Comunicació. Facultat de Filologia, Traducció i Comunicació. Avda. Blasco Ibáñez, 32. 46010 València. [email protected]
Distribution: Publicacions de la Universitat de València. C/ Arts Gràfiques, 13. 46010 València. Tel.: 963 937 174. Fax: 963 617 051
© of the texts: the authors
© of this edition: Universitat de València, 2017
© of the cover: reproduction of a detail of the oil painting by Pieter Brueghel (1563), The Tower of Babel (Kunsthistorisches Museum Wien)
Cover design: Celso Hernández de la Figuera (PUV)
Typesetting and layout: Communico. Letras y Píxeles, S. L.
Legal deposit: V.229-1995
ISSN: 1135-416X
Printed by: Arts Gràfiques Soler, S. L.

Honorary Directors: Ángel López García and Joan Oleza
Director: Begonya Pozo Sánchez
Editorial Secretary: Sergio Maruenda Bataller
Publishing Secretariat: Vicedeganat de Cultura, Igualtat i Comunicació. Facultat de Filologia, Traducció i Comunicació

Editorial Board:
Mikel Labiano (Dept. Filologia Clàssica, UVEG)
Julia Sanmartín (Dept. Filologia Espanyola, UVEG)
Cesáreo Calvo (Dept. Filologia Francesa i Italiana, UVEG)
Begoña Clavel (Dept. Filologia Anglesa i Alemanya, UVEG)
Emili Casanova (Dept. Filologia Catalana, UVEG)
Monserrat Veyrat (Dept. Tª dels Llenguatges i CC, UVEG)
Guillermo Montes Cala (Dept. Filología Griega, Universidad de Cádiz)
Humberto López Morales (RAE y Asoc. de Academias de la Lengua Española)
Lorenzo Renzi (Dept. di Studi Linguistici e Letterari, Università di Padova)
Michael McCarthy (School of English, University of Nottingham)
Jordi Ginebra (Dept. Filologia Catalana, Universitat Rovira i Virgili)
José del Valle (Graduate Center, City University of New York (CUNY))
Elia Hernández Socas (Universität Leipzig)

Scientific Committee:
Jean-Claude Anscombre (CNRS-Paris XII, France)
Manuel Carrera Díaz (U. de Sevilla, Spain)
Nelson Cartagena (U. de Heidelberg, Germany)
Germà Colón (U. de Basilea, Switzerland)
Emilio Crespo (U. Autónoma de Madrid, Spain)
Perfecto E. Cuadrado (U. de les Illes Balears, Spain)
Luis Fernando Lara (Colegio de México)
Jacek Fisiak (U. de Poznań, Poland)
Humberto López Morales (U. de Puerto Rico)
Elena Rojas (U. de Tucumán, Argentina)
Eustaquio Sánchez Salor (U. de Extremadura, Spain)
Barbara Wotjak (U. de Leipzig, Germany)

CONTENTS

Introduction (p. 9)
Bautista Zambrana, María Rosario: Corpus analysis of phraseology in an A1 level textbook of German as a foreign language (p. 13)
Bestgen, Yves: Getting rid of the Chi-square and Log-likelihood tests for analysing vocabulary differences between corpora (p. 33)
Chierichetti, Luisa: “El criado pesado”: La caracterización en la serie Águila Roja (p. 57)
Garofalo, Giovanni: Persiguiendo con imparcialidad “el total desprecio a la Constitución”: el léxico valorativo en la Querella del Fiscal de Cataluña contra Carme Forcadell i Lluís (p. 79)
Giménez-Moreno, Rosa & Ivorra-Pérez, Francisco Miguel: The malleability behind terms referring to common professional roles: the current meaning of “boss” in British newspapers (p. 105)
Hennecke, Inga & Baayen, Harald: A quantitative survey of N Prep N constructions in Romance languages and prepositional variability (p. 129)
Mansilla, Ana: Lingüística de corpus y fraseología contrastiva (alemán-español): Las combinaciones usuales de estructura [PREP + S]. El caso de entre lágrimas y unter Tränen (p. 147)
Marín, María José & Rea Rizzo, Camino: Assessing EPAP lexical features: A corpus-based study (p. 165)
Mattioli, Virginia: Translator’s creativity in cultural elements transposition: a corpus-based study (p. 187)
Sánchez-Moya, Alfonso: Corpus-driven insights into the discourse of women survivors of Intimate Partner Violence (p. 215)
Castaño Castaño, Emilia, Laso Martín, Natalia Judith & Verdaguer Clavera, Isabel: Immigration metaphors in a corpus of legal English: an exploratory study of EAL learners’ metaphorical production and awareness (p. 245)
Editorial guidelines (p. 273)
General index of publications (p. 301)

Qf Lingüístics. ojs.uv.es/index.php/qfilologia/index

WORDS, CORPUS AND BACK TO WORDS: FROM LANGUAGE TO DISCOURSE

Miguel Fuster Márquez
Moisés Almela

Fuster Márquez, Miguel & Almela, Moisés. 2017. “Words, Corpus and back to Words: from language to discourse”. Quaderns de Filologia: Estudis Lingüístics 22: 9-12. doi: 10.7203/qf.22.11297

Last century’s revolution in computer technologies has also changed the way we conceive of language, although these changes are only partly due to that revolution. Technological advances in the field of information and communication have made the compilation and processing of large amounts of data an incredibly easy and fast task. Until quite recently, the compilation of large amounts of text required an enormous effort by researchers. At present, this process has become more feasible and certainly less time-consuming, giving researchers more freedom to think about interesting ways of exploring the data.

However, other important ‘revolutions’ have taken place in linguistics which in various ways have been favoured by these technological developments. One such revolution has to do with linguistic theorisation. Linguists in the past would have been happy to decide on language matters simply by asking themselves how the grammar of their mother tongues worked since, as native speakers, they felt competent enough to make such decisions. This mentalistic approach (we are, of course, oversimplifying such approaches considerably) relied on the introspective mental power of well-educated speakers, and for most of the insightful decisions they made on the matter at hand they did not need to observe the authentic language produced by other speakers. All they needed was their own knowledge and their analytical power. In the famous Saussurean dichotomy between ‘langue’ and ‘parole’, these linguists were on the side of ‘langue’; ‘parole’ was of little or no interest. However, an important change was taking place in linguistics: other linguists started to give priority to the manifestations of ‘parole’, that is, to how language was actually used by speakers in their communities, in order to theorise with greater accuracy about ‘langue’, or linguistic competence.

Various significant developments are related to this more empirical linguistic movement. One of these was the acknowledgement of the spoken language as a legitimate part of language. Twentieth-century lexicographers started to collect examples of informal or conversational registers and to introduce them in the dictionaries they produced. No less significant in this new approach was, for example, the thrust of sociolinguistics, a broad research field with many branches and fuzzy boundaries that viewed languages as heterogeneous entities. Sociolinguists observed that variation was more the rule than the exception in speech communities. They brought with them empirical methodologies that enabled them to analyse how real speakers produced language in real settings in order to build their theories of variation and change. Sociolinguists also made use of quantification in their methodologies.

This is partly the context for the emergence of corpus linguistics as a new approach to language. The new framework relied on the examination of real data originating in language use to build convincing linguistic arguments. Both variation and usage have been essential arguments in corpus approaches.
However, a corpus should not be confused with a database. In the words of Sinclair (1996: 2.1), “[a] corpus is a collection of pieces of language that are selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language.” In contrast with an arbitrary collection of data, any corpus linguist would insist, a corpus must contain a representative sample of language if the researcher is to draw relevant conclusions about language. Broadly speaking, unlike essentially mentalistic approaches, corpus research is empirical, with a preference for inductiveness, that is, the careful analysis of data in representative corpora. However, most practitioners would agree that corpus linguistics is not a theory but a methodology, even if a somewhat special one. In fact, this methodology may be applied to one language, to different languages, or to different varieties or registers of a language, by means of small, medium or large corpora, and may adopt different approaches in order to test different theories.

Interest in corpus linguistics today extends to areas such as the quality of corpus compilation, lexis and phraseology, grammar, variation and change, discourse or stylistics, among others. Corpus linguistics has been of interest in both theoretical and applied linguistics. There is abundant applied research, for example, in the fields of lexicography, second language acquisition or translation. Indeed, it is difficult to think of research areas where corpus linguistics does not have room and something important to offer. Quite regularly, corpus methodology combines quantitative and qualitative approaches, with one approach feeding the other. Formerly purely qualitative analyses have in many cases been superseded by approaches where quantification and statistics are becoming more prominent.
Nevertheless, many convinced corpus linguists would also claim that they favour triangulation and convergent evidence as a more acceptable approach.

Very frequently, the procedure of a corpus linguist will have as its starting point a word or a word list. The close examination of a word’s behaviour is therefore crucial for practically any kind of research which relies on language use. It is also known that the most significant advances in contemporary lexicography have been driven by the inspection of reference corpora of variable size and scope, which have allowed researchers a more thorough understanding of real usage. Also, the compilation of comparable corpora has provided the basis for establishing parallels, differences and nuances when languages are compared or contrasted. In addition, the possibility of compiling more specialized ad hoc corpora has allowed the detailed analysis of vocabulary in different types of discourse, either to determine its value in specialized languages or to gain a better understanding of its social or ideological implications, which are determined through the evaluation of linguistic preferences. Finally, it should be added that corpus approaches have revealed the existence of linguistic units which go beyond more traditional lexicological approaches. Extensive research on phraseology and corpus-based lexicography produced in recent decades has brought to light the frequency in discourse of meaningful co-occurring lexical patterns and lexical-grammatical co-selection.
The aim of this issue is to bring together investigation into the lexicon in a variety of languages, in a diversity of manifestations (both at the word level and beyond the word level) and from a variety of perspectives, including not only those which focus on how the vocabulary is internally organized, but also those which deal with the role that lexical units and lexical relations play in the organization of other language levels, particularly in the organization of discourse. These issues are approached from a variety of perspectives that include not only developments in several disciplines of theoretical and descriptive linguistics, particularly lexicology, phraseology, word formation and discourse analysis, but also developments in diverse applied disciplines such as translation, foreign language teaching, English for specific purposes and critical discourse analysis. One of the criteria employed in the compilation of the volume was also the coverage of linguistic diversity. In total, six different languages are investigated in the studies selected for this volume: English, German, Spanish, French, Portuguese and Italian. Without claiming exhaustiveness, we consider that the variety of contributions presented here offers an insight into the vigour of current corpus research into phenomena related to the lexicon. Admittedly, the full range of topics, approaches and methodologies developed in this area of research could not fit in a single volume, but a careful selection of studies representing a variety of interesting advances can be representative of significant developments taking place in the field.

References

Sinclair, John McH. 1996. EAGLES. Preliminary Recommendations on Corpus Typology. http://www.ilc.pi.cnr.it/EAGLES96/corpustyp/corpustyp.html.
Corpus analysis of phraseology in an A1 level textbook of German as a foreign language

Análisis basado en corpus de fraseología en un libro de texto de alemán como lengua extranjera de nivel A1

María Rosario Bautista Zambrana
Universidad de Málaga. [email protected]
Received: 25/05/2017. Accepted: 06/10/2017

Resumen (translated from the Spanish): The aim of this article is to analyse the extent to which the German as a foreign language textbook DaF kompakt A1 (Sander et al., 2011) complies with the recommendations of the Common European Framework of Reference for Languages (Council of Europe, 2001) regarding lexical competence and sociolinguistic competence in comprehension and expression activities, specifically as far as phraseological units are concerned. We have focused on the fixed formulae and fixed frames present in a corpus made up of the textbook materials, and we have checked whether those fixed expressions correspond to the phraseological and sociolinguistic competences that the Framework expects of an A1 level student of German. To this end, we have compiled a corpus of the comprehension and expression materials of the textbook, made up of three subcorpora: one with the written texts, one with the oral texts, and a third made up of exercises. We have carried out a quantitative analysis (by means of AntConc 3.4.4 [Anthony, 2016] and kfNgram [Fletcher, 2007]) and a qualitative one. Our results suggest that the textbook conforms to the recommendations of the Framework.

Palabras clave (keywords): corpus; phraseology; German as a foreign language; Common European Framework of Reference for Languages; A1 level.
Abstract: This paper aims to analyse the extent to which the textbook for German as a foreign language DaF kompakt A1 (Sander et al., 2011) complies with the recommendations of the Common European Framework of Reference for Languages (Council of Europe, 2001) (hereafter CEFR) with respect to lexical competence and sociolinguistic competence in receptive and productive activities, specifically with regard to phraseological units. In this respect, we have focused on sentential formulae and fixed frames present in a corpus containing the textbook materials, and we have checked whether those fixed expressions correspond to the phraseological and sociolinguistic competences that are expected in the Framework for an A1 level student of German. To this end, we have compiled a corpus of the receptive and productive materials of the textbook, made up of three subcorpora: one for the written texts, one for the oral texts, and a third subcorpus containing exercises. We have performed a quantitative analysis (by means of AntConc 3.4.4 [Anthony, 2016] and kfNgram [Fletcher, 2007]) and a qualitative one. Our results suggest that the textbook complies with the recommendations of the CEFR.

Keywords: corpus; phraseology; German as a foreign language; Common European Framework of Reference for Languages; A1 level.

Bautista Zambrana, María Rosario. 2017. “Corpus analysis of phraseology in an A1 level textbook of German as a foreign language”. Quaderns de Filologia: Estudis Lingüístics 22: 13-32. doi: 10.7203/qf.22.11298

1. Introduction

This paper is based on the premise that much of the language we use consists of ready-made multi-word combinations, following Sinclair’s idiom principle (Sinclair, 1991: 110):

“the principle of idiom is that a language user has available to him a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analysable into segments.”
A considerable amount of literature has been published following this approach, as have resources such as the Academic Phrasebank (Morley, 2017), which draws on the above-mentioned insight:

“It is now accepted that much of the language we use is phraseological in nature; that it is acquired, stored and retrieved as pre-formulated constructions (Bolinger, 1976; Pawley and Syder, 1983). These insights began to be supported empirically as computer technology permitted the identification of recurrent phraseological patterns in very large corpora of spoken and written English using specialised software (e.g. Sinclair, 1991).” (Morley, 2017: 5)

This insight has important implications for language teaching and learning. We consider that learning phraseological units is essential for basic level language learners, and that teaching them should start from the very beginning. As O’Keeffe et al. (2007: 46) state for the case of chunks or clusters[1]:

“(...) the vocabulary syllabus for the basic level is incomplete without due attention being paid to the most frequent chunks, since many of them are as frequent as or more frequent than single items which everyone would agree must be taught.”

[1] As O’Keeffe et al. (2007: 63) explain, there are many terms to describe the phenomena of multi-word vocabulary or chunks: some of these terms are lexical phrases (Nattinger and DeCarrico, 1992), routine formulae (Coulmas, 1979), formulaic sequences (Wray, 2000, 2002), chunks (De Cock, 2000), as well as (restricted) collocations, fixed expressions, or multi-word units/expressions. Throughout this paper we will use the generic terms phraseological units and fixed expressions, and when referring to our specific object of study, sentential formulae or fixed frames.
Bearing this in mind, this paper aims to analyse the extent to which the textbook for German as a foreign language DaF kompakt A1 (Sander et al., 2011) complies with the recommendations of the Common European Framework of Reference for Languages (Council of Europe, 2001) (hereafter CEFR) with respect to lexical competence and sociolinguistic competence in receptive and productive activities, specifically with regard to phraseological units. This textbook was selected because we have been using it in several courses at our University since the academic year 2013/2014, with good results and wide acceptance among lecturers and students.

The CEFR describes lexical competence as the knowledge of, and ability to use, the vocabulary of a language; it consists of lexical elements and grammatical elements. The lexical elements comprise, according to the CEFR, single word forms and fixed expressions: the latter consist of several words and are used and learnt as wholes (CEFR, 2001: 111). They include sentential formulae, phrasal idioms, fixed frames, phrasal verbs, compound prepositions and fixed collocations. We will focus in this paper on sentential formulae and fixed frames.

Sentential formulae are not defined explicitly in the CEFR, but are described as including three kinds of expressions: direct exponents of language functions such as greetings (e.g. Eng. How do you do?, Good morning! and Ger. Guten Morgen!, Nett, Sie kennenzulernen), proverbs and relict archaisms. We have focused on the first type, direct exponents of language functions, and have looked for minimal communicative units that can function as autonomous sequences[2].
As for the language functions involved, they are presented in the CEFR (2001: 126) as part of functional competence[3]:

1.1 imparting and seeking factual information:
• identifying
• reporting
• correcting
• asking
• answering
1.2 expressing and finding out attitudes:
• factual (agreement/disagreement)
• knowledge (knowledge/ignorance, remembering, forgetting, probability, certainty)
• modality (obligations, necessity, ability, permission)
• volition (wants, desires, intentions, preference)
• emotions (pleasure/displeasure, likes/dislikes, satisfaction, interest, surprise, hope, disappointment, fear, worry, gratitude)
• moral (apologies, approval, regret, sympathy)
1.3 suasion:
• suggestions, requests, warnings, advice, encouragement, asking help, invitations, offers
1.4 socialising:
• attracting attention, addressing, greetings, introductions, toasting, leave-taking
1.5 structuring discourse[4]:
• (28 microfunctions, opening, turntaking, closing, etc.)
1.6 communication repair:
• (16 microfunctions)

[2] In this sense, we consider that sentential formulae are phraseological statements (‘enunciados fraseológicos’), as defined by Corpas Pastor (1996): they are autonomous speech sequences, minimal communicative units, stated with a distinct intonation.

[3] These language functions are specifically called microfunctions and are defined as “categories for the functional use of single (usually short) utterances, usually as turns in an interaction.” (CEFR, 2001: 125)

[4] The complete lists of microfunctions for structuring discourse and for communication repair can be found in Threshold Level 1990 (van Ek and Trim, 1991).

Fixed frames, on the other hand, are described as expressions “learnt and used as unanalysed wholes, into which words or phrases are inserted to form meaningful sentences” (CEFR, 2001: 111), e.g. Eng. Please may I have ... or Ger. Könnte ich bitte ... haben? Fixed frame is another name for phrase frame, which Römer (2009: 150) defines as “sets of n-grams which are identical except for one word, e.g. at the end of, at the beginning of, and at the turn of would all be part of the p[hrase]frame at the * of.”

Lexical competence is associated in the CEFR with the scale of “Vocabulary range”; its descriptor for the A1 level also points to phraseological competence: “Has a basic vocabulary repertoire of isolated words and phrases related to particular concrete situations.”

Sociolinguistic competence, on the other hand, is concerned with the knowledge and skills required to deal with the social dimension of language use, as the CEFR (2001: 118) explains. Two areas here are closely related to phraseology: linguistic markers of social relations and politeness conventions. The former comprise the following types of expressions, many of which are fixed (CEFR, 2001: 118):

• use and choice of greetings: on arrival, e.g. Hello! Good morning!; introductions, e.g. How do you do?; leave-taking, e.g. Good-bye . . . See you later
• use and choice of address forms: frozen, e.g. My Lord, Your Grace; formal, e.g. Sir, Madam, Miss, Dr, Professor (+ surname); informal, e.g. first name only, such as John! Susan!; informal, e.g. no address form; familiar, e.g. dear, darling; (popular) mate, love; peremptory, e.g. surname only, such as Smith! You (there)!; ritual insult, e.g. you stupid idiot! (often affectionate)
• conventions for turntaking
• use and choice of expletives (e.g. Dear, dear!, My God!, Bloody Hell!, etc.)

Politeness conventions, for their part, include the following types of expressions, many of which are also phraseological in nature (CEFR, 2001: 119):

1. ‘positive’ politeness, e.g.:
• showing interest in a person’s well being;
• sharing experiences and concerns, ‘troubles talk’;
• expressing admiration, affection, gratitude;
• offering gifts, promising future favours, hospitality;
2. ‘negative’ politeness, e.g.:
• avoiding face-threatening behaviour (dogmatism, direct orders, etc.);
• expressing regret, apologising for face-threatening behaviour (correction, contradiction, prohibitions, etc.);
• using hedges, etc. (e.g. ‘I think’, tag questions, etc.);
3. appropriate use of ‘please’, ‘thank you’, etc.;
4. impoliteness (deliberate flouting of politeness conventions), e.g.:
• bluntness, frankness;
• expressing contempt, dislike;
• strong complaint and reprimand;
• venting anger, impatience;
• asserting superiority.

There is a scale related to sociolinguistic competence, “Sociolinguistic appropriateness”, and it includes a descriptor for the A1 level which mentions phraseological aspects: “Can establish basic social contact by using the simplest everyday polite forms of: greetings and farewells; introductions; saying please, thank you, sorry, etc.”

This study is specifically centered on receptive activities (reception) and productive activities (production). The former include reading and listening activities (CEFR, 2001: 65-71). For the A1 level there are no descriptors for listening activities that refer to fixed expressions, but some descriptors for reading do mention phraseology: in “Overall reading comprehension” it is recommended for the A1 level that the learner can “understand very short, simple texts a single phrase at a time, picking up familiar names, words and basic phrases and rereading as required”. In the section “Reading for orientation” we find that the learner “Can recognise familiar names, words and very basic phrases on simple notices in the most common everyday situations.” Production, on the other hand, includes speaking and writing activities.
With respect to oral production, there is one descriptor for the A1 level that mentions phraseology: in “Overall oral production” it is proposed that the learner “can produce simple mainly isolated phrases about people and places.” As for writing activities, the descriptor “Overall written production” includes the recommendation that the A1 level learner “can write simple isolated phrases and sentences”, while the descriptor “Creative writing” mentions that the learner “can write simple phrases and sentences about themselves and imaginary people, where they live and what they do.”

The specific objective of this paper has been to study the sentential formulae and fixed frames present in a corpus containing the receptive and productive materials of the textbook DaF kompakt A1, and to check whether those fixed expressions correspond to the phraseological and sociolinguistic competences that are expected in the Framework for an A1 level student of German. The remaining part of the paper proceeds as follows: in Section 2 we present the methodology that we have followed to carry out this study, while in Section 3 the results of the quantitative and qualitative corpus analysis are laid out. Finally, Section 4 offers the discussion of the results, and Section 5 some concluding remarks.

2. Methodology

We have followed a quantitative and a qualitative methodology. In order to perform the linguistic analysis that we have set out to do, we have compiled a corpus of the DaF kompakt A1 textbook materials, made up of three subcorpora: one for the written texts (letters, e-mails, advertisements, text messages, biographies, news…), one for the oral texts (transcriptions of conversations and monologues, mostly voice messages), and one for the exercises; all of these texts were taken from both the Kursbuch (‘coursebook’) and the Übungsbuch (‘workbook’).
In the case of the spoken and the written subcorpora, we decided to include only complete texts, while for the exercise subcorpus we selected those activities that contained sentences or at least some type of fixed expression; in this way, exercises focusing exclusively on single word forms or morphology were left out. The formulation and instructions of the exercises, as well as the grammar reference sections and vocabulary lists, were left out too.

The textbook is a compact course, containing relatively few written texts, a moderate number of oral texts, and a substantial number of exercises. Thus, the written subcorpus includes 26 texts, containing 2620 tokens and 929 types (type-token ratio 35.46%); the oral component comprises 81 texts, containing 7936 tokens and 1449 types (type-token ratio 18.26%); and the exercise subcorpus is made up of 215 texts (each one representing a different task), containing 10250 tokens and 1620 types (type-token ratio 15.8%). As we can see, there is greater lexical variety in the written subcorpus, whereas the exercise subcorpus has the lowest ratio, which means that many of its words occur repeatedly.

We have performed the quantitative analysis by means of AntConc 3.4.4 (Anthony, 2016) and kfNgram (Fletcher, 2007). We have used the Cluster/N-Gram function of AntConc to extract all 2-, 3-, 4- and 5-word n-grams from each corpus. We established a normalised threshold of 250 occurrences per million words for each corpus, which resulted in a minimum threshold of two occurrences for the spoken corpus, and of only one occurrence for the written corpus. Even though this might seem a very low absolute threshold, it is actually a high normalised threshold, which can be justified by the fact that we are dealing with very frequent word combinations, relevant for basic level language learners.
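The figures in this passage can be checked with a few lines of arithmetic. The sketch below recomputes the type-token ratios from the reported counts and converts the normalised threshold of 250 occurrences per million words into a raw count for the oral and written subcorpora. The function names are hypothetical, and rounding the raw count up (with a floor of one occurrence) is an assumption: the text reports the resulting thresholds but does not say how they were rounded.

```python
import math

# Token and type counts as reported in the text for each subcorpus.
SUBCORPORA = {
    "written": {"tokens": 2620, "types": 929},
    "oral": {"tokens": 7936, "types": 1449},
    "exercises": {"tokens": 10250, "types": 1620},
}


def type_token_ratio(types: int, tokens: int) -> float:
    """Return the type-token ratio as a percentage."""
    return 100 * types / tokens


def min_absolute_frequency(tokens: int, per_million: float = 250) -> int:
    """Convert a per-million threshold into a raw count.

    Assumption: the raw count is rounded up and is never lower than 1.
    """
    return max(1, math.ceil(per_million * tokens / 1_000_000))


for name, counts in SUBCORPORA.items():
    ttr = type_token_ratio(counts["types"], counts["tokens"])
    print(f"{name}: type-token ratio = {ttr:.2f}%")

# The threshold is only reported for the two subcorpora used for extraction.
for name in ("oral", "written"):
    cutoff = min_absolute_frequency(SUBCORPORA[name]["tokens"])
    print(f"{name}: minimum n-gram frequency = {cutoff}")
```

Under these assumptions the script yields ratios of 35.46%, 18.26% and 15.80%, and minimum frequencies of 2 (oral) and 1 (written), which agree with the values given above.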
The exercise subcorpus, on the other hand, was used for comparison purposes, so all the n-grams extracted in the previous steps were later searched for in this subcorpus.

Afterwards, we employed kfNgram to extract all 2- to 6-word phrase frames, i.e. sets of n-grams which are identical except for a single word, from each corpus. We extended the maximum number of words (n) to 6, as we noticed that some more relevant frames could be extracted in that way. As for the options specified, it is worth noting that, in order to generate lists of phrase frames, the programme relies on previously produced lists of wordgrams (n-grams) with values of n of 2 or greater; that is why we first generated as many n-grams as possible, by setting the minimum frequency of occurrence to 1.

As for the qualitative methodology, we examined all n-grams and phrase frames extracted from the oral and the written subcorpora to see which ones complied with the definition of sentential formulae and fixed frames as proposed by the CEFR, and then compared the results with the n-grams and phrase frames extracted from the exercise subcorpus, so as to check whether the phraseological units laid out in the receptive materials were later practised in the productive sections. In this sense, we could define our work as corpus-based, as Storjohann (2005: 8-9) describes:

“From this repository, appropriate material is extracted to support intuitive knowledge, to verify expectations, to allow linguistic phenomena to be quantified, and to find proof for existing theories or to retrieve illustrative samples. It is a method where the corpus is interrogated and data is used to confirm linguistic pre-set explanations and assumptions. It acts, therefore, as additional supporting material.”

Thus, we have used the corpus to find pre-defined linguistic structures: sentential formulae and fixed frames.
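The phrase-frame step described in this section (grouping n-grams that are identical except for a single word) can be illustrated with a short Python sketch. It is not kfNgram's actual implementation: the function names, the toy token list and the `min_variants` parameter are hypothetical, and the toy n-grams are allowed to run across utterance boundaries for brevity.

```python
from collections import Counter, defaultdict

# Toy data only: four invented farewells, lower-cased and whitespace-tokenised.
TOKENS = "bis später bis morgen bis montag bis später".split()


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_frames(tokens, n, min_variants=2):
    """Group n-grams that differ in exactly one slot.

    Each frame is an n-gram with one position replaced by '*'. The value is a
    Counter mapping every word found in that slot to its frequency. Frames
    with fewer than `min_variants` different fillers are discarded."""
    counts = Counter(ngrams(tokens, n))
    frames = defaultdict(Counter)
    for gram, freq in counts.items():
        for slot in range(n):
            frame = gram[:slot] + ("*",) + gram[slot + 1:]
            frames[frame][gram[slot]] += freq
    return {f: fillers for f, fillers in frames.items()
            if len(fillers) >= min_variants}


for frame, fillers in phrase_frames(TOKENS, 2).items():
    total = sum(fillers.values())
    print(" ".join(frame), "| total freq.:", total,
          "| varieties:", len(fillers))
```

The two figures printed for each frame correspond to the "Total freq." and "Nr. of varieties" columns used in the results tables.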
As we mentioned above, both are types of fixed expressions, which consist of several words and are used and learnt as wholes (CEFR, 2001: 111). Accordingly, we selected those n-grams which met the conditions to be a sentential formula and fulfilled any of the language functions listed above. As for the fixed frames, we followed the same approach: we focused on those that corresponded to minimal communicative units and fulfilled any of the language functions cited above.

The study of sociolinguistic competence, on the other hand, was carried out by reviewing all the sentential formulae that we had previously extracted from the spoken and the written subcorpora, and by determining which ones met the criteria to constitute a linguistic marker of social relations or an expression of politeness. The results were then compared with the expressions found in the exercise subcorpus.

3. Results

We extracted n-grams and phrase frames following the criteria mentioned above, and classified the results in two groups: those related to lexical competence, and those related to sociolinguistic competence.

3.1. Lexical competence

We explored the spoken and the written subcorpora separately, in order to detect differences between spoken and written discourse, so we offer separate results.

3.1.1. Spoken subcorpus

From the spoken subcorpus of DaF kompakt A1 we extracted 60 sentential formulae and 23 fixed frames. We classified the sentential formulae according to the number of words in the n-grams, and noted down which language function (LF) was being fulfilled. Here are some examples of 2-, 3- and 4-grams [5]:

| Rank | Freq. | N-gram | LF |
|---|---|---|---|
| 5 | 20 | guten Tag [6] (‘good morning/afternoon’) | 1.4 |
| 6 | 18 | vielen Dank (‘thank you very much’) | 1.2 |
| 25 | 10 | auf Wiedersehen (‘goodbye’) | 1.4 |
| 31 | 9 | auf Wiederhören (‘goodbye’ [telephone]) | 1.4 |
| 47 | 7 | das geht (‘it is possible’) | 1.1/1.2 |
| 1 | 4 | wie geht’s? (‘how are things?’) | 1.4 |
| 23 | 3 | das ist alles (‘that’s everything’) | 1.1 |
| 57 | 3 | weißt du was? (‘you know what?’) | 1.2 |
| 105 | 2 | das klingt gut (‘that sounds good’) | 1.2 |
| 134 | 2 | es geht so (‘so-so’) | 1.2 |
| 1 | 4 | wie geht’s dir? (‘how are you?’) | 1.2 |
| 4 | 3 | kann ich Ihnen helfen? (‘can I help you?’) | 1.3 |
| 21 | 2 | das geht leider nicht (‘unfortunately that is not possible’) | 1.1/1.2 |
| 69 | 2 | können Sie mir helfen? (‘can you help me?’) | 1.3 |
| 108 | 2 | wie geht es dir? (‘how are you?’) | 1.2 |

[5] We did not find any relevant 5-grams.
[6] Our search was not case-sensitive, but we have capitalized the nouns in these tables of results.

The 60 sentential formulae that we found in the oral corpus fulfil the following language functions, as described by the CEFR (2001):

| Language function | Formulae |
|---|---|
| 1.1 imparting and seeking factual information | 11 |
| 1.2 expressing and finding out attitudes | 36 |
| 1.3 suasion | 5 |
| 1.4 socialising | 10 |
| 1.5 structuring discourse | 2 |
| 1.6 communication repair | 1 |

Of the 60 sentential formulae extracted from the oral subcorpus, 45 are found in the exercise subcorpus at least once, and 22 of them occur three or more times.

With respect to the fixed frames, we classified them according to the number of words and noted down their language function. Below are some fixed frames of 2-, 3-, 4- and 5-grams.

| Fixed frame | Total freq. | Nr. of varieties | LF |
|---|---|---|---|
| bis * [7] (‘see you *’) | 13 | 6 | 1.4 |
| soll ich * [8] (‘shall I *’) | 4 | 4 | 1.3 |
| das macht * (‘that’s [price]’) | 3 | 3 | 1.1 |
| wie geht’s *? (‘how are *?’) | 7 | 4 | 1.2 |
| was ist mit *? (‘what about *?’) | 3 | 3 | 1.1 |
| ich hätte gern * (‘I’d like *’) | 3 | 2 | 1.2 |
| mir geht es * (‘I am *’) | 2 | 2 | 1.2 |
| wie komme ich zum * [9] | 3 | 3 | 1.1 |

[7] Only with nouns or adverbs expressing a point of time in the future, for instance: bis Montag (‘see you on Monday’), bis später (‘see you later’).
[8] This phrase frame is actually completed by adding not just one word but several; we decided to include it given its function: to propose something.
[9] In English: ‘how do I get to *?’.

The 23 fixed frames that we found in the spoken corpus fulfil the following language functions:

| Language function | Frames |
|---|---|
| 1.1 imparting and seeking factual information | 9 |
| 1.2 expressing and finding out attitudes | 9 |
| 1.3 suasion | 2 |
| 1.4 socialising | 3 |
| 1.5 structuring discourse | 0 |
| 1.6 communication repair | 0 |

As we can observe, most of the fixed frames are used to impart and seek factual information, or are related to expressing and finding out attitudes. As in the case of the sentential formulae, we found hardly any expressions for structuring discourse or repairing communication. Of the 23 fixed frames found in the oral subcorpus, 13 appear in the exercises.

3.1.2. Written subcorpus

From the written subcorpus of DaF kompakt A1 we extracted 25 sentential formulae and four fixed frames. We classified the sentential formulae according to the number of words in the n-grams, and noted down which language function (LF) was being fulfilled. Here are some examples of 2-, 3-, 4- and 5-grams:

| Rank | Freq. | N-gram | LF |
|---|---|---|---|
| 6 | 8 | liebe Grüße (‘kind regards’) | 1.5 |
| 38 | 3 | du weißt (‘you know’) | 1.2 |
| 180 | 2 | sehr gern (‘I’d love to’) | 1.3 |
| 3 | 3 | hast du Lust? (‘do you feel like it/doing it?’) | 1.3 |
| 873 | 1 | Gott sei Dank (‘thank God’) | 1.2 |
| 1589 | 1 | mit freundlichen Grüßen (‘yours sincerely’) | 1.5 |
| 2390 | 1 | wie geht es dir? (‘how are you?’) | 1.2/1.4 |
| 2423 | 1 | wir grüßen euch herzlich (‘we send our best wishes’) | 1.5 |
| 950 | 1 | hast du Zeit und Lust? [10] | 1.3 |
| 2011 | 1 | so geht es nicht weiter (‘it cannot go on like this’) | 1.2 |

[10] In English: ‘do you have time and feel like it?’.

The 25 sentential formulae that we found in the written subcorpus fulfil the following language functions, as described by the CEFR (2001):

| Language function | Formulae |
|---|---|
| 1.1 imparting and seeking factual information | 2 |
| 1.2 expressing and finding out attitudes | 6 |
| 1.3 suasion | 8 |
| 1.4 socialising | 4 |
| 1.5 structuring discourse | 7 |
| 1.6 communication repair | 0 |

As we can see, the sentential formulae fulfil varied functions, suasion and structuring discourse being the most common. Of the 25 sentential formulae detected, 12 are also found in the oral subcorpus, and 15 in the exercise subcorpus (nine of which occur three or more times).

With regard to the fixed frames, we classified them according to the number of words and noted down their language function. Below are the fixed frames that we were able to extract (2-, 3- and 5-grams):

| Fixed frame | Total freq. | Nr. of varieties | LF |
|---|---|---|---|
| liebe * (‘dear *’) | 9 | 8 | 1.5 |
| lieber * (‘dear *’) | 5 | 5 | 1.5 |
| danke für * (‘thanks for *’) | 2 | 2 | 1.2 |
| * gefällt mir sehr gut [11] | 2 | 2 | 1.2 |

[11] In English: ‘I like * very much’.

As we can see, two of the fixed frames fulfil the function of structuring discourse, and the other two are used to express attitudes. Two of the fixed frames found in this subcorpus are present in the exercises: liebe * and lieber *.

3.2. Sociolinguistic competence

We explored the spoken and the written subcorpora separately, so we offer separate results.

3.2.1. Spoken subcorpus

We analysed the sentential formulae that we extracted from the corpus in order to establish which ones could meet the criteria to act as linguistic markers of social relations or as politeness conventions. We found that 10 expressions can be considered linguistic markers of social relations, and all of them are 2-grams.
Below are some examples:

| Rank | Frequency | Expression |
|---|---|---|
| 128 | 4 | bis später (‘see you later’) |
| 245 | 3 | grüß Gott (‘hello’) |
| 248 | 3 | guten Morgen (‘good morning’) |
| 252 | 3 | herzlich willkommen (‘welcome’) |
| 721 | 2 | oh je (‘oh dear’) |

Most of these expressions are related to the use and choice of greetings (on arrival and leave-taking). We also find one expletive (oh je). Eight of these expressions are also present in the exercise subcorpus.

As for the politeness conventions, we detected 34 expressions among the sentential formulae that we had previously extracted. Below are some examples (2-, 3- and 4-grams):

| Rank | Frequency | Expression |
|---|---|---|
| 65 | 6 | hier bitte (‘here you are’) |
| 67 | 6 | ja, gern (‘with pleasure’) |
| 70 | 6 | kein Problem (‘no problem’) |
| 92 | 5 | gern geschehen (‘my pleasure’) |
| 143 | 4 | freut mich (‘pleased to meet you’) |
| 282 | 2 | tut mir leid (‘sorry’) |
| 324 | 2 | wie Sie wollen (‘as you like’) |
| 1 | 4 | wie geht’s dir? (‘how are you?’) |
| 4 | 3 | kann ich Ihnen helfen? (‘can I help you?’) |

Most of these expressions are related to positive politeness (wie geht’s dir, freut mich), while a few correspond to negative politeness (tut mir leid). We also find clusters for expressing ‘please’ or ‘thank you’ (vielen Dank; nein, danke). Of these expressions, 26 are also found in the exercise subcorpus.

3.2.2. Written subcorpus

We determined that nine sentential formulae from the written subcorpus can be considered linguistic markers of social relations. Below are some examples:

| Rank | Frequency | Expression |
|---|---|---|
| 899 | 1 | grüß dich (‘hello’) |
| 46 | 2 | viele liebe Grüße (‘lots of love’) |
| 1935 | 1 | seid herzlich gegrüßt (‘best wishes’) |

Three of these linguistic markers are found in the oral subcorpus, while six are also present in the exercise subcorpus. In addition, we found eight expressions that qualify as politeness conventions, such as the following:

| Rank | Frequency | Expression |
|---|---|---|
| 915 | 1 | guten Appetit (‘enjoy your meal’) |
| 1862 | 1 | stimmt’s? (‘right?’) |
| 2390 | 1 | wie geht es dir? (‘how are you?’) |

Five of these politeness conventions were also detected in the oral subcorpus, whereas six are present in the exercise subcorpus.

4. Discussion

With respect to lexical competence, we divided our results into two groups, oral texts and written texts, and compared them with those of the exercise subcorpus. The spoken subcorpus contains 60 sentential formulae and 23 fixed frames, of which 45 formulae (75%) and 13 frames (56.5%) are practised in the exercises. As for the written subcorpus, we found 25 sentential formulae, 15 of which (60%) also occur in the exercise subcorpus. Only four fixed frames were extracted, two of which (50%) are practised in the exercise section. Some of the written sentential formulae are present in the oral subcorpus as well: 12 (48%).

If we consider the results of the oral and the written subcorpora as a whole, we obtain a total of 73 sentential formulae and 27 fixed frames. Of these, 51 sentential formulae (69.86%) and 15 fixed frames (55.55%) are found in the exercise subcorpus. Given these results, and taking into account the number and variety of sentential formulae and fixed frames that we encountered, we can state that both the oral and the written subcorpora comply sufficiently with the recommendations of the CEFR with respect to lexical competence: “Has a basic vocabulary repertoire of isolated words and phrases related to particular concrete situations.” Even though the written component is quite small and therefore probably not representative enough of the German language at a basic level, this shortcoming is offset by the fact that some of its fixed expressions are also present in the oral and exercise subcorpora.
Considering this fact, we may state that this subcorpus contributes to compliance with the reading descriptors in the CEFR: the learner “can understand very short, simple texts a single phrase at a time, picking up familiar names, words and basic phrases and rereading as required” and the learner “can recognise familiar names, words and very basic phrases on simple notices in the most common everyday situations.”

Regarding productive activities, although we used the exercise subcorpus mainly for comparison purposes, we can draw some interesting conclusions: a majority of fixed expressions from the receptive materials are present in this subcorpus, as we stated above, and their frequency is relatively high. While the total number of sentential formulae (tokens) for the oral and written subcorpora is 261 and 43, respectively, we find 198 tokens in the exercise section. Likewise, there are 161 fixed frames (tokens) in the spoken component, 18 in the written one and 125 in the exercise subcorpus. This allows us to state that the productive component complies sufficiently with the descriptors “Overall written production” (the A1 level learner “can write simple isolated phrases and sentences”) and “Creative writing” (the learner “can write simple phrases and sentences about themselves and imaginary people, where they live and what they do”).

Another significant finding is that we detected a noticeable difference between the written and the oral subcorpora, in spite of the overlaps mentioned above. In the written subcorpus there are more phraseological units for structuring discourse (mit freundlichen Grüßen) and for suasion (hast du Zeit und Lust?), while in the oral subcorpus there are more fixed expressions for expressing and finding out attitudes (das klingt gut), for socialising (wie geht’s?) and for imparting and seeking factual information (das geht).

Regarding sociolinguistic competence, we also distinguished two groups, oral texts and written texts, and compared the results with those of the exercise subcorpus. In the oral subcorpus there are 10 linguistic markers of social relations and 34 politeness conventions (respectively, 8 [80%] and 26 [76.47%] in the exercises). In the written subcorpus there are nine linguistic markers of social relations and eight politeness conventions (respectively, six [66.67%] and six [75%] in the exercises). Given the number and variety of expressions extracted from the oral subcorpus (and, to a lesser extent, from the written subcorpus), we can state that DaF kompakt A1 complies with the descriptor of sociolinguistic appropriateness for the A1 level, as formulated in the CEFR (2001: 122): “Can establish basic social contact by using the simplest everyday polite forms of: greetings and farewells; introductions; saying please, thank you, sorry, etc.” The written subcorpus yielded very limited results, but this is partially compensated by the fact that its phraseological units are also present in the oral subcorpus (33% of linguistic markers and 62.5% of politeness conventions) and in the exercises of the book, so we can consider that they are sufficiently repeated for students to be exposed to them.

5. Conclusions

This paper has offered an overview of two competences of the CEFR that are directly related to phraseology: lexical and sociolinguistic competences. We set out to analyse a corpus of the receptive and productive materials of the textbook DaF kompakt A1, in order to check whether it contained fixed expressions that corresponded to the recommendations of the CEFR.
After analysing the corpus in search of sentential formulae and fixed frames, we can conclude that both the spoken and the written subcorpora (as well as the exercise component) comply with the descriptors laid out in the CEFR. The written subcorpus was very limited in size and therefore did not yield many fixed expressions, but these can still be considered sufficient, taking into account that a fair number of them also occur in the exercises and in the oral subcorpus. Apart from the obvious difference in the number of fixed expressions detected, we also determined that the spoken subcorpus contains more expressions related to imparting and seeking factual information, to expressing and finding out attitudes and to socialising, while the phraseological units found in the written subcorpus deal more with suasion and structuring discourse. This is in line with the well-known differences between oral and written discourse.

With respect to sociolinguistic competence, we found numerous expressions that comply with the minimum recommendations for the A1 level. With regard to linguistic markers of social relations, there are no significant differences between the spoken and the written subcorpora, but we observed that politeness conventions are much more prevalent in oral discourse.

Further investigation could compare these results with a general reference corpus of German, to determine whether these fixed expressions are indeed the most frequent and widely used by German speakers. However, we might encounter the limitation that, to the best of our knowledge, there is no general reference corpus of spoken German that allows the user to create lists of n-grams or clusters.

6. Bibliography

Anthony, Laurence. 2016. AntConc (Version 3.4.4) [Computer Software]. Tokyo, Japan: Waseda University.
http://www.laurenceanthony.net/
Bolinger, Dwight. 1976. Meaning and memory. Forum Linguisticum 1: 1-14.
Corpas Pastor, Gloria. 1996. Manual de fraseología española. Madrid: Gredos.
Coulmas, Florian. 1979. On the sociolinguistic relevance of routine formulae. Journal of Pragmatics 3: 239-266.
Council of Europe. 2001. Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge: Cambridge University Press.
De Cock, Sylvie. 2000. Repetitive phrasal chunkiness and advanced EFL speech and writing. In Mair, C. & Hundt, M. (eds.) Corpus Linguistics and Linguistic Theory. Papers from ICAME 20. Amsterdam: Rodopi, 51-68.
Fletcher, William H. 2007. KfNgram [Computer Software]. Annapolis MD: USNA. http://www.kwicfinder.com/kfNgram/kfNgramHelp.html
Morley, John. 2017. Academic Phrasebank. Manchester: The University of Manchester. http://www.phrasebank.manchester.ac.uk/ [Accessed 05/03/2017].
Nattinger, James & DeCarrico, Jeanette. 1992. Lexical Phrases and Language Teaching. Oxford: Oxford University Press.
O’Keeffe, Anne; McCarthy, Michael & Carter, Ronald. 2007. From Corpus to Classroom: language use and language teaching. Cambridge: Cambridge University Press.
Pawley, Andrew & Syder, Frances Hodgetts. 1983. Two puzzles for linguistic theory: nativelike selection and nativelike fluency. In Richards, J. C. & Schmidt, R. W. (eds.) Language and Communication. New York: Longman, 191-226.
Römer, Ute. 2009. The inseparability of lexis and grammar: Corpus linguistic perspectives. Annual Review of Cognitive Linguistics 7: 141-163.
Sander, Ilse; Braun, Birgit; Doubek, Margit; Frater-Vogel, Andrea; Trebesius-Bensch, Ulrike; Vitale, Rossana; Behnes, Sibylle; Kotas, Ondrej & Marquardt-Langermann, Martina. 2011. DaF kompakt A1. Deutsch als Fremdsprache für Erwachsene. Stuttgart: Klett.
Sinclair, John. 1991. Corpus, Concordance and Collocation. Oxford: Oxford University Press.
Storjohann, Petra. 2005. Corpus-driven vs. corpus-based approach to the study of relational patterns. In Conference e-journal, Corpus Linguistics 2005 conference. Birmingham: University of Birmingham, 1-20. http://www.birmingham.ac.uk/research/activity/corpus/publications/conference-archives/2005-conf-e-journal.aspx [Accessed 05/03/2017].
van Ek, Jan Ate & Trim, John Leslie Melville. 1991. Threshold Level 1990. Cambridge: Cambridge University Press.
Wray, Alison. 2000. Formulaic sequences in second language teaching: principle and practice. Applied Linguistics 21(4): 463-489.
Wray, Alison. 2002. Formulaic Language and the Lexicon. Cambridge: Cambridge University Press.

ojs.uv.es/index.php/qfilologia/index
Qf Lingüístics

Getting rid of the Chi-square and Log-likelihood tests for analysing vocabulary differences between corpora

Analizar las diferencias de vocabulario entre corpus sin los tests Chi-cuadrado y Log-likelihood

Yves Bestgen
Université catholique de Louvain.

Received: 20/04/2017. Accepted: 11/10/2017

Resumen: Los tests de log-likelihood y chi-cuadrado probablemente sean las pruebas estadísticas más populares utilizadas en la lingüística de corpus, especialmente cuando la investigación tiene como objetivo describir las variaciones léxicas entre corpus distintos. Sin embargo, dado que este uso específico del chi-cuadrado no es válido, produce demasiados resultados significativos. Esta contribución explica el origen del problema (es decir, la no independencia de las observaciones), los motivos por los cuales las soluciones habituales no son aceptables y qué clase de pruebas estadísticas deben ser utilizadas en su lugar. Se ha realizado un análisis de corpus sobre las diferencias léxicas entre el inglés británico y el inglés americano para mostrar el problema y confirmar la adecuación de la solución propuesta.
La última sección presenta las órdenes que pueden darse a WordSmith Tools, un programa informático muy popular en el procesamiento de corpus, a fin de obtener los datos necesarios para las pruebas adecuadas, así como un procedimiento muy fácil de usar en R, un paquete estadístico gratuito y fácil de instalar, que realiza estas pruebas.

Palabras clave: diferencias léxicas entre corpus; test de remuestreo; WordSmith Tools; inglés británico y americano.

Abstract: Log-likelihood and Chi-square tests are probably the most popular statistical tests used in corpus linguistics, especially when the research aims to describe the lexical variations between corpora. However, because this specific use of the Chi-square test is not valid, it produces far too many significant results. This paper explains the source of the problem (i.e., the non-independence of the observations), the reasons why the usual solutions are not acceptable, and which kinds of statistical test should be used instead. A corpus analysis conducted on the lexical differences between American and British English is then reported, in order to demonstrate the problem and to confirm the adequacy of the proposed solution. The last section presents the commands that can be used with WordSmith Tools, a very popular piece of software for corpus processing, to obtain the data needed for the adequate tests, as well as a very easy-to-use procedure in R, a free and easy-to-install statistical package, that performs these tests.

Keywords: lexical differences between corpora; resampling test; WordSmith Tools; British and American English.

Bestgen, Yves. 2017. “Getting rid of the Chi-square and Log-likelihood tests for analysing vocabulary differences between corpora”. Quaderns de Filologia: Estudis Lingüístics 22: 33-56. doi: 10.7203/qf.22.11299

1. Introduction

Many studies in corpus linguistics aim to analyse lexical differences between corpora that differ in genre (Tribble, 2000), regional and diatypic variety (Oakes & Farrow, 2007), oral or written modality (Rayson, Leech & Hodges, 1997), period of writing (Laviosa, Pagano, Kemppanen & Ji, 2017) or certain sociological characteristics of the speaker or writer, such as gender, age and socio-economic status (Brezina & Meyerhoff, 2014; Marquilhas, 2015), to cite a few examples. This kind of study immediately raises the question of how to decide whether a difference observed when comparing two given corpora (e.g., more occurrences of towards or male in an American English corpus than in a British English corpus) is purely accidental, or whether it reflects a real difference in the way English is used.

The answer is typically provided through the use of Pearson’s Chi-square (Chi2) test or its close neighbour, the log-likelihood (LL) test (Biber & Jones, 2008; Rayson & Garside, 2000). These statistical tests are applied to a contingency table made up of the frequency of a word in the two corpora to be compared and the total number of words in each corpus. Table 1 shows the contingency tables for the words towards and male in the British English corpus FLOB and in the American English corpus FROWN, which are used in the empirical analyses reported in section 4.

| | British | American |
|---|---|---|
| Towards | 17 | 293 |
| ~Towards | 1016832 | 1018360 |

| | British | American |
|---|---|---|
| Male | 89 | 177 |
| ~Male | 1016760 | 1018476 |

Table 1. Frequency counts for two words in the FLOB and FROWN corpora

The null hypothesis tested is that the difference between the frequencies of use in the two corpora is only the result of random variation, the two samples compared being randomly extracted from a single population.
The statistics used are:

$$\chi^2 = \sum \frac{(O - E)^2}{E} \qquad\qquad LL = 2 \sum O \ln\frac{O}{E}$$

in which O represents the observed frequency and E the expected frequency, computed on the basis of the marginal totals, and the summation is over the four cells (and not over the first two as in Brezina and Meyerhoff (2014)). Under H0, these statistics are approximately distributed as a Chi-square with one degree of freedom, which makes it possible to calculate the probability of obtaining, if the differences were due to chance alone, a statistic at least as high as the one observed. Applied to the words towards and male, these two tests return probabilities of less than 0.00000001.

As noted by Sampson (2003), the use of these tests by Hofland and Johansson (1982) has, for example, expanded our understanding of the lexical differences between British and American English. These authors showed that masculine words, such as he, boy and man, are significantly more frequent in American English, while feminine words are significantly more frequent in British English. The popularity of these tests has undoubtedly been reinforced by their implementation in software as frequently used in corpus linguistics as WordSmith Tools (Scott, 1997), one of whose main functions is the identification of Keywords, i.e., all words that pass the Chi2 or LL test for a probability threshold of 0.000001. The same function is also available in other software, such as AntConc (Anthony, 2012).

These two tests are also very frequently used to test specific hypotheses in corpus linguistics (Lee & Chen, 2009; Lubbers Quesada & Blackwell, 2009; see Gablasova, Brezina & McEnery (2017) for illustrations and a discussion). For example, Siyanova-Chanturia (2015) used the Chi2 test to confirm that Chinese beginner learners of Italian used more strongly associated collocations at the end of an intensive course than they did at the beginning.
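As an illustration of how the two statistics are obtained from a table such as Table 1, the following Python sketch computes the expected frequencies from the marginal totals and sums over all four cells. The function names are hypothetical; the counts are those given in Table 1, and natural logarithms are assumed for LL.

```python
import math

# Rows: word / all other words; columns: British (FLOB) / American (FROWN).
TOWARDS = [[17, 293], [1016832, 1018360]]
MALE = [[89, 177], [1016760, 1018476]]


def expected_counts(table):
    """Expected frequency of each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson's Chi-square, summed over all four cells."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))


def log_likelihood(table):
    """Log-likelihood statistic, summed over all four cells.
    Cells with an observed count of zero contribute nothing."""
    expected = expected_counts(table)
    return 2 * sum(o * math.log(o / e)
                   for obs_row, exp_row in zip(table, expected)
                   for o, e in zip(obs_row, exp_row)
                   if o > 0)


for word, table in (("towards", TOWARDS), ("male", MALE)):
    print(f"{word}: Chi2 = {chi_square(table):.2f}, "
          f"LL = {log_likelihood(table):.2f}")
```

Both statistics treat every one of the roughly two million running words as an independent observation; they use nothing but the four cell counts, so the way occurrences are spread over texts plays no role.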
However, as several authors have already pointed out, these tests are inadequate in the way they are used to analyse lexical differences between corpora, and should no longer be used for this purpose (Bestgen, 2012, 2014; Brezina & Meyerhoff, 2014; Kilgarriff, 1996, 2005; Lijffijt, Nevalainen, Säily, Papapetrou, Puolamäki & Mannila, 2016). The aim of this paper is to help researchers to abandon them by explaining in detail the problem they pose and its origin, by showing why several possible solutions are ineffective and by recommending two valid and efficient statistical tests. To make the use of these adequate tests as simple as possible, the last section provides the commands to obtain the necessary data by means of WordSmith Tools, and a very easy-to-use script to perform the tests in R, a free, easy-to-install, cross-platform statistical package.

2. The problem

The use of these two tests in corpus linguistics has been criticized for the very large number of significant differences they claim to detect (Baker, 2004; Gries, 2005; Kilgarriff, 1996, 2005). For example, Paquot and Bestgen (2009) observed, when comparing a literary corpus and an academic corpus of 15 million words each, that more than 90% of the 10,333 words tested were significantly more frequent in one of the two corpora for a probability threshold of 0.000001. This problem has most often been attributed to the very large size of the samples under analysis (Kilgarriff, 2005) or to the large number of tests performed (Gries, 2005). The problem is, in fact, much deeper and does not arise only in linguistics.
It was mentioned by Lewis and Burke as early as 1949 as the main misuse of the Chi2 test in psychology, and has been repeatedly emphasized since then: “Chi-square may be correctly used only if all N observations are made independently” (Kurtz & Mayo, 1979: 366); that is, each observation must be “taken from the population at random, and the selection of each member of the sample is independent from the next” (Wallis, 2013: 352). In other words, for the test to be valid [1], the unit analysed must be the sampling unit (Bestgen, 2014; Gablasova et al., 2017). This is (almost) never the case in corpus linguistics. The unit analysed is often a word, or sometimes a sentence, while the sampling unit used to construct the corpus is a text (or an extract from a text).

[1] This problem arises for all statistical tests that can be applied to a contingency table, including Fisher’s exact test, which also requires the observations to be independent.

Why does this discrepancy between the sampling unit and the unit of analysis so strongly affect the number of significant words in corpus comparison? It has long been known that the frequency of word occurrences varies greatly between texts (Church, 2000). It follows that the presence of some very specific texts, or even a single one, in a corpus may be sufficient to increase the frequency of certain words and thus to modify the set of words considered significantly more frequent in this corpus according to the Chi2 and LL tests. This phenomenon is perfectly illustrated by the following example reported in Oakes and Farrow (2007). These authors observed that one of the most typical words in British English, according to the Chi2 test, is thalidomide. They note, however, that all of the 55 occurrences of this word in the British corpus appear in one single text. Contrary to what the Chi2 test seems to indicate, thalidomide is not typical of British English, just of one text in the British corpus.
It is because this text was selected in its entirety for inclusion in the corpus that thalidomide appears as typical. If the sampling unit had coincided with the unit of analysis (the word), thalidomide would have had (virtually) no chance of being declared typical. Thus, each selected text may cause a series of false positives. It is important to note that it is not just such extreme cases that invalidate the Chi2 and LL tests. The simple fact that the probability of a word occurring in a text for a second time is far higher than that of its occurring for the first time shows that non-independence is general and not occasional (Church, 2000).

3. The solutions

A first solution consists of disregarding the probability derived from the inferential test (Bestgen, 2014; Gabrielatos & Marchi, 2011; Leedham, 2012). The Chi2 (or LL) values (called Keyness in WordSmith Tools) are interpreted as indicators of the potential interest of each of the numerous vocabulary differences between the corpora: the larger the value, the more interesting the word. This solution has the major drawback of only masking the problem without solving it, because there is an inverse monotonic relationship between the p-value and the test statistic. A word such as thalidomide is extremely significant precisely because it has a very high Chi2 value. Pretending to look only at the Chi2 or LL scores does not solve anything.

A second solution is to use a dispersion measure to eliminate words that only occur in a part of a corpus (Baker, 2004; Oakes & Farrow, 2007). The first problem with this solution is that the threshold used to decide that a word is insufficiently dispersed is necessarily arbitrary, which is all the more troublesome since the main measures proposed in the literature are difficult to interpret (Oakes & Farrow, 2007). Moreover, Bestgen (2014) showed that taking dispersion into account made it possible to reduce the problem posed by very badly dispersed words (like thalidomide), but not to eliminate it. The LL and Chi2 tests remain inadequate.

The only acceptable solution is to use an inferential test that reconciles the sampling units and the units of analysis and that is therefore based on the frequency of the words [2] not in the corpus, but in the texts making up the corpus. Several statistical tests are possible. The most obvious choice is Student’s t-test for comparing two means. This test, however, is problematic, because it rests on an assumption of normality which is very difficult to sustain for word frequency data. For this reason, Kilgarriff (1996) and several authors after him (Brezina & Meyerhoff, 2014; Lijffijt et al., 2016; Paquot & Bestgen, 2009) proposed the use of a distribution-free test (also called a nonparametric test). The test recommended by Kilgarriff is the Wilcoxon-Mann-Whitney test (WMW), which is carried out on the relative frequency of each word in each text after these frequencies have been transformed into ranks. Simplifying a little, it calculates the probability of obtaining, under the null hypothesis that the two corpora were drawn at random from identical populations of texts, a difference between the average ranks at least as large as that observed.

This proposal was strongly criticized by Rayson and colleagues (Rayson, Berridge & Francis, 2004; Rayson & Garside, 2000) because, by transforming the frequencies into ranks, this test discards some important information available in the data. However, this problem is easy to remedy, because there is a WMW-equivalent test that can be applied to the non-ranked values: the Fisher-Pitman (FP) test (Berry, Mielke & Mielke, 2002; Neuhauser & Manly, 2004).
It calculates the probability, under the same null hypothesis, of obtaining a difference between the mean frequencies in texts at least as large as the difference actually observed. The only difference between these two tests is therefore that one is calculated on the basis of ranked data and the other on raw data.

[2] Since the texts in a corpus are rarely of exactly the same length, the analyses must be carried out on the relative frequencies (number of occurrences divided by the length of the text).

These two tests have some properties which are important to know in order to use them adequately. First, because they free us from the normality assumption, they test a more general null hypothesis than that tested by the Student’s t-test. They also detect differences in the variability and even in the shape of the distributions. However, they are particularly sensitive to differences in means or medians (Howell, 2008; Hesterberg, Moore, Monaghan, Clipson & Epstein, 2006). Second, the p-value that they provide when analysing large samples, as is almost always the case in corpus linguistics, is obtained using a Monte-Carlo resampling procedure [3]. This type of test is gaining more and more attention in statistics (Good, 2005) as well as in corpus linguistics (Gries, 2006). However, its weakness is that the degree of precision of the probability depends on the number of resamplings performed, and it is therefore time-consuming to obtain probability estimates for many words. This limitation is especially important when estimating extremely small probabilities, since they cannot be smaller than one divided by the number of resamplings done. Finally, replacing the relative frequencies by ranks in the case of the WMW has the consequence that the corpus containing the fewest occurrences of a word may be the one whose texts have the highest average rank.
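As a toy illustration of this last point and of the resampling logic, the following base R sketch compares two small invented corpora. The data, the seed, the helper names mean_diff and perm_test, and the two-sided counting rule are all assumptions made for the example; it is a simplified stand-in, not the coin implementation used by the script in the appendix.

```r
set.seed(1)  # arbitrary seed so that the Monte-Carlo estimates are reproducible

# Hypothetical relative frequencies (occurrences per 1,000 words) of one word
# in the texts of two small corpora; these are NOT data from the paper.
corpus_a <- c(20, 0, 0, 0, 0, 0, 0, 0, 0, 0)  # one text with many occurrences
corpus_b <- c(1, 1, 1, 1, 1, 1, 1, 1, 0, 0)   # a few occurrences in most texts

freq  <- c(corpus_a, corpus_b)
group <- rep(c("A", "B"), times = c(length(corpus_a), length(corpus_b)))

# Difference (A minus B) between the group means of a vector of values
mean_diff <- function(values, labels) {
  mean(values[labels == "A"]) - mean(values[labels == "B"])
}

# Monte-Carlo permutation test: shuffle the corpus labels of the texts and
# count how often the shuffled difference is at least as large (in absolute
# value) as the observed one.
perm_test <- function(values, labels, n_resamples = 9999) {
  observed <- mean_diff(values, labels)
  shuffled <- replicate(n_resamples, mean_diff(values, sample(labels)))
  # The observed arrangement counts as one resample, so p >= 1 / (n_resamples + 1)
  (sum(abs(shuffled) >= abs(observed)) + 1) / (n_resamples + 1)
}

fp_diff  <- mean_diff(freq, group)        # raw values, Fisher-Pitman style
wmw_diff <- mean_diff(rank(freq), group)  # ranks, Wilcoxon-Mann-Whitney style

p_fp  <- perm_test(freq, group)
p_wmw <- perm_test(rank(freq), group)

cat(sprintf("Mean frequency: A - B = %+.2f (p = %.4f)\n", fp_diff, p_fp))
cat(sprintf("Mean rank:      A - B = %+.2f (p = %.4f)\n", wmw_diff, p_wmw))
```

In this invented data set corpus A has the higher mean frequency while corpus B has the higher mean rank, so the two statistics point in opposite directions. With 9,999 resamplings, neither p-value can be smaller than 0.0001.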
This will be the case, for instance, if one of the two corpora only contains a single text containing many occurrences of the word, while the other corpus contains a sufficiently large number of texts containing a small number of occurrences of it. The first corpus will have the highest frequency, but the second will have the highest average rank. This difference between the two tests is not a defect. It points out words showing an atypical profile.

[3] Lijffijt et alii. (2016) proposed an ad hoc resampling procedure of the bootstrap type that differs from the usual practices in statistics since the resampling is done in a manner that is not consistent with the null hypothesis (Hesterberg et alii., 2010) and since, when the two samples are unequal in size, the smaller sample size is used in the resampling procedure (see Efron and Tibshirani [1993, Chap. 16] for a significance test based on bootstrap).

4. Empirical evaluation of the different tests

So far, studies which have stressed the inadequacy of the Chi2 and LL tests for analysing lexical differences between corpora presented arguments using the fact that these tests declare too many words to be significant even when extremely strict probability thresholds are used (Bestgen, 2014; Brezina & Meyerhoff, 2014; Kilgarriff, 2005; Lijffijt et alii., 2016). Such demonstrations have obviously not been sufficient, since these tests continue to be used in corpus linguistics and they are still the only statistical tests available in WST and AntConc. We are thus proposing another proof of the problem. We will evaluate the effectiveness of a statistical test based on what it is really used for, that is, the conclusion derived from a significant difference.
If a test claims that a given word is more frequent in one variety of English than it is in another because it finds a significant difference between the frequency of this word in the two corpora, it is expected that if two other corpora that differ on the same dimension are analysed, that difference will also be observed. One can immediately see the problematic consequences resulting from a test that is not very effective according to this criterion: nobody can trust the conclusions to which it leads. This evaluation procedure is used in the analyses reported below, which were conducted on the distinction between American and British English. The statistically significant differences were determined on the basis of two corpora of one million words each, and the verification on the basis of two very large corpora, frequently used as reference corpora for the varieties in question.

4.1. Materials

4.1.1. Corpora for finding the significant differences

We made use of the FLOB (Freiburg LOB Corpus of British English) and the FROWN corpus (Freiburg Brown Corpus of American English), both compiled at the University of Freiburg to be as similar as possible except, of course, in terms of the variety of English. Each corpus contains a million words, corresponding to 500 extracts from texts [4] published in the early 1990s. Each extract contains approximately 2,000 words, and the extracts cover 15 genres of written texts, such as press texts, scientific writing, romantic fiction and science fiction. They are available on the ICAME CD-ROM (Hofland, Lindebjerg & Thunestvedt, 1999).

[4] The resampling tests do not require both corpora to contain the same number of texts.

4.1.2. Corpora for evaluating the test decisions

Two large reference corpora for these varieties of English were used:
• The British National Corpus (BNC), a 100-million-word collection of samples of written and spoken language designed to represent a wide cross-section of British English from the late 20th century.
• The Corpus of Contemporary American English (COCA), a very large and balanced corpus of American English. The version we used contains more than 425 million words of text (20 million words for each year between 1990 and 2011) and is equally divided between speech, fiction, popular magazines, newspapers and academic texts.

In the following analyses, a word is considered typical of an English variety according to the reference corpus when its relative frequency is higher in the corresponding corpus.

4.2. Procedure

A series of pre-treatments had to be applied to the texts, such as word segmentation and special character removal. The same pre-processing steps were carried out on the analysed corpora (FLOB and FROWN) and on the reference corpora (BNC and COCA).

The Chi2, LL, WMW and FP tests were applied to all words with a total frequency of at least 10 in the two corpora, so as to analyse only words with a sufficient expected frequency (a requirement for using the Chi2 test). To estimate the p-values for the two resampling tests, one million permutations were made. The probability threshold for deciding that a word is significantly more frequent in one of the two compared corpora was set at 0.000001, which is the default value in WST.

The analyses were carried out twice: the first time without taking into account the dispersion criterion and the second time only considering words occurring in at least 5% of the texts of the corpus in which they have the highest relative frequency. This dispersion criterion, the range, is the one used in WST, and it is set here to its default value in this software. This threshold of 5% corresponds to 25 texts in these corpora and therefore implies a minimum frequency of 25 occurrences of the word in the corpus. An advantage of the range over many other measures of dispersion is that it is easily interpretable.
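The range criterion is simple enough to be sketched directly in base R. The count matrix below is invented for illustration (40 texts instead of the 500 extracts in FLOB or FROWN), and the object names are hypothetical; only the 5% threshold and the 25-text equivalence come from the description above.

```r
# Hypothetical word-by-text count matrix for one corpus: rows are words,
# columns are texts (invented data; a corpus such as FLOB has 500 texts).
counts <- rbind(
  towards     = rep(c(3, 0, 2, 1), times = 10),  # present in 30 of 40 texts
  thalidomide = c(55, rep(0, 39))                # all occurrences in one text
)

min_range <- 0.05  # minimum proportion of texts (the default value cited above)

# Range: number of texts in which each word occurs at least once
word_range <- rowSums(counts > 0)
range_prop <- word_range / ncol(counts)
keep       <- range_prop >= min_range

# With 500 texts, the 5% threshold corresponds to 25 texts
min_texts_500 <- round(min_range * 500)

print(data.frame(range = word_range, proportion = range_prop, keep = keep))
```

In this sketch thalidomide occurs in 1 of 40 texts (2.5%) and is filtered out, whereas towards occurs in 75% of the texts and is kept, whatever the total frequency of either word.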
It is important to compare the performance of the tests with and without a dispersion threshold because few studies use such a threshold, whereas Oakes and Farrow (2007) have shown that it is useful for filtering out uninteresting words when using the Chi2 test.

4.3. Results

Table 2 summarizes the main results of the analyses. For each statistical test, and with or without taking the dispersion criterion into account, the number of words considered as significant at the threshold of 0.000001 is given, as well as the proportion of these words validated by the reference corpora and the number of words not validated. As can be seen, many more words are selected by the Chi2 and LL tests than by the two adequate tests, confirming the criticism raised by Kilgarriff (2005). Without a control on dispersion, a non-negligible percentage of these words is not validated by the reference corpora. When the dispersion threshold is taken into account, 8% of the words selected by the two inappropriate statistical tests are rejected. For both appropriate statistical tests, the results are very different. These tests clearly select fewer words, but all of them are validated when dispersion is taken into account, and only one word is not validated when this criterion is not considered.

            Without Range                    With Range
Test    Nbr. Sig    %OK      Nbr. KO     Nbr. Sig    %OK      Nbr. KO
CH      577         83.36    96          280         92.14    22
G       805         81.24    151         288         92.01    23
WMW     122         99.18    1           113         100.00   0
FP      104         99.04    1           99          100.00   0

Table 2: Results for the four statistical tests

From a qualitative point of view, the words selected by the Chi2 and the LL tests that were not validated by the corpus of reference when applying the dispersion criterion are as follows (ordered according to their keyness score): t, i, japan, have, st, m, ai, children, opera, last, male, stress, performance, poll, has, relations, okay, legal, mental, d, yeah and prison. The LL adds the word patient to this list.
This list includes the word male, which has been used as an example in Table 1 and which is considered by the inadequate tests as being typical of American English. It is interesting to compare this list with the 25 words validated by the corpus of reference with the highest keyness scores: percent, which, cent, labour, toward, program, clinton, bush, president, programs, towards, american, uk, per, states, london, labor, british, was, defense, centre, center, britain, united and washington. This list includes the other example in Table 1 (towards). There is no doubt that the words on the second list are clearly more easily interpretable, in the sense that it is easy to guess the variety of English in which they occur most frequently, whereas it is much more difficult for the first list.

The term selected by the WMW and FP tests that is not validated by the corpus of reference is DC, which is more frequent in the FROWN corpus than in the FLOB corpus, but less frequent in the COCA than in the BNC, where it appears not only as expected after Washington, but also as the abbreviation of direct current and in an extract of The Dickens Index book.

The objective of this analysis is to illustrate the problems posed by the classical Chi2 and LL tests and to show that the proposed tests do not encounter these difficulties. It is not possible here to analyse the two inappropriate tests in detail, in order to determine whether it is possible to make them more efficient, by using more extreme probability thresholds or by using other dispersion measures. Such analyses would require a variation in the size of the corpora to determine whether or not an efficient solution for comparing two one-million-word corpora is also appropriate for smaller and larger corpora, or for corpora of different sizes.

5. How can the adequate test be obtained?

The previous section very concretely shows the problems caused by using inadequate statistical tests when analysing lexical differences between corpora.
However, to persuade researchers to adopt the adequate tests, it is necessary to simplify their use as much as possible. This section presents instructions for both WST and R, which make it easy to use these tests.

5.1. Getting the necessary data with WST

The first step is to create a wordlist for each corpus containing the frequencies of all of the words in all of the texts by supplying a file per text to WST (after, if necessary, using the Split function in File Utilities). In the Wordlist function, use the Make a batch now option with One file with all individual results in it in zip format. Then, use the Detailed consistency function, where you select the zip file containing all Wordlists (one per file). Finally, save the results displayed on the screen in a .txt file with tab as column separator and uncheck the Separate thousands box. These steps must be performed separately for each corpus.

5.2. The R script for computing the statistical tests

The R (R Core Team, 2013) script provided in the appendix requires a complementary package, called coin (Hothorn, Hornik, van de Wiel & Zeileis, 2008), which performs the resampling tests. If it is not already installed, the script tries to install it. To use this script, just copy the whole code (the CorpLexTests function), paste it into the R console window and press Enter. Then, it is necessary to adapt the command line provided below to the files to be analysed and the parameter values to be used.

    CorpLexTests(file1="E:/FLOBList.txt", file2="E:/FROWNList.txt", minfreq = 10, minrange = 0.05, maxpll = 0.0001, niter1 = 10000, maxpperm = 0.0003, niter2 = 1000000)

The parameters are as follows:
• file1 and file2 provide the paths and filenames for the two files obtained from WST (on Windows, "E:\\FLOBList.txt" works as well).
• minfreq indicates the minimum total frequency of the word in both corpora for the analysis to be conducted.
The default value is 0 and corresponds to no threshold. However, it seems meaningless to try to determine whether a very rare word is more frequent in one corpus than in another. Moreover, in addition to the problem of non-independence described above, it is known that, in order to be valid, the Chi-square test imposes a condition on the expected frequencies in the contingency table cells (usually at least five). It should be noted that this condition does not apply to the permutation tests, but that a rare word is unlikely to be significant enough to merit a thorough linguistic analysis.
• minrange gives the minimum threshold of the number of texts in which the analysed word should occur. This value is given in proportion to the number of texts in the corpus containing the most occurrences (in terms of relative frequency) of this word. The default value, taken from WST, is 0.05.
• maxpll indicates the maximum p-value from the LL test for the analysis to be conducted. It must be between 0 and 1, 0 allowing only a very small number of tests and 1 allowing all tests. This option makes it possible to reduce the duration of the analysis by only performing the resampling tests on words which would have been declared significant by the usual (but problematic) LL test. The default value is set to 0.000001, as in WST.
• niter1 indicates the number of resamplings to be performed for any word that successfully passes the three conditions (minfreq, minrange and maxpll). It is desirable not to go below 1000. The chosen value will determine the smallest probability that can be given to a word by the resampling tests. For example, 1000 corresponds to a probability of 0.001. The greater the number of resamplings requested, the more time the analyses will take. For this reason, it is possible to request additional resamplings for the most significant words using the last two parameters.
This option is activated by the parameter maxpperm, which gives the maximum p-value for performing a series of complementary resamplings. It is applied independently to each of the two resampling tests. Thus, for each of these tests, if the p-value resulting from the first niter1 resamplings is less than or equal to this parameter, a total of niter2 resamplings is performed. niter2 must necessarily be greater than niter1 (since it includes these iterations). Setting niter2 to 1000000 by default will yield probabilities as small as 0.000001, the default threshold for WST. The default value of maxpperm is set to 0, and this option is thus not used.

The only required parameters are the two file paths, since all other parameters have acceptable default values. This script works on both Windows and Mac OS X (WST, by contrast, does not run on Mac OS X).

5.3. R script output

The results are displayed on the R Console and saved in a file named CorpLexTestsRes.txt in the folder where corpus 1 is located. The first four lines give general information about the analyses performed. The first two lines show the file path and name of each corpus, as well as the number of texts and the number of words they contain. These lines thus serve as a reminder of which corpus corresponds to corpus 1 in the results. The third line contains the values of the parameters used in the analysis. The fourth line gives the names of the variables provided in the results.

Figure 1: Output of the R script for the FLOB vs. FROWN comparison (partim)

The printed results are as follows:
• The analysed word.
• FreqC1 gives the frequency of the word in corpus 1 and FreqC2 its frequency in corpus 2. The relative frequencies can be calculated using the total frequencies of the two corpora given in the first two lines.
• Chi2 gives the Chi-square statistic and Chi2_p the corresponding p-value. LL and LL_p do the same for the LL test.
• Range gives the number of texts containing this word in the corpus in which the word is the most frequent (in terms of relative frequency).
• WMW_p gives the obtained p-value from the WMW test and FP_p does the same for the FP test.

As can be seen in the above extract of a comparative analysis of the FLOB and FROWN corpora, only those words which pass the minfreq, the minrange and the maxpll thresholds are printed. The word BABY is preceded by an asterisk because the corpus that contains the most occurrences of this word is not the one whose texts have the highest average rank. In this case, caution should be taken when interpreting the results, as explained in section 3. However, it is unlikely that this kind of result will be observed for words that are very significant for both the WMW and the FP tests.

6. Conclusion

This paper deals with statistical tests used in corpus linguistics for analysing lexical differences between corpora. Its most important contributions are the following:
• To explain in detail why the Chi2 and LL tests are inadequate in this research field. It is important to emphasize again that the problem raised applies as much to the statistics resulting from the tests as it does to the probability that is derived from them, and therefore also affects the keyness score or any other effect size measure such as that proposed by Gabrielatos and Marchi (2011). The problem raised affects any use of these tests to analyse corpora regardless of the linguistic unit counted: words, but also lexical bundles, collocations, syntactic constituents... It follows that, for instance, using these tests to analyse the use of the passive voice in different corpora is also inappropriate.
• To concretely demonstrate the seriousness of the erroneous conclusions reached when they are used.
• To propose two tests that are adequate and effective.
• To provide a concrete solution, which we hope is easy to put into practice, for using the appropriate tests.

An important question that has so far gone unanswered is which of the two appropriate tests is preferable. The main difference between them is that the WMW test, based as it is on ranks, is more sensitive than the FP test to small differences in frequency within a relatively large number of texts, whereas the FP test is more sensitive to the presence of a relatively small number of texts containing relatively high frequencies. Ideally, both tests should be significant. If only one of them is clearly not significant for the chosen probability threshold, it is necessary to be very careful in interpreting the results and, in any case, to analyse the distribution of this word in the texts of the two corpora using WST.

7. Bibliography

Anthony, Laurence. 2012. AntConc Version 3.3.5. [Computer Software]. Tokyo: Waseda University. http://www.antlab.sci.waseda.ac.jp/.
Baker, Paul. 2004. Querying keywords: questions of difference, frequency and sense in keywords analysis. Journal of English Linguistics 32(4): 346-359.
Berry, Kenneth J.; Mielke, Paul W. & Mielke, Howard W. 2002. The Fisher-Pitman permutation test: an attractive alternative to the F test. Psychological Reports 90: 495-502.
Bestgen, Yves. 2012. Analyse des différences lexicales entre des corpus : test ou distance du Khi-2? In Actes de JADT 2012 : 11es Journées internationales d’Analyse statistique des Données Textuelles, 150-161.
Bestgen, Yves. 2014. Inadequacy of the chi-squared test to examine vocabulary differences between corpora. Literary and Linguistic Computing 29: 164-170.
Biber, Doug & Jones, James. 2009. Quantitative methods in corpus linguistics. In Lüdeling, Anke & Kytö, Merja (ed.) Corpus Linguistics. An International Handbook. Berlin: Mouton de Gruyter, 1286-1304.
Brezina, Vaclav & Meyerhoff, Miriam. 2014.
A critical review of sociolinguistic generalisations based on large corpora. International Journal of Corpus Linguistics 19(1): 1-28.
Church, Kenneth W. 2000. Empirical estimates of adaptation: The chance of two Noriegas is closer to p/2 than p². In Proceedings of the 17th International Conference on Computational Linguistics, 180-186.
Efron, Brad & Tibshirani, Rob. 1993. An introduction to the bootstrap. New York: Chapman & Hall.
Gablasova, Dana; Brezina, Vaclav & McEnery, Tony. 2017. Exploring learner language through corpora: Comparing and interpreting corpus frequency information. Language Learning (advance access). doi: 10.1111/lang.12226.
Gabrielatos, Costas & Marchi, Anna. 2011. Keyness: Matching metrics to definitions. Paper presented at the Corpus Linguistics in the South: Theoretical-methodological challenges in corpus approaches to discourse studies - and some ways of addressing them. Portsmouth: 5th November 2011.
Good, Phillip I. 2005. Permutation, parametric and bootstrap tests of hypotheses (Third Edition). New York: Springer.
Gries, Stefan Th. 2005. Null hypothesis significance testing of word frequencies: a follow-up on Kilgarriff. Corpus Linguistics and Linguistic Theory 1: 277-294.
Gries, Stefan Th. 2006. Exploring variability within and between corpora: some methodological considerations. Corpora 1(2): 109-151.
Hesterberg, Tim; Moore, David S.; Monaghan, Shaun; Clipson, Ashley & Epstein, Rachel. 2006. Bootstrap methods and permutation tests. Supplemental chapter for Moore, David S. & McCabe, George P. Introduction to the Practice of Statistics. New York: W H Freeman.
Hofland, Knut & Johansson, Stig. 1982. Word frequencies in British and American English. Bergen: The Norwegian Computing Centre for the Humanities.
Hofland, Knut; Lindebjerg, Anne & Thunestvedt, Jorn. 1999. ICAME collection of English language corpora. [CD-ROM]. Bergen: The HIT Centre, University of Bergen.
Hothorn, Torsten; Hornik, Kurt; van de Wiel, Mark A.
& Zeileis, Achim. 2008. Implementing a class of permutation tests: the coin package. Journal of Statistical Software 28(8): 1-23.
Howell, David C. 2008. Méthodes statistiques en sciences humaines. Bruxelles: De Boeck Université.
Kilgarriff, Adam. 1996. Comparing word frequencies across corpora: Why Chi-square doesn’t work, and an improved LOB-Brown comparison. In Proceedings of ALLC-ACH Conference, 169-172.
Kilgarriff, Adam. 2005. Language is never, ever, ever random. Corpus Linguistics and Linguistic Theory 1: 263-275.
Kurtz, Albert K. & Mayo, Samuel T. 1979. Statistical methods in psychology and education. New York: Springer.
Laviosa, Sara; Pagano, Adriana; Kemppanen, Hannu & Ji, Meng. 2017. Textual and contextual analysis in empirical translation studies. Singapore: Springer.
Lee, David Y. W. & Chen, Sylvia Xiao. 2009. Making a bigger deal of the smaller words: Function words and other key items in research writing by Chinese learners. Journal of Second Language Writing 18: 149-165.
Leedham, Maria. 2012. Review of: “New trends in corpora and language learning” and “Keyness in texts”. System 40(1): 162-165.
Lewis, Don & Burke, C. J. 1949. The use and misuse of the chi-square test. Psychological Bulletin 46(6): 433-489.
Lijffijt, Jefrey; Nevalainen, Terttu; Säily, Tanja; Papapetrou, Panagiotis; Puolamäki, Kai & Mannila, Heikki. 2016. Significance testing of word frequencies in corpora. Literary and Linguistic Computing 31(2): 374-397.
Lubbers Quesada, Margaret & Blackwell, Sarah E. 2009. The L2 acquisition of null and overt Spanish subject pronouns: A pragmatic approach. In Collentine, Joseph (ed.) Selected Proceedings of the 11th Hispanic Linguistics Symposium. Somerville, MA: Cascadilla Proceedings Project, 117-130.
Marquilhas, Rita. 2015. Non-anachronism in the historical sociolinguistic study of Portuguese. Journal of Historical Sociolinguistics 1(2): 213-242.
Neuhauser, Markus & Manly, Bryan F. J. 2004.
The Fisher-Pitman permutation test when testing for differences in mean and variance. Psychological Reports 94: 189-194.
Oakes, Michael & Farrow, Malcolm. 2007. Use of the chi-squared test to examine vocabulary differences in English language corpora representing seven different countries. Literary and Linguistic Computing 22: 85-99.
Paquot, Magali & Bestgen, Yves. 2009. Distinctive words in academic writing: A comparison of three statistical tests for keyword extraction. In Jucker, Andreas H.; Schreier, Daniel & Hundt, Marianne (ed.) Corpora: Pragmatics and Discourse. Amsterdam: Rodopi, 247-269.
R Core Team. 2013. R: A language and environment for statistical computing. Vienna: R Foundation for Statistical Computing. http://www.R-project.org/.
Rayson, Paul; Leech, Geoffrey & Hodges, Mary. 1997. Social differentiation in the use of English vocabulary: Some analyses of the conversational component of the British National Corpus. International Journal of Corpus Linguistics 2: 133-152.
Rayson, Paul; Berridge, Damon & Francis, Brian. 2004. Extending the Cochran rule for the comparison of word frequencies between corpora. In Proceedings of the 7th International Conference on Statistical analysis of textual data, 926-936.
Rayson, Paul & Garside, Roger. 2000. Comparing corpora using frequency profiling. In Kilgarriff, Adam & Sardinha, Tony B. (ed.) Proceedings of the Comparing Corpora Workshop, 1-6.
Sampson, Geoffrey. 2003. Statistical linguistics. In Frawley, William J. International Encyclopedia of Linguistics (2 ed.). New York: Oxford University Press.
Scott, Mike. 1997. PC analysis of key words - and key key words. System 25(2): 233-245.
Siyanova-Chanturia, Anna. 2015. Collocation in beginner learner writing: A longitudinal study. System 53: 148-160.
Tribble, Chris. 2000. Genres, keywords, teaching: towards a pedagogic account of the language of project proposals. In Burnard, Lou & McEnery, Tony (ed.)
Rethinking language pedagogy from a corpus perspective: papers from the third international conference on teaching and language corpora. Bern: Peter Lang, 75-90.
Wallis, Sean. 2013. z-squared: The origin and application of Chi-square. Journal of Quantitative Linguistics 20(4): 350-378.

Appendix: The R script for computing the statistical tests

CorpLexTests <- function(file1="no-file", file2="no-file", minfreq=0, minrange=0.05,
                         maxpll=0.000001, niter1=10000, maxpperm=0, niter2=1000000) {
  #Parameter checks
  if (maxpll<0 | maxpll>1) {
    cat(sprintf("\nParameter error : maxpll= %f not between 0 and 1\n", maxpll))
    stop("Please change this parameter value")
  }
  if (maxpperm<0 | maxpperm>1) {
    cat(sprintf("\nParameter error : maxpperm= %f not between 0 and 1\n", maxpperm))
    stop("Please change this parameter value")
  }
  if (minrange<0 | minrange>1) {
    cat(sprintf("\nParameter error : minrange= %f not between 0 and 1\n", minrange))
    stop("Please change this parameter value")
  }
  if (minfreq<0) {
    cat(sprintf("\nParameter error : minfreq= %d must be >= 0\n", minfreq))
    stop("Please change this parameter value")
  }
  if (maxpperm>=1/niter1 & niter2<=niter1) {
    cat(sprintf("\nParameter error : niter2= %d must be > niter1= %d\n", niter2, niter1))
    stop("Please change this parameter value")
  }
  cat("Loading coin package\n")
  if(!require(coin)){ #Try to install the coin package if not already installed
    install.packages("coin")
  }
  library("coin")
  cat("Reading first file\n")
  d1=read.table(file1, header = FALSE, skip=1, comment.char=".", row.names = 1,
                fileEncoding="UTF-16LE", sep = "\t", dec = ",", quote="\"")
  cat("Reading second file\n")
  d2=read.table(file2, header = FALSE, skip=1, comment.char=".", row.names = 1,
                fileEncoding="UTF-16LE", sep = "\t", dec = ",", quote="\"")
  #fnout = paste(dirname(file1),"Res.txt",sep="/") if one is sure of the separator...
  fnout = paste(substr(file1,1,nchar(file1)-nchar(basename(file1))),"CorpLexTestsRes.txt",sep="")
  cat("Preparing data for processing\n")
  #Delete some columns
  d1=d1[,-(2:5)]
  d2=d2[,-(2:5)]
  #nbr of texts
  ncol1 <- ncol(d1)-1 #the first is the word
  ncol2 <- ncol(d2)-1
  #merge by words, keeping all of them
  da=merge(d1,d2,by.x=1,by.y=1,all=TRUE)
  #transpose the data, but not the word
  tda=t(da[,-1])
  #replace NA by 0
  tda[is.na(tda)] <- 0
  #Corpus id
  corpus=c(rep(1, ncol1), rep(2, ncol2))
  #number of words in each text
  rs=rowSums(tda)
  #add these variables to the data
  mydata=cbind(corpus,rs,tda)
  lastword <- ncol(mydata) #last column number
  cat("Start computing the statistical tests\n")
  sink(fnout, append=FALSE, split=TRUE) #to print the output in a file
  #print first lines
  cat(sprintf("# Corpus 1: File=%s NbrText=%d NbrWord=%d\n# Corpus 2: File=%s NbrText=%d NbrWord=%d\n",
              file1,ncol1,sum(mydata[mydata[,'corpus'] == 1,2]),
              file2,ncol2,sum(mydata[mydata[,'corpus'] == 2,2])))
  cat(sprintf("# minfreq=%d maxpll=%f niter1=%d maxpperm=%f niter2=%d\n",
              minfreq,maxpll,niter1,maxpperm,niter2))
  cat(sprintf("%-25s %10s %10s %10s %13s %10s %13s %10s %10s %10s\n",
              "Word","FreqC1","FreqC2","Chi2","Chi2_p","LL","LL_p","Range","WMW_p","FP_p"))
  #Loop on the words
  for (myidx in 3:lastword) {
    if (sum(mydata[,myidx])>=minfreq) {
      #compute the number of other words in the texts
      otherwords <- mydata[,2]-mydata[,myidx]
      #Compute Chi2 and LL
      ori <- as.table(rbind(tapply(mydata[,myidx], list(mydata[,'corpus']), FUN=sum),
                            tapply(otherwords, list(mydata[,'corpus']), FUN=sum)))
      oriXsq <- chisq.test(ori,correct=FALSE)
      LL <- 2*sum(oriXsq$observed[oriXsq$observed>0]*
                  log(oriXsq$observed[oriXsq$observed>0]/oriXsq$expected[oriXsq$observed>0]))
      LL_pval=1-pchisq(LL,1)
      #Compute range in the corpus in which this word is the most frequent (in relative frequency)
      if (oriXsq$observed[1]>=oriXsq$expected[1]) {
        whichcor=1
        range<-sum(mydata[mydata[,'corpus'] == 1,myidx] > 0)
        rm<-range/ncol1
      } else {
        whichcor=2
        range<-sum(mydata[mydata[,'corpus'] == 2,myidx] > 0)
        rm<-range/ncol2
      }
      if (rm>=minrange & LL_pval<=maxpll) { #No output if range is insufficient or the p-value for LL is too large
        #Relative frequency (by overwriting the original data)
        mydata[,myidx] <- mydata[,myidx]/mydata[,2]
        wt<-wilcox_test(mydata[,myidx] ~ factor(mydata[,'corpus']),distribution = "asymptotic")
        pwta<-pvalue(wilcox_test(mydata[,myidx] ~ factor(mydata[,'corpus']),
                                 distribution = approximate(B = niter1-1)))
        if ((pwta*(niter1-1)+1)/niter1<=maxpperm) {
          #Pool the counts of the niter1-1 first resamplings with those of the niter2-niter1 additional ones
          pwta<-(1+pwta*(niter1-1)+pvalue(wilcox_test(mydata[,myidx] ~ factor(mydata[,'corpus']),
                 distribution = approximate(B = niter2-niter1)))*(niter2-niter1))/niter2
        } else pwta<-(pwta*(niter1-1)+1)/niter1
        pfpa<-pvalue(oneway_test(mydata[,myidx] ~ factor(mydata[,'corpus']),
                                 distribution = approximate(B = niter1-1)))
        if ((pfpa*(niter1-1)+1)/niter1<=maxpperm) {
          #Same pooling for the Fisher-Pitman test
          pfpa<-(1+pfpa*(niter1-1)+pvalue(oneway_test(mydata[,myidx] ~ factor(mydata[,'corpus']),
                 distribution = approximate(B = niter2-niter1)))*(niter2-niter1))/niter2
        } else pfpa<-(pfpa*(niter1-1)+1)/niter1
        if ((statistic(wt)>0 & whichcor==2) | (statistic(wt)<0 & whichcor==1)) {
          #For discordances between ranks and frequencies
          cat(sprintf("*%-24s %10.0f %10.0f %10.2f %13.4e %10.2f %13.4e %10.0f %10.8f %10.8f\n",
                      da[myidx-2,1],oriXsq$observed[1],oriXsq$observed[3],oriXsq$statistic,
                      oriXsq$p.value,LL,LL_pval,range,pwta,pfpa))
        } else {
          #For the normal case
          cat(sprintf("%-25s %10.0f %10.0f %10.2f %13.4e %10.2f %13.4e %10.0f %10.8f %10.8f\n",
                      da[myidx-2,1],oriXsq$observed[1],oriXsq$observed[3],oriXsq$statistic,
                      oriXsq$p.value,LL,LL_pval,range,pwta,pfpa))
        }
      } #End of the range and the maxpll conditions
    } #End of the minfreq condition
  } #Loop end
  sink() #End of output in a file
}

ojs.uv.es/index.php/qfilologia/index
Qf Lingüístics

“El criado pesado”: La caracterización en la serie Águila Roja
“The annoying
servant”: Characterization in the TV series Águila Roja

Luisa Chierichetti
Università degli Studi di Bergamo. [email protected]

Received: 14/05/2017. Accepted: 10/10/2017

Resumen: El presente artículo, partiendo de los más recientes estudios sobre el discurso telecinemático, pretende contribuir a la investigación en el ámbito de la lingüística aplicada a las series televisivas, las cuales constituyen uno de los productos culturales populares más influyentes en la sociedad contemporánea. El corpus de estudio está compuesto por los guiones completos de la exitosa serie Águila Roja, emitida por Radio Televisión Española entre 2009 y 2016. Compaginando técnicas de la lingüística de corpus y del análisis del discurso, este estudio examina la caracterización de Sátur, uno de los personajes principales de la ficción, a través de la co-construcción del significado elaborada por la audiencia televisiva. Los resultados sugieren que el discurso del personaje está caracterizado por el uso del registro coloquial contemporáneo y por incongruencias y anacronismos, rasgos que crean humor y familiaridad con los espectadores.

Palabras clave: series televisivas; discurso telecinemático; caracterización; lingüística de corpus; análisis del discurso.

Abstract: This article, based on the most recent studies on telecinematic dialogue, proposes a contribution to linguistic research on television series, one of the most influential popular cultural products in contemporary society. The work is based on the complete scripts of the successful Spanish series Águila Roja, aired on Radio Televisión Española between 2009 and 2016. Combining techniques of corpus linguistics and discourse analysis, this study examines the characterization of Sátur, one of the main characters of this fiction, through the co-construction of the meaning, as processed by the television audience.
The results suggest that Sátur’s discourse is characterized by the use of contemporary colloquial language and by incongruity; such features create humor and familiarity with the audience.

Keywords: television series; telecinematic discourse; characterization; corpus linguistics; discourse analysis.

Chierichetti, Luisa. 2017. “‘El criado pesado’: La caracterización en la serie Águila Roja”. Quaderns de Filologia: Estudis Lingüístics 22: 57-78. doi: 10.7203/qf.22.11301

1. Introduction

This article deals with the language of television series and, in particular, with how it is used to create the expressive identity of characters. Following Bednarek (2010, 2011a, 2011b, 2012a, 2012b, 2015a, 2015b) and Culpeper's work on characterization in dramatic texts (2001, 2009), we draw on the tools of corpus linguistics and discourse analysis to examine the original scripts of a highly successful Spanish series, Águila Roja, and investigate the construction of one of its main characters, Sátur.

The linguistic characterization of characters in audiovisual fiction has attracted a certain amount of interest, especially with regard to North American television series, which currently enjoy overwhelming worldwide success. Without claiming to be exhaustive, we recall some notable studies, starting with Baker's (2005) volume, which combines sociolinguistics and corpus linguistics to study the construction of gay identity and devotes a chapter to the gay characters of the series Will and Grace. Wodak (2009), studying the complex and changing relationship between politics and the media, likewise examines the discursive construction of the hero in the series The West Wing.
Bednarek, in her volume on the language of fictional television (2010) and in several later essays (2011a, 2011b, 2012a, 2012b) on the characterization of series characters, reflects on the features of fiction on the small screen, linking the themes of multimodality, genre and audience with the creation of a character's "expressive identity" by means of corpus analysis techniques and 'manual' discourse analysis. Bubel (2006) devotes her doctoral thesis to the discursive construction of relationships between characters, focusing on friendship in Sex and the City. The linguistic characterization of series characters is investigated by Bubel & Spitz (2006), Gregori Signes (2007) and Richardson (2010), who focus on the devices of verbal humor and impoliteness, and by Mandala (2007, 2008, 2011), who concentrates on salient discursive resources in English such as the use of adjectives in -y, code-switching from English to Chinese, and linguistic politeness.

To our knowledge, research to date on the creation of characters in Spanish series belongs to communication studies and sociology rather than to linguistics, such as Galán Fajardo (2007) and González de Garay (2009) on the construction of gender, and López & Cuenca (2005), Galán Fajardo (2006), Igartua, Barrios & Ortega (2012) and Marcos & Igartua (2014) on social stereotypes. Águila Roja has attracted academic interest as an audience phenomenon (Barrientos Bueno, 2011, 2012) and for its transmedia development (Costa Sánchez & Piñeiro Otero, 2012; Guerrero, 2014).
We set out to show, through an analysis of the discourse of one of the main and most popular characters of Águila Roja, the servant Sátur, that the colloquial features and a certain incongruity and tendency towards anachronism in his dialogue are far from being the result of carelessness or clumsiness on the part of the writers. On the contrary, we argue that they result from a characterization designed to create a contemporary character who is close to the public, a figure who escapes the fictional frame of the Spanish Golden Age (Siglo de Oro) to approach the world of the television audience. To this end, we first give a brief overview of the subject and most salient features of the series and define the contextual framework of our analysis, telecinematic discourse. We then present the working corpus and the methodology, before setting out our analysis and drawing conclusions.

2. Águila Roja

Águila Roja is a contemporary television series created by Daniel Écija, Pilar Nadal, Ernesto Pozuelo and Juan Carlos Cueto, produced by Globomedia for Radio Televisión Española and broadcast on La 1 from 19 February 2009 to 27 October 2016. It comprises nine seasons and 116 episodes. From its first episode it was a great ratings success, with a 30% audience share and more than five million viewers. Its quality has been recognized by the television industry: the series has won a total of 37 awards, plus 17 further nominations¹. It has been broadcast, in its original version or adapted, in dozens of countries². The conventional audiovisual production has given rise to a transmedia product, which also won an award for "Best multiplatform content" (Costa Sánchez & Piñeiro Otero, 2012: 107). Fans of the series have access to a microsite on the RTVE website³ that conveys official information about the series, allows on-demand viewing of all its episodes, encourages interaction with its different audiences (polls, forums, social networks) and gives access to the website miaguilaroja.com, which centers on a video game. We therefore consider it a product that has aroused considerable interest not only within but also beyond the Spanish cultural scene.

The series hybridizes several television genres (adventure, historical, comedy, romance) (Barrientos Bueno, 2011: 5) and is a cloak-and-dagger "dramedy" (that is, a plausible blend of dramatic and comic elements) (Guerrero, 2014: 241) aimed at a family audience. The RTVE website presents it as follows:

Televisión Española fully enters the period adventure genre with Águila Roja, a Globomedia production for the whole family set in seventeenth-century Spain. With Águila Roja we enter a series of adventure and intrigue about courage, nobility, friendship and love. The protagonist, played by David Janer, is an anonymous avenging hero of the seventeenth century, known by the name Águila Roja, who helps the weak and is determined to unmask the conspiracy behind the murder of his young wife and to discover his origins⁴.

¹ It won the Premio Ondas (2010) for best national series, the TP de Oro (2009, 2010 and 2011), the Silver Medal of the New York Television Festival, the Best Series award of the Vitoria TV Festival and six awards from the Academia de TV (www.globomedia.es/2005-2009) [Accessed 29/3/2017].
² https://es.wikipedia.org/wiki/Águila_Roja [Accessed 29/3/2017].
³ http://www.rtve.es/television/aguila-roja/ [Accessed 29/3/2017].
⁴ www.rtve.es/television/aguila-roja/serie [Accessed 29/3/2017].

3. Telecinematic dialogue

Following Piazza, Bednarek & Rossi (2011a), we use the adjective telecinematic to refer to features of fictional and narrative language shared by film and television. Although there are intrinsic differences between the two media, it is especially significant that both discourses are governed by the double plane of communication that characterizes all discourse on screen and, to some extent, theatrical discourse as well (Piazza, Bednarek & Rossi, 2011b: 1):

At the outer level there is a relationship between dramatist(s) and audience(s); within that are the displayed relationships between characters (Richardson 2010: 188).

Telecinematic dialogue is carefully designed for unratified hearers⁵, so that they can reconstruct the knowledge shared among the participants in the conversation (Bubel, 2008: 69). To understand telecinematic discourse, the audience "co-constructs" it using a base of shared knowledge activated by cognitive models or frames that arise from world knowledge, which includes not only reality but also the fictional world of television and film characters (Richardson, 2010: 127, 143). Compared with bystanders or eavesdroppers in the real world, film viewers have no conversational rights or responsibilities, nor can they negotiate meaning with the speakers by taking part in the exchange; these two disadvantages make the design of telecinematic dialogue a challenge for the production team (Bubel, 2008: 63-64).
Telecinematic dialogue is never realistic, because it is always designed with an audience in mind; in series especially, it is there to "hook" viewers (Bednarek, 2010: 64; 2012a: 57). This professional dialogue requires each character to have a voice of their own, but its main function is "to move the plot forward, give information, reveal psychology or facts, establish conflicts, provide background from the past"; dialogue is not only held between characters but also appears in voice-over narration, in thoughts expressed in voice-over and in monologues (Fernández Tubau, 2012: 180).

It is important to note that the essence of characters undoubtedly goes beyond their use of language, and not only because they are television characters: character schemas are cognitive constructs, and their interpretation lies at the intersection of the dramatic world and the real world, as they exist in the viewer's mind (Richardson, 2010: 149). Even so, one of the functions of dialogue is to reveal character and to let viewers discover information about the character's mental states and personality (Bednarek, 2010: 101), since, in Culpeper's words (2009: 31), "it is the speech of each character that partly determines the different characters we perceive". The meaning of fictional dialogue arises from collaboration between scriptwriters and audience.

⁵ Goffman's well-known classification of hearers (1981: 124-159) distinguishes between, on the one hand, ratified participants, divided into addressed recipients and unaddressed recipients, and, on the other, unratified participants or bystanders, divided in turn into overhearers and eavesdroppers.
On the one hand, scriptwriters rely on the audience's schematic knowledge and make assumptions about the beings that already populate the cognitive worlds of their ideal target audience; on the other, the public collaborates by supplying appropriate schemas and moves closer to the audience imagined by the writers (Richardson, 2010: 150). In television series, the long duration of the fiction allows the public to develop a special attachment to particular characters and an in-depth knowledge of the events narrated (Richardson, 2010: 57). Characters, partly through the characterization of the way they speak, build audience loyalty, since viewers/listeners establish a special relationship with them (Bednarek, 2012a: 201). Today this phenomenon can be documented at least in part, because in the Web 2.0 era the public enjoys expressing its tastes and emotions through genres such as social networks, blogs and fandoms, in a transmedia setting that readily accommodates the multiplatform developments of Águila Roja mentioned above.

Our analysis aims to show that the audience of Águila Roja, having recognized Sátur as a seventeenth-century servant on the basis of the explicit setting of the series, recategorizes him as 'contemporary' through his discourse, which is close to present-day colloquial speech, and as it realizes that the dialogue goes beyond 'normal' diegetic exchange, since it addresses the eavesdropper directly. In this way, the illusion that the viewer is merely eavesdropping on the characters is broken and it becomes clear that the dialogue is addressed to the public, violating the principle of the suspension of disbelief while "catching" the viewer in the act of eavesdropping (Kozloff, 2000: 57).
To this effect, which creates emotion and a sense of closeness in the audience, is added a humorous effect that arises from the interruption of narrative convention (Ruiz Gurillo, 2012: 26).

4. Corpus and methodology

4.1. Corpus

Our corpus consists of scripts; these are therefore texts written to be spoken as if not written, according to the classification of Gregory & Carroll (1978: 47). We had access to the 116 original scripts of the series in their final versions, which, according to the cover pages of the documents, are the fourth or fifth version (in three cases the sixth). It is normal in television fiction for several drafts to be written and modified on the basis of joint rereadings and revisions. Writing series scripts is a collective task carried out by writers working as a team, and it takes place under specific social conditions of production within the television industry (Richardson, 2010: 63-64).

We analyze the complete corpus of the scripts of Águila Roja, from which we extracted the subcorpora Sátur, containing this character's dialogue, and Otros ('Others'), which brings together the dialogue of all the other characters. The quantitative data are summarized in Table 1:

                    Number of types   Number of tokens
Complete corpus          36,009           1,575,600
Sátur                    12,404             163,686
Otros                    24,123             578,481

Table 1: Data for the corpus and the subcorpora

The complete corpus comprises all the scripts in full, which include not only the dialogue but also the stage directions, the descriptions, the voice indications and the scene headings, that is, all the information the scriptwriters consider necessary for the construction of the series.
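The type and token figures in Table 1 were produced with AntConc. As a rough illustration of what the two counts mean, the sketch below tokenizes a short invented line and counts running words (tokens) and distinct word forms (types). The regular expression, the lowercasing, the function names and the sample sentence are all assumptions made for illustration; they are not AntConc's own token definition and will not reproduce the figures in the table.

```python
import re
from collections import Counter

# Hypothetical token definition: runs of letters (accented ones included), lowercased.
TOKEN_RE = re.compile(r"[^\W\d_]+", re.UNICODE)

def tokenize(text):
    """Split a text into lowercase word tokens."""
    return [match.group(0).lower() for match in TOKEN_RE.finditer(text)]

def types_and_tokens(text):
    """Return (number of distinct word forms, number of running words)."""
    counts = Counter(tokenize(text))
    return len(counts), sum(counts.values())

# Invented sample in the style of the scripts, not corpus data.
sample = "Amo, que lo vi con mis propios ojos. ¡Amo, que lo vi!"
n_types, n_tokens = types_and_tokens(sample)
print(n_types, n_tokens)
```

Different choices about case, apostrophes or numbers change both counts, which is one reason figures from different tools are not directly comparable.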
The data in Table 1 also show the quantitative importance of Sátur's dialogue, which by itself accounts for a third of the dialogue of the thirty characters in the series. This kind of information is also reflected in Table 2 below, which lists the fifty most frequent words in the complete corpus; there we see that "Sátur" is the most frequent lexical word after the grammatical words. The figure, 23,068 occurrences, confirms the enormous number of references to this character in the scripts and hence his relevance and discursive presence in the series, second only to the protagonist's (17,128 occurrences of "Gonzalo", to which must be added the 6,118 of "Águila", "Águila Roja", "AR" and "A.R", for a total of 23,246).

la, de, a, que, el, y, se, no, en, sátur, un, con, gonzalo, lo, marquesa, le, los, una, por, mira, al, comisario, es, su, está, margarita, día, las, pero, me, qué, catalina, del, alonso, para, águila, te, roja, ha, cipri, si, va, yo, muy, ese, ya, nuño, cardenal, más, mi

Table 2: The 50 most frequent words in the complete corpus

To create the Otros and Sátur subcorpora for the purpose of defining Sátur's characterization, we removed the headings and descriptions, leaving only the dialogue together with the stage directions and, in the Otros subcorpus, the character names as well. The stage directions provide contextual elements, while the character names serve to disambiguate each character's lines. Both subcorpora were checked and corrected manually, since the complexity of the script layout does not allow fully automatic selection.
The corpus and subcorpora were analyzed with the tools of AntConc, version 3.4.4w (Anthony, 2014).

4.2. Methodology

Our analysis starts from the broad notion of the expressive identity of the television fiction character formulated by Bednarek (2011a). She describes it as the set of "those character traits that concern emotions, attitudes, values, and ideologies, which all have a strong element of subjectivity", constructed through verbal and non-verbal expressive resources in a given context and co-text (Bednarek, 2011a: 9-10; 13).

We use AntConc's keyword (Keywords) and n-gram (N-grams) searches to compare Sátur's discourse with that of the other characters, and then disambiguate the results with the concordance function (Concordances), checking their different uses and analyzing the specific functions they fulfil (Bednarek, 2012a: 59). According to Culpeper (2001: 199), a character's frequent words can be considered style markers when compared with an appropriate norm, that is, a reference corpus for whose construction, he notes, there are no magic rules. Keywords relate directly to characterization in that they are words whose frequency, or repetition, differs significantly from a norm. By comparing the Sátur subcorpus with the larger Otros subcorpus, used as the reference corpus, we obtain the unusually frequent (or unusually infrequent) words, that is, the keywords (Culpeper, 2009: 33). In Table 3, which lists Sátur's first thirty keywords, we see that, alongside a few grammatical words, the character's discourse shows a high frequency of nouns, personal pronouns, connectives and markers, verbs in the first and third person singular, and dysphemisms related to taboo words.
RANK   FREQ    KEYNESS   KEYWORD
   1   2974   8602.229   amo
   2   1467   1834.199   usted
   3    353    937.742   joder
   4   8689    833.119   que
   5    688    398.991   pues
   6   1895    371.289   pero
   7    182    363.695   mire
   8   1689    349.958   le
   9   3464    334.952   y
  10   2252    274.910   se
  11    737    266.226   va
  12   1547    251.253   yo
  13    100    249.276   cojones
  14    437    233.306   dios
  15    225    229.851   digo
  16    119    227.778   Cipriano
  17     94    219.229   cago
  18    439    215.131   porque
  19   1490    186.500   si
  20    295    181.947   sabe
  21   2374    179.284   me
  22     80    176.593   leches
  23     78    136.472   chiquillo
  24     48    123.792   coño
  25    302    121.998   ahí
  26     56    118.142   parió
  27    480    117.885   ni
  28    181    117.295   eh
  29    612    109.965   esto
  30    521    102.156   tiene

Table 3: First 30 keywords of the Sátur subcorpus

In our analysis we consider that the statistical significance of n-grams, understood as sets of words that may occur together in a text in a given consecutive order, can be relevant for characterizing a character (Bednarek, 2012: 205). Finally, concordance analysis allows us to examine lists of all the occurrences of particular words in the corpus, including their co-text (to the left and to the right).

Through the corpus analysis we set out to verify how the character of Sátur is constructed implicitly (Culpeper, 2001) by means of discursive patterns typical of the oral colloquial register (Briz Gómez, 2011), among which we highlight the use of vocatives, intensification through dysphemisms, and relaxed articulation (evidently in its graphic reproduction), features partly shared by other characters of plebeian rank as opposed to those of the higher estates.
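The keyword comparison described in this section can be sketched in a few lines. The article does not state which keyness statistic AntConc was configured to use, so the example below assumes the two-term log-likelihood (G2) measure that is common for keyword lists in corpus linguistics. The function names and the toy frequency counts are hypothetical and are not taken from the Sátur or Otros subcorpora.

```python
import math

def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Two-term G2: observed counts against those expected if both corpora shared one rate."""
    total = size_target + size_ref
    combined = freq_target + freq_ref
    expected_target = size_target * combined / total
    expected_ref = size_ref * combined / total
    g2 = 0.0
    for observed, expected in ((freq_target, expected_target), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2

def keywords(target_counts, ref_counts, top=5):
    """Rank words that are relatively more frequent in the target corpus by G2."""
    size_t = sum(target_counts.values())
    size_r = sum(ref_counts.values())
    scored = []
    for word, f_t in target_counts.items():
        f_r = ref_counts.get(word, 0)
        if f_t / size_t > f_r / size_r:  # keep positive keywords only
            scored.append((word, f_t, log_likelihood(f_t, size_t, f_r, size_r)))
    return sorted(scored, key=lambda row: row[2], reverse=True)[:top]

# Invented toy counts for illustration only.
satur = {"amo": 30, "usted": 15, "que": 80, "la": 75}
otros = {"amo": 1, "usted": 20, "que": 300, "la": 379}
for word, freq, g2 in keywords(satur, otros):
    print(f"{word:<8}{freq:>5}{g2:>10.3f}")
```

With these toy counts only "amo" and "usted" are kept, because "que" and "la" are relatively more frequent in the reference counts. A full keyword list would normally also report such negative keywords, the "unusually infrequent" words mentioned above.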
Repeated patterns create a relatively stable expressive identity, stability being a well-attested feature of serial fiction, since it is tied to building audience loyalty over an extended period of time (Bednarek, 2011b: 187-197). By subsequently focusing on certain incongruities and anachronisms, we argue that all these features help to create a character who is funny, familiar and close to the television audience, as we corroborate with production documents (the series bible⁶) and a comparison of comments posted on the social network Twitter.

⁶ "The bible is a written document that details and explains, in different sections, all the important aspects of a television series […] It is usually drawn up beforehand, although it undergoes changes along the way. It should serve as a basis for future scriptwriters, actors and directors who join the project once it is under way" (Ríos San Martín & Olivares, 2012: 45).

5. Analysis

We investigate Sátur's speech by applying the methodology proposed in the previous section to explore the implicit textual cues that give rise to characterization, that is, those inferred from data such as lexical and syntactic features or conversational structure (Culpeper, 2001: 172). In the data in Table 3 we find "amo" ('master') as the first keyword, "usted" (formal 'you') as the second, and "yo" ('I') in position 12. A search for bigrams in the Sátur subcorpus (Table 4) reveals as especially significant the recurrent presence of the vocative "amo" in particular word sequences, which also shows its predominant use as a direct form of address:
Bigrams
Total number of cluster types: 665
Total number of cluster tokens: 1070

Freq.   Bigram            Freq.   Bigram
  44    no, amo             8     eso, amo
  30    joder, amo          8     nada, amo
  28    sí, amo             7     esto, amo
  26    el amo              7     oiga, amo
  17    mi amo              7     pasa, amo
  15    pero, amo           7     siento, amo
  14    ver, amo            7     yo, amo
  12    amo, amo            6     al amo
  11    ¡amo! ¡amo!         6     dios, amo
  11    espere, amo         6     verdad, amo

Table 4: First 20 bigrams with amo in the Sátur subcorpus

We interpret the conspicuous frequency of the vocative as one of the characteristic patterns of the oral colloquial register: the voice of the "you" appears alongside the "I", almost always directly, and both may represent a rhetorical strategy of intensification (Briz Gómez, 2011: 84). The vocative, which serves to open the channel of communication (Vigara Tauste, 1997), has an affective impact and gives a fairly precise indication of Sátur's character and of his close relationship with Águila Roja. Since, as Bednarek (2011a: 13) points out, expressive resources may be exclusive to one character or shared with others, we also searched for them in the Otros subcorpus. There, the bigrams containing the nominal form "amo" (manually disambiguated from the verb form) number 34, but its use as a form of address occurs only 3 times, and precisely in notes written by Sátur (e.g. "Amo, me han secuestrado, y no sé quién. Sátur" ['Master, I have been kidnapped, and I don't know by whom. Sátur']), whereas in the Sátur subcorpus the same search returned 665 occurrences (Table 4), of which only 26 are not forms of address, according to a manual check. The use of "amo" as a vocative is therefore distinctive of Sátur and of his way of addressing Águila Roja.
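The bigram search behind Table 4 was run with AntConc's N-grams tool. A minimal sketch of the same idea, assuming a simple letters-only tokenizer that discards punctuation (AntConc's own settings are not described in the article), counts every pair of adjacent words that contains a given target word. The function name and the sample lines are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical token definition: runs of letters, lowercased; punctuation is dropped.
TOKEN_RE = re.compile(r"[^\W\d_]+", re.UNICODE)

def bigrams_with(lines, target):
    """Count adjacent word pairs containing `target`, never crossing line boundaries."""
    counts = Counter()
    for line in lines:
        tokens = [token.lower() for token in TOKEN_RE.findall(line)]
        for left, right in zip(tokens, tokens[1:]):
            if target in (left, right):
                counts[(left, right)] += 1
    return counts

# Invented lines in the style of the scripts, not corpus data.
lines = ["No, amo, no es eso.", "¡Joder, amo!", "Sí, amo.", "No, amo."]
for (left, right), n in bigrams_with(lines, "amo").most_common(3):
    print(n, left, right)
```

Because the sketch drops punctuation, it cannot by itself separate vocative uses ("no, amo") from referential ones ("el amo"), nor the noun from the verb form; that distinction still calls for the manual check described above.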
We read the presence of "usted" as the second keyword of the subcorpus as part of the same strategy of foregrounding the role of the interlocutor in the utterance (Briz Gómez, 2011: 85), since the servant, consistently with his role, addresses his master with the polite form. Using the concordance tool and a manual check, we find that Sátur uses "usted" to address Gonzalo in 1,434 of 1,467 cases, as in the following examples:

This overwhelming preponderance is, on the one hand, a cue that underlines the servant's regard for his master and, on the other, confirms the interpersonal relationship that binds them, based on shared everyday experience. The high frequency of use helps to characterize the character (Bednarek, 2011b: 202) as especially talkative and insistent, a trait we also noted when interpreting Table 1. The scriptwriters consider this peculiarity central, as we see in the bible, where Sátur is called "el criado pesado" ("the annoying servant") and described as follows:

25 years old. Converted Jew, hustler, a pest, funny and prone to putting his foot in it, but loyal unto death to his lord Gonzalo, who rescues him from prison. He serves him as squire, servant, adviser… He is the only one who knows his boss's secret and helps him in his adventures. Reference: Donkey in Shrek.

The presence of dysphemisms related to taboo words among the keywords of the Sátur corpus, as seen in Table 3 ("joder" with 353 occurrences, "cojones" with 100, "cago" with 94, "leches" with 79⁷, "coño" with 48), leads us to the domain of attitude intensification in the oral colloquial register (Briz Gómez, 2011: 98) (examples 1, 2), together with the syntactic intensifier "la madre que me/los/etc. parió" (example 3):

⁷ Through manual disambiguation using concordances, we verified that one occurrence is not a dysphemism.

(1) Amo, que sigo vivo porque me debió ver esmirriao o algo… ¡Que lo vi con mis propios ojos, me cago en la leche! (Águila Roja, episode 51)
['Master, I'm only still alive because he must have thought me scrawny or something… I saw it with my own eyes, damn it!']

(2) ¡Joder, por aquí ya he pasado tres veces! (Lloriqueando) ¡Me cago en las ratas, me cago en el pan y me cago en todo lo cagable! ¡Amooo! (Águila Roja, episode 12)
['Damn, I've already been past here three times! (Whimpering) Damn the rats, damn the bread and damn everything damnable! Masteeer!']

(3) ¡Mire! ¡Mire lo que hemos conseguido! Estamos en una jaula, como animales pa la cena… Porque esto es una jaula, ¿no? (irónico) Aunque igual es sólo mi imaginación de ignorante. (Mordiendo las palabras) ¡La madre que me parió! (Águila Roja, episode 20)
['Look! Look what we've achieved! We're in a cage, like animals for dinner… Because this is a cage, isn't it? (ironic) Though maybe it's just my ignorant imagination. (Through gritted teeth) For crying out loud!']

Although this kind of emphasis proves to be key for the character of Sátur, when we search the Otros subcorpus for concordances of joder (31 occurrences), cag* (24) and parió (10), we find by manual inspection that the same emphatic features are shared by the common-folk characters (Cipriano, Catalina, Sancho) but not by those of the higher estates (the Comisario and nobles such as the Marquesa, the King, etc.). This result suggests that characterization takes into account not only the characters' individual dimension but also their social condition.

The third and last pattern that we take to align Sátur's discourse with colloquial language is the graphic reproduction of careless or popular pronunciation (Briz Gómez, 2011: 95; Díaz Castañón, 1975: 115), such as pa/pa' for para (see also example 3) and the reduction of the suffix -ado to -ao (see also example 1), a feature made visible by the Concordance tool (108 occurrences of pa and 181 of *ao, manually checked); below we present the first 20 results for pa and pa':

The concordance search in the Otros subcorpus returns 89 occurrences of pa/pa' and 134 of *ao, again confined to the discourse of the humblest characters, which confirms Sátur's membership of this social group.
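Wildcard queries such as cag* and *ao can be approximated outside a concordancer with regular expressions. The sketch below assumes that "*" stands for any run of word characters and that matches must be whole words; this is an illustrative reading of the queries, not AntConc's documented wildcard semantics. The function names are hypothetical, and the sample text adapts phrases from examples 1 and 3 and adds the invented words "cansao estoy".

```python
import re

def wildcard_to_regex(pattern):
    """Turn a simple wildcard query ('*' = any run of word characters) into a whole-word regex."""
    parts = [re.escape(part) for part in pattern.split("*")]
    body = r"\w*".join(parts)
    return re.compile(r"(?<!\w)" + body + r"(?!\w)", re.IGNORECASE | re.UNICODE)

def find_hits(text, pattern):
    """Return every whole-word match of the wildcard pattern in the text."""
    return [match.group(0) for match in wildcard_to_regex(pattern).finditer(text)]

# Adapted and partly invented sample text, not corpus data.
text = ("Me debió ver esmirriao o algo… como animales pa la cena. "
        "¡Me cago en la leche, cansao estoy!")
print(find_hits(text, "*ao"))
print(find_hits(text, "cag*"))
print(find_hits(text, "pa"))
```

A query like *ao also matches ordinary words that end in -ao, such as bacalao, which is one reason the counts reported above had to be checked manually.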
72 Luisa Chierichetti Resumiendo, el personaje de Sátur se caracteriza por la gran extensión de discurso producido (por la importancia cuantitativa de sus diálogos y la abundancia de enunciados dirigidos a su amo), por la estrecha relación que le une con Águila Roja (la comprobamos en el uso muy extendido del vocativo “amo” y del pronombre de cortesía “usted”) y por el uso de un registro coloquial, marcado por la pronunciación descuidada y la intensiicación a través de disfemismos relacionados con palabras tabú. El registro coloquial también lo inscribe en un estrato social inferior con respecto a los personajes más poderosos. En palabras de Bednarek (2010: 125) la identidad expresiva, pues, combina la individual y la social, y es a la vez una manera de expresar la identidad única de un personaje y de alinear simultáneamente a este con un grupo que expresa similares identidades expresivas. La desviación que se produce entre la situación comunicativa y la coloquialidad contemporánea del discurso de Sátur crea un efecto humorístico basado en la interrupción de la convención narrativa (Ruiz Gurillo, 2012: 26) ambientada en el siglo xvII. Este procedimiento es especialmente llamativo cuando Sátur incorpora a su discurso referencias totalmente incongruentes con el contexto histórico en el que se hallan –evidentemente inteligibles solo para el público– o bien unas unidades léxicas y fraseológicas coloquiales indiscutiblemente actuales, como en los ejemplos siguientes: “El criado pesado”: La caracterización en la serie Águila Roja 73 (4) (Descubriéndose) Decir criado es simpliicar mucho mi condición. En realidad soy ayuda de cámara, postillón, paje, cocinero. Vamos que ordeno a las personas y las cosas, se podría decir que soy un ordenador personal. (Águila Roja, capítulo 1) (5) Que… como el gorro es talla única, me queda grande. Y los ojos que… que no veo. (Águila Roja, capítulo 24) (6) Como nos ataquen aquí, no lo contamos. 
Dicen que en este bosque los bandoleros primero disparan y luego... ni preguntan ni nada... te rematan y no te ponen mirando para Cuenca de milagro. (Águila Roja, episode 43)

(7) Amo, por una vez, póngase en modo disfrute, no en modo justiciero, haga el favor. (Águila Roja, episode 112)

The humorous effect is explained by the audience's co-construction of the meaning of the series, as discussed in section 3. Following Culpeper (2001), we consider that viewers initially activate a top-down strategy to place Sátur in the seventeenth-century world, on the basis of the series' setting; later, as they get to know the character better through exposure to the fictional dialogue, they activate bottom-up interpretive strategies and arrive at a fuller understanding of the character, drawing on a range of textual cues, among them those highlighted here. What emerges is a clear opposition between two distinct frames, that of the Golden Age and that of the present day, and between two distinct planes: that of the direct addressees of Sátur's utterances and that of the indirect addressees, namely the audience of the telecinematic dialogue. This double conflict creates humour and irony in the discourse of the gracioso and is one reason for the success of the character of Sátur. Indeed, following Bednarek (2012b: 201), this success can be verified through some of the secondary texts produced by television consumers; for example, on the social network Twitter the hashtags #grandesatur and #PerlasSatur have been created, from which we reproduce some posts below [8]:

[8] We reproduce the original spelling of the posts.
Ana Barahona @92arcoiris, 18 Sep 2014: Que gracia @CarmenMartin11: Amo está en modo heroe no en modo maestro asi que no se vaya por las ramas #PerlasSatur #GrandeSatur

Ramos @jesusilloramos, 15 Jun 2016: “La vidas felices solo pasan en los cuentos, aqui es una hostia sobre otra” #GrandeSatur

Lorena @LoreVdF, 29 Sep 2016: Tira pa la casa y aprovecha que no estamos pa hacer limpieza general jajajjaaja #perlassatur #ÁguilaRoja112

“Este traje es... Amo, que si se lo pone usted se van a cagar por la pata abajo” jajajajajaaaaa #PerlasSatur #ÁguilaRoja114 @aguilaroja_tve

In these posts, users quote and share the character's “best” lines, recognizing as generators of humour and of affiliation within the virtual community the colloquial features and the incongruity and anachronism that we examined through corpus analysis and discourse analysis.

6. Conclusions

In this paper we have drawn on a quantitative and qualitative study to analyse how, within the teamwork of scriptwriting, discourse serves to characterize a character expressively. We have used some corpus-linguistic techniques to bring out the most salient features that make up the identity of the character of Sátur in the television series Águila Roja.
The search for keywords and n-grams, taken further by exploring concordances, has allowed us to verify, on the one hand, Sátur's copious discursive presence, the discursive translation of the “criado pesado” (tiresome servant) quality attributed to him in the series bible. On the other hand, we have found that certain recurring patterns (the insistent use of the address term “amo” and the polite pronoun “usted”, the use of dysphemisms as intensifying devices, and careless pronunciation) mark the colloquial register that characterizes Sátur, while aligning him with the group of socially less privileged characters. The use of taboo words and of incongruities (the latter not located through corpus techniques) places the character firmly in a contemporary dimension that contrasts with the historical setting of the series.

The results obtained through the corpus study have been placed within the specific world of telecinematic discourse and examined from the perspective of the dynamics of meaning co-construction carried out by the audience. Drawing on Culpeper (2001), we argue that the discrepancy between the audience's initial expectation of Sátur, based on the features of the series, and the subsequent discovery of a contemporary colloquial discourse and of patent incongruities and anachronisms creates humour and irony (Ruiz Gurillo, 2012), effects that are valued positively by viewers who interact on social networks. Corpus-linguistic techniques, combined with ‘manual’ discourse analysis, have allowed us to outline the main features of Sátur's expressive identity, which account for the appreciation the character enjoys within the success of the series Águila Roja.

Acknowledgements

This work would not have been possible without the help of Andrés Cuenca Lillo, film and television casting director.
I am deeply grateful to Roberto Bernasconi for his indispensable IT assistance in automating the data selection process.

Bibliography

Anthony, Laurence. 2014. AntConc (Version 3.4.4w) [Computer Software]. Tokyo, Japan: Waseda University. http://www.laurenceanthony.net/ [Accessed 21/3/2017].
Baker, Paul. 2005. Public Discourses of Gay Men. London/New York: Routledge.
Barrientos Bueno, Mónica. 2011. Águila Roja, un espectáculo de masas (de espectadores). Comunicación 9(1): 4-18.
Bednarek, Monika. 2010. The language of fictional television: Drama and identity. London/New York: Continuum.
Bednarek, Monika. 2011a. Expressivity and televisual characterization. Language & Literature 20(1): 1-19.
Bednarek, Monika. 2011b. The stability of the televisual character: A corpus stylistic case study. In Piazza, Roberta; Bednarek, Monika & Rossi, Fabio (ed.), Telecinematic discourse: Approaches to the language of film and television series. Amsterdam/Philadelphia: John Benjamins, 185-204.
Bednarek, Monika. 2012a. Constructing “nerdiness”; Characterization in The Big Bang Theory. International Journal of Corpus Linguistics 17(1): 35-63.
Bednarek, Monika. 2012b. Get us the hell out of here. Key words and trigrams in fictional television series. Multilingua 31: 199-229.
Bednarek, Monika. 2015a. “Wicked” women in contemporary pop culture: “bad” language and gender in Weeds, Nurse Jackie, and Saving Grace. Text & Talk 35(4): 431-451.
Bednarek, Monika. 2015b. Corpus-assisted multimodal discourse analysis of television and film narratives. In Baker, Paul & McEnery, Tony (ed.), Corpora and Discourse Studies. Basingstoke/New York: Palgrave Macmillan, 63-87.
Briz Gómez, Antonio. 2011. El español coloquial en la conversación. Esbozo de pragmagramática. Barcelona: Ariel.
Bubel, Claudia. 2006. The linguistic construction of character relations in TV drama: Doing friendship in Sex and the City. Saarbrücken, Germany: Universität des Saarlandes dissertation.
http://scidok.sulb.uni-saarland.de/volltexte/2006/598/ [Accessed 21/3/2017].
Bubel, Claudia. 2008. Film audiences as overhearers. Journal of Pragmatics 40: 55-71.
Bubel, Claudia & Spitz, Alice. 2006. One of the last vestiges of gender bias. The characterization of women through the telling of dirty jokes in Ally McBeal. Humor 19(1): 71-104.
Costa Sánchez, Carmen & Piñeiro Otero, Teresa. 2012. Nuevas narrativas audiovisuales: multiplataforma, crossmedia y transmedia. El caso de Águila Roja. ICONO 14 10(2): 102-125.
Culpeper, Jonathan. 2001. Language and characterisation: People in plays and other texts. London: Longman.
Culpeper, Jonathan. 2009. Keyness: Words, part-of-speech and semantic categories in the character-talk of Shakespeare’s Romeo and Juliet. International Journal of Corpus Linguistics 14(1): 29-59.
Díaz Castañón, Carmen. 1975. Sobre la terminación “-ado” en el español de hoy. Revista española de lingüística 5(1): 111-120.
Fernández Tubau, Valentín. 2012. Diálogos en el guion. Arte y técnica. In Ríos San Martín, Manuel, El guion para series de televisión. Madrid: Instituto RTVE, 169-207.
Galán Fajardo, Elena. 2006. La representación de los inmigrantes en la ficción televisiva en España. Propuesta para un análisis de contenido. El Comisario y Hospital Central. Revista Latina de comunicación social 61. http://www.ull.es/publicaciones/latina/200608galan.htm [Accessed 21/3/2017].
Galán Fajardo, Elena. 2007. Construcción de género y ficción televisiva en España. Comunicar: Revista científica iberoamericana de comunicación y educación 28: 229-236. http://www.revistacomunicar.com/index.php?contenido=detalles&numero=28&articulo=28-2007-28 [Accessed 21/3/2017].
Goffman, Erving. 1981. Forms of Talk. Oxford: Blackwell.
González de Garay Domínguez, Beatriz. 2009. Ficción online frente a ficción televisiva en la nueva sociedad digital.
Diferencias de representación del lesbianismo entre las series españolas para televisión generalista y las series para Internet. Actas ICONO 14. http://eprints.ucm.es/9856/ [Accessed 21/3/2017].
Gregori Signes, Carmen. 2007. What do we laugh at? Gender representations in 3rd Rock from the Sun. In Santaemilia, José; Bou, Patricia; Maruenda, Sergio & Zaragoza, Gora (ed.), International Perspectives on Gender and Language. Valencia: Universitat de Valencia, 726-750.
Gregory, Michael & Carroll, Suzanne. 1978. Language and situation: Language varieties and their social contexts. London: Routledge and Kegan Paul.
Guerrero, Mar. 2014. Webs televisivas y sus usuarios: un lugar para la narrativa transmedia. Los casos de Águila Roja y Juego de Tronos en España. Comunicación y sociedad 21: 239-267.
Igartua, Juan José; Barrios, Isabel M. & Ortega, Félix. 2012. Analysis of immigration image in the prime time television fiction. Comunicación y Sociedad 25(2): 5-28.
López, José Antonio & Cuenca, Francisco Antonio. 2005. Ficción televisiva y representación generacional: modelos de tercera edad en las series nacionales. Comunicar 25. http://www.revistacomunicar.com/index.php?contenido=detalles&numero=25&articulo=25-2005-147 [Accessed 21/3/2017].
Mandala, Susan. 2007. Solidarity and the scoobies: an analysis of the -y suffix in the television series Buffy the Vampire Slayer. Language and Literature 16(1): 53-73.
Mandala, Susan. 2008. Representing the future: Chinese and codeswitching in Firefly. In Wilcox, Rhonda V. & Cochran, Tanya R. (ed.), Investigating Firefly and Serenity: Science fiction on the frontier. London/New York: I. B. Tauris, 31-40.
Mandala, Susan. 2011. Star Trek: Voyager’s seven of nine: a case study of language and character in a televisual text. In Piazza, Roberta; Bednarek, Monika & Rossi, Fabio (ed.), Telecinematic discourse: Approaches to the language of film and television series. Amsterdam/Philadelphia: John Benjamins, 205-223.
Marcos, María & Igartua, Juan José. 2014. Análisis de las interacciones entre personajes inmigrantes/extranjeros y nacionales/autóctonos en la ficción televisiva española. Disertaciones: Anuario electrónico de estudios en Comunicación Social 7(2): 136-159.
Piazza, Roberta; Bednarek, Monika & Rossi, Fabio (ed.). 2011a. Telecinematic discourse: Approaches to the language of film and television series. Amsterdam/Philadelphia: John Benjamins.
Piazza, Roberta; Bednarek, Monika & Rossi, Fabio. 2011b. Introduction: Analysing telecinematic discourse. In Piazza, Roberta; Bednarek, Monika & Rossi, Fabio (ed.), Telecinematic discourse: Approaches to the language of film and television series. Amsterdam/Philadelphia: John Benjamins, 1-17.
Richardson, Kay. 2010. Television dramatic dialogue: A sociolinguistic study. Oxford: Oxford University Press.
Ríos San Martín, Manuel & Olivares, Javier. 2012. De la idea a la emisión. In Ríos San Martín, Manuel, El guion para series de televisión. Madrid: Instituto RTVE, 169-207.
Ruiz Gurillo, Leonor. 2012. La lingüística del humor en español. Madrid: Arco/Libros.
Vigara Tauste, Ana María. 1997. Miau: El lenguaje coloquial (humano) en Galdós. Espéculo 5. https://pendientedemigracion.ucm.es/info/especulo/numero5/miau_vig.htm [Accessed 18/8/2017].
Wodak, Ruth. 2009. The discourse of politics in action. Basingstoke/New York: Palgrave Macmillan.

ojs.uv.es/index.php/qfilologia/index
Qf Lingüístics

Persiguiendo con imparcialidad “el total desprecio a la Constitución”: el léxico valorativo en la Querella del Fiscal de Cataluña contra Carme Forcadell i Lluís

Impartially prosecuting “the total contempt for the Constitution”: Evaluative lexis in the criminal complaint filed by the Public Prosecutor of Catalonia against Carme Forcadell i Lluís

Giovanni Garofalo
Università degli Studi di Bergamo

Received: 29/04/2017.
Accepted: 16/10/2017

Resumen: Se propone un estudio semántico-discursivo de las dos querellas presentadas por la Fiscalía Superior de Cataluña contra D.ª Carme Forcadell i Lluís, presidenta del Parlamento de Cataluña, y contra los miembros de la Mesa del Parlamento catalán por los delitos de desobediencia y prevaricación. Compaginando las metodologías del análisis de sentimiento, de la lingüística del corpus y de la teoría de la valoración, este estudio desmiente la idea de que la querella solicita de forma imparcial la aplicación de normas generales a casos concretos. Lejos de ser fácticos o ideacionales, los enunciados del fiscal están cargados de significados interpersonales y manifiestan implicación subjetiva con una vehemencia aledaña de la invectiva política.

Palabras clave: querella; análisis de sentimiento; polaridad textual; subjetividad; teoría de la valoración.

Abstract: This paper proposes a semantic-discursive study of the two criminal complaints filed by the Public Prosecutor of Catalonia against Mrs. Carme Forcadell i Lluís, President of the Catalan Parliament, and against key members of the Catalan Parliament’s Bureau for the crimes of disobedience and misconduct. Combining sentiment analysis, corpus linguistics and appraisal theory, this study challenges the idea that a criminal complaint seeks the application of general norms to concrete cases in an objective fashion. Far from being factual or ideational, the Prosecutor’s utterances are laden with interpersonal meanings and reveal subjective involvement with a vehemence reminiscent of political invective.

Keywords: criminal complaint; sentiment analysis; text polarity; subjectivity; appraisal theory.

Garofalo, Giovanni. 2017. “Persiguiendo con imparcialidad ‘el total desprecio a la Constitución’: el léxico valorativo en la Querella del Fiscal de Cataluña contra Carme Forcadell i Lluís”. Quaderns de Filologia: Estudis Lingüístics 22: 79-103.
doi: 10.7203/qf.22.11302

1. The discursive setting of the criminal complaints

On 19 October 2016 the Chief Prosecutor of Catalonia filed, before the High Court of Justice of Catalonia (Tribunal Superior de Justicia de Cataluña, TSJC), a criminal complaint against Ms Carme Forcadell i Lluís, President of the Parlament, accusing her of the offences of misconduct (prevaricación) and disobedience to the Constitutional Court (Tribunal Constitucional, TC), for having allowed the Catalan chamber to vote on the pro-independence roadmap. The prosecutor's text specified that, on 20 July 2016, the Bureau of the regional Parliament, after hearing the Board of Spokespersons, had taken note of the conclusions of the Study Commission on the Constituent Process (Comisión de Estudio del Proceso Constituyente, CEPC), which set out the path towards disconnection. Those conclusions had already been declared unconstitutional by an order (auto) of the TC, duly served on the defendants. Although the chamber's legal services had advised against the CEPC's recommendations as constitutionally inadmissible, in the Prosecutor's words, Forcadell

despite being aware that such a decision directly contravened [...] the Order of 19 July 2016, agreed to put to the vote the alteration of the agenda in order to include the vote on the conclusions of the Study Commission on the Constituent Process, with the result that the alteration of the agenda and the inclusion of the new item were approved.

On 23 February 2017, the Prosecutor's Office again filed a complaint against the President of the Parlament and the pro-sovereignty members of the Bureau, Lluís Corominas, Anna Simó and Ramona Barrufet, for disobedience and misconduct, for allowing the approval of two resolutions tabled by Junts pel Sí and the CUP that urged the calling of a unilateral referendum on secession.
Common sense would dictate that both court documents keep to the documented facts, using utterances that are neutral from an interpersonal point of view and therefore “factual” or “objective”. We would thus expect the Public Prosecutor's Office to act as the inanimate mouthpiece of the law (bouche de la loi) and its utterances to convey meanings which, in the terminology of systemic functional grammar, are defined as ideational (Halliday & Hasan 1985: 20), that is, related to the mere representation of the facts of the world as we understand them through experience. This verbal behaviour would be the most consistent with the principles of impartiality and objectivity that should guide the actions of the Public Prosecutor's Office under art. 124.1 of the Spanish Constitution, developed in art. 7 of the Organic Statute of the Public Prosecutor's Office (Estatuto Orgánico del Ministerio Fiscal, EOMF) in the following terms:

By virtue of the principle of impartiality, the Public Prosecutor's Office shall act with full objectivity and independence in defence of the interests entrusted to it.

The prosecutor should therefore be neutral in assessing the facts and evidence that give rise to the case, without prejudicing any of the parties to the proceedings, since the prosecutor's conduct must be disinterested and dispassionate and must keep solely to objective reality.

* This work is part of the project Discurso jurídico y claridad comunicativa. Análisis contrastivo de sentencias españolas y de sentencias en español del Tribunal de Justicia de la Unión Europea (Reference FFI2015-70332-P), funded by the Spanish Ministry of Economy and Competitiveness and by FEDER funds.
On closer inspection, however, the sender's lexical and grammatical choices turn out to be laden with interpersonal meanings, which do not merely represent reality but interact with the opposing party through a wide range of subjective evaluations of negative polarity. In this way, the authorial voice of the text ends up influencing the decision of the TSJC, favouring “the use of a particular subset of values of the appraisal system and discarding others” (White, 2001: 9, own translation). Specifically, this strategy aims, on the one hand, to demolish the ethos of the defendant and the pro-sovereignty ideology and, on the other, to legitimize the attitudes, beliefs and assumptions that underpin the discourse of constitutionalism.

As a social action, that is, human conduct oriented by the actions of others (Weber, 1921), the prosecutor's complaint is set within an institutionalized interactive frame defined by the conventions of the discourse genre. Depending on whether the accuser is a court representative (procurador de los tribunales; Garofalo, 2009) or the public prosecutor, this interactive frame admits variants that concern the organization of the scene of enunciation and the distribution of discursive roles among the participants.

Drawing on the concept of scene of enunciation, understood as ‘the inside of discourse’, in which speech is staged (Maingueneau, 1993), three distinct scenes can be distinguished in the prosecutor's complaint:

1. The encompassing scene refers to the general type of scene in which one must situate oneself in order to understand the sender's rhetorical and pragmatic purposes and the way in which the text addresses its recipient. In the present case, the encompassing scene or event would be that of lodging a complaint before a court or tribunal.

2.
The generic scene is imposed by the specific discourse genre: its components include the roles played by the participants and the sender's main purpose. From the sociolinguistic perspective of Goffman (1981), the sender acts as author of his own utterances and, at the same time, as animator or reformulator of a point of view shared by the whole of the High Prosecutor's Office of Catalonia and by the State Prosecutor General. The latter body, using its hierarchical superiority, required that a complaint be filed against Forcadell and therefore plays the role of principal in the interaction. The addressee, in turn, divides into the direct addressee (the TSJC: known, ratified and addressed), for whom the text is specifically constructed, and the indirect addressee, namely the President of the Parlament and the other pro-independence members of the Govern. The sender's main aim is to become the accusing party, asking the TSJC to admit both complaints for processing.

3. Scenography, understood as the scene constructed in the text, legitimizes the utterances selected by the prosecutor and allows specific evaluative stances to be introduced in order to address and persuade the TSJC. As such, according to Charaudeau and Maingueneau (2002: 222), scenography is not simply the frame of the text but refers to a specific cognitive schema that takes root and is gradually consolidated within the generic scene: “as it emerges, speech implies a particular scene of enunciation which, in fact, is progressively validated through that very enunciation”. In light of the above, the scenography activated by the prosecutor in accusing Forcadell and the other defendants is that of a pamphlet [1] in defence of the 1978 Constitution and the rule of law, which legitimizes and encourages evaluative grammatical and lexical choices.
In this sense, in the words of the aforementioned authors (2002: 222, italics added), the passionate defence of constitutionalism, turned into scenography,

is that from which the discourse proceeds and that which this discourse engenders: it legitimizes an utterance which in turn must legitimize it, and must establish that this scenography from which speech proceeds is precisely the scenography required to tell a story, denounce an injustice, etc.

Within this scenography, the specific evaluations made by the prosecutor in the narrative and argumentative sequences of both texts must fit a conceptual framing or script, imposed by articles 410.1 (offence of disobedience) and 404 (offence of continued misconduct) of the Criminal Code [2] and defined by Taranilla (2012: 101) as a “temporally ordered schema of common events and situations”. Drawing on Schank and Abelson (1987) and on the theory of the legal philosopher Nerhot (1990), Taranilla stresses that the legal norm contains a model that constructs reality, assigning relevance to certain particular situations and standardizing them, so that they are immediately recognizable to interlocutors who hold that same schema or framing in their encyclopaedia. The quantitative and qualitative analysis offered below will show that the specific scripts activated by the aforementioned articles of the Criminal Code are decisive for the selection of the lexico-grammatical elements of axiological value observable in both texts. Indeed, the accusatory function of the complaint genre means that these elements display negative polarity and an affective charge that tends towards intensification.

When the accusation is brought by a court representative (procurador), depending on the seriousness of the offence attributed to the defendant, a certain degree of involvement of the sender's subjectivity in the client's personal circumstances has been observed, together with a show of some empathy between the representative and the client (Garofalo, 2009: 175). This intersubjective dynamic is not surprising, since between the lawyer who drafts the text and the complainant there is a professional relationship between private parties, under which a legal professional receives a monetary benefit for representing the client's interests with due forcefulness.

[1] The term pamphlet (panfleto) is understood here as a “short work of an aggressive nature” (DLE 2014, s.v.) or a “leaflet or sheet of political propaganda or of ideas of any kind” (María Moliner 1992, s.v.), that is, as a discourse of passionate defence of constitutional principles.

[2] The articles mentioned read as follows. Art. 404: “Any authority or public official who, knowing it to be unjust, issues an arbitrary decision in an administrative matter shall be punished with special disqualification from public employment or office and from exercising the right to stand for election for a period of nine to fifteen years”. Art. 410.1: “Authorities or public officials who openly refuse to comply duly with judicial decisions, or with decisions or orders of a superior authority, issued within the scope of their respective competence and in due legal form, shall incur a fine of three to twelve months and special disqualification from public employment or office for a period of six months to two years”.
The case of the subgenre “complaint by the Public Prosecutor's Office” is rather different: here the accusing party is an institutional figure called upon to act with greater balance and impartiality and to remain equidistant from political controversies, as a defender of legality. Pursuing this line, the following sections will show that the defence of constitutionalism from the courts can be carried out with an intensity that resembles the vehemence of a political attack.

2. Methodology of analysis and research objectives

The present study falls within research on evaluation and emotion in specialized discourses (López Ferrero, 2008; Diaz Rojo, 2010; Serpa, 2011, among others) and seeks to integrate different methodological approaches, computational and semantic, in order to gain a deeper understanding of the axiological resources mobilized by the public prosecutor. The analytical path proposed to that end comprises three interrelated stages.

First, a quantitative measurement is made of the evaluation markers present in both complaints, so that the lexico-grammatical elements that manifest the sender's subjectivity “emerge by themselves” from the study corpus (Biber, 2009) and provide empirical data capable of guiding the analysis. The first stage of this research therefore follows an inductive, corpus-driven approach and is based on measuring the polarity and intensity of textual sentiment with the sentiment analysis tool Lingmotif v.1.0 (Moreno-Ortiz, 2016).
Understood as the computerized processing of the expression of the sender's opinions, judgements and emotions and, more generally, of the sender's subjectivity (Liu, 2010), sentiment analysis (or opinion mining) makes it possible to measure the affective charge of texts and, despite its limits, offers the advantage of quantifying the force of the prosecutor's attacks with numerical data and comparing it with the intensity of sentiment in a reference corpus of the criminal complaint genre. As will be seen, the results produced by Lingmotif are based on isolated evaluative words, single-word or multi-word, contained in the application's dictionary, which is configured for the standard language. To mitigate the inevitable errors in automatically detecting the textual polarity of highly context-sensitive items, a specific complementary dictionary (plugin) was compiled manually, containing axiological lexical items in the domain of the offences of disobedience and misconduct.

The second stage takes the sentiment analysis further and sets out the criteria adopted for selecting the evaluative markers incorporated into the plugin, so that Lingmotif's supplementary vocabulary would not be a mere list of words chosen impressionistically. Specifically, a dual semantic-statistical criterion was adopted, whereby the only items added to the plugin were keywords, that is, words of unusual frequency, that belong to the three domains of Appraisal Theory (attitude, engagement and graduation; see Martin, 2000 and 2003; Martin and White, 2005; Martin and Rose, 2007). To this end, with the help of the program AntConc (Anthony, 2014), the keyword list was extracted and the concordances and lexical clusters of the words in that keyword list were examined.
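The notion of a keyword as a word of “unusual frequency” can be made concrete with a keyness statistic. The sketch below uses log-likelihood, a measure commonly used for keyword lists; the text does not state which statistic or settings were used in AntConc, so this choice, the function names and the two tiny token lists are assumptions for illustration only.

```python
import math
from collections import Counter

def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Keyness of a word seen a times in a study corpus of c tokens
    and b times in a reference corpus of d tokens."""
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    ll = 0.0
    if a:
        ll += a * math.log(a / expected_study)
    if b:
        ll += b * math.log(b / expected_ref)
    return 2 * ll

def keywords(study_tokens, reference_tokens, top=5):
    """Rank words that are relatively more frequent in the study corpus."""
    study, ref = Counter(study_tokens), Counter(reference_tokens)
    n_study, n_ref = len(study_tokens), len(reference_tokens)
    scored = []
    for word, freq in study.items():
        # Keep only positive keywords (over-represented in the study corpus).
        if freq / n_study > ref[word] / n_ref:
            scored.append((word, log_likelihood(freq, ref[word], n_study, n_ref)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]

# Hypothetical toy corpora, not taken from the complaints.
study = "la desobediencia de la mesa desobediencia contumaz".split()
reference = "la querella de la parte de la ley".split()
print(keywords(study, reference))
```

With real data the candidate list would then be filtered semantically, as the second criterion requires, keeping only items that carry evaluative meaning.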
Finally, the third stage of the research builds on the results of the first two and offers a brief qualitative analysis of each of the three appraisal domains in the two complaints, combining the corpus-based and the corpus-driven approaches (Tognini-Bonelli, 2001). Qualitative analysis has the advantage of refining the sentiment analysis and probing its results, given that existing software tools cannot adequately interpret the semantic correlations between one piece of information and neighbouring concepts, and the analyst does not always have models for representing specialized knowledge (e.g. ontologies or fine-grained semantic networks of the criminal-law domain) that could guarantee a rigorous automated analysis of evaluative meanings. It was therefore considered necessary to ‘polish’ the results obtained with Lingmotif through a qualitative assessment of sentiment, bearing in mind that the prosecutor's attitude is often expressed implicitly and cannot be regarded as a feature or property of individual words, but of whole utterances or texts (White, 2001).

It must be acknowledged that this methodological approach has both a basic limitation and an implicit advantage. On the one hand, it is unusual for a sentiment analysis application, designed for the automatic analysis of texts in the standard language, to be applied to a judicial genre such as the complaint by the Public Prosecutor's Office and, specifically, to a fairly small corpus (only 23,826 words).
Indeed, incorporating a plugin lexicon necessarily tends to alter the polarity values, and for this reason it would have been advisable to base the analysis on a sufficiently large corpus in order to determine which lexico-grammatical elements should go into a plugin that is useful for the automatic analysis of texts in this criminal-law domain. On the other hand, we understand that the selection of axiological items and the determination of their polarity are variables highly sensitive not only to the size of the study corpus and to the textual genre, but also to the cognitive framing (scenography and offence scripts) of the judicial texts to be analysed. The plugin built for this research covers only keywords consistent with the specific framing imposed by the prosecutor and, for this reason, we believe that the results obtained, despite their limits, may be meaningful for the automatic processing of other texts with the same frame, oriented towards the passionate defence of constitutionalism and the condemnation of disobedient and wrongful conduct, currently imputed to more than one figure of the independence movement by the judges of Catalonia.

In short, the proposed study is a case of triangulation (McNeill, 1990: 22), since it seeks to combine multiple methodological approaches, discursive and computational, in order to offer an empirically grounded description of the markers of subjectivity of the prosecutor, in whose voice judicial discourse hybridizes with political discourse.

3. Measuring the polarity of the complaints with Lingmotif v.
1.0 El grado de implicación subjetiva del iscal en ambas querellas se ha cuantiicado, de entrada, mediante el software Lingmotif (MorenoOrtiz, 2016), una aplicación de análisis de sentimiento capaz de identiicar en los textos palabras y frases con carga afectiva, contenidas en los diccionarios del programa, y de aplicar reglas de contexto (de inversión, intensiicación y atenuación), para dar cabida a posibles modiicadores del sentimiento (Moreno-Ortiz, 2017: 133). Los valores arrojados por Lingmotif se diferencian en dos magnitudes, a saber, el TSI (Text Sentiment Intensity) o índice de intensidad del sentimiento textual –es decir, la relación entre ítems que expresan sentimiento e ítems de valor no emocional– y el TSS (Text Sentiment Score), o valor global del sentimiento textual, expresado como promedio de elementos positivos, negativos y neutros contenidos en cada texto. Ambas magnitudes se miden en una escala graduada, concebida como un continuum de valores de 0 a 100, que van, para el TSS, de lo extremadamente negativo (< 20) a lo extremadamente positivo (˃ 80) y, para el TSI, de lo extremadamente factual (< 55) a lo extremadamente intenso (˃ 85). El programa asigna una valencia positiva (entre 5 y 2), negativa (entre -5 y -2) o neutra a cada ítem léxico (excepto a las palabras gramaticales) y los valores del TSI relejan el porcentaje de las valencias asignadas, teniendo presente la longitud de cada texto. Se ha realizado un análisis conjunto de las dos querellas (23.826 palabras en total), ya que ambas apuntan a enjuiciar delitos idénticos cometidos por las mismas personas, con una cronología y una dinámica ligeramente diferente. La decisión de reunir ambos textos en un único corpus de análisis se debe también a razones funcionales y estructurales. En primer lugar, la propia Fiscalía solicita la acumulación de la segunda querella a las diligencias previas activadas por la primera y seguidas ante el TSJC. 
Second, the two texts were compared using the computer-assisted translation tool SDL Trados Studio. This comparison showed that the first complaint contains 9,587 words and the second 14,239 words, of which 7,649 are taken over and repeated from the first text.

It should be noted that Lingmotif's dictionary (which, for Spanish, includes 207,000 words and 300 context rules; Moreno-Ortiz, 2017: 137) is designed to analyse the sentiment of standard-register texts, although it allows user-built specific lexicons to be used as a complementary lexicon (plugin), which makes it possible to analyse the affective load of specialized genres. Since semantic orientation depends on the specialized domain (Moreno-Ortiz & Fernández-Cruz, 2015: 332), and in the absence of a statistical extractor able to identify term candidates in a corpus of complaints with some reliability, the analysis required the manual construction of a specific plugin capable of detecting the polarity of the Prosecutor's Office's lexis in both documents. The criteria for building the plugin are detailed in the next section; what matters here is that a sentiment analysis of both texts was first run with and without the complementary dictionary, which produced the following results:

Fig. 1. Sentiment analysis of both complaints without plugin

Fig. 2. Sentiment analysis of both complaints with a specific plugin

What is apparent at a glance is that, once the specific lexicon is incorporated, the application returns fairly similar TSS values (with a slight shift from neutral to slightly negative) and very different TSI values, reaching an extremely intense overall sentiment index (92) in the second analysis (fig. 2).
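The effect of a complementary lexicon can be illustrated with a minimal sketch. It is not Lingmotif's implementation, whose internals the article does not describe: the function name, the data structure and the base-dictionary entry are hypothetical, and only the two plugin valences (-2 and -5) are taken from the article. The sketch assumes that plugin entries override the general dictionary and that the longest matching expression wins.

```python
from typing import Dict, List, Optional, Tuple

# An expression is stored as a tuple of lower-cased tokens mapped to a valence.
Lexicon = Dict[Tuple[str, ...], int]


def find_sentiment_items(
    tokens: List[str],
    base: Lexicon,
    plugin: Optional[Lexicon] = None,
) -> List[Tuple[str, int]]:
    """Return (expression, valence) pairs found in the token sequence.

    Assumption: plugin entries override base entries, and at each position
    the longest matching expression is preferred.
    """
    lexicon = dict(base)
    if plugin:
        lexicon.update(plugin)
    max_len = max((len(key) for key in lexicon), default=0)
    found: List[Tuple[str, int]] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate expression first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in lexicon:
                found.append((" ".join(key), lexicon[key]))
                i += n
                break
        else:
            i += 1  # no expression starts at this token
    return found


# Hypothetical general-language entry (not taken from Lingmotif's dictionary).
base_lexicon: Lexicon = {("desprecio",): -3}

# Plugin entries with the valences reported in the article.
plugin_lexicon: Lexicon = {
    ("celebrar", "el", "referéndum"): -2,
    ("con", "total", "desprecio", "de", "la", "constitución"): -5,
}

tokens = "celebrar el referéndum con total desprecio de la constitución".split()
without_plugin = find_sentiment_items(tokens, base_lexicon)
with_plugin = find_sentiment_items(tokens, base_lexicon, plugin_lexicon)
print(without_plugin)
print(with_plugin)
```

In this toy sentence the general dictionary alone flags a single word, whereas the plugin captures two multi-word expressions that are neutral in general language. This is the kind of change that can raise an intensity index considerably while leaving the overall score relatively stable.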
To interpret these data correctly and relate them to the genre under study, we calculated the baseline, that is, the values expressing the statistical norm of the TSS and the TSI in a reference corpus of 63 complaints filed for a wide range of offences (629,893 words in total):

Fig. 3. Baseline of the TSS and the TSI in the reference corpus

The indicators in figure 3, obtained with Lingmotif's dictionary, show that the complaint genre, whose primary function is to bring an accusation against someone, tends to be characterized by a lexis of fairly negative polarity and by an affective load that leans towards intensification. Depending on the seriousness of the offence attributed to the accused, the involvement of the legal representative's subjectivity in the personal circumstances of the client, together with the display of a certain empathy between the animator and the author of the text, are discursive strategies already observed in the Spanish complaint and generally absent from parallel Italian texts (Garofalo, 2009: 175). It is worth noting that, in the 63 complaints of the reference corpus, the legal representative who drafts the text and the complainant are bound by a professional relationship between private parties, in which the lawyer receives a fee for representing the client's interests with due forcefulness. The case at hand is rather different: here the accusing party is the Public Prosecutor's Office, an institutional figure that acts as defender of legality and usually intervenes with greater balance and with the impartiality that befits (or should befit) its functions. Hence the average of positive, negative and neutral lexical elements (TSS) in both texts of the Fiscalía General de Cataluña (fig. 1) tends towards greater neutrality, standing four points above the baseline (fig. 3). This does not mean, however, that the prosecutor argues the case less harshly, as the fairly intense TSI value (fig. 1), barely four points below the baseline, seems to show.

It should be stressed that the results discussed so far do not vary significantly if the sentiment analysis of the reference corpus is repeated with the specific plugin built for the two complaints against Forcadell: the TSS rises by just one point (38), while the TSI falls by one unit (64). This result is easy to interpret, since the plugin is designed to detect the affective load of the lexis related to the offences of disobedience and malfeasance, and the reference corpus contains no cases falling under the same legal classification.

Finally, it is interesting to note that, when the comparison corpus is analysed with Lingmotif, two texts show a TSI of 100: a complaint filed for an offence in which the Partido Popular was the injured party, and another filed for offences arising from a bill promoted by the same party [3]. Slightly below the maximum TSI value in the reference corpus are a complaint for insult (injurias) filed by the lehendakari Ibarretxe against a journalist of El País (TSI = 97) and another filed by a group of Spanish citizens against José María Aznar, former Prime Minister from the Partido Popular, for offences against persons and property protected in the event of armed conflict (TSI = 94). The quantitative data therefore seem to indicate that overall textual sentiment becomes markedly intense when judicial and political discourse hybridize in ideologically polarized texts.
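The comparisons with the baseline reported in this section reduce to simple differences, which the following sketch makes explicit. The TSI values (61, 65, 64, 92) and the reference-corpus TSS with plugin (38) are stated in the article. The two remaining TSS values (37 and 41) are not stated and are inferred here from the wording "one point" and "four points", so they should be read as an assumption. The variable and function names are illustrative only.

```python
# Scores on Lingmotif's 0-100 scale, as reported in or inferred from the text.
baseline = {"TSS": 37, "TSI": 65}               # reference corpus, no plugin (TSS inferred)
study = {"TSS": 41, "TSI": 61}                  # both complaints, no plugin (TSS inferred)
reference_with_plugin = {"TSS": 38, "TSI": 64}  # reference corpus, with plugin
study_tsi_with_plugin = 92                      # both complaints, with plugin


def deviation(scores: dict, base: dict) -> dict:
    """Difference between a set of scores and the baseline, per measure."""
    return {name: scores[name] - base[name] for name in base}


study_vs_baseline = deviation(study, baseline)
reference_shift = deviation(reference_with_plugin, baseline)
plugin_effect_on_study_tsi = study_tsi_with_plugin - study["TSI"]

print("Study corpus vs baseline:", study_vs_baseline)
print("Reference corpus, effect of plugin:", reference_shift)
print("Study corpus, TSI gain with plugin:", plugin_effect_on_study_tsi)
```

The contrast between the two plugin effects (one point in the reference corpus, 31 points in the study corpus) reflects the fact that the plugin targets the lexis of disobedience and malfeasance, which the reference corpus does not contain.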
[3] One is a complaint for insult filed by the association Tertulia Feminista 'Les Comadres' against the Bishop of Alcalá de Henares, following a protest against the reform of the Abortion Act promoted by the Minister of Justice, Alberto Ruiz Gallardón. In the other, the Partido Popular is the complainant against the secretary general of the Federación Socialista Madrileña, for the offences of insult, slander, coercion and threats.

4. Building the plugin and analysing the selected terms

The 328 terms of positive or negative polarity related to the offences of disobedience and malfeasance and included in the plugin were selected manually, after a close reading of both texts. Despite the margin of error inherent in any manual analysis, the plugin was built following a hybrid approach, at once statistical and semantic:

1. Creation of a list of keywords ranked by keyness value.
2. Identification of single-word and multi-word terms of positive and negative polarity, following an onomasiological criterion. Inspection of the concordances of the key terms and of their collocates and clusters.
3. Verification that the terms identified appear in the keyword list and are not already included in Lingmotif's dictionary. Determination of the polarity these items take on in general language and in the criminal-law domain.
4. Classification of the resulting evaluative lexis into four subgroups: a) terms expressing an affective load towards the allegedly unlawful conduct of the accused; b) terms evaluating normative products (e.g., resolutions passed by the Catalan chamber as a result of the criminogenic conduct); c) lexical and grammatical resources through which the prosecutor's voice positions itself intersubjectively (e.g., polyphonic structures, modal verbs, negations and evidential elements); d) scalar evaluations.

All the axiological elements identified in this way proved compatible with the scenography of the discourse and with the scripts of the offences of disobedience and continued malfeasance (§ 1).

The first stage of this methodology thus consists in extracting the 1,965 keywords of the two complaints, obtained with the Keyword list function of AntConc (Anthony, 2014), which compares the two texts of the Fiscalía Superior de Cataluña with the reference corpus. Semantically empty lexical items (function words) and acronyms typical of the judicial domain in question (e.g., LOTC, Ley Orgánica del Tribunal Constitucional; CE, Constitución española) were removed from this list. After this clean-up, the first 50 keywords of the two complaints, ranked by keyness, are the following:

Cataluña, parlamento, constituyente, resolución, constitucional, proceso, Constitución, presidenta, desobediencia, votación, estudio, parlamentaria, mandatos, referéndum, incidente, paralizar, resoluciones, tribunal, mandato, poderes, catalán, desconexión, pleno, parlamentarios, parlamentarias, cumplimiento, propuestas, boletín, parlamentario, julio, eludir, Carme, Forcadell, suponga, ordenamiento, ignorar, voluntad, parlament, providencia, creación, inviolabilidad, democrático, iniciativa, conclusiones, inconstitucional, impugnación, negativa, decisiones, suspensión, soberanía

As expected, the resulting list contains two classes of words (Baker, 2006: 127): proper nouns identifying the spatio-temporal setting, the domain of discourse (specifically, the scenography and criminal scripts described in § 1) and the main protagonist of the events under prosecution (Cataluña, julio, Constitución, Carme, Forcadell), plus a series of keywords related to the central topic (aboutness keywords). Starting from the latter, the actual polarity of the term candidates for the plugin is assessed by analysing the concordances of each keyword, its collocates and the clusters to its right.

These key terms were classified semantically according to the three domains of Appraisal Theory (Martin, 2000, 2003; Martin & Rose, 2007; White, 2001, 2003; Martin & White, 2005), namely attitude, engagement and graduation. As is well known, attitude covers the meanings through which the speaker assigns a value or an intersubjective evaluation to the behaviour of the accused in relation to criminal law and to the products of their actions. The domain of engagement covers the linguistic resources used to position the prosecutor's voice in relation to the various proposals or initiatives of the pro-independence parties mentioned in the text. Finally, graduation represents a scalar semantic space related to the way in which the prosecutor intensifies or attenuates the force of his utterances or grades the focus of his semantic categorizations. An exhaustive analysis of each and every term selected for the plugin would far exceed the space available in this study; the following sections therefore only illustrate some representative cases for each semantic domain.

4.1. Lexical items expressing attitude

Within the domain of attitude, the Prosecutor's Office confines itself to evaluating the behaviour of the accused (subdomain of judgement) and the products of their actions, namely the process of disconnection and the legislative output of the Parlament aimed at holding the binding referendum in Catalonia (subdomain of appreciation). As regards judgement, from the prosecutor's perspective a series of behaviours which, ignoring the repeated warnings of the Constitutional Court (TC), infringe art. 410.1 (disobedience) and art. 404 (continued malfeasance) of the Criminal Code (CP) constitute offences and are charged with negative evaluation. It is not surprising, then, that in the keyword list desobediencia (keyness 186.754) appears immediately after presidenta (keyness 200.417). Indeed, one need only look at the clusters within a span of 15 words to the right of the president's name and surname to find forceful evaluations such as:

La Sra. Presidenta del Parlamento de Cataluña, Carme Forcadell i Lluís, manifestando una voluntad inequívoca e irreversible de llevar adelante su proyecto político por la fuerza de los hechos consumados, con total desprecio de la Constitución de 1978, del ordenamiento emanado de la misma, y de los pronunciamientos de la STC de 2 de diciembre de 2015 y del ATC de 19 de Julio de 2016, procedió a dar impulso al proceso constituyente preordinado en la Resolución 1/XI.
'The President of the Parliament of Catalonia, Carme Forcadell i Lluís, displaying an unequivocal and irreversible will to carry forward her political project by force of fait accompli, with total contempt for the 1978 Constitution, for the legal order deriving from it, and for the rulings of the STC of 2 December 2015 and the ATC of 19 July 2016, proceeded to drive forward the constituent process set out in Resolution 1/XI.'

La conducta de Doña Carme Forcadell que con su voto permitió el debate y votación de las propuestas registradas con los números [...] evidencia aún más su contumaz y obstinada voluntad de incumplir los mandatos constitucionales [...].
'The conduct of Ms Carme Forcadell, who with her vote allowed the debate and vote on the proposals registered under numbers [...], shows even more clearly her contumacious and obstinate will to disobey constitutional mandates [...].'

Note that the criminal behaviours of the accused are expressed through lexical elements which, in standard language, carry no affective load at all (e.g., permitir el debate y la votación, 'allow the debate and the vote'; llevar adelante su proyecto político, 'carry forward her political project') and which, from the pro-sovereignty perspective, are the quintessence of the Catalan chamber's right to decide. These expressions were nevertheless included in the plugin, not only because they are the essential criminogenic actions motivating the complaints, but also because they co-occur with adjectival and adverbial modalizers of unequivocally negative polarity, which convey the prosecutor's severest condemnation (voluntad inequívoca e irreversible, aún más, con total desprecio, por la fuerza de los hechos consumados, su contumaz y obstinada voluntad). It was also observed that, in most concordances, the keyword voluntad ('will') refers to the intentions of the accused and therefore has a markedly negative semantic prosody (e.g., voluntad obstativa, rebelde, de incumplir los mandatos, de desobedecer, de no dar cumplimiento a las decisiones, etc.). Likewise, the adverb constitucionalmente is almost always used to evaluate Forcadell's conduct critically (ser constitucionalmente ilegítimo, ilícito; no resultar constitucionalmente admisible) and contributes to building the underlying conceptual metaphor (Lakoff & Johnson, 2003) "sovereignism breaks legality".

While the words with the highest keyness represent the thematic nodes of both texts and express ideational meanings, the less frequent ones may encapsulate connotative or metaphorical interpersonal meanings.
For example, the noun ardid ('ruse'), the adjectives camuflada [retórica] ('camouflaged [rhetoric]') and voluntarioso ('wilful'), the verb enmascarar ('to mask') and the adverb torticeramente ('deviously'), which rank 1,136th, 684th, 428th, 599th and 1,041st respectively in the keyword list, contain lexicalized metaphors that discredit the ethos of the accused and call her institutional integrity into question:

Son estos actos de la Presidencia, utilizando torticeramente el Reglamento de la Cámara, los que lesionan el bien jurídico.
'It is these acts of the Presidency, making devious use of the Rules of Procedure of the chamber, that harm the protected legal interest.'

[Forcadell sustituye la ejecución de la sentencia del TC] por un voluntarioso intercambio de argumentos con los que enmascarar la conducta desobediente […].
'[Forcadell replaces the enforcement of the TC's judgment] with a wilful exchange of arguments with which to mask the disobedient conduct […].'

El pretendido ardid elucubrado para evitar la intervención de la Mesa y trasladar la eventual responsabilidad a un Pleno irresponsable no es sino una camuflada retórica al servicio del incumplimiento.
'The supposed ruse devised to avoid the intervention of the Bureau and shift any liability onto a Plenary that bears no liability is nothing but camouflaged rhetoric in the service of non-compliance.'

On several occasions it was necessary to reverse the polarity that Lingmotif assigns by default to certain sentiment-bearing lexical elements, e.g., the expression ardid elucubrado. The program in fact assigns a positive polarity to any logical subject of an implicative verb such as evitar ('to avoid') (Sbisà, 2007: 59-62), which usually triggers the presupposition that the consequence avoided is bad and the cause that avoids it is good.

The subdomain of appreciation, by contrast, comprises the set of evaluations of the 'products' of the Parlament (e.g., Resolution 1/XI of the Parliament of Catalonia on the start of the political process in Catalonia) or of the process of disconnection. Note that both resolución and proceso are words with a very high keyness (526.149 and 421.294), whose right-hand clusters contain lexical elements expressing a blatantly censorious attitude:

La resolución [...] no es efecto de una aplicación de la Constitución, sino pura y simplemente, producto de [la] libertad [del Parlament], convertida irrazonablemente en fuente de norma particular.
'The resolution [...] is not the result of applying the Constitution but, purely and simply, a product of [the Parlament's] freedom, unreasonably turned into a source of particular norms.'
Al ratificar y asumir como propias las conclusiones aprobadas por la referida comisión parlamentaria, el Parlamento de Cataluña elude los pronunciamientos de la STC 259/2015 e ignora las advertencias del ATC 141/2016, pues pretende dar continuidad y soporte al denominado “proceso constituyente en Cataluña” dirigido a su desconexión del Estado español.
'By ratifying and adopting as its own the conclusions approved by the said parliamentary committee, the Parliament of Catalonia evades the rulings of STC 259/2015 and ignores the warnings of ATC 141/2016, since it seeks to give continuity and support to the so-called "constituent process in Catalonia" aimed at its disconnection from the Spanish State.'

Concordance analysis reveals that resolución and proceso, neutral words in standard Spanish, take on a negative value, showing a marked semantic preference for co-occurring with lexical elements that refer to improper or unlawful conduct (irrazonablemente, eludir, ignorar, pretender, desconexión).

4.2. Lexical items expressing engagement

The semantics of engagement presupposes a heteroglossic reading of both texts of the Prosecutor's Office, whose argumentative scaffolding is built on the voice of the opponent, with which the speaker argues in a continuous dialectical tension. From a polyphonic and interactionist perspective, by resorting to a modal verb such as deber ('must'), the prosecutor does not only intend to express a logical-deontic meaning, but also displays rejection of and hostility towards the position of the accused. Note, for example, how the modal debe proves useful for attacking the position of Forcadell, who appeals to her inviolability and invokes an elastic interpretation of the law:

La STC nº 51/1985, de 10 de abril, estableció que todo lo que afecta a las prerrogativas parlamentarias debe ser interpretado de forma estricta, no cubriendo la inviolabilidad cualquier actuación, aún con relevancia política, del parlamentario.
'STC no. 51/1985 of 10 April established that everything affecting parliamentary prerogatives must be interpreted strictly, and that inviolability does not cover just any action of a member of parliament, even one of political significance.'

Thus, while the frequent direct quotations from the case law of the Constitutional Court or the Supreme Court work as backing mechanisms for the thesis defended, the point of view of the accused can be evaluated and neutralized more indirectly.
Continuing with the analysis of the interpersonal meaning of modal verbs, poder ('can') appears very frequently (in 19 of its 26 occurrences) in counter-arguments that signal the prosecutor's total unwillingness to negotiate with the opponent's opinion. For this reason, it usually colligates with the negative adverb no or with expressions of negative polarity (en ningún caso, 'in no case'):

No puede alegarse para negar la desobediencia que la querellada o sus asesores llegaran a la conclusión de que lo realizado no incumplía las providencias del Tribunal Constitucional […].
'It cannot be argued, in order to deny the disobedience, that the accused or her advisers came to the conclusion that what was done did not breach the orders of the Constitutional Court […].'

La inviolabilidad no puede concebirse como cobijo de la arbitrariedad, sino que los actos parlamentarios quedan sometidos a la Constitución española.
'Inviolability cannot be conceived as a shelter for arbitrariness; rather, parliamentary acts remain subject to the Spanish Constitution.'

El ordenamiento jurídico, con la Constitución en su cúspide, en ningún caso puede ser considerado como límite de la democracia, sino como su garantía misma.
'The legal order, with the Constitution at its apex, can in no case be regarded as a limit on democracy, but as its very guarantee.'

The semantic relevance of these negations is confirmed not only by the presence among their constituents of keywords such as alegarse, concebirse or considerado (ranked 1,328th, 218th and 1,299th respectively in the keyword list), but also by their proximity to core concepts expressed by words with a higher keyness, e.g., desobediencia, constitución, inviolabilidad, ordenamiento, which are fully consistent with the criminal scripts activated by the prosecutor and appear among the first 50 keywords.

Certain lexicogrammatical elements with evidential value and certain orthographic devices such as scare quotes also evoke and negatively evaluate the opponent's voice (e.g., el denominado “proceso constituyente” en Cataluña; una supuesta legitimidad democrática; el pretendido ardid elucubrado para evitar la intervención de la Mesa).
The evaluative elements through which the prosecutor alludes to the position of the accused are overwhelmingly "proclamations", that is, implicitly polyphonic utterances through which the speaker increases the force of his commitment to the propositional content of his assertions. This is an option of "closed intra-vocalisation" (White, 2001: 25), which evokes the opponent's voice in order to discredit and suppress it, limiting the possibilities of interaction with ideological diversity:

El texto constitucional refleja las manifestaciones del principio democrático, cuyo ejercicio no cabe fuera del mismo [STC 42/2014]. Por ello, el ordenamiento jurídico, con la Constitución en su cúspide, en ningún caso puede ser considerado como límite de la democracia, sino como su garantía misma (FJ 50). […].
'The constitutional text reflects the manifestations of the democratic principle, which cannot be exercised outside it [STC 42/2014]. Therefore, the legal order, with the Constitution at its apex, can in no case be regarded as a limit on democracy, but as its very guarantee (FJ 50). […].'

4.3. Values indicating graduation

Evaluations expressed on a scale of degree either emphasize the interpersonal force that the prosecutor attributes to his utterances or sharpen the focus of his evaluations. The affective load is amplified, for example, by focusing or measuring adverbs in -mente (Pinuer Rodríguez & Oteíza Silva, 2015: 112-116). Focusing adverbs (e.g., estrictamente, precisamente, meramente) make explicit that the entity singled out is ranked among several possible ones and establish a relation "between their focus and the set of possible alternatives with which they are expressly or tacitly contrasted" (NGLE, 2009: 2992). The focus-sharpening adverb with the highest keyness (8.464) is claramente ('clearly'): it occurs 10 times and has a negative prosody in 9 cases (contravenir ~ los mandatos, adoptar acuerdos ~ contrarios, lesionar ~ el bien jurídico, desbordar ~ los estrechos márgenes de la excusa absolutoria). The interpersonal meaning of this element is that of 'narrowing the focus of the evaluation', ascribing the facts narrated to typical conduct defined and punished in the criminal code.

Measuring adverbs, by contrast, are scalar or presuppositional quantifiers that increase the force of the intersubjective positioning, since they are formed from axiological adjectives (absoluto, sobrado). The quantification they express places an element within a set, where it is distinguished by its position on a scale, usually established on the basis of pragmatic factors that depend on the speaker's subjectivity. The prosecutor opts for measuring adverbs that mark the highest degree on the negative scale; those appearing in the keyword list include absolutamente (0.742) and sobradamente (0.953):

La actividad de la comisión creada resulta absolutamente inviable si no se entiende condicionada al cumplimiento de las exigencias de la Constitución.
'The activity of the committee set up is absolutely unviable unless it is understood as conditional on compliance with the requirements of the Constitution.'

La resolución 1/XI […] excede sobradamente de los límites que [el TC] imponía a la Comisión de Estudio.
'Resolution 1/XI […] amply exceeds the limits that [the TC] imposed on the Study Committee.'

As can be seen, the principle of graduation of force operates intrinsically in attitude values, in the sense that each attitudinal meaning represents a specific point on a scale of intensity from low to high. In building the plugin, terms that simply identify criminal conduct (e.g., celebrar el referéndum, ejecutar la acción típica, permitir la alteración del orden del día) were assigned a valence of -2, whereas items expressing the highest degree of condemnation or the greatest risk to the constitutional order (con total desprecio de la Constitución, creación de un Estado catalán) were assigned a valence of -5.

5. Conclusions

This study has offered a quantitative and qualitative analysis of the lexis of sentiment displayed by the Fiscalía General de Cataluña in the two complaints against Carme Forcadell i Lluís, President of the Catalan Parliament.
At first sight, the quantitative data obtained with Lingmotif indicate that in both texts the prosecutor displays an affective load of some intensity, with a TSI value (61) close to the baseline (65) calculated on a reference corpus of 63 complaints (629,893 words in total). A closer semantic analysis of the lexicogrammatical elements reveals that Lingmotif's vocabulary, designed for the analysis of standard language, fails to detect several lexical items of negative polarity that punctuate the prosecutor's argumentation. It was therefore necessary to build a specific complementary vocabulary (plugin), which the application allows the user to incorporate. The single-word and multi-word terms included in the plugin were selected manually, taking into account their consistency with the cognitive framing of the text (scenography and criminal scripts) and their keyness, and their axiological value was systematized according to the three semantic domains of Appraisal Theory (attitude, engagement and graduation). With the specific plugin incorporated, the sentiment intensity value obtained (92) is surprisingly high and close to the TSI of some complaints in the reference corpus in which judicial discourse hybridizes with political discourse, which seems to indicate that both complaints against Forcadell are examples of the politicization of justice. Despite the small size of the study corpus, building the plugin made it possible to quantify the evaluative load of lexical items that have a neutral polarity in general language but acquire an evident negative meaning in the scenography and script activated by the prosecutor (e.g., permitir el debate y la votación; llevar adelante un proyecto político; proceso constituyente en Cataluña).
In this way, it was possible to integrate the specific cognitive dimension of this area of criminal law into the sentiment analysis carried out with Lingmotif. The qualitative analysis, in turn, proved essential for selecting the items included in the plugin according to a semantic-functional criterion. Specifically, the three categories that structure Appraisal Theory provided the guideline for classifying the elements of the complementary lexicon, making it possible, for example, to include elements that define not only the unlawful conduct of the accused, but also the prosecutor's intersubjective positioning (values of negation and counter-argumentation, among others), graduation values and implicitly polyphonic utterances.

The extraction of term candidates could have been attempted with the help of a statistical extractor based on machine-learning algorithms that compare word frequencies in a specific domain and in a general corpus, combining statistical measures with various heuristic techniques (Moreno-Ortiz & Fernández-Cruz, 2015: 333). It is quite likely, however, that this statistical approach would not have yielded fully satisfactory results since, as has been observed, most of the items incorporated into the plugin belong to the semi-technical vocabulary, made up of "lexical units of general language that have acquired one or more new meanings within legal Spanish" (Alcaraz Varó & Hughes, 2002: 59) through a process of resemanticization. Likewise, the affective load associated with the lexicogrammatical resources analysed seems to depend largely on the argumentative context of use. Hence the need to develop increasingly refined legal ontologies that allow procedural documents to be managed automatically and subjected to sentiment analysis.
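The comparison of word frequencies between a study corpus and a reference corpus, on which both keyword lists and statistical term extractors rely, can be illustrated with the log-likelihood statistic, a keyness measure widely used in corpus linguistics. The article does not state which statistic was selected in AntConc, so this choice is an assumption. The function name and the word frequencies below are invented for illustration; only the two corpus sizes are taken from the article.

```python
import math


def log_likelihood_keyness(
    freq_study: int, freq_ref: int, size_study: int, size_ref: int
) -> float:
    """Log-likelihood (G2) of a word's frequency in two corpora.

    The expected frequencies assume that the word is equally likely in
    both corpora; a higher value means a stronger departure from that.
    """
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    g2 = 0.0
    for observed, expected in (
        (freq_study, expected_study),
        (freq_ref, expected_ref),
    ):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


STUDY_SIZE = 23_826       # words in the two complaints (from the article)
REFERENCE_SIZE = 629_893  # words in the 63 reference complaints (from the article)

# Hypothetical frequencies for one candidate term.
keyness = log_likelihood_keyness(40, 15, STUDY_SIZE, REFERENCE_SIZE)
print(f"keyness = {keyness:.3f}")
```

A purely statistical measure of this kind ranks words by how unusual their frequency is, which is why it cannot by itself capture semi-technical items whose form is common in general language but whose evaluative meaning is specific to the legal frame.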
References

Alcaraz Varó, Enrique & Hughes, Brian. 2002. El español jurídico. Barcelona: Ariel Derecho.
Baker, Paul. 2006. Using Corpora in Discourse Analysis. London/New York: Continuum.
Biber, Douglas. 2009. A corpus-driven approach to formulaic language: Multi-word patterns in speech and writing. International Journal of Corpus Linguistics 14: 275-311.
Charaudeau, Patrick & Maingueneau, Dominique. (2002) 2005. Diccionario de análisis del discurso. Buenos Aires: Amorrortu.
Díaz Rojo, José Antonio. 2010. El lenguaje valorativo en noticias periodísticas españolas sobre avances médicos. Tonos 20. https://www.um.es/tonosdigital/znum20/secciones/estudios-5-el_lenguaje_valorativo_en_noticias.htm [Accessed 10/08/2017].
Garofalo, Giovanni. 2009. Géneros discursivos de la justicia penal. Milano: FrancoAngeli.
Goffman, Erving. 1981. Forms of Talk. Philadelphia: University of Pennsylvania Press.
Halliday, M.A.K. & Hasan, Ruqaiya. 1985. Language, Context and Text: Aspects of Language in a Social-Semiotic Perspective. Oxford: Oxford University Press.
Lakoff, George & Johnson, Mark. 2003. Metaphors we live by. Chicago/London: The University of Chicago Press.
Liu, Bing. 2012. Sentiment Analysis and Opinion Mining. San Rafael, CA: Morgan & Claypool Publishers.
López Ferrero, Carmen. 2008. La valoración y la emoción en español en discursos especializados. In Moreno Sandoval, Antonio (ed.) El valor de la diversidad (meta)lingüística: Actas del VIII congreso de Lingüística General. http://www.lllf.uam.es/clg8/actas/index.html [Accessed 10/08/2017].
Maingueneau, Dominique. 1993. Le contexte de l’oeuvre littéraire. Énonciation, écrivain, société. Paris: Dunod.
Martin, James R. & Rose, David. 2007. Working with Discourse. London/New York: Continuum.
Martin, James R. & White, Peter R.R. 2005. The Language of Evaluation: Appraisal in English. London/New York: Palgrave Macmillan.
Martin, James R. 2000. Beyond Exchange: APPRAISAL Systems in English. In Hunston, S. & Thompson, G. (eds), Evaluation in Text. Oxford: Oxford University Press.
Martin, James R. 2003. Introduction. Text 23(2): 171-181.
McNeill, Patrick. 1990. Research Methods. London: Routledge.
Moreno-Ortiz, Antonio & Fernández-Cruz, Javier. 2015. Identifying polarity in financial texts for sentiment analysis: a corpus-based approach. Procedia. Social and Behavioral Sciences 198: 330-338.
Moreno-Ortiz, Antonio. 2017. Lingmotif: A User-focused Sentiment Analysis Tool. Procesamiento del Lenguaje Natural, Revista 58: 133-140.
Nerhot, Patrick. 1990. The law and its reality. In Nerhot, P. (ed.) Law, interpretation and reality. Dordrecht/Boston/London: Kluwer, 50-69.
Pinuer Rodríguez, Claudio & Oteíza Silva, Teresa. 2015. Los adverbios en -mente como factor de valoración en el discurso de la historia. Verba 42: 99-134.
Real Academia Española & Asociación de Academias de la Lengua Española. 2009. Nueva Gramática de la lengua española. Madrid: Espasa. (NGLE).
Sbisà, Marina. 2007. Detto e non detto. Roma/Bari: Laterza.
Schank, Roger & Abelson, Robert. 1987. Guiones, planes, metas y entendimiento: un estudio de las estructuras del conocimiento humano. Barcelona: Paidós [1977].
Serpa, Cecilia. 2011. Significados interpersonales en los géneros legislativos: el texto como macropropuesta. Pragmalingüística 19: 96-114.
Taranilla, Raquel. 2012. La Justicia Narrante. Cizur Menor: Aranzadi.
Tognini-Bonelli, Elena. 2001. Corpus Linguistics at Work. Amsterdam/Philadelphia: John Benjamins.
Weber, Max. (1921) 1977. Economía y sociedad. México: Fondo de Cultura Económica.
White, Peter R.R. 2001. An introductory tour through appraisal theory. The Appraisal Website. http://www.grammatics.com/appraisal/ [Accessed 29/03/2017].
White, Peter R.R. 2003. Beyond modality and hedging: A dialogic view of the language of intersubjective stance. Text 23(2): 259-284.

Anthony, Laurence. 2014. AntConc (Version 3.4.3) [Computer Software]. Tokyo: Waseda University. http://www.laurenceanthony.net/
Moreno-Ortiz, Antonio. 2016. Lingmotif 1.0 [Computer Software]. Málaga: Universidad de Málaga. http://tecnolengua.uma.es/lingmotif

ojs.uv.es/index.php/qfilologia/index
Qf Lingüístics

The malleability behind terms referring to common professional roles: the current meaning of “boss” in British newspapers

La maleabilidad de los términos referidos a los roles profesionales comunes: el significado actual de boss en la prensa británica

Rosa Giménez-Morenoᵃ & Francisco Miguel Ivorra-Pérezᵇ
ᵃ Universitat de València. [email protected]
ᵇ Universitat de València. [email protected]
Received: 19/04/2017. Accepted: 10/10/2017

Resumen: El objetivo de la presente investigación es abordar la variación y ductilidad de conceptos aparentemente claros e inequívocos relacionados con los roles profesionales habituales. El estudio se centra en las estructuras semánticas, y subsecuentes modelos cognitivos, asociados con el término boss, tal y como son expresados y transmitidos en la actualidad a través de los grandes medios de comunicación británicos. El análisis lingüístico, cualitativo y cuantitativo, de un corpus significativo de textos en los que aparece este término muestra claras diferencias en su significado, dependiendo de factores clave como la orientación sociopolítica e ideológica de la plataforma de publicación.

Palabras clave: semántica cognitiva; lingüística de corpus; modelos mentales; roles profesionales; prensa británica.

Abstract: The aim of the present research is to approach the current variation and vulnerability to manipulation of concepts, apparently clear and unambiguous, related to usual professional roles. The study concentrates on semantic frames, and subsequent cognitive models, associated with the term ‘boss’ as they are expressed and transmitted through large-scale British media.
The qualitative and quantitative linguistic analysis of a substantial corpus of texts in which this term appears shows clear differences in its meaning, depending on key factors such as the socio-political and ideological orientation of the medium of publication.

Keywords: cognitive semantics; corpus linguistics; mental models; professional roles; British press.

Giménez-Moreno, Rosa & Ivorra-Pérez, Francisco Miguel. 2017. “The malleability behind terms referring to common professional roles: the current meaning of ‘boss’ in British newspapers”. Quaderns de Filologia: Estudis Lingüístics 22: 105-128. doi: 10.7203/qf.22.11303

1. Introduction

There are a number of relational identities and communicative roles (Sluss & Ashforth, 2007) used daily by a great majority of speakers (e.g. father, neighbour, colleague, employee, etc.). These identities are named through widespread standard terms that are usually defined briefly and simply; for example, the identity of a “boy or a man in relation to either or both of his parents” is generally referred to as “son” and can be simply defined as “a male descendant” (Oxford English Dictionary). However, despite their apparent simplicity, these generic terms reflect complex mental constructs that are very sensitive to cultural, socio-political and inter-generational variation, among other kinds (van Dijk, 2006, 2008). Depending on each of these parameters of variation, the mental models attached to these terms, which help in the inference of their pragmatic meaning, are configured according to different stereotypes, connotations and socio-cognitive standards, and therefore belong to various semantic fields and frames (Lehrer & Kittay, 1992).
From this variation-sensitive perspective on words concerning communicative roles, the present study focuses on the term “boss”, which refers to “a person who is in charge of a worker or organization” (OED), and also on its closest synonyms: CEO, chairman, chief, chief executive, director, employer, head, leader and top. Our aim is to observe the ductility of this concept in today’s mass media, paying particular attention to its compliance with the different socio-political ideologies and perspectives that underlie these media. The research is framed within the field of corpus-based cognitive semantics applied to professional communication. After essential background on semantic fields and frames is presented, the paper summarises the range of definitions, synonyms and characteristic expressions associated with the term “boss” according to the major dictionaries in use. Then, the target terms are analysed in a corpus of texts belonging to the British mass media, and both quantitative and qualitative methods are used in the lexical and semantic description which leads to the results.

2. Semantic frames and lexical fields associated with a company’s structure

Interest in lexical fields and semantic frames has been growing exponentially since the 1970s (Habermas, 1970; Lehrer, 1974), and especially since the 1990s (Lehrer & Kittay, 1992), in parallel with the development of other complementary disciplines such as artificial intelligence, computational linguistics, cognitive psychology and interdisciplinary linguistics. According to Fillmore and Atkins (1992: 76), semantic field theories study, characterise and catalogue “systems of paradigmatic and syntagmatic relationships connecting members of selected sets of lexical items”.
In approaches based on cognitive frames or “knowledge schemata”, a word’s meaning can only be approached “with reference to a structured background of experience, beliefs, or practices, constituting a kind of conceptual prerequisite for understanding the meaning” (p. 77). The meaning of a word can only be fully understood “by first understanding the background frames that motivate the concept that the word encodes”. Cognitive frame analysis, together with semantic parsing, has become very popular and productive, especially within computer science, with the development of language processing applications based on lexical resources such as FrameNet, VerbNet or WordNet (Shi & Mihalcea, 2005). However, the notion of “semantic frame” was originally proposed by Fillmore (1977, 1985) and has also become central in cognitive linguistics, together with key related and interdependent concepts such as “domain” and “cognitive model” (Lakoff, 1987; Van Dijk, 2006, 2008), which have been essential in the development of research areas such as critical discourse analysis (CDA) and in the study of knowledge structures such as metaphor, metonymy and other communicative figures.

The basic assumption of frame analysis is that understanding and interpreting a word’s meaning requires recognising the relevant, contextually related background information within which that word is expressed, and which makes up its semantic frame. According to Fillmore and Baker (2011: 317), frame analysis implies a thorough methodological procedure which makes it possible to identify the essential frame elements and lexical units needed to make accurate and objective interpretative observations. In the present study we adapt this context-based procedure to approach words related to professions, particularly the word “boss”.
Historically, as we will discuss in the following section, the conceptualisation of the term “boss” has been associated with a number of key concepts that make up its lexical field; however, today’s interpretation of this term seems dependent on other mental models and experiential constructs developed by current speakers, with their present interpretative criteria, concerns, habits, values and way of understanding the reality that surrounds the concept of “boss” at the moment. Although our study is limited to this concept, there is evidence that this semantic fluctuation affects many other terms within business English (e.g. Nelson, 2005). Traditional terms referring to a company’s structure (e.g. president, advisor, administrator, officer, supervisor, etc.) are adapting their semantic and pragmatic coverage, not only because of the evolution of socio-economic trends and political ideologies, but also because of technological implementation and the modernisation of corporate cultures to foster innovation, motivation and effectiveness in their companies (Camisón & Villar-López, 2014).

3. Defining the lexical field of the word “boss”

According to the most popular and prestigious monolingual dictionaries of English (e.g. Cambridge Dictionary, Merriam-Webster, MacMillan Dictionary or Collins Dictionary), the general meaning of the noun “boss” refers to “a person who exercises control or authority” or “a person who makes decisions, exercises authority, dominates, etc.” According to the Merriam-Webster Dictionary, the etymology of this term goes back to the Dutch word baas, meaning “master”, used in the Dutch colonies settled in North America during the 17th century. The word became popular as a free-labour alternative to the slave-labour related term “master”. This original dual positive-negative meaning persists to this day. The word is also polysemous.
In fact, its first dated use, in the 13th century, places its origins in the Old French and Middle English word boce, which belonged to the world of architecture and geology, and referred to a circular ornamental decoration (MacMillan Dictionary). According to Dictionary.com, it also refers to a young cow or calf in biology, a round growth or protuberant part on the body in medicine, a form of protection for a book and a projecting part of a ship’s hull. The term is also used as an adjective in slang English, meaning “very good, excellent, incredibly awesome, great” (Internet Slang Dictionary and The Urban Dictionary). In the present study these meanings are discarded, and we concentrate on its meaning inside the world of business and politics. Within this lexical field we find specific definitions, such as the person “who directs or supervises workers” (Merriam-Webster), “the person who is in charge of an organization and who tells others what to do” or “the manager, the person who employs or superintends workers” (Dictionary.com), and also other more elaborate and complete descriptions:

An individual that is usually the immediate supervisor of some number of employees and has certain capacities and responsibilities to make decisions. The term itself is not a formal title, and is sometimes used to refer to any higher level employee in a company, including a supervisor, manager, director, or the CEO (Online Business Dictionary).

Its adaptation to political contexts generates more clear-cut definitions such as “the head of a group (as a political organization)” or the person “who controls votes in a party organization or dictates appointments or legislative measures” (Merriam-Webster), or “a politician who controls the party organization, as in a particular district” (Dictionary.com).
As we see in most of the dictionaries cited, this neutral or positive meaning of the word, as part of professional hierarchies and responsibilities, seems to be the most widely accepted, and is also expressed through other synonymous terms such as: superior, manager, director, president, managing director, CEO, chief, supervisor, head, foreman, overseer, founder, governor, magnate, taskmaster, master, captain, superintendent, commander, employer, trainer, wield power, authority, etc. Nevertheless, the negative, derogatory and sarcastic version of its meaning still persists and is increasingly rooted in today’s society. This negative side of the term can be clearly observed in its phrasal use in “to boss someone around”, which is defined as “to give orders to, especially in an arrogant, authoritative, or domineering manner” (Free Dictionary and Dictionary.com). This adverse meaning is evident in the definitions that appear in slang dictionaries: “someone who runs shit in his/her hood or city” or “bosses are like diapers: full of shit and all over your ass” (Urban Dictionary). It is also observed in the additional set of metaphorical, hyperbolic and derogatory synonyms, pointed out by most of the above dictionaries, that currently substitute or alternate with “boss”, especially in slang and casual registers of English, accentuating three negative dimensions of the term:

• Oppressive and despotic (e.g. padrone, Goliath, fuhrer, dictator, king, etc.)
• Old-fashioned and obsolete (e.g. overlord, skipper, warlord, the powers that be, wear the pants or trousers, etc.)
• Sarcastic and ridiculing (e.g. big cheese, top dog, top cat, head honcho, big shot, etc.)
The first negative dimension of the term has developed from its natural duality, which lay behind its origin in the 17th century, and it is still a focus of concern within the professional community, as we see in the following research articles: “The boss is watching your every click …” (Newitz, 2006), “Privacy in electronic communication: watch your e-mail, your boss is snooping!” (Kierkegaard, 2005), “In nomine patris: discursive strategies and ideology in the Cosa Nostra family discourse” (Indio et al., 2017). The second dimension is also latent, as we see in “Being the boss is not what it used to be!” (Muller-Smith, 1998) or “Why are there bosses?” (Hess, 1983). Finally, specialists were already warning twenty years ago about the third negative trend in its meaning, in publications such as “When the boss is away” (Clarck & Riddick, 1991) or “Think your boss is incompetent? You’re probably right” (Buchanan, 2009). This phenomenon has accelerated considerably in the last ten years, together with the global economic, social and ethical crisis, and the way in which society and the media are approaching the values, attitudes and mental models related to this term (e.g. courage, control, respect, authority, etc.) is affecting its present, and probably its future, meaning and use (Uhl-Bien & Carsten, 2007). On this basis, our aim here is to study the current semantic frames and subsequent cognitive models associated with the term “boss” as they are expressed and transmitted through large-scale British media.

4. Methodology and corpus analysis

A sample of 40 articles from two acclaimed British digital newspapers, The Guardian and The Telegraph, has been compiled and analysed. The corpus contains about 50,000 words and includes 20 articles from each newspaper, with a balanced length of approximately 25,000 words per newspaper.
They are representative of the British mass media and, more importantly, respond to British bipartisan politics reflected in different socio-political trends, which is of great interest for our research purposes (e.g. The Guardian has traditionally been associated with a centre-left political ideology while The Telegraph holds a more centre-right, conservative orientation). As we are interested in examining the different mental models attached to the meaning of the term “boss” in the context of Brexit and the global socio-economic crisis, we have focused on articles from the “business section”, such as those related to finance, retail or the economy, published during 2016.

As far as the method of analysis is concerned, we have found it convenient to adapt Fillmore and Baker’s frame analysis (2011) to our study. As this is a preliminary study of the variation in meaning of the term “boss” in the British mass media, we have focused only on the first three steps that these authors establish in the FrameNet process (pp. 321-22). Firstly, we have characterised the frames making up the sample of analysis; secondly, we have concentrated on describing and naming the elements that belong to those frames; finally, we have selected the main lexical units frequently included in the frames. Both a qualitative and a quantitative analysis have been carried out. To do so, we have used the corpus manager and analysis software Sketch Engine (2003). The main findings are divided into two parts. One is devoted to describing and discussing the results extracted from a qualitative overview based on the concordance search analysis. The other focuses on the quantitative results drawn from applications such as word lists and frequencies, collocations and word sketch.

5. Results and discussion

5.1. A qualitative overview

The findings obtained from the concordance search analysis of the term “boss” indicate important differences between the two data sets. In The Guardian, this term leads to and is included in a major distinctive frame that semantically connotes a person who adopts a pessimistic and uncertain attitude towards the economic situation the UK may face after the Brexit vote, as well as someone who is not free from corruption and enjoys unfair privileges over employees or the rest of the population. By contrast, the findings drawn from The Telegraph data set show that the semantic frame in which the term “boss” is included differs considerably from that of The Guardian. In this case, the semantic connotation of the term points towards a person who has a more encouraging attitude towards the Brexit vote results and can give hope and improve the economic situation of the UK despite the difficulties the country may have. Additionally, less importance is given to cases of corruption committed by those who are at the top. To appreciate these two apparently contrasting semantic frames, a small selection of the most representative terms extracted from the concordance search analysis, both in The Guardian and in The Telegraph, is shown in Table 1:

The Guardian
Noun/adjective+noun: hard Brexit, abuse of position, low-paid insecure jobs, fraud, false accounting, significant economic damages, corrosive impact, charges, problems, prison, serious implications
Adjective: cautious, dumb, fat, lazy, stupid
Verb: accused, criticized, sabotage, spend, raided, seized, sentenced, suffer

The Telegraph
Noun/adjective+noun: Brexit era, expertise, job creation, respected boss, investment, strong economy, new opportunities, reassurance, growth, success, sense of calm
Adjective: confident, reliable, able, dynamic, clear
Verb: carry on, keep calm, committed to maximizing, contribute, commits to help, create, ensure, get on

Table 1. Examples from the concordance search analysis: The Guardian and The Telegraph

To shed some light on the above observational findings, a few extracts from The Guardian are reproduced next. It is important to observe that the connotations linked with the term “boss”, discussed above, are also interrelated with grammatical features such as the use of supporting data (£5.5m, 10%), the inclusion of boosters (significant, pretty) as well as specific collocations and idiomatic expressions (false accounting, abuse of position). These seem to be included with the intention of reinforcing the more negative attributions of the term:

• “Loans boss paid hackers to attack consumer websites, court told… was sentenced to four months in prison…the businessman’s home was raided and his computer equipment seized…There is a low risk of him committing further offences of this nature”.
• “Pay ratio between bosses and employees will be ‘2016’s hot topic’… UK’s top bosses received 10% pay rise in 2015 as average salary hit £5.5m…The bosses of Britain’s largest public companies earned an average of £5.5m last year, and have enjoyed a 10% pay rise while wages in the rest of the economy lag far behind…”.
• “Britain will end up looking stupid over Brexit, says Ryanair boss… The UK is going to suffer some significant economic damage when they get into the entrails of the Brexit decision…The UK will end up looking pretty stupid, he said”.
• “Ex-Tesco bosses to appear in court on fraud and false accounting charges… The former Tesco bosses are all charged with one count of fraud by abuse of position and one count of false accounting”.

In the examples selected from The Telegraph, we can see that the semantic connotations attached to the term “boss” are also interrelated with some particular grammatical elements.
For instance, it is worth noting the presence of hedged expressions in the form of probability adverbs, verbs or contrastive linking words (unlikely, almost, predicted, despite, etc.), which somewhat mitigate the more positive connotations of the term under analysis:

• “The boss of Britain’s biggest business group said it was vital policymakers worked closely with companies to set out a clear plan to ensure the UK remained a top investment destination… He also urged policymakers to maintain a ‘sense of calm’ regarding the millions of EU workers and pensioners who are currently living in the UK …”.
• “Brexit is unlikely to lead to a sudden decline in London’s status as one of the leading centres for the global capital markets, the boss of Barclays has predicted”.
• “British bosses are more upbeat about business prospects this year than almost every other major advanced economy, as companies ‘keep calm and carry on’, despite domestic and global uncertainty”.

Although these are the results of a qualitative overview, more convincing results need to be provided by means of a more quantitative analysis. The next sub-section therefore concentrates on describing and discussing the quantitative findings emerging from our analysis.

5.2. Quantitative analysis

5.2.1. Word lists and frequencies

The word lists and frequencies analysis for the term “boss” or its plural form “bosses” yields interesting findings for both samples. To start with, the general use of this term, in raw frequencies, is higher in The Guardian (153)¹ than in The Telegraph (96), which may suggest that the term is more likely to appear in newspapers with a more left-wing political orientation like The Guardian.
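As a rough check on a raw-frequency comparison of this kind, the counts can be normalised and tested for significance. The sketch below is an added illustration, not part of the authors’ procedure. It assumes that each sub-corpus contains exactly 25,000 words (the paper gives only an approximate figure) and uses the two-corpus log-likelihood statistic that is common in corpus linguistics; the function names are hypothetical.

```python
import math


def per_10k(raw_count, corpus_size):
    """Normalise a raw count to occurrences per 10,000 words."""
    return raw_count / corpus_size * 10_000


def log_likelihood(count_a, size_a, count_b, size_b):
    """Two-corpus log-likelihood (G2) for a single word."""
    total = count_a + count_b
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    g2 = 0.0
    for observed, expected in ((count_a, expected_a), (count_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            g2 += observed * math.log(observed / expected)
    return 2 * g2


# Assumption: the paper reports "approximately 25,000" words per newspaper.
GUARDIAN_SIZE = 25_000
TELEGRAPH_SIZE = 25_000

# Raw frequencies of "boss"/"bosses" reported in the text.
guardian_rate = per_10k(153, GUARDIAN_SIZE)
telegraph_rate = per_10k(96, TELEGRAPH_SIZE)
g2 = log_likelihood(153, GUARDIAN_SIZE, 96, TELEGRAPH_SIZE)

print(f"Guardian: {guardian_rate:.1f} per 10,000 words")
print(f"Telegraph: {telegraph_rate:.1f} per 10,000 words")
print(f"Log-likelihood: {g2:.2f}")
```

Under these assumed sizes, the counts correspond to about 61 and 38 occurrences per 10,000 words, and the statistic lies well above 3.84, the 5% critical value of the chi-square distribution with one degree of freedom. The exact figures would change with the true sub-corpus sizes.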
Since the term “boss” co-occurs with different synonyms, as our preliminary observational analysis has revealed, we have found it important to take them into account in our quantitative analysis. We are referring to words such as CEO, chairman, chief, chief executive, director, employer, executive and top, along with their plural forms. The results show that, except for the term “director”, whose frequency is practically the same in both corpora (G22/T20)², The Guardian makes wider use of synonyms. The most widely used in both data sets is “chief” (G115/T69), either alone or in combination with “executive” (G90/T42) or “executives” (G60/T35). In The Guardian, these synonyms are followed, in terms of frequency of use, by “top” (49), “chairman” (37), “leaders” (27), “director” (22), “CEO” (21), “directors” (16), “employers” (16), “CEOs” (15), “head” (15) and “employer” (6). With respect to The Telegraph, apart from the higher frequencies obtained for “chief”, be it alone or in combination with “executive” or “executives”, “chairman” (21) is the most frequent term, followed by “director” (20), “top” (20), “head” (13), “CEO” (9), “CEOs” (8) and “leaders” (7). It is noteworthy that no instance of the term “employer” or “employers” is found in The Telegraph sample.

¹ From now onwards the numbers included in brackets refer to raw frequencies.
² From now onwards G will be the abbreviation for The Guardian, and T for The Telegraph.

A relevant observation is that the terms “boss” or “bosses” and their synonymous counterparts are frequently substituted by pronouns performing an anaphoric function in the text. The analysis reveals that the frequencies of these pronouns are also higher in The Guardian, perhaps in tune with the characteristic freedom of expression of this newspaper, than in The Telegraph: “He” (G79/T37), “His” (G13/T5), “he” (G203/T137), “him” (G15/T7), “his” (G105/T67), “they” (G131/T61), “their” (G142/T79), “them” (G42/T18), “themselves” (G8/T0). It is also noticeable that there is a tendency to include more plural forms of this term in The Guardian data set.

Remarkable differences have also been encountered in the frequency of words surrounding the term “boss” and its main synonyms, configuring different semantic frames, which reflect a significant degree of variation in the current mental models which conceptualise this word. This contrast can be observed in Table 2, which shows a small sample of the most distinctive word categories obtained in the word lists and frequencies analysis:

Nouns
The Guardian: benefits (12), Brexit (91), change (29), charges (13), consequences (7), costs (29), court (11), crisis (16), data (20), decline (15), economy (68), employee (177), evidence (11), executive (150), figures (15), findings (7), growth (62), London (54), losses (9), measure (77), pressure (15), productivity (22), remuneration (20), risk (13), roles (7), salaries (9), source (8), staff (49), strategy (16), success (16), survey (32), UK (234), uncertainty (23), vote (59), wage (25), warning (11), wellness (12), workers (32), etc.
The Telegraph: benefits (0), Brexit (49), change (11), charges (5), consequences (0), costs (9), court (0), crisis (0), data (7), decline (6), economy (37), employee (0), evidence (0), executive (77), figures (0), findings (0), growth (46), London (39), losses (0), measure (0), pressure (8), productivity (8), remuneration (0), risk (0), roles (0), salaries (0), source (0), staff (23), strategy (7), success (9), survey (13), UK (137), uncertainty (19), vote (34), wage (0), warning (0), wellness (0), workers (0), etc.

Adjectives
The Guardian: cautious (15), chief (115), clear (21), committed (5), false (6), fat (5), financial (66), global (54), hard (13), living (24), low (16), minimum (8), national (24), new (85), possible (12), significant (22), worry (6), worth (9), wrong (5), etc.
The Telegraph: cautious (0), chief (69), clear (13), committed (7), false (0), fat (0), financial (44), global (35), hard (5), living (6), low (8), minimum (0), national (6), new (65), possible (69), significant (10), worry (0), worth (0), wrong (0), etc.

Verbs
The Guardian: accused (5), believe (8), change (29), charged (6), committed (5), earn (8), employs (10), encourage (11), face (13), found (17), help (17), hit (13), improve (13), pay (117), reduce (9), reform (11), shows (12), solve (6), suffer (6), tackle (9), think (45), trying (9), voted (10), want (26), warned (29), etc.
The Telegraph: accused (0), believe (0), change (11), charged (0), committed (7), earn (0), employs (0), encourage (0), face (0), found (7), help (10), hit (0), improve (9), pay (10), reduce (0), reform (0), shows (0), solve (0), suffer (0), tackle (0), think (27), trying (0), voted (0), want (19), warned (13), etc.

Adverbs
The Guardian: actually (8), already (27), even (26), increasingly (6), less (20), likely (17), many (40), more (173), most (56), not (195), probably (7), really (18), etc.
The Telegraph: actually (0), already (17), even (13), increasingly (0), less (10), likely (9), many (21), more (100), most (36), not (96), probably (0), really (11), etc.

Prepositions
The Guardian: against (22), by (247), forward (5), over (78), under (23), up (106), etc.
The Telegraph: against (7), by (118), forward (0), over (44), under (11), up (74), etc.

Pronouns
The Guardian: He (79), he (203), his (105), him (15), I (102), me (13), they (21), their (142), them (42), themselves (8), this (41), when (49), where (33), who (103), you (14), your (11), etc.
The Telegraph: He (37), he (137), his (67), him (7), I (86), me (7), they (10), their (79), them (18), themselves (0), this (23), when (28), where (20), who (57), you (7), your (6), etc.

Auxiliary and modal verbs
The Guardian: do (84), does (40), can (61), could (89), had (93), has (230), have (208), might (17), should (43), will (221), would (170), etc.
The Telegraph: do (45), does (28), can (43), could (32), had (42), has (156), have (107), might (0), should (18), will (147), would (80), etc.

Connectors
The Guardian: But (64), Despite (11), However (27), If (31), also (95), and (962), as (328), because (29), but (137), despite (24), if (66), like (36), must (17), or (81), than (120), though (12), while (35), etc.
The Telegraph: But (34), Despite (5), However (16), If (13), also (78), and (578), as (212), because (17), but (75), despite (15), if (30), like (17), must (8), or (40), than (63), though (6), while (22), etc.

Table 2. Comparison of word categories from the word lists and frequencies analysis of The Guardian with The Telegraph. Raw frequencies

The above findings corroborate the results obtained from the concordance search analysis discussed in the previous subsection. As for The Guardian, the results obtained reinforce the mental model of “boss” as a person who seems to hold a distrustful attitude towards the United Kingdom’s withdrawal from the European Union and its future consequences for the UK economy (e.g. Brexit, cautious, consequences, face, hit, vote, voted, worry, wrong, etc.) and feels insecure and uncertain about the economic situation of the country if it finally leaves the EU (e.g.
crisis, decline, economy, employs, hard, hit, losses, pressure, productivity, risk, suffer, uncertainty, etc.). In the same vein, there is a higher frequency of words that present “boss” and its synonymous expressions as someone involved in cases of corruption and enjoying more privileges than the staff or the rest of the population (e.g. accused, benefits, costs, court, earn, false, fat, hit, pay, remuneration, salaries, wages, etc.). These negative connotations of the term and its synonyms are also reflected in a higher frequency of prepositions connoting strong opposition, as seen in “against” (G22/T7). However, the mental model is not entirely negative in this part of our corpus. Words relating “boss” to someone who can provide solutions despite the uncertainty and insecurity regarding the Brexit vote are also frequently used (e.g. change, encourage, forward, improve, measure, reform, solve, strategy, success, tackle, wellness, etc.).

We can also observe in the sample analysed that, in order to justify their own opinions on the economic and financial situation of the UK, the term “boss” or “bosses” and their synonymous expressions are surrounded by words that semantically connote a person who constantly resorts to evidence demonstrating the veracity of his/her views (e.g. data, evidence, figures, findings, source, survey, shows, etc.), together with passive sentences including the agent who performs the action preceded by the preposition “by”, whose frequency is also much higher in The Guardian (G247/T118). These viewpoints are frequently communicated through a greater use of emphasising adverbs, first person singular pronouns and additive linking words to reinforce bosses’ opinions on the problems associated with the UK (e.g. actually, already, also, and, even, I, increasingly, many, more, most, really, etc.).
Nonetheless, despite the veracity claimed for these opinions and reflections, they are frequently mitigated by means of cognitive verbs as well as modal verbs and adverbs of probability acting as hedges (e.g. believe, can, could, likely, might, should, think, would, etc.). This understatement is also conveyed through the high frequency of contrastive linking words (e.g. but, despite, however, if, or, though, while, etc.).

When comparing the results drawn from The Guardian with the ones obtained in The Telegraph, a partially different picture seems to emerge. The frequency rates, and the semantic frame related to “boss” in this sub-corpus, seem to revolve around terms such as chief, new, Brexit, growth, financial, economy, executive, global, London, etc. The mental model attached to those words differs considerably between the two samples: unlike the dark and discouraging attitude that their meaning connotes in The Guardian, in The Telegraph their semantic connotations revolve around someone closely associated with power centres (both locally and globally), who has a more optimistic attitude towards the Brexit vote and reassures the UK population that Brexit is not going to change the economic situation of the country in the future. By the same token, there are even terms in The Guardian which are completely absent in The Telegraph. This may portray an image of the “boss” and its synonymous related terms as someone who, despite being associated with cases of corruption and unfair privilege, has the capacity to act as an adviser and expert who tries to reassure UK citizens with solutions and promises a good forecast for the country. Firstly, we observe that, despite the awareness of the Brexit vote and the consequences this may have for the UK’s economy, the attitude held by bosses is not as pessimistic and doubtful as the one revealed in The Guardian.
This can be demonstrated, on the one hand, by the lower frequencies obtained for terms such as Brexit, costs, crisis, decline, economy, hard, national, pressure, productivity, staff, uncertainty, vote, workers, etc. and, on the other, by the complete absence of terms such as cautious, consequences, employee, face, hit, losses, risk, suffer, worry, wrong, etc. Likewise, the frequency of words presenting “boss” and its synonymous related terms as someone involved in bribery and fraud, and in an advantaged position with respect to employees or the rest of the citizens, is also lower (e.g. costs, economy, hard, low, over, pay, staff, etc.). In addition, there are null frequencies for significant terms such as accused, benefits, court, earn, employee, face, false, hit, remuneration, roles, salaries, suffer, wage, workers, etc. Apart from that, prepositions connoting negative meanings like “against” (G22/T7) appear with much lower frequencies than in The Guardian data set.

Regarding the concept of “boss” as a person who has the ability to provide solutions despite the drawbacks Britain faces as a result of the Brexit vote, the findings show that the terms connoting this meaning also appear with lower frequencies than in The Guardian sample (e.g. change, clear, help, improve, new, strategy, success, warned, etc.). Furthermore, no instances have been found for terms such as encourage, forward, measure, reform, solve, tackle, trying, etc. Whereas in The Guardian we found terms that evoke veracity so as to justify bosses’ opinions regarding the economic and financial situation of the UK, the frequencies of these terms in The Telegraph are much lower (e.g. data, like, survey, when, where, etc.) and no instances have been found for terms such as evidence, figures, findings, source and shows.
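The contrasts reported in this subsection are stated as raw counts, e.g. “against” (G22/T7) and “by” (G247/T118). A minimal sketch of how such a contrast could be checked with a log-likelihood keyness score is given here. Log-likelihood is a common corpus-linguistic measure, not a method this study reports using, and the sub-corpus sizes are hypothetical placeholders, since the token counts of the two samples are not given in this section.

```python
import math


def log_likelihood(freq_a: int, freq_b: int, size_a: int, size_b: int) -> float:
    """Log-likelihood (G2) score for one word observed in two sub-corpora."""
    total = freq_a + freq_b
    # Expected counts if the word were equally likely in both sub-corpora.
    expected_a = size_a * total / (size_a + size_b)
    expected_b = size_b * total / (size_a + size_b)
    score = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


# Raw frequencies quoted in the text as (Guardian, Telegraph).
RAW_FREQUENCIES = {"against": (22, 7), "by": (247, 118)}

# HYPOTHETICAL sub-corpus sizes in tokens; replace with the real counts.
GUARDIAN_TOKENS = 50_000
TELEGRAPH_TOKENS = 50_000

for word, (guardian, telegraph) in RAW_FREQUENCIES.items():
    score = log_likelihood(guardian, telegraph, GUARDIAN_TOKENS, TELEGRAPH_TOKENS)
    print(f"{word}: G{guardian}/T{telegraph}, log-likelihood = {score:.2f}")
```

A score above 3.84 corresponds to p < 0.05 with one degree of freedom. The real token counts of both samples would be needed before drawing any conclusion from such scores.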
Concerning passive sentences in which the preposition “by” introduces the agent who performs an action, the frequencies obtained are also much lower than in The Guardian (G247/T118). The same applies to the use of emphasising adverbs and additive connectors to reinforce bosses’ views on the economic problems of the UK (e.g. already, also, and, many, more, most, really, etc.), and no instances are found for “actually” or “increasingly”. In keeping with this line of thought, the frequency of cognitive verbs, modal verbs and adverbs of probability functioning as hedges to tone down bosses’ statements is lower too (e.g. can, could, likely, should, think, would, etc.), and others like “believe” or “might” are null. Finally, linking words of contrast used to understate bosses’ viewpoints are not as frequent as in The Guardian (e.g. but, despite, however, if, or, though, while, etc.).

5.2.2. Collocation analysis

The collocation analysis for the term “boss” also unveils interesting findings as far as both data sets are concerned. The words that co-occur with the term “boss” in the two samples show divergent frequencies, as seen in Table 3. The data shown in this table corroborate the trends and contrasts already indicated in the previous findings. The words co-occurring with the term “boss” in The Guardian data set connote someone who has many doubts and hesitations regarding the economic problems UK citizens may face after the Brexit vote, as observed in the higher frequencies obtained when these are compared with the ones found in The Telegraph (e.g. bank, Britain, company, customer, cut, crisis, staff, warn, etc.). Indeed, co-occurring terms such as company, customer, crisis, cut, not, price and staff have not been found for the word “boss” in The Telegraph sample.
Since most of these co-occurrences refer to the future consequences of the Brexit vote, it is not surprising that the preposition “after” is frequent in The Guardian and has no instance in The Telegraph data set (G81/T0).

The Guardian: after (81), bank (51), benefit (18), big (52), Britain (80), British (49), chief (118), company (219), customer (81), cut (25), could (89), crisis (17), deal (66), employee (3), executive (4), find (41), he (282), high (41), insist (0), London (54), more (176), most (58), new (86), not (273), over (81), pay (151), price (66), receive (13), rise (68), say (419), staff (50), tell (30), than (120), their (142), them (42), they (152), top (55), UK (233), want (37), warn (44), we (237), will (0), year (214), etc.

The Telegraph: after (0), bank (26), benefit (0), big (0), Britain (37), British (23), chief (0), company (0), customer (0), cut (0), could (0), crisis (0), deal (0), employee (0), executive (0), find (0), he (174), high (0), insist (9), London (0), more (101), most (73), new (66), not (0), over (44), pay (0), price (0), receive (0), rise (0), say (337), staff (0), tell (0), than (0), their (0), them (0), they (0), top (0), UK (137), want (0), warn (21), we (0), will (154), year (120), etc.

Table 3. Comparison of words from the collocation analysis of The Guardian with The Telegraph. Raw frequencies

As for the words that co-occur with the term “boss” and connote a corrupt person who enjoys more benefits than the rest of the people, the frequency of words that collocate with this meaning is also higher in The Guardian (e.g. big, company, high, more, over, pay, receive, rise, than, their, them, they, top). Moreover, terms such as big, chief, company, employee, executive, high, pay, rise, than, their, them, they, top are not found in The Telegraph.
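The presence and absence pattern described for Table 3 can be made explicit with a short script. The sketch below, in Python, copies a subset of the raw collocate frequencies from Table 3 and sorts them into collocates found only in The Guardian, only in The Telegraph, or in both. The function and variable names are illustrative and not part of the study.

```python
# Raw collocate frequencies copied from Table 3 as (Guardian, Telegraph).
# Only a subset of the table is included to keep the example short.
TABLE_3_SUBSET = {
    "after": (81, 0),
    "bank": (51, 26),
    "Britain": (80, 37),
    "could": (89, 0),
    "insist": (0, 9),
    "say": (419, 337),
    "tell": (30, 0),
    "UK": (233, 137),
    "warn": (44, 21),
    "will": (0, 154),
}


def split_by_presence(table):
    """Return (guardian_only, telegraph_only, shared) lists of collocates."""
    guardian_only, telegraph_only, shared = [], [], []
    for word, (guardian, telegraph) in table.items():
        if guardian > 0 and telegraph == 0:
            guardian_only.append(word)
        elif telegraph > 0 and guardian == 0:
            telegraph_only.append(word)
        elif guardian > 0 and telegraph > 0:
            shared.append(word)
    return guardian_only, telegraph_only, shared


GUARDIAN_ONLY, TELEGRAPH_ONLY, SHARED = split_by_presence(TABLE_3_SUBSET)
print("Only in The Guardian:", GUARDIAN_ONLY)
print("Only in The Telegraph:", TELEGRAPH_ONLY)
print("In both samples:", SHARED)
```

Separating null frequencies from shared collocates in this way keeps the two kinds of evidence used in the discussion (absence versus lower frequency) apart.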
The veracity and truthfulness of bosses’ opinions are shown in the frequent use of the verb “find” in The Guardian, whereas this verb does not appear as a collocate of the word “boss” in The Telegraph. Additionally, the use of the modal verb “could” in The Guardian and its absence as a collocate in The Telegraph may imply, as observed in previous analyses, that the views held by bosses tend to be understated in the former. Aside from that, the higher use of co-occurring terms such as deal, new, want or we can convey the idea that the term “boss” is related to someone who, despite his or her gloomy attitude towards the economic and financial difficulties the UK may face, has the ability to act as an adviser and expert, promoting initiatives and solutions in collaboration with the rest of the citizens to sort out the current shortcomings. One final point to be made is that in The Guardian the presence of verbs like “say” and “tell” co-occurring with the word “boss” is higher than in The Telegraph. The verb “tell”, which has a null presence in The Telegraph, deserves particular attention. This verb is frequently used in neutral or informal registers, which could mean that the register used in The Guardian fluctuates between neutral and informal, whereas that of The Telegraph is more formal.

5.2.3. Word sketch analysis

The Word Sketch analysis has allowed us to identify the different types of modifiers that go with the word “boss”, the nouns and verbs that are modified by “boss”, the verbs that take “boss” as either subject or object, and the adjective predicates accompanying the term “boss”. The findings stemming from this analysis have also shown important differences between the two data sets.
These are shown in Table 4 below:

Modifiers of “boss”
The Guardian: UK (11.08), top (10.33), bank (9.97), British (9.74), Deutsche (9.59), factory (9.05), industry (8.96), new (8.33), respected (8.09), quietly-spoken (8.09), go-ahead (8.09), stripping (8.09)
The Telegraph: UK (10.62), new (10.18), Deutsche (10.16), retail (9.94), finance (9.94), female (9.67), factory (9.66), bank (9.64), British (9.48), business (9.19), respected (8.69), supermarket (8.69), quietly-spoken (8.69), economy (8.69)

Nouns/verbs modified by “boss”
The Guardian: skyscraper (10.6), fight (10.6), class (10.54), matter (10.54)
The Telegraph: skyscraper (11.0), fight (11.0), Britain (10.68), intelligence (10.24)

Verbs with “boss” as object
The Guardian: falter (10.82), lead (10.75), appoint (10.75), choose (10.68), charge (10.74), allow (9.91), show (98.7), tell (9.67), do (9.32), be (8.19)
The Telegraph: terrify (11.19), falter (11.19), appoint (11.09), poach (11.00), choose (10.91), say (10.64), show (10.54), find (10.47), lead (10.24), be (7.31)

Verbs with “boss” as subject
The Guardian: warn (10.54), remain (10.1), have (9.6), go (9.38), say (9.16), waive (8.89), care (8.89), spy (8.89), land (8.89), acknowledge (8.89), press (8.89), shy (8.89), shrug (8.87), respond (8.87), know (8.87), pledge (8.87), shock (8.87), cite (8.85), appear (8.85), accuse (8.85), receive (8.85), insist (8.82)
The Telegraph: remain (10.88), insist (10.47), say (10.38), warn (10.3), have (9.75), shy (9.61), pledge (9.61), respond (9.61), cite (9.61), press (9.61), promote (9.61), slash (9.53), shrug (9.53), argue (9.53), enjoy (9.5), plan (9.48), choose (9.48), predict (9.48), want (9.44), believe (9.41), find (9.38), show (9.38), be (9.09)

Adjective predicates of “boss”
The Guardian: fat (12.83), upbeat (12.41), cautious (11.83), optimistic (11.54), such (9.83)
The Telegraph: upbeat (13.41), optimistic (12.41)

Table 4. Examples from the word sketch analysis of “boss” in The Telegraph and The Guardian.
Raw frequencies ordered from the highest to the lowest

The findings reveal that, in The Guardian, the terms with the highest frequencies modifying the word “boss” connote someone who is more aware of the current problems the UK faces as regards important social issues like the Brexit vote, inequality between social classes (particularly regarding the salaries earned by bosses and those earned by staff), cases of fraud and corruption on the part of bosses, etc. (e.g. accuse, bank, British, cautious, charge, class, falter, fat, lead, matter, receive, shock, skyscraper, spy, top, UK), as well as someone who acts as an adviser encouraging citizens to improve the current social situation, as observed in allow, care, fight, go-ahead, insist, new, remain, show, warn.

By contrast, in The Telegraph sample, this same term is modified by words that tend to connote a person whose attitude towards Brexit, despite his or her concern about the economic consequences that the population of the UK may suffer, is more optimistic and confident. In particular, a boss is perceived as someone who acts as an adviser and calms citizens down by assuring them that the UK has always been a rich and prosperous nation that cannot be affected, under any circumstances, by the Brexit vote (e.g. be, believe, business, fight, finance, find, insist, new, plan, UK, warn). In addition, the negative connotations associated with the concept of “boss” as regards cases of corruption or standing in a more powerful position than the rest of the population are given scant consideration, as seen in the complete absence of terms such as accuse, cautious, charge, class, fat, matter, shock, top. The term “boss” is instead conceptualised as a person who deserves respect (e.g.
respected, quietly-spoken), as he or she is chosen and promoted for his or her expertise, intelligence and skills (e.g. appoint, be, choose, find, intelligence, promote). Therefore, he or she can be the perfect guide to assure workers that the UK is a rich country and that nothing can alter that, even if the UK leaves the EU (e.g. believe, Britain, British, business, economy, enjoy, optimistic, plan, predict, remain, show, upbeat, want). Likewise, “bosses” are regarded as persons who worry about the negative views that society holds of them regarding cases of bribery and abuse of power, as observed in the frequent use of the verb terrify. In addition to all these insights, the corpus and the analysis could allow for many more findings and interpretations, which would go beyond the aim of the present research.

6. Conclusions

The present study demonstrates that the concept of “boss” most commonly transmitted in current British society, and reinforced through its press media, implies certain intrinsic defining elements that foster a solid generic interpretative basis of this professional role as a person who has the ability to act as an adviser and an expert in his or her field, promoting initiatives and solutions despite the surrounding setbacks and uncertainty. This generic interpretative model is also reinforced by widely accepted synonyms such as executive, director, head and leader. These persistent semantic components support socially shared and accepted mental models which seem to be fairly objective, operative and useful in many professional contexts.
Notwithstanding this, our analysis also shows that today this concept entails another set of defining and interpretive parameters, of a more variable and subjective nature, which are highly dependent on the context and make it very vulnerable to the socio-political ideology or orientation of the speakers who use it and of the media through which it is transmitted. Because of this, in our corpus the semantic frames of “boss” connote both a cautious and gloomy professional who is concerned about (and sometimes adversely involved in) hot socio-economic issues such as Brexit, unfair salaries, inequality, fraud, corruption, etc., and also, by contrast, a hope-inspiring and optimistic expert who seems to be above all these issues and is more concerned with predicting a promising and prosperous future for the UK. The study implies that, depending on the socio-political and contextual factors surrounding the use of terms referring to common professional roles, their definition and meaning differ remarkably, which also affects other associated concepts, such as authority, hierarchy, power, immunity to criticism, company organisation, etc. This dual and malleable conceptual nature of their meaning can significantly influence the correct understanding, translation, acquisition and use of these words, and their associated cognitive/mental models, in today’s professional and educational communication.

A quantitative survey of N Prep N constructions in Romance languages and prepositional variability

Un estudio cuantitativo de las construcciones N Prep N en las lenguas románicas y variabilidad preposicional

Inga Hennecke & Harald Baayen
Universität Tübingen

Received: 24/04/2017. Accepted: 10/10/2017

Abstract: The distinction between syntagmatic compounds of the type N Prep N, such as Fr. jouet d’enfant, and nominal syntagms of the type N Prep N, such as the partially equivalent Fr. jouet pour enfants, remains unclear and vague. This is mainly because the lexical and syntactic status of syntagmatic compounds is still controversial. In some cases, as in jouet d’enfant and jouet pour enfants, partially equivalent syntagmatic compounds and nominal syntagms may coexist and be subject to a specific variation and alternation. In other cases, such as Pt. bracelete de aço and bracelete em aço, two variants of a syntagmatic compound may alternate and coexist.
The first part of this paper provides an overview of the current discussion on these two types of constructions. The second part addresses the alternation and variation of syntagmatic compounds and nominal syntagms by means of an analysis of large-scale corpus data, namely the French, Spanish and Portuguese corpora of the TenTen family. Here, the focus lies on the variation of the internal prepositional element of these constructions as well as on a comparison of different word formation patterns.

Keywords: Compounds; quantitative corpus linguistics; lexicon-syntax interface; Romance.

Hennecke, Inga & Baayen, Harald. 2017. “A quantitative survey of N Prep N constructions in Romance languages and prepositional variability”. Quaderns de Filologia: Estudis Lingüístics 22: 129-146. doi: 10.7203/qf.22.11305

Resumen: La distinción entre los compuestos sintagmáticos del tipo N Prep N, como por ejemplo Fr. jouet d’enfant, y los sintagmas nominales del tipo N Prep N, como Fr. jouet pour enfants, sigue siendo confusa. Esto se debe, sobre todo, a que no existe consenso a propósito de la categorización léxica y sintáctica de los compuestos sintagmáticos. En algunos casos, como en jouet d’enfant y jouet pour enfants, se trata de equivalentes parciales que pueden coexistir y estar sujetos a una variación y alternancia específica. En otros, como en Pt. bracelete de aço y bracelete em aço, las posibles variaciones pueden alternar y coexistir en prácticamente todos los contextos. La primera parte de esta contribución ofrece un breve resumen de la discusión reciente sobre estos dos tipos de construcciones. La segunda sección discute la alternancia y variación de los compuestos sintagmáticos y los sintagmas nominales mediante el análisis de diferentes corpus de gran tamaño: el corpus español, francés y portugués de los corpus TenTen.
El análisis se centra especialmente en la variación del elemento preposicional interno de los compuestos y los sintagmas, y en la comparación entre los diferentes tipos de formación de palabras que tienen lugar en ellos.

Palabras clave: palabras compuestas; lingüística de corpus cuantitativa; interfaz léxico-sintaxis; lenguas románicas.

1. State of the Art

Terminological uncertainty and inconsistent classifications dominate the scientific debate on syntagmatic compounds of the type N Prep N in Romance languages. Currently, possible denominations include terms such as phrasal compounds (Bisetto & Scalise, 2005), syntactic compounds (Rio-Torto & Ribeiro, 2009), improper compounds (Kornfeld, 2009), phrasal lexemes (Masini, 2007, 2009; Masini & Scalise, 2012), “frozen” multiword units (Guevara, 2012), lexicalized syntactic constructions (Villoing, 2012), lexicalized phrases (Fradin, 2009), syntactic words (Di Sciullo & Williams, 1987) or even syntactic syntagms or prepositional syntagms. The heterogeneous terminology goes hand in hand with diverging delimitations and with the inclusion of different types of lexical and syntagmatic units. Likewise, syntagmatic compounds of the type N Prep N may or may not, depending on the underlying terminology, be included in the group of compounds. Moyna (2011) includes in her definition of syntagmatic compounds different combinations of nouns and adjectives, which may or may not show orthographic union:

[N PREP N]N       dulce de leche      “caramel”
[N PREP Art N]N   árbol de la cera    “wax myrtle”
[N + A]N          hierbabuena         “mint”
[A + N]N          malasombra          “evil person”
(Moyna 2011: 38)

In contrast, Masini (2009) does not include orthographically unified combinations, such as hierbabuena, but she adds constructions of the type N Prep VINF, such as salle à manger ‘dining room’. Traditional grammars and dictionaries generally classify nominal syntagmatic compounds of the type Sp.
bicicleta de montaña ‘mountain bike’, Fr. brosse à dents ‘toothbrush’ or Pt. moinho de vento ‘windmill’ as lexical units and therefore as compounds. But Kabatek & Pusch (2009) indicate that it is not always clear how to differentiate between lexical items of the type perro de caza and more syntactic items such as libro para niños (Kabatek & Pusch, 2009: 93f.). According to de Bustos Gisbert, syntagmatic compounds consist of at least two etymological words and are formally indistinguishable from nominal phrases (de Bustos Gisbert, 1986: 69). Along the same lines, Masini notes that syntagmatic compounds of the type N Prep N follow the normal syntactic pattern in which the head of the nominal phrase is modified by the prepositional phrase (2009: 257). N Prep N constructions in Romance languages therefore tend to be left-headed, and inflection is marked on the head constituent (ibid.). According to Val Álvaro (1999), the main distinctive feature between syntagmatic compounds and free nominal syntagms is the absence of a compositional meaning in syntagmatic compounds (Val Álvaro, 1999: 4827). Therefore, they can be interpreted as complex nominals and not as nominal phrases. Similarly, Štekauer (2001b: 39) classifies ‘syntax-based word formations’ such as son-in-law or stuff-leaver as onomasiological naming units that have an internal structure and resort to the same word formation processes as other naming units. Furthermore, syntagmatic compounds generally differ from nominal syntagms in that they form an accentual unit (de Bustos Gisbert, 1986). Still, a main concern of past research on syntagmatic compounds was their delimitation, especially by introducing new delimitation tests (e.g. Bouvier, 2000; Buenafuentes de la Mata, 2006; Bisetto & Scalise, 2005; Lieber & Scalise, 2007; Masini, 2009; Masini & Scalise, 2012).
These tests generally include criteria such as the modification of the constituents (e.g. modification of the constituent order, insertion or omission of elements) via topicalization, intensification or the insertion of modifying adjectives. For Portuguese, the last two tests can be exemplified with Rio-Torto and Ribeiro (2012: 125):

moinho de vento          “windmill”
moinho *antigo de vento  “*wind old mill”
moinho de *muito vento   “*wind much mill”

These delimitation tests are of major importance for studies taking a lexicological, semantic and morphological perspective. These studies generally follow Benveniste (1966) in his statement that syntagmatic compounding is the real word formation process in French. From this perspective, syntagmatic compounds are commonly perceived as lexical structures that may show signs of internal syntactic patterns (e.g. Bisetto & Scalise, 1999, 2005; Rio-Torto & Ribeiro, 2012). In contrast, studies that focus on syntax, such as Kornfeld (2003) or Lieber (1992), generally perceive syntagmatic compounding as a clearly syntactic process. Other studies do not focus on the delimitation of lexicon and syntax. From a construction grammar or construction morphology perspective, syntagmatic compounds and (partially) equivalent nominal syntagms are both considered constructions lying on a continuum between lexicon and morphosyntax (e.g. Masini 2009). Still, these studies also aim at a description and classification of different constructions, such as syntagmatic compounds, phrases and other types of compounds (Masini 2009). In the present account, we argue that there is no clear line between syntagmatic compounds and syntactic constructions, but that they lie on a continuum between a lexicalized and a syntactic pole.
A second major concern in research on syntagmatic compounds is the question of whether these constructions are lexicalized syntactic constructions or whether they emerge from productive word formation patterns. Rainer (2016) clearly opts for the classification of syntagmatic compounds as productive lexical patterns:

Formations of this kind [syntagmatic compounds] are not, as often stated erroneously, the result of the lexicalization of regular syntactic sequences, but constitute very productive lexical patterns (…) (Rainer 2016: 2624).

In contrast, Guevara (2012) excludes syntagmatic compounds of the type fin de semana ‘weekend’ from his description of Spanish compounds, along with cases such as sabelotodo ‘know-it-all’. He justifies this decision on the grounds that “they are clearly not formed by any rule of the language, they are “frozen” multiword units arising as the result of processes of lexicalization and fossilization and do not belong in the core of word-formation” (Guevara, 2012: 179). In a similar argument, Villoing excludes “lexicalized syntactic constructions that behave like lexical units” (Villoing, 2012: 35), such as fil de fer ‘wire’, brosse à dents ‘toothbrush’ but also sous verre ‘coaster’, sans-papier ‘illegal immigrant’ and boit-sans-soif ‘boozehound’, from the delimitation of compounds. By contrast, in the same volume on Romance compounds, Rio-Torto & Ribeiro (2012) propose a classification of phrasal compounds, such as caminho de ferro ‘railway’ in Portuguese, which are classified as involving “word sequences whose internal structure obeys the syntax rules typical of phrases” (Rio-Torto & Ribeiro, 2012: 7).

This short introduction to the current discussion clearly demonstrates the terminological uncertainty as well as the problematic delimitation and classification of syntagmatic compounds (for an overview see e.g. Bisetto & Scalise, 2005; Lieber & Scalise, 2007).
The most prominent problem in this debate is by far the question of whether syntagmatic compounds should be considered part of the lexicon or part of syntax. Furthermore, in most cases, the discussion comes down to the crucial question of whether syntagmatic compounding is a process of lexicalization or a process of productive word formation. In the present paper, we assume that syntagmatic compounding is a productive and rule-governed process of word formation in Romance languages. Furthermore, we assume that there is no clear boundary between lexicalized and syntactic constructions of the type N Prep N. The aim of the present work is to take a closer look at syntagmatic compounding of the type N Prep N in corpora of written French, Spanish and Portuguese, focusing on the internal variation of N Prep N constructions as well as on their frequency and productivity and potential differences across these three languages.

2. Internal alternation and variation in syntagmatic compounds

The above review of the theoretical status of syntagmatic compounds in Romance languages does not yield a unified perspective. Nevertheless, syntagmatic compounds appear to be at least partially lexicalized constructions. The degree of their lexicalization may vary along with other factors such as semantic opacity/idiomaticity, entrenchment, fixedness of the internal constituents, frequency of occurrence, productivity, etc. Despite their more or less strong degree of lexicalization, syntagmatic compounds still appear to preserve at least some of their syntactic characteristics. The at least partially syntactic character of syntagmatic compounds is apparent from the internal lexical and inflectional variation of these constructions. Rio-Torto and Ribeiro (2012) consider the possibility of internal change in N Prep N constructions as a test of compound status.
From this perspective, constructions in which the preposition can be replaced without a change in meaning would have to be regarded as syntactic rather than lexical. Thus, the pair Pt. forno a microondas and forno de microondas ‘microwave oven’, where no clear semantic difference is discernible, would suggest that we are dealing with a syntactic construction. Conversely, the French pair flûte de champagne ‘glass of champagne’ and flûte à champagne ‘champagne glass’, where there is a change of meaning, would indicate that word formation is at issue. However, the phenomenon of internal prepositional alternation appears to be more complex than this. Internal alternation of the preposition is not uncommon in Romance languages. The possibility of alternation depends to a large extent on factors such as the semantic function of the N2 as well as on the fixedness and idiomaticity of the whole construction. Consider the following examples:

1a. Sp. esmalte de uñas – esmalte para uñas (Pacagnini 2003)
    “nail polish” – “polish for nails”
 b. Pt. água de lavagem – água para lavagem (ptTenTen)
    “wash water” – “water for washing”
 c. Fr. jouet d’enfant – jouet pour enfants (frTenTen)
    “toy” – “toy for kids”

2a. Sp. motor(es) de gasolina – motores a gasolina (esTenTen)
    “gas engine”
 b. Fr. épingle de nourrice – épingle à nourrice
    “safety pin”
 c. Pt. fogão de lenha – fogão a lenha (ptTenTen)
    “wood stove”

3a. Fr. chemise de coton – chemise en coton (frTenTen)
    “cotton shirt” – “shirt of cotton”
 b. Pt. bracelete de aço – bracelete em aço (ptTenTen)
    “steel bracelet” – “bracelet of steel”
 c. Sp. ciclismo de pista – ciclismo en pista (esTenTen)
    “track cycling” – “cycling on track”

In example 1, we see internal variation of the linking preposition: de/para and de/pour. While the constructions containing de are clearly lexicalized, the combinations containing para/pour count as syntactic constructions.
The use of pour/para intensifies the semantic relation between the two nominal items in the construction, in this case ‘function’ (see Kornfeld 2009: 442 ff.). In 1a and 1b, the N2 designates the object (1a) or the process (1b) for which the N1 is used, whereas in 1c the user of the N1 is specified.

Example 2 illustrates the alternation between the prepositions de and à (a). Here, both variants have lexical status, and the alternation does not trigger a change from lexical to syntactic status. The same applies to the examples in 3, where we cannot identify a change in lexical status, but clearly a certain discrepancy in the degree of lexicalization and in the semantic relation between N1 and N2. That is to say, the constructions shown in examples 1–3 are only partial equivalents, as they may also differ in their actual usage frequency, their productivity and their opacity.

Some authors, such as Kampers-Manhé (2001), argue that the internal prepositions have purely connecting properties (“opérateurs de couplage”) (Kampers-Manhé 2001: 107) and “ne sont pas porteuses de sens” (ibid.). The above examples suggest that the preposition is not completely inert semantically, even though, as we shall see below, some noun pairs show considerable variation in the choice of the internal preposition. Furthermore, the possibility of internal variation in the above examples indicates that these constructions may not be completely lexicalized. They still allow internal modification that appears to be syntactically motivated. The following quantitative corpus survey aims to provide further evidence on the productivity and frequency of internal prepositional variation in syntagmatic compounds in Romance languages.

3. Corpus survey

3.1. Data

The present corpus-linguistic investigation is based on three web corpora from the TenTen corpus family provided by Sketch Engine¹, more precisely on the corpora frTenTen12 (French), esTenTen11 (Spanish) and ptTenTen11 (Portuguese). Their word counts range from about 3.9 to 9.9 billion and their token counts from about 4.6 to 11.4 billion (see General Corpus Information on sketchengine.co.uk):

¹ <https://www.sketchengine.co.uk>.

            frTenTen        esTenTen        ptTenTen
Tokens      11,444,973,582  10,994,616,207  4,626,584,246
Words       9,889,689,889   9,497,402,122   3,900,501,097
Sentences   456,065,104     407,205,587     190,221,913
Paragraphs  188,079,362     213,364,685     91,248,976
Documents   20,400,411      22,287,566      10,216,060

Table 1. Corpus Info of the TenTen corpora for French, Spanish and Portuguese (https://the.sketchengine.co.uk)

The corpora ptTenTen and esTenTen can furthermore be divided into an American and a European part; the majority of the data represent American varieties of Spanish (79% of the esTenTen data) and Portuguese (76% of the ptTenTen data). We made use of normalized samples of 100 million tokens each, provided to us by Sketch Engine.

Language    Types    Tokens
French      284,432  1,301,850
Spanish     385,162  1,949,941
Portuguese  642,022  3,204,462

Table 2. Type and token counts of N Prep N sequences in the TenTen corpora for French, Spanish, and Portuguese

Table 2 lists type and token counts for all N Prep N sequences in the three corpora. In Portuguese, the construction occurs particularly frequently compared to French and Spanish, which show relatively similar frequencies. The frequent occurrence of the N Prep N construction is in part due to the existence in Portuguese of hybrid forms of the type Prep + Art (do(s), da(s), na(s), no(s)) as well as Prep + Pron (daquela(s)/e(s), naquela(s)/e(s); deste(s)/a(s), neste(s)/a(s)).
The equivalent constructions in French and Spanish would be of the form N Prep Article N. In order to obtain a syntactically homogeneous dataset, these constructions were not included in the present analysis.

In what follows, we refer to the complete set of N Prep N sequences extracted from the corpora as dataset 1. This dataset is noisy and contains instances in which the N Prep N sequence is not a syntactic or onomasiological unit, that is to say a naming unit (Štekauer 2001b). Removing these irrelevant cases from a list of more than 6 million examples was beyond the scope of the present study. Despite this noise, dataset 1 was included in the quantitative survey in order to obtain an overview of the occurrence and productivity of the construction type N Prep N in the languages under investigation. Furthermore, the results from the analysis of dataset 1 offer a first point of comparison for the analysis of dataset 2.

From dataset 1, a second dataset was derived from which word triplets that did not instantiate the N Prep N construction were manually removed. This second dataset, henceforth dataset 2, focused on the internal preposition of the constructions. In a first step, all constructions overlapping in their N1 and N2 and diverging in their preposition were selected (e.g. livre pour/d’enfants). In a second step, the data were manually inspected and the following constructions were excluded: grammaticalized constructions (frente a, jusqu’à, en dehors), partitive constructions or spatial, temporal or mass nouns (kilo de, lunes a viernes, visita a Roma, journées par semaine), binominal pairs (dia a dia, instant après instant), antonyms (chien sans/avec laisse, personnes avec/sans emploi), preposition phrases (N1 à base de, par hasard de), verb phrases (mettre N1 en danger, donner N1 à N2), and hybrid forms of the above.

Language    Types  Tokens
French      1062   6991
Spanish     547    10219
Portuguese  6795   58932

Table 3. Type and token counts for dataset 2, which includes all pairs of nouns that are attested with at least two different internal prepositions

Table 3 lists type and token counts for dataset 2. As for dataset 1, the counts for Portuguese outnumber those for French and Spanish.

Both datasets were further analysed by considering, in addition to the counts of tokens (N) and types (V), the counts of hapax legomena (V1, the formations occurring once only), the productivity measure P = V1/N, which assesses the probability that an additional N Prep N token represents a novel, previously unobserved type, and an estimate S of the potential number of formations in use in the text type sampled by the corpus. Note that S = V + V0, where V0 is the count of formations that do not appear in the sample. S can be estimated from the numbers of word types Vk that occur once, twice, three times etc., provided that these counts Vk decrease in a regular way. If so, V0 can be estimated, and given V0, an estimate of S = V + V0 follows immediately. For further mathematical detail on these measures, see Baayen (2009), and for the estimation of S, Baayen (2001, 2008). Thus, we have three estimates, each highlighting a different aspect of productivity: the number of types V, for the extent to which a head or modifier position is used in the corpus; the probability P that new types will be sampled when the corpus is increased; and S, the limiting number of types that one might sample if the corpus size were increased to infinity.

3.2. Analysis dataset 1

Table 4 summarizes the frequency and productivity statistics for dataset 1, focusing on the productivity of the nominal slots in the N Prep N construction. The upper subtable documents the counts when types are defined by the first noun of the construction. The lower subtable concerns the corresponding counts for the second noun.
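As a rough illustration of the measures defined in Section 3.1, the following Python sketch computes N, V, V1, P = V1/N and the frequency spectrum Vk from a list of tokens. The function name productivity_stats and the toy sample are assumptions made purely for illustration; they are not taken from the TenTen data and do not reproduce the procedure used in the study. The estimation of S is deliberately left out, since it requires fitting a word frequency distribution model as described in Baayen (2001, 2008).

```python
from collections import Counter


def productivity_stats(tokens):
    """Compute basic productivity counts for a list of type labels.

    Each element of `tokens` is one observed token, labelled by its type
    (for instance the first noun of an N Prep N sequence).
    """
    freq = Counter(tokens)                    # type -> token frequency
    n_tokens = sum(freq.values())             # N: number of tokens
    n_types = len(freq)                       # V: number of types
    hapaxes = sum(1 for c in freq.values() if c == 1)  # V1: hapax legomena
    p = hapaxes / n_tokens if n_tokens else 0.0        # P = V1 / N
    spectrum = Counter(freq.values())         # Vk: number of types with frequency k
    return {"N": n_tokens, "V": n_types, "V1": hapaxes, "P": p,
            "spectrum": dict(spectrum)}


# Hypothetical toy sample: six tokens, four types, three of them hapaxes.
sample = ["verre", "verre", "livre", "chemise", "verre", "jouet"]
stats = productivity_stats(sample)
print(stats)
```

In this toy sample P equals 3/6 = 0.5; in a real corpus P is much smaller and, as noted in the text, decreases as N grows.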
On the basis of the numbers of tokens N, types V, potential types S, and hapax legomena V1, the N Prep N construction appears least productive in French, of medium productivity in Spanish, and most productive in Portuguese. This ordering holds for both the first and the second noun. The ranking of the three languages by P is different, with Portuguese having the lowest productivity measure. It should be kept in mind, however, that P is itself a function of N, and that it decreases as N (and V) increase. (As we read through a text, the rate at which new words are encountered decreases steadily.) Given that N is very much larger for Portuguese, the value of P is actually surprisingly large. Comparing Spanish and French, the similar values of P are surprising given that N is substantially larger for Spanish than for French. Therefore, the P values provide further support for the ranking based on the other statistics.

Noun1       P       S      V      N        V1
French      0.0023  20147  13719  1301850  2994
Spanish     0.0023  28755  18407  1949941  4485
Portuguese  0.0017  36624  23409  3204462  5448

Noun2       P       S      V      N        V1
French      0.0028  24688  16174  1301850  3645
Spanish     0.0031  39037  23245  1949941  6045
Portuguese  0.0023  49079  28545  3204462  7370

Table 4. Frequency and productivity statistics for dataset 1. The upper part of the table defines types on the basis of the first noun, the lower part bases types on the second noun

Table 4 also indicates that the second noun position of the construction is used more productively than the first noun position: all measures except the token count N, which is the same in both parts, assume larger or equal values in the second part of the table. The greater productivity of the modifier position makes sense from an onomasiological perspective, as the second noun slot is typically used to differentiate between subcategories of the head noun, which in Romance languages generally occupies the first noun slot.
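The P values reported in Table 4 can be recomputed from the V1 and N columns, and the number of unseen types V0 follows from S − V. The short Python sketch below does this for all six rows; the dictionary layout and the helper name recompute are illustrative assumptions, while the figures themselves are copied from Table 4.

```python
# Figures copied from Table 4; the data structure itself is an illustrative choice.
TABLE4 = {
    ("French", "Noun1"):     {"P": 0.0023, "S": 20147, "V": 13719, "N": 1301850, "V1": 2994},
    ("Spanish", "Noun1"):    {"P": 0.0023, "S": 28755, "V": 18407, "N": 1949941, "V1": 4485},
    ("Portuguese", "Noun1"): {"P": 0.0017, "S": 36624, "V": 23409, "N": 3204462, "V1": 5448},
    ("French", "Noun2"):     {"P": 0.0028, "S": 24688, "V": 16174, "N": 1301850, "V1": 3645},
    ("Spanish", "Noun2"):    {"P": 0.0031, "S": 39037, "V": 23245, "N": 1949941, "V1": 6045},
    ("Portuguese", "Noun2"): {"P": 0.0023, "S": 49079, "V": 28545, "N": 3204462, "V1": 7370},
}


def recompute(row):
    """Return (P rounded to four decimals, V0) for one table row."""
    p = round(row["V1"] / row["N"], 4)   # P = V1 / N
    v0 = row["S"] - row["V"]             # V0 = S - V, types not seen in the sample
    return p, v0


results = {key: recompute(row) for key, row in TABLE4.items()}
for (language, slot), (p, v0) in results.items():
    print(f"{language:<10} {slot}: P = {p:.4f}, V0 = {v0}")
```

For every row the recomputed P agrees with the tabulated value at four decimal places, which also makes the dependence of P on the sample size N easy to see.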
The large numbers of hapax legomena, as well as the fact that S >> V, all indicate, within the limits of dataset 1, that the N Prep N construction is solidly productive in the three Romance languages under consideration here.

Further informal surveys of the prepositions de, en-em, à-a, pour-para as well as avec-con-com, again using dataset 1, indicated that French N Prep N constructions containing the prepositions avec and pour are less frequent and less productive than the equivalent constructions in Portuguese and Spanish containing the prepositions con-com and para. French appears to resort to other types of word formation, such as NN or NA constructions, instead of using N Prep N constructions containing avec, as in:

5a) Fr. personne handicapée “handicapped person”
 b) Sp. persona con discapacidad física/mental “handicapped person”
 c) Pt. pessoa com necessidades especiais “handicapped person”

French also shows a preference for constructions with de instead of pour. At the same time, constructions with the preposition à-a appear to be more productive and frequent in French than in Spanish and Portuguese. Semantic relations that are expressed via à in French tend to require other prepositions, such as de or para, in Spanish or Portuguese:

7a) Fr. verre à vin “wine glass”
 b) Sp. copa de vino / copa para vino “wine glass”
 c) Pt. copo de vinho “wine glass”

3.3. Analysis dataset 2

Table 5 summarizes the frequency and productivity measures for dataset 2, which includes only those (manually verified) examples of N Prep N constructions in which the first and second noun co-occur with at least two different prepositions. For this analysis, each combination of first noun, second noun and preposition counted as a separate type.

            P       S         V     N      V1
French      0.0594  1748.232  1062  6991   415
Spanish     0       –         547   10219  0
Portuguese  0.0464  13378.57  6795  58932  2733

Table 5. Frequency and productivity statistics for dataset 2, which comprises all instances of noun pairs that occur with at least two different prepositions

As in the analysis of dataset 1, Portuguese again shows the highest type (V) and token (N) frequencies, the largest number of hapax legomena (V1), the highest estimate of possible types (S), and, given the large number of tokens, a surprisingly large degree of productivity P. Although the numbers are smaller for French, the construction, as evaluated on the basis of dataset 2, remains solidly productive, as evidenced by the large number of types missed in the sample (V0 = S − V = 1748 − 1062 = 686). Spanish, by contrast, shows a very different pattern. There are no hapax legomena in dataset 2 for Spanish; hence P is zero, and S cannot even be estimated (it is expected to be only slightly larger than V, if at all). The number of types (547) is roughly half of that observed for French, and less than 10% of that observed for Portuguese. In other words, internal variation of the preposition for fixed head and modifier nouns is not productive in Spanish, whereas it is productive in French and especially in Portuguese. In Portuguese, we find examples of noun pairs occurring with 5 different prepositions; in French, this reduces to 4, and in Spanish, the maximum is 3. Thus, when we consider the productivity of internal variation of the preposition, the ranking of the languages places French above Spanish. Inspection of the Spanish examples suggests a strong tendency to make use of the high-frequency preposition de and to restrict variation in prepositions to a relatively small set of lexicalized compounds.

4. Discussion

The present study sheds new light on the vexed question of the status of the N Prep N construction in Romance languages.
First, the survey of N Prep N sequences in the TenTen corpora of French, Spanish, and Portuguese clearly shows that this construction contributes substantially to the lexicon (in the onomasiological sense) of these languages. In all three languages, the construction is realized in tens of thousands of examples (dataset 1). Admittedly, dataset 1 includes many instances that do not conform to the N Prep N construction. Nevertheless, even if half of the tokens and types were to be discarded, the counts of legitimate constructions would still portray this construction as the most productive onomasiological process in Romance, mirroring the evidence from Germanic languages, which suggests that derivational word formation is less productive than compounding by several orders of magnitude. It is therefore unlikely that N Prep N constructions in Romance languages are merely lexicalized or fossilized syntactic constructions without the support of a productive process of word formation (pace Guevara 2012 and Villoing 2012). On the contrary, for all three languages, large numbers of novel types are expected to be observable in larger samples of language use, as indicated by the (tentative) estimates of the population numbers of types (S).

An analysis of a hand-curated subset of dataset 1, comprising all attestations of N1 Prep N2 constructions in which N1 and N2 co-occur with at least two different prepositions (dataset 2), brought to light an unexpected difference between Portuguese and French on the one hand, and Spanish on the other. Portuguese, and to a lesser extent French, exhibit productive internal variation of the preposition. Spanish, by contrast, appears not to allow its speakers the same flexibility in the choice of preposition.
In the absence of hapax legomena for Spanish noun pairs, Spanish emerges as a language that avoids both “free” variation of the preposition for approximately the same meaning and the use of different prepositions to differentiate between shades of meaning for a given modifier and head noun (as instantiated for French, for instance, by the pair ‘verre à vin’ and ‘verre de vin’).

An informal survey of which prepositions are favored revealed that French shows a stronger preference for constructions containing the preposition à than Spanish or Portuguese, which use de or para more productively. The rarity of avec in French N Prep N constructions is likely to be due to NA constructions being preferred. In French, de emerged as slightly more productive than pour (e.g. livre d’enfant and livre pour enfants).

5. Conclusions

The present quantitative survey of N Prep N constructions in Spanish, French and Portuguese offers new empirical evidence for the discussion on Romance word formation. The two main points addressed in this study concern the lexical or syntactic status of syntagmatic compounds as well as their productivity and degree of lexicalization or fossilization. The analysis indicates that these constructions are indeed realized according to productive processes of Romance word formation. That is to say, syntagmatic compounds are naming units that form part of the lexicon. N Prep N constructions are not merely fossilized syntactic constructions; rather, the construction type N Prep N is an important and frequently used mechanism of word formation. Still, it is important to highlight that it is neither possible nor necessary to draw a clear line between lexical onomasiological units of the type N Prep N and syntactic constructions of the type N Prep N. Here, different criteria, such as the degree of fixedness, idiomaticity and compositionality, play an important role.
Furthermore, the present quantitative analysis shows that internal prepositional variation is possible in N Prep N constructions in Romance languages, but that this variation displays different characteristics in the three Romance languages under investigation. Portuguese shows the highest frequency and productivity of internal prepositional variation in a large number of different semantic contexts. In contrast, the Spanish data show no productivity in the internal variation of N Prep N constructions. Along the same lines, Spanish has the strongest tendency to employ the preposition de as the internal preposition in N Prep N constructions.

In conclusion, syntagmatic compounds of the type N Prep N form a productive and frequent part of Romance word formation. Still, their frequency and productivity as a word formation type vary across the three Romance languages, as does their disposition for internal prepositional variation. Further studies on this subject need to consider the qualitative characteristics of internal prepositional variation, notably the semantic relation between the N1 and the N2.

6. References

Anshen, Frank & Aronoff, Mark. 1997. Morphology in real time. In Booij, Geert E. & van Marle, Jaap (eds.) Yearbook of Morphology 1996. Dordrecht: Kluwer Academic Publishers, 9-12.
Aronoff, Mark. 1976. Word formation in generative grammar. Cambridge, MA: MIT Press.
Baayen, Harald & Lieber, Rochelle. 1991. Productivity and English derivation: a corpus-based study. Linguistics 29(5): 801-844.
Baayen, R. H. 2001. Word Frequency Distributions. Kluwer.
Baayen, R. H. 2008. Analyzing Linguistic Data. A Practical Introduction to Statistics Using R. Cambridge University Press.
Baayen, R. H. 2009. Corpus linguistics in morphology: morphological productivity. In Lüdeling, A. & Kytö, M. (eds.) Corpus Linguistics. An International Handbook. Berlin: Mouton De Gruyter, 900-919.
Bauer, Laurie. 2001. Morphological productivity. Cambridge.
Benveniste, Émile (ed.). 1966. Problèmes de linguistique générale (Bibliothèque des sciences humaines 1). Paris: Gallimard.
Bisetto, Antonietta & Scalise, Sergio. 1999. Compounding: morphology and/or syntax? In Mereu, Lunella (ed.) Boundaries of Morphology and Syntax (Amsterdam Studies in the Theory and History of Linguistic Science 4). Amsterdam/Philadelphia: Benjamins, 31-49.
Bisetto, Antonietta & Scalise, Sergio. 2005. The classification of compounds. Lingue e Linguaggio 4(2): 319-332.
Bouvier, Yves F. 2000. Définir les composés par opposition aux syntagmes. In Haeberli, Eric & Laenzlinger, Christopher (eds.) Generative Grammar in Geneva, 165-187.
Buenafuentes de la Mata, Cristina. 2006/04. Entre la morfología, la sintaxis y el léxico: la delimitación de la composición sintagmática en español (VII Congrés de Lingüística General). Barcelona.
Di Sciullo, Anne-Marie & Williams, Edwin. 1987. On the definition of word (Linguistic Inquiry Monographs 14). Cambridge, MA.
Faria, André. 2010. Formação de compostos nominais de base livre do PB. In Almeida, Maria L.; Ferreira, Rosangela & Pinheiro, Diogo (eds.) Linguística cognitiva em foco: morfologia e semântica do português. Rio de Janeiro: Soluções Editoriais.
Fradin, Bernard. 2009. IE, Romance: French. In Lieber, Rochelle & Štekauer, Pavol (eds.) The Oxford Handbook of Compounding. Oxford University Press, 417-435.
Guevara, Emiliano R. 2012. Spanish compounds. Probus. International Journal of Latin and Romance Linguistics 24(1): 175-195.
Kabatek, Johannes & Pusch, Claus D. 2009. Spanische Sprachwissenschaft: Eine Einführung. Tübingen: Narr Francke Attempto Verlag.
Kampers-Manhe, Brigitte. 2001. Le statut de la préposition dans les mots composés. Travaux de Linguistique 42-43(1), 83-95.
Kornfeld, Laura M. 2003. Compounds N+N as formally lexicalized appositions in Spanish. In Booij, Geert; De Cesaris, Janet; Ralli, Angela & Scalise, Sergio (eds.) Topics in Morphology: Selected Papers from the Third Mediterranean Morphology Meeting. Barcelona: Universitat de Pompeu Fabra, 211-225.
Kornfeld, Laura M. 2009. IE, Romance: Spanish. In Lieber, Rochelle & Štekauer, Pavol (eds.) The Oxford Handbook of Compounding. Oxford University Press, 436-453.
Lieber, Rochelle. 1992. Deconstructing Morphology: Word Formation in Syntactic Theory. Chicago/London: University of Chicago Press.
Lieber, Rochelle & Scalise, Sergio. 2007. The lexical integrity hypothesis in a new theoretical universe. In Booij, Geert; Ducceschi, Luca; Fradin, Bernard; Guevara, Emiliano R.; Ralli, Angela & Scalise, Sergio (eds.) On-line Proceedings of the Fifth Mediterranean Morphology Meeting, 1-25.
Masini, Francesca. 2009. Phrasal lexemes, compounds and phrases: A constructionist perspective. Word Structure 2(2): 254-271.
Masini, Francesca & Scalise, Sergio. 2012. Italian compounds. Probus. International Journal of Latin and Romance Linguistics 24(1): 61-91.
Masini, Francesca & Thornton, Anna. 2007. Italian VEV lexical constructions. In Booij, Geert; Ralli, Angela & Scalise, Sergio (eds.) Morphology and Dialectology 6: 148-189.
Pacagnini, Ana M. J. 2003. Compuestos sintagmáticos y alternancia preposicional. Moenia 9: 159-172.
Rainer, Franz. 2016. Italian. In Müller, Peter O.; Ohnheiser, Ingeborg; Olsen, Susan & Rainer, Franz (eds.) Word Formation: An International Handbook of the Languages of Europe. Berlin/Boston: Mouton de Gruyter, 2712-2731.
Rainer, Franz. 2016. Spanish. In Müller, Peter O.; Ohnheiser, Ingeborg; Olsen, Susan & Rainer, Franz (eds.) Word Formation: An International Handbook of the Languages of Europe. Berlin/Boston: Mouton de Gruyter, 2620-2640.
Rio-Torto, Graça & Ribeiro, Sílvia. 2009. Compounds in Portuguese. Lingue e Linguaggio 8(2): 271-291.
Rio-Torto, Graça & Ribeiro, Sílvia. 2012. Portuguese compounds. Probus. International Journal of Latin and Romance Linguistics 24(1): 119-145.
Schlechtweg, Marcel & Härtl, Holden. 2015. Compound versus phrase: Evidence from a learning study (10th Mediterranean Morphology Meeting). Haifa.
Štekauer, Pavol. 2001b. Fundamental principles of an onomasiological theory of English word-formation. Onomasiology Online 2: 1-42.
van Goethem, Kristel. 2009. Choosing between A+N compounds and lexicalized A+N phrases: The position of French in comparison to Germanic languages. Word Structure 2(2): 241-253.
Villoing, Florence. 2012. French compounds. Probus. International Journal of Latin and Romance Linguistics 24(1): 29-60.

ojs.uv.es/index.php/qfilologia/index

Lingüística de corpus y fraseología contrastiva (alemán-español): Las combinaciones usuales de estructura [PREP + S]. El caso de entre lágrimas y unter Tränen

Corpus linguistics and contrastive phraseology (German-Spanish): The multi word units of [PREP+N] structure. The case of entre lágrimas and unter Tränen

Ana Mansilla
Universidad de Murcia. [email protected]
Received: 14/05/2017. Accepted: 25/10/2017

Summary: In this article we analyse the external and internal fixedness of the binomial [entre + N] with zero article in Spanish, and of its German equivalents, in the common multi-word units entre lágrimas and unter Tränen. The data are drawn from the DeReKo corpus (Das Deutsche Referenzkorpus) and from the Sketch Engine corpora deTenTen 13 and eseuTenTen11. First, we present the objectives of the project within which this work is framed. Second, we set out the different applications of corpus linguistics in the field of phraseology. Finally, on the basis of the corpora consulted, we comment on the most notable convergences and divergences between the multi-word units under study.

Key words: phraseology; corpus linguistics; common multi-word units; German; Spanish.
Abstract: The present article analyses the external and internal fixation of the binomial pattern [entre + N] with zero article in Spanish and its equivalents in German. In particular, the article focuses on the expressions entre lágrimas/unter Tränen. The data have been extracted from the DeReKo corpus (Das Deutsche Referenzkorpus) and from the Sketch Engine corpora deTenTen 13 and eseuTenTen11. First, the article will present the objectives of the research project of which this study forms part. Then, I will address some applications of corpus linguistics in the field of phraseology. Finally, similarities and differences between the expressions investigated will be analysed based on the evidence obtained from the corpora.

Keywords: phraseology; corpus linguistics; common multi-word units; German; Spanish.

Mansilla, Ana. 2017. “Lingüística de corpus y fraseología contrastiva (alemán-español): Las combinaciones usuales de estructura [PREP + S]. El caso de entre lágrimas y unter Tränen”. Quaderns de Filologia: Estudis Lingüístics 22: 147-164. doi: 10.7203/qf.22.11306

1. Introduction

The aim of this article is to address recent developments in phraseology from the perspective of corpus linguistics and to analyse the possibilities that language technologies offer for resolving outstanding issues such as the meaning, function or form of phraseological units. In our view, this emerging area of corpus linguistics within bilingual (German-Spanish) phraseology remains scarcely explored, which is what prompted us to write this paper. We focus our study on common multi-word units (combinaciones usuales, CU) of the type [PREP + N], drawing on the theory of Kathrin Steyer (2013). In particular, we pay attention to the [PREP + N] unit entre lágrimas and its German equivalent unter Tränen.
The aim is to detect convergences and divergences between the different common multi-word units (CU), to examine their syntagmatic profiles in both phraseological systems, covering both external fixedness (noun and verb collocates that function as the node of the prepositional modifier) and internal fixedness (internal slots), and to establish whether they can be systematized semantically and phraseologically.

2. Project framework. Brief description

This study forms part of the ongoing research project Combinaciones fraseológicas del alemán de estructura [PREP + S]: patrones sintagmáticos, descripción lexicográfica y correspondencias en español, carried out by the FRASESPAL group¹, whose aim is to extract, inventory and describe the CU of structure [PREP + N] from different corpora. We follow an inductive method; that is, we use the tools and statistical data provided by the corpus in order to extract information that would be invisible without a corpus and that is relevant to our study. In our project we exclude CU that form part of “Funktionsverbgefüge” or of prepositional complements governed by verbs, nouns or adjectives (padecer de, tener que ver con, sucumbir a, etc.).

¹ Project of the Ministerio de Ciencia e Innovación (FFI2013-45769-P) entitled Combinaciones fraseológicas del alemán de estructura [PREP + S]: patrones sintagmáticos, descripción lexicográfica y correspondencias en español, promoted by the research team FRASESPAL, directed by Carmen Mellado Blanco of the Universidad de Santiago de Compostela, with the collaboration of Kathrin Steyer, director of the project “Usuelle Wortverbindungen” at the IDS (http://www1.ids-mannheim.de/lexik/uwv.html). The results of the project will be published on the IDS online platform OWID http://www.owid.de/wb/uwv/start.html.
We also use tools designed at the IDS (Institut für Deutsche Sprache), such as the program COSMAS II (Corpus Search, Management and Analysis System) and Lexpan (Lexical Patterns Analyzer), which, on the basis of KWIC (Key Word in Context) lists, facilitates the analysis of the internal slots or Lückenfüller of the common multi-word units (CU). In addition, we compile information from Sketch Engine, specifically from the German corpus deTenTen 13 and, for Spanish, from the corpora European Spanish Web 2011 and eseuTenTen11.

Our analysis rests on two parameters. External fixedness refers to the verb and noun collocates that appear to the right and to the left of the node (preceding or following cotext), that is, of the word or words under study, whether or not they are in direct contact with the node [X (...) PREP + N (…) X]; the aim is to identify recurrent patterns that tend towards lexicalization (Mellado Blanco, 2015). Internal fixedness [PREP + X + N] concerns the kind of internal slots that appear in discourse between the preposition and the noun: [con X dolores] (con fuertes dolores); [unter X Schmerzen] (unter starken Schmerzen).

It is essential to stress the semantic function of the prepositions and the nouns in the CU we study [PREP + N], and whether each lexical unit retains its lexical meaning, that is, whether it is used in a literal or in a figurative sense, because the interaction between form and meaning directly affects the degree of idiomaticity, and hence of lexicalization, of the CU. The CU under study are word combinations that have fixed constituents alongside variable ones which, although they are free slots, are subject to certain semantic-combinatorial restrictions: [unter Vorbehalt X dem Abkommen/einem Beitritt/ zustimmen], [con X tono/mueca/sonrisa de satisfacción].
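The notion of an internal slot [PREP + X + N] can be made concrete with a small Python sketch that collects the words filling X in a list of concordance lines. This is only an illustration under stated assumptions: the function slot_fillers is hypothetical and is not part of COSMAS II, Lexpan or Sketch Engine, it handles a single one-word slot, and the concordance lines are invented around the two patterns quoted in the text (con X dolores, unter X Schmerzen).

```python
import re
from collections import Counter


def slot_fillers(lines, prep, noun):
    """Count the single words found between `prep` and `noun` in each line."""
    # \w+ captures exactly one word in the slot; multi-word fillers are ignored.
    pattern = re.compile(
        rf"\b{re.escape(prep)}\s+(\w+)\s+{re.escape(noun)}\b",
        re.IGNORECASE,
    )
    fillers = Counter()
    for line in lines:
        for match in pattern.finditer(line):
            fillers[match.group(1).lower()] += 1
    return fillers


# Invented concordance lines, for illustration only.
kwic = [
    "Llegó al hospital con fuertes dolores en el pecho.",
    "Se despertó con fuertes dolores de cabeza.",
    "Caminaba con intensos dolores.",
    "Er schrie unter starken Schmerzen.",
]

spanish_fillers = slot_fillers(kwic, "con", "dolores")
german_fillers = slot_fillers(kwic, "unter", "Schmerzen")
print(spanish_fillers.most_common())
print(german_fillers.most_common())
```

Ranking the fillers by frequency gives a first impression of how restricted the slot is; a real analysis would work on lemmatized and part-of-speech tagged corpus output rather than on raw strings.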
Steyer defines the concept of usual combination (usuelle Wortverbindung) as follows:

Usuelle Wortverbindungen (UWV) sind als polylexikalische, habitualisierte sprachliche Zeichen zu verstehen, die spezifischen Beschränkungen unterliegen. Diese Beschränkungen können alle Ebenen der Sprache betreffen. Sie ergeben sich aber primär nicht aus dem Sprachsystem, etwa bedingt durch transformationelle Defekte oder semantische Selektionsbeschränkungen, sondern aus dem rekurrenten Gebrauch dieser mehrgliedrigen Entitäten (Steyer, 2013: 16).

From a semantic point of view, these constructions carry an "additional" pragmatic component that cannot be deduced from the sum of the partial meanings of their constituents. Steyer (2013) bases her theory on statistical corpus data and sets out several parameters for assessing the degree of fixation of the usual combinations under study:

• The degree of fixation of the preposition and the noun.
• The degree of possible saturation between preposition and noun.
• The interaction between form and meaning of the CUs.
• The absence of a determiner between preposition and noun.

Broadly speaking, Steyer's (2013) theory builds on phraseological units that were long placed on the periphery of German phraseology and have therefore not been treated systematically: the Muster, Schemata and Schablonen. Relevant terms here are Modellbildung, coined by Häusermann (1977: 30), and Phraseoschablone, used by Fleischer (1997: 130), which refers to schemas that carry a fixed semantic interpretation and are subject to a kind of "syntactic idiomaticity". Some of the examples Fleischer cites fit the schema "X ist X" (e.g. sicher ist sicher, Urlaub ist Urlaub, geschenkt ist geschenkt, etc.).
Along these lines, Burger (2015: 45) and Dobrovol'skij (2011) argue for the term Schema, which refers to a syntactic schema with irregular semantics whose slots are filled by free lexical components, albeit subject to certain semantic restrictions: [PRON(der/die) und X(infinitive)]: der und singen!, der und diktieren! [him, singing!; him, laying down the law!], among others.

3. Corpus linguistics

The concept of corpus linguistics is inseparably associated with computational linguistics, which, according to Chantal Pérez & Antonio Moreno (2009: 68), is "an interdisciplinary scientific field, linked to linguistics and computer science, whose fundamental aim is to develop computational models that reproduce different aspects of human language and facilitate the computerised processing of languages". The arrival of new technologies has greatly enriched the overall view of linguistic phenomena. The many applications of corpus linguistics include the study of word frequency around a given semantic field, the development of linguistic models (generalised phrase structure grammar) and the description of different levels of language (syntax, semantics, pragmatics). Also worth mentioning are speech technologies (speech recognition, speech synthesis) and machine or computer-assisted translation, among others. Corpus linguistics can likewise be applied to phraseology when the focus is on the collocational habits of words, that is, on the frequency of co-occurrence of several words. Frequencies can be extracted and sorted alphabetically, as can statistical indices for the words that appear to the right and left of the node, among other things.
Without leaving aside the computational strand, it is fair to mention the work of Sinclair (1991: 170),² for whom the frequency of co-occurrence of the units that make up a collocation or word combination is essential, in keeping with the idiom principle. According to this principle, speakers tend to use some words more frequently than others, which gives rise to "semi-prefabricated" lexical combinations. Lexical co-occurrences can also account for the collocational profile, colligational patterns and semantic preferences. In lexicography, electronic corpora have become practically indispensable because they provide pragmatically relevant information (Sánchez & Almela, 2010: 5): the meaning of a word is its use in a context, an idea that Wittgenstein formulated more than six decades ago in his Philosophische Untersuchungen (1953): "Die Bedeutung eines Wortes ist sein Gebrauch in der Sprache".

² Sinclair refers to the node, the collocates and the span (stretch of text) to clarify what is meant by collocation; the distance between two or more words is especially relevant.

4. Analysis of the CU with the structure [UNTER + S]/[ENTRE + S]. The case of unter Tränen and entre lágrimas

Based on searches in the DeTenTen and EnTenTen corpora in Sketch Engine, we aim to show the co-textual combinatorics of the usual combination entre lágrimas and its German equivalent unter Tränen. Our search takes into account whether or not the CU appears at the beginning of the sentence. We have observed that the German CU is more frequent than the Spanish one, which is related to syntactic features intrinsic to each language (unter Tränen, 10,299 occurrences, versus entre lágrimas, 710 occurrences).
As noted above, our project addresses both external fixation [X entre lágrimas X] and internal fixation [entre X lágrimas], whose slot X is usually filled by adjectives. The high frequency of certain verb or noun collocates can give rise to syntactic patterns or constructions with a more or less idiomatic meaning (Phraseoschablonen). Of the five senses that the DRAE lists for the lexeme lágrima, the first is the one relevant to our study: (1) "Cada una de las gotas que segrega la glándula lagrimal (usado más en plural)" [each of the drops secreted by the lacrimal gland (used mostly in the plural)]. The Digitales Wörterbuch der deutschen Sprache (DWDS), for its part, records one sense for the lexeme Träne: (1) "von den Tränendrüsen im Auge abgesonderte, klare Flüssigkeit". The preposition unter in the CU unter Schmerzen designates a "Begleitumstand" (Helbig & Buscha, 1996: 439), that is, a circumstance (modal value) that accompanies the main action, as the following examples from Helbig & Buscha (1996: 439) illustrate:

(1) Unter großem Beifall wurde der Redner vorgestellt.
(2) Unter Jubel und Gelächter fiel der Vorhang.

In this regard, the publication by Tibor Kiss (2014) deserves mention: it examines prepositions in detail from different perspectives (syntactic, pragmatic and semantic). As regards unter, of the eleven senses the authors present, the third is the one that best fits the meaning of the preposition in the CU unter Tränen (3b. Begleitumstand/2 Vorgänge):

Begleitumstände umfassen sowohl äußere Begleitumstände (wie Beifall) als auch Gefühle und Gemütszustände, die eine Handlung begleiten. Die Bedeutung tritt nur bei Modifikation von Ereignissen, Handlungen oder Zuständen auf. Die Lesart wird häufig durch unbelebte oder fehlende Agenten hervorgerufen.
In der Semantik sind zwei identifizierbare Vorgänge angelegt, von denen der eine den anderen begleitet. Bei dem zweiten Vorgang handelt es sich um die Umstände der Handlung, die oftmals nicht intentional verursacht werden (Kiss et al., 2014: 188).

As for Spanish, the preposition entre conveys a locative meaning; etymologically it derives from inter 'inside something discontinuous' (Martínez García, 2012: 25), which in turn comes from the prepositional phrase intro usque ('right into'). In Classical Latin the accusative preposition inter expressed both a locative value (inter multitudinem 'in the midst of the crowd') and a temporal one (inter noctem 'during the night'). According to Cabezas Holgado (2013: 17), the lexical properties of the predicative head entre come down to two: locative value and collective value. A large group of CUs follows the schema [unter/entre + S(plural)], where S denotes feelings, bodily movements, cries or parts of discourse, for example entre protestas – unter Protesten; entre gemidos – unter Seufzern; entre lágrimas – unter Tränen; entre risas – unter Gelächter, etc. The nouns that fill the slot S belong semantically to the domain of verbal communication or bodily expression (Mellado Blanco & López Meirama, in press). In the CU entre lágrimas, the preposition entre expresses two values: temporal simultaneity and manner. A state unfolds in parallel with the action expressed by the main verb. That two events occur simultaneously follows from the semantic nature of the nouns accompanying the preposition, most of which are deverbal (entre llantos, sollozos, risas, etc.). Starting from the meaning of the noun, we can to some extent arrive at the actual meaning of the preposition:
(3) Unter Tränen sagt er in einem Interview: “Ich wollte immer nur ein ganz normaler Junge sein. Aber das Schicksal hat es anders gewollt. [http://casting.mattschiibe.ch/2009/06/]

As example (3) shows, the CU unter Tränen usually corresponds to a gerund in Spanish (unter Tränen sagt er – dice llorando). Nouns with nominalising suffixes that morphologically reveal their relation to verbs are included in the group of deverbal nouns even when they do not describe a process sensu stricto, e.g. llanto, lloriqueo, suspiro, aullido, etc. Similar examples are found in other Spanish CUs whose nouns are deverbal, such as entre sollozos, which, depending on the context, would be synonymous with sollozando. The nouns llanto, lágrima and suspiro thus semantically denote the unfolding of the event and syntactically reveal an argument structure. In example (3), the preposition unter ceases to be a mere functional link, since it maintains a close bond with the noun Träne.

4.1. External and internal fixation: [X entre lágrimas X] and [X unter Tränen X]; [entre X lágrimas X] and [unter X Tränen]

As regards external combinatorics or fixation, both CUs show a clear preference for verbal collocates that are verbs of communication or verba dicendi. Because the CU involves an emotional (stative) component, it is notable that entre lágrimas and unter Tränen co-occur with verbs that express social relations or emotions. Table 1 lists the 25 most frequent collocates, both nouns and verbs (the latter in various inflected forms), that appear to the left and right of the CU unter Tränen within a span of 5 words.
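The span-based extraction of collocates described above can be sketched as follows. This is only an illustration of counting co-occurring words within 5 positions of a multiword node; it is not how Sketch Engine computes its lists, it works on raw tokens rather than lemmas, and the function name `collocates_in_span` and the sample sentence are assumptions.

```python
from collections import Counter


def collocates_in_span(tokens, node, span=5):
    """Count the words found up to `span` positions left or right of the node.

    `node` is a list of words (e.g. ["unter", "Tränen"]); matching is
    case-insensitive. Hypothetical helper for illustration only.
    """
    node = [word.lower() for word in node]
    lowered = [word.lower() for word in tokens]
    size = len(node)
    counts = Counter()
    for i in range(len(lowered) - size + 1):
        if lowered[i:i + size] == node:
            left = lowered[max(0, i - span):i]
            right = lowered[i + size:i + size + span]
            counts.update(left + right)
    return counts


# Invented sentence, not taken from the corpus.
tokens = "Er gestand unter Tränen seine Schuld und bat unter Tränen um Verzeihung".split()
for word, freq in collocates_in_span(tokens, ["unter", "Tränen"]).most_common(3):
    print(word, freq)
```

Ranking the resulting counts by raw frequency favours very common words, which is why association measures are normally applied afterwards.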
The data suggest that the top-ranked verbs according to logDice, the most reliable statistical index from a lexicographic standpoint, are directive or exhortative verbs such as bitten and flehen, which have a high frequency of occurrence. In Spanish we have not found exhortative verbs equivalent to flehen, anflehen (suplicar, implorar). The combinatorics of the verbal collocates shows, among many other things, which contexts may predominate. As for the contexts of use of flehen and bitten in combination with unter Tränen, there is a clear tendency towards the elevated register typical of literary or religious contexts:

(4) Turribius erschrak sehr, als er davon hörte, er weigerte sich, diese Würde anzunehmen und flehte unter Tränen zu Gott, ihm diese Last weg zu nehmen. [http://www.heiligenlegenden.de/monate/maerz/23/turribius/home.html]

Table 1. Examples of collocates of the CU unter Tränen

Example (4) shows that the patient of the action is Gott (zu X flehen, with X = Gott / Herrn / Jesus).

Verba dicendi form an equally large group, especially German verbs such as beichten, berichten, sich entschuldigen, erzählen and gestehen. The last is the most significant of the group in terms of frequency of co-occurrence, and its meaning is akin to that of the CU unter Tränen, since admitting guilt (gestehen) involves a moment of great emotional charge (unter Tränen).

Table 2. Examples of collocates of the CU entre lágrimas

As Table 2 shows, the most representative cases are verba dicendi (repetir, confesar, pedir), verbs of social interaction (agradecer) and verbs of contact (abrazar).
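For readers unfamiliar with the logDice index mentioned above, the following sketch shows how it is commonly defined: 14 plus the base-2 logarithm of the Dice coefficient of node and collocate. The function name and the frequencies in the usage lines are invented for illustration and are not figures from Tables 1 or 2.

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice = 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy: co-occurrence frequency of node and collocate
    f_x:  corpus frequency of the node
    f_y:  corpus frequency of the collocate
    """
    if f_xy <= 0 or f_x <= 0 or f_y <= 0:
        raise ValueError("frequencies must be positive")
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Invented frequencies: the theoretical maximum of 14 is reached only when
# node and collocate always occur together.
print(log_dice(10, 10, 10))
print(round(log_dice(250, 10_000, 40_000), 2))
```

Because the score depends only on the three frequencies and not on corpus size, values can be compared across corpora of different sizes, which is convenient in a contrastive German-Spanish setting.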
In both languages there is a recurrent group of verbs expressing social relations, such as sich verabschieden and Abschied nehmen in German and despedirse or decir adiós in Spanish, which mostly describe the end of an activity in a sporting context:

(5) Dabei verabschiedete er sich unter Tränen von den fast 24.000 Zuschauern im Arthur Ashe Stadium in New York. [http://www.whoswho.de/bio/andre-agassi.html]
(6) pero que sirvió como perfecto colofón a la carrera de Joseba Etxeberría, quién se despidió de San Mamés entre lágrimas con más de 400 partidos como bagaje. [http://agendapolitica.es/deportes/elgetafe-regresa-a-europa-por-la-puerta-grande.html]

Spanish and German differ in the use of the verbal collocates abrazar and sich umarmen. In Spanish the verb frequently appears in the present tense, third person plural, with reciprocal meaning, whereas in German the past tense, likewise in the third person plural, is more prominent. In the corpus, the lemmas and their verb forms reveal different strategies for involving the interlocutor. In the present indicative, third person plural (sie umarmen sich, se abrazan), the verbal lemmas have a descriptive function.

Before turning to internal fixation, we briefly describe the rightward expansion of the CUs, [unter Tränen X] and [entre lágrimas X]. The lists of right-hand collocates contain interesting elements (unter Tränen des/der/und and entre lágrimas de/y). The expansion [unter Tränen NP(genitive)], which modifies the lexeme Träne, comprises nouns that mostly belong to the domain of emotions (Rührung, Mitleid). In the Spanish sequence [entre lágrimas NP(genitive)], we find nouns of varied semantic nature (rabia, tristeza, emoción).
The corpus analysis shows that some noun collocates in particular display a marked tendency towards rightward expansion: entre lágrimas y sollozos, entre lágrimas y abrazos, entre lágrimas y aplausos. In German the most prototypical nouns reinforce the emotional component of crying because they are synonyms of Tränen (unter Tränen und Bluttränen, unter Tränen und Schluchzen, unter Tränen und Seufzern). Hence some of these binomials show a certain degree of lexicalisation (Mellado Blanco, 2015).

(7) Wenn die Kinder dieser Welt meine Worte nicht annehmen und verachten – die ich unter Tränen und Bluttränen laut herausschreie – weiterhin den Vergnügungen und dem Müßiggang frönen, wird der Verfall wie ein Dieb in der Nacht hereinbrechen. [http://www.kommherrjesus.de/index.php]

We must also bear in mind that languages differ in their morphology. In German, where compounding is much more frequent than in Spanish, the equivalent of lágrimas de emoción or lágrimas de cocodrilo is a compound forming a single orthographic unit (Gefühlstränen, Krokodilstränen), whereas Spanish expands to the right with a noun complement ([lágrimas de N], with N = emoción / cocodrilo / alegría). The difference lies in the lexical morphology of each language: German prefers compounds written as one word (Lachtränen), while Spanish compound formations are not joined graphically (lágrimas de emoción). This is essential when working with corpora and consulting the frequencies of word groups, as we shall see below in relation to internal fixation.
In German, we have observed that the compound Krokodilstränen is used as a CU both with the preposition mit and with the preposition unter (unter Krokodilstränen, mit Krokodilstränen). Interestingly, in Spanish there are no cases of lágrimas de cocodrilo with the preposition entre, only with con. This CU with rightward expansion is frequently used in negative imperative sentences (no me vengas con lágrimas de cocodrilo) or in affirmative sentences:

(8) Lo que me parece patética es la actitud de algunos, que vienen con lágrimas de cocodrilo diciendo que se están cargando el ciclismo, sois vosotros los que os lo habéis cargado, pandilla de golfos y maleantes, iros a llorar con vuestra p... madre. [http://blogs.abc.es/]

The equivalent of lágrimas de risa is frequent in German with the preposition unter (unter Lachtränen), whereas Spanish favours the combination with con (con lágrimas de risa). In the corpus we found only one case of entre lágrimas de risa.

As regards internal fixation, [unter + X + Tränen] and [entre + X + lágrimas], the corpus analysis shows that internal expansion is lower in Spanish than in German; the tendency of the adjective to follow the noun favours sequences such as lágrimas amargas. At this point it is essential to stress the position of adjectives in the two languages. The attributive position of adjectives is much more firmly established in German than in Spanish, because in Spanish the meaning can change with the position of the adjective (un pobre hombre versus un hombre pobre). Seco (1975: 5) notes that "the literal sense is always preserved in the postposed adjective, whereas that of the preposed adjective is more or less distorted". This brief digression helps explain why internal fixation in sequences of the type [PREP + X + S] is, as a rule, greater in German than in Spanish.
One yardstick for the degree of fixation of CUs in discourse is the behaviour of such sequences with respect to internal fixation. The higher the level of saturation of the internal slots, the lower the fixation and hence the lower the level of lexicalisation. An example is de repente, whose internal slot X [de X repente] is, according to the corpora consulted, practically never filled. Among the most recurrent adjectival slot fillers in German are adjectives that stress the intensity of weeping (unter vielen Tränen), or that establish a cognitive link with the concept of water (unter strömenden Tränen) or with taste sensations (unter bitteren Tränen). An example similar to unter Tränen und Bluttränen, mentioned above, again intensifies the act of crying (unter blutigen Tränen); it occurs mostly in literary contexts and takes on a clearly figurative meaning:

(9) Die Wirtin wurde immer aufmerksamer, als er aber daran kam, wie er die schöne Jungfrau aus dem Loche erlöst und sich mit ihr verlobt hatte, da schloss sie ihn in ihre Arme und rief unter blutigen Tränen: [http://internet-maerchen.de/]

The CU unter heißen Tränen occurs in elevated registers together with verbs in past tenses and, like unter blutigen Tränen, involves a metaphorical transfer:

Table 3. Examples of internal slots of the CU unter X Tränen

Most of the internal slot fillers of the CU unter X Tränen intensify the lexeme Träne (unter heißen/bitteren/strömenden/heftigen Tränen). In Spanish, the internal (adjectival) slots of the CU entre X lágrimas show minimal saturation.
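The relation between slot saturation and fixation stated above can be made concrete with a simple ratio. The operationalisation below (share of occurrences in which the slot X is filled) is an assumption introduced for illustration, not a measure defined in the study, and the counts are invented rather than taken from Table 3.

```python
def slot_saturation(filler_counts):
    """Share of occurrences of [PREP + X + S] in which the slot X is filled.

    `filler_counts` maps each filler to its frequency; the empty string
    stands for the bare combination with no filler. Hypothetical measure:
    0.0 means the slot is never filled (high fixation), values near 1.0
    mean it is almost always filled (low fixation).
    """
    total = sum(filler_counts.values())
    if total == 0:
        return 0.0
    filled = total - filler_counts.get("", 0)
    return filled / total


# Invented counts for illustration only.
german = {"": 8, "bitteren": 1, "heißen": 1}   # unter X Tränen
spanish = {"": 10}                             # entre X lágrimas
print(slot_saturation(german))
print(slot_saturation(spanish))
```

On this reading, a combination such as de repente, whose slot is practically never filled, would score close to 0.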
This phenomenon is explained by the equivalence of entre lágrimas with the gerund llorando, which is why hardly any internal slots with modal and temporal value are found in the corpus. When a determiner is inserted (las, mis, tus), the preposition expresses a local or divisive meaning:

(10) Pero el dios sigue susurrando entre las lágrimas. [http://avisos.realbiblioteca.es/?p=article&aviso=43&art=879&lang=es]

As example (10) shows, determiners play a key role in cancelling the idiomatic or phraseological meaning of entre lágrimas. The modal and temporal values give way to the local or divisive value, or to verbs that govern a prepositional complement (escoger entre las lágrimas o la risa). When there are several slots, Spanish shows nouns joined by a conjunction (entre sonrisas y lágrimas, entre aplausos y lágrimas, entre risas y lágrimas, entre un mar de lágrimas). In many cases there is an antithesis or contrast between the two nouns (sonrisa y lágrima, risa y lágrima), which mitigates the negative effect or feeling of weeping (lágrima).

5. Final reflections

Corpora give linguists the chance to work with large volumes of data that are embedded in real discursive contexts and that differ from data drawn from the linguist's intuitive and introspective judgement. That said, working with corpora is not without difficulties. One limitation of our study is that the texts in the corpus we used come mostly from journalistic language. Another has been the difficulty of obtaining usage information at the supra-sentential level, which makes it hard to identify aspects of the context that provide valuable information for the immediate co-text of the CUs under study.
Even so, linguistic corpora clearly provide revealing (statistical) data based on empirical evidence and support a more exhaustive analysis in the field of phraseology. As regards the external combinatorics or fixation of the CUs entre lágrimas and unter Tränen, there is a clear preference for verbal collocates that are verba dicendi and verbs of contact. In Spanish, entre lágrimas is closely tied to llorando, the gerund of the verb llorar, while German makes up for the lack of such a form by expressing the modal value. Also striking is the strong emotional and expressive component of the pragmatic meaning of the binomials entre lágrimas y sollozos and unter Tränen und Bluttränen. Finally, as regards the slots (X) in internal fixation, [entre X lágrimas] and [unter X Tränen], the tendency of the Spanish adjective towards postnominal position entails a lower level of saturation of slot X than in German (unter bitteren Tränen – entre lágrimas amargas).

6. References

Burger, Harald. 2010. Phraseologie: eine Einführung am Beispiel des Deutschen. Berlin: Erich Schmidt Verlag.
Cabezas Holgado, Emilio. 2013. La predicación: Las construcciones en abanico. Aplicaciones al español. http://eprints.ucm.es/22365/1/T34644.pdf [Accessed 27/03/2017].
Dobrovol’skij, Dmitrij. 2011. Phraseologie und Konstruktionsgrammatik. In Lasch, Alexander & Ziem, Alexander (ed.) Konstruktionsgrammatik III. Aktuelle Fragen und Lösungsansätze. Tübingen: Stauffenburg, 111-130.
Fleischer, Wolfgang. 1997. Phraseologie der deutschen Gegenwartssprache. Tübingen: De Gruyter.
Häusermann, Jürg. 1977. Phraseologie. Hauptprobleme der deutschen Phraseologie auf der Basis sowjetischer Forschungsergebnisse. (=Linguistische Arbeiten 47). Tübingen: Max Niemeyer.
Helbig, Gerhard & Buscha, Joachim. 1996. Deutsche Grammatik. Ein Handbuch für den Ausländerunterricht. Leipzig: Langenscheidt Verlag Enzyklopädie.
Kiss, Tibor; Müller, Antje; Roch, Claudia; Stadtfeld, Tobias; Börner, Katharina & Duzyein, Monika (ed.). 2014. Handbuch für die Bestimmung und Annotation von Präpositionsbedeutungen im Deutschen (Bochumer Linguistische Arbeiten 14). Bochum: Sprachwissenschaftliches Institut, Ruhr-Universität Bochum.
Martínez García, Hortensia. 2012. Viejos y nuevos valores de las preposiciones españolas. Verba 39: 7-34.
Mellado Blanco, Carmen. 2015. Phrasem-Konstruktionen und lexikalische Idiom Varianten: der Fall der komparativen Phraseme des Deutschen. In Engelberg, Stefan; Meliss, Meike; Proost, Kristel & Winkler, Edeltraud (ed.) Argumentstruktur – Valenz – Konstruktionen. Tübingen: Narr, 217-235.
Mellado Blanco, Carmen & López Meirama, Belén. In press. Esquemas sintácticos de preposición + sustantivo: el caso de [entre + Splural/corporal]. In Mellado Blanco, Carmen; Berty, Katrin & Olza, Inés (ed.) Discurso repetido y fraseología textual (español y español-alemán). Frankfurt am Main: Vervuert / Iberoamericana.
Pérez Hernández, Chantal & Moreno Ortiz, Antonio. 2009. Lingüística Computacional y Lingüística de Corpus. In Rodríguez Ortega, Nuria (ed.) Teoría y literatura artística en la sociedad digital: construcción y aplicabilidad de colecciones textuales informatizadas. Gijón: Trea, 67-96.
Sánchez, Aquilino & Almela, Moisés. 2010. A Mosaic of Corpus Linguistics. Frankfurt am Main: Peter Lang.
Seco, Manuel. 1975. Manual de gramática española. Madrid: Alfaguara.
Sinclair, John. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
Steyer, Kathrin. 2013. Usuelle Wortverbindungen. Zentrale Muster des Sprachgebrauchs aus korpusanalytischer Sicht. Tübingen: Narr.
Wittgenstein, Ludwig. 1953. Philosophische Untersuchungen. Oxford: Blackwell.
ojs.uv.es/index.php/qfilologia/index

Assessing EPAP lexical features: A corpus-based study
Análisis de los rasgos léxicos de IFE: un estudio de corpus

María José Marín (a) & Camino Rea Rizzo (b)
(a) Universidad de Murcia. [email protected]
(b) Universidad Politécnica de Cartagena. [email protected]

Received: 10/02/2017. Accepted: 12/09/2017

Resumen: Las características de los lenguajes de especialidad se han descrito profusamente en la literatura especializada. El trabajo de Enrique Alcaraz destaca entre otros por su exhaustiva y minuciosa descripción del IFE a todos los niveles: léxico, sintáctico, semántico y pragmático. Este estudio tiene como finalidad la constatación de dicha descripción desde una perspectiva basada en análisis de dos corpus de inglés jurídico y de telecomunicaciones. Los resultados obtenidos corroboran lo ya observado por Alcaraz en lo que se refiere al uso de los términos especializados, la relevancia del vocabulario subtécnico, las peculiaridades de los términos latinos en el inglés jurídico y la significativa presencia de las abreviaturas en el inglés de telecomunicaciones.

Palabras clave: IFE; inglés jurídico; inglés de telecomunicaciones; lingüística del corpus.

Abstract: The features of specialised languages have been extensively described by scholars in the literature. Amongst them, Enrique Alcaraz’s work stands out as an exhaustive and comprehensive description of EPAP at all linguistic levels: lexical, syntactic, semantic and pragmatic. This research aims to provide a bottom-up assessment of his description on a lexical level through the implementation of corpus-based techniques on two specialised corpora of legal and telecommunications English. The results support Alcaraz’s portrayal as regards term usage, the relevance of sub-technical vocabulary, the peculiarities of Latin single and multi-word terms in legal English and the significant presence and usage of abbreviations in telecommunications English.
Keywords: EPAP; ESP; Corpus Linguistics; Legal English; Telecommunications English.

Marín, María José & Rea Rizzo, Camino. 2017. “Assessing EPAP lexical features: A corpus-based study”. Quaderns de Filologia: Estudis Lingüístics 22: 165-186. doi: 10.7203/qf.22.11307

1. Introduction

Specialised languages have been traditionally deemed functional varieties or registers (Biber, 1988; Halliday, 1988) defined in terms of the variation of the recurrence of particular linguistic features in comparison to general language or other registers. Cabré (1993) considers special languages a set of sub-codes from general language which are characterised by their own special features and pragmatically determined by the variables of topic, user and communication act. Focusing on the definition of the language of science and technology, Sager et al. (1980) provide a comprehensive description of specialised languages. Their definition, use and function are synthesised as follows: “Special languages are semi-autonomous, complex systems based on and derived from general languages; their use presupposes special education and is restricted to communication among specialists in the same or closely related fields” (1980: 69). Similarly, Tiersma (1999) asserts that law practitioners depend upon language in their profession. According to this author, the special features of their jargon undeniably reveal their membership of the same community. Alcaraz’s (2000) definition is in line with all of the above, as he states that so-called special languages refer to the specific language that professionals and specialists use in order to transmit information and negotiate terms, concepts and knowledge in a particular field of knowledge.
In El inglés profesional y académico (2000), Alcaraz describes the most relevant features of English for Professional and Academic Purposes (EPAP), a term that he coins to refer to the specialised language which professionals and specialists employ to communicate. EPAP embraces many different branches or varieties associated with different professional or scientific fields such as medicine, law, engineering or business, amongst many others. This research was conceived as an appraisal of Alcaraz’s fundamental work through the analysis of the lexicon of two specialised corpora, TC (Telematics Corpus: 1.2 million words) (Rea, 2008) and UKSCC (United Kingdom Supreme Court Corpus: 2.6 million words) (Marín, 2014; Marín & Rea, 2012a), in search of linguistic evidence supporting some of the most relevant characteristics which scholars (Mellinkoff, 1963; Tiersma, 1999; Sager et al., 1980; Alcaraz, 2000, 2002) have portrayed in the literature. The reasons to single out such differing EPAP varieties as legal and telecommunications English were related to the major objective of this research, that is, attempting to provide a bottom-up characterisation of specialised lexicons based on the general portrayals provided by scholars in the literature, specifically, Enrique Alcaraz’s (2000; 2002). In principle, one would expect legal and telecommunications English terminology to differ considerably owing to their very nature and origins, the former belonging to the field of humanities and social sciences and having Latin and French influence (often being archaic and redundant) (Mellinkoff, 1963; Tiersma, 1999; Alcaraz, 2000, 2002), the latter coming from the realm of engineering and science and being highly specific and accurate.
However, with regard to the statistical data associated with these lexical units, our main hypothesis was that both technical and subtechnical terms would behave similarly in both EPAP varieties, confirming the general descriptions made by scholars. Owing to the size of both corpora and, above all, to our wish to carry out a fully automatic analysis with the aim of processing as much data as possible, only some of the features described by Alcaraz in his work were considered in this appraisal, namely: the ratio and distribution of highly specialised terms in both corpora; the relevance of subtechnical vocabulary; the use of Latin words and phrases in the legal corpus; and the presence and significance of abbreviations and acronyms in the telecommunications corpus.

2. Literature review

Following from the above, this research concentrates on four major lexical features of EPAP which have been assessed by applying a bottom-up corpus-based methodology, that is, by observing the statistical behaviour of the lexicon found in two specialised corpora, TC and UKSCC. The literature devoted to the study of such features highlights the usage of specialised terminology as one of the most noteworthy aspects of EPAP as regards both its frequency of use and its distribution across text collections. Specialised terms could be defined as conceptual vehicles which are employed to transmit specialised knowledge amongst scientists, researchers or professionals in all specialised areas, hence their relevance in EPAP. As Cabré (2000: 62) puts it, terms are “form and content units which, used in different discursive conditions, acquire a specialised value”. According to Alcaraz (2000), terms tend to be univocal and their understanding is key to a proper comprehension of specialised texts, both oral and written.
In other words, terms encapsulate specific concepts and must be understood and mastered by specialists, otherwise communication will fail.

Still within the lexical level, Alcaraz (2000) underlines the significance of semitechnical or subtechnical vocabulary as another relevant feature of EPAP. Subtechnical vocabulary is defined as those lexical units present in general language which acquire one or several specific meanings within a field of knowledge (Alcaraz, 2000: 43). In addition, subtechnical vocabulary is also understood as a collection of general words which are shared by both the general and the specialised fields without changing their meaning. Numerous authors have approached this question and defined subtechnical terms from different angles (Cowan, 1974; Baker, 1988; Flowerdew, 2001; Chung & Nation, 2003; Wang & Nation, 2004), agreeing on their ambiguous character and the difficulties that they cause EPAP learners due to such obscurity. For the concept to be clearly delimited, Marín (2016) attempts to define it taking into consideration both qualitative and quantitative criteria.

Another relevant feature of EPAP, specifically of legal English, is the strong influence of Latin on its terminology, something that does not happen in telecommunications English. Although common law bears almost no resemblance to Roman law (on which civil/continental law systems are based), the presence of Latin in its terminology is more than merely anecdotal. Alcaraz (2000: 78) distinguishes between purely Latin borrowings like obiter dictum or ratio decidendi, which were imported directly from Latin without being adapted into English, and cognates such as exonerate or presumption, which reflect English orthography although their meaning and form remain closely linked to their etymological origin. In the present research we will concentrate on the former and attempt to support these observations with evidence obtained from UKSCC, our legal corpus.
Within EPAP, the area of science, technology and computing is also characterised by the constant creation of new lexical units by using the linguistic resources of the corresponding language (Alcaraz, 2000: 50). The creation of new words responds to the need for the unique naming of concepts. According to Sager et al. (1980) and Alcaraz (2000), the principal method of designation in general and even more so in special reference is the modification of existing resources by means of concatenative processes, which follow the principle of adding some morphological material to a given form, namely, derivation and compounding. Nevertheless, there are also word formation processes that do not follow the principle of concatenation, so that new items are formed by deleting linguistic material instead of adding it. Amongst them, abbreviation refers to any kind of word which has undergone a shortening process, that is, any compressed form in general. Abbreviation is an umbrella term which covers initials (also called initialisms), acronyms (also called letter words) and clippings (Sager et al., 1980; Alcaraz, 2000; Plag et al., 2007).

Despite the relevance of the features depicted above, to the best of our knowledge, there are no corpus-based studies which can contribute to a bottom-up characterisation of specialised lexicons, apart from the ones carried out by Marín (2014; 2016), Rea (2008) and Marín & Rea (2012a; 2012b; 2014), hence the need to develop further research along these lines.

3. Methodology, results and discussion

Alcaraz’s (2000) work presents a comprehensive portrayal of the major features of EPAP, which comprises the lexical, semantic, syntactic and pragmatic levels of the language and focuses on the use of specialised terminology, the features of major phrase types, the presence of polysemic words and metaphors in specialised texts or the communicative dimension of these texts, amongst many others.
For practical reasons and given the fact that this study was intended to be carried out automatically, a selection of these features was made so as to concentrate on the lexicon of legal and telecommunications English applying corpus linguistics techniques, that is, adopting a bottom-up perspective for the analysis of the two corpora. The selected features are: the use of specialised terminology in EPAP; the relevance of subtechnical terms; the significance of Latin terms and phrases in legal English; and the use of compressed forms or abbreviations as a result of word formation processes in telecommunications English.

3.1. Specialised terminology in TC and UKSCC

Specialised terms in large text collections like UKSCC or TC can be mined automatically using Automatic Term Recognition (ATR) methods. There is a whole plethora of them, some of which were validated on both corpora (Marín & Rea, 2014). The methods selected for evaluation were: TF-IDF (term frequency-inverse document frequency) (Sparck Jones, 1972); TermoStat (Drouin, 2003); C-Value (Frantzi & Ananiadou, 1999) and Terminus (Nazar & Cabré, 2012). They were assessed by comparing the output lists of candidate terms produced by each method with two specialised glossaries of legal and telecommunications terms [1]. The overlap percentage between both vocabulary inventories showed the precision levels achieved by each of the methods and therefore led to a selection of the most efficient one. Out of the four methods tested by Marín & Rea (2014), Terminus (Nazar & Cabré, 2012) excelled in comparison with the other three, managing to extract 71.5% true terms (terms which coincided with the ones in the glossaries used as gold standard) from UKSCC and 60% from TC on average. Precision was even higher for the top 200 candidate terms, reaching 84.5% for the former corpus and 69.5% for the latter.
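The validation step described above, in which a method's candidate list is compared with a gold-standard glossary, can be sketched as follows. This is a minimal illustration rather than the procedure actually run in Marín & Rea (2014): the function name `precision`, the case-insensitive exact matching and the toy word lists are all assumptions made for the example.

```python
def precision(candidates, glossary, k=None):
    """Percentage of the (top-k) candidate terms that also appear in the glossary.

    candidates: candidate terms ranked by an ATR method, best first.
    glossary:   gold-standard terms (hypothetical here).
    k:          if given, only the first k candidates are evaluated.
    """
    ranked = [c.lower() for c in candidates]
    if k is not None:
        ranked = ranked[:k]
    if not ranked:
        return 0.0
    gold = {g.lower() for g in glossary}
    hits = sum(1 for c in ranked if c in gold)
    return 100.0 * hits / len(ranked)


# Toy data invented for illustration only.
candidates = ["appellant", "tort", "the court", "estoppel", "yesterday"]
glossary = ["Appellant", "Tort", "Estoppel", "Injunction"]

overall = precision(candidates, glossary)      # all five candidates
top2 = precision(candidates, glossary, k=2)    # two best-ranked candidates only
print(overall, top2)
```

Restricting the calculation to the first k candidates corresponds to the kind of top-200 precision figure reported above.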
As regards the legal corpus, implementing Terminus as the selected ATR method, a list of 1,787 terms was obtained, which represented 6.6% of the total 27,060 types identified by WordSmith 5.0 (Scott, 2008) [2]. These terms displayed an average frequency of 1,037 (that is, each of them occurs 1,037 times in the corpus on average) and appeared in 27 texts on average (out of 193). If compared with the average distribution of all the word types in UKSCC (19.8), excluding hapax and dis legomena [3], the distribution of the specialised terms extracted by Terminus could be deemed considerably high, actually, almost twice as high as that of all the types in the corpus. Not only were legal terms well distributed, but their frequency was also much higher than the average frequency of all the word types, occurring on 1,037 occasions as opposed to the average value of such types, 169.45, 6 times lower than the former (again, hapax and dis legomena were excluded from this count).

[1] The automatic validation of the lists was performed by resorting to two specialised electronic glossaries of legal English (of 10,054 terms) and telecommunications English (of 5,102).
[2] The term type refers to each of the words present in a corpus without counting their repetitions. Each of these repetitions would be labelled as a token.
[3] Those types which occur once or twice respectively in the corpus.

The number of terms identified in the telematics corpus by Terminus was smaller, 888 terms out of 25,774 types, which represented 3.44% of the whole list. Their frequency counts were also lower than the same value in the legal corpus, since the terms identified in TC occurred on 38.62 occasions on average, whereas the mean frequency of the whole type list was almost three times as high, that is, 89.93.
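The type-level figures used in this comparison (number of types and tokens, mean frequency and mean number of texts per type once hapax and dis legomena are set aside) can be illustrated with a short sketch. It is not the WordSmith procedure itself: the function name `type_statistics`, the lower-casing of tokens and the three toy texts are assumptions introduced for the example.

```python
from collections import Counter


def type_statistics(texts, min_freq=3):
    """Summarise a corpus given as a list of token lists (one list per text).

    Types occurring fewer than min_freq times are left out of the averages;
    min_freq=3 therefore excludes hapax and dis legomena.
    """
    freq = Counter()        # corpus frequency of each type
    text_range = Counter()  # number of texts in which each type occurs
    for tokens in texts:
        lowered = [t.lower() for t in tokens]
        freq.update(lowered)
        text_range.update(set(lowered))
    kept = [w for w, f in freq.items() if f >= min_freq]
    mean_freq = sum(freq[w] for w in kept) / len(kept) if kept else 0.0
    mean_range = sum(text_range[w] for w in kept) / len(kept) if kept else 0.0
    return {
        "types": len(freq),
        "tokens": sum(freq.values()),
        "kept_types": len(kept),
        "mean_frequency": mean_freq,
        "mean_range": mean_range,
    }


# Toy corpus of three very short "texts", invented for illustration.
texts = [
    ["the", "court", "held", "the", "appeal"],
    ["the", "appeal", "court", "court"],
    ["appeal", "dismissed"],
]
stats = type_statistics(texts)
print(stats)
```

The same function applied to the subset of types recognised as terms would give the term-level averages that are set against the whole type list.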
The telematics terms were nevertheless well distributed in the corpus, being present in 30.14 texts on average out of 272 (the whole text collection), as opposed to an average of 14.59 for the whole type list. Judging by these figures, although term frequency counts were not as high in the telematics corpus as they were in the legal one, it could be affirmed that Alcaraz’s observation about the significance of the use of terminology in specialised texts was confirmed from a bottom-up perspective with regard to both the frequency and distribution of legal and telematics terms.

3.2. Quantifying the relevance of subtechnical terms in TC and UKSCC

Concerning UKSCC and TC, the presence of subtechnical vocabulary was measured using Heatley & Nation’s (2002) software Range. This software allows the user to obtain the percentage of running words in a text or text collection covered by a given word list which is included in the software package. Both term lists obtained from our corpora were processed using the British National Corpus (BNC) list of the 3,000 most frequent words of English as the base list for comparison. The resulting percentage would reflect the proportion of specialised terms from our lists which could be found amongst the 3,000 most frequent words of English, comprising words like father, bank, the or water, amongst many others. Such overlap would signal the percentage of subtechnical words present in both corpora, given the fact that they were identified as specialised terms by Terminus, validated as such against a specialised glossary and also found as general vocabulary amongst the 3,000 most frequent words of English. The overlap percentages varied in both cases, legal English being the variety which presented a higher number of terms coinciding with the general English vocabulary from the BNC.
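The kind of measurement that Range provides, the percentage of running words covered by a base word list, can be approximated in a few lines. The sketch below is not Range itself and does not reproduce its word-family handling: the function name `coverage`, the exact-match comparison and the five-word base list are assumptions for illustration. Feeding a term list in as the input gives the share of terms that also belong to the base list.

```python
def coverage(tokens, word_list):
    """Percentage of the running words in `tokens` that belong to `word_list`."""
    base = {w.lower() for w in word_list}
    if not tokens:
        return 0.0
    covered = sum(1 for t in tokens if t.lower() in base)
    return 100.0 * covered / len(tokens)


# Hypothetical stand-in for a general-English base list.
base_list = ["the", "action", "claim", "water", "bank"]

# Case 1: a running text, token by token.
running_text = "the claim failed".split()
text_coverage = coverage(running_text, base_list)

# Case 2: a list of extracted terms treated as the input, as in this section.
term_list = ["action", "claim", "estoppel", "tort"]
term_overlap = coverage(term_list, base_list)

print(round(text_coverage, 2), term_overlap)
```

In the second case the covered items (here action and claim) are the candidates for the subtechnical label, since they are both extracted terms and high-frequency general words.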
In the legal corpus, 47.35% of the terms identified by Terminus were also present in the list of the 3,000 most frequent words of English. Such frequent words as action, claim, decision or criminal were included in our term inventory. Apart from their high frequency counts in the general field, they could be labelled as subtechnical owing to the fact that they acquire a technical meaning when in contact with the legal context. About one third (35.55%) of the terms mined from TC, our telematics corpus, could also be found in the BNC list. The words processor or controller, which have a specialised meaning in both the general and the telematics fields, were found amongst that third. Other terms like backbone, also in the list of subtechnical telematics terms, specialise in the technical environment, referring not to the human spine but rather to a local computer network. Table 1 below illustrates the top 25 subtechnical terms obtained from both corpora.

| SUBTECHNICAL LEGAL TERMS | SUBTECHNICAL TELEMATIC TERMS |
| Ability | Application |
| Absolute | Backbone |
| Acceptable | Bit |
| Action | Box |
| Admit | Call |
| Complaint | Client |
| Consistent | Controller |
| Creditor | Depend |
| Criminal | Entry |
| Damages | File |
| Debt | Logic |
| Employ | Mapping |
| Evidence | Model |
| Excuse | Neighbor |
| Exercise | Noise |
| Expense | Object |
| Fact | Operate |
| Form | Packet |
| Privilege | Path |
| Proof | Programme |
| Reveal | Refer |
| Signature | Resource |
| Suicide | Route |
| Suspend | Server |
| Terminate | Site |

Table 1. Top 25 subtechnical terms obtained from UKSCC and TC

Once more, having adopted a bottom-up perspective, Alcaraz’s observation about the relevance of subtechnical vocabulary in specialised English has been corroborated by corpus evidence.

3.3. Latin terms in UKSCC: a corpus-based assessment

This section presents the study of legal terms which are employed in legal English without being adapted to the English orthographic or phonetic system, that is, they are pure Latin borrowings, as defined by Alcaraz (2000: 78). These must be distinguished from cognates, which are adapted to the English language system although their meaning and form still remain close to their etymological origin. The data and discussion offered below revisit and update the study by Marín & Rea (2012b).

As a preliminary step, a list of Latin terms was obtained from textbooks and academic books [4], which acted as reference for the identification of these lexical units in UKSCC. Such identification was carried out using an Excel spreadsheet to automatically compare the type list produced by WordSmith (Scott, 2008) with the Latin term list obtained from the books cited in note 4. Once single-word Latin units were extracted (187 in total), it was attested that the 10 most frequent ones were mostly function words, as is the case in general English, namely: versus (v), per, de, inter or re. There were other forms which, owing to their similarity with English, were excluded from these considerations (e.g. in, sub or ex), since they might produce misleading results. However, if compared with the whole UKSCC type list, their frequency was considerably low, standing between the 400th and 1,800th positions of the frequency rank. As a matter of fact, only 17 of these single-word Latin terms fell within the top 2,000 word types identified by WordSmith. Other Latin terms within this frequency range were affidavit, quantum, jure or incapax.

[4] See Mellinkoff, 1963; Alcaraz, 1994; Borja, 2000 and Orts, 2006 for academic references on Latin vocabulary in legal English, and Fernández, 1994; Rice, 2007; Krois-Lindner & Firth, 2008; Frost, 2009; Callanan, 2010 and Orts, 2010 for textbook references.
Text range was also considered in this study as an indicator of a term’s representativeness. Nation (2001) affirms that the higher this value for a given word is in a corpus, the greater its relevance within that corpus. The concept of text range refers to the percentage of running words in a text covered by that term or word list. For the sake of comparison, a sample list of 35 crime nouns (also regarded as specialised terms) was mined from the list of word types, confirming the low frequency counts associated with Latin terms. Nevertheless, as regards text range, the figures varied, showing that the 187 Latin term list covered 0.0059% of the words in UKSCC, whereas crime nouns covered only 0.00095%, almost six times less. Therefore, it could be stated that Latin terms, although not excessively frequent, present higher text coverage values than other specialised terms like murder, abduction, threats or battery, always bearing in mind that Latin terms only represent 0.69% of the total types identified in the corpus.

In a similar fashion, keyness was computed with the aim of determining the level of representativeness of Latin single-word units within the legal text collection. According to Scott (2008: 184), “a word is considered key if it is unusually frequent (or unusually infrequent) in comparison with what one would expect on the basis of the larger wordlists”. Keyness can be calculated automatically by comparison with a general English corpus using WordSmith. Resorting again to the list of crime nouns used as reference for comparison with our Latin word inventory, the results showed that, in spite of the lower frequency of Latin terms, they could be considered as relevant as crime nouns, standing only three points below the latter and displaying a keyness of 94.3. This value is also considerable when set against the average keyness of the whole list produced by WordSmith, namely, 116.08.
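To make the notion of keyness more concrete, the sketch below computes a log-likelihood score from a word's frequency in a study corpus and in a reference corpus. It rests on assumptions: it uses the two-term log-likelihood formula that is common in keyword analysis, and WordSmith's own implementation and settings may differ; the function name and the sign convention (negative for words that are relatively less frequent in the study corpus) are choices made for this example, and the figures are invented.

```python
import math


def log_likelihood_keyness(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of a word (illustrative formula, see lead-in).

    freq_study, freq_ref: occurrences of the word in each corpus.
    size_study, size_ref: number of tokens in each corpus.
    """
    total = freq_study + freq_ref
    if total == 0:
        return 0.0
    # Frequencies expected if the word were equally likely in both corpora.
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    if freq_study > 0:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    ll *= 2
    # Assumed convention: negative score for unusually infrequent words.
    if freq_study / size_study < freq_ref / size_ref:
        ll = -ll
    return ll


# Invented figures: same relative frequency, over-use and under-use.
same = log_likelihood_keyness(10, 1000, 20, 2000)
over = log_likelihood_keyness(50, 1000, 5, 2000)
under = log_likelihood_keyness(5, 1000, 50, 1000)
print(same, over, under)
```

A word with the same relative frequency in both corpora scores zero, which is why keyness, unlike raw frequency, singles out what is unusual about the specialised corpus.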
Finally, the level of specialisation of these terminological units was also measured in an attempt to substantiate Alcaraz’s observation on their relevance in legal English. In this case, Chung’s (2003) ratio ATR method was applied to rank the Latin terms according to their degree of specificity. Chung’s method is based on corpus comparison, classifying a word type as a term only “if it occurs 50 times more often in the technical text than in the comparison corpus, or if it only occurs in the comparison corpus” (2003: 53). This termhood ratio can be easily calculated by first dividing a word’s frequency of occurrence in each corpus by the number of tokens in that corpus, and then dividing the result obtained for the specialised corpus by the result obtained for a general one [5]. The value obtained should be above 50 for a word type to be regarded as a specialised term.

[5] The general English corpus used in this case was LACELL, a 20-million-word corpus of general English texts compiled and owned by the LACELL research group from the English Department at the University of Murcia, Spain.

The Latin terms in our list were therefore arranged and filtered according to Chung’s method, which resulted in an inventory which included terms such as affidavit, caveat, proviso, extempore, quantum, lex or subpoena. Nevertheless, most of these forms either belong to general or academic vocabulary and could therefore not be regarded as legal terms proper (for instance plus, nil, persona, memorandum, caveat or alibi), or simply do not occur in isolation but rather as part of phrases. This is why the study on their specificity level was extended to Latin phrases, displayed in table 2.

| TYPE | FREQUENCY UKSCC | DISTRIBUTION | RATIO |
| Ex turpi causa | 129 | 3 | ∞ |
| Doli incapax | 36 | 1 | ∞ |
| Quantum meruit | 27 | 5 | ∞ |
| Mutatis mutandis | 24 | 18 | ∞ |
| Alter ego | 21 | 5 | ∞ |
| Forum non conveniens | 13 | 3 | ∞ |
| Actus reus | 10 | 5 | ∞ |
| Ad litem | 10 | 3 | ∞ |
| Usque ad coelum | 8 | 1 | ∞ |
| Pari delicto | 7 | 1 | ∞ |
| Ratione personae | 6 | 3 | ∞ |
| Doli capax | 5 | 1 | ∞ |
| Debet ese | 4 | 1 | ∞ |
| Ad factum | 4 | 1 | ∞ |
| Res iudicata | 4 | 2 | ∞ |
| De novo | 4 | 3 | ∞ |
| Praesumptio juris | 3 | 1 | ∞ |
| Jus cogens | 3 | 1 | ∞ |
| In pari materia | 3 | 2 | ∞ |
| De jure | 52 | 5 | 145.6 |
| Pari passu | 28 | 4 | 117.6 |
| Ex parte | 115 | 26 | 96.6 |
| Ultra vires | 79 | 16 | 82.95 |
| Et seq | 29 | 17 | 81.2 |
| A fortiori | 32 | 28 | 67.2 |

Table 2. Top 25 Latin phrases and their level of specialisation

As shown in table 2, like single-word Latin terms, the average frequency of these phrases is far from the mean value of the whole corpus, the former being 27.66 whereas the latter is 7 times higher. These data clearly point to their high level of specialisation, which is reinforced by the ratio values. 22 out of the 53 phrases mined from UKSCC do not occur in the general English corpus, being therefore assigned an infinite ratio value and standing at the top of the specificity rank, namely, mutatis mutandis, quantum meruit or actus reus, amongst others. In spite of their low frequency, their distribution across the corpus is quite high. Phrases like de facto, inter alia, prima facie or pro rata occur in approximately a fourth of the texts in the corpus. Furthermore, while the average text distribution of all the word types in the corpus (excluding hapax and dis legomena) is 25.82, an eighth of the texts in it, Latin terms appear in 14.97 texts on average (under the same conditions), quite a high value given their degree of specialisation.

Summing up, term distribution together with specificity may be considered two key factors in determining the relevance and representativeness of a word or group of words within a corpus, whereas frequency simply indicates how many times a word repeats itself.
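The termhood ratio described in this section (relative frequency in the specialised corpus divided by relative frequency in the general corpus, with a threshold of 50) can be written out as a brief sketch. The function names are hypothetical, and the word frequencies in the example are invented; only the corpus sizes (2.6 million tokens for UKSCC and 20 million for LACELL) come from the text. Following table 2, a form that is absent from the general corpus is given an infinite ratio.

```python
import math


def termhood_ratio(freq_spec, tokens_spec, freq_gen, tokens_gen):
    """Relative frequency in the specialised corpus over that in the general one."""
    if freq_spec == 0:
        return 0.0
    if freq_gen == 0:
        # Absent from the general corpus: infinite ratio, as in table 2.
        return math.inf
    return (freq_spec / tokens_spec) / (freq_gen / tokens_gen)


def is_specialised(ratio, threshold=50):
    """Apply the threshold of 50 mentioned in the text."""
    return ratio > threshold


UKSCC_TOKENS = 2_600_000    # corpus size given in the introduction
LACELL_TOKENS = 20_000_000  # corpus size given in the note on LACELL

# Invented frequencies for three hypothetical forms.
ratio_a = termhood_ratio(52, UKSCC_TOKENS, 4, LACELL_TOKENS)    # about 100
ratio_b = termhood_ratio(10, UKSCC_TOKENS, 0, LACELL_TOKENS)    # infinite
ratio_c = termhood_ratio(13, UKSCC_TOKENS, 100, LACELL_TOKENS)  # about 1
print(ratio_a, ratio_b, ratio_c)
```

Because the ratio normalises by corpus size, a form with few occurrences in the specialised corpus can still rank as highly specific, which is the pattern observed for the Latin phrases.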
The low frequency rates associated with Latin terms in UKSCC should therefore not be deemed indicative of little significance within the corpus. On the contrary, their level of specialisation coupled with their considerably high text distribution clearly signals their keyness within the variety, supporting, once again, Alcaraz’s (2000) observations as well as those of other scholars like Mellinkoff (1963), Tiersma (1999) or Borja (2000).

3.4. Abbreviations in TC: major findings and discussion

As stated in the section devoted to the literature review, the term abbreviation is an umbrella term which covers initials (also called initialisms), acronyms (also called letter words) and clippings (Sager et al., 1980; Alcaraz, 2000; Plag et al., 2007). First, initialisms are formed by combining only the initial letters of multi-word combinations, giving rise to a sequence of letters which are pronounced individually, in the way in which the letters are spelt in the alphabet, e.g. TNT, DVD, IP, GPS, etc. However, when the combination of initial letters is pronounced as a regular word following the regular reading rules of English, it becomes an acronym, e.g. NASA, LASER, NATO, etc. Clippings, in turn, are usually monosyllabic or disyllabic forms in which the first part of the base word is kept, e.g. doc from doctor, sec from second, etc. Sometimes, an initial or middle element of the word can also be omitted, like gbyte from gigabyte (Sager et al., 1980; Jackson, 1988; Alcaraz, 2000; Plag et al., 2007).

Corpus analysis corroborates Alcaraz’s description of EPAP, specifically in telecommunications English, where compressed forms play a crucial role, since they stand for 16% of the terms included in the Telecommunications Engineering Word List (TEWL) (Rea, 2008). This lexical repertoire includes the most salient, central and typical specialised lexical units in the domain.
They are all found within the range of the 1,000 most statistically significant word families in the domain, as established by comparison with the general language corpus LACELL. Their specialty index is obtained by applying Chung’s method (2003) and the keyness index is given by the log-likelihood test in WordSmith (Scott, 2008), mirroring the procedure applied to the study of Latin terms in legal English.

| Rank | TEWL | F.Tec | F.Lacell | Ratio | Keyness |
| 1 | IP | 5,239 | 20 | 994.85 | 16,182 |
| 2 | TCP | 1,717 | 12 | 543.41 | 5,248 |
| 3 | ATM | 1,639 | 35 | 177.85 | 4,817 |
| 4 | LAN | 1,481 | 27 | 208.32 | 4,387 |
| 5 | OSPF | 1,284 | 0 | ∞ | 4,027 |
| 6 | QOS | 1,155 | 0 | ∞ | 3,622 |
| 7 | VHDL | 1,150 | 0 | ∞ | 3,607 |
| 8 | MPLS | 1,112 | 0 | ∞ | 3,487 |
| 9 | GSM | 1,109 | 4 | 1,052.96 | 3,427 |
| 10 | VPN | 1,007 | 5 | 764.89 | 3,097 |
| 11 | IEEE | 1,002 | 9 | 422.83 | 3,044 |
| 12 | LSAS | 858 | 1 | 3,258.58 | 2,676 |
| 13 | DSP | 906 | 41 | 83.92 | 2,523 |
| 14 | LSA | 804 | 0 | ∞ | 2,521 |
| 15 | CDMA | 805 | 1 | 3,057.29 | 2,510 |
| 16 | CISCO | 840 | 14 | 227.87 | 2,498 |
| 17 | MHZ | 792 | 18 | 167.11 | 2,319 |
| 18 | GHZ | 734 | 2 | 1,393.82 | 2,275 |
| 19 | FPGA | 713 | 0 | ∞ | 2,236 |
| 20 | SCTP | 703 | 0 | ∞ | 2,205 |
| 21 | RF | 716 | 8 | 339.91 | 2,161 |
| 22 | DB | 774 | 36 | 81.65 | 2,149 |
| 23 | WLAN | 677 | 0 | ∞ | 2,123 |
| 24 | ISDN | 699 | 14 | 189.62 | 2,061 |
| 25 | HTTP | 801 | 96 | 31.69 | 1,946 |

Table 3. Top 25 abbreviations in TEWL

As table 3 illustrates, the relevance of abbreviations is evidenced by their quantitative behaviour both within TEC, the main telecommunications corpus, and TC, the subcorpus of telematics. As already stated, a total of 443 abbreviations comprise 16% of the word forms of the specialised repertoire. Considering the total number of abbreviations appearing in the term inventory extracted from TC and the ratio yielded by Chung’s method, there are 237 forms (53%) which are not found in the general corpus, hence they are assumed not to be typical of general language but characterised by a high degree of specialisation. Such highly technical terms display a keyness index which ranges from 4,027 (OSPF) to 12.5 (VDMS).
The higher their frequency in the specialised corpus, the higher their keyness index. The next group comprises the abbreviations whose ratio is > 50, which amount to 119 forms (27%) and also occur in the general English corpus LACELL. They are characterised by their high frequency in TC and their low frequency in the general corpus, their keyness also being dependent on their frequency in the former corpus. The most significant abbreviation, TCP, belongs to this group, being 543 times more frequent in the telecommunications domain than in general language, and scoring 5,248 in keyness. Finally, the remaining 87 abbreviations (20%) are also used to a greater or lesser extent in LACELL, so that their ratio is < 50. This does not mean that they are not specialised terms, but rather that their use has been extended to general language, thus being subtechnical. Therefore, their frequencies in both corpora do not differ that much, although their keyness might vary considerably. The most significant unit in this group is HTTP (1,946) and the lowest score is yielded by GUIS (11). Other forms in this category are the following: RADAR, PC, ID, MAC, WAN, WWW, etc.

A final perspective is gained when approaching the quantitative behaviour of abbreviations in connection with the whole telecommunications word list (TEWL). When the different values which define the lexical behaviour of the terms in the list are taken as reference, the particular performance of abbreviations may be contrasted, so that it becomes evident to what extent they approach the top and bottom scores. The most relevant term in TEWL is network (F.TEC: 16,649; F.LACELL: 1,686; R: 37.50; K: 41,784) and microchips gets the lowest score in keyness (F.TEC: 9; F.LACELL: 6; R: 5.69; K: 10).
Such references highlight and clarify the terminological character of abbreviations and their relevance in the specific domain, particularly of those which rank the highest, like IP (the fifth most relevant term in TEWL), TCP, ATM, LAN, OSPF, etc. Moreover, amongst the top 100 words of the specific list, there are 14 abbreviations, of which OSPF, QOS, VHDL, MPLS and LSA cannot be found in the general language corpus, and IP, TCP, ATM, LAN, GSM, VPN, IEEE, LSAS and DSP give a ratio > 50, whereas their keyness is considerably high, ranging from 16,182 (IP) to 2,521 (LSA).

With respect to the different shortening processes that abbreviations undergo, initialisms (360) remarkably stand out from the rest, since they represent 81% of the total. The majority of the abbreviations found in the specific corpus come from the combination of the initial letters of multi-word units, which is pronounced as a sequence of letters, such as IP, TCP, ATM, GPRS, SNMP, BGP, DCE, GPS, IGRP, PBX or BS. Concerning acronyms, there are 74 in the list, covering 17% of the abbreviations. In that case, the combination of the letters is pronounced as a regular word, like RADARS, FIFO, VOIP, RIP, IPSEC, PAC, QOS, MAC, CISCO, OSI, LABVIEW, LDAP, SPICE, etc. Finally, there are only 11 clippings, where the first or last part of the word base has been kept.

Some abbreviations, particularly acronyms, have been lexicalised and accepted as full words capable of undergoing compounding, derivation and conversion processes. Clear evidence of this behaviour is observed directly in the list of abbreviations, where pairs of singular and plural forms are found, for example LAN/S, VLAN/S, RAM/S, COMSAT/S, RADAR/S, FIFO/S, PAC/S, etc.
The metal-oxide semiconductor (MOS) family neatly illustrates compounding and how it forms multi-word units which again undergo a shortening process and become a longer acronym: CMOS (complementary metal-oxide semiconductor), NMOS (n-channel metal-oxide semiconductor), PMOS (p-channel metal-oxide semiconductor), BICMOS (bipolar complementary metal-oxide semiconductor) and MOSFETs (metal-oxide semiconductor field-effect transistors).

In short, it follows from the above that both the quantitative behaviour and the lexicalisation of abbreviations demonstrate their terminological character and typicality in the subject field, as pointed out by Alcaraz (2000). In addition, all those compressed forms are linguistic labels which stand for definitions, being characterised by special reference within telecommunications, even those which have been integrated into the general language. Therefore, standardised abbreviations are also terms which achieve complete and effective communication in the specialised language, singling it out from general language.

4. Conclusion

Corpus linguistics techniques can automatically detect what is usual or unusual in a sublanguage with respect to general language, which establishes a reference norm, or in comparison with other sublanguages. In this research, the adoption of a corpus-based approach has made it possible to identify the typical behaviour of the lexicons of legal and telecommunications English, providing a bottom-up depiction of some of their most relevant characteristics and corroborating the portrayal carried out by authors such as Alcaraz (2000, 2002).
The application of ATR methods and the quantitative parameters intended to measure how vocabulary performs in UKSCC, the legal corpus, and TC, the telematics corpus, have made it possible to depict the use of specialised terminology (including subtechnical terms) and the outstanding use of Latin terms and phrases in legal English and of abbreviations in telecommunications English. Our initial hypothesis started from the assumption that specialised terminology would behave similarly across EPAP varieties, following Alcaraz’s (2000; 2002) portrayal. This hypothesis was confirmed, although certain differences were also observed between the two varieties selected for this research, namely, legal and telecommunications English.

Concerning the use of specialised terms in both varieties, the results vary slightly, particularly concerning the frequency of these lexical items in the field of telecommunications. While terms tended to occur 6 times as often (1,037) as the whole list of types (169.45) identified in the legal corpus on average, this value was almost three times lower (38.62) than the average for the whole type list (89.93) in TC, the telecommunications corpus. Nevertheless, they were well distributed throughout both corpora, appearing in 13.98% of the legal texts and 11.08% of the telecommunications ones and representing 6.6% and 3.44% of the whole list of types identified in both text collections respectively.

The literature also signals the significance of subtechnical terms in specialised languages, that is to say, of those terms which can be found in both specialised and general language contexts, either retaining their technical meaning or activating it when in contact with the specialised environment. Testing showed that a large proportion of legal and telecommunications terms overlapped with the list of the 3,000 most frequent words of English found in the BNC.
In fact, almost half of the terms in the legal corpus (47.35%) and about one third (35.55%) of the telecommunications terms could be found amongst these general words.

Within the field of legal English, Alcaraz (2000) particularly underlines the relevance of Latin words and phrases, which was also tested from a bottom-up perspective. The results showed that their frequency was not as high as expected: compared with the whole type list, they stood between positions 400 and 1,800 in the frequency rank. However, when only Latin phrases were considered, both their level of specialisation and their distribution throughout the text collection proved much higher: they stood at the top of the specificity rank and appeared in 14.97 of the texts in the corpus (on average) in spite of their low frequency.

Finally, the use of abbreviations was assessed within the field of telecommunications English. Abbreviations made up 16% of the terms in TC (almost one fifth of the whole list) and displayed very high levels of specialisation, since 53% of them were not found in the general context at all. When the telecommunications corpus was processed with Keywords (Scott, 2008), abbreviations were assigned an average keyness value of 1,634, as opposed to 237.26 for the whole term list, which clearly points to their specificity and relevance in the corpus.

5. References

Alcaraz Varó, Enrique. 2000. El inglés profesional y académico. Madrid: Alianza Editorial.
Alcaraz Varó, Enrique. 2002. El inglés jurídico: textos y documentos. Madrid: Ariel Derecho.
Baker, Mona. 1988. Subtechnical vocabulary and the ESP teacher: An analysis of some rhetorical items in medical journal articles. Reading in a Foreign Language 4(2): 91-105.
Biber, Douglas. 1988. Variation across Speech and Writing. Cambridge: Cambridge University Press.
Borja Albí, Anabel. 2000. El texto jurídico en inglés y su traducción. Barcelona: Ariel.
Cabré, María Teresa. 1993. La terminología. Teoría, metodología, aplicaciones. Barcelona: Antártida/Empúries.
Cabré, María Teresa. 2000. Terminologie et linguistique: la théorie des portes. Terminologies nouvelles. Terminologie et diversité culturelle 21: 10-15.
Callanan, Helen & Edwards, Linda. 2010. Absolute Legal English. London: Delta.
Cowan, Ronayne. 1974. Lexical and syntactic research for the design of EFL. TESOL Quarterly 8: 389-399.
Chung, Teresa M. & Nation, Paul. 2003. Technical Vocabulary in Specialised Texts. Reading in a Foreign Language 15(2): 103-116.
Drouin, Patrick. 2003. Term extraction using non-technical corpora as a point of leverage. Terminology 9(1): 99-117.
Flowerdew, John. 2001. Concordancing as a tool in course design. In Ghadessy, Mohsen; Henry, Alex & Roseberry, Robert (eds.) Small Corpus Studies and ELT: Theory and Practice. Amsterdam: John Benjamins.
Frantzi, Katerina T. & Ananiadou, Sophia. 1999. The c/nc value domain independent method for multi-word term extraction. Journal of Natural Language Processing 3(2): 115-127.
Halliday, Michael. 1988. On the language of physical science. In Ghadessy, Mohsen (ed.) Registers of Written English: Situational Factors and Linguistic Features. London: Pinter.
Heatley, Andrew & Nation, Paul. 2002. Range, computer software. Wellington, New Zealand: Victoria University of Wellington.
Krois-Lindner, Amy & Firth, Matt. 2008. Introduction to International Legal English: A course for Classroom or Self-study Use. Cambridge: Cambridge University Press.
Jackson, Howard. 1988. Words and their Meaning. London: Longman.
Marín Pérez, María José. 2014. Evaluation of five single-word term recognition methods on a legal corpus. Corpora 9(1): 83-107.
Marín Pérez, María José. 2016. Measuring the Degree of Specialisation of Sub-Technical Legal Terms through Corpus Comparison: a Domain-Independent Method. Terminology 22(1): 80-102.
Marín Pérez, María José & Rea Rizzo, Camino. 2012a. Structure and design of the BLRC: a legal corpus of judicial decisions from the UK. Journal of English Studies 10: 131-145.
Marín Pérez, María José & Rea Rizzo, Camino. 2012b. How relevant are Latin wordforms and clusters in legal English? A corpus-based study on the representativeness and specificity of such elements in UKSCC: an ad hoc legal corpus. ES. Revista de Filología Inglesa 33: 161-182.
Marín Pérez, María José & Rea Rizzo, Camino. 2014. Assessing four automatic term recognition methods: Are they domain-dependent? English for Specific Purposes World 42: 1-27.
Mellinkoff, David. 1963. The Language of the Law. Boston: Little, Brown & Co.
Nation, Paul. 2001. Learning Vocabulary in Another Language. Cambridge: Cambridge University Press.
Nazar, Rogelio & Cabré, María Teresa. 2012. Supervised Learning Algorithms Applied to Terminology Extraction. In Aguado de Cea, Guadalupe; Suárez-Figueroa, Mari Carmen; García-Castro, Raul & Montiel-Ponsoda, Elena (eds.) Proceedings of the 10th Terminology and Knowledge Engineering Conference (TKE 2012). Madrid: Ontology Engineering Group, Association for Terminology and Knowledge Transfer, 209-217.
Orts, María Ángeles. 2006. Aproximación al discurso jurídico en inglés. Madrid: Edisofer Libros Juridicos S.L.
Plag, Ingo; Arndt-Lappe, Sabine; Braun, Maria & Schramm, Maria. 2007. Introduction to English Linguistics. Berlin: Mouton de Gruyter.
Rea Rizzo, Camino. 2008. El inglés de las telecomunicaciones: estudio léxico basado en un corpus específico (Tesis doctoral). Universidad de Murcia.
Rice, Sally. 2007. Professional English in Use: Law. Cambridge: Cambridge University Press.
Sager, Juan; Dungworth, David & McDonald, Peter F. 1980. English Special Languages. Principles and Practice in Science and Technology. Wiesbaden: Brandstetter Verlag KG.
Scott, Mike. 2008. WordSmith Tools version 5. Liverpool: Lexical Analysis Software.
Sparck Jones, Karen. 1972. A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation 28: 11-21.
Tiersma, Peter. 1999. Legal Language. Chicago: The University of Chicago Press.
Wang, Karen & Nation, Paul. 2004. Word Meaning in Academic English: Homography in the Academic Word List. Applied Linguistics 25(3): 291-314.

ojs.uv.es/index.php/qfilologia/index
Qf Lingüístics

Translator’s creativity in cultural elements transposition: a corpus-based study
La creatividad del traductor en la transposición de elementos culturales: un estudio de corpus

Virginia Mattioli
Universitat Jaume I. [email protected]
Received: 20/04/2017. Accepted: 9/11/2017

Summary: This article presents a corpus-based study that aims to determine the level of creativity (as opposed to conventionalism) in the translation of cultural elements. Taking creativity to be the use of strategies that manipulate the lexical material of the source text, corpus linguistics methodology was used to examine a trilingual corpus (Spanish, English, Italian) of 50 novels (25 original works and the 25 corresponding translations). The methodology comprises three phases: (a) identification of the cultural elements, (b) determination of the translation strategies and (c) distinction between creative and conventional techniques. The results show that, as far as the transposition of culturemes is concerned, translators lean towards the more creative techniques.

Abstract: This article presents a corpus-based study developed to determine the degree of creativity (as opposed to conventionalism) in the translation of cultural elements.
Creativity is understood here as the use of strategies that manipulate the lexical material of the source language. A literary corpus consisting of 50 novels (25 translations and the 25 corresponding originals) was examined using corpus linguistics methods. Firstly, culture-specific elements were identified; secondly, translation strategies were determined; and finally, the strategies were classified as conventional or creative. The results show that the transposition of culture-specific elements is closely related to creativity.

Keywords: corpus linguistics; translation studies; cultural elements; creativity; translation techniques.

Mattioli, Virginia. 2017. “Translator’s creativity in cultural elements transposition: a corpus-based study”. Quaderns de Filologia: Estudis Lingüístics 22: 187-213. doi: 10.7203/qf.22.11308

1. Introduction

This article aims to assess translators’ creativity in relation to culture-specificity in a corpus of novels by comparing translations with their corresponding original works. The morphology of culture-specific elements suggests that the cultural nature of such lexical items determines which translation techniques are adopted to transpose them from one language to another. After a brief introduction to the theoretical framework and the methodology adopted, this paper describes the analysis carried out to demonstrate the existence of such a relationship and translators’ tendency towards creativity. Thus, in section 2, corpus linguistics is presented and justified as the methodology chosen for the research; then cultural elements are introduced through a chronological presentation of previous authors’ attempts to recognize and translate them; and finally, the concept of creativity is defined and compared with that of conventionalism. In section 3, the case study is described.
Here, the specific hypothesis and objective of the research are presented, the corpus used is described, and the various phases of the analysis are explained. Lastly, the outcomes are presented and the results are discussed. The article ends with some concluding remarks and suggestions for possible future research.

2. Theoretical framework

2.1. Corpus linguistics in translation studies

Since ancient times, a corpus has been defined as a collection of texts used to study common textual features. In the 1990s, Sinclair (1991: 171) underlined the nature of those texts, which should be natural (produced by human beings) and authentic (produced for real contexts). In the following years, several authors tried to propose a definition of corpus that took into account all the characteristics of this set of texts. Given the multiplicity of features dealt with in this study, we have adopted the definition proposed by Sánchez (1995: 8-9). Considering its origin, purpose, composition, representativeness and extension, Sánchez defines a corpus as a collection of linguistic data systematized according to certain criteria, wide enough in range and depth to be representative of the whole language or of some of its varieties. Moreover, he highlights the value of electronic processing in providing data which yield varied and useful results for description and analysis.

Corpus linguistics was born at the beginning of the 20th century (although its effectiveness increased from the 1960s thanks to developments in computing), with the objective of studying language from real examples (Sinclair 1991: 171). Since corpus linguistics methodology has been used in this study to observe translators’ behavior with respect to specific lexical elements, the main interest here is the application of this methodology to translation studies and lexicology.
According to translation studies scholars, corpus linguistics is very useful (a) to analyze the relationship between source and target text, in particular to describe the translation techniques chosen by translators (Lepinette, 2004: 2-3), and (b) to investigate regularities and behaviors of translated language, observing translation processes, products and functions (Xiao and Ming, 2009: 237; Toury, 1995: 265, cit. Xiao and Ming, 2009: 237). From a lexicological perspective, corpus linguistics makes it possible to study word frequency, presence, use, characteristics, distribution and collocations (Procházková, 2006: 7-8).

In recent decades, there has been much debate regarding the nature of corpus linguistics and, considering its definition and objectives, several authors have questioned whether it should be treated as a discipline or a methodology. In this study, priority has been given to the multiplicity of applications of corpus linguistics and, concurring with numerous authors (Leech, 1992: 105; McEnery and Wilson, 1996: 2, among others), it has been considered a methodology: more specifically, an empirical methodology based on the fact that language is a probabilistic system in which distinct features appear with different frequencies. Considering both the advantages and the shortcomings of corpus linguistics, its application seems appropriate for this research on two fronts: on the one hand, it facilitates the identification of culture-specific elements; on the other, because it permits us to analyze a great variety of texts, it guarantees a broad variety of authors, topics and translators.

2.2. Cultural elements

Since the 1960s, several scholars of translation studies have shown an increasing interest in cultural elements. Following Nida’s first approach in 1964, many other authors focused their attention on these lexical items and the challenge they represent during the translation process. These studies can be divided into two groups according to their main objectives: on the one hand, those that attempt to define and classify cultural elements, proposing various definitions without apparently reaching any agreement about their nature and identification; on the other, those that propose different techniques for transferring such elements from one language to another.

2.2.1. Definition and classification of cultural elements

Among the former, Nida (1945) recognizes cultural elements as a problem in translation and classifies them in five basic categories. Later, Newmark (1988) calls these elements cultural words and introduces the concept of cultural language, meaning the specific language of a certain culture, within which it is possible to find a wide variety of culture-specific vocabulary (Newmark, 1988: 94). After him, Mayoral Asensio (1994: 76) labels as cultural references (referencias culturales) those elements of the discourse that, because of their reference to the original culture, are completely or partly misunderstood by the members of the target culture, and Aixelá (1996) focuses on the absence of these elements in the target culture. Christiane Nord (1997) adopts Vermeer’s denomination and definition of cultureme as a “social phenomenon of a culture X that is regarded as relevant by members of this culture and, when compared with a corresponding social phenomenon in a culture Y, is found to be specific to culture X” (Vermeer, 1980; cfr. Nord 1997: 34). Finally, in this century, Santamaria (2001) defines and organizes cultural references in a detailed classification consisting of numerous categories and subcategories.
As this diachronic presentation of the studies on culture-specific items suggests, no agreement has been reached among the authors, and none of them explains clearly how to recognize a culture-specific element within a text. Moreover, some scholars focus on the way cultural elements change over time and in step with linguistic change. In this sense, Molina Martínez (2001) considers that they exist only in situations characterized by a cultural transfer, that is, in a translational context. As a result, the most commonly accepted characteristics of culture-specific elements seem to be their specificity with respect to the original culture, their absence in the target culture, and their connotative value.

Since the authors’ divergent positions and the lack of a proper definition of culture-specific elements make it impossible to determine systematically whether an item has a cultural nature or not, in this study such elements have been identified through their morphological structure (i.e. the formation process, construction and origin of a word). The use in a language X of words borrowed from other languages implies the absence of such terms in the inherited vocabulary of language X. According to Delwey (1950: 60-61, cit. Molina Martínez 2001: 23), language is a product of culture; hence, the absence in language X of a word for an object or concept denotes the absence of that object or concept in culture X. Consequently, words imported from a language Y into a language X designate objects or concepts that originally belong to culture Y and that can therefore be considered culturally specific to culture Y. In this paper, therefore, culture-specific elements are taken to be all those words that present a morphological structure alien to the word formation rules of the language of the analyzed text (imported from a different language, thus from a different culture).
Some examples of culture-specific elements identified in the novels translated into Italian and analyzed in the study are “bistrot”, “whisky” and “sari”, which represent the French, Scottish and Indian cultures respectively. These words were borrowed from foreign languages to designate objects that did not exist in contemporary Italian culture (hence the lack of an Italian word to label them).

2.2.2. Treatment and translation of cultural elements

As regards the treatment of cultural elements, Nida (1964) initially proposes three basic methods to translate these references (addition, omission and conversion), to which he later adds some other solutions. His attempt is followed by Vázquez Ayora (1977: 251-384), Newmark (1988: 103-104), and Molina Martínez (2006), among many others. With the same purpose, some scholars prefer to organize translation techniques along a continuum instead of classifying them in categories. Among these, Mangiron (2006) distributes translation techniques along a line, ordering them from the most faithful to the source language and culture (transposition) to the most adapted to the target culture (cultural adaptation). Adopting terms coined by Venuti (1995), these two opposite extremes can be named foreignization and domestication respectively. Other authors study instead the factors that influence the choice of the most appropriate translation technique among the ones suggested. The precursor here is Newmark (1988: 103), who focused on six factors: purpose of the text, readers’ motivation and cultural level, importance of the cultural reference in the original text, area of use, novelty and future of the term.
While all the authors analyzed seem to agree on the factors to be taken into account at the moment of linguistic transfer, there are still many conflicting proposals regarding possible translation solutions to the problems created by cultural differences. To resolve these controversies and to consider the widest possible range of techniques, in this study the two main kinds of proposals have been merged and a new taxonomy of translation techniques is suggested. The proposal, shown in figure 1, is composed of 15 techniques ordered in a continuum, from the most exotic to the most domesticated, each one defined and exemplified below.

Fig. 1. Continuum of translation techniques used in the present study

• Transposition: maintenance of the original foreign word (Fish and Chips > Fish and Chips)
• Transposition of proper name: maintenance of the original proper name (Victoria street > Victoria street)
• Borrowing: maintenance of an original foreign word recognized by the dictionary of the target language (Web > Web)
• Naturalization: adaptation to the target language phonetics (school bus > scuolabus)
• Literal translation: literal translation of the culture-specific element (email > posta elettronica)
• Neutralization: explication by means of words that explain the function or the characteristics of the culture-specific element (turf > tappeto erboso del giardino)
• Hyperonym or hyponym: generalization or specification (respectively: bus station > stazione and knife > machete)
• Accepted standard translation: non-literal translation accepted by the vocabularies and the grammars of the target language (conference committee > commissione congiunta)
• Paraphrase: addition of explication within the text (gondola > gondola, a narrow Venetian boat)
• Footnote: addition of information in a footnote (prega Santa Lucia per recuperare la vista > she prays to Saint Lucy to recover her sight [footnote 1: Saint Lucy is considered the protector of sight, because of her name, Lucia, from the Latin word “lux” which means “light”])
• Omission: omission of a culture-specific element (watching Friends on the TV > guardare la televisione)
• Functional or cultural equivalent: the use of a different element with the same cultural value as the original one (BA degree > laurea triennale)
• Addition: addition of information absent in the source text (they drove back > tornarono indietro con la jeep)
• Lack of semantic or formal correspondence: the translation presents a divergence of meaning or style with respect to the source text (respectively: on the corner of Sloane Street > all’angolo di piazza Sloane and snatching from street urchins > furti dei bambini di strada)
• Autonomous creation: introduction of a cultural element that was absent in the source text (he sat and ate calmly > si sedette e mangiò con calma le sue tagliatelle)

2.3. Creativity vs. conservationism

Gil-Bardají (2003: 96), adopting Toury’s (1974; cit. Gil-Bardají, 2003) definition, considers norms as a set of regularities in a translator’s behavior determined by a certain socio-cultural situation. Kenny (2001: 66) transfers the concept of normalization to corpus-based translation studies and defines it as the use of conventional target translation solutions (as opposed to the adoption of unusual source text features). The author adds that normalization can be applied at any language level and calls the application of such techniques to individual words or collocations lexical normalization. Kenny (2001: 66) thus relates the idea of normalization to that of conventionalism. According to Corpas Pastor (2001), traditional (hence conventional) translation techniques are those that maintain a sort of equivalence between source and target text.
Despite the debatable nature of the concept of equivalence, in this study equivalence is observed from a formal and a semantic perspective, so items are considered equivalent (hence conventional) only when they present both a formal and a semantic correspondence, that is, in terms of signifier and meaning respectively. Accordingly, all those techniques characterized by some kind of omission, addition, manipulation or alteration of the original lexical material (see section 2.2.2 for the taxonomy of techniques adopted in this study) are considered not equivalent, thus not conventional, and consequently creative.

On this basis, translation strategies are divided in this paper into conventional and creative ones. The first group includes only literal translations, as they are the only ones that present a complete level of equivalence, both from a formal and from a semantic point of view. All the other techniques in the range presented in figure 1 are characterized by some kind of modification of the original material, and so by some sort of nonequivalence (lexical, semantic or both); hence they are assigned to the creative strategies group. This division enables us to observe and classify translators’ behavior regarding culture-specific elements in terms of creativity: do they tend to maintain equivalence with the original elements (using literal translations) or do they prefer a more creative approach, modifying and manipulating the original items (using one of the techniques included in the creative strategies group)?

3. Case study

3.1. Hypothesis and objectives

The object of this research is to assess translators’ creativity in relation to culture-specific elements.
With this goal, corpus linguistics methodology was used to observe this feature in a set of translated novels, starting from the hypothesis that translators prefer creative techniques to transpose culture-specific items, according to the division between creative and conservative techniques proposed in the previous section. The relation between culture-specificity and foreign morphological structure (explained in section 2.2) seems to support this supposition. To corroborate this hypothesis, three semantic classes of culture-specific items were considered: (a) food and drinks, such as “curry”, “bistrot” or “cognac”; (b) communication and transportation, like “jeep”, “parkway” or “roulotte”; and (c) clothes and body care, e.g. “tweed”, “gilet” or “sari”. Once the items had been identified in a balanced and representative corpus, the techniques used to translate them were established by comparing aligned originals with translations. Finally, the results were examined to establish translators’ preference for creative or conservative behavior.

3.2. Corpus used

The corpus used in the study, named LIT_TRAD, consists of two parallel subcorpora of award-winning novels published between 2000 and 2014 and translated from English and Spanish into Italian. The two sets of novels are called LIT_TRAD_EN_IT, which includes 26 novels (13 English originals and 13 Italian translations), and LIT_TRAD_ES_IT, which is formed of 24 novels (12 Spanish originals and 12 Italian translations). Table 1 shows the details of the works included (original and translated versions) and their distribution within the two subcorpora:
Subcorpus names: Original_en and Original_es (original novels); Target_en_it and Target_es_it (translations).

Linguistic pair EN>IT

| Author | Title | Year of publication | Translator | Title | Year of publication |
|---|---|---|---|---|---|
| Atwood, Margaret | Oryx and Crake | 2003 | Belletti, Raffaella | L’ultimo degli uomini | 2003 |
| Auster, Paul | The Brooklyn follies | 2005 | Bocchiola, Massimo | Le follie di Brooklyn | 2005 |
| Banville, John | The sea | 2005 | Kampmann, Eva | Il mare | 2006 |
| Coetzee, John Maxwell | Elizabeth Costello | 2003 | Baiocchi, Maria | Elizabeth Costello | 2003 |
| Cunningham, Michael | Specimen days | 2005 | Cotroneo, Ivan | Giorni memorabili | 2005 |
| De Lillo, Don | Cosmopolis | 2003 | Pareschi, Silvia | Cosmopolis | 2003 |
| Desai, Anita | The artist of disappearance | 2011 | Nadotti, Anna | L’artista della sparizione | 2013 |
| Ghosh, Amitav | The hungry tide | 2004 | Nadotti, Anna | Il paese delle maree | 2005 |
| Lessing, Doris | Alfred and Emily | 2008 | Pareschi, Monica | Alfred e Emily | 2010 |
| Morrison, Toni | Home | 2012 | Fornasiero, Silvia | A casa | 2012 |
| Potok, Chaim | Old men at midnight | 2001 | Muzzarelli, Mara | Vecchi a mezzanotte | 2002 |
| Roth, Philip | The plot against America | 2004 | Mantovani, Vincenzo | Il complotto contro l’America | 2005 |
| Surajprasad Naipaul, Vidiadhar | Half a life | 2001 | Cavagnoli, Franca | La metà di una vita | 2002 |

Linguistic pair ES>IT

| Author | Title | Year of publication | Translator | Title | Year of publication |
|---|---|---|---|---|---|
| Bryce Echenique, Alfredo | El huerto de mi amada | 2002 | Bovaia, Roberta | Il giardino della mia amata | 2003 |
| Cercas, Javier | Soldados de Salamina | 2001 | Cacucci, Pino | Soldati di Salamina | 2002 |
| Marías, Javier | Los enamoramientos | 2011 | Felici, Glauco | Gli innamoramenti | 2012 |
| Montero, Rosa | La loca de la casa | 2003 | Finassi Parolo, Michela | La pazza di casa | 2004 |
| Muñoz Molina, Antonio | El viento de la luna | 2006 | Nicola, Maria | Il vento della luna | 2008 |
| Piglia, Ricardo | Blanco Nocturno | 2011 | Cacucci, P. | Bersaglio notturno | 2011 |
| Restrepo, Laura | Delirio | 2004 | Simini, D. | Delirio | 2005 |
| Rosa, Isaac | El vano ayer | 2005 | Cardinali, Annabella | Il vano ieri | 2007 |
| Skármeta, Antonio | El baile de la victoria | 2003 | Collo, Paolo | Il ballo della vittoria | 2005 |
| Vargas Llosa, Mario | Travesuras de la niña mala | 2006 | Felici, Glauco | Le avventure della ragazza cattiva | 2006 |
| Vázquez Montalbán, Manuel | El hombre de mi vida | 2000 | Hado, Lyria | L’uomo della mia vita | 2000 |
| Vila-Matas, Enrique | El viaje vertical | 2001 | Cattaneo, S. | Il viaggio verticale | 2006 |

Table 1. Composition of LIT_TRAD

After a close study of the literature on corpus compilation, the works to be included in the collection were chosen according to the following criteria:

• Representativeness (from a qualitative and quantitative point of view). Firstly, all the novels selected had been awarded international literary prizes, to satisfy the qualitative representativeness criterion. Then, once the corpus had been compiled, its quantitative representativeness was assessed using ReCor (Corpas Pastor, Seghiri and Maggi, 2006), a statistical program specifically developed to evaluate the quantitative representativeness of a corpus a posteriori, according to the number of words and of texts that it includes.
• Inclusion of whole texts: entire works were included so as to identify as many culture-specific elements as possible, in line with the aim of the study.
• Balance: the two subcorpora include a similar number of works (26 and 24) and, despite the inclusion of entire texts, they are still comparable as regards the number of words.
• Variability: the original novels selected are written in different varieties of English and Spanish to guarantee a high level of variability.
• Authenticity: the texts included are literary works written for real contexts by native authors.

To facilitate the identification of the culture-specific elements, the corpus was semantically tagged using USAS (UCREL Semantic Analysis System), developed by the UCREL research group of the University of Lancaster (Piao et al., 2016).
This tagging system adds after each word an underscore followed by a code formed of numbers and letters (e.g. _F1 for food-related words).

3.3. Analysis

The two subcorpora were analyzed separately and the results were compared at the end. The analytic process can be divided into 4 steps:

1. Selection of the culture-specific elements
2. Comparison of the translated culture-specific elements with the corresponding original items
3. Determination of the translation technique used in each case
4. Comparison between the results obtained from the two subcorpora.

Various programs and tools were used to analyze the texts. In the first phase, a word list was created for each target corpus (TARGET_EN_IT and TARGET_ES_IT) using AntConc (Anthony, 2014). Then, the terms related to the three semantic categories considered in this study (see section 3.1) were identified in the lists. This process was facilitated by the format of the semantic tagging used: when each tag is searched for in the concordance list, the output presents the searched node in the middle of each line (in blue in the screenshot in figure 2), and on its left all the terms included in the related semantic category (in red in the screenshot in figure 2).

Fig. 2. Extract from the results of the search for the semantic tag F1 (food) in TARGET_EN_IT in AntConc (Anthony, 2014)

Among the words belonging to each semantic field considered, the culture-specific elements were selected manually according to their morphological structure (only the words with a foreign morphology were chosen). To follow the example given in figure 2, among the words related to the semantic field of food (in red), identified by means of the search for tag F1, only the ones with a foreign morphological structure were chosen, thus only the word “yogurt”.
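The tag-based search described above can be pictured with a short Python sketch. It is only an illustration under stated assumptions, not the AntConc procedure used in the study: it assumes plain-text input in which every token has the form word_TAG, it matches tags exactly, and the function name `words_with_tag` and the sample sentence are hypothetical.

```python
from collections import Counter

def words_with_tag(tagged_text, tag):
    """Count the word forms that carry a given semantic tag.

    Assumes the 'word_TAG' layout described in the text (e.g. yogurt_F1);
    tokens without an underscore, or with another tag, are ignored.
    """
    counts = Counter()
    for token in tagged_text.split():
        word, separator, code = token.rpartition("_")
        if separator and code == tag:
            counts[word.lower()] += 1
    return counts

# Invented sample line, tagged by hand for illustration only.
sample = "Mangiò_F1 uno_N1 yogurt_F1 e_Z5 poi_T1 un_N1 altro_A6 yogurt_F1"
food = words_with_tag(sample, "F1")
print(food.most_common())
```

The output is a frequency list of candidate words for one semantic class; deciding which of them have a foreign morphological structure remains a manual step, as in the study.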
Alongside the elements specific to foreign cultures (which present a foreign morphological structure), items that are specific to Italian culture were also considered, in order to observe their treatment in the transfer from the source languages studied to the Italian target language: are they present in the foreign novels, or are they added by Italian translators? And if they are present in the source text, which techniques does the translator use to transpose them into Italian without losing their exotic Italian-style function (if any)? With this objective, words with an Italian morphological structure that are frequently used in foreign languages (like “panini”, “vespa” or “spaghetti”) were also included.

Finally, the elements of the resulting lists were subjected to a further selection in order to ensure a high level of representativeness and to exclude from the study the terms that do not represent any specific culture. This selection excluded the following elements from the analysis:

• those items with a frequency lower than 10 occurrences;
• those items that appear in fewer than three different novels;
• those items that could not be considered culture-specific elements, despite presenting a morphological structure external to Italian grammar, because of their complete assimilation into Italian daily life and language, as demonstrated by a high frequency in general Italian corpora (e.g. jeans, computer, internet, etc.).

As a result, only the elements that satisfied these criteria were analyzed.
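The three exclusion criteria listed above amount to a simple filter, sketched here in Python. This is a hedged illustration rather than the procedure actually followed: the per-novel counts are invented, the function name `select_candidates` is hypothetical, and the set of assimilated loanwords is simply passed in by hand, whereas the study judged assimilation from frequency in general Italian corpora.

```python
from collections import Counter

MIN_FREQUENCY = 10  # items with fewer than 10 occurrences are excluded
MIN_NOVELS = 3      # items found in fewer than three novels are excluded

def select_candidates(occurrences_by_novel, assimilated):
    """Keep the items that pass the three exclusion criteria.

    occurrences_by_novel maps a novel identifier to a Counter of candidate
    items; `assimilated` is a set of loanwords treated as fully integrated
    (an assumption supplied by the caller in this sketch).
    """
    totals = Counter()  # overall frequency of each item
    spread = Counter()  # number of novels in which each item occurs
    for counts in occurrences_by_novel.values():
        for item, n in counts.items():
            if n > 0:
                totals[item] += n
                spread[item] += 1
    return {
        item: totals[item]
        for item in totals
        if totals[item] >= MIN_FREQUENCY
        and spread[item] >= MIN_NOVELS
        and item not in assimilated
    }

# Invented counts, for illustration only.
demo = {
    "novel_a": Counter({"sari": 6, "jeans": 9, "kimono": 12}),
    "novel_b": Counter({"sari": 3, "jeans": 4}),
    "novel_c": Counter({"sari": 2, "jeans": 5}),
}
selected = select_candidates(demo, assimilated={"jeans"})
print(selected)
```

In this toy data "kimono" is frequent but confined to one novel and "jeans" is treated as assimilated, so only "sari" survives the filter.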
The complete lists of the culture-specific elements that resulted from this selection process and were analyzed in the present study are presented in tables 2 and 3:

| Culture-specific element | Frequency | Semantic class | Original language |
|---|---|---|---|
| Avenue | 157 | Communication and Transportation | FR |
| Street | 157 | Communication and Transportation | EN |
| Taxi | 67 | Communication and Transportation | FR |
| Sari | 49 | Clothing and Body care | HI |
| Camion | 43 | Communication and Transportation | FR |
| Autobus | 42 | Communication and Transportation | FR |
| Garage | 40 | Communication and Transportation | FR |
| Station | 37 | Communication and Transportation | EN |
| Road | 31 | Communication and Transportation | EN |
| Square | 29 | Communication and Transportation | EN |
| Pullman | 25 | Communication and Transportation | EN |
| Jeep | 23 | Communication and Transportation | EN |
| Scotch | 23 | Food and drink | EN |
| Picnic | 18 | Food and drink | EN |
| Vodka | 18 | Food and drink | RU |
| Whisky | 16 | Food and drink | EN |
| Toast | 17 | Food and drink | EN |
| Champagne | 15 | Food and drink | FR |
| Brandy | 14 | Food and drink | EN |
| Parkway | 14 | Communication and Transportation | EN |
| Pizza | 14 | Food and drink | IT |
| Sandwich | 14 | Food and drink | EN |
| Slogan | 14 | Communication and Transportation | EN |
| Mais | 13 | Food and drink | ES |
| Berretto da baseball | 12 | Clothing and Body care | EN |
| Roulotte | 12 | Communication and Transportation | FR |
| Curry | 10 | Food and drink | HI |
| Tunnel | 10 | Communication and Transportation | FR |
| Tweed | 10 | Clothing and Body care | EN |
| TOTAL | 944 | | |

Table 2. Culture-specific elements selected for the analysis in LIT_TRAD_EN_IT

| Culture-specific element | Frequency | Semantic class | Original language |
|---|---|---|---|
| Calle | 163 | Communication and Transportation | ES |
| Taxi | 105 | Communication and Transportation | FR |
| Avenida | 58 | Communication and Transportation | ES |
| Autobus | 51 | Communication and Transportation | FR |
| Champagne | 49 | Food and drink | FR |
| Whisky | 39 | Food and drink | EN |
| Bistrot | 29 | Food and drink | FR |
| Camion | 23 | Communication and Transportation | FR |
| Panini | 20 | Food and drink | IT |
| Gin | 16 | Food and drink | EN |
| Sandwich | 16 | Food and drink | EN |
| Reportage | 14 | Communication and Transportation | FR |
| Tunnel | 13 | Communication and Transportation | EN |
| Gilet | 12 | Clothing and Body care | FR |
| Cognac | 11 | Food and drink | FR |
| Dessert | 10 | Food and drink | FR |
| TOTAL | 629 | | |

Table 3. Culture-specific elements selected for the analysis in LIT_TRAD_ES_IT

Tables 4 and 5 show the number of elements identified in each step of this first phase of analysis for each subcorpus (the number of elements included in the three semantic classes chosen, the culture-specific elements identified among them and the most representative ones selected for the analysis):

| Elements | Unit | Total | % | Food and drink | % | Clothes and body care | % | Transportation and communication | % |
|---|---|---|---|---|---|---|---|---|---|
| Semantic elements | Tokens | 8969 | -- | 2248 | 25% | 2213 | 24% | 4507 | 50% |
| Semantic elements | Types | 972 | -- | 385 | 40% | 267 | 27% | 320 | 33% |
| Culture-specific elements identified | Tokens | 1945 | 22% * | 493 | 22% ** | 253 | 11% ** | 1199 | 27% ** |
| Culture-specific elements identified | Types | 285 | 36% * | 138 | 36% ** | 57 | 21% ** | 90 | 28% ** |
| Culture-specific elements analyzed | Tokens | 944 | 49% *** | 172 | 35% **** | 71 | 28% **** | 701 | 58% **** |
| Culture-specific elements analyzed | Types | 29 | 10% *** | 11 | 8% **** | 3 | 5% **** | 15 | 17% **** |

* % of the total semantic elements, ** % of the total semantic elements of the category, *** % of the total culture-specific elements, **** % of the total culture-specific elements of the category

Table 4. Culture-specific elements identified in LIT_TRAD_EN_IT

| Elements | Unit | Total | % | Food and drink | % | Clothes and body care | % | Transportation and communication | % |
|---|---|---|---|---|---|---|---|---|---|
| Semantic elements | Tokens | 7599 | -- | 2234 | 29% | 1923 | 25% | 3442 | 45% |
| Semantic elements | Types | 997 | -- | 443 | 44% | 260 | 26% | 294 | 29% |
| Culture-specific elements identified | Tokens | 1594 | 21% * | 588 | 26% ** | 168 | 9% ** | 838 | 24% ** |
| Culture-specific elements identified | Types | 348 | 35% * | 184 | 42% ** | 56 | 22% ** | 108 | 37% ** |
| Culture-specific elements analyzed | Tokens | 629 | 39% *** | 190 | 32% **** | 12 | 7% **** | 427 | 50% **** |
| Culture-specific elements analyzed | Types | 16 | 5% *** | 8 | 4% **** | 1 | 1% **** | 7 | 6% **** |

* % of the total semantic elements, ** % of the total semantic elements of the category, *** % of the total culture-specific elements, **** % of the total culture-specific elements of the category

Table 5. Culture-specific elements identified in LIT_TRAD_ES_IT

The second and third phases aimed to establish the translation technique used in each case, starting from the target text and the original text respectively. These steps were carried out using the AntPConc program (Anthony, 2013), which searches for an item in one of the two aligned corpora and shows the resulting concordances in both of them. The second phase, in which the culture-specific elements identified were searched for in the target corpus, revealed the corresponding original form of each item. The screenshot in figure 3 shows the search for the item “roulotte” as an example of this step.

In the example in figure 3, the culture-specific element “roulotte” was searched for in the target corpus. By comparing the outcomes shown in the upper and the lower part of the screen (respectively, the results of the search in the target and the source corpus), it was possible to determine the corresponding original terms in the source corpus, in this case “trailer” and “caravan”. The comparison also revealed whether the translator had added any culture-specific element originally absent from the source text (a case that would imply a high degree of translator’s creativity).
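The search step described above can be sketched in a few lines of Python. This is not AntPConc's implementation or interface. It is a hypothetical illustration that assumes the corpus is available as a list of sentence-aligned (source, target) pairs; the function name `parallel_concordance` and the sample sentences are invented for the example.

```python
import re


def parallel_concordance(aligned_pairs, term):
    """Return the (source, target) pairs whose target segment contains `term`.

    `aligned_pairs` is assumed to be a list of sentence-aligned tuples;
    the match is case-insensitive and restricted to whole words.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [(src, tgt) for src, tgt in aligned_pairs if pattern.search(tgt)]


# Invented sample segments, for illustration only.
aligned_pairs = [
    ("They lived in a trailer by the river.",
     "Vivevano in una roulotte vicino al fiume."),
    ("He parked the caravan.", "Parcheggiò la roulotte."),
    ("She took a taxi.", "Prese un taxi."),
]

hits = parallel_concordance(aligned_pairs, "roulotte")
for src, tgt in hits:
    # Reading each target hit next to its source segment shows which
    # original form ("trailer", "caravan") corresponds to the item.
    print(f"{tgt}  <=  {src}")
```

A target hit whose aligned source segment contains no corresponding foreign term would be a candidate for the case in which the translator added the element.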
Fig. 3. Search for the culture-specific element “roulotte” (caravan) in the Italian target corpus TARGET_EN_IT and its comparison with the source language aligned corpus LIT_TRAD_EN_IT

In the third phase, the original forms of each culture-specific element were searched for in the source corpus (following the example in figure 3, the words “caravan” and “trailer” were searched for in the source corpus). Through this search, it was possible to establish which of the translation techniques included in the proposed taxonomy presented in figure 1 had been used. The results obtained from the second and third phases of the analysis applied to each subcorpus are detailed in table 6:

| Technique | LIT_TRAD_EN_IT occurrences | % | LIT_TRAD_ES_IT occurrences | % |
|---|---|---|---|---|
| Transposition | 13 | 33% * | 0 | 31% * |
| Transposition of proper name | 389 | * | 222 | * |
| Borrowing | 630 | 53% | 306 | 43% |
| Naturalization | 5 | < 1% | 2 | < 1% |
| Literal translation | 60 | 5% | 89 | 12% |
| Neutralization | 0 | 0% | 0 | 0% |
| Hyperonym | 8 | < 1% | 8 | 1% |
| Hyponym | 2 | < 1% | 1 | < 1% |
| Standard accepted translation | 12 | 1% | 0 | 0% |
| Paraphrase | 0 | 0% | 0 | 0% |
| Footnote | 0 | 0% | 0 | 0% |
| Omission | 21 | 2% | 8 | 1% |
| Cultural or functional equivalent | 5 | < 1% | 6 | < 1% |
| Addition | 15 | 1% | 27 | 4% |
| Lack of semantic or formal equivalence | 13 | 1% | 9 | 1% |
| Autonomous creation | 3 | < 1% | 0 | 0% |
| Other techniques | 13 | 1% | 5 | < 1% |

* Percentage for transposition and transposition of proper name taken together

Table 6. Translation techniques used in LIT_TRAD

Considering the wide use of borrowings and transpositions, a further analysis was carried out to explore the origin of the foreign terms. In this case, translators’ creativity was assessed according to the originality of such items with respect to the source text. To this end, different values were attributed to the elements adopted, depending on:

• whether a word had been adopted from the source language but was absent from the original text (high level of creativity) (e.g. making his pitch from his knees > lanciando i suoi slogan in ginocchio);
• whether a source word had been transferred from the source to the target text through a borrowing or a transposition from a language different from the source one (mid level of creativity) (e.g. buttering corn bread > imburrava pane di mais);
• whether the foreign word used in the translation was the same one used in the source text (low level of creativity) (e.g. a cheap printed sari > un modesto sari di tessuto stampato).

The detailed results of this comparison are presented in tables 7 and 8 below:

| Creativity level | Translator’s behavior | Occurrences | Type of technique (occurrences) |
|---|---|---|---|
| More creativity | Foreign elements added by the translator | 32 | Borrowing (31), Other techniques (1) |
| | Foreign elements added from languages other than the source one | 156 | Borrowing (156) |
| Less creativity | Foreign element directly transposed from the original text | 806 | Transposition (413), Borrowing (393) |

Table 7. Translator’s creativity in the use of borrowings in LIT_TRAD_EN_IT

| Creativity level | Translator’s behavior | Occurrences | Type of technique (occurrences) |
|---|---|---|---|
| More creativity | Foreign elements added by the translator | 1 | Borrowing (1) |
| | Foreign elements added from languages other than the source one | 96 | Borrowing (96) |
| Less creativity | Foreign element directly transposed from the original text | 420 | Transposition (221), Borrowing (199) |

Table 8. Translator’s creativity in the use of borrowings in LIT_TRAD_ES_IT

As a last step, the results obtained from the two subcorpora analyzed were compared.

3.4. Discussion

The outcomes of the analysis show that the quantity of culture-specific elements identified in the two subcorpora (LIT_TRAD_EN_IT and LIT_TRAD_ES_IT) is similar for both pairs of languages. However, the proportion of tokens to types is higher in the subcorpus of novels translated from English (1945 tokens and 285 types) than in the one composed of translations from Spanish (1594 tokens and 348 types).
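The proportion of tokens to types mentioned above can be made explicit with a short calculation. The sketch below is only a worked illustration: the counts are those reported for the identified culture-specific elements in tables 4 and 5, while the function and variable names are hypothetical.

```python
# Identified culture-specific elements, as reported in tables 4 and 5.
identified = {
    "LIT_TRAD_EN_IT": {"tokens": 1945, "types": 285},
    "LIT_TRAD_ES_IT": {"tokens": 1594, "types": 348},
}


def tokens_per_type(tokens, types):
    """Average number of occurrences (tokens) per distinct item (type)."""
    return tokens / types


ratios = {
    corpus: round(tokens_per_type(**counts), 2)
    for corpus, counts in identified.items()
}

for corpus, ratio in ratios.items():
    print(f"{corpus}: {ratio} tokens per type")
```

On these figures each culture-specific type occurs on average about 6.8 times in the translations from English and about 4.6 times in the translations from Spanish.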
This difference indicates that the translations from Spanish present a greater variety of culture-specific elements, each with a lower number of occurrences.

On the other hand, regarding the items analyzed, there is a significant quantitative difference between the two subcorpora. After selection according to the representativeness and culture-specificity criteria (see section 3.3), 29 culture-specific elements were analyzed in LIT_TRAD_EN_IT (10% of the total) but only 16 (5%) in LIT_TRAD_ES_IT (see tables 4 and 5). This difference is consistent with the results obtained for the total culture-specific elements explained above: in LIT_TRAD_ES_IT there is a greater variety of items with a lower frequency, so that only a few of them met the representativeness criteria (being present in more than 3 novels and presenting at least 10 occurrences) and were selected for the analysis.

With regard to culture specificity, there are hardly any differences between the two subcorpora: in the English-Italian one, 4 elements were eliminated because of their assimilation into the target culture, and in the Spanish-Italian one, 5. It is interesting to note that the eliminated elements largely coincide in the two subcorpora (the words “film”, “computer”, “jeans” and “internet” were eliminated in both groups of texts, and “yoghurt” in LIT_TRAD_ES_IT only). These results also show that the words most assimilated into the Italian language and culture are the English ones (regardless of the source language of the texts). Because of the different number of elements, all the comparisons between the outcomes obtained in the two subcorpora are expressed in percentages.

Regarding the translation techniques used, as shown in table 6, the most commonly used strategies are borrowings (used in 53% of the cases in the novels translated from English and 43% in those translated from Spanish) and transpositions (33% and 31% respectively, considering both transposition and transposition of proper names). Literal translations, by contrast, were used in only 5% of the occurrences in LIT_TRAD_EN_IT and 12% in LIT_TRAD_ES_IT. These results confirm the initial hypothesis and demonstrate that, in transposing culture-specific elements, translators tend towards creativity rather than conventionalism.

Comparing the two language pairs considered, Spanish-into-Italian translators seem to be more faithful to the original text, thus presenting a lower level of creativity (considering creativity, as opposed to conventionalism, as any kind of manipulation of the source text that causes any sort of nonequivalence: see section 2.3). In fact, LIT_TRAD_ES_IT presents a lower percentage of borrowings than LIT_TRAD_EN_IT (43% as opposed to 53%) and a higher percentage of literal translations (12% versus 5%).

Considering also the origin of such borrowings, the level of creativity is higher in translations from English than in those from Spanish. In LIT_TRAD_EN_IT, although the majority of the foreign words are transposed directly from the source text (81%) or come from a language different from the source one, usually French (15%), in 3% of the borrowings translators decided to add a word from English that was absent from the original texts, demonstrating a higher level of initiative and creativity. In LIT_TRAD_ES_IT, on the other hand, translators almost always opted to use the same terms as the original text or to substitute them with words from other languages (in 81% and 19% of the uses of foreign words respectively), and in just one case (0.2%) was a borrowed word from Spanish that was absent from the original text added to the translation (see tables 7 and 8). These results could be interpreted as being related to the socio-cultural prestige of the languages analyzed.
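The shares quoted in the preceding paragraph can be recomputed from the counts in tables 7 and 8. The following sketch is only a worked check under the assumption that each share is the count divided by the sum of the three creativity levels of the subcorpus; the dictionary layout, labels and function name are hypothetical.

```python
# Occurrences per creativity level, as reported in tables 7 and 8.
creativity_counts = {
    "LIT_TRAD_EN_IT": {
        "added by the translator": 32,
        "taken from a third language": 156,
        "transposed from the source text": 806,
    },
    "LIT_TRAD_ES_IT": {
        "added by the translator": 1,
        "taken from a third language": 96,
        "transposed from the source text": 420,
    },
}


def shares(counts):
    """Convert raw counts into percentages of their own total."""
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.items()}


creativity_shares = {
    corpus: shares(counts) for corpus, counts in creativity_counts.items()
}

for corpus, by_level in creativity_shares.items():
    for label, pct in by_level.items():
        print(f"{corpus}: {label}: {pct:.1f}%")
```

Computed this way, the shares come out close to the rounded percentages given in the text, with direct transposition accounting for roughly four fifths of the foreign words in both subcorpora.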
English is a prestigious language at the centre of the polysystem (according to the polysystem theory proposed by Even-Zohar, 1990), so it is less translated and translators tend to maintain English words in the target texts. Spanish, on the other hand, is a marginal language in the polysystem with a low degree of socio-cultural prestige; consequently, translators are less interested in maintaining items from this language in the target texts and frequently replace them with terms adopted from other languages which are external to the linguistic pair but more prestigious.

These outcomes also suggest that the techniques used do not depend on the similarity or difference between the source and the target language, but on the degree of socio-cultural prestige of a language. Specifically, the greater use of literal translation in Spanish-Italian translations does not seem to depend on the affinity between Spanish and Italian (in fact, in translating into Italian from Spanish, a language closer to Italian than English is, translators frequently opt to add many English words that are completely different from both the source and the target language, instead of maintaining a Spanish term more similar to the target language).

4. Conclusions

The aim of this study was to assess translators’ creativity in the transposition of culture-specific elements. To reach this objective, a corpus-based analysis was applied to a set of 25 translated novels, focusing on the techniques chosen by translators to transpose culture-specific elements from certain semantic fields (food and drink, clothing and body care, and transportation and communication). The results of the analysis show that the most commonly used techniques are borrowings and transpositions.
These outcomes corroborate the initial hypothesis, demonstrating that translators do indeed prefer to adopt creative techniques to transpose culture-specific items, and suggest that translation helps to enlarge target-language lexis from two perspectives. On the one hand, translators’ choices tend to enlarge the vocabulary of the target language by importing terms from other languages and helping to increase their frequency of use. On the other, translation, as linguistic and cultural transfer, contributes to multiculturalism by enriching the target culture with words and concepts from the source language as well as from other languages and cultures.

From a methodological perspective, the choice of a corpus linguistics method enabled us to reach the initial goal, and it proved a useful approach to identify culture-specific elements in an ample range of texts and to analyze their translation techniques electronically, thanks to the use of several tools appropriate to each phase and goal. These results appear to suggest two considerations regarding corpus-based methods: firstly, this methodology can be successfully applied to literary texts, and specifically to literary translation; secondly, the application of this method to lexical and terminological research shows itself to be highly effective.

This article is only a first approach to the study of creativity in the translation of lexical elements focusing on culture-specific items. It could be followed by further research into the role of translation in the adoption of new lexical units and in the extension of vocabulary. There is ample scope for continued investigation of translators’ creativity in relation to culture-related items, from both perspectives: translation process (observing their transposition) and product (analyzing their form in the target language).
In this sense, further analysis could focus on the study of other semantic categories of culture-specific items, on lexical elements related to the discourse of a specific culture, or on those lexical elements that represent the culture of specific social classes or groups. Moreover, the methodology proposed in this paper could be replicated to observe the characteristics of other lexical units, not necessarily linked to culture-specificity, from the same translational perspective.

5. References

Aixelá, Javier Franco. 1996. Culture-specific Items in Translation. In Álvarez, Román & Vidal, M. Carmen-África (eds.) Translation, Power, Subversion. Clevedon, Philadelphia, Adelaide: Multilingual Matters, 52-78.

Anthony, Laurence. 2014. AntConc (Version 3.4.1) [Computer Software]. Tokyo, Japan: Waseda University. http://www.antlab.sci.waseda.ac.jp/ [Accessed 01/12/2015].

Anthony, Laurence. 2013. AntPConc (Version 1.0.3) [Computer Software]. Tokyo, Japan: Waseda University. http://www.antlab.sci.waseda.ac.jp/ [Accessed 01/12/2015].

Corpas Pastor, Gloria. 2001. La creatividad fraseológica: efectos semántico-pragmáticos y estrategias de traducción. Paremia 10: 67-78.

Corpas Pastor, Gloria; Seghiri Domínguez, Miriam & Romano, Maggi. 2006. Recor: método para la determinación de la representatividad de un corpus, patente n. ES2320511 de la Universidad de Málaga. http://umapatent.uma.es/es/patent/metodo-para-la-determinacion-de-la-representa4b0/ [Accessed 01/06/2016].

Even-Zohar, Itamar. 1999. La posición de la literatura traducida en el polisistema literario. Traducción de Montserrat Iglesias Santos revisada por el autor. In Iglesias Santos, Montserrat (ed.) Teoría de los Polisistemas. Madrid: Arco [Bibliotheca Philologica, Serie Lecturas], 223-231.

Gil-Bardají, Anna. 2003. Procedimientos, técnicas, estrategias: operadores del proceso traductor. Recercat, Universitat Autònoma de Barcelona. http://hdl.handle.net/2072/8998 [Accessed 16/04/2017].

Kenny, Dorothy. 2001. Corpus and Creativity in Translation. A Corpus-based Study. St. Jerome Publications.

Leech, Geoffrey. 1992. Corpora and theories of linguistic performance. In Svartvik, Jan (ed.) Directions in Corpus Linguistics. Proceedings of Nobel Symposium 82, Stockholm, 4-8 August 1991. Berlin/New York: De Gruyter.

Lepinette, Brigitte. 2004. La historia de la traducción. Metodología. Apuntes bibliográficos. HISTAL. http://www.histal.ca/wp-content/uploads/2011/08/La-historia-de-la-traduccion-metodologia-apuntes-bibliograficos.pdf [Accessed 20/04/2017].

Mangiron i Hevia, Carme. 2006. El tractament dels referents culturals a les traduccions de la novel·la Botxan: la interacció entre els elements textuals i extratextuals (PhD thesis). Barcelona: Universitat Autònoma de Barcelona, Departamento de Traducción e Interpretación. http://hdl.handle.net/10803/5270 [Accessed 30/10/2014].

Mayoral Asensio, Roberto. 1994. La explicitación de la información en la traducción intercultural. In Hurtado A. (ed.) Estudis sobre la traducció. Castellón de la Plana: Publicacions de la Universitat Jaume I.

McEnery, Tony & Wilson, Andrew. 1996. Corpus Linguistics. Edinburgh: Edinburgh University Press.

Molina Martínez, Lucía. 2001. Análisis descriptivo de la traducción de los culturemas árabe-español. Barcelona: Universitat Autònoma de Barcelona, Departamento de Traducción e Interpretación. http://hdl.handle.net/10803/5263 [Accessed 30/10/2014].

Molina Martínez, Lucía. 2006. El otoño del pingüino. Castellón de la Plana: Publicacions de la Universitat Jaume I.

Newmark, Paul. 1988. A Textbook of Translation. New York: Prentice Hall.

Nida, Eugene. 1945. Linguistics and Ethnology in Translation Problems. Word 1.

Nida, Eugene. 1964. Toward a Science of Translating: With Special Reference to Principles and Procedures Involved in Bible Translating. Leiden: Brill Archive.

Nord, Christiane. 1997. Translating as a Purposeful Activity. Manchester: St Jerome.

Piao, Scott et al. 2016. Lexical Coverage Evaluation of Large-scale Multilingual Semantic Lexicons for Twelve Languages. Proceedings of the 10th edition of the Language Resources and Evaluation Conference (LREC2016). Portoroz, Slovenia. 2614-2619. USAS Italian Semantic Tagger. http://ucrel.lancs.ac.uk/usas/gui/ [Accessed 20/04/17].

Procházková, Petra. 2006. Fundamentos de la lingüística de corpus. Concepción de los corpus y métodos de investigación con Corpus. www.prochazkova.de/fundamentos_de_la_ling%C3%BC%C3%adstica_de_corpus.pdf [Accessed 06/08/14].

Sánchez, Aquilino. 1995. Definición e historia de los corpus. In Sánchez, A.; Sarmiento, R.; Cantos, P. & Simón, J. (org.) CUMBRE. Corpus lingüístico del español contemporáneo: fundamentos, metodología y análisis. Madrid: SGEL (Sociedad General Española de Librería).

Santamaria Guinot, Laura. 2001. Subtitulació i referents culturals. La traducció com a mitjà d’adquisició de representacions mentals (PhD thesis). Barcelona: Universitat Autònoma de Barcelona, Departamento de Traducción e Interpretación. http://hdl.handle.net/10803/5249 [Accessed 30/10/2014].

Sinclair, John. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.

Vázquez-Ayora, Gerardo. 1977. Introducción a la traductología: curso básico de traducción. Washington D.C.: Georgetown University Press.

Venuti, Lawrence. 1995. The Translator’s Invisibility: A History of Translation. London: Routledge.

Xiao, Richard & Ming, Yue. 2009. Using corpora in translation studies: the state of art. In Baker, P. 2012. Contemporary Corpus Linguistics. London: A&C Black.

ojs.uv.es/index.php/qfilologia/index

Corpus-driven insights into the discourse of women survivors of Intimate Partner Violence

El discurso de mujeres supervivientes de violencia de género: incursiones lingüísticas basadas en un análisis de corpus

Alfonso Sánchez-Moya
Universidad Complutense de Madrid/Vrije Universiteit Amsterdam.
Received: 30/04/2017. Accepted: 11/10/2017

Abstract: Despite its ubiquity, Intimate Partner Violence (IPV) is still under-researched from a Critical Discourse Studies (CDS) perspective. This paper therefore investigates the discourse of women survivors of IPV through a corpus-driven examination of the data. To this end, the text-analysis software tool LIWC (Linguistic Inquiry and Word Count) is applied to a 120,000-word corpus collected from an anonymised, public, online forum available to IPV survivors. I contrast a range of linguistic phenomena in three online communities embedded within this forum (“Is it Abuse?”, “Getting out” and “Life after abuse”) in an attempt to sketch out how the discursive output varies across these three stages. This paper shows how pronominal distribution plays a role in the forging of collective identity. Differences in emotional tone across the three groups explored are also identified. Useful though these corpus-driven pointers may be, this study also warns that findings deriving solely from quantitative analyses need to be treated with caution.

Keywords: intimate partner violence (IPV); digital discourse; CDS; corpus; LIWC.

Resumen: A pesar de su ubicuidad, la violencia de género es un campo aún poco explorado desde la perspectiva de los Estudios Críticos del Discurso. Este artículo investiga el discurso de mujeres supervivientes de violencia de género poniendo el foco en un análisis basado en el estudio de un corpus. Se efectúa tras aplicar LIWC (Linguistic Inquiry and Word Count) a un corpus de 120.000 palabras de un foro en línea, público y anonimizado disponible a supervivientes de violencia de género. Se contrastan varios fenómenos lingüísticos en tres comunidades digitales de este foro (“¿Es abuso?”, “Dejando una relación abusiva” y “La vida después del abuso”) en el intento de esbozar cómo la producción discursiva varía en estas etapas. Este estudio muestra cómo la distribución pronominal es relevante en la forja de la identidad colectiva. Se identifican también diferencias en el tono emocional de estos tres grupos. A pesar de su utilidad, esta investigación advierte de la precaución con la que lidiar con resultados procedentes únicamente de análisis cuantitativos.

Palabras clave: violencia de género; discurso digital; Estudios Críticos del Discurso; corpus; LIWC.

Sánchez-Moya, Alfonso. 2017. “Corpus-driven insights into the discourse of women survivors of Intimate Partner Violence”. Quaderns de Filologia: Estudis Lingüístics 22: 215-243. doi: 10.7203/qf.22.11309

1. Introduction

Based on the intersections of critical discourse studies (CDS), a corpus-driven analysis, and the exploration of a sociological phenomenon such as Intimate Partner Violence (IPV henceforth) from a discursive perspective, this article seeks to provide insights into the discourse used by women in a publicly accessible online forum that fosters the exchange of posts around this type of violence. Given the affordances of the site under scrutiny, and by employing corpus-assisted research, this study seeks to gain a better understanding of IPV as a social phenomenon by evaluating the linguistic choices made by users of this forum. To wit, I shall investigate the differences in language use among three of the online communities nested within this site: ‘Is it abuse?’, ‘Getting Out’, and ‘Life after an abusive relationship’. By doing so, I arguably establish a correlation between these three communities and different stages within an abusive relationship in an attempt to sketch out how the discursive output varies across these three stages.
This is operationalised by running a LIWC analysis on a corpus consisting of 120,000 words (40,000 words for each of the above-mentioned communities) and by then contrasting the distribution of words, as grouped in the linguistic categories provided by LIWC (%), that characterises the three online communities.

This paper is organised as follows: Section 2 offers a succinct overview of the core concepts in this paper and how they are understood (namely IPV, discourse and CDS, and Corpus Linguistics (CL)). Section 3 considers the most salient methodological considerations, placing an emphasis on LIWC, the text-analysis software tool used for carrying out my analysis. Section 4 presents the findings and discusses their implications. Finally, Section 5 gives concluding remarks, identifies limitations and draws possible lines for future research.

2. Theoretical preliminaries: the exploration of IPV from CDS

Asserting that violence is widespread across most societies and cultures is certainly unproblematic, especially when violence is regarded as one of the most salient global public health problems nowadays (WHO, 2016). Trying to provide a definition of both the phenomenon and the many related issues around it, however, is not at all cut and dried. This is partly rooted in the difficulty of conceptualising violence per se. In fact, as suggested by sociological research along these lines, determining the boundaries of what counts as violence and non-violence is hard, especially in practice (Krug, Dahlberg, Mercy, Zwi and Lozano, 2002; Walby, Towers et al., 2017). Reasons for this are multiple and are related to, inter alia, whether violence is actual, intended or threatened, the diverse interpretations of concepts such as harm, or the repetition of violent events (Walby, Towers et al., 2017).
Not surprisingly, this conceptual fuzziness has triggered methodological divergences when trying to provide reliable accounts of violent events (Walby, Towers et al., 2017). Nonetheless, it can arguably be stated that Intimate Partner Violence (IPV) is one of the most salient types of abuse directed against women (Heise, 1998). Contrary to more collective and multi-layered forms of violence against women (VAW), IPV is characterised by its interpersonal character, in the sense that violence largely takes place between family members and intimate partners in a wide range of settings, mostly in private contexts (Krug et al., 2002: 6).

Straightforward though this may seem, the mere attempt to provide a single definition of IPV as a phenomenon is far from finding agreement, which gives an idea of how slippery this endeavour might be. In fact, although I adhere to the understanding of IPV as a gendered phenomenon, many scholarly voices have challenged the assumption that IPV is a gender-driven phenomenon. According to these views, this is linked to higher victimisation rates among women (Nicholls and Dutton, 2001), which may be related to conservative ideas around manhood and a consequent under-reporting of abuse by male victims (Dutton and Nicholls, 2005), or to the tendency to believe that violence initiated by women is treated differently because it results in less serious physical harm to male partners than vice versa (Ross and Babcock, 2009). Although I believe that the rather ill-defined boundaries of some violent acts play a significant role in what counts as violence (especially when it comes to psychological abuse, for instance, as argued by Winstok and Sowan-Basheer, 2015), there is solid evidence to claim that IPV is strongly influenced by the gender variable (Harris et al., 2012).
Global institutions have widely observed that “the overwhelming global burden of IPV is borne by women” (WHO, 2016), so much so that 1 in 3 (35%) women worldwide can be alleged to have experienced IPV in their lifetime (WHO, 2016). To provide a more proximate depiction of this situation in the context where this research is framed, it is worth mentioning that 46% of female homicide victims in England and Wales in 2013-2014 were killed by a male partner or ex-partner, in contrast with 7% of male victims killed by a female partner during the same period (Office for National Statistics, 2015). Interesting though discussions around these concepts may be, further engagement with them would fall outside the scope of this article.¹

Notwithstanding the controversies around this type of violence, and based on previous studies (Crowell & Burgess, 1996; Heise & García-Moreno, 2002), I understand IPV as multiple, non-mutually exclusive acts of controlling, coercive, threatening, degrading or violent behaviour within an intimate relationship, triggered by a partner or ex-partner, that cause physical, psychological or sexual harm to those in the relationship. As may be noted, I refrain from using a gender-based definition of IPV. By no means does this imply that I do not recognise the gender dimension within IPV. Rather, the main motivation for this is that this approach lends itself more suitably to dealing with this type of violence in homosexual partnerships as well, where the application of gender standards is not always so straightforward. Nonetheless, this piece of research concentrates on heterosexual relationships in which violence is exerted on women by the male counterpart in the relationship.

Awareness-raising around IPV was brought about partly as an aftermath of the second wave of feminism back in the 1980s.
Since then, there have been serious attempts to tackle this issue from a multiplicity of angles. From an institutional standpoint, after the United Nations Declaration on the Elimination of Violence Against Women in 1993, efforts to define gender violence as a particular type of violence crystallised, providing a taxonomy of different types therein and a systematic encouragement to eradicate it in any of its possible manifestations. Not unexpectedly, academic work has similarly contributed to providing a more accurate understanding of IPV in a plethora of ways, a small representation of which I now briefly mention. Although research on the sociological (and worldwide) dimensions of IPV is extensive (Dobash and Dobash, 2015), many others have also examined the connections between IPV and physical (Campbell, 2002), psychological (Kumar et al., 2013) and reproductive health (Dartnall and Jewkes, 2013). Furthermore, as a positive outcome of the institutional claims, the legal facets of IPV have been widely investigated too (Walker, 2015).

Interestingly, a great proportion of studies taking IPV on board suggest that their main motivation is to be conducive to deeper insights into this social phenomenon, therefore implying that there is still much to be done along these lines. Research from the language sciences has also echoed this pressing need, giving rise to a growing body of research investigating the ways in which linguistic issues and IPV are intertwined.

¹ For a brief illustration of the multiple attempts to understand IPV, albeit advocating that no single theory can fully explain the phenomenon of IPV, see Ali and Naylor (2013).
One observable trend deals with discourses of/about IPV, mostly focussing on recontextualised representations of both IPV and the key social actors typically involved in it (namely abused women and their abusive male partners) in media discourse (Santaemilia and Maruenda, 2014) or online environments (Bou-Franch, 2013). Necessary though these studies are, attempts to examine discourses by social actors in IPV contexts are somewhat less frequent to date. This may be related to the complexity of gathering data, given the sensitive nature of this issue. Nonetheless, explorations of the macro-level of discourse in IPV contexts have drawn thought-provoking conclusions that can be of valuable help in gaining a richer comprehension of IPV and the social actors therein (Baly, 2010). Boonzaier (2008), for example, identifies the traces of “femininity discourse” in narratives of abused women, which underpins the loving, caring and nurturing roles of women that partly affect these women’s self-construction as the ones to blame for the situation. This paucity of research becomes even more remarkable when studies on the micro-level of discourse are concerned. In fact, although studies relying on a more detailed linguistic operationalisation have analysed an array of discursive structures in the representation of IPV episodes (Stokoe, 2010), I would argue that discourse-driven approaches to women suffering from IPV and their self-reported experiences around it are still under-researched. Indeed, it is striking to observe that IPV has not gained sufficient attention from Critical Discourse Studies (CDS), a field that has been traditionally characterised, inter alia, by analysing “opaque as well as transparent structural relationships of dominance, discrimination, power and control as manifested in language” (Wodak and Meyer, 2009: 10).
This view is partly possible due to the conceptualisation of discourse as socially constitutive as well as socially conditioned (Fairclough and Wodak, 2004), which turns discourse into a “potential and arguably actual agent of social construction” (Sunderland and Litosseliti, 2002: 13) with a crucial role in creating, sustaining and/or transforming the social status quo (Hart and Cap, 2014). These are the principles underlying the many social issues that have been explored through the CDS lens, dealing with power issues in contexts related to political discourse (Marín-Arrese, 2011), racism (Van Dijk, 2015) and gender and sexualities (Baker, 2008), to name just a few. In fact, this motivation of redressing power inequalities is a priority for CDS analysts. Similarly, CDS is also characterised by presupposing a political stance on the part of the researchers that seeks to bring about social change (Hart and Cap, 2014). For this to be accomplished, a permanent recursivity is required between linguistic mechanisms (especially at the micro-level of discourse) and how these are interwoven in the fabric of the macro-(social) structures (KhosraviNik, 2010). Although the investigation of IPV from CDS seems justified now, the outcome of this study would surely differ depending on the perspective within CDS I were to adopt when examining this social issue. As thoroughly depicted by one of the latest compilations dealing with CDS (Hart and Cap, 2014), different theoretical and methodological approaches to the study of discourse have prompted the development of multiple tool boxes from which to provide discourse-based insights into a socially driven concern. More traditional approaches (Wodak and Meyer, 2009) have been widely criticised on the basis of researchers’ bias and data representativeness (Stubbs, 1997; Widdowson, 2004). This has triggered interesting reactions within the field in response to this criticism.
Both the socio-cognitive and the corpus linguistic approach can be seen as two consistent and systematic attempts to tackle some of the above-mentioned weaknesses. Interestingly, this article is somewhat embedded in the intersection of these two approaches, as I try to justify in what follows. As Teun Van Dijk puts it:

most earlier and contemporary theories in CDS assume a direct link between discourse and society (or culture), [but] the problem is that the nature of these causal or similar direct relationships is not made explicit but taken for granted or reduced to unexplained correlations (2014: 121).

It is the unexplained nature of these correlations that Van Dijk attempts to solve by endorsing the socio-cognitive approach to the understanding of discourse (Van Dijk, 2014). While providing an accurate picture of this approach to discourse would exceed the space constraints of this paper, it is worth mentioning some of its key tenets. In short, it is claimed that the way in which individual language users frame text and talk is based on socially shared representations of individual social actors as members of various social collectivities, thus implying that personal and social dimensions in discourse processing are inextricably intertwined (Van Dijk, 2014). In other words, “our ongoing experience and understanding of the events and situations of our environment take place in terms of mental models that segment, interpret and define reality as we ‘live it’” (Shipley and Zacks, 2008; Van Dijk, 2014). Mental models are therefore regarded as the “interface between discourse and the social or natural environment” (Van Dijk, 2014: 124) and are credited with a fundamental role in the production and comprehension of discourse.
Accordingly, this approach defends a mutually constitutive relationship between discourse and social cognition, where discourse is instantiated in texts that project and transform socio-cognitive representations (SCRs), both the discourse producers’ and the recipients’ (Koller, 2014: 152). What is more, SCRs are “not individually held mental models, but cognitive structures shared by members of a particular group” (Koller, 2014). Consequently, they are “socially and discursively constructed in the course of … communication […], and are subject to ‘continual transformation […] through the ebb and flow of intergroup relations’” (Augoustinos et al., 2006: 258-259). As will be specified in the next section, this view of discourse gains more prominence if the communicative context this article pays attention to is taken into account. Rather than analysing discourse by isolated language users, I investigate how online users of an IPV forum engage in the construction of their online collective identity and the ways in which this is instantiated in their discursive production. This seems to fit nicely into the motivations of this approach since, as suggested by Koller (2014: 153):

[a] socio-cognitive approach to critical discourse studies is well suited to analysing collective identities and is especially relevant at the interpretation stage of analysis, which addresses the questions as to why text producers have selected a range of linguistic devices to construct groups in a particular way.

As anticipated before, CDS research has been criticised for a lack of rigour in both collecting and analysing data, with studies in the field accused of cherry-picking and questioned on issues of representativeness and randomness in data selection (Widdowson, 1998; 2004).
In the attempt to neutralise these arguments, CDS has gradually drifted towards a reliance on the corpus linguistic approach, which is well suited to identifying ideological patterns in texts that would otherwise remain unnoticed (Baker, 2006). Another interesting contribution of the corpus linguistic approach is that it enables the researcher to examine the texts under analysis without preconceived notions regarding the content of the selected data (Baker et al., 2008). Despite its multiple strengths, it is also important to bear in mind that an over-dependence on the corpus linguistic approach may have undesirable consequences for a CDS-oriented study. As pointed out by Fairclough, corpus linguistics (CL) can arguably be criticised for a positivist reduction of the ‘actual’ to the ‘empirical’ or ‘the observable’ (2015: 22), exposing CDS research to the risk of losing its character and purpose and of being too constrained by the capacities of CL (2015: 23). This is of particular significance in CDS, since many power imbalances are discursively crafted in ways that are not textually explicit and therefore remain invisible to CL software (Fairclough, 2015). As far as this article is concerned, I use a text-analysis software tool to provide a solid starting point for my research purposes. On no account should this be regarded as a definitive exploration of my data, which would very much require a more in-depth qualitative investigation. Overall, this section has sought to lay the theoretical foundations of this study, which is located at the crossroads of CDS, IPV and CL. As already discussed, taking into account the motivations behind CDS research, the exploration of a social phenomenon such as IPV from a socio-cognitive approach to discourse is deemed feasible.
I support my analysis with a software tool (LIWC), and the study therefore falls within CL, although I understand this application as a very initial procedure that needs to be complemented by a closer examination of the data.

3. Methodological issues

3.1. Data and data collection

This article is based on data collected from a publicly accessible online forum, hosted by a British charity with an outstanding determination to provide support and resources of many sorts to both women and their offspring when undergoing IPV. Although this type of data can be regarded as sensitive due to its content, the corpus analysed here is believed to respect principles of research ethics and ethical treatment of persons as promulgated by key documents in this area (Markham and Buchanan, 2012). The data under investigation were collected from an online forum where users are warned of the live, public character of the site. Posts were therefore collected without the need to register on the site. Although my research interests comply with the socio-cognitive character of this type of discourse and are less concerned with individual discourse usage per se, users are completely anonymised and posts are moderated online, ensuring that the disclosure of personal details cannot become a potential risk to the human being behind the online persona. Nonetheless, discussions around internet-based data are still lively and ongoing (Nissenbaum, 2010). The analysis presented here is based on a corpus collected in two different time spans to guarantee a richer discursive outcome (December 2014 – March 2015 and December 2015 – May 2016). Despite the fact that studying the interaction generated by the exchange of messages would surely yield interesting data, this corpus only consists of posts which are the first in the thread they belong to.
The reasons behind this have to do with the primary purpose of my research, which is interested in how the perpetrator is referred to in these posts for the first time. On the assumption that the activation of the perpetrator in the first post of a thread would likely influence the mechanisms used in following posts, cross-post interaction has not yet been considered. In the attempt to contrast the discursive production within the online forum, the same number of words was collected from three of the many online communities on the same site. Accordingly, 40,000 words were collected from each of ‘Is it abuse?’ (122 unique posts), ‘Getting Out’ (163 posts) and ‘Life after an abusive relationship’ (187 posts), resulting in a total of 120,000 words. These three communities are hereafter referred to as SB1, SB2 and SB3 respectively. Full posts from the three communities are included in Table 1 below in the attempt to illustrate the type of discourse under investigation.

Forum community: illustrative post

SB1: What is abusive? Is it when they constantly need u around his relative is v I’ll and he’s saying he needs me someone close however I need to work night shifts so I’m knackered and I’m stressed myself [sad_emoji] I feel bad I’m not with him after my nights but he can’t sleep and he’s snappy cos he’s upset he tells me I’m selfish sometimes wen I don’t come over I just feel like a realty bad girlfriend I can’t take time off cos I’ve taken time off not so long ago for a death in my own family and I was sick few times plus my work he has been violently abusive towards me before snd actually gave me somewhere to live so it’s not a good look....

SB2: Advice and suggestions i’m having to flee again need to pack up & start again as he crushed my life again this time trying to do it all with laughter anyone got any practical tips on the subject ov securing permanent housing as feel 2 mentally unstable to mix with people but need new start & 2 rocky to think practically

SB3: Its been almost (information removed by moderator) months and i can honestly say i’ve broken the seal he used to brainwash me to the point i stopped drinking and going out socially although i have not mastered the going out to town with the gorls bit yet i finally felt confident safe and unguilty to enjoy myself over the new year and with friends i aint seen in ages! Massive sigh of relief! I had my first few drinks in a year! X

Table 1. Illustrative posts collected from the three forum communities

3.2. Applying LIWC to the analysis of discourse by IPV survivors

Given the pressing need to counteract claims of cherry-picking in CDS (Hart and Cap, 2014), there has been a gradual increase in the use of software tools to scrutinise texts within the field in particular and applied linguistics in general. One such tool, although not as widespread as software used for similar purposes (such as AntConc, WMatrix or Sketch Engine), is Linguistic Inquiry and Word Count (LIWC henceforth), developed by a team of social psychologists led by James Pennebaker at the University of Texas. In short, LIWC is a programme for quantitative text analysis that relies on word count strategies to investigate issues concerned with content analysis and style. It is based on the assumption that lexical choices made by people transmit psychological information over and above their literal meaning and independent of their semantic context (Pennebaker et al., 2007), which can in turn be used to make inferences about dimensions of individuals’ personalities (Tausczik and Pennebaker, 2010).
This tool processes speech samples by identifying their words and classifying them according to the three internal dictionaries that the LIWC2015 version has, which consist of almost 6,400 words, word stems and selected emoticons (LIWC, 2017).² The LIWC software provides the percentage-use indices of 80 standard linguistic categories of different types as they are represented in the texts submitted by the LIWC user. Apart from the word count of each file, this data record includes 4 language variables (analytical thinking, clout, authenticity and emotional tone), 21 standard categories identifying function words (% of pronouns, articles, auxiliary verbs, etc.) and 41 semantic categories dealing with psychological constructs (such as affect, cognition and biological processes). Additionally, although not so central to the motivation of this study, information is supplied regarding informal language markers (assents, fillers, swear words) and punctuation categories (periods, commas). Broadly speaking, this output measure is correlated with both personality and real-world outcome measures, which arguably capture people’s social and psychological statuses as represented in their discursive production.

² This paper is based on the LIWC 2015 version. More details on its development and psychometric properties can be found in Pennebaker et al. (2015).

LIWC has been applied to language-driven research in combination with more socially oriented issues. Generally speaking, Pennebaker (2011) has suggested that the frequency with which people use particular word categories can be directly linked with issues of power and social class or with people’s degree of social connectedness. More specifically, LIWC has been used in educational settings in order to predict final course performance based on the difference in thinking styles by comparing high-performing students with low-performing ones (Robinson, Navea and Ickes, 2014).
Additionally, perhaps closer to my research interests, LIWC has also been employed to scrutinise political discourse, using the software tool to try to measure aspects of personality dimensions in political speeches (Slatcher et al., 2007; Kangas, 2014). Nevertheless, to my knowledge, the investigation of online accounts of IPV by means of LIWC has not yet been attempted. Rather than focussing on how LIWC categories would reflect individuals’ real-world measures, I was interested in observing whether the distribution of word categories would vary if the above-mentioned communities within the same online forum were contrasted. To this end, I submitted each set of 40,000 words to LIWC, obtaining as a result the percentage of words belonging to each of the pre-established categories provided by LIWC. Although it is possible to think of some limitations to this (which shall be explored in the final section of this article), by doing so I sought to shed light on the bigger discursive picture of these three online sub-communities. Therefore, my main aim was to obtain a preliminary approximation to the discursive character of these three groups based on the LIWC categories, observations that would certainly need to be considered from a more contextualised perspective of the analysed discourse via a qualitative exploration of the data. All things considered, this piece of research is guided by the following research questions:

1. How can the application of text-analysis software tools such as LIWC contribute to a better understanding of the online discourse of women undergoing IPV-related experiences?
2. How can LIWC-provided categories shed light on the discursive characterisation of the three communities nested in the IPV online forum under scrutiny?

4. Analysis and discussion

This section presents the percentage-use indices of those LIWC categories that are deemed to be more pertinent for the purposes of this study.
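As a rough illustration of what a percentage-use index amounts to, the following Python sketch counts the share of tokens in a text that match each category of a small word list, treating entries ending in an asterisk as stems. It is only a sketch of the general word-count strategy: the toy dictionary, its category labels and the function names are hypothetical and do not reproduce the LIWC2015 dictionary or the software's actual implementation.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary (NOT the LIWC2015 dictionary).
# A trailing "*" marks a stem that matches any token starting with it.
TOY_DICTIONARY = {
    "posemo": ["happy", "relie*", "safe"],
    "negemo": ["sad", "upset", "stress*"],
    "i": ["i", "me", "my"],
}


def tokenize(text):
    """Lower-case the text and return its word tokens."""
    normalised = text.lower().replace("’", "'")
    return re.findall(r"[a-z]+(?:'[a-z]+)?", normalised)


def matches(token, entry):
    """Exact match, or prefix match when the entry ends in '*'."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def category_percentages(text, dictionary):
    """Return {category: percentage of all tokens matching that category}."""
    tokens = tokenize(text)
    if not tokens:
        return {category: 0.0 for category in dictionary}
    counts = Counter()
    for token in tokens:
        for category, entries in dictionary.items():
            if any(matches(token, entry) for entry in entries):
                counts[category] += 1
    return {c: 100.0 * counts[c] / len(tokens) for c in dictionary}


if __name__ == "__main__":
    sample = "I feel safe and relieved but my sister is upset"
    for category, share in category_percentages(sample, TOY_DICTIONARY).items():
        print(f"{category}: {share:.2f}%")
```

Because every index is expressed relative to the total number of tokens, subcorpora of equal size (here, 40,000 words each) can be set side by side without further normalisation.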
In fact, the output of these LIWC categories is used to organise this section into several subsections, which present and discuss the implications of those percentages for the social issue under investigation. It is worth pointing out that statistical treatment of these figures is complex given that this study does not account for individuals’ discursive production but, rather, understands the language production in each of the three analysed communities as embedded in the socio-cognitive approach to discourse. Nonetheless, note that the number of words in each of them is always the same (40,000), thus enabling the contrast between them. For discussion purposes, I normally take the online community including users writing about life after abuse (SB3) as a reference, paying special attention to increasing or decreasing patterns when the other two are considered.

4.1. Language variables: analytical thinking, clout, authenticity and emotional tone

Among the many categories LIWC uses to classify words, there are seven that fall within the group “summary language variables” (Pennebaker et al., 2015). It is possible to obtain information about the words per sentence (WPS), the percentage of words with more than six letters (Sixltr) or the percentage of words which appear in LIWC dictionaries (Dic). Although these categories have been used in research to trace correlations between them and the complexity of thinking styles, in this article I will be focussing on the remaining four: analytical thinking (Analytic), clout (Clout), authenticity (Authentic) and emotional tone (Tone). Quite remarkably, these four categories are based on findings from previous research carried out by the developers of the tool, references to which will be pointed out accordingly. It may be useful to briefly explain these four categories.
First, the category analytical thinking is thought to capture the extent to which words may indicate formal, logical and hierarchical thinking patterns (LIWC, 2017; Pennebaker et al., 2014). Results from educational contexts have suggested that a low percentage in this category may imply using language in more narrative ways, focussing on the here-and-now and leaving more room for personal experiences (Pennebaker et al., 2014). Second, clout refers to “the relative social status, confidence, or leadership that people display through their writing” (LIWC, 2017; Kacewicz et al., 2013), a high number suggesting a more expert and confident style whereas a low number would indicate a more tentative or even anxious style (Pennebaker et al., 2014). Third, the algorithm for authenticity derives from a series of studies indicating that when people reveal themselves in authentic or honest ways, they are prone to be more personal, humble and vulnerable (LIWC, 2017; Newman et al., 2003). Fourth, emotional tone seems to be more straightforward to interpret, since the higher the percentage, the more positive the tone (LIWC, 2017; Cohn et al., 2004). Having considered these categories and what they stand for, it seems timely to present the outcome measures (%) for the analysed corpus. As illustrated in Table 2 below, two different tendencies can be observed if both the forum communities and the summary language variables are compared. On the one hand, especially if the first and third stages are compared, there is an increase in categories referring to analytical thinking (+2.88%), emotional tone (+7.84%) and authenticity (+13.76%). On the other hand, the clout category seems to behave differently, with a higher percentage of words in the first community than in the third one (-8.38%).
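The differences quoted here are simple subtractions between the SB3 and SB1 values reported in Table 2, that is, differences in percentage points rather than relative changes. A short Python sketch makes the arithmetic explicit; the variable and function names are illustrative assumptions and are not part of LIWC.

```python
# Values as reported in Table 2 (summary language variables, in %).
TABLE_2 = {
    "SB1": {"Analytic": 17.61, "Clout": 48.21, "Authentic": 62.72, "Tone": 6.23},
    "SB2": {"Analytic": 18.45, "Clout": 41.42, "Authentic": 70.50, "Tone": 10.59},
    "SB3": {"Analytic": 20.49, "Clout": 39.83, "Authentic": 76.48, "Tone": 14.07},
}


def difference(table, variable, start="SB1", end="SB3"):
    """Difference in percentage points between two communities (2 decimals)."""
    return round(table[end][variable] - table[start][variable], 2)


if __name__ == "__main__":
    for variable in TABLE_2["SB1"]:
        print(f"{variable}: {difference(TABLE_2, variable):+.2f}")
```

The same helper can be pointed at any pair of communities, for instance `difference(TABLE_2, "Clout", start="SB1", end="SB2")`, to check whether a change is gradual across the three stages or concentrated in one of them.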
Summary language variables (LIWC)

Forum community           Analytical   Clout   Authenticity    Tone
SB1 ‘Is it abuse?’             17.61   48.21          62.72    6.23
SB2 ‘Getting out’              18.45   41.42          70.50   10.59
SB3 ‘Life after abuse’         20.49   39.83          76.48   14.07

Table 2. LIWC summary language variables (in %)

If the brief considerations above are taken into account, one of the most notable contrasts is observed when the emotional tone of these three communities is considered (+7.84%). LIWC can be employed to suggest that women writing in ‘Life after abuse’ show a more positive emotional tone than those contributing to ‘Is it abuse?’, an observation which was somewhat expected. Furthermore, it can be argued that discourse in the ‘Life after abuse’ subcorpus responds to a more analytical pattern than discourse in ‘Is it abuse?’. A lower percentage in the latter may therefore point to a stronger focus on the here-and-now and on personal experiences, together with a tendency to offer more narrative accounts of these users’ experiences with IPV. The tendency to express themselves in more personal and humble ways is also reinforced by the higher percentage measuring authenticity found in the third community, which additionally represents the most noticeable contrast (+13.76%) if these four LIWC categories are taken into account. Surprisingly, though, this would also suggest a greater degree of vulnerability among users of this community. Neither had I foreseen a lower percentage for the category clout in the third subcorpus, especially because this could be seen as a characteristic of a more tentative, humble or even anxious style. These results would be at odds with my original expectations and would also contradict results in some other categories that will be discussed later in this paper.

4.2. Pronominal distribution

It goes without saying that a critical approach to the study of pronouns has been traditionally central to CDS research, since they can convey key information concerning issues of power and dominance (Van Dijk, 1993). As a result, given that they are frequently used as remote sensors of group dynamics (Kacewicz et al., 2012), pronouns are at the core of studies seeking to draw conclusions on the discursive construction of collective identities (Koller, 2008), since they can be used to identify focus, priorities and intentions (Tausczik and Pennebaker, 2010). Not unexpectedly, LIWC caters for this need in any language-driven inquiry and provides percentages for a wide range of pronominal information. There is an interesting number of findings deriving from the application of LIWC to social issues. For instance, there seems to be a correlation between people who are undergoing physical or emotional pain and a higher tendency to use first-person singular pronouns (Rude, Gortner & Pennebaker, 2004). In a similar vein, studies have also shown that couples using the first-person plural pronoun assessed the quality of their marriage more positively than those who did not (Simmons, Gordon and Chambless, 2005). More broadly speaking, research combining LIWC and pronouns has also explored political (Gunsch et al., 2000) and academic discourses (Kowalski, 2000). Table 3 below depicts the percentages offered by LIWC once the three subcorpora under scrutiny were processed. It is worth mentioning, though, that the figures indicating the percentage of ‘he’ needed to be measured with AntConc (Anthony, 2014), since the version of LIWC used for this analysis makes no distinction between he and she. This can arguably be seen as one of the major shortcomings of the tool.
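Since LIWC reports he and she under a single category, the share of he has to be obtained separately. A minimal Python sketch of such a count follows, assuming simple regular-expression tokenisation. It does not reproduce AntConc's own procedure: the function name, the `include_contractions` parameter and the decision to count forms such as "he's" under he are illustrative assumptions.

```python
import re


def pronoun_percentage(text, pronoun, include_contractions=True):
    """Return the percentage of word tokens in `text` realised by `pronoun`.

    Matching is case-insensitive. When `include_contractions` is True,
    forms such as "he's" or "he'd" are counted under "he" (an assumption
    made for this sketch, not a documented AntConc or LIWC behaviour).
    """
    # Normalise typographic apostrophes so that "he’s" stays one token.
    normalised = text.lower().replace("’", "'")
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", normalised)
    if not tokens:
        return 0.0
    target = pronoun.lower()
    hits = 0
    for token in tokens:
        head = token.split("'")[0] if include_contractions else token
        if head == target:
            hits += 1
    return 100.0 * hits / len(tokens)


if __name__ == "__main__":
    sample = "He said she was late and then he left early"
    print(f"he:  {pronoun_percentage(sample, 'he'):.2f}%")
    print(f"she: {pronoun_percentage(sample, 'she'):.2f}%")
```

Whether contracted forms are included can shift the figure noticeably in informal forum writing, so the choice should be stated alongside any percentage reported in this way.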
Notwithstanding this limitation, LIWC can provide interesting insights into the way pronouns are used across the three subcorpora. Based on the data, it is possible to observe that the general use of personal pronouns is less salient in ‘Life after abuse’ (-1.24%), especially if the first and third stages are juxtaposed. A similar pattern is observed in the case of he (-1.62%) and we (-0.14%). However, the use of the first-person pronoun I (+0.39%), the pronoun you (+0.12%) and instances of they (+0.25%) do increase in SB3 if compared to SB1.

Pronouns (LIWC)

Forum community           PPRON      I     WE    YOU     HE   THEY
SB1 ‘Is it abuse?’        17.47   9.53   0.86   0.48   3.84   0.45
SB2 ‘Getting out’         16.77   9.91   0.78   0.41   2.82   0.43
SB3 ‘Life after abuse’    16.23   9.92   0.72   0.60   2.22   0.70

Table 3. Pronominal distribution (in %)

The way in which pronouns are used across these forum communities may have several interpretations, as suggested by the variation in percentages shown in Table 3 above. As far as the use of the first-person pronoun is concerned, there is a higher tendency to make use of it when users post in ‘Life after abuse’. Based on similar studies (Rude, Gortner & Pennebaker, 2004), this would suggest a higher index of psychological and emotional distress in stages where abuse has been somewhat internalised, since many posts in ‘Is it abuse?’ show frequent instances of hesitation in the attempt to comprehend whether users’ particular situations should be considered abusive by the rest of the online community. Interesting information can also be obtained by observing the distribution of the pronoun you. Although the nature of the pronoun you in English makes it difficult to differentiate whether reference is being made to singular or plural entities, the increasing use of you in ‘Life after abuse’ can be interpreted as a more consistent attempt to refer directly to potential readers of the post (women in similar situations).
In fact, many posts in this final stage adopt a more encouraging nuance when providing support. Besides, it seems clear that the pronoun he, undoubtedly one of the most common mechanisms to refer to the perpetrator in these online communities, becomes less and less central in these users’ discourse when posting in ‘Life after abuse’. It could be hypothesised that this may be due to the fact that the perpetrator is given less discursive prominence in the final phase, when abuse seems to be a past event (note the use of the preposition after in the very name of the community) and the social actor responsible for it is gradually replaced. Nonetheless, a rather different interpretation is also feasible if attention is paid to the evolution of the pronoun they. Given the prominence that the third-person plural pronoun gains in SB3 if contrasted with SB1, this could also be understood as a discursive collectivisation of the perpetrator. To put it differently, there may be a discursive drift from representing the perpetrator in individual terms (he) to collective ones (they), which may have been partly influenced by the mere use of the forum itself and by the process of generating a stronger bond (favouring references to us as women users against them, the perpetrators). Nonetheless, as usually happens when working with decontextualised instances of data, a more qualitative exploration of the text would be required to pin down the social actors behind these referential devices (since he could refer to a male child and they could also possibly stand for my friends).

4.3. Analysing emotionality: positive and negative emotions

Studies combining linguistic analyses and psychological processes within major social phenomena have shown that LIWC is capable of providing accurate identification of emotion in language use (Tausczik and Pennebaker, 2004; Kahn et al., 2007).
This research is driven by the assumption that the different degrees and mechanisms by which people express their emotions are fundamental to comprehending how they are experiencing the world (Tausczik and Pennebaker, 2004). Not surprisingly, LIWC has been applied to the exploration of emotionality in trauma and health discourses in different contexts, such as cancer (Bantum and Owen, 2009) or relationship narratives (Boals and Klein, 2005). Moreover, there have been attempts to examine narratives by IPV survivors (Holmes et al., 2007). Although the analysis was based on 32 volunteers in non-CMC contexts, a LIWC scan found that making use of more positive and negative emotion words to talk about their experiences with violence prompted increased feelings of physical pain over the writing sessions, concluding that the higher the use of emotion words, the greater the perceived immersion in the traumatic event. LIWC measurements for emotionality in the corpus under inspection are depicted in Table 4 below. Broadly speaking, LIWC is able to identify emotions in two broad spectra: positive and negative emotions. More specifically, it can detect three subtypes of negative emotions (anxiety, anger and sadness). As Table 4 shows, the amount of positive emotions increases within ‘Life after abuse’ if compared to ‘Is it abuse?’ (+0.71%). Conversely, the percentage measuring negative emotions decreases in the third community if compared to the first one (-0.22%), although this decrease would be even more significant if the second community were to be taken into account (-0.34%). Curious results can be observed if the types of negative emotions are compared. Accordingly, words measuring ‘anxiety’ increase from SB1 to SB3 (+0.10%), although a higher peak is observed in SB2 (+0.15%). With regard to ‘anger’, however, percentages decline if ‘Life after abuse’ and ‘Is it abuse?’ are compared (-0.24%).
Interestingly, the percentages for sadness show a more stable distribution across the three communities, with only a slight deviation between the final and the initial stage (-0,01%).

These results yield thought-provoking interpretations. On the one hand, judging especially from the deviation in percentages between SB1 and SB3, the emotional tone across these communities seems to be distinctive. Thus, whereas lexical choices categorised as positive are more salient in ‘Life after abuse’, a more negative nuance is perceived in ‘Is it abuse?’. This is a somewhat expected finding, since a more optimistic sort of narrative was more likely to permeate the overall discursive scheme within this community. This becomes more evident if posts within this community are analysed in qualitative terms, as they are generally characterised by very supportive messages that seek to encourage other users at this stage. By contrast, a more negative emotional tone prevails within the first community, which again is understandable bearing in mind that many of these users turn to this community to share their experiences so that other users can give their views on the abusive character of these situations.

Forum communities          Emotionality (LIWC)
                           POSEMO   NEGEMO   ANX    ANGER   SAD
SB1 ‘Is it abuse?’         2,03     3,81     0,67   1,32    0,79
SB2 ‘Getting out’          2,27     3,47     0,82   1,11    0,72
SB3 ‘Life after abuse’     2,74     3,59     0,77   1,08    0,78

Table 4. Analysing emotionality with LIWC (in %)

On the other hand, the evolution of more nuanced negative emotions is worth noting. Unlike the fairly even distribution across the three communities of lexical items belonging to the LIWC category ‘sad’, a somewhat divergent tendency is perceived when the focus is on ‘anxiety’ and ‘anger’. In fact, based on the results in Table 4 above, lexical choices suggesting a higher degree of anxiety reach their peak in ‘Getting out’.
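The pairwise differences quoted in this subsection can be re-derived from the values in Table 4. What follows is a minimal sketch of that arithmetic and not part of the original analysis: the names `table4` and `diff` are illustrative assumptions, decimal commas are written as points, and the differences are expressed in percentage points.

```python
# Values transcribed from Table 4 (LIWC output, in % of words per subcorpus).
table4 = {
    "SB1": {"posemo": 2.03, "negemo": 3.81, "anx": 0.67, "anger": 1.32, "sad": 0.79},
    "SB2": {"posemo": 2.27, "negemo": 3.47, "anx": 0.82, "anger": 1.11, "sad": 0.72},
    "SB3": {"posemo": 2.74, "negemo": 3.59, "anx": 0.77, "anger": 1.08, "sad": 0.78},
}


def diff(category, later, earlier):
    """Difference in percentage points between two subcorpora (hypothetical helper)."""
    return round(table4[later][category] - table4[earlier][category], 2)


# Print the SB3-SB1 and SB2-SB1 differences for every category.
for cat in ("posemo", "negemo", "anx", "anger", "sad"):
    print(f"{cat:7s} SB3-SB1: {diff(cat, 'SB3', 'SB1'):+.2f}   "
          f"SB2-SB1: {diff(cat, 'SB2', 'SB1'):+.2f}")
```

Laid out this way, it is easy to see that the anxiety peak lies in SB2 (+0.15 over SB1) while SB3 stands at +0.10 over SB1.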
This may imply that women undergoing IPV feel more anxious when, having acknowledged that they are being abused, they are in the process of leaving the abusive relationship. However, traces of ‘anger’ in the corpus under scrutiny seem to be more present at the initial stage (SB1) and decrease gradually towards the final phase (SB3).

4.4. Acting in particular ways: the drives behind these forum users

Although somewhat less covered by previous studies using LIWC, another interesting set of categories is the one grouped under the umbrella term ‘drives’ (Pennebaker et al., 2015). Broadly speaking, LIWC attempts to offer insights into the feelings that make language users act in particular ways. Five subcategories are considered for these purposes, each relying on lexical items such as the following: affiliation (ally, friend, social), achievement (win, success, better), power (superior), reward (take, prize, benefit) and risk (danger, doubt) (Pennebaker et al., 2015). The LIWC analysis of these forum users’ drives is summarised in Table 5 below. Lexical items measuring the degree of affiliation decrease in ‘Life after abuse’ compared with ‘Is it abuse?’ (-0,08%), although not remarkably. The difference is equally slight for ‘achieve’, with a somewhat higher value in SB3 than in SB1 (+0,07%). Clearer divergences are found, however, in the remaining three categories. When it comes to quantifying levels of ‘power’ as encapsulated by the lexical choices across the three communities, a more significant difference is found in SB3 when contrasted with the two previous stages (+0,30% compared with SB1, +0,44% compared with SB2). A higher percentage is also observed in ‘Life after abuse’ as far as ‘reward’ is concerned (+0,26% compared with SB1).
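The category percentages discussed here come from dictionary-based word counting. The sketch below illustrates the general idea only. It assumes a toy dictionary built from the example words listed above, a simple regular-expression tokeniser and an invented sample sentence; the function name `category_percentages` is hypothetical, and the actual LIWC2015 dictionary and matching rules are far more extensive and are not reproduced here.

```python
import re

# Toy dictionary built only from the example words quoted above;
# an assumption for illustration, not the real LIWC2015 dictionary.
DRIVES = {
    "affiliation": {"ally", "friend", "social"},
    "achievement": {"win", "success", "better"},
    "power": {"superior"},
    "reward": {"take", "prize", "benefit"},
    "risk": {"danger", "doubt"},
}


def category_percentages(text, dictionary=DRIVES):
    """Return, per category, the share of tokens (in %) found in its word list."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {cat: 0.0 for cat in dictionary}
    return {
        cat: 100 * sum(tok in words for tok in tokens) / len(tokens)
        for cat, words in dictionary.items()
    }


# Invented sample sentence, used only to exercise the function.
sample = "My friend said things would get better, but I still doubt it."
print(category_percentages(sample))
```

Because each value is a share of all tokens in the text, small absolute differences between subcorpora, such as those in Table 5, can still correspond to many word occurrences in a large corpus.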
Contrary to this tendency, lexical choices measuring ‘risk’ are less frequent in SB3 than in SB1 (-0,24%).

Forum communities          Drives (LIWC)
                           AFFILIATION   ACHIEVE   POWER   REWARD   RISK
SB1 ‘Is it abuse?’         2,54          1,06      2,30    1,18     0,95
SB2 ‘Getting out’          2,32          1,13      2,16    1,34     0,79
SB3 ‘Life after abuse’     2,46          1,13      2,60    1,44     0,71

Table 5. Forum users’ drives according to LIWC (in %)

As suggested by the percentages above, some of these areas show a degree of divergence that may point to significant alterations in the discursive characterisation of the online communities explored for this study. One of the categories that particularly catches my attention is the one linked to power. As pointed out in the first subsection above, the lower percentage obtained for the category ‘clout’ in the third subcorpus can be interpreted as characteristic of a more tentative, humble or even anxious style in SB3. This interpretation seems to be at odds with the evolution of lexical choices fitting in the ‘power’ category. In fact, such a firm increase in SB3 (+0,30% relative to SB1, +0,44% relative to SB2) matches my original expectations, which presumed traces of empowered discourse in ‘Life after abuse’. In a similar vein, this trend is reinforced by the evolution of risk. Lexical choices connected to risk are less frequent in SB3 than in SB1, which again matches my original expectations, since these equated ‘Is it abuse?’ with a stage characterised by a higher presence of risks and challenges for women undergoing IPV.

5. Concluding remarks

This article has sought to demonstrate how a software tool for quantitative text analysis (LIWC) can effectively be employed to provide corpus-driven insights into the micro-level of discourse by an online community of women who have at some point experienced IPV in their lives.
This study has made use of some of the most relevant linguistic categories measured by LIWC to investigate the discursive frames that characterise three online communities within an online forum whose users engage in narratives that seek to provide assistance and help to other participants in this online environment. It is worth recalling that a key research question in this article was to explore the ways in which text-analysis software tools such as LIWC can contribute to a better understanding of the online discourse of women undergoing IPV-related experiences. For this purpose, and by scrutinising the output measures provided by LIWC (in percentages), results have shown how collective identity is forged within these three online communities and the ways in which this permeates the discourse they use. These findings are of particular relevance if framed within the socio-cognitive approach to discourse, tenets of which are also highlighted.

Exploring the possible ways in which LIWC-provided categories can shed light on the discursive characterisation of the three online communities investigated in this paper was another central research question. With this in mind, the analysis provided by LIWC has yielded interesting observations. As noted earlier, users in ‘Is it abuse?’ are characterised by a negative emotional tone, which becomes more positive in ‘Life after abuse’. Additionally, users within ‘Life after abuse’ seem to express themselves in more personal and humble ways, as supported by a higher percentage in the category measuring authenticity. The fact that LIWC is capable of offering a detailed account of pronominal distribution in a given corpus paves the way for conclusions based on the usage of pronouns.
Although more qualitative explorations would be crucial to reinforce the validity of these arguments, the decreasing use of the third-person singular pronoun (he) in ‘Life after abuse’ may indicate that discourses around the perpetrator weaken in the third community. Nevertheless, based again on the percentage that LIWC offers for the third-person plural pronoun (they), another feasible interpretation would view this changing pattern as a process of collectivisation of the perpetrator. Thus, influenced by exposure to socio-cognitive representations of the perpetrator in the forum, users can be said to move from an individualised referential strategy (he) to a collective one (they).

Furthermore, LIWC can also assist in measuring lexical emotionality. As argued above, negative emotions evolve in divergent ways when the three corpora are contrasted. Whereas lexical indicators of sadness prove to be more uniform across the three communities, indicators of anxiety seem to be more pervasive at the intermediate stage (‘Getting out’) than at the first one, while those that measure anger are more likely to occur at the outset. Relatedly, LIWC can also be used to suggest a gradual discursive empowerment in users writing in ‘Life after abuse’, which may to some extent mirror a change in behavioural terms at this final stage.

Useful though these pointers may be to build bridges between the micro and the macro levels of discourse, results deriving solely from quantitative explorations need to be treated with due caution. As stated by the main developers of LIWC itself, “the study of word use as a reflection of psychological state is in its early stages” (Tausczik and Pennebaker, 2010:30).
This is one of the reasons why future research in this field could aim at incorporating similar text-analysis tools such as Lingmotif (Moreno-Ortiz, 2016) to investigate whether these tools and their different affordances yield complementary results. In any case, although the incorporation of corpus-driven approaches to discourse analysis has proved efficient in building language analyses upon more empirically based findings, the limitations of corpus linguistics need to be considered and addressed. As already mentioned, making strong claims on the basis of pronoun usage may trigger misleading interpretations of any discursive event. Besides lacking access to context, software tools are still not well equipped with mechanisms to deal with figurative language or ironic and sarcastic references. Consequently, studies aiming at providing a holistic view of a discursive phenomenon should always leave room for qualitative examinations, which can usually account for many of the aforementioned drawbacks.

Acknowledgements

This research is funded by the Spanish Ministry of Education (FPU1304471). I would also like to thank both reviewers for their interesting comments and observations, which I have incorporated in the final version of this paper. Likewise, my wholehearted gratitude to the editors of this volume for their editorial initiative and their admirable hard work in compiling all the contributions into a comprehensive and harmonious whole.

References

Ali, Parveen Azam & Naylor, Paul. 2013. Intimate partner violence: A narrative review of the feminist, social and ecological explanations for its causation. Aggression and Violent Behavior 18(6): 611-619. doi: https://doi.org/10.1016/j.avb.2013.01.003
Anthony, Lawrence. 2011. AntConc (Version 3.2.2) [Computer Software]. Tokyo: Waseda University.
Augoustinos, Martha; Walker, Iain & Donaghue, Ngaire. 2006. Social Cognition: An Integrated Introduction (2nd ed.). London: Sage.
Baker, Paul. 2006. Using Corpora in Discourse Analysis. London: Continuum.
Baker, Paul. 2008. Sexed texts: Language, Gender and Sexuality. London: Equinox.
Baker, Paul; Gabrielatos, Costas; Khosravinik, Majid; Krzyżanowski, Michał; McEnery, Tony & Wodak, Ruth. 2008. A useful methodological synergy? Combining critical discourse analysis and corpus linguistics to examine discourses of refugees and asylum seekers in the UK press. Discourse & Society 19(3): 273-306. doi: 10.1177/0957926508088962
Bantum, Erin O’Carroll & Owen, Jason. 2009. Evaluating the validity of computerized content analysis programs for identification of emotional expression in cancer narratives. Psychological Assessment 21(1): 79. doi: 10.1037/a0014643
Boals, Adriel & Klein, Kitty. 2005. Word use in emotional narratives about failed romantic relationships and subsequent mental health. Journal of Language and Social Psychology 24(3): 252-268. doi: 10.1177/0261927X05278386
Boonzaier, Floretta. 2008. “If the man says you must sit, then you must sit.” The relational construction of woman abuse: Gender, subjectivity and violence. Feminism & Psychology 18(2): 183-206. doi: https://doi.org/10.1177/0959353507088266
Baly, Andrew. 2010. Leaving abusive relationships: Constructions of self and situation by abused women. Journal of Interpersonal Violence 25(12): 2297-2315. doi: https://doi.org/10.1177/0886260509354885
Bou-Franch, Patricia. 2013. Domestic Violence and Public Participation in the Media: The Case of Citizen Journalism. Gender and Language 7(3): 275-302. doi: 10.1558/genl.v7i3.275
Burguess, Anne & Crowell, Nancy. 1996. Understanding violence against women. Washington: National Academy Press.
Campbell, Jacquelyn. 2002. Health consequences of intimate partner violence. The Lancet 359(9314): 1331-1336. doi: http://dx.doi.org/10.1016/S0140-6736(02)08336-8
Cohn, Michael; Mehl, Matthias & Pennebaker, James. 2004. Linguistic Markers of Psychological Change Surrounding September 11, 2001. Psychological Science 15: 687-693. doi: 10.1111/j.0956-7976.2004.00741.x
Dartnall, Elizabeth & Jewkes, Rachel. 2013. Sexual violence against women: the scope of the problem. Best Practice & Research Clinical Obstetrics & Gynaecology 27(1): 3-13. doi: 10.1016/j.bpobgyn.2012.08.002
Dobash, Rebecca & Dobash, Russell. 2015. Domestic Violence: Sociological Perspectives. International Encyclopedia of the Social & Behavioral Sciences (2nd ed.). Elsevier, 632-635. doi: 10.1016/B0-08-0430767/03935-8
Dutton, Donald & Nicholls, Tonia. 2005. The gender paradigm in domestic violence research and theory: Part 1, The conflict of theory and data. Aggression and Violent Behavior 10(6): 680-714. doi: 10.1016/j.avb.2005.02.001
Fairclough, Norman. 2015. Language and Power (3rd ed.). London: Routledge.
Gunsch, Mark; Brownlow, Sarah & Mabe, Zachary. 2000. Differential linguistic content of various forms of political advertising. Journal of Broadcasting & Electronic Media 44(1): 27-42. doi: http://dx.doi.org/10.1207/s15506878jobem4401_3
Harris, Kate; Palazzolo, Kellie & Savage, Matthew. 2012. “I’m not sexist, but...”: How ideological dilemmas reinforce sexism in talk about intimate partner violence. Discourse & Society 23(6): 643-656. doi: 10.1177/0957926512455382
Hart, Christopher & Cap, Piotr (ed.). 2014. Contemporary Critical Discourse Studies. London: Bloomsbury Publishing.
Heise, Lori. 1998. Violence against women: An integrated, ecological framework. Violence Against Women 4(3): 262-290. doi: 10.1177/1077801298004003002
Heise, Lori & García-Moreno, Claudia. 2002. Violence by intimate partners. In Krug, E.; Dahlberg, L. L.; Mercy, J. A.; Zwi, A. B. & Lozano, R. (ed.) World Report on Violence and Health. Geneva: World Health Organization, 88-121.
Holmes, Danielle; Alper, Georg; Ismailji, Tasneem; Classen, Catherine; Wales, Talor; Cheasty, Valerie; Miller, Andrew & Koopman, Cheryl. 2007. Cognitive and emotional processing in narratives of women abused by intimate partners. Violence Against Women 13(11): 1192-1205. doi: 10.1177/1077801207307801
Kacewicz, Ewa; Pennebaker, James; Davis, Matthew; Jeon, Moongee & Graesser, Arthur. 2013. Pronoun use reflects standings in social hierarchies. Journal of Language and Social Psychology 33(2): 125-143. doi: 10.1177/0261927X13502654
Kahn, Jeffrey; Tobin, Renee M.; Massey, Audra & Anderson, Jennifer. 2007. Measuring emotional expression with the Linguistic Inquiry and Word Count. The American Journal of Psychology: 263-286. doi: 10.2307/20445398
Kangas, Sara. 2014. What can software tell us about political candidates?: A critical analysis of a computerized method for political discourse. Journal of Language and Politics 13(1): 77-97. doi: 10.1075/jlp.13.1.04kan
KhosraviNik, Majid. 2010. Actor descriptions, action attributions, and argumentation: towards a systematization of CDA analytical categories in the representation of social groups. Critical Discourse Studies 7(1): 55-72. doi: 10.1080/17405900903453948
Koller, Veronika. 2008. Lesbian Discourses: Images of a Community. London: Routledge.
Koller, Veronika. 2014. Applying Social Cognition Research to Critical Discourse Studies: The Case of Collective Identities. In Hart, Christopher & Cap, Piotr (ed.) Contemporary Critical Discourse Studies. London: Bloomsbury, 147-166.
Kowalski, Robin. 2000. “I was only kidding!”: Victims’ and perpetrators’ perceptions of teasing. Personality and Social Psychology Bulletin 26(2): 231-241. doi: 10.1177/0146167200264009
Krug, Etienne; Dahlberg, L. L.; Mercy, James; Zwi, Anthony & Lozano, Rafael (ed.). 2002. World Report on Violence and Health. Geneva, Switzerland: World Health Organization.
Kumar, Anant; Nizamie, S. Haque & Srivastava, Naveen. 2013. Violence against women and mental health. Mental Health & Prevention 1(1): 4-10. doi: https://doi.org/10.1016/j.mhp.2013.06.002
LIWC. 2017. Where do the numbers come from? How are they calculated? https://liwc.wpengine.com/interpreting-liwc-output/ [Accessed 22/03/2017].
Markham, Annette & Buchanan, Elizabeth. 2012. Ethical decision-making and internet research: Recommendations from the AOIR ethics working committee (version 2.0). http://www.dphu.org/uploads/attachements/books/books_5612_0.pdf [Accessed 21/03/2017].
Marín Arrese, Juana Isabel. 2011. Effective vs. epistemic stance and subjectivity in political discourse: Legitimising strategies and mystification of responsibility. Critical Discourse Studies in Context and Cognition. Amsterdam: John Benjamins, 193-224.
Moreno-Ortiz, A. 2016. Lingmotif 1.0 [Computer Software]. Málaga, Spain: Universidad de Málaga. http://tecnolengua.uma.es/lingmotif.
Newman, Matthew; Pennebaker, James; Berry, Diane & Richards, Jane. 2003. Lying words: Predicting deception from linguistic styles. Personality and Social Psychology Bulletin 29: 665-675. doi: 10.1177/0146167203029005010
Nicholls, Tonia & Dutton, Donald. 2001. Abuse committed by women against male intimates. Journal of Couples Therapy 10(1): 41-57. doi: 10.1300/J036v10n01_04
Nissenbaum, Helen. 2010. Privacy in context: Technology, policy, and the integrity of social life. Stanford: Stanford University Press.
Office for National Statistics. 2015. Intimate partner violence and partner abuse. https://www.ons.gov.uk/peoplepopulationandcommunity/crimeandjustice/compendium/focusonviolentcrimeandsexualoffences/yearendingmarch2015/chapter4intimatepersonalviolenceandpartnerabuse [Accessed 03/03/2017].
Pennebaker, James; Booth, Roger & Francis, Martha. 2007. Linguistic Inquiry and Word Count: LIWC [Computer software]. Austin: LIWC.net
Pennebaker, James. 2011. The Secret Life of Pronouns: What Our Words Say About Us. New York: Bloomsbury.
Pennebaker, James; Chung, Cindy; Frazee, Joey; Lavergne, Gary & Beaver, David. 2014. When small words foretell academic success: The case of college admissions essays. PLoS ONE 9(12): e115844. doi: https://doi.org/10.1371/journal.pone.0115844
Pennebaker, James; Boyd, Ryan; Jordan, Kayla & Blackburn, Kate. 2015. The development and psychometric properties of LIWC2015. Texas: The University of Texas. doi: 10.15781/T29G6Z
Robinson, Rebecca; Navea, Reanelle & Ickes, William. 2013. Predicting final course performance from students’ written self-introductions: A LIWC analysis. Journal of Language and Social Psychology 32(4): 469-479. doi: 10.1177/0261927X13476869
Rude, Stephanie; Gortner, Eva-Maria & Pennebaker, James. 2004. Language use of depressed and depression-vulnerable college students. Cognition & Emotion 18(8): 1121-1133. doi: 10.1080/02699930441000030
Santaemilia, José & Maruenda, Sergio. 2014. The linguistic representation of gender violence in (written) media discourse. Journal of Language Aggression and Conflict 2(2): 249-273.
Shipley, Thomas & Zacks, Jeffrey (ed.). 2008. Understanding Events. From Perception to Action. Oxford: Oxford University Press.
Simmons, Rachel; Gordon, Peter & Chambless, Dianne. 2005. Pronouns in Marital Interaction: What Do “You” and “I” Say About Marital Health? Psychological Science 16(12): 932-936. doi: 10.1111/j.14679280.2005.01639
Slatcher, Richard; Chung, Cindy; Pennebaker, James & Stone, Lori. 2007. Winning words: Individual differences in linguistic style among US presidential and vice presidential candidates. Journal of Research in Personality 41(1): 63-75. doi: 10.1016/j.jrp.2006.01.006
Stokoe, Elizabeth. 2010. “I’m not gonna hit a lady”: Conversation analysis, membership categorization and men’s denials of violence towards women. Discourse & Society 21(1): 59-82. doi: http://dx.doi.org/10.1177/0957926509345072
Stubbs, Michael. 1997. Whorf’s children: critical comments on critical discourse analysis. In Ryan, Ann & Wray, Alison (ed.) Evolving Models of Language. Clevedon: Multilingual Matters, 100-116.
Sunderland, Jane & Litosseliti, Lia. 2002. Gender identity and discourse analysis: Theoretical and empirical considerations. Gender Identity and Discourse Analysis. Amsterdam: John Benjamins, 1-39.
Tausczik, Yla & Pennebaker, James. 2010. The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology 29: 24-54. doi: https://doi.org/10.1177/0261927X09351676
Van Dijk, Teun. 1993. Elite Discourse and Racism. London: Sage.
Van Dijk, Teun. 2014. Discourse-Cognition-Society. Current state and prospects of the Socio-Cognitive Approach to Discourse. In Hart, Christopher & Cap, Piotr (ed.) Contemporary Critical Discourse Studies. London: Bloomsbury, 121-146.
Van Dijk, Teun. 2015. Racism and the Press. London: Routledge.
Walby, Sylvia; Towers, Jude; Balderston, Susan; Corradi, Consuelo; Francis, Brian; Heiskanen, Markku; Helweg-Larsen, Karin et alii (ed.). 2017. The Concept and Measurement of Violence against Women and Men. Bristol: Policy Press.
Walker, Lenore. 2015. Looking back and looking forward: Psychological and legal interventions for domestic violence. Ethics, Medicine and Public Health 1(1): 19-32. doi: 10.1016/j.jemep.2015.02.002
Widdowson, Henry. 2004. Text, Context, Pretext: Critical Issues in Discourse Analysis. Oxford: Blackwell.
Winstok, Zeev & Sowan-Basheer, Wafa. 2015. Does psychological violence contribute to partner violence research? A historical, conceptual and critical review. Aggression and Violent Behavior 21: 5-16. doi: 10.1016/j.avb.2015.01.003
Wodak, Ruth & Fairclough, Norman. 2004. Critical discourse analysis. Qualitative Research Practice: Concise Paperback Edition. 185-202.
Wodak, Ruth & Meyer, Michael. 2009. Methods for Critical Discourse Analysis. London: Sage.
World Health Organization. 2016. Violence Against Women. Geneva: World Health Organization.
http://www.who.int/mediacentre/factsheets/fs239/en/ [Accessed 07/03/2017].

Immigration metaphors in a corpus of legal English: an exploratory study of EAL learners’ metaphorical production and awareness

Metáforas sobre inmigración en un corpus de inglés jurídico: un estudio preliminar de la producción y conciencia metafórica de estudiantes de inglés como lengua adicional (EAL)

Emilia Castaño Castaño (a), Natalia Judith Laso Martín (b) & Isabel Verdaguer Clavera (c)
(a) University of Barcelona. e.castano@ub
(b) University of Barcelona. [email protected]
(c) University of Barcelona. [email protected]

Received: 20/04/2017. Accepted: 31/10/2017

Abstract: Metaphor is central to human understanding and communication. It pervades our everyday language and also abounds in specialized discourse, with legal language not being an exception. This is particularly relevant since metaphors are powerful framing tools able to affect our worldview. With the aim of exploring the use that EAL law undergraduate students make of metaphorical expressions as well as their awareness of their connotations, a learner corpus was compiled and qualitatively analyzed. Results have shown that learners, like native speakers, rely on the use of conceptual metaphors such as MIGRATION IS A NATURAL FORCE, STATES ARE CONTAINERS or IMMIGRANTS ARE A THREAT to describe immigration issues. This exploratory study has also revealed that learners are not always conscious of the negative slant that metaphors may convey and that raising their awareness is key to enhancing critical thinking.

Keywords: corpus linguistics; conceptual metaphor; metaphorical awareness; legal discourse; EAL learners.

Resumen: La metáfora es un elemento central de la comunicación y la comprensión humana. Abunda en el lenguaje cotidiano y también en el de especialización, no siendo una excepción el discurso legal. Este hecho es relevante ya que las metáforas nos permiten enmarcar la realidad desde diversas perspectivas que condicionan nuestra percepción del mundo. Con el objetivo de explorar el uso que los estudiantes de Derecho con inglés como lengua adicional (EAL) hacen de las metáforas y de determinar si son conscientes de sus connotaciones, se compiló y analizó cualitativamente un corpus de aprendices. Los resultados han demostrado que los aprendices, al igual que los hablantes nativos, utilizan metáforas conceptuales tales como LA INMIGRACIÓN ES UNA FUERZA NATURAL, LOS ESTADOS SON CONTENEDORES o LOS INMIGRANTES SON UNA AMENAZA para describir el fenómeno de la inmigración. Este estudio exploratorio también subrayó la importancia de que los aprendices sean conscientes de la carga negativa de algunas metáforas para promover el pensamiento crítico.

Palabras clave: metáfora conceptual; conciencia metafórica; discurso legal; aprendices de EAL; lingüística de corpus.

Castaño Castaño, Emilia; Laso Martín, Natalia Judith & Verdaguer Clavera, Isabel. 2017. “Immigration metaphors in a corpus of legal English: an exploratory study of EAL learners’ metaphorical production and awareness”. Quaderns de Filologia: Estudis Lingüístics 22: 245-272. doi: 10.7203/qf.22.11310

1. Corpus linguistics and metaphor

Word meanings are multi-faceted and can present multiple sides, which vary depending on the perspective from which they are viewed. Words in isolation are ambiguous, but their ambiguity is lost or reduced when they are put in context. They have meaning potential, which is activated in a given context (Hanks, 2007). Many words, in addition to their literal meanings, have metaphorical meanings, which often reflect the cognitive operations whereby we understand complex concepts (Lakoff & Johnson, 1980, 1999). Thus, the meaning of rise in (1) is concrete and refers to motion, whereas in (2) it is metaphorical and refers to quantity.
(1) By the time the plane rose in to the air it was dark (British National Corpus)
(2) Aluminium recycling in the UK rose to 9.5 last year (British National Corpus)

Corpus linguistics, which has made it possible to analyze real language in context, first approached the analysis of the syntagmatic patterns of language. More recently, however, it has also been applied to the analysis of figurative language, offering a way to carry out quantitative and qualitative studies of metaphorical expressions as a phenomenon of language in use. The availability of electronic corpora has enabled the systematic search for metaphorical expressions in authentic texts and has provided empirical evidence for the theoretical claims of the theory of Conceptual Metaphor (Cameron & Deignan, 2003; Stefanowitsch, 2007).

The identification and analysis of metaphorical expressions is methodologically much more complex than the study of lexical items, for example, since metaphorical mappings have different lexical realizations and cannot be extracted from texts in a straightforward way. For this reason, a number of procedures for metaphor extraction have been devised, among them manual searching; searching for source-domain vocabulary; searching for target-domain vocabulary; searching for both source- and target-domain vocabulary; and searching based on ‘markers of metaphor’, that is to say, linguistic devices that may indicate the presence of a metaphor (see Stefanowitsch, 2007 for an account of the problems encountered in automatic metaphor extraction).

In addition, the literature has shown that figurative language is used not only in general language but also in different types of genres and registers (Deignan et al., 2013).
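One of the extraction procedures listed above, searching for source-domain vocabulary, can be sketched in a few lines. This is only an illustration under stated assumptions: the word list `SOURCE_DOMAIN_TERMS` and the function `find_candidates` are hypothetical, and the two-sentence corpus simply reuses examples (1) and (2). The search retrieves candidate hits only; deciding whether each hit is literal or metaphorical remains a manual step.

```python
import re

# Hypothetical source-domain word list (vertical motion); an assumption for illustration.
SOURCE_DOMAIN_TERMS = ["rise", "rises", "rose", "risen", "rising"]


def find_candidates(sentences, terms=SOURCE_DOMAIN_TERMS, width=30):
    """Return keyword-in-context tuples (left, keyword, right) for every hit."""
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, terms)) + r")\b", re.IGNORECASE
    )
    hits = []
    for sentence in sentences:
        for match in pattern.finditer(sentence):
            left = sentence[max(0, match.start() - width):match.start()]
            right = sentence[match.end():match.end() + width]
            hits.append((left, match.group(0), right))
    return hits


# The two British National Corpus examples quoted in (1) and (2).
corpus = [
    "By the time the plane rose in to the air it was dark",
    "Aluminium recycling in the UK rose to 9.5 last year",
]

for left, keyword, right in find_candidates(corpus):
    print(f"{left:>30} [{keyword}] {right}")
```

Both sentences are returned as hits for rose, which shows why this procedure over-generates: the literal use in (1) and the metaphorical use in (2) are indistinguishable to a purely lexical search.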
Recent research (Caballero, 2003, 2006; Deignan et al., 2013; Herrmann & Sardinha, 2015) has stressed the need to take into account register and the specific linguistic characteristics of a discourse community in the exploration of figurative language, due to the relevance of shared knowledge in the production, recognition and interpretation of metaphors. Finally, corpus linguistics has also greatly contributed to critical metaphor analysis by providing attested linguistic evidence of the framing-evaluative power of metaphor (Charteris-Black, 2005).

2. Metaphor and Framing

The advent of Cognitive Metaphor Theory in the early 1980s shifted the locus of metaphor from language to thought and posited that abstract concepts are metaphorically grounded in experiences arising from our embodied interactions with the environment, which “[is] at once physical, social, cultural, economic, moral, legal, gendered, and racialized” (Johnson, 2007). In this respect, metaphor, far from being considered an ornamental device, is conceived of as a cognitive operation whereby concrete experiential domains (source domains) are mapped onto abstract domains (target domains) through projections that, under the form ‘TARGET DOMAIN IS SOURCE DOMAIN’, allow us to understand, reason, and talk about abstract concepts and subjective or complex experiences in terms of more concrete ones (Lakoff & Johnson, 1980, 1999; Semino, 2008). This property of metaphor makes it a powerful framing tool, able to shape the way we perceive a situation or event by evoking particular worldviews and highlighting certain aspects of a phenomenon while downplaying others (Lakoff & Johnson, 1980, 1999; Lakoff, 2004; Charteris-Black, 2005; Johnson, 2007).
Thus, for example, the choice of metaphors related to either sports or war to describe a country’s foreign policy frames the topic in different and contrasting ways: while sports metaphors depict foreign countries as opponents, war metaphors depict them as enemies, foregrounding the notion of hostility. This property of metaphor transcends language boundaries, helping “to promote a particular problem definition, causal interpretation, moral evaluation and/or treatment recommendation for the item described” (Entman, 1993: 52). Hence, metaphor becomes an exceptional instrument to analyze the conventional understanding of some of the most controversial topics on the journalistic, political and legal agenda, such as immigration.

3. Immigration Metaphors in Public Discourse

A large body of studies has lately analyzed the metaphorical expressions that have shaped the European and American discourse on immigration in our recent history, as reflected in mass media, blogs and political speeches (O’Brien, 2003; Charteris-Black, 2006; Wodak, 2006; Cisneros, 2008; Biria, 2012; Musolff, 2015; Saiz de Lobado, 2015; among others). From their results it becomes apparent that, despite cross-cultural variation, the portrayal of immigration that has dominated public discourse since the early 20th century has, at one point or another, revolved around a network of metaphors that dehumanize immigrants and/or describe them as a threat to host countries. Thus, for example, several studies have shown that immigration is often described as a NATURAL FORCE, a FLOOD, with an uncontrollable power and disastrous consequences for recipient communities (Santa Ana, 2002; O’Brien, 2003; Charteris-Black, 2006; Chavez & Hoewe, 2012; Strom & Alcock, 2017).
Similar devastating effects have been found to be attributed to immigration in metaphors that equate immigrants with MOBILE TOXIC WASTES (Cisneros, 2008) or weeds that infest the land (Deignan, 2005). Research has also provided evidence that subhuman metaphors such as IMMIGRANTS ARE ANIMALS (Santa Ana, 1999; Deignan, 2005), OBJECTS or COMMODITIES (El Refaie, 2001; O’Brien, 2003) have coexisted, at least since the 1990s, with metaphors that bestow human qualities on nations and lead to their conceptualization as a BODY or ORGANISM whose wellbeing is endangered by immigrants, seen now as either a BURDEN (Santa Ana, 2002; Cisneros, 2008; Crespo-Fernández, 2013), INDIGESTIBLE FOOD, INFECTIOUS ORGANISMS (O’Brien, 2003) or PARASITES (Musolff, 2015). These metaphors seem to have been partially displaced now by those that depict immigrants as INVADERS, CRIMINALS or ILLEGAL ALIENS (Flores, 2003; Binotto, 2015) against whom a heroic fighter, the government, must act to protect the country’s integrity (Santa Ana, 2002; O’Brien, 2003; Musolff, 2011; Binotto, 2015). These metaphors rest on the conceptualization of the NATION as a HOUSE or FORTRESS (Charteris-Black, 2006; Cisneros, 2008; Biria, 2012) whose boundaries, either physical or symbolic, serve the purpose of setting a dividing line between the us and them (Van Dijk, 2000) and reinforce the sense of otherness that seems to pervade not only public discourse, as seen in this section, but also legal discourse on immigration, as will be discussed in the following section.

4. Immigration metaphors in legal discourse

In spite of the preconceived idea that specialized registers are largely free from figurative expressions, a growing body of research shows that metaphor is widely used in specific text-types (Deignan et al., 2013); the fields of politics and economics (Musolff, 2004; Charteris-Black, 2005), medicine (Salager-Meyer, 1990; Faber & Márquez, 2004) and biology (Ureña, 2012; Knudsen, 2015) are just a few examples. In this respect, legal discourse is no exception. Legal discourse is highly metaphorical to the extent that conceptual metaphor and radial categories are argued to shape legal language and, to a certain extent, determine which arguments are valid in legal reasoning (Winter, 2001, 2006; Ebbesson, 2008). This is logical if it is considered that law is “an ideological artifact” (Orts, 2015: 30), a product of human understanding, which is essentially metaphorical (Lakoff & Johnson, 1999; Johnson, 2007). Metaphors act as framing instruments able not only to convey legal concepts but also to influence thought and policies. Beyond theoretical assertions, the ubiquity of metaphor in law has been extensively attested both in general (Winter, 2006) and in specific legal domains such as corporate and criminal law (Duncan, 1994; Berger, 2004); constitutional and administrative law (Noah, 2000; Jackson, 2006); or intellectual property regulation (Loughlan, 2006; Larsson, 2013). In the case of immigration law, several studies have shown that metaphor also plays an important role in the legal construct of immigration. Thus, for instance, according to Cunningham-Parmeter (2011), the analysis of American Supreme Court texts shows that for decades immigration has been commonly conceptualized as a FLOOD, an AVALANCHE or an INVASION, and immigrants as ALIEN OUTSIDERS or ILLEGALS that threaten the country’s stability.
These same images recur in European legislation, where immigration is also depicted as an UNCONTROLLABLE FLUID and the NATION AS A CONTAINER metaphor grounds the proliferation of exclusion metaphors such as IMMIGRANTS ARE ALIENS or ENEMIES to be fought (Rosello, 1999; Incelli, 2013). In this context, border protection is given priority, which leads to the adoption of a CLOSED-DOOR POLICY towards immigrants. Only occasionally is the metaphor of hospitality, closely connected to the former, invoked, with immigrants presented as guests who enjoy the generosity of a host whose borders are now seen as an OPEN DOOR (Rosello, 1999). Finally, the legal system also seems to draw on the metaphor IMMIGRANTS ARE OBJECTS (Incelli, 2013), in which immigrants are conceived of as entities that can be relocated. A close reading of the dominant metaphorical construction of immigration described above reflects a tight connection with what Lakoff called the strict father model (Lakoff, 1996, 2006; Lakoff & Wehling, 2012). Framing immigration as a security problem and immigrants as illegal aliens or invaders (Lakoff & Ferguson, 2006) appeals to governments’ duty to protect their citizens, just as a father would do, and contributes to reinforcing the treatment of immigration as a threat in public and legal discourse.

5. The conceptualization of immigration in a learner corpus of legal English

Metaphors are a fundamental part of linguistic competence and need to be addressed in second and foreign language learning and teaching. Early research in this area (Boers, 2000; Littlemore & Low, 2006) has mostly focused on the importance of raising learners’ awareness of metaphorical thought during the language learning process.
In particular, it has approached the students’ learning and understanding of metaphors as well as the metaphorical extension of the meaning of words to facilitate vocabulary learning, since it has been demonstrated that making learners aware of the relationship between the literal and figurative meanings of lexical items (Boers, 2000; Charteris-Black, 2000) aids the comprehension and retention of new vocabulary. Learners, however, in addition to being familiar with conceptual metaphor and the metaphorical extension of meaning of certain expressions, need to learn and use the linguistic instantiations of metaphorical thought in the target language (Charteris-Black, 2000). As Boers (2000) points out, knowledge of metaphorical thought does not guarantee the command of its linguistic realizations. Thus, metaphoric competence, that is, the ability to recognize and use figurative language effectively and appropriately, is indeed a relevant step in the language learning process. Yet, to date there are still few studies approaching the students’ actual production of metaphorical expressions in L2. In spite of the great expansion that research on learner corpora has seen in the last decade, after Granger’s pioneering work and the compilation of the International Corpus of Learner English (ICLE) (Granger et al., 2002), followed by the compilation of many other learner corpora, little research has been published on the actual use of metaphor by learners (Littlemore & Low, 2006; Chapetón et al., 2012; Golden, 2012; Nacey, 2013; Littlemore et al., 2014). As shown earlier, there are several studies dealing with metaphor in both public discourse and legal discourse. A few have also approached metaphors in the language of learners or non-experts, and, very recently, the metaphors used by migrant students in their account of their own experiences (Catalano, 2016).
However, to our knowledge, no study has analyzed the use of immigration metaphors in a learner corpus of legal English. This paper, which approaches the metaphorical conceptualization of immigration in an EAL¹ learner corpus of legal English, aims to fill a gap in learner corpus research. Its objective is to analyze the use of figurative language in a corpus of texts on migration law written in English by Spanish undergraduates of Law and to test their awareness of the evaluative power of metaphor. Although there are a few corpus-based studies on the use of metaphors by learners (mentioned above) claiming that learners do use metaphorical expressions, they focus on students’ general argumentative writing, not on a specialized register. To our knowledge, no other learner corpus of legal English has been compiled and analyzed so far. Since immigration is a highly debated and controversial topic, and an important social issue in western society, metaphorical language is expected to play a major role in learners’ production. If so, can university students of law recognize the metaphors used in legal discourse as well as their connotations? And do they reproduce metaphors charged with negative associations without, perhaps, even being aware of them?

¹ The term EAL was preferred to EFL/ESL here as the population under study has been instructed in English.

6. Data and Method

6.1. Learner corpus data and learner profile used in this study

With the aim of exploring the use that learners make of metaphors in their legal English written production and what type of conceptualizations are being used, twenty-five unrevised written assignments (circa 25,000 tokens) on European immigration and asylum produced by thirty Spanish undergraduate students of Law who use English as an Additional Language (EAL) were selected.
Admittedly, this is a small corpus, which will be enlarged in the future, but taking into account that some previous studies on metaphor in learner corpora are based on small datasets (Nacey, 2013), we consider it sufficient for an exploratory qualitative study that can provide valid conclusions about the use of figurative language and the patterns followed in this type of discourse. This collection of texts has been constructed to inform the VESPA (“Varieties of English for Specific Purposes dAtabase”) learner corpus project, aimed at building up a large corpus of ESP texts written by L2 writers from various mother tongue backgrounds. This group of undergraduates was enrolled in a 6 ECTS optional course on Migration Law and Citizenship, which examines the rules and policies developed by the European Union and Member States in order to manage migration flows. In addition to describing this phenomenon both at the EU and at national level, the course also focuses on EU and national powers that govern the entrance, removal and status of non-nationals. The difference between Union citizens and third-country nationals is also analysed and compared, as well as the status of family members of Union citizens. The acquisition of citizenship by former migrants and the special situation of asylum seekers are also addressed. Regarding the learning outcomes of the course, learners are expected, on the one hand, to gain knowledge of the basic concepts of migration law, asylum and citizenship as well as the rules that govern migration both at EU and national level and, on the other, to acquire a better command of migration terminology and associated phraseology in English. No level of language proficiency is required to enrol in the course, but learners sit a placement test during the first week of the course and their English proficiency level ranges from B1 to C2, using the CEFR system.

6.2. Survey

In order to assess learners’ awareness of the use of metaphorical expressions usually associated with the role and functions of the government and laws, as far as immigration policies are concerned, a survey was handed out among participants in the study (see Appendix). In this study, we present qualitative results on part 3, in which respondents were given some information about metaphors as mechanisms used to understand one concept in terms of another and were asked whether they were aware of the fact that legal discourse was highly metaphorical and that certain terms were associated with negative connotations. They also had to justify their answers.

6.3. Metaphor extraction

Following the “metaphor identification procedure” (MIP; Pragglejaz Group, 2007; Steen et al., 2010), a 25,000-word sample from a learner corpus of legal English was analysed manually in order to identify the most salient metaphorically used expressions commonly found in the learners’ essays; that is, expressions that have a contextual (metaphorical) meaning that can be understood in comparison with a more basic (literal) meaning. Each of these expressions was classified according to its source domain. With the aim of ensuring accuracy and consistency, three analysts were involved in the metaphor identification process. The analysts are all linguists and researchers specialised in discourse analysis. Their individual results were discussed and only those metaphorical expressions agreed among the three analysts were selected for the present study. Finally, these expressions were compared against those already identified in native production (O’Brien, 2003; Charteris-Black, 2006; among others).

7. Results and Discussion

7.1. Metaphor analysis

Corpus data reveal that learners use a large number of metaphorical expressions and that the most frequent conceptual metaphors used by learners in our corpus of legal English to depict immigration and its actors can be grouped as follows: NATIONS ARE CONTAINERS; IMMIGRATION IS A THREAT/PROBLEM; IMMIGRATION CONTROL IS A BATTLE; IMMIGRATION IS A NATURAL FORCE; IMMIGRANTS ARE OBJECTS.

7.1.1. NATIONS ARE CONTAINERS

The analysis of the examples found in the learner corpus dataset has shown that learners also conceptualize nations as bounded spaces of limited capacity (Example (3)) vulnerable to collapse in the event of a large-scale increase in immigration (Example (4)) (Rosello, 1999). In this context, governments and institutions become guarantors of protection and border security turns out to be essential for the stability of the country (Example (5)) (Castan Pinos, 2008). Hence, borders are metaphorically conceived of as gates or doors that can be sealed (Example (6)) or selectively opened to people based on criteria of desirability and need (Example (7)) (Zaiotti, 2007).

(3) Relocation as a concept which emphasizes distribution of persons in clear need of international protection among Member States, will be used when the volume of arrivals is already full (ML 19).
(4) With 600.000 people applying for asylum in 2014, the European Union is under a lot of pressure, the system is overwhelmed. These distribution criteria reflect the capacity of the Member States to absorb and integrate refugees (ML 20).
(5) What European institutions have tried to do so far is find a balance between these two apparently opposing obligations: the humanitarian one of saving those in peril and the one of protecting Europe (ML 19).
(6) One of the main arguments of those who believe we should “close our borders” is the fact that we simply do not know if everyone coming in is an actual asylum seeker, or a member of a terrorist group who is just taking advantage of the situation and making his way to Europe (ML 21).
(7) The EEUU or the UK’s primes ministers openly said that they would open the borders for those highly-educated professionals (with no less than a College degree) to seek for opportunities in their countries (ML 22).

The container metaphor implies both an inside and an outside and therefore in relation to immigration discourse it requires both the ‘us’ and the ‘them’ referred to by Van Dijk (2000): “the penetration of the boundary of a container implies the ‘them’ symbolically entering the ‘us’” (Charteris-Black, 2006: 577). In this sense, borders simultaneously serve the purpose of setting categorization lines that help to distinguish citizens from non-citizens, often referred to as applicants (Examples (8), (9)) to whom states can offer temporary or permanent protection. The metaphor of hospitality is invoked in this case and nations are presented as hosts (Examples (10), (11)).

(8) Agenda did not provide Member States with instructions how to do this seperation between applicants if they are facing higher number of applications than expected (ML 19).
(9) Member States shall prevent secondary movements of relocated applicants during the period of the examination of application for international protection (ML08).
(10) Very commendable is that Europol and Eurojust are ready to assist the host Member State with investigations to dismantle the smuggling and trafficking networks (ML 19)
(11) It is important to make progress when it comes to relocation and resettlement with respect to the Member States and third countries which host large numbers of refugees (ML21).
All in all, the examples above frame immigration as a problem mainly related to having to cope with more than a fair share of refugees and migrants, which has justified the adoption of a protection policy oriented to controlling the porosity of Europe’s borders by establishing tight selective criteria.

7.1.2. IMMIGRATION IS A THREAT/PROBLEM

As shown in the literature, the social phenomenon of immigration is often portrayed as dangerous in immigration discourse (Santa Ana, 2002; Charteris-Black, 2006; Cisneros, 2008, to name but a few). Cisneros (2008: 569) points out that “[t]hough the degree of popular obsession with immigrants rises and falls, there is always an awareness that these strangers potentially bring with them monumental and threatening changes”. In this scenario, immigrants are seen as threatening enemies (invaders, troublemakers) and as a danger to the stability of member states.

(12) They are considered to be a threat to public policy, internal security, public health and international relations (ML03)
(13) (…) to check (…) the identity of any person, irrespective of his behaviour and of specific circumstances giving rise to a risk of breach of public order (ML03)
(14) A Member State can apply for temporary protection in the event of a mass influx of displaced persons from third countries who are unable to return to their country of origin and to promote a balance of effort between Member States in receiving and bearing the consequences of receiving such persons (ML05)
(15) EU states tend to view any large-scale international migration as a threat to the sovereignity of their national and regional borders, their economies and their societies (ML21)

As illustrated in the examples above, many expressions of negative evaluation, such as “a threat to public policy, internal security” (Example (12)); “a risk of breach of public order” (Example (13)); “bearing the consequences of” (Example (14)) and “a threat to the sovereignity of their national and regional borders” (Example (15)) can be found in the corpus data, which is in line with the expressions identified in native production.

7.1.3. IMMIGRATION IS A NATURAL FORCE

Immigration is conceptualized as a flow of water, as a tidal wave (the source domain) which is difficult to control and thus is portrayed as a social catastrophe: “(…) by their nature, liquids –tides, rivers, waves, etc. – move around; they can therefore be related to a more primary conceptual metaphor: changes are movements” (Kövecses, 2002: 134). In this respect, Charteris-Black (2006: 572) highlights that “lack of control over change is lack of control over movement”, which reinforces the idea that the use of disaster metaphors to describe migratory flows implies that the phenomenon of immigration is perceived as a danger (IMMIGRATION IS A THREAT). In addition, within this frame, immigrants are no longer seen as individuals. On the contrary, they are depicted as an undifferentiated, anonymous and hence dehumanized mass: “understanding migrants as molecules in a liquid depersonalizes and dehumanizes them.” (Dervinyte, 2009: 53).

(16) In the event of a mass influx of displaced persons. (ML10)
(17) It means that this situation of a mass influx or imminent mass influx of displaced people from third countries has to be recognised internationally (ML13)
(18) It’s just a large migratory flow (ML11)
(19) The complexity of the migrant inflow has put enormous strain on the asylum system. Some countries (…) have reached breaking point in their ability to manage the unplanned inflow and meet EU standards for receiving and processing applicants (ML22)

Expressions such as “mass influx” (Examples (16) and (17)), “a flow” (Example (18)) and “inflow” (Example (19)) relate to the image of water.
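Although the identification reported in this study was carried out manually following MIP, water-related items such as those just listed could in principle be retrieved automatically as candidates for subsequent manual checking. The following is only a minimal sketch under stated assumptions: the seed list `WATER_TERMS`, the functions `kwic` and `count_candidates`, and the two-essay sample are hypothetical illustrations, not the authors' procedure, and a retrieved hit is a candidate rather than a confirmed metaphor.

```python
import re
from collections import Counter

# Hypothetical seed list drawn from Examples (16)-(19); it is an assumption
# for illustration, not a validated lexicon of water metaphors.
WATER_TERMS = ("influx", "inflow", "flow", "flows", "wave", "waves")


def kwic(text, terms=WATER_TERMS, window=4):
    """Return (left context, keyword, right context) for each whole-word hit."""
    tokens = re.findall(r"[A-Za-z'-]+", text)
    wanted = {t.lower() for t in terms}
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() in wanted:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


def count_candidates(essays, terms=WATER_TERMS):
    """Count candidate hits per term across a dict {essay_id: text}."""
    counts = Counter()
    for text in essays.values():
        for _, keyword, _ in kwic(text, terms):
            counts[keyword.lower()] += 1
    return counts


# Two short excerpts taken from Examples (18) and (17), keyed by essay code.
essays = {
    "ML11": "It's just a large migratory flow",
    "ML13": "this situation of a mass influx or imminent mass influx of displaced people",
}

for essay_id, text in essays.items():
    for left, keyword, right in kwic(text):
        print(f"{essay_id}: {left} [{keyword}] {right}")
```

Each concordance line would still need to be read in context by the analysts, since a form such as flow can also be used literally.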
It is also worth noting that, as immigration is here represented in terms of a natural force (e.g., flow, influx), it is often described by means of gradable adjectives, such as large or big (Example (18)).

7.1.4. IMMIGRANTS ARE OBJECTS

Immigrants are often presented as impersonal or interchangeable objects; that is, materials depicted as cheap labour that can be easily replaced or removed from one place to another. As pointed out in the literature (O’Brien, 2003; Charteris-Black, 2006), images of immigrants as quantifiable goods (Examples (20) and (21)) discourage empathy with incomers, who are associated with a feeling of fear of destruction:

(20) The controversial issue is the proposal to introduce fixed amounts distribution of refugees among Member States tabled by Germany and rejected by the countries of Central Europe (ML21)
(21) Some countries are now challenging the EU proposals by introducing the number of asylum seekers they are willing to take (ML22)

The IMMIGRANTS ARE OBJECTS metaphor tends to appear in combination with the NATIONS ARE CONTAINERS metaphor. States are viewed as containers and immigrants are perceived as dehumanized entities that can be easily relocated from one place to another (Examples (22) and (23)) and even exploited as cheap labour (Example (24)), which turns them into a source of PROFIT (Examples (24) and (25)) that exerts pressure on cheap labour (immigrants), which is seen as a threat:

(22) The member state of relocation shall take back the person as they are a threat to the country (ML19)
(23) Some countries are now challenging the EU proposals by introducing the number of asylum seekers they are willing to take (ML22)
(24) Supporting effective management of labour migration to tackle exploitation and support migrant workers (ML23)
(25) Migrants in an irregular situation are also more vulnerable to labour and other forms of exploitation (ML24)

7.1.5. IMMIGRATION CONTROL IS A BATTLE

Metaphors that refer to IMMIGRATION IS A BATTLE are also very frequently found in immigration discourse. Immigrants are often portrayed as the invading enemy threatening the stability of the state and, thus, dealing with them requires military action (Biria, 2012: 37):

(26) Europe is facing the biggest wave of refugees after decades. In this moment there is no possibility for Member States to combat illegal pathways of reaching Europe alone (ML22)
(27) To fight the migration massive influx, the EU is trying to solve the problem from the bottom looking for the main reasons that made people move from one country to another (ML20)

In the examples above, the EU is conceptualized as a container which must be protected and kept secure from external dangers, such as “the biggest wave of refugees” and/or “the migration massive influx”. Yet again, immigration is understood in terms of a threat to the stability and integrity of the nation. The data provided above show that metaphors are an inextricable part of learners’ description and analysis of migration law, which raises the question of whether the construct of immigration that their essays reflect was purposely built upon metaphors or not. As Lakoff & Turner (1989) point out: “Metaphor is a tool so ordinary that we use it unconsciously and automatically, with so little effort that we hardly notice it” (p. xi). Thus, we are not always conscious of the metaphors that we use and the connotations that they evoke. To assess learners’ metaphorical awareness and how conscious they were of the evaluative slant that most of the metaphors used in their written production convey, participants were asked to take a survey that overtly asked about these two aspects.

7.2. Survey

After a brief introduction that reported on the pervasiveness of metaphor in language, and the fact that metaphor can influence our perception of the world and our attitudes to it (see Appendix for the complete survey), students were asked to answer two questions:

• Q1. Were you aware that legal language was highly metaphorical?
• Q2. Had you ever realized that the use of terms such as those mentioned above evoke negative connotations? Explain briefly.

Almost 40% of law students had not realized before that metaphorical expressions are pervasive in this register. This high percentage confirms that it is important to train students to become aware of the use of metaphorical expressions and the associations they have. The first step is thus to raise students’ awareness of figurative language, since only if they can recognize the metaphorical use of a specific word will they be able to understand the associations it may carry. Then students were asked if they had ever realized that the use of the terms included in the survey evoked negative connotations and had to explain why. It is in the open comments that we can most clearly see the participants’ awareness of metaphors and their connotations, since they show different degrees in the students’ perception both of metaphors and of their associations. Learners’ responses indicate that 72% of the students who said they were conscious of metaphorical language also showed their perception of the negative connotations of the metaphors associated with migration. As the following student’s comment conveys, words may not be neutral and can carry different connotations:

(28) Yes because the language that you use is not neutral and depending on how you use the words you can transmit different messages. (ML12).
The pejorative connotations of a word in its literal usage are transferred to its metaphorical use which thus also carries a negative message, as shown in the following comment:

(29) Flood in everyday’s language is rather a negative word, so connecting “flood” with immigration would always evoke negative connotations. (ML05).

The purpose of these metaphors, namely to communicate a particular view of immigration in order to justify some governments’ policies, is also recognized by some of the students:

(30) Yes, the use of this kind of expressions may contain discriminatory clauses, such as even racist connotations; indirectly. Then, it justifies some policies to restrict immigrants rights, as it’s happening in a great part of Europe. The law is written by people, so it’s obvious it may contain metaphors and political matter. (ML13).

The remaining answers (28%) were more ambiguous as far as negative connotations are concerned. As connotative meaning is subjective, not all students recognize pejorative implications:

(31) For me words such as flows, influx and curbs don’t necessarily have a negative connotation. Matter fact all those words have different uses, and even as metaphors I don’t think you can get a universal definition because they have different meanings. (ML02).
(32) Probably because the use of legal terms in a metaphorical way seems easier to understand for laymen from a shallow perspective. While in fact these terms have a very profound and unforseeable load of specific meaning behind them. That’s why every word in a legal context is important. (ML06).

However, most students who were previously unaware of the presence of metaphorical language in legal English realized the negative associations implied in these expressions:

(33) I had never realized before that metaphorical uses of words such as the ones indicated above evoke negative connotations. However, it is obvious once I have seen in that example that they totally evoke negative connotations by subliminally (?) impressing people’s subconscious with negative meaning which leads then to think of that phenomenon as something dangerous or harmful. (ML01)
(34) No, I’ve never realized it, but I think metaphors are useful in this sense because they can bring up different reactions. (ML04).

As reflected in the comments above, making learners conscious of the fact that legal language is highly metaphorical in nature contributes to raising their awareness of the “strategic dimension” (Damele, 2016: 175) that metaphorical expressions play in influencing people’s views. The use of metaphors can be unconscious and automatic; thus, raising learners’ metaphorical awareness is crucial not only to aid language learners’ communicative competence and proficiency but also to enhance critical thinking. Arguments in favor of integrating metaphor competence (i.e. the ability to acquire, produce and interpret metaphor (Littlemore & Low, 2006)) in the second, foreign and specific purposes language curricula have multiplied in the last 30 years (Danesi, 1993, 2008; Littlemore & Low, 2006; Boers, 2013). This seems logical if we consider that the development of metaphor competence also contributes to improving textual, grammatical, illocutionary, strategic and sociolinguistic competence (Littlemore & Low, 2006). Thus, for example, explicit metaphor instruction has been shown to enhance the expansion and retention of vocabulary (Boers, 2004), the understanding and recall of polysemous senses and idioms (Kövecses, 2001), and the reduction of negative transfer errors derived from cross-cultural differences in metaphor usage and wording (Boers, 2003, 2004; Campos-Pardillos, 2016).
In the case of legal ESP and law studies, an explicit approach to conceptual metaphors and their lexicogrammatical instantiations may help learners not only to detect differences in the metaphorical models that every legal system selects to deal with issues such as immigration but also to develop the necessary pragmatic skills to uncover the inference patterns and evaluative slant that they evoke.

8. Conclusion

Our results have shown that the metaphorical construct of immigration in learners’ legal discourse builds upon a web of interrelated metaphors (Ponterotto, 2000; Semino, 2008) that seem to have the metaphor NATIONS ARE CONTAINERS as their core constituent. The fact that countries are conceptualized as bounded areas vulnerable to the irregular entry of third-country nationals renders immigration a risk for their internal welfare, which licenses the use of the metaphors IMMIGRATION IS A THREAT and IMMIGRATION CONTROL IS A BATTLE to protect the country’s interests. The threat that immigration is thought to pose is often described as a natural hazard, a NATURAL FORCE with catastrophic consequences for the recipient countries. This metaphor contributes to dehumanizing immigrants by equating them with overwhelming flows of water, just as does their depiction as OBJECTS whose relocation or expulsion represents a relief for the recipient countries. Only when the perspective shifts away from the devastating effects of immigration on nations and focuses instead on immigrants does the figure of nations as protective hosts emerge. Our analysis has also shown that public, legal and learners’ discourse on immigration are largely shaped by a common metaphorical model, which is so highly entrenched that its use and negative slant often go unnoticed by learners.
Finally, this study has also shown that bringing to the fore the connotations and power relationships implicit in a given choice of words helps learners to take a critical stance towards seemingly neutral terms and to realize that words always matter.

9. Acknowledgements

We acknowledge the support of the Agència de Gestió d’Ajuts Universitaris i de Recerca (2014 SGR 1374) and the beca de formación en investigación y docencia de la Fundación Obra Social y Universidad de Barcelona (grant held by Emilia Castaño). The authors are also grateful to two anonymous reviewers for their comments.

10. References

Berger, Linda L. 2004. What is the sound of a corporation speaking? How the cognitive theory of metaphor can help lawyers shape the law. Journal of the Association of Legal Writing Directors 2: 169-208.
Binotto, Marco. 2015. Invaders, Aliens and Criminals: Metaphors and Spaces in the Media Definition of Migration and Security Policies. In Bond, Emma; Bonsaver, Guido & Faloppa, Federico (eds.) Destination Italy: Representing Migration in Contemporary Media and Narrative. Oxford: Peter Lang, 31-58.
Biria, Ensieh. 2012. Figurative Language in the Immigration Debate: Comparing Early 20th Century and Current U.S. Debate with the Contemporary European Debate. (Thesis). http://pdxscholar.library.pdx.edu/open_access_etds (234).
Boers, Frank. 2000. Metaphor awareness and vocabulary retention. Applied Linguistics 21(4): 553-571.
Boers, Frank. 2003. Applied linguistics perspectives on cross-cultural variation in Conceptual Metaphor. Metaphor and Symbol 18(4): 231-238.
Boers, Frank. 2004. Expanding learners’ vocabulary through metaphor awareness: What expansion, what learners, what vocabulary. In Achard, Michel & Niemeier, Susanne (eds.) Cognitive Linguistics, Second Language Acquisition and Foreign Language Teaching. Berlin/New York: De Gruyter, 211-232.
Boers, Frank. 2013. Cognitive Linguistic approaches to teaching vocabulary: Assessment and integration. Language Teaching 46(2): 208-224.
Caballero, Rosario. 2003. Metaphor and genre: the presence and role of metaphor in the building review. Applied Linguistics 24(2): 145-167.
Caballero, Rosario. 2006. Re-Viewing Space. Figurative Language in Architects’ Assessment of Built Space. Berlin/New York: Mouton De Gruyter.
Cameron, Lynn & Deignan, Alice. 2003. Combining large and small corpora to investigate tuning devices around metaphor in spoken discourse. Metaphor and Symbol 18(3): 149-160.
Campos-Pardillos, Miguel A. 2016. Increasing Metaphor Awareness in Legal English Teaching. ESP Today 4(2): 165-183.
Castan Pinos, Jaume. 2008. Building Fortress Europe? Schengen and the cases of Ceuta and Melilla. CIBR/WP10. Belfast: CIBR Working Papers in Border Studies.
Castaño, Emilia; Verdaguer, Isabel; Laso, Natalia Judith & Ventura, Aaron. 2014. Economy is a living organism. Metaphorical expressions in a learner corpus of English. Spanish Journal of Applied Linguistics 27(2): 323-337.
Catalano, Theresa. 2016. Talking About Global Migration: Implications for Language Teaching. Bristol: Multilingual Matters.
Chapetón, Marcela & Verdaguer, Isabel. 2012. Researching linguistic metaphor in native, non-native and expert writing. In MacArthur, Fiona; Oncins-Martínez, José Luis; Sánchez-García, Manuel & Piquer-Píriz, Ana María (eds.) Metaphor in Use: Context, Culture, and Communication. Amsterdam/Philadelphia: John Benjamins Publishing Company, 149-174.
Charteris-Black, Jonathan. 2000. Metaphor and vocabulary teaching in ESP economics. English for Specific Purposes 19: 149-165.
Charteris-Black, Jonathan. 2004. Corpus Approaches to Critical Metaphor Analysis. London: Palgrave Macmillan.
Charteris-Black, Jonathan. 2005. Politicians and Rhetoric: The Persuasive Power of Metaphor. New York: Palgrave Macmillan.
Charteris-Black, Jonathan. 2006. Britain as a container: immigration metaphors in the 2005 election campaign. Discourse & Society 17(5): 563-581.
Chavez, Manuel & Hoewe, Jennifer. 2012. National perspectives on state turmoil. Characteristics of elite U.S. newspaper coverage of Arizona SB 1070. In Santa Ana, Otto & González de Bustamante, Celeste (eds.) Arizona Firestorm. Global Immigration Realities, National Media, and Provincial Politics. New York: Rowman & Littlefield Publishers, 189-202.
Cisneros, David. 2008. Contaminated Communities: The Metaphor of “Immigrant as Pollutant” in Media Representations of Immigration. Rhetoric & Public Affairs 11(4): 569-601.
Crespo-Fernández, Eliecer. 2013. The treatment of immigrants in the current Spanish and British right-wing press: A cross-linguistic study. In Martínez-Lirola, M. (ed.) Discourses on Immigration in Times of Economic Crisis: A Critical Perspective. UK: Cambridge Scholars Publishing, 86-112.
Cunningham-Parmeter, Keith. 2011. Alien Language: Immigration Metaphors and the Jurisprudence of Otherness. Fordham Law Review 79: 1545-1598.
Damele, Giovanni. 2016. Adventures of a metaphor: Apian imagery in the history of political thought. In Gola, Elisabetta & Ervas, Francesca (eds.) Metaphor and Communication. Amsterdam/Philadelphia: John Benjamins Publishing Company, 173-188.
Danesi, Marcel. 1993. Metaphorical competence in second language acquisition and second language teaching. The neglected dimension. In Alatis, James (ed.) Georgetown University Round Table on Language and Linguistics. Washington DC: Georgetown University Press, 489-515.
Danesi, Marcel. 2008. Conceptual errors in second-language learning. In de Knop, Sabine & de Rycker, Teun (eds.) Cognitive Approaches to Pedagogical Grammar. Berlin/New York: Mouton de Gruyter, 231-256.
Deignan, Alice. 2005. Metaphor and Corpus Linguistics. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Deignan, Alice; Littlemore, Jeannette & Semino, Elena. 2013. Figurative Language, Genre and Register. Cambridge: Cambridge University Press.
Dervinyte, Inga. 2009. Conceptual emigration and immigration metaphors in the language of the press: A contrastive analysis. Studies about Languages 14: 49-55.
Duncan, Martha G. 1994. In slime and darkness: The metaphor of filth in criminal justice. Tulane Law Review 68: 725-802.
Ebbesson, Jonas. 2008. Law, Power and Language: Beware of Metaphors. Scandinavian Studies in Law 53: 259-269.
El Refaie, Elisabeth. 2001. Metaphors we discriminate by: Naturalised themes in Austrian newspaper articles about asylum seekers. Journal of Sociolinguistics 5(3): 352-371.
Entman, Robert. 1993. Framing: Toward Clarification of a Fractured Paradigm. Journal of Communication 43(4): 51-58.
Faber, Pamela & Márquez Linares, Carlos. 2004. The role of imagery in specialized communication. In Lewandowska-Tomaszczyk, Barbara & Kwiatkowska, Alina (eds.) Imagery in Language. Frankfurt: Peter Lang, 585-560.
Flores, Lisa. 2003. Constructing Rhetorical Borders: Peons, Illegal Aliens, and Competing Narratives of Immigration. Critical Studies in Media Communication 20(4): 362-387.
Golden, Anne. 2012. Metaphorical expressions in L2 production: The importance of text topic in corpus research. In MacArthur, Fiona; Oncins-Martínez, José Luis; Sánchez-García, Manuel & Piquer-Píriz, Ana María (eds.) Metaphor in Use: Context, Culture, and Communication. Amsterdam/Philadelphia: John Benjamins Publishing Company, 135-148.
Granger, Sylvianne; Dagneaux, Estelle & Meunier, Fanny. 2002. The International Corpus of Learner English. Handbook and CD-ROM. Louvain-la-Neuve: Presses universitaires de Louvain.
Hanks, Patrick. 2007. Metaphoricity is gradable. In Stefanowitsch, Anatol & Gries, Stefan Th. (eds.) Corpus-Based Approaches to Metaphor and Metonymy. Berlin/New York: Mouton de Gruyter, 17-35.
Herrmann, J. Berenike & Sardinha, Tony Berber (eds.). 2015. Metaphor in Specialist Discourse. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Incelli, Ersilia. 2013. Shaping reality through metaphorical patterns in legislative texts on immigration: a corpus-assisted approach. In Williams, Christopher & Tessuto, Girolamo (eds.) Language in the Negotiation of Justice: Contexts, Issues and Applications. Series: Law, Language and Communication. Farnham: Ashgate, 235-256.
Jackson, Vicki C. 2006. Constitutions as “living trees”? Comparative constitutional law and interpretive metaphors. Fordham Law Review 75: 921-960.
Johnson, Mark. 2007. Mind, Metaphor, Law. Mercer Law Review 58(3): 845-868.
Knudsen, Sanne. 2015. Framings of the concept of metaphor in biological specialist communication. In Herrmann, J. Berenike & Sardinha, Tony Berber (eds.) Metaphor in Specialist Discourse. Amsterdam/Philadelphia: John Benjamins Publishing Company, 191-214.
Kövecses, Zoltan. 2001. A cognitive linguistic view of learning idioms in an FLT context. In Pütz, Martin; Niemeier, Susanne & Dirven, René (eds.) Applied Cognitive Linguistics II: Language Pedagogy. Berlin: Mouton de Gruyter, 87-115.
Kövecses, Zoltan. 2002. Metaphor: A Practical Introduction. New York/Oxford: Oxford University Press.
Lakoff, George & Johnson, Mark. 1980. Metaphors We Live By. Chicago: University of Chicago Press.
Lakoff, George & Johnson, Mark. 1999. Philosophy in the Flesh. The Embodied Mind and its Challenge to Western Thought. New York: Basic Books.
Lakoff, George. 1996. Moral Politics. Chicago: University of Chicago Press.
Lakoff, George. 2004. Don’t Think of an Elephant: Know your Values and Frame the Debate: The Essential Guide for Progressives. White River Junction, Vt: Chelsea Green Pub. Co.
Lakoff, George. 2006. Whose Freedom? The Battle Over America’s Most Important Idea. New York: Picador.
Lakoff, George. 2008. The Political Mind. New York: Viking.
Lakoff, George & Ferguson, Sam. 2006. The Framing of Immigration. The Rockridge Institute. http://www.rockridgeinstitute.org/research/rockridge/immigration
Lakoff, George & Turner, Mark. 1989. More Than Cool Reason: A Field Guide to Poetic Metaphor. Chicago: University of Chicago Press.
Lakoff, George & Wehling, Elisabeth. 2012. The Little Blue Book: The Essential Guide to Thinking and Talking Democratic. New York: Free Press.
Larsson, Stefan. 2013. Metaphors, Law and Digital Phenomena: The Swedish Pirate Bay Court Case. International Journal of Law and Information Technology 21(4): 354-379.
Littlemore, Jeannette & Low, Graham. 2006. Metaphoric competence, second language learning, and communicative language ability. Applied Linguistics 27(2): 268-294.
Littlemore, Jeannette; Krennmayr, Tina; Turner, James & Turner, Sarah. 2014. An investigation into metaphor use at different levels of second language writing. Applied Linguistics 35(2): 117-144.
Loughlan, Patricia. 2006. Pirates, parasites, reapers, sowers, fruits, foxes … The metaphors of intellectual property. Sydney Law Review 28: 211-226.
Musolff, Andreas. 2004. Metaphor and Political Discourse. New York: Palgrave Macmillan.
Musolff, Andreas. 2011. Migration, media and “deliberate” metaphors. metaphorik.de 21: 7-19.
Musolff, Andreas. 2015. Dehumanizing metaphors in UK immigrant debates in press and online media. Journal of Language Aggression and Conflict 3(1): 41-56.
Nacey, Susan. 2013. Metaphors in Learner English. Amsterdam/Philadelphia: John Benjamins Publishing Company.
Noah, Lars. 2000. Interpreting agency enabling acts: Misplaced metaphors in administrative law. William & Mary Law Review 41(5): 1463-1530.
O’Brien, Gerald. 2003. Indigestible Food, Conquering Hordes, and Waste Materials: Metaphors of Immigrants and the Early Immigration Restriction Debate in the United States. Metaphor and Symbol 18(1): 33-47.
Orts Llopis, María Ángeles. 2015. Legal English and Legal Spanish: The Role of Culture and Knowledge in the Creation and Interpretation of Legal Texts. ESP Today 3(1): 1-134.
Ponterotto, Diane. 2000. The cohesive role of cognitive metaphor in discourse and conversation. In Barcelona, Antonio (ed.) Metaphor and Metonymy at the Crossroads. Berlin: Mouton de Gruyter, 283-298.
Pragglejaz Group. 2007. MIP: A Method for Identifying Metaphorically Used Words in Discourse. Metaphor and Symbol 22(1): 1-39.
Rosello, Mireille. 1999. Fortress Europe and its Metaphors. Immigration and Law. Madison: European Studies Program.
Saiz de Lobado, María Ester. 2015. Análisis de la información y análisis metafórico desde una perspectiva estadístico-lingüística (Tesis doctoral). Alcalá: Universidad de Alcalá, Departamento de Filología.
Salager-Meyer, Françoise. 1990. Metaphors in medical English prose: A comparative study with French and Spanish. English for Specific Purposes 9(2): 145-159.
Santa Ana, Otto. 1999. Like an animal I was treated: Anti-immigrant metaphor in U.S. public discourse. Discourse & Society 10(2): 191-224.
Santa Ana, Otto. 2002. Brown Tide Rising: Metaphors of Latinos in Contemporary American Public Discourse. Texas: University of Texas Press.
Semino, Elena. 2008. Metaphor in Discourse. Cambridge: Cambridge University Press.
Steen, Gerard; Dorst, Aletta; Herrmann, Berenike; Kaal, Anna; Krennmayr, Tina & Pasma, Trijntje. 2010. A Method for Linguistic Metaphor Identification. Amsterdam: John Benjamins Publishing Company.
Stefanowitsch, Anatol. 2007. Corpus-Based Approaches to Metaphor and Metonymy. In Stefanowitsch, Anatol & Gries, Stefan Th. (eds.) Corpus-Based Approaches to Metaphor and Metonymy. Berlin/New York: Mouton de Gruyter, 1-16.
Strom, Megan & Alcock, Emily. 2017. Floods, waves, and surges: the representation of Latin@ immigrant children in the United States mainstream media. Critical Discourse Studies. doi: 10.1080/17405904.2017.1284137
Ureña Gómez-Moreno, José Manuel. 2012. Conceptual types of terminological metaphor in marine biology. In MacArthur, Fiona; Oncins-Martínez, José Luis; Sánchez-García, Manuel & Piquer-Píriz, Ana María (eds.) Metaphor in Use: Context, Culture, and Communication. Amsterdam/Philadelphia: John Benjamins Publishing Company, 239-260.
Van Dijk, Teun A. 2000. Ideology and Discourse: A Multidisciplinary Introduction. Barcelona: Pompeu Fabra University.
Winter, Steven L. 2001. A Clearing in the Forest: Law, Life, and Mind. Chicago/London: University of Chicago Press.
Winter, Steven L. 2006. Re-embodying Law. Mercer Law Review 58: 869-892.
Wodak, Ruth. 2006. Mediation between discourse and society: Assessing cognitive approaches in CDA. Discourse Studies 8: 179-190.
Zaiotti, Ruben. 2007. Of Friends and Fences: Europe’s Neighbourhood Policy and the Gated Community Syndrome. European Integration 29(2): 143-162.

Appendix

Part 1

1. List THREE adjectives that describe the noun immigrants
   1.
   2.
   3.

2. List THREE verbs that combine with the noun immigration
   1.
   2.
   3.

3. Tick the expressions that best describe the function(s) of the government/state and order them from most significant to least significant:

   Tick   Order
   ____   _____   To prevent problems
   ____   _____   To punish
   ____   _____   To tell right from wrong
   ____   _____   To empower citizens
   ____   _____   To propose reforms
   ____   _____   To be empathic
   ____   _____   To demand responsibility
   ____   _____   To nurture
   ____   _____   To impose limits
   ____   _____   To protect the country’s interests
   ____   _____   To look after citizens

4. Tick the expressions that best describe the function(s) of laws:

   Tick   Order
   ____   _____   To protect
   ____   _____   To set limits
   ____   _____   To control people
   ____   _____   To enforce rights
   ____   _____   To guarantee freedom
   ____   _____   To favour distinctions
   ____   _____   To promote equality
   ____   _____   To enjoy public support
   ____   _____   To reinforce diversity

Part 2

1. Use the following response scale to rate how well the statements below describe immigration

   This does not describe immigration adequately   1   2   3   4   5   This describes immigration perfectly

   1 very poorly   2 poorly   3 moderately well   4 well   5 perfectly well

   1. The Freedom of Movement is a great possibility to connect people, learn from each other and of course it makes travelling so much easier.   1 2 3 4 5
   2. France is also arming itself in preparation for a wave of refugees.   1 2 3 4 5

   1. Countries have turned to immigrants to contribute to economic growth.   1 2 3 4 5
   2. Britain is facing a nightly tidal wave of asylum seekers from Cherbourg, France’s second biggest port.   1 2 3 4 5
   3. We are at a point in this nation’s history where we cannot afford to keep our borders porous in order to provide employers with cheap labor.   1 2 3 4 5
   4. State members of the European Union have agreed to develop a common immigration policy in order to ensure an efficient management of migration.   1 2 3 4 5
   5. This global approach to migration aims to encourage mobility, to ensure coherent policy making.   1 2 3 4 5
   6. The same authority that protects the borders can decide on who is crossing them seeking for protection.   1 2 3 4 5
   7. America is not the only country wrestling with immigration.   1 2 3 4 5
   8. There are almost no measures today to cope with the problem of people’s outflow.   1 2 3 4 5

Part 3

Metaphors are conceived as mechanisms used to understand one concept in terms of another – i.e., in expressions such as lead someone step by step through an argument, follow an argument or get lost (THINKING IS MOVING). They are an inextricable part of our everyday language and, despite the preconceived idea that legal language is largely free from metaphors, the use of metaphorical expressions is also pervasive in legal discourse.
This fact is particularly relevant because metaphors evoke particular worldviews and highlight certain aspects of a phenomenon while downplaying others. Thus, for example, when terms such as flows, influx or curbs are used to describe immigration laws, IMMIGRATION is conceived of as a FLOOD, which brings about negative connotations and dehumanizes immigrants. Likewise, when national borders are defined as areas whose security must be protected against immigration, immigrants are seen as a source of security problems and a threat to the stability of the country. Metaphors are more than figures of speech: the choice of one metaphor over another can profoundly affect the manner in which legal thought is shaped (Berger, 2002; Cunningham-Parmeter, 2011; Santa Ana, 1997).

1. Were you aware that legal language was highly metaphorical? YES NO
2. Had you ever realized that the use of terms such as those mentioned above evokes negative connotations? Explain briefly.

QUADERNS DE FILOLOGIA
EDITORIAL GUIDELINES

Quaderns de Filologia is the name given to the publications financed and/or managed directly by the Facultat de Filologia, Traducció i Comunicació of the Universitat de València. These publications are:

1. The periodical entitled Quaderns de Filologia, which has been published as a scholarly journal since 1995 and in digital format since 2014 through the Open Journal System (OJS). The journal Quaderns de Filologia (QF) has two series:
   a. QF Estudis Lingüístics – ojs.uv.es/index.php/qfilologia/index
   b. QF Estudis Literaris – ojs.uv.es/index.php/qdef/index
2. The collection of monographs entitled Anejos de Quaderns de Filologia.

The editorial guidelines for all these publications are as follows:

1. Page format

Page settings in Word (on an A4 page). Text box: distance from the edge of the paper.
   Top margin: 6.6 cm
   Left margin: 5.0 cm
   Bottom margin: 5.5 cm
   Right margin: 5.0 cm
   Gutter margin: 0.0 cm

2. Presentation of the volume

2.1. The first page

The first page of the volume will include (in this order), in Times 11 with 1.5 line spacing:

1. In capitals and centred (leave 2 lines after the header):
   QUADERNS DE FILOLOGIA
   [NAME OF THE SERIES]
   [NUMBER WITHIN THE SERIES, in Roman numerals]
   Example:
   QUADERNS DE FILOLOGIA
   ESTUDIS LINGÜÍSTICS
   XVIII
2. In capitals and centred (leave 4 blank lines after the name of the series):
   TITLE OF THE VOLUME
   Example:
   LENGUA Y CIENCIA. RECEPCIÓN DEL DISCURSO CIENTÍFICO
3. In italics and centred (leave 4 lines after the title of the volume, unless the title takes up more than one line, in which case reduce the spacing accordingly):
   Edició de
4. In capitals and centred (with no preceding space):
   FIRST NAME AND SURNAME(S) OF EDITOR 1
   FIRST NAME AND SURNAME(S) OF EDITOR 2
   (Add as many lines as needed)
   Example:
   JULIA PINILLA MARTÍNEZ
   VIRGINIA GONZÁLEZ GARCÍA
   CECILIO GARCÍA ESCRIBANO
5. At the bottom of the page (leave 5 blank lines):
   FACULTAT DE FILOLOGIA, TRADUCCIÓ I COMUNICACIÓ
   UNIVERSITAT DE VALÈNCIA
   YEAR OF PUBLICATION

The following two pages contain information about the publication Quaderns de Filologia. This information is predetermined, as can be seen in the template. Editors will only add the title of the volume and its number within the series.

2.2. The table of contents

The table of contents of the volume will go on the next odd-numbered page, in the following format:

Name of the section in capitals, Times 11, centred: ÍNDEX
Typeface: Times 11.
The authors’ surname(s) in small capitals, followed by a comma. The authors’ first name(s) in lowercase roman.
The title of the article in lowercase roman on the following line, indented 0.75 cm.
Page number at the end of the dot leaders, set with a tab stop at the end.
Following the QF template for editors is recommended.

Example:

One author:
   surname(s), First name
      Title of the article .......... xx
Two authors:
   surname(s), First name & surname(s), First name
      Title of the article .......... xx
More than two authors:
   surname(s), First name; surname(s), First name & surname(s), First name
      Title of the article .......... xx

The table of contents will give the full names of all authors, however many there are.

Note: The article submission guidelines and the general index of all volumes will be added subsequently by Quaderns de Filologia.

2.3. Running headers

The header of the first page of each chapter, article or section is predetermined in the QF template. For all other headers, the font size will be Times 10. Content of the headers on the remaining pages of the article:

• Even-page header: page number on the left (in roman type) and the author’s name in italics, aligned right.
• Odd-page header: title of the article or volume in italics (short version, no longer than one line), aligned left, and page number on the right (in roman type).

3. Article submission guidelines

3.1. General matters

Authors will send the editors two digital files in Word 97 or later (if this is not available, in RTF format):

1. On one page, the author’s name, the title of the article, an abstract of no more than 10 lines and keywords (no more than 5) in the language of the article and in English. The author’s postal and e-mail addresses and contact telephone number must also be included.
2. The text of the article (following the editorial guidelines in this document).

Articles will have a maximum length of 15 pages.
The content of the article must be original and must not have been published previously.

3.2. General format of the text

The typeface and size of the article text will be Times 11, with single line spacing.

The article will be headed by the title in lowercase, bold and centred. After one blank line, the next line will give the first name and surname of the author or authors in lowercase roman. The following line will give the name of the academic institution to which the author or authors belong, in lowercase italics. If there is more than one author and they do not belong to the same institution, the name of each further author and institution will be added on separate lines. On the following line, include the author’s e-mail address in lowercase roman, without hyperlink underlining. Follow the example below and the template:

   Title of the article

   Name of author 1
   Name of the academic institution
   [email protected]
   Name of author 2
   Name of the academic institution
   [email protected]

The footer of the first page of each article contains the citation of the article with its doi, following the QF guidelines (see section 3.6.2.2.). The font size will be Times 9:

Basic form:
Surname(s), First name. Year. Title of the article. Title of the publication volume(issue): first page-last page. doi: http://dx.doi.org/xx.xxxx.xxxx.xx
Example:
Escandell, Dari & Marcillas, Isabel. 2011. Els límits de l’espai autobiogràfic en la narrativa breu de Mercè Rodoreda. Quaderns de Filologia. Estudis Literaris XVI: 101-123. doi: http://dx.doi.org/10.1037/qf234

Next, include the abstracts of the article in the spaces provided in the template: one abstract in Catalan/Spanish and another in English. If the article is written in a language other than these, include one abstract in the language of the article and another in English. The abstract is set in Times 10, lowercase roman, justified, with the word Resum (Abstract) in bold. The abstract will be no longer than 10 lines.
Leave one line after the abstract and include the keywords (no more than 5), separated by semicolons (;) and closed with a full stop (.). Follow the example below and the template:

   Resum
   Text del resum. Text del resum. Text del resum.
   Paraules clau: paraula 1; paraula 2; paraula 3; paraula 4; paraula 5.

   Abstract
   Text of the abstract. Text of the abstract. Text of the abstract.
   Keywords: keyword 1; keyword 2; keyword 3; keyword 4; keyword 5.

The text of the article will begin on the next odd-numbered page.

The titles of sections and subsections will be in lowercase roman, on a line separate from the preceding and following paragraphs. Arabic numerals will be used for numbering (1., 1.1., etc.). The format of the different levels is as follows:

   1. Level 1 heading [Times 11, bold]
   1.1. Level 2 subheading [Times 11, italics]
   1.1.1. Level 3 subheading [Times 11, roman]
   1.1.1.1 (subsequent levels [Times 11, roman])
   (Note that the numbering is always in roman type.)

The body text will be in Times 11, with a first-line indent of 0.5 cm, except in the first paragraph of each section or subsection. In general, texts will respect the typographic conventions of the languages in which they are written.

Example:

1.3. La reformulación como actividad metadiscursiva
La reformulación ha sido estudiada desde diferentes ópticas.
Para Fuchs (1994) la paráfrasis es un tipo de reformulación: hablar de reformulación es hablar de sentido. La paráfrasis es una estrategia discursiva y cognitiva, que reposa sobre la idea de invariante semántico (remitiéndonos a la lógica), y sobre la noción de equivalencia (…)
   Ahora bien, la reformulación ha seguido ocupando la atención de los lingüistas que han distinguido la auto-reformulación y la hetero-reformulación (reformulación de nuestro propio discurso vs la reformulación del otro, que predomina en la conversación) (…).

3.3. Quotations

Short quotations (one or two lines) will appear within the text in English-style quotation marks (“...”):

   ... “L’any d’edició de l’obra se sap actualment que és el 1872” (Rubio, 1994: 23).
   Tal com observa Rubio (1994: 23), “l’any d’edició de l’obra...”.

Quotations of more than two lines will be set as a separate paragraph, separated by one line from the preceding and following paragraphs, without quotation marks. The font size will be Times 10 with a left indent of 1 cm. See the example or follow the template:

   Text of the article. Text of the article. Text of the article.

      Text of the quotation. Text of the quotation. Text of the quotation.

   Text of the article. Text of the article. Text of the article.

Omissions will be indicated with three dots in square brackets: [...].

3.4. Footnotes

Critical (non-bibliographic) notes will appear as footnotes, single-spaced, justified and in Times 9. Note references will be given as superscript Arabic numerals after the word concerned. If that word is followed by a punctuation mark, the reference goes before the punctuation mark and after the closing quotation mark:

   ...varietats orientals d’aquesta llengua¹: amosta², escalfar i “gallofa”³.

3.5. In-text bibliographic references

Bibliographic references within the text will follow these models:

• If the author is not part of the sentence:
   Basic form: text of the article (author’s surname, year: page)
   Example: ... com ja s’ha observat en altres treballs (Rubio, 1994: 23).
• If the reference covers a whole work or appears within the sentence, the pages may be omitted:
   Basic form: text of the article author’s surname (year) text of the article (…)
   Example: ... es la oposición que mencionaba Lliteras (2002) como característica del momento.
• If the work cited has more than two authors, it will be cited the first time with all the surnames. In subsequent mentions, only the surname of the first author is given, followed by et alii.
   Example: El término inteligencia emocional lo utilizaron por primera vez Salovey y Mayer en 1990 (Álvarez Manilla, Valdés Krieg & Curiel de Valdés, 2006). (…) En cuanto al desempeño escolar, Álvarez Manilla et alii (2006) encontraron que la inteligencia emocional no incide en el mismo.
• If more than one author is cited outside the sentence, the references are separated by semicolons:
   Basic form: (…) text of the article (surname, year; surname, year; surname & surname, year). Text of the article (...)
   Example: ... parece demostrado a través de numerosos trabajos (Haverkate, 1994; Briz, 1995; Casamiglia & Tusón, 2002; Ballesteros, 2002).
• If several works by the same author are cited in the same reference, the years of publication are separated by commas:
   Basic form: (...) text of the article (author surname(s), year, year, year). Text of the article (...)
   Example: ... com es recull a les versions més recents de la teoria (Wilson, 2007, 2010, 2012).

3.6. Bibliography

3.6.1. General considerations

• The typeface and size is Times 10.
• The references cited appear at the end, ordered alphabetically by the surname of the first author.
• Works by the same author are ordered chronologically, and the surname and first name are repeated in each reference.
• Works by the same author and year are distinguished by adding a lowercase roman letter to the year [e.g. (1999a) (1999b)] and are ordered chronologically.
• Do not use the abbreviated form et alii for co-authored works, regardless of the number of authors. (As indicated above, this form may be used in the body of the article for in-text references, but not in the bibliography section.)
• To cite works in press or in preparation, use the corresponding indication in place of the year (e.g. in press, in preparation).
• If the edition used is not the first and the date of the original is to be recorded, give the original date first, in square brackets: ([1972] 1998).
• If the number of the edition used is to be recorded, add it after the title, in parentheses, in lowercase roman and abbreviated (e.g. 2ª ed.).
• Each reference is formatted with a hanging indent of 1 cm.
• To indicate which volume of a multi-volume work was used, add it in Roman numerals after the title, separated by a space (for example: Història de la Literatura II).
• If the number of volumes of a work is to be recorded, give it after the publisher in Arabic numerals, without marking the plural (for example: (3 vol.)).
• Co-editions are indicated with a slash (for example: Barcelona/València: PAM/IIFV).
Es podrà, si així es vol, afegir, després de la bibliograia, una llista de textos objecte d’estudi o que han servit com a corpus d’anàlisi. Se seguiran per a la seva referència els criteris exposats més avall. • • • • • Documents electrònics • Excepte en el cas de les tesis i llibres electrònics, no incloeu el nom de la base de dades on trobareu el recurs. • Incloeu la data d’accés o descàrrega de l’article. • No escriviu un punt (.) després de l’adreça web (URL). • No feu servir el subratllat dels enllaços electrònics. Digital Object Identiier (doi) • És una sèrie alfanumèrica assignada per la institució que gestiona l’edició a un document en format electrònic. • El doi identiica el contingut de l’article com a objecte digital únic. • Proveu un enllaç permanent per a la localització de l’article a internet. • És requeriment de Quaderns de Filologia incloure el doi com a part de la referència d’un recurs citat a la bibliograia, si en té (seguiu indicacions més avall). 3.6.2. Articles en revistes i publicacions periòdiques 3.6.2.1. Article imprès Un autor: Forma bàsica: Cognom(s), Nom. Any. Títol de l’article. Títol de la publicació volum(número): pàgina inicial-pàgina inal. Normes d’edició 283 Exemple: Chillón, Lluís-Albert. 1995. Discurs periodístic i fraseologia. Caplletra 18: 165-176. Dos autors: Forma bàsica: Cognom(s), Nom & Cognom(s), Nom. Any. Títol de l’article. Títol de la publicació volum(número): pàgina inicial-pàgina inal. Exemple: Bishop, John E. & Brousseau, Kevin. 2011. The end of the Jesuit lexicographic tradition in Nêhirawêwin: Jean-Baptiste de la Brosse and his compilation of the Radicum Montanarum Silva (1766–1772). Historiographia Linguistica 38(3): 293-324. Tres autors: Forma bàsica: Cognom(s), Nom; Cognom(s), Nom & Cognom(s), Nom. Any. Títol de l’article. Títol de la publicació volum(número): pàgina inicial-pàgina inal. Exemple: Vila, Ignasi; Oller, Judith & Fresquet, Montserrat. 2008. 
Una anàlisi comparativa del coneixement de català de l’alumnat castellanoparlant autòcton i l’alumnat hispà en finalitzar l’educació infantil a Catalunya. Caplletra 45: 203-228.

3.6.2.2. Electronic articles with doi:

One author:
Basic form:
Surname(s), First name. Year. Title of the article. Title of the publication volume(issue): first page-last page. doi: http://dx.doi.org/xx.xxxx.xxxx.xx
Example:
Sifianou, Maria. 2012. Disagreements, face and politeness. Journal of Pragmatics 44(12): 1554-1564. doi: http://dx.doi.org/10.1016/j.pragma.2012.03.009

Two authors:
Basic form:
Surname(s), First name & Surname(s), First name. Year. Title of the article. Title of the publication volume(issue): first page-last page. doi: http://dx.doi.org/xx.xxxx.xxxx.xx
Example:
De Wit, Astrid & Brisard, Frank. 2013. A cognitive grammar account of the semantics of the English present progressive. Journal of Linguistics 45(7): 1-42. doi: http://dx.doi.org/10.1017/S0022226713000169

Three authors:
Basic form:
Surname(s), First name; Surname(s), First name & Surname(s), First name. Year. Title of the article. Title of the publication volume(issue): first page-last page. doi: http://dx.doi.org/xx.xxxx.xxxx.xx
Example:
Noh, Eun-Ju; Hyeree, Choo & Sungryong, Koh. 2013. Processing metalinguistic negation: Evidence from eye-tracking experiments. Journal of Pragmatics 57: 1-18. doi: http://dx.doi.org/10.1016/j.pragma.2013.07.005

3.6.2.3. Online articles (without doi):

One author:
Basic form:
Surname(s), First name. Year. Title of the article. Title of the publication volume(issue): first page-last page. http://www.aaaaa.com [Accessed dd/mm/yyyy].
Example:
Martínez Lirola, María. 2008. La importancia de los nuevos modos de evaluación en el EEES. Una aproximación a las ventajas del uso del Portfolio. Revista de Enseñanza Universitaria 31: 62-72. http://rua.ua.es/dspace/bitstream/10045/17235/1/6MartinezLirola.pdf

Two authors:
Basic form:
Surname(s), First name & Surname(s), First name. Year. Title of the article.
Title of the publication volume(issue): first page-last page. http://www.aaaaa.com [Accessed dd/mm/yyyy].
Example:
Bustelo Ruesta, Carlota & García-Morales Huidobro, Elisa. 2000. La consultoría en la organización de la información. El Profesional de la Información 9: 4-10. http://publishersnet.swets.nl/direct/issue?/title=2246163 [Accés 10/05/2009].

Three authors:
Basic form:
Surname(s), First name; Surname(s), First name & Surname(s), First name. Year. Title of the article. Title of the publication volume(issue): first page-last page. http://www.aaaaa.com [Accessed dd/mm/yyyy].
Example:
Pozo Muñoz, Carmen; Giménez Torres, Mª Luisa & Bretones Nieto, Blanca. 2009. La evaluación de la calidad docente en el nuevo marco del EEES: Un estudio sobre la encuesta de opinión del programa Docentia-Andalucía. Revista Educación 11: 43-64. http://rabida.uhu.es/dspace/bitstream/handle/10272/4905/b15643773.pdf?sequence=3

3.6.2.4. Article in a periodical:

Magazine:
Basic form:
Surname(s), First name. Year (day and month). Title of the article. Title of the publication volume(issue): first page-last page.
Example:
Viadero, Daniel. 2009 (12 de setembre). Social-skills programs found to yield gains in academic subjects. Education Week 27(16): 1-15.

Newspaper:
Basic form:
Surname(s), First name. Year (day and month). Title of the article. Title of the publication, p. xx.
Example:
Patarroyo, Manuel. 2011 (19 de juny). El parásito de la malaria es mi confidente. El País, p. 64.

Online newspaper:
Basic form:
Surname(s), First name. Year (day and month). Title of the article. Title of the publication. http://www.xxx.com
Example:
Martínez, Francesc. 2013 (3 d’octubre). Crisi i futur de la televisió pública. El Punt-Avui. http://www.elpuntavui.cat/noticia/article/5-cultura/19cultura/682407-crisi-i-futur-de-la-televisio-publica.html

3.6.3. Books

3.6.3.1. Printed book

One author:
Basic form:
Surname(s), First name. Year. Title of the book(: Subtitle)*. Place of publication: Publisher.
(* the parentheses mark the subtitle as optional: include it if the publication has one).
Example:
Spang, Kurt. 2003. Géneros literarios. Madrid: Síntesis.

Two authors:
Basic form:
Surname(s), First name & Surname(s), First name. Year. Title of the book(: Subtitle). Place of publication: Publisher.
Example:
Allan, Keith & Burridge, Kate. 2006. Forbidden words: Taboo and censoring of language. Cambridge: Cambridge University Press.

Three authors:
Basic form:
Surname(s), First name; Surname(s), First name & Surname(s), First name. Year. Title of the book(: Subtitle). Place of publication: Publisher.
Example:
Wagner, Emma; Bech, Svend & Martínez, Jesús M. 2002. Translating for the European Union institutions. Manchester: St. Jerome Publishing.

3.6.3.2. E-book with doi:

One author:
Basic form:
Surname(s), First name. Year. Title of the book(: Subtitle). Place of publication: Publisher. [Database]. doi: http://dx.doi.org/xx.xxxx.xxxx.xx
Example:
Rapaport, Herman. 2011. The literature theory toolkit: A compendium of concepts and methods. West Sussex: Wiley-Blackwell. [Versió de Wiley-Online]. doi: http://dx.doi.org/10.1002/9781444395693

Two authors:
Basic form:
Surname(s), First name & Surname(s), First name. Year. Title of the book(: Subtitle). Place of publication: Publisher. [Database]. doi: http://dx.doi.org/xx.xxxx.xxxx.xx
Example:
Montero, Maritza & Sonn, Christopher C. 2009. Psychology of liberation: Theory and applications. New York: Springer Science & Business Media. [Versió de Springer.com]. doi: http://dx.doi.org/10.1007/978-0-387-85784-8

Three authors:
Basic form:
Surname(s), First name; Surname(s), First name & Surname(s), First name. Year. Title of the book(: Subtitle). Place of publication: Publisher. [Database]. doi: http://dx.doi.org/xx.xxxx.xxxx.xx
Example:
Hardcastle, William J.; Laver, John & Gibbon, Fionna E. 2010. The handbook of phonetic sciences. Oxford: Blackwell. [Versió de Wiley-Online]. doi: http://dx.doi.org/10.1002/9781444317251

3.6.3.3.
Online book (without doi):

[Note that the basic forms refer to online books accessed institutionally through a digital platform. For other kinds of access, omit the identification data (docID: identification number). See the example in 3.6.4.3.]

One author:
Basic form:
Surname(s), First name. Year. Title of the book(: Subtitle). Place of publication: Publisher. docID: document identification number. http://www.aaaaa.com [Accessed dd/mm/yyyy].
Example:
Silva, Reinaldo F. 2011. Portuguese American literature. Penrith: Humanities E-books, LLP. docID: 1056727. http://site.ebrary.com [Accés 19/09/2013].

Two authors:
Basic form:
Surname(s), First name & Surname(s), First name. Year. Title of the book(: Subtitle). Place of publication: Publisher. docID: document identification number. http://www.aaaaa.com [Accessed dd/mm/yyyy].
Example:
Valsalobre, Pep & Rossich, Albert. 2007. Literatura i cultura catalanes (segles XVII-XVIII). Barcelona: Editorial UOC. docID: 10566824. http://site.ebrary.com [Accés 29/08/2013].

Three authors:
Basic form:
Surname(s), First name; Surname(s), First name & Surname(s), First name. Year. Title of the book(: Subtitle). Place of publication: Publisher. docID: document identification number. http://www.aaaaa.com [Accessed dd/mm/yyyy].
Example:
Benito, Jesús; Manzanas, Anna M. & Simal, Begoña. 2009. Critical approaches to ethnic American literature. Uncertain mirrors: Magical realism in US ethnic literatures. Amsterdam: Rodopi. docID: 10380441. http://site.ebrary.com [Accés 10/10/2011].

3.6.4. Book with an editor or coordinator:

3.6.4.1. Printed book

One editor/coordinator:
Basic form:
Surname(s), First name (ed./coord.)*. Year. Title of the book. Place of publication: Publisher.
[*Other options: Director (dir.) or Compiler (comp.)]
Example:
Bou Franch, Patricia (ed.). 2006. Ways into discourse. Granada: Comares.
Monereo, Carles (coord.). 2000. Estrategias de aprendizaje. Madrid: Visor.

Two editors/coordinators:
Basic form:
Surname(s), First name & Surname(s), First name (ed./coord.). Year.
Title of the book. Place of publication: Publisher.
[*Others: Director (dir.) or Compiler (comp.). Never mark the plural.]
Example:
Bravo, Diana & Briz, Antonio (ed.). 2004. Pragmática sociocultural: Estudios sobre el discurso de la cortesía en español. Barcelona: Ariel.
Carranza, José A. & Ato, Esther (coord.). 2010. Manual de prácticas de psicología del desarrollo. Murcia: Ediciones de la Universidad de Murcia.

Three editors/coordinators:
Basic form:
Surname(s), First name; Surname(s), First name & Surname(s), First name (ed./coord.). Year. Title of the book. Place of publication: Publisher.
Example:
Blas Arroyo, José Luis; Casanovas, Manuela & Velando, Mónica (ed.). 2006. Discurso y sociedad: Contribuciones al estudio de la lengua en el contexto social. Castellón: Universitat Jaume I.
Oltramari, Alessandro; Vossen, Piek & Qin, Lu (coord.). 2013. New trends of research in ontologies and lexical resources: Ideas, projects and systems. Heidelberg: Springer.

3.6.4.2. E-book with doi:

One editor/coordinator:
Basic form:
Surname(s), First name (ed./coord.). Year. Title of the book. Place of publication: Publisher. [Database]. doi: http://dx.doi.org/xx.xxxx.xxxx.xx
Example:
Romero-Trillo, Jesús (ed.). 2012. Pragmatics and prosody in English language teaching. Netherlands: Springer. [Versió de Springer.com]. doi: http://dx.doi.org/10.1007/978-94-007-3883-6

Two editors/coordinators:
Basic form:
Surname(s), First name & Surname(s), First name (ed./coord.). Year. Title of the book. Place of publication: Publisher. [Database]. doi: http://dx.doi.org/xx.xxxx.xxxx.xx
Example:
Boehmer, Elleke & Morton, Stephen (ed.). 2009. Terror and the postcolonial: A concise companion. West Sussex: Wiley-Blackwell. [Versió de Wiley-Online]. doi: http://dx.doi.org/10.1002/9781444310085

Three editors/coordinators:
Basic form:
Surname(s), First name; Surname(s), First name & Surname(s), First name (ed./coord.). Year. Title of the book. Place of publication: Publisher. [Database].
doi: http://dx.doi.org/xx.xxxx.xxxx.xx
Example:
Clark, Andy; Ezquerro, Jesús & Larrazábal, Jesús M. (ed.). 1996. Philosophy and cognitive science: Categories, consciousness and reasoning. Netherlands: Springer. [Versió de Springer.com]. doi: http://dx.doi.org/10.1007/978-94-015-8731-0

3.6.4.3. Online book (without doi):

One editor/coordinator:
Basic form:
Surname(s), First name (ed./coord.). Year. Title of the book. Place of publication: Publisher. docID: document identification number. http://www.xxx.com [Accessed dd/mm/yyyy].
Example:
Ciapuscio, Guiomar E. (ed.). 2009. De la palabra al texto: Estudios lingüísticos del español. Buenos Aires: Eudeba. http://core.cambeiro.com.ar/0-4222-5.pdf [Accés 12/04/2010].

Two editors/coordinators:
Basic form:
Surname(s), First name & Surname(s), First name (ed./coord.). Year. Title of the book. Place of publication: Publisher. docID: document identification number. http://www.aaaaa.com [Accessed dd/mm/yyyy].
Example:
Barletta, Norma & Chamorro, Diana (ed.). 2011. El texto escolar y el aprendizaje: Enredos y desenredos. Barranquilla: Universidad del Norte. docID: 10485834. http://site.ebrary.com [Accés 12/06/2013].

Three editors/coordinators:
Basic form:
Surname(s), First name; Surname(s), First name & Surname(s), First name (ed./coord.). Year. Title of the book. Place of publication: Publisher. docID: document identification number. http://www.aaaaa.com [Accessed dd/mm/yyyy].
Example:
Newman, John; Baayen, Harald R. & Rice, Sally (ed.). 2010. Corpus-based studies in language use, language learning and language documentation. Amsterdam & New York: Rodopi. http://bvbr.bibbvb.de:8991/F?func=service&doc_library=BVB01&doc_number=024531245&line_number=0001&func_code=DB_RECORDS&service_type=MEDIA [Accés 13/09/2013].

3.6.5. Book chapters or entries in reference works:

3.6.5.1. Book chapters

One author and one or more editors of the book:
(Add as many authors and/or editors as needed, following the model for journal articles and books.)
Basic form:
Surname(s), First name. Year. Title of the chapter. In Surname(s), First name (ed./coord.) Title of the book. Place of publication: Publisher, first page-last page.
Surname(s), First name. Year. Title of the chapter. In Surname(s), First name & Surname(s), First name (ed./coord.) Title of the book. Place of publication: Publisher, first page-last page.
Surname(s), First name. Year. Title of the chapter. In Surname(s), First name; Surname(s), First name & Surname(s), First name (ed./coord.) Title of the book. Place of publication: Publisher, first page-last page.
Example:
Schegloff, Emmanuel. 1982. Discourse as an interactional achievement. In Tannen, Deborah (ed.) Analysing discourse: Text and talk. Washington DC: Georgetown University Press, 73-93.
Kerbrat-Orecchioni, Catherine. 2004. ¿Es universal la cortesía? En Bravo, Diana & Briz, Antonio (ed.) Pragmática sociocultural: Estudios sobre el discurso de cortesía en español. Barcelona: Ariel, 39-54.
Note: The word that introduces the host volume (e.g. “In”, “En”, etc.) depends on the language of the article.

3.6.5.2. Entry with an author in a printed reference work:
Basic form:
Surname(s), First name. Year. Title of the entry. In Surname(s), First name (ed.) Title of the reference work. Place of publication: Publisher.
Example:
Isidore, Ian. 1998. African-American literature: Central and South America. Encyclopedia of Latin American Literature. Chicago, IL: Fitzroy Dearborn Publishers.

3.6.5.3. Entry with an author in an online reference work:
Basic form:
Surname(s), First name. Year. Title of the entry. In Surname(s), First name (ed.) Title of the reference work. http://www.aaaaa.com [Accessed dd/mm/yyyy].
Surname(s), First name. Year. Title of the entry. In Surname(s), First name (ed.) Title of the reference work. doi: http://dx.doi.org/xxx.xxxx.xxxxx
Example:
Graham, George. 2008. Behaviourism. En Zalta, Enrique (ed.) The Stanford Encyclopaedia of Philosophy. http://plato.stanford.edu/entries/behaviorism [Accés 23/11/2009].
Palfreyman, Mark & Jorgensen, Erik. 2009.
In vivo analysis of membrane fusion. En Wiley InterScience Encyclopedia of Life Sciences. doi: http://dx.doi.org/10.1002/9780470015902.a0020891

3.6.5.4. Entry without an author in an online reference work:
Basic form:
Name of the entry. (s.a.)*. In Title of the reference work. http://www.aaaaa.com
(*no year)
Example:
Feminism. (s.a.). En Encyclopaedia Britannica. http://global.britannica.com/EBchecked/topic/724633/feminism

3.6.6. Doctoral theses, research papers or master's dissertations

Printed doctoral theses:
Basic form:
Surname(s), First name. Year. Title of the thesis (Doctoral thesis). Place: University/Institution – Department/Faculty/Institute.
Example:
Vegara Fabregat, Laura. 2013. La metáfora en los textos jurídicos y su traducción (Tesis Doctoral). Alacant: Universitat d’Alacant - Departament de Filologia Anglesa.

Online doctoral theses:
Basic form:
Surname(s), First name. Year. Title of the thesis (Doctoral thesis). Place: University/Institution – Department/Faculty/Institute. [Database]. http://www.aaaaa.com
Example:
Solís García, Inmaculada. 2011. La utilidad del concepto de Referencia en la didáctica del Español Lengua Extranjera (Tesis Doctoral). Oviedo: Universidad de Oviedo - Departamento de Filología Española. [Base de dades Teseo]. https://www.educacion.gob.es/teseo/imprimirFicheroTesis.do?fichero=29251

Doctoral theses in a commercial database:
Basic form:
Surname(s), First name. Year. Title of the thesis (Doctoral thesis). [Database] (identification number).
Example:
Santini Rivera, Manuel. 1998. The Effects of Various Kinds of Verbal Feedback on the Performance of the Selected Motor Development Skills in Adolescent Males with Down Syndrome (Tesis Doctoral). [Bases de dades ProQuest Dissertations & Theses] (AAT 9832765).

For research papers, minor theses or master's dissertations, follow the model for theses, replacing the document type given in parentheses after the title of the work.

3.6.7.
Technical and research reports

With author(s):
Basic form:
Surname(s), First name. Year. Title of the report (assigned number). Institution commissioning the report. Place of publication: Publisher. http://www.aaaaa.com [Accessed dd/mm/yyyy].
Example:
González García, Maria del Mar & Corredera González, Azucena. 2004. Evaluación de la enseñanza y aprendizaje de la lengua inglesa: Educación secundaria obligatoria 2001 – Informe final. Ministerio de Educación y Ciencia. Instituto Nacional de Evaluación y Calidad del Sistema Educativo (INECSE). Madrid: Subdirección General de Información y Publicaciones.

With a corporate author, institution or organisation:
Basic form:
Name of the institution. Year. Title of the report (assigned number). Place of publication: Publisher. http://www.aaaaa.com [Accessed dd/mm/yyyy].
Example:
Instituto Nacional de Evaluación Educativa. 2013. Panorama de la educación. Indicadores de la OCDE 2013. Informe español. Madrid: Ministerio de Educación, Cultura y Deporte. http://www.mecd.gob.es/dctm/inee/internacional/panoramadelaeducacion2013informe-espanol.pdf?documentId=0901e72b816996b6 [Accés 12/09/2013].

3.6.8. Contributions to congresses and conferences

3.6.8.1. Publication in proceedings
Basic form:
Surname(s), First name. Year. Title of the contribution. In Surname, First name (ed.) Title of the Conference Proceedings. Place of publication: Publisher, first page-last page.
Example:
Yates, Alan. 1998. Sobre les característiques (sub)genèriques de la novel·la curta o nouvelle. En Alonso, Vicent; Bernal, Assumpció i Gregori, Carme (ed.) Actes del I Simposi Internacional de Narrativa Breu. Barcelona: Publicacions de la Abadia de Montserrat, 9-40.
(...)
Bernal, María. 2005. Hacia una categorización sociopragmática de la cortesía, descortesía y anticortesía en conversaciones españolas de registro coloquial. En Bravo, Diana (ed.) Actas del Primer Coloquio Edice: La perspectiva no etnocentrista de la cortesía. Estocolmo: Universidad de Estocolmo, 365-398.
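The author-list pattern that recurs in the basic forms (a single name; two names joined by "&"; three names with ";" between the first two and "&" before the last) and the proceedings form of 3.6.8.1 can be sketched in Python as follows. This is only an illustration, not a tool provided by the journal: the function names format_names and format_proceedings_entry and their parameters are hypothetical, and extending the same pattern to four or more names is an assumption based on the rule that et alii is not used in the bibliography.

```python
def format_names(people):
    """Join (surname, first_name) pairs as in the basic forms:
    'A, a' / 'A, a & B, b' / 'A, a; B, b & C, c'.
    Applying the same pattern to 4+ names is an assumption."""
    names = [f"{surname}, {first}" for surname, first in people]
    if len(names) == 1:
        return names[0]
    # Semicolons between all but the last name, ampersand before the last.
    return "; ".join(names[:-1]) + " & " + names[-1]


def format_proceedings_entry(authors, year, title, editors, book,
                             place, publisher, pages, locator="In"):
    """Build a reference following the proceedings form of 3.6.8.1.
    `locator` is the word introducing the host volume ("In", "En", ...),
    which depends on the language of the article.
    `pages` is a (first_page, last_page) pair."""
    first_page, last_page = pages
    return (
        f"{format_names(authors)}. {year}. {title}. "
        f"{locator} {format_names(editors)} (ed.) {book}. "
        f"{place}: {publisher}, {first_page}-{last_page}."
    )


if __name__ == "__main__":
    # Hypothetical call, with a shortened title for brevity.
    print(format_proceedings_entry(
        authors=[("Bernal", "María")],
        year=2005,
        title="Hacia una categorización sociopragmática",
        editors=[("Bravo", "Diana")],
        book="Actas del Primer Coloquio Edice",
        place="Estocolmo",
        publisher="Universidad de Estocolmo",
        pages=(365, 398),
        locator="En",
    ))
```

The sketch does not handle titles that already end in "?" or "!", where the extra full stop would have to be dropped by hand.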
If three or more works from the same volume (a book or a set of proceedings) are cited, the citation may be simplified as follows, with a full reference to the volume also included in the list, following the citation rules for books:
Example:
Yates, Alan. 1998. Sobre les característiques (sub)genèriques de la novel·la curta o nouvelle. En Vicent Alonso, Assumpció Bernal i Carme Gregori (eds.), 9-40.
(...)
Alonso, Vicent; Bernal, Assumpció & Gregori, Carme (eds.). 1998. Actes del I Simposi Internacional de Narrativa Breu. Barcelona: Publicacions de la Abadia de Montserrat.

3.6.8.2. Unpublished papers
Basic form:
Surname(s), First name. Year. Title of the contribution. Paper/Talk presented at the Name of the Conference. Venue: dates of the conference.
Example:
Gil-Bardají, Ana & Minett-Wilkinson, Jacqueline. 2011. Traducción e interpretación en los servicios públicos de Cataluña: Resultados de un estudio empírico. IV Congreso Internacional de Traducción e Interpretación en los Servicios Públicos. Alcalá de Henares: Universidad de Alcalá, 13-15 de abril.

3.6.9. Other electronic sources or resources

3.6.9.1. Web pages
Basic form:
Surname, First name / Editor. Year of last update. Title of the web page. Place of publication: Publisher*. http://www.aaaaa.com [Accessed dd/mm/yyyy].
(*if available)
Example:
Modern Language Association. 2003. MLA Style. http://www.mla.org/style [Accés 01/05/2012].

3.6.9.2. CD-ROM and DVD
Basic form:
Surname, First name / Editor. Year. Title of the resource. [Type of medium]. Place of publication: Publisher (if available).
Example:
Real Academia Española (RAE). 2001. Nuevo tesoro lexicográfico de la lengua española. [DVD-ROM]. Madrid: Espasa.

For films, short films, documentaries, etc., the director and the scriptwriter may be included, as follows:
Basic form:
Surname, First name (director) & Surname, First name (script). Year. Title of the production. [Type of medium].
Place of distribution: Studio (if available).
Example:
Parker, Oliver (director) & Finlay, Toby (guió). 2010. El retrato de Dorian Gray. [DVD]. Madrid: Aurum producciones.

3.6.9.3. Other online sources:
Use the following basic form for other types of online sources or resources. To specify the type of resource, use square brackets after the title: [Audio podcast], [online forum comment], [discussion list message], etc.
Basic form:
Surname(s), First name. Year (day and month). Title of the resource. [Type of resource]. In Name of the programme or medium. http://www.aaaaa.com [Accessed dd/mm/yyyy].
Example:
Marcé, Xavier. 2013 (2 d’octubre). Radiografia del panorama cultural català. [Àudio podcast]. En Catalunya Radio El Cafè de la República. http://www.catradio.cat/audio/758840/Xavier-Marce [Accés 03/10/2013].

3.7. Typographic requirements

Italics may be used (in addition to the titles of publications) to highlight a term or to set apart words or short phrases in a language other than that of the article. They are not used for quotations.
The hyphen (-) is used where spelling requires it, and the en dash (–) to set off a parenthetical aside within a sentence. In the latter case, if the aside ends at a full stop, the closing dash is omitted.
Double quotation marks (“...”) are used. Where a further distinction is needed inside a quotation, single quotation marks (‘...’) are used. Where single and double quotation marks meet, a space is left between them (...’ ”).

3.8. Graphic elements

Tables are centred, with a maximum width equal to that of the text block.
Figures have a maximum resolution of 300 dpi. Authors do not include them in the article text submitted to the editor; figures go in a separate document. The author marks in the article the place where each figure is to be inserted, in square brackets, in capitals, with a blank line before and after it:
Example:
Article text. Article text. Article text. Article text. Article text.
Article text. Article text. Article text. Article text.

[INSERTAR FIGURA 1]

Article text. Article text. Article text. Article text.

In both cases (tables and figures), the caption goes at the bottom, in Times 9 roman, centred and separated by 6 pt of spacing before it, stating the type of element and its number.
Example:
table data table data table data table data table data table data table data table data
Table 1. Table caption
(In a table, the data should preferably be set in Times 10.)
Example:
Figure 1. Figure caption

ojs.uv.es/index.php/qfilologia/index

REVISTA QUADERNS de FILOLOGIA

ESTUDIS LINGÜÍSTICS
Volume I (1995): Aspectes de la reflexió i de la praxi interlingüística. Ed. Carlos Hernández, Brigitte Lépinette and Manuel Pérez Saldanya.
Volume II (1997): Sobre l’oral i l’escrit. Ed. Antonio Briz, Maria Josep Cuenca and Enric Serra.
Volume III (1998): Pragmàtica intercultural. Ed. Antonia Sánchez, Vicent Salvador and Josep-Ramon Gómez.
Volume IV (1999): El contacto lingüístico en el desarrollo de las lenguas occidentales. Ed. Milagros Aleza, Miguel Fuster and Brigitte Lépinette.
Volume V (2000): Aprendizaje y enseñanza de una segunda lengua. Ed. M.ª José Coperías, Jordi Redondo and Julia Sanmartín.
Volume VI (2001): La pragmática de los conectores y las partículas modales. Ed. Hang Ferrer and Salvador Pons.*
Volume VII (2002): Sexe i llenguatge: la construcció lingüística de les identitats de gènere. Ed. José Santaemilia, Beatriz Gallardo and Julia Sanmartín.*
Volume VIII (2003): Historia de la traducción. Ed. Brigitte Lépinette and Antonio Melero.
Volume IX (2004): Lingüística diacrónica contrastiva. Ed.
Cesáreo Calvo, Emili Casanova and Fco. Javier Satorre.
Volume X (2005): Les llengües d’especialitat: noves perspectives d’investigació. Ed. M.ª Amparo Olivares Pardo and Francisca Suau Jiménez.
Volume XI (2006): Critical Discourse Analysis. Ed. Júlia Todolí, María Labarta and Rosanna Dolón.
Volume XII (2007): Pragmática, discurso y sociedad. Ed. Patricia Bou Franch, A. Emma Sopeña Balordi and Antonio Briz.
Volume XIII (2008): Historiografía lingüística hispánica. Ed. Brigitte Lépinette, María José Martínez Alcalde and Emili Casanova.
Volume XIV (2009): Nuevas perspectivas en lingüística cognitiva / New perspectives in cognitive linguistics. Ed. M.ª Amparo Olivares and Eusebio Llácer.
Volume XV (2010): Lexicografía en el ámbito hispánico. Ed. Cesáreo Calvo, Brigitte Lépinette and Jean-Claude Anscombre.
Volume XVI (2011): La comunicación escrita en el siglo XXI. Ed. Nicolás Estévez, José Ramón Gómez and María Carbonell.
Volume XVII (2012): Lengua y ciencia. Recepción del discurso científico. Ed. Julia Pinilla Martínez, Virginia González García and Cecilio Garriga Escribano.
Volume XVIII (2013): Theoretical and empirical advances in word-formation. Ed. Manuel Pruñonosa-Tomás, Jesús Fernández-Domínguez and Vincent Renner.
Volume XIX (2014): La fonética como ámbito interdisciplinar. Estudios de fonopragmática, fonética aplicada y otras interfaces. Ed. Antonio Hidalgo Navarro, Carlos Hernández Sacristán and Francisco José Cantero Serena.
Volume XX (2015): Toponímia romànica. Ed. Germà Colón, Dieter Kremer and Emili Casanova.
Volume XXI (2016): La figura del traductor a través de los tiempos. Ed. Jordi Sanchis, María Elena Jiménez and Nicolás Antonio Campos Plaza.
Volume XXII (2017): Words, Corpus and back to Words. Ed. Miguel Fuster Márquez and Moisés Almela.

ESTUDIS LITERARIS
Volume I (1995): Homenatge a Amelia García-Valdecasas. Volumes I and II. Ed. Ferran Carbó, Juan Vicente Martínez, Evelio Miñano and Carmen Morenilla.
Volume II (1996): Funció didàctica i persuasió en la literatura. Ed. Ferran Carbó Aguilar, Evelio Miñano and Carmen Morenilla.
Volume III (1997): Dona i literatura. Ed. Ferran Carbó, Sonia Mattalía, Evelio Miñano and Carmen Morenilla.
Volume IV (1999): Les avantguardes i la renovació teatral. Ed. Juan Vte. Martínez Luciano, Carmen Morenilla, Ramon X. Rosselló and Josep Lluís Sirera.
Volume V (2000): Homenatge a César Simón. Ed. Antònia Cabanilles, José Vicente Bañuls and Arcadio López.
Volume VI (2001): Humor i literatura. Ed. Carme Gregori, Dolores Jiménez and Juan Vicente Martínez.
Volume VII (2002): Narrativa i història. Ed. Assumpció Bernal, María José Coperías and Nuria Girona.
Volume VIII (2003): Traducción y práctica literaria en la Edad Media Románica. Ed. Rosanna Cantavella, Marta Haro and Elena Real.
Volume IX (2004): Tropos del cuerpo. Ed. Nuria Girona and Manuel Asensi Pérez.
Volume X (2005): La recepción de los clásicos. Ed. Rafael Beltrán Llavador, Purificación Ribes Traver and Jorge L. Sanchis Llopis.
Volume XI (2006): Poesia i silenci. Ed. Antònia Cabanilles, Ferran Carbó and Evelio Miñano.
Volume XII (2007): Cruzando la frontera. Ed. Ana Calero Valera, Domingo Pujante and Miguel Teruel Pozas.
Volume XIII (2008): Traducció creativa. Ed. Cecilia López and Jesús Tronch.
Volume XIV (2009): La ciencia ficción en los discursos culturales y medios de expresión contemporáneos. Ed. Adela Cortijo, Guillermo López and Antonio Altarriba.
Volume XV (2010): La recepció del teatre contemporani. Ed. Ramon X. Rosselló, Josep Lluís Sirera and John London.
Volume XVI (2011): Escrituras del yo. Ed. Brigitte Jirku, Begoña Pozo and Ursula Schneider.
Volume XVII (2012): Las mujeres, la escritura y el poder. Ed. Júlia Benavent Benavent, Elena Moltó Hernández and Silvia Fabrizio-Costa.
Volume XVIII (2013): El relat: literatura, lectura i escriptura. Ed. Gemma Lluch, Lluís Quintana and Carmen Gregori.
Volume XIX (2014): Teatro de excepción: experiencias escénicas no institucionales en la Europa de los siglos XX y XXI. Ed. Juan Carlos de Miguel y Canuto, Mireia Aragay Sastre and Juan Vicente Martínez Luciano.
Volume XX (2015): Traducción y censura: Nuevas perspectivas. Ed. Gora Zaragoza Ninet, Juan José Martínez Sierra and José Javier Ávila-Cabrera.
Volume XXI (2016): El universo concentracionario: escribir para no olvidar. Ed. Javier Lluch-Prats, Evelio Miñano Martínez and Javier Sánchez Zapatero.
Volume XXII (2017): Revisión crítica de ediciones y traducciones de textos en el siglo XIX. Ed. María José Bertomeu Masiá, María José Coperías Aguilar and Sondra Dall’oco.

ESTUDIS DE COMUNICACIÓ
Volume I (2002): La cultura mediàtica. Modes de representació i estratègies discursives. Ed. Josep V. Gavaldà, Carmen Gregori and Ramon X. Rosselló.
Volume II (2004): Periodisme de complexitat: ciència, tecnologia i societat. Ed. Carolina Moreno Castro, Josep Lluís Gómez Mompart and Xavier Gómez Font.
Volume III (2008): El discurs del còmic. Ed. Pelegrí Sancho Cremades, Carmen Gregori Signes and Santiago Renard Álvarez.

Anejos de Quaderns de Filologia collection
Anejo I. Carlos Hernández (1985): Oraciones reflejas y estructuras actanciales en español.
Anejo II. Julio Calvo Pérez (1986): El adjetivo puro. Estructura léxica y topología.
Anejo III. Milagros Aleza Izquierdo (1987): SER con participio de perfecto en construcciones activas no oblicuas (español medieval).
Anejo IV. Antonio Briz Gómez (1989): Sustantivación y lexicalización en español (La incidencia del artículo).*
Anejo V. Milagros Aleza Izquierdo (with the collaboration of Salvador Pons Bordería and Isabel García Izquierdo) (1992): Americanismos léxicos en la narrativa de José María Arguedas.*
Anejo VI. Rosario Peñaranda Medina (1994): La novela modernista hispanoamericana: estrategias narrativas.*
Anejo VII.
Carme Manuel Cuenca (1994): Mito e innovación en la narrativa estadounidense del Nuevo Sur (1879-1918).
Anejo VIII. Paul Scott Derrick (1994): Thinking for a change. Gravity’s Rainbow and symptoms of the paradigm shift in occidental culture.
Anejo IX. Mercedes Román Fernández (1994): El español dominicano en el siglo XVIII. Análisis lingüístico de la ‘Historia de la conquista de la isla española de Sto. Domingo’ de L. J. Peguero.*
Anejo X. Juan Pedro Sánchez Méndez (1994): Aproximación al léxico venezolano del siglo XVIII a través de la ‘Descripción exacta de la provincia de Benezuela’, de J. L. Cisneros.*
Anejo XI. Francisco José López Alonso (1995): César Vallejo, Las Trazas del narrador.*
Anejo XII. Amparo Ricós (1995): Uso, función y evolución de las construcciones pasivas en español medieval.*
Anejo XIII. Joaquín García-Medall (1995): Casi un siglo de formación de palabras del español (1900-1994): Guía bibliográfica.*
Anejo XIV. Marta Haro (1995): Los compendios de castigos del XIII: estructuras narrativas y mecanismos adoctrinadores.
Anejo XV. Mercedes Román Fernández (1995): Aportaciones a los estudios sobre el caló en España.*
Anejo XVI. Antonio Briz Gómez (coord.) (1995): La conversación coloquial. Materiales para su estudio.
Anejo XVII. Nuria Girona Fibla (1995): Escrituras de la historia. La novela argentina de los años 80.
Anejo XVIII. Karen Andresen et alii (eds.) (1995): Ilustración y modernidad. La crítica de la modernidad en la Literatura alemana.
Anejo XIX. M.ª José Martínez Alcalde (1996): Morfología histórica de los posesivos españoles.*
Anejo XX. Eusebio V. Llácer (1997): Introducción a los estudios sobre traducción. Historia, teoría y análisis descriptivos.*
Anejo XXI. Antonio Hidalgo Navarro (1997): La entonación coloquial. Función demarcativa y unidades de habla.*
Anejo XXII. Javier García Gibert (1997): La imaginación amorosa en la poesía del Siglo de Oro.
Anejo XXIII.
Roger González Martell and Maribel Cruz González (1997): Adivinanzas en La Habana.
Anejo XXIV. Leonor Ruiz Gurillo (1997): Aspectos de fraseología teórica española.*
Anejo XXV. Julia Sanmartín Sáez (1998): Lenguaje y cultura marginal. El argot de la delincuencia.*
Anejo XXVI. Rosana Dolón (1998): La negociación como tipo discursivo.
Anejo XXVII. Salvador Pons Bordería (1998): Conexión y conectores. Estudio de su relación en el registro informal de la lengua.
Anejo XXVIII. José Ramón Gómez Molina (1998): Actitudes lingüísticas en una comunidad bilingüe y multilectal. Área metropolitana de Valencia.
Anejo XXIX. Juan Gómez Capuz (1998): El préstamo lingüístico. Conceptos, problemas y métodos.
Anejo XXX. Brigitte E. Jirku, Cecilia López Roig and Herta Schulze Schwarz (eds.) (1998): El cuerpo en la lengua y literatura alemanas: Ein Weites Feld.
Anejo XXXI. Rosa Álvarez Sellers (ed.) (1999): Literatura portuguesa y literatura española. Influencias y relaciones.
Anejo XXXII. Elena Ortells Montón (1999): Ficción y no ficción: La unidad literaria en la obra de Truman Capote.*
Anejo XXXIII. Berta Raposo Hernández (ed.) (1999): Textos alemanes primitivos. La Edad Media alemana temprana en sus testimonios literarios.
Anejo XXXIV. Mercedes Quilis Merín (1999): Orígenes históricos de la Lengua Española.
Anejo XXXV. Javier Satorre Grau (1999): Los posesivos en español.
Anejo XXXVI. Adela García Valle (1999): El notariado hispánico medieval: Consideraciones histórico-diplomáticas y filológicas.
Anejo XXXVII. Francisca Suau Jiménez (2000): La inferencia léxica como estrategia cognitiva. Aplicación al discurso escrito en lengua inglesa.
Anejo XXXVIII. Fernando Martín Polo (coord.) and Eduardo Tello Torres (eds.) (2000): Historia civil, eclesiástica de Titaguas de D. Simón Rojas Clemente y Rubio.
Anejo XXXIX. Paloma Arroyo Vega (2001): Expresión y contenido de las oposiciones diatéticas en el castellano del siglo XV de la Corona de Aragón.
Anejo XL.
Luis Veres Cortés (2001): La narrativa del indio en la revista Amauta. Anejo XLI. Marcial TerrádeZ Gurrea (2001): Frecuencias léxicas del español coloquial: Análisis cuantitativo y cualitativo. Anejo XLII. Carmen MorenIlla Talens y M.ª Julia JIméneZ FIol (eds.) (2001): Desde las tierras de José Martí. Estudios lingüísticos y literarios. Anejo XLIII. Ricardo HernándeZ PéreZ (2001): Poesía latina sepulcral de la Hispania romana: Estudio de los tópicos y sus formulaciones. Anejo XLIV. Cristina Matute y Azucena PalacIos (2001): El indigenismo americano II.* Anejo XLV. Vicente Revert SanZ (2001): Entonación y variación geográica en el español de América. Anejo XLVI. José Ramón GómeZ MolIna (coord.) (2001): El español hablado de Valencia. Materiales para su estudio (PRESEEA). 1 Nivel sociocultural alto. Anejo XLVII. José María García Martín (2001): La formación de los tiempos compuestos del verbo en español medieval y clásico.* Anejo XLVIII. Azucena PalacIos y Ana Isabel García (2001): El Indigenismo americano III.* Anejo XLIX. Dolores JIméneZ y Evelio MIñano (2002): Homenaje a Josefa María Castellví. Anejo L. Rafael Beltrán, Marta Haro, Josep Lluís SIrera y Antoni Tordera (2002): Homenaje a Luis Quirante, 2 vol. Anejo 51. Rosario Navarro Gala (2003): Lengua y cultura en la “Nueua corónica y buen gobierno”. Aproximación al español de los indígenas en el Perú de los siglos xvi-xvii. Anejo 52. Jesús PerIs Llorca (2003): Gauchos en el mundo del 80. Leyendo a Eduardo Gutiérrez y Eugenio Cambaceres. Anejo 53. Beatriz Ferrús Antón (2004): Discursos cautivos: convento, vida, escritura. Anejo 54. Marta InIgo Ros (2004): Cultural terms in King Alfred’s Translation of the Consolatio Philosophiae. Anejo 55. Guillermo LóPeZ García (2004): Comunicación electoral y formación de la opinión pública: las elecciones generales de 2000 en la prensa española. Anejo 56. 
José Ramón gómeZ molIna y M.ª Begoña gómeZ devís (2004): La disponibilidad léxica de los estudiantes preuniversitarios valencianos. Estudio de estratiicación sociolingüística. Índex de publicacions 307 Anejo 57. Antonio torres torres (2004): Procesos de americanización del léxico hispánico. Anejo 58. José Ramón gómeZ molIna (coord.) (2005): El español hablado de Valencia. Materiales para su estudio (PRESEEA). II Nivel sociocultural medio. Anejo 59. Maria Josep marín jordà (2005): Marcadors discursius procedents de verbs de percepció. Argumentació implícita en el debat electoral. Anejo 60. Dolors Palau samPIo (2005): Els estils periodístics. Maneres diverses de veure i construir la realitat. Anejo 61. José Ramón gómeZ molIna (coord.) (2007): El español hablado de Valencia. Materiales para su estudio (PRESEEA). III Nivel sociocultural bajo. Anejo 62. Hang ferrer mora, Herbert Josef holZInger y Berta raPoso fernándeZ (eds.) (2007): Homenaje a Herta Schulze Schwarz. Anejo 63. Juan Carlos tordera yllescas (2008): Introducción a la Gramática Léxico-Funcional. Anejo 64. Jaume PerIs Blanes (2008): Historia del testimonio chileno. De las estrategias de denuncia a las políticas memoria. Anejo 65. Claude BenoIt, Dolores BermúdeZ, Juli leal y Elena real (eds.) (2009): Homenaje a Dolores Jiménez Plaza. Escrituras del amor y del erotismo. Anejo 66. Virginia gonZáleZ garcía (2009): Mayans y la lexicografía del xviii: Un modelo de diccionario universal aplicado a la jurisprudencia. Anejo 67. Adrián caBedo neBot (2009): La segmentación prosódica en español coloquial. Anejo 68. Eduardo esPaña PaloP (2009): Construcciones con cuantiicador en el ámbito panhispánico: norma y uso. Anejo 69. Ferran grau codIna, José María maestre maestre y Jordi PéreZ durá (2009): Litterae Humaniore. Del Renacimiento a la Ilustración. Homenaje al profesor José María Estellés. Anejo 70. María estornell Pons (2009): Neologismos en la prensa: criterios para reconocer y caracterizar las unidades neológicas. 
Anejo 71. Brigitte Lépinette y Brisa Gómez-Ángel (2009): Études de linguistique française.
Anejo 72. Maria Josep Marin, Llum Brancho, Josep À. Mas i Anna I. Montesinos (eds.) (2010): Discurs polític i identitats (trans)nacionals.
Anejo 73. Miguel Martínez López, Purificación Ribes Traver y Santiago González y Fernández-Corugedo (2010): La lengua y la literatura inglesa en sus textos: aproximación crítica. Homenaje al profesor Francisco Fernández.
Anejo 74. Juan Carlos Tordera Yllescas (2010): Lingüística computacional. Teorías del habla.
Anejo 75. Antonio Hidalgo, Yolanda Congosto i Mercedes Quilis (eds.) (2011): El estudio de la prosodia en el siglo XXI: perspectivas y ámbitos.
Anejo 76. Santiago Vicente Llavata (2011): Estudio de las locuciones en la obra literaria de Don Íñigo López de Mendoza (Marqués de Santillana). Hacia una fraseología histórica del español.
Anejo 77. Esteban T. Montoro del Arco (ed.) (2012): Neología y creatividad lingüística.
Anejo 78. Nuria Girona Fibla (ed.ª) (2012): La cultura en tiempos de desarrollo: violencias, contradicciones y alternativas.
Anejo 79. Vicente Álvarez Vives (2012): Estudio fraseológico contrastivo de las locuciones adverbiales en los diccionarios de Vicente Salvá y de Esteban Pichardo.
Anejo 80. Francisco Pedro Pla Colomer (2012): Métrica, rima y oralidad en el ‘Libro de Buen Amor’.
Anejo 81. Nicolás Estévez Fuertes y Begoña Clavel Arroitia (2013): Adquisición de Segundas Lenguas (L2) en el marco del Nuevo Milenio: Homenaje a la profesora María del Mar Martí Viaño.
Anejo 82. Jorge Martí Contreras (2016): Estudio contrastivo gramatical de campo en español como lengua extranjera.
Anejo 83. Violeta Martínez-Paricio (ed.) (2017): Cien años después del Cours de Linguistique Générale.
Anejo 84. Carles Padilla Carmona (ed.) (2017): Llull, Cervantes, Shakespeare. Imágenes literarias de la locura.

* Issues marked with an asterisk are out of print.

Distribution: Publicacions de la Universitat de València. C/ Arts Gràfiques, 13; 46010-València; Tel.: 963 937 174 - Fax: 963 617 051
ojs.uv.es/index.php/qfilologia/index

WORDS, CORPUS AND BACK TO WORDS: FROM LANGUAGE TO DISCOURSE

Miguel Fuster Márquez
Moisés Almela

Fuster Márquez, Miguel & Almela, Moisés. 2017. “Words, Corpus and back to Words: from language to discourse”. Quaderns de Filologia: Estudis Lingüístics 22: 9-12. doi: 10.7203/qf.22.11297

Last century’s revolution in computer technologies has also changed the way we conceive of language, although the change is only partly due to that revolution. Technological advances in the field of information and communication have made the compilation and processing of large amounts of data remarkably easy and fast. Until quite recently, compiling large amounts of text required an enormous effort from researchers. At present, the process has become more feasible and certainly less time-consuming, giving the researcher more freedom to think about interesting ways of exploring the data.

However, other important ‘revolutions’ have taken place in linguistics which have in various ways been favoured by these technological developments. One such revolution has to do with linguistic theorisation. Linguists in the past would have been happy to decide on language matters simply by asking themselves how the grammar of their mother tongue worked since, as native speakers, they felt competent enough to make such decisions. This mentalistic approach (we are, of course, oversimplifying considerably) relied on the introspective power of well-educated speakers, who did not need to observe the authentic language produced by other speakers in order to reach most of their insights. All they needed was their own knowledge and their analytical power. In the famous Saussurean dichotomy between ‘langue’ and ‘parole’, these linguists were on the side of ‘langue’; ‘parole’ was of little or no interest.
However, an important change was taking place in linguistics: other linguists started to give priority to the manifestations of ‘parole’, that is, to how language was actually used by speakers in their communities, in order to theorise with greater accuracy about ‘langue’, or linguistic competence. Various significant developments are related to this more empirical movement. One of these was the acknowledgement of spoken language as a legitimate part of language. Twentieth-century lexicographers started to collect examples of informal or conversational registers and to include them in the dictionaries they produced. No less significant was the thrust of sociolinguistics, a broad research field with many branches and fuzzy boundaries that viewed languages as heterogeneous entities. Sociolinguists observed that variation was more the rule than the exception in speech communities. They brought with them empirical methodologies that enabled them to analyse how real speakers produced language in real settings in order to build their theories of variation and change, and they also made use of quantification.

This is partly the context for the emergence of corpus linguistics as a new approach to language. The new framework relied on the examination of real data originating in language use to build convincing linguistic arguments. Both variation and usage have been essential arguments in corpus approaches. However, a corpus should not be confused with a database. To quote Sinclair (1996: 2.1), “[a] corpus is a collection of pieces of language that are selected and ordered according to explicit linguistic criteria in order to be used as a sample of the language.” Unlike an arbitrary collection of data, any corpus linguist would insist, a corpus must contain a representative sample of language if the researcher is to draw relevant conclusions about language.
Broadly speaking, unlike essentially mentalistic approaches, corpus research is empirical, with a preference for induction, that is, for the careful analysis of data in representative corpora. However, most practitioners would agree that corpus linguistics is not a theory but a methodology, even if a somewhat special one. In fact, this methodology may be applied to one language, to different languages, or to different varieties or registers of a language, by means of small, medium or large corpora, and it may adopt different approaches in order to test different theories. Interest in corpus linguistics today may concern areas such as the quality of corpus compilation, lexis and phraseology, grammar, variation and change, discourse or stylistics, among others. Corpus linguistics has been of interest in both theoretical and applied linguistics. There is abundant applied research, for example, in the fields of lexicography, second language acquisition and translation. Indeed, it is difficult to think of research areas where corpus linguistics does not have room and something important to offer.

Quite regularly, corpus methodology combines quantitative and qualitative approaches, so that one approach feeds the other. Formerly purely qualitative analyses have in many cases been superseded by approaches in which quantification and statistics are becoming more prominent. Nevertheless, many committed corpus linguists would also claim that they favour triangulation and convergent evidence as a more acceptable approach. Very frequently, the procedure of a corpus linguist will have as its starting point a word or a word list. The close examination of a word’s behaviour will therefore be crucial for practically any kind of research that relies on language use.
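By way of illustration only, the two starting points just mentioned, a word list and the close examination of a word's behaviour, can be sketched in a few lines of Python. This is a minimal sketch and not a procedure taken from any of the studies in this volume: the function names (tokenize, word_list, concordance), the very crude tokenizer and the two-sentence sample text are all assumptions made for the example, and real corpus work would rely on a properly sampled corpus and dedicated tools.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (hypothetical, very crude tokenizer)."""
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())


def word_list(tokens):
    """Return (word, frequency) pairs, most frequent first."""
    return Counter(tokens).most_common()


def concordance(tokens, node, span=3):
    """Return (left context, node, right context) for every occurrence of the node word."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - span):i])
            right = " ".join(tokens[i + 1:i + 1 + span])
            lines.append((left, tok, right))
    return lines


# Toy sample text, invented for the example; it is not a representative corpus.
sample = "The corpus is a sample of the language. A corpus is not a database."
tokens = tokenize(sample)

for word, freq in word_list(tokens)[:5]:
    print(word, freq)

for left, node, right in concordance(tokens, "corpus", span=2):
    print(f"{left:>12} [{node}] {right}")
```

The frequency list corresponds to the quantitative side of the method, while reading the concordance lines corresponds to the qualitative side, which is how the two approaches can feed each other.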
It is also known that the most significant advances in contemporary lexicography have been driven by the inspection of reference corpora of variable size and scope, which have allowed researchers a more thorough understanding of real usage. Also, the compilation of comparable corpora has provided the basis for establishing parallels, differences and nuances for the purpose of comparison or contrast between languages. In addition, the possibility of compiling more specialized ad hoc corpora has allowed the detailed analysis of vocabulary in different types of discourse, either to determine its value in specialized languages or to gain a better understanding of its social or ideological implications, which are determined by evaluating linguistic preferences. Finally, it should be added that corpus approaches have revealed the existence of linguistic units which go beyond those recognised in more traditional lexicological approaches. Extensive research on phraseology and corpus-based lexicography produced in recent decades has brought to light the frequency in discourse of meaningful co-occurring lexical patterns and lexical-grammatical co-selection.

The aim of this issue is to bring together investigation into the lexicon in a variety of languages, in a diversity of manifestations (both at the word level and beyond the word level) and from a variety of perspectives, including not only those which focus on how the vocabulary is internally organized, but also those which deal with the role that lexical units and lexical relations play in the organization of other language levels, particularly in the organization of discourse.
These issues are approached from a variety of perspectives that include not only developments in several disciplines of theoretical and descriptive linguistics, particularly lexicology, phraseology, word formation and discourse analysis, but also diverse applied disciplines such as translation, foreign language teaching, English for specific purposes and critical discourse analysis. Coverage of linguistic diversity was also one of the criteria employed in compiling the volume. In total, six different languages are investigated in the studies selected for this volume: English, German, Spanish, French, Portuguese and Italian. Without claiming exhaustiveness, we consider that the variety of contributions presented here offers an insight into the vigour of current corpus research into phenomena related to the lexicon. Admittedly, the full range of topics, approaches and methodologies developed in this area of research could not fit in a single volume, but a careful selection of studies representing a variety of interesting advances can be representative of significant developments taking place in the field.

References

Sinclair, John McH. 1996. EAGLES. Preliminary Recommendations on Corpus Typology. http://www.ilc.pi.cnr.it/EAGLES96/corpustyp/corpustyp.html.