Norms, New Words, and Empirical Reality


International Journal of Lexicography, 2020, Vol. 33, No. 2, 135–149
doi: 10.1093/ijl/ecaa005
Advance Access Publication Date: 17 March 2020
Article

NORMS, NEW WORDS, AND EMPIRICAL REALITY

Downloaded from https://academic.oup.com/ijl/article/33/2/135/5809092 by guest on 23 April 2024


Pius ten Hacken
Leopold-Franzens-Universität Innsbruck ([email protected])

Abstract
The central question of this paper is how the inclusion of new words in dictionaries
can be related to the empirical reality and norms of language. Because dictionaries
are generally dictionaries of a language, the starting point is how this notion of
named language is framed. The traditional view of a language as a system is con-
trasted with the corpus-based view of a language as realized in use and with the
Chomskyan view based on language as a speaker’s competence. Then, the nature
of words in each perspective is discussed, leading to different characterizations
and different standards for the evaluation of new words. The function of new
words is generally to name new concepts. In naming, word formation, sense
extension, and borrowing can be used. Whereas lexicographers see their task
as mainly descriptive, users often expect dictionaries to be gatekeepers. The
competence-based perspective can serve as a ground where these views can be
reconciled.

Key words: Prescriptivism, named language, neologism, naming, competence, performance, corpora

1. Norms and language


This paper is concerned with the question of the basis on which new words are included in
dictionaries. Obviously, this question touches both on norms and on observations of empirical
reality. Perhaps less obviously, the status of new words, but also the status of norms and
the nature and selection of data from empirical reality depend on the theoretical framework
that is adopted. A crucial point in this respect is the status of languages. Where necessary, I
will distinguish language as a property of the human species from named language. Named
languages are English, Portuguese, Bulgarian, etc.
The views on the nature of named languages can be classified in three broad categories,
which have a clear historical order. The first is the idea that a named language is a system
that should be carefully attended to in order to improve it and keep it in an excellent state.

© 2020 Oxford University Press. All rights reserved. For permissions, please email: [email protected]

Here the image is that of a precious stone. The second view of named languages rather
focuses on the actual use of language. Here, each speaker’s language use is equally valued
and studied from a descriptive perspective. Finally, as a consequence of Chomsky’s (1965)
distinction between competence and performance, the insight emerged that named lan-
guages are not empirical objects at all.
In this paper, I will first briefly explain the rationale behind each of these views of
named language (section 2). Then I will turn to the position of words in each of these con-
ceptions (section 3). Finally, the question of new words and their coverage in dictionaries is
addressed (section 4).

2. Three views on the nature of named languages
In linguistics and in lexicography, a wide range of views on what a language is can be
observed. In most cases, the nature of the view adopted is not stated explicitly, but must be
inferred from other assumptions and statements. Here, I will classify these views into three
types: a language as an entity, as the result of its use, and as an abstraction.

2.1. Named languages as entities


For languages such as English, French, or German, it is difficult to observe exactly what
they are. For this reason, metaphors are often used in talking about them. A common meta-
phor represents a language as a living creature. Thus, we speak of living languages and
dead languages. Speaking of the growth or the health of a language also fits this
metaphor, as does the idea that the Romance languages are descendants of Latin.
In this perspective, two concerns with respect to the language are its development and
its protection. In the case of French, the Académie Française, founded in 1635, played an
important role. In its original Statuts, its mission is described as in (1).

(1) ‘La principale mission de l’Académie sera de travailler avec tout le soin et toute la diligence
possibles à donner des règles certaines à notre langue et à la rendre pure, éloquente et cap-
able de traiter les arts et les sciences.’ (Académie Française 1635, Art. XXIV)1

The Académie Française was not the first language academy to be set up. In fact, it was
probably inspired by or modelled on the Accademia della Crusca, which had been founded
in Florence in 1583. However, in contrast to earlier academies, its close connection to the
French monarchy provided crucial political support for its role in setting up a standard for
French. Art. XXVI of its 1635 statutes assigns the Académie Française the task of compiling
a dictionary, a grammar, as well as rules for rhetoric and poetry. Only the dictionary was
realized, but as Pruvost (2002: 29-38) outlines, it was only published in 1694, after a long
history of conflict.
Whereas in France, a royal charter supported the mission in (1), in England no academy
was founded. This does not mean that the attitude of protecting the language was essentially
different. Thus, in outlining his Plan for his 1755 dictionary, Samuel Johnson makes the ob-
servation in (2).

(2) ‘BARBAROUS or impure words and expressions, may be branded with some note of infamy, as
they are carefully to be eradicated wherever they are found; and they occur too frequently
even in the best writers.’ (Johnson 2008 [1747]: 28)

A striking observation about (2) is that Johnson assigns the judgement that a word is
‘barbarous or impure’ a higher priority than the use by ‘the best writers’. This suggests that
the English language is an object of a type that can be held up as a standard to any of its
speakers.
The idea that named languages such as French and English have a character that should
be protected against intrusions of bad expressions is at the basis of much linguistic thought
from the 17th century onwards. It underlies attempts to codify and standardize the lan-
guage, whether or not these attempts are sponsored by a state.

2.2. Named languages and language use
In the age of Romanticism, a different attitude to named languages emerged. Instead of
focusing on a number of major languages in their perfect or near-perfect state, an increased
interest in the historical development of languages and their variation as reflected in dialects
arose. Especially in Germany, which in the first half of the 19th century was more of a
linguistic and cultural than a political unit, this type of linguistic study was popular.
Important early representatives of these ideas were the brothers Jacob and Wilhelm Grimm
(1785-1863 and 1786-1859).
In linguistics, the view that actual use is more important than any normative rules
spread quickly, despite the persistence among the general public of the view of language
being subject to some authority. Hall (1960) tried to present the linguistic perspective to a
general public, making claims such as (3).

(3) ‘There is no such thing as good and bad (or correct and incorrect, grammatical and ungram-
matical, right and wrong) in language. [. . .]
A dictionary or grammar is not as good an authority for your own speech as the way you
yourself speak. [. . .]
All languages and dialects are of equal merit, each in its own way.’ (Hall 1960: 6)

The attitude in (3) represents a broad consensus among linguists, at least in North America.
It not only underlies the theoretical and descriptive work along the lines of Harris (1951),
but also influenced lexicographic work such as Gove’s (1961) third edition of Merriam-
Webster’s dictionary. Controversies about this dictionary, as described by Morton (1994),
reflect the clash between a section of dictionary critics and users expecting guidance from a
dictionary along the lines of (1-2) and lexicographers taking (3) as their lead. On the issue
of including ain’t in the dictionary, Philip Gove is quoted as in (4).

(4) ‘The dictionary merely recognizes a linguistic fact which cannot be disputed no matter how
objectionable: There are many areas of the United States in which cultivated speakers do use
“ain’t”[.]’ (Morton 1994: 158)

The role of the dictionary assumed in (4) is as a record of language use. Even though Gove
actually emphasizes that the entry marks the use of ain’t as ‘disapproved by many and more
common in less educated speech’ (1961: ain’t 1b), there is no sense that it should ‘be eradi-
cated’ as Johnson suggests in (2). In the view of English as determined by language use, the
lexicographer understands their role as describing rather than trying to influence language use.

2.3. Named languages and competence


The emergence of generative grammar, often called the Chomskyan Revolution (e.g.
Newmeyer 1986), was an event in the history of linguistics which affected first of all the
orientation of linguistic theorizing. Whereas the preceding school of Post-Bloomfieldian linguistics
had little to say about syntax, generative grammar made it the main focus of re-
search. However, it also changed the way language was perceived as an object of study.
This is epitomized in the well-known distinction in (5).

(5) ‘We thus make a fundamental distinction between competence (the speaker-hearer’s know-
ledge of his language) and performance (the actual use of language in concrete situations).’
(Chomsky 1965: 4)

The main purpose of Chomsky’s (1965) discussion of which (5) is a part was to explain
how his approach to language relates to the one adopted in Post-Bloomfieldian linguistics,
as for instance in Harris (1951), which takes the collection of a corpus as the first step in
linguistic research. Whereas a corpus consists of performance data, the actual object of
study according to Chomsky should be the underlying competence. An important step in
his reasoning is that competence, as intended in (5), is an empirical entity. It is realized in
the speaker’s brain, even though we do not know exactly how, and it exists independently
of any observation. A side effect of this reasoning that Chomsky only realized later is the
conclusion in (6).

(6) ‘grammars have to have a real existence, that is, there is something in your brain that corre-
sponds to the grammar. That’s got to be true. But there is nothing in the real world corre-
sponding to language.’ (Chomsky 2004 [1982]: 131)

In (6), taken from an interview, grammar is used for the system corresponding to compe-
tence in (5) in the speaker’s mind and language is used in the sense of what I call here
named language. The first discussion of the view in (6) appears in Chomsky (1980). For
English, we can observe the problem of seeing it as an empirical object when we consider
the statements in (7).

(7) a. English is the language of Britain.
    b. English is the language of Shakespeare.
    c. English is the language of science.
    d. English is an old language.

Identifying English with the language of a geographical entity as in (7a) is an abstraction.
English is not located in the trees or the rocks; it depends on the people. Not all people
in Britain speak English, but we still feel (7a) is correct. In (7b), English is identified
with the language of an author. This is in line with the attitude in (1) for French. However,
the language found in Shakespeare’s works reflects Shakespeare’s competence and does not
correspond to current speakers’ competence or performance. The statement in (7c) is per-
haps more controversial than (7a-b). This highlights the fact that there is no empirical way
of verifying or falsifying it. Finally, the historical perspective evoked in (7d) raises the ques-
tion of how to delimit what counts as English. After all, Old English texts are not intelli-
gible to speakers of present-day English unless they have special training. Corresponding
statements for Romance languages also raise the issue of how to distinguish Latin and the
early Romance vernaculars.
Obviously, English as used in (7) is not an instantiation of either competence or per-
formance. It is a norm that has been set up. As such, it is the result of a process of
classification of speakers and texts. In order to classify speakers and texts in this way, a
catalogue of languages has to be decided on first. Such decisions are not purely empirical. It
is not possible to decide on purely linguistic criteria, for instance, how many and which
Romance languages there are in the Iberian peninsula or in Italy. As far as there is a consen-
sus, this is the result of political decisions, only partly supported by linguistic evidence.

2.4. Relations between the three perspectives


In presenting the three perspectives on language, it is not my purpose to show that one is
correct and the others are not. In many cases, the assumptions about the nature of language,
both by lexicographers and by theoretical linguists, are not made explicit. They may
combine elements of different perspectives. For a pre-theoretical use of English, we do not
have to specify what the boundaries of the concept are. In lexicography, one often finds
positions that incorporate most aspects of the attitudes reflected both in (2) and in (4) with-
out pursuing either to the point where they get into conflict. This is reflected, for instance,
in ten Hacken’s (2012) critical analysis of statements in Simpson’s (2000) preface to the
OED, but also in Gilliver’s (2016) presentation of the early discussions in the Philological
Society.
It is also important to distinguish the general idea of each of the three perspectives from
particular theories. Although language use as an authority on the data was embraced by be-
haviourism, this does not mean that using a corpus as a source of data commits one to a be-
haviourist view of linguistics. Similarly, recognizing that named languages are not
empirical entities is an insight that emerged from Chomskyan linguistics, but it is not de-
pendent on it. The insight that competence can be studied scientifically is also at the basis
of many types of cognitive science, not only cognitive linguistics. A classic criticism of
Chomsky’s notion of competence such as Hymes (1971) does not attack this insight, but
only Chomsky’s delimitation of competence. The insight that a named language is a con-
struct that only comes into existence when discussions about language are subject to an au-
thority is not dependent on whether one assumes that grammatical competence, as
proposed by Chomsky, or communicative competence, as proposed by Hymes, is the right
unit for a theoretical account. The insight about the non-empirical status of named lan-
guages is also found in sociolinguistic studies and textbooks such as Piller (2016) and
Coulmas (2018).
In lexicography, a central question is what the units to be described in a dictionary are.
These units are generally referred to as words. The three perspectives discussed here present
a good framework for the discussion of the nature of words and, following from this, the
nature of new words.

3. Words in a language
A classical view of the nature of a word is Saussure’s (1916) theory of the signe. Here, I will
first present this theory’s position on the nature of a language, then relate it to corpus-based
and mentalist theories of words.

3.1. The word as a signe


In Saussure’s (1916) theory, a language is a system of signes. A signe is composed of a signi-
fié, corresponding to the meaning, and a signifiant, corresponding to the form. The nature
of the signifié and the signifiant is essentially negative in the sense that what constitutes a si-
gnifiant or a signifié is determined by its difference from other signifiés and signifiants. The
relation between the two components of a single signe is governed by the arbitraire du
signe. This means that it is not possible to predict one component on the basis of the other.
Because of the importance of the relation between signes, the signes of one language consti-
tute a system, the langue, which Saussure contrasts with the parole, the use of language.
Saussure’s treatment of the opposition between langue and parole makes it clear that he
does not subscribe to a view where the use of language constitutes the highest authority on
what a language is. The emphasis on the langue as a system may suggest a comparison of
the opposition between langue and parole with Chomsky’s opposition between competence
and performance in (5). However, for Saussure, the langue is a social system, shared by the
members of a speech community. This contrasts with Chomsky’s notion of competence,
which is individual, realized in the brain of an individual speaker.
The idea that a langue is a system of signes is closer to the perspective in section 2.1
than to the other two. This does not imply that Saussure would have subscribed to state-
ments such as (1) and (2). However, for someone defending a normative view of language
of the type reflected in these statements, Saussure’s notion of langue offers a better starting
point than the tolerance expressed by Hall (1960) in (3) or Chomsky’s (1965) notion of an
individual grammatical competence. Saussure’s langue can be constructed as an object of
regulation and protection.

3.2. Words in a corpus


When language use is taken as the nature of a language, a corpus is the preferred data type.
It is worth considering, however, how words in language use are different from words in a
corpus.
A corpus is normally a collection of written texts. Spoken language can be included but
has to be transcribed. In written language, words appear as strings of characters separated
by spaces and punctuation. For spoken language, the transcription adds word boundaries.
The information that can be computed from a corpus is the frequency of words and their
distribution, i.e. the context in which they appear.
Language use, at least as it is reflected in a corpus, generally has a communicative pur-
pose. We also use language to formulate thoughts for our own benefit, but this can normal-
ly only become available as corpus material if the result is subsequently used for
communicating these thoughts in speech or writing. In communication, the meaning of
expressions, their expressive value, and their effect on the addressee are crucial. These
aspects are not directly visible in a corpus. They exist in the minds of the speaker and the
hearer (or the writer and reader) and depend on their background knowledge. Because of
the difficulty of verifying such information, Bloomfield (1933: 21-41) intended to eliminate
any reference to such mental entities in the study of language.
One problem with such a non-mental, purely corpus-based approach to words is that it
becomes impossible to recognize errors. COCA (2008-2019) contains four occurrences of
inforamtion. That these are errors rather than occurrences of a rare word is not recogniz-
able from the corpus alone.
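The point can be illustrated with a minimal sketch in Python. It is not part of any study cited here; the toy corpus, the function names, and the tokenization rule are assumptions made purely for illustration. It computes the two kinds of information mentioned above, frequency and distribution, and shows that a misspelling simply surfaces as one more low-frequency string.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: a word is a run of letters (or apostrophes) between
    # spaces and punctuation, which is all a string-based view provides.
    return re.findall(r"[a-z']+", text.lower())


def frequency_and_contexts(text, window=1):
    """Return word frequencies and, for each word, its neighbouring tokens."""
    tokens = tokenize(text)
    freq = Counter(tokens)
    contexts = defaultdict(list)
    for i, tok in enumerate(tokens):
        left = tuple(tokens[max(0, i - window):i])
        right = tuple(tokens[i + 1:i + 1 + window])
        contexts[tok].append((left, right))
    return freq, contexts


# Hypothetical three-sentence corpus containing one typing error.
toy_corpus = (
    "The report contains information about the survey. "
    "More inforamtion is available on request. "
    "The information was checked."
)

freq, contexts = frequency_and_contexts(toy_corpus)
print(freq["information"], freq["inforamtion"])
print(contexts["inforamtion"])
```

Nothing in the counts or the contexts marks inforamtion as an error rather than a rare word. That judgement has to be supplied by someone who knows the language.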
Another problem is that which word forms are grouped together depends on interpretation.
That eat and ate, but not tea, belong to the same lexeme is a theory-dependent
classification, based on meaning. We can in principle recognize affixes in a string-based
analysis, but the recognition of meaningful affixes again has to appeal to meaning. That in-
terminable does not involve a prefix inter- or that linguistics is not the plural of linguistic
can only be determined by taking into account what these words mean.
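A small Python sketch, assuming deliberately naive string rules that are hypothetical and not drawn from any cited work, shows how a purely form-based analysis treats these examples. The prefix list and the function names are illustrative assumptions.

```python
def naive_prefix_split(word, prefixes=("inter", "un", "re")):
    # Strip the first listed prefix that matches the start of the string.
    # No meaning is consulted, so false analyses are expected.
    for prefix in prefixes:
        if word.startswith(prefix):
            return prefix, word[len(prefix):]
    return None, word


def naive_singular(word):
    # Treat any final -s as a plural marker.
    return word[:-1] if word.endswith("s") else word


def same_letters(a, b):
    # A form-only similarity measure: do the two strings use the same letters?
    return sorted(a) == sorted(b)


print(naive_prefix_split("international"))
print(naive_prefix_split("interminable"))
print(naive_singular("linguistics"))
print(same_letters("eat", "ate"), same_letters("eat", "tea"))
```

The string rules analyse interminable as inter- plus minable and linguistics as the plural of linguistic, and they rate tea as just as close to eat as ate is. Ruling out these analyses requires information about meaning that the strings do not contain.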
Therefore, recognizing words in a corpus as strings is simple in the sense that they are
given, but their analysis as meaningful words is much less straightforward. We do not find
signes in the corpus, only their signifiants. In order to make any interesting observations
about the words, we need to appeal to knowledge of language that is realized in the speaker
and hearer but not in the text.

3.3. Words in the mind
The only place where words are realized as a combination of form and meaning is in the
mind of speakers. Although this insight emerged in generative linguistics, the generative
focus on syntax directed attention away from a proper theory of the lexicon. In fact, before
Chomsky (1965: 84-88) introduced the lexicon, words were introduced by phrase structure
rules. Chomsky’s (1970) discussion of the distribution of tasks between phrase structure
rules, transformations, and the lexicon had as its main purpose to determine the scope of
the rules of syntax. The so-called lexicalist hypothesis he proposed served rather as a de-
limitation of the domain of grammar than as the basis for a theory of the realization of
words in the speaker’s mind. One property of the theory that demonstrates this particularly
well is that there is no interest in the representation of morphologically simplex words.
Only words that reflect rule application are discussed. This tradition is continued in Halle
& Marantz’s (1993) Distributed Morphology as well as in Lieber’s (2004) lexical semantic
theory.
A generative theory in which the structure of the meaning and form of words, whether
they are morphologically complex or not, and their realization in the mental lexicon
are represented is Jackendoff’s (2002) Parallel Architecture (PA). In PA, a word is a
combination of phonological, syntactic, and semantic information. This information is
represented separately, but the three representations of the same expression are linked at
the level of individual structural components where they have correlations. Each repre-
sentation has a structure. The phonological and conceptual structures are connected to
images of realizations in the outside world, represented in the mind as prototypes of the
kind proposed by Rosch (1978). The syntactic structure only has links to the phonologic-
al and the conceptual structure, not directly to the outside world, so that it is not based
on prototypes.
Compared to the Saussurean signe, the word in PA is enriched by a syntactic representa-
tion. The most prominent difference, however, is that whereas a signe is realized in the
speech community, the word in PA only exists in the mental lexicon of an individual speak-
er. Words as elements of a named language only come into existence by generalizations
about speakers. The decisions about what is a word in a language depend on how exactly
these generalizations are made. Words in a language are not empirical.

4. New words
Against the background of the three perspectives on the nature of words, we can now con-
sider the questions about new words in (8).

(8) a. What makes a word new?
    b. How should new words be evaluated?
    c. Why do new words emerge?
    d. How should new words be covered in dictionaries?

The nature of the property new in (8a) depends on the perspective chosen, which also
determines the basis for the evaluation in (8b). I will address these two questions together
in section 4.1. An understanding of the status and origin of new words starts from (8c).
In section 4.2, I will argue that naming is crucial. On the basis of these considerations, (8d)
will be discussed in section 4.3.

4.1. The nature and evaluation of new words
If we consider a language such as English as a system of Saussurean signes, we can see the
position of each word as determined by its relation to other words. The views in (1) and (2)
suggest the metaphor of English (or any other named language) as a precious stone. By
careful operations of chipping away and polishing, it can become a beautiful diamond,
but improper handling can spoil it.
A word is new when it is not yet a part of the system of signes. At this point, it does not
have any relations. The relations to be established determine its position within the system.
These relations concern both the signifiant and the signifié. Establishing such relations may
upset the system. Thus, Zoppetti (2017) recognizes as dangers of anglicisms in Italian that
these words undermine the Italian phonological system by their violation of phonotactic rules,
e.g. their ending in a consonant, as well as the morphological system, because rules such as
the ones for plural formation of nouns cannot be applied to borrowings in the usual way.
In a corpus-based view of a named language, the recognition of new words requires the
comparison of a corpus with a list of existing words. In a computer-based analysis, words not
recognized as part of the lexicon are classified as unknown words. A classic study in this re-
gard is the one reported on by Amsler (1984: 173). He compares a corpus of texts from the
New York Times News Service to a collegiate dictionary. What is noteworthy is that he does
not presuppose the amount of preprocessing that is now common, but sticks rigorously to the
evidence as it is recorded in the corpus. He finds that as much as 64% of the word types in
the corpus do not appear in the dictionary.2 These unknown words include all new words
but also many other cases. Thus, inflected forms and hyphenated forms at the end of a line
are not so much new words as poorly analysed forms of existing words. Apart from these
classes, large remaining categories of unknown words are proper names and typos. Especially
the former are of course frequent in a news corpus. They are usually not considered words of
the language. About three quarters of the unknown words could immediately be assigned to
these categories, “the remainder could not be categorized without checking their contexts”
(1984: 173). Here, Amsler refers to human categorization, i.e. classification on the basis of
the knowledge of language the reader of the corpus has. Actual new words will be in the final
category, but especially in this category, judgement by the reader is essential.
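The logic of such a comparison can be sketched in Python. This is not a reconstruction of Amsler's procedure; the toy text, the toy dictionary, the suffix list, and the category labels are all assumptions introduced for illustration. Word types absent from the dictionary are collected, a few are sorted by crude string-level cues, and the rest are left for a human reader.

```python
import re
from collections import Counter


def unknown_types(corpus_text, dictionary):
    """Return the word types (with counts) that the dictionary does not list."""
    # Case is kept because capitalization is one of the few cues in the string.
    tokens = re.findall(r"[A-Za-z]+(?:-[A-Za-z]+)*", corpus_text)
    types = Counter(tokens)
    return {t: n for t, n in types.items() if t.lower() not in dictionary}


def rough_category(word_type, dictionary):
    """Assign a crude category; anything unclear is left to human judgement."""
    lower = word_type.lower()
    for suffix in ("s", "ed", "ing"):
        if lower.endswith(suffix) and lower[:-len(suffix)] in dictionary:
            return "inflected form"
    if word_type[0].isupper():
        # Unreliable: sentence-initial words are capitalized as well.
        return "possible proper name"
    return "needs human judgement"


# Hypothetical data, chosen only to exercise each category.
toy_dictionary = {"the", "mayor", "open", "a", "new", "park", "for", "skater", "in"}
toy_text = "The mayor opened a new skatepark for skaters in Fresno."

unknown = unknown_types(toy_text, toy_dictionary)
categories = {w: rough_category(w, toy_dictionary) for w in unknown}
for word, label in categories.items():
    print(word, "->", label)
```

In this toy case, only skatepark ends up in the residual category, and the program cannot say whether it is a new word, a typo, or a nonce formation. A typo would land in the same category, which is where the judgement of a reader who knows the language becomes indispensable.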
One may object that Amsler’s (1984) experiment is very primitive and current corpus
technology yields a much more limited set of unknown words, which is much closer to the
set of new words. However, to the extent that current corpus technology improves on
Amsler’s analysis, the result reflects a theoretically based assessment of the categories of
data that can be handled automatically. They emerge from generalizations about the corpus
that draw on an interpretation based on theory. Therefore, although unknown words are a
purely automatic consequence of the analysis of a corpus, the recognition of new words
among them requires decisions that cannot be derived from corpus data without human
intervention.
Let us finally turn to new words in the mental lexicon. Here the difference between
existing and new words is that the latter require an action that is not necessary for the
former. In using an existing word, the speaker or hearer has to retrieve the word from their
mental lexicon. For a new word, a new association between a concept and a name has to
be created.
At this point, it is worth paying attention to the nature of concepts. Concept is opposed
to individual. The former are named by words, the latter by proper names. The difference
can be illustrated by considering a word such as house and a name such as Laura. There are
many women called Laura, but they are not instantiations of a concept Laura. The name
Laura only serves as a (partial) identification of an individual and must be supplemented as
necessary if the identification is not sufficiently specific. By contrast, different instantiations
of house have properties in common that justify their inclusion in the concept for which
house serves as a name.
The newness of the association between a concept and a name is always related to a
particular mental lexicon, i.e. to a particular speaker. New concepts are those for which
the speaker does not have a word. Children learning their first language need new words
very often. However, adults are confronted with new concepts as well. The need for a new
name for a concept also arises when an existing name is no longer considered adequate, for
instance, as a result of a shift in connotation or because of fashion.
To sum up, what counts as a new word depends on the perspective. If a language is seen
as a system of signes, a new word is added to the system and has to find a place as a signe in
relation to the existing signes. If the existing system is considered as a valuable asset, changes
induced by adding new words have the potential to upset the system. If a language is deter-
mined by use, unknown words are simple facts recorded in a corpus, but the distinction be-
tween errors, proper names, nonce-words and actual neologisms can only be decided by an
informed speaker of the language. If the lexicon of a language is a speaker’s mental lexicon,
new words are the result of naming. Naming occurs whenever the speaker encounters a new
concept and creates a word for it in their lexicon. It also takes place when an existing name is
no longer deemed adequate. The evaluation of new words is rather negative in the first per-
spective, because they are dangerous for the system. In the other two perspectives, the evalu-
ation is neutral, as a matter of fact, or even positive, as the solution to a naming problem.

4.2. Naming
As naming is crucial in the formation and acceptance of new words, I will now briefly
present the main naming procedures. The starting point for naming is a concept. Let us
therefore start with a concept and take as an example Fig. 1.

Figure 1: Example of a skateboard3

A first observation about Fig. 1 is the distinction between an individual object and
a concept. Fig. 1 represents an individual object, but it is used here as an instance of a con-
cept. The concept retains only those properties of the instance that are deemed important.
Thus, in this case, properties such as the colours are abstracted away. What is named is the
concept, not the instance. Fig. 1 represents a skateboard. The name skateboard is
a compound. Compounding is a word formation process that can be described by a rule.
However, the meaning of a compound is only partly determined by the rule. Thus, the rule
states that a compound consists of two components, but does not specify the semantic rela-
tion between the two components. The concept was there before the compound was
formed, so that the meaning is established before the name is chosen. This choice may be
influenced by other words. Thus, OED (2019 [1986]: skateboard n.) mentions the analogy
to surfboard.
The earliest attestation of skateboard in OED (2019 [1986]) is from 1964. For such
recent dates, it is likely that the word has been caught on record fairly soon after its first
use. However, it is not just the first speaker to coin the word who uses a word formation
rule. Each time another speaker creates an entry for skateboard in their mental
lexicon on the basis of the relation to skate and board, they use the relevant word formation
rule. For most speakers, the trigger for this process will be that they come across (an
instance of) the concept together with the name. All earliest uses of skateboard are from
California. Speakers in Britain are likely to have encountered the word and the concept
as an import from the US. Nevertheless, for British speakers too, the word is the result
of word formation, not a borrowing, because they analyse it in terms of their own language.
OED (2019 [1986]) gives the structure as involving the noun skate. An alternative analysis
is that the first component is the verb skate. In this case, OED seems to follow the policy of giving
the oldest attested word of the same form and relevant meaning.4 In fact, the actual
origin may be different for different speakers.
In other languages, the concept in Fig. 1 is also called skateboard. For French, Robert
(1986) gives it as skate-board with a first attestation date of 1977 and a short form of skate.
For Italian, DISC (1997) gives skateboard with a first attestation of 1978. In French and
Italian, the word is a borrowing, because the components are not words of French
or Italian. Indeed, the French short form skate in the sense of ‘skateboard’ is only possible
because skate is not used otherwise in French. Although French and Italian speakers may
well be aware of the compound status of the word in English, this does not directly affect
the status in French or Italian.

Figure 2: Truck of a skateboard5

Another concept is illustrated in Fig. 2. This is the component of a skateboard that connects
the board to the wheels. OED (2019 [1915]: truck, n.2) does not mention it, because
the entry is older than the sense of truck related to skateboards. In the original Californian
context, the choice of the name is a case of sense extension. However, the relevant sense in
the OED only has a reference to the entry for bogie, because truck in this sense is the
American equivalent of British bogie. This means that in British English, truck in the sense
of Fig. 2 is a borrowing, because it is a word from a different system.
Taken together, the concepts shown in Fig. 1 and Fig. 2 illustrate three naming strategies.
Word formation, as in skateboard, creates a new word form on the basis of a rule and one
or more existing words. Sense extension, as in American truck, takes an existing word to
use it in a new sense. Borrowing, as in British truck and French and Italian skateboard,
takes a word from a different system.
From a communicative point of view, word formation has the advantage of signalling,
by introducing a new word, that the word is meant to name a new concept. Moreover,
the rule gives an indication of the likely meaning of the word. ten Hacken (2013, 2019)
gives a more detailed discussion of these properties in the context of PA. Arguably,
sense extension provides less support to the addressee, because an existing word is used.
Sense extension adds to ambiguity and does not signal directly that a new sense is
intended. In the case of borrowing, the signal that we are dealing with a new concept is at
least as strong as in word formation, but the retrieval of the meaning depends on know-
ledge of another language. In all cases, of course, the context is an important source of
information.
From the perspective of a language as a system, sense extension is the least disruptive
operation, as it does not introduce a new word. In this perspective, word formation is less
problematic than borrowing, because the word formation rule and the base to which it
applies give more information on how to fit the new word into the existing system.

4.3. New words in dictionaries
In lexicography, the attitude to new words varies. Tolerance of their inclusion in a
dictionary depends in part on the conception of the language that is adopted. Two explicit
and opposite views are represented by Markowski (2010) and CED (1986).
Markowski (2010) is a Polish monolingual dictionary representing a normative view of
language. Already in the title, Wielki słownik poprawnej polszczyzny (‘Great dictionary of
correct Polish’), the assumption is made that it is possible and appropriate for a dictionary
to encode what is correct usage in a language. One aspect of this attitude is the extensive
discussion of hasła problemowe (‘problematic items’) in a section at the end of the dictionary
(2010: 1546-1704). We find here, for instance, a discussion of anglicyzmy, kultura
języka (‘language culture’), neologizmy, and norma językowa (‘linguistic norm’). This
reflects an attitude that is compatible with the one in (1) and (2). In such a dictionary, one
expects that a new word is entered only when it is admitted into the language. Inclusion
implies a degree of endorsement.
CED (1986) makes it quite explicit that a different approach is taken. In the foreword to
the second edition, McLeod (1986) attributes the success of the first edition to its being “a
dictionary based on a fresh survey of the contemporary language as it was actually being
used in both its written and its spoken forms”. This is in line with the attitude adopted in
(3) and (4). In such a dictionary, one expects that words in actual use are covered without
any further conditions. Inclusion is based on corpus evidence.
An analysis of the nature of a dictionary from the perspective of a mentalist view of the
lexicon is proposed by ten Hacken (2009). As argued in this analysis, a dictionary cannot
describe a named language as an empirical entity. The language as a norm is not empirical.
The corpus is empirical, but it requires the use of language competence to interpret the
forms and recognize errors. The language competence is empirical, but it is individual and
cannot be equated with a named language. There is no appropriate entity that is at the
same time empirical and can be described in a dictionary as a named language. Instead, ten
Hacken (2009) argues that a dictionary should be interpreted as a tool. It provides the user
with information so that they can solve linguistic problems. This perspective of a dictionary
as a tool to be used in problem solving is in line with the view proposed by Tarp (2008).
For the inclusion of new words, the mentalist view implies that it cannot be verified em-
pirically whether a word exists in the language. Lexicographers have a pre-eminent role in
determining whether a word should be included in the dictionary. As the public
generally expects the dictionary to tell them whether a word exists, the lexicographer has to
take an informed decision. This is illustrated in the discussion of ain’t in section 2.2.
Lexicographers tend to shift their responsibility to the corpus. However, although a corpus
may provide evidence, it does not demonstrate the existence or non-existence of a word.
The lexicographer’s informed interpretation of the evidence is crucial in reaching a deci-
sion. This interpretation derives from the lexicographer’s linguistic competence and from
their assessment of the dictionary users’ linguistic competence. The lexicographer’s skill is
then to interpret the corpus material and produce an entry that helps the user.

5. Conclusion
In this paper, I addressed the question of the basis on which new words are included in dictionaries.
In order to answer this question, I presented three conceptions of the notion of named
language and showed what they assume about the nature of word and of new word.
If named languages are thought of as entities that need to be cared for and protected, words
can be viewed as connected in a system of Saussurean signes. New words are then a poten-
tial danger, because the way they are introduced may undermine the system. If named
languages are determined by language use, words are units of communication whose form
is reflected in a corpus. New words are just a fact of life. If named languages are seen as the
result of meta-linguistic consideration without underlying empirical entities, words are only
fully realized in a speaker’s competence. New words are new for individual speakers.
One of the central questions of any lexicographic project is the selection of words to be
included in the dictionary. In their approach to this question, lexicographers do not need to
choose a single perspective and pursue it radically in all cases. In fact, they typically take
into account corpus data as they find them, traditional decisions and sensitivities, and their
own and other speakers’ linguistic competence in deciding whether to include a particular
new word in their dictionary or not. Gilliver (2016: 128-130) gives some examples of the
OED’s struggle with comprehensiveness in the early period. The availability of data in the
form of corpora has changed notably since then, but the issue of comprehensiveness
remains the same in kind. The lexicographic policy of a particular dictionary can be
interpreted as guidance in how to prioritize corpus data, tradition, and the lexicographer’s
language competence in specific decisions on inclusion.
Many dictionary users expect the dictionary to guide their language use. For new words,
dictionaries then have a function as a gatekeeper. This perspective is compatible with the
view of a named language as a system of signes. Many lexicographers see their work as
recording language use. The idea that a named language is actually a construct, based on
linguistic competence and performance but not corresponding to an empirical entity in the
real world, can serve as a meeting ground for these two views. It creates a basis for under-
standing the degree of compatibility and conflict. Both dictionary users and lexicographers
use their linguistic competence in their handling of dictionaries. However, as this is such an
intuitive, automatic action, it often happens unconsciously. The idea that the user has to in-
terpret the dictionary entry and that the lexicographer has to interpret the corpus evidence
is only noted if there is a problem or conflict. The insight that a named language is a con-
struct without empirical realization does not reduce the specific skills involved in compiling
and using dictionaries. In fact, the recognition that corpus evidence hugely underdetermines
lexicographic decisions arguably upgrades the value of the lexicographer’s skills.

Notes
1. Author’s translation: ‘The main mission of the Academy is to work with all possible
care and diligence in order to set up definite rules for our language and to make it pure,
eloquent, and capable of treating the arts and sciences.’
2. Amsler (1984: 173) gives the percentages out of the total sets of word types appearing
in either the corpus or the dictionary. I recalculated the value on the basis of these pro-
portions. For the categories of unknown words, Amsler gives the following propor-
tions: “approximately one-fourth were inflected forms, another one-fourth were
proper nouns, one-sixth were hyphenated forms, one-twelfth were misspellings”
(1984: 173).
3. Source: “Global_Skateboard”, by Cekay - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=68933279
4. The relevant sense of the noun (skate n.2) has a first attestation of 1662, whereas the
verb has a first attestation date of 1696.
5. Source: “Skateboard-truck”, by User:Tp - Own work, CC BY 1.0, https://commons.wikimedia.org/w/index.php?curid=161588

References

A. Dictionaries
CED. 1986. Collins Dictionary of the English Language (Second edition, Hanks, P. (ed.)).
Glasgow: Collins.
DISC. 1997. Dizionario italiano Sabatini Coletti. Sabatini, F. and Coletti, V. (eds). Firenze: Giunti.
Gove, P. B. (ed.). 1961. Webster’s Third New International Dictionary of the English Language,
Unabridged. Cambridge (Mass.): Riverside Press.
Markowski, A. (ed.). 2010. Wielki słownik poprawnej polszczyzny. Warszawa: PWN.
Robert, P. (ed.). 1986. Dictionnaire alphabétique et analogique de la langue française (Second edi-
tion by A. Rey). Paris: Le Robert.
OED. 2000-2019. Oxford English Dictionary (Third edition, Simpson, J. (ed.)). Oxford: Oxford
University Press. www.oed.com

B. Other literature
Académie française. 1635. ‘Statuts et règlements de l’Académie française, 22 février 1635’. In
Académie française (ed.) Statuts et règlements, 7–26.
Amsler, R. A. 1984. ‘Machine-Readable Dictionaries.’ Annual Review of Information Science and
Technology 19: 161–209.
Bloomfield, L. 1933. Language. London: Allen & Unwin.
Chomsky, N. 1965. Aspects of the Theory of Syntax. Cambridge (Mass.): MIT Press.
Chomsky, N. 1970. ‘Remarks on Nominalization’. In Jacobs, R. A. and P. S. Rosenbaum (eds),
Readings in English Transformational Grammar. Waltham (Mass.): Ginn, 184–221.
Chomsky, N. 1980. Rules and Representations. New York: Columbia University Press.
Chomsky, N. 2004 [1982]. The Generative Enterprise Revisited. Berlin: Mouton de Gruyter.
COCA 2008-2019. The Corpus of Contemporary American English. Davies M. (ed.). Accessed on
25 May 2019. http://corpus.byu.edu/coca/.
Coulmas, F. 2018. An Introduction to Multilingualism: Language in a Changing World. Oxford:
Oxford University Press.
Gilliver, P. 2016. The Making of the Oxford English Dictionary. Oxford: Oxford University
Press.
ten Hacken, P. 2009. ‘What is a Dictionary? A View from Chomskyan Linguistics.’ International
Journal of Lexicography 22: 399–421.
ten Hacken, P. 2012. ‘In What Sense is the OED the Definitive Record of the English Language?’
In Fjeld, R. V. and J. M. Torjusen (eds), Proceedings of the Fifteenth EURALEX International
Congress, EURALEX 2012, Oslo, August 7-11, 2012. Oslo: Department of Linguistics and
Scandinavian Studies: University of Oslo, 834–845.
ten Hacken, P. 2013. ‘Semiproductivity and the Place of Word Formation in Grammar’. In ten
Hacken, P. and C. Thomas (eds), The Semantics of Word Formation and Lexicalization.
Edinburgh: Edinburgh University Press, 28–44.
ten Hacken, P. 2019. Word Formation in Parallel Architecture. Berlin: Springer.
Hall, R. A. 1960. Linguistics and Your Language. Garden City (NY): Doubleday.
Halle, M. and A. Marantz. 1993. ‘Distributed Morphology and the Pieces of Inflection’. In Hale,
K. and S. J. Keyser (eds.), The View from Building 20: Essays in Linguistics in Honor of Sylvain
Bromberger. Cambridge (Mass.): MIT Press, 111–176.
Harris, Z. S. 1951. Methods in Structural Linguistics. Chicago: University of Chicago Press.
Reprinted as Structural Linguistics, 1960.
Hymes, D. 1971. ‘Competence and Performance in Linguistic Theory’. In Huxley, R. and Ingram,
E. (eds), Language Acquisition: Models and Methods. London: Academic Press, 3–24.
Jackendoff, R. 2002. Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford:
Oxford University Press.
Johnson, S. 2008 [1747]. The Plan for a Dictionary of the English Language. London: Knapton &
Knapton. Reprinted in Fontenelle, T. (ed.). 2008. Practical Lexicography: A Reader. Oxford:
Oxford University Press, 19–30.
Lieber, R. 2004. Morphology and Lexical Semantics. Cambridge: Cambridge University Press.
McLeod, W. T. 1986. ‘Foreword’. In Hanks, P. (ed.), Collins Dictionary of the English Language
(Second edition.) London/Glasgow: Collins, vii.
Morton, H. C. 1994. The Story of Webster’s Third: Philip Gove’s Controversial Dictionary and Its
Critics. Cambridge: Cambridge University Press.
Newmeyer, F. J. 1986. ‘Has There Been a ‘Chomskyan Revolution’ in Linguistics?’ Language 62:
1–18.
Piller, I. 2016. Linguistic Diversity and Social Justice: An Introduction to Applied Sociolinguistics.
New York: Oxford University Press.
Pruvost, J. 2002. Les Dictionnaires de langue française. Paris: Presses Universitaires de France.
Rosch, E. 1978. ‘Principles of Categorization’. In Rosch, E. and B. B. Lloyd (eds), Cognition and
Categorization. Hillsdale (NJ): Lawrence Erlbaum, 27–48.
Saussure, F. de 1916. Cours de linguistique générale. Bally C. and A. Sechehaye (eds), Édition cri-
tique préparée par Tullio de Mauro. Paris: Payot, 1981.
Simpson, J. 2000. ‘Preface to the Third Edition of the OED.’ Oxford: Oxford University Press.
Downloaded on 28 October 2011. http://www.oed.com/public/oed3preface/preface-to-the-
third-edition-of-the-oed.
Tarp, S. 2008. Lexicography in the Borderland between Knowledge and Non-Knowledge:
General Lexicographical Theory with Particular Focus on Learner’s Lexicography. Tübingen:
Niemeyer.
Zoppetti, A. 2017. Diciamolo in italiano: Gli abusi dell’inglese nel lessico dell’Italia e incolla.
Milano: Hoepli.
