Computational Morphology
Abstract
1 Introduction
Natural languages have intricate systems to create words and word forms
from smaller units in a systematic way. The part of linguistics dealing with
these phenomena is morphology. This chapter starts with a quick overview
of this fascinating field. It continues with applications of computational
morphology. The rest is devoted to processing techniques. Computational
morphology has evolved from very modest beginnings using full-form
lexica or some ad-hoc concatenation techniques to the much more powerful
tools available today. The chapter concludes with a number of examples of
encoding morphological phenomena from different languages using these
tools.
2 Linguistic fundamentals
A basic distinction is the one between bound and free morphs. A free morph
may form a word on its own, e.g., the morph door. We call such words
monomorphemic because they consist of a single morph. Bound morphs, on
the other hand, occur only in combination with other forms. All affixes are
bound morphs. For example, the word doors consists of the free
morph door and the bound morph -s. Words may also consist of free morphs
only, e.g., tearoom, or bound morphs only, e.g., aggression.
Surprisingly, there is no easy answer to the question of what a word is.
One can easily spot "words" in a text because they are separated from each
other by blanks or punctuation. However, if you record ordinary speech you
will find that there are no breaks between words. Still, we can isolate
units which occur over and over again in speech, but in different
combinations. So the notion of "word" makes sense. But how do we define it?
Base forms in English are at the same time always word forms in their own
right, e.g., the base form degrade is also present tense, active voice,
non-3rd-person singular. In other languages we find a slightly different
situation. In Italian, nouns are marked for gender and number. Different
affixes are used to signal masculine and feminine on the one hand and
singular and plural on the other hand.
            SINGULAR    PLURAL
MASCULINE   pomodoro    pomodori    tomato
FEMININE    cipolla     cipolle     onion
Neither of the two forms of a noun can function as the base form. Instead,
we must assume that the base form is what is left over after removing the
respective suffixes, i.e., pomodor- and cipoll-. Such base forms that cannot
occur as word forms in their pure form are called stems.
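The stem-plus-suffix analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `STEMS`, `SUFFIXES` and `inflect` are hypothetical, and the suffix table covers just the two noun classes shown above, not Italian noun inflection in general.

```python
# Hypothetical mini-lexicon: each stem is stored with its gender.
# Neither stem is a word form on its own.
STEMS = {"pomodor": "masc", "cipoll": "fem"}

# Suffixes that jointly signal gender and number for the two
# noun classes in the table above (other classes are ignored here).
SUFFIXES = {
    ("masc", "sg"): "o",
    ("masc", "pl"): "i",
    ("fem", "sg"): "a",
    ("fem", "pl"): "e",
}


def inflect(stem: str, number: str) -> str:
    """Build a word form from a stem and a number value ('sg' or 'pl')."""
    gender = STEMS[stem]
    return stem + SUFFIXES[(gender, number)]


if __name__ == "__main__":
    for stem in STEMS:
        print(stem + "-", inflect(stem, "sg"), inflect(stem, "pl"))
```

Storing the gender with the stem rather than with the suffix reflects the fact that gender is a lexical property of the noun, while number is chosen freely.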
2.2.1 Inflection
             PRESENT                             PAST
             INDICATIVE        SUBJUNCTIVE       INDICATIVE        SUBJUNCTIVE
             SINGULAR PLURAL   SINGULAR PLURAL   SINGULAR PLURAL   SINGULAR PLURAL
1st PERSON   lese     lesen    lese     lesen    las      lasen    läse     läsen
2nd PERSON   liest    lest     lesest   leset    last     last     läsest   läset
3rd PERSON   liest    lesen    lese     lesen    las      lasen    läse     läsen

Participle   lesend                              gelesen
Imperative   lies     lest
Infinitive   lesen
Derivation can be applied recursively, i.e., words that are already the
product of derivation can undergo the process again. That way a potentially
infinite number of words can be produced. Take, for example, the following
chain of derivations:
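The recursive character of derivation can be illustrated with a minimal sketch. The three rules and the starting word below are our own assumptions chosen for illustration, not taken from the chapter, and the spelling adjustment (dropping a final e before a vowel-initial suffix) is a simplification.

```python
# Hypothetical derivational rules: (input category, suffix, output category).
RULES = [
    ("N", "al", "A"),     # nation -> national
    ("A", "ize", "V"),    # national -> nationalize
    ("V", "ation", "N"),  # nationalize -> nationalization
]


def derive(word: str, category: str, steps: int) -> list:
    """Apply matching rules repeatedly and return the chain of derivations."""
    chain = [(word, category)]
    for _ in range(steps):
        for in_cat, suffix, out_cat in RULES:
            if in_cat == category:
                # Simplified spelling rule: drop final "e" before a vowel.
                if word.endswith("e") and suffix[0] in "aeiou":
                    word = word[:-1]
                word, category = word + suffix, out_cat
                chain.append((word, category))
                break
    return chain


if __name__ == "__main__":
    for form, cat in derive("nation", "N", 3):
        print(cat, form)
```

Because the output category of the last rule is again a noun, the rules can be applied over and over, so the number of steps is bounded only by the `steps` argument and not by the rule set itself.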
Every word form must at the core contain some root form. This root can
(must) then be complemented with additional morphs. How are morphs
realized? Obviously, a morph must somehow be recognizable in the
phonetic or orthographic pattern constituting the word. The most common
type of morph is a continuous sequence of phonemes. All roots and affixes
are of this form. A complex word can then be analyzed as a series of morphs
concatenated together. Agglutinative languages function almost exclusively
this way. But there are surprisingly many other possibilities.
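For purely concatenative morphology, analysis amounts to splitting a word into a sequence of known morphs. The following is a naive sketch, assuming a tiny hypothetical morph inventory (`MORPHS`); it ignores morph order, categories and spelling changes, all of which a real analyzer has to handle.

```python
# Hypothetical morph inventory (free and bound morphs mixed together).
MORPHS = {"door", "tea", "room", "s", "un", "happy"}


def segment(word: str) -> list:
    """Return every way of splitting `word` into morphs from MORPHS."""
    if not word:
        return [[]]  # exactly one way to segment the empty string
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in MORPHS:
            # Keep the head and segment the remainder recursively.
            for rest in segment(word[i:]):
                results.append([head] + rest)
    return results


if __name__ == "__main__":
    for w in ("doors", "tearoom", "unhappy", "xyz"):
        print(w, segment(w))
```

With this inventory, `segment("doors")` yields `[["door", "s"]]`, and a string with no analysis yields an empty list. The non-concatenative processes discussed next are exactly the cases such a splitter cannot capture.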
2.3.1 Affixation
common uncommon
A suffix is an affix that is attached after a stem. Take, e.g., the English
plural marker -s:
shoe shoes
This term subsumes processes which neither introduce new segments nor remove
existing ones. Morphs are realized not as a string of phonemes, but as
a change of phonetic properties or an alteration of the prosodic shape.
Umlaut has its origin in a phonological process, whereby root vowels were
assimilated to a high-front suffix vowel. When this suffix vowel was lost
later on, the change in the root vowel became the sole remaining mark of the
morphological feature originally signalled by the suffix.
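Since umlaut changes a segment instead of adding one, it cannot be modelled as simple concatenation. The sketch below treats it as a rewrite of a vowel inside the stem. The German examples and the function name `apply_umlaut` are our own assumptions for illustration, and picking the last a, o or u (lowercase only) is a simplification of "the root vowel".

```python
# Orthographic umlaut alternations in German (lowercase vowels only).
UMLAUT = {"a": "ä", "o": "ö", "u": "ü"}


def apply_umlaut(stem: str) -> str:
    """Front the last umlautable vowel of `stem`; return it unchanged if none."""
    for i in range(len(stem) - 1, -1, -1):
        ch = stem[i]
        if ch in UMLAUT:
            # In the diphthong "au" the first element is written with umlaut.
            if ch == "u" and i > 0 and stem[i - 1] == "a":
                i, ch = i - 1, "a"
            return stem[:i] + UMLAUT[ch] + stem[i + 1:]
    return stem


if __name__ == "__main__":
    for noun in ("Mutter", "Vater", "Haus"):
        print(noun, "->", apply_umlaut(noun))
```

In Mutter/Mütter and Vater/Väter the vowel change is the only mark of the plural; for Haus the function returns the umlauted stem Häus, to which the plural suffix -er would still have to be attached.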
NOUN      VERB
éxport    expórt
récord    recórd
cónvict   convíct
2.3.4 Suppletion
(2) *pseudohospitalationize
A property of affixes is that they attach to specific categories only. This
is an example of a syntactic restriction. Restrictions may also be of a
phonological, semantic or purely lexical nature. A semantic restriction on
the English adjectival prefix un- prevents its attachment to an adjective
that already has a negative meaning:
unhappy *unsad
unhealthy *unill
unclean *undirty
The fact that in English some suffixes may only attach to words of Latin
origin (cf. 2.2.2) is an example of a lexical restriction.
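Restrictions of this kind can be thought of as conditions that are checked before an affix is attached. The following sketch does this for the prefix un- with a toy lexicon. The lexicon entries, the `negative` feature and the function names are hypothetical; how a real system would encode such a semantic feature is left open here.

```python
# Hypothetical toy lexicon: category plus a flag for negative meaning.
LEXICON = {
    "happy":   {"cat": "A", "negative": False},
    "sad":     {"cat": "A", "negative": True},
    "healthy": {"cat": "A", "negative": False},
    "ill":     {"cat": "A", "negative": True},
    "clean":   {"cat": "A", "negative": False},
    "dirty":   {"cat": "A", "negative": True},
    "door":    {"cat": "N", "negative": False},
}


def can_take_un(word: str) -> bool:
    """Check the restrictions on adjectival un- for a lexicon entry."""
    entry = LEXICON.get(word)
    if entry is None:
        return False
    if entry["cat"] != "A":        # syntactic restriction: adjectives only
        return False
    return not entry["negative"]   # semantic restriction: no negative bases


def prefix_un(word: str):
    """Return the un- form, or None if a restriction blocks it."""
    return "un" + word if can_take_un(word) else None


if __name__ == "__main__":
    for w in LEXICON:
        print(w, "->", prefix_un(w) or "*un" + w)
```

Keeping the syntactic and the semantic check as separate conditions makes it easy to add further restriction types, such as a lexical feature for Latinate bases, without touching the attachment step itself.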
great greater
tall taller
happy happier
competent *competenter
elegant *eleganter