Chapter #2 Morphological Analysis


Morphological Analysis COSC 6405

2018/19 Sem I
ስነ ምእላድ
ምእላድ ልጅነት

አምድ

ስር

የቃል ክፍል
ልጅ ልጅነት
ቤት ቤቶች [ቤ(ትኦ)ች]
ቤት ከቤት

ነት ልጅነት
ኦች ቤቶች [ቤ(ትኦ)ች]
ከ ከቤት
ስብር ሰበር
ስብር ተሰበረች

-ና- ትናን ትን
አል…ኧም አልሰበረም [አልሰበ(ርኧ)ም]
Positive      Comparative   Superlative
good          better        best
bad           worse         worst
little        less          least
much, many    more          most
Verbal Root (Examples)   Pattern of Derivation   Derived Noun
ጥ-ቅ-ም                    CእCእC                   ጥእቅእም [ጥቅም]
ም-ር-ት                    CእCC                    ምእርት [ምርት]
ም-ል-ስ                    CኧCC                    ምኧልስ [መልስ]
ን-ግ-ር                    CኧCኧC                   ንኧግኧር [ነገር]
ድ-ክ-ም                    CእCኣC                   ድእክኣም [ድካም]
ህ-ም-ም                    CእCኧC                   ህእምኧም [ህመም]
ግ-ብ-ዕ                    CእC                     ግእብ [ግብ]
ጥ-ው-ም                    CኦC                     ጥኦም [ጦም]
ቅ-ው-ር-ጥ                  CኡCC                    ቅኡርጥ [ቁርጥ]
ድ-ብብ-ቅ                   CእC1C1እC                ድእብብእቅ [ድብቅ]

Adjective (Examples)   Morpheme   Derived Noun
ደግ                     -ነት        ደግ-ነት [ደግነት]
ቅርብ                    -ኧት        ቅርብ-ኧት [ቅርበት]
ብልህ                    -ኣት        ብልህ-ኣት [ብልሃት]
ብልጥ                    -ኦ         ብልጥ-ኦ [ብልጦ]
Stem (Examples)   Morpheme   Derived Noun
ውርድ-              -ኧት        ውርድ-ኧት [ውርደት]
ቅዳስ-              -ኤ         ቅዳስ-ኤ [ቅዳሴ]
እርጅ-              -እና        እርጅ-እና [እርጅና]
ልም-               -ኣት        ልም-ኣት [ልማት]
ስርቅ-              -ኦ         ስርቅ-ኦ [ስርቆ]
ችል-               -ኦታ        ችል-ኦታ [ችሎታ]
ውጥ-               -ኤት        ውጥ-ኤት [ውጤት]
ፍለግ-              -ኣ         ፍለግ-ኣ [ፍለጋ]
ናፍቅ-              -ኦት        ናፍቅ-ኦት [ናፍቆት]
ድርግ-              -ኢት        ድርግ-ኢት [ድርጊት]
ሰባክ-              -ኢ         ሰባክ-ኢ [ሰባኪ]
ዝርፍ-              -ኢያ        ዝርፍ-ኢያ [ዝርፊያ]
ጠቀም-              -ኤታ        ጠቀም-ኤታ [ጠቀሜታ]
-ሄድ               መ-         መ-ሄድ [መሄድ]


Stem-like Verb (Examples)   Morpheme   Derived Noun
ዝም-                         -ታ         ዝም-ታ [ዝምታ]
ደስ-                         -ታ         ደስ-ታ [ደስታ]
Noun (Examples)   Morpheme   Derived Noun
ልጅ                -ነት        ልጅ-ነት [ልጅነት]
እግር               -ኧኛ        እግር-ኧኛ [እግረኛ]
ክብር               -ኧት        ክብር-ኧት [ክብረት]
ከተማ               -ኤ         ከተማ-ኤ [ከተሜ]
ጢም                -ኦ         ጢም-ኦ [ጢሞ]
ኢትዮጵያ            -ኣዊ        ኢትዮጵያ-ኣዊ [ኢትዮጵያዊ]
እንግሊዝ             -ኛ         እንግሊዝ-ኛ [እንግሊዝኛ]

ኧ and ኦ
Classes of Compound Words         Example                  Derived Noun
Noun + Noun                       ብረት + ምጣድ               ብረት ምጣድ
Noun + [ኧ] + Noun                 ቤት + [ኧ] + መንግስት        ቤተ መንግስት
Noun + Verbal Stem                ልብ + ወለድ-               ልብ ወለድ
Verbal Stem + [ኦ] + Verbal Stem   ሰርት- + [ኦ] + አደር-       ሰርቶ አደር
Verbal Stem + [ኦ] + Noun          ሰርት- + [ኦ] + አዳሪ        ሰርቶ አዳሪ
Amharic nouns can be marked for:
i. Number by affixation of morphemes (and vowel changes) or repetition of words
[Table: plural formation of nouns. Columns: Noun in Singular Form (Examples), Description of the Noun, Morpheme, Plural Form. Rows: ending with consonant; ending with vowel; personal pronoun; proper noun; plural formation by repetition; loanwords from Geez (do not have similar patterns for plural formation).]

ii. Definiteness by affixation of morphemes or vowels based on number, gender, and/or ending
of the noun.
[Table: definite forms of nouns. Columns: Indefinite Noun (Examples), Ending of the Noun, Number, Gender, Definite Noun. Rows, once for nouns ending with a consonant and once for nouns ending with a vowel: singular feminine (two alternative forms); singular masculine; plural.]
iii. Gender by affixation of the morpheme - , e.g. --> - [ ]
iv. Case
(a) Objective case by affixation of the morpheme - , e.g. (subjective case) --> - [ ]
(b) Possessive case by affixation of morphemes or vowels based on person, number, gender,
and/or ending of the noun (personal pronouns by prefixing -, e.g. --> - [ / ])
[Table: possessive forms of nouns. Columns: Subjective Case (Examples), Ending of the Noun, Person, Number, Gender, Possessive Case. Rows, once for nouns ending with a consonant and once for nouns ending with a vowel: first person singular and plural; second person singular masculine, singular feminine, and plural; third person singular masculine, singular feminine, and plural.]
Amharic adjectives can be derived from:
i. Verbal Roots by infixing vowels between consonants (C) as shown below
Verbal Root (Examples)   Pattern of Derivation   Derived Adjective
ድ-ር-ቅ                    CኧCኧC                   ድኧርኧቅ [ደረቅ]
ጥ-ቅ-ር                    CእCኡC                   ጥእቅኡር [ጥቁር]
ጥ-ብ-ብ                    CኧC1C1ኢC                ጥኧብኢብ [ጠቢብ]
ፍ-ጥ-ን                    CኧC1C1ኣC                ፍኧጥኣን [ፈጣን]
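The root-and-pattern derivation above can be sketched in a few lines of Python. This is a minimal illustration, not a full analyzer: the function names `interdigitate` and `to_surface` are hypothetical, the vowels are written as ኧ, ኡ, ኢ, ኣ, ኤ, እ, ኦ, and the surface step assumes that each consonant is given in its sixth-order form and that Unicode lays out the seven orders of a consonant in consecutive code points. Patterns with gemination (C1C1) are not handled.

```python
# Vowel symbol -> order offset inside a consonant's Unicode block (assumed layout).
VOWEL_ORDER = {"ኧ": 0, "ኡ": 1, "ኢ": 2, "ኣ": 3, "ኤ": 4, "እ": 5, "ኦ": 6}


def interdigitate(root, pattern):
    """Fill each C slot of the pattern with the next root consonant."""
    consonants = iter(root.split("-"))
    return "".join(next(consonants) if ch == "C" else ch for ch in pattern)


def is_sixth_order(ch):
    """True for Ethiopic syllables in the sixth (vowelless) order."""
    return 0x1200 <= ord(ch) <= 0x135A and ord(ch) % 8 == 5


def to_surface(abstract):
    """Fuse each vowel into the preceding sixth-order consonant."""
    out = []
    for ch in abstract:
        if ch in VOWEL_ORDER and out and is_sixth_order(out[-1]):
            base = ord(out[-1]) - 5          # first-order form of the consonant
            out[-1] = chr(base + VOWEL_ORDER[ch])
        elif ch != "-":                      # drop morpheme boundaries
            out.append(ch)
    return "".join(out)


abstract = interdigitate("ድ-ር-ቅ", "CኧCኧC")
print(abstract, to_surface(abstract))
```

The same `to_surface` step also covers simple suffixation after a consonant-final noun, for example ነገር-ኧኛ or ህዝብ-ኣዊ; nouns that end in a vowel need extra rules that this sketch leaves out.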

ii. Nouns by suffixing bound morphemes


Noun (Examples) Morpheme Derived Adjective
ነገር -ኧኛ ነገር-ኧኛ [ነገረኛ]
ተራራ -ኣማ ተራራ-ኣማ [ተራራማ]
ፈርስ -ኣም ፈርስ-ኣም [ፈርሳም]
ህዝብ -ኣዊ ህዝብ-ኣዊ [ህዝባዊ]

iii. Stems by suffixing bound morphemes


Stems (Examples)   Morpheme   Derived Adjective
ደካም-               -ኣ         ደካም-ኣ [ደካማ]
ንቅ-                -ኡ         ንቅ-ኡ [ንቁ]
በል-                -ኢታ        በል-ኢታ [በሊታ]

iv. Compound Words of nouns and adjectives by affixing the vowel -ኧ


e.g. ሆድ ሰፊ --> ሆድ-ኧ ሰፊ [ሆደ ሰፊ]
Amharic adjectives can be marked for:
i. Number by affixation of morphemes or repetition of consonants (and affixing the vowel - )
[Table: plural formation of adjectives. Columns: Adjective in Singular Form (Examples), Description of the Adjective, Morpheme, Plural Form. Rows: ending with consonant; ending with vowel; plural formation by repetition of consonant.]
ii. Definiteness by affixation of morphemes or vowels based on number, gender, and/or ending
of the adjective.
[Table: definite forms of adjectives. Columns: Indefinite Adjective (Examples), Ending of the Adjective, Number, Gender, Definite Adjective. Rows, once for adjectives ending with a consonant and once for adjectives ending with a vowel: singular feminine (two alternative forms); singular masculine; plural.]
iii. Gender by affixation of the morpheme - , e.g. --> - [ ]
iv. Case (Objective Case) by affixation of the morpheme - , e.g. --> - [ ]
Amharic verbal stems (from which various forms of verbs are formed) can be derived from:
i. Verbal Roots by
(a) affixing the vowel - - to produce C C1C1 C-, e.g. - - --> -[ -]
(b) repeating penultimate consonants and affixing the vowels - - and - - to produce
C C1 C1C1 C-, e.g. - - --> -[ -]

ii. Verbal Stems by affixing morphemes

[Table: verbal stems derived from verbal stems. Columns: Verbal Stem (Examples), Morpheme, Derived Verbal Stem.]

iii. Compound Words of


(a) stems and verbs, e.g. -+ -->
(b) sub-words and verbs, e.g. + -->
Amharic verbs are marked for:
i. Person, gender, number, case, and tense/aspect

[Table: subject marking (subjective case). Rows: first person; second person masculine and feminine; third person masculine and feminine. Columns: singular and plural, each split into past tense and non-past tense.]

[Table: object marking (objective case), given per tense and per subject (for example past tense with a third person, singular, masculine subject, then with a third person, singular, feminine subject, etc.). Rows: object person (first, second, third) and gender (masculine, feminine). Columns: singular and plural.]
ii. Mood

[Table: mood marking. Columns: Number, Person, Gender, and the moods Command, Request, Negative, and Completed Action. Rows: singular first person, second person masculine and feminine, third person masculine and feminine; plural first, second, and third person.]

Amharic verbs in general show a high degree of inflection, since person, case, gender, number,
tense, aspect, mood, and other features are marked on the verb. For example, a single inflected verb form can indicate:
the subject (third person, masculine, singular)
the object (first person, plural)
negation …
past tense
• State machines are widely used in NLP for modeling phonology, morphology and syntax.

• State machines are formal models that consist of states, transitions among states, and
an input representation.
♦ States – represent the set of properties of an abstract machine
♦ Transitions – represent jumps from one state to another
♦ Inputs – sequences of symbols or letters that can be read by the machine

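• A machine with a finite number of states is called a finite state machine (FSM).

• An FSM has two special kinds of state: a start state and final (accepting) states.

[Figure: an FSM with states S0, S1, and S2 and transitions labeled 0 and 1; callouts mark the start state, a final state, a transition, and an input symbol.]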

• There are two types of FSMs: finite state automata and finite state transducers.
• A finite state automaton (FSA) is a finite state machine that accepts only a given set of
strings (a language).

• An FSA can be deterministic or non-deterministic.

• In a deterministic FSA, every state has exactly one transition for each possible input symbol.

♦ Example: A deterministic FSA that determines if a binary string contains an even number of 0's.

[Figure: S0 --ε--> S1; S1 --0--> S2 and S2 --0--> S1; S1 and S2 each loop to themselves on 1.]


♦ Strings accepted by this deterministic FSA are: ε, 1, 11, 111, 00, 010,
1010, 10110, etc.
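A deterministic FSA of this kind can be written as a transition table plus a loop over the input. The sketch below is one possible encoding, not the slide's exact machine: the state names `even` and `odd` and the function name `accepts` are illustrative choices, and the ε arc from S0 is folded into the start state.

```python
# Transition table: (current state, input symbol) -> next state.
TRANSITIONS = {
    ("even", "0"): "odd",
    ("even", "1"): "even",
    ("odd", "0"): "even",
    ("odd", "1"): "odd",
}
START = "even"      # zero 0's seen so far, which is an even count
FINAL = {"even"}


def accepts(string):
    """Return True if the string over {0, 1} contains an even number of 0's."""
    state = START
    for symbol in string:
        if (state, symbol) not in TRANSITIONS:
            return False            # symbol outside the alphabet
        state = TRANSITIONS[(state, symbol)]
    return state in FINAL


for s in ["", "1", "00", "010", "10110", "0", "10"]:
    print(repr(s), accepts(s))
```

Because every (state, symbol) pair has exactly one entry, the machine is always in a single state, which is what makes it deterministic.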
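• In a non-deterministic FSA, an input can lead to one, more than one, or no transition for
a given state.

♦ Example: A non-deterministic FSA that determines if a binary string contains an even number of 0's or an even number of 1's.

[Figure: S0 --ε--> S1 and S0 --ε--> S3; S1 --0--> S2 and S2 --0--> S1, with S1 and S2 each looping on 1; S3 --1--> S4 and S4 --1--> S3, with S3 and S4 each looping on 0.]
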

♦ Strings accepted by this non-deterministic FSA are: ε, 1, 11, 111, 00, 010, 1010, 10110, 011, 11011, 1010101, etc.
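A non-deterministic FSA can be simulated by tracking the set of states it could be in. The sketch below encodes the machine described above under two assumptions: S1 and S3 are taken to be the accepting states (even count of 0's and even count of 1's respectively), and the ε label is represented by the empty string. The names `NFA`, `epsilon_closure`, and `nfa_accepts` are illustrative.

```python
EPS = ""  # label used here for ε-transitions

# (state, symbol) -> set of possible next states.
NFA = {
    ("S0", EPS): {"S1", "S3"},
    ("S1", "0"): {"S2"}, ("S1", "1"): {"S1"},   # S1/S2 track the parity of 0's
    ("S2", "0"): {"S1"}, ("S2", "1"): {"S2"},
    ("S3", "1"): {"S4"}, ("S3", "0"): {"S3"},   # S3/S4 track the parity of 1's
    ("S4", "1"): {"S3"}, ("S4", "0"): {"S4"},
}
NFA_FINAL = {"S1", "S3"}


def epsilon_closure(states):
    """All states reachable from `states` using only ε-transitions."""
    closure, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for nxt in NFA.get((state, EPS), ()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure


def nfa_accepts(string):
    current = epsilon_closure({"S0"})
    for symbol in string:
        moved = set()
        for state in current:
            moved |= NFA.get((state, symbol), set())
        current = epsilon_closure(moved)
    return bool(current & NFA_FINAL)


print(nfa_accepts("011"), nfa_accepts("01"))
```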
• FSAs can be used to recognize words in a language.

• Examples:

♦ Single word recognition

S0 --ሰ--> S1 --በ--> S2 --ረ--> S3

S0 --w--> S1 --a--> S2 --l--> S3 --k--> S4

S0 --ሰበረ--> S1

S0 --walk--> S1
♦ Recognition of multiple words

ሰበረ, ሰበቀ, ሰበብ

S0 --ሰበ--> S1 --ረ | ቀ | ብ--> S2

internal, eternal, ethical, ethiopia, ethanol

[Figure: an FSA with states S0 to S5 whose arcs carry shared word parts: in, e, tern, al, eth, i, c, opia, anol.]
♦ Recognition of multiple words (for instance, Amharic pronouns: እኔ, እኛ, አንተ,
አንቺ, እናንተ, እስዎ, እርስዎ, እሱ, እርሱ, እሷ, እርሷ, እሳቸው, እርሳቸው, እነሱ, እነርሱ)

[Figure: an FSA with states S0 to S6 whose arcs carry shared pronoun parts: አን, እ, ር, ነ, ሱ, ሷ, ሳቸው, ስዎ, ናንተ.]

• One word and multiple inflections

S0 --walk--> S1 --s | ed | ing--> S2

S0 --ሰበር--> S1 --ኧን | ኧህ | ኣት | ኧው | ኣቸው | ኧኝ | ኧሽ | ኣችሁ | ኣችሁት | ...--> S2
• Multiple words and multiple inflections

S0 --jump | walk | help | ...--> S1 --s | ed | ing | ...--> S2

S0 --ማረክ | ሰበር | ገደል | ...--> S1 --ኧን | ኧህ | ኣት | ኧው | ኣቸው | ኧኝ | ኧሽ | ኣችሁ | ኣችሁት | ...--> S2
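A two-arc FSA of this shape (any stem, then any suffix) is equivalent to a regular expression, so one simple way to try it out is to compile the two arc label sets into a pattern. This is a sketch under assumptions: the helper name `compile_fsa` is hypothetical, only the stems and suffixes visible in the English example are listed, and S2 is taken as the only final state, so a bare stem is not accepted.

```python
import re

STEMS = ["jump", "walk", "help"]
SUFFIXES = ["s", "ed", "ing"]


def compile_fsa(stems, suffixes):
    """Build a regex for S0 --stem--> S1 --suffix--> S2, with S2 final."""
    stem_arc = "|".join(re.escape(s) for s in stems)
    suffix_arc = "|".join(re.escape(s) for s in suffixes)
    return re.compile(f"(?:{stem_arc})(?:{suffix_arc})")


RECOGNIZER = compile_fsa(STEMS, SUFFIXES)


def recognizes(word):
    # fullmatch: the whole word must be consumed by the time S2 is reached
    return RECOGNIZER.fullmatch(word) is not None


for w in ["walked", "helping", "jumps", "walk", "talked"]:
    print(w, recognizes(w))
```

Adding a word or an inflection then means adding one label to an arc rather than listing every stem and suffix combination.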
• One word and multiple inflections with affixes

S0 --እንዲ | እንዳይ | ከሚ | ሊ | የሚ | ...--> S1 --ሰብር--> S2 --ኧን | ህ | ኣት | ኧው | ኣቸው | ብን | በት | ለት | ባቸው | ...--> S3
• Multiple words and multiple inflections with affixes

S0 --እንዲ | እንዳይ | ከሚ | ሊ | የሚ | ...--> S1 --ማርክ | ሰብር | ገድል | ...--> S2 --ኧን | ህ | ኣት | ኧው | ኣቸው | ብን | በት | ለት | ባቸው | ...--> S3
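The prefix, stem, and suffix arcs can also be walked explicitly, which returns the segmentation instead of only a yes or no answer. The sketch below is illustrative: the name `analyze` and the `ARCS` layout are assumptions, only the morphemes visible in the figure are listed, and the input is the abstract morpheme string (for example እንዲሰብርኧው) rather than the surface spelling in which ኧ fuses with the preceding consonant.

```python
PREFIXES = ["እንዲ", "እንዳይ", "ከሚ", "ሊ", "የሚ"]
STEMS = ["ማርክ", "ሰብር", "ገድል"]
SUFFIXES = ["ኧን", "ህ", "ኣት", "ኧው", "ኣቸው", "ብን", "በት", "ለት", "ባቸው"]

# state -> (next state, labels allowed on the arcs to it)
ARCS = {
    "S0": ("S1", PREFIXES),
    "S1": ("S2", STEMS),
    "S2": ("S3", SUFFIXES),
}
FINAL_STATE = "S3"


def analyze(word, state="S0"):
    """Return every way to read `word` as prefix + stem + suffix."""
    if state == FINAL_STATE:
        return [[]] if word == "" else []
    next_state, labels = ARCS[state]
    analyses = []
    for label in labels:
        if word.startswith(label):
            for rest in analyze(word[len(label):], next_state):
                analyses.append([label] + rest)
    return analyses


print(analyze("እንዲሰብርኧው"))
```

Returning a list of analyses keeps the door open for ambiguous words, where more than one path through the machine consumes the whole input.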
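• Marking part-of-speech

[Figure: an FSA that starts at S0, reads a [word], and then follows arcs labeled with derivational suffixes (ion, y, cate, ism, ist, er) through states S1 to S5. In the second version of the figure the states are relabeled with the part of speech reached: S1 = N, S2 = N, S3 = Adj, S4 = N, S5 = V.]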
• Collect words in a large corpus and compile into a trie data structure:

... walk walked walking walks wall walls want wanted wanting
wants warn warned warning warns ...

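[Figure: trie for these words. All words share the path w-a; the branches continue with l-k (walk), l-l (wall), n-t (want), and r-n (warn), and each of these nodes branches further into its endings: s, e-d, and i-n-g (only s after wall).]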
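Compiling a word list into a trie can be sketched with nested dictionaries, one level per letter. The names `build_trie`, `contains`, and `count_nodes` and the end-of-word marker `$` are illustrative choices, not part of the slides.

```python
END = "$"  # marks that a complete word ends at this node

WORDS = ("walk walked walking walks wall walls want wanted wanting "
         "wants warn warned warning warns").split()


def build_trie(words):
    """Insert every word letter by letter; shared prefixes share nodes."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = {}
    return root


def contains(trie, word):
    node = trie
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return END in node


def count_nodes(trie):
    """Number of letter nodes, not counting end markers."""
    return sum(1 + count_nodes(child) for key, child in trie.items() if key != END)


trie = build_trie(WORDS)
print(contains(trie, "walking"), contains(trie, "wal"), count_nodes(trie))
```

The 14 words contain 75 letters in total but need far fewer letter nodes, because prefixes such as wa-, walk-, and want- are stored once.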
...
እንደሚሰብረው እንደሚሰብሩበት እንደሚሰብሩባቸው እንደሚሰብሩት እንደሚሰብር
እንደሚገድለው እንደሚገድሉበት እንደሚገድሉባቸው እንደሚገድሉት እንደሚገድል
እንደማይሰብረው እንደማይሰብሩበት እንደማይሰብሩባቸው እንደማይሰብሩት እንደማይሰብር
እንደማይገድለው እንደማይገድሉበት እንደማይገድሉባቸው እንደማይገድሉት እንደማይገድል
...

[Figure: trie for these words. The root arc እንደ branches into ሚ and ማይ; each of these branches into the stems ሰብር and ገድል; each stem either ends the word or continues with ኧው, ኡበት, ኡባቸው, or ኡት.]
• Identify frequent suffix trees

[Figure: the trie for walk, wall, want, warn and their inflected forms; the same suffix subtree recurs under walk, want, and warn.]

Discovered Morphology
• Stems - with common suffix tree:
♦ walk
♦ want
♦ warn
• Morphemes - frequent suffix tree:
♦ ε
♦ -ed
♦ -s
♦ -ing
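One way to find frequent suffix trees without drawing the trie is to record, for every prefix of every word, the set of endings that complete it; two trie nodes have the same suffix subtree exactly when these sets are equal. The sketch below is a simplified illustration: the function name `discover` and the two thresholds are assumptions, and the empty string stands for ε.

```python
from collections import defaultdict

WORDS = ("walk walked walking walks wall walls want wanted wanting "
         "wants warn warned warning warns").split()


def discover(words, min_stems=2, min_suffixes=2):
    """Group candidate stems by the exact set of suffixes that follow them."""
    suffixes_of = defaultdict(set)
    for word in set(words):
        for i in range(1, len(word) + 1):
            suffixes_of[word[:i]].add(word[i:])   # word[i:] == "" plays the role of ε

    groups = defaultdict(list)
    for stem, suffixes in suffixes_of.items():
        if len(suffixes) >= min_suffixes:
            groups[frozenset(suffixes)].append(stem)

    # Keep only suffix sets shared by several stems, i.e. the frequent ones.
    return {sufs: sorted(stems) for sufs, stems in groups.items()
            if len(stems) >= min_stems}


for suffixes, stems in discover(WORDS).items():
    print(sorted(suffixes), stems)
```

On this word list the only suffix set shared by more than one stem is {ε, -ed, -ing, -s}, with the stems walk, want, and warn; wall is left out because only ε and -s follow it.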
[Figure: the trie for the Amharic words above; the same suffix subtree (ኧው, ኡበት, ኡባቸው, ኡት) recurs under every occurrence of the stems ሰብር and ገድል, after both እንደ + ሚ and እንደ + ማይ.]

Discovered Morphology
• Stems - with common suffix tree:
♦ ሰብር
♦ ገድል
• Morphemes - frequent suffix tree:
♦ ε
♦ -ኧው
♦ -ኡበት
♦ -ኡባቸው
♦ -ኡት
• Other affixes:
♦ - እንደ
♦ -ሚ-
♦ -ማይ-
