A Survey of Applications of Finite Automata in Natural Language Processing
A Survey of Applications of Finite Automata in Natural Language Processing
A Survey of Applications of Finite Automata in Natural Language Processing
Bisht 62
transducer translation template model for statistical scale dictionary compilation and indexation of natural
machine translation. Manning and Schutze [2] and language text.
Jurafsky and Martin [3] provided details of applications
of finite automata in NLP in their books.
II. FINITE AUTOMATA IN NLP
In this section a survey of applications of finite
automata in natural language processing has been made.
A. Language Recognizer
There are many tasks that need language recognizing
Fig. 1. NFA for the words ‘boy’ and ‘bat’.
mechanism. For example, spelling checker,
morphological analysis, language identification etc.. Inflectional morphology can be well recognized by
Finite state machine are quite useful as a language finite automata. For each category of words we can
recognizer. For a given word, a NFA can be designed form a separate NFA and then combine them using ߣ
easily that recognize the word. For example, NFA for transitions. For example, nouns and their plural can be
the words ‘boy’ and ‘bat’ is shown in the Fig 1. recognized through one NFA and verbs and their
Similarly for every word a NFA can be designed and different forms can be recognized through another NFA
the different NFA’s can be combined to form spelling and finally the two NFA can be combined. Figure 2
checker or dictionary compilation for a language. Mohri shows the NFA for some words and their
[ ] showed the application of finite automata in large morphological variations.
Fig. 2. NFA for the some words and their morphological variations.
B. Morphological parsing and generation used for a noun. The word ‘girl’ can be replaced by
Morphological parsing is the process of producing a any other regular noun like ‘boy’.
lexical structure of a word, that is, breaking a word into
stem and affixes and labeling the morphemes with
category label. For example the word ‘books’ can be
parsed as book+s and further as book + N+PL, the
word ‘went’ can be parsed as go+V+PAST etc.
generation is the reverse process of parsing, that is
combining the lexical form of a word to produce the
word. For example, box+N+Pl generates the word Fig. 3. Generation of word from its lexical form
‘boxes’. Finite state transducers are quite useful in through transducer.
morphological parsing. Let us consider the lexicon form
Here simple regular nouns have been considered on
of regular inflectional ‘girl +N +PL’. Fig. 3 shows the
transducer which convert the lexicon form ‘girl+N+PL’ exemplarily basis, orthographic rules can be applied for
into the word ‘girls’. Let us assume that x represents the irregular inflections and derivations and NFAs can be
word ‘girl’ for simplification purpose as for every designed for every word and its variations in a certain
regular singular noun the input and output will remain language.
same in a transducer, that’s why a variable x can be
Bisht 63
C. Tense recognition in Hindi introduce the concept. Fig. 5 shows the recognition of
Tenses in Hindi are important part of grammar. Their two tenses in Hindi sentences through NFA. The
knowledge is quite useful in translation from Hindi to abbreviations showing final states are as follows: PI-
another language. Here we will show the application of Present indefinite and PC-Present continuous. Further
finite state machines in automatic recognition of tenses {∈ ݔRkk gS] rh gS] rs gS} , y ∈{ jgk gS] jgh gS] jgs
in Hindi language. We have some grammatical rules in gS},
Hindi through which we can identify the tense of a
III. CONCLUSION
sentence and translate the sentence in English. We
know that there are three tenses and each tense has a In the present paper a survey of applications of finite
four sub tenses. For example, the sentences which end automata in natural language processing has been done.
with ‘Rkk gS] rh gS] rs gS’ etc. come under present Some examples have been taken to show the
indefinite tense, like *eksgu QqVcky [ksyrk gS^ and morphological analysis and parsing of English
the sentence which end with ‘jgk gS] jgh gS] jgs gS’’ sentences. Finally the application of finite automata has
etcetc. come under present continuous tense, like *xhrk been shown in tense recognition of Hindi sentences
[ksy jgh gS^. We may have the following parse tree which is an important part of rule based machine
structure of the second Hindi sentence for tense translation.
recognition purpose. Here the non terminal EF stands
for ‘End Form’, other non terminal have usual meaning. REFERENCES
[1]. A. Jayara, A. Kornai and A.Sakarovitch, “Finite-state
methods and models in natural language processing, Natural
Language Engineering 17 (2): 141–144, 2011.
[2]. C.D. Manning, H. Schutze, Foundations of Statistical
Natural Language Processing. MIT Press, 2002.
[3]. D. Jurafsky, and J.H. Martin, Speech and Language
Processing, Pearson Eucation India, 2004.
[4]. K. Koskenniemi, “Finite state morphology and and
information retrieval” in the proceedings of the ECAI
workshop extended finite state models of language, 1996, pp.
42-45.
[5]. M. Mohri, “On some applications of finite automata
theory to natural language processing”, Natural Language
Engineering, Vol. 1(1), pp. 000-000, 1995.
Fig. 4. Parse tree for the sentence *xhrk [ksy jgh gS^. [6]. M. Shrivastava, P. Bhattachrayya, “Hindi POS Tagger
Using Naive Stemming : Harnessing Morphological
Information Without Extensive Linguistic Knowledge, in the
proceedings of ICON-2008: 6th International Conference on
Natural Language Processing, Macmillan Publishers, India.
[7]. R. Sharma, S.Paul, “A rule based approach for automatic
clause boundary detection and classification in Hindi”,
Proceedings of the 5th Workshop on South and Southeast
Asian NLP, 25th International Conference on Computational
Fig. 5. NFA for recognition of two tenses. Linguistics, pp. 102–111, 2014.
[8]. R.M.Kaplan, M. Kay, “Regular models of rule syatems”,
A finite state machine can be designed for the verb Computational Linguistics, vol. 20(3), pp. 331-378, 1994.
phrase that appears at the end of the sentence for [9]. S. Kumar, Y. Deng and W. Byrne, “A weighted finite
deciding the tense of the sentence. Tough there are state transducer translation template model for statistical
number of complexities and variations in Hindi machine translation”, Natural Language Engineering 12 (1):
sentences but here a simple example has been taken to 35–75, 2005.
Bisht 64