Parsing Bangla Grammar Using Context Free Grammar

See discussions, stats, and author profiles for this publication at: https://www.researchgate.
net/publication/314048397
Parsing Bangla Grammar Using Context Free Grammar
Chapter · January 2014

DOI: 10.4018/978-1-4666-6042-7.ch044
CITATIONS READS
0 136
4 authors, including:
Bishnu Sarker K. M. Azharul Hasan

National Institute for Research in Computer Science and Control Khulna University of Engineering and Technology
10 PUBLICATIONS 7 CITATIONS 64 PUBLICATIONS 285 CITATIONS
SEE PROFILE SEE PROFILE
Some of the authors of this publication are also working on these related projects:
EM-DMKM View project
Natural Language Processing View project
All content following this page was uploaded by K. M. Azharul Hasan on 22 July 2019.
The user has requested enhancement of the downloaded file.

137
Chapter 7
Parsing Bangla Grammar
Using Context Free Grammar
Al-Mahmud
Khulna University of Engineering and Technology (KUET), Bangladesh
Bishnu Sarker
K. M. Azharul Hasan
ABSTRACT
Parsing plays a very prominent role in computational linguistics. Parsing a Bangla sentence is a pri-
mary need in Bangla language processing. This chapter describes the Context Free Grammar (CFG)
for parsing Bangla language, and hence, a Bangla parser is proposed based on the Bangla grammar.
This approach is very simple to apply in Bangla sentences, and the method is well accepted for parsing
grammar. This chapter introduces a parser for Bangla language, which is, by nature, a predictive parser,
and the parse table is constructed for recognizing Bangla grammar. Parse table is an important tool to
recognize syntactical mistakes of Bangla sentences when there is no entry for a terminal in the parse
table. If a natural language can be successfully parsed then grammar checking of this language becomes
possible. The parsing scheme in this chapter works based on a top-down parsing method. CFG suffers
from a major problem called left recursion. The technique of left factoring is applied to avoid the problem.
INTRODUCTION way of expressing the expressions of feelings

through providing those necessary phonetics that
Language plays the most important role in human could be spoken and the symbols that could be
communication. Human communication is based written to be preserved. Language is the driving
on exchange of feelings, sharing of knowledge, etc. force to human communication. Language is the
Feelings can be exchanged through voice, symbols most important medium to represent and express
and signs. Language provides the most convenient human knowledge and human communication.
DOI: 10.4018/978-1-4666-3970-6.ch007
Copyright © 2013, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Bangla (or Bengali) is one of the more im- some kind of parse tree, abstract syntax tree or
portant Indo-Iranian languages, is the sixth most other hierarchical structure) implicitly in the
popular in the world and spoken by a population input tokens. Parsing can be defined as a method
that now exceeds 250 million. Geographically, where a parser algorithm is used to determine
Bangla-speaking population percentages are as whether a given input string is grammatically
follows: Bangladesh (over 95%), and the Indian correct or not for a given grammar. Parsing is a
States of Andaman & Nicobar Islands (26%), fundamental problem in language processing for
Assam (28%), Tripura (67%), and West Bengal both machines and humans. In general, the parsing
(85%). The global total includes those who are now problem includes the definition of an algorithm to
in diaspora in Canada, Malawi, Nepal, Pakistan, map any input sentence to its associated syntactic
Saudi Arabia, Singapore, United Arab Emirates, tree structure (Saha, 2006). The parser often uses
United Kingdom, and United States. a separate lexical analyzer to create tokens from
Bangla is still in degraded stage at least as far the sequence of input characters. Parsers may be
as work in the area of computational linguistics programmed by hand or may be automatically or
is concerned. Natural languages like English and semi-automatically generated (in some program-
even Hindi is rapidly progressing as far as work ming languages) by a tool.
done in processing by computers is concerned. A parse tree for a grammar is a tree where the
Unfortunately, Bangla lags more or less behind in root of the tree is the start symbol for the gram-
some crucial areas of research like parts of speech mar, the interior nodes are the non-terminals of
tagging, text summarization and categorization, the grammar, the leaf nodes are the terminals of
information retrieval and most importantly in the the grammar and the children of a node starting
area of grammar checking. The grammar checking from the left to the right correspond to the sym-
for a language has a wide variety of applications. bols on the right hand side of some production
The activity of breaking down a sentence into for the node in the grammar. Every valid parse
its constituent parts is known as parsing. Parsing is tree represents a string generated by the grammar
an earlier term for the diagramming of sentences (Yarowsky, 1995).
of natural languages, and is still used for the dia- A parser analyzes the sequence of symbols
gramming. Parsing a sentence involves the use of presented to it based on the grammar (Yarowsky,
linguistic knowledge of a language to discover the 1995). Natural language applications namely In-
way in which a sentence is structured. Exactly how formation Extraction, Machine Translation, and
this linguistic knowledge is represented and can be Speech Recognition, need to have an accurate
used to understand sentences is one of the questions parser (Haque & Khan, 2005). Parsing natural
that has engaged the interest of psycholinguists, language text is more difficult than the computer
linguists, computational linguists, and computer languages such as compiler and word processor
scientists. Bangla parsing is a challenging task. because the grammars for natural languages are
This chapter has a detail discussion over Bangla complex, ambiguous, and infinite in number of
parsing using Context Free Grammar. vocabulary. It is difficult to prepare formal rules
to describe informal behavior even though it is
clear that some rules are being followed. For a
BACKGROUND syntax based grammar checking the sentence is
completely parsed to check the correctness of it.
In computing, a parser is one of the components If the syntactic parsing fails, the text is considered
in an interpreter or compiler that checks for incorrect. On the other hand, for statistics based
correct syntax and builds a data structure (often approach, Parts of Speech (POS) tag sequences
138
are prepared from an annotated corpus, and hence If we transfer it as word to word then the
the frequency and the probability (Sengupta & English form will become:
Chaudhuri, 1997). The text is considered correct if
the POS-tagged text contains POS sequences with I + rice + eat.
frequencies higher than some threshold (Anwar,
Anwar & Bhuiyan, 2009) Here, in English object is appeared after the
In this chapter, we implemented Bangla dic- verb whereas in Bangla the object is appeared
tionary in XML format using the corresponding before the verb. It’s clearly states that how diffi-
word as tag name and it’s POS as value. It is a cult to parse Bangla sentences rather than English
very useful technique because it needs less time to sentences.
store data as well as data retrieval from this data Roughly speaking, phrases and idioms are
storage is very easy and fast compared to other expressions whose meaning cannot be completely
popular forms of data storage. It helps to search understood from the meanings of the component
the dictionary very fast. Bangla grammar has a parts. For example, whereas it is possible to work
large number of forms and rules. The aim of this out the meaning of (1a) on the basis of knowledge
chapter is to describe a parser that parses a Bangla of English grammar and the meaning of words,
sentence, rather than generating sentences. A parse this would not be sufficient to work out that (1b)
tree is displayed if the Bangla parser decides that can mean something like ‘If Sam dies, her children
sentence is correct otherwise parse tree are not will be rich’. This is because kick the bucket is
generated. This technique of parsing system with an idiom.
XML checks whether every sentence of given
Bangla passage is grammatically correct or not 1a: If Sam mends the bucket, her children will
correct. It also indicates the location of problems. be rich.
If any sentence of the passage is found accepted 1b: If Sam kicks the bucket, her children will be
or correct then it generates a parse tree. rich.
There has been a structural difference between
the Bangla and English grammar. It is challenging The problem with phrases and idioms, in a
to parse Bangla and English sentence with the parsing context, is that it is not usually possible
same grammatical structure. If we match word to to parse them using the normal rules.
word with Bangla and English grammar structure Throughout the chapter there will be a detailed
then the parser will not give the correct answer. discussion over Context Free Grammar for pars-
For example, considering the grammatical ing Bangla Text and predictive Bangla parser for
structure: constructing parse table. Here a top down parsing
scheme is adopted and the problem of left recur-
Subject + verb + object (I + eat+ rice) sion is avoided applying left factoring over the
proposed CFG.
In Bangla it appears as:
আমি + খাই + ভাত. RELATED WORKS
But it is a non-traditional form. The traditional Modern Bangla morphology is very productive,
form is: especially for verbs, with the root verbs takes
around 168 different forms. Bangla lexicon also
আমি + ভাত + খাই. has a very large number of compound words
139
(words that have more than one root), which parser blindly reaches the root of terminal using
can be created from at most any combination of the productions (Sengupta & Chaudhuri, 1997).
nouns, pronouns and adjectives. An effort is made Recursive descent Parsing is another way to
at building a complex morphological parser for produce parse trees for an input sentence. This
Bangla, where it can only handle simple words parser interprets the grammar as a specification of
with a single root. Though the addition of in- how to break a high-level goal into several lower-
flectional suffixes in Bangla compound word is level sub-goals. The top-level goal is to find the
fairly complex, the compound word’s individual start symbol of the grammar and sub-goals are the
root words may retain their inflectional suffixes. production that breaks down the start symbols of
Such a compound word morphological parser is the grammar. Such sub-goals are matched with an
developed which can efficiently parse compound input sentence and succeed to its next state if the
words having inflectional suffix and also resolves next word is matched. If there is no match the parser
ambiguities. Some rules are used to develop the backtracks to a previous node and tries a different
morphological analysis of simple and compound alternative. One problem with this recursive decent
Bangla words that can be used to make Universal method is that when a production is recursive,
Natural Language (UNL)-Bangla dictionary for the parser keeps on generating a never-ending
converting the natural Bangla sentences to UNL parse tree when it selects the recursive production
documents and vice versa. (Sengupta & Chaudhuri, 1997). An advantage for
There has been no organized formal initiative recursive decent over the shift-reduce method is
for generating a context-free grammar for Bangla that the parser can be forced to choose a specific
Language. There are some books available on production to continue parsing. Thus, this makes
context-free grammars for Bangla but most of it easier to produce correct parse tree of a sentence
them have been limited to the writers’ way of in a given grammar which may not be possible
generating the grammar rather than producing with a shift-reduce parser for the same grammar.
a generic grammar to cover the whole language A rule-based Bangla parser has been proposed
(Anwar, Anwar & Bhuiyan, 2009; Aho, Sethi & in (Saha, 2006), that handles semantics as well
Ullman, 2002). as POS identification from Bangla sentences
Shift-reduce parsing is the simplest kind of and ease the task of handling semantic issues in
bottom-up parsing. Here the parser takes an input machine translation.
string (sentence) and repeatedly pushes the next An open source morphological analyzer for
input word onto a stack which is the shift operation. Bangla using finite state technology is described
If the top n items on the stack match the n items in (Faridee & Tyers, 2009). They developed the
on the right-hand side of some production, then monolingual dictionary called Monodix, stored
they are popped off the stack and the item on the in XML file.
left-hand side of the production is pushed on the Anwar, Anwar & Bhuiyan (2009) address a
stack. This replacement operation is the reduce method to analyze syntactically Bangla sentence
part of the parsing. The parser ends its work when using context-sensitive grammar and interpret the
all the input is consumed and there is only one input Bangla sentence to English using the NLP
item remaining on the stack which is a parse tree conversion unit. The system is based on analyzing
with the start symbol of the grammar as its root. an input sentence and converting into a structural
The parser fails to parse a sentence when there representation.
are no remaining input words to shift or when There have been several parsing technique
there is no way to reduce the remaining items on proposed for Bangla Language. A parsing meth-
the stack. The second problem occurs because the odology for Bangla natural language sentences
140
is proposed by Hoque and Ali (2003) and shows Bangla sentences in semantic manner is presented
how phrase structure rules can be implemented by in (Hoque, Rahman & Dhar, 2007) and (Hasan,
top-down and bottom-up parsing approach to parse Mondal & Saha, 2010) presents a technique for
simple sentences of Bangla. An approach named detecting the named entity based on classifier
Predicate Preserving Parser (PPP) is described for Bangla documents. But these papers (Anwar,
in (Ali, Ripon and Allayear 2012) which maps Shume & Bhuiyan, 2010; Hoque, Rahman &
Bangla text in UNL which then can be translated KumarDhar, 2007; Hasan, Mondal & Saha, 2010)
to any other natural languages. A technique of do not deal with the detail grammar recognition
unsupervised morphological learning for Bengali for Bangla sentences.
language is introduced in (Dasgupta & Ng, 2006).
A Bengali Dependency Parser based on statistical
data driven parsing system followed by a rule PARSING BANGLA SENTENCES
based post processing is presented in (Ghosh et. WITH CONTEXT FREE GRAMMAR
al., 2010). A comprehensive approach for Bangla
syntax analysis was developed (Mehedy, Arifin Context Free Grammar
& Kaykobad, 2003) where a formal language is
defined as a set of strings. Each string is a con- The most commonly used and mathematical sys-
catenation of terminal symbols. tem for modeling constituent structure in natural
Some other approaches such as Lexical Func- languages is the Context-Free Grammar, or CFG.
tional Grammar (LFG) (Sengupta & Chaudhuri, Context-free grammars are also called Phrase
1997), and Context Sensitive Grammar (CSG) Structure Grammars. A Context-Free Grammar
(Hoque & Ali, 2004; Murshed, 1998; Selim (CFG) is a set of recursive rewriting rules called
& Iqbal, 1999) have also been developed for productions used to generate string patterns. A
parsing Bangla sentences. LFG is a monotonic CFG consists of the following components (Aho,
theory of syntax. Instead of postulating different Sethi & Ullman, 2002):
derivational levels represented in the same formal
language, it incorporates different parallel levels 1. A start symbol for the grammar.
of information, which can all potentially access 2. A set of terminal symbols that are characters
each other, each with its own formal language. The that appear in the strings generated by the
assumption about parallel levels of information grammar.
extends even to non-syntactic aspects of gram- 3. A set of non-terminal symbols that are
mar. Thus, for example, semantic information placeholders for patterns of terminal symbols
is assumed to be available to various levels of that can be generated by the non-terminal
syntax, and syntactic levels can input into phonol- symbols.
ogy (Joshi, 1993). 4. A set of productions that are used to replace
Some developed Bangla parser using SQL to the non-terminals with other non-terminals
check the correctness of sentence; but its space or terminals.
complexity is inefficient. Besides, it takes more
time for executing SQL command. As a result A formal language is context-free if there is a
those parsers becomes slower. A technique is context-free grammar that generates it.
implemented to perform structural analysis of This chapter describes a way of producing
Bangla sentences of different tenses using Context- CFG for Bangla Language and designing a Bangla
Free Grammar (CFG) rule (Anwar, Shume & parser. The grammar discussed throughout the
Bhuiyan, 2010). A methodology for analyzing the chapter can successfully parse a number of sen-
141
tences. We have defined a tag set and according Context-free grammars play a central role in the
to the tag set, we tagged the word. In the next step description and design of programming languages
of producing CFG, we defined the constituents and compilers. They are also used for analyzing
and finally the rules were generated. We used the syntax of natural languages. Noam Chomsky
top-down parsing techniques for designing Bangla (1956) has posited that all human languages are
parser by using XML. Our parser decides which based on context-free grammars at their core, with
sentence is correct and which is to be rejected. additional processes that can manipulate the output
And finally generate a parse tree for this sentence of the context-free component (the transformations
by using Bangla Parser. of early Chomskyan theory).
There are a number of important subclasses It is found that most parsing methods fall into
of the context-free grammars: one of these two classes, named as top-down
parsing method and bottom-up parsing method.
• LR(k) grammars (also known as deter- These terms refer to the order in which nodes in
ministic context-free grammars) allow the parse tree are constructed.
parsing (string recognition) with deter- In top-down parsing method, construction
ministic pushdown automata, but they can starts at the root and proceeds towards the leaves
only describe deterministic context-free i.e., this method works from sentence symbol to
languages. the sentence. Parsers designed using top-down
• Simple LR, Look-Ahead LR grammars are parsing method are known as top-down parsers.
subclasses that allow further simplification On the contrary, in case of bottom-up pars-
of parsing. ing method, construction starts at the leaves and
• LL(k) and LL(*) grammars allow parsing proceeds towards the root, i.e., this method works
by direct construction of a leftmost deriva- from the sentence to sentence symbol. And pars-
tion as described above, and describe even ers which are designed using bottom-up parsing
fewer languages. method are known as bottom-up parser.
• Simple grammars are a subclass of the There are some well-recognized problems
LL(1) grammars mostly interesting for its when concerning with bottom up parsing. Bottom
theoretical property that language equal- up parsing suffers from termination problem. It is
ity of simple grammars is decidable, while inefficient when dealing with empty categories.
language inclusion is not. Bottom up parsing is data directed as it attempts
• Bracketed grammars have the property that to parse the words that are there and inefficient
the terminal symbols are divided into left when there is great lexical ambiguity (grammar-
and right bracket pairs that always match driven control might help here).
up in rules. Parsing of Bangla sentences by using grammar
• Linear grammars have no rules with more rules is still in a rudimentary stage. Very little
than one non terminal in the right hand research has been conducted regarding parsing
side. of Bangla sentences but a significant number of
• Regular grammars are a subclass of the research activities have been conducted on the
linear grammars and describe the regular recognition of Bangla characters. Some developers
languages, i.e., they correspond to finite developed Bangla parser using SQL to check the
automata and regular expressions. correctness of sentence. But it takes more space
on our computers, meaning its space complexity
142
is inefficient. Besides, it takes more time for ex- 3. Parse Table: A two dimensional array
ecuting SQL command. As a result those parsers M[A,n] where A is nonterminal and n is a
becomes slower. To decrease time and space, terminal or $ sign.
XML files re used for storing data which is more 4. Output.
speedy and takes up less space.
The extensions and refinements presented in Building the Word Repository/
this chapter over previous works especially in Dictionary Using XML
Hasan et al. (2011) are as follows:
XML (eXtensible Markup Language) has been
• A set of CFG rules. designed as a markup language and a textual for-
• Store data into XML file. mat. It provides for a description of a document’s
• Design a Bangla Parser by using table- contents, with non-predefined tags, and does not
driven method with top-down approach. provide for any presentational characteristics. The
• Remove grammar ambiguity by using left eXtensible Markup Language (XML) is now used
factoring. almost everywhere for its simplicity and ease of
• Construct a parse tree. use. It is a good format to store any kind of data.
The tasks behind using XML are always the same:
Bangla words are stored as tag and its corre- reading data from XML and writing data into it.
sponding constituents or POS are used as value. XML have several advantages. Some of them are:
More over the idea of left factoring have been
applied to remove the left recursion and ambigu- 1. Ease of access.
ity of the grammar and hence according to the 2. Efficient in response to access time.
designed grammar, the parser can detect the errors 3. Easy to format the grammar structure.
in a Bangla sentence. 4. No need to use any space consuming DBMS.
5. Ease of defining the category of words.
6. Easy to implement and integrate.
SOLUTION AND
RECOMMENDATIONS The general format of XML tag is
A Parsing Scheme for Bangla <tag_name>value</tag_name>

Grammar Recognition
A dictionary is a very basic Natural Language
A predictive parser is an efficient way of imple- Processing (NLP) tool used to get the meaning,
menting recursive decent parsing by handling the parts of speech, and usage of a word and can also
stack of activation record. be used as a spell-checker to detect errors in a
The predictive parser is composed of four sentence and correct them by providing a set of
elements namely: correct alternatives which includes the intended
word. To design the dictionary in XML, we con-
1. Input: Contains the string to be parsed sider Bangla word as tag_name and value as its
or checked, followed by a $, the right end corresponding POS. An example of XML file is
marker. shown in Figure 1.
2. Stack: Contains a sequence of grammar
symbols.
143
Figure 1. XML word repository

Defining Constituents and Phrases
Each sentence is to be partitioned into its con-

stituents. A constituent expresses the complete
meaning for a specific context. After tagging the
words, each sentence is to be broken down into
their constituent parts. The constituents of the
sentences found out are shown in the Table 2.
Grammar Design with CFG
Once constituents have been identified, the

productions for Context-Free Grammar (CFG)
are developed for Bangla sentence structures.
Categorization of Words Table 3 shows the productions along with their
respective constituents. As Bangla grammar has
Each sentence is composed of one or more phrases. different forms, the same production term can be
So if the syntactic constituents of sentences are used only by reorganizing the in the grammar. For
identified, it will be easier to obtain the structural example, following three forms can be applied by
representation of the sentence (Hoque & Ali, 2003; reorganizing the production terms.
Mehedy, Arifin & Kaykobad, 2003). Bangla words
are tagged with their respective Parts-Of-Speech 1. আমি ভাত খাই |
(POS), modifier, pattern, number (Haque & Khan, 2. আমি খাই ভাত |
2005), and stored in XML file. 3. ভাত আমি খাই |
Table 1 shows the tag sets used in XML file for
storing Bangla words. Every word is tagged with CFG Productions for
the appropriate member of the tag set and stored Bangla Grammar
in the XML file in a definite format.
The CFG productions for the Bangla grammar G
Table 1. Tag set description for Bangla grammar is defined as follows:
Tag Name (Symbol) Examples S −> NP VP
Noun (noun)
রহিম, বিদ্যালয়ে, ছেলে, বই,
ভাত,টাকা| NP −> noun conjunction noun | noun ip| noun
pronoun conjunction pronoun| noun pronoun
Pronoun (pronoun) আমি, আমরা, তু মি, সে, তারা।
ip| noun pronoun noun| noun pronoun adjective|
Adjective (adjective) ভাল, দশ, অনেক, শখ। noun pronoun pronoun| noun pronoun tp| noun
Verb (verb) খাই, খেলে, পড়ছে,পড়া। adjective| noun noun conjunction pronoun| noun
Conjunction (conjunction) এবং, ও, চেয়ে| noun aw| noun noun| noun| noun pronoun| pronoun
Negative Description (neg) না , নয়| conjunction pronoun| pronoun ip| pronoun noun
conjunction noun| pronoun noun ip| pronoun noun
Modifier (modifier) এ, একটি, একদিন|
adjective| pronoun noun conjunction pronoun|
pronoun noun aw| pronoun noun| pronoun ad-
jective| pronoun pronoun conjunction pronoun|
144
Table 2. Defining constituents
No. Constituents Symbol Definitions Examples
Noun phrase may be consists of only noun or pronoun. It রহিম,

1 Noun Phrase NP may be modified by quantifier, post position, specifier, plural আমি,
marker etc.
এ পথে।
A group of words including a verb and its complements, বই পড়ছে,
objects, or other modifiers that functions syntactically as a
2 Verb Phrase VP পথ্য সেবন করে,
verb. In English a verb phrase combines with a noun or noun
phrase acting as subject to form a simple sentence. খাওয়া হল না।
A group of words including an adjective and its complements দশ,
3 Adjective Phrase AP or modifiers that functions as an adjective, as too openly
critical of the administration. ভাল ছেলে|
Table 3. Constituents for developing the grammar VP −> noun verb| noun verb verb ptrn| noun verb
Constituents
verb adjective verb| noun verb verb adjective noun|
Productions Examples
(Symbol) noun verb verb adjective pronoun| noun verb verb
NP −> noun রহিম, adjective| noun verb verb noun adjective verb|
Noun Phrase NP → pronoun noun verb verb noun ptrn| noun verb verb noun
(NP) NP → modifier noun আমি,
etc. এ পথে। aw| noun verb verb noun adjective verb| noun verb
VP −> noun verb
verb noun verb ptrn| noun verb verb noun verb aw|
বই পড়ছে, noun verb verb noun verb adjective| noun verb
Verb Phrase VP −> noun verb verb
(VP) VP −> noun verb ptrn পথ্য সেবন করে, verb noun ptrn| nounverb verb noun pronoun|
etc. খাওয়া হল না। noun verb verb| pronoun adjective verb| pronoun
AP −> adjective
Adjective
AP −> adjective noun
দশ, verb verb ptrn| pronoun verb verb adjective verb|
Phrase (AP) ভাল ছেলে| pronoun verb verb adjective noun| pronoun verb
etc .
verb adjective pronoun| pronoun verb ptrn| pro-
noun verb aw| pronoun verb adjective| pronoun
pronoun pronoun aw| pronoun pronoun| pronoun verb pronoun| pronoun verb noun| pronoun ptrn |
tp| pronoun| modifier noun| modifier adjective ptrn| verb verb verb adjective| verb verb verb adjective
modifier adjective| modifier pronoun| modifier verb| verb verb verb ptrn| verb verb verb pronoun|
conjunction adjective| modifier| adjective noun| verb verb ptrn| verb verb adjective verb| verb verb
adjective pronoun| adjective conjunction adjective| adjective noun| verb verb adjective pronoun| verb
adjective ptrn| adjective| ip| tp| xp ip| xp pronoun verb noun adjective verb| verb verb noun verb
conjunction pronoun| xp pronoun ip| xp pronoun ptrn| verb verb noun verb aw| verb verb noun
noun| xp pronoun adjective| xp pronoun tp| xp verb adjective| verb verb noun ptrn| verb verb
adjective| xp noun conjunction pronoun| xp noun noun verb pronoun| verb adjective| verb adjective
aw| xp noun| xp tp| xp aw| tp pronoun| tp adjective| verb ptrn| verb adjective noun| verb adjective verb
tp ip| tp pronoun conjunction pronoun| tp pronoun noun| verb adjective verb noun verb | adjective
noun| tp pronoun adjective| tp pronoun tp| tp noun noun verb adjective verb| adjective noun verb
conjunction pronoun| tp noun aw| tp aw. ptrn| adjective noun verb pronoun| adjective noun
adjective ptrn| adjective noun adjective aw| adjec-
AP −> adjective noun| adjective pronoun| tive noun adjective | adjective pronoun| adjective
adjective ptrn ptrn | conjunction
145
Left Factoring AP −> adjective AP1
The above grammar G is ambiguous. The parser AP1 −> noun | adjective AP2 | pronoun |
generated from this kind of grammar is not ef- conjunction AP | ε
ficient as it requires backtracking. To remove
the ambiguity from the grammar the grammar AP2 −> ptrn | ε
productions are reconstructed by left factoring.
Left factoring is a grammar transformation VP −> noun VP1| pronoun VP4 | verb VP2 | AP
useful for producing a grammar suitable for pre- VP3 | conjunction
dictive parsing. The basic idea is that when it is
not clear which of the productions are to use to VP1 −> verb VP2 | adjective VP3 | pronoun
expand a non terminal then it can defer to take noun |noun pronoun
decision until we get an input to expand it. In
general, if we have productions of form: VP2 −> verb VP3 | ptrn | aw | AP VP3 | ε
A → αβ1 αβ2 VP3 −> verb VP4 | ptrn | adjective VP5 | noun
VP4 | ε
We left factored productions by getting the
input α and break it as follows: VP4 −> AP verb | verb VP2| ptrn | pronoun
A → αA′ VP5 −> verb| noun VP4| pronoun AP1
A′ → β1 β2
There have been lexical productions which
are as follows:
The above grammar is correct and is free from
conflicts. After left factoring the grammar G, the noun −> রহিম| বিদ্যালয়ে| ছেলে| বই| ভাত| টাকা
reconstructed left factored grammar productions | কণে | সিংহ | ঘাস
are as below:
pronoun −> আমি | আমরা | তু মি | সে | তারা
S −> NP VP
verb −> খাই | খেলে | করবে | পড়া | চলে | যাই
NP −> noun NP1|pronoun NP2|modifier
AP1|conjunction NP1| AP | NP2 |xp NP1 |tp adjective −>
ভাল | বলবান | গুণবান | একটু |
NP1 দশ | অনেক | শখ | দ্রুত
NP1 −> conjunction noun |ip | pronoun NP2 | conjunction −> এবং | ও | চেয়ে
adjective | noun NP3 | tp | aw |ε
modifier −> এ | একটি | একদিন | এই
NP2 −> conjunction pronoun | ip | noun NP1 |
adjective | pronoun NP3 | tp | ε ptrn −> না | নয়
NP3 −> conjunction pronoun| aw | ε
146
Designing Bangla Parser FIRST(NP) = {noun, pronoun, modifier,

adjective, conjunction, xp, tp}
A parser for a grammar G is a program that takes
a string as input and produces a parse tree as out- FIRST(NP1) = {conjunction, ip, pronoun,
put if the string is a sentence of G or produces an adjective, noun, tp, aw, ε }
error message indicating that the sentence is not
according to the grammar G. The idea of predic- FIRST(NP2) = {conjunction, ip, pronoun,
tive parser design is well understood in compiler adjective, noun, tp, ε }
design (Murshed, 1998). A predictive parser is
FIRST(NP3) = {conjunction, aw, ε}
proposed in (Hasan et al., 2011) that can recognize
mistakes of Bengali sentences when there is no
FIRST(AP) = {adjective}
entry for terminal in parse table.
To construct a predictive parser for grammar
FIRST(AP1) = {noun, adjective, pronoun,
G two functions namely FIRST() and FOLLOW()
conjunction, ε}
are important. These functions allow the entries
of a predictive parse table for G. Once the parse FIRST(AP2) = {ptrn, ε }
table has been constructed any string could be
verified whether it satisfy the grammar G or not. FIRST(VP) = {noun, pronoun, verb, adjective,
The FIRST() and FOLLOW() determines the conjunction}
entries in the parse table. Any other entries in the
parse table are considered invalid. FIRST(VP1) = {verb, adjective, pronoun,
noun}
Computation of FIRST and FOLLOW
FIRST(VP2) = {verb, ptrn, aw, adjective, ε}
FIRST(α) be the set of terminals that begin the
strings derived from α. If α → ε then ε is also FIRST(VP3) = {noun, verb, adjective, ptrn, ε}
included in FIRST(α). The rules of computing
FIRST (Aho, Sethi & Ullman, 2002) are as follows: FIRST(VP4) = {verb, ptrn, pronoun, adjective,
ε}
• If X is a terminal symbol ⇒ first(X) ={X}
• If X is a non-terminal symbol and X → ε is FIRST(VP5) = {verb, noun, pronoun}
a production rule ⇒ ε is in first(X).
• If X is a non-terminal symbol and X → FOLLOW(A) of a non terminal A is the set
Y1Y2.Yn is a production rule ⇒ first(X) = of terminals a that can appear immediately to the
first(Y1). right of A. If A is the right most symbol in the
sentential form then $ is added to FOLLW(A).
According to the rules of computing FIRST(α) The rules for computing FOLLOW (Aho, Sethi &
(Aho, Sethi & Ullman, 2002) the values for Ullman, 2002) of a non terminal A is as follows:
FIRST() are computed for the grammar G and
are shown here: • If S is the start symbol ⇒ $ is in follow(S).
• If A → αBβ is a production rule ⇒ every-
FIRST(S) = {noun, pronoun, modifier, thing in first(β) is follow(B) except ε.
adjective, conjunction, xp, tp} • If (A→ αB is a production rule) or (A→
αBβ is a production rule and ε is in
147
first(β))⇒ everything in follow(A) is in FOLLOW(VP4) = {noun, adjective, pronoun,

follow(B). modifier, ip, xp, tp, conjunction, $}
According to the rules of computing FOLLOW(VP5) = {noun, adjective, pronoun,

FOLLOW(S) the values for FOLLOWS() are modifier, ip, xp, tp, conjunction, $}
computed for the grammar G and are shown as
follows: Parse Table Construction
FOLLOW(S) = {noun, adjective, pronoun, Parse table is a two-dimensional array where each
modifier, ip, xp, tp, conjunction} row is leveled with non terminal symbol and each
column is marked with terminal or special symbol
FOLLOW(NP) ={noun, verb, adjective, $. Each cell holds a production rule.
pronoun} Let M[m, n] be a matrix where m is the num-
ber of non terminals in grammar G and n is the
FOLLOW(NP1) = {noun, verb, adjective, number of distinct input symbols that may occur
pronoun} in a sentence of grammar G. Table 4 shows the
resulting parse table for the grammar G constructed
FOLLOW(NP2) = {noun, verb, adjective, by applying the following rules:
pronoun}
1. For each production of the form A → α of
FOLLOW(NP3) = {noun, verb, adjective, the grammar G for each terminal a in first(α),
pronoun} add A → α to M[A, a].
2. If ε is in first(α) then for each terminal in
FOLLOW(AP) = {noun, verb, adjective,
FOLLOW(A) add A → α to M[A, a].
pronoun, ptrn, modifier, ip, xp, tp, conjunction}
The table shows all the valid entries for the
FOLLOW(AP1) = {noun, verb, adjective,
parse table. Entries not in this table are error en-
tries. All other undefined entries of the parsing
FOLLOW(AP2) = {noun, verb, adjective, table are error entries.
Parse Tree Generation
FOLLOW(VP) = {noun, adjective, pronoun,
modifier, ip, xp, tp, conjunction, $} A parse tree for a grammar G is a tree where the
root is the start symbol for G, the interior nodes
FOLLOW(VP1) = {noun, adjective, pronoun, are the non terminals of G and the leaf nodes are
modifier, ip, xp, tp, conjunction, $} the terminal symbols of G. The children of a node
T (from left to right) correspond to the symbols
FOLLOW(VP2) = {noun, adjective, pronoun, on the right hand side of some production for T
modifier, ip, xp, tp, conjunction, $} in G. Every terminal string generated by a gram-
mar has a corresponding parse tree and every
FOLLOW(VP3) = {noun, adjective, pronoun, valid parse tree represents a string generated by
modifier, ip, xp, tp, conjunction, $} the grammar. We store the parse table M using a
148
Table 4. Predictive parse table for the grammar G
Noun Pronoun Modifier Adjective Verb Conjunction ptrn ip aw xp tp $

S NP VP NP VP NP VP NP VP NP VP NP VP NP VP NP VP
pronoun modifier conjunction
NP noun NP1 AP NP2 aw xp NP1 tp NP1
NP2 noun NP1
pronoun conjunction
NP1 noun NP3 NP Adjective ε ip tp
NP2 VP2
pronoun conjunction
NP2 noun NP1 Adjective ε ip tp
NP3 pronoun
conjunction
NP3 ε ε ε ε aw
AP
adjective
AP
AP1
adjective conjunction
AP1 noun pronoun ε ε ε ε ε ε ε
AP2 AP
AP2 ε ε ε ε ε ε ptrn ε ε ε ε
pronoun verb
VP noun VP1 AP VP3 conjunction
VP4 VP2
noun pronoun adjective verb
VP1
pronoun noun noun VP2
verb
VP2 noun VP3 ε ε AP VP3 ε ptrn ε aw ε ε ε
VP3
adjective verb
VP3 noun VP4 ε ε ε ptrn ε ε ε ε
VP5 VP4
verb
VP4 pronoun AP verb ε ptrn ε ε ε ε
VP2
VP5 noun VP4 pronoun AP1 Verb
two-dimensional array. To read an element from a Initially, the parser is in a configuration with
two-dimensional array, we must identify the sub- s$ in the input buffer and the start symbol S on
script of the corresponding row and then identify top of the stack and then $.
the subscript of the corresponding column. For
example, the production “NP→ modifier noun” Managing Non-Traditional Forms
is in row 2, column 3, (see Table 4) so it is identi-
fied as M[2][3]. The structure of Bangla language may change in
Let us explain the grammar for the sentence a sentence although the meaning does not change.
একটি ছেলে বই পড়ছে. Using the XML data file For example, the sentence আমি ভাত খাই can also
we get the tags “modifier noun noun verb”. Us- be written as আমি খাই ভাত or ভাত আমি খাই.
ing the production S→ NPVP of the grammar The latter two forms are also correct and have the
G the sentence matches to NP noun verb; in the same meaning. Hence it is sometimes difficult to
second iteration the noun verb part matches to detect such reorganized non traditional forms in
noun VP1. The VP1 in turn matches to verb VP4 a single grammar. The proposed grammar can
and VP4 produces ε. Table 5 shows the moves of detect the the non traditional forms. For example,
our implementation using a stack. the grammar G detects the forms “আমি খাই ভাত”
and “ভাত আমি খাই” are shown in Table 6 and
149
Table 5. Moves made by a Bangla parser on in- Table 7. Moves made by a Bangla parser on input
put “modifier noun noun verb” for correct sen- “noun pronoun verb” for correct sentence “ভাত
tence আমি খাই”
Stack Input Action Stack Input Action

modifier noun noun noun pronoun
$S $S
verb $ verb $
modifier noun noun noun pronoun
$ VP NP S->NP VP $ VP NP S->NP VP
verb $ verb $
$ VP noun modifier noun noun NP-> modifier noun pronoun
$ VP NP1 noun NP->noun NP1
modifier verb $ noun verb $
$ VP noun noun noun verb $ Poped $ VP NP1 pronoun verb $ Poped
$ VP noun verb $ Poped $ VP pronoun verb $ NP1->zero
$ VP1 noun noun verb $ VP-> noun VP1 $ verb AP VP->pronoun AP
pronoun verb $
pronoun verb
$ VP1 verb $ Poped
$ verb AP verb $ Poped
$ VP2 verb verb $ VP1-> verb VP2
$ verb verb $ AP->zero
$ VP2 $ Poped
$ $ Poped
$ $ VP2-> ε
Sentence is
Sentence is $ $
$ $ accepted
accepted
Table 6. Moves made by a Bangla parser on input Figure 2. Parse tree for Bangla parser on input
“pronoun verb noun” for correct sentence “আমি “modifier noun noun verb”
খাই ভাত”
Stack Input Action

pronoun verb
$S
noun $
pronoun verb
$ VP NP S->NP VP
noun $
$ VP NP2 pronoun verb
NP->pronoun NP2
pronoun noun $
$ VP NP2 verb noun $ poped
$ VP verb noun $ NP2->zero
$ VP3 verb verb noun $ VP->verb VP3
Figure 3. Parse tree for Bangla parser on input
“pronoun verb noun”
$ VP3 noun $ poped
$ noun noun $ VP3->noun
$ $ poped
Sentence is
$ $
accepted
150
Figure 4. Parse tree for Bangla parser on input

Table 7 respectively. Figure 3 and Figure 4 show
“noun pronoun verb”
the corresponding parse tree. Table 8 shows the
moves for an incorrect sentence “রহিম বিদ্যালয়ে”
of the form “noun noun”. The sentence is rejected
because there is no verb in the sentence and such
types of productions are not presented in the gram-
mar G (see Algorithm 1).
Algorithm 1 is for parsing which uses the parse
table M (Table 4) to produce a Bangla parser for
the input of a Bangla sentence. Figure 2 shows
the parse tree generated by the grammar G for the
input একটি ছেলে বই পড়ছে of the form modifier
noun noun verb.
Table 8. Moves made by a Bangla parser on input

“noun noun” for incorrect sentence PERFORMANCE ANALYSIS OF
DESIGNED BANGLA PARSER
Stack Input Action
$S noun noun $ Some input sentences are considered to be used
$ VP NP noun noun $ S->NP VP for performance analysis of the designed parser.
$ VP NP1 noun noun noun $ NP->noun NP1 There are three types of sentences used for this
$ VP NP1 noun $ Popped purpose:
$ VP NP3 noun noun $ NP1->noun NP3
$ VP NP3 $ Popped 1. Simple and traditional form
Sentence is rejected 2. Some non-traditional form
3. Paragraphs
Algorithm 1. Parsing with table M
1. Set Input Pointer(IP) to point to the first word of w;

2. Set X to the top stack word;
3. While (X! = $)
Begin /* stack is not empty */
a. If X is a terminal, pop the stack and advance IP;
b. If X is a Non terminal and M[X,IP] has the production X → Y1Y2…Yk
Output the production X -> YlY2 -Yk;
Pop the stack;
Push Yk, Yk-1,. . ., Yl onto the stack, with Yl on top;
c. If X=$, Sentence is Accepted.
End
151
The paragraphs are collected from newspaper. FUTURE RESEARCH DIRECTIONS

By non-traditional form we mean the same mean-
ing of another sentence having structural similarity. In this chapter, a context free grammar for Bangla
For example, “আমি ভাত খাই” and “খাই আমি ভাত” language is described and hence a Bangla parser
sentences are approximately similar but later on based on the grammar is developed. The approach
is rarely used. Table 4 shows the success rate of is very much general to apply in Bangla Sentences
our proposed grammar. and the method is well accepted for parsing a
language of a grammar. The structural representa-
• Input Sentences: tion that has been built can cover the maximum
simple, complex, and compound sentences. But
তিনি কি ভাল না?, আমি ভাত খাই । আমি কি ভাত the sentences composed of idioms and phrases
খাই?, আমি কি ভাত খাই না? “বাহ! পাখিটি ত�ো খুব সুন্দর”। are beyond the scope of this chapter.
শিতে আমরা খুব কষ্ট পাই । ত�োমরা আগামীকাল এস�ো । There are lots of research scopes on the fields
আমি, তু মি ও সে বাড়ি যাই । আগামীকাল কি তু মি আসবে? of Bangla Natural Language Parser. The future
development may include the following.
• Input Paragraphs:
1. More extensive CFGs with the addition of
মাঝরাতে হঠাৎ করেই আপনার অসুখ করল। চিকিৎসক more constituents for parsing complex and
বা হাসপাতাল আশপাশে নেই।ভয় পাওয়ার ক�োন কারণ নেই । compound sentences of Bangla NL more
যদি আপনার হাতের কাছেই ম�োবাইল পান,তবে ডায়াল করুন efficiently.
হেল্প লাইনে । শীঘ্রই আপনি পেয়ে যাবেন দরকারি পরামর্শ। 2. Efficient algorithm for Bangla NL Parser to
আমরাই একমাত্র জাতি যারা ভাষার জন্য যুদ্ধ করেছি solve the problem of ambiguity.
। মায়ের ভাষা বাংলাকে ছিনিয়ে এনেছি । ভাষার এই যুদ্ধ 3. Parser for semantic analysis of Bangla natural
এখানেই বন্ধ হয়নি । বাংলাকে আলাদা করে প্রতিষ্ঠিত করার sentences.
জন্য শুরু হয়েছে প্রযুক্তির ব্যাবহার । সেই যুদ্ধে ঝাঁপিয়ে পড়েছে
মেধাবী জনতা। Fusion of top-down and bottom-up parsing
একসময় বাংলা ভাষা যে হুমকির মুখে ছিল, তার মূলে technique could result in a more robust and pow-
ছিল উপনিবেশিক শাসন।যা ছিল সংখ্যাগরু রু ওপর চাপিয়ে erful parsing technique.
দেওয়া সংখ্যালঘুর শাসন । তার বিরুদ্ধে লড়াই করেছিল বাংলা
ভাষাভাষী বেশির ভাগ মানুষ । কিন্তু একটি ভাষার শক্তি এর
ব্যাবহারকারীদের ম�োট শক্তির সমান । ফলে বাংলা ভাষার CONCLUSION
বিরুদ্ধকারীরা পরাজিত হয়েছে । যেহেতু তারা সংখ্যায় বেশি
ছিল না। It is well established that parsing natural lan-
guage text is much more difficult than strictly
From Table 9, it is clearly viewed that the defined computer languages. One reason is that
parser is about 70-80% correct. grammars for natural languages are often com-
Table 9. Success rate for different Bangla sentences
Total No. of Sentences per Paragraph Acceptance Rate

Types of Sentences Correctly Detected (D)
(I) A=(D/I)*100%
Traditional 120 100 83.33%
Nontraditional 80 58 72.5%
Paragraph 25 18 72%
152
plex, ambiguous, and specified by collections of Anwar, M. M., Anwar, M. Z., & Bhuiyan, M. A.
examples rather than complete formal rules. The A. (2009). Syntax analysis and machine transla-
aim of this Chapter was to Design and Develop tion of Bangla sentences. International Journal
algorithms for Bangla Parser, which can parse of Computer Science and Network Security, 9(8),
Bangla language. Parsing is a process of trans- 317–326.
forming natural language into an internal system
Anwar, M. M., Shume, N. S., & Bhuiyan, M. A.
representation, which can be trees, dependency
A. (2010). Structural analysis of Bangla sentences
graphs, frames or some other SR (Structural Rep-
of different tenses for automatic Bangla machine
resentation). If a natural language is successfully
translator. International Journal of Computer Sci-
parse then grammar checking from this language
ence and Information Security, 8(9).
becomes easy. Parsing a sentence that involves
finding a possible legal structure for sentence Dasgupta, S., & Ng, V. (2006). Unsupervised
and finally gets an SR. An SR generally graphic morphological parsing of Bengali. Language Re-
object and this representation cannot be deals sources and Evaluation, 40, 311–330. doi:10.1007/
with computer. The standard representation of an s10579-007-9031-y.
SR is a list that are one of the data structure that
Faridee, A. Z. M., & Tyers, F. M. (2009). Devel-
can be implemented and manipulated very easily
opment of a morphological analyzer for Bengali.
within a computer. Our proposed CFG rules can
In Proceedings of the First International Work-
be assigned all types of Bangla sentences into an
shop on Free/Open-Source Rule-Based Machine
SR. To represents a complex and a compound
Translation, (pp. 43-50). IEEE.
sentence into an SR, we used the decomposition
technique. Further modify and extending the CFG Ghosh, A., Das, A., Bhaskar, P., & Bandyopadhyay,
rules (with the addition of sentences consisting S. (2010). Bengali parsing system. Paper presented
of idioms and phrases, double word, change of at ICON NLP Tool Contest 2010.
voice, and narration), we can able to represents
Haque, M. N., & Khan, M. (2005). Parsing Bangla
the all kinds of Bangla sentences which are used
using LFG: An introduction. BRAC University
as an input of an MT engine to produced other
Journal, 2(1), 105–110.
equivalent sentences.
Hasan, K. M. A., Mahmud, A., Mondal, A., &
Saha, A. (2011). Recognizing Bangla grammar
REFERENCES using predictive parser. International Journal of
Computer Science & Information Technology,
Aho, A. V., Sethi, R., & Ullman, J. D. (2002). 3(6). doi: doi:10.5121/ijcsit.2011.3605.
Compilers principles, techniques and tools. New
York: Pearson Education. Hasan, K. M. A., Mondal, A., & Saha, A. (2010).
A context free grammar and its predictive parser
Ali, M. N. Y., Ripon, S., & Allayear, S. M. (2012). for Bangla grammar recognition. In Proceedings
UNL based Bangla natural text conversion: of International Conference on Computer and
Predicate preserving parser approach. Interna- Information Technology, (pp. 87 – 91). IEEE.
tional Journal of Computer Science Issues, 9(3),
259–265. Hoque, M. M., & Ali, M. M. (2003). A parsing
methodology for Bangla natural language sen-
tences. In Proceedings of International Confer-
ence on Computer and Information Technology,
(277-282). IEEE.
153
Hoque, M. M., & Ali, M. M. (2004). Context- Yarowsky, D. (1995). Unsupervised word sense
sensitive phrase structure rule for structural rep- disambiguation rivaling supervised methods. In
resentation of Bangla natural language sentences. Proceedings of 33rd Annual Meeting of the ACL,
In Proceedings of International Conference on (pp. 189-196). ACL.
Computer and Information Technology, (pp.
615-620). IEEE.
Hoque, M. M., Rahman, M. J., & Dhar, P. K. KEY TERMS AND DEFINITIONS
(2007). Lexical semantics: A New approach to ana-
lyze the Bangla sentence with semantic features. Bottom-Up Parsing: Construction starts at
In Proceedings of the International Conference the leaves and proceeds towards the root, i.e.
on Information and Communication Technology, this method works from the sentence to sentence
(pp. 87 – 91). IEEE. symbol.
Context-Free Grammar (CFG): A set of
Joshi, S. (n.d.). Selection of grammatical and logi- recursive rewriting rules called productions used
cal functions in Marathi. (PhD Thesis). Stanford to generate string patterns.
University. Palo Alto, CA. FIRST(): FIRST(α) be the set of terminals that
Mehedy, L., Arifin, N., & Kaykobad, M. (2003). begin the strings derived from α. If α → ε then ε
Bangla syntax analysis: A comprehensive ap- is also included in FIRST(α).
proach. In Proceedings of International Confer- FOLLOW(): FOLLOW(A) of a non terminal
ence on Computer and Information Technology A is the set of terminals a that can appear im-
(ICCIT), (pp. 287-293). ICCIT. mediately to the right of A. If A is the right most
symbol in the sentential form then $ is added to
Murshed, M. M. (1998). Parsing of Bengali natural FOLLOW(A).
language sentences. In Proceedings of Interna- Parse Tree: A tree where the root of the tree
tional Conference on Computer and Information is the start symbol for the grammar, the interior
Technology (ICCIT), (pp. 185-189). ICCIT. nodes are the non-terminals of the grammar, the
Saha, G. K. (2006). Parsing Bengali text: An leaf nodes are the terminals of the grammar and
intelligent approach. ACM Ubiquity, 7(13), 1–5. the children of a node starting from the left to
doi:10.1145/1132512.1127026. the right correspond to the symbols on the right
hand side of some production for the node in the
Selim, M. R., & Iqbal, M. Z. (1999). Syntax analy- grammar.
sis of phrases and different types of sentences in Parser: A parser analyzes the sequence of
Bangla. In Proceedings of International Confer- symbols presented to it based on the grammar.
ence on Computer and Information Technology Parsing: A method where a parser algorithm
(ICCIT), (pp. 175-186). ICCIT. is used to determine whether a given input string
Sengupta, P., & Chaudhuri, B. B. (1997). A delayed is grammatically acceptable or not for a particular
syntactic-encoding-based LFG parsing strategy language.
for an Indian language – Bangla. Computational Top-Down Parsing: In top-down parsing
Linguistics, 23(2), 345–351. method, construction starts at the root and proceeds
towards the leaves i.e., this method works from
sentence symbol to the sentence. Parsers, which
are designed using Top-down parsing method, are
known as top-down parser.
154
View publication stats

Parsing Bangla Grammar Using Context Free Grammar

Uploaded by

Copyright:

Available Formats

Parsing Bangla Grammar Using Context Free Grammar

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Parsing Bangla Grammar Using Context Free Grammar

Uploaded by

Copyright:

Available Formats

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

Parsing Bangla Grammar Using Context Free Grammar

Chapter · January 2014

Bishnu Sarker K. M. Azharul Hasan

SEE PROFILE SEE PROFILE

EM-DMKM View project

Natural Language Processing View project

The user has requested enhancement of the downloaded file.

INTRODUCTION way of expressing the expressions of feelings

আমি + খাই + ভাত. RELATED WORKS

A Parsing Scheme for Bangla <tag_name>value</tag_name>

Figure 1. XML word repository

Each sentence is to be partitioned into its con-

Grammar Design with CFG

Once constituents have been identified, the

Tag Name (Symbol) Examples S −> NP VP

Table 2. Defining constituents

No. Constituents Symbol Definitions Examples

Noun phrase may be consists of only noun or pronoun. It রহিম,

Left Factoring AP −> adjective AP1

A → αA′ VP5 −> verb| noun VP4| pronoun AP1

Designing Bangla Parser FIRST(NP) = {noun, pronoun, modifier,

first(β))⇒ everything in follow(A) is in FOLLOW(VP4) = {noun, adjective, pronoun,

According to the rules of computing FOLLOW(VP5) = {noun, adjective, pronoun,

Table 4. Predictive parse table for the grammar G

Noun Pronoun Modifier Adjective Verb Conjunction ptrn ip aw xp tp $

Stack Input Action Stack Input Action

Stack Input Action

Figure 4. Parse tree for Bangla parser on input

Table 8. Moves made by a Bangla parser on input

Algorithm 1. Parsing with table M

1. Set Input Pointer(IP) to point to the first word of w;

The paragraphs are collected from newspaper. FUTURE RESEARCH DIRECTIONS

Table 9. Success rate for different Bangla sentences

Total No. of Sentences per Paragraph Acceptance Rate

View publication stats

You might also like