Final Research Paper
Sentences
¹Harshal Kolhe, BE-Computer, VIIT Pune
²Sanket Bhamre, BE-Computer, VIIT Pune
³Yogesh Dahake, BE-Computer, VIIT Pune
⁴Prathamesh Bamane, BE-Computer, VIIT Pune
Abstract-Most human activity is carried out through language, whether communicated directly or reported in natural
language. As the technologies and devices through which we communicate advance, the need for machines to understand
language grows. A tremendous amount of work has been carried out for languages such as English, while research on Indian
languages, including the computational linguistics of Marathi, is still ongoing. By combining Artificial Intelligence,
Computational Linguistics and Computer Science, Natural Language Processing helps machines read, interpret and
understand text by imitating the human ability to understand language. Beginning with a rule based system, we detect the
type of a sentence entered in Marathi: Simple, Compound or Complex.
Key words- Natural Language Processing, Rule based system, Text pre-processing, Rule Based Classification, if-then Rules
I. INTRODUCTION
A rule based system consists of a set of if-then rules, which can support decision-making in real applications. The
main idea behind this project is to build a system that can detect the type of a sentence in the Marathi language.
The type of the sentence can be Simple, Compound or Complex.
A lot of work has already been carried out for various foreign languages such as English, Arabic, French, Mandarin
and Urdu. For these languages, applications such as text summarization, grammatical error detection, chatbots and
machine translation have already been explored, and further advancements are being researched. A lot of work
remains to be done for the Indian languages, however.
Marathi is a language spoken in India, predominantly by the native people of Maharashtra, and research on it is
still in progress. Existing machine translation for Marathi works only for simple sentences, and most machine
translation work is only partially complete. Other applications, such as text summarization, are yet to be fully
explored.
Hence, a considerable amount of work is still needed to apply NLP to the Marathi language. These gaps led us to
develop a system that detects the type of a sentence, which should be useful to linguistics researchers and NLP
researchers.
II. LITERATURE SURVEY
As discussed earlier, a lot of work has already been carried out for English as a natural language. Applications
such as Swelly (a Facebook Messenger chatbot), eBay (an e-commerce chatbot), Lyft (the biggest Uber competitor) and
Yes Sire (an Alexa game) show the advancements in Natural Language Processing for the English language. We now
address the problems and flaws encountered while developing such intelligent systems for local languages like
Marathi.
Morphological Disambiguator for Marathi using NLP, Arti P. Khadtare, Dr. Suhas Raut and M. S. Otari
Marathi is the official language of the state of Maharashtra (India) and is one of the 23 official languages of India.
The goal of Natural Language Processing (NLP) is to build computational models of natural language for its analysis
and generation, enabling intelligent computer systems such as machine translation systems, man-machine interfaces to
computers in general, speech understanding systems, and text analysis and understanding systems. Their work showed
that morphological analysis supports this kind of knowledge-based extraction from language.
The model was built in the following steps:
• Input Marathi text
• Transliteration
• Tokenization
• Morphological Analysis of words
• Finding English matching word
• Output English sentence
This research paper explained how morphological analysis can be the most important phase of machine translation.
Take the example पूजा शाळेत जा. Here the word पूजा (Pooja) is ambiguous: it could mean "worship", or it could be a
proper noun, a girl's name.
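The ambiguity in this example can be made concrete with a small sketch. It is not the cited authors' implementation: the `LEXICON` dictionary, its tags and the function names are hypothetical and exist only to show how a dictionary lookup can return more than one analysis for a single word, which is the case a disambiguator has to resolve.

```python
# Hypothetical mini-lexicon (illustrative assumption, not from the cited paper):
# each surface form maps to every analysis it could have.
LEXICON = {
    "पूजा": [("noun", "worship"), ("proper noun", "Pooja")],
    "शाळेत": [("noun, locative", "in school")],
    "जा": [("verb, imperative", "go")],
}


def analyse(sentence):
    """Tokenize on whitespace and return all candidate analyses per token."""
    return [(token, LEXICON.get(token, [])) for token in sentence.split()]


def ambiguous_tokens(sentence):
    """Return the tokens that have more than one candidate analysis."""
    return [token for token, analyses in analyse(sentence) if len(analyses) > 1]


if __name__ == "__main__":
    for token, analyses in analyse("पूजा शाळेत जा"):
        print(token, analyses)
    print("ambiguous:", ambiguous_tokens("पूजा शाळेत जा"))
```

Only पूजा is reported as ambiguous here, because it is the only entry with two analyses in this toy lexicon.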
In this system, a web application is developed that returns the type of the sentence entered by the user. The front
end is written in HTML and CSS; the back end uses Python and PHP on a Tomcat server. The system has two web pages.
The first is the user side, where the user enters a sentence and the predicted type of the sentence is returned.
When the user enters a sentence in English, the system transliterates it into Marathi using a predefined library.
The second web page manages the database, which includes insertion, deletion and update of verbs and conjunctions.
The system uses PHP for three main purposes: first, to get the sentence from the web page; second, to run the
Python file; third, to post the type of the sentence back to the user side. The Python file implements the rule
based approach as a collection of if-then rules. The output is one of simple, compound or complex. Verbs and
conjunctions are the two data sets in the system.
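A minimal sketch of how the two data sets could be stored and managed follows. The text does not name the database engine or schema, so everything here is an assumption for illustration: SQLite (via Python's standard `sqlite3` module), the table names `verbs` and `conjunctions`, the `kind` column and all function names are hypothetical.

```python
import sqlite3


def open_lexicon():
    """Create an in-memory database with one table per data set (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE verbs (word TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE conjunctions (word TEXT PRIMARY KEY, kind TEXT NOT NULL)"
    )
    return conn


def add_verb(conn, word):
    """Insert a verb; inserting the same verb twice is a no-op."""
    conn.execute("INSERT OR IGNORE INTO verbs (word) VALUES (?)", (word,))


def delete_verb(conn, word):
    """Delete a verb if it is present."""
    conn.execute("DELETE FROM verbs WHERE word = ?", (word,))


def set_conjunction(conn, word, kind):
    """Insert a connecting word, or update its kind if it already exists."""
    conn.execute(
        "INSERT OR REPLACE INTO conjunctions (word, kind) VALUES (?, ?)",
        (word, kind),
    )


def load_verbs(conn):
    """Return all stored verbs as a set for fast membership tests."""
    return {row[0] for row in conn.execute("SELECT word FROM verbs")}


def load_conjunctions(conn):
    """Return a mapping from connecting word to its kind."""
    return dict(conn.execute("SELECT word, kind FROM conjunctions"))
```

Loading both tables into in-memory sets and dictionaries once per request keeps the rule checks themselves free of database calls.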
To categorize a sentence, the system first checks for conjunctions (words that are used to connect two sentences).
If one is present, the sentence is divided into sub-sentences at the conjunctions. If only one of the sub-sentences
contains a verb (all verbs are stored in a database), the sentence is categorized as simple. If more than one
sub-sentence contains a verb, the sentence is categorized as compound or complex on the basis of the type of
conjunction. If the sentence contains no conjunction, the system categorizes it as simple.
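The rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the system's actual code: the word lists are tiny hypothetical samples standing in for the database, and the mapping of coordinating connecting words to compound and subordinating ones to complex is an assumption, since the text only says the decision depends on the type of connecting word.

```python
import re

# Hypothetical sample data (assumption); the real system reads these from a database.
CONJUNCTIONS = {
    "आणि": "coordinating",    # and
    "पण": "coordinating",     # but
    "किंवा": "coordinating",   # or
    "कारण": "subordinating",  # because
    "जर": "subordinating",    # if
}
VERBS = {"आहे", "जातो", "जातात", "गेला", "वाचतो"}


def split_on_conjunctions(tokens):
    """Split tokens into sub-sentences at connecting words; also return the words found."""
    clauses, found, current = [], [], []
    for token in tokens:
        if token in CONJUNCTIONS:
            found.append(token)
            if current:
                clauses.append(current)
            current = []
        else:
            current.append(token)
    if current:
        clauses.append(current)
    return clauses, found


def classify(sentence):
    """Return 'simple', 'compound' or 'complex' using the if-then rules."""
    # Tokenize on whitespace and strip common punctuation, including the danda.
    tokens = re.findall(r"[^\s।.,!?]+", sentence)
    clauses, found = split_on_conjunctions(tokens)
    if not found:
        return "simple"  # rule: no connecting word
    with_verb = sum(1 for clause in clauses if any(t in VERBS for t in clause))
    if with_verb <= 1:
        return "simple"  # rule: at most one sub-sentence has a verb
    # Assumed mapping: any subordinating word makes the sentence complex.
    if any(CONJUNCTIONS[word] == "subordinating" for word in found):
        return "complex"
    return "compound"


if __name__ == "__main__":
    for s in [
        "राम शाळेत जातो",
        "राम आणि श्याम शाळेत जातात",
        "राम शाळेत जातो आणि श्याम पुस्तक वाचतो",
        "राम घरी गेला कारण तो आजारी आहे",
    ]:
        print(s, "->", classify(s))
```

The second example shows why the verb check matters: आणि joins two nouns rather than two clauses, so only one sub-sentence contains a verb and the sentence stays simple.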
IV. ARCHITECTURE
V. CONCLUSION
A sentence is not just a series of words; it is a series of words that communicates a thought and a purpose. It is a
basic building block of communication. Each type of sentence conveys its own thought and purpose, so identifying the
type of a sentence is useful in the NLP domain and in real life. NLP provides a wide set of tools which, once learned,
can be applied in everyday interactions across all areas of life. This work could be helpful in research and in the
education domain.