Papers by Neha Prabhugaonkar
Proceedings of the AAAI Conference on Artificial Intelligence
WordNets are useful resources for natural language processing. Various WordNets for different languages have been developed by different groups. Recently, the World WordNet Database Structure (WWDS) was proposed by Redkar et al. (2015) as a common platform to store these different WordNets. However, it is underutilized due to the lack of a programming interface. In this paper, we present WWDS APIs, which are designed to address this shortcoming. These WWDS APIs, in conjunction with WWDS, act as a wrapper that enables developers to utilize WordNets without worrying about the underlying storage structure. The APIs are developed in PHP, Java, and Python, as they are the preferred programming languages of most developers and researchers working in language technologies. These APIs can help in various applications like machine translation, word sense disambiguation, multilingual information retrieval, etc.
The IndoWordNet Consortium consists of member institutions developing WordNets using the expansion approach. WordNets developed using the expansion approach are strongly influenced by the source language and may not reflect the richness of the target language (Walawalikar et al., 2010). The IndoWordNet community therefore decided to develop concepts specific to their respective languages, i.e. language-specific concepts, which will help increase WordNet coverage. It was also felt that it should be possible to maintain additional information about a concept, e.g. an image, a document describing the concept, or links to websites and other resources. In this paper, we discuss the Concept Space Synset Management Tool (CSS), which was developed to assist the creation of language-specific concepts/synsets and to manage their linkages to other Indian language WordNets.
Advances in Intelligent Systems and Computing, 2018
We propose the demonstration of a key component of our AI-based business analyst: a question interpreter that takes as input a question and displays its meaning in terms of a semantic interpretation. The semantic interpretation captures linguistic properties, entities and metrics in the question. The question interpreter is an application of well-studied Natural Language Processing (NLP) techniques such as syntactic parsing, Part-of-Speech (POS) tagging, etc. The question interpreter is the first step of an AI-based business analyst that allows users to ask natural language questions on enterprise data.
WordNet is a crucial resource that aids in several Natural Language Processing (NLP) tasks. WordNet development for 18 Indian languages has been initiated in India by the IndoWordNet consortium using the expansion approach, with the Hindi WordNet developed by IIT Bombay as the source. After linking 20K synsets, it was decided that each language group should measure the coverage of its WordNet using the sense marker tool released by IIT Bombay. The sense marking activity mainly helped in validating the WordNet and improving its coverage. In this paper, the various effects that the sense marking activity had on Konkani WordNet development are presented.
Work is currently under way to develop WordNets for various Indian languages. The IndoWordNet Consortium consists of member institutions, each developing the WordNet for its own language using the expansion approach. Many tools and utilities have been developed by various institutes to help in this process. In this paper, we discuss an object-oriented Application Programming Interface (API) that we have implemented to facilitate easy and rapid development of tools and other software resources that require WordNet access and manipulation functionality. The main objective of the IndoWordNet Application Programming Interface (IWAPI) is to provide access to the WordNet resource independent of the underlying storage technology. The current implementation manipulates data stored in a relational database. Furthermore, the IWAPI also supports parallel access and manipulation of WordNets in multiple languages. In this paper, we discuss functional requirements, design and the implementation of IndoWordNet API ...
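The idea of WordNet access that is independent of the storage technology can be illustrated with a minimal Python sketch. It is not the actual IWAPI: the names `Synset`, `SynsetStore`, `InMemoryStore`, `WordNetAccess` and `parallel_lookup`, the language codes, and the placeholder entries are all assumptions made for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Synset:
    """Hypothetical synset record; field names are illustrative only."""
    synset_id: int
    language: str
    words: Tuple[str, ...]


class SynsetStore(ABC):
    """Hypothetical storage interface that callers depend on."""

    @abstractmethod
    def get(self, language: str, synset_id: int) -> Optional[Synset]:
        """Return the synset for this language and id, or None if absent."""


class InMemoryStore(SynsetStore):
    """Stand-in backend; a relational backend would implement the same method."""

    def __init__(self, synsets: Iterable[Synset]) -> None:
        self._by_key = {(s.language, s.synset_id): s for s in synsets}

    def get(self, language: str, synset_id: int) -> Optional[Synset]:
        return self._by_key.get((language, synset_id))


class WordNetAccess:
    """Facade used by tools; it only ever talks to the SynsetStore interface."""

    def __init__(self, store: SynsetStore) -> None:
        self._store = store

    def parallel_lookup(self, synset_id: int, languages: Iterable[str]) -> Dict[str, Synset]:
        """Fetch the same synset id from several language WordNets."""
        found = {}
        for language in languages:
            synset = self._store.get(language, synset_id)
            if synset is not None:
                found[language] = synset
        return found


# Placeholder data, not real WordNet entries.
demo_store = InMemoryStore([
    Synset(1, "hin", ("placeholder_hin",)),
    Synset(1, "kok", ("placeholder_kok",)),
])
demo_api = WordNetAccess(demo_store)
demo_result = demo_api.parallel_lookup(1, ["hin", "kok", "mar"])
```

Because `WordNetAccess` receives its store through the constructor, swapping the in-memory backend for a database-backed one would not require changes in calling code, which is the property the abstract describes.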
Research in Computing Science, 2015
Most words in natural languages are polysemous, that is, they have multiple possible meanings or senses. The sense in which a word is used determines its translation. We show that incorporating a sense-based translation model into a statistical machine translation model consistently improves translation quality across all test sets of five different language pairs, according to all eight of the most commonly used evaluation metrics. This paper investigates how to initiate research in word sense disambiguation and statistical machine translation for under-resourced languages by applying Word Sense Induction.
The WordNets for many official Indian languages are being developed by the members of the IndoWordNet Consortium in India. It was decided that all these WordNets be made open for public use and feedback to further improve their quality and usage. Hence each member of the IndoWordNet Consortium had to develop its own website and deploy its WordNet online. In this paper, the Content Management System (CMS) based approach used to speed up WordNet website development and deployment is presented. The CMS approach is database driven and dynamically creates the websites with minimum input and effort from the website creator. This approach has been successfully used to deploy WordNet websites with a user-friendly interface and all desired functionalities in a very short time for many Indian languages.
We present our work on developing fifteen Hierarchical Phrase Based Statistical Machine Translation (HPB-SMT) systems for five Indian language pairs, namely Bengali-Hindi, English-Hindi, Marathi-Hindi, Tamil-Hindi, and Telugu-Hindi, in three domains each: HEALTH, TOURISM and GENERAL. We named them PanchBhoota, as these systems are elemental in nature. We used a very simple approach to train, tune, and test them using the cdec toolkit. We hope that this work will motivate Indian Language Machine Translation researchers to look deeper into the field of HPB-SMT, which is known to perform better than Phrase Based Statistical Machine Translation.
Natural Language Processing, 2021
Business users across enterprises today rely on reports and dashboards created by IT organizations to understand the dynamics of their business better and get insights into the data. In many cases, these users are underserved and do not possess the technical skillset to query the data source to get the information they need. There is a need for users to access information in the most natural way possible. AI-based Business Analysts are going to change the future of business analytics and business intelligence by providing a natural language interface between the user and data. This natural language interface can understand ambiguous questions from users, identify their intent, and convert them into a database query. One of the important elements of an AI-based business analyst is interpreting a natural language question. This also requires identifying the key business entities within the question and the relationships between them to generate insights. The Artificial Named Entity Classifier (ANE...
WordNet is a crucial resource that aids in Natural Language Processing (NLP) tasks such as Machine Translation, Information Retrieval, Word Sense Disambiguation, Multilingual Dictionary creation, etc. The IndoWordNet is a multilingual WordNet which links WordNets of different Indian languages on a common identification number given to each concept. WordNet is designed to capture the vocabulary of a language and can be considered as a dictionary cum thesaurus and much more. WordNets for some Indian languages are being developed using the expansion approach. In this paper, we discuss the details of, and our experiences during, the evolution of this database design while working on the Indradhanush WordNet Project. The Indradhanush WordNet Project is working on the development of WordNets for seven Indian languages. Our database design gives an efficient plan for storing WordNet data for all languages. In addition, it extends the design to hold concepts specific to a language.
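Linking WordNets on a common concept identifier, with room for language-specific concepts, can be pictured with a small relational sketch. The schema below is an assumption made for illustration, not the Indradhanush design: the table names `concept` and `synset`, the `specific_to` column, the language codes, and the placeholder rows are all hypothetical. It uses Python's standard `sqlite3` module.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- One row per concept; specific_to is NULL for shared concepts and
        -- holds a language code for language-specific ones.
        CREATE TABLE concept (
            concept_id  INTEGER PRIMARY KEY,
            specific_to TEXT
        );
        -- One row per (concept, language); the shared concept_id is the link.
        CREATE TABLE synset (
            concept_id INTEGER NOT NULL REFERENCES concept(concept_id),
            language   TEXT NOT NULL,
            words      TEXT NOT NULL,
            PRIMARY KEY (concept_id, language)
        );
        """
    )
    # Placeholder rows, not real WordNet data.
    conn.executemany(
        "INSERT INTO concept (concept_id, specific_to) VALUES (?, ?)",
        [(1, None), (2, "kok")],
    )
    conn.executemany(
        "INSERT INTO synset (concept_id, language, words) VALUES (?, ?, ?)",
        [
            (1, "hin", "placeholder_hin"),
            (1, "kok", "placeholder_kok"),
            (2, "kok", "placeholder_kok_only"),
        ],
    )
    return conn


def linked_synsets(conn: sqlite3.Connection, concept_id: int) -> dict:
    """Return {language: words} for every synset sharing this concept id."""
    rows = conn.execute(
        "SELECT language, words FROM synset WHERE concept_id = ? ORDER BY language",
        (concept_id,),
    ).fetchall()
    return dict(rows)


demo_conn = build_demo_db()
shared = linked_synsets(demo_conn, 1)
language_specific = linked_synsets(demo_conn, 2)
```

In this sketch a shared concept resolves to one synset per language, while a language-specific concept resolves only to the language that introduced it.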