Concept Space Synset Manager Tool
Apurva S Nagvenkar
DCST, Goa University
Taleigao Plateau, Goa.
Neha R Prabhugaonkar
DCST, Goa University
Taleigao Plateau, Goa.
Venkatesh P Prabhu
Thyway Creation
Mapusa, Goa.
Ramdas N Karmali
DCST, Goa University
Taleigao Plateau, Goa.
Jyoti D Pawar
DCST, Goa University
Taleigao Plateau, Goa.
Abstract
The IndoWordNet[1] Consortium consists of member institutions
developing WordNets using the expansion approach. WordNets developed
using the expansion approach are strongly influenced by the source
language and may not reflect the richness of the target language
(Walawalikar et al., 2010). The IndoWordNet community therefore
decided to develop concepts specific to their respective languages,
i.e. language-specific concepts, which will help increase WordNet
coverage. It was also felt that it should be possible to maintain
additional information about a concept, such as an image, a document
describing the concept, or links to websites and other resources.
In this paper, we discuss the Concept Space Synset Management Tool
(CSS)[2], which was developed to assist the creation of
language-specific concepts/synsets and to manage their linkages to
other Indian language WordNets.
1
Background and Motivation
The IndoWordNet is a multilingual WordNet which links the WordNets of
different Indian languages through a common identification number,
called the synset Id, given to each concept (Bhattacharyya, 2010). A
WordNet is designed to capture the vocabulary of a language and can be
considered a dictionary cum thesaurus and much more (Miller et al.,
1993; Miller, 1995; Fellbaum, 1998).
A synset (Fellbaum, 1998) is composed of a gloss describing the
concept, example sentences, and a set of synonymous words used for the
concept. Besides synset data, a WordNet maintains many lexical and
semantic relations. Table 1 gives the number of concepts/synsets
created by the language groups of the Indradhanush WordNet Consortium,
which is a part of the IndoWordNet Consortium.
Table 1: Synset linkage status
[1] http://www.cfilt.iitb.ac.in/indowordnet
[2] http://indradhanush.unigoa.ac.in/conceptspace
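The synset structure and the linking of WordNets through a common synset Id, as described above, can be illustrated with a minimal Python sketch. It is not taken from the IndoWordNet or CSS code base: the class name `Synset`, the helpers `link_by_id` and `missing_languages`, the language labels, and all sample values are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Synset:
    """One concept as expressed in a single language WordNet."""
    synset_id: int    # common identification number shared across languages
    language: str     # hypothetical language label
    gloss: str        # text describing the concept
    examples: list[str] = field(default_factory=list)   # example sentences
    synonyms: list[str] = field(default_factory=list)   # words used for the concept


def link_by_id(synsets):
    """Group synsets from different languages that share a synset Id."""
    linked = {}
    for s in synsets:
        linked.setdefault(s.synset_id, {})[s.language] = s
    return linked


def missing_languages(linked, languages):
    """For each synset Id, list the languages that have no synset for it yet."""
    return {
        sid: sorted(set(languages) - set(by_lang))
        for sid, by_lang in linked.items()
    }


# Placeholder data only; the Ids, glosses and words are not real entries.
synsets = [
    Synset(101, "hindi", "placeholder gloss", synonyms=["hi_word"]),
    Synset(101, "konkani", "placeholder gloss", synonyms=["kok_word"]),
    Synset(202, "konkani", "a language-specific concept", synonyms=["kok_only"]),
]
linked = link_by_id(synsets)
print(missing_languages(linked, ["hindi", "konkani"]))
# {101: [], 202: ['hindi']}
```

In this sketch a language-specific concept shows up as a synset Id that exists in only one language, which is the kind of gap that linking across WordNets is meant to expose.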
To increase the coverage of the WordNet, it was decided that each
language group would create a corpus and sense mark it (sense marking
is the task of tagging each word of the corpus with its WordNet
sense). A sense-marked newspaper corpus of at least 1,00,000 words has
been created by each member of the Indradhanush WordNet Consortium.
The coverage is found to be low.
To increase the coverage, it was decided to add concepts specific to
each language, i.e. language-specific concepts, and thereby offset the
influence of the source language on the target language WordNet. The
CSS Manager Tool[3] was developed to assist in the creation of
language-specific concepts, linking them to other language WordNets,
providing additional information about synsets, etc. The features and
the detailed framework of the CSS Manager Tool are explained in
sections 3 and 4.
The rest of the paper is organized as follows: section 2 introduces
the related work. The features of the CSS Manager Tool are presented
in section 3; section 4 presents its architecture. Section 5 presents
the implementation details, followed by the conclusion and future
work.
2
Related Work
For many Indian languages, WordNets are constructed using the
expansion model, in which Hindi WordNet synsets are taken as the
source, using the MultiDict Tool (Chatterjee et al., 2010) created by
IIT Bombay. The tool also had a feature to add comments and
references, but it was not an ideal tool for the creation of
language-specific synsets.
The limitations of the MultiDict Tool are:
• Creating and linking language-specific synsets across languages was
  not possible.
• Finding the overlap of synsets across languages was not possible.
• There was no feature to provide additional information about a
  synset.
• Validation of synsets was not possible.
• There were no features to search synsets by domain, date, or
  category.
The CSS Manager Tool was therefore developed to overcome these
limitations.
[3] https://www.youtube.com/watch?v=BMhixBI7xOY&feature=youtu.be
3
Features of CSS Tool
The CSS Manager Tool is a centralized tool meant for the effective
creation and management of synsets. The features currently supported
by the CSS Manager Tool are as follows:
1. Synset Creation:
• Addition, updating, and validation of synsets.
• Linking of two or more synsets with similar gloss across languages.
• Comments: comments can be provided in case of any issue in the
  synset content.
• Adding additional information about the synset (images, documents,
  links, etc.).
2. Interactive User Interface:
• The CSS Manager Tool is designed keeping in mind the broadest range
  of users and contexts of use.
• Supports both left-to-right and right-to-left text rendition.
• Allows adjustment of the layout to the direction in which the
  content language is written, through a simple setting of a flag.
• Viewing of the various media added for clarity on synsets, etc.
3. Security:
• The CSS Manager Tool stores information in a centralized database
  system, where access control mechanisms can more easily restrict
  access to the content.
• User Management supports adding, blocking, and unblocking users, and
  assigning privileges to users.
4. Use of the RBAC approach:
• Role-based access control (RBAC) is an approach to restricting
  system access to authorized users.
• Roles are created for various functions. The permissions to perform
  certain operations are assigned to specific roles.
• Members or staff are assigned particular roles, and through those
  role assignments acquire the permissions to perform particular
  functions.
• Roles can be easily created, changed, or discontinued as needs
  evolve, without having to individually update the privileges of
  every user.
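The RBAC idea listed above (permissions attached to roles, users attached to roles) can be sketched in a few lines of Python. This is an illustrative sketch only and not the tool's own code: the class `AccessControl`, its method names, the role name, the user names, and the operation names are all assumptions made for the example.

```python
class AccessControl:
    """Minimal role-based access control: users -> roles -> permissions."""

    def __init__(self):
        self.role_permissions = {}   # role name -> set of permitted operations
        self.user_roles = {}         # user name -> set of role names

    def define_role(self, role, permissions):
        """Create a role, or replace the permissions of an existing one."""
        self.role_permissions[role] = set(permissions)

    def remove_role(self, role):
        """Discontinue a role; users lose whatever they held only through it."""
        self.role_permissions.pop(role, None)
        for roles in self.user_roles.values():
            roles.discard(role)

    def assign(self, user, role):
        """Give a user an existing role."""
        if role not in self.role_permissions:
            raise KeyError(f"unknown role: {role}")
        self.user_roles.setdefault(user, set()).add(role)

    def is_allowed(self, user, operation):
        """A user may perform an operation if any assigned role permits it."""
        return any(
            operation in self.role_permissions.get(role, set())
            for role in self.user_roles.get(user, set())
        )


# Hypothetical role, users and operations, for illustration only.
acl = AccessControl()
acl.define_role("linguist", {"view_synset", "add_synset"})
acl.assign("user_a", "linguist")
acl.assign("user_b", "linguist")
print(acl.is_allowed("user_a", "link_synset"))   # False

# Changing the role once updates every user who holds it.
acl.define_role("linguist", {"view_synset", "add_synset", "link_synset"})
print(acl.is_allowed("user_a", "link_synset"))   # True
print(acl.is_allowed("user_b", "link_synset"))   # True
```

Because permissions are looked up through the role at check time, no per-user record has to be touched when a role changes, which is the maintenance benefit the last bullet describes.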
4
Architecture of CSS Tool
Figure 1 represents the architecture of the CSS Manager Tool. The CSS
Manager Tool is implemented in three blocks: the User block, the Super
Admin block, and the Database. The CSS Manager Tool is developed using
a hierarchical role-based access control (RBAC) system to control
access to certain parts and features of the tool across different
users. Refer to Figure 2 for the block diagram of RBAC.
Figure 1: Architecture of CSS Manager Tool
• The User block is responsible for creation, updating, and validation
  of synsets, linking of synsets across languages, and adding
  comments, source, and domain.
• The Super Admin block is responsible for the creation of groups,
  users, roles to be assigned to the members in a group, modules and
  their operations, etc.
• The heart of the CSS Manager Tool is a centralized database that
  stores all the CSS data.
4.1
Modules of CSS Manager Tool
A module is an independent component which offers specific
functionality. Each module is assigned different operations related to
the module. The different operations are: advance search;
add/view/edit/delete/link synsets; add/delete/change priority of
example; add source; upload/delete file; add/view/reply to comments;
etc. Only those operations that need to be performed by members of a
language group are assigned to the modules, and these modules are
allotted to the roles. These modules depend on the CSS database. While
the addition of new modules does not require any changes to the
existing CSS database, new tables may need to be added to store data
specific to module functionality.
Presently there are five modules:
1. View All Synset: Allows the linguist to view synsets belonging to a
   language group, category, domain, or source. The
   linguist/lexicographer can perform the operations which are
   assigned for this module.
2. Synset Creation: Allows the linguist to create synsets. The
   linguist/lexicographer can also add a source, domain, images,
   documents, or links in order to give additional information about
   the synset.
3. View Linked Synset: Allows the linguist to view the list of synsets
   linked across languages.
4. User Management: Allows the administrator of a group to create new
   users, to block/unblock users, to assign privileges to users, etc.
5. Synset Validation: Allows validation of synsets.
4.2
Role-Based system used in CSS Manager Tool
A role hierarchy is a way of organizing roles to reflect authority,
responsibility, and competency. Some general operations, such as
adding, viewing, and searching synsets, may be performed by all group
members. It would be inefficient and administratively cumbersome to
specify these general operations repeatedly for each role that gets
created. A role hierarchy is therefore used to avoid such repetition.
In addition, when a user is associated with a role, the user can be
given additional privileges.
Currently, the CSS Manager Tool has four roles: super admin, admin,
senior linguist, and junior linguist.
• The super admin is responsible for creation of groups, users of a
  group, creation of roles to be assigned to the members in a group,
  addition of new modules and operations, and various other
  administrative operations, such as adding source, domain, etc.,
  which other roles cannot perform.
• The admin is responsible for managing his/her language group created
  by the super admin. The admin of a group can add/block users in the
  group and can use all the modules assigned to the admin by the super
  admin.
• The linguists are part of a language group. The operations (such as
  creating, validating, and linking synsets) performed by the junior
  linguists are further validated and approved by the senior linguists
  of the group.
Figure 2: Role Based system with Access Control
5
Implementation Details
The CSS Manager Tool is developed in the PHP scripting language and is
hosted on a web server supporting PHP version 5.3.15. MySQL version
5.5.21 is currently used as the database. The tool was developed using
XAMPP on a 32-bit Microsoft Windows platform. It has been deployed on
the Fedora 16 Linux platform using Apache version 2.2.22 and MySQL
version 5.5.21, which come bundled with Fedora 16. The screenshots of
the tool are shown at the end of the paper.
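The role hierarchy of section 4.2 could be realized along the following lines. The sketch is written in Python for brevity and is not the tool's PHP implementation. It assumes a strictly linear hierarchy over the four roles, in which each role inherits the operations of the role below it; the operation names and their assignment to roles are hypothetical, as is the `extra` argument used to model additional per-user privileges.

```python
# Each role lists only the operations it adds; "inherits" names the role
# directly below it in the (assumed linear) hierarchy.
ROLES = {
    "junior_linguist": {
        "inherits": None,
        "operations": {"add_synset", "view_synset", "search_synset"},
    },
    "senior_linguist": {
        "inherits": "junior_linguist",
        "operations": {"validate_synset"},
    },
    "admin": {
        "inherits": "senior_linguist",
        "operations": {"add_user", "block_user"},
    },
    "super_admin": {
        "inherits": "admin",
        "operations": {"create_group", "create_role", "add_module"},
    },
}


def effective_operations(role, roles=ROLES):
    """Collect a role's own operations plus everything inherited from below."""
    operations = set()
    while role is not None:
        operations |= roles[role]["operations"]
        role = roles[role]["inherits"]
    return operations


def user_operations(role, extra=(), roles=ROLES):
    """Operations for one user: the role's operations plus per-user extras."""
    return effective_operations(role, roles) | set(extra)


print(sorted(effective_operations("senior_linguist")))
# ['add_synset', 'search_synset', 'validate_synset', 'view_synset']
print("link_synset" in user_operations("junior_linguist", extra={"link_synset"}))
# True
```

The general operations are written once, on the lowest role, and every higher role picks them up by walking the chain, so they do not have to be restated for each new role.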
6
Conclusion and Future Work
The advantages of the CSS Manager Tool can be summarized as follows:
• Ease in accessing synsets: A synset is represented by an
  identification number called the synset Id. Remembering Ids is more
  difficult for a user than remembering the concept of the synset.
  Earlier, linguists had to remember the synset Id in order to perform
  any later operation on a synset. In the CSS Manager Tool, the user
  need not remember synset Ids; all operations can be performed with
  the help of the concept and its set of synonymous words.
• Decentralized maintenance: No specialized software or specific
  technological environment is required to access the tool. Any device
  with a browser connected to the Internet is sufficient.
• WordNet enhancement: Creation of language-specific
  concepts/synsets, addition of further information about a synset,
  and linkage to other Indian language WordNets are possible. The tool
  is being enhanced to support validation of WordNets.
Acknowledgement
This work has been carried out as a part of the
Indradhanush WordNet Project (11(13)/2010HCC(TDIL), dated 3-8-2010) jointly carried out
by nine institutions. We wish to express our gratitude to the funding agency DeitY, Govt. of India
and also all the members of the Indradhanush
Consortium.
References
Pushpak Bhattacharyya. 2010. IndoWordNet. Lexical Resources
Engineering Conference 2010 (LREC 2010), Malta.
Arindam Chatterjee, Salil Joshi, Mitesh Khapra, Pushpak
Bhattacharyya. 2010. Introduction to Tools for IndoWordNet and Word
Sense Disambiguation. 3rd IndoWordNet workshop, International
Conference on Natural Language Processing.
Christiane Fellbaum (ed). 1998. WordNet: An Electronic Lexical
Database. Cambridge, MA: MIT Press.
George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross,
and Katherine Miller. 1993. Introduction to WordNet: An On-line
Lexical Database.
George A. Miller. 1995. WordNet: A Lexical Database for English.
Communications of the ACM Vol. 38, No. 11: 39-41.
Shantaram Walawalikar, Shilpa Desai, Ramdas Karmali, Sushant Naik,
Damodar Ghanekar, Chandrekha D’souza and Jyoti Pawar. 2010.
Experiences in Building the Konkani Word Net using the expansion
Approach. In Proceedings of the 5th Global WordNet Conference on
Principles, Construction and Application of Multilingual WordNets
(Mumbai-India).
Konkani WordNet: WordNet For Konkani Language:
http://konkaniwordnet.unigoa.ac.in
http://indradhanush.unigoa.ac.in/conceptspace/
IndoWordNet Website: Multilingual WordNet which links WordNets of
eighteen Indian languages:
http://www.cfilt.iitb.ac.in/indowordnet/
Concept Space Manager Tool Tutorial link:
https://www.youtube.com/watch?v=BMhixBI7xOY&feature=youtu.be
Indradhanush Website: WordNets for seven Indian Languages:
http://indradhanush.unigoa.ac.in
Concept Space Synset Manager Tool (CSS Manager Tool): Snapshots
1. Login Page: The login page of the CSS Manager Tool is shown below.
2. Super Admin: The super admin is the highest role in the role
   hierarchy. The super admin holds all the privileges that the admin,
   linguist, or lexicographer has. The super admin is accountable for
   creation of groups, users of a group, creation of roles to be
   assigned to the members in a group, addition of new modules and
   operations, and various other administrative operations, such as
   adding source, domain, etc., which other roles cannot perform. The
   snapshot of the super admin interface is shown below.
3. User Management: This module allows the administrator to view the
   users in a group, to add new users, to block or unblock users, to
   assign privileges to users, etc. The User Management module is
   available only to the administrator of the group and not to the
   linguist/lexicographer.
To add a new User,
The modules available to linguists and lexicographers are as follows:
• Create Synset: This module allows the user to create a new synset.
• View All Synset: This module allows the user to view all the synsets
  created so far. On selecting the ‘View All Synset’ menu link, the
  user can view the synsets belonging to a language. It also allows
  the user to select the number of synsets displayed per page and to
  view synsets based on the date of creation. Each module provides
  help files to assist in tool usage.
  The ‘Advance search’ option allows the user to view synsets
  belonging to a particular grammatical category (i.e. Noun, Verb,
  Adverb, Adjective), a domain, or a source, and also to view the
  synsets created by a particular user of a group.
  Based on the operations assigned to the modules and roles, the user
  can edit, view, or validate the synsets.
• View Linked Synsets: This module is similar to the View All Synset
  module, but it only allows the user to view the synsets which are
  linked across languages.
• Change Password: This module allows the user to change the password.
• Log Out: To log out from the CSS Manager Tool, the user needs to
  click on ‘Log Out’ from the menu list.
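The ‘Advance search’ filters and the per-page display described under View All Synset can be sketched as follows. This is a Python illustration under stated assumptions and not the tool's code: the record fields, the function names `advance_search` and `page`, and all sample values (users, domains, sources, dates) are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SynsetRecord:
    synset_id: int
    category: str      # "Noun", "Verb", "Adverb" or "Adjective"
    domain: str
    source: str
    created_by: str
    created_on: date


def advance_search(records, category=None, domain=None, source=None,
                   created_by=None, created_on=None):
    """Keep records matching every criterion that was actually supplied."""
    criteria = {"category": category, "domain": domain, "source": source,
                "created_by": created_by, "created_on": created_on}
    active = {k: v for k, v in criteria.items() if v is not None}
    return [r for r in records
            if all(getattr(r, k) == v for k, v in active.items())]


def page(results, page_number, per_page):
    """Return one page of results; page numbers start at 1."""
    start = (page_number - 1) * per_page
    return results[start:start + per_page]


# Placeholder records, for illustration only.
records = [
    SynsetRecord(1, "Noun", "domain_a", "source_x", "user_a", date(2013, 1, 5)),
    SynsetRecord(2, "Verb", "domain_a", "source_x", "user_b", date(2013, 1, 5)),
    SynsetRecord(3, "Noun", "domain_b", "source_y", "user_a", date(2013, 2, 9)),
]
nouns = advance_search(records, category="Noun")
print([r.synset_id for r in nouns])                          # [1, 3]
print([r.synset_id for r in page(records, 2, per_page=2)])   # [3]
```

Criteria left as `None` are ignored, so the same function covers a search by category alone as well as a combined search by category, domain, source, and creator.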