The making of Ancient Greek WordNet
Yuri Bizzoni∗ , Federico Boschetti⋄ , Riccardo Del Gratta⋄ ,
Harry Diakoff‡ , Monica Monachini⋄ , Gregory Crane⋆
⋄ CNR-ILC “A. Zampolli”, Pisa - Italy, Via Moruzzi 1
{firstname.lastname}@ilc.cnr.it
∗ Università degli Studi di Pisa, Pisa - Italy, Via Santa Maria, 53
[email protected]
‡ Alpheios Project
http://alpheios.net
[email protected]
⋆ Perseus Digital Library Project, Department of Classics - Eaton Hall 134C
Tufts University - Medford MA, 02155 USA
http://www.perseus.tufts.edu
[email protected]
Abstract
This paper describes the process of creating and reviewing a new lexico-semantic resource for classical studies: Ancient Greek WordNet. Candidate sets of synonyms (synsets) are extracted from Greek-English dictionaries, on the assumption that Greek words translated by the same English word or phrase have a high probability of being synonyms, or at least of being closely semantically related. The validation process and the web interface developed to edit and query the resource are described in detail. The lexical coverage of Ancient Greek WordNet is illustrated and its accuracy is evaluated. Finally, scenarios for exploiting the resource are discussed.
Keywords: Ancient Greek, Multilingualism, Classical Philology
1. Overview
This paper describes an early-stage work in progress: the creation of the Ancient Greek WordNet (AGWN) and its linkage to other WordNets (WNs).
A rich and deep lexical and grammatical tradition, coupled with the changes in meaning brought about by modern developments, makes the creation of lexical resources for classical languages quite a complicated task. The literature offers plenty of examples of attempts to endow ancient languages with WordNets. Here we mention only (Kulkarni et al., 2010), which describes the construction of a Sanskrit WordNet built with the expansion approach.
The need for a WordNet of Ancient Greek, in particular after the creation of Minozzi’s WordNet for Latin (Minozzi, 2009; McGillivray, 2010), has become increasingly evident as other digital resources for the Classics have appeared. In the field of linguistic and literary analysis, text processing techniques make it possible to investigate the vocabulary of a large number of classical texts, as explained in (Bamman and Crane, 2008; Bamman and Crane, 2011).¹
Most philological digital tools, such as concordance tools, would benefit from the availability of a WordNet, which would allow keyword-based searches to be extended to semantically related lemmas. But the main motivation for having computational resources for classical languages is the possibility of performing automatic analysis. An Ancient Greek WordNet provides a lexical resource that can be used to apply many computational linguistic techniques, such as methods for Word Sense Disambiguation and Word Similarity, and to enhance the performance of Information Retrieval.
¹ The Dynamic Lexicon is “an NEH-funded project to automatically create bilingual dictionaries (Greek/English and Latin/English) using parallel texts [. . . ] along with the syntactic data encoded in treebanks.”: http://nlp.perseus.tufts.edu/lexicon
In the long-standing field of computational lexicography, computational lexical resources have been developed for at least thirty years, producing large-scale resources that are now commonly used in tandem with tools for the automatic extraction of lexical items and relations, e.g. to support the production of thesauri.
Natural Language Processing tools and Lexical Extraction tools help both to enhance access to electronic texts and to support their analysis. Annotating texts at different levels of linguistic analysis makes it possible to offer readers more refined search instruments and to feed a richer set of features to algorithms for computational text analysis and classification.
The integration of WordNets with Treebanks is nowadays
recognized as one of the most compelling needs for both
research and educational purposes.
More broadly, making language resources and technologies available and easily usable to digital humanities scholars will help overcome the present fragmentation within the discipline and open new research frontiers, thus promoting a methodological change.
This paper presents the first results of an international collaboration among the Institute of Computational Linguistics “Antonio Zampolli” in Pisa, the Perseus Project in Boston, the Open Philology Project in Leipzig and the Alpheios Project in New York to address this need, which the late Emanuele Pianta (Bruno Kessler Foundation), in collaboration with the University of Pavia (Sausa, 2012), had also been planning to address before his tragic death.
2. Methodology
2.1. Creation
The initial automatic construction of the AGWN was achieved by extracting Greek-English word pairs from digitized Greek-English lexicons provided by the Perseus Project: the LSJ (Liddell et al., 1940), the Middle-Liddell (Liddell and Scott, 1889) and Autenrieth’s Homeric Dictionary (Autenrieth, 1891).
The Middle-Liddell proved to be more consistently structured than the other two and thus provided the most reliable parsing of English synonyms, with the least “noise”.
The Greek word of each extracted bilingual pair was linked to every synset in the Princeton WordNet (PWN) (Fellbaum, 1998) in which the English member of the pair appeared. This has been a common approach in the creation of a number of modern WNs (Sagot and Fišer, 2011), because of the great richness and detail of the PWN, although it raises both the problems common to all uses of English as a pivot language and the issues that arise when mapping concepts across cultures so remote from one another (Vossen, 1996).
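The linking step can be sketched in a few lines. This is a minimal illustration, assuming the lexicon has already been parsed into (Greek lemma, English translation) pairs and that an index from English lemmas to synset identifiers is available; the Greek and English words come from examples elsewhere in this paper, while the synset identifiers and the helper name `build_greek_synsets` are invented for the example.

```python
from collections import defaultdict

# (Greek lemma, English translation) pairs, as they might come out of a
# parsed Greek-English lexicon. Illustrative data only.
PAIRS = [
    ("ὅρμος", "chain"),
    ("ἅλυσις", "chain"),
    ("ὄρνις", "bird"),
    ("πλάτανος", "plane"),
]

# Hypothetical English index: lemma -> synset ids (placeholders, not PWN offsets).
PWN_INDEX = {
    "chain": ["chain#1", "chain#2"],
    "bird": ["bird#1"],
    "plane": ["plane#1", "plane#2"],
}


def build_greek_synsets(pairs, pwn_index):
    """Link every Greek lemma to every synset containing its English
    translation; Greek lemmas sharing a translation share candidate synsets."""
    synsets = defaultdict(set)
    for greek, english in pairs:
        for synset_id in pwn_index.get(english, []):
            synsets[synset_id].add(greek)
    return dict(synsets)


candidates = build_greek_synsets(PAIRS, PWN_INDEX)
for synset_id in sorted(candidates):
    print(synset_id, sorted(candidates[synset_id]))
```

Because every sense of the English word is inherited, both Greek words for “chain” also land in any anachronistic sense of “chain”: this is the misaligned polysemy that the validation phase has to clean up.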
2.2. Validation of a Sample
In this early phase of the work, we had two main goals: identifying the principal sources of error in the automatic extraction, and evaluating a relevant sample of synsets, comparable with traditional studies of synonymy. For this reason, the sample to be manually corrected and validated consisted of the largest synsets (often the result of spurious synonymy involving very generic terms) and of synsets including at least one word from J.H. Schmidt’s Synonymik der griechischen Sprache (Schmidt, 1876).
Before the manual correction, misaligned polysemy was reduced by filtering out English meanings expressly identified as colloquial in the PWN, as well as anachronistic MultiWordNet (MWN) domains, such as those related to modern science and technology and any other domains consisting primarily of recent neologisms (e.g. aviation, telecommunication, football, etc.). The MWN Domains² resource was used for this purpose (Magnini et al., 2001).
For example, the English word “plane” takes on specific meanings in different domains: a woodworking tool (rhykánē, ῥυκάνη), a kind of tree (plátanos, πλάτανος), but also “aircraft”. The last meaning can be filtered out by identifying the anachronistic domain “aviation”.
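The filtering amounts to discarding English senses that are marked as colloquial or that carry an anachronistic domain label, before Greek words are attached to them. A minimal sketch, assuming each English sense comes with a set of domain labels and a colloquial flag; the sense identifiers and the labels other than “aviation”, “telecommunication” and “football” are placeholders, not actual MWN Domains data.

```python
# Domains treated as anachronistic for Ancient Greek (examples from the text).
ANACHRONISTIC_DOMAINS = {"aviation", "telecommunication", "football"}

# Hypothetical English senses of "plane"; ids and labels are illustrative only.
SENSES = [
    {"id": "plane#tool", "domains": {"woodworking"}, "colloquial": False},
    {"id": "plane#tree", "domains": {"botany"}, "colloquial": False},
    {"id": "plane#aircraft", "domains": {"aviation"}, "colloquial": False},
]


def keep_sense(sense):
    """Drop senses marked as colloquial or labelled with an anachronistic domain."""
    if sense["colloquial"]:
        return False
    return not (sense["domains"] & ANACHRONISTIC_DOMAINS)


kept = [s["id"] for s in SENSES if keep_sense(s)]
print(kept)  # ['plane#tool', 'plane#tree']
```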
However, although reduced and heavily filtered, misaligned polysemy remains the main source of error, and anachronisms did not completely disappear.
The manual review was performed by a native Italian speaker, a graduate student in Digital Humanities with a BA in Classics and intermediate proficiency in English. He took part in a pilot project aimed at evaluating the localization into Italian of part of the protocol established in the Perseus Project for the creation, correction and validation of resources for the study of Classics.
² Version 3.2 is available at http://wndomains.fbk.eu
During the manual validation process, the student had to rate each word of the Ancient Greek synsets with a score between 0 (not semantically related) and 2 (synonymous with the other words in the synset). The option of assigning an intermediate score of 1 to a translation was made necessary by linguistic and cultural problems.
When a word was considered inadequate for inclusion in a synset but still semantically related to it, the student marked the type of the specific semantic relation (e.g. hyponymy, meronymy, etc.), so that in a second stage of the work the word could be inserted into the correct synset (existing or newly created).
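The review protocol can be modelled as one judgement per word, with an optional relation label for words that are related but not synonymous. The sketch below is a hypothetical data model (the names `WordJudgement` and `triage` are ours, not part of the project’s software), and the sample judgements anticipate the “bird” example in the next paragraph.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WordJudgement:
    lemma: str
    score: int                      # 0 = not related, 1 = uncertain, 2 = synonym
    relation: Optional[str] = None  # e.g. "hyponym" for related non-synonyms

    def __post_init__(self):
        if self.score not in (0, 1, 2):
            raise ValueError("score must be 0, 1 or 2")


def triage(judgements):
    """Split reviewed words into kept synonyms, words needing more work,
    and words to be relocated through their marked relation."""
    keep = [j.lemma for j in judgements if j.score == 2]
    review = [j.lemma for j in judgements if j.score == 1]
    relocate = {j.lemma: j.relation for j in judgements
                if j.score == 0 and j.relation}
    return keep, review, relocate


bird = [
    WordJudgement("ὄρνις", 2),
    WordJudgement("κέρκηρις", 0, "hyponym"),
    WordJudgement("δρεπανίς", 0, "hyponym"),
]
print(triage(bird))
```

Words scored 0 without a relation label are simply dropped by this sketch; those with a label are kept aside for the second stage.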
For example, the synset that contains the English word “bird”, glossed as “warm-blooded egg-laying vertebrates characterized by feathers and forelimbs modified as wings”, attracts a high number of Greek lemmas. Many of these, however, do not refer to the general concept “bird” (órnis, ὄρνις, in Greek), but to particular species of birds: kérkēris (κέρκηρις) is an aquatic bird, and drepanís (δρεπανίς) is defined by LSJ as “a bird, so called from the shape of its wings, probably the Alpine swift, Cypselus melba”, etc. All these terms are marked with the appropriate semantic relation, so that they can be placed in hyponymic synsets in a second stage of the work.
If the list of Greek words in a specific synset was ill-formed, either because the gloss for the concept was inadequate to express the correct meaning or because all the related English words were inadequate to translate the Ancient Greek term, the student logically isolated³ the synset.
Deactivating a synset is a difficult choice, especially when the synset expresses a modern concept that is an evolution of the ancient one. In this case the student marks a near-equivalent-like relation⁴ between the modern and the ancient concept, which must then be inserted and glossed.
The first stage of the validation process focused mainly on deleting inadequate words from synsets and isolating inadequate synsets from the semantic net; the second stage will also address the need to add new words to the synsets and new synsets to the semantic net.
2.3. Linkage to other WNs
English/Greek bilingual resources are available under free licenses and are frequently used not only by native English speakers but by the entire community of scholars and students of Classics. However, we believe that members of the international community of classicists who take part in crowdsourcing efforts to extend the available resources for the study of classical languages are greatly helped if they have bilingual resources in various languages at their disposal, especially in their native languages. Such resources help them grasp the nuances of meaning that distinguish terms belonging to different synsets.
³ The synset is temporarily removed (deactivated) from the net because its links are considered inadequate at the time of the analysis, but it can be reactivated after further investigation.
⁴ The correct statement is “near to the concept expressed by a definition that needs adjustments”.
Accordingly, the manual review during the pilot project was facilitated by consulting several thesauri of Classical Greek and bilingual dictionaries (in particular Greek/English and Greek/Italian, although the latter are unfortunately not available under a free license), and by aligning the AGWN not only with PWN but also with Italian WordNet (IWN), developed at the Institute of Computational Linguistics in Pisa (Roventini et al., 2003), with the Italian section of the MWN, developed at Bruno Kessler Foundation, and with a Latin WordNet automatically produced by the Alpheios Project and linked to Minozzi’s Latin WordNet.
Figure 1: The search page of the GUI
2.4. Comparison to Latin WN
The comparison with Latin WN is interesting because it is an available resource, manually checked by a classicist and linked to PWN, and it therefore strengthens the evaluation of a synset: if a Greek term is associated with an apparently inadequate synset but manually checked Latin terms are associated with the same synset, additional attention is needed before rejecting the relation.
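Expressed as a review rule, the cross-check flags a rejection for a second look whenever the Latin side populates the same PWN synset. A minimal sketch, assuming the Latin WordNet is available as a mapping from synset identifiers to Latin lemmas; the function name and the synset identifiers are hypothetical.

```python
def needs_second_look(synset_id, greek_score, latin_index):
    """True when a Greek word scored 0 (rejected) for a synset to which the
    manually checked Latin WordNet nevertheless attaches Latin lemmas."""
    return greek_score == 0 and bool(latin_index.get(synset_id))


# Hypothetical index: synset id -> Latin lemmas linked to it.
LATIN_INDEX = {"syn-A": ["vehiculum"], "syn-B": []}

print(needs_second_look("syn-A", 0, LATIN_INDEX))  # True
print(needs_second_look("syn-B", 0, LATIN_INDEX))  # False
print(needs_second_look("syn-A", 2, LATIN_INDEX))  # False: not a rejection
```

The flag only asks for renewed attention rather than overriding the rejection, since, as the following paragraph shows, Latin WordNet also admits modern senses that AGWN deliberately excludes.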
However, Latin WordNet seems to be less restrictive about the anachronisms that we decided to reject, managing them through an “extended polysemy” policy: existing Latin words that acquired modern senses are extended to those senses as well, in agreement with the modus operandi of the Lexicon Recentis Latinitatis (Egger, 2004). Accordingly, for instance, “cliens” can have the meaning of “any computer that is hooked up to a computer network” and thus find a place as a hyponym of “machina”, which in turn can also signify a “4-wheeled motor vehicle; usually propelled by an internal combustion engine”, a hyponym of “vehiculum”. In Latin WordNet (LWN) the term “accitus” means ‘an order to appear in person at a given place and time’, ‘a writ issued by authority of law’ and also ‘a telephone connection’.
In other words, Latin WordNet seems to address the problem of polysemy from a modernist perspective: the persistence of at least one common sense between two words justifies the inclusion of the ancient lemma in its modern counterpart’s synset.
3. The Ancient Greek WordNet GUI
In this section we describe the graphical user interface for querying and editing Ancient Greek WN, which has been developed to support manual checking, correction and validation. The interface is available at the URL:
http://www.languagelibrary.eu/new_ewnui
Figure 1 illustrates the search page, which is divided into two sections: the top one is devoted to the user’s personal profile, including his/her activities, while the bottom one contains the search panel, with options for the source and target languages.
3.1. Structure of the model
The data model behind the GUI has been designed to manage WN-like data structures. So far it deals with the following WNs:
• Princeton WordNet;
• Italian WordNet;
• Croatian WordNet;
• Arabic WordNet;
• Latin WordNet;
• Ancient Greek WordNet.
The model is designed with pluggable components, so that a new WN can be inserted into the model and added to the search panel.
The main feature of the model is the possibility of having a set of mapped concepts in different languages. As described in section 2.3, starting from English-Greek concepts and words, each WN is mapped onto PWN, so that English acts as the pivot language and each concept is mapped to the corresponding English concept. For example, the Croatian “107543288 (n) snažan osjećaj naklonosti, strastvene privrženosti; duhovna i/ili spolna privlačnost jednog bića prema drugome [ljubav]” (“a strong feeling of affection, of passionate attachment; spiritual and/or sexual attraction of one being to another [love]”) is mapped onto the English “107543288 (n) a strong positive emotion of regard and affection [love]”, which corresponds to the Ancient Greek synset containing the terms to be accepted or rejected: agápē (ἀγάπη), philótēs (φιλότης), érōs (ἔρως), etc.
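Since every WN is keyed by the PWN offset it maps to, crossing from one language to another is a lookup through the shared offset. A minimal sketch, assuming a simple in-memory layout (language code, then offset, then lemmas); the layout and function names are hypothetical, while the offset and lemmas reuse the “love” example above.

```python
# language code -> PWN offset -> lemmas. Hypothetical layout for illustration.
WORDNETS = {
    "eng": {"107543288": ["love"]},
    "hrv": {"107543288": ["ljubav"]},
    "grc": {"107543288": ["ἀγάπη", "φιλότης", "ἔρως"]},
}


def synsets_of(lang, lemma):
    """PWN offsets of the synsets in `lang` that contain `lemma`."""
    return [off for off, lemmas in WORDNETS[lang].items() if lemma in lemmas]


def translate(lemma, source, targets):
    """Map a lemma to target-language lemmas through the shared PWN offsets."""
    result = {}
    for off in synsets_of(source, lemma):
        for tgt in targets:
            result.setdefault(tgt, []).extend(WORDNETS[tgt].get(off, []))
    return result


print(translate("ἔρως", "grc", ["eng", "hrv"]))
# {'eng': ['love'], 'hrv': ['ljubav']}
```

In this layout, plugging in a new WN only means adding another top-level key, which mirrors the pluggable design described above.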
3.2. Search Panel
The search panel in Figure 1 contains two main zones: an
input language area where users can select the source language and a list of output languages from which users can
select the desired target language(s).
Once users have selected the input language, they start typing the word to search for in the textbox, where an autocomplete mechanism suggests the list of words (contained in the input WN, selected according to the input language) that start with the characters they have typed.
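Prefix completion over a sorted word list is one straightforward way to implement such a mechanism. The sketch below is not the GUI’s actual code: it assumes, for illustration only, that matching should ignore breathings, accents and case, so that a user typing “εχ” is offered ἔχω; whether the real interface folds diacritics in this way is not stated here.

```python
import bisect
import unicodedata


def fold(text):
    """Remove breathings, accents and other combining marks, then lowercase."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


class PrefixIndex:
    """Sorted (folded key, original word) pairs searched with bisect."""

    def __init__(self, words):
        pairs = sorted({(fold(w), unicodedata.normalize("NFC", w)) for w in words})
        self.keys = [key for key, _ in pairs]
        self.words = [word for _, word in pairs]

    def complete(self, prefix, limit=10):
        key = fold(prefix)
        i = bisect.bisect_left(self.keys, key)  # first key >= the typed prefix
        matches = []
        while i < len(self.keys) and self.keys[i].startswith(key) and len(matches) < limit:
            matches.append(self.words[i])
            i += 1
        return matches


index = PrefixIndex(["ἔχω", "ἔρως", "ἐμβαστάζω", "ἐποχετεύω", "ἄγω"])
print(index.complete("εχ"))
print(index.complete("ἐ"))
```

Sorting once and bisecting keeps each keystroke logarithmic in the size of the word list, which is adequate for lexicons of tens of thousands of lemmas.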
Figure 2 shows how English acts as the pivot between the input language (Greek) and the target languages (Latin and Italian), displaying the target concepts that are mapped to the input synset(s).⁵
⁵ As explained in section 3.3, the input synsets presented to the users are those that contain the input word.
Figure 2: English as pivot between input and target languages
Figure 3: POS and gloss of the selected concept
3.3. From words to synsets and related operations
Once the “View Results” button is pressed, the entire list of synsets that contain the typed word is presented to the users, who then click on the appropriate synset and navigate to the corresponding target concept. For example:
• 2001100141806 (V)
the act of inspecting or verifying [ ἐπισχεθεῖν, κολάζω, ἀντεφοράω, ἰσχανάω, ἴσχω, ... ]
• 2001100167446 (V) (chess)
a direct attack on an opponent’s king [ ἐπισχεθεῖν, κολάζω, ἀντεφοράω, ἰσχανάω, ἴσχω, ... ]
• 2001100318735 (V)
the act of carrying something [ ἐμβαστάζω, ἐποχετεύω, πορίζω, μυριαγωγέω, κυέω, ... ]
• ....
Figure 4: Validate and add words to the synset
3.4. Edit the data
While everyone can search and browse the data, the editing features of the GUI are available only to logged-in users. Users can log in to the system through the profile tab of the search page (see Figure 1).
For example, logged-in users can edit the specific meanings of échō (ἔχω), such as “the act of carrying something”. After clicking on the identifier of the synset, 2001100318735, logged-in users can perform the following activities:
Browse This tab of the GUI allows logged-in users to edit the part of speech as well as the gloss of the concept (a typical activity, for example, consists in modifying the gloss or translating it into the target language), see Figure 3.
Managing Relations The GUI displays the relations that involve the selected synset and the synsets to which it is connected. For example, the synset “100318735, the act of carrying something” has the hypernym “100315986, the act of moving something from one location to another”. Logged-in users can validate or add relations in the same way as they validate or add words, see Figure 5. This functionality allows users to modify the conceptual network, changing the original graph inherited from the PWN structure.
Managing words in the synsets As anticipated in section 2.2, logged-in users can validate words within the synset: they can choose 0 to exclude the word from the synset, 2 to state that the word is fully pertinent to the synset, or 1 to indicate that further investigation is needed. In addition, they can add up to 5 words to the synset: see Figure 4.
Figure 5: Validate and add relations
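The word-management tab can be sketched as a small state object. This is a hypothetical model, not the GUI’s implementation: the class and method names are ours, words scored 1 are kept pending further investigation, and we read “up to 5 words” as a cap on the words added to a synset in one editing session. The judgements in the usage lines are illustrative only.

```python
MAX_NEW_WORDS = 5  # the GUI allows up to five added words


class SynsetEditor:
    """Hypothetical model of the 'Managing words in the synsets' tab."""

    def __init__(self, synset_id, words):
        self.synset_id = synset_id
        self.scores = {word: None for word in words}  # None = not yet judged
        self.added = []

    def judge(self, word, score):
        if word not in self.scores:
            raise KeyError(word)
        if score not in (0, 1, 2):
            raise ValueError("score must be 0, 1 or 2")
        self.scores[word] = score

    def add_word(self, word):
        if word in self.scores or word in self.added:
            return  # already present, nothing to add
        if len(self.added) >= MAX_NEW_WORDS:
            raise ValueError("at most 5 words can be added")
        self.added.append(word)

    def members(self):
        # Score 0 excludes a word; unjudged and score-1 words remain for now.
        kept = [w for w, s in self.scores.items() if s != 0]
        return kept + self.added


editor = SynsetEditor("2001100318735", ["ἐμβαστάζω", "ἐποχετεύω", "πορίζω"])
editor.judge("ἐμβαστάζω", 2)
editor.judge("πορίζω", 0)
editor.add_word("φέρω")
print(editor.members())
```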
4. Results
4.1. Evaluation of the corrections applied to the Sample
1013 out of 33910 synsets have been checked, in order to evaluate the performance of the system and to start correcting errors.
84 out of 1013 synsets (8.3%) have been deactivated because of an erroneous association with modern concepts alien to antiquity, such as “a series of linked atoms (generally in an organic molecule)”, automatically associated with hórmos (ὅρμος), hálysis (ἅλυσις), sýsphigma (σύσφιγμα), psállion (ψάλλιον) and hormathós (ὁρμαθός) because of the polysemy of the English translation “chain”.
14 out of 1013 synsets (1.4%) have been marked as “near to the concept expressed by a definition that needs adjustments”. These cases are interesting because they clearly show the gap between Sinn (sense) and Bedeutung (denotation), to use Frege’s categories.
For instance, the concept associated with gê (γῆ) and gâia (γαῖα) is defined as “the third planet from the Sun; the planet we live on; [...]”. The denotation of γαῖα is clearly our planet, but the sense that defines the concept depends on the scientific paradigm (Ptolemaic or Copernican).
The 1013 checked synsets contain 6457 senses, i.e. words (possibly repeated across synsets), each paired with a specific meaning.
4.2. Comparison with Schmidt’s Synonymik
A comparison with Schmidt is not straightforward, because Schmidt’s groupings are closer to semantic fields than to synsets. However, a couple of lists of terms are worthy of note:
a. the main Greek synonyms for the sea are present both in the AGWN and in Schmidt (who also adds the co-hyponym ōkeanós, ὠκεανός): thálassa (θάλασσα), háls (ἅλς), pélagos (πέλαγος), póntos (πόντος);
b. the Greek synonyms expressing the concept “moving quickly and lightly” (in English: agile, nimble, quick and spry) can be divided into a subset shared by both Schmidt and the Ancient Greek WordNet and two complementary subsets. The common synonyms are: aiólos (αἰόλος), aipsērós (αἰψηρός), thoós (θοός), kraipnós (κραιπνός), laipsērós (λαιψηρός), tachýs (ταχύς), ōkýs (ὠκύς).
The terms present only in Schmidt with this meaning are: argós (ἀργός), baliós (βαλιός), elaphrós (ἐλαφρός), karpalímos (καρπαλίμος), oksýs (ὀξύς), panáiolos (παναίολος), sobarós (σοβαρός), trochalós (τροχαλός), plus three terms related to swiftness of foot: argípous (ἀργίπους), podṓkēs (ποδώκης), ōkýpous (ὠκύπους).
Finally, it is worth noting that nine relevant terms are present only in the Ancient Greek WN: euagḗs (εὐαγής), eukínētos (εὐκίνητος), dierós (διερός), íksalos (ἴξαλος), kôuphos (κοῦφος), ksouthós (ξουθός), otrērós (ὀτρηρός), polýskarthmos (πολύσκαρθμος), spoudâios (σπουδαῖος).
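Treating each source as a set of lemmas turns the three-way split above into plain set operations. The sketch reproduces the counts from the lists in item b; the Jaccard overlap it also prints is an extra summary statistic added here for illustration, not a figure reported in the comparison.

```python
import unicodedata


def lemmas(*words):
    # NFC-normalise so that differently encoded accents compare equal.
    return {unicodedata.normalize("NFC", w) for w in words}


SCHMIDT = lemmas(
    "αἰόλος", "αἰψηρός", "θοός", "κραιπνός", "λαιψηρός", "ταχύς", "ὠκύς",
    "ἀργός", "βαλιός", "ἐλαφρός", "καρπαλίμος", "ὀξύς", "παναίολος",
    "σοβαρός", "τροχαλός", "ἀργίπους", "ποδώκης", "ὠκύπους",
)
AGWN = lemmas(
    "αἰόλος", "αἰψηρός", "θοός", "κραιπνός", "λαιψηρός", "ταχύς", "ὠκύς",
    "εὐαγής", "εὐκίνητος", "διερός", "ἴξαλος", "κοῦφος", "ξουθός",
    "ὀτρηρός", "πολύσκαρθμος", "σπουδαῖος",
)

shared = SCHMIDT & AGWN
only_schmidt = SCHMIDT - AGWN
only_agwn = AGWN - SCHMIDT
jaccard = len(shared) / len(SCHMIDT | AGWN)

print(len(shared), len(only_schmidt), len(only_agwn))  # 7 11 9
print(f"Jaccard overlap: {jaccard:.2f}")  # Jaccard overlap: 0.26
```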
4.3. Coverage
The total Greek lexicon counts up to 120k different lemmas, while the AGWN contains 35k distinct lemmas, a coverage of 28%. This is mainly because only translations consisting of single words or phrases present in the PWN are used to link the WNs, whereas translations with mismatching phrases are currently left unparsed. For example, tráchouros (τράχουρος) is associated with the correct PWN synset, which corresponds to “horse mackerel”, but óchanon (ὄχανον) is discarded, because the PWN does not contain the phrase “bar across of the shield”. These cases will be handled at a later stage of the work.
The coverage of the AGWN on the Homeric lexicon is 69% (cf. Table 1), partly because Autenrieth’s Homeric Dictionary was used to build the resource.
Part of Speech    % of lexicon    AGWN coverage
Nouns (N)         32%             76%
Adjectives (A)    27%             59%
Verbs (V)         33%             72%
Adverbs (R)       8%              61%
N+A+V+R           100%            69%
Table 1: AGWN coverage of the Homeric lexicon
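Because the four part-of-speech shares must sum to 100%, adverbs account for 8% of the Homeric lexicon, and the overall figure in the last row is then recoverable as a share-weighted mean of the per-POS coverage. The snippet below is only a consistency check on the published rows, not the procedure used to compute the table.

```python
# (share of the Homeric lexicon, AGWN coverage) per part of speech.
TABLE_1 = {
    "N": (0.32, 0.76),
    "A": (0.27, 0.59),
    "V": (0.33, 0.72),
    "R": (0.08, 0.61),
}


def overall_coverage(rows):
    """Share-weighted mean of per-POS coverage."""
    total_share = sum(share for share, _ in rows.values())
    return sum(share * cov for share, cov in rows.values()) / total_share


print(f"{overall_coverage(TABLE_1):.1%}")  # 68.9%
```

The result, 68.9%, rounds to the 69% reported for N+A+V+R.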
4.4. Propagated Polysemy
We have compared the Ancient Greek WordNet with the Princeton WordNet in order to verify how polysemy propagates between the two resources, bearing in mind that PWN covers 148k different lemmas and AGWN only 35k.
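The mechanism behind this propagation can be sketched as a set union: a Greek lemma receives every synset of every English word that translates it, and synsets shared by two translations are counted once. The sketch below uses invented synset identifiers and deliberately tiny sense inventories, far smaller than the counts in Tables 2 and 4; the names `ENGLISH_SENSES`, `TRANSLATIONS`, `propagated_senses` and `contribution` are hypothetical.

```python
# Hypothetical English verb -> synset ids (placeholders, not PWN data).
ENGLISH_SENSES = {
    "carry": {"v1", "v2", "v3"},
    "hold": {"v3", "v4"},
    "have": {"v4", "v5", "v6"},
}

# Illustrative English translations of one Greek lemma.
TRANSLATIONS = {"ἔχω": ["carry", "hold", "have"]}


def propagated_senses(lemma):
    """Union of the synsets of all English translations of `lemma`."""
    senses = set()
    for english in TRANSLATIONS[lemma]:
        senses |= ENGLISH_SENSES[english]
    return senses


def contribution(lemma):
    """Per-translation breakdown, in the style of Table 4."""
    return {e: len(ENGLISH_SENSES[e]) for e in TRANSLATIONS[lemma]}


print(len(propagated_senses("ἔχω")))  # 6
print(contribution("ἔχω"))  # {'carry': 3, 'hold': 2, 'have': 3}
```

The per-translation counts add up to 8, but the lemma inherits only 6 distinct senses because “hold” overlaps with both “carry” and “have”; with real PWN inventories the same union quickly yields the large counts of Table 3.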
Lemma    # of senses
break    59
make     49
give     44
take     42
cut      41
Table 2: Top five English polysemous verbs
Lemma                        # of senses
échō (ἔχω)                   162
kóptō (κόπτω)                125
téuchō (τεύχω)               105
tektáinomai (τεκταίνομαι)    104
ágō (ἄγω)                    91
Table 3: Significant polysemous verbs extracted from the top ten polysemous verbs
The five most polysemous Greek words spread into many corresponding English words: for instance, ἔχω spreads into 171 different English words, including some of the five words in Table 2. The same holds for the other most polysemous Greek words, because their senses are inherited from English: see Table 4.
Greek Verb                   English Verb (# of senses)
échō (ἔχω)                   carry (40), hold (36), ..., have (19), take (6), make (3), give (1), break (1), ...
kóptō (κόπτω)                cut (41), strike (21), ..., take (1), ...
téuchō (τεύχω)               make (49), work (27), give (1), take (1), ...
tektáinomai (τεκταίνομαι)    make (49), work (27), give (1), take (1), ...
ágō (ἄγω)                    carry (40), lead (15), bring (11), ..., take (1)
Table 4: Propagated polysemy
A similar analysis can also be carried out for other parts of speech.
4.5. Semantic relations
Semantic relations are currently inherited from the PWN, although they can be modified through the graphical interface. The study of semantic relations can have fruitful didactic applications, especially if focused on the lexicon of specific authors. For instance, the AGWN terms can be filtered by the Homeric lexicon in order to identify the parts of the ship (nâus, ναῦς) in Homer: histíon (ἱστίον) and spêiron (σπεῖρον): “sail”, prýmna (πρύμνα): “stern”, oiḗion (οἰήιον) and póus (ποῦς): “steering-paddle”, ántlos (ἄντλος): “hold of a ship” are correctly retrieved, even if precision and recall still need improvement.
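The author-specific filter is essentially an intersection between the meronyms of the ship synset and an author’s lemma list. A minimal sketch, assuming the part-whole links and the Homeric lemma list are already loaded as Python collections; the dictionary names and the function `filter_by_lexicon` are hypothetical, the lexicon is a toy stand-in, and the “anchor” entry holds an explicit placeholder, not a real Greek lemma, only to show something being filtered out.

```python
import unicodedata


def nfc(text):
    return unicodedata.normalize("NFC", text)


# Hypothetical meronym links from the ship synset (nâus, ναῦς), by English gloss.
SHIP_PARTS = {
    "sail": ["ἱστίον", "σπεῖρον"],
    "stern": ["πρύμνα"],
    "steering-paddle": ["οἰήιον", "ποῦς"],
    "hold of a ship": ["ἄντλος"],
    "anchor": ["<non-Homeric placeholder>"],
}

# Toy stand-in for the Homeric lemma list.
HOMERIC_LEXICON = {"ναῦς", "ἱστίον", "σπεῖρον", "πρύμνα", "οἰήιον", "ποῦς", "ἄντλος"}


def filter_by_lexicon(parts, lexicon):
    """Keep only part names whose lemmas occur in the author's lexicon."""
    lexicon = {nfc(w) for w in lexicon}
    result = {}
    for gloss, lemmas in parts.items():
        attested = [lemma for lemma in lemmas if nfc(lemma) in lexicon]
        if attested:
            result[gloss] = attested
    return result


homeric_ship = filter_by_lexicon(SHIP_PARTS, HOMERIC_LEXICON)
for gloss, lemmas in homeric_ship.items():
    print(gloss, lemmas)
```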
5. Discussion
5.1. Missing synsets in AGWN
Among the limitations of the present approach, we should note the obvious inability to identify concepts present in Ancient Greek that have no counterpart in the Princeton WordNet, which was initiated in the mid-1980s with American English. But even within Greek, the use of general lexicons that do not specify the authors and time periods represented by the entries necessarily creates many associations that were in fact valid only for specific time periods, or even for specific authors. Ideally, a WordNet should reflect the semantic relationships of a specific text or collection of texts, linking the lemmas in each synset with the lexemes in the texts where they have that particular synset’s meaning. Part of this procedure can be automated using collocations and the synonymy relations identified within the WordNet itself, but manual curation will obviously be needed if this level of precision is to be attempted at present.
The exercise of creating the AGWN also provided many interesting opportunities to compare distinctive characteristics of the two languages. English is often surprisingly polysemous, in a way quite different from Ancient Greek. Given the right context, English makes it easy to detect the part of speech of a word without morphological clues, which is exactly what Ancient Greek does not allow. This vividly illustrates a major difference between the two languages: the highly synthetic (inflectional) nature of Ancient Greek and the relatively isolating character of modern English, a difference that clearly contributed to our difficulties with spurious polysemy among the Greek equivalents of the same English word.
5.2. Study of multilingual intertextuality
The AGWN is intended to support the study of multilingual intertextuality within the Memorata Poetis Project (Boschetti et al., 2014), an Italian PRIN 2010/2011 funded project focused on literary and epigraphic poetic texts in Greek, Latin, Italian and Arabic, which aims to evaluate the transmission of themes and motifs across different civilizations.
5.3. Peculiarities of the user interface
Before creating a new user interface to query and edit the WordNet, we evaluated existing software for editing WordNets, such as DEBVisDic (Horák et al., 2006a; Horák et al., 2006b), WordNet Atlas (Abrate et al., 2012; Abrate and Bacciu, 2012) and Wikyoto Knowledge Editor (Ronzano et al., 2011), and we concluded that some peculiarities of the target language need to be handled carefully.
For example, in Ancient Greek and in some other languages, such as Arabic, present participles can be systematically used as adjectives and nouns, whereas in other languages, such as Italian, only a few infinitives, past participles and present participles (e.g. “cantante”, meaning “singer”), lexicalized in the dictionaries, are synonyms of the corresponding nomina actionis, nomina rei actae (e.g. “canzone”, “song”) or nomina agentis (e.g. “cantore”, also meaning “singer”). Because of the semantic relations among different parts of speech, the automated procedure that currently extracts the synsets from bilingual dictionaries often puts nouns and verbs in the same set. Through the user interface, the reviewer can generate from the verbal lemma the correct inflected form (e.g. the participle, or in other cases the infinitive, etc.) that is synonymous with some nouns in a nominal synset, lexicalize it while preserving the morphological information and the lexical relation with the original lemma, and finally validate it.
6. Future Work
In order to reach different groups of users, modules with the same functionality and a similar design must be developed for different platforms. In particular, we plan to develop a module for pedagogical use on the Moodle (https://moodle.org) platform and one for more advanced use on the Perseids platform for scholarly annotation of Classical texts. Furthermore, a variety of data should be linked to AGWN, such as etymological relations with other WNs, through the crowdsourced Etymological WordNet (http://www1.icsi.berkeley.edu/~demelo/etymwn).
The GUI needs to reflect these new ideas: while customization for specific communities already exists in nuce in the lexicons related to single authors, other features must be added to make the GUI as complete as possible.
The following features are planned:
Bilingual Search It is essential to be able to perform a bilingual search, crossing words in two different languages, in order to get a clear idea of the missing pairs in the created resource. In this way, experts can see a direct snapshot of the weaknesses of the created resource and add the missing words using the features developed and described in section 3.
Semantic tagging of specific texts The GUI needs to help users select the correct sense of a specific word in a given text. This is essential for experts to have a clear idea of what authors mean in specific contexts. First experiments have been performed on Homer.
Access and Identity Management This is already available, but we need to create work-groups for specific authors. Even though the data model is designed to support these features, further investigation is needed to create appropriate groups of users who share the same knowledge of specific Greek authors.
Validation by a superuser The user profile panel contains the list of activities performed by a user. This list can be edited both by its owner and by the superuser. Added and/or removed words and relations must be validated by the superuser in order to become effectively part of the WordNets. As long as an activity has not been validated, the WN remains unaltered.
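The superuser workflow can be sketched as a queue of pending activities that only a superuser can apply. This is a hypothetical model, not the project’s data model: the class names, the action strings `"add_word"` and `"remove_word"`, and the `by_superuser` flag are ours, and only word edits are modelled, although the text also covers relations.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    user: str
    action: str  # "add_word" or "remove_word" (hypothetical names)
    synset_id: str
    word: str
    validated: bool = False


@dataclass
class WordNetStore:
    synsets: dict  # synset id -> set of words
    pending: list = field(default_factory=list)

    def propose(self, activity):
        # A user's edit is only queued; the wordnet itself is left unaltered.
        self.pending.append(activity)

    def validate(self, activity, by_superuser):
        if not by_superuser:
            raise PermissionError("only the superuser can validate activities")
        if activity not in self.pending:
            raise ValueError("unknown activity")
        words = self.synsets.setdefault(activity.synset_id, set())
        if activity.action == "add_word":
            words.add(activity.word)
        elif activity.action == "remove_word":
            words.discard(activity.word)
        else:
            raise ValueError(f"unsupported action: {activity.action}")
        activity.validated = True
        self.pending.remove(activity)


store = WordNetStore({"2001100318735": {"ἐμβαστάζω"}})
act = Activity("student", "add_word", "2001100318735", "φέρω")
store.propose(act)
print(len(store.synsets["2001100318735"]))  # 1: not yet validated
store.validate(act, by_superuser=True)
print(len(store.synsets["2001100318735"]))  # 2
```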
6.1. Distribution
The data contained in the Ancient Greek WordNet will be released as (Linguistic) Linked Open Data ((L)LOD), following the approach adopted for the RDF release of PWN.⁶ Italian WordNet (IWN) has already been released as (L)LOD (Del Gratta et al., 2013; Bartolini et al., 2013), and the other WNs will be released shortly.
(L)LOD represents a new trend in the publication of linguistic resources: a survey of the formats and frameworks used in the last 20 years to exchange linguistic resources (Lezcano et al., 2013) found “an increase in recent years in approaches adopting the Linked Data initiative”.
(L)LOD are still a quantitative minority within the linked data cloud (Chiarcos et al., 2011; Lezcano et al., 2013), but they are growing and becoming a central modality for the publication of linguistic data, and especially of lexical data. Although they are not large in number of triples,⁷ they carry significant weight, especially resources that have been manually developed or checked, such as those contained in Ancient Greek WordNet.
⁶ For example, http://www.w3.org/2006/03/wn/wn20/instances/wordbank.rdf
⁷ http://linguistics.okfn.org/resources/llod/
Acknowledgments
We acknowledge Eleonora Sausa (University of Pavia), for
her contribution to the initial design of the AGWN, Antonio
De Prisco (University of Verona) for the interconnection
to Latin WordNet, Neven Jovanović (University of Zagreb)
for his contribution to the connection with Croatian WordNet.
This research has been partially co-funded by the NEH and
the Italian CNR.
7. References
Abrate, M. and Bacciu, C. (2012). Visualizing word senses in WordNet Atlas. In LREC, pages 2648–2652.
Abrate, M., Bacciu, C., Marchetti, A., and Tesconi, M. (2012). WordNet Atlas: a web application for visualizing WordNet as a zoomable map. In GWC 2012: 6th International Global Wordnet Conference, page 23.
Autenrieth, G. (1891). A Homeric Dictionary for Schools
and Colleges. Harper and Brothers, New York.
Bamman, D. and Crane, G. (2008). Building a Dynamic
Lexicon from a Digital Library. In Proceedings of the
8th ACM/IEEE-CS Joint Conference on Digital Libraries
(JCDL 2008), Pittsburgh, PA, USA.
Bamman, D. and Crane, G. (2011). Measuring Historical Word Sense Variation. In Proceedings of the 11th
ACM/IEEE Joint Conference on Digital Libraries (JCDL
2011), Ottawa.
Bartolini, R., Del Gratta, R., and Frontini, F. (2013). Towards the establishment of a linguistic linked data network for Italian. In Proceedings of the 2nd Workshop on Linked Data in Linguistics, Collocated with the 6th International Conference on Generative Approaches to the Lexicon, Pisa, Italy, September.
Boschetti, F., Del Grosso, A. M., Khan, A. F., Lamé, M.,
and Nahli, O. (2014). A top-down approach to the design of components for the philological domain. In DH
2014 (accepted).
Chiarcos, C., Hellmann, S., and Nordhoff, S. (2011). Towards a linguistic linked open data cloud: The open linguistics working group. TAL, 52(3):245–275.
Del Gratta, R., Frontini, F., Khan, F., and Monachini, M.
(2013). Converting the PAROLE SIMPLE CLIPS Lexicon into RDF with lemon. Semantic Web Journal (submitted).
Egger, C. (2004). Lexicon Recentis Latinitatis. Officina
Libraria Editoria Vaticana (LEV), Città del Vaticano.
Fellbaum, C., editor. (1998). WordNet: An Electronic Lexical Database (Language, Speech, and Communication).
The MIT Press, Cambridge, MA, USA.
Horák, A., Pala, K., Rambousek, A., and Povolný, M. (2006a). DEBVisDic - First Version of New Client-Server Wordnet Browsing and Editing Tool. In Proceedings of the Third International WordNet Conference - GWC 2006, pages 325–328, Brno, Czech Republic. Masaryk University.
Horák, A., Pala, K., Rambousek, A., and Rychlý, P.
(2006b). New clients for dictionary writing on the DEB
platform. In DWS 2006: Proceedings of the Fourth International Workshop on Dictionary Writings Systems,
pages 17–23, Torino, Italy. Lexical Computing Ltd.,
U.K.
Kulkarni, M., Dangarikar, C., Kulkarni, I., Nanda, A., and Bhattacharya, P. (2010). Introducing Sanskrit WordNet. In The 5th International Conference of the Global WordNet Association (GWC-2010), 31st Jan-4th Feb.
Lezcano, L., Sanchez, S., and Roa-Valverde, A. J. (2013).
A survey on the exchange of linguistic resources: Publishing linguistic linked open data on the web. Program:
electronic library and information systems, 47(3):3.
Liddell, H. G. and Scott, R. (1889). An Intermediate
Greek-English Lexicon. Clarendon Press, Oxford.
Liddell, H. G., Scott, R., Jones, H. S., and McKenzie, R. (1940). A Greek-English lexicon / compiled by Henry George Liddell and Robert Scott. Clarendon Press, Oxford, new edition.
Magnini, B., Strapparava, C., Pezzulo, G., and Gliozzo, A.
(2001). Using domain information for word sense disambiguation. In The Proceedings of the Second International Workshop on Evaluating Word Sense Disambiguation Systems, SENSEVAL ’01, pages 111–114. ACL.
McGillivray, B. (2010). Automatic selectional preference acquisition for Latin verbs. In Proceedings of the ACL 2010 Student Research Workshop, ACLstudent ’10, pages 73–78. ACL.
Minozzi, S. (2009). The Latin WordNet Project. In Anreiter, P. and Kienpointner, M., editors, Latin Linguistics Today. Akten des 15. Internationalen Kolloquiums zur Lateinischen Linguistik, volume 137 of Innsbrucker Beiträge zur Sprachwissenschaft, pages 707–716.
Ronzano, F., Marchetti, A., and Tesconi, M. (2011). Editing Knowledge Resources: The Wiki Way. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management, CIKM ’11, pages
2533–2536, New York, NY, USA. ACM.
Roventini, A., Alonge, A., Bertagna, F., Calzolari, N., Girardi, C., Magnini, B., Marinelli, R., and Zampolli, A. (2003). ItalWordNet: building a large semantic database for the automatic treatment of Italian. Computational Linguistics in Pisa, Special Issue, XVIII-XIX, Pisa-Roma, IEPI, 2:745–791.
Sagot, B. and Fišer, D. (2011). Extending wordnets by
learning from multiple resources. In LTC’11 : 5th
Language and Technology Conference, Poznań, Poland,
November.
Sausa, E. (2012). Toward an Ancient Greek WordNet. http://goo.gl/y3H3qu.
Schmidt, J. H. H. (1876). Synonymik der griechischen
Sprache. B.G. Teubner, Leipzig.
Vossen, P. (1996). Right or wrong: Combining lexical resources in the EuroWordNet project. In Euralex, volume 96, pages 715–728. Citeseer.