Evaluating Semantic Relations by Exploring Ontologies On The Semantic Web
1 Introduction
The problem of understanding how two concepts relate to each other has been
investigated in various fields and from different points of view. Firstly, the level
of relatedness between two terms is a core input for several Natural Language
Processing (NLP) tasks, such as word sense disambiguation, text summarization,
annotation or correction of spelling errors in text. As a result, a wide range of approaches to this problem has been proposed, which mainly explore two paradigms. On the one hand, corpora-based methods measure co-occurrence in a given context (usually characterized by means of linguistic patterns) across large-scale text collections [4, 14]. On the other hand, knowledge-rich methods use
world knowledge explicitly declared in ontologies or thesauri (usually, WordNet)
as a source of evidence for relatedness [3].
Secondly, since the early days of the Semantic Web (SW), where semantic relations are the core components of ontologies, the task of identifying the actual semantic relation that holds between two concepts has received attention
in the context of the ontology learning field [5]. Finally, recent years have seen an evolution of Semantic Web technologies, which has led both to an increased number of online ontologies and to a set of mature technologies for accessing them1. These changes have facilitated the emergence of a new generation of applications based on the paradigm of reusing this online knowledge [6].
These applications differ substantially from typical knowledge-based AI applications (as well as some of the early SW applications), whose knowledge base is provided a priori rather than acquired through reuse at runtime. They also reshape the notion of knowledge reuse, from an ontology-centered view,
1 http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/SemanticWebSearchEngines
to a more fine-grained perspective where individual knowledge statements (i.e.,
semantic relations) are reused rather than entire ontologies. In the case of these
applications, it is therefore important to estimate the correctness of a relation,
especially when it originates from a pool of ontologies with varying quality.
The problem we investigate in this paper is evaluating the correctness of a se-
mantic relation. Our hypothesis is that the Semantic Web is not just a motivation
for investigating this problem, but can actually be used as part of the solution.
We base this hypothesis on the observation that the Semantic Web is a large col-
lection of knowledge-rich resources and, as such, exhibits core characteristics
of both data source types used in NLP for investigating relatedness: knowledge
resources (structured knowledge) and corpora (large scale, federated). Earlier
research has shown that, although contributed by heterogeneous sources, online ontologies are of sufficient quality to support a variety of tasks [17]. It
is therefore potentially promising to explore this novel source and to investigate
how NLP paradigms can be adapted to a source with hybrid characteristics such
as the SW. We phrase the above considerations into two research questions:
To answer these questions we present two methods that explore online on-
tologies to estimate the correctness of a relation and which are inspired by two
core paradigms used for assessing semantic relatedness. We perform an extensive
experimental evaluation involving 5 datasets from two topic domains and cov-
ering more than 1400 relations of various types. We obtain encouraging results,
with one of our measures reaching average precision values of 75%.
We start by describing some motivating scenarios where the evaluation of se-
mantic relations is needed (Section 2). Then, we describe two measures designed
for this purpose and give details of their implementation (Sections 3 and 4).
In Section 5 we detail and discuss our experimental investigation and results.
An overview of related work and our conclusions finalize the paper.
2 Motivating Scenarios
In this section we describe two motivating scenarios that would benefit from
measures to evaluate the correctness of a semantic relation.
Embedded into the NeOn Toolkit’s ontology editor, the Watson plugin2 sup-
ports the ontology editing process by allowing the user to reuse a set of relevant
ontology statements (equivalent to semantic relations) drawn from online ontolo-
gies. Concretely, for a given concept selected by the user, the plugin retrieves all
the relations in online ontologies that contain this concept (i.e., concepts hav-
ing the same label). The user can then integrate any of these relations into his
ontology through a mouse click. For example, for the concept Book the plugin
would suggest relations such as:
2 http://watson.kmi.open.ac.uk/editor_plugins.html
– Book ⊆ Publication
– Chapter ⊆ Book
– Book − containsChapter − Chapter
Our measure relies on the hypothesis that there is a correlation between the
length of the derivation path and the correctness of the relation. In particular, we
think that longer paths probably lead to the derivation of less obvious relations,
which are therefore less likely to be correct. To verify this hypothesis we compute three values: $AveragePathLength_R$ is the average of the lengths of all derivation paths for relation R (e.g., in our case (1 + 1 + 3)/3 = 1.66), $minLength_R$ is the length of the shortest derivation path that leads to R (in our case, 1), and $maxLength_R$ is the length of the longest derivation path associated with R (in our case, 3). Formally:
$$AveragePathLength_R = \frac{\sum_i PathLength_{R,O_i}}{n}$$

$$RelatednessStrength_{s,t} = \frac{|O_{s,r,t}|}{|O_{s,t}|}$$

$$StrengthRelation_R = \frac{freq(R)}{allRels_{s,t}}$$
Note that we also experimented with various ways of normalizing these measures; however, we do not present them here because, in our experimental evaluation, they behaved worse than the original measures.
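To make the three measures concrete, the following is a minimal sketch (not the authors' implementation) of how they could be computed once the derivation results have been collected from online ontologies. The function names and input format are illustrative assumptions.

```python
# Sketch of the three measures defined above. Assumes the derivation step has
# already produced, for a relation R between a source term s and a target term t,
# the lengths of all its derivation paths and the relevant ontology counts.

from typing import Dict, List


def path_length_stats(path_lengths: List[int]) -> Dict[str, float]:
    """AveragePathLength_R, minLength_R and maxLength_R for one relation R,
    given the lengths of all its derivation paths (e.g., [1, 1, 3])."""
    return {
        "average": sum(path_lengths) / len(path_lengths),
        "min": min(path_lengths),
        "max": max(path_lengths),
    }


def relatedness_strength(ontologies_with_s_r_t: int, ontologies_with_s_t: int) -> float:
    """|O_{s,r,t}| / |O_{s,t}|: the share of ontologies mentioning both terms
    that also declare some relation between them."""
    return ontologies_with_s_r_t / ontologies_with_s_t


def strength_relation(freq_r: int, all_rels_s_t: int) -> float:
    """freq(R) / allRels_{s,t}: how popular the particular relation R is among
    all relations found between s and t."""
    return freq_r / all_rels_s_t


# Example corresponding to the derivation paths of lengths 1, 1 and 3 above:
print(path_length_stats([1, 1, 3]))  # {'average': 1.666..., 'min': 1, 'max': 3}
```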
4 Implementation
We implemented our measures using the services of the Watson3 semantic web
gateway. Watson crawls and indexes a large number of online ontologies4 and
provides a comprehensive API which allows exploring these ontologies.
We have also built an algorithm that, using Watson, extracts relations be-
tween two given terms from online ontologies. The algorithm is highly parame-
terized5 . For the purposes of this study we have configured it so that for each
pair (A,B) of terms it identifies all ontologies that contain concepts A' and B' corresponding to A and B and from which a relation between these terms can be derived. Correspondence is established if the labels of the concepts are lexical variations of the same term. For a given ontology ($O_i$) the following derivation rules are used:
– if $A'_i \equiv B'_i$ then derive $A \xrightarrow{\equiv} B$;
– if $A'_i \sqsubseteq B'_i$ then derive $A \xrightarrow{\sqsubseteq} B$;
– if $A'_i \sqsupseteq B'_i$ then derive $A \xrightarrow{\sqsupseteq} B$;
– if $A'_i \perp B'_i$ then derive $A \xrightarrow{\perp} B$;
– if $R(A'_i, B'_i)$ then derive $A \xrightarrow{R} B$;
– if $\exists P_i$ such that $A'_i \sqsubseteq P_i$ and $B'_i \sqsubseteq P_i$ then derive $A \xrightarrow{sibling} B$.
Note that in the above rules, the relations between $A'_i$ and $B'_i$ represent both explicit and implicit relations (i.e., relations inherited through reasoning) in $O_i$. For example, in the case of two concepts labeled DrinkingWater and tap water, the algorithm deduces the relation $DrinkingWater \xrightarrow{\sqsubseteq} tap\ water$ by virtue of the following subsumption chain in the TAP ontology: DrinkingWater $\sqsubseteq$ FlatDrinkingWater $\sqsubseteq$ TapWater.
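The sketch below illustrates these derivation rules on a toy, in-memory ontology. It is not the actual Watson-based implementation: the Ontology data structure, the explicit transitive closure over the class hierarchy and the arrow notation in the output are assumptions made for illustration, whereas the real algorithm queries online ontologies through Watson and obtains implicit statements through reasoning.

```python
# Toy illustration of the derivation rules; not the authors' Watson-based code.

from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Ontology:
    equivalent: Set[Tuple[str, str]] = field(default_factory=set)      # A' ≡ B'
    subclass_of: Set[Tuple[str, str]] = field(default_factory=set)     # A' ⊑ B'
    disjoint: Set[Tuple[str, str]] = field(default_factory=set)        # A' ⊥ B'
    relations: Set[Tuple[str, str, str]] = field(default_factory=set)  # (R, A', B')


def superclasses(c: str, o: Ontology) -> Set[str]:
    """All superclasses of c, followed transitively (assumes an acyclic hierarchy)."""
    direct = {p for (x, p) in o.subclass_of if x == c}
    result = set(direct)
    for p in direct:
        result |= superclasses(p, o)
    return result


def derive_relations(a: str, b: str, o: Ontology) -> List[str]:
    """Apply the six derivation rules to one ontology for the pair (a, b)."""
    found = []
    if (a, b) in o.equivalent or (b, a) in o.equivalent:
        found.append(f"{a} --equiv--> {b}")
    if b in superclasses(a, o):
        found.append(f"{a} --subClassOf--> {b}")
    if a in superclasses(b, o):
        found.append(f"{a} --superClassOf--> {b}")
    if (a, b) in o.disjoint or (b, a) in o.disjoint:
        found.append(f"{a} --disjointWith--> {b}")
    for r, x, y in o.relations:
        if (x, y) == (a, b):
            found.append(f"{a} --{r}--> {b}")
    # sibling rule: a shared named parent P_i
    if superclasses(a, o) & superclasses(b, o):
        found.append(f"{a} --sibling--> {b}")
    return found


# Toy example mirroring the TAP subsumption chain given above:
tap = Ontology(subclass_of={("DrinkingWater", "FlatDrinkingWater"),
                            ("FlatDrinkingWater", "TapWater")})
print(derive_relations("DrinkingWater", "TapWater", tap))
# ['DrinkingWater --subClassOf--> TapWater']
```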
5 Experimental Evaluation
Fig. 1. Precision variation in terms of threshold values set for the length of the derivation path.
In columns two and three of Table 5 we present the average values of the
RelatednessStrength measure for True and False relations respectively. Our hy-
pothesis for this measure was that correct relations will most likely be declared
between highly related terms (i.e., where the value of this measure is high), while
the inverse will hold for False relations. Indeed, this hypothesis is verified by the
obtained numbers as, for all datasets, on average, True relations are established
between terms with higher RelatednessStrength than False ones. We note, however, that the difference between the average values of this measure for True and False relations is rather small, thus potentially decreasing its discriminative
power. Indeed, this is verified when computing the best threshold and the cor-
responding precisions (columns four and five), as the precision values are quite
low, not even reaching 50%.
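For illustration, the following sketch shows one plausible way of computing a best threshold and its corresponding precision for such a measure. The exact evaluation protocol is not spelled out in this excerpt, so the classification rule assumed here (predict True when the measure value reaches the threshold; precision is the fraction of such predictions that are correct) and the sample data are assumptions.

```python
# Hypothetical threshold sweep for a relation-correctness measure.

from typing import List, Tuple


def best_threshold(scored: List[Tuple[float, bool]]) -> Tuple[float, float]:
    """scored: (measure value, gold label) pairs for a set of relations.
    Returns the threshold with the highest precision over relations
    predicted True (measure value >= threshold)."""
    best_t, best_p = 0.0, 0.0
    for t, _ in scored:  # candidate thresholds: the observed measure values
        predicted_true = [label for value, label in scored if value >= t]
        if not predicted_true:
            continue
        precision = sum(predicted_true) / len(predicted_true)
        if precision > best_p:
            best_t, best_p = t, precision
    return best_t, best_p


# Illustrative, made-up data: (RelatednessStrength, is the relation correct?)
sample = [(0.9, True), (0.85, False), (0.8, True), (0.4, False), (0.3, False)]
print(best_threshold(sample))
```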
In the second half of Table 5 we present the results of our experiments for the
StrengthRelation measure. Our hypothesis was that high values of this measure,
corresponding to popular relations, will mostly characterize True relations, while
False relations will be associated with lower values. This hypothesis has been
verified in four out of five datasets, where the average value of the measure is
                     RelatednessStrength              StrengthRelation
Data Set             T     F     Best Thr.  Prec.     T     F     Best Thr.  Prec.
AGROVOC/NALT         0.91  0.88  0.89       45%       0.34  0.34  0.34       36%
OAEI'08 301          0.81  0.75  0.75       41%       0.36  0.04  0.33       42%
OAEI'08 302          0.80  0.75  0.80       46%       0.38  0.11  0.11       38%
OAEI'08 303          0.58  0.50  0.55       43%       0.15  0.11  0.12       53%
OAEI'08 304          0.63  0.55  0.59       46%       0.23  0.15  0.16       56%

Table 5. Average values for True and False relations, best threshold and precision values for RelatednessStrength and StrengthRelation.
lower for False relations than for True relations. The AGROVOC/NALT dataset
is an exception, where both values are the same. We also notice that the difference
between these values is higher than for the previous measure. This has a positive
effect on the discriminative value of the measure, and indeed, we obtain higher
precision values than for RelatednessStrength (up to 56%).
We conclude that, overall, the StrengthRelation measure behaves better than RelatednessStrength, although both are clearly inferior to the derivation-path-based measures discussed earlier. We think this is primarily because, despite its increasing size, the Semantic Web is still rather sparse, which negatively affects any corpus-based measure. These measures could potentially be strengthened by combining them with path-based measures.
6 Related Work
An overview of related work suggests that various approaches are used to eval-
uate relatedness or semantic relations. The output of measures that provide a
relatedness (or similarity) coefficient [3, 14] has been evaluated through theoreti-
cal examination of the desirable mathematical properties [10], by assessing their
effect on the performance of other tasks [3], and mainly by comparison against human judgement, relying on gold standards such as the Miller and Charles dataset [13] or WordSim353. The field of ontology learning has focused on learning taxonomic structures (consisting of hyponymy relations) and other types of
relations [5]. For example, Hearst pattern based techniques have been success-
fully scaled up to the Web in order to identify certain types of relations such
as hyponymy, meronymy [18] or complex qualia structures [5]. The evaluation
measures used to assess the correctness of the learned relations either rely on
comparison to a conceptual structure that plays the role of a gold-standard
(mostly using the measures described in [12]) or on expert evaluation. Note that
the techniques that use Hearst patterns on the Web can implicitly be used to
verify whether a relation is of a given type. As such, these techniques are the
8 http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/wordsim353.html
most similar to the presented work, with the difference that they explore the
Web (a large body of unstructured knowledge) rather than the Semantic Web (a
collection of structured knowledge).
Another important body of work exists in the context of ontology evaluation
(see two recent surveys for an overview [2, 9]), where existing approaches are unevenly distributed across two major categories. On the one hand, a few principled approaches define a set of well-studied, high-level ontology criteria to be
manually assessed (e.g., OntoClean [8], Ontometric [11]). On the other hand,
automatic approaches cover different evaluation perspectives (coverage of a cor-
pus, similarity to a gold standard ontology) and levels (e.g., labels, conceptual
structure). Common to these approaches is that they focus on evaluating an
ontology as a whole rather than on assessing the correctness of a given relation
as we do in this work.
Acknowledgements
Work funded by the NeOn IST-FP6-027595 and X-Media IST-FP6-026978 projects.
References
1. H. Alani and C. Brewster. Ontology Ranking based on the Analysis of Concept Structures. In Proc. of the Third Int. Conf. on Knowledge Capture. ACM, 2005.
2. J. Brank, M. Grobelnik, and D. Mladenic. A survey of ontology evaluation techniques. In Proc. of the Conf. on Data Mining and Data Warehouses, 2005.
3. A. Budanitsky and G. Hirst. Evaluating WordNet-based measures of semantic distance. Computational Linguistics, 32(1):13-47, 2006.
4. R. L. Cilibrasi and P. M. Vitanyi. The Google Similarity Distance. IEEE Transactions on Knowledge and Data Engineering, 19(3):370-383, 2007.
5. P. Cimiano. Ontology Learning and Population from Text: Algorithms, Evaluation and Applications. Springer, 2006.
6. M. d'Aquin, E. Motta, M. Sabou, S. Angeletou, L. Gridinoc, V. Lopez, and D. Guidi. Towards a New Generation of Semantic Web Applications. IEEE Intelligent Systems, 23(3):20-28, 2008.
7. J. Euzenat and P. Shvaiko. Ontology Matching. Springer-Verlag, 2007.
8. N. Guarino and C. A. Welty. An Overview of OntoClean. In S. Staab and R. Studer, editors, Handbook on Ontologies. Springer-Verlag, 2004.
9. J. Hartmann, Y. Sure, A. Giboin, D. Maynard, M. C. Suarez-Figueroa, and R. Cuel. Methods for ontology evaluation. Knowledge Web Deliverable D1.2.3, 2005.
10. D. Lin. An information-theoretic definition of similarity. In Proc. of the 15th Int. Conf. on Machine Learning, 1998.
11. A. Lozano-Tello and A. Gomez-Perez. ONTOMETRIC: A Method to Choose the Appropriate Ontology. Journal of Database Management, 15(2):1-18, 2004.
12. A. Maedche and S. Staab. Measuring similarity between ontologies. In Proc. of the European Conf. on Knowledge Acquisition and Management, 2002.
13. G. A. Miller and W. G. Charles. Contextual Correlates of Semantic Similarity. Language and Cognitive Processes, 6(1):1-28, 1991.
14. S. Mohammad and G. Hirst. Distributional Measures as Proxies for Semantic Relatedness. Submitted for peer review.
15. M. Sabou, M. d'Aquin, and E. Motta. Exploring the Semantic Web as Background Knowledge for Ontology Matching. Journal on Data Semantics, XI, 2008.
16. M. Sabou and J. Gracia. Spider: Bringing Non-Equivalence Mappings to OAEI. In Proc. of the Third International Workshop on Ontology Matching, 2008.
17. M. Sabou, J. Gracia, S. Angeletou, M. d'Aquin, and E. Motta. Evaluating the Semantic Web: A Task-based Approach. In Proc. of ISWC/ASWC, 2007.
18. W. van Hage, H. Kolb, and G. Schreiber. A Method for Learning Part-Whole Relations. In Proc. of the 5th Int. Semantic Web Conf., 2006.