Author manuscript, published in "Electronic Business Interoperability : Concepts, Opportunities and Challenges, Ejub Kajan (Ed.)
(2011) 243-278"
Ivan Bedini
Orange Labs
France
Georges Gardarin
University of Versailles
France
Benjamin Nguyen
University of Versailles
France
hal-00623913, version 1 - 15 Sep 2011
ABSTRACT
In this chapter, we study what semantic technologies can bring to the e-business domain and how
they can be applied to it. After an overview of the goals to be achieved by e-business applications
we detail a large panel of existing e-business standards, with a specific focus on B2B (Business to
Business) and their current modus operandi. Furthermore we also present some of the most
relevant e-business ontologies. We then argue that the use of semantic technologies will simplify
the automatic management of many e-business partnerships. However the construction of
ontologies brings a new level of complexity that might be facilitated by automating the great part
of the generation process. For this we have developed the Janus system, which is a prototype to
help with the automatic derivation of ontologies from XML Schemas, the de-facto format adopted
in e-business standard applications. Differently from existing systems it permits to retrieve
automatically conceptual knowledge from large XML corpus sources and is based on the use of
the Semantic Data Model for Ontology (SDMO) whose advantages are presented in this chapter.
INTRODUCTION
Computer mediated networks play a central role in the evolution of Information Systems. For
example the sales application must interface with the inventory application or the inventory
application must connect to the supplier’s application, or the simple mobile calendar must
synchronize with the professional calendar; all the time, applications require efficient and
effortless integration with others. Nevertheless the integration of enterprises applications still
remains harder than it really should be. Enterprises are typically composed of several applications
that are custom built, acquired from third parties or a combination of both. Moreover it is not
uncommon to find an enterprise whose information is segmented between different instances of
enterprise software and countless departmental solutions. In consequence, the integration of these
application systems becomes a real challenge that requires considerable human effort, especially
if the final goal is to connect applications belonging to different enterprises. This last use case
refers to what is also called Business to Business (or simply B2B).
Communication between applications is mainly governed by standard protocols and
standardized content, as shown in the European e-business report (E-Business W@tch, 2007)
among different solutions applicable to e-business, at least three enterprises out of four that
implement business exchanges with partners, declare implementing applications standards
solutions based on these two technologies (in Europe). The advent of XML along with Web
hal-00623913, version 1 - 15 Sep 2011
Services, and more generically with the Service Oriented Architecture (SOA), has contributed
greatly to the development of such standards-based integration solutions. But the large adoption
of these technologies entails a new fragmentation in applications development. As a result
standardisation addresses only parts of the integration challenge. The frequent claim that XML is
the lingua franca for system integration is somewhat misleading; indeed this statement does not
imply common semantics and its adoption has led to the creation of countless dialects and
languages which cannot be understood and integrated directly by machines. This problem is
reflected in the many existing B2B standards that we present in this Chapter. The analysis we
provide is based on the observation of more than 40 of them.
Following this approach, professional exchange integration scenarios are based on a complete
transformation of business messages at design time. Although this model works and businesses
are able to exchange messages electronically, the effort to produce these standards appears too
high. Moreover, it would be impossible to write a standard specification for every possible
business communication. Especially for (smaller) firms who are unable to contribute to
standardization. For this reason Semantic Web-related technologies are well suited to integrate
the e-business architecture in order to fulfil the standardization approach and achieve the needed
flexibility.
Another aspect that we tackle in the Chapter is the automatic construction of top-level domain
ontologies. As asserted by Euzenat and Shvaiko (Euzenat et al., 2007), the importance of the
generation of such kind of knowledge is fundamental for the improvement of the alignment and
thus integration problem. However most solutions implicitly assume that a reference knowledge
exists in compatible format and semantics, but actually it is often inadequate for the application
domain or difficult to find, if it even exists at all.
To give a point of comparison, we also present the most adopted approach to e-business data
integration. Through this analysis we point out the current architecture limitations and explain
why ontologies are a better approach which leads to a gain in flexibility and dynamicity. In this
sense we provide an overview of schema matching and ontology alignment solutions and we
point out one of the current limitations to their broad adoption and provide a system that
facilitates, by automation, the transformation from the current model to the "next one": from
XML to OWL.
The overall outline of the Chapter is as follows: the first section introduces current e-business
approaches to data integration and we follow with the presentation of more than 40 existing
standards for the B2B and B2C domains. Following this introduction we focus on Semantic Web
related technologies applied to the e-business domain. In the survey we detail some of the most
relevant works related to product classification and we continue with a section focusing on
schema matching and ontology alignment solutions. The last section provides the description of a
system we have implemented to fulfil some of the current shortcomings. We conclude with what
we think to be the most important issues to be developed and provide some directions to follow.
E-BUSINESS SEMANTICS DESIGN
Three main patterns to achieve messages exchanges
To understand how the integration of messages in e-business exchanges works let us consider a
common transaction among a buyer and its supplier. Figure 1 shows the two parties with an
hal-00623913, version 1 - 15 Sep 2011
internal interface used by their "domestic" applications. These interfaces reflect exactly internal
data requirements at semantic and structural level and applications are designed or adapted using
these interfaces. As we argue below most businesses already use a different format, most often a
standard based solution, for their external connections, that we call external interface. This
interface organizes the internal data necessary to the exchange and produces a first conversion
handled by each party to reflect their own application data input/output. If these first conversions
do not correspond exactly, another conversion is required, this time defined accordingly by both
parties.
Figure 1 – Representation of message transformation scenario
We define this approach to e-business exchanges as the
adoption of standards pattern (mutualisation). Here business
requirements are provided by a collegial work defined in a
specific consortium. The realization is a common preliminary
effort that involves several parties, mainly experts of the
specific process and/or the whole domain. It has the
advantage of being a standard and thus of guaranteeing a
certain level of compatibility, durability and reuse of past
Figure 2 – Message content
experiences and knowledge. The resulting definition of
definition adopting standards
business data is a static knowledge representation that can be
changed only with further common effort. Negative points are that it requires a tremendous
standardization effort and quite often several standards coexist for the same requirements. Figure
3 illustrates how this business exchange pattern centralises efforts and makes this approach more
profitable with respect to others, but only in a theoretical perspective because it can become
complex when more standards come into the arena.
hal-00623913, version 1 - 15 Sep 2011
Figure 3 – Message content definition in ad hoc solution
Alternatively consider the ad-hoc or point-to-point approach, where external interfaces and
the corresponding mappings are defined multilaterally during the design time phase of the
collaboration in order to respect the information to exchange. This system shows some kind of
"flexibility", in the sense that it does not present specific constraints: a new design is made every
time. This flexibility on the other hand clearly shows a low degree of reusability and integration
with new partners. The left hand side of Figure 3 shows the mapping between interfaces of two
companies, while the right hand side of the picture highlights what happens when a company has
more business relationships to set up. Interfaces defined by this approach are rarely compliant
among different connections. Therefore the number of conversion needed to have a fully meshed
point-to-point connections between n companies is n(n-1). i.e. for 10 applications to be fully
integrated point-to-point, 90 conversions could be necessary.
Figure 4 – Message content definition according a proprietary solution
Another pattern is the proprietary data model; in this case external interfaces are decided
unilaterally. Typically this approach covers business collaborations with a main contractor in
cooperation with small businesses, such as a big retail group and its suppliers. In this case it is
simpler for the big company to take entire charge of the business requirements design, trying to
adopt the larger predictable requirement, because it often has the more complex system to
manage and to make interoperable with internal processes, while a little company uses a smaller
information system. Setting up such a solution is faster and does not require the complex
harmonization phase, but on the other hand partners who do not adopt the same solution are
forced to develop a new application layer to join the business collaboration. Figure 4 depicts this
business collaboration pattern and draws attention to the fact that there is a party that is forced to
produce mappings and application layers for each new collaboration.
hal-00623913, version 1 - 15 Sep 2011
e-business Standards
Enterprises do not currently publish their interfaces formally in public repositories, which made it
difficult to produce an explicit base of reusable documents. However as shown in the European ebusiness report (E-Business W@tch, 2007) at least three enterprises out of four that implement
business exchanges with partners, declare implementing applications based on e-business
standards solutions (in Europe). Another conclusion drawn by this report is that the difficulty
with e-business and e-government development is that they mainly work vertically by producing
connexions among enterprises belonging to the same business area. Indeed while interoperability
within industries, such as the financial industry, is intended to enable efficient e-business (with
The Single Euro Payments Area – SEPA as an example), interoperability between all industry
sectors for e-business, i.e. between financial institutions and their clients from other industries, is
not optimal. Corporations’ expectations and financial institutions’ demand for value-added
services will, however, continue to rise. This means that the interfaces between them are
becoming increasingly important. These interfaces have not yet been implemented in their final
form, and most of them have not even been defined in detail yet (in terms of standards). Here
developments in standardization can take place to reduce interoperability problems and to benefit
from world wide experiences, but it is hopeless to standardize any possible business
collaboration. Moreover the problem of finding, reusing, harmonizing and adapting the different
standard components is not trivial: until now it has been common practice, including among
standardization organizations, to simply publish business data on a Web page using directories or
even flat files!
Table 1 presents a list of 37 e-business standards, mainly targeting the B2B area. The data
provided by this set of standards is a considerable corpus that gives us a broad view about current
practices. The table lists: the name of the standard body or consortium; column two lists the
business areas that the standard covers; the alliances column informs about declared compatibility
coalitions, already active or expected to come; the fourth column summarizes what kind of
business content is produced by each standard body; the following column details the
formalization of published standards; the standards' downloads column provides the information
of their availability and adoption (public, under a payment, or only for member of the
consortium); the last column just provides a link. The table does not say if the consortium also
provides a specific implementation framework.
We have not inserted in this list the standard bodies that have been a priori excluded because
they are designed for too specific use case. Examples of the overly specific working groups are:
EDItEUR (the international group for electronic commerce in the book and serials sectors), BISG
(Book Industry Study Group) and EPISTLE (the European Process Industries STEP Technical
Liaison Executive), PRODML (Production Mark-up Language and WITSML (Wellsite
Information Transfer Standard Mark-up Language).
As we can see, a lot of business data is defined by standard bodies: a dictionary of core
components, whole messages, business processes, Web Service descriptions, code lists and
EDIFACT messages. In this chapter, only core components, often called Data Dictionary, and
hal-00623913, version 1 - 15 Sep 2011
messages have had our attention and were analysed in detail. Our study shows that XML Schema
is the most widely supported formalism adopted by consortiums and at present it is the de-facto
standard document format. It has overtaken other formats like the "old" EDIFACT and, at least
for the moment, the "new" RDF/OWL format. cXMLi is the only standard to provides simply a
DTD, and not a single RDF/OWL format is officially produced by any consortium.
A growing number of standard bodies are currently adopting the ebXML (e-business XML)
design as basis for their own standards and are aligning their business components to the Core
Components Library (CCL). Among them we can cite: OASIS Universal Business Language
(UBL), Open Applications Group (OAG), EAN-UCC, SWIFT, ANSI ASC X12 and CIDX.
ebXML is a joint effort of OASIS and UN/CEFACT that aims to develop a complete
framework for e-business. The library is prevalently developed by the UN/CEFACT standard
body that counts 15 specific working groups, each one representing a business area such as
Supply Chain, Transport Domain, Customs, Finance, Construction, Insurance, Healthcare,
Agriculture and e-Gov. Another specialised group provides a synchronization of the
documentation and specifications proposed by each group. It finalizes the work with a
harmonized library of the so called CCL, which are the basic components to build B2B messages.
Others groups also define standard business processes and technical implementations. The CCL is
drawn on the UN/CEFACT Core Component Technical Specification (UN/CEFACT TMG,
2003) that provides a simple and powerful UML based data model, to define reusable structure
and semantic content of business messages.
Concerning data presentation, almost all organizations provide a package containing several
documents. It includes specifications, graphics, examples, guidelines, implementation tutorials
and XSD files. Generally XSD files are numerous, at least one for each specific business
message, one for grouping common core components, others for grouping common data type
definitions and code lists. Only few of them provide a specific repository with a detailed view and
discovery system of data components.
B2B Standards' Semantics
In order to understand if XML Schema standards can be processed by semantic engines we have
developed an automaton that extracts all XSD tags and retrieves the words from them. The
automaton uses WordNet (Miller, 1995) to verify that tags are compound words that can be
converted to real words. Once processed, our corpus source is composed of a collection of 26
B2B standards, composed of over 3000 XSD files with more than 170.000 named tags. We feel
that this is largely enough in order to have significant information about B2B business message
description practices and semantics. Our results depicted in Figure 6 show that 71% of tags are
composed of words recognized by the dictionary, 14% contain abbreviations that can be related to
dictionary words, and only 15% of total tags contain unknown words. From the pie-chart we
observe that Mismo is the more prolific standard body, a few others provide between 5 and 10 %
each and around 30 % is shared between the remaining standards. Finally we found that the
whole set of tags is built with only ~3300 different words, that we call the e-business vocabulary.
Moreover we have observed that at semantic level, past a given point, adding more standards into
the process does not change much. This is proven by the experiment we conducted and results
shown in Figure 5. We can see that for both pictures, the line indicating the percentage of words
added by each standard is high only during the first few iterations; afterwards we have only about
5% of the extracted words that are added to the vocabulary.
120,00%
60
,2
2
58
,9
5
59
,2
1
100,00%
10
0,
00
89
%
,8
6%
10
0,
00
We conclude that this corpus can be considered as a basis for a deeper semantic approach in
order to generate the domain ontology. In sections below we provide reasons for using a semantic
approach for the e-business domain and we continue with a contribution to the automation of the
generation of an ontology from XML Schemas.
60,00%
2,
30
4, %
28
0, %
19
2, %
71
4, %
85
11%
,6
5%
1,
61
%
2,
69
%1
3,
92
%
1,
76
1, %
44
1, %
42
2, %
99
0, %
64
6,%
21
5, %
11
0, %
39
0, %
35
5, %
75
7, %
48
%
1,
58
1, %
88
%
20,00%
IFX
F pM
L
OTA
A co rd
Ca t
UBL
A rts
B ME
FIX
Ag XM
L
H R-X
ML
PID X
20 02
2
C ID X
ISO
GS 1
Mism
o
X 12
Twis
t
pap iN
et
edif ra
n ce
eInvo
E tso
0,00%
ice A
T
e bXM
L
S TA
R
OAG
Is
Ad sM
L
6,
74
12
,6
5
9,
26
9,
13
28
,1
1
40,00%
ST
AR
Ag
XM
L
MIS
MO
17
,1
8
ISO IFX
20
02
2
TW
IS
T
HR
-XM
eb
L
Int
er fa
ce
Ad
sM
L
eb
XM
L
Pa
piN
et
PID
X
6,
06
8,
31
7,
84
5,
69
13
,5
0
20
,8
3
21
,0
8
10
,9
4
FIX
AR
TS
Fp
ML
ET
SO
CI
DX
OT
A
X1
2
UB
L
OA
GI
S
AC
OR
D
GS
1
3,
13
11
,5
7
30
,2
5
28
,3
1
35
,9
0
%
hal-00623913, version 1 - 15 Sep 2011
45
,2
8
%
80,00%
Figure 5 – e-business vocabulary generation
FpML 2%
IFX 3%
Acord 3%
OTA 6%
UBL 1%
BME Cat 0%
X12 3%
eInvoice AT 0%
ebXML 7%
STAR 6%
OAGIs 3%
AdsML 1%
Etso 0%
Twist 1%
papi Net 2%
Abbreviations
24120 (14%)
EDI France 1%
Arts 8%
GS1 3%
Identified
Terms
121420 (71%)
HR-XML 8%
AgXML 1%
FIX 2%
CIDX 2%
Mismo 34%
ISO 20022 4%
PIDX 1%
Figure 6 – Standard XML Schemas extraction figures
Unknown Terms
25041 (15%)
hal-00623913, version 1 - 15 Sep 2011
Standard Body
Business Area
Alliances
What
Published
Formats
Standards'
Web Site
Downloads
ASC-X12, XBRL,
HR-XML, eEG7,
CSIO
Dictionary, messages
EDIFACT, XML
Schema, WSDL
registration
www.acord.org
Dictionary, messages
XML Schema
free
www.adsml.org
Dictionary, messages
XML Schema
membership
www.agxml.org
fees
1
ACORD
Association for Cooperative
Operations Research and
Development
Insurance, reinsurance
and related financial
service
2
AdsML
Advertising Standards
Advertising, Graphics
communication
3
AgXML
Agricolture XML
Agriculture supply chain
4
AIAG
Automotive Industry Action
Group
Automotive industry
5
ARTS
Association for Retail
Technology Standards
Retail
Dictionary, Relational Data
Model
XML Schema
payment
(exept for
schemas)
www.nrf-arts.org
6
ASC X12
The Accredited Standards
Committee
Cross industry
Dictionary, messages, EDIfact
messages, BP
EDI X12, XML
Schema
registration
www.x12.org/
7
BMECat
Federal Association for
Material Management,
Purchasing and Logistics
Electronic
Dictionary, Classification
XML Schema and
schemas, Product Configuration,
registration
DTD
price formulas
8
ChemITC
American Chemistry
Council’s Chemical
Information Technology
Center
Chemical
9
CIDX
Chemical Industry Data
Exchange
Chemical
Centre for Studies in
Insurance Operations
Insurance, reinsurance
and related financial
service
10 CSIO
11 ebInterface
ebXML, CIDX,
RAPID
membership
www.aiag.org
fees
www.americanchemistry.com
/s_chemITC/
ebXML, RAPID,
OAGi, ChemITC
Invoice
European forum for energy
Business Information
eXchange
Energy
13 ebXML
e-business XML
Multi area. 15 business
area represented. One
WG with harmonisation ISO
purposes and one for
BP definition
14 eEg7
Insurance, reinsurance
E-business Standards for the
and related financial
European Insurance Industry
service
12 EbIX
www.bmecat.org
Dictionary, Business Processes,
XML Schema
WSDL, RFID codes, messages
free
www.cidx.org
www.csio.com/
Invoice Document
XML Schema
free
www.ebinterface.at/
free
www.ebix.org
XML Schema and
Dictionary, Messages, code lists,
UML, EDIFACT, free
EDIFACT, methodologies
Spreadsheet
www.unece.org/cefact/
www.eeg7.org/
hal-00623913, version 1 - 15 Sep 2011
15 Energistics
Energy
Dictionary
registration
www.energistics.org
16 ETSO
European Transmission
System Operators
Specific electric
transaction
ebXML
Dictionary
XML Schema
free
www.etso-net.org
17 FIX
Financial Information
eXchange
Banks, broker-dealers,
exchanges and
institutional investors
SWIFT (ISO
20022), FpML
Framework with message
protocol, message definition,
codes and Dictionary
XML Schema
registration
fixprotocol.org
18 FpML
Financial Product Markup
Language
Financial
FIX, FIXML
Dictionary, Business Processes,
XML Based
architecture
registration
www.fpml.org/
19 GS1
Global Standards
Supply chain for
Healthcare, Defence,
Transport & Logistics
ebXML
Dictionary, Business Processes,
XML Based
Messages, SOAP Messages…
free
www.gs1.org/
20 HL7
Health Level 7
Health
free
www.hl7.org
21 HR-XML
Human Resources XML
Human Resource
ACORD
Dictionary
XML Schema
free
www.hr-xml.org
22 IFX
Interactive Financial
eXchange (IFX) Forum
Financial
Dictionary, Messages, Web
Services
XML Schema,
WSDL
registration
www.ifxforum.org/
XML Schema,
UML
payment
www.iso20022.org/
registration
www.mddl.org/
free
www.mismo.org
ISO 20022 Universal
23 ISO 20022 financial industry message
scheme
Financial
IFX, OAGi, TWIST Dictionary
24 MDDL
Market Data Definition
Language
25 MISMO
Mortgage Industry Standards Residential, commercial, IFX, ACORD, ASC
Dictionary
eMortgage
X12
Maintenance Organization
26 NAESB
North American Energy
Standards Board
Energy (Gas, electric)
27 OAGi
Open Application Group
integration Standard
Cross industry
28 Odette
Financial
Specific XML framework
XML Schema
membership
www.naesb.org/
fees
ebXML
Dictionary, Web Services,
Messages
XML Schema,
WSDL
registration
oagi.org
membership
www.odette.org
fees
Automotive industry
29 OTA
Open Travel Alliance
Turist
Dictionary, codes, messages
XML, Spreadsheet registration
www.opentravel.org/
30 PapiNet
Paper Industry Network
Paper Industry
Dictionary, messages
XML Schema
free
www.papinet.org/
31 PIDX
Petroleum Industry Data
Exchange
Energy (petroleum
industry)
ebXML
Dictionary, Web Services, Bar
XML, WSDL,
codes, EDI messages, Business
EDIFACT
Process
free
www.pidx.org
32 RAPID
Agricolture
CIDX
Dictionary, Messages, Code
lists, Bar codes
XML Schema,
EDIFACT
free
www.rapidnet.org/
33 RosettaNet
Supply Chain
Management, IT,
Telecommunication
GS1 US, ebXML
Dictionary, Business Processes
DTD, EDIFACT,
XML Schema
registration
www.rosettanet.org
hal-00623913, version 1 - 15 Sep 2011
34 STAR
Standards for Technology in
Automotive Retail
Automotive industry
OAGi, ebXML
35 TWIST
Transaction Workflow
Innovation Standards Team
Supply chain, payment
FpML, FIX, SWIFT Dictionary, Business Process
36 UBL
Universal Business
Language
Invoicing, ordering
37 XBRL
eXtensible Business
Reporting Language
Reporting, accounting
Dictionary, messages, Web
Services
XML Schema,
UML, WSDL
free
www.starstandard.org
XML Schema
free
www.twiststandards.org/
ebXML
Dictionary, messages, Business XML Schema,
Processes
UML, ebBP
free
www.oasis-open.org/
committees/tc_home.php?
wg_abbrev=ubl
UN/CEFACT,
CIDX
Dictionary, messages, formulas XML
free
www.xbrl.org/
Table 1 – B2B Standards
hal-00623913, version 1 - 15 Sep 2011
WHY CREATE E-BUSINESS ONTOLOGIES?
Current methods of business collaborations and relative architectures exhibit a common
characteristic of business data design: they are always pre-formatted to strict and precise
structures and semantics. These methods have the advantage of allowing error-safe execution
management but to the cost of a strong initial effort. We define this approach as the deterministic
method, although no module exists yet to resolve ambiguous situations due to similar, though
different design. Since the Semantic Web Vision (Berners-Lee et al., 2001) which is all about
machines being able to locate and process information on the World Wide Web without the need
for human intervention, the next step to transform a deterministic method to a more dynamic and
automated method, should be the adoption of semantic related technologies. However it is known
that adding new tools adds new complexities and new learning curves, so there needs to be a
concrete business benefit to justify the cost of implementation. Throughout this section we argue
why ontologies should be introduced to the e-business domain.
Firstly we observe that e-business provides an interesting use case for semantic applications
because by its nature it illustrates the problem of different designs and ways of structuring the
same set of concepts producing data heterogeneity problems. The deterministic approach prevents
any possible automation of data interpretation because machines are only called to execute code
and no data description is available for handling reasoning and inferences at run time, even for
simple mismatches. This is the consequence of an approach completely designed for human
understanding. Reasoning on this kind of data is impossible because of the intrinsic limits of its
definition.
How can we combine dissimilarities of semantics, information details, structure and also
cultural approaches in a comprehensive model? How can machines communicate between
themselves reducing human effort?
As we already mentioned the Semantic Web, and particularly ontologies, seem to achieve
good results within the last years. Several people have addressed the specific adoption of such
technologies for the e-business domain. Dieter Fensel in his book, Ontologies: Silver bullet for
knowledge management and electronic commerce (Fensel, 2001b), outlines the key differences
between ontologies and databases schemas which are more close to a “physical data model”.
Moreover he argues that the language for defining ontologies is syntactically and semantically
richer, by its own nature the ontology requires a consensus among several parties and as such it is
more similar to a domain theory rather than a data container.
The document Best Practices and Guidelines (Leger, 2002) focuses on applications of
Semantic Web for electronic commerce on the Internet, and defines a specific list of potential
benefits from its adoption. For instance, it details the development of efficient and profitable
Internet solutions, a meaningfully share of information, that provide a good basis to argue the
benefit of the integration of semantic technologies. At the same time, the authors identify critical
issues and research priorities to transform these potentials into real benefits.
In the paper Potential Advantages of Semantic Web for Internet Commerce, (Zhao, 2003) the
author provides a comprehensive list of twelve points on the potential benefits of adopting
Semantic Web in the domain. Among these twelve categories let us stress the possible
improvement in the integration of applications, information management, filtering of information,
the composition of complex systems, a more flexible standard vocabulary, and serendipity
(unexpected benefits).
Antony B. Coates in his talk (Coates, 2007) is more pessimistic and argues that the Semantic
Web vision still remains a long term goal, and this is the reason why businesses and standard
hal-00623913, version 1 - 15 Sep 2011
bodies still hesitate to introduce it. However he adds some factual reasons linked to the
limitations of current data models and how ontologies can already improve them in the short
term. For instance UML (Unified Modelling Language) is the most widely used modelling
technique in the domain. Indeed UML is intended as a general modelling approach because it
does not only propose data modelling, but also use cases, process flows, state diagrams and also
has an XML interchange format (XMI). However the interchange format has numerous versions
and different tools either use different versions, or use the same version in different ways (too
much flexibility in the format?). In consequence, interoperability is in fact rather difficult.
Another relevant limitation of UML is that for object-oriented reasons in some cases it requires
adding extra classes, which is fine for technical users but it is irrelevant and unnecessary in a
model designed to be used by business experts. This makes diagrams more complex and
confusing than they need to be. Take as an example, illustrated in Figure 7, an intended business
model like “vendor sells to company or government”, where UML forces the creation of common
“purchaser” parent class. OWL adds simplicity, when representing the same model, and allows us
to say that a Vendor sells to a “Company or Government”, without introducing a named parent
class
Figure 7 – Example of UML class diagram and correspondent OWL modelization
Also the UML tools' support for objects/instances (e.g. “a particular car, a particular person”)
is much weaker than RDF/OWL tools, and not really usable for constructing business context
models referencing particular countries, business areas, etc. Moreover when merging models,
RDF/OWL assertions are preserved and also enable detection of inconsistencies, while the UML
merging operation is completely a human task.
In (Anicic, 2005) the author defines an architecture based on Semantic Web technologies to
investigate enterprise application integration (EAI). As an example both enterprise applications
implement two correlated but independent standards for messages exchanges. One is Standards in
Automotive Retail (STAR) and the second is the Automotive Industry Action Group (AIAG) and
both base their interface on a more "horizontal" standard defined by the Open Application Group
(OAG). Their study shows that ontologies and reasoners improve the integration of message
exchanges between companies. Conversely, in their implementation the integration still requires
human intervention, since identification and resolution of semantic and syntactic similarities, is
done by hand.
hal-00623913, version 1 - 15 Sep 2011
Figure 8 – Traditional and Semantic Web-based EAI Standards Architectures
This experience and similarly the architecture presented in the B2Boom work (Kajan, 2009),
show how the semantic mediator improves interoperability problems between worldwide
enterprise applications. However the problem is still strongly related to the ontology
matching/alignment problem, and the need for a specific domain ontology which becomes the
new core question.
The Canonical Data Model
The book Enterprise Integration Patterns by Gregor Hohpe (Hohpe, 2003) clearly formalizes
problems with application integration. He provides an exhaustive list composed of 65 enterprise
integration patterns to be considered when building a system able to manage the whole process of
electronic business exchange. His approach is based on a messaging system. Focusing on those
patterns for data integration, Hohpe suggests different approaches to resolve the problem. One is
to share the same basis of data like using a shared database or adopting the same base of
documents between applications, but these patterns can be at most adopted within a single
company. A second approach is to build a messaging system that translates business documents,
called message translator, which is similar to the point-to-point approach presented above. Yet in
the same approach a complementary pattern suggests using a message mapper which tries to
conceptualize messages as business objects and thus more independent of application data. By
doing so, he adds a pattern including a Canonical Data Model in order to minimize dependencies
from different data formats. In this approach the Canonical Data Model provides an additional
level of indirection between applications' individual format, similar to a pivotal format, like a
"lingua franca" for information systems. This approach is somewhat a mix of the proprietary
approach with the adoption of standard approach seen above. In fact this approach is used by
many industry specific consortia (like PIDX for the petroleum industry, or XBIT for the book
industry) that produce a formal model specific to their use that must be adopted by all
collaborating partners.
In our approach we suggest adopting an ontology when building the specific B2B messages
canonical data model. More than a pivotal format, we want to construct reference background
knowledge to improve application integration on the basis of a message mapper pattern. This
hal-00623913, version 1 - 15 Sep 2011
approach is quite different from other experiences in the e-business domain, such as those
provided by Corcho et al. (Corcho, 2001) and by Hepp (Hepp, 2006), because it targets message
definition rather than a thesaurus like the eCl@ss ontology, since a message is not a well defined
hierarchical set of products. This means that matching messages is a more complex operation
because each message meets a specific action, which is not always the same for different
standards. In other words, in a heterogeneous environment we are not able to say beforehand if
the sending application has messages that correspond exactly to the receiver application
messages, in a one-to-one association, but we can make the hypothesis that the sender application
manages some “concepts” that are similar to those of the receiver application. In this context we
consider a new pattern based on a canonical data model developed as ontology that aims to
correlate these messages with common concepts. A procedure that performs such pattern is
shown in Figure 9 and is as follows: 1) detect what concepts the message conveys; 2) match them
with the canonical model; 3) find corresponding concepts in the target application data model; 4)
chose the messages that fit the requirement best and finally; 5) translate.
However one main problem here is the Canonical Data Model generation, which corresponds
to the development of a domain ontology, or at least a reference ontology common to the whole
B2B domain. The difficulty is that the classical development of this ontology is typically entirely
based on strong human participation, which is a long task, really similar to the realization of a big
standard and delves into a static knowledge representation. In the B2B context, where business
partners can join a collaboration on the fly, the Canonical Data Model should be able to integrate
new knowledge on the fly as well. In the following section we trace the requirements that such
knowledge representation should have to fit into the B2B domain well and complete its assigned
tasks in the pattern defined above.
Figure 9 – Messages translation procedure
Ontology Requirements
There are some general features that have to be respected when building an ontology,
independently of the application domain. For example Barry Smith in his paper (Smith, 2006)
examines the ISO 15926 upper ontology (Batres, 2005) and furnishes a series of principles to
follow when developing a reference ontology, of which we can mention: the principles of
intelligibility; openness; simplicity and re-use of available resources; coherence;
compositional, if two concepts are used to express a third concept, the formers must be included
hal-00623913, version 1 - 15 Sep 2011
into the ontology; singular nouns, the terms of an ontology should be formulated in the singular.
In his analysis he concludes that ISO 15926 is not an ontology because it does not follow any of
these principles and the result is just a coding scheme rather than an ontology.
In a general way we can summarize that ontologies glue together three important requirements
to consider when developing one:
• Ontologies aim at consensual knowledge, their development requires a cooperative process
and normally, for pragmatics reasons (e.g. limiting complexity and dimension) they are
restrained to a specific domain or application.
• Ontologies formalize semantics for information, consequently allowing information
processing by a computer.
• Ontologies implicitly use real-world semantics, which make it possible to link machine
tractable content with meaning for humans.
We next detail some requirements that we have added specifically for the B2B use case, but
they can fit other use cases as well.
Firstly the concept of dynamicity of an ontology for the e-business domain has been already
introduced (Fensel, 2001b) which states that "Ontologies must have a network architecture and
Ontologies must be dynamic". Also (Hepp, 2008b) sustains that otology must be able to grow
dynamically without "bustling" existing applications. From the NeOn project we also find the
concept of networked ontologies (Tran, 2007 and D'Aquin, 2008) where ontologies can be
distributed in a dynamic environment, like a peer to peer network, and applied to an e-business
integration use case. At the same time computational time for discovering the best matches
between several ontologies is expensive, therefore the techniques applied to match elements
should maintain previous discovered alignments and common uses in order to quickly recognize
similarities between concepts and to compute only new information. We capture these
characteristics in the dynamism attribute for a domain ontology. In reality an ontology is a static
knowledge representation. In current literature the ontology dynamic is strictly associated to
ontology evolution/versioning and has been investigated in several papers, like Noy et al. (Noy,
2004) which traces all possible changes that can take place in ontologies. However when dealing
with dynamic ontologies we closely refer to the generation process of the ontology and with its
capacity to introduce new knowledge interactively. To this end, the process should follow an
iterative approach, i.e., conceptual knowledge may be integrated in turn. One condition that the
ontology must respect in this case is the completeness criterion, which means that all matched
concepts must be represented in the ontology, even after a merging operation, and in the simpler
case where a concept has no conflict with other concepts it is simply added to the ontology.
Consequently an ontology is a dynamic characteristic of the domain, thus evolution should not be
equivalent to a classical versioning system, but more to a learning system, including a merge
operation without loss of information and backward compatibility. We call this feature the
dynamism of an ontology.
On top of these requirements, we want to be able to generate and enrich the domain ontology
as automatically as possible. Indeed, even in a specific field, the concepts handled by the
applications can be numerous and the quantity of information which we wish to maintain for each
concept is vast. Solely relying on human management could quickly become impossible: recall
that our example corpus size is thousands of XSD files and all the more concepts.
E-BUSINESS ONTOLOGIES
In this section we present some of the most representative works on e-business ontologies. We
focus on development efforts to produce either upper or domain ontology. Where we recall that
an upper ontology has the purpose to be a reference knowledge base for the whole domain and
thus be useful to induce mappings among concepts of two or more application ontologies, as
described by Guarino (Guarino, 1998). Moreover, as already mentioned above, we distinguish
two kinds of ontologies for the e-business domain: the first one is more related to e-commerce
applications and product description and categorization; while the second is closer to B2B
applications, where messages and semantics are more difficult to categorize in a sole
representation, as the multiple standards presented in Table 1.
hal-00623913, version 1 - 15 Sep 2011
Semantic Web for e-commerce
In the past years several research works have studied the integration of Semantic Web and ecommerce applications. The interest of this kind of semantic improvement for businesses is still
under-estimated. Indeed the generation of semantically annotated documents can greatly increase
the visibility of commercial products when searching on the Web. Traditional Search Engine
Optimization (SEO) tries to put on top of all search results a Web page that matches a keyword
best, but quite clearly, that can work only for one company. Well semantically annotated
document put businesses on top of Web visibility for people who are looking for more precise
products or services independently from the Web page itself. If data integration, thus applications
capable of exchanging information automatically, still requires a lot of effort and new elements
before achieving concrete adoption, the generation of linkable data on the Web requires a lower
investment with a probable earlier return of benefits.
To this end, the Web Ontology for e-commerce produced by Hepp (Hepp, 2008) provides a
complete framework to produce annotated Web pages in a simple manner. It is a good starting
point for businesses that are seeking an early semantic adoption. The framework is based on the
ontology derived from eClass and UNSPSC, namely eClassOWL (Hepp, 2008c) and the similar
ontology unspscOWL, which is awaiting copyright clearance. The so called GoodRelations
framework includes a language that can be used to describe business offers very precisely. It can
be used to create a small data package that describes products and their features and prices, stores
and opening hours, payment options and the like. The framework is also supported by: tools for
creating directly GoodRelations annotated data; plug-ins/Extensions for e-commerce software; a
tool that spots semantic inconsistencies in GoodRelations data beyond the axioms of the
ontology. The result is easy to use: all it takes is to paste the data package into the Web page
using W3C's RDFa format, as shown in Listing 1.
hal-00623913, version 1 - 15 Sep 2011
<!-- BEGIN: RDFa Meta-data for machines -->
<div xmlns="http://www.w3.org/1999/xhtml" xmlns:rdf="http://www.w3.org/1999/02/22rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
xmlns:eco="http://www.ebusiness-unibw.org/ontologies/eclass/5.1.4/#"
xmlns:gr="http://purl.org/goodrelations/v1#"
xmlns:owl="http://www.w3.org/2002/07/owl#" class="rdf2rdfa">
<div class="description" about="http://www.oettl.it/" typeof="owl:Ontology">
<div rel="owl:imports" resource="http://www.ebusinessunibw.org/ontologies/eclass/5.1.4/"></div>
<div rel="owl:imports" resource="http://purl.org/goodrelations/v1"></div>
<div property="rdfs:label" content="RDF/XML data for Techn. Business, based on
http://purl.org/goodrelations/" xml:lang="en"></div>
</div>
<div class="description" about="http://www.oettl.it/#BusinessEntity"
typeof="gr:BusinessEntity">
<div rel="gr:hasOpeningHoursSpecification">
<div class="description"
about="http://www.oettl.it/#OpeningHoursSpecification_Sat_am"
typeof="gr:OpeningHoursSpecification">
<div property="gr:closes" content="12:00:00" datatype="xsd:time"></div>
<div rel="gr:hasOpeningHoursDayOfWeek"
resource="http://purl.org/goodrelations/v1#Saturday"></div>
<div property="gr:opens" content="08:00:00" datatype="xsd:time"></div>
</div>
</div>
<div rel="gr:hasOpeningHoursSpecification">
<div class="description"
about="http://www.oettl.it/#OpeningHoursSpecification_Mon-Fr_pm"
typeof="gr:OpeningHoursSpecification">
<div property="gr:closes" content="18:00:00" datatype="xsd:time"></div>
<div rel="gr:hasOpeningHoursDayOfWeek"
resource="http://purl.org/goodrelations/v1#Thursday"></div>
<div rel="gr:hasOpeningHoursDayOfWeek"
resource="http://purl.org/goodrelations/v1#Wednesday"></div>
<div rel="gr:hasOpeningHoursDayOfWeek"
resource="http://purl.org/goodrelations/v1#Monday"></div>
<div property="gr:opens" content="13:00:00" datatype="xsd:time"></div>
</div>
</div>
...
Listing 1 – Example of GoodRelations RDFa Web page annotation
B2B Ontologies
Conversely from e-commerce applications, in the B2B domain the higher complexity leaves
Semantic Web adoption one step behind. In this specific context semantic systems still have
difficulties to completely satisfy the requirements and the construction of an adequate domain
ontology. In this section we present the most relevant works that have been developed to breach
this gap. Among them, we can find some common points like: i) similarly to e-commerce
ontologies, all of them are developed starting from existing standards; ii) except the Ontolog
Community with the UBL Ontology Project, all others develop a direct transformation from the
XSD format to an ontology language, mainly OWL; iii) B2B ontologies are used to improve
matching and discovery of heterogeneous definition of similar concepts, but none of them
continue to use ontologies as a message exchange formalism directly; iv) all these B2B ontologies
are in a proof of concept phase or ongoing works, but as far as we know, no real business
transactions are formalised with the help of ontology adoption yet; v) the generated ontologies are
applicable to only a specific set of input sources, strictly related to the selected standard. Only the
SET ontology tries to develop a more generic reference model, but still too close to the standards
related to the CCTS model (UN/CEFACT, 2003). This last work confirms our idea expressed
above that the ebXML standard is gathering the largest consensus and this is naturally reflected in
the produced ontologies. Below we present the ontologies derived from the UBL, XBRL,
RosettaNet, ebXML, GS1 and OAGi standards
hal-00623913, version 1 - 15 Sep 2011
UBL Ontologies
The Ontolog Community UBL Ontology Project ii started the design of the UBL ontology in
March 2003. The aim of the project was to develop a formal ontology of the UBL Business
Information Entities as defined by the UBL OASIS technical committee. The ontology is mainly
hand made following the Ontology 101 method (Noy, 2001) and conceived as extensions of the
Suggested Upper Merged Ontology (SUMO) (Niles, 2001). They started formalizing UBL terms
in SUO-KIF (SUO Working Group, 2003) extracting nouns and verbs from a UBL specification
source text, then looked for classes in SUMO for the nouns and verbs extracted and finally
mapped related terms as being either equal, subsuming or instance of. Figure 10 shows a view of
the UBL ontology using Protégé editor.
Figure 10 – Ontolog Community UBL Ontology view
Figure 11 – Proposed UBL Component Ontology
hal-00623913, version 1 - 15 Sep 2011
Another experience targeting UBL Ontology has been developed by Yarimagan and Dogac
(Yarimagan, 2008) from the Middle East Technical University. The so called UBL Component
Ontologyiii is generated automatically by a conversion tool that reads UBL schemas and creates
corresponding class, object properties and existential restriction definitions in OWL.
The Component Ontology template, shown in Figure 11, represents relationships between
entities, types and business concepts. Each xsd:ComplexType and xsd:element declaration is a
corresponding subclass under DataType, TypeDefinition, ElementDeclaration and Concept root
classes of the Component Ontology. Every UBL element represents a unique business concept or
an entity. This allows the definition of multiple elements representing the same business
concept/entity and their correspondence is expressed through their relation to the same Concept
class.
Classes are related to each other through object properties where: Basic UBL types are defined
through extending simple data types such as text, integer, date; the referElement object property
represents the relationship between classes representing UBL aggregate types that refer to a
similar set of elements; the isOfType object property represents the relationship between classes
representing type definitions and element declarations; finally, the representConcept object
property allows the definition of multiple elements that represent identical business concepts and
relate element declaration classes to corresponding business concept classes. Listing 2 shows an
example of the ContactParty concept expressed in OWL following the UBL Component
Ontology representation.
XBRL Ontology Initiative
XBRL is a standard that formalizes financial reports. XBRL is used to define the so called XBRL
taxonomies, which provide the elements that are used to describe information, instances, and give
the real content of the elements defined. Ruben Lara et al. in (Lara, 2006) advocated the use of
OWL as an alternative to XBRL and produced a set of OWL files able to describe DGIiv, ES-BEFSv and IPPvi taxonomies. For this they have developed a generic translation process of XBRL
taxonomies into OWL ontologiesvii so that existing and future taxonomies can be easily converted
into OWL ontologies following the transformation rules defined in Table 2.
The conclusion was that extensions to OWL are required in order to fulfil all the requirements
of financial information reporting, to incorporate mathematical relations and that while its
semantics can be appropriate (e.g. for investment funds classification), they could sometimes be
problematic (e.g. for validation purposes). Finally they validate the adoption of such an ontology
to automate and improve the classification and discovery of funds but do not use them as a formal
format for data exchange.
Parsed taxonomy element
XML complex types
XBRL Tuples XBRL items
XLink links
XBRL Contexts
XBRL units
Root OWL class
DGI ComplexType
Direct OWL subclasses
A subclass for each complex
type
DGI Element
DGI Tuple DGI Item
DGI Link
DGI LabelLink DGI
PresentationLink DGI
CalculationLink
Context (range of properties Subclasses of
is subclass of
ContextElement:
ContextElement)
ContextEntity
ContextEntityElement
(Identifier) ContextPeriod
ContextScenario
Unit (range of properties is Subclass of UnitElement:
subclass of UnitElement)
UnitMeasure
hal-00623913, version 1 - 15 Sep 2011
Table 2 – Summary of parsed taxonomy element translations
<owl:Class rdf:about=" urn:ubl:CAC-2#ContactParty">
<owl:equivalentClass>
<owl:Class>
<owl:intersectionOf rdf:parseType="Collection">
<owl:Restriction>
<owl:someValuesFrom rdf:resource="#ContactPartyConcept"/>
<owl:onProperty>
<owl:ObjectProperty rdf:about="#representConcept"/>
</owl:onProperty>
</owl:Restriction>
<owl:Restriction>
<owl:someValuesFrom rdf:resource=" urn:ubl:CAC-2#PartyType"/>
<owl:onProperty>
<owl:ObjectProperty rdf:ID="isOfType"/>
</owl:onProperty>
</owl:Restriction>
<owl:Class rdf:about="#ElementDeclaration"/>
</owl:intersectionOf>
</owl:Class>
</owl:equivalentClass>
</owl:Class>
<owl:Class rdf:about="urn:ubl:CAC-2#PartyType">
<owl:equivalentClass>
<owl:Class>
<owl:intersectionOf rdf:parseType="Collection">
<owl:Restriction>
<owl:someValuesFrom>
<owl:Class>
<owl:intersectionOf rdf:parseType="Collection">
<owl:Class rdf:about="urn:ubl:CBC-2#WebsiteURI"/>
<owl:Class rdf:about="urn:ubl:CBC-2#EndpointID"/>
<owl:Class rdf:about="urn:ubl:CAC-2#PartyIdentification"/>
<owl:Class rdf:about="urn:ubl:CAC-2#PartyName"/>
<owl:Class rdf:about="urn:ubl:CAC-2#Language"/>
<owl:Class rdf:about="urn:ubl:CAC-2#PostalAddress"/>
<owl:Class rdf:about="urn:ubl:CAC-2#PhysicalLocation"/>
<owl:Class rdf:about="urn:ubl:CAC-2#Contact"/>
<owl:Class rdf:about="urn:ubl:CAC-2#Person"/>
<owl:Class rdf:about="urn:ubl:CAC-2#AgentParty"/>
</owl:intersectionOf>
</owl:Class>
</owl:someValuesFrom>
<owl:onProperty>
<owl:ObjectProperty rdf:about="#referElement"/>
</owl:onProperty>
</owl:Restriction>
<owl:Class rdf:about="#TypeDefinition"/>
</owl:intersectionOf>
</owl:Class>
</owl:equivalentClass>
</owl:Class>
Listing 2 – Excerpt of the UBL Component Ontology
RosettaNet Ontology
Armin Haller et al. (Haller, 2008) developed a Web Service Modeling Ontology
(WSMO) (Lausen, 2005) core ontology expressed in the WSML (De Bruijn, 2005) formal
language for the Supply Chain Management based on the RosettaNet standard. The process of
developing a complete Supply Chain ontology from RosettaNet schemas is carried out in two
steps: i) the core ontology is obtained by a direct translation from XSD to WSML including a
reconciliation phase to hierarchically structure the ontology and to add a proper subsumption
hierarchy; ii) RosettaNet specifications are analysed to identify remaining sources of
heterogeneity in order to model and reference richly axiomatised ontologies, forming the outer
layer in our ontological framework. As the previous experience they defined a set of rules from
the XML representation to the selected ontology language, Listing 3 shows an example of such
mapping from the XML extension element to its corresponding WSML formalism.
<xs:complexContent>
<xs:extension base="uat:IdentifierType">
<xs:sequence>
<xs:element name="ProductName" type="xs:string" minOccurs="0">
<xs:element name="Revision" type="xs:string" minOccurs="0">
</xs:sequence>
</xs:extension>
</xs:complexContent>
hasIdentifierType ofType extIdentifierType
hal-00623913, version 1 - 15 Sep 2011
concept extIdentifierType subConceptOf uat#IdentifierType
ProductName ofType (0 1) _string
Revision ofType (0 1) _string
Listing 3 – Example of Complex extension type mapping to WSML
Authors argued that their ontology is able to resolve most of the heterogeneity problems
between different RosettaNet implementations that are not structurally and semantically covered
by the RosettaNet specification.
The SET Harmonized Ontology
The SET Harmonized Ontology is an initiative of the OASIS Semantic Support for Electronic
Business Document Interoperability (SET) Technical Committeeviii. The purpose of this SET TC
deliverable (Dogac, 2009) is to provide standard semantic representations of electronic document
artefacts based on UN/CEFACT Core Component Technical Specification (CCTS)
(UN/CEFACT, 2003) and hence to facilitate the development of tools to support semantic
interoperability. The basic idea is to explicit the semantic information that is already given both in
the CCTS and the CCTS based document standards in a standard way to make this information
available for automated document interoperability tool support.
The resulting ontologyix provided by Asuman and Kabak is currently the most valuable effort
in describing an upper ontology for the real B2B domain. The SET Harmonized Ontology
contains about 4758 Named OWL Classes and 16122 Restriction Definitions. Their approach is a
semi-automatic derivation of an ontology from the business data components defined by OAGIS,
GS1, UBL and UN/CEFACT CCL, which are all B2B standards based on the CCTS
specification. Another point of interest is that it is one of the rare experiences applying a strong
adoption of Semantic technologies, like DL reasoners, SPARQL, OWL and OWL queries to
derive a harmonized ontology. This can be viewed as similar to a merging operation.
Without delving into details Figure 12 shows an overview of the SET upper ontology. The
overall process to get the harmonized ontology is as follows: i) first specify an upper ontology,
which is an OWL description of the CCTS specification; ii) transform input source documents
into schema ontologies, which are afterwards mapped manually to the defined upper ontology
format and thus automatically transformed to OWL compliant files; iii) define four normative
upper ontologies, one for each of the UBL, GS1 and OAGIS® 9.1 standards separately, while the
UN/CEFACT CCL is considered as upper ontology of reference. While creating these ontologies,
hal-00623913, version 1 - 15 Sep 2011
the relations with the CCTS upper ontology classes are also established. Finally, with the help of
additional heuristics, using a Description Logics (DL) reasoner, a Harmonized Ontology is
computed.
The resulting ontology and heuristics enable the discovery of equivalences and subsumptions
of structurally similar document artefacts between two document schemas. When translating such
document artefacts, automatically generated XSLT rules are used, that produce query templates
(SPARQL and Reasoner based queries) to facilitate the discovery and reuse of document
components.
The advantage of this approach is twofold. Firstly it shows the powerful benefits of semantic
technologies. Even with a more complex syntax description, a reasoner is able to autonomously
discover several useful subsumptions and equivalences. It also shows that it is possible to provide
a first real normative upper ontology formalization that could lead to a new era of B2B standard
ontologies development.
However a strong and somewhat limitative hypothesis is that input sources must be compliant
with the CCTS specification. This is not applicable to the whole domain and thus prevents a
larger adoption of this solution. It is also unclear how the different semantics of input elements
are matched. For example, as presented in Figure 13, it is not clear how the NameAndAddress
class has been associated to the owl Address class. For instance an automatic matcher should have
to choose between the classes Name and Address, which is not the case in the resulting ontology.
Another example is the detection of the semantic equivalence between Postal_zone and Postcode,
which is not explained.
To conclude, this approach also lacks the definition of a semantic matcher and we argue that
the integration of such a module could improve resulting correspondences and help with possible
ambiguities.
Figure 12 – An Overview of SET Upper Ontologies and Document Schema Ontologies
hal-00623913, version 1 - 15 Sep 2011
Figure 13 – The Semantic Equivalences among the BBIEs of UBL-Address, CCL-Structured
Address and GS1-NameAndAddress Discovered through the Harmonized Ontology
JANUS: AUTOMATIC ONTOLOGY BUILDING SYSTEM FROM XML
SCHEMAS
Over the past ten years, the Semantic Web wave has shown a new vision of ontology use for
application integration systems. Researchers have produced several software tools for building
ontologies (like Protégé or OntoEdit) and merging them two by two (like FCA Merge or Prompt)
or producing alignments (like S-Match, OLA, Mafra, H-MATCH, COMA). Nevertheless these
solutions, as well as adopted ontology building methodologies, are mainly human driven or
sometimes assisted by semi-automatic software tools. Furthermore, all of them make reference to
either an upper or domain ontology to improve the run-time automatic matching that often is
inadequate, if it exists at all.
Limitations to their adoption for integration of enterprise applications, among others reasons,
are: (i) the lack of tools capable of extracting and acquiring information from a large collection of
XML files (the “de-facto” format for applications information exchange definition); (ii) the
complexity of aligning and merging more than two sources, a complex task excessively
consuming of computational time; (iii) the difficulty of validation based on background
knowledge hard to produce and maintain.
The aim of this section is to introduce Janus, the software that we have developed. This
system is an implementation of our approach to ontology generation integrating SDMO, a
Semantic Data Model for Ontology, extracting information from XML Schemas and capable of
providing a solution to the limitations described above. Indeed as we show with our experimental
results, it is able to automatically generate and maintain a collective memory resource that
facilitates the discovery of alignments when matching concepts in a given domain with
satisfactory results.
The section is outlined as follows. Firstly we analyse the matching problem as it is seen by
systems aiming the integration of data. As consequence of the shortcomings of the studied
architectures we propose a semantic data model as solution to solve the multiple inputs
integration problem. We finish with the overall presentation of our prototype.
hal-00623913, version 1 - 15 Sep 2011
The Matching Problem
Even when input sources are either well formed ontologies or XML Schemas, definitions can be
similar but also heterogeneous, semantics different, and thus the discovery of correspondences is
probably the most basic, and at the same time the most challenging task that must be conducted.
The matching problem is often related to ontology learning and matching and it has been
largely investigated in literature. Among them, we can cite the paper by Mehrnoush and
Abdollahzadeh (Mehrnoush, 2003) which proposes a complete framework for classifying and
comparing ontology learning systems. The authors propose six main categories (called
dimensions) as follows: elements learned (concepts, relations, axioms, rules, instances, syntactic
categories and thematic roles); starting point (prior knowledge and the type and language of
input), pre-processing (linguistic processing such as deep understanding or shallow text
processing); learning methods including also an evaluation about the degree of automation
(manual, semi-automatic, cooperative, full automatic); the result (ontology vs. intermediate
structures and in the first case the features of the built ontology such as coverage degree, usage or
purpose, content type, structure and topology and representation language); and finally
evaluation methods (evaluating the learning methods or evaluating the resulted ontology).
We share the most part of the conclusion of their analysis, especially regarding the importance
of input sources, which of course are essential to the automation process and highly influence the
result of the final learned ontology. In fact ontology learning systems extract their knowledge of
interest from inputs, which can differ by type and language (e.g., English, German or French).
Types can be structured data like already existing ontologies, some schemata or lexical
semantic nets such as WordNet. Other sources for ontology learning systems are semi-structured
data such as dictionaries, HTML and XML schemas and DTDs (document type definitions),
which probably constitutes in the Web environment the most hot topic today. Finally, the most
difficult type of input from which to extract ontological knowledge are the unstructured ones
(e.g., free text). Tools that learn ontologies from natural language exploit the interacting
constraints on the various language levels (from morphology to pragmatics and background
knowledge) in order to discover new concepts and stipulate relationships between concepts
(Aussenac, 2002). Finally the authors of (Mehrnoush, 2003) assert that the first two kinds of input
data are more appropriate to build ontologies for the Semantic Web, thus with DL implications,
while the latter is more adapted to build more general lexicons such as taxonomies or dictionaries.
They also identify some open problems to be considered to improve the field, in particular:
(i) the way to evaluate ontology learning systems, currently evaluated only on the basis of their
final results; no measure is defined for specific parts of the learning process proving the accuracy,
efficiency, and completeness of the built ontology. (ii) Full automation of ontology learning
process is not described yet and integrating successful modules to build complete autonomous
systems may eliminate their weaknesses and intensify their strengths. (iii) At last, moving toward
flexible neutral ontology learning method may eliminate the need for reconstruction of the
learning system for new environments.
Moving forward the automation process to enter in more technical surveys, in (Buitelar, 2005)
authors provide a comprehensive tutorial and an overview on learning ontology from text. Rahm
hal-00623913, version 1 - 15 Sep 2011
et al. (Rahm, 2001) present an overview on techniques used for the schema matching automation.
Euzenat et al. in (Euzenat, 2004) provide a detailed overview and classifications of techniques
used for ontology alignment and a state of the art on existing systems for ontology
matching/alignment, probably the best known software at present. From the book Ontology
Matching by Euzenat and Shvaiko (Euzenat, 2007), which probably represents the most complete
work in the current literature around the matching theme, not only techniques by also theoretical
aspects and definitions involved into the matching process as well as their evaluation measures
are presented. As last, let us cite the survey presented by Castano et al. (Castano, 2007), which
provides a comprehensive and easily understandable classification of techniques and different
views of existing tools for ontology matching and coordination.
Moreover into the area of data and knowledge management we can find interesting surveys in
(Do, 2002; Doan, 2002; Ehrig, 2004) and still more focused on semantic integration in (Noy,
2004b; Shvaiko, 2005).
All these works provide a real detailed overview of the matching problem, ontology
generation tools and aspects of possible automation, at least for some specific tasks. As such, it is
not the scope of this chapter to provide an overview of them. Indeed, even if the frontier between
matching and generation tools is not always clearly definable, we can say that except the first one,
all referenced papers mainly focus on the matching step but do not cover the whole ontology
automation process, that is finally what we target with the system we have implemented. We can
also add that the matching problem is probably the most challenging part and this is the reason
why we analyse it more deeply below.
Known Matching Features
Classical matching approaches lack efficiency. This can be explained by three main reasons: (i)
the algorithm computational complexity order; (ii) the fact that algorithms compute measures
between every couple of items of ontologies to map, even when they do not have anything in
common; (iii) the lack of memorization: a comparison is done every time two items are met,
regardless of what has already been calculated.
As we can see from existing works, many researchers in the Semantic Web and Knowledge
Engineering communities agree that discovering correspondences between terms in different sets
of elements is a crucial problem. Sometimes two ontologies refer to similar or related topics but
do not have a common vocabulary, although many terms they contain are related. So this
complex task requires the application of several algorithms (each algorithm realizes at least a
matching operation) and once again we lose efficiency. Consider looking for correspondences
between sets of elements more complex than that presented in the example above: Figure 14
illustrates a non exhaustive list of possible mismatches that can be established between the
definitions of a same high level concept expressed in XML Schema format. For instance the
example shows two different vision of the concept address as defined by two B2B standards,
OAGIS and Papinet. It is clear that although both of these standards are based on the "upper"
standard UN/CEFACT CCTS, there are considerable differences in the resulting document
fragments. This illustrates why we need more than one algorithm to discover possible similarities
between two sets of elements. To this end we provide a first classification of the nature of these
algorithms categories: syntactic, semantic, and structural. A good process for matching discovery
should cover at least these three categories and also implement a combination of them in order to
improve results.
hal-00623913, version 1 - 15 Sep 2011
Figure 14 – Example of possible mismatchings between two XML Schemas definitions
The Matching Process
As already mentioned above matching problems can be approached from various standpoints and
this fact is reflected by the variety of the definitions that have been proposed in the literature. We
observe that there are some recurring terms often leading to confusion and thus producing
overlaps on the process definition. Learning, matching, anchoring, alignment, transformation,
mapping and merging are almost used to this purpose. Figure 15 proposes a view about the role
and sequence that each of these common terms play in the ontology "life-cycle" process.
The Learning phase aims to extract knowledge information from sources handling their
different representations. As output it provides a formal representation, sometimes an ontological
view of inputs. From here we assume that we have two or more input ontologies. This term often
refers to a larger operation that comprises the final ontology generation, but we prefer to use this
term just to highlight the fact that ontological knowledge is mainly retrieved, thus learnt, at this
stage of the process. The Matching phase realises similarity detections between input entities
executing one or more algorithms. As described previously, the "matcher" (the application
realising this phase) computes the algorithms for each couple of input entities and provides as
output a list of the best matches found, selected on the basis of parameters. The following
Alignment phase tries to select the best set of correspondences between all those provided by the
matcher. It permits to combine the different similarity algorithms executed previously and to
provide a uniform view of correspondences, normally without inconsistencies. At this stage the
match can be also contextualized, choosing a match rather than another because of heuristic
practices or if an existing upper ontology for the concerned domain suggests so.
Finally, depending on the purpose, alignments can be used to merge input ontologies
(Merging phase) or to transform instances of an ontology into another (Mapping phase).
hal-00623913, version 1 - 15 Sep 2011
Figure 15 – Ontology learning, matching, alignment, mapping and merging phases
This disambiguation enables us to situate the problem that we want to address well.
The Matching process considers only the matching phase described above. In our analysis we
argue that this is a core part that: i) mainly contributes to the computation time and; ii) is the most
generic and thus reusable part. These are the main reasons that conducted us to look for a scalable
solution to improve the whole ontology generation process in this phase.
Figure 16 – Matching process details
As shown in Figure 16 the matching phase can be split in different steps. The Retrieve step
takes as input information extracted from sources, and transforms this knowledge in an internal
ontology matching format, sometimes called reference model. In its simpler form it is a list of
terms representing semantics of input entities, and in other cases it can be a more complex Galois
lattice representation like in (Stumme, 2001). Subsequently the Match step is able to execute
similarity algorithms and Formalizes results with a correspondent confidence value for each
match found. Some algorithms, like synonymy detection, can also require external resources
(e.g.: WordNet or electronic dictionaries). Thresholds and some heuristic are used in the Prune
step to filter sets of matches. Techniques for matching sources are really numerous and the survey
published in (Euzenat, 2007) is a good reference for discover and compare them.
The Semantic Data Model for Ontology
In this section, we describe the Semantic Data Model for Ontologies (SDMO) defined to provide
an organized model to record as much knowledge as possible for matching systems. The goal is
improving the concept correspondences similarity detection. The improvement that we target with
this model is the machine capability to recognise similar concepts faster, on the basis of their
hal-00623913, version 1 - 15 Sep 2011
relationships and consequently the ability to adopt more efficient algorithms to refine mappings,
thus overcoming the matching problem seen above.
The basic representation of SDMO is data about concepts and relationships. Such objectbased modelling allows a high level of data definition independent from the different
representations. A second basic precept of our model is that many relationships are functional
like they are in nature. These functional relationships are often called has attribute in models like
the Relational Model and Entity-Relationships, or functional property in OWL. In our model
these relations are part of the set of what we call structural relationships which also provide
hierarchical mechanisms for building object types out of other object types. For example, address
and postal address that might be the aggregation of street, city, and country.
A third basic precept is the semantic relationship, which specifies the fact that some concepts
share a common meaning, like synonyms.
A fourth basic element of the model is the set of syntax or linguistic relationships. The aim of
this kind of concept relations is to maintain the link among concepts sharing a similar name, like
postcode and postal code attributes, or names sharing the same stem. This kind of relation brings
us more inside the characteristics that we want to give to the model. These are not natural human
precepts that we find in other models for the real-world representation, but rather a natural feature
for matchers, needed to compute an operation.
The fifth and final basic element is a link to the original input. A matcher usually normalizes
initial labels and during this operation some little details can be lost; yet it is important to
maintain the link with the source in order to be able to regain the original context or to produce a
mapping. In our model these relations are part of the set called source relationships.
!
Figure 17 – SDMO Concept relationships overview
Figure 17 shows the overall view of SDMO concept relationships. A SDMO concept is the
constituent entity of the model and is defined as a quadruple:
c = < l, R, S, f >
Where:
• l is a set of words, simple or compounds, that best represents the name of the concept.
Among them we also define a preferred label as the best representative label as concept
•
•
hal-00623913, version 1 - 15 Sep 2011
•
name (e.g.: having equivalent concepts named geographical_coordinate and coordinate,
they can be merged to form the same concept and the final name can be one of them)
R is the set of relations between concepts (all seen above)
S for Source, is the set of originating instances of a concept (not to be confused with
instances as individuals in OWL representations)
f is a frequency and/or rank measure
Moreover, similarly to UML and many other models, in SDMO we defined three basic kinds
of concepts, also called nature of the concepts, but a concept can be of more kinds at the same
time or change all over its "life in the model". No mandatory relationships are required
beforehand for a concept, but depending on them, we can determine dynamically its nature. These
three types are: class, property (or attribute) and printable-type.
The main concept type is called class and corresponds intuitively to non atomic concepts, thus
to concepts characterised by a finite set of attributes. The second basic nature of a concept is the
property (or attribute). It represents either a specific and atomic characteristic of a class or also a
role that semantically redefines another concept class, like an UML association (e.g. address that
becomes a residence for a person or a delivery address in another context). The former typically
corresponds to concepts in the world (of data exchange) that have no underlying structure. Simple
examples are first name and last name of a person, or city name, etc. The latter and most basic
concept type in the SDMO structure is the printable type. This kind of concept can be also
considered as the type that serves as the basis for application inputs and outputs. It can be a
conventional basic type, such as string or integer or a more complex representation of a printable
data type like measure, amount, or text that in turn are directly linked to basic types.
We stress out the fact that a concept can be of different types at the same time, they are not
strictly closed to be of only one nature at once, but depending on their behaviours they can be
seen for example as a class or a property. For instance a class property SDMO concept is
allowed and is a non atomic concept, thus a class, which is also property for another concept
class.
We have also defined a SDMO graphical representation that provides a global view of
concepts organization with their relationships. Figure 18 illustrates the graphical syntax we use to
describe a SDMO schema.
hal-00623913, version 1 - 15 Sep 2011
Figure 18 – SDM Graphical Representation
Implementation
Janus is a system that enables the automatic generation of dynamic ontologies from XML
Schemas. It is an implementation of the system described throughout previous Sections. Figure
19 shows the overall architecture of Janus.
Figure 19 – Janus overall architecture
The extraction task represented by the Extract arrow and Normalize rectangle supplies the
knowledge needed to generate the ontology. This knowledge is merely composed by candidate
concepts, properties, printable types, relationships of different nature and at the same time it
contains counters and ranks for each element. Implemented techniques for knowledge acquisition
are a combination of different types, such as: NLP (Natural Language Process) for morphological
and lexical analysis, association mining for calculating term frequencies and association rules,
semantics for finding synonymy, and clustering for grouping semantic and structural similar
concepts. We call XML Mining the adaptation of these techniques applied to XML schemas.
hal-00623913, version 1 - 15 Sep 2011
XML Mining is used to parse sources to extract XML constructs and to process XML tags
declarations. In addition it also includes a pre-matching treatment that aims to mutualise
element's processing that are clustered in a Galois Lattice and Formal Concept Analysis based
form. This treatment provides as output a pre filled model ready for automatic analysis.
The following step is build semantic network represented by the corresponding block. This
step finalizes the model integrating information coming from external sources, like other existing
ontologies or thesaurus. Moreover at this stage we do not look at similar concepts to be merged,
but only execute matching algorithms to collect as much correspondences as possible among
them. All these connections are stored and maintained in the model in order to be quickly
detected and not recalculated in future integrations.
The Analysis step aligns correspondences and looks for equivalent concepts to be integrated.
This step establishes the best similarities and analyses the model to unveil new possible relations
and correspondences not directly detected by matching algorithms and computes frequency and
rank measures.
The Generation step finalizes the meta-model used by the tool into a final semantic network.
The final model can be serialized in OWL built by the Transform module. Finally the Build
Views module derives useful views from the network provided to users.
The implementation phase of the prototype has been more complex than expected in the
beginning and this for a lot of more or less little problems we met. Problems generally were not
directly linked to the system approach but more of a technical nature. Like the lack of matching
API adequate to our scope, the lack of software capable of extracting information from XML
schemas rather than text corpus or OWL and last but not least the lack of reference ontologies for
tests and developments. Despite these numerous problems we have been capable of validating the
initial hypothesis that the model we designed to maintain a sort of memory of concepts
correspondences is realisable and its implementation is scalable. It can manage large input
sources and new sources can be added incrementally. Current problems are more linked to
implementation issues and a good compromise between storage and real time requirements can
resolve the most part of them. In the first case if we target a system with low physical space
requirement we can store only information extracted. Conversely if we target run time
applications we can store the whole generated model that provides very fast similarity detection
with acceptable precision. Thus, the system coupled with advanced matching systems can provide
a very useful support to run time data integration.
More detail on the implementation and results can be found in (Bedini, 2008; Bedini, 2008b;
Bedini, 2008c; Bedini, 2010).
Perspectives
The system we have developed is only a part of the whole architecture to achieve run-time data
integration with the adoption of semantics technologies. Semantic data must be produced at the
source and conceived as such, their direct transformation is still to hard to be completely and
safely automated. Nevertheless our system provides an essential part of the architecture that right
now has been misled, the lack of domain ontologies. Although it has been designed for a more
general use-case, its behaviours have been profiled over the e-business domain. Its early adoption
can be seen as a facilitator to the fast transformation of existing e-business XML documents into
a skeleton of an ontology to quickly build and test a semantic matcher for the domain. Indeed it is
quite fast and is only costly in computing resources during the generation of the model
calculations. The graphical representation is very powerful and with a lot of visualizations options
and visual measures (like importance of an edge or a concept with respect to others) are available
and of simple understanding for both human and software implementations. These are the reasons
why we believe that our system achieved the initial requirement to be able to extract very useful
knowledge from a large set of XML Schemas belonging to a common domain that can be simply
translated into an ontology.
Beyond what we have implemented another general trend of earlier semantic adoptions in the
domain are related to the SOA (Service Oriented Architecture) paradigm. Indeed the growing
number of services available on the Web and the tendency to split legacy software in a
choreography of services require a more advanced description of both data and services. Again
the adoption of Semantic technologies (i.e. OWL with SA-WSDL (Farrel, 2007) formalisms) is
the best alternative to follow for the next few years.
hal-00623913, version 1 - 15 Sep 2011
CONCLUSION
In this chapter we presented the e-business domain, with a more specific focus on the B2B
domain, the requirements that it currently imposes on companies and their information systems in
order to support business messages exchanges. Through this analysis we pointed out the current
architecture limitations and explained why ontologies are the best approach to follow to gain in
flexibility and dynamicity.
Nevertheless facts show that it is still not the case and e-business standards, which are the
most adopted solutions for e-business, do not define standards as ontologies but only as XML
Schemas. Although it is already a respectable improvement with respect to older systems like
EDIFACT, they still require relevant human effort to be operational.
In this sense we have provided an analysis of e-business ontology requirements and
summarized them into the need of a dynamic knowledge that can be built incrementally.
Afterwards we have presented some well-known ontologies for e-business. Despite the interest of
these works, real businesses still seem hesitant to use them in their implementations. We have
identified two main topics to develop, one is the definition of an enterprise semantic repository,
and the other one is a way to facilitate the automation of business document mapping. Finally we
have presented a system that facilitates, by automation, the transformation from the current model
to the "next one", from XML to OWL, believing that the existing gap can be breached by
improving this direction.
After a large overview of e-business standards and their derivate ontologies, we have seen that
existing systems aiming at data integration are strictly related to ontology and matching systems.
Research in this area is active and some frameworks dedicated to the e-business domain are
already appearing. The current lacking we have identified is the need for domain ontologies in
order to provide the necessary reference knowledge to improve existing matching systems.
Moreover, the adoption of Semantic Web technologies to business messages exchanges has an
essential requirement, which is that messages must be semantically well defined using ontologies.
To this end we have detailed a first prototype that provides a general viable solution.
REFERENCES
Anicic, N., Ivezic, N., & Jones A. (2005). Semantic Web Technologies for Enterprise Application
Integration. In Proceedings of the International Journal ComSIS Vol.2, No.1.
Aussenac-Gilles, N., & Maedche, A. (2002). Workshop on Machine Learning and Natural
Language Processing for Ontology Engineering. In conjunction with the ECAI'02 conference,
Lyon, France, July 22-23, 2002.
Batres, R., West, M., Leal, D., Price, D., & Naka, Y. (2005). An Upper Ontology based on ISO
15926. In proceedings of European Symposium on Computer Aided Process Engineering
(ESCAPE 15). Barcelona, Spain. June 2005.
Bedini, I., Nguyen, B., & Gardarin, G. (2008). Janus: Automatic Ontology Builder from XSD
files. Developer track at 17th International World Wide Web Conference (WWW2008). Beijing,
China, April 21 - 25, 2008
hal-00623913, version 1 - 15 Sep 2011
Bedini, I., Nguyen, B., & Gardarin, G. (2008b). B2B Automatic Taxonomy Construction. In
Proceedings 10th International Conference on Enterprise Information Systems. 12 - 16, June 2008
Barcelona, Spain.
Bedini, I., Gardarin, G, & Nguyen, B. (2008c). Deriving Ontologies from XML Schema. In
Proceedings of the Entrepôts de Données et Analyse en Ligne (EDA), France, June 2008. RNTI,
Vol. B-4, 3-17 (Invited Paper).
Bedini, I. (2010). Deriving ontologies automatically from XML Schemas applied to the B2B
domain. Doctoral dissertation, University of Versailles, France. (Available from:
http://bivan.pagespro-orange.fr/Janus/index.html)
Berners-Lee, T., Hendler, J., & Lassila, O. (2001). The Semantic Web. Scientific American,
284(5), pp 34-43.
Buitelaar, P., Cimiano, P., Grobelnik, M., & Sintek, M. (2005). Ontology Learning from Text
Tutorial. ECML/PKDD 2005 Porto, Portugal; 3rd October - 7th October, 2005. In conjunction
with the Workshop on Knowledge Discovery and Ontologies.
Castano, S., (Ed.). (2007). State of the Art on Ontology Coorination and Matching. BOEMIE
Project. Deliverable 4.4 Version 1.0 Final, March 2007.
Coates, A.B. (2007). Semantic data models and business context modeling. Invited speaker at
XML2007. Boston, Massachusetts, USA. 3-5 December 2007.
Corcho, O., & Gomez-Perez, A. (2001). Solving integration problems of e-commerce standards
and initiatives through ontological mappings. In Proceedings of the Workshop on e-business and
Intelligent Web.
D’Aquin, M., Haase, P., & Gómez-Pérez, J.M. (2008). NeOn - Lifecycle Support for Networked
Ontologies: Case studies in the pharmaceutical industry. In proceedings of European Semantic
Technology Conference. September 2008, Vienna, Austria.
De Bruijn, J., & Lausen, H. (2005). Web Service Modeling Language (WSML). W3C Member
Submission 3 June 2005. Available from: http://www.w3.org/Submission/WSML/
Do, H., & Rahm, E. (2002). COMA - A System for Flexible Combination of Schema Matching
Approaches. In Proceedings of 28th International Conference on Very Large Databases (VLDB
2002), Hong Kong, China.
Doan, A., Madhavan, J., Domingos, P., & Halevy, A. (2002). Learning to Map between
Ontologies on the Semantic Web. In Proceedings of the 11th International World Wide Web
Conference (WWW 2002), Honolulu, Hawaii, USA, pp. 662–673
Dogac, A., & Kabak, Y. (2009). Semantic Representations of the UN/CEFACT CCTS-based
Electronic Business Document Artifacts. Draft OASIS Profile. Retrieved November 15, 2009.
E-Business W@tch observatory. (2007). The European e-Business Report, 2006/07 edition. 5th
Synthesis Report of the e-Business W@tch, on behalf of the European Commission's Directorate
General for Enterprise and Industry. (http://www.ebusiness-watch.org)
Ehrig, M., & Sure, Y. (2004). Ontology Mapping - An Integrated Approach. In Proceedings of
the 1st European Semantic Web Symposium, Heraklion, Greece, Springer Verlag, pp. 76–91
hal-00623913, version 1 - 15 Sep 2011
Euzenat, J., (Ed.). (2004). State of the Art on Ontology Alignment. Knowledge Web Deliverable
D2.2.3. 2004.
Euzenat, J., & Shvaiko, P. (2007). Ontology matching. Springer-Verlag, Heidelberg (DE).
Farrel, J., & Lausen, H. (2007). Semantic Annotations for WSDL and XML Schema. W3C
Recommendation 28 August 2007.
Fensel, D., Ding, Y., Omelayenko, B., Schulten, E., Botquin, G., Brown, M., & Flett, A. (2001).
Product Data Integration in B2B E-Commerce. IEEE Intelligent Systems, vol. 16, pp. 54-59.
Fensel, D. (2001b). Ontologies: Silver bullet for knowledge management and electronic
commerce. Springer-Verlag, Berlin (DE).
Gruber, T. (2008). Encyclopedia of Database Systems. Ling Liu and M. Tamer Özsu (Eds.),
Springer-Verlag.
Guarino, N. (1998). Formal Ontology and Information Systems. In Proceedings of International
Conference on Formal Ontology in Information Systems (FOIS). Trento, Italy, 6-8 June 1998.
Amsterdam, IOS Press, pp. 3-15.
Haller, A., Gontarczyk, J., & Kotinurmi, P. (2008). Towards a complete SCM ontology: the case
of ontologising RosettaNet. In Proceedings of 23rd Annual ACM Symposium on Applied
Computing, pp. 1467-1473.
Hepp, M. (2006). Products and Services Ontologies: A Methodology for Deriving OWL
Ontologies from Industrial Categorization Standards. International Journal on Semantic Web &
Information Systems, Vol. 2, No. 1, pp. 72-99.
Hepp, M. (2008). GoodRelations: An Ontology for Describing Products and Services Offers on
the Web. In Proceedings of the 16th International Conference on Knowledge Engineering and
Knowledge Management, Italy. Springer LNCS, Vol 5268, pp. 332-347.
Hepp, M. (2008b). E-Business Vocabularies as a Moving Target: Quantifying the Conceptual
Dynamics in Domains. In Proceedings of the 16th International Conference on Knowledge
Engineering and Knowledge Management, Italy. Springer LNCS, Vol 5268, pp. 388–403.
Hepp, M. (2008c). eClassOWL. The Products and Services Ontology. Retrieved May 20, 2008,
http://www.heppnetz.de/eclassowl/
Hohpe, G., & Woolf, B. (2003). Enterprise Integration Patterns: Designing, Building, and
Deploying Messaging Solutions. Addison-Wesley, October 2003. ISBN13:9780321200686
ISBN10: 0-321-20068-3.
Kabak Y., & Dogac A. (2008). A Survey and Analysis of Electronic Business Document
Standards. Under revision in ACM Computing Surveys.
Kajan E., & Stoimenov L. (2009). An Approach for Semantic-Based EC Middleware. In
Proceedings of e-Commerce 2009. pp. 69-76.
IEEE SUO Working Group. (2003). Standard Upper Ontology Knowledge Interchange Format.
IEEE P1600.1 Standard Draft. Available from: http://suo.ieee.org/SUO/KIF/index.html
hal-00623913, version 1 - 15 Sep 2011
Lara, R., Cantador, I., & Castells, P. (2006). XBRL taxonomies and OWL ontologies for
investment funds. 1st International Workshop on Ontologizing Industrial Standards at the 25th
International Conference on Conceptual Modeling. Tucson, Arizona.
Lausen, H., Polleres, A., & Roman, D. (2005). Web Service Modeling Ontology (WSMO).
Member submission, W3C. Available from: http://www.w3.org/Submission/WSMO/.
Léger, A. (Ed.). (2002) OntoWeb: ontology-based information exchange for knowledge
management and electronic commerce. OntoWeb D2.2 final. 2002.
Mehrnoush, S., & Abdollahzadeh, B. (2003). The State of the Art in Ontology Learning: A
Framework for Comparison. The Knowledge Engineering Review, Volume 18, Issue 4.
Missikoff, M., & Taglino, F. (2003). Symontox: a web-ontology tool for ebusiness domains. In
Web Information Systems Engineering. In Proceedings of the Fourth International Conference
on Web Information Systems Engineering, pp 343-346.
Motta, E., & Sabou, M. (2006). Next Generation Semantic Web Applications. In Proceedings of
the 1st Asian Semantic Web Conference, China.
Niles, I., & Pease, A. (2001). Towards a standard upper ontology. In Proceedings of the
International Conference on Formal Ontology in Information Systems (FOIS), pages 2–9.
Noy, N.F., & McGuinness, D.L. (2001). Ontology Development 101: A Guide to Creating Your
First Ontology. Stanford Knowledge Systems Laboratory Technical Report KSL-01-05 and
Stanford Medical Informatics Technical Report SMI-2001-0880, March 2001.
Noy, N. F., & Klein, M. (2004). Ontology Evolution: Not the Same as Schema Evolution.
Knowledge and Information Systems 6(4), 428–440.
Noy, N. F. (2004b). Semantic Integration: a Survey of Ontology-based Approaches. SIGMOD
Record Special Issue on Semantic Integration.
Rahm, E., & Bernstein, P.A. (2001). A survey of approaches to automatic schema matching. The
VLDB Journal 10: 334–350. November 2001.
Shvaiko, P., & Euzenat, J. (2005). A Survey of Schema-based Matching Approaches. Journal on
Data Semantics (JoDS).
Smith, B. (2006). Against Idiosyncrasy in Ontology Development. In Proceedings of
International Conference on Formal Ontology in Information Systems (FOIS). Baltimore,
Maryland (USA), November 9-11, 2006.
Stumme, G., & Maedche, A. (2001). FCA-MERGE: Bottom-Up Merging of Ontologies. In
Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI), Seattle,
WA, 2001.
Tran, D.C., Haase, P., Lewen, H., Munoz-Garcia, O., Gómez-Pérez, A., & Studer R. (2007).
Lifecycle-Support in Architectures for Ontology-Based Information Systems. In Proceedings of
the International Semantic Web Conference.
hal-00623913, version 1 - 15 Sep 2011
UN/CEFACT Techniques and Methodologies Group. (2003) UN/CEFACT Core Components
Technical Specification (CCTS). Part 8 of the ebXML Framework, ISO\TS 15000-5. Version
2.01, 15 November 2003.
Yarimagan, Y., & Dogac, A. (2009). A Semantic based Solution for the Interoperability of UBL
Schemas. To appear in IEEE Internet Computing Magazine.
Zhao, Y., & Sandahl, K. (2003). Potential Advantages of Semantic Web for Internet Commerce.
Proceedings of International Conference on Enterprise Information Systems (ICEIS), Vol 4,
pp151-158, Angers, France, April 23-26, 2003.
Zhao, Y., & Lövdahl, J. (2003b). A Reuse-Based Method of Developing the Ontology for EProcurement. In Proceedings of Second Nordic Conference on Web Services (NCWS'2003),
ISBN 91-7636-392-9, Växjö, Sweden, Nov 20-21, 2003.
ADDITIONAL READING SECTION
Euzenat, J., (Ed.). (2004). State of the Art on Ontology Alignment. Knowledge Web Deliverable
D2.2.3. 2004.
Euzenat, J., & Shvaiko, P. (2007). Ontology matching. Springer-Verlag, Heidelberg (DE).
Fensel, D. (2001b). Ontologies: Silver bullet for knowledge management and electronic
commerce. Springer-Verlag, Berlin (DE).
Hepp, M. (2007). Possible Ontologies: How Reality Constrains the Development of Relevant
Ontologies. IEEE Internet Computing 11(1): pp. 90-96.
Hepp, M. (2008). GoodRelations: An Ontology for Describing Products and Services Offers on
the Web. In Proceedings of the 16th International Conference on Knowledge Engineering and
Knowledge Management, Italy. Springer LNCS, Vol 5268, pp. 332-347.
Hohpe, G., & Woolf, B. (2003). Enterprise Integration Patterns: Designing, Building, and
Deploying Messaging Solutions. Addison-Wesley, October 2003. ISBN13:9780321200686
ISBN10: 0-321-20068-3.
Kent, W. Data and Reality. 1stBooks Library, rev. 3/28/2000. ISBN-13: 978-1585009701
Madhavan, J., Bernstein, P.A., Domingos, P., & Halevy, A. (2002). Representing and reasoning
about mappings between domain models. In Proceedings of the 18th National Conference on
Artificial Intelligence (AAAI’02), Edmonton, Alberta, Canada, August 2002.
Motta, E., & Sabou, M. (2006). Next Generation Semantic Web Applications. In Proceedings of
the 1st Asian Semantic Web Conference, China.
Noy, N. F. (2004b). Semantic Integration: a Survey of Ontology-based Approaches. SIGMOD
Record Special Issue on Semantic Integration.
Rahm, E., & Bernstein, P.A. (2001). A survey of approaches to automatic schema matching. The
VLDB Journal 10: 334–350. November 2001.
hal-00623913, version 1 - 15 Sep 2011
UN/CEFACT Techniques and Methodologies Group. (2003) UN/CEFACT Core Components
Technical Specification (CCTS). Part 8 of the ebXML Framework, ISO\TS 15000-5. Version
2.01, 15 November 2003.
KEY TERMS & DEFINITIONS
Design-time: Design time covers all the necessary tasks for modeling and for
setting up the execution of B2B collaborations. This phase involves the business
process specification, the partner profile definition, the trading partner contract
establishment, the business document conception and the message exchanges
integration (or mapping) to the existing information system. Design time also
includes the discovery and retrieval of existing business data.
Run-time: Run time covers the real execution of business exchanges from
beginning to their termination. (i.e., business processes execution, messages
exchange and dynamic services discovery).
B2B: Even though in this document we tend to use B2B as term to describe the
environment of our research, electronic message exchanges are not limited to
businesses. Administrations are increasingly confronted with similar problems in
their relationships with companies or other administration departments: they need
to provide high quality services to a wide audience, targeting both private and
public sectors, while improving their efficiency and reducing their costs. Even
internally, companies need dynamic message exchange solutions.
Ontology: An ontology is an explicit specification of a conceptualization
(Gruber, 2008)
Ontology evolution: with evolution of an ontology for the e-business data
integration we specifically mean an ontology as a dynamic characteristic of the
domain. Thus evolution should not be equivalent to a classical versioning system,
but more to a learning system, including a merge operation without loss of
information and backward compatibility
i
http://www.cxml.org
http://ontolog.cim3.net/cgi-bin/wiki.pl?UblOntology
iii
http://www.srdc.metu.edu.tr/ubl/UBL_Component_Ontology.owl
ii
iv
hal-00623913, version 1 - 15 Sep 2011
DGI stands for General Data Identification of economic agents Spanish taxonomy de agentes económicos
(DGI as Spanish acronym)
v
DGI is the Financial information report taxonomy for the Estados Públicos Individuales y Consolidados
vi
ES-BE-FS is the Taxonomy of the Stock Quote Exchange National Commission
vii
The resultant OWL ontologies can be found here:
http://www.tifbrewery.com/tifBrewery/resources/XBRLTaxonomies.zip
viii
http://www.oasis-open.org/committees/set/
ix
The SET Harmonized Ontology is publicly available from http://www.srdc.metu.edu.tr/iSURF/OASISSET-TC/ontology/HarmonizedOntology.owl