SEMANTIC WEB
2.1 Introduction
The semantic web implies a web that can process information for both humans and machines, in such a way that a machine can interpret and exchange information on the web without human intervention, producing more relevant data. The concept of the semantic network model [74] was first conceived in the early sixties by cognitive scientist Allan M. Collins, linguist M. Ross Quillian and psychologist Elizabeth F. Loftus [54]. They discussed the concept in the context of how the human brain uses long-term memory to relate things in order to ascertain the truth of a sentence. They categorized objects on the basis of their attributes and drew inferences from this categorization to decide whether a statement is true or false [54].
Later, Tim Berners-Lee [11], inventor of the World Wide Web and director of the World Wide Web Consortium (W3C), coined the term semantic web in 2001. He emphasized that the semantic web would bring structure to the content of web pages, creating an environment where software agents roaming from page to page could readily carry out sophisticated tasks for users [21].
The term semantic web encompasses efforts to build a new WWW architecture that enhances content with formal semantics. This will enable automated agents to reason about web content and produce intelligent responses to unforeseen situations [14]. The semantic web aims to change web development in such a way that machines can make sense of the words displayed on web pages, easily relate them, and produce information more relevant to the subject. Humans possess vocabulary and contextual information about different words; our brains can relate incomplete and even irrelevant words and still draw conclusions on the basis of knowledge and experience. Providing the same capability to computers is not an easy task, and the semantic web aims to meet exactly this challenge. It aims to describe things in a way that various applications and web services can understand. It is not about more links between web pages; rather, it describes the relationships between entities and their properties. The upcoming sections provide a brief description of the semantic web architecture.
The main objective of the semantic web is to express meaning. In order to achieve this objective, the semantic web must rely on certain standards and tools. Implementation of the semantic web demands tools through which information spread across various sources can be accessed and processed. Tim Berners-Lee [14] indicated that software agents will serve as the core working component of the semantic web, since they are able to draw information from distributed heterogeneous sources, relate it, and draw conclusions. Implementation of the semantic web will lead to machine-readable web content and automated searching.
Figure 2.1: Four Versions of the Semantic Web Reference Architecture (V1-V4).
2.2 Semantic Web Architecture
An architecture for the semantic web assists in the development of specifications and applications. The best-known versions of the layered architecture in the literature are those provided by Berners-Lee, who proposed four versions of the semantic web architecture in 2001 [12], 2003 [13] and 2006 [11] (see Figure 2.1). However, these versions were not consistent with the principles of layered architectures. Later work evaluated the four architectures against the principles of layered architectures and highlighted that the architectures proposed by Tim Berners-Lee cannot be termed layered architectures at all; it throws light on their inconsistent layer arrangement and casts doubt on the functionality of some of the layers.
2.2.1 Unicode and URI
This layer is responsible for encoding any symbol of any language or character set, and at the same time for uniquely identifying different resources or entities. Unicode and URI carry over important features of the existing World Wide Web. Unicode is a standard for encoding international character sets, ensuring that all human languages can be encoded in a standardized form. A URI is a string in standard form that allows unique identification of resources. A URL is a subset of URI that contains an access mechanism and the location of the web content. The usage of URIs is important in the distributed World Wide Web, as it provides understandable identification of all resources. An international variant of URI is the Internationalized Resource Identifier (IRI) [56], which allows the usage of Unicode characters in identifiers and for which a mapping to URI is defined.
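As a brief illustration of the IRI-to-URI mapping mentioned above, the following sketch percent-encodes the non-ASCII characters of an IRI in Python; the example address is invented for illustration:

    from urllib.parse import quote

    # A hypothetical IRI containing Unicode characters.
    iri = "https://example.org/café/münchen"

    # Mapping to a URI: percent-encode non-ASCII characters (as UTF-8),
    # while leaving the reserved delimiters of the identifier intact.
    uri = quote(iri, safe=":/?#[]@!$&'()*+,;=")
    print(uri)  # https://example.org/caf%C3%A9/m%C3%BCnchen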
2.2.2 XML
The XML layer, along with XML namespace and XML schema definitions, ensures that a common syntax is used in the semantic web. XML is a general-purpose markup language for documents containing structured information. An XML document contains elements that can be nested and that may have attributes and content. XML namespaces allow different markup vocabularies to be combined in one XML document. An XML schema defines the structure of a particular set of XML documents. XML is, however, no longer a necessary component of semantic web technologies, as alternative syntaxes are available, such as the Terse RDF Triple Language (Turtle) [2], a serialization format for Resource Description Framework (RDF) graphs.
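A small sketch may make the role of namespaces concrete. The following example, using Python's standard xml.etree.ElementTree, mixes two markup vocabularies in one document; the inventory vocabulary is hypothetical, while the title element borrows the real Dublin Core namespace:

    import xml.etree.ElementTree as ET

    # Two vocabularies distinguished by namespaces in a single document.
    doc = """<book xmlns:dc="http://purl.org/dc/elements/1.1/"
                   xmlns:inv="http://example.org/inventory#">
      <dc:title>The Semantic Web</dc:title>
      <inv:stock>42</inv:stock>
    </book>"""

    root = ET.fromstring(doc)
    # ElementTree resolves prefixes to {namespace-uri}localname.
    print(root.find("{http://purl.org/dc/elements/1.1/}title").text)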
2.2.3 RDF and RDF Schema
The Resource Description Framework (RDF) provides the core data representation format for the semantic web and is the W3C-recommended standard for publishing data on it. RDF is a framework for representing information about a resource in graph form. It was primarily intended for representing metadata about WWW resources, but it can be used for storing any other data as well. It makes use of subject-predicate-object statements, which together form a graph, and thus provides a simple language for expressing data models, i.e. objects and their relationships. An RDF-based model can be represented in a variety of syntaxes, such as RDF/XML. RDF is a fundamental standard of the semantic web, and an RDF document is itself a description of the graph formed by its triples.
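To make the triple model concrete, here is a minimal sketch using the Python rdflib library; the page URI is hypothetical, and the predicates come from the real Dublin Core vocabulary. It states metadata about a web page as triples and serializes the resulting graph in two syntaxes:

    from rdflib import Graph, Literal, Namespace, URIRef

    DC = Namespace("http://purl.org/dc/elements/1.1/")
    page = URIRef("http://example.org/index.html")  # hypothetical resource

    g = Graph()
    # Metadata about a WWW resource, stated as subject-predicate-object triples.
    g.add((page, DC.title, Literal("Example Home Page")))
    g.add((page, DC.creator, Literal("A. Author")))

    # The same abstract graph, rendered in two concrete syntaxes.
    print(g.serialize(format="turtle"))
    print(g.serialize(format="xml"))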
Anyone can define a vocabulary of terms for more detailed description of taxonomies and other ontological constructs. RDF Schema extends RDF: it is a vocabulary for describing properties and classes of RDF-based resources, with semantics for generalization hierarchies of such properties and classes. It can therefore be used to describe taxonomies of classes and properties and to build lightweight ontologies. Figure 2.3 illustrates this concept.
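As a sketch of such a lightweight ontology (again with rdflib; the zoo vocabulary is invented for illustration), a small taxonomy can be expressed with rdfs:subClassOf and a property domain:

    from rdflib import Graph, Namespace, RDF, RDFS

    EX = Namespace("http://example.org/zoo#")  # hypothetical vocabulary

    g = Graph()
    # A lightweight taxonomy: Mammal is a subclass of Animal.
    g.add((EX.Animal, RDF.type, RDFS.Class))
    g.add((EX.Mammal, RDF.type, RDFS.Class))
    g.add((EX.Mammal, RDFS.subClassOf, EX.Animal))
    # A property whose subjects are declared to be Animals.
    g.add((EX.livesIn, RDF.type, RDF.Property))
    g.add((EX.livesIn, RDFS.domain, EX.Animal))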
2.2.4 Ontology
There are two main classes of ontology: the first is employed to explicitly capture the "static knowledge" of a domain, whereas the second provides reasoning about that domain knowledge (problem-solving knowledge).
Type                 Explanation
Domain ontology      Designed to represent knowledge relevant to a certain
                     domain, for example medical or mechanical.
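For instance, a fragment of a domain ontology for the medical domain mentioned above could be sketched as follows, using rdflib and OWL terms; the classes and property are invented for illustration:

    from rdflib import Graph, Namespace, OWL, RDF, RDFS

    MED = Namespace("http://example.org/medicine#")  # hypothetical domain

    g = Graph()
    # Static domain knowledge: a small class hierarchy and one property.
    g.add((MED.Disease, RDF.type, OWL.Class))
    g.add((MED.Influenza, RDF.type, OWL.Class))
    g.add((MED.Influenza, RDFS.subClassOf, MED.Disease))
    g.add((MED.Drug, RDF.type, OWL.Class))
    g.add((MED.treats, RDF.type, OWL.ObjectProperty))
    g.add((MED.treats, RDFS.domain, MED.Drug))
    g.add((MED.treats, RDFS.range, MED.Disease))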
2.2.5 Rules
This layer aims to support the drawing of inferences, so as to allow querying and filtering. Until now there has been no single recommended language for this layer, but the Rule Interchange Format (RIF) is the most commonly used. Rules here are those used by the production systems presented in the corresponding knowledge representation subsection: they capture dynamic knowledge as a set of conditions that must be fulfilled in order for the consequences of the rule to hold. One technology at this layer is the Semantic Web Rule Language (SWRL) [4], which is based on the Rule Markup Language (RuleML). Like RuleML, SWRL covers the entire rule spectrum, from derivation and transformation rules to reaction rules. It thus supports querying and inference drawing using web ontologies, mappings between ontologies, and the dynamic web behaviours of workflows, services and agents.
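The condition-consequence character of such rules can be sketched with the classic "uncle" example, here as a toy forward-chaining pass in Python over an rdflib graph; SWRL has its own syntax, and the family vocabulary below is invented:

    from rdflib import Graph, Namespace

    EX = Namespace("http://example.org/family#")  # hypothetical vocabulary

    g = Graph()
    g.add((EX.alice, EX.hasParent, EX.bob))
    g.add((EX.bob, EX.hasBrother, EX.carl))

    # Rule: hasParent(x, y) AND hasBrother(y, z) => hasUncle(x, z).
    # One forward-chaining pass asserts the rule's consequences.
    for x, y in g.subject_objects(EX.hasParent):
        for z in g.objects(y, EX.hasBrother):
            g.add((x, EX.hasUncle, z))

    print((EX.alice, EX.hasUncle, EX.carl) in g)  # True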
2.2.6 Logic & Proof Layer
This layer provides the facility of writing logic into documents, thus providing rules for the deduction of one type of document from another. It includes predicate logic and quantifiers so as to facilitate deductions; the Knowledge Interchange Format (KIF) is a language used to specify logic at this layer. A proof is a chain of assertions and reasoning rules, with pointers to all the supporting material. This layer helps in checking the authenticity of the users and thus helps in establishing trust. Figure 2.5 represents the structure of the proof layer.
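The notion of a proof as a checkable chain of assertions can be sketched with a toy verifier in Python; the facts and implication rules below are invented for illustration:

    # A proof is verified by replaying it: each derived assertion must follow
    # from already-accepted facts via a known rule (here, simple implications).
    facts = {"it_rains"}
    rules = {("it_rains", "ground_wet"), ("ground_wet", "slippery")}

    proof = ["ground_wet", "slippery"]  # the claimed chain of assertions
    for step in proof:
        assert any(p in facts and c == step for (p, c) in rules), step
        facts.add(step)
    print("proof verified")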
2.2.7 Trust Layer
The trust layer ensures that the source of information is genuine and authentic, while at the same time ensuring that only authorized applications, agents and users can access the information. It requires that the reasoning system include a signature verification system. This results in a system that can express and reason about relationships across the whole range of public-key based security and trust systems.
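To illustrate the role of signature verification, the following sketch signs a hypothetical RDF statement and verifies it with an Ed25519 key pair, using the Python cryptography package; it is a minimal sketch, not a full trust infrastructure:

    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    private_key = Ed25519PrivateKey.generate()
    public_key = private_key.public_key()

    # A signed assertion: consumers can check who vouches for this triple.
    statement = b'<http://example.org/sky> <http://example.org/hasColor> "blue" .'
    signature = private_key.sign(statement)

    # Raises cryptography.exceptions.InvalidSignature if tampered with.
    public_key.verify(signature, statement)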
Semantic information can be published on the web in two ways [83]:
(i) as data objects, using semantic web languages like RDF and OWL [83];
(ii) as embedded formal metadata in documents, using newer markup conventions such as microformats [83].
RDF is a language for describing information and resources on the web. In RDF files, information is kept in a form that makes it possible for computer programs to search, discover, collect, analyze and process information from the web. The RDF data model is similar to classic conceptual modeling approaches such as entity-relationship or class diagrams, as it is based on the idea of making statements about resources in the form of subject-predicate-object expressions, known as triples in RDF terminology. The subject denotes the resource, and the predicate expresses the relationship between the subject and the object. For example, for "The sky has the color blue", in the RDF triple "the sky" is the subject, "has the color" is the predicate and "blue" is the object. RDF is an abstract model with several file formats, and thus the particular way in which a resource is encoded varies depending upon the format used.
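A minimal sketch of this example with rdflib (the namespace is hypothetical) shows the same abstract triple encoded in two different concrete formats:

    from rdflib import Graph, Literal, Namespace

    EX = Namespace("http://example.org/")  # hypothetical namespace

    g = Graph()
    # "The sky has the color blue" as a subject-predicate-object triple.
    g.add((EX.sky, EX.hasColor, Literal("blue")))

    print(g.serialize(format="turtle"))  # Turtle: prefixed, human-readable
    print(g.serialize(format="nt"))      # N-Triples: one full-URI line per triple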
Currently, computers are changing from single, isolated devices into entry points to a worldwide network of information exchange and business transactions. However, the success of the WWW has made it increasingly difficult to find, access, present and maintain the information required by a wide variety of users. In response to this problem, many new research initiatives and commercial enterprises have been set up to enrich the available information with machine-understandable semantics. The semantic web thus provides intelligent access to heterogeneous, distributed information, enabling software agents to mediate between user needs and the available information sources.
Semantic web application areas are experiencing intensified interest due to the rapid growth in the use of the web, together with the innovation and renovation of information content technologies. The semantic web is regarded as an integrator across different content and information applications and systems, and it provides mechanisms for the realization of Enterprise Information Systems (EIS) [62]. This section explores some areas of semantic web applicability.
Search engines are the most common tools for extracting desired information from the web. A semantic search engine stores semantic information about web resources and is able to solve complex queries, considering the context in which the web resource is targeted; it also allows clients to obtain information about commercial products and services, as well as about sellers and service providers, which can be hierarchically organized. Semantic search engines can significantly contribute to the development of electronic business applications, since they are based on strong theory and widely accepted standards [5]. Thus, in order to provide efficient search results, search engines are also being developed using semantics and have emerged as semantic search engines. Many semantic search engines have by now been developed and deployed in different working environments. Some existing semantic web search engines are:
(A) Hakia
Hakia [34] is an innovative search engine that uses semantics to present relevant search results to users. It works like any other search engine, in that a user types keywords to initiate a search query. However, Hakia produces its results based on the laws of ontological semantics, mathematics and fuzzy logic; as a result, the user gets the most relevant answers to the query asked. A user can give a combination of search words pertinent to the query and expect results based on relevance rather than on the popularity of the web page. If the search results are very common, instead of scattering the information across many pages, Hakia presents the user with a gallery on the subject. These galleries are placed on the first page of the search and organize the different aspects of the search query into tabs; for example, if a user searches for Clint Eastwood [5], Hakia will find his biography, images, photograph sites, interviews, blogs, quotes, contact information, filmography etc., presented under different tabs on the first page.
From the second page onwards, the search results are simply presented in order of their relevance. Most search queries are accompanied by a blurb at the top of the page that explains the content of the page. In short, Hakia tries to provide users with the information that is really needed, while making it easy for the user to look through the results. However, there are times when the keywords typed in by the user restrict the coverage of the search engine; hence the keywords must be chosen very carefully [34].
(B) SenseBot
SenseBot [21] is a web search engine that summarizes search results into one concise digest on the topic being searched. The search engine attempts to understand what the result pages are about. For this purpose, it uses text mining to analyze web pages and identify their key semantic concepts. In this way SenseBot helps the user get a better grasp of what the relevant term is about, so the user does not have to go through a large number of web pages and comb through incomprehensible, expert-level results [21].
structured form. From the knowledge management perspective, the current technology suffers from limitations in several areas. The aim of the semantic web is to allow much more advanced knowledge management systems, where:
(v) Defining who may view certain parts of information (even parts of documents) will
be possible.
A semantic network represents knowledge as a directed graph: the nodes of the graph are things, and the edges of the graph express relationships between those things. This is most easily illustrated with a diagram in which the things are "Animal", "Mammal", "Fish", and so on, and the relationships are "is a", "has" and "lives in".
Some versions are highly informal, while other versions are formally defined systems of logic. The six most common kinds of semantic networks are definitional networks, assertional networks, implicational networks, executable networks, learning networks and hybrid networks, defined as follows:
(i) A definitional network [62] emphasizes the subtype or is-a relation between a
concept type and a newly defined subtype. The resulting network, also called a
generalization hierarchy, supports the rule of inheritance for copying properties
defined for a supertype to all of its subtypes. Since definitions are true by definition,
the information in these networks is often assumed to be necessarily true.
(ii) Assertional networks are designed to assert propositions. Unlike definitional
networks, the information in an assertional network is assumed to be contingently
true, unless it is explicitly marked with a modal operator. Some assertional
networks have been proposed as models of the conceptual structures underlying
natural language semantics.
(iii) Implicational networks use implication as the primary relation for connecting nodes.
They may be used to represent patterns of beliefs, causality, or inferences.
(iv) Executable networks include some mechanism, such as marker passing or attached
procedures, which can perform inferences, pass messages, or search for patterns and
associations.
(v) Learning networks build or extend their representations by acquiring knowledge from examples. The new knowledge may change the old network by adding and deleting nodes and arcs or by modifying numerical values, called weights, associated with the nodes and arcs.
(vi) Hybrid networks combine two or more of the previous techniques, either in a single
network or in separate, but closely interacting networks.
2.4.5 E-Commerce
Over 300 million searches are conducted every day on the internet by people trying to find what they need. A majority of these searches are in the domain of consumer e-commerce, where a web user is looking for something to buy. This represents a huge cost in terms of person-hours and an enormous drain on resources. Agent-enabled semantic search will have a dramatic impact on the precision of these searches. It will reduce, and possibly eliminate, the information asymmetry by which a better-informed buyer gets the best value. By impacting this key determinant of market prices, the semantic web will foster the evolution of different business and economic models.
across multiple domains; ontology translation could bridge them. Internet commerce requires automatic negotiation and contracting over all search results. This capability could significantly help machines process the large amounts of business partner information that humans cannot handle, and thus save time and money.
2.5 Semantic Web Services
Web services are software modules that provide some kind of service and that can be accessed and invoked via a network. Semantic web services [45] are considered to be the next step in the evolution of web services. In addition to what ordinary web services offer, semantic web services provide formal, machine-understandable descriptions of their capabilities, which allow them to be understood and processed by automatic algorithms. Semantic web services are envisioned to enable sophisticated tasks such as the automated discovery of services, by matching specified service requests against a large pool of service providers; the automated composition of services, to instantiate high-level descriptions of complex tasks as sequences of calls to simpler services; and service monitoring, to evaluate the quality of the work provided by services [45].
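How automated discovery might work can be sketched as capability matchmaking in Python; this is a toy example with invented service names, whereas real matchmakers compare semantic service descriptions rather than plain strings:

    # Hypothetical providers advertising sets of capabilities.
    providers = {
        "TranslateSvc": {"translate", "detect-language"},
        "BookingSvc": {"search-flights", "book-flight"},
    }

    def discover(required):
        """Return providers advertising every requested capability."""
        return [name for name, caps in providers.items() if required <= caps]

    print(discover({"translate"}))  # ['TranslateSvc']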
Existing semantic web services address communication, description and discovery; these are briefly discussed as follows:
2.5.2 Semantic Web Services for Description (WSDL)
2.6 Conclusions
This chapter has provided a brief description of the semantic web and its architecture. An architecture for the semantic web assists in the development of specifications and applications. Semantic web applications are experiencing intensified interest due to the rapid growth in the use of the web, together with the innovation and renovation of information content technologies; development focuses on the ways in which machines can understand the structure, behavior and even meaning of published information, thereby making the search for information and its integration more efficient. The services of the semantic web have also been explained in this chapter. The next chapter provides details of intelligent agents.