Survey of Current Terminologies and Onto
ISSN 1981-6286
Original Articles
Fred Freitas
Informatics Center, Federal University of Pernambuco, Recife, Brazil
[email protected]
Stefan Schulz
Institute of Medical Biometry and Medical Informatics, University Medical Center, Freiburg, Germany
[email protected]
Eduardo Moraes
Informatics Center, Federal University of Pernambuco, Recife, Brazil
[email protected]
Abstract
This paper provides a survey of the state of the art in terminologies and ontologies applied to Biology and Medicine. Not intending to be fully comprehensive, we describe some of the most relevant resources that currently attract interest from industry and academia. We introduce a description framework and compare the systems in terms of their architectural elements, their expressiveness, and coverage, as well as analyze the nature of entities they denote. In particular, we scrutinize the International Classification of Diseases (ICD), the Medical Subject Headings (MeSH), the Gene Ontology (GO), the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT), the Generalized Architecture for Languages, Encyclopaedias and Nomenclatures (openGALEN), the Foundational Model of Anatomy (FMA), the Unified Medical Language System (UMLS), and the Open Biomedical Ontologies (OBO) Foundry.
Keywords
Terminology; ontology; biology; medicine
RECIIS – Elect. J. Commun. Inf. Innov. Health. Rio de Janeiro, v.3, n.1, p.7-18, Mar., 2009 7
The current developments in biomedical knowledge management have essentially two roots:
• the establishment of indexing vocabularies and classification systems such as the International Classification of Diseases and the Index Medicus, dating back to the 19th century, driven by public health and epidemiology interests on the one hand, and by library science on the other hand; and
• the research on medical decision support and expert systems, starting in the seventies of the last century, driven by the emerging research field of Artificial Intelligence and inspired by the idea of creating knowledge-based computer tools to assist the complex process of medical decision making.
Motivated by the vision of the Semantic Web, the term “ontology” has become one of the most fashionable terms in Computer Science. Ontologies are advertised as a means to describe domains precisely and in detail, and to employ these descriptions in many types of applications, ranging from natural language processing to logic reasoning and decision support systems. Many application areas currently take advantage of ontologies, but the field of life sciences is gaining more and more visibility in this picture, since very few scientific domains, if any, contain such impressive and rapidly growing amounts of terms, concepts, and definitions.
Ontologies
The term “ontology” has become very popular since the mid nineties but, unfortunately, no universally accepted definition exists (Kuzniersky 2006). Since the seventeenth century it has been used for the discipline of general metaphysics in the tradition of Aristotle’s “first Philosophy” as the science of being qua being. It is often seen as complementary to the notion of Epistemology (the science of knowledge).
In Computer Science, the definition of ontology as the explicit specification of a conceptualization (Gruber 1995) prevails. Conceptualization is here meant as an abstract, simplified view of the world that we wish to represent for some purpose, e.g., to draw inferences, to perform automatic classification, etc. A conceptualization usually includes concepts (also called classes or types, e.g., Heart), individuals as instances of concepts (e.g. the individual Fido is an instance of Dog), binary relations between concepts or individuals (e.g. Dog is-a Vertebrate), logic-based restrictions (all instances of Herbivore eat only vegetables while all instances of Carnivore eat some instances of Animals), and axioms (sentences that are always true in a domain, e.g., every instance of Living Person has some instance of Heart). These entities are connected by ontological relations, which represent the different aspects in which concepts relate to each other. The most relevant and widely used relation types are subclass (Heart is a subclass of Organ, since all instances of the former are instances of the latter, with some special features that distinguish them from others), and partonomic relations (every instance of Heart Ventricle is a part of some Heart). But there are other definitions of ontology, such as “representation of a domain of discourse, consisting of a list of terms, the relationships among them and the axioms which are always valid in the domain” (Antoniou & Harmelen 2004), or a “representational artifact whose representational units are intended to designate classes or universals in reality and their interrelations” (Smith 2005).
The notion of ontology is often specialized to what is named “formal ontology” (Guarino 1998). This means that the content of an ontology is described using mathematical logic, which can endow computer systems with the ability of logical inference. It can also support autonomous discovery over recorded data, as well as reuse and exchange of knowledge.
The rise of ontologies in the Computer Science mainstream has spread to many other branches of knowledge: Motivated by the vision of the Semantic Web (Berners-Lee 2001), many groups from academia and industry throughout the world became interested in ontologies, and the number of tools, standards, and users grew accordingly. Indeed, some goals to produce standard ontologies in some areas were accomplished, particularly in Medicine and Biology.
Terminologies vs. ontologies
Medicine in particular is characterized by a wealth of so-called terminologies, best described as language-oriented artifacts that relate the various senses or meanings of linguistic entities with each other. Terminologies are generally built to serve well-defined purposes like document retrieval, resource annotation, the recording of mortality and morbidity statistics, or health services billing. Biomedical terminologies do not use formal and well-defined descriptions; they rather define the terms (if at all) by human language expressions, and express the associations between terms by informal relations that stay close to human language. Words or multiword terms are the basic building blocks of terminologies, which generally organize them in hierarchies that relate their meanings in terms of synonymy (same meaning), hyperonymy (broader meaning), and hyponymy (narrower meaning). Although terminologies can be successfully used in representing abstract meaning, e.g. in natural language processing or in the annotation of resources (e.g. literature abstracts, experimental results), they are not precise and expressive enough for more knowledge-intensive applications. Whereas one use case may require knowledge on how and in what respect some terms differ from others, another one may demand more precise relations between terms (for example that every instance of a normal Arm has some instance of Forearm as its part). To meet these requirements, a language-centered resource is not expressive enough. Here, a reality-centered resource is better suited in order to capture the subtleties of which entities (objects, qualities, processes, etc.) are related to others, under which circumstances these relations hold, and how exactly these relations should be interpreted (e.g. whether the relation part-of between a body part and a body still holds after the body part, like a kidney, is removed). That is where
ontologies come into play. Ontologies are expressed in logic-based formalisms, which provide (meta-) definitions of classes (concepts), relations, instances and axioms. Therefore, ontologies can represent a domain in a form that lets computers handle the definitions according to their semantics instead of employing only terms or semantic identifiers. Thus, a system can check whether some interpretation is correct or not, or whether a given statement is true according to some ontology, among other related tasks. Ontologies can also encompass different dimensions that a domain should embrace: for instance, in organisms, the degree of canonicity of organs (whether an organism functions as usually supposed or not), the degree of development (e.g. embryo vs. adult), the place of an organism or organic matter in the biological taxonomy (e.g. fly vs. mouse), or the granularity by which biological structure is described (e.g. macroscopic vs. microscopic), to mention a few (Schulz 2004).
However, the classical terminological approach is increasingly blended with principles of modern ontology design, with ontology languages from the Computer Science domain and with the emerging discipline of applied ontology embedded in the field of Analytical Philosophy. What we intend to describe in this paper is the broad range of these very heterogeneous artifacts, for which an overarching term is still missing (the often used term “biomedical vocabularies” is misleading as it stresses too much the language aspect). In the remainder of this article we therefore use the acronym BMTOs for “biomedical terminologies and ontologies”. The article is organized as follows: In the next section, the main BMTOs are explained in detail. Section 3 is devoted to foundations and efforts that integrate many of these systems. Section 4 discusses some important topics from each BMTO, while Section 5 addresses open issues and challenges for the integration of BMTOs.
Important examples of biomedical terminologies and ontologies (BMTOs)
Description scheme
Several efforts have been made in the biomedical field for the development of semantic standards such as medical terminologies, ontologies, and coding systems. In this section, we will analyze a set of BMTOs which reflects the broad variety of this genre. We will address the International Classification of Diseases (ICD), the Medical Subject Headings (MeSH), the Gene Ontology (GO), the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED CT), the Generalized Architecture for Languages, Encyclopaedias and Nomenclatures (openGALEN), the Foundational Model of Anatomy (FMA) and, as examples of overarching initiatives, the Unified Medical Language System (UMLS) and the Open Biomedical Ontologies (OBO) Foundry. We will describe and compare them by identifying common features and differences. Moreover, we will discuss what these systems represent and which architecture they use. To this end, we introduce the architectural elements we encounter in all BMTOs as follows:
• Nodes: the primary identifiers of meaning
• Links: the connections between nodes
• Codes: alphanumeric identifiers for a node or a link
• Hierarchies: networks of links that constitute a partial order, thus defining trees or directed graphs
• Attributes: further descriptions of nodes and links
• Axioms: sentences expressed in logic which are always true in the domain.
We furthermore describe the systems in terms of
• Purpose: why they were built and where they were used
• Scope: the knowledge domain they represent
• Reference: what nodes and links denote
The International Classification of Diseases
Terminological standardization in Medicine has a long history. In 1880, the International Classification of Diseases (ICD) (WHO 2008) was created, based on the London Bills of Mortality, which distinguished about 200 causes of death, providing codes for all diseases known at that time. For many years, the ICD was the only medical terminology resource. Its current (10th) edition is maintained by the World Health Organization (WHO) and translated into 42 languages. ICD-10 provides about 13,000 classes for the encoding of diseases and reasons of encounter. Originally created for epidemiological purposes, ICD now constitutes the most widely used disease encoding system and is globally used as a common basis for health statistics. In many countries, the ICD is also employed as a basis for Diagnosis Related Groups (DRG) used for billing. DRGs group patients that are clinically similar and are therefore expected to use the same healthcare resources.
ICD has a simple but efficient architecture. Partitioned into 22 chapters (Infections, Neoplasms, Blood Diseases, Endocrine Diseases, etc.), its nodes denote classes of diseases and related problems. This means that each individual disease falls into a category that has a unique code, e.g. the myopia of the second author of this paper can be encoded by H52.1. ICD classes are hierarchically arranged into up to five levels. The hierarchy-building relation is the is-a (subclass) relation, expressing that each member of a class is also a member of any parent class. ICD axiomatically assumes that sibling classes do not overlap. This warrants that no class has more than one parent class and that there is exactly one terminal class for each entity to be classified, hence its characterization as a “classification”. The simple reason for this is to prevent one disease from being counted twice. In order to avoid gaps, residual categories (“not elsewhere classified”) were created. Additional attributes of ICD classes are inclusion and exclusion statements, and in one chapter also glossary-like free text definitions. Inclusion statements list more specific diseases that are contained in the same class, while exclusion statements segregate certain conditions from a class, thus assigning them to a different class.
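The single-parent architecture of ICD described above can be illustrated with a minimal Python sketch. It assumes only the excerpt of category H52 shown in Figure 1; the list of coded cases and the function names `ancestors` and `tally` are hypothetical and chosen for illustration, not taken from any ICD tooling.

```python
from collections import Counter

# Excerpt of ICD-10 category H52 as listed in Figure 1.
# Every class has exactly one parent, so the hierarchy is a tree.
PARENT = {
    "H52": None,
    "H52.0": "H52",
    "H52.1": "H52",
    "H52.2": "H52",
    "H52.3": "H52",
    "H52.4": "H52",
    "H52.5": "H52",
    "H52.6": "H52",
    "H52.7": "H52",
}


def ancestors(code):
    """Return the chain of parent classes of a code, nearest first."""
    chain = []
    parent = PARENT[code]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def tally(coded_cases):
    """Count each case once in its terminal class and once in every ancestor."""
    counts = Counter()
    for code in coded_cases:
        counts[code] += 1
        counts.update(ancestors(code))
    return counts


# Hypothetical coded cases, one terminal code per case.
cases = ["H52.1", "H52.1", "H52.4", "H52.7"]
counts = tally(cases)
print(counts["H52"], counts["H52.1"])
```

Because each case carries exactly one terminal code and each class has exactly one parent, the count for H52 equals the number of cases coded anywhere below it, which is the property that makes such a tree suitable for statistics.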
ICD’s scope extends beyond the realm of diseases as it also includes injuries and external causes of health problems, signs and symptoms, and any kind of conditions that justify the encounter with health professionals. Figure 1 displays an excerpt of ICD relating to certain types of eye disorders, which are subclasses of the three-digit category H52. Note the exclusion under H52.1 and the inclusions under H52.5. The former must be coded in a different branch, while the latter name more specific disorders for which no separate codes are available. Note also that H52.6 constitutes the complement to H52.0-H52.5, and that H52.7 corresponds to H52 and expresses that the coder lacks the details that would allow a more specific code to be used.
H52 Disorders of refraction and accommodation
  H52.0 Hypermetropia
  H52.1 Myopia
    Excludes: degenerative myopia (H44.2)
  H52.2 Astigmatism
  H52.3 Anisometropia and aniseikonia
  H52.4 Presbyopia
  H52.5 Disorders of accommodation
    Internal ophthalmoplegia (complete)(total)
    Paresis of accommodation
    Spasm of accommodation
  H52.6 Other disorders of refraction
  H52.7 Disorder of refraction, unspecified
Figure 1 - Excerpt of the International Classification of Diseases, 10th version (ICD-10).
The Medical Subject Headings (MeSH)
The Medical Subject Headings (MeSH) (Nelson 2007, MESH 2008), edited and maintained by the U.S. National Library of Medicine (NLM), consist of a controlled vocabulary used for indexing the content of health related documents, above all literature abstracts in the life science literature database MEDLINE with nearly 20 million citations (Nelson 2007, PubMed). MeSH is available in 41 languages.
MeSH is partitioned at its uppermost level into 16 branches (Anatomy, Organisms and Diseases, among others). MeSH’s nodes are named “headings” and denote a standardized meaning of a group of medical terms. In contrast to the tree-like hierarchy of ICD, MeSH headings are placed in multiple hierarchies. The hierarchical order is based on the principle that all documents indexed by a given heading are also relevant for any parent descriptor. These informal links are also characterized by the name “broader/narrower”. Thus, the MeSH heading Leishmaniasis is part of both the hierarchy Parasitic Diseases and the hierarchy Skin and Connective Tissue Diseases, as depicted in Figure 2. Consequently, documents on leishmaniasis are found in a MEDLINE query for parasitic diseases just as in a query for skin diseases. MeSH headings have, in addition to their unique identifier, a so-called tree number for each hierarchical context.
Headings are furthermore specified by a textual definition, a so-called scope note. Additional attributes are entry terms (synonyms or more specific terms) and allowable qualifiers, such as prevention, therapy, and others in the case of diseases, or pathogenicity in the case of organisms.
Figure 2 - The MeSH entry for “Leishmaniasis”. The table provides definition and attributes. Two of the “trees” in which this heading is inserted are displayed at the bottom.
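The effect of MeSH's multiple hierarchies on retrieval can be sketched as follows. This is a simplified illustration, not the way MEDLINE is implemented: the intermediate levels between Leishmaniasis and the two top-level branches are collapsed, the document identifiers are hypothetical, and the names `BROADER`, `INDEXED`, `broader_closure` and `search` are assumptions made for this example.

```python
# Broader/narrower links, with intermediate MeSH levels collapsed (assumption).
# A heading may have more than one broader heading.
BROADER = {
    "Leishmaniasis": ["Parasitic Diseases", "Skin and Connective Tissue Diseases"],
    "Parasitic Diseases": [],
    "Skin and Connective Tissue Diseases": [],
}

# Hypothetical documents and the headings they were indexed with.
INDEXED = {
    "doc-1": {"Leishmaniasis"},
    "doc-2": {"Parasitic Diseases"},
}


def broader_closure(heading):
    """Collect all headings reachable through broader links."""
    seen = set()
    stack = [heading]
    while stack:
        current = stack.pop()
        for parent in BROADER.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def search(heading):
    """Return documents indexed by the heading or by any narrower heading."""
    hits = set()
    for doc, headings in INDEXED.items():
        for indexed_heading in headings:
            if indexed_heading == heading or heading in broader_closure(indexed_heading):
                hits.add(doc)
    return sorted(hits)


print(search("Parasitic Diseases"))
print(search("Skin and Connective Tissue Diseases"))
```

The document indexed with Leishmaniasis is returned for both queries, which mirrors the principle that a document indexed by a heading is relevant for every parent descriptor.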
The Gene Ontology
The Gene Ontology (GO) (GO 2008) is maintained by the Gene Ontology Consortium, which originally created it to support shared annotations of genomic data in three model organism (Drosophila, Yeast, Mouse) databases. Since then, its scope has been broadened so that it now encompasses all biology independent of the characteristics of specific organisms. In contrast to its name, GO is not an ontology of genes. Instead, it provides semantic identifiers that standardize the description of data on genes or gene products (e.g., proteins) along three dimensions: (i) in which cell compartment a gene is expressed (e.g. the mitochondrion), (ii) with which functions a protein is associated (e.g. signaling), and (iii) in which biological processes a protein participates (e.g. mitosis). GO is thus able to support queries across the databases that consortium members maintain, facilitating access to the knowledge discovered by them.
Like MeSH, the Gene Ontology is partitioned into disjoint branches at its uppermost level. The three branches Cellular Component, Biological Process, and Molecular Function outline its scope. Each branch forms a multiple hierarchy, with a total of 24,500 nodes, called GO terms. As much as GO’s architecture may resemble MeSH at first sight, there are crucial differences that may justify its qualification as an ontology. First of all, its nodes are more than semantic descriptors. In contrast to MeSH headings, GO terms represent classes of real entities. For instance, the (abstract) class Cell Nucleus has all (material) cell nuclei in the world as members. GO terms are characterized by identifiers, so-called accession numbers, and have synonyms and definitions as additional attributes. Another difference compared to MeSH is the semantic explicitness of links. Instead of “broader / narrower”, GO provides two precisely labeled relations: is-a and part-of. The former signifies that every entity that is a member of one class is also a member of all its parent is-a classes, just as in ICD. Part-of has to be interpreted in the sense that every entity that is a member of one class is part of some member of each of its part-of parent classes. Figure 3 presents an entry from GO referring to the class Cell.
Accession: GO:0005623
Ontology: cellular component
Synonyms: None
Definition: The basic structural and functional unit of all organisms. Includes the plasma membrane
and any external encapsulating structures such as the cell wall and cell envelope.
[source: GOC:go_curators]
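The class-level reading of part-of given above (every member of one class is part of some member of the other) can be made concrete with a small Python sketch. The individuals, the extra cell without a nucleus, and the function names are hypothetical; the example does not reproduce actual GO content.

```python
# Hypothetical individuals and the classes they instantiate.
INSTANCE_OF = {
    "nucleus#1": "Cell Nucleus",
    "nucleus#2": "Cell Nucleus",
    "cell#1": "Cell",
    "cell#2": "Cell",
    "cell#3": "Cell",  # assumed cell without a nucleus
}

# Part-of assertions between individuals: (part, whole).
PART_OF = {("nucleus#1", "cell#1"), ("nucleus#2", "cell#2")}


def instances(cls):
    """List all individuals asserted to be members of a class."""
    return [ind for ind, c in INSTANCE_OF.items() if c == cls]


def class_part_of(part_cls, whole_cls):
    """True if every instance of part_cls is part of some instance of whole_cls."""
    return all(
        any((part, whole) in PART_OF for whole in instances(whole_cls))
        for part in instances(part_cls)
    )


def class_has_part(whole_cls, part_cls):
    """True if every instance of whole_cls has some instance of part_cls as part."""
    return all(
        any((part, whole) in PART_OF for part in instances(part_cls))
        for whole in instances(whole_cls)
    )


print(class_part_of("Cell Nucleus", "Cell"))
print(class_has_part("Cell", "Cell Nucleus"))
```

In this toy data set the part-of statement holds while its inverse does not, because one cell has no nucleus. Note that `class_part_of` is vacuously true when the first class has no instances at all.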
concepts, denote mostly classes of individual entities (such as diseases, procedures, lab results, drugs etc., but also particulars like geographic entities), although there is still some controversy of whether the referents of, e.g., the concept Chest Pain, are the objects themselves (e.g. the pain in the chest of a given patient) or their mention in the health record (e.g. the entry “chest pain”). SNOMED CT concepts are uniquely identified by numeric keys together with their fully specified names. Most SNOMED CT concepts include several synonyms (named “descriptions”), and, in just a few cases, also free-text definitions. Additional attributes are SNOMED qualifiers, which provide optional refinements for concepts, e.g. Laterality for anatomy or Severity for diseases.
Current Concept:
Fully Specified Name: Cholecystectomy (procedure)
ConceptId: 38102005
Defining Relationships:
Is a Biliary tract excision (procedure)
Is a Operation on gallbladder (procedure)
Group 1:
Method (attribute): Excision - action (qualifier value)
Procedure site - Direct (attribute): Gallbladder structure (body structure)
This concept is fully defined.
Qualifiers:
Access (attribute): Surgical access values (qualifier value)
Priority (attribute): Priorities (qualifier value)
Descriptions (Synonyms):
Preferred: Cholecystectomy
Synonyms: Excision of gallbladder, Gallbladder excision, Removal of gallbladder
Parents:
Biliary tract excision (procedure)
Operation on gallbladder (procedure)
Children:
Cholecystectomy and exploration of bile duct (procedure)
Cholecystectomy and operative cholangiogram (procedure)
Excision of lesion of gallbladder (procedure)
Laparoscopic cholecystectomy (procedure)
Partial cholecystectomy (procedure)
Total cholecystectomy and excision of surrounding tissue (procedure)
Figure 4 - SNOMED CT’s definition of Cholecystectomy. Note that this concept is fully defined, i.e.
the combination Method – Excision Action with Procedure Site – Gallbladder Structure is a sufficient
condition for Gallbladder
SNOMED CT also offers 50 link types, called linkage concepts. They are used in what can be considered the most important distinctive criterion of SNOMED CT, viz. the use of a rich ontology representation language compatible with the Semantic Web standard OWL-DL (description logics) (Bechhofer et al. 2004). Description logics allow the definition of new classes using existing classes and relations. As shown in Figure 4, Cholecystectomy is fully defined as a new class, using the existing classes Excision and Gallbladder, together with the links (relations) Method and Procedure Site. This means that each and every excision procedure at some gallbladder is a cholecystectomy and vice versa.
The creation of complex expressions based on SNOMED concepts and obeying a formal syntax and semantics is called coordination. This can be done beforehand, by introducing new concepts into the terminology (pre-coordination), or at the moment of coding (post-coordination) (Chen 2005).
openGALEN
The Generalized Architecture for Languages, Encyclopaedias and Nomenclatures (openGALEN) provides an open-source clinical ontology which had been developed in the nineties as an outcome of a series of European projects (GALEN) (Rector 2003). It is aimed at clinical applications and contains about 25,000 nodes (concepts) and 26 link types (relations). openGALEN concepts are arranged in multiple is-a hierarchies, too. It uses a description logic language called GRAIL (GALEN Representation and Integration Language), which allows the definition of classes in a similar way as SNOMED CT but provides a richer syntax, as can be seen in the example of Figure 5, which describes a fixation of a fracture of the neck of the left femur. The GALEN model is split into the following items:
• a high level ontology, which provides an overall categorization framework,
• the common reference (CORE) model, containing reusable definitions from anatomy, diseases, surgical procedures, symptoms, etc.,
• detailed extensions for specific subdomains, such as surgery.
Its purpose is therefore similar to that of SNOMED CT, but it has never reached the latter’s scope and granularity. However, openGALEN can be regarded as the pioneer of the use of formal logics in biomedical terminologies. Its most important use case was the development of the French medical procedure classification CCAM (Trombert-Paviot 2000).
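What it means for a concept such as Cholecystectomy in Figure 4 to be fully defined can be sketched in Python: any expression whose attribute values meet the defining relationships is classified under the concept. This is a deliberately reduced illustration that ignores relationship groups and the is-a parents. The value hierarchy `VALUE_IS_A`, the entries "Gallbladder fundus structure" and "Appendix structure", and all function names are assumptions made for this example rather than SNOMED CT content or tooling.

```python
# Hypothetical is-a links between attribute values (child -> parent).
VALUE_IS_A = {
    "Gallbladder fundus structure": "Gallbladder structure",  # assumed example
}

# Defining relationships of Cholecystectomy, taken from Figure 4.
CHOLECYSTECTOMY = {
    "Method": "Excision - action",
    "Procedure site - Direct": "Gallbladder structure",
}


def is_a_value(value, ancestor):
    """True if value equals ancestor or descends from it via VALUE_IS_A."""
    while value is not None:
        if value == ancestor:
            return True
        value = VALUE_IS_A.get(value)
    return False


def satisfies(expression, definition):
    """True if the expression meets every defining relationship."""
    return all(
        attribute in expression and is_a_value(expression[attribute], required)
        for attribute, required in definition.items()
    )


# Hypothetical expressions built at the moment of coding.
removal_of_gallbladder = {
    "Method": "Excision - action",
    "Procedure site - Direct": "Gallbladder fundus structure",
}
removal_of_appendix = {
    "Method": "Excision - action",
    "Procedure site - Direct": "Appendix structure",
}

print(satisfies(removal_of_gallbladder, CHOLECYSTECTOMY))
print(satisfies(removal_of_appendix, CHOLECYSTECTOMY))
```

A description logic classifier performs this kind of check in a far more general way; the sketch only shows why a sufficient condition lets a system recognize a cholecystectomy without the coder naming it.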
openGalen: "Open fixation of a fracture of the neck of the left femur"
GRAIL syntax:
(‘SurgicalProcess’ which
  isMainlyCharacterisedBy (performance which
    isEnactmentOf (‘SurgicalFixing’ which
      actsSpecificallyOn (PathologicalBodyStructure which <
        involves Bone
        hasUniqueAssociatedProcess FracturingProcess
        hasSpecificLocation (Collum which
          isSpecificSolidDivisionOf (Femur which
            hasLeftRightSelector leftSelection))>))))
Close-to-user syntax:
MAIN fixing
ACTS_ON fracture
HAS_LOCATION neck of long bone
IS_PART_OF femur
HAS_LATERALITY left
HAS_APPROACH open
Figure 5 - OpenGALEN detailed entry defining a type of fracture fixation. Top: description-logic-like representation (GRAIL syntax). Bottom: close-to-user syntax devised for facilitating the definition of surgery concepts.
Figure 6 - The Foundational Model of Anatomy’s definition of the Right Inferior Nasal Concha.
Efforts to gather different sources of biomedical knowledge
Rationale
Considerable efforts have been devoted on the one hand to align the numerous and largely overlapping biomedical terminologies and ontologies, and on the other hand to prevent the anarchic proliferation of BMTOs by establishing principles for the coordinated development of interoperable resources. We will describe the Unified Medical Language System (UMLS) and the OBO (Open Biomedical Ontologies) Foundry. Whereas UMLS is an example of the former strategy, OBO embodies the latter approach.
The Unified Medical Language System UMLS Metathesaurus
The richest source of biomedical terminologies, thesauri, classification systems and ontologies is constituted by the Unified Medical Language System (UMLS) Metathesaurus (Nelson 2006, UMLS 2008), initiated in 1986 by the U.S. National Library of Medicine (NLM), with the purpose of integrating information from a variety of disparate terminological sources. The UMLS now covers over 2 million names for about 1 million biomedical concepts from more than 120 BMTOs, as well as 12 million relations among these concepts (Bodenreider 2004). Apart from openGALEN, all the systems described above are included in the UMLS Metathesaurus, together with many others, covering organisms, drugs, chemicals, devices, procedures etc.
Besides facilitating transparent access to the sources (through the provision of raw files and online services), the main achievement of the UMLS Metathesaurus lies essentially in the following:
• each node of the source BMTO is retrospectively mapped to a Metathesaurus concept, each of which has a unique identifier, called CUI (Concept Unique Identifier). These mappings are regularly updated by manual effort. They enable the bridging between different source BMTOs. As a consequence, links between source nodes are mapped to links between CUIs, called semantic relations. Applications using them can therefore take advantage of concept linkages from both directions;
• each Metathesaurus concept is categorized by at least one semantic type from the UMLS Semantic Network, an overarching conceptual umbrella over the biomedical domain (McCray 2003). A tree of 135 semantic types, linked by is-a relations, forms the backbone of this Semantic Network. Additionally, the network includes a hierarchy of 53 associative relationships (e.g., location_of, treats) which are used to form 612 triples (e.g., Tissue, Diagnostic Procedure, etc.) from which 6,252 additional triples can be inferred. These triples are interpreted as domain / range restrictions of the relations.
The Open Biomedical Ontologies (OBO) Foundry
Created in 2003, OBO, the Open Biomedical Ontologies (OBO 2008) platform evolved as a library of online, public-domain biomedical ontologies. On this basis, the OBO Foundry initiative developed a set of shared principles regulating the development of biomedical ontologies (Smith 2007). The coverage of the OBO Foundry comprises several anatomy ontologies (including the FMA), the Gene Ontology, as well as specialized ontologies of biochemistry (ChEBI), phenotypes (PATO), sequences (SO), and investigation techniques (OBI). Currently, more than 50 ontologies are listed as candidates for the OBO Foundry.
The OBO Foundry propagates two representation languages. Besides OWL-DL there is a proprietary format (OBO-EDIT 2009) in which most OBO ontologies are encoded.
Just as in the Gene Ontology, nodes in OBO ontologies denote classes of entities in the real world. Links between these classes are interpreted as existentially quantified links; for instance, A part_of B means that every instance of A is part of some instance of B (but not vice-versa). OBO main relations (is_a, part_of, integral_part_of, proper_part_of, located_in, contained_in, adjacent_to, transformation_of, derives_from, preceded_by, has_participant, has_agent, instance_of) have been provided with consistent and unambiguous formal definitions (Smith 2005).
Discussion
We have described a sample of BMTOs which pars pro toto represent the variety of semantic standards in biology and medicine. Our purpose was to give the readers an overview of the substantial efforts being carried out to describe terms and the entities they denote in order to support querying and intelligent data and knowledge processing in general as well as specific applications. Moreover, we have presented these efforts in order of increasing expressivity. One aspect directly linked to expressivity is scaling and coverage, since BMTOs encoded in expressive formalisms should be employed in more restricted domains, while for informal terminologies this constraint is not relevant.
Though it seems straightforward in theory to distinguish terminologies from formal ontologies, in practice the distinction is less clear. The key idea is that terminologies are much more related to organizing domain terms only (as a huge amount of terms is at the core of any subfield of Biomedicine), while ontologies give a more precise account which is based on formal logic and is as independent of human language as possible. A typical instance of this is SNOMED CT. Its predecessors have their roots in a compositional standardized nomenclature (SNOMED Int.) and a clinical coding system (NHS Clinical Terms Version 3) but its current redesign is being increasingly guided by ontological principles. In contrast, BMTOs such as ICD and MeSH can be considered more established, as important and globally successful use cases have existed for decades. ICD has the longest history and most
widespread dissemination due to its simple architecture and the early need for health or disease statistics. Endorsed by the WHO and by national bodies, its objective has since increasingly included clinical epidemiology, health management, quality assurance and billing in many countries, including Brazil. MeSH, on the other hand, has a complex multihierarchical structure tailored to querying in biomedical text collections.
A clear trend that can be observed is the increasing adoption of Semantic Web languages and formalisms, particularly the ontology language OWL and its subset OWL-DL, the latter being adapted to the needs of machine reasoning. The main advantages of using inferencing machinery such as that available for description logics are the ability to check the entailments of the axioms contained in the ontology, to support knowledge-intensive queries, to calculate semantic equivalences of syntactically different expressions and to disambiguate natural language utterances. Although the currently available classifiers run into scalability problems with more expressive (and therefore more interesting) formalisms, the fact that standards like description logic and OWL exist pays off for applications that require in-depth knowledge about a small number of subfields. As could be seen in the previous section, many of the BMTOs presented have undergone endeavors to shift from their original format to description logic: SNOMED was a pure terminology in the past; FMA has already partially shifted from frames to OWL, and there is a tendency for OBO ontologies to adopt OWL-DL, although a proprietary format had been developed in the past and is still largely used. Interestingly, openGALEN had been conceived from the very beginning to use a logic-based, DL-like language. It therefore can be proud of having first axiomatized significant amounts of medical terms, and the lessons learned are highly valuable for biomedical ontology engineering to this day.
The sheer number of BMTOs describing partly overlapping domains for similar or different use cases, based upon different formalisms, philosophies and (tacit) assumptions, was identified as a problem already in the eighties. Since then, large efforts have been invested into the UMLS Metathesaurus, by which an increasing number of heterogeneous sources are annually cross-mapped and categorized. Two constraints must, however, be stated. Firstly, the mapping cannot be more expressive than the least expressive source BMTO, and secondly, the usefulness of the UMLS for practical applications is hampered by the fact that many of its sources are subject to individual licensing.
In contrast, the OBO sources are completely in the public domain and can be accessed by everyone. This, at least partly, explains their success and the high level of biological expertise being invested in their construction and maintenance.
In Figure 7, some key features of the described BMTOs and gathering efforts are summarized, showing their scope, coverage, volume, formalism and usages.
Name | Scope | Formalism | Number of Nodes | Applications | URL
ICD | Diseases | Classification, strict is-a | Around 13,000 classes | Health Statistics, Epidemiology, Health Reporting, Billing | www.who.int/classifications/apps/icd/
SNOMED | Everything encoded in the electronic health record | Description Logic | 311,000 concepts (2008) | Information about a patient's medical history, illnesses, treatments, and laboratory results | www.ihtsdo.org
GALEN | Anatomy, surgical deeds, diseases, health care | Description logic-like language GRAIL | Over 10,000 | Electronic healthcare records, clinical user interfaces, decision support systems, knowledge access systems, natural language processing | www.opengalen.org
OBO | Bioinformatics and molecular Biology | OBO / OWL / OBO_XML / RDF | 60 ontologies | Used as a repository and a unified schema to interoperate biomedical projects | www.obofoundry.org
UMLS | Biomedical and health related concepts | Semantic Networks | Over 1 million concepts | Scientific literature, guidelines, and public health data, natural language processing | http://www.nlm.nih.gov/research/umls/
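The entailment checking and equivalence detection referred to in this discussion can be illustrated with a minimal Python sketch. It is not a description logic classifier: it merely computes the transitive closure of asserted is-a axioms and compares conjunctive definitions as sets of conjuncts. All concept names, axioms and function names below are hypothetical and are not taken from any of the BMTOs surveyed here.

```python
from itertools import product

# Hypothetical toy axioms (not from any real terminology):
# each pair (sub, sup) asserts "sub is-a sup".
ASSERTED_IS_A = {
    ("ViralHepatitis", "Hepatitis"),
    ("Hepatitis", "LiverDisease"),
    ("LiverDisease", "Disease"),
}

# Hypothetical defined concepts: each definition is a conjunction of
# atomic concepts, stored as a set so that conjunct order is irrelevant.
DEFINITIONS = {
    "InflammatoryLiverDisease": frozenset({"LiverDisease", "InflammatoryDisease"}),
    "LiverInflammation": frozenset({"InflammatoryDisease", "LiverDisease"}),
}


def entailed_is_a(asserted):
    """Return the transitive closure of the asserted is-a pairs."""
    closure = set(asserted)
    changed = True
    while changed:
        changed = False
        # product() snapshots the list, so adding to closure here is safe.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def equivalent(name1, name2, definitions):
    """Treat two defined concepts as equivalent if their conjunct sets coincide."""
    return definitions[name1] == definitions[name2]


closure = entailed_is_a(ASSERTED_IS_A)
inferred = closure - ASSERTED_IS_A  # is-a statements that were never asserted

print(sorted(inferred))
print(equivalent("InflammatoryLiverDisease", "LiverInflammation", DEFINITIONS))
```

In this toy setting, the statement that ViralHepatitis is-a Disease is never asserted but follows from the axioms, and the two differently named definitions are recognized as equivalent. Real reasoners for OWL-DL handle far richer constructs (e.g., existential restrictions and negation), which is where the scalability problems mentioned above arise.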
Open issues and challenges
A new era for biomedical informatics is currently unfolding. Besides the algorithms employed in gene research, ontologies are regarded as an increasingly hot topic. There is already an active community researching and benefitting from semantic interoperability through ontologies, as ontologies are increasingly used for the annotation of research data in Molecular Biology and Genomics. The emerging reusable vocabularies prove useful for describing biomedical data and for more and more kinds of applications. The precise capture of biological knowledge in computational form enables the creation of systems capable of meeting the demanding requirements of biologists, medical researchers and practitioners: easy access to texts and databases containing detailed data, information, and statements; sound and complete reasoning; faster development of decision support systems for a broad range of use cases, etc. However, some hard challenges have to be overcome for the field to become mature.
A first issue resides in modeling. The subtle aspects that have to be described in biomedical ontologies usually require toplevel ontologies and ontology assessment techniques (Guarino 2000) to come into play, otherwise the resulting reasoning can fail. An emblematic example can be seen in the relations between the main classes Physical Object and Amount of Matter. The famous WordNet ontology (Miller 1995), used by informatics researchers, particularly in the field of Natural Language Processing, states that Amount of Matter is-a Physical Object. On the other hand, Pangloss, a large ontology mainly used for translation between languages, describes the two classes in the opposite way, Amount of Matter being a superclass of Physical Object. Indeed, (Guarino & Welty 2000) state that both interpretations are wrong: Every instance of Physical Object is constituted by one or more instances of Amount of Matter. Yet there is no superclass relation, which can easily be seen by analyzing meta-properties like unity, rigidity or identity. In the biomedical field, such inaccuracies also occur: The Gene Ontology, in an earlier version, included the axiom Cell has-part Axon. On closer investigation, this definition led to ambiguities and underspecifications, since there are cells without axons, and axons without cells at least play a role in the lab (Schulz 2004). These two examples stress the need for more formality and semantic richness in biomedical ontologies.
Another key issue, which can also be seen in the first example, is integration. As the number of biomedical ontologies increases, many applications need to employ more than one ontology, which leads to a series of significant consequences. Undeniably, this is not an issue only for biomedicine; the main obstacles for knowledge reuse in the computer science mainstream come from knowledge heterogeneity. Knowledge is naturally diverse in its various features: form, expression, representation formalisms, language, syntax, contents, meaning, modeling principles, practices and standards, points of view, perspectives, uses, granularity, terminology, premises, not to mention that some combinations of them can make reasoning computationally expensive. Although ontologies (in a stricter sense, viz. statements about what is always true and univocally accepted) only cover a clear-cut segment of what is commonly understood by knowledge representation, these varieties will always have an impact on crucial design decisions and will pose subtle questions for ontology applications. Dealing with heterogeneity has become a recurrent and challenging research issue for ontology employment and, on the other hand, also a good source of ontology usage, e.g. for information integration across heterogeneous ontologies, such as querying for hotels whose descriptions are modeled differently in each of many systems.
Granularity is a particular issue that also has a deep impact on the integration of biomedical ontologies (Schulz 2009). There is a hope to see medical and biological research join ontologies at the level of cell, anatomy, drugs, etc. These communities might need different granularities or even different views of the same ontology. Another challenge related to integration is how to handle existing biomedical ontologies that contain overlapping information, providing different views on a certain subdomain or covering different domains.
To enable ontology integration, plenty of research is taking place. An overview of this work is given in (Freitas et al. 2007) and presented in depth in (Stuckenschmidt et al. 2000).
Among the actual applications of biomedical ontologies, text processing is surely one of the key ones. A very popular use case is the automatic assignment of MeSH terms to user queries in PubMed. Another one is the automated extraction of information related to individual genes or proteins from scientific texts. The electronic health
record and consumer platform also constitute a wide field for text and knowledge processing. To tackle this issue, systems may rely on information extraction and text mining techniques (Muslea 1999, Ananiadou 2006). However, many questions remain unanswered, and the combination of high quality text analysis methodologies with highly expressive and well-standardized ontologies constitutes an ongoing research challenge.
Bibliographic references
Ananiadou S, McNaught J. Text Mining for Biology and Biomedicine, chapter Introduction. Norwood, MA: Artech House Publishers; 2006.
Antoniou G, van Harmelen F. A Semantic Web Primer. MIT Press, Cambridge; 2004.
Bechhofer S, Harmelen F, Hendler J, Horrocks I. OWL Web Ontology Language Reference. W3C Recommendation; 2004. http://www.w3.org/TR/2003/PR-owl-ref-20031215/. Last accessed February 3, 2009.
Bodenreider O. The Unified Medical Language System (UMLS): integrating biomedical terminology, Oxford University. 2004 January 1; 32(1) Suppl.1: D267-D270.
Chen H, Fuller SS, Friedman C, Hersh W. Knowledge Management and Data Mining in Biomedicine. Series: Integrated Series in Information Systems, New York: Springer; 2005. Vol. 8.
Cornet R. and de Keizer N. Forty years of SNOMED: a literature review. BMC Medical Informatics and Decision Making. 2008; 8(Suppl 1): S2.
FMA - Foundational Model of Anatomy sig.biostr.washington.edu/projects/fm Accessed in April 2008.
Berners-Lee T, Hendler J, Lassila O, editors. The Semantic Web, Scientific American. 2001; 28-37.
Freitas F, Stuckenschmidt H, Noy N. Ontology Issues and Applications: Guest Editors’ Introduction. Journal of the Brazilian Computer Society. 2005; 11(2).
GO - The Gene Ontology http://amigo.geneontology.org/cgi-bin/amigo/go.cgi. Last accessed February 3, 2009.
Gruber T. A translation approach to portable ontologies. Knowledge Acquisition. 1995; 5(2):199-220.
Guarino N. Formal ontology in information systems. Proc FOIS’98. 1998; 3-15.
Guarino N, Welty C. A formal ontology of properties. In: Knowledge Engineering and Knowledge Management - Proceedings of 12th International Conference EKAW 2000. France: Springer; 2000.
IHTSDO - International Healthcare Terminology Standards Development Organisation. http://www.ihtsdo.de. Last accessed February 3, 2009.
Kuśnierczyk W. Nontological Engineering. Formal Ontology In Information Systems. In: Proceedings of the 4th International Conference FOIS 2006, Amsterdam, The Netherlands: IOS Press; 2006. 39-50.
MESH - Medical Subject Headings, http://www.nlm.nih.gov/mesh/. Last accessed February 3, 2009.
Miller G. WordNet: a lexical database for English. Communications of the ACM; 1995.
Muslea I. Extraction patterns for information extraction tasks: A survey. American Association for Artificial Intelligence (www.aaai.org) The AAAI-99 Workshop on Machine Learning for Information (1999).
Nelson SJ, Powell T, Humphreys LB. The Unified Medical Language System (UMLS) of the National Library of Medicine. Journal of American Medical Record Association. 2006; 61: 40-42.
Nelson SJ, Schulman J. A Multilingual Vocabulary Project - Managing the Maintenance Environment. MeSH Section, National Library of Medicine, Bethesda, Maryland; 2007.
OBO - Open Biomedical Ontologies. http://www.obofoundry.org. Last accessed February 3, 2009.
OBO-EDIT. An Introduction to OBO Ontologies http://oboedit.org/docs/html/An_Introduction_to_OBO_Ontologies.htm. Last accessed February 3, 2009.
OpenGalen Foundation. http://www.opengalen.org. Last accessed February 3, 2009.
PubMed. http://www.ncbi.nlm.nih.gov/pubmed/. National Library of Medicine. Last accessed February 3, 2009.
Rector A, Rogers JE, Zanstra PE, Haring E. OpenGALEN: Open Source Medical Terminology and Tools. AMIA Annual Symposium Proceedings. 2003; 982.
Rector A. Clinical Terminology: Why is it so hard? Methods of Information in Medicine. 2000; 38(4): 239-52.
Rubin DL, Shah NH, Noy N. Biomedical Ontologies: a functional perspective. Briefings in Bioinformatics. 2008 Jan; 9(1): 75-90.
Schulz S, Hahn U. Mereotopological Reasoning about Parts and (W)holes in Bio-Ontologies. In: C. Welty and B. Smith, editors, Formal Ontology in Information Systems. Collected Papers from the 2nd International FOIS Conference, New York, NY: ACM Press, 2001; 210-21.
Schulz S, Hahn U. Towards the ontological foundations of symbolic biological theories. Artificial Intelligence in Medicine. 2007 Mar; 39(3): 237-50.
Schulz S, Boeker M, Stenzhorn H, Niggemann J. Granularity Issues in the Alignment of Upper Ontologies. Methods of Information in Medicine. 2009. Accepted for Publication.
Smith B, Ashburner M, Rosse C, Bard C, Bug W, Ceusters W, Goldberg L J, Eilbeck K, Ireland A, Mungall CJ, The OBI Consortium, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone S-A, Scheuermann R H, Shah N, Whetzel PL and Lewis S. The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration, Nature Biotechnology. 2007; 25: 1251-5.
Smith B, Mejino JLV, Schulz S, Rosse C. Anatomical Information Science. In: COSIT 2005: Spatial Information Theory. Foundations of Geographic Information Science, New York: Springer. 2005; 149-64.
Smith B, Ceusters W, Klagges B, Köhler J, Kumar A, Lomax J, Mungall C, Neuhaus F, Rector AL, Rosse C. Relations in Biomedical Ontologies. Genome Biology. 2005; 6(5).
Spackman KA, Campbell KE, Côté RA. SNOMED RT: A reference terminology for health care. In Masys DR (Ed.), The Emergence of Internetable Health Care: Systems that Really Work. Proceedings of the 1997 AMIA Annual Symposium, 640-644. Philadelphia: Hanley & Belfus, Inc. 1997.
Spackman KA. SNOMED CT milestones: endorsements are added to already-impressive standards credentials. Healthcare Informatics. 2004; 21: 54-6.
Stuckenschmidt H, Wache H, Vogele T, Visser U. Enabling technologies for interoperability. In Visser, U. and Pundt, H. editors, Workshop on the 14th International Symposium of Computer Science for Environmental Protection, Bonn, Germany. TZI, University of Bremen. 2000; 35–46.
Trombert-Paviot B, Rodrigues JM, Rogers JE, Baud R, van der Haring E, Rassinoux AM, Abrial V, Clavel L, Idir H. GALEN: a third generation terminology tool to support a multipurpose national coding system for surgical procedures. Intern J Med Informatics. 2000 Sep; 58-59: 71-85.
UMLS - Unified Medical Language System http://www.nlm.nih.gov/research/umls/. Last accessed February 3, 2009.
WHO - International Classification of Diseases, 10th Edition. World Health Organization. http://www.who.int/classifications/apps/icd/icd10online/. Last accessed February 3, 2009.
Stefan Schulz
Holds a medical degree (Heidelberg University, Germany) and is senior researcher and professor at the Institute
for Medical Biometry and Medical Informatics of the University Medical Center Freiburg, where he leads the
Medical Informatics Research Group. His work focuses on biomedical terminologies and ontologies, biomedical
knowledge representation, cross-language medical document retrieval, text and data mining in clinical document
repositories, eLearning in Medicine, and health informatics in developing countries. After clinical work in surgery
and internal medicine he obtained his doctoral degree in the field of tropical hygiene where he carried out a
parasitological field study on in São Luís, Brazil. After obtaining a technical qualification in medical computing,
he moved to the University of Freiburg, where he participated in clinical and educational software develop-
ment projects and participated in several research projects in the field of information extraction, biomedical
terminologies, medical language engineering and semantic technologies. He has played a leading role in several
EU-funded research projects, authored more than a hundred peer-reviewed publications and has received several
awards. Since 2001 he has repeatedly contributed to Brazilian health informatics research projects as a visiting
researcher at the Paraná Catholic University (PUC-PR).