NIH Public Access: Author Manuscript
Appl Ontol. Author manuscript; available in PMC 2011 May 31.
Abstract
Since 2002 we have been testing and refining a methodology for ontology development that is
now being used by multiple groups of researchers in different life science domains. Gary Merrill,
in a recent paper in this journal, describes some of the reasons why this methodology has been
found attractive by researchers in the biological and biomedical sciences. At the same time he
assails the methodology on philosophical grounds, focusing specifically on our recommendation
that ontologies developed for scientific purposes should be constructed in such a way that their
terms are seen as referring to what we call 'universals' or 'types' in reality. As we show, Merrill's
critique is of little relevance to the success of our realist project, since it not only reveals no actual
errors in our work but also criticizes views on universals that we do not in fact hold. However, it
nonetheless provides us with a valuable opportunity to clarify the realist methodology, and to
show how some of its principles are being applied, especially within the framework of the OBO
(Open Biomedical Ontologies) Foundry initiative.
Ontologies are created to serve multiple goals, including support for more effective retrieval
of data and for different sorts of reasoning. Here we focus on ontologies created to foster
consistency in the ways scientific results are described for purposes of more effective
integration of scientific data: ontologies, therefore, that serve strategies to counteract the
many tendencies leading to ad hoc and non-interoperable coding of data, and thus to the
formation of data silos.
Unfortunately, the very success of such strategies has led to the creation of ever new
ontologies, and thus has resurrected the very silo problems which ontologies were designed
to counteract. Against this background, it is of obvious advantage if we can find a way to minimize the
number of ontologies that are being constructed and at the same time maximize their mutual
consistency. These ends can be achieved, however, only if we can persuade ontology
developers to accept certain common constraints on how they build their ontologies and if
we can find a way to do this that does not endanger the flexibility that is needed to keep pace
with scientific advance.
The realist methodology is based on the idea that the most effective way to ensure mutual
consistency of ontologies over time and to ensure that ontologies are maintained in such a
Scientific realism =def. the doctrine according to which scientific theories are (broadly)
true of reality.
Metaphysical realism =def. the doctrine according to which universals or types exist in
reality.
Merrill (2010, p. 85), quite correctly, sees elements of both of the above in the etiology of
our thinking on ontology development. He himself embraces what he calls an anti-realist
position which consists in the denial of metaphysical realism as defined above, and which
we can accordingly define as follows:
Anti-realism =def. the doctrine according to which there are no universals or types in
reality, but only individuals or particulars.
Two forms of anti-realism can then be distinguished:
Nominalism =def. a variety of anti-realism consisting in a doctrine to the effect that
entities labeled by the same term (for example, this bonobo and that bonobo) have
nothing in common but their name.
1The principles propounded in what follows are derived from our own practice in ontology development, and go beyond the principles
thus far adopted by the OBO Foundry, which are documented here: http://obofoundry.org/crit.shtml.
which the terms used in ontologies should be seen as referring. Because ontological realism
is a methodology, and not a doctrine, it stands in no logical relation to any of the
metaphysical doctrines specified above. Certainly it takes over the terminology of 'types',
'universals', 'instantiation' from the metaphysical realist literature; but it does not stand or
fall according to whether universals or types do or do not exist in some metaphysical sense,
and our goal will be to provide a specification of our methodology which will allow even
anti-realists to recognize its benefits.
1.3. The methodology
The methodology can be summarized as follows. Ontologists, when building ontologies,
should conceive the world as including entities of two sorts called 'particulars' (or
'instances') and 'types' (or 'universals'), respectively. Particulars, on this view,
are the sorts of things that can be described on the basis of observations performed, for
example, in the lab or clinic. Types or universals (we shall always use these terms
synonymously in what follows) are to be understood as counterparts in reality of (some of)
the general terms used in the formulation of scientific theories. Particulars are concrete
individual entities (entities that exist in space and time and that exist only once); types or
universals are to be understood as repeatable. This means that, for each given type, we can
in principle discover of indefinitely many particulars that they are its instances. (We shall
return to address in more detail the relation between universals and repeatables below.)
The particulars in reality can be partitioned into groups on the basis of multiple similarity
relations which obtain between them, and the process of recognizing such collections of
similars is essential to all forms of cognition. Sometimes the process yields classifications,
which is to say partitions of reality based on hierarchies organized in terms of greater and
lesser generality.
Multiple more or less ad hoc classifications have been created in the course of time, and
human beings have the ability to cope with the resultant mismatches. Computers, however,
are much less tolerant of classificatory inconsistency, and this can cause problems when
computers are put to work in managing large and heterogeneous bodies of data. We can
distinguish two kinds of responses to these problems, of which the first, sometimes called
the bottom-up approach, sees the solution in terms of mappings between the existing,
mismatched classifications. The second, top-down approach, sees the solution in terms of
strategies to constrain the classifications created and used by different groups, in the
direction of greater consistency. The realist methodology that we advocate falls within this
second camp, and its strategy of prospective standardization in some ways parallels the
earlier effort to coordinate the expression of measurement results by creating a single
international system of units.
Whether scientists themselves see the general terms they use as referring to types (or
universals or natural kinds or like entities) is not relevant to the success of our methodology.
All that is important is that scientists use general terms in attempting to describe repeatable
features of reality. That they do this is not (Merrill's odd imputations to us of views to the
contrary notwithstanding) because they have been taught (or should be taught) special
metaphysico-semantical doctrines concerning reference and meaning. Rather, it is for a
variety of practical reasons: for example, because, when scientists formulate particular and
general assertions, then they want other scientists to be able to verify or falsify them in
experiments. For this they must be in a position to describe repeatable features of reality in a
way that allows these other scientists to recreate them.
We do not deny that there are many distinct philosophical approaches to the understanding
of the scientific use of general terms and of what it is in reality towards which such terms
are directed. For practical purposes, however, we believe that these philosophical matters are
of secondary importance. This is because even the metaphysical anti-realist can, we believe,
view all putative references to types or universals (including the many such putative
references in what follows) as mere façons de parler about other, more commonplace
entities (such as scientists' beliefs or linguistic usage) and still gain full practical
advantage from our methodology.2
We take as our starting point a distinction between two sorts of descriptions, which we
believe pervades the whole of science. It is seen most simply in the contrast between, for
example,
(A) AIDS is spreading very rapidly through Asia,
and
(B) AIDS is caused by the HIV virus,
in which the string 'AIDS' can be understood as referring to a particular collection (in (A)),
and to a type (in (B)).
Scientists are constantly drawing on this distinction as they move back and forth between
descriptions of experiments on the one hand, where they are dealing with carefully
demarcated collections of particulars (for example, populations of study organisms), and the
formulation of results in theories on the other hand, where they can be seen as dealing with
corresponding types.
The distinction between collections and types is used by scientists themselves to monitor
progress in discovering the structure of reality. It was a scientific advance when members of
the collection of human beings distinguished by the possession of the phenotypic feature of
having a mongoloid face were found to be associated with instances of what the realist
would call the disorder universal trisomy 21. Similarly, the scientific debate over whether
there exists something that is properly to be called 'a race' can be formulated in terms of
whether 'race' should be understood as denoting a universal or a mere collection.
Discovering universals (for example, discovering that there is a type of disease called
AIDS, or a type of particle called Higgs boson) is a scientific achievement. Discovering
that terms purportedly referring to universals (like 'diabetes') do not do so (or do so only
ambiguously, as between diabetes mellitus and diabetes insipidus) is a scientific
achievement of a different kind. Yet another kind of achievement consists in discovering
general truths about universals: for example, discovering that infection with the influenza
virus causes the same type of disease throughout the world, even in spite of the many
different manifestations and culturally contingent descriptions with which it is associated.
1.4. What are types or universals?
The difference between collections of particulars on the one hand and types or universals on
the other is related to what is commonly referred to in some logical circles as that between
classes in extension (roughly, sets of individuals in the familiar sense) and classes in
intension, the latter sometimes (on one of the several understandings of this word) called
'concepts'. The problem with the approach in terms of extensions and intensions is that it
2 Very crudely, the anti-realist might view a sentence of the form
(1) scientist X believes that instances of a given type Y exist in reality
as meaning something like
(2) scientist X believes that it is appropriate to use the general term 'Y' in making positive assertions about reality.
suggests that there is a closer concordance between the two sorts of entities than is in fact
the case. Both extensions and intensions, on standard views, can be combined in arbitrary
ways in Boolean combinations. Thus if F is an extension (set) and G is an extension (set),
then there are further extensions F & G, F or G, non-F, non-G, non-F & non-G and so on.
And similarly: if F is an intension, for example, the concept nausea, and G is an intension,
for example, the concept vomiting, then there are further concepts nausea and vomiting,
nausea or vomiting, non-nausea, non-vomiting, non-nausea and non-vomiting and so on.
Concepts, in other words, can be combined logically to produce other concepts. The
uncontrolled combination of concepts in this manner is in our eyes one reason for the
failure, thus far, of terminology artifacts created in accordance with what we shall recognize
below as the 'concept orientation'. This is because, from the potentially infinite number of
concept combinations that can be formed from any given starting point, some selection must
be made. And as different individuals and groups make their selections in more or less
deliberate and more or less ad hoc ways, the realization of the goal of ontology-based
integration becomes ever more remote.
Something similar holds, of course, for collections. Thus, for example, if there is a collection
of people suffering from nausea in a given hospital at a given time, and a collection of
people suffering from vomiting in the same hospital at the same time, then there are ipso
facto further collections formed, for example, by the union and the intersection of these two
collections.
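The point about Boolean closure can be illustrated with a minimal sketch in Python. The patient identifiers are hypothetical, and modelling a collection as a `frozenset` is an assumption made only for illustration. The sketch shows that two starting collections already generate a whole family of further collections. Nothing in the computation indicates which of them, if any, corresponds to a type.

```python
from itertools import combinations

# Hypothetical patient identifiers; each collection is modelled as a frozenset.
hospital = frozenset({"p1", "p2", "p3", "p4", "p5"})
nausea = frozenset({"p1", "p2", "p3"})
vomiting = frozenset({"p2", "p3", "p4"})


def boolean_closure(universe, generators):
    """Return every collection obtainable from the generators by union,
    intersection and complement relative to the universe."""
    closed = set(generators)
    while True:
        current = list(closed)
        # Complements of everything found so far.
        candidates = {universe - s for s in current}
        # Pairwise unions and intersections.
        for a, b in combinations(current, 2):
            candidates.add(a | b)
            candidates.add(a & b)
        new = candidates - closed
        if not new:
            return closed
        closed |= new


collections = boolean_closure(hospital, [nausea, vomiting])
print(len(collections))
```

With these two generators the closure contains every union of the four disjoint regions (nausea only, vomiting only, both, neither), including the empty collection and the whole hospital population.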
As concerns types, however, things are different. If we know (or better: think we know,
modulo the current state of science) that certain types exist, then on our view (which
corresponds to the ways scientists themselves often use words like 'type' or 'kind' in
devising terminologies to describe their results) the rules of Boolean algebra give us no
sanction at all to infer that certain other types exist also. The question thus arises as to which
collections do correspond to types or universals in the sense that we can formulate for them
definitions of the following sort:
(C) Collection of Xs =def. collection of particulars of type X.
This question is, unfortunately, not answerable with any simple recipe. It is in this respect
comparable to the question: how do we establish whether a given scientific assertion is true?
It would of course be nice to have a decision procedure for determining which terms should
be recognized as designating types for any given discipline, ideally one which could be
programmed into a computer. In fact, however, the set of candidate terms designating types
is a matter that is decided, for each science, by the scientists themselves, in an on-going
process of terminology evolution through which those terms come to be selected that are
fit to serve in successive formulations of the corresponding scientific theory. The work of
the ontologist, as we see it, is in large part one of transforming the results of this process
(which are standardly informal, unreflected, subject to redundancies, ambiguities and to
constant revision) into the sorts of systematic representations that are needed to support
integration.
Each scientific theory as it exists at any given stage will likely be marked by (as yet
unidentified) terminologically relevant errors, and these errors will accordingly be carried
over into the corresponding ontology. Hence, we cannot embrace any representational
assumption according to which there is a one-one correspondence either between scientific
general terms, or between terms in reference ontologies, and types or universals in reality.
Rather, the realist methodology is one according to which the developers of a reference
ontology should assume for heuristic purposes that the terms in the ontology they are
developing refer to such types, knowing full well that this assumption may be false for any
given term. Ceusters (2009) shows how on this basis we can use the analysis of the ways the
set of terms selected for inclusion in a given ontology changes from version to version as a
strategy for evaluation of both the ontology as a whole and of the contributions made to this
ontology by specific individuals and groups.
Examples of general terms used by scientists that are unproblematically (assuming no errors
in the corresponding scientific theories) such as to represent types, include:
(D) Boson, electron, organism, planet, apoptosis, death, orbit.
Examples of general terms which are unproblematically such that they do not represent
types include:
(E) Thing that has been measured, thing that is either a fly or a music box, organism
belonging to the King of Spain, case of pneumonia in man wearing uniform while
riding bicycle on small boat with or without fall from stairs.
Note that the terms on either list can be used unproblematically to formulate representations
of collections (however the latter term is to be understood3). Only terms like those in (D),
however, can be used to define collections in accordance with (C) above.
Given entities might be similar along different dimensions. For example they might be
similar with respect to length, or feeding pattern, or distance from Witwatersrand.
Informally, we can say that, for the collections defined by terms like those in (D) there is a
relation of similarity that holds between the members of the collection in virtue of what they
are (for example cells). For the collections defined by terms like those in (E), in contrast, the
pertinent similarity relation holds between the relevant members because of how they are
(for example how they are related to locations or observers).
Note, however, that in many cases even terms of the latter sort can still be defined in terms
of types, as for example in:
(F) Thing that has been measured =def. thing that has served as target of some instance
of the type act of measurement.
This strategy for re-defining terms will turn out to play a central role in our understanding of
ontologies in what follows. It can also help to elucidate the relation between universals or
types on the one hand, and repeatables on the other. Roughly: wherever we have
descriptions of repeatables of the form 'the Xs', some way can be found to define the 'X'
term along the lines of (F) above.
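Definition (F) can be read as a rule for deriving a collection from the instances of a type. The following is a minimal sketch in Python, in which the record layout, the identifiers and the `target` field are all hypothetical choices made for illustration:

```python
ACT_OF_MEASUREMENT = "act of measurement"

# Hypothetical particulars, each recorded with the type it instantiates.
particulars = [
    {"id": "cell-1", "type": "cell"},
    {"id": "cell-2", "type": "cell"},
    # An instance of the type 'act of measurement' whose target is cell-1.
    {"id": "m-1", "type": ACT_OF_MEASUREMENT, "target": "cell-1"},
]


def things_measured(records):
    """Collect the identifiers of everything that has served as target of
    some instance of the type 'act of measurement', following (F)."""
    return {r["target"] for r in records if r["type"] == ACT_OF_MEASUREMENT}


measured = things_measured(particulars)
print(measured)
```

Here 'thing that has been measured' is not itself treated as a type. The collection is computed from a genuine type (act of measurement) together with a relation between its instances and their targets.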
1.5. The Higgs boson
When scientists attempt to detect the Higgs boson (Abazov et al., 2010) they are seeking,
first of all, to detect certain particulars: individual things that exist (albeit in some merely
probabilistic sense) in space and time. But they are not, of course, seeking to detect just any
particulars. Rather, they are seeking particulars that are similar to each other in the sense that
they are, again, instances of a corresponding type.
In the case of successful detection, scientists would accordingly need to report their results
by employing descriptions of two sorts. On the one hand, they would need to use individual
3The ontology of collections is itself a difficult subject, and we can provide only brief and informal indications here. In a full account
we would need to address the question whether phrases like 'all members in a collection' mean: all members existing at a given time
or all members existing at any time (Ceusters & Smith, 2010). We would also need to address the issues of vagueness which arise
where similarity relations are marked by gradients (Smith & Brogaard, 2000; Bittner & Smith, 2001). Such issues will not, however,
affect our argument here.
referring expressions to identify what they had observed in particular experiments. This
would yield sentences such as:
Higgs boson particles have been detected by the CERN Large Hadron Collider in an
experiment carried out on June 4, 2014.
On the other hand, they would need to use general nouns and noun phrases to refer to the
types whose instantiation had been predicted by the relevant scientific theories, for example,
in sentences of the form:
All six types of elementary boson predicted by the Standard Model (photon, W boson, Z
boson, gluon, Higgs boson and graviton) have now been experimentally confirmed.
Here again the word 'type' is being used to refer to an entity that is repeatable. The
underlying idea is that where there is repeatability there are entities called types. Because
these entities stand to each other in relations of greater and lesser generality, they can
sometimes usefully be represented in corresponding hierarchically organized ontologies, as
in Fig. 1.
1.6. Reference ontologies
We can now formulate the following
A second methodological principle can now be formulated, in this same normative spirit, as
follows:
Principle of consistency with established science: The assertions of which a reference
ontology consists at any given stage should be consistent with the best available settled
science that is current at that stage.
The two mentioned principles might in theory be consistent with an approach according to
which ontology developers working in support of different scientific disciplines would
develop representations of the types in the corresponding domains according to their own
specific ideas of how such a task might best be realized. Some might for example decide to
create a mere list of types organized alphabetically. Others might create a representation of
types organized hierarchically according to the mereological relations between their
instances. An uncoordinated approach along these lines would not, however, address the
goal of cross-disciplinary data integration. Where neighboring scientific disciplines are
formulating results concerning the entities in areas where their domains overlap, we need to
ensure that two ontologies agree in the ways these types are represented. Thus where one
discipline deals with subtypes of types falling within the purview of another discipline, then
the former will need to classify these subtypes by using terms taken over from the latter.
To address such issues, the representations of types created to support the integration of data
generated by a given family of scientific disciplines (for example, in biomedicine) need to
be developed in a highly constrained way in conformity with certain common principles that
are accepted in advance by the developers involved. The latter will need to agree not only in
use of terms, but also in definitions, and this will bring the need for common principles
concerning how terms are to be defined. Ontologists will need to agree also in the logics
used for reasoning with these definitions, on practices for use of identifiers, for versioning
and obsoleting, and for use of ontologies in annotations; and all of this will require a
further layer of principles relating to governance and to testing and selection.
1.7. Ontology path dependence
What the various principles should be that guide ontology development is of course the 64,000
dollar question of ontology coordination. As we shall argue below, to have any hope of
success in an area as broad as the entirety of the life sciences, the principles must be
understood as part of an evolving, empirically guided process beginning with initial
formulations that address as closely as possible readily identifiable needs and practices of
biologists, moving on from there in stages to progressively more rigorous formulations
allowing incrementally more ambitious approaches to the integration of data.
Our experience tells us that the needed set of principles will involve some which take the
form of substantive or technical guidelines for building ontologies (for example, distinguish
continuants from occurrents; employ a backbone is_a hierarchy using single inheritance).
Some principles, however, will be a matter of social coordination, the most important of
these being:
Ontology path dependence principle: The decisions made by the creators of an ontology
(including those decisions which pertain to the ontology's upper-level architecture)
should as far as possible be made on the basis of the degree to which they advance the
consistency of that ontology with the reference ontologies already existing in relevant
domains.
One of Merrill's central criticisms relates to our acceptance of what he calls the 'Referential
Assumption' (Merrill, 2010, p. 85), which (in simple terms) we can express as the
proposition that ontologies should consist of general terms as their representational units.
Merrill criticizes our work because (he thinks) we hold this belief for complicated
philosophical reasons, which he rightly sees as being irrelevant to the practical purposes of
science. His criticism is however undermined because he fails to take account of the degree
to which we take path dependence seriously (because not to do so would doom our project
to failure). Thus he does not comprehend that, for us, the thesis that ontology developers
should focus on general terms when constructing ontologies is to be recommended for the
simple reason that all successful ontologies in support of science created thus far consist
overwhelmingly of representational units of this sort.
Tacit acceptance of the ontology path dependence principle among our biologist colleagues
has brought it about that certain ontologies in the area of biology, in particular the Gene
Ontology (GO), have come to enjoy a privileged position. The GO is a controlled
vocabulary developed to serve the consistent formulation of information pertaining to the
attributes of gene products in organisms of different types (Gene Ontology Consortium,
2000). Since its creation in 1998 it has enjoyed a phenomenal success, and its role as de
facto standard ontology in important areas of biology makes it in some ways comparable to
the US interstate highway system. This in turn justifies the expenditure of extraordinary
effort to ensure that it continues to be developed in ways that maintain its consistency with
the best available science.
This privilege reflects in part a simple homesteader effect; since ontology is so new, there
are many fields thus far not ontologically tilled. The first in the field in any given area
acquires certain presumptive rights. One such right consists in the fact that ontologies
developed thereafter in neighboring domains have a responsibility to ensure that they are
constructed in ways that make them consistent, from the point of view of both
logico-ontological architecture and scientific content, with the already privileged ontologies which
came earlier. In addition, it implies that certain design choices made in the construction of
these established ontologies should, again presumptively, be adhered to also by the
successor ontologies which are created in their wake. At the same time, of course, the
homesteader privilege brings considerable responsibilities, and the presumptive rights
associated therewith can in principle be over-turned in case of demonstrably poor husbandry
(Smith & Ceusters, 2006; Smith, 2006b; Smith, 2010).
1.8. Asserted monohierarchies
Inspired in part by (Rector, 2003), we advocate the following:
Terms in the resultant asserted hierarchies can be used in various combinations, using
relations taken over from the Relation Ontology (RO) (Smith, 2005) to form new terms,
following a methodology first applied in relation to the GO and its sister ontologies in Wroe
et al. (2003) (compare also Hill et al., 2002; Mungall, 2004). The goal is both to reduce the
degree of arbitrariness typically involved in term composition in ontologies, and to ensure
that ontologies are developed in tandem in such a way as to constitute a progressively more
well-integrated modular network.4 A term such as blood glucose measurement, for example,
is formed from FMA:portion of blood, ChEBI:glucose and OBI:act of measurement. When a
classifier is applied to the result of adding such a term, with its definition, to the already
existing set of asserted monohierarchies, then certain further is_a relations will be able to be
inferred. This will in some cases yield a polyhierarchy, or in other words a hierarchy in
which some terms will have more than one is_a parent (hence 'multiple inheritance',
meaning that the entities represented by a term with multiple parents will inherit a
corresponding set of attributes from each of its parents).
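The step from asserted monohierarchies to an inferred polyhierarchy can be sketched in a few lines of Python. This is a toy structural classifier, not the reasoner used for the GO or any OBO Foundry ontology. The parent terms, the two extra defined terms and the relation names `measures` and `has_specimen` are hypothetical placeholders, and are not taken from the FMA, ChEBI, OBI or the RO.

```python
# Asserted monohierarchies: each term has at most one asserted is_a parent.
# The parent terms are simplified placeholders chosen for illustration.
ASSERTED_PARENT = {
    "FMA:portion of blood": "FMA:portion of body substance",
    "ChEBI:glucose": "ChEBI:monosaccharide",
    "OBI:act of measurement": "OBI:planned process",
}

# Composed terms: (genus, {(relation, filler), ...}).
# The relation names are hypothetical and are not drawn from the RO.
DEFINITIONS = {
    "glucose measurement": (
        "OBI:act of measurement",
        {("measures", "ChEBI:glucose")},
    ),
    "blood measurement": (
        "OBI:act of measurement",
        {("has_specimen", "FMA:portion of blood")},
    ),
    "blood glucose measurement": (
        "OBI:act of measurement",
        {("measures", "ChEBI:glucose"),
         ("has_specimen", "FMA:portion of blood")},
    ),
}


def ancestors(term):
    """Yield the term itself and then each of its asserted ancestors."""
    while term is not None:
        yield term
        term = ASSERTED_PARENT.get(term)


def is_a(sub, sup):
    """True if sup is sub itself or one of its asserted ancestors."""
    return sup in ancestors(sub)


def subsumed_by(a, b):
    """True if the definition of a entails the definition of b."""
    genus_a, diff_a = DEFINITIONS[a]
    genus_b, diff_b = DEFINITIONS[b]
    if not is_a(genus_a, genus_b):
        return False
    # Every restriction in b must be matched by an equally or more
    # specific restriction on the same relation in a.
    return all(
        any(rel_a == rel_b and is_a(fill_a, fill_b) for rel_a, fill_a in diff_a)
        for rel_b, fill_b in diff_b
    )


def inferred_parents(term):
    """Return the defined terms that subsume the given defined term."""
    return sorted(
        other for other in DEFINITIONS
        if other != term and subsumed_by(term, other)
    )


print(inferred_parents("blood glucose measurement"))
```

In this sketch the composed term acquires two inferred is_a parents although no term in the asserted hierarchies has more than one. The multiple inheritance is computed from the definitions and is not maintained by hand.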
Rector (2003) has developed a methodology for normalizing ontologies by decomposing
existing polyhierarchies into homogeneous disjoint monohierarchies. For him, the
monohierarchies are then recombined using logical definitions from which an enriched polyhierarchy can be inferred mechanically using a theorem prover or reasoner.
As Rector shows, ontology-based integration is easier to manage and scale on the basis of
normalized ontology modules. It is easier to master the problems associated with
combinatorial explosions when normalized ontology modules and a restricted set of relations
are used to serve as the basis for allowable sorts of combinations. It is also easier to maintain
ontologies, for example, when a change must be made due to some scientific advance. This
is because the change in question can be made in just one place in the normalized ontology,
allowing consequent changes in the associated polyhierarchies to be propagated
automatically.
As we ourselves have argued at length (for example, in Smith, 2004), ontologies which
allow multiple inheritance are prone to characteristic kinds of errors, not least because
different axes of classification become hard to keep separate in developers' minds. And as
Rector points out, "Empirically, whenever examining a multiaxial ontology and then
normalising it, we find errors."5 We have also seen in our own experience of working in
ontology teams with, for example, plant, or cell, or infectious disease biologists, that the
restriction to single inheritance, while often initially painful because it is seen as placing
restrictions on what can be said, very often yields a solution that is seen by the developers as
more illuminating of the underlying science, and thus as more stable, than the multiple
inheritance-based approaches that had previously been adopted.
Normalized ontology modules help in preventing errors also because their plug-and-play
character helps to encourage ontology reuse. Because those who are called upon to construct
new ontologies are more easily able to draw upon ontology content that has already been
thoroughly tested, they therefore do not need to construct ontology components anew and
thus they can avoid creating new errors and inconsistencies, and thus new avenues for silo
formation.
1.9. Basic Formal Ontology (BFO)
Restricting the asserted portions of the is_a hierarchies of reference ontologies to single
inheritance thus brings considerable benefits, and because there is an easy way of then
generating the associated multiple inheritance-based artifacts people might need, these
benefits come at very little cost. Some in the GO community are accordingly proposing
experimentally to figure out what the normalized versions of the three Gene Ontologies
would have to be in order to ensure that the existing versions of the ontologies could be
derived automatically therefrom by using reasoners. Even a partial success in this regard
would add much to GO's utility to reasoning systems.
In the case of the GO, it is clear what the relevant root nodes should be in such a normalized
reconstruction: they would be cellular component, molecular function, and biological
process, respectively, corresponding to the three existing divisions of the GO. In the general
case, however, it is not so clear how such root nodes for normalized ontology modules
should be selected and how they should be positioned in relation to the root nodes of
neighboring ontologies. If ontologies are to be developed in coordinated fashion, therefore,
then substantive principles need to be available also to support the making of decisions such
as this, and to this end we need a strategy concerning which most general types or universals
should be taken as the starting point for the process of populating an ontology downward
from the root. To this end we have proposed the set of categories together forming the Basic
Formal Ontology (BFO) (Grenon and Smith, 2004), specifically the three top-level
categories of independent continuant, dependent continuant and occurrent.
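The two constraints at work here (single asserted inheritance, and population downward from an agreed set of root categories) lend themselves to a simple mechanical check. The following Python sketch is illustrative only. The placement of the three GO roots under particular BFO categories and the two leaf terms are assumptions made for the example, and `check_monohierarchy` is a hypothetical helper, not part of any BFO or OBO tooling.

```python
BFO_ROOTS = {"independent continuant", "dependent continuant", "occurrent"}

# Asserted (child, parent) is_a edges. The placement of the GO roots under
# BFO categories is assumed here purely for the sake of the example.
ASSERTED_IS_A = [
    ("cellular component", "independent continuant"),
    ("molecular function", "dependent continuant"),
    ("biological process", "occurrent"),
    ("nucleus", "cellular component"),
    ("apoptosis", "biological process"),
]


def check_monohierarchy(edges, roots):
    """Return a list of problems: terms with more than one asserted parent,
    and terms whose chain of parents does not end in a root category."""
    parents = {}
    problems = []
    for child, parent in edges:
        if child in parents and parents[child] != parent:
            problems.append(f"{child}: more than one asserted parent")
        parents.setdefault(child, parent)
    for term in parents:
        seen = set()
        node = term
        # Walk upward; the 'seen' set guards against cycles.
        while node in parents and node not in seen:
            seen.add(node)
            node = parents[node]
        if node not in roots:
            problems.append(f"{term}: does not reach a root category")
    return problems


print(check_monohierarchy(ASSERTED_IS_A, BFO_ROOTS))
```

A check of this kind says nothing about whether the chosen roots are the right ones. It only verifies that, once a set of upper-level categories has been agreed, the asserted hierarchy respects it.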
Some set of upper-level categories is needed if ontology coordination in the service of data
integration is to be possible at all, and in Section 6 we shall argue the merits in this regard of
BFO resulting from the fact that it was created for precisely this purpose. Already some 75
ontology projects in different domains of the life sciences are being developed in its terms.
The authors of the Foundational Model of Anatomy have for some years been working to
ensure conformance with BFO (Smith & Rosse, 2004; Rosse & Mejino, 2007). BFO has
also been subjected to thorough tests of its serviceability as an upper level ontology for
scientific purposes by the members of the OBI (Ontology for Biomedical Investigations)
Consortium, and by users and critics such as Thomas Bittner, Maureen Donnelly and
Randall Dipert (Buffalo), Mathias Brochhausen (IFOMIS), Lawrence Hunter and Mike
Bada (Denver), Chris Mungall (Berkeley), Fabian Neuhaus (NIST), Bjoern Peters (San
Diego), Alan Ruttenberg (Buffalo), Holger Stenzhorn (IFOMIS, Saarland University) and
Kerry Trentelman (Buffalo), as well as by some of the 120 members of the BFO Discussion
Group.6 These tests have led to a number of changes in the ontology over time. They have
also, as we are the first to admit, identified a number of shortcomings in BFO, some of
which (we hope) will be addressed in the forthcoming release of BFO 2.0.
One important feature of BFO is that its tripartite top-level structure echoes the tripartite
design of the Gene Ontology. Collaboration between the BFO and GO Communities was
inaugurated at a meeting organized in Leipzig in 2004 on the topic of "The Formal
Architecture of the Gene Ontology",7 where Smith, in a presentation entitled "STOP!",8
presented arguments in favor of the need for certain changes in the GO. These arguments
received a favorable response from the GO Consortium because they were seen as bringing
immediate practical benefits, including:
a. providing a clearer understanding of the relation between terms in the GO and the
entities studied in biological experiments (Hill et al., 2008),
b. providing a readily applicable technique for formulating definitions of the is_a and
part_of relations and thereby removing certain inconsistencies in GO's earlier
treatments (Smith, 2004),
c. identifying errors in terms and definitions of GO, leading for example to the
obsoletion of terms such as GO:0005941 unlocalized protein complex, which
reflected a confusion of ontology with epistemology.
One result of our work with Ashburner, Lewis, Lomax, Mungall and other GO principals,
and also with the leaders of the FMA and GALEN groups, was the creation of the Relation
Ontology (RO) (Smith, 2005), which is designed to restrict the repertoire of relations
available for use by biomedical ontology developers to a small set, all the members of which
are logically defined in such a way as to promote interoperability of the ontologies which
use them.
Following shortly after the publication of the RO paper came the establishment, in 2006, of
the OBO Foundry (Smith et al., 2007), which adds a layer of governance and of peer review
to the process of multi-ontology development, and which uses the GO/BFO tripartite
division of categories as basis for partitioning the totality of biomedical entities into
non-overlapping ontology domains (see Fig. 2).
1.10. How the ontological realist methodology works to support ontology authoring
Our methodology for ontology development requires that discipline-specific reference
ontologies be created manually by experts in the corresponding disciplines, persons who
already know what it is in reality to which the terms in their discipline refer. The first round
in the iterative process of building a discipline-specific ontology will require the creation by
such persons of a draft list of the general terms that can be used within the discipline in
positive assertions to refer, on initial inspection, to types or universals.
For any given settled science the set of candidate terms in this respect is broadly understood
and accepted by the scientists involved. The problem is that this set is typically too large for
the purposes of coordinated ontology development. Some terms will thus need to be
removed, for example because of redundancy or ambiguity, or because they refer not to a
corresponding universal or type, but rather to what we might refer to as an attributive
collection of particulars, as for example "human who has been tested for HIV" or "human
with bra cup size C".9 Further terms, such as "known allergy", "other diabetes", "pneumonia
diagnosed by inspection of sputum sample", will need to be excluded because they involve a
more or less hidden reference not to the way things (repeatably) are on the side of reality but
rather to some particular feature of our present state of knowledge (Bodenreider et al.,
2004).
To ensure conformity to the principle of asserted single inheritance, it will sometimes be
necessary to transplant some terms from the initial list into separate lists, for example, by
following the recommendations generated by application of the OntoClean methodology
(Guarino & Welty, 2002). The transposed terms will then be defined using terms which
remain, together with terms from other reference ontologies according to need. In this way,
for example, a term such as "mechanosensory organ" might be removed from a structurally
based anatomy ontology and defined in terms of the anatomical term "organ" and a term
such as "mechanosensory function" created in an external function ontology. The
9In Smith and Ceusters (2006) we called such collections defined classes. We no longer favor this terminology since the fact that a
given term is or is not defined in a given ontology need carry no significance as to the status or nature of the entity represented.
When the asserted monohierarchies have been identified, the terms in each hierarchy can be
defined according to the
Principle of Aristotelian definitions (Rosse & Mejino, 2003): Given a term A in an
asserted monohierarchy, with parent term B, the definition of A should take the form
an A =def. a B which Cs
where C expresses some condition on those instances of B which fall within the As.
One consequence of this principle is that there are no disjunctive or conjunctive or negative
universals, an issue to which we return in our treatment of the term "non-smoker" below.
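A genus-and-differentia definition of this kind can be sketched as a small data structure. The following Python fragment is only an illustration under stated assumptions: the class name, field names and rendered wording are hypothetical, and the example term is borrowed from the discussion of "mechanosensory organ" above purely for concreteness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AristotelianDefinition:
    """A definition of term A by parent term B plus a differentiating condition C."""

    term: str          # the term A being defined
    parent: str        # the parent term B (the genus)
    differentia: str   # the condition C, phrased as a verb phrase

    def __post_init__(self) -> None:
        # Reject empty parts and the trivially circular case A == B.
        if not self.term or not self.parent or not self.differentia:
            raise ValueError("term, parent and differentia must all be non-empty")
        if self.term == self.parent:
            raise ValueError("a term cannot be defined with itself as parent")

    def render(self) -> str:
        """Render the definition in an assumed 'A =def. B which C' layout."""
        return f"{self.term} =def. {self.parent} which {self.differentia}"


# Illustrative wording only; not a definition taken from any ontology.
example = AristotelianDefinition(
    term="mechanosensory organ",
    parent="organ",
    differentia="bears a mechanosensory function",
)
```

Because each definition names exactly one parent, a collection of such records cannot express a disjunctive or negative genus, which mirrors the consequence noted above.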
1.11. The principle of instantiation
The inclusion of a representation of a universal in the GO requires that at least one
real-world instance of this universal has been shown experimentally to have existed. Consider,
for example, the universal retinol dehydrogenase activity, defined as the potential to realize
the reaction: retinol + NAD+ = retinal + NADH + H+. Before this term could be included in
the GO's molecular function ontology, it was necessary that experimental evidence be
provided (Zhang et al., 2001) to the effect that there exist molecules that have instances of
this universal as their functions.
GO's practice here is taken as a model for a further principle by means of which ontology
authors can judge whether given terms should be included in a given ontology, namely the
Principle of instantiation: A term should be included in a reference ontology only if
there is experimental evidence that instances to which that term refers exist in reality.
("Exists" here should be understood in a tenseless sense in order to accommodate, for
example, universals pertaining to extinct species as well as universals such as swarm or
hurricane which are instantiated only intermittently.)
Insisting upon the principle of instantiation, and thus on experimental evidence, provides us
also with a means by which we can judge whether two ontologies are orthogonal in the
sense that they do not overlap in their respective domains. Is an ontology containing the
term "phosphogluconate pathway" orthogonal to an ontology containing the term "pentose
phosphate cycle"? To find out, we need to identify what types (if any) these terms refer to in
reality, and for this we will need to work with biologists who are carrying out salient
experiments and who can thus explain to us what processes of intervention and observation
are involved in gaining information about the corresponding instances.
References to different sorts of universals are in this way used to form chains of validation,
whereby tests for the instantiation of universals further down the chain (for instance,
molecule of retinal) provide evidence for the existence of universals further up (which
means: universals, or putative universals, closer to the frontiers of current knowledge; for
instance, retinol dehydrogenase activity). Often very simple universals are involved in such
validations, as for example when instances of the color universals purple, pink and red are
10Sometimes there will be several ways of achieving the end of single inheritance. (Compare the situation in topology where any one
of a number of basic terms such as boundary, closure, interior, open, closed can be selected as primitive in such a way that
each of the other terms on the list can be defined therefrom.)
Our modular strategy rests hereby on a division of labor between ontologists in different
disciplinary communities working in tandem on the basis of BFO as common formal
ontology, the latter being itself subject to revision in light of its ability to serve the
representations of the corresponding portions of science. The need for an approach involving
a common upper level ontology is, we believe, a simple practical consequence of
collaborative ontology development in the service of empirical science. The presence of a
common upper ontology means, for example, that those working on cells or proteins are
easily able to draw on each other's resources in building their respective ontologies, revising
these in tandem in reflection of changes brought by advances in empirical science (Masci et
al., 2009). All of those involved are thereby engaged in creating not merely the ontologies
themselves but also, as an inevitable side-effect, an evolving set of mutually binding
constraints on each other's work that serves to ensure that these ontologies are developed in
such a way that their interoperability is preserved over time. These constraints (principles,
criteria) must be widely acceptable to different groups of scientists providing data for
integration. At the same time, they must be able to bring about a process of evidence-driven
improvement in the ontologies constructed in their terms. The result is a system of reference
ontologies whereby:
1. for any given domain of reality, exactly one reference ontology is constructed that
is (a) in conformity with the settled science in that domain and (b) capable of being
recommended for general use,
2.
3. they will reduce the need for (typically fragile and costly) mappings between
ontologies covering the same or overlapping domains, and
4. they will be able to be used as a reliable starting point for the development of
application ontologies needed for specific purposes.
Each reference ontology, if our strategy is successful, will, like the GO, serve as an attractor
for multiple expanding groups of users whose members will have strong incentives not only
to invest resources directed toward ensuring that it is developed and used in ways that keep
pace with scientific advance, but also to recommend it to other users, since this will
increase the value of their own investment. In this way, we believe, we have a strategy
which can avoid recreating through ontology proliferation the very silo effects to which
ontologies themselves were originally conceived as the antidote (Smith, 2008). We know of
no other approach to ontology development of which an analogous claim can be made.
1.13. How the ontological realist methodology works to support ontology maintenance
Scientists in many areas of biology, including clinical research, have come increasingly to
rely on a process whereby professional biocurators manually create annotations to
experimental data using terms from the GO. This annotation process unfolds in a series of
steps, which can be summarized as follows:
1.
2. the curator applies expert knowledge to the documentation of the results of these
experiments, a process which involves determining which types of gene products
are being studied in the experiment, and which types of molecular functions,
biological processes and cellular components are identified as being correlated
therewith;
3.
4. where representations of specific types needed for annotations are missing from or
misclassified in the GO, the curator submits a corresponding request for inclusion
or correction to the ontology's editors using a dedicated tracker.
Through the implementation of step (4), a virtuous cycle is brought into play in conformity
with what we shall call the:
User feedback principle: A reference ontology should evolve on the basis of feedback
derived from those who are using the ontology for purposes of annotation.
This means that the process of curation of experimental results by biologists contributes to
the on-going improvement of the ontology. This in turn contributes to improvements in the
annotations created in subsequent cycles.
The methodology is described in detail in Hill et al. (2008), which makes clear the essential
interplay between the two kinds of descriptions, referred to already above, of (i) the
individual entities observed in the lab and captured in reports of experiments, and (ii) the
types these entities instantiate, which are represented through the use of general terms in the
assertions of the corresponding scientific theories.
The idea underlying our methodology for the development of such reference ontologies can
now be summarized as follows. Scientists formulate assertions describing their experimental
results and publish them in scientific papers and textbooks. These assertions contain
expressions of various sorts, some of which are candidate referring expressions. Some of the
latter will be general terms specific to the discipline in question, expressions used by
scientists to formulate assertions with positive intentional force, such as "Bosons are
particles which obey Bose-Einstein statistics" or "The N-terminus of retinol dehydrogenase
type 1 signals cytosolic orientation in the microsomal membrane". When initiating the
development of a reference ontology for a given scientific domain, we adopt, for each term
used by the given science, a defeasible assumption to the effect that it refers to some
corresponding type or universal. This assumption can be overturned in a number of ways.
Most interestingly, it can be overturned by scientific discovery, as for example, in the case
of phlogiston. By default, however, the assumption holds simply because as soon as it
becomes known to the scientists involved that a given general term does not refer to any
corresponding type or universal, then this term will be dropped from the repertoire of those
terms that can be used in the normal assertive contexts of the relevant science.
1.14. How can we know that a given general term denotes a universal?
The principles are codified in our Referent Tracking (RT) framework (Ceusters & Smith,
2006b), only one element of which is discussed by Merrill, namely what we call the
PtoU-tuple template (for particular-to-universal) (Merrill, 2010, p. 96). Examining his
remarks in this connection will make clear why our proposals cause him such consternation
and why, on the basis of a proper understanding, this consternation could have been avoided.
The PtoU-tuple template pertains to the RT-recommended syntactic regimentation of a
statement, authored by a particular a, to the effect that some universal u, referred to in some
ontology o, is instantiated by some particular p:
Here IUI stands for instance unique identifier. When this template is used to create an
actual tuple that is intended to describe some portion of reality in an RT-conformant fashion,
then u is replaced by the designation, taken from some pre-existing ontology, of some
universal with which the particular denoted by IUIp enjoys the instantiation relationship
(inst). As we explain at length in Ceusters and Smith (2006b), the PtoU-template is
introduced precisely to express the instantiation of some universal by some particular. If
John Doe, a follower of ontological realism, formulates a statement by means of this
template, for example along the lines of:
(G) John Doe; 06/11/2010:6.45PM; inst; BFO; Barry Smith; independent continuant;
since 1951
then this implies that John Doe believes the following:
1.
2.
3.
4. that the instantiation relationship between Barry Smith and the universal called
independent continuant has obtained since 1951.
John Doe might be wrong in one or more of these beliefs, and in that case his statement (G)
is false.
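The shape of statement (G) can be made concrete with a short Python sketch. It assumes, on the basis of example (G) alone, that a tuple has seven semicolon-separated fields in the order shown there; the class name, field names and parsing helper are hypothetical and are not part of the Referent Tracking specification. Like (G), the example uses readable names where an actual tuple would carry instance unique identifiers.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class PtoUTuple:
    """One particular-to-universal statement, with field order assumed from (G)."""

    author: str        # the particular a who authors the statement
    asserted_at: str   # when the statement was made
    relation: str      # the relationship asserted, here "inst"
    ontology: str      # the ontology o from which the universal is taken
    particular: str    # the particular p said to instantiate the universal
    universal: str     # the designation of the universal u
    holds_during: str  # the period over which the relationship obtains

    @classmethod
    def from_string(cls, text: str) -> "PtoUTuple":
        """Parse a semicolon-separated statement such as (G)."""
        parts = [part.strip() for part in text.split(";")]
        if len(parts) != 7:
            raise ValueError(f"expected 7 fields, found {len(parts)}")
        return cls(*parts)

    def to_string(self) -> str:
        """Serialize back to the semicolon-separated layout."""
        return "; ".join(astuple(self))


statement_g = PtoUTuple.from_string(
    "John Doe; 06/11/2010:6.45PM; inst; BFO; Barry Smith; "
    "independent continuant; since 1951"
)
```

Keeping the author and the time of assertion inside the tuple is what allows the statement to be judged false later: the record says what John Doe asserted, not what is the case.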
Merrill now expostulates as follows: "u in such an entry is said to be the name of a universal.
Now why should we suppose that it is?" (Merrill, 2010, p. 96). This is, from our perspective,
a bit like hearing someone responding to the assertion "the right to a speedy and public trial
is one of the rights enumerated in the Constitution of the United States" by saying: "Now
why should we suppose that it is?"
Ontologies from our hand contain representational units that are assumed to denote
universals or types in reality. (Recall the reference ontology principle in Section 1.6.) That is
how, in the context of the referent tracking literature Merrill is here criticizing, an ontology
is defined. Data repositories that follow the referent tracking paradigm similarly contain
exclusively individual identifiers that are intended to refer to particulars. The underlying
idea can be codified in the form of a principle now observed successfully for some years by
ontologies such as the Gene Ontology:
Principle of obsoletion: Should we ever find that a term in an ontology or data
repository fails in designation, then the relevant entry will immediately be obsoleted.
This applies to expressions referring both to what is general and to what is particular.
(Ceusters, 2007; Ceusters & Smith, 2006a.)
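A minimal Python sketch of how such a principle might be implemented follows. It assumes that obsoleting an entry means flagging it rather than deleting it, so that existing references to its identifier still resolve; the class and method names are hypothetical, and the identifier prefixed EX is invented for the example. The GO identifier and its label are the ones cited in Section 1.9.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TermEntry:
    identifier: str
    label: str
    obsolete: bool = False
    obsoletion_reason: str = ""


class TermRegistry:
    """Holds entries; obsoleted entries are flagged, never removed."""

    def __init__(self) -> None:
        self._entries: Dict[str, TermEntry] = {}

    def add(self, identifier: str, label: str) -> None:
        self._entries[identifier] = TermEntry(identifier, label)

    def obsolete(self, identifier: str, reason: str) -> None:
        entry = self._entries[identifier]  # raises KeyError for unknown ids
        entry.obsolete = True
        entry.obsoletion_reason = reason

    def lookup(self, identifier: str) -> TermEntry:
        return self._entries[identifier]

    def active_labels(self) -> List[str]:
        # Only entries still believed to designate are offered for use.
        return [e.label for e in self._entries.values() if not e.obsolete]


registry = TermRegistry()
registry.add("GO:0005941", "unlocalized protein complex")
registry.add("EX:0000001", "retinol dehydrogenase activity")  # made-up identifier
registry.obsolete("GO:0005941", "confuses ontology with epistemology")
```

The obsoleted entry remains retrievable by identifier together with the reason for its obsoletion, while it no longer appears among the terms available for new assertions.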
We have used the word "type" in the above side by side with the word "universal". The use
of "type" reflects an effort on our part to be responsive to the needs of specific communities
of readers. But it has at the same time caused confusion because "type" is used in multiple
different ways in the multiple disciplines relevant to ontology. The term "universal", in
contrast, has an established narrowly defined use that serves our ontological realist purposes
very well, and it is for this reason that we employ this term as part of our technical
vocabulary.
One downside arising from the choice of a term of such ancient provenance is that its usage
sets certain sorts of philosophically trained individuals into something approaching panic.
(This is true, with especial potency, in the case of Merrill (2010, p. 93) whose shock at the
fact that, still today, someone might use this word in a serious way is tempered only by the
fact that he himself employs with a similar purpose the term category,15 a term likewise
deriving from Aristotle.)
The countervailing benefit we derive from using universal, however, is that the term
conforms to the
Our own ideas on universals derive from our study of the work of Edmund Husserl, whose
Logical Investigations contains the first use of the term formal ontology (Husserl, 1913/21,
II, p. 219, 1970, pp. 428f). Husserl describes certain universal laws governing how parts are
related within structured wholes, laws, for example, of the form: if an instance of the
universal A exists within a given whole, then so also will an instance of a second universal B
(Smith, 1987; compare Smith et al., 2005). Simple examples are found in perceptual
psychology: every sensation of color involves some sensation of visual extent. But it was in
the field of linguistics that Husserl's ideas were particularly influential, where they led to the
creation of what is now called categorial grammar (Buszkowski et al., 1988), and where
they influenced also the work of structural linguists such as Jakobson (Holenstein, 1976), of
the early speech act theorists (Smith, 1990), as well as Chomsky's idea of a universal
grammar (Kuroda, 1997).
Parallel developments in linguistics led also to the work on universals of human language
on the part of Joseph H. Greenberg and his followers (Greenberg, 1963). In Greenberg's
terms, all languages have nouns and verbs and all spoken languages have consonants and
vowels. What, he asks, are the other universals common in this way to all human languages?
The attempt to answer this and a series of analogous questions initiated what is still one of
15http://www.ncsu.edu/chass/philo/LACSI.Abstract.pdf; http://biometrics.com/wp-content/uploads/2009/06/safetyworks.pdf.
the most powerful research programs in the cognitive sciences. The project has been
influential also in disciplines such as anthropology, for example, in Brown (1991), which
identifies some hundreds of cognitive and behavioral universals common to all human
societies (compare also Pinker, 2002). And because the evolution of languages is influenced
by the same population splits that influence human genetic changes, work on language
universals has provided valuable materials also in assisting population geneticists trying to
reconstruct the path of early human migrations by means of genetic patterning in different
peoples (Cavalli-Sforza, 1997).
Interestingly, Greenberg's work on universals and on the typology of language grew out of
his deep study of Aristotle, and he followed Aristotle's empirical methodology for
identifying universals through inspection of many examples.16 The reader may thus be
wondering if the world has reason to be grateful for the fact that the successes of Greenberg
and his followers in throwing light on human cognition and behavior were not thwarted by
complaints, from some Merrill counterpart of an earlier era, to the effect that they were
associating themselves with a metaphysical tradition with a long and sordid history
(Merrill, 2009, note 8).
2.3. Universals, scientific realism and received first-order logic
In an independent development in the late 1970s the term universal began to be used by
philosophers as part of a general rediscovery of the importance of traditional metaphysical
thinking, and especially of one or other version of metaphysical realism, for an
understanding of scientific laws. This rediscovery occurred after a period of dominance of
nominalism especially among philosophers active in the United States who were taking
advantage of the possibilities created by the new tool of first-order predicate logic (FOL) for
the formulation of philosophical arguments.
Simply put, the formulae of FOL consist of four kinds of expressions: logical constants,
such as "and" and "not"; quantifiers such as "all" and "some"; constant and variable terms
such as "a", "b", "x", "y"; and predicates such as "F" and "R". Formulae such as "F(a)" or "R(a,
b)" are then used to regiment natural language assertions such as, respectively, "Socrates is a
man" and "Socrates is married to Xanthippe", where "a" stands in for "Socrates", "b" for
"Xanthippe", "F" for "is a man" and "R" for "is married to".
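The regimentation just described can be illustrated with a small Python sketch that evaluates F(a) and R(a, b) against a toy interpretation. The sketch assumes the textbook set-theoretic semantics, in which a predicate is evaluated against a set of individuals or pairs; this is a modeling convenience only and takes no position on what, if anything, predicates refer to. All names in the code are hypothetical.

```python
from typing import Callable, Dict, Set, Tuple

# A toy interpretation: a domain of individuals and an assignment to constants.
domain: Set[str] = {"Socrates", "Xanthippe"}
constants: Dict[str, str] = {"a": "Socrates", "b": "Xanthippe"}

# Extensions used to evaluate the predicates F ("is a man")
# and R ("is married to") in this interpretation.
F: Set[str] = {"Socrates"}
R: Set[Tuple[str, str]] = {("Socrates", "Xanthippe")}


def holds_unary(predicate: Set[str], term: str) -> bool:
    """Evaluate a formula of the form F(a)."""
    return constants[term] in predicate


def holds_binary(relation: Set[Tuple[str, str]], t1: str, t2: str) -> bool:
    """Evaluate a formula of the form R(a, b)."""
    return (constants[t1], constants[t2]) in relation


def exists(condition: Callable[[str], bool]) -> bool:
    """Evaluate 'some x' by ranging over the individuals in the domain."""
    return any(condition(x) for x in domain)


def forall(condition: Callable[[str], bool]) -> bool:
    """Evaluate 'all x' by ranging over the individuals in the domain."""
    return all(condition(x) for x in domain)
```

Note that the quantifier helpers range only over the individuals in the domain: nothing general or repeatable is among the values a variable can take.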
framework which, like many 20th century analytic philosophers, Merrill views as the
benchmark of acceptable formalization (2010, note 17). Because terms in received FOL
range exclusively over individual objects such as molecules or cells or people, such terms
cannot be used to refer to universals, or to anything general or repeatable. And the predicates
in FOL cannot be used to refer to such entities either, because they cannot be used to refer
to anything at all.
The metaphysical turn of the 1970s consolidated itself in a new subdiscipline called
analytical metaphysics, which has since become an established part of the philosophical
mainstream. The doctrine of nominalism is indeed still alive in some circles of analytical
metaphysics today. In a survey of (primarily Anglosaxophone, analytical) philosophy
faculty carried out in November 2009, however (Bourget & Chalmers, 2009), only 15.1% of
the 931 faculty surveyed described themselves as accepting nominalism.17
Accordingly, when Merrill asserts that universals, and Aristotelian realism have come
under a series of sustained attacks for at least centuries, if not millennia (Merrill, 2010),
then the reader should be aware that these nominalist attacks have been launched so often
precisely because of the remarkable tenacity of the metaphysical realist position.
2.4. Summary of Merrill's argument
I know that you believe that you understood what you think I said, but I am not sure you
realize that what you heard is not what I meant.
Robert McCloskey, State Department spokesman.18
Merrill (2010) points to some of the reasons why our methodological views have been found
attractive by researchers in different life science domains, namely the practical
advantages they bring to the developers of ontologies. But at the same time, he assails this
methodology on a number of grounds, some of which rest on misinterpretations of our
views, some of which are, we confess, a consequence of the fact that our position is not easy
to grasp from the multiple expositions that we have created over the years for different
audiences of users. Ontology, when practiced seriously, is of its nature a multidisciplinary
affair, and we believe that our approach has gained traction in no small part because we have
taken its different disciplinary dimensions seriously. (Thus we have not viewed ontology as
an activity performed by and in the service of, for example, lexicographers, who tend to see
ontologies as focused primarily on meanings; we have also not viewed ontologies as the
idealized algebraic structures that are of special interest to some computer scientists.) The
interactions with these multiple disciplinary groups of users have led also, over time, to
important changes in our approach, including changes in our terminology. We are thus
grateful to Merrill for having provided us with the opportunity to address some of the
misunderstandings resulting herefrom.
Merrill's own major misunderstandings of our view can be summarized as follows:
1.
2. that we hold that studying and embracing metaphysical realism is a requirement for
doing science (Merrill, 2010, p. 103);
3. that we accept what Merrill calls the Referential Assumption according to which
the so-called general terms of our language (such as man) participate in a direct
reference relation in precisely the same manner as do the singular terms of our
language (such as Socrates) (Merrill, 2010, p. 85).
Under (1), Merrill fails to do justice to the narrowly practical significance of and
justification for our proposals. His misinterpretation of our views from this perspective can
be summarized as follows: that he interprets a methodology recommended for use by
ontologists working in scientific domains as a theory about the nature of science as a whole.
Sometimes such misinterpretation involves creative misquotation on Merrill's part, as when,
for instance, our statement in Smith (2004) to the effect that:
good modeling in support of the natural sciences can be advanced by the cultivation
of a discipline that is devoted precisely to the representation of entities as they exist in
reality
is transformed in footnote 15 of Merrill (2010) into:
good modeling must be based on a metaphysical realism that embraces universals
(emphases added).19
Under (2), Merrill asserts at various points, on the basis of nothing in our writings, that we
claim that studying and embracing our alleged philosophical theory of science and of
scientific language is necessary to the proper conduct of science. Some of the users of the
realist methodology do indeed concern themselves with such philosophical matters. Some,
indeed, are former students of philosophy who employ the realist methodology in their work,
even though they embrace nominalist positions, because they see it as bringing practical
benefits. Most, however, do not concern themselves with philosophy at all. And quite rightly
so. For we are, like Merrill himself, entirely convinced that no theory of science of the sort
produced by philosophers could be necessary to realizing the tasks of science itself.
Under (3), we shall recognize below that, while some of the general terms used in scientific
language are to be recognized for ontological purposes as designating types or universals, it
is, even in the realm of science (because of the existence of scientific error), not possible to
embrace any one-one correspondence between such general terms and corresponding
universals or types, and this for rather obvious reasons. We are thus taken aback by Merrill's
assumption that we hold a referential view even in relation to the terms of natural language.
In his discussion of the two example sentences "John loves Mary" and "John loves pizza" in
(2010), Merrill asserts that, because of the Referentialist Assumption, these sentences are
seen by Smith and Ceusters
as being syntactically identical, and so we are urged to conclude that the general term
"pizza" must denote some thing (as the individual term "Mary" does) but not a
particular thing, a universal. (Merrill, 2010, p. 91.)
For it would of course be the height of naivety to apply anything like the Referentialist
Assumption to sentences of this sort. Indeed we have argued ad nauseam against the
drawing of ontological conclusions from the mere surface syntactic features of language.
Smith (2005) describes how we see many of the most influential figures of 20th-century
analytic philosophy, from Wittgenstein and Carnap to Lewis and Armstrong, as having been
19Similar creative misquotation is to be found, for example, in (2010, footnote 17), where Merrill asserts that our discussion of certain
inadequacies of description logic (Ceusters et al., 2003) attributes any problems (with such logics) to a failure to take seriously the
existence and role of universals. In fact, however, universals play a role in the mentioned paper only in our discussion of errors of one
specific type, namely those which arise through the confusion (familiar under the label is_a overloading) of the relations of
instantiation and subsumption.
affected by the erroneous (indeed absurd) assumption that it is possible to infer the
ontological structure of reality from the logico-syntactic structure of one specific language.
Further problems turn on the fact that Merrill evinces little first-hand acquaintance either
with those practical purposes of ontology development which, on our view, ontologies are
primarily created to support, or with the ways the realist methodology is actually being used
to solve problems of ontology coordination. The word "integration" appears nowhere in his
essay, and neither does any reference to the signature paper "The OBO Foundry:
coordinated evolution of ontologies to support biomedical data integration" (Smith et al.,
2007) in which the application of the realist methodology is described. This apparent
ignorance of our actual intentions and of the reasons for the successes of the realist
methodology gives his critique the flavor of one who would harangue soldiers marching
into battle on grounds of bad taste in the design of their uniforms.
We have alluded already to our agreement with Merrill in the view that no theory of science
(or, a fortiori of metaphysics) of the sort produced by philosophers could be necessary to
realizing the tasks of science itself. We thus share with him the view that it would be
inappropriate for philosophers of science, or metaphysicians of whatever stripe, to attempt
to interfere with how scientists do their job.
For this very reason, however, we have become acutely conscious in our work with various
communities of scientists of the degree to which scientific conduct is being interfered with
philosophically on another plane as a result of the increasing importance to science of
computational artifacts. This interference comes not from the side of philosophy itself,
however, but rather from information and computer science (Smith, 2004). Much of the
polemical work we have published in recent years has been addressed to the task of
counteracting this interference as it emanates especially from disciplines such as knowledge
engineering and conceptual modeling, disciplines which exert a strong impact especially on
the ontology field and thus, indirectly, on science.
If, however, terminology standards are constructed in such a way that real objects such as
rivers are placed on the same level as imagined objects such as unicorns, then it is unlikely
that the terminologies that result will be able to support the current needs of, for example,
biological science.
The most pervasive influence of International Standard Bad Philosophy is via the doctrine
according to which terms in ontologies should be seen as referring, in some sense, to
"concepts", as captured, for example, in the ISO definition of a terminology as a "set of terms
representing the system of concepts of a particular subject field" (ISO, 1990).
It is because this doctrine has given rise, and continues to give rise, to multiple false steps in
the discipline of ontology (false steps that we see being repeated over and over again in
every new area in which ontology technology is applied) that we have devoted so much
effort to developing and disseminating an alternative, more scientifically coherent, view as
to how the terms in ontologies should properly be understood, in the hope that such false
steps can be avoided in the future.
At one point in "Ontological Realism" Merrill remarks of the realist approach that, while it
may be looked upon favorably by medical informaticists who lack familiarity with
alternative approaches and who for a time at least may be enticed into going along for
the ride, empirical scientists (will) find it much more difficult. Interestingly, however, it is
precisely among medical informaticians and computer scientists that we find the most
visceral resistance to the realist approach. Empirical scientists, in contrast, have been
supportive of our efforts from the beginning (and the reader is invited to note the large
number of bench biologists in the list provided at the end of Section 1.10).
Why should this be so? Why, more precisely, should so many informaticians and computer
scientists (and terminologists) remain so faithful to the concept orientation and, more
generally, to one or other subjectivist or relativist view, conceiving ontologies as
representations not of some independent reality but rather of mere views or perspectives or
descriptions or collective hunches (Smith, 2004)? Why, on the other hand, should so many
bench biologists be so open to the realist alternative?
Part of the answer lies, we believe, in the fact that computer scientists, unlike most
biologists, receive training in cognitive psychology, which encourages them to have
strong feelings about what they see as the constructed nature of much of human belief.
Another part has to do with the existence of incentives within the world of information
technology which support the creation of new intellectual resources rather than the
refinement and reuse of those which already exist. For empirical biologists, on the other
hand, incentives often point in the opposite direction, which means toward finding ways to
ensure that past, present and future data can be effectively shared.
3.2. The National Cancer Institute Thesaurus
One of the first applications of the methodology of ontological realism was to the critical
analysis of the National Cancer Institute Thesaurus (NCIT) (Ceusters et al., 2005), a
component of the UMLS Metathesaurus collection of biomedical source vocabularies, in
which we identified a series of embarrassing errors of definition, classification and logic.
The NCI commissioned an overhaul of its Thesaurus in response to our criticisms, though
the organization contracted to make the needed changes thereby succeeded, in some respects
at least, in making matters worse.
Many of the errors of the NCIT, then and now, grow out of confusions surrounding the term
"concept" and its cognates, as for example in the use-mention confusions present in NCIT
definitions such as:
Conceptual entity =def. An organizational header for concepts representing mostly
abstract entities.
Event occurrence =def. An indication or description that something has occurred.
These formulations (taken from the version of NCIT current in June 2010) are not only
logically nonsensical (they are comparable to, for example, "Swimming is healthy and has
two vowels"); they are also practically useless for anyone who might want to understand
how, precisely, the respective terms are intended to be used by the authors of the NCIT.
The confusions manifest themselves also in the circular is_a relations present in the NCIT,
as for example in:
Entity is_a Conceptual Entity,
an assertion logically comparable to: apple is_a green apple.
As we have repeatedly urged, the reason why there are so many errors associated with
NCIT's use of "Concept" (just as there are so many parallel errors in other parts of the
UMLS Metathesaurus) is that the authors of the NCIT do not understand what they are
referring to when they use the word "Concept". One prime indication of this lack of
understanding is the number of occasions on which items classified by the NCIT under
"Concept", "Conceptual Entity" or cognate terms are misclassified or inconsistently defined.
Neither "Concept" nor "NCI Administrative Entity" is classified as a "Conceptual Entity" in
NCIT, for example, and this even though the latter is defined by the NCIT as "Conceptual
entities (sic) required by NCI operations and systems".
Some of the nicest examples of "Conceptual Entity" terms in NCIT are found among the
children of "Geographic Area", which include "Alabama", "France" and "door".20 One
troubling issue here (troubling because it suggests that the authors of the NCIT have an
uncertain understanding, not merely of geography, but also of the basic rules of logic) is
that "Alabama" is asserted to be a subclass of "US State", just as "France" is asserted to be a
subclass of "Country" (and just as the Burgundy wine region is asserted, in Noy and
McGuinness (2001), to be a subclass of France). States, countries and wine regions,
however, are not classes on any of the normal understandings of "class"; and thus also they
are not subclasses of other classes.
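The difference between instantiation and subsumption that is at issue here can be illustrated with a minimal sketch. The Python below is offered only as an analogy: the class and variable names (GeographicArea, Country, USState, alabama, france) are our own illustrative assumptions and are not taken from the NCIT, and Python classes are at best a rough stand-in for universals.

```python
# Hypothetical mini-model: universals are modeled as Python classes,
# particulars as objects. None of these names come from the NCIT.

class GeographicArea:
    def __init__(self, name):
        self.name = name

class Country(GeographicArea):   # Country is_a GeographicArea (subsumption)
    pass

class USState(GeographicArea):   # USState is_a GeographicArea (subsumption)
    pass

# Alabama and France are particulars: they instantiate a universal.
alabama = USState("Alabama")
france = Country("France")

# Subsumption holds between classes.
print(issubclass(USState, GeographicArea))   # True

# Instantiation holds between a particular and a class.
print(isinstance(alabama, USState))          # True
print(isinstance(alabama, GeographicArea))   # True (inherited via subsumption)

# A particular is not a class, so asking whether it is a subclass is a
# category mistake; Python signals this by raising TypeError.
try:
    issubclass(alabama, USState)
except TypeError as err:
    print("category mistake:", type(err).__name__)
```

On this analogy, asserting that Alabama is a subclass of US State corresponds to the final call, which is rejected rather than answered.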
2. The ConceptId itself, which is the key of the Concepts Table (in this case it is
less ambiguous to use the term "concept code").
3. The real-world referent(s) of the ConceptId, that is, the class of entities in
reality which the ConceptId represents (in this case it is less ambiguous to use
the term "meaning" or "code meaning").21
When I first started working on the (National Center for Biomedical Ontology) project I
didn't fully buy in to the realist approach. The process of resolving your critique of the
NCI Thesaurus, however, convinced me that from a purely pragmatic perspective the
approach (mostly) worked. Since then, I have continued to apply some of the basic
organizational principles and have been pleasantly surprised at how useful they have
been in defining, organizing and classifying all sorts of knowledge resources.
Somewhere along the way it just became intuitive and obvious: science is about
describing reality, and the primary point of agreement has to be on the things being
described. I have to admit that I still don't agree with some of the techniques that have
been used to publicize this approach, but it is obvious, however, that what you have
been doing is working.22
Merrill does not himself advance a strategy for ontology coordination. As far as one can tell
from his (Merrill, 2009) and (Merrill, 2010), however, for such a strategy to be capable of
receiving Merrill's support it would have to be centered on the use of FOL. The most
ambitious such strategies involve the translation of scientific content into the language of
FOL along lines first attempted by Carnap in the service of what was in his day referred to
as the Unity of Science Movement (Morris, 1960). In The Logical Structure of the World
(Carnap, 1928), he offers a methodology for translating all of science into one single
ontology based on a doctrine called resemblance nominalism. The approach uses Carnap's
own dialect of the language of FOL, which differs in two respects from that of received
FOL, first in allowing terms to represent what he calls elementary experiences, second in
allowing only one single primitive dyadic predicate M, which is satisfied if and only if two
particulars match each other. The result set standards of logical rigor and of syntactic
constraint in the service of the integration of the content of scientific theories which remain
unsurpassed. But Carnap's method nonetheless failed, not least because of what Carnap's
fellow nominalist Goodman called the disastrous problem of imperfect community
21 Previously, SNOMED CT had defined "Concept" as: "a unique unit of thought". At the same time it defined "Disorder" as: "a
concept in which there is an explicit or implicit pathological process causing a state of disease which tends to exist for a significant
length of time under ordinary circumstances". From this it can be inferred that some units of thought contain pathological processes
causing states of disease.
22 http://www.bioontology.org/node/540, last accessed June 30, 2010.
(Goodman, 1951), a problem turning on the fact that simple examples can be constructed to
show that given groups of particulars may resemble each other yet fail to share any property
in common. Moreover, because two putatively distinct universals may happen to have
exactly the same instances, Carnap's method of constructing natural classes on
resemblance-nominalistic principles would then incorrectly determine only one class for what intuitively
seem to be two universals (thus: two respects in which the same things resemble one
another).
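Both difficulties can be made concrete with a toy example. The following Python sketch is purely illustrative: the particulars a, b and c, the property names, and the reading of "match" as "share at least one property" are our own simplifying assumptions, not Carnap's or Goodman's formal apparatus.

```python
from itertools import combinations

# Hypothetical toy data: each particular is mapped to the set of properties it has.
properties = {
    "a": {"red", "round"},
    "b": {"round", "heavy"},
    "c": {"heavy", "red"},
}

def resemble(x, y):
    """Two particulars 'match' if they share at least one property."""
    return bool(properties[x] & properties[y])

group = list(properties)

# Every pair in the group resembles each other ...
pairwise = all(resemble(x, y) for x, y in combinations(group, 2))
# ... yet no single property is shared by all members (imperfect community).
common = set.intersection(*(properties[x] for x in group))

print(pairwise)  # True
print(common)    # set()

# Coextension: two distinct properties with exactly the same instances
# (invented sample data).
extension = {
    "has_heart": frozenset({"dog", "cat"}),
    "has_kidney": frozenset({"dog", "cat"}),
}
# Grouping by extension alone yields one class where there are two properties.
classes = set(extension.values())
print(len(extension), len(classes))  # 2 1
```

The first half shows a group that passes a pairwise resemblance test without forming a natural class; the second shows how a purely extensional construction collapses two respects of resemblance into one.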
Sadly, however, Woodger's initiative, too, must be judged a failure, and this for a number of
reasons. First it was some 50 years ahead of its time, since the potential utility of the sort of
formalized representation attempted by Woodger became manifest only with the widespread
use of computation in support of scientific research. Second, Woodger's axiomatization falls
short from the point of view of modular organization, so that there is missing any distinction
between formal-ontological (top-level, organizing) portions of the theory and
domain-specific portions corresponding to the separate biological disciplines.
All terms used in the formulae of Woodger's theory are defined in terms of the small set of
primitives listed in Table 1. This provides a promising approach to the creation of the sort of
constraint on expressivity that is needed if the goals of integration are to be achieved. But at
the same time the various domain-specific portions of Woodger's theory (for example, its
treatments of Mendelian genetics and of embryology) are so intricately embrangled with
each other in the formalization that, were one portion of the theory to be rejected because of
empirically-based advances in the relevant parts of science, then the entire theory would
have to be rejected also. For the same reason, too, Woodger's approach does not lend itself
readily to the sort of division of labor which would allow distinct components of the theory
(for example, in cell biology or in evolutionary systematics) to be developed in a
dedicated fashion by experts in the corresponding disciplines.
All of which brings us to what is from our present perspective the principal problem with
Woodger's approach: the absence of modularity (or of what we could now call
"normalization") brings not only obstacles to the theory's being able to keep pace with
scientific advance; it implies also that, as Fig. 3 makes clear, his theoretical contribution,
as expressed in page after page of logical formulae, is practically impenetrable to all but a
very small minority of specialists in mathematical logic. Our experience working with
ontologists and scientists in biological and similarly complex domains has taught us,
however, that there is an essential trade-off between logical complexity on the one hand and
biological usability and revisability on the other. There was then, and is now, no way in
which Woodger's contribution could have been useful to biomedical researchers. For given
the scalability problems of the biomedical ontology integration task, ontology resources will
require at every stage significant contributions from multiple disciplinary groups of
biologists who are in a position to ensure that these resources are properly maintained and
properly used. Ontologies will receive the support they need from biologists in this way,
however, only if the latter are able both to understand their contents and have confidence
that they will evolve in such a way as to keep pace with scientific advance. Ontologies
which do not capture the relevant audiences of human users, even if they achieve very high
standards of technical rigor, will for scientific purposes be as worthless as, for example,
telephone networks meeting the highest of technical standards but with no actual
subscribers.
4.3. How would Merrill approach the task of ontology coordination?
Merrill's purpose in "Ontological Realism" is a negative one. It is to demonstrate that the
realist methodology, while it contains several elements of which he approves, also contains
other elements (centered on uses of the word "universal") that are subject, as he sees it, to
serious flaws and therefore ought to be abandoned.
We do not at all rule out that there might be ingredients in our methodology that are
inessential to its proper functioning and perhaps even detrimental in this or that way. We are
dealing, after all, with a large-scale effort in scientific coordination, where multiple path
dependencies will play a necessary role. But it is not at all clear that Merrill himself has
succeeded in identifying any such detrimental elements; and even if he had, we would be
reluctant to make any attempt to untangle them from the whole without good evidence of
what might be the consequences of such an attempt.
When we examine the content of Merrill's critique, however, we find too little that is of
substance to justify such a change. For, when we leave aside his recommendations
concerning logic and semantics (with many of which, were they only clearly specified, we
would almost certainly agree), this critique amounts to a rather peculiar argument, resting in
no small part on a series of misquotations of our writings, which we might characterize in a
preliminary form as follows:
Some of Aristotle's and Armstrong's ideas are inconsistent with empirical science
(true).
Therefore, Smith and Ceusters in using the word "universal" when describing the
ontological realist methodology cannot possibly be doing anything which helps
empirical scientists to do their work (false).
4.3.1. Adherence to the principles of logic and semantics
In his (2007) Merrill
described his work (on the GlaxoSmithKline Babylon Knowledge Explorer) as a kind of
scissors-and-paste engineering. If, for example, the GO, or the WHO Drug Dictionary, are
found to be flawed, then our response, he says, cannot be to devote time and effort to
repairing such flaws in a systematic manner. Instead, it is to work with what is available or
to make what is available work. At this stage in the development of his thinking,
therefore, Merrill seemed to hold out no hope for the realization of the goal of ontology
coordination that is at the center of our work. If this is his position today, and if he has
arguments for this position, then we would assume that he would find it most sensible to
criticize our methodology on the basis of these arguments rather than on the basis of
incidental features of our writings still bearing traces of a philosophical etiology. We
speculate therefore that he has moved on from the total pessimism of 2007, and accordingly
focus on those passages in his writings which can be interpreted as allowing for the
possibility of some sort of ontology coordination strategy analogous to our own.
The first such element can be formulated as follows: that to develop ontologies able to meet
the needs of biomedical research, authors need to understand and employ the principles of
formal logic, semantics, and the philosophy of language. We will, thereby, Merrill says,
avoid the confusions and errors that Smith and Ceusters have quite rightly criticized in a
number of flawed approaches to ontologies in science (Merrill, 2010, p. 105).
Unfortunately, however, this is not so. Indeed, as Merrill himself is fully aware, it is not
even clear that there are commonly accepted principles of formal logic, semantics and the
philosophy of language. As he himself expresses it (personal communication): "There are a
number of ways of approaching the semantics of sentences, of terms and of predicates.
Many of these ways are incompatible with one another, and each has certain advantages,
disadvantages and challenges."
The proposal as stated is also marked by a certain naivety as concerns the work which must
be done if those engaged in ontology development in the service of science are indeed to be
brought to the point where they are truly able to avoid confusions and errors of the sorts we
have identified. For we have ample evidence that even those schooled in the practical
application of the disciplines of logic and semantics may fail to recognize the need for
ontologies that enjoy, for example, the feature of mutual consistency; some, indeed, are
creating ontology-like artifacts which are unashamedly not internally consistent even with
themselves (Lenat, 1995).
4.3.2. The focus on predicates rather than on general terms
In a number of
places Merrill recommends versions of the principle of tolerance articulated by the later
Carnap as follows: Let us grant to those who work in any special field of investigation the
freedom to use any form of expression which seems useful to them, and tolerant in
permitting linguistic forms (Carnap, 1950). While such a principle is of course perfectly
acceptable in the context of hypothesis-driven experimental science, it would be the kiss of
death in the context of ontology. For as we have claimed already above, and as we argue in
detail in Section 6, ontology-based integration of data in a complex and heterogeneous
domain like that of the life sciences is in practice unachievable except through the
application of constraints on what can be said within the framework of the ontologies
created. Merrill's endorsement of the tolerance principle will thus be seen to mark yet
another worrying element of naivety on Merrill's part when it comes to addressing the needs
of real-world ontological development.
Interestingly, in both (2009) and (2010) Merrill seems to offer arguments (not obviously
compatible with the spirit of the principle of tolerance) in favor of the merits of a
regimentation of the content of ontologies and terminologies that would be based, not on
general terms, as is standardly the case, but rather on predicates. To see what this would
mean, consider the sentence:
(H) Lipitor is an HMG-CoA reductase inhibitor.
In ontologies modeled after the GO this sentence would be regimented via an assertion
linking two nouns, for example as follows:
(I) Lipitor has_function HMG-CoA reductase inhibitor.
On Merrill's proposal, in contrast, it would be rendered as a universally quantified FOL
statement linking two predicates:
instantiates_Lipitor
instantiates_HMG-CoA reductase inhibitor
to the effect that everything which satisfies the first predicate satisfies also the second. In
symbols:
(J) (x)(instantiates_Lipitor(x) → instantiates_HMG-CoA reductase inhibitor(x)).
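The truth conditions of (J) can be checked mechanically over a finite domain. The sketch below is a hedged illustration only: the domain and the two extensions are invented sample data, and the Python function names merely mirror the two predicates in (J).

```python
# Hypothetical finite domain of particular drug samples (invented data).
domain = ["tablet_1", "tablet_2", "tablet_3"]

# Invented extensions of the two predicates from (J).
lipitor_instances = {"tablet_1", "tablet_2"}
inhibitor_instances = {"tablet_1", "tablet_2", "tablet_3"}

def instantiates_lipitor(x):
    return x in lipitor_instances

def instantiates_hmg_coa_reductase_inhibitor(x):
    return x in inhibitor_instances

def holds_j(domain):
    """(J): for every x, if x satisfies the first predicate it satisfies the second."""
    return all(
        (not instantiates_lipitor(x)) or instantiates_hmg_coa_reductase_inhibitor(x)
        for x in domain
    )

print(holds_j(domain))  # True
```

The conditional is evaluated in the usual material sense, so members of the domain that do not satisfy the first predicate satisfy (J) trivially.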
There is now one obvious reason why all successful ontology and terminology ventures in
support of science thus far have preferred the first, general term-based, approach. Consider a
sentence such as
Simvastatin activates the protein kinase Akt and promotes angiogenesis in
normocholesterolemic animals.
Here the number of general terms that can be identified is rather limited; relevant
candidates have been italicized. In the case of predicates, in contrast, because the latter can
be combined with each other to form logically more complex predicates in multiple arbitrary
ways, there will be, for any sentence of reasonable complexity, indefinitely many logically
acceptable predicates that can be identified within it. In the mentioned sentence, for
example, we can identify predicates such as:
activates the protein kinase Akt
activates the protein kinase Akt and promotes angiogenesis
promotes angiogenesis
promotes angiogenesis in normocholesterolemic animals
activates the protein kinase Akt and promotes angiogenesis in normocholesterolemic
animals
is promoted by Simvastatin
is activated by Simvastatin
is activated by something
promotes something
promotes something in normocholesterolemic animals
and many more.
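The growth in the number of candidate predicates can be made vivid by a rough count. The sketch below is illustrative only: it takes four predicates from the list above as atomic (a selection of our own) and counts just the conjunctions that can be formed from them, ignoring disjunction, negation and quantification, each of which would add further predicates still.

```python
from itertools import combinations

# Atomic predicates taken from the list above (a hypothetical selection).
atomic = [
    "activates the protein kinase Akt",
    "promotes angiogenesis",
    "is activated by something",
    "promotes something",
]

def conjunctions(predicates):
    """Every predicate obtainable by conjoining one or more distinct atomic predicates."""
    result = []
    for size in range(1, len(predicates) + 1):
        for combo in combinations(predicates, size):
            result.append(" and ".join(combo))
    return result

compound = conjunctions(atomic)
print(len(compound))  # 15, i.e. 2**4 - 1

# With n atomic predicates there are 2**n - 1 such conjunctions.
for n in (4, 10, 20):
    print(n, 2**n - 1)
```

Even under these restrictive assumptions the count grows exponentially with the number of atomic predicates, whereas the stock of general terms in the sentence stays fixed.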
4.3.3. Privileging FOL
Merrill advocates in many places the use of FOL (or of the
related Common Logic family of first-order logics), just as he recommends authors such as
Cocchiarella (2003), Zalta (1983) and Lenat (1995), whose work in logic, philosophy or
computer engineering is rooted in the use of FOL or of the second order logics associated
with FOL.23 In one passage addressing the relations between FOL and other kinds of logic
Merrill reveals particularly clearly how little he has familiarized himself with the actual
practices of contemporary biomedical ontology, and specifically his apparent ignorance as
concerns the role and nature of the different sorts of logic that are employed therein:
The Referentialist Assumption [which Merrill sees as being adopted by Smith and
Ceusters] makes more sense if it is adopted against the background of a term logic (such
as Aristotle's syllogistic, the logic of Leibniz or Boole, or Description Logic) rather
than a predicate logic (such as modern first-order predicate logic) [1]. Term logic, for
good reason, has been referred to by Peter Simons as "logic lite"; and its weaknesses are
well known (among them, lack of expressive and inferential power) [2]. Its
reintroduction in contemporary times as Description Logic [3] has been an attempt to
provide a simplified formal basis for automated reasoning, but its flaws are proving to
be too high a price to pay in many applications [4], and so alternatives are being sought,
among them Common Logic [5]. In other contexts, when one is willing to pay the
price in terms of computational resources and performance, standard first-order logic
(or something even stronger, as in the case of Cyc, which adds some second-order
extensions) provides a much more satisfactory framework for knowledge
representation and reasoning [6]. (Merrill, 2010, footnote 17.)
Ad [1]: As will become clear in Section 5.4 and as is documented at length in Smith (2005),24
our highly constrained version of what Merrill calls the "Referentialist Assumption" is
anchored entirely within FOL.
Ad [2]: Those who have familiarity with the role of logic in major ontology development
projects will know that it is vital for some purposes to have at one's disposal a basic logical
resource that has the (weak) expressive power and support for inferencing that is needed
for publishing and editing of ontologies; this is proved not least by the tremendous success
of the OBO format, and of the OBO-Edit software resource,25 to which Merrill nowhere
23 Oddly, the first two of these authors defend versions of metaphysical realism considerably more extreme than the version of this
position that Merrill imputes to Ceusters and Smith and criticizes so vehemently.
24 A paper Merrill refers to in exactly this connection in Merrill (2010, footnote 9), but seems not to have read.
25 http://oboedit.org/, last accessed June 30, 2010.
refers, even though they continue to serve as the principal pipeline through which high
quality ontology-annotated data enter into the public domain.
Ad [3]: This is an egregious error, since Description Logics (in the plural) have nothing
whatever to do with term logic but are rather a (family of fragments of) FOL which have the
very same FOL semantics (modulo constrained expressivity) and in which predicates
(including the relational predicates absent from term logics) play the very same
(Merrill-approved) role.
Ad [4]: The Description Logics used within the biomedical ontology development
community are primarily the OWL (Web Ontology Language) with the profiles OWL-EL
and OWL-QL according to the new OWL-2 specification.26 SNOMED CT (roughly) uses
the less expressive OWL-EL variant. Users of Description Logics are aware of the many
issues which flow from the constraints on expressivity that are imposed for the sake of
certain vital computational benefits, which include the facility, when working in OWL 2.0
and in certain other Description Logics, to check successive ontology drafts for consistency
in ways guaranteeing a response that is in almost all cases close to immediate, and to import
and export ontology content in flexible ways (Courtot et al., 2011). Certainly it is true that
these constraints on expressivity have often led to embarrassingly trivial work.27 Problems
arise also in virtue of the often seemingly willfully confusing choices of technical
metaterminology by the authors of OWL, for example, using "property" for what in other
circles is called "relation". Because of these factors Smith and Ceusters initially belonged to
the camp of skeptics as concerns the use of OWL in scientific contexts. Largely as a result
of the efforts of those working within the OBO Foundry community, however, an
impressive and ever-increasing body of scientifically valuable content is now available on
the web using OWL as native development format.
Ad [5]: Again, Merrill's lack of familiarity with the body of work that he sees fit to
criticize is apparent here: it is in fact, as concerns the life sciences, precisely within the
community of users of the OBO format that experiments in the use of Common Logic as a resource to
supplement the expressivity of weaker logics are being made,28 and in ways exploiting
aspects of the realist methodology which Merrill assails.
Ad [6]: We will return below to consider the merits of Cyc, or of anything like Cyc, for
purposes of scientific data integration. There are many reasons why, after some $100 million
of investments in its development, there are still no documented successes in this regard on
the part of Cyc. One reason is of course that Cyc was built for a quite different purpose.
Another reason, as concerns biology at least, is its content, of which we here provide just
one sample, taken at random29:
We believe that observance of any of these requirements would, each for different reasons,
guarantee failure for any strategy for ontology coordination constructed in its terms: the first
because it is so woefully underdetermined, the second because it will guarantee forking, the
third because, at this stage in the development of logical and ontology technology, at least,
FOL-based initiatives can be made useful in the work of biomedical ontology development
only if they are employed in tandem with (or as precursors to) the development of simpler
logical resources with certain needed computational benefits.
Specifically, we depart from Armstrong's views in at least the following crucial respects
(Smith, 2005; Neuhaus et al., 2004):
1. the central role he awards to the ontological category of states of affairs or facts,
which he views, oddly, as constituting the ultimate simples in the universe,
2.
3. his reliance upon a mythical future perfected state of science in which, as he sees
it, his own formal-ontological proposals will be finally realized,
4. his concomitant failure to address realistic examples taken from really existing
sciences such as physics or biology,
5. his assumption that all universals are properties or attributes (Armstrong, 2008),
and thus entities corresponding (albeit not via any one-to-one mapping) to
predicates,
6.
7.
To see how Merrill's criticisms of our ontological views fall wide of the mark because he
imputes to us Armstrongian positions we do not hold, consider, for example, (3), above.
Where Armstrong can in all seriousness hold that to establish what universals there are we
need to appeal to the future perfected state of what he calls "total science" (Armstrong, 1989,
p. 87), we ourselves are interested precisely in really existing scientific theories, and in the
associated really existing ontologies, which in normal circumstances are not associated with
any claim to completeness. Really existing scientific theories are marked, rather, by messy
and inconvenient processes of change and of correction of error, including ontological error,
and our formulation of the realist methodology is designed precisely to do justice to this fact
(Ceusters, 2009). Where Armstrong's views are put forward as philosophical doctrines,
ontological realism is a practical methodology. In order to sustain his attack on ontological
realism on grounds derived from flaws he finds in philosophical doctrines defended by
Armstrong, therefore, Merrill is forced into contortions of positively Ptolemaic proportions.30
Certainly we share Armstrong's recognition of the need for a "sparse theory" of universals
as contrasted with those theories which allow representations of universals/properties/
intensions/concepts to be constructed in combinatorial fashion. Armstrong himself
formulates the sparse view as follows: "Given a predicate, there may be none, one or many
universals in virtue of which the predicate applies" (Armstrong, 1978, emphasis added). For
us the sparse theory is a view to the effect that for each scientific general term, there may be
none, one or many universals to which the general term refers (Smith, 2006a). We, like
Armstrong, hold the sparse theory of universals because of our conviction that the question
as to which universals exist in reality is a matter for scientists, not for ontologists, logicians
or linguists, to determine (Grenon & Smith, 2004). Unlike Armstrong, however, for whom
what matters is the future "total science" when scientists will exist in a state of
epistemological perfection, we acknowledge that it is impossible to read off from any given
scientific theory what universals exist in reality, for simple epistemological reasons turning
on the fact that the theory in question may rest on error.
5.2. The non-smoker
From the ontological realist perspective, that a specific universal exists is never a matter of
what can be discovered by logical means alone, but always only through application of the
scientific method.
In particular, therefore, we reject the thesis according to which, from the fact that F is a
universal, we could infer that non-F is a universal, where non-F is defined as follows:
30 See, for example, Merrill (2010, footnote 13).
(K) x instantiates non-F =def. it is not the case that (x instantiates F).
Indeed we go further and argue that:
(L) If F designates a universal then non-F (in the sense defined by (K)) does not
designate a universal.
(L) implies, in particular, that if "smoker" designates a universal, then "non-smoker" (in the
sense of (K)) does not designate a universal.
Here Merrill sees trouble for our position, in light of the fact that assertions such as:
(M) Non-smokers are less susceptible to cardio-pulmonary diseases than are smokers,
might very well be supported by empirical evidence. From this, he infers, it follows that
ontological realists might potentially be in a position where they would have to reject
empirical evidence because it would contradict some favored metaphysico-logical principle.
If he were right in this, then (L) would of course need to be sacrificed, and Merrill, because
he would have finally discovered an actual error in our work, would have scored a valuable
point.
Unfortunately, however, in making this charge Merrill confuses what are standardly called
internal and external negations, and thus himself commits an error of logic.31 This is
because 'non-smoker', as it occurs in assertions such as (M), utilizes only the internal
negation expressed by 'human who does not smoke', not the external (which is to say logical
or Boolean) negation conveyed by: 'entity of which it is not the case that it smokes'. A
cardinal number, or a glass of water, is a non-smoker in the latter sense, which is the sense
captured by (K); not however in the former.
Our assertion (L), now, has no implications at all for terms (such as 'odorless', 'colorless',
'invisible', 'unfriendly' and so on) involving mere internal negation, since the sparse theory
of universals of which (L) is one expression pertains only to the question whether
representations of universals can be composed through application of logical constants such
as 'and' or 'not'. And we are confident that every assertion analogous to (M) in which
'non-smoker' or any similar term is truly to be interpreted in the externally (i.e., logically)
negated sense will be found to be clearly false on the basis of simple inspection. Thus it
was, the last time we checked, not the case that cardinal numbers are less susceptible to
cardio-pulmonary diseases than are smokers.
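The difference between the two negations can be made concrete with a small Python sketch. It is an illustration only: the Entity record, its kind and smokes fields, and the toy domain are all assumptions introduced here, not part of any cited ontology. The external reading complements over everything in the domain, as in (K); the internal reading is restricted to human beings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str             # hypothetical label for the universal the entity instantiates
    smokes: bool = False  # hypothetical flag: takes part in acts of smoking


def external_non_smoker(x: Entity) -> bool:
    # External (Boolean) negation, as in (K): it is not the case that x smokes.
    return not x.smokes


def internal_non_smoker(x: Entity) -> bool:
    # Internal negation: a human being who does not smoke.
    return x.kind == "human being" and not x.smokes


# Toy domain, assumed for illustration only.
domain = [
    Entity("Ann", "human being", smokes=True),
    Entity("Bob", "human being"),
    Entity("the number 7", "cardinal number"),
    Entity("a glass of water", "portion of water"),
]

external = [e.name for e in domain if external_non_smoker(e)]
internal = [e.name for e in domain if internal_non_smoker(e)]

print(external)  # Bob, the number 7 and the glass of water
print(internal)  # Bob only
```

Only the internal reading yields a collection about which an epidemiological claim such as (M) could sensibly be made.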
In Ceusters (2007) we set out the recommended realist treatment of negative assertions.
Both smoker and non-smoker, if included in an ontology conformant to the principles
presented above, would need to be included in the corresponding inferred hierarchy on the
basis of definitions along roughly the following lines:
smoker(x) =def. instantiates(x, human being) & ∃y(instantiates(y, act of smoking) &
participates(x, y))
non-smoker(x) =def. instantiates(x, human being) & ¬∃y(instantiates(y, act of smoking)
& participates(x, y))
employing universals human and act of smoking. We say 'roughly' because a full account
would need to specify the thresholds for when somebody would count as belonging to one or
other collection, for example in the case of humans who recently gave up smoking, or who
smoke occasionally. (M) on this account would then amount to an assertion relating acts of
31 See Slater (1979). In Ceusters et al. (2005) we show how this same logical error is committed also by the curators of the NCI
Thesaurus.
In "On What There Is" (Quine, 1953), Quine presents an alternative to views of this sort
designed to lend support to nominalists, like himself, who have a taste for austere
ontologies. For the world to be such that Teco is a bonobo, Quine holds, it must be the case
that the world includes some bonobo; but it need not include anything properly referred to
by means of a general term such as, say, 'bonobohood' or 'Pan paniscus'.
From Quine's point of view, A subject-predicate sentence is true if and only if the subject
satisfies the predicate. Thus, for example, 'Snow is white' is true if and only if snow is
white. Many have been disconcerted by the apparent circularity of this doctrine. Armstrong
(1978) gives voice to this puzzlement by coining the term 'ostrich nominalism' as a label for
those philosophers who refuse to countenance universals but who at the same time see no
need for any reductive analyses of the sort that would replace talk of universals, for example,
with talk of sets or collections of resembling particulars.
For Armstrong, questions like 'what makes it true that Teco is a bonobo?', or more
generally, 'what is it for a to be an instance of the type T?', or 'for a to have the attribute F?'
are compulsory questions, questions that all upstanding philosophers are called upon to
address (Armstrong, 1980). The ostrich nominalist's response to such questions, however, is
to bury his head in the sand while everyone else in the debate (even the most extreme of
nominalists who might appeal, for example, to brute relations of resemblance) thinks that a's
being F warrants some form of analysis.
In response, the ostrich nominalist might argue that, on his account, the phenomenon of true
predication is a basic phenomenon, one not reducible to, or explainable or analyzable in
terms of, anything more fundamental. Circularity is, in this sense, both inevitable and
harmless. In fact, however, we think that the only reason for treating predication in this way
as brute (which is to say: not further analyzable) comes, again, from an overblown
fascination with austere ontology (with a taste, as they say, for 'desert landscapes'). Such an
ontology is revisionary; and it is adopted by the ostrich nominalist without good reason (and
certainly without any reason being supplied).
Merrill, too, where others see general terms, professes to see only predicates with no
referential force. In the sentence 'Socrates is a man', he writes,
the term 'Socrates' is singular and denotes a particular man while the term 'man' may
be taken to be a general term denoting the class of men, the form Man, or mankind.
Alternatively, in modern first-order logic, 'man' would not be regarded as a term in this
sentence, but rather 'is a man' would be regarded as a predicate. And the difference here
is that predicates are not (or certainly need not be) viewed as denoting anything.
(Merrill, 2009, p. 14, punctuation added.)
From this passage, and from the absence in Merrill's writings of anything to the contrary, we
infer that Merrill, too, is an ostrich nominalist.
Consider now this sentence from his "Ontological Realism" (Merrill, 2010, p. 92):
The point is that while for the metaphysical realist (of the Smith-Ceusters school) a
fundamental task of the scientist must be to ask what universals exist, for the anti-realist
this is replaced by the much more sensible (and obviously empirical) task of
determining what predicates ('loves pizza', 'has the flu', 'is a smoker', 'is a
non-smoker', etc.) should be introduced into our scientific language in order to formulate
our theories and test them in the empirical world and which of those predicates we
should retain in our language as a consequence of such testing.
How, on the ostrich perspective, could such testing be made intelligible? Let us suppose
(somewhat counter-intuitively, given what we know about how scientists work) that a given
group of scientists is attempting to determine empirically whether to include the predicate
'has the flu' in their scientific language. How do they do this? By investigating, presumably,
whether there are entities in reality which satisfy the predicate 'has the flu'. And how do
they do this? Presumably by finding out whether, say, entity Jim satisfies this predicate. And
how do they do this? By determining whether the sentence 'Jim has the flu' is true. And how
do they do this? By examining whether Jim satisfies the predicate 'has the flu'. And how
do they do this? By determining whether the sentence 'Jim has the flu' is true. And so on, ad
indefinitum.
Merrill seeks to climb out of this circle in the following passage:
we know what it means to have the flu. We can describe tests for determining such a
diagnosis and describe clear clinical (empirical) conditions pertaining to the flu and
those who have it. The flu universal does not make an appearance. (Merrill, 2010, p.
92.)
But how could it be that we can determine that something, the flu, is had, now by this
patient, now by that patient, if there are no repeatable somethings (however the latter are to
be understood from the metaphysical point of view)? How, if there are no repeatable
somethings, could there be tests which can be described in a uniform way and reliably
applied, now to this patient, now to that patient, to determine whether either has something
that would in both cases be referred to as 'the flu'? And how could there be diagnoses and
conditions which share in common that they pertain to the flu? How, more generally, is
Merrill to do justice to the use of general terms as the subjects of true sentences formulating
scientific discoveries, as for example in:
(N) The H1N1 virus causes influenza?
As Summerford (2003) argues, If the nominalist is going to reject universals, then he must
demonstrate that the use of these terms does not involve countenancing such entities, and
nominalists have thus far failed to provide a satisfactory demonstration of how this is to be
achieved. The one approach which still attracts significant numbers of adherents views
general terms such as those appearing in (N) as referring to sets or collections, in effect by
identifying universals with their extensions, which is to say with the set or collection of their
instances. How, then, to address the problem turning on the fact that multiple putatively very
different universals might conceivably have identical extensions in this the actual world?
The favored answer to this question (deriving, in its most influential version, from Lewis,
1986) is to view the extensions in question as including members not merely among the
actual, but also among the merely possible, physical individuals such as, say, Nicola
Guarino's thousandth child. This however creates further problems, not merely because it
makes the favored set-theoretic referents for general terms appear (as some might say)
'curiouser and curiouser' the more closely they are scrutinized, but also because it threatens
to make the treatment of such terms embarrassingly remote from the scientific and
computational needs of, say, biologists.
5.4. First-Order Logic with Universal Terms (FOLWUT)
There is a further problem for the predicate-based approach when received FOL is used,
namely that it might leave us in a position where certain needed logical inferences will not
be able to be drawn. Consider, to illustrate the point, an assertion concerning some portion a
of cell protein extract. From
(O) a was incubated for 10 min.
we can infer:
(P) a was incubated.
One way of treating (O) in received FOL would yield:
(O*) was_incubated_for_10_minutes(a),
a logically not further analyzable sentence, again of the form 'F(a)', where 'F' is the
predicate and 'a' is a constant term referring to an individual object to which the predicate
'F' is applied. From (O*), however (and non-logicians among our readers will be shocked
by this), we cannot infer logically the regimented counterpart of (P), namely:
(P*) was_incubated(a).
Famously, this is the problem of the logical analysis of sentences involving adverbial
modifications. This problem is commonly seen as having been solved by Ramsey (1978)
and Davidson (1980), who recognized that sentences such as (O) are properly to be treated
as equivalent to sentences involving existential quantification over events, along the lines of:
(P**) ∃e(instantiates(e, incubation_event) & participates(a, e) & duration(e, 10
minutes))
or, in other words: there is some incubation event e in which a participates and which is of
duration 10 minutes.
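The contrast between the two regimentations can be sketched in Python. This is an informal illustration, not a logical proof system: the Event record, the fact table and the function names are assumptions made here for the example. With opaque predicate names, the shorter claim is simply a different, unrelated string; with events as first-class entities, the shorter claim is answered by the same event that answers the longer one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    universal: str                      # e.g. "incubation_event"
    participants: frozenset             # names of the participating individuals
    duration_min: Optional[int] = None  # duration in minutes, if recorded


# Atomic encoding in the style of (O*): each predicate is an opaque name.
atomic_facts = {("was_incubated_for_10_minutes", "a")}


def holds_atomic(predicate: str, subject: str) -> bool:
    # No structure is available from which to relate one predicate to another.
    return (predicate, subject) in atomic_facts


# Event-based encoding: quantify over events in which the subject participates.
events = [Event("incubation_event", frozenset({"a"}), 10)]


def was_incubated(subject: str, minutes: Optional[int] = None) -> bool:
    # True if some incubation event has the subject as participant and,
    # when a duration is requested, has exactly that duration.
    return any(
        e.universal == "incubation_event"
        and subject in e.participants
        and (minutes is None or e.duration_min == minutes)
        for e in events
    )


print(holds_atomic("was_incubated_for_10_minutes", "a"))  # True
print(holds_atomic("was_incubated", "a"))                 # False: nothing links the two names
print(was_incubated("a", 10), was_incubated("a"))         # True True
```

Dropping the duration conjunct corresponds to dropping the minutes argument, which is why the weaker claim comes for free in the event-based encoding.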
The inference to (O), which is now regimented as having the form:
(R) has_a_headache(Werner).
(R) is true, again, if and only if the subject (Werner) satisfies the predicate
(has_a_headache), whereby (R) clearly respects the received FOL rule that it contains only
terms referring to individual objects. From the clinico-ontological perspective, however, this
rule will pose problems, for it means that (Q) does not allow the inference to, for example,
(S) there is a headache which Werner currently has,
or in symbols:
(Q*) ∃x(instantiates(x, headache) & inheres(x, Werner)).
Moreover, received FOL will not allow us to assert, for example, that the headache referred
to in (S) has lasted for two hours, or is being treated by taking aspirin gum. Indeed, received
FOL allows no reference to disease-entities (to your influenza, or my sinusitis) of any
kind; rather, it requires us always to reformulate our statements about such entities as
statements about individual objects such as the organisms which are their bearers.
The version (dialect) of FOL that we propose (called FOLWUT, for 'first-order logic
with universal terms') is designed to resolve such matters by allowing the ways clinicians
and others refer to entities such as diseases to be captured using terms in FOL along lines
illustrated already in (Q*). But it goes further in allowing terms in FOL to refer not only to
independent and dependent continuant particulars and to occurrent particulars, but also to
universals in all of these categories (Smith, 2005).
FOLWUT thereby departs from received FOL in two ways. First, it expands the repertoire
of types of entities to which the terms of FOL can refer. At the same time, it radically
restricts the family of allowable predicates, eliminating all predicates of the usual sort ('is a
man', 'is an HMG-CoA reductase inhibitor' and so forth), and admitting instead only a
small number of formal predicates, including two-place (relational) predicates of the sorts
described in the Relation Ontology, all of them predicates which, like the formal tie of
identity '=', come with fixed interpretations.
Such relational predicates will include, on the level of instances, suitably temporally indexed
versions of:
Part_of(x, y), for: individual x is part of individual y
and so on.
The consequence of generalizing the scope of allowed referents for terms in FOL to include
also universals is that it brings the possibility of simulating, within an entirely traditional
FOL framework, some of the expressive possibilities of second order logic. In particular, we
can define, in terms of the instance-instance relations listed above, type-level relations such
as is_a and part_of in ways that are useful not only for ontology-based reasoning but also
for ensuring that the relations in question are used by those engaged in the construction of
ontologies in ways which avoid certain hitherto common errors (Bittner & Donnelly, 2007;
Donnelly et al., 2006). And then, exactly as Merrill would require, the result is a framework
in which predicates do not represent, and which is governed by standard predicate-logical
semantics.
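How type-level relations might be defined from instance-level ones can be sketched as follows. The definitions used are a simplified, untimed version of the all-some pattern familiar from the Relation Ontology literature; they are an assumption of this illustration rather than a quotation of the formal definitions. The instance names and universals in the toy tables are hypothetical, and a check over a finite table can of course only refute, never establish, a claim about all instances.

```python
# Toy instance-level data; every name here is hypothetical.
INSTANTIATES = {
    ("n1", "cell nucleus"),
    ("n2", "cell nucleus"),
    ("c1", "cell"),
    ("c2", "cell"),
    ("c3", "cell"),
    ("c1", "eukaryotic cell"),
    ("c2", "eukaryotic cell"),
}
PART_OF = {("n1", "c1"), ("n2", "c2")}  # instance-level: x is part of y


def instances(universal):
    # All individuals recorded as instantiating the given universal.
    return {x for (x, u) in INSTANTIATES if u == universal}


def type_is_a(a, b):
    # A is_a B: every instance of A is an instance of B.
    # (Vacuously true if A has no recorded instances.)
    return instances(a) <= instances(b)


def type_part_of(a, b):
    # A part_of B: every instance of A is part of some instance of B.
    return all(
        any((x, y) in PART_OF for y in instances(b))
        for x in instances(a)
    )


print(type_is_a("eukaryotic cell", "cell"))   # True
print(type_is_a("cell", "eukaryotic cell"))   # False: c3 is a counterexample
print(type_part_of("cell nucleus", "cell"))   # True
print(type_part_of("cell", "cell nucleus"))   # False
```

Note that universals appear here only as terms (the second argument of the instantiation table), while the predicates themselves are the small fixed set of formal relations.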
5.5. General terms in scientific hypotheses
We recall that the principle of instantiation is formulated only for the case of reference
ontologies (and thus of ontologies created in support of settled science). Matters ontological
will be more complicated in areas of non-settled science, where there may be multiple
camps of experts, and where the appropriate ontological analysis of the very experiments
used to test given hypotheses may be subject to dispute. Ontologies may then provide a
supporting role in the testing of the relevant hypotheses; however, it is not up to the authors
of reference ontologies to pick sides in such disputes; rather this is a decision that should
wait for science.
Further issues are raised by our acceptance of the principle of instantiation. This principle is
designed to ensure that the users of the realist methodology see types or universals not as
entities in some special realm that is beyond the reach of empirical observation, but rather
squarely within the world of what happens and is the case, entities with which experimental
data is associated.
Sometimes, of course, general terms are used by scientists to designate entities (or purported
entities) postulated in areas where science is not yet settled, as for example in the case of the
Higgs boson (Dumontier & Hoehndorf, 2010). Merrill is right to insist (with Smith, 2006a)
that there is a role for ontologies to aid in formulating the hypotheses that later become
laws within theories. Here, clearly, the principle of instantiation does not apply. As
concerns the other elements of the realist methodology, however (and contrary to what
Merrill (2010) and Dumontier and Hoehndorf (2010) argue), ontologies following realist
principles are still able to be developed to fulfill this role. The information artifacts in
question will not, at least initially, be incorporable into reference ontologies recommended
for general use. But they may have a significant practical role to play nonetheless in helping
the relevant scientific hypotheses to become part of established science.
According to Merrill, a realist who is faced with a Higgs type of case would either (1) need
to wait before beginning the process of ontology building in the relevant areas until the
needed universals had emerged or (2) require a theory of meaning and thus a (non-realist)
theory of ontology that does not require the Referentialist Assumption. Case (1)
would cripple the realist methodology from a practical point of view. In case (2), realism
itself would be sacrificed for at least some portions of ontology building, thereby potentially
re-opening the problems flowing from older, concept-based approaches to ontology
development.32
In fact, however, our solution is much more straightforward, and rests on the recognition
that language can clearly still be used to communicate (in some sense) even where
putative referring expressions fail in their reference. Some people assert their beliefs in the
existence of unicorns. All such beliefs are false. But the beliefs exist just as do other beliefs;
they can be communicated; and they can also be represented, as what we have called 'level 2
entities' (Smith et al., 2006), in realist ontologies, created, perhaps, for purposes of
supporting psychiatric research.
The case of psychiatry reminds us that the issues raised by non-referring terms apply just as
much to singular terms as they do at the level of the general terms appropriate to ontologies.
Let us suppose, for example, that a psychiatric patient begins to express beliefs in something
he calls 'Murther'. For the moment, we do not know whether 'Murther' refers to some entity
or whether it is, like 'unicorn', merely the expression of some fantasy. Until this matter is
settled the psychiatrist, in compiling his clinical record for the patient, can avail himself of
the facility incorporated in the Referent Tracking paradigm, whereby instance unique
identifiers can be reserved for candidate particulars whose existence is not yet settled, for
example, when an order to obtain X-ray studies on some patient has been entered into the
hospital order system today, and identifiers are needed already in advance of the radiographs
that will exist only tomorrow. Such identifiers will lose their 'reserved' status once the
entities in question have been confirmed to exist; and they will be immediately declared
'obsolete' should it ever be confirmed that the putative entities in question do not and will
never exist. The formal mechanisms are introduced in Ceusters (2007). We have
recommended that analogous mechanisms be formulated for application ontologies,
incorporating also new evidence codes to indicate that assertions containing the terms in
question are, for different reasons, problematic.33
By employing such mechanisms, application ontologies following realist principles can be
developed even where general terms are being used before the existence of corresponding
instances has been confirmed. The terms in question would need only to be provided with
provisional identifiers for purposes of ontological reasoning support.
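The life cycle of such provisional identifiers can be pictured as a small state machine. The Python sketch below is an illustration under stated assumptions: the class and method names, the status labels and the identifier format are hypothetical, and it does not reproduce the formal mechanisms of the Referent Tracking paradigm. An identifier starts out reserved and moves exactly once, either to confirmed or to obsolete.

```python
from enum import Enum


class Status(Enum):
    RESERVED = "reserved"    # existence of the referent not yet settled
    CONFIRMED = "confirmed"  # referent has been confirmed to exist
    OBSOLETE = "obsolete"    # referent confirmed not to exist, now or ever


class IdentifierRegistry:
    """Hypothetical registry of provisional identifiers."""

    def __init__(self):
        self._status = {}
        self._labels = {}
        self._next = 1

    def reserve(self, label):
        # Issue a fresh identifier in the reserved state.
        identifier = f"ID-{self._next}"
        self._next += 1
        self._status[identifier] = Status.RESERVED
        self._labels[identifier] = label
        return identifier

    def confirm(self, identifier):
        self._transition(identifier, Status.CONFIRMED)

    def retire(self, identifier):
        self._transition(identifier, Status.OBSOLETE)

    def status(self, identifier):
        return self._status[identifier]

    def _transition(self, identifier, new_status):
        # Only reserved identifiers may change state; the change is final.
        if self._status[identifier] is not Status.RESERVED:
            raise ValueError(f"{identifier} is no longer reserved")
        self._status[identifier] = new_status


registry = IdentifierRegistry()
radiograph = registry.reserve("radiograph ordered today, to be taken tomorrow")
murther = registry.reserve("Murther")
registry.confirm(radiograph)  # the radiograph now exists
registry.retire(murther)      # supposing 'Murther' turned out to be a fantasy
```

Assertions that use a reserved identifier can then be filtered or flagged by status, which is one way of keeping them out of a reference ontology until the matter is settled.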
The proposal thus conforms well with the strategy already implemented in the chemistry
domain, an area which Dumontier and Hoehndorf (2010) argue might somehow not be well
served by the realist approach. Consider, for example the model followed by IUPAC, the
International Union of Pure and Applied Chemistry, in its treatment of elements. There, a
formal name is given to an element only after evidence has been presented that it has been
created in the lab and this evidence has been verified through a rigorous process. In the
meantime, IUPAC creates provisional names for those elements hypothesized to exist, but
the latter are not included in the pertinent reference ontology (i.e., the Periodic Table) until
officially proven. At the same time, of course, pharmaceutical and other organizations are
developing the equivalent of application ontologies to support their planning processes in
which terms may be reserved, for example, for chemical substances that have not yet been
synthesized.
The problems identified in the above do not pertain specifically to ontologies and to the role
of the general terms therein. They pertain quite generally to uses of terms to (putatively)
32 Some might suppose that there is a case (3), involving a hybrid approach that uses universals for scientifically established entities
and meanings or concepts for Higgs-type cases. One problem with such an approach, however, is that it leaves the ontologist with an
incoherent account of the referents of such terms during the transition from speculative to settled usage, as for example in the case of
new diseases at the stage when patients are already affected but the diseases themselves have not yet been incorporated into settled
diagnostic science.
33 http://sourceforge.net/mailarchive/message.php?msg_name=20100719152134.E60EE207A9%40mweb2.acsu.buffalo.edu, last
accessed August 10, 2010.
refer to what does not exist (Kroon, 1992). In the realm of particulars such terms are often
used in the context of planning, for example in the naming of babies not yet even
conceived. And for the formal regimentation of processes of this sort it appears that the
appeal to some sort of possible worlds approach à la Lewis is what is required.
We have argued that a reference ontology is analogous to a settled scientific theory (Smith,
2008). Developing such an ontology presupposes an intention on the part of the developer to
represent some configuration of repeatable structures in reality in a way that conforms to the
current content of the relevant parts of science. There are of course many ontology-like
artifacts which rest on different goals. Some might develop application ontologies to
capture, for example, the content of Klingon science. Some might develop application
ontologies in the service of the history of science to represent entities postulated by Earth-based
scientific theories that have long since been falsified. But such artifacts, we believe,
must be sequestered from the reference ontologies recommended for general use in support
of scientific research.
Merrill argues that a further problem arises for our views in the case of those general terms
whose referents have not yet been confirmed, because: "If hypotheses containing such terms
can be regarded as meaningful (and they must if they are to be tested), then it cannot be
required that the terms in them denote universals" (Merrill, 2010, p. 87). Here, too, we
believe, a response assigning the relevant term to an appropriate application ontology will be
quite sufficient.
DOLCE (Gangemi et al., 2002) is from the point of view of numbers of users a very
successful upper-level ontology, and it has been applied in a number of projects in
biomedical34,35 and other scientific domains. DOLCE and BFO in fact grew out of a
common philosophical orientation, and thus BFO overlaps with parts of DOLCE's top level
and is in close conformity with the DOLCE-associated OntoClean methodology (Guarino
& Welty, 2002). But DOLCE has chosen a strategy, different from that of BFO, focusing
on what it calls 'linguistic and cognitive engineering'. This means that its coverage domain
includes the putative objects of mythology (leprechauns, for example) or fiction (instances
of pneumonia in 19th century Russian novels) and thus that, unlike BFO, it relies on an
ontology of possible worlds. We do not believe that this makes DOLCE stronger from the
perspective of providing support for the development of reference ontologies to serve the
needs of scientific researchers.
SUMO, too, has proved to have considerable value as an upper-level ontology for certain
purposes (Niles & Pease, 2001).36 Unfortunately the fact that it contains its own tiny
biology (protein, crustacean, body-covering, fruit-Or-vegetable) means that it cannot
34 http://neuroscientific.net/index.php?id=43, last accessed June 30, 2010.
35 http://www.imbi.uni-freiburg.de/aneurist/ontology/, last accessed June 30, 2010.
36 It also incorporates certain elements contributed by Smith: http://suo.ieee.org/SUO/Ontology-refs.html, last accessed June 30, 2010.
support the strategy of downward population that has proved so useful to scientists in the
case of BFO, since biologists are unlikely to find SUMO's definitions (and selection) of
biological terms acceptable, and they will find problematic the absence in SUMO of
anything like the BFO category of dependent continuant (for particulars such as Werner's
headache, Mary's hypertension or Bruno's osteoarthritis (Scheuermann et al., 2009)).
Merrill has a number of positive things to say about Cyc in his "Ontological Realism". Both
DOLCE and SUMO seem to us, however, to be much more coherent as ontologies for
scientific purposes than the upper level of Cyc, which is marred not least by the fact that it is
associated at lower levels with very many terms and definitions which, because of Cyc's
primary focus on formalizing what it calls 'common sense knowledge', deviate significantly
from the terms and definitions favored by scientists. (The children of Cyc's 'partially
tangible thing', for example, include both 'diisopropyl methylphosphonate' and 'pay e-mail
provider'.) From our present perspective, however, Cyc's primary problem turns on the fact
that (like the UMLS) it does not strive for consistency among the various microtheories
which form its parts. Hence the very goal of creating a single consistent suite of
interoperable ontologies which would capture the terminological content of biomedical
science (which is from our point of view the only coherent strategy for achieving
ontology-mediated data integration in the domain of the life sciences) is undermined by Cyc's own
paraconsistent logical structure.
DOLCE and SUMO are of signal importance for our argument here, however, because, like
BFO, both are constructed around (overwhelmingly) single inheritance taxonomies ('is_a'
hierarchies) consisting of singular nouns representing what in BFO and DOLCE (Masolo et
al., 2002) are called 'universals' and in SUMO 'classes'.37 In each case, the generic entities
which form the focus of the ontologies are said to have instances in the realm of particulars.
In each case the generic entities are governed by the sparse theory of universals outlined in
our discussion above.
Whether all of this applies to the Cyc knowledge base also is, alas, not easy to ascertain
from its documentation. But it is in any case clear, and surely significant, that at least
three of the four leading upper-level ontologies rest on views concerning the relation
between general terms and universals of just the sort that Merrill finds so objectionable.
Another design choice shared in common by BFO, DOLCE and SUMO is the acceptance of
a dichotomy between continuants and occurrents. Philosophers have argued back and forth
for some two thousand years over the question whether this dichotomy is truly such as to
represent the fundamental architecture of the reality that (as we would now say) is described
by science. Such arguments continue to be pursued at length by distinguished figures in the
ontology field such as John Sowa, who sees the continued existence of philosophical
communities with opposing views on this and similar matters as justification for his own
long-standing campaign against the very project of a consistent, formalized, upper-level
ontology of the sort whose widespread adoption is, in our eyes, the sine qua non of effective
ontology coordination in a large multidisciplinary area such as biomedicine. Rather, Sowa
favors a Cyc-like approach to ontology, in which the sparse theory of universals is
abandoned in favor of the acceptance of unrestricted Boolean combinations, an approach
which Sowa himself describes as
37 Classes are elucidated in the SUMO documentation as follows: "Classes differ from Sets in two important respects. First, Classes
are not assumed to be extensional. That is, distinct Classes might well have exactly the same instances. Second, Classes typically have
an associated 'condition' that determines the instances of the Class. So, for example, the condition 'human' determines the Class of
Humans." See http://www.ontologyportal.org/translations/SUMO.owl.txt, last accessed June 30, 2010.
Ontological traffic law principle: Ontological standards, including a common upper-level
ontology and standards governing syntactical uniformity, are indispensable to
every successful large-scale ontology development initiative, and this is so even if they
are selected arbitrarily provided they enjoy widespread assent among those working in
the relevant research community.
One example of such a traffic law, which has been executed with some success and, we
believe, some measurable benefit by the GO and its sister ontologies within the OBO
Foundry (Smith et al., 2007), is the law according to which all terms within an ontology
should be nouns and noun-phrases that are singular in number. (This purely syntactic law is
in fact inspired by our view according to which ontologies should be viewed as consisting of
representations of types or universals, but its implementation need clearly involve no
reference to this view.) Another example is the law which asserts that all terms in an
ontology should be traceable via is_a relations to the relevant ontology root node. Further
examples of such laws have been codified by the OBO Foundry in the form of an evolving
set of principles for ontology development in the biological and medical domains, some of
them focusing on governance. The first ten of these principles, first promulgated in April
2006,38 have proved to be of value to ontology developers seeking guidance on how most
effectively to create ontologies in such a way as to maximize consistency with other OBO
ontologies. Further principles are currently under review by the OBO Foundry with a view
to their adoption in the future.
These principles are interesting, since some of them are treated by Merrill as figures of fun,
and some of them as dangers to the advance of science. Under the first heading, Merrill sees
some of the principles of ontological realism roughly along the lines of "How could such a
strange amalgam of Aristotelico-Australian philosophical ideas possibly have import for the
workings of serious scientific research?" In fact, however, since the authors of this
communication first began to collaborate in 2002, our ontology development methodology
has been driven by needs and concerns not of philosophers, but rather of scientists building
systems in areas such as hospital adverse event reporting (Ceusters et al., 2009, 2009a,
2009b), salivaomics (Ai et al., 2010), or the diagnosis and treatment of Methicillin-resistant
Staphylococcus aureus (Goldfain et al., 2010).
38 http://www.obofoundry.org/crit.shtml, last accessed June 30, 2010.
Under the second heading (dangers to the advance of science), Merrill is concerned that the
OBO Foundry principle of modularity (according to which there should be one ontology
for each domain that is recommended for general use in realizing the purposes of the
Foundry) might harbor a view according to which for every scientific domain there is or
will be exactly one true theory, a view which could have detrimental consequences in
constraining the flexibility that is indispensable to scientific advance. As we have argued at
length (Smith et al., 2007), however, the OBO Foundry is not attempting to restrict the
ontologies people can build. Rather, it is attempting, as an experiment, to create a suite of
ontology artifacts built around a small set of high quality, interoperable, non-overlapping
reference ontologies following certain principles. All of those involved in the Foundry
initiative recognize that it is vital to the success of the Foundry that it is always open to, and
can only benefit from, both (1) criticism from the outside, on the basis of the assumption
that no Foundry resource will ever exist in a form that cannot be further improved, and (2)
competitor initiatives, both at the level of single ontologies and at the level of the Foundry as
a whole.
7. Conclusion
At one point Merrill (2010, p. 105) asserts that our approach is "neither science nor
philosophy", and in this he hits the nail exactly on the head. For in propagating the realist
methodology we are indeed engaging in a novel interdisciplinary activity that involves
elements of both of these, and also of computer science, politics, community organizing,
sociology, logic, and other black arts. Merrill himself, however, draws a slightly different
conclusion. For him, our approach, because it involves reference to those damned
universals, is ideology through and through, and hence in the final analysis
unscientific. Let us grant him, in the interests of eirenic compromise, that there is an
element of ideology involved in our work. Coordinated ontology development across a large
scale is so difficult that we are happy to draw on any means that will help us to achieve our
ends. But then at the same time we submit that there is an equal and opposite admixture of
ideology on Merrill's side also: an ideology deriving from the School of Nominalism.
For this reason too, therefore, we would welcome a systematic effort on Merrill's part to
create and disseminate a strategy for ontology development that can be certified to be
"general term free". If such a strategy were to gain traction amongst biologists, to the point
where Merrill himself were able to point to evidence of clear practical advantages over the
realist approach, then we would of course switch our adherence immediately. Strangely,
though, we cannot shake off our conviction that Merrill himself, were he to find himself in
an analogous situation,39 would not switch over to our side.
Acknowledgments
We are grateful to Gary Merrill for giving us this opportunity to clarify our views. We thank also Colin Batchelor,
Randall Dipert, Albert Goldfain, Janna Hastings, William Hogan, Ingvar Johansson, Michael McGlone, Chris
Mungall, Peter Robinson, Stefan Schulz, David Osumi-Sutherland, Alan Ruttenberg, Frederic Tremblay, Neil
Williams and the participants in the obo-discuss email discussion at http://tinyurl.com/34lacvy, for valuable
suggestions. The work on this paper was partially supported by the National Institutes of Health through the NIH
Roadmap for Medical Research, Grant 1 U 54 HG004028 (National Center for Biomedical Ontology) and also by
Grant R21LM009824 from the National Library of Medicine. The content of this paper is solely the responsibility
of the authors and does not necessarily represent the official views of the National Library of Medicine or the
National Institutes of Health.
39 It may be that Merrill is already in this analogous situation, for instance given the statistics assembled in Bodenreider (2008),
which documents a significant fall-off in citations of the UMLS by clinical and biological researchers in recent years and a
countervailing rise in usage of the GO (measured as a percentage of all PubMed citations pertaining to ontology).
References
Abazov VM, et al. Search for Higgs boson production in dilepton and missing energy final states with
5.4 fb⁻¹ of pp̄ collisions at √s = 1.96 TeV. Physical Review Letters. 2010; 104(061804)
Ai J, Smith B, Wong D. Saliva ontology: An ontology-based framework for a salivaomics knowledge
base. BMC Bioinformatics. 2010; 11:302. [PubMed: 20525291]
Arighi C, Liu H, Natale D, Barker W, Drabkin H, Hu Z, Blake J, Smith B, Wu C. TGF-beta signaling
proteins and the protein ontology. BMC Bioinformatics. 2009; 10(Suppl. 5) Art. No. S3.
Armstrong, DM. Universals and Scientific Realism. Nominalism and Realism (Vol. 1). A Theory of
Universals (Vol. 2). Cambridge University Press; Cambridge: 1978.
Armstrong DM. Against ostrich nominalism: a reply to Michael Devitt. Pacific Philosophical
Quarterly. 1980; 61:441.
Armstrong DM. In defence of structural universals. Australasian Journal of Philosophy. 1986; 64(1):
85-88.
Armstrong, DM. Universals: An Opinionated Introduction. Westview Press; Boulder, CO: 1989.
Armstrong, DM. Universals as attributes. In: Loux, MJ., editor. Metaphysics: Contemporary Readings.
2nd edn.. Routledge; New York: 2008.
Batchelor, C.; Bittner, T.; Eilbeck, K.; Mungall, C.; Richardson, J.; Knight, R.; Stombaugh, J.; Zirbel,
CL.; Westhof, E.; Leontis, NB. The RNA Ontology (RNAO): an ontology for integrating RNA
sequence and structure data; Proceedings of the International Conference on Biomedical
Ontologies; Buffalo, NY: University at Buffalo. 2009; p. 7-10.
Bittner T, Donnelly M. Logical properties of foundational relations in bio-ontologies. Artificial
Intelligence in Medicine. 2007; 39:197-216. [PubMed: 17428644]
Bittner, T.; Smith, B. Vagueness and granular partitions. In: Welty, C.; Smith, B., editors. Formal
Ontology and Information Systems. ACM Press; New York: 2001. p. 309-321.
Bodenreider, O. Yearbook of Medical Informatics. Schattauer; Stuttgart: 2008. Biomedical ontologies
in action: role in knowledge management, data integration and decision support; p. 67-79.
Bodenreider, O.; Smith, B.; Burgun, A. The ontology-epistemology divide: a case study in medical
terminology. In: Varzi, A.; Vieu, L., editors. Formal Ontology and Information Systems;
Proceedings of the Third International Conference (FOIS 2004); Amsterdam: IOS Press. 2004; p.
185-195.
Bourget, D.; Chalmers, D. The PhilPapers surveys: results, analysis and discussion. 2009. Available at:
http://philpapers.org/surveys/
Brown, DE. Human Universals. McGraw-Hill; New York: 1991.
Buszkowski, W.; Marciszewski, W.; Benthem, JV., editors. Categorial Grammar. John Benjamins;
Amsterdam: 1988.
Carnap, R. Der logische Aufbau der Welt. Felix Meiner; Leipzig: 1928. English translation by R.A.
George, The Logical Structure of the World. Pseudoproblems in Philosophy. University of
California Press, 1967
Carnap R. Empiricism, semantics, and ontology. Revue Internationale de Philosophie. 1950; 4:20-40.
Cavalli-Sforza LL. Genes, peoples, and languages. Proceedings of the National Academy of Sciences.
1997; 94:7719-7724.
Ceusters, W. Dealing with mistakes in a referent tracking system. In: Hornsby, KS., editor.
Proceedings of Ontology for the Intelligence Community (OIC). Columbia, MA: November 28-29,
2007; p. 5-8.
Ceusters W. Applying evolutionary terminology auditing to the Gene Ontology. Journal of Biomedical
Informatics. 2009; 42(3):518-529. [PubMed: 19162233]
Ceusters, W.; Capolupo, M.; Devlies, J. D4.2 RAPS Domain Ontology (M12 Version). Background
materials and methodology used to develop the domain ontology for risks against patient safety.
2009a. Available at: http://www.referent-tracking.com/RTU/sendfile/?file=ReMINE-D4-2.pdf
Ceusters, W.; Capolupo, M.; Devlies, J. D4.3 RAPS Application ontology (Version 1). Background
materials and methodology used to develop application ontologies for risks against patient safety.
2009b. Available at: http://www.referent-tracking.com/RTU/sendfile/?file=ReMINE-D4-3.pdf
Haendel, M.; Neuhaus, F.; Sutherland, D.; Mejino, JLE., Jr.; Mungall, C.; Smith, B. CARO: The
Common Anatomy Reference Ontology. In: Burger, A.; Davidson, D.; Baldock, R., editors.
Anatomy Ontologies for Bioinformatics: Principles and Practice. Springer; New York: 2008. p.
327-349.
Hill DP, Blake JA, Richardson JE, Ringwald M. Extension and integration of the Gene Ontology
(GO): combining GO vocabularies with external vocabularies. Genome Research. 2002; 12(12):
1982-1991. [PubMed: 12466303]
Hill DP, Smith B, McAndrews-Hill MS, Blake JA. Gene Ontology annotations: what they mean and
where they come from. BMC Bioinformatics. 2008; 9(Suppl. 5):S2. [PubMed: 18460184]
Hogan, WR. What's in an 'is a' link?; Proceedings of the First International Conference on Biomedical
Ontology; Buffalo. 2009; p. 170
Hogan WR. Why the Unified Medical Language System is not an ontology, MS. 2010
Hogan WR. Towards an ontological theory of substance intolerance and hypersensitivity. Journal of
Biomedical Informatics. 2010 to appear.
Holenstein, E. Roman Jakobson's Approach to Language. Indiana University Press; Bloomington, IN:
1976.
Husserl, E. Logische Untersuchungen. 2nd edn.. Niemeyer; Halle: 1913/21, 1970. English translation
as Logical Investigations, by J.N. Findlay. London: Routledge and Kegan Paul
ISO. Terminology-Vocabulary (ISO 1087: 1990). International Standards Organization; Geneva: 1990.
ISO. Text for FDIS 704. Terminology work: principles and methods (ISO/IEC JTC1 SC36 N0579:
1999). International Standards Organization; Geneva: 1999.
Johansson, I. Pattern as an ontological category. In: Guarino, N., editor. Formal Ontology in
Information Systems. IOS Press; Amsterdam: 1998. p. 86-94.
Kroon FW. Was Meinong only pretending? Philosophy and Phenomenological Research. 1992; 52(3):
499-527.
Kuhn, TS. The Structure of Scientific Revolutions. The University of Chicago Press; Chicago, IL:
1970.
Kuroda S-Y. A second look at Marty, Husserl, and Chomsky: the significance of the revolution in
linguistics. Tohoku Daigaku Kenkyu Nenpo. 1997; 47:137.
Lenat D. CYC: a large-scale investment in knowledge infrastructure. Communications of the ACM
Archive. 1995; 38(11):33-38.
Lewis, D. On the Plurality of Worlds. Blackwell; Oxford: 1986.
McCarthy, J.; Hayes, PJ. Some philosophical problems from the standpoint of artificial intelligence.
In: Meltzer, B.; Michie, D., editors. Machine Intelligence. Vol. 4. Edinburgh University Press;
Edinburgh: 1969. p. 463-502.
Masci AM, Arighi CN, Diehl AD, Lieberman AE, Mungall C, Scheuermann RH, Smith B, Cowell LG.
An improved ontological representation of dendritic cells as a paradigm for all cell types. BMC
Bioinformatics. 2009; 10:70. [PubMed: 19243617]
Masolo, C.; Borgo, S.; Gangemi, A.; Guarino, N.; Oltramari, A. WonderWeb Deliverable D18:
Ontology Library (Final). 2002. Available at:
http://wonderweb.semanticweb.org/deliverables/documents/D18.pdf
de Matos P, Alcántara R, Dekker A, Ennis M, Hastings J, Haug K, Spiteri I, Turner S, Steinbeck C.
Chemical entities of biological interest: an update. Nucleic Acids Research. 2010; 38:D249-D254.
[PubMed: 19854951]
Merrill, GH. Engineering a development platform for ontology-enhanced knowledge applications. In:
Sharman, R., et al., editors. Ontologies. A Handbook of Principles, Concepts and Applications in
Information Systems. Springer; New York: 2007. p. 777-822.
Merrill GH. Concepts and synonymy in the UMLS Metathesaurus. Journal of Biomedical Discovery
and Collaboration. 2009; 4(7):137. [PubMed: 19126221]
Merrill GH. Ontological realism: methodology of misdirection? Applied Ontology. 2010; 5:79-108.
Morris C. On the history of the International Encyclopedia of Unified Science. Synthese. 1960;
12:517-521.
Mungall CJ. Obol: Integrating language and meaning in bio-ontologies. Comparative and Functional
Genomics. 2004; 5:509-520. [PubMed: 18629143]
Mungall CJ, Batchelor C, Eilbeck K. Evolution of the Sequence Ontology terms and relationships.
Journal of Biomedical Informatics. 2010 to appear.
Neuhaus, F.; Grenon, P.; Smith, B. In: Varzi, A.; Vieu, L., editors. A formal theory of substances,
qualities, and universals; Formal Ontology in Information Systems: Proceedings of the Third
International Conference (FOIS 2004); Amsterdam: IOS Press. 2004; p. 49-59.
Niles, I.; Pease, A. In: Welty, C.; Smith, B., editors. Towards a standard upper ontology; Proceedings
of the 2nd International Conference on Formal Ontology in Information Systems (FOIS);
Amsterdam: ACM Press. 2001; p. 2-9.
Noy, NF.; McGuinness, DL. Ontology development 101: a guide to creating your first ontology,
Technical report. 2001. Available at:
http://protege.stanford.edu/publications/ontology_development/ontology101-noy-mcguinness.html
Pinker, S. The Blank Slate. Viking Press; New York: 2002.
Ramsey, FP. Foundations. Mellor, DH., editor. Routledge; London: 1978.
Quine, WVO. From a Logical Point of View. Harvard University Press; Cambridge: 1953. On what
there is.
Rector, AL. Proceedings of K-CAP. ACM Press; New York: 2003. Modularisation of domain
ontologies implemented in description logics and related formalisms including OWL; p. 121-128.
Rector AL, Nowlan WA. The GALEN project. Computer Methods and Programs in Biomedicine.
1994; 45(1,2):75-78. [PubMed: 7889770]
Rosse C, Mejino JLE Jr. A reference ontology for biomedical informatics: the Foundational Model of
Anatomy. Journal of Biomedical Informatics. 2003; 36:478-500. [PubMed: 14759820]
Rosse, C.; Mejino, JLE, Jr.. The Foundational Model of Anatomy Ontology. In: Burger, A.; Davidson,
D.; Baldock, R., editors. Anatomy Ontologies for Bioinformatics: Principles and Practice.
Springer; London: 2007. p. 59-117.
Scheuermann, RH.; Ceusters, W.; Smith, B. Toward an ontological treatment of disease and diagnosis;
Proceedings of the 2009 AMIA Summit on Translational Bioinformatics; Washington, DC:
AMIA. 2009; p. 116-120.
Schulz S, Suntisrivaraporn B, Baader F, Boeker M. SNOMED reaching its adolescence: ontologists'
and logicians' health check. International Journal of Medical Informatics. 2009; 78(Suppl. 1):S86-S94.
[PubMed: 18789754]
Slater BH. Internal and external negations. Mind. 1979; 38(1):588-591.
Smith, B. Husserl, language and the ontology of the act. In: Buzzetti, D.; Ferriani, M., editors.
Speculative Grammar, Universal Grammar, and Philosophical Analysis of Language. John
Benjamins; Amsterdam: 1987. p. 205-227.
Smith, B. Towards a history of speech act theory. In: Burkhardt, A., editor. Speech Acts, Meanings
and Intentions. Critical Approaches to the Philosophy of John R. Searle. de Gruyter; Berlin/New
York: 1990. p. 29-61.
Smith, B. Beyond concepts: Ontology as reality representation; Proceedings of the Third International
Conference on Formal Ontology in Information Systems (FOIS 2004); Amsterdam: IOS Press.
2004; p. 73-84.
Smith, B. In: Reicher, ME.; Marek, JC., editors. Against fantology; Experience and Analysis: Papers
of the 27th International Wittgenstein Symposium; Vienna: The Austrian Ludwig Wittgenstein
Society. 2005; p. 153-170.
Smith B. From concepts to clinical reality: an essay on the benchmarking of biomedical terminologies.
Journal of Biomedical Informatics. 2006a; 39:288-298. [PubMed: 16293444]
Smith, B. Against idiosyncrasy in ontology development. In: Bennett, B.; Fellbaum, C., editors.
Formal Ontology in Information Systems; Proceedings of the Fourth International Conference;
Amsterdam: IOS Press. 2006b; p. 15-26.
Smith, B. Ontology (science). In: Eschenbach, C.; Gruninger, M., editors. Formal Ontology in
Information Systems; Proceedings of the Fifth International Conference; Amsterdam: IOS Press.
2008; p. 21-35.
Fig. 1.
Proposed Elementary Particle Ontology according to the Standard Model (Anno 2009).
Fig. 2.
The structure of the OBO Foundry (shaded regions correspond to the three original GO
ontologies).
Fig. 3.
Table 1
Primitive classes: Cell; Male gamete; Female gamete; Whole organism; Organized unity; Genetic property
Primitive relations: Part of; Earlier than; Environment of