Cladistics - The Theory and Practice of Parsimony Analysis

THE SYSTEMATICS ASSOCIATION PUBLICATION NO. 11
OXFORD UNIVERSITY PRESS
Cladistics
Second Edition
The Theory and Practice of
Parsimony Analysis
Ian J. Kitching, Peter L. Forey, Christopher J. Humphries,

and David M. Williams
Oxford New York Tokyo

OXFORD UNIVERSITY PRESS
1998
Oxford University Press, Great Clarendon Street, Oxford OX2 6DP
Oxford New York
Athens Auckland Bangkok Bogota Bombay
Buenos Aires Calcutta Cape Town Dar es Salaam
Delhi Florence Hong Kong Istanbul Karachi
Kuala Lumpur Madras Madrid Melbourne
Mexico City Nairobi Paris Singapore
Taipei Tokyo Toronto Warsaw
and associated companies in
Berlin Ibadan
Oxford is a trade mark of Oxford University Press

Published in the United States
by Oxford University Press Inc., New York
© The Systematics Association, 1998

All rights reserved. No part of this publication may be reproduced,
stored in a retrieval system, or transmitted, in any form or by any
means, without the prior permission in writing of Oxford University
Press. Within the UK, exceptions are allowed in respect of any fair
dealing for the purpose of research or private study, or criticism or
review, as permitted under the Copyright, Designs and Patents Act, 1988,
or in the case of reprographic reproduction in accordance with the terms
of licences issued by the Copyright Licensing Agency. Enquiries
concerning reproduction outside those terms and in other countries
should be sent to the Rights Department, Oxford University Press, at the
address above.

This book is sold subject to the condition that it shall not, by way of
trade or otherwise, be lent, re-sold, hired out, or otherwise circulated
without the publisher's prior consent in any form of binding or cover
other than that in which it is published and without a similar condition
including this condition being imposed on the subsequent purchaser.

A catalogue record for this book is available from the British Library

Library of Congress Cataloging in Publication Data
Cladistics: the theory and practice of parsimony analysis / Ian J.
Kitching ... [et al.]. - 2nd ed.
(Systematics Association publications; 11)
Includes bibliographical references and index.
1. Cladistic analysis. I. Kitching, Ian J. II. Series.
QH83.C486 1998 570'.1'2 - dc21 98-10352
ISBN 0 19 850139 0 (Hbk)
ISBN 0 19 850138 2 (Pbk)

Typeset by Technical Typesetting Ireland, Belfast
Printed in Great Britain by Biddles Ltd., Guildford & King's Lynn

This book is dedicated to the memory of Colin
Patterson who shared with us his deep insight into
the theory and practice of systematics through
discussion, argument, writings and above all,
friendship.

Preface
The predecessor of this book originated as course material for a workshop
sponsored by the Systematics Association in 1991. At that time, the Keepers
of The Natural History Museum Botany and Entomology Departments, Dr
Stephen Blackmore and Prof. Laurence Mound respectively, realized that
there would be a broader demand for this information and encouraged its
publication. The Systematics Association agreed to issue the manual as part
of their Systematics Association Publications series. The workshop proved to
be highly successful and was run for the next five years, during which time it
was attended by over 100 students, together with 40 staff members of The
Natural History Museum. In a modified format, the course was also run
twice at the University of Verona, Italy, and once at the University of
Massachusetts, Amherst. It also formed the basis of a module of The Natural
History Museum/Imperial College of Science, Technology and Medicine
M.Sc. course, Advanced Methods in Taxonomy and Biodiversity. As a
publication, Cladistics: a practical course in systematics proved even more
successful. The original 1992 hardback run sold out rapidly and the book was
subsequently reprinted three times in softback format (1993, 1996, 1997).

Over this period there have been many developments in cladistic theory
and practice. As our knowledge has advanced, some areas have become less
significant, such as the a priori determination of character polarity. Others,
such as the phenetic methods used to analyse large molecular data sets, have
been superseded by efficient and fast algorithms for parsimony analysis. Yet
other areas grew and gained in importance. Three-item statements analysis
rated less than a page in the original manual but is now perhaps the most
contentious area in systematics, generating new ideas and forcing the critical
re-appraisal of much cherished ideas. The whole field of tree support and
confidence statistics had barely begun in 1992 but has now produced a wide
variety of measures. Finally, the original manual was criticized by some as
disjointed, appearing to be no more than a series of lecture notes put
together in a single cover. If we are honest, that is exactly what it was.

When we decided to prepare a second edition, we resolved to address these
issues, yet still produce a book of similar size to the original. Thus, we were
faced with some hard choices. First, we decided that this was to be a book
about cladistics. We therefore excluded those areas that we did not consider
to be part of cladistics, such as phenetic and maximum likelihood techniques.
There are other books dedicated to both topics that the reader can consult if
so inclined. We also removed the chapters on cladistic biogeography and
converting cladograms into formal classifications, again because specialist
texts are available. We also wanted to stress a unified approach to the
assembly and analysis of data, irrespective of its source. Hence there are no
chapters dealing expressly with fossils or with molecular sequence data.

Instead, we have organized the book into nine chapters, beginning with a
discussion of basic principles and concepts. The next six chapters follow the
sequence of events in a cladistic analysis. Chapter 2 concerns characters and
coding, that is, how we proceed from observations of organisms to an
alphanumeric data matrix. Chapter 3 deals with cladogram construction from
the data matrix and cladogram rooting, together with the related topic of
polarity determination. Character optimization and the effects of missing
data are considered in Chapter 4. Chapter 5 addresses character fit and
weighting, while Chapter 6 provides the first comprehensive overview of
cladogram support and confidence statistics. Chapter 7 discusses methods of
consensus analysis. The pros and cons of simultaneous versus partitioned
analysis form the subject of Chapter 8, while Chapter 9 discusses three-item
statements analysis, a recently developed method that unfortunately has been
the subject of much partisan and opaque writing. Finally, in response to
numerous requests from students, we have included a glossary.

This book is the collective responsibility of all four authors. However,
preparation of the first drafts of the text was undertaken as follows. Peter
Forey wrote the sections on basic concepts, missing values and simultaneous
versus partitioned analysis. Chris Humphries wrote the sections on
characters, coding and consensus trees. Ian Kitching wrote the sections on
cladogram construction, character polarity and rooting, optimization,
confidence and support statistics and the glossary. David Williams wrote the
sections on basic measures of fit, weighting and three-item statements
analysis. Peter Forey prepared the figures, with considerable and capable
assistance from his daughter, Kim. Finally, Ian Kitching undertook the role of
editor, with the unenviable job of trying to marry the various sections into a
single coherent and seamless whole. Any trivial mistakes that remain can be
laid at his door but more fundamental disagreements should be taken up with
all four of us. The choice of weapons will be ours.

Many individuals contributed greatly to this book. In particular, we would
like to thank Gary Nelson, Mark Siddall, Karen Sidwell, Darrell Siebert and
Dick Vane-Wright, who critically read through parts of the manuscript, and
particularly Andrew Smith and an anonymous reviewer, who read it all. We
also thank Andrew Smith for permission to use his illustrations of the PTP
test, Bremer support and the bootstrap, and Springer International
for permission to reproduce the illustration of the 5S rRNA molecule of
Pedinomonas minor. We also thank, most wholeheartedly, the Systematics
Association for their continued support of this project.

I.J.K.
P.L.F.
C.J.H.
D.M.W.

Contents

List of authors

1. Introduction to cladistic concepts
1.1 Definition of relationship
1.2 Types of characters
1.3 Parsimony
1.4 Groups
1.5 Cladograms and trees
1.6 Tree terminology
1.7 Chapter summary

2. Characters and character coding
2.1 Introduction
2.1.1 Filters
2.2 Kinds of characters
2.2.1 Qualitative and quantitative variables
2.2.2 Discrete and continuous variables
2.2.3 Overlapping and non-overlapping characters
2.3 Cladistic characters
2.3.1 Diagnostic and systematic characters
2.3.2 Character transformations
2.3.3 Characters and character states
2.3.4 Homology
2.4 Character coding for discrete variables
2.4.1 Multistate characters - character linkage during analysis
2.4.2 Binary characters - character linkage during analysis
2.4.3 Hierarchical character linkage
2.4.4 Transformation between character states: order and polarity
2.4.5 Missing values and coding
2.4.6 Information content and the congruence test
2.5 Morphometric data in cladistic analysis
2.5.1 Coding morphometric data
2.5.2 Gap-weighting
2.6 Discussion: character discovery and coding
2.6.1 Choice of characters
2.6.2 Coding
2.7 Chapter summary

3. Cladogram construction, character polarity and rooting
3.1 Discovering the most parsimonious cladograms
3.1.1 Hennigian argumentation
3.1.2 Exact methods
3.1.3 Heuristic methods
  Stepwise addition
  Branch-swapping
3.2 Character polarity and rooting
3.2.1 Outgroup comparison
3.2.2 The ontogenetic criterion
3.2.3 Ontogenetic criterion or outgroup comparison - which is superior?
3.2.4 A priori models of character state change
  Ingroup commonality
  Stratigraphy
  Biogeography
  Function/adaptive value
  Underlying synapomorphy
3.2.5 Polarity and rooting a posteriori
3.3 Chapter summary

4. Optimization and the effects of missing values
4.1 Optimality criteria and character optimization
4.1.1 Wagner optimization
4.1.2 Fitch optimization
4.1.3 Dollo optimization
4.1.4 Camin-Sokal optimization
4.1.5 Generalized optimization
4.2 Missing values
4.3 Chapter summary

5. Measures of character fit and character weighting
5.1 Measures of character fit
5.1.1 Cladogram length
5.1.2 Consistency index (ci)
5.1.3 Ensemble consistency index (CI)
5.1.4 Problems with the consistency index
5.1.5 Retention index (ri)
5.1.6 Ensemble retention index (RI)
5.2 Character weighting
5.2.1 Types of character weighting
5.2.2 A priori weighting
  Character analysis
  Morphological data
  Molecular data
  Compatibility analysis
5.2.3 A posteriori weighting
  Cladistic consistency
  Successive weighting
  Implied weighting
5.2.4 Prospects

6. Support and confidence statistics for cladograms and groups
6.1 Introduction
6.2 Randomization procedures applied to the whole cladogram
6.2.1 Data decisiveness
6.2.2 Distribution of cladogram lengths (DCL)
6.2.3 Permutation tail probability (PTP)
6.3 Support for individual clades on a cladogram
6.3.1 Bremer support
6.3.2 Randomization procedures
  Bootstrap
  Jackknife
  Clade stability index
  Topology-dependent permutation tail probability (T-PTP)
6.4 Summary
6.5 Chapter summary

7. Consensus trees
7.1 Introduction
7.2 Strict consensus trees
7.3 Combinable components or semi-strict consensus
7.4 Majority-rule and median consensus trees
7.5 Nelson consensus
7.6 Adams consensus
7.7 Agreement subtrees or common pruned trees
7.8 Conclusions
7.9 Chapter summary

8. Simultaneous and partitioned analysis
8.1 Introduction
8.2 Theoretical issues
8.3 Partitioned analysis (taxonomic congruence)
8.3.1 Independence of data sets
8.3.2 Cladogram support
8.3.3 Different sized data sets
8.4 Simultaneous analysis
8.4.1 Resolution
8.4.2 Arbitrary consensus
8.5 Conditional data combination
8.6 Operational difficulties
8.7 Conclusions

9. Three-item statements analysis
9.1 Introduction
9.2 Coding
9.3 Implementation
9.3.1 Binary characters
9.3.2 Multistate characters
9.3.3 Representation of three-item statements for analysis with current parsimony programs
9.3.4 Cladogram length and three-item statements
9.3.5 Uniform and fractional weighting
9.3.6 Minimal cladograms
9.3.7 Optimization
9.3.8 Information measures: CI and RI
9.3.9 Summary of implementation procedures
9.4 Precision
9.5 Chapter summary

References
Suggestions for further reading
Glossary
Appendix: Computer programs
Index

Authors
Peter L. Forey
Department of Palaeontology, The Natural History Museum, London
Christopher J. Humphries
Department of Botany, The Natural History Museum, London
Ian J. Kitching
Department of Entomology, The Natural History Museum, London
David M. Williams
Department of Botany, The Natural History Museum, London

1. Introduction to cladistic concepts
1.1 DEFINITION OF RELATIONSHIP
Cladistics is a method of classification that groups taxa hierarchically into
discrete sets and subsets. Cladistics can be used to organize any comparative
data (e.g. linguistics) but its greatest application has been in the field of
biological systematics. Cladistic methods were made explicit by the German
entomologist, Willi Hennig (1950), and became widely known to English
speakers in 1965 and 1966 under the name 'phylogenetic systematics'. Hennig
wanted a method for implementing Darwin's concepts of ancestors and
descendants. Hennig explained his ideas within an evolutionary framework;
he wrote about species, speciation and the transformation of morphology
through the process of evolution. In this introductory chapter, we will begin
with Hennig's explanations but then slowly move towards the modern
cladistic view, which dispenses with the need to rely on any particular theory
of evolution for the analysis of systematic problems.

Hennig's most important contributions were to offer a precise definition of
biological relationship and then to suggest how that relationship might be
discovered. Formerly, the meaning of biological relationship was defined only
vaguely. For example, a crab and lobster might be considered related because
they show a high degree of overall similarity. Alternatively, these two animals
might have been thought close relatives because it was easy to imagine how
structures in one had transformed into structures in the other, perhaps
through the link of a common ancestor. These definitions of relationship
were absolute (X is related to Y).

Hennig's concept of relationship is relative and is illustrated in Fig. 1.1.
Considering three taxa, the salmon and the lizard are more closely related to
each other than either is to the shark. This is so because the salmon and the
lizard share a common ancestor, 'x' (which lived at time t2), that is not shared
with the shark or any other taxon. Similarly, the shark is more closely related
to the group salmon + lizard than it is to the lamprey because the shark,
salmon and lizard together share a unique common ancestor, 'y', which lived
at an earlier time (t1). The salmon and lizard are called sister-groups, while
the shark is the sister-group of the combined group salmon + lizard.
Similarly, the lamprey is the sister-group of shark + salmon + lizard. The aim
of cladistic analysis is to hypothesize the sister-group hierarchy and express
the results in terms of branching diagrams. These diagrams are called
cladograms, a reference to the fact that they purport to express genealogical
units or clades. The aim of cladistics is to establish sister-group relationships,
and the concept of two taxa being more closely related to each other than
either is to a third (the three-taxon statement) is fundamental to cladistics.

Fig. 1.1 Hennig's concept of relationship. For example, the lizard and the
salmon are considered to be more closely related to each other than either is
to the shark because they share a common ancestor, 'x' (which lived at time
t2), that is not shared with the shark or any other taxon.

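
The relative notion of relationship can be made concrete with a small sketch. The Python below is illustrative only: the nested-tuple encoding of the cladogram in Fig. 1.1 and the helper names (`leaves`, `smallest_clade`, `closer`) are assumptions of this example rather than part of any cladistics program. It tests a three-taxon statement by asking whether the smallest clade containing two taxa excludes the third.

```python
# Hypothetical encoding of the cladogram in Fig. 1.1 as nested tuples.
TREE = ("lamprey", ("shark", ("salmon", "lizard")))


def leaves(node):
    """Return the set of terminal taxa below a node."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def smallest_clade(node, taxa):
    """Return the leaves of the smallest clade that contains all of `taxa`."""
    if not isinstance(node, str):
        for child in node:
            if taxa <= leaves(child):
                return smallest_clade(child, taxa)
    return leaves(node)


def closer(tree, a, b, c):
    """Three-taxon statement: are a and b closer to each other than to c?"""
    return c not in smallest_clade(tree, {a, b})


print(closer(TREE, "salmon", "lizard", "shark"))   # salmon + lizard exclude the shark
print(closer(TREE, "shark", "salmon", "lizard"))   # shark + salmon do not exclude the lizard
```

The smallest clade containing two taxa corresponds to the descendants of their most recent common ancestor ('x' or 'y' in the text), which is why excluding the third taxon is equivalent to the statement of relative relationship.
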
Sister-groups are hypothesized through the analysis of characters. These
characters may be morphological, physiological, behavioural, ecological or
molecular. The only requirement for cladistic analysis is that we translate
what we observe into discrete characters, that is, a particular character is
either present or absent; it is one colour or another; it is small or large; and
so on.

1.2 TYPES OF CHARACTERS
Hennig made a distinction between two types of characters, or character
states, and this distinction depended on where they occurred in the inferred
phylogenetic history of a group. The character or state that occurs in the
ancestral morphotype he called 'plesiomorphic' (near to the ancestral
morphology) and the derived character or state he called 'apomorphic' (away
from the ancestral morphology). The relation between character and
character state is considered further in Chapter 2, where it will be seen that
there are many methods for coding observations. Here, it is only necessary to
emphasize that the terms apomorphic and plesiomorphic are relative terms,
relative to a particular systematic problem. In Fig. 1.2a, character state 'a' is
plesiomorphic and 'a′' is apomorphic. State 'a' is presumed to have been
present in the ancestral morphotype that gave rise to taxa B and C. In
Fig. 1.2b, 'a′' is apomorphic with respect to 'a' but plesiomorphic with respect
to 'a″'.

Fig. 1.2 Plesiomorphy and apomorphy are relative terms. (a) Character state
'a' is plesiomorphic and 'a′' is apomorphic. State 'a' is presumed to have been
present in the ancestral morphotype that gave rise to taxa B and C. (b)
Character state 'a′' is apomorphic with respect to 'a' but plesiomorphic with
respect to 'a″'.

Sister-groups are discovered by identifying apomorphic characters inferred
to have originated in their most recent common ancestor and shared by its
descendants. These shared apomorphies, or synapomorphies, can be thought
of as evolutionary homologies, that is, as structures inherited from the most
recent common ancestor. In Fig. 1.3, characters 3 and 4 are synapomorphies
that suggest that the lizard and the salmon shared a unique common ancestor
'x'. The cladogram implies that characters 3 and 4 arose in ancestor 'x' and
were then inherited by both the salmon and the lizard. In contrast, the shared
possession of characters 1 and 2 by the salmon and lizard does not imply that
they share a unique common ancestor because these attributes are also found
in the shark. Such shared primitive characters (symplesiomorphies) are
characters inherited from a more remote ancestor than the most recent
common ancestor and are thus irrelevant to the problem of the relationship of
the lizard and the salmon. However, with respect to the more inclusive
three-taxon problem comprising the shark, salmon, and lizard, characters 1
and 2 are relevant. At this level, they are synapomorphies, suggesting that
these three taxa form a group with a common ancestry at 'y'.

Alternatively, we may approach this problem by looking for the groups that
are specified by different characters. In Fig. 1.3, given four taxa of unknown
interrelationships, characters 1 and 2 suggest the group shark + salmon +
lizard while characters 3 and 4 suggest the group salmon + lizard.
Furthermore, these four characters suggest two nested groups, one more
inclusive than the other (Shark (Salmon, Lizard)).

Here we recognize that synapomorphy and symplesiomorphy describe the
status of characters relative to a particular problem. For example, characters
3 and 4 are synapomorphies when we are interested in the relationships of
the salmon, lizard, and shark, but symplesiomorphies if the problem involves
the relationships of different species of lizards or different species of salmon.

[Figure 1.3 labels: Vertebrata; Gnathostomata; Pisces; Osteichthyes;
Tetrapoda. Terminal taxa: lamprey (D), shark (A), salmon (B), lizard (C).]
Fig. 1.3 Cladogram for the lamprey, shark, salmon and lizard. Monophyletic
groups are established on the basis of synapomorphies (characters 1-4), while
autapomorphies (characters 5-12) define terminal taxa. Character 13 conflicts
with this hypothesis of relationships, suggesting instead a relationship
between the shark and salmon. See Fig. 1.4 and text for further explanation.

Hennig recognized a third character type, which comprises those characters
that are unique to one species or one group, such as characters 5-9 in the
lizard, 10 in the salmon, 11 in the shark and 12 in the lamprey. These he
called autapomorphies, which in Fig. 1.3 define the terminal taxa A-D.
Autapomorphies can be thought of as the fingerprints of the terminal taxa.
The characters used to determine relationships are apomorphic characters
or character-states and this implies acceptance of a theory of transformation
(absence → presence, or condition a → a′). Hennig (1966) believed that there
were several criteria by which we could recognize plesiomorphic and
apomorphic states even before we started any cladistic analysis. The two most
frequently employed today are the ontogenetic criterion and outgroup
comparison. These are considered at length in Chapter 3, where we will also
learn that the distinction between plesiomorphic and apomorphic resides in
rooting the cladogram, that is, in the choice of the taxon that is to be the
starting point for our theories of character evolution.

1.3 PARSIMONY
Cladistic analysis orders synapomorphies into a nested hierarchy by choosing
the arrangement of taxa that accounts for the greatest number of characters
in the simplest way. For instance, Fig. 1.3 accounts for most of the characters
by assuming that each appeared only once in history, has been retained by all
descendants and has never been lost. But usually the data will not all suggest
the same groupings. For example, in Fig. 1.3, character 13 (fin rays) is
uniquely shared by the shark and the salmon and suggests the grouping
shark + salmon. In fact, this is a traditionally recognized formal Linnaean
taxon, Pisces (fishes). Why therefore should we favour a different grouping?
Consider two alternative solutions shown in Fig. 1.4. The cladogram shown in
Fig. 1.4a recognizes the group Pisces (shark + salmon) based on one
synapomorphy, character 13. In contrast, the cladogram in Fig. 1.4b
recognizes the group Osteichthyes (salmon + lizard) based on two
synapomorphies (characters 3 and 4). The second cladogram accounts for
more characters as having arisen only once. If there are alternative solutions,
then we would choose the simplest or most parsimonious pattern of character
distribution. Parsimony is the universal criterion for choosing between
alternative hypotheses of character distribution just as it is a universal
criterion for choosing between any competing scientific hypotheses. It needs
to be pointed out here that parsimony is simply the most robust criterion for
choosing between solutions. It is not a statement about how evolution may or
may not have taken place.

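
The comparison between the two hypotheses can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the nested-tuple topologies are read from the description of Fig. 1.4 (with the lizard placed as sister to shark + salmon in the first tree), only characters 1-4 and 13 are included, and the names `CHARACTERS`, `clades` and `single_origin` are invented for this example. A character is counted as having arisen only once when the set of taxa possessing it is exactly a clade of the tree.

```python
# Taxa possessing each character, as described for Figs 1.3 and 1.4.
CHARACTERS = {
    1: {"shark", "salmon", "lizard"},
    2: {"shark", "salmon", "lizard"},
    3: {"salmon", "lizard"},
    4: {"salmon", "lizard"},
    13: {"shark", "salmon"},
}

# Assumed encodings of the two competing cladograms.
FIG_1_4A = ("lamprey", ("lizard", ("shark", "salmon")))  # recognizes Pisces
FIG_1_4B = ("lamprey", ("shark", ("salmon", "lizard")))  # recognizes Osteichthyes


def clades(node):
    """Return (leaf set of node, leaf sets of every clade inside it)."""
    if isinstance(node, str):
        return frozenset([node]), []
    groups, inner = [], []
    for child in node:
        leaf_set, sub = clades(child)
        groups.append(leaf_set)
        inner.extend(sub)
    here = frozenset().union(*groups)
    return here, inner + [here]


def single_origin(tree):
    """Characters whose distribution matches a clade, so one origin suffices."""
    _, groups = clades(tree)
    return sorted(c for c, taxa in CHARACTERS.items() if frozenset(taxa) in groups)


print(single_origin(FIG_1_4A))  # characters explained once by the Pisces tree
print(single_origin(FIG_1_4B))  # characters explained once by the Osteichthyes tree
```

Under these assumptions the first tree explains characters 1, 2 and 13 with a single origin each, whereas the second explains characters 1, 2, 3 and 4, which is the sense in which it accounts for more characters as having arisen only once.
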
Parsimony is fundamental to cladistic analysis and may be explained in a
slightly different way. Suppose we have six characters distributed among four
taxa as shown in Fig. 1.5a. Taxon A has none of the characters but the other
three taxa each have a different complement. Characters 2 and 4 are
autapomorphies, since they are each present in only one of the taxa. They are
uninformative for grouping taxa (they serve only to diagnose these terminal
taxa). Characters 1, 3, 5 and 6 are potentially useful because they are present
in more than one taxon. Given the three taxa that have potentially
informative characters, there are three ways in which we could arrange these
taxa dichotomously (Fig. 1.5b-d).

Fig. 1.4 Using parsimony to choose between two competing hypotheses of
relationship. (a) The shark and the salmon form a monophyletic group based
upon shared possession of fin rays (character 13). However, this topology
requires us to hypothesize that characters 3 and 4 each arose independently in
the salmon and the lizard. (b) Alternatively, the salmon and the lizard form a
monophyletic group based upon characters 3 and 4, with character 13 now
being considered homoplastic. This cladogram is preferred to that in (a)
because it is more parsimonious. Character 13 may still be a synapomorphy
but at a more inclusive level (albeit with some homoplasy) and is shown
repositioned as such by the arrows.

If we now place each of the characters, according to the groups they
specify, on each of these possible cladograms (Fig. 1.5e-g), then we obtain
three different results. The cladogram in Fig. 1.5e shows that all but one of
the characters appears only once. However, in this solution, we must assume
that character 6 appears twice, once in taxon B and once in taxon C, which
are not sister-groups.

Fig. 1.5 Explanation of parsimony in terms of analysis of character
distributions. (a) A data matrix of six characters (1-6) distributed among
four taxa (A-D). Plesiomorphic states are indicated by open boxes,
apomorphic states by solid boxes. (b-d) The three possible resolutions of taxa
B-D relative to taxon A. (e) Placing the characters on the topology in (b)
requires seven steps. Characters 1-5 appear only once while character 6
appears twice. This is the optimal, most parsimonious solution. (f) Placing the
characters on the topology in (c) requires nine steps. Characters 1, 2 and 4
appear only once but characters 3, 5 and 6 all appear twice. This is a
suboptimal solution. (g) Placing the characters on the topology in (d)
requires eight steps. Characters 1, 2, 4 and 6 appear only once but characters
3 and 5 both appear twice. This is also a suboptimal solution.

We can do the same for the cladograms in Fig. 1.5f and Fig. 1.5g.
However, in these two cladograms, we must assume that two or more
characters appear more than once. Since the solution in Fig. 1.5e accounts for
the distribution of the characters in the most economical way, this is the
solution that we would prefer.

The distribution of characters can also be thought of as the number of
steps on a cladogram. In Fig. 1.5, the number of steps is counted as the
number of instances where a character is gained. In the cladogram in Fig.
1.5e, this is seven. The other cladograms (Figs 1.5f and 1.5g) are more costly,
requiring 9 and 8 steps respectively. The idea of steps is a little more subtle
because a single character may appear at one point on a cladogram and
disappear again at another point. For example, another explanation of the
distribution of character 6 on the cladogram in Fig. 1.5e is to assume that it
was gained by the group B + C + D and then lost again in D. Each change,
whether gain or loss, is considered to be a step. In this example, both
accounts of character change demand two steps. Cladists often speak of the
length of a cladogram and this is what they mean: the number of character
changes, irrespective of whether those changes are gains or losses. The
output from cladistic analyses using computer packages invariably gives
cladogram lengths, as well as other statistics (see Chapter 5). Cladists also
speak of the most parsimonious solution as being the optimal cladogram, and
the other cladograms (that is, those requiring more than the minimum
number of steps to explain the character distributions) as suboptimal.

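
The step counts quoted for Fig. 1.5 can be checked with a short sketch. The code below uses the bottom-up pass of Fitch's algorithm for unordered characters, a standard way of obtaining the minimum number of changes, rather than the hand placement of characters described in the text. The data matrix is reconstructed from the description of Fig. 1.5; which single taxon carries autapomorphies 2 and 4 is an assumption made here (it does not affect the lengths), as are the nested-tuple topologies and the names `MATRIX`, `TOPOLOGIES`, `fitch` and `length`.

```python
# Apomorphic characters held by each taxon. Characters 1, 3, 5 and 6 follow
# the text; assigning autapomorphy 2 to B and 4 to D is an assumption.
MATRIX = {
    "A": set(),
    "B": {1, 2, 6},
    "C": {1, 3, 5, 6},
    "D": {1, 3, 4, 5},
}

# The three resolutions of B-D relative to A (Fig. 1.5b-d), as nested tuples.
TOPOLOGIES = {
    "b": ("A", ("B", ("C", "D"))),
    "c": ("A", ("C", ("D", "B"))),
    "d": ("A", ("D", ("C", "B"))),
}


def fitch(node, char):
    """Return (possible states at node, minimum steps below node) for one character."""
    if isinstance(node, str):
        return {int(char in MATRIX[node])}, 0
    left_states, left_steps = fitch(node[0], char)
    right_states, right_steps = fitch(node[1], char)
    shared = left_states & right_states
    if shared:
        # The two descendants can agree, so no change is needed at this node.
        return shared, left_steps + right_steps
    # The descendants disagree: one change (gain or loss) is required.
    return left_states | right_states, left_steps + right_steps + 1


def length(tree):
    """Cladogram length: total minimum steps over characters 1-6."""
    return sum(fitch(tree, char)[1] for char in range(1, 7))


for name, tree in TOPOLOGIES.items():
    print(name, length(tree))
```

Because each change counts as one step whether it is a gain or a loss, the sketch does not distinguish two gains of character 6 from one gain followed by one loss; both cost two steps, as the text notes.
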
It is possible that there are two or more equally parsimonious solutions for
a set of characters. Then, we may prefer to accept one of the solutions based
on other criteria, such as a closer agreement with the stratigraphic record or
by differentially weighting one type of character change relative to another.
For certain applications, we may choose to combine those elements common
to the different solutions to make a consensus tree (see Chapters 7 and 8).

Any solution, or solutions, at which we arrive is a summation of the
relationships among characters. Just as taxa may be related to one another,
we can visualize characters as being related to one another. Fig. 1.6 illustrates
the several ways in which characters can be related. Suppose that we have a
character, 1, which is present in three taxa: A, Band C. This character can be
said to specify a group A + B + C. Suppose that we now discover a second
characler, 2, which specifies a group D + E. Since no common taxa are
involved the characters can be said to be consistent (strictly ' logically consis-
tent ') with one another. They can have no influence on one another because
they specify completely different groups. When we add a third character. 3,
that is present only in taxa Band C. this character specifies a subgroup of an
original larger group A + B + C. This character 3 is also said to be consistent
with character L Character 4 is found in taxa A, Band C and thus specifies
exactly the same group as character 1. Characters 1 and 4 are said to be
congruent with one another. Character 5 is found in taxon C and taxon O.
Character 5 specifies a group C + D, which is totally different from any group
specified by the other characters. In other words, the group that this
character specifies conflicts with those specified by the other characters.
Character 5 is said to be homoplastic.

Fig. 1.6 Relationships among characters. (a) A data matrix of five characters (1-5)
distributed among five taxa (A-E). Plesiomorphic states are indicated by open
boxes, apomorphic states by solid boxes. (b) Character 1 is shared by three taxa, A, B
and C, which form the initial group. (c) Character 2, present in taxa D and E,
specifies a different group from character 1. (d) Character 3, shared by taxa B and
C, specifies a subset of the initial group. (e) Character 4, present in taxa A, B and C,
specifies the same group as character 1. (f) Character 5, present in taxa C and D,
specifies a different group C + D that conflicts with the initial group. With regard to
character 1, characters 2 and 3 are consistent, character 4 is congruent, while
character 5 is in conflict. Key to shading: consistent; congruent; homoplastic or
conflicting.
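The three relationships described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each character is represented by the set of taxa showing its apomorphic state (taken from the distributions given in the text for Fig. 1.6), and the function name `relationship` is hypothetical rather than part of any cladistics program.

```python
# Taxa that show the apomorphic state of each character in Fig. 1.6.
characters = {
    1: {"A", "B", "C"},
    2: {"D", "E"},
    3: {"B", "C"},
    4: {"A", "B", "C"},
    5: {"C", "D"},
}


def relationship(group_x, group_y):
    """Classify two characters by comparing the groups they specify."""
    if group_x == group_y:
        return "congruent"    # exactly the same group
    if group_x.isdisjoint(group_y) or group_x <= group_y or group_y <= group_x:
        return "consistent"   # no shared taxa, or one group nested in the other
    return "conflicting"      # partial overlap: the groups cannot both be real


# Compare every other character with character 1, the reference character.
for number in (2, 3, 4, 5):
    print(number, relationship(characters[1], characters[number]))
```

With character 1 as the reference, the sketch reports characters 2 and 3 as consistent, character 4 as congruent and character 5 as conflicting, matching the caption of Fig. 1.6.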

Fig. 1.7 Application of the character relationships shown in Fig. 1.6 to a cladogram
of the lamprey, shark, salmon and lizard. With reference to character 1, character 2 is
congruent because it specifies the same group. Characters 3 and 4 are consistent
because they specify a subgroup of that specified by character 1. Character 13 is
also consistent because it specifies a subgroup, even though that subgroup does not
appear in the most parsimonious solution. Character 14 conflicts with character 1
and is homoplastic. Key to shading: consistent; congruent; homoplastic or conflicting.

Returning to the real example, we can recognize several types of character
interaction. In Fig. 1.7, and taking character 1 as the reference, character 2 is
congruent with it because it specifies the same group. Characters 3 and 4 are
consistent with characters 1 and 2 because they specify a subgroup of the
group specified by character 1. Character 13, which is shared between the
shark and the salmon, is also consistent with character 1 because it specifies a
subgroup, even though that subgroup does not appear in the most
parsimonious solution. Character 14, which is shared between the lamprey and the
salmon, conflicts with character 1 and is thus homoplastic.
1.4 GROUPS
As a result of the relative definition of relationship, Hennig identified three
types of groups, which he recognized on the basis of ancestry and descent.
Using Fig. 1.8 as a reference, the following groups may be recognized.

Fig. 1.8 The three types of groups recognized by Hennig on the basis of ancestry
and descent. Terminal taxa: lamprey (D), shark (A), salmon (B) and lizard (C).
Key to shading: monophyletic; paraphyletic; polyphyletic.

1. A monophyletic group contains the most recent common ancestor plus all
and only all its descendants. In this figure, such groups would be ancestor
'x' and salmon + lizard; or ancestor 'y' and shark + x + salmon + lizard; or
ancestor 'z' and lamprey + y + shark + x + salmon + lizard. In this particular
example, the monophyletic groups have formal Linnaean names: they
are Osteichthyes, Gnathostomata and Vertebrata respectively.
2. A paraphyletic group is what remains after one or more parts of a
monophyletic group have been removed. The group shark + salmon is a
paraphyletic group that has been traditionally recognized as Pisces (fishes).
However, one of the included members (the salmon) is inferred to be
genealogically closer to the lizard, which is not part of the group Pisces.
The shark and salmon share an ancestor (y) but not all the descendants of
that ancestor are included in the group.
3. A polyphyletic group is one defined on the basis of convergence, or on
non-homologous characters assumed to have
been absent in the most recent common ancestor of the group. The group
lamprey + salmon, which might be recognized on the shared ability to
breed in freshwater, would be considered a polyphyletic group. Breeding in
freshwater in vertebrates might be considered to be an apomorphic character
but this is inferred to have arisen on more than one occasion. The
character by which we might recognize it is non-homologous: it is a false
guide to relationship. No Linnaean taxon has ever been recognized for this
group.
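As a minimal sketch of the first definition, the following Python fragment tests whether a set of terminal taxa corresponds to a complete clade of the cladogram in Fig. 1.8. The nested-tuple encoding and the function names are assumptions made for illustration. The sketch tests monophyly only: it does not try to separate paraphyletic from polyphyletic groups, because that distinction rests on the characters used to recognize the group rather than on topology alone.

```python
# Cladogram of Fig. 1.8 as nested tuples: (lamprey, (shark, (salmon, lizard))).
TREE = ("lamprey", ("shark", ("salmon", "lizard")))


def collect_clades(node, found):
    """Return the terminal taxa below `node`; record each internal node's taxa in `found`."""
    if isinstance(node, str):
        return {node}
    members = set()
    for child in node:
        members |= collect_clades(child, found)
    found.append(frozenset(members))
    return members


def is_monophyletic(tree, group):
    """True if `group` is exactly the set of terminal taxa below some internal node."""
    found = []
    collect_clades(tree, found)
    return frozenset(group) in found


print(is_monophyletic(TREE, {"salmon", "lizard"}))   # Osteichthyes
print(is_monophyletic(TREE, {"shark", "salmon"}))    # 'Pisces'
print(is_monophyletic(TREE, {"lamprey", "salmon"}))  # freshwater breeders
```

Only the first of the three groups passes the test; shark + salmon and lamprey + salmon both leave out descendants of the ancestor they share.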
Most systematists would agree that recognizing monophyletic groups is
desirable and would also accept the artificiality of polyphyletic groups. It is
paraphyletic groups that have been the source of debate, particularly among
palaeontologists.
Cladists insist that only monophyletic groups be recognized in a classification.
Paraphyletic groups obscure relationships because they are not real in
the same sense; they do not have historical reality and they cannot be
recognized by synapomorphy alone. In Fig. 1.8, for instance, Pisces is a
paraphyletic group recognized by having synapomorphies of a larger group
(Gnathostomata) and lacking the synapomorphies of Tetrapoda. Fishes are
distinctive only because they lack tetrapod characters. It turns out that the
'defining attributes' of paraphyletic groups are symplesiomorphies.
In the past, when evolutionary taxonomy was the classificatory paradigm
and when the contribution of palaeontology was thought to be the identification
of ancestors, there were two reasons why paraphyletic groups were
popular. First, paraphyletic groups, such as reptiles or gymnosperms, were
justified on the basis that extra evolutionary information was conveyed by
distinguishing them from highly apomorphic relatives. That extra information
was considered to be evolutionary divergence. To retain Pisces as a para-
phyletic group in Fig. 1.3 and separate off the lizard (Tetrapoda) in a
collateral group was done to emphasize the many autapomorphies (characters
5-9) of this latter group. In the terminology of evolutionary taxonomy, these
tetrapod characters were evidence that tetrapods had shifted to a new
adaptive zone (involving life on land, receiving stimuli through air rather than
water, etc.). Similarly, retention of the paraphyletic group 'gymnosperms' was
justified in order to recognize the evolutionary divergence of angiosperms. In
a cladistic classification such divergence would be expressed by means of the
number of autapomorphies identifiable in tetrapods or angiosperms.
Conversely, to retain paraphyletic groups means that we lose cladogenetic
information. Not all fishes are each others' closest relatives; some are
genealogically more closely related to tetrapods than others.
Second, paraphyletic groups are popular and have been commonly recognized
in palaeontology because traditionally they are the ancestral groupings
(fishes ancestral to tetrapods, reptiles ancestral to birds and mammals,
gymnosperms ancestral to angiosperms, algae ancestral and
so on). But all of these paraphyletic groupings are based on symplesiomorphies
and can only be recognized by what they do not have: fishes do not
have the autapomorphies of tetrapods, gymnosperms do not have the
autapomorphies of angiosperms.

Fig. 1.9 Ancestors cannot be distinguished as individual taxa because they are
wholly primitive with respect to their descendants and thus have no features by
which they can be unequivocally recognized. (a) Three taxa, A, B and C, form a
group because they share ABC characters. Taxa B and C are sister-groups because
they share BC characters. If A is considered to be the ancestor of B and C, it can be
placed at the origin of B and C only if it lacks any distinguishing characters of its
own. Otherwise, it would be placed as the sister-group of B + C. In other words,
ancestor A can only be recognized because it possesses ABC characters but lacks
BC characters. (b) Archaeopteryx, equivalent to taxon A in (a), is the traditional
ancestor of the birds. It has the synapomorphies (e.g. feathers) that are found in all
birds, including the ostrich (taxon B) and the raven (taxon C), but lacks the
synapomorphies of the ostrich + raven such as a pygostyle. In terms of character
distribution, Archaeopteryx simply does not exist. (c) To circumvent this problem,
cladists place ancestors as the sister-group to their putative descendants and accept
that they must be nominal paraphyletic taxa.

This should not really be a surprise because ancestors must, by definition,
be wholly primitive with respect to their descendants. But what this also
means is that they cannot be distinguished as individual taxa. Consider Fig.
1.9a, in which taxon A is ancestral to descendent taxa B and C. This diagram
has been established on the distribution of characters, which are the only
observable attributes of taxa. All taxa are grouped because they share ABC
characters. Taxa B and C are sister-groups because they share BC characters.
A can only be placed at the origin of B and C if it lacks any distinguishing
characters of its own. Otherwise, it would be placed as the sister-group of
B + C. In other words, A can only be recognized because it possesses ABC
characters but lacks BC characters. Fig. 1.9b shows one traditionally recognized
ancestor, Archaeopteryx. Archaeopteryx (equivalent to taxon A in
Fig. 1.9a) has feathers, the most obvious synapomorphy of all
birds, including the ostrich (taxon B) and the raven (taxon C). However,
Archaeopteryx lacks the synapomorphies of the ostrich + raven such as
a pygostyle. But, of course, there are many other animals that lack the
pygostyle and therefore this cannot be a distinguishing character of
Archaeopteryx. In fact, to date, Archaeopteryx has no recognized autapomorphies.
Indeed, if there were, Archaeopteryx would have to be placed as
the sister-group to the rest of the birds. In terms of unique characters,
Archaeopteryx simply does not exist. This is absurd, for its remains have been
excavated and studied. To circumvent this logical dilemma, cladists place
likely ancestors on a cladogram as the sister-group to their putative descendants
and accept that they must be nominal paraphyletic taxa (Fig. 1.9c).
Ancestors, just like paraphyletic taxa in general, can only be recognized by a
particular combination of characters that they have and characters that they
do not have. The unique attribute of possible ancestors is the time at which
they lived. After a cladistic analysis has been completed the cladogram may
be reinterpreted as a tree (see below) and at this stage some palaeontologists
may choose to recognize these paraphyletic taxa as ancestors, particularly
when they do not overlap in time with their putative descendants (see Smith
1994a for a discussion). The logical impossibility of placing real taxa as
ancestors in cladistic analysis has the further consequence that the ancestors
'x', 'y' and 'z', which have been placed on several of the figures, should be
considered as hypothetical ancestors representing collections of characters.
Up to this point, groups have been described in Hennig's terms of common
ancestry. But groups are not discovered in this way. In practice, they are
discovered through analysis of character distributions. So we must return to
characters to look again at the definition of groups. We can separate
characters on their ability to describe groups. Those characters that allow us
to specify monophyletic groups are synapomorphies. Monophyletic groups are
discovered by finding synapomorphies. A very important conceptual leap
came when homology was equated with synapomorphy (Patterson 1982). This
has important consequences, for it means that homologies are hypotheses;
hypotheses to be proposed, tested and, perhaps, falsified.
We can illustrate this by returning to Fig. 1.4. Let us assume that we have
arrived at the hypothesis of relationships shown in Fig. 1.4a. This hypothesis
recognizes that the salmon + shark is a monophyletic group, discovered by
suggesting that the shared possession of character 13 (fin rays) is a synapomorphy
or an homology. However, this hypothesis can be shown to be false
because other synapomorphies suggest that the salmon and the lizard form a
monophyletic group recognized by the shared possession of characters 3 and
4. The original hypothesis of homology has been tested and shown to be false.
It may still be an homology but at a higher hierarchical level, as shown in Fig.
1.4b, where it specifies the larger group of shark + salmon + lizard. However,
as an homology of the group shark + salmon, it is false.
So, an hypothesis of homology is tested by congruence with other characters.
It should be obvious from this that homologies are continually being
tested (the three tests of homology are discussed in Chapter 2). Discovery of
homology is at the heart of cladistic analysis. It is important to note, however,
that there is no need to appeal to any specific theory of evolution in order to
discover an homology. Evolutionary theory may help explain homology but it
is unnecessary for discovering homology.
Monophyletic groups can be discovered through homologies and they are
the only kind of group that can be justified by objective boundaries.
Paraphyletic groups are those groups recognized by symplesiomorphies, that is,
characters properly applicable at a more inclusive level in the hierarchy.
Polyphyletic groups are recognized by homoplastic distributions of characters.
These three relations are shown in Fig. 1.10.

Fig. 1.10 The three types of groups recognized by Hennig defined in terms of
character distributions. Monophyletic groups are discovered through homologies
(synapomorphies); paraphyletic groups are those based upon symplesiomorphies;
polyphyletic groups are founded upon homoplastic characters.

1.5 CLADOGRAMS AND TREES
Throughout this chapter we have been slowly moving away from Hennig's
evolutionary explanations for concepts of relationship, characters and groups.
To conclude this chapter, we must make the important distinction between
cladograms and trees.
The relationships between lamprey, shark, salmon and lizard are drawn in
Fig. 1.3 as a branching diagram, a cladogram. A cladogram has no implied
time axis. It is simply a diagram that summarizes a pattern of character
distribution. The nodes of the branching diagram denote a hierarchy of
synapomorphies. There is no implication of ancestry and descent. It could just
as easily be written in parenthetical notation or illustrated as a Venn
diagram, as shown in Fig. 1.11a. The character information contained in the
Venn diagram is compatible with a number of derivative evolutionary trees,
which do include a time axis and embody the concepts of ancestry and
descent with modification. Five such trees (out of a total of 12) are shown in
Fig. 1.11b. Some of these trees assume that one or more of the taxa
(A, B, C, D) are real ancestors. Other trees include hypothetical ancestors
(x, y, z). Only one tree has the same topology as the cladogram and this is the
one in which all the nodes represent hypothetical ancestors. The other trees
contain one or more real ancestors. Choice among these trees depends on
factors other than the distribution of characters over the sampled taxa, which
is the only empirical content. Selection of one tree in preference to any other
may depend on our willingness to regard one taxon as ancestral to others.
Alternatively, we might say that some trees containing real ancestors are less
likely to be true than others because of unfavourable stratigraphic sequences.
The important point is that evolutionary trees are very precise statements of
singular history but their precision is gained from criteria other than character
distributions. These trees cannot be justified on characters alone.

Fig. 1.11 Cladograms and trees. (a) The cladogram from Fig. 1.3 depicting the
relationships of the lamprey, shark, salmon and lizard redrawn as a Venn diagram
and in parenthetic notation, (D (A (B, C))). (b) Five of the 12 possible trees that can
be derived from the cladogram in (a).

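The equivalence between a branching diagram, parenthetic notation and a Venn diagram can be sketched as follows. The nested-tuple encoding and the function names `parenthetic`, `terminal_taxa` and `venn_groups` are assumptions made for illustration; the sketch handles only the cladogram and makes no attempt to enumerate the derivative trees.

```python
# The cladogram of Fig. 1.11a: D = lamprey, A = shark, B = salmon, C = lizard.
CLADOGRAM = ("D", ("A", ("B", "C")))


def parenthetic(node):
    """Write a cladogram in parenthetic notation."""
    if isinstance(node, str):
        return node
    return "(" + ", ".join(parenthetic(child) for child in node) + ")"


def terminal_taxa(node):
    """List the terminal taxa below a node."""
    if isinstance(node, str):
        return [node]
    return [taxon for child in node for taxon in terminal_taxa(child)]


def venn_groups(node):
    """Return the nested sets of a Venn diagram, most inclusive group first."""
    if isinstance(node, str):
        return []
    groups = [set(terminal_taxa(node))]
    for child in node:
        groups.extend(venn_groups(child))
    return groups


print(parenthetic(CLADOGRAM))
print(venn_groups(CLADOGRAM))
```

Both outputs carry the same information as the branching diagram: a hierarchy of groups, with no time axis and no statement about which taxa are ancestors.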
The distinction between cladograms and trees is important because many
people have taken the cladogram to be a statement about evolution. To do
this we must be prepared to accept other beliefs, for example, that evolution
is parsimonious or that evolution proceeds exclusively by branching. Many of
the criticisms of cladistics are levelled at the claim that these are unrealistic
assumptions of evolution. Indeed they are. But they are not assumptions of
cladistics or cladograms. They are assumptions of trees. The cladogram, as a
distribution of characters, is the starting point for further analysis. In practice,
many systematists do turn their cladograms into trees in order to say
something about evolution. Furthermore, they may incorporate evolutionary
assumptions concerning the relative likelihood of character change, such as
the impossibility of regaining a character once lost (Dollo's law), or the
greater likelihood of nucleotide changes taking place within loop regions
rather than stem regions of molecules. Some of these assumptions are
considered further in Chapters 4 and 5.
1.6 TREE TERMINOLOGY
Having made the distinction between cladograms and trees, it needs to be
stressed that, unfortunately, cladistic analysis often speaks only of trees, tree
lengths and tree statistics. This is an historical legacy from Hennig's original
formulation of phylogenetic systematics, as well as a mathematical convention
for describing branching diagrams as trees.
There are several terms regarding trees that the reader will frequently
meet. Trees have a root, which is the starting point or base of the tree. The
branching points are called the internal nodes, while the segments between
nodes are internodes or internal branches. Taxa placed at the tips are
terminal taxa and the segments leading from internal nodes to a terminal
taxon are called terminal branches.
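These terms can be made concrete by counting the parts of a small tree. The fragment below is a sketch under two assumptions: the tree is encoded as nested tuples, and the root is counted as one of the branching points. The function name `count_parts` is hypothetical.

```python
TREE = ("lamprey", ("shark", ("salmon", "lizard")))


def count_parts(node, is_root=True):
    """Return (internal nodes, internal branches, terminal branches) for a tree."""
    if isinstance(node, str):
        return 0, 0, 1                  # a terminal taxon ends one terminal branch
    nodes = 1                           # this branching point
    internal = 0 if is_root else 1      # the internode joining it to its parent node
    terminal = 0
    for child in node:
        child_nodes, child_internal, child_terminal = count_parts(child, is_root=False)
        nodes += child_nodes
        internal += child_internal
        terminal += child_terminal
    return nodes, internal, terminal


print(count_parts(TREE))
```

For the four-taxon tree this gives three internal nodes (including the root), two internal branches and four terminal branches.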
Sometimes trees are drawn such that each of the branches is of equal
length, irrespective of how many character changes may be assigned to it.
This is called a non-metric tree. Another description is a metric tree in which
the relative lengths of the branches are drawn to reflect the numbers of
character changes, which may be different in different parts of the tree. The
results of analyses of molecular data are often depicted as metric trees in
order to emphasize the variation in numbers of character changes that
often occur on different branches. A third type of tree is the ultrametric tree
in which each of the terminal taxa is fixed at the same distance from the root
by assuming a constant molecular clock.
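The difference between a metric and an ultrametric tree can be checked numerically: in an ultrametric tree every terminal taxon lies at the same summed distance from the root. The sketch below assumes a hypothetical encoding in which each node is a pair (branch length, taxon name or list of child nodes); the branch lengths are invented for illustration and do not come from any analysis.

```python
def tip_distances(node, distance_so_far=0.0):
    """Return the summed branch length from the root to each terminal taxon."""
    length, content = node
    total = distance_so_far + length
    if isinstance(content, str):
        return {content: total}
    distances = {}
    for child in content:
        distances.update(tip_distances(child, total))
    return distances


def is_ultrametric(tree, tolerance=1e-9):
    """True if all terminal taxa are equidistant from the root."""
    values = list(tip_distances(tree).values())
    return max(values) - min(values) <= tolerance


# Hypothetical branch lengths (numbers of character changes).
metric_tree = (0.0, [(1.0, "lamprey"),
                     (2.0, [(3.0, "shark"),
                            (1.0, [(4.0, "salmon"), (9.0, "lizard")])])])
clock_tree = (0.0, [(3.0, "lamprey"),
                    (1.0, [(2.0, "shark"),
                           (1.0, [(1.0, "salmon"), (1.0, "lizard")])])])

print(is_ultrametric(metric_tree), is_ultrametric(clock_tree))
```

In the first tree the lizard lies twelve changes from the root and the lamprey only one, so it is metric but not ultrametric; in the second every terminal taxon lies three units from the root.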
1.7 CHAPTER SUMMARY
1. The cladistic concept of relationship is relative. Taxon A is more closely
related to taxon B than either is to a third taxon C or any other taxon.
2. Taxa A and B are sister-groups and this combined group will have its own
sister-group. The aim of cladistic analysis is to discover sister-group
relationships.
3. Sister-groups are discovered by plotting the distributions of synapomorphies.
4. Characters have relationships to one another on a given cladogram. They
may be congruent, consistent or homoplastic.
5. Cladistic groups may be monophyletic, paraphyletic or polyphyletic. Only
monophyletic groups are real.
6. Monophyletic groups are recognized by homology. Homology equals
synapomorphy.
7. Cladograms are statements about character distribution. There may be
several evolutionary trees compatible with one cladogram but most of
these make additional assumptions beyond those of character distribution.
2.
Characters and character coding
2.1 INTRODUCTION
Cladistic analysis consists of three processes: discovery or selection of characters
and taxa, coding of characters, and determination of cladograms that best
explain the distribution of characters over the taxa. Although the three
operations are inextricably interlinked, much of the literature deals with
analysis of coded data matrices and affords less space to reviewing the
principles of character discovery and coding from raw observations. In this
chapter, we describe the interaction between discovery and coding, discuss
different kinds of characters, and describe the methods and pitfalls associated
with different coding procedures.
The operations of cladistic analysis are strongly influenced by the selection
and resolution of taxa and characters. The kinds of data available (skeletons,
soft parts, nucleotide sequences, etc.) inevitably affect choice of cladistic
method. Conversely, the choice of analytical method may set bounds for the
kinds of data that can be analysed and the format in which those data are
recorded. Thus, the problem lies in defining which data will be recorded, how
those data are coded and whether particular data points are to be included or
excluded from cladistic analysis.
Modern data sets are formed from characters scored as discrete codes in
columns and taxa in rows. Hence filters operate between the initial discovery
procedure and the recording of the variation in a data matrix. Details of the
filters used are often obscured by the style of publication and often the final
reworked matrix is the only published information on the original observations.
For computerized cladistic analysis, all matrices render variation into
discrete codes. Frequently, only characters that show significant variation are
used and measurements are coded into discrete states on the basis of gaps.
These gaps may be the only clues to the type of the filter used.
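To make the idea of coding measurements 'on the basis of gaps' concrete, here is a minimal Python sketch of one simple form of gap coding. It is an assumption for illustration, not the procedure of any particular author or program: taxon means are sorted and a new state begins wherever two neighbouring values are separated by more than a chosen threshold. The function name `gap_code`, the threshold `min_gap` and the leaf measurements are all hypothetical.

```python
def gap_code(means, min_gap):
    """Assign integer states to taxa; a new state starts at each gap wider than min_gap."""
    ordered = sorted(means.items(), key=lambda item: item[1])
    codes = {}
    state = 0
    previous = None
    for taxon, value in ordered:
        if previous is not None and value - previous > min_gap:
            state += 1            # the gap is wide enough to separate two states
        codes[taxon] = state
        previous = value
    return codes


# Hypothetical mean leaf lengths (mm) for five taxa.
leaf_length_mm = {
    "taxon_1": 10.2,
    "taxon_2": 11.0,
    "taxon_3": 18.5,
    "taxon_4": 19.1,
    "taxon_5": 30.0,
}

print(gap_code(leaf_length_mm, min_gap=5.0))
```

With a threshold of 5 mm the five taxa fall into three states (0, 0, 1, 1, 2). A different threshold would give a different coding, which is why the published gaps are often the only clue to the filter that was applied.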
2.1.1 Filters
The most obvious filter in cladistic analysis is that which rejects attributes
that are continuous and quantitative and favours instead characters that are
discrete and qualitative. The problem with all characters is determining those
that are cladistically useful and those that are not. In general, continuous and
quantitative characters are considered not to be cladistic but to vary phenetically.
There are many reasons why we favour discrete characters and consider
continuous characters unsuitable for cladistic analysis. Quantitative characters
are difficult to describe fully, requiring means, medians and variances to
establish the gaps. Using only a portion of the described character (e.g. the
mean or the median) raises the question of what to do with the rest, but
matrices generally require that each value in the matrix be represented by
a single discrete alphanumeric value, although polymorphic variables can be
analysed in certain computer programs (e.g. PAUP and MacClade).
2.2 KINDS OF CHARACTERS
The recommendation to reject quantitative and continuous data in favour of
qualitative and discrete data implies that 'quantitative', 'qualitative',
'continuous' and 'discrete' refer to kinds of data that differ in value, and that some
quantitative variables can be determined a priori to have greater systematic
value than others. This chapter reviews characters and coding from all four
perspectives: qualitative and quantitative, discrete and continuous.
2.2.1 Qualitative and quantitative variables

The distinction between qualitative and quantitative data is considered by
some to be more apparent than real. Stevens (1991) considered that 'many
so-called qualitative characters are based on a quantitative base filtered
through reified semantic discontinuities of ... terminology'. What this means
is that although matrices seem to be coded with discrete characters, the
original observations were qualitative ranges that had been artificially filtered.
In other words, qualitative terminology hides quantitative values. For
example, descriptions of plane shapes, such as leaves ovoid, are shorthand
expressions for ranges of measurements, in this case, measurements of the
dimensional ratio between distance from the base to the widest point on the
leaf. The distinction between qualitative and quantitative refers more to
mode of expression than to intrinsic properties of the data. Table 2.1
provides a list from Thiele (1993) of quantitative characters from the literature
that are expressed qualitatively with clear gaps between the different
states.
Qualitative shorthand expressions of quantitative data are most useful
when the data show widely discontinuous patterns of variation. Thus a sample
of leaves distinguishable into ovate and obovate forms may be simply described
in a quantitative way. If ovate leaves belong to one taxon and obovate
leaves belong to another taxon, the character can be scored in the matrix for
cladistic analysis, either as a 0 or 1, or any pair of adjacent integers. However,
if the sample of specimens has a continuous range between one
form and another, then the variation would require some quantitative expression
to do it justice. If two taxa each have a range of leaf shapes along this
continuum and the ranges overlap, the character would more often than not
be labelled quantitative and rejected from cladistic analysis. The terms
quantitative and qualitative are often used in this sense as synonyms for
overlapping and non-overlapping ranges in variables.

Table 2.1 Examples of qualitative variables expressed as discrete
quantitative characters. (After Thiele 1993)
Character and states (notes in brackets)
Male anal point reduced; distinct [1, 3]
Larval antennal blade longer than flagellum; shorter than flagellum [2, 3]
Cotyledons ovate-orbicular; spathulate [2, 4]
Fruit ribs prominent; not prominent [1, second note illegible]
Buds widest below operculum suture; above operculum suture [1, 5]
Fruit valves broader than long; approximately equal; longer than broad [2, 5]
External valves of premaxilla midbody flat; markedly indent [1, 6]
Posterolateral overlap of ectopterygoid with maxilla extensive; modest [2, 6]
Dorsal fin anterior; posterior [2, 7]
Vomerine teeth extending laterally at least as far as the medial borders
of the internal nares; not extending as far [2, 8]
Diastema between palatine and vomerines small; large [1, 8]
Pollen boat-shaped; globose [2, 9]
Nucellar cuticle thin; thick [1, 9]
Pterygoid floor of canalis caroticus internus thin; thick [1, 10]
Notes:
1 Length measurement.
2 Ratio.
3 Cranston and Humphries' (1988) recoding of Saether (1976).
4 Thiele and Ladiges (1988).
5 Chappill (1989).
6 Kluge (1989).
7 Begle (1991).
8 Kraus (1988).
9 Loconte and Stevenson (1991).
10 Gaffney et al. (1991).

2.2.2 Discrete and continuous variables
The terms discrete and continuous properly refer to mathematical properties
of the range of numbers used to measure an attribute. Continuous data are
those, such as dimensions, where potential values are so infinitesimally close
that there are no disallowable real numbers. In contrast, discrete data can be
represented logically only as a subset of all values, generally integers. They
include absence/presence data (0/1 being the only allowable values), multistate
data (0/1/2 ...), counts of structures (expressed as
integers, directly scored into the matrix or rescaled) and molecular data (e.g.
nucleotide sequences, ACGT/U).

Fig. 2.1 Overlapping and non-overlapping patterns of variation in continuous,
meristic and binary data. (After Thiele 1993.)

2.2.3 Overlapping and non-overlapping characters
While qualitative, quantitative, discrete and continuous are useful terms, the
degree of overlap among them is the crucial property (Fig. 2.1). Although it is
implied that overlap can occur only between continuous characters, both
continuous and discrete characters can exhibit different degrees of overlap. It
is the degree of overlap that makes the distinction in filtering between
overlapping and non-overlapping characters. The filtering proscription can be
set to particular values for any given character. For example, it can be set to
select only those characters that show no overlap, rejecting all others. The
problem remains, however, because, in reality, we have a sliding scale from
widely overlapping characters to widely disjunct characters that have discrete
gaps between them. The required filter in these situations is a cut-off point
where the critical value might be scored for no overlap or any other
arbitrarily selected value. The problem is that this makes the filtering of
characters highly susceptible to sampling error and the cut-off points between
characters quite arbitrary. In reality, we should be able to grade characters
along a sliding scale and develop methods of coping with different degrees of
overlap. The sliding scale could recognize non-overlapping data as better than
overlapping data and that the latter should be used only when the former are
unavailable. This approach matches the continuum of degree of overlap with
a continuum from better to worse (Chappill 1989), rather than forcing it into
the general division of good and bad characters (Pimentel and Riggins 1987;
Thiele 1993).
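A sliding scale of overlap, and a cut-off filter applied to it, can be sketched as follows. The measure used here (shared interval divided by the combined span of the two ranges) is only one possible choice and is an assumption for illustration; it is not the measure of Chappill, Thiele or any other author cited above. The function names, the cut-off value and the example ranges are hypothetical.

```python
def overlap_fraction(range_1, range_2):
    """Fraction of the combined span shared by two (low, high) ranges of values."""
    low_1, high_1 = range_1
    low_2, high_2 = range_2
    shared = min(high_1, high_2) - max(low_1, low_2)
    span = max(high_1, high_2) - min(low_1, low_2)
    if span == 0:
        return 1.0                # both ranges are the same single value
    return max(shared, 0.0) / span


def passes_filter(range_1, range_2, cut_off=0.0):
    """Accept a character only if the overlap between two taxa does not exceed the cut-off."""
    return overlap_fraction(range_1, range_2) <= cut_off


# Hypothetical ranges of a measurement in two taxa.
print(overlap_fraction((1.0, 5.0), (6.0, 10.0)))   # disjunct ranges
print(overlap_fraction((1.0, 6.0), (4.0, 10.0)))   # partial overlap
print(passes_filter((1.0, 6.0), (4.0, 10.0), cut_off=0.25))
```

Setting `cut_off` to zero reproduces the strict filter that accepts only non-overlapping characters. Any other value is an arbitrary point on the sliding scale, and with small samples the observed ranges, and therefore the outcome of the filter, can change as specimens are added.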
2.3 CLADISTIC CHARACTERS
The concept of a character is ill defined in cladistics and has a multiplicity of
meanings. Viewing features of organisms as characters or character states is
part of the process of recognizing their systematic value when distinguishing
them from non-cladistic characters. Characters are generally listed in the
form of presences or absences (e.g. vertebrae present/vertebrae absent), as
binary variables expressed as alternative characters or character states (e.g.
anthers introrse/anthers extrorse), or as multistates (e.g. eyes blue/eyes
green/eyes brown). Although these distinctions might be obvious for any one
group of organisms, it is impossible to write down how a character or
character state might be defined in terms of its systematic value. For,
inasmuch as we discover organisms and develop hypotheses of the boundaries
and relationships of taxa through the study of collected samples in both living
and preserved collections, so the same applies to characters. Initial hypotheses
of primary homology are subjected to similar processes to become refined
into secondary homologies (synapomorphies) through successive cladistic
analyses and modifications of the original hypotheses (de Pinna 1991). Seen
in this way, characters are hypotheses about structures or features that can be
put through cladistic analysis to determine whether they are homologues or
not.
2.3.1 Diagnostic and systematic characters

There has always been a tension between the notion of defining characters in
order to identify and distinguish organisms and the discovery of homologies
in comparative biology to systematize the relations among organisms (Table
2.2). Thus Smith (1994a), for example, considered that characters must occur
in two or more states (one of which may be absence) and should be defined
as objectively as possible. Both morphological and molecular features that are
indistinguishable are generally coded as the same character state so as to
reflect the underlying notion of primary homology. The problem with such
definitions (Table 2.2a) is that they do not distinguish between characters
used in keys for the express purpose of distinguishing a taxon from any of its
relatives and those characters that are homologues and suggest relationships.
Table 2.2 Definitions of characters

a. Characters diagnostic of taxa
'Characters are observed variations which provide diagnostic features for
differentiation amongst taxa' (Smith 1994a).
A character is 'any attribute of an organism or a group of organisms by which it
differs from an organism belonging to a different category or resembles an organism
of the same category' (Mayr et al. 1953).
A character is 'anything that is considered a variable independent of any other thing
considered at the same time' (Cain and Harrison 1958).
'A character in systematics may be defined as any feature which may be used to
distinguish one taxon from another' (Mayr et al. 1953).
A character is 'a feature of an organism that is divisible into at least two conditions
(or states) and that is used for constructing classifications and associated activities
(principally identification)' (Stuessy 1990).

b. Characters as transformations
'A character is a feature of an organism which is the product of an ontogenetic or
cytogenetic sequence of previously existing features, or a feature of a previously
existing parental organism(s). Such features arise in evolution by the modification of
previously existing ontogenetic or cytogenetic or molecular sequence' (Wiley 1981).
'A character is a feature of an organism that can be evaluated as a variable with two
or more mutually exclusive and ordered states' (Pimentel and Riggins 1987).
'A character ("transformation series" of Hennig) is a collection of mutually exclusive
states (attributes; features; "characters", "character states", or "stages of expression" of
Hennig) which
a) have a fixed order of evolution such that
b) each state is derived directly from just one other state, and
c) there is a unique state from which every other is ultimately derived'
(Farris et al. 1970).

c. Attributes of organisms, character states of taxa
Attribute states are 'the descriptive terms which are applied to individual
organisms, e.g. "red", "2 cm long"'; attributes are 'sets of such descriptive terms, e.g.
"colour" to which "red" and "green" belong'; character states are 'probability
distributions over the states of an attribute'; characters are 'sets of such probability
distributions' (Jardine 1969).

d. Characters as homologues
A character is 'a theory that two attributes which appear different in some way are
nevertheless the same (or homologous)' (Platnick 1979).
'If ... characters are hypotheses of homology and synapomorphy, then they must be
relational, and the units of these relations are three-taxon statements' (Nelson and
Platnick 1991).
'Cladistics is a discovery procedure, and its discoveries are characters (homologies)
and taxa' (Nelson and Patterson 1993).
2.3.2 Character transformations
Nevertheless, for characters or character states to be cladistic, and hence be
features of taxa, they must be scorable in a data matrix and contain some
pattern for hypotheses of relationships of taxa to be discovered. For
evolutionary biologists, characters transform from one condition into another. For
example, Wiley's (1981) definition recognized that features of organisms are
the products of evolution and hence have arisen as changes in ontogeny and
transformation through time. However, there is a problem because this
definition is one of transformation of one character or character state into
another within organisms, rather than of an homology of any particular
group. Thus Wiley used his definition to describe characters of Chordata and
Vertebrata, which are clearly taxa consisting of many individual organisms,
rather than transformations of features within organisms.
Pimentel and Riggins (1987) were stricter but less rigorous in their
definition when they stated that a character can only be a feature of an organism
when it can be recognized as a distinct variable. Their definition is also
problematic because it, too, is tied to features of organisms rather than taxa
and, like Wiley, they go on to discuss coding variables for taxa rather than for
organisms. Farris (quoted in Miner 1980) showed that determining characters
was an inductive process when he stated that 'morphologists do not sample
characters, they synthesize them'. The Pimentel and Riggins definition is
based on that of Farris et al. (1970), who made it clear that, in order to be
able to determine characters for phylogenetic reconstruction, it was necessary
to recognize that they were mutually exclusive states that could be considered
transformations with a fixed order of evolution. Farris et al. thus redefined
Hennig's Darwinian interpretation, that characters transform from one state
to another, as a series of axioms. All of the definitions in Table 2.2a-b
confuse the relationship between organisms and taxa and the problem
remains as to what diagnoses taxa when definitions refer to attributes of
organisms.
2.3.3 Characters and character states

Jardine (1969) considered that diagnosing taxa and individual organisms using
the same character was nonsensical. The presence of a backbone is not a
property of Vertebrata but all of the organisms within the group Vertebrata
possess backbones. Jardine made the distinction between taxa and organisms
by describing characters and character states. Taxa have characters and
organisms have attributes or character states (Table 2.2c).
Most cladists consider that for characters to convey cladistic information
they must transform from one state into another through time. However, this
does not mean that an eye of another colour changes into a blue eye or that
oval leaves change into obovate ones. Likewise, animals lacking backbones
do not change into those that possess them. What actually changes is the
frequency of a particular character state for a given character and the
frequencies of different character states change through time. Cladistic
character states are frequency distributions and, conversely, all cladistic
character states have particular frequencies of distribution. Thus, desirable
cladistic characters are those with large, clear-cut changes rather than small,
gradual ones, and a good cladistic character is, in effect, a value judgement
on data.
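
The idea that a character state of a taxon is a frequency distribution over the attribute states of its individual organisms can be illustrated with a minimal Python sketch. The taxa, the eye-colour scores and the function name `state_frequencies` are hypothetical and serve only to make the distinction concrete.

```python
from collections import Counter


def state_frequencies(observations):
    """Return the relative frequency of each attribute state in a sample."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {state: count / total for state, count in counts.items()}


# Hypothetical attribute states scored on ten individuals of each of two taxa.
taxon_p = ["brown"] * 9 + ["blue"] * 1
taxon_q = ["brown"] * 1 + ["blue"] * 9

freq_p = state_frequencies(taxon_p)
freq_q = state_frequencies(taxon_q)
```

No individual changes eye colour here; what differs between the two samples is the frequency of each attribute state, and a large, clear-cut shift of this kind is what would make the attribute attractive as a cladistic character.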
2.3.4 Homology
The desire for cladistic characters to express large, clear-cut differences
between taxa does not go far enough in determining which character states
become grouping homologies. To an evolutionist, homology is defined as the
same structure inherited from a common ancestor. Thus to Hennig, hypotheses
about characters (synapomorphy) and hypotheses about groups (monophyly)
both appealed to ancestors for their justification. This concept was
shown to lead to circular reasoning because both hypotheses for characters
and hypotheses for groups appeal to the same mysterious non-empirical
ancestors. The solution came with the so-called 'transformation of cladistics',
which allowed hypotheses about character states (homology) to give
hypotheses about groups (hierarchy). The method is empirical in that there are no
appeals to ancestry for the determination of monophyletic groups (Platnick
1979) and any interpretations about ancestry are derived from the cladogram.
For cladistic analysis to be successful, we consider that it is necessary not
only to have principles that do not assume transformation, but also to
describe characters as hypotheses of homology that can be tested (Table
2.2d). Homology is the core concept of comparative biology and systematics.
When comparing and contrasting the morphology and anatomy of organisms,
we break down our observations into traits or character states as recognizable
features of the whole organism. Characters and character states convey no
phylogenetic information until we recognize their existence in other
organisms through naming them (Patterson 1982). It is the act of naming
characters and character states that establishes theories of homology.
An hypothesis of homology recognizes that a character in one taxon
represents the 'same' feature as a similar, but often not identical, character in
another taxon. Structures that are identical in form, position and
development in two or more organisms pose no problem, because different
systematists can agree that they represent the same entities, i.e. they have a clear
one-to-one correspondence. However, problems arise when structures have
diverged in form so as to be only vaguely similar or when different
developmental pathways arrive at similar structures. Proposing hypotheses of
homology becomes critical when we come to define 'sameness' among structures.
Similarity of form is not a criterion of homology but a first-order
hypothesis. For an homology to hold, the feature
in question must also occur in the same topographical position within the
organisms being compared and also agree with other characters about
relationships of taxa (character congruence), a test that can be applied only after,
or during, cladistic analyses (Table 2.3).

Table 2.3 Tests of homology

                          Relation
Test           Homology   Parallelism   Convergence
Similarity        +            +             -
Conjunction       +            +             +
Congruence        +            -             -
The congruence test equates homology with synapomorphy. Characters
that fit to a cladogram with the same length pass the test, whereas those
requiring more steps are deemed homoplastic. Thus, determination of
homologues becomes an empirical procedure and the final arbiters of homology
are the characters and character states themselves (Patterson 1982).
Consequently, the more characters that are included in the analysis, the more
demanding the test of homology becomes. This aspect becomes important in
later considerations of simultaneous analysis or so-called total evidence (see
Chapter 6).
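
The congruence test can be sketched in Python for a single unordered character on a fixed, rooted, binary cladogram. The sketch counts steps with Fitch-style set operations and treats a character as passing when it needs only the minimum number of steps (number of states minus one). The tree, the two characters and the function names are hypothetical, and the minimum-steps criterion is an assumption made for this illustration rather than a procedure given in the text.

```python
def fitch_steps(tree, states):
    """Count the fewest steps for one unordered character on a rooted binary tree.

    tree   -- nested 2-tuples of taxon names, e.g. (("A", "B"), "C")
    states -- dict mapping each taxon name to its character state
    """
    steps = 0

    def state_set(node):
        nonlocal steps
        if isinstance(node, str):           # terminal taxon
            return {states[node]}
        left, right = (state_set(child) for child in node)
        common = left & right
        if common:                          # children agree: no step at this node
            return common
        steps += 1                          # children disagree: one step
        return left | right

    state_set(tree)
    return steps


def passes_congruence(tree, states):
    """True when the character needs only (number of states - 1) steps."""
    minimum = len(set(states.values())) - 1
    return fitch_steps(tree, states) == minimum


# Hypothetical cladogram and two hypothetical binary characters.
tree = (("A", "B"), ("C", ("D", "E")))
char_1 = {"A": 1, "B": 1, "C": 0, "D": 0, "E": 0}   # groups A + B
char_2 = {"A": 1, "B": 0, "C": 0, "D": 1, "E": 0}   # scattered over the tree

steps_1 = fitch_steps(tree, char_1)
steps_2 = fitch_steps(tree, char_2)
```

On this tree the first character requires one step and passes, while the second requires two steps for two states and would be deemed homoplastic.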
2.4 CHARACTER CODING FOR DISCRETE VARIABLES
The crucial point in systematics is not the source of evidence from which we
attempt to derive characters, as this will vary from group to group, but how
features might be usefully coded so as to reflect accurately our observations
for a particular scale of problem. We have already stated that for characters
or character states to be cladistic, and hence features of taxa that might be
potential homologues, they must be scorable into data matrices that contain
some pattern for relationships of taxa to be discovered. Converting raw data
into codes for cladistic analysis is something that has to be done with
considerable care. There are many different ways of coding characters and
the outcome of different coding schemes can dramatically affect hypotheses
of relationships. We shall demonstrate this using a hypothetical example of
conflicting character states of shape (round and square) and colour (black or
white) (Fig. 2.2).
For the purposes of exposing some of the problems associated with
character coding, we describe four coding methods, although these
do not exhaust the possibilities:

   one multistate character with linked states (Table 2.4a);
   two independent multistate characters for shape and colour (Table 2.4b);
   three independent binary characters for shape and colour (Table 2.4c);
   five completely independent characters (absence/presence coding) (Table
   2.4d).

Fig. 2.2 Different expressions of conflicting characters in five taxa (V-Z): absent,
round and black, round and white, square and black, square and white. (After Pleijel
1995.)
Table 2.5 shows how these characters and character states might be
distributed among five different taxa (V-Z). Coding method A assumes
interdependence between the main features and codes everything into a
single multistate character. Coding method B treats colour and shape as two
quite separate characters but includes an extra state (0) to account for
absence of each character. Coding method C is similar to B but treats
presence and absence of any character state as a separate character and thus
has three columns. Coding method D assumes that all five character 'states'
are independent characters. Following Pleijel (1995), these coding methods
are discussed under the four different headings of character linkage
(dependency between characters in a single matrix), hierarchical dependency, missing
values and information content.

Table 2.4 Four coding methods for the features shown in Fig. 2.2. Characters are
labelled with integers in bold and character state codes as integers in parentheses

(a) Method A: formula coding as one multistate character with linked states.
   1. absent (0); round and black (1); round and white (2); square and black (3);
      square and white (4)
(b) Method B: shape and colour attributes treated as two independent multistate
characters.
   1. absent (0); round (1); square (2)
   2. absent (0); black (1); white (2)
(c) Method C: hierarchical coding with shape and colour attributes treated as
two independent binary characters and an additional code for presence and
absence of the features; inapplicable observations for the absence of the feature
accommodated using question marks.
   1. features absent (0); features present (1)
   2. round (0); square (1)
   3. black (0); white (1)
(d) Method D: independent coding of variables assuming no transformations.
   1. features absent (0); features present (1)
   2. round shape absent (0); round shape present (1)
   3. square shape absent (0); square shape present (1)
   4. black pigmentation absent (0); black pigmentation present (1)
   5. white pigmentation absent (0); white pigmentation present (1)

Table 2.5 Five taxa (V-Z) with the features shown in Fig. 2.2, scored according
to the four coding methods (A-D) listed in Table 2.4

            Coding method
Taxa    A     B     C      D
V       0     00    0??    00000
W       1     11    100    11010
X       2     12    101    11001
Y       3     21    110    10110
Z       4     22    111    10101
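
The four coding schemes of Table 2.4 can be written out as a short Python sketch that scores the five taxa of Fig. 2.2 and reproduces the rows of Table 2.5. The function and variable names are hypothetical; only the state codes are taken from the tables.

```python
# Observed features for the five taxa of Fig. 2.2; None marks absence.
TAXA = {
    "V": None,
    "W": ("round", "black"),
    "X": ("round", "white"),
    "Y": ("square", "black"),
    "Z": ("square", "white"),
}
SHAPES = ["round", "square"]
COLOURS = ["black", "white"]


def method_a(feature):
    """One multistate character with linked states (codes 0-4)."""
    if feature is None:
        return "0"
    shape, colour = feature
    return str(1 + 2 * SHAPES.index(shape) + COLOURS.index(colour))


def method_b(feature):
    """Two multistate characters, each with an extra state 0 for absence."""
    if feature is None:
        return "00"
    shape, colour = feature
    return f"{SHAPES.index(shape) + 1}{COLOURS.index(colour) + 1}"


def method_c(feature):
    """Presence/absence plus two binary characters; '?' where inapplicable."""
    if feature is None:
        return "0??"
    shape, colour = feature
    return f"1{SHAPES.index(shape)}{COLOURS.index(colour)}"


def method_d(feature):
    """Five independent absence/presence characters."""
    if feature is None:
        return "00000"
    shape, colour = feature
    bits = [True, shape == "round", shape == "square",
            colour == "black", colour == "white"]
    return "".join(str(int(bit)) for bit in bits)


METHODS = (method_a, method_b, method_c, method_d)
matrix = {taxon: [method(feature) for method in METHODS]
          for taxon, feature in TAXA.items()}
```

Each entry of `matrix` lists the codes for methods A to D in order, so `matrix["W"]` holds the scores for taxon W in all four columns of Table 2.5.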
2.4.1 Multistate characters: character linkage during analysis

It is generally held that characters in a cladistic analysis should represent
independent hypotheses of relationship. Correctly identified homologies are
expected to exhibit congruence and converge onto similar fundamental
cladograms (ideally one). Homoplastic characters, on the other hand, are expected
to show scatter on cladograms. The main problem in choosing an appropriate
coding method is to arrive at an accurate division of characters and character
states so that they reflect the relationships of the organisms. The more that
characters become linked the greater is the departure from independence
and consequently the risk that one false homology can obscure the topologies
of true homologies is increased. Uniting independent features into a single
multistate character (method A) minimizes the effects of linkage, while
treating multistate characters as unordered (see Chapter 3) allows
transformation of one character state to another in a single step. Because there are
four states in the example (in addition to absence), there are four possible
types of transformations: a single ordered multistate, a single unordered
multistate, two ordered multistates, and two unordered multistates.
2.4.2 Binary characters: character linkage during analysis

In all other methods, where the data are partitioned into separate columns,
the degree of partition creates different problems. Method B treats shape
and colour as independent characters but to cater for those taxa that do not
have any of the features, it is necessary to code for their absence using an
extra state (0) (Table 2.4b). One consequence of this approach is that duplication
of absences might become a problem when many different characters are
perceived as connected to a feature that is absent from some taxa (Maddison
1993). This problem is solved to some extent by method C, where
presence/absence is included as a separate character and missing values are
used for inapplicable observations (Table 2.5). Coding method D makes no
adjustment for character linkage and treats the characters as five separate
columns in the matrix (Table 2.5).
2.4.3 Hierarchical character linkage

Characters and character states will often be coded differently in an analysis
of a group of closely related taxa when compared to a more general study
that contains these taxa as only one small subgroup. At the more general
level, we may be satisfied to code just that information relating to the
absence or presence of a feature, whereas at the less general level we might
choose to encode more of the observed variation. Thus, characters and
character states can vary in how they are coded at different scales. In other
words, character interdependency is affected by hierarchy.
Coding method A divides the features of shape and colour into five linked
states in a single character. This may be satisfactory for a high level analysis,
when the main distinction is between absence and presence of any of the
different states. However, at a more inclusive level, we would tend to code
the shape and colour as separate characters and states, especially if none of
the members of the study group lack any of the states being considered. The four
coding methods, A-D, can be placed on a sliding scale of differential
reformulation to satisfy different analytical scales. The addition and deletion
of character states at different levels of analysis will affect the cladogram
topologies found using methods A-C much more than with method D, which,
by dividing the component attributes as finely as possible, tends to remain
stable at any level of analysis.
2.4.4 Transformation between character states; order and polarity

Once characters are coded as multistates, the implications of order and
polarity can have major effects. Consider a character with three states, 0, 1
and 2 (Fig. 2.3). In ordered analysis, the gain or loss of a state may be viewed
as incremental and thus a change from 0 to 1, or from 1 to 2, is considered as
one step. A change from state 0 to state 2 requires a transformation via state
1 and requires two steps. The character is said to be ordered: 0 ↔ 1 ↔ 2. If
the direction of change of such a character is also fixed using an a priori
criterion, then the character is also said to be polarized, in this example
either as 0 → 1 → 2, 2 → 1 → 0 or 0 ← 1 → 2. In contrast, unordered
characters are allowed to change from any one state into any other state with
equal cost and thus nine transformations are possible (Fig. 2.3a). These
possibilities can be restricted to a set of three transformations by selecting
a particular character order and then to a single sequence by making another
choice regarding polarity. (See Chapter 3 for a detailed account of polarity,
rooting and optimization.)

Fig. 2.3 (a) The nine possible transformations for a multistate character with
three states, 0, 1, 2: 0→1→2, 0→2→1, 1→0→2, 1→2→0, 2→1→0, 2→0→1, 0←1→2,
0←2→1, 1←0→2. (b) The three allowable transformations between three states
following imposition of the order shown in (c). [Panels (b) and (c) not recoverable.]
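
The counts of nine and three transformations, and the different step costs under ordered and unordered treatment, can be checked with a small Python sketch. The representation of a transformation series as a tagged tuple and all function names are assumptions made for this illustration.

```python
from itertools import permutations

STATES = (0, 1, 2)


def step_cost(a, b, ordered):
    """Steps needed to change state a into state b."""
    if ordered:
        return abs(a - b)           # ordered: pass through intermediate states
    return 0 if a == b else 1       # unordered: any change costs one step


def rooted_transformations(states):
    """Enumerate rooted transformation series for three states.

    ("linear", (x, y, z)) stands for x -> y -> z;
    ("branched", (r, (s, t))) stands for s <- r -> t.
    """
    linear = [("linear", p) for p in permutations(states)]
    branched = [("branched", (root, tuple(s for s in states if s != root)))
                for root in states]
    return linear + branched


def compatible_with_order(series):
    """True when every direct change links neighbours in the order 0-1-2."""
    kind, body = series
    if kind == "linear":
        pairs = list(zip(body, body[1:]))
    else:
        root, derived = body
        pairs = [(root, d) for d in derived]
    return all(abs(a - b) == 1 for a, b in pairs)


all_series = rooted_transformations(STATES)
allowed = [s for s in all_series if compatible_with_order(s)]
```

Six linear and three branched series make up the nine possibilities. Imposing the order 0-1-2 leaves the two linear series that run along it and the branched series rooted at state 1, and a choice of polarity then selects one of those three.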
For example, consider that through study of the ontogeny, we have
ascertained that not only is absence plesiomorphic but also that the round shape is
subsequently transformed into the square shape. The states of this multistate
character could now be considered incremental and both the order and
polarity could be included as extra information in a cladistic analysis.
2.4.5 Missing values and coding

The problem with linked characters due to absences may be circumvented by
adding extra absent/present characters, as in Table 2.4c. Taxa lacking the
features of both shape and colour are then scored with question marks (i.e. as
missing values) for character states connected to this feature (Table 2.5). This
solution draws attention to other problems. Platnick et al. (1991a) showed
that absences can occur for a variety of reasons: unknown data, inapplicable
data and polymorphism (see also §4.2). Coding problems due to terminal
polymorphism can be catered for in programs such as PAUP and MacClade,
while the problem of unknown data can only be solved by further observation.
However, the dilemma of inapplicable data remains (Pleijel 1995). Coding
methods A and B simply accommodate the problem by treating absence as a
state equivalent to both shape and colour, while in method D, the problem
cannot exist. In method C, the use of question marks does not distinguish
between inapplicability and absence due to lack of knowledge and,
furthermore, can lead to problems in interpretation of results (Platnick et al. 1991a).
A detailed account of missing data is given in Chapter 4.
Table 2.6 Sankoff cost matrix (see text for information). Character codes
follow Table 2.4: absent (0); round and black (1); round and white (2);
square and black (3); square and white (4). Character costs are shown
as 0, 1 or 2 in the matrix

       0   1   2   3   4
   0   0   2   2   2   2
   1   1   0   1   1   2
   2   1   1   0   2   1
   3   1   1   2   0   1
   4   1   2   1   1   0
2.4.6 Information content and the congruence test

For any coding method to be efficient, it must be able to transform
observations into a suitable form for cladistic analysis without loss of information, so
that the coded information can take part in the most critical test of
homology: congruence. The single multistate coding procedure (method A) (Table
2.5a) intimately ties colour and shape into a formula. Thus, it can never allow
either of the features to be tested independently of one another by other
characters. However, because there are four states, two for colour and two
for shape, it is possible to code a cost matrix (Sankoff and Rousseau 1975)
that considers the logical transformations (Table 2.6). A change from absence
to presence of any shape and colour costs two steps. However, change from
one shape to another, or from one colour to another, costs one step. Loss of
the feature also costs one step. Such cost matrices make no assumptions
about the direction of transformations, but allow differential costs to be
applied to different state changes within a single multistate character. Cost
matrices are discussed further under the heading of 'generalized
optimization' in §4.1.5.
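
The rules behind the cost matrix can be expressed as a short Python sketch that derives each cell from the features carried by the two states. The names `FEATURES`, `cost` and `cost_matrix` are hypothetical, rows are taken to be the starting state and columns the resulting state, and the cost of two steps for a simultaneous change of shape and colour is an assumption read from Table 2.6 rather than stated in the text.

```python
# State codes follow method A: 0 absent, 1-4 the four shape/colour combinations.
FEATURES = {
    0: None,
    1: ("round", "black"),
    2: ("round", "white"),
    3: ("square", "black"),
    4: ("square", "white"),
}


def cost(src, dst):
    """Cost of transforming state src into state dst under the stated rules."""
    if src == dst:
        return 0
    if FEATURES[src] is None:       # gain of a shape and a colour: two steps
        return 2
    if FEATURES[dst] is None:       # loss of the feature: one step
        return 1
    # one step for each of shape and colour that differs
    return sum(a != b for a, b in zip(FEATURES[src], FEATURES[dst]))


cost_matrix = [[cost(i, j) for j in range(5)] for i in range(5)]
```

The matrix is asymmetric only in its first row and column, because gaining the feature costs two steps whereas losing it costs one.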
In contrast, absence/presence coding (method D) (Table 2.5d) is a
formulation of all potential homologies. Methods B and C (Tables 2.4b-c) are
problematic in that they make transformation assumptions in the codings that
cannot be tested by congruence. Character states represent homologies in the
formulation of a character that are tested by congruence of similar character
state codings of other characters. However, if two character states are locked
into the same character, e.g. round with square or black with white, at least
one of the alternatives can never be tested by congruence. Instead, the
homology of the two states is assumed a priori. The absence/presence coding
method avoids making homology assumptions between features by coding
them all as separate characters. This method of coding allows every character
to be tested by every other character, but ignores the logical dependency
between the two manifestations of shape and the two of colour. It denies the
primary homology assessment based on similarity and can lead to
pseudo-parsimonious reconstructions during cladistic analysis (Meier 1994).
2.5 MORPHOMETRIC DATA IN CLADISTIC ANALYSIS
All character states (as used in cladistic analysis) are frequency distributions
of attribute values over a sample of individuals of a taxon (Thiele 1993).
Consequently, there are many situations in which continuously variable
morphometric data have to be considered as cladistic characters, even when
the taxa have overlapping frequency distributions. It is quite possible that
continuous values, however opaque in the raw form, contain grouping
homologies when identified through discrete coding and cladistic analysis.
Continuous variables should only be excluded if the cladistic analysis cannot
handle such data or if it can be shown empirically that those characters
convey no information or phylogenetic signal relative to other characters in
the data matrix.
2.5.1 Coding morphometric data

There are a number of methods (e.g. in MacClade) that can handle
morphometric data without recoding and that have some limited use for looking at
character evolution over trees (e.g. Swofford and Berlocher 1987, Huey and
Bennett 1987). However, in order to compare continuous variables with
qualitative variables on cladograms, continuous characters have often been
recoded as discrete characters (e.g. Cranston and Humphries 1988, Chappill
1989, Thiele and Ladiges 1988).
Generally, all methods can be described as gap-coding methods, although
there are numerous variations: simple gap-coding (Mickevich and Johnson
1976); segment coding (Colless 1980); divergence coding (Thorpe 1984,
Almeida and Bisby 1984); generalized gap-coding (Archie 1985, Goldman
1988); range coding (Baum 1988) and gap-weighting (Thiele 1993). They all
have one thing in common: a simple algorithm to create gaps so as to
produce discrete codes for overlapping or continuous values. Samples of taxa
are ranked along a scaled attribute axis, and then the attribute axis is divided
into states. Simple gap-coding divides the axis at those points where no values
occur or between the means of the frequency distributions at the point where
the 'gap' exceeds a particular preconceived value, such as one standard
deviation about the mean. Usually, the attribute axis will be divided into
fewer states than there are taxa and for most computer programs there is an
upper bound to the number of states per character that can be analysed (32
in PAUP, 26 in MacClade and 10 in Hennig86, PIWE and NONA).
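
Simple gap-coding as described above can be sketched in Python: taxon means are ranked along the attribute axis and a new state begins wherever the gap between neighbouring means exceeds a chosen threshold. The function name, the taxon means and the threshold are hypothetical; in practice the threshold might be a value such as one standard deviation.

```python
def simple_gap_code(means, threshold):
    """Assign integer states by splitting ranked means at gaps above threshold."""
    ranked = sorted(means.items(), key=lambda item: item[1])
    codes = {}
    state = 0
    previous = None
    for taxon, value in ranked:
        if previous is not None and value - previous > threshold:
            state += 1              # gap wide enough: start a new state
        codes[taxon] = state
        previous = value
    return codes


# Hypothetical taxon means on an attribute axis and a hypothetical threshold.
means = {"A": 1.0, "B": 1.4, "C": 3.0, "D": 3.3, "E": 6.0}
codes = simple_gap_code(means, threshold=1.0)
```

With these values the five taxa fall into three states, fewer than the number of taxa, which is the usual outcome noted in the text.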
2.5.2 Gap-weighting
The following method from Thiele (1993) is one of several methods that
provides more than simple gap-coding by adding a weight code.
Gap-weighting applies additive weights to each code so that the score in
the column of the data matrix not only relates to the position of each state
relative to every other state over the range, but also maintains the relative
sizes of the gaps between them. A suitable rescaling function is also used to
allow the full range of integers that can be handled by a given cladistic
computer program to be used and thus ensure that as much of the raw
attribute data as possible is utilized in the codes.

Fig. 2.4 Example of coding using the gap-weighting method of Thiele (1993). (a)
Frequency distribution curves for eight taxa, A-H. (b) Means for the taxa on the
attribute scale. (c) Values scaled to a range of 10 (0-9). (d) Integer coding for
analysis using Hennig86.
1. The raw data are initially ranked as an ordered set of states, arranged
   according to the values of the means, medians or other appropriate
   measure of range. If the variances are not equal, the data should be
   standardized using an appropriate transformation. One of the simplest is
   log(x + 1) (Fig. 2.4a).
2. The data are then range standardized, e.g.:

   $$x_s = \frac{x - \min}{\max - \min} \times n$$

   where x is the raw datum, x_s is the standardized datum and n is the
   maximum number of ordered states allowed by the cladistic computer
   program (Fig. 2.4b).
3. Code the values as the rounded integer of the standardized values (Fig.
   2.4c, d).
4. Treat the character as an ordered multistate for analysis (Fig. 2.4d).
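
The four steps can be sketched in Python as follows. The function name, its parameters and the taxon means are hypothetical. One assumption departs from the formula as printed: the standardized value is multiplied by the number of states minus one, so that with ten states the codes run from 0 to 9, the range named in the caption of Fig. 2.4.

```python
import math


def gap_weight(means, n_states, log_transform=False):
    """Code taxon means as ordered integer states 0 .. n_states - 1."""
    values = dict(means)
    if log_transform:                                # step 1: optional log(x + 1)
        values = {t: math.log(v + 1) for t, v in values.items()}
    low, high = min(values.values()), max(values.values())
    top = n_states - 1                               # highest code (assumption)
    codes = {}
    for taxon, v in values.items():
        scaled = (v - low) / (high - low) * top      # step 2: range standardization
        codes[taxon] = int(math.floor(scaled + 0.5)) # step 3: round to an integer
    return codes                                     # step 4: analyse as ordered


# Hypothetical taxon means, coded for a program limited to 10 ordered states.
means = {"A": 10, "B": 12, "C": 25, "D": 40, "E": 55}
codes = gap_weight(means, n_states=10)
```

Taxa A and B, separated by a narrow gap, receive the same code, while the wider gaps between B, C, D and E are preserved as differences of three steps each when the character is analysed as an ordered multistate. The sketch assumes that the means are not all equal, since the range would then be zero.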
Thiele (1993) considered the essential elements of the method to be as
follows.

1. The rescored characters retain information on the states and the sizes of
   the gaps between states. Consequently, transformations between states
   are weighted proportionally to the sizes of the gaps separating them.
2. All differences between states are accepted as potentially informative.
   Parsimony is relied upon to discriminate truly informative gaps from
   spurious ones, rather than using a priori statistical tests.
3. Differences within and between characters are equalized using a
   transformation and range-standardization procedure.

The method weights explicitly, using the assumption that transformations
between widely separated states are more likely to be informative than those
between narrowly separated states. This is important when characters
conflict, especially different continuous variables. However, this assumption will
be contested when broader patterns of congruence become the final arbiter
during analysis. In a conventional analysis, binary 0/1 characters have a
range of 0 to 1. Gap coded characters have ranges of 0-9, 0-26 or 0-32
depending upon the computer program used. It is important, therefore,
that binary characters are either weighted by 10, 26 or 32 or coded as 0/9,
0/26 or 0/31, so as to maintain parity.
2.6 DISCUSSION: CHARACTER DISCOVERY AND CODING
The aims of character discovery and character coding are to identify as
accurately as possible those features that ultimately diagnose relationships of
taxa. Characters should be determined and scored so that all possible
hypotheses of homology or synapomorphy can be examined through cladistic
analysis. Characters come from many sources and the aims of comparative
biology and systematics are one and the same: to determine the relevant
level of universality at which particular characters should be placed on
cladograms so as to provide hypotheses of relationships between organisms
and groups.
2.6.1 Choice of characters

The debate as to what is a good cladistic character has produced a scale of
preferences from clear-cut qualitative differences that prove to be robust
homologies to quantitative variables that need to be heavily manipulated
using special coding procedures to extract a potential phylogenetic signal. In
almost all reported analyses, it is the clear-cut qualitative characters that
delimit groups unambiguously. No doubt it is true that qualitative data are
more reliable than others and the perfect cladistic character is one
that has [...]. Continuous characters
invariably produce cladograms with lower levels of fit than qualitative
characters. Cladists wishing to use continuous characters have employed various
procedures in order to include them. There is often significant covariation
with other characters, even when continuous characters are recoded as
ordered multistates, which suggests that they do tend to operate as linear
series. In many cases, morphometric and qualitative characters are found to
map similar phylogenies and be informative about those phylogenies. It is
likely that morphometric data will continue to be used most often in studies
of closely related taxa, while presence/absence characters will be used in
studies of higher ranking taxa. The judgement that all morphometric data is
garbage is unnecessarily harsh and it is still open to debate what constitutes
reliable evidence in cladistic analysis.
2.6.2 Coding
There are two schools of thought on coding methods: those that advocate
absence/presence coding and those that consider additivity and multistate
coding as appropriate for diagnosis of taxic relations. Absence/presence
coding is invariably binary and contrasts presence against absence. There are
very few cases of true absence/presence coding in the literature, most likely
due to linkage problems (Pleijel 1995). The most commonly used forms of
coding are methods A, B and C (Table 2.4). Pimentel and Riggins (1987)
considered that all cladistic characters should be treated as multistates and
ideally coded as multiple column additive binary characters in order to
distinguish a priori between linear and branched character state trees. They
considered that character states cannot be treated as simple, nominal
variables because redundancy is introduced and information content is sacrificed.
In other words, additivity is a form of information and there are many
reasons to be sympathetic to this viewpoint, especially in special cases, such
as nucleotide sequences. Here, the nucleotide codes (A, C, G, T) are invariably
considered as alternatives in multistate columns. Application of
absence/presence coding has yet to be considered in molecular systematics and there
is no body of opinion that considers base substitution as anything other than
a special form of character state transformation.
The issue of whether one uses multistate or binary coding revolves around
the issue of transformation between character states (Wilkinson 1995).
Devotees of multistate coding accept that characters should be treated as
transformation series and that hypotheses of adjacency between similar, but different,
character states should be coded a priori in the character state matrix.
Transformation series analysis (TSA) is perhaps the most elaborate
manifestation of this method (Mickevich 1982). In contrast, absence/presence coding
is a more simple and straightforward approach than any of the alternatives.
Every variable is kept separate as a potential synapomorphy to be tested
against other coded observations. Although redundancy of additional absence
scores may be a problem, there are advantages in not building unwarranted
assumptions into the data. The advantage is that, instead of making decisions
a priori, character hierarchies emerge from the results (Pleijel 1995).
Three-item statements analysis (Chapter 7) takes the argument further and, by
converting characters into minimal expressions of the kind A(BC), aims to
move away from ideas of transformation in cladistic analysis.
2.7 CHAPTER SUMMARY
1. Cladistics is the discovery or selection of characters and taxa, coding of
characters and determination of cladograms using the property of homology
and the criterion of parsimony to best explain the distribution of
characters over the taxa.
2. Modern data sets generally require characters to be scored as discrete
alphanumeric codes in columns and taxa in rows.
3. A variety of filters operate between the initial discovery procedure and the
recording of variation in a data matrix. The most commonly used filter in
cladistic analysis tends to reject characters that are continuous and
quantitative and favours instead characters that are discrete and qualitative.
However, it is demonstrated here that the main issue is to determine for
all characters which are cladistically useful or not, and that all characters
can be arranged on a sliding scale of most useful to least useful. To reject
quantitative and continuous data in favour of qualitative and discrete data
implies that quantitative, qualitative, continuous and discrete refer to
different kinds of data differing in value, and that some quantitative variables
can be determined a priori to have greater systematic value than others.
4. The interplay between the discovery procedure of determining characters
and cladistic analysis determines which structures are homologues and
which are not. Homology is the core concept of comparative biology and
systematics. For cladistic analysis to be successful we believe that it is
necessary to have principles that do not assume transformation, but to
describe characters as hypotheses of homology that can be tested with the
criteria of similarity, conjunction and congruence.
5. There are many different ways of coding characters and the outcomes of
different coding schemes can dramatically affect hypotheses of relationships.
For discrete characters four coding methods are described to show
the differences between various kinds of multistate coding and binary
coding. Character coding methods are discussed under four different
aspects of character linkage (or dependency between characters in a single
matrix), hierarchical dependency, missing values and information content,
to show that coding methods that offer the most stringent test of homology
are those to be preferred. Finally, there is a brief discussion of the
principles of coding characters.
3. Cladogram construction, character polarity and rooting
3.1 DISCOVERING THE MOST PARSIMONIOUS CLADOGRAMS
3.1.1 Hennigian argumentation
The first explicit method of cladogram construction was proposed by Hennig
(1950, 1966) and is thus called Hennigian argumentation. This method
considers the information provided by each character individually. Groups on
cladograms are recognized by the possession of apomorphies and, in
Hennigian argumentation, these apomorphies are identified a priori, that is,
the characters are polarized into plesiomorphic and apomorphic states before
the cladogram is constructed. The subject of character polarity and the
recognition of apomorphies is discussed in detail later in this chapter. For
now, it is assumed that the apomorphies have been recognized and coded 1,
with the plesiomorphic states coded 0.
Consider the data set in Table 3.1 (derived from the characters in Fig. 1.3,
p. 4), in which we wish to resolve the interrelationships of the shark, salmon
and lizard. This group comprises the study taxa, also referred to as the
ingroup. Taxa with which members of the ingroup are compared in order to
polarize characters are termed outgroup taxa, in this case, the lamprey. First,
consider character 1. All the ingroup taxa share the apomorphic state, in
contrast to the outgroup, which has the plesiomorphic state. Character 1 thus
unites the ingroup as a monophyletic group (Fig. 3.1a). Character 2 shows the
same distribution and corroborates the information contained in character 1.
Repeating this approach, characters 3 and 4 can be seen to unite the salmon
and the lizard into a monophyletic group to the exclusion of both the lamprey
and the shark (Fig. 3.1b). The relationships among the ingroup taxa are now
fully resolved. Characters 5-12 are autapomorphies of individual ingroup
taxa and thus have no role in cladogram construction at this level of
universality (Fig. 3.1c). However, if this solution is accepted, then character
13 must be placed on to the cladogram twice, resulting in a total of 14 steps.
An alternative topology would unite the shark and the salmon using the
putative apomorphic state of character 13 (Fig. 3.1d). However, as seen in
Chapter 1, this topology would entail two occurrences each for characters 3
and 4. It would therefore have 15 steps and would be a less
parsimonious result.
Table 3.1 Data set to illustrate Hennigian argumentation

Lamprey   0000000000010
Shark     1100000000101
Salmon    1111000001001
Lizard    1111111110000
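
The step counts quoted for the two competing topologies can be checked mechanically. The following is a minimal sketch, assuming Fitch's counting procedure for unordered characters (a technique the text does not name) and a nested-tuple notation for cladograms that is purely illustrative; the character strings are those of Table 3.1.

```python
# Character data from Table 3.1 (13 binary characters per taxon).
DATA = {
    "lamprey": "0000000000010",
    "shark":   "1100000000101",
    "salmon":  "1111000001001",
    "lizard":  "1111111110000",
}

def fitch(tree, column):
    """Return (possible states, steps) for one character on a nested-tuple tree."""
    if isinstance(tree, str):                      # a terminal taxon
        return {DATA[tree][column]}, 0
    left_set, left_steps = fitch(tree[0], column)
    right_set, right_steps = fitch(tree[1], column)
    shared = left_set & right_set
    if shared:                                     # the two branches agree: no change
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1

def length(tree):
    """Total number of steps over all characters."""
    n_chars = len(next(iter(DATA.values())))
    return sum(fitch(tree, column)[1] for column in range(n_chars))

# Hypothetical encodings of the topologies of Fig. 3.1c and Fig. 3.1d.
tree_c = ("lamprey", ("shark", ("salmon", "lizard")))
tree_d = ("lamprey", (("shark", "salmon"), "lizard"))

print(length(tree_c), length(tree_d))
```

Under these assumptions the first topology requires 14 steps and the second 15, in agreement with the counts obtained by argumentation.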
With small data sets that are reasonably free from homoplasy, Hennigian
argumentation is quick and simple to implement. However, most data sets
have large numbers of taxa and characters, as well as greater degrees of
homoplasy, which makes finding the most parsimonious cladograms by
Hennigian argumentation extremely time-consuming. Thus, computerized
methods have been developed that speed up the search for most
parsimonious or minimum-length cladograms.
3.1.2 Exact methods

Such computerized methods fall into two categories. Exact methods are those
that guarantee to find one or all of the shortest cladograms. Of these, the
simplest to understand is exhaustive search, in which every possible fully
resolved, unrooted cladogram for all the included taxa is examined and its
length calculated. In this way, it is certain that all minimum-length
cladograms will be identified.

Fig. 3.1 Cladograms (a)-(d) for the data in Table 3.1. See text for explanation.

Fig. 3.2 Illustration of the exhaustive search strategy for determination of most
parsimonious cladograms. See text for explanation.

A simple algorithm to perform an exhaustive search is outlined in Fig. 3.2.
First, three taxa are chosen and connected to form the only possible unrooted,
fully resolved cladogram for these taxa (Fig. 3.2, top cladogram). A fourth
taxon is then selected (which taxon is chosen is immaterial) and joined to each
of the three branches of cladogram 1, yielding three possible networks for four
taxa (Fig. 3.2, second row). A fifth taxon is then selected and added to each of
the five branches of the three cladograms, giving 15 cladograms (Fig. 3.2, rows
3-5). This procedure, in which the nth taxon is added to every branch of every
cladogram (each of which contains n - 1 taxa) generated in the previous step, is
continued until all possible cladograms for n taxa have been constructed.
Finally, the lengths of all these cladograms are calculated and the shortest
chosen as optimal (most parsimonious).
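
The procedure just described can be sketched in a few lines. This is an illustrative sketch only: the nested-tuple notation, the function names and the use of Fitch counting to obtain lengths are assumptions made for the example, and each network is drawn from its first taxon purely as a bookkeeping device (the length does not depend on that choice).

```python
# Character data from Table 3.1; any equal-length state strings would do.
DATA = {
    "lamprey": "0000000000010",
    "shark":   "1100000000101",
    "salmon":  "1111000001001",
    "lizard":  "1111111110000",
}

def insertions(tree, taxon):
    """Yield each tree formed by joining taxon to one branch of tree."""
    yield (tree, taxon)                    # the branch leading to this node
    if isinstance(tree, tuple):            # then every branch further down
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)

def all_cladograms(taxa):
    """Every fully resolved cladogram for taxa, built one taxon at a time."""
    partial = [(taxa[1], taxa[2])]         # with taxa[0]: the only 3-taxon network
    for taxon in taxa[3:]:
        partial = [new for old in partial for new in insertions(old, taxon)]
    return [(taxa[0], rest) for rest in partial]

def steps(tree, column):
    """Return (possible states, step count) for one character (Fitch counting)."""
    if isinstance(tree, str):
        return {DATA[tree][column]}, 0
    (a, m), (b, n) = steps(tree[0], column), steps(tree[1], column)
    return (a & b, m + n) if a & b else (a | b, m + n + 1)

def length(tree):
    n_chars = len(next(iter(DATA.values())))
    return sum(steps(tree, column)[1] for column in range(n_chars))

def exhaustive_search(taxa):
    """Score every cladogram and keep all those of minimum length."""
    scored = [(length(tree), tree) for tree in all_cladograms(taxa)]
    shortest = min(score for score, _ in scored)
    return shortest, [tree for score, tree in scored if score == shortest]

print(len(all_cladograms(list("ABCDE"))))   # number of networks for five taxa
print(exhaustive_search(list(DATA)))
```

For the four taxa of Table 3.1 only three networks exist, so the search is trivial; the point of the sketch is the way the candidate set is generated.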
Unfortunately, searching for most parsimonious cladograms is what
mathematicians term a 'hard' problem, that is, one that requires an exponentially
rising number of steps to solve as the size of the problem grows. That this is
true of the search for most parsimonious cladograms can be appreciated by
inspecting the number of fully resolved, unrooted networks that must be
evaluated as the number of taxa increases (Table 3.2). For a cladogram that
currently includes n - 1 taxa, there are 2n - 5 possible positions to which the
nth taxon can be attached. So, while there are 105 fully resolved, unrooted
cladograms for six taxa, there are over 2 x 10^20 topologies for only 20 taxa,
while the number exceeds 10^100 for as few as 63 taxa. Thus, it is doubtful
whether exhaustive search is a practical option for problems with more than a
moderate number of taxa.

Table 3.2 The number of fully resolved, unrooted networks possible for n taxa

 n   Networks
 1
 2
 3                        1
 4                        3
 5                       15
 6                      105
 7                      945
 8                    10395
 9                   135135
10                  2027025
11                 34459425
12                654729075
13              13749310575
14             316234143225
15            7905853580625
16          213458046676875
17         6190283353629375
18       191898783962510625
19      6332659870762850625
20    221643095476699771875
62    6.66409461 x 10^98
63    >10^100
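
The growth in the number of networks follows directly from the rule that the nth taxon can be attached in 2n - 5 positions. A short sketch (the function name is ours) multiplies these factors together and finds the first n for which the total exceeds 10^100:

```python
from itertools import count

def n_networks(n):
    """Number of fully resolved, unrooted networks for n >= 3 taxa."""
    total = 1
    for k in range(4, n + 1):
        total *= 2 * k - 5          # positions available to the k-th taxon
    return total

for n in (3, 4, 5, 6, 10, 20):
    print(n, n_networks(n))

# Smallest number of taxa for which the count exceeds 10^100.
first_over = next(n for n in count(3) if n_networks(n) > 10**100)
print(first_over)
```

Python integers are of arbitrary size, so the products are exact rather than floating-point approximations.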
Fortunately, there is an exact method available that does not require every
completed topology to be examined individually: the branch-and-bound
method. The branch-and-bound procedure closely resembles that of exhaustive
search, but begins with the calculation of a cladogram using one of the
heuristic methods described later in this chapter. The length of this
cladogram is retained as a reference length or upper bound for use during
subsequent cladogram construction. The branch-and-bound method then
proceeds in a similar manner to exhaustive search but now, as the path is
followed, the lengths of the partial networks are calculated at each step and
compared with that of the upper bound. As soon as the length of a partial
network exceeds that of the upper bound, that path of cladogram construction
is abandoned, because the attachment of additional taxa can serve only
to increase the length further. By this means, the number of completed
cladograms that must be evaluated is greatly reduced.
Once a particular path is complete and all taxa have been added, then the
length of the resultant cladogram is once more compared with the upper
bound. If its length is equal to the upper bound, then this cladogram is
retained as one of the set of optimal topologies and the branch-and-bound
process continued. However, if the length is less than the upper bound, then
this topology is an improvement and its length is substituted as the new upper
bound. This substitution procedure is important because it enables subsequent
paths to be abandoned more quickly. Once all possible paths have been
examined, then the set of optimal cladograms will have been found.
It is impossible a priori to estimate the exact time required to undertake a
branch-and-bound analysis, as this is a complex function of computer
processor speed, algorithmic efficiency and the structure of any homoplasy in
the data. Most branch-and-bound applications employ algorithmic devices to
ensure the early abandonment of path searches and thus reduce computation
time. For example, efficient heuristic methods can minimize the initial
estimate of the upper bound. However, a branch-and-bound analysis is still
time consuming to implement and should not generally be considered for
data sets comprising large numbers of taxa.
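
The logic of branch-and-bound can be sketched as a recursive search in which a path is dropped as soon as its partial network is longer than the current bound. This is a minimal sketch, not the algorithm of any particular package: the nested-tuple notation, the function names and the use of Fitch counting are assumptions made for the example, and the initial upper bound is simply supplied by the caller (here the 15 steps of the suboptimal topology of Fig. 3.1d) rather than computed heuristically.

```python
# Character data from Table 3.1.
DATA = {
    "lamprey": "0000000000010",
    "shark":   "1100000000101",
    "salmon":  "1111000001001",
    "lizard":  "1111111110000",
}

def insertions(tree, taxon):
    """Yield each tree formed by joining taxon to one branch of tree."""
    yield (tree, taxon)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)

def steps(tree, column):
    """Return (possible states, step count) for one character (Fitch counting)."""
    if isinstance(tree, str):
        return {DATA[tree][column]}, 0
    (a, m), (b, n) = steps(tree[0], column), steps(tree[1], column)
    return (a & b, m + n) if a & b else (a | b, m + n + 1)

def length(tree):
    n_chars = len(next(iter(DATA.values())))
    return sum(steps(tree, column)[1] for column in range(n_chars))

def branch_and_bound(taxa, upper_bound):
    """Return (minimum length, optimal cladograms).

    upper_bound must be the length of some real cladogram for these taxa.
    """
    bound, optimal = upper_bound, []

    def extend(rest, placed):
        nonlocal bound, optimal
        tree = (taxa[0], rest)
        current = length(tree)
        if current > bound:               # adding taxa can only lengthen it
            return                        # so abandon this path
        if placed == len(taxa):           # a completed cladogram
            if current < bound:           # an improvement: tighten the bound
                bound, optimal = current, [tree]
            else:                         # equal to the bound: retain it
                optimal.append(tree)
            return
        for new_rest in insertions(rest, taxa[placed]):
            extend(new_rest, placed + 1)

    extend((taxa[1], taxa[2]), 3)
    return bound, optimal

print(branch_and_bound(list(DATA), upper_bound=15))
```

With only four taxa every path is completed before it can be abandoned, so the saving is negligible here; the pruning test begins to matter once partial networks of five or more taxa exist.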
3.1.3 Heuristic methods

For larger data sets, approximate or heuristic methods must be adopted,
which generally use 'hill-climbing' techniques. These approaches are
essentially trial-and-error and do not guarantee to find all, or even any, of the
minimum-length cladograms. Thus certainty of finding the optimal result is
sacrificed in favour of reduced computational time.
As an analogy, consider a group of hikers who are aiming to climb to the
top of a mountain as fast as possible, with the aid of a map and a compass.
However, as they begin their ascent, they walk into a mist that obscures their
view of all but the immediate vicinity. In order to reach the top in the
shortest possible time, the hikers' best strategy would be to walk up the
mountain following the line of steepest ascent. The map shows them that this
line is always perpendicular to the contour line passing through their current
location. By following this line, they will eventually reach the summit.
However, if there is more than one peak to the mountain, or it has
subsidiary hills and ridges, then this approach might yield only a locally
optimal result, in that the hikers will simply reach the peak nearest to their
starting point. There may be a higher summit elsewhere, but they would be
unable to reach it because to do so would entail going down from their
current position and descent is forbidden in hill-climbing. One strategy, which
is available to hill-climbing computer algorithms, but probably forbidden to
even the most athletic hiker, is to leap horizontally from one peak to another
in an attempt to move from a slope that would lead only to a local optimum
to the one that would lead to the global optimum. Such a procedure is
allowed because it does not entail descent. But this might still be insufficient
because the various slopes may be too widely separated to be reached. Such
isolated clusters are referred to as 'islands'. If such islands exist, then one way
to maximize the chance of reaching the true highest summit, the global
optimum, is to take several randomly chosen starting points and choose that
which leads to the highest summit, with or without leaping.
Although the hikers analogy may seem frivolous, its terminology can be
translated directly into that of searching for minimum-length cladograms.
The highest peak is the set of minimum-length cladograms, the global
optimum. The simplest computer algorithms merely make a single pass
through the data and construct a single topology. This is equivalent to
following the local gradient from where the hikers started. However, although
the judicious addition of taxa to the partial cladogram may improve the
outcome, the resultant cladogram is most likely to be only locally optimal
unless good fortune prevails. More complex routines begin with a single
topology, then seek to locate the global optimum by rearranging the
cladogram in various ways. These branch-swapping algorithms are equivalent to
jumping between hills. But branch-swapping routines are constrained to try
always to decrease the length of the cladogram with which they are currently
working. Thus, if the global optimum exists within an alternative set of
topologies that can only be reached from the current position by
branch-swapping longer cladograms than are currently to hand, then the global
optimum can never be reached from that starting point. This is the 'islands of
trees' problem. If multiple islands do exist (something that is usually unknown
prior to analysis), then we can endeavour to land on the island that includes
the most parsimonious solution by running several analyses, each of which
starts from a topologically distinct cladogram.
Stepwise addition
Stepwise addition is the process by which taxa are added to the developing
cladogram in the initial building phase of an analysis. Initially, a cladogram of
three taxa is chosen, then a fourth is added to one of its three branches. A
fifth taxon is then selected and added to the network, followed by a sixth and
so on, until all taxa have been included. There are various methods for
choosing the initial three taxa, the addition sequence of the remaining taxa,
and the branch of the incipient cladogram to which each will be added.
The least sophisticated addition sequence selects the first three taxa in the
data set to form the initial network and then adds the remaining taxa in the
order in which they appear in the data set. The increases in length that would
result from attaching a taxon to each branch of the partial cladogram are
calculated and the branch selected that would result in the smallest increase.
A variation on this procedure uses a pseudorandom number generator to
reorder the taxa in the data set prior to cladogram construction. A more
elaborate procedure was devised by Farris (1970), which he termed the
'simple algorithm'. First, a reference taxon is chosen, usually the first taxon in
the data set. Then the difference between this taxon and each of the others is
calculated as the sum of the absolute differences between their characters.
Farris called this the 'advancement index'. The initial network is then
constructed from the reference taxon and the two other taxa that are closest
to it, i.e. those that have the lowest advancement indices. The remaining taxa
are then added to the developing cladogram in order of increasing
advancement index, with ties being broken arbitrarily.
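
As a worked illustration of the advancement index, the sketch below applies the calculation described above to the data of Table 3.1, taking the lamprey (the first taxon in the data set) as the reference taxon. The function names are ours, and ties are broken by data-set order, which is just one arbitrary choice.

```python
# Character data from Table 3.1; the first taxon serves as the reference.
DATA = {
    "lamprey": "0000000000010",
    "shark":   "1100000000101",
    "salmon":  "1111000001001",
    "lizard":  "1111111110000",
}

def advancement_index(reference, other):
    """Sum of absolute differences between the character codes of two taxa."""
    return sum(abs(int(a) - int(b))
               for a, b in zip(DATA[reference], DATA[other]))

def simple_addition_sequence(taxa):
    """Reference taxon first, then the rest by increasing advancement index."""
    reference, others = taxa[0], taxa[1:]
    # sorted() is stable, so tied taxa keep their data-set order.
    ranked = sorted(others, key=lambda t: advancement_index(reference, t))
    return [reference] + ranked

for taxon in ("shark", "salmon", "lizard"):
    print(taxon, advancement_index("lamprey", taxon))
print(simple_addition_sequence(list(DATA)))
```

Here the shark and salmon have the two lowest indices, so they would join the lamprey in the initial network, with the lizard added last.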
In all of these methods, the order of taxon addition is determined before
cladogram construction is begun. In contrast, Swofford (1993) has
implemented a dynamic procedure, which he calls 'closest', in which the addition
sequence is continually reassessed as the cladogram is built. First, the lengths
of the networks for all possible triplets of taxa are calculated and the shortest
chosen. Then at each subsequent step, the increase in length that would
follow from attaching each of the unselected taxa to each branch of the
developing cladogram is calculated and the taxon/branch combination that
gives the smallest increase in overall length is chosen. As in all methods, ties
are broken arbitrarily. This procedure requires much more computing time
than do the other addition sequences. In these, the number of increases in
cladogram length that must be calculated at any one step is equal only to the
number of possible attachment points (branches). The dynamic procedure
multiplies this by the number of unplaced taxa.
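
The difference in workload can be made concrete with a little arithmetic. The sketch below (a hypothetical helper, not part of any program mentioned here) counts the length increases that must be calculated while taxa are added to an initial three-taxon network, using the fact that a fully resolved network of k taxa has 2k - 3 branches. It ignores the initial evaluation of all triplets that the dynamic procedure also requires.

```python
def evaluations(n_taxa, dynamic=False):
    """Length increases calculated while adding taxa 4..n to a 3-taxon network."""
    total = 0
    for placed in range(3, n_taxa):        # taxa already on the cladogram
        branches = 2 * placed - 3          # possible attachment points
        unplaced = n_taxa - placed         # candidates in the dynamic procedure
        total += branches * (unplaced if dynamic else 1)
    return total

for n in (10, 20, 50):
    print(n, evaluations(n), evaluations(n, dynamic=True))
```

For ten taxa the fixed sequences need 63 such calculations and the dynamic procedure 196; the gap widens rapidly as taxa are added.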
No one addition sequence works best for all data sets. The less
sophisticated methods are quicker but their inefficiency results in cladograms that
may be far from optimal and subsequent branch-swapping may then take
longer than it might. Run times using dynamic stepwise addition may be
excessive for extremely large numbers of taxa but this is less problematical as
processor speeds increase. The random addition sequence is useful in that it
can provide a number of different starting points and thereby improve the
chances that at least one will lead to the global, rather than a local, optimum.
Random addition can also be employed as a non-rigorous means to
evaluate the effectiveness of heuristic procedures. If one runs 100 replicates
using random addition and the same set of most parsimonious cladograms are
found each time, then one can be reasonably certain that these do represent
the set of globally optimal topologies for that data. However, if by the
hundredth replication additional topologies or islands are still being
discovered, then it is likely that there are even more remaining to be found.
Therefore, it is recommended that analyses be repeated several times at least,
with the input order of the taxa randomized between runs.
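
A sketch of stepwise addition with randomized input order is given below. It is illustrative only: the nested-tuple notation, the function names, the use of Fitch counting for unordered characters and the fixed random seed are all assumptions of the example, and no branch-swapping is attempted. The data are those of Table 3.3 (used later in this chapter to illustrate outgroup comparison), chosen simply because seven taxa give the greedy procedure some room to vary.

```python
import random

# Table 3.3: seven taxa scored for four unordered multistate characters.
DATA = {
    "lamprey": "0322", "shark": "0200", "salmon": "1001", "perch": "1001",
    "frog": "1010", "lizard": "2110", "bird": "2110",
}

def insertions(tree, taxon):
    """Yield each tree formed by joining taxon to one branch of tree."""
    yield (tree, taxon)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)

def steps(tree, column):
    """Return (possible states, step count) for one character (Fitch counting)."""
    if isinstance(tree, str):
        return {DATA[tree][column]}, 0
    (a, m), (b, n) = steps(tree[0], column), steps(tree[1], column)
    return (a & b, m + n) if a & b else (a | b, m + n + 1)

def length(tree):
    n_chars = len(next(iter(DATA.values())))
    return sum(steps(tree, column)[1] for column in range(n_chars))

def stepwise_addition(order):
    """Add taxa in the given order, each to the branch giving the shortest tree."""
    rest = (order[1], order[2])                    # initial three-taxon network
    for taxon in order[3:]:
        # min() keeps the first of any tied placements: an arbitrary tie-break.
        rest = min(insertions(rest, taxon),
                   key=lambda candidate: length((order[0], candidate)))
    tree = (order[0], rest)
    return length(tree), tree

def random_addition(replicates, seed=0):
    """Repeat stepwise addition with the taxa reshuffled before each run."""
    rng = random.Random(seed)
    taxa = list(DATA)
    return [stepwise_addition(rng.sample(taxa, len(taxa)))
            for _ in range(replicates)]

print(stepwise_addition(list(DATA)))               # data-set order
print(sorted({score for score, _ in random_addition(20)}))
```

If every replicate returns the same minimum length, that is some reassurance, in the non-rigorous sense described above, that the global optimum has been found; a spread of lengths indicates that some addition orders are being trapped by early placements.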
As mentioned above, a major problem with stepwise addition is that one
cannot backtrack from a given position. Algorithms with this property are
termed 'greedy'. Essentially the problem is their inability to predict the
future, that is, which of several options at a given point will ultimately lead to
the best result. The placement of a taxon on a partial cladogram may be
optimal at that point, but may be seen subsequently to have been suboptimal
once further taxa have been added to the network. Once a taxon has been
added to a particular branch of a partial cladogram, the consequences of that
decision must be accepted. The problem is most acute when ties occur early
in stepwise addition. Ties may arise because, at a given stage, the addition of
two or more taxa may increase the length by the same minimum amount, or it
may be possible to add a single taxon equally to two or more branches, or
more than one equally parsimonious topology is found. An 'incorrect'
selection may then lead well away from the global optimum. But imagine how our
hypothetical hikers could improve their chances of reaching the highest
summit if they could climb more than one hill at a time. This is what heuristic
programs do when they retain more than one topology at a given step. These
may simply be the set of shortest partial cladograms at that stage, but a fixed
number may also be selected. Then, suboptimal topologies may also be
retained. This procedure reduces the effect of ties because, to some extent
(depending upon the number of topologies retained), each of the alternatives
is followed up.
Branch-swapping
In practice though, manipulation of addition sequence alone will generally
yield only a local optimum. However, it may be possible to improve on this by
performing a series of predefined rearrangements of the cladogram, in the
hope that a shorter topology will be found. These rearrangements, commonly
referred to as 'branch-swapping', are very hit-and-miss, but if a shorter
topology does exist and sufficient rearrangements are performed, then one of
these rearrangements is likely to find it.
Branch-swapping algorithms are implemented by all cladistic computer
packages. The simplest rearrangement is nearest-neighbour interchange
(NNI), sometimes referred to as local branch-swapping. Each internal branch
of a bifurcating cladogram subtends four 'nearest-neighbour' branches, two at
either end. In Fig. 3.3, these are A + B, C, D, and E + F. NNI then exchanges
a branch from one end of the internal branch with one from the other, e.g. C
with E + F. For any internal branch, there are just two such NNI
rearrangements. The procedure is then repeated for all possible internal branches
and the lengths of the resulting topologies calculated to determine whether they
are shorter.
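
The two rearrangements available at a single internal branch can be written down directly. In this sketch the internal branch is represented, as an assumption of the example, by the pair of groups at its left end and the pair at its right end; the labels follow the nearest-neighbour groups named above.

```python
def nni_neighbours(left_pair, right_pair):
    """Return the two topologies obtained by NNI across one internal branch.

    left_pair and right_pair hold the two groups at either end of the branch.
    """
    (a, b), (c, d) = left_pair, right_pair
    return [
        ((a, c), (b, d)),      # b exchanged with c
        ((a, d), (c, b)),      # b exchanged with d
    ]

# The groups at either end of the chosen branch: A + B, C and D, E + F.
left = (("A", "B"), "C")
right = ("D", ("E", "F"))

for topology in nni_neighbours(left, right):
    print(topology)
```

A fully resolved unrooted cladogram of n taxa has n - 3 internal branches, so a complete round of NNI examines 2(n - 3) rearrangements, six in the case of the six taxa used here.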
Fig. 3.3 Example of branch-swapping by nearest-neighbour interchange. The chosen
branch (indicated by the short arrow) has a pair of nearest-neighbour groups at
either end (A + B, C; D, E + F). One of these from the left end (C) is interchanged
with each of those on the right end (D or E + F) in turn to give two new topologies.
(After Swofford and Olsen 1990).

More extensive rearrangements can be performed and such methods are
often referred to as global branch-swapping. These involve clipping the
cladogram into two or more subcladograms and then reconnecting these in
various ways, with all possible recombinations being evaluated. 'Subtree
pruning and regrafting' (SPR) clips off a rooted subcladogram (Fig. 3.4) from
the main cladogram. This is then regrafted on to each branch of the remnant
cladogram in turn and the length of the resultant topologies calculated. All
possible combinations of pruning and regrafting are evaluated. In contrast, in
'tree bisection and reconnection' (TBR; Fig. 3.5), the clipped subcladogram
is re-rooted before it is reconnected to each branch of the remnant
cladogram. All possible bisections, re-rootings and reconnections are evaluated.
The SPR and TBR routines of PAUP cut the cladogram into only two pieces,
while PIWE and NONA permit the cladogram to be cut into a maximum of
ten pieces.
The effectiveness of these branch-swapping routines in recovering the
optimum set of cladograms increases in the order: NNI, SPR, TBR. However,
the more rigorous the branch-swapping method applied, the more
comparisons there are to evaluate and the longer is the computational time required.
For large data sets, it may not be possible for analyses using TBR to be
completed within an acceptable time. As with many aspects of cladistic
analysis, a trade-off must be made between confidence in having obtained the
global optimum and computation time. Several algorithms to speed up
branch-swapping searches for most parsimonious cladograms were described by
Goloboff (1996).
Fig. 3.4 Example of branch-swapping by subtree pruning and regrafting. A rooted
subcladogram, A + B, is clipped from the main cladogram then reattached to
another branch (leading to taxon F) to give a new topology. (After Swofford and
Olsen 1990).

However, like stepwise addition, branch-swapping suffers from problems of
becoming trapped in local optima. Unless there is an unbroken series of
rearrangements between the initial cladogram and the minimum-length
topology, even branch-swapping routines will not find the optimum.
For example, reaching the most parsimonious cladogram may require passing
through a series of rearrangements each of which is the same length as the
preceding. If the current cladogram is replaced only if the new topology is
shorter, rather than simply being of the same length, then crossing these
'plateaux of optimality' will not be possible. The solution to this problem is to
retain all the most parsimonious solutions found during a given round of
rearrangements. Equally, if reaching the global optimum requires
rearrangement of cladograms that are longer than the best found so far, entrapment in
a local optimum will also occur. Again, the solution is to retain more than
one cladogram, but now to include suboptimal topologies as well, in the
expectation that one will lead to the global optimum. But even TBR will fail
to lead from an initial to the optimal cladogram if the differences between
the two, which may be [...] in [...] parts of the cladograms
(Page 1993a). Multiple cutting, as implemented in PIWE and NONA, is an
improvement in this regard, but again, for large data sets, application of such
comprehensive options may stretch computation time beyond acceptable
limits.

Fig. 3.5 Example of branch-swapping by tree bisection and reconnection. The
cladogram is divided into two unrooted subcladograms. One subcladogram is
rerooted (between B and A + C) then reattached to the other subcladogram (on the
branch leading to taxon E) to give a new topology. (After Swofford and Olsen 1990).
3.2 CHARACTER POLARITY AND ROOTING
Polarization refers to the imposition of direction onto character state change
or character transformation. A character is said to be polarized when the
plesiomorphic state has been distinguished from the apomorphic state.
Manually implemented cladistic methods, such as Hennigian argumentation,
require apomorphies to be determined in advance of cladogram construction
and thus characters must be polarized a priori. Numerous methods and
criteria for identifying the apomorphic states of characters have been
proposed and classified in various ways. However, the classification advanced by
Nelson (1973) identifies what is perhaps the most fundamental division.
Nelson recognized as 'indirect' those methods that require information from
a source external to the study taxa. Most often, this is the prior existence of a
higher-level phylogeny including the study taxa. Because the principle of
parsimony is fundamental to the application of outgroup comparison, Nelson
considered this to be the only valid indirect method. However, the higher-level
phylogeny itself must be justified in terms of one that is even more inclusive,
and so on, leading to an infinite regress. Eventually, recourse must be made
to a method that is independent of pre-existing hypotheses of relationship in
order to validate outgroup comparisons. Nelson termed such methods direct
arguments because they can be implemented using only the information
available from members of the study group. Only the criterion of ontogenetic
character precedence was considered by Nelson to be valid because it did not
rely on a priori models of character transformation.
3.2.1 Outgroup comparison

In its simplest form, the outgroup criterion for polarity determination is
defined as follows.
For a given character with two or more states within a group, the state occurring in
related groups is assumed to be the plesiomorphic state (Watrous and Wheeler 1981:
5).
Prior to 1980, application of outgroup comparison had been inconsistent and
confusion with ingroup commonality (see below) was widespread. Watrous
and Wheeler provided initial clarification with the formulation of a set of
operational rules for outgroup comparison that they termed the 'functional
ingroup/functional outgroup' (FIG/FOG) method. First, an outgroup to the
unresolved set of ingroup taxa is designated. Then, a character is selected and
through comparison with the state occurring in the outgroup, partial
resolution of the ingroup is achieved. This initial character is chosen so as to
resolve the basal node of the ingroup. The first taxon to branch from the
ingroup topology is then treated as the functional outgroup to the remaining
ingroup taxa (the functional ingroup), thereby permitting further resolution
of the ingroup. This procedure is repeated until full resolution of the ingroup
has been achieved. An example of the FIG/FOG method is shown in Fig.
3.6.
The FIG / FOG approach to ou tgroup comparison is adequ ate when the
Mates d isplayed in th e oll tgroup !Ire invariant. However, it cannot be applied
ea sily or ctlnsislcnt ly jf there i~ varia tion in the outgroup, and particularly if
unc o r m(Jre olltgroup hilla pnssess a state that is present also with in the
ingroup. The problem of • hettrolcnct.lus ()lIt grOllp was first addrc!;scd in
dewil oy Madl.lison tl a/. (1_11111 DOled Ih ;"tl determining the plesiomor-
hie sUllc uf It char 10 the members of the ingruup
50 Cladogram constrnc'ion, character polarity and rooting
"'''''
SA'","
SHAAK
SA,","
1( 11 Of I(2) """"
FRl3
""""
"""
"'''''
.ro
'","0
.ro
""'_
""'-
SHAAK SHAAK
SA,","
"''"'"
PER:H
""""
''''''
,...., """
,....,
Of" .ro
Fig. 3.6 Example of the FIG/FOG method of outgroup comparison, using the data in
Table 3.3, in which an ingroup of six gnathostomes and one outgroup, the lamprey,
are scored for four unordered, multistate characters. (a) The shark is established as
the first functional outgroup because it shares state 0 of character 1 with the
lamprey. Consequently, this state is interpreted as plesiomorphic for character 1.
However, while the remaining five ingroup taxa have been shown to form a
monophyletic group, it remains uncertain whether this clade, the Osteichthyes, is
supported by state 1 or state 2 of character 1. (b) Next, using the shark as functional
outgroup to the Osteichthyes and ignoring the lamprey, state 1 of character 3 is
found to be apomorphic for tetrapods (frog, lizard and bird) and state 1 of character
4 to be apomorphic for bony fish (salmon and perch). (c) Finally, using the bony fish
as functional outgroup to the tetrapods, state 1 of character 2 is seen to unite the
two amniotes (lizard and bird) into a monophyletic group that excludes the frog. (d)
The relative apomorphy of states 1 and 2 of character 1 can now be resolved, with
state 1 supporting the monophyly of the Osteichthyes and state 2 the monophyly of
the Amniota, and the outstanding autapomorphic states placed onto the fully
resolved cladogram.
would frequently lead only to locally parsimonious solutions. Failure to
achieve global optimality might also result if outgroup comparison was taken
to indicate the character state present in the most recent common ancestor of
the ingroup. These two defective procedures were often combined to produce
the 'groundplan estimation' method, which, although logically
Table 3.3 Data set to illustrate the FIG/FOG method of outgroup comparison

              Character
Taxon      1   2   3   4
Lamprey    0   3   2   2
Shark      0   2   0   0
Salmon     1   0   0   1
Perch      1   0   0   1
Frog       1   0   1   0
Lizard     2   1   1   0
Bird       2   1   1   0
flawed (Nixon and Carpenter 1996a), is still strongly advocated in certain
quarters. Similarly, if the state found most commonly among the outgroup
taxa is taken to be plesiomorphic, then again only a local optimum may be
found, depending upon the interrelationships of the outgroup taxa and the
distribution of the various character states among them.
Rather than estimating the condition present in the most recent common
ancestor of the ingroup (the 'ingroup node'; Fig. 3.7), Maddison et al. argued
that it is the state at the next most distal node (the 'outgroup node'; Fig. 3.7)
that must be determined if the global optimum is to be found. This is the
node that unites the ingroup and the first outgroup into a monophyletic unit.
The state assigned to the outgroup node is termed 'decisive' if it can assume
only a single most parsimonious value, or 'equivocal' (or ambiguous) if more
than one value may be applied. Furthermore, relationships among the outgroup
taxa are assumed to be both known and fixed.
Visual inspection is sufficient to resolve the simplest cases. For example, if
the outgroup taxa all agree in the state they possess, then this state is the
decisive assignment at the outgroup node. Conversely, if the first two outgroup
taxa differ in their states, then the assignment at the outgroup node is
equivocal. Additional, more basal outgroup taxa can never convert the
assignment from equivocal to decisive, demonstrating its independence from
the frequency of the states among the outgroup taxa.
However, visual inspection can fail to recover the most parsimonious
Fig. 3.7 [Illustration of the 'ingroup node' and the 'outgroup node'; caption not recoverable.]
Fig. 3.8 Example of the algorithmic approach to outgroup comparison, applied to
seven outgroup taxa and an unresolved set of four ingroup taxa (In). The procedure
is similar to Wagner optimization (see §4.1.1). First, the outgroup taxa are labelled
with their states, 0 or 1 (if an outgroup is polymorphic, it is labelled 0/1). Then,
beginning with pairs of outgroup taxa and proceeding towards the outgroup node
(ignoring the root), the internal nodes are labelled according to the following rules: a
node is assigned state 0 if its two derivative nodes are both labelled 0, or are 0 and
0/1; a node is assigned state 1 if its two derivative nodes are both labelled 1, or are 1
and 0/1; a node is assigned state 0/1 if its two derivative nodes are both labelled
0/1, or are 0 and 1. Here, the outgroup node is decisively assigned state 1.
reconstruction for more complex examples and an algorithmic approach is
required (Fig. 3.8). The properties of this algorithm lead to further rules that
permit the outgroup node state to be determined by simple inspection of the
distribution of states in the outgroup. When two successive outgroup taxa
share the same state (i.e. they form a 'doublet'), if this is also the state found
in the first outgroup, then it is decisive for the outgroup node (Fig. 3.9a).
Should the first doublet and the first outgroup taxon disagree, then the state
at the outgroup node is equivocal (Fig. 3.9b). Note that all outgroup structure
beyond the first doublet is irrelevant to the assessment. If there are no
doublets (i.e. the states in each successively more distal outgroup alternate),
then if the most distal outgroup agrees in state with the first outgroup, this
state is decisive at the outgroup node (Fig. 3.10a). Otherwise, the assessment
is equivocal (Fig. 3.10b). Clearly, where there is no doublet, the choice of the
most distal outgroup is critical and the addition or subtraction of just one
outgroup to the base of the cladogram would be sufficient to change an
outgroup node assessment from decisive to equivocal or vice versa. However,
such a scenario is most unlikely in practice. If a pattern of alternating
character states were to be found in a large outgroup, the character would
most probably be rejected as having very low information content.
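The node-labelling rules of Fig. 3.8 and the two inspection rules just described can be sketched in Python. This is only an illustration under stated assumptions, not the published algorithm: it assumes a binary character and outgroup taxa arranged as a pectinate series (each more distal outgroup is sister to everything above it), and the names `combine`, `outgroup_node_state`, `shortcut_rules` and `describe` are invented for the example.

```python
from functools import reduce

BOTH = frozenset({0, 1})  # the equivocal label, written 0/1 in the text


def combine(a, b):
    """Label a node from the labels of its two derivative nodes.

    If the two labels share a state, keep what they share; otherwise the
    node is equivocal. For a binary character this gives 0+0 -> 0,
    0+0/1 -> 0, 1+1 -> 1, 1+0/1 -> 1, 0+1 -> 0/1 and 0/1+0/1 -> 0/1.
    """
    shared = a & b
    return shared if shared else a | b


def outgroup_node_state(states):
    """Return the label assigned to the outgroup node.

    `states` lists the outgroup states from the first outgroup (the sister
    group) to the most distal one; each entry is 0, 1 or "0/1".
    Assumption: the outgroups form a pectinate series.
    """
    labels = [BOTH if s == "0/1" else frozenset({s}) for s in states]
    # Start with the most distal pair and work towards the outgroup node.
    return reduce(combine, reversed(labels))


def shortcut_rules(states):
    """First-doublet and alternating-outgroup rules (no polymorphic taxa)."""
    first = states[0]
    for nearer, farther in zip(states, states[1:]):
        if nearer == farther:  # this pair is the first doublet
            return frozenset({first}) if nearer == first else BOTH
    # No doublet: compare the first and the most distal outgroup.
    return frozenset({first}) if states[-1] == first else BOTH


def describe(label):
    """Render a label as 'decisive' or 'equivocal'."""
    return "equivocal" if len(label) > 1 else f"decisive, state {min(label)}"


if __name__ == "__main__":
    for series in ([0, 1, 0, 0, 1], [1, 0, 0, 1], [0, 1, 0], [0, 1, 0, 1]):
        print(series, "->", describe(outgroup_node_state(series)))
```

Under these assumptions the inspection rules and the node-by-node labelling give the same answer for every binary series, which is why structure beyond the first doublet can be ignored.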
In the systematic literature, great, even paramount, importance is sometimes
attached to the first outgroup taxon, that is, the sister group of the
ingroup. If the sister group cannot be identified with certainty, it is thought
that characters cannot be polarized and resolution of the p[...] is impossible.
In fact, identifying the sister group is not so important. It is true that this
taxon plays a major role, because its state will always be assigned to the
outgroup node, either decisively or equivocally, but more distal outgroup taxa
also exert an effect. In contrast, use of the sister group to polarize characters
is sometimes criticized if this taxon is considered to be too 'derived', that is, it
has too many autapomorphies, to make comparisons with the ingroup meaningful.
However, this is no justification for ignoring it and appealing to some
more distant and supposedly more 'primitive' taxon.

Fig. 3.9 Illustration of the 'first doublet rule' for a binary character. (a) If the state in
the first doublet agrees with that in the first outgroup taxon, then this state is
assigned decisively to the outgroup node. (b) If the state in the first doublet
disagrees with that in the first outgroup taxon, then the state assigned to the
outgroup node is equivocal.
More important are the preconditions of both the FIG/FOG and algorithmic
methods that the relationships among the outgroup taxa are both
specified and fixed. Problems may arise when the interrelationships among
the outgroup taxa are partially or even wholly unresolved. Then, uncertainty
in outgroup relationships is translated into uncertainty in the state assignment
at the outgroup node. This is not the problem that it may first
appear, for we will argue later in this chapter (§3.2.5) that character polarity
is actually a property that is derived from a cladistic analysis, rather than an a
priori condition.

Fig. 3.10 Illustration of the 'alternating outgroup rule'. (a) If the states of the first and
last outgroup taxa agree, then this state is assigned decisively to the outgroup node.
(b) If the states in the first and last outgroup taxa disagree, then the state assigned to
the outgroup node is equivocal.
3.2.2 The ontogenetic criterion

In the analysis of ontogeny, Løvtrup (1978) recognized two types of character,
based upon their developmental interactions. Epigenetic characters are
causally related among themselves, such that each stage in ontogeny is a
modification of, or is induced by, another character that developed earlier in
the ontogenetic sequence. Non-epigenetic characters are not so causally
related. Epigenetic characters are generally viewed as fundamental to and
essential for normal morphogenesis. The notochord of vertebrates is an
epigenetic character because normal vertebrate development cannot proceed
in its absence. In contrast, the presence of pigments in the feathers of a bird
is non-epigenetic because it is quite possible to have a viable bird, an albino,
that lacks all such pigmentation. Løvtrup further distinguished between
terminal characters, which occur last in an ontogenetic sequence, and non-terminal
characters, which are those that occur earlier in the sequence.
Original sequence:   X→Y→Z
Addition             X→Y→Z→D     X→Y→E→Z
Substitution         X→Y→F       X→G→Z
Deletion             X→Y         X→Z
Fig. 3.11 The six fundamental ways by which an ontogenetic pathway can be
modified.
Furthermore, an ontogenetic sequence can undergo three fundamental types
of modification: addition, deletion and substitution, each of which can be
applied to both character types, giving a total of six pathways by which an
ontogeny can change (Fig. 3.11).
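The six pathways can be made concrete with a small Python sketch that treats an ontogeny as an ordered tuple of stages. The function names and the letters used for new stages are illustrative assumptions only; the source gives the pathways as a figure, not as an algorithm.

```python
ORIGINAL = ("X", "Y", "Z")  # the unmodified ontogenetic sequence


def terminal_addition(seq, new):
    """Add a new stage after the last stage."""
    return seq + (new,)


def nonterminal_addition(seq, new, before):
    """Insert a new stage ahead of an existing stage."""
    i = seq.index(before)
    return seq[:i] + (new,) + seq[i:]


def terminal_substitution(seq, new):
    """Replace the last stage."""
    return seq[:-1] + (new,)


def nonterminal_substitution(seq, old, new):
    """Replace an earlier stage."""
    return tuple(new if stage == old else stage for stage in seq)


def terminal_deletion(seq):
    """Drop the last stage."""
    return seq[:-1]


def nonterminal_deletion(seq, old):
    """Drop an earlier stage."""
    return tuple(stage for stage in seq if stage != old)


if __name__ == "__main__":
    pathways = {
        "terminal addition": terminal_addition(ORIGINAL, "D"),
        "non-terminal addition": nonterminal_addition(ORIGINAL, "E", before="Z"),
        "terminal substitution": terminal_substitution(ORIGINAL, "F"),
        "non-terminal substitution": nonterminal_substitution(ORIGINAL, "Y", "G"),
        "terminal deletion": terminal_deletion(ORIGINAL),
        "non-terminal deletion": nonterminal_deletion(ORIGINAL, "Y"),
    }
    for name, seq in pathways.items():
        print(f"{name:26s}", " -> ".join(seq))
```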
Historically, characters drawn from ontogeny have been applied to phylogenetic
reconstruction in two ways. The simplest viewpoint is that of Haeckel,
developed in the 1860s, whose Biogenetic Law states simply that ontogeny
recapitulates phylogeny. Under this interpretation, the ontogenetic development
of a species passes through stages displayed in the adults of its
ancestors. Ontogeny is allowed only to proceed by a highly restricted form of
terminal addition, in which characters are added to the end of an already
completed ontogenetic sequence. Thus the course of phylogeny can be
discovered simply by 'reading' the ontogenetic sequence. However, this
interpretation of recapitulation has long been rejected as too strict and simplistic.
Nevertheless, the idea that phylogeny could somehow be 'read' from ontogenetic
sequences remains.
An alternative viewpoint had already been proposed in the 1820s by the
German comparative embryologist, von Baer, who summarized the results of
his studies as follows.
• In development, the general characters appear before the special
characters.
• From the more general characters, the less general and, finally, the
special characters are developed.
• During its development, an organism departs more and more from the
form of other organisms.
• The early stages in the development of an organism are not like the adult
stages of other organisms lower down on the scale, but are like the early
stages of those organisms.
Although von Baer (1828) did not frame his rules in an evolutionary
context, he did make two great contributions towards forging a strong link
between orderly ontogeny and phylogenetic inference. The first was his
recognition that we would not expect the ontogenetic sequence of an
organism to pass through stages found in the adults of its ancestors.
Rather, during ontogeny, two taxa would follow the same course of development
up to the point at which they diverged into separate lineages. Both
ontogenies would then be observed, in general, to have undergone one or
more independent terminal substitutions or additions, depending on the time
that had elapsed since differentiation and the amount of subsequent change
that had taken place.
But perhaps more important was his second rule, which stated that ontogenetic
change proceeds from the more to the less general. This observation
was generalized by Nelson (1978: 327) into the following definition of the
ontogenetic criterion for determining character polarity.
Given an ontogenetic character transformation, from a character observed to be more
general to a character observed to be less general, the more general character is
primitive and the less general character advanced.
The application of the ontogenetic criterion can be illustrated using the
example of the vertebrate endoskeleton. The endoskeleton of the adult shark
is composed of cartilage, while that of the perch is largely made out of bone.
Given only these observations, no decision can be made as to whether
cartilage or bone is apomorphic. However, a study of the ontogeny of the two
taxa shows that while a cartilaginous endoskeleton is formed in the early
embryos of both taxa, only in the shark does this state persist into the adult
animal. In contrast, later in the ontogeny of the perch, the cartilage is largely
replaced by bone, which is the state found in the adult. In other words, a
character that is observed to be more general (cartilage) has transformed into
one that is observed to be less general (bone), from which it can be inferred
that a cartilaginous endoskeleton is plesiomorphic and a bony endoskeleton
apomorphic.
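The shark and perch example can be written out as a short Python sketch. It is a simplified reading of Nelson's criterion, offered as an illustration only: each ontogeny is reduced to an ordered list of observed states, 'more general' is taken to mean 'shown by every taxon that shows the later state and by at least one that does not', and the function name `nelson_polarity` is an assumption of the example.

```python
def nelson_polarity(ontogenies, earlier, later):
    """Apply the ontogenetic criterion to one observed transformation.

    `ontogenies` maps each taxon to the ordered states seen during its
    development. Returns (plesiomorphic, apomorphic), or None when the
    observations give no decision.
    """
    # The transformation earlier -> later must actually be observed.
    transformation_seen = any(
        earlier in seq and later in seq and seq.index(earlier) < seq.index(later)
        for seq in ontogenies.values()
    )
    with_earlier = {taxon for taxon, seq in ontogenies.items() if earlier in seq}
    with_later = {taxon for taxon, seq in ontogenies.items() if later in seq}
    # The earlier state is more general if the taxa showing the later state
    # form a proper subset of the taxa showing the earlier state.
    if transformation_seen and with_later < with_earlier:
        return earlier, later
    return None


if __name__ == "__main__":
    adults_only = {"shark": ["cartilage"], "perch": ["bone"]}
    full_ontogeny = {"shark": ["cartilage"], "perch": ["cartilage", "bone"]}
    print(nelson_polarity(adults_only, "cartilage", "bone"))
    print(nelson_polarity(full_ontogeny, "cartilage", "bone"))
```

With adult states alone the function returns no decision; once the embryonic cartilage of the perch is recorded, cartilage is returned as plesiomorphic and bone as apomorphic.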
Alberch (1985) and Kluge (1985) disagreed with this interpretation of the
ontogenetic process, arguing that the valid ontogenetic characters were not
the observed states but the transformational processes between those states.
Thus, in the above example, there would be only a single ontogenetic
character, the transformation of a cartilaginous skeleton into a bony one. De
Queiroz (1985) objected to observed states in an ontogenetic sequence being
used as characters in cladogram construction, because these 'instantaneous
morphologies' were abstractions from 'real' ontogenetic transformations. In
his view, phylogeny is a sequence of never-ending life cycles and thus the
evidential basis for inferring phylogeny should be the ontogenetic transformations
themselves, rather than the features of the organisms that were being
transformed. As a result, he concluded that there could be no 'ontogenetic
method'.
Kluge (1988) considered de Queiroz's concept of treating transformations
as characters both to be incomplete and to offer no advantage over describing
the life cycle in terms of a model of growth and differentiation (Kluge and
Strauss 1985). Furthermore, taken to the extreme, de Queiroz's approach
would reduce the entire organism, if not the entire living world, to a single
character with an immense number of states (transformations) and there
would be no basis for comparative biology. In practice, de Queiroz adopted a
pragmatic approach, defining 'character' as a feature of an organism 'large
enough to encompass variation that is potentially informative about the
relationships among the organisms being studied'. However, this definition is
open to the criticism of how large is large and how one is expected to
determine what is 'potentially informative' prior to conducting an analysis.
Further controversy regarding Nelson's generalization of the ontogenetic
criterion concerned what exactly is meant by 'general', for which there are
two interpretations:
• strict temporal precedence, so that the more general state is that which
occurs first in ontogeny; and
• the most frequently observed, so that the general state is the commonest
state.
If the latter definition is adopted, then Nelson's criterion is nothing more
than a special case of ingroup commonality, with all the deficiencies associated
with that method (see below). However, most cladists would employ the
first definition, under which general is something more than simply most
commonly observed. Although the more general character will be more
common, insofar as it will be possessed by all those taxa that also show the
less general character as well as some others that do not, this does not equate
to ingroup commonality (see §3.2.4). Frequent occurrence bears no necessary
relationship to relative time of phylogenetic appearance. A character may
indeed be more common than another but what is important is that the less
general character is nested within the distribution of the more general
character (Weston 1994). Without such an unequivocal relationship of generality,
the more common character will not be the more general.
Furthermore, equating generality to strict temporal precedence results in
Haeckel's Biogenetic Law. However, as already noted, this interpretation of
ontogeny is too strict because it only operates if ontogenetic change occurs by
terminal addition. This led Løvtrup and others to assert that Nelson's
definition would only apply to ontogenies that were modified in this way.
Sequences in which characters have been added or deleted subterminally
could not be analysed under Nelson's definition. Should characters be substituted
terminally, then information would be lost and outgroup comparison
would have to be used to determine polarity.
Terminal deletion has been considered to pose a particular problem for
Nelson's criterion because it leads to secondarily simplified ontogenies
that cannot be distinguished from unmodified ontogenies.
Paedomorphosis, or phylogenetic neoteny, in which a feature that appears
only in the juveniles of ancestors occurs as both a juvenile and an adult
character in descendant taxa, results from such terminal deletion. The occurrence
of paedomorphosis falsifies von Baer's law because the ontogeny
cannot be interpreted as going from the more general to the less general and
thus Nelson's criterion is inapplicable. Of course, it is possible for any
sequence in which the more general character is retained throughout ontogeny
to be due to paedomorphosis rather than the retention of the
plesiomorphic condition. But this is no more than an ontogenetic restatement
of a problem that pervades all systematics, that is, the detection of homoplasy.
As noted in Chapter 1, errors due to the interpretation of homoplasy as
synapomorphy are detected by character incongruence and resolved by the
application of parsimony. In this regard, paedomorphosis is treated no
differently from any other instance of homoplasy.
For example, when we examine the ontogeny of certain salamanders, we
observe that the juvenile cartilaginous skeleton remains largely unaltered
through to the adult stage. Using Nelson's criterion alone, we cannot differentiate
this ontogeny from the plesiomorphic sequence observed in the shark.
However, analysis of other characters shows that salamanders are deeply
embedded within the Osteichthyes, one generally acknowledged apomorphy
of which is a bony endoskeleton in the adult. Thus, congruence with other
characters shows that the cartilaginous skeleton of these salamanders is due
to paedomorphosis.
However, Weston (1988) showed that it was possible to hypothesize situations
that did not conform to the requirement for terminal addition, but
which nonetheless could still be analysed using a more broadly framed direct
method of character analysis. Under Nelson's law, the direction of ontogenetic
transformation need not necessarily be viewed as an indicator of the
direction of phylogenetic transformation (Weston 1994). Rather, it is a
criterion of similarity, like positional correspondence, and is thus a necessary
but not sufficient criterion for establishing primary homology. Weston sought
to remove any reference to sequence in the direct method by replacing the
concept of ontogenetic transformation with the more general concept of
homology and emphasizing the role of directly observed generality relationships
between homologous characters. He was thus able to further generalize
Nelson's law (Weston 1994: 133):
Given a distribution of two homologous characters in which one, x, is possessed by all
of the species that also possess its homolog, character y, and by at least one other
species that does not, then y may be postulated to be apomorphous relative to x.
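Weston's formulation is a statement about nested distributions, so it can be sketched as a set comparison. The Python below is an illustration under that reading; the function name `apomorphic_relative_to` and the taxon sets are hypothetical.

```python
def apomorphic_relative_to(holders_x, holders_y):
    """Return True if y may be postulated apomorphous relative to x.

    `holders_x` and `holders_y` are the species possessing characters x
    and y. The test passes when every species with y also has x and at
    least one other species has x without y (a proper subset).
    """
    holders_x, holders_y = set(holders_x), set(holders_y)
    return bool(holders_y) and holders_y < holders_x


if __name__ == "__main__":
    # Nested distribution: every species with bone also shows cartilage.
    cartilage = {"shark", "perch", "frog"}
    bone = {"perch", "frog"}
    print(apomorphic_relative_to(cartilage, bone))

    # One state is simply commoner, with no nesting: no polarity follows.
    commoner = {"A", "B", "C"}
    rarer = {"D", "E"}
    print(apomorphic_relative_to(commoner, rarer))
```

The second case shows why generality is not the same as commonness: a state can be more frequent than its homolog and still yield no polarity decision if the two distributions are not nested.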
From this standpoint, the only valid information that can be extracted from
ontogenetic transformations is the relative generality of characters. However,
de Pinna (1994) regarded the information contained in the order in which
characters transform one into another to constitute ontogenetic information
per se. To ignore this orderliness was to overlook the essential systematic
information derivable from ontogenies. He considered that eliminating the
relevance of ontogenetic sequence information reduced the direct method to
a form of 'common equals primitive' procedure. However, as noted above,
this would only hold if the more general character is equated strictly with the
commonest character, a correspondence specifically rejected by Weston, who
emphasized the hierarchical nesting of characters. It would thus seem that de
Pinna misunderstood the concept of generality as it is applied in both
Nelson's Law and Weston's generalization.
3.2.3 Ontogenetic criterion or outgroup comparison: which is superior?

Once the ontogenetic criterion and outgroup comparison had both been
formally defined, and earlier errors in interpretation and implementation
corrected, there was a period of debate over their relative merits. One defect
of outgroup comparison was perceived to be the requirement that the
relationships among the outgroup taxa be prespecified. In the absence of
such an hypothesis, outgroup comparison was considered difficult at best and,
at worst, totally impractical. However, because the hierarchical relationship
between plesiomorphic and apomorphic characters could be observed directly,
the ontogenetic criterion was considered independent of higher level
hypotheses of relationship and therefore superior. Proponents of outgroup
comparison responded by observing that the ontogenetic criterion could only
be applied in cases where the ontogenetic sequence changed by terminal
addition, and thus would be misled by paedomorphosis. In order to polarize
characters in which paedomorphosis had occurred, recourse had to be made
to outgroup comparison. This argument was answered with the observation
that paedomorphosis was a problem for systematics in general rather than
ontogeny in particular. Nevertheless, it was counter-claimed, because the
direct observation of ontogeny could not polarize any character that could
not also be polarized by outgroup comparison, and because there were
instances where the ontogenetic criterion failed but outgroup comparison
succeeded, then the former was merely an incomplete version of the latter.
However, much of this debate is essentially beside the point because it
viewed the ontogenetic criterion and outgroup comparison either as competing
alternatives or as essentially the same. Both perspectives are erroneous.
Despite their difference of opinion as to what constitutes information in
ontogenetic data, both Weston (1994) and de Pinna (1994) agreed that the
ontogenetic criterion and outgroup comparison are complementary with
non-overlapping roles. Thus, empirical studies to determine the comparative
worth of the two methods are futile. In itself outgroup comparison does not
polarize characters, nor is it a method for rooting cladograms (Weston 1994).
It is actually a method for constructing the most parsimonious
cladogram for the ingroup and locating that cladogram within the larger
scheme encompassing all living organisms. Outgroup comparison should be
treated as a technique for rooting these partial cladograms of study taxa, and
hence polarizing characters, on the generally reasonable assumption that the
root of all life lies outside the ingroup in question.
Thus, in a sense, outgroup comparison can be characterized as providing
only 'local rooting'. Global rooting of the entire tree of life would require the
application of a direct method, such as the ontogenetic criterion. However,
only if we were working at the very base of the tree of life and were fortunate
enough to discover the appropriate ontogenetic transformation, would we be
able to polarize the entire topology. Until that time, even the ontogenetic
criterion provides only local polarity determination. Thus, for all practical
purposes, the ontogenetic criterion is little different from outgroup comparison.
Both approaches are valid because both are ultimately justified by
parsimony.
3.2.4 A priori models of character state change

Ingroup commonality
In addition to the ontogenetic criterion and outgroup comparison, there are
many other criteria for determining character polarity that are based upon
models of how character states are believed to transform one to another.
Transformational models are frequently unjustified by any theoretical or
empirical framework. For example, the criterion of ingroup commonality is
defined as follows:
The plesiomorphic state will be more widespread within a monophyletic group than
will any one apomorphic state. Therefore, the state occurring most commonly within
the ingroup is plesiomorphic.
Put simply, common equals primitive. In this form, ingroup commonality is
ad hoc because it assumes that the evolutionary process is conservative and
that plesiomorphic characters are likely to be retained. While this may be
true for a particular character and group of taxa, it is not necessarily always
true. To distinguish between these two cases, outgroup comparison is needed
and ingroup commonality becomes superfluous. Consider character 1 in
Table 3.3, in which the lamprey is regarded as the outgroup. Using ingroup
commonality, state 1 would be interpreted as plesiomorphic, with state 0
being autapomorphic for the shark and state 2 uniting the lizard and bird.
State 0 in the lamprey would be ignored as this taxon does not constitute part
of the ingroup. However, using outgroup comparison, state 0 is interpreted as
plesiomorphic and state 1 apomorphic, with subsequent transformation into
state 2.
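The contrast for character 1 of Table 3.3 can be checked with a few lines of Python. This is a deliberately crude sketch: ingroup commonality is reduced to a frequency count, and outgroup comparison to reading off the state of the single outgroup, which is adequate only because there is one outgroup taxon here. The function names are assumptions of the example.

```python
from collections import Counter

# Character 1 of Table 3.3, with the lamprey as the outgroup.
CHARACTER_1 = {
    "Lamprey": 0, "Shark": 0, "Salmon": 1, "Perch": 1,
    "Frog": 1, "Lizard": 2, "Bird": 2,
}
OUTGROUP = "Lamprey"


def by_ingroup_commonality(states, outgroup):
    """Plesiomorphic state = commonest state among the ingroup taxa."""
    counts = Counter(s for taxon, s in states.items() if taxon != outgroup)
    # most_common breaks ties arbitrarily; there is no tie in this data set.
    return counts.most_common(1)[0][0]


def by_outgroup_comparison(states, outgroup):
    """Plesiomorphic state = state of the (single) outgroup taxon."""
    return states[outgroup]


if __name__ == "__main__":
    print("ingroup commonality:", by_ingroup_commonality(CHARACTER_1, OUTGROUP))
    print("outgroup comparison:", by_outgroup_comparison(CHARACTER_1, OUTGROUP))
```

The two criteria disagree: the frequency count selects state 1 (salmon, perch and frog), whereas the lamprey indicates state 0.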
There is an even more fundamental flaw in ingroup commonality. The
basic component of cladistic analysis is the three-taxon statement, that is,
taxa A and B are more closely related to each other than either is to a
third taxon, C. Resolution of a three-taxon statement depends upon the
grouping information (apomorphy) being present in A and B and absent in C.
However, using ingroup commonality, a three-taxon statement can never be
recognized, let alone resolved, because this requires that two of the three taxa
share the apomorphic state, whereas under ingroup commonality the state
shared by the majority is, by definition, plesiomorphic. Ingroup commonality
is thus contrary to the very basis of cladistic analysis.
Stratigraphy
Fossil taxa are often held to be of paramount importance in determining
character polarity using the stratigraphic criterion or the criterion of geological
character precedence, which states:
If one character state occurs only in older fossils and another state only in younger
fossils, then the former is the plesiomorphic and the latter the apomorphic state of
that character.
The validity of the general concept is not in doubt; plesiomorphy must
precede apomorphy in time. However, equating the condition observed in the
oldest fossil with the plesiomorphic condition is fraught with difficulties. We
are required to assume that the available sequence of fossils is a true and
accurate record. However, this is rarely demonstrably so. Furthermore, there
is no reason to assume that just because one fossil is older than another that
all its characters are therefore plesiomorphic. There are many examples of
very old fossil taxa with many apomorphic features. Many soft-bodied groups
of organisms had very early origins but left no fossil evidence. In contrast,
many hard-bodied organisms that arose later left excellent fossil records from
the time they appeared. Strict application of the stratigraphic criterion would
lead to incorrect conclusions in such circumstances.
At present the stratigraphic criterion has few strict adherents. Using it to
establish character polarity is always suspect, for any inconsistent or unwanted
results can be explained away by invoking, ad hoc, the incompleteness
of the record. Stratigraphy may be useful, however, in allowing us to choose
among multiple, equally most parsimonious cladograms, on the basis of
concordance with the fossil record.
Biogeography
Several criteria have been proposed that use biogeographical information to
polarize characters. The most widely known is the 'criterion of chorological
progression' (or 'progression rule', Hennig 1966), which postulates that the
most derived species will be that found furthest geographically or ecologically
from the ancestral species. However, as a method for inferring character
polarity, the progression rule has several deficiencies. In particular, even if
allopatric speciation [...], it cannot be assumed that the character states
found only in the peripheral populations are apomorphic. Vicariance biogeography
is neutral regarding character polarity. Furthermore, evidence is
required of the historical distribution or ecological requirements of the
ancestral species. However, given that ancestral taxa cannot be unequivocally
recognized, such evidence will not be forthcoming. Such arguments invalidate
the progression rule as a means of inferring character polarity. Most cladists
would choose to use cladograms to test biogeographical hypotheses. If such
tests are to be independent, then cladograms must be constructed without
recourse to biogeographical information.
Function/adaptive value
The hypothesized functional or adaptive value of a character is also frequently
held to be of fundamental importance in polarity determination.
Functional arguments are usually couched in terms of niche restriction or
specializations that help an organism outcompete its relatives. But many
authors conflate functional value with selective value. Such usage is fraught
with difficulties, not the least of which is how the nature of the selective
forces acting on a character is to be measured. In fact, most studies that
purport to use function to polarize characters do nothing of the sort (Lauder
1990). They simply present morphological data, then infer function, not
measure it. The correct interpretation of functional data requires that the use
of structural characters by the organism be directly observed and quantified.
But such functional characters have no special properties that single them
out as intrinsically superior. Functional characters can be admitted into a
cladistic analysis but functional considerations should not be used a priori to
determine polarity.
Underlying synapomorphy
Perhaps the most idiosyncratic model of character change is that of underlying
synapomorphy. Championed by Ole Saether, underlying synapomorphy is
defined as 'close parallelism as a result of common inherited genetic factors
causing incomplete synapomorphy' or 'the inherited potential to develop
parallel similarities', although this potential may not be realized in all
descendants. Consequently, it refers to the occurrence of synapomorphies in
only some members of a putative monophyletic group. Consider a highly
simplified cladogram of the Bilateria (Fig. 3.12). Haemoglobin is known to
occur in only three of the many lineages: Tubifex worms, some chironomid
midges and vertebrates. The application of standard optimization methods
(see below) would lead us to conclude that haemoglobin had been independently
derived three times. However, the amino-acid sequences in the three
groups are all very similar and it would seem most unlikely that such a
complex molecule could have arisen de novo on three separate occasions.
Underlying synapomorphy would assert that we are mistaken when we code
the other taxa as lacking haemoglobin. These taxa have the relevant
genes, but the genes are switched off because it is simply not selectively
advantageous for them to be active. It is this unexpressed capacity to develop
a feature that is the underlying synapomorphy. Thus, in this example, the
potential 'capacity' (expressed or not) to manufacture haemoglobin can be
used to unite all the taxa in Fig. 3.12 into a monophyletic group.

Fig. 3.12 Underlying synapomorphy asserts that the presence of haemoglobin
(indicated by 1) in Tubifex annelids, certain chironomid midges and vertebrates implies
support for the monophyly of the entire Bilateria (1 -). The lack of observable
haemoglobin in all other Bilateria is not taken as evidence against this hypothesis
for it is argued that these taxa possess the unexpressed capacity to develop this
molecule.
Saether argued that the use of underlying synapomorphies had been
advocated by Hennig (1966), who called them homoiologies. However,
Hennig considered homoiology to be equivalent to convergence, which he
specifically rejected as a valid tool for estimating cladistic relationships. It
may indeed be true that the haemoglobin gene occurs in all taxa in Fig. 3.12.
However, the monophyly of the Bilateria cannot be supported by only
scattered occurrences of an expressed gene, for this is using the absence of
information (observations, characters) as evidence for groups. By invoking ad
hoc hypotheses to explain away conflict in the data as 'unexpressed', underlying
synapomorphies provide a licence to group in any way whatsoever. While
it is perfectly acceptable to use the occurrence of haemoglobin in Tubifex
worms, chironomid midges and vertebrates as evidence of the monophyly of
each group separately, the distribution of this character does not provide
evidential support for the monophyly of the Bilateria as a whole. It may
eventually be possible to detect an unexpressed gene directly using nucleotide
sequencing or another molecular technique. But if we found such an inactive
gene, then it would be coded as present and would no longer be an
underlying synapomorphy.
3.2.5 Polarity and rooting a posteriori

Criteria for determining character polarity a priori are part of a procedure
that can be characterized as the transformational approach to cladistics
(Eldredge 1979). This procedure, which corresponds closely to Hennig's
concept of phylogenetic systematics, is performed in two successive stages.
First, the different states recognized for each character are organized into
transformation series, which are then polarized using either the ontogenetic
criterion or outgroup comparison. Then the synapomorphies so revealed are
used to construct a cladogram.
With reference to outgroup comparison, the transformational approach has
been characterized as 'constrained, two-step analysis'. The term constrained
derives from the requirement of the algorithm of Maddison et al. (1984) that
the outgroup must be resolved a priori and those relationships then held fixed
during polarity determination and cladogram construction. However,
constrained analysis requires that two a priori assumptions be made. The first
is that the ingroup taxa form a monophyletic group, which, in turn, implies
that the root must be basal to the ingroup and not within it. The second
assumption concerns the outgroup structure, which implies fixed hypotheses
of monophyly, both among the outgroup taxa and with respect to the ingroup.
However, these immutable patterns of relationship are not open to independent
testing and, in particular, none of the outgroup taxa is permitted to be
part of the ingroup.
Problems arise when there is parallel homoplasy between one or more
Fig. 3.13 Outgroup constrained and simultaneous, unconstrained analysis of three
ingroup taxa (A-C) and three outgroup taxa (D-F). (a) With outgroup relationships
predetermined and constrained as shown, character 5 is interpreted as
synapomorphic for group A-E, but with secondary loss in taxon A + B. The total
length of the cladogram is six steps. (b) Two more characters, 6 and 7, are
added that have the same distribution among taxa as character 5. Because the
outgroup relationships are fixed, then these are also forced to show secondary
loss in taxon A + B. The cladogram now has ten steps. (c) However, if the
constraint on outgroup relationships is removed and all the taxa analysed
simultaneously, then three shorter cladograms of nine steps are found in which
the original ingroup is not monophyletic. It can also be noted that the
cladogram in which taxa C, D and E form a trichotomy is both the strictly
supported cladogram (see §4.2) and also the strict consensus tree of the three
cladograms (see §7.4).
members of the outgroup and a subset of the ingroup. For example, consider
a cladogram (Fig. 3.13a) with three ingroup taxa (A, B and C) and three
outgroup taxa (D, E and F), the relationships among which are determined by
five characters, 1-5. Four of these characters are unique and unreversed
apomorphies but character 5, while offering some support to clade A-E, is
secondarily lost in taxon A + B. The cladogram is thus six steps long.
Subsequently, if two new characters, 6 and 7, are found that have the same
distribution among the taxa as character 5, then because the outgroup
relationships are fixed, outgroup comparison must yield the same topology as
before (Fig. 3.13b), now with ten steps. However, if all seven characters are
analysed without topological constraints, then three equally parsimonious
cladograms result (Fig. 3.13c) (see §4.2 for the reasons why there should only
be a single solution). These cladograms are each one step shorter than that in
Fig. 3.13b, but more importantly they also have a markedly different topology
and one in which the assumption of ingroup monophyly is found to be
incorrect. This lack of global parsimony in the presence of homoplasy is due
to the requirement that the outgroup structure be known a priori and then
held invariant.
The procedure by which the cladograms in Fig. 3.13c were obtained is
termed 'simultaneous, unconstrained analysis'; simultaneous because both
outgroup and ingroup taxa are analysed together, unconstrained because the
outgroup relationships are unspecified prior to analysis. This approach can
never yield less parsimonious cladograms than a two-step, constrained
analysis and will often give a more parsimonious result. Nixon and Carpenter
(1993) stated that in order for a simultaneous, unconstrained analysis to be
fully effective, characters that are informative with respect to outgroup taxa
relationships must also be included, even if these characters are invariant
within the ingroup. However, this is unnecessary. In a given analysis, we are
concerned only with resolving the relationships among the ingroup taxa. The
interrelationships of the outgroup taxa are quite irrelevant. If we are
interested in this more inclusive question, then a separate data set should be
formulated that comprises characters appropriate to that problem. If we do
exclude from the data set all characters relevant to resolving the outgroup,
then we must not draw any conclusions regarding outgroup relationships from
whatever resolution may result. This is simply because all such resolutions
must derive from homoplasy and such evidence cannot form the basis for
inferring cladistic relationships.
Ultimately, simultaneous, unconstrained analysis dispenses with the need
to assign a priori polarity to characters altogether. Data are simply collected
for all taxa, ingroup and outgroup alike, and analysed in a single matrix. The
resulting cladogram is then rooted between the ingroup and outgroup and it
is only at this point in the analysis that character polarity is determined
(Nixon and Carpenter 1993). Under the transformational approach to
cladistics, polarity is a property of a character that must be determined prior
to cladogram construction. However, it can now be seen that polarity is actually
something that is inferred from a cladogram. This perspective is part of the
'taxic approach' to cladistics (Eldredge 1979), in which only the distributions
of characters among taxa are used to hypothesize group membership. All
other properties of both characters and groups are derived from the resultant
cladogram. To devotees of the transformational approach, such concepts are
anathema. However, as stated earlier in this chapter, the majority of the
computer algorithms used to estimate most parsimonious cladograms actually
generate unrooted networks. No account is taken of any a priori polarity
decisions. It is only when these networks are output as cladograms that they
are rooted, and usually this is done by placing the root at the outgroup node.
Thus, whether by conscious decision or not, most cladists use the taxic
approach and carry out simultaneous, unconstrained analyses.
Several other techniques for rooting cladograms have been suggested, but
all have major defects. The most frequently encountered alternative method
uses an artificial taxon for which each character is coded with the putative
plesiomorphic state. However, two distinct concepts are sometimes conflated
in this approach and we must be careful to distinguish between them. The
first concept treats the artificial taxon as a 'hypothetical ancestor' or
'groundplan' (recently re-invented as 'compartmentalization' by Mishler 1994).
However, as we have already seen, devising such a groundplan solely with
reference to the ingroup is equivalent to estimating the conditions at the
ingroup node, which may lead to a solution that is only locally optimal.
Alternatively, if the algorithmic method of Maddison et al. (1984) is correctly
applied, then the outgroup node with its optimized states can be interpreted
as a composite, all-plesiomorphic outgroup. Such a taxon usually has all the
characters coded as zero and is thus referred to as an 'all-zero outgroup'. But
real outgroup taxa are necessary prerequisites to the formulation of such an
all-plesiomorphic, hypothetical outgroup. If these real outgroup taxa display
no character heterogeneity, then they are perforce equivalent to an artificial
all-plesiomorphic outgroup. However, if there is heterogeneity among the
outgroup taxa, the zero state cannot be assumed a priori to be the
plesiomorphic state for the ingroup. Consequently, it is more efficient simply
to code real outgroup taxa and employ them directly in a simultaneous,
unconstrained analysis (Nixon and Carpenter 1996a).
When a set of heterogeneous outgroup taxa is used, it is frequently found
that substitution of one of these taxa by another markedly alters the ingroup
topology. Various strategies of outgroup selection to minimize such unwanted
effects were discussed by Smith (1994b).
Another method that attempts to circumvent this perceived pernicious
effect is Lundberg rooting. First, the most parsimonious network for the
ingroup alone is determined, with no a priori assumptions regarding polarity.
Then, keeping this topology fixed, an all-zero outgroup or hypothetical
ancestor is attached to the branch that gives the least increase in length of
the overall cladogram. However, Lundberg rooting suffers from all of the
defects associated with all-zero artificial taxa discussed above. It has been
suggested that a real outgroup should be used instead but the result would be
most unlikely to be globally most parsimonious. Again, inclusion of the
outgroup in a simultaneous, unconstrained analysis would be preferable
(Nixon and Carpenter 1993).
In the unlikely event that no outgroup is known that can be
considered close enough for meaningful comparisons of characters to be
made, and no ontogenetic information is available, then midpoint rooting has
been suggested (Farris 1972). In this method, the root is placed at the
midpoint of the longest path connecting two taxa in the network. Thus,
besides the synapomorphies themselves, midpoint rooting also considers the
amount of difference between taxa. Midpoint rooting can only be successful if
the two most divergent taxa in the network have the same rates of evolutionary
change, that is, a constant evolutionary clock must be assumed. This
assumption, however, has both serious theoretical and empirical difficulties
and thus midpoint rooting should be avoided.
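The mechanics of midpoint rooting can be made concrete with a short sketch. The network, its node names (n1, n2) and its branch lengths are hypothetical, invented purely for illustration, and the function names are our own; this is not the procedure of any particular program.

```python
from itertools import combinations

# Hypothetical unrooted network; each edge carries a branch length
# (amount of change). None of these values comes from the text.
EDGES = {
    ("A", "n1"): 3.0,
    ("B", "n1"): 1.0,
    ("n1", "n2"): 2.0,
    ("C", "n2"): 1.0,
    ("D", "n2"): 4.0,
}

def neighbours(edges):
    """Build a symmetric adjacency map {node: {neighbour: length}}."""
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w
    return adj

def path(adj, start, goal, seen=None):
    """Return the unique path from start to goal in a tree, as a list of nodes."""
    seen = seen or {start}
    if start == goal:
        return [start]
    for nxt in adj[start]:
        if nxt not in seen:
            rest = path(adj, nxt, goal, seen | {nxt})
            if rest:
                return [start] + rest
    return None

def midpoint_root(edges):
    """Locate the midpoint of the longest path between two terminal taxa."""
    adj = neighbours(edges)
    leaves = [n for n in adj if len(adj[n]) == 1]

    def length(p):
        return sum(adj[a][b] for a, b in zip(p, p[1:]))

    longest = max((path(adj, a, b) for a, b in combinations(leaves, 2)), key=length)
    half = length(longest) / 2
    walked = 0.0
    for a, b in zip(longest, longest[1:]):
        if walked + adj[a][b] >= half:
            # root lies on edge (a, b), this far from node a
            return (a, b), half - walked, length(longest)
        walked += adj[a][b]

edge, offset, total = midpoint_root(EDGES)
print(edge, offset, total)
```

Here the longest path runs between the hypothetical taxa A and D, so the root falls on the internal branch between n1 and n2. If A had simply evolved faster than the other taxa, the root would be displaced towards it, which is the clock assumption criticized above.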
De Pinna (1996) considered that rooting should use an optimality criterion,
the only biologically defensible one of which is information pertaining to
ontogenetic character transformation. Application of 'ontogenetic rooting'
(de Pinna 1994) would thus yield a cladogram that was not only the most
informative with regard to characters, but would also maximize the
information available from observed ontogenetic transformations. We might
indeed be fortunate enough to observe an ontogenetic transformation that would
enable us to polarize the entire ingroup cladogram. However, it is much more
likely, especially given our still fragmentary knowledge of most ontogenetic
transformations, that ontogenetic rooting would allow us only to place an
upper bound on the position of the root. This root, being most likely within
the ingroup, would be considered by most systematists to be inferior to one
placed between the outgroup and ingroup.
Thus, given that all other methods for rooting cladograms are either
deficient or incomplete, we advocate simultaneous, unconstrained analysis,
followed by rooting between the ingroup and outgroup, as the theoretically
and empirically most defensible approach to cladogram construction (Nixon
and Carpenter 1993).
3.3 CHAPTER SUMMARY
1. Exact methods of cladogram estimation guarantee discovery of the most
   parsimonious cladograms. However, they are time-consuming to implement
   and generally should not be considered for problems involving more
   than 25 taxa. The most widely implemented exact method is
   branch-and-bound.
2. For data sets of more than 25 taxa, heuristic (hill-climbing) methods are
   used, which sacrifice the certainty of finding the most parsimonious
   cladograms for computational speed. In order to improve the chances of
   finding the globally optimal solution, various algorithmic devices can be
   employed, including different addition sequences and branch-swapping
   routines.
3. Of the various criteria proposed to determine character polarity, only the
   ontogenetic criterion and outgroup comparison have both a valid
   theoretical basis and wide applicability.
4. Cladistic analysis by means of the two-stage, constrained approach, which
   includes a priori determination of character polarity, can lead to
   suboptimal results.
5. Simultaneous, unconstrained analysis, in which character polarity is
   derived from a cladogram, will never yield a less parsimonious cladogram
   than a two-step, constrained analysis and will often give a more
   parsimonious result. It is thus advocated as the preferred method of
   cladistic analysis.
4.
Optimization and the effects of
missing values
4.1 OPTIMALITY CRITERIA AND CHARACTER OPTIMIZATION
Once the set of most parsimonious cladograms has been found, cladists
generally wish to test hypotheses of character transformation. The first stage
of this process is character optimization, which is effected by minimizing a
quantity termed an optimality criterion. In Chapter 2, we introduced the
concepts of additive (ordered) and non-additive (unordered) characters. These
character types are equivalent, respectively, to the two basic optimality
criteria, Wagner and Fitch optimization. However, there are innumerable
other ways in which characters may be constrained to change. We now
explain how the most commonly encountered optimality criteria are
implemented to give what is termed a most parsimonious reconstruction (MPR) of
character change. It should be noted, however, that the optimality criterion
applied to each character is actually decided prior to cladogram construction.
Furthermore, because each optimality criterion implies different costs,
measured as number of steps, these choices will exert a large influence upon the
length, and hence the topologies, of the most parsimonious cladograms for
the data. Thus the reasons for choosing particular optimality criteria should
be clearly explained and justified.
4.1.1 Wagner optimization

Wagner optimization, which was formalized by Farris (1970), is so named
because it is based upon the work of Wagner (1961). It is one of the two
simplest optimality criteria, imposing minimal constraints upon permitted
character state changes. Free reversibility of characters is allowed. For binary
characters, this means that a change from 0 → 1 is equally as probable as a
change from 1 → 0. Similarly for a multistate character, a change from 1 → 2
is equally probable as a change from 2 → 1. However, in Wagner optimization,
for a multistate character to change from 0 to 2, it is necessary to pass
'through' 1, and such a transformation will add two steps (i.e. 0 → 1, 1 → 2) to
the length of the cladogram. Wagner optimization, therefore, deals with
additive or ordered characters. A consequence of this is that the
number of character state changes, and thus the length of the cladogram, is
independent of the position of the root. An unrooted cladogram evaluated
using Wagner optimization can be rooted at any point without changing its
length.
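The independence of length from root position can be checked with a small sketch. It assumes the six-taxon network and terminal states of the worked example in Fig. 4.1, with the topology inferred from the way nodes w, x, y and z are described in the text; that inference, and all function names, are our own assumptions. The character is scored as an ordered one by tracking a closed interval of states at each internal node.

```python
# Unrooted network inferred from the description of Fig. 4.1 (an assumption):
# each internal node lists its three neighbours.
ADJ = {
    "w": ["A", "B", "x"],
    "x": ["w", "y", "z"],
    "y": ["x", "C", "D"],
    "z": ["x", "E", "F"],
}
STATES = {"A": 1, "B": 0, "C": 1, "D": 2, "E": 2, "F": 4}

def neighbours(node):
    """Neighbours of an internal node, or the single neighbour of a terminal."""
    if node in ADJ:
        return ADJ[node]
    return [n for n, nbrs in ADJ.items() if node in nbrs]

def first_pass(node, parent):
    """Return (low, high, steps) for the part of the network beyond `node`."""
    if node in STATES:
        return STATES[node], STATES[node], 0
    (lo1, hi1, s1), (lo2, hi2, s2) = [
        first_pass(child, node) for child in neighbours(node) if child != parent
    ]
    lo, hi = max(lo1, lo2), min(hi1, hi2)
    if lo <= hi:
        # the two intervals overlap: keep the intersection, no extra steps
        return lo, hi, s1 + s2
    # disjoint intervals: smallest closed interval bridging the gap,
    # and the width of the gap is added to the length
    return hi, lo, s1 + s2 + (lo - hi)

def wagner_length(root_taxon):
    """Length of the character when the network is rooted at `root_taxon`."""
    (inner,) = neighbours(root_taxon)
    lo, hi, steps = first_pass(inner, root_taxon)
    s = STATES[root_taxon]
    return steps + max(lo - s, s - hi, 0)

lengths = {taxon: wagner_length(taxon) for taxon in STATES}
print(lengths)
```

Every choice of rooting taxon should report the same length, illustrating why the root can be placed after the network has been found.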
In order to determine the minimum number of changes for a character
using Wagner optimization, only a single pass through the cladogram is
required, beginning with the terminal taxa and proceeding to the root.
Consider an unrooted cladogram (Fig. 4.1a), in which six taxa (A-F) show
four of five states of a multistate character (states 0, 1, 2, 4). First, one of the
terminal taxa (A) is chosen arbitrarily as the root (Fig. 4.1b), although in
practice, an outgroup taxon would usually fulfil this role. Optimization begins
by choosing pairs of terminal taxa. The state(s) (termed the 'state set')
assigned to the internal node that unites them is then calculated as the
intersection of the state sets of the two derivative nodes. If the intersection is
empty, then the smallest closed set that contains an element from each of the
derivative state sets is assigned. For example, consider taxa E and F, linked by
internal node z. The intersection of their state sets, (2) and (4) respectively, is
indeed empty and thus the smallest closed set, (2-4), is assigned to z, and a
value of 2 (i.e. 4 - 2) is added to the cladogram length. Similarly, the
intersection of the state sets of taxa C and D is also empty. Thus the state set
(1-2) is assigned to their internal node y, with 1 being added to the length. In
contrast, the intersection of the state sets of nodes y and z is not empty, as
each contains the value 2. This value is assigned to their internal node x and
no increment is made to the length. We proceed in this way towards the root
until all internal nodes have been assigned state sets (Fig. 4.1c).
When this process is complete, the length of the cladogram will have been
calculated, which for Fig. 4.1c is 5. However, it can be seen that this method
does not necessarily assign states unambiguously to the internal nodes; that
is, it does not produce a most parsimonious reconstruction (MPR). For
example, we are uncertain whether node y should be assigned a value of 1 or
2. In order to produce an MPR, a second pass through the cladogram must be
performed, this time starting from the root and visiting each internal node in
turn. If the state set of an internal node is ambiguous, then we assign the
state that is closest to the state found in the internal node of which it is a
derivative. For example, nodes y and z are both assigned a value of 2 because
this is the value in both their state sets that is closest to the value assigned to
node x. Notice also that two changes (2 → 3 and 3 → 4) must be assumed to
have occurred between node z and taxon F. Once this process has been
completed, then the MPR has been found and all five steps required by the
character are accounted for (Fig. 4.1d).
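The two passes just described can be sketched as follows. The rooted topology is our reading of the text (w unites B with x, x unites z with y, and taxon A is the root), and the names `down_pass` and `up_pass` are hypothetical labels of our own. State sets are held as closed intervals (low, high).

```python
# Rooted at taxon A; topology inferred from the text's account of Fig. 4.1
# (an assumption). Each internal node maps to its two derivative nodes.
TREE = {"w": ("B", "x"), "x": ("z", "y"), "z": ("E", "F"), "y": ("C", "D")}
STATES = {"A": 1, "B": 0, "C": 1, "D": 2, "E": 2, "F": 4}
ROOT_TAXON, BASAL_NODE = "A", "w"

def down_pass(node, sets):
    """First pass (terminals to root): record state sets, return steps added."""
    if node in STATES:
        sets[node] = (STATES[node], STATES[node])
        return 0
    left, right = TREE[node]
    steps = down_pass(left, sets) + down_pass(right, sets)
    lo = max(sets[left][0], sets[right][0])
    hi = min(sets[left][1], sets[right][1])
    if lo > hi:
        # empty intersection: take the smallest closed set and add its span
        steps += lo - hi
        lo, hi = hi, lo
    sets[node] = (lo, hi)
    return steps

def up_pass(node, parent_state, sets, assigned):
    """Second pass (root to terminals): choose the state nearest the parent's."""
    lo, hi = sets[node]
    assigned[node] = min(max(parent_state, lo), hi)  # clamp into [lo, hi]
    for child in TREE.get(node, ()):
        up_pass(child, assigned[node], sets, assigned)

sets, assigned = {}, {}
length = down_pass(BASAL_NODE, sets)
root_state = STATES[ROOT_TAXON]
lo, hi = sets[BASAL_NODE]
length += max(lo - root_state, root_state - hi, 0)  # compare basal node with root
up_pass(BASAL_NODE, root_state, sets, assigned)

print(length, sets, assigned)
```

Under these assumptions the first pass reproduces the state sets quoted in the text for nodes z, y and x, and the second pass assigns state 2 to both y and z.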
It should be noted that this procedure (Farris 1970) will give a unique MPR
only when all characters are free of homoplasy. In the presence of homoplasy,
more than one MPR may exist. For example, Fig. 4.1e, in which nodes x and y
are assigned state 1 rather than state 2, also has five steps. However, there is a
Fig. 4.1 Determination of character length using Wagner optimization (additive or
ordered characters). (a) Unrooted cladogram for six taxa and considering one
multistate character (states 0-4). (b) The unrooted cladogram is arbitrarily rooted at
taxon A. (c) States assigned to internal nodes by passing from the terminal taxa to
the root. (d) Alternative states for the internal nodes resolved by passing from the
root to the terminals. (e) An alternative equally parsimonious resolution. The points
at which character changes must be assumed are denoted by black bars. See text for
details.
[Figure: terminal states A (1), B (0), C (1), D (2), E (2), F (4).]
difference in the behaviour of the character changes. In Fig. 4.1d, the change
from 1 → 2 was placed on to the cladogram at the closest possible position to
the root, with the result that the occurrence of state 1 in taxon C must be
accounted for by a reversal (2 → 1). This is known as 'accelerated' or 'fast'
transformation (ACCTRAN in the PAUP program), because, when viewed
from the root, 'forward' changes (e.g. 0 → 1 or 7 → 8) are placed on the
cladogram as soon as possible. Accelerated transformation favours the
acquisition of a character, with subsequent homoplasy accounted for by reversal.
In contrast, Fig. 4.1e suggests that state 2 has been independently acquired
twice, once between nodes x and z and once in taxon D. This is 'delayed' or
'slow' transformation (DELTRAN in the PAUP program), because it
attempts to place forward changes on to a cladogram as far as possible from the
root. Delayed transformation favours independent gains of a state rather than
acquisition and reversal. It should be noted that under accelerated
transformation (Fig. 4.1d), state 2 is still interpreted as an homology (albeit with
subsequent reversal in taxon C), but under delayed transformation (Fig. 4.1e),
state 2 in taxa E and F and state 2 in taxon D are considered to be two
separate homologies and our original hypothesis of primary homology is
refuted. The implications of and the choice between accelerated and delayed
transformation are discussed further below (see §4.2).
4.1.2 Fitch optimization

Fitch optimization (Fitch 1971) works in the same way as Wagner optimization
but concerns non-additively coded (unordered) characters. Once again
free reversibility is allowed but for multistate characters there is equal cost,
measured as steps, in transforming any one state into any other. Therefore,
the changes 0 → 1, 0 → 4 and 2 → 0 all add a single step to the length of a
cladogram.
Fitch optimization is implemented in a similar way to Wagner optimization
but with two important differences. First, when the intersection of the two
derivative state sets is empty, the state set assigned to the internal node is
calculated as the union of the derivative state sets and the cladogram
length is increased by one. Second, when calculating an MPR, the state set
assigned to an internal node x is that of the next more inclusive node, if this
value is included within the state set of node x. Otherwise, any value in the
state set of node x is chosen arbitrarily.
Fitch optimization is illustrated in Fig. 4.2, using the same initial cladogram
as before (Fig. 4.2a). The cladogram is again arbitrarily rooted using taxon A
(Fig. 4.2b) and the state sets assigned to the internal nodes during the first
pass are z (2,4), y (1,2), x (2) and w (0,2) (Fig. 4.2c). The second pass up the
cladogram then assigns the unambiguous state 2 to each of nodes w, y and z
(Fig. 4.2d). Note that the assignment of state 2 to node w is arbitrary because
the state found in taxon A (1) is not present in the state set for node w (0,2).
Furthermore, because every state is only a single step from all other states,
the branches linking node w to taxon B and node z to taxon F are each only
one step long. Thus Fitch optimization produces an MPR that is only four
steps long.
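A sketch of the two Fitch passes on the same example follows. As before, the rooted topology is inferred from the text and the function names are ours. Where the parent's state is absent from a node's state set the choice is arbitrary; the sketch takes the largest value simply so that node w receives state 2, as in Fig. 4.2d.

```python
# Rooted at taxon A; topology inferred from the text's account of Fig. 4.2
# (an assumption). Each internal node maps to its two derivative nodes.
TREE = {"w": ("B", "x"), "x": ("z", "y"), "z": ("E", "F"), "y": ("C", "D")}
STATES = {"A": 1, "B": 0, "C": 1, "D": 2, "E": 2, "F": 4}
ROOT_TAXON, BASAL_NODE = "A", "w"

def fitch_down(node, sets):
    """First pass: intersection if non-empty, otherwise union plus one step."""
    if node in STATES:
        sets[node] = {STATES[node]}
        return 0
    left, right = TREE[node]
    steps = fitch_down(left, sets) + fitch_down(right, sets)
    common = sets[left] & sets[right]
    if common:
        sets[node] = common
    else:
        sets[node] = sets[left] | sets[right]
        steps += 1
    return steps

def fitch_up(node, parent_state, sets, assigned):
    """Second pass: keep the parent's state when the node's set allows it."""
    if parent_state in sets[node]:
        assigned[node] = parent_state
    else:
        assigned[node] = max(sets[node])  # arbitrary choice (assumption)
    for child in TREE.get(node, ()):
        fitch_up(child, assigned[node], sets, assigned)

sets, assigned = {}, {}
length = fitch_down(BASAL_NODE, sets)
if STATES[ROOT_TAXON] not in sets[BASAL_NODE]:
    length += 1  # one further step between the basal node and the root taxon
fitch_up(BASAL_NODE, STATES[ROOT_TAXON], sets, assigned)

print(length, sets, assigned)
```

The sketch returns a single reconstruction only; it makes no attempt to enumerate alternative MPRs.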
Fig. 4.2 Determination of character length using Fitch optimization (non-additive or
unordered characters). (a) Unrooted cladogram for six taxa and considering one
multistate character (states 0-4). (b) The unrooted cladogram is arbitrarily rooted at
taxon A. (c) States assigned to internal nodes by passing from the terminal taxa to
the root. (d) Alternative states for the internal nodes resolved by passing from the
root to the terminals. (e) An alternative equally parsimonious resolution. The points
at which character changes must be assumed are denoted by black bars. See text for
details.
[Figure: terminal states A (1), B (0), C (1), D (2), E (2), F (4).]
Fitch optimization, like Wagner optimization, does not necessarily produce
a unique MPR. An alternative MPR is shown in Fig. 4.2e, where state 1 is
assigned to node w. It is not readily apparent by examining the original state
set (0,2) that state 1 is a possible unique assignment for node w. To discover
all possible MPRs, a second pass through the cladogram is necessary.
In comparing these two basic methods of optimization, we would note that
additive characters generally require more steps to be added to cladogram
length than do non-additive characters, leading to more possible MPRs for
the latter method.
4.1.3 Dollo optimization

In both Wagner and Fitch optimization, character states are allowed free
reversibility. However, there are situations in which character states may be
constrained in such a way that certain transformations are considered either
highly unlikely or impossible. Dollo optimization was introduced in order to
accommodate evolutionary scenarios in which it was considered most plausible
a priori that each apomorphic state could only have arisen once and that
all homoplasy must be accounted for by secondary loss. For example, in
morphological studies, it may be thought that complex structures, such as the
vertebrate eye, could only have arisen once. Similarly, in the molecular field,
empirical studies have suggested that in analysis of restriction endonuclease
cleavage map data for mitochondrial DNA, there is a marked asymmetry
between the low probability of gaining a new site and the high probability of
losing sites (DeBry and Slade 1985).
Dollo optimization requires that character polarity is prespecified. An
example is shown in Fig. 4.3. In order to simplify the procedure, the
cladogram is first (re-)rooted using one of the terminal taxa with the most
derived state (Fig. 4.3b). State sets are assigned to the internal nodes and the
length of the cladogram calculated as follows.
• If the state sets of two derivative nodes are equal, then this value is
assigned to the internal node connecting them and cladogram length is
not increased.
• If the state sets are different, then the higher value is assigned to the
internal node, and cladogram length is increased by the difference
between the two derived state sets.
• When the basal internal node is reached, its state set is compared with
that of the root. If they differ, cladogram length is increased by the
difference; otherwise no action is taken.
When applied to the cladogram in Fig. 4.3b, Dollo optimization produces
an MPR of four steps (Fig. 4.3c). This exercise is performed using a root, but
these rules have been modified to a generalized unrooted model (Swofford
and Olsen 1990). Under this model it is not necessary to specify the derived
state, but the polarity will be determined by the state at the root.
The disadvantage of Dollo optimization is that if its underlying assumption
is false, then levels of homoplasy will be overestimated. If we consider the
Fig. 4.3 Determination of character length under Dollo optimization. (a) Unrooted
cladogram for eight taxa and a character with three states (0, 1, 2). (b) The
cladogram is rooted at taxon A (in this case this taxon shows the most derived
state). (c) Assignment of character states to internal nodes. See text for details.
cladogram in Fig. 4.4a, Dollo optimization requires seven steps: a single
origin of state 1 followed by six reversals to state 0. However, if the
transformation costs are really equal, then Wagner or Fitch optimization ought
to be applied, giving a cladogram on which only two convergent developments of
Fig. 4.4 Comparison of character length using Dollo and Fitch optimization. (a) A
cladogram requiring seven steps under Dollo optimization (one origin and six
reversals). (b) The same character requires only two steps under Fitch optimization.
state 1 need be postulated (Fig. 4.4b). Dollo optimization overestimates the
length by five steps. The only means of avoiding this problem is to implement
a 'relaxed' Dollo method, whereby one might prefer one gain and two losses
to two independent gains, but reject one gain and ten losses in favour of two
independent gains. A method for implementing such assumptions is discussed
under 'generalized optimization' below.
4.1.4 Camin-Sokal optimization
Camin-Sokal optimization (Camin and Sokal 1965) constrains character
transformations such that once a state has been acquired it may never be lost.
Thus, any homoplasy must be accounted for by multiple origin. The method
for calculating the number of steps on the cladogram (Fig. 4.5) is very similar
Fig. 4.5 Determination of character length under Camin-Sokal optimization. (a)
Cladogram for eight taxa and a character with three states (0, 1, 2). It must be
rooted with a taxon showing the plesiomorphic state. (b) States assigned to
internal nodes.
[Figure: terminal states A (2), B (1), D (0), E (0), F (1), G (1), H (1), rooted at C (0).]
to that for Dollo optimization. Camin-Sokal optimization applies only to
rooted cladograms in which the root bears the plesiomorphic state. If the root
is not the plesiomorphic state, then character polarity must be reinterpreted
to make it so. State sets are assigned to the internal nodes using the following
rules.
• If the state sets of the two derivative nodes are equal, then this value is
assigned to the connecting node and cladogram length is not increased.
• If the state sets are different, then the lower value is assigned to the
connecting node and cladogram length is increased by the difference
between the two derived states.
• When the basal internal node is reached, its state set is compared with
that of the root. If they differ, cladogram length is increased by the
difference. Otherwise no action is taken.
When applied to the cladogram in Fig. 4.5a, Camin-Sokal optimization
produces an MPR of four steps (Fig. 4.5b).
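The rule sets for Dollo and Camin-Sokal optimization differ only in whether the higher or the lower value is passed to the connecting node, so both can be sketched with one function. The terminal states below follow the labels of Fig. 4.5 (and are assumed to be the same in Fig. 4.3), but the two pectinate topologies are hypothetical stand-ins chosen for illustration, not the published figures; the function name is also our own.

```python
def rule_length(tree, states, root_taxon, choose):
    """Apply the three rules listed in the text to a rooted, binary cladogram.

    `tree` is a nested pair of taxon names (excluding the rooting taxon);
    `choose` is max for Dollo optimization and min for Camin-Sokal.
    """
    def visit(node):
        if isinstance(node, str):
            return states[node], 0
        (a, steps_a), (b, steps_b) = visit(node[0]), visit(node[1])
        # equal states add nothing; unequal states add their difference
        return choose(a, b), steps_a + steps_b + abs(a - b)

    basal_state, steps = visit(tree)
    # final rule: compare the basal internal node with the root taxon
    return steps + abs(basal_state - states[root_taxon])

# Terminal states as labelled in Fig. 4.5 (assumed identical for Fig. 4.3).
STATES = {"A": 2, "B": 1, "C": 0, "D": 0, "E": 0, "F": 1, "G": 1, "H": 1}

# Hypothetical pectinate topologies (assumptions, not the published figures).
DOLLO_TREE = ("B", ("C", ("D", ("E", ("F", ("G", "H"))))))  # rooted at A
CAMIN_SOKAL_TREE = ("A", ("B", ("D", ("E", ("F", ("G", "H"))))))  # rooted at C

dollo_steps = rule_length(DOLLO_TREE, STATES, "A", max)
camin_sokal_steps = rule_length(CAMIN_SOKAL_TREE, STATES, "C", min)
print(dollo_steps, camin_sokal_steps)
```

With these stand-in topologies both calculations happen to give four steps. Note that the Dollo tree is rooted at a taxon with the most derived state and the Camin-Sokal tree at a taxon with the plesiomorphic state, as the respective rules require.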
Unlike the other optimization procedures described so far, Camin-Sokal
optimization is very rarely used. It is highly unlikely that evolutionary
scenarios would include the assumption that a feature may arise more than once
but never be lost.
4.1.5 Generalized optimization

All of the optimization procedures described above can be treated as special
cases of a generalized method of optimization. Under generalized optimization
('generalized parsimony' of Swofford and Olsen 1990), a 'cost' is assigned
to each transformation between states (this concept was introduced briefly
earlier in §2.4.6). The costs are represented as a square matrix, the elements
of which represent the increase in cladogram length associated with the
transformation of one state into another (Sankoff and Rousseau 1975). Cost
matrices for Wagner, Fitch, Dollo and Camin-Sokal optimizations are shown
in Table 4.1. For Wagner optimization, it can be seen that the cost of
transforming states through the series is cumulative, whereas for Fitch
optimization, the cost of transforming between any two states is 1. In Dollo
optimization, M represents an arbitrarily large number that guarantees a
single forward transformation only on the cladogram. The infinite cost of
reversals in the Camin-Sokal matrix prevents such transformations from
occurring.
Table 4.1 Generalized parsimony. State x state cost matrices for four types of
parsimony. Under the Dollo option, an arbitrary high value, M, is applied so that
each gain of a character occurs only once on a cladogram. Under the Camin-Sokal
option, reversals are prohibited by applying a value of infinity

        Wagner           Fitch            Dollo             Camin-Sokal
     0  1  2  3       0  1  2  3       0  1   2   3       0  1  2  3
0    -  1  2  3       -  1  1  1       -  M   2M  3M      -  1  2  3
1    1  -  1  2       1  -  1  1       1  -   M   2M      ∞  -  1  2
2    2  1  -  1       1  1  -  1       2  1   -   M       ∞  ∞  -  1
3    3  2  1  -       1  1  1  -       3  2   1   -       ∞  ∞  ∞  -

The advantage of generalized optimization is that it allows flexibility in
permitted transformations that may not be otherwise available. For example,
in nucleotide sequence data, transversions could be assigned different costs
from transitions (see Chapter 5). Nor need the costs be symmetrical. To
implement 'relaxed' Dollo optimization, a suitable value of M is chosen so
that the cost of a forward transformation is greater than that for a reversal
but does not preclude multiple gains allogether. For example, if M is set 10 J,
then the upper triangle of the Dollo optimization cost matrix in Table 4.1
becomes the same as that for Wagner or Camin-Sokal optimization. Under
this assumption, a single gain followed by multiple loss is the preferred
hypothesis until the number of reversals exceeds four, after which two
independent gains becomes the preferred hypothesis.
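The four cost matrices of Table 4.1 can be generated programmatically, which makes their relationships easier to see. The sketch below is illustrative only: the function names are hypothetical, rows are read as the 'from' state and columns as the 'to' state, and the diagonal is written as 0 where the printed table shows a dash.

```python
import math


def wagner_costs(n):
    """Ordered states: cost is the distance along the series."""
    return [[abs(i - j) for j in range(n)] for i in range(n)]


def fitch_costs(n):
    """Unordered states: any change costs one step."""
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]


def dollo_costs(n, big_m):
    """Forward changes cost M per step, reversals one per step."""
    return [[(j - i) * big_m if j > i else i - j for j in range(n)]
            for i in range(n)]


def camin_sokal_costs(n):
    """Forward changes cost one per step, reversals are infinitely costly."""
    return [[j - i if j >= i else math.inf for j in range(n)]
            for i in range(n)]
```

Under this layout, 'relaxed' Dollo optimization amounts to calling `dollo_costs` with a modest value of `big_m` rather than an arbitrarily large one.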
There are two difficulties in implementing generalized optimization
procedures. The first is purely practical: the inclusion in a data set of characters
coded using cost matrices greatly increases computation time. The second
problem concerns the determination of the costs to be applied to
transformations, which are dependent upon acceptance of a particular model of
character change. Such models are mostly very difficult to defend a priori and as a
general rule, unless the application of differential costs can be explicitly and
thoroughly justified, then complex cost matrices should be avoided.
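To make concrete how a cost matrix is turned into a cladogram length, the sketch below uses a standard dynamic-programming pass over the tree (often called the Sankoff algorithm). It is a minimal illustration under stated assumptions rather than the procedure of any particular program: the tree is a hypothetical nested tuple, `cost[s][t]` is read as the cost of changing from state s at a node to state t at its descendant, and the state at the basal node is left unconstrained.

```python
import math


def generalized_length(tree, tip_states, cost):
    """Minimum total cost of one character on a tree under a cost matrix."""
    n_states = len(cost)

    def visit(node):
        # Returns, for each state, the cheapest cost of the subtree
        # below this node given that the node bears that state.
        if isinstance(node, str):                 # terminal taxon
            return [0 if s == tip_states[node] else math.inf
                    for s in range(n_states)]
        child_vectors = [visit(child) for child in node]
        return [sum(min(cost[s][t] + vec[t] for t in range(n_states))
                    for vec in child_vectors)
                for s in range(n_states)]

    return min(visit(tree))
```

Every state at every node is evaluated against every state at each descendant, which gives some feel for why characters coded with cost matrices increase computation time.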
4.2 MISSING VALUES
Missing values, designated as '?', '-' or '.' in computer programs, are some-
times entered in data matrices. Most often, missing values appear in analyses
containing fossil taxa and the problems that flow from their inclusion may be
most acute in palaeontological data. However, missing data are not confined
to fossils. There are a variety of circumstances in which question marks are
used. This section explains the causes, effects and possible strategies for
dealing with missing values.
Missing values may appear in a data matrix for one of several reasons.
1. A particular observation ... even though the part of the
animal or plant ...
2. A question mark may be inserted in place of polymorphic coding (different
members of a terminal taxon may show some or all of the alternative
character states).
3. Technical difficulties in identifying a purine or pyrimidine in a gel may
lead to the inclusion of a question mark or IUPAC/IUB ambiguity code
within a molecular sequence, meaning that the observation is uncertain.
4. Once amino acid or gene sequences have been aligned between different
species, it is often necessary to include insertions and deletions (indels) to
maximize alignment of sequences. Usually indels are treated as an
additional character state for the purpose of cladistic analysis. Occasionally,
they are treated as question marks.
5. For fossils, which are always incomplete in one respect or another, there
may be genuine missing data, that is, the particular part of the skeleton
has not yet been found or the fossil lacks the soft anatomical details
(fossils will nearly always lack molecular sequences).
6. Organisms may have missing data in the sense that some structures are
interpreted as having been lost. For example, in a systematic problem
concerning the interrelationships of vertebrates, attributes of tooth shape
or tooth replacement pattern cannot be scored for modern turtles and
modern birds.
7. Alternatively, there may be characters for which states cannot be coded in
any outgroup taxon. For example, in the same systematic problem, states
relating to features of the vertebral column cannot be scored in any
outgroup (e.g. Amphioxus or sea squirts) because no outgroup has a
vertebral column. In both this and the previous case, character states may
be judged non-applicable or illogical for some taxa, and although this is
usually noted in the written data matrix, computer algorithms will treat
these entries as question marks.
8. A special case is exemplified by three-item statements analysis, where data
are recoded to express components. Here, the question marks are simply
devices, which the computer algorithm can accept. They do not indicate
missing or ambiguous character data. This type of question mark is
discussed further in Chapter 9.
Irrespective of the cause of question marks in a data matrix, their effect in
a computer parsimony analysis is the same. Question marks can lead to the
generation of multiple equally most parsimonious cladograms, to spurious
theories of character evolution, and to lack of resolution by masking the
phylogenetic signal implied by the observed data.
It is important to recognize that missing values alone will not alter the
topological relationships of taxa that are known from real (observed) data.
Their most obvious effect is to increase the number of equally most
parsimonious cladograms and to decrease resolution. Novacek (1992) gave a good
example of this. An initial analysis of the twenty recognized orders of Recent
mammals, coded for 88 characters,
produced four equally most parsimonious cladograms. All but six data cells
were filled with real data (i.e. 0 or 1), the few question marks being the result
of non-applicable coding. To this analysis, seven fossil taxa were added with
varying amounts of missing data (25-57%). Analysis of this enlarged matrix
resulted in 6800+ equally most parsimonious cladograms (this being the limit
of computer memory rather than the total number of equally most
parsimonious solutions). In the strict consensus tree, the clade originally recognized
as containing primates, tree shrews, flying lemurs and bats was lost.
Fig. 4.6 An example of the increase in the number of equally most parsimonious
cladograms following addition of taxa with missing values. Above (a) is the strict
consensus tree for 20 orders of Recent mammals coded for 88 characters with nearly
all data cells filled with real values. Below (b), seven fossil taxa have been added,
each of which has between 25 and 57% missing data. (From Novacek 1992.)
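Percentages of missing data such as those quoted for the fossil taxa are simple to tabulate before an analysis is run. The following sketch is a hypothetical helper, not part of any parsimony package; the taxon names and codings are invented, and only '?' is counted as missing unless other symbols are passed in.

```python
def missing_percentage(codings, missing_symbols="?"):
    """Percentage of cells in one taxon's row that are coded as missing."""
    missing = sum(1 for cell in codings if cell in missing_symbols)
    return 100.0 * missing / len(codings)


# Invented data: one fully scored taxon and one poorly known fossil.
matrix = {
    "recent_taxon": "0110100101",
    "fossil_taxon": "01??1???0?",
}
report = {taxon: missing_percentage(row) for taxon, row in matrix.items()}
```

As the remainder of this section shows, the percentage alone is a poor guide to how disruptive a taxon will be, so a tally of this kind is best treated as a first screen only.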
Addition of taxa to any analysis is liable to increase the number of equally
most parsimonious cladograms by introducing additional homoplasy
(although it is possible that additional taxa may give fewer cladograms by
resolving previous ambiguity). However, the dramatic increase in cladogram
number in the study cited above is due largely to the inclusion of question
marks, which increase ambiguous character optimizations at the internal
nodes. It has been pointed out by Platnick et al. (1991b) that two of the most
commonly used parsimony programs (Hennig86 and PAUP) can generate
spurious cladograms when supplied with matrices containing missing data.
The example of Platnick et al., reproduced here as Fig. 4.7, shows a data
matrix that includes two missing values for two taxa (F and G). Analysis of
this data set using either Hennig86 or PAUP yields six equally most
parsimonious cladograms. Yet, if we replace the missing values with all four possible
combinations of real values, then we can recover just four of these six
cladograms (Fig. 4.7a, d-f). The remaining two (Fig. 4.7b, c) are solely
products of the way that the computer programs optimize missing values.
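The idea of replacing missing values with every possible combination of real values can be sketched as follows. The code is an illustration under stated assumptions, not a description of how Hennig86 or PAUP are written: characters are binary and unordered, a question mark is treated as free to take whichever state is cheapest on the cladogram being evaluated, and the tree and codings are hypothetical rather than those of Fig. 4.7.

```python
from itertools import product

STATES = "01"


def fitch_length(tree, tip_states):
    """Steps for one unordered character; '?' may take any state."""
    length = 0

    def visit(node):
        nonlocal length
        if isinstance(node, str):                 # terminal taxon
            state = tip_states[node]
            return set(STATES) if state == "?" else {state}
        left, right = visit(node[0]), visit(node[1])
        if left & right:
            return left & right
        length += 1                               # no shared state: one step
        return left | right

    visit(tree)
    return length


def replacements(tip_states):
    """Yield every way of replacing the question marks with real states."""
    unknown = [taxon for taxon, s in tip_states.items() if s == "?"]
    for combo in product(STATES, repeat=len(unknown)):
        filled = dict(tip_states)
        filled.update(zip(unknown, combo))
        yield filled


# Hypothetical character with two missing values.
tree = (("A", "B"), (("C", "D"), ("E", "F")))
tips = {"A": "0", "B": "0", "C": "1", "D": "?", "E": "1", "F": "?"}
with_question_marks = fitch_length(tree, tips)
with_real_values = [fitch_length(tree, filled) for filled in replacements(tips)]
```

For a single character the length obtained with question marks equals the shortest length over all replacements. Checking which cladograms survive when every replacement is analysed as a whole matrix is the comparison that Platnick et al. describe.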
It can be argued further that five of the six cladograms, those with three
nodes (Fig. 4.7a, d, f) and four nodes (Fig. 4.7b, c), are 'over-resolved', by one
and two nodes respectively. In other words, all of the branches resolving
groups DEFG, DFG, DG and DF are spurious; none has unambiguous
support in the data. How these spurious resolutions arise can be explained
with reference to a second example (Fig. 4.8), which also illustrates that the
problem of ambiguous optimization can also occur in the absence of question
marks.
Analysis of the matrix in Fig. 4.8a using Hennig86 or PAUP yields four
equally most parsimonious topologies (Fig. 4.8b-d, f). Only two of these
cladograms are unambiguously supported by data: Fig. 4.8b, in which node
CDE is supported by character 3; and Fig. 4.8c, in which node BC is
supported by character 4. The cladograms shown in Fig. 4.8d and e, and in
Fig. 4.8f and g, represent alternative, ambiguous optimizations of characters 4
and 3 respectively. The former cladogram of each pair is the delayed
transformation, while the latter cladograms are the accelerated
transformations. The topologies are reported by the computer programs because at least
one character is placed on each branch under at least one of these
optimizations (Fig. 4.8e and Fig. 4.8f).
Swofford and Begle (1993) proposed three criteria that could be used to
determine when zero-length branches are to be collapsed.
Fig. 4.7 Spurious cladograms. (a-f) When the data set shown here, including two
question marks, is analysed using either Hennig86 or PAUP, six cladograms (a-f)
are found. However, two of these (b and c) cannot be justified by any combination
of replaced 'real' observations. They are spurious and result from the way the
algorithms treat zero branch lengths. (From Platnick et al. 1991b.)
• Collapse a branch if its minimum length is zero; that is, if there is at least
one optimization of all characters that assigns zero-length to the branch,
then that branch is collapsed.
• Collapse a branch if its maximum length is zero; that is, if there is at least
one optimization of all characters that does not assign zero-length to the
branch, then that branch is not collapsed.
• Apply either accelerated or delayed transformation to all characters, or to
characters individually, and collapse any branch assigned zero-length.
The third criterion ... would we have a
defensible justification for choosing one type of optimization over the other.
However, de Pinna (1991) argued that, unless demonstrated to be false by
parsimony, accelerated transformation is to be preferred because it preserves
more of our original conjecture of primary homology than does delayed
transformation. (The same point was made by Farris (1983) on the basis of
higher information content.) In other words, by favouring the acquisition of a
character, with subsequent homoplasy accounted for by reversal (e.g. Fig. 4.8e
rather than Fig. 4.8d), accelerated transformation maintains our original
conjecture of the character as a putative synapomorphy. In contrast, by
treating homoplastic characters as independent derivations, delayed
transformation rejects our original hypothesis of primary homology. For this reason,
de Pinna (1991) asserted that accelerated transformation optimization is the
theoretically superior algorithm for tracing character evolution. It should be
noted, however, that the argument of de Pinna explicitly concerns character
evolution. It is therefore part of the transformational approach to cladistics,
which views characters as features that transform one into another, rather
than the taxic approach, which we adopt here, that uses the distributions of
characters among taxa to hypothesize group membership.
Fig. 4.8 (a-g) Example of how ambiguous optimization can lead to spurious
resolution when a data set is analysed using either PAUP or Hennig86: see text for
explanation.
Swofford and Begle (1993) considered that the first criterion was flawed
because it might not be possible to collapse all branches that have a
minimum length of zero and still retain a most parsimonious topology. Two
branches may each be potentially of zero-length, but not simultaneously.
Thus, it is not possible to collapse both without increasing the cladogram
length, giving a suboptimal solution. They considered that only the second
criterion could be justified, maintaining that ambiguous support represented
potential resolution and thus should be retained on the most parsimonious
cladograms. Consequently, this approach was adopted in PAUP (and also
Hennig86; Farris 1988).
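The difference between the first two criteria can be stated compactly in code. The sketch below is purely illustrative: it assumes that the length of a branch under each alternative optimization has already been computed by some other means, and the function names and example lengths are hypothetical.

```python
def collapse_if_minimum_zero(lengths):
    """First criterion: collapse if any optimization gives zero length."""
    return min(lengths) == 0


def collapse_if_maximum_zero(lengths):
    """Second criterion: collapse only if every optimization gives zero length."""
    return max(lengths) == 0


# Hypothetical branch supported under one optimization but not another.
ambiguous_branch = [1, 0]      # e.g. lengths under two alternative optimizations
unsupported_branch = [0, 0]
```

The ambiguously supported branch is collapsed under the first criterion but retained under the second, which is why the second criterion reports more resolved cladograms. The sketch judges each branch in isolation, so it does not capture the objection that two branches may each be potentially of zero length but not simultaneously.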
However, there is a practical problem. PAUP forces us to choose to
optimize using either accelerated or delayed transformation. Accelerated
transformation should recover the topologies in Fig. 4.8b, c and e, but not
that in Fig. 4.8g, which ought to collapse into the topology in Fig. 4.8c due to
the lack of support for node DE. Likewise, delayed transformation should
recover the topologies in Fig. 4.8b, c and f, but not that in Fig. 4.8d, where the
lack of support for node DE ought to collapse it into the topology in Fig. 4.8b.
Nevertheless, PAUP continues to report all four cladograms, even though
one of them cannot be supported by data under the chosen optimization
routine.
Coddington and Scharff (1994) suggested filtering cladograms produced
using the third criterion by discarding all topologies that must contain a
zero-length branch. This procedure would resolve the problem just discussed,
as well as removing those cladograms in which not all nodes are capable of
support simultaneously. However, it would retain topologies such as those in
Fig. 4.8e and Fig. 4.8f, where the resolution is supported by one optimization,
although not by another.
However, Nixon and Carpenter (1996b) noted that the extra resolution
comes only from a small number of ambiguously distributed, homoplastic
characters. The additional groups are thus weakly supported and are not
strong hypotheses of relationship. Instead, they suggested that the most
efficient way to summarize the grouping information in the data is to
eliminate all those minimum length cladograms that contain spuriously
resolved nodes. In other words, apply the first criterion of Swofford and Begle,
but with the added proviso that the minimum cladogram length be
maintained, thereby circumventing Swofford and Begle's objection. Those
cladograms that retain minimum length and have all the
resolved nodes supported by data are termed 'strictly supported cladograms'
(Nixon and Carpenter 1996b) (the related concept of 'minimal cladograms'
(Nelson 1992) in the context of three-item statements analysis is discussed in
Chapter 9). In the example in Fig. 4.8, only the two cladograms in Fig. 4.8b
and Fig. 4.8c would be retained. Of the programs currently available, only
NONA has a switch to disallow ambiguous optimizations.
To return now to missing data, the effects can be quite subtle because
there is no simple relationship between the amount of missing data and the
disruptive influence it may exert on either the number of cladograms
produced or their resolution. This is because the effect of introducing a taxon
with many question marks depends upon the distribution of the remaining
real (observed) data. This can be illustrated by an example published by
Nixon and Wheeler (1992) and redrawn here as Fig. 4.9. Given six taxa, A-F,
each of which is completely known by real data, there is a single most
parsimonious solution (Fig. 4.9a). The addition of a seventh taxon, G, which is
known only for characters 3 and 6, results in eight equally most parsimonious
solutions, the strict consensus of which is the unresolved bush (Fig. 4.9b).
This is because the real data known for taxon G places it in markedly
different parts of the original cladogram (Fig. 4.9c). Therefore, in a real
analysis, if there is a taxon (e.g. a poorly known fossil) that behaves in this
manner, it may be better to leave it out entirely. Instead, it could be placed by
hand on the resulting cladogram in the alternative positions allowed by the
real data.
The removal of a taxon simply because it disrupts resolution cannot be
justified except in terms of computational expediency. A taxon, with or
without question marks, that causes a change in topology, other than simply
decreasing resolution, carries information that may be potentially useful. Any
taxon that has a unique combination of characters may influence
relationships among the other taxa. It is against this background that Wilkinson and
Benton (1995) suggested employing 'safe taxonomic reduction'. This
procedure permits us to eliminate those taxa with question marks that can have no
influence on the topology of the remaining taxa. A theoretical example is
shown in Fig. 4.10. Given the three taxa (A, B, C), it can be seen that taxon B
has precisely the same real codings as the more completely known taxon A.
Deletion of this taxon cannot alter topological relationships, yet its presence
may increase the number of equally most parsimonious cladograms. Taxon C
contains exactly the same number of question marks as taxon B but they are
distributed differently such that, for character number 2, taxa A and C have
different real codings. Elimination of taxon C would be unsafe because the
real data for this taxon may have influence on the resulting topology. In a
real example, Wilkinson and Benton (1995) analysed 16 taxa of
Rhynchosauridae (an extinct family related to the tuatara), finding 21 700 equally
most parsimonious cladograms (the limit imposed by the computer's memory).
Six taxa satisfied the criterion of safe taxonomic reduction and could be
deleted. The reduced data set yielded only two equally most parsimonious
cladograms. Having reduced the number of cladograms, the excluded taxa
would then be placed back on to the cladogram(s) before a selection was
made among them on other grounds (e.g. stratigraphic or biogeographical
plausibility).
Fig. 4.9 Some taxa with question marks can be highly disruptive. (a) The single
cladogram that results from analysis of the data set shown, but including only
taxa A-F. (b) When a seventh taxon, G, is added to the analysis, eight cladograms
are produced of which the strict consensus tree is the uninformative bush. (c) The
disruption is caused by the alternative positions that taxon G can adopt on the
eight original cladograms. (From Nixon and Wheeler 1992.)
          CHARACTERS
TAXON A   1 2 0 ? 0 1 2 0 1 0 1
TAXON B   1 ? 0 ? 0 ? ? 0 ? 0        (safe)
TAXON C   1 0 0 ? 1 ? 1 2 ? ? ? 1    (unsafe)
            ^
            different real coding
Fig. 4.10 Safe taxonomic reduction. If taxon B is compared with taxon A, it can be
seen that in all characters denoted by question marks taxon A either is the same or
has real values. The inclusion of this taxon can have no topological effect on the
outcome and may be safely deleted. However, taxon C differs in having a different
'real' value and may not be safely deleted since it may cause a change in topology.
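The comparison described for taxa A, B and C can be written as a short check. This is a simplified reading of the criterion as stated in the caption to Fig. 4.10, not a full implementation of Wilkinson and Benton's procedure: the function name is hypothetical, the codings are invented rather than copied from the figure, and a taxon is judged safe to delete only if each of its cells is either a question mark or identical to the corresponding cell of the more completely known taxon.

```python
def safe_to_delete(candidate, reference, missing="?"):
    """True if every real coding of candidate is matched in reference."""
    if len(candidate) != len(reference):
        raise ValueError("taxa must be coded for the same characters")
    return all(c == missing or c == r for c, r in zip(candidate, reference))


# Invented codings in the spirit of Fig. 4.10.
taxon_a = "120?012"
taxon_b = "1?0?0?2"    # real codings all agree with taxon A
taxon_c = "100?1?2"    # character 2 differs from taxon A
```

With these codings, taxon B can be set aside and replaced on the cladograms afterwards, whereas taxon C must stay in the analysis because its real data may affect the topology.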
Missing data due to incompleteness should not be a problem with Recent
taxa. It is theoretically possible to know a Recent animal or plant in a way
that is not possible for fossils. However, it has been pointed out (Gauthier et
al. 1988) that marked divergence in structure among groups of Recent
animals and plants, as well as the inclusion of highly distinct outgroup taxa,
may mean that question marks are placed against some characters in Recent
taxa. For example, matrices can include question marks derived from
non-applicable (or illogical) codings. This often arises in matrices including both
fossil and Recent representatives.
Although little may be done regarding genuinely missing data, we should
be aware of the consequences of coding non-applicable character states as
question marks. First, these question marks will undoubtedly increase the
numbers of equally most parsimonious cladograms. Second, it is possible that
using question marks for non-applicable characters may lead to selection of a
more parsimonious cladogram than is allowed by plausible character
evolution.
The following example (Fig. 4.11, from Maddison 1993) explains this
phenomenon. Suppose that we wish to add characters of the tail to try to
resolve the topological ambiguity shown in the left-hand main clade of Fig.
4.11a. The animals concerned exhibit one of three conditions. Some are
tailless, some have red tails and some have blue tails. The distribution of
these conditions in our initial analysis is shown in Fig. 4.11a, where it can be
seen that the more basal members of both main clades are all
tailless, while the more distal members of each show either red tails or blue
tails.
There are several different ways in which we might code the tail conditions
(see Chapter 2). Two of the commonly used methods are shown here as
alternative 1 and alternative 2 (Fig. 4.11b). Alternative 2 treats the three
types as dependent variables within a single multistate character. In contrast,
alternative 1 treats them as semi-independent variables distributed between
two characters and includes a question mark to denote non-applicable coding
for colour in those taxa having no tails. Coding alternative 1 is often used, yet
this may lead to a selection of a cladogram that cannot possibly be justified in
terms of character evolution (that is, when we translate the cladogram into a
tree). The area of ambiguity in the left-hand clade involves four taxa, for
which there are 15 possible fully resolved solutions. Two are shown in Fig.
4.11c and e, and in Fig. 4.11d and f. Optimization of the tail characters
(presence/absence, red/blue) coded as alternative 1 results in us preferring
the topology in Fig. 4.11c over the alternative (Fig. 4.11d), because the former
requires only two steps to explain the distribution of tail colour, rather than
three. But selection of that cladogram is nonsensical, because the choice is
based upon falsely ascribing tail colour (blue or red) to animals that do not
have tails. However, if we used coding alternative 2, the starting condition for
both topologies would be '0' and both require 4 steps (Fig. 4.11e and Fig.
4.11f). Of course, coding tail attributes in this way gets us no further in
resolving the initial polytomy but it does mean that we avoid the choice of an
apparently optimal cladogram based on nonsensical character attributes. As
Maddison pointed out, this problem is not confined to morphological data but
is also relevant to the use of question marks to code for gaps in protein or
nucleotide sequences. This is a very simple example and if we were doing the
analysis by hand, then the danger would be spotted. But in computer
analyses, these pitfalls are much more difficult to detect. Furthermore, if we
were to apply successive approximations character weighting (see Chapter 5)
to such an initially 'nonsensical' cladogram, the result may well deviate
further from reality.
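The two coding alternatives for the tail example can be made explicit as follows. This is a sketch only: the function names are hypothetical, and the numbering of states is an assumption (tailless is taken as 0 in alternative 2 because the text gives '0' as the starting condition, while the assignment of 0 to red and 1 to blue in alternative 1 is arbitrary).

```python
def code_alternative_1(condition):
    """Two characters: tail absent/present, then tail colour.

    Colour is non-applicable for tailless taxa and is entered as '?'.
    """
    if condition == "tailless":
        return ("0", "?")
    return ("1", "0" if condition == "red" else "1")


def code_alternative_2(condition):
    """One multistate character covering all three conditions."""
    return {"tailless": "0", "red": "1", "blue": "2"}[condition]


conditions = ["tailless", "red", "blue"]
alternative_1 = [code_alternative_1(c) for c in conditions]
alternative_2 = [code_alternative_2(c) for c in conditions]
```

Under alternative 1 the question mark leaves the program free to assign a colour to a tailless animal during optimization, which is the source of the nonsensical choice described above. Alternative 2 gives every taxon a real coding, so no such assignment can be made.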
Simulations have shown that question marks in ingroup taxa that are widely
scattered at low hierarchical levels exert more deleterious effects on
character optimization (and perhaps false selection of cladograms) than do question
marks in taxa near the root. In practical terms, this means that in combined
analyses of fossil and Recent taxa, where the latter are usually scattered
within the ingroup, it may be particularly important to avoid the use of
question marks for non-applicable character states.
Mention has been made above of the use of question marks for
polymorphic taxa. To avoid this two solutions have been proposed. First, the group
may be broken up into two or more subgroups, the members of which show
uniform coding. Alternatively, a phylogeny for the group showing the
polymorphism may be assumed and the state at the ingroup node accepted as
representative of the entire group (but see §3.2.5 for a discussion of the
problems associated with this procedure).
Fig. 4.11 An example of how non-applicable coding, scored as question marks, may
result in choosing a cladogram that is 'nonsensical'. (a) The preliminary cladogram,
where part of the left hand side of the cladogram was unresolved. We wish to add
characters of the tail to try to resolve this polytomy. There are three taxon types
(tailless, red-tailed and blue-tailed). (b) Two coding alternatives for the tail
attributes. Alternative 1 treats the features as two characters and includes a question
mark to denote inapplicable coding for colour in those taxa having no tails.
Alternative 2 treats the three types as a single multistate character. (c) One of the
fifteen possible resolutions of the terminal polytomy shown in (a). (d) Another of the
fifteen possible resolutions of the terminal polytomy shown in (a). Optimization of
the tail characters coded according to alternative 1 leads us to prefer the topology
in (c) to that in (d), because the former is one step shorter. (e-f) The same two
topologies as in (c) and (d). Optimization of the tail characters coded according to
alternative 2 does not allow us to prefer either topology, since both are four steps long.
(From Maddison 1993.)
In summary, the introduction of question marks into cladistic analyses
causes computational problems that have not yet been solved. These
problems relate to both the numbers of cladograms produced and their resolution.
The introduction of question marks is clearly most significant when
undertaking simultaneous analyses of combined fossil and Recent taxa (especially
where molecular sequences are included). For morphological data matrices, it
may be possible to distinguish between informative and non-informative fossil
taxa and eliminate the latter. It may also be possible to identify 'rogue' taxa
and eliminate them from initial analyses. For polymorphisms, alignment gaps
in molecular sequences and non-applicable character states, care should be
taken in the initial coding. All of these strategies will alleviate the symptoms
of question marks but not remove them entirely.
4.3 CHAPTER SUMMARY
1. Character optimization is the process of determining the sequence of
character state change on a cladogram. Wagner optimization is used for
ordered characters and Fitch optimization for unordered characters. Other,
more restrictive, methods include Dollo and Camin-Sokal optimization.
All of these procedures can be considered to be special cases of a
generalized method of optimization in which explicit costs are assigned to
transformations between character states.
2. Missing values in a data matrix can be caused by the failure to observe a
feature in a particular organism due to lack of the appropriate organ or
life cycle stage, polymorphism, secondary loss, or dependent characters
leading to non-applicable character states.
3. Missing values can lead to an increase in the number of equally most
parsimonious cladograms, may decrease resolution, and may lead to
selection of a more parsimonious cladogram than is allowed by plausible
character evolution.
4. Missing values can also cause some cladistic computer programs to
produce spurious cladograms that cannot be supported by any possible
combination of real values. Such cladograms can also be produced from data sets
that include no missing values as a result of ambiguous character
optimization. The preferred cladogram is that which is of minimum length and has
all its nodes unambiguously supported by data. This is the strictly
supported cladogram.
5.
Measures of character fit and
character weighting
5.1 MEASURES OF CHARACTER FIT
Current parsimony programs utilize a number of different statistics to assess
the 'quality' of cladograms. Standard measures are cladogram length, the
consistency index and the retention index. The examples given below deal
with binary characters only. The principles can easily be extended to
multistate characters.
5.1.1 Cladogram length

Consider Matrix 1 (Table 5.1a), which consists of four taxa (A-D) and six
characters (1-6, in which 0 = the plesiomorphic state and 1 = the apomorphic
state). With the inclusion of an all-zero root (X), this matrix yields three
equally most parsimonious cladograms (Fig. 5.1a-c). Consider first the
cladogram in Fig. 5.1a. Four of the six characters fit the cladogram with one step
(Table 5.1a, row s; characters 1, 4, 5 and 6). Character 1 is present only in
taxon D and character 4 is present in all four taxa, A-D. No matter which
solution is considered, characters 1 and 4 will always fit with one step.
Characters 2, 3, 5 and 6 are different in that they imply alternative groupings
among taxa A-D. Character 5 implies that A and B are more closely related
to each other than either is to C or D, while character 6 implies that A, B and C
are more closely related to each other than they are to D. Both of these
hypotheses are represented in Fig. 5.1a. Hence, characters 5 and 6 need only
appear once on the cladogram at the node uniting the relevant taxa.
Characters 2 and 3 are of the same kind as characters 5 and 6 in that they also imply
particular groupings. Character 2 implies that C and D are more closely
related to each other than either is to A or B, while character 3 implies that
B, C and D are more closely related to each other than they are to A. Neither
of these hypotheses is present in Fig. 5.1a, on which we will now concentrate.
Characters 2 and 3 can only be fitted to that cladogram with more than one
appearance. In this case, characters 2 and 3 appear with a minimum of two
steps (Matrix 1, Table 5.1a, row s). In total, in Fig. 5.1a, four characters fit the
cladogram once (characters 1, 4, 5 and 6), and two characters fit the
cladogram twice (characters 2 and 3). Simple addition gives a total cladogram
length of 8 steps.
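The step counts just described can be checked with a short Fitch-style count for binary characters. The sketch below is illustrative and rests on two assumptions: the codings are those implied by the character distributions stated in the text (for example, character 1 present only in D), and the cladogram of Fig. 5.1a is taken to be the topology in which A and B, then A, B and C, form groups, with the all-zero root X attached at the base.

```python
# Codings for characters 1-6, as implied by the description in the text.
MATRIX_1 = {
    "X": "000000",
    "A": "000111",
    "B": "001111",
    "C": "011101",
    "D": "111100",
}

# Assumed topology for Fig. 5.1a: groups AB and ABC, rooted on X.
TREE_5_1A = ("X", ("D", ("C", ("A", "B"))))


def steps(tree, matrix, char_index):
    """Number of steps required by one binary character on the tree."""
    count = 0

    def visit(node):
        nonlocal count
        if isinstance(node, str):                 # terminal taxon
            return {matrix[node][char_index]}
        left, right = visit(node[0]), visit(node[1])
        if left & right:
            return left & right
        count += 1                                # no shared state: one step
        return left | right

    visit(tree)
    return count


per_character = [steps(TREE_5_1A, MATRIX_1, i) for i in range(6)]
total_length = sum(per_character)
```

Characters 1, 4, 5 and 6 each contribute one step and characters 2 and 3 two steps each, so the individual counts sum to the cladogram length.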
Table 5.1 (a) Matrix 1 with four taxa (A-D) and an all-zero root (X) coded for 6
characters (1-6). (b) Matrix 2 with five taxa (A-E) and an all-zero root (X) coded for
6 characters (1-6). s = actual steps on the cladogram, m = minimum possible steps,
g = minimum steps on a bush, ci = consistency index, ri = retention index, Σ = sum of
s, m, and g values (S, M and G respectively) for the calculation of CI and RI. CI(u) is
CI for informative characters only. The values for CI, CI(u) and RI for Matrix 1 refer
to the cladogram in Fig. 5.1c, while those for Matrix 2 refer to the cladogram in
Fig. 5.2c.
(a) Matrix 1
Taxa      Characters
          1     2     3     4     5     6     Σ
X         0     0     0     0     0     0
A         0     0     0     1     1     1
B         0     0     1     1     1     1
C         0     1     1     1     0     1
D         1     1     1     1     0     0
s         1     2     2     1     1     1     8
m         1     1     1     1     1     1     6
g         1     2     2     1     2     2     10
ci        1     0.5   0.5   1     1     1
ri        0/0   0     0     0/0   1     1
Length 8
CI      M/S            6/8              0.75
CI(u)   M/S            4/6              0.67
RI      (G-S)/(G-M)    (10-8)/(10-6)    0.50
(bJ Matrix 2
T Characters
1 2 3 4 5 0 1
X 0 0 0 0 0 0
A 0 0 0 1 1 1
"
C
0
0
0
1
0
1 1
1
1
1
1
1
0
0
1
1
0
E 1 1 1 0 0
., I 2 2 1 1 1 6
1 1 1 1 1 0
'" 2 3 2 1 2 3 13
"
d
,; 1 1
0.5
0
0.5
0.5
l.ength 8
CI MIS 6/8 0.75
.,
c l(u)
=
MI~
p-S/G: ~J
4 / Ii
·tiI 1:I- ()
0.67
0.71

Fig. 5.1 a-c. Analysis of the data in Table 5.1a yields three equally most
parsimonious cladograms. Only characters 2, 3, 5 and 6 are mapped. Character gain
signified by '+', character loss by '-'.

It is tempting to consider the fit of a character and the terms 'steps',
'appearances' or 'occurrences' as if they represent real events, such as the
origin of a particular character. The term 'steps', for example, has been
interpreted either as the number of occurrences of a particular character on
a particular cladogram, or as the amount of change required to transform a
character from one of its states into another (in this case 0 to 1, or 1 to 0).
Under this latter interpretation, the number of steps between different states
of a character can be made to vary according to pre-specified 'optimality
criteria' (see Chapter 4). Such assumptions can be incorporated into the data
set a priori and considered as a form of weighting, such that one particular
'transformation' is considered more 'likely' than another.

Various assumptions to weight step changes differentially have been proposed.
Wagner, Fitch, Dollo and Camin-Sokal 'parsimony' are among those
more commonly encountered. The recently developed concept of 'generalized
parsimony' attempts to incorporate even more complex changes into character
transformations (see Chapter 4). However, such assumptions need not
(and possibly should not) be made and cladogram length is simply a way of
quantifying the optimal cladogram, given a particular data set and possible
solutions.
When a character fits a cladogram perfectly (for a binary character, to one
node with one step), it can be considered 100% consistent with that particular
cladogram. When a character does not fit a cladogram perfectly (to more
than one node with more than one step), it is less than 100% consistent. Such
a character exhibits homoplasy. Homoplasy is discordance with a particular
cladogram and can be measured by various indices: the consistency index (ci)
and the retention index (ri) for individual characters, and their ensemble
values (CI and RI) measured over the entire suite of characters for a
particular cladogram(s).
5.1.2 Consistency index (ci)

Character consistency (ci) is defined as m/s, where s is the minimum
number of steps a character exhibits on a particular cladogram and m is the
minimum number of steps the same character can show on any cladogram.
For a binary character on any cladogram, m will equal 1 (Matrix 1, Table 5.1,
row m). The best possible fit will be to a single node and for such a character,
s will also equal 1. However, with increasing amounts of homoplasy, a binary
character will fit with a greater number of steps and s will equal 2 or more.
Consider again Matrix 1 (Table 5.1a) and the cladogram in Fig. 5.1c.
Disregarding characters 1 and 4 as uninformative, the characters can be
divided into those that fit to one node (characters 2 and 3) and those that fit
to more than one node (characters 5 and 6). All four are binary characters
and thus m = 1 for all of them. As characters 2 and 3 fit to a single node on
the cladogram in Fig. 5.1c, their s also equals 1, and thus their ci = 1
(m/s = 1/1). They are 100% consistent with this cladogram. In contrast,
characters 5 and 6 fit to the cladogram twice. Their s = 2 and hence their
ci = 0.5 (m/s = 1/2). They fit this particular cladogram imperfectly and are
only 50% consistent. It should be noted that for Matrix 1 in Table 5.1a, there
are three equally parsimonious solutions (Fig. 5.1a-c) and the values of s
(and hence ci) for characters 2, 3, 5 and 6 will change according to the
cladogram being considered.
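
The dependence of s and ci on the cladogram can be checked with a short step-counting sketch. It is illustrative only: it assumes Fitch-style counting for unordered binary characters, and the two topologies are inferred from the groups named in the text (A+B and A+B+C for Fig. 5.1a; C+D and B+C+D for Fig. 5.1c). The names `fitch_steps`, `steps_per_character` and `consistency_indices` are hypothetical helpers, not part of any parsimony program.

```python
# Matrix 1 (Table 5.1a): rows are taxa (X is the all-zero root),
# columns are characters 1-6.
MATRIX_1 = {
    "X": [0, 0, 0, 0, 0, 0],
    "A": [0, 0, 0, 1, 1, 1],
    "B": [0, 0, 1, 1, 1, 1],
    "C": [0, 1, 1, 1, 0, 1],
    "D": [1, 1, 1, 1, 0, 0],
}

# Topologies as nested tuples; assumed from the groupings described in the text.
FIG_5_1A = ("X", ("D", ("C", ("A", "B"))))  # groups A+B and A+B+C
FIG_5_1C = ("X", ("A", ("B", ("C", "D"))))  # groups C+D and B+C+D


def fitch_steps(tree, states):
    """Return (state set, step count) for one character on a binary tree."""
    if isinstance(tree, str):  # a leaf: its observed state, no steps yet
        return {states[tree]}, 0
    left_set, left_steps = fitch_steps(tree[0], states)
    right_set, right_steps = fitch_steps(tree[1], states)
    shared = left_set & right_set
    if shared:  # the two subtrees agree, so no extra step is needed
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def steps_per_character(tree, matrix):
    """Steps (s) for every character of the matrix on the given tree."""
    n_chars = len(next(iter(matrix.values())))
    steps = []
    for i in range(n_chars):
        states = {taxon: row[i] for taxon, row in matrix.items()}
        steps.append(fitch_steps(tree, states)[1])
    return steps


def consistency_indices(steps, m=1):
    """ci = m/s, with m = 1 for every binary character."""
    return [m / s for s in steps]


s_a = steps_per_character(FIG_5_1A, MATRIX_1)
s_c = steps_per_character(FIG_5_1C, MATRIX_1)
print(s_a, consistency_indices(s_a))  # [1, 2, 2, 1, 1, 1] and ci of 0.5 for characters 2, 3
print(s_c, consistency_indices(s_c))  # [1, 1, 1, 1, 2, 2] and ci of 0.5 for characters 5, 6
```

Both cladograms have the same total length (8 steps); only the distribution of steps among characters differs.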
5.1.3 Ensemble consistency index (CI)

Values of ci for individual characters are useful to understand how well they
have performed on different cladograms, but it is also useful to know how
well the entire data set performs. The ensemble consistency index (CI) gives
this value. As with individual character ci values, if all the data fit a
cladogram perfectly then data and cladogram are 100% consistent. Data that
are 100% consistent are rarely, if ever, encountered, no matter how well the
morphology is studied or the DNA sequencing is performed. CI is represented
by the same formula as character consistency but with
upper case letters: CI = M/S, where M
equals the sum of all the m values for the individual characters and S is the
sum of all their s values. In the case of Matrix 1 (Table 5.1a), M = 6 (addition
of values in row m) and S = 8 (addition of values in row s). Therefore,
CI = M/S = 6/8 = 0.75. The data are 75% consistent with the cladogram.
5.1.4 Problems with the consistency index

Several problems have been noted regarding the use of the consistency index
as a measure of homoplasy. First, the inclusion of uninformative characters in
the calculation will inflate CI values. Consider once again the example of
Matrix 1 (Table 5.1a). As noted above, in this particular data set two
characters (1 and 4) are uninformative of relationships. Whatever possible
solution is considered, their ci value will always equal 1. For this reason, their
values might be understood as irrelevant or unnecessary (Yeates 1992). If
they are deleted from consideration, the CI (M/S) becomes 4/6 = 0.67. This
figure is somewhat lower than the 0.75 that was obtained when the uninformative
characters were included in the calculation. However, the significance
of the two different values is probably only relevant when different data sets
are being compared. If different parsimony analyses are performed with the
uninformative characters included, then the values for various solutions will
be affected by this (a position contested by Bryant 1995). Most currently
available parsimony programs (e.g. NONA, PAUP) provide both values as
part of the output.
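
The two ensemble values can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions: the s values are those given in the text for the cladogram in Fig. 5.1c, the root X is counted as a terminal, and a binary character is treated as informative only when each of its states occurs in at least two terminals. The function names are hypothetical and do not correspond to the output routines of NONA or PAUP.

```python
# Matrix 1 (Table 5.1a), including the all-zero root X.
MATRIX_1 = {
    "X": [0, 0, 0, 0, 0, 0],
    "A": [0, 0, 0, 1, 1, 1],
    "B": [0, 0, 1, 1, 1, 1],
    "C": [0, 1, 1, 1, 0, 1],
    "D": [1, 1, 1, 1, 0, 0],
}
STEPS = [1, 1, 1, 1, 2, 2]  # s for characters 1-6 on Fig. 5.1c, taken from the text


def is_informative(column):
    """Assumed criterion: each state of a binary character occurs at least twice."""
    return min(column.count(0), column.count(1)) >= 2


def ensemble_ci(min_steps, steps):
    """CI = M/S, the sums of the m and s values."""
    return sum(min_steps) / sum(steps)


columns = [list(col) for col in zip(*MATRIX_1.values())]
m = [1] * len(columns)  # m = 1 for every binary character

ci_all = ensemble_ci(m, STEPS)
keep = [i for i, col in enumerate(columns) if is_informative(col)]
ci_u = ensemble_ci([m[i] for i in keep], [STEPS[i] for i in keep])

print(round(ci_all, 2))  # 0.75, all six characters
print(round(ci_u, 2))    # 0.67, characters 1 and 4 excluded
```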
Second, as the number of taxa increases, values of CI are observed to
decrease. In most cases, when the number of taxa increases, the CI will
decrease irrespective of any change in information content. However, this is a
recognized and expected property. The intention behind CI is to measure the
amount of homoplasy in a given character or data set. This it does adequately.
Farris (1989) recognized the problem of character fit (synapomorphy),
as opposed to homoplasy, and introduced the retention index, ri (and its
associated ensemble value, RI), to account for this (see below).
The third perceived problem with the consistency index is that its value can
never reach zero. All binary characters will have a best and worst value. In
Matrix 1 (Table 5.1a), character 5, for example, will have a best value ci = 1 (1
step, 100% consistent) and a worst value ci = 0.5 (2 steps, 50% consistent). As
the examples in Matrix 1 (Table 5.1a) and the cladograms in Fig. 5.1 deal with
only 4 taxa, a problem does not arise. However, consider a binary character
that is apomorphic for taxa ABC from a set of five, A-E. For the character to
be 100% consistent it would need to group A, B and C together at a single
node. If it performs less well, it may group only A + B (or A + C, or B + C)
together at one node and would then have a ci of 0.5, i.e. one step for the
group A + B and one step for C. If the character performs at its poorest it
would not group any taxa together and A, B and C would appear on separate
branches of the cladogram. In this case, ci = 0.33 (3 steps, or 33% consistent).
Thus it can be seen that binary characters will always have a positive ci
regardless of their relative informativeness; ci can never reach zero. At this
stage, it is worth remembering that ci is intended to measure amount of
homoplasy and is still a useful measure for that purpose. However, in
situations such as the above example of a binary character that is apomorphic
in three taxa from a set of five, there will be cases in which some (but not all)
of the similarity can be interpreted as synapomorphy, such as when only A +
B are grouped to the exclusion of C. To measure this, Farris (1989) proposed
the retention index (ri).
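
The bounds just described can be written out briefly. The sketch below assumes a binary character (m = 1) and uses g for the largest number of steps the character can take on any cladogram, a symbol borrowed from the definition of the retention index; the worked case is the character apomorphic in A, B and C, for which g = 3.

```latex
ci = \frac{m}{s}, \qquad m = 1, \quad 1 \le s \le g
\quad\Longrightarrow\quad 0 < \frac{1}{g} \le ci \le 1 .

% Worked case from the text: apomorphic state in A, B and C, so g = 3
s \in \{1, 2, 3\}
\quad\Longrightarrow\quad
ci \in \left\{ 1,\ \tfrac{1}{2},\ \tfrac{1}{3} \right\} \approx \{1,\ 0.5,\ 0.33\}
```

Because s is finite, the lower bound 1/g is always positive, which is why ci cannot reach zero however poorly the character fits.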
5.1.5 Retention index (ri)

The retention index (ri) is defined as (g - s)/(g - m), where m and s are the
same as for ci, and g is the greatest number of steps a character can have on
any cladogram ('any cladogram' can be understood to be the unresolved
bush). Thus for binary characters, g will equal the number of taxa showing
the less frequent state in the column. For example, in Matrix 1 (Table 5.1a),
character 2 has two '1' values and three '0' values, hence g = 2; character 3
has three '1' values and two '0' values, hence g = 2. For binary characters that
fit to one node of a cladogram, such as characters 2 and 3 in Fig. 5.1c,
ri = (2 - 1)/(2 - 1) = 1. For binary characters that do not fit to any node of a
particular cladogram, as characters 5 and 6 in Fig. 5.1c,
ri = (2 - 2)/(2 - 1) = 0.
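
As an illustration, g and ri can be computed directly from the columns of Matrix 1. The sketch assumes binary characters with m = 1, takes the s values for Fig. 5.1c from the text, and returns `None` where g = m (characters 1 and 4), because the formula then gives 0/0. The helper names `max_steps` and `retention_index` are hypothetical.

```python
# Matrix 1 (Table 5.1a), including the all-zero root X.
MATRIX_1 = {
    "X": [0, 0, 0, 0, 0, 0],
    "A": [0, 0, 0, 1, 1, 1],
    "B": [0, 0, 1, 1, 1, 1],
    "C": [0, 1, 1, 1, 0, 1],
    "D": [1, 1, 1, 1, 0, 0],
}
STEPS_FIG_5_1C = [1, 1, 1, 1, 2, 2]  # s for characters 1-6, taken from the text


def max_steps(column):
    """g for a binary character: the count of the less frequent state."""
    return min(column.count(0), column.count(1))


def retention_index(g, s, m=1):
    """ri = (g - s)/(g - m); undefined (None) when g equals m."""
    if g == m:
        return None
    return (g - s) / (g - m)


columns = [list(col) for col in zip(*MATRIX_1.values())]
g = [max_steps(col) for col in columns]
ri = [retention_index(gi, si) for gi, si in zip(g, STEPS_FIG_5_1C)]

print(g)   # [1, 2, 2, 1, 2, 2]
print(ri)  # [None, 1.0, 1.0, None, 0.0, 0.0]
```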
To appreciate the difference between what ci and ri measure, consider
Matrix 2 (Table 5.1b). Matrix 2 is identical to Matrix 1 except for the addition
of taxon E, which has exactly the same character states as taxon D. Three new
cladograms result (Fig. 5.2a-c). The total cladogram length and the individual
character ci values are the same respectively as for the cladograms without
taxon E (Fig. 5.1a-c), but the ri values differ. For character 6 in Fig. 5.2c,
ri = (3 - 2)/(3 - 1) = 0.5, rather than 0 as in Matrix 1 (Table 5.1a, Fig. 5.1c),
because the character now groups D + E together. Character 6 shows the 0
state in D + E, which is interpreted as synapomorphic. (The situation is the
same with the cladogram in Fig. 5.2b.) Thus, although the amount of
homoplasy for character 6 remains unchanged (ci = 0.5), the evidential value
of this character differs depending on the cladogram (ri = 0 or 0.5).
Consider a further example illustrated in Fig. 5.3 (after Goloboff 1991).
Character 1 is present in taxa D and H and can fit a cladogram in two
possible ways: either the group D + H is supported (ci = 1, ri = 1) or it is not
(ci = 0.5, ri = 0). On the cladogram in Fig. 5.3, character 1 fits the cladogram
as poorly as it possibly could. It has a ci of 0.5 and because none of the
similarity is interpretable as synapomorphy, it has an ri of 0. Character 2 is
present in taxa A, B, C, F and I, allowing a number of possible groups to be
supported (ABCF + I, ABC + FI, ABF + CI, and so on, up to the entire
group ABCFI). There is a reasonable possibility that a proportion of the
similarity among the taxa A, B, C, F and I will be interpreted as synapomorphy.
Character 2 fits the cladogram in Fig. 5.3 poorly (3 times) and has a ci of

Fig. 5.2 a-c. Analysis of the data in Table 5.1b yields three equally most
parsimonious cladograms. Only characters 2, 3, 5 and 6 are mapped. Character gain
signified by '+', character loss by '-'.

0.33. However, it could have a worse fit as there are five taxa with the
apomorphic value and hence its poorest ci would be 1/5 = 0.2. For the
cladogram in Fig. 5.3, character 2 does group some taxa together (A + B + C).
Hence some of the similarity is interpretable as synapomorphy and its
ri = 0.50.

Fig. 5.3 Cladogram for nine taxa, A-I, with the two characters in Table 5.2 mapped.
Character 1 appears twice (independently in D and H) and is not synapomorphic.
Character 2 appears three times: twice independently in F and I, but also uniting the
members of the group ABC. Hence some of the similarity is synapomorphic. (After
Goloboff 1991.)

Table 5.2 Example demonstrating the use of the retention index with nine taxa
(A-I) and two characters (C1 and C2). Abbreviations for s, m, g, ci and ri as Table
5.1.

                 Taxa
      A  B  C  D  E  F  G  H  I     s   m   g   ci (m/s)   ri ((g - s)/(g - m))
C1    0  0  0  1  0  0  0  1  0     2   1   2   0.5        (2 - 2)/(2 - 1) = 0
C2    1  1  1  0  0  1  0  0  1     3   1   5   0.33       (5 - 3)/(5 - 1) = 0.5

Character 2 has more homoplasy than character 1, which is reflected in
their ci values (0.33 and 0.5 respectively). However, character 2 does
contribute some synapomorphy to the cladogram and hence contains evidence of
grouping, whereas character 1 does not. This is reflected in their respective ri
values of 0.5 and 0 (see summary in Table 5.2).
5.1.6 Ensemble retention index (RI)

As for the CI, the ensemble retention index (RI) can be found by using the
summed values of m, s and g (M, S and G respectively). For Matrix 1 (Table
5.1a), RI = (G - S)/(G - M) = (10 - 8)/(10 - 6) = 0.50, while for Matrix 2 (Table
5.1b), RI = (13 - 8)/(13 - 6) = 0.71. Matrix 2 has an RI value that is a little
better (0.71) than that of Matrix 1 (0.50). This is due to more of the similarity
being interpreted as synapomorphy. However, as the level of homoplasy is
unchanged, so is the CI.
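
The comparison of the two matrices can be reproduced as follows. The sketch is illustrative: it assumes binary characters (so M is simply the number of characters), takes the cladogram length S = 8 for both matrices from the text, and uses a hypothetical helper `ensemble_ri`.

```python
# Matrices 1 and 2 (Table 5.1); Matrix 2 adds taxon E, identical to D.
MATRIX_1 = {
    "X": [0, 0, 0, 0, 0, 0],
    "A": [0, 0, 0, 1, 1, 1],
    "B": [0, 0, 1, 1, 1, 1],
    "C": [0, 1, 1, 1, 0, 1],
    "D": [1, 1, 1, 1, 0, 0],
}
MATRIX_2 = dict(MATRIX_1, E=[1, 1, 1, 1, 0, 0])

LENGTH = 8  # S, the cladogram length for both matrices, taken from the text


def ensemble_ri(matrix, total_steps):
    """RI = (G - S)/(G - M) for a matrix of variable binary characters."""
    columns = list(zip(*matrix.values()))
    big_g = sum(min(col.count(0), col.count(1)) for col in columns)
    big_m = len(columns)  # m = 1 for each binary character
    return (big_g - total_steps) / (big_g - big_m)


def ensemble_ci(matrix, total_steps):
    """CI = M/S under the same assumptions."""
    return len(next(iter(matrix.values()))) / total_steps


print(round(ensemble_ri(MATRIX_1, LENGTH), 2))  # 0.5
print(round(ensemble_ri(MATRIX_2, LENGTH), 2))  # 0.71
print(ensemble_ci(MATRIX_1, LENGTH), ensemble_ci(MATRIX_2, LENGTH))  # 0.75 0.75
```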
5.2 CHARACTER WEIGHTING

The idea that characters must be or need to be weighted is persistent in
cladistics. The first approaches to character weighting were largely subjective.
For morphological characters, criteria such as their structural complexity,
constancy among taxa, the 'Darwin principle' (characters of low adaptive
value should be weighted higher) and character correlation were considered
appropriate (Mayr 1969). Pheneticists initially eschewed weighting altogether,
while others suggested a variety of approaches not too dissimilar to those
outlined by Mayr (1969).

In the cladistic literature, the first serious attempt to incorporate character
weights was proposed by Farris (1969). However, they were not explored in
detail until the discovery that analysis of a given matrix might result in more
than one most parsimonious cladogram. The question was then proposed:
given a suite of equally most parsimonious cladograms from one data set, is
there a rational way to choose one from among the many? Two possible
approaches have been suggested:
• As all the equally most parsimonious cladograms are equally 'true', one
should not seek to choose among them but simply represent the
common information.
• The analysis could be repeated incorporating some justified approach to
character weighting, in the hope that one cladogram (or at least a smaller
subset) would emerge as better supported.

The use of consensus trees for summarizing information from suites of
cladograms is discussed fully elsewhere (see Chapter 7). Of significance here
is a distinction between how consensus trees can be and have been used.
Conventionally, the legitimate use of consensus techniques is for summarizing
information from a suite of cladograms derived from the same data set,
rather than from suites of cladograms derived from different data sets
(although this topic is by no means resolved, and relates to the 'taxonomic'
versus 'character' congruence debate; see Chapter 8). However, a consensus
tree is generally less resolved than any of the equally most parsimonious
cladograms from which it is derived and is therefore considered a poorer
summary of the data than any of these original cladograms. For this reason,
Carpenter (1988) rejected consensus approaches as the final summary and
suggested that prior to consensus tree construction, exploration of character
weighting might prove useful.
5.2.1 Types of character weighting

Weighting of characters has been subdivided into a priori and a posteriori
approaches, that is, weighting can be applied before or after cladogram
construction. Alternative terms have been proposed: 'hypothesis dependent'
and 'hypothesis independent' (Sharkey 1989), and 'tree dependent' and 'tree
independent' (Sharkey 1993). However, a priori weighting is hypothesis and
tree independent, while a posteriori weighting is hypothesis and tree dependent.
The latter terms are therefore redundant and we will use a priori and a
posteriori throughout this discussion.

For a priori weighting, there are two different approaches: character
analysis and character compatibility. There are also two approaches to a
posteriori weighting, both based on the notions of cladistic consistency and
character reliability: successive approximations weighting (Farris 1969, 1989,
hereafter called 'successive weighting'); and implied weighting (Goloboff
1993). While there are other approaches to weighting characters, these have
the greatest relevance to cladistic practice.
5.2.2 A priori weighting

Character analysis
Character analysis refers to the re-examination of original data in an effort to
discover whether any mistakes have been made, such as badly formulated
hypotheses of primary homology (similarity) or inappropriate coding. The first
type of error, mistaken judgements of similarity, closely resembles Hennig's
primary suggestion for resolving incongruence, that of checking and
rechecking characters. As a general rule, it must always be considered appropriate
to weed out poor delimitation of characters. As such, preliminary cladistic
analyses should indicate those characters that perform poorly (defined below)
and act as a guide to those that may require re-evaluation. However, even in
the most careful studies some character conflict will remain. A good example
is DNA sequence data, in which only so much re-sequencing can be done
before one must conclude that the conflict in the data is a fact. Therefore,
while it should be unreservedly stressed that examination (and
re-examination) of specimens (or sequences) is of vital importance, this will
probably not eliminate all character conflict.
Below we treat morphological and molecular data separately from the
point of view of character analysis. This should not be construed as recognition
of different classes of data. On the contrary, there is much that is similar
in the analysis of molecular and morphological data. However, molecular
data have a fixed number of 'attributes': nucleotide sequences are represented
by only four characters (the bases, plus possibly a fifth for 'gaps') and
protein sequences by 20 (the amino acids). Hence, regularities might be more
easily discovered than in morphological data, as it is possible to calculate
accurately the possible permutations of the characters and differential
weighting of any empirical 'imbalances' deemed worthwhile.
Morphological data
Neff (1986) presented the most complete analytical method for character
analysis to date, broadening the concept to include factors other than
re-examination of morphological comparisons. Neff divided the process of
phylogeny reconstruction into two distinct parts: character analysis and
cladistic analysis (an idea reinvented by Brower and Schawaroch 1996).
According to Neff, character analysis involves two steps rather than the one
that is generally assumed (character delimitation). Step 1 is synonymous with
character delimitation and includes the initial investigation of specimens,
making of observations and identification of features as possible homologues.
As it pertains to a particular hypothesis, this step is summarized as: 'Feature
X in taxon A is the same as feature X in taxon B'. Neff's step 1 is analogous
to Patterson's (1982) concept of similarity and Rieppel's (1988) idea of
'topographic correspondence' being the initial criterion for homology
determination. Neff's second step consists of constructing a hierarchy of characters
and is also phrased with reference to particular hypotheses: 'Feature X is
more general than, and includes, the more specialized feature Z'. Step 2 thus
concerns polarity estimation (this subject was considered in detail in Chapter
3). For the purposes of this discussion, polarity decisions should be understood
to be conclusions that are derived from a cladogram and that cladogram
construction does not require a priori polarization (see Chapter 3). Yet
there is much of [...] indeed, as Carpenter (1988: 292)
noted: 'Actually Neff's paper [1986] merely argues for careful homology
decisions, which should be given'.
Molecular data
Judgemental mistakes of similarity are a more than reasonable source of
error when considering morphological data. This is as it should be. Examination
of character conflict leads to re-examination of characters, which, in
turn, leads to greater understanding of the organisms and improved classifications.
This is the essence of systematics. Yet when dealing with sequence data
(particularly nucleotide sequences), one is usually faced with the conclusion
that the data do, indeed, contain copious 'real' conflict. This is because it is
impossible to examine a nucleotide base in any further detail; its similarity is
as exact as can be. However, the regularity that defines sequence data may be
used to implement certain types of a priori weighting, usually on the assumption
that the conflict is 'caused' by known processes (but see below). The
various types of weighting possible are listed as follows (Hillis et al. 1993);
perhaps more will yet be discovered.
A priori weighting:
  Uniform weighting (all bases given equal weight)
  Non-uniform weighting (bases assigned different weights)
    Across positions (structural-functional differences)
      Codon positions (selective weighting of first, second and third
        positions in relation to redundancy of genetic code)
      Stems and loops (selective weighting of loops versus stems in
        secondary folding structure (RNAs))
    Within position (mutational bias)
      Transversions versus transitions (weighting of transitional bias)
      Relative substitution base composition (12 possible substitutions
        weighted according to observed or expected frequencies)
      Synonymous versus non-synonymous change (change of amino
        acids in coding regions)
A posteriori weighting:
  Successive approximations weighting (weights according to levels of
    homoplasy)
  Dynamic weighting (weights according to levels of homoplasy, includes
    within and across positions)
Weighting nucleotide sites can be complex, depending on what is considered
to be significant in the data. The 'default' approach is uniform weighting,
in which all bases are given equal weight. Non-uniform weighting selects
some differential that can be assessed from the data prior to analysis. This
can involve across sequence and within sequence positions or combinations of
both.
Across sequence position weighting involves known structural or functional
differences in a particular molecular sequence. The best known example
relates to the degeneracy of the code. The synthesis of proteins requires the
genetic information to be translated into the correct amino acids, mediated
by transfer RNA (tRNA). Each tRNA recognizes a particular triplet of
nucleotides (codon) which represents a particular amino acid. As there are
four different nucleotides and each codon is a triplet, there are 64 possible
codons. Of these, 61 code for amino acids, while the remaining three are
'nonsense' or 'stop' codons, which bring translation to a halt. With only 20
amino acids, some of the 61 codon triplets must code for identical amino
acids. In this sense the code is degenerate. Different codons that represent
the same amino acid are called synonymous. Therefore, a common method to
weight coding sequence data is to weight differentially the first, second and
third positions of the codon relative to particular amino acids. For example,
the amino acid proline is coded by four codons: CCU, CCC, CCA, CCG. Any
substitution in the third position will not result in a change of amino acid and
all third position changes for proline are synonymous. Likewise, leucine is
coded by six codons: UUA, UUG, CUU, CUC, CUA and CUG. If a codon
has the form CUX (where X stands for any base), the amino acid will always
be leucine, and all third position changes are synonymous. However, the
codons UUA and UUG also code for leucine. If the codon UUX has a third
position substitution, then codons UUU and UUC will result in phenylalanine.
Thus, for leucine and phenylalanine, not all third position changes are
synonymous. Some are non-synonymous in that some substitutions result in
different amino acids. It is possible to calculate all the different kinds of
changes that can occur in coding sequences. For instance, as there are 61
sense codons there are 549 possible nucleotide substitutions. Thus, it is
possible to calculate all of the different kinds of possible amino acid coding
and the relative frequency of the change for each codon position. Such
calculations show that 70% of third position changes are synonymous, all
substitutions at the second position are non-synonymous, and 96% of
substitutions at the first position are non-synonymous. It thus appears rational to
downweight or even ignore third position changes. The most extreme form of
this type of weighting is to use the amino acid sequence as the primary data
and not the nucleotide sequence (e.g. some rbcL studies). While statistically
the frequency of changes expected at individual codon positions is clear,
empirical examination of the effects of differential weighting may reveal
different kinds of [...] ([...] and Carpenter 1996). It seems

Fig. 5.4 Secondary structure in the 5S rRNA molecule of Pedinomonas minor.
Paired sites are those which form Watson-Crick base pairs in the stem regions of a
molecule and unpaired sites are those that occur in the loop regions. (After
Devereux et al. 1990.)

wise to examine each molecule individually rather than attempt to extrapolate
a particular situation into a generality pertaining to all molecules.
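
The counts quoted for the genetic code can be checked by enumeration. The sketch below assumes the standard genetic code, written as the usual 64-letter amino acid string with bases in the order U, C, A, G; changes that produce a stop codon are counted as non-synonymous. The function name `substitution_counts` is a hypothetical helper.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def substitution_counts(code=CODE):
    """Count single-base changes from sense codons, split by codon position.

    Returns (totals, synonymous): two lists indexed by codon position 0-2.
    """
    totals = [0, 0, 0]
    synonymous = [0, 0, 0]
    for codon, aa in code.items():
        if aa == "*":  # only sense codons are used as starting points
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                totals[pos] += 1
                if code[mutant] == aa:
                    synonymous[pos] += 1
    return totals, synonymous


sense = [codon for codon, aa in CODE.items() if aa != "*"]
totals, synonymous = substitution_counts()

print(len(sense), sum(totals))  # 61 549
for pos in range(3):
    share = synonymous[pos] / totals[pos]
    print(f"position {pos + 1}: {share:.0%} synonymous")
```

Under these assumptions about 4% of first position changes, none of the second position changes and about 69% of third position changes are synonymous, in line with the rounded figures given above.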
With respect to non-coding genes, different factors can be taken into
account. For instance, every molecular sequence has both a secondary and
tertiary folding structure such that some bases are placed adjacent to each
other in stem regions and others are separate in loop regions (Fig. 5.4).
Nucleotide bases that appear opposite each other in stem regions are seen as
dependent, because if a substitution occurs in one position, the opposite base
may also have to change to maintain the overall structure. In contrast, bases
in the loop regions have fewer such constraints and may be free to change to
any other nucleotide. Therefore, it might seem useful to weight these positions
accordingly. However, disagreements exist with respect to the informativeness
of the stems. For instance,
Wheeler and Honeycutt

Fig. 5.5 Classification of nucleotide substitutions (panels: transitions and
transversions among the bases A, G, C and T). Each arrow represents two options
for direction of change. Transitions, of which there are four possibilities,
substitute one purine (A or G) for another or one pyrimidine (C or T) for another.
Transversions, of which there are eight possibilities, substitute a purine for a
pyrimidine or vice versa.

(1988) suggested that the stem regions were uninformative (and effectively
gave them zero weight), while Dixon and Hillis (1993) came to the opposite
conclusion. They suggested that the informativeness of the stem and loop
regions may be unique to each molecule or to particular organisms (Wheeler
and Honeycutt examined 5S and 5.8S rRNA genes in insects, while Dixon and
Hillis used 28S rRNA in vertebrates). Such discordant conclusions suggest
that even if it appears legitimate to implement this type of weighting scheme,
it may not be one that can be generalized to all organisms and all molecules.
Once again, the 'imbalances' need to be investigated with reference to each
particular problem and their utility examined against the conclusions (a
cladogram).
Within-sequence position weighting utilizes several different kinds of
proposed mutational bias. The first to be explored was the relative frequency of
transitions versus transversions. There are two kinds of nucleotides: adenine
(A) and guanine (G) are purines, while cytosine (C) and thymine (T) are
pyrimidines. Transitions substitute one purine for another or one pyrimidine
for another, of which there are four possibilities (Fig. 5.5). Transversions are
substitutions of a pyrimidine by a purine or vice versa, of which there are
eight possibilities (Fig. 5.5). By chance alone, one would expect to encounter
more transversions than transitions. However, in many examples, there are
significantly more transitions than transversions. Such imbalances can be
used to weight the data so as to prevent favouring one kind of substitution
over another. For example, Miyamoto and Boyle (1989) discovered that
'better results in terms of unambiguous resolution ..., congruence ..., and
consistency ... are expected from analyses of transversions alone, rather than
from combinations of transitions, transversions, and gap events ...'
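
The classification of substitutions, and the expectation of twice as many transversions as transitions by chance, can be illustrated with a short sketch. The function names `classify` and `tally` and the two example sequences are hypothetical; the tally simply compares two aligned sequences site by site and says nothing about the direction of change.

```python
from itertools import permutations

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify(a, b):
    """Label a difference between two bases as a transition or a transversion."""
    pair = {a, b}
    if a == b or not pair <= (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"      # purine <-> purine or pyrimidine <-> pyrimidine
    return "transversion"        # purine <-> pyrimidine


def tally(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    counts = {"transition": 0, "transversion": 0}
    for a, b in zip(seq1, seq2):
        if a != b:
            counts[classify(a, b)] += 1
    return counts


# All 12 directed substitutions among the four bases.
all_directed = [classify(a, b) for a, b in permutations("ACGT", 2)]
print(all_directed.count("transition"), all_directed.count("transversion"))  # 4 8

# Two made-up aligned sequences, for illustration only.
print(tally("ACGTAC", "GCTTAT"))  # {'transition': 2, 'transversion': 1}
```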
Of further significance are the relative substitution frequencies of all 12
possible substitutions. These can be weighted according to their frequency in
any particular data set. There are 16 possible substitutions, of which

Table 5.3 The 16 possible substitutions among the four bases with
frequencies a-l; frequencies w-z represent the four 'substitutions'
of one base by the same base, which are thus undetectable and are
disregarded.

        A    C    G    T
A       w    a    b    c
C       d    x    e    f
G       g    h    y    i
T       j    k    l    z

four are substitutions of one base with one that is identical and hence are
undetectable and disregarded (Table 5.3, w-z). Of the 12 observable changes,
different frequency values are allowed for each direction of change, such that
A → C is d and C → A is a, where d and a may be different. These values
can be calculated from observed frequencies and compared against expected
frequencies. The values can be used in a variety of ways, including
step-matrices (see Chapter 4).
It is possible to combine within and across sequence weighting regimes, as
in dynamic weighting (Williams and Fitch 1990, Fitch and Ye 1990), which is
based, in part, on 'successive weighting' (see below). In successive weighting,
weights are assigned to characters according to their levels of homoplasy on
resulting cladograms. Dynamic weighting applies the same strategy but also includes
information on the relative frequency of the observed character change. As
an example, Marshall (1992) used dynamic weighting to re-analyse small
subunit (SSU) rRNA sequences for amniotes. In the original analysis, Hedges
et al. (1990) found support for a sister-group relationship between birds and
mammals to the exclusion of crocodiles (Fig. 5.6b; a solution favoured by
some morphologists, e.g. Gardiner 1993). However, inspection of the data
revealed considerable substitution bias. In particular, there was an
over-representation of T-C substitutions and a significant under-representation of
A-T and T-A substitutions. Marshall noted that a large number of sites with
T-C substitutions supported the bird-mammal relationship. Using dynamic
weighting, Marshall discovered instead that the data supported a
bird-crocodile sister-relationship (Fig. 5.6c, differing slightly from the
traditional 'palaeontological' cladogram, Fig. 5.6a). These results may only attain
significance with respect to a wider analysis of all molecular and morphological
(including palaeontological) data (as in the comprehensive study by
Eernisse and Kluge 1993, which suggested little overall support for the
bird-mammal relationship).
It is often taken as fact that it is more reasonable to apply a priori
weighting schemes to sequence data. Many, if not all, such weighting schemes
are conceived in terms of kinds of substitution. However, the notion of
substitution (a process) is itself derived from the data. Comparison of two (or
Fig. 5.6 Three cladograms depicting possible relationships among amniotes. (a) The
palaeontological cladogram. (b) The cladogram derived from the unweighted 18S
rRNA data of Hedges et al. (1990). (c) The cladogram derived from the weighted 18S
rRNA data of Marshall (1992).
more) sequences will either reveal identity or not at particular sites. The
difference is empirical (derived from a comparison). Calling the differences
between two sites 'substitutions' may simply be a label based upon presumed
understanding of the 'cause' of the difference. Yet the actual difference and
its cause can be seen as separate. In this sense, weighting is
arrived at by observation of differences not their 'cause'. This may seem to be
a minor semantic point, but such arguments relate to the larger issue of
homology, its discovery and its 'cause'. In the cladistic view, the 'cause' of
homology is irrelevant (and unnecessary) to its discovery, which is the result
of analysis and character congruence. Bearing this in mind, with sequence
data one may view the kind of differences as empirical observations to be
seen and tested in the light of subsequent analysis.
Compatibility analysis
Compatibility was first suggested as a method of analysis rather than a
method of weighting. As such, compatibility was not favourably received and
is little used these days. However, several authors have suggested that when
used as a weighting scheme, it may be of some value (Penny and Hendy 1986,
Sharkey 1994). Some of these ideas are reviewed briefly below.
Sharkey (1994) presented a method of character weighting that used
compatibility, although similar approaches had been presented previously by
Penny and Hendy (1986) and Sharkey (1989) (but see Wilkinson 1994).
However, Sharkey had some misgivings about these earlier attempts, hence
we will consider only his most recent method.
Sharkey (1994) began his discussion by citing Farris' (1971) distinction
between 'congruent' and 'compatible' characters. Sharkey's interpretation
suggested that congruent characters correlate with respect to particular
phylogenetic hypotheses, while compatible characters are correlated with
each other in the data set. Thus, congruent characters are determined after a
cladogram has been constructed (a posteriori) and compatible characters are
judged prior to cladogram construction (a priori). Sharkey devised a simple
example to demonstrate the use of compatibility analysis (Table 5.4), in which
13 binary characters are scored for eight taxa (and an all-zero root).
Characters 1–12 are all perfectly compatible with each other. Character 13 is
incompatible with every other character (except character 1, which is
uninformative) and therefore should be a priori considered the weakest in the data
Table 5.4 Character matrix for 'compatibility' weighting. Characters 1–13 (columns),
taxa A–H with O as an all-zero root.
1 2 3 4 5 6 7 8 9 10 11 12 13
O 0 0 0 0 0 0 0 0 0 0 0 0 0
A 1 1 1 1 1 1 1 1 0
B 1 1 1 1 1 1 1 1 1 1
C 1 1 1 1 1 1 1 1 0
D 1 1 1 1 1 1 1 1 1 1 1 1 1
E 1 1 1 1 1 0 0 0 0 0 0 0 0
F 1 1 1 0 0 0 0 0 0 0 0 0 0
G 1 1 0 0 0 0 0 0 0 0 0 0 1
H 1 0 0 0 0 0 0 0 0 0 0 0 1
Fig. 5.7 The two equally most parsimonious cladograms derived from the data in
Table 5.4. (a) On this cladogram, character 2 fits perfectly with a single step, while
character 13 fits with four steps. (b) In contrast, on this cladogram, character 13 fits
with only three steps, although character 2 now fits less than perfectly with two
steps.
set. Parsimony analysis yields two cladograms (Fig. 5.7). The first cladogram
(Fig. 5.7a) accounts for character 13 with four occurrences (branches leading
to taxa B, D, G, and H). This is its worst possible performance on any
cladogram. The second cladogram (Fig. 5.7b) accounts for character 13 with
three occurrences (the branch leading to G + H, and branches leading to
B and D). Character 13 fits the second cladogram better than the first by one
step and provides additional support for the group G + H from character 3
(as a 'reversal'). Both cladograms are equally parsimonious. In this case, the
first cladogram (Fig. 5.7a) is preferred because more characters are
compatible. As Sharkey pointed out, this is not based upon any a posteriori
consideration of fit, but on the a priori consideration of compatible characters 1–12. It
is worth noting that the first cladogram would also be selected by successive
weighting (see below).
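
The a priori judgement described here rests on pairwise compatibility. A minimal sketch of one common test for binary characters polarized by an all-zero root follows: two characters are treated as compatible when the sets of taxa carrying their derived states are nested or disjoint. The small matrix and all function names are hypothetical and chosen only for illustration; the sketch is not Sharkey's weighting procedure.

```python
from itertools import combinations

def derived_set(matrix, char):
    """Taxa scored 1 for the given character (root assumed all-zero)."""
    return {taxon for taxon, row in matrix.items() if row[char] == 1}

def compatible(matrix, i, j):
    """Two rooted binary characters are compatible if their derived sets
    are nested or disjoint."""
    a, b = derived_set(matrix, i), derived_set(matrix, j)
    return a <= b or b <= a or a.isdisjoint(b)

def compatibility_counts(matrix):
    """Number of other characters with which each character is compatible."""
    n_chars = len(next(iter(matrix.values())))
    counts = [0] * n_chars
    for i, j in combinations(range(n_chars), 2):
        if compatible(matrix, i, j):
            counts[i] += 1
            counts[j] += 1
    return counts

# Hypothetical matrix: four taxa, four characters (index 3 conflicts).
example = {
    "A": [1, 1, 1, 0],
    "B": [1, 1, 1, 1],
    "C": [1, 1, 0, 0],
    "D": [1, 0, 0, 1],
}
counts = compatibility_counts(example)
```

In this toy matrix the last character is compatible only with the first, which is present in every taxon, much as character 13 of Table 5.4 is compatible only with the uninformative character 1.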
To measure the relative amount of compatibility in any data set, Sharkey
proposed the unit discriminate compatibility measure (UDCM) of a character,
described as 'the complement of the probability of a derived character
state being nested with another derived character state or the probability of a
derived character state exclusive of another derived character state,
depending on the observed character comparison' (Sharkey 1994).
The UDCM is used to weight characters on the basis of their
overall compatibility. Sharkey suggested that once the weights had been
calculated they could be used in parsimony analyses. Sharkey's objective was
to assist in choosing among competing trees. However, the idea has not been
tested in detail and it remains to be seen whether, like compatibility for
cladogram construction, such weighting schemes end up discarding (by
downweighting) many characters.
5.2.3 A posteriori weighting

A posteriori weighting refers to methods that derive weights after cladogram
construction. The idea was first suggested by Farris (1969) in relation to the
concept of cladistic consistency, which is based upon measuring the amount
of discordance that individual characters show on a particular cladogram, and
is the basis for 'successive weighting' and 'implied weighting'. Discordance is
another way of describing levels of homoplasy, and homoplasy is usually
reflected by the number of extra steps required for characters to fit a
cladogram. Measures of discordance frequently used in cladistic analyses are
the consistency index (ci), retention index (ri) and the rescaled consistency
index (rc).
Cladistic consistency

The essence of cladistic consistency can be grasped from a simple example
taken from Farris (1989). Consider the fit of two characters, 1 and 2, on two
equally parsimonious cladograms, X and Y. Character 1 is a binary character
which, if it fits any cladogram in the worst possible way, will have only two
steps. Character 2 is also a binary character which, if it fits any cladogram in
the worst possible way, will have 15 steps. Character 1 fits cladogram X with 1
step (its best) and fits cladogram Y with 2 steps (its worst). Character 2 fits
cladogram X with 15 steps (its worst) and fits cladogram Y with 14 steps
(nearly its worst). Both cladograms are equally parsimonious and only one
step is saved in each character. One may reasonably ask, which of the two
cladograms, if either, is to be preferred? It would seem intuitively obvious
that cladogram X is the better of the two. The fit of character 1 is perfect (1 step,
character 1 is a synapomorphy) and while character 2 has a worst-case fit of
15 steps, the improvement of a single step achieved on cladogram Y is not
very impressive. Character 2 is clearly less informative than character 1 with
reference to both cladograms X and Y. Therefore, while one step is saved on
each cladogram, it seems preferable to use those characters which have an
overall better fit. In other words, characters should get the weight they
deserve. Such ideas suggest that some characters are cladistically more
reliable than others. Successive weighting incorporates this idea, although its
implementation was not widely available until the release of the parsimony
program, Hennig86. The implementation of successive weighting in Hennig86
differs from that originally outlined by Farris (1969) and is described below.
Successive weighting
First, we need to recapitulate briefly the measures of homoplasy. When
characters fit a cladogram perfectly (that is, for binary characters, to a single
node only) they are 100% consistent with that cladogram. When characters
do not fit a cladogram perfectly (for binary characters, to more than one
node) they are less than 100% consistent. Characters that do not fit to one
node of a cladogram exhibit homoplasy, which is discordance with a particular
cladogram and is measured by the consistency index, ci. Thus, the cladistic
consistency of a character is potentially measurable relative to its performance
on a particular cladogram. However, although a character may show
homoplasy on a particular cladogram, this does not mean that all of its
similarity need be uninformative. The amount of similarity interpreted as
synapomorphy is measured by the retention index, ri. The weight of a
character can be seen as a function of its fit to a cladogram and requires
consideration of both homoplasy and synapomorphy. Both ci and ri can be
used to estimate weights for characters that have a direct bearing on their
evidential value with respect to the recovered cladograms. Farris (1989)
suggested rescaling ci using the ri values, to give the rescaled consistency
index (rc): essentially this is the product of ci and ri. Thus, characters with no
similarity interpreted as synapomorphy (ri = 0) will be disregarded, irrespective
of their level of homoplasy. Those that have some similarity interpreted
as synapomorphy relative to a cladogram will have a weight proportional to
the amount of homoplasy.
Referring back to Matrix 1 (Table 5.1a) and the cladogram in Fig. 5.1c, four
characters fit this cladogram perfectly with ci = 1 and ri = 1 (characters 1–4,
Table 5.5). Their 'deserved' weights are calculated using rc, giving each a
value of 1 (ci × ri), which is then scaled by Hennig86 between 0 and 10 to give
final weights of 10. (PAUP scales the weights between 0 and 1000.) Characters
5 and 6 fit the cladogram with two steps and thus ci = 0.5. However, none
of their similarity is interpreted as synapomorphy and so for both characters
ri = 0. Consequently, rc = 0 (ci × ri). They can thus play no part in determining
the topology of the cladogram. As the weights can affect both the number
and topology of the resulting cladograms, it is necessary to ensure that the
weights stabilize. This is achieved by repeating the reweighting procedure
until the weights assigned to each character in two successive iterations are
identical; hence the name, successive weighting.
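
The arithmetic in this paragraph can be sketched as follows, using the m, l (called s here) and g values listed in Table 5.5. The function names are hypothetical, and the handling of uninformative characters (g = m, scored ri = 1 when the fit is perfect, as in Table 5.5) is an assumption made so that the ratio is defined; the sketch shows a single reweighting step, not the iteration to stability.

```python
def consistency_index(m, s):
    """ci = m / s: minimum possible steps over observed steps."""
    return m / s

def retention_index(m, s, g):
    """ri = (g - s) / (g - m); g is the greatest possible number of steps."""
    if g == m:
        # Uninformative character: the ratio is undefined. Scoring it 1 when
        # the fit is perfect follows Table 5.5 and is an assumption here.
        return 1.0 if s == m else 0.0
    return (g - s) / (g - m)

def rescaled_weight(m, s, g, scale=10):
    """rc = ci * ri, rescaled to the range 0..scale."""
    return consistency_index(m, s) * retention_index(m, s, g) * scale

# (m, s, g) for characters 1-6, as read from Table 5.5.
characters = [(1, 1, 1), (1, 1, 2), (1, 1, 2), (1, 1, 1), (1, 2, 2), (1, 2, 2)]
weights = [rescaled_weight(m, s, g) for m, s, g in characters]
```

Passing scale=1000 gives the 0 to 1000 range that the text attributes to PAUP.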
Other parsimony programs, such as PAUP, can implement other indices or
combinations of indices to assign weights, but the use of such measures is still
subject to some debate. However, the principles of successive weighting
remain the same.
One major misconception concerning successive weighting must be laid
firmly to rest. It is widely believed that successive weighting can be used to
reduce the number of equally most parsimonious cladograms found for a
given data set and that this is the purpose of the procedure. This
Table 5.5 Matrix for successive and implied weighting. Characters 1–6, taxa A–D
(extracted from Matrix 1, Table 5.1a). l = length, m = minimum possible steps,
g = minimum steps on a bush, ci = consistency index, ri = retention index, sw =
successive weighting using rescaled ci, iw = implied weighting using (K/(K + ESi)),
where K = constant and ESi = extra steps. In this example, K = 3.
                    Characters
Taxa                1    2    3    4    5    6
A                   0    0    0    1    1
B                   0    0    1    1    1    1
C                   0    1    1    1    0    1
D                   1    1    1    1    0    0
l                   1    1    1    1    2    2
m                   1    1    1    1    1    1
g                   1    2    2    1    2    2
ci                  1    1    1    1    0.5  0.5
ri                  1    1    1    1    0    0
sw (ri × ci)        10   10   10   10   0    0
iw (K/(K + ESi))    0    10   10   0    7.5  7.5
it most certainly is not. Successive weighting is a method for selecting
characters according to their consistency on a given set of cladograms. It is
not a method for choosing among those cladograms. If the application of
successive weighting results in a smaller number of equally most parsimonious
cladograms, then that may be considered a fortunate side-effect. But
this need not necessarily happen. If many characters have low consistencies,
and thus receive low weights, then the number of equally most parsimonious
solutions may actually increase, sometimes quite dramatically. Even if the
number of minimum length cladograms does decrease, they still need not be
a subset of the original set. Successive weighting effectively creates a new
data set, in which some characters (the more consistent) are replicated more
times than others (the less consistent). We should not therefore be surprised
if successive weighting produces more and different topologies from those
with which we started. It also follows that if we are to be consistent in
applying successive weighting, we should do so even if we initially obtain only
a single most parsimonious cladogram. Furthermore, we should not consider
successive weighting to have 'failed' if this one cladogram is then replaced
with over 1000 equally most parsimonious solutions, because this result
actually implies that the data supporting the original solution were not very
consistent and the hypothesis of relationships was not at all strongly
supported.
Implied weighting
With respect to homoplasy, extra steps and fitting characters to cladograms,
Fig. 5.8 Graphs depicting the three kinds of fitting function used to adjust for
relative cladistic consistency. (a) Linear. (b) Concave. (c) Convex. (After Goloboff
1993.)
Farris (1969) discussed three forms of fitting function that could be used to
adjust for relative cladistic consistency: linear, concave, and convex (Fig. 5.8).
For linear fit (Fig. 5.8a), the cladogram with the overall shortest length is
considered optimal. The problem with this approach is that the relative values
of the characters are ignored. Linear fitting is used when equal (uniform)
weighting is applied, that is, all steps count equally. It is therefore equivalent
to the 'default' option of most parsimony programs. One conclusion that can
be drawn from uniform weighting is that the reliability of the weights (and
hence the characters) is set prior to the analysis: they are all equal. Yet it is
clear from many, if not most, analyses that there will always be some
characters that behave well and others that behave poorly. The implication is
that they do not all contribute equal kinds of information.
For concave fit (Fig. 5.8b), a non-linear relationship reflects how well each
character performs on a relative scale. To return to the example given above,
which considered two binary characters (1 and 2) and two different but
equally parsimonious cladograms (X and Y), both characters differed by a
single step on the competing cladograms. However, character 1 has maximum
and minimum observed steps of 2 and 1 on cladograms Y and X respectively,
while character 2 has maximum and minimum observed steps of 15 and 14 on
cladograms X and Y respectively. Intuitively, character 1 should receive
greater weight than character 2. The proportional difference can be assessed
by use of 'extra steps'. For two cladograms on which a character has s1 and s2
steps respectively (with s1 having the larger value), this proportional
difference is given by (s1 − s2)/(s1 s2). In this example, character 1 has a value of
0.5 and character 2 has a value of 0.005. In short, concave fit gives preference
to those characters with least homoplasy.
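
The two values quoted for this example can be checked with a few lines of exact arithmetic. The function name is hypothetical; the formula is the proportional difference (s1 − s2)/(s1 s2) as it is read from the passage above.

```python
from fractions import Fraction

def proportional_difference(s1, s2):
    """(s1 - s2) / (s1 * s2) for step counts on two cladograms, s1 >= s2 > 0."""
    if s2 <= 0 or s1 < s2:
        raise ValueError("expected s1 >= s2 > 0")
    return Fraction(s1 - s2, s1 * s2)

char1 = proportional_difference(2, 1)    # character 1: 2 steps on Y, 1 on X
char2 = proportional_difference(15, 14)  # character 2: 15 steps on X, 14 on Y
```

Character 1 gives exactly 1/2, while character 2 gives 1/210, which rounds to the 0.005 quoted in the text. The same one-step saving therefore counts for about a hundred times more in the character with less homoplasy.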
For the sake of completeness, Farris included a short discussion on convex
fit (Fig. 5.8c). This approach implies the opposite of concave fit, suggesting
that characters with greater homoplasy are to be preferred; clearly, this is
not a sensible option!
Goloboff (1993) exploited concave fit to determine character weights in his
computer program, PIWE. Here, weights are calculated as W = K/(K + ESi),
where ESi is the number of extra steps per character and K is the constant
of concavity (the inclusion of a measure of 'extra steps' makes Goloboff's
approach analogous to the discussion of Farris (1969)). Referring back to
Matrix 1 (Fig. 5.1a) and cladogram in Fig. 5.1c, four characters (1–4) fit this
cladogram perfectly (Table 5.5). Characters 1 and 4 are uninformative and so
PIWE assigns them zero weight, in contrast to successive weighting which
assigns them maximum weight (in practice, of course, this makes no
difference to the analysis, merely adding to cladogram length). Characters 2 and 3
fit the cladogram perfectly, that is with no extra steps, and hence receive the
maximum weight of 10. Characters 5 and 6 fit the cladogram with two steps
and hence each character has one extra step. Thus, for K = 3, they receive
weights of 7.5. Two consequences can be seen immediately from this simple
example. All characters, unless completely uninformative, receive a non-zero
weight and this weight varies according to the value assigned to K. When K
is altered, the weights change. For characters 5 and 6, weights for values of
K = 1 to 6 are given in Table 5.6. Goloboff (1993) then used the total weight
of all characters, rather than cladogram length, to select the best cladogram.
Table 5.6 Weights for characters 5 and 6 from Table 5.5 with K = 1–6.
Value of K    Weight assigned
1             1/(1 + 1) = 0.50 × 10 = 5.0
2             2/(2 + 1) = 0.66 × 10 = 6.6
3             3/(3 + 1) = 0.75 × 10 = 7.5
4             4/(4 + 1) = 0.80 × 10 = 8.0
5             5/(5 + 1) = 0.83 × 10 = 8.3
6             6/(6 + 1) = 0.85 × 10 = 8.5
In Goloboff's terminology, the 'heaviest' cladogram is selected by summing
all the assigned weights and choosing the cladogram with the largest value.
Implied weighting is relatively new and few studies have yet been undertaken
(Goloboff 1995b, Szumik 1996 and, for critical examination, see Turner
1995).
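
A short sketch may help to show how the weight K/(K + ESi) and the 'heaviest' cladogram fit together. It is not PIWE's code: the function names, the scaling by 10 (taken from Table 5.6) and the two lists of extra steps are assumptions made for illustration only.

```python
def implied_weight(extra_steps, k, scale=10):
    """Goloboff's concave fit K / (K + ES), scaled as in Table 5.6."""
    if k <= 0 or extra_steps < 0:
        raise ValueError("k must be positive and extra_steps non-negative")
    return scale * k / (k + extra_steps)

def total_fit(extra_steps_per_character, k):
    """Sum of implied weights over characters; larger means 'heavier'."""
    return sum(implied_weight(es, k) for es in extra_steps_per_character)

# One extra step, as for characters 5 and 6 of Table 5.5, with K = 1..6.
weights_by_k = {k: implied_weight(1, k) for k in range(1, 7)}

# Hypothetical comparison of two cladograms by total fit at K = 3.
# Each list gives extra steps for the same four informative characters.
cladogram_x = [0, 0, 1, 1]
cladogram_y = [0, 2, 0, 0]
fit_x = total_fit(cladogram_x, 3)
fit_y = total_fit(cladogram_y, 3)
heavier = "X" if fit_x > fit_y else "Y"
```

Both hypothetical cladograms require two extra steps in all, so they are equal in length, yet cladogram Y is the heavier because its homoplasy is concentrated in a single character. With exact arithmetic the weight for K = 2 is about 6.67, where Table 5.6 prints 6.6.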
5.2.4 Prospects
The mechanics and implementation of a posteriori weighting are described
above, but what of the results? Carpenter's general aim was to investigate a
rational way to reduce the number of equally parsimonious cladograms using
empirical criteria. However, results suggest that this general expectation need
not be the case, for either successive weighting or implied weighting. Two
significant results have emerged. First, a greater rather than a smaller
number of cladograms than were in the original set may be recovered after
weighting, and second, the cladograms found after weighting may not be
among the original equally most parsimonious suite (e.g. Platnick et al.
1991b).
With respect to more cladograms, it may be that of the original equally
weighted characters, few were cladistically consistent (or self-consistent in the
terminology of Goloboff 1993) and many received zero weight. The cladograms
derived from the equally weighted data thus depend upon a few
ambiguous characters. The significance of obtaining different topologies from
a posteriori weighted data is perhaps more controversial. It has been argued
that this result is not unexpected (Platnick et al. 1991b, Goloboff 1993), for,
in practical terms, weighting is equivalent to excluding some characters and
non-randomly replicating others. Consequently, a differentially weighted data
set need not give the same results as an equally weighted one. Others have
argued quite simply that longer cladograms should be disregarded as they
violate the basic premise of parsimony (Turner and Zandee 1995).
Consider one example. The elegant study of haplogyne spiders by Platnick
et al. (1991a) yielded ten equally most parsimonious cladograms using equal
weights (length = 184). After successive weighting (implemented by
Hennig86), this number was reduced to six (length 568; note this large
increase in length is to be expected, because, for example, a perfectly
consistent character now adds 10 to the length of the cladogram, rather than
the previous 1). None of the six weighted cladograms were among the original
ten. When the six weighted cladograms were inspected using the unweighted
matrix, two cladograms had a length of 185 (one step longer than the optimal
cladograms from equal weighting) and the other four had a length of 187.
Platnick et al. reasoned that of the six cladograms, the two with length 185
were worthy of further consideration and were better than the equally
weighted cladograms in spite of the fact that they are one step longer. They
argued that, for these cladograms, the characters contributing to the
topology could be considered more 'consistent' than those under equal
weights. This suggests a further conclusion relevant to weighting. Rather than
being a method to reduce the number of equally parsimonious cladograms,
Goloboff (1993) and Platnick et al. (1991a) have suggested that parsimony
analyses require weighting to achieve self-consistent results, even if only a
single most parsimonious cladogram is found using equal weights (this view
has been contested by Turner and Zandee 1995; with a reply by Goloboff
1995a). Platnick et al. (1996) have further suggested that equal weights can
only be considered a preliminary and crude estimate of the relative value of
the data. Such views are consistent with the general understanding of cladistic
parsimony, that the 'value' of a character is related to its performance on a
cladogram. Further, coupling this with a notion of support for each clade
leads to a firmer choice of particular cladograms, whether only one results
from analysis or many.
A posteriori character weighting holds a certain amount of promise and
may help to produce more consistent cladograms relative to the data collected.
Progress may result from performing analyses with different parameters
for weighting (as well as different values for obtaining weights) on more
data sets (e.g. Suter 1994). Whatever the outcome, it seems likely that the
view of Platnick et al. (1996), that equal weights can only be a preliminary
and crude estimate of any particular data set, is worthy of further
consideration.
5.3 CHAPTER SUMMARY
1. The simplest measure for assessing the fit of data to a cladogram is
cladogram length. Being the shortest, the most parsimonious cladogram
has the best fit to the data. Longer cladograms have poorer fit.
2. The consistency index, ci, of a character is defined as the ratio of m, the
minimum number of steps a character can exhibit on any cladogram, to s,
the minimum number of steps the same character can exhibit on the
cladogram in question. The consistency index measures the amount of
homoplasy in the data. Problems with the consistency index are that
unique characters (autapomorphies) and invariant characters will inflate
its value; it can never attain a zero value; and, in general, its value is
inversely proportional to the number of taxa included in the analysis.
3. The retention index, ri, was introduced to address the problem of
character fit to a cladogram, as opposed to amount of homoplasy displayed
by a character. The ri is defined as (g − s)/(g − m), where g is the
greatest number of steps a character can exhibit on any cladogram. The
retention index measures the amount of similarity interpreted as
synapomorphy and, unlike the ci, can attain a zero value.
4. Values for ci and ri are useful for examining how individual characters
perform on a cladogram. In order to assess how an entire data set
performs, the ensemble consistency index (CI) and ensemble retention
index (RI) are used. CI = M/S and RI = (G − S)/(G − M), where M, S
and G are the sums of all the m, s and g values for the individual
characters respectively.
5. Weighting of characters can be either a priori (tree independent, hypothesis
independent) or a posteriori (tree dependent, hypothesis dependent).
6. There are two types of a priori weighting. A character can be weighted
according to what we can discover about its origin, either empirically
(ontogeny) or historically (phylogeny). Phylogeny differs from ontogeny in
that it requires knowledge of the 'evolutionary' properties of characters,
which may not be easily or even readily accessible. This kind of weighting
is usually understood as some kind of connection between parsimony and
evolution. 'Character analysis' is considered by some to be a form of
weighting but simply refers to a thorough re-examination of data from
specimens to ensure that structures are worthy of comparison. Alternatively,
characters can be understood as being 'compatible' with each other
and may suggest that particular associations of taxa can be a priori
prohibited. Character compatibility has had a somewhat chequered history,
having been used as a method of phylogenetic reconstruction and
consensus tree construction, as well as a priori character weighting.
7. Weights can be assigned a posteriori using the concept of character
consistency, a concept directly related to the cladograms resulting from a
cladistic analysis and their implied levels of homoplasy.
8. The consistency index can be used to assign weights to characters.
However, because the ci cannot attain a zero value, Farris suggested
rescaling the ci using the retention index, to produce the rescaled
consistency index, rc.
9. Once weights have been assigned to characters using rc, the most
parsimonious cladograms for the weighted data set are found. The
cladistic consistency of the characters on these new cladograms forms the
basis of a new set of weights, and the process is continued until the
weights stabilize. This is successive approximations character weighting.
10. In contrast, Goloboff suggested that the weight of a character (its
'implied weight') is a function of its fit to a cladogram. Therefore, the
best cladogram maximizes fit, giving a measure of 'total fit'. The function
of fit to a cladogram requires consideration of fit in terms of homoplasy.
11. Farris suggested three kinds of fitting function: linear, concave and
convex. Goloboff recommended concave fit (following Farris) and adopted
an approach that considers the direct weighting of 'extra steps', such that
the weight of a character is given by K/(K + ESi), where ESi is the
number of extra steps per character and K is the constant of concavity.
6. Support and confidence statistics for cladograms and groups
6.1 INTRODUCTION
The study of phylogeny is an historical science, concerned with the discovery
of historical singularities. Consequently, we do not consider phylogenetic
inference per se to be fundamentally a statistical question, open to discoverable
and objectively definable confidence limits. Hence, we are in diametric
opposition to those who would include such a standard statistical framework
as part of cladistic theory and practice. However, while the most parsimonious
cladogram does represent the best summary of the data to hand and is
thus the preferred hypothesis of relationships among the study taxa, it is
naïve to assume that it also represents the 'true phylogeny'. Cladograms are
always subject to revision in the light of new data, reinterpreted hypotheses of
homology or improved analytical methods. But such changes can lead to
instability in taxonomy and nomenclature if they are undertaken too frequently
or without sufficient forethought. If authoritarianism is to be avoided,
some objective means needs to be found to determine which changes are to
be included in the new, improved classification and which are not.
6.2 RANDOMIZATION PROCEDURES APPLIED TO THE WHOLE CLADOGRAM
A question often asked of a data set is whether it contains 'significant
cladistic structure', that is, whether we can have any confidence that the
results of a cladistic analysis are, in some sense, 'real' and not just by-products
of chance. The concept of cladistic structure can be studied from
two viewpoints. The first attempts to assign confidence to the most parsimonious
cladogram as a whole, while the second examines the support afforded
to individual clades within the most parsimonious cladogram and asks which
of these are reliably supported by evidence and which only weakly so.
To attempt to answer the question 'Could a cladogram as short as this have
arisen purely by chance?', it is necessary to compare the length of the most
parsimonious cladograms derived from the real data with those obtained from
'phylogenetically uninformative' data sets. Several definitions of phylogenetically
uninformative data have been proposed. For example, Archie and
Fig. 6.1 The basic concept underlying most tree support statistics. A data set of real
observations is repeatedly perturbed according to a particular set of rules to yield a
large number of pseudoreplicate sets of 'phylogenetically uninformative' data. The
length of the most parsimonious cladogram(s) derived from the real data set is then
compared with the lengths of the most parsimonious cladograms obtained from
these contrived data sets, with the expectation that the former (indicated by the
arrow) would be very much shorter than any of the latter (represented by the
frequency histogram).
Felsenstein (1993) interpreted it to mean 'statistically random'. In this method,
random data are generated using a model in which there is an equal
probability of state 0 or 1 in every cell. A random addition sequence
algorithm is then used to generate 'random cladograms'. The expected
number of steps on each random cladogram is then calculated. These values
are then compared with the length of the most parsimonious cladogram
obtained from the real data set, with the expectation that the latter should be
substantially shorter (Fig. 6.1).
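
The general shape of such a comparison can be sketched in a few lines, with several simplifications that should be kept in mind. The sketch counts steps for unordered binary characters with Fitch's algorithm, builds each random tree by joining randomly chosen subtrees rather than by a random addition sequence, and simply collects the lengths instead of calculating an expectation. All names and the small data set are hypothetical, and this is not Archie and Felsenstein's procedure.

```python
import random

def fitch_length(tree, states):
    """Fitch parsimony length of one character on a rooted binary tree.

    tree is a taxon name (leaf) or a pair (left, right); states maps each
    taxon name to its state for the character.
    """
    steps = 0

    def state_set(node):
        nonlocal steps
        if not isinstance(node, tuple):
            return {states[node]}
        left, right = state_set(node[0]), state_set(node[1])
        shared = left & right
        if shared:
            return shared
        steps += 1  # no state in common: one change is required here
        return left | right

    state_set(tree)
    return steps

def tree_length(tree, matrix):
    """Sum of Fitch lengths over all characters (columns) of the matrix."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_length(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )

def random_tree(taxa, rng):
    """Random rooted binary tree built by repeatedly joining two subtrees."""
    nodes = list(taxa)
    while len(nodes) > 1:
        a = nodes.pop(rng.randrange(len(nodes)))
        b = nodes.pop(rng.randrange(len(nodes)))
        nodes.append((a, b))
    return nodes[0]

def random_matrix(taxa, n_chars, rng):
    """Binary matrix with states 0 and 1 equally probable in every cell."""
    return {t: [rng.randint(0, 1) for _ in range(n_chars)] for t in taxa}

def random_lengths(taxa, n_chars, replicates, seed=0):
    """Lengths of random trees on random data, for comparison with real data."""
    rng = random.Random(seed)
    return [
        tree_length(random_tree(taxa, rng), random_matrix(taxa, n_chars, rng))
        for _ in range(replicates)
    ]

# Hypothetical four-taxon data set and a tree that fits it without homoplasy.
real = {"A": [0, 0, 0], "B": [1, 0, 0], "C": [1, 1, 1], "D": [1, 1, 1]}
observed = tree_length(("A", ("B", ("C", "D"))), real)
null = random_lengths(list(real), n_chars=3, replicates=200)
```

The observed length would then be set against the distribution in null, in the manner of Fig. 6.1. With so few taxa and characters the comparison has no practical value and serves only to show the mechanics.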
6.2.1 Data decisiveness

In contrast, Goloboff (1991) argued that systematists expect data sets to
contain information that allows them to choose among different hypotheses
of relationship. A data set in which all possible informative characters occur
in equal numbers will provide no such evidence because all possible fully
resolved topologies will be of the same length. Because no choice can be
made among the cladograms, the data set is phylogenetically uninformative.
Goloboff termed such a data set 'undecisive' (Table 6.1). In contrast, a
decisive matrix yields cladograms that differ in length among themselves and
thus offers a reason for choosing some cladograms over others. As the degree
of difference increases, so does the decisiveness of the matrix.
Table 6.1 Example of an undecisive data set
Taxa  Characters
A     110 110 1 100 110 110 1 100 1 1000
B     101 101 1 010 101 101 1 010 1 0100
C     011 011 1 001 011 011 1 001 1 0010
D     000 111 0 111 000 111 0 111 1 0001
E     000 000 0 000 111 111 1 111 0 1111
Goloboff showed that the CI, RI and RC do not vary directly with decisiveness
and hence the least decisive matrices are not necessarily those with the
lowest values for these statistics. Nevertheless, a general measure of
decisiveness is possible.
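
The property that makes a matrix undecisive can be checked mechanically. The sketch below takes the rows of Table 6.1 as we read them and tests whether every possible informative character occurs equally often. It assumes that state 1 is derived and that a character is informative when its derived state occurs in at least two taxa but not in all of them; the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Rows as read from Table 6.1 (spaces only group the columns).
table_6_1 = {
    "A": "110 110 1 100 110 110 1 100 1 1000",
    "B": "101 101 1 010 101 101 1 010 1 0100",
    "C": "011 011 1 001 011 011 1 001 1 0010",
    "D": "000 111 0 111 000 111 0 111 1 0001",
    "E": "000 000 0 000 111 111 1 111 0 1111",
}

def derived_groups(matrix):
    """For each character, the set of taxa scored 1."""
    rows = {taxon: row.replace(" ", "") for taxon, row in matrix.items()}
    n_chars = len(next(iter(rows.values())))
    return [
        frozenset(t for t, row in rows.items() if row[i] == "1")
        for i in range(n_chars)
    ]

def is_undecisive(matrix):
    """True if every possible informative group occurs equally often."""
    taxa = sorted(matrix)
    observed = Counter(derived_groups(matrix))
    possible = [
        frozenset(group)
        for size in range(2, len(taxa))
        for group in combinations(taxa, size)
    ]
    counts = {observed.get(group, 0) for group in possible}
    only_informative = set(observed) <= set(possible)
    return only_informative and len(counts) == 1 and 0 not in counts

result = is_undecisive(table_6_1)

# Dropping the final character breaks the balance.
truncated = {taxon: row[:-1] for taxon, row in table_6_1.items()}
```

For five taxa there are 10 + 10 + 5 = 25 such groups, and the 25 characters of the table correspond to them one for one.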
Data decisiveness (DD) is defined as:

$$DD = \frac{\bar{S} - S}{\bar{S} - M}$$

where $S$ is the observed length of the most parsimonious cladogram, $\bar{S}$ is the
mean length of all possible bifurcating cladograms and $M$ is the minimum
possible length of a cladogram were there no homoplasy in the data (the
same variable, M, that is used in the calculation of RI). The higher the value
of DD, the more the cladograms derived from a data matrix differ in length.
DD decreases as characters increasingly conflict with one another, with a
value of 1 implying there is no conflict, while a value of 0 is achieved for
wholly undecisive data. It is unaffected by the presence of uninformative
characters in the data. DD is not necessarily correlated with the number of
equally most parsimonious cladograms for a data set, nor has it any strict
connection with the strength of preference for the most parsimonious
cladogram over every other cladogram. A very low decisiveness implies that there
are only very weak reasons to prefer the most parsimonious cladogram over
any other topology, including those just one step longer. However, the
converse, that a high DD signifies a high cladistic information content, does
not necessarily hold true. In particular, DD cannot be used to assign a
confidence level to the most parsimonious cladogram. To achieve this, it
would be necessary to compare whether the observed decisiveness is
significantly lower than that of random data with the same number of characters
and thus far, the statistical distribution of decisiveness generated from
random data remains undetermined.
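The definition of DD can be checked on a matrix small enough for exhaustive search. The following is a minimal sketch, not Goloboff's own procedure: it assumes the 25-character matrix of Table 6.1 as read here, treats every character as unordered (Fitch optimization), and enumerates all 15 unrooted bifurcating cladograms for five taxa. The function names are illustrative only.

```python
from statistics import mean

# Table 6.1 as read here: 25 binary characters for taxa A-E (an assumption
# about the printed table; every group of 2, 3 or 4 taxa is marked once).
MATRIX = {
    "A": "1101101100110110110011000",
    "B": "1011011010101101101010100",
    "C": "0110111001011011100110010",
    "D": "0001110111000111011110001",
    "E": "0000000000111111111101111",
}

def insert_taxon(x, tree):
    """Yield every tree made by attaching leaf x to one branch of a rooted tree."""
    yield (x, tree)                      # attach on the branch above this node
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_taxon(x, left):
            yield (new_left, right)
        for new_right in insert_taxon(x, right):
            yield (left, new_right)

def all_trees(taxa):
    """Yield every rooted bifurcating tree (nested tuples) on the given taxa."""
    if len(taxa) == 1:
        yield taxa[0]
        return
    for smaller in all_trees(taxa[1:]):
        yield from insert_taxon(taxa[0], smaller)

def fitch(tree, states):
    """Fitch pass for one unordered character: return (state set, steps)."""
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], states), fitch(tree[1], states)
    shared = ls & rs
    return (shared, lc + rc) if shared else (ls | rs, lc + rc + 1)

def tree_length(tree, matrix):
    """Total number of steps over all characters on one tree."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {t: row[i] for t, row in matrix.items()})[1]
        for i in range(n_chars)
    )

def decisiveness(matrix):
    """Return (lengths of all cladograms, DD) by exhaustive enumeration."""
    taxa = sorted(matrix)
    # Rooting every tree on the first taxon lists each unrooted topology once.
    trees = [(taxa[0], t) for t in all_trees(taxa[1:])]
    lengths = [tree_length(t, matrix) for t in trees]
    s, s_bar = min(lengths), mean(lengths)
    # M: each character needs at least (number of states - 1) steps.
    m = sum(len(set(col)) - 1 for col in zip(*matrix.values()))
    return lengths, (s_bar - s) / (s_bar - m)

lengths, dd = decisiveness(MATRIX)
print(sorted(set(lengths)), dd)
```

Under these assumptions every one of the 15 cladograms has the same length, so the mean length equals the minimum and DD is 0, which is the behaviour the text describes for wholly undecisive data. The sketch would divide by zero for a matrix in which the mean length equals M; that case is not handled here.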
6.2.2 Distribution of cladogram lengths (DCL)

Several authors have examined the distribution of cladogram lengths (DCL;
strictly, the distribution of lengths of all possible bifurcating cladograms) for a
given data set as an indicator of phylogenetic signal in the data. They have
argued that for a DCL that is nearly symmetrical, many cladograms will be
only a few steps longer than the most parsimonious cladogram and the
phylogenetic signal is weak. However, if the DCL is strongly negatively or
left-skewed (i.e. with a long tail to the left side of the distribution), then there
are relatively few cladograms that are just slightly longer than the most
parsimonious solution and the phylogenetic signal is therefore strong.
One significance test proposed by Hillis (1991) is based upon the null
model in which characters are generated independently and at random, and
all states have the same expected frequency. We would then conclude there is
significant cladistic structure, and that the characters are highly congruent,
when the skewness statistic, g1, for the real DCL is below the fifth percentile
(for example) of the DCL g1 derived from matrices of randomly generated
data. However, this effect can also result from the characters simply having
different frequencies among the taxa. A character that divides the taxa into
two groups of similar size tends to make the DCL more symmetrical.
Conversely, characters that allow the recognition of small groups of taxa tend
to increase left-skewness. The significance test outlined above confounds this
effect with that due to character congruence.
Huelsenbeck (1991a) compared the DCLs generated from real and random
data using g1. Using simulation tests, he found that data that were consistent
with only a single most parsimonious cladogram tended to produce a strongly
left-skewed DCL. In contrast, data that were consistent with numerous
equally most parsimonious cladograms produced a more symmetrical DCL
that could not be distinguished from those generated from random data.
However, a hierarchical pattern will be recovered whenever there is character
congruence, whatever its source, and g1 is generally negative even for
randomly generated data because of chance congruence within such data.
Thus, the test is whether the observed skewness in the DCL is more negative
than would be expected from random data. If it is, then the conclusion is that
there is significant phylogenetic (hierarchical) signal in the data. If it is not,
then one would conclude that the observed character congruence is largely
due to chance and the most parsimonious cladogram is a poor estimator of
phylogeny.
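The skewness comparison described above can be sketched as follows. The text does not give a formula for g1, so this sketch assumes the usual third-moment coefficient of skewness; the two length distributions and the null values are hypothetical numbers chosen only to show the sign of the statistic.

```python
def skewness_g1(lengths):
    """Moment coefficient of skewness: third central moment / (second)^1.5."""
    n = len(lengths)
    centre = sum(lengths) / n
    m2 = sum((x - centre) ** 2 for x in lengths) / n
    m3 = sum((x - centre) ** 3 for x in lengths) / n
    return m3 / m2 ** 1.5

def fraction_at_least_as_skewed(observed_g1, null_g1_values):
    """Share of null (random-data) g1 values that are as negative as the observed one."""
    hits = sum(1 for g in null_g1_values if g <= observed_g1)
    return hits / len(null_g1_values)

# Hypothetical DCLs (cladogram lengths), not taken from any real data set.
symmetric_dcl = [10, 11, 11, 12, 12, 12, 13, 13, 14]
left_tailed_dcl = [14, 17, 18, 18, 19, 19, 19, 19, 19]

print(skewness_g1(symmetric_dcl))    # zero for a symmetrical distribution
print(skewness_g1(left_tailed_dcl))  # negative: long tail on the left

# Hypothetical g1 values from matrices of random data.
null_values = [-0.9, -0.4, -0.3, -0.2, -0.1]
print(fraction_at_least_as_skewed(skewness_g1(left_tailed_dcl), null_values))
```

In a test of the kind attributed to Hillis, the observed g1 would be judged against the lower tail of the null values, for example the fifth percentile.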
Huelsenbeck's simulations were based on a model in which the probabilities
of character change were equal along all branches of the cladogram. The
accuracy of the most parsimonious cladogram resulting from these simulations
is determined by the probability of character change along its branches.
If the probability is very low, then most characters will be invariant or
autapomorphic. If it is too great, then the distribution of the characters
effectively becomes independent of the cladogram. Skewness is affected
similarly. When characters are invariant or autapomorphic, skewness is 0.
Skewness is weak when the probability of change is large and the expected
frequencies of all states approach those of Hillis's null model. Skewness is
strongest with intermediate probabilities. Accuracy of the most parsimonious
cladogram is thus correlated with the degree of skewness of the DCL but that
correlation is not general. It results simply from the constraint that character
change has the same probability along each branch of the cladogram.
DCL skewness is also insensitive to the number of characters in the data
set and thus does not necessarily reflect the degree to which conclusions are
corroborated, a property that would be expected of a measure of phylogenetic
structure in a data set.
Skewness as a measure of cladistic structure in data also suffers from two
practical difficulties. First, informative cladistic data would be expected to
show a strongly left-skewed DCL with a highly attenuated left tail. However,
skewness is determined primarily by the central mass of the distribution. A
negatively skewed DCL will occur whenever the median value exceeds the
mean and thus need not necessarily have a long tail. Determining the degree
of attenuation of the tail by estimating the lengths of all possible bifurcating
cladograms is relatively quick for small numbers of taxa. However, as noted in
Chapter 3, the number of such cladograms rises very quickly as the number of
taxa increases, as a result of which sampling the cladograms becomes
necessary in order to achieve a result in an acceptable time. The problem is that as
the left tail becomes ever more attenuated, a sample of topologies is
increasingly less likely to include cladograms from this region. Thus, the very
cladograms upon which the significance test is based have little chance of
affecting the skewness calculations. Second, the DCL is based only on fully
bifurcating cladograms. If many of these cladograms represent arbitrary
resolutions, then counting them as distinct can lead to erroneous conclusions.
To summarize, skewness of the DCL as a measure of the strength of
cladistic structure in data consists of using a poorly chosen statistic to
summarize a poorly chosen distribution.
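The sampling difficulty can be made concrete with a little arithmetic. This sketch assumes the standard count of unrooted bifurcating cladograms, (2n - 5)!! for n taxa, and assumes that topologies are sampled uniformly with replacement; the size of the left tail (10 cladograms) is a hypothetical figure.

```python
def n_unrooted_bifurcating(n_taxa):
    """Number of unrooted bifurcating cladograms, (2n - 5)!!, for n >= 3 taxa."""
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):
        count *= k
    return count

def prob_sample_reaches_tail(n_tail, n_total, sample_size):
    """Chance that a uniform sample (with replacement) includes any tail cladogram."""
    return 1 - (1 - n_tail / n_total) ** sample_size

for n in (5, 10, 15, 20):
    print(n, n_unrooted_bifurcating(n))

# Hypothetical case: 10 taxa, a left tail of 10 cladograms, 10000 sampled topologies.
total = n_unrooted_bifurcating(10)
print(prob_sample_reaches_tail(10, total, 10000))
```

Even with 10000 sampled topologies, the chance of drawing any cladogram from so small a tail is only a few per cent for ten taxa, and it falls further as taxa are added.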
6.2.3 Permutation tail probability (PTP)

It was noted above that, as a criterion for evaluating a cladogram in terms of
its efficiency at summarizing the data, the ensemble consistency index (CI)
suffers from the limitation that it is indifferent to the distribution of the
states of characters among taxa. The ensemble retention index (RI) was
specifically developed to remedy this deficiency by according greater
influence to characters that support larger monophyletic groups and
downweighting those that occur more distally on the cladogram. However, both CI and
RI are expressed as proportions and are thus insensitive to the total number
of characters, a deficiency shared with DCL skewness. Two data sets with
identical CI and RI can differ in the amount of support they afford a
cladogram if one data set includes a larger number of characters than the
other.
These perceived limitations of CI and RI could be overcome if the number
of characters and their distributions among taxa could both somehow be
taken into account. CI measures the degree to which data are explained by a
cladogram against a standard defined by the theoretical minimum number of
steps possible for that data. This minimum can only be attained if the
characters covary in such a way that they all fit the cladogram perfectly, that
is, none of them shows any homoplasy. This cladistic covariation reflects the
degree to which all of the characters are explainable by the same cladogram
topology.
Faith and Cranston (1991) suggested that those most parsimonious
cladograms that entailed large amounts of homoplasy might derive from data sets
in which the characters exhibited such poor cladistic covariation that analysis
of a comparable set of randomly covarying characters could produce a
cladogram of equal or even shorter length. If this were true, then doubt
would be cast upon the validity of the original result. In order to determine
whether the degree of character covariation in a data set is significantly
greater than that expected for a set of comparable randomly varying
characters, Faith and Cranston proposed the permutation tail probability (PTP) test.
The PTP test (Fig. 6.2) uses a randomization method in which the states of
each character are permuted and randomly reallocated among the ingroup
taxa in such a way that the proportions of each state are maintained. So, for
example, given five ingroup taxa A-E, if a character occurs with state 0 in
taxa A, B and C, and with state 1 in taxa D and E, permutation would
maintain the 3:2 ratio of occurrence of state 0 to state 1, but might allocate
state 0 to taxa A, C and E, and state 1 to taxa B and D. The character states
of the outgroup taxa are held constant and not permuted. This procedure is
applied to each character in the data set independently. In this way, the
identity of each character and the frequencies of each of its states are
maintained, but the cladistic covariation among the characters is disrupted.
The procedure is then repeated, say 99 times, to give 99 new data sets, which
are then analysed using standard parsimony techniques. The lengths of each
of the most parsimonious cladograms from these 99 analyses are then
compared to the length of those derived from the original matrix. The PTP is
then defined as the proportion of all data sets (permuted plus original) that
yield cladograms equal to or shorter than those produced from the original
data set, and might be interpreted as the probability of obtaining a cladogram
of this length under a model of random character covariation.
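The permutation procedure just described can be sketched in a few lines. This is an illustration under stated assumptions, not Faith and Cranston's implementation: the data matrix is hypothetical, all characters are treated as unordered (Fitch optimization), and the shortest cladogram is found by exhaustive enumeration, which is feasible only for a handful of taxa. The function names are illustrative.

```python
import random

# Hypothetical matrix: one outgroup ("Out") and five ingroup taxa, 8 characters.
DATA = {
    "Out": "00000000",
    "A":   "00000010",
    "B":   "00001110",
    "C":   "00111100",
    "D":   "11111101",
    "E":   "11111101",
}

def insert_taxon(x, tree):
    """Yield every tree made by attaching leaf x to one branch of a rooted tree."""
    yield (x, tree)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_taxon(x, left):
            yield (new_left, right)
        for new_right in insert_taxon(x, right):
            yield (left, new_right)

def all_trees(taxa):
    """Yield every rooted bifurcating tree (nested tuples) on the given taxa."""
    if len(taxa) == 1:
        yield taxa[0]
        return
    for smaller in all_trees(taxa[1:]):
        yield from insert_taxon(taxa[0], smaller)

def fitch(tree, states):
    """Fitch pass for one unordered character: return (state set, steps)."""
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], states), fitch(tree[1], states)
    shared = ls & rs
    return (shared, lc + rc) if shared else (ls | rs, lc + rc + 1)

def min_length(matrix):
    """Length of the most parsimonious cladogram, by exhaustive search."""
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    return min(
        sum(fitch((taxa[0], t), {x: matrix[x][i] for x in taxa})[1]
            for i in range(n_chars))
        for t in all_trees(taxa[1:])
    )

def permute_ingroup(matrix, outgroup, rng):
    """Shuffle each character's states among ingroup taxa; outgroup rows stay fixed."""
    ingroup = [t for t in matrix if t not in outgroup]
    rows = {t: list(row) for t, row in matrix.items()}
    for i in range(len(next(iter(matrix.values())))):
        column = [matrix[t][i] for t in ingroup]
        rng.shuffle(column)              # state frequencies are preserved
        for t, state in zip(ingroup, column):
            rows[t][i] = state
    return {t: "".join(row) for t, row in rows.items()}

def ptp(matrix, outgroup, n_perm=99, seed=0):
    """Proportion of all data sets (permuted plus original) as short as the original."""
    rng = random.Random(seed)
    observed = min_length(matrix)
    as_short = 1                         # the original data set itself
    for _ in range(n_perm):
        if min_length(permute_ingroup(matrix, outgroup, rng)) <= observed:
            as_short += 1
    return as_short / (n_perm + 1)

print(ptp(DATA, outgroup={"Out"}))
```

Counting the original data set among the n_perm + 1 sets follows the definition given in the text, and it means the returned value can never fall below 1/(n_perm + 1).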
The value of the PTP has also been employed as the basis of a statistical
test to assess the degree of cladistic structure in the data. The null hypothesis
of this test is that there is no cladistic structure beyond that due to chance
and, for example, would be rejected at the 0.05 level if the most
parsimonious cladograms of fewer than five of the 100 data sets were as short or
shorter than those derived from the unpermuted data (PTP ≤ 0.05). When
none of the permuted data sets produces minimum length cladograms as
short as those derived from the original data, Faith and Cranston suggested
that the PTP be arbitrarily recorded as 0.01. Thus a low PTP is desirable and
a value of less than 0.05 could be taken to imply the presence of significant
cladistic covariation or structure in the original data.

[Fig. 6.2, panels: (a) original data matrix (characters 1-10) with its optimal
cladogram for taxa A-E; (b) permuted data matrix with its cladogram; (c) histogram
of cladogram lengths, x-axis 'Increasing cladogram length' (tick labels 2150 and
2850), with the optimal cladogram (1635) marked.]

(a) Original data        (b) Permuted data
Taxon A  0000010010      Taxon A  1101000201
Taxon B  1100000000      Taxon B  1110011110
Taxon C  1111111201      Taxon C  1000111200
Taxon D  1111011201      Taxon D  0110101001
Taxon E  1100101110      Taxon E  1101010010

Fig. 6.2 PTP analysis. (a) The most parsimonious cladogram for the original data set
is determined and its length recorded. (b) The states of each character are then
permuted among the taxa, while maintaining the proportions of each state, to
produce a new data set. This data set is then analysed and the length of its most
parsimonious cladogram recorded. (c) This procedure is then repeated a large
number of times and the PTP is defined as the proportion of all data sets (permuted
plus original) that yield cladograms equal to or shorter than those produced from
the original data set. The null hypothesis that there is no cladistic structure beyond
that due to chance would be rejected at the 0.05 level if the most parsimonious
cladograms of fewer than 5 of the 100 data sets were as short or shorter than those
derived from the unpermuted data (PTP ≤ 0.05).
However, because the PTP test is based upon the determination of the
lengths of most parsimonious cladograms, there is a practical problem with its
application. For smaller data sets, it may be possible to apply exact methods
of cladogram construction and thus obtain precise estimates of the PTP value
in a reasonable computational time. However, this is not possible for larger
data sets, which must be analysed using heuristic methods that are not
guaranteed to find the minimum length solutions. Furthermore, the
significance level, α, of a PTP test cannot be less than 1/(W + 1), where W is the
number of permutations. For example, if a more stringent α value of 0.01 is
required, then 99 permutations must be undertaken, which may result in
excessively long computational times. Faith and Cranston noted this
drawback and suggested that the PTP could perhaps be estimated by applying the
same heuristic procedure to both types of data set, i.e. original and permuted.
Alternatively, it might be sufficient simply to compare the results from an
exhaustive search of the real data with those of heuristic searches of the
permuted data. However, Källersjö et al. (1992) noted that, because such
approximate values would always exceed the corresponding exact values, this
apparently conservative latter procedure would actually increase the apparent
difference between the real and randomized data sets and worsen the risk of
a false conclusion of significant congruence.
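The lower bound on the significance level mentioned above follows directly from the definition of the PTP, because the original data set always counts as one of the W + 1 data sets. As a short worked check (the values W = 19 and W = 999 are illustrative choices, not figures from the text):

```latex
\alpha_{\min} = \frac{1}{W + 1}, \qquad
W = 19 \;\Rightarrow\; \alpha_{\min} = 0.05, \qquad
W = 99 \;\Rightarrow\; \alpha_{\min} = 0.01, \qquad
W = 999 \;\Rightarrow\; \alpha_{\min} = 0.001 .
```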
Instead, they suggested using the single pass 'hennig' command of the
Hennig86 program (Farris 1988) to estimate very quickly the length of the
most parsimonious cladogram. This approximate length seldom differs by
more than a few per cent from that of the most parsimonious cladogram
calculated by an exact method. The test would then use only the number of
permutations in which the lengths of the estimated minimum length
cladograms exceeded that for the real data. If this procedure is applied to both
types of data and repeated many times (say 10000), then the approximation
differences would be unlikely to have much effect on that number. The
successful results of such an application should be similar to the histogram
depicted in Fig. 6.1. The bar furthest to the left represents the length of the
estimated minimum length cladogram obtained from the real data. This is
well separated from the rest of the distribution and thus there is little chance
that the use of approximate methods will lead to an erroneous conclusion.
In addition to this practical problem, the validity of the PTP test has also
been questioned from a theoretical viewpoint. Bryant (1992) argued that the
null hypothesis of randomly covarying characters is contrary to the very basis
of cladistics. Every character is a putative synapomorphy or homology
statement, from which it follows axiomatically that cladistic characters will covary
hierarchically. Characters are thus intrinsically hierarchical and cladistic
analysis simply summarizes the total implied hierarchy in the data set as
efficiently as possible. It is this expected covariation among characters that is
the assumption that justifies the search for the most parsimonious cladogram.
Furthermore, because hierarchy is inherent in the data, it must follow that
cladograms are inherently hierarchical. The two are not independent. In
contrast, permuted data, in which the covariation among characters has been
deliberately disrupted, can have no intrinsic hierarchy. When analysed by
cladistic methods, such data must have hierarchy imposed upon them in order
to produce a cladogram. Thus the PTP test compares the lengths of real
cladograms, which have inherent hierarchy, with those of contrived
cladograms upon which hierarchical order has been imposed.
Carpenter (1992) considered that Faith and Cranston had failed to provide
any justification why their particular randomization procedure or significance
test should be chosen to specify probabilities. Why not simply allocate the
entries in the data matrix completely randomly, with no constraints as to the
values that could be obtained? Carpenter thus considered that the PTP test,
and permutation procedures in general, amounted to 'no more than a
misapplication of statistics, and add nothing to the use of cladistic parsimony'.
Hence, rather than providing a criterion for the acceptance or rejection of
a cladogram, the results of a PTP test might be better viewed as a relative
measure of overall confidence in the data set and may provide independent
evaluation of the explanatory power of the data. If the length of the most
parsimonious cladogram derived from real data, which is inherently
hierarchical and covariant, is not significantly different from the length of the shortest
cladogram derived from permuted data, which lacks covariation and upon
which hierarchy must be imposed, then the explanatory power of the real
data set is low. Conversely, if the most parsimonious cladogram from the real
data is considerably shorter than those derived from any permuted data set,
then we can have increased confidence in our result. However, while the PTP
test can indicate that a data set contains significant cladistic structure, the
converse does not pertain. A most parsimonious cladogram obtained from a
real data set that is similar in length to those obtained from permuted data
does not denote that this data lacks cladistic structure. It merely reduces our
confidence in the overall support for that pattern. The most parsimonious
cladogram remains the best estimate of the relationships among the study
taxa based upon the data to hand.
6.3 SUPPORT FOR INDIVIDUAL CLADES ON A CLADOGRAM
In Chapter 1, it was explained that the recognition of monophyletic groups is
predicated upon the discovery of synapomorphic characters. In theory, only
one synapomorphy is required to establish the validity of a monophyletic
group. However, as the number of independent synapomorphies supporting a
clade increases, then so does both its corroboration and our confidence in it
as a hypothesis of relationships. Therefore, the most straightforward
assessment of the support that can be accorded to a group is simply to count the
number of characters on the branch subtending it. In other words, branch
length equates to branch (clade) support.
However, this equation is an idealized concept that can only be applied
objectively and unequivocally when all characters on a cladogram are unique
and unreversed synapomorphies. Homoplasy makes the assessment of branch
support difficult, as attempts are made, for example, to weigh a reversal in
one character against an independent development in another. The results of
such considerations are often highly subjective, even authoritarian. In
addition, homoplasy need not be evenly distributed over the cladogram but
concentrated on certain branches. Such branches may thus appear to be
much better supported than they actually are if branch length alone is used as
the criterion of support. In the presence of homoplasy, branch length is
merely a conjecture of support, not an absolute measure, and may be
misleading.
6.3.1 Bremer support

One method that has been proposed as a more precise measure of clade
support is the number of extra steps required before a clade is lost from the
strict consensus tree of near-minimum-length cladograms. This notion has
been variously referred to as 'Bremer support' (Källersjö et al. 1992), 'branch
support' (Bremer 1994), 'length difference' (Faith 1991), 'clade stability',
'[Bremer's] support index' (both Davis 1993) and 'decay index' (Donoghue et
al. 1992). This last term is perhaps unfortunate, if only because the best
supported groups are those with the greatest decay. As all of these terms are
synonymous, we choose Bremer support, both because it is unambiguous and
in recognition of the author who first applied the concept in the context of
parsimony analysis (Bremer 1988).
To calculate Bremer support (Fig. 6.3) for a particular clade of the most
parsimonious cladogram, all the cladograms one step longer than the minimum
are found. Then, the strict consensus of these plus the most parsimonious
cladogram is constructed. This process is repeated, increasing the
length of the suboptimal cladograms by one step each time until the clade in
question no longer occurs on the consensus. The number of extra steps
required to reach this point is the Bremer support for the clade. When only a
single most parsimonious cladogram is found for a data set, all the included
clades must have Bremer support ≥ 1. Should there be more than one
equally parsimonious cladogram, then Bremer support calculation begins with
the construction of either the strict or the semi-strict consensus tree of these
cladograms (see Chapter 7 for details of consensus methods). Some groups
will necessarily be lost on this first consensus (otherwise there would not have
been any alternative cladograms in the first place) and such groups will have
zero Bremer support. Thus, whenever there are multiple equally parsimonious
solutions, at least one group will have zero Bremer support.
[Fig. 6.3, panels: (a) data matrix (characters 1-10) with its most parsimonious
cladogram for taxa A-E; (b) 'Consensus of 3 cladograms, length 15'; (c) a further
consensus cladogram for taxa A-E.]

Taxon A  0000010010
Taxon B  1100000000
Taxon C  1111111201
Taxon D  1111011201
Taxon E  1100101110

Fig. 6.3 Bremer support. (a) The most parsimonious cladogram is determined for a
data set. (b) All cladograms that are one step longer than the most parsimonious are
then determined. The strict consensus tree of these plus the most parsimonious
cladogram is constructed to determine those groups that no longer receive
unambiguous support. (c) This process is repeated, increasing the length of the suboptimal
cladograms by one step each time, until all groups are lost. The Bremer support for a
group is the number of steps that have to be added before that group is no longer
recovered on the strict consensus tree of optimal and suboptimal cladograms.
Bremer support has advantages over simple branch length as a measure of
strength of evidence for a clade. When all characters are perfectly congruent,
the most parsimonious reconstruction is unique and Bremer support equates
to branch length. Otherwise, the support for a clade is reduced to the degree
that there are alternative equally parsimonious groupings or character
optimizations.
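For very small data sets, Bremer support can be computed by brute force. The sketch below does not build successive consensus trees; it uses an equivalent shortcut, taking the support for a clade as the length of the shortest cladogram that lacks the clade minus the length of the most parsimonious cladogram. The matrix is hypothetical (three characters for A+B, one for C+D and one conflicting character for B+C), characters are unordered, and the function names are illustrative.

```python
# Hypothetical matrix: outgroup "Out" plus four ingroup taxa, 5 characters.
DATA = {
    "Out": "00000",
    "A":   "11100",
    "B":   "11101",
    "C":   "00011",
    "D":   "00010",
}

def insert_taxon(x, tree):
    """Yield every tree made by attaching leaf x to one branch of a rooted tree."""
    yield (x, tree)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_taxon(x, left):
            yield (new_left, right)
        for new_right in insert_taxon(x, right):
            yield (left, new_right)

def all_trees(taxa):
    """Yield every rooted bifurcating tree (nested tuples) on the given taxa."""
    if len(taxa) == 1:
        yield taxa[0]
        return
    for smaller in all_trees(taxa[1:]):
        yield from insert_taxon(taxa[0], smaller)

def fitch(tree, states):
    """Fitch pass for one unordered character: return (state set, steps)."""
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], states), fitch(tree[1], states)
    shared = ls & rs
    return (shared, lc + rc) if shared else (ls | rs, lc + rc + 1)

def tree_length(tree, matrix):
    """Total number of steps over all characters on one tree."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch(tree, {t: row[i] for t, row in matrix.items()})[1]
        for i in range(n_chars)
    )

def clades(tree):
    """Return (leaf set, set of clades) for a rooted tree of nested tuples."""
    if not isinstance(tree, tuple):
        return frozenset([tree]), set()
    (ll, lc), (rl, rc) = clades(tree[0]), clades(tree[1])
    leaves = ll | rl
    return leaves, lc | rc | {leaves}

def bremer_support(matrix, outgroup):
    """Return (minimum length, {clade: Bremer support}) by exhaustive search."""
    ingroup = [t for t in matrix if t != outgroup]
    scored = []
    for t in all_trees(ingroup):
        groups = clades(t)[1] - {frozenset(ingroup)}   # ignore the whole ingroup
        scored.append((tree_length((outgroup, t), matrix), groups))
    shortest = min(length for length, _ in scored)
    candidates = set().union(*(g for length, g in scored if length == shortest))
    support = {}
    for clade in candidates:
        without = min(length for length, g in scored if clade not in g)
        support[clade] = without - shortest            # 0 if an optimal tree lacks it
    return shortest, support

shortest, support = bremer_support(DATA, "Out")
print(shortest)
for clade, value in sorted(support.items(), key=lambda kv: sorted(kv[0])):
    print("".join(sorted(clade)), value)
```

With this hypothetical matrix the shortest cladogram has 6 steps; A+B receives a support of 3, equal to its branch length, while C+D receives 1. A clade found in some but not all equally parsimonious cladograms would score 0, matching the zero support described in the text.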
Bremer support values have been used as the basis of two methods to
assess the overall support accorded to a cladogram. 'Total support' is simply
the sum of all the Bremer support values over the cladogram. Total support
can be rescaled meaningfully because its upper bound is the length of the
most parsimonious cladogram. This follows from the fact that the support for
a branch cannot exceed its length. The 'total support index' (ti) is thus defined
as the ratio of total support to the length of the most parsimonious
cladogram. When the strict consensus tree of the most parsimonious cladograms is
completely unresolved, then all groups have zero Bremer support and ti = 0.
In contrast, if there is no homoplasy in the data, and hence only a single most
parsimonious cladogram that is fully supported by all characters, then ti = 1.
The total support index measures the stability of the most parsimonious
cladogram(s) in terms of supported resolution, rather than in terms of the
degree of homoplasy in the data. However, a low ti value should not be taken
to imply that all the clades on a cladogram are poorly supported. Some clades
may have high Bremer support values, despite low total support.
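As a worked illustration of the total support index, suppose (hypothetically) that a most parsimonious cladogram of length L = 6 has two clades with Bremer support values b = 3 and b = 1. Writing the definition above symbolically (the symbols b_i and L are notation introduced here, not the book's):

```latex
ti = \frac{\sum_i b_i}{L} = \frac{3 + 1}{6} \approx 0.67
```

The first clade is well supported in its own right even though ti is well below 1, which is the point made in the closing sentences above.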
6.3.2 Randomization procedures

An alternative and profoundly different approach to the assessment of
support for individual clades applies data perturbation and randomization.
Three basic types of randomization methods can be recognized: permutation,
jackknifing and Monte Carlo. Jackknifing can be further divided into simple
first-order jackknifing and more complex, higher-order jackknifing. Monte
Carlo methods can be divided into those that are model-dependent and those
that are model-independent. Of the latter, the best known and most
frequently used is the bootstrap.
Bootstrap
Applied to cladistic data, the bootstrap (Fig. 6.4) randomly samples characters
with replacement to form a pseudoreplicate data set of the same dimensions
as the original. The effect is to delete some characters randomly and to
reweight others randomly, with the constraint that the sum of the weights for
all characters equals the number of characters in the matrix. A large number
of pseudoreplicates is generated, typically 1000 or more. The most
parsimonious cladograms for each pseudoreplicate are then found and the degree of
conflict among them assessed by means of a majority rule consensus tree,
which includes all those groupings that are supported by more than 50% of
the pseudoreplicates (see Chapter 7 for further details). The percentage of
most parsimonious cladograms resulting from the pseudoreplicates in which a
particular group is found might be interpreted as a confidence level
associated with that group. For example, if a group appears in 95% or more of
these cladograms, then it could be concluded that this group is supported at
the 95% level.
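The resampling scheme can be sketched for a matrix small enough to search exhaustively. This is an illustration only: the matrix is hypothetical, characters are unordered, and clade frequencies are pooled over all most parsimonious cladograms from all pseudoreplicates, following the wording above. Programs in common use may instead weight each pseudoreplicate equally.

```python
import random
from collections import Counter

# Hypothetical matrix: outgroup "Out" plus four ingroup taxa, 5 characters.
DATA = {
    "Out": "00000",
    "A":   "11100",
    "B":   "11101",
    "C":   "00011",
    "D":   "00010",
}

def insert_taxon(x, tree):
    """Yield every tree made by attaching leaf x to one branch of a rooted tree."""
    yield (x, tree)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_taxon(x, left):
            yield (new_left, right)
        for new_right in insert_taxon(x, right):
            yield (left, new_right)

def all_trees(taxa):
    """Yield every rooted bifurcating tree (nested tuples) on the given taxa."""
    if len(taxa) == 1:
        yield taxa[0]
        return
    for smaller in all_trees(taxa[1:]):
        yield from insert_taxon(taxa[0], smaller)

def fitch(tree, states):
    """Fitch pass for one unordered character: return (state set, steps)."""
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], states), fitch(tree[1], states)
    shared = ls & rs
    return (shared, lc + rc) if shared else (ls | rs, lc + rc + 1)

def clades(tree):
    """Return (leaf set, set of clades) for a rooted tree of nested tuples."""
    if not isinstance(tree, tuple):
        return frozenset([tree]), set()
    (ll, lc), (rl, rc) = clades(tree[0]), clades(tree[1])
    leaves = ll | rl
    return leaves, lc | rc | {leaves}

def mp_clade_sets(matrix, outgroup):
    """Clade sets of every most parsimonious cladogram (exhaustive search)."""
    ingroup = [t for t in matrix if t != outgroup]
    n_chars = len(matrix[outgroup])
    scored = []
    for t in all_trees(ingroup):
        length = sum(
            fitch((outgroup, t), {x: matrix[x][i] for x in matrix})[1]
            for i in range(n_chars)
        )
        scored.append((length, clades(t)[1] - {frozenset(ingroup)}))
    shortest = min(length for length, _ in scored)
    return [groups for length, groups in scored if length == shortest]

def bootstrap_support(matrix, outgroup, n_reps=200, seed=0):
    """Fraction of MP cladograms, pooled over pseudoreplicates, containing each clade."""
    rng = random.Random(seed)
    n_chars = len(matrix[outgroup])
    counts, n_trees = Counter(), 0
    for _ in range(n_reps):
        cols = [rng.randrange(n_chars) for _ in range(n_chars)]  # with replacement
        pseudo = {t: "".join(row[i] for i in cols) for t, row in matrix.items()}
        for groups in mp_clade_sets(pseudo, outgroup):
            counts.update(groups)
            n_trees += 1
    return {clade: n / n_trees for clade, n in counts.items()}

for clade, freq in sorted(bootstrap_support(DATA, "Out").items(),
                          key=lambda kv: -kv[1]):
    print("".join(sorted(clade)), round(freq, 2))
```

A pseudoreplicate that happens to contain no informative characters returns every cladogram as equally parsimonious, and under pooling such a replicate contributes many cladograms to the tally. That is one reason the weighting convention matters.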
However, there are several serious limitations to this use of the bootstrap.
First, for such confidence limits to be valid, the groups for which the
monophyly is to be tested should be specified in advance. If we cannot specify
any such groups, then the number of potential groups is so large that in order
to maintain an overall type I error rate of say 0.05, such an exceedingly low α
level would be required that the confidence interval would be vastly
inclusive (Swofford and Olsen 1990). Second, the confidence intervals
obtained through resampling methods are only approximate unless the original
sample size, that is, the number of characters in the data matrix, is large. This
is 'large' in the statistical sense (more than 1000 and preferably 10000) and
most data sets do not begin to approach this number of characters. Even
molecular data, which can comprise several thousand base pairs, do not
contain this number of informative sites. It has been argued that
bootstrapping is not affected by the inclusion of uninformative characters (Harshman
1994) but this has been shown to be incorrect (Carpenter 1996). Thus, few, if
any, data sets meet the statistical requirements of the bootstrap.

[Fig. 6.4, panels: (a) original data matrix (characters 1-10) with its most
parsimonious cladogram for taxa A-E; (b) pseudoreplicate data matrix, with
characters sampled in the order 8, 9, 4, 8, 5, 6, 4, 1, 2, 10, and its cladogram;
(c) majority-rule consensus cladogram for taxa A-E with bootstrap percentages
(values not legible).]

(a) Original data        (b) Pseudoreplicate
Taxon A  0000010010      Taxon A  0100010000
Taxon B  1100000000      Taxon B  0000000110
Taxon C  1111111201      Taxon C  2012111111
Taxon D  1111011201      Taxon D  2012011111
Taxon E  1100101110      Taxon E  1101100110

Fig. 6.4 Bootstrap analysis. (a) The most parsimonious cladogram is determined for
the original data set. (b) Characters are then sampled randomly with replacement to
produce a pseudoreplicate data set of the same size as the original. Some characters
(e.g. #8) will be represented more than once, while others (e.g. #3) will not be
included at all. The most parsimonious cladograms for the pseudoreplicate data set
are then constructed. (c) This process is repeated a large number of times (e.g. 1000)
and the results summarized by means of a majority-rule consensus tree. Support for
a group is then interpreted as the percentage of most parsimonious cladograms
resulting from the pseudoreplicates in which the group is found.
An alternative view of the bootstrap is that it indicates how the support for
the various groups on the most parsimonious cladogram is distributed among
characters. The expectation is that clades supported by a large number of
characters will be recovered frequently and receive high scores on a majority
rule consensus tree. In contrast, clades that are supported by only a few
characters, especially if these are homoplastic, are not expected to be
recovered very often, if at all. However, a clade may be unambiguously
supported by a single character on the most parsimonious cladogram yet fail
to be recovered by a bootstrap analysis, due to the random nature of the
resampling. Thus, groups can be excluded from the majority rule consensus
tree even though they are uncontradicted on the most parsimonious
cladogram. The bootstrap thus provides only a one-sided test of a cladogram.
Groups that are recovered are supported by the data, but groups that are not
recovered cannot be taken as rejected.
There is, however, a more serious and fundamental problem with the
bootstrap. This is the requirement that the characters in the original data
matrix should represent a random sample of all possible characters. However,
in systematic studies, characters are not randomly sampled from independent,
identically distributed populations. Rather, they are carefully selected and
filtered with the aim of best resolving the relationships of the taxa under
study. Such systematic bias is not considered to be a problem by advocates of
the bootstrap, who assert that attempts by systematists to try to ensure that
their characters are independent and uncorrelated are sufficient. However,
subjectively trying to ensure that characters are independent of one another
is simply inadequate. Unless the characters in a data set do accurately reflect
the larger underlying distribution of all possible characters, then bootstrap
confidence intervals may be very poorly estimated. There are also many other
factors that might lead to either overestimates or underestimates of
confidence, including size of the data set, efficiency of any heuristic search
procedures that are employed, cladogram topology, and differential and
uneven rates of character change among branches.
Thus, at best, the effects of these limitations mean that we would be unwise
to treat bootstrap confidence intervals as absolutes, although they may serve
as approximate guides to the support afforded to groups by the data. At
worst, the application of the bootstrap in cladistic studies can be considered
to lack rigorous justification.
Jackknife
In contrast to the bootstrap, jackknife sampling is applied without
replacement and hence the pseudoreplicate data sets are smaller than the original.
Jackknifing aims to achieve better variance estimates than might otherwise be
possible from small samples. In first-order jackknifing, pseudoreplicates are
constructed by randomly deleting one observation (taxon or character) from
the data set. Hence, for T observations, T pseudoreplicates are
obtained from the original sample. The
variances of the T pseudoreplicates are then averaged to give the estimate of
the parametric variance.
First-order jackknifing of taxa was introduced into systematics by Lanyon
(1985). If a data set contains no homoplasy, then deletion of one taxon will
have no effect on the topology of the most parsimonious cladogram. This is
because the information contained in the synapomorphies of that taxon is
inherent in those taxa that occupy more distal positions on the cladogram.
The most parsimonious cladogram obtained from analysis of a sample of
T - 1 taxa will thus be identical to that obtained from analysis of the
complete data set, but with the terminal branch leading to the deleted taxon
pruned out. However, if there is conflict in the data, then analysis of
jackknifed data sets need not produce the same topology as that derived from
the complete data set. Any conflict is revealed by summarizing all the most
parsimonious cladograms derived from all possible first-order
pseudoreplicates by means of a strict consensus tree (that
is, a consensus tree that contains only those components common to all of the
fundamental cladograms; see Chapter 7 for further explanation). However,
the normal strict consensus method discards those taxa not present in all the
fundamental cladograms and thus applying this procedure to cladograms
produced from jackknife pseudoreplicates would produce a consensus tree
that contained only the outgroup. Lanyon (1985) proposed a modification to
allow for the deleted taxa, called the jackknife strict consensus tree, which
contains those nodes that are shared by or consistent with all of the
pseudoreplicate cladograms.
However, Lanyon's advocacy of a strict consensus tree suffers from the
drawback that strict consensus is indifferent to the proportion of most
parsimonious cladograms in which a clade is supported. Thus, it takes only
one cladogram from one pseudoreplicate that wildly disagrees with the
remainder in the position of one taxon to collapse the entire consensus into a
bush. To circumvent this problem, Siddall (1996) proposed the jackknife
monophyly index (JMI), which is defined as:
$$\mathrm{JMI}_c = \frac{\sum_{t=1}^{T} p(c_t)}{T}$$
where T is the number of ingroup taxa and p(c_t) is the proportion of the
most parsimonious cladograms of pseudoreplicate t in which clade c is
supported. Because the JMI is monophyly-dependent (i.e. it is calculated
using rooted cladograms), it is inappropriate to jackknife the outgroup taxa
(analogous arguments have been put forward with regard to the PTP test).
Siddall advised against using JMI values as the basis for accepting or rejecting
individual clades. While the JMI might, in some sense, indicate the amount of
statistical support there is for a clade, it certainly cannot be used to argue the
converse, that is, the degree of support against a clade. The JMI does show
which groupings on a cladogram are more stable and which are less stable.
The calculations can also help to identify 'critical' and 'problematic' taxa.
Critical taxa are those whose deletion results in a great increase in the
number of most parsimonious cladograms. In contrast, the deletion of
problematic taxa stabilizes the results and reduces the number of most
parsimonious cladograms.
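The arithmetic of the JMI can be illustrated with a minimal sketch. It assumes that the index is read as the simple mean of the proportions p(c_t) over the T first-order pseudoreplicates, and that the parsimony searches themselves have been carried out elsewhere. The function name and the five proportions are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def jackknife_monophyly_index(proportions):
    """Mean, over the T pseudoreplicates, of the proportion of most
    parsimonious cladograms in which the clade is supported.

    Assumption: one proportion per deleted ingroup taxon.
    """
    if not proportions:
        raise ValueError("need one proportion per deleted ingroup taxon")
    for p in proportions:
        if not 0 <= p <= 1:
            raise ValueError("each proportion must lie between 0 and 1")
    return sum(proportions) / len(proportions)


# Hypothetical results for T = 5 ingroup taxa: the clade is found in all
# cladograms of three pseudoreplicates, in half of the cladograms of one,
# and in none of the cladograms of the last.
p_c = [Fraction(1), Fraction(1), Fraction(1, 2), Fraction(1), Fraction(0)]
jmi = jackknife_monophyly_index(p_c)
print(jmi)         # 7/10
print(float(jmi))  # 0.7
```

In this hypothetical case the taxon whose deletion gives a proportion of 0 would be worth examining further, since a single pseudoreplicate of this kind would be enough to collapse the clade in a jackknife strict consensus tree.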
Higher-order jackknifing removes subsets of n observations to give
pseudoreplicates of size T - n. As there is no justification for stopping at a
particular subset size, the removal of all possible subset sizes from 2 to
(T - 1) should be investigated. However, the number of possible ways in
which any number of taxa (up to T - 1) can be removed from T taxa is
2^T - 2. For more than about ten taxa, this results in an impractically large
number of analyses to perform. Random sampling of all possible
combinations might provide a suitable heuristic solution, which could be performed by
randomly choosing a subset size and then randomly deleting this number of
taxa. This procedure is identical to bootstrapping taxa, because this latter
method randomly deletes some taxa and randomly replicates others. As the
latter are duplicates identical in character information, they would act as one
terminal taxon in a parsimony analysis. The overall effect would thus be the
same as randomly deleting a random number of taxa, i.e. a higher-order
jackknife. Siddall (1996) rather irreverently called this method the 'jackboot'.
Application of the jackboot would only be appropriate if the observations
being jackknifed could be assumed to be drawn randomly from some larger
sample, an assumption that might prove difficult to justify for taxa.
Furthermore, the effects of higher-order subset deletions might be expected to be
more severe than those of lower-order subsets. How these effects would be
weighted differentially remains unclear.
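The growth in the number of possible taxon deletions, and the random heuristic described above, can be sketched as follows. The function names and the eight taxon labels are hypothetical, and the fixed seed is used only so that the sketch is repeatable.

```python
import random
from math import comb


def count_taxon_deletions(n_taxa):
    """Ways of deleting at least one, but not all, of n_taxa taxa."""
    return sum(comb(n_taxa, k) for k in range(1, n_taxa))


def random_deletion(taxa, rng):
    """Choose a subset size at random, then delete that many taxa at random."""
    k = rng.randint(1, len(taxa) - 1)
    deleted = set(rng.sample(taxa, k))
    return [t for t in taxa if t not in deleted]


for n in (5, 10, 20):
    # The sum of binomial coefficients agrees with the closed form 2^T - 2.
    assert count_taxon_deletions(n) == 2 ** n - 2
    print(n, count_taxon_deletions(n))
# 5 30
# 10 1022
# 20 1048574

rng = random.Random(1)  # fixed seed for repeatability only
print(random_deletion(list("ABCDEFGH"), rng))
```

With ten taxa there are already more than a thousand reduced data sets, each requiring its own parsimony analysis, which is why exhaustive higher-order jackknifing rapidly becomes impractical.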
First-order jackknifing of characters was first applied in a systematic
context by Mueller & Ayala (1982), who advocated it for estimating the
sampling variance of Nei's genetic distance. However, first-order jackknifing
is of limited use when applied to cladistic characters because it amounts to
little more than asking if there is more than one apomorphy supporting a
clade. In a homoplasy-free cladogram, all clades with only a single supporting
apomorphy would not be recovered, while all other clades would receive a
perfect score regardless of the number of apomorphies supporting them.
Likewise, higher-order jackknifing simply extends this problem and
ultimately, only those clades supported by at least as many characters as there
are taxa are guaranteed to be recovered.
Clade stability index

Davis (1993) proposed a method for estimating the stability of clades that is
similar to higher-order jackknifing. However, rather than using character
deletion as a means of estimating variance, Davis used it to identify
those characters or combinations of characters that are critical to the
maintenance of clades on a cladogram. Characters are successively deleted,
first individually then as increasingly larger subsets until the clade under
study is lost from the strict consensus tree of most parsimonious cladograms
resulting from the analysis of the reduced data sets. The clade stability index
(CSI) is defined as the ratio of the minimum number of character deletions
required to lose a clade to the total number of informative characters in the
data set. Thus, a clade that is lost after the removal of two characters from an
informative data set of 15 characters would receive a CSI of 2/15 = 0.13.
The CSI suffers from the same computational difficulties as higher-order
jackknifing as the number of characters increases. For large data sets, Davis
suggested it would be sufficient to analyse all reduced data sets with one and
all combinations of two characters removed, together with 500 reduced data
sets each for subsets of three to ten characters. For his data set of 74
characters, this would entail analysing 6775 data sets. Under such a strategy,
the CSI would be a maximum value because an unsampled combination of
character deletions may exist that would prevent a clade from being resolved.
The value of the CSI is also strongly influenced by the size of the data set.
For a given amount of apomorphic support for a clade, the larger the data
set, the smaller the CSI, a deficiency that calls into question the suitability of
the CSI as a general measure of clade support. Furthermore, expressing clade
support in terms of characters works only with binary and non-additively
coded multistate characters. For additive binary characters, it is possible for a
clade to be supported by more than one character state change in a given
character. Deletion of a single character may then lead to the loss of more
than one apomorphic change supporting a clade. For these reasons, and
because it is also easier to calculate, Bremer support is to be preferred over the
CSI as a means of estimating clade support.
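The figures quoted above can be checked with a short sketch. It reproduces the worked CSI of 2/15, the count of 6775 reduced data sets for a 74-character matrix, and the dependence of the CSI on data set size. The function names and the second, 150-character data set are hypothetical.

```python
from fractions import Fraction
from math import comb


def clade_stability_index(min_deletions, n_informative):
    """Minimum deletions needed to lose the clade / informative characters."""
    return Fraction(min_deletions, n_informative)


def reduced_data_sets(n_chars, per_size=500, sizes=range(3, 11)):
    """All single deletions, all pairs, then a fixed sample of 500 for each
    subset size from three to ten characters."""
    return n_chars + comb(n_chars, 2) + per_size * len(sizes)


print(round(float(clade_stability_index(2, 15)), 2))  # 0.13
print(reduced_data_sets(74))                           # 6775

# Same apomorphic support (two deletions lose the clade), larger data set:
# the index shrinks although the support for the clade has not changed.
print(round(float(clade_stability_index(2, 150)), 3))  # 0.013
```

The count breaks down as 74 single deletions, 2701 pairs and 8 x 500 = 4000 sampled larger subsets.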
Topology-dependent permutation tail probability (T-PTP)

The permutation methods of the PTP test have been extended (Faith 1991) to
provide both a priori and a posteriori tests of the monophyly or
non-monophyly of a selected clade on a cladogram: the 'topology-dependent
cladistic permutation tail probability' (T-PTP) tests. Evidence for the
monophyly of a clade is considered strong if the most parsimonious cladogram in
which the clade is monophyletic is much shorter than the most parsimonious
cladogram obtainable under the constraint that the clade is not monophyletic
(i.e. paraphyletic or polyphyletic). The strength of the evidence is assessed by
comparing the length difference under the two topological constraints for
real data against that obtained from permuted data. As in the PTP test, the
significance value is calculated as the proportion of occasions on which
the observed length difference is equalled or exceeded by the value from the
permuted data.
A priori application of the T-PTP test requires that the monophyletic group
to be tested be hypothesized before the cladistic analysis is carried out and
asks whether a level of support for this group can be found that represents a
significant departure from randomness. In contrast, the a posteriori T-PTP
test asks whether the support for a given clade found as the result of a
cladistic analysis could have arisen by chance. The test parameter is the same
as that in the a priori T-PTP test, that is, the length difference between the
most parsimonious cladograms obtainable under the constraints of
monophyly and non-monophyly. However, for each permuted data set, the length
difference is now calculated as the largest value that could be achieved for
any monophyletic group of the same size as the clade under consideration.
The T-PTP value is then estimated as the proportion of all data sets (real plus
permuted) in which some monophyletic group can be found with a difference
value at least as large as that found for the clade in question.
T-PTP can be extended to include tests for the non-monophyly of a group
or, indeed, any conceivable set of compatible topological constraints. One
such set comprises all the clades of a given cladogram and is referred to as
the 'all-groups' form of the T-PTP, or T-PTP(AG). In this version, the test
parameter is the difference in length between the most parsimonious clado-
gram and the minimum length cladogram obtainable that includes none of
the clades of the original topology. As such, it is actually a measure of
support for an entire cladogram, rather than individual components thereof.
However, as a derivative of the PTP test, T-PTP tests suffer from all the
theoretical and practical defects of that test. Hence, an a posteriori T-PTP
test cannot provide a criterion for the acceptance or rejection of a clade, but
only give a relative measure of confidence in that clade.
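The final step of a T-PTP test, converting length differences into a tail probability, can be sketched as below. The sketch assumes that the constrained parsimony searches have already been run and that the real data set is counted along with the permuted ones, as in the 'real plus permuted' description above. The function name and all of the length differences are hypothetical.

```python
def t_ptp(observed_diff, permuted_diffs):
    """Proportion of data sets (real plus permuted) whose length difference
    is at least as large as that observed for the real data."""
    as_large = 1 + sum(1 for d in permuted_diffs if d >= observed_diff)
    return as_large / (1 + len(permuted_diffs))


# Hypothetical values: the real data favour monophyly by 6 steps, and nine
# permuted data sets give the differences listed below.
observed = 6
permuted = [0, 1, 3, 2, 6, 1, 0, 4, 2]
print(t_ptp(observed, permuted))  # 0.2
```

Because the real data set is always included, the smallest value obtainable from N permutations is 1/(N + 1), so the number of permutations sets a floor on the reported probability.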
6.4 SUMMARY
In summary, no method proposed so far can serve as more than a useful
general guide to the degree of confidence that we may place in the results of
a cladistic analysis. Certainly, none of the statistics discussed above can place
statistically meaningful confidence limits on to cladograms. Furthermore, if
characters are intrinsically hierarchical, then any method that uses a random
model as the basis for a null hypothesis is entirely inappropriate. However,
while the most parsimonious cladogram remains the best estimate of the
relationships among the study taxa based upon the data to hand, there is a
need to develop rigorous measures of confidence appropriate to the
cladograms, in order that we avoid overinterpreting the significance of our results.
But these measures of confidence in a group should be based upon the
probability that the characters considered to be synapomorphies have been
correctly interpreted (PErt b ~ 1989).
6.5 CHAPTER SUMMARY
1. Many methods have been proposed that can purportedly be used to
assign degrees of support or confidence to the results of cladistic analysis.
These can be divided into two types. The first are applied to the
cladogram as a whole and aim to answer the question of whether the data
contain 'significant' cladistic structure and thus the resultant cladogram is
not just the product of chance. The second group of methods examine the
support afforded to individual groups on a cladogram, in order to
determine which are well supported by data and which are only weakly
supported.
2. Methods aimed at assessing the support of whole cladograms all use
the same general principle. The real data set is repeatedly perturbed
according to a set of rules to produce a large number of data sets of
'phylogenetically uninformative' data. The lengths of the most
parsimonious cladograms derived from these contrived data sets are then
compared with the length of the most parsimonious cladograms for the real
data set, with the expectation that the latter will be substantially shorter
than any of the former. These methods include data decisiveness (DD),
distribution of cladogram lengths (DCL), and the permutation tail
probability (PTP) test.
3. DD compares the real data with a matrix of completely undecisive data,
that is, one in which all possible informative characters occur in equal
numbers. If the DD value is small, then there are only very weak reasons
for preferring the most parsimonious cladogram derived from the real
data. In contrast, a high DD value does not signify high information
content and certainly cannot be used to assign a confidence level to the
most parsimonious cladogram for the real data.
4. The use of DCL as an indicator of phylogenetic signal is based on the
premise that when the DCL is nearly symmetrical, many cladograms will
be only a few steps longer than the most parsimonious cladogram and
thus the phylogenetic signal is weak. Conversely, if the DCL is strongly
negatively or left-skewed, there will be very few cladograms that are just
slightly longer than the most parsimonious solution and thus the
phylogenetic signal is strong. However, DCL skewness is a poor measure of
phylogenetic support because it is strongly dependent upon the
probabilities of character change along each branch of a cladogram and is also
insensitive to the number of characters (and thus to the degree to which
conclusions are corroborated).
5. In the PTP test, the data are perturbed in such a way as to break any
cladistic covariation among the characters. This is achieved by randomly
reallocating states within characters among taxa while maintaining the
original proportions of each state. Characters are treated independently
and the states in the outgroup taxa are held constant and not permuted.
However, the null hypothesis of randomly covarying characters is
contrary to the basis of cladistics. Cladistic characters are intrinsically
hierarchical and cladistic analysis aims to summarize this hierarchy as
efficiently as possible.
6. The simplest measure of support for individual clades is branch length.
However, homoplasy makes the objective interpretation of branch length
as support difficult. Bremer support aims to circumvent this problem by
assessing the number of extra steps that are required before a clade is
lost from the strict consensus tree of near-minimum length cladograms.
When there is no homoplasy, Bremer support is equal to branch length,
otherwise the support for a clade is reduced to the degree that there are
alternative equally parsimonious groups or character optimizations.
7. There are three basic types of approaches that use randomization
procedures to assess support for individual clades: Monte Carlo methods
(including bootstrapping), jackknifing and permutation procedures.
8. Monte Carlo methods can be model-dependent or model-independent.
Of the latter, the best known is the bootstrap. A large number of
pseudoreplicate data sets of the same size as the original are created by
randomly sampling characters with replacement. The effect is to delete
some characters randomly and to reweight the rest randomly. The most
parsimonious cladograms for these pseudoreplicates are calculated and a
majority rule consensus tree used to assess the degree of conflict among
them. The percentage of pseudoreplicate data sets that recover a given
group is then interpreted as a measure of confidence in that group.
However, the bootstrap makes several assumptions that severely limit its
usefulness in this regard. The most serious of these is that the characters
in the data set are a random sample from all possible characters. In
practice, systematists never sample characters at random but carefully
select them, thus violating the theoretical underpinning of the bootstrap
as a measure of clade confidence.
9. The jackknife creates pseudoreplicate data sets that are smaller than the
original by sampling without replacement. First-order jackknifing
removes only one character or taxon; higher-order jackknifing removes two
or more. Jackknifing of taxa can help us to identify 'critical' and
'problematic' taxa. Deletion of critical taxa results in a great increase in the
number of equally most parsimonious solutions, while the deletion of
problematic taxa has the converse effect.
10. The clade stability index is defined as the ratio of the minimum number
of character deletions required to lose a clade on a cladogram to the total
number of informative characters in the data set. However, it suffers
from major computational difficulties and the deficiency that for a given
amount of apomorphic support for a clade, the larger the data set, the
smaller the CSI. For these reasons, the use of Bremer support is
preferred to the CSI.
11. The PTP test is applied to individual clades as the 'topology-dependent
cladistic permutation tail probability' (T-PTP) test. However, as a
derivative of the PTP test, the T-PTP test suffers from all the same flaws and
failings and is of very limited utility.
12. None of the methods advanced so far can place meaningful confidence
limits onto cladograms, although data decisiveness and Bremer support
may serve as useful general guides. If cladistic characters are intrinsically
hierarchical, then all methods using a random model as a null hypothesis
are fundamentally flawed.
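The three randomization schemes summarized in points 5, 8 and 9 differ only in how the data matrix is perturbed before each parsimony analysis. A minimal sketch of the perturbation step alone is given below. The four-taxon matrix, the taxon labels and the function names are hypothetical, and no parsimony search is attempted.

```python
import random

# Hypothetical matrix: rows are taxa, columns are binary characters.
MATRIX = {
    "out": [0, 0, 0, 0],
    "A":   [1, 0, 0, 1],
    "B":   [1, 1, 0, 0],
    "C":   [1, 1, 1, 0],
}


def n_characters(matrix):
    return len(next(iter(matrix.values())))


def select_columns(matrix, cols):
    """New matrix holding the listed character columns, in the order given."""
    return {taxon: [row[c] for c in cols] for taxon, row in matrix.items()}


def bootstrap(matrix, rng):
    """Sample characters with replacement; same size as the original."""
    n = n_characters(matrix)
    return select_columns(matrix, [rng.randrange(n) for _ in range(n)])


def jackknife_characters(matrix):
    """All first-order pseudoreplicates, each lacking exactly one character."""
    n = n_characters(matrix)
    return [select_columns(matrix, [c for c in range(n) if c != drop])
            for drop in range(n)]


def ptp_permute(matrix, outgroup, rng):
    """Shuffle states among ingroup taxa, one character at a time, keeping
    the outgroup rows and the proportions of each state unchanged."""
    ingroup = [t for t in matrix if t not in outgroup]
    permuted = {t: list(row) for t, row in matrix.items()}
    for c in range(n_characters(matrix)):
        states = [matrix[t][c] for t in ingroup]
        rng.shuffle(states)
        for taxon, state in zip(ingroup, states):
            permuted[taxon][c] = state
    return permuted


rng = random.Random(0)  # fixed seed for repeatability only
print(bootstrap(MATRIX, rng))
print(len(jackknife_characters(MATRIX)))  # 4
print(ptp_permute(MATRIX, {"out"}, rng))
```

In the bootstrap some columns appear more than once and others not at all, which is the random deletion and reweighting of characters referred to in point 8.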
7.
Consensus trees
7.1 INTRODUCTION
In a cladistic analysis of a particular group of taxa, data are frequently
obtained from several sources. In biogeographical and co-evolutionary
studies, information may be available for two or more groups of taxa. Systematists
generally hold that because different character sets share common
evolutionary histories, then reliable phylogenetic methods should be able to recover a
common resolved pattern. This reasoning led to the idea that congruence
among data sets and fundamental cladograms might provide the strongest
evidence that phylogenetic reconstruction is accurate (Penny and Hendy
1986, Swofford 1990). In practice, however, analysis of different data sets can
produce cladograms with everything from minor discrepancies in the
placement of one taxon to completely different overall topologies.
When confronted with differences between two or more cladograms, and
especially when they are very disparate, we are faced with a number of
problems. For example, for different data sets and the same group
of organisms, incongruence can be explained by suggesting that at least one
of the data sets is wrong or that each data set is providing only part of the
'correct' systematic signal. The issues of debate concern the relative strengths
of particular data sets, particularly of morphological and molecular data sets,
and the reliability or futility of different analytical methods. However, the
underlying issue remains whether there are any methods that can be used to
differentiate unavoidable incongruence from 'spurious' differences due to
sampling error.
This chapter reviews the more accessible consensus methods currently used
to determine congruence and incongruence among different data sets, but
avoids the issue of 'total evidence' versus 'consensus' (reviewed in Chapter 8).
Since the advent of computer packages for cladistic analysis, there have been
considerable developments both in methods of cladogram comparison and in
statistics for assessing levels of agreement in seemingly very different data
sets and topologies.
Consensus methods are a convenient means of summarizing agreement
and disagreement, or congruence and incongruence, between two or more
cladograms. Common to all methods of consensus analysis is the desire to
construct a tree from the non-contradictory components found among the set
of cladograms generated by an analysis. Consensus trees can be
considered to be indirect methods for resolving character conflict in the
construction of a general classification. They reduce the number of
fundamental cladograms produced by parsimony analysis to one tree showing their
common components (i.e. non-trivial or informative groups sensu Nelson and
Platnick 1981).
Cladograms generated from a data set in a cladistic analysis are called
fundamental because they summarize its hierarchical information. In
contrast, consensus trees are derivative, being constructed from and representing
a set, or sets, of cladograms. Consequently, consensus analysis almost
invariably produces a tree that would not be supported as most parsimonious by
the original data and in some cases may even contain components not found
in any of the fundamental cladograms. For this reason, some authors (e.g.
Miyamoto 1985, Carpenter 1988) have argued against their use on the
grounds that a consensus tree rarely summarizes a data set as efficiently as
any one of the fundamental cladograms from which it was constructed.
Nevertheless, consensus trees have their uses in allowing investigation of data
concordance amongst cladograms generated from different data sets. This is
important for investigating 'difficult' taxa that occur in different positions and
thus produce fundamental cladograms with different topologies. It is
important also for finding intersections amongst cladograms when comparing
'hosts' and 'associates' (Page 1993b), for example, between gene trees and
species cladograms, hosts and their parasites in co-evolutionary studies, and
area cladograms and biological cladograms in historical biogeography
(Humphries and Parenti in press). Others argue that because cladistic
analyses of almost all data sets produce multiple equally most parsimonious
cladograms, then consensus trees should be considered as essential in
classification because they provide the only reasonable summary of the information
that can be achieved (Anderberg and Tehler 1990, Bremer 1990).
Consensus analysis has become a minor growth industry in systematics and
many methods have been described. This chapter describes only the seven
most commonly encountered in the cladistic literature: strict, majority-rule,
combinable components (or semi-strict), Nelson, Adams, agreement subtrees
(or common pruned trees), and median consensus trees.
Strict and majority-rule consensus methods are based on simple counts of
the frequency of informative groups (components) in the set of cladograms
being compared. Strict consensus trees contain only those components
common to all of the fundamental cladograms. Combinable components or
semi-strict consensus trees (Bremer 1990) include all of the components
found in strict trees but in addition include those components that are
uncontradicted by less resolved components within the set of fundamental
cladograms. Majority-rule consensus trees contain all of those clusters found
in more than 50% of these cladograms. The cut-off value can be set to a
higher value, say 75%, but then the result is not strictly a majority-rule
consensus tree. Median consensus trees are related to majority-rule
consensus trees. This method uses a metric that measures topological
disagreement between any pair of cladograms (Barthelemy and Monjardet 1981,
Barthelemy and McMorris 1986). Nelson consensus trees comprise the cliques
of mutually compatible components that are most replicated in the
fundamental cladograms. Adams trees contain all intersecting sets of taxa common
to all cladograms. In the greatest agreement subtree method, the least
number of branches are 'pruned' from the fundamental cladograms to
produce the largest subtree with greatest agreement. In the trivial
comparison of two cladograms, strict and majority-rule consensus trees are identical,
as are combinable components and Nelson consensus trees.
Nixon and Carpenter (1996b) argued that if the goal of consensus analysis
is to summarize agreement in grouping among a set of fundamental
cladograms, then only the strict consensus tree fulfils that goal. All the other
methods listed above may yield trees with groups that are not fully supported
by all the data or are supported only ambiguously. These Nixon and
Carpenter called 'compromise trees', reserving the term consensus tree for
the strict consensus method only.
However, while we recognize the fundamental distinction between strict
consensus and other methods, we consider that there are equally
fundamental differences among the other methods, and that the dichotomy proposed by
Nixon and Carpenter is unnecessarily doctrinaire. We therefore retain the
term consensus for all methods that aim to summarize the common
information contained in a set of cladograms according to some specific criterion.
Different consensus methods are suited to different tasks, although the
literature is bewildering in that methods are frequently applied
inappropriately and the terminology has become somewhat confused (Nixon and
Carpenter 1996). Strict consensus is useful for determining common
components of all fundamental cladograms, while combinable components
consensus highlights resolved components amongst a profile of cladograms some of
which contain unresolved components. Majority-rule consensus is most
frequently used to summarize the results of bootstrap analyses (see Chapter 6).
Adams consensus trees are used mostly to determine the degree of preserved
structure in cladograms. They have their greatest value in comparing
seemingly different topologies due to the erratic performance of 'rogue' taxa that
appear in widely different positions on cladograms. Largest common pruned
trees can be useful for determining incongruence in cladograms when only a
few taxa are responsible for the different topologies.
7.2 STRICT CONSENSUS TREES
The most conservative consensus tree is a strict consensus tree. First used by
Schuh and Polhemus (1980), the 'strict tree' was defined by Sokal and Rohlf
(1981) as the tree containing only those groups that occur in all
rival cladograms. These were also called 'Nelson consensus trees' by Schuh
and Farris (1981). As Nelson's (1979) consensus method of adding together
replicating and non-replicating components is different from strict consensus,
the distinction made by Page (1989) is maintained here (see below).

Fig. 7.1 Cladograms of (a) butterflies, (b) birds and (c) bats, calculated using
COMPONENT 2.0 for Windows (from Bremer 1990). Numbers refer to components
in Table 7.1.

A strict consensus tree is derived by combining only those components that
appear in all members of a set of fundamental cladograms. Consider three
cladograms for butterflies, birds and bats (Fig. 7.1). In a fully resolved
cladogram, there are n - 2 informative components. Thus, for a cladogram of
9 taxa, there are 7 informative components. However, among the three
cladograms in Fig. 7.1, we actually find 12 different informative components
(Table 7.1), indicating that there is conflict among the cladograms. Of these
12, only component 10, comprising the group GHI, occurs in all three
fundamental cladograms. The strict consensus tree thus contains only this
component (Fig. 7.2a). The rationale behind strict consensus is that the data
are only consistent for this one component. In this particular example, the
conservativeness of strict consensus means that one is left with depressingly
little resolution, although this is not always the case.
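Because strict and majority-rule consensus rest on simple counts, the selection of components can be sketched directly from the occurrence column of Table 7.1. The counts below are taken from that table (component 8 is given as 1, following the table caption); the function names and the cut-off parameter are hypothetical.

```python
# Component number -> number of the three fundamental cladograms
# (butterflies, birds, bats) in which it occurs, from Table 7.1.
OCCURRENCES = {1: 1, 2: 2, 3: 1, 4: 2, 5: 1, 6: 1,
               7: 2, 8: 1, 9: 2, 10: 3, 11: 2, 12: 1}
N_CLADOGRAMS = 3


def informative_components(n_taxa):
    """Number of informative components in a fully resolved cladogram."""
    return n_taxa - 2


def strict(occurrences, n_cladograms):
    """Components present in every fundamental cladogram."""
    return sorted(c for c, k in occurrences.items() if k == n_cladograms)


def majority_rule(occurrences, n_cladograms, cutoff=0.5):
    """Components present in more than the cut-off proportion of cladograms."""
    return sorted(c for c, k in occurrences.items()
                  if k > cutoff * n_cladograms)


print(informative_components(9))                  # 7
print(strict(OCCURRENCES, N_CLADOGRAMS))          # [10]
print(majority_rule(OCCURRENCES, N_CLADOGRAMS))   # [2, 4, 7, 9, 10, 11]
```

With only three cladograms, any component found in two of them exceeds the 50% cut-off, so the majority-rule selection is considerably less conservative than the single component retained by strict consensus.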
We noted in Chapter 4 that both PAUP and Hennig86 can generate
spurious cladograms that are due solely to ambiguous character optimization.
The length of the strict consensus tree can be used to determine whether all
the apparent resolution found in a set of fundamental cladograms is due to
such ambiguity. Suppose we have two cladograms that contain two fully
resolved but conflicting groups, each unambiguously supported by data.
Because the two groups are not common to both cladograms, neither will
appear in the strict consensus tree. Instead, their component taxa will form a
polytomy and the strict consensus tree will be longer than the two
fundamental cladograms, because multiple origins of the characters supporting each
group will have to be postulated. However, if the difference in topology
between the two conflicting groups is due to ambiguous character
optimization, then collapsing the zero-length branches will have no effect on length,
and the strict consensus tree will be the same length as the fundamental
cladograms. In other words, if we find that a strict consensus tree is the same
length as its fundamental cladograms, then we can conclude that all the extra
resolution present in those cladograms is spurious and due to ambiguous
character optimization. In these circumstances, the strict consensus tree is
the preferred topology because it is both of minimal length and all its
resolved nodes are supported by data; that is, it is the strictly supported
cladogram (Nixon and Carpenter 1996b).

Table 7.1 Components of the butterflies, birds and bats cladograms, analysed using
COMPONENT 2.0 for Windows (Page 1993b). Rows (integers) are the 12
components found in these three cladograms; columns (letters) are the taxa, as labelled in
Fig. 7.1. The composition of each component is indicated by the asterisks and the
number of cladograms in which it occurs is given in the last column. For example,
component 10, comprising taxa G, H and I, is present in all three of the cladograms,
while component 8, comprising taxa B, E and F, occurs in only a single cladogram
(bats). See text for further explanation.

Component   Taxa (A-I)         Occurrences
 1          * *                1
 2          * *                2
 3          * * *              1
 4          * * *              2
 5          * * * *            1
 6          * *                1
 7          * *                2
 8          * * *              1
 9          * *                2
10          * * *              3
11          * * * * *          2
12          * * * * * * * *    1

7.3 COMBINABLE COMPONENTS OR SEMI-STRICT CONSENSUS

Fig. 7.2 Consensus trees of the butterflies, birds and bats cladograms shown in
Fig. 7.1. (a) Strict consensus tree. (b) Majority-rule consensus tree/median
consensus tree. (c) Combinable components or semi-strict consensus tree.
(d) Nelson consensus tree. (e) Adams consensus tree. (f) Greatest agreement
subtree. Consensus trees were calculated using COMPONENT 2.0 for Windows.
Numbers refer to components in Table 7.1.

Bremer's (1990) paper drew attention to the distinction between strict
consensus and combinable components consensus and showed that components
that were not replicated in all fundamental cladograms, but were nevertheless
non-conflicting, could also be combined, rather than collapsed into a
polytomy, as is done in strict consensus. Such non-replicated but
non-conflicting components can occur when at least one of the fundamental
cladograms contains a polytomy. A component that is one possible resolution
of a polytomy cannot conflict with the polytomy itself and can also be
included in a consensus tree. Thus, a combinable components consensus tree is
formed from all of the uncontradicted components from a set of fundamental
cladograms. In combinable components consensus, components need not be
present in all cladograms to appear in the consensus tree.
Thus, in our butterfly, bird and bat example (Fig. 7.1), combinable components
consensus results in a tree (Fig. 7.2c) with slightly greater resolution
than was achieved with strict consensus, in that it includes component 9
(H + I). This component is one possible resolution of the trichotomy, GHI,
and therefore cannot be in disagreement with the bat cladogram, in which
taxa G, H and I are unresolved. When all the fundamental cladograms are
fully resolved, with no spurious resolution due to ambiguous optimization,
then the combinable components consensus tree and the strict consensus tree
will be identical.
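The notion of an uncontradicted component can be made concrete with a short Python sketch. It assumes that two components are compatible when they are disjoint or one contains the other. The tree encoding, the function names and the three small trees are hypothetical; they mimic, but do not reproduce, the trichotomy GHI of the bat cladogram.

```python
def clusters(tree):
    """Non-trivial components of a rooted tree given as nested tuples."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    found.discard(leaves(tree))
    return found


def compatible(a, b):
    """Two components can coexist on one tree if disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


def semi_strict_consensus(trees):
    """Keep every component that no component of any tree contradicts."""
    every = set().union(*(clusters(tree) for tree in trees))
    return {c for c in every if all(compatible(c, other) for other in every)}


# Hypothetical trees: the third leaves G, H and I unresolved.
resolved_1 = (("A", "B"), ("G", ("H", "I")))
resolved_2 = (("A", "B"), ("G", ("H", "I")))
polytomy = (("A", "B"), ("G", "H", "I"))
trees = [resolved_1, resolved_2, polytomy]

strict = set.intersection(*(clusters(tree) for tree in trees))
semi_strict = semi_strict_consensus(trees)
```

The component HI is absent from the strict result, because the third tree lacks it, but it is present in the semi-strict result, because the polytomy does not contradict it.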
7.4 MAJORITY-RULE AND MEDIAN CONSENSUS TREES

When many trees are being compared, a majority-rule consensus tree may be
preferable to strict consensus (Swofford 1991). Instead of including only those
groups that occur in the entire set of fundamental cladograms, it is possible
to retain in the majority-rule consensus tree those groups that occur in a
pre-specified proportion of those cladograms. Typically, majority-rule trees
are specified to retain those components that occur in more than 50% of the
cladograms (Margush and McMorris 1981). Thus, when set at 50%, the groups
retained must appear in more than half of the fundamental cladograms, because
different groups that occur in exactly 50% of the trees may conflict with each
other. In our butterfly, bird and bat example (Fig. 7.1), the four components
shared by the butterflies and birds (Table 7.2, components 4, 7, 9, 11),
together with the universal component 10, appear in the majority-rule
consensus tree (with percentages of 66% and 100% respectively) (Fig. 7.2b).
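The 50% rule can be sketched as a frequency count over components. This is an illustrative sketch only: the nested-tuple encoding, the function names and the three four-taxon trees are assumptions, not the butterfly, bird and bat cladograms of Fig. 7.1.

```python
from collections import Counter


def clusters(tree):
    """Non-trivial components of a rooted tree given as nested tuples."""
    found = set()

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    found.discard(leaves(tree))
    return found


def majority_rule(trees, threshold=0.5):
    """Map each component found in MORE than `threshold` of trees to its frequency."""
    counts = Counter(c for tree in trees for c in clusters(tree))
    total = len(trees)
    return {c: n / total for c, n in counts.items() if n / total > threshold}


# Three hypothetical cladograms for taxa A-D.
tree_1 = ("A", ("B", ("C", "D")))   # components CD, BCD
tree_2 = ("A", ("C", ("B", "D")))   # components BD, BCD
tree_3 = (("A", "B"), ("C", "D"))   # components AB, CD

majority = majority_rule([tree_1, tree_2, tree_3])
two_trees = majority_rule([tree_1, tree_2])
```

With three trees, CD and BCD each occur in two of three cladograms and are retained. With only the first two trees, a component must occur in both to exceed 50%, so the result coincides with the strict consensus.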
The median consensus procedure (Barthelemy and Monjardet 1981,
Barthelemy and McMorris 1986) is closely related to the majority-rule consensus
method (Swofford 1991, Page 1993b). This method uses a tree comparison
metric that measures the degree of disagreement between any pair of
cladograms. Thus, if the distance between a pair of cladograms, $T_i$ and
$T_j$, is $d(T_i, T_j)$, then the total distance of any cladogram $T$ to $k$
rival cladograms is given by:
$$d_T = \sum_{i=1}^{k} d(T, T_i)$$
A cladogram, $T_m$, is a median tree if its total distance, $d_m$, to the
rival trees is less than that for any other cladogram. If the symmetric
difference distance (Robinson and Foulds 1981, ... et al. 1984) is used as the
tree comparison metric, then the majority-rule consensus tree is a median tree
(Barthelemy and McMorris 1986, Swofford 1991) (Fig. 7.2b). When there is
an odd number of rival cladograms ($k$) or if there are no groups that appear
in exactly 50% of the rivals, the majority-rule tree is the only median tree.
However, when $k$ is even, any tree representing a combination of the
majority-rule consensus tree with one or more combinable groups that occur
in exactly half of the rival cladograms is also a median tree (Swofford 1991).
Of the methods described here, median consensus trees are the least
frequently encountered.

Table 7.2 Component compatibility matrix. Numbers refer to the components in
Table 7.1. Full stops (.) designate incompatible components, i.e. those that
show conflict; 1s represent agreement between pairwise comparisons. The
asterisks indicate the two components (9 and 10) that are uncontradicted on
all three cladograms.
[Matrix body (components 1-12) not recoverable from the source.]
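The total distance used to define a median tree can be illustrated with the symmetric difference distance, taken here as the number of components found in one tree but not the other. The sketch below represents each tree directly by its set of components. The rival trees, the candidate list and the function names are hypothetical.

```python
def symmetric_difference_distance(components_a, components_b):
    """Number of components present in exactly one of the two trees."""
    return len(components_a ^ components_b)


def total_distance(candidate, rivals):
    """Sum of the distances from a candidate tree to each rival tree."""
    return sum(symmetric_difference_distance(candidate, r) for r in rivals)


def comp(taxa):
    """Shorthand for a component of single-letter taxon names."""
    return frozenset(taxa)


# Three hypothetical rival cladograms, written as component sets.
rivals = [
    {comp("CD"), comp("BCD")},
    {comp("BD"), comp("BCD")},
    {comp("AB"), comp("CD")},
]

# Components found in more than half of the rivals (CD and BCD).
majority = {comp("CD"), comp("BCD")}

# Candidate trees: the rivals themselves, the unresolved bush, and the
# majority-rule tree.
candidates = rivals + [set(), majority]
distances = [total_distance(c, rivals) for c in candidates]
```

In this small case no candidate has a smaller total distance than the majority-rule tree, which is the behaviour the text attributes to the symmetric difference metric.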
7.5 NELSON CONSENSUS
There has been some controversy as to what exactly Nelson consensus is.
Some have mistaken it for strict consensus (see Page 1989), while Nelson's
(1979) own presentation seems most like combinable components consensus
(Nixon and Carpenter 1996b). Here, we adopt the notion of Nelson consensus
proposed by Page (1989), who likened Nelson's discussion of the relationships
among components to clique analysis. Page extended Nelson's concept
into a method for combining components that do not include contradictory
replicated sets, i.e., a formal clique analysis. Thus,
components may appear in the Nelson consensus tree that can be contradicted
in some of the original cladograms. This version of Nelson consensus is
available as part of the COMPONENT for Windows computer package
(Page 1993b). By way of example, we can examine the result obtained when
determining the largest cliques for our butterfly, bird and bat example (Fig.
7.1). If we consider taxon B, it occurs as the sister taxon to taxa C and D in
the butterfly clade, B(CD), as the sister to taxon D in the bird clade, C(BD),
and as the sister to taxon F in the bats clade, E(FB). As the sister pair CD
occurs in both the butterflies and bats (i.e., in 66% of the cladograms), and
the group BCD occurs in the butterflies and the birds (also in 66% of the
cladograms), then the largest clique for all three cladograms is B(CD). For
our butterfly, bird and bat example (Fig. 7.1), the Nelson consensus tree (Fig.
7.2d) is identical to the majority-rule tree, with components 2, 4, 7, 9, 10 and
11 comprising the largest clique for all three cladograms (Table 7.2).
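The clique idea can be sketched by brute force for a handful of components. The sketch below rests on two assumptions that the text does not spell out: only components occurring in at least two cladograms are considered, and a clique is scored by the summed replication of its members. The replication counts are hypothetical and the exhaustive search is practical only for small examples; it is not presented as the algorithm used in COMPONENT.

```python
from itertools import combinations


def compatible(a, b):
    """Two components can coexist on one tree if disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


def largest_clique(replication):
    """Return the best-scoring set of mutually compatible replicated components.

    `replication` maps each component to the number of cladograms containing it.
    Scoring by summed replication is an assumption of this sketch.
    """
    candidates = [c for c, n in replication.items() if n >= 2]
    best, best_score = frozenset(), 0
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            if all(compatible(a, b) for a, b in combinations(subset, 2)):
                score = sum(replication[c] for c in subset)
                if score > best_score:
                    best, best_score = frozenset(subset), score
    return best, best_score


# Hypothetical replication counts from five cladograms; AB and BC conflict.
replication = {
    frozenset("AB"): 3,
    frozenset("BC"): 2,
    frozenset("ABC"): 5,
    frozenset("DE"): 4,
}
clique, score = largest_clique(replication)
```

AB and BC cannot both be kept, and the clique containing AB wins because AB is replicated more often than BC.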
7.6 ADAMS CONSENSUS
Adams consensus trees (Adams 1972) are derived by relocating those taxa
that occur in conflicting positions on different fundamental cladograms to the
nearest node they have in common. Adams trees therefore contain all the
intersecting sets of taxa (nestings) common to all the fundamental cladograms
in any given set of cladograms. Thus, given two sets of branches, A and B,
and a cladogram T, set A nests inside set B if A is a subset of B and clades in
A have a more derived common node in T than does set B. For example,
given the cladogram A(B(C, D)), the sets BC, BD and CD all nest inside
ABCD, but only CD nests inside BCD (Page 1993b). Adams trees are
particularly useful for summarizing similarities in topology in the fundamental
cladograms when they contain one or more taxa that show very different
positions. For example, in our butterfly, bird and bat example (Fig. 7.1), all
three cladograms share the nestings CD (component 2), EF (component 7)
and GHI (component 10). However, taxa A and B have three completely
different placings and thus are placed on the Adams consensus tree at the
lowest common node, that is, the unresolved basal node (Fig. 7.2e). Adams
consensus trees must be used with care because components can appear in
them that do not occur in any of the fundamental cladograms.
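The nesting relation can be checked mechanically on the worked example A(B(C, D)). The sketch below reads "a more derived common node" as "the smallest clade containing set A is strictly inside the smallest clade containing set B". The tree encoding and the function names are assumptions made for illustration, and single-letter taxon names are used so that a string such as "BC" can stand for a set of taxa.

```python
def leaf_set(node):
    """All taxa below a node of a nested-tuple tree."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def common_node(tree, taxa):
    """Leaf set of the most derived node that contains all of `taxa`."""
    taxa = frozenset(taxa)
    node = tree
    while not isinstance(node, str):
        for child in node:
            if taxa <= leaf_set(child):
                node = child      # descend while one child holds every taxon
                break
        else:
            break                 # taxa are split among children: stop here
    return leaf_set(node)


def nests(tree, inner, outer):
    """True if set `inner` nests inside set `outer` on `tree`."""
    inner, outer = frozenset(inner), frozenset(outer)
    return inner < outer and common_node(tree, inner) < common_node(tree, outer)


cladogram = ("A", ("B", ("C", "D")))   # A(B(C, D))
inside_abcd = [s for s in ("BC", "BD", "CD") if nests(cladogram, s, "ABCD")]
inside_bcd = [s for s in ("BC", "BD", "CD") if nests(cladogram, s, "BCD")]
```

The two lists reproduce the statement in the text: BC, BD and CD all nest inside ABCD, but only CD nests inside BCD.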
7.7 AGREEMENT SUBTREES OR COMMON PRUNED TREES

In all the consensus methods described above, the consensus tree contains
the same number of taxa as do the fundamental cladograms. A rather
different method of comparing cladograms is an agreement subtree, which
shows only the clades and taxa that are common to two or more fundamental
cladograms. In this method, the greatest agreement subtree (GAS) is
obtained by pruning one or more branches from each fundamental cladogram
until a set of identical topologies is obtained. Finden and Gordon (1985)
referred to these as 'common pruned trees'. The GAS is the subtree that
results from pruning the least number of branches from the fundamental
cladograms. Given two cladograms, $T_1$ and $T_2$, Page (1993b) defined the
distance, $d_{GAS}(T_1, T_2)$, as the number of branches removed to obtain the
greatest agreement subtree. By using a recursive algorithm (e.g. Kubicka et
al. 1995), a branch of one cladogram is selected and compared to the other
cladograms. The largest supported subtree in the other cladograms of
comparison that contains the selected branch is maintained. Each branch is
selected in turn and compared to the other cladograms. From amongst the
full range of subtrees obtained the largest agreement subtree is selected.
This method is most useful when one or two taxa are incongruent amongst
the profile of fundamental cladograms. Rosen (1979) used a common pruned
subtree to indicate common components in two seemingly different area
cladograms for central American fishes. In our butterfly, bird and bat
example (Fig. 7.1), the largest agreement subtree is that which has had taxa A
and B pruned out to leave a common topology of six taxa (Fig. 7.2f). An
alternative implementation might permit the inclusion of uncontradicted
components, in which case, the trichotomy in Fig. 7.2e could be resolved as
G(HI).
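For very small trees, a greatest agreement subtree can be found by exhaustive pruning rather than by the recursive algorithm cited above. The sketch below is such a brute-force substitute, exponential in the number of taxa. It tries ever smaller sets of retained taxa until every cladogram shows the same components once restricted to those taxa. The tree encoding, the function names and the two trees with a single misplaced taxon are hypothetical.

```python
from itertools import combinations


def leaf_set(node):
    """All taxa below a node of a nested-tuple tree."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def clusters(tree):
    """Leaf sets of all internal nodes (root included)."""
    found = set()

    def visit(node):
        if not isinstance(node, str):
            found.add(leaf_set(node))
            for child in node:
                visit(child)

    visit(tree)
    return found


def restricted(cluster_set, keep):
    """Components that survive when only the taxa in `keep` are retained."""
    return {c & keep for c in cluster_set if 1 < len(c & keep) < len(keep)}


def greatest_agreement_subtree(trees):
    """Largest taxon set (and its components) on which all trees agree."""
    taxa = sorted(leaf_set(trees[0]))
    cluster_sets = [clusters(tree) for tree in trees]
    for size in range(len(taxa), 1, -1):
        for keep in map(frozenset, combinations(taxa, size)):
            views = [restricted(cs, keep) for cs in cluster_sets]
            if all(view == views[0] for view in views):
                return keep, views[0]
    return frozenset(), set()


# Two hypothetical cladograms in which taxon A takes very different positions.
tree_1 = ("A", ("B", ("C", ("D", "E"))))
tree_2 = ("B", ("C", (("D", "A"), "E")))
kept, components = greatest_agreement_subtree([tree_1, tree_2])
```

Pruning the single incongruent taxon A leaves the common topology B(C(D, E)).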
7.8 CONCLUSIONS
It should be recognized that in most studies, it is the fundamental
(data-derived) cladograms that provide the most direct and reliable evidence of
relationships among the taxa under study. The only exception to this rule is
when the strict consensus tree is the same length as the fundamental
cladogram and is thus the strictly supported cladogram for that data set. Then
it represents our best estimate of relationships among the taxa. In all other
instances, when the consensus trees are longer than the fundamental
cladograms from which they are derived, they are worse estimates of taxon
relationships. However, of the plethora of methods and statistics for
comparing fundamental cladograms, consensus trees are the most useful for
examining the results of cladistic analyses that yield more than one minimum
length cladogram. Consensus trees provide summary information about
among-cladogram character conflict by providing an upper bound for the length
of characters among equally most parsimonious cladograms (Nixon and
Carpenter 1996b). Strict consensus trees include only those groups
(components) for which there is unambiguous support among fundamental
cladograms. All other methods include some degree of ambiguous support.
7.9 CHAPTER SUMMARY

1. When a cladistic analysis produces more than a single most parsimonious
   cladogram, consensus analysis provides a convenient means for
   summarizing agreement and disagreement among these cladograms.
2. Cladograms derived from a data set directly are called fundamental
   because they summarize its hierarchical information. In contrast,
   consensus trees are derivative, as they are constructed from and
   represent a set, or sets, of cladograms.
3. If the goal of consensus analysis is to summarize agreement in grouping
   among the fundamental cladograms, then only strict consensus fulfils that
   goal. All other methods may yield trees that are not fully supported by
   data or are supported only ambiguously. These have been called
   compromise trees (Nixon and Carpenter 1996b).
4. A strict consensus tree includes only those groups (components) that
   occur in all the fundamental cladograms. If the strict consensus tree is
   the same length as its fundamental cladograms, then the extra resolution
   present in those cladograms is spurious and due to ambiguous character
   optimization. In these circumstances, the strict consensus tree represents
   the preferred solution because it is of minimal length and has all its
   resolved nodes supported by data, that is, it is the strictly supported
   cladogram.
5. A combinable components (or semi-strict) consensus tree is formed from
   all the uncontradicted components from a set of fundamental
   cladograms. In other words, it includes all of the components found in the
   strict consensus tree but in addition includes those components that are
   uncontradicted by less resolved components within the set of
   fundamental cladograms. When all the fundamental cladograms are fully
   resolved with no spurious resolution due to ambiguous optimization, then
   the strict and combinable components consensus trees will be identical.
6. A majority-rule consensus tree includes only those components that
   occur in more than 50% of the fundamental cladograms. When there are
   only two fundamental cladograms, the majority-rule and strict consensus
   trees are identical.
7. The median consensus method is closely related to majority-rule
   consensus but uses a tree comparison metric to measure the degree of
   disagreement between any pair of cladograms.
8. A Nelson consensus tree consists of the clique of mutually compatible
   components that are most replicated in the fundamental cladograms.
   When there are only two fundamental cladograms, the Nelson and
   combinable components consensus trees are identical.
9. An Adams consensus tree contains all the intersecting sets of taxa
   common to the fundamental cladograms. Taxa in conflicting positions are
   relocated to the most inclusive node they have in common. Consequently,
   components can appear in an Adams consensus tree that do not occur in
   any of the fundamental cladograms. Adams consensus is most useful for
   identifying 'rogue' taxa that can occur in many disparate positions in the
   fundamental cladograms and thus cause the strict consensus tree to be
   highly unresolved.
10. A greatest agreement subtree or common pruned tree differs from all
    other consensus trees by including only the components and taxa held in
    common by the fundamental cladograms. It is obtained by pruning one or
    more branches from each fundamental cladogram until a common
    topology is obtained.
8. Simultaneous and partitioned analysis

8.1 INTRODUCTION
An area of cladistic methodology that has attracted much attention recently
concerns procedures for treating data derived from different sources. Most
systematists acknowledge that there are different kinds of data, e.g.
morphological, molecular, embryonic or larval, behavioural, etc. The debate
concerns the methods by which we analyse these data and combine them to
reveal a common phylogenetic history. Some authors (e.g. Kluge 1989) argue
that all data should be analysed in a single matrix (Fig. 8.1a). This has been
called the total evidence or character congruence approach because the final
cladogram(s) results purely from interaction among all available characters. A
subtle but important change in terminology, which we adopt, has recently
been introduced by Nixon and Carpenter (1996a), who argued that this
approach is best called simultaneous analysis because all systematists ideally
use all (total) evidence, irrespective of the way in which they then deal with
that evidence. Other authors (e.g. Miyamoto and Fitch 1995) prefer to
analyse data separately and then use consensus methods to combine the
resulting cladograms (Fig. 8.1b). This is called partitioned analysis or the
taxonomic congruence approach because the final cladogram(s) is the result
of adding together taxon cladograms, each derived from analysis of separate
data sets for the same taxa. Although there are advocates of both approaches,
some systematists have been more concerned with prescribing conditions
under which one or the other method may be most appropriate. That is, in
any particular circumstance, should we combine data (simultaneous analysis)
or keep them separate (partitioned analysis)? In this chapter, we will outline
the main theoretical and methodological basis of both approaches and the
claimed advantages of each, and discuss possible conditions under
which it may be better to use either simultaneous or partitioned analysis.
It is worth emphasizing that vicariance biogeography relies on a partitioned
analysis approach inasmuch as cladograms of different organisms inhabiting
similar areas are 'added' together using consensus techniques. However, in
this chapter, we are concerned solely with analysis of different kinds of data
relevant to the phylogenetic history of a particular group of organisms.
By way of introduction, we present three examples of studies that have used
both approaches.
Fig. 8.1 (a) In simultaneous analysis, separate data sets are combined into a single
matrix before being analysed. (b) In partitioned analysis, each data set is analysed
separately (vertical arrows), yielding sets of intermediate cladograms that are then
'added' together using a consensus method. In both approaches, if more than one
most parsimonious cladogram is obtained from the analysis of a data set, then
consensus may be used to summarize the results.
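The difference between the two routes in Fig. 8.1 can be sketched with a small Fitch parsimony count. The code below is illustrative only: the two tiny binary matrices, the three candidate trees and all function names are hypothetical, and evaluating a fixed list of candidate trees stands in for a real tree search.

```python
def fitch_length(tree, states):
    """Minimum number of steps for one character on a binary nested-tuple tree."""
    steps = 0

    def state_set(node):
        nonlocal steps
        if isinstance(node, str):
            return {states[node]}
        left, right = (state_set(child) for child in node)
        common = left & right
        if common:
            return common
        steps += 1            # no shared state: one change is required here
        return left | right

    state_set(tree)
    return steps


def matrix_length(tree, matrix):
    """Tree length: the sum of the Fitch lengths of all characters."""
    n_chars = len(next(iter(matrix.values())))
    return sum(
        fitch_length(tree, {taxon: row[i] for taxon, row in matrix.items()})
        for i in range(n_chars)
    )


# Hypothetical data: two 'morphological' and one 'chemical' character.
morphology = {"A": "11", "B": "10", "C": "01", "D": "00"}
chemistry = {"A": "1", "B": "1", "C": "0", "D": "0"}
combined = {taxon: morphology[taxon] + chemistry[taxon] for taxon in morphology}

candidates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]
morph_lengths = [matrix_length(t, morphology) for t in candidates]
chem_lengths = [matrix_length(t, chemistry) for t in candidates]
combined_lengths = [matrix_length(t, combined) for t in candidates]
```

In this toy case the morphological partition alone leaves two trees tied at three steps, whereas the combined matrix singles out the first tree. Because tree length is a sum over characters, the combined length of each tree equals the sum of its partition lengths.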
Example 1: Milkweed butterflies

Vane-Wright et al. (1992) coded morphological and chemical characters
expressed by ten species of African Milkweed butterflies. The chemical
characters were coded as the presence/absence of volatile pheromone
components in the male abdominal hairpencils. Separate analyses of the
morphological and chemical data each found three equally most parsimonious
cladograms. The strict consensus of each analysis is shown in Fig. 8.2.
Combining these two consensus trees to give an overall strict consensus tree
resulted in the solution shown at top right, which identified three
monophyletic groups (circled). When the morphological and chemical characters
were combined into a single data set and subjected to simultaneous analysis
(bottom right), a single cladogram was found that identified eight
monophyletic groups (circled). Therefore, in this study, simultaneous analysis
led to far greater resolution than did the partitioned evidence approach and,
incidentally, this unique solution was identical to one of the three alternative
morphological cladograms. Why this was so is explained later in this chapter.
Example 2: Echinoid phylogeny

This example (Fig. 8.3) is taken from a larger study that sought to establish
relationships among Recent and fossil sea urchins (Littlewood and Smith
1995). This extract deals only with those Recent taxa for which both
morphological and molecular information are available. Three data sets were
used: morphology, LSU (large subunit) rRNA, and SSU (small subunit) rRNA. As
in example 1, simultaneous analysis led to far greater resolution than did
partitioned analysis. In the cladograms derived from the separate analyses
(Fig. 8.3, top), note the very different position of Arbacia. This was later
identified by Littlewood and Smith (1995) as a 'rogue taxon'.
Fig. 8.2 Analysis of seven milkweed butterflies of the genus Amauris and three
outgroup taxa (Danaus chrysippus, Tirumala formosa and T. petiverana). Two data
sets were available, comprising 32 morphological characters (29 informative) and
68 chemical characters (63 informative). The partitioned analysis is shown along
the top row with the strict consensus trees from the two separate analyses and
the combined strict consensus tree. The result of simultaneous analysis is shown
in the bottom row. Informative nodes are circled. (From Vane-Wright et al. 1992.)

Fig. 8.3 Analysis of echinoids using three data sets: 163 morphological
characters (50 informative), 218 base pairs within the SSU rRNA gene (34
informative), and 91 base pairs within the LSU rRNA gene (28 informative). The
partitioned analysis is shown along the top row. The cladogram derived from
simultaneous analysis of all three data sets is shown below. Note that this
example is extracted from a larger study in which morphological data were coded
for many more taxa, which explains the relatively few informative morphological
characters. (From Littlewood and Smith 1995.)
Example 3: Deer mice and grasshopper mice

This example (Fig. 8.4) is taken from a study by Sullivan (1996), who
compared the results of partitioned and simultaneous analysis of two
mitochondrial gene sequences, cytochrome b and 12S rRNA. Sullivan was
interested in the performance of these data sets as judged against a particular
phylogeny (taken as true). The results from the separate analysis of each gene
sequence were each reasonably well resolved, with strong bootstrap support
for all nodes (but see Chapter 6 for a discussion of the flaws of this statistic).
However, despite this, when the two cladograms were combined into a strict
consensus tree, many components were lost. In contrast, simultaneous
analysis yielded a single cladogram that was congruent with both the
cytochrome b result and the 'true phylogeny'.
8.2 THEORETICAL ISSUES
The two approaches to data analysis are based on different premises.
Simultaneous analysis accepts that the goal of cladistics is to maximize
explanatory power; that is, it aims to explain the distribution of all
available characters in the most parsimonious fashion (maximum
informativeness). Whether this yields the true (but usually unknowable)
phylogeny is a separate issue. We may also look at this in a slightly
different way. Any phylogenetic analysis attempts to maximize homology. Our
method of testing hypotheses of homology is through character congruence
(see Chapter 2), from which it follows that the more characters are included
as potential tests of homology, the more severe that test becomes and the
more confident we may become in accepting the truth of the conclusion.
Partitioned analysis starts from the premise that there are genuinely
independent types of data (both in kind and ability to yield a phylogenetic
signal), which need to be analysed separately. The most extreme examples
may be molecular and morphological data where, in some cases, gene
cladograms do not match species cladograms. One rationale for partitioned
analysis states that the results of separate analyses may provide tests for one
another. That is, by looking for repeating patterns, we may observe
confirmation or rejection of an original hypothesis (Grande 1994). The
underlying assumption is that different data sets are truly independent
indicators of relationship.
Most of the debate between the advocates of simultaneous and partitioned
analysis has concerned justification for different methodologies and the style
of argument has been that of claim and counterclaim. Rather than treat each
view separately, we will try to avoid repetition and take the alleged principal
advantages of both and discuss both sides of the issues involved. As the
advocates of partitioned analysis have raised more issues, we begin with this
approach.
Fig. 8.4 Analysis of two mitochondrial genes of seven species of deer mice
(Peromyscus) and three species of grasshopper mice (Onychomys). The cytochrome b
(Cyt b) sequence contained 61 informative characters and the 12S rRNA had 48
informative characters. The partitioned analysis is shown along the top row. The
cladogram derived from simultaneous analysis is shown below. The figures are the
percentage of bootstrap replicates in which the groups were recovered. (From
Sullivan 1996.)
8.3 PARTITIONED ANALYSIS (TAXONOMIC CONGRUENCE)

8.3.1 Independence of data sets
The crux of arguments in favour of partitioned analysis rests on the premise
that there are genuinely different classes of data, which may reflect different
evolutionary processes of character change and which behave differently with
respect to phylogeny reconstruction. This idea has its roots in the
literature on phenetics (Sokal and Sneath 1963). The most frequently
recognized independent data sets are molecular sequences that lead to gene
cladograms, and morphological data that lead to species cladograms. These
may or may not agree. Some authors (e.g. Miyamoto and Fitch 1995) have
suggested that in sexually reproducing organisms, the maternally inherited
mitochondrial genes constitute data independent of the biparentally inherited
nuclear genes. Further, protein-coding and non-coding genes may be
considered as distinct and, perhaps, transcribed and non-transcribed parts of
the genome may also be independent. Within morphology, there may
be a case for considering larval characters separately from adult
characters, particularly in organisms that undergo drastic metamorphosis.
Bull et al. (1993) defended data partitioning using what they called
'process partitions', by which they meant subsets of characters
evolving according to different sets of rules. Different kinds of data
as mentioned above would fall into different process partitions. Miyamoto
and Fitch (1995) went further and identified five criteria of process
partitions:
• The genes are not genetically linked.
• The gene products do not interact.
• The genes do not specify the same function.
• The gene products do not interact in the same [...].
• The gene products do not regulate the expression of a gene in another
  process partition.
Given the complexity and integral nature of gene expression, and
our relative ignorance of gene interaction, it may be difficult to partition
genes into sets according to these strict process criteria. But the intention of
partitioning data is to try to establish whether we have
independent data to estimate phylogenetic relationships. The observation
that we may retrieve strongly supported but non-congruent topologies from
different data sets is evidence that one or more is positively misleading when
analysed using parsimony. The usual explanation is the supposition
that different evolutionary processes are acting upon different 'process
partitions' with the effect that different evolutionary signals will result. This
remains to be demonstrated empirically but, if correct, then the corollary is
that we would analyse separate partitions using different models of
evolutionary change. Then our analyses would shift from a parsimony approach
towards a maximum likelihood approach, where analyses are driven by one or
more of a myriad of process models, each of which in itself requires
justification. If the justification for partitioning data is to account for
different evolutionary processes, then we may have difficulty deciding how fine
the partitions must be. Once again the problem is most transparent in molecular
studies where there are noticeable differences in evolutionary rates between
and even within genes (e.g. stem and loop regions).
Bull et al. (1993) carried out simulation experiments in which they began
with a given phylogeny. The data were then divided into two partitions, one of
which was allowed to evolve quickly and the other slowly. From the results of
these simulations, they claimed that combining the resultant two data sets in
a simultaneous analysis was less successful in recovering the given phylogeny
than was analysing the two data sets separately. However, if the data sets are
differentially weighted, as is often done with molecular data, then the
problem evaporates and simultaneous analysis produces the more accurate
result (Chippendale and Wiens 1994). In a sense, these simulation
experiments are misleading because, in reality, the 'given phylogeny' is unknown
and the conditions imposed on the experiments are too simplistic. Empirical
examples are much more informative. Sullivan (1996) presented such a study,
which showed that among-site rate variation was more or less uniformly
distributed across the sequences and that partitioning the data was neither
feasible nor appropriate. His study (Fig. 8.4) attempted to discover how well
sequences of cytochrome b and 12S rRNA recovered a phylogeny of deer
mice (Peromyscus) and grasshopper mice (Onychomys), which was accepted as
the 'true phylogeny' based on several other kinds of evidence. Separate
analysis of the genes gave conflicting results but each was strongly supported
(at least as measured by bootstrap values). Three separate tests suggested
strong heterogeneity between the data sets (i.e. it was not due to sampling
error), which, according to Bull et al. (1993), should mean that these data sets
represent different process partitions. Therefore, these data should have been
kept separate and accepted as alternative but equally valid gene histories
(even though both genes are mitochondrial and are only maternally inherited).
Simultaneous analysis of the two genes gave the cytochrome b topology,
which was the same as the 'true topology' and with even higher bootstrap
values. The conclusion was that the 12S rRNA cladogram, even though
strongly conflicting with the 'true phylogeny', contained a common hidden
phylogenetic signal and that combining the data in a simultaneous analysis
allowed that signal to 'show through'. Sullivan's (1996) results suggested that
the 'phylogenetic signal can in fact be additive when data from genes with
different evolutionary histories are analysed under a homogeneous
reconstruction model (parsimony with equal weights)'.
Bull et al. (1993) were less extreme than Miyamoto and Fitch (1995) in
their process partitioning. Although they recognized the possibility of
different processes, they were more concerned with demonstrable heterogeneity
between data sets and the conditions under which they should be combined
or kept separate. We will return to this question later in the chapter.
8.3.2 Cladogram support

Another claimed advantage of partitioned analysis is that different data sets
may vary in their ability to recover a phylogenetic signal and that by keeping
data sets apart this may be recognized. The idea behind this claim is that
different data sets may include very different levels of homoplasy. For
example, data set 1 may yield one optimal cladogram plus many suboptimal
cladograms that are but one step longer. Data set 2 may yield one optimal
cladogram that is many steps shorter than the nearest suboptimal solution.
Data set 2 may be said to contain a stronger phylogenetic signal and this
recognition may allow us to select the second cladogram in preference to the
first. Our choice may also be strongly influenced if the optimal cladogram
from data set 2 was identical with one of the near suboptimal cladograms
from data set 1. Simultaneous analysis may or may not recover the
cladogram with less homoplasy but, in any event, it may not allow us to
identify whether data set 1 or data set 2 is giving the stronger signal.
However, if different strengths of signal are suspected, it is possible to cater
for this in a simultaneous analysis by adding or deleting characters and noting
which sets of characters are congruent with the conclusion.
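The contrast between the two hypothetical data sets can be expressed as the number of extra steps separating the optimal cladogram from the nearest suboptimal one. The sketch below simply reads that gap off a list of tree lengths. The function name `signal_gap` and the lengths are invented for illustration, and the gap is offered as a rough indicator under these assumptions rather than as an established support statistic.

```python
def signal_gap(tree_lengths):
    """Extra steps needed to reach the nearest suboptimal tree length.

    `tree_lengths` holds the lengths of all trees examined for one data set.
    Returns 0 when every examined tree has the same length.
    """
    distinct = sorted(set(tree_lengths))
    return distinct[1] - distinct[0] if len(distinct) > 1 else 0


# Hypothetical lengths: data set 1 has many near-optimal alternatives,
# data set 2 has a clearly separated optimum.
data_set_1 = [40, 41, 41, 41, 41]
data_set_2 = [55, 63, 64]

gap_1 = signal_gap(data_set_1)
gap_2 = signal_gap(data_set_2)
```

Keeping the two lists apart is what makes the difference between the gaps visible, which is the point made in favour of partitioned analysis here.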
There is another side to this argument that supports the simultaneous
analysis approach. If different data sets do yield cladograms with very
different levels of support then, unless the topology of both is identical, the
consensus resulting from partitioned analysis will lose any evidence of
support heterogeneity. This is because the results of analysing separate data
sets are each regarded as having equal weight before the cladograms are added
together to give the final result. Of course, it is possible to allow for this in a
simultaneous analysis by applying different weights to different character
partitions that make up the total matrix. However, this does not obviate the
necessity to justify such a practice explicitly.
8.3.3 Different sized data sets

Advocates of partitioned analysis argue that cladograms produced from
analyses of separate data sets have an equal contribution to make to the final
result. One claimed advantage of this is to avoid the potential effects of very
unequally sized data sets. It is common for molecular data sets to be an
order of magnitude larger than morphological data sets, with the implied
danger that the molecular signal will swamp the morphological
signal. Theoretically this may be possible but in practice it rarely happens. In
Example 2 (Fig. 8.3), there are many more potential characters in both the
LSU and SSU rRNA sequences (2200 base pairs) than in the morphological
data set (163 characters). However, the numbers of informative characters
that may contribute to topological variants are comparable between data sets
(121 molecular and 136 morphological). This situation is by no means
uncommon. Exceptions may be those sequences that show strong G-C bias
(see, for example, Hedges and Maxson 1996, who favoured a partitioned
analysis approach when analysing amniote relationships; also the section on
weighting in Chapter 5). It remains to be demonstrated empirically that data
sets will swamp one another as a rule rather than exceptionally.
For a simultaneous analysis, the only way in which one portion of the
combined matrix will swamp out a signal from another portion of the matrix
is if the swamping signal is sufficiently strong. With molecular data, this is
rarely the case, and even then, the optimal cladogram is usually followed by
suboptimal solutions favouring the morphological data that are one or two
steps longer. Even accepting the fact that there may be many more
informative characters in a molecular data set, if there is a stronger signal from
morphological data, it will generally show through.
8.4 SIMULTANEOUS ANALYSIS

8.4.1 Resolution
One claim of the simultaneous analysis approach is that the resulting
cladogram(s) is nearly always more highly resolved than is a consensus of separate
cladograms. This was so in all the examples given at the beginning of this
chapter. Expressed another way, simultaneous analysis leads to greater
explanatory power and this is the primary justification for the method. There
have been no extensive counterclaims made against this point but it has been
argued that even though the consensus cladogram may be less resolved, it
does provide a conservative estimate of phylogeny (Swofford 1991). We
interpret this to mean that by taking the conservative route, there is less
danger of selecting a 'wrong' answer, which indeed may be true. Notice,
however, that the contrast made here is that between character congruence
and phylogeny reconstruction. The claimed advantage of partitioned data and
consensus is that a conservative phylogeny (a statement of historical events) is
preferable to character congruence (a statement of the most parsimonious
distribution of characters).
But there are problems. The purpose of a phylogenetic tree, as opposed to
a cladogram, is to plot character changes (evolution). As Nixon and
Carpenter (1996a) pointed out, the conflicting cladograms that produced the
polytomies in the consensus tree contain conflicting character optimizations.
Consequently, the conservativeness that is claimed for consensus trees
actually means ambiguity in ideas of character evolution, which must then be
resolved some other way (choosing one of the cladograms). Using a consensus
tree as a conservative statement of phylogeny requires that we ignore both
the taxa involved in the ambiguity and any inferences of character evolution
of those characters that are optimized differently on alternative cladograms.
In a consensus tree, the only statements that we can make about evolution
will be limited to unambiguous character optimizations. The more highly
resolved the cladogram, the more statements can be made about character
evolution when the cladogram is interpreted as a phylogenetic tree. To retain
all taxa in a phylogenetic analysis resulting in a consensus requires choosing
one tree. Clearly, the choice is easier if there are fewer alternatives.
There could be some advantage in maintaining separate data sets in order
to recognize the patterns of resolution of different kinds of data. In the
Amauris example, analysis of both morphological and chemical data each
produced three optimal cladograms. However, while for the morphological
data the ambiguity involved Amauris hecate, A. albimaculata, A. damocles
and A. ochlea, for the chemical data it involved Danaus chrysippus, Tirumala
formosa and the genus Amauris as a unit. Such information would be
concealed by simultaneous analysis, yet this might be of interest when
comparing separate analyses of different groups of organisms using similar
types of data. It may be recognized that molecular data derived from a
particular gene consistently gave poor resolution amongst taxa inferred to be
of particular age.
8.4.2 Arbitrary consensus
Another claimed advantage of simultaneous analysis is that it avoids
arbitrariness in the consensus methods used (Kluge and Wolf 1993). However,
both partitioned and simultaneous analysis use consensus trees to summarize
information when more than one cladogram results from the analysis of a
given data set, and to this extent there is no difference. But partitioned
analysis also uses consensus at a second, later stage (Fig. 8.1) to summarize
the information common to the separate data sets, and it is possible that in
order to maximize the effectiveness of this consensus, the final result is a
hybrid of two or more consensus methods. There are many consensus
methods (see Chapter 7), each designed to summarize information in a
different way. Some (e.g. Adams consensus trees) may contain topological
variants not part of the original set, while others (e.g. majority-rule consensus
trees) ignore some topological variants that may be common to all separate
analyses. Clearly, to combine a [...] consensus tree for data set 1 with a
strict consensus tree for [...] consensus tree for data
set 3 is to eliminate some [...]
cladograms. However, as long as the same consensus method is used
throughout and justified, then the choice of consensus method is not relevant to the
debate between partitioned and simultaneous analysis.
8.5 CONDITIONAL DATA COMBINATION
As mentioned at the beginning of this chapter, some authors acknowledge a
utility in both approaches to data analysis and have recognized there may be
occasions when one or the other is preferable. This has been called
conditional data combination (Huelsenbeck et al. 1996). The 'conditional' is
centred on measures of heterogeneity between the data partitions. As Bull et
al. (1993) explained, when heterogeneity between data sets yields significantly
different phylogenetic estimates that are too great to be explained by
sampling error of either taxa or characters, then the analyses should be kept
separate and simultaneous analysis not undertaken. What might provide a
test? Several have been proposed. De Queiroz (1993) suggested using
bootstrap values as a criterion. If, when comparing bootstrap values of conflicting
clades, these values are both high, then the data should not be combined.
However, there are problems with the bootstrap (see Chapter 6) and, as the
study by Sullivan (1996) described above showed, it does not necessarily
follow that conflicting cladograms, each of which has high bootstrap values
assigned to the internal branches, give poor results when combined in a
simultaneous analysis.
Other, more complex statistics have been devised to test whether
incongruence between data sets is greater than expected by chance alone (e.g. Farris
et al. 1994, Huelsenbeck and Bull 1996). Here, we choose Templeton's
non-parametric test, as applied by Larson (1994), as an example. In this test,
the most parsimonious cladogram (or cladograms) is derived for each data
set. Next, the fit of characters from one data set is compared on the two
alternative topologies (or on those most closely comparable if each data set
produces more than one optimal cladogram). If the characters show a
significantly better fit to their 'own' topology than to the 'rival' topology, then
the differences between the two data sets cannot be explained as the chance
result of sampling error. This test is worked through using the Milkweed
butterfly example (Table 8.1). In this example, the conclusion is that the data
sets can be combined because the differences between them in one of the
data sets could be due to chance alone and are not the result of a significant
difference in phylogenetic signal.
This and other such tests may be statistically elegant but remain relatively
crude, and there are no hard and fast rules for deciding what strategy to take.
Unfortunately, it is not just character sampling that may be misleading, but
also taxon sampling. In Example 2 used here (Fig. 8.3), application of
Templeton's test recorded differences between the morphological cladogram
and SSU rRNA cladogram that were judged as insignificant, but the
differences between the LSU rRNA cladogram and morphology were judged as
significant, suggesting that they should not be combined. However,
Littlewood and Smith (1995) recognized that this significance was due almost
entirely to the different positions of Arbacia on the two cladograms and
regarded Arbacia as a rogue taxon. They considered that because the
heterogeneity between the data sets was caused by a single taxon, then
simultaneous analysis was the preferred option.
Table 8.1 Templeton's test of data heterogeneity applied to the Amauris butterfly
data. For each of the three fundamental cladograms that form the morphology
consensus and the chemical consensus, the topology is chosen from one set that
most closely resembles one from the other set. Morphological characters are then
optimized on to both cladograms. For each character the difference in performance
on its own cladogram and the rival cladogram is noted together with the number of
extra (+) or fewer (-) steps. The total number of characters is noted keeping extra
and fewer steps separate. The differences are then ranked (mid-point scores given
when two or more characters share the same number of differences). The sums of
the positive and negative ranks are calculated separately and the smaller of the two
figures taken as the test statistic. The probability value is read from a critical value
table of the Wilcoxon Rank Sum. If the probability is < 0.05 (the critical value), then
the difference between the performance of the characters on to its own cladogram
and the rival cladogram is significant. The reciprocal operation is also performed for
the biochemical characters. In this case, it is concluded that the chemical data are
not optimized significantly better on their own cladogram than on the
morphological cladogram. However, the morphological data are optimized significantly better
to their own cladogram than to the chemical cladogram. This means that the
chemical signal is weak and the data may be combined.
               Number of times
               character changes on
Morphology     morphology   biochemical    Difference     Rank
character      topology     topology       +      -       +      -
5              1            2              1      0       5.5    0
6              1            2              1      0       5.5    0
7              1            2              1      0       5.5    0
8              1            2              1      0       5.5    0
9              1            2              1      0       5.5    0
12             1            2              1      0       5.5    0
13             1            2              1      0       5.5    0
2'             1            2              1      0       5.5    0
28             1            2              1      0       5.5    0
29             1            2              1      0       5.5    0
Total = 10                                 10     0
Ranked sum                                                55     0
Test statistic = 0; probability value < 0.05, significant

Table 8.1 (Continued)
               Number of times
               character changes on
Biochemical    biochemical  morphology     Difference     Rank
character      topology     topology       +      -       +      -
37             3            4              1      0       6      0
3.             1            2              1      0       6      0
62             1            2              1      0       6      0
64             1            2              1      0       6      0
65             1            2              1      0       6      0
70             2            3              1      0       6      0
71             1            2              1      0       6      0
72             3            2              0      1       0      6
73             2            1              0      1       0      6
78             3            2              0      1       0      6
80             1            2              1      0       6      0
Total = 11                                 8      3
Ranked sum                                                48     18
Test statistic = 18; probability value > 0.05, not significant
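The ranking steps set out in the caption to Table 8.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure as implemented by Larson (1994): the function name `templeton_statistic` and its two arguments are hypothetical, characters that take the same number of steps on both topologies are assumed to be dropped, and the sketch stops at the test statistic, leaving the probability to be read from a table of critical values as in the text. The ranking of signed differences described here is the procedure usually known as the Wilcoxon signed-rank test. The step counts are those given in Table 8.1.

```python
def templeton_statistic(own_steps, rival_steps):
    """Return (sum of positive ranks, sum of negative ranks, test statistic).

    own_steps[i] and rival_steps[i] are the numbers of steps character i
    takes on its own topology and on the rival topology.
    """
    # Extra (+) or fewer (-) steps on the rival topology; ties are dropped.
    diffs = [r - o for o, r in zip(own_steps, rival_steps) if r != o]
    # Rank the absolute differences, giving tied values the mid-point rank.
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        midrank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    positive = sum(r for r, d in zip(ranks, diffs) if d > 0)
    negative = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # The smaller of the two rank sums is taken as the test statistic.
    return positive, negative, min(positive, negative)


# Morphological characters of Table 8.1: rank sums 55 and 0, statistic 0.
print(templeton_statistic([1] * 10, [2] * 10))

# Biochemical characters of Table 8.1: rank sums 48 and 18, statistic 18.
biochem_own = [3, 1, 1, 1, 1, 2, 1, 3, 2, 3, 1]
biochem_rival = [4, 2, 2, 2, 2, 3, 2, 2, 1, 2, 2]
print(templeton_statistic(biochem_own, biochem_rival))
```

Because every difference in both halves of the table is a single step, all characters share one mid-point rank (5.5 for ten characters, 6 for eleven), which is why the rank columns are constant.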
If, as a result of a heterogeneity test, simultaneous analysis is
'recommended', then the data are combined. If, however, it is not 'recommended'
then we must find the source of the heterogeneity. And if the heterogeneity is
due to sampling then, theoretically, this may be corrected. If it is thought that
it is due to different evolutionary processes then we must choose between
data sets, but this is a decision taken beyond cladistic analysis.
8.6 OPERATIONAL DIFFICULTIES
In attempts to include as much data as possible in any phylogenetic analysis,
we may encounter problems with both taxon and character sampling. Such
difficulties will affect both partitioned and simultaneous analysis. Different
taxa may have been sampled for different characters. For partitioned analysis,
the problem of combining cladograms with different but overlapping sets of
terminal taxa is analogous to problems encountered in cladistic biogeography,
and the resolution of such dilemmas resides in the methods of component
analysis. Of more immediate concern are the problems encountered in trying
to combine into a common matrix different data sets that may not include the
same taxon sample or representation of character completeness (e.g. fossils
with Recent taxa).
Some workers reduce the taxon sampling within each data set such that the
same terminal taxa are represented. But this may be too restrictive (it would
certainly discriminate against including fossils in [...] and molecular
studies) and could result in very poor taxon sampling for the group under
study. Another strategy would be to use 'hybrid' taxa. These may be of several
kinds. Perhaps the simplest situation would be to use morphological data
from one species and add molecular data from another. For a very speciose
genus, this may be unsatisfactory, but could be checked later when more data
become available. Often, it happens that morphological data exist for many
species but molecular sequences for only one. Thus, another approach might
be to carry out an initial analysis of the species using morphological data and
then code the 'ancestral' conditions into a morphotype, before adding the
single molecular sequence. Here the hybrid is between a real and inferential
sample of characters (see Chapter 3 for the problems associated with using
such artificial 'groundplans').
Until recently, it was commonplace for a sequence for a given gene to be
available for only a single species within a genus and often for just a single
individual within a species. However, multiple sequences are now becoming
increasingly available for the most popular genes (e.g. 28S rRNA, 18S rRNA,
cytb, COII) and this is giving rise to another practical sampling problem.
Some species may still be represented by only a single sequence, but for
others, there are several sequences known that, in addition, vary among
themselves. Then, in order to avoid polymorphic codings for some taxa,
variable sites could be deleted in all terminal taxa.
A rather special case of data diversity occurs when fossils are included in
analyses embracing data from soft anatomy as well as molecular sequences.
Here, there would be many question marks inserted against the fossil taxa.
The disruptive effect of many question marks has already been considered in
Chapter 4. Discussion of this topic has focused on situations in which the
question marks are scattered throughout the data matrix and there have been
fears that many question marks may cause convergence onto an incorrect
solution (Huelsenbeck 1991b). However, in situations where we may wish to
add fossils that are reasonably well known from skeletal data but are
unknown for other types of data, all the question marks are clustered in one
partition of the matrix. Recently, Wiens and Reeder (1995) carried out
simulation tests on real data and showed that when missing data are
concentrated, the effect on inaccuracy is relatively minor. This gives us hope that
including fossils, which have the ability to break up long branch lengths, may
be more beneficial than excluding them.
8.7 CONCLUSIONS
In many respects, devotees of partitioned analysis and simultaneous analysis
are seeking different goals and much of the discussion of one faction is
actually tangential to the [...] of the other. Most arguments in favour of
partitioned analysis [...] evolutionary
processes may be affecting different data in different ways. The goal is to
recognize those partitions so that they may be analysed using different
evolutionary models. The perceived acceptable risk is relaxation of the
primary cladistic principle of parsimony. The strength of partitioned analysis
lies in the estimation of the reliability of phylogenetic signals coming from
different data sources. Partitioned analysis is concerned with evolutionary
trees and patterns of character evolution, even though the tree may be poorly
resolved.
Simultaneous analysis is driven by the goal of parsimony over all
characters, which maximizes information content and provides the most severe test
of homology. The primary aim is to establish a cladogram, which can then be
converted to a phylogenetic tree from which we can infer evolutionary
pattern and process. The acceptable risk is that it may disguise the relative
strengths of different phylogenetic signals. Since the discovery of homology is
the central tenet of this book, we recommend simultaneous analysis as the
appropriate approach for cladistic analysis.
8.8 CHAPTER SUMMARY
1. Simultaneous analysis combines all available data, from whatever source,
into a single data set for analysis.
2. Partitioned analysis allocates data derived from different sources to
separate data sets, which then are analysed individually before combining
the results using a consensus method.
3. Simultaneous analysis aims to maximize explanatory power of the data by
explaining the distribution of all characters in the most parsimonious
manner. By testing hypotheses of homology through character
congruence, it follows that the greater the number of characters included in an
analysis, the more rigorously are those hypotheses tested. This is the
strength of simultaneous analysis.
4. In contrast, partitioned analysis is based upon the premise that there are
genuinely different classes of data that may reflect different evolutionary
processes. These data classes may thus be independent indicators of
relationships and need to be analysed separately. In this way, the results
of the separate analyses may provide tests of one another.
5. Classes of characters that evolve according to different rules are termed
process partitions. However, given the complexity of gene expression and
our meagre knowledge of evolutionary processes, it is very difficult to
justify where to draw up the partitions or to decide how fine they must
be.
6. Different data sets may vary in their levels of homoplasy and thus in their
ability to recover a phylogenetic signal. By keeping data sets separate,
this may be recognized. This is the strength of partitioned analysis.
7. It has been claimed that combining a data set with a small number of
characters with one that is much larger may lead to the swamping of its
phylogenetic signal. However, in practice, this has rarely been found to
occur.
8. Simultaneous analysis often produces a more resolved result than does
partitioned analysis. Thus, simultaneous analysis leads to greater
explanatory power.
9. Conditional data combination is based upon the premise that there may
be some occasions on which simultaneous analysis should be undertaken
and others when the partitioned approach may be preferable. The
heterogeneity in phylogenetic signal between data sets is calculated and if
this is greater than can be explained by sampling error, then the data sets
should be kept separate. Heterogeneity tests include the use of the
bootstrap and Templeton's test.
10. If data from two or more sources are unavailable for some taxa, then
combining data into a single matrix may lead to a large percentage of
missing values. However, in such situations, the missing values are
generally concentrated in one partition of the matrix and simulation tests have
shown that their disruptive effects are relatively minor.
9.
Three-item statements analysis
9.1 INTRODUCTION
De Pinna (1996) wrote that 'the most interesting idea in mainstream
theoretical systematics in recent years is the so-called three-item analysis'. This may
strike cladists as a curious statement, given that most commentary relating to
three-item statements analysis has been decidedly negative (Harvey 1992;
Kluge 1993, 1994; Farris et al. 1995). He went on to point out that a major
success of the cladistic approach was the recognition that
ancestor-descendent relationships among taxa cannot be objectively proposed or
tested, only sister-group relationships (see Chapter 1). The idea that one
taxon can 'give rise' to another is acknowledged to be beyond proper
scientific investigation. It is perhaps curious that cladists still treat characters
in a pre-cladistic way, as if one character state can give rise to another,
relating one state to another in ancestor-descendent fashion. This approach
seems embedded in the standard approach to character coding with the
recognition of 'transformation series' and the use of character optimization.
If ancestral taxa have been discarded from scientific enquiry then why not
ancestral characters? Both appear to be based on the absence of evidence
and the formalism of conventional cladistic approach.
All cladists agree that cladistics is about grouping by synapomorphy and
that synapomorphy is evidence of relationship. Three-item statements
analysis is an alternative way to code data based on the idea that 'taxon' and
'homology' represent the same relationship (Nelson 1994). It departs from
other approaches by focusing on the smallest possible unit of relationship, the
three-item statement, and how these fit most parsimoniously to possible
cladograms. In this sense, three-item statements analysis is an entirely
different way of viewing data. According to its creators (Nelson and Platnick 1991),
three-item statements analysis improves the precision of parsimony. This
chapter concentrates on the implementation of three-item statements
analysis and outlines some possible ways in which precision is improved.
9.2 CODING
Prior to analysis, systematic data (observations) are coded as series of binary
or multistate characters reflecting judgements of primary homology (see
Chapter 2). Using a four taxon example (A-D), if taxa C and D are observed
to have a feature, then they are usually coded as '1' and taxa A and B, which
lack that feature, are coded as '0'. The binary character incorporates an
element of 'identity' (the 1s) and an element of 'difference' (the 0s). Each
binary character is assumed to be an independent homology. For multistate
characters, the different states are assumed homologous among themselves
and as a consequence are non-independent (i.e. the 1s, 2s, etc., are
dependent). Multistate characters are often represented (or interpreted) as suites of
binary characters in a cladistic analysis. For the purposes of this chapter,
representation of data as binary or multistate variables will be referred to as
the standard approach.
Three-item statements analysis, in contrast, does not represent systematic
data as binary and multistate variables but reduces observations to their
simplest expression of relationship, a three-item statement. For example, the
three-item statement A(BC) implies that taxa B and C share a relationship to
the exclusion of taxon A. Suites of three-item statements can be arranged
into a statement x taxon matrix for analysis with a parsimony program, in an
identical fashion to standard binary and multistate data.
Using the same four taxon example (A-D) as above, in which C and D
possess a particular feature and A and B do not, two three-item statements
are possible: A(CD) and B(CD) (Fig. 9.1a-b). Addition of these two three-item
statements produces a summary cladogram (Fig. 9.1c), in which the two
three-item statements combine to unite C and D: A(CD) + B(CD) = AB(CD)
(Fig. 9.1d). This is identical to the summary cladogram from the binary
character, AB(CD), which unites C + D on the basis of a common possession
of state 1 (Fig. 9.1e).

Fig. 9.1 Different analytical representations of one binary character, AB(CD), for
which two three-item statements are possible: A(CD) and B(CD). (a) Diagrammatic
representation of the three-item statement A(CD). (b) Diagrammatic representation
of the three-item statement B(CD). (c) Diagrammatic representation of the solution to
A(CD) + B(CD) = AB(CD). (d) Written representation of the statements and solution in
(a-c). (e) Standard data matrix and its solution. (f) Three-item statements matrix and
its solution. Note that while this solution appears to be the same as (d), it is actually
a strict consensus tree of the most parsimonious solutions [...].
From this simple example, we can see that for primary representation of
data (the original observations), there is no difference between the three-item
statements and the standard approaches. The choice facing systematists rests
upon which aspect of the data they wish to represent from their original
observations. As a starting point, it is worth remembering that cladistics, in its
most general form, is concerned with hierarchical patterns, whether those
patterns express relationships among characters, taxa, genes or areas (Nelson
and Platnick 1981). In short, an hierarchical pattern does not imply a process
but expresses degrees of relationship. Cladistics is the study of relationships.
9.3 IMPLEMENTATION
The above example demonstrates that adding two three-item statements can
be achieved simply by hand. However, most data sets involve many
(sometimes very many, see below) three-item statements and computerized
methods then become necessary. Three-item statements can be represented in a
standard character x taxon matrix. However, if we consider our earlier
example (Fig. 9.1a,b), for the statement A(CD), there is no corresponding data
point for taxon B, while for statement B(CD), there is no data point for taxon
A. Nevertheless, current parsimony programs require that all cells in a data
matrix be filled with a value of some kind and so these 'data' points are
represented by question marks (Fig. 9.1f). The results then differ from those
expected, in that not one but three most parsimonious cladograms are found:
AB(CD), A(B(CD)) and B(A(CD)). The strict consensus tree of these three
solutions is shown in Fig. 9.1f (for further explanation, see §9.3.6). This aspect
of three-item statements analysis has been exploited in various critiques
(Harvey 1992, Kluge 1993, 1994). However, it should be noted that the
differences between the manual and computer solutions are due to current
idiosyncrasies of parsimony programs, especially their treatment of question
marks (see Chapter 4), and not to the form in which the data are represented
(Nelson and Ladiges 1993). It should be borne in mind that implementation
of a method involves issues separate from the reasons to adopt the method,
although both are connected.
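The strict consensus of the three cladograms named above can be checked with a small sketch. It assumes, for illustration only, that each cladogram is written as the set of groups it contains and that a strict consensus keeps just the groups found in every cladogram; the function name `strict_consensus` and this representation are hypothetical and are not taken from any parsimony program.

```python
def strict_consensus(trees):
    """Return the groups (clades) that occur in every tree.

    Each tree is a set of clades and each clade a frozenset of taxon names.
    The group of all taxa and the single terminals are left out.
    """
    trees = list(trees)
    shared = set(trees[0])
    for tree in trees[1:]:
        shared &= set(tree)  # keep only groups also present in this tree
    return shared


clade = frozenset
ab_cd = {clade("CD")}                 # AB(CD)
a_b_cd = {clade("CD"), clade("BCD")}  # A(B(CD))
b_a_cd = {clade("CD"), clade("ACD")}  # B(A(CD))

consensus = strict_consensus([ab_cd, a_b_cd, b_a_cd])
print(sorted("".join(sorted(c)) for c in consensus))  # prints ['CD']
```

Only the group CD survives, so the consensus has the same shape as AB(CD) even though it summarizes three different cladograms.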
9.3.1 Binary characters

The three-item statements equivalent to a binary character are determined by
comparing each pair of taxa that has the informative state with every taxon
that lacks that state. For example, for a character expressing the relationship
ABC(DE), there are three three-item statements: A(DE), B(DE) and C(DE).
Only taxa DE possess the informative state and as a pair are related to A,
B and C. The number of possible three-item statements is given by
(t - n)n(n - 1)/2, where t = the total number of taxa and n = the number of
taxa with the informative (apomorphic) state. For ABC(DE), n = 2 and t = 5,
hence (5 - 2)2(2 - 1)/2 = 3 statements. For a character expressing the
relationship AB(CDE) there are six three-item statements: A(CD), A(CE), A(DE),
B(CD), B(CE), and B(DE). Taxa CDE possess the informative state and
constitute three pairs, CD, DE, and CE, with each pair related to A and B.
Thus, n = 3 and t = 5, hence (5 - 3)3(3 - 1)/2 = 6 statements.
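The enumeration just described, and the formula for the number of statements, can be illustrated with a short sketch. The function names `three_item_statements` and `statement_count` and the dictionary coding of a character are assumptions made for this example only.

```python
from itertools import combinations


def three_item_statements(taxa, states):
    """List the three-item statements implied by one binary character.

    `states` maps each taxon to 1 (has the informative state) or 0 (lacks it).
    Each statement is written as a string such as "A(DE)".
    """
    with_state = [t for t in taxa if states[t] == 1]
    without_state = [t for t in taxa if states[t] == 0]
    # Pair every taxon lacking the state with every pair of taxa having it.
    return [f"{out}({a}{b})"
            for out in without_state
            for a, b in combinations(with_state, 2)]


def statement_count(t, n):
    """Number of statements expected from (t - n) n (n - 1) / 2."""
    return (t - n) * n * (n - 1) // 2


taxa = "ABCDE"
abc_de = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1}  # ABC(DE)
ab_cde = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 1}  # AB(CDE)

print(three_item_statements(taxa, abc_de), statement_count(5, 2))
print(three_item_statements(taxa, ab_cde), statement_count(5, 3))
```

For both characters the length of the enumerated list agrees with the formula: three statements for ABC(DE) and six for AB(CDE).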
9.3.2 Multistate characters
Multistate characters are usually disassembled into suites of binary characters
for the purposes of analysis (see Chapter 2). However, it is now recognized
that such re-coding may involve redundancy. From the perspective of
three-item statements analysis, a multistate character is equivalent to a suite of
unique three-item statements so that no statement appears more than once
in the suite (Nelson and Ladiges 1992). For example, an ordered multistate
character expressing the relationship A(B(CD)) has four three-item
statements, A(BC), A(BD), A(CD), and B(CD). Manually, A(B(CD)) can be
reduced to its basic components: A(BCD) and B(CD). A(BCD) has three
three-item statements, A(BC), A(BD) and A(CD), and B(CD) provides the
fourth. How does this differ from its binary representation? The multistate
character A(B(CD)) can be represented as two binary characters: A(BCD)
and AB(CD). As before, A(BCD) yields three statements, A(BC), A(BD) and
A(CD), while AB(CD) yields two statements, A(CD) and B(CD), giving a total
of five. The two binary characters have one more statement than the single
multistate character, because A(CD) occurs twice. Thus two binary characters
(understood as independent) appear to have more information than the
multistate character in which the states are seen as dependent. This points to
a possible difference in information content between a multistate character
and its binary equivalent (see below).
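The count of five statements for the two binary characters against four for the multistate character can be reproduced with a brief sketch. The helper `statements` and the representation of each binary component by the taxa sharing its informative state are assumptions made for illustration.

```python
from itertools import combinations


def statements(taxa, group):
    """Three-item statements for one component in which `group` shares a state."""
    outside = [t for t in taxa if t not in group]
    return [f"{out}({a}{b})"
            for out in outside
            for a, b in combinations(sorted(group), 2)]


taxa = "ABCD"
components = ["BCD", "CD"]  # A(B(CD)) recoded as A(BCD) and AB(CD)

# Two independent binary characters: every statement is counted.
as_binary = [s for group in components for s in statements(taxa, group)]

# One multistate character: each statement is counted once only.
as_multistate = sorted(set(as_binary))

print(len(as_binary), as_binary)
print(len(as_multistate), as_multistate)
```

The only statement lost in passing from the binary to the multistate reading is the second copy of A(CD).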
9.3.3 Representation of three-item statements for analysis with current parsimony programs
A three-item statement involves only three terminals, but current parsimony
programs require all cells of a matrix to be filled. For example, a three-item
statement pertinent to a 12-taxon problem will require nine terminals to be
ignored. This requirement is satisfied by the use of question marks in the
appropriate cells. The question mark (which is not a real value) should be
interpreted as truly meaning 'non-applicable', rather than the 'either/or' or
'polymorphic' interpretation that can also be attributed to a question mark
(see Chapter 4). This simple heuristic device has been persistently
misunderstood by critics (e.g. Harvey 1992, Kluge 1993, 1994). However, the use
of question marks may present problems in the resulting cladograms by producing
over-resolution of nodes that are not supported by data. Nelson (1992)
introduced the concept of 'minimal cladogram' (see below) with respect to
three-item data, which is similar to the strictly supported cladograms in
standard analyses (see §4.2).
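
As a minimal sketch of this padding, assume a standard binary matrix held as a Python dictionary, with 0 for the uninformative and 1 for the informative state. The function `three_item_matrix` and the 12-taxon character are hypothetical. Each statement becomes one column with three real values and question marks elsewhere:

```python
from itertools import combinations

def three_item_matrix(matrix):
    """Expand a standard binary matrix into three-item statement columns.

    matrix maps taxon -> string of '0'/'1' states. Each returned column is a
    (label, {taxon: state}) pair in which only three terminals carry real
    values; every other cell is '?', read here as 'non-applicable'.
    """
    taxa = sorted(matrix)
    n_chars = len(next(iter(matrix.values())))
    columns = []
    for c in range(n_chars):
        without = [t for t in taxa if matrix[t][c] == "0"]
        with_state = [t for t in taxa if matrix[t][c] == "1"]
        for x in without:
            for y, z in combinations(with_state, 2):
                col = {t: "?" for t in taxa}
                col[x], col[y], col[z] = "0", "1", "1"
                columns.append((f"{x}({y}{z})", col))
    return columns

# Hypothetical one-character matrix for 12 taxa: only K and L share state 1.
demo = {t: "0" for t in "ABCDEFGHIJ"}
demo.update({"K": "1", "L": "1"})
cols = three_item_matrix(demo)
label, col = cols[0]
print(len(cols), label, "".join(col[t] for t in sorted(col)))
# 10 A(KL) 0?????????11
```

In the first column nine of the twelve terminals are question marks, as in the example in the text.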
9.3.4 Cladogram length and three-item statements
Table 9.1a is a standard matrix for four taxa (A-D) and 10 binary characters
(1-10). Analysis of this matrix yields one cladogram (Fig. 9.2a; length = 13).
The three-item statements matrix for the same data is given in Table 9.1b,
analysis of which yields the same cladogram (Fig. 9.2a, length = 30). The
three-item statement 'characters' can be grouped into six sets of three-item
statements, with a total of 24 statements: A(CD) (×7), B(CD) (×3), A(BC) (×4),
A(BD) (×4), C(AB) (×3), D(AB) (×3). Of these, 18 are included in the resulting
cladogram, while six are excluded. The former are referred to as 'accommodated
three-item statements' (ATS) and the latter as 'non-accommodated three-item
statements' (NTS). Accommodated statements each fit to a node with a single
step and add one step to the cladogram length.

Table 9.1 (a) Standard matrix of four taxa (A-D) coded for 10 binary characters. (b) The corresponding three-item statements matrix. Statements have been arranged into six groups of equivalent statements. Characters 1-3 and 8-10 yield two three-item statements (a and b), while characters 4-7 yield three statements (a-c). See text for further details.

(a)
     1  2  3  4  5  6  7  8  9 10
A    0  0  0  0  0  0  0  1  1  1
B    0  0  0  1  1  1  1  1  1  1
C    1  1  1  1  1  1  1  0  0  0
D    1  1  1  1  1  1  1  0  0  0

(b)
     1  2  3  4  5  6  7    1  2  3    4  5  6  7    4  5  6  7    8  9 10    8  9 10
     a  a  a  a  a  a  a    b  b  b    b  b  b  b    c  c  c  c    a  a  a    b  b  b
A    0  0  0  0  0  0  0    ?  ?  ?    0  0  0  0    0  0  0  0    1  1  1    1  1  1
B    ?  ?  ?  ?  ?  ?  ?    0  0  0    1  1  1  1    1  1  1  1    1  1  1    1  1  1
C    1  1  1  1  1  1  1    1  1  1    1  1  1  1    ?  ?  ?  ?    0  0  0    ?  ?  ?
D    1  1  1  1  1  1  1    1  1  1    ?  ?  ?  ?    1  1  1  1    ?  ?  ?    0  0  0

Fig. 9.2 (a) Three-item solution for the matrix in Table 9.1b under uniform weighting. (b) Alternative cladogram found using fractional weighting of the same data.

Non-accommodated statements fit the cladogram twice and add two steps to the
length. Thus, the relationship between data (the suite of three-item
statements) and cladogram length is given by:

    Length = Σ ATS + 2(Σ NTS).

For the example in Table 9.1b, the accommodated statements are A(BC), A(BD),
A(CD) and B(CD), giving a total of 18, and the non-accommodated statements are
C(AB) and D(AB), which add a further 12 steps, giving an overall cladogram
length of 30, which is the length reported by parsimony programs.
The shortest cladograms accommodate the largest number of statements and are
hence selected as optimal. The use of 'steps' should not be confused with the
conventional understanding of that term, which is equivalent to a character
transformation. A three-item statement involves only three terminals, hence it
can only fit a node of a cladogram exactly (with one step) or inexactly (with
two steps). The essential difference in interpretation is that the length
statistic indicates 'fit' of data rather than 'character' change.
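
The length formula can be tried directly on the Table 9.1b statements. The sketch below is an illustration under stated assumptions: a cladogram is written as nested Python tuples, a statement X(YZ) is held as the three-letter string 'XYZ', and it is counted as accommodated when some group of the cladogram contains Y and Z but not X. None of these names comes from Hennig86 or PAUP.

```python
def clades(tree):
    """Return every group of a nested-tuple cladogram as a frozenset of taxa.

    Leaves are single-letter strings; internal nodes are tuples of subtrees.
    """
    found = []

    def walk(node):
        if isinstance(node, str):
            return frozenset(node)
        members = frozenset().union(*(walk(child) for child in node))
        found.append(members)
        return members

    walk(tree)
    return found

def accommodated(statement, groups):
    """X(YZ) fits if some group holds Y and Z but excludes X."""
    x, y, z = statement
    return any(y in g and z in g and x not in g for g in groups)

def length(counts, tree):
    """Return (ATS, NTS, length) with length = ATS + 2 * NTS."""
    groups = clades(tree)
    ats = sum(n for s, n in counts.items() if accommodated(s, groups))
    nts = sum(n for s, n in counts.items() if not accommodated(s, groups))
    return ats, nts, ats + 2 * nts

# Statement counts from Table 9.1b; "ACD" stands for A(CD), and so on.
counts = {"ACD": 7, "BCD": 3, "ABC": 4, "ABD": 4, "CAB": 3, "DAB": 3}
fig_9_2a = ("A", ("B", ("C", "D")))  # the cladogram A(B(CD))
print(length(counts, fig_9_2a))  # (18, 6, 30)
```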
9.3.5 Uniform and fractional weighting

It was noted above that, for binary characters, the equivalent number of
three-item statements is given by (t - n)n(n - 1)/2. However, when n > 2,
there may be redundancy among statements. Consider, for example, the binary
character expressing the relationship A(BCD), which has three statements,
A(BC), A(BD) and A(CD). Combination of any pair yields the original
relationship, A(BCD), which logically implies the third:

    A(BC) + A(BD) = A(BCD)
    A(BC) + A(CD) = A(BCD)
    A(BD) + A(CD) = A(BCD)

This suggests that, in some cases, not all statements are independent. In this
example, because only two of the three statements are required to reproduce
the original relationship, each can be weighted to have an absolute value of
two-thirds. Generally, the number of independent three-item statements is
(n - 1)(t - n) and the value (or weight) is the ratio of independent
statements to the total number of statements, or 2/n. Tables of values for
total and independent statements are provided by Nelson and Ladiges (1992,
tables 1-2), with corrections to two entries in their table 1 by Williams
(submitted).
Consider now another binary character expressing the relationship ABC(DE), for
which there are three statements, A(DE), B(DE) and C(DE). As in the first
example, there are three statements, but here all three are independent, so
their absolute value is 1. In other words, if any one of the three statements
is omitted, the other two cannot be combined to yield the correct result,
ABC(DE). For example, A(DE) and B(DE) sum to give only AB(DE); C is missing.
All three statements are independent. Finally, consider a third binary
character expressing the relationship AB(CDE), for which there are six
three-item statements: A(CD), A(CE), A(DE), B(CD), B(CE) and B(DE). The number
of independent statements is 4, each with an absolute value of two-thirds.
Inspection of the six statements reveals two subsets of three statements, one
with relationships relevant to A, the other with relationships relevant to B:
A(CD), A(CE), A(DE) and B(CD), B(CE), B(DE). Further inspection reveals that
within each subset, summing any two will produce a correct solution. Thus
A(CD) + A(CE) = A(CDE), A(CD) + A(DE) = A(CDE), and A(CE) + A(DE) = A(CDE), as
well as A(CD) + A(CE) + A(DE) = A(CDE). The same holds for the relationships
of B. Hence, in each subset, one statement is redundant, reducing the total
weight from 6 to 4.
It should be emphasized that elimination of redundant logically-dependent
statements pertains to the data as a whole, as no particular statement is or
should ever be excluded. All statements are relevant even if their value is
less under certain circumstances.
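
The counts quoted in these examples follow from the two expressions given above. A short Python check, reading t as the number of taxa and n as the number of taxa with the informative state (an interpretation that reproduces the worked examples; the function name is hypothetical and the sketch assumes t > n >= 2):

```python
from fractions import Fraction

def statement_counts(t, n):
    """Total statements, independent statements and the weight of each
    statement for one binary character: t taxa, n with the informative state."""
    total = (t - n) * n * (n - 1) // 2
    independent = (n - 1) * (t - n)
    weight = Fraction(independent, total)  # equals 2/n
    return total, independent, weight

for name, t, n in [("A(BCD)", 4, 3), ("ABC(DE)", 5, 2), ("AB(CDE)", 5, 3)]:
    total, independent, weight = statement_counts(t, n)
    print(name, total, independent, weight)
# A(BCD) 3 2 2/3
# ABC(DE) 3 3 1
# AB(CDE) 6 4 2/3
```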
To appreciate the effects of redundancy, consider again the example of
multistate character A(B(CD)), which has four statements, A(BC), A(BD), A(CD)
and B(CD). Its two derivative binary characters, A(BCD) and AB(CD), have five
statements with A(CD) occurring twice. Hence one A(CD) statement is redundant.
To remedy the situation, each of the three statements from A(BCD) can be
downweighted to two-thirds, reducing the total information content of the two
binary characters to four statements. Such downweighting is referred to as
'fractional weighting' (FW). Fractional weighting highlights the difference
between multistate characters and pairs of (congruent) binary characters in
terms of their information content and information redundancy. In the
multistate character A(B(CD)), all four statements are given equal weight (as
states are assumed dependent). In the pair of binary characters, the
statements for A(BCD) are dependent and hence have some logical redundancy but
are still independent of AB(CD) (Table 9.2). Three-item statement
representation identifies non-independence in the data, whereas the standard
approach does not. Redundancy thus remains with additive binary coding.
Fractional weighting is therefore more sensitive than uniform weighting to the
information in the data. It should be noted that the use of the word
'weighting' refers to relative information content and should not be construed
in the same terms as the 'weighting' procedures described in Chapter 5.

Table 9.2 Comparison of three-item representation for the multistate character A(B(CD)), its binary equivalents and the effects of fractional weighting on both kinds of characters.

           Uniform weighting     Fractional weighting    Multistate
           Binary    Binary      Binary    Binary
           A(BCD)    AB(CD)      A(BCD)    AB(CD)        A(B(CD))
A(BC)        1         -           ⅔         -              1
A(BD)        1         -           ⅔         -              1
A(CD)        1         1           ⅔         1              1
B(CD)        -         1           -         1              1
               3 + 2                 2 + 2                  4
Total            5                     4                    4

Consider once again the data in Table 9.1b. Three-item statements analysis
using 'uniform weighting' (UW), in which all statements are given equal (unit)
weight, yields one cladogram (Fig. 9.2a, length = 30; 18 ATS + 2 × 6 NTS = 30).
Using fractional weighting, the optimal cladogram(s) will be that which has the
greatest total weight, which need not be the same as the cladogram that
accommodates the greatest number of statements (Nelson 1993, Nelson and Ladiges
1994). If fractional weighting is applied, using Hennig86 (which requires
weights to have integer values between 0 and 10, see below), an additional
cladogram is found (Fig. 9.2b). Both cladograms have a length of 27. The
cladogram in Fig. 9.2a implies six three-item statements in various quantities,
totalling 20 after weighting. This is four fewer than for the UW matrix
(Table 9.3). Of these 20 statements, 15 are accommodated on the cladogram:
A(BC) × 3, A(BD) × 3, A(CD) × 6 and B(CD) × 3. A further six statements are not
accommodated: C(AB) × 3, D(AB) × 3 (= 6 statements). This gives a cladogram
length of 27 (= 15 + (2 × 6)). Likewise, for the cladogram in Fig. 9.2b, of
the 20 three-item statements, 15 are accommodated: C(AB) × 3, D(AB) × 3,
A(CD) × 6 and B(CD) × 3; and six statements are not: A(BC) × 3 and A(BD) × 3,
also giving a length of 27.
To understand the fractional weights, inspection of the data is required. In
Table 9.3, three statements contribute to the component A(BCD): A(BC), A(BD)
and A(CD). Of these, only two are required to yield the correct result, the
third being logically implied in each case. Therefore, a fractional weight of
two-thirds is applied to each statement. Effectively, the FW matrix is reduced
by one [...] the UW matrix. A(BCD) is included as redundant statements, which
constitutes spurious information.

Table 9.3 Comparison of the effects of uniform weighting (UW) and fractional weighting (FW) using the data in Table 9.1b and the cladograms in Fig. 9.2. Cladogram 1 = Fig. 9.2a; cladogram 2 = Fig. 9.2b. FW (no factor) = rounded fractional weight for use in PAUP without a correction factor (the true value, if different, is given in parentheses). FW (×3) = fractional weight after correction of the actual values by a factor of 3 to minimize the effects of rounding error, e.g. for statement A(BC), 2⅔ is multiplied by 3 to give a new weight of 8. D = accommodated three-item statement (ATS) (✓) or non-accommodated three-item statement (NTS) (X) on cladogram. Total = total statements from original data (column sums). Length of tree = sum of numbers of statements with a '✓' in column D + 2 × statements with an 'X' in column D. See text for further details.

             Cladogram 1                           Cladogram 2
Statement    UW   FW (no factor)   FW (×3)   D     UW   FW (no factor)   FW (×3)   D
A(BC)         4   3 (2⅔)              8      ✓      4   3 (2⅔)              8      X
A(BD)         4   3 (2⅔)              8      ✓      4   3 (2⅔)              8      X
A(CD)         7   6 (5⅔)             17      ✓      7   6 (5⅔)             17      ✓
B(CD)         3   3                   9      ✓      3   3                   9      ✓
C(AB)         3   3                   9      X      3   3                   9      ✓
D(AB)         3   3                   9      X      3   3                   9      ✓
Total        24   20 (20)            60      4     24   20 (20)            60      4
Length       30   27 (26)            78      4     32   27 (25⅓)           76      4

This example is also instructive in another matter relating to
implementation. Fractional weights are, by definition, fractions. However, all
currently available parsimony programs can only apply integer weights to
characters in a data matrix. Thus, the fractional weights must be rounded to
the nearest integer value. In the above example, this results in two equally
most parsimonious cladograms of length 27. However, if we carry out the
exercise manually using the fractions themselves, then we arrive at a length of
26 for the cladogram in Fig. 9.2a and only 25⅓ for that in Fig. 9.2b. We now
see that there is, in fact, only one most parsimonious cladogram, not two
(Table 9.3). The result of two equally most parsimonious solutions is an
artefact of the implementation. To circumvent this problem, fractional weights
should be multiplied by an appropriate correction factor to give accurate
integer values. In the present example, this is achieved by using a factor of
3 (Table 9.3). In practice, however, rounding may still be required to give
integer weights. For example, if a weight of 2⅔ is interpreted as 2.667, then
correction using a factor of 3 will give a weight of 8.001, which then needs to
be rounded to 8.
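
The rounding artefact can be reproduced by hand or with a few lines of Python. The sketch below assumes that each statement's fractional weight is the sum, over the characters that yield it, of 1 (characters 1-3 and 8-10) or 2/3 (characters 4-7), and that the two cladograms accommodate the statements listed in the text. The variable names are illustrative only.

```python
from fractions import Fraction

# Per statement: (copies carrying weight 1, copies carrying weight 2/3),
# counted from Table 9.1b.
copies = {"A(BC)": (0, 4), "A(BD)": (0, 4), "A(CD)": (3, 4),
          "B(CD)": (3, 0), "C(AB)": (3, 0), "D(AB)": (3, 0)}
exact = {s: full + Fraction(2, 3) * part for s, (full, part) in copies.items()}

# Statements accommodated by each cladogram, as listed in the text.
fits = {"Fig. 9.2a": {"A(BC)", "A(BD)", "A(CD)", "B(CD)"},
        "Fig. 9.2b": {"C(AB)", "D(AB)", "A(CD)", "B(CD)"}}

def length(weights, accommodated):
    """Weighted length: accommodated weight once, the rest twice."""
    return sum(w if s in accommodated else 2 * w for s, w in weights.items())

rounded = {s: round(w) for s, w in exact.items()}   # nearest integer
scaled = {s: int(w * 3) for s, w in exact.items()}  # correction factor of 3

for name, acc in fits.items():
    print(name, length(rounded, acc), length(exact, acc), length(scaled, acc))
# Fig. 9.2a 27 26 78
# Fig. 9.2b 27 76/3 76
```

With rounded weights the two cladograms tie at 27; with exact fractions (26 against 76/3, that is 25⅓) or with weights scaled by 3 (78 against 76) the cladogram of Fig. 9.2b is the shorter.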
9.3.6 Minimal cladograms

Analysis of a matrix of three-item statements may yield one or more
cladograms, some of which may not be minimal. A minimal cladogram is understood
here to be one that is not only minimal in length, but one in which all the
resolved nodes are supported by data. (For a discussion of similar situations
in standard analyses, see §4.2.) The use of question marks in any data matrix
may create problems in resulting cladograms by leading to over-resolution of
nodes not supported by data. Over-resolution of cladograms for three-item
statements analysis can be illustrated graphically by considering one standard
character. Out of seven taxa, six have the informative state, and thus there
are 15 three-item statements. Analysis of these 15 statements using Hennig86
gives 945 solutions, each fully resolved and of length 15. This is because
every statement is accommodated on every cladogram. Of the 945 solutions, there
is none in which all nodes (of which there are five) are fully supported by
data. The strict consensus tree of the 945 cladograms is also a minimal
cladogram of length 15 but has only one node supported by data, the node
uniting all six taxa with the informative state. This is the preferred
cladogram, not only because it is of minimum length, but also because it
includes no nodes that are unsupported by data. This observation applies to all
single binary characters represented as three-item statements where n > 2
(Nelson and Ladiges 1992). This is the reason why analysis of the matrix in
Fig. 9.1f by Hennig86 results in three cladograms, of which the strict
consensus is the preferred tree.
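
One way to see where these two numbers may come from, offered here as an interpretation rather than as the authors' derivation: the 15 statements follow from the formula for a binary character with t = 7 and n = 6, and 945 is the number of fully resolved rooted cladograms for six terminals, (2k - 3)!! with k = 6. A minimal Python check, with hypothetical function names:

```python
from math import comb

def n_statements(t, n):
    """Three-item statements from one binary character (t taxa, n informative)."""
    return (t - n) * comb(n, 2)

def n_resolved_rooted(k):
    """Number of fully resolved rooted cladograms for k terminals: (2k - 3)!!"""
    count = 1
    for odd in range(3, 2 * k - 2, 2):
        count *= odd
    return count

print(n_statements(7, 6))    # 15
print(n_resolved_rooted(6))  # 945
```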
In many three-item statements analyses, the minimal cladogram can be found by
constructing the strict consensus tree of all the most parsimonious cladograms,
providing it is the same length as those cladograms (the strict consensus tree
collapses nodes not supported by data). When the strict consensus tree is
longer than all the most parsimonious cladograms, it cannot be minimal. In
cases where the strict consensus tree is not minimal, one may inspect each most
parsimonious cladogram, collapsing nodes manually, and noting any change in
length. If a node can be collapsed with no change in cladogram length, then the
resultant less-resolved topology is considered preferable, because originally
the collapsed node had no support from the data.
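
The manual procedure can be sketched as follows. This is an illustration, not an algorithm taken from Hennig86, PAUP or NONA: a cladogram is held as a set of groups, a node is 'collapsed' by deleting its group, and length is counted as one step per accommodated statement and two otherwise. The example reuses the single character for seven taxa discussed above, with G as the hypothetical name of the taxon lacking the informative state.

```python
from itertools import combinations

def length(statements, groups):
    """One step per accommodated statement (x, y, z) = x(yz), two otherwise."""
    def fits(x, y, z):
        return any(y in g and z in g and x not in g for g in groups)
    return sum(1 if fits(*s) else 2 for s in statements)

def collapse_unsupported(statements, groups):
    """Delete, one at a time, every group whose removal leaves length unchanged."""
    groups = set(groups)
    target = length(statements, groups)
    changed = True
    while changed:
        changed = False
        for g in sorted(groups, key=len):
            if length(statements, groups - {g}) == target:
                groups.remove(g)
                changed = True
                break
    return groups

# One character: taxa A-F share the informative state, G lacks it.
statements = [("G", y, z) for y, z in combinations("ABCDEF", 2)]
# One fully resolved solution, G(((((AB)C)D)E)F), written as its five groups.
resolved = {frozenset("ABCDEF"[:k]) for k in range(2, 7)}
minimal = collapse_unsupported(statements, resolved)
print(length(statements, resolved), sorted("".join(sorted(g)) for g in minimal))
# 15 ['ABCDEF']
```

Only the node uniting the six taxa with the informative state survives, at the same length of 15.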
An alternative strategy is to use the parsimony program NONA, which eliminates
ambiguous optimizations due to the use of question marks. Results so far
indicate that NONA usually, but not always, finds the minimal cladogram(s) for
a three-item statements matrix more efficiently than manual manipulation.
9.3.7 Optimization
Because of their design, current parsimony programs may indicate ambiguous
optimizations for some statements at some nodes on a cladogram. The programs
treat every statement (every column in the matrix) as a conventional character.
That is, they must assign a value to each character on every node of the
cladogram despite the statement having only three real values. Consider the
example in Fig. 9.3. If the statement A(CD) is optimized onto the cladogram
A(B(C(DE))), the programs will assign 0/1 to node X and 1 to node Y
(Fig. 9.3a). The assignment of 0/1 to node X is irrelevant to the understanding
of the statement's purpose. All that matters is that the statement fits node Y
exactly (Fig. 9.3b). Note that a three-item statement is read on a cladogram in
the form 'C and D are more closely related to each other than either is to A',
not as in the standard approach, which sees nodes as possible (ancestral)
transformations of one character (or state) into another. This point was
misunderstood by Farris et al. (1995) who, in their examples, count three-item
statements as if they are optimized characters. The assignment at node X in
Fig. 9.3a is due to the programs treating question marks as 'potential' data
when, in the case of three-item statements, they are no such thing (the
standard approach also has problems; see §4.2). The main point is that despite
the default optimization of values, 'Hennig86 (and PAUP) efficiently implements
three-item analysis because tree length, if not optimization, is exact' (Nelson
1993) (our emphasis). Cladogram length still reflects optimality accurately.

Fig. 9.3 'Optimization' of the three-item statement A(CD) onto the cladogram A(B(C(DE))). (a) Parsimony computer programs require all cells of a data matrix to be filled and attempt to optimize the 'condition' in taxon B as either 0 or 1, even though this is nonsensical. (b) The correct 'optimization' of the three-item statement A(CD), in which taxon B is irrelevant. The statement A(CD) should be read as 'C and D are more closely related to each other than either is to A' and not in the form of the standard approach, which sees nodes as possible (ancestral) 'transformations' of one character (or state) into another.
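
The behaviour described here can be imitated by brute force. The sketch below is an illustration under assumptions: the root is given the state of A (0), node X is taken to unite B, C, D and E, node Y to unite C, D and E, and Z (a name introduced here) to unite D and E. Every assignment of 0 or 1 to the internal nodes and to the two '?' terminals is tried, and the states found at X and Y among the shortest assignments are reported.

```python
from itertools import product

# Cladogram A(B(C(DE))) and the statement A(CD): A=0, C=1, D=1, B=?, E=?.
edges = [("root", "X"), ("X", "B"), ("X", "Y"), ("Y", "C"),
         ("Y", "Z"), ("Z", "D"), ("Z", "E")]
fixed = {"root": 0, "C": 1, "D": 1}  # root carries the state of A
free = ["X", "Y", "Z", "B", "E"]     # internal nodes plus the '?' terminals

best, optimal = None, []
for values in product((0, 1), repeat=len(free)):
    state = dict(fixed, **dict(zip(free, values)))
    steps = sum(state[a] != state[b] for a, b in edges)
    if best is None or steps < best:
        best, optimal = steps, [state]
    elif steps == best:
        optimal.append(state)

for node in ("X", "Y"):
    print(node, sorted({s[node] for s in optimal}))
print("length", best)
# X [0, 1]
# Y [1]
# length 1
```

The state at X is ambiguous only because the question mark in B is free to take either value; the length of one step is unaffected.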
9.3.8 Information measures: CI and RI
A three-item statement involves only three terminals and therefore will either
fit to a node on a cladogram or not. In other words, a three-item statement
will display either one step and have a ci of 1.00, or two steps and have a ci
of 0.50. Hence CI is not a useful measure as it simply distinguishes the fit to
a node of each statement.

For ri values, the situation is somewhat different. Platnick (1993) drew
attention to how three-item statements differ from binary characters in their
fit to cladograms, by noting the performance of a suite of three-item
statements on a series of different (specified) topologies (Table 9.4). The
topologies of the cladograms are not relevant to this discussion. Note that RI
reflects the number of accepted (accommodated) three-item statements as a
fraction of the total number of three-item statements considered. For example,
cladogram 1 accepts 65 statements as true, hence 65/135 = 0.48; cladograms 2
and 3 accept 54 statements as true, hence 54/135 = 0.40; cladogram 4 accepts no
statements as true, hence 0/135 = 0; and cladogram 5 accepts all 135 statements
as true, hence 135/135 = 1. Therefore RI would appear to be a useful measure
for the amount of fit for each possible cladogram.

Table 9.4 Data for five cladograms discussed by Platnick (1993). Each cladogram is tabulated for the number of accepted (accommodated) and prohibited (non-accommodated) statements, together with some more conventional statistics.

               Number of statements
               Accepts     Prohibits
               ('True')    ('False')    Steps    Nodes    Length    RI
Cladogram 1       65           70         6        12       205      48
Cladogram 2       54           81         2        11       216      40
Cladogram 3       54           81         2         4       216      40
Cladogram 4        0          135       [...]    [...]      270       0
Cladogram 5      135            0       [...]    [...]      135     100

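
The arithmetic behind these values, together with the corresponding lengths from the formula in §9.3.4, can be laid out in a few lines of Python (a sketch; the function name is hypothetical and the total of 135 statements is taken from the text):

```python
def fit_measures(accepted, total):
    """Length and RI for a suite of three-item statements on one cladogram,
    taking RI as accepted/total, as described in the text."""
    prohibited = total - accepted
    return accepted + 2 * prohibited, round(accepted / total, 2)

for cladogram, accepted in [(1, 65), (2, 54), (3, 54), (4, 0), (5, 135)]:
    print(cladogram, *fit_measures(accepted, 135))
# 1 205 0.48
# 2 216 0.4
# 3 216 0.4
# 4 270 0.0
# 5 135 1.0
```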
9.3.9 Summary of implementation procedures
Implementation of three-item statements analysis can be executed in a manner
similar to standard character analysis but requires the same attention to
detail. This is because all currently available parsimony programs were
designed with a different purpose in mind. Nevertheless, to repeat, 'Hennig86
(and PAUP) efficiently implements three-item analysis because tree length, if
not optimization, is exact' (Nelson 1993).

Improvements in precision in three-item statements analysis come from three
sources. First, attention must be given to redundancy in the data and
fractional weights should be used. Second, the scaling of weights must be
appropriate to avoid oversimplifying the effect of integer weights currently
required by parsimony programs. Third, the final cladogram(s) should be minimal
with respect first to length and then to the number of nodes supported by data.

Platnick et al. (1991a) suggested that the best cladogram for available data
should satisfy 'the criteria of parsimony, relative informativeness of
characters, and maximum resolution of taxa'. In both three-item statements
analysis and the standard approach, parsimony is the principle used to fit data
to a cladogram and fully resolved cladograms are the objective. It would seem
that 'maximum resolution of characters' is an issue that has only recently been
explored. The relative informativeness of characters differs depending on the
way data are represented, and this may be the major distinguishing factor
between the standard and three-item approaches (see also §4.2).
9.4 PRECISION
When Nelson and Platnick (1991) first proposed the use of three-item statements
to analyse systematic data, they suggested that it might improve the precision
of parsimony (sensu Farris 1983). They presented results from the analysis of
several hypothetical and one real data matrix (from Carpenter 1988) contrasting
the results of three-item and standard analyses. Their results showed that
three-item statements analysis sometimes produced fewer cladograms, sometimes
more cladograms, and sometimes different cladograms compared with the standard
approach. What, then, is the meaning of 'more precise'?

The suggestions of Platnick et al. (1991a) once again provide a valuable means
of understanding the issues facing cladistic practice. The best cladogram for
available data should satisfy 'the criteria of parsimony, relative
informativeness of characters, and maximum resolution of taxa'. As stated
above, both three-item statements analysis and the standard approach use
parsimony to organise the data. Both approaches also attempt to gain 'maximum
resolution of taxa', such that all nodes are fully supported by data. The only
factor that can differ between the two approaches is the 'relative
informativeness of characters' or, perhaps more accurately, the relative
informativeness of observations.

Table 9.5 (a) Matrix of conflicting characters for four taxa (A-D) and an all-zero root (O), coded for three characters 1-3. (b) The equivalent three-item statements matrix. [After Nelson 1996]

(a)
     1  2  3
O    0  0  0
A    0  0  0
B    0  1  1
C    1  0  1
D    1  1  0

(b)
     1     2     3
     a  b  a  b  a  b
O    0  0  0  0  0  0
A    0  ?  0  ?  0  ?
B    ?  0  1  1  1  1
C    1  1  ?  0  1  1
D    1  1  1  1  ?  0

Nelson (1996) provided a series of examples that demonstrated improved
precision by greater resolution of character data. Nelson analysed a series of
data sets with four to seven taxa. Taxon A has all plesiomorphic states (0s),
while the other taxa, B-n (where n = CD, CDE, up to CDEFGHI), have different
combinations of conflicting apomorphic states (1s). Each matrix was analysed
with an all-zero root, O, as the focus of interest was the relationships among
taxa A-n. Taxon A is not an outgroup but part of the problem requiring
solution.
For example, standard analysis of the data in Table 9.5a yields six equally
most parsimonious cladograms, the strict consensus tree of which is an
uninformative bush. This result suggests that there is no overall information
in this matrix, or at least that the information has maximum conflict. If the
same data are represented as a suite of three-item statements (Table 9.5b),
three cladograms result, the strict consensus tree of which is A(BCD). This
result suggests that there is information in this matrix to relate B + C + D as
a group relative to A.
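
The three-item result can be checked by exhaustive search, since there are only 15 fully resolved rooted cladograms for A-D. The sketch below assumes that the all-zero root O simply roots the cladograms, holds each cladogram as a set of groups, and counts one step for an accommodated statement and two otherwise. It is an illustration, not the procedure of any parsimony program.

```python
from itertools import combinations

def resolved_trees(taxa):
    """Yield every fully resolved rooted tree on taxa as a frozenset of groups."""
    taxa = tuple(taxa)
    if len(taxa) == 1:
        yield frozenset()
        return
    first, rest = taxa[0], taxa[1:]
    for k in range(len(rest)):
        for picked in combinations(rest, k):
            left = (first,) + picked
            right = tuple(t for t in rest if t not in picked)
            for lt in resolved_trees(left):
                for rt in resolved_trees(right):
                    yield lt | rt | {frozenset(taxa)}

def length(statements, groups):
    """One step per accommodated statement 'XYZ' = X(YZ), two otherwise."""
    def fits(x, y, z):
        return any(y in g and z in g and x not in g for g in groups)
    return sum(1 if fits(*s) else 2 for s in statements)

# Table 9.5b: characters 1-3 unite CD, BD and BC; each gives two statements.
statements = ["ACD", "BCD", "ABD", "CBD", "ABC", "DBC"]
scored = [(length(statements, t), t) for t in resolved_trees("ABCD")]
shortest = min(score for score, _ in scored)
best = [t for score, t in scored if score == shortest]
strict = frozenset.intersection(*best)  # groups present in every best tree
print(len(best), shortest, sorted("".join(sorted(g)) for g in strict))
# 3 8 ['ABCD', 'BCD']
```

Three cladograms tie at the minimum length, and the only group they share apart from the whole set of taxa is B + C + D, that is, A(BCD).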
Nelson (1996) analysed 120 matrices, in which there were conflicting characters
in three (BCD) to eight (B-I) taxa, using the standard approach (although the
series could be extended indefinitely). The results from 96 of these matrices
included the group B-n relative to A, while 24 did not. In other words, over
three-quarters of the matrices yielded a resolved node, which is sufficient to
suspect that matrices of this kind are generally informative. Three-item
statements analysis of the same 120 matrices always yields a resolved node.
A glance at the data in Table 9.5 seems to imply that while the relationships
among B-D cannot be specified (there is significant conflict), they do indeed
share a closer relationship to each other than they do to taxon A. Three-item
statements analysis does seem to improve precision in cases with this kind of
conflict (see also Siebert and Williams 1997). Thus, it would seem that for the
standard approach, a taxon that lacks the attribute of a larger group (B-D),
while retaining information relevant to relationships within the group,
collapses the cladogram entirely (a point noted earlier by Nelson and Platnick
1980). This suggests that in order to understand the relationships of a
subgroup, one needs to have established the relationships of the larger group
(Nelson 1996). Finally, these results suggest that the behaviour of the
standard approach to certain kinds of conflict might be a consequence of a
particular 'model' of character evolution underlying its implementation, a
model 'that requires synapomorphy (evidence of relationship) to have a unique
origin (optimized as 1 at a node with distal 0s as reversal(s), or vice versa)'
(Nelson 1992; also 1996).
Platnick et al. (1996) suggested a second demonstration of increased
precision. Using a simple example of one binary character, distributed among
four taxa (A-D) such that C and D share the apomorphic state and A and B share
the plesiomorphic state, standard analysis yields the cladogram AB(CD).
Platnick et al. (1996) suggested that inspection of all 26 possible cladograms
for the four taxa (Fig. 9.4) would be instructive, because this would give an
indication of how much 'worse' the others were compared with the correct
solution. When the binary character AB(CD) is fitted to the cladograms, these
can be divided into two series. Four cladograms (Fig. 9.4a-c and y) have one
step, while the remaining 22 have two steps. Examination of the cladograms with
two steps implies that, among other things, completely incorrect solutions
(e.g. Fig. 9.4g-j or Fig. 9.4l-o) are as good (or as bad) as some partially
correct solutions (e.g. Fig. 9.4d-e) or even the totally unresolved bush
(Fig. 9.4z). This point had been made earlier by Platnick (1989: 23) in a
different context.
The binary character AB(CD) yields two three-item statements, A(CD) and B(CD),
and three-item statements analysis yields the cladogram AB(CD). If these
statements are also fitted to all 26 possible solutions, three series are
obtained, rather than two. The three-item statements solutions comprise four
cladograms with two steps (Fig. 9.4a-c and y), six with 3 steps (Fig. 9.4d-f,
k, r-s) and 16 with 4 steps. In this context, three-item statements analysis is
more 'precise', in the sense that it partitions all the possible solutions in a
more efficient way than does the standard approach.
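
Both partitions can be recovered by enumerating the 26 cladograms. In the Python sketch below (illustrative names throughout) a cladogram is a set of groups, polytomies included. The standard character is scored with a shortcut that assumes an all-zero root: one step if C + D is a group, two otherwise. Each three-item statement costs one step if accommodated and two if not.

```python
from collections import Counter

def set_partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition

def cladograms(taxa):
    """Yield every rooted cladogram (polytomies allowed) as a frozenset of groups."""
    taxa = list(taxa)
    if len(taxa) == 1:
        yield frozenset()
        return
    for partition in set_partitions(taxa):
        if len(partition) < 2:
            continue  # the root must have at least two branches
        options = [frozenset()]
        for block in partition:
            options = [done | sub for done in options for sub in cladograms(block)]
        for groups in options:
            yield groups | {frozenset(taxa)}

def fits(x, y, z, groups):
    """X(YZ) is accommodated if some group holds Y and Z but not X."""
    return any(y in g and z in g and x not in g for g in groups)

binary, three_item = Counter(), Counter()
for groups in cladograms("ABCD"):
    binary[1 if frozenset("CD") in groups else 2] += 1
    three_item[sum(1 if fits(x, "C", "D", groups) else 2 for x in "AB")] += 1

print(sorted(binary.items()))      # [(1, 4), (2, 22)]
print(sorted(three_item.items()))  # [(2, 4), (3, 6), (4, 16)]
```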
What, then, is the significance of such partitioning, given that all cladograms
with a length of 3 are sub-optimal relative to the most parsimonious, minimal
cladogram (Fig. 9.4y)? The significance lies in the value of different
cladograms as data accumulate. For example, in the standard approach, the
cladograms in Fig. 9.4k and Fig. 9.4l explain none of the data. However, using
three-item statements, the former cladogram includes one statement, A(CD), and
hence explains at least some of the data. As Platnick et al. (1996) pointed
out, switching from Fig. 9.4l to Fig. 9.4k should not be a 'zero-cost' option.
The value of the three-item statements approach in this context is that it is
likely to be more sensitive to the accumulation of further data than is the
standard approach.

Fig. 9.4 (a-z) The 26 possible topologies for four taxa, A-D. Two lengths are given for each cladogram. The upper figure relates to the binary character, AB(CD); the lower to its two constituent three-item statements, A(CD) and B(CD). The four cladograms enclosed within single-line boxes (a-c, y) accommodate the binary character and both three-item statements with a single step each and thus explain all of the data. The binary character fits to all other cladograms with two steps, which thus appear to explain none of the data. However, the six cladograms enclosed within double-line boxes (d-f, k, r-s) do accommodate one of the three-item statements with one step and therefore explain at least part of the data. Thus, although suboptimal, they are preferred to all of the other remaining cladograms (g-j, l-q, t-x, z), which require two steps to accommodate each three-item statement.

There is a further point that seems significant for coding protocols. The
standard approach treats the binary character AB(CD) as a feature relating C
and D together with respect to A and B. It says nothing about the relationships
of C and D relative to A or B. The standard character is restricted to treating
C + D as a group, with plesiomorphic states as uninformative with respect to
relationships. Under certain circumstances, the standard approach will treat
A + B as the 'group' rather than C + D, with the plesiomorphy (0) interpreted
as 'secondary' or a 'reversal'. This is different from crude phenetic grouping
by symplesiomorphy but nevertheless constitutes a version of 'grouping by
plesiomorphy' (de Pinna 1996). Platnick et al. (1996) went further, wondering
'how long it will take for systematists to realise that allowing the '0'
entries for taxa A and B to constrain potential resolutions can also give this
sort of 'negative evidence' more weight than it deserves, in at least some
circumstances'. It would also seem that the notion of character reversal
belongs to the realm of 'trees' rather than cladograms and thus again
constitutes a kind of model (see Chapter 1).
Of further interest is that the fully resolved solutions selected by the
standard approach as shortest (Fig. 9.4a-c) correspond to the 'Interpretation
1' solutions, originally proposed by Nelson and Platnick (1980) for dealing
with potential resolution of basal trichotomies. In contrast, the fully
resolved solutions selected by the three-item statements approach as shortest
(Fig. 9.4a-f and k) correspond to the 'Interpretation 2' solutions proposed by
Nelson and Platnick (1980), where the close relationship of C and D is
maintained, even though A or B are more closely related to either C or D. (The
additional solutions with terminal trichotomies simply represent summaries of
the resolved cladograms. The cladogram in Fig. 9.4r is a summary of the fully
resolved cladograms in Fig. 9.4c-e, and the cladogram in Fig. 9.4s is a summary
of the fully resolved cladograms in Fig. 9.4a, f and k.) Finally,
Interpretations 1 and 2 bear some resemblance to Assumptions 1 and 2 in
biogeography, while 'secondary' symplesiomorphy, 'reversals', and
plesiomorphies as 'potentially informative' have a certain amount of
similarity with Assumption 0 of biogeography, a somewhat questionable protocol
(Humphries and Parenti in press).
It is worth recalling that three-item statements analysis began in
biogeography. Biogeography, in its modern cladistic form, deals with cladograms
of areas (for a recent summary, see Humphries and Parenti in press). A more
general question with respect to cladistics might be: is there an empirical way
of dealing with all branching diagrams, regardless of the 'kind' of data? Is
there a general theory of cladograms and hence a general theory of systematics?
The first analytical explorations of this question began, perhaps, with Nelson
and Platnick (1981), a much neglected but still highly relevant and fundamental
book. We may possibly question any direct analogy between systematics and
biogeography. However, one understanding of the differences seems to reside in
how characters are viewed, and one possible resolution may lie in rejecting
characters as 'transformation series' (ancestor-descendent sequences) in favour
of characters as statements of relationship (A is more closely related to B
than it is to C).
9.5 CHAPTER SUMMARY
1. 'Cladistics' is grouping by synapomorphy, where synapomorphy is
understood as evidence of relationship. Three-item statements analysis is an
alternative way to code data based on the idea that 'taxon' and
'homology' represent the same relationship.
2. Three-item statements analysis reduces data to their simplest expression
of relationship, the three-item statement. With four taxa (A-D), where C
and D possess a particular feature and A and B do not, there are two
three-item statements: A(CD) and B(CD).
3. A matrix of three-item statements can be analysed using current
parsimony programs to find the best fitting cladogram.
4. The relationship between cladogram length and data (the suite of
three-item statements) is given by: length = ATS (accommodated three-item
statements) + 2 × NTS (non-accommodated three-item statements).
Length accurately reflects the 'best' cladogram because 'Hennig86 (and
PAUP) efficiently implements three-item statements analysis because
tree length, if not optimization, is exact' (Nelson 1993).
5. Some binary characters may have redundant information. For example,
the character representing the relationship A(BCD) has three
statements, A(BC), A(BD) and A(CD), any pair of which in combination will
yield the same result and thus logically imply the third. This suggests that
in some cases not all statements are independent. In this example, as any
two of the three statements derived from A(BCD) produce the same
result, each can be weighted accordingly. Such downweighting is called
fractional weighting.
6. Analysis of a matrix of three-item statements may yield one or more
cladograms [...]. A minimal cladogram is understood as one that
has all resolved nodes
7. A statement should be read from a cladogram in the form 'C and D are
more closely related to each other than either is to A', and not in the
form of the standard approach, which sees nodes as possible (ancestral)
'transformations' of one character (or state) into another. Standard
optimization is irrelevant.
8. When fitted to a cladogram, a statement will have a ci of either 1.00
(when a statement fits a particular cladogram) or 0.50 (when it does not).
Hence CI is not a useful overall measure of fit as it simply distinguishes
the fit to a node of each statement.
9. The number of accepted (accommodated) three-item statements as a
fraction of the total number of statements considered is reflected by the
RI value. Therefore, RI might be a useful measure for the amount of fit
for each possible cladogram.
10. To improve precision in three-item statements analysis, attention must be
given to redundancy in the data and fractional weights used. Weights
should be scaled appropriately, in order to avoid introducing errors due
to the requirement of current computer programs for integer weights. The
final cladogram(s) should be minimal with all nodes supported by data.
11. One understanding of the difference between the standard approach and
the three-item approach resides in how character data are viewed: as
'transformation series' (ancestor-descendent sequences) or as statements
of relationships (A is more closely related to B than it is to C).
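
Points 2, 4, 5 and 9 of this summary can be illustrated with a short sketch in Python. It is offered only as an illustration under stated assumptions, not as the procedure implemented in Hennig86 or PAUP. The function names, the nested-tuple representation of a cladogram, and the reading of 'accommodated' as 'some group on the cladogram contains Y and Z but not X' are all assumptions made here. The fractional weight of 2/n (independent statements divided by all statements, for n taxa sharing the feature) follows our reading of Nelson and Ladiges (1992) and reproduces the two-out-of-three case described in point 5. The fit measure is simply ATS divided by the total number of statements, which is what RI reduces to if each statement is assumed to need at least one and at most two steps.

```python
from fractions import Fraction
from itertools import combinations


def three_item_statements(with_feature, without_feature):
    """List every statement X(YZ): Y and Z share the feature, X lacks it."""
    return [(x, frozenset(pair))
            for x in sorted(without_feature)
            for pair in combinations(sorted(with_feature), 2)]


def fractional_weight(with_feature):
    """Assumed rule: independent statements / all statements = 2/n (n >= 2)."""
    return Fraction(2, len(with_feature))


def groups(tree):
    """Return (all leaves, leaf sets of every internal node) of a nested tuple."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_groups = groups(child)
        leaves |= child_leaves
        found.extend(child_groups)
    found.append(leaves)
    return leaves, found


def accommodated(statement, tree):
    """True if some group on the cladogram holds Y and Z but excludes X."""
    x, pair = statement
    return any(pair <= g and x not in g for g in groups(tree)[1])


def fit(statements, tree):
    """Return (length, ATS, NTS, ATS/total), with length = ATS + 2 * NTS."""
    ats = sum(1 for s in statements if accommodated(s, tree))
    nts = len(statements) - ats
    return ats + 2 * nts, ats, nts, Fraction(ats, len(statements))


# Point 2: C and D possess the feature, A and B do not.
statements = three_item_statements({"C", "D"}, {"A", "B"})
print(fit(statements, ("A", ("B", ("C", "D")))))  # both statements fit
print(fit(statements, ("D", ("C", ("A", "B")))))  # neither statement fits

# Point 5: A(BCD) yields three statements, each weighted 2/3.
print(len(three_item_statements({"B", "C", "D"}, {"A"})),
      fractional_weight({"B", "C", "D"}))
```

On the first cladogram both statements are accommodated, so the length is 2; on the second neither is, so the length is 4. A parsimony program would arrive at the same lengths from the corresponding matrix columns, provided the cladogram is rooted on an all-zero outgroup.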
References
References cited
Adams, E. N. (1972). Consensus techniques and the comparison of taxonomic trees.
Systematic Zoology, 21, 390-7.
Alberch, P. (1985). Problems with the interpretation of developmental sequences.
Systematic Zoology, 34, 46-58.
Allard, M. W. and Carpenter, J. M. (1996). On weighting and congruence. Cladistics,
12, 183-98.
Almeida, M. T. and Bisby, F. A. (1984). A simple method for establishing taxonomic
characters from measurement data. Taxon, 33, 405-9.
Anderberg, A. and Tehler, A. (1990). Consensus trees, a necessity in taxonomic
practice. Cladistics, 6, 399-402.
Archie, J. W. (1985). Methods for coding variable morphological features for
numerical taxonomic analysis. Systematic Zoology, 34, 326-45.
Archie, J. W. and Felsenstein, J. (1993). The number of evolutionary steps on random
and minimum length trees for random evolutionary data. Journal of Theoretical
Biology, 45, 52-79.
Baer, K. E. von (1828). Ueber Entwickelungsgeschichte der Thiere: Beobachtung und
Reflexion, Theil 1. Gebrüder Bornträger, Königsberg.
Barthélemy, J.-P. and McMorris, F. R. (1986). The median procedure for n-trees.
Journal of Classification, 3, 329-34.
Barthélemy, J.-P. and Monjardet, B. (1981). The median procedure in cluster analysis
and social choice theory. Mathematical Social Sciences, 1, 235-67.
Baum, B. R. (1988). A simple procedure for establishing discrete characters from
measurement data, applicable to cladistics. Taxon, 37, 63-70.
Begle, D. P. (1991). Relationships of the osmeroid fishes and the use of reductive
characters in phylogenetic analysis. Systematic Zoology, 40, 33-53.
Bremer, K. (1988). The limits of amino-acid sequence data in angiosperm phylogenetic
reconstruction. Evolution, 42, 795-803.
Bremer, K. (1990). Combinable component consensus. Cladistics, 6, 369-72.
Bremer, K. (1994). Branch support and tree stability. Cladistics, 10, 295-304.
Brower, A. V. Z. and Schawaroch, V. (1996). Three steps to homology assessment.
Cladistics, 12, 265-72.
Bryant, H. N. (1992). The role of permutation tail probability tests in phylogenetic
systematics. Systematic Biology, 41, 258-63.
Bryant, H. N. (1995). Why autapomorphies should be removed: a reply to Yeates.
Cladistics, 11, 381-4.
Bull, J. J., Huelsenbeck, J. P., Cunningham, C. W., Swofford, D. L. and Waddell, P. J.
(1993). Partitioning and combining data in phylogenetic analysis. Systematic
Biology, 42, 384-97.
Cain, A. J. and Harrison, G. A. (1958). An analysis of the taxonomist's judgement of
affinity. Proceedings of the Zoological Society of London, 131, 85-98.
Camin, J. H. and Sokal, R. R. (1965). A method for deducing branching sequences in
phylogeny. Evolution, 19, 311-26.
Carpenter, J. M. (1988). Choosing among multiple equally parsimonious cladograms.
Cladistics, 4, 291-6.
Carpenter, J. M. (1992). Random cladistics. Cladistics, 8, 147-53.
Carpenter, J. M. (1996). Uninformative bootstrapping. Cladistics, 12, 177-81.
Chappill, J. A. (1989). Quantitative characters in phylogenetic analysis. Cladistics, 5,
217-34.
Coddington, J. and Scharff, N. (1994). Problems with zero-length branches. Cladistics,
10, 415-23.
Colless, D. B. (1980). Congruence between morphological and allozyme data for
Menidia species: a reappraisal. Systematic Zoology, 29, 288-99.
Cranston, P. S. and Humphries, C. J. (1988). Cladistics and computers: a chironomid
conundrum. Cladistics, 4, 72-92.
Davis, J. I. (1993). Character removal as a means for assessing the stability of clades.
Cladistics, 9, 201-10.
DeBry, R. W. and Slade, N. A. (1985). Cladistic analysis of restriction endonuclease
cleavage maps within a maximum-likelihood framework. Systematic Zoology, 34,
21-34.
de Queiroz, K. (1985). The ontogenetic method for determining character polarity and
its relevance to phylogenetic systematics. Systematic Zoology, 34, 280-99.
de Queiroz, K. (1993). For consensus (sometimes). Systematic Biology, 42, 368-72.
Devereux, R., Loeblich, A. R. and Fox, G. E. (1990). Higher plant origins and the
phylogeny of green algae. Journal of Molecular Evolution, 31, 18-24.
Dixon, M. T. and Hillis, D. M. (1993). Ribosomal RNA secondary structure:
compensatory mutations and implications for phylogenetic analysis. Molecular
Biology and Evolution, 10, 256-67.
Donoghue, M. J., Olmstead, R. G., Smith, J. F. and Palmer, J. D. (1992). Phylogenetic
relationships of Dipsacales based on rbcL sequences. Annals of the Missouri
Botanical Garden, 79, 672-85.
Eernisse, D. J. and Kluge, A. G. (1993). Taxonomic congruence versus total evidence,
and amniote phylogeny inferred from fossils, molecules, and morphology.
Molecular Biology and Evolution, 10, 1170-95.
Eldredge, N. (1979). Alternative approaches to evolutionary theory. Bulletin of the
Carnegie Museum of Natural History, 13, 7-19.
Faith, D. P. (1991). Cladistic permutation tests for monophyly and non-monophyly.
Systematic Zoology, 40, 366-75.
Faith, D. P. and Cranston, P. S. (1991). Could a cladogram this short have arisen by
chance alone? - On permutation tests for cladistic structure. Cladistics, 7, 1-28.
Farris, J. S. (1969). A successive approximations approach to character weighting.
Systematic Zoology, 18, 374-85.
Farris, J. S. (1970). Methods for computing Wagner trees. Systematic Zoology, 19,
83-92.
Farris, J. S. (1971). The hypothesis of nonspecificity and taxonomic congruence.
Annual Review of Ecology and Systematics, 2, 277-302.
Farris, J. S. (1972). Estimating phylogenetic trees from distance matrices. American
Naturalist, 106, 645-68.
Farris, J. S. (1983). The logical basis of phylogenetic analysis. Advances in Cladistics, 2,
1-36.
Farris, J. S. (1988). Hennig86, version 1.5. Program and documentation. Port Jefferson
Station, New York.
Farris, J. S. (1989). The retention index and the rescaled consistency index. Cladistics,
5, 417-19.
Farris, J. S., Kluge, A. G. and Eckhardt, M. J. (1970). A numerical approach to
phylogenetic systematics. Systematic Zoology, 19, 172-89.
Farris, J. S., Källersjö, M., Kluge, A. G. and Bult, C. (1994). Testing significance of
incongruence. Cladistics, 10, 315-19.
Farris, J. S., Källersjö, M., Albert, V. A., Allard, M., Anderberg, A., Bowditch, B., Bult,
C., Carpenter, J. M., Crow, T. M., De Laet, J., Fitzhugh, K., Frost, D., Goloboff,
P. A., Humphries, C. J., Jondelius, U., Judd, D., Karis, P. O., Lipscomb, D.,
Luckow, M., Mindell, D., Muona, J., Nixon, K. C., Presch, W., Seberg, O., Siddall,
M. E., Struwe, L., Tehler, A., Wenzel, J., Wheeler, Q. D. and Wheeler, W. (1995).
Explanation. Cladistics, 11, 211-18.
Finden, C. R. and Gordon, A. D. (1985). Obtaining common pruned trees. Journal of
Classification, 2, 255-76.
Fitch, W. M. (1971). Towards defining the course of evolution: minimum change for a
specified tree topology. Systematic Zoology, 20, 406-16.
Fitch, W. M. and Ye, J. (1990). Weighted parsimony: does it work? In Phylogenetic
analysis of DNA sequences (ed. M. M. Miyamoto and J. Cracraft), pp. 147-54.
Oxford University Press, Oxford.
Gaffney, E. S., Meylan, P. A. and Wyss, A. R. (1991). A computer assisted analysis of
the relationships of the higher categories of turtles. Cladistics, 7, 313-35.
Gardiner, B. (1993). Haematothermia: warm-blooded amniotes. Cladistics, 9, 369-95.
Gauthier, J., Kluge, A. G. and Rowe, T. (1988). Amniote phylogeny and the
importance of fossils. Cladistics, 4, 105-209.
Goldman, N. (1988). Methods for discrete coding of morphological characters in
numerical analysis. Cladistics, 4, 59-71.
Goloboff, P. A. (1991). Homoplasy and the choice among cladograms. Cladistics, 7,
215-32.
Goloboff, P. A. (1993). Estimating character weights during tree search. Cladistics, 9,
83-91.
Goloboff, P. A. (1995a). Parsimony and weighting: a reply to Turner and Zandee.
Cladistics, 11, 91-104.
Goloboff, P. A. (1995b). A revision of the South American spiders of the family
Nemesiidae (Araneae, Mygalomorphae). Part I. Species from Peru, Chile,
Argentina, and Uruguay. Bulletin of the American Museum of Natural History, 224,
1-189.
Goloboff, P. A. (1996). Methods for faster parsimony analysis. Cladistics, 12, 199-220.
Grande, L. (1994). Repeating patterns in nature, predictability, and 'impact' in
science. In Interpreting the hierarchy of nature (ed. L. Grande and O. Rieppel), pp.
61-84. Academic Press, San Diego.
Harshman, J. (1994). The effect of irrelevant characters on bootstrap values.
Systematic Biology, 43, 419-27.
Harvey, A. W. (1992). Three-taxon statements: more precisely, an abuse of parsimony?
Cladistics, 8, 345-54.
Hedges, S. B. and Maxson, L. R. (1996). Re: molecules and morphology in amniote
phylogeny. Molecular Phylogenetics and Evolution, 6, 312-14.
Hedges, S. B., Moberg, K. D. and Maxson, L. R. (1990). Tetrapod phylogeny inferred
from 18S and 28S ribosomal RNA sequences and a review of the evidence for
amniote relationships. Molecular Biology and Evolution, 7, 607-33.
Hendy, M. D., Little, C. H. C. and Penny, D. (1984). Comparing trees with pendent
vertices labelled. SIAM Journal of Applied Mathematics, 44, 1054-65.
Hennig, W. (1950). Grundzüge einer Theorie der phylogenetischen Systematik. Deutscher
Zentralverlag, Berlin.
Hennig, W. (1965). Phylogenetic systematics. Annual Review of Entomology, 10,
97-116.
Hennig, W. (1966). Phylogenetic systematics. University of Illinois Press, Urbana.
Hillis, D. M. (1991). Discriminating between phylogenetic signal and random noise in
DNA sequences. In Phylogenetic analysis of DNA sequences (ed. M. M. Miyamoto
and J. Cracraft), pp. 278-94. Oxford University Press, Oxford.
Hillis, D. M., Allard, M. W. and Miyamoto, M. M. (1993). Analysis of DNA sequence
data: phylogenetic inference. In Molecular evolution: producing the biochemical
data. Methods in Enzymology, No. 224, (ed. E. A. Zimmer, T. J. White, R. L.
Cann and A. C. Wilson), pp. 456-87. Academic Press, San Diego.
Huelsenbeck, J. P. (1991a). Tree-length skewness: an indicator of phylogenetic
information. Systematic Zoology, 40, 257-70.
Huelsenbeck, J. P. (1991b). When are fossils better than extant taxa in phylogenetic
analysis? Systematic Biology, 40, 458-69.
Huelsenbeck, J. P. and Bull, J. J. (1996). A likelihood ratio for detection of
phylogenetic signal. Systematic Biology, 45, 92-8.
Huelsenbeck, J. P., Bull, J. J. and Cunningham, C. W. (1996). Combining data in
phylogenetic analysis. Trends in Ecology and Evolution, 11, 152-8.
Huey, R. B. and Bennett, A. F. (1987). Phylogenetic studies of co-adaptation:
preferred temperatures versus optimal performance temperatures of lizards.
Evolution, 41, 1098-115.
Humphries, C. J. and Parenti, L. In press. Cladistic biogeography, (2nd ed.). Clarendon
Press, Oxford.
Jardine, N. (1969). A logical basis for biological classification. Systematic Zoology, 18,
37-52.
Källersjö, M., Farris, J. S., Kluge, A. G. and Bult, C. (1992). Skewness and
permutation. Cladistics, 8, 275-87.
Kluge, A. G. (1985). Ontogeny and phylogenetic systematics. Cladistics, 1, 13-27.
Kluge, A. G. (1988). The characteristics of ontogeny. In Ontogeny and systematics (ed.
C. J. Humphries), pp. 57-82. Columbia University Press, New York.
Kluge, A. G. (1989). A concern for evidence and a phylogenetic hypothesis of
relationships among Epicrates (Boidae: Serpentes). Systematic Zoology, 38, 7-25.
Kluge, A. G. (1993). Three-taxon transformation in phylogenetic inference: ambiguity
and distortion as regards explanatory power. Cladistics, 9, 246-59.
Kluge, A. G. (1994). Moving targets and shell games. Cladistics, 10, 403-13.
Kluge, A. G. and Strauss, R. E. (1985). Ontogeny and systematics. Annual Review of
Ecology and Systematics, 16, 247-68.
Kluge, A. G. and Wolf, J. (1993). Cladistics: what's in a word? Cladistics, 9, 1-25.
Kraus, F. (1988). An empirical evaluation of the use of the ontogeny polarization
criterion in phylogenetic inference. Systematic Zoology, 37, 106-41.
Kubicka, E., Kubicka, G. and McMorris, F. R. (1995). An algorithm to find agreement
subtrees. Journal of Classification, 12, 91-9.
Loconte, H. and Stevenson, D. W. (1991). Cladistics of the Magnoliidae. Cladistics, 7,
267-96.
Lanyon, S. M. (1985). Detecting internal inconsistencies in distance data. Systematic
Zoology, 34, 397-403.
Larson, A. (1994). The comparison of morphological and molecular data in
phylogenetic systematics. In Molecular approaches to ecology and evolution (ed. B.
Schierwater, B. Streit, G. P. Wagner and R. DeSalle), pp. 371-90. Birkhäuser
Verlag, Basel.
Lauder, G. V. (1990). Functional morphology and systematics: studying functional
patterns in an historical context. Annual Review of Ecology and Systematics, 21,
317-40.
Littlewood, D. T. J. and Smith, A. B. (1995). A combined morphological and molecular
phylogeny for sea urchins (Echinoidea: Echinodermata). Philosophical
Transactions of the Royal Society of London, B, 347, 213-34.
Løvtrup, S. (1978). On von Baerian and Haeckelian recapitulation. Systematic Zoology,
27, 348-52.
Maddison, W. P. (1993). Missing data versus missing characters in phylogenetic
analysis. Systematic Biology, 42, 576-81.
Maddison, W. P., Donoghue, M. J. and Maddison, D. R. (1984). Outgroup analysis and
parsimony. Systematic Zoology, 33, 83-103.
Margush, T. and McMorris, F. R. (1981). Consensus n-trees. Bulletin of Mathematical
Biology, 43, 239-44.
Marshall, C. (1992). Substitution bias, weighted parsimony, and amniote phylogeny as
inferred from 18S rRNA sequences. Molecular Biology and Evolution, 9, 370-73.
Mayr, E. (1969). Principles of systematic zoology. McGraw-Hill, New York.
Mayr, E., Linsley, E. G. and Usinger, R. L. (1953). Methods and principles of systematic
zoology. McGraw-Hill, New York.
Meier, R. (1994). On the inappropriateness of presence/absence recoding for
non-additive multistate characters in computerized cladistic analyses. Zoologischer
Anzeiger, 232, 201-12.
Mickevich, M. F. (1982). Transformation series analysis. Systematic Zoology, 31,
461-78.
Mickevich, M. F. and Johnson, M. P. (1976). Congruence between morphological and
allozyme data in evolutionary inference and character evolution. Systematic
Zoology, 25, 260-70.
Mishler, B. (1994). Cladistic analysis of molecular and morphological data. American
Journal of Physical Anthropology, 94, 143-56.
Mitter, C. (1980). The Thirteenth Annual Numerical Taxonomy Conference.
Systematic Zoology, 29, 177-90.
Miyamoto, M. M. (1985). Consensus cladograms and general classification. Cladistics,
1, 186-9.
Miyamoto, M. M. and Boyle, S. M. (1989). The potential importance of mitochondrial
DNA sequence data to eutherian mammal phylogeny. In The hierarchy of life (ed.
B. Fernholm, K. Bremer and H. Jörnvall), pp. 437-50. Elsevier Science,
Amsterdam.
Miyamoto, M. M. and Fitch, W. M. (1995). Testing species phylogenies and
phylogenetic methods with congruence. Systematic Biology, 44, 64-7.
Mueller, L. D. and Ayala, F. J. (1982). Estimation and interpretation of genetic
distance in empirical studies. Genetical Research, 40, 127-37.
Neff, N. (1986). A rational basis for a priori character weighting. Systematic Zoology,
35, 110-23.
Nelson, G. J. (1973). The higher-level phylogeny of the vertebrates. Systematic
Zoology, 22, 87-91.
Nelson, G. J. (1978). Ontogeny, phylogeny, paleontology, and the biogenetic law.
Systematic Zoology, 27, 324-45.
Nelson, G. J. (1979). Cladistic analysis and synthesis: principles and definitions, with
a historical note on Adanson's Familles des Plantes (1763-1764). Systematic
Zoology, 28, 1-21.
Nelson, G. J. (1992). Reply to Harvey. Cladistics, 8, 356-60.
Nelson, G. J. (1993). Reply. Cladistics, 9, 261-65.
Nelson, G. J. (1994). Homology and systematics. In Homology: the hierarchical basis of
comparative biology (ed. B. K. Hall), pp. 101-49. Academic Press, San Diego.
Nelson, G. J. (1996). Nullius in verba. Journal of Comparative Biology, 1, 141-52.
Nelson, G. J. and Ladiges, P. Y. (1992). Information content and fractional weight of
three-taxon statements. Systematic Biology, 41, 490-4.
Nelson, G. J. and Ladiges, P. Y. (1993). Missing data and three-item analysis.
Cladistics, 9, 111-13.
Nelson, G. J. and Ladiges, P. Y. (1994). Three-item consensus: empirical test of
fractional weighting. In Models in phylogeny reconstruction, Systematics
Association Special Volume, No. 52, (ed. R. W. Scotland, D. J. Siebert and D. M.
Williams), pp. 193-207. Clarendon Press, Oxford.
Nelson, G. J. and Patterson, C. (1993). Cladistics, sociology and success: a comment on
Donoghue's critique of David Hull. Biology and Philosophy, 8, 441-3.
Nelson, G. J. and Platnick, N. I. (1980). Multiple branching in cladograms: two
interpretations. Systematic Zoology, 29, 86-91.
Nelson, G. J. and Platnick, N. I. (1981). Systematics and biogeography: cladistics and
vicariance. Columbia University Press, New York.
Nelson, G. J. and Platnick, N. I. (1991). Three-taxon statements: a more precise use of
parsimony? Cladistics, 7, 351-66.
Nixon, K. C. and Carpenter, J. M. (1993). On outgroups. Cladistics, 9, 413-26.
Nixon, K. C. and Carpenter, J. M. (1996a). On simultaneous analysis. Cladistics, 12,
221-41.
Nixon, K. C. and Carpenter, J. M. (1996b). On consensus, collapsibility, and clade
concordance. Cladistics, 12, 305-21.
Nixon, K. C. and Wheeler, Q. D. (1992). Extinction and the origin of species. In
Extinction and phylogeny (ed. Q. D. Wheeler and M. Novacek), pp. 119-43.
Columbia University Press, New York.
Novacek, M. (1992). Fossils as critical data for phylogeny. In Extinction and phylogeny
(ed. Q. D. Wheeler and M. Novacek), pp. 46-88. Columbia University Press, New
York.
Page, R. D. M. (1989). Comments on component-compatibility in historical
biogeography. Cladistics, 5, 167-82.
Page, R. D. M. (1993a). On islands of trees and the efficacy of different methods of
branch swapping in finding most-parsimonious trees. Systematic Biology, 42,
200-10.
Page, R. D. M. (1993b). COMPONENT version 2.0. Tree comparison software for use
with Microsoft Windows. Users Guide. The Natural History Museum, London.
Patterson, C. (1982). Morphological characters and homology. In Problems in
phylogenetic reconstruction (ed. K. A. Joysey and A. E. Friday), pp. 21-74. Academic
Press, London.
Penny, D. and Hendy, M. D. (1986). Estimating the reliability of evolutionary trees.
Molecular Biology and Evolution, 3, 403-17.
Pimentel, R. A. and Riggins, R. (1987). The nature of cladistic data. Cladistics, 3,
275-89.
Pinna, M. C. C. de (1991). Concepts and tests of homology in the cladistic paradigm.
Cladistics, 7, 317-38.
Pinna, M. C. C. de (1994). Ontogeny, polarity and rooting. In Models in phylogeny
reconstruction, Systematics Association Special Volume, No. 52, (ed. R. W.
Scotland, D. J. Siebert and D. M. Williams), pp. 157-72. Clarendon Press, Oxford.
Pinna, M. C. C. de (1996). Comparative biology and systematics: some controversies in
retrospective. Journal of Comparative Biology, 1, 3-16.
Platnick, N. I. (1979). Philosophy and the transformation of cladistics. Systematic
Zoology, 28, 537-46.
Platnick, N. I. (1989). Cladistic and phylogenetic analysis today. In The hierarchy of life
(ed. B. Fernholm, K. Bremer and H. Jörnvall), pp. 17-24. Elsevier Science,
Amsterdam.
Platnick, N. I. (1993). Character optimization and weighting: differences between the
standard and three-taxon approaches to phylogenetic inference. Cladistics, 9,
267-72.
Platnick, N. I., Coddington, J. A., Forster, R. R. and Griswold, C. E. (1991a).
Spinneret morphology and the phylogeny of haplogyne spiders (Araneae,
Araneomorphae). American Museum Novitates, 3016, 1-73.
Platnick, N. I., Griswold, C. E. and Coddington, J. A. (1991b). On missing entries in
cladistic analysis. Cladistics, 7, 337-43.
Platnick, N. I., Humphries, C. J., Nelson, G. J. and Williams, D. M. (1996). Is Farris
optimization perfect? Three-taxon statements and multiple branching. Cladistics,
12, 243-52.
Pleijel, F. (1995). On character coding for phylogeny reconstruction. Cladistics, 11,
309-15.
Rieppel, O. (1988). Fundamentals of comparative biology. Birkhäuser Verlag, Basel.
Robinson, D. F. and Foulds, L. R. (1981). Comparison of phylogenetic trees.
Mathematical Biosciences, 53, 131-47.
Rosen, D. E. (1979). Fishes from the uplands and intermontane basins of Guatemala:
revisionary studies and comparative geography. Bulletin of the American Museum
of Natural History, 161, 276-376.
Sæther, O. A. (1976). Revision of Hydrobaenus, Trissocladius, Zalutschia,
Paratrissocladius, and some related genera (Diptera: Chironomidae). Bulletin of the
Fisheries Research Board of Canada.
Sankoff, D. and Rousseau, P. (1975). Locating the vertices of a Steiner tree in
arbitrary space. Mathematical Programming, 9, 240-6.
Schuh, R. T. and Farris, J. S. (1981). Methods for investigating taxonomic congruence
and their application to the Leptopodomorpha. Systematic Zoology, 30, 331-51.
Schuh, R. T. and Polhemus, J. T. (1981). Analysis of taxonomic congruence among
morphological, ecological, and biogeographic data sets for the Leptopodomorpha
(Hemiptera). Systematic Zoology, 29, 1-26.
Sharkey, M. (1989). A hypothesis-independent method of character weighting for
cladistic analysis. Cladistics, 5, 63-86.
Sharkey, M. (1993). Exact indices, criteria to select from minimum length trees.
Cladistics, 9, 211-22.
Sharkey, M. (1994). Discriminate compatibility measures and the reduction routine.
Systematic Biology, 43, 526-42.
Siddall, M. E. (1996). Another monophyly index: revisiting the jackknife. Cladistics, 11,
33-56.
Siebert, D. J. and Williams, D. M. (1997). Book review (Nullius in verba). Biological
Journal of the Linnean Society, 60, 145-6.
Smith, A. B. (1994a). Systematics and the fossil record: documenting evolutionary
patterns. Blackwell Scientific Publications, London.
Smith, A. B. (1994b). Rooting molecular trees: problems and strategies. Biological
Journal of the Linnean Society, 51, 279-92.
Sokal, R. R. and Rohlf, F. J. (1981). Taxonomic congruence in the Leptopodomorpha
re-examined. Systematic Zoology, 30, 309-25.
Stevens, P. F. (1991). Character states, morphological variation, and phylogenetic
analysis: a review. Systematic Botany, 16, 553-83.
Stuessy, T. F. (1990). Plant taxonomy: the systematic evaluation of comparative data.
Columbia University Press, New York.
Sullivan, J. (1996). Combining data with different distributions of among-site variation.
Systematic Biology, 45, 375-80.
Suter, S. J. (1994). Cladistic analysis of the living cassiduloids (Echinoidea), and the
effects of character ordering and successive approximations weighting. Zoological
Journal of the Linnean Society of London, 112, 363-87.
Swofford, D. L. (1991). When are phylogeny estimates from molecular and
morphological data incongruent? In Phylogenetic analysis of DNA sequences (ed. M. M.
Miyamoto and J. Cracraft), pp. 295-333. Oxford University Press, Oxford.
Swofford, D. L. (1993). PAUP, Phylogenetic analysis using parsimony, version 3.1. Illinois
Natural History Survey, Champaign.
Swofford, D. L. and Begle, D. P. (1993). User's manual for PAUP, Phylogenetic analysis
using parsimony, version 3.1. Illinois Natural History Survey, Champaign.
Swofford, D. L. and Berlocher, S. H. (1987). Inferring evolutionary trees from gene
frequency data under the principle of maximum parsimony. Systematic Zoology,
36, 293-325.
Swofford, D. L. and Olsen, G. J. (1990). Phylogeny reconstruction. In Molecular
systematics (ed. D. M. Hillis and C. Moritz), pp. 411-501. Sinauer Associates,
Sunderland, Massachusetts.
Szumik, C. A. (1996). The higher classification of the order Embioptera: a cladistic
analysis. Cladistics, 12, 41-64.
Thiele, K. (1993). The holy grail of the perfect character: the cladistic treatment of
morphometric data. Cladistics, 9, 275-304.
Thiele, K. and Ladiges, P. Y. (1988). A cladistic analysis of Angophora Cav. (Myrtaceae).
Cladistics, 4, 23-42.
Thorpe, R. S. (1984). Coding morphometric characters for constructing distance
Wagner networks. Evolution, 38, 244-55.
Turner, H. (1995). Cladistic and biogeographic analyses of Arytera Blume and
Mischarytera gen. nov. (Sapindaceae) with notes on methodology and a full taxonomic
revision. Blumea (supplement), 9, 1-230.
Turner, H. and Zandee, R. (1995). The behaviour of Goloboff's tree fitness measure
F. Cladistics, 11, 57-72.
Vane-Wright, R. I., Schulz, S. and Boppré, M. (1992). The cladistics of Amauris
butterflies: congruence, consensus and total evidence. Cladistics, 8, 125-38.
Wagner, W. H. (1961). Problems in the classification of ferns. Recent Advances in
Botany, 1, 841-4.
Watrous, L. E. and Wheeler, Q. D. (1981). The outgroup comparison method of
character analysis. Systematic Zoology, 30, 1-11.
Werdelin, L. (1989). We are not out of the woods yet - a report from a Nobel
Symposium. Cladistics, 5, 192-200.
Weston, P. H. (1988). Indirect and direct methods in systematics. In Ontogeny and
systematics (ed. C. J. Humphries), pp. 25-56. Columbia University Press, New
York.
Weston, P. H. (1994). Methods for rooting cladistic trees. In Models in phylogeny
reconstruction, Systematics Association Special Volume, No. 52, (ed. R. W.
Scotland, D. J. Siebert and D. M. Williams), pp. 125-55. Clarendon Press, Oxford.
Wheeler, W. C. and Honeycutt, R. L. (1988). Paired sequence difference in ribosomal
RNAs: evolutionary and phylogenetic considerations. Molecular Biology and
Evolution, 5, 90-6.
Wiens, J. J. and Reeder, T. W. (1995). Combining data sets with different numbers of
taxa for phylogenetic analysis. Systematic Biology, 44, 548-58.
Wiley, E. O. (1981). Phylogenetics: the theory and practice of phylogenetic systematics.
Wiley Interscience, New York.
Wilkinson, M. (1994). Weights and ranks in numerical phylogenetics. Cladistics, 10,
321-9.
Wilkinson, M. (1995). A comparison of two methods of character construction.
Cladistics, 11, 297-308.
Wilkinson, M. and Benton, M. J. (1995). Missing data and rhynchosaur phylogeny.
Historical Biology, 10, 137-50.
Williams, D. M. (submitted). An examination of character representation and cladistic
analysis: interrelationships of the diatom genus Fragilariforma (Bacillariophyta)
and its relatives. Cladistics.
Williams, P. L. and Fitch, W. M. (1990). Phylogeny determination using a dynamically
weighted parsimony method. In Molecular evolution: computer analysis of protein
and nucleic acid sequences. Methods in Enzymology, No. 183, (ed. R. F. Doolittle),
pp. 615-26.
Yeates, D.
Suggestions for further reading
The number in parentheses after each entry is a cross-reference to the appropriate
chapter.
Archie, J. W. (1989). A randomization test for phylogenetic information in systematic
data. Systematic Zoology, 38, 219-52. (6)
Arnold, E. N. (1981). Estimating phylogenies at low levels. Zeitschrift für zoologische
Systematik und Evolutionsforschung, 19, 1-35. (3)
Ax, P. (1987). The phylogenetic system. John Wiley, Chichester. (1)
Clark, C. and Curran, D. J. (1986). Outgroup analysis, homoplasy, and global
parsimony: a response to Maddison, Donoghue and Maddison. Systematic Zoology, 35,
422-6. (3)
Coddington, J. A. and Scharff, N. (1996). Problems with 'soft' polytomies. Cladistics,
12, 139-45. (4)
Crisci, J. and Stuessy, T. (1980). Determining primitive character states for
phylogenetic reconstruction. Systematic Botany, 6, 112-35. (3)
Crowson, R. A. (1970). Classification and biology. Atherton, New York. (2)
Davis, J. I., Frohlich, M. W. and Soreng, R. J. (1993). Cladistic characters and
cladogram stability. Systematic Botany, 18, 188-96. (6)
De Beer, G. R. (1958). Embryos and ancestors, (3rd ed.). Oxford University Press,
Oxford. (3)
de Jong, R. (1980). Some tools for evolutionary and phylogenetic studies. Zeitschrift für
zoologische Systematik und Evolutionsforschung, 18, 1-23. (3)
Eggleton, P. and Vane-Wright, R. I. (1994). Some principles of phylogenetics and their
implications for comparative biology. In Phylogenetics and ecology (ed. P.
Eggleton and R. I. Vane-Wright), pp. 345-63. Academic Press, London. (1)
Faith, D. P. and Ballard, J. W. O. (1994). Length differences and lopology-dependent
tests: a response to Kiillersjo el al. Cladistics, 10, 57-64. (6)
Farris, J. S. (1982). Outgroups and parsimony. Systematic Zoology, 3l, 328-34. (3)
Farris, J. S. (1986). On the boundaries of phylogenetic systematics. Cladistics, 2, 14- 27.
(1,3)
Farris, 1. S. (1991). fuce~ homoplasy ratios. Cladistics, 7, 8 1- 91. (6)
Farris, J. S., Kallersjo, M., KJuge, A. G. and Blllt, C. (1994). Permutations. CladistiCS,
10, 65- 76. (6)
Feisenstein , 1. (1985). Confidence limits on phylogenies: an approach using the
bootstrap. Eoolution, 3', 783-91. (6)
Feisenstein, 1. (1988). Phylogenies from molecular sequences: inference and rel iabil-
ity. Annual Review of Genetics, 22, 52 1- 65. (2,6)
Gift, N. and Stevens, P. F. (997). Vagaries in the delimitation of character stales in
quantitative variation-an experimental study. Syslematic Biology, 46,112- 25 . (2)
Goloboff, P. A. (1991). Random data, homoplasy and information. Cladistics, 7,
395-406. (6)
Gould, S. 1. (977). Ontogeny and phylogeny. Belknap Pre~ of Harvard University
Press, Cambridge, Massachusetts. (3)
H auser, D. L and Presch, W. ( 99 1). The effect of ordered characters on phylogenetic
reconstruction. Cladistics. 7, 243-66. (2)
Hillis, D. M. and Bull, J. J. (1993). An empirical test of bootstrapping as a method for
assessing confidence in phylogenetic analysis. Systematic Biology, 42, 182-92. (6)
Humphries, C. J. (ed.) (1988). Ontogeny and systematics. Columbia University Press,
New York. (3)
Humphries, C. J. and Funk, V. A. (1984). Cladistic methodology. In Current concepts in
plant taxonomy (ed. V. H. Heywood and D. M. Moore), pp. 323-62. Academic
Press, London. (1)
Kitching, I. J. (1992). The determination of character polarity. In Cladistics: a practical
course in systematics. Systematics Association Publication, No. 10, (ed. P. L. Forey,
C. J. Humphries, I. J. Kitching, R. W. Scotland, D. J. Siebert and D. M. Williams),
pp. 22-43. Oxford University Press, Oxford. (3)
Le Quesne, W. (1989). Frequency distributions of lengths of possible networks from a
data matrix. Cladistics, 5, 395-407. (6)
Lipscomb, D. L. (1992). Parsimony, homology and the analysis of multistate
characters. Cladistics, 8, 45-65. (2)
Mabee, P. M. (1993). Phylogenetic interpretation of ontogenetic change: sorting out
the actual and artefactual in an empirical case study of centrarchid fishes.
Zoological Journal of the Linnean Society, 107, 175-91. (3)
Mabee, P. M. (1996). Reassessing the ontogenetic criterion: a response to Patterson.
Cladistics, 12, 169-76. (3)
Maddison, W. P. (1989). Reconstructing character evolution on polytomous
cladograms. Cladistics, 5, 365-77. (4)
Maddison, W. P. (1991). Squared-change parsimony reconstructions of ancestral states
for continuous-valued characters on a phylogenetic tree. Systematic Zoology, 40,
304-14. (2)
Maddison, W. P. and Maddison, D. R. (1992). MacClade: analysis of phylogeny and
character evolution, version 3.0. Sinauer Associates, Sunderland, Massachusetts.
(2)
Maslin, T. P. (1952). Morphological criteria of phyletic relationships. Systematic
Zoology, 1, 49-70. (2)
Nelson, G. J. (1985). Outgroups and ontogeny. Cladistics, 1, 29-45. (3)
Patterson, C. (1980). Cladistics. Biologist, 27, 234-40. (1)
Patterson, C. (ed.) (1988). Molecules and morphology in evolution: conflict or
compromise. Cambridge University Press, Cambridge. (2)
Patterson, C. (1996). Comments on Mabee's 'Empirical rejection of the ontogenetic
polarity criterion'. Cladistics, 12, 147-67. (3)
Penny, D. and Hendy, M. (1985). Testing methods of evolutionary tree construction.
Cladistics, 1, 266-72. (6)
Pogue, M. G. and Mickevich, M. F. (1990). Character definitions and character state
delineation: the bête noire of phylogenetic inference. Cladistics, 6, 319-61. (2)
Rieppel, O. (1985). Ontogeny and the hierarchy of types. Cladistics, 1, 234-46. (3)
Rogers, J. S. (1984). Deriving phylogenetic trees from allele frequencies. Systematic
Zoology, 33, 52-63. (2)
Sæther, O. A. (1986). The myth of objectivity - post-Hennigian deviations. Cladistics,
2, 1-13. (3)
Sanderson, M. J. (1989). Confidence limits on phylogenies: the bootstrap revisited.
Cladistics, 5, 11 ...
Sanderson, M. J. (1995). Objections to bootstrapping phylogenies: a critique.
Systematic Biology, 44, 299-320. (6)
Sanderson, M. J. and Donoghue, M. J. (1989). Patterns of variation in levels of
homoplasy. Evolution, 43, 1781-95. (5)
Sokal, R. R. and Sneath, P. H. A. (1963). Principles of numerical taxonomy. W. H.
Freeman and Company, San Francisco. (2)
Stevens, P. F. (1980). Evolutionary polarity of character states. Annual Review of
Ecology and Systematics, 11, 333-58. (2, 3)
Swofford, D. L. and Maddison, W. P. (1987). Reconstructing ancestral character states
using Wagner parsimony. Mathematical Biosciences, 87, 199-229. (2)
Trueman, J. W. H. (1993). Randomization confounded: a response to Carpenter.
Cladistics, 9, 101-9. (6)
Trueman, J. W. H. (1996). Permutation tests and outgroups. Cladistics, 12, 253-61. (6)
Wenzel, J. W. (1993). Application of the biogenetic law to behavioral ontogeny: a test
using nest architecture in paper wasps. Journal of Evolutionary Biology, 6, 229-47.
(3)
Wilkinson, M. (1994). Common cladistic information and its consensus representation:
reduced Adams and reduced cladistic consensus trees and profiles. Systematic
Biology, 43, 343-68. (5,1)
Wilkinson, M. (1995). Arbitrary resolutions, missing entries and the problem of
zero-length branches in parsimony analysis. Systematic Biology, 44, 108-11. (4)
Wilkinson, M. and Benton, M. J. (1996). Sphenodontid phylogeny and the problems of
multiple trees. Philosophical Transactions of the Royal Society of London, B, 351,
1-16. (5)
Glossary
Italicized words or expressions in a definition have their own entry in the
glossary. 'See also' indicates a cross reference to a related topic, whereas 'Cf.'
is a cross reference to an antonym. Synonymous expressions have only a
single explanatory entry. 'Also known as' is a cross reference from the main
entry to a synonym; 'See' is a cross reference from a synonym to the main
entry. Where two or more alternative definitions for a term are provided,
then that given first is the accepted usage within the context of this book.
accelerated transformation (ACCTRAN) A procedure for resolving ambiguous
optimization in which the initial forward character transformations
are placed on to the cladogram as close to the root as possible.
Accelerated transformation accounts for homoplasy in terms of reversals
to the plesiomorphic condition. Also known as fast transformation.
Cf. delayed transformation.
accommodated three-item statement (ATS) A three-item statement that fits
to a cladogram with a single step.
Adams consensus tree A consensus tree formed from all of the intersecting
sets of internal nodes common to a set of fundamental cladograms.
Taxa in conflicting positions are relocated to the most inclusive node
that they have in common among the fundamental cladograms.
additive coding A method for representing ordered multistate characters as a
linked series of binary characters. Cf. non-additive coding.
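A minimal sketch of additive coding follows. It assumes that states are scored 0 to k along the transformation series; the function name is purely illustrative. Each score is expanded into k linked binary characters, so that the number of differences between two recoded taxa equals the number of steps separating their original states.

```python
def additive_binary_coding(state, n_states):
    """Recode one ordered multistate score as n_states - 1 binary characters.

    Hypothetical helper for illustration: binary character j is scored 1
    when the original state lies beyond step j of the transformation series.
    """
    if not 0 <= state < n_states:
        raise ValueError("state must lie between 0 and n_states - 1")
    return [1 if state > j else 0 for j in range(n_states - 1)]


# A four-state ordered character (0, 1, 2, 3) becomes three binary characters.
for state in range(4):
    print(state, additive_binary_coding(state, 4))
```

Under this scheme state 0 is recoded as 000 and state 3 as 111, with the intermediate states accumulating one derived score at a time.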
adjacent character Two character states within a multistate character are
adjacent if they are placed next to each other in a transformation series.
For example, in the transformation series 0 ↔ 1 ↔ 2, states 0 and 1,
and 1 and 2, are adjacent. Cf. non-adjacent.
agreement subtree A method of comparing two or more fundamental
cladograms that shows only the clades and taxa held in common. See also
greatest agreement subtree.
all-plesiomorphic outgroup An artificial outgroup taxon in which each
character is coded with the putative plesiomorphic state as estimated
by an ... method or criterion. See also all-zero outgroup.
all-zero outgroup An all-plesiomorphic outgroup in which all characters
are considered to be absent and are coded as zero. See also
all-plesiomorphic outgroup.
ambiguous optimization The result of optimization of a character onto a
given cladogram when one sequence of character transformation provides
support for a branch, albeit only by homoplastic characters or
character states, while an alternative sequence produces a zero-length
branch. The result is an over-resolved cladogram.
apomorphy A derived character or character state. Cf. plesiomorphy. See
also autapomorphy, homology, synapomorphy.
autapomorphy A derived character or character state (apomorphy) that is
restricted to a single terminal taxon in a data set. An autapomorphy at
a given hierarchical level may be a synapomorphy at a less-inclusive
level. One form of uninformative character.
basal branch See branch.
binary character A character that has only two observed states. Binary
characters are generally coded 0/1. They can be directed or undirected,
polarized or unpolarized, but are intrinsically ordered. Binary characters
cannot be unordered. See also multistate character.
Biogenetic Law See recapitulation (Haeckelian).
bootstrap A statistical procedure for achieving a better estimate of the
parametric variance of a distribution than the observed sample variance
by averaging pseudoreplicate variances. The original data set is
sampled with replacement to produce a pseudoreplicate of the same
dimensions as the original. See also jackknife.
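The resampling step can be sketched as follows. The sketch assumes, as is usual in phylogenetic applications, that the units resampled are the characters (columns) of a taxon-by-character matrix; the function name and the example data are hypothetical.

```python
import random


def bootstrap_pseudoreplicate(matrix, rng):
    """Resample the characters (columns) of a taxon-by-character matrix.

    Characters are drawn with replacement until the pseudoreplicate has as
    many characters as the original, so some may be repeated and others omitted.
    """
    n_chars = len(matrix[0])
    drawn = [rng.randrange(n_chars) for _ in range(n_chars)]
    return [[row[c] for c in drawn] for row in matrix]


# Hypothetical data: three taxa (rows) scored for four binary characters.
data = [[0, 0, 1, 1],
        [0, 1, 1, 0],
        [1, 1, 0, 0]]
replicate = bootstrap_pseudoreplicate(data, random.Random(1))
print(replicate)
```

Each pseudoreplicate would then be analysed in the same way as the original matrix; that analysis is outside the scope of this sketch.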
branch A line on a cladogram connecting two nodes (internal branch), a
node and the root (basal branch) or a node and a terminal taxon
(terminal branch). Also known as an internode.
branch length (1) The number of steps on a branch. (2) The number of
characters that fit to a branch.
branch support See Bremer support.
branch-and-bound An exact algorithm for cladogram construction. The
method begins by constructing a cladogram by means of a heuristic
method. The length of this cladogram is used as the initial upper bound
for an exhaustive search. The number of topologies to be examined is
then restricted by discarding all partial cladograms whose length
exceeds the upper bound. If a complete cladogram is found that is
shorter than the upper bound, then the upper bound is reset to this
length in order to increase efficiency further.
branch-swapping A procedure for moving clades around a cladogram in an
effort to find a more parsimonious topology. See also nearest-neighbour
interchange, subtree pruning and regrafting, tree bisection and
reconnection.
Bremer support The number of extra steps required before a clade is lost
from the strict consensus tree of near-minimum length cladograms.
Also known as branch support, clade stability, decay index, length
difference.
Camin-Sokal optimization The optimization procedure used for a certain
type of ordered, polarized, directed character. The costs of the
transformations, 0 → 1, 1 → 2, etc., are as in Wagner optimization. The costs of
all transformations in the opposite direction are treated as infinite,
thereby preventing character reversal. Camin-Sokal optimization
requires all homoplasy to be accounted for by multiple, parallel
transformations.
character (1) A character is an hypothesis of primary homology in two or
more terminal taxa based on original observations of organisms. (2) An
observable feature of an organism used to distinguish it from another.
See also character state.
character analysis A procedure that re-examines the original data in an
effort to discover whether any errors in the original coding of characters
and scoring of character states have been made, i.e. faulty
hypotheses of primary homology or inappropriate character coding.
character congruence See total evidence.
character state (1) A scored observation of a feature perceived in the
organism(s) chosen to represent a terminal taxon. (2) One of two or
more alternative manifestations of a character (2).
chorological progression, criterion of See progression rule.
clade See monophyletic group.
clade stability See Bremer support.

clade stability index (CSI) The ratio of the minimum number of character
deletions from a data matrix required to lose a clade from a cladogram
to the total number of informative characters in the data set from
which that cladogram was derived.
cladistic consistency The fit of a character to a cladogram in terms of the
number of occurrences required to explain the data. Cladistic consistency
is usually measured using the consistency index or the rescaled
consistency index. See also consistent.
cladistic covariation The degree to which all characters in a data set are
explainable by the same cladogram topology.
cladistics A method of classification that groups taxa hierarchically into
nested sets and conventionally represents these relationships as a
cladogram. See also phylogenetic systematics.
cladogram A branching diagram specifying hierarchical relationships among
taxa based upon homologies (synapomorphies). A cladogram includes
no connotation of ancestry and has no implied time axis. Cf. phylogenetic
tree.
clique A set of mutually compatible characters.
coding The conversion of original observations into a discrete alphanumerical
format suitable for cladistic analysis.
combinable components consensus tree A consensus tree formed from all
the uncontradicted components in a set of fundamental cladograms;
that is, one that contains all the components found on the respective
strict consensus tree, plus those components that are uncontradicted by
less resolved components within the set of fundamental cladograms.
Also known as a semi-strict consensus tree.
common pruned tree See greatest agreement subtree.
commonality A method of character polarization that states that the
character state most frequently observed among the ingroup taxa is
plesiomorphic.
compatible character Two characters that do not conflict in the groups that
they support are termed compatible. Compatible characters include
both those that are congruent and those that are consistent.
component A group of taxa as determined by the branching pattern of a
cladogram. For example, in a group comprising three taxa A, B, and C,
where B and C are more closely related to each other than either is to
A, there are two components, ABC and BC. ABC is an uninformative
component, while BC is an informative component. Also known as
clade, monophyletic group.
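The three-taxon example can be reproduced with a short sketch. It assumes that a cladogram is written as nested tuples of taxon names, a representation chosen here only for illustration; the function names are likewise hypothetical.

```python
def components(tree):
    """Return (terminal taxa, list of components) for a nested-tuple cladogram."""
    if isinstance(tree, str):          # a terminal taxon defines no component
        return frozenset([tree]), []
    taxa, found = frozenset(), []
    for subtree in tree:
        sub_taxa, sub_found = components(subtree)
        taxa |= sub_taxa
        found.extend(sub_found)
    found.append(taxa)                 # the group united at this node
    return taxa, found


def informative_components(tree):
    """Keep components with more than one but fewer than all terminal taxa."""
    all_taxa, found = components(tree)
    return [c for c in found if 1 < len(c) < len(all_taxa)]


cladogram = ("A", ("B", "C"))          # B and C are each other's closest relatives
print([sorted(c) for c in components(cladogram)[1]])
print([sorted(c) for c in informative_components(cladogram)])
```

For this cladogram the first call lists the components BC and ABC, and the second retains only BC.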
component information The number of informative components in a
cladogram. See also term information.
compromise tree A consensus tree that includes at least some components
that are not present in all of the fundamental cladograms. Cf. strict
consensus tree. See also Adams consensus tree, combinable components
consensus tree, majority-rule consensus tree, median consensus tree,
Nelson consensus tree.
concordance The degree of agreement between two patterns.
congruence test A test of secondary homology. To pass the congruence test,
a character must specify the same group on a cladogram as another
character. In combination with the similarity test and conjunction test,
the congruence test equates homology with synapomorphy. See also
conjunction test, similarity test.
congruent character A character that specifies the same group of taxa as
another character. See also compatible character, consistent character,
homologue.
congruent cladograms A set of cladograms that agree in their topologies.
conjunction test A test of primary homology. To pass the conjunction test,
two characters hypothesized to be homologous must not coexist in an
organism at the same time. See also congruence test, similarity test.
consensus method A method for combining the grouping information
contained in a set of cladograms for the same taxa into a single topology,
the consensus tree. See Adams consensus tree, combinable components
consensus tree, compromise tree, majority-rule consensus tree, median
consensus tree, Nelson consensus tree, strict consensus tree.
consensus tree (1) A branching diagram produced using a consensus method.
(2) In a restricted sense, a strict consensus tree. Cf. compromise tree.
consistency index (ci) A measure of the amount of homoplasy in a character
relative to a given cladogram. The consistency index is calculated as
the ratio of m, the minimum number of steps a character can exhibit
on any cladogram, to s, the minimum number of steps the same
character can exhibit on the cladogram in question. See also ensemble
consistency index, retention index.
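As a worked illustration of the ratio m/s, a minimal sketch is given below. The function name is an assumption made for the example, and the values of m and s are supplied directly rather than computed from a cladogram.

```python
from fractions import Fraction


def consistency_index(m, s):
    """Return ci = m / s as an exact fraction.

    m: minimum number of steps the character can show on any cladogram.
    s: minimum number of steps it shows on the cladogram in question.
    """
    if m < 1 or s < m:
        raise ValueError("expected 1 <= m <= s")
    return Fraction(m, s)


# A binary character (m = 1) that needs three steps on the cladogram in question.
print(consistency_index(1, 3))   # homoplastic: ci is one third
print(consistency_index(1, 1))   # no homoplasy: ci is one
```

A value of 1 indicates that the character fits the cladogram without homoplasy; the value falls towards zero as the number of extra steps increases.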
consistent character A character that specifies a subset of the group of taxa
specified by another character or a different group of taxa entirely.
Also known as logically consistent. See also cladistic consistency,
compatible, congruent character.
constant character A character for which all taxa in a data set are allocated
the same code. One type of uninformative character.
constrained, two-step analysis A method of cladogram construction in which
the different states of each character are first organized into
transformation series and polarized using outgroup comparison and known,
fixed outgroup relationships. The synapomorphies so revealed are used
to construct the cladogram. See Hennigian argumentation; simultaneous,
unconstrained analysis.
continuous character A character for which potential values are so
infinitesimally close that there are potentially no disallowable real numbers,
e.g. wing length. See also discrete.
convergence Two characters that pass the conjunction test of homology but
fail both the similarity and congruence tests are termed convergent.
Also known as homoiologies. See also homology, parallelism.
cost The number of steps required to account for the transformation of one
character state into another on a cladogram.
cost matrix A square matrix representing the costs of transformation
between all states of a character. The values in the upper triangle of the
matrix represent the costs in one direction (usually 'forward'), while
the values in the lower triangle represent the costs of transformation in
the opposite direction.
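To make the layout concrete, the sketch below builds cost matrices for three familiar cases. It assumes the row gives the starting state and the column the resulting state, so that the upper triangle holds the 'forward' costs. The unit-step costs used for Wagner and Fitch characters are the standard conventions rather than something stated in this entry, and the function names are illustrative only.

```python
import math


def wagner_cost_matrix(n_states):
    """Ordered, undirected: cost is the number of steps between the states."""
    return [[abs(i - j) for j in range(n_states)] for i in range(n_states)]


def fitch_cost_matrix(n_states):
    """Unordered, undirected: any change between different states costs one."""
    return [[0 if i == j else 1 for j in range(n_states)]
            for i in range(n_states)]


def camin_sokal_cost_matrix(n_states):
    """Ordered, directed: forward as in Wagner, reversals infinitely costly."""
    return [[j - i if j >= i else math.inf for j in range(n_states)]
            for i in range(n_states)]


for build in (wagner_cost_matrix, fitch_cost_matrix, camin_sokal_cost_matrix):
    print(build.__name__)
    for row in build(3):
        print("  ", row)
```

The first two matrices are symmetrical about the diagonal, which is what makes those characters undirected; the third is not.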
data decisiveness (DD) A measure of the degree to which a data matrix is
decisive. Data decisiveness is calculated as the ratio of (S̄ - S) to
(S̄ - M), where S is the observed length of the most parsimonious
cladogram, S̄ is the mean length of all possible bifurcating cladograms
and M is the minimum possible length of a cladogram were there no
homoplasy in the data.
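The calculation can be sketched as below, taking the three lengths as given. The function and parameter names are hypothetical, and the figures in the example are invented purely to show the arithmetic.

```python
def data_decisiveness(mean_length, observed_length, minimum_length):
    """Return DD = (mean - observed) / (mean - minimum).

    mean_length:     mean length over all possible bifurcating cladograms
    observed_length: length of the most parsimonious cladogram
    minimum_length:  length the data would need if there were no homoplasy
    """
    if mean_length == minimum_length:
        raise ValueError("DD is undefined when the mean equals the minimum")
    return (mean_length - observed_length) / (mean_length - minimum_length)


# Invented figures: mean length 20, most parsimonious length 12, minimum 10.
print(data_decisiveness(20, 12, 10))
```

With these figures the most parsimonious cladogram recovers eight of the ten steps separating the mean from the minimum, giving a value of 0.8.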
decay index See Bremer support.
decisive assignment The assignment of a unique value to an internal node
on a cladogram by means of an optimization procedure. Cf. equivocal
assignment.
decisive data A data set that contains at least one phylogenetically
informative character. Decisive data yield cladograms that differ in length
among themselves and thus offer reasons for choosing some cladograms
in preference to others. Cf. undecisive data.
delayed transformation (DELTRAN) A procedure for resolving ambiguous
optimization in which the initial forward character transformations are
placed on to the cladogram as far from the root as possible. Delayed
transformation accounts for homoplasy in terms of independent gains.
Also known as slow transformation. Cf. accelerated transformation.
dependent character A character or character state for which the coding
depends upon the coding allocated to another character or character
state. For example, features of wing venation in insects are dependent
upon the presence of a wing.
derived character See apomorphy.
direct method A method of character polarization that can be implemented
using only the information available from the taxa under study. Cf.
indirect.
directed character A character in which the transformations in one
direction cost a different number of steps from the transformations in the
opposite direction. For example, in the directed character, 0 → 1, the
transformation 0 → 1 may cost one step but the reverse transformation,
1 → 0, may cost two steps. Camin-Sokal and Dollo optimizations both
use directed characters. Cf. undirected.
direction The imposition of differential costs on the transformation
between two character states in one direction relative to the
transformation in the opposite direction. See also order, polarity.
discrete character A character that can be represented logically by a subset
of all possible real numbers, generally only by integers. See also
continuous.
Dollo optimization The optimization procedure used for a certain type of
ordered, directed character. Characters may be polarized or not. The
costs of the transformations, 0 → 1, 1 → 2, etc., are as in Wagner
optimization, but weighted by a high arbitrary value to ensure that each
occurs only once on the cladogram. The costs of all transformations in
the opposite direction are as in Wagner optimization. Dollo optimization
requires all homoplasy to be accounted for by reversals to more
plesiomorphic conditions.
doublet Any two consecutive outgroup taxa on a cladogram that share the
same state.
ensemble consistency index (CI) A measure of the amount of homoplasy in
a data matrix relative to a given cladogram. The ensemble consistency
index is calculated as the ratio of M, the minimum number of steps all
characters can exhibit on any cladogram, to S, the minimum number of
steps they can exhibit on the cladogram in question. See also
consistency index.
ensemble retention index (RI) A measure of the amount of similarity in a
data matrix that can be interpreted as synapomorphy on a given
cladogram. The ensemble retention index is calculated as the ratio of
(G - S) to (G - M), where G is the greatest number of steps that all
characters can exhibit on any cladogram, M is the minimum number of
steps all characters can exhibit on any cladogram and S is the minimum
number of steps they can exhibit on the cladogram in question.
See also retention index.
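The two ensemble ratios can be sketched side by side. The values of M, S and G are taken as given, the function names are illustrative, and the figures are invented to show the arithmetic only.

```python
def ensemble_consistency_index(M, S):
    """CI = M / S for a whole data matrix on a given cladogram."""
    if S <= 0:
        raise ValueError("S must be positive")
    return M / S


def ensemble_retention_index(G, M, S):
    """RI = (G - S) / (G - M) for a whole data matrix on a given cladogram."""
    if G == M:
        raise ValueError("RI is undefined when G equals M")
    return (G - S) / (G - M)


# Invented figures: minimum 10 steps, 16 on the cladogram, 30 at worst.
M, S, G = 10, 16, 30
print(ensemble_consistency_index(M, S))
print(ensemble_retention_index(G, M, S))
```

Here CI is 10/16 and RI is 14/20. Unlike CI, RI reaches zero when the cladogram requires as many steps as the worst possible case (S = G).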
equivocal assignment The assignment of a non-unique value to an internal
node on a cladogram by means of an optimization procedure. Cf.
decisive assignment.
evolutionary taxonomy A school of systematics that holds the position that
overall similarity had to be taken into account and balanced against
genealogy in the construction of a classification. Because evolutionary
rates among different lineages were known to be highly variable, those
taxa that had 'evolved further' (as evidenced by a large number of
autapomorphies) warranted special recognition. One consequence of
this position is an insistence on the retention of paraphyletic groups in
classifications.
exact algorithm An algorithm for constructing cladograms that is
guaranteed to find one or all of the most parsimonious cladograms. Cf.
heuristic algorithm. See also branch-and-bound, exhaustive search.
exhaustive search An exact algorithm that examines every possible
fully-resolved, unrooted cladogram for the taxa included in that data set in
order to find the most parsimonious solution(s).
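The scale of an exhaustive search can be gauged from the number of fully-resolved, unrooted cladograms for n taxa, which is the product of the odd numbers from 3 to 2n - 5. This is a standard combinatorial result rather than part of the entry above, and the function name in the sketch is hypothetical.

```python
def n_unrooted_resolved_cladograms(n_taxa):
    """Count fully-resolved, unrooted cladograms: 3 x 5 x ... x (2n - 5)."""
    if n_taxa < 3:
        raise ValueError("at least three taxa are needed")
    count = 1
    for odd in range(3, 2 * n_taxa - 4, 2):
        count *= odd
    return count


for n in (4, 5, 6, 10):
    print(n, n_unrooted_resolved_cladograms(n))
```

Ten taxa already give more than two million topologies, which is why exhaustive search is practicable only for small data sets.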
fast transformation See accelerated trans/annation.
filter A procedure applied to characters between the initial discovery phase
and the recording of the variation in a data matrix that aims to simplify
the coding without loss of information.
Fitch optimization The optimization procedure used for unordered,
unpolarized, undirected characters.
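A minimal sketch of the counting pass of Fitch optimization is given below. It assumes a rooted, fully bifurcating cladogram written as nested tuples and a dictionary of observed states for the terminal taxa; both conventions, and the function names, are chosen for illustration only.

```python
def fitch_pass(tree, states):
    """Return (possible state set at this node, steps required beneath it)."""
    if isinstance(tree, str):                  # terminal taxon: observed state
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch_pass(left, states)
    right_set, right_steps = fitch_pass(right, states)
    shared = left_set & right_set
    if shared:                                 # descendants agree: no new step
        return shared, left_steps + right_steps
    # descendants disagree: keep every candidate state and count one step
    return left_set | right_set, left_steps + right_steps + 1


def fitch_steps(tree, states):
    """Minimum number of steps for one unordered character on the cladogram."""
    return fitch_pass(tree, states)[1]


cladogram = (("A", "B"), ("C", "D"))
print(fitch_steps(cladogram, {"A": 0, "B": 0, "C": 1, "D": 1}))
print(fitch_steps(cladogram, {"A": 0, "B": 1, "C": 0, "D": 1}))
```

The first character requires a single step on this cladogram, whereas the second, which conflicts with both groups, requires two.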
fractional weighting (FW) The application of differential weights to
characters or three-item statements in a data set in order to correct for
redundancy. Most frequently used with reference to three-item
statements analysis. Cf. uniform weighting.
fundamental cladogram A cladogram produced by direct analysis of data.
gap A section of a scaled character axis where no observed values occur or
where the distance between two consecutive observations exceeds
some preconceived value (e.g. one standard deviation about the mean).
gap-coding A method for recoding continuous characters (usually
morphometric data) as discrete characters by the creation or recognition of
gaps.
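One simple form of gap-coding can be sketched as follows. It assumes that a gap is recognized wherever two consecutive ordered values differ by more than a threshold fixed in advance; the threshold, the function name and the measurements are all hypothetical.

```python
def gap_code(values, min_gap):
    """Assign discrete states to continuous values, one state per cluster.

    Values are taken in ascending order; the state number increases each
    time the distance to the previous value exceeds min_gap.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    codes = [0] * len(values)
    state = 0
    for previous, current in zip(order, order[1:]):
        if values[current] - values[previous] > min_gap:
            state += 1
        codes[current] = state
    return codes


# Hypothetical mean wing lengths (mm) for five taxa, with gaps wider than 1 mm.
print(gap_code([1.0, 1.2, 3.0, 3.1, 6.0], min_gap=1.0))
```

The five measurements fall into three clusters and are therefore recoded as states 0, 0, 1, 1 and 2.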
gap-weighting A gap-coding method that not only recodes continuous
characters as discrete characters but also maintains the relative sizes of the
gaps between them by means of additive coding.
general character (1) A character that occurs earlier in an ontogenetic
sequence and is considered to specify a more inclusive group. (2) A
character that is more frequently observed.
generalized optimization An optimization procedure in which the costs of
transformation are set individually for each character and character
state and represented as a cost matrix.
geological character precedence, criterion of See stratigraphic criterion.
greatest agreement subtree The agreement subtree found by pruning one or
more branches from each of a set of fundamental cladograms until a
set of identical topologies is obtained. Also known as a common pruned
tree.
groundplan A reconstruction of the character states at the ingroup node.
See also hypothetical ancestor.
Hennigian argumentation The first explicit procedure for constructing
cladograms, in which the information contained in each character is
considered independently. The different states of each character are
first organized into transformation series and polarized, then the
synapomorphies so revealed are used to construct the cladogram. See
also constrained, two-step analysis.
heterogeneity Heterogeneity between data sets refers to the statistical
difference between the topology and strength of phylogenetic signal
contained in two or more data sets coded for the same taxa.
heuristic algorithm An algorithm for constructing cladograms that is not
guaranteed to find the most parsimonious solution. Cf. exact algorithm.
See also branch-swapping, stepwise addition.
homoiology See convergent.
homologue, homology (1) Two characters that pass the similarity, conjunction
and congruence tests are termed homologous. Also known as
synapomorphy. (2) Character states that share modifications from another
condition, e.g. wings of birds in relation to forelimbs of other
tetrapods. See also convergence, parallelism, primary homology,
secondary homology.
homoplasy (adj. homoplastic) (1) A character that specifies a different and
overlapping group of taxa from another character. (2) Any character
that is not a synapomorphy (homology).
hypothesis dependent weighting See weighting (a posteriori).
hypothesis independent weighting See weighting (a priori).
hypothetical ancestor A reconstruction of the character states at the
ingroup node interpreted in terms of a real ancestor of the ingroup taxa.
See also groundplan.
illogical coding See non-applicable coding.
implied weighting A procedure for weighting characters according to their
fit to a cladogram, assessed as the implied number of extra steps. The
fitting function is generally concave, which gives more weight to those
characters with least homoplasy. See also successive (approximations
character) weighting.
indirect method A method of character polarization that requires information
from a source external to the taxa under study. Cf. direct.
informative component A component that includes more than one but less
than all of the terminal taxa in a data set. Cf. uninformative component.
ingroup The group under investigation in a cladistic analysis in order to
resolve the relationships of its members. Cf. outgroup.
ingroup node The node of a cladogram that unites all members of the
ingroup into a single clade. See also outgroup node.
inter-dependency See linkage (2).
internal branch See branch.
internode See branch.
islands of trees Sets of cladograms that can only be reached one from
another by branch-swapping topologies that are longer than those
currently to hand.
jackknife A statistical procedure for achieving a better estimate of the
parametric variance of a distribution from small samples than the
observed sample variance by averaging pseudoreplicate variances. The
original data set is sampled without replacement to produce a
pseudoreplicate that is smaller than the original. In first-order jackknifing,
only a single original observation is excluded; in higher-order
jackknifing, more than one original observation is excluded. See also bootstrap.
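The construction of jackknife pseudoreplicates can be sketched as below. The sketch assumes that the observations excluded are characters (columns) of a taxon-by-character matrix, although taxa could equally be deleted; the function name and the data are hypothetical.

```python
from itertools import combinations


def jackknife_pseudoreplicates(matrix, order=1):
    """Yield every pseudoreplicate obtained by excluding `order` characters."""
    n_chars = len(matrix[0])
    for dropped in combinations(range(n_chars), order):
        kept = [c for c in range(n_chars) if c not in dropped]
        yield [[row[c] for c in kept] for row in matrix]


# Hypothetical data: three taxa (rows) scored for four binary characters.
data = [[0, 0, 1, 1],
        [0, 1, 1, 0],
        [1, 1, 0, 0]]
first_order = list(jackknife_pseudoreplicates(data, order=1))
second_order = list(jackknife_pseudoreplicates(data, order=2))
print(len(first_order), len(second_order))
```

Four characters give four first-order pseudoreplicates of three characters each and six second-order pseudoreplicates of two characters each.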
jackknife monophyly index (JMI) The ratio of the sum of p(c_t), from t = 1
to T, to T, where T is the number of ingroup taxa and p(c_t) is the
proportion of the most parsimonious cladograms of pseudoreplicate t
in which clade c is supported.
jackknife strict consensus tree A consensus tree that includes all
components common to or consistent with all members of a set of jackknife
pseudoreplicate cladograms.
length (1) The minimum number of character changes (steps) required on a
cladogram to account for the data. (2) The summed fit of all characters
to a cladogram. (3) The total of accommodated three-item statements
plus twice the number of non-accommodated three-item statements on a
cladogram.
length difference See Bremer support.
linkage (1) The condition under which two characters do not represent
independent evidence in support of a group. (2) The union of more
than two character states into a single multistate character. Also known
as character interdependency.
logically consistent See consistent.
Lundberg rooting A method of rooting that first determines the most
parsimonious cladogram for the ingroup taxa alone, then, keeping this
topology fixed, adds an all-plesiomorphic outgroup or hypothetical
ancestor at the position that gives the least increase in overall length of the
cladogram.
majority-rule consensus tree A consensus tree formed from those
components that occur in at least 50% of a set of fundamental cladograms.
The 'majority rule' can also be set to any value greater than 50%.
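A minimal sketch of the component count behind a majority-rule consensus is given below. It assumes rooted cladograms written as nested tuples and illustrative function names. Note also that it retains components found in more than the stated proportion of the cladograms, because two components each present in exactly half of them could conflict.

```python
from collections import Counter


def informative_components(tree):
    """Return (all taxa, informative components) of a nested-tuple cladogram."""
    def walk(node):
        if isinstance(node, str):
            return frozenset([node]), set()
        taxa, found = frozenset(), set()
        for child in node:
            child_taxa, child_found = walk(child)
            taxa |= child_taxa
            found |= child_found
        found.add(taxa)
        return taxa, found

    all_taxa, found = walk(tree)
    return all_taxa, {c for c in found if 1 < len(c) < len(all_taxa)}


def majority_rule_components(trees, cutoff=0.5):
    """Components present in more than `cutoff` of the fundamental cladograms."""
    counts = Counter()
    for tree in trees:
        counts.update(informative_components(tree)[1])
    return {c for c, n in counts.items() if n / len(trees) > cutoff}


fundamental = [("A", ("B", ("C", "D"))),
               ("A", ("B", ("C", "D"))),
               ("A", ("C", ("B", "D")))]
print(sorted(sorted(c) for c in majority_rule_components(fundamental)))
```

In this example the component BCD occurs in all three cladograms and CD in two, so both are retained, whereas BD occurs only once and is discarded.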
median consensus tree A consensus tree for which the degree of difference
between any pair of fundamental cladograms (as measured by a tree
comparison metric) is smaller than for any other cladogram of the same
taxa.
meristic character A discrete character that represents counts of structures
expressed as integers, either directly scored into the data matrix or
rescaled, e.g. number of digits on the foot of a mammal.
metric tree A branching diagram on which the length of each branch is
proportional to the amount of character change that occurs along it.
See also non-metric tree, ultrametric tree.
midpoint rooting A method of rooting that places the root at the midpoint
of the longest branch or path connecting two taxa.
minimal cladogram (1) A most parsimonious cladogram. Using the standard
approach, a minimal cladogram has minimum length; using three-item
statements analysis, a minimal cladogram is one that fits the greatest
number of accommodated three-item statements. Also known as the
optimal cladogram. (2) A strictly supported cladogram.
monophyly (monophyletic group) (1) A group is diagnosed as monophyletic
by the discovery of homologies (synapomorphies). Also known as a
clade. (2) A group that includes a most recent common ancestor plus
all and only all of its descendants. See also component, paraphyly,
polyphyly.
morphometric character A continuous, quantitative character derived from
measurements of some aspect of the morphology of an organism.
most parsimonious reconstruction See optimization.
multistate character A character that has more than two observed states.
Multistate characters are generally coded 0/1/2...n and can be
directed or undirected, polarized or unpolarized, and ordered or
unordered. See also binary character.
nearest-neighbour interchange A method of branch-swapping that exchanges
a nearest-neighbour from one end of an internal branch with one from
the other end.
nearest neighbours The branches arising from the nodes at either end of a
particular internal branch of a cladogram.
Nelson consensus tree A consensus tree formed from the clique of mutually
compatible components that are most replicated in a set of fundamental
cladograms.
node A point on a cladogram where three or more branches meet.
non-accommodated three-item statement (NTS) A three-item statement that
fits to a cladogram with two steps.
non-additive coding A method for representing unordered multistate
characters as a linked series of binary characters. Cf. additive coding.
non-adjacent character Two character states within a multistate character
are non-adjacent if there is at least one other character state placed
between them in a transformation series. For example, in the
transformation series 0 ↔ 1 ↔ 2, states 0 and 2 are non-adjacent.
Cf. adjacent.
non-applicable coding A coding to denote character states that cannot
logically be observed in some taxa (e.g. the condition of tooth
characters in modern turtles). Computer programs evaluate these as missing
data coded as question marks. Also known as illogical coding.
non-metric tree A branching diagram on which each of the branches is of
equal length. See also metric tree, ultrametric tree.
ontogenetic criterion A direct method of character polarization that uses
the information derived from ontogeny to determine the relative
apomorphy and plesiomorphy of character states found in the ingroup taxa.
ontogeny The pattern of changing features of an organism as development
proceeds from zygote to adult.
optimal cladogram See minimal cladogram (1). See also suboptimal
cladogram.
optimality criterion The sum total of all constraints to be applied to a
character during optimization in terms of order, polarity and direction.
optimization A procedure for reconstructing the most parsimonious
sequence of character state change (most parsimonious reconstruction,
MPR) on a cladogram by minimizing an optimality criterion. See also
Camin-Sokal optimization, Dollo optimization, Fitch optimization,
generalized optimization, Wagner optimization.
order The sequence of character state transformation in a multistate
character. See also direction, polarity.
ordered character A multistate character of which the order has been
determined. Transformation between any two adjacent states costs the
same number of steps (usually one, see direction), but transformation
between two non-adjacent states costs the sum of the steps between
their implied adjacent states. For example, in the ordered character,
0 ↔ 1 ↔ 2, the transformations 0 ↔ 1 and 1 ↔ 2 each cost the same
number of steps but the transformation 0 ↔ 2 costs twice as many (i.e.
transformation proceeds as if via state 1). Wagner optimization uses
ordered characters. Cf. unordered.
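The difference in step cost between ordered and unordered characters can be illustrated with a small sketch. It assumes that states are integers numbered in the order of the transformation series, that the character is undirected and that one transformation between adjacent states costs `unit` steps; the function name and its parameters are hypothetical.

```python
def step_cost(state_from, state_to, ordered=True, unit=1):
    """Steps needed to transform one state into another (undirected character)."""
    if state_from == state_to:
        return 0
    if ordered:
        # Ordered: pass through every intermediate state in the series.
        return unit * abs(state_to - state_from)
    # Unordered: any state transforms directly into any other.
    return unit


print(step_cost(0, 2, ordered=True))   # 0 to 2 via state 1
print(step_cost(0, 2, ordered=False))  # 0 to 2 directly
```

For the character 0 ↔ 1 ↔ 2, the ordered cost of 0 ↔ 2 is twice that of 0 ↔ 1, whereas the unordered cost is the same for every pair of distinct states.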
outgroup A taxon used in a cladistic analysis for comparative purposes,
usually with respect to character polarity determination. Cf. ingroup.
outgroup comparison An indirect method of character polarization that uses
the information on character states in outgroup taxa to determine the
relative apomorphy and plesiomorphy of character states found in the
ingroup taxa.
outgroup node The node on a cladogram that unites the ingroup taxa with
the first outgroup (sister-group). See also ingroup node.
over-resolved cladogram A cladogram with spurious resolution due to the
presence of one or more zero-length branches. See also ambiguous
optimization, strictly-supported cladogram.
parallelism Two characters that pass both the similarity test and
conjunction test of homology but fail the congruence test are termed
parallelisms. See also convergence, homology.
paraphyly (paraphyletic group) (1) A group recognized by
symplesiomorphies. (2) A group that remains when one or more components
of a monophyletic group are excluded. (3) A group that includes a most
recent common ancestor plus only some of its descendants. See also
monophyly, polyphyly.
parsimony The general scientific criterion for choosing among competing
hypotheses that states that we should accept the hypothesis that
explains the data most simply and efficiently.
partitioned analysis A technique of data analysis in which data from
different sources are maintained as distinct data matrices, analysed
individually and then the results combined using a consensus tree method
to extract common groupings. Also known as taxonomic congruence.
permutation tail probability (PTP) test A procedure that attempts to assess
the degree of cladistic covariation in a data set by permuting the states
of each character and randomly reassigning them to the ingroup taxa in
such a way that the proportions of each state are maintained. This
process is repeated to produce a large number of such pseudoreplicate
data sets, which are then analysed cladistically. The PTP is defined as
the proportion of all data sets (permuted plus original) that yield
cladograms equal to or shorter than those produced from the original
data set.
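The bookkeeping of the PTP test can be sketched as below. This is a hedged outline only: the function name, the dictionary layout of the data matrix and the `shortest_length` callback are assumptions, and the callback stands in for a full parsimony search (finding the length of the most parsimonious cladogram), which is not implemented here.

```python
import random


def ptp(matrix, ingroup, shortest_length, replicates=99, seed=0):
    """Permutation tail probability for a data matrix.

    matrix: dict mapping taxon name -> list of character states.
    ingroup: taxon names whose states are permuted, character by character.
    shortest_length: hypothetical callable returning the length of the most
        parsimonious cladogram(s) for a matrix.
    """
    rng = random.Random(seed)
    original = shortest_length(matrix)
    n_chars = len(next(iter(matrix.values())))
    as_short = 1  # the original data set is counted among "all data sets"
    for _ in range(replicates):
        permuted = {taxon: list(states) for taxon, states in matrix.items()}
        for j in range(n_chars):
            column = [matrix[t][j] for t in ingroup]
            rng.shuffle(column)  # reassign states; proportions are unchanged
            for taxon, state in zip(ingroup, column):
                permuted[taxon][j] = state
        if shortest_length(permuted) <= original:
            as_short += 1
    return as_short / (replicates + 1)
```

A call would look like `ptp(matrix, ingroup_names, my_search)`, where `my_search` is whatever routine supplies minimum cladogram lengths. Because the original data set is included in the count, the smallest value the sketch can return is 1 / (replicates + 1).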
phylogenetic systematics A method of classification that utilizes hypotheses
of character transformation to group taxa hierarchically into nested
sets and then interprets these relationships as a phylogenetic tree. See
also cladistics.
phylogenetic tree An hypothesis of genealogical relationships among a group
of taxa with specific connotations of ancestry and an implied time axis.
Cf. cladogram.
plesiomorphy (1) An apomorphy of a more inclusive hierarchical level than
that being considered. (2) An ancestral or primitive character or
character state. Cf. also apomorphy.
polarity A character or transformation series is said to be polarized when
the direction of character change or evolution has been specified, thereby
determining the relative plesiomorphy and apomorphy of the characters
or character states. See also direction, order.
polarization The assignment of polarity to a character or transformation
series.
polymorphic character (1) A character that can show two or more character
states within the same individual, e.g. alleles. (2) A character that can
show two or more states among different individuals of a taxon, e.g.
colour forms of some species of butterfly.
polyphyly (polyphyletic group) (1) A group based on convergent characters.
(2) A group based upon homoplastic characters assumed to have been
absent in the most recent common ancestor of the group. (3) A group
that does not include the most recent common ancestor of all its
members. See also monophyly, paraphyly.
primary homology An hypothesis of homology that has passed the similarity
and conjunction tests, but which has yet to be subjected to the
congruence test. See also secondary homology.
primitive character See plesiomorphy.
process partitions A concept developed for partitioned analysis in which
data are organized into sets satisfying certain restrictive process
criteria.
progression rule A method of character polarization that states that the
character state found in the taxon furthest geographically or
ecologically from the ancestral taxon is apomorphic. Also known as the
criterion of chorological progression.
pseudoreplicate An artificial data set produced by permutation of, or
resampling from, a data set of real observations.
qualitative character A character for which the original observations are in
terms of distinct, alternative conditions. Many qualitative characters
are actually filtered versions of quantitative characters.
quantitative character A character for which the original observations are
in the form of measurements. See also meristic character, qualitative
character.
recapitulation (Haeckelian) The view of development that states that
ontogeny recapitulates phylogeny and thus the ontogenetic sequence of
an organism would be expected to pass through the stages found in the
adults of its ancestors. Also known as the Biogenetic Law. See also
recapitulation (von Baerian).
recapitulation (von Baerian) The view of development that states that two
taxa will share the same ontogenetic sequence up to the point that they
diverged into separate lineages and thus we would never expect the
ontogenetic sequence of an organism to pass through the stages found
in the adults of its ancestors. See also recapitulation (Haeckelian).
redundancy (character) A character that exhibits linkage (1) with another
character is redundant.
redundancy (three-item statement) A three-item statement that is logically
implied by the combination of two other three-item statements is
redundant. For example, the statement A(BCD) yields the three-item
statements: A(BC), A(BD) and A(CD). Combination of any two will
recover the original statement, A(BCD), making the third redundant.
Redundancy is accounted for by fractional weighting.
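The decomposition of a statement such as A(BCD) into its three-item statements can be sketched as follows. The function name and the list-based input are assumptions made for illustration; the sketch only enumerates the statements and does not attempt fractional weighting.

```python
from itertools import combinations


def three_item_statements(outside, inside):
    """List every statement 'o(xy)' with o outside the group and x, y inside it."""
    pairs = list(combinations(sorted(inside), 2))
    return [f"{o}({x}{y})" for o in sorted(outside) for x, y in pairs]


statements = three_item_statements(outside=["A"], inside=["B", "C", "D"])
print(statements)  # ['A(BC)', 'A(BD)', 'A(CD)']
```

Any two of the three statements listed already imply that B, C and D group together to the exclusion of A, which is why the third adds no independent information.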
replicated component A component is replicated on two cladograms if it is
present in both, irrespective of the internal relationships of its
constituent taxa. Thus, if one cladogram includes the clade A(B(CD)) and
another cladogram includes the clade A(D(BC)), then the component
(BCD) is replicated in both topologies.
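The example above can be checked with a short sketch that lists the components of each cladogram and intersects them. Representing a cladogram as nested tuples of taxon names, and the function name `components`, are assumptions of this illustration.

```python
def components(cladogram):
    """Return the components of a nested-tuple cladogram as frozensets of taxa."""
    found = set()

    def visit(node):
        if isinstance(node, str):  # a terminal taxon
            return frozenset([node])
        taxa = frozenset().union(*(visit(child) for child in node))
        found.add(taxa)  # record the group of taxa subtended by this node
        return taxa

    visit(cladogram)
    return found


first = ("A", ("B", ("C", "D")))   # A(B(CD))
second = ("A", ("D", ("B", "C")))  # A(D(BC))
replicated = components(first) & components(second)
print(sorted("".join(sorted(c)) for c in replicated))  # ['ABCD', 'BCD']
```

The component BCD is found in both cladograms although its internal resolution differs; CD and BC each occur in only one. The all-inclusive component ABCD is also shared, but it carries no grouping information.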
rescaled consistency index (rc) The product of the consistency index and the
retention index of a character.
retention index (ri) A measure of the amount of similarity in a character
that can be interpreted as synapomorphy on a given cladogram. The
retention index is calculated as the ratio of (g - s) to (g - m), where g
is the greatest number of steps a character can exhibit on any
cladogram, m is the minimum number of steps a character can exhibit on
any cladogram and s is the minimum number of steps the same
character can exhibit on the cladogram in question. See also
consistency index, ensemble retention index.
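As a worked illustration of the two indices, the sketch below computes ri = (g - s) / (g - m) and rc = ci × ri. The function names are hypothetical, and the consistency index is taken as ci = m / s, the usual definition, which is not restated in this part of the glossary.

```python
def retention_index(g, m, s):
    """ri = (g - s) / (g - m); undefined when g equals m."""
    if g == m:
        raise ValueError("ri is undefined when g equals m")
    return (g - s) / (g - m)


def consistency_index(m, s):
    """ci = m / s (assumed here; the usual definition)."""
    return m / s


def rescaled_consistency_index(g, m, s):
    """rc is the product of the consistency index and the retention index."""
    return consistency_index(m, s) * retention_index(g, m, s)


# A binary character with g = 3 and m = 1 that takes s = 2 steps on the cladogram.
print(retention_index(3, 1, 2))             # 0.5
print(rescaled_consistency_index(3, 1, 2))  # 0.25
```

A character that fits the cladogram perfectly (s = m) has ri = 1, and one that requires the greatest possible number of steps (s = g) has ri = 0.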
root (1) The basal taxon of a cladogram on which all characters have been
polarized. (2) The starting point or base of a cladogram.
rooting The process of assigning a root to a cladogram.
safe taxonomic reduction A technique used to identify those taxa having a
complement of values that will have no influence on the topological
relationships in a cladistic analysis.
secondary homology An hypothesis of homology that has passed the
similarity, conjunction and congruence tests and is accepted as a
synapomorphy. See also primary homology.
semi-strict consensus tree See combinable components consensus tree.
similarity test A test of primary homology. To pass the similarity test, two
characters must be generally comparable in morphology, anatomy and
topographical position. See also congruence test, conjunction test.
simultaneous analysis See total evidence.
simultaneous, unconstrained analysis A method for constructing
cladograms that considers both outgroup and ingroup taxa together and
makes no a priori assumptions regarding character polarity. See also
constrained, two-step analysis. Not to be confused with simultaneous
analysis.
sister-group(s) (1) Two taxa that are more closely related to each other
than either is to a third taxon. (2) The taxon that is genealogically most
closely related to the ingroup.
slow transformation See delayed transformation.
spurious resolution Resolution on a cladogram that is not supported
unambiguously by data, but is solely the product of ambiguous optimization.
standard approach The method of cladistic analysis that codes the observed
features of taxa as binary characters and/or transformation series,
assesses the optimal cladogram in terms of length and investigates
hypotheses of character evolution using optimization procedures. Used
primarily in contradistinction to three-item statements analysis.
step (1) A single gain or loss of a character or a transformation of a
multistate character on a cladogram. (2) The fit to a node of a
cladogram of an accommodated three-item statement.
stepwise addition The sequence by which taxa are added to a developing
cladogram during the initial building phase of an analysis.
stratigraphic criterion A method of character polarization that states that
the character state found in the oldest fossil taxa is plesiomorphic. Also
known as the criterion of geological character precedence.
strict consensus tree A consensus tree formed from only those components
common to all members of a set of fundamental cladograms. In a
restricted sense, a strict consensus tree may be considered to be the
only consensus tree that results from a true consensus. All other
consensus methods permit the inclusion of at least some components
that are not present in all of the fundamental cladograms and thus
produce compromise trees.
strictly supported cladogram A cladogram from which all spurious
resolution has been removed and on which all resolved groups are
unambiguously supported by data. Also known as a minimal cladogram (2).
See also over-resolved cladogram.
suboptimal cladogram A cladogram that requires more than the minimum
number of steps to account for the data. Usually interpreted as those
cladograms that are one or a few steps longer than the optimal
cladogram.
subtree pruning and regrafting (SPR) A method of branch-swapping that
clips off rooted subcladograms from the main cladogram and then
re-attaches them in new positions elsewhere on the remnant main
cladogram.
successive (approximations character) weighting An iterative procedure for
weighting characters a posteriori according to their cladistic consistency,
which is usually measured by the consistency index or rescaled
consistency index. See also implied weighting.
symplesiomorphy (1) A synapomorphy of a more inclusive hierarchical level
than that being considered. (2) The occurrence in two or more taxa of
a monophyletic group of a plesiomorphic character or character state;
that is, one that has been inherited from an ancestor more distant than
the most-recent common ancestor of the group. Paraphyletic groups
result from mistaking symplesiomorphies for synapomorphies.
synapomorphy (1) A secondary homology. (2) An apomorphy that unites two
or more taxa into a monophyletic group.
taxic approach An approach to cladistic analysis that uses only the
distributions of characters among taxa to hypothesize group membership. All
other properties of both characters and groups (e.g. polarity,
monophyly, transformation series) are derived from the resulting cladogram.
Cf. transformational approach.
taxon A named group of two or more organisms.
taxonomic congruence See partitioned analysis.
term See terminal taxon.

term information The term information of a component is one less than the
number of terminal taxa included within that component. The term
information of a cladogram is the sum of the term information of all its
informative components. See also component information.
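The arithmetic of term information can be sketched for a cladogram written as nested tuples of taxon names. The representation and the function name are assumptions; a component is treated as informative when it includes more than one terminal taxon but fewer than all of them, following the definition of an uninformative component in this glossary.

```python
def term_information(cladogram):
    """Sum of (number of terminal taxa - 1) over all informative components."""
    sizes = []

    def visit(node):
        if isinstance(node, str):  # a terminal taxon
            return 1
        n = sum(visit(child) for child in node)
        sizes.append(n)  # size of the component at this node
        return n

    total = visit(cladogram)
    # Exclude the uninformative component that contains every taxon.
    return sum(n - 1 for n in sizes if 1 < n < total)


print(term_information(("A", ("B", ("C", "D")))))  # (2 - 1) + (3 - 1) = 3
print(term_information(("A", "B", "C", "D")))      # unresolved: 0
```

In A(B(CD)) the informative components are CD and BCD, contributing 1 and 2 respectively.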
terminal branch See branch.
terminal taxon A taxon placed at one end of a terminal branch on a
cladogram. Also known as a term.
three-item statement The concept that two entities (taxa, areas) are more
closely related to each other than either is to any other third entity.
Also known as a three-taxon statement.
three-item statements analysis A method of cladistic analysis that focuses
on the smallest unit of relationship, the three-item statement, rather
than on characters. The observed features of taxa are coded in terms
of the relationships they imply, that is as three-item statements, and the
optimal cladogram is that which maximizes the number of
accommodated three-item statements.
three-taxon statement See three-item statement.
topology-dependent permutation tail probability (T-PTP) test A
modification of the permutation tail probability test that attempts to
assess the degree of support for individual clades on a cladogram.
total evidence A technique of data analysis whereby all data are combined
into a single matrix before parsimony analysis to maximize character
congruence or homology statements. Also known as simultaneous
analysis or the character congruence method.
total support The sum of the Bremer support values of all branches on a
cladogram.
total support index The ratio of total support to the length of the most
parsimonious cladogram.
transformation series A series of three or more increasingly apomorphic
characters or character states.
transformation series analysis (TSA) An iterative method of character
analysis that attempts to bring the order of multistate characters into
conformity with the hierarchy inherent in the rest of the data. TSA
begins with the construction of an initial transformation series for each
multistate character. The data set is then analysed to find the most
parsimonious cladograms. Any transformation series that conflict with
these cladograms are recoded to conform to adjacent positions on the
cladogram. The data set is then recoded and re-analysed to obtain a
new set of most parsimonious cladograms. This process is repeated
until both the topologies of the most parsimonious cladograms and the
transformation series stabilize.
transformational approach An approach to cladistic analysis that views
characters as features of organisms that transform one into another
and thus that polarized transformation series must be postulated prior to
analysis. Cf. taxic approach.
tree bisection and reconnection (TBR) A method of branch-swapping that
clips off subcladograms from the main cladogram and re-roots them
before re-attaching them in new positions elsewhere on the remnant
main cladogram.
tree comparison metric A measure of the degree of difference between
cladograms. See also median consensus tree.
tree dependent weighting See weighting (a posteriori).
tree independent weighting See weighting (a priori).
ultrametric tree A branching diagram on which every terminal taxon is the
same distance from the root. See also metric tree, non-metric tree.
undecisive data A data set that includes all possible informative characters
in equal numbers so that it is phylogenetically uninformative.
Undecisive data yield all possible fully resolved cladograms, which will
all be of the same length, and thus offer no reason for choosing some
cladograms in preference to others. Cf. decisive data.
underlying synapomorphy (1) Close parallelism as a result of common
inherited genetic factors causing incomplete synapomorphy. (2) The
inherited potential to develop parallel similarities.
undirected character A character in which the transformations in one
direction cost the same number of steps as the transformations in the
opposite direction. For example, in the undirected character, 0 ↔ 1, the
transformations 0 → 1 and 1 → 0 each cost the same number of steps.
Wagner optimization and Fitch optimization use undirected characters.
Cf. directed.
uniform weighting (UW) The application of equal weights to all characters
or three-item statements in a data set. Cf. fractional weighting.
uninformative character A character that contains no grouping information
relevant to a particular cladistic problem, e.g. autapomorphies and
constant characters.
uninformative component A component that includes either a single
terminal taxon or all of the taxa in a data set. Cf. informative component.
unit discriminate compatibility measure (UDCM) The complement of the
probability of a derived character state being nested with another
derived character state, or the probability of a derived character state
being exclusive of another derived character state, depending upon the
observed pairwise character comparison.
unordered character A multistate character of which the order has not been
determined. In an unordered character, transformation between any
two states, whether adjacent or non-adjacent, costs the same number of
steps (usually one, see direction). For example, in the unordered
character, 0 ↔ 1 ↔ 2, the transformations 0 ↔ 1, 1 ↔ 2 and 0 ↔ 2 all
cost the same number of steps. Fitch optimization uses unordered
characters. Cf. ordered.
unpolarized A character that has not had its polarity determined.
Venn diagram A graphic representation of a cladogram using internested
boxes, circles, ellipses or parentheses to symbolize nodes.
Wagner optimization The optimization procedure used for ordered,
unpolarized, undirected characters.
weighting (a posteriori) A procedure that applies differential weights to
characters following cladogram construction. Also known as hypothesis
dependent or tree dependent weighting.
weighting (a priori) A procedure that applies differential weights to
characters prior to cladogram construction. Also known as hypothesis
independent or tree independent weighting.
zero-length branch A branch on a cladogram that is unsupported by
characters.
Appendix: Computer programs
The following is a list of those programs that are mentioned in this book or
that implement methods and procedures discussed in the text. More complete
listings of phylogenetic computer programs and packages can be found at:
http://evolution.genetics.washington.edu/phylip/software.html
http://www.nhm.ac.uk/hennig/software.html
Farris, J. S. (1988). Hennig86 version 1.5. MS-DOS program. Published by the author,
Port Jefferson Station, New York.
Goloboff, P. (1996a). PIWE version 2.51. MS-DOS program. Published by the author,
San Miguel de Tucuman, Argentina.
Goloboff, P. (1996b). NONA version 1.50. MS-DOS program. Published by the author,
San Miguel de Tucuman, Argentina.
Maddison, W. P. and Maddison, D. R. (1992). MacClade 3.01. Macintosh OS program.
Sinauer Associates, Sunderland, Massachusetts.
Nelson, G. J. and Ladiges, P. Y. (1995). TAX: MSDOS computer programs for
systematics. MS-DOS program. Published by the authors, New York and
Melbourne.
Nixon, K. (1992). CLADOS version 1.2. MS-DOS program. Cornell University, Ithaca.
Page, R. D. M. (1993). COMPONENT version 2.0. MS-DOS program for Windows.
The Natural History Museum, London.
Ramos, T. C. (1996). Tree Gardener version 1.0. MS-DOS program. Museu de
Zoologia/USP, Sao Paulo.
Siddall, M. E. (1996). Random Cladistics, version 4.0.3, Ohio edition. MS-DOS program.
Virginia Institute of Marine Sciences, Gloucester Point.
Swofford, D. L. (1993). PAUP, Phylogenetic Analysis Using Parsimony, version 3.1.
Macintosh OS program distributed by the Illinois Natural History Survey,
Champaign, Illinois.
Index
Page numbers in ilillie refer to the glossary. charat lcr optimization
Camin- Sokat optimiution 77- 8
COS t matrice~ 78- 9
accelerated transformation (ACCrRAN) Dollo op timization 75- 7
72-3, 199 Film optimization 73- 5
Adams oonsensu.' I r"'~~ 141, 147, ISO. 161, 199 generalized 78- 9
amino acid sequence, and character weighting
Wagner optimization 70- 3
103 character polarity
ancestors
a priori de termination 64- 7
as paraph)'lc!ic taxa 13- 14
biogeography 61 - 2
problems in cladis tic ana lys is 14
corlStraincd two-step analysis 64- 6
apomorphy
de termination 48- 69
defini tion 200
de lcrmin alion 48- 9 fossils 61
ArchotoplN)'l. cladistic problems 13-14 fu nct ional Yalue 62
au tapom urphy 5 Herrnigian argumentation 48
definit ion 200 ingroup oommOfiality 60- 1
an d terminal lalla 5 on togenelic cri terion 54- 60
outgroup com parison 49-54, 59-60
and paedomorphosis 58
binary charaClcTs progression nrlc 61 - 2
coding of data 29-30,36, 168- 9 simultaneou~, un constrai nt:d analYSis 66, 68
definition 200 strat igrap hy 61
Biogenetic Law 55, ZOO. 214 and underlying synapomorphy 62-4
biogeography character selcetion 19- 20,35- 6
chara cter polarity 61-2 character Slah!S 24- 6. 201
and three-item sta tements ana lysis 184- 5 recognition 2- 5
bootstrap methods 200 transformati on 30- 1
clade support 129- 31, 137 eharac1t: r weighting 99- 11 6, 208, 21 7, 220
branch length 200 and amino itcid sequence 103
clade su pport 126- 1, 131 a poJI~riOri 100, 102, 108, 110- 116, 220
branch support, see Bremer support a ptiori 100- 10, 11 7, 220
branch-and-bound melhod, d adogram con- character analysis 100- 1
struction 4 1- 2, 2O(}-1 and cladistic consis tency 110
branchi ng diagrams, see cladog rums compatibili ty ana lysis 108- 10
bntnch-$wapping
dynami<; 106
cladogram construction 45-8,201. 217,219
and ge netic code 103- 4
w mputer program s 46
hypothesis depende nt, set! character weight-
Breme r support 127- 9,137.201
ing. a posteriori
hypot hesis independent, .I 'te character
Camin- $okal optimizatiOn 77- 9, 201 weighting. a pril)fj
characte r implied 112- 15, ~
definition 23- 6, 201 molec ular data 102- 8
characte r coding, Jet' codi ng of oata morphological dala 101-2
character consistency, set! co(15iSlency index and rescaled consistency index (rc) I I I
clmncter fi t successive t06, 111 - 12, 217
dadogrnm lcngth 92- 5, 116 three-ite m statements analysis 113- 7
measurement 92- 9 tree dependent. Jet character weighting, u
statiM ieli1 anal ysis 95- 9, 11 6 fXl$lcriori
clutrHctcr length, Jtt character opti miza tion tree independent, see chKrllctc r weighting, a
chllntctel liokage 29-111 priori
224 I"dex
characters cladogram collstruction
adjace nt 199 branch.and.bound method 41-2, 2()()- J
apomorphic 2- 3 branch-swappillg 45-8, 201, 217, 219
roogTuenl 8- 10, 27 effidency 39- 40, 42-4, 46, 48
consis!ent 8, 9, 10 exact met hods 39- 42
conti nuous 19- 22,35- 6 Hen nigian argume ntat ion 38-9, 208
definition 23- 6, 201 heuristic methods 42- 8
diagnostic 23-4 'hill-climbing' techniques 42- 3
directed 205 nearest-neighbour imerehangc (NNJ) 45- 6,
discrete 21-2
211
filtering 19- 20,37
parsimony 38-48
gener~l 56- 7
and homo logy 24, 26- 7 stepwise addition 43- 5
homoplastic 9, 10 subtree pruning and regrafl ing (SPR) 45- 7
!lon-adjacent 21/ tree biscetion and recon necti on (TBR) 45- S
oon-ovcrklppi ng 22- 3 cladogram lengt h
ordered 2/2 Camin-Sokal optimization 77
overlappi ng 22- 3 character fit 92- 5, 116
plesiomorphic 2- 3 definition . 8, 209
quali tative 20- t Dollo opt i ~i lation 75- 7
quantilalivc 19- 21 meas ure of cha racter fit 92- 5
systematic 23- 4 relationship to da ta 172- 3, 117, 185
transfor mation 24- 5 th ree-item statements analysis 172- 3, 177,
undirected 219 179, ISS
unordere d 220
Wagner optimization 71 - 2
see also binary characters; mullistatc
c1adogram support, and partitioned analysis
characters
dade stab il ity, see Breme r support
.,9
dade support cladograms
bootstrap methods 129- 31, 137, 200 and biogeography 61-2
branch length 126-7, 137,200 comparison with !Tees 15- 18
Bremer support 127-9, 137, 201 confidence limits liS
clade stab ility index (CSi) 133- 4, 137-8,202 criteria [or optimality ISO
' jackboot'method 133 definition 1- 2, 202
jackknife sampl ing 13 1- 3, 137,209 dete rmination of 19
Momc Carlo methods 129- 31, 137 fundalnelltal 139-40, 143, 145, 14S-9
randomization proccdures 129- 35 length 8, 209
statistical 126-35 minimal 177- 8, 210
topology-dcpendcnt permutation tail proba.
optimal 8
bility (T-PTP) 134- 5, 138,218
and phylogeny 118, 160
total support ind<.lx (Ii) 128- 9, 218
resolution by sim ultaneous anal ysis 160
dildislic analysis
and arbiuilry consensus 161 rooting 5,64- 8, 2/5
charact<.l r coding, see coding of dillR statistical analysis 118- 38
chilracter seh,!(\ion 19- 20 steps S
cond itiona l data combination 162- 4 coding of data 19,27- 37
maximum likelihood apl}roach 158 add itive 199
morphometric dat a 33-5 binary characters 29- 30, 36, 168-9
and parsimony 5- 10 discrete variables 27- 32,36- 7
pilrlitioned 151-2.155, 157- 60 gap' coding 33
problem of ancestors 14 gap-weightin g 33- 5
simulta neous 15 1- 2,155, 160- 2 information content 32
standard appmach 210 missing values 31 - 2, 79- 90
strict consensus trees 153- 4
morphometric data 33- 5
see also three-i tem state men ts ~lIlalysis
cladistic co nsistcncy, ~nd cha racter w~ighting multist"te ch.. r"ctns 2'1, JI), l(i9
110, 202 nuclcutidcs JI)
cl.uJistics three.item litlltemellt~ IIn!lly,~is 161(- 7t
dd illitilm I, III.~, 1M Cl'nnhillllbic wmpo~.nl~, Clln1l<IIXUS Ir~c~
three· tll •• m ~tllt~mtnt 2 143-'. ''''- . ,
Index 225
com puter programs el,tdogram Icngth 8, 209
branch-swapping 46, 018 consistency indcx (cO 95, 201- 4
characte r weighti ng 11 0- 11 , 114 constrained, two-M ep Hnaiysi5 2Q4
cladogra m length 8 data decisiveness (DO) 120, 136, 204, 219
conscns us Irees 142- 4, 147 00110 optimization 105- 6
exac t methods 39- 42 ensemble consistency index (Cn 95- 6, 206
heuristic methods 42- 8 en$C:mble re tcntion index ( R I) 99, 206
mi nima l cladograms 177- 8 epigenetic characters 54
missing val ues 31,82-6 gap-weighting 207
morphometric dat a 33 homo logy 26- 7, 208
permutation tai l probability (PTP) test 125 homoplasy 208
polymorphic variables 20 illgroup 38, 209
three-item sta tements analysis 170- 2, 177- 9 lin kage 210
condi ti unal datil combination. dadi~tk ~ na lysi5 mono phyletic groups 10- 11 ,210
162-4 mul tistate characters 211
consensus analysis, see consensus trees non-epigenetic charactcrs 54
conse nsus IrtM IOU, 132, 139-50, 203 ontogenetic critc rion 21 /
Ada ms 141, 147, ISO, 161, 199 ou tgroup 212
agreement sublrees 141- 8, ISO, 199 pa ra phyletic groups 11 , 213
arbit ra l)' 161 pa rtitioned analysis 213
and Brcmer support 127-8 pe rmu ta tion tail probabili ty (PT P) 213
combinable compol1c ntS 143- 5, 149,102 polyphyletic groups 11 - 12,214
common pruned 147- 8, ISO, 102 relationship 1
greatest agreemc nt subtrees (GAS) 148, re tcntion index (rO 97, 2/ 5
150, 207 three- item sta tements analysis 185, 218
majority-rule 140, 145- 6, 149, 161,210 transforma tion series analysis (TSA) 218- 19
median 140- 1,145 - 6,149,210,219 underlying synapomorphy 62, 219
Nelson 14 1- 2,146- 7, )49, 211 see also glossary
se mi -strict 140, 143- 5, 149, 202, 216 delayed transfo rma tion (OELTRAN) 73
strict 140-3, 149, 153- 4, 177- 8,216- 17 discre te variables, character coding 27- 32,205
three-item stalemenlS analysis 177-8 dist ribution of c1adogram lengths (DeW,
consistency index (d) statistical analysis 120- 2, 136
an d cha racter weigh ting III 1Xl1l0 optimization 75 - 9
defini tion 95, 116, 203- 4 defin ition 205-6
effect of numbe r of tlUll 96 00llo's Law 11
effect of uni nformative characters 96
mcasurc of character fi t 95, 116
problems 96- 7, 116
th ree-item statements analysis 179 echinoids, cladistic analysis 152, 154, 163
see also ense mble consistency index (Cn ensemble consistency index (CI)
de finition 95- 6, 206
co nstrained two-step analysis
charac ter pola ri ty 64- 6 me8suremCn! of characte r fi t 95- 6, 111
dcfini tion 204 problems 96-7
cost ma trices 32, 78-9 ond statistical ana lysis 120,122- 3
criterion of ehorological progression, ue th ree-i tem statements analysis 119
~ also consis tency index (til
progression rule
cytoc hrome b, partitioned analysis 158 ensemble reten tion index (RI)
defin ition 99, 206
llleaSlIrcment of characte r fit 99, 11 7
da ta decisiveness (DO), definition 120, 136, and st8ti5ticailinalysis 120, 122
104, 219 three-item Statements analysis 179
decay index, SEe Bremer support see a/so retention index (rO
deer mice, cladistic analysis 155- 6, 158 epige netic characters, defin ition 54
defi nitions
apomorphy 200
Ullw pnmnrphy 2(K) FIG/ FOG methnd, outgroup eomparisoll
bimuy charac ters 200 49- 53
chllflu;tcr 23- 6, 201 Fitch IIptimirot iclIl 73-9, 207
d. di 5tK:1I I, IHS, 202 c05t mAt rix 711- 9
chldtllr um 1· 2, 202 nlln·.ddit lve chlflw:,!g. 7,l- ~, l21J
fossils
  character polarity 61
  lack of certain types of data 61, 165

gap-weighting
  definition 207
  morphometric data 33-5
generalized character optimization 78-9, 207
genes, partitioned analysis 157-8
genetic code
  and character weighting 103-4
  degeneracy 103
glossary 199-220
grasshopper mice, cladistic analysis 155-6, 158
greatest agreement subtrees (GAS) 148, 150, 207

Haeckel, Biogenetic Law 55, 214
Hennigian argumentation
  and character polarity 48
  cladogram construction 38-9, 208
heuristic methods, cladogram construction 42-8
'hill-climbing' techniques, cladogram construction 42-3
homoiology, and underlying synapomorphy 63
homology
  and characters 24, 26-7
  definition 26-7, 208, 215
  and synapomorphy 14, 27
  testing 14-15
homoplasy
  and Camin-Sokal optimization n
  definition 208
  and Dollo optimization 75
  measures of 92-9, 111, 116
  and most parsimonious reconstruction 71
  and Wagner optimization 71

ingroup, definition 38, 209
ingroup commonality, character polarity 60-1

'jackboot' method, clade support 133
jackknife sampling, clade support 131-3, 137, 209

length difference, see Bremer support
linkage, definition 210
Lundberg rooting 67, 210

majority-rule trees 140, 145-6, 149, 161, 210
maximum likelihood approach, cladistic analysis lSI!
median consensus trees 140-1, 145-6, 149, 210, 219
milkweed butterflies (Amauris), cladistic analysis 152-3, 161, 163-4
minimal cladograms, three-item statements analysis 177-8
missing values
  coding of data 31-2, 79-90
  and computer programs 31, 82-6
  dealing with 82-8, 165
  effects on cladograms 81-2, 165
  and non-applicable character states 88-90
  reasons for 79-80
  three-item statements analysis 170
molecular data
  character weighting 102-8
  and partitioned analysis 157-9, 160, 161
  size of data sets 159-60
monophyletic groups
  definition 10-11, 15, 210
  specification by synapomorphy 14
Monte Carlo methods, clade support 129-31, 137
morphological data, character weighting 101-2
morphometric data
  cladistic analysis 33-6
  coding of data 33-5
  gap-weighting 33-5
most parsimonious cladograms, effects of missing values 81-2
most parsimonious reconstruction (MPR) 71, 73-5, 78
multistate characters 21-2, 28-9
  coding of data 29, 36, 169
  definition 211
  three-item statements analysis 171

nearest-neighbour interchange (NNI), cladogram construction 45-6, 211
Nelson consensus trees 141-2, 146-7, 149, 211
non-additive characters, Fitch optimization 73-5, 220
non-applicable character states, and missing values 88-90
non-epigenetic characters, definition 54
nucleic acids, partitioned analysis 157-8
nucleotide sequence data, character weighting 102-8
nucleotides, character coding 36

ontogenetic criterion
  character polarity 54-60
  definition 211
  rooting 60, 68
ontogeny
IIIW~"'5iI
outgroup
  all-plesiomorphic 67, 199
  all-zero 67, 200
  definition 212
outgroup comparison
  algorithmic approach 52-4
  and character polarity 49-54, 59-60, 212
  rooting 60, 67-8
  'functional ingroup/functional outgroup' (FIG/FOG) method 49-53

paedomorphosis, cladistic problems 58
paraphyletic groups
  as ancestral groupings 12-14
  definition 11, 15, 211
  and symplesiomorphy 12-13
parsimony
  and cladistic analysis 5-10
  cladogram construction 38-48
  three-item statements analysis 180-1
  see also most parsimonious cladograms; most parsimonious reconstruction (MPR)
partitioned analysis
  and character congruence 160
  and cladistic analysis 151-152
  and cladogram support 159
  cytochrome b 158
  definition 213
  and molecular data 157-61
  and phylogeny 160
  and vicariance biogeography 151
permutation tail probability (PTP)
  definition 213
  problems 125-6
  statistical analysis 122-6
phylogenetic systematics 1, 17, 213; see also cladistics
phylogeny
  and cladograms 118, 160
  and ontogeny 55-9
polymorphic taxa, coding problems 89, 91
polyphyletic groups, definition 11-12, 15, 214
progression rule, character polarity 61-2, 214

relationship, definition 1
rescaled consistency index (RC)
  and character weighting 111, 117
  and statistical analysis 120
retention index (ri)
  and character weighting 111
  definition 97, 215
  measurement of character fit 97-9, 116
  three-item statements analysis 179
  see also ensemble retention index (RI)
ribosomal RNA
  character weighting HI~, ltI7
  partitioned analysis
rooting
  cladograms 5, 64-8, 215
  'hypothetical ancestor' 67
  Lundberg rooting 67
  midpoint rooting 65
  ontogenetic criterion 60, 68
  outgroup comparison 60, 67-8
  see also character polarity

semi-strict consensus trees 140, 143-5, 149, 202, 216
simultaneous analysis, resolution of cladograms 160
simultaneous unconstrained analysis, character polarity 66, 68
sister-groups 1
  and apomorphic characters 3
statistical analysis
  cladograms 118-38
  consistency index (ci) 95-7, 111, 116, 179, 203-4
  data decisiveness (DD) 119-20, 136
  distribution of cladogram lengths (DCL) 120-2, 136
  ensemble consistency index (CI) 95-7, 117, 120, 122-3, 206
  ensemble retention index (RI) 99, 117, 120, 122, 206
  incongruence between data sets 162
  limitations 135
  permutation tail probability (PTP) 122-6, 136-7
  randomization of data 118-19
  rescaled consistency index (RC) 111, 117, 120
  retention index (ri) 97-9, 111, 116, 179, 215
  skewness 121-2
  Templeton's non-parametric test 162-3
  three-item statements analysis 179
statistical support, clades 126-35
stepwise addition, cladogram construction 41-5
stratigraphy, character polarity 61, 216
strict consensus trees 140-3, 149, 153-4, 177-8, 216-17
subtree pruning and regrafting (SPR), cladogram construction 45-7
support index, see Bremer support
symplesiomorphy 3, 217
  and paraphyletic groups 12-13
synapomorphy 3, 217
  and character polarity 62-4
  evidence of relationships 168
  and homology 14, 27
  specification of monophyletic groups 14

taxon sampling
  ~Iim,"hics It.4 -5
  fossils 61, 165
three-item statements analysis
  accommodated 172, 199
  and binary characters 170-1
  and biogeography 184-5
  cladogram length 172-3, 177, 179, 185
  coding of data 168-71
  comparison with standard approach 169-70, 178-9, 180-4
  computer programs 170, 171-2, 177, 178-9
  consensus trees 177-8
  definition 185, 218
  minimal cladograms 177-8
  and missing values 170
  multistate characters 171
  non-accommodated 173, 211
  non-independence of statements 173-4
  parsimony 180-1
  precision 180-5
  principles 168
  statistical analysis 179
  weighting 173-7
three-taxon statement 2
topology-dependent permutation tail probability (T-PTP), clade support 134-5, 138, 218
total support index (ti), of clades 128-9, 218
transformation series analysis (TSA), definition 218-19
tree bisection and reconnection (TBR), cladogram construction 45-8
trees
  comparison with cladograms 15-17
  terminology 17-18

underlying synapomorphy
  definition 62, 219
  and homoiology 63
uninformative characters, effect on consistency index 96
unit discriminate compatibility measure (UDCM) 220
unordered characters, see non-additive characters

vicariance biogeography, and partitioned analysis 151
von Baer, recapitulation 55-6, 214-15

Wagner optimization 70-3, 220
  character length n
  cladogram length 71-2
  cost matrix 78-9
  and homoplasy 71
  most parsimonious reconstruction 71
