Cladistics - The Theory and Practice of Parsimony Analysis
The Systematics Association Publication No. 11
Oxford University Press
Cladistics
Second Edition
The Theory and Practice of
Parsimony Analysis
A catalogue record for this book is available from the British Library
approach to the assembly and analysis of data, irrespective of its source.
Hence there are no chapters dealing expressly with fossils or with molecular
sequence data.
Instead, we have organized the book into nine chapters, beginning with a
discussion of basic principles and concepts. The next six chapters follow the
sequence of events in a cladistic analysis. Chapter 2 concerns characters and
coding, that is, how we proceed from observations of organisms to an
alphanumeric data matrix. Chapter 3 deals with cladogram construction from
the data matrix and cladogram rooting, together with the related topic of
polarity determination. Character optimization and the effects of missing
data are considered in Chapter 4. Chapter 5 addresses character fit and
weighting, while Chapter 6 provides the first comprehensive overview of
cladogram support and confidence statistics. Chapter 7 discusses methods of
consensus analysis. The pros and cons of simultaneous versus partitioned
analysis form the subject of Chapter 8, while Chapter 9 discusses three-item
statements analysis, a recently developed method that unfortunately has been
the subject of much partisan and opaque writing. Finally, in response to
numerous requests from students, we have included a glossary.
This book is the collective responsibility of all four authors. However,
preparation of the first drafts of the text was undertaken as follows. Peter
Forey wrote the sections on basic concepts, missing values and simultaneous
versus partitioned analysis. Chris Humphries wrote the sections on characters,
coding and consensus trees. Ian Kitching wrote the sections on cladogram
construction, character polarity and rooting, optimization, confidence
and support statistics and the glossary. David Williams wrote the sections on
basic measures of fit, weighting and three-item statements analysis. Peter
Forey prepared the figures, with considerable and capable assistance from his
daughter, Kim. Finally, Ian Kitching undertook the role of editor, with the
unenviable job of trying to marry the various sections into a single coherent
and seamless whole. Any trivial mistakes that remain can be laid at his door
but more fundamental disagreements should be taken up with all four of us.
The choice of weapons will be ours.
Many individuals contributed greatly to this book. In particular, we would
like to thank Gary Nelson, Mark Siddall, Karen Sidwell, Darrell Siebert and
Dick Vane-Wright, who critically read through parts of the manuscript, and
particularly Andrew Smith and an anonymous reviewer, who read it all. We
also thank Andrew Smith for permission to use his illustrations of the PTP
test, Bremer support and the bootstrap, and Springer International
for permission to reproduce the illustration of the 5S rRNA molecule of
Pedinomonas minor. We also thank, most wholeheartedly, the Systematics
Association for their continued support of this project.
I.J.K.
P.L.F.
C.J.H.
D.M.W.
Contents
References
Glossary
Appendix: Computer programs
Index
Authors
Peter L. Forey
Department of Palaeontology, The Natural History Museum, London
Christopher J. Humphries
Department of Botany, The Natural History Museum, London
Ian J. Kitching
Department of Entomology, The Natural History Museum, London
David M. Williams
Department of Botany, The Natural History Museum, London
1. Introduction to cladistic concepts
Fig. 1.1 Hennig's concept of relationship. For example, the lizard and the salmon are
considered to be more closely related to each other than either is to the shark
because they share a common ancestor, 'x' (which lived at time t2), that is not
shared with the shark or any other taxon.
Fig. 1.2 Plesiomorphy and apomorphy are relative terms. (a) Character state 'a' is
plesiomorphic and 'a′' is apomorphic. State 'a' is presumed to have been present in
the ancestral morphotype that gave rise to taxa B and C. (b) Character state 'a′' is
apomorphic with respect to 'a' but plesiomorphic with respect to 'a″'.
state 'a' is plesiomorphic and 'a′' is apomorphic. State 'a' is presumed to have
been present in the ancestral morphotype that gave rise to taxa B and C. In
Fig. 1.2b, 'a′' is apomorphic with respect to 'a' but plesiomorphic with respect
[Figure: cladogram of LAMPREY (D), SHARK (A), SALMON (B) and LIZARD (C), with the
groups GNATHOSTOMATA, PISCES, OSTEICHTHYES and TETRAPODA labelled.]
Fig. 1.3 Cladogram for the lamprey, shark, salmon and lizard. Monophyletic groups
are established on the basis of synapomorphies (characters 1-4), while
autapomorphies (characters 5-12) define terminal taxa. Character 13 conflicts with this
hypothesis of relationships, suggesting instead a relationship between the shark and
salmon. See Fig. 1.4 and text for further explanation.
the salmon, lizard, and shark, but symplesiomorphies if the problem involves
the relationships of different species of lizards or different species of salmon.
1.3 PARSIMONY
Fig. 1.4 Using parsimony to choose between two competing hypotheses of
relationship. (a) The shark and the salmon form a monophyletic group based upon shared
possession of fin rays (character 13). However, this topology requires us to
hypothesize that characters 3 and 4 each arose independently in the salmon and the lizard.
(b) Alternatively, the salmon and the lizard form a monophyletic group based upon
characters 3 and 4, with character 13 now being considered homoplastic. This
cladogram is preferred to that in (a) because it is more parsimonious. Character 13
may still be a synapomorphy but at a more inclusive level (albeit with some
homoplasy) and is shown repositioned as such by the arrows.
of the characters but the other three taxa each have a different complement.
Characters 2 and 4 are autapomorphies, since they are each present in only
one of the taxa. They are uninformative for grouping taxa (they serve only to
diagnose these terminal taxa). Characters 1, 3, 5 and 6 are potentially useful
because they are present in more than one taxon. Given the three taxa that
have potentially informative characters, there are three ways in which we
could arrange these taxa dichotomously (Fig. 1.5b-d).
If we now place each of the characters, according to the groups they
Fig. 1.5 Explanation of parsimony in terms of analysis of character distributions. (a)
A data matrix of six characters (1-6) distributed among four taxa (A-D).
Plesiomorphic states are indicated by open boxes, apomorphic states by solid boxes. (b-d)
The three possible resolutions of taxa B-D relative to taxon A. (e) Placing the
characters on the topology in (b) requires seven steps. Characters 1-5 appear only
once while character 6 appears twice. This is the optimal, most parsimonious
solution. (f) Placing the characters on the topology in (c) requires nine steps.
Characters 1, 2 and 4 appear only once but characters 3, 5 and 6 all appear twice.
This is a suboptimal solution. (g) Placing the characters on the topology in (d)
requires eight steps. Characters 1, 2, 4 and 6 appear only once but characters 3 and 5
both appear twice. This is also a suboptimal solution.
specify, on each of these possible cladograms (Fig. 1.5e-g), then we obtain
three different results. The cladogram in Fig. 1.5e shows that all but one of
the characters appears only once. However, in this solution, we must assume
that character 6 appears twice, once in taxon B and once in taxon C, which
are not sister-groups.
We can do the same for the cladograms in Fig. 1.5f and Fig. 1.5g.
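The step counts quoted for Fig. 1.5 can be checked with a short script. What follows is only a sketch: the data matrix is a hypothetical reconstruction chosen to match the character distributions described in the caption (in particular, which taxa carry the autapomorphies 2 and 4 is an assumption), the three topologies are assumed to be (A,(B,(C,D))), (A,(C,(D,B))) and (A,(D,(C,B))), and the counting uses Fitch's unordered parsimony procedure, which is one common way to count steps and is not necessarily how the authors did it.

```python
from typing import Dict, Set, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]

# Hypothetical matrix (0 = plesiomorphic, 1 = apomorphic), characters 1-6.
# Assumed: character 1 in B, C, D; 3 and 5 in C, D; 6 in B, C;
# autapomorphies 2 and 4 placed arbitrarily in B and D.
MATRIX: Dict[str, str] = {
    "A": "000000",
    "B": "110001",
    "C": "101011",
    "D": "101110",
}

TREE_B: Tree = ("A", ("B", ("C", "D")))  # assumed topology of Fig. 1.5b
TREE_C: Tree = ("A", ("C", ("D", "B")))  # assumed topology of Fig. 1.5c
TREE_D: Tree = ("A", ("D", ("C", "B")))  # assumed topology of Fig. 1.5d


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (possible states at this node, steps below it) for one character."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        # The two subtrees can agree on a state: no extra step is needed.
        return shared, left_steps + right_steps
    # No shared state: one change must occur on a branch below this node.
    return left_set | right_set, left_steps + right_steps + 1


def tree_length(tree: Tree, matrix: Dict[str, str]) -> int:
    """Sum the Fitch step counts over all characters in the matrix."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        column = {taxon: row[i] for taxon, row in matrix.items()}
        total += fitch(tree, column)[1]
    return total


if __name__ == "__main__":
    for name, tree in [("b", TREE_B), ("c", TREE_C), ("d", TREE_D)]:
        print(name, tree_length(tree, MATRIX))
```

Under these assumptions the three topologies come out at seven, nine and eight steps respectively, in agreement with the caption, so topology (b) is the most parsimonious of the three.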
[Figure key: consistent; congruent; homoplastic or conflicting.]
Fig. 1.6 Relationships among characters. (a) A data matrix of five characters (1-5)
distributed among five taxa (A-E). Plesiomorphic states are indicated by open
boxes, apomorphic states by solid boxes. (b) Character 1 is shared by three taxa, A, B
and C, which form the initial group. (c) Character 2, present in taxa D and E,
specifies a different group from character 1. (d) Character 3, shared by taxa B and
C, specifies a subset of the initial group. (e) Character 4, present in taxa A, B and C,
specifies the same group as character 1. (f) Character 5, present in taxa C and D,
specifies a different group C + D that conflicts with the initial group. With regard to
character 1, characters 2 and 3 are consistent, character 4 is congruent, while
character 5 is in conflict.
specified by the other characters. In other words, the group that this
character specifies conflicts with those specified by the other characters. Character
5 is said to be homoplastic.
[Figure key: consistent; congruent; homoplastic or conflicting.]
Fig. 1.7 Application of the character relationships shown in Fig. 1.6 to a cladogram
of the lamprey, shark, salmon and lizard. With reference to character 1, character 2 is
congruent because it specifies the same group. Characters 3 and 4 are consistent
because they specify a subgroup of that specified by character 1. Character 13 is
also consistent because it specifies a subgroup, even though that subgroup does not
appear in the most parsimonious solution. Character 14 conflicts with character 1
and is homoplastic.
Returning to the real example, we can recognize several types of character
interaction. In Fig. 1.7, and taking character 1 as the reference, character 2 is
congruent with it because it specifies the same group. Characters 3 and 4 are
consistent with characters 1 and 2 because they specify a subgroup of the
group specified by character 1. Character 13, which is shared between the
shark and the salmon, is also consistent with character 1 because it specifies a
subgroup, even though that subgroup does not appear in the most
parsimonious solution. Character 14, which is shared between the lamprey and the
salmon, conflicts with character 1 and is thus homoplastic.
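The three kinds of character interaction reduce to simple set comparisons between the groups that two characters specify. The sketch below uses the taxon sets listed in the caption to Fig. 1.6; the function name and the rule that any nested or disjoint pair counts as "consistent" are illustrative assumptions rather than definitions taken from the text.

```python
from typing import Dict, FrozenSet

# Taxa bearing the apomorphic state of each character, from the Fig. 1.6 caption.
GROUPS: Dict[int, FrozenSet[str]] = {
    1: frozenset("ABC"),
    2: frozenset("DE"),
    3: frozenset("BC"),
    4: frozenset("ABC"),
    5: frozenset("CD"),
}


def relation(reference: FrozenSet[str], other: FrozenSet[str]) -> str:
    """Classify `other` relative to `reference` (hypothetical helper)."""
    if other == reference:
        return "congruent"    # specifies exactly the same group
    nested = other < reference or reference < other
    disjoint = not (other & reference)
    if nested or disjoint:
        return "consistent"   # groups can coexist in one hierarchy
    return "conflicting"      # partial overlap: implies homoplasy


if __name__ == "__main__":
    for number in (2, 3, 4, 5):
        print(number, relation(GROUPS[1], GROUPS[number]))
```

With character 1 as the reference, this labels characters 2 and 3 as consistent, character 4 as congruent and character 5 as conflicting, matching the caption.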
1.4 GROUPS
As a result of the relative definition of relationship, Hennig identified three
types of groups, which he recognized on the basis of ancestry and descent.
Using Fig. 1.8 as a reference the following groups may be recognized.
1. A monophyletic group contains the most recent common ancestor plus all
and only all its descendants. In this figure, such groups would be ancestor
'x' and salmon + lizard; or ancestor 'y' and shark + salmon + lizard; or
[Figure key: monophyletic; paraphyletic; polyphyletic.]
Fig. 1.8 The three types of groups recognized by Hennig on the basis of ancestry
and descent.
been absent in the most recent common ancestor of the group. The group
lamprey + salmon, which might be recognized on the shared ability to
breed in freshwater, would be considered a polyphyletic group. Breeding in
freshwater in vertebrates might be considered to be an apomorphic
character but this is inferred to have arisen on more than one occasion. The
character by which we might recognize it is non-homologous: it is a false
guide to relationship. No Linnaean taxon has ever been recognized for this
group.
[Figure: panels (a)-(c); ABC characters = feathers, BC characters = pygostyle.]
Fig. 1.9 Ancestors cannot be distinguished as individual taxa because they are
wholly primitive with respect to their descendants and thus have no features by
which they can be unequivocally recognized. (a) Three taxa, A, B and C, form a
group because they share ABC characters. Taxa B and C are sister-groups because
they share BC characters. If A is considered to be the ancestor of B and C, it can be
placed at the origin of B and C only if it lacks any distinguishing characters of its
own. Otherwise, it would be placed as the sister-group of B + C. In other words,
ancestor A can only be recognized because it possesses ABC characters but lacks
BC characters. (b) Archaeopteryx, equivalent to taxon A in (a), is the traditional
ancestor of the birds. It has the synapomorphies (e.g. feathers) that are found in all
birds, including the ostrich (taxon B) and the raven (taxon C), but lacks the
synapomorphies of the ostrich + raven such as a pygostyle. In terms of character
distribution, Archaeopteryx simply does not exist. (c) To circumvent this problem,
cladists place ancestors as the sister-group to their putative descendants and accept
that they must be nominal paraphyletic taxa.
a pygostyle. But, of course, there are many other animals that lack the
pygostyle and therefore this cannot be a distinguishing character of
Archaeopteryx. In fact, to date, Archaeopteryx has no recognized
autapomorphies. Indeed, if there were, Archaeopteryx would have to be placed as
the sister-group to the rest of the birds. In terms of unique characters,
Archaeopteryx simply does not exist. This is absurd, for its remains have been
excavated and studied. To circumvent this logical dilemma, cladists place
likely ancestors on a cladogram as the sister-group to their putative
descendants and accept that they must be nominal paraphyletic taxa (Fig. 1.9c).
Ancestors, just like paraphyletic taxa in general, can only be recognized by a
particular combination of characters that they have and characters that they
do not have. The unique attribute of possible ancestors is the time at which
they lived. After a cladistic analysis has been completed the cladogram may
be reinterpreted as a tree (see below) and at this stage some palaeontologists
may choose to recognize these paraphyletic taxa as ancestors, particularly
when they do not overlap in time with their putative descendants (see Smith
1994a for a discussion). The logical impossibility of placing real taxa as
ancestors in cladistic analysis has the further consequence that the ancestors
'x', 'y' and 'z', which have been placed on several of the figures, should be
considered as hypothetical ancestors representing collections of characters.
Up to this point, groups have been described in Hennig's terms of common
ancestry. But groups are not discovered in this way. In practice, they are
discovered through analysis of character distributions. So we must return to
characters to look again at the definition of groups. We can separate
characters on their ability to describe groups. Those characters that allow us
to specify monophyletic groups are synapomorphies. Monophyletic groups are
discovered by finding synapomorphies. A very important conceptual leap
came when homology was equated with synapomorphy (Patterson 1982). This
has important consequences, for it means that homologies are hypotheses;
hypotheses to be proposed, tested and, perhaps, falsified.
We can illustrate this by returning to Fig. 1.4. Let us assume that we have
arrived at the hypothesis of relationships shown in Fig. 1.4a. This hypothesis
recognizes that the salmon + shark is a monophyletic group, discovered by
suggesting that the shared possession of character 13 (fin rays) is a
synapomorphy or an homology. However, this hypothesis can be shown to be false
because other synapomorphies suggest that the salmon and the lizard form a
monophyletic group recognized by the shared possession of characters 3 and
4. The original hypothesis of homology has been tested and shown to be false.
It may still be an homology but at a higher hierarchical level, as shown in Fig.
1.4b, where it specifies the larger group of shark + salmon + lizard. However,
as an homology of the group shark + salmon, it is false.
So, an hypothesis of homology is tested by congruence with other
characters. It should be obvious from this that homologies are continually being
tested (the three tests of homology are discussed in Chapter 2). Discovery of
Fig. 1.10 The three types of groups recognized by Hennig defined in terms of
character distributions. Monophyletic groups are discovered through homologies
(synapomorphies); paraphyletic groups are those based upon symplesiomorphies;
polyphyletic groups are founded upon homoplastic characters.
Throughout this chapter we have been slowly moving away from Hennig's
evolutionary explanations for concepts of relationship, characters and groups.
To conclude this chapter, we must make the important distinction between
cladograms and trees.
The relationships between lamprey, shark, salmon and lizard are drawn in
Fig. 1.3 as a branching diagram, a cladogram. A cladogram has no implied
time axis. It is simply a diagram that summarizes a pattern of character
[Figure: (a) lamprey (D), shark (A), salmon (B) and lizard (C) shown as a Venn
diagram and in parenthetic notation, (D (A (B, C))); (b) trees with ancestors.]
Fig. 1.11 Cladograms and trees. (a) The cladogram from Fig. 1.3 depicting the
relationships of the lamprey, shark, salmon and lizard redrawn as a Venn diagram
and in parenthetic notation. (b) Five of the 12 possible trees that can be derived
from the cladogram in (a).
in which each of the terminal taxa is fixed at the same distance from the root
by assuming a constant molecular clock.
2.1.1 Filters
The most obvious filter in cladistic analysis is that which rejects attributes
that are continuous and quantitative and favours instead characters that are
discrete and qualitative. The problem with all characters is determining those
that are cladistically useful and those that are not. In general, continuous and
quantitative characters are considered not to be cladistic but to vary
phenetically. There are many reasons why we favour discrete characters and consider
continuous characters unsuitable for cladistic analysis. Quantitative
characters are difficult to describe fully, requiring means, medians and variances to
establish the gaps. Using only a portion of the described character (e.g. the
mean or the median) raises the question of what to do with the rest, but
matrices generally require that each value in the matrix be represented by
a single discrete alphanumeric value, although polymorphic variables can be
analysed in certain computer programs (e.g. PAUP and MacClade).
2.2 KINDS OF CHARACTERS
1 Length measurement.
2 Ratio.
3 Cranston and Humphries' (1988) recoding of Saether (1976).
4 Thiele and Ladiges (1988).
5 Chappill (1989).
6 Kluge (1989).
7 Begle (1991).
8 Kraus (1988).
9 Loconte and Stevenson (1991).
10 Gaffney et al. (1991).
form and another, then the variation would require some quantitative
expression to do it justice. If two taxa each have a range of leaf shapes along this
continuum and the ranges overlap, the character would more often than not
be labelled quantitative and rejected from cladistic analysis. The terms
quantitative and qualitative are often used in this sense as synonyms for
overlapping and non-overlapping ranges in variables.
[Figure: Taxon 1 and Taxon 2 compared for continuous, meristic and binary data.]
Fig. 2.1 Overlapping and non-overlapping patterns of variation in continuous,
meristic and binary data. (After Thiele 1993.)
integers, directly scored into the matrix or rescaled) and molecular data (e.g.
nucleotide sequences, ACGT / U).
While qualitative, quantitative, discrete and continuous are useful terms, the
degree of overlap among them is the crucial property (Fig. 2.1). Although it is
implied that overlap can occur only between continuous characters, both
continuous and discrete characters can exhibit different degrees of overlap. It
is the degree of overlap that makes the distinction in filtering between
overlapping and non-overlapping characters. The filtering proscription can be
set to particular values for any given character. For example, it can be set to
select only those characters that show no overlap, rejecting all others. The
problem remains, however, because, in reality, we have a sliding scale from
widely overlapping characters to widely disjunct characters that have discrete
gaps between them. The required filter in these situations is a cut-off point
where the critical value might be scored for no overlap or any other
arbitrarily selected value. The problem is that this makes the filtering of
characters highly susceptible to sampling error and the cut-off points between
characters quite arbitrary. In reality, we should be able to grade characters
along a sliding scale and develop methods of coping with different degrees of
overlap. The sliding scale could recognize non-overlapping data as better than
overlapping data and that the latter should be used only when the former are
unavailable. This approach matches the continuum of degree of overlap with
a continuum from better to worse (Chappill 1989), rather than forcing it into
the general division of good and bad characters (Pimentel and Riggins 1987;
Thiele 1993).
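To make the idea of an overlap filter with an adjustable cut-off concrete, here is a minimal sketch. The overlap measure (shared range divided by total range), the function names and the sample measurements are all hypothetical choices made for illustration; the text does not prescribe a particular statistic.

```python
from typing import Sequence


def range_overlap(taxon_1: Sequence[float], taxon_2: Sequence[float]) -> float:
    """Fraction of the combined range that is shared by both samples (0 to 1)."""
    low_1, high_1 = min(taxon_1), max(taxon_1)
    low_2, high_2 = min(taxon_2), max(taxon_2)
    shared = max(0.0, min(high_1, high_2) - max(low_1, low_2))
    span = max(high_1, high_2) - min(low_1, low_2)
    if span == 0:
        return 1.0  # every value is identical, so the ranges coincide
    return shared / span


def passes_filter(taxon_1: Sequence[float], taxon_2: Sequence[float],
                  cutoff: float = 0.0) -> bool:
    """Accept the character only if overlap does not exceed the chosen cut-off."""
    return range_overlap(taxon_1, taxon_2) <= cutoff


if __name__ == "__main__":
    leaves_1 = [1.0, 2.0, 3.0, 4.0]  # made-up measurements
    leaves_2 = [3.0, 5.0, 6.0]
    print(range_overlap(leaves_1, leaves_2))
    print(passes_filter(leaves_1, leaves_2))               # strict: no overlap allowed
    print(passes_filter(leaves_1, leaves_2, cutoff=0.25))  # more permissive cut-off
```

Because the measure depends on the sample minimum and maximum, adding or removing a single extreme specimen can flip the outcome, which illustrates the sensitivity to sampling error noted above.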
d. Characters as homologues
A character is 'a theory that two attributes which appear different in some way are
nevertheless the same (or homologous)' (Platnick 1979).
'If ... characters are hypotheses of homology and synapomorphy, then they must be
relational, and the units of these relations are three-taxon statements' (Nelson and
Platnick 1991).
'Cladistics is a discovery procedure, and its discoveries are characters (homologies)
and taxa' (Nelson and Patterson 1993).
2.3.2 Character transformations
Nevertheless, for characters or character states to be cladistic, and hence be
features of taxa, they must be scorable in a data matrix and contain some
pattern for hypotheses of relationships of taxa to be discovered. For
evolutionary biologists, characters transform from one condition into another. For
example, Wiley's (1981) definition recognized that features of organisms are
the products of evolution and hence have arisen as changes in ontogeny and
transformation through time. However, there is a problem because this
definition is one of transformation of one character or character state into
another within organisms, rather than of an homology of any particular
group. Thus Wiley used his definition to describe characters of Chordata and
Vertebrata, which are clearly taxa consisting of many individual organisms,
rather than transformations of features within organisms.
Pimentel and Riggins (1987) were stricter but less rigorous in their
definition when they stated that a character can only be a feature of an organism
when it can be recognized as a distinct variable. Their definition is also
problematic because it, too, is tied to features of organisms rather than taxa
and, like Wiley, they go on to discuss coding variables for taxa rather than for
organisms. Farris (quoted in Miner 1980) showed that determining characters
was an inductive process when he stated that 'morphologists do not sample
characters, they synthesize them'. The Pimentel and Riggins definition is
based on that of Farris et al. (1970), who made it clear that, in order to be
able to determine characters for phylogenetic reconstruction, it was necessary
to recognize that they were mutually exclusive states that could be considered
transformations with a fixed order of evolution. Farris et al. thus redefined
Hennig's Darwinian interpretation, that characters transform from one state
to another, as a series of axioms. All of the definitions in Table 2.2a-b
confuse the relationship between organisms and taxa and the problem
remains as to what diagnoses taxa when definitions refer to attributes of
organisms.
2.3.4 Homology
The desire for cladistic characters to express large, clear-cut differences
between taxa does not go far enough in determining which character states
become grouping homologies. To an evolutionist, homology is defined as the
same structure inherited from a common ancestor. Thus to Hennig,
hypotheses about characters (synapomorphy) and hypotheses about groups
(monophyly) both appealed to ancestors for their justification. This concept was
shown to lead to circular reasoning because both hypotheses for characters
and hypotheses for groups appeal to the same mysterious non-empirical
ancestors. The solution came with the so-called 'transformation of cladistics',
which allowed hypotheses about character states (homology) to give
hypotheses about groups (hierarchy). The method is empirical in that there are no
appeals to ancestry for the determination of monophyletic groups (Platnick
1979) and any interpretations about ancestry are derived from the cladogram.
For cladistic analysis to be successful, we consider that it is necessary not
only to have principles that do not assume transformation, but also to
describe characters as hypotheses of homology that can be tested (Table
2.2d). Homology is the core concept of comparative biology and systematics.
When comparing and contrasting the morphology and anatomy of organisms,
we break down our observations into traits or character states as recognizable
features of the whole organism. Characters and character states convey no
phylogenetic information until we recognize their existence in other
organisms through naming them (Patterson 1982). It is the act of naming
characters and character states that establishes theories of homology.
An hypothesis of homology recognizes that a character in one taxon
represents the 'same' feature as a similar, but often not identical, character in
another taxon. Structures that are identical in form, position and
development in two or more organisms pose no problem, because different
systematists can agree that they represent the same entities, i.e. they have a clear
one-to-one correspondence. However, problems arise when structures have
diverged in form so as to be only vaguely similar or when different
developmental pathways arrive at similar structures. Proposing hypotheses of
homology becomes critical when we come to define 'sameness' among structures.
Similarity of form is not a criterion of homology but a first-order
hypothesis. For an homology to be accepted, the feature
in question must also occur in the same topographical position within the
organisms being compared and also agree with other characters about
relationships of taxa (character congruence), a test that can be applied only after,
or during, cladistic analyses (Table 2.3).
The congruence test equates homology with synapomorphy. Characters
that fit to a cladogram with the same length pass the test, whereas those
requiring more steps are deemed homoplastic. Thus, determination of
homologues becomes an empirical procedure and the final arbiters of homology
are the characters and character states themselves (Patterson 1982).
Consequently, the more characters that are included in the analysis, the more
demanding the test of homology becomes. This aspect becomes important in
later considerations of simultaneous analysis or so-called total evidence (see
Chapter 8).
Fig. 2.2 Different expressions of conflicting characters in five taxa (V-Z): absent,
round and black, round and white, square and black, square and white. (After Pleijel
1995.)
Table 2.5 shows how these characters and character states might be
distributed among five different taxa (V-Z). Coding method A assumes
interdependence between the main features and codes everything into a
single multistate character. Coding method B treats colour and shape as two
quite separate characters but includes an extra state (0) to account for
absence of each character. Coding method C is similar to B but treats
Table 2.4 Four coding methods for the features shown in Fig. 2.2. Characters are
labelled with integers in bold and character state codes as integers in parentheses
(a) Method A: formula coding as one multistate character with linked states.

Taxon   Method A   Method B   Method C   Method D
V       0          00         0??        00000
W       1          11         100        11010
X       2          12         101        11001
Y       3          21         110        10110
Z       4          22         111        10101
presence and absence of any character state as a separate character and thus
has three columns. Coding method D assumes that all five character 'states'
are independent characters. Following Pleijel (1995), these coding methods
are discussed under the four different headings of character linkage
(dependency between characters in a single matrix), hierarchical dependency, missing
values and information content.
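The four coding methods can be expressed as small functions that turn an observed feature into a row of codes. This is a sketch for illustration only: the function names are hypothetical, and the assumption that shape is coded before colour (and round/black before square/white) is inferred from the code strings for taxa V-Z rather than stated in the text.

```python
from typing import Dict, Optional, Tuple

Feature = Optional[Tuple[str, str]]  # (shape, colour), or None when absent

# Observations for the five taxa of Fig. 2.2.
FEATURES: Dict[str, Feature] = {
    "V": None,
    "W": ("round", "black"),
    "X": ("round", "white"),
    "Y": ("square", "black"),
    "Z": ("square", "white"),
}

SHAPES = ["round", "square"]   # assumed state order
COLOURS = ["black", "white"]   # assumed state order


def method_a(feature: Feature) -> str:
    """One multistate character with five linked states (0-4)."""
    if feature is None:
        return "0"
    shape, colour = feature
    return str(1 + 2 * SHAPES.index(shape) + COLOURS.index(colour))


def method_b(feature: Feature) -> str:
    """Shape and colour as two characters, each with state 0 for absence."""
    if feature is None:
        return "00"
    shape, colour = feature
    return f"{SHAPES.index(shape) + 1}{COLOURS.index(colour) + 1}"


def method_c(feature: Feature) -> str:
    """Presence/absence, then shape and colour ('?' when inapplicable)."""
    if feature is None:
        return "0??"
    shape, colour = feature
    return f"1{SHAPES.index(shape)}{COLOURS.index(colour)}"


def method_d(feature: Feature) -> str:
    """Five independent presence/absence characters."""
    if feature is None:
        return "00000"
    shape, colour = feature
    bits = [1] + [int(shape == s) for s in SHAPES] + [int(colour == c) for c in COLOURS]
    return "".join(str(bit) for bit in bits)


if __name__ == "__main__":
    for taxon, feature in FEATURES.items():
        print(taxon, method_a(feature), method_b(feature),
              method_c(feature), method_d(feature))
```

Writing the methods side by side makes the trade-off visible: method A uses one column but links shape and colour, while method D uses five columns that are not logically independent of one another.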
(a) 0→1→2   0→2→1   1→0→2   1→2→0   2→1→0   2→0→1   0←1→2   0←2→1   1←0→2
Fig. 2.3 (a) The nine possible transformations for a multistate character with three
states; 0, 1, 2. (b) The three allowable transformations between three states following
imposition of the order shown in (c).
Table 2.6 Sankoff cost matrix (see text for information). Character codes
follow Table 2.4: absent (0); round and black (1); round and white (2);
square and black (3); square and white (4). Character costs are shown
as 0, 1 or 2 in the matrix

     0  1  2  3  4
0    0  2  2  2  2
1    1  0  1  1  2
2    1  1  0  2  1
3    1  1  2  0  1
4    1  2  1  1  0
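A cost matrix of this kind is used by dynamic programming over the cladogram (Sankoff optimization). The sketch below is a generic implementation under stated assumptions, not the authors' procedure: rows are read as the "from" state and columns as the "to" state, the last two rows of the matrix are reconstructed from the pattern of the first three, and the example tree for taxa V-Z is hypothetical.

```python
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple["Tree", ...]]
INF = float("inf")

# Cost of changing from the row state to the column state (assumed orientation).
# States: 0 absent, 1 round/black, 2 round/white, 3 square/black, 4 square/white.
COST: List[List[int]] = [
    [0, 2, 2, 2, 2],
    [1, 0, 1, 1, 2],
    [1, 1, 0, 2, 1],
    [1, 1, 2, 0, 1],  # reconstructed row (assumption)
    [1, 2, 1, 1, 0],  # reconstructed row (assumption)
]


def sankoff(tree: Tree, states: Dict[str, int], cost: List[List[int]]) -> List[float]:
    """Minimum cost of the subtree for each possible state at its root."""
    k = len(cost)
    if isinstance(tree, str):
        # A terminal taxon can only have its observed state.
        return [0 if s == states[tree] else INF for s in range(k)]
    totals = [0.0] * k
    for child in tree:
        child_costs = sankoff(child, states, cost)
        for s in range(k):
            # Cheapest way to reach the child subtree from state s at this node.
            totals[s] += min(cost[s][t] + child_costs[t] for t in range(k))
    return totals


def minimum_cost(tree: Tree, states: Dict[str, int], cost: List[List[int]]) -> float:
    """Cost of the cheapest assignment of states to internal nodes."""
    return min(sankoff(tree, states, cost))


if __name__ == "__main__":
    observed = {"V": 0, "W": 1, "X": 2, "Y": 3, "Z": 4}
    example_tree: Tree = ("V", (("W", "X"), ("Y", "Z")))  # hypothetical topology
    print(sankoff(example_tree, observed, COST))
    print(minimum_cost(example_tree, observed, COST))
```

Because the matrix is asymmetric (gaining the feature costs 2, losing it costs 1), the cheapest reconstruction can place the feature at the root and treat its absence in V as a loss, which a symmetric unordered coding would not distinguish.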
All character states (as used in cladistic analysis) are frequency distributions
of attribute values over a sample of individuals of a taxon (Thiele 1993).
Consequently, there are many situations in which continuously variable
morphometric data have to be considered as cladistic characters, even when
the taxa have overlapping frequency distributions. It is quite possible that
continuous values, however opaque in the raw form, contain grouping
homologies when identified through discrete coding and cladistic analysis.
Continuous variables should only be excluded if the cladistic analysis cannot
handle such data or if it can be shown empirically that those characters
convey no information or phylogenetic signal relative to other characters in
the data matrix.
2.5.2 Gap-weighting
The following method from Thiele (1993) is one of several methods that
provides more than simple gap-coding by adding a weight code.
Gap-weighting uses additive ... to each code so that the score in
[Figure: panels (a)-(d) for taxa A-H.]
Fig. 2.4 Example of coding using the gap-weighting method of Thiele (1993). (a)
Frequency distribution curves for eight taxa, A-H. (b) Means for the taxa on the
attribute scale. (c) Values scaled to a range of 10 (0-9). (d) Integer coding for
analysis using Hennig86.
the column of the data matrix not only relates to the position of each state
relative to every other state over the range, but also maintains the relative
sizes of the gaps between them. A suitable rescaling function is also used to
allow the full range of integers that can be handled by a given cladistic
computer programme to be used and thus ensure that as much of the raw
attribute data as possible is utilized in the codes.
\[ x_s = \frac{x - \min}{\max - \min} \times n \]
where x is the raw datum, x_s is the standardized datum and n is the
maximum number of ordered states allowed by the cladistic computer
program (Fig. 2.4b).
3. Code the values as the rounded integer of the standardized values (Fig.
2.4c, d).
4. Treat the character as an ordered multistate for analysis (Fig. 2.4d).
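The range-standardization and rounding steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `gap_weight_codes`, the taxon means and the choice of n = 9 (so that codes fall in the range 0-9) are hypothetical and are not taken from Thiele (1993) or from Fig. 2.4.

```python
import math


def gap_weight_codes(means, n):
    """Range-standardize taxon means to 0..n and round them to integer codes."""
    lo, hi = min(means), max(means)
    if hi == lo:
        raise ValueError("character is invariant; there is nothing to code")
    # x_s = (x - min) / (max - min) * n
    scaled = [(x - lo) / (hi - lo) * n for x in means]
    # Round halves upwards (Python's built-in round() rounds halves to even).
    return [math.floor(s + 0.5) for s in scaled]


# Hypothetical taxon means on the raw attribute scale.
means = [10.0, 20.0, 30.0, 100.0]
codes = gap_weight_codes(means, n=9)
print(codes)
```

The resulting column would then be treated as an ordered multistate character, so that the large gap between the last two taxa costs more steps than the small gaps between the first three.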
invariably produce cladograms with lower levels of fit than qualitative
characters. Cladists wishing to use continuous characters have employed various
procedures in order to include them. There is often significant covariation
with other characters, even when continuous characters are recoded as
ordered multistates, which suggests that they do tend to operate as linear
series. In many cases, morphometric and qualitative characters are found to
map similar phylogenies and be informative about those phylogenies. It is
likely that morphometric data will continue to be used most often in studies
of closely related taxa, while presence/absence characters will be used in
studies of higher ranking taxa. The judgement that all morphometric data is
garbage is unnecessarily harsh and it is still open to debate what constitutes
reliable evidence in cladistic analysis.
2.6.2 Coding
There are two schools of thought on coding methods: those that advocate
absence/presence coding and those that consider additivity and multistate
coding as appropriate for diagnosis of taxic relations. Absence/presence
coding is invariably binary and contrasts presence against absence. There are
very few cases of true absence/presence coding in the literature, most likely
due to linkage problems (Pleijel 1995). The most commonly used forms of
coding are methods A, B and C (Table 2.4). Pimentel and Riggins (1987)
considered that all cladistic characters should be treated as multistates and
ideally coded as multiple column additive binary characters in order to
distinguish a priori between linear and branched character state trees. They
considered that character states cannot be treated as simple, nominal
variables because redundancy is introduced and information content is sacrificed.
In other words, additivity is a form of information and there are many
reasons to be sympathetic to this viewpoint, especially in special cases, such
as nucleotide sequences. Here, the nucleotide codes (A, C, G, T) are invariably
considered as alternatives in multistate columns. Application of
absence/presence coding has yet to be considered in molecular systematics and there
is no body of opinion that considers base substitution as anything other than
a special form of character state transformation.
The issue of whether one uses multistate or binary coding revolves around
the issue of transformation between character states (Wilkinson 1995).
Devotees of multistate coding accept that characters should be treated as
transformation series and that hypotheses of adjacency between similar, but different,
character states should be coded a priori in the character state matrix.
Transformation series analysis (TSA) is perhaps the most elaborate
manifestation of this method (Mickevich 1982). In contrast, absence/presence coding
is a simpler and more straightforward approach than any of the alternatives.
Every variable is kept separate as a potential synapomorphy to be tested
against other coded observations. Although redundancy of additional absence
scores may be a problem, there are advantages in not building unwarranted
assumptions into the data. The advantage is that, instead of making decisions
a priori, character hierarchies emerge from the results (Pleijel 1995).
Three-item statements analysis (Chapter 7) takes the argument further and, by
converting characters into minimal expressions of the kind A(BC), aims to
move away from ideas of transformation in cladistic analysis.
With small data sets that are reasonably free from homoplasy, Hennigian
argumentation is quick and simple to implement. However, most data sets
have large numbers of taxa and characters, as well as greater degrees of
homoplasy, which makes finding the most parsimonious cladograms by
Hennigian argumentation extremely time-consuming. Thus, computerized
methods have been developed that speed up the search for most
parsimonious or minimum-length cladograms.
[Figure; caption partly illegible: "... cladogram for the data ... See text for explanation."]
40 Cladogram construction, character polarity and rooting
Fig. 3.2 Illustration of the exhaustive search strategy for determination of most
parsimonious cladograms. See text for explanation.
is outlined in Fig. 3.2. First, three taxa are chosen and connected to form the
only possible unrooted, fully resolved cladogram for these taxa (Fig. 3.2, top
cladogram). A fourth taxon is then selected (which taxon is chosen is
immaterial) and joined to each of the three branches of cladogram 1, yielding
three possible networks for four taxa (Fig. 3.2, second row). A fifth taxon is
then selected and added to each of the five branches of the three cladograms,
giving 15 cladograms (Fig. 3.2, rows 3-5). This procedure, in which the nth
taxon is added to every branch of every cladogram (each of which contains
n - 1 taxa) generated in the previous step, is continued until all possible
cladograms for n taxa have been constructed. Finally, the lengths of all these
cladograms are calculated and the shortest chosen as optimal (most
parsimonious).
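The construction step of an exhaustive search can be sketched as follows. This is a minimal illustration, not the code of any cladistic package: the function names are hypothetical, and an unrooted cladogram is assumed to be written as a nested pair rooted on the branch leading to the first taxon, which does not change the set of topologies.

```python
def insertions(tree, taxon):
    """Yield every tree obtained by attaching `taxon` to one branch of `tree`.

    `tree` is a leaf label or a pair (left, right). The branch above `tree`
    itself also counts as an attachment point.
    """
    yield (tree, taxon)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def all_unrooted_trees(taxa):
    """Enumerate all fully resolved, unrooted trees for three or more taxa."""
    first, second, third, *rest = taxa
    trees = [(second, third)]  # the single network for the first three taxa
    for taxon in rest:
        # Add the next taxon to every branch of every tree built so far.
        trees = [new for tree in trees for new in insertions(tree, taxon)]
    # (first, subtree) stands for the unrooted tree anchored on the first taxon.
    return [(first, tree) for tree in trees]


counts = [len(all_unrooted_trees("ABCDEF"[:n])) for n in (3, 4, 5, 6)]
print(counts)
```

A complete exhaustive search would then calculate the length of every tree in the final list and keep the shortest.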
Unfortunately, searching for most parsimonious cladograms is what
mathematicians term a 'hard' problem, that is, one that requires an exponentially
rising number of steps to solve as the size of the problem grows. That this is
true of the search for most parsimonious cladograms can be appreciated by
inspecting the number of fully resolved, unrooted networks that must be
evaluated as the number of taxa increases (Table 3.2). For a cladogram that
currently includes n - 1 taxa, there are 2n - 5 possible positions to which the
nth taxon can be attached. So, while there are 105 fully resolved, unrooted
cladograms for six taxa, there are over 2 x 10^20 topologies for only 20 taxa,
while the number exceeds 10^100 for as few as 63 taxa. Thus, it is doubtful
whether exhaustive search is a practical option for problems with more than a
moderate number of taxa.
Table 3.2 (number of taxa; number of fully resolved, unrooted cladograms)
 3    1
 4    3
 5    15
 6    105
 7    945
 8    10395
 9    135135
10    2027025
11    34459425
12    654729075
13    13749310575
14    316234143225
15    7905853580625
16    213458046676875
17    6190283353629375
18    191898783962510625
19    6332659870762850625
20    221643095476699771875
62    6.66409461 x 10^98
63    >10^100
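Because the nth taxon can be attached in 2n - 5 places, the number of fully resolved, unrooted cladograms for n taxa is the product 1 x 3 x 5 x ... x (2n - 5). A short Python sketch of this count (the function name is hypothetical):

```python
def n_unrooted_cladograms(n_taxa):
    """Number of fully resolved, unrooted cladograms for n_taxa >= 3 taxa."""
    if n_taxa < 3:
        raise ValueError("at least three taxa are needed")
    count = 1  # three taxa can be connected in only one way
    for n in range(4, n_taxa + 1):
        count *= 2 * n - 5  # attachment points available to the nth taxon
    return count


for n in (4, 5, 6, 10, 20):
    print(n, n_unrooted_cladograms(n))
```

Python integers are unbounded, so the same function can be used to confirm that the count first exceeds 10^100 at 63 taxa.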
Fortunately, there is an exact method available that does not require every
completed topology to be examined individually: the branch-and-bound
method. The branch-and-bound procedure closely resembles that of
exhaustive search, but begins with the calculation of a cladogram using one of the
heuristic methods described later in this chapter. The length of this
cladogram is retained as a reference length or upper bound for use during
subsequent cladogram construction. The branch-and-bound method then
proceeds in a similar manner to exhaustive search but now, as the path is
followed, the lengths of the partial networks are calculated at each step and
compared with that of the upper bound. As soon as the length of a partial
network exceeds that of the upper bound, that path of cladogram
construction is abandoned, because the attachment of additional taxa can serve only
to increase the length further. By this means, the number of completed
cladograms that must be evaluated is greatly reduced.
Once, along a particular path, all taxa have been added, then the
length of the resultant cladogram is once more compared with the upper
bound. If its length is equal to the upper bound, then this cladogram is
retained as one of the set of optimal topologies and the branch-and-bound
process continued. However, if the length is less than the upper bound, then
this topology is an improvement and its length is substituted as the new upper
bound. This substitution procedure is important because it enables
subsequent paths to be abandoned more quickly. Once all possible paths have been
examined, then the set of optimal cladograms will have been found.
It is impossible a priori to estimate the exact time required to undertake a
branch-and-bound analysis, as this is a complex function of computer
processor speed, algorithmic efficiency and the structure of any homoplasy in the
data. Most branch-and-bound applications employ algorithmic devices to
ensure the early abandonment of path searches and thus reduce computation
time. For example, efficient heuristic methods can minimize the initial
estimate of the upper bound. However, a branch-and-bound analysis is still
time consuming to implement and should not generally be considered for
data sets comprising large numbers of taxa.
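The bounding logic can be illustrated with a small Python sketch. Several things here are assumptions made for the example rather than details from the text: lengths are counted with Fitch (unordered) optimization, trees are nested pairs anchored on the first taxon, the data matrix is invented, and the initial upper bound defaults to infinity instead of coming from a heuristic search.

```python
import math


def insertions(tree, taxon):
    """Yield every tree obtained by attaching `taxon` to one branch of `tree`."""
    yield (tree, taxon)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def fitch_length(tree, matrix):
    """Length of a nested-pair tree, treating all characters as unordered."""
    def visit(node, c):
        if not isinstance(node, tuple):
            return {matrix[node][c]}, 0
        left_set, left_steps = visit(node[0], c)
        right_set, right_steps = visit(node[1], c)
        shared = left_set & right_set
        if shared:
            return shared, left_steps + right_steps
        return left_set | right_set, left_steps + right_steps + 1

    n_chars = len(next(iter(matrix.values())))
    return sum(visit(tree, c)[1] for c in range(n_chars))


def branch_and_bound(matrix, upper_bound=math.inf):
    """Return (minimum length, list of all trees of that length)."""
    taxa = list(matrix)
    anchor = taxa[0]
    best = {"length": upper_bound, "trees": []}

    def extend(tree, next_index):
        length = fitch_length((anchor, tree), matrix)
        if length > best["length"]:
            return  # adding further taxa can only increase the length
        if next_index == len(taxa):
            if length < best["length"]:  # improvement: lower the bound
                best["length"], best["trees"] = length, []
            best["trees"].append((anchor, tree))
            return
        for candidate in insertions(tree, taxa[next_index]):
            extend(candidate, next_index + 1)

    extend((taxa[1], taxa[2]), 3)
    return best["length"], best["trees"]


# Hypothetical matrix: character 1 groups D + E, character 2 groups C + D + E.
matrix = {"A": "00", "B": "00", "C": "01", "D": "11", "E": "11"}
best_length, best_trees = branch_and_bound(matrix)
print(best_length, best_trees)
```

Supplying the length of a heuristically found cladogram as `upper_bound` would allow paths to be abandoned from the outset, as described above.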
because the various slopes may be too widely separated to be reached. Such
isolated clusters are referred to as 'islands'. If such islands exist, then one way
to maximize the chance of reaching the true highest summit, the global
optimum, is to take several randomly chosen starting points and choose that
which leads to the highest summit, with or without leaping.
Although the hikers analogy may seem frivolous, its terminology can be
translated directly into that of searching for minimum-length cladograms.
The highest peak is the set of minimum-length cladograms, the global
optimum. The simplest computer algorithms merely make a single pass
through the data and construct a single topology. This is equivalent to
following the local gradient from where the hikers started. However, although
the judicious addition of taxa to the partial cladogram may improve the
outcome, the resultant cladogram is most likely to be only locally optimal
unless good fortune prevails. More complex routines begin with a single
topology, then seek to locate the global optimum by rearranging the
cladogram in various ways. These branch-swapping algorithms are equivalent to
jumping between hills. But branch-swapping routines are constrained to try
always to decrease the length of the cladogram with which they are currently
working. Thus, if the global optimum exists within an alternative set of
topologies that can only be reached from the current position by
branch-swapping longer cladograms than are currently to hand, then the global
optimum can never be reached from that starting point. This is the 'islands of
trees' problem. If multiple islands do exist (something that is usually unknown
prior to analysis), then we can endeavour to land on the island that includes
the most parsimonious solution by running several analyses, each of which
starts from a topologically distinct cladogram.
Stepwise addition
Stepwise addition is the process by which taxa are added to the developing
cladogram in the initial building phase of an analysis. Initially, a cladogram of
three taxa is chosen, then a fourth is added to one of its three branches. A
fifth taxon is then selected and added to the network, followed by a sixth and
so on, until all taxa have been included. There are various methods for
choosing the initial three taxa, the addition sequence of the remaining taxa,
and the branch of the incipient cladogram to which each will be added.
The least sophisticated addition sequence selects the first three taxa in the
data set to form the initial network and then adds the remaining taxa in the
order in which they appear in the data set. The increases in length that would
result from attaching a taxon to each branch of the partial cladogram are
calculated and the branch selected that would result in the smallest increase.
A variation on this procedure uses a pseudorandom number generator to
reorder the taxa in the data set prior to cladogram construction. A more
elaborate procedure was proposed by Farris (1970), which he termed the
'simple algorithm'. First, a reference taxon is chosen, usually the first taxon in
the data set. Then the difference between this taxon and each of the others is
calculated as the sum of the absolute differences between their characters.
Farris called this the 'advancement index'. The initial network is then
constructed from the reference taxon and the two other taxa that are closest
to it, i.e. those that have the lowest advancement indices. The remaining taxa
are then added to the developing cladogram in order of increasing
advancement index, with ties being broken arbitrarily.
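The ordering step of the 'simple algorithm' can be sketched in Python as follows. The function names and the small numeric matrix are hypothetical; ties are broken here by input order, which is one arbitrary choice among many.

```python
def advancement_indices(matrix, reference):
    """Sum of absolute character differences between each taxon and the reference."""
    ref_row = matrix[reference]
    return {
        taxon: sum(abs(a - b) for a, b in zip(row, ref_row))
        for taxon, row in matrix.items()
        if taxon != reference
    }


def simple_addition_sequence(matrix):
    """Reference taxon first, then the others by increasing advancement index."""
    reference = next(iter(matrix))  # usually the first taxon in the data set
    indices = advancement_indices(matrix, reference)
    # sorted() is stable, so tied taxa keep their input order.
    return [reference] + sorted(indices, key=indices.get)


# Hypothetical data set of four taxa scored for four ordered characters.
matrix = {
    "A": [0, 0, 0, 0],
    "B": [1, 1, 0, 0],
    "C": [1, 0, 0, 0],
    "D": [1, 1, 1, 2],
}
sequence = simple_addition_sequence(matrix)
print(sequence)
```

The first three taxa in the returned sequence form the initial network and the remainder are added in the order given.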
In all of these methods, the order of taxon addition is determined before
cladogram construction is begun. In contrast, Swofford (1993) has
implemented a dynamic procedure, which he calls 'closest', in which the addition
sequence is continually reassessed as the cladogram is built. First, the lengths
of the networks for all possible triplets of taxa are calculated and the shortest
chosen. Then at each subsequent step, the increase in length that would
follow from attaching each of the unselected taxa to each branch of the
developing cladogram is calculated and the taxon/branch combination that
gives the smallest increase in overall length is chosen. As in all methods, ties
are broken arbitrarily. This procedure requires much more computing time
than do the other addition sequences. In these, the number of increases in
cladogram length that must be calculated at any one step is equal only to the
number of possible attachment points (branches). The dynamic procedure
multiplies this by the number of unplaced taxa.
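A dynamic addition sequence of this kind can be sketched as below. This is an illustration under stated assumptions, not Swofford's implementation: lengths are counted with Fitch (unordered) optimization, trees are nested pairs anchored on the first taxon of the chosen triplet, ties go to the first option encountered, and the matrix is invented.

```python
from itertools import combinations


def insertions(tree, taxon):
    """Yield every tree obtained by attaching `taxon` to one branch of `tree`."""
    yield (tree, taxon)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def fitch_length(tree, matrix):
    """Length of a nested-pair tree, treating all characters as unordered."""
    def visit(node, c):
        if not isinstance(node, tuple):
            return {matrix[node][c]}, 0
        left_set, left_steps = visit(node[0], c)
        right_set, right_steps = visit(node[1], c)
        shared = left_set & right_set
        if shared:
            return shared, left_steps + right_steps
        return left_set | right_set, left_steps + right_steps + 1

    n_chars = len(next(iter(matrix.values())))
    return sum(visit(tree, c)[1] for c in range(n_chars))


def closest_addition(matrix):
    """Greedy construction that reassesses every unplaced taxon at each step."""
    taxa = list(matrix)
    # Start from the shortest three-taxon network.
    triplet = min(combinations(taxa, 3),
                  key=lambda t: fitch_length((t[0], (t[1], t[2])), matrix))
    anchor, tree = triplet[0], (triplet[1], triplet[2])
    unplaced = [t for t in taxa if t not in triplet]
    while unplaced:
        # One length calculation per (unplaced taxon, branch) combination.
        options = [(fitch_length((anchor, cand), matrix), taxon, cand)
                   for taxon in unplaced
                   for cand in insertions(tree, taxon)]
        _, taxon, tree = min(options, key=lambda option: option[0])
        unplaced.remove(taxon)
    return (anchor, tree), fitch_length((anchor, tree), matrix)


# Hypothetical matrix: character 1 groups D + E, character 2 groups C + D + E.
matrix = {"A": "00", "B": "00", "C": "01", "D": "11", "E": "11"}
tree, length = closest_addition(matrix)
print(tree, length)
```

The nested list comprehension makes the extra cost visible: the number of length calculations at each step is the number of branches multiplied by the number of unplaced taxa.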
No one addition sequence works best for all data sets. The less
sophisticated methods are quicker but their inefficiency results in cladograms that
may be far from optimal and subsequent branch-swapping may then take
longer than it might. Run times using dynamic stepwise addition may be
excessive for extremely large numbers of taxa but this is less problematical as
processor speeds increase. The random addition sequence is useful in that it
can provide a number of different starting points and thereby improve the
chances that at least one will lead to the global, rather than a local, optimum.
Random addition can also be employed as a non-rigorous means to
evaluate the effectiveness of heuristic procedures. If one runs 100 replicates
using random addition and the same set of most parsimonious cladograms is
found each time, then one can be reasonably certain that these do represent
the set of globally optimal topologies for that data. However, if by the
hundredth replication additional topologies or islands are still being
discovered, then it is likely that there are even more remaining to be found.
Therefore, it is recommended that analyses be repeated several times at least,
with the input order of the taxa randomized between runs.
As mentioned above, a major problem with stepwise addition is that one
cannot backtrack from a given position. Algorithms with this property are
termed 'greedy'. Essentially the problem is their inability to predict the
future, that is, which of several options at a given point will ultimately lead to
the best result. The placement of a taxon on a partial cladogram may be
optimal at that point, but may be seen subsequently to have been suboptimal
once further taxa have been added to the network. Once a taxon has been
added to a particular branch of a partial cladogram, the consequences of that
decision must be accepted. The problem is most acute when ties occur early
in stepwise addition. Ties may arise because, at a given stage, the addition of
two or more taxa may increase the length by the same minimum amount, or it
may be possible to add a single taxon equally to two or more branches, or
more than one equally parsimonious topology is found. An 'incorrect'
selection may then lead well away from the global optimum. But imagine how our
hypothetical hikers could improve their chances of reaching the highest
summit if they could climb more than one hill at a time. This is what heuristic
programs do when they retain more than one topology at a given step. These
may simply be the set of shortest partial cladograms at that stage, but a fixed
number may also be selected. Then, suboptimal topologies may also be
retained. This procedure reduces the effect of ties because, to some extent
(depending upon the number of topologies retained), each of the alternatives
is followed up.
Branch-swapping
In practice though, manipulation of addition sequence alone will generally
yield only a local optimum. However, it may be possible to improve on this by
performing a series of predefined rearrangements of the cladogram, in the
hope that a shorter topology will be found. These rearrangements, commonly
referred to as 'branch-swapping', are very hit-and-miss, but if a shorter
topology does exist and sufficient rearrangements are performed, then one of
these rearrangements is likely to find it.
Branch-swapping algorithms are implemented by all cladistic computer
packages. The simplest rearrangement is nearest-neighbour interchange
(NNI), sometimes referred to as local branch-swapping. Each internal branch
of a bifurcating cladogram subtends four 'nearest-neighbour' branches, two at
either end. In Fig. 3.3, these are A + B, C, D, and E + F. NNI then exchanges
a branch from one end of the internal branch with one from the other, e.g. C
with E + F. For any internal branch, there are just two such NNI
rearrangements. The procedure is then repeated for all possible internal branches and
the lengths of the resulting topologies calculated to determine whether they
are shorter.
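The two rearrangements per internal branch can be generated with a short Python sketch. The representation is an assumption made for the example: the unrooted cladogram is written as a nested pair anchored on taxon A, so that the cladogram with neighbours A + B, C, D and E + F becomes A joined to (B, (C, (D, (E, F)))). The function names are hypothetical.

```python
def _swaps(child, sibling):
    """The two alternative pairings of subtrees a, b (the child) and its sibling."""
    a, b = child
    return [((a, sibling), b), ((sibling, b), a)]


def nni_neighbours(tree):
    """Yield every tree one nearest-neighbour interchange away from `tree`."""
    if not isinstance(tree, tuple):
        return
    left, right = tree
    # Each internal child marks one internal branch with two rearrangements.
    if isinstance(left, tuple):
        for swapped in _swaps(left, right):
            yield swapped
    if isinstance(right, tuple):
        for swapped in _swaps(right, left):
            yield swapped
    # Repeat for the internal branches deeper in the tree.
    for new_left in nni_neighbours(left):
        yield (new_left, right)
    for new_right in nni_neighbours(right):
        yield (left, new_right)


# Six-taxon cladogram anchored on A; the anchor itself is left in place.
subtree = ("B", ("C", ("D", ("E", "F"))))
neighbours = [("A", t) for t in nni_neighbours(subtree)]
for neighbour in neighbours:
    print(neighbour)
```

With six taxa there are three internal branches and therefore six NNI neighbours; one of them is the result of exchanging C with E + F across the central branch.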
More extensive rearrangements can be performed and such methods are
often referred to as global branch-swapping. These involve clipping the
cladogram into two or more subcladograms and then reconnecting these in
various ways, with all possible recombinations being evaluated. 'Subtree
pruning and regrafting' (SPR) clips off a rooted subcladogram (Fig. 3.4) from
the main cladogram. This is then regrafted on to each branch of the remnant
cladogram in turn and the length of the resultant topologies calculated. All
possible combinations of pruning and regrafting are evaluated. In contrast, in
'tree bisection and reconnection' (TBR) (Fig. 3.5), the clipped subcladogram
Fig. 3.4 Example of branch-swapping by subtree pruning and regrafting. A rooted
subcladogram, A + B, is clipped from the main cladogram then reattached to
another branch (leading to taxon F) to give a new topology. (After Swofford and
Olsen 1990).
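Subtree pruning and regrafting can be sketched in Python as follows. This is a simplified illustration with hypothetical function names: cladograms are nested pairs understood to be anchored on a further taxon that is not shown, and only subcladograms that exclude this anchor taxon are clipped, so the sketch does not generate every rearrangement that a full implementation would.

```python
def insertions(tree, taxon):
    """Yield every tree obtained by attaching `taxon` to one branch of `tree`."""
    yield (tree, taxon)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, taxon):
            yield (new_left, right)
        for new_right in insertions(right, taxon):
            yield (left, new_right)


def prunings(tree):
    """Yield (clipped subtree, remnant tree) for every subtree below the top."""
    if not isinstance(tree, tuple):
        return
    left, right = tree
    yield left, right
    yield right, left
    for sub, remnant in prunings(left):
        yield sub, (remnant, right)
    for sub, remnant in prunings(right):
        yield sub, (left, remnant)


def canonical(tree):
    """Order the two children of every node so equal topologies compare equal."""
    if not isinstance(tree, tuple):
        return tree
    a, b = canonical(tree[0]), canonical(tree[1])
    return (a, b) if repr(a) <= repr(b) else (b, a)


def spr_neighbours(tree):
    """Distinct topologies reachable by one prune-and-regraft move."""
    found = set()
    for sub, remnant in prunings(tree):
        # Regraft the clipped subtree on to each branch of the remnant in turn.
        for candidate in insertions(remnant, sub):
            found.add(canonical(candidate))
    found.discard(canonical(tree))  # regrafting in the original place
    return found


neighbours = spr_neighbours(("B", ("C", "D")))
print(sorted(neighbours, key=repr))
```

In a search, the length of each neighbour would be calculated and any shorter topology would replace the current one.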
For example, reaching the most parsimonious cladogram may require passing
through a series of rearrangements each of which is the same length as the
preceding. If the current cladogram is replaced only if the new topology is
shorter, rather than simply being of the same length, then crossing these
'plateaux of optimality' will not be possible. The solution to this problem is to
retain all the most parsimonious solutions found during a given round of
rearrangements. Equally, if reaching the global optimum requires
rearrangement of cladograms that are longer than the best found so far, entrapment in
a local optimum will also occur. Again, the solution is to retain more than
one cladogram, but now to include suboptimal topologies as well, in the
expectation that one will lead to the global optimum. But even TBR will fail
to lead from an initial to the optimal cladogram if the differences between
the two, which may be in ... parts of the cladograms
Fig. 3.5 Example of branch-swapping by tree bisection and reconnection. The
cladogram is divided into two unrooted subcladograms. One subcladogram is
rerooted (between B and A + C) then reattached to the other subcladogram (on the
branch leading to taxon E) to give a new topology. (After Swofford and Olsen 1990).
Fig. 3.6 Example of the FIG/FOG method of outgroup comparison, using the data in
Table 3.3, in which an ingroup of six gnathostomes and one outgroup, the lamprey,
are scored for four unordered, multistate characters. (a) The shark is established as
the first functional outgroup because it shares state 0 of character 1 with the
lamprey. Consequently, this state is interpreted as plesiomorphic for character 1.
However, while the remaining five ingroup taxa have been shown to form a
monophyletic group, it remains uncertain whether this clade, the Osteichthyes, is
supported by state 1 or state 2 of character 1. (b) Next, using the shark as functional
outgroup to the Osteichthyes and ignoring the lamprey, state 1 of character 3 is
found to be apomorphic for tetrapods (frog, lizard and bird) and state 1 of character
4 to be apomorphic for bony fish (salmon and perch). (c) Finally, using the bony fish
as functional outgroup to the tetrapods, state 1 of character 2 is seen to unite the
two amniotes (lizard and bird) into a monophyletic group that excludes the frog. (d)
The relative apomorphy of states 1 and 2 of character 1 can now be resolved, with
state 1 supporting the monophyly of the Osteichthyes and state 2 the monophyly of
the Amniota, and the outstanding autapomorphic states placed onto the fully
resolved cladogram.
Fig. 3.9 Illustration of the 'first doublet rule' for a binary character. (a) If the state in
the first doublet agrees with that in the first outgroup taxon, then this state is
assigned decisively to the outgroup node. (b) If the state in the first doublet
disagrees with that in the first outgroup taxon, then the state assigned to the
outgroup node is equivocal.
In fact, identifying the sister group is not so important. It is true that this
taxon plays a major role, because its state will always be assigned to the
outgroup node, either decisively or equivocally, but more distal outgroup taxa
also exert an effect. In contrast, use of the sister group to polarize characters
is sometimes criticized if this taxon is considered to be too 'derived', that is, it
has too many autapomorphies, to make comparisons with the ingroup
meaningful. However, this is no justification for ignoring it and appealing to some
more distant and supposedly more 'primitive' taxon.
More important are the preconditions of both the FIG/FOG and
algorithmic methods that the relationships among the outgroup taxa are both
specified and fixed. Problems may arise when the interrelationships among
the outgroup taxa are partially or even wholly unresolved. Then, uncertainty
in outgroup relationships is translated into uncertainty in the state
assignment at the outgroup node. This is not the problem that it may first
appear, for we will argue later in this chapter (§3.2.5) that character polarity
is actually a property that is derived from a cladistic analysis, rather than an a
priori condition.
Fig. 3.10 Illustration of the 'alternating outgroup rule'. (a) If the states of the first and
last outgroup taxa agree, then this state is assigned decisively to the outgroup node.
(b) If the state in the first and last outgroup taxa disagree, then the state assigned to
the outgroup node is equivocal.
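The first doublet rule (Fig. 3.9) and the alternating outgroup rule (Fig. 3.10) can be combined in a short Python sketch. Two assumptions are made here that are not spelled out in the captions: a 'doublet' is taken to be a pair of successive outgroup taxa that share the same state, and the outgroup taxa are taken to be listed from the first (closest to the ingroup) to the last, with their relationships fully resolved in that order. The function name is hypothetical.

```python
def outgroup_node_state(outgroup_states):
    """Return (state, decisive) for a binary character at the outgroup node.

    `outgroup_states` lists the states of the outgroup taxa, starting with
    the first outgroup. An equivocal assignment is returned as (None, False).
    """
    if not outgroup_states:
        raise ValueError("at least one outgroup taxon is needed")
    first = outgroup_states[0]
    # First doublet rule: find the first pair of successive outgroups that agree.
    for near, far in zip(outgroup_states, outgroup_states[1:]):
        if near == far:
            return (first, True) if near == first else (None, False)
    # No doublet, so the states alternate: compare first and last outgroups.
    last = outgroup_states[-1]
    return (first, True) if first == last else (None, False)


print(outgroup_node_state([0, 0, 1]))  # doublet agrees with the first outgroup
print(outgroup_node_state([0, 1, 1]))  # doublet disagrees with the first outgroup
print(outgroup_node_state([0, 1, 0]))  # alternating, first and last agree
print(outgroup_node_state([0, 1]))     # alternating, first and last disagree
```

Taxa more distal than the first doublet are never inspected, which is why only the arrangement of the nearer outgroups affects the result.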
Fig. 3.11 The six fundamental ways by which an ontogenetic pathway can be
modified.
Although von Baer (1828) did not frame his rules in an evolutionary
context, he did make two great contributions towards forging a strong link
between orderly ontogeny and phylogenetic inference. The first was his
recognition that we would not expect the ontogenetic sequence of an
organism to pass through the stages found in the adults of its ancestors.
Rather, during ontogeny, two taxa would follow the same course of
development up to the point at which they diverged into separate lineages. Both
ontogenies would then be observed, in general, to have undergone one or
more independent terminal substitutions or additions, depending on the time
that had elapsed since differentiation and the amount of subsequent change
that had taken place.
But perhaps more important was his second rule, which stated that
ontogenetic change proceeds from the more to the less general. This observation
was generalized by Nelson (1978: 327) into the following definition of the
ontogenetic criterion for determining character polarity.
The application of the ontogenetic criterion can be illustrated using the
example of the vertebrate endoskeleton. The endoskeleton of the adult shark
is composed of cartilage, while that of the perch is largely made out of bone.
Given only these observations, no decision can be made as to whether
cartilage or bone is apomorphic. However, a study of the ontogeny of the two
taxa shows that while a cartilaginous endoskeleton is formed in the early
embryos of both taxa, only in the shark does this state persist into the adult
animal. In contrast, later in the ontogeny of the perch, the cartilage is largely
replaced by bone, which is the state found in the adult. In other words, a
character that is observed to be more general (cartilage) has transformed into
one that is observed to be less general (bone), from which it can be inferred
that a cartilaginous endoskeleton is plesiomorphic and a bony endoskeleton
apomorphic.
Alberch (1985) and Kluge (1985) disagreed with this interpretation of the
ontogenetic process, arguing that the valid ontogenetic characters were not
the observed states but the transformational processes between those states.
Thus, in the above example, there would be only a single ontogenetic
character, the transformation of a cartilaginous skeleton into a bony one. De
Queiroz (1985) objected to observed states in an ontogenetic sequence being
used as characters in cladogram construction, because these 'instantaneous
morphologies' were abstractions from 'real' ontogenetic transformations. In
his view, phylogeny is a sequence of never-ending life cycles and thus the
evidential basis for inferring phylogeny should be the ontogenetic
transformations themselves, rather than the features of the organisms that were being
transformed. As a result, he concluded that there could be no 'ontogenetic
method'.
Kluge (1988) considered de Queiroz's concept of treating transformations
as characters both to be incomplete and to offer no advantage over describing
the life cycle in terms of a model of growth and differentiation (Kluge and
Strauss 1985). Furthermore, taken to the extreme, de Queiroz's approach
would reduce the entire organism, if not the entire living world, to a single
character with an immense number of states (transformations) and there
would be no basis for comparative biology. In practice, de Queiroz adopted a
pragmatic approach, defining 'character' as a feature of an organism 'large
enough to encompass variation that is potentially informative about the
relationships among the organisms being studied'. However, this definition is
open to the criticism of how large is large and how one is expected to
determine what is 'potentially informative' prior to conducting an analysis.
Further controversy regarding Nelson's generalization of the ontogenetic
criterion concerned what exactly is meant by 'general', for which there are
two interpretations:
• strict temporal precedence, so that the more general state is that which
occurs first in ontogeny; and
• the most frequently observed, so that the general state is the commonest
state.
Given a distribution of two homologous characters in which one, x, is possessed by all
of the species that also possess its homolog, character y, and by at least one other
species that does not, then y may be postulated to be apomorphous relative to x.
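The nesting condition in the statement above amounts to a proper-subset test on the sets of species in which each character is observed. A minimal Python sketch follows; the function name is hypothetical and the example uses the shark and perch endoskeleton described earlier, with cartilage counted as observed in the perch because it appears in the embryo.

```python
def apomorphic_by_generality(species_with_x, species_with_y):
    """Return 'y' or 'x' for the less general (apomorphous) character, else None."""
    x, y = set(species_with_x), set(species_with_y)
    if y < x:  # x occurs in every species with y, and in at least one other
        return "y"
    if x < y:
        return "x"
    return None  # identical or non-nested distributions give no decision


# Cartilage (x) is observed in both taxa, bone (y) only in the perch.
result = apomorphic_by_generality({"shark", "perch"}, {"perch"})
print(result)
```

Note that the test concerns hierarchical nesting only: it never counts which character is the commoner of the two.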
From this standpoint, the only valid information that can be extracted from
ontogenetic transformations is the relative generality of characters. However,
de Pinna (1994) regarded the information contained in the order in which
characters transform one into another to constitute ontogenetic information
per se. To ignore this orderliness was to overlook the essential systematic
information derivable from ontogenies. He considered that eliminating the
relevance of ontogenetic sequence information reduced the direct method to
a form of 'common equals primitive' procedure. However, as noted above,
this would only hold if the more general character is equated strictly with the
commonest character, a correspondence specifically rejected by Weston, who
emphasized the hierarchical nesting of characters. It would thus seem that de
Pinna misunderstood the concept of generality as it is applied in both
Nelson's Law and Weston's generalization.
The plesiomorphic state will be more widespread within a monophyletic group than
will any one apomorphic state. Therefore, the state occurring most commonly within
the ingroup is plesiomorphic.
taxa A and B are more closely related to each other than either is to a
third taxon, C. Resolution of a three-taxon statement depends upon the
grouping information (apomorphy) being present in A and B and absent in C.
However, using ingroup commonality, a three-taxon statement can never be
recognized, let alone resolved, because this requires that two of the three taxa
share the apomorphic state. Ingroup commonality is thus contrary to the very
basis of cladistic analysis.
Stratigraphy
Fossil taxa are often held to be of paramount importance in determining
character polarity using the stratigraphic criterion or the criterion of
geological character precedence, which states:
If one character state occurs only in older fossils and another state only in younger
fossils, then the former is the plesiomorphic and the latter the apomorphic state of
that character.
Biogeography
Several criteria have been proposed that use biogeographical information to
polarize characters. The most widely known is the 'criterion of chorological
progression' (or 'progression rule', Hennig 1966), which postulates that the
most derived species will be that found furthest geographically or ecologically
from the ancestral species. However, as a method for inferring character
polarity, the progression rule has several deficiencies. In particular, even if
allopatric speciation ... assumed that the character states
found only in the peripheral populations are apomorphic. Vicariance
biogeography is neutral regarding character polarity. Furthermore, evidence is
required of the historical distribution or ecological requirements of the
ancestral species. However, given that ancestral taxa cannot be unequivocally
recognized, such evidence will not be forthcoming. Such arguments invalidate
the progression rule as a means of inferring character polarity. Most cladists
would choose to use cladograms to test biogeographical hypotheses. If such
tests are to be independent, then cladograms must be constructed without
recourse to biogeographical information.
Underlying synapomorphy
Perhaps the most idiosyncratic model of character change is that of underlying
synapomorphy. Championed by Ole Saether, underlying synapomorphy is
defined as 'close parallelism as a result of common inherited genetic factors
causing incomplete synapomorphy' or 'the inherited potential to develop
parallel similarities', although this potential may not be realized in all
descendants. Consequently, it refers to the occurrence of synapomorphies in
only some members of a putative monophyletic group. Consider a highly
simplified cladogram of the Bilateria (Fig. 3.12). Haemoglobin is known to
occur in only three of the many lineages: Tubifex worms, some chironomid
midges and vertebrates. The application of standard optimization methods
(see below) would lead us to conclude that haemoglobin had been
independently derived three times. However, the amino-acid sequences in the three
groups are all very similar and it would seem most unlikely that such a
complex molecule could have arisen de novo on three separate occasions.
Underlying synapomorphy would assert that we are mistaken when we code
the other taxa as lacking haemoglobin. These taxa have the relevant
genes, but the genes are switched off because it is simply not selectively
advantageous for them to be active. It is this unexpressed capacity to develop
a feature that is the underlying synapomorphy. Thus, in this example, the
potential 'capacity' (expressed or not) to manufacture haemoglobin can be
used to unite all the taxa in Fig. 3.12 into a monophyletic group.
Fig. 3.12 Underlying synapomorphy asserts that the presence of haemoglobin
(indicated by 1) in Tubifex annelids, certain chironomid midges and vertebrates implies
support for the monophyly of the entire Bilateria (1 -). The lack of observable
haemoglobin in all other Bilateria is not taken as evidence against this hypothesis
for it is argued that these taxa possess the unexpressed capacity to develop this
molecule.
Saether argued that the use of underlying synapomorphies had been
advocated by Hennig (1966), who called them homoiologies. However,
Hennig considered homoiology to be equivalent to convergence, which he
specifically rejected as a valid tool for estimating cladistic relationships. It
may indeed be true that the haemoglobin gene occurs in all taxa in Fig. 3.12.
However, the monophyly of the Bilateria cannot be supported by only
scattered occurrences of an expressed gene, for this is using the absence of
information (observations, characters) as evidence for groups. By invoking ad
hoc hypotheses to explain away conflict in the data as 'unexpressed',
underlying synapomorphies provide a licence to group in any way whatsoever. While
it is perfectly acceptable to use the occurrence of haemoglobin in Tubifex
worms, chironomid midges and vertebrates as evidence of the monophyly of
each group separately, the distribution of this character does not provide
evidential support for the monophyly of the Bilateria as a whole. It may
members of the outgroup and a subset of the ingroup. For example, consider
a cladogram (Fig. 3.13a) with three ingroup taxa (A, B and C) and three
outgroup taxa (D, E and F), the relationships among which are determined by
five characters, 1-5. Four of these characters are unique and unreversed
apomorphies but character 5, while offering some support to clade A-E, is
secondarily lost in taxon A + B. The cladogram is thus six steps long.
Subsequently, if two new characters, 6 and 7, are found that have the same
[Fig. 3.13 (a)-(c): cladograms for taxa A-F.]
Once the set of most parsimonious cladograms has been found, cladists
generally wish to test hypotheses of character transformation. The first stage
of this process is character optimization, which is effected by minimizing a
quantity termed an optimality criterion. In Chapter 2, we introduced the
concepts of additive (ordered) and non-additive (unordered) characters. These
character types are equivalent, respectively, to the two basic optimality
criteria, Wagner and Fitch optimization. However, there are innumerable
other ways in which characters may be constrained to change. We now
explain how the most commonly encountered optimality criteria are
implemented to give what is termed a most parsimonious reconstruction (MPR) of
character change. It should be noted, however, that the optimality criterion
applied to each character is actually decided prior to cladogram construction.
Furthermore, because each optimality criterion implies different costs,
measured as number of steps, these choices will exert a large influence upon the
length, and hence the topologies, of the most parsimonious cladograms for
the data. Thus the reasons for choosing particular optimality criteria should
be clearly explained and justified.
number of character state changes, and thus the length of the cladogram, is
independent of the position of the root. An unrooted cladogram evaluated
using Wagner optimization can be rooted at any point without changing its
length.
In order to determine the minimum number of changes for a character
using Wagner optimization, only a single pass through the cladogram is
required, beginning with the terminal taxa and proceeding to the root.
Consider an unrooted cladogram (Fig. 4.1a), in which six taxa (A-F) show
four of five states of a multistate character (states 0, 1, 2, 4). First, one of the
terminal taxa (A) is chosen arbitrarily as the root (Fig. 4.1b), although in
practice, an outgroup taxon would usually fulfil this role. Optimization begins
by choosing pairs of terminal taxa. The state(s) (termed the 'state set')
assigned to the internal node that unites them is then calculated as the
intersection of the state sets of the two derivative nodes. If the intersection is
empty, then the smallest closed set that contains an element from each of the
derivative state sets is assigned. For example, consider taxa E and F, linked by
internal node z. The intersection of their state sets, (2) and (4) respectively, is
indeed empty and thus the smallest closed set, (2-4), is assigned to z, and a
value of 2 (i.e. 4 - 2) is added to the cladogram length. Similarly, the
intersection of the state sets of taxa C and D is also empty. Thus the state set
(1-2) is assigned to their internal node y, with 1 being added to the length. In
contrast, the intersection of the state sets of nodes y and z is not empty, as
each contains the value 2. This value is assigned to their internal node x and
no increment is made to the length. We proceed in this way towards the root
until all internal nodes have been assigned state sets (Fig. 4.1c).
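The first pass described above can be sketched in Python. This is a minimal illustration rather than the implementation used by any parsimony program: the function names and the nested-tuple tree encoding are hypothetical, and the placement of taxon B as sister to node x is an assumption read from the description of Fig. 4.1.

```python
# Wagner (ordered) optimization, first pass only: terminals towards the root.
# State sets are closed intervals (low, high). The tree encoding and the
# position of taxon B are assumptions made for this sketch.

STATES = {"A": 1, "B": 0, "C": 1, "D": 2, "E": 2, "F": 4}

# Rooted at A; the remaining taxa are joined as (B, ((C, D), (E, F))).
TREE = ("A", ("B", (("C", "D"), ("E", "F"))))


def combine(a, b):
    """Return the state set for the node uniting a and b, and the added steps."""
    low, high = max(a[0], b[0]), min(a[1], b[1])
    if low <= high:
        return (low, high), 0        # non-empty intersection: no extra steps
    return (high, low), low - high   # smallest closed set spanning the gap


def first_pass(node):
    """Return (state set, steps) for the subtree rooted at node."""
    if isinstance(node, str):
        return (STATES[node], STATES[node]), 0
    left_set, left_steps = first_pass(node[0])
    right_set, right_steps = first_pass(node[1])
    node_set, extra = combine(left_set, right_set)
    return node_set, left_steps + right_steps + extra


root_set, length = first_pass(TREE)
print(length)  # 5
```

Intervals suffice here because the states of an ordered character form a linear series, so every state set produced by the rule is a contiguous range.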
When this process is complete, the length of the cladogram will have been
calculated, which for Fig. 4.1c is 5. However, it can be seen that this method
does not necessarily assign states unambiguously to the internal nodes; that
is, it does not produce a most parsimonious reconstruction (MPR). For
example, we are uncertain whether node y should be assigned a value of 1 or
2. In order to produce an MPR, a second pass through the cladogram must be
performed, this time starting from the root and visiting each internal node in
turn. If the state set of an internal node is ambiguous, then we assign the
state that is closest to the state found in the internal node of which it is a
derivative. For example, nodes y and z are both assigned a value of 2 because
this is the value in both their state sets that is closest to the value assigned to
node x. Notice also that two changes (2 → 3 and 3 → 4) must be assumed to
have occurred between node z and taxon F. Once this process has been
completed, then the MPR has been found and all five steps required by the
character are accounted for (Fig. 4.1d).
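The second pass can be sketched in the same spirit. The code below is an illustration under stated assumptions: the node label w for the basal internal node, its first-pass state set (0-2), and all function and variable names are hypothetical; the first-pass sets for x, y and z are those given in the text.

```python
# Wagner optimization, second pass: from the root towards the terminals.
# Each ambiguous internal node takes the state in its first-pass set that is
# closest to the state already fixed for the node below it (its ancestor).

OBSERVED = {"A": 1, "B": 0, "C": 1, "D": 2, "E": 2, "F": 4}
# First-pass state sets as closed intervals; w is an assumed label.
FIRST_PASS = {"w": (0, 2), "x": (2, 2), "y": (1, 2), "z": (2, 4)}
CHILDREN = {"w": ("B", "x"), "x": ("y", "z"), "y": ("C", "D"), "z": ("E", "F")}


def closest(value, interval):
    """Return the member of the closed interval nearest to value."""
    low, high = interval
    return min(max(value, low), high)


def second_pass(node, ancestor_state, final):
    """Fix the state of node, record it in final, and return steps at and above it."""
    if node in OBSERVED:
        state = OBSERVED[node]
    else:
        state = closest(ancestor_state, FIRST_PASS[node])
    final[node] = state
    steps = abs(state - ancestor_state)
    for child in CHILDREN.get(node, ()):
        steps += second_pass(child, state, final)
    return steps


final_states = {}
length = second_pass("w", OBSERVED["A"], final_states)
print(final_states["y"], final_states["z"], length)  # 2 2 5
```

The branch from node z to taxon F contributes two steps, which corresponds to the two changes (2 → 3 and 3 → 4) noted in the text.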
Fig. 4.1 Determination of character length using Wagner optimization (additive or
ordered characters). (a) Unrooted cladogram for six taxa and considering one
multistate character (states 0-4). (b) The unrooted cladogram is arbitrarily rooted at
taxon A. (c) States assigned to internal nodes by passing from the terminal taxa to
the root. (d) Alternative states for the internal nodes resolved by passing from the
root to the terminals. (e) An alternative equally parsimonious resolution. The points
at which character changes must be assumed are denoted by black bars. See text for
details.
It should be noted that this procedure (Farris 1970) will give a unique MPR
only when all characters are free of homoplasy. In the presence of homoplasy,
more than one MPR may exist. For example, Fig. 4.1e, in which nodes x and y
are assigned state 1 rather than state 2, also has five steps. However, there is a
difference in the behaviour of the character changes. In Fig. 4.1d, the change
from 1 → 2 was placed on to the cladogram at the closest possible position to
the root, with the result that the occurrence of state 1 in taxon C must be
accounted for by a reversal (2 → 1). This is known as 'fast' or 'accelerated'
transformation.
Fig. 4.2 Determination of character length using Fitch optimization (non-additive or
unordered characters). (a) Unrooted cladogram for six taxa and considering one
multistate character (states 0-4). (b) The unrooted cladogram is arbitrarily rooted at
taxon A. (c) States assigned to internal nodes by passing from the terminal taxa to
the root. (d) Alternative states for the internal nodes resolved by passing from the
root to the terminals. (e) An alternative equally parsimonious resolution. The points
at which character changes must be assumed are denoted by black bars. See text for
details.
Fitch optimization, like Wagner optimization, does not necessarily produce
a unique MPR. An alternative MPR is shown in Fig. 4.2e, where state 1 is
assigned to node w. It is not readily apparent by examining the original state
set (0,2) that state 1 is a possible unique assignment for node w. To discover
all possible MPRs, a second pass through the cladogram is necessary.
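The point about node w can be checked with a short Python sketch. It is illustrative only and rests on assumptions: the topology (B sister to x, x uniting y and z) is inferred from the figures, the first pass uses the standard Fitch rule (intersection if non-empty, otherwise union plus one step), and the exhaustive search over internal-node states is a brute-force substitute for the second pass, not the procedure used by any program.

```python
# Fitch (unordered) optimization on the six-taxon example.
from itertools import product

OBSERVED = {"A": 1, "B": 0, "C": 1, "D": 2, "E": 2, "F": 4}
CHILDREN = {"w": ("B", "x"), "x": ("y", "z"), "y": ("C", "D"), "z": ("E", "F")}
ROOT_TAXON, BASAL_NODE = "A", "w"


def first_pass(node, sets):
    """Fill sets with first-pass state sets; return the steps counted so far."""
    if node in OBSERVED:
        sets[node] = {OBSERVED[node]}
        return 0
    left, right = CHILDREN[node]
    steps = first_pass(left, sets) + first_pass(right, sets)
    common = sets[left] & sets[right]
    if common:
        sets[node] = common
    else:
        sets[node] = sets[left] | sets[right]
        steps += 1
    return steps


def fitch_length():
    sets = {}
    steps = first_pass(BASAL_NODE, sets)
    if OBSERVED[ROOT_TAXON] not in sets[BASAL_NODE]:
        steps += 1  # change between the root taxon and the basal node
    return steps, sets


def optimal_states_at(target):
    """Brute force: states taken by target in any minimum-length reconstruction."""
    internal = list(CHILDREN)
    best, states = None, set()
    for combo in product(range(5), repeat=len(internal)):
        assign = {**OBSERVED, **dict(zip(internal, combo))}
        cost = int(assign[ROOT_TAXON] != assign[BASAL_NODE])
        for parent, kids in CHILDREN.items():
            cost += sum(assign[parent] != assign[kid] for kid in kids)
        if best is None or cost < best:
            best, states = cost, {assign[target]}
        elif cost == best:
            states.add(assign[target])
    return best, states


length, sets = fitch_length()
print(length, sorted(sets["w"]))   # 4 [0, 2]
print(optimal_states_at("w"))      # (4, {0, 1, 2})
```

The first-pass set for w omits state 1, yet the exhaustive search confirms that state 1 at w is compatible with a minimum-length reconstruction, which is why a second pass is needed to recover every MPR.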
• If the state sets of two derivative nodes are equal, then this value is
assigned to the internal node connecting them and cladogram length is
not increased.
• If the state sets are different, then the higher value is assigned to the
internal node, and cladogram length is increased by the difference
between the two derived state sets.
• When the basal internal node is reached, its state set is compared with
that of the root. If they differ, cladogram length is increased by the
difference; otherwise no action is taken.
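The three rules above translate directly into a few lines of Python. The sketch below is an assumption-laden illustration: the four-taxon tree, its states and the function name are hypothetical and are not taken from Fig. 4.3.

```python
# Dollo optimization as described by the three rules above.
# The tree is a nested tuple of taxon names; the root is handled separately.

def dollo_length(tree, states, root_state):
    """Return the number of steps required by one character."""
    def first_pass(node):
        if isinstance(node, str):
            return states[node], 0
        left, left_steps = first_pass(node[0])
        right, right_steps = first_pass(node[1])
        # Equal states add nothing; otherwise keep the higher value and
        # add the difference between the two derivative states.
        return max(left, right), left_steps + right_steps + abs(left - right)

    basal_state, steps = first_pass(tree)
    # Finally compare the basal internal node with the root.
    return steps + abs(basal_state - root_state)


# Hypothetical binary character: present (1) in P and S, absent (0) in Q and T.
tree = (("P", "Q"), ("S", "T"))
states = {"P": 1, "Q": 0, "S": 1, "T": 0}
print(dollo_length(tree, states, root_state=0))  # 3
```

On this hypothetical tree the rules give three steps, one gain on the branch above the root and a loss in each of Q and T, whereas two independent gains in P and S would account for the same distribution in two steps.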
Fig. 4.3 Determination of character length under Dollo optimization. (a) Unrooted
cladogram for eight taxa and a character with three states (0, 1, 2). (b) The
cladogram is rooted at taxon A (in this case this taxon shows the most derived
state). (c) Assignment of character states to internal nodes. See text for details.
Fig. 4.4 Comparison of character length using Dollo and Fitch optimization. (a) A
cladogram requiring seven steps under Dollo optimization (one gain and six
reversals). (b) The same character requires only two steps under Fitch optimization.
state 1 need be postulated (Fig. 4.4b). Dollo optimization overestimates the
length by five steps. The only means of avoiding this problem is to implement
a 'relaxed' Dollo method, whereby one might prefer one gain and two losses
to two independent gains, but reject one gain and ten losses in favour of two
independent gains. A method for implementing such assumptions is discussed
under 'generalized optimization' below.
Fig. 4.5 Determination of character length under Camin-Sokal optimization. (a)
Cladogram for a character with three states (0, 1, 2). (b) States assigned to
internal nodes.
• If the state sets of the two derivative nodes are equal, then this value is
assigned to the connecting node and cladogram length is not increased.
• If the state sets are different, then the lower value is assigned to the
connecting node and cladogram length is increased by the difference
between the two derived states.
• When the basal internal node is reached, its state set is compared with
that of the root. If they differ, cladogram length is increased by the
difference. Otherwise no action is taken.
When applied to the cladogram in Fig. 4.5a, Camin-Sokal optimization
produces an MPR of four steps (Fig. 4.5b).
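The Camin-Sokal rules differ from the Dollo rules only in assigning the lower rather than the higher value, as the following Python sketch shows. The three-taxon tree, its states and the function name are hypothetical; the sketch does not reproduce the cladogram of Fig. 4.5.

```python
# Camin-Sokal (irreversible) optimization following the three rules above.

def camin_sokal_length(tree, states, root_state):
    """Return the number of steps required by one character."""
    def first_pass(node):
        if isinstance(node, str):
            return states[node], 0
        left, left_steps = first_pass(node[0])
        right, right_steps = first_pass(node[1])
        # Keep the lower value, so every change runs away from the root state.
        return min(left, right), left_steps + right_steps + abs(left - right)

    basal_state, steps = first_pass(tree)
    return steps + abs(basal_state - root_state)


# Hypothetical three-state character on three taxa, rooted with state 0.
tree = (("P", "Q"), "R")
states = {"P": 2, "Q": 1, "R": 1}
print(camin_sokal_length(tree, states, root_state=0))  # 2
```

Here the two steps are a change 0 → 1 above the basal node and a change 1 → 2 leading to P; no reversal is ever postulated.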
Unlike the other optimization procedures described so far, Camin-Sokal
optimization is very rarely used. It is highly unlikely that evolutionary
scenarios would include the assumption that a feature may arise more than once
but never be lost.
Missing values, designated as '?', '-' or '.' in computer programs, are some-
times entered in data matrices. Most often, missing values appear in analyses
containing fossil taxa and the problems that flow from their inclusion may be
most acute in palaeontological data. However, missing data are not confined
to fossils. There are a variety of circumstances in which question marks are
used. This section explains the causes, effects and possible strategies for
dealing with missing values.
Missing values may appear in a data matrix for one of several reasons.
Fig. 4.6 An example of the increase in the number of equally most parsimonious
cladograms following addition of taxa with missing values. Above (a) is the strict
consensus tree for 20 orders of Recent mammals coded for 88 characters with nearly
all data cells filled with real values. Below (b), seven fossil taxa have been added,
each of which has between 25-57% missing data. (From Novacek 1992.)
Their most obvious effect is to increase the number of equally most
parsimonious cladograms and to reduce resolution. Novacek (1992) gave a good
example of this. An initial analysis of the twenty recognized orders of Recent
mammals, coded for 88 characters, produced four equally most parsimonious
cladograms. All but six data cells were filled with real data (i.e. 0 or 1), the
few question marks being the result of non-applicable coding. To this analysis,
seven fossil taxa were added with varying amounts of missing data (25-57%).
Analysis of this enlarged matrix resulted in 6800+ equally most parsimonious
cladograms (this being the limit of computer memory rather than the total
number of equally most parsimonious solutions). In the strict consensus tree,
the clade originally recognized as containing primates, tree shrews, flying
lemurs and bats was lost.
Addition of taxa to any analysis is liable to increase the number of equally
most parsimonious cladograms by introducing additional homoplasy (although
it is possible that additional taxa may give fewer cladograms by
resolving previous ambiguity). However, the dramatic increase in cladogram
number in the study cited above is due largely to the inclusion of question
marks, which increase ambiguous character optimizations at the internal
nodes. It has been pointed out by Platnick et al. (1991b) that two of the most
commonly used parsimony programs (Hennig86 and PAUP) can generate
spurious cladograms when supplied with matrices containing missing data.
The example of Platnick et al., reproduced here as Fig. 4.7, shows a data
matrix that includes two missing values for two taxa (F and G). Analysis of
this data set using either Hennig86 or PAUP yields six equally most
parsimonious cladograms. Yet, if we replace the missing values with all four possible
combinations of real values, then we can recover just four of these six
cladograms (Fig. 4.7a, d-f). The remaining two (Fig. 4.7b, c) are solely
products of the way that the computer programs optimize missing values.
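Replacing question marks with every combination of real values is easy to automate. The Python sketch below is illustrative only: the function name and the three-taxon binary matrix are hypothetical and do not reproduce the data of Fig. 4.7. Each resolved matrix would then be analysed separately with a parsimony program.

```python
# Enumerate every matrix obtained by replacing each '?' with a real state.
from itertools import product


def resolutions(matrix, states="01"):
    """Yield one fully resolved copy of matrix per combination of real states."""
    cells = [(taxon, i)
             for taxon, row in matrix.items()
             for i, value in enumerate(row) if value == "?"]
    for combo in product(states, repeat=len(cells)):
        resolved = {taxon: list(row) for taxon, row in matrix.items()}
        for (taxon, i), value in zip(cells, combo):
            resolved[taxon][i] = value
        yield {taxon: "".join(row) for taxon, row in resolved.items()}


# Hypothetical rows: two taxa with one missing value each.
example = {"D": "1100", "F": "11?0", "G": "1?01"}
all_resolved = list(resolutions(example))
print(len(all_resolved))  # 4
```

Two binary question marks give 2 x 2 = 4 resolved matrices, matching the four combinations mentioned in the text; the count grows exponentially with the number of missing cells.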
It can be argued further that five of the six cladograms, those with three
nodes (Fig. 4.7a, d, f) and four nodes (Fig. 4.7b, c), are 'over-resolved', by one
and two nodes respectively. In other words, all of the branches resolving
groups DEFG, DFG, DG and DF are spurious; none has unambiguous
support in the data. How these spurious resolutions arise can be explained
with reference to a second example (Fig. 4.8), which also illustrates that the
problem of ambiguous optimization can occur in the absence of question
marks.
Analysis of the matrix in Fig. 4.8a using Hennig86 or PAUP yields four
equally most parsimonious topologies (Fig. 4.8b-d, f). Only two of these
cladograms are unambiguously supported by data: Fig. 4.8b, in which node
CDE is supported by character 3; and Fig. 4.8c, in which node BC is
supported by character 4. The cladograms shown in Fig. 4.8d and e, and in
Fig. 4.8f and g, represent alternative, ambiguous optimizations of characters 4
and 3 respectively. The former cladogram of each pair is the delayed
transformation, while the latter cladograms are the accelerated
transformations. The topologies are reported by the computer programs because at least
one character is placed on each branch under at least one of these
optimizations (Fig. 4.8e and Fig. 4.8f).
Swofford and Begle (1993) proposed three criteria that could be used to
determine when zero-length branches are to be collapsed.
Fig. 4.7 Spurious cladograms. (a-f) When the data set shown here, including two
question marks, is analysed using either Hennig86 or PAUP, six cladograms (a-f)
are found. However, two of these (b and c) cannot be justified by any combination
of replaced 'real' observations; they are spurious and result from the way the
algorithms treat zero branch lengths. (From Platnick et al. 1991b.)
• Collapse a branch if its minimum length is zero; that is, if there is at least
one optimization of all characters that assigns zero-length to the branch,
then that branch is collapsed.
• Collapse a branch if its maximum length is zero; that is, if there is at least
one optimization of all characters that does not assign zero-length to the
branch, then that branch is not collapsed.
• Apply either accelerated or delayed transformation to all characters, or to
characters individually, and collapse any branch assigned zero-length.
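The first two criteria differ only in whether the minimum or the maximum length of a branch is tested, as the following Python sketch shows. All names and branch lengths are hypothetical; each list is assumed to hold the length of one branch under every optimization being considered.

```python
# Collapsing rules for zero-length branches (first two criteria above).

def collapse_if_min_zero(lengths):
    """Collapse when at least one optimization assigns zero length."""
    return min(lengths) == 0


def collapse_if_max_zero(lengths):
    """Collapse only when every optimization assigns zero length."""
    return max(lengths) == 0


# Hypothetical branch lengths under two alternative optimizations.
branch_lengths = {
    "supported": [1, 1],     # supported under both optimizations
    "ambiguous": [1, 0],     # supported under only one optimization
    "unsupported": [0, 0],   # supported under neither
}

for name, lengths in branch_lengths.items():
    print(name, collapse_if_min_zero(lengths), collapse_if_max_zero(lengths))
# supported False False
# ambiguous True False
# unsupported True True
```

Only the ambiguously supported branch is treated differently: the minimum-length rule removes it, while the maximum-length rule retains it.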
Fig. 4.8 (a-g) Example of how ambiguous optimization can lead to spurious
resolution when a data set is analysed using either PAUP or Hennig86; see text for
explanation.
The third criterion … would we have a
defensible justification for choosing one type of optimization over the other.
However, de Pinna (1991) argued that, unless demonstrated to be false by
parsimony, accelerated transformation is to be preferred because it preserves
more of our original conjecture of primary homology than does delayed
transformation. (The same point was made by Farris (1983) on the basis of
higher information content.) In other words, by favouring the acquisition of a
character, with subsequent homoplasy accounted for by reversal (e.g. Fig. 4.8e
rather than Fig. 4.8d), accelerated transformation maintains our original
conjecture of the character as a putative synapomorphy. In contrast, by
treating homoplastic characters as independent derivations, delayed
transformation rejects our original hypothesis of primary homology. For this reason,
Fig. 4.9 Some taxa with question marks can be highly disruptive. (a) The single cladogram that results from analysis of the data set
shown, but including only taxa A-F. (b) When a seventh taxon, G, is added to the analysis, eight cladograms are produced of which
the strict consensus tree is the uninformative bush. (c) The disruption is caused by the alternative positions that taxon G can adopt
on the eight original cladograms. (From Nixon and Wheeler 1992.)
Fig. 4.10 Safe taxonomic reduction. If taxon B is compared with taxon A, it can be
seen that in all characters denoted by question marks taxon A either is the same or
has real values. The inclusion of this taxon can have no topological effect on the
outcome and it may be safely deleted. However, taxon C differs in having a different
'real' value and may not be safely deleted since it may cause a change in topology.
deleted. The reduced data set yielded only two equally most parsimonious
cladograms. Having reduced the number of cladograms, the excluded taxa
would then be placed back on to the cladogram(s) before a selection was
made among them on other grounds (e.g. stratigraphic or biogeographical
plausibility).
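The comparison summarized in Fig. 4.10 can be sketched as a simple test in Python. This is one reading of the caption, not a full statement of the published rules for safe taxonomic reduction: the function name and the three rows are hypothetical, and the assumption that every real value in the candidate must match the reference taxon is ours.

```python
# A minimal test for whether a taxon with missing values can be set aside.

def safely_deletable(candidate, reference):
    """True if every real value in candidate is matched in reference."""
    if len(candidate) != len(reference):
        raise ValueError("rows must cover the same characters")
    return all(c == "?" or c == r for c, r in zip(candidate, reference))


# Hypothetical rows for ten characters.
taxon_a = "120?012010"
taxon_b = "1?0?0??0?0"   # agrees with A wherever it has real values
taxon_c = "1?0?1??0?0"   # differs from A in the fifth character

print(safely_deletable(taxon_b, taxon_a))  # True
print(safely_deletable(taxon_c, taxon_a))  # False
```

Taxon C fails the test because a single conflicting real value is enough for its inclusion to affect topology.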
Missing data due to incompleteness should not be a problem with Recent
taxa. It is theoretically possible to know a Recent animal or plant in a way
that is not possible for fossils. However, it has been pointed out (Gauthier et
al. 1988) that marked divergence in structure among groups of Recent
animals and plants, as well as the inclusion of highly distinct outgroup taxa,
may mean that question marks are placed against some characters in Recent
taxa. For example, matrices can include question marks derived from
non-applicable (or illogical) codings. This often arises in matrices including both
fossil and Recent representatives.
Although little may be done regarding genuinely missing data, we should
be aware of the consequences of coding non-applicable character states as
question marks. First, these question marks will undoubtedly increase the
numbers of equally most parsimonious cladograms. Second, it is possible that
using question marks for non-applicable characters may lead to selection of a
more parsimonious cladogram than is allowed by plausible character
evolution.
The following example (Fig. 4.11, from Maddison 1993) explains this
phenomenon. Suppose that we wish to add characters of the tail to try to
resolve the topological ambiguity shown in the left-hand main clade of Fig.
4.11a. The animals concerned exhibit one of three conditions. Some are
tailless, some have red tails and some have blue tails. The distribution of
these conditions in our initial analysis is shown in Fig. 4.11a, where it can be
seen that the more basal members of both main clades are all
tailless, while the more distal members of each show either red tails or blue
tails.
There are several different ways in which we might code the tail conditions
(see Chapter 2). Two of the commonly used methods are shown here as
alternative 1 and alternative 2 (Fig. 4.11b). Alternative 2 treats the three
types as dependent variables within a single multistate character. In contrast,
alternative 1 treats them as semi-independent variables distributed between
two characters and includes a question mark to denote non-applicable coding
for colour in those taxa having no tails. Coding alternative 1 is often used, yet
this may lead to a selection of a cladogram that cannot possibly be justified in
terms of character evolution (that is, when we translate the cladogram into a
tree). The area of ambiguity in the left-hand clade involves four taxa, for
which there are 15 possible fully resolved solutions. Two are shown in Fig.
4.11c and e, and in Fig. 4.11d and f. Optimization of the tail characters
(presence/absence, red/blue) coded as alternative 1 results in us preferring
the topology in Fig. 4.11c over the alternative (Fig. 4.11d), because the former
requires only two steps to explain the distribution of tail colour, rather than
three. But selection of that cladogram is nonsensical, because the choice is
based upon falsely ascribing tail colour (blue or red) to animals that do not
have tails. However, if we used coding alternative 2, the starting condition for
both topologies would be '0' and both require 4 steps (Fig. 4.11e and Fig.
4.11f). Of course, coding tail attributes in this way gets us no further in
resolving the initial polytomy but it does mean that we avoid the choice of an
apparently optimal cladogram based on nonsensical character attributes. As
Maddison pointed out, this problem is not confined to morphological data but
is also relevant to the use of question marks to code for gaps in protein or
nucleotide sequences. This is a very simple example and if we were doing the
analysis by hand, then the danger would be spotted. But in computer
analyses, these pitfalls are much more difficult to detect. Furthermore, if we
were to apply successive approximations character weighting (see Chapter 5)
to such an initially 'nonsensical' cladogram, the result may well deviate
further from reality.
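The two coding alternatives can be written out explicitly. In the Python sketch below the state numbers are assumptions chosen for illustration (the text specifies only that tailless taxa are '0' under alternative 2 and receive a question mark for colour under alternative 1); the names are hypothetical.

```python
# Two ways of coding the tail conditions discussed above.

# Alternative 1: (tail absent/present, tail colour); colour is '?' when
# there is no tail to be coloured.
ALTERNATIVE_1 = {"tailless": ("0", "?"), "red": ("1", "0"), "blue": ("1", "1")}

# Alternative 2: a single multistate character.
ALTERNATIVE_2 = {"tailless": "0", "red": "1", "blue": "2"}


def encode(conditions, scheme):
    """Translate a list of observed tail conditions into coded entries."""
    return [scheme[condition] for condition in conditions]


observed = ["tailless", "red", "blue", "tailless"]
print(encode(observed, ALTERNATIVE_1))
# [('0', '?'), ('1', '0'), ('1', '1'), ('0', '?')]
print(encode(observed, ALTERNATIVE_2))
# ['0', '1', '2', '0']
```

Under alternative 1 an optimization routine is free to treat each question mark as red or blue, which is how tail colour comes to be ascribed to tailless animals; alternative 2 leaves no such freedom.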
Simulations have shown that question marks in ingroup taxa that are widely
scattered at low hierarchical levels exert more deleterious effects on
character optimization (and perhaps false selection of cladograms) than do question
marks in taxa near the root. In practical terms, this means that in combined
analyses of fossil and Recent taxa, where the latter are usually scattered
within the ingroup, it may be particularly important to avoid the use of
question marks for non-applicable character states.
Mention has been made above of the use of question marks for polymorphic
taxa. To avoid this two solutions have been proposed. First, the group
may be broken up into two or more subgroups, the members of which show
uniform coding. Alternatively, a phylogeny for the group showing the
polymorphism may be assumed and the state at the ingroup node accepted as
representative of the entire group (but see §3.2.5 for a discussion of the
problems associated with this procedure).
Fig. 4.11 An example of how non-applicable coding, scored as question marks, may
result in choosing a cladogram that is 'nonsensical'. (a) The preliminary cladogram,
where part of the left-hand side of the cladogram was unresolved. We wish to add
characters of the tail to try to resolve this polytomy. There are three taxon types
(tailless, red-tailed and blue-tailed). (b) Two coding alternatives for the tail
attributes. Alternative 1 treats the features as two characters and includes a question
mark to denote inapplicable coding for colour in those taxa having no tails.
Alternative 2 treats the three types as a single multistate character. (c) One of the
fifteen possible resolutions of the terminal polytomy shown in (a). (d) Another of the
fifteen possible resolutions of the terminal polytomy shown in (a). Optimization of
the tail characters coded according to alternative 1 leads us to prefer the topology
in (c) to that in (d), because the former is one step shorter. (e-f) The same two
topologies as in (c) and (d). Optimization of the tail characters coded according to
alternative 2 does not allow us to prefer either topology because both are four steps long.
(From Maddison 1993.)
In summary, the introduction of question marks into cladistic analyses
causes computational problems that have not yet been solved. These
problems relate to both the numbers of cladograms produced and their resolution.
The introduction of question marks is clearly most significant when
undertaking simultaneous analyses of combined fossil and Recent taxa (especially
where molecular sequences are included). For morphological data matrices, it
may be possible to distinguish between informative and non-informative fossil
taxa and eliminate the latter. It may also be possible to identify 'rogue' taxa
and eliminate them from initial analyses. For polymorphisms, alignment gaps
in molecular sequences and non-applicable character states, care should be
taken in the initial coding. All of these strategies will alleviate the symptoms
of question marks but not remove them entirely.
Table 5.1 (a) Matrix 1 with four taxa (A-D) and an all-zero root (X) coded for 6
characters (1-6). (b) Matrix 2 with five taxa (A-E) and an all-zero root (X) coded for
6 characters (1-6). s = actual steps on the cladogram, m = minimum possible steps,
g = minimum steps on a bush, ci = consistency index, ri = retention index, Σ = sum of
s, m and g values (S, M and G respectively) for the calculation of CI and RI. CI(u) is
CI for informative characters only. The values for CI, CI(u) and RI for Matrix 1 refer
to the cladogram in Fig. 5.1c, while those for Matrix 2 refer to the cladogram in
Fig. 5.2c.
(a) Matrix 1
Index    Formula        Calculation        Value
CI       M/S            6/8                0.75
CI(u)    M/S            4/6                0.67
RI       (G-S)/(G-M)    (10-8)/(10-6)      0.50
(b) Matrix 2
Index    Formula        Calculation        Value
Length                                     8
CI       M/S            6/8                0.75
CI(u)    M/S            4/6                0.67
RI       (G-S)/(G-M)    (13-8)/(13-6)      0.71
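The ensemble indices in Table 5.1 follow directly from the sums S, M and G. The Python sketch below recomputes them from the totals given in the table; the function names are hypothetical.

```python
# Ensemble consistency and retention indices from the sums in Table 5.1.

def ensemble_ci(M, S):
    """CI = M / S: minimum possible steps over actual steps."""
    return M / S


def ensemble_ri(G, S, M):
    """RI = (G - S) / (G - M)."""
    return (G - S) / (G - M)


# Matrix 1: S = 8, M = 6, G = 10.
print(round(ensemble_ci(6, 8), 2))       # 0.75
print(round(ensemble_ri(10, 8, 6), 2))   # 0.5

# Matrix 2: S = 8, M = 6, G = 13.
print(round(ensemble_ci(6, 8), 2))       # 0.75
print(round(ensemble_ri(13, 8, 6), 2))   # 0.71

# CI(u), informative characters only: M = 4, S = 6 for both matrices.
print(round(ensemble_ci(4, 6), 2))       # 0.67
```

The two matrices share the same CI but differ in RI because only G changes between them.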
Fig. 5.1 a-c. Analysis of the data in Table 5.1a yields three equally most
parsimonious cladograms. Only characters 2, 3, 5 and 6 are mapped. Character gain signified
by '+', character loss by '-'.
Fig. 5.2 a-c. Analysis of the data in Table 5.1b yields three equally most
parsimonious cladograms. Only characters 2, 3, 5 and 6 are mapped. Character gain signified
by '+', character loss by '-'.
0.33. However, it could have a worse fit as there are five taxa with the
apomorphic value and hence its poorest ci would be 1/5 = 0.2. For the
cladogram in Fig. 5.3, character 2 does group some taxa together (A + B + C).
Hence some of the similarity is interpretable as synapomorphy and its
ri = 0.50.
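The figures quoted for character 2 can be recomputed from its s, m and g values. In the Python sketch below the function names are hypothetical, and the values s = 3, m = 1 and g = 5 are taken from the description in the text (three appearances on the cladogram, a binary character, five taxa with the apomorphic value).

```python
# Per-character consistency and retention indices.

def ci(m, s):
    """Consistency index: minimum possible steps over actual steps."""
    return m / s


def ri(g, s, m):
    """Retention index; undefined (None) when g equals m."""
    if g == m:
        return None
    return (g - s) / (g - m)


# Character 2 on the cladogram of Fig. 5.3.
s, m, g = 3, 1, 5
print(round(ci(m, s), 2))   # 0.33
print(round(ci(m, g), 2))   # 0.2, the poorest possible ci (s = g)
print(ri(g, s, m))          # 0.5
```

The retention index rescales the fit between the best case (s = m, ri = 1) and the worst case (s = g, ri = 0), so it registers the grouping of A + B + C that the consistency index alone does not distinguish.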
o , , ,. E F G H
Fig. 5.3 Cladogram for nine taxa, A-I, with the two characlllrs in Table 5.2 mappe
Character 1 a ppears twice (independently in 0 lind H) and Is not synnpomorphic.;
Characte r Z appears three Ilmll~ : twko indflptUldflrltly In F' 1t1l~11 , but nilio unithlij the-
members of tim group AOC. HIIIll;e IOnlll of II• • hnll.... ly-.lu)'n.pomotphh ~ (A fter.;
,c;o.l.bo.flIll9JJI
Table 5.2 Example demonstrating the use of the retention index with nine taxa
(A-I) and two characters (C1 and C2). Abbreviations for s, m, g, ci and ri as Table
5.1.
Taxa                 s    m    g    ci       ri
A B C D E F G H                     (m/s)    (g-s/g-m)
Morphological data
Neff (1986) presented the most complete analytical method for character
analysis to date , broadening the concept to include factors other than
re-examination of morphological comparisons. Neff divided the process of
phylogeny reconstruction into two distinct parts: character analysis and
cladistic analysis (an idea reinvented by Brower and Schawaroch 1996).
According to Neff, character analysis involves two steps rather than the one
that is generally assumed (character delimitation). Step 1 is synonymous with
character delimitation and includes the initial investigation of specimens,
making of observations and identification of features as possible homologues.
As it pertains to a particular hypothesis, this step is summarized as: 'Feature
X in taxon A is the same as feature X in taxon B'. Neff's step 1 is analogous
to Patterson's (1982) concept of similarity and Rieppel's (1988) idea of
'topographic correspondence' being the initial criterion for homology
determination. Neff's second step consists of constructing a hierarchy of characters
and is also phrased with reference to particular hypotheses: 'Feature X is
more general than, and includes, the more specialized feature Z'. Step 2 thus
concerns polarity estimation (this subject was considered in detail in Chapter
3). For the purposes of this discussion, polarity decisions should be understood
to be conclusions that are derived from a cladogram and that cladogram
construction does not require a priori polarization (see Chapter 3). Yet
there i. much of mcd deed. as Carpenter (1988: 292)
noted: 'Actually Neff's paper [1986] merely argues for careful homology
decisions, which should be given'.
Molecular data
Judgemental mistakes of similarity are a more than reasonable source of
error when considering morphological data. This is as it should be. Examination
of character conflict leads to re-examination of characters, which, in
turn, leads to greater understanding of the organisms and improved classifications.
This is the essence of systematics. Yet when dealing with sequence data
(particularly nucleotide sequences), one is usually faced with the conclusion
that the data do, indeed, contain copious 'real' conflict. This is because it is
impossible to examine a nucleotide base in any further detail; its similarity is
as exact as can be. However, the regularity that defines sequence data may be
used to implement certain types of a priori weighting, usually on the assumption
that the conflict is 'caused' by known processes (but see below). The
various types of weighting possible are listed as follows (Hillis et al. 1993);
perhaps more will yet be discovered.
A priori weighting:
A posteriori weighting:
Fig. 5.4 Secondary structure in the 5S rRNA molecule of Pedinomonas minor.
Paired sites are those which form Watson-Crick base pairs in the stem regions of a
molecule and unpaired sites are those that occur in the loop regions. (After
Devereux et al. 1990.)
wise to examine each molecule individually rather than attempt to extrapolate
a particular situation into a generality pertaining to all molecules.
With respect to non-coding genes, different factors can be taken into
account. For instance, every molecular sequence has both a secondary and
tertiary folding structure such that some bases are placed adjacent to each
other in stem regions and others are separate in loop regions (Fig. 5.4).
Nucleotide bases that appear opposite each other in stem regions are seen as
dependent, because if a substitution occurs in one position, the opposite base
may also have to change to maintain the overall structure. In contrast, bases
in the loop regions have fewer such constraints and may be free to change to
any other nucleotide. Therefore, it might seem useful to weight these positions
accordingly. However, disagreements exist with respect to the informativeness
of these regions.
Wheeler and Honeycutt
Fig. 5.5 Classification of nucleotide substitutions. Each arrow represents two options
for direction of change. Transitions, of which there are four possibilities,
substitute one purine (A or G) for another or one pyrimidine (C or T) for another.
Transversions, of which there are eight possibilities, substitute a purine for a
pyrimidine or vice versa.
(1988) suggested that the stem regions were uninformative (and effectively
gave them zero weight), while Dixon and Hillis (1993) came to the opposite
conclusion. They suggested that the informativeness of the stem and loop
regions may be unique to each molecule or to particular organisms (Wheeler
and Honeycutt examined 5S and 5.8S rRNA genes in insects, while Dixon and
Hillis used 28S rRNA in vertebrates). Such discordant conclusions suggest
that even if it appears legitimate to implement this type of weighting scheme,
it may not be one that can be generalized to all organisms and all molecules.
Once again, the 'imbalances' need to be investigated with reference to each
particular problem and their utility examined against the conclusions (a
cladogram).
Within-sequence position weighting utilizes several different kinds of proposed
mutational bias. The first to be explored was the relative frequency of
transitions versus transversions. There are two kinds of nucleotides: adenine
(A) and guanine (G) are purines, while cytosine (C) and thymine (T) are
pyrimidines. Transitions substitute one purine for another or one pyrimidine
for another, of which there are four possibilities (Fig. 5.5). Transversions are
substitutions of a pyrimidine by a purine or vice versa, of which there are
eight possibilities (Fig. 5.5). By chance alone, one would expect to encounter
more transversions than transitions. However, in many examples, there are
significantly more transitions than transversions. Such imbalances can be
used to weight the data so as to prevent favouring one kind of substitution
over another. For example, Miyamoto and Boyle (1989) discovered that
'better results in terms of unambiguous resolution ..., congruence ..., and
consistency ... are expected from analyses of transversions alone, rather than
from combinations of transitions, transversions, and gap events ...'
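The counts of four transitions and eight transversions can be checked by enumeration. The following is a minimal sketch; the names `classify` and `counts` are illustrative and not drawn from any program discussed here.

```python
from itertools import permutations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def classify(a, b):
    """Return 'transition' or 'transversion' for the substitution a -> b."""
    same_class = ({a, b} <= PURINES) or ({a, b} <= PYRIMIDINES)
    return "transition" if same_class else "transversion"

# Enumerate the 12 observable (ordered) substitutions among the four bases.
counts = {"transition": 0, "transversion": 0}
for a, b in permutations("ACGT", 2):
    counts[classify(a, b)] += 1

print(counts)
```

Because transversions make up twice as many of the possible changes, an observed excess of transitions is the kind of imbalance that such weighting schemes try to accommodate.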
Of further significance are the relative substitution frequencies of all 12
possible substitutions, which can be weighted according to their frequency in
any particular data set. There are 16 possible substitutions, of which
Table 5.3 The 16 possible substitutions among the four bases with
frequencies a-l; frequencies w-z represent the four 'substitutions'
of one base by the same base, which are thus undetectable and are
disregarded.
     A    C    G    T
A    w    a    b    c
C    d    x    e    f
G    g    h    y    i
T    j    k    l    z
four are substitutions of one base with one that is identical and hence are
undetectable and disregarded (Table 5.3, w-z). Of the 12 observable changes,
different frequency values are allowed for each direction of change, such that
A → C is d and C → A is a, where d and a may be different. These values
can be calculated from observed frequencies and compared against expected
frequencies. The values can be used in a variety of ways, including
stepmatrices (see Chapter 4).
It is possible to combine within and across sequence weighting regimes, as
in dynamic weighting (Williams and Fitch 1990, Fitch and Ye 1990), which is
based, in part, on 'successive weighting' (see below). In successive weighting,
weights are assigned according to their levels of homoplasy on resulting
cladograms. Dynamic weighting applies the same strategy but also includes
information on the relative frequency of the observed character change. As
an example, Marshall (1992) used dynamic weighting to re-analyse small
subunit (SSU) rRNA sequences for amniotes. In the original analysis, Hedges
et al. (1990) found support for a sister-group relationship between birds and
mammals to the exclusion of crocodiles (Fig. 5.6b; a solution favoured by
some morphologists, e.g. Gardiner 1993). However, inspection of the data
revealed considerable substitution bias. In particular, there was an
over-representation of T-C substitutions and a significant under-representation of
A-T and T-A substitutions. Marshall noted that a large number of sites with
T-C substitutions supported the bird-mammal relationship. Using dynamic
weighting, Marshall discovered instead that the data supported a
bird-crocodile sister-relationship (Fig. 5.6c, differing slightly from the traditional
'palaeontological' cladogram, Fig. 5.6a). These results may only attain
significance with respect to a wider analysis of all molecular and morphological
(including palaeontological) data (as in the comprehensive study by
Eernisse and Kluge 1993, which suggested little overall support for the
bird-mammal relationship).
It is often taken as fact that it is more reasonable to apply a priori
weighting schemes to sequence data. Many, if not all, such weighting schemes
are conceived in terms of kinds of substitution. However, the notion of
substitution (a process) is itself derived from the data. Comparison of two (or
Fig. 5.6 Three cladograms depicting possible relationships among amniotes. (a) The
palaeontological cladogram. (b) The cladogram derived from the unweighted 18S
rRNA data of Hedges et al. (1990). (c) The cladogram derived from the weighted 18S
rRNA data of Marshall (1992).
more) sequences will either reveal identity or not at particular sites. The
difference is empirical (derived from a comparison). Calling the differences
between two sites 'substitutions' may simply be a label based upon presumed
understanding of the 'cause' of the difference. Yet the actual difference and
its cause can be separated. In this sense, weighting is
arrived at by observation of differences not their 'cause'. This may seem to be
a minor semantic point, but such arguments relate to the larger issue of
homology, its discovery and its 'cause'. In the cladistic view, the 'cause' of
homology is irrelevant (and unnecessary) to its discovery, which is the result
of analysis and character congruence. Bearing this in mind, with sequence
data one may view the kind of differences as empirical observations to be
seen and tested in the light of subsequent analysis.
Compatibility analysis
Compatibility was first suggested as a method of analysis rather than a
method of weighting. As such, compatibility was not favourably received and
is little used these days. However, several authors have suggested that when
used as a weighting scheme, it may be of some value (Penny and Hendy 1986,
Sharkey 1994). Some of these ideas are reviewed briefly below.
Sharkey (1994) presented a method of character weighting that used
compatibility, although similar approaches had been presented previously by
Penny and Hendy (1986) and Sharkey (1989) (but see Wilkinson 1994).
However, Sharkey had some misgivings about these earlier attempts, hence
we will consider only his most recent method.
Sharkey (1994) began his discussion by citing Farris' (1971) distinction
between 'congruent' and 'compatible' characters. Sharkey's interpretation
suggested that congruent characters correlate with respect to particular
phylogenetic hypotheses, while compatible characters are correlated with
each other in the data set. Thus, congruent characters are determined after a
cladogram has been constructed (a posteriori) and compatible characters are
judged prior to cladogram construction (a priori). Sharkey devised a simple
example to demonstrate the use of compatibility analysis (Table 5.4), in which
13 binary characters are scored for eight taxa (and an all-zero root). Characters
1-12 are all perfectly compatible with each other. Character 13 is
incompatible with every other character (except character 1 which is uninformative)
and therefore should be a priori considered the weakest in the data
Fig. 5.7 The two equally most parsimonious cladograms derived from the data in
Table 5.4. (a) On this cladogram, character 2 fits perfectly with a single step, while
character 13 fits with four steps. (b) In contrast, on this cladogram, character 13 fits
with only three steps, although character 2 now fits less than perfectly with two
steps.
set. Parsimony analysis yields two cladograms (Fig. 5.7). The first cladogram
(Fig. 5.7a) accounts for character 13 with four occurrences (branches leading
to taxa B, D, G, and H). This is its worst possible performance on any
cladogram. The second cladogram (Fig. 5.7b) accounts for character 13 with
three occurrences (branches leading to taxon G + H, and branches leading to
B and D). Character 13 fits the second cladogram better than the first by one
step and provides additional support for the group G + H from character 3
(as a 'reversal'). Both cladograms are equally parsimonious. In this case, the
first cladogram (Fig. 5.7a) is preferred because more characters are compatible.
As Sharkey pointed out, this is not based upon any a posteriori consideration
of fit, but on the a priori consideration of compatible characters 1-12. It
is worth noting that the first cladogram would also be selected by successive
weighting (see below).
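The a priori judgement of compatibility described above can be sketched as a pairwise test. This is a minimal illustration, assuming binary characters polarized by an all-zero root, so that two characters are compatible when their sets of derived taxa are nested or mutually exclusive. The data are hypothetical (not those of Table 5.4) and the function names are illustrative.

```python
from itertools import combinations

def derived_set(column, taxa):
    """Taxa scored 1 for a binary character (root assumed all-zero)."""
    return {t for t, state in zip(taxa, column) if state == 1}

def compatible(col_a, col_b, taxa):
    """Two rooted binary characters are compatible if their derived
    sets are nested or mutually exclusive."""
    a, b = derived_set(col_a, taxa), derived_set(col_b, taxa)
    return a <= b or b <= a or not (a & b)

def incompatibility_counts(columns, taxa):
    """Number of other characters with which each character conflicts."""
    counts = [0] * len(columns)
    for i, j in combinations(range(len(columns)), 2):
        if not compatible(columns[i], columns[j], taxa):
            counts[i] += 1
            counts[j] += 1
    return counts

# Hypothetical data: four taxa, four characters stored as columns.
taxa = ["A", "B", "C", "D"]
columns = [
    [1, 1, 0, 0],  # groups A + B
    [1, 1, 1, 0],  # groups A + B + C (nested with the first)
    [0, 0, 0, 1],  # autapomorphy of D (compatible with everything)
    [0, 1, 1, 0],  # groups B + C, conflicting with the first character
]
print(incompatibility_counts(columns, taxa))
```

A character with a high count, like character 13 in Sharkey's example, would be the natural candidate for down-weighting under a compatibility-based scheme.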
To measure the relative amount of compatibility in any data set, Sharkey
proposed the unit discriminate compatibility measure (UDCM) of a character,
described as 'the complement of the probability of a derived character
state being nested with another derived character state or the probability of a
derived character state exclusive of another derived character state,
depending on the observed character comparison' (Sharkey 1994).
The UDCM is used to weight characters on the basis of their
overall compatibility. Sharkey suggested that once the weights had been
calculated they could be used in parsimony analyses. Sharkey's objective was
to assist in choosing among competing trees. However, the idea has not been
tested in detail and it remains to be seen whether, like compatibility for
cladogram construction, such weighting schemes end up discarding (by
down-weighting) many characters.
A                  0    0    0    1    1    1
B                  0    0    1    1    1    1
C                  0    1    1    1    0    1
D                  1    1    1    1    0    0
s                  1    1    1    1    2    2
m                  1    1    1    1    1    1
g                  1    2    2    1    2    2
ci                 1    1    1    1    0.5  0.5
ri                 1    1    1    1    0    0
sw (ri × ci)       10   10   10   10   0    0
iw (K/(K + ESi))   0    10   10   0    7.5  7.5
Implied weighting
With respect to homoplasy, extra steps and fitting characters to cladograms,
Fig. 5.8 Graphs depicting the three kinds of fitting function used to adjust for
relative cladistic consistency. (a) Linear. (b) Concave. (c) Convex. (After Goloboff
1993.)
Farris (1969) discussed three forms of fitting function that could be used to
adjust for relative cladistic consistency: linear, concave, and convex (Fig. 5.8).
For linear fit (Fig. 5.8a), the cladogram with the overall shortest length is
considered optimal. The problem with this approach is that the relative values
of the characters are ignored. Linear fitting is used when equal (uniform)
weighting is applied, that is, all steps count equally. It is therefore equivalent
to the 'default' option of most parsimony programs. One conclusion that can
be drawn from uniform weighting is that the reliability of the weights (and
hence the characters) is set prior to the analysis: they are all equal. Yet it is
clear from many, if not most, analyses that there will always be some
characters that behave well and others that behave poorly. The implication is
that they do not all contribute equal kinds of information.
For concave fit (Fig. 5.8b), a non-linear relationship reflects how well each
character performs on a relative scale. To return to the example given above,
which considered two binary characters (1 and 2) and two different but
equally parsimonious cladograms (X and Y), both characters differed by a
single step on the competing cladograms. However, character 1 has maximum
and minimum observed steps of 2 and 1 on cladograms Y and X respectively,
while character 2 has maximum and minimum observed steps of 15 and 14 on
cladograms X and Y respectively. Intuitively, character 1 should receive
greater weight than character 2. The proportional difference can be assessed
by use of 'extra steps'. For two cladograms on which a character has s1 and s2
steps respectively (with s1 having the larger value), this proportional difference
is given by (s1 - s2)/(s1 × s2). In this example, character 1 has a value of
0.5 and character 2 has a value of 0.005. In short, concave fit gives preference
to those characters with least homoplasy.
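The two values quoted above follow directly from the formula, as this small sketch shows (the function name is illustrative):

```python
def proportional_difference(s1, s2):
    """Difference in fit, (s1 - s2) / (s1 * s2), for a character that
    takes s1 steps on one cladogram and s2 on the other."""
    if s1 < s2:
        s1, s2 = s2, s1  # make s1 the larger value, as in the text
    return (s1 - s2) / (s1 * s2)

char1 = proportional_difference(2, 1)     # steps on cladograms Y and X
char2 = proportional_difference(15, 14)   # steps on cladograms X and Y
print(char1, round(char2, 3))
```

Both characters differ by one step, but the denominator grows with the amount of homoplasy, so the same one-step difference counts for far less in character 2.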
For the sake of completeness, Farris included a short discussion on convex
fit (Fig. 5.8c). This approach implies the opposite of concave fit, suggesting
that characters with greater homoplasy are to be preferred; clearly, this is
not a sensible option!
Goloboff (1993) exploited concave fit to determine character weights in his
computer program, PIWE. Here, weights are calculated as W = K/(K + ESi),
where ESi is the number of extra steps per character and K is the constant
of concavity (the inclusion of a measure of 'extra steps' makes Goloboff's
approach analogous to the discussion of Farris (1969)). Referring back to
Matrix 1 (Table 5.1a) and the cladogram in Fig. 5.1c, four characters (1-4) fit this
cladogram perfectly (Table 5.5). Characters 1 and 4 are uninformative and so
PIWE assigns them zero weight, in contrast to successive weighting which
assigns them maximum weight (in practice, of course, this makes no difference
to the analysis, merely adding to cladogram length). Characters 2 and 3
fit the cladogram perfectly, that is with no extra steps, and hence receive the
maximum weight of 10. Characters 5 and 6 fit the cladogram with two steps
and hence each character has one extra step. Thus, for K = 3, they receive
weights of 7.5. Two consequences can be seen immediately from this simple
example. All characters, unless completely uninformative, receive a non-zero
weight and this weight varies according to the value assigned to K. When K
is altered, the weights change. For characters 5 and 6, weights for values of
K = 1 to 6 are given in Table 5.6. Goloboff (1993) then used the total weight
of all characters, rather than cladogram length, to select the best cladogram.
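The weights in this example can be reproduced with a short sketch. It assumes that K/(K + ESi) is rescaled so that a perfect fit scores 10, which is inferred from the maximum weight of 10 and the value of 7.5 for K = 3 quoted above. The function names and the `scale` and `informative` parameters are illustrative and are not PIWE or Hennig86 options.

```python
def implied_weight(extra_steps, k, informative=True, scale=10):
    """Weight scale * K / (K + ESi); uninformative characters are given
    zero weight, as described for PIWE in the text."""
    if not informative:
        return 0.0
    return scale * k / (k + extra_steps)

def successive_weight(ci, ri, scale=10):
    """Successive weight taken as the rescaled product ri * ci."""
    return scale * ri * ci

# Characters 5 and 6 have one extra step each; weights for K = 1 to 6.
for k in range(1, 7):
    print(k, round(implied_weight(1, k), 2))
```

As K increases the weight of a homoplastic character approaches that of a perfectly fitting one, so small values of K penalize homoplasy most severely.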
5.2.4 Prospects
The mechanics and implementation of a posteriori weighting are described
above, but what of the results? Carpenter's general aim was to investigate a
rational way to reduce the number of equally parsimonious cladograms using
empirical criteria. However, results suggest that this general expectation need
not be the case, for either successive weighting or implied weighting. Two
significant results have emerged. First, a greater rather than a smaller
number of cladograms than were in the original set may be recovered after
weighting, and second, the cladograms found after weighting may not be
among the original equally most parsimonious suite (e.g. Platnick et al.
1991b).
With respect to more cladograms, it may be that of the original equally
weighted characters, few were cladistically consistent (or self-consistent in the
terminology of Goloboff 1993) and many received zero weight. The cladograms
derived from the equally weighted data thus depend upon a few
ambiguous characters. The significance of obtaining different topologies from
a posteriori weighted data is perhaps more controversial. It has been argued
that this result is not unexpected (Platnick et al. 1991b, Goloboff 1993), for,
in practical terms, weighting is equivalent to excluding some characters and
non-randomly replicating others. Consequently, a differentially weighted data
set need not give the same results as an equally weighted one. Others have
argued quite simply that longer cladograms should be disregarded as they
violate the basic premise of parsimony (Turner and Zandee 1995).
Consider one example. The elegant study of haplogyne spiders by Platnick
et al. (1991a) yielded ten equally most parsimonious cladograms using equal
weights (length = 184). After successive weighting (implemented by
Hennig86), this number was reduced to six (length 568; note this large
increase in length is to be expected, because, for example, a perfectly
consistent character now adds 10 to the length of the cladogram, rather than
the previous 1). None of the six weighted cladograms were among the original
ten. When the six weighted cladograms were inspected using the unweighted
matrix, two cladograms had a length of 185 (one step longer than the optimal
cladograms from equal weighting) and the other four had a length of 187.
Platnick et al. reasoned that of the six cladograms, the two with length 185
were worthy of further consideration and were better than the equally
weighted cladograms in spite of the fact that they are one step longer. They
argued that the characters contributing to the
topology could be considered more 'consistent' than those under equal
weights. This suggests a further conclusion relevant to weighting. Rather than
being a method to reduce the number of equally parsimonious cladograms,
Goloboff (1993) and Platnick et al. (1991a) have suggested that parsimony
analyses require weighting to achieve self-consistent results, even if only a
single most parsimonious cladogram is found using equal weights (this view
has been contested by Turner and Zandee 1995; with a reply by Goloboff
1995a). Platnick et al. (1996) have further suggested that equal weights can
only be considered a preliminary and crude estimate of the relative value of
the data. Such views are consistent with the general understanding of cladistic
parsimony, that the 'value' of a character is related to its performance on a
cladogram. Further, coupling this with a notion of support for each clade
leads to a firmer choice of particular cladograms, whether only one results
from analysis or many.
A posteriori character weighting holds a certain amount of promise and
may help to produce more consistent cladograms relative to the data collected.
Progress may result from performing analyses with different parameters
for weighting (as well as different values for obtaining weights) on more
data sets (e.g. Suter 1994). Whatever the outcome, it seems likely that the
view of Platnick et al. (1996), that equal weights can only be a preliminary
and crude estimate of any particular data set, is worthy of further
consideration.
11. Farris suggested three kinds of fitting function: linear, concave and
convex. Goloboff recommended concave fit (following Farris) and adopted
an approach that considers the direct weighting of 'extra steps', such that
the weight of a character is given by K/(K + ESi), where ESi is the
6.
Support and confidence statistics for
cladograms and groups
6.1 INTRODUCTION
Fig. 6.1 The basic concept underlying most tree support statistics. A data set of real
observations is repeatedly perturbed according to a particular set of rules to yield a
large number of pseudoreplicate sets of 'phylogenetically uninformative' data. The
length of the most parsimonious cladogram(s) derived from the real data set is then
compared with the lengths of the most parsimonious cladograms obtained from
these contrived data sets, with the expectation that the former (indicated by the
arrow) would be very much shorter than any of the latter (represented by the
frequency histogram).
"
C
0
101 101 1 010 101 101
011 011 1 001011011
000111 0111 000111
1 010 1 0 100
1 001 1 0010
0 111 1 0001
E 000 000 0 ODD 11 1 111 1 111 o 1111
Goloboff showed that the CI, RI and RC do not vary directly with decisiveness
and hence the least decisive matrices are not necessarily those with the
lowest values for these statistics. Nevertheless, a general measure of decisiveness
is possible.
Data decisiveness (DD) is defined as:
$$DD = \frac{\bar{S} - S}{\bar{S} - M}$$
given data set as an indicator of phylogenetic signal in the data. They have
argued that for a DCL that is nearly symmetrical, many cladograms will be
only a few steps longer than the most parsimonious cladogram and the
phylogenetic signal is weak. However, if the DCL is strongly negatively or
left-skewed (i.e. with a long tail to the left side of the distribution), then there
are relatively few cladograms that are just slightly longer than the most
parsimonious solution and the phylogenetic signal is therefore strong.
One significance test proposed by Hillis (1991) is based upon the null
model in which characters are generated independently and at random, and
all states have the same expected frequency. We would then conclude there is
significant cladistic structure, and that the characters are highly congruent,
when the skewness statistic, g1, for the real DCL is below the fifth percentile
(for example) of the DCL g1 derived from matrices of randomly generated
data. However, this effect can also result from the characters simply having
different frequencies among the taxa. A character that divides the taxa into
two groups of similar size tends to make the DCL more symmetrical.
Conversely, characters that allow the recognition of small groups of taxa tend
to increase left-skewness. The significance test outlined above confounds this
effect with that due to character congruence.
Huelsenbeck (1991a) compared the DCLs generated from real and random
data using g1. Using simulation tests, he found that data that were consistent
with only a single most parsimonious cladogram tended to produce a strongly
left-skewed DCL. In contrast, data that were consistent with numerous
equally most parsimonious cladograms produced a more symmetrical DCL
that could not be distinguished from those generated from random data.
However, a hierarchical pattern will be recovered whenever there is character
congruence, whatever its source, and g1 is generally negative even for
randomly generated data because of chance congruence within such data.
Thus, the test is whether the observed skewness in DCL is no less negative
than would be expected from random data. If it is, then the conclusion is that
there is significant phylogenetic (hierarchical) signal in the data. If it is not,
then one would conclude that the observed character congruence is largely
due to chance and the most parsimonious cladogram is a poor estimator of
phylogeny.
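The skewness comparison can be sketched as follows. This assumes the usual moment definition of g1 (third central moment divided by the 1.5 power of the second), which the text does not spell out. The function names, the example lengths and the percentile argument are hypothetical.

```python
def g1(lengths):
    """Skewness statistic g1 = m3 / m2**1.5 of a list of cladogram lengths."""
    n = len(lengths)
    mean = sum(lengths) / n
    m2 = sum((x - mean) ** 2 for x in lengths) / n
    m3 = sum((x - mean) ** 3 for x in lengths) / n
    if m2 == 0:
        return 0.0  # all cladograms the same length: no skew
    return m3 / m2 ** 1.5

def more_skewed_than_random(real_g1, random_g1s, percentile=0.05):
    """True if the real g1 falls below the given percentile of the g1
    values obtained from randomly generated matrices."""
    below = sum(1 for g in random_g1s if g <= real_g1)
    return below / len(random_g1s) < percentile

# Hypothetical cladogram lengths: a long tail to the left of the mode.
left_skewed = [8, 12, 13, 13, 14, 14, 14]
symmetrical = [10, 11, 12, 13, 14]
print(g1(left_skewed) < 0, g1(symmetrical))
```

In practice the list passed to `g1` would hold the lengths of all (or a large sample of) possible cladograms for the data set, and `random_g1s` would come from repeating that calculation on randomly generated matrices.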
Huelsenbeck's simulations were based on a model in which the probabilities
of character change were equal along all branches of the cladogram. The
accuracy of the most parsimonious cladogram resulting from these simulations
is determined by the probability of character change along its branches.
If the probability is very low, then most characters will be invariant or
autapomorphic. If it is too great, then the distribution of the characters
effectively becomes independent of the cladogram. Skewness is affected
similarly. When characters are invariant or autapomorphic, skewness is 0.
Skewness is weak when the probability of change is large and the expected
frequencies of all states are equal, as in Hillis' null model. Skewness is
Fig. 6.2 PTP analysis. (a) The most parsimonious cladogram for the original data set
is determined and its length recorded. (b) The states of each character are then
permuted among the taxa, while maintaining the proportions of each state, to
produce a new data set. This data set is then analysed and the length of its most
parsimonious cladogram recorded. (c) This procedure is then repeated a large
number of times and the PTP is defined as the proportion of all data sets (permuted
plus original) that yield cladograms equal to or shorter than those produced from
the original data set. The null hypothesis that there is no cladistic structure beyond
that due to chance would be rejected at the 5% level if the most parsimonious
cladograms of fewer than 5 of the 100 data sets were as short or shorter than those
derived from the unpermuted data (PTP ≤ 0.05).
that the PTP be arbitrarily recorded as 0.01. Thus a low PTP is desirable and
a value of less than 0.05 could be taken to imply the presence of significant
cladistic covariation or structure in the original data.
However, because the PTP test is based upon the determination of the
lengths of most parsimonious cladograms, there is a practical problem with its
application. For smaller data sets, it may be possible to apply exact methods
of cladogram construction and thus obtain precise estimates of the PTP value
in a reasonable computational time. However, this is not possible for larger
data sets, which must be analysed using heuristic methods that are not
guaranteed to find the minimum length solutions. Furthermore, the significance
level, α, of a PTP test cannot be less than 1/(W + 1), where W is the
number of permutations. For example, if a more stringent α value of 0.01 is
required, then 99 permutations must be undertaken, which may result in
excessively long computational times. Faith and Cranston noted this drawback
and suggested that the PTP could perhaps be estimated by applying the
same heuristic procedure to both types of data set, i.e. original and permuted.
Alternatively, it might be sufficient simply to compare the results from an
exhaustive search of the real data with those of heuristic searches of the
permuted data. However, Källersjö et al. (1992) noted that, because such
approximate values would always exceed the corresponding exact values, this
apparently conservative latter procedure would actually increase the apparent
difference between the real and randomized data sets and worsen the risk of
a false conclusion of significant congruence.
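For a very small data set the whole PTP procedure can be sketched with an exact search. The code below is an illustration under stated assumptions, not the implementation of Faith and Cranston: it enumerates every rooted binary tree, counts steps with Fitch optimization for unordered characters, and permutes states among all taxa including the outgroup. The matrix and all function names are hypothetical.

```python
import random

def insert_leaf(leaf, tree):
    """Yield every rooted tree made by attaching `leaf` to one branch of `tree`."""
    yield (leaf, tree)
    if isinstance(tree, tuple):
        left, right = tree
        for sub in insert_leaf(leaf, left):
            yield (sub, right)
        for sub in insert_leaf(leaf, right):
            yield (left, sub)

def all_trees(taxa):
    """Yield every rooted binary tree on `taxa` as nested 2-tuples."""
    if len(taxa) == 1:
        yield taxa[0]
        return
    for tree in all_trees(taxa[1:]):
        yield from insert_leaf(taxa[0], tree)

def fitch(tree, states):
    """Return (state set, steps) for one unordered character on `tree`."""
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    (ls, ln), (rs, rn) = fitch(tree[0], states), fitch(tree[1], states)
    if ls & rs:
        return ls & rs, ln + rn
    return ls | rs, ln + rn + 1

def min_length(matrix):
    """Exact minimum length by exhaustive search (small data sets only)."""
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    columns = [{t: matrix[t][c] for t in taxa} for c in range(n_chars)]
    return min(sum(fitch(tree, col)[1] for col in columns)
               for tree in all_trees(taxa))

def ptp(matrix, n_perm=99, seed=1):
    """Proportion of data sets (permuted plus original) whose shortest
    cladogram is as short as or shorter than that of the original."""
    rng = random.Random(seed)
    taxa = list(matrix)
    n_chars = len(matrix[taxa[0]])
    observed = min_length(matrix)
    hits = 1  # the original data set is counted
    for _ in range(n_perm):
        permuted = {t: [] for t in taxa}
        for c in range(n_chars):
            column = [matrix[t][c] for t in taxa]
            rng.shuffle(column)  # keeps the proportion of each state
            for t, state in zip(taxa, column):
                permuted[t].append(state)
        if min_length(permuted) <= observed:
            hits += 1
    return hits / (n_perm + 1)

# Hypothetical five-taxon matrix (rows are taxa, columns are characters).
matrix = {
    "X": [0, 0, 0, 0, 0, 0],
    "A": [0, 0, 0, 1, 1, 1],
    "B": [0, 0, 1, 1, 1, 1],
    "C": [0, 1, 1, 1, 0, 1],
    "D": [1, 1, 1, 1, 0, 0],
}
print(min_length(matrix), ptp(matrix))
```

Because the original data set is always counted, the returned value can never fall below 1/(n_perm + 1), which is the lower bound on α noted above. Exhaustive enumeration grows too quickly to be usable beyond a handful of taxa, which is the practical problem the text describes.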
Instead, they suggested using the single pass 'hennig' command of the
Hennig86 program (Farris 1988) to estimate very quickly the length of the
most parsimonious cladogram. This approximate length seldom differs by
more than a few per cent from that of the most parsimonious cladogram
calculated by an exact method. The test would then use only the number of
permutations in which the lengths of the estimated minimum length cladograms
exceeded that for the real data. If this procedure is applied to both
types of data and repeated many times (say 10000), then the approximation
differences would be unlikely to have much effect on that number. The
successful results of such an application should be similar to the histogram
depicted in Fig. 6.1. The bar furthest to the left represents the length of the
estimated minimum length cladogram obtained from the real data. This is
well separated from the rest of the distribution and thus there is little chance
that the use of approximate methods will lead to an erroneous conclusion.
In addition to this practical problem, the validity of the PTP test has also
been questioned from a theoretical viewpoint. Bryant (1992) argued that the
null hypothesis of randomly covarying characters is contrary to the very basis
of cladistics. Every character is a putative synapomorphy or homology
statement, from which it follows axiomatically that cladistic characters will covary
hierarchically. Characters are thus intrinsically hierarchical and cladistic
analysis simply summarizes the total implied hierarchy in the data set as
efficiently as possible. It is this expected covariation among characters that is
the assumption that justifies the search for the most parsimonious cladogram.
Furthermore, because [...] in the data, it must follow that
[Figure 6.3 (image not reproduced): a data matrix of characters for taxa A to E
and three cladogram panels for taxa A to E; one panel is labelled 'Consensus of
3 cladograms, length 15'.]
Fig. 6.3 Bremer support. (a) The most parsimonious cladogram is determined for a
data set. (b) All cladograms that are one step longer than the most parsimonious are
then determined. The strict consensus tree of these plus the most parsimonious
cladogram is constructed to determine those groups that no longer receive
unambiguous support. (c) This process is repeated, increasing the length of the suboptimal
cladograms by one step each time, until all groups are lost. The Bremer support for a
group is the number of steps that have to be added before that group is no longer
recovered on the strict consensus tree of optimal and suboptimal cladograms.
as the ratio of total support to the length of the most parsimonious
cladogram. When the strict consensus tree of the most parsimonious cladograms is
completely unresolved, then all groups have zero Bremer support and ti = 0.
In contrast, if there is no homoplasy in the data, and hence only a single most
parsimonious cladogram that is fully supported by all characters, then ti = 1.
The total support index measures the stability of the most parsimonious
cladogram(s) in terms of supported resolution, rather than in terms of the
degree of homoplasy in the data. However, a low ti value should not be taken
to imply that all the clades on a cladogram are poorly supported. Some clades
may have high Bremer support values, despite low total support.
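As a rough illustration of the ratio described above, the sketch below computes ti from a set of Bremer support values. It assumes that total support is the sum of the Bremer support values of all groups on the most parsimonious cladogram; the group names, support values and cladogram length are hypothetical.

```python
def total_support_index(bremer_support, mpc_length):
    """ti = total support / length of the most parsimonious cladogram.

    bremer_support: dict mapping each group to its Bremer support
    (assumed to be summed to give total support).
    mpc_length: length of the most parsimonious cladogram, in steps.
    """
    total_support = sum(bremer_support.values())
    return total_support / mpc_length


if __name__ == "__main__":
    # Hypothetical values: one well supported clade on a weakly supported tree.
    support = {"CD": 3, "BCD": 1, "ABCD": 0}
    print(total_support_index(support, 16))  # 0.25
```

In this hypothetical case ti is low, yet the group CD still has a Bremer support of 3, which is the situation the final sentences above warn about.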
Bootstrap
Applied to cladistic data, the bootstrap (Fig. 6.4) randomly samples characters
with replacement to form a pseudoreplicate data set of the same dimensions
as the original. The effect is to delete some characters randomly and to
reweight others randomly, with the constraint that the sum of the weights for
all characters equals the number of characters in the matrix. A large number
of pseudoreplicates is generated, typically 1000 or more. The most
parsimonious cladograms for each pseudoreplicate are then found and the degree of
conflict among them assessed by means of a majority rule consensus tree,
which includes all those groupings that are supported by more than 50% of
the pseudoreplicates (see Chapter 7 for further details). The percentage of
most parsimonious cladograms resulting from the pseudoreplicates in which a
particular group is found might be interpreted as a confidence level
associated with that group. For example, if a group appears in 95% or more of
these cladograms, then it could be concluded that this group is supported at
the 95% level.
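The resampling step, and the way group percentages are tallied, can be sketched as below. This is only an illustration under stated assumptions: the matrix is the ten-character example for taxa A to E as read from Fig. 6.4, the parsimony search itself is left out, and each cladogram is represented simply as a set of groups (frozensets of taxa). All function names are hypothetical.

```python
import random
from collections import Counter

# Ten-character example matrix for taxa A to E (as read from Fig. 6.4).
MATRIX = ["0000010010", "1100000000", "1111111201", "1111011201", "1100101110"]


def bootstrap_pseudoreplicate(matrix, rng):
    """Sample character indices with replacement and build a pseudoreplicate
    with the same dimensions as the original matrix."""
    n_chars = len(matrix[0])
    sampled = [rng.randrange(n_chars) for _ in range(n_chars)]
    pseudo = ["".join(row[i] for i in sampled) for row in matrix]
    return sampled, pseudo


def character_weights(sampled, n_chars):
    """Implied weight of each original character: 0 means deleted,
    2 or more means upweighted. The weights sum to n_chars."""
    counts = Counter(sampled)
    return [counts.get(i, 0) for i in range(n_chars)]


def group_frequencies(cladograms):
    """Percentage of the supplied cladograms in which each group occurs."""
    counts = Counter(group for tree in cladograms for group in tree)
    return {group: 100.0 * n / len(cladograms) for group, n in counts.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    sampled, pseudo = bootstrap_pseudoreplicate(MATRIX, rng)
    weights = character_weights(sampled, len(MATRIX[0]))
    print(weights, sum(weights))
```

In a real analysis the cladograms passed to `group_frequencies` would come from a parsimony search of each pseudoreplicate, which is not attempted here.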
However, there are several serious limitations to this use of the bootstrap.
First, for such confidence limits to be valid, the groups for which the
monophyly is to be tested should be specified in advance. If we cannot specify
any such groups, then the number of potential groups is so large that in order
to maintain an overall type I error rate of, say, 0.05, such an exceedingly low α
level would be required that the confidence interval would be vastly
[Figure 6.4 (image not reproduced; the data matrices are reconstructed from the
figure, the cladograms are not recoverable).]

(a) Original data set

            Characters
            1  2  3  4  5  6  7  8  9  10
Taxon A     0  0  0  0  0  1  0  0  1  0
Taxon B     1  1  0  0  0  0  0  0  0  0
Taxon C     1  1  1  1  1  1  1  2  0  1
Taxon D     1  1  1  1  0  1  1  2  0  1
Taxon E     1  1  0  0  1  0  1  1  1  0

(b) Pseudoreplicate data set

            Characters
            8  9  4  8  5  6  4  1  2  10
Taxon A     0  1  0  0  0  1  0  0  0  0
Taxon B     0  0  0  0  0  0  0  1  1  0
Taxon C     2  0  1  2  1  1  1  1  1  1
Taxon D     2  0  1  2  0  1  1  1  1  1
Taxon E     1  1  0  1  1  0  0  1  1  0

(c) Majority-rule consensus tree for taxa A to E (percentages not recoverable).

Fig. 6.4 Bootstrap analysis. (a) The most parsimonious cladogram is determined for
the original data set. (b) Characters are then sampled randomly with replacement to
produce a pseudoreplicate data set of the same size as the original. Some characters
(e.g. #8) will be represented more than once, while others (e.g. #3) will not be
included at all. The most parsimonious cladograms for the pseudoreplicate data set
are then constructed. (c) This process is repeated a large number of times (e.g. 1000)
and the results summarized by means of a majority-rule consensus tree. Support for
a group is then interpreted as the percentage of most parsimonious cladograms
resulting from the pseudoreplicates in which the group is found.
inclusive (Swofford and Olsen 1990). Second, the confidence intervals
obtained through resampling methods are only approximate unless the original
sample size, that is, the number of characters in the data matrix, is large. This
is 'large' in the statistical sense (more than 1000 and preferably 10000) and
most data sets do not begin to approach this number of characters. Even
molecular data, which can comprise several thousand base pairs, do not
contain this number of informative sites. It has been argued that
bootstrapping is not affected by the inclusion of uninformative characters (Harshman
1994) but this has been shown to be incorrect (Carpenter 1996). Thus, few, if
any, data sets meet the statistical requirements of the bootstrap.
An alternative view of the bootstrap is that it indicates how the support for
the various groups on the most parsimonious cladogram is distributed among
characters. The expectation is that clades supported by a large number of
characters will be recovered frequently and receive high scores on a majority
rule consensus tree. In contrast, clades that are supported by only a few
characters, especially if these are homoplastic, are not expected to be
recovered very often, if at all. However, a clade may be unambiguously
supported by a single character on the most parsimonious cladogram yet fail
to be recovered by a bootstrap analysis, due to the random nature of the
resampling. Thus, groups can be excluded from the majority rule consensus
tree even though they are uncontradicted on the most parsimonious
cladogram. The bootstrap thus provides only a one-sided test of a cladogram.
Groups that are recovered are supported by the data, but groups that are not
recovered cannot be taken as rejected.
There is, however, a more serious and fundamental problem with the
bootstrap. This is the requirement that the characters in the original data
matrix should represent a random sample of all possible characters. However,
in systematic studies, characters are not randomly sampled from independent,
identically distributed populations. Rather, they are carefully selected and
filtered with the aim of best resolving the relationships of the taxa under
study. Such systematic bias is not considered to be a problem by advocates of
the bootstrap, who assert that attempts by systematists to try to ensure that
their characters are independent and uncorrelated are sufficient. However,
subjectively trying to ensure that characters are independent of one another
is simply inadequate. Unless the characters in a data set do accurately reflect
the larger underlying distribution of all possible characters, then bootstrap
confidence intervals may be very poorly estimated. There are also many other
factors that might lead to either overestimates or underestimates of
confidence, including size of the data set, efficiency of any heuristic search
procedures that are employed, cladogram topology, and differential and
uneven rates of character change among branches.
Thus, at best, the effects of these limitations mean that we would be unwise
to treat bootstrap confidence intervals as absolutes, although they may serve
as approximate guides to the support afforded to groups by the data. At
worst, the application of the bootstrap in cladistic studies can be considered
to lack rigorous justification.
Jackknife
In contrast to the bootstrap, jackknife sampling is applied without
replacement and hence the pseudoreplicate data sets are smaller than the original.
Jackknifing aims to achieve better variance estimates than might otherwise be
possible from small samples. In first-order jackknifing, pseudoreplicates are
constructed by randomly deleting one observation (taxon or character) from
the data set. Hence, for T observations, T pseudoreplicates are
[...] original sample. The
variances of the T pseudoreplicates are then averaged to give the estimate of
the parametric variance.
First-order jackknifing of taxa was introduced into systematics by Lanyon
(1985). If a data set contains no homoplasy, then deletion of one taxon will
have no effect on the topology of the most parsimonious cladogram. This is
because the information contained in the synapomorphies of that taxon is
inherent in those taxa that occupy more distal positions on the cladogram.
The most parsimonious cladogram obtained from analysis of a sample of
T - 1 taxa will thus be identical to that obtained from analysis of the
complete data set, but with the terminal branch leading to the deleted taxon
pruned out. However, if there is conflict in the data, then analysis of
jackknifed data sets need not produce the same topology as that derived from
the complete data set. Any conflict is revealed by summarizing all the most
parsimonious cladograms derived from all possible first-order
pseudoreplicates by means of a strict consensus tree (that
is, a consensus tree that contains only those components common to all of the
fundamental cladograms; see Chapter 7 for further explanation). However,
the normal strict consensus method discards those taxa not present in all the
fundamental cladograms and thus applying this procedure to cladograms
produced from jackknife pseudoreplicates would produce a consensus tree
that contained only the outgroup. Lanyon (1985) proposed a modification to
allow for the deleted taxa, called the jackknife strict consensus tree, which
contains those nodes that are shared by or consistent with all of the
pseudoreplicate cladograms.
However, Lanyon's advocacy of a strict consensus tree suffers from the
drawback that strict consensus is indifferent to the proportion of most
parsimonious cladograms in which a clade is supported. Thus, it takes only
one cladogram from one pseudoreplicate that wildly disagrees with the
remainder in the position of one taxon to collapse the entire consensus into a
bush. To circumvent this problem, Siddall (1996) proposed the jackknife
monophyly index (JMI), which is defined as:

    JMI_c = \frac{\sum_{t=1}^{T} p(c_t)}{T}

where T is the number of ingroup taxa and p(c_t) is the proportion of the
most parsimonious cladograms of pseudoreplicate t in which clade c is
supported. Because the JMI is monophyly-dependent (i.e. it is calculated
using rooted cladograms), it is inappropriate to jackknife the outgroup taxa
(analogous arguments have been put forward with regard to the PTP test).
Siddall advised against using JMI values as the basis for accepting or rejecting
individual clades. While the JMI might, in some sense, indicate the amount of
statistical support there is for a clade, it certainly cannot be used to argue the
converse, that is, the degree of support against a clade. The JMI does show
which groupings on a cladogram are more stable and which are less stable.
The calculations can also help to identify 'critical' and 'problematic' taxa.
Critical taxa are those whose deletion results in a great increase in the
number of most parsimonious cladograms. In contrast, the deletion of
problematic taxa stabilizes the results and reduces the number of most
parsimonious cladograms.
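The averaging that defines the JMI can be sketched as below. This is a minimal illustration rather than Siddall's implementation. It assumes that each cladogram is represented as a set of groups (frozensets of taxa), that in the pseudoreplicate from which taxon t has been deleted the clade is sought without t, and that the clade has at least three members so that it never shrinks to a single taxon. The example trees are hypothetical.

```python
def jackknife_monophyly_index(clade, trees_by_deleted_taxon):
    """Mean, over first-order taxon pseudoreplicates, of the proportion of
    most parsimonious cladograms in which the clade is supported.

    clade: iterable of taxon names (at least three, by assumption).
    trees_by_deleted_taxon: dict mapping each deleted ingroup taxon to the
    list of most parsimonious cladograms found without it; each cladogram
    is a set of frozensets.
    """
    clade = frozenset(clade)
    if len(clade) < 3:
        raise ValueError("this sketch only handles clades of three or more taxa")
    proportions = []
    for deleted, trees in trees_by_deleted_taxon.items():
        reduced = clade - {deleted}          # the clade as it can appear without t
        supporting = sum(1 for tree in trees if reduced in tree)
        proportions.append(supporting / len(trees))
    return sum(proportions) / len(proportions)


if __name__ == "__main__":
    fs = frozenset
    trees = {
        "A": [{fs("BCD"), fs("CD")}],        # p = 1.0
        "B": [{fs("CD")}, {fs("AC")}],       # p = 0.5
        "C": [{fs("BD")}],                   # p = 1.0
        "D": [{fs("AB")}],                   # p = 0.0
    }
    print(jackknife_monophyly_index("BCD", trees))  # 0.625
```

The same loop also exposes the counts of most parsimonious cladograms per pseudoreplicate, which is the information needed to flag 'critical' and 'problematic' taxa.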
Higher-order jackknifing removes subsets of n observations to give
pseudoreplicates of size T - n. As there is no justification for stopping at a
particular subset size, the removal of all possible subset sizes from 2 to
(T - 1) should be investigated. However, the number of possible ways in
which any number of taxa (up to T - 1) can be removed from T taxa is
2^T - 2. For more than about ten taxa, this results in an impractically large
number of analyses to perform. Random sampling of all possible
combinations might provide a suitable heuristic solution, which could be performed by
randomly choosing a subset size and then randomly deleting this number of
taxa. This procedure is identical to bootstrapping taxa, because this latter
method randomly deletes some taxa and randomly replicates others. As the
latter are duplicates identical in character information, they would act as one
terminal taxon in a parsimony analysis. The overall effect would thus be the
same as randomly deleting a random number of taxa, i.e. a higher-order
jackknife. Siddall (1996) rather irreverently called this method the 'jackboot'.
Application of the jackboot would only be appropriate if the observations
being jackknifed could be assumed to be drawn randomly from some larger
sample, an assumption that might prove difficult to justify for taxa.
Furthermore, the effects of higher-order subset deletions might be expected to be
more severe than those of lower-order subsets. How these effects would be
weighted differentially remains unclear.
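The count of 2^T - 2 follows from summing the binomial coefficients for every deletion size from 1 to T - 1, and the random-sampling heuristic is simple to state in code. The sketch below is illustrative only; the function names are hypothetical, and choosing the subset size uniformly is an assumption about how 'randomly choosing a subset size' might be done.

```python
import random
from math import comb


def n_deletion_subsets(t):
    """Number of ways of deleting between 1 and t - 1 of t taxa."""
    return sum(comb(t, n) for n in range(1, t))


def random_deletion(taxa, rng):
    """Pick a deletion size at random (assumed uniform on 1..T-1), then
    delete that many randomly chosen taxa; returns the retained taxa."""
    n = rng.randint(1, len(taxa) - 1)
    deleted = set(rng.sample(taxa, n))
    return [taxon for taxon in taxa if taxon not in deleted]


if __name__ == "__main__":
    for t in (5, 10, 20):
        print(t, n_deletion_subsets(t), 2 ** t - 2)
    print(random_deletion(list("ABCDEFGHIJ"), random.Random(1)))
```

For ten taxa there are already 1022 possible deletion sets, and the number doubles (approximately) with each additional taxon.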
First-order jackknifing of characters was first applied in a systematic
context by Mueller & Ayala (1982), who advocated it for estimating the
sampling variance of Nei's genetic distance. However, first-order jackknifing
is of limited use when applied to cladistic characters because it amounts to
little more than asking if there is more than one apomorphy supporting a
clade. In a homoplasy-free cladogram, all clades with only a single supporting
apomorphy would not be recovered, while all other clades would receive a
perfect score regardless of the number of apomorphies supporting them.
Likewise, higher-order jackknifing simply extends this problem and
ultimately, only those clades supported by at least as many characters as there
are taxa are guaranteed to be recovered.
6.4 SUMMARY
7.1 INTRODUCTION
consensus trees. This method uses a metric that measures topological
disagreement between any pair of cladograms (Barthelemy and Monjardet 1981,
Barthelemy and McMorris 1986). Nelson consensus trees comprise the cliques
of mutually compatible components that are most replicated in the
fundamental cladograms. Adams trees contain all intersecting sets of taxa common
to all cladograms. In the greatest agreement subtree method, the least
number of branches are 'pruned' from the fundamental cladograms to
produce the largest subtree with greatest agreement. In the trivial
comparison of two cladograms, strict and majority-rule consensus trees are identical,
as are combinable components and Nelson consensus trees.
Nixon and Carpenter (1996b) argued that if the goal of consensus analysis
is to summarize agreement in grouping among a set of fundamental
cladograms, then only the strict consensus tree fulfils that goal. All the other
methods listed above may yield trees with groups that are not fully supported
by all the data or are supported only ambiguously. These Nixon and
Carpenter called 'compromise trees', reserving the term consensus tree for
the strict consensus method only.
However, while we recognize the fundamental distinction between strict
consensus and other methods, we consider that there are equally
fundamental differences among the other methods, and that the dichotomy proposed by
Nixon and Carpenter is unnecessarily doctrinaire. We therefore retain the
term consensus for all methods that aim to summarize the common
information contained in a set of cladograms according to some specific criterion.
Different consensus methods are suited to different tasks, although the
literature is bewildering in that methods are frequently applied
inappropriately and the terminology has become somewhat confused (Nixon and
Carpenter 1996). Strict consensus is useful for determining common
components of all fundamental cladograms, while combinable components
consensus highlights resolved components amongst a profile of cladograms some of
which contain unresolved components. Majority-rule consensus is most
frequently used to summarize the results of bootstrap analyses (see Chapter 6).
Adams consensus trees are used mostly to determine the degree of preserved
structure in cladograms. They have their greatest value in comparing
seemingly different topologies due to the erratic performance of 'rogue' taxa that
appear in widely different positions on cladograms. Largest common pruned
trees can be useful for determining incongruence in cladograms when only a
few taxa are responsible for the different topologies.
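[Figure 7.1 (image not reproduced): three cladograms of taxa A to I, with the
terminal taxa in the orders ABCDEFGHI, ACBDEFGHI and ACDEFBGHI; internal nodes
are numbered.]
Fig. 7.1 Cladograms of (a) butterflies, (b) birds and (c) bats, calculated using
COMPONENT 2.0 for Windows. (From Bremer 1990.) Numbers refer to components
in Table 7.1.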
rival cladograms. These were also called 'Nelson consensus trees' by Schuh
and Farris (1981). As Nelson's (1979) consensus method of adding together
replicating and non-replicating components is different from strict consensus,
the distinction made by Page (1989) is maintained here (see below).
A strict consensus tree is derived by combining only those components that
appear in all members of a set of fundamental cladograms. Consider three
cladograms for butterflies, birds and bats (Fig. 7.1). In a fully resolved
cladogram, there are n - 2 informative components. Thus, for a cladogram of
9 taxa, there are 7 informative components. However, among the three
cladograms in Fig. 7.1, we actually find 12 different informative components
(Table 7.1), indicating that there is conflict among the cladograms. Of these
12, only component 10, comprising the group GHI, occurs in all three
fundamental cladograms. The strict consensus tree thus contains only this
component (Fig. 7.2a). The rationale behind strict consensus is that the data
are only consistent for this one component. In this particular example, the
conservativeness of strict consensus means that one is left with depressingly
little resolution, although this is not always the case.
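The rule for strict consensus reduces to a set intersection once each cladogram is written as its set of informative components. The sketch below uses three hypothetical five-taxon cladograms (not the butterfly, bird and bat trees of Fig. 7.1, whose full topologies are not given in the text); the representation of a component as a frozenset of taxa is an assumption of the example.

```python
def strict_consensus(cladograms):
    """Components present in every fundamental cladogram.
    Each cladogram is a set of components (frozensets of taxa)."""
    return set.intersection(*cladograms)


def distinct_components(cladograms):
    """All different components found across the fundamental cladograms."""
    return set.union(*cladograms)


if __name__ == "__main__":
    fs = frozenset
    # Hypothetical rooted cladograms of taxa A-E, each fully resolved,
    # so each has n - 2 = 3 informative components.
    t1 = {fs("BCDE"), fs("CDE"), fs("DE")}   # A(B(C(D,E)))
    t2 = {fs("BCDE"), fs("CDE"), fs("CE")}   # A(B(D(C,E)))
    t3 = {fs("BCDE"), fs("BDE"), fs("DE")}   # A(C(B(D,E)))
    trees = [t1, t2, t3]
    print(len(distinct_components(trees)))            # 5, more than 3: conflict
    print([sorted(c) for c in strict_consensus(trees)])  # only BCDE survives
```

As in the worked example above, finding more distinct components than a single fully resolved cladogram can hold signals conflict, and only the universally shared component is retained.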
We noted in Chapter 4 that both PAUP and Hennig86 can generate
spurious cladograms that are due solely to ambiguous character optimization.
The length of the strict consensus tree can be used to determine whether all
the apparent resolution found in a set of fundamental cladograms is due to
such ambiguity. Suppose we have two data sets that contain two fully
Table 7.1 Components of the butterflies, birds and bats cladograms, analysed using
COMPONENT 2.0 for Windows (Page 1993b). Rows (integers) are the 12
components found in these three cladograms; columns (letters) are the taxa, as labelled in
Fig. 7.1. The composition of each component is indicated by the asterisks and the
number of cladograms in which it occurs is given in the last column. For example,
component 10, comprising taxa G, H and I, is present in all three of the cladograms,
while component 8, comprising taxa B, E and F, occurs in only a single cladogram
(bats). See text for further explanation.
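[The column positions of the asterisks (taxa A to I) are not recoverable from the
source; the number of taxa marked in each row is given instead, together with the
composition where this is stated in the caption or text.]

Component   Taxa marked   Composition (where stated)   Occurrences
 1          2                                          1
 2          2             C, D                         2
 3          3                                          1
 4          3             B, C, D                      2
 5          4                                          1
 6          2                                          1
 7          2             E, F                         2
 8          3             B, E, F                      1
 9          2                                          2
10          3             G, H, I                      3
11          5                                          2
12          8                                          1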
Bremer's (1990) [...] paper drew attention to the
[Figure 7.2 (image not reproduced): six consensus trees, panels (a) to (f), for
taxa A to I; the topologies are not recoverable from the source.]
Fig. 7.2 Consensus trees of the butterflies, birds and bats cladograms shown in Fig.
7.1. (a) Strict consensus tree. (b) Majority-rule consensus tree/Median consensus
tree. (c) Combinable components or semi-strict consensus tree. (d) Nelson consensus
tree. (e) Adams consensus tree. (f) Greatest agreement subtree. Consensus trees were
calculated using COMPONENT 2.0 for Windows. Numbers refer to components in
Table 7.1.
When many trees are being compared, a majority-rule consensus tree may be
preferable to strict consensus (Swofford 1991). Instead of including only those
groups that occur in the entire set of fundamental cladograms, it is possible
to retain those groups that occur in a pre-specified proportion of those
cladograms in the majority-rule
consensus tree. Typically, majority-rule trees are specified to retain those
components that occur in more than 50% of the cladograms (Margush and
McMorris 1981). Thus, when set at 50%, the groups retained must appear in
more than half of the fundamental cladograms, because different groups that
occur in exactly 50% of the trees may conflict with each other. In our
butterfly, bird and bat example (Fig. 7.1), the four components shared by the
butterflies and birds (Table 7.2, components 4, 7, 9, 11), together with the
universal component 10, appear in the majority-rule consensus tree (with
percentages of 66% and 100% respectively) (Fig. 7.2b).
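The majority-rule criterion can be sketched by counting how often each component occurs and keeping those above the threshold. The example below again uses three hypothetical five-taxon cladograms written as sets of components (frozensets of taxa); the function name and the default threshold argument are illustrative choices, not part of any particular program.

```python
from collections import Counter


def majority_rule_consensus(cladograms, threshold=0.5):
    """Return {component: proportion} for every component that occurs in
    strictly more than `threshold` of the fundamental cladograms."""
    counts = Counter(comp for tree in cladograms for comp in tree)
    k = len(cladograms)
    return {comp: n / k for comp, n in counts.items() if n / k > threshold}


if __name__ == "__main__":
    fs = frozenset
    trees = [
        {fs("BCDE"), fs("CDE"), fs("DE")},   # A(B(C(D,E)))
        {fs("BCDE"), fs("CDE"), fs("CE")},   # A(B(D(C,E)))
        {fs("BCDE"), fs("BDE"), fs("DE")},   # A(C(B(D,E)))
    ]
    for comp, share in majority_rule_consensus(trees).items():
        print("".join(sorted(comp)), round(100 * share))
```

Because the comparison is strict, a component found in exactly half of an even number of cladograms is excluded, which avoids admitting two mutually conflicting groups.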
The median consensus procedure (Barthelemy and Monjardet 1981,
Barthelemy and McMorris 1986) is closely related to the majority-rule consensus
method (Swofford 1991, Page 1993b). This method uses a tree comparison
metric that measures the degree of disagreement between any pair of
cladograms. Thus, if the distance between a pair of cladograms, T_i and T_j, is
d(T_i, T_j), then the total distance of any cladogram T to k rival cladograms is
given by:

    d_T = \sum_{i=1}^{k} d(T, T_i)

A cladogram, T_m, is a median tree if its total distance, d_m, to the rival trees
is less than that for any other cladogram. If the symmetric difference distance
(Robinson and Foulds 1981, [...]) is used as the tree comparison
metric, then the majority-rule consensus tree is a median tree
(Barthelemy and McMorris 1986, Swofford 1991) (Fig. 7.2b). When there is
an odd number of rival cladograms (k) or if there are no groups that appear
in exactly 50% of the rivals, the majority-rule tree is the only median tree.
However, when k is even, any tree representing a combination of the
majority-rule consensus tree with one or more combinable groups that occur
in exactly half of the rival cladograms is also a median tree (Swofford 1991).
Of the methods described here, median consensus trees are the least
frequently encountered.
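The total distance of a candidate tree to a set of rivals can be sketched as follows. The example assumes rooted cladograms written as sets of components (frozensets of taxa) and takes the symmetric difference distance to be the number of components found in one tree but not the other; the five-taxon trees are hypothetical.

```python
def symmetric_difference(t1, t2):
    """Number of components present in exactly one of the two cladograms."""
    return len(t1 ^ t2)


def total_distance(tree, rivals):
    """d_T: sum of the distances from `tree` to each rival cladogram."""
    return sum(symmetric_difference(tree, rival) for rival in rivals)


if __name__ == "__main__":
    fs = frozenset
    t1 = {fs("BCDE"), fs("CDE"), fs("DE")}
    t2 = {fs("BCDE"), fs("CDE"), fs("CE")}
    t3 = {fs("BCDE"), fs("BDE"), fs("DE")}
    rivals = [t1, t2, t3]
    majority = {fs("BCDE"), fs("CDE"), fs("DE")}   # components in > 50% of rivals
    strict = {fs("BCDE")}
    for name, tree in [("majority", majority), ("strict", strict),
                       ("t2", t2), ("t3", t3)]:
        print(name, total_distance(tree, rivals))
```

In this small case, with an odd number of rivals, the majority-rule tree has a smaller total distance than the strict consensus tree or either of the other two rivals, as expected of a median tree.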
components may appear in the Nelson consensus tree that can be
contradicted in some of the original cladograms. This version of Nelson consensus is
available as part of the COMPONENT for Windows computer package
(Page 1993b). By way of example, we can examine the result obtained when
determining the largest cliques for our butterfly, bird and bat example (Fig.
7.1). If we consider taxon B, it occurs as the sister taxon to taxa C and D in
the butterfly clade, B(CD), as the sister to taxon D in the bird clade, C(BD),
and as the sister to taxon F in the bats clade, E(FB). As the sister pair CD
occurs in both the butterflies and bats (i.e., in 66% of the cladograms), and
the group BCD occurs in the butterflies and the birds (also in 66% of the
cladograms), then the largest clique for all three cladograms is B(CD). For
our butterfly, bird and bat example (Fig. 7.1), the Nelson consensus tree (Fig.
7.2d) is identical to the majority-rule tree, with components 2, 4, 7, 9, 10 and
11 comprising the largest clique for all three cladograms (Table 7.2).
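The search for the most replicated clique of mutually compatible components can be sketched by brute force for a handful of components. The sketch below makes several assumptions that go beyond the text: two components are treated as compatible when they are disjoint or nested, a clique is scored by the sum of the occurrences of its members, and only the seven components whose composition is stated in the text or in Table 7.1 are included, with BD and BF taken to occur once each. It is not the algorithm used by COMPONENT.

```python
from itertools import combinations


def compatible(a, b):
    """Two components can coexist on one cladogram if they are
    disjoint or one is nested inside the other."""
    return a.isdisjoint(b) or a <= b or b <= a


def largest_clique(occurrences):
    """Exhaustively find the set of mutually compatible components with
    the greatest summed occurrence (assumed scoring rule)."""
    components = list(occurrences)
    best, best_score = set(), 0
    for size in range(1, len(components) + 1):
        for subset in combinations(components, size):
            if all(compatible(a, b) for a, b in combinations(subset, 2)):
                score = sum(occurrences[c] for c in subset)
                if score > best_score:
                    best, best_score = set(subset), score
    return best, best_score


if __name__ == "__main__":
    fs = frozenset
    # Partial list of components from the butterfly, bird and bat example.
    occurrences = {fs("CD"): 2, fs("BCD"): 2, fs("BD"): 1, fs("BF"): 1,
                   fs("BEF"): 1, fs("EF"): 2, fs("GHI"): 3}
    clique, score = largest_clique(occurrences)
    print(sorted("".join(sorted(c)) for c in clique), score)
```

With these inputs the winning clique contains CD and BCD, that is, B(CD), together with EF and GHI, in line with the reasoning above. Exhaustive enumeration is only feasible for small numbers of components.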
Adams consensus trees (Adams 1972) are derived by relocating those taxa
that occur in conflicting positions on different fundamental cladograms to the
nearest node they have in common. Adams trees therefore contain all the
intersecting sets of taxa (nestings) common to all the fundamental cladograms
in any given set of cladograms. Thus, given two sets of branches, A and B,
and a cladogram T, set A nests inside set B if A is a subset of B and clades in
A have a more derived common node in T than does set B. For example,
given the cladogram A(B(C, D)), the sets BC, BD and CD all nest inside
ABCD, but only CD nests inside BCD (Page 1993b). Adams trees are
particularly useful for summarizing similarities in topology in the
fundamental cladograms when they contain one or more taxa that show very different
positions. For example, in our butterfly, bird and bat example (Fig. 7.1), all
three cladograms share the nestings CD (component 2), EF (component 7)
and GHI (component 10). However, taxa A and B have three completely
different placings and thus are placed on the Adams consensus tree at the
lowest common node, that is, the unresolved basal node (Fig. 7.2e). Adams
consensus trees must be used with care because components can appear in
them that do not occur in any of the fundamental cladograms.
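The nesting relation defined above can be checked mechanically on the A(B(C, D)) example. The sketch below represents a cladogram as nested tuples, which is an assumption of the example, and treats 'a more derived common node' as meaning that the leaf set of the common node of the first set is a proper subset of that of the second.

```python
def leaves(node):
    """Set of terminal taxa below a node (a taxon name or a tuple of children)."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaves(child) for child in node))


def common_node_leaves(node, taxa):
    """Leaf set of the most derived node that contains all of `taxa`."""
    if not isinstance(node, str):
        for child in node:
            if taxa <= leaves(child):
                return common_node_leaves(child, taxa)
    return leaves(node)


def nests(tree, a, b):
    """True if set a nests inside set b on the cladogram."""
    a, b = frozenset(a), frozenset(b)
    return a < b and common_node_leaves(tree, a) < common_node_leaves(tree, b)


if __name__ == "__main__":
    tree = ("A", ("B", ("C", "D")))          # the cladogram A(B(C, D))
    for a in ("BC", "BD", "CD"):
        print(a, nests(tree, a, "ABCD"), nests(tree, a, "BCD"))
```

Running the example reproduces the statement from Page (1993b): BC, BD and CD all nest inside ABCD, but only CD nests inside BCD.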
In all the consensus methods described above, the consensus tree contains
the same number of taxa as do the fundamental cladograms. A rather
different method of comparing cladograms is an agreement subtree, which
shows only the clades and taxa that are common to two or more fundamental
cladograms. In this method, the greatest agreement subtree (GAS) is
obtained by pruning one or more branches from each fundamental cladogram
until a set of identical topologies is obtained. Finden and Gordon (1985)
referred to these as 'common pruned trees'. The GAS is the subtree that
results from pruning the least number of branches from the fundamental
cladograms. Given two cladograms, T_1 and T_2, Page (1993b) defined the
distance, d_GAS(T_1, T_2), as the number of branches removed to obtain the
greatest agreement subtree. By using a recursive algorithm (e.g. Kubicka et
al. 1995), a branch of one cladogram is selected and compared to the other
cladograms. The largest supported subtree in the other cladograms of
comparison that contains the selected branch is maintained. Each branch is
selected in turn and compared to the other cladograms. From amongst the
full range of subtrees obtained the largest agreement subtree is selected.
This method is most useful when one or two taxa are incongruent amongst
the profile of fundamental cladograms. Rosen (1979) used a common pruned
subtree to indicate common components in two seemingly different area
cladograms for central American fishes. In our butterfly, bird and bat
example (Fig. 7.1), the largest agreement subtree is that which has had taxa A
and B pruned out to leave a common topology of six taxa (Fig. 7.2f). An
alternative implementation might permit the inclusion of uncontradicted
components, in which case, the trichotomy in Fig. 7.2e could be resolved as
G(HI).
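For small numbers of taxa the idea of pruning until the topologies agree can be illustrated by exhaustive search. The sketch below is not the recursive algorithm cited above: it simply tries ever smaller sets of retained taxa until the induced subtrees of all cladograms are identical. It assumes rooted cladograms written as sets of components (frozensets of taxa), counts pruned terminal taxa rather than branches, and uses hypothetical five-taxon trees.

```python
from itertools import combinations


def restrict(tree, kept):
    """Induced subtree on the retained taxa: intersect every component
    with `kept` and keep the groups of two or more taxa."""
    return {comp & kept for comp in tree if len(comp & kept) >= 2}


def greatest_agreement_subtree(trees, taxa):
    """Largest set of taxa on which all cladograms induce the same subtree
    (brute force; practical only for small numbers of taxa)."""
    taxa = sorted(taxa)
    for size in range(len(taxa), 1, -1):
        for subset in combinations(taxa, size):
            kept = frozenset(subset)
            restricted = [restrict(tree, kept) for tree in trees]
            if all(r == restricted[0] for r in restricted):
                return kept, restricted[0]
    return frozenset(), set()


if __name__ == "__main__":
    fs = frozenset
    trees = [
        {fs("BCDE"), fs("CDE"), fs("DE")},   # A(B(C(D,E)))
        {fs("BCDE"), fs("CDE"), fs("CE")},   # A(B(D(C,E)))
        {fs("BCDE"), fs("BDE"), fs("DE")},   # A(C(B(D,E)))
    ]
    kept, subtree = greatest_agreement_subtree(trees, "ABCDE")
    print(sorted(kept), sorted("".join(sorted(c)) for c in subtree))
```

In this hypothetical case pruning the single taxon C is enough to make all three cladograms agree on A(B(D,E)). Where several equally large agreement subtrees exist, this sketch returns only the first one found.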
7.8 CONCLUSIONS
It should be recognized that in most studies, it is the fundamental
(data-derived) cladograms that provide the most direct and reliable evidence of
relationships among the taxa under study. The only exception to this rule is
when the strict consensus tree is the same length as the fundamental
cladogram and is thus the strictly supported cladogram for that data set. Then
it represents our best estimate of relationships among the taxa. In all other
instances, when the consensus trees are longer than the fundamental
cladograms from which they are derived, they are worse estimates of taxon
relationships. However, of the plethora of methods and statistics for
comparing fundamental cladograms, consensus trees are the most useful for
examining the results of cladistic analyses that yield more than one minimum length
cladogram. Consensus trees provide summary information about
among-cladogram character conflict by providing an upper bound for the length of
characters among equally most parsimonious cladograms (Nixon and
Carpenter 1996b). Strict consensus trees include only those groups
(components) for which there is unambiguous support among fundamental
cladograms. All other methods include some degree of ambiguous support.
An area of cladistic methodology that has attracted much attention recently
concerns procedures for treating data derived from different sources. Most
systematists acknowledge that there are different kinds of data, e.g.
morphological, molecular, embryonic or larval, behavioural, etc. The debate concerns
the methods by which we analyse these data and combine them to reveal a
common phylogenetic history. Some authors (e.g. Kluge 1989) argue that all
data should be analysed in a single matrix (Fig. 8.1a). This has been called the
total evidence or character congruence approach because the final
cladogram(s) results purely from interaction among all available characters. A
subtle but important change in terminology, which we adopt, has recently
been introduced by Nixon and Carpenter (1996a), who argued that this
approach is best called simultaneous analysis because all systematists ideally
use all (total) evidence, irrespective of the way in which they then deal with
that evidence. Other authors (e.g. Miyamoto and Fitch 1995) prefer to
analyse data separately and then use consensus methods to combine the
resulting cladograms (Fig. 8.1b). This is called partitioned analysis or the
taxonomic congruence approach because the final cladogram(s) is the result
of adding together taxon cladograms, each derived from analysis of separate
data sets for the same taxa. Although there are advocates of both approaches,
some systematists have been more concerned with prescribing conditions
under which one or the other method may be most appropriate. That is, in
any particular circumstance, should we combine data (simultaneous analysis)
or keep them separate (partitioned analysis)? In this chapter, we will outline
the main theoretical and methodological basis of both approaches, the
claimed advantages of each, as well as discussing possible conditions under
which it may be better to use either simultaneous or partitioned analysis.
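The difference between the two approaches at the level of data handling can be sketched as follows. Only the matrix-combination step of simultaneous analysis is shown; the parsimony search and the consensus step of partitioned analysis are not attempted. The taxon names, character strings and function name are hypothetical, and restricting the combined matrix to taxa present in every data set is an assumption of the sketch.

```python
def combine_matrices(*matrices):
    """Simultaneous analysis, data step only: concatenate the characters
    of each data set, taxon by taxon, into a single matrix.
    Each matrix is a dict mapping taxon name to a string of states."""
    shared = sorted(set.intersection(*(set(m) for m in matrices)))
    return {taxon: "".join(m[taxon] for m in matrices) for taxon in shared}


if __name__ == "__main__":
    # Hypothetical data sets for the same three taxa.
    morphology = {"A": "0010", "B": "1011", "C": "1111"}
    molecules = {"A": "ACGT", "B": "ACGA", "C": "TCGA"}
    combined = combine_matrices(morphology, molecules)
    for taxon, row in combined.items():
        print(taxon, row)
    # Partitioned analysis would instead analyse `morphology` and
    # `molecules` separately and combine the resulting cladograms
    # with a consensus method.
```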
It is worth emphasizing that vicariance biogeography relies on a partitioned
analysis approach inasmuch as cladograms of different organisms inhabiting
similar areas are 'added' together using consensus techniques. However, in
this chapter, we are concerned solely with analysis of different kinds of data
relevant to the phylogenetic history of a particular group of organisms.
By way of introduction, [...] examples of studies that have used
both approaches [...]
[Figure 8.1 (image not reproduced): schematic of (a) simultaneous analysis and
(b) partitioned analysis of separate data sets for taxa A to F.]
Fig. 8.1 (a) In simultaneous analysis, separate data sets are combined into a single
matrix before being analysed. (b) In partitioned analysis, each data set is analysed
separately (vertical arrows), yielding sets of intermediate cladograms that are then
'added' together using a consensus method. In both approaches, if more than one
most parsimonious cladogram is obtained from the analysis of a data set, then
consensus may be used to summarize the results.
[Figure 8.2 (image not reproduced): cladogram panels labelled 'morphology',
'strict consensus' and 'simultaneous analysis'; taxon labels are not recoverable.]
Fig. 8.2 Analysis of seven milkweed butterflies of the genus Amauris and three outgroup taxa (Danaus chrysippus, Tirumala
formosa and T. petiverana). Two data sets were available, comprising 32 morphological characters (29 informative) and 68
chemical characters (63 informative). The partitioned analysis is shown along the top row with the strict consensus trees from the
two separate analyses and the combined strict consensus tree. The result of simultaneous analysis is shown in the bottom row.
Informative nodes are circled. (From Vane-Wright et al. 1992.)
Fig. 8.3 Analysis of echinoids using three data sets: 163 morphological characters (50 informative), 218 base pairs within the SSU
rRNA gene (34 informative), and 91 base pairs within the LSU rRNA gene (28 informative). The partitioned analysis is shown along
the top row. The cladogram derived from simultaneous analysis of all three data sets is shown below. Note that this example is
extracted from a larger study in which morphological data were coded for many more taxa, which explains the relatively few
informative morphological characters. (From Littlewood and Smith 1995.)
Fig. 8.4 Analysis of two mitochondrial genes of seven species of deer mice (Peromyscus) and three species of grasshopper mice
(Onychomys). The cytochrome b (Cyt b) sequence contained 61 informative characters and the 12S rRNA had 48 informative
characters. The partitioned analysis is shown along the top row. The cladogram derived from simultaneous analysis is shown below.
The figures are the percentage of bootstrap replicates in which the groups were recovered. (From Sullivan 1996.)
and SSU rRNA cladogram that were judged as insignificant, but the differences
between the LSU rRNA cladogram and morphology were judged as
significant, suggesting that they should not be combined. However,
Littlewood and Smith (1995) recognized that this significance was due almost
entirely to the different positions of Arbacia on the two cladograms and
regarded Arbacia as a rogue taxon. They considered that because the
heterogeneity between the data sets was caused by a single taxon,
simultaneous analysis was the preferred option.
Table 8.1 Templeton's test of data heterogeneity applied to the Amauris butterfly
data. For each of the three fundamental cladograms that form the morphology
consensus and the chemical consensus, the topology is chosen from one set that
most closely resembles one from the other set. Morphological characters are then
optimized on to both cladograms. For each character the difference in performance
on its own cladogram and the rival cladogram is noted together with the number of
extra (+) or fewer (-) steps. The total number of characters is noted, keeping extra
and fewer steps separate. The differences are then ranked (mid-point scores given
when two or more characters share the same number of differences). The sums of
the positive and negative ranks are calculated separately and the smaller of the two
figures taken as the test statistic. The probability value is read from a critical value
table of the Wilcoxon Rank Sum. If the probability is < 0.05 (the critical value), then
the difference between the performance of the characters on their own cladogram
and the rival cladogram is significant. The reciprocal operation is also performed for
the biochemical characters. In this case, it is concluded that the chemical data are
not optimized significantly better on their own cladogram than on the morphological
cladogram. However, the morphological data are optimized significantly better
to their own cladogram than to the chemical cladogram. This means that the
chemical signal is weak and the data may be combined.
Ranked sum 55 0
Test statistic = 0; probability [...]
Table 8.1 (Continued)

              Number of times character changes on
Biochemical   biochemical    morphology     Difference     Rank
character     topology       topology       +      -       +      -
37            3              4              1              6
3?            1              2              1              6
62            1              2              1              6
64            1              2              1              6
65            1              2              1              6
70            2              3              1              6
71            1              2              1              6
72            3              2                     1              6
73            2              1                     1              6
78            3              2                     1              6
80            1              2              1              6
Total = 11                                  8      3
Ranked sum                                                 48     18
Test statistic = 18; probability value > 0.05, not significant
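The ranking step described in the caption of Table 8.1 can be checked with a short calculation. The following is a minimal sketch, not the procedure as implemented in any particular program: the function names are our own, and the step counts are those read from the biochemical half of the table (the second character number is illegible in the source, but its step counts are not). Differences of zero are dropped, tied differences share a mid-point rank, and the smaller of the two rank sums is taken as the test statistic.

```python
def midpoint_ranks(differences):
    """Rank absolute differences from smallest to largest; ties share the mean rank."""
    order = sorted(range(len(differences)), key=lambda i: abs(differences[i]))
    ranks = [0.0] * len(differences)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and abs(differences[order[end + 1]]) == abs(differences[order[start]])):
            end += 1
        mean_rank = (start + end + 2) / 2  # positions are 0-based, ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def rank_sums(own_steps, rival_steps):
    """Return (positive rank sum, negative rank sum, test statistic)."""
    differences = [rival - own
                   for own, rival in zip(own_steps, rival_steps)
                   if rival != own]  # characters with no difference are ignored
    ranks = midpoint_ranks(differences)
    positive = sum(r for d, r in zip(differences, ranks) if d > 0)
    negative = sum(r for d, r in zip(differences, ranks) if d < 0)
    return positive, negative, min(positive, negative)


# Steps for the 11 biochemical characters on the biochemical (own) topology
# and on the morphology (rival) topology, in table order.
own_steps = [3, 1, 1, 1, 1, 2, 1, 3, 2, 3, 1]
rival_steps = [4, 2, 2, 2, 2, 3, 2, 2, 1, 2, 2]

positive_sum, negative_sum, statistic = rank_sums(own_steps, rival_steps)
print(positive_sum, negative_sum, statistic)
```

Because all 11 differences are of one step, every character receives the mid-point rank of 6, which reproduces the ranked sums and the test statistic given in the table. The probability value itself must still be read from a table of critical values.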
8.7 CONCLUSIONS
1. Simultaneous analysis combines all available data, from whatever source,
into a single data set for analysis.
2. Partitioned analysis allocates data derived from different sources to
separate data sets, which then are analysed individually before combining
the results using a consensus method.
3. Simultaneous analysis aims to maximize explanatory power of the data by
explaining the distribution of all characters in the most parsimonious
manner. By testing hypotheses of homology through character congruence,
it follows that the greater the number of characters included in an
analysis, the more rigorously are those hypotheses tested. This is the
strength of simultaneous analysis.
4. In contrast, partitioned analysis is based upon the premise that there are
genuinely different classes of data that may reflect different evolutionary
processes. These data classes may thus be independent indicators of
relationships and need to be analysed separately. In this way, the results
of the separate analyses may provide tests of one another.
5. Classes of characters that evolve according to different rules are termed
process partitions. However, given the complexity of gene expression and
our meagre knowledge of evolutionary processes, it is very difficult to
justify where to draw up the partitions or to decide how fine they must
be.
De Pinna (1996) wrote that 'the most interesting idea in mainstream theoretical
systematics in recent years is the so-called three-item analysis'. This may
strike cladists as a curious statement, given that most commentary relating to
three-item statements analysis has been decidedly negative (Harvey 1992;
Kluge 1993, 1994; Farris et al. 1995). He went on to point out that a major
success of the cladistic approach was the recognition that ancestor-descendent
relationships among taxa cannot be objectively proposed or
tested, only sister-group relationships (see Chapter 1). The idea that one
taxon can 'give rise' to another is acknowledged to be beyond proper
scientific investigation. It is perhaps curious that cladists still treat characters
in a pre-cladistic way, as if one character state can give rise to another,
relating one state to another in ancestor-descendent fashion. This approach
seems embedded in the standard approach to character coding with the
recognition of 'transformation series' and the use of character optimization.
If ancestral taxa have been discarded from scientific enquiry then why not
ancestral characters? Both appear to be based on the absence of evidence
and the formalism of conventional cladistic approach.
All cladists agree that cladistics is about grouping by synapomorphy and
that synapomorphy is evidence of relationship. Three-item statements analysis
is an alternative way to code data based on the idea that 'taxon' and
'homology' represent the same relationship (Nelson 1994). It departs from
other approaches by focusing on the smallest possible unit of relationship, the
three-item statement, and how these fit most parsimoniously to possible
cladograms. In this sense, three-item statements analysis is an entirely different
way of viewing data. According to its creators (Nelson and Platnick 1991),
three-item statements analysis improves the precision of parsimony. This
chapter concentrates on the implementation of three-item statements analysis
and outlines some possible ways in which precision is improved.
9.2 CODING
Prior to analysis, systematic data (observations) are coded as series of binary
or multistate characters reflecting judgements of primary homology (see
Chapter 2). Using a four-taxon example (A-D), if taxa C and D are observed
to have a feature, then they are usually coded as '1' and taxa A and B, which
lack that feature, are coded as '0'. The binary character incorporates an
element of 'identity' (the 1s) and an element of 'difference' (the 0s). Each
binary character is assumed to be an independent homology. For multistate
characters, the different states are assumed homologous among themselves
and as a consequence are non-independent (i.e. the 1s, 2s, etc., are dependent).
Multistate characters are often represented (or interpreted) as suites of
binary characters in a cladistic analysis. For the purposes of this chapter,
representation of data as binary or multistate variables will be referred to as
the standard approach.
Three-item statements analysis, in contrast, does not represent systematic
data as binary and multistate variables but reduces observations to their
simplest expression of relationship, a three-item statement. For example, the
three-item statement A(BC) implies that taxa B and C share a relationship to
the exclusion of taxon A. Suites of three-item statements can be arranged
into a statement x taxon matrix for analysis with a parsimony program, in an
identical fashion to standard binary and multistate data.
Fig. 9.1 Different analytical representations of one binary character, AB(CD), for
which two three-item statements are possible: A(CD) and B(CD). (a) Diagrammatic
representation of the three-item statement A(CD). (b) Diagrammatic representation
of the three-item statement B(CD). (c) Diagrammatic representation of the solution to
A(CD) + B(CD) = AB(CD). (d) Written representation of the statements and solution in
(a-c). (e) Standard data matrix and its solution. (f) Three-item statements matrix and
its solution. Note that while the solution appears to be the same as (d), it is actually
a strict consensus tree of the most parsimonious solutions [...].

Using the same four-taxon example (A-D) as above, in which C and D
possess a particular feature and A and B do not, two three-item statements
are possible: A(CD) and B(CD) (Fig. 9.1a-b). Addition of these two three-item
statements produces a summary cladogram (Fig. 9.1c), in which the two
three-item statements combine to unite C and D: A(CD) + B(CD) = AB(CD)
(Fig. 9.1d). This is identical to the summary cladogram from the binary
character, AB(CD), which unites C + D on the basis of a common possession
of state 1 (Fig. 9.1e).
From this simple example, we can see that for primary representation of
data (the original observations), there is no difference between the three-item
statements and the standard approaches. The choice facing systematists rests
upon which aspect of the data they wish to represent from their original
observations. As a starting point, it is worth remembering that cladistics, in its
most general form, is concerned with hierarchical patterns, whether those
patterns express relationships among characters, taxa, genes or areas (Nelson
and Platnick 1981). In short, an hierarchical pattern does not imply a process
but expresses degrees of relationship. Cladistics is the study of relationships.
The above example demonstrates that adding two three-item statements can
be achieved simply by hand. However, most data sets involve many (sometimes
very many, see below) three-item statements and computerized methods
then become necessary. Three-item statements can be represented in a
standard character x taxon matrix. However, if we consider our earlier example
(Fig. 9.1a,b), for the statement A(CD), there is no corresponding data
point for taxon B, while for statement B(CD), there is no data point for taxon
A. Nevertheless, current parsimony programs require that all cells in a data
matrix be filled with a value of some kind and so these 'data' points are
represented by question marks (Fig. 9.1f). The results then differ from those
expected, in that not one but three most parsimonious cladograms are found:
AB(CD), A(B(CD)) and B(A(CD)). The strict consensus tree of these three
solutions is shown in Fig. 9.1f (for further explanation, see §9.3.6). This aspect
of three-item statements analysis has been exploited in various critiques
(Harvey 1992, Kluge 1993, 1994). However, it should be noted that the
differences between the manual and computer solutions are due to current
idiosyncrasies of parsimony programs, especially their treatment of question
marks (see Chapter 4), and not to the form in which the data are represented
(Nelson and Ladiges 1993). It should be borne in mind that implementation
of a method involves issues separate from the reasons to adopt the method,
although both are connected.
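The coding of a three-item statement as a matrix column can be sketched in a few lines. This is an illustration only, with a function name of our own choosing: the taxon excluded by the statement is scored 0, the two related taxa are scored 1, and every taxon that is not part of the statement receives a question mark.

```python
def statement_column(outside_taxon, pair, taxa):
    """Code one three-item statement as a column of 0, 1 and ? entries."""
    cells = []
    for taxon in taxa:
        if taxon == outside_taxon:
            cells.append("0")   # the taxon excluded by the statement
        elif taxon in pair:
            cells.append("1")   # the two taxa united by the statement
        else:
            cells.append("?")   # taxon plays no part in this statement
    return "".join(cells)


taxa = ["A", "B", "C", "D"]
columns = {
    "A(CD)": statement_column("A", ("C", "D"), taxa),
    "B(CD)": statement_column("B", ("C", "D"), taxa),
}
for name, column in columns.items():
    print(name, " ".join(column))
```

The two columns printed correspond to the two statements derived from the binary character AB(CD); the question marks fall on taxon B in A(CD) and on taxon A in B(CD).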
comparing each pair of taxa that has the informative state with every taxon
that lacks that state. For example, for a character expressing the relationship
ABC(DE), there are three three-item statements: A(DE), B(DE) and C(DE).
Only taxa DE possess the informative state and as a pair are related to A,
B and C. The number of possible three-item statements is given by
(t - n)n(n - 1)/2, where t = the total number of taxa and n = the number of
taxa with the informative (apomorphic) state. For ABC(DE), n = 2 and t = 5,
hence (5 - 2)2(2 - 1)/2 = 3 statements. For a character expressing the
relationship AB(CDE) there are six three-item statements: A(CD), A(CE), A(DE),
B(CD), B(CE), and B(DE). Taxa CDE possess the informative state and
constitute three pairs, CD, DE, and CE, with each pair related to A and B.
Thus, n = 3 and t = 5, hence (5 - 3)3(3 - 1)/2 = 6 statements.
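The enumeration just described, and the formula for the number of statements, can be illustrated as follows. This is a minimal sketch with hypothetical function names: each pair of taxa with the informative state is combined with each taxon lacking it, and the resulting count is compared with (t - n)n(n - 1)/2.

```python
from itertools import combinations


def three_item_statements(states):
    """List the three-item statements implied by one binary character.

    `states` maps each taxon to 0 or 1; state 1 is the informative state.
    """
    with_state = sorted(t for t, s in states.items() if s == 1)
    without_state = sorted(t for t, s in states.items() if s == 0)
    return [f"{outside}({''.join(pair)})"
            for outside in without_state
            for pair in combinations(with_state, 2)]


def expected_count(t, n):
    """Number of statements for t taxa, n of which have the informative state."""
    return (t - n) * n * (n - 1) // 2


abc_de = three_item_statements({"A": 0, "B": 0, "C": 0, "D": 1, "E": 1})
ab_cde = three_item_statements({"A": 0, "B": 0, "C": 1, "D": 1, "E": 1})
print(abc_de, expected_count(5, 2))
print(ab_cde, expected_count(5, 3))
```

For ABC(DE) the sketch lists three statements and for AB(CDE) six, in agreement with the formula.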
Table 9.1a is a standard matrix for four taxa (A-D) and 10 binary characters
(1-10). Analysis of this matrix yields one cladogram (Fig. 9.2a; length = 13).
The three-item statements matrix for the same data is given in Table 9.1b,
analysis of which yields the same cladogram (Fig. 9.2a, length = 30). The
three-item statement 'characters' can be grouped into six sets of three-item
statements, with a total of 24 statements: A(CD) (x7), B(CD) (x3), A(BC)
(x4), A(BD) (x4), C(AB) (x3), D(AB) (x3). Of these, 18 are included in
the resulting cladogram, while six are excluded. The former are referred to
as 'accommodated three-item statements' (ATS) and the latter as
'non-accommodated three-item statements' (NTS). Accommodated statements each
fit to a node with a single step and add one step to the cladogram length.
Table 9.1 (a) Standard matrix of four taxa (A-D) coded for 10 binary characters. (b)
The corresponding three-item statements matrix. Statements have been arranged
into six groups of equivalent statements. Characters 1-3 and 8-10 yield two
three-item statements (a and b), while characters 4-7 yield three statements (a-c).
See text for further details.

(a)
     1  2  3  4  5  6  7  8  9  10
A    0  0  0  0  0  0  0  1  1  1
B    0  0  0  1  1  1  1  1  1  1
C    1  1  1  1  1  1  1  0  0  0
D    1  1  1  1  1  1  1  0  0  0

(b)
     1 2 3 4 5 6 7   1 2 3   4 5 6 7   4 5 6 7   8 9 10   8 9 10
     a a a a a a a   b b b   b b b b   c c c c   a a a    b b b
A    0 0 0 0 0 0 0   ? ? ?   0 0 0 0   0 0 0 0   1 1 1    1 1 1
B    ? ? ? ? ? ? ?   0 0 0   1 1 1 1   1 1 1 1   1 1 1    1 1 1
C    1 1 1 1 1 1 1   1 1 1   1 1 1 1   ? ? ? ?   0 0 0    ? ? ?
D    1 1 1 1 1 1 1   1 1 1   ? ? ? ?   1 1 1 1   ? ? ?    0 0 0
Fig. 9.2 (a) Three-item solution for the matrix in Table 9.1b under uniform weighting.
(b) Alternative cladogram found using fractional weighting of the same data.
Non-accommodated statements fit the cladogram twice and add two steps to
the length. Thus, the relationship between data (the suite of three-item
statements) and cladogram length is given by:
Length = ATS + 2NTS
For the example above, 18 + (2 x 6) = 30.
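The figures quoted for Table 9.1 can be reproduced by counting statements directly. The sketch below makes two assumptions that should be stated plainly: the rows for taxa C and D, which are partly illegible in Table 9.1a, are taken to be 1111111000 (the only reading consistent with the statement counts in the text), and the cladogram of Fig. 9.2a is taken to be A(B(CD)). A statement is treated as accommodated when the smallest group containing its two related taxa excludes the third taxon; the function names are our own.

```python
from itertools import combinations

# Table 9.1a, with rows C and D reconstructed (an assumption, see lead-in).
MATRIX = {
    "A": "0000000111",
    "B": "0001111111",
    "C": "1111111000",
    "D": "1111111000",
}
TREE = ("A", ("B", ("C", "D")))  # assumed topology of Fig. 9.2a


def leaves(node):
    """Set of terminal taxa below a node of a nested-tuple cladogram."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def smallest_clade(node, pair):
    """Taxa in the smallest group of the cladogram that contains both taxa of `pair`."""
    if not isinstance(node, str):
        for child in node:
            if set(pair) <= leaves(child):
                return smallest_clade(child, pair)
    return leaves(node)


def accommodated(tree, statement):
    """True if the statement outside(pair) fits a node of the cladogram."""
    outside, pair = statement
    return outside not in smallest_clade(tree, pair)


def statements_from_matrix(matrix):
    """All three-item statements implied by the binary characters of a matrix."""
    taxa = sorted(matrix)
    n_characters = len(matrix[taxa[0]])
    found = []
    for i in range(n_characters):
        ones = [t for t in taxa if matrix[t][i] == "1"]
        zeros = [t for t in taxa if matrix[t][i] == "0"]
        found += [(outside, pair) for outside in zeros
                  for pair in combinations(ones, 2)]
    return found


statements = statements_from_matrix(MATRIX)
ats = sum(accommodated(TREE, s) for s in statements)
nts = len(statements) - ats
length = ats + 2 * nts
print(len(statements), ats, nts, length)
```

Under these assumptions the count gives 24 statements, of which 18 are accommodated and six are not, and a length of 30, as stated in the text.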
AlBel 1 ,
!
AlDOl 1 l 1
A(CD) 1 ,
! 1
B(eD) 1 1 1
3+2 2+2 4
Total 5 4 4
Cladogram 1 Cladogram Z
UW FW (no factor) FW{xJ) D Statements UW FW (no factor) FW{ x J) D
,,,
,IS
,,
4 3 (2~) 8 A(SO) 4 3 (Z ~ ) 8 X
7 6 (st) J7
, AICD) 7 6 (st ) J7
,,
3 3 9 BlCD) 3 3 9
3 3 9 X CIAB) 3 3 9
3 3 9 X OWl) 3 3 9
9.3.7 Optimization
Fig. 9.3 'Optimization' of the three-item statement A(CD) onto the cladogram
A(B(C(DE))). (a) Parsimony computer programs require all cells of a data matrix
to be filled and attempt to optimize the 'condition' in taxon B as either 0 or 1, even
though this is nonsensical. (b) The correct 'optimization' of the three-item statement
A(CD), in which taxon B is irrelevant. The statement A(CD) should be read as 'C and
D are more closely related to each other than either is to A' and not in the form of
the standard approach, which sees nodes as possible (ancestral) 'transformations' of
one character (or state) into another.
other than either is to A' not as in the standard approach, which sees nodes
as possible (ancestral) transformations of one character (or state) into
another. This point was misunderstood by Farris et al. (1995) who, in their
examples, count three-item statements as if they are optimized characters.
The assignment at node X in Fig. 9.3a is due to the programs treating
question marks as 'potential' data when, in the case of three-item statements,
they are no such thing (the standard approach also has problems; see §4.2).
The main point is that despite the default optimization of values, 'Hennig86
(and PAUP) efficiently implements three-item analysis because tree length, if
not optimization, is exact' (Nelson 1993) (our emphasis). Cladogram length
still reflects optimality accurately.
A three-item statement involves only three terminals and therefore will either
fit to a node on a cladogram or not. In other words, a three-item statement
will display either one step and have a ci of 1.00, or two steps and have a ci of
0.50. Hence CI is not a useful measure as it simply distinguishes the fit to a
node of each statement.
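The two possible ci values can be illustrated on the cladogram A(B(C(DE))) of Fig. 9.3. This is a sketch under a simple assumption of our own: a statement takes one step when the smallest group containing its two related taxa excludes the third taxon, and two steps otherwise, with ci taken as the minimum number of steps (one) divided by the steps observed.

```python
TREE = ("A", ("B", ("C", ("D", "E"))))


def leaves(node):
    """Set of terminal taxa below a node of a nested-tuple cladogram."""
    if isinstance(node, str):
        return {node}
    return set().union(*(leaves(child) for child in node))


def smallest_clade(node, pair):
    """Taxa in the smallest group that contains both taxa of `pair`."""
    if not isinstance(node, str):
        for child in node:
            if set(pair) <= leaves(child):
                return smallest_clade(child, pair)
    return leaves(node)


def steps(tree, outside, pair):
    """One step if the statement fits a node, two steps if it does not."""
    return 1 if outside not in smallest_clade(tree, pair) else 2


def ci(tree, outside, pair):
    """Consistency index of a single statement: minimum steps / observed steps."""
    return 1 / steps(tree, outside, pair)


ci_fits = ci(TREE, "A", ("C", "D"))    # A(CD) fits the node uniting C, D and E
ci_fails = ci(TREE, "C", ("A", "B"))   # C(AB) fits no node of this cladogram
print(ci_fits, ci_fails)
```

No value other than 1.00 or 0.50 can arise for a single statement, which is why CI only records whether each statement fits.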
For ri values, the situation is somewhat different. Platnick (1993) drew
attention to how three-item statements differ from binary characters in their
fit to cladograms, by noting the performance of a suite of three-item statements
on a series of different (specified) topologies (Table 9.4). The topologies
of the cladograms are not relevant to this discussion. Note that RI
reflects the number of accepted (accommodated) three-item statements as a
fraction of the total number of three-item statements considered. For example,
cladogram 1 accepts 65 statements as true, hence 65/135 = 0.48; cladograms
2 and 3 accept 54 statements as true, hence 54/135 = 0.40; cladogram
4 accepts no statements as true, hence 0/135 = 0; and cladogram 5 accepts
all 135 statements as true, hence 135/135 = 1. Therefore RI would appear to
be a useful measure for the amount of fit for each possible cladogram.
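The fractions quoted above are easily recomputed. The sketch below uses only the numbers of accepted statements given in the text; the lengths are derived on the assumption, taken from the earlier discussion of accommodated and non-accommodated statements, that each accepted statement adds one step and each prohibited statement adds two.

```python
TOTAL_STATEMENTS = 135
accepted = {1: 65, 2: 54, 3: 54, 4: 0, 5: 135}  # cladogram number -> statements accepted

# RI as the fraction of statements accepted by each cladogram.
ri = {cladogram: n / TOTAL_STATEMENTS for cladogram, n in accepted.items()}

# Length, assuming one step per accepted and two per prohibited statement.
lengths = {cladogram: n + 2 * (TOTAL_STATEMENTS - n)
           for cladogram, n in accepted.items()}

for cladogram in sorted(accepted):
    print(cladogram, round(ri[cladogram], 2), lengths[cladogram])
```

The lengths obtained for cladograms 1 to 3 (205, 216 and 216) agree with the legible entries of Table 9.4.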
Table 9.4 Data for five cladograms discussed by Platnick (1993). Each cladogram
is tabulated for the number of accepted (accommodated) and prohibited
(non-accommodated) statements, together with some more conventional statistics.

               Number of statements
               Accepts     Prohibits
               ('True')    ('False')    Steps    Nodes    Length    RI
Cladogram 1    65          70           [...]    [...]    205       48
Cladogram 2    54          81           [...]    [...]    216       40
Cladogram 3    54          81           [...]    [...]    216       40
Cladogram 4    0           135          [...]    [...]    270       0
Cladogram 5    135         0            [...]    [...]    135       100
9.3.9 Summary of implementation procedures
Implementation of three-item statements analysis can be executed in a
manner similar to standard character analysis but requires the same attention
to detail. This is because all currently available parsimony programs were
designed with a different purpose in mind. Nevertheless, to repeat, 'Hennig86
(and PAUP) efficiently implements three-item analysis because tree length, if
not optimization, is exact' (Nelson 1993).
Improvements in precision in three-item statements analysis come from
three sources. First, attention must be given to redundancy in the data and
fractional weights should be used. Second, the scaling of weights must be
appropriate to avoid oversimplifying the effect of integer weights currently
required by parsimony programs. Third, the final cladogram(s) should be
minimal with respect first to length and then to the number of nodes
supported by data.
Platnick et al. (1991a) suggested that the best cladogram for available data
should satisfy 'the criteria of parsimony, relative informativeness of characters,
and maximum resolution of characters'. In both three-item statements
analysis and the standard approach, parsimony is the principle used to fit data
to a cladogram and fully resolved cladograms are the objective. It would seem
that 'maximum resolution of characters' is an issue that has only recently
been explored. The relative informativeness of characters differs depending
on the way data are represented, and this may be the major distinguishing
factor between the standard and three-item approaches (see also §4.2).
When Nelson and Platnick (1991) first proposed the use of three-item
statements to analyse systematic data, they suggested that it might improve
the precision of parsimony (sensu Farris 1983). They presented results from
the analysis of several hypothetical and one real data matrix (from Carpenter
1988) contrasting the results of three-item and standard analyses. Their
results showed that three-item statements analysis sometimes produced fewer
cladograms, sometimes more cladograms, and sometimes different cladograms
compared with the standard approach. What, then, is the meaning of
'more precise'?
The suggestions of Platnick et al. (1991a) once again provide a valuable
means of understanding the issues facing cladistic practice. The best cladogram
for available data should satisfy 'the criteria of parsimony, relative
informativeness of characters, and maximum resolution of taxa'. As stated
above, both three-item statements analysis and the standard approach use
parsimony to organise the data. Both approaches also attempt to gain
'maximum resolution of taxa', such that all nodes are supported by data.
Table 9.5 (a) Matrix of conflicting characters for four taxa (A-D)
and an all-zero root (O), coded for three characters 1-3. (b) The
equivalent three-item statements matrix. (After Nelson 1996.)

(a)
     1  2  3
O    0  0  0
A    0  0  0
B    0  1  1
C    1  0  1
D    1  1  0

(b)
     1     2     3
     a b   a b   a b
O    0 0   0 0   0 0
A    0 ?   0 ?   0 ?
B    ? 0   1 1   1 1
C    1 1   ? 0   1 1
D    1 1   1 1   ? 0
The only factor that can differ between the two approaches is the 'relative
informativeness of characters' or, perhaps more accurately, the relative
informativeness of observations.
Nelson (1996) provided a series of examples that demonstrated improved
precision by greater resolution of character data. Nelson analysed a series of
data sets with four to seven taxa. Taxon A has all plesiomorphic states (0s),
while the other taxa, B-n (where n = CD, CDE, up to CDEFGHI), have
different combinations of conflicting apomorphic states (1s). Each matrix was
analysed with an all-zero root, O, as the focus of interest was the relationships
among taxa A-n. Taxon A is not an outgroup but part of the problem
requiring solution.
For example, standard analysis of the data in Table 9.5a yields six equally
most parsimonious cladograms, the strict consensus tree of which is an
uninformative bush. This result suggests that there is no overall information
in this matrix, or at least the information has maximum conflict. If the same
data are represented as a suite of three-item statements (Table 9.5b), three
cladograms result, the strict consensus tree of which is A(BCD). This result
suggests that there is information in this matrix to relate B + C + D as a
group relative to A.
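The three-item result for Table 9.5 can be checked by exhaustive counting. The sketch below is not a parsimony program and does not reproduce the treatment of question marks discussed earlier; it assumes that each statement contributes one step when it is accommodated and two steps when it is not, omits the all-zero root O (the cladograms are simply treated as rooted), and scores every fully resolved rooted cladogram for taxa A-D. The six statements are those implied by the three characters of Table 9.5a; the function names are our own.

```python
STATEMENTS = [
    ("A", ("C", "D")), ("B", ("C", "D")),  # character 1
    ("A", ("B", "D")), ("C", ("B", "D")),  # character 2
    ("A", ("B", "C")), ("D", ("B", "C")),  # character 3
]


def leaves(node):
    """Set of terminal taxa below a node of a nested-tuple cladogram."""
    if isinstance(node, str):
        return {node}
    return leaves(node[0]) | leaves(node[1])


def smallest_clade(node, pair):
    """Taxa in the smallest group that contains both taxa of `pair`."""
    if not isinstance(node, str):
        for child in node:
            if set(pair) <= leaves(child):
                return smallest_clade(child, pair)
    return leaves(node)


def length(tree):
    """One step per accommodated statement, two per non-accommodated statement."""
    return sum(1 if outside not in smallest_clade(tree, pair) else 2
               for outside, pair in STATEMENTS)


def insert_everywhere(tree, taxon):
    """Yield every cladogram made by attaching `taxon` to one branch of `tree`."""
    yield (tree, taxon)
    if not isinstance(tree, str):
        left, right = tree
        for new_left in insert_everywhere(left, taxon):
            yield (new_left, right)
        for new_right in insert_everywhere(right, taxon):
            yield (left, new_right)


def all_rooted_binary_trees(taxa):
    """All fully resolved rooted cladograms for a list of taxa."""
    trees = [taxa[0]]
    for taxon in taxa[1:]:
        trees = [new for tree in trees for new in insert_everywhere(tree, taxon)]
    return trees


trees = all_rooted_binary_trees(["A", "B", "C", "D"])
shortest = min(length(tree) for tree in trees)
best = [tree for tree in trees if length(tree) == shortest]
print(len(trees), shortest, best)
```

Of the 15 fully resolved cladograms, three are shortest under this count, and in each of them A is the sister group of B + C + D. Their strict consensus is therefore A(BCD), as reported in the text.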
Nelson (1996) analysed 120 matrices, in which there were conflicting
characters in three (BCD) to eight (B-I) taxa, using the standard approach
(although the series could be extended indefinitely). The results from 96 of
these matrices included the group B-n relative to A, while 24 did not. In
other words, over [...] node, which is sufficient to suspect
Fig. 9.4 (a-z) The 26 possible topologies for four taxa, A-D. Two lengths are given
for each cladogram. The upper figure relates to the binary character, AB(CD); the
lower to its two constituent three-item statements, A(CD) and B(CD). The four
cladograms enclosed within single-line boxes (a-c, y) accommodate the binary
character and both three-item statements with a single step each and thus explain
all of the data. The binary character fits to all other cladograms with two steps,
which thus appear to explain none of the data. However, the six cladograms
enclosed within double-line boxes (d-f, k, r-s) do accommodate one of the three-item
statements with one step and therefore explain at least part of the data.
Thus, although suboptimal, [...] the other remaining cladograms
(g-j, l-q, t-x, z) [...] to accommodate each three-item [...]
approach, the cladograms in Fig. 9.4k and Fig. 9.4l explain none of the data.
However, using three-item statements, the former cladogram includes one
statement, A(CD), and hence explains at least some of the data. As Platnick
et al. (1996) pointed out, switching from Fig. 9.4l to Fig. 9.4k should not be a
'zero-cost' option. The value of the three-item statements approach in this
context is that it is likely to be more sensitive to the accumulation of further
data than is the standard approach.
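The counts given in the caption of Fig. 9.4 can be verified by enumeration. In this sketch, which uses a representation of our own choosing, each topology for four taxa is described by its set of informative groups (groups of two or three taxa), and a set of groups forms a valid topology when every two groups are either nested or disjoint. A statement is counted as accommodated when some group contains its two related taxa but not the third.

```python
from itertools import combinations

TAXA = "ABCD"
# Every group of two or three taxa that could appear as a node.
CANDIDATE_GROUPS = [frozenset(group)
                    for size in (2, 3)
                    for group in combinations(TAXA, size)]


def is_hierarchy(groups):
    """True if every two groups are nested or disjoint, i.e. form a cladogram."""
    return all(a <= b or b <= a or a.isdisjoint(b)
               for a, b in combinations(groups, 2))


# All topologies, from the unresolved bush (no groups) to fully resolved cladograms.
topologies = [groups
              for size in range(len(CANDIDATE_GROUPS) + 1)
              for groups in combinations(CANDIDATE_GROUPS, size)
              if is_hierarchy(groups)]


def accommodates(groups, outside, pair):
    """True if some group unites the pair to the exclusion of the outside taxon."""
    return any(set(pair) <= group and outside not in group for group in groups)


STATEMENTS = [("A", "CD"), ("B", "CD")]  # the two statements implied by AB(CD)
fits = [sum(accommodates(groups, outside, pair) for outside, pair in STATEMENTS)
        for groups in topologies]

print(len(topologies), fits.count(2), fits.count(1), fits.count(0))
```

The enumeration finds 26 topologies, of which four accommodate both statements and six accommodate exactly one, matching the single-line and double-line boxes of Fig. 9.4; the remaining 16 accommodate neither.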
There is a further point that seems significant for coding protocols. The
standard approach treats the binary character AB(CD) as a feature relating C
and D together with respect to A and B. It says nothing about the relationships
of C and D relative to A or B. The standard character is restricted to
treating C + D as a group, with plesiomorphic states as uninformative with
respect to relationships. Under certain circumstances, the standard approach
will treat A + B as the 'group' rather than C + D, with the plesiomorphy (0)
interpreted as 'secondary' or a 'reversal'. This is different from crude phenetic
grouping by symplesiomorphy but nevertheless constitutes a version of
'grouping by plesiomorphy' (de Pinna 1996). Platnick et al. (1996) went
further, wondering 'how long it will take for systematists to realise that
allowing the '0' entries for taxa A and B to constrain potential resolutions can
also give this sort of 'negative evidence' more weight than it deserves, in at
least some circumstances'. It would also seem that the notion of character
reversal belongs to the realm of 'trees' rather than cladograms and thus again
constitutes a kind of model (see Chapter 1).
Of further interest is that the fully resolved solutions selected by the
standard approach as shortest (Fig. 9.4a-c) correspond to the 'Interpretation
1' solutions, originally proposed by Nelson and Platnick (1980) for dealing
with potential resolution of basal trichotomies. In contrast, the fully resolved
solutions selected by the three-item statements approach as shortest (Fig.
9.4a-f and k) correspond to the 'Interpretation 2' solutions proposed by
Nelson and Platnick (1980), where the close relationship of C and D is
maintained, even though A or B are more closely related to either C or D.
(The additional solutions with terminal trichotomies simply represent summaries
of the resolved cladograms. The cladogram in Fig. 9.4r is a summary
of the fully resolved cladograms in Fig. 9.4c-e, and the cladogram in Fig. 9.4s
is a summary of the fully resolved cladograms in Fig. 9.4a, f and k.) Finally,
Interpretations 1 and 2 bear some resemblance to Assumptions 1 and 2 in
biogeography, while 'secondary' symplesiomorphy, 'reversals', and plesiomorphies
as 'potentially informative' have a certain amount of similarity
with Assumption 0 of biogeography, a somewhat questionable protocol
(Humphries and Parenti in press).
It is worth recalling that three-item statements analysis began in biogeography.
Biogeography, in its modern cladistic form, deals with cladograms
of areas (for a recent summary, see Humphries and Parenti in press). A more
general question with respect to cladistics might be: is there an empirical way
of dealing with all branching diagrams, regardless of the 'kind' of data? Is
there a general theory of cladograms and hence a general theory of systematics?
The first analytical explorations of this question began, perhaps, with
Nelson and Platnick (1981), a much neglected but still highly relevant and
fundamental book. We may possibly question any direct analogy between
systematics and biogeography. However, one understanding of the differences
seems to reside in how characters are viewed, and one possible resolution
may lie in rejecting characters as 'transformation series' (ancestor-descendent
sequences) in favour of characters as statements of relationship
(A is more closely related to B than it is to C).
7. A statement should be read from a cladogram in the form 'C and D are
more closely related to each other than either is to A', and not in the
form of the standard approach, which sees nodes as possible (ancestral)
'transformations' of one character (or state) into another. Standard
optimization is irrelevant.
8. When fitted to a cladogram, a statement will have a ci of either 1.00
(when a statement fits a particular cladogram) or 0.50 (when it does not).
Hence CI is not a useful overall measure of fit as it simply distinguishes
the fit to a node of each statement.
9. The number of accepted (accommodated) three-item statements as a
fraction of the total number of statements considered is reflected by the
RI value. Therefore, RI might be a useful measure for the amount of fit
for each possible cladogram.
10. To improve precision in three-item statements analysis, attention must be
given to redundancy in the data and fractional weights used. Weights
should be scaled appropriately, in order to avoid introducing errors due
to the requirement of current computer programs for integer weights. The
final cladogram(s) should be minimal with all nodes supported by data.
11. One understanding of the difference between the standard approach and
the three-item approach resides in how character data are viewed: as
'transformation series' (ancestor-descendent sequences) or as statements
of relationships (A is more closely related to B than it is to C).
References
References cited
Adams, E. N. (1972). Consensus techniques and the comparison of taxonomic trees.
Systematic Zoology, 21, 390-7.
Alberch, P. (1985). Problems with the interpretation of developmental sequences.
Systematic Zoology, 34, 46-58.
Allard, M. W. and Carpenter, J. M. (1996). On weighting and congruence. Cladistics,
12, 183-98.
Almeida, M. T. and Bisby, F. A. (1984). A simple method for establishing taxonomic
characters from measurement data. Taxon, 33, 405-9.
Anderberg, A. and Tehler, A. (1990). Consensus trees, a necessity in taxonomic
practice. Cladistics, 6, 399-402.
Archie, J. W. (1985). Methods for coding variable morphological features for
numerical taxonomic analysis. Systematic Zoology, 34, 326-45.
Archie, J. W. and Felsenstein, J. (1993). The number of evolutionary steps on random
and minimum length trees for random evolutionary data. Journal of Theoretical
Biology, 45, 52-79.
Baer, K. E. von (1828). Ueber Entwickelungsgeschichte der Thiere: Beobachtung und
Reflexion, Theil 1. Gebrüder Bornträger, Königsberg.
Barthelemy, J.-P. and McMorris, F. R. (1986). The median procedure for n-trees.
Journal of Classification, 3, 329-34.
Barthelemy, J.-P. and Monjardet, B. (1981). The median procedure in cluster analysis
and social choice theory. Mathematical Social Sciences, 1, 235-67.
Baum, B. R. (1988). A simple procedure for establishing discrete characters from
measurement data, applicable to cladistics. Taxon, 37, 63-70.
Begle, D. P. (1991). Relationships of the osmeroid fishes and the use of reductive
characters in phylogenetic analysis. Systematic Zoology, 40, 33-53.
Bremer, K. (1988). The limits of amino-acid sequence data in angiosperm phylogenetic
reconstruction. Evolution, 42, 795-803.
Bremer, K. (1990). Combinable component consensus. Cladistics, 6, 369-72.
Bremer, K. (1994). Branch support and tree stability. Cladistics, 10, 295-304.
Brower, A. V. Z. and Schawaroch, V. (1996). Three steps to homology assessment.
Cladistics, 12, 265-72.
Bryant, H. N. (1992). The role of permutation tail probability tests in phylogenetic
systematics. Systematic Biology, 41, 258-63.
Bryant, H. N. (1995). Why autapomorphies should be removed: a reply to Yeates.
Cladistics, 11, 381-4.
Bull, J. J., Huelsenbeck, J. P., Cunningham, C. W., Swofford, D. L. and Waddell, P. J.
(1993). Partitioning and combining data in phylogenetic analysis. Systematic
Biology, 42, 384-97.
Cain, A. J. and Harrison, G. A. (1958). An analysis of the taxonomist's judgement of
affinity. Proceedings of the Zoological Society of London, 131, 85-98.
Camin, J. H. and Sokal, R. R. (1965). A method for deducing branching sequences in
phylogeny. Evolution, 19, 311-26.
Carpenter, J. M. (1988). Choosing among multiple equally parsimonious cladograms.
Cladistics, 4, 291-6.
Carpenter, J. M. (1992). Random cladistics. Cladistics, 8, 147-53.
Carpenter, J. M. (1996). Uninformative bootstrapping. Cladistics, 12, 177-81.
Chappill, J. A. (1989). Quantitative characters in phylogenetic analysis. Cladistics, 5,
217-34.
Coddington, J. and Scharff, N. (1994). Problems with zero-length branches. Cladistics,
10, 415-23.
Colless, D. B. (1980). Congruence between morphological and allozyme data for
Menidia species: a reappraisal. Systematic Zoology, 29, 288-99.
Cranston, P. S. and Humphries, C. J. (1988). Cladistics and computers: a chironomid
conundrum. Cladistics, 4, 72-92.
Davis, J. I. (1993). Character removal as a means for assessing the stability of clades.
Cladistics, 9, 201-10.
DeBry, R. W. and Slade, N. A. (1985). Cladistic analysis of restriction endonuclease
cleavage maps within a maximum-likelihood framework. Systematic Zoology, 34,
21-34.
de Queiroz, K. (1985). The ontogenetic method for determining character polarity and
its relevance to phylogenetic systematics. Systematic Zoology, 34, 280-99.
de Queiroz, K. (1993). For consensus (sometimes). Systematic Biology, 42, 368-72.
Devereux, R., Loeblich, A. R. and Fox, G. E. (1990). Higher plant origins and the
phylogeny of green algae. Journal of Molecular Evolution, 31, 18-24.
Dixon, M. T. and Hillis, D. M. (1993). Ribosomal RNA secondary structure:
compensatory mutations and implications for phylogenetic analysis. Molecular Biology
and Evolution, 10, 256-67.
Donoghue, M. J., Olmstead, R. G., Smith, J. F. and Palmer, J. D. (1992). Phylogenetic
relationships of Dipsacales based on rbcL sequences. Annals of the Missouri
Botanical Garden, 79, 672-85.
Eernisse, D. J. and Kluge, A. G. (1993). Taxonomic congruence versus total evidence,
and amniote phylogeny inferred from fossils, molecules, and morphology.
Molecular Biology and Evolution, 10, 1170-95.
Eldredge, N. (1979). Alternative approaches to evolutionary theory. Bulletin of the
Carnegie Museum of Natural History, 13, 7-19.
Faith, D. P. (1991). Cladistic permutation tests for monophyly and non-monophyly.
Systematic Zoology, 40, 366-75.
Faith, D. P. and Cranston, P. S. (1991). Could a cladogram this short have arisen by
chance alone? - On permutation tests for cladistic structure. Cladistics, 7, 1-28.
Farris, J. S. (1969). A successive approximations approach to character weighting.
Systematic Zoology, 18, 374-85.
Farris, J. S. (1970). Methods for computing Wagner trees. Systematic Zoology, 19,
83-92.
Farris, J. S. (1971). The hypothesis of nonspecificity and taxonomic congruence.
Annual Review of Ecology and Systematics, 2, 277-302.
Kluge, A. G. and Wolf, J. (1993). Cladistics: what's in a word? Cladistics, 9, 1-25.
Kraus, F. (1988). An empirical evaluation of the use of the ontogeny polarization
criterion in phylogenetic inference. Systematic Zoology, 37, 106-41.
Kubicka, E., Kubicka, G. and McMorris, F. R. (1995). An algorithm to find agreement
subtrees. Journal of Classification, 12, 91-9.
Loconte, H. and Stevenson, D. W. (1991). Cladistics of the Magnoliidae. Cladistics, 7,
267-96.
Lanyon, S. M. (1985). Detecting internal inconsistencies in distance data. Systematic
Zoology, 34, 397-403.
Larson, A. (1994). The comparison of morphological and molecular data in
phylogenetic systematics. In Molecular approaches to ecology and evolution (ed. B.
Schierwater, B. Streit, G. P. Wagner and R. DeSalle), pp. 371-90. Birkhäuser
Verlag, Basel.
Lauder, G. V. (1990). Functional morphology and systematics: studying functional
patterns in an historical context. Annual Review of Ecology and Systematics, 21,
317-40.
Littlewood, D. T. J. and Smith, A. B. (1995). A combined morphological and molecular
phylogeny for sea urchins (Echinoidea: Echinodermata). Philosophical
Transactions of the Royal Society of London, B, 347, 213-34.
Løvtrup, S. (1978). On von Baerian and Haeckelian recapitulation. Systematic Zoology,
27, 348-52.
Maddison, W. P. (1993). Missing data versus missing characters in phylogenetic
analysis. Systematic Biology, 42, 576-81.
Maddison, W. P., Donoghue, M. J. and Maddison, D. R. (1984). Outgroup analysis and
parsimony. Systematic Zoology, 33, 83-103.
Margush, T. and McMorris, F. R. (1981). Consensus n-trees. Bulletin of Mathematical
Biology, 43, 239-44.
Marshall, C. (1992). Substitution bias, weighted parsimony, and amniote phylogeny as
inferred from 18S rRNA sequences. Molecular Biology and Evolution, 9, 370-3.
Mayr, E. (1969). Principles of systematic zoology. McGraw-Hill, New York.
Mayr, E., Linsley, E. G. and Usinger, R. L. (1953). Methods and principles of systematic
zoology. McGraw-Hill, New York.
Meier, R. (1994). On the inappropriateness of presence/absence recoding for
non-additive multistate characters in computerized cladistic analyses. Zoologischer
Anzeiger, 232, 201-12.
Mickevich, M. F. (1982). Transformation series analysis. Systematic Zoology, 31,
461-78.
Mickevich, M. F. and Johnson, M. P. (1976). Congruence between morphological and
allozyme data in evolutionary inference and character evolution. Systematic
Zoology, 25, 260-70.
Mishler, B. (1994). Cladistic analysis of molecular and morphological data. American
Journal of Physical Anthropology, 94, 143-56.
Mitter, C. (1980). The Thirteenth Annual Numerical Taxonomy Conference.
Systematic Zoology, 29, 177-90.
Miyamoto, M. M. (1985). Consensus cladograms and general classification. Cladistics,
1, 186-9.
Miyamoto, M. M. and Boyle, S. M. (1989). The potential importance of mitochondrial
DNA sequence data to eutherian mammal phylogeny. In The hierarchy of life (ed.
B. Fernholm, K. Bremer and H. Jörnvall), pp. 437-50. Elsevier Science,
Amsterdam.
Miyamoto, M. M. and Fitch, W. M. (1995). Testing species phylogenies and
phylogenetic methods with congruence. Systematic Biology, 44, 64-7.
Mueller, L. D. and Ayala, F. J. (1982). Estimation and interpretation of genetic
distance in empirical studies. Genetical Research, 40, 127-37.
Neff, N. (1986). A rational basis for a priori character weighting. Systematic Zoology,
35, 110-23.
Nelson, G. J. (1973). The higher-level phylogeny of the vertebrates. Systematic
Zoology, 22, 87-91.
Nelson, G. J. (1978). Ontogeny, phylogeny, paleontology, and the biogenetic law.
Systematic Zoology, 27, 324-45.
Nelson, G. J. (1979). Cladistic analysis and synthesis: principles and definitions, with
a historical note on Adanson's Familles des Plantes (1763-1764). Systematic
Zoology, 28, 1-21.
Nelson, G. J. (1992). Reply to Harvey. Cladistics, 8, 356-60.
Nelson, G. J. (1993). Reply. Cladistics, 9, 261-5.
Nelson, G. J. (1994). Homology and systematics. In Homology: the hierarchical basis of
comparative biology (ed. B. K. Hall), pp. 101-49. Academic Press, San Diego.
Nelson, G. J. (1996). Nullius in verba. Journal of Comparative Biology, 1, 141-52.
Nelson, G. J. and Ladiges, P. Y. (1992). Information content and fractional weight of
three-taxon statements. Systematic Biology, 41, 490-4.
Nelson, G. J. and Ladiges, P. Y. (1993). Missing data and three-item analysis.
Cladistics, 9, 111-13.
Nelson, G. J. and Ladiges, P. Y. (1994). Three-item consensus: empirical test of
fractional weighting. In Models in phylogeny reconstruction, Systematics
Association Special Volume, No. 52 (ed. R. W. Scotland, D. J. Siebert and D. M.
Williams), pp. 193-207. Clarendon Press, Oxford.
Nelson, G. J. and Patterson, C. (1993). Cladistics, sociology and success: a comment on
Donoghue's critique of David Hull. Biology and Philosophy, 8, 441-3.
Nelson, G. J. and Platnick, N. I. (1980). Multiple branching in cladograms: two
interpretations. Systematic Zoology, 29, 86-91.
Nelson, G. J. and Platnick, N. I. (1981). Systematics and biogeography: cladistics and
vicariance. Columbia University Press, New York.
Nelson, G. J. and Platnick, N. I. (1991). Three-taxon statements: a more precise use of
parsimony? Cladistics, 7, 351-66.
Nixon, K. C. and Carpenter, J. M. (1993). On outgroups. Cladistics, 9, 413-26.
Nixon, K. C. and Carpenter, J. M. (1996a). On simultaneous analysis. Cladistics, 12,
221-41.
Nixon, K. C. and Carpenter, J. M. (1996b). On consensus, collapsibility, and clade
concordance. Cladistics, 12, 305-21.
Nixon, K. C. and Wheeler, Q. D. (1992). Extinction and the origin of species. In
Extinction and phylogeny (ed. Q. D. Wheeler and M. Novacek), pp. 119-43.
Columbia University Press, New York.
Novacek, M. (1992). Fossils as critical data for phylogeny. In Extinction and phylogeny
(ed. Q. D. Wheeler and M. Novacek), pp. 46-88. Columbia University Press, New
York.
[...] (ed. R. F. Doolittle), pp. 615-26.
Yeates, D. [...]
Suggestions for further reading
The number in parentheses after each entry is a cross-reference to the appropriate
chapter.
Italicized words or expressions in a definition have their own entry in the
glossary. 'See also' indicates a cross reference to a related topic, whereas 'Cf.'
is a cross reference to an antonym. Synonymous expressions have only a
single explanatory entry. 'Also known as' is a cross reference from the main
entry to a synonym; 'See' is a cross reference from a synonym to the main
entry. Where two or more alternative definitions for a term are provided,
then that given first is the accepted usage within the context of this book.
additive coding A method for representing ordered multistate characters as a
linked series of binary characters. Cf. non-additive coding.
agreement subtree A method of comparing two or more fundamental
cladograms that shows only the clades and taxa held in common. See also
greatest agreement subtree.
binary character A character that has only two observed states. Binary
characters are generally coded 0/1. They can be directed or undirected,
polarized or unpolarized, but are intrinsically ordered. Binary characters
cannot be unordered. See also multistate character.
Bremer support The number of extra steps required before a clade is lost
from the strict consensus tree of near-minimum length cladograms.
Also known as branch support, clade stability, decay index, length
difference.
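The Bremer support entry can be illustrated with a minimal sketch. It assumes that a pool of cladograms, each with its length and its set of clades, has already been found by some search program and that the pool is exhaustive up to the lengths of interest; a clade is then lost from the strict consensus as soon as the pool admits one cladogram that lacks it. The function name and the pool of lengths are hypothetical.

```python
def bremer_support(clade, trees):
    """Extra steps needed before `clade` is absent from at least one cladogram.

    `trees` is a list of (length, set_of_clades) pairs; `clade` is a
    frozenset of taxon names.
    """
    shortest = min(length for length, _ in trees)
    lacking = [length for length, tree_clades in trees if clade not in tree_clades]
    if not lacking:
        return None  # the pool is too small to bound support for this clade
    return min(lacking) - shortest


# Hypothetical pool: one most parsimonious cladogram (10 steps) and two longer ones.
pool = [
    (10, {frozenset("CD"), frozenset("BCD")}),
    (11, {frozenset("CD"), frozenset("AB")}),
    (13, {frozenset("BC"), frozenset("BCD")}),
]
support_cd = bremer_support(frozenset("CD"), pool)    # lost only at 13 steps
support_bcd = bremer_support(frozenset("BCD"), pool)  # lost at 11 steps
```

In practice the difficulty lies in finding the shortest cladogram that lacks each clade; the arithmetic itself is only a difference of two lengths.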
cladistic covariation The degree to which all characters in a data set are
explainable by the same cladogram topology.
combinable components consensus tree A consensus tree formed from all
the uncontradicted components in a set of fundamental cladograms;
that is, one that contains all the components found on the respective
strict consensus tree, plus those components that are uncontradicted by
less resolved components within the set of fundamental cladograms.
Also known as a semi-strict consensus tree.
compatible character Two characters that do not conflict in the groups that
they support are termed compatible. Compatible characters include
both those that are congruent and those that are consistent.
Glossary 203
component A group of taxa as determined by the branching pattern of a
cladogram. For example, in a group comprising three taxa A, B, and C,
where B and C are more closely related to each other than either is to
A, there are two components, ABC and BC. ABC is an uninformative
component, while BC is an informative component. Also known as
clade, monophyletic group.
consensus method A method for combining the grouping information
contained in a set of cladograms for the same taxa into a single topology,
the consensus tree. See Adams consensus tree, combinable components
consensus tree, compromise tree, majority-rule consensus tree, median
consensus tree, Nelson consensus tree, strict consensus tree.
constant character A character for which all taxa in a data set are allocated
the same code. One type of uninformative character.
convergence Two characters that pass the conjunction test of homology but
fail both the similarity and congruence tests are termed convergent.
Also known as homoiologies. See also homology, parallelism.
cost The number of steps required to account for the transformation of one
character state into another on a cladogram.
decisive data A data set that contains at least one phylogenetically
informative character. Decisive data yield cladograms that differ in length
among themselves and thus offer reasons for choosing some
cladograms in preference to others. Cf. undecisive data.
homologue, homology (1) Two characters that pass the similarity,
conjunction and congruence tests are termed homologous. Also known as
synapomorphy. (2) Character states that share modifications from
another condition, e.g. wings of birds in relation to forelimbs of other
tetrapods. See also convergence, parallelism, primary homology,
secondary homology.
informative component A component that includes more than one but less
than all of the terminal taxa in a data set. Cf. uninformative component.
ingroup node The node of a cladogram that unites all members of the
ingroup into a single clade. See also outgroup node.
jackknife monophyly index (JMI) The ratio of the sum of p(c_t), from t = 1
to T, to T, where T is the number of ingroup taxa and p(c_t) is the
proportion of the most parsimonious cladograms of pseudoreplicate t
in which clade c is supported.
jackknife strict consensus tree A consensus tree that includes all
components common to or consistent with all members of a set of jackknife
pseudoreplicate cladograms.
linkage (1) The condition under which two characters do not represent
independent evidence in support of a group. (2) The union of more
than two character states into a single multistate character. Also known
as character interdependency.
median consensus tree A consensus tree for which the degree of difference
between any pair of fundamental cladograms (as measured by a tree
comparison metric) is smaller than for any other cladogram of the same
taxa.
midpoint rooting A method of rooting that places the root at the midpoint
of the longest branch or path connecting two taxa.
nearest neighbours The branches arising from the nodes at either end of a
particular internal branch of a cladogram.
Nelson consensus tree A consensus tree formed from the clique of mutually
compatible components that are most replicated in a set of
fundamental cladograms.
outgroup node The node on a cladogram that unites the ingroup taxa with
the first outgroup (sister-group). See also ingroup node.
parallelism Two characters that pass both the similarity test and conjunction
test of homology but fail the congruence test are termed parallelisms.
See also convergence, homology.
polymorphic character (1) A character that can show two or more character
states within the same individual, e.g. alleles. (2) A character that can
show two or more states among different individuals of a taxon, e.g.
colour forms of some species of butterfly.
recapitulation (von Baerian) The view of development that states that two
taxa will share the same ontogenetic sequence up to the point that they
diverged into separate lineages and thus we would never expect the
ontogenetic sequence of an organism to pass through the stages found
in the adults of its ancestors. See also recapitulation (Haeckelian).
rescaled consistency index (rc) The product of the consistency index and the
retention index of a character.
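The product in the rescaled consistency index entry can be made concrete with a short sketch. It assumes the usual formulations ci = m/s and ri = (g - s)/(g - m), where m is the minimum possible number of steps for the character, s the number observed on the cladogram and g the maximum possible; these symbols and the function names are assumptions of the example, not part of the entry above.

```python
def consistency_index(m, s):
    """ci = m / s: minimum possible steps over observed steps."""
    return m / s


def retention_index(m, s, g):
    """ri = (g - s) / (g - m); undefined when g == m (uninformative character)."""
    if g == m:
        raise ValueError("ri is undefined for a character with g == m")
    return (g - s) / (g - m)


def rescaled_consistency_index(m, s, g):
    """rc = ci * ri, the product named in the glossary entry."""
    return consistency_index(m, s) * retention_index(m, s, g)


# Hypothetical binary character: minimum 1 step, 2 observed, 3 at most.
rc = rescaled_consistency_index(m=1, s=2, g=3)  # 0.5 * 0.5
```

Because ri falls to zero when a character requires its maximum number of steps, rc also falls to zero in that case, whereas ci alone cannot.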
root (1) The basal taxon of a cladogram on which all characters have been
polarized. (2) The starting point or base of a cladogram.
secondary homology An hypothesis of homology that has passed the
similarity, conjunction and congruence tests and is accepted as a
synapomorphy. See also primary homology.
similarity test A test of primary homology. To pass the similarity test, two
characters must be generally comparable in morphology, anatomy and
topographical position. See also congruence test, conjunction test.
sister-group(s) (1) Two taxa that are more closely related to each other
than either is to a third taxon. (2) The taxon that is genealogically most
closely related to the ingroup.
standard approach The method of cladistic analysis that codes the observed
features of taxa as binary characters and/or transformation series,
assesses the optimal cladogram in terms of length and investigates
hypotheses of character evolution using optimization procedures. Used
primarily in contradistinction to three-item statements analysis.
taxic approach An approach to cladistic analysis that uses only the
distributions of characters among taxa to hypothesize group membership. All
other properties of both characters and groups (e.g. polarity,
monophyly, transformation series) are derived from the resulting cladogram.
Cf. transformational approach.
term information The term information of a component is one less than the
number of terminal taxa included within that component. The term
information of a cladogram is the sum of the term information of all its
informative components. See also component information.
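The term information entry amounts to a simple count, sketched below under the assumption that a cladogram is written as nested tuples of taxon names (a hypothetical representation chosen for the example). Informative components are taken, as in the informative component entry, to be those that include more than one but fewer than all of the terminal taxa.

```python
def components(tree):
    """Return (list of component leaf sets, set of all terminal taxa)."""
    found = []

    def leaves(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.append(members)
        return members

    all_taxa = leaves(tree)
    return found, all_taxa


def term_information(tree):
    """Sum of (number of terminal taxa - 1) over the informative components."""
    comps, all_taxa = components(tree)
    return sum(len(c) - 1 for c in comps if 1 < len(c) < len(all_taxa))


# Components CD (1) and BCD (2) are informative; ABCD is not counted.
resolved = term_information(("A", ("B", ("C", "D"))))
# A completely unresolved cladogram has no informative components.
unresolved = term_information(("A", "B", "C", "D"))
```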
three-item statement The concept that two entities (taxa, areas) are more
closely related to each other than either is to any other third entity.
Also known as a three-taxon statement.
total support The sum of the Bremer support values of all branches on a
cladogram.
total support index The ratio of total support to the length of the most
parsimonious cladogram.
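The two entries above combine into a single ratio. The sketch below assumes that the Bremer support of every branch has already been obtained; the function name and the numbers are hypothetical.

```python
def total_support_index(bremer_values, cladogram_length):
    """Total support (sum of Bremer supports) divided by cladogram length."""
    total_support = sum(bremer_values)
    return total_support / cladogram_length


# Hypothetical cladogram of 10 steps with two supported branches.
ti = total_support_index([3, 1], 10)
```

Dividing by the length of the most parsimonious cladogram scales total support so that data sets of different sizes can be compared.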
undecisive data A data set that includes all possible informative characters
in equal numbers so that it is phylogenetically uninformative.
Undecisive data yields all possible fully resolved cladograms, which will all be
of the same length, and thus offer no reason for choosing some
cladograms in preference to others. Cf. decisive data.
unordered character A multistate character of which the order has not been
determined. In an unordered character, transformation between any
two states, whether adjacent or non-adjacent, costs the same number of
steps (usually one, see direction). For example, in the unordered
character, 0 ↔ 1 ↔ 2, the transformations 0 ↔ 1, 1 ↔ 2 and 0 ↔ 2 all
cost the same number of steps. Fitch optimization uses unordered
characters. Cf. ordered.
Wagner optimization The optimization procedure used for ordered,
unpolarized, undirected characters.
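The unordered character entry states that every transformation between states costs the same. A minimal sketch of how the length of such a character can be counted on a fully bifurcating, rooted cladogram is given below, using the familiar first pass of Fitch optimization (intersect the state sets of the two descendants, otherwise take their union and add one step). The nested-tuple cladogram, the function name and the data are hypothetical, and polytomies are not handled.

```python
def fitch_length(tree, states):
    """Count steps for one unordered character on a bifurcating cladogram.

    `tree` is a nested tuple of taxon names; `states` maps taxon -> state.
    """
    steps = 0

    def state_set(node):
        nonlocal steps
        if isinstance(node, str):
            return {states[node]}
        left, right = (state_set(child) for child in node)
        common = left & right
        if common:
            return common
        steps += 1  # no shared state: one transformation, whatever the states
        return left | right

    state_set(tree)
    return steps


tree = (("A", "B"), ("C", ("D", "E")))
# States 0, 1 and 2 each arise once: two steps, although 0 and 2 are non-adjacent.
length_a = fitch_length(tree, {"A": 0, "B": 0, "C": 1, "D": 2, "E": 2})
# A more homoplastic distribution of the same three states.
length_b = fitch_length(tree, {"A": 0, "B": 2, "C": 0, "D": 2, "E": 1})
```

Under Wagner optimization, by contrast, the same character treated as ordered would charge two steps for a change between states 0 and 2.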
The following is a list of those programs that are mentioned in this book or
that implement methods and procedures discussed in the text. More complete
listings of phylogenetic computer programs and packages can be found at:
http://evolution.genetics.washington.edu/phylip/software.html
http://www.nhm.ac.uk/hennig/software.html
Farris, J. S. (1988). Hennig86 version 1.5. MS-DOS program. Published by the author,
Port Jefferson Station, New York.
Goloboff, P. (1996a). PIWE version 2.51. MS-DOS program. Published by the author,
San Miguel de Tucuman, Argentina.
Goloboff, P. (1996b). NONA version 1.50. MS-DOS program. Published by the author,
San Miguel de Tucuman, Argentina.
Maddison, W. P. and Maddison, D. R. (1992). MacClade 3.01. Macintosh OS program.
Sinauer Associates, Sunderland, Massachusetts.
Nelson, G. J. and Ladiges, P. Y. (1995). TAX: MSDOS computer programs for
systematics. MS-DOS program. Published by the authors, New York and
Melbourne.
Nixon, K. (1992). CLADOS version 1.2. MS-DOS program. Cornell University, Ithaca.
Page, R. D. M. (1993). COMPONENT version 2.0. MS-DOS program for Windows.
The Natural History Museum, London.
Ramos, T. C. (1996). Tree Gardener version 1.0. MS-DOS program. Museu de
Zoologia/USP, Sao Paulo.
Siddall, M. E. (1996). Random Cladistics, version 4.0.3, Ohio edition. MS-DOS program.
Virginia Institute of Marine Sciences, Gloucester Point.
Swofford, D. L. (1993). PAUP, Phylogenetic Analysis Using Parsimony, version 3.1.
Macintosh OS program distributed by the Illinois Natural History Survey,
Champaign, Illinois.
Index
Page numbers in ilillie refer to the glossary. charat lcr optimization
Camin- Sokat optimiution 77- 8
COS t matrice~ 78- 9
accelerated transformation (ACCrRAN) Dollo op timization 75- 7
72-3, 199 Film optimization 73- 5
Adams oonsensu.' I r"'~~ 141, 147, ISO. 161, 199 generalized 78- 9
amino acid sequence, and character weighting
Wagner optimization 70- 3
103 character polarity
ancestors
a priori de termination 64- 7
as paraph)'lc!ic taxa 13- 14
biogeography 61 - 2
problems in cladis tic ana lys is 14
corlStraincd two-step analysis 64- 6
apomorphy
de termination 48- 69
defini tion 200
de lcrmin alion 48- 9 fossils 61
ArchotoplN)'l. cladistic problems 13-14 fu nct ional Yalue 62
au tapom urphy 5 Herrnigian argumentation 48
definit ion 200 ingroup oommOfiality 60- 1
an d terminal lalla 5 on togenelic cri terion 54- 60
outgroup com parison 49-54, 59-60
and paedomorphosis 58
binary charaClcTs progression nrlc 61 - 2
coding of data 29-30,36, 168- 9 simultaneou~, un constrai nt:d analYSis 66, 68
definition 200 strat igrap hy 61
Biogenetic Law 55, ZOO. 214 and underlying synapomorphy 62-4
biogeography character selcetion 19- 20,35- 6
chara cter polarity 61-2 character Slah!S 24- 6. 201
and three-item sta tements ana lysis 184- 5 recognition 2- 5
bootstrap methods 200 transformati on 30- 1
clade support 129- 31, 137 eharac1t: r weighting 99- 11 6, 208, 21 7, 220
branch length 200 and amino itcid sequence 103
clade su pport 126- 1, 131 a poJI~riOri 100, 102, 108, 110- 116, 220
branch support, see Bremer support a ptiori 100- 10, 11 7, 220
branch-and-bound melhod, d adogram con- character analysis 100- 1
struction 4 1- 2, 2O(}-1 and cladistic consis tency 110
branchi ng diagrams, see cladog rums compatibili ty ana lysis 108- 10
bntnch-$wapping
dynami<; 106
cladogram construction 45-8,201. 217,219
and ge netic code 103- 4
w mputer program s 46
hypothesis depende nt, set! character weight-
Breme r support 127- 9,137.201
ing. a posteriori
hypot hesis independent, .I 'te character
Camin- $okal optimizatiOn 77- 9, 201 weighting. a pril)fj
characte r implied 112- 15, ~
definition 23- 6, 201 molec ular data 102- 8
characte r coding, Jet' codi ng of oata morphological dala 101-2
character consistency, set! co(15iSlency index and rescaled consistency index (rc) I I I
clmncter fi t successive t06, 111 - 12, 217
dadogrnm lcngth 92- 5, 116 three-ite m statements analysis 113- 7
measurement 92- 9 tree dependent. Jet character weighting, u
statiM ieli1 anal ysis 95- 9, 11 6 fXl$lcriori
clutrHctcr length, Jtt character opti miza tion tree independent, see chKrllctc r weighting, a
chllntctel liokage 29-111 priori
224 I"dex
characters cladogram collstruction
adjace nt 199 branch.and.bound method 41-2, 2()()- J
apomorphic 2- 3 branch-swappillg 45-8, 201, 217, 219
roogTuenl 8- 10, 27 effidency 39- 40, 42-4, 46, 48
consis!ent 8, 9, 10 exact met hods 39- 42
conti nuous 19- 22,35- 6 Hen nigian argume ntat ion 38-9, 208
definition 23- 6, 201 heuristic methods 42- 8
diagnostic 23-4 'hill-climbing' techniques 42- 3
directed 205 nearest-neighbour imerehangc (NNJ) 45- 6,
discrete 21-2
211
filtering 19- 20,37
parsimony 38-48
gener~l 56- 7
and homo logy 24, 26- 7 stepwise addition 43- 5
homoplastic 9, 10 subtree pruning and regrafl ing (SPR) 45- 7
!lon-adjacent 21/ tree biscetion and recon necti on (TBR) 45- S
oon-ovcrklppi ng 22- 3 cladogram lengt h
ordered 2/2 Camin-Sokal optimization 77
overlappi ng 22- 3 character fit 92- 5, 116
plesiomorphic 2- 3 definition . 8, 209
quali tative 20- t Dollo opt i ~i lation 75- 7
quantilalivc 19- 21 meas ure of cha racter fit 92- 5
systematic 23- 4 relationship to da ta 172- 3, 117, 185
transfor mation 24- 5 th ree-item statements analysis 172- 3, 177,
undirected 219 179, ISS
unordere d 220
Wagner optimization 71 - 2
see also binary characters; mullistatc
c1adogram support, and partitioned analysis
characters
dade stab il ity, see Breme r support
.,9
dade support cladograms
bootstrap methods 129- 31, 137, 200 and biogeography 61-2
branch length 126-7, 137,200 comparison with !Tees 15- 18
Bremer support 127-9, 137, 201 confidence limits liS
clade stab ility index (CSi) 133- 4, 137-8,202 criteria [or optimality ISO
' jackboot'method 133 definition 1- 2, 202
jackknife sampl ing 13 1- 3, 137,209 dete rmination of 19
Momc Carlo methods 129- 31, 137 fundalnelltal 139-40, 143, 145, 14S-9
randomization proccdures 129- 35 length 8, 209
statistical 126-35 minimal 177- 8, 210
topology-dcpendcnt permutation tail proba.
optimal 8
bility (T-PTP) 134- 5, 138,218
and phylogeny 118, 160
total support ind<.lx (Ii) 128- 9, 218
resolution by sim ultaneous anal ysis 160
dildislic analysis
and arbiuilry consensus 161 rooting 5,64- 8, 2/5
charact<.l r coding, see coding of dillR statistical analysis 118- 38
chilracter seh,!(\ion 19- 20 steps S
cond itiona l data combination 162- 4 coding of data 19,27- 37
maximum likelihood apl}roach 158 add itive 199
morphometric dat a 33-5 binary characters 29- 30, 36, 168-9
and parsimony 5- 10 discrete variables 27- 32,36- 7
pilrlitioned 151-2.155, 157- 60 gap' coding 33
problem of ancestors 14 gap-weightin g 33- 5
simulta neous 15 1- 2,155, 160- 2 information content 32
standard appmach 210 missing values 31 - 2, 79- 90
strict consensus trees 153- 4
morphometric data 33- 5
see also three-i tem state men ts ~lIlalysis
cladistic co nsistcncy, ~nd cha racter w~ighting multist"te ch.. r"ctns 2'1, JI), l(i9
110, 202 nuclcutidcs JI)
cl.uJistics three.item litlltemellt~ IIn!lly,~is 161(- 7t
dd illitilm I, III.~, 1M Cl'nnhillllbic wmpo~.nl~, Clln1l<IIXUS Ir~c~
three· tll •• m ~tllt~mtnt 2 143-'. ''''- . ,
Index 225
com puter programs el,tdogram Icngth 8, 209
branch-swapping 46, 018 consistency indcx (cO 95, 201- 4
characte r weighti ng 11 0- 11 , 114 constrained, two-M ep Hnaiysi5 2Q4
cladogra m length 8 data decisiveness (DO) 120, 136, 204, 219
conscns us Irees 142- 4, 147 00110 optimization 105- 6
exac t methods 39- 42 ensemble consistency index (Cn 95- 6, 206
heuristic methods 42- 8 en$C:mble re tcntion index ( R I) 99, 206
mi nima l cladograms 177- 8 epigenetic characters 54
missing val ues 31,82-6 gap-weighting 207
morphometric dat a 33 homo logy 26- 7, 208
permutation tai l probability (PTP) test 125 homoplasy 208
polymorphic variables 20 illgroup 38, 209
three-item sta tements analysis 170- 2, 177- 9 lin kage 210
condi ti unal datil combination. dadi~tk ~ na lysi5 mono phyletic groups 10- 11 ,210
162-4 mul tistate characters 211
consensus analysis, see consensus trees non-epigenetic charactcrs 54
conse nsus IrtM IOU, 132, 139-50, 203 ontogenetic critc rion 21 /
Ada ms 141, 147, ISO, 161, 199 ou tgroup 212
agreement sublrees 141- 8, ISO, 199 pa ra phyletic groups 11 , 213
arbit ra l)' 161 pa rtitioned analysis 213
and Brcmer support 127-8 pe rmu ta tion tail probabili ty (PT P) 213
combinable compol1c ntS 143- 5, 149,102 polyphyletic groups 11 - 12,214
common pruned 147- 8, ISO, 102 relationship 1
greatest agreemc nt subtrees (GAS) 148, re tcntion index (rO 97, 2/ 5
150, 207 three- item sta tements analysis 185, 218
majority-rule 140, 145- 6, 149, 161,210 transforma tion series analysis (TSA) 218- 19
median 140- 1,145 - 6,149,210,219 underlying synapomorphy 62, 219
Nelson 14 1- 2,146- 7, )49, 211 see also glossary
se mi -strict 140, 143- 5, 149, 202, 216 delayed transfo rma tion (OELTRAN) 73
strict 140-3, 149, 153- 4, 177- 8,216- 17 discre te variables, character coding 27- 32,205
three-item stalemenlS analysis 177-8 dist ribution of c1adogram lengths (DeW,
consistency index (d) statistical analysis 120- 2, 136
an d cha racter weigh ting III 1Xl1l0 optimization 75 - 9
defini tion 95, 116, 203- 4 defin ition 205-6
effect of numbe r of tlUll 96 00llo's Law 11
effect of uni nformative characters 96
mcasurc of character fi t 95, 116
problems 96- 7, 116
th ree-item statements analysis 179 echinoids, cladistic analysis 152, 154, 163
see also ense mble consistency index (Cn ensemble consistency index (CI)
de finition 95- 6, 206
co nstrained two-step analysis
charac ter pola ri ty 64- 6 me8suremCn! of characte r fi t 95- 6, 111
dcfini tion 204 problems 96-7
cost ma trices 32, 78-9 ond statistical ana lysis 120,122- 3
criterion of ehorological progression, ue th ree-i tem statements analysis 119
~ also consis tency index (til
progression rule
cytoc hrome b, partitioned analysis 158 ensemble reten tion index (RI)
defin ition 99, 206
llleaSlIrcment of characte r fit 99, 11 7
da ta decisiveness (DO), definition 120, 136, and st8ti5ticailinalysis 120, 122
104, 219 three-item Statements analysis 179
decay index, SEe Bremer support see a/so retention index (rO
deer mice, cladistic analysis 155- 6, 158 epige netic characters, defin ition 54
defi nitions
apomorphy 200
Ullw pnmnrphy 2(K) FIG/ FOG methnd, outgroup eomparisoll
bimuy charac ters 200 49- 53
chllflu;tcr 23- 6, 201 Fitch IIptimirot iclIl 73-9, 207
d. di 5tK:1I I, IHS, 202 c05t mAt rix 711- 9
chldtllr um 1· 2, 202 nlln·.ddit lve chlflw:,!g. 7,l- ~, l21J
226 Index
fossils milkweed butterflies (Amauris), cladistic
character polarity 61 analysis 152-3, 161, 163-4
lack of certain types of data 61, 165 minimal cladograms, three-item statements
analysis 177-8
missing values
gap-weighting coding of data 31-2, 79-90
definition 107 and computer programs 31, 82-86
morphometric data 33-5 dealing with 82-8, 165
generalized character optimization 78-9, 207 effects on cladograms 81-2, 165
genes, partitioned analysis 157-8 and non-applicable character states 88-90
genetic code reasons for 79-80
and character weighting 103-4 three-item statements analysis 170
degeneracy \I)] molecular data
glossary 199-220
character weighting 1U1- K
grasshopper mice, cladistic analysis 155-6, 158
and partitioned analysis 157-9, 160, 161
greatest agreement subtrees (GAS) 148, 150,
size of data sets 159-60
20' monophyletic groups
definition 10-11, 15, liD
Haeckel, Biogenetic Law 55, 114 specification by synapomorphy 14
Hennigian argumentation Monte Carlo methods, clade support 129-31,
and character polarity 48 137
cladogram construction 38-9, 208 morphological data, character weighting
heuristic methods, cladogram construction 101-2
42-8 morphometric data
'hill-climbing' techniques, cladogram cladistic analysis 33-6
construction 42-3 coding of data 33-5
homoiology, and underlying synapomorphy 63 gap-weighting 33-5
homology most parsimonious cladograms, effects of missing
and characters 24, 26-7 values 81-2
definition 26-7, 208, 215 most parsimonious reconstruction (MPR) 71,
and synapomorphy 14, 27 73-5, 78
testing 14-15 multistate characters 21-2, 28-9
homoplasy coding of data 29, 36, 169
and Camin-Sokal optimization n definition 211
definition 108 three-item statements analysis 171
and Dollo optimization 75
measures of 92-9, 111, 116
and most parsimonious reconstruction 71
nearest-neighbour interchange (NNI),
and Wagner optimization 71
cladogram construction 45-6, 11l
Nelson consensus trees 141-2, 146-7, 149, 111
ingroup, definition 38, 209 non-additive characters, Fitch optimization
ingroup commonality, character polarity 60-1 73-5, 220
non-applicable character states, and missing
values 88-90
'jackboot' method, clade support 133 non-epigenetic characters, definition 54
jackknife sampling, clade support 131-3, 137, nucleic acids, partitioned analysis 157-8
209 nucleotide sequence data, character weighting
102-8
nucleotides, character coding 36
length difference, see Bremer support
linkage, definition 110
Lundberg rooting 67, 110
ontogenetic criterion
character polarity 54-60
majority-rule trees 140, 145-6, 149, 161, 2/(} definition 1.1 /
maximum likelihood approach, cladistic rooting 60, 68
lUmlys is lSI! Onhlltcny
median consensus trees 140-1, 14~ ~h, 149,
'1111 , 211) ....
IIIW~"'5iI
outgroup rooting
all-plesiomorphic 67, 199 cladograms 5, 64-8, 215
all-zero 67, 200 'hypothetical ancestor' 67
definition 212 Lundberg rooting 67
outgroup comparison midpoint rooting 65
algorithmic approach 52-4 ontogenetic criterion 60, 68
and character polarity 49-54, 59-60, 212 outgroup comparison 60, 67-8
rooting 60, 67-8 see also character polarity
'functional ingroup/functional outgroup'
(FIG/FOG) method 49-53
semi-strict consensus trees 140, 143-5, 149,
202, 216
simultaneous analysis, resolution of
paedomorphosis, cladistic problems 58 cladograms 160
paraphyletic groups simultaneous unconstrained analysis,
as ancestral groupings 12-14 character polarity 66, 68
definition 11, 15, 211 sister-groups 1
and symplesiomorphy 12-13 and apomorphic characters 3
parsimony statistical analysis
and cladistic analysis 5-10 cladograms 118-38
cladogram construction 38-48 consistency index (ci) 95-7, 111, 116, 179,
three-item statements analysis 180-1 203-4
see also most parsimonious cladograms; most data decisiveness (DD) 119-20, 136
parsimonious reconstruction (MPR) distribution of cladogram lengths (DCL)
partitioned analysis 120-2, 136
and character congruence 160 ensemble consistency index (CI) 95-7, 117,
and cladistic analysis 151-152 120, 122-3, 206
and cladogram support 159 ensemble retention index (RI) 99, 117, 120,
cytochrome b 158 122, 206
definition 213 incongruence between data sets 162
and molecular data 157-61 limitations 135
and phylogeny 160 permutation tail probability (PTP) 122-6,
and vicariance biogeography 151 136-7
permutation tail probability (PTP) randomization of data 118-19
definition 213 rescaled consistency index (RC) 111, 117,
problems 125-6 120
statistical analysis 122-6 retention index (ri) 97-9, 111, 116, 179, 215
phylogenetic systematics 1, 17, 213; see also skewness 121-2
cladistics Templeton's non-parametric test 162-3
phylogeny three-item statements analysis 179
and cladograms 118, 160 statistical support, clades 126-35
and ontogeny 55-9 stepwise addition, cladogram construction
polymorphic taxa, coding problems 89, 91 41-5
polyphyletic groups, definition 11-12, 15, 214 stratigraphy, character polarity 61, 216
progression rule, character polarity 61-2, 214 strict consensus trees 140-3, 149, 153-4,
177-8, 216-17
subtree pruning and regrafting (SPR),
relationship, definition 1 cladogram construction 45-7
rescaled consistency index (RC) support index, see Bremer support
and character weighting 111, 117 symplesiomorphy 3, 217
and statistical analysis 120 and paraphyletic groups 12-13
retention index (ri) synapomorphy 3, 217
and character weighting 111 and character polarity 62-4
definition 97, 215 evidence of relationships 168
measurement of character fit 97-9, 116 and homology 14, 27
three-item statements analysis 179 specification of monophyletic groups 14
see also ensemble retention index (RI)
rilxll'Om'll RNA t"~~1fl s"mpling
~h.n~Il' t er wci~ht inll HI~, ltI7 ~Iim,"hics It.4 -5
Imr titi(H1cd IInIt1r,... ..C'".,."'. (u"s il. 6 1, ItIS
three-item statements analysis tree bisection and reconnection (TBR),
accommodated 172, 199 cladogram construction 45-8
and binary characters 170-1 trees
and biogeography 184-5 comparison with cladograms 15-17
cladogram length 172-3, 177, 179, 185 terminology 17-18
coding of data 168-71
comparison with standard approach 169-70,
178-9, 180-4 underlying synapomorphy
computer programs 170, 171-2, 177, 178-9 definition 62, 119
consensus trees 177-8 and homoiology 63
definition 185, 218 uninformative characters, effect on
minimal cladograms 177-8 consistency index 96
and missing values 170 unit discriminate compatibility measure
multistate characters 171 (UDCM) 220
non-accommodated 173, 211 unordered characters, see non-additive
non-independence of statements 173-4 characters
parsimony 180-1
precision 180-5 vicariance biogeography, and partitioned
principles 168 analysis 151
statistical analysis 179 von Baer, recapitulation 55-6, 214-15
weighting 173-7
three-taxon statement 2
topology-dependent permutation tail Wagner optimization 70-3, 220
probability (T-PTP), clade character length n
support 134-5, 138, 218 cladogram length 71-2
total support index (ti), of clades 128-9, 218 cost matrix 78-9
transformation series analysis (TSA), and homoplasy 71
definition 118-19 most parsimonious reconstruction 71