Phylogenetic Trees
Bas E. Dutilh
Systems Biology: Bioinformatic Data Analysis
Utrecht University, February 14th 2017
2/13/17
Phylogeny
• Term coined by Ernst Haeckel (1866)
– Phylon (φῦλον)
• Tribe
• Race
– Genesis
• Birth
• Origin
• At every node in the tree, a new lineage is born
• All lineages in a tree are related because they descend from the same root
• Tree topology shows how the lineages are related
Phylogenetic trees
• A phylogenetic tree represents the phylogeny of species or sequences
The horizontal lines are branches and represent evolutionary lineages changing over time.
The vertical lines represent nodes or evolutionary splits. The length of a vertical line has no meaning; these lines just show which branches are connected.
Nodes in a tree
• Tips (sometimes called “leaves” or “terminal nodes”)
– Present day species or sequences
– The only things we can directly measure
– Contain information used to build the tree
• Ancestral nodes (or “internal nodes”)
– Last common ancestor of its daughter lineages
• Root (sometimes)
– Last common ancestor of the whole tree
– If the tree is rooted, then the time axis is defined away from the root
http://epidemic.bio.ed.ac.uk/how_to_read_a_phylogeny
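The node types above map naturally onto a small recursive data structure. The following is a minimal sketch, not taken from the lecture: the class `Node`, the helpers `tips` and `count_internal`, and the example tree ((A,B),C) are all hypothetical names and data chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of a rooted tree; `name` is set only for tips in this sketch."""
    name: Optional[str] = None
    children: List["Node"] = field(default_factory=list)

    def is_tip(self) -> bool:
        # Tips (leaves, terminal nodes) have no daughter lineages.
        return not self.children


def tips(node: Node) -> List[str]:
    """Names of all tips descending from `node`, in drawing order."""
    if node.is_tip():
        return [node.name]
    names: List[str] = []
    for child in node.children:
        names.extend(tips(child))
    return names


def count_internal(node: Node) -> int:
    """Number of ancestral (internal) nodes at or below `node`, root included."""
    if node.is_tip():
        return 0
    return 1 + sum(count_internal(child) for child in node.children)


# Hypothetical rooted tree ((A,B),C): the root is the last common
# ancestor of all three tips.
root = Node(children=[Node(children=[Node("A"), Node("B")]), Node("C")])
print(tips(root))            # ['A', 'B', 'C']
print(count_internal(root))  # 2
```

Only the tips carry names here, mirroring the point that tips are the only things we can measure directly; internal nodes exist only as inferred groupings of their descendants.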
Evolutionary time
• The evolutionary divergence between lineages can be measured by using evolving characters
– For molecular phylogenies, the unit is generally mutations per sequence site
[Figure axis label: Evolutionary time]
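To make the unit "mutations per sequence site" concrete, here is a minimal sketch that computes the proportion of differing sites between two aligned sequences (often called the p-distance). The function name `p_distance` and the example sequences are made up for illustration. Note that this simple proportion is only a rough stand-in for evolutionary divergence: it does not correct for multiple mutations at the same site, which substitution models are designed to handle.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x == "-" or y == "-":
            continue  # skip alignment gaps
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Two hypothetical aligned sequences that differ at 2 of 10 sites.
print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 0.2
```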
Evolutionary distance
• The evolutionary distance between two nodes can be calculated as the sum of all the horizontal branch lengths between them
– For example, the distance between virus3 and virus7 is: 11 x (scale bar length) = 0.77 mutations/site
http://epidemic.bio.ed.ac.uk/how_to_read_a_phylogeny
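The summation described above can be sketched in a few lines. This is an illustrative example only: the tree, the node names, and the branch lengths in `TREE` are invented and do not reproduce the figure on the slide. The tree is stored as a map from each node to its parent and the length of the branch leading to it.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical tree: child -> (parent, length of the branch leading to the child).
# Branch lengths are in mutations per site; all names and values are made up.
TREE: Dict[str, Tuple[Optional[str], float]] = {
    "root": (None, 0.0),
    "n1": ("root", 0.07),
    "virus1": ("n1", 0.14),
    "virus2": ("n1", 0.21),
    "virus3": ("root", 0.35),
}


def path_to_root(node: str) -> List[str]:
    """Nodes visited when walking from `node` up to the root, inclusive."""
    path = [node]
    while TREE[path[-1]][0] is not None:
        path.append(TREE[path[-1]][0])
    return path


def distance(a: str, b: str) -> float:
    """Sum of branch lengths on the path between nodes `a` and `b`."""
    ancestors_of_a = set(path_to_root(a))
    # Last common ancestor: first node on b's path that is also above a.
    lca = next(n for n in path_to_root(b) if n in ancestors_of_a)
    total = 0.0
    for start in (a, b):
        node = start
        while node != lca:
            parent, length = TREE[node]
            total += length
            node = parent
    return total


print(f"{distance('virus1', 'virus2'):.2f}")  # 0.35
print(f"{distance('virus1', 'virus3'):.2f}")  # 0.56
```

The path between two tips always runs up to their last common ancestor and back down, so only horizontal branch lengths contribute, as on the slide.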
A case story
• In August 1994, a nurse in Lafayette, LA, tests negative for HIV
• A few weeks later, she breaks off a messy 10-year affair with a doctor
• Three weeks later, while she is suffering from chronic fatigue symptoms, the doctor gives his ex-mistress a vitamin B-12 shot, somewhat against her will
• In January 1995, the nurse tests positive for both HIV and hepatitis C. Investigation reveals no obvious means of infection (a sexual partner who tested positive, an accident with a patient, et cetera). The vitamin B-12 shot becomes suspicious
• The doctor’s office records from that day are conveniently missing, but are eventually found by police, buried in the back of a closet. The records show that the doctor had drawn blood samples from a known HIV patient and a known hepatitis C patient on the same day as the vitamin B-12 shot. The record keeping is not in line with standard office procedure, and there is no information as to what happened to either blood sample
• The nurse never had contact with either patient
• Seemingly strong, but still circumstantial, evidence that the doctor deliberately infected the nurse with HIV and hepatitis C
HIV phylogeny
[Figure: HIV phylogeny with three bracketed groups of strains]
• HIV strains found in patient
• HIV strains found in victim
• HIV strains found in other individuals from Lafayette
Branch support
• Support values show you how reliable a branching split is
• Mostly displayed as values or circles
– Often between 0 and 1, or 0-100%
• Branches that are not well supported might be collapsed
– This means the topology is unclear
– Bifurcating → multifurcating branch
Support* (figure legend): 100%, 50%, 1%
Note: this is the same split (in an unrooted tree)
*more about this later
http://epidemic.bio.ed.ac.uk/how_to_read_a_phylogeny
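Collapsing a poorly supported branch can be sketched as follows. This is an illustrative example under stated assumptions: the `Node` class, the `collapse` and `newick` helpers, the support values, and the 0.5 cutoff are all hypothetical, and branch lengths are ignored for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: Optional[str] = None       # set for tips only
    support: Optional[float] = None  # support (0-1) of the split leading to this node
    children: List["Node"] = field(default_factory=list)


def collapse(node: Node, threshold: float) -> Node:
    """Return a copy in which internal nodes with support below `threshold`
    are dissolved and their children attached directly to the parent."""
    new_children: List[Node] = []
    for child in node.children:
        child = collapse(child, threshold)
        weak = (
            bool(child.children)
            and child.support is not None
            and child.support < threshold
        )
        if weak:
            new_children.extend(child.children)  # bifurcation becomes multifurcation
        else:
            new_children.append(child)
    return Node(node.name, node.support, new_children)


def newick(node: Node) -> str:
    """Topology-only Newick-style string, without the trailing semicolon."""
    if not node.children:
        return node.name
    return "(" + ",".join(newick(c) for c in node.children) + ")"


# Hypothetical tree: split (A,B) has support 0.4, split (C,D) has support 0.95.
tree = Node(children=[
    Node(support=0.4, children=[Node("A"), Node("B")]),
    Node(support=0.95, children=[Node("C"), Node("D")]),
])
print(newick(tree))                 # ((A,B),(C,D))
print(newick(collapse(tree, 0.5)))  # (A,B,(C,D))
```

After collapsing, A and B hang directly from the parent node: the tree no longer claims to know how they are related to the (C,D) group.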
Rotating branches
• In the tree on the left, it looks like bat viruses are all grouped together, but they are not in one lineage!
• Because the vertical dimension has no meaning, branches can be freely rotated
• The two trees in the figure are identical; only one branch is rotated
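One way to check that rotating branches does not change a tree is to compare trees in a form that ignores the drawing order of the children. The sketch below assumes rooted trees written as nested tuples of unique tip names; the function `canonical` and the virus names are hypothetical.

```python
def canonical(tree):
    """Order-independent form of a rooted tree given as nested tuples of tip names."""
    if isinstance(tree, str):
        return tree
    # Sort the children at every node so that rotations no longer matter.
    return tuple(sorted((canonical(child) for child in tree), key=repr))


# Two drawings of the same hypothetical tree; every node in `rotated` is flipped.
left = (("bat1", "human1"), ("bat2", ("bat3", "human2")))
rotated = ((("human2", "bat3"), "bat2"), ("human1", "bat1"))
# A genuinely different topology, in which bat1 and bat2 are sister lineages.
different = (("bat1", "bat2"), ("human1", ("bat3", "human2")))

print(canonical(left) == canonical(rotated))    # True
print(canonical(left) == canonical(different))  # False
```

In `left`, the bat viruses may be drawn next to each other after some rotations, yet no single lineage contains all of them and nothing else; only the nesting of the parentheses carries meaning.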
Normal display, polar display, radial display
[Figure labels: terminal nodes, internal nodes, root]
http://epidemic.bio.ed.ac.uk/how_to_read_a_phylogeny
Time
Origin of mammals
Origin of vertebrates
Origin of animals
Origin of eukaryotes
Earliest fossils
Origin of life
Ancestral states
• Can we figure out what animal the very first virus infected?
• We know that evolution tends to be conservative
• We can infer ancestral states by assuming the fewest possible changes in the tree
• This is called the parsimony principle
http://epidemic.bio.ed.ac.uk/how_to_read_a_phylogeny
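The parsimony principle can be illustrated with the bottom-up pass of Fitch's algorithm, one standard way to find the minimum number of state changes on a fixed tree. The slides do not name a specific algorithm, so this choice is an assumption; the tree, the virus names, and the host assignments below are likewise invented for illustration.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree as nested pairs of tip names.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, tip_state: Dict[str, str]) -> Tuple[Set[str], int]:
    """Return (candidate states at this node, minimum number of changes below it)."""
    if isinstance(tree, str):
        return {tip_state[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, tip_state)
    right_set, right_changes = fitch(right, tip_state)
    shared = left_set & right_set
    if shared:
        # Both daughters can share a state: no change needed at this node.
        return shared, left_changes + right_changes
    # No shared state: at least one change happened on a daughter branch.
    return left_set | right_set, left_changes + right_changes + 1


# Hypothetical hosts for four viruses.
host = {"virus1": "bat", "virus2": "bat", "virus3": "human", "virus4": "bat"}
tree = ((("virus1", "virus2"), "virus3"), "virus4")

states, changes = fitch(tree, host)
print(states, changes)  # {'bat'} 1
```

In this made-up example, a bat host at the root explains the data with a single host switch, so parsimony favors a bat as the host of the very first virus.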
Question
Tree 1 (tips, top to bottom): Mouse A, Human A, Mouse B, Human B
Tree 2 (tips, top to bottom): Mouse C, Mouse D, Human C, Human D
Evolution of function
Annotation transfer OK
Annotation transfer not OK
Annotation transfer OK
Annotation transfer OK
Gene loss
Gene loss
Wolbachia branch
• … or just one contaminated sample ☹
Unrecognized paralogy
• Another mechanism that causes conflict between the phylogenetic tree of a gene and the species tree is unrecognized paralogy
species A
ancestor
Phylogenetic inconsistencies
• The phylogenies of different genes from the same genomes can be inconsistent
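A simple way to detect such an inconsistency is to compare the sets of clades (groups of tips below each internal node) implied by two rooted gene trees. The sketch below is illustrative only: the function `clades`, the species names, and both gene trees are hypothetical.

```python
def clades(tree):
    """Set of clades (frozensets of tip names) in a rooted nested-tuple tree."""
    found = set()

    def visit(node):
        if isinstance(node, str):
            return frozenset([node])
        members = frozenset().union(*(visit(child) for child in node))
        found.add(members)
        return members

    visit(tree)
    return found


# Hypothetical trees of two genes sampled from the same three genomes.
gene_a = (("human", "mouse"), "chicken")
gene_b = (("human", "chicken"), "mouse")

conflicting = clades(gene_a) ^ clades(gene_b)  # clades found in only one tree
print(clades(gene_a) == clades(gene_b))  # False
print(len(conflicting))                  # 2
```

Identical clade sets mean the two rooted topologies agree; any clade present in only one tree marks a point of conflict that needs an explanation, such as gene loss, contamination, or unrecognized paralogy.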