Bootstraps and Testing Trees: 31 July 2002 Marine Biological Laboratory Woods Hole Molecular Evolution Workshop


Joe Felsenstein, Department of Genome Sciences, University of Washington, Seattle
[Figure: Likelihood curve (and interval) of Ts/Tn ratio. Vertical axis: ln L; horizontal axis: transition / transversion ratio.]
Constraints on a tree for a clock

[Figure: a rooted tree whose branch lengths are labeled v1 through v8.]

Constraints for a clock:

$v_1 = v_2$
$v_4 = v_5$
$v_1 + v_6 = v_3$
$v_3 + v_7 = v_4 + v_8$
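As a small illustration, the four clock constraints for this tree (v1 = v2, v4 = v5, v1 + v6 = v3, v3 + v7 = v4 + v8) can be checked numerically. The function name, the dictionary layout and the tolerance are assumptions made for this sketch; the branch lengths are made-up values.

```python
def satisfies_clock(v, tol=1e-9):
    """Return True if branch lengths v[1]..v[8] meet the four clock constraints.

    v is a dict mapping branch index (1..8) to branch length
    (a hypothetical layout chosen for this example).
    """
    residuals = [
        v[1] - v[2],                  # v1 = v2
        v[4] - v[5],                  # v4 = v5
        v[1] + v[6] - v[3],           # v1 + v6 = v3
        v[3] + v[7] - (v[4] + v[8]),  # v3 + v7 = v4 + v8
    ]
    return all(abs(r) <= tol for r in residuals)


# Made-up branch lengths that satisfy all four constraints.
clocklike = {1: 0.1, 2: 0.1, 3: 0.3, 4: 0.2, 5: 0.2, 6: 0.2, 7: 0.2, 8: 0.3}
print(satisfies_clock(clocklike))
```

Each constraint removes one free branch length, which is what makes the clock a restricted case of the unconstrained tree.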
[Figure: Likelihood surface (in x) for three clocklike trees. Vertical axis: ln Likelihood; horizontal axis: x.]
[Figure: Two trees to be tested using KHT test. Tree I and Tree II each relate Mouse, Bovine, Gibbon, Orang, Gorilla, Chimp and Human.]
Table of differences in log-likelihood by site

Site      Tree I      Tree II     Diff
1         -2.971      -2.983      +0.012
2         -4.483      -4.494      +0.111
3         -5.673      -5.685      +0.013
4         -5.883      -5.898      +0.015
5         -2.691      -2.700      +0.010
6         -8.003      -7.572      -0.431
...       ...         ...         ...
231       -2.971      -2.987      +0.012
232       -2.691      -2.705      +0.010
ln L      -1405.61    -1408.80    +3.19
[Figure: Histogram of differences in ln L among sites (Hasegawa 232-site data). Horizontal axis: difference in log likelihood at site.]
Paired sites tests

- Winning sites test (Prager and Wilson, 1988). Do a sign test on the signs of the differences.
- z test (me, 1993, in PHYLIP documentation). Assume differences are normal, do z test of whether mean (hence sum) difference is significant.
- t test (Swofford et al., 1996). Do a t test (paired).
- Wilcoxon ranked sums test (Templeton, 1983).
- RELL test (Kishino and Hasegawa, 1989, per my suggestion). Bootstrap resample sites, get distribution of difference of totals.
In this example ...

- Winning sites test. 160 of 232 sites favor tree I. $P < 3.279 \times 10^{-9}$
- z test. Difference of log-likelihood totals is 0.948104 standard deviations from 0, P = 0.343077. Not significant.
- t test. Same as z test for this large a number of sites.
- Wilcoxon ranked sums test. Rank sum is 4.82805 standard deviations below its expected value, P = 0.000001378765
- RELL test. 8,326 out of 10,000 samples have a positive sum, P = 0.3348 (two-sided)
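Three of these paired-sites tests can be sketched in a few lines, given only the per-site differences in log-likelihood. This is a minimal sketch, not the PHYLIP implementation: the function name, the exact two-sided binomial form of the sign test, and the default of 10,000 RELL replicates are assumptions, and the Wilcoxon test is left out. The slide's winning-sites P value may have been computed differently.

```python
import math
import random


def paired_sites_tests(diffs, n_boot=10000, seed=1):
    """Sign test, z test and RELL test on per-site lnL differences (tree I - tree II)."""
    n = len(diffs)
    total = sum(diffs)

    # Winning-sites test: exact two-sided binomial test on the signs (zeros dropped).
    wins = sum(1 for d in diffs if d > 0)
    losses = sum(1 for d in diffs if d < 0)
    m = wins + losses
    k = max(wins, losses)
    tail = sum(math.comb(m, j) for j in range(k, m + 1)) / 2 ** m
    p_sign = min(1.0, 2.0 * tail)

    # z test: is the sum of differences far from 0, in units of its standard deviation?
    mean = total / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    z = total / math.sqrt(n * var)
    p_z = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability

    # RELL test: resample sites with replacement and look at the sign of the total.
    rng = random.Random(seed)
    positive = sum(1 for _ in range(n_boot) if sum(rng.choices(diffs, k=n)) > 0)
    frac = positive / n_boot
    p_rell = 2.0 * min(frac, 1.0 - frac)

    return {"wins": wins, "p_sign": p_sign, "z": z, "p_z": p_z, "p_rell": p_rell}


# Made-up differences: 8 sites mildly favor tree I, 2 favor tree II.
toy_diffs = [0.1] * 8 + [-0.1] * 2
print(paired_sites_tests(toy_diffs, n_boot=500))
```

The example in the slides shows why the tests can disagree: many sites favor tree I by a small amount, so the sign and rank tests are significant, while a few sites with large differences in the other direction keep the sum from being significant in the z and RELL tests.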
[Figure: a plot with axes "data value" and "parameter value", part of it shaded.]

For each parameter value, find data values (unshaded) that account for 95% of the probability.

Then, given a data value, the parameters that are in the 95% confidence region are those for which that data value is in the unshaded region.
A 3-species clocklike tree with Jukes-Cantor model

Possible data patterns for species A, B, C (x, y, z stand for different bases): xxx, xxy, xyx, yxx, xyz. We will ignore all outcomes except these three: xxy, xyx, yxx.

Expected frequencies of xxy, xyx, yxx: $(p_1, p_2, p_3)$

[Figure: the points $(p, q, q)$, $(q, p, q)$ and $(q, q, p)$ plotted on axes $p_1$, $p_2$, $p_3$.]
Test of 3-species Tree with a Clock (Felsenstein, 1985)

[Figure: possible data (informative characters) plotted by the number of characters supporting tree 1, tree 2 and tree 3 (axes from 0 to 10), with the confidence region marked.]
Statistic: number of steps different between best and next best tree

Chars      S (0.05)
4          4
5          5
6          4
7          5
8          4
9-13       5
14-20      6
21-29      7
[Figure: Bootstrap sampling from a distribution (a mixture of two normals) to estimate the variance of the mean. Labels: (unknown) true distribution; (unknown) true value of θ; empirical distribution of sample; estimate of θ; bootstrap replicates; distribution of estimates of parameters.]
Bootstrap sampling

To infer the error in a quantity, $\theta$, estimated from a sample of points $x_1, x_2, \ldots, x_n$, we can:

Do the following $R$ times ($R = 1000$ or so):

- Draw a "bootstrap sample" by sampling $n$ times with replacement from the sample. Call these $x^*_1, x^*_2, \ldots, x^*_n$. Note that some of the original points are represented more than once in the bootstrap sample, some once, some not at all.
- Estimate $\theta$ from the bootstrap sample; call this $\hat{\theta}^*_k$ ($k = 1, 2, \ldots, R$).

When all $R$ bootstrap samples have been done, the distribution of $\hat{\theta}^*_i$ estimates the distribution one would get if one were able to draw repeated samples of $n$ points from the unknown true distribution.
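The procedure can be sketched as follows for the case where the quantity of interest is the mean. The function name and the toy sample (drawn from a mixture of two normals, echoing the earlier figure) are assumptions made for this illustration.

```python
import random
import statistics


def bootstrap_estimates(sample, estimator, R=1000, seed=0):
    """Return R bootstrap replicates of estimator(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(R):
        # Sample n points with replacement from the original sample.
        boot = rng.choices(sample, k=n)
        replicates.append(estimator(boot))
    return replicates


# Toy data: 100 points from a mixture of two normals (made-up parameters).
data_rng = random.Random(123)
sample = [data_rng.gauss(0.0, 1.0) if data_rng.random() < 0.5 else data_rng.gauss(4.0, 1.0)
          for _ in range(100)]

reps = bootstrap_estimates(sample, statistics.fmean, R=1000)
print("estimate of the mean:", statistics.fmean(sample))
print("bootstrap standard error:", statistics.stdev(reps))
```

The spread of the replicates stands in for the spread one would see across repeated samples from the unknown true distribution. Any other estimator can be passed in place of the mean.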
[Figure: Bootstrap sampling of phylogenies. The original data (sequences by sites) give the estimate of the tree. Bootstrap samples #1, #2 (and so on) are each made by sampling the same number of sites, with replacement, and each gives a bootstrap estimate of the tree (#1, #2, and so on).]
The sites are assumed to have evolved independently given the tree. They are the entities that are sampled (the xi). The trees play the role of the parameter. One ends up with a cloud of R sampled trees. To summarize this cloud, we ask, for each branch in the tree, how frequently it appears among the cloud of trees. We make a tree that summarizes this for all the most frequently occurring branches. This is the majority rule consensus tree of the bootstrap estimates of the tree.
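Resampling sites means resampling whole alignment columns, so that each sampled site keeps its bases for every species. A minimal sketch, assuming the alignment is held as a dictionary from species name to sequence string (a layout chosen for this example, with made-up sequences):

```python
import random


def bootstrap_alignment(alignment, rng):
    """Sample the same number of sites, with replacement, keeping columns intact."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    # Draw n_sites column indices with replacement.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(alignment[name][c] for c in cols) for name in names}


# Toy alignment (made-up sequences).
alignment = {"Human": "ACGTAC", "Chimp": "ACGTAT", "Gorilla": "ACGAAT"}
rng = random.Random(42)
replicate = bootstrap_alignment(alignment, rng)
for name, seq in replicate.items():
    print(name, seq)
```

Each replicate alignment would then be passed to the same tree-estimation method used on the original data, which is outside the scope of this sketch.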
Trees:

[Figure: the unrooted bootstrap trees on species A to F.]

How many times each partition of species is found:

AE | BCDF      3
ACE | BDF      3
ACEF | BD      1
AC | BDEF      1
AEF | BCD      1
ADEF | BC      2
ABDF | EC      1
ABCE | DF      3

Majority-rule consensus tree of the unrooted trees:

[Figure: the consensus tree, containing the three partitions found 3 times (AE | BCDF, ACE | BDF, ABCE | DF), each labeled 60.]
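Counting partitions and keeping those found in more than half of the trees can be sketched as below. The five trees listed are an assumption: they are one set of trees consistent with the partition counts above, not necessarily the trees drawn on the slide. Each tree is written as its three interior partitions, naming one side of each.

```python
from collections import Counter

TAXA = frozenset("ABCDEF")


def canonical(side):
    """Name a partition by the side that contains the first taxon (A)."""
    side = frozenset(side)
    return side if min(TAXA) in side else TAXA - side


def majority_rule_splits(trees):
    """Return {partition: frequency} for partitions in more than half of the trees."""
    counts = Counter(canonical(side) for tree in trees for side in tree)
    n_trees = len(trees)
    return {split: c / n_trees for split, c in counts.items() if c / n_trees > 0.5}


# Hypothetical trees, chosen only to reproduce the partition counts in the table.
trees = [
    ["AE", "ACE", "DF"],
    ["AE", "AEF", "BC"],
    ["AE", "BC", "DF"],
    ["AC", "ACE", "DF"],
    ["EC", "ACE", "BD"],
]

for split, freq in majority_rule_splits(trees).items():
    other = TAXA - split
    print("".join(sorted(split)), "|", "".join(sorted(other)), round(100 * freq))
```

Partitions that each occur in more than half of the trees are always compatible with one another, so they can be drawn together as one tree.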
[Figure: An example of bootstrap sampling of trees. 232-nucleotide, 14-species mitochondrial D-loop data set, analyzed by parsimony, 100 bootstrap replicates. Species: Bovine, Mouse, Squir Monk, Chimp, Human, Gorilla, Orang, Gibbon, Rhesus Mac, Jpn Macaq, CrabE.Mac, BarbMacaq, Tarsier, Lemur. Branches are labeled with bootstrap percentages.]
Potential problems with the bootstrap

1. Sites may not evolve independently.
2. Sites may not come from a common distribution (but we can consider them sampled from a mixture of possible distributions).
3. If we do not know which branch is of interest at the outset, a multiple-tests problem means P values are overstated.
4. P values are biased (too conservative).
5. Bootstrapping does not correct biases in phylogeny methods.
[Figure: A model showing the bias in bootstrap P values. Labels: true value of mean; distribution of individual values of x; true distribution of sample means; estimated distributions of sample means; "Topology" I; "Topology" II.]

[Figure: Illustration of the source of the bias. Labels: estimate of the "phylogeny"; topology I; topology II; the true mean. Note that the true P is more extreme than the average of the Ps.]
[Figure: Extent of the bias in the example. Vertical axis: average P (0.00 to 1.00); horizontal axis: true P (0.00 to 1.00).]
[Figure: Probability of being correct and variance of prior. Vertical axis: probability of correct topology (0.00 to 1.00); horizontal axis: P value (0.00 to 1.00); curves labeled 2.0, 1.0 and 0.1.]
Other resampling methods

- Delete-half jackknife. Sample a random 50% of the sites, without replacement.
- Delete-1/e jackknife (Farris et al., 1996) (too little deletion from a statistical viewpoint).
- Reweighting characters by choosing weights from an exponential distribution.
- In fact, reweighting them by any exchangeable weights having coefficient of variation of 1.
- Parametric bootstrap: simulate data sets of this size assuming the estimate of the tree is the truth.
- Block-bootstrapping (Künsch, 1989) (to correct for correlation among adjacent sites): sample n/b blocks of b adjacent sites.
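Three of these schemes differ from the ordinary bootstrap only in how site indices (or site weights) are drawn. The sketch below shows one way to draw them; the function names are assumptions, and the block bootstrap is written in a moving-blocks form in which a block may start at any site, which is one reading of "sample n/b blocks of b adjacent sites".

```python
import random


def delete_half_jackknife(n_sites, rng):
    """Indices of a random 50% of the sites, drawn without replacement."""
    return sorted(rng.sample(range(n_sites), n_sites // 2))


def exponential_weights(n_sites, rng):
    """One weight per site from an exponential distribution with mean 1.

    Such weights have coefficient of variation 1.
    """
    return [rng.expovariate(1.0) for _ in range(n_sites)]


def block_bootstrap(n_sites, b, rng):
    """Sample n_sites // b blocks of b adjacent sites, with replacement."""
    n_blocks = n_sites // b
    starts = [rng.randrange(n_sites - b + 1) for _ in range(n_blocks)]
    return [start + offset for start in starts for offset in range(b)]


rng = random.Random(0)
print(delete_half_jackknife(10, rng))
print(block_bootstrap(12, 3, rng))
```

In each case the returned indices or weights would be applied to the columns of the alignment before the tree is re-estimated.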
[Figure: Delete-half jackknife P values (compare with bootstrap), shown on the tree of the same 14 species.]
Exact computation of the effects of deletion fraction for the jackknife

Of the $n$ characters, $n_1$ support group 1 and $n_2$ support group 2 (suppose 1 and 2 are conflicting groups). Of the $n(1-\delta)$ characters in a resampled data set, $m_1$ support group 1 and $m_2$ support group 2.

We can compute for various $n$'s the probabilities of getting more evidence for group 1 than for group 2. A typical result is for $n_1 = 10$, $n_2 = 8$, $n = 100$:

Method                     Prob(m1 > m2)    Prob(m1 >= m2)    Prob(m1 > m2) + 1/2 Prob(m1 = m2)
Bootstrap                  0.6384           0.7230            0.6807
Jackknife, delta = 1/2     0.5923           0.7587            0.6755
Jackknife, delta = 1/e     0.6441           0.8040            0.7240
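One way to carry out such an exact computation is sketched below. It rests on two assumptions that are not stated in the slides: the bootstrap counts (m1, m2) are treated as multinomial with n draws, and the jackknife keeps a fixed number of characters chosen without replacement, so that (m1, m2) are multivariate hypergeometric. The function names are invented for this example.

```python
from math import comb


def _summarize(gt, eq):
    """Return Prob(m1 > m2), Prob(m1 >= m2), Prob(m1 > m2) + 1/2 Prob(m1 = m2)."""
    return gt, gt + eq, gt + 0.5 * eq


def bootstrap_probs(n1, n2, n):
    """Exact probabilities when n characters are drawn with replacement."""
    p1, p2 = n1 / n, n2 / n
    p0 = 1.0 - p1 - p2
    gt = eq = 0.0
    for m1 in range(n + 1):
        for m2 in range(n - m1 + 1):
            # Multinomial probability of (m1, m2, n - m1 - m2).
            pr = comb(n, m1) * comb(n - m1, m2) * p1**m1 * p2**m2 * p0**(n - m1 - m2)
            if m1 > m2:
                gt += pr
            elif m1 == m2:
                eq += pr
    return _summarize(gt, eq)


def jackknife_probs(n1, n2, n, keep):
    """Exact probabilities when `keep` characters are drawn without replacement."""
    n0 = n - n1 - n2
    total = comb(n, keep)
    gt = eq = 0.0
    for m1 in range(min(n1, keep) + 1):
        for m2 in range(min(n2, keep - m1) + 1):
            rest = keep - m1 - m2
            if rest > n0:
                continue
            # Multivariate hypergeometric probability of (m1, m2, rest).
            pr = comb(n1, m1) * comb(n2, m2) * comb(n0, rest) / total
            if m1 > m2:
                gt += pr
            elif m1 == m2:
                eq += pr
    return _summarize(gt, eq)


print("bootstrap:", bootstrap_probs(10, 8, 100))
print("delete-half jackknife:", jackknife_probs(10, 8, 100, keep=50))
```

The third quantity splits ties evenly between the two groups, which makes the bootstrap and the delete-half jackknife easier to compare than either of the first two columns alone.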
The Parametric Bootstrap (Efron, 1985)

Suppose we have independent observations $x_1, x_2, x_3, \ldots, x_n$ drawn from a known distribution, and a parameter, $\theta$, calculated from this.

To infer the variability of $\hat{\theta}$:

- Use the current estimate, $\hat{\theta}$.
- Use the distribution that has that as its true parameter.
- Sample $R$ data sets from that distribution, each having the same sample size as the original sample. Each data set $x^*_1, x^*_2, x^*_3, \ldots, x^*_n$ gives an estimate $\hat{\theta}^*_i$.

Then take the distribution of the $\hat{\theta}^*_i$ as the estimate of the distribution from which $\hat{\theta}$ is drawn.
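A minimal sketch of the parametric bootstrap, using an exponential distribution as a stand-in for the known family. The choice of distribution, the function name and the data values are assumptions made for this illustration.

```python
import random
import statistics


def parametric_bootstrap_exponential(sample, R=1000, seed=0):
    """Parametric bootstrap for the rate of an exponential distribution.

    Returns the estimate from the original sample and R replicate estimates.
    """
    rng = random.Random(seed)
    n = len(sample)
    rate_hat = 1.0 / statistics.fmean(sample)  # maximum likelihood estimate of the rate
    replicates = []
    for _ in range(R):
        # Simulate a data set of the same size, treating rate_hat as the truth.
        simulated = [rng.expovariate(rate_hat) for _ in range(n)]
        replicates.append(1.0 / statistics.fmean(simulated))
    return rate_hat, replicates


# Made-up observations.
observations = [0.5, 1.0, 1.5, 2.0, 0.25, 0.75, 3.0, 1.0]
rate_hat, reps = parametric_bootstrap_exponential(observations)
print("estimate:", rate_hat)
print("standard deviation of replicates:", statistics.stdev(reps))
```

The only difference from the ordinary bootstrap is the source of the replicate data sets: they are simulated from the fitted model instead of being resampled from the data.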
A resampling approach to distributions of the likelihood ratio statistics

Goldman (1993) suggests that, in cases where we may wonder whether the Likelihood Ratio Test statistic really has its desired $\chi^2$ distribution, we can:

- Take our best estimate of the tree.
- Simulate on it the evolution of data sets of the same size.
- For each replicate, calculate the LRT statistic.
- Use this as the distribution and see where the actual LRT value lies in it (e.g., in the upper 5%?).

This, of course, is a parametric bootstrap.
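The logic of this test can be sketched with a toy problem in place of a tree: a likelihood ratio test of whether a coin is fair. Everything specific here is an assumption for illustration (the coin model, the function names, and the (exceed + 1) / (R + 1) convention for the P value). In the phylogenetic case the simulation step would evolve sequences on the estimated tree, which is not shown.

```python
import math
import random


def xlogy(x, y):
    """x * ln(y), with the convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)


def lrt_statistic(k, n, p0=0.5):
    """2 * (ln L at the ML estimate p = k/n, minus ln L at p = p0)."""
    ln_l1 = xlogy(k, k / n) + xlogy(n - k, (n - k) / n)
    ln_l0 = xlogy(k, p0) + xlogy(n - k, 1.0 - p0)
    return max(0.0, 2.0 * (ln_l1 - ln_l0))  # clamp tiny negative rounding errors


def parametric_bootstrap_p(observed, simulate, statistic, R, rng):
    """Fraction of simulated statistics at least as large as the observed one."""
    sims = [statistic(simulate(rng)) for _ in range(R)]
    exceed = sum(1 for s in sims if s >= observed)
    return (exceed + 1) / (R + 1)


N_FLIPS = 100


def simulate_heads(rng):
    """Simulate a data set of the same size under the null hypothesis (fair coin)."""
    return sum(1 for _ in range(N_FLIPS) if rng.random() < 0.5)


def heads_statistic(k):
    return lrt_statistic(k, N_FLIPS)


rng = random.Random(11)
observed = lrt_statistic(60, N_FLIPS)  # made-up observation: 60 heads in 100 flips
print(parametric_bootstrap_p(observed, simulate_heads, heads_statistic, 1000, rng))
```

The simulated statistics take the place of the chi-square table: the observed value is judged against the distribution it actually has under the fitted null model.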
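[Figure: The parametric bootstrap for trees. The estimate of the tree from the original data is used in computer simulation to produce data set #1, data set #2, data set #3, ..., data set #100; estimation of a tree from each gives T1 through T100.]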
References

Bremer, K. 1988. The limits of amino acid sequence data in angiosperm phylogenetic reconstruction. Evolution 42: 795-803. [Bremer support]

Cavender, J. A. 1978. Taxonomy with confidence. Mathematical Biosciences 40: 271-280. [Pioneering paper on confidence intervals on trees]

Efron, B. 1979. Bootstrap methods: another look at the jackknife. Annals of Statistics 7: 1-26. [The original bootstrap paper]

Efron, B. 1985. Bootstrap confidence intervals for a class of parametric problems. Biometrika 72: 45-58. [The parametric bootstrap]

Farris, J. S., V. A. Albert, M. Källersjö, D. Lipscomb, and A. G. Kluge. 1996. Parsimony jackknifing outperforms neighbor-joining. Cladistics 12: 99-124. [The delete-1/e jackknife for phylogenies]

Felsenstein, J. 1981b. Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution 17: 368-376. [Mentions possibility of likelihood ratio tests]

Felsenstein, J. 1985a. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39: 783-791. [The bootstrap first applied to phylogenies]

Felsenstein, J. 1985b. Confidence limits on phylogenies with a molecular clock. Systematic Zoology 34: 152-161.

Felsenstein, J. and H. Kishino. 1993. Is there something wrong with the bootstrap on phylogenies? A reply to Hillis and Bull. Systematic Biology 42: 193-200. [A more detailed exposition of the bias of P values in a normal case]

Fisher, R. A. 1922. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, A 222: 309-368. [Fisher's great likelihood paper, with mention of asymptotic variances of MLEs]

Goldman, N. 1993. Statistical tests of models of DNA substitution. Journal of Molecular Evolution 36: 182-98. [Parametric bootstrapping for testing models]

Harshman, J. 1994. The effect of irrelevant characters on bootstrap values. Systematic Zoology 43: 419-424. [Not much effect on parsimony whether or not you include invariant characters when bootstrapping]

Hasegawa, M. and H. Kishino. 1989. Confidence limits on the maximum-likelihood estimate of the hominoid tree from mitochondrial-DNA sequences. Evolution 43: 672-677. [The KHT test]

Hasegawa, M. and H. Kishino. 1994. Accuracies of the simple methods for estimating the bootstrap probability of a maximum-likelihood tree. Molecular Biology and Evolution 11: 142-145. [RELL probabilities]

Hillis, D. M. and J. J. Bull. 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Systematic Biology 42: 182-192. [Bias in P values seen in a large simulation study]

Huelsenbeck, J. P. and B. Rannala. 1997. Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science 276: 227-232 (11 April). [Review of hypothesis testing with trees]

Huelsenbeck, J. P. and K. A. Crandall. 1997. Phylogeny estimation and hypothesis testing using maximum likelihood. Annual Review of Ecology and Systematics 28: 437-466. [Review]

Kishino, H. and M. Hasegawa. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. Journal of Molecular Evolution 29: 170-179. [The KHT test with likelihoods]

Kishino, H., T. Miyata and M. Hasegawa. 1990. Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. Journal of Molecular Evolution 31: 151-160.

Künsch, H. R. 1989. The jackknife and the bootstrap for general stationary observations. Annals of Statistics 17: 1217-1241. [The block-bootstrap]

Margush, T. and F. R. McMorris. 1981. Consensus n-trees. Bulletin of Mathematical Biology 43: 239-244. [Majority-rule consensus trees]

Mueller, L. D. and F. J. Ayala. 1982. Estimation and interpretation of genetic distance in empirical studies. Genetical Research 40: 127-137. [Suggest conventional jackknife to assess variance of branch length]

Penny, D. and M. D. Hendy. 1985. Testing methods of evolutionary tree construction. Cladistics 1: 266-278. [Use jackknife resampling to assess accuracy of tree reconstruction, independently of my use of the bootstrap]

Prager, E. M. and A. C. Wilson. 1988. Ancient origin of lactalbumin from lysozyme: analysis of DNA and amino acid sequences. Journal of Molecular Evolution 27: 326-335. [Winning-sites test]

Sanderson, M. J. 1995. Objections to bootstrapping phylogenies: a critique. Systematic Biology 44: 299-320. [Good but he accepts a few criticisms I would not have accepted]

Shimodaira, H. and M. Hasegawa. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Molecular Biology and Evolution 16: 1114-1116. [Correction of KHT test for multiple hypotheses]

Sitnikova, T., A. Rzhetsky, and M. Nei. 1995. Interior-branch and bootstrap tests of phylogenetic trees. Molecular Biology and Evolution 12: 319-333. [The interior-branch test]

Templeton, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37: 221-224. [The first paper on the KHT test]

Wu, C. F. J. 1986. Jackknife, bootstrap and other resampling plans in regression analysis. Annals of Statistics 14: 1261-1295. [The delete-half jackknife]

Zharkikh, A., and W.-H. Li. 1992. Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences. I. Four taxa with a molecular clock. Molecular Biology and Evolution 9: 1119-1147. [Discovery and explanation of bias in P values]
This Microsoft-free presentation prepared with:

- PDFLaTeX (mathematical typesetting and PDF preparation)
- Free Pascal Compiler (calculating curves)
- GNU Plotutils (plotting curves)
- Idraw (drawing program to modify plots and draw figures)
- Adobe Acrobat Reader (to display the PDF in full-screen mode)
- Linux (operating system)
