Bootstraps and Testing Trees: 31 July 2002 Marine Biological Laboratory Woods Hole Molecular Evolution Workshop
[Figure: ln L (axis ticks 2620, 2625) plotted against a quantity on an axis with ticks at 10, 20, 50, 100, 200.]
[Figure: tree with tip labels A, B, D, E and branch lengths v1 to v8.]
Constraints shown with the tree: v1 + v6 = v3 and v3 + v7 = v4 + v8
[Figure: ln Likelihood (axis ticks 204, 205, 206) for trees with tips A and B and a branch labeled x; other tick labels 0.10, 0.10, 0.20.]
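[Figure: Tree I and Tree II, each with tips including Mouse and Bovine.]
[Table: ln L at each site under each tree, for sites up to 231 and 232.]
Tree I: 2.971 4.483 5.673 5.883 2.691 8.003; total ln L 1405.61
Tree II: 2.983 4.494 5.685 5.898 2.700 7.572; total ln L 1408.80
Diff: +0.012 +0.010 ...; total +3.19
[Figure: plot with axis ticks 0.50, 0.0, 0.50, 1.0, 1.5, 2.0.]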
Winning sites test (Prager and Wilson, 1988). Do a sign test on the signs of the differences.
z test (me, 1993, in PHYLIP documentation). Assume the differences are normal; do a z test of whether the mean (hence the sum) difference is significant.
t test (Swofford et al., 1996). Do a paired t test.
Wilcoxon ranked sums test (Templeton, 1983).
RELL test (Kishino and Hasegawa, 1989, per my suggestion). Bootstrap resample sites, get the distribution of the difference of totals.
z test. Difference of log-likelihood totals is 0.948104 standard deviations from 0, P = 0.343077. Not significant.
t test. Same as z test for this large a number of sites.
Wilcoxon ranked sums test. Rank sum is 4.82805 standard deviations below its expected value, P = 0.000001378765.
RELL test. 8,326 out of 10,000 samples have a positive sum, P = 0.3348 (two-sided).
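The z, winning-sites, and RELL tests can be sketched on a list of per-site log-likelihood differences. This is only an illustration under stated assumptions, not the PHYLIP implementation: the function name `paired_site_tests`, the input list `diffs`, and the choice of an exact binomial sign test are all hypothetical choices made for the example.

```python
import math
import random
import statistics


def paired_site_tests(diffs, n_boot=2000, seed=1):
    """Compare two trees from per-site log-likelihood differences."""
    rng = random.Random(seed)
    n = len(diffs)
    total = sum(diffs)

    # z test: total difference in units of its estimated standard deviation.
    sd_total = statistics.stdev(diffs) * math.sqrt(n)
    z = total / sd_total
    p_z = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area

    # Winning-sites test: exact two-sided sign test, ignoring tied sites.
    wins = sum(d > 0 for d in diffs)
    losses = sum(d < 0 for d in diffs)
    m = wins + losses
    k = min(wins, losses)
    p_sign = min(1.0, 2 * sum(math.comb(m, i) for i in range(k + 1)) / 2 ** m)

    # RELL test: resample sites with replacement, keeping the per-site
    # values fixed, and count how often the resampled total is positive.
    positive = 0
    for _ in range(n_boot):
        if sum(rng.choice(diffs) for _ in range(n)) > 0:
            positive += 1
    frac = positive / n_boot
    p_rell = 2 * min(frac, 1 - frac)

    return {"z": z, "p_z": p_z, "p_sign": p_sign, "p_rell": p_rell}
```

The two-sided RELL P value here follows the same arithmetic as the slide: 8,326 positive sums out of 10,000 gives 2 x (1 - 0.8326) = 0.3348.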
For each parameter value, find the data values (unshaded in the figure) that account for 95% of the probability.
Then, given a data value, the parameters that are in the 95% confidence region are those for which that data value is in the unshaded region.
[Figure: axes are data value and parameter value.]
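The construction just described can be sketched for a simple case. The binomial model, the grid of parameter values, and the function names below are assumptions chosen for illustration; the "unshaded" data values are taken to be the most probable outcomes that together reach 95%.

```python
import math


def acceptance_region(p, n, level=0.95):
    """Data values (counts 0..n) that together hold at least `level` of the probability."""
    probs = [(math.comb(n, k) * p**k * (1 - p) ** (n - k), k) for k in range(n + 1)]
    probs.sort(reverse=True)  # most probable outcomes first
    region, covered = set(), 0.0
    for prob, k in probs:
        region.add(k)
        covered += prob
        if covered >= level:
            break
    return region


def confidence_region(x, n, grid_size=999, level=0.95):
    """Parameter values whose acceptance region contains the observed count x."""
    grid = [(i + 1) / (grid_size + 1) for i in range(grid_size)]
    return [p for p in grid if x in acceptance_region(p, n, level)]


# Hypothetical data: 7 successes in 20 trials.
region = confidence_region(7, 20)
print(min(region), max(region))
```

Other ways of choosing the 95% set of data values (for example, equal tail areas) would give somewhat different regions; the inversion step is the same.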
Site patterns for species A, B, C (x, y, z stand for different bases): xxx, xxy, xyx, yxx, xyz
We will ignore all outcomes except three of these.
[Figure: diagram labeled p1, p2, p3, with points (q, p, q) and (q, q, p).]
[Figure: confidence region plotted on axes "supporting tree 1" and "supporting tree 2", each running from 0 to 10.]
Statistic: number of steps different between best and next-best tree.
[Table: Chars against S (0.05); S (0.05) values as extracted: 4 5 4 5 4 5 6 7.]
[Figure: bootstrap sampling from a distribution (a mixture of two normals) to estimate the variance of the mean; axis label: estimate of θ.]
Bootstrap sampling
To infer the error in a quantity, θ, estimated from a sample of points x_1, x_2, ..., x_n, we can:
Do the following R times (R = 1000 or so):
  Draw a bootstrap sample by sampling n times with replacement from the sample. Call these x*_1, x*_2, ..., x*_n. Note that some of the original points are represented more than once in the bootstrap sample, some once, some not at all.
  Estimate θ from the bootstrap sample; call this θ*_k (k = 1, 2, ..., R).
When all R bootstrap samples have been done, the distribution of the θ*_k estimates the distribution one would get if one were able to draw repeated samples of n points from the unknown true distribution.
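A minimal sketch of this procedure, estimating the variance of the mean of a sample from a mixture of two normals. The mixture parameters, sample size, seeds, and the helper name `bootstrap_estimates` are assumptions made for the example.

```python
import random
import statistics


def bootstrap_estimates(sample, estimator, R=1000, seed=0):
    """Return R bootstrap replicates of estimator(sample)."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(R):
        # Draw n points with replacement from the original sample.
        resample = [rng.choice(sample) for _ in range(n)]
        estimates.append(estimator(resample))
    return estimates


# Hypothetical data: 100 points from an equal mixture of N(0, 1) and N(4, 1).
data_rng = random.Random(42)
sample = [
    data_rng.gauss(0.0, 1.0) if data_rng.random() < 0.5 else data_rng.gauss(4.0, 1.0)
    for _ in range(100)
]

boot_means = bootstrap_estimates(sample, statistics.mean)
boot_var = statistics.variance(boot_means)             # bootstrap variance of the mean
naive_var = statistics.variance(sample) / len(sample)  # textbook s^2 / n, for comparison
print(boot_var, naive_var)
```

For the mean the two numbers should be close; the point of the bootstrap is that the same loop works for estimators with no simple variance formula.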
[Figure: the original data matrix (sequences by sites), followed by bootstrap sample #1, bootstrap sample #2, and so on, each a matrix of sequences by sites.]
The sites are assumed to have evolved independently given the tree. They are the entities that are sampled (the xi). The trees play the role of the parameter. One ends up with a cloud of R sampled trees. To summarize this cloud, we ask, for each branch in the tree, how frequently it appears among the cloud of trees. We make a tree that summarizes this for all the most frequently occurring branches. This is the majority rule consensus tree of the bootstrap estimates of the tree.
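Resampling the sites can be sketched as follows. The toy alignment and the function name `bootstrap_alignment` are hypothetical, and the tree-inference step that would be run on each replicate is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns (sites) with replacement; rows are sequences."""
    n_sites = len(next(iter(alignment.values())))
    # The same sampled columns are used for every sequence, so each
    # resampled site keeps its pattern of bases across species intact.
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


# Hypothetical three-sequence alignment with six sites.
alignment = {"A": "ACGTAC", "B": "ACGTTC", "C": "TCGAAC"}
rng = random.Random(0)
replicates = [bootstrap_alignment(alignment, rng) for _ in range(3)]
for rep in replicates:
    print(rep)
```

Each replicate would then be analyzed with the same tree method as the original data, giving the cloud of R trees.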
Trees:
[Figure: a set of unrooted trees on the species A to F.]
How many times each partition of species is found:
AE | BCDF     3
ACE | BDF     3
ACEF | BD     1
AC | BDEF     1
AEF | BCD     1
ADEF | BC     2
ABDF | EC     1
ABCE | DF     3
Majority-rule consensus tree of the unrooted trees:
[Figure: consensus tree on A to F whose internal branches are each labeled 60.]
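The majority-rule step can be checked directly from the partition counts on this slide. The number of trees is not stated in the text; five is an assumption, inferred from the 60% labels (3 of 5) and from the counts summing to 15, which is three internal partitions for each of five six-species trees.

```python
# Partition counts as given on the slide.
partition_counts = {
    "AE | BCDF": 3,
    "ACE | BDF": 3,
    "ACEF | BD": 1,
    "AC | BDEF": 1,
    "AEF | BCD": 1,
    "ADEF | BC": 2,
    "ABDF | EC": 1,
    "ABCE | DF": 3,
}
n_trees = 5  # assumption: inferred from the 60% labels, not stated in the text


def majority_rule(counts, n_trees):
    """Keep partitions found in more than half of the trees, with percent support."""
    return {
        split: 100 * count / n_trees
        for split, count in counts.items()
        if count > n_trees / 2
    }


support = majority_rule(partition_counts, n_trees)
for split, percent in support.items():
    print(split, percent)
```

Partitions that each occur in more than half of the trees are always mutually compatible, so the retained partitions can be assembled into a single tree.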
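An example of bootstrap sampling of trees: 232-nucleotide, 14-species mitochondrial D-loop data set, analyzed by parsimony, 100 bootstrap replicates.
[Figure: tree with bootstrap values on its branches (extracted values: 84, 77, 99, 99, 100, 49); tip labels include Tarsier and Lemur.]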
Potential problems with the bootstrap
1. Sites may not evolve independently.
2. Sites may not come from a common distribution (but we can consider them sampled from a mixture of possible distributions).
3. If we do not know which branch is of interest at the outset, a multiple-tests problem means P values are overstated.
4. P values are biased (too conservative).
5. Bootstrapping does not correct biases in phylogeny methods.
[Figure: regions labeled "Topology" I and "Topology" II.]
Note that the true P is more extreme than the average of the Ps.
[Figure: average P plotted against true P, both axes running from 0.00 to 1.00.]
[Figure: curves labeled 0.1, 1.0, and 2.0, plotted against P value on axes running from 0.00 to 1.00.]
Other resampling methods
Delete-half jackknife. Sample a random 50% of the sites, without replacement.
Delete-1/e jackknife (Farris et al., 1996) (too little deletion from a statistical viewpoint).
Reweighting characters by choosing weights from an exponential distribution.
In fact, reweighting them by any exchangeable weights having coefficient of variation of 1.
Parametric bootstrap: simulate data sets of this size assuming the estimate of the tree is the truth.
Block bootstrapping (Künsch, 1989) (to correct for correlation among adjacent sites): sample n/b blocks of b adjacent sites.
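Three of these schemes can be sketched as generators of site indices or weights. The function names are hypothetical, the jackknife is written here as keeping a fixed number of sites, and the block bootstrap is assumed to allow overlapping blocks; none of this is taken from a particular program.

```python
import random


def jackknife_sites(n, fraction_deleted, rng):
    """Keep a random subset of sites, sampled without replacement."""
    keep = round(n * (1 - fraction_deleted))
    return sorted(rng.sample(range(n), keep))


def exponential_weights(n, rng):
    """One weight per character from an exponential distribution with mean 1."""
    return [rng.expovariate(1.0) for _ in range(n)]


def block_bootstrap(n, b, rng):
    """Sample n // b blocks of b adjacent sites, with replacement."""
    starts = [rng.randrange(n - b + 1) for _ in range(n // b)]
    return [start + offset for start in starts for offset in range(b)]


rng = random.Random(0)
half = jackknife_sites(232, 0.5, rng)   # delete-half jackknife
weights = exponential_weights(232, rng)
blocks = block_bootstrap(232, 8, rng)   # 29 blocks of 8 adjacent sites
print(len(half), len(weights), len(blocks))
```

The exponential distribution has a coefficient of variation of 1, which is the property the slide singles out for the reweighting scheme.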
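[Figure: tree with support values on its branches (extracted values: 84, 69, 50, 32, 72, 80, 98, 99, 100, 59); tip labels include Tarsier and Lemur.]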
[Figure: diagram labeled "n characters", "m1", "m2", and "n(1) characters" (as extracted).]
We can compute for various n's the probabilities of getting more evidence for group 1 than for group 2.
A typical result is for n1 = 10, n2 = 8, n = 100:

                                   Prob(m1 > m2)   Prob(m1 ≥ m2)   Prob(m1 > m2) + 1/2 Prob(m1 = m2)
Bootstrap                          0.6384          0.7230          0.6807
Jackknife, deletion fraction 1/2   0.5923          0.7587          0.6755
Jackknife, deletion fraction 1/e   0.6441          0.8040          0.7240
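Probabilities of this kind can be computed exactly under a simple model. The model below is an assumption, not taken from the slides: n1 of the n characters favor group 1 and n2 favor group 2; the bootstrap draws n characters with replacement, and the jackknife keeps a fixed number of characters drawn without replacement. Function names are hypothetical.

```python
import math


def summarize(greater, equal):
    return {
        "gt": greater,
        "ge": greater + equal,
        "gt_plus_half_eq": greater + equal / 2,
    }


def bootstrap_probs(n1, n2, n):
    """Multinomial sampling of n characters with replacement."""
    p1, p2 = n1 / n, n2 / n
    p0 = 1 - p1 - p2
    greater = equal = 0.0
    for m1 in range(n + 1):
        for m2 in range(n - m1 + 1):
            rest = n - m1 - m2
            prob = math.comb(n, m1) * math.comb(n - m1, m2) * p1**m1 * p2**m2 * p0**rest
            if m1 > m2:
                greater += prob
            elif m1 == m2:
                equal += prob
    return summarize(greater, equal)


def jackknife_probs(n1, n2, n, keep):
    """Sampling `keep` of the n characters without replacement."""
    total = math.comb(n, keep)
    greater = equal = 0.0
    for m1 in range(min(n1, keep) + 1):
        for m2 in range(min(n2, keep - m1) + 1):
            rest = keep - m1 - m2
            # math.comb returns 0 when `rest` exceeds the characters available.
            prob = math.comb(n1, m1) * math.comb(n2, m2) * math.comb(n - n1 - n2, rest) / total
            if m1 > m2:
                greater += prob
            elif m1 == m2:
                equal += prob
    return summarize(greater, equal)


print(bootstrap_probs(10, 8, 100))
print(jackknife_probs(10, 8, 100, keep=50))                          # delete half
print(jackknife_probs(10, 8, 100, keep=round(100 * (1 - 1 / math.e))))  # delete 1/e
```

If the model matches the one behind the slide, the three rows should land close to the tabulated values; the third column is by construction the average of the first two.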
To infer the variability of the estimate of θ:
Use the current estimate of θ.
Use the distribution that has that estimate as its true parameter.
Sample R data sets from that distribution, each having the same sample size as the original sample.
[Figure: the original sample x_1, x_2, x_3, ..., x_n and the R simulated data sets x*_1, x*_2, x*_3, ..., x*_n, each with its own estimate of θ.]
Goldman (1993) suggests that, in cases where we may wonder whether the Likelihood Ratio Test statistic really has its desired χ² distribution, we can:
Take our best estimate of the tree.
Simulate on it the evolution of data sets of the same size.
For each replicate, calculate the LRT statistic.
Use this as the distribution and see where the actual LRT value lies in it (e.g., in the upper 5%?).
This, of course, is a parametric bootstrap.
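The same recipe can be sketched on a much simpler problem than a tree. As a stand-in (an assumption made purely for illustration), the null hypothesis below is that two groups of binomial counts share one success probability, and the alternative gives each group its own; data are simulated from the fitted null model and the LRT statistic is recomputed on each replicate. All names are hypothetical.

```python
import math
import random


def xlogy(x, y):
    """x * log(y), taken as 0 when x is 0."""
    return 0.0 if x == 0 else x * math.log(y)


def binom_loglik(k, n, p):
    """Binomial log-likelihood up to a constant that cancels in the ratio."""
    return xlogy(k, p) + xlogy(n - k, 1 - p)


def lrt_statistic(k1, n1, k2, n2):
    """Twice the log-likelihood gain from giving each group its own probability."""
    pooled = (k1 + k2) / (n1 + n2)
    null = binom_loglik(k1, n1, pooled) + binom_loglik(k2, n2, pooled)
    alt = binom_loglik(k1, n1, k1 / n1) + binom_loglik(k2, n2, k2 / n2)
    return 2 * (alt - null)


def parametric_bootstrap_pvalue(k1, n1, k2, n2, R=1000, seed=0):
    """Simulate data of the same size from the fitted null model and
    locate the observed LRT statistic in the simulated distribution."""
    rng = random.Random(seed)
    observed = lrt_statistic(k1, n1, k2, n2)
    pooled = (k1 + k2) / (n1 + n2)
    exceed = 0
    for _ in range(R):
        s1 = sum(rng.random() < pooled for _ in range(n1))
        s2 = sum(rng.random() < pooled for _ in range(n2))
        if lrt_statistic(s1, n1, s2, n2) >= observed:
            exceed += 1
    return observed, exceed / R


# Hypothetical counts: 33 of 50 in one group, 24 of 50 in the other.
print(parametric_bootstrap_pvalue(33, 50, 24, 50))
```

In the phylogenetic case the simulation step evolves sequences along the estimated tree instead of drawing binomial counts, but the logic of comparing the observed statistic with its simulated distribution is the same.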
[Figure: computer simulation produces data set #1, data set #2, data set #3, ... (label 100 at the end); estimation of a tree from each gives T1, ...]
References
Bremer, K. 1988. The limits of amino acid sequence data in angiosperm phylogenetic reconstruction. Evolution 42: 795-803. [Bremer support]
Cavender, J. A. 1978. Taxonomy with confidence. Mathematical Biosciences 40: 271-280. [Pioneering paper on confidence intervals on trees]
Efron, B. 1979. Bootstrap methods: another look at the jackknife. Annals of Statistics 7: 1-26. [The original bootstrap paper]
Efron, B. 1985. Bootstrap confidence intervals for a class of parametric problems. Biometrika 72: 45-58. [The parametric bootstrap]
Farris, J. S., V. A. Albert, M. Källersjö, D. Lipscomb, and A. G. Kluge. 1996. Parsimony jackknifing outperforms neighbor-joining. Cladistics 12: 99-124. [The delete-1/e jackknife for phylogenies]
Felsenstein, J. 1981b. Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution 17: 368-376. [Mentions possibility of likelihood ratio tests]
Felsenstein, J. 1985a. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39: 783-791. [The bootstrap first applied to phylogenies]
Felsenstein, J. 1985b. Confidence limits on phylogenies with a molecular clock. Systematic Zoology 34: 152-161.
Felsenstein, J. and H. Kishino. 1993. Is there something wrong with the bootstrap on phylogenies? A reply to Hillis and Bull. Systematic Biology 42: 193-200. [A more detailed exposition of the bias of P values in a normal case]
Fisher, R. A. 1922. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, A 222: 309-368. [Fisher's great likelihood paper, with mention of asymptotic variances of MLEs]
Goldman, N. 1993. Statistical tests of models of DNA substitution. Journal of Molecular Evolution 36: 182-198. [Parametric bootstrapping for testing models]
Harshman, J. 1994. The effect of irrelevant characters on bootstrap values. Systematic Biology 43: 419-424. [Not much effect on parsimony whether or not you include invariant characters when bootstrapping]
Hasegawa, M. and H. Kishino. 1989. Confidence limits on the maximum-likelihood estimate of the hominoid tree from mitochondrial-DNA sequences. Evolution 43: 672-677. [The KHT test]
Hasegawa, M. and H. Kishino. 1994. Accuracies of the simple methods for estimating the bootstrap probability of a maximum-likelihood tree. Molecular Biology and Evolution 11: 142-145. [RELL probabilities]
Hillis, D. M. and J. J. Bull. 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Systematic Biology 42: 182-192. [Bias in P values seen in a large simulation study]
Huelsenbeck, J. P. and B. Rannala. 1997. Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science 276: 227-232 (11 April). [Review of hypothesis testing with trees]
Huelsenbeck, J. P. and K. A. Crandall. 1997. Phylogeny estimation and hypothesis testing using maximum likelihood. Annual Review of Ecology and Systematics 28: 437-466. [Review]
Kishino, H. and M. Hasegawa. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. Journal of Molecular Evolution 29: 170-179. [The KHT test with likelihoods]
Kishino, H., T. Miyata, and M. Hasegawa. 1990. Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. Journal of Molecular Evolution 31: 151-160.
Künsch, H. R. 1989. The jackknife and the bootstrap for general stationary observations. Annals of Statistics 17: 1217-1241. [The block-bootstrap]
Margush, T. and F. R. McMorris. 1981. Consensus n-trees. Bulletin of Mathematical Biology 43: 239-244. [Majority-rule consensus trees]
Mueller, L. D. and F. J. Ayala. 1982. Estimation and interpretation of genetic distance in empirical studies. Genetical Research 40: 127-137. [Suggest conventional jackknife to assess variance of branch length]
Penny, D. and M. D. Hendy. 1985. Testing methods of evolutionary tree construction. Cladistics 1: 266-278. [Use jackknife resampling to assess accuracy of tree reconstruction, independently of my use of the bootstrap]
Prager, E. M. and A. C. Wilson. 1988. Ancient origin of lactalbumin from lysozyme: analysis of DNA and amino acid sequences. Journal of Molecular Evolution 27: 326-335. [Winning-sites test]
Sanderson, M. J. 1995. Objections to bootstrapping phylogenies: a critique. Systematic Biology 44: 299-320. [Good but he accepts a few criticisms I would not have accepted]
Shimodaira, H. and M. Hasegawa. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Molecular Biology and Evolution 16: 1114-1116. [Correction of KHT test for multiple hypotheses]
Sitnikova, T., A. Rzhetsky, and M. Nei. 1995. Interior-branch and bootstrap tests of phylogenetic trees. Molecular Biology and Evolution 12: 319-333. [The interior-branch test]
Templeton, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37: 221-244. [The first paper on the KHT test]
Wu, C. F. J. 1986. Jackknife, bootstrap and other resampling plans in regression analysis. Annals of Statistics 14: 1261-1295. [The delete-half jackknife]
Zharkikh, A. and W.-H. Li. 1992. Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences. I. Four taxa with a molecular clock. Molecular Biology and Evolution 9: 1119-1147. [Discovery and explanation of bias in P values]
This Microsoft-free presentation prepared with:
PDFLaTeX (mathematical typesetting and PDF preparation)
Free Pascal Compiler (calculating curves)
GNU Plotutils (plotting curves)
Idraw (drawing program to modify plots and draw figures)
Adobe Acrobat Reader (to display the PDF in full-screen mode)
Linux (operating system)