Bootstraps and Testing Trees: 31 July 2002 Marine Biological Laboratory Woods Hole Molecular Evolution Workshop




Joe Felsenstein, Department of Genome Sciences, University of Washington, Seattle

[Figure: Likelihood curve (and interval) of Ts/Tn ratio. ln L plotted against the transition/transversion ratio (10 to 200).]
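A likelihood interval like the one in the figure is conventionally taken as the set of parameter values whose ln L lies within 1.92 units of the maximum (half of 3.84, the 95% point of χ² with one degree of freedom). The sketch below applies that cutoff to ln L values tabulated on a grid. The binomial log-likelihood in the usage lines is a hypothetical stand-in for the Ts/Tn curve, which would need a tree and a substitution model to compute; `likelihood_interval` and `binomial_lnl` are illustrative names, not from any package.

```python
import math


def likelihood_interval(grid, lnls, drop=1.92):
    """Return (low, high): the smallest and largest grid values whose ln L is
    within `drop` of the maximum (1.92 = 3.84 / 2, an approximate 95% interval
    for one parameter). Assumes a single-peaked likelihood curve."""
    best = max(lnls)
    inside = [x for x, lnl in zip(grid, lnls) if lnl >= best - drop]
    return min(inside), max(inside)


def binomial_lnl(p, k=30, n=100):
    # Hypothetical stand-in curve: ln L of p after k successes in n trials.
    return k * math.log(p) + (n - k) * math.log(1 - p)


grid = [i / 1000 for i in range(1, 1000)]
low, high = likelihood_interval(grid, [binomial_lnl(p) for p in grid])
print(f"approximate 95% likelihood interval: {low:.3f} to {high:.3f}")
```

The grid spacing limits the precision of the endpoints; a finer grid or root-finding on ln L(x) - (max - 1.92) would sharpen them.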

[Figure: Constraints on a tree for a clock. A rooted tree with tips including A, B, D, and E, and branch lengths v1 through v8.]

Constraints for a clock: v1 = v2, v4 = v5, v1 + v6 = v3, v3 + v7 = v4 + v8.
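Under a molecular clock every root-to-tip path has the same total length, which is where equalities between sums of branch lengths come from. A small check, assuming a hypothetical nested-list tree encoding (a tip is a name, an interior node is a list of (child, branch length) pairs) and made-up branch lengths:

```python
def tip_depths(node, depth=0.0):
    """Distance from the root to every tip.

    A node is either a tip name (str) or a list of (child, branch_length)
    pairs; this encoding is an illustrative choice, not a standard format."""
    if isinstance(node, str):
        return {node: depth}
    depths = {}
    for child, length in node:
        depths.update(tip_depths(child, depth + length))
    return depths


def is_clocklike(tree, tol=1e-9):
    """True when all tips are (within tol) equally far from the root."""
    depths = tip_depths(tree).values()
    return max(depths) - min(depths) <= tol


# Hypothetical branch lengths for the rooted tree (((A,B),C),(D,E)).
ab = [("A", 0.1), ("B", 0.1)]
abc = [(ab, 0.2), ("C", 0.3)]
de = [("D", 0.25), ("E", 0.25)]
clock_tree = [(abc, 0.1), (de, 0.15)]
print(is_clocklike(clock_tree))  # True
```

The tolerance absorbs floating-point rounding in the summed branch lengths.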

[Figure: Likelihood surface (in x) for three clocklike trees. ln Likelihood plotted against x.]

[Figure: Two trees to be tested using the KHT test. Tree I and Tree II each relate Mouse, Bovine, Gibbon, Orang, Gorilla, Chimp, and Human.]

Table of differences in log-likelihood by site

site     ln L, Tree I    ln L, Tree II    Diff (I - II)
1        -2.971          -2.983           +0.012
2        -4.483          -4.494           +0.011
3        -5.673          -5.685           +0.013
4        -5.883          -5.898           +0.015
5        -2.691          -2.700           +0.010
6        -8.003          -7.572           -0.431
...      ...             ...              ...
231      -2.971          -2.987           +0.012
232      -2.691          -2.705           +0.010
total    -1405.61        -1408.80         +3.19

[Figure: Histogram of ΔlnL among sites (Hasegawa 232-site data). Horizontal axis: difference in log likelihood at site.]

Paired sites tests

Winning sites test (Prager and Wilson, 1988). Do a sign test on the signs of the differences.
z test (me, 1993, in PHYLIP documentation). Assume the differences are normal; do a z test of whether the mean (hence sum) difference is significant.
t test (Swofford et al., 1996). Do a paired t test.
Wilcoxon signed ranks test (Templeton, 1983).
RELL test (Kishino and Hasegawa, 1989, per my suggestion). Bootstrap-resample sites; get the distribution of the difference of totals.
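Three of these tests are short enough to sketch directly on a list of per-site differences (ln L of tree I minus ln L of tree II). This is a minimal illustration, not PHYLIP's code: the function names are hypothetical, the sign test uses an exact two-sided binomial P with ties dropped, the z test uses the sample standard deviation of the site differences, and the t and Wilcoxon tests are omitted. The usage lines take only the six displayed sites of the table above as (negative) log-likelihoods, so the results do not reproduce the 232-site numbers.

```python
import math
import random
from statistics import NormalDist, stdev


def winning_sites_test(diffs):
    """Sign test on per-site differences (ties dropped); exact two-sided binomial P."""
    wins = sum(d > 0 for d in diffs)
    n = sum(d != 0 for d in diffs)
    k = max(wins, n - wins)
    tail = sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return wins, n, min(1.0, 2 * tail)


def z_test(diffs):
    """Treat site differences as i.i.d.: z = total / (sd * sqrt(n)), two-sided P."""
    n = len(diffs)
    z = sum(diffs) / (stdev(diffs) * math.sqrt(n))
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


def rell_test(diffs, replicates=10_000, seed=1):
    """RELL: resample sites with replacement, reusing the per-site ln L differences."""
    rng = random.Random(seed)
    n = len(diffs)
    positive = sum(sum(rng.choices(diffs, k=n)) > 0 for _ in range(replicates))
    frac = positive / replicates
    return frac, 2 * min(frac, 1 - frac)


# First six sites of the table only (not the full data set).
lnl_1 = [-2.971, -4.483, -5.673, -5.883, -2.691, -8.003]
lnl_2 = [-2.983, -4.494, -5.685, -5.898, -2.700, -7.572]
diffs = [a - b for a, b in zip(lnl_1, lnl_2)]
print(winning_sites_test(diffs))
print(z_test(diffs))
print(rell_test(diffs))
```

The contrast in the example that follows (many small wins for tree I against a few large losses) is exactly what separates the sign and rank tests from the z and RELL tests, which weigh the sizes of the differences.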

In this example:

Winning sites test. 160 of 232 sites favor tree I. P < 3.279 × 10^-9.
z test. Difference of log-likelihood totals is 0.948104 standard deviations from 0, P = 0.343077. Not significant.
t test. Same as z test for this large a number of sites.
Wilcoxon signed ranks test. Rank sum is 4.82805 standard deviations below its expected value, P = 0.000001378765.
RELL test. 8,326 out of 10,000 samples have a positive sum, P = 0.3348 (two-sided).

[Figure: Construction of a confidence region, with data value on one axis and parameter value on the other.]

For each parameter value, find the data values (unshaded) that account for 95% of the probability.
Then, given a data value, the parameters that are in the 95% confidence region are those for which that data value is in the unshaded region.
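The same construction can be sketched for a hypothetical binomial example (a count k out of n = 20) instead of a tree. For each value of p on a grid, take the most probable outcomes until they hold at least 95% of the probability (the unshaded region), then report the values of p whose region contains the observed count. Ordering outcomes by probability is one convention among several (equal-tailed regions are another), so the result will differ slightly from textbook binomial intervals; all names here are illustrative.

```python
import math


def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def acceptance_region(n, p, level=0.95):
    """Most probable outcomes under parameter p, added until they hold >= level."""
    ranked = sorted(range(n + 1), key=lambda k: binom_pmf(k, n, p), reverse=True)
    region, total = set(), 0.0
    for k in ranked:
        region.add(k)
        total += binom_pmf(k, n, p)
        if total >= level:
            break
    return region


def confidence_set(k_obs, n, level=0.95, steps=1000):
    """Grid values of p whose acceptance region contains the observed count."""
    grid = [i / steps for i in range(steps + 1)]
    return [p for p in grid if k_obs in acceptance_region(n, p, level)]


cs = confidence_set(10, 20)
print(f"95% confidence set for p: about {min(cs):.3f} to {max(cs):.3f}")
```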

A 3-species clocklike tree with Jukes-Cantor model

Possible data patterns for species A, B, C (x, y, z stand for different bases): xxx, xxy, xyx, yxx, xyz. We will ignore all outcomes except these three: xxy, xyx, yxx.

Expected frequencies of xxy, xyx, yxx: (p1, p2, p3).

[Figure: The triangle of possible (p1, p2, p3), with corners p1, p2, p3 and the lines (p, q, q), (q, p, q), and (q, q, p).]

Test of 3-species Tree with a Clock (Felsenstein, 1985)

[Figure: Possible data (informative characters) for species A, B, C, plotted as the numbers of characters supporting tree 1, tree 2, and tree 3 (0 to 10 on each axis), with the confidence region marked.]

Statistic: number of steps different between best and next best tree.

Chars      4   5   6   7   8   9-13   14-20   21-29
S(0.05)    4   5   4   5   4   5      6       7

[Figure: Distribution of estimates of parameters. The (unknown) true distribution gives the sample and the estimate of θ, with the (unknown) true value of θ marked; bootstrap replicates are drawn from the empirical distribution of the sample.]

[Figure: Bootstrap sampling from a distribution (a mixture of two normals) to estimate the variance of the mean.]

Bootstrap sampling

To infer the error in a quantity, θ, estimated from a sample of points x1, x2, ..., xn, we can:

Do the following R times (R = 1000 or so):
  Draw a bootstrap sample by sampling n times with replacement from the sample. Call these x1*, x2*, ..., xn*. Note that some of the original points are represented more than once in the bootstrap sample, some once, some not at all.
  Estimate θ from the bootstrap sample; call this θk* (k = 1, 2, ..., R).

When all R bootstrap samples have been done, the distribution of the θi* estimates the distribution one would get if one were able to draw repeated samples of n points from the unknown true distribution.
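The loop above, sketched for the mean of a hypothetical sample from a 50:50 mixture of two normals (as in the figure). The spread of the R bootstrap means estimates the sampling variance of the mean; it is printed here as its square root, the standard error, next to the usual s / sqrt(n) for comparison. The mixture parameters, seeds, and function name are assumptions of this sketch.

```python
import random
from statistics import fmean, stdev


def bootstrap(sample, estimator, replicates=1000, seed=0):
    """Nonparametric bootstrap: R resamples of size n, drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    return [estimator(rng.choices(sample, k=n)) for _ in range(replicates)]


# Hypothetical data: 100 points from a 50:50 mixture of N(0, 1) and N(4, 1).
rng = random.Random(42)
sample = [rng.gauss(0, 1) if rng.random() < 0.5 else rng.gauss(4, 1) for _ in range(100)]

means = bootstrap(sample, fmean)
print("bootstrap SE of the mean:", stdev(means))
print("s / sqrt(n):             ", stdev(sample) / len(sample) ** 0.5)
```

For the mean the two agree closely; the point of the bootstrap is that `estimator` can be any statistic, including one (such as a tree) with no simple variance formula.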

[Figure: Bootstrap sampling of phylogenies. The original data (sequences × sites) give the estimate of the tree. Each bootstrap sample (#1, #2, and so on) samples the same number of sites, with replacement, and gives a bootstrap estimate of the tree (#1, #2, and so on).]

The sites are assumed to have evolved independently given the tree. They are the entities that are sampled (the xi). The trees play the role of the parameter. One ends up with a cloud of R sampled trees. To summarize this cloud, we ask, for each branch in the tree, how frequently it appears among the cloud of trees. We make a tree that summarizes this for all the most frequently occurring branches. This is the majority rule consensus tree of the bootstrap estimates of the tree.
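The resampling step for phylogenies draws whole sites (alignment columns), so the bases of one site always stay together across sequences. A minimal sketch on a hypothetical toy alignment; estimating a tree from each replicate would be done by whatever method is in use and is not shown.

```python
import random


def bootstrap_alignment(seqs, rng):
    """Resample alignment columns (sites) with replacement, keeping each column intact."""
    n_sites = len(seqs[0])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(s[c] for c in cols) for s in seqs]


# Hypothetical toy alignment: 4 sequences x 8 sites.
alignment = ["ACGTACGT", "ACGTACGA", "ACGAACGA", "TCGAACGA"]
rng = random.Random(1)
replicate = bootstrap_alignment(alignment, rng)
for row in replicate:
    print(row)
```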

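Trees:
[Figure: five unrooted trees of species A to F.]

How many times each partition of species is found:

AE | BCDF    3
ACE | BDF    3
ACEF | BD    1
AC | BDEF    1
AEF | BCD    1
ADEF | BC    2
ABDF | EC    1
ABCE | DF    3

Majority-rule consensus tree of the unrooted trees: it contains the partitions AE | BCDF, ACE | BDF, and ABCE | DF, each found in 60% of the trees.
[Figure: the consensus tree, with each interior branch labeled 60.]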
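Counting partitions and keeping those found in more than half the trees can be sketched as follows, starting from the tabulated counts above. The number of trees (five) is inferred from the table: 3 of 5 trees gives 60%, and the 15 counted partitions are 3 per unrooted six-species tree. Each split is written canonically as the side containing taxon A so that "AE | BCDF" and "BCDF | AE" count as the same branch; the function names are illustrative.

```python
from collections import Counter


def canonical(split, taxa):
    """Write an unrooted split 'AE|BCDF' as the frozenset of the side holding the first taxon."""
    left = frozenset(split.split("|")[0].strip())
    return left if min(taxa) in left else frozenset(taxa) - left


def majority_rule(split_counts, n_trees):
    """Keep the splits present in more than half of the trees, with their frequencies."""
    return {s: c / n_trees for s, c in split_counts.items() if c > n_trees / 2}


taxa = "ABCDEF"
table = {"AE|BCDF": 3, "ACE|BDF": 3, "ACEF|BD": 1, "AC|BDEF": 1,
         "AEF|BCD": 1, "ADEF|BC": 2, "ABDF|EC": 1, "ABCE|DF": 3}
counts = Counter()
for split, c in table.items():
    counts[canonical(split, taxa)] += c

for side, freq in majority_rule(counts, n_trees=5).items():
    rest = "".join(sorted(set(taxa) - side))
    print("".join(sorted(side)), "|", rest, f"{freq:.0%}")
```

Splits occurring in more than half of the trees are always mutually compatible, which is why the majority-rule set always forms a tree.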

[Figure: Tree of Bovine, Mouse, Squir Monk, Chimp, Human, Gorilla, Orang, Gibbon, Rhesus Mac, Jpn Macaq, CrabE.Mac, BarbMacaq, Tarsier, and Lemur, with bootstrap percentages on the branches.]

An example of bootstrap sampling of trees: 232-nucleotide, 14-species mitochondrial D-loop data set, analyzed by parsimony, 100 bootstrap replicates.

Potential problems with the bootstrap
1. Sites may not evolve independently.
2. Sites may not come from a common distribution (but we can consider them sampled from a mixture of possible distributions).
3. If we do not know which branch is of interest at the outset, a multiple-tests problem means P values are overstated.
4. P values are biased (too conservative).
5. Bootstrapping does not correct biases in phylogeny methods.

[Figure: A model showing the bias in bootstrap P values. Panels show the true value of the mean, the distribution of individual values of x, the true distribution of sample means, and estimated distributions of sample means, with regions marked "Topology" I and "Topology" II. Note that the true P is more extreme than the average of the Ps.]

[Figure: Illustration of the source of the bias: the estimate of the "phylogeny" (topology I or topology II) compared with the true mean.]

[Figure: Extent of the bias in the example. Average P plotted against true P, both from 0 to 1.]
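A normal version of this model can be computed directly. Assume the estimate is a sample mean with standard error se, topology I is chosen when it is positive, and the bootstrap P for topology I from one data set is approximately Φ(x̄ / se); this normal approximation to the bootstrap is our assumption, not necessarily the exact computation behind the plot. With δ = μ / se the true P is Φ(δ), and averaging Φ(Z) over Z ~ N(δ, 1) gives Φ(δ / √2), because Φ(Z) = P(W ≤ Z) for an independent standard normal W and Z - W ~ N(δ, 2). The average P is therefore pulled toward 1/2 and the true P is more extreme. The sketch checks the closed form against simulation.

```python
import random
from statistics import NormalDist

STD = NormalDist()


def average_bootstrap_p(true_p):
    """Closed form under the normal model: E[Phi(Z)] with Z ~ N(delta, 1) is Phi(delta / sqrt 2)."""
    delta = STD.inv_cdf(true_p)
    return STD.cdf(delta / 2 ** 0.5)


def simulated_average_p(true_p, trials=100_000, seed=2):
    """Monte Carlo average of the approximate bootstrap P over repeated data sets."""
    rng = random.Random(seed)
    delta = STD.inv_cdf(true_p)
    return sum(STD.cdf(rng.gauss(delta, 1.0)) for _ in range(trials)) / trials


for true_p in (0.6, 0.8, 0.95, 0.99):
    print(true_p, round(average_bootstrap_p(true_p), 3), round(simulated_average_p(true_p), 3))
```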

[Figure: Probability of being correct and variance of prior. Probability of correct topology plotted against P value (both from 0 to 1), with curves labeled 2.0, 1.0, and 0.1.]

Other resampling methods

Delete-half jackknife. Sample a random 50% of the sites, without replacement.
Delete-1/e jackknife (Farris et al. 1996) (too little deletion from a statistical viewpoint).
Reweighting characters by choosing weights from an exponential distribution. In fact, reweighting them by any exchangeable weights having a coefficient of variation of 1.
Parametric bootstrap. Simulate data sets of this size, assuming the estimate of the tree is the truth.
Block bootstrapping (Künsch, 1989), to correct for correlation among adjacent sites. Sample n/b blocks of b adjacent sites.
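The site-resampling schemes in this list differ only in which site indices (or weights) each replicate uses. A sketch of index generators for them, with hypothetical function names: rounding n(1 - δ) down to an integer is our assumption, and the block bootstrap draws block start positions uniformly with replacement (moving blocks), which is one common variant. The parametric bootstrap simulates new data instead of resampling sites and is not shown here.

```python
import math
import random


def delete_half_jackknife(n_sites, rng):
    """Indices of a random 50% of the sites, drawn without replacement."""
    return sorted(rng.sample(range(n_sites), n_sites // 2))


def delete_fraction_jackknife(n_sites, fraction, rng):
    """Delete a given fraction (e.g. 1/e); the kept count is rounded down (assumption)."""
    keep = int(n_sites * (1 - fraction))
    return sorted(rng.sample(range(n_sites), keep))


def exponential_weights(n_sites, rng):
    """Exchangeable site weights with mean 1 and coefficient of variation 1."""
    return [rng.expovariate(1.0) for _ in range(n_sites)]


def block_bootstrap(n_sites, b, rng):
    """n/b blocks of b adjacent sites; block starts drawn uniformly with replacement."""
    starts = [rng.randrange(n_sites - b + 1) for _ in range(n_sites // b)]
    return [s + j for s in starts for j in range(b)]


rng = random.Random(3)
print(delete_half_jackknife(20, rng))
print(delete_fraction_jackknife(20, 1 / math.e, rng))
print([round(w, 2) for w in exponential_weights(5, rng)])
print(block_bootstrap(20, 5, rng))
```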

[Figure: Tree of Bovine, Mouse, Squir Monk, Chimp, Human, Gorilla, Orang, Gibbon, Rhesus Mac, Jpn Macaq, CrabE.Mac, BarbMacaq, Tarsier, and Lemur, with jackknife percentages on the branches.]

Delete-half jackknife P values (compare with bootstrap)

Exact computation of the effects of deletion fraction for the jackknife

Suppose groups 1 and 2 are conflicting groups. Of the n characters, n1 support group 1 and n2 support group 2. A jackknife resample of n(1 - δ) characters, where δ is the deletion fraction, contains m1 characters supporting group 1 and m2 supporting group 2 (the bootstrap instead draws n characters with replacement).

We can compute for various δs the probabilities of getting more evidence for group 1 than for group 2. A typical result, for n1 = 10, n2 = 8, n = 100:

                      Prob(m1 > m2)   Prob(m1 ≥ m2)   Prob(m1 > m2) + (1/2) Prob(m1 = m2)
Bootstrap             0.6384          0.7230          0.6807
Jackknife, δ = 1/2    0.5923          0.7587          0.6755
Jackknife, δ = 1/e    0.6441          0.8040          0.7240
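These probabilities can be computed exactly. A bootstrap resample draws n characters with replacement, so (m1, m2) is multinomial; a jackknife resample keeps n(1 - δ) characters drawn without replacement, so (m1, m2) is multivariate hypergeometric. A sketch using `math.comb`; the function names are illustrative, and rounding n(1 - 1/e) = 63.2 down to 63 is our assumption about the 1/e case.

```python
from math import comb, e


def summarize(dist):
    """dist yields (m1, m2, prob); return P(m1>m2), P(m1>=m2), P(m1>m2) + P(m1=m2)/2."""
    gt = eq = 0.0
    for m1, m2, prob in dist:
        if m1 > m2:
            gt += prob
        elif m1 == m2:
            eq += prob
    return gt, gt + eq, gt + eq / 2


def bootstrap_dist(n, n1, n2):
    """Multinomial counts when n characters are drawn with replacement."""
    p1, p2 = n1 / n, n2 / n
    p0 = 1 - p1 - p2
    for m1 in range(n + 1):
        for m2 in range(n - m1 + 1):
            ways = comb(n, m1) * comb(n - m1, m2)
            yield m1, m2, ways * p1 ** m1 * p2 ** m2 * p0 ** (n - m1 - m2)


def jackknife_dist(n, n1, n2, keep):
    """Multivariate hypergeometric counts when `keep` characters are drawn without replacement."""
    n0 = n - n1 - n2
    total = comb(n, keep)
    for m1 in range(min(n1, keep) + 1):
        for m2 in range(min(n2, keep - m1) + 1):
            # comb(n0, k) is 0 when k > n0, so impossible draws get probability 0.
            yield m1, m2, comb(n1, m1) * comb(n2, m2) * comb(n0, keep - m1 - m2) / total


n, n1, n2 = 100, 10, 8
print("bootstrap    ", summarize(bootstrap_dist(n, n1, n2)))
print("jackknife 1/2", summarize(jackknife_dist(n, n1, n2, n // 2)))
print("jackknife 1/e", summarize(jackknife_dist(n, n1, n2, int(n * (1 - 1 / e)))))
```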

The Parametric Bootstrap (Efron, 1985)

Suppose we have independent observations x1, x2, x3, ..., xn drawn from a distribution of known form, and a parameter estimate, θ̂, calculated from them.

To infer the variability of θ̂:
  Use the current estimate, θ̂.
  Use the distribution that has that as its true parameter.
  Sample R data sets from that distribution, each having the same sample size as the original sample: x1*, x2*, x3*, ..., xn* for each data set.
  Take the distribution of the θ̂i* (the estimates from those data sets) as the estimate of the distribution from which θ̂ is drawn.
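A minimal parametric bootstrap, assuming a hypothetical exponential model in place of a tree: the rate is estimated by maximum likelihood (1 / sample mean), R data sets of the original size are simulated from the exponential with that estimated rate, and the rate is re-estimated from each. The spread of those estimates approximates the sampling variability of the original estimate. Data, seeds, and names are illustrative.

```python
import random
from statistics import fmean, stdev


def mle_rate(xs):
    """Maximum likelihood estimate of an exponential rate: 1 / sample mean."""
    return 1 / fmean(xs)


def parametric_bootstrap(xs, replicates=2000, seed=0):
    """Simulate R data sets of the original size from Exp(rate_hat); re-estimate each."""
    rng = random.Random(seed)
    rate_hat = mle_rate(xs)
    n = len(xs)
    reps = [mle_rate([rng.expovariate(rate_hat) for _ in range(n)])
            for _ in range(replicates)]
    return rate_hat, reps


# Hypothetical data: 50 waiting times from an exponential with rate 2.
data_rng = random.Random(7)
data = [data_rng.expovariate(2.0) for _ in range(50)]

rate_hat, reps = parametric_bootstrap(data)
print(f"estimate {rate_hat:.3f}, parametric-bootstrap SE {stdev(reps):.3f}")
```

Unlike the nonparametric bootstrap, nothing here reuses the original data points; they enter only through the estimate, so the result is only as good as the assumed model.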

A resampling approach to distributions of the likelihood ratio statistics

Goldman (1993) suggests that, in cases where we may wonder whether the likelihood ratio test (LRT) statistic really has its desired χ² distribution, we can:
  Take our best estimate of the tree.
  Simulate on it the evolution of data sets of the same size.
  For each replicate, calculate the LRT statistic.
  Use this as the distribution and see where the actual LRT value lies in it (e.g., in the upper 5%?).
This, of course, is a parametric bootstrap.

[Figure: The parametric bootstrap for trees. The estimate of the tree from the original data is used in computer simulation to produce data set #1, #2, #3, ..., #100; estimation of the tree from each gives T1, ..., T100.]
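A toy version of Goldman's recipe, with a hypothetical model standing in for a tree: base counts from one sequence, a null model in which freq(A) = freq(T) and freq(C) = freq(G) (fitted by maximum likelihood), and an alternative with four free frequencies. Data sets of the same size are simulated under the fitted null, the null is refitted to each, and the observed LRT is located in the simulated distribution. The model, the function names, and the (exceed + 1) / (R + 1) P-value convention are assumptions of this sketch.

```python
import math
import random
from collections import Counter

BASES = "ACGT"


def g_statistic(counts, probs):
    """LRT statistic 2 * sum n_i ln(n_i / expected_i): free frequencies vs. `probs`."""
    n = sum(counts.values())
    return 2 * sum(c * math.log(c / (n * probs[b])) for b, c in counts.items() if c > 0)


def fit_null(counts):
    """ML fit of the hypothetical null model freq(A) = freq(T), freq(C) = freq(G)."""
    n = sum(counts[b] for b in BASES)
    at = (counts["A"] + counts["T"]) / (2 * n)
    cg = (counts["C"] + counts["G"]) / (2 * n)
    return {"A": at, "T": at, "C": cg, "G": cg}


def parametric_bootstrap_lrt(counts, replicates=2000, seed=0):
    """Observed LRT and its Monte Carlo P value under the fitted null model."""
    rng = random.Random(seed)
    null = fit_null(counts)
    observed = g_statistic(counts, null)
    n = sum(counts[b] for b in BASES)
    weights = [null[b] for b in BASES]
    exceed = 0
    for _ in range(replicates):
        sim = Counter(rng.choices(BASES, weights=weights, k=n))
        exceed += g_statistic(sim, fit_null(sim)) >= observed
    return observed, (exceed + 1) / (replicates + 1)


print(parametric_bootstrap_lrt({"A": 30, "C": 30, "G": 20, "T": 20}))
```

For this simple model the simulated distribution should be close to χ² with 2 degrees of freedom; the point of the procedure is that it does not rely on that approximation.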

References

Bremer, K. 1988. The limits of amino acid sequence data in angiosperm phylogenetic reconstruction. Evolution 42: 795-803. [Bremer support]
Cavender, J. A. 1978. Taxonomy with confidence. Mathematical Biosciences 40: 271-280. [Pioneering paper on confidence intervals on trees]
Efron, B. 1979. Bootstrap methods: another look at the jackknife. Annals of Statistics 7: 1-26. [The original bootstrap paper]
Efron, B. 1985. Bootstrap confidence intervals for a class of parametric problems. Biometrika 72: 45-58. [The parametric bootstrap]
Farris, J. S., V. A. Albert, M. Källersjö, D. Lipscomb, and A. G. Kluge. 1996. Parsimony jackknifing outperforms neighbor-joining. Cladistics 12: 99-124. [The delete-1/e jackknife for phylogenies]
Felsenstein, J. 1981b. Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution 17: 368-376. [Mentions possibility of likelihood ratio tests]
Felsenstein, J. 1985a. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39: 783-791. [The bootstrap first applied to phylogenies]
Felsenstein, J. 1985b. Confidence limits on phylogenies with a molecular clock. Systematic Zoology 34: 152-161.
Felsenstein, J. and H. Kishino. 1993. Is there something wrong with the bootstrap on phylogenies? A reply to Hillis and Bull. Systematic Biology 42: 193-200. [A more detailed exposition of the bias of P values in a normal case]
Fisher, R. A. 1922. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, A 222: 309-368. [Fisher's great likelihood paper, with mention of asymptotic variances of MLEs]
Goldman, N. 1993. Statistical tests of models of DNA substitution. Journal of Molecular Evolution 36: 182-198. [Parametric bootstrapping for testing models]
Harshman, J. 1994. The effect of irrelevant characters on bootstrap values. Systematic Biology 43: 419-424. [Not much effect on parsimony whether or not you include invariant characters when bootstrapping]
Hasegawa, M. and H. Kishino. 1989. Confidence limits on the maximum-likelihood estimate of the hominoid tree from mitochondrial-DNA sequences. Evolution 43: 672-677. [The KHT test]
Hasegawa, M. and H. Kishino. 1994. Accuracies of the simple methods for estimating the bootstrap probability of a maximum-likelihood tree. Molecular Biology and Evolution 11: 142-145. [RELL probabilities]
Hillis, D. M. and J. J. Bull. 1993. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Systematic Biology 42: 182-192. [Bias in P values seen in a large simulation study]
Huelsenbeck, J. P. and B. Rannala. 1997. Phylogenetic methods come of age: testing hypotheses in an evolutionary context. Science 276: 227-232 (11 April). [Review of hypothesis testing with trees]
Huelsenbeck, J. P. and K. A. Crandall. 1997. Phylogeny estimation and hypothesis testing using maximum likelihood. Annual Review of Ecology and Systematics 28: 437-466. [Review]
Kishino, H. and M. Hasegawa. 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. Journal of Molecular Evolution 29: 170-179. [The KHT test with likelihoods]
Kishino, H., T. Miyata, and M. Hasegawa. 1990. Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. Journal of Molecular Evolution 31: 151-160.
Künsch, H. R. 1989. The jackknife and the bootstrap for general stationary observations. Annals of Statistics 17: 1217-1241. [The block bootstrap]
Margush, T. and F. R. McMorris. 1981. Consensus n-trees. Bulletin of Mathematical Biology 43: 239-244. [Majority-rule consensus trees]
Mueller, L. D. and F. J. Ayala. 1982. Estimation and interpretation of genetic distance in empirical studies. Genetical Research 40: 127-137. [Suggest conventional jackknife to assess variance of branch length]
Penny, D. and M. D. Hendy. 1985. Testing methods of evolutionary tree construction. Cladistics 1: 266-278. [Use jackknife resampling to assess accuracy of tree reconstruction, independently of my use of the bootstrap]
Prager, E. M. and A. C. Wilson. 1988. Ancient origin of lactalbumin from lysozyme: analysis of DNA and amino acid sequences. Journal of Molecular Evolution 27: 326-335. [Winning-sites test]
Sanderson, M. J. 1995. Objections to bootstrapping phylogenies: a critique. Systematic Biology 44: 299-320. [Good, but he accepts a few criticisms I would not have accepted]
Shimodaira, H. and M. Hasegawa. 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Molecular Biology and Evolution 16: 1114-1116. [Correction of KHT test for multiple hypotheses]
Sitnikova, T., A. Rzhetsky, and M. Nei. 1995. Interior-branch and bootstrap tests of phylogenetic trees. Molecular Biology and Evolution 12: 319-333. [The interior-branch test]
Templeton, A. R. 1983. Phylogenetic inference from restriction endonuclease cleavage site maps with particular reference to the evolution of humans and the apes. Evolution 37: 221-224. [The first paper on the KHT test]
Wu, C. F. J. 1986. Jackknife, bootstrap and other resampling plans in regression analysis. Annals of Statistics 14: 1261-1295. [The delete-half jackknife]
Zharkikh, A. and W.-H. Li. 1992. Statistical properties of bootstrap estimation of phylogenetic variability from nucleotide sequences. I. Four taxa with a molecular clock. Molecular Biology and Evolution 9: 1119-1147. [Discovery and explanation of bias in P values]

This Microsoft-free presentation was prepared with PDFLaTeX (mathematical typesetting and PDF preparation), Free Pascal Compiler (calculating curves), GNU Plotutils (plotting curves), Idraw (drawing program to modify plots and draw figures), Adobe Acrobat Reader (to display the PDF in full-screen mode), and Linux (operating system).
