EX 1 TREE THINKING CONCEPTS Worksheet


EXERCISE 1:

TREE THINKING CONCEPTS USING MESQUITE


Adapted from Tree Thinking Concepts of Baum & Smith (2012) and Huang & Whittall (2018)

Exercise 1 Checklist of Graded Assessment

• Species Identification Results and SGD
Deadline:
• Return Demo of Mesquite and SGD
Deadline:
• Oral PowerPoint Presentation w/ Rubrics
Deadline:

Introduction

Before Charles Darwin, the concept of evolution was different. The first scientific theory of
evolution is often credited to Jean-Baptiste de Lamarck. He endorsed the idea that life is constantly
emerging from nonliving things through spontaneous generation. In his book, Lamarck also argued that
living beings are organized in an ascending series. This is called the ladder of life (the Great Chain
of Being), the idea that living species represent a continuum of forms with differing degrees of
advancement (with humans near the “top” of a metaphorical ladder). Simpler organisms were placed on the
lower rungs of the ladder, while more complex organisms were perched higher, which signified
superiority. However, the diversity of life cannot fit on a single ladder. With the help of Charles
Lyell’s works, Darwin realized that the evolution of organisms is more than the ladder of life. He
visualized evolution as having a tree form, with all living species connected to one another at branch
points. Darwin pointed out that if distinct living species trace back to a common ancestor, then
evolution must have happened.

According to Baum and Smith (2012), tree thinking is the ability to visualize evolution in tree form
and to use tree diagrams to communicate and analyze evolutionary phenomena. It is important for
developing an accurate understanding of evolution and helps organize knowledge about biodiversity.
Tree thinking is essential in the various fields of biology: from molecular biology to genetics,
and developmental biology to ecology. Moreover, tree thinking is vital in applied research, such as tracking
diseases like COVID-19, recording responses to climate change, and guiding conservation policies.

A tree diagram is composed of lines, called branches (or edges), which are connected at nodes
(lineage-splitting events). The diagram needs to be directed (time runs in one direction along each branch)
and acyclic (lineages that diverge never subsequently fuse) for it to be considered a tree in the formal
sense. The root of the tree is a special node that marks the point where time enters the diagram and is often
designated by an external branch whose tip is unlabeled.
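
The two formal requirements above (directed and acyclic, with a single root) can be checked mechanically. The following is a minimal Python sketch, not part of Mesquite; the function name `is_rooted_tree` and the node labels are hypothetical and chosen only for illustration. Branches are written as (parent, child) pairs so that time runs from parent to child.

```python
def is_rooted_tree(edges):
    """Check that (parent, child) branches form a rooted tree.

    A node with two incoming branches means two lineages fused,
    which is not allowed in a tree in the formal sense.
    """
    parents = {}
    nodes = set()
    for parent, child in edges:
        nodes.update((parent, child))
        if child in parents:
            return False  # second incoming branch: lineages fused
        parents[child] = parent

    # Exactly one node (the root) may lack an incoming branch.
    roots = [node for node in nodes if node not in parents]
    if len(roots) != 1:
        return False

    # Walking back in time from any node must never revisit a node.
    for node in nodes:
        seen = set()
        while node in parents:
            if node in seen:
                return False  # cycle found
            seen.add(node)
            node = parents[node]
    return True


# Hypothetical four-branch tree: root -> n1 -> (Taxon_A, Taxon_B), root -> Taxon_C
tree_edges = [("root", "n1"), ("root", "Taxon_C"), ("n1", "Taxon_A"), ("n1", "Taxon_B")]
fused_edges = tree_edges + [("root", "Taxon_A")]  # Taxon_A now has two parents

print(is_rooted_tree(tree_edges))   # True
print(is_rooted_tree(fused_edges))  # False
```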

Mesquite is an extensible, open-source, modular software package used in studying evolutionary biology.
It allows users to organize and analyze comparative data about organisms, with an emphasis on
phylogenetic analysis. It can also be used in population genetics and in non-phylogenetic
multivariate analysis. Mesquite can accept two major types of data: (1) morphological (discrete/categorical)
data and (2) molecular sequence data. For morphological data, a matrix is formed by scoring and entering
the morphological information. This lets users build their own phylogenetic tree and consensus tree.
Sequence data files (FASTA, GenBank, text files), meanwhile, are used for the molecular analysis.
Mesquite allows users to load and align sequences, which are then used for tree building. You may also
examine ancestral (previous) character states on the generated molecular phylogenies. In building the
phylogenetic trees for both types of data, different methods are used, namely: Neighbor-Joining,
Parsimony, Minimum Evolution, Maximum Likelihood and Bayesian analysis.

Objectives

At the end of this exercise, the students are expected to:


1. Use BLAST analysis for the preliminary identification of species using the assigned sequences.
2. Use molecular (DNA) sequences to develop phylogenetic trees using parsimony.
3. Determine how characters are weighted in an analysis.
4. Trace character macroevolution in the molecular-based phylogenies.
5. Read and interpret phylogenetic trees based on evolutionary concepts.

Reminders before starting the exercise:

1. Download the following software:

a. Notepad or Notepad++ for Windows and TextWrangler or BBEdit for macOS – Make sure
to download the version suitable for your operating system.
b. Java – Make sure to install the latest version of Java on your device. Go
to https://www.java.com/en/download/.
c. Mesquite – Go to
https://www.mesquiteproject.org/Installation.html?GettingStartedPanel=open. Follow the
system requirements and procedures to ensure that the software runs smoothly.
d. MUSCLE – This will serve as the alignment algorithm for the molecular data analysis. Go
to www.drive5.com/muscle/. Copy the downloaded file and place it in the same folder as the
Mesquite files.
2. Open Mesquite by clicking its icon. You should arrive at the window shown in Figure 3, which
confirms that all modules were properly installed.
3. A separate Excel file containing ten (10) datasets will be provided for you. Each group will be
assigned a single dataset to use for this exercise while following the tutorial.
4. Please bear in mind that the instructions provided in this manual are accurate for the latest version of
Mesquite at the time of writing (Mesquite v3.61 build 927 on Windows 10). Minor differences in
the interface may be observed in older versions and between macOS, Linux and Windows.

PROCEDURES

Part 1: BLAST Analysis for Species Identification

Basic Local Alignment Search Tool (BLAST) is used for finding regions of local similarity between
nucleotide or protein sequences. It refers to a suite of programs that generate alignments between
a sequence known as the “query” and sequences within a database known as “subject” sequences. The program
compares the query with the sequences in the database and computes the statistical significance of the matches.

BLAST databases are built from concatenated FASTA-formatted sequences. FASTA is also the format used for
BLAST query sequences. Each record consists of a definition line, which begins with a “>” symbol and contains
identifiers and descriptive information, followed by a character string of single-letter nucleotide or amino acid codes.
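
To make the FASTA layout concrete, here is a minimal Python sketch of a reader for this format. It is an illustration only: the function name `parse_fasta` and the two example records are hypothetical placeholders, not sequences from the assigned datasets.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (definition_line, sequence) pairs."""
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new definition line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first '>' definition line")
        else:
            chunks.append(line.upper())  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up example records (placeholders, not real accessions)
example = """>seq1 hypothetical sample A
ACGTAC
GTTA
>seq2 hypothetical sample B
ACGTTCGTAA
"""

records = parse_fasta(example)
for header, sequence in records:
    print(header, len(sequence))
```

Each record is returned as one definition line plus one joined sequence string, regardless of how many lines the sequence was wrapped over in the file.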

There are several types of BLAST searches. The National Center for Biotechnology Information (NCBI)
WebBLAST offers four main search types: (1) BLASTn (Nucleotide BLAST), (2) BLASTx (translated
nucleotide sequence searched against protein sequences), (3) tBLASTn (protein sequence searched against
translated nucleotide sequences), and (4) BLASTp (Protein BLAST). The NCBI BLAST homepage
(https://blast.ncbi.nlm.nih.gov/Blast.cgi) is shown in Figure 1.

Figure 1. NCBI BLAST homepage.

In this exercise, you will be using Nucleotide BLAST. BLASTn compares one or more nucleotide query
sequences to a subject nucleotide sequence or a database of nucleotide sequences. It is used in the
determination of evolutionary relationships among different organisms.

1. To go to the Nucleotide BLAST page, click the Nucleotide BLAST button on the NCBI BLAST homepage, or
visit this link:
(https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome).
This will lead you to the Nucleotide BLAST page below.

Figure 2. BLASTn (Nucleotide BLAST) homepage.

2. Copy your assigned sequence and paste it into the Enter Query Sequence textbox. You may also upload
the FASTA file. Click the BLAST button to start the search. Wait for the results to load (note: sometimes
this may take a few minutes).

3. Once the search is finished, the page will show the sequences producing significant alignments. In the
Descriptions tab, you will see the description, scientific name, max score, total score, query cover, e-value,
percent identity, accession length, and accession number. In the provided Excel file, supply the missing
data and include the first three species from your BLAST results. Predict the identity of each of the
sequences.
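
When recording the first three species, it can help to see how the columns relate to one another. The Python sketch below ranks a small hit table by e-value (lower is better), then by percent identity and query cover (higher is better). This ordering is an assumption made for illustration and is not a description of how NCBI sorts its results page. All names and numbers in `hits` are invented placeholders, not real BLAST output.

```python
# Invented placeholder hits; copy the real values from your own Descriptions tab.
hits = [
    {"name": "Species D", "query_cover": 90, "e_value": 1e-80, "percent_identity": 91.0},
    {"name": "Species B", "query_cover": 100, "e_value": 0.0, "percent_identity": 98.5},
    {"name": "Species A", "query_cover": 100, "e_value": 0.0, "percent_identity": 99.8},
    {"name": "Species C", "query_cover": 97, "e_value": 2e-150, "percent_identity": 95.1},
]


def top_hits(hits, n=3):
    """Return the n best hits: lowest e-value, then highest identity and cover."""
    ranked = sorted(
        hits,
        key=lambda hit: (hit["e_value"], -hit["percent_identity"], -hit["query_cover"]),
    )
    return ranked[:n]


best = top_hits(hits)
for hit in best:
    print(hit["name"], hit["e_value"], hit["percent_identity"], hit["query_cover"])
```

Note that two hits can share an e-value of 0 and still differ in percent identity, which is why a single column is rarely enough to predict the identity of a sequence.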

Part 2: Molecular Analysis Using Mesquite


2A. Creating a FASTA file based on obtained sequences
Molecular analysis is made possible by investigating selected gene regions of the target organism.
For this exercise, the genetic material you will be working on has been provided in a separate
supplementary materials file.
Check the list of sequences provided to your group. Go to NCBI GenBank
(https://www.ncbi.nlm.nih.gov/genbank/) and look for your required genetic data. Make sure to double
check the details of the files you are downloading. Compile all related gene sequences in a .fasta file (you
can use Notepad or Notepad++ for Windows and TextWrangler/BBEdit for macOS). For each dataset, two
gene regions were provided. Create a separate FASTA file for each gene region. Save appropriately.
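
The compilation step can also be done with a short script instead of a text editor. The following Python sketch groups sequences by gene region and builds one FASTA text per region. The function name `fasta_by_region`, the taxon names, the label `regionB` and the short sequences are all hypothetical placeholders, so substitute the sequences you downloaded.

```python
from collections import defaultdict


def fasta_by_region(records, width=60):
    """Group (name, region, sequence) records by region.

    Returns a dict mapping each gene region to FASTA-formatted text,
    with sequences wrapped at `width` characters per line.
    """
    grouped = defaultdict(list)
    for name, region, sequence in records:
        grouped[region].append((name, sequence))

    files = {}
    for region, entries in grouped.items():
        lines = []
        for name, sequence in entries:
            lines.append(">" + name)  # every record starts with ">"
            for start in range(0, len(sequence), width):
                lines.append(sequence[start:start + width])
        files[region] = "\n".join(lines) + "\n"
    return files


# Placeholder records: (taxon name, gene region, sequence)
records = [
    ("Taxon_A", "psbA-trnH", "ACGTACGTAC"),
    ("Taxon_A", "regionB", "TTGACC"),
    ("Taxon_B", "psbA-trnH", "ACGTTCGTAC"),
    ("Taxon_B", "regionB", "TTGATC"),
]

files = fasta_by_region(records)
for region, text in files.items():
    print("---", region + ".fasta")
    print(text, end="")
```

To save the results, each text can be written to its own file, for example with `open(region + ".fasta", "w")`.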

Figure 3. Mesquite Window

2B. Loading Sequence Data Files in Mesquite


1. Before starting, double check your .fasta file.
a. Each accession must start with the “>” to indicate a new sequence.
b. The .fasta file should only contain a single type of gene region. If the paper called for
multiple gene regions, separate them in their own .fasta files.
2. Start Mesquite. Click FILE >> Open File. Open the .fasta file you compiled in Part 2A. This will
prompt the file interpreter box from where you will select FASTA (DNA/RNA). Notice that
Mesquite can interpret different file types as needed. Click OK.
3. Save the file as a NEXUS file (e.g. PseuduvariaPSBA.nex). DO NOT FORGET the file
extension .nex when saving.
4. The project (e.g. PseuduvariaPSBA.nex) is now open. The character matrix now shows your taxa
in the same order as in the provided .fasta file. The DNA bases A, T, C and G are color-coded
(Figure 4).

Figure 4. Unaligned Pseuduvaria psbA-trnH intergenic spacer sequences

5. You may now simplify the taxon names by double-clicking each taxon. Examine the sequences. This
is now the working file. Save your progress.

2C. Sequence Alignment


To proceed with the analysis, the sequences must be aligned. Aligned sequences represent
related data and reveal the informative characters of the taxa under study. For the provided example, sequences
from the psbA-trnH intergenic spacer were used.
1. To align the sequences, click MATRIX >> Align Multiple Sequences. Three alignment
algorithms are offered here: ClustalW, MAFFT and MUSCLE align. For this example, align
using MUSCLE, since this is the alignment software downloaded beforehand. (You can
also use the other alignment algorithms, provided you have the necessary software.)
a. In the dialog box asking for separate threads, click NO.
b. Define the directory where you placed MUSCLE. There is no need to check “include gaps”
or to provide additional options. Click OK.
c. Let the alignment run and wait. Once it is done, you will notice that, from the
ambiguous placement of the bases in Figure 4, you now have aligned sequence data
as seen in Figure 5. Details of the matrix can be seen in the project file panel at the left.
Examine the sequences and check how the DNA bases were aligned in this example.
Save your progress.

Figure 5. Aligned Pseuduvaria psbA-trnH intergenic spacer sequences
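
One way to see what “informative characters” means for a parsimony analysis: a column of the alignment is parsimony-informative when at least two different bases each occur in at least two taxa. The Python sketch below counts such columns. It is an illustration outside Mesquite; the function name `informative_sites` and the toy alignment are hypothetical, and gaps or ambiguity codes are simply ignored, which is a simplifying assumption.

```python
from collections import Counter


def informative_sites(alignment):
    """Return 0-based indices of parsimony-informative columns.

    `alignment` maps taxon names to aligned sequences of equal length.
    Characters other than A, C, G, T (gaps, N) are ignored.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    n_columns = lengths.pop()

    sites = []
    for col in range(n_columns):
        counts = Counter(seq[col] for seq in alignment.values() if seq[col] in "ACGT")
        # Bases that are shared by at least two taxa in this column
        shared = [base for base, count in counts.items() if count >= 2]
        if len(shared) >= 2:
            sites.append(col)
    return sites


# Toy alignment (placeholder data)
alignment = {
    "Taxon_A": "ACGT-A",
    "Taxon_B": "ACGTTA",
    "Taxon_C": "ATGTTC",
    "Taxon_D": "ATGATC",
}

print(informative_sites(alignment))  # [1, 5]
```

Column 3 varies (T, T, T, A) but is not informative, because the A occurs in only one taxon and so cannot favor one tree over another.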

Note:
If the initial alignment does not produce well-aligned sequence data, you can
check and alter your sequences. In some cases, a provided or downloaded sequence has been
reversed or encoded as its complement. To correct your alignment, select the target sequence, click
ALTER and choose the necessary method. Make sure to read beforehand how these methods change the
arrangement of your target sequence.
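
The difference between reversing, complementing and reverse-complementing a sequence is easy to confuse, so a short Python sketch follows. It only illustrates the three operations on a made-up sequence and makes no claim about how the methods are named in Mesquite's menus.

```python
# Translation table for DNA bases; other characters (gaps, N) pass through unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse(seq):
    """Read the same strand in the opposite direction."""
    return seq[::-1]


def complement(seq):
    """Swap each base for its pairing partner (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)


def reverse_complement(seq):
    """Give the opposite strand, read in its own 5' to 3' direction."""
    return complement(seq)[::-1]


seq = "ATGCC"  # made-up example
print(reverse(seq))             # CCGTA
print(complement(seq))          # TACGG
print(reverse_complement(seq))  # GGCAT
```

A sequence deposited from the opposite strand usually needs the reverse complement, not the reverse or the complement alone, before it lines up with the others.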

2D. Constructing the Tree (Molecular Data)


1. The character matrix based on molecular data is now ready for tree reconstruction analysis. From
the tabs, click ANALYSIS >> Tree Inference >> Tree Search >> Mesquite Heuristic Search (add
and arrange). This will open a dialog box “Criteria for Tree Search”. From this, select “Treelength”.
Note that there is another term “- Treelength” situated under the “Tree value using character
matrix”. Choose the one that will calculate the parsimony length of the tree. Click OK.
2. Another dialog box will open for “Tree Arranger”. Select SPR Rearranger, which rearranges a
tree by subtree pruning and regrafting (SPR). Click OK.
3. Set MAXTREES (number of possible tree rearrangements to be made) to 100. Click OK. In the
next dialog box, make sure to check the boxes for “NOT separate thread” and “Auto-save”. Click
OK.
4. Wait for the analysis to run. More taxa, longer DNA sequences and a higher MAXTREES setting
require a longer running time. The waiting time will differ depending on the computer and
data matrix you are using. Be patient.
5. Once done, the tree will be opened in a new tab. Reroot as necessary. Store the first tree. Save the
project.

Figure 6. Molecular-based tree for Pseuduvaria psbA-trnH sequences
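
The parsimony length of a tree is the minimum number of character changes needed to explain the data on that tree. A common way to count it is Fitch's algorithm, sketched below in Python. This is an illustration under stated assumptions, not Mesquite's own code: trees are rooted and binary, written as nested tuples; every character has equal weight; and the toy alignment has no gaps. The function names and data are hypothetical.

```python
def fitch(tree, states):
    """Return (possible_states, changes) for one character on a rooted binary tree.

    `tree` is a taxon name (leaf) or a pair (left_subtree, right_subtree).
    `states` maps each taxon name to its base at this alignment column.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    common = left_set & right_set
    if common:
        # The two lineages can share a state: no extra change is needed here.
        return common, left_changes + right_changes
    # No shared state: one change is required on one of the two branches.
    return left_set | right_set, left_changes + right_changes + 1


def tree_length(tree, alignment):
    """Sum the Fitch change counts over all columns (equal weights assumed)."""
    n_columns = len(next(iter(alignment.values())))
    total = 0
    for col in range(n_columns):
        column = {taxon: seq[col] for taxon, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


# Toy alignment and two candidate trees (placeholder data)
alignment = {
    "Taxon_A": "ACGTA",
    "Taxon_B": "ACGTA",
    "Taxon_C": "ATGTC",
    "Taxon_D": "ATGAC",
}
tree_1 = (("Taxon_A", "Taxon_B"), ("Taxon_C", "Taxon_D"))
tree_2 = (("Taxon_A", "Taxon_C"), ("Taxon_B", "Taxon_D"))

print(tree_length(tree_1, alignment))  # 3
print(tree_length(tree_2, alignment))  # 5
```

Under the parsimony criterion, the first tree is preferred because it needs fewer changes. A heuristic search such as SPR repeatedly rearranges the tree and keeps rearrangements that lower this length.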

2E. Constructing the Consensus Tree


The multiple trees created by the analysis can be summarized in a single tree that represents the
relationships they share. This is the consensus tree, which is built from the trees obtained by following
the steps in Part 2D. Save the project file and the consensus tree in PDF form. Character changes at each
base can also be traced on the resulting tree. To build the consensus tree, use the following steps.
1. Select “TAXA&TREES” >> highlight “Make New Trees Block from” >> Consensus Tree.
2. In the dialog box “Source of Trees for Consensus”, select Stored Trees. Click OK.
3. In the “Consensus calculator” box, choose your desired tree. For this analysis, choose Majority
rule Consensus. Click OK.
4. Options for the selected tree will now open. Check the boxes for “consider tree weights” and “write
group frequency list”. Set the required frequency of clades at 0.5. This value sets how “strict”
the clade support must be for a clade/relationship to be considered “natural”. Tree root should
be “as specified in the first tree” (pertaining to the rerooted tree you stored in 2D). Click OK and
do NOT run on a separate thread.
5. The consensus tree will open in a new tab. Reroot the tree as needed. Save the project file. Examine
the tree and observe the relationships.
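
The idea behind the 0.5 frequency setting can be sketched in a few lines of Python: collect the clades (groups of taxa) found in each stored tree, count how often each clade appears, and keep those found in more than half of the trees. This is a simplified illustration that treats trees as rooted nested tuples and ignores tree weights; the function names and the three toy trees are hypothetical, and Mesquite's own calculation may differ in detail.

```python
from collections import Counter


def collect_clades(tree, found):
    """Add the leaf set of every internal node to `found`; return this node's leaf set."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset()
    for child in tree:
        leaves |= collect_clades(child, found)
    found.add(leaves)
    return leaves


def majority_rule_clades(trees, threshold=0.5):
    """Return {clade: frequency} for clades present in more than `threshold` of the trees."""
    counts = Counter()
    for tree in trees:
        found = set()
        all_taxa = collect_clades(tree, found)
        found.discard(all_taxa)  # the group of all taxa is in every tree by definition
        counts.update(found)
    frequencies = {clade: n / len(trees) for clade, n in counts.items()}
    return {clade: freq for clade, freq in frequencies.items() if freq > threshold}


# Three toy trees over taxa A-D (placeholder data)
trees = [
    ((("A", "B"), "C"), "D"),
    ((("A", "B"), "D"), "C"),
    ((("A", "C"), "B"), "D"),
]

supported = majority_rule_clades(trees)
for clade, freq in sorted(supported.items(), key=lambda item: sorted(item[0])):
    print(sorted(clade), round(freq, 2))
```

Here the groups A+B and A+B+C each appear in two of the three trees and are kept, while A+C and A+B+D appear only once and are dropped. Raising the threshold toward 1.0 keeps only groups found in nearly all trees.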

2F. Saving .tre files


Tree files can be used for succeeding analyses such as historical biogeography and haplotype
networks. To save them, click FILE >> Export. In the dialog box, select “Select Newick/Phylip Treefile”. In the
succeeding box, first select the Consensus tree and click Include. Save your file with the file extension .tre
(e.g. PseuduvariaPSBAConsensus.tre). Repeat the same steps, this time selecting “Trees from Mesquite’s
heuristic search” to save all trees created in the analysis. Again, save as a .tre file.
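
A Newick tree file is plain text in which each group of taxa is enclosed in parentheses and the tree ends with a semicolon. The Python sketch below writes a topology in that notation from a nested tuple. It is only an illustration of the format: the function name `to_newick` and the taxon names are hypothetical, and branch lengths and support values, which real .tre files may carry, are left out.

```python
def to_newick(tree):
    """Write a nested-tuple tree as a Newick string (topology only)."""

    def render(node):
        if isinstance(node, str):
            return node  # a leaf is written as its taxon name
        # An internal node is its children, comma-separated, inside parentheses.
        return "(" + ",".join(render(child) for child in node) + ")"

    return render(tree) + ";"


# Placeholder tree: (Taxon_A, Taxon_B) and (Taxon_C, Taxon_D) are sister pairs
tree = (("Taxon_A", "Taxon_B"), ("Taxon_C", "Taxon_D"))
newick = to_newick(tree)
print(newick)  # ((Taxon_A,Taxon_B),(Taxon_C,Taxon_D));
```

Opening an exported .tre file in a text editor and matching its parentheses against the tree shown in Mesquite is a quick way to confirm that the right tree was saved.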

Guide Questions:
1. How can you identify an organism or a given sample using BLAST? If the e-value for a given species is
“0,” how will you interpret the result? If the percent identity is 100%, how will you interpret the results?
What are the strengths and limitations of BLAST?
2. Are interpretations based on a tree from a single dataset conclusive enough? Can such a tree accurately
describe the evolution and relationships of the taxa studied?
3. How reliable are molecular data in evolutionary studies? What are the pros and cons of using molecular
data in evolutionary studies?

Conduct all the procedures indicated in this worksheet. Compile and submit all the necessary outputs listed
in the checklist.

References:
Baum, D.A., Smith, S.D., & Donovan, S.S. (2005). The tree-thinking challenge. Science. 310:979–980.
Baum, David A. & Stacey D. Smith. (2012). Tree Thinking: An Introduction to Phylogenetic Biology.
Systematic Biology. 62(4):634–637.
Huang, Sophia & Justen B. Whittall. (2018). Tree of Trees: Using Campus Tree Diversity to Integrate
Molecular, Organismal, and Evolutionary Biology. The American Biology Teacher, 80(2): 144–151.
Maddison, W. P. & D. R. Maddison. (2019). Mesquite: a modular system for evolutionary analysis. Version
3.61. http://www.mesquiteproject.org
Omland, Kevin E., Lyn G. Cook & Michael D. Crisp. (2008). Tree thinking for all biology: the problem
with reading phylogenies as ladders of progress. BioEssays 30:854–867.
Wheeler, D. B. M. (2007). Chapter 9: BLAST QuickStart. Bergman, Nicholas H. Comparative Genomics
Volumes, 1, 395–396.
