Online Genetics Laboratory
In 1982, GenBank was created by the US government with funding contributed by the
National Institutes of Health, the National Science Foundation, the Department of Energy
and the Department of Defense. GenBank is a massive, publicly available database containing
almost every DNA, RNA or protein sequence ever determined.
Questions and tasks are scattered throughout this laboratory, and you will have to submit
your answers at the end. It is suggested you keep a Word file open to answer these
questions as you go, and be sure to save frequently!
Beyond gene and protein sequences, GenBank also houses all the genomes that have ever
been sequenced. (The genome sequence of an organism is the letter sequence (As, Cs, Gs,
and Ts) for the entire DNA in that organism.)
Click Back.
Click the link for Genomic Biology along the left side.
Click the link for Genome along the left side.
Q2. How many genomes have been completely sequenced? How is this number split
between different types of organisms?
(Note that GenBank includes viruses, viroids and plasmids as organism types!)
Find Eukaryota on the left side and click on Chromosome underneath it.
Find Homo sapiens on the list.
For your favourite chromosome, click on the associated accession number (the link that
looks like NC_000123).
What you are seeing now is a “map” of the chromosome. On the left is a map of various
regions of the chromosome, and on the right is a map of all the genes that have been
identified on that chromosome. At the bottom is a summary of the maps, and there should
be a number indicating how big the chromosome is in the form of XXM bp, where M
stands for million and bp stands for base pairs. Each base pair is a base (A, C, G or T)
on one strand hydrogen-bonded to its partner base on the other strand.
Q3. What chromosome did you choose, and how long is it?
You can actually display the entire letter sequence for the chromosome, but we won’t do
that. Instead, we’ll search GenBank for a specific gene: the gene that, when mutated,
causes sickle-cell anaemia.
The webpage you see now lists all the databases that NCBI maintains and the number of
results each database returned for the search term “HBB”. As you can see, there are a
lot of them.
The webpage you see now lists all the genes in the NCBI Gene database that returned a
match on the term “HBB”. The first result is the haemoglobin beta gene in human beings,
but haemoglobin beta genes from other species are listed as well. In particular, the next
four are mouse, cow, rat and chimpanzee.
What you see now is the GenBank entry for the human haemoglobin beta gene. There’s a
lot of information here, but scroll to the bottom and you should see the actual sequence
for the gene.
Q4. If DNA is double-stranded, why is only one sequence of bases presented for the
gene?
Normal CTGACTCCTGAGGAGAAGTCT
Sickle CTGACTCCTGTGGAGAAGTCT
Q5. Can you find where this nucleotide is in the gene? Which nucleotide is it (i.e. the
first, the second, the third…)?
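The two fragments listed above can also be compared letter by letter in a few lines of
Python. This is only a sketch: the function name find_differences is made up for this
example, and the position it reports is counted within the short fragment, not within
the whole gene, so you still need to locate the fragment in the GenBank entry to answer
Q5.

```python
# The two fragments exactly as given in this laboratory.
normal = "CTGACTCCTGAGGAGAAGTCT"
sickle = "CTGACTCCTGTGGAGAAGTCT"


def find_differences(a, b):
    """Return (1-based position, base in a, base in b) for every mismatch."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


differences = find_differences(normal, sickle)
for position, normal_base, sickle_base in differences:
    # Position is relative to the start of the fragment, not the gene.
    print(f"Fragment position {position}: {normal_base} -> {sickle_base}")
```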
Enough of that. Back to Diversity of Life. Modern phylogenetics is performed primarily
with ribosomal RNA sequences. The reason is that rRNA is one of the components of
ribosomes, which exist in all organisms, as opposed to most other genes, which might not
exist in all organisms.
BLAST is a computer algorithm (program) that finds matches to DNA, RNA, and protein
sequences. By BLASTing the human 18S rRNA sequence in an rRNA database, you
should get a list of similar 18S rRNA sequences from different species. Each entry is in
the form Genus_species_A12345.
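If you end up with a long list of entries, a short Python sketch like the one below can
split each name into its parts. It assumes every entry really follows the
Genus_species_A12345 pattern described above. The function name split_entry_name and the
example entry are placeholders for illustration, not real database records.

```python
def split_entry_name(name):
    """Split 'Genus_species_A12345' into (genus, species, accession).

    Assumes the accession is everything after the last underscore and the
    genus is everything before the first underscore.
    """
    organism, accession = name.rsplit("_", 1)
    genus, _, species = organism.partition("_")
    return genus, species, accession


# Placeholder entry using the made-up accession from the text above.
genus, species, accession = split_entry_name("Homo_sapiens_A12345")
print(f"{genus} {species} (accession {accession})")
```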
Q6. Which species returned the best match to human 18S rRNA?
Choose the rRNA sequences of 15-20 different species by clicking the checkbox next to
them. Make sure your sequences are from different species and include Homo sapiens.
At the top of the page, change the dropdown menu from DCSE alignment to FASTA.
Click Get sequences.
When the page is done loading, copy the entire page.
Go, preferably in a new window or tab, to the Jalview website at http://www.jalview.org
Click Applet Version along the left.
Click the fourth Start Jalview button.
Wait until the Java program loads.
Click File -> Input from textbox.
Paste your sequences from the rRNA database.
Click New Window.
Click Edit -> Remove Empty Columns.
Using your mouse, highlight the first ~45 columns, the ones filled with os.
Hit Delete on your keyboard.
Scroll to the far right end, highlight the last ~45 columns, the ones filled with os.
Hit Delete on your keyboard.
Click Colour -> Nucleotide.
Stop here. What you have here is a colour-coded alignment of the 18S rRNA sequences
of your various species. It is an alignment, because similar regions of each sequence are
matched up with each other – i.e. As are matched with As, Cs with Cs, Gs with Gs, and
Us with Us wherever possible. You’ll notice that each letter is given its own particular
colour. You’ll also notice there are no Ts, but instead there are Us. That’s because Ts get
replaced by Us when moving from DNA to RNA. You’ll notice that there are a lot of
solid columns of the same colour, where every species has the same letter in the same
position. This is because 18S rRNA is highly conserved between different species – the
sequences from different species will show high identity, or be the same, in many places.
Mutations in the 18S rRNA sequence are not well-tolerated, because a cell without
functional 18S rRNA cannot build working ribosomes and will not survive.
You’ll also notice some places where there are no letters for some species. Those are
“gaps” – parts of the rRNA sequence that were either added or deleted for certain species.
The areas you deleted at the beginning and the end were large gaps resulting from
different species having different lengths of rRNA.
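The ideas of conserved columns and gaps described above can be sketched in Python. The
three short sequences below are invented for illustration and are not real 18S rRNA
data. The sketch also assumes gaps are written as "-", and the function names are
hypothetical.

```python
GAP = "-"  # assumed gap character

# Made-up toy alignment: every row has the same length.
alignment = [
    "GAUC-GGUA",
    "GAUCAGGUA",
    "GACC-GGUA",
]


def conserved_columns(sequences):
    """Indices (0-based) of columns where every sequence has the same letter."""
    return [
        i
        for i, column in enumerate(zip(*sequences))
        if GAP not in column and len(set(column)) == 1
    ]


def gap_columns(sequences):
    """Indices (0-based) of columns where at least one sequence has a gap."""
    return [i for i, column in enumerate(zip(*sequences)) if GAP in column]


conserved = conserved_columns(alignment)
gapped = gap_columns(alignment)
print("Conserved columns:", conserved)
print("Columns with gaps:", gapped)
```

The conserved columns correspond to the solid blocks of one colour in Jalview, and the
gapped columns correspond to the places where some species show no letter.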
Q7. Using the Internet, figure out which 15-20 species you have from their scientific
names and list them.
Q8. If you were to create a phylogenetic tree of these species, which species do you
think would be close together on the tree (closely related)? What might the tree
look like?
In Jalview, click Calculate -> Calculate Tree -> Neighbour Joining Using % Identity.
(If you have problems with this step, try hitting Esc first. You may have selected an area
of the alignment without realising it.)
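To get a feel for what "% Identity" means, the sketch below computes the percentage of
matching letters for every pair of sequences in a small alignment. It is not Jalview's
own calculation and it does not build the tree: it only shows the kind of pairwise
similarity that a neighbour-joining method starts from. The toy sequences are invented,
and skipping any column where either sequence has a gap is an assumption made for this
example.

```python
from itertools import combinations

GAP = "-"  # assumed gap character

# Made-up toy alignment, not real 18S rRNA data.
alignment = {
    "species_1": "GAUC-GGUA",
    "species_2": "GAUCAGGUA",
    "species_3": "GACC-GGUA",
}


def percent_identity(a, b):
    """Percentage of matching letters over columns where neither has a gap."""
    compared = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


# Percent identity for every pair of species.
identities = {
    (name_a, name_b): percent_identity(alignment[name_a], alignment[name_b])
    for name_a, name_b in combinations(alignment, 2)
}

for (name_a, name_b), pid in identities.items():
    print(f"{name_a} vs {name_b}: {pid:.1f}% identical")

# The most similar pair would be expected to sit close together on a tree.
closest_pair = max(identities, key=identities.get)
print("Most similar pair:", closest_pair)
```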
Q10. Was this laboratory at all interesting to you? What might you change about it to
make it more useful/fun/informative, or do you think it was a total waste of your
time?