Online Genetics Laboratory

SBI3U Online Genetics Laboratory November 18, 2008
In 1982, GenBank was created by the US government with funding contributed by the
National Institutes of Health, the National Science Foundation, the Department of Energy
and the Department of Defense. GenBank is a massive, publicly available database containing
almost every DNA, RNA or protein sequence ever sequenced.
Currently, GenBank is maintained by the National Center for Biotechnology Information
(NCBI) and is co-ordinated with similar databases owned by the European Molecular
Biology Laboratory (EMBL) and the DNA Data Bank of Japan (DDBJ). In this laboratory,
you will be directed to access GenBank and explore some of the online tools available to
modern biologists.
Questions and tasks are scattered throughout this laboratory, and you will submit your
answers at the end. Keep a Word file open to answer the questions as you go,
and be sure to save frequently!
Visit the NCBI website at http://www.ncbi.nlm.nih.gov/

Click the link for GenBank along the left side.
Q1. How big is GenBank in terms of bases and sequences?
(Ignore the part about WGS, or whole genome shotgun, which is a technique used
to sequence entire genomes.)
Beyond individual gene and protein sequences, GenBank also houses the genomes that have
been sequenced and submitted. (The genome sequence of an organism is the complete letter
sequence (As, Cs, Gs, and Ts) of all the DNA in that organism.)
Click Back.
Click the link for Genomic Biology along the left side.
Click the link for Genome along the left side.
Q2. How many genomes have been completely sequenced? How is this number split
between different types of organisms?
(Note that GenBank includes viruses, viroids and plasmids as organism types!)
Find Eukaryota on the left side and click on Chromosome underneath it.
Find Homo sapiens on the list.
For your favourite chromosome, click on the associated accession number (the link that
looks like NC_000123).
What you are seeing now is a “map” of the chromosome. On the left is a map of various
regions of the chromosome, and on the right is a map of all the genes that have been
identified on that chromosome. At the bottom is a summary of the maps, including a
number of the form XXM bp that indicates how big the chromosome is. M stands for
million and bp stands for base pairs. Each base pair is a base (A, C, G or T) on one
strand hydrogen-bonded to its partner base on the other strand.
Q3. What chromosome did you choose, and how long is it?
You can actually display the entire letter sequence for the chromosome, but we won’t do
that. Instead, we’ll search GenBank for a specific gene, the gene that is mutated in
sickle-cell anaemia.
Click on the NCBI logo in the top left corner.

In the search box, enter HBB, and click Go.
The webpage you see now is a listing of all the databases that NCBI maintains and the
number of results each database returned for the search term “HBB”. As you can see,
there are a lot of results.
Click on Gene, the twelfth database listed in the first column.
The webpage you see now lists all the genes in the NCBI Gene database that matched
the term “HBB”. The first result is the haemoglobin beta gene in human beings,
but haemoglobin beta genes from other species are listed as well. In particular, the next
four are mouse, cow, rat and chimpanzee.
Click on the first HBB link.

On the right side, under Table of Contents, click Reference Sequences.
Click on GenBank.
What you see now is the GenBank entry for the human haemoglobin beta gene. There’s a
lot of information here, but scroll to the bottom and you should see the actual sequence
for the gene.
Q4. If DNA is double-stranded, why is only one sequence of bases presented for the
gene?
In sickle-cell anaemia, a single nucleotide is changed in the haemoglobin beta gene.
Normal CTGACTCCTGAGGAGAAGTCT
Sickle CTGACTCCTGTGGAGAAGTCT
Q5. Can you find where this nucleotide is in the gene? Counting from the start of the
sequence, which nucleotide is it (i.e. the first, the second, the third…)?
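If you would like to check your answer to Q5 with a computer, the comparison can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the `toy_gene` text is an invented stand-in (not the real HBB sequence) that you would replace with the sequence copied from the GenBank entry.

```python
def find_differences(normal, sickle):
    """Return (position, normal_base, sickle_base) for each mismatch.

    Positions are 1-based, so the first letter is position 1.
    """
    if len(normal) != len(sickle):
        raise ValueError("sequences must be the same length")
    return [
        (i + 1, a, b)
        for i, (a, b) in enumerate(zip(normal, sickle))
        if a != b
    ]


def locate(snippet, gene_text):
    """Return the 1-based start of snippet in gene_text, or None if absent."""
    # GenBank prints sequences in lower case with spaces and line numbers,
    # so keep only the letters and compare in upper case.
    cleaned = "".join(ch for ch in gene_text.upper() if ch.isalpha())
    index = cleaned.find(snippet.upper())
    return None if index == -1 else index + 1


NORMAL = "CTGACTCCTGAGGAGAAGTCT"
SICKLE = "CTGACTCCTGTGGAGAAGTCT"

# Invented stand-in for the GenBank text; replace with the real sequence.
toy_gene = "        1 ggggctgact cctgaggaga agtctgggg"

print(find_differences(NORMAL, SICKLE))
print(locate(NORMAL, toy_gene))
```

Adding the start position reported by `locate` to the position of the mismatch inside the snippet (and subtracting 1) gives the position of the changed nucleotide in the full sequence.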
Enough of that. Back to Diversity of Life. Modern phylogenetics is often performed
with ribosomal RNA (rRNA) sequences. The reason is that rRNA is one of the components of
ribosomes, which exist in all organisms, whereas many other genes might not exist in all
organisms.
Click on the NCBI link in the top left corner.

In the search box, enter X03205 – this is the accession number for human 18S rRNA –
and click Go.
Click on Nucleotide.
Click on X03205.
In the second dropdown menu, change GenBank to FASTA.
Highlight and copy the sequence.
Visit the Ribosomal RNA Database at
http://bioinformatics.psb.ugent.be/webtools/rRNA/blastrrna.html
Paste the rRNA sequence in the Query Sequence box.
At the top, change the Database dropdown from SSU+LSU to SSU.
Click Search.
BLAST (Basic Local Alignment Search Tool) is a computer algorithm (program) that searches
a database for matches to DNA, RNA, and protein sequences. By BLASTing the human 18S
rRNA sequence against an rRNA database, you should get a list of similar 18S rRNA
sequences from different species. Each entry is in the form Genus_species_A12345.
Q6. Which species returned the best match to human 18S rRNA?
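The FASTA format you selected earlier is simple enough to read with a short program: each record starts with a header line beginning with ">", followed by one or more lines of sequence. The sketch below shows one way to split FASTA text into named sequences in Python. The function name `parse_fasta` and the two toy records are made up for illustration and are not real rRNA data.

```python
def parse_fasta(text):
    """Split FASTA text into a dictionary of {header: sequence}."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # drop the leading ">"
            records[header] = []
        elif header is not None:
            records[header].append(line)
    # Join the sequence lines of each record into one string.
    return {name: "".join(parts) for name, parts in records.items()}


# Toy input, invented for this example.
example = """>Genus_species_A12345
ACGUACGU
ACGU
>Other_species_B67890
ACGAACGU
"""

records = parse_fasta(example)
for name, sequence in records.items():
    print(name, len(sequence))
```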
Choose the rRNA sequences of 15-20 different species by clicking the checkbox next to
them. Make sure your sequences are from different species and include Homo sapiens.
At the top of the page, change the dropdown menu from DCSE alignment to FASTA.
Click Get sequences.
When the page is done loading, copy the entire page.
Go, preferably in a new window or tab, to the Jalview website at http://www.jalview.org
Click Applet Version along the left.
Click the fourth Start Jalview button.
Wait until the Java program loads.
Click File -> Input from textbox.
Paste your sequences from the rRNA database.
Click New Window.
Click Edit -> Remove Empty Columns.
Using your mouse, highlight the first ~45 columns, the ones filled with os.
Hit Delete on your keyboard.
Scroll to the far right end, highlight the last ~45 columns, the ones filled with os.
Hit Delete on your keyboard.
Click Colour -> Nucleotide.
Stop here. What you have here is a colour-coded alignment of the 18S rRNA sequences
of your various species. It is an alignment because similar regions of each sequence are
matched up with each other, i.e. As are matched with As, Cs with Cs, Gs with Gs, and
Us with Us wherever possible. You’ll notice that each letter is given its own particular
colour. You’ll also notice there are no Ts, but instead there are Us. That’s because RNA
uses the base uracil (U) wherever DNA uses thymine (T). You’ll notice that there are a lot of
solid columns of the same colour, where every species has the same letter in the same
position. This is because 18S rRNA is highly conserved between different species: the
sequences from different species are identical in many places.
Mutations in the 18S rRNA sequence are not well tolerated, because a cell with a
malfunctioning 18S rRNA cannot build working ribosomes and generally does not survive.
You’ll also notice some places where there are no letters for some species. Those are
“gaps”, positions where letters were either inserted or deleted in certain species.
The areas you deleted at the beginning and the end were large gaps resulting from
different species having different lengths of rRNA.
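The ideas of conserved columns and gaps can be made concrete with a small Python sketch. It converts DNA letters to RNA letters and then counts how many columns of an alignment are conserved, variable, or contain a gap. The names `to_rna` and `column_summary`, the use of "-" as the gap character, and the toy alignment are all assumptions made for this example. Jalview does this kind of bookkeeping for you when it colours the alignment.

```python
def to_rna(dna):
    """Write a DNA sequence in RNA letters by replacing T with U."""
    return dna.upper().replace("T", "U")


def column_summary(alignment, gap="-"):
    """Count conserved, variable and gapped columns in an alignment.

    alignment maps a species name to its aligned sequence; every
    sequence must have the same length.
    """
    sequences = list(alignment.values())
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("aligned sequences must all be the same length")
    summary = {"conserved": 0, "variable": 0, "gapped": 0}
    for column in zip(*sequences):  # one tuple of letters per column
        if gap in column:
            summary["gapped"] += 1
        elif len(set(column)) == 1:
            summary["conserved"] += 1
        else:
            summary["variable"] += 1
    return summary


# Toy alignment, invented for this example (not real rRNA data).
toy_alignment = {
    "species_1": to_rna("ACGT-ACGT"),
    "species_2": to_rna("ACGTAACGT"),
    "species_3": to_rna("ACCT-ACGT"),
}

print(column_summary(toy_alignment))
```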
Q7. Using the Internet, figure out which 15-20 species you have from their scientific
names and list them.
Q8. If you were to create a phylogenetic tree of these species, which species do you
think would be close together on the tree (closely related)? What might the tree
look like?
In Jalview, click Calculate -> Calculate Tree -> Neighbour Joining Using % Identity.
(If you have problems with this step, try hitting Esc first. You may have selected an area
of the alignment without realising it.)
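Neighbour joining builds its tree from a table of pairwise distances between the sequences, and "% Identity" refers to how many aligned positions two sequences share. The Python sketch below computes that percentage for every pair in a toy alignment and reports the most similar pair. It does not build the tree itself. Skipping columns that contain a gap is an assumption made here, and Jalview's own calculation may treat gaps differently. All names and sequences are invented for illustration.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percentage of compared columns where a and b have the same letter.

    Columns where either sequence has a gap are skipped (an assumption).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must be the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def closest_pair(alignment):
    """Return (identity, name_a, name_b) for the most similar pair."""
    return max(
        (percent_identity(alignment[a], alignment[b]), a, b)
        for a, b in combinations(alignment, 2)
    )


# Toy alignment, invented for this example (not real rRNA data).
toy_alignment = {
    "species_1": "ACGUACGUAC",
    "species_2": "ACGUACGUAA",
    "species_3": "ACCUAGGUUA",
}

for a, b in combinations(toy_alignment, 2):
    print(a, b, percent_identity(toy_alignment[a], toy_alignment[b]))
print(closest_pair(toy_alignment))
```

Pairs with a high percent identity are the ones you would expect to sit close together on the tree.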
Q9. What does the tree look like?
Maximize the tree window.

Click on View -> Font.
Change the font size to 30, or thereabouts.
Using the Print Screen button on your keyboard, take a screenshot of the tree, and paste
it into your Word file.
Q10. Was this laboratory at all interesting to you? What might you change about it to
make it more useful/fun/informative, or do you think it was a total waste of your
time?
Print and submit! Don’t forget to put your name(s) on it!
