Bioinformatics Session11

Bioinformatics (BIO213)
Session 11
TBLASTN, PSI-BLAST, DELTA-BLAST …
Why are they around?
Which BLAST programs have you used so far?
• BLASTn
• BLASTp
• Smart BLAST
• Global Align
• Multiple Alignment
What are they used for?
When are they used? Example…
• BLASTn:
• Nucleotide to Nucleotide search
• BLASTp:
• Protein to Protein search
• BLASTn vs BLASTp: Which is preferred and why?
• Which scoring matrix/scheme is generally used and why?
TBLASTN, PSI-BLAST, DELTA-BLAST …
Why are they around?
• BLAST Searching with Multidomain Protein: HIV-1 Pol
• The Gag‐Pol protein of HIV‐1 (NP_057849.4) is a multidomain protein
of 1435 AA residues with protease, reverse transcriptase, and
integrase domains.
Kinds of searches we can perform with
such a viral protein:
• Graphical overview: clicking on a domain takes you to the domain databases
• List of alignments (query-anchored, with dots for identities), showing perfectly conserved and rarely substituted residues
• Taxonomy report for a BLASTP search shows
an overview of which species have proteins
matching the HIV‐1 query.
• Most matches are viral, but others include
rabbit, fungal, pig, and insect sequences.
• To learn more about the distribution of Pol proteins
throughout the tree of life, we may further ask what
bacterial proteins are related to the viral HIV‐1 Pol
polyprotein.
• Repeat the BLASTP search with NP_057849 as the query, but limit the search to “Bacteria”
BLASTP searching HIV-1 pol against bacterial proteins
• Bacterial matches to HIV-1 retropepsin, reverse transcriptase domains
• Bacterial matches to HIV-1 ribonuclease H domain
• Bacterial matches to HIV-1 integrase core domain
• This suggests that the ribonuclease H and integrase core domains of HIV‐1 match many dozens of bacterial
proteins.
• You can inspect pairwise alignments to confirm that the viral and bacterial proteins are homologous, often
sharing about 30% amino acid identity over spans of over 150 amino acids.
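The "about 30% identity over 150 residues" figure can be checked by hand on any pairwise alignment. The sketch below is a minimal illustration, not part of the original slides: the function name `percent_identity` and the toy sequences are hypothetical, and it follows the convention BLAST uses in its "Identities" line (identical columns divided by the full alignment length, gap columns included).

```python
def percent_identity(aligned_a, aligned_b):
    """Return (percent identity, alignment length) for two gapped, aligned strings.

    Convention assumed here: identical columns / total alignment columns,
    with gap columns ("-") counted in the length but never as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        return 0.0, 0
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a), len(aligned_a)


# Toy alignment (made up for illustration, not real HIV-1 or bacterial data).
pid, span = percent_identity("MKV-LAT", "MRVGLAS")
print(f"{pid:.1f}% identity over {span} aligned columns")
```

A short span with high identity is weaker evidence of homology than a long span with moderate identity, which is why both numbers are reported together.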
BLAST searching HIV-1 pol against human sequences
Question: are there human homologs of HIV-1 pol protein?
Query: HIV-1 Pol
Program: BLASTP
Database: human nr (nonredundant)
Matches: many human proteins share significant identity.
BLAST searching HIV-1 pol against human sequences
Question: are there human RNA transcripts corresponding to HIV-1 pol?
Query: HIV-1 Pol
Program: TBLASTN
Database: human ESTs
Matches: many human genes are actively transcribed to generate transcripts homologous to HIV-1 pol.
TBLASTN/X helps in searching across highly diverged species
PSI-BLAST and DELTA-BLAST serve the same purpose
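TBLASTN compares a protein query against a nucleotide database that is translated in all six reading frames. The sketch below shows only that translation step, using the standard genetic code. The helper names (`translate`, `six_frame_translations`) and the toy DNA string are hypothetical, and a real search would use BLAST itself rather than this code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an A/C/G/T string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    frames = {}
    for strand, seq in (("+", dna.upper()), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


# Toy DNA fragment (illustrative only).
for label, protein in six_frame_translations("ATGGTGCATCTGACT").items():
    print(label, protein)
```

Because the comparison happens at the protein level, a match can be found even when the underlying DNA sequences have diverged too far for a nucleotide search to detect.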
Using BLAST for gene discovery: FIND-A-GENE
• A common problem in biology is finding a new gene.

• Traditionally, genes and proteins were identified using the techniques
of molecular biology and biochemistry.
• Such experimental biology approaches will always remain essential
but have practical limitations.
• Bioinformatics approaches can also be useful to provide evidence for
the existence of new genes.
• For our purposes, a “new” gene is a DNA sequence in a database
that has not yet been annotated (described).
• You may want to find new genes for many reasons:
• Can you think of a few?
A general strategy for the “Find-a-gene project” to practice BLAST
1) Start with the sequence of a known protein, e.g. human beta globin (NP_000509), to search for a novel globin gene.
2) Perform a TBLASTN search against a DNA database consisting of genomic DNA or ESTs.
Include the output of that BLAST search in your document.
3) Inspect the output. You need to distinguish between a perfect match to your query
(not “novel”), a near match (might be “novel”, depending on the
results), and a nonhomologous result.
4) Search the nonredundant (nr) database with BLASTX or BLASTP.
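One way to sort TBLASTN hits into perfect matches, near matches, and nonhomologous results is to read the tabular report. The sketch below assumes the search was saved in the default 12-column tabular format of command-line BLAST+ (`-outfmt 6`). The function name `classify_hits`, the E-value cutoff of 1e-5, the subject identifiers, and the sample rows are all hypothetical choices for illustration, not values from the course.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def classify_hits(tabular_text, query_length, evalue_cutoff=1e-5):
    """Label each hit as a perfect match, a near match, or not significant.

    The thresholds are assumptions made for this sketch; always inspect
    the alignments themselves before drawing a conclusion.
    """
    reader = csv.DictReader(
        io.StringIO(tabular_text), fieldnames=OUTFMT6_COLUMNS, delimiter="\t"
    )
    results = []
    for row in reader:
        pident = float(row["pident"])
        evalue = float(row["evalue"])
        covered = int(row["qend"]) - int(row["qstart"]) + 1
        if pident == 100.0 and covered == query_length:
            label = "perfect match (not novel)"
        elif evalue <= evalue_cutoff:
            label = "near match (possibly novel)"
        else:
            label = "not significant (likely nonhomologous)"
        results.append((row["sseqid"], label))
    return results


# Made-up rows for a 147-residue query such as human beta globin.
sample = "\n".join("\t".join(fields) for fields in [
    ["NP_000509", "est_A", "100.000", "147", "0", "0", "1", "147", "3", "443", "1e-100", "300"],
    ["NP_000509", "est_B", "62.500", "144", "54", "0", "2", "145", "10", "441", "2e-48", "160"],
    ["NP_000509", "est_C", "28.000", "50", "36", "0", "20", "69", "5", "154", "3.2", "25.0"],
])
for subject, label in classify_hits(sample, query_length=147):
    print(subject, label)
```

The near-match category is the interesting one for the project: it marks candidates to carry forward to the BLASTX or BLASTP check against nr.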
Gather information about this “novel” protein
• At a minimum, identify the protein sequence of the “novel”
protein as displayed in the BLAST results.
• Propose a name for the novel protein (e.g., “Krishnazoa globin”),
and report the species from which it derives.
• It is very unlikely (but still possible) that you will find a novel gene
from an organism such as S. cerevisiae, human, or mouse,
because those genomes have already been thoroughly
annotated.
• It is more likely that you will discover a new gene in a genome
that is currently being sequenced, such as bacteria or mosses or
protozoa.

of a known protein
Inspect the
output
BLASTX nr
or
BLASTP nr
(4)
• Use the DNA sequence of the EST and perform a BLASTX query against the nonredundant (nr) database
• As an alternative strategy, take the encoded protein sequence, and use it as a query in a BLASTP search of the
nonredundant (nr) database at NCBI.
Demonstrate that this gene, and its
corresponding protein, are novel
For the purposes of this course, “novel” is defined as follows.
• If there is a 100% identity match to a protein in the database from the same
species, then your protein is NOT novel (even if the match is to a protein with
a name such as “unknown”).
• If the best match is to a protein with < 100% identity to your query, then it is
likely that your protein is novel and you have succeeded.
• If there is a match with 100% identity but to a different species than the one
you started with, then you have succeeded in finding a novel gene.
• If there are no database matches to the original query from step (1), this
indicates that you have found a DNA/protein that is not homologous to the
original query. You should start over.
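The four rules above can be written as a small decision function. This is only a sketch of the course definition of “novel”: the function name `assess_novelty`, its parameters, and the example species names are hypothetical, and the identity value is assumed to come from the best BLASTX or BLASTP hit against nr.

```python
def assess_novelty(best_hit_identity, best_hit_species, source_species,
                   original_query_among_hits=True):
    """Apply the course definition of "novel" to the best nr hit.

    best_hit_identity: percent identity of the best database match (0-100).
    best_hit_species: species of that best match.
    source_species: species the candidate sequence came from.
    original_query_among_hits: False if the step (1) query is absent from
        the results, which means the candidate is not homologous to it.
    """
    if not original_query_among_hits:
        return "start over: not homologous to the original query"
    if best_hit_identity >= 100.0:
        if best_hit_species == source_species:
            return "not novel"
        return "novel: identical protein known only from a different species"
    return "likely novel"


# Illustrative calls with made-up values.
print(assess_novelty(100.0, "Mus musculus", "Mus musculus"))
print(assess_novelty(87.5, "Danio rerio", "Physcomitrella patens"))
```

Note that a hit named “unknown” or “hypothetical protein” still counts as a database match under the first rule, so the function looks only at identity and species, not at the hit's name.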
Confirm that the novel protein is a true hit for your
query
• Generate a multiple sequence alignment with your novel
protein, your original query protein, and a group of other
members of this family.
• A typical number of proteins to use in a multiple sequence
alignment is a minimum of 5 or 10 and a reasonable maximum
is 30.
