

BIO 20-1

Third Long Exam


Bioinformatics Engineering

I. PROTEIN IDENTIFIER

1. What is the name of the protein?

heat shock protein HSP 90-alpha isoform 1 [Homo sapiens]

2. What is the FASTA format for the protein?

>gi|153792590|ref|NP_001017963.2| heat shock protein HSP 90-alpha isoform 1 [Homo sapiens]


MPPCSGGDGSTPPGPSLRDRDCPAQSAEYPRDRLDPRPGSPSEASSPPFLRSRAPVNWYQEKAQVFLWHL
MVSGSTTLLCLWKQPFHVSAFPVTASLAFRQSQGAGQHLYKDLQPFILLRLLMPEETQTQDQPMEEEEVE
TFAFQAEIAQLMSLIINTFYSNKEIFLRELISNSSDALDKIRYESLTDPSKLDSGKELHINLIPNKQDRT
LTIVDTGIGMTKADLINNLGTIAKSGTKAFMEALQAGADISMIGQFGVGFYSAYLVAEKVTVITKHNDDE
QYAWESSAGGSFTVRTDTGEPMGRGTKVILHLKEDQTEYLEERRIKEIVKKHSQFIGYPITLFVEKERDK
EVSDDEAEEKEDKEEEKEKEEKESEDKPEIEDVGSDEEEEKKDGDKKKKKKIKEKYIDQEELNKTKPIWT
RNPDDITNEEYGEFYKSLTNDWEDHLAVKHFSVEGQLEFRALLFVPRRAPFDLFENRKKKNNIKLYVRRV
FIMDNCEELIPEYLNFIRGVVDSEDLPLNISREMLQQSKILKVIRKNLVKKCLELFTELAEDKENYKKFY
EQFSKNIKLGIHEDSQNRKKLSELLRYYTSASGDEMVSLKDYCTRMKENQKHIYYITGETKDQVANSAFV
ERLRKHGLEVIYMIEPIDEYCVQQLKEFEGKTLVSVTKEGLELPEDEEEKKKQEEKKTKFENLCKIMKDI
LEKKVEKVVVSNRLVTSPCCIVTSTYGWTANMERIMKAQALRDNSTMGYMAAKKHLEINPDHSIIETLRQ
KAEADKNDKSVKDLVILLYETALLSSGFSLEDPQTHANRIYRMIKLGLGIDEDDPTADDTSAAVTEEMPP
LEGDDDTSRMEEVD

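3. What information can be obtained from the FASTA format? (3 points)

The header line, which begins with ">", gives the sequence identifiers (GI number and
RefSeq accession), the protein name, and the source organism. The lines that follow give
the amino acid sequence in single-letter code. A multiple-sequence FASTA file can be
obtained by concatenating several single-sequence FASTA files.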
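The information carried by a FASTA record can be pulled apart programmatically. The following is a minimal Python sketch, not a reference parser: the function names `parse_fasta` and `describe_header` are hypothetical, the header layout `gi|...|ref|...| name [organism]` is assumed from the record shown in question 2, and only the first two sequence lines of that record are used to keep the example short.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines carry no information
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines are joined into one string
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def describe_header(header):
    """Split a header of the assumed form 'gi|N|ref|ACC| name [organism]'."""
    fields = header.split("|")
    info = {"gi": None, "accession": None, "name": header.strip(), "organism": None}
    if len(fields) >= 5 and fields[0] == "gi":
        info["gi"] = fields[1]
        info["accession"] = fields[3]
        info["name"] = "|".join(fields[4:]).strip()
    description = info["name"]
    if description.endswith("]") and "[" in description:
        start = description.rindex("[")
        info["organism"] = description[start + 1:-1]
        info["name"] = description[:start].strip()
    return info


# Header from question 2; sequence truncated to its first two lines.
fasta_text = (
    ">gi|153792590|ref|NP_001017963.2| heat shock protein HSP 90-alpha isoform 1 [Homo sapiens]\n"
    "MPPCSGGDGSTPPGPSLRDRDCPAQSAEYPRDRLDPRPGSPSEASSPPFLRSRAPVNWYQEKAQVFLWHL\n"
    "MVSGSTTLLCLWKQPFHVSAFPVTASLAFRQSQGAGQHLYKDLQPFILLRLLMPEETQTQDQPMEEEEVE\n"
)

records = parse_fasta(fasta_text)
info = describe_header(records[0][0])
print(info)
print(len(records[0][1]), "residues read")
```

Because each ">" line starts a new record, the same function reads a multiple-sequence FASTA file without changes.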

4. What is the source organism of the protein?

Homo sapiens

5. What is/are the function(s) of the protein?

The protein encoded by this gene is an inducible molecular chaperone that functions as a
homodimer. The encoded protein aids in the proper folding of specific target proteins by use
of an ATPase activity that is modulated by co-chaperones.

III. QUESTIONS

1. Just by examining the graphic summary, how many proteins produced a significant match with
the query sequence? What does the color of the alignments indicate? (2 points)

2. In a table format, identify all the proteins that produced significant alignments and their source
organism, max scores, E-values, and accession numbers. (8 points)

Protein Name                        Organism               Total Score   E-value   Accession Number
heat shock 90kDa protein 1, alpha   Homo sapiens           116           3e-30     ABC40730.1
heat shock 90kDa protein 1, alpha   Homo sapiens           116           3e-30     NP_001017963.2
Heat shock protein HSP 90-alpha 2   Homo sapiens           116           3e-30     CAI64495.1
unnamed protein product             Homo sapiens           104           2e-27     CAD66568.1
expanded                            Drosophila grimshawi   28.1          9.9       XP_001988550.1

3. Identify all the human protein isoforms that significantly matched with the query sequence. (2
points)

Protein Name
heat shock 90kDa protein 1, alpha
heat shock protein HSP 90-alpha isoform 1
Heat shock protein HSP 90-alpha 2
unnamed protein product
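The selection of significant human matches can be sketched in Python. This is an illustration only: the hit values are copied from the table in question 2, while the E-value cutoff of `1e-5` and the names `hits`, `E_VALUE_CUTOFF`, and `significant_human` are assumptions made for the example, not something the exam specifies.

```python
# (protein name, organism, score, E-value, accession), copied from question 2.
hits = [
    ("heat shock 90kDa protein 1, alpha", "Homo sapiens", 116.0, 3e-30, "ABC40730.1"),
    ("heat shock 90kDa protein 1, alpha", "Homo sapiens", 116.0, 3e-30, "NP_001017963.2"),
    ("Heat shock protein HSP 90-alpha 2", "Homo sapiens", 116.0, 3e-30, "CAI64495.1"),
    ("unnamed protein product", "Homo sapiens", 104.0, 2e-27, "CAD66568.1"),
    ("expanded", "Drosophila grimshawi", 28.1, 9.9, "XP_001988550.1"),
]

# Assumed cutoff: smaller E-values mean the match is less likely to be chance.
E_VALUE_CUTOFF = 1e-5

significant_human = [
    hit for hit in hits
    if hit[1] == "Homo sapiens" and hit[3] < E_VALUE_CUTOFF
]

for name, organism, score, e_value, accession in significant_human:
    print(f"{accession}\t{name}\t{e_value:.0e}")
```

With this cutoff the Drosophila grimshawi hit (E-value 9.9) is excluded and the four human entries remain.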

4. Which organism would be the most suitable animal model for studying human cystinosin
function? Briefly discuss your answer. (Hint: Protein homology (especially conserved domains) and
function are two important considerations for selecting a candidate animal/organism model. NCBI is
equipped with a sequence analysis platform, Analyze this sequence, that can identify conserved
domains in proteins and their identities/function). (4 points)

5. Would baker's yeast represent a suitable model for studying human cystinosin function? Briefly
discuss your answer. (4 points)

Baker's yeast can represent a suitable model for studying human cystinosin function
because it is one of the simplest eukaryotic organisms, and several essential cellular
processes are conserved between yeast and humans. The yeast genome is just over
12 million base pairs in length and contains about 6000 genes. Many genes involved in
human disease have counterparts in yeast, which suggests that such diseases result from
the disruption of very basic cellular processes, such as DNA repair, cell division, or the
control of gene expression.

6. In most alignments, max score would be equivalent to total score. However, in the descriptions
table, NP_009705.1 produced a total score of 99.0 and max score of 34.7. Briefly discuss the
discrepancy between the two values. (2 points)

7. Identify substitution (positive and negative), insertion, and deletion mutations between the
query sequence and the most significant homologue in baker's yeast. (4 points)

Amino Acid Position or Range Type of Mutation

8. Discuss briefly one practical application of BLAST.

BLAST provides a method for rapid searching of nucleotide and protein databases. Its
algorithm was written to balance speed against sensitivity to distant sequence
relationships, and it is one of the more popular bioinformatics tools. Researchers use
the command-line applications to perform searches locally, often against custom
databases and in bulk, possibly distributing the searches across their own computer
cluster. Several variants of BLAST exist to compare all combinations of nucleotide or
protein queries against a nucleotide or protein database. In addition to performing
alignments, BLAST reports an "expect" value (E-value), a statistical measure of the
significance of each alignment.
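The relationship between an alignment's score and its "expect" value can be sketched with the standard bit-score form of the Karlin-Altschul statistic, E = m * n * 2^(-S'), where S' is the bit score, m the query length, and n the database length. The Python sketch below assumes this simplified form; the function name `expect_value` and the lengths `m` and `n` are hypothetical, and because BLAST itself uses effective (corrected) lengths, the output is not expected to reproduce the E-values in the table above.

```python
def expect_value(bit_score, query_length, database_length):
    """Approximate E-value: E = m * n * 2**(-S'), with S' the bit score."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search-space sizes, chosen only for illustration.
m = 100          # query length in residues (assumed)
n = 1_000_000    # total database length in residues (assumed)

# Scores taken from the table in question 2.
strong = expect_value(116, m, n)
weak = expect_value(28.1, m, n)

print(f"score 116  -> E = {strong:.2e}")
print(f"score 28.1 -> E = {weak:.2e}")
```

Each additional bit of score halves the expected number of chance matches, which is why the high-scoring human hits have E-values many orders of magnitude below that of the low-scoring Drosophila hit.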
