BIO 20-1 Third Long Exam Bioinformatics Engineering: - We Can Get The Amino Acid Sequence Using Single Letter Code
I. PROTEIN IDENTIFIER
3. What information can be obtained from the FASTA format? (3 points)
Homo sapiens
The protein encoded by this gene is an inducible molecular chaperone that functions as a
homodimer. The encoded protein aids in the proper folding of specific target proteins by use
of an ATPase activity that is modulated by co-chaperones.
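A FASTA record has a one-line header (starting with ">") followed by the sequence in one-letter amino acid code, so the header gives the accession, a description and, in NCBI protein headers, the source organism in square brackets. The sequence lines give the residues and, by counting them, the length. The parser below is a minimal sketch assuming NCBI-style ">ACCESSION description [Organism]" headers; the function names and the example record (accession XX_000001.1) are hypothetical.

```python
import re


def parse_fasta(text):
    """Split FASTA text into records with accession, description, organism, sequence, length."""
    records = []
    header, seq_lines = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append(_make_record(header, seq_lines))
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    if header is not None:
        records.append(_make_record(header, seq_lines))
    return records


def _make_record(header, seq_lines):
    # Assumption: NCBI-style header "ACCESSION description [Organism]".
    accession, _, rest = header.partition(" ")
    match = re.search(r"\[([^\]]+)\]\s*$", rest)
    organism = match.group(1) if match else None
    description = rest[: match.start()].strip() if match else rest.strip()
    sequence = "".join(seq_lines).upper()  # one-letter amino acid codes
    return {
        "accession": accession,
        "description": description,
        "organism": organism,
        "sequence": sequence,
        "length": len(sequence),
    }


# Hypothetical record for illustration only (not a real accession or protein).
example = """>XX_000001.1 example chaperone protein [Homo sapiens]
ACDEFGHIKL
MNPQRSTVWY
"""
rec = parse_fasta(example)[0]
print(rec["accession"], rec["organism"], rec["length"])
```

Real headers from other databases (for example UniProt) use different layouts, so the bracket rule would need adjusting for them.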
III. QUESTIONS
1. Just by examining the graphic summary, how many proteins produced a significant match with
the query sequence? What does the color of the alignments indicate? (2 points)
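In the BLAST graphic summary, each alignment bar is colored by its alignment score according to the color key shown above the bars. A minimal sketch of that binning follows. It assumes the default NCBI key (below 40 black, 40 to 50 blue, 50 to 80 green, 80 to 200 magenta, 200 and above red). These cutoffs are recalled rather than taken from this document, so check the key on your own results page.

```python
import bisect

# Assumed default NCBI BLAST graphic-summary color key for alignment scores:
# <40 black, 40-50 blue, 50-80 green, 80-200 magenta, >=200 red.
_BOUNDS = [40, 50, 80, 200]
_COLORS = ["black", "blue", "green", "magenta", "red"]


def score_color(score):
    """Return the color bin a given alignment score falls into."""
    # bisect_right puts a score equal to a bound into the higher bin (e.g. 40 -> blue).
    return _COLORS[bisect.bisect_right(_BOUNDS, score)]


print(score_color(34.7), score_color(99.0), score_color(250))
```

A lower-scoring (darker) bar therefore means a weaker match, which is why color is a quick visual proxy for alignment quality.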
2. In table format, identify all the proteins that produced significant alignments and their source
organism, max scores, E-values, and accession numbers. (8 points)
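Building such a table can also be done from BLAST+ tabular output. The sketch below assumes rows produced with `-outfmt "6 sacc ssciname bitscore evalue"` (one row per HSP). It keeps the best HSP per subject, because that is what the descriptions table reports as the max score. The 0.05 E-value cutoff, the function name and the sample rows are assumptions for illustration, not results from this exam.

```python
import csv
import io


def summarize_hits(tsv_text, evalue_cutoff=0.05):
    """Collapse per-HSP BLAST rows into one row per subject (best HSP kept)."""
    best = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        accession, organism, bitscore, evalue = row
        bitscore, evalue = float(bitscore), float(evalue)
        if evalue > evalue_cutoff:
            continue  # not significant under the assumed cutoff
        # The highest-scoring HSP for a subject is its "max score".
        if accession not in best or bitscore > best[accession]["max_score"]:
            best[accession] = {"accession": accession, "organism": organism,
                               "max_score": bitscore, "evalue": evalue}
    return sorted(best.values(), key=lambda h: h["max_score"], reverse=True)


# Made-up rows for illustration (hypothetical accessions and scores).
rows = (
    "XX_000001.1\tHomo sapiens\t250.0\t1e-80\n"
    "XX_000002.1\tSaccharomyces cerevisiae\t34.7\t0.002\n"
    "XX_000002.1\tSaccharomyces cerevisiae\t30.0\t0.01\n"
    "XX_000003.1\tEscherichia coli\t20.1\t3.5\n"
)
for hit in summarize_hits(rows):
    print(f"{hit['accession']}\t{hit['organism']}\t{hit['max_score']}\t{hit['evalue']:.2g}")
```

Note that `ssciname` needs the BLAST taxonomy database to be available locally; without it the organism column may be empty.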
3. Identify all the human protein isoforms that significantly matched with the query sequence. (2 points)
Protein Name
heat shock 90kDa protein 1, alpha
heat shock protein HSP 90-alpha isoform 1
Heat shock protein HSP 90-alpha 2
unnamed protein product
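Picking out the human entries amounts to grouping the description titles by the organism given in trailing square brackets. The sketch below assumes NCBI-style "protein name [Organism]" titles. The two human titles reuse names from the table above with an assumed "[Homo sapiens]" tag, and the mouse entry is hypothetical.

```python
import re
from collections import defaultdict


def group_by_organism(titles):
    """Group protein names by the organism in their trailing [brackets]."""
    groups = defaultdict(list)
    for title in titles:
        match = re.match(r"^(.*?)\s*\[([^\]]+)\]\s*$", title)
        if match:
            name, organism = match.group(1), match.group(2)
        else:
            name, organism = title.strip(), "unknown"  # no organism tag found
        groups[organism].append(name)
    return dict(groups)


titles = [
    "heat shock 90kDa protein 1, alpha [Homo sapiens]",
    "unnamed protein product [Homo sapiens]",
    "hypothetical example protein [Mus musculus]",  # made-up entry
]
print(group_by_organism(titles)["Homo sapiens"])
```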
4. Which organism would be the most suitable animal model for studying human cystinosin
function? Briefly discuss your answer. (Hint: Protein homology (especially conserved domains) and
function are two important considerations for selecting a candidate animal/organism model. NCBI is
equipped with a sequence analysis platform, Analyze this sequence, that can identify conserved
domains in proteins and their identities/function). (4 points)
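One ingredient of that choice, overall sequence homology, can be compared numerically as percent identity. The sketch below computes identities over the alignment length (gap columns included, as BLAST does for its "Identities" line) and ranks candidate organisms. The organism labels and aligned fragments are hypothetical. As the hint says, conserved domains and known function must be weighed alongside this number; identity alone does not pick the model.

```python
def percent_identity(aligned_query, aligned_subject):
    """Identities / alignment length (gap columns included), as a percentage."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    identities = sum(1 for a, b in zip(aligned_query, aligned_subject)
                     if a == b and a != "-")
    return 100.0 * identities / len(aligned_query)


def rank_candidates(alignments):
    """Rank candidate organisms by percent identity to the human query."""
    scored = [(organism, percent_identity(q, s))
              for organism, (q, s) in alignments.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical aligned fragments, for illustration only.
candidates = {
    "organism A": ("MKV-LA", "MKVQLA"),
    "organism B": ("MKVLLA", "MRILSA"),
}
for organism, pid in rank_candidates(candidates):
    print(f"{organism}: {pid:.1f}% identity")
```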
5. Would baker's yeast be a suitable model for studying human cystinosin function? Briefly
discuss your answer. (4 points)
Baker's yeast can be a suitable model for studying human cystinosin function because it is
one of the simplest eukaryotic organisms, and several essential cellular processes are
conserved between yeast and humans. The yeast genome is also compact: just over 12 million
base pairs in length, with about 6,000 genes. Because these basic processes are shared,
human diseases that result from their disruption, such as defects in DNA repair, cell
division, or the control of gene expression, can often be studied in yeast.
6. In most alignments, max score would be equivalent to total score. However, in the descriptions
table, NP_009705.1 produced a total score of 99.0 and max score of 34.7. Briefly discuss the
discrepancy between the two values. (2 points)
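The two columns differ whenever a subject aligns with the query in more than one place. The max score is the bit score of the single best local alignment (HSP), while the total score is the sum over all HSPs with that subject. The HSP scores below are hypothetical, chosen only so that they reproduce the two reported values; the real split for NP_009705.1 may differ.

```python
# Hypothetical per-HSP bit scores for one subject (not the real HSPs).
hsp_scores = [34.7, 33.1, 31.2]

max_score = max(hsp_scores)               # best single local alignment
total_score = round(sum(hsp_scores), 1)   # sum over all HSPs with this subject

print(max_score, total_score)  # 34.7 99.0
```

A total score roughly three times the max score is consistent with the subject matching the query in several separate regions rather than in one continuous stretch.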
7. Identify substitution (positive and negative), insertion, and deletion mutations between the
query sequence and the most significant homologue in baker's yeast. (4 points)
BLAST provides a method for rapidly searching nucleotide and protein databases. Its
algorithm was designed to balance speed against sensitivity to distant sequence
relationships, and it is one of the most popular bioinformatics tools. Besides using the web
interface, researchers can run the command-line applications to search locally, often
against custom databases and in bulk, possibly distributing the searches across their own
computer cluster. Several BLAST variants exist to compare every combination of nucleotide
or protein query against a nucleotide or protein database. In addition to the alignments
themselves, BLAST reports an "expect" (E) value for each alignment: the number of
alignments with at least that score expected by chance in a database of that size, which
indicates how significant the alignment is.
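In BLAST's pairwise text output, the line between "Query" and "Sbjct" marks each column: the letter itself for an identity, "+" for a positive (conservative) substitution, and a space for a negative substitution or a gap. The sketch below uses that convention to list mutations. It names a gap in the query an "insertion" in the homologue and a gap in the homologue a "deletion", which is one convention among several. The function name and the example alignment are hypothetical.

```python
def classify_columns(query, midline, subject, q_start=1):
    """List (query_position, kind, query_residue, subject_residue) for non-identical columns.

    For an insertion, query_position is the query residue just before the gap.
    """
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("query, midline and subject must have equal length")
    events = []
    q_pos = q_start - 1
    for q, m, s in zip(query, midline, subject):
        if q != "-":
            q_pos += 1  # advance only on real query residues
        if q == "-":
            kind = "insertion"      # residue present only in the homologue
        elif s == "-":
            kind = "deletion"       # query residue missing in the homologue
        elif q == s:
            continue                # identity, not a mutation
        elif m == "+":
            kind = "positive substitution"
        else:
            kind = "negative substitution"
        events.append((q_pos, kind, q, s))
    return events


# Hypothetical alignment in BLAST's three-line layout.
query   = "MKV-LAPEW"
midline = "M++ LA E "
subject = "MRIQLA-EG"
for event in classify_columns(query, midline, subject):
    print(event)
```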