Biochem 218 - Biomedical Informatics 231: Doug Brutlag Professor Emeritus Biochemistry & Medicine (By Courtesy)
Doug Brutlag
Professor Emeritus
Biochemistry & Medicine (by courtesy)
Faculty, TAs and Staff
• Alway M114
– Tuesdays & Thursdays 2:15-3:30 PM
• Course Web Site
– http://biochem218.stanford.edu/
• Stanford Center for Professional Development
– http://scpd.stanford.edu/
• Videos available 24 hours/day, 7 days/week
• Course offered Autumn, Winter and Spring quarters
Course Requirements
• Lectures
– Theoretical background of current methods
– Strengths and weaknesses of current approaches
– Future directions for improvements
• Demonstrations
– Applications (Mac, PC, Unix, Web)
– Web applications
– Illustrate homework
• All homework and questions must be submitted by email to [email protected]
• Several homework assignments (35%)
– Due one week after assigned
• Final project (Due March 12th)
– A critical or comparative review of computational approaches to any problem in computational molecular biology
– Propose a new approach
– Implement a new approach
– Examples of previous projects for the class can be found at http://biochem218.stanford.edu/Projects.html
• David Mount, Bioinformatics: Sequence and Genome Analysis, 2nd Edition
• Jin Xiong, Essential Bioinformatics
• Richard Durbin et al., Biological Sequence Analysis
• Jones & Pevzner, An Introduction to Bioinformatics Algorithms
• Dan Gusfield, Algorithms on Strings, Trees & Sequences
• Baldi & Brunak, Bioinformatics: The Machine Learning Approach
• Higgins & Taylor, Bioinformatics: Sequence, Structure & Databanks
• NCBI Handbook
– http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=handbook
• EMBL-EBI Home Page
– http://www.ebi.ac.uk/
• Berg, Tymoczko & Stryer, Biochemistry, Fifth Edition
• Benjamin Lewin, Genes IX
Genomics, Bioinformatics & Computational Biology
[Figure: Genomics, Bioinformatics, Computational Biology, Systems Biology, Structural Genomics and Proteomics, with the related fields Robotics, Machine Learning, Databases, Statistics & Probability, Artificial Intelligence, Information Theory, Algorithms and Graph Theory]
What is Bioinformatics?
[Figure: Biological Information, linking DNA, RNA, Protein and Phenotype in Individuals, and Evolution and Selection in Populations]
Computational Goals of Bioinformatics
• Learn & Generalize: Discover conserved patterns (models) of
sequences, structures, interactions, metabolism & chemistries from
well-studied examples.
Central Paradigm of Bioinformatics
[Figure: DNA → RNA → Protein → Phenotype (Symptoms), after Molecular Biology of the Gene (1965); Sequences of Common Structure or Function]
Sequence Similarity
10 20 30 40 50
Query VLSPADKTNVKAAWGKVGAHAGEVGAEALERMFLSFPTTKTYFPHF------DLSHGS
|:| :|: | |:|||| | |:||| |: : :|:| :| | |: |
Match HLTPEEKSAVTALWGKV--NVDEYGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGN
10 20 30 40 50
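The alignment above pairs a query with a database match and marks identical residues with "|". As a hedged illustration of how a pairwise alignment can be computed, here is a minimal global (Needleman-Wunsch) alignment with a linear gap penalty. The scores (match +1, mismatch -1, gap -2) are arbitrary assumptions; practical tools score residue pairs with a substitution matrix and use affine gap penalties, which this sketch does not model, and it reports only identity, not the ":" similarity marks.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Global alignment with illustrative scores and a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,   # gap in b
                              score[i][j - 1] + gap)   # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top: List[str] = []
    bottom: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(top: str, bottom: str) -> float:
    """Identical, non-gap columns as a percentage of all alignment columns."""
    same = sum(1 for x, y in zip(top, bottom) if x == y and x != "-")
    return 100.0 * same / len(top)


if __name__ == "__main__":
    s, top, bottom = needleman_wunsch("ACGT", "AGT")
    print(s, top, bottom, percent_identity(top, bottom))  # 1 ACGT A-GT 75.0
```

Filling the whole score table makes this O(nm) in time and memory, which is why database searches rely on heuristics such as BLAST rather than full dynamic programming against every entry.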
A Typical Motif:
Zinc Finger DNA Binding Motif
C..C............H....H
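In the motif above, each "." stands for any amino acid, so C..C............H....H is one instance of the more general C2H2 pattern C x{2,4} C x{12} H x{3,5} H. A minimal sketch of scanning a protein for that pattern with a Python regular expression follows; the test sequence is made up for illustration. `re.finditer` reports only non-overlapping hits, so overlapping candidates would need a lookahead-based pattern.

```python
import re
from typing import List, Tuple

# C2H2 zinc finger: C, 2-4 residues, C, 12 residues, H, 3-5 residues, H.
# '.' matches any single residue, playing the role of the motif's 'x'.
C2H2 = re.compile(r"C.{2,4}C.{12}H.{3,5}H")


def find_c2h2(protein: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched segment) for each non-overlapping hit."""
    return [(m.start(), m.group()) for m in C2H2.finditer(protein.upper())]


if __name__ == "__main__":
    # Made-up sequence containing one C2H2-like segment.
    print(find_c2h2("MKCAACGKSFSQSSNLQKHQRTHTG"))
    # [(2, 'CAACGKSFSQSSNLQKHQRTH')]
```

A pattern match says only that the residues fit the consensus; it cannot weight some substitutions as more likely than others, which is what the weight matrices and profiles below add.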
Inferring Biological Function from Protein Sequence
• Consensus Sequences or Sequence Motifs
– Zinc Finger (C2H2 type): C x{2,4} C x{12} H x{3,5} H
• Weight Matrices or Position-Specific Scoring Matrices
• Profiles, PSI-BLAST
• Hidden Markov Models
[Figure: hidden Markov model built from Sequences of Common Structure or Function, with states AA1 to AA6, I1 to I5 and D2 to D5]

Example weight matrix (rows: amino acids; columns: motif positions 1 to 12)
    1   2   3   4   5   6   7   8   9  10  11  12
A   2   1   3  13  10  12  67   4  13   9   1   2
R   7   5   8   9   4   0   1  16   7   0   1   0
N   0   8   0   1   0   0   0   2   1   1  10   0
D   0   1   0   1  13   0   0  12   1   0   4   0
C   0   0   1   0   0   0   0   0   0   2   2   1
Q   1   1  21   8  10   0   0   7   6   0   0   2
E   2   0   0   9  21   0   0  15   7   3   3   0
G   9   7   1   4   0   0   8   0   0   0  46   0
H   4   3   1   1   2   0   0   2   2   0   5   0
I  10   0  11   1   2  10   0   4   9   3   0  16
L  16   1  17   0   1  31   0   3  11  24   0  14
K   3   4   5  10  11   1   1  13  10   0   5   2
M   7   1   1   0   0   0   0   0   5   7   1   8
F   4   0   3   0   0   4   0   0   0  10   0   0
P   0   6   0   1   0   0   0   0   0   0   0   0
S   1  17   0   8   3   1   3   0   2   2   2   0
T   5  22   3  11   1   5   0   2   2   2   0   5
W   2   0   0   0   0   0   0   0   0   1   0   1
Y   1   0   4   2   0   1   0   0   2   4   0   1
V   6   3   1   1   2  15   0   0   2  12   0  28
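A weight matrix becomes a scoring matrix once its values are turned into log-odds. The sketch below treats the example matrix values as per-position residue counts, adds a pseudocount of 1, and compares against a uniform background of 1/20. All three choices are assumptions for illustration, not the course's procedure. It then slides the matrix along a sequence and reports the best-scoring window. Only the 20 standard residue letters are handled, and the sequence must be at least 12 residues long.

```python
import math
from typing import Dict, List, Tuple

# Values from the example weight matrix (rows: amino acids, columns: positions 1-12).
COUNTS: Dict[str, List[int]] = {
    "A": [2, 1, 3, 13, 10, 12, 67, 4, 13, 9, 1, 2],
    "R": [7, 5, 8, 9, 4, 0, 1, 16, 7, 0, 1, 0],
    "N": [0, 8, 0, 1, 0, 0, 0, 2, 1, 1, 10, 0],
    "D": [0, 1, 0, 1, 13, 0, 0, 12, 1, 0, 4, 0],
    "C": [0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 2, 1],
    "Q": [1, 1, 21, 8, 10, 0, 0, 7, 6, 0, 0, 2],
    "E": [2, 0, 0, 9, 21, 0, 0, 15, 7, 3, 3, 0],
    "G": [9, 7, 1, 4, 0, 0, 8, 0, 0, 0, 46, 0],
    "H": [4, 3, 1, 1, 2, 0, 0, 2, 2, 0, 5, 0],
    "I": [10, 0, 11, 1, 2, 10, 0, 4, 9, 3, 0, 16],
    "L": [16, 1, 17, 0, 1, 31, 0, 3, 11, 24, 0, 14],
    "K": [3, 4, 5, 10, 11, 1, 1, 13, 10, 0, 5, 2],
    "M": [7, 1, 1, 0, 0, 0, 0, 0, 5, 7, 1, 8],
    "F": [4, 0, 3, 0, 0, 4, 0, 0, 0, 10, 0, 0],
    "P": [0, 6, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "S": [1, 17, 0, 8, 3, 1, 3, 0, 2, 2, 2, 0],
    "T": [5, 22, 3, 11, 1, 5, 0, 2, 2, 2, 0, 5],
    "W": [2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
    "Y": [1, 0, 4, 2, 0, 1, 0, 0, 2, 4, 0, 1],
    "V": [6, 3, 1, 1, 2, 15, 0, 0, 2, 12, 0, 28],
}


def log_odds(counts: Dict[str, List[int]], pseudocount: float = 1.0) -> Dict[str, List[float]]:
    """log2(observed frequency / background), assuming a uniform background."""
    width = len(next(iter(counts.values())))
    background = 1.0 / len(counts)
    pssm = {aa: [0.0] * width for aa in counts}
    for pos in range(width):
        total = sum(row[pos] for row in counts.values()) + pseudocount * len(counts)
        for aa, row in counts.items():
            pssm[aa][pos] = math.log2(((row[pos] + pseudocount) / total) / background)
    return pssm


def score(pssm: Dict[str, List[float]], window: str) -> float:
    """Sum the per-position log-odds for one window of the motif's width."""
    return sum(pssm[aa][i] for i, aa in enumerate(window))


def best_window(pssm: Dict[str, List[float]], seq: str) -> Tuple[float, int]:
    """Highest (score, start) over all windows of the motif's width."""
    width = len(next(iter(pssm.values())))
    return max((score(pssm, seq[i:i + width]), i) for i in range(len(seq) - width + 1))


if __name__ == "__main__":
    pssm = log_odds(COUNTS)
    consensus = "".join(max(COUNTS, key=lambda aa: COUNTS[aa][i]) for i in range(12))
    print(consensus)  # LTQAELARALGV
    print(best_window(pssm, "GG" + consensus + "GG")[1])  # 2
```

Unlike the regular expression for the zinc finger, every residue contributes a graded score here, so a window can tolerate an unusual residue at one position if the rest fit well.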
Buried Treasure
Clustal Globin Alignment
Consensus Sequence From a Multiple Sequence Alignment
ClustalW Insulin Alignments
                      10                  20                  30
IPGP                                                   F V S R H
IPDK                                                   A A N Q H
IPDG   M A L W M R L L P L L A L L A L W A P A P T R A F V N Q H
IPCH   M A L W I R S L P L L A L L V F S G P G - T S Y A A N Q H
IPCA   M A V W I Q A G A L L F L L A V S S V N A N A G A P - Q H
IPBO                                                   F V N Q H
IPAF M A A L W L Q S F S L L V L L V V S W P G S Q A V A P A Q H
         A . W . .       L L   L L                     A   N Q H
                      40                  50                  60
IPGP L C G S N L V E T L Y S V C Q D D G F F Y I P K D X X E L E
IPDK L C G S H L V E A L Y L V C G E R G F F Y S P K T X X D V E
IPDG L C G S H L V E A L Y L V C G E R G F F Y T P K A R R E V E
IPCH L C G S H L V E A L Y L V C G E R G F F Y S P K A R R D V E
IPCA L C G S H L V D A L Y L V C G P T G F F Y N P K R D V D P P
IPBO L C G S H L V E A L Y L V C G E R G F F Y T P K A R R E V E
IPAF L C G S H L V D A L Y L V C G D R G F F Y N P K R D V D Q L
     L C G S H L V E A L Y L V C G E R G F F Y . P K .     D V E
                      70                  80                  90
IPGP D P Q V E Q T E L G M G - - - - - L G A G G L Q P - - L Q G
IPDK Q P - L V N G P L H G E - - - - - V G E L P F Q - - - - H E
IPDG D L Q V R D V E L A G A - - - - - P G E G G L Q P L A L E G
IPCH Q P - L V S S P L R G E - - - - - A G V L P F Q - - - - Q E
IPCA L G F L P P K S - - - - - - A Q E T E V A D F A F K D H A E
IPBO G P Q V G A L E L A G G - - - - - P G A G G L E - - - - - G
IPAF L G F L P P K S G G A A A A G A D N E V A E F A F K D Q M E
P L L G G F Q E
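A consensus line summarizes each column of a multiple alignment. The slide does not state the exact rule used. As an assumed, simple majority rule, the sketch below reports the most common residue when it fills at least half the rows and "." otherwise, never reporting a gap. It is applied to columns 31 to 60 of the insulin alignment, with the rows copied from the slide. Where the slide leaves a column blank, this rule prints "." instead.

```python
from collections import Counter
from typing import List

# Columns 31-60 of the insulin alignment (rows IPGP, IPDK, IPDG, IPCH, IPCA, IPBO, IPAF).
BLOCK2 = [
    "LCGSNLVETLYSVCQDDGFFYIPKDXXELE",
    "LCGSHLVEALYLVCGERGFFYSPKTXXDVE",
    "LCGSHLVEALYLVCGERGFFYTPKARREVE",
    "LCGSHLVEALYLVCGERGFFYSPKARRDVE",
    "LCGSHLVDALYLVCGPTGFFYNPKRDVDPP",
    "LCGSHLVEALYLVCGERGFFYTPKARREVE",
    "LCGSHLVDALYLVCGDRGFFYNPKRDVDQL",
]


def consensus(rows: List[str], min_fraction: float = 0.5) -> str:
    """Most common residue per column if it fills >= min_fraction of rows, else '.'.

    Gaps ('-') are never reported as consensus residues."""
    out = []
    for column in zip(*rows):
        residue, n = Counter(column).most_common(1)[0]
        out.append(residue if residue != "-" and n / len(rows) >= min_fraction else ".")
    return "".join(out)


if __name__ == "__main__":
    print(consensus(BLOCK2))  # LCGSHLVEALYLVCGERGFFY.PK...DVE
```

Raising `min_fraction` toward 1.0 keeps only the fully conserved columns, such as the cysteines that form insulin's disulfide bonds.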
Clustering Gene Expression Profiles: Comparison of Methods
[Figure: clustered expression profiles labeled T Cells Signaling, DNA Damage, Fibroblast Stimulation, B Cells Signaling, CMV Infection, Anoxia, Polio Infection, Monocytes Signaling IL4 and Hormone]
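The slide compares clustering methods without detailing them. As one hedged example, the sketch below uses agglomerative hierarchical clustering with average linkage and a 1 - Pearson correlation distance, a common choice for expression profiles because it groups genes that rise and fall together regardless of absolute level. This is not necessarily one of the methods compared on the slide, and the gene names and values are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError for a constant profile."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_linkage(profiles: Dict[str, Sequence[float]], k: int) -> List[List[str]]:
    """Merge the closest clusters (mean 1 - r over all member pairs) until k remain."""
    clusters = [[name] for name in profiles]

    def distance(c1: List[str], c2: List[str]) -> float:
        d = [1.0 - pearson(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(d) / len(d)

    while len(clusters) > k:
        # Find the closest pair of clusters and merge them.
        _, i, j = min((distance(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


if __name__ == "__main__":
    # Hypothetical expression profiles over four conditions.
    genes = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8.5],
             "g3": [4, 3, 2, 1], "g4": [8, 6, 4, 2.2]}
    print(average_linkage(genes, 2))  # [['g1', 'g2'], ['g3', 'g4']]
```

Swapping the distance (Euclidean vs. correlation) or the linkage (single, complete, average) can change the clusters considerably, which is one reason methods are compared.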
GATGGCTGCACCACGTGTATGC...ACGATGTCTCGC
CACATCGCATCACGTGACCAGT...GACATGGACGGC
GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
TCTCGTTAGGACCATCACGTGA...ACAATGAGAGCG
CGCTAGCCCACGTGGATCTTGT...AGAATGGCCTAT
Finding Transcription Factor Binding Sites
ATGGCTGCACCACGTTTATGC...ACGATGTCTCGC
CACATCGCATCACGTGACCAGT...GACATGGACGGC
GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
TTAGGACCATCACGTGA...ACAATGAGAGCG
CGCTAGCCCACGTTGATCTTGT...AGAATGGCCTAT
Pho4 binding
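The two sets of upstream sequences above share the word CACGTG, exactly in the first set and with single-base variants (CACGTT) in some sequences of the second. A deliberately naive sketch of finding such a shared word follows. It takes every k-mer seen in the data as a candidate and keeps those occurring in every sequence with at most `d` mismatches. Sequence text is copied from the slide, and "..." is treated as a break so no k-mer spans the elided region. Only the given strand is scanned. Practical motif finders use statistical models (for example, expectation maximization or Gibbs sampling) rather than this exhaustive check.

```python
from typing import List, Set

SET1 = [
    "GATGGCTGCACCACGTGTATGC...ACGATGTCTCGC",
    "CACATCGCATCACGTGACCAGT...GACATGGACGGC",
    "GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA",
    "TCTCGTTAGGACCATCACGTGA...ACAATGAGAGCG",
    "CGCTAGCCCACGTGGATCTTGT...AGAATGGCCTAT",
]
SET2 = [
    "ATGGCTGCACCACGTTTATGC...ACGATGTCTCGC",
    "CACATCGCATCACGTGACCAGT...GACATGGACGGC",
    "GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA",
    "TTAGGACCATCACGTGA...ACAATGAGAGCG",
    "CGCTAGCCCACGTTGATCTTGT...AGAATGGCCTAT",
]


def hamming(a: str, b: str) -> int:
    return sum(1 for x, y in zip(a, b) if x != y)


def pieces(seq: str) -> List[str]:
    """Split on '...' so k-mers never span the elided part of a sequence."""
    return [p for p in seq.split("...") if p]


def occurs(kmer: str, seq: str, d: int) -> bool:
    """True if some window of seq is within d mismatches of kmer."""
    k = len(kmer)
    return any(hamming(kmer, p[i:i + k]) <= d
               for p in pieces(seq) for i in range(len(p) - k + 1))


def shared_motifs(seqs: List[str], k: int, d: int = 0) -> Set[str]:
    """Observed k-mers that occur in every sequence with at most d mismatches."""
    candidates = {p[i:i + k] for s in seqs for p in pieces(s)
                  for i in range(len(p) - k + 1)}
    return {c for c in candidates if all(occurs(c, s, d) for s in seqs)}


if __name__ == "__main__":
    print("CACGTG" in shared_motifs(SET1, 6),
          "CACGTG" in shared_motifs(SET2, 6),
          "CACGTG" in shared_motifs(SET2, 6, d=1))  # True False True
```

Allowing mismatches recovers the site in the second set but also admits more chance matches, a trade-off that statistical motif models handle more gracefully.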
Metabolic Networks: BioCyc
http://biocyc.org/
C. crescentus Cell Cycle Gene Expression
Genome-Wide Associations in Rheumatoid Arthritis
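A genome-wide association study tests many markers for allele-frequency differences between cases and controls. As a minimal sketch of the per-marker test, the code below computes a Pearson chi-square on a 2x2 table of allele counts (1 degree of freedom, no continuity correction), its p-value and the allelic odds ratio. The counts are hypothetical, not data from the rheumatoid arthritis study on the slide. Real analyses also correct for population structure and for the number of markers tested; genome-wide significance is conventionally taken as p < 5e-8.

```python
import math
from typing import Tuple


def allelic_test(case_risk: int, case_other: int,
                 control_risk: int, control_other: int) -> Tuple[float, float, float]:
    """Return (chi_square, p_value, odds_ratio) for a 2x2 table of allele counts.

    Assumes all four counts are positive (otherwise the odds ratio is undefined)."""
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p, odds_ratio


if __name__ == "__main__":
    # Hypothetical allele counts for one SNP: cases 30/70, controls 10/90.
    chi2, p, odds = allelic_test(30, 70, 10, 90)
    print(round(chi2, 3), f"{p:.2e}", round(odds, 2))
```

With these made-up counts the association looks strong on its own (chi-square 12.5), yet it would fall far short of genome-wide significance once hundreds of thousands of markers are tested.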
• Novel Diagnostics
– Microchips & Microarrays (DNA)
– Gene Expression (RNA)
– Proteomics (Protein)
• Novel Therapeutics
– Drug Target Discovery
– Rational Drug Design
– Molecular Docking
– Gene Therapy
– Stem Cell Therapy
• Understanding Metabolism
• Understanding Disease
– Inherited Diseases (OMIM)
– Infectious Diseases
– Pathogenic Bacteria
– Viruses