
Biochem 218 - Biomedical Informatics 231: Doug Brutlag Professor Emeritus Biochemistry & Medicine (By Courtesy)


Computational Molecular Biology

Biochem 218 – BioMedical Informatics 231


http://biochem218.stanford.edu/

Genomics and Bioinformatics

Doug Brutlag
Professor Emeritus
Biochemistry & Medicine (by courtesy)
Faculty, TAs and Staff
Doug Brutlag, Lee Kozar, Maeve O’Huallachain, Dan Davison


Course and Video Availability

• Alway M114
– Tuesdays & Thursdays 2:15-3:30 PM
• Course Web Site
– http://biochem218.stanford.edu/
• Stanford Center for Professional Development
– http://scpd.stanford.edu/
• Videos available 24 hours/day, 7 days/week
• Course offered Autumn, Winter and Spring quarters
Course Requirements
• Lectures
– Theoretical background of current methods
– Strengths and weaknesses of current approaches
– Future directions for improvements
• Demonstrations
– Applications (Mac, PC, Unix, Web)
– Web applications
– Illustrate homework
• All homework and questions must be submitted by
email to [email protected]
• Several homework assignments (35%)
– Due one week after assigned
• Final project (Due March 12th)
– A critical or comparative review of computational approaches to any problem in computational molecular biology
– Propose a new approach
– Implement a new approach
– Examples of previous projects for the class can be found at http://biochem218.stanford.edu/Projects.html
David Mount, Bioinformatics: Sequence and Genome Analysis, 2nd Edition
Jin Xiong, Essential Bioinformatics
Richard Durbin et al., Biological Sequence Analysis
Jones & Pevzner, Bioinformatics Algorithms
Dan Gusfield, Algorithms on Strings, Trees & Sequences
Baldi & Brunak, Bioinformatics: The Machine Learning Approach
Higgins & Taylor, Bioinformatics: Sequence, Structure & Databanks
NCBI Handbook: http://www.ncbi.nlm.nih.gov/bookshelf/br.fcgi?book=handbook
EMBL-EBI Home Page: http://www.ebi.ac.uk/
Berg, Tymoczko & Stryer, Biochemistry, Fifth Edition
Benjamin Lewin, Genes IX
Genomics, Bioinformatics & Computational Biology
Genomics, Bioinformatics, Systems Biology, Structural Genomics, Proteomics
Computational Molecular Biology
Computational Biology
Robotics, Machine Learning, Databases, Statistics & Probability, Artificial Intelligence, Information Theory, Algorithms, Graph Theory
What is Bioinformatics?
[Figure: DNA, RNA, Protein, Phenotype, Individuals, Populations, Evolution, Selection]
Biological Information
Computational Goals of Bioinformatics
• Learn & Generalize: Discover conserved patterns (models) of sequences, structures, interactions, metabolism & chemistries from well-studied examples.
• Prediction: Infer the function or structure of newly sequenced genes, genomes, proteins or proteomes from these generalizations.
• Organize & Integrate: Develop a systematic and genomic approach to molecular interactions, metabolism, cell signaling, gene expression…
• Simulate: Model gene expression, gene regulation, protein folding, protein-protein interaction, protein-ligand binding, catalytic function, metabolism…
• Engineer: Construct novel organisms, novel functions or novel regulation of genes and proteins.
• Gene Therapy: Target specific genes or mutations (for example with RNAi) to change a disease phenotype.
Central Paradigm of Molecular Biology
DNA → RNA → Protein → Phenotype (Symptoms)
Molecular Biology of the Gene (1965)
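The DNA → RNA → Protein steps of the central paradigm can be made concrete in a few lines of code. The following is a minimal Python sketch. It assumes the standard genetic code and assumes the input is the coding (sense) strand, read in frame from its first base. `transcribe` and `translate` are illustrative helpers, not a library API.

```python
# Standard genetic code: codons enumerated with each position in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def transcribe(coding_strand: str) -> str:
    """mRNA matches the coding (sense) DNA strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Translate from the first base in frame, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3].replace("U", "T")  # table is keyed by DNA codons
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate(transcribe("ATGGTGCATCTGACTCCTGAGTAA"))` returns `"MVHLTPE"`. These are the first seven residues of the globin sequence on the next slide. Real genes also need an open reading frame to be located, and eukaryotic genes need introns spliced out; this sketch skips both steps.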
Central Paradigm of Bioinformatics
Genetic Information → Molecular Structure → Biochemical Function → Phenotype (Symptoms)
MVHLTPEEKT AVNALWGKVN VDAVGGEALG RLLVVYPWTQ RFFESFGDLS
SPDAVMGNPK VKAHGKKVLG AFSDGLAHLD NLKGTFSQLS ELHCDKLHVD
PENFRLLGNV LVCVLARNFG KEFTPQMQAA YQKVVAGVAN ALAHKYH
Challenges in Understanding Genetic Information
Genetic Information → Molecular Structure → Biochemical Function → Phenotype
• Genetic information is redundant
• Structural information is redundant
• Genes and proteins are meta-stable
• Single genes have multiple functions
• Genes are one-dimensional, but function depends on three-dimensional structure
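The first challenge, redundancy of genetic information, can be counted directly from the standard genetic code: 61 sense codons encode only 20 amino acids. The following is a minimal Python sketch. The function names are illustrative, and the table is the standard nuclear code, so it does not apply to mitochondrial or other variant codes.

```python
from collections import defaultdict

# Standard genetic code: codons enumerated with each position in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def codons_by_amino_acid():
    """Group the 64 codons by the amino acid (or stop, '*') they encode."""
    codons = (b1 + b2 + b3 for b1 in BASES for b2 in BASES for b3 in BASES)
    groups = defaultdict(list)
    for codon, amino_acid in zip(codons, AMINO_ACIDS):
        groups[amino_acid].append(codon)
    return dict(groups)

def degeneracy():
    """Synonymous codons per amino acid, most redundant first (stops excluded)."""
    groups = codons_by_amino_acid()
    return sorted(((aa, len(codons)) for aa, codons in groups.items() if aa != "*"),
                  key=lambda item: (-item[1], item[0]))
```

In this table, leucine, arginine and serine each have six codons, while methionine (ATG) and tryptophan (TGG) have one each. This is why a protein sequence cannot be uniquely back-translated into DNA.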
Using a Controlled Vocabulary for Literature Search
http://www.ncbi.nlm.nih.gov/sites/entrez?db=mesh
Gene Ontology Database
http://www.geneontology.org/
UCSC Genome Browser
http://genome.ucsc.edu/
ExPASy Proteomics Server
http://www.expasy.ch/doc.html
Inferring Biological Function from Protein Sequence
Consensus Sequences or Sequence Motifs
Zinc Finger (C2H2 type): C x {2,4} C x {12} H x {3,5} H
Sequences of Common Structure or Function
Sequence Similarity
10 20 30 40 50
Query VLSPADKTNVKAAWGKVGAHAGEVGAEALERMFLSFPTTKTYFPHF------DLSHGS
|:| :|: | |:|||| | |:||| |: : :|:| :| | |: |
Match HLTPEEKSAVTALWGKV--NVDEYGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGN
10 20 30 40 50
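One simple measure of the similarity shown in this alignment is percent identity. The following is a minimal Python sketch. It assumes identity is counted over columns where neither sequence has a gap. Other tools divide by the alignment length or by the shorter sequence instead, so reported values differ between programs. The function name is illustrative.

```python
def pairwise_identity(query: str, match: str, gap: str = "-"):
    """Return (identical columns, ungapped columns, percent identity)."""
    if len(query) != len(match):
        raise ValueError("aligned sequences must have equal length")
    # Only columns where both sequences have a residue are compared.
    aligned = [(q, m) for q, m in zip(query, match) if q != gap and m != gap]
    identities = sum(q == m for q, m in aligned)
    return identities, len(aligned), 100.0 * identities / len(aligned)

QUERY = "VLSPADKTNVKAAWGKVGAHAGEVGAEALERMFLSFPTTKTYFPHF------DLSHGS"
MATCH = "HLTPEEKSAVTALWGKV--NVDEYGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGN"
```

For the two globin fragments above, `pairwise_identity(QUERY, MATCH)` gives `(21, 50, 42.0)`. The 21 identities agree with the 21 `|` marks in the alignment's match line.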
A Typical Motif:
Zinc Finger DNA Binding Motif

C..C............H....H
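A motif written in this slide's notation can be searched with a regular expression. PROSITE writes the same idea as `C-x(2,4)-C-...`. The following is a minimal Python sketch using the standard `re` module. The converter handles only the tokens that appear on the slide (single residues, `x`, and `{m}` or `{m,n}` repeat counts). The function names and the test string are hypothetical, and the test string is not a real protein.

```python
import re

def motif_to_regex(motif: str) -> str:
    """Convert 'C x {2,4} C x {12} H x {3,5} H' notation into a Python regex."""
    parts = []
    for token in motif.split():
        if token == "x":
            parts.append(".")                 # any residue
        elif token.startswith("{") and token.endswith("}"):
            parts.append(token)               # repeat count, already regex syntax
        else:
            parts.append(re.escape(token))    # a fixed residue such as C or H
    return "".join(parts)

def find_motif(motif: str, protein: str):
    """Return (start, matched text) for each non-overlapping match."""
    pattern = re.compile(motif_to_regex(motif))
    return [(m.start(), m.group()) for m in pattern.finditer(protein)]

ZINC_FINGER = "C x {2,4} C x {12} H x {3,5} H"
```

`motif_to_regex(ZINC_FINGER)` yields `C.{2,4}C.{12}H.{3,5}H`. A pattern like this is a yes/no test: it cannot express that some residues are more typical than others at a position. Weight matrices and HMMs address that limitation.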
Inferring Biological Function from Protein Sequence
Consensus Sequences or Sequence Motifs
Zinc Finger (C2H2 type): C x {2,4} C x {12} H x {3,5} H
Weight Matrices or Position-Specific Scoring Matrices (residue counts at positions 1-12; each column sums to 80)
     1   2   3   4   5   6   7   8   9  10  11  12
A    2   1   3  13  10  12  67   4  13   9   1   2
R    7   5   8   9   4   0   1  16   7   0   1   0
N    0   8   0   1   0   0   0   2   1   1  10   0
D    0   1   0   1  13   0   0  12   1   0   4   0
C    0   0   1   0   0   0   0   0   0   2   2   1
Q    1   1  21   8  10   0   0   7   6   0   0   2
E    2   0   0   9  21   0   0  15   7   3   3   0
G    9   7   1   4   0   0   8   0   0   0  46   0
H    4   3   1   1   2   0   0   2   2   0   5   0
I   10   0  11   1   2  10   0   4   9   3   0  16
L   16   1  17   0   1  31   0   3  11  24   0  14
K    3   4   5  10  11   1   1  13  10   0   5   2
M    7   1   1   0   0   0   0   0   5   7   1   8
F    4   0   3   0   0   4   0   0   0  10   0   0
P    0   6   0   1   0   0   0   0   0   0   0   0
S    1  17   0   8   3   1   3   0   2   2   2   0
T    5  22   3  11   1   5   0   2   2   2   0   5
W    2   0   0   0   0   0   0   0   0   1   0   1
Y    1   0   4   2   0   1   0   0   2   4   0   1
V    6   3   1   1   2  15   0   0   2  12   0  28
Profiles, PSI-BLAST
Hidden Markov Models
[Figure: hidden Markov model with states AA1-AA6, I1-I5 and D2-D5]
Sequences of Common Structure or Function
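A position-specific scoring matrix turns residue counts like those above into scores that can be summed along a candidate sequence. The following is a minimal Python sketch. It assumes a pseudocount of 1 per residue and a uniform background frequency of 1/20; neither comes from the slide. The counts are transcribed from the slide's matrix. `log_odds`, `pssm_consensus` and `best_window` are hypothetical helper names.

```python
from math import log2

# Residue counts at positions 1-12, transcribed from the slide's matrix.
COUNTS = {
    "A": [2, 1, 3, 13, 10, 12, 67, 4, 13, 9, 1, 2],
    "R": [7, 5, 8, 9, 4, 0, 1, 16, 7, 0, 1, 0],
    "N": [0, 8, 0, 1, 0, 0, 0, 2, 1, 1, 10, 0],
    "D": [0, 1, 0, 1, 13, 0, 0, 12, 1, 0, 4, 0],
    "C": [0, 0, 1, 0, 0, 0, 0, 0, 0, 2, 2, 1],
    "Q": [1, 1, 21, 8, 10, 0, 0, 7, 6, 0, 0, 2],
    "E": [2, 0, 0, 9, 21, 0, 0, 15, 7, 3, 3, 0],
    "G": [9, 7, 1, 4, 0, 0, 8, 0, 0, 0, 46, 0],
    "H": [4, 3, 1, 1, 2, 0, 0, 2, 2, 0, 5, 0],
    "I": [10, 0, 11, 1, 2, 10, 0, 4, 9, 3, 0, 16],
    "L": [16, 1, 17, 0, 1, 31, 0, 3, 11, 24, 0, 14],
    "K": [3, 4, 5, 10, 11, 1, 1, 13, 10, 0, 5, 2],
    "M": [7, 1, 1, 0, 0, 0, 0, 0, 5, 7, 1, 8],
    "F": [4, 0, 3, 0, 0, 4, 0, 0, 0, 10, 0, 0],
    "P": [0, 6, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
    "S": [1, 17, 0, 8, 3, 1, 3, 0, 2, 2, 2, 0],
    "T": [5, 22, 3, 11, 1, 5, 0, 2, 2, 2, 0, 5],
    "W": [2, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
    "Y": [1, 0, 4, 2, 0, 1, 0, 0, 2, 4, 0, 1],
    "V": [6, 3, 1, 1, 2, 15, 0, 0, 2, 12, 0, 28],
}

def log_odds(counts, pseudocount=1.0):
    """Convert counts to log2-odds scores against an assumed uniform background."""
    residues = list(counts)
    width = len(counts[residues[0]])
    background = 1.0 / len(residues)
    scores = {a: [] for a in residues}
    for j in range(width):
        # Pseudocounts give residues never observed at a position a finite score.
        total = sum(counts[a][j] for a in residues) + pseudocount * len(residues)
        for a in residues:
            freq = (counts[a][j] + pseudocount) / total
            scores[a].append(log2(freq / background))
    return scores

def pssm_consensus(scores):
    """Highest-scoring residue at each position."""
    width = len(next(iter(scores.values())))
    return "".join(max(scores, key=lambda a: scores[a][j]) for j in range(width))

def best_window(scores, sequence):
    """Slide the matrix along a sequence; return (offset, score) of the best window."""
    width = len(next(iter(scores.values())))
    hits = ((i, sum(scores[a][j] for j, a in enumerate(sequence[i:i + width])))
            for i in range(len(sequence) - width + 1))
    return max(hits, key=lambda hit: hit[1])
```

Every column holds 80 sequences, so with these assumptions each score reduces to log2((count + 1) / 5). The highest-scoring residue at each position spells `LTQAELARALGV`. A profile HMM extends this idea with insert and delete states, so that gaps are scored position by position instead of being disallowed.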
Buried Treasure
Clustal Globin Alignment
Consensus Sequence From a
Multiple Sequence Alignment
ClustalW Insulin Alignments

10 20 30
IPGP F V S R H
IPDK A A N Q H
IPDG M A L WM R L L P L L A L L A L W A P A P T R A F V N Q H
IPCH M A L W I R S L P L L A L L V F S G P G - T S Y A A N Q H
IPCA M A V W I Q A G A L L F L L A V S S V N A N A G A P - Q H
IPBO F V N Q H
IPAF M A A L W L Q S F S L L V L L V V S W P G S Q A V A P A Q H
A . W . . L L L L A N Q H

40 50 60
IPGP L C G S N L V E T L Y S V C Q D D G F F Y I P K D X X E L E
IPDK L C G S H L V E A L Y L V C G E R G F F Y S P K T X X D V E
IPDG L C G S H L V E A L Y L V C G E R G F F Y T P K A R R E V E
IPCH L C G S H L V E A L Y L V C G E R G F F Y S P K A R R D V E
IPCA L C G S H L V D A L Y L V C G P T G F F Y N P K R D V D P P
IPBO L C G S H L V E A L Y L V C G E R G F F Y T P K A R R E V E
IPAF L C G S H L V D A L Y L V C G D R G F F Y N P K R D V D Q L
L C G S H L V E A L Y L V C G E R G F F Y . P K . D V E

70 80 90
IPGP D P Q V E Q T E L G M G - - - - - L G A G G L Q P - - L Q G
IPDK Q P - L V N G P L H G E - - - - - V G E L P F Q - - - - H E
IPDG D L Q V R D V E L A G A - - - - - P G E G G L Q P L A L E G
IPCH Q P - L V S S P L R G E - - - - - A G V L P F Q - - - - Q E
IPCA L G F L P P K S - - - - - - A Q E T E V A D F A F K D H A E
IPBO G P Q V G A L E L A G G - - - - - P G A G G L E - - - - - G
IPAF L G F L P P K S G G A A A A G A D N E V A E F A F K D Q M E
P L L G G F Q E

100 110 120


IPGP A L Q X X - - G I V D Q C C T G T C T R H Q L Q S Y C N
IPDK E Y Q X X - - G I V E Q C C E N P C S L Y Q L E N Y C N
IPDG A L Q K R - - G I V E Q C C T S I C S L Y Q L E N Y C N
IPCH E Y E K V K R G I V E Q C C H N T C S L Y Q L E N Y C N
IPCA V I R K R - - G I V E Q C C H K P C S I F E L Q N Y C N
IPBO P P Q K R - - G I V E Q C C A S V C S L Y Q L E N Y C N
IPAF M M V K R - - G I V E Q C C H R P C N I F D L Q N Y C N
. Q K R G I V E Q C C C S L Y Q L E N Y C N
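A consensus line can be derived column by column from a multiple sequence alignment. The following is a minimal Python sketch. It uses a simple majority rule: a residue is reported only if it occupies more than half of a column, otherwise `.` is shown. That rule is an assumption and may not be the one the ClustalW display used. The rows are the middle block of the insulin alignment above, with the spacing removed. The function name is illustrative.

```python
from collections import Counter

def majority_consensus(rows, gap="-", placeholder="."):
    """Residue found in more than half of each column, else a placeholder."""
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("aligned rows must have equal length")
    consensus = []
    for column in zip(*rows):
        residue, count = Counter(column).most_common(1)[0]
        if residue != gap and 2 * count > len(column):
            consensus.append(residue)
        else:
            consensus.append(placeholder)
    return "".join(consensus)

INSULIN_BLOCK = [
    "LCGSNLVETLYSVCQDDGFFYIPKDXXELE",  # IPGP
    "LCGSHLVEALYLVCGERGFFYSPKTXXDVE",  # IPDK
    "LCGSHLVEALYLVCGERGFFYTPKARREVE",  # IPDG
    "LCGSHLVEALYLVCGERGFFYSPKARRDVE",  # IPCH
    "LCGSHLVDALYLVCGPTGFFYNPKRDVDPP",  # IPCA
    "LCGSHLVEALYLVCGERGFFYTPKARREVE",  # IPBO
    "LCGSHLVDALYLVCGDRGFFYNPKRDVDQL",  # IPAF
]
```

For this block the rule gives `LCGSHLVEALYLVCGERGFFY.PK...DVE`. This agrees with the slide's consensus line in the conserved region. A stricter threshold would report fewer residues, and a looser one more.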
HMM Model of Hemoglobins
http://decypher.stanford.edu/
GrowTree VEGF Neighbor-Joining Tree
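The tree on this slide was built by neighbor joining. The following is a minimal, unoptimized Python sketch of the standard Saitou-Nei procedure; it is not GrowTree's implementation. At each step it joins the pair that minimizes Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the sum of distances from i. The function name, the internal-node labels `u1`, `u2`, … and the five-taxon matrix used in the test are illustrative. Ties are broken by the first pair found.

```python
from itertools import combinations

def neighbor_joining(labels, matrix):
    """Return (joins, final_edge).

    joins lists (i, j, length_i, length_j, new_node) in merge order;
    final_edge is (a, b, length) connecting the last two nodes.
    """
    # Dict of dicts so new internal nodes can be added by name.
    d = {a: {b: float(matrix[x][y]) for y, b in enumerate(labels)}
         for x, a in enumerate(labels)}
    active = list(labels)
    joins = []
    while len(active) > 2:
        n = len(active)
        # r_i: total distance from node i to every other active node.
        r = {a: sum(d[a][b] for b in active if b != a) for a in active}
        # Join the pair that minimizes the Q criterion.
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * d[p[0]][p[1]] - r[p[0]] - r[p[1]])
        # Branch lengths from i and j to their new parent u.
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"u{len(joins) + 1}"
        d[u] = {}
        for k in active:
            if k not in (i, j):
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        active = [k for k in active if k not in (i, j)] + [u]
        joins.append((i, j, li, lj, u))
    a, b = active
    return joins, (a, b, d[a][b])
```

Recomputing Q over all pairs at every step makes this sketch O(n³) overall. That cost is acceptable for a handful of sequences such as a gene family, but large trees need faster implementations.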
Human Gene Expression Signatures
[Figure: expression signatures for T-cell signaling, DNA damage, fibroblast stimulation, B-cell signaling, CMV infection, anoxia, polio infection, monocyte signaling (IL4) and hormone]
Clustering Gene Expression Profiles: Comparison of Methods
D'haeseleer P (2005). Nat Biotechnol 23:1499-1501.
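Average-linkage hierarchical clustering is one common way to group expression profiles. The following is a minimal Python sketch. It measures distance as 1 minus the Pearson correlation, so genes that rise and fall together cluster even when their absolute levels differ. It merges clusters until `k` remain. The function names and the four toy profiles in the test are invented for illustration. The sketch assumes no profile is constant, because a constant profile makes the correlation undefined.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_linkage(profiles, k):
    """Merge the closest clusters (distance 1 - r) until k clusters remain."""
    clusters = [[name] for name in profiles]

    def distance(c1, c2):
        # Average linkage: mean pairwise distance between members of two clusters.
        total = sum(1 - pearson(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: distance(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters
```

Swapping the distance function (Euclidean instead of correlation) or the linkage rule (single or complete) can change the clusters. This sensitivity is one reason comparisons of clustering methods matter.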


TAMO:
Tools for the Analysis of Motifs
Finding Transcription Factor Binding Sites
Upstream Regions of Co-expressed Genes
Pho5   GATGGCTGCACCACGTGTATGC...ACGATGTCTCGC
Pho8   CACATCGCATCACGTGACCAGT...GACATGGACGGC
Pho81  GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
Pho84  TCTCGTTAGGACCATCACGTGA...ACAATGAGAGCG
Pho …  CGCTAGCCCACGTGGATCTTGA...AGAATGACTGGC
Transcription Start
Finding Transcription Factor Binding Sites
Upstream Regions of Co-expressed Genes
GATGGCTGCACCACGTGTATGC...ACGATGTCTCGC
CACATCGCATCACGTGACCAGT...GACATGGACGGC
GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
TCTCGTTAGGACCATCACGTGA...ACAATGAGAGCG
CGCTAGCCCACGTGGATCTTGT...AGAATGGCCTAT
Finding Transcription Factor Binding Sites
Upstream Regions of Co-expressed Genes
ATGGCTGCACCACGTTTATGC...ACGATGTCTCGC
CACATCGCATCACGTGACCAGT...GACATGGACGGC
GCCTCGCACGTGGTGGTACAGT...AACATGACTAAA
TTAGGACCATCACGTGA...ACAATGAGAGCG
CGCTAGCCCACGTTGATCTTGT...AGAATGGCCTAT

Pho4 binding
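The simplest way to look for a shared binding site is to list the words (k-mers) that occur in every upstream region. The following is a minimal Python sketch of that exact-match approach. The function names are illustrative. The sequences are the left-hand segments shown for the co-expressed Pho genes on the first of these slides, up to the "..."; the gene name of the fifth is elided on the slide.

```python
def kmers(seq: str, k: int) -> set:
    """All length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_kmers(seqs, k):
    """k-mers present in every sequence."""
    common = kmers(seqs[0], k)
    for seq in seqs[1:]:
        common &= kmers(seq, k)
    return common

UPSTREAM = [
    "GATGGCTGCACCACGTGTATGC",  # Pho5
    "CACATCGCATCACGTGACCAGT",  # Pho8
    "GCCTCGCACGTGGTGGTACAGT",  # Pho81
    "TCTCGTTAGGACCATCACGTGA",  # Pho84
    "CGCTAGCCCACGTGGATCTTGA",  # Pho … (name elided on the slide)
]
```

With k = 6, `shared_kmers(UPSTREAM, 6)` returns `{"CACGTG"}`, the only 6-mer present in all five segments. Exact matching fails as soon as a site differs by even one base, which is why tools such as those in TAMO model binding sites with weight matrices instead.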
Metabolic Networks: BioCyc
http://biocyc.org/
C. crescentus Cell Cycle Gene Expression
Genome-Wide Associations in Rheumatoid Arthritis

Pearson, T. A. et al. JAMA 2008;299:1335-1344
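At each SNP, a genome-wide association study compares allele frequencies between cases and controls. The following is a minimal Python sketch of a single-SNP allelic test on a 2×2 table of allele counts. It uses a Pearson chi-square without continuity correction and reports the allelic odds ratio. The function name and the counts in the test are hypothetical, and real studies usually also adjust for population structure.

```python
from math import erfc, sqrt

def allelic_test(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (chi-square, p-value, odds ratio) for a 2x2 allele-count table."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of chi-square with 1 degree of freedom: P(|Z| > sqrt(chi2)).
    p_value = erfc(sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio
```

For example, 60 vs 40 alleles in cases and 40 vs 60 in controls give a chi-square of 8.0 (p ≈ 0.0047) and an odds ratio of 2.25. Genome-wide studies test hundreds of thousands of SNPs, so they conventionally require p < 5 × 10^-8 rather than 0.05. A result like this one would therefore not count as genome-wide significant.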


Leveraging Genomic Information in Medicine
• Novel Diagnostics
– Microchips & Microarrays: DNA
– Gene Expression: RNA
– Proteomics: Protein
• Novel Therapeutics
– Drug Target Discovery
– Rational Drug Design
– Molecular Docking
– Gene Therapy
– Stem Cell Therapy
• Understanding Metabolism
• Understanding Disease
– Inherited Diseases: OMIM
– Infectious Diseases: Pathogenic Bacteria, Viruses
