2006 09 01 - Lect01 - ch1 2 PDF
Teaching assistants
--People with very diverse backgrounds in biology
--People with diverse backgrounds in computer science and biostatistics
--Most people have a favorite gene, protein, or disease
Textbook
The course textbook is J. Pevsner, Bioinformatics and Functional Genomics (Wiley, 2003). The chapters contain content, lab exercises, and quizzes that were developed in this course over the past six years. A few copies will be available on reserve at Welch Library (go up to the 2nd floor) for those of you who do not want to buy a copy, and the library has six more copies.
Several other bioinformatics texts are available:
--Baxevanis and Ouellette
--David Mount
--Durbin et al.
Web sites
The course website is reached via:
http://pevsnerlab.kennedykrieger.org/bioinfo_course.htm (or Google pevsnerlab courses)
This site contains the powerpoints for each lecture.
The textbook website is:
http://www.bioinfbook.org
This has 1000 URLs, organized by chapter. This site also contains the same powerpoints.
The weekly quizzes are on my website:
http://pevsnerlab.kennedykrieger.org/moodle
Once you log in and take a quiz, you will get instant feedback. You can use moodle to ask questions as well.
Literature references
You are encouraged to read original source articles. They will enhance your understanding of the material. Reading will be assigned.
[Figure labels: aspartyl protease (PR), reverse transcriptase (RT), integrase (IN)]
There is a computer lab each Friday. This is a chance to gain practical experience using a variety of web resources. You can do the lab on your own, ahead of time. However, during the Friday lab you can get help on problems, and in some cases the computers will have specialized software.
Grading
40% ten moodle quizzes (corresponding to chapters 2-11)
30% final exam October 25 (in class)
30% discovery of a novel gene:
--Find the novel gene by the end of September, and turn in the final report, with phylogenetic tree, by October 25
--Instructions are posted on the course website
--We will discuss this project in detail in the next two weeks.
Grading
Quizzes are taken at the moodle website, and are due one week after the relevant lecture.
Ten quizzes:
4% Chapter 2 quiz (sequences)
4% Chapter 3 quiz (alignment)
4% Chapter 4 quiz (BLAST)
4% Chapter 5 quiz (advanced BLAST)
4% Chapter 6 quiz (RNA)
4% Chapter 7 quiz (microarrays)
4% Chapter 8 quiz (proteomics)
4% Chapter 9 quiz (protein structure)
4% Chapter 10 quiz (multiple alignment)
4% Chapter 11 quiz (phylogeny)
30% find-a-gene project (due October 25)
30% final exam October 25 (in class)
What is bioinformatics?
--Interface of biology and computers
--Analysis of proteins, genes and genomes using computer algorithms and computer databases
--Genomics is the analysis of genomes. The tools of bioinformatics are used to make sense of the billions of base pairs of DNA that are sequenced by genomics projects.
[Figure: bioinformatics and medical informatics; tool-users and tool-makers (databases, infrastructure, algorithms)]
[Figure: DNA, RNA, protein, phenotype, with time of development] Page 5
[Figure: DNA, RNA, protein, phenotype] Page 6
Growth of GenBank
[Figure: base pairs of DNA (billions) and sequences (millions) by year, December 1982 to June 2006]
http://www.ncbi.nlm.nih.gov/Genbank/
[Figure: DNA, RNA, protein (genome, transcriptome, proteome)]
[Figure: DNA, RNA, protein, phenotype]
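The DNA, RNA, protein flow shown in these figures can be sketched in a few lines of Python. This is only an illustration, not part of the course materials: the function names and the toy sequence are made up, and the codon table is the standard genetic code written in the usual T, C, A, G order.

```python
# Standard genetic code, codons ordered T, C, A, G ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def transcribe(dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(rna) - 2, 3):
        codon = rna[pos:pos + 3].replace("U", "T")
        residue = CODON_TABLE[codon]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


toy_dna = "ATGGCCTAA"  # made-up sequence, for illustration only
toy_rna = transcribe(toy_dna)
toy_protein = translate(toy_rna)
print(toy_rna, toy_protein)
```

The sketch ignores reading-frame detection, introns, and alternative genetic codes; it only shows the direction of information flow.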
EMBL, GenBank, DDBJ
--EMBL: housed at EBI (European Bioinformatics Institute)
--GenBank: housed at NCBI (National Center for Biotechnology Information)
--DDBJ: housed in Japan
Page 16
8/06
http://www.ncbi.nlm.nih.gov/Taxonomy/txstat.cgi
www.ncbi.nlm.nih.gov
Page 24
PubMed is
--National Library of Medicine's search service
--16 million citations in MEDLINE
--links to participating online journals
--PubMed tutorial (via Education on side bar)
Page 24
Entrez integrates
--the scientific literature
--DNA and protein sequence databases
--3D protein structure data
--population study data sets
--assemblies of complete genomes
Page 24
BLAST is
--Basic Local Alignment Search Tool
--NCBI's sequence similarity search tool
--supports analysis of DNA and protein databases
--80,000 searches per day
Page 25
OMIM is
--Online Mendelian Inheritance in Man
--catalog of human genes and genetic disorders
--edited by Dr. Victor McKusick, others at JHU
Page 25
Books is
searchable resource of on-line books
Page 26
TaxBrowser is
--browser for the major divisions of living organisms (archaea, bacteria, eukaryota, viruses)
--taxonomy information such as genetic codes
--molecular data on extinct organisms
Page 26
Page 27
FASTA format
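As a rough illustration of the format: a FASTA record is a header line beginning with ">" followed by one or more lines of sequence. The parser below is a minimal sketch; the function name `parse_fasta` and both example records are made up for illustration and are not real database entries.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Made-up records, for illustration only.
example = """>example_1 hypothetical protein (made-up record)
MAGICP
EPTIDE
>example_2 another made-up record
ACGTACGT
"""
records = parse_fasta(example)
for header, sequence in records:
    print(header, len(sequence))
```

Sequence lines are joined, so a record wrapped over several lines comes back as one string.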
Note that links to many other RBP4 database entries are available (revised Fig. 2.8, Page 30).
Page 31
[Figure: DNA, RNA, protein]
UniGene build 194, 8/06
Cluster size (ESTs)    Number of clusters
1                      42,800
2                      6,500
3-4                    6,500
5-8                    5,400
9-16                   4,100
17-32                  3,300
500-1000               2,128
2000-4000              233
8000-16,000            21
16,000-30,000          8
Click human, then enter RBP4.
Page 33
There are many possible approaches. Begin at the main page of NCBI, and type an Entrez query: hiv-1 pol
Page 34
Searching for HIV-1 pol: Following the genome link yields a manageable three results
Page 34
For the Entrez query: hiv-1 pol there are about 40,000 nucleotide or protein records (and >100,000 records for a search for hiv-1), but these can be reduced in two easy steps:
--specify the organism, e.g. hiv-1[organism]
--limit the output to RefSeq!
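The two narrowing steps can also be written as a single query string. The sketch below only builds the query and a search URL; it does not contact NCBI. Several things here are assumptions not found in the slides: the helper names are hypothetical, the `srcdb_refseq[prop]` filter stands in for the RefSeq limit set through the web page, and the E-utilities `esearch` address is used as one possible way to submit the query.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (an assumption; the slides use the web page).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_entrez_query(keyword, organism=None, refseq_only=False):
    """Combine a keyword with optional organism and RefSeq restrictions."""
    terms = [keyword]
    if organism:
        terms.append(f"{organism}[organism]")   # step 1: specify the organism
    if refseq_only:
        terms.append("srcdb_refseq[prop]")      # step 2: limit to RefSeq
    return " AND ".join(terms)


def build_esearch_url(term, db="nucleotide"):
    """Return a search URL for the given database and query term."""
    return ESEARCH_URL + "?" + urlencode({"db": db, "term": term})


term = build_entrez_query("pol", organism="hiv-1", refseq_only=True)
print(term)
print(build_esearch_url(term))
```

Building the string separately from sending it makes it easy to paste the same query into the Entrez search box.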
Page 34
only 1 RefSeq
At this point, select a reasonable candidate (e.g. histone 2, H4) and follow its link to Entrez Gene. There, you can confirm you have the right gene/protein.
8-12-06
Page 35
PubMed is the NCBI gateway to MEDLINE. MEDLINE contains bibliographic citations and author abstracts from over 4,600 journals published in the United States and in 70 foreign countries. It has >14 million records dating back to 1966.
Page 35
MeSH is the acronym for "Medical Subject Headings." MeSH is the list of the vocabulary terms used for subject analysis of biomedical literature at NLM. MeSH vocabulary is used for indexing journal articles for MEDLINE. The MeSH controlled vocabulary imposes uniformity and consistency on the indexing of biomedical literature.
Page 35
[Figure: Boolean operators for combining searches 1 and 2 (8/04)]
1 AND 2
1 OR 2
1 NOT 2
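The three Boolean operators behave like set operations on the lists of articles returned by two searches. A minimal sketch, using made-up article IDs rather than real PubMed results:

```python
# Hypothetical article IDs returned by two searches (made-up values).
search_1 = {101, 102, 103, 104}
search_2 = {103, 104, 105}

both = search_1 & search_2         # 1 AND 2: articles found by both searches
either = search_1 | search_2       # 1 OR 2: articles found by at least one
only_first = search_1 - search_2   # 1 NOT 2: found by search 1 but not search 2

print(sorted(both), sorted(either), sorted(only_first))
```

AND narrows a search, OR broadens it, and NOT removes the overlap with the second search.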
                         Article contents:
Search result:           globin is present    globin is absent
globin is found          true positive        false positive (article does not discuss globins)
globin is not found      false negative       true negative
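The four outcomes above can be expressed as a small function of two yes/no facts: whether the search returned the article, and whether the article actually discusses globins. This is a sketch with a hypothetical function name and made-up articles; the "false negative" label is the standard term for an article that discusses globins but is missed by the search.

```python
from collections import Counter


def classify(found_by_search, discusses_globin):
    """Label one article by comparing the search result with its contents."""
    if found_by_search:
        return "true positive" if discusses_globin else "false positive"
    return "false negative" if discusses_globin else "true negative"


# Made-up articles: (found by the search?, actually discusses globins?)
toy_articles = [
    (True, True),
    (True, False),
    (False, False),
    (False, True),
    (True, True),
]
counts = Counter(classify(found, present) for found, present in toy_articles)
for label, n in sorted(counts.items()):
    print(label, n)
```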
8/06
http://www.welch.jhu.edu
Brian Brown ([email protected]) and Carrie Iwema ([email protected]) are the Welch Medical Library liaisons to the basic sciences
Course sponsors
Dept. of Molecular Microbiology & Immunology, and Dept. of Biostatistics, School of Public Health