Bioinformatics Seminar, 3rd Oct 2018
Supervised by
Dr. Yogita, Assistant Professor
National Institute of
Technology, Meghalaya
Presented by
Subhasree Majumder
T18CS005
M.Tech CSE 1st Year
Objective
• Introduction to Bioinformatics
• Application Areas
• Fundamentals of Cell, DNA and Proteins
• Central Dogma of Biology
• Biological databases, data formats
• Sequence Alignment
What is Bioinformatics
• Bioinformatics uses computational power for the storage, retrieval,
manipulation and distribution of information related to biological
macromolecules such as DNA, RNA and proteins.
• In a broader sense, it is an interdisciplinary field at the meeting point
of Biology, Statistics and Computer Science.
Applications and Subfields
Bioinformatics consists of two major subfields: the development of computational tools and databases, and
the application of these tools and databases to generate biological knowledge in the areas of molecular
sequence analysis, molecular structural analysis, and molecular functional analysis.
Biology for Engineers
Cell: The Fundamental Unit of Life
• The cytoplasm is the main matrix that fills the cell and holds the
organelles and other cellular machinery.
2. Divide the sequence into 6 different reading frames (+1, +2, +3, -1, -2 and -3). The first reading frame is obtained by
reading the sequence in words of 3 nucleotides, starting from the first nucleotide.
FRAME +1: CGC TAC GTC TTA CGC TGG AGC TCT CAT GGA TCG GTT CGG TAG GGC TCG ATC ACA TCG CTA GCC AT
3. The second reading frame is formed by skipping the first nucleotide and then grouping the sequence into words of 3
nucleotides.
FRAME +2: C GCT ACG TCT TAC GCT GGA GCT CTC ATG GAT CGG TTC GGT AGG GCT CGA TCA CAT CGC TAG CCA T
4. The third reading frame is formed by skipping the first 2 nucleotides and then grouping the sequence into words of 3
nucleotides.
FRAME +3: CG CTA CGT CTT ACG CTG GAG CTC TCA TGG ATC GGT TCG GTA GGG CTC GAT CAC ATC GCT AGC CAT
5. The other 3 reading frames are read from the reverse complement of the sequence, so it has to be computed first.
Complement : GCGATGCAGAATGCGACCTCGAGAGTACCTAGCCAAGCCATCCCGAGCTAGTGTAGCGATCGGTA
Reverse complement: ATGGCTAGCGATGTGATCGAGCCCTACCGAACCGATCCATGAGAGCTCCAGCGTAAGACGTAGCG
6. The same process used for the +1, +2 and +3 frames is now repeated on the reverse complement sequence to obtain the
-1, -2 and -3 frames.
FRAME -1: ATG GCT AGC GAT GTG ATC GAG CCC TAC CGA ACC GAT CCA TGA GAG CTC CAG CGT AAG ACG TAG CG
FRAME -2: A TGG CTA GCG ATG TGA TCG AGC CCT ACC GAA CCG ATC CAT GAG AGC TCC AGC GTA AGA CGT AGC G
FRAME -3: AT GGC TAG CGA TGT GAT CGA GCC CTA CCG AAC CGA TCC ATG AGA GCT CCA GCG TAA GAC GTA GCG
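The frame construction above can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names (`reverse_complement`, `split_frame`, `six_frames`) are hypothetical, and the leading skipped nucleotides are dropped rather than printed as on the slide.

```python
# Minimal sketch of the six-reading-frame procedure (helper names are illustrative).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]

def split_frame(seq: str, offset: int) -> list[str]:
    """Skip `offset` leading bases and group the rest into words of 3.
    A trailing word shorter than 3 bases is kept, as in the frames above."""
    body = seq[offset:]
    return [body[i:i + 3] for i in range(0, len(body), 3)]

def six_frames(seq: str) -> dict[str, list[str]]:
    """Frames +1..+3 come from the sequence, -1..-3 from its reverse complement."""
    rev = reverse_complement(seq)
    frames = {}
    for k in range(3):
        frames[f"+{k + 1}"] = split_frame(seq, k)
        frames[f"-{k + 1}"] = split_frame(rev, k)
    return frames

# The example sequence, copied from FRAME +1 with the spaces removed.
seq = ("CGC TAC GTC TTA CGC TGG AGC TCT CAT GGA TCG GTT CGG TAG "
       "GGC TCG ATC ACA TCG CTA GCC AT").replace(" ", "")

for name, codons in sorted(six_frames(seq).items()):
    print(f"FRAME {name}:", " ".join(codons))
```

Each frame comes back as a list of words, so a later step can scan it for start and stop codons directly.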
Open Reading Frame Search
3. Now mark the start codon and the stop codons in each reading frame.
4. Identify the open reading frame (ORF): a sequence stretch beginning with a start codon and
ending in a stop codon.
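A rough sketch of the ORF search for a single reading frame is given below. The slides do not list the codons, so the sketch assumes the standard genetic code (start codon ATG; stop codons TAA, TAG and TGA). The function name `find_orfs` is hypothetical.

```python
# Minimal ORF search for one reading frame.
# Assumption: standard genetic code (ATG start; TAA, TAG, TGA stops).
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq: str, offset: int = 0) -> list[str]:
    """Return every stretch that begins with a start codon and ends at the
    first following in-frame stop codon, reading from position `offset`."""
    codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
    orfs = []
    current = None  # codons of the ORF being read, or None when outside an ORF
    for codon in codons:
        if current is None:
            if codon == START:
                current = [codon]
        else:
            current.append(codon)
            if codon in STOPS:
                orfs.append(" ".join(current))
                current = None
    return orfs

# Same example sequence as in the reading-frame steps (FRAME +1 without spaces).
seq = ("CGC TAC GTC TTA CGC TGG AGC TCT CAT GGA TCG GTT CGG TAG "
       "GGC TCG ATC ACA TCG CTA GCC AT").replace(" ", "")

for k in range(3):
    print(f"FRAME +{k + 1}:", find_orfs(seq, offset=k))
```

Running the same function on the reverse complement sequence covers the -1, -2 and -3 frames. An ORF that starts but never reaches a stop codon is ignored in this sketch.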
AACGT_AGAATC__
_TGCAGAGCC_GGA
Gap penalty can be calculated as follows:
• Opening a gap receives a penalty of d.
• Extending a gap receives a penalty of e.
• Total penalty for a gap of length n = d + (n-1)*e.
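As a small illustration of the penalty rule, the sketch below scores the gaps in the two aligned rows shown earlier. It assumes that n is the length of one contiguous gap and that "_" marks a gap position. The values d=5 and e=1 are arbitrary example values, and the function names are hypothetical.

```python
import re

def gap_penalty(n: int, d: float, e: float) -> float:
    """Penalty for one gap of length n: d to open it, e for each of the n-1 extensions."""
    if n <= 0:
        return 0
    return d + (n - 1) * e

def total_gap_penalty(row: str, d: float, e: float, gap_char: str = "_") -> float:
    """Sum the penalties of every contiguous run of gap characters in one aligned row."""
    runs = re.findall(re.escape(gap_char) + "+", row)
    return sum(gap_penalty(len(run), d, e) for run in runs)

top = "AACGT_AGAATC__"     # one gap of length 1 and one of length 2
bottom = "_TGCAGAGCC_GGA"  # two gaps of length 1

print(total_gap_penalty(top, d=5, e=1))     # 5 + (5 + 1) = 11
print(total_gap_penalty(bottom, d=5, e=1))  # 5 + 5 = 10
```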
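Dynamic programming builds the best alignment from the best alignments of shorter prefixes plus the score of the
current position. Suppose we want to align two sequences x and y.
• F(i,j) is the score of the best alignment between x_1…x_i and y_1…y_j.
• S(A,B) is the score of substituting A with B, and d is the gap penalty.
• 3 cases are possible for the last column of the alignment: x_i is aligned with y_j, x_i is aligned with a gap, or
a gap is aligned with y_j.
• F(i,j) is therefore the maximum of F(i-1,j-1) + S(x_i,y_j), F(i-1,j) - d and F(i,j-1) - d.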
[Figure: score matrix for aligning AAG with AGC; surviving cell values: -5, 2, -10, -15]
Final alignment:
AAG-
-AGC
or
AAG-
A-GC
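The global alignment procedure can be sketched as follows, using the F(i,j), S(A,B) and d notation from above. The scoring values are assumptions: a match score of 2 and a gap penalty of 5 agree with the surviving matrix values, but the mismatch score of -5 is chosen here only for illustration. The function name and the tie-breaking order in the traceback are also illustrative.

```python
def needleman_wunsch(x: str, y: str, match: int = 2, mismatch: int = -5, d: int = 5):
    """Global alignment with a linear gap penalty d per gap position.
    Returns (score, aligned_x, aligned_y). Scoring values are example assumptions."""
    def s(a: str, b: str) -> int:
        return match if a == b else mismatch

    n, m = len(x), len(y)
    # F[i][j] = score of the best alignment of x[:i] with y[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -d * i          # x prefix aligned against gaps only
    for j in range(1, m + 1):
        F[0][j] = -d * j          # y prefix aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # x_i with y_j
                          F[i - 1][j] - d,                          # x_i with a gap
                          F[i][j - 1] - d)                          # gap with y_j

    # Traceback from the bottom-right cell; ties prefer diagonal, then up, then left.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return F[n][m], "".join(reversed(ax)), "".join(reversed(ay))

score, top, bottom = needleman_wunsch("AAG", "AGC")
print(score)
print(top)
print(bottom)
```

Several alignments can share the best score. Which one is returned depends only on the tie-breaking order in the traceback.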
Dynamic programming algorithm for pairwise sequence alignment: Smith-Waterman local sequence
alignment algorithm
It was later discovered that proteins may have similar functions but very different structures, with
similarities found only in domains or motifs. A global alignment algorithm often misses such locally
similar regions.
Smith and Waterman modified the Needleman-Wunsch algorithm by bounding the score function below by 0
instead of letting it go negative:
F(i,j) = max{ 0, F(i-1,j-1) + S(x_i,y_j), F(i-1,j) - d, F(i,j-1) - d }
Note two things here: since the lower bound is 0, there are no negative scores, and the traceback path no longer
always has to start from the bottom-right cell.
As a result, local alignments are found as well, for example the paths 0->2 and 0->2->4.
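A minimal sketch of the local alignment idea follows. The scoring values are assumptions for illustration (match 2, mismatch -5, gap penalty d = 5), and the function name is hypothetical. Every cell is bounded below by 0, and the traceback starts from the highest-scoring cell and stops at the first 0.

```python
def smith_waterman(x: str, y: str, match: int = 2, mismatch: int = -5, d: int = 5):
    """Local alignment with a linear gap penalty d.
    Returns (score, aligned_x, aligned_y). Scoring values are example assumptions."""
    def s(a: str, b: str) -> int:
        return match if a == b else mismatch

    n, m = len(x), len(y)
    H = [[0] * (m + 1) for _ in range(n + 1)]   # first row and column stay 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0,                                        # lower bound
                          H[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # x_i with y_j
                          H[i - 1][j] - d,                          # x_i with a gap
                          H[i][j - 1] - d)                          # gap with y_j
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j

    # Traceback from the best cell until a 0 is reached.
    ax, ay = [], []
    i, j = best_i, best_j
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] - d:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return best, "".join(reversed(ax)), "".join(reversed(ay))

print(smith_waterman("AAG", "AGC"))
```

For AAG and AGC with these example scores, the best local path is 0->2->4, which corresponds to aligning AG with AG.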
Thank You