CE6068 Lecture 1

CE6068
Bioinformatics and Computational Molecular

Introduction to Bioinformatics
Chia-Ru Chung
Department of Computer Science and Information Engineering
National Central University
2024/3/6
Outline
• DNA, RNA, and Protein

• Genome, Chromosome, and Gene
• Mutation
• Central Dogma
• Biotechnological Tools
• Brief History of Bioinformatics
DNA, RNA, and Protein
Organization of the Body
• The organization of the body progresses from
cells to tissues, then to organs, followed by organ
systems, ultimately forming a fully functioning
organism.
• A cell performs two types of functions:
‐ Performing the chemical reactions necessary to maintain life
‐ Passing the information for maintaining life to the next generation
Ref. https://www.exploringnature.org/db/view/Levels-of-Organization-in-the-Body-Cells-to-Organisms-Color
• Cells store the genetic information in the form of double-stranded DNA.
• Portions of the DNA called genes are transcribed into closely related molecules called RNAs.
‣ Protein performs chemical reactions
‣ DNA stores and passes information
‣ RNA is the intermediate between DNA and proteins
Protein
• Proteins constitute most of a cell’s dry mass. They are not only the
building blocks from which cells are built, but also execute nearly
all cell functions.
• A protein is a sequence composed of an alphabet of 20 amino acids.
‐ The length is in the range of 20 to more than 5000 amino acids.
‐ On average, a protein contains around 350 amino acids.
• A protein folds into a three-dimensional shape, which forms the
building blocks and performs most of the chemical reactions within
a cell.
Ref. https://en.wikipedia.org/wiki/Protein
‣ The folding of proteins is caused by the weak interactions among amino acid residues.
‣ The weak interactions include hydrogen bonds, ionic bonds, van der Waals attractions, and hydrophobic interactions.
Amino Acids
• Amino acids are the building blocks of proteins.
• Each amino acid has a central carbon (Cα) to which the following are attached:
‣ Amino group (-NH2 group)
‣ Carboxyl group (-COOH group)
‣ R group (side chain), which determines the type of the amino acid
Ref. https://www.reagent.co.uk/blog/what-are-amino-acids/
20 Common Amino Acids
• There are 20 common amino
acids, characterized by
different R groups.
• These 20 amino acids can be
classified according to their
mass, volume, acidity, polarity,
and hydrophobicity.
Ref. https://www.reagent.co.uk/blog/what-are-amino-acids/
Summary of Amino Acid Properties
Ref. Figure 1.2 in Algorithms in Bioinformatics: A Practical Introduction by Wing-Kin Sung

Nonstandard Amino Acids
• Two non-standard amino acids which can be specified by genetic code:
‐ Selenocysteine (Sec, U) is incorporated into some proteins at a UGA codon, which is normally a
stop codon.
‐ Pyrrolysine (Pyl, O) is used by some methanogenic archaea in enzymes that they use to produce
methane. It is coded for with the codon UAG.
• Non-standard amino acids which do not appear in protein:
‐ E.g. lanthionine, 2-aminoisobutyric acid, and dehydroalanine
‐ They often occur as intermediates in the metabolic pathways for standard amino acids
• Non-standard amino acids which are formed through modification to the R-groups of standard
amino acids:
‐ E.g. hydroxyproline is made by a post-translational modification of proline.
Protein Sequence and Structure
• A protein or a polypeptide chain is formed by joining
the amino acids together via a peptide bond.
• One end of the polypeptide is the amino group,
which is called N-terminus.
• The other end of the polypeptide is the carboxyl
group, which is called C-terminus.
• To write a peptide sequence, the convention is to
write the sequence from the N-terminus to the C-terminus.
Ref. https://en.wikipedia.org/wiki/Peptide_bond
Protein Structure
• Protein structure is described at 4 levels: primary, secondary,
tertiary, and quaternary structures.
• Primary structure is the amino acid sequence.
• Secondary structure is formed through interactions between the
backbone atoms due to hydrogen bonding. It consists of “local” structures
such as α-helices, β-sheets, and turns.
• Tertiary structure is the three-dimensional structure of the
protein formed through the interaction of the secondary
structures due to the hydrophobic effect.
• Quaternary structure is formed by the packing of different
proteins to form a protein complex.
Ref. https://www.khanacademy.org/science/biology/macromolecules/proteins-and-amino-acids/a/orders-of-protein-structure
DNA
• Deoxyribonucleic acid (DNA) is the genetic
material in all organisms (with certain viruses being
an exception) and it stores the instructions needed
by the cell to perform daily life functions.
• DNA consists of two strands which are interwoven
together to form a double helix.
• Each strand is a chain of small molecules called
nucleotides.
Ref. https://www.priyamstudycentre.com/2023/08/deoxyribonucleic-acid-dna.html
Nucleotides (1/2)
• Nucleotides are the building blocks of all nucleic acid molecules.
• Each nucleotide consists of:
‐ A pentose sugar (deoxyribose)
‐ Phosphate (bound to the 5′ carbon)
‐ Base (bound to the 1′ carbon), a nitrogenous base
Ref. Figure 1.4 in Algorithms in Bioinformatics: A Practical Introduction by Wing-Kin Sung
Nucleotides (2/2)
• Each nucleotide can have one, two, or three phosphates.
• Monophosphate nucleotides have only 1 phosphate, and are the building blocks of DNA.
• Diphosphate nucleotides and triphosphate nucleotides have 2 and 3 phosphate groups,
respectively. They are used to transport energy in the cell.
‣ Example: the conversion of adenosine triphosphate (ATP) to adenosine diphosphate (ADP)
Ref. Figure 1.5 in Algorithms in Bioinformatics: A Practical Introduction by Wing-Kin Sung
Nitrogenous Bases
• There are 5 different nitrogenous bases: adenine (A), cytosine (C), guanine (G), thymine (T),
and uracil (U).
• A and G are called purines. They have a 2-ring structure.
• C, T, and U are called pyrimidines. They have a 1-ring structure.
• DNA only uses A, C, G, and T.
Watson-Crick Base Pairing
• Complementary bases:
‐ A with T (two hydrogen bonds)
‐ C with G (three hydrogen bonds)
• The distance between the two strands is about 10 Å.
• Held together by these weak interactions, the two strands form a double helix.
Ref. https://www.mun.ca/biology/scarr/Watson-Crick_Model.html
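The pairing rule above can be turned into a small calculation. The following is a minimal sketch, not part of the original slides: the function name `count_hydrogen_bonds` is hypothetical, and it assumes the strand is fully paired with its complement.

```python
# Hydrogen bonds per Watson-Crick pair, as stated on the slide:
# an A-T pair has 2 bonds, a C-G pair has 3 bonds.
HYDROGEN_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def count_hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds between `strand` and its complementary strand."""
    return sum(HYDROGEN_BONDS[base] for base in strand.upper())


print(count_hydrogen_bonds("ACGTA"))  # 2 + 3 + 3 + 2 + 2 = 12
```

A strand richer in C and G is held to its partner by more hydrogen bonds than an A/T-rich strand of the same length.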
Reasons behind the Complementary Bases
• Purines (A or G) cannot pair up because they are too big.
• Pyrimidines (C or T) cannot pair up because they are too small.
• G and T (or A and C) cannot pair up because they are chemically incompatible.
Orientation of a DNA
• One strand of DNA is generated by chaining together nucleotides.
• It forms a phosphate-sugar backbone.
• It has direction: from 5′ to 3′ (because DNA always extends from the 3′ end), e.g. 5′-ACGTA-3′.
• Upstream: toward the 5′ end
• Downstream: toward the 3′ end
Double Stranded DNA
• Normally, DNA is double-stranded within a cell.
The two strands are antiparallel: one strand is
the reverse complement of the other.
• The two strands are interwoven and
form a double helix.
• One reason for DNA being double-stranded is that it facilitates
DNA replication.
Ref. https://www.khanacademy.org/science/ap-biology/gene-expression-and-regulation/replication/a/hs-dna-structure-and-replication-review
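The reverse-complement relationship between the two strands can be sketched in a few lines. This is an illustrative example only; the helper name `reverse_complement` is an assumption, and both strands are written in the conventional 5′ to 3′ direction.

```python
# Complement each base (A<->T, C<->G), then reverse the result so that
# the returned strand is also read from 5' to 3'.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the reverse complement of a DNA strand written 5' to 3'."""
    return strand.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("ACGTA"))  # TACGT
```

Applying the function twice gives back the original strand, which reflects the fact that each strand determines the other.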
Circular form of DNA
• DNA usually exists in linear form.
‐ E.g. in humans and yeast, DNA exists in linear form.
• In some simple organisms, DNA exists in circular form.
‐ E.g. in E. coli, DNA exists in circular form.
Location of DNA in a Cell
• Two types of organisms: prokaryotes and eukaryotes.
• Prokaryotes are single-celled organisms with no nuclei
(e.g. bacteria).
‐ DNA floats freely within the cell.
• Eukaryotes are organisms with single or multiple cells.
Their cells have nuclei (e.g. plants and animals).
‐ DNA is located within the nucleus.
Ref. https://www.ancestry.com/c/dna-learning-hub/cells
RNA
• Ribonucleic acid (RNA) is the nucleic acid which is produced during the transcription
process.
• RNA has properties of both DNA and protein.
‐ Similar to DNA, it can store and transfer information.
‐ Similar to protein, it can form complex 3-dimensional structures and perform some
functions.
Nucleotide for RNA
• An RNA nucleotide consists of three parts:
‐ Ribose sugar (has an extra OH group at the 2′ position)
‐ Phosphate (bound to the 5′ carbon)
‐ Base (bound to the 1′ carbon)
• The nucleotide here has a ribose sugar, instead
of the deoxyribose in the DNA nucleotide.
RNA vs DNA
• RNA is single-stranded.
• The nucleotides of RNA are quite similar to those of DNA, except that they have an extra
OH group at position 2′.
‐ Due to this extra OH, RNA can form more hydrogen bonds than DNA. Thus, RNA can form
complex 3-dimensional structures.
• RNA uses the base U instead of T.
‐ U is chemically similar to T. In particular, U is also complementary to A.
Functions of RNA
• RNA has the properties of both DNA and proteins.
‐ First, similar to DNA, it can store and transfer information.
‐ Second, similar to protein, it can form complex 3D structures and perform functions.
• For the storage of information, RNA is not as stable as DNA and that’s why we still
have DNA.
• Protein can perform more functions than RNA does, which is the reason why protein
is still needed.
Different Types of RNA
• There are two types of RNAs: messenger RNAs (mRNAs) and non-coding RNAs (ncRNAs).
• Messenger RNAs carry the encoded information required to make proteins of all types.
• Non-coding RNAs include ribosomal RNAs (rRNAs), transfer RNAs (tRNAs), short ncRNAs,
and long ncRNAs.
‐ rRNAs form parts of ribosomes that help to translate mRNAs into proteins.
‐ tRNAs serve as a molecular dictionary that translates the nucleic acid code into the amino acid sequences
of proteins.
‐ Short ncRNAs include 4 different types: snoRNAs, microRNAs, siRNAs, piRNAs. They are responsible
for the regulation of the process for generating proteins from genes.
microRNA (miRNA)
• miRNA is a single-stranded RNA of length ~22 nt.
• miRNA is encoded as a non-coding RNA.
• It is first transcribed as a primary transcript called primary miRNA
(pri-miRNA).
• It is then cleaved into a precursor miRNA (pre-miRNA) with the help
of the nuclease Drosha. The precursor miRNA is ~60-80 nt long
and can potentially fold into a stem-loop structure.
• The pre-miRNA is transported into the cytoplasm by Exportin 5. It
is further cleaved into a mature miRNA by the endonuclease Dicer.
Ref. https://commons.wikimedia.org/wiki/File:MiRNA_processing.jpg
Genome, Chromosome, and Gene
Genome
• The complete set of DNA in an organism is referred to collectively as a “genome”.
• Genomes of different organisms vary widely in size:
‐ The smallest known genome for a free-living organism (a bacterium called Mycoplasma genitalium)
contains about 600,000 DNA base pairs.
‐ The largest known genome is that of Amoeba dubia, which contains 670 billion base pairs.
• Human and mouse genomes have about 3 billion DNA base pairs.
‐ Except for mature red blood cells, sperm cells, and egg cells, all human cells contain the same complete
genome, though some may have small differences due to mutations.
‐ Sperm and egg cells each contain only half of the genome, from the father and mother, respectively. During
fertilization, the sperm cell fuses with the egg cell and forms a new cell containing the complete genome.
Chromosome
• A genome is not one consecutive double-stranded DNA chain.
• Usually, DNA is tightly wound around histone proteins and forms a chromosome.
• The total information stored in all chromosomes constitutes a genome.
• In most multicellular organisms, every cell contains the same complete genome.
‐ There may be some small differences due to mutation.
• Example:
‐ The human genome has 3G base pairs, organized in 23 pairs of chromosomes.
Gene
• A gene is a sequence of DNA that encodes a protein or an RNA molecule.
• The human genome is expected to contain 30,000–35,000 genes.
• For a gene that encodes protein:
‐ In a prokaryotic genome, one gene corresponds to one protein.
‐ In a eukaryotic genome, one gene can correspond to more than one protein because of the
process called “alternative splicing”.
Complexity of the Organism versus Genome Size
• Genome size has no clear relationship to the complexity of the organism.
‐ The genome of humans, a complex multicellular organism, is approximately 3
billion base pairs.
‐ Amoeba dubia, a single-celled organism, has up to 670 billion base pairs.
Number of Genes versus Genome Size
• Prokaryotic genome: e.g. E. coli
‐ Number of base pairs: 5M
‐ Number of genes: 4k
‐ Average length of a gene: 1000 bp
• Eukaryotic genome: e.g. human
‐ Number of base pairs: 3G
‐ Estimated number of genes: 20k–30k
‐ Estimated average length of a gene: 1000–2000 bp
• Note that 90% of the E. coli genome consists of coding regions.
• Less than 3% of the human genome is believed to be coding regions. The rest is called
junk DNA.
• For eukaryotic genomes, the genome size has no relationship with the number of genes!
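As a rough cross-check of the percentages above, the coding fraction can be estimated as (number of genes × average gene length) / genome size. The sketch below uses only the rounded figures from this slide; the function name `coding_fraction` is hypothetical, and the estimate ignores overlapping genes and non-coding parts of genes.

```python
def coding_fraction(num_genes: int, avg_gene_length_bp: int, genome_size_bp: int) -> float:
    """Back-of-the-envelope estimate of the fraction of a genome covered by genes."""
    return num_genes * avg_gene_length_bp / genome_size_bp


# E. coli: 4k genes of about 1000 bp in a 5M bp genome.
ecoli = coding_fraction(4_000, 1_000, 5_000_000)
# Human: 20k-30k genes of about 1000-2000 bp in a 3G bp genome.
human_low = coding_fraction(20_000, 1_000, 3_000_000_000)
human_high = coding_fraction(30_000, 2_000, 3_000_000_000)

print(f"E. coli: {ecoli:.0%}")                        # E. coli: 80%
print(f"Human: {human_low:.2%} to {human_high:.2%}")  # Human: 0.67% to 2.00%
```

With these rounded inputs the E. coli estimate comes out near 80%, the same order as the 90% quoted above, while the human estimate stays below 3%.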
Mutation
Replication of DNA
• Why is DNA double-stranded?
‐ One of the main reasons is to facilitate replication.
‐ Replication is the process allowing a cell to duplicate and pass its DNA to the two daughter
cells.
‣ The process first separates the two strands of DNA.
‣ Then, each DNA strand serves as a template for the synthesis of another
complementary strand with the help of DNA polymerase.
‣ This process generates two identical double-stranded DNAs for the two daughter cells.
‣ When one strand of a double-stranded DNA is damaged, the strand can be repaired with the
information of the other strand.
‣ The replication and repair mechanisms help to maintain the same genome for all cells in our body.
Mutation of DNA
• Despite replication being a near-perfect process, infrequent mistakes called
“mutations” are still possible.
• Mutations are generally caused by modifying the way that nucleotides form hydrogen
bonds.
• Replication then hybridizes wrong nucleotides with the template DNA, and mutations
occur.
• One possible way to modify nucleotides is through a tautomeric shift.
Ref. https://www.slideshare.net/Varshini3/spontaneous-mutation-232631980
• A mutation is a rare event which changes the chemical property of the bases and
affects the way the bases form hydrogen bonds.
• For example, a tautomeric shift may enable adenine to base pair with
cytosine. When a tautomeric shift occurs during DNA replication, a wrong
nucleotide will be inserted and a mutation occurs.
Types of Mutation
• A mutation is a sudden change in the genome.
• It is the basis of evolution.
• It is also a cause of cancer.
• Note: mutation can occur in
DNA, RNA, and protein.
Ref. https://microbenotes.com/types-of-mutations/
Central Dogma
Central Dogma
• The central dogma was first enunciated by Francis Crick in 1958 and restated in a Nature
paper published in 1970.
• The central dogma describes the process of transferring information from DNA to RNA to
protein.
• The expression of a gene consists of two steps, which may be followed by a third:
‐ Transcription: DNA → mRNA
‐ Translation: mRNA → Protein
‐ Post-translation modification: Protein → Modified protein
Ref. Figure 1.11 in Algorithms in Bioinformatics: A Practical Introduction by Wing-Kin Sung
Ref. https://stock.adobe.com/hk/images/central-dogma-of-dna-transcription-and-translation/502112289
Ref. https://www.youtube.com/watch?v=gG7uCskUOrA&ab_channel=yourgenome
Introns and Exons
• Eukaryotic genes contain introns and exons.
‐ Introns are sequences that ultimately will be spliced out of the mRNA.
‐ Introns normally satisfy the GT-AG rule, that is, an intron begins with GT and ends with AG.
‐ Each gene can have many introns, and each intron may have thousands of bases.
• Introns can be very long. An extreme example is the gene associated with the disease
cystic fibrosis in humans:
‐ It has 24 introns of total length ≈1M bases.
‐ The total length of its exons is ≈1k bases.
Exon – Definition
• Exons are segments within a gene that are transcribed into RNA and ultimately
expressed as parts of proteins.
• They contain the actual coding sequences for proteins, meaning that the sequence of
nucleotides within exons is translated into the sequence of amino acids that make up
proteins.
Exon – Function
• Protein Coding: Exons include sequences that directly code for amino acids, forming the
functional parts of proteins.
• RNA Splicing: During the process of RNA splicing, introns are removed from the precursor
mRNA (pre-mRNA), and exons are joined together. This splicing can be alternative, meaning
different combinations of exons can be joined in various ways to produce multiple unique
mRNA transcripts from a single gene, leading to the production of different proteins. This
significantly increases the diversity of proteins that can be produced by a single gene.
• Regulatory Sequences: Some exons also contain sequences that play roles in regulating gene
expression at the level of RNA processing and transport.
Intron – Definition
• Introns are non-coding sequences found within genes that are transcribed into RNA
but are removed from the pre-mRNA during RNA splicing.
• Introns thus do not encode protein sequences and are not represented in the final
mRNA that gets translated into protein.
Intron – Function
• RNA Splicing: The primary role of introns is in RNA splicing, where they are removed from the pre-mRNA.
This process is crucial for the correct assembly of the coding regions (exons) into a contiguous sequence that
can be translated into a protein.
• Gene Regulation: Introns can contain regulatory elements that influence gene expression. These elements
can affect how splicing occurs, which can alter the mRNA produced and thus the protein synthesized. This
regulation can be important for developmental processes, cellular differentiation, and adapting to
environmental changes.
• Alternative Splicing: Introns play a key role in alternative splicing, where the same pre-mRNA can be
spliced in different ways to produce different mRNA variants, leading to the production of different proteins
from the same gene. This increases protein diversity without requiring more genes.
• Evolutionary Role: Introns may contribute to genetic diversity and evolution. Their presence allows for
exon shuffling during genetic recombination events, potentially leading to new protein functions.
Transcription – Prokaryotes
• Synthesize a piece of RNA (messenger RNA, mRNA)
from one strand of the DNA gene.
1. An enzyme, RNA polymerase, temporarily
separates the double-stranded DNA.
2. It begins the transcription at the transcription start site.
3. The mRNA copies the coding strand base by base: A → A, C → C, G → G, and T → U.
4. Once the RNA polymerase reaches the transcription
stop site, transcription stops.
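The base mapping in step 3 can be written as a one-line function. This is a minimal sketch under the assumption that the input is the coding strand between the transcription start and stop sites; the name `transcribe` is hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand: A, C, G are kept and T becomes U."""
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGGCCTAA"))  # AUGGCCUAA
```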
Transcription – Eukaryotes
• Whereas a prokaryotic gene is completely transcribed into an
mRNA by the RNA polymerase, a eukaryotic gene is first transcribed into a pre-mRNA that is then processed:
1. RNA polymerase produces a pre-mRNA which
contains both introns and exons.
2. The 5′ cap and poly-A tail are added to the pre-mRNA.
3. The introns are removed and an mRNA is produced.
4. The final mRNA is transported out of the nucleus.
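Step 3 (intron removal) can be illustrated with a small sketch. It assumes the intron positions are already known and given as 0-based, half-open (start, end) intervals; the function names and the toy sequence are hypothetical, and the 5′ cap and poly-A tail of step 2 are not modeled.

```python
def follows_gu_ag(intron: str) -> bool:
    """Check the GT-AG rule on RNA, where the intron begins with GU and ends with AG."""
    return intron.startswith("GU") and intron.endswith("AG")


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove the given intron intervals and join the remaining exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # last exon
    return "".join(exons)


# Toy pre-mRNA: exon + intron + exon (made-up sequence for illustration).
pre_mrna = "AUGGCC" + "GUAAGUCCAG" + "UUUUAA"
print(follows_gu_ag(pre_mrna[6:16]))  # True
print(splice(pre_mrna, [(6, 16)]))    # AUGGCCUUUUAA
```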
Translation
• Translation is also called protein synthesis. It synthesizes a protein from an mRNA.
• The translation process is handled by a molecular complex known as a ribosome,
which consists of both proteins and ribosomal RNA (rRNA).
1. The ribosome reads the mRNA from 5′ to 3′. Translation starts at the start codon
(translation start site).
2. With the help of transfer RNA (tRNA), each codon is translated to an amino acid.
3. Translation stops once the ribosome reads a stop codon (translation stop site).
• tRNA: a class of RNA molecules that transport amino acids to ribosomes for
incorporation into a polypeptide undergoing synthesis.
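The three steps above can be sketched as a short program. This is a simplified illustration, not a model of the ribosome: the function name `translate` is hypothetical, the standard genetic code is built from a compact 64-letter string ordered by the bases U, C, A, G, and translation is assumed to start at the first AUG found.

```python
from itertools import product

# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA (5' to 3') from the first AUG up to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is translated
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: translation ends
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCCUGGUAAGG"))  # MAW
```

In the example, the codons read are AUG (M), GCC (A), UGG (W), and UAA (stop).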
Genetic Code (1/2)
• Each amino acid is encoded by a sequence of 3 consecutive nucleotides, called a codon.
• The decoding table from codons to amino acids is called the genetic code.
• Note:
‐ There are 4^3 = 64 different codons. Thus, the codons are not in one-to-one correspondence with the 20 amino
acids.
‐ Nearly all organisms use the same decoding table!
‐ The codons that encode the same amino acid tend to have the same first and second nucleotides.
‐ Recall that amino acids can be classified into 4 groups. A single base change in a codon is usually not
sufficient to cause the codon to code for an amino acid in a different group.
Genetic Code (2/2)
• Start codon: ATG (also codes
for M)
• Stop codons: TAA, TAG, TGA
Codon Usage
• All but 2 amino acids (W and M) are coded by more than one codon.
• S is coded by 6 different codons.
• Different organisms often prefer one particular codon to encode a particular amino
acid.
• In S. pombe, C. elegans, D. melanogaster, and many unicellular organisms, highly
expressed genes, like those encoding ribosomal proteins, use codons whose tRNAs are
the most abundant in the organism.
• Lowly expressed genes use codons whose tRNAs are the least abundant.
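The first two statements can be checked by counting codons per amino acid in the standard genetic code. The sketch below is illustrative; the table is built from a compact 64-letter string ordered by the bases T, C, A, G, and the variable names are assumptions.

```python
from collections import Counter
from itertools import product

# Standard genetic code on DNA codons, ordered TTT, TTC, TTA, TTG, TCT, ... ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons that encode each amino acid (stop codons excluded).
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

print(len(CODON_TABLE))                                                  # 64
print(sorted(aa for aa, n in codons_per_amino_acid.items() if n == 1))  # ['M', 'W']
print(codons_per_amino_acid["S"])                                        # 6
```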
More on Gene Structure
• A gene consists of three regions: the 5′ untranslated region, the coding region, and the
3′ untranslated region.
• The coding region contains the codons for the protein. It is also called the open reading frame. Its
length is a multiple of 3. It must begin with a start codon and end with a stop codon, and none of
its other codons may be a stop codon.
• The mRNA transcript contains the 5′ untranslated region + coding region + 3′ untranslated region.
• The regulatory region contains the promoter, which regulates the transcription process.
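The conditions on the coding region translate directly into a check. This is a minimal sketch assuming a DNA string over A, C, G, T and the start and stop codons listed earlier (ATG; TAA, TAG, TGA); the function name `is_open_reading_frame` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_open_reading_frame(dna: str) -> bool:
    """Check that `dna` satisfies the open reading frame conditions."""
    # The length must be a multiple of 3 and hold at least a start and a stop codon.
    if len(dna) < 6 or len(dna) % 3 != 0:
        return False
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return (
        codons[0] == "ATG"                                       # begins with the start codon
        and codons[-1] in STOP_CODONS                            # ends with a stop codon
        and not any(c in STOP_CODONS for c in codons[1:-1])      # no stop codon in between
    )


print(is_open_reading_frame("ATGGCCTAA"))     # True
print(is_open_reading_frame("ATGTAAGCCTAA"))  # False
```

The second example fails because its second codon is already a stop codon.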
More on tRNA
tRNA Structure
• tRNA = transfer RNA
• There are 61 different tRNAs, each corresponding
to a non-termination codon.
• Each tRNA folds to form a cloverleaf-shaped
structure.
• One side holds an anticodon.
• The other side holds the appropriate amino acid.
Ref. https://www.biorender.com/template/transfer-rna-trna-structure
Gene structure – Eukaryotes
• The length of the yellow part (in the figure) must be a multiple of 3.
Post-Translation Modification (PTM)
• Post-translation modification (PTM) is the chemical modification of a protein after its
translation.
• It involves
‐ Addition of functional groups
E.g. acylation, methylation, phosphorylation
‐ Addition of other peptides
E.g. ubiquitination, the covalent linkage to the protein ubiquitin.
‐ Structural changes
E.g. disulfide bridges, the covalent linkage of two cysteine amino acids.
Examples of PTM (Kinases and Phosphatases)
• Phosphorylation is the process of adding a phosphate (PO4) group to a protein.
• Kinases phosphorylate a protein, and phosphatases dephosphorylate it.
• This process changes the conformation of proteins and causes them to become
activated or deactivated.
• For example, phosphorylation of p53 (tumor suppressor protein) causes apoptotic cell
death.
• Phosphorylation is used to dynamically turn on or off many signaling pathways.
Examples of PTM (tRNA)
• Aminoacylation is the process of adding an aminoacyl group to a compound.
• A tRNA is aminoacylated by covalently linking its 3′-end CCA to an amino acid.
• This process is catalyzed by an enzyme known as an aminoacyl-tRNA synthetase.
Population Genetics
• The genomes of two individuals of the same species are not exactly the same.
• Given the genomes of two individuals of the same species, if there exists a position (called a
locus) where the nucleotides of the two individuals differ, we call it a single
nucleotide polymorphism (SNP).
• For humans, SNPs are expected to be responsible for over 80% of the variation between two
individuals.
• Understanding SNPs can help us to understand the differences within a population.
• For example, in humans, SNPs control the color of hair, the blood type, etc., of different
individuals. Also, many diseases like cancer are related to SNPs.
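The definition of an SNP suggests a simple comparison. The sketch below assumes the two sequences are already aligned and of equal length, which real genomes are not without an alignment step; the function name `snp_positions` and the toy sequences are hypothetical.

```python
def snp_positions(genome_a: str, genome_b: str) -> list[int]:
    """Return the 0-based loci where two aligned sequences have different nucleotides."""
    if len(genome_a) != len(genome_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(genome_a, genome_b)) if a != b]


print(snp_positions("ACGTACGT", "ACGAACGC"))  # [3, 7]
```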
Biotechnological Tools
Basic Biotechnological Tools
• Cutting and breaking DNA
‐ Restriction Enzymes
‐ Shotgun method
• Copying DNA
‐ Cloning
‐ Polymerase Chain Reaction (PCR)
• Measuring length of DNA
‐ Gel Electrophoresis
Restriction Enzymes
• A restriction enzyme recognizes a certain point in the DNA with a particular pattern, called a
restriction site, and breaks the DNA there. This process is called digestion.
• In nature, restriction enzymes are used to break foreign DNA to avoid infection.
• Example:
‐ EcoRI, one of the first restriction enzymes discovered, cuts DNA wherever the sequence
GAATTC is found.
‐ As with the recognition sites of most other restriction enzymes, GAATTC is a palindrome, that is,
GAATTC is its own reverse complement.
• Currently, more than 300 restriction enzymes have been discovered.
EcoRI
• EcoRI was one of the first restriction enzymes discovered.
• It cuts between G and A. Sticky ends are created.
• Note that some restriction enzymes give rise to blunt ends instead of sticky ends.
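Digestion by EcoRI can be sketched on a single strand. This is an illustrative simplification: it models only one strand written 5′ to 3′, so the sticky ends on the opposite strand are not shown, and the function names and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the reverse complement of a DNA strand written 5' to 3'."""
    return strand.translate(COMPLEMENT)[::-1]


def digest(dna: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut `dna` at every occurrence of `site`, `cut_offset` bases into the site.

    With the defaults, the cut falls between G and A of GAATTC.
    """
    fragments = []
    start = 0
    position = dna.find(site)
    while position != -1:
        fragments.append(dna[start:position + cut_offset])
        start = position + cut_offset
        position = dna.find(site, position + 1)
    fragments.append(dna[start:])
    return fragments


print(reverse_complement("GAATTC") == "GAATTC")  # True: the site is a palindrome
print(digest("AAGAATTCTTGAATTCGG"))              # ['AAG', 'AATTCTTG', 'AATTCGG']
```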
Shotgun method
• Break the DNA molecule into small pieces randomly.
• Method:
‐ Prepare a solution containing a large amount of purified DNA.
‐ By applying high vibration, each molecule is broken randomly into small fragments.
Cloning
• For many experiments, small amounts of DNA are not enough.
• Cloning is one way to replicate DNA.
Polymerase Chain Reaction (1/2)
• PCR was invented by Kary B. Mullis in 1984.
• PCR allows rapid replication of a selected region of a DNA without the need for a
living cell.
• Automated! Time required: a few hours.
• Inputs for PCR:
‐ Two oligonucleotides are synthesized, each complementary to one of the two ends of the region.
They are used as primers.
‐ Thermostable DNA polymerase (Taq polymerase).
Taq stands for the bacterium Thermus aquaticus, which grows in the hot springs of Yellowstone.
Polymerase Chain Reaction (2/2)
• PCR consists of repeating a cycle with three phases 25-30 times. Each cycle takes
about 5 minutes.
‐ Phase 1: Separate the double-stranded DNA by heat.
‐ Phase 2: Cool; add the synthesized primers.
‐ Phase 3: Add DNA polymerase (Taq polymerase) to catalyze 5′ to 3′ DNA synthesis.
• After these cycles, the selected region has been amplified exponentially.
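The exponential amplification can be made concrete with a short calculation. The sketch assumes an idealized reaction in which every cycle exactly doubles the number of copies, which real PCR only approximates; the function name `pcr_copies` is hypothetical, and the 5 minutes per cycle comes from the slide.

```python
def pcr_copies(initial_copies: int, cycles: int) -> int:
    """Number of copies after `cycles` rounds, assuming perfect doubling per cycle."""
    return initial_copies * 2 ** cycles


for cycles in (25, 30):
    minutes = cycles * 5  # about 5 minutes per cycle
    print(cycles, pcr_copies(1, cycles), minutes)
# 25 33554432 125
# 30 1073741824 150
```

Under this idealization, a single molecule yields tens of millions to about a billion copies in roughly two to two and a half hours, consistent with "a few hours".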
Example Applications of PCR
• The PCR method is used to amplify DNA segments to the point where they can be readily
isolated for use.
• Example applications:
‐ Cloning DNA fragments from mummies
‐ Detection of viral infections
Gel Electrophoresis
• Used in the DNA sequencing method developed by Frederick Sanger in 1977.
• A technique used to separate a mixture of DNA fragments of different lengths.
• We apply an electrical field to the mixture of DNA.
• Note that DNA is negatively charged. Due to friction, small molecules travel faster than
large molecules.
• The mixture is separated into bands, each containing DNA molecules of the same
length.
Applications
• Separating DNA sequences from a mixture
‐ For example, after a genome is digested by a restriction enzyme, hundreds or
thousands of DNA fragments are yielded.
‐ By gel electrophoresis, the fragments can be separated.
Sequencing by Gel electrophoresis
• An application of gel electrophoresis is to reconstruct a DNA sequence of length 500-800
within a few hours.
• Idea:
‐ Generate all prefixes of the sequence that end with A.
‐ Using gel electrophoresis, the fragments ending with A are separated into different bands.
This information tells us the positions of the A’s in the sequence.
‐ Do the same for C, G, and T.
Read the Sequence
• We have four groups of fragments: A, C, G, and T.
• All fragments are placed at the negative end.
• The fragments move toward the positive end.
• From the relative distances of the fragments, we can reconstruct the sequence.
Ref. Figure 1.17 in Algorithms in Bioinformatics: A Practical Introduction by Wing-Kin Sung
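Reading the sequence from the four groups can be sketched as follows. This is an idealized simulation, assuming every fragment length is measured exactly and each group (lane) holds the prefixes ending with one base; the function names `run_lanes` and `read_gel` are hypothetical.

```python
def run_lanes(sequence: str) -> dict[str, list[int]]:
    """For each base, list the lengths of the prefixes of `sequence` ending with that base."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(sequence, start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Reconstruct the sequence by reading the bands from shortest to longest fragment."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    return "".join(base_at_length[length] for length in sorted(base_at_length))


lanes = run_lanes("GATTACA")
print(lanes["A"])       # [2, 5, 7]
print(read_gel(lanes))  # GATTACA
```

Shorter fragments travel farther, so sorting by length corresponds to reading the gel from the positive end back toward the negative end.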

Hybridization
• Among thousands of DNA fragments, biologists routinely need to find a DNA
fragment which contains a particular DNA subsequence.
• This can be done based on hybridization.
1. Suppose we need to find a DNA fragment which contains ACCGAT.
2. Create probes which are the reverse complement of ACCGAT.
3. Mix the probes with the DNA fragments.
4. Due to the hybridization rule (A=T, C≡G), DNA fragments which contain ACCGAT will
hybridize with the probes.
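The four steps can be mimicked with string operations. This is a simplified sketch that treats hybridization as exact matching of the probe's reverse complement within a fragment; the function names and the example fragments are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the reverse complement of a DNA strand written 5' to 3'."""
    return strand.translate(COMPLEMENT)[::-1]


def hybridizing_fragments(fragments: list[str], target: str) -> tuple[str, list[str]]:
    """Build the probe for `target` and return it with the fragments it hybridizes to."""
    probe = reverse_complement(target)  # step 2: probe is the reverse complement
    # Steps 3-4: a fragment hybridizes when it contains the sequence
    # complementary to the probe, which is the target itself.
    binding_site = reverse_complement(probe)
    return probe, [fragment for fragment in fragments if binding_site in fragment]


probe, hits = hybridizing_fragments(["TTACCGATGG", "GGGTTTAAAC", "ACCGAT"], "ACCGAT")
print(probe)  # ATCGGT
print(hits)   # ['TTACCGATGG', 'ACCGAT']
```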
DNA Array
• The idea of hybridization leads to the DNA array technology.
• In the past, the approach was “one gene in one experiment”.
‐ It was hard to get the whole picture.
• DNA array is a technology which allows researchers to do experiments on a set of
genes or even the whole genome.
Idea of DNA Array
• An orderly arrangement of thousands of spots.
• Each spot contains many copies of the same DNA
fragment.
• When the array is exposed to the target solution, DNA
fragments in both the array and the target solution will match based
on the hybridization rule:
‐ A=T, C≡G (hydrogen bonds)
• This idea allows us to do thousands of hybridization
experiments at the same time.
Ref. Figure 1.18 in Algorithms in Bioinformatics: A Practical Introduction by Wing-Kin Sung
Applications of DNA arrays
• Sequencing by hybridization
‐ A promising alternative to sequencing by gel electrophoresis
‐ It may be able to reconstruct longer DNA sequences in shorter time
• Expression profile of a cell
‐ DNA arrays allow us to monitor the activities within a cell
‐ Each spot contains the complement of a particular gene
‐ Due to hybridization, we can measure the concentration of different mRNAs within a cell
• SNP detection
‐ Using probes with different alleles to detect the single nucleotide variation.
Brief History of Bioinformatics
Ref. https://link.springer.com/chapter/10.1007/978-3-031-22206-1_4
Ref. https://en.wikipedia.org/wiki/Computational_biology
Q & A
Thank you!

Ref. Figure 1.6 in Algorithms in Bioinformatics: A Practical Introduction by Wing-Kin Sung
Ref. Figure 1.7 in Algorithms in Bioinformatics: A Practical Introduction by Wing-Kin Sung
Ref. Figure 1.12 in Algorithms in Bioinformatics: A Practical Introduction by Wing-Kin Sung
Ref. Figure 1.13 in Algorithms in Bioinformatics: A Practical Introduction by Wing-Kin Sung
Ref. Figure 1.15 in Algorithms in Bioinformatics: A Practical Introduction by Wing-Kin Sung