CE6068 Lecture 1
DNA, RNA, and Protein
Organization of the Body
• The organization of the body progresses from
cells to tissues, then to organs, followed by organ
systems, ultimately forming a fully functioning
organism.
• A cell performs two types of functions:
‐ Perform the chemical reactions necessary to maintain life
‐ Pass the information for maintaining life to the next generation
Ref. https://www.exploringnature.org/db/view/Levels-of-Organization-in-the-Body-Cells-to-Organisms-Color
‣ Cells store the genetic information in the form of double-stranded DNA.
‣ Portions of the DNA called genes are transcribed into closely related molecules called RNAs.
‣ Protein performs chemical reactions
‣ DNA stores and passes information
‣ RNA is the intermediate between DNA and proteins
Protein
• Proteins constitute most of a cell’s dry mass. They are not only the
building blocks from which cells are built, but also execute nearly
all cell functions.
• A protein is a sequence over an alphabet of 20 amino acids.
‐ Its length ranges from 20 to more than 5000 amino acids.
‐ On average, a protein contains around 350 amino acids.
• A protein folds into a three-dimensional shape. Folded proteins form the building blocks of a cell and perform most of the chemical reactions within it.
Ref. https://en.wikipedia.org/wiki/Protein
‣ The folding of a protein is caused by the weak interactions among its amino acid residues.
‣ The weak interactions include hydrogen bonds, ionic bonds, van der Waals attractions, and hydrophobic interactions.
Amino Acids
• Amino acids are the building blocks of proteins.
• Each amino acid consists of a central carbon (Cα) bonded to:
‣ Amino group (-NH2)
‣ Carboxyl group (-COOH)
‣ R group (side chain), which determines the type of the amino acid
Ref. https://www.reagent.co.uk/blog/what-are-amino-acids/
20 Common Amino Acids
• There are 20 common amino
acids, characterized by
different R groups.
• These 20 amino acids can be
classified according to their
mass, volume, acidity, polarity,
and hydrophobicity.
Ref. https://www.reagent.co.uk/blog/what-are-amino-acids/
Summary of Amino Acid Properties
Protein Structure
• Protein structure is described at 4 levels: primary, secondary, tertiary, and quaternary.
• Primary structure is the amino acid sequence.
• Secondary structure consists of "local" structures such as α-helices, β-sheets, and turns, formed by hydrogen bonding between backbone atoms.
• Tertiary structure is the three-dimensional structure of the protein, formed through the interaction of the secondary structures due to the hydrophobic effect.
• Quaternary structure is formed by the packing of different proteins into a protein complex.
Ref. https://www.khanacademy.org/science/biology/macromolecules/proteins-and-amino-acids/a/orders-of-protein-structure
DNA
• Deoxyribonucleic acid (DNA) is the genetic
material in all organisms (with certain viruses being
an exception) and it stores the instructions needed
by the cell to perform daily life functions.
• DNA consists of two strands which are interwoven
together to form a double helix.
• Each strand is a chain of small molecules called
nucleotides.
Ref. https://www.priyamstudycentre.com/2023/08/deoxyribonucleic-acid-dna.html
Nucleotides (1/2)
Nucleotides (2/2)
• Each nucleotide can have one, two, or three phosphates.
• Monophosphate nucleotides have only 1 phosphate and are the building blocks of DNA.
• Diphosphate nucleotides and triphosphate nucleotides have 2 and 3 phosphate groups, respectively.
Figure: The conversion of adenosine triphosphate (ATP) to adenosine diphosphate (ADP)
Ref. Figure 1.5 in Algorithms in Bioinformatics: A Practical Introduction by Wing-Kin Sung
Watson-Crick Base Pairing
• Complementary bases:
‐ A with T (two hydrogen-bonds)
‐ C with G (three hydrogen-bonds)
• The distance between the two strands is about 10 Å.
• Held together by these weak interactions, the two strands form a double helix.
Ref. https://www.mun.ca/biology/scarr/Watson-Crick_Model.html
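The pairing rule above can be turned into a small worked example. The sketch below counts the hydrogen bonds that hold a duplex together, using only the bond counts listed on the slide (2 for A with T, 3 for C with G). The function name `count_hydrogen_bonds` is a hypothetical helper chosen for illustration.

```python
# Hydrogen bonds contributed by each base when paired with its
# Watson-Crick partner: A-T pairs have 2, C-G pairs have 3.
H_BONDS = {"A": 2, "T": 2, "C": 3, "G": 3}


def count_hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds between `strand` and its complementary strand."""
    return sum(H_BONDS[base] for base in strand.upper())


print(count_hydrogen_bonds("ACGTA"))  # 2 + 3 + 3 + 2 + 2 = 12
```

A strand rich in C and G therefore has more hydrogen bonds per base pair than one rich in A and T.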
Reasons behind the Complementary Bases
• Purines (A or G) cannot pair up because they are too big.
• Pyrimidines (C or T) cannot pair up because they are too small.
• G and T (or A and C) cannot pair up because they are chemically incompatible.
Orientation of a DNA
• One strand of DNA is generated by chaining together nucleotides.
• It forms a phosphate-sugar backbone.
• It has a direction: from 5′ to 3′ (because DNA always extends from the 3′ end).
• By convention, a strand is written from its 5′ end (upstream) to its 3′ end (downstream), e.g. 5′-ACGTA-3′.
Double Stranded DNA
• Normally, DNA is double stranded within a cell.
The two strands are antiparallel. One strand is
the reverse complement of another one.
• The double strands are interwoven together and
form a double helix.
• One reason for the double-stranded form is that it eases DNA replication.
Ref. https://www.khanacademy.org/science/ap-biology/gene-expression-and-regulation/replication/a/hs-dna-structure-and-replication-review
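Because the two strands are antiparallel and complementary, either strand can be computed from the other. A minimal sketch, assuming both strands are written 5′ to 3′ over the alphabet A, C, G, T; `reverse_complement` is an illustrative name, not something defined in the lecture.

```python
# Watson-Crick partner of each base.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the opposite strand, read in its own 5' to 3' direction."""
    # Reverse first (the strands are antiparallel), then complement each base.
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


print(reverse_complement("ACGTA"))  # TACGT
```

Applying the function twice returns the original strand, which matches the statement that each strand is the reverse complement of the other.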
Circular form of DNA
• DNA usually exists in linear form.
‐ E.g. the DNA of humans and yeast is linear.
• In some simple organisms, DNA exists in circular form.
‐ E.g. the DNA of E. coli is circular.
Location of DNA in a Cell
• There are two types of organisms: prokaryotes and eukaryotes.
• Prokaryotes are single-celled organisms with no nuclei (e.g. bacteria).
‐ DNA floats freely within the cell
• Eukaryotes are organisms with single or multiple cells.
Ref. https://www.ancestry.com/c/dna-learning-hub/cells
Nucleotide for RNA
• A nucleotide consists of three parts:
‐ Ribose sugar (has an extra OH group at the 2′ carbon)
‐ Phosphate (bound to the 5′ carbon)
‐ Base (bound to the 1′ carbon)
Functions of RNA
• RNA has the properties of both DNA and proteins.
‐ First, similar to DNA, it can store and transfer information.
‐ Second, similar to protein, it can form complex 3D structures and perform functions.
• For the storage of information, RNA is not as stable as DNA, which is why DNA is still needed.
• Proteins can perform more functions than RNA can, which is why proteins are still needed.
Different Types of RNA
• There are two types of RNAs: messenger RNAs (mRNAs) and non-coding RNAs (ncRNAs).
• Messenger RNAs carry the encoded information required to make proteins of all types.
• Non-coding RNAs include ribosomal RNAs (rRNAs), transfer RNAs (tRNAs), short ncRNAs,
and long ncRNAs.
‐ rRNAs form parts of ribosomes that help to translate mRNAs into proteins.
‐ tRNAs serve as a molecular dictionary that translates the nucleic acid code into the amino acid sequences
of proteins.
‐ Short ncRNAs include 4 different types: snoRNAs, microRNAs, siRNAs, piRNAs. They are responsible
for the regulation of the process for generating proteins from genes.
microRNA (miRNA)
• miRNA is a single-stranded RNA of length ~22 nt.
• miRNA is encoded as a non-coding RNA.
• It is first transcribed as a primary transcript called primary miRNA (pri-miRNA).
• It is then cleaved into a precursor miRNA (pre-miRNA) with the help of the nuclease Drosha. The precursor miRNA is ~60-80 nt long and can potentially fold into a stem-loop structure.
• The pre-miRNA is transported into the cytoplasm by Exportin 5, where it is further cleaved into a mature miRNA by the endonuclease Dicer.
Ref. https://commons.wikimedia.org/wiki/File:MiRNA_processing.jpg
Genome, Chromosome, and Gene
Genome
• The complete set of DNA in an organism is referred to collectively as a “genome”.
• Genomes of different organisms vary widely in size:
‐ the smallest known genome for a free-living organism (a bacterium called Mycoplasma genitalium) contains about 600,000 DNA base pairs
‐ the largest known genome, that of Amoeba dubia, contains 670 billion base pairs
• Human and mouse genomes have about 3 billion DNA base pairs.
‐ Except for mature red blood cells, sperm cells, and egg cells, all human cells contain the same complete genome, though some may have small differences due to mutations
‐ Sperm and egg cells each contain only half of the genome, from the father and the mother, respectively. During fertilization, the sperm cell fuses with the egg cell and forms a new cell containing the complete genome.
Chromosome
• A genome is not one consecutive double-stranded DNA chain.
• Usually, DNA is tightly wound around histone proteins and forms a chromosome.
• The total information stored in all chromosomes constitutes a genome.
• In most multicellular organisms, every cell contains the same complete genome, possibly with small differences due to mutation.
• Example: the human genome has about 3 billion (3G) base pairs, organized in 23 pairs of chromosomes.
Gene
• A gene is a sequence of DNA that encodes a protein or an RNA molecule.
• The human genome was initially expected to contain 30,000–35,000 genes; more recent estimates are around 20,000 protein-coding genes.
• For a gene that encodes a protein:
‐ In a prokaryotic genome, one gene corresponds to one protein
‐ In a eukaryotic genome, one gene can correspond to more than one protein because of the process called “alternative splicing”
Complexity of the Organism versus Genome Size
Number of Genes versus Genome Size
Mutation of DNA
• Despite replication being a near-perfect process, infrequent mistakes called
“mutations” are still possible.
• Mutations are generally caused by changes in the way that nucleotides form hydrogen bonds.
• Replication then hybridizes wrong nucleotides with the template DNA, and mutations occur.
• One possible way to modify nucleotides is through a tautomeric shift.
Ref. https://www.slideshare.net/Varshini3/spontaneous-mutation-232631980
• A tautomeric shift is a rare event that changes the chemical property of the bases and affects the way the bases form hydrogen bonds.
• For example, a tautomeric shift may enable adenine to base pair with cytosine. When a tautomeric shift occurs during DNA replication, a wrong nucleotide will be inserted and a mutation occurs.
Types of Mutation
• A mutation is a sudden change in the genome.
• It is the basis of evolution.
• It is also the cause of cancer.
• Note: mutations can occur in DNA, RNA, and protein.
Ref. https://microbenotes.com/types-of-mutations/
Central Dogma
• The central dogma was first enunciated by Francis Crick in 1958 and restated in a Nature paper published in 1970.
• The central dogma describes the process of transferring information from DNA to RNA to protein.
• The expression of a gene consists of two steps:
‐ Transcription: DNA → mRNA
‐ Translation: mRNA → protein
Ref. Figure 1.11 in Algorithms in Bioinformatics: A Practical Introduction by Wing-Kin Sung
Introns and Exons
• Eukaryotic genes contain introns and exons.
‐ Introns are sequences that will ultimately be spliced out of the mRNA.
‐ Introns normally satisfy the GT-AG rule, that is, an intron begins with GT and ends with AG.
‐ Each gene can have many introns, and each intron may have thousands of bases.
• Introns can be very long. An extreme example is the gene associated with the disease cystic fibrosis in humans:
‐ It has 24 introns of total length ≈ 1M bases
‐ The total length of its exons is ≈ 1k bases
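The GT-AG rule and the splicing step can be illustrated with a short sketch. It assumes intron positions are already known and given as 0-based, half-open `(start, end)` intervals on a sequence written in the DNA alphabet; the helper names and the toy sequence are hypothetical, chosen only for illustration.

```python
def follows_gt_ag_rule(intron: str) -> bool:
    """True if the intron begins with GT and ends with AG."""
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove the given intron intervals and join the remaining exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[position:start])  # exon before this intron
        position = end                          # skip over the intron
    exons.append(pre_mrna[position:])           # last exon
    return "".join(exons)


# Toy transcript: exon "ATGGCC", intron "GTAAGTTTCAG", exon "GAATAA".
pre_mrna = "ATGGCC" + "GTAAGTTTCAG" + "GAATAA"
print(follows_gt_ag_rule(pre_mrna[6:17]))  # True
print(splice(pre_mrna, [(6, 17)]))         # ATGGCCGAATAA
```

Real splicing relies on more signals than the two dinucleotides, so the rule check here is only a necessary condition, not a way to find introns.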
Exon – Definition
• Exons are segments within a gene that are transcribed into RNA and ultimately
expressed as parts of proteins.
• They contain the actual coding sequences for proteins, meaning that the sequence of
nucleotides within exons is translated into the sequence of amino acids that make up
proteins.
Exon – Function
• Protein Coding: Exons include sequences that directly code for amino acids, forming the
functional parts of proteins.
• RNA Splicing: During the process of RNA splicing, introns are removed from the precursor
mRNA (pre-mRNA), and exons are joined together. This splicing can be alternative, meaning
different combinations of exons can be joined in various ways to produce multiple unique
mRNA transcripts from a single gene, leading to the production of different proteins. This
significantly increases the diversity of proteins that can be produced by a single gene.
• Regulatory Sequences: Some exons also contain sequences that play roles in regulating gene
expression at the level of RNA processing and transport.
Intron – Definition
• Introns are non-coding sequences found within genes that are transcribed into RNA
but are removed from the pre-mRNA during RNA splicing.
• Introns thus do not encode protein sequences and are not represented in the final
mRNA that gets translated into protein.
Intron – Function
• RNA Splicing: The primary role of introns is in RNA splicing, where they are removed from the pre-mRNA.
This process is crucial for the correct assembly of the coding regions (exons) into a contiguous sequence that
can be translated into a protein.
• Gene Regulation: Introns can contain regulatory elements that influence gene expression. These elements
can affect how splicing occurs, which can alter the mRNA produced and thus the protein synthesized. This
regulation can be important for developmental processes, cellular differentiation, and adapting to
environmental changes.
• Alternative Splicing: Introns play a key role in alternative splicing, where the same pre-mRNA can be
spliced in different ways to produce different mRNA variants, leading to the production of different proteins
from the same gene. This increases protein diversity without requiring more genes.
• Evolutionary Role: Introns may contribute to genetic diversity and evolution. Their presence allows for
exon shuffling during genetic recombination events, potentially leading to new protein functions.
Transcription – Prokaryotes
• Synthesize a piece of RNA (messenger RNA, mRNA) from one strand of the DNA gene.
1. The enzyme RNA polymerase temporarily separates the double-stranded DNA.
2. It begins transcription at the transcription start site.
3. Bases are copied as A → A, C → C, G → G, and T → U.
4. Once the RNA polymerase reaches the transcription stop site, transcription stops.
Ref. Figure 1.12 in Algorithms in Bioinformatics: A Practical Introduction by Wing-Kin Sung
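Step 3 above amounts to a simple string operation when the gene is given as its coding strand (the strand whose sequence matches the mRNA). A minimal sketch under that assumption; `transcribe` is an illustrative name, and the start and stop sites are taken to be the ends of the input string.

```python
def transcribe(coding_strand: str) -> str:
    """Copy the coding strand into mRNA: A, C, G stay the same, T becomes U."""
    return coding_strand.upper().replace("T", "U")


print(transcribe("ATGGCCTAA"))  # AUGGCCUAA
```

If only the template strand is available, its reverse complement gives the coding strand first.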
Transcription – Eukaryotes
• A prokaryotic gene is completely transcribed into an mRNA by the RNA polymerase. A eukaryotic gene needs additional processing:
1. RNA polymerase produces a pre-mRNA which contains both introns and exons.
2. The 5′ cap and poly-A tail are added to the pre-mRNA.
3. The introns are removed and an mRNA is produced.
4. The final mRNA is transported out of the nucleus.
Genetic Code (2/2)
• Start codon: ATG (also codes for M)
• Stop codons: TAA, TAG, TGA
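The start and stop codons above are enough to sketch translation. The example below builds the standard genetic code as a lookup table (written with DNA letters, as on the slide) and translates from the first ATG to the first stop codon. The function name `translate` and the compact table encoding are illustrative choices, not part of the lecture.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate from the first ATG up to (not including) the first stop codon."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("CCATGGCCTGGTAAGG"))  # MAW
```

In the example, the codons read are ATG (M), GCC (A), TGG (W), and TAA (stop).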
Codon Usage
• All but 2 amino acids (W and M) are coded by more than one codon.
• S is coded by 6 different codons.
• Different organisms often prefer one particular codon to encode a particular amino acid.
• In S. pombe, C. elegans, D. melanogaster, and many unicellular organisms, highly expressed genes like those encoding ribosomal proteins use codons whose tRNAs are the most abundant in the organism.
• Lowly expressed genes use codons whose tRNAs are the least abundant.
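The first two statements on this slide can be checked directly against the standard genetic code. The sketch below counts how many codons encode each amino acid; the table encoding (codons in TCAG order, `*` for stop) is an illustrative choice rather than something from the lecture.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of codons per amino acid, ignoring the three stop codons.
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

single_codon = sorted(aa for aa, n in codons_per_amino_acid.items() if n == 1)
print(single_codon)                # ['M', 'W']
print(codons_per_amino_acid["S"])  # 6
```

The same `Counter` approach, applied to the codons of an actual gene instead of the table, would give that gene's codon usage.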
More on Gene Structure
• A gene consists of three regions: the 5′ untranslated region, the coding region, and the
3′ untranslated region.
• The coding region contains the codons for the protein. It is also called the open reading frame. Its length is a multiple of 3. It must begin with a start codon and end with a stop codon, and none of its other codons may be a stop codon.
• The mRNA transcript contains the 5′ untranslated region + coding region + 3′ untranslated region.
• The regulatory region contains the promoter, which regulates the transcription process.
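The conditions on the coding region translate directly into a checker. A minimal sketch, assuming the candidate region is given as a DNA string; `is_open_reading_frame` is a hypothetical helper name.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_open_reading_frame(seq: str) -> bool:
    """Check the coding-region conditions: length divisible by 3, starts with
    ATG, ends with a stop codon, and has no stop codon in between."""
    seq = seq.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return (
        codons[0] == "ATG"
        and codons[-1] in STOP_CODONS
        and not any(codon in STOP_CODONS for codon in codons[1:-1])
    )


print(is_open_reading_frame("ATGGCCTAA"))     # True
print(is_open_reading_frame("ATGTAAGCCTAA"))  # False: internal stop codon
```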
Ref. Figure 1.14 in Algorithms in Bioinformatics: A Practical Introduction by Wing-Kin Sung
More on tRNA
tRNA Structure
• tRNA = transfer RNA
• There are 61 different tRNAs, each corresponding to a non-termination codon.
• Each tRNA folds to form a cloverleaf-shaped
structure.
• One side holds an anticodon.
• The other side holds the appropriate amino acid.
Ref. https://www.biorender.com/template/transfer-rna-trna-structure
Gene Structure – Eukaryotes
Examples of PTM (tRNA)
• Aminoacylation is the process of adding an aminoacyl group to a molecule.
• A tRNA is aminoacylated by covalently linking the CCA at its 3′ end to an amino acid.
• This reaction is catalyzed by an enzyme known as an aminoacyl-tRNA synthetase.
Population Genetics
• The genomes of two individuals of the same species are not exactly the same.
• Given the genomes of two individuals of the same species, if there exists a position (called a locus) where the nucleotides of the two individuals differ, we call it a single nucleotide polymorphism (SNP).
• For humans, SNPs are expected to be responsible for over 80% of the variation between two individuals.
• Understanding SNPs can help us understand the differences within a population.
• For example, in humans, SNPs control the color of hair, the blood type, etc., of different individuals. Also, many diseases like cancer are related to SNPs.
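The definition above can be sketched as a position-by-position comparison. The example assumes the two genomes are already aligned and have the same length, so insertions and deletions are out of scope; `snp_positions` is an illustrative name.

```python
def snp_positions(genome_a: str, genome_b: str) -> list[int]:
    """Return the 0-based loci where two aligned genomes differ."""
    if len(genome_a) != len(genome_b):
        raise ValueError("genomes must be aligned to the same length")
    return [
        locus
        for locus, (a, b) in enumerate(zip(genome_a, genome_b))
        if a != b
    ]


print(snp_positions("ACGTACGT", "ACCTACGA"))  # [2, 7]
```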
Biotechnological Tools
Basic Biotechnological Tools
• Cutting and breaking DNA
‐ Restriction Enzymes
‐ Shotgun method
• Copying DNA
‐ Cloning
‐ Polymerase Chain Reaction (PCR)
• Measuring length of DNA
‐ Gel Electrophoresis
Restriction Enzymes
• A restriction enzyme recognizes certain points in the DNA, called restriction sites, that have a particular pattern, and breaks the DNA there. This process is called digestion.
• In nature, restriction enzymes are used to break foreign DNA to avoid infection.
• Example:
‐ EcoRI, one of the first restriction enzymes discovered, cuts DNA wherever the sequence GAATTC is found.
‐ As with the sites of most other restriction enzymes, GAATTC is a palindrome, that is, GAATTC is its own reverse complement.
• Currently, more than 300 restriction enzymes are known.
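Digestion and the palindrome property can both be sketched in a few lines. The example models only one strand and uses the fact that EcoRI cuts between the G and the AATTC of its site; the function names and the test sequence are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts after the first base: G^AATTC


def is_palindromic_site(site: str) -> bool:
    """True if the site equals its own reverse complement."""
    return site == "".join(COMPLEMENT[base] for base in reversed(site))


def digest(dna: str, site: str = ECORI_SITE, cut_offset: int = ECORI_CUT_OFFSET) -> list[str]:
    """Cut one strand at every occurrence of the restriction site."""
    dna = dna.upper()
    fragments = []
    start = 0
    position = dna.find(site)
    while position != -1:
        cut = position + cut_offset
        fragments.append(dna[start:cut])
        start = cut
        position = dna.find(site, position + 1)
    fragments.append(dna[start:])  # remainder after the last cut
    return fragments


print(is_palindromic_site(ECORI_SITE))  # True
print(digest("AAGAATTCCCGAATTCTT"))     # ['AAG', 'AATTCCCG', 'AATTCTT']
```

Because the site is a palindrome, the opposite strand carries the same site at the same location and is cut in the same way.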
EcoRI
• EcoRI is one of the first restriction enzymes discovered.
Shotgun method
• Break the DNA molecule into small pieces randomly
• Method:
‐ Start with a solution containing a large amount of purified DNA.
‐ By applying strong vibration, each molecule is broken randomly into small fragments.
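Random breaking can be simulated as follows. This is only a sketch of the idea for a single molecule: the fragment length distribution (uniform between 1 and `2 * mean_length - 1`) and the parameter names are assumptions made for illustration, not properties of the laboratory method.

```python
import random


def shotgun(dna: str, mean_length: int = 5, seed: int = 0) -> list[str]:
    """Break one DNA molecule into consecutive fragments of random length."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    fragments = []
    start = 0
    while start < len(dna):
        length = rng.randint(1, 2 * mean_length - 1)
        fragments.append(dna[start:start + length])
        start += length
    return fragments


fragments = shotgun("ACGTTGCAAGGCTTAACCGT")
print(fragments)
print("".join(fragments) == "ACGTTGCAAGGCTTAACCGT")  # True
```

Running the function with different seeds mimics many copies of the same molecule breaking at different positions, which gives overlapping fragments.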
Cloning
• For many experiments, a small amount of DNA is not enough.
• Cloning is one way to replicate DNA.
Polymerase Chain Reaction (1/2)
• PCR was invented by Kary B. Mullis in 1984.
• PCR allows rapid replication of a selected region of a DNA without the need for a living cell.
• Automated! Time required: a few hours.
• Inputs for PCR:
‐ Two oligonucleotides are synthesized, each complementary to one of the two ends of the region. They are used as primers.
‐ The thermostable DNA polymerase Taq. Taq stands for Thermus aquaticus, the bacterium that grows in the Yellowstone hot springs.
Polymerase Chain Reaction (2/2)
• PCR consists of repeating a cycle with three phases 25-30 times. Each cycle takes about 5 minutes.
Phase 1: Separate the double-stranded DNA by heat
Phase 2: Cool; add the synthesized primers
Phase 3: Add the DNA polymerase Taq to catalyze 5′ to 3′ DNA synthesis
• Then, the selected region has been amplified exponentially.
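The exponential growth can be made concrete with a little arithmetic. The sketch below assumes ideal doubling in every cycle (real reactions are less efficient) and uses the 5 minutes per cycle stated above; both function names are illustrative.

```python
MINUTES_PER_CYCLE = 5  # from the slide: each cycle takes about 5 minutes


def copies_after_cycles(initial_copies: int, cycles: int) -> int:
    """Copies of the selected region after ideal PCR, doubling every cycle."""
    return initial_copies * 2 ** cycles


def pcr_minutes(cycles: int) -> int:
    """Approximate running time for the given number of cycles."""
    return cycles * MINUTES_PER_CYCLE


print(copies_after_cycles(1, 30))  # 1073741824
print(pcr_minutes(30))             # 150
```

So 30 cycles turn a single copy into about a billion copies in roughly two and a half hours, consistent with "a few hours".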
Ref. Figure 1.16 in Algorithms in Bioinformatics: A Practical Introduction by Wing-Kin Sung
Example Applications of PCR
• PCR is used to amplify DNA segments to the point where they can be readily isolated for use.
• Example applications:
‐ Cloning DNA fragments from mummies
‐ Detecting viral infections
Gel Electrophoresis
• Used in the DNA sequencing method developed by Frederick Sanger in 1977.
• A technique used to separate a mixture of DNA fragments of different lengths.
• We apply an electrical field to the mixture of DNA.
• Note that DNA is negatively charged. Due to friction, small molecules travel faster than large molecules.
• The mixture is separated into bands, each containing DNA molecules of the same length.
Applications
• Separating DNA sequences from a mixture
‐ For example, after a genome is digested by a restriction enzyme, hundreds or thousands of DNA fragments are produced.
‐ By gel electrophoresis, the fragments can be separated.
Sequencing by Gel Electrophoresis
• An application of gel electrophoresis is to reconstruct a DNA sequence of length 500-800 within a few hours.
• Idea:
‐ Generate all prefixes of the sequence that end with A.
‐ Using gel electrophoresis, the fragments that end with A are separated into different bands. This information tells us the positions of the A’s in the sequence.
‐ Proceed similarly for C, G, and T.
Read the Sequence
• We have four groups of fragments: A, C, G, and T.
• All fragments are placed at the negative end.
• The fragments move toward the positive end.
• From the relative distances of the fragments, we can reconstruct the sequence.
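The reading step can be sketched as a small simulation. `run_gel` plays the role of the experiment: for each base it records the lengths of the prefixes ending with that base, which is what the bands in each lane reveal. `read_sequence` then rebuilds the sequence by visiting the bands from shortest to longest. Both names are hypothetical, and the model ignores all experimental noise.

```python
def run_gel(sequence: str) -> dict[str, list[int]]:
    """For each base, list the lengths of the prefixes that end with it."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(sequence.upper(), start=1):
        lanes[base].append(length)
    return lanes


def read_sequence(lanes: dict[str, list[int]]) -> str:
    """Rebuild the sequence from the band positions of the four lanes."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    # Shorter fragments travel farther, so reading by increasing length
    # gives the bases in order from the start of the sequence.
    return "".join(base_at_length[length] for length in sorted(base_at_length))


lanes = run_gel("ACGTA")
print(lanes["A"])            # [1, 5]
print(read_sequence(lanes))  # ACGTA
```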
Idea of DNA Array
• An orderly arrangement of thousands of spots.
• Each spot contains many copies of the same DNA
fragment.
• When the array is exposed to the target solution, DNA fragments in the array and in the target solution pair up according to the hybridization rule:
‐ A=T, C≡G (hydrogen bonds)
Ref. Figure 1.18 in Algorithms in Bioinformatics: A Practical Introduction by Wing-Kin Sung
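The hybridization rule can be illustrated with a toy array. In this sketch a spot is reported when the target fragment contains the reverse complement of the spot's probe, which is a simplification that requires a perfect match. The spot names, probe sequences, and the function `hybridizing_spots` are all made up for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the strand that pairs with `strand`, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def hybridizing_spots(spots: dict[str, str], target: str) -> list[str]:
    """Names of the spots whose probe can pair with part of the target."""
    return [
        name
        for name, probe in spots.items()
        if reverse_complement(probe) in target
    ]


# Hypothetical array with two spots.
spots = {"spot1": "AAGC", "spot2": "TTTT"}
print(hybridizing_spots(spots, "CCGCTTCC"))  # ['spot1']
```

Here the probe AAGC pairs with the stretch GCTT of the target, while the target has no AAAA stretch for the probe TTTT.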
Q&A
Thank you!