Anatomy of A Gene

Mrs.
OFELIA SOLANO SALUDAR

Department of Natural Sciences
University of St. La Salle
Bacolod City
BASIC GENETIC MECHANISMS
How did we know that genes are made of DNA?
Streptococcus pneumoniae comes in 2 forms that differ from one another in their microscopic
appearance and in their ability to cause disease. Cells of the pathogenic strain, which are lethal when
injected into mice, are encased in a slimy, glistening polysaccharide capsule, designated the S form. The
harmless strain lacks this protective coat; it forms colonies that appear flat and rough, referred to as
the R form. Fred Griffith found in the 1920s that a substance present in the virulent S strain could
permanently change, or transform, the nonlethal R strain into the deadly S strain.
Avery, MacLeod, and McCarty prepared an extract from the disease-causing S strain
and showed in 1944 that the “transforming principle” that would permanently
change R-strain pneumococci into the lethal S strain is DNA. This was the first
evidence that DNA could serve as the genetic material.
(A) In 1952, Hershey and Chase worked with T2 viruses, which are made of protein
and DNA. (B) To determine whether the genetic material of the T2 virus is protein or
DNA, the researchers radioactively labeled the DNA in one batch of viruses with ³²P
and the proteins in a second batch of viruses with ³⁵S. These labeled viruses were then
allowed to infect E. coli, and the mixture was disrupted by brief pulsing in a Waring
blender to separate the infected bacteria from the empty viral heads. When
radioactivity was measured, they found that most of the ³²P-labeled DNA had entered
the bacterial cells, while most of the ³⁵S-labeled proteins remained in solution with the
spent viral particles.
WHAT IS A GENE?
In molecular terms, a GENE is the entire DNA sequence
required for synthesis of a functional protein or RNA molecule.
• A gene includes exons (coding), introns (non-coding), and
control or regulatory regions.
• Most bacterial and yeast genes lack introns, whereas most
genes in multicellular organisms contain them. The total
length of intron sequences often is much longer than that of
exon sequences.
• A simple eukaryotic transcription unit produces a single
monocistronic mRNA, which is translated into a single
protein.
Organization of genes on human chromosome 22
• A bacterial operon comprises a single transcription unit,
which is transcribed from a particular promoter into a single
primary transcript. Genes and transcription units are therefore
distinguishable in prokaryotes.
• In eukaryotes, most genes and transcription units are
identical, and the two terms are used interchangeably.
• A complex eukaryotic transcription unit is transcribed into a
primary transcript that can be processed into 2 or more different
monocistronic mRNAs depending on the choice of splice sites
or polyadenylation sites. Eukaryotic transcription units are
classified into 2 types, depending on the fate of the primary transcript:
1. The primary transcript produced from a simple transcription unit is
processed to yield a single type of mRNA, encoding a single
protein.
2. In complex transcription units, the primary RNA transcript can be
processed in more than one way, leading to formation of
mRNAs containing different exons. Each mRNA is
monocistronic, with translation usually initiating at the first
AUG in the mRNA.
(Top) If a primary transcript contains alternative splice sites, it can be processed into
mRNAs with the same 5′ and 3′ exons but different internal exons.
(Bottom) If a primary transcript has two poly(A) sites, it can be processed into
mRNAs with alternative 3′ exons.
If alternative promoters (f or g) are active in different cell types, mRNA1, produced in a cell
type in which f is activated, has a different exon (1A) than mRNA2 has, which is produced in
a cell type in which g is activated (and where exon 1B is used). Mutations in control regions
(a and b) and those designated c within exons shared by the alternative mRNAs affect the
proteins encoded by both alternatively processed mRNAs. In contrast, mutations (d and e)
within exons unique to one of the alternatively processed mRNAs affect only the protein
translated from that mRNA. For genes that are transcribed from different promoters in
different cell types (bottom), mutations in different control regions (f and g) affect expression
only in the cell type in which that control region is active.
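The relationship between shared and unique exons described above can be sketched in a few lines of Python. This is only a toy model: the exon names, the sequences, and the helper names `splice` and `affected_mrnas` are invented for illustration and do not come from the figure.

```python
# Toy primary transcript: exons in genomic order (names and sequences are hypothetical).
PRIMARY_TRANSCRIPT = {
    "exon1": "AUGGCU",
    "exon2": "GAACCG",
    "exon3": "UUUGGA",
    "exon4": "CACUAA",
}

# Two alternatively spliced mRNAs: same 5' and 3' exons, different internal exon.
ISOFORMS = {
    "mRNA_A": {"exon1", "exon2", "exon4"},
    "mRNA_B": {"exon1", "exon3", "exon4"},
}


def splice(exons, included):
    """Join the chosen exons, in genomic order, into one mRNA string."""
    return "".join(seq for name, seq in exons.items() if name in included)


def affected_mrnas(mutated_exon, isoforms):
    """Return the names of the mRNAs that contain the mutated exon."""
    return sorted(name for name, exons in isoforms.items() if mutated_exon in exons)


mrna_a = splice(PRIMARY_TRANSCRIPT, ISOFORMS["mRNA_A"])
mrna_b = splice(PRIMARY_TRANSCRIPT, ISOFORMS["mRNA_B"])
```

A mutation in a shared exon (here `exon1` or `exon4`) is reported for both mRNAs, whereas a mutation in `exon2` or `exon3` is reported for only one, which mirrors the c versus d/e distinction in the caption.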
Comparison of gene organization, transcription, and translation in prokaryotes and eukaryotes. (a)
The tryptophan (trp) operon is a continuous segment of the E. coli chromosome, containing 5 genes (blue) that encode
the enzymes necessary for the stepwise synthesis of tryptophan. The order of the genes in the bacterial genome parallels
the sequential function of the encoded proteins in the tryptophan pathway. (b) The 5 genes encoding the enzymes
required for tryptophan synthesis in yeast (Saccharomyces cerevisiae) are carried on 4 different chromosomes. Each
gene is transcribed from its own promoter to yield a primary transcript that is processed into a functional mRNA
encoding a single protein.
MAJOR CLASSES OF EUKARYOTIC DNA AND THE HUMAN GENOME
Representation of the nucleotide sequence content of the human genome
LINEs, SINEs, retroviral-like elements, and DNA-only transposons are all mobile genetic
elements that have multiplied in our genome by replicating themselves and inserting the new copies
in different positions. Simple sequence repeats are short nucleotide sequences (less than 14
nucleotide pairs) that are repeated for long stretches. Segmental duplications are large blocks of the
genome (1000–200,000 nucleotide pairs) that are present at two or more locations in the genome.
Over half of the unique sequence consists of genes and the remainder is probably regulatory DNA.
Most of the DNA present in heterochromatin has not yet been sequenced.
• PROTEIN-CODING GENES
1. Solitary genes: roughly 25–50% of the protein-coding
genes are represented only once in the haploid genome.
2. Duplicated genes constitute the second group of
protein-coding genes; they have close but nonidentical sequences
and generally are located within 5–50 kb of one another. In
vertebrate genomes, duplicated genes constitute half the
protein-coding DNA sequences.
3. A gene family is a set of duplicated genes that encode
proteins with similar but nonidentical amino acid sequences.
The encoded, closely related, homologous proteins
constitute a protein family. A few protein families, such as
protein kinases, transcription factors, and vertebrate
immunoglobulins, include hundreds of members.
Numbers of gene families, classified by function, that are common to all 3 domains of the living world

GENE FAMILY FUNCTION                                                       #
Translation, ribosomal structure and biogenesis                           61
Transcription                                                              5
Replication, repair, recombination                                        13
Cell division and chromosome partitioning                                  1
Molecular chaperones                                                       9
Outer membrane, cell-wall biogenesis                                       3
Secretion                                                                  4
Inorganic ion transport                                                    9
Signal transduction                                                        1
Energy production and conversion                                          18
Carbohydrate metabolism and transport                                     14
Amino acid metabolism and transport                                       40
Nucleotide metabolism and transport                                       15
Coenzyme metabolism                                                       23
Lipid metabolism                                                           8
General biochemical function predicted; specific biological role unknown  33
Function unknown                                                           1
• TANDEMLY REPEATED GENES encode rRNAs, tRNAs, and histones
• rRNAs are encoded in tandem arrays in genomic DNA. Multiple
copies of tRNA and histone genes also occur, often in clusters, but
not generally in tandem arrays.
• REPETITIOUS DNA is concentrated in specific chromosomal
locations
1. Simple-sequence or satellite DNA consists largely of quite short
sequences repeated in long tandem arrays and is preferentially
located in centromeres (where it assists in attaching chromosomes to
spindle fibers during mitosis), telomeres, and specific locations
within the arms of particular chromosomes.
• Repeats containing 1–13 bp are often called microsatellites.
Expansion of such repeats causes about 14 neuromuscular diseases
(e.g., myotonic dystrophy, spinocerebellar ataxia).
• The length of a particular simple-sequence tandem array is quite
variable between individuals in a species. These differences form the
basis for DNA fingerprinting.
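As a rough illustration of why array length can distinguish individuals, the sketch below counts the longest uninterrupted run of a repeat unit in a sequence. The function name `longest_tandem_run`, the CAG unit, and both allele sequences are assumptions made up for this example; real fingerprinting compares fragment lengths at many loci.

```python
import re


def longest_tandem_run(dna, unit):
    """Return the largest number of back-to-back copies of `unit` found in `dna`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", dna)
    return max((len(run) // len(unit) for run in runs), default=0)


# Two hypothetical alleles with identical flanks but different repeat numbers.
allele_1 = "GGTT" + "CAG" * 5 + "AACC"
allele_2 = "GGTT" + "CAG" * 12 + "AACC"

copies_1 = longest_tandem_run(allele_1, "CAG")
copies_2 = longest_tandem_run(allele_2, "CAG")
```

The two alleles share their flanking sequence and differ only in copy number, so they would yield fragments of different length when cut or amplified from the same flanking sites.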
2. Mobile DNA elements are moderately repeated DNA
sequences interspersed at multiple sites throughout the
genomes of higher eukaryotes. They are less frequent in
prokaryotes.
a. DNA transposons are mobile DNA elements that
transpose to new sites directly as DNA.
b. Retrotransposons are first transcribed into an RNA copy
of the element, which then is reverse-transcribed into
DNA.
• A common feature of all mobile elements is the presence of
short direct repeats flanking the sequence.
• Enzymes encoded by mobile elements themselves catalyze
insertion of these sequences at new sites in genomic DNA.
Classification of mobile elements into 2 major classes.
(a) Eukaryotic DNA transposons (orange) move via a DNA intermediate, which is excised from
the donor site.
(b) Retrotransposons (green) are first transcribed into an RNA molecule, which then is
reverse-transcribed into double-stranded DNA. In both cases, the double-stranded DNA
intermediate is integrated into the target-site DNA to complete movement. Thus DNA
transposons move by a cut-and-paste mechanism, whereas retrotransposons move by a
copy-and-paste mechanism.
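The difference between the two mechanisms comes down to whether the donor copy survives. A minimal sketch, assuming a genome represented as a Python list of labeled segments (the labels and function names are hypothetical):

```python
def cut_and_paste(genome, element, target):
    """DNA transposon: excise the element from its donor site, then insert it.

    `target` is an index into the genome after the element has been excised.
    """
    donor = genome.index(element)
    remaining = genome[:donor] + genome[donor + 1:]
    return remaining[:target] + [element] + remaining[target:]


def copy_and_paste(genome, element, target):
    """Retrotransposon: the donor copy stays; a new copy is inserted at `target`.

    The RNA intermediate and reverse transcription are not modeled here.
    """
    if element not in genome:
        raise ValueError("element is not present in the genome")
    return genome[:target] + [element] + genome[target:]


genome = ["geneA", "TE", "geneB", "geneC"]
after_cut = cut_and_paste(genome, "TE", 2)
after_copy = copy_and_paste(genome, "TE", 3)
```

In this sketch the element count stays at one after `cut_and_paste` but rises to two after `copy_and_paste`, which is one way to see why retrotransposons accumulate to high copy number.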
General structure of bacterial IS elements
Retrotransposons are much more abundant in vertebrates. However, DNA transposons that
are similar in structure to bacterial IS elements also occur in eukaryotes (e.g., the
Drosophila P element). The relatively large central region of an IS element, which encodes
one or two enzymes required for transposition, is flanked by an inverted repeat at each end.
The sequences of the inverted repeats are nearly identical, but they are oriented in opposite
directions. Their sequence is characteristic of a particular IS element. The 5′ and 3′ short
direct (as opposed to inverted) repeats are not transposed with the insertion element; rather,
they are insertion-site sequences that become duplicated, with one copy at each end, during
insertion of a mobile element. The length of the direct repeats is constant for a given IS
element, but their sequence depends on the site of insertion and therefore varies with each
transposition of the IS element. Arrows indicate sequence orientation.
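The two kinds of repeat can be told apart in a short sketch: the inverted repeats belong to the element, while the direct repeats are copied from the target site at insertion. The sequences, the perfect (rather than nearly identical) inverted repeats, and the function names below are simplifying assumptions, and the DNA is modeled as a single strand.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def make_is_element(left_repeat, central):
    """Build a toy IS element: inverted repeat, central region, inverted repeat."""
    return left_repeat + central + reverse_complement(left_repeat)


def insert_with_tsd(target, element, position, tsd_length):
    """Insert `element` so that `tsd_length` bases of the target site
    end up duplicated as a direct repeat on each side of it."""
    site = target[position:position + tsd_length]
    return target[:position] + site + element + site + target[position + tsd_length:]


element = make_is_element("GGTAC", "AAAATTTTCC")
target = "TTTTACGTAGGGG"
inserted = insert_with_tsd(target, element, position=4, tsd_length=5)
```

Calling `insert_with_tsd` at a different `position` gives direct repeats of the same length but different sequence, while the inverted repeats inside `element` never change.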
• LTR retrotransposons or viral retrotransposons (8% of human
genomic DNA) are flanked by long terminal repeats (LTRs), similar
to those in retroviral DNA; they encode reverse transcriptase and
integrase.
• They move in the genome by being transcribed into RNA, which then
undergoes reverse transcription and integration into the host-cell
chromosome.
General structure of eukaryotic LTR retrotransposons.

The central protein-coding region is flanked by 2 long terminal repeats (LTRs), which are element-
specific direct repeats. Like other mobile elements, integrated retrotransposons have short target-
site direct repeats at each end. The protein-coding region constitutes 80% or more of a
retrotransposon and encodes reverse transcriptase, integrase, and other retroviral proteins.
Generation of retroviral genomic RNA from integrated retroviral DNA. The left LTR
directs cellular RNA polymerase II to initiate transcription at the first nucleotide of the left
R region. The resulting primary transcript extends beyond the right LTR. The right LTR,
now present in the RNA primary transcript, directs cellular enzymes to cleave the primary
transcript at the last nucleotide of the right R region and to add a poly(A) tail, yielding a
retroviral RNA genome. A similar mechanism generates the RNA intermediate during
transposition of retrotransposons. The short direct-repeat sequences (black) of target-site
DNA are generated during integration of the retroviral DNA into the host-cell genome.
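The start and end points of the genomic RNA can be made concrete with a small sketch. It assumes each LTR has the standard U3-R-U5 layout (the caption names only the R region), uses invented sequences, and adds a poly(A) tail of arbitrary length; the function names are hypothetical.

```python
def transcribe(dna):
    """Write the coding-strand DNA sequence as RNA (T becomes U)."""
    return dna.replace("T", "U")


def retroviral_genomic_rna(provirus, r_region, tail_length=20):
    """Return the RNA running from the first nucleotide of the left R
    to the last nucleotide of the right R, plus a poly(A) tail."""
    start = provirus.index(r_region)                  # first nt of left R
    end = provirus.rindex(r_region) + len(r_region)   # just past last nt of right R
    return transcribe(provirus[start:end]) + "A" * tail_length


# Hypothetical LTR with an assumed U3-R-U5 layout.
U3, R, U5 = "CCCC", "GATTACA", "TTTT"
LTR = U3 + R + U5
provirus = LTR + "GGGAAAGGG" + LTR
rna = retroviral_genomic_rna(provirus, R)
```

The resulting RNA begins and ends (before the tail) with the R sequence and contains no complete LTR, so a full LTR must be regenerated when the RNA is copied back into DNA.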
Model for reverse transcription of retroviral genomic RNA into DNA.
The genomic RNA is packaged in the virion with a retrovirus-specific
cellular tRNA hybridized to a complementary sequence near its 5’ end called the primer-binding site (PBS).
The retroviral RNA has a short direct-repeat terminal sequence (R) at each end. The overall reaction is
carried out by reverse transcriptase.
• Nonviral retrotransposons are the most abundant mobile
elements in mammals. They form two classes in mammalian
genomes: LINEs and SINEs (long and short interspersed
elements).
• Both LINEs and SINEs lack LTRs and have an A/T-rich stretch
at one end. They move by a nonviral retrotransposition
mechanism mediated by LINE-encoded proteins involving
priming by chromosomal DNA.
• SINE sequences exhibit extensive homology with small
cellular RNAs transcribed by RNA polymerase III.
• Alu elements, the most common SINEs in humans, are ≈300-bp
sequences found scattered throughout the human genome.
General structure of a LINE
The length of the target-site direct repeats varies among copies of the
element at different sites in the genome. Although the full-length L1
sequence is ≈6 kb long, variable amounts of the left end are absent at
over 90% of the sites where this mobile element is found. The shorter
open reading frame (ORF1), ≈1 kb in length, encodes an RNA-
binding protein. The longer ORF2, ≈4 kb in length, encodes a
bifunctional protein with reverse transcriptase and DNA endonuclease
activity.
Proposed mechanism of LINE reverse
transcription and integration
Only ORF2 protein is represented. Newly
synthesized LINE DNA is shown in black.
• Some moderately repeated DNA sequences are derived from
cellular RNAs that were reverse-transcribed and inserted into
genomic DNA at some time in evolutionary history.
• Processed pseudogenes are derived from mRNAs and therefore lack
introns, a feature that distinguishes them from pseudogenes
that arose by sequence drift of duplicated genes.
The human β-globin gene cluster contains two pseudogenes (white); these
regions are related to the functional globin-type genes but are not
transcribed. Each red arrow indicates the location of an Alu sequence, an
≈300-bp noncoding repeated sequence that is abundant in the human
genome.
Mobile DNA elements were earlier viewed as selfish
molecular parasites. Today, they are viewed as
contributors to the evolution of higher organisms by
promoting:
• the generation of gene families via gene duplication
• the creation of new genes via shuffling of preexisting
exons
• formation of more complex regulatory regions that
provide multifaceted control of gene expression
Mobile DNA elements most likely influenced evolution significantly by
serving as recombination sites and by mobilizing adjacent DNA
sequences. They have also been found in mutant alleles associated with
several human genetic diseases.
Exon shuffling: recombination between homologous interspersed repeats.

Recombination between interspersed repeats in the introns of separate genes produces
transcription units with a new combination of exons.
A double crossover between two sets of Alu repeats results in an exchange of exons between
the two genes.
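The double crossover can be sketched by treating each gene as a list of exons and repeats and swapping whatever lies between the first two repeats. The gene layouts and the function name `double_crossover` are illustrative assumptions, not taken from the figure.

```python
def double_crossover(gene1, gene2, repeat="Alu"):
    """Exchange the segments lying between the first two `repeat` copies
    of each gene and return the two recombinant genes."""

    def flanks(gene):
        sites = [i for i, part in enumerate(gene) if part == repeat]
        if len(sites) < 2:
            raise ValueError("each gene needs two repeats flanking the exchanged segment")
        return sites[0], sites[1]

    a1, b1 = flanks(gene1)
    a2, b2 = flanks(gene2)
    new1 = gene1[:a1 + 1] + gene2[a2 + 1:b2] + gene1[b1:]
    new2 = gene2[:a2 + 1] + gene1[a1 + 1:b1] + gene2[b2:]
    return new1, new2


gene1 = ["exon1", "Alu", "exon2", "Alu", "exon3"]
gene2 = ["exonA", "Alu", "exonB", "Alu", "exonC"]
recombinant1, recombinant2 = double_crossover(gene1, gene2)
```

Because the crossovers fall within intronic repeats, every exon stays intact; only the combination of exons in each transcription unit changes.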
Exon shuffling: transposition of an exon flanked by homologous DNA
transposons into an intron of a second gene. Transposase can recognize and cleave
the DNA at the ends of the transposon inverted repeats. In gene 1, if the
transposase cleaves at the left end of the transposon on the left and at the right
end of the transposon on the right, it can transpose all the intervening DNA,
including the exon from gene 1, to a new site in an intron of gene 2. The net
result is an insertion of the exon from gene 1 into gene 2.
Exon shuffling: integration of an exon into another gene via
LINE transposition. Some LINEs have weak poly(A) signals. If such a LINE is
in the 3′-most intron of gene 1, during transposition its transcription may
continue beyond its own poly(A) signals and extend into the 3′ exon,
transcribing the cleavage and polyadenylation signals of gene 1 itself. This
RNA can then be reverse-transcribed and integrated by the LINE ORF2 protein
into an intron of gene 2, introducing a new 3′ exon (from gene 1) into gene 2.
