Anatomy of A Gene
Streptococcus pneumoniae comes in 2 forms that differ from one another in their microscopic
appearance and in their ability to cause disease. Cells of the pathogenic strain, which are lethal when
injected into mice, are encased in a slimy, glistening polysaccharide capsule and form smooth colonies;
this is the S form. The harmless strain lacks this protective coat; it forms colonies that appear flat and
rough and is referred to as the R form. In the 1920s, Fred Griffith found that a substance present in
heat-killed cells of the virulent S strain could permanently change, or transform, the nonlethal R strain
into the deadly S strain.
In work published in 1944, Avery, MacLeod, and McCarty prepared an extract from the
disease-causing S strain and identified the "transforming principle" that would permanently
change R-strain pneumococci into the lethal S strain as DNA. This was the first
evidence that DNA could serve as the genetic material.
(A) In 1952, Hershey and Chase worked with T2 viruses, which are made of protein
and DNA. (B) To determine whether the genetic material of the T2 virus is protein or
DNA, the researchers radioactively labeled the DNA in one batch of viruses with ³²P
and the proteins in a second batch of viruses with ³⁵S. These labeled viruses were then
allowed to infect E. coli, and the mixture was disrupted by brief pulsing in a Waring
blender to separate the infected bacteria from the empty viral heads. When
radioactivity was measured, they found that most of the ³²P-labeled DNA had entered
the bacterial cells, while most of the ³⁵S-labeled proteins remained in solution with the
spent viral particles. This showed that DNA, not protein, is the genetic material of T2.
WHAT IS A GENE?
In molecular terms, a GENE is the entire DNA sequence
required for synthesis of a functional protein or RNA molecule.
A gene includes exons (coding), introns (non-coding), and
control or regulatory regions.
Most bacterial and yeast genes lack introns, whereas most
genes in multicellular organisms contain them. The total
length of intron sequences is often much longer than that of
exon sequences.
A simple eukaryotic transcription unit produces a single
monocistronic mRNA, which is translated into a single
protein.
Organization of genes on human chromosome 22
A bacterial operon comprises a single transcription unit,
which is transcribed from a particular promoter into a single
primary transcript. Because one transcription unit can contain
several genes, genes and transcription units are distinguishable
in prokaryotes.
In eukaryotes, most genes and transcription units are generally
identical, so the two terms are often used interchangeably.
A complex eukaryotic transcription unit is transcribed into a
primary transcript that can be processed into 2 or more different
monocistronic mRNAs, depending on the choice of splice sites
or polyadenylation sites. Eukaryotic transcription units are
classified into 2 types, depending on the fate of the primary transcript:
1. The primary transcript produced from a simple transcription unit is
processed to yield a single type of mRNA, encoding a single
protein.
2. In complex transcription units, the primary RNA transcript can be
processed in more than one way, leading to formation of
mRNAs containing different exons. Each mRNA is
monocistronic, with translation usually initiating at the first
AUG in the mRNA.
(Top) If a primary transcript contains alternative splice sites, it can be processed into
mRNAs with the same 5′ and 3′ exons but different internal exons.
(Bottom) If a primary transcript has two poly(A) sites, it can be processed into mRNAs
with alternative 3′ exons.
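The two processing choices in this figure can be mimicked with a short Python sketch. It is a toy model under stated assumptions: exons are plain labels (E1, E2a, and so on are hypothetical names), alternative splicing is modeled as picking exactly one of several mutually exclusive internal exons, and poly(A) site choice is modeled as picking which 3′ exon ends the mRNA. Real splicing patterns are more varied than this.

```python
# Toy model of the two processing choices for a primary transcript.
# Exon labels are hypothetical; real splicing patterns are more varied.


def splice_isoforms(first_exon: str, alternative_internal: list[str], last_exon: str) -> list[list[str]]:
    """Alternative splice sites: same 5' and 3' exons, one of several internal exons."""
    return [[first_exon, internal, last_exon] for internal in alternative_internal]


def polya_isoforms(shared_exons: list[str], alternative_3prime: list[str]) -> list[list[str]]:
    """Alternative poly(A) sites: shared upstream exons, different 3' exons."""
    return [shared_exons + [last] for last in alternative_3prime]


if __name__ == "__main__":
    for mrna in splice_isoforms("E1", ["E2a", "E2b"], "E3"):
        print("-".join(mrna))  # E1-E2a-E3, then E1-E2b-E3
    for mrna in polya_isoforms(["E1", "E2"], ["E3a", "E3b"]):
        print("-".join(mrna))  # E1-E2-E3a, then E1-E2-E3b
```

Each resulting list is one monocistronic mRNA; the shared exons appear in every isoform, which is why they are the natural place to look for changes that affect all products of the gene.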
If alternative promoters (f or g) are active in different cell types, mRNA1, produced in a cell
type in which f is activated, contains a different first exon (1A) from mRNA2, which is produced in
a cell type in which g is activated (and where exon 1B is used). Mutations in control regions
(a and b) and mutations designated c, within exons shared by the alternative mRNAs, affect the
proteins encoded by both alternatively processed mRNAs. In contrast, mutations (d and e)
within exons unique to one of the alternatively processed mRNAs affect only the protein
translated from that mRNA. For genes that are transcribed from different promoters in
different cell types (bottom), mutations in different control regions (f and g) affect expression
only in the cell type in which that control region is active.
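The mutation logic above amounts to asking which mRNAs depend on the mutated region. The Python sketch below encodes that as set membership. The region names are hypothetical labels that echo the figure's letters, and assigning mutation d to mRNA1 and e to mRNA2 is an assumption made only for illustration.

```python
# Which mRNA isoforms does a mutation affect?
# Region names echo the figure's mutation letters; the exon layout and the
# assignment of d to mRNA1 and e to mRNA2 are hypothetical.

# Alternative processing: both mRNAs share control regions a and b and the
# exon carrying c; each has one unique exon (carrying d or e).
ALTERNATIVELY_PROCESSED = {
    "mRNA1": {"control_a", "control_b", "exon_c", "exon_d"},
    "mRNA2": {"control_a", "control_b", "exon_c", "exon_e"},
}

# Alternative promoters: mRNA1 is made where promoter f is active (first exon 1A),
# mRNA2 where promoter g is active (first exon 1B).
ALTERNATIVE_PROMOTERS = {
    "mRNA1": {"promoter_f", "exon_1A", "exon_shared"},
    "mRNA2": {"promoter_g", "exon_1B", "exon_shared"},
}


def affected_isoforms(gene: dict[str, set[str]], mutated_region: str) -> list[str]:
    """Sorted names of the isoforms that depend on the mutated region."""
    return sorted(name for name, regions in gene.items() if mutated_region in regions)


if __name__ == "__main__":
    for region in ("control_a", "exon_c", "exon_d", "exon_e"):
        print(region, affected_isoforms(ALTERNATIVELY_PROCESSED, region))
    for region in ("promoter_f", "promoter_g"):
        print(region, affected_isoforms(ALTERNATIVE_PROMOTERS, region))
```

A mutation in a shared region returns both isoforms, while a mutation in a unique exon or a cell-type-specific promoter returns only one.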
Comparison of gene organization, transcription, and translation in prokaryotes and eukaryotes. (a)
The tryptophan (trp) operon is a continuous segment of the E. coli chromosome, containing 5 genes (blue) that encode
the enzymes necessary for the stepwise synthesis of tryptophan. The order of the genes in the bacterial genome parallels
the sequential function of the encoded proteins in the tryptophan pathway. (b) The 5 genes encoding the enzymes
required for tryptophan synthesis in yeast (Saccharomyces cerevisiae) are carried on 4 different chromosomes. Each
gene is transcribed from its own promoter to yield a primary transcript that is processed into a functional mRNA
encoding a single protein.
MAJOR CLASSES OF EUKARYOTIC DNA AND THE HUMAN GENOME
Representation of the nucleotide sequence content of the human genome
LINEs, SINEs, retroviral-like elements, and DNA-only transposons are all mobile genetic
elements that have multiplied in our genome by replicating themselves and inserting the new copies
in different positions. Simple sequence repeats are short nucleotide sequences (less than 14
nucleotide pairs) that are repeated for long stretches. Segmental duplications are large blocks of the
genome (1000–200,000 nucleotide pairs) that are present at two or more locations in the genome.
Over half of the unique sequence consists of genes and the remainder is probably regulatory DNA.
Most of the DNA present in heterochromatin has not yet been sequenced.
PROTEIN-CODING GENES
1. Solitary genes: roughly 25–50% of the protein-coding
genes are represented only once in the haploid genome.
2. Duplicated genes constitute the second group of protein-coding
genes; they have close but nonidentical sequences and
generally are located within 5–50 kb of one another. In
vertebrate genomes, duplicated genes constitute half the
protein-coding DNA sequences.
3. A gene family is a set of duplicated genes that encode
proteins with similar but nonidentical amino acid sequences.
The encoded, closely related, homologous proteins
constitute a protein family. A few protein families, such as
protein kinases, transcription factors, and vertebrate
immunoglobulins, include hundreds of members.
Numbers of gene families, classified by function, that are common to all 3 domains of the living world:

FAMILIES  GENE FAMILY FUNCTION
      61  Translation, ribosomal structure and biogenesis
       5  Transcription
      13  Replication, repair, recombination
       1  Cell division and chromosome partitioning
       9  Molecular chaperones
       3  Outer membrane, cell-wall biogenesis
       4  Secretion
       9  Inorganic ion transport
       1  Signal transduction
      18  Energy production and conversion
      14  Carbohydrate metabolism and transport
      40  Amino acid metabolism and transport
      15  Nucleotide metabolism and transport
      23  Coenzyme metabolism
       8  Lipid metabolism
      33  General biochemical function predicted; specific biological role unknown
       1  Function unknown
TANDEMLY REPEATED GENES encode rRNAs, tRNAs, and histones
rRNA genes are arranged in tandem arrays in genomic DNA. Multiple
copies of tRNA and histone genes also occur, often in clusters, but
generally not in tandem arrays.
REPETITIOUS DNA is concentrated in specific chromosomal
locations
1. Simple-sequence or satellite DNA consists largely of quite short
sequences repeated in long tandem arrays and is preferentially
located in centromeres (which attach chromosomes to spindle
fibers during mitosis), telomeres, and specific locations
within the arms of particular chromosomes.
Repeats with units of 1–13 bp are often called microsatellites;
expansions of some of these repeats cause about 14 neuromuscular
diseases (e.g., myotonic dystrophy, spinocerebellar ataxia).
The length of a particular simple-sequence tandem array is quite
variable between individuals in a species. These differences form the
basis for DNA fingerprinting.
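DNA fingerprinting compares how many tandem copies of a simple-sequence repeat each individual carries at a locus. The Python sketch below counts the longest uninterrupted run of a repeat unit in a sequence. The sequences and the CA repeat unit are made-up examples, and reading repeat counts directly from sequence is a simplification: in practice, array lengths are usually inferred from DNA fragment sizes at several loci.

```python
# Count simple-sequence (microsatellite) repeats at one locus.
# Sequences below are made-up examples, not real alleles.


def longest_tandem_run(sequence: str, unit: str) -> int:
    """Largest number of back-to-back copies of `unit` anywhere in `sequence`."""
    if not 1 <= len(unit) <= 13:
        raise ValueError("microsatellite units are taken here to be 1-13 bp long")
    best = 0
    # Try every start position so that runs of self-overlapping units
    # (for example ACA) are not missed.
    for start in range(len(sequence)):
        count, pos = 0, start
        while sequence.startswith(unit, pos):
            count += 1
            pos += len(unit)
        best = max(best, count)
    return best


def repeat_profile(samples: dict[str, str], unit: str) -> dict[str, int]:
    """Repeat count at the locus for each sample; differing counts tell samples apart."""
    return {name: longest_tandem_run(seq, unit) for name, seq in samples.items()}


if __name__ == "__main__":
    samples = {
        "individual_1": "GGT" + "CA" * 5 + "TTG",
        "individual_2": "GGT" + "CA" * 9 + "TTG",
    }
    print(repeat_profile(samples, "CA"))  # {'individual_1': 5, 'individual_2': 9}
```

The flanking sequence is identical in both samples; only the length of the tandem array differs, which is the kind of variation fingerprinting relies on.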
2. Mobile DNA elements are moderately repeated DNA
sequences interspersed at multiple sites throughout the
genomes of higher eukaryotes. They are less frequent in
prokaryotes.
a. DNA transposons are mobile DNA elements that
transpose to new sites directly as DNA.
b. Retrotransposons are first transcribed into an RNA copy
of the element, which then is reverse-transcribed into
DNA.
A common feature of all mobile elements is the presence of
short direct repeats flanking the sequence.
Enzymes encoded by mobile elements themselves catalyze
insertion of these sequences at new sites in genomic DNA.
Classification of mobile elements into 2 major classes. (a) Eukaryotic DNA transposons (orange)
move via a DNA intermediate, which is excised from the donor site. (b) Retrotransposons (green)
are first transcribed into an RNA molecule, which then is reverse-transcribed into double-stranded
DNA. In both cases, the double-stranded DNA intermediate is integrated into the target-site DNA
to complete movement. Thus DNA transposons move by a cut-and-paste mechanism, whereas
retrotransposons move by a copy-and-paste mechanism.
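The difference between cut-and-paste and copy-and-paste movement can be shown on a toy genome written as a string. In this hypothetical Python sketch, host DNA is lowercase, the mobile element is uppercase, only one strand is tracked, and target-site duplication is ignored. The retrotransposon path goes through an RNA copy (T replaced by U) and back to DNA, standing in for transcription and reverse transcription; the double-stranded product is represented by its strand with the same sequence as the element.

```python
# Toy comparison of the two transposition mechanisms.
# Host DNA is lowercase, the mobile element uppercase; one strand only.


def transcribe(dna: str) -> str:
    """Simplified transcription: RNA copy with U in place of T."""
    return dna.replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """Simplified reverse transcription back to DNA (U -> T), tracking one strand."""
    return rna.replace("U", "T")


def cut_and_paste(genome: str, element: str, target: int) -> str:
    """DNA transposon: excise the element, then insert it at `target`.

    `target` is a position in the genome after the element has been removed.
    """
    donor = genome.index(element)  # raises ValueError if the element is absent
    excised = genome[:donor] + genome[donor + len(element):]
    return excised[:target] + element + excised[target:]


def copy_and_paste(genome: str, element: str, target: int) -> str:
    """Retrotransposon: the donor copy stays; a new copy made via RNA is inserted."""
    if element not in genome:
        raise ValueError("element not found in genome")
    new_copy = reverse_transcribe(transcribe(element))
    return genome[:target] + new_copy + genome[target:]


if __name__ == "__main__":
    genome = "acgtTTAGGCacgtacgt"
    moved = cut_and_paste(genome, "TTAGGC", 8)
    copied = copy_and_paste(genome, "TTAGGC", 14)
    print(moved, moved.count("TTAGGC"))    # element moved, still one copy
    print(copied, copied.count("TTAGGC"))  # donor kept, now two copies
```

Because the donor copy survives in the copy-and-paste case, repeated rounds increase the copy number, which fits the much greater abundance of retrotransposons in vertebrate genomes.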
General structure of bacterial IS elements
In vertebrates, retrotransposons are much more abundant than DNA transposons. However, DNA
transposons similar in structure to bacterial IS elements do occur in eukaryotes (e.g., the Drosophila
P element). The relatively large central region of an IS element, which encodes one or two enzymes required
for transposition, is flanked by an inverted repeat at each end. The sequences of the inverted
repeats are nearly identical, but they are oriented in opposite directions. The inverted-repeat sequence is
characteristic of a particular IS element. The 5′ and 3′ short direct (as opposed to inverted)
repeats are not transposed with the insertion element; rather, they are insertion-site sequences
that become duplicated, with one copy at each end, during insertion of a mobile element. The
length of the direct repeats is constant for a given IS element, but their sequence depends on
the site of insertion and therefore varies with each transposition of the IS element. Arrows
indicate sequence orientation.
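The structure described above (terminal inverted repeats around a central coding region, plus a target-site duplication created on insertion) can be checked with a small Python sketch. The inverted-repeat sequence, central region, host DNA, and 5-bp repeat length are all hypothetical; the inverted repeats are made perfect rather than nearly identical, and the staggered cut at the target site is modeled only by its result, namely that the `repeat_len` host bases at the site appear on both sides of the element.

```python
# Toy model of a bacterial IS element and its insertion.
# All sequences and lengths here are hypothetical examples.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def build_is_element(inverted_repeat: str, central_region: str) -> str:
    """Left inverted repeat + central (transposase-coding) region + right inverted repeat.

    The right repeat is the reverse complement of the left one, so both ends
    read the same when each is read inward on its own strand.
    """
    return inverted_repeat + central_region + reverse_complement(inverted_repeat)


def has_terminal_inverted_repeats(element: str, repeat_len: int) -> bool:
    """True if the first and last `repeat_len` bases form a perfect inverted repeat."""
    return element[:repeat_len] == reverse_complement(element[-repeat_len:])


def insert_with_target_duplication(host: str, element: str, site: int, repeat_len: int) -> str:
    """Insert `element` at `site`, duplicating the `repeat_len` host bases at the site.

    One copy of the target sequence ends up on each side of the element as a
    direct repeat; the element itself carries no copy of it.
    """
    if site < 0 or site + repeat_len > len(host):
        raise ValueError("insertion site too close to the end of the host DNA")
    return host[: site + repeat_len] + element + host[site:]


if __name__ == "__main__":
    element = build_is_element("GGTTGA", "ATGCCCTAA")
    print(has_terminal_inverted_repeats(element, 6))
    host = "GGATCCAAGTTC"
    for site in (2, 6):
        result = insert_with_target_duplication(host, element, site, 5)
        start = result.index(element)
        left = result[start - 5 : start]
        right = result[start + len(element) : start + len(element) + 5]
        print(site, left, right, left == right)
```

Inserting at two different sites gives direct repeats of the same length (5 bp) but different sequence (ATCCA versus AAGTT), matching the point that the flanking repeats come from the target site rather than from the element.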
LTR retrotransposons or viral retrotransposons (8% of human
genomic DNA) are flanked by long terminal repeats (LTRs), similar
to those in retroviral DNA; they encode reverse transcriptase and
integrase.
They move in the genome by being transcribed into RNA, which then
undergoes reverse transcription and integration into the host-cell
chromosome.
The human β-globin gene cluster contains two pseudogenes (white); these
regions are related to the functional globin-type genes but are not
transcribed. Each red arrow indicates the location of an Alu sequence, an
≈300-bp noncoding repeated sequence that is abundant in the human
genome.
Mobile DNA elements were once viewed as selfish
molecular parasites. Today, they are viewed as
contributors to the evolution of higher organisms by
promoting:
a. the generation of gene families via gene duplication
b. the creation of new genes via shuffling of preexisting
exons
c. the formation of more complex regulatory regions that
provide multifaceted control of gene expression
Mobile DNA elements have most likely influenced evolution significantly by
serving as recombination sites and by mobilizing adjacent DNA
sequences. They have also been found in mutant alleles associated with
several human genetic diseases.