An Introduction to Classical and Molecular Genetics

Robert P. Wagner

[...] years, humankind has made great progress
unraveling the mysteries of inheritance. [...] baker's yeast, the rod-shaped [...],
Arabidopsis thaliana, and the tiny, machine-like viruses that infect [...]. Each
of these organisms has contributed to our
understanding of molecular interactions
Understanding Inheritance
That like begets like-that what is now called a species begets offspring
of the same species-must have been evident to the earliest humans.
Recognition of the inheritance of variations within a species must also
have come early, since domestication of animals undoubtedly involved
elimination of individuals with undesirable characteristics (a penchant for
human flesh, for example). The first animals to be domesticated may well have been
members of the dog family, which were used as food, and domestication of canines
may have started even before the advent of Homo sapiens. The remains of an old
hominid relative of ours, Homo erectus (also known as Java or Peking man), have
been found associated with those of a dog-like animal in 500,000-year-old fossils.
The earliest canine remains associated with our own species are a mere 12,000 years
old. The domestication of food plants probably began between 8000 and 9000 years
ago, although some authorities contend that the domestication of cereals preceded
that of most animals.
Humans must also have very early related mating between "male" and "female"
animals, including humans, with the subsequent issuance of offspring. Sexual repro-
duction in plants was probably recognized much later-many plants, after all, are
discreetly bisexual-but at least 4000 years ago, as evidenced by the Babylonians'
selective breeding, through controlled pollination, of the date palm (Phoenix dactylif-
era), which occurs as separate male and female trees. (The dates borne by a female
tree result from fertilization of its eggs by sperm-containing pollen from male trees.)
The oldest recorded thoughts about heredity appear in the religious writings of the
ancient Hindus and Jews, which reveal recognition of the heritability of disease,
health, and mental and physical characteristics. The caste system of the Hindus, the
hereditary priesthood among the Jews of the tribe of Levi, and later, in Homer's time,
the inheritance of the gift of prophecy are a few reflections of ancient thinking about
the link between successive generations of humans. Some of those ideas, which of
necessity were based primarily on philosophical outlook rather than scientific fact,
are discussed briefly in "Early Ideas about Heredity."
The Dawn
The first significant advances toward our current understanding of inheritance came
in the late Renaissance with the work of the English physician William Harvey
(1578-1657) and the invention of the microscope (circa 1600). Harvey is best
known for his discovery of the dynamics of the circulation of the blood, but he also
propounded a new view about the relative importance of the contributions of male and
female animals to the creation of offspring. Previously, the female contribution, the
egg, had been regarded as mere matter, matter that assumes a form dictated entirely
by the male's semen. But Harvey proposed that both egg and semen guide the
development of an offspring. His observation of the eggs of many species led him to
conclude (in De generatione animalium, 1651) that "ex ovo omnia." That everything
arises from an egg was meant to apply to humans also, even though Harvey had
never seen the eggs of humans or any other live-bearing creature.
Los Alamos Science Number 20 1992
Ancient beliefs about heredity included
the idea that inborn characteristics are in-
herited from parents, as well as the idea that
they could be affected by external influ-
ences on the parents at conception or dur-
ing pregnancy. The biblical story of Jacob's
wages (Genesis, chapter 30) combines both.
Jacob had agreed to tend the flock of his
uncle and father-in-law, Laban, if he could
take when he left all the unusually colored
animals: the sheep with dark wool and the
goats with white streaks or speckles. But
Laban, a deceitful and greedy man, took his
few such animals three days' journey away.
The remaining stock he assumed would not
produce offspring of the colorations Jacob
had named. However, Jacob peeled tree
branches to make them striped and spotted
and stood them in the watering troughs
when the stronger goats were mating nearby.
The kids from those matings, unlike their
parents, had the markings that made them
his, and they were more vigorous than the
offspring of the weaker goats. He herded
the sheep so they faced Laban's dark-col-
ored goats; they then bore dark-colored
lambs. Today the appearance in offspring
of characteristics different from those of
either parent can be attributed to the com-
bined effects of the genetic contributions of
each parent (see "Mendelian Genetics").
The ancient Greeks gave considerable at-
tention to human inheritance in their writ-
ings. Plato, for example, made cogent state-
ments about human traits being determined
by both parents. He emphasized that people
are not completely equal in physical and
mental characteristics and that each person
inherits a nature suited to fulfilling only cer-
tain societal functions. Also prominent in
the thinking of the early Greeks was the
inheritance of acquired characteristics.
Aristotle, for example, wrote that
children are born resembling their par-
ents in their whole body and their indi-
vidual parts. Moreover this resemblance
is true not only of inherited but also of
acquired characters. For it has hap-
pened that the children of parents who
bore scars are also scarred in just the
same way in just the same place. In
Chalcedon, for example, a man who
had been branded on the arm had a
child who showed the same brand let-
ter, though it was not so distinctly marked
and had become blurred.
The idea that external influences play a role
in heredity persisted even until the early part
of the twentieth century. We now know that
the idea contains some truth. For example,
ionizing radiation, many chemicals, and in-
fection by some viruses can cause heritable
changes, or mutations, but generally those
changes are entirely random and cannot be
directed toward specific outcomes.
One of the more remarkable theories about
human inheritance, pangenesis, was developed
in about the fifth century B.C. and
espoused by Hippocrates and his followers.
According to that theory, semen was formed
in every part of the male body and traveled
through the blood vessels to the testicles,
which were merely repositories. Variations
of the theory lasted well into the nineteenth
century A.D. and were even accepted by
Charles Darwin. Pangenesis was for some
reason dominant in the thinking of the
philosophers and theologians of the Middle
Ages. Albertus Magnus (1193-1280), his
pupil Thomas Aquinas (1225-1274), and
the naturalist Roger Bacon (circa 1220-1294)
all accepted pangenesis as a fact.
One variant of the theory was the idea that
both male and female produced semen.
According to Paracelsus (1493-1541), semen
was an extract of the human body
containing all the human organs in an ideal
form and was thus a physical link between
successive generations.
Also prevalent during the Middle Ages was
the concept of entelechy, the Aristotelian
idea that the way an individual develops is
determined by a vital, inner force. The
determining force is provided by the male and
transmitted in his semen. The female
provides no semen but only, so to speak, raw
material. Aristotle compared the roles of
male and female in the creation of an
offspring with the roles of sculptor and stone in
the creation of a sculpture.
Other forms of vitalism continued to be
popular even up to the beginning of the
twentieth century primarily because people
lacked knowledge about the nature of the
physical connection between generations
of animals and plants.
With his naked eye Harvey could see no form in a newly laid, fertilized chicken egg.
But he assumed the form that did appear later arose epigenetically from matter that has
some sort of inherent, though invisible, organization. The theory of epigenesis-that
an organism arises from structural elaboration of formless matter rather than by
enlargement of a preformed entity-dates back to Aristotle, but Harvey differed
from Aristotle in seriously doubting that the living can arise from the nonliving.
Experimental justification for his doubt came about a century later.
Thoughts about heredity would probably not have advanced beyond Harvey's had it
not been for the compound microscope, an invention credited sometimes to Zacharias
Janssen and sometimes to Galileo. Other Renaissance men noted for their discoveries
with the microscope and improvements to its design are regarded as the founders of
microscopy: Nehemiah Grew (1641-1712), Robert Hooke (1635-1703), Antoni van
Leeuwenhoek (1632-1723), Marcello Malpighi (1628-1694), and Jan Swammerdam
(1637-1680). Their observations-among which were sperms in semen and structural
elements, dubbed cells by Hooke, in plant and animal tissues-formed the foundations
of the science now called cell biology.
Users of the early, low-resolution microscopes could (and did) let their imaginations
run wild. Some thought they saw miniature humans, homunculi, preformed in hu-
man sperms; others saw tiny animals, animalcula, preformed in animal eggs. Those
apparitions led to resurrection of the theory of preformation originally propounded
by Democritus and other Greeks. In the eighteenth century the preformation theory
developed into the encapsulation theory, which stated that, at the time of creation, all
future generations were packaged, one inside the other, within the primordial egg or
sperm. Logically, all life would come to an end when the last homunculus or animalculum
was born. The encapsulation theory died because it was ridiculous, although
many eminent biologists were its fierce advocates up to the beginning of the nine-
teenth century.
The higher-resolution microscopes of the latter half of the eighteenth century allowed
Caspar Friedrich Wolff (1734-1794) to observe the development of chicken embryos.
His work clearly showed that the components of a new organism are not preformed
but, as stated two millennia before by Aristotle and a century before by Harvey, arise
from the undifferentiated matter of the fertilized egg.
The Great Awakening
Modern biology may be said to have been born in the nineteenth century, several hundred
years after the beginnings of modern chemistry and physics. Earlier biologists
were either physicians or naturalists (what we now call botanists and zoologists), and
their work focused on structure, physiology, and classification. But the nineteenth
century brought several developments that were basic to emergence of the newer
branches of biology, including cell biology and genetics.
The Rise of Cell Biology. During the first half of the nineteenth century, evidence
accumulated for the so-called cell theory, which states that the cell is the structural
and functional unit of all organisms. The diversity of cell shapes and sizes was
noted (see "The Variety of Cells"), and various intracellular structures were observed
(see "Components of Eukaryotic Cells"). Of particular importance to genetics is
the membrane-bound intracellular structure called the nucleus, which was found to
be a common feature of the cells of all organisms more complex than bacteria and
blue-green algae. Organisms possessing a nucleus were classified as eukaryotes, and
organisms lacking a nucleus were classified as prokaryotes.
Later, during the early 1850s, came the momentous finding, embraced in the aphorism
omnis cellula e cellula, that cells divide to form new cells. A leading proponent of
the idea that all cells come from cells was the German physician Rudolph Virchow
(1821-1902). A cancer specialist, among other things, Virchow asserted that cancer
cells arise from cells pre-existing in the body and do not, as earlier physicians had
thought, arise by spontaneous generation from unorganized matter.
Another development was the realization that gametes (sperms and eggs) are also
cells, in particular cells specialized for transmitting information from one generation
of a sexually reproducing organism to the next. The remarkable difference in size
between sperms and eggs was found to be due to cell components other than their
nuclei, and that observation, coupled with the belief that sperms and eggs contain the
same amount of hereditary information, indicated that hereditary information resides
in the nuclei of gametes. The nucleus was found to be the site also of the information
transmitted from one cellular generation to the next.
The above developments led to formulation of the law of genetic continuity, which
succinctly summarizes what was probably the most important advance toward the
understanding of living systems up to that time: Life comes only from life through
the medium of cells.
By the late 1880s hereditary information had been localized further to intranuclear
elements that can be seen with the microscope during the mitotic phase of the
cell cycle, the phase that culminates in cell division (see "The Eukaryotic Cell
Cycle"). The elements, which were named chromosomes because they can be
stained (selectively colored) with certain dyes, are most easily observed during the
portion of the mitotic phase called metaphase. (We now know that each "metaphase
chromosome" consists of two duplicates of a single chromosome bound together
along a more or less central region.)
Facts accumulated about chromosomes (see "Chromosomes: The Sites of Hereditary
Information"). All the somatic cells (cells other than gametes) of a sexually repro-
ducing organism have the same even number of chromosomes, the so-called diploid
number, whereas all its gametes have the same so-called haploid number of chromo-
somes, which is exactly one-half the diploid number. Furthermore, the diploid and
Cells vary in shape from the
most simple to the indescribably
complex. Shown here are electron
micrographs of a few examples
from nature's cornucopia.
Escherichia coli, the most studied
of all bacteria
From Molecular Biology of the Cell, second
edition, by Bruce Alberts et al. Copyright 1989
by Garland Publishing, Inc. Reprinted with
permission. Courtesy of Tony Brain and
the Science Photo Library.
Mouse fibroblast during the
final stage of cell division
From Molecular Biology of the Cell,
second edition, by Bruce Alberts et al.
Copyright 1989 by Garland Publishing, Inc.
Reprinted with permission. Courtesy of
Guenter Albrecht-Buehler.
COMPONENTS OF EUKARYOTIC CELLS
[...] include mitochondria (the sites of energy production
by oxidation of nutrients), a Golgi
apparatus (where various macromolecules
are modified, sorted,
and packaged for secretion from
the cell or for distribution to other
organelles), an endoplasmic
reticulum (the principal site of
protein synthesis), and a nucleus
(the residence of chromosomes
and the site of DNA replication and
transcription). The nucleolus is the
site of ribosomal-RNA synthesis.
The organelles unique to plant
cells are chloroplasts (the sites of
photosynthesis in green plants) and
vacuoles (water-filled compartments
that serve as space fillers and as
storage vessels). Plant cells differ from
animal cells also in being surrounded
by a cellulose cell wall, a much more
rigid form of the extracellular matrix
that surrounds animal cells.
[Figure: labeled parts include the Golgi apparatus, endoplasmic reticulum, and vacuole.]
Figure adapted (with permission) from an
illustration in Genes and Genomes by
Maxine Singer and Paul Berg (University
Science Books, 1991).
THE EUKARYOTIC CELL CYCLE
[Figure: timeline showing interphase followed by the mitotic phase, which together span one generation time.]
The term "cell cycle" refers collectively to
the events that occur within a eukaryotic cell
between its birth by mitosis and its division,
again by mitosis, into two daughter cells.
The cell may be either a one-celled organ-
ism such as baker's yeast (Saccharomyces
cerevisiae) or a somatic cell of a multicellular
organism. Early studies of the eukaryotic
cell cycle concentrated on the microscopi-
cally visible and dramatic physical events of
the cell-division, or mitotic, phase (M). On-
set of the mitotic phase is signaled by the
appearance of microscopically visible worm-
like bodies within the nucleus, that is, by the
condensation of duplicated chromosomes
into a much less diffuse configuration. The
mitotic phase ends when the cell separates
into two daughter cells, each of which then
embarks on its own cycle. (Details of the
mitotic phase are presented in "Mitosis.")
Because the early microscopic studies re-
vealed little physical activity during the por-
tion of the cell cycle that precedes the
mitotic phase (other than a relatively small
increase in cell size), that portion was inap-
propriately named the resting phase, or
interphase. We now know that most of the
biosynthetic activity required of a cell, both
for its own maintenance and reproduction
and for its function or functions as a
constituent of a multicellular organism, occurs
during interphase.
Most of the biochemicals produced by a cell
are synthesized throughout interphase.
DNA is a notable and easily detected ex-
ception, and for that reason interphase is
subdivided into the period between cell birth
and the onset of DNA synthesis (G1), the
period of DNA synthesis (S), which ends
when all the nuclear DNA has been replicated
and hence the number of chromosomes
has doubled, and the period between
the end of DNA synthesis and the
beginning of the mitotic phase (G2). After a
cell has entered S, it is committed to com-
pleting the cell cycle, even when environ-
mental conditions are extremely adverse.
The length of the cell cycle, the generation
time, varies with environmental conditions
and among species and cell types. For
example, epithelial cells, the cells that line
the interior and exterior surfaces of the
human body, have relatively short genera-
tion times (about eight hours); fibroblasts,
cells that assist in healing wounds, com-
plete their cell cycle only on demand; mature
red blood cells never undergo mitosis; and
embryonic cells divide very rapidly. Ob-
served generation times for those cells that
do have a regular cycle range from about a
few minutes to a few months. The variation
in generation time is due mainly to a variation
in the length of G1 and of G2. The mitotic
phase of most species and most cell types
occupies only about 10 percent of the
generation time.
The cell cycle of bacteria, in addition to
being shorter (typically less than an hour), is
also less complex. In particular, DNA is
synthesized continuously, the two copies of
the single bacterial chromosome do not
undergo extensive condensation before cell
division, and a mechanism simpler than the
one illustrated in "Mitosis" assures parcel-
ing out of one chromosome copy to each
daughter cell.
Within the nucleus of each cell of a
eukaryotic organism are a number of
chromosomes, each composed of a
single molecule of DNA (see "DNA: Its
Structure and Components") and a
roughly equal mass of proteins
(primarily the proteins called histones).
The DNA molecule carries hereditary
information; the proteins help effect
the ordered condensation, or
compaction, of the very long, very
thin DNA molecule. During most of a
cell's life, its chromosomes are too
decondensed to be visible with an
optical microscope. However, during
metaphase, a phase preparatory to cell
division (see "Mitosis" and "Meiosis"),
the chromosomes become highly con-
densed and hence easily visible. Most
studies of chromosomes are therefore
carried out on chromosomes extracted
from cells arrested at metaphase.
Each such "metaphase chromosome"
consists in reality of two duplicates
of a single chromosome bound
together along a somewhat constricted
region called a centromere. The three
micrographs of metaphase chromo-
somes shown here illustrate some
general facts about chromosomes.
Shown above are the metaphase chromo-
somes extracted from a root-tip cell of maize
(Zea mays). The chromosomes were stained
with a fluorescent dye and photographed
through an optical microscope while being
illuminated by a laser that excites the dye's
fluorescence. (The chromosomes could
have been stained instead with a nonfluo-
rescent dye.) A total of twenty metaphase
chromosomes is visible in the micrograph,
and any somatic cell (any cell other than an
egg or a sperm) of any Zea mays plant
possesses that same number of metaphase
chromosomes. In general, all the somatic
cells of all the members of a species pos-
sess the same even number of metaphase
chromosomes, called the diploid chromo-
some number. The diploid chromosome
number varies erratically from species to
species: the known values range from 2 to
many hundreds. (Note that the diploid chro-
mosome number is not a measure of a
species' evolutionary status.) The twenty
metaphase chromosomes of Zea mays
obviously exhibit different morphologies, that
is, different sizes and centromere positions.
However, even the untrained observer might
notice that the two highlighted metaphase
chromosomes look very much alike. In fact,
the twenty metaphase chromosomes of Zea
mays can be grouped into ten homologous,
or morphologically indistinguishable, pairs.
The metaphase chromosomes of all eu-
karyotic species occur as homologous pairs,
and that general fact is due to the occur-
rence of chromosomes themselves as ho-
mologous pairs. Furthermore, the homol-
ogy of a pair of chromosomes is due to a
high degree of similarity between the base
sequences of their constituent DNA mol-
ecules. (Micrograph courtesy of Paul Jack-
son and Jerome Conia.)
Shown at right are the metaphase chromo-
somes extracted from a somatic cell of a
house mouse (Mus musculus). To help identify
homologous pairs, the chromosomes
were stained with a dye called Giemsa that
produces a pattern of dark and light bands,
a pattern that varies from one homologous
pair to another. The chromosome images
have been grouped in homologous pairs
and arranged in order of decreasing size.
Such a display of metaphase chromosomes
is called a karyotype. The last entry in the
karyotype is the pair of chromosomes that
are involved in determining sex. Because
this particular mouse cell possesses two
homologous sex chromosomes, it is a cell
from a female mouse. Cells of a male
mouse possess two nonhomologous sex
chromosomes, one X chromosome and a
smaller Y chromosome.
Shown at left is the karyotype of a human
prepared from the Giemsa-stained met-
aphase chromosomes of a lymphocyte. Note
the twenty-two homologous pairs of auto-
somes (chromosomes other than sex chro-
mosomes) and the two nonhomologous
sex chromosomes. The nonhomology of
the sex chromosomes indicates that this is
the karyotype of a male human, namely of
the well-known cytogeneticist T. C. Hsu of
the University of Texas System Cancer
Center. (Both of the karyotypes on this
page were provided by T. C. Hsu.)
haploid chromosome numbers are constant among different members of the same
species but vary among different species. For example, all somatic cells of all
members of the species Homo sapiens contain forty-six chromosomes, all somatic
cells of all members of the species Drosophila melanogaster (a fruit fly) contain
eight chromosomes, all somatic cells of all members of the species Pisum sativum
(the garden pea) contain fourteen chromosomes, and all somatic cells of all members
of the species Mus musculus (the house mouse) contain forty chromosomes. And all
the gametes of all members of each of the above species contain twenty-three, four,
seven, and twenty chromosomes, respectively. In addition, the metaphase chromosomes
within a single cell vary morphologically (in size and shape), but the variations
remain constant among all cells of all members of a single species. (We now know
that exceptions to the above generalizations occur and that the exceptions are often
causes or symptoms of disease.)
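The relation between the diploid and haploid numbers quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the chromosome counts are the ones given in the text, while the names `DIPLOID`, `HAPLOID`, and `haploid_number` are hypothetical and chosen here for the example.

```python
# Diploid chromosome numbers quoted in the text for somatic cells.
DIPLOID = {
    "Homo sapiens": 46,
    "Drosophila melanogaster": 8,
    "Pisum sativum": 14,
    "Mus musculus": 40,
}


def haploid_number(diploid):
    """Return the gamete (haploid) count: exactly one-half the diploid number."""
    if diploid % 2:
        # The text states that the diploid number is always even.
        raise ValueError("a diploid number must be even")
    return diploid // 2


HAPLOID = {species: haploid_number(n) for species, n in DIPLOID.items()}

for species, n in HAPLOID.items():
    print(f"{species}: {n}")
```

The printed haploid numbers are 23, 4, 7, and 20, matching the gamete counts listed in the paragraph.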
The morphological differences among the metaphase chromosomes of a species led to
recognition that metaphase chromosomes occur as morphologically indistinguishable
(homologous) pairs. Although the members of a pair of homologous metaphase
chromosomes are indistinguishable by any low-resolution physical technique, they do
differ, as we now know, in fine details of the nucleotide sequences of their constituent
DNA molecules. The occurrence of metaphase chromosomes as morphologically
indistinguishable pairs is due to the occurrence of chromosomes themselves as
homologous pairs, pairs whose constituent DNA molecules have nearly identical
nucleotide sequences.
An exception to the occurrence of chromosomes as homologous pairs should be noted.
Males of some species, including all mammals and Drosophila melanogaster, possess
two chromosomes, called the X and Y chromosomes, that do not form a homologous
pair, the Y chromosome generally being much smaller than the X chromosome.
Females of such species possess two X chromosomes, each of which is homologous
to the other and to the X chromosome of the male. Collectively, the X and Y
chromosomes are called sex chromosomes; the remaining chromosomes are called
autosomes. In the case of humans and other placental mammals, the presence of a
Y chromosome is necessary for maleness (the presence of testes), but in the case of
other species, including D. melanogaster, the presence of a Y chromosome, although
necessary for fertility, is not necessary for maleness.
Also observed during the late nineteenth century were microscopic details of cell
division and the effect of cell division on chromosomes. Mitosis, the type of
cell division undergone by all somatic cells other than the immediate precursors
of gametes, was found to yield two daughter somatic cells with the same diploid
number of chromosomes as the mother cell (see "Mitosis"). Furthermore, the
German zoologist Theodor Heinrich Boveri (1862-1915) found that the metaphase
chromosomes of a mother cell and a daughter cell had the same morphologies.
Those observations indicated that each chromosome in the mother cell is somehow
duplicated before the cell undergoes mitosis.
Meiosis, the type of cell division undergone by the precursors of gametes, was
found to be a much more complex process than mitosis. It involves two successive
cell divisions and can yield four gametes, each containing one-half the number
of chromosomes as the precursor cell. (Thus meiosis also must be preceded by
chromosome duplication.) Furthermore, the haploid set of chromosomes in each
gamete is not a haphazard selection from the diploid set of the mother cell. Instead
each gamete is endowed with a randomly selected member of each pair of homologous
chromosomes in the mother cell (see "Meiosis"). That is, the probability of a gamete's
being endowed with one member of a pair of homologous chromosomes is the
same as the probability of its being endowed with the other member, and, equally
important, the outcome of its endowment with a member of one pair of homologous
chromosomes has no effect on the outcome of its endowment with a member of
another pair. In other (and more arcane) words, meiosis equally segregates each
pair of homologous chromosomes and independently assorts the complete set of
homologous chromosomes.
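Equal segregation and independent assortment can be made concrete by brute-force enumeration. The sketch below is an illustration only, assuming a hypothetical organism whose homologous pairs are labeled 1/1', 2/2', and so on; the function names are invented for this example. It lists every gamete such an organism can form and, for the self-fertilizing case, counts how many egg-sperm unions restore the complete parental chromosome set.

```python
from itertools import product


def gametes(n_pairs):
    """Enumerate every gamete an organism with n_pairs homologous pairs can form.

    Each pair i contributes either member "i" or member "i'" (equal
    segregation), and the choice made for one pair places no constraint
    on the choice for any other pair (independent assortment).
    """
    pairs = [(f"{i}", f"{i}'") for i in range(1, n_pairs + 1)]
    return [frozenset(choice) for choice in product(*pairs)]


def parental_offspring_after_selfing(n_pairs):
    """Return (matches, total): unions that rebuild the parental set, and all unions."""
    types = gametes(n_pairs)
    parental = frozenset().union(*types)  # both members of every pair
    matches = sum(
        1 for egg in types for sperm in types if egg | sperm == parental
    )
    return matches, len(types) ** 2


print(len(gametes(2)))                      # 4 gamete types for two pairs
print(parental_offspring_after_selfing(2))  # (4, 16)
print(2 ** 23)                              # 8388608 gamete types for 23 pairs
```

Because every gamete type is equally probable, the fraction 4/16 = 1/4 is also the probability that a self-fertilized offspring carries the parental set when there are two pairs. Enumeration is practical only for small numbers of pairs; for twenty-three pairs the closed form 2**23 is used instead.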
The X chromosome and the Y chromosome of a male also were found to segregate
equally during meiosis, even though they are not homologous in the sense of
being physically indistinguishable. That fact implies that a male produces two
equally probable sperm types, one containing a Y chromosome and the other an X
chromosome. Thus fertilization of an egg by a sperm results in two equally probable
combinations of sex chromosomes, XY and XX.
The equal segregation and independent assortment of chromosomes during meiosis
leads to diversity among the chromosome sets of the offspring of sexually reproducing
organisms. Consider, for example, an organism that possesses but two pairs of
homologous chromosomes denoted by 1 and 1' and 2 and 2'. Such an organism
produces, with equal probability, four types of gametes, those containing 1 and 2,
1 and 2', 1' and 2, and 1' and 2'. If the organism is self-fertilizing (as are many
plants and lower animals), then of the sixteen possible types of offspring, only four
possess a set of chromosomes identical to the parental set. In contrast, bacteria
reproduce asexually by a type of cell division that, like mitosis, yields only genetic
replicas of the mother cell. (Bacteria are not, however, genetically immutable, since
various mechanisms can effect changes in their genetic material, which are then
transmitted to their offspring.) In general, if a sexually reproducing organism has
N pairs of homologous chromosomes, it can produce $2^N$ types of gametes, and
if it is self-fertilizing, only $2^N$ of the $2^{2N}$ possible types of offspring possess a
set of chromosomes identical to the parental set. In other words, the probability
of an offspring's possessing a set of chromosomes identical to the parental set is
$1/2^N$. When N equals twenty-three, that probability equals 1/8,388,608, a very
small number. The probability of human parents producing an offspring with a set of
chromosomes identical to that of either parent is even closer to zero, since although
humans do possess twenty-three pairs of equally segregating and independently
assorting chromosomes, they are not of course self-fertilizing. Discussed later is
a process that leads to even more differences among the chromosome sets of sexually
Mitosis is the type of
cell division that
produces two daughter
cells from a single
mother cell. Each
daughter cell has a set
of chromosomes
identical to the set
possessed by the
mother cell. Mitosis is
the mechanism whereby
a multicellular organism
increases in size and
replaces dead cells and
whereby single-celled
eukaryotic organisms
reproduce asexually.
The interested reader
can find a striking series
of photomicrographs of
mitosis in the lily
Haemanthus katherinae
on page 7 of Genes and
Genomes: A Changing
Perspective by Maxine
Singer and Paul Berg
(University Science
Books, 1991).
[Figure labels: mother cell, centrosome, nuclear membrane, homologous chromosome pair, centromere, sister-chromatid pair, mitotic spindle, microtubule, daughter cells.]
INTERPHASE
G1: During G1 (see "The Eukaryotic Cell Cycle") the chromosomes of
the mother cell are very long and very thin. Only two of the cell's N pairs
of homologous chromosomes are shown, and the members of each
homologous pair are depicted in different shades of the same color. The
centrosome is the source of fibrous proteins called microtubules. One
function of microtubules is to direct the motion of chromosomes during
mitosis (and meiosis).
G2: The mother cell has replicated its complement of chromosomes
(during the preceding S phase) and all other cellular material required
for cell division, including the centrosome. The two identical copies of
each chromosome are bound together along their centromeres into a
so-called sister-chromatid pair.
MITOTIC PHASE
Prophase
The onset of mitosis is signaled by the ordered compaction, or conden-
sation, of chromosomes into microscopically visible threads. Microtu-
bules radiating from the two centrosomes collectively compose the
mitotic spindle.
Prometaphase
The chromosomes have condensed further, and the centrosomes have
migrated to opposite sides of the cell. Disintegration of the nuclear
membrane has allowed microtubules to bind to each chromosome at a
region within its centromere.
Metaphase
The chromosomes have assumed their most condensed configuration,
and the sister-chromatid pairs have assumed the familiar X shape.
Under the influence of opposing forces exerted by microtubules radiat-
ing from both centrosomes, each sister-chromatid pair has become
aligned along the midplane of the cell.
Anaphase
The bond joining each sister-chromatid pair has broken, and the
members of each former sister-chromatid pair have begun moving
toward opposite sides of the cell. As a result, a set of chromosomes
identical to the set initially possessed by the mother cell becomes
segregated in each side of the cell. The cell has begun to elongate and
narrow at the midplane.
Telophase
A new nuclear membrane has formed around each segregated set of
chromosomes, the chromosomes have begun to decondense, and the
cell has begun to divide.
INTERPHASE
G1: Cleavage of the extranuclear cellular material has produced two daughter cells,
and the chromosomes in each have decondensed further in preparation for the
biosynthetic activities of G1.
Meiosis is the type of cell division that produces the gametes (eggs and sperms) whose
union is the first step in the creation of a new human or other sexually reproducing
organism. Only so-called germ-line cells undergo meiosis, and each gamete contains a
haploid set of chromosomes, that is, a set composed of one member of each of the N pairs
of homologous chromosomes possessed by the diploid germ-line cell. The transition from
diploidy to haploidy is accomplished by two successive partitions of nuclear material.
During each partition the motions of the chromosomes are directed, as they are during
mitosis, by microtubules radiating from two centrosomes.
[Figure labels: germ-line cell, centrosome, nuclear membrane, homologous chromosome pair,
centromere, sister-chromatid pair, crossing over, gametes]
PREMEIOTIC PHASE
The germ-line cell, which may be an oogonium in an ovary or a spermatogonium in a
testis, appears little different from a somatic cell in G1. Only two of the germ-line
cell's N pairs of homologous chromosomes are shown, and the members of each homologous
pair are depicted in different shades of the same color.
The germ-line cell has replicated its complement of chromosomes and all other cellular
material required for cell division, including the centrosome. The two identical copies
of each chromosome are bound together along their centromeres into a sister-chromatid
pair.
MEIOTIC PHASE
Prophase I
The onset of meiosis is signaled by a limited condensation of chromosomes. Homologous
sister-chromatid pairs have become closely associated, forming N tetrads and allowing
"crossing over" to occur, here within only one tetrad. Crossing over results in the
exchange of corresponding portions of homologous chromosomes. The germ-line cell now
lingers in prophase I for a time that ranges, depending on the species, from a few days
to many years.
Metaphase I
The germ-line cell has passed through prometaphase I (not shown) and has entered
metaphase I. The chromosomes have fully condensed, and the tetrads have become aligned
along the midplane of the cell.
Anaphase I
The members of each tetrad have separated and begun moving toward opposite sides of the
cell. Depicted here is but one of the 2^N possible outcomes of the motion of the members
of the N tetrads. The equal probability of each possible outcome is the physical basis
for Mendel's laws of equal segregation and independent assortment.
Prophase II
The germ-line cell has passed through telophase I (not shown) and has divided into two
cells, each of which has entered prophase II. Note that the products of the first meiotic
division, like the products of mitosis, have the same number of chromosomes as the
original cell. However, a product of mitosis contains N homologous chromosome pairs,
whereas a product of the first meiotic division contains two identical copies of each of
N nonhomologous chromosomes.
Anaphase II
Both cells have passed through prometaphase II and metaphase II (not shown). Each
sister-chromatid pair has separated, and the members of each former sister-chromatid pair
have begun migrating to opposite sides of the cell.
POSTMEIOTIC PHASE
Each cell has passed through telophase II (not shown) and divided into two gametes. Thus
each meiosis can yield four gametes. However, meiosis of an oogonium usually yields only
one egg because each division of extranuclear material usually yields only one cell that
survives because it receives most of the extranuclear material.
reproducing organisms and their offspring: the "crossing over" that occurs between
homologous chromosomes during the first stage of meiosis (see "Meiosis"). Together,
crossing over and equal segregation and independent assortment essentially guarantee
that in the whole history of Homo sapiens, no two individuals (except the pairs of
identical twins arising from single fertilized eggs) have been alike genetically.
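The scale of this variety can be sketched with a little arithmetic. The Python sketch below counts only the chromosome combinations produced by independent assortment, 2^N per parent, and ignores crossing over, which adds further variety. The value N = 23 for humans is an assumption supplied for the illustration; it is not stated in this passage.

```python
# Count the chromosome combinations that independent assortment alone can produce.
# Assumption: N = 23 homologous pairs (the human value); crossing over is ignored.


def gamete_combinations(n_pairs: int) -> int:
    """Each homologous pair contributes one of its two members to a gamete."""
    return 2 ** n_pairs


def zygote_combinations(n_pairs: int) -> int:
    """A zygote joins one gamete from each parent, so the two counts multiply."""
    return gamete_combinations(n_pairs) ** 2


N = 23
print(gamete_combinations(N))  # 8388608
print(zygote_combinations(N))  # 70368744177664
```

Even without crossing over, a single pair of parents could in principle produce some 70 trillion distinct chromosome combinations.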
The facts that accumulated about chromosomes and their behavior during mitosis
and meiosis suggested that the link between generations (of cells or organisms)
was a substance present in chromosomes. In 1896 the American cell biologist
Edmund Beecher Wilson (1856-1939) suggested that the substance of inheritance
was the "nuclein" isolated in 1874 by the Swiss chemist Johann Friedrich Miescher
(1844-1895) from the nuclei of human pus cells and salmon sperms. Nuclein
was found to be composed of two types of chemicals, a nucleic acid and various
"albumins," or proteins. By the end of the century, the most advanced thinkers
about the mechanism of inheritance, such as Wilson, Boveri, and August Friedrich
Leopold Weismann (1834-1915), were of the opinion that nuclein was the stuff of
inheritance.
A Theory of Inheritance. The nineteenth century was the setting also for the elegant
work of the Austrian Gregor Johann Mendel (1822-1884), an Augustinian monk
better versed in mathematics and physics than in biology. In 1865 Mendel published
visionary explanations for the results of his plant-breeding experiments. Among
them was the notion that discrete units of heredity (which he called Merkmale and
we call genes) are passed unchanged from generation to generation even though
each unit is not necessarily expressed as an observable trait in every generation.
He also proposed that each plant possesses two such units for each observable
trait, one inherited from its male parent and the other from its female parent.
Mendel developed statistical laws for predicting how the paired units of heredity
are parceled out to offspring. The laws are now known to be applicable (within
certain limits) to all sexually reproducing organisms. Furthermore, Mendel's laws
parallel the behavior of homologous chromosome pairs during meiosis (the equal
segregation of a single chromosome pair and the independent assortment of different
chromosome pairs) because, as we now know, Mendel's units of heredity reside on
chromosomes. Remarkably, Mendel deduced his theory before chromosomes were
identified as the probable carriers of genetic information. His proposals are discussed
here out of chronological order because their significance to the emerging science
of genetics was not grasped-and probably could not have been grasped-until after
the observed behavior of chromosomes during meiosis could provide a physical basis
for his abstract theory. Mendel's publication remained unknown, in fact, until 1900
when, working independently, the German botanist Karl Erich Correns (1864-1933),
the Dutch botanist Hugo De Vries (1848-1935), and the Austrian botanist Erich
Tschermak von Seysenegg (1871-1962) performed similar experiments, arrived at
similar explanations, and brought Mendel's publication to light, garnering him well-
deserved albeit posthumous fame.
To best appreciate Mendel's work, one needs to know something about the successes
and shortcomings of previous efforts at selective breeding of plants and animals.
Selective breeding was certainly well under way in the Neolithic period, and numerous
early successes produced most of the strains of domestic plants and animals now in
existence. Some of the plant-breeding efforts led to plants so different from their
ancestral relatives that they can be considered human-made species. Notable examples
are today's Zea mays (maize, or corn) and Solanum tuberosum (the potato plant).
Natives of present-day Mexico began developing maize from tiny-eared relatives
between 4000 and 5000 years ago, and the pre-Columbian inhabitants of present-
day Peru and Bolivia developed a plant producing palatable tubers from relatives
producing tubers so bitter as to be inedible. When introduced into the Old World
in the sixteenth century, maize and the potato had a tremendous influence on the
world's economy. The potato, for example, replaced wheat and rye in the cool
areas of northern Europe as a staple food because it produces more calories per acre.
(Only rice is as efficient a calorie-producer as the potato, and rice is a warm-climate
plant.) The introduction of maize and the potato is thought by some historians to
have significantly accelerated the great increase in the rate of population growth of
western Europe that began in about the fourteenth century.
Successful as the early breeding efforts were, and those of the noted eighteenth-
century plant breeders Josef Gottlieb Koelreuter (1733-1806) and Joseph Gaertner
(1732-1791), they certainly were not what we would now call scientific, since in
general the outcomes of breedings were quite unpredictable. In contrast, Mendel's
aim at the outset of his eight-year effort was to ascertain the statistical rules governing
the inheritance of variable traits. Both his methodology and his theoretical conclusions
are the foundation for all future studies in genetics.
Mendel chose to work with a plant that exhibits distinct variants of a number of
traits, the garden pea (Pisum sativum). He concentrated on two variants of each
of seven traits, including pod color (green and yellow) and flower color (violet and
white). His unique experimental approach began by allowing plants that bore, say,
green pods to self-pollinate for a sufficient number of generations to assure that each
new generation of self-pollinated plants would also bear only green pods. Since
each of the fourteen purebred strains consistently bore only one variant of a
single trait, the purebred strains were advantageous to Mendel's work, providing a
certain and observable starting point and amounting, essentially, to a control on his
experiments. Mendel proceeded to study the inheritance of each of the seven traits,
first one at a time and then in pairs. All of the experiments on the inheritance of
single traits followed the same pattern as that described here for pod color.
First, Mendel cross-pollinated the two strains purebred for pod color, the strain bred
true for green pods and the strain bred true for yellow pods. (Together the two
purebred strains are called the parental generation.) Regardless of which strain he
used as the male (pollen-contributing) parent, all the resulting offspring (called here
hybrids or members of the first generation) bore only green pods. Today we would
say that all members of the first generation exhibited the same phenotype, a term
introduced in 1909 by the Danish botanist Wilhelm Ludwig Johannsen (1857-1927).
Symbolically,
parental generation → first generation,
and in particular,
purebred green × purebred yellow → hybrids, all green.
The natural question to ask is: Has the capacity to produce the yellow-pod phenotype
disappeared altogether, or is it still present but somehow suppressed in the first-
generation hybrids? To find out, Mendel selfed the hybrids (that is, he allowed them
to self-pollinate), and he observed that the yellow-pod phenotype reappeared among
the resulting offspring (the second generation). When Mendel counted the number
of second-generation offspring exhibiting each phenotype (a novel procedure at the
time), he found that the ratio of green-podded plants to yellow-podded plants was
approximately 3 to 1. Symbolically,
first generation → second generation,
and in particular,
green hybrid × green hybrid → 3 green : 1 yellow.
To find out whether any members of the second generation had the capacity to produce
offspring with the phenotype they themselves did not exhibit, Mendel selfed the
members of the second generation. He found that all the yellow-podded members
behaved like plants purebred for yellow pod color; that is, they produced only yellow-
podded offspring. In contrast, only one-third of the green-podded members of the
second generation behaved like plants purebred for green pod color, whereas the
remaining two-thirds behaved like the first-generation hybrids, producing both green-
and yellow-podded progeny in the ratio of 3 to 1. In other words, the ratio 3 green : 1
yellow exhibited by the second generation is more accurately described as the ratio 1
pure green : 2 hybrid green : 1 pure yellow. Mendel continued selfing the green-podded
members of successive generations and always found that approximately two-thirds
of the green-podded progeny of green hybrids were again green hybrids, behaving
just like the first-generation hybrids. That is, when those two-thirds were allowed to
self-pollinate, they produced green- and yellow-podded progeny in the approximate
ratio of 3 to 1.
To explain the mathematical regularity of his results, Mendel advanced a theoretical
model of inheritance. First, and most basic, is the idea that the fertilized egg (zygote)
from which a plant develops contains two genes, or units of heredity, for pod color,
one contributed by the egg and the other contributed by the sperm. ("Gene" is another
term coined by Johannsen.) Mendel also proposed that there were two distinct genes
for pod color, one for green and one for yellow. The gene for green pod color he
called dominant (and designated it by a capital letter, say P) because any plant that
carried that gene bore green pods. The gene for yellow pod color he called recessive
(and designated it by a lowercase letter, p). Today we say P and p are different forms,
or alleles, of the gene for pod color. Since the egg and sperm each contain only one
allele, a fertilized egg contains one of three possible allele pairs (or possesses one of
three possible genotypes, another word coined by Johannsen): PP, Pp, or pp. Mendel
proposed that the plants purebred for green pod color contained the pair PP, those
purebred for yellow pod color contained the pair pp, and the hybrid plants, which bore
only green pods but produced both green- and yellow-podded progeny when allowed to
self-pollinate, contained the pair Pp. In modern terminology plants possessing the
genotype PP are said to be homozygous dominant; those possessing the genotype pp are
homozygous recessive; and those with the genotype Pp are heterozygous. This terminology
and other nomenclature of genetics are illustrated in the table.
Trait          Phenotypes           Genotypes                   Alleles                       Gene
Pod color      Green (dominant)     PP (homozygous dominant)    P (dominant), p (recessive)   Pod-color gene
                                    Pp (heterozygous)
               Yellow (recessive)   pp (homozygous recessive)
Flower color   Violet (dominant)    FF (homozygous dominant)    F (dominant), f (recessive)   Flower-color gene
                                    Ff (heterozygous)
               White (recessive)    ff (homozygous recessive)
With those hypotheses and the laws of probability Mendel constructed a probabilis-
tic model that explained the results of his experiments. The model is shown in
"Mendelian Genetics." The element of chance is operative in both the formation of
gametes (eggs and sperms) and in the formation of zygotes (fertilized eggs). Mendel
assumed that during the formation of gametes, the pair of alleles for pod color sepa-
rates (or segregates) equally; in other words, the probability that a gamete will receive
one or the other of the pair is equal to one-half. He therefore predicted correctly that
among the gametes produced by a green hybrid (a plant heterozygous for pod color),
approximately one-half would contain P and the remainder would contain p. Be-
cause, as is now known, each member of the allele pair for a given trait resides at
the same location on one or the other of a pair of homologous, equally segregating
chromosomes, only one allele enters each gamete. Therefore, the behavior of a single
allele pair during meiosis is known as Mendel's law of equal segregation.
The element of chance is also operative in the random union of an egg and a sperm to
form a zygote with a particular genotype. For example, in the formation of offspring
of the green hybrids, the probability of forming a zygote with the genotype PP, call
it Pr(PP), is the joint probability of two independent events, namely, the probability
that an egg contains P, and the probability that a sperm contains P. Since the joint
probability is the product of the probabilities of the two independent events, we can
write Pr(PP) = Pr(P)Pr(P).
Mendel applied this rule to predict the probability of finding a given genotype among
the progeny of the green hybrids. Since green hybrids produce gametes containing
P or p, each with a probability of 1/2, the eggs and sperms combine in four equally
probable ways to produce offspring with the genotypes PP, Pp, pP, or pp, and the
probability of each of those genotypes is 1/2 times 1/2, or 1/4. Since Pp and pP
are equivalent genotypes (it doesn't matter whether a particular allele arrived with
the sperm or the egg), the probabilities for Pp and pP are added to predict that the
probability of an offspring's having the genotype Pp is 1/2. In other words, the three
possible genotypes occur in the ratio 1 PP : 2 Pp : 1 pp. Translating the genotypes into
phenotypes yields the ratio 3 green : 1 yellow in agreement with Mendel's observations.
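The multiplication and addition of probabilities described above can be sketched in Python. This is a minimal illustration rather than anything from Mendel's paper. The function names are hypothetical, genotypes are written as two-letter strings, and exact fractions are used so the ratios come out without rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """Equal segregation: each allele of the pair enters a gamete with probability 1/2."""
    probs = Counter()
    for allele in genotype:
        probs[allele] += Fraction(1, 2)
    return dict(probs)


def cross(egg_parent, sperm_parent):
    """Genotype probabilities among the offspring of two parents."""
    offspring = Counter()
    eggs = gametes(egg_parent).items()
    sperms = gametes(sperm_parent).items()
    for (egg, p_egg), (sperm, p_sperm) in product(eggs, sperms):
        # Sorting puts the capital letter first, so pP is pooled with Pp.
        genotype = "".join(sorted(egg + sperm))
        offspring[genotype] += p_egg * p_sperm  # joint probability of independent events
    return dict(offspring)


def pod_colors(genotype_probs):
    """Translate genotypes into phenotypes, with P (green) dominant over p (yellow)."""
    colors = Counter()
    for genotype, prob in genotype_probs.items():
        colors["green" if "P" in genotype else "yellow"] += prob
    return dict(colors)


second_generation = cross("Pp", "Pp")
colors = pod_colors(second_generation)
hybrid_share_of_green = second_generation["Pp"] / colors["green"]

for genotype, prob in sorted(second_generation.items()):
    print(genotype, prob)  # PP 1/4, then Pp 1/2, then pp 1/4
print(colors["green"], colors["yellow"])  # 3/4 1/4
print(hybrid_share_of_green)  # 2/3
```

The last value reproduces the observation that two-thirds of the green-podded members of the second generation behave like hybrids.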
Having explained the 3 green : 1 yellow ratio by advancing a general model, Mendel
went on to test the model by crossing green hybrids (genotype Pp) with plants
purebred for yellow pod color (genotype pp). He predicted that the offspring would
have the genotypes Pp and pp in the ratio 1 Pp : 1 pp and found, in agreement with the
model, that approximately one-half the progeny bore green pods and the remainder
bore yellow pods.
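Because the model is probabilistic, an actual test cross yields only approximately one-half green-podded progeny. The following Python sketch simulates such a cross by random sampling. The function name, the number of offspring, and the random seed are all arbitrary choices made for the illustration, not values from Mendel's experiments.

```python
import random


def simulate_test_cross(n_offspring, seed=0):
    """Simulate Pp x pp: count pod colors among randomly formed offspring."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = {"green": 0, "yellow": 0}
    for _ in range(n_offspring):
        from_hybrid = rng.choice("Pp")  # the hybrid parent contributes P or p, each with probability 1/2
        from_purebred = "p"             # the purebred yellow parent can contribute only p
        genotype = from_hybrid + from_purebred
        counts["green" if "P" in genotype else "yellow"] += 1
    return counts


counts = simulate_test_cross(10_000)
green_fraction = counts["green"] / sum(counts.values())
print(counts, round(green_fraction, 3))
```

With ten thousand simulated offspring the green fraction should lie close to 0.5. Smaller samples wander further from it, which is why large counts matter when comparing observations with the model.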
Mendel obtained similar results for all seven traits. In other words, he inferred the
existence of two alleles for each trait, one dominant and one recessive. However,
we now know that the alleles of a gene do not always exhibit a dominant-recessive
relationship. Sometimes the pairing of different alleles leads to a blend (for example,
pairing of the snapdragon alleles that specify white and red flowers leads to pink
flowers); sometimes it leads to simultaneous exhibition of both phenotypes (for
example, pairing of the human alleles that specify A and B blood types, which are
characterized by the presence of the antigens A and B, respectively, on the surface of
red blood cells, leads to AB blood type, which is characterized by the presence of both
antigens). However, the validity of Mendel's research and theoretical conclusions is
unaffected by the fact that he focused, presumably by chance, on traits controlled by
alleles that do exhibit the phenomenon of dominance.
Mendel next proceeded to study the co-inheritance of two traits, say pod color
(specified by dominant and recessive alleles P and p, respectively) and flower color
(specified by dominant and recessive alleles F and f, respectively). Again, he first
developed two purebred strains, one purebred for green pod color and violet flower
color (genotype PPFF) and the other purebred for yellow pod color and white flower
color (genotype ppff).
As before, Mendel cross-pollinated the purebred strains, thus producing dihybrid
offspring, each heterozygous for both traits. He selfed the resulting first dihybrid
generation to produce the second dihybrid generation. Each member of the first
dihybrid generation exhibited both dominant phenotypes; that is, they bore green
pods and violet flowers. Members of the second dihybrid generation exhibited four
composite phenotypes in a 9:3:3:1 ratio, as shown below.
Possible Phenotypes among          Fraction Exhibiting
Second Dihybrid Generation         Phenotype
green pods, violet flowers         9/16
green pods, white flowers          3/16
yellow pods, violet flowers        3/16
yellow pods, white flowers         1/16
Note that the ratio of green- to yellow-podded members of the second dihybrid
generation was still 3 to 1, just as it was in the second generation produced by the
experiments on pod color alone. The ratio of violet- to white-flowered members of
the second dihybrid generation also was 3 to 1. Mendel realized that the 9:3:3:1
ratio resulted from multiplicative combinations of the two 3:1 ratios. He therefore
concluded that the phenotypes for the two traits are inherited independently. In other
words, the probability of each composite phenotype is the product of the probabilities
of the two "component" (single-trait) phenotypes. For example, the probability that
a second-dihybrid-generation member will bear green pods and white flowers (3/16)
is the product of the probability of its bearing green pods (3/4) and the probability
of its bearing white flowers (1/4).
The independent inheritance of the two traits implies that when members of the
first dihybrid generation produce gametes, segregation of the alleles for pod color is
independent of the segregation of the alleles for flower color. In other words, the
two allele pairs assort independently. The members of the first dihybrid generation
have the genotype PpFf, so each gamete receives P or p with a probability of 1/2
and F or f with a probability of 1/2. Since the segregation of each allele pair is
an independent event, the individual probabilities are multiplied to predict that the
probability of forming each of the four possible types of gametes, those containing
PF, Pf, pF, or pf, is 1/2 times 1/2, or 1/4.
Random fertilization of eggs by sperms produces the sixteen genotypes shown in the
probability table for the second dihybrid generation in "Mendelian Genetics." Each
has a probability of 1/4 times 1/4, or 1/16. The composite phenotype corresponding to
each genotype is also shown. Counting the number of times each phenotype appears
yields the 9:3:3:1 ratio observed by Mendel.
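The counting argument can be checked by brute-force enumeration. The Python sketch below builds the four gamete types of a PpFf dihybrid, pairs every egg with every sperm, and tallies the composite phenotypes. The function names are hypothetical, and the genotype string is assumed to list the pod-color pair first and the flower-color pair second.

```python
from collections import Counter
from itertools import product


def gamete_types(genotype):
    """For 'PpFf', return the four equally probable gametes PF, Pf, pF, pf."""
    allele_pairs = [genotype[i:i + 2] for i in range(0, len(genotype), 2)]
    return ["".join(combo) for combo in product(*allele_pairs)]


def composite_phenotype(egg, sperm):
    """Dominance: one P gives green pods, one F gives violet flowers."""
    pod = "green" if "P" in (egg[0], sperm[0]) else "yellow"
    flower = "violet" if "F" in (egg[1], sperm[1]) else "white"
    return pod, flower


dihybrid_gametes = gamete_types("PpFf")
# Every egg-sperm pairing is equally probable (1/16), so plain counts give the ratio.
tally = Counter(composite_phenotype(egg, sperm)
                for egg, sperm in product(dihybrid_gametes, dihybrid_gametes))

for phenotype, count in tally.most_common():
    print(phenotype, f"{count}/16")
```

The tally comes to 9 green-violet, 3 green-white, 3 yellow-violet, and 1 yellow-white out of sixteen pairings.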
The physical basis for Mendel's law of independent assortment is the independent
assortment of the various different pairs of homologous chromosomes during meiosis.
MENDELIAN GENETICS
Mendel's experiments on the inheritance
of single traits and pairs of traits, illustrated
here, led him to postulate the concept of
discrete, particulate units of heredity that
pass unchanged from generation to gen-
eration. He studied seven traits (character-
istics) of the garden pea, each of which
exhibited two alternative forms. For example,
pod color could be either green or yellow,
and flower color could be either violet or
white. As described in the main text, Mendel
found that one form of each trait was domi-
nant and the other recessive and that the
progeny of controlled breedings exhibited
one form or the other in definite ratios. The
observed mathematical regularities led to
the model of inheritance described here.
Mendel knew that his plants reproduced
sexually, but he did not know that chromo-
somes exist nor that the number of chromo-
somes was reduced by one-half during the
formation of gametes. As a result his termi-
nology was rather imprecise. He did not
clearly distinguish the form of a trait from the
units of heredity whose actions determine
the trait. That distinction was made almost
half a century later by Johannsen, who
coined the term gene for the particulate
units of heredity, the term genotype for the
genes whose action determines a trait, and
the term phenotype for the form of the trait
determined by the genotype. The more pre-
cise terminology is used in the following
description of Mendel's model and in the
accompanying figures.
Mendel's model of inheritance includes four
postulates.
1. Each plant contains a pair of genes for
each trait; that is, the genotype for a trait is
specified by a pair of genes.
2. During the formation of gametes, the
gene pair for a trait segregates equally; that
is, the genes in the pair are parceled out to
the gametes in a fashion such that each
gamete receives only one member of the
pair and has an equal chance of receiving
either member of the pair (the law of equal
segregation).
3. A gene has two forms, or alleles, desig-
nated by, say, A and a. Only plants with the
genotype aa (homozygous for a) exhibit the
recessive phenotype. A plant with the geno-
type AA (homozygous for A) or the geno-
type Aa (heterozygous) exhibits the domi-
nant phenotype.
4. During the formation of gametes, segre-
gation of the gene pair for any one trait is
independent of the segregation of the other
gene pairs. Consequently a plant heterozy-
gous for two traits (genotype AaBb) pro-
duces gametes containing AB, Ab, aB, and
ab with equal probability (the law of inde-
pendent assortment). Note that the law of
independent assortment holds only if the
genes for the different traits are on different
pairs of homologous chromosomes.
Mendel's laws of equal segregation and
independent assortment can be applied in
two ways. If one knows the genotypes of
both parents, one can predict the probability
of the genotype of a future offspring. Or,
working backward, if one observes in exist-
ing offspring the approximate ratios of phe-
notypes predicted by Mendel's laws, one
can often infer the genotypes of the parents,
just as Mendel did.
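The "working backward" use of the laws can be sketched as follows. This Python example is a crude nearest-ratio heuristic, not a statistical test, and it is offered only as an illustration. The candidate crosses use the A and a notation of postulate 3, the function name is hypothetical, and the offspring counts in the usage lines are made-up numbers.

```python
from fractions import Fraction

# Expected fraction of offspring showing the dominant phenotype for each kind of cross,
# as predicted by equal segregation and dominance.
EXPECTED_DOMINANT_FRACTION = {
    "AA x (any genotype)": Fraction(1),
    "Aa x Aa": Fraction(3, 4),
    "Aa x aa": Fraction(1, 2),
    "aa x aa": Fraction(0),
}


def infer_parental_cross(n_dominant, n_recessive):
    """Return the cross whose predicted ratio lies closest to the observed counts."""
    total = n_dominant + n_recessive
    if total == 0:
        raise ValueError("at least one offspring is needed")
    observed = Fraction(n_dominant, total)
    return min(EXPECTED_DOMINANT_FRACTION,
               key=lambda c: abs(EXPECTED_DOMINANT_FRACTION[c] - observed))


# Hypothetical counts, chosen only to exercise the function.
print(infer_parental_cross(74, 26))  # Aa x Aa
print(infer_parental_cross(52, 48))  # Aa x aa
```

With few offspring the nearest ratio can easily be misleading, which is one reason such inferences are described as often, rather than always, possible.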
Mendel's Experiments on Inheritance of One Trait (Pod Color)
Methodology
Step 1: Cross-pollinate two strains of peas, one purebred for green pod color, the other purebred for yellow pod color. Result: All
first-generation hybrids bear green pods.
Step 2: Self-pollinate the green hybrids. Result: Second-generation plants bear either green or yellow pods in the approximate ratio of 3
green to 1 yellow. Further selfing shows that half the second generation (or two-thirds of the green-podded members) are hybrids.
Theoretical Model
[Figure labels: parental generation (purebred strains); meiosis; gametes; probability of
each gamete type; cross-pollinate; first generation (green hybrids); meiosis; gametes;
probability of each gamete type; self-pollinate; eggs; sperms; second generation]
Mendel assumed that each plant contains a pair of genes for pod
color. Therefore, each purebred parent is homozygous; that is,
each contains two identical genes for pod color.
P = green-pod-color allele
p = yellow-pod-color allele
Since a fertilized egg results from the union of two gametes, each
gamete contains one allele for pod color.
Because all first-generation offspring bore green pods, Mendel called
green the dominant pod color and yellow the recessive pod color.
Mendel inferred that whenever P, the allele for the dominant pod
color, is present, the plant bears green pods (the law of dominance).
Mendel inferred that the pair Pp segregates equally into the
gametes; that is, each gamete (whether egg or sperm) receives P or
p with equal probability of one-half (law of equal segregation).
Random union of eggs and sperms produces four possible combina-
tions of alleles in the offspring. As shown by the table, the probabili-
ties of each gamete type are multiplied to yield the probabilities of
the four possible genotypes in the second-generation offspring.
Since Pp and pP are equivalent genotypes, the probabilities of each
are added to yield a probability of one-half for the genotype Pp.
Mendel's model predicts, for members of the second generation,
phenotypes in the ratio 3 green : 1 yellow (in agreement with
Mendel's observations) and genotypes in the ratio 1 PP : 2 Pp : 1 pp.
Number 20 1992 Los Alamos Science 23
Mendel's Experiments on Inheritance of Two Traits (Pod Color and Flower Color)
Methodology
Step 1: Cross-pollinate two strains of peas, one purebred for the two dominant phenotypes (green pods and violet flowers), the other
purebred for the two recessive phenotypes (yellow pods and white flowers). Result: All first-generation dihybrids bear green
pods and violet flowers.
Step 2: Self-pollinate the first-generation dihybrids. Result: Second-generation plants exhibit four composite phenotypes (pod color,
flower color) in the ratio of 9 (green, violet) : 3 (yellow, violet) : 3 (green, white) : 1 (yellow, white).
Theoretical Model
[Figure labels: parental generation (strains purebred for two traits); meiosis; gametes;
probability of each gamete type; cross-pollinate; first generation (green-pod and
violet-flower dihybrids); meiosis; gametes; probability of each gamete type; eggs; sperms;
second generation]
Each purebred parent is homozygous for both pod color and flower color.
P = green-pod-color allele
p = yellow-pod-color allele
F = violet-flower-color allele
f = white-flower-color allele
Each gamete carries only one gene for each trait.
All first-generation (dihybrid) offspring bear violet flowers and green
pods, the dominant phenotypes, in agreement with the law of
dominance.
Independent equal segregation of each allele pair (Pp and Ff)
produces gametes containing one of four equally probable
combinations of alleles (law of independent assortment).
Random union of eggs and sperms produces offspring containing
one of sixteen equally probable combinations of alleles. All are
equally probable because all gamete types are equally probable.
The sixteen combinations reduce to nine different genotypes and
four different composite phenotypes, which are predicted from the
probability table to occur in the ratio 9:3:3:1 in agreement with
Mendel's observations.
Therefore, the law applies only if the allele pairs for the two traits reside on different
pairs of homologous chromosomes. In fact, deviations from Mendelian predictions
for the co-inheritance of two traits are evidence that the two traits are specified by
allele pairs that reside on the same pair of homologous chromosomes.
This discussion of Mendel's theory of inheritance ends with two points of note. First,
although the theory is now known to be applicable to humans as well as to pea plants,
it is unlikely that it could have been deduced from data about the outcomes of human
breedings. As subjects of inheritance studies, humans pose several disadvantages:
The controlled breeding of humans is generally regarded as inappropriate and would
be difficult to achieve even if it were not; each pair of human parents typically
produces too few data (offspring) for analysis of the sort required; and the rate
at which humans produce offspring is too slow to suit most experimenters' taste.
Moreover, many human traits are specified not by a single allele pair but by many
allele pairs.
The second point of note concerns the utility of Mendel's theory as a predictive tool,
particularly for human breedings. The theory can be applied directly only to traits
determined by a single allele pair. Such traits are called Mendelian traits because they
are inherited in accordance with Mendel's laws. Most Mendelian traits of humans are
disorders-some mild, some grave-caused by the presence of a defective allele. To
determine the probability that an offspring will be affected by a Mendelian disorder
requires knowing the parental genotypes for the disorder and whether the disorder
is caused by a dominant or a recessive allele. The required genotypic information
for the parents can often be inferred from the phenotypes of their existing offspring
and of their parents, and information about whether the defective allele is dominant
or recessive can often be inferred from the pattern of inheritance of the disorder in
other families (see "Inheritance of Mendelian Disorders"). More than three thousand
human Mendelian disorders have been identified. One of the goals of the Human
Genome Project is to supply the tools necessary to isolate the causative alleles from
the vast quantity of human genetic material and to identify the defects in the alleles.
A Theory of Evolution. The nineteenth century brought not only the rise of cell
biology and the work of Mendel but also a growing acceptance of the fact of evolution,
of the creation of extant organisms by changes in the life forms that first populated
this planet. Belief in the ancient principle of the invariability of species waned, and
in its place came the conviction that new species had been and are being formed.
(A notable holdout to the idea of evolution was the eminent Harvard zoologist
Jean Louis Rudolphe Agassiz (1807-1873), who was what we would today call
a creationist.) The veering of scientific opinion toward evolution led to development
of a theory of evolution based on natural selection. Formulated independently by
Charles Robert Darwin (1809-1882) and Alfred Russel Wallace (1823-1913), the
theory was presented to the world first in a jointly authored short publication (1858)
and later in Darwin's classic book On the Origin of Species (1859). Crucial to
development of the theory were the observations that offspring resembled their parents
Number 20 1992 Los Alamos Science
Inheritance of Mendelian Disorders
Although some inherited disorders of humans are due to the
combined effects of multiple genes (multigenic disorders) or to the
combined effects of genes and the environment (multifactorial disor-
ders), a so-called Mendelian disorder is caused by a single defective
allele. Over 3000 Mendelian disorders are known. They range from
mild conditions such as red-green color blindness to life-threatening
diseases such as cystic fibrosis. Because the defective allele can be
either dominant or recessive and can reside on either an autosome
or a sex chromosome (in particular, the X chromosome-very few
genes reside on the small human Y chromosome), four types of
Mendelian disorders are possible: autosomal dominant, autosomal
recessive, X-linked dominant, and X-linked recessive. Each type of
disorder reveals itself through a distinctive pattern of inheritance in
a family pedigree. Illustrated here are the patterns for three of the four
types of Mendelian disorders.
Consider first the inheritance of an autosomal
dominant Mendelian disorder. Many
such disorders are expressed only in adult-
hood, including Huntington's disease,
neurofibromatosis, and polycystic kidney
disease. Shown in (a) are the equally
probable genotypes and the phenotypes of
the offspring of an affected father and an
unaffected mother (or of an affected mother
and an unaffected father). The genotype of
the affected father can be either DD or Dn,
where n is the nondefective recessive ver-
sion of the defective dominant allele D.
Because the father's having the genotype
DD is the less typical and less interesting
situation (all his offspring would be affected),
it is assumed in (a) that the father has the
genotype Dn. Because the mother is unaf-
fected, her genotype must be nn. The equal
segregation of chromosomes during meio-
sis implies that the offspring of such a mat-
ing can have one of two equally probable
genotypes: Dn or nn. Therefore the probability
of an offspring's being affected is 1/2.
Note carefully, though, that only in the limit
of an infinite number of offspring will the
ratio of affected to unaffected offspring be
equal to 1. Also shown in (a)
is the pedigree of a family
afflicted with hypercholesterolemia, a dominant
disorder that causes excess
levels of cholesterol in the
blood. A thirty-year-old white
male (II-4) suffered a myocardial
infarction, a type of
heart blockage, and was
then found to test positively
for hypercholesterolemia.
[Figure (a) Autosomal Dominant Disorder. Probabilistic prediction: genotypes shown are Dn, nn, Dn, nn; a fifty-fifty chance of inheriting the disorder. Observed pedigree: vertical inheritance pattern. Symbol key: affected, unaffected, carrier; female, male.]
Further tests indicated that
his sister (II-1) and his four children (III-6, III-7,
III-8, III-9) also had hypercholesterolemia.
In addition, a family history revealed that the
man's father (I-3) and uncle (I-1) both died
of myocardial infarctions before reaching
the age of fifty-five. Note that all of II-4's
children are affected by the disorder, an
outcome that is not inconsistent (although it
may appear to be) with the probabilistic
predictions based on the chromosome
theory of heredity. Note also that the dis-
ease appears in all three generations of the
pedigree; such a "vertical" pattern is char-
acteristic of dominant disorders.
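The probabilistic reasoning above can be checked with a short calculation. The following is a minimal sketch, not part of the original article: the function names `cross` and `prob_k_affected` are hypothetical, and the only assumptions are equal segregation of alleles and independence among offspring. It enumerates the Dn x nn mating and then asks how likely it is that all four children of an affected Dn parent are affected.

```python
from fractions import Fraction
from itertools import product
from math import comb


def cross(parent1, parent2):
    """Return {genotype: probability} for a one-gene cross.

    Each parent is a two-character string of alleles, for example "Dn".
    Every allele of a parent is transmitted with probability 1/2.
    """
    probabilities = {}
    for a, b in product(parent1, parent2):
        genotype = "".join(sorted((a, b)))  # "nD" and "Dn" are the same genotype
        probabilities[genotype] = probabilities.get(genotype, 0) + Fraction(1, 4)
    return probabilities


def prob_k_affected(k, n, p):
    """Binomial probability that exactly k of n independent offspring are affected."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Affected father (Dn) and unaffected mother (nn), as assumed in (a).
offspring = cross("Dn", "nn")
p_affected = sum(p for g, p in offspring.items() if "D" in g)

# Probability that all four children are affected.
p_all_four = prob_k_affected(4, 4, p_affected)

print(offspring)
print(p_affected, p_all_four)
```

Under these assumptions an all-affected family of four is expected in one such family out of sixteen, which is why the pedigree is not inconsistent with a per-child probability of 1/2.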
Shown in (b) is the inheritance of an autosomal
recessive Mendelian disorder, examples
of which include Tay-Sachs disease,
cystic fibrosis, and sickle-cell anemia.
Assume a typical situation: Both parents
are carriers, or, in other words, are unaffected
but have the genotype Nd, where N
is the nondefective dominant version of d.
The equal segregation of chromosomes
during meiosis implies that the probability of
an offspring's having the genotype dd and
therefore of being affected is 1/4. In addition,
the probability of an offspring's having
the genotype Nd or dN (and of being a
carrier) is 1/2 and of having the genotype
NN (and of being unaffected) is 1/4. Also
shown in (b) is the pedigree of a family with
an autosomal recessive Mendelian disorder.
Only two individuals, both in the third
generation (III-1 and III-4), are affected. All
the other individuals listed are either carriers
or unaffected. Since typically siblings in
only a single generation are affected by a
recessive Mendelian disorder, its inheritance
pattern is referred to as horizontal.
[Figure (b) Autosomal Recessive Disorder. Probabilistic prediction: offspring genotypes NN, dN, Nd, dd; a one-in-four chance of inheriting the disorder. Observed pedigree: horizontal inheritance pattern.]
[Figure (c) X-linked Recessive Disorder. Probabilistic prediction: only males at risk of inheriting the disorder. Observed pedigree: disorder passed to male offspring from female carriers.]
Shown in (c) is the inheritance of an X-linked
recessive Mendelian disorder. Such disor-
ders include hemophilia, which is the result
of a lack of an essential blood-clotting fac-
tor, and Duchenne muscular dystrophy,
which causes progressive muscle weak-
ness and death in early adulthood from
respiratory problems. Again assume a typical
situation: The mother is a carrier and
therefore has the genotype X^N X^d, and the
father is unaffected and therefore has the
genotype X^N Y. Any male offspring has a
probability of 1/2 of being affected, and any
female offspring has a probability of 1/2 of
being a carrier. Also shown in (c) is a pedigree
of a family with Duchenne muscular
dystrophy. One son (II-2) and two daughters
(II-1 and II-3) inherited the maternal X
chromosome on which the defective allele
resides. The son, possessing only one X
chromosome, is affected. On the other hand,
the daughters are unaffected carriers, but
their sons (III-2, III-6, and III-7) inherited the
defective allele. The pedigree illustrates the
typical pattern of inheritance of an X-linked
recessive disorder: transmission from an
affected male through his daughters to his
grandsons. Females can inherit the dis-
ease if the father is affected and the mother
is either affected or a carrier.
only incompletely and that selective breeding had produced plants and animals quite
different from the ancestral strains. Darwin arrived at his conclusions in large part by
doing a Gedankenexperiment, much as Albert Einstein later arrived at his theory of
relativity. It should be noted that not all of Darwin's thinking was as forward-looking
as his theory of evolution. He was an exponent of a form of pangenesis (see "Early
Ideas about Heredity") and of blending inheritance (the notion that the characteristics
of offspring are the result of a melding of the parental characteristics). Darwin's
cousin Francis Galton (1822-1911), in his own way also a genius, tried to point
out to Darwin, without success, that neither theory of inheritance made much sense.
In doing so Galton came very close to developing the same theory of particulate
inheritance as had Mendel, although like Darwin, he was unaware of Mendel's work.
Like Mendel, Galton was cognizant of probability and statistics. He can be considered
the founder of modern biostatistical theory, which has been an immensely powerful
tool in the development of genetic theory.
The cell biologists, Mendel, and Darwin and Wallace made basic contributions to the
foundations of modern genetics, but they did so essentially in isolation from each
other. Mendel was influenced to some extent by the findings of the cell biologists and
of the evolutionists, but neither of the latter were influenced by him or by each other.
Such isolation among different fields of science, though detrimental to progress, is
still today not uncommon.
Things Come Together
The science of genetics was born in the first decade of the twentieth century
through fusion of Mendel's theory of inheritance and the cell biologists' knowledge
about chromosomes. In 1902 a student of Wilson's, Walter Stanborough Sutton
(1877-1916), and Boveri independently recognized the parallels between the real objects
called chromosomes and the theoretical constructs called genes-the occurrence
of both as pairs, their separation in a similar fashion during gamete formation, and
their re-pairing during fertilization-and proposed that each member of a pair of al-
leles is located on one or the other member of a pair of homologous chromosomes.
Thus was born the chromosome theory of heredity. The theory was soon proved, and
during the period between 1910 and 1940 (the heyday of classical genetics) many
allele pairs were localized to particular homologous chromosome pairs.
Classical Genetics. The term "classical genetics" refers to those aspects of genetics
that can be studied without reference to the molecular details of genes. The early stars
of classical genetics were the American Thomas Hunt Morgan (1866-1945), his students
Calvin Blackman Bridges (1889-1938), Hermann Joseph Muller (1890-1967),
and Alfred Henry Sturtevant (1891-1970), and last but not least members of the genus
Drosophila, most notably the common fruit fly Drosophila melanogaster. Morgan's
interest lay (initially at least) in determining whether the changes that result in new
species occur gradually or abruptly. He chose to study changes in D. melanogaster
because it reaches sexual maturity so rapidly, produces so many offspring, and is so
easily and cheaply raised in the laboratory. The discovery, in the spring of 1910, of
a lone white-eyed male fly among thousands upon thousands of red-eyed flies in the
Fly Room at Columbia University was a momentous event, leading not only to proof
of the chromosome theory of heredity but also to knowledge of previously unknown
aspects of meiosis.
Now is an appropriate time to emphasize the critical role of mutants in genetics. (A
mutant is a member of a species that exhibits a phenotype different from the "wild-
type" phenotype exhibited by most members of a natural population of the species.)
Even knowledge of the existence of a gene is usually inferred from the existence of
a mutant. When faced, for example, with a vast population of only red-eyed flies,
how could anyone suspect that eye color is a manifestation of genes in operation?
To be discussed later is another invaluable role of mutants-as tools for learning
more specifically what genes do. (That genes determine physically observable traits
is certainly true but remarkably vague.)
An early outcome of the discovery of the white-eyed fly was Morgan's proposal
that alleles for red and white eye color in D. melanogaster are located on its X
chromosomes. Morgan arrived at that proposal by observing the eye colors of the
progeny resulting from a series of breedings, a series that began with matings between
the white-eyed male and wild-type red-eyed females. (Note that mutants must not
only be discovered but also be allowed to survive and breed.) Because all the progeny
were red-eyed, Morgan concluded that the red-eye-color allele is dominant. Next he
interbred the progeny and found, just as Mendel would have predicted, that three-
quarters of the resulting second-generation progeny were red-eyed and one-quarter
were white-eyed. However, among neither the red-eyed nor the white-eyed second-
generation flies did he find an equal number of males and females, as would be
predicted if the observed segregation of sex chromosomes was independent of the
presumed segregation of red- and white-eye-color alleles. Instead two-thirds of the
red-eyed second-generation flies were females and all of the white-eyed flies were
males. Morgan continued by mating red-eyed males to white-eyed females, a breeding
that is the "reciprocal" of the original breeding of the lone white-eyed male. He found
that half of the progeny were female and red-eyed and the other half were male and
white-eyed, whereas Mendel would have predicted that all of the progeny would be
red-eyed, just as all of the progeny resulting from the original breeding were red-eyed.
To explain those deviations from Mendelian predictions, Morgan proposed that the
red- and white-eye-color alleles are X-linked, or in other words that they are located
on the X chromosomes.
The reader can more easily verify that Morgan's hypothesis explains the outcomes
of the breedings he carried out by using some symbolism. Let w and W denote,
respectively, the recessive white-eye-color allele and the dominant red-eye-color
allele. Denote an X chromosome containing w by X^w and an X chromosome
containing W by X^W. Then the first breeding Morgan carried out, the breeding
between wild-type red-eyed females and the white-eyed male, is denoted by X^W X^W
x X^w Y. The progeny of such a breeding contain one of two equally probable
combinations of sex chromosomes: X^W X^w and X^W Y. In other words, half the
progeny are female and red-eyed and half are male and red-eyed. The reader is urged
to verify that Morgan's proposal explains the outcomes of the other breedings he
carried out, namely X^W X^w x X^W Y and X^w X^w x X^W Y.
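The verification urged above can also be done mechanically. The sketch below is an illustration rather than anything from Morgan's work: the function name `sex_linked_cross` and its argument format are hypothetical, and it assumes only that each egg receives one of the mother's two X chromosomes and each sperm receives either the father's X or his Y, all with equal probability.

```python
from fractions import Fraction
from itertools import product


def sex_linked_cross(mother, father):
    """Enumerate offspring of an X-linked eye-color cross.

    mother: tuple of the alleles on her two X chromosomes, e.g. ("W", "w")
    father: the allele on his single X chromosome, e.g. "W"
    Returns {(sex, eye_color): probability}.
    """
    outcomes = {}
    for egg_allele, sperm in product(mother, (father, "Y")):
        if sperm == "Y":
            sex, alleles = "male", egg_allele          # sons carry only the maternal X
        else:
            sex, alleles = "female", egg_allele + sperm
        eye = "red" if "W" in alleles else "white"     # W (red) is dominant
        key = (sex, eye)
        outcomes[key] = outcomes.get(key, 0) + Fraction(1, 4)
    return outcomes


# Original breeding: wild-type females x the white-eyed male.
first = sex_linked_cross(("W", "W"), "w")
# Interbreeding of the first-generation progeny.
second = sex_linked_cross(("W", "w"), "W")
# Reciprocal breeding: white-eyed females x red-eyed males.
reciprocal = sex_linked_cross(("w", "w"), "W")

red_total = sum(p for (sex, eye), p in second.items() if eye == "red")
red_female_share = second[("female", "red")] / red_total

for name, result in (("first", first), ("second", second), ("reciprocal", reciprocal)):
    print(name, result)
print(red_total, red_female_share)
```

The enumeration reproduces the three observations described in the text: all first-generation progeny red-eyed, a three-to-one ratio in the second generation with every white-eyed fly male and two-thirds of the red-eyed flies female, and a reciprocal breeding that yields only red-eyed females and white-eyed males.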
Morgan's experiments certainly supported the chromosome theory of heredity, but
the work of Bridges provided more direct confirmation. Bridges started by repeating,
on a large scale, one of the breedings Morgan had carried out, the breeding between
white-eyed female flies and red-eyed male flies. If, as Morgan proposed, the w
and W alleles reside on the X chromosomes, that breeding can be represented
by X^w X^w x X^W Y and, as Morgan had observed, half of the resulting progeny
would possess the sex-chromosome combination X^W X^w (would be red-eyed females)
and half would possess the sex-chromosome combination X^w Y (would be white-eyed
males). But Bridges' large-scale breeding produced a surprise: A very small
fraction of the progeny (about one in every two thousand) were either white-
eyed females or sterile red-eyed males. Bridges found, by direct microscopic
observation of the chromosomes of the unusual progeny, that they possessed an
anomalous number of sex chromosomes. The white-eyed females possessed two
X chromosomes and one Y chromosome, and the sterile red-eyed males possessed
a single X chromosome. Obviously the single X chromosome of a sterile red-eyed
male must be the residence of the red-eye-color allele he must possess, and the pair of
homologous X chromosomes of a white-eyed female must be the residences of the two
white-eye-color alleles she must possess. Thus a combination of cytological data and
genotypic and phenotypic data directly confirmed the chromosome theory of heredity.
(Note that Bridges' "cytogenetic" evidence also indicated that the Y chromosome of
D. melanogaster is involved in determining fertility rather than maleness.)
A question about Bridges' work remains: How could the abnormal numbers of
sex chromosomes in the unusual progeny be explained? Bridges proposed that the
homologous X chromosomes of a female fruit fly occasionally fail to segregate during
meiosis. Meioses in which such "nondisjunctions" occur would yield two equally
probable types of eggs: eggs containing two X chromosomes and eggs containing
no X chromosomes. Fertilization of those two types of eggs by the two types of
sperms produced by a male fruit fly would result in four types of fertilized eggs:
those containing the combination of sex chromosomes X_m X_m X_p, the combination
X_m X_m Y, the combination X_p, and the combination Y. (The subscript on each X
chromosome denotes maternal origin or paternal origin.) The combinations X_m X_m Y
and X_p are the combinations Bridges observed in the unusual progeny; he attributed
the absence of unusual progeny containing the X_m X_m X_p and Y combinations to
a lethal overdose and a lethal underdose of X chromosomes. Nondisjunction is
now known to be a rare but medically significant feature of meiosis. The human
disorder known as Down syndrome, for example, is caused by nondisjunction of
chromosomes 21.
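Bridges' argument amounts to a small enumeration, which can be sketched as follows. The variable names and the `fate` table are illustrative choices, not part of the original; the fates listed simply restate what the text reports, with "m" and "p" marking maternal and paternal X chromosomes.

```python
from itertools import product

# Eggs produced by a meiosis in which the two maternal X chromosomes fail to segregate.
nondisjunction_eggs = ["XmXm", ""]   # two X chromosomes, or none
# Sperm produced by a normal meiosis in the male.
sperm = ["Xp", "Y"]

# Outcome for each fertilized egg, as reported in the text.
fate = {
    "XmXmXp": "not observed (attributed to a lethal overdose of X chromosomes)",
    "XmXmY": "white-eyed female",
    "Xp": "sterile red-eyed male",
    "Y": "not observed (attributed to a lethal underdose of X chromosomes)",
}

zygotes = [egg + s for egg, s in product(nondisjunction_eggs, sperm)]

for zygote in zygotes:
    print(f"{zygote:8s} {fate[zygote]}")
```

In the breeding Bridges repeated, both maternal X chromosomes carry w and the paternal X carries W, so the X_m X_m Y flies are white-eyed and the X_p flies are red-eyed, as observed.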
It is odd that proof for the existence of a rare meiotic glitch (nondisjunction)
antedated clear evidence for the existence of what is now known to be a common
feature of meiosis: crossing over. (Nondisjunction occurs once in about every
hundred thousand human meioses, whereas crossing over occurs about thirty-three
times per human meiosis, or on average more than once per homologous chromosome
pair per human meiosis.) As proposed by Morgan, crossing over brings about
an exchange, between two homologous chromosomes, of corresponding regions
of the chromosomes. (An analogy is the exchange, between two nearly identical
yardsticks, of, say, initial seven-inch regions.) Because homologous chromosomes
differ from each other in details of their chemical composition, the products of a
single crossover are two "recombinant" chromosomes, each different from (but still
homologous to) the other and the chromosomes that participated in the crossover.
In particular, if the exchanged regions contained different alleles of two genes, the
recombinant chromosomes contain combinations of alleles that are different from the
combinations of alleles possessed by the participants (see "Crossing Over: A Special
Type of Recombination"). Thus crossing over, like independent assortment, increases
the genetic diversity of sexually reproducing organisms. But whereas independent
assortment merely creates new combinations of existing chromosomes, crossing over
can create new chromosomes, ones containing new combinations of alleles.
Crossing over might today be regarded as merely another item in the phenomenology
of meiosis were it not that it is the key element of a method for determining a measure
of the distance between two genes (or, more precisely, two allele pairs) resident on the
same chromosome (or, more precisely, on the same homologous chromosome pair).
(Note that the method is applicable only to genes for which two or more alleles
exist.) Called classical linkage analysis, the method is far from straightforward. The
first step, of course, is to establish that two allele pairs are linked (are resident on
the same homologous chromosome pair) by observing deviations from Mendelian
predictions for the co-inheritance of the traits specified by the allele pairs. The
next step is to measure the fraction of meioses in which crossing over leads to new
combinations of alleles. The final step (and one not known to be necessary to the
earliest linkage analysts) is to convert the measured "recombination fraction" to a
"genetic distance" for the two allele pairs, which is defined as the probability of the
occurrence of crossing over anywhere in the chromosomal region between the allele
pairs. (Although a genetic distance is a dimensionless number, it is expressed in terms
of a unit called a morgan or, more usually, in centimorgans.) The relationship between
recombination fraction and genetic distance is complex (see "Classical Linkage
Mapping" in "Mapping the Genome"), but a recombination fraction is approximately
equal to its corresponding genetic distance when the recombination fraction is less
than about 0.10. The significance of the genetic distance for two allele pairs is that the
genetic distance is proportional to the physical distance between the loci of the allele
pairs, provided crossing over occurs with equal probability at any point along the
chromosome pair. Despite the fact that the stated proviso is not in general satisfied,
genetic distance was until recently the only available measure of the physical distance
between gene loci.
DNA molecules, and hence chromosomes,
are not immutable, even in the absence of
external mutagenic agents. One of the
natural mechanisms whereby DNA mol-
ecules can change is recombination, which
rearranges genetic material by breaking
and joining portions of the same DNA molecule
or portions of different DNA molecules
of the same organism. (Recombination
can occur also between the DNA of an
organism and the DNA of a virus that infects
the organism.) Crossing over is the type of
recombination undergone by the similar DNA
molecules within two homologous chromo-
somes. It occurs almost exclusively during
prophase I of meiosis, when homologous
chromosomes are closely apposed. A single
crossover between homologous chromo-
somes effects an exchange of correspond-
ing chromosome regions and results in the
formation of recombinant chromosomes,
which differ in their content of hereditary
information from the chromosomes that par-
ticipated in the crossover. Crossing over
also occurs between the identical DNA
molecules within the chromosomes of a
sister-chromatid pair, but because the re-
combinant chromosomes so formed are
usually identical to the participants, such
recombination has little genetic significance.
[Figure: Crossing Over during Prophase I of Meiosis. Panels: closely apposed homologous sister-chromatid pairs; crossover in progress; crossover complete (recombinant chromosomes).]
The occurrence of a single crossover between
the loci of two allele pairs, say A and
a and B and b, resident on a homologous
chromosome pair results in the formation of
some gametes that possess combinations
of alleles different from the combinations
possessed by the parent germ-line cell.
Crossing over is thus a mechanism for
increasing genetic diversity. It also is the
basis of a standard method for determining
a "distance" between the locus of A and a
and the locus of B and b. The first step in the
method (see "Determining a Genetic Distance")
is to carry out a certain breeding
experiment and thereby measure, among a
group of gametes produced by one parent,
the fraction possessing the new allele combinations
(the so-called recombination fraction).
When the measured recombination
fraction is relatively small (less than about
0.10), it is approximately equal to the "genetic
distance" between the two loci, that is,
to the average number of crossovers between
the two loci per meiosis. The genetic
distance between the two loci in turn is a
rough measure of the physical distance (the
distance along the DNA molecule) between
the two loci.
[Figure: Effect of Crossing Over on Allele Combinations in Gametes. Labels: allele combinations on homologous chromosome pairs in germ-line cell; crossover between loci of two allele pairs during prophase I of meiosis; allele combinations on single chromosomes in gametes.]
As illustrated in "Determining a Genetic Distance," linkage analysis is facilitated by
carrying out either one of two particular breedings. (Each breeding is a "test cross"
involving one doubly heterozygous parent and one doubly recessive parent.) Morgan
happened to carry out both breedings-between fruit flies, of course-in the early
1910s and thereby not only gathered the first clear evidence for the existence of
crossing over but also measured the first recombination fractions.
Then in 1913 Sturtevant measured recombination fractions for various pairwise com-
binations of six allele pairs known to reside on the X chromosomes of Drosophila.
By assuming that the loci of the six allele pairs dot the X chromosome as points
dot a line and that the measured recombination fraction for, say, the allele pairs
A,a and B,b is directly proportional to the length of the X-chromosome segment between
the locus of A,a and the locus of B,b, Sturtevant constructed a diagram (the
first "genetic-linkage map") showing the relative locations of the six genes and their
pairwise separations. Sturtevant then used his diagram to calculate the recombination
fractions for those pairwise combinations of allele pairs that he had measured but
not needed to construct the diagram. The approximate agreement between calculated
and measured recombination fractions indicated that both of his assumptions were at
least approximately valid. We now know that, although the genes of all eukaryotic
organisms lie along linear DNA molecules, the genes of prokaryotic organisms lie in-
stead along circular DNA molecules. Furthermore, as indicated above, recombination
fractions are not in general proportional to physical distance.
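Sturtevant's consistency check can be illustrated for three loci. The sketch below uses made-up recombination fractions for hypothetical loci P, Q, and R (they are not Sturtevant's data), and the function name `order_three` is likewise an assumption. It takes the pair with the largest recombination fraction to be the outer pair, places the remaining locus between them, and compares the measured outer fraction with the sum of the two inner ones, which is what the assumption of linearity and proportionality predicts.

```python
def order_three(rf):
    """Order three loci on a line from their pairwise recombination fractions.

    rf maps frozenset({locus_a, locus_b}) to a recombination fraction.
    The pair with the largest fraction is assumed to be the outer pair.
    Returns (left, middle, right); left and right are sorted alphabetically.
    """
    outer = max(rf, key=rf.get)
    loci = set().union(*rf)
    (middle,) = loci - outer
    left, right = sorted(outer)
    return left, middle, right


# Hypothetical measurements, for illustration only.
measured = {
    frozenset({"P", "Q"}): 0.05,
    frozenset({"Q", "R"}): 0.08,
    frozenset({"P", "R"}): 0.12,
}

left, middle, right = order_three(measured)

# Prediction for the outer pair if fractions add along the chromosome.
predicted_outer = measured[frozenset({left, middle})] + measured[frozenset({middle, right})]
measured_outer = measured[frozenset({left, right})]

print(left, middle, right)
print(predicted_outer, measured_outer)
```

In this invented example the measured outer fraction falls slightly short of the sum of the inner two. A shortfall of that kind is what one expects when double crossovers between widely separated loci go undetected, and it is one reason recombination fractions are only approximately additive.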
As noted previously, genetic studies of an organism demand the availability of
mutants, that is, of individuals possessing alleles different from those possessed by
wild-type individuals. For many years, though, geneticists had to survive on the rare
mutants provided by nature. (Fewer than ten out of every million members of a
natural population of a species are phenotypically obvious mutants.) Then in 1927
Muller (one of Morgan's trio of brilliant students) demonstrated that x rays induce
heritable mutations in the fruit fly, and a year later the American geneticist Lewis
John Stadler (1896-1954) used x rays to create new alleles in barley. The availability
of x-ray-induced mutants accelerated the pace of gene discovery and genetic-linkage
mapping.
The demonstrated power of combining cytological data about the chromosomes of
an organism with genotypic and phenotypic data led, in the 1930s, to emergence of
cytogenetics as a separate field of biology. Crucial to cytogeneticists is the ability
to distinguish one pair of homologous metaphase chromosomes from another. For
distinguishing features, early cytogeneticists relied on sizes and shapes, which do
not always provide unambiguous identification. (The word "shape" generally means
centromere location, but it can also mean an unusual structural feature specific to only
certain metaphase chromosomes of certain organisms. Chromosome 9 of a strain of
Zea mays, for example, is sometimes blessed with a conspicuous knob at the end of
its short arm, a feature that helped elucidate the mechanism of crossing over.) It was
soon learned, however, that each homologous chromosome pair within a metaphase
The classical method for determining the
genetic distance between the loci of two
allele pairs known to reside on the same
homologous chromosome pair of an organ-
ism involves observing the phenotypes of
the offspring of one of two particular breed-
ings. During the course of Thomas Hunt
Morgan's work on fruit flies, he happened to
carry out both breedings and was rewarded
not only with the first clear evidence of
crossing over but also with the first unam-
biguous genetic-distance data. Morgan's
experiments and data are used here to
illustrate the procedure.
The allele pairs in question reside on one of
the homologous autosome pairs of Dro-
sophila melanogaster. One allele pair af-
fects eye color: a dominant allele A that
specifies red eye color and a recessive
allele a that specifies purple eye color. The
other allele pair affects wing length: a domi-
nant allele B that specifies wild-type wings
and a recessive allele b that specifies ves-
tigial (very short) wings.
The participants in the first breeding are a
female fruit fly that is heterozygous for both
traits (and therefore has red eyes and nor-
mal wings) and a male fruit fly that is ho-
mozygous for both recessive trait variants
(and therefore has purple eyes and vestigial
wings). Furthermore, the female is known
to be a product of the breeding AABB x
aabb. Therefore the distribution of the alleles
A, a, B, and b on the homologous autosome
pair of the female is known: Both
dominant alleles (A and B) reside on one
member of the homologous autosome pair,
and both recessive alleles (a and b) reside
on the other member. Such an allele distri-
bution is denoted by writing the genotype of
the female as AB/ab. The distribution of the
alleles a, a, b, and b on the homologous
autosome pair of the male is also known
(because the male is homozygous for both
traits) and is denoted in a similar fashion as
ab/ab. Thus the first breeding can be sym-
bolized by
AB/ab female x ab/ab male.
Meioses in the heterozygous female that
involve no crossovers between the two loci
yield two types of eggs: those possessing
the chromosome with the allele combination
AB and those possessing the chromosome
with the allele combination ab. In
other words, the two dominant alleles and
the two recessive alleles remain linked
(resident on the same chromosome), just as
they are in the female herself. But those
meioses in the female that involve a single
crossover between the two loci (or any odd
number of crossovers) yield in addition two
other types of eggs: those possessing a
chromosome with the allele combination Ab
and those possessing a chromosome with
the allele combination aB. In other words, a
single crossover between the two loci establishes
linkage between one dominant
and one recessive allele. On the other
hand, meioses in the doubly homozygous
male, whether or not they involve crossovers
between the two loci, yield sperms
possessing only the allele combination ab.
Thus the offspring of breeding 1 possess
four genotypes, each corresponding to one
of the four possible phenotypes:
AB/ab female x ab/ab male →
AB/ab + ab/ab + Ab/ab + aB/ab.
Morgan examined more than 2800 progeny
of breeding 1 and found that 47.2 percent
had red eyes and normal wings (AB/ab),
42.1 percent had purple eyes and vestigial
wings (ab/ab), 5.3 percent had red eyes and
vestigial wings (Ab/ab), and 5.4 percent had
purple eyes and normal wings (aB/ab). All
the offspring exhibiting the last two phenotypes
(the combinations of one recessive
trait variant and one dominant trait variant)
result only from crossovers during meioses
in the female parent. Thus the data indicate
that the probability of new allele linkages
being formed by crossing over is 0.107 =
0.053 + 0.054. That value for the so-called
recombination fraction corresponds to a
genetic distance of about 12 centimorgans.
(The relationship between recombination
fraction and genetic distance is presented
in "Classical Linkage Mapping" in "Mapping
the Genome.")
The participants in the other breeding that
provides unambiguous recombination-frac-
tion data are, like the participants in breed-
ing 1, a doubly heterozygous female and a
doubly homozygous-recessive male. How-
ever, the second female is known to be a
product of the breeding Ab/Ab x aB/aB
(rather than the breeding AB/AB x ab/ab).
Therefore the distribution of alleles on her
homologous autosome pair is Ab/aB (rather
than AB/ab). (The difference in allele distri-
butions of the two doubly heterozygous
females is often referred to as a difference
in linkage phase.) The second breeding is
thus symbolized by
Ab/aB female x ab/ab male.
Breeding 2 yields offspring that exhibit the
same genotypes and phenotypes as breeding 1:
Ab/aB female x ab/ab male →
Ab/ab + aB/ab + AB/ab + ab/ab.
Morgan examined more than 2300 progeny
of breeding 2 and found that 41.3 percent
had red eyes and vestigial wings (Ab/ab),
45.7 percent had purple eyes and normal
wings (aB/ab), 6.7 percent had red eyes
and normal wings (AB/ab), and 6.3 percent
had purple eyes and vestigial wings (ab/
ab). Again, all the offspring exhibiting the
last two phenotypes result only from cross-
overs during meioses in the female parent.
Thus the data indicate that the recombination
fraction for the two allele pairs is 0.130,
which corresponds to a genetic distance of
about 15 centimorgans.
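The arithmetic behind both estimates can be sketched in a few lines of Python. The sketch assumes Haldane's mapping function, d = -50 ln(1 - 2r) centimorgans, which happens to reproduce the rounded distances quoted for both breedings; the text here does not say which mapping function was actually used, and the function names are illustrative only.

```python
import math

def recombination_fraction(recombinant_percents):
    """Sum the percentages of recombinant offspring and return a fraction."""
    return sum(recombinant_percents) / 100.0

def haldane_distance_cm(r):
    """Genetic distance in centimorgans from recombination fraction r.

    Assumption: Haldane's mapping function (no crossover interference),
    valid for 0 <= r < 0.5.
    """
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)

# Recombinant classes reported for each breeding (percent of progeny).
r1 = recombination_fraction([5.3, 5.4])   # breeding 1
r2 = recombination_fraction([6.7, 6.3])   # breeding 2

for label, r in (("breeding 1", r1), ("breeding 2", r2)):
    print(f"{label}: r = {r:.3f}, distance = {haldane_distance_cm(r):.1f} cM")
```

For small recombination fractions the distance in centimorgans is close to 100r, and the correction grows as r approaches 0.5.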
Note that the two data sets yield different
values for the same genetic distance. How-
ever, the difference between the values is
within the statistical uncertainties associ-
ated with measurements of probabilistic
events. Note also that the same genetic
distance could in principle be determined
by carrying out the reciprocal of breeding 1
or breeding 2 (that is, a breeding between
a doubly heterozygous male and a doubly
homozygous-recessive female). Then, the
crossovers detected are those that occur
during meioses in the
male parent rather than in the
female parent. However, for some
unknown reason crossing over simply
does not occur in male fruit flies. But fruit
flies are exceptional in that respect, and
genetic distances for other species can be
determined by carrying out either breeding
1, say, or its reciprocal.
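The size of the statistical uncertainty mentioned above can be roughed out by treating each offspring as an independent trial that is recombinant with probability r. The sketch below uses the binomial standard error and a normal-approximation interval; the sample sizes are the lower bounds quoted in the text ("more than 2800" and "more than 2300"), so the numbers are approximate, and the function name is illustrative.

```python
import math

def standard_error(r, n):
    """Binomial standard error of a recombination fraction r from n offspring."""
    return math.sqrt(r * (1.0 - r) / n)

# (recombination fraction, approximate number of progeny examined)
estimates = {"breeding 1": (0.107, 2800), "breeding 2": (0.130, 2300)}

intervals = {}
for label, (r, n) in estimates.items():
    se = standard_error(r, n)
    # Approximate 95% interval under the normal approximation.
    intervals[label] = (r - 1.96 * se, r + 1.96 * se)
    low, high = intervals[label]
    print(f"{label}: r = {r:.3f} +/- {se:.3f} (about {low:.3f} to {high:.3f})")
```

The same formula shows why small human families are a problem: with only a handful of offspring, the standard error becomes comparable to the recombination fraction itself.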
Breedings 1 and 2 are those that provide
unambiguous recombination-fraction data.
As an example of the ambiguities that can
arise, consider the fruit-fly breeding
AB/ab female x AB/ab male.
Assume first that crossing over between the
two loci does not occur during meioses in
the female parent. Then the offspring of
breeding 3 exhibit two phenotypes: red eyes
and normal wings (AB/AB and AB/ab) and
purple eyes and vestigial wings (ab/ab).
Now assume that crossing over does occur
during meioses in the female parent. Then
among the offspring of breeding 3 are some
that exhibit the two other possible pheno-
types: red eyes and vestigial wings (Ab/ab)
and purple eyes and normal wings (aB/ab).
All offspring that exhibit those two pheno-
types result only from crossing over. How-
ever, crossing over also leads to offspring
that exhibit one of the phenotypes produced
in the absence of crossing over, namely, red
eyes and normal wings (Ab/AB and aB/AB).
In other words, whereas the offspring produced
by breeding 1 or 2 can
unambiguously be sorted by phenotype
into two categories (those that
are the result of crossovers and those
that are not), the offspring resulting from
breeding 3 cannot be so sorted because
meioses that do and do not involve crossovers
both result in the doubly dominant phenotype.
The reader can accept on faith or verify
personally that breedings 1 and 2 are the
only breedings that provide unambiguous
recombination-fraction and hence genetic-distance
data. Note, in addition, that obtaining
even ambiguous data requires that one
parent be doubly heterozygous.
Determining a genetic distance is thus relatively
easy when the breeding of the organism
in question can be manipulated at will.
But determining the genetic distance between
the loci of two human allele pairs is
much more difficult, since the breeding of
humans cannot be manipulated, the genotypes
and allele distributions of human parents
are not always known, and human
breedings generally produce so few offspring
that the statistical uncertainty in the
measured recombination fraction is large.
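The ambiguity of breeding 3 can be verified by enumeration. The sketch below is a minimal illustration. It assumes, as the text states, that A (red eyes) and B (normal wings) are dominant and that crossing over occurs only in the female. It lists, for each offspring phenotype, whether it can arise from parental eggs, recombinant eggs, or both. The function and variable names are hypothetical.

```python
from collections import defaultdict

def phenotype(genotype):
    """Return the phenotype for a genotype written like 'AB/ab'."""
    alleles = genotype.replace("/", "")
    eyes = "red eyes" if "A" in alleles else "purple eyes"
    wings = "normal wings" if "B" in alleles else "vestigial wings"
    return f"{eyes}, {wings}"

# Eggs of an AB/ab female: parental types and crossover (recombinant) types.
EGGS = {"AB": "parental", "ab": "parental", "Ab": "recombinant", "aB": "recombinant"}

def offspring_classes(sperm_types):
    """Map each offspring phenotype to the set of egg classes that can produce it."""
    classes = defaultdict(set)
    for egg, origin in EGGS.items():
        for sperm in sperm_types:
            classes[phenotype(f"{egg}/{sperm}")].add(origin)
    return dict(classes)

breeding_1 = offspring_classes(["ab"])         # ab/ab male
breeding_3 = offspring_classes(["AB", "ab"])   # AB/ab male, no crossing over

for name, classes in (("breeding 1", breeding_1), ("breeding 3", breeding_3)):
    print(name)
    for pheno, origins in sorted(classes.items()):
        print(f"  {pheno}: {sorted(origins)}")
```

In breeding 1 every phenotype traces to exactly one egg class, whereas in breeding 3 the doubly dominant phenotype traces to both.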
Number 20 1992 Los Alamos Science 3 5
cell displays a characteristic pattern of dark and light bands when stained with an
appropriate dye (see "Chromosomes: The Sites of Hereditary Information"). Because
the banding pattern characteristic of a pair of homologous metaphase chromosomes
varies along the length of the chromosomes, it can also be used to identify different
regions of the chromosomes. The advent of chromosome banding led to recognition
of the occasional occurrence of aberrant chromosomes. (The incidence of aberrant
chromosomes, like the incidence of gene mutations, can be increased by exposure
to x rays or other mutagenic agents.) Several types of chromosome aberrations, or
rearrangements, were noted, including translocations (the exchange of chromosome
regions between nonhomologous chromosomes) and inversions (the reversal of the
orientation of a chromosome region).
Obviously a chromosome rearrangement can lead to changes in the complement of
genes present on a chromosome or to changes in their relative locations. The gene (or
genes) affected by a chromosome rearrangement (as determined from genetic data)
can then be assigned a locus within the rearranged chromosome region. Although
the locus so obtained is inexact, it is better than the alternative of knowing nothing
at all about the locus. Knowledge of the whereabouts on a chromosome of a
gene then serves to "anchor" a genetic-linkage map including that gene to the
chromosome. (Recall that a linkage analysis provides only distances between genes
on a chromosome; additional information is required to locate the genes relative to
the chromosome itself.)
Chromosome rearrangements and gene mutations are but two examples of naturally
rare phenomena that, once noted, are exploited to gain basic information about genes.
Another example is the exceptional behavior of the cells that compose the salivary
glands of Drosophila (and other insects of the order Diptera). In 1933 the American
zoologist Theophilus Shickel Painter (1889-1969) and independently two German
geneticists discovered that the chromosomes in those cells were microscopically vis-
ible during interphase. (Interphase chromosomes are usually not microscopically
visible because they have not yet condensed in preparation for mitosis.) For some
unknown reason the salivary cells of Drosophila undergo not a single round but many
successive rounds of chromosome duplication during the S phase of interphase (see
"The Eukaryotic Cell Cycle"). The numerous (on the order of a thousand) copies
of each chromosome remain closely associated along their lengths, forming a fiber
sufficiently thick to be microscopically visible. Because such "polytene" chromo-
somes are not condensed, sites of chromosome rearrangements can be pinpointed
with much greater resolution.
The Rise of Molecular Genetics. By 1940 many genes were known to exist, and
a goodly number of the known genes had been assigned to particular regions of
particular chromosomes. But the gene remained an abstract concept. No one knew
what genes do or even of what they are made. A speculation about what genes do
had appeared as early as 1903, when the French geneticist Lucien Claude Cuenot
(1866-1951) proposed that inherited coat-color differences in mice were due to the
action of different genes. And in 1909 the English physician Archibald Edward
Garrod (1857-1936) established that the human disease alkaptonuria was inherited as
a recessive trait variant and proposed that the unmistakable symptom of the disease
(urine that blackens after being excreted) was due to accumulation in the urine of
a metabolic product that normally is degraded with the help of a certain enzyme.
(An enzyme is a protein that catalyzes a biochemical reaction.) But Cuenot's and
Garrod's proposals were regarded as mere speculation for many years. Then, in
1941, the American geneticist George Wells Beadle (1903-1989) and the American
biochemist Edward Lawrie Tatum (1909-1975) clearly demonstrated the connection
between the genes an organism possesses and the biochemicals it is able to synthesize.
Beadle and Tatum's work focused on the bread mold Neurospora crassa. Because
wild-type spores of N. crassa can be cultured in the laboratory on a minimal growth
medium (one containing only sucrose, inorganic salts, and the vitamin biotin), they
reasoned that the mold must possess enzymes that help convert those simple molecules
into all the other necessities of life. By exposing N. crassa to ultraviolet light,
Beadle and Tatum produced a very few mutant spores that could not be cultured on
a minimal growth medium but could be cultured on a growth medium containing
a single additional nutrient (vitamin B6, for example). They concluded that the
irradiation had caused a mutation in a gene that somehow directs the synthesis of an
enzyme involved in the synthesis of the nutrient. Evidence in support of such a
conclusion accumulated, and in 1948 the American geneticist and biochemist Norman
Harold Horowitz (1915-) propounded the famous one gene-one enzyme hypothesis.
Molecular genetics was born. Horowitz's hypothesis has since been modified to state
that one gene directs the synthesis of one protein, or, more precisely, one polypeptide
chain, since some proteins contain more than one polypeptide chain.
Beadle and Tatum's work on N. crassa demonstrated the value of studying such
a simple organism. Attention soon turned to even simpler organisms-bacteria.
The bacterium Escherichia coli, a tenant of the vertebrate gut, gained particular
favor. As a result of studies begun soon after World War II by Francois Jacob
(1920-), Joshua Lederberg (1925-), Jacques Lucien Monod (1910-1976), and Elie
Leo Wollman (1917-), more is known about the genes of E. coli, including their
regulation, than of any other living organism. Attention also focused on viruses,
the simplest of all organisms, and in particular on the viruses that infect bacteria,
known as bacteriophages or simply phages. (Viruses are composed of a nucleic acid
core encased in a protein coat. They are not living organisms in the sense that they
lack the machinery for biosynthesis. They can, however, reproduce-by usurping the
biosynthetic machinery of the cells they infect-and pass their characteristics from
generation to generation through the medium of genes just as cellular organisms do.)
In the United States the so-called Phage Group, led by Max Delbruck (1906-1981),
Alfred Day Hershey (1908-), and Salvador Edward Luria (1912-1991), aroused
interest in the interaction between phages and bacteria as a model system for studying
the fundamental mechanisms of heredity. Work by the Phage Group included
developing quantitative methods for studying the life cycles of phages and later
the discovery that phages can transfer bacterial genes from one bacterial strain to
another, a process called transduction. (Transduction was to become a progenitor
of recombinant-DNA technology.) The promiscuous exchange of genetic material
between different strains of bacteria and between bacteria and their viruses facilitated
the mapping of genes and the identification of their functions.
What genes are made of was the other big question about genes in the 1940s. In
1925 Wilson, reversing his previous stance, espoused protein as the genetic material.
The idea of a proteinaceous genetic material was subsequently widely accepted
for more than two decades, primarily because the nonproteinaceous component of
chromosomes, DNA (deoxyribonucleic acid), was thought by chemists to have a
structure that rendered it incapable of carrying any kind of message. However,
in 1944 the American bacteriologist Oswald Theodore Avery (1877-1955) and
his colleagues presented strong evidence that the genetic material was DNA. Their
evidence was the ability of DNA extracted from dead members of a pathogenic strain
of Streptococcus pneumoniae to impart the inherited characteristic of pathogenicity to
live members of a nonpathogenic strain of the same bacterium. (We now know that
the mechanism involved in the transformation from nonpathogenicity to pathogenicity
is DNA recombination, of which crossing over is a specific example.) And in 1952
Hershey and another member of the Phage Group, the American geneticist Martha
Chase (1927-), showed that DNA is the component of a phage that enters a bacterium
and thus presumably directs the synthesis of new phages within the infected bacterium.
Nevertheless, despite the accumulating evidence, DNA was not widely accepted as
the genetic material.
Then in 1953 James Dewey Watson (1928-) and Francis Harry Compton Crick
(1916-) proposed a structure for DNA that accounted for its ability to self-replicate
and to direct the synthesis of proteins. The structure they proposed is of course
the famous double helix, which, like two-ply embroidery floss, is composed of
two strands coiled helically about a common axis. Each strand is a polymer of
deoxyribonucleotides, and each deoxyribonucleotide contains a phosphate group, the
residue of the sugar deoxyribose, and the residue of one of four nitrogenous organic
bases (adenine, cytosine, guanine, and thymine). The deoxyribonucleotides are linked
together in a manner such that alternating phosphate groups and sugar residues form
a backbone off which the bases project. Hereditary information is encoded in the
order, or sequence, of bases along the strands. The two strands are coiled about
the helix axis in a manner such that the backbones form the boundaries of a space
within which the bases are contained. Each base on one strand is linked by hydrogen
bonds to a base on the other strand; the members of each "base pair" lie in a plane
that is essentially perpendicular to the axis of the helix. Of the ten theoretically
possible base pairs, only two so-called complementary pairs are found in DNA: the
pair adenine and thymine and the pair cytosine and guanine. Thus the order of the
bases on one strand is precisely related to the order of the bases on the other strand,
and the two strands are said to be complementary. Further details are presented in
"DNA: Its Structure and Components."
Watson and Crick arrived at their structure for DNA with the help of x-ray diffraction
data for DNA fibers obtained by Maurice Hugh Frederick Wilkins (1916-) and
Rosalind Franklin (1920-1957) and of the observation in 1950 by Erwin Chargaff
(1905-) that the number of molecules of adenine in any of various DNA samples
equals the number of molecules of thymine and that the number of molecules of
cytosine equals the number of molecules of guanine. In addition, following the
example of the American chemist Linus Carl Pauling (1901-), who in 1951 had
worked out the details of a helical polypeptide structure (the so-called α helix), they
made liberal use of ball-and-stick models.
Molecules of DNA are exceptional among biological macromolecules in two respects.
First, they are very long relative to their width. If the diameter of the double helix
could be increased to that of a strand of angel-hair pasta, the length of the DNA
molecule in a typical human chromosome would be about 12 kilometers. Second, al-
though single-helical configurations are not uncommon in biological macromolecules,
the double-helical configuration of DNA is unique. One might wonder why DNA is
double-stranded. After all, normally only one of the strands directs protein synthesis,
the two strands are replicated separately, and some viruses manage quite nicely with
only single-stranded DNA. The evolutionary advantage of double-stranded DNA is
thought to lie in the fact that, if one strand is damaged, the other strand can provide
the information required to repair the damaged strand.
The base-pairing feature of DNA immediately suggested that each strand of DNA
could serve as the template for directing the synthesis of a complementary strand. The
result would be two identical double-stranded DNA molecules, each containing one
new and one old strand. The suggestion that DNA replication is "semiconservative"
was proved correct (for the DNA of E. coli and a higher plant) several years after
the double-helical DNA structure was proposed. The details of DNA replication,
however, are very complex, involving a number of enzymes. One enzyme first
uncoils a portion of the DNA molecule, and another separates the two strands. Then
an enzyme called a DNA polymerase, using one of the separated DNA strands as
a template, catalyzes the polymerization of free deoxyribonucleoside triphosphates
into a strand that is complementary to the template. Some features of the process
are detailed in "DNA Replication."
Now that genes were known to direct the synthesis of proteins and to be made of
DNA, the next problem was to determine the relationship between DNA and proteins.
The first clue about the relationship came in 1949 when Pauling presented evidence
that the hemoglobin present in humans suffering from sickle-cell anemia differed
structurally from the hemoglobin in humans not suffering from that inherited disease.
(Hemoglobin is composed of two copies each of two polypeptides, the so-called α and
β chains. The α chain contains 141 amino acids, and the β chain contains 146 amino
acids.) What features of a protein affect its structure? By the 1940s biochemists
were beginning to realize that the structure of a protein is determined not so much by
which amino acids it contains but more by the sequence of the amino acids along the
[Figure: (a) Computer-generated image of DNA (by Mel Prueitt). Labels: deoxyribose residue; phosphate group; to 3' carbon of sugar residue.]
The usual configuration of DNA is shown in
(a). Two chains, or strands, of repeated
chemical units are coiled together into a
double helix. Each strand has a "backbone"
of alternating deoxyribose residues (larger
spheres) and phosphate groups (smaller
spheres). Free deoxyribose, C5O4H10, is one
of a class of organic compounds known as
sugars; the phosphate group is a
component of many other biochemicals.
Attached to each sugar residue is one of
four essentially planar nitrogenous organic
bases: adenine (A), cytosine (C), guanine
(G), or thymine (T). The plane of each base
is essentially perpendicular to the helix axis.
Encoded in the order of the bases along a
strand is the hereditary information that
distinguishes, say, a robin from a human
and one robin from another.
As shown, the two strands coil
about each other in a fashion such that all
the bases project inward toward the helix
axis. The two strands are held together by
hydrogen bonds (pink rods) linking each
base projecting from one backbone to its
so-called complementary base projecting
from the other backbone. The base A
always bonds to T (A and T are complementary
bases), and C is always linked to G
(C and G are complementary bases). Thus
the order of the bases along one strand is
dictated by and can be inferred from the
order of the bases along the other strand.
(The two strands are said to be complementary.)
The pairing of A only with T and of C
only with G is the feature of DNA that allows
it to serve as a template not only for its own
replication but also for the synthesis of
proteins (see "DNA Replication" and "Protein
Synthesis"). Note that the members of
a base pair are essentially coplanar.
All available evidence indicates that each
eukaryotic chromosome contains a single
long molecule of DNA, only a small portion
of which is shown here. Furthermore, the
ends of each DNA molecule, called telomeres,
have a special base sequence
and a somewhat different structure.
Shown in (b)
is an uncoiled fragment of (a) containing
three complementary base pairs. From the
chemist's viewpoint, each strand of DNA is
a polymer made up of four repeated units
called deoxyribonucleotides, or simply
nucleotides. The four nucleotides are regarded
as the monomers of DNA (rather
than the sugar residue, the phosphate group,
and the four base residues) because the
nucleotides are the units added as a strand
of DNA is being synthesized (see "DNA
Replication").
A particular nucleotide is commonly designated
by the symbol for the base it contains.
Thus T is a symbol not only for the base
thymine (more precisely, the thymine residue)
but also for the indicated nucleotide.
Also shown are chemical and structural
details of the backbone components. Note
that four carbon atoms of the sugar residue
and its one oxygen atom form a pentagon in
a plane parallel to the helix axis, and that
the fifth carbon atom of the sugar residue
projects out of that plane.
Shown in (c) are further chemical
and structural details of the DNA segment
shown in (b). The planes of the three base
pairs have been rotated into the plane of the
sugar residues. Details of particular note
include the following.
Linking any two neighboring sugar residues
is an -O-P-O- "bridge" between the 3'
carbon atom of one of the sugars and the 5'
carbon atom of the other sugar. (The designations
3' (three prime) and 5' (five prime)
arise from a standard system for numbering
atoms in organic molecules.) When a DNA
molecule is broken into fragments, as it
must be before it can be studied, the breaks
usually occur at one of the four covalent
bonds in each bridge.
Because deoxyribose has an asymmetric
structure, the ends of each strand of a DNA
fragment are different. At one end the terminal
carbon atom in the backbone is the 5'
carbon atom of the
terminal sugar (the carbon atom that lies
outside the planar portion of the sugar),
whereas at the other end the terminal carbon
atom is the 3' carbon atom of the
terminal sugar (a carbon atom that lies
within the planar portion of the sugar).
[Figure legend: carbon atom; covalent bond; hydrogen bond; DNA backbone; 5'-to-3' direction. Hydrogen atoms not involved in hydrogen bonding have been omitted in this drawing. As a result some carbon atoms and some nitrogen atoms appear to be underbonded.]
The two complementary strands of DNA are
antiparallel. In other words, arrows drawn
from, say, the 5' end to the 3' end of each
strand have opposite directions. Most of the
enzymes that move along a backbone in the
course of catalyzing chemical reactions
move in the 5'-to-3' direction.
The composition of a DNA fragment is represented
symbolically in a variety of ways. However,
all of the representations focus on the order,
or sequence, of the nucleotides (and hence
the bases) along the strands of the fragment.
For example, the most complete representation
for the fragment shown above is
[Diagram of the complete representation.]
The most abbreviated representation, ACT
(or, equivalently, AGT), gives the sequence
of only one strand (since the sequence of
the complementary strand can be inferred
from the given sequence) and follows the
convention that the left-to-right direction
corresponds to the 5'-to-3' direction.
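The equivalence of ACT and AGT follows from base pairing plus antiparallel orientation: complement each base, then reverse the result so that it too reads 5'-to-3'. The sketch below illustrates this; the function names are hypothetical, and the two-line layout it prints is only one plausible way of writing out both strands, not necessarily the representation used in the original figure.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Return the complementary strand, written 5'-to-3' like the input."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def double_stranded(seq):
    """Write the given strand 5'-to-3' above its partner written 3'-to-5'."""
    top = seq.upper()
    bottom = "".join(COMPLEMENT[base] for base in top)
    return f"5'-{top}-3'\n3'-{bottom}-5'"

print(reverse_complement("ACT"))
print(double_stranded("ACT"))
```

Applying `reverse_complement` twice returns the original sequence, which is why either strand alone suffices to specify the fragment.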
[Figure: a parental DNA molecule is replicated into two identical daughter DNA molecules.]
An overall description of DNA replication is
quite simple. Each strand of a parent DNA
molecule serves as the template for synthe-
sis of a complementary strand. The result is
two daughter DNA molecules, each com-
posed of one parental strand and one newly
synthesized strand and each a duplicate of
the parent molecule. But this overall sim-
plicity, illustrated above, is misleading, since
DNA replication involves the intricate and
coordinated interplay of more than twenty
enzymes. The most important general fea-
ture of DNA replication is its extremely high
accuracy. A "proofreading" capability of DNA
polymerase, the enzyme that catalyzes the
basic chemical reaction involved in replica-
tion, guarantees that only about one per
billion of the bases in a newly synthesized
strand differs from the complement of the
corresponding base in the template strand.
A more detailed description of DNA replica-
tion should note first that replication of a
chromosomal DNA molecule does not be-
gin at one end of the molecule and proceed
uninterruptedly to the other end. Instead,
scattered along the molecule are numerous
occurrences of a particular base sequence,
and each occurrence of that sequence
serves as an "origin of replication" for a
portion of the molecule. Thus different por-
tions of a DNA molecule are replicated
separately. Baker's yeast, Saccharomyces
cerevisiae, is one of the few eukaryotes for
which the base sequence of its origins of
replication is now known. Knowledge of the
base sequence of an organism's origins of
replication is necessary in the creation of
artificial chromosomes of the organism, syn-
thetic entities that are treated by the
organism's cellular machinery just as its
own chromosomes are treated. The clon-
ing vectors known as YACs are an example
of artificial chromosomes.
Replication of the portion of a DNA mol-
ecule flanked by two origins of replication
begins with the action of enzymes that move
along the parental DNA, progressively un-
coiling and denaturing (separating into single
strands) the double helix. Uncoiling and
denaturation expose the bases in each pa-
rental strand and thereby enable the bases
to direct the order in which deoxyribonucle-
otides are added by DNA polymerase to the
strand being synthesized.
Because, as shown in the figure at right,
DNA polymerase elongates a growing chain
of deoxyribonucleotides only in the 5'-to-3'
direction (arrows), one of the new DNA
strands can be synthesized continuously
but the other strand must be synthesized in
short pieces called Okazaki fragments. (The
Okazaki fragments shown here are much
shorter than they are in reality.) The discon-
tinuous synthesis of one of the new strands
is the source of additional complexities in
replicating the very ends, the telomeres, of
a DNA molecule.
[Figure: Okazaki fragments accumulating over time; the 5' ends of the strands are labeled.]
[Figure: primer extension by DNA polymerase. Labels: DNA template strand; RNA primer; DNA polymerase; growing strand. Captions: The deoxyribonucleotide dTTP binds to the first free base, A, in the template strand. DNA polymerase catalyzes the creation of an -O-P-O- bridge, thus extending the backbone and incorporating the new base, T, into the growing strand. The atoms of the newly formed -O-P-O- bridge are shown explicitly and highlighted in red. dCTP binds to the next free base, G, on the DNA template strand. The polymerase continues to extend the growing strand in the 5'-to-3' direction.]
As shown in the figure on the next page, the
participants in the chemical reaction by which
each portion of a DNA strand is synthesized
include a "primer," the enzyme DNA polymerase,
a DNA template (a parental strand),
and a supply of free deoxyribonucleoside
triphosphates (dNTPs). The usual primer is
a very short strand of RNA, generally containing
between four and twelve ribonucleotides.
(RNA is a single-stranded nucleic
acid; its structure is very similar to that of a
strand of DNA. Because the sugar residue
in RNA is derived from ribose rather than
deoxyribose, the repeated units in RNA are
called ribonucleotides rather than deoxyribonucleotides.)
A primer is required because
DNA polymerase catalyzes the addition
of a deoxyribonucleotide to an existing
chain of nucleotides (either ribonucleotides
or deoxyribonucleotides) but not the de
novo synthesis of a chain of deoxyribo-
nucleotides. The action of each parental
strand as a template is based on hydrogen
bonding between complementary bases. In
particular, a base in a parental strand hydro-
gen bonds to the dNTP containing the
complementary base. As a result, the dNTP
is fixed in a position such that the DNA
polymerase can exert its catalytic action on
the triphosphate group of the dNTP and the
3' hydroxyl group of the 3'-terminal sugar of
the primer. The result is the addition of a
deoxyribonucleotide to the primer and the
release of a pyrophosphate group, (P2O7)4-.
The next deoxyribonucleotide in the tem-
plate strand fixes its complementary dNTP
into position, the DNA polymerase moves
further along the chain being elongated,
and addition of another deoxyribonucle-
otide is effected by action of the polymerase
on the triphosphate group of the dNTP and
the hydroxyl group of the sugar of the de-
oxyribonucleotide just previously added.
Successive repetitions of the process and
eventual replacement of the RNA primer
with DNA lead to formation of double-stranded
DNA identical to the parental DNA.
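The template-directed chemistry described above can be caricatured as a simple loop. This toy sketch assumes a template written in its 3'-to-5' orientation, so that the new strand grows left to right in the 5'-to-3' direction. It stands in for the RNA primer with lowercase letters and uses a two-base primer for brevity (real primers, as noted above, contain roughly four to twelve ribonucleotides). All names are hypothetical, and no enzymology is modeled.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Ribonucleotides of the primer, in lowercase; uracil (u) pairs with A.
RNA_COMPLEMENT = {"A": "u", "T": "a", "C": "g", "G": "c"}

def extend_primer(template_3_to_5, primer_length):
    """Return the new strand (5'-to-3') built against a 3'-to-5' template."""
    strand = [RNA_COMPLEMENT[base] for base in template_3_to_5[:primer_length]]
    for base in template_3_to_5[primer_length:]:
        # Each step adds the one deoxyribonucleotide complementary to the
        # next free template base, at the 3' end of the growing strand.
        strand.append(DNA_COMPLEMENT[base])
    return "".join(strand)

def replace_primer(strand):
    """Replace the RNA primer with the corresponding DNA bases."""
    return strand.upper().replace("U", "T")

new_strand = extend_primer("TCAG", primer_length=2)
print(new_strand)
print(replace_primer(new_strand))
```

In this example the template bases A and then G direct the addition of T and then C, mirroring the dTTP and dCTP steps in the figure captions.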
polypeptide chain. Then in 1957 Vernon Martin Ingram (1924-) demonstrated that
the sixth amino acid in the β chain of normal hemoglobin is glutamic acid, whereas
the sixth amino acid in the β chain of sickle hemoglobin is valine. Otherwise, the
amino-acid sequences of both β chains are identical. Ingram's work suggested that
the function of DNA was to determine the order in which amino acids are assembled
into proteins.
DNA itself could not, however, be the template for the synthesis of proteins, since
DNA is sequestered in the nucleus of a eukaryotic cell, whereas proteins were known
to be synthesized in the cytoplasm outside the nucleus. Perhaps an intermediary
substance was involved, one that receives hereditary information from DNA in the
nucleus and then moves to the cytoplasm, where it serves as the template for protein
synthesis. A likely candidate for such an intermediary was the other known nucleic
acid, namely ribonucleic acid, or RNA, which is found primarily in the cytoplasm.
Like DNA, RNA is a polymer of four different nucleotides, but the nucleotides
are ribonucleotides containing the sugar ribose, which differs from deoxyribose in
possessing a hydroxyl group on its 2' carbon atom. Another difference is that the base
thymine is absent from RNA, being replaced by the base uracil (U), which lacks the
extra-ring methyl group of thymine but, like thymine, hydrogen bonds with adenine.
The final difference between DNA and RNA is that RNA is usually single-stranded.
That RNA is the intermediary between DNA and proteins soon became the working
hypothesis of biochemists, and the details of protein synthesis were worked out in
the fifties and sixties. Briefly, a segment of DNA (a gene) serves as the template
for the synthesis, in the nucleus, of so-called messenger RNA (mRNA), a process
called transcription and similar to DNA replication. The mRNA then enters the
cytoplasm, where it serves as the template for the ordered assembly of amino acids
into a protein, a process called translation. Details of transcription and translation
are illustrated in "Protein Synthesis."
The last general problem about the relation between DNA and proteins was to crack
the code relating the sequence of deoxyribonucleotides that constitutes a gene to the
sequence of amino acids that constitutes a protein. Experiments performed in 1961
by Crick and the British molecular biologist Sydney Brenner (1927-) suggested that
the code was a triplet code, or, in other words, that a sequence of three adjacent
deoxyribonucleotides (a codon) specifies each amino acid. The genetic code was
completely cracked by 1966, thanks primarily to the independent efforts of two
groups, one led by Marshall Warren Nirenberg (1929-) and the other by Har Gobind
Khorana (1922-). As shown in "The Genetic Code," eighteen of the twenty amino
acids are specified by two or more codons. The redundancy of the code implies
that gene mutations involving single-base substitutions do not necessarily result in a
change in an amino acid.
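The redundancy can be checked by counting codons per amino acid. The sketch below builds the standard genetic code from its conventional compact listing (codons ordered TTT, TTC, TTA, TTG, TCT, and so on, with `*` marking stop codons) rather than from the table in "The Genetic Code", which is not reproduced here. Codons are written in the DNA alphabet, and the helper name is hypothetical.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
redundant = sorted(aa for aa, n in codons_per_amino_acid.items() if n >= 2)
print(len(codons_per_amino_acid), "amino acids;", len(redundant), "have two or more codons")

def is_synonymous(codon, position, new_base):
    """True if substituting new_base at position leaves the amino acid unchanged."""
    mutated = codon[:position] + new_base + codon[position + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]

# GAG encodes glutamic acid (E). Changing the third base to A is silent;
# changing the second base to T gives GTG, which encodes valine (V).
print(is_synonymous("GAG", 2, "A"), is_synonymous("GAG", 1, "T"))
```

The two amino acids with a single codon are methionine (M) and tryptophan (W); a substitution in either of their codons always changes the amino acid or creates a stop codon.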
Now that what seemed the major questions about the material and mechanisms
of heredity had been answered, was anything fascinating left to learn? Or would
(a) Protein Synthesis in Prokaryotic and Eukaryotic Cells
Prokaryotic
Cell
^
Sense strand
Template
(non-sense strand) 1
\
1
Translation
A
Cell wall
A
Protein synthesis is the process by which
information encoded in a gene is converted
into a specific protein. In 1957 Francis Crick
proposed two hypotheses about protein
synthesis, which later became known as
the central dogma of molecular biology. He
proposed first that gene sequences are
'collinear" with protein sequences. In other
words, the linear arrangement of subunits
(deoxyribonucleotides) composing a gene
corresponds to the linear arrangement of
subunits (amino acids) composing a pro-
tein. Second, Crick proposed that a seg-
ment of RNA (a ribonucleotide sequence)
acts as an intermediate translator between
the deoxyribonucleotide sequence and the
amino-acid sequence, or, in other words,
that genetic information flows from DNA to
RNA to protein. Crick had no experimental
evidence to support his hypotheses. But
very shortly Charles Yanofsky and Seymour
Benzer, working independently, provided
the first evidence in support of the collinear-
ity hypothesis. Their experiments showed
that mutations in the genes of E. coli and of
the T4 bacteriophage produced parallel
changes in amino-acid sequences. And as
details of protein synthesis were worked
out, the role of RNA as an intermediary was
also established.
PROTEIN SYNTHESIS
[Figure (a), continued. Eukaryotic-cell labels: transcription, primary transcript, cytoplasm.]
Shown in (a) is an overview of protein syn-
thesis in a prokaryotic cell. In the first stage,
called transcription, a DNA segment, a gene,
serves as a template for the synthesis of a
single-stranded RNA segment called a
messenger RNA (mRNA). The base se-
quence of the mRNA is complementary to
the base sequence of one strand of the
gene (the template, or "non-sense," strand)
and is therefore identical to the base se-
quence of the other strand of the gene (the
"sense" strand). The one exception to the
identity is that the base U (uracil) replaces
the base T. (Recall that in RNA uracil, rather
than thymine, is the base complementary to
adenine.)
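The relationship among the two DNA strands and the mRNA can be sketched in Python. This is a minimal illustration, not a model of the polymerase: the function names and the example sequence are hypothetical, and strand directions are handled only by the convention stated in the comments.

```python
# Base-pairing rules: DNA with DNA, and DNA template with RNA.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}

def template_strand(sense_5to3: str) -> str:
    """Template ("non-sense") strand, written 3'-to-5',
    for a sense strand written 5'-to-3'."""
    return "".join(DNA_COMPLEMENT[base] for base in sense_5to3)

def transcribe(template_3to5: str) -> str:
    """mRNA, written 5'-to-3', complementary to a template
    strand written 3'-to-5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)

sense = "ATGGCACAGTAA"            # hypothetical sense strand, 5'-to-3'
template = template_strand(sense)
mrna = transcribe(template)
print(template)                   # TACCGTGTCATT
print(mrna)                       # AUGGCACAGUAA
```

The mRNA matches the sense strand base for base except that U appears wherever the sense strand has T.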
In the second stage of protein synthesis,
called translation, the mRNA serves as the
template for the stringing together of amino
acids into a protein. The protein is assembled
according to the genetic code. That is, the
succession of codons (triplets of adjacent
ribonucleotides) that compose the mRNA
dictates the succession of amino acids that
compose the protein. (A listing of codons
and corresponding amino acids is presented
in "The Genetic Code.") Although transcription
and translation are depicted here as if
they occurred at different times, translation
of a prokaryotic mRNA often begins before
its synthesis by transcription is complete.
Also shown in (a) is an overview of protein
synthesis in a eukaryotic cell. Unlike pro-
karyotic genes, most eukaryotic genes are
composed of stretches of protein-coding
sequences (exons) interrupted by longer
stretches of noncoding sequences (introns).
Both the exons and introns within a eukary-
otic gene are transcribed. The resulting
primary transcript is then spliced; that is,
each intron is removed and the adjacent
exons are linked together.
The shortened RNA is now an mRNA, an
RNA that contains only protein-coding se-
quences. The mRNA leaves the nucleus
and in the cytoplasm is translated into a
protein according to the genetic code. Thus
transcription and translation are of necessity
temporally separated in eukaryotic cells.
The overviews in (a) illustrate that, as Crick
had postulated, genetic information flows
from DNA to RNA to protein within both
prokaryotic and eukaryotic cells. One im-
portant exception to the central dogma is
the class of viruses known as retroviruses,
of which the AIDS virus is an example.
Retroviruses store genetic information in
RNA and then convert the information to
DNA-a reversal of the usual information
flow that is known as reverse transcription.
Details of transcription and translation are
shown in (b) and (c) respectively. Transcrip-
tion begins when an enzyme, an RNA poly-
merase, binds to a particular segment of a
gene called the promoter. The double helix
then uncoils and separates into two strands,
exposing a small number of bases. The
RNA polymerase facilitates hydrogen bond-
ing between an exposed base in the tem-
plate strand and its complementary base in
a free ribonucleoside triphosphate (NTP)
and then between the next exposed base in
the template strand and its complementary
base in another free NTP. While the two
NTPs are held in proximity by the hydrogen
bonds, the RNA polymerase catalyzes the
formation of an -O-P-O- bridge between
them, thus forming a chain of two covalently
linked ribonucleotides. (See "DNA Replication"
for details about formation of -O-P-O-
bridges.) A third NTP is hydrogen-bonded
to the third exposed base in the template
strand and is covalently linked to the second
ribonucleotide in the chain. The RNA poly-
merase moves along the template in the 3'-
to-5' direction, continuing to unwind and
separate the double helix and to elongate
the RNA chain in the 5'-to-3' direction by
catalyzing the addition of successive ribo-
nucleotides. At the same time, the distorted
DNA in the wake of the polymerase re-
winds. After the gene is fully transcribed,
the polymerase separates from the double
helix. If the gene transcribed is a eukaryotic
gene, the newly minted RNA is spliced and
the resulting mRNA enters the cytoplasm
through pores in the nuclear membrane.
As shown in (c), translation occurs with the
help of transfer RNA molecules (tRNAs)
and ribosomes. Each tRNA is a tiny, clover-
leaf-shaped molecule that serves as an
adapter: At one end it contains a triplet of
ribonucleotides (an anticodon) that binds
with a complementary codon on the mRNA
strand, and at the other end it has an attach-
ment site for a single amino acid. Many
varieties of tRNAs exist. An important dif-
ference between one tRNA and another is
the presence of a different anticodon on the
central cloverleaf stem. The number of different
anticodons found in the various tRNAs
is less than the number of codons in the
genetic code. That is so because the base
pairing between the third base of the mRNA
codon and the first base of the tRNA anti-
codon can depart from the usual Watson-
Crick rules. For example, G can pair with U
in addition to C.
Ribosomes are very large molecules com-
posed of ribosomal RNA (rRNA) and ap-
proximately fifty different proteins. As a ribo-
some travels along an mRNA it catalyzes
the reactions that lead to synthesis of the
protein encoded in the mRNA. Thousands
of ribosomes exist within each cell.
Before a tRNA molecule participates in trans-
lation, it must be converted to an aminoacyl-
tRNA (become attached to the amino acid
corresponding to its anticodon). Each of the
twenty amino acids found in proteins can be
attached to at least one type of tRNA, and
most can be attached to several. The binding
between tRNA and amino acid is catalyzed
by one of a group of enzymes. Those
exquisitely specific enzymes, called
aminoacyl synthetases, are in fact the agents
by which the genetic information in mRNA is
decoded.
[Figure (b): Transcription. Labels: sense strand, messenger RNA, non-sense strand.]
[Figure (c): Translation. Labels: amino-acid sequence (protein), anticodon, amino acid, messenger RNA, ribosome, aminoacyl synthetase, attachment site.]
46 Los Alamos Science Number 20 1992
Translation begins when an aminoacyl-tRNA
containing the amino acid methionine and a
ribosome bind to an initiation sequence
near the 5' end of the mRNA. The initiation
sequence consists of the START codon
AUG, to which the aminoacyl-tRNA binds
through complementary base pairing. A
second aminoacyl-tRNA, which contains
an anticodon complementary to the second
mRNA codon, binds to the mRNA. Then the
amino acid on the first aminoacyl-tRNA is
joined by a peptide bond to the amino acid
on the second aminoacyl-tRNA, thus creat-
ing a chain of two amino acids dangling off
the end of the second aminoacyl-tRNA. The
process continues as the ribosome moves
along the mRNA (in the 5'-to-3' direction)
and as peptide bonds are formed between
successive amino acids. When the ribo-
some reaches a STOP codon within the
mRNA, the ribosome detaches from the
mRNA, and the completed protein is re-
leased into the cytoplasm.
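The reading of an mRNA from START codon to STOP codon can be sketched as follows. This is a simplified illustration under stated assumptions: it treats the first AUG found as the initiation site, ignores ribosomes and tRNAs entirely, and uses one-letter amino-acid symbols (M for methionine, A for alanine, Q for glutamine) rather than the three-letter abbreviations used in this article. The names CODON_TABLE and translate and the example sequence are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one-letter amino-acid symbols, "*" = STOP.
# Order: first, second, and third base each run through U, C, A, G.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna: str) -> str:
    """Translate an mRNA (5'-to-3') from its first AUG to the first STOP codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""                      # no START codon, nothing is made
    protein = []
    # Step through the message one codon (three bases) at a time.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # STOP codon: release the chain
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("GGAUGGCACAGUAAUU"))   # MAQ
```

In the example the codons read are AUG, GCA, CAG, and UAA, giving methionine, alanine, glutamine, and then termination.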
The process of translation is fast: A single
ribosome can translate up to fifty ribonucleotides
per second. Furthermore, at any one
time numerous ribosomes may be traveling
along a single mRNA, each producing a
molecule of the same protein. Thus a pro-
tein needed for diverse tasks within the cell
can be quickly and efficiently produced.
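The quoted rate can be turned into a rough time estimate. The short sketch below is only arithmetic under assumptions: it takes the upper rate of fifty ribonucleotides per second as constant, ignores initiation and termination, and uses a hypothetical 300-amino-acid protein as the example.

```python
def translation_time_seconds(n_amino_acids: int,
                             nucleotides_per_second: float = 50.0) -> float:
    """Time for one ribosome to read the codons of a protein,
    at three ribonucleotides per amino acid."""
    return 3 * n_amino_acids / nucleotides_per_second

# A hypothetical protein of 300 amino acids (900 ribonucleotides of coding sequence).
print(translation_time_seconds(300))   # 18.0
```

At that rate a single ribosome adds about seventeen amino acids per second, and several ribosomes on one mRNA multiply the output accordingly.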
Note: Published only recently (in June 1992)
was strong evidence that the formation of
peptide bonds between amino acids during
translation is catalyzed not by some protein
enzyme within a ribosome but instead by an
RNA component of the ribosome. That
news is exciting but not completely unex-
pected, since the ability of RNA to function
as a catalyst in other situations had been
demonstrated in the early 1980s. In particu-
lar, the primary transcript of a ribosomal-
RNA gene of the protozoan Tetrahymena
thermophila had been shown to effect its
own splicing, and the catalytic action of an
RNA-protein complex that processes the
primary transcripts of certain transfer-RNA
genes had been ascribed to the RNA com-
ponent of the complex rather than the pro-
tein component.
THE GENETIC CODE
(a) RNA Codons for the Twenty Amino Acids
What triplet of ribonucleotides directs the
addition of, say, the amino acid alanine to a
protein that is being synthesized? Of ly-
sine? Of any one of the twenty amino acids
found in proteins? That was the problem to
be faced after advancement of the ideas
that a gene is a string of deoxyribonucleotide
triplets, that the string of deoxyribonucleotide
triplets is transcribed into a string
of ribonucleotide triplets, and that the string
of ribonucleotide triplets is translated into a
string of amino acids-a protein. The results
of research on the problem are condensed in
the genetic code, a listing of the sixty-four
possible ribonucleotide triplets and the amino
acid (or translation command) corresponding
to each. Fortunately for those who worked
on the problem, the genetic code is organ-
ism-independent. That is, the same genetic
code is used by virtually all organisms.
Researchers began to crack the genetic
code in the early 1960s. Marshall Nirenberg
and his collaborators added a synthetic
RNA, consisting entirely of repetitions of a
single ribonucleotide, say U, to a bacterial
extract that contained everything neces-
sary for protein synthesis except RNA. The
result was a string of the amino acid phenyl-
alanine. They concluded that the ribonucleo-
tide triplet UUU codes for phenylalanine.
Other ribonucleotide triplets were decoded
by performing similar experiments with synthetic
RNAs containing only A's, C's, or G's
or various combinations of ribonucleotides.
By 1966 research teams led by Har Gobind
Khorana and Marshall Nirenberg had
cracked the entire genetic code.
First                  Second base                  Third
base     U             C       A       G            base
U        Phe           Ser     Tyr     Cys          U
         Phe           Ser     Tyr     Cys          C
         Leu           Ser     STOP    STOP         A
         Leu           Ser     STOP    Trp          G
C        Leu           Pro     His     Arg          U
         Leu           Pro     His     Arg          C
         Leu           Pro     Gln     Arg          A
         Leu           Pro     Gln     Arg          G
A        Ile           Thr     Asn     Ser          U
         Ile           Thr     Asn     Ser          C
         Ile           Thr     Lys     Arg          A
         Met (start)   Thr     Lys     Arg          G
G        Val           Ala     Asp     Gly          U
         Val           Ala     Asp     Gly          C
         Val           Ala     Glu     Gly          A
         Val           Ala     Glu     Gly          G

Amino-acid abbreviations
Ala = Alanine
Arg = Arginine
Asp = Aspartic acid
Asn = Asparagine
Cys = Cysteine
Glu = Glutamic acid
Gln = Glutamine
Gly = Glycine
His = Histidine
Ile = Isoleucine
Leu = Leucine
Lys = Lysine
Met = Methionine
Phe = Phenylalanine
Pro = Proline
Ser = Serine
Thr = Threonine
Trp = Tryptophan
Tyr = Tyrosine
Val = Valine

Shown in (a) is the usual representation of the genetic code. The letters U, C, A,
and G are symbols for the ribonucleotides containing the bases uracil, cytosine,
adenine, and guanine, respectively. The symbols in the body of the table are
three-letter abbreviations for the amino acids. To find the amino acid specified by
a particular codon (say the codon CAG), locate the first nucleotide (C) along the
left side of the table and the second nucleotide (A) along the top of the table.
Their intersection pinpoints one of four amino acids. Of those four the one aligned
with the third nucleotide (G) is the amino acid in question. Thus the amino acid
glutamine (Gln) is specified by the three-nucleotide sequence CAG.

Shown in (b) is another version of the genetic code, one expressed in terms of DNA
codons instead of RNA codons. Each single-stranded deoxyribonucleotide triplet
listed in (b) is the sequence of the so-called sense strand of a DNA codon, that is,
the strand that does not serve as a template for synthesis of RNA. Note that most of
the amino acids are specified by at least two codons. For example, phenylalanine is
specified by two codons: TTT and TTC. Arginine is specified by a total of six codons:
CGT, CGC, CGA, CGG, AGA, and AGG. In general, the more an amino acid is used in
protein synthesis the likelier it is to be specified by more than one codon. Note
also the start codon (ATG) and the three stop codons (TAA, TGA, and TAG) that are
used to signal the beginning and end of protein synthesis. The substantive difference
between the two versions of the genetic code is that in (b) the deoxyribonucleotide
T replaces the ribonucleotide U.
(b) DNA Codons for the Twenty Amino Acids

Ala          GCA GCG GCT GCC
Arg          AGA AGG CGA CGG CGT CGC
Asp          GAT GAC
Asn          AAT AAC
Cys          TGT TGC
Glu          GAA GAG
Gln          CAA CAG
Gly          GGA GGG GGT GGC
His          CAT CAC
Ile          ATA ATT ATC
Leu          TTA TTG CTA CTG CTT CTC
Lys          AAA AAG
Met (START)  ATG
Phe          TTT TTC
Pro          CCA CCG CCT CCC
Ser          AGT AGC TCA TCG TCT TCC
Thr          ACA ACG ACT ACC
Trp          TGG
Tyr          TAT TAC
Val          GTA GTG GTT GTC
STOP         TAA TAG TGA
molecular genetics degenerate into clearing up details here and details there? Some
thought so, and bemoaned the passing of a golden age. But in reality another era, and
one just as golden, was opening, thanks to development of techniques for manipulating
and analyzing DNA.
The Techniques of Molecular Genetics
The late 1960s mark the beginning of the recombinant-DNA revolution. During the
ensuing years it became possible to make billions of identical copies of segments
of DNA by cloning (duplicating) each segment individually as a recombinant DNA
molecule in the bacterium Escherichia coli. The significance of that breakthrough was
enhanced by other new developments, including the ability to separate fragments of
DNA that differ in length by only a few nucleotide pairs, to determine the nucleotide
sequences of cloned segments of DNA, to create specific mutations in cloned genes,
and to introduce cloned eukaryotic genes into experimental organisms.
Those startling developments arose from advances during the previous decade in
nucleic-acid biochemistry and in bacterial and phage genetics. Basic features of the
replication, repair, and recombination of DNA and of the synthesis of proteins had
been elucidated, and identification and isolation of the enzymes that catalyze the
chemical reactions involved had allowed those processes to be reproduced in vitro.
The action of phages as carriers of genetic material between different strains of E.
coli had been utilized to isolate individual E. coli genes. The rates of transcription of
E. coli genes had been determined (by measuring the amounts of RNA transcribed
from the different genes) and had been found to be regulated, that is, to vary from
gene to gene and in response to external stimuli. The observed regulation of gene
expression in E. coli had been traced to the interaction of certain proteins with
regulatory sequences in its genome. By 1968 about a hundred genes had been ordered
on the genetic maps of phages, and about fifteen hundred genes had been ordered
on the genetic map of E. coli.
On the other hand, essentially nothing was known about the structure of eukaryotic
genes, their regulation, or their organization in chromosomal DNA molecules. Even
the major difference between prokaryotic and eukaryotic genes-the presence of
introns in the latter-had not yet been discovered. Most frustrating was the lack
of a methodology for studying eukaryotic genomes analogous to the phage-bacteria
system for studying the organization, rearrangement, and functions of phage and
bacterial genomes.
But in 1968 techniques began to be developed that exploit the cellular machinery and
the biosynthetic products of bacteria to replicate, manipulate, and analyze eukaryotic
genes and to manufacture eukaryotic proteins. Improvements during the past twenty
years in recombinant-DNA techniques have produced an explosion of knowledge
about eukaryotic genes and about the organization and rearrangements of DNA in
eukaryotic genomes, including the human genome.
This section briefly describes some of the techniques that are employed in the study
of DNA and points out some of the facts about DNA the techniques have helped to
reveal. The chronological approach will be more or less abandoned, and none of the
contributions will be attributed to their originators.
A description of the preparation of a sample of DNA is appropriate as a preliminary
to this section. The usual preparation procedure involves treating a large number of
cells (typically about 5 million) of the organism in question with a detergent, which
dissolves cellular membranes and dissociates the proteinaceous component of the
chromosomes from the DNA. Then the membrane components and the proteins are
removed with an organic solvent such as a chloroform-phenol mixture, and the DNA
is precipitated with ethanol as a highly viscous liquid. The mass of the DNA in such a
sample is small, about 30 micrograms in the case of human DNA and correspondingly
smaller in the case of DNA extracted from organisms with smaller genomes.
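The figure of about 30 micrograms can be checked with a rough calculation. The sketch below rests on two assumptions that are not stated in this article: an average molar mass of about 650 grams per mole for one base pair, and about 6 billion base pairs of DNA in a diploid human cell. The function name is hypothetical.

```python
AVOGADRO = 6.022e23            # molecules per mole
BP_MOLAR_MASS = 650.0          # g/mol per base pair (assumed average)
DIPLOID_HUMAN_BP = 6.0e9       # base pairs per diploid human cell (assumed)

def dna_mass_micrograms(n_cells: float,
                        bp_per_cell: float = DIPLOID_HUMAN_BP) -> float:
    """Approximate mass of DNA, in micrograms, contained in n_cells cells."""
    grams_per_cell = bp_per_cell * BP_MOLAR_MASS / AVOGADRO
    return n_cells * grams_per_cell * 1e6

print(round(dna_mass_micrograms(5e6), 1))   # roughly 30 micrograms
```

Under those assumptions each cell contributes a little over 6 picograms of DNA, so 5 million cells yield a few tens of micrograms, consistent with the estimate in the text.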
It is worth noting that no DNA sample prepared in the above manner contains intact
DNA molecules. The mechanical aspects of sample preparation (such as stirring and
pipetting) invariably break some of the covalent bonds of the DNA backbones. That
accidental fragmentation is usually of little consequence, however, because most of
the techniques employed to study DNA at the molecular level are applicable only to
stretches of DNA shorter than the intact molecules found in chromosomes. In fact,
deliberate fragmentation, by either mechanical or biochemical means, is the first step
in many of the techniques to be described below.
The length of a DNA molecule or fragment is expressed in terms of the number of
base pairs it contains. (Because the structure of DNA is regular, number of base
pairs is directly proportional to physical length.) The average length of the intact
DNA molecules within human chromosomes, for example, is about 130 million base
pairs, which corresponds to a physical length of about 4.5 centimeters. The lengths
of the known human genes are much shorter, ranging from less than a hundred
base pairs for the transfer-RNA genes to over a million base pairs for the Duchenne
muscular-dystrophy gene and the cystic-fibrosis gene.
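The conversion between base pairs and physical length is a single multiplication. The sketch below assumes a spacing of about 0.34 nanometer per base pair along the double helix, a value that is not given in this section; the function name is hypothetical.

```python
RISE_PER_BASE_PAIR_NM = 0.34   # assumed spacing between adjacent base pairs, in nm

def physical_length_cm(base_pairs: float) -> float:
    """Approximate length, in centimeters, of a DNA molecule of the given size."""
    nanometers = base_pairs * RISE_PER_BASE_PAIR_NM
    return nanometers * 1e-7   # 1 nm = 1e-7 cm

print(round(physical_length_cm(130e6), 1))   # 4.4
```

With that spacing, 130 million base pairs come to about 4.4 centimeters, in line with the figure of about 4.5 centimeters quoted above.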
We turn now to the means for manipulating and analyzing DNA.
Fractionation by Copy Number and Repetitive DNA. The mid 1960s brought to
light a surprising feature of eukaryotic DNAs: their content of multiple identical or
nearly identical copies of various sequences. The various repeated sequences are
collectively called repetitive DNA, and, depending on the species, repetitive DNA
is estimated to constitute between 3 and 80 percent of the total. (Between 25 and
35 percent of the human genome, and of other mammalian genomes, is repetitive
DNA.) In contrast, the DNAs of viruses and prokaryotes contain no or very little
repetitive DNA. The phenomenology of repetitive DNA is complex and not yet fully
explored. A few of the repeated sequences are genes, but most have no known
function. The multiple copies of some repeated sequences are situated one after the
other; the known lengths of the repeated units in such tandem repeats range from two
base pairs to several thousand base pairs. Some tandem repeats occur at only one
location within a genome; others, called interspersed tandem repeats, occur at many
locations. Like the multiple copies of an interspersed tandem repeat, the multiple
copies of other repeated sequences are scattered here and there within a genome; the
known lengths of such interspersed repeats range from about a hundred base pairs
to seven thousand base pairs. And finally the copy numbers of the various repeated
sequences range from less than ten to over a million. Two of the many repeated
sequences found in the human genome are the GT sequence, an interspersed tandem
repeat that consists of between fifteen and thirty tandem repetitions of the sequence
5'-GT and has a copy number on the order of a hundred thousand, and the Alu
sequence, an interspersed repeat that is about three hundred base pairs in length and
has a copy number close to 2 million.
The existence of repetitive DNA became known from comparison of the renaturation
kinetics of prokaryotic and eukaryotic DNAs. Recall that the natural configuration
of DNA is double-stranded. However, DNA can be separated into single strands
(denatured) by, say, heating an aqueous solution of the DNA to about 100°C.
When the temperature of a thermally denatured sample of DNA is lowered, random
encounters among the single-stranded fragments lead to renaturation, or the re-
establishment of hydrogen bonds between complementary fragments. The kinetics of
the renaturation can be monitored by, for example, measuring the time dependence
of the absorption of ultraviolet light by the sample, since single- and double-stranded
DNA have different capacities to absorb ultraviolet light.
Consider the renaturation of two samples of denatured DNA, one prepared by
breaking the genome of E. coli into equal-length fragments and the other prepared by
breaking, into fragments of the same length as the E. coli fragments, a hypothetical
DNA molecule of the same total length as the E. coli genome but composed of
multiple repetitions of a single sequence. Each single-stranded E. coli fragment is
complementary to only one of the many single-stranded fragments in the first sample,
whereas each single-stranded hypothetical fragment is complementary to one-half of
the equally numerous single-stranded fragments in the second sample. Obviously,
then, the hypothetical sample renatures more rapidly, at least initially, than the E.
coli sample, and therefore the graphs of fraction renatured versus time for the two
samples are different. This example illustrates why renaturation-kinetics data are the
source of information about the presence of repetitive DNA.
Other types of information can be extracted from renaturation-kinetics data. Consider
the renaturation of the E. coli genome and the genome of the virus known as T4,
each broken into fragments of the same length. Both genomes contain essentially
no repetitive DNA, but the sample of E. coli DNA contains a greater number of
fragments because the E. coli genome (which contains about 5,000,000 base pairs
of DNA) is larger than the T4 genome (which contains about 170,000 base pairs
of DNA). Therefore the E. coli genome renatures less rapidly than the T4 genome.
In other words, renaturation kinetics provides information about the relative sizes of
genomes. Furthermore, because the rate at which hydrogen bonds are established
between fragments of single-stranded DNA that have similar but not identical base
sequences depends on the degree of similarity of the base sequences of the fragments,
the kinetics of the joint renaturation of samples of DNA from different species
provides an estimate of the overall similarity of the base sequences of the DNAs.
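The qualitative arguments above can be illustrated with an idealized model. The sketch below assumes simple second-order kinetics, in which a single strand pairs only with its exact complement at a rate proportional to the complement's concentration; that model, the arbitrary units, the 500-base-pair fragment length, and the function name are all assumptions made for illustration and are not taken from this article.

```python
def fraction_renatured(t: float, n_distinct_fragments: int,
                       total_conc: float = 1.0,
                       rate_constant: float = 1.0) -> float:
    """Idealized second-order renaturation (assumed model, arbitrary units).

    With n distinct double-stranded fragments of equal abundance there are
    2n strand species, so the complement of any strand starts at a
    concentration of total_conc / (2 * n).
    """
    partner_conc = total_conc / (2 * n_distinct_fragments)
    return 1.0 - 1.0 / (1.0 + rate_constant * partner_conc * t)

FRAGMENT_LENGTH = 500   # hypothetical fragment length, in base pairs
samples = {
    "single repeated sequence": 1,
    "T4 (about 170,000 bp)": 170_000 // FRAGMENT_LENGTH,
    "E. coli (about 5,000,000 bp)": 5_000_000 // FRAGMENT_LENGTH,
}
for name, n in samples.items():
    print(f"{name}: {fraction_renatured(100.0, n):.4f}")
```

At any given time the repeated sequence is the most fully renatured and the E. coli sample the least, which is the ordering the text describes.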
Today renaturation is most often used to fractionate fragments of DNA by copy
number, that is, to separate a DNA sample into components containing highly
repetitive DNA, less highly repetitive DNA, and single-copy DNA. Such a separation
narrows the search for genes, most of which occur only once within a genome and
hence are contained in the single-copy fraction.
Fragmenting DNA with Restriction Enzymes. Until 1970 DNA molecules were
of necessity fragmented by mechanical means, such as forcing a sample through a
syringe. Mechanical fragmentation has disadvantages: Identical pieces of DNA are
not fragmented at the same points, and the lengths of the resulting fragments vary
widely. Then came discovery of restriction enzymes (or, more precisely, type II
restriction endonucleases), biochemicals capable of "cutting" double-stranded DNA
not only in a reproducible manner but also into less widely varying lengths. In
particular, a restriction enzyme recognizes and binds to an enzyme-specific, very
short sequence within a DNA segment and catalyzes the breaking of two particular
oxygen-phosphorus-oxygen (-O-P-O-) bridges, one in each backbone of the segment.
The locations along a stretch of DNA of the sequence recognized by a restriction
enzyme are called restriction sites.
The -O-P-O- bridges broken by a restriction enzyme usually lie within the recognition
sequence of the enzyme. For example, the restriction enzyme EcoRI recognizes and
binds to the sequence
5'-GAATTC-3'
3'-CTTAAG-5'
and, if allowed to interact with a sample of DNA for a sufficiently long time
(to completely "digest" the DNA), cuts the DNA within every occurrence of that
sequence. Note that the sequence recognized by EcoRI, like the sequences recognized
by many other restriction enzymes, is palindromic; in other words, the 5'-to-3'
sequence of one strand is identical to the 5'-to-3' sequence of the other strand.
The average length of the restriction fragments produced by EcoRI, a "6-base cutter"
(a restriction enzyme that recognizes a 6-base-pair sequence), can be estimated to be
about 4000 base pairs, since DNA is approximately a random sequence of four base
pairs and any given sequence of six base pairs occurs on average every 4^6 = 4096
base pairs within such a sequence. (Note, however, that the observed average length
of the fragments produced by an N-base cutter sometimes differs considerably from
the estimate of 4^N.) Fragments with a shorter average length can be obtained by
complete digestion with, say, a 4-base cutter, and fragments with a longer average
length can be obtained by complete digestion with a restriction enzyme that recognizes
a sequence longer than 6 base pairs or by partial digestion with a 6-base cutter, which
leaves some of the restriction sites uncut.
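The 4^N estimate can be illustrated numerically. The sketch below computes the estimate for several cutters and then counts occurrences of one six-base site in a computer-generated random sequence; the function names, the sequence length, and the random seed are arbitrary choices made for illustration, and a random sequence is only the approximation to real DNA that the estimate itself assumes.

```python
import random

def expected_fragment_length(site_length: int) -> int:
    """Average spacing of an N-base site in a random sequence of four bases."""
    return 4 ** site_length

def simulated_mean_spacing(n_bases: int = 1_000_000,
                           site: str = "GAATTC",
                           seed: int = 0) -> float:
    """Mean distance between occurrences of site in a random sequence."""
    rng = random.Random(seed)
    sequence = "".join(rng.choices("ACGT", k=n_bases))
    return n_bases / sequence.count(site)

for n in (4, 6, 8):
    print(n, expected_fragment_length(n))   # 256, 4096, 65536
print(round(simulated_mean_spacing()))       # close to 4096, with statistical scatter
```

The simulated spacing fluctuates around 4096 from one random sequence to the next, a reminder that even for truly random DNA the estimate holds only on average.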
A majority of the many restriction enzymes available today, including EcoRI, cut
DNA in a fashion such that the resulting fragments terminate in a very short section
of single-stranded DNA. For example EcoRI cuts the DNA segment
5'- . . . GAATTC . . . -3'
3'- . . . CTTAAG . . . -5'
into the fragments
5'- . . . G          -3'
3'- . . . CTTAA      -5'
and
5'-      AATTC . . . -3'
3'-          G . . . -5'
Note that the single-stranded ends of the two EcoRI restriction fragments are com-
plementary. The utility of such "sticky" ends in the creation of recombinant DNA
molecules will be described below.
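Both the palindromic recognition sequence and the staggered cut can be sketched in Python. The example below follows only the top strand (written 5'-to-3') and cuts between the G and the first A of each GAATTC site; the function names and the example sequence are hypothetical, and complete digestion is assumed.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Sequence of the complementary strand, read 5'-to-3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def is_palindromic(site: str) -> bool:
    """True if both strands of the site read the same 5'-to-3'."""
    return site == reverse_complement(site)

def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list:
    """Top-strand fragments after cutting every occurrence of site.

    cut_offset is the number of site bases left on the upstream
    fragment (1 for a cut between G and AATTC).
    """
    fragments, start = [], 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(seq[start:pos + cut_offset])
        start = pos + cut_offset
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

print(is_palindromic("GAATTC"))            # True
print(digest("TTGAATTCAAGGGAATTCCC"))      # ['TTG', 'AATTCAAGGG', 'AATTCCC']
```

Each internal fragment begins with AATTC on the top strand; because the bottom strand is cut at the mirror-image position, the four bases AATT are left single-stranded at every cut end, which is what makes the ends "sticky."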
A brief natural history of restriction enzymes is presented in "Restriction Enzymes,"
as well as a listing of a few of the many available.
Fractionating DNA Fragments by Length: Gel Electrophoresis. Because DNA
fragments are negatively charged, they are subject to an electrical force when placed
in an electric field. In particular, DNA fragments placed in a gel (a porous, semisolid
material) move through the gel in a direction opposite to the direction of an applied
electric field. Furthermore, the rate at which a fragment travels is approximately
inversely proportional to the logarithm of its length. Therefore gel electrophoresis
is a means for separating DNA fragments by length. Details of the technique are
described in "Gel Electrophoresis."
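The stated dependence of migration rate on fragment length can be sketched numerically. The example below takes the approximation literally, with the rate proportional to the reciprocal of the logarithm of the length; the proportionality constant is set to 1, the fragment lengths are made up for illustration, and real gels follow the relation only over a limited range of lengths.

```python
import math

def relative_migration_rate(length_bp: float) -> float:
    """Relative rate from the approximation rate ~ 1 / log(length)."""
    return 1.0 / math.log10(length_bp)

fragments = [20_000, 500, 8_000, 2_000]    # hypothetical lengths, in base pairs

# Fragments ordered from fastest (farthest traveled) to slowest.
by_rate = sorted(fragments, key=relative_migration_rate, reverse=True)
for length in by_rate:
    print(length, round(relative_migration_rate(length), 3))
```

The shortest fragment heads the list, so after a fixed running time the fragments are arranged along the gel in order of length.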
But what is the point of separating fragments of DNA by length? After all, the lengths
of the fragments obtained either by breaking a DNA molecule mechanically or by
cutting it with a restriction enzyme bear no relation to the functioning of the molecule
within a cell. Nevertheless, gel electrophoresis, particularly of restriction fragments,
is of great utility in the study of DNA. For example, consider the genome of the phage
known as λ (lambda), a double-stranded DNA molecule about 50,000 base pairs in
length. When many copies of the λ genome are completely digested with EcoRI
and the resulting restriction fragments are subjected to gel electrophoresis, groups of
Like the immune systems of vertebrate
eukaryotes, the restriction enzymes of bacteria
combat foreign substances. In particular,
restriction enzymes render the DNA of,
say, an invading bacteriophage harmless
by catalyzing its fragmentation, or, more
precisely, by catalyzing the breaking of certain
-O-P-O- bridges in the backbones of
each DNA strand. The evolution of restric-
tion enzymes helped many species of bac-
teria to survive; their discovery by humans
helped precipitate the recombinant-DNA
revolution.
Three types of restriction enzymes are
known, but the term "restriction enzyme"
refers here and elsewhere in this issue to
type II restriction endonucleases, the only
type commonly used in the study of DNA. (A
nuclease is an enzyme that catalyzes the
breaking of -O-P-O- bridges in a string of
deoxyribonucleotides or ribonucleotides; an
endonuclease catalyzes the breaking of
internal rather than terminal -O-P-O-
bridges.) Many restriction enzymes have
been isolated; more than seventy are available
commercially. Each somehow recognizes
and binds to its own restriction sites,
short stretches of double-stranded DNA
with a specific base sequence. Having bound
to one of its restriction sites, the enzyme
catalyzes the breaking of one particular
-O-P-O- bridge in each DNA strand.
The accompanying table lists a few of the
more commonly used restriction enzymes
and the organism in which each is found.
The first three letters of the name of a
restriction enzyme are an abbreviation for
the species of the source organism and are
therefore customarily italicized. The next
letter(s) of the name designates the strain of
the source organism, and the terminal Roman
numeral denotes the order of its discovery
in the source organism.
[Table. Columns: Restriction Enzyme; Source Organism; Base Sequence of Restriction Site. Enzymes named: BamHI, EcoRI, MboI. Source organisms: Bacillus amyloliquefaciens, Escherichia coli, Haemophilus aegyptius, Haemophilus influenzae, Moraxella bovis, Nocardia otitidis, Thermus aquaticus.]
Also listed in the table are the base se-
quences of the restriction sites of the en-
zymes. The red line separates the ends of
the resulting fragments. The restriction sites
of many of the known restriction enzymes
and of all the restriction enzymes listed in
the table have palindromic base sequences.
That is, the 5'-to-3' base sequence of one
strand is the same as the 5'-to-3' base
sequence of its complementary strand. Both
the bridges broken by a restriction enzyme
that recognizes a palindromic sequence lie
within or at the ends of the sequence.
Note that most of the restriction enzymes in
the table make "staggered" cuts; that is,
they produce fragments with protruding
single-stranded ends. Those "cohesive," or
"sticky," ends are very useful. Suppose that
a sample of human DNA and a sample of
phage DNA are both fragmented with the
same restriction enzyme, one that makes
staggered cuts. When the resulting frag-
ments are mixed, they will tend to hydrogen
bond with each other because of the
complementarity of their sticky ends. In
particular, some human DNA fragments
will hydrogen bond to some phage DNA
fragments. And that bonding is the first step
in the creation of a recombinant DNA mol-
ecule.
A final point about restriction enzymes is the
problem of how the DNA of a bacterium
avoids being chopped up by the friendly fire
of the restriction enzyme(s) it produces.
Evolution has solved that problem also. A
bacterium that produces a type II restriction
endonuclease produces in addition another
enzyme that catalyzes the modification of
restriction sites in its own DNA in a manner
such that they cannot serve as binding sites
for the restriction enzyme.
Historically gel electrophoresis was first
applied to separating proteins essentially
according to mass, but the technique was
adapted to separating fragments of DNA (or
RNA) essentially according to fragment
length. The technique works on DNA be-
cause the phosphate groups of a DNA
fragment are negatively charged, and there-
fore, under the influence of an electric field,
the fragment migrates through a gel (a
porous, semisolid medium) in a direction
opposite to that of the field. Furthermore,
the rate at which the fragment migrates
through the gel is approximately inversely
proportional to the logarithm of its length.
Gel electrophoresis of DNA is carried out
with two types of electric field. Conventional
gel electrophoresis employs a field that is
temporally constant in both direction and
magnitude. In contrast, pulsed-field gel electrophoresis
employs a field that is created
by pulses of current and therefore varies
periodically from zero to some set value.
More important, the direction of the electric
field also varies because different pulses
flow through pairs of electrodes at different
locations. (Note, however, that the time-
averaged direction of the electric field is
along the length of the gel.) The advantage
of such a pulsed field is that it prevents long
DNA fragments, fragments longer than about
50,000 base pairs, from jackknifing within
the structural framework of the gel and thus
allows the long fragments to migrate through
the gel in a length-dependent manner, just
as shorter fragments migrate in a constant
electric field.
The gel employed is usually a solidified
aqueous solution of agarose, a purified form
of agar. By varying the concentration of
agarose in the gel, conventional gel electrophoresis
can be applied to samples containing
DNA fragments with average lengths
between a few hundred base pairs and tens
of thousands of base pairs. (Another gel
used for conventional electrophoresis is
polyacrylamide, which is particularly suited
to separating fragments with lengths less
than about a thousand base pairs and is
therefore the gel of choice for sequencing.)
Conventional gel electrophoresis in an agarose
gel is illustrated in (a); details of the
technique are as follows.
Agarose is dissolved in a hot buffer solution,
and the gel solution is allowed to solidify into
a thin slab in a casting tray in which the teeth
of a comb-like device are suspended. After
the gel has solidified, the comb is removed.
The "wells" formed by the teeth of the comb
are the receptacles into which the samples
of DNA are loaded. The thickness of the gel
is about 5 millimeters; its length and width
are much greater and vary with the purpose
of the electrophoresis. Before being loaded
with the DNA sample(s), the gel is immersed
in a conducting buffer solution in an
electrophoresis chamber.
Before a DNA sample is loaded into a well,
it is mixed with a dense solution of sucrose
or glycerol to prevent the DNA from escaping
into the buffer solution. Into one well is
loaded a gel-calibration sample, a sample
containing fragments of known lengths. As
shown in (a), the flow of electricity through
the gel causes the fragments to migrate
toward the positive electrode. The shorter
fragments move more easily through the
gel and therefore travel farther.
The positions of the fragments after electrophoresis
can be detected by soaking the gel
in a solution of ethidium bromide, which
binds strongly to DNA and emits visible light
when illuminated with ultraviolet light. In a
photograph of the ultraviolet-illuminated gel,
the fragments appear as light bands. The
ethidium-bromide visualization technique
makes the positions of all the fragments in
the gel visible. An alternative visualization
technique detects only certain fragments
(see "Hybridization Techniques").
The above description of gel electrophoresis
might suggest that the sample of DNA
contains but one copy of each fragment. In
reality the sample must contain many copies
of each fragment, and each band seen
in the image of the length-separated fragments
contains many fragments, all of which
have the same length but not necessarily
the same sequence.
GEL ELECTROPHORESIS
(a) Conventional Gel Electrophoresis
[Figure: DNA fragments loaded into an agarose gel that lies between a cathode and an anode and is immersed in buffer solution in an electrophoresis chamber.]
Number 20 1992 Los Alamos Science 55
(b) Conventional Gel Electrophoresis
of Fragmented Human DNA
Shown in (b) are the results of conventional
gel electrophoresis of six different samples
of human DNA. Samples 1, 2, and 3 con-
sisted of the restriction fragments produced
by cutting the same cloned segment of
human DNA with EcoRI alone (a 6-base
cutter), with both EcoRI and HindIII (another
6-base cutter), and with HindIII alone,
respectively. Samples 4, 5, and 6 consisted
of the restriction fragments produced by
cutting a different cloned segment of human
DNA again with EcoRI alone, with both
EcoRI and HindIII, and with HindIII alone,
respectively. The leftmost lane of the gel
contains fragments of the lengths indicated.
Note that all the restriction fragments are
well resolved.
(c) Pulsed-field Electrophoresis of
Intact DNA Molecules of
Saccharomyces cerevisiae
Shown in (c) are the results of pulsed-field
gel electrophoresis of three identical
samples, each containing all sixteen of the
intact DNA molecules that compose the
genome of the yeast Saccharomyces
cerevisiae. The four longest chromosomal
DNA molecules are not resolved; all four are
located in the topmost band. The remaining
twelve chromosomal DNA molecules, however,
are well resolved. The indicated lengths
of the resolved DNA molecules were deter-
mined from the positions, in the rightmost
lane of the gel, of the fragments in a calibra-
tion sample. Even longer fragments, frag-
ments with lengths up to about 5 million
base pairs, can be separated by increasing
the duration of the pulses.
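Reading a length off a calibration lane can be sketched as follows. The sketch assumes, as is common practice but not stated in this article, that the logarithm of fragment length varies roughly linearly with migration distance between neighboring calibration bands; the ladder values and the function name are hypothetical.

```python
import math

def estimate_length(distance, ladder):
    """Estimate a fragment length from its migration distance.

    ladder is a list of (distance_mm, length_bp) pairs read from the
    calibration lane. The estimate interpolates log10(length) linearly
    between the two calibration bands that bracket the distance.
    """
    pts = sorted(ladder)
    for (d0, l0), (d1, l1) in zip(pts, pts[1:]):
        if d0 <= distance <= d1:
            t = (distance - d0) / (d1 - d0)
            log_len = math.log10(l0) + t * (math.log10(l1) - math.log10(l0))
            return 10 ** log_len
    raise ValueError("distance lies outside the calibration range")

# Hypothetical calibration bands: shorter fragments travel farther.
ladder = [(10.0, 10000), (30.0, 1000), (50.0, 100)]
print(round(estimate_length(20.0, ladder)))
```

A band outside the range spanned by the calibration fragments cannot be sized this way, which is why the sketch raises an error rather than extrapolating.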
DNA fragments are found in the gel at locations corresponding to lengths of 3400,
4900, 5300, 6000, 7900, and 22,000 base pairs. That set of six EcoRI
restriction-fragment lengths is unique to the λ genome and hence can be used as an identifying
characteristic of the genome, a characteristic called its EcoRI restriction-fragment
fingerprint. Only viral genomes can be fingerprinted with a 6-base cutter such as
EcoRI. Complete digestion of the much larger bacterial and eukaryotic genomes with
a 6-base cutter yields so many restriction fragments that gel electrophoresis produces
an essentially continuous smear of fragments rather than a relatively small number
of well-separated fragments. However, a short segment of a large genome can be
fingerprinted with a 6-base cutter, provided many copies of the segment are available.
Note that the EcoRI restriction-fragment fingerprint of the λ genome provides no
information about the order of the restriction fragments along the λ genome. More
information is needed to order the fragments and thereby construct an EcoRI
restriction-site map of the λ genome, a map showing the distances between its EcoRI restriction
sites. One way to get the additional information is to carry out two digestions, one
of which is complete and the other only partial. The complete digestion produces
fragments such that the length of each is equal to the distance between some two
adjacent restriction sites; the partial digestion produces some fragments such that the
length of each is equal to the distance spanned by three or more adjacent restriction
sites. Together the length data obtained from the two digestions provide sufficient
information to order the fragments and construct the restriction-site map.
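The relation between a restriction-site map and the two kinds of digestion can be sketched in a few lines. The genome length, site positions, and function names below are hypothetical, and the ends of a linear molecule are treated as if they were cut sites.

```python
def complete_digest(genome_length, sites):
    """Fragment lengths when every restriction site is cut."""
    cuts = [0] + sorted(sites) + [genome_length]
    return [b - a for a, b in zip(cuts, cuts[1:])]

def partial_digest_products(genome_length, sites):
    """Every fragment length that can arise when only some sites are cut.

    The result includes the complete-digestion fragments as well as the
    longer fragments that span three or more adjacent sites.
    """
    cuts = [0] + sorted(sites) + [genome_length]
    lengths = set()
    for i in range(len(cuts)):
        for j in range(i + 1, len(cuts)):
            lengths.add(cuts[j] - cuts[i])
    return sorted(lengths)

# Hypothetical linear genome of 100 units with sites at 20, 50, and 90.
print(complete_digest(100, [20, 50, 90]))
print(partial_digest_products(100, [20, 50, 90]))
```

In this toy case the partial digestion yields a fragment of length 70, which can only be the sum of the 30 and 40 fragments; that observation places those two fragments next to each other on the map.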
The restriction-fragment fingerprints of cloned segments of a large genome have found
application in the efforts to "map" the segments, that is, to arrange the segments in the
order in which they appear along the genome. The principle behind this application
is as follows. Suppose that the restriction-fragment fingerprints of two segments of
a genome include a number of restriction-fragment lengths in common. Calculations
based on the distribution of restriction sites along the genome and on the number of
restriction-fragment lengths in common lead to a value for the probability that the two
fragments overlap and therefore contain pieces of DNA that are contiguous along a
chromosomal DNA molecule. (See "Physical Mapping-A One-Dimensional Jigsaw
Puzzle" in "Mapping the Genome.")
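The first ingredient of such a calculation, counting the restriction-fragment lengths that two fingerprints have in common, can be sketched as follows. The two-percent measurement tolerance, the fingerprints, and the function name are assumptions made for illustration; the sketch does not attempt the probability calculation itself.

```python
def shared_lengths(fp_a, fp_b, tolerance=0.02):
    """Count fragment lengths common to two fingerprints.

    Two lengths are treated as the same if they differ by no more than
    the given relative tolerance. Each length in fp_b is matched at
    most once.
    """
    remaining = sorted(fp_b)
    shared = 0
    for length in sorted(fp_a):
        for candidate in remaining:
            if abs(candidate - length) <= tolerance * length:
                remaining.remove(candidate)
                shared += 1
                break
    return shared

# Hypothetical fingerprints (fragment lengths in base pairs).
clone_a = [1200, 2500, 3100, 4800]
clone_b = [2480, 3100, 5600, 7000]
print(shared_lengths(clone_a, clone_b))
```

The more lengths two fingerprints share, the less likely it is that the agreement arose by chance and the more likely that the two cloned segments overlap.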
This discussion of gel electrophoresis concludes by noting that the electric field used
to carry out the procedure is usually a constant electric field. However, in such a
field long DNA fragments (fragments longer than about 50,000 base pairs) tend to
become trapped at arbitrary locations in the gel and thus do not migrate through the
gel in a length-dependent manner. But fragments that long or longer are of interest,
and separating them by length is sometimes desirable. For example, making a NotI
restriction-site map of a human chromosome involves gel electrophoresis of restriction
fragments that are on average 1,000,000 base pairs long. (NotI is an 8-base cutter;
the estimated average length of the fragments it produces, namely 4^8 = 65,536 base
pairs, differs considerably from the observed average length because the recognition
sequence of that restriction enzyme includes several occurrences of the dinucleotide
sequence 5'-CG, which happens to be rare in mammalian genomes. NotI is one of
a group of "infrequent cutters," all of which contain at least one occurrence of the
sequence 5'-CG and produce fragments with average lengths ranging from 100,000
base pairs to 1 million base pairs.) Length separation of long fragments can be
accomplished by using an electric field that varies intermittently in direction but has
a time-averaged direction along the length of the gel. Such a "pulsed" field allows
long DNA fragments to wind their way through the molecular framework of the
gel. As shown in "Gel Electrophoresis," pulsed-field electrophoresis can separate
even the very long DNA molecules extracted intact from yeast chromosomes. (Note
that pulsed-field gel electrophoresis of long fragments requires preparation of the
DNA sample by special methods because the accidental fragmentation involved in
the method described at the beginning of this section cannot be tolerated when DNA
molecules are to be studied either intact or as the long, reproducibly cut fragments
produced by a restriction enzyme such as NotI.)
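The estimate quoted above for an 8-base cutter follows from a simple model, sketched here under the assumption that the four bases occur independently and with equal frequency. The function name is ours.

```python
def expected_fragment_length(site_length):
    """Average spacing between occurrences of a recognition sequence.

    Assumes the four bases are equally frequent and independent, so a
    given sequence of n bases occurs on average once every 4**n bases.
    """
    return 4 ** site_length

print(expected_fragment_length(6))  # a 6-base cutter such as EcoRI
print(expected_fragment_length(8))  # an 8-base cutter such as NotI
```

The observed average for NotI is far larger than this estimate because the assumption of equal, independent base frequencies fails for the rare dinucleotide 5'-CG.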
Amplifying DNA. Most of the techniques currently used to analyze a segment of
DNA require the availability of many copies of the segment. Two methods for
"amplifying" a DNA segment are now at hand: molecular cloning, which was
developed in the 1970s, and the polymerase chain reaction (PCR), which was
developed less than a decade ago.
Amplification by Molecular Cloning. Molecular cloning involves replication of a
foreign DNA segment by a host organism, usually the bacterium E. coli. However,
a segment of DNA that has entered an E. coli cell will not be replicated by the cell
unless the segment has first been combined with a cloning "vector," a DNA molecule
that the cell does replicate. The combination of the segment to be cloned, the "insert,"
and the vector is called a recombinant DNA molecule.
The phenomenon of transduction, discovered in 1952, had shown that DNA from
the genome of one strain of E. coli is sometimes incorporated into the genome of
a phage without affecting the ability of the phage to be replicated in another strain
of E. coli. In other words, the phage genome was known to act as a vector, a
DNA molecule that carries foreign DNA into a host cell, where it is then replicated.
Nevertheless, the earliest cloning vectors were plasmids, small DNA molecules found
in and replicated by bacteria. (Plasmids, like the genomes of bacteria, are circular
DNA molecules. They are, however, much smaller than bacterial genomes. Some
plasmids are replicated only when their hosts replicate and occur as single copies.
The replication of other plasmids is not coordinated with host-cell replication; such
plasmids occur as multiple copies.) The plasmid first used was one of a number that
had been studied intensively because they contain genes that confer on the bacteria
in which they reside the ability to survive in the presence of antibiotics. Today two
vectors in addition to phage genomes and plasmids are also widely used: cosmids,
which are replicated in E. coli, and yeast artificial chromosomes (YACs), which are
replicated in the single-celled eukaryotic organism Saccharomyces cerevisiae (baker's
yeast). Both cosmids and YACs are synthetic rather than naturally occurring DNA
molecules.
The first step in molecular cloning is to make the recombinant DNA molecules in
vitro. The following is a description of the procedure employed when the vector
is a plasmid that contains a single restriction site for EcoRI embedded within a
gene for resistance to ampicillin. Digestion of a population of such plasmids with
EcoRI produces "linearized" plasmids with sticky ends. Inserts with identical sticky
ends are formed by digesting the DNA to be cloned also with EcoRI. When the
linearized plasmids and the inserts are mixed together, along with an enzyme called
a DNA ligase, the sticky ends of some inserts hydrogen bond to the sticky ends
of the linearized plasmids. The backbones of such hydrogen-bonding products
are then covalently linked by the DNA ligase into recombinant DNA molecules
(here recombinant plasmids). Note that the ligation mixture also contains some
nonrecombinant plasmids because some linearized plasmids simply recircularize.
A more detailed description of the making of recombinant DNA molecules with
plasmids and other vectors is presented in the article "DNA Libraries." Here we
point out only that different vectors are used to clone inserts of different lengths.
Plasmids carry inserts that are usually about 4000 base pairs long, λ phages carry
inserts that are usually four to five times longer, and YACs carry inserts that are
usually more than one hundred times longer. (The great lengths of the inserts carried
by YACs imply that YAC cloning, like pulsed-field gel electrophoresis, requires a
special method of DNA preparation.)
The next step in molecular cloning with plasmids is to expose a population of E.
coli cells to the ligation mixture in the hope that one recombinant plasmid will
enter each of a reasonable fraction of the cells. Entry of a plasmid into an E. coli
cell is said to transform the cell, provided the plasmid is replicated by the cell.
The mechanism by which a plasmid (or a YAC) enters a host cell is not completely
understood, but several empirical methods have been found that increase the efficiency
of transformation (number of cells transformed per unit mass of recombinant DNA
molecules). In contrast, the mechanism by which a phage enters (infects) a host cell
is fairly well understood and is inherently more efficient.
After the E. coli cells have been exposed to the ligation mixture, the solution
containing the exposed cells is diluted, a small amount of the diluted solution is
transferred to each of a number of culture dishes containing a solid growth medium,
and the cells are allowed to divide. (Dilution of the exposed cells assures that only a
relatively small number of cells is transferred to each culture dish.) The aggregate, or
colony, of cells produced by successive divisions of a single cell is called a clone of
the single cell. Each member of a clone that arises from a transformed cell contains
at least one copy of the plasmid and, if the transforming plasmid was a recombinant
plasmid, at least one copy of the insert.
Because the goal of molecular cloning is not only to obtain many copies of the insert
within a recombinant DNA molecule but also to do so in as short a time as possible,
one criterion for a host cell is a short generation time. The generation times of both
E. coli and yeast are suitably short. For example, the generation time of E. coli is
about 20 minutes. Thus a single E. coli cell can, under suitable conditions, multiply
into more than a billion cells in about 10 hours.
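The arithmetic behind that figure is a matter of counting doublings, as the following sketch shows. It assumes uninterrupted division at a fixed generation time; the function name is ours.

```python
def cells_after(hours, generation_minutes=20, start=1):
    """Number of cells after the given time, assuming every cell
    divides once per generation time without interruption."""
    generations = int(hours * 60 // generation_minutes)
    return start * 2 ** generations

# 10 hours at 20 minutes per generation is 30 doublings.
print(cells_after(10))
```

Thirty doublings give 2^30 cells, slightly more than a billion.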
The final step in plasmid cloning is to identify the clones arising from cells trans-
formed by recombinant plasmids. Recall that the EcoRI restriction site of the plasmid
used in this example lies within its ampicillin-resistance gene. Assume that each host
cell itself contained a plasmid carrying a gene for resistance to ampicillin. Then only
those clones that arose from cells transformed by a recombinant plasmid possess an
inoperative ampicillin-resistance gene (because the insert interrupts the gene). Using
that fact to identify the clones of interest involves transferring a portion of each clone
from the culture dish to some other vessel in a manner that preserves the positions
of the clones. Ampicillin is then added to the other vessel, and the positions of the
clones that die are noted. The clones at the corresponding positions on the culture
dish are the clones desired. Other ingenious tricks have been devised to identify the
desired clones.
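The bookkeeping of that identification step amounts to a set difference, sketched below. The clone positions and the function name are hypothetical.

```python
def recombinant_clones(master_positions, survivors_on_ampicillin):
    """Positions on the master dish whose copies died on ampicillin.

    Those clones carry an interrupted ampicillin-resistance gene and
    hence a recombinant plasmid.
    """
    return sorted(set(master_positions) - set(survivors_on_ampicillin))

# Hypothetical grid positions of clones on the master culture dish.
master = ["A1", "A2", "B1", "B2", "C1"]
survivors = ["A1", "B2", "C1"]
print(recombinant_clones(master, survivors))
```

The clones are recovered from the master dish, not from the vessel to which ampicillin was added, since the copies there are dead.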
The sample of DNA to be cloned usually consists of many different fragments, all
from the same source. Examples are the large sets of fragments obtained by cutting,
say, the mouse genome or the human X chromosome with a restriction enzyme. Then
each recombinant DNA molecule contains a different fragment of the source DNA,
and each host cell entered by a recombinant DNA molecule gives rise to a clone
of a different fragment. A collection of such clones is called a DNA library-a
mouse-genome DNA library, say, or a human-X-chromosome DNA library. The
article "DNA Libraries" describes molecular cloning more fully and discusses the
problems it presents.
Amplification by PCR. Unlike cloning, the polymerase chain reaction is carried out
entirely in vitro and, more important, is capable of amplifying a specific one of
the many fragments that may be present in a DNA sample. The selectivity of the
reaction implies that it is also a means for detecting the presence of the fragment
being amplified. Details of the reaction are presented in "The Polymerase Chain
Reaction and Sequence-tagged Sites" in "Mapping the Genome."
Sequencing DNA. The ultimate in detailed information about a fragment of DNA
is its base sequence. The process of obtaining that information is called sequenc-
ing. Two sequencing methods were developed in 1977, both based on essentially
the same principle but each realizing the goal in a different way. Let $b_1 b_2 b_3 \ldots b_N$
be the base sequence of the fragment to be sequenced. Consider the set of subfragments
$\{b_1,\ b_1 b_2,\ b_1 b_2 b_3,\ \ldots,\ b_1 b_2 b_3 \ldots b_N\}$. Assume that such a set of subfragments
can be generated and, equally important, can be separated into four subsets: the
subset A consisting of those subfragments that end in the base A; the subset C
consisting of those subfragments that end in C; the subset G consisting of those
subfragments that end in the base G; and the subset T consisting of those subfragments
that end in the base T. Note that together the four subsets compose the set
$\{b_1,\ b_1 b_2,\ b_1 b_2 b_3,\ \ldots,\ b_1 b_2 b_3 \ldots b_N\}$. The subsets A, C, G, and T are subjected to
electrophoresis, each in a different "lane" of a gel (a different strip of gel parallel
to the direction of the applied electric field). After electrophoresis each subfragment
is located in one of the four lanes according to its length. Suppose that the shortest
subfragment, $b_1$, appears in the A lane of the gel; that the next longer subfragment,
$b_1 b_2$, appears in the T lane; that the next longer subfragment, $b_1 b_2 b_3$, appears in the
G lane; . . . ; and that the longest subfragment, $b_1 b_2 b_3 \ldots b_N$, appears in the T lane.
Then the base sequence of the fragment is ATG . . . T.
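Reading the sequence from the four lanes can be sketched as follows. The lane contents and the function name are made up; each subfragment is represented only by its length, which is all that its position in the gel reveals.

```python
def read_sequence(lanes):
    """Reconstruct a base sequence from four sequencing lanes.

    lanes maps each base to the lengths of the subfragments found in
    its lane. The subfragment of length n ends in the nth base, so
    sorting all subfragments by length spells out the sequence.
    """
    by_length = {}
    for base, lengths in lanes.items():
        for n in lengths:
            by_length[n] = base
    return "".join(by_length[n] for n in sorted(by_length))

# Hypothetical gel: subfragment lengths observed in each lane.
lanes = {"A": [1, 5], "C": [4], "G": [3, 6], "T": [2, 7]}
print(read_sequence(lanes))
```

For this made-up gel the sequence read from shortest to longest subfragment is ATGCAGT.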
Obviously the above description of the principle of the two sequencing methods
has avoided the question of how the four subsets of subfragments are generated.
The procedures for doing so are described in "DNA Sequencing" in "Mapping the
Genome."
Although sequencing is still a tedious and expensive process, the information so
obtained is crucial to identification of the DNA mutations that cause inherited
disorders and to a broad understanding of the functioning and evolution of genes
and genomes. Much effort is being devoted to increasing the speed and decreasing
the cost of current sequencing methods and to searching for new methods.
Hybridization: Detecting the Presence of Specific DNA Sequences. The two
single-stranded DNA fragments produced by denaturation of a (double-stranded)
DNA fragment will, under appropriate conditions, renature (form a double-stranded
fragment by hydrogen bonding) because the single-stranded fragments are comple-
mentary along the entirety of their lengths. (Recall that two single-stranded fragments
are complementary if and only if the 5'-to-3' base sequence of one is the complement
of the 3'-to-5' base sequence of the other.) Similarly, hydrogen bonding between
an RNA fragment and a complementary single-stranded DNA fragment will form
a double-stranded DNA-RNA fragment, a phenomenon called hybridization. (Hy-
bridization between the RNA transcript of an E. coli gene and the template strand of
the gene was the technique used in the 1960s to measure the rates of transcription
of various E. coli genes.) The term "hybridization" now also includes the hydrogen
bonding that occurs between any two single-stranded nucleic-acid fragments that are
complementary along only some portion (usually a relatively short portion) of their
lengths.
Hybridization is widely used to detect the presence of a particular DNA segment in a
sample of DNA. If the sample consists of a set of cloned DNA fragments, each cloned
fragment is denatured and then allowed to interact with a solution containing many
copies of a radioactively labeled "probe," a relatively short stretch of single-stranded
DNA whose sequence is identical to or complementary to some unique portion of the
segment of interest. Under the right conditions the probe hybridizes only to the cloned
fragment (or fragments) that contains the segment of interest, and the radioactivity
of the probe identifies the fragment to which the probe has hybridized. For example,
suppose that the sample is a complete set of cloned human DNA fragments and
the segment of interest is the interspersed tandem repeat (5'-GT)n. Examples of
a probe for that segment are the single-stranded fragments with the sequences
(5'-AC)7 and (5'-GT)7. Because the segment (5'-GT)n appears at numerous locations
in the human genome, such a probe hybridizes to numerous cloned fragments but
only to those containing the interspersed tandem repeat (or a portion thereof). If the
sample to be interrogated with a probe is instead a solution containing many different
DNA fragments, the fragments must first be separated and immobilized, usually by
gel electrophoresis. If the probe is sufficiently short, hybridization can be carried
out directly on the gel. Usually, however, the length-separated fragments are first
transferred from the gel to a nitrocellulose filter. The procedure, called Southern (or
gel-transfer) hybridization, is illustrated in "Hybridization Techniques."
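The logic of probing can be sketched as a search for complementary sequence. The sketch assumes exact complementarity along the whole probe and ignores the hybridization conditions; the cloned fragments, the choice of a probe made of seven AC repeats, and the function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the complementary strand, read 5'-to-3'."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def hybridizes(probe, top_strand):
    """True if the probe can pair exactly with either strand of a
    denatured fragment whose top strand is given.

    The probe pairs with the top strand if the top strand contains the
    probe's reverse complement, and with the bottom strand if the top
    strand contains the probe sequence itself.
    """
    return probe in top_strand or reverse_complement(probe) in top_strand

# Hypothetical cloned fragments (top strands only).
clones = {
    "clone1": "TTAGC" + "GT" * 7 + "CCA",
    "clone2": "GGATCCATTAGC",
}
probe = "AC" * 7
print([name for name, seq in clones.items() if hybridizes(probe, seq)])
```

Because a denatured fragment presents both of its strands, a probe identical to part of one strand and a probe complementary to it pick out the same fragments.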
In-situ hybridization is a variation of hybridization in which the sample to be
interrogated with a probe consists of the intact DNA molecules within metaphase
chromosomes. The metaphase chromosomes are spread out on a microscope slide
and partially denatured. The probe copies are labeled with a fluorescent molecule
and allowed to interact with the denatured chromosomes. The presence of bound
probe is detected by observing the chromosomes with a fluorescence microscope.
An example of the fluorescence signal obtained by using the technique is shown in
"Hybridization Techniques." In-situ hybridization provides information about which
chromosome contains the segment of interest and its approximate location on the
chromosome.
This section on the techniques of molecular genetics concludes with an application
that not only requires the use of almost all the techniques described but also is of
particular significance to the efforts to arrange cloned fragments of human DNA in
the same order as they appear in the intact DNA molecules of human chromosomes.
The application involves the use of long cloned fragments of human DNA to obtain
an upper limit on the length of the segment of DNA that separates the chromosomal
locations of any two short cloned fragments of human DNA (such as those provided
by plasmid, phage, or cosmid cloning). The long fragments, which are produced by
cutting human genomic DNA with an infrequent cutter, are subjected to pulsed-field
gel electrophoresis and then to Southern hybridization. Two different probes are
used separately in the hybridization; each is unique to one of the two short cloned
fragments. If both probes hybridize to the same long fragment, then both short
fragments lie within the long fragment. In other words, the chromosomal locations
of the short fragments are separated by a length of DNA no longer than the length
of the long fragment to which both probes hybridized.
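The inference can be written down in a few lines. The fragment names and lengths are hypothetical, as is the function name.

```python
def separation_upper_bound(fragment_lengths, hits_probe_1, hits_probe_2):
    """Upper limit on the distance between two short cloned fragments.

    fragment_lengths maps each long fragment to its length in base
    pairs; hits_probe_1 and hits_probe_2 list the long fragments to
    which each probe hybridized. Returns None if the probes share no
    long fragment, in which case no limit can be inferred.
    """
    shared = set(hits_probe_1) & set(hits_probe_2)
    if not shared:
        return None
    return min(fragment_lengths[name] for name in shared)

# Hypothetical long restriction fragments and hybridization results.
fragment_lengths = {"F1": 850_000, "F2": 1_200_000, "F3": 400_000}
print(separation_upper_bound(fragment_lengths, ["F2"], ["F2", "F3"]))
```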
Southern hybridization is a technique for
identifying, among a sample of many differ-
ent DNA fragments, the fragment(s) con-
taining a particular nucleotide sequence.
As depicted in (a), the sample has typically
been fragmented with a restriction enzyme.
The restriction fragments are subjected to
gel electrophoresis to separate them by
length and immobilize them. The length-
separated fragments are then transferred to
a filter paper made of nitrocellulose, a pro-
cedure called blotting. (Note that blotting
preserves the locations of the fragments.)
The filter is washed first with a solution that
denatures the fragments and then with a
solution containing many copies of a radio-
actively labeled, single-stranded "probe"
whose sequence is identical to or comple-
mentary to some unique portion of the se-
quence of interest. The probe hybridizes
(hydrogen bonds) to only the denatured
fragments containing the complement of its
sequence and hence the sequence of inter-
est. The unbound probe is washed away,
and the filter is dried and placed in contact
with x-ray film. The radioactivity of the bound
probe exposes the film and creates an image,
an autoradiogram, of the fragment(s)
to which the probe has bound. Southern
hybridization is particularly useful for de-
tecting variations among different members
of a species in the lengths of the restriction
fragments originating from a particular re-
gion of the organism's genome (see "Mod-
ern Linkage Mapping with Polymorphic DNA
Markers" in "Mapping the Genome").
The number of fragments "picked out" by a
probe depends on the number of times the
sequence of interest occurs in the sample
DNA. If the sequence occurs only once (if a
probe for, say, a single-copy gene is being
used), the probe picks out one or at most
two fragments (provided the probe is shorter
than any of the fragments in the sample).
On the other hand, if the sequence of inter-
est occurs more than once (if a probe for a
multiple-copy gene or a repeated sequence
is being used), the probe picks out a larger
number of fragments. Furthermore, the hybridization
conditions (temperature and salinity
of the probe solution) can be adjusted
so that either exact complementarity or a
lesser degree of complementarity is required
for binding of the probe.
HYBRIDIZATION TECHNIQUES
(a) Southern Hybridization
[Figure: flow diagram. DNA sample; fragmentation with restriction enzyme; restriction fragments; gel electrophoresis; gel containing length-separated restriction fragments; transfer of fragments from gel to nitrocellulose filter; filter with fragments positioned as they were in the gel; hybridization with radioactively labeled probe; filter with probe bound to complementary fragment; autoradiography; film showing image of hybridized fragment.]
In-situ hybridization is a variation of hybrid-
ization in which the sample consists of the
complement of chromosomes within a cell
arrested at metaphase. The metaphase
chromosomes are spread out and partially
denatured on a microscope slide, the probe
is labeled with a fluorescent dye, and the
bound probe is imaged with a fluorescence
microscope. Shown in (b) is the fluores-
cence signal resulting from in-situ hybrid-
ization of a probe for the human telomere to
human metaphase chromosomes. (A te-
lomere is a special sequence at each end of
a eukaryotic DNA molecule that protects
the molecule from enzymatic degradation
and prevents shortening of the molecule as
it is replicated. The sequence of the human
telomere was discovered by Robert K.
Moyzis and his colleagues, who also pro-
vided evidence that all vertebrates share
the same telomeric sequence.) Note that, as
expected, the probe has bound only to the
terminal regions of each chromosome. (Mi-
crograph courtesy of Julie Meyne.)
(b) Results of In-Situ Hybridization of Human-Telomere Probe to Human Chromosomes
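[Figure: a prototypic eukaryotic protein gene. Labeled features: upstream enhancer, promoter (TATA box), exons, stop site, poly A site, and downstream enhancer; the gene is divided into an upstream region, a transcription region, and a downstream region.]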
Each eukaryotic gene is placed in one of
three classes according to which of the
three eukaryotic RNA polymerases is in-
volved in its transcription. The genes for
RNAs are transcribed by RNA polymerases
I and III. The genes for proteins, the class
first brought to mind by the word "gene" and
the class focused on here, are transcribed
by RNA polymerase II (pol II).
Shown above are the components of a
prototypic protein gene. By convention the
sense strand of the gene, the strand with the
sequence of DNA bases corresponding to
the sequence of RNA bases in the primary
RNA transcript, is depicted with its 5'-to-3'
direction coincident with the left-to-right di-
rection. (Often only the sense strand of a
gene is displayed.) The left-to-right direc-
tion thus coincides with the direction in
which the template strand is transcribed.
The terms "upstream" and "downstream"
describe the location of one feature of a
gene relative to that of another. Their mean-
ings in that context are based on regarding
transcription as a directional process analo-
gous to the flow of water in a stream.
The start site is the location of the first
deoxyribonucleotide in the template strand
that happens to be transcribed. It defines
the beginning of the transcription region of
the gene. Note that the start site lies up-
stream of the DNA codon (ATG) corre-
sponding to the RNA codon (AUG) that
signals the start of translation of the tran-
scribed RNA. The transcription region ends
at some nonspecific deoxyribonucleotide
between 500 and 2000 base pairs downstream
of the poly A site. Within the poly A
site are sequences that, when transcribed,
signal the location at which the primary RNA
transcript is cleaved and equipped with a
"tail" composed of a succession of ribo-
nucleotides containing the base A. (The
poly A tail is thought to aid the transport of
messenger RNA from the nucleus of a cell
to the cytoplasm.) Note that the poly A site
lies downstream of the DNA codon (here
TAA) corresponding to one of the RNA
codons (UAA) that signals the end of trans-
lation of the transcribed RNA.
Within the transcription region are exons
and introns. Exons tend to be about 300
base pairs long; each is a succession of
codons uninterrupted by stop codons. Introns,
on the other hand, are not uninterrupted
successions of codons, and the RNA seg-
ments transcribed from introns are spliced
out of the primary RNA transcript before
translation. A few protein genes contain no
introns (the human α-interferon gene is an
example), most contain at least one, and
some contain a large number (the human
thyroglobulin gene contains about forty).
Generally the amount of DNA composing
the introns of a protein gene is far greater
than the amount composing its exons.
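The removal of introns from a primary RNA transcript can be sketched as follows. The transcript is made up and far shorter than any real one, and the exon coordinates are simply supplied to the function; the sketch does not model how the cell recognizes exon boundaries.

```python
def splice(primary_transcript, exons):
    """Join the exon segments of a primary transcript.

    exons is a list of (start, end) positions, counted from zero with
    the end excluded, given in the order in which the exons occur.
    """
    return "".join(primary_transcript[start:end] for start, end in exons)

# Hypothetical transcript: two short exons separated by one intron.
exon_1 = "AUGGCC"
intron = "GUAAGUUUUCAG"
exon_2 = "UUUUAA"
primary = exon_1 + intron + exon_2
exons = [(0, len(exon_1)), (len(exon_1) + len(intron), len(primary))]

print(splice(primary, exons))
```

In this toy case the spliced RNA begins with the start codon AUG and ends with the stop codon UAA, with no stop codon interrupting the codons in between.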
Close upstream of the start site is a pro-
moter sequence, where pol II binds and
initiates transcription. A common promoter
sequence in eukaryotic genes is the so-
called TATA box, which has the consensus
sequence 5'-TATAAA and is located at a
variable short distance (about 30 base pairs)
upstream of the start site.
The region upstream of the promoter and,
less frequently, the downstream region or
the transcription region itself contain se-
quences that control the rate of initiation of
transcription. Although expression of a pro-
tein gene is regulated at a number of stages
in the pathway from gene to protein, control
of transcription initiation is the dominant regulatory
mechanism. (Primary among the other
regulatory mechanisms is control of splic-
ing.) The regulated expression of a gene
(the when, where, and degree of expres-
sion) is the key to phenotypic differences
between the various cells of a multicellular
organism and also between organisms that
possess similar genotypes.
Initiation of transcription is controlled mainly
by DNA sequences (cis elements) and by
certain proteins, many but not all of which
are sequence-specific DNA-binding proteins
(trans-acting transcription factors). Thus
both temporal and cellular specificities of
transcription control are governed by the
availability of the different trans-acting tran-
scription factors. Interactions of transcrip-
tion factors with cis elements and with each
other lead to formation of complex protein
assemblies that control the ability of pol II to
initiate transcription. Most of the complexes
enhance transcription initiation, but some
act as repressors. Enhancers and repres-
sors can be located as far as 10,000 base
pairs away from the transcription region.
Class I and class III genes differ from protein
genes not only in their anatomies but also in
the promoters, cis elements, and trans-
acting factors involved in their transcription.
Genes and Genomes: What the Future Holds
The techniques described in the preceding section, and others not mentioned, have
greatly increased our knowledge of the molecular anatomies of genes. Previously,
a gene for a protein was defined narrowly as a segment of DNA that is transcribed
into a messenger RNA, which in turn is translated into the protein. The definition
considered more appropriate today includes not only the protein-coding segment of
the gene (its transcription region) but also its sometimes far-flung regulatory regions
(see "The Anatomy of a Eukaryotic Protein Gene"). The regulatory regions contain
DNA sequences that help determine whether and at what rate the gene is expressed
(or, equivalently, the protein is synthesized). Some of the genes of a multicellular
organism, its "housekeeping" genes, are expressed at more or less the same level
in essentially all of its cells, regardless of type. Others are expressed only in
certain types of cells or only at certain times. Gene regulation is, in fact, the
key not only to appropriate functioning of the organism but also to its development
from a single cell. In addition, gene regulation may be responsible for the
striking phenotypic differences between higher apes and humans despite the negligible
differences between the structures of their proteins. "The Anatomy of a Eukaryotic
Protein Gene" also presents a few details about the mechanisms of gene regulation.
Despite the accumulating knowledge, it is safe to say that what is known about
genes, particularly human genes, is far less than what remains to be learned. The
total number of human genes can now be only crudely estimated, remarkably few
have been localized to particular regions of particular chromosomes, and even fewer
have been sequenced or studied in sufficient detail to understand their regulation.
Other outstanding questions include the mechanisms by which the expression of
genes is coordinated and the effects of gene mutations on morphology, physiology,
and pathology.
The techniques of molecular genetics are also providing information about genomes
as a whole, opening the way to comparative studies of genome anatomy, organization,
and evolution. For example, the available evidence indicates remarkable similarities
between the mouse genome and the human genome, despite the 60 million years
that have elapsed since rodents and primates diverged from a common ancestor. The
similarities lie not only in the base sequences of genes but also in their linkages.
Perhaps the conserved linked genes represent units of some higher, as yet unknown
operational feature. The same may be true also of repetitive DNA, about which we
now know so little. In time, when those and other genomes have been sequenced
in their entireties, the observed similarities and differences will be a rich source of
answers and new questions about the operation and evolution of genomes.
Number 20 1992 Los Alamos Science
Further Reading
James A. Peters, editor. 1964. Classic Papers in Genetics. Englewood Cliffs, New Jersey: Prentice-Hall,
Inc.
J. Herbert Taylor, editor. 1965. Selected Papers on Molecular Genetics. New York: Academic Press.
John Cairns, Gunther S. Stent, and James D. Watson, editors. 1966. Phage and the Origins of Molecular
Biology. Cold Spring Harbor, New York: Cold Spring Harbor Laboratory of Quantitative Biology.
John C. Kendrew. 1968. The Thread of Life: An Introduction to Molecular Biology. Cambridge,
Massachusetts: Harvard University Press.
Rene J. Dubos. 1976. The Professor, the Institute, and DNA. New York: The Rockefeller University Press.
Franklin H. Portugal and Jack S. Cohen. 1977. A Century of DNA: A History of the Discovery of the
Structure and Function of the Genetic Substance. Cambridge, Massachusetts: The MIT Press.
Horace Freeland Judson. 1979. The Eighth Day of Creation. New York: Simon and Schuster.
James D. Watson. 1980. The Double Helix: A Personal Account of the Discovery of the Structure of DNA.
New York: W. W. Norton and Co.
James D. Watson and John Tooze. 1981. The DNA Story: A Documentary History of Gene Cloning. San
Francisco: W. H. Freeman and Company.
James D. Watson, Nancy H. Hopkins, Jeffrey W. Roberts, Joan Argetsinger Steitz, and Alan M. Weiner.
1987. Molecular Biology of the Gene. Menlo Park, California: The Benjamin/Cummings Publishing
Company, Inc.
David A. Micklos and Greg A. Freyer. 1990. DNA Science: A First Course in Recombinant DNA
Technology. New York: Cold Spring Harbor Laboratory Press.
James Darnell, Harvey Lodish, and David Baltimore. 1990. Molecular Cell Biology, second edition. New
York: W. H. Freeman and Company.
Maxine Singer and Paul Berg. 1991. Genes & Genomes: A Changing Perspective. Mill Valley, California:
University Science Books.
Robert P. Wagner is a consultant to the
Laboratory's Life Sciences Division and
Professor Emeritus of Zoology at the Uni-
versity of Texas, Austin, the institution
from which he received his Ph.D. His
work at the Laboratory focuses on the ac-
tivities of the Center for Human Genome
Studies. He has taught undergraduate and
graduate genetics for over thirty-five years
and has authored or co-authored six books
and many research and review articles on
various aspects of genetics. His numerous
honors and awards include fellowships
from the National Research Council and
the Guggenheim Foundation and election
as a fellow of the American Association for
the Advancement of Science and as presi-
dent of the Genetics Society of America.
To create a stereoscopic image of DNA from the two images on this page, focus
your eyes on a distant object above the page and then move the images up into
your line of sight, holding the page 12 to 18 inches away and being careful to
keep your eyes focused at infinity. If your eyes have not shifted, you should be
aware of three images. Concentrate on the middle one, which is the desired
stereoscopic image. You may have to practice a few times and should be sure
the page and your head are vertical.
