'Like begets like': what is now called a species begets offspring of the same species. Recognition of the inheritance of variations within a species must have come early. Domestication of food plants probably began between 8000 and 9000 years ago.
AN INTRODUCTION TO CLASSICAL AND MOLECULAR GENETICS
Understanding Inheritance

Robert P. Wagner

... years, humankind has made great progress unraveling the mysteries of inheritance. ... baker's yeast, the rod-shaped ..., Arabidopsis thaliana, and the tiny, machine-like viruses that infect .... Each of these organisms has contributed to our understanding of molecular interactions ...

That like begets like (that what is now called a species begets offspring of the same species) must have been evident to the earliest humans. Recognition of the inheritance of variations within a species must also have come early, since domestication of animals undoubtedly involved elimination of individuals with undesirable characteristics (a penchant for human flesh, for example). The first animals to be domesticated may well have been members of the dog family, which were used as food, and domestication of canines may have started even before the advent of Homo sapiens. The remains of an old hominid relative of ours, Homo erectus (also known as Java or Peking man), have been found associated with those of a dog-like animal in 500,000-year-old fossils. The earliest canine remains associated with our own species are a mere 12,000 years old. The domestication of food plants probably began between 8000 and 9000 years ago, although some authorities contend that the domestication of cereals preceded that of most animals.

Humans must also have very early related mating between "male" and "female" animals, including humans, with the subsequent issuance of offspring. Sexual reproduction in plants was probably recognized much later (many plants, after all, are discreetly bisexual) but at least 4000 years ago, as evidenced by the Babylonians' selective breeding, through controlled pollination, of the date palm (Phoenix dactylifera), which occurs as separate male and female trees. (The dates borne by a female tree result from fertilization of its eggs by sperm-containing pollen from male trees.)
The oldest recorded thoughts about heredity appear in the religious writings of the ancient Hindus and Jews, which reveal recognition of the heritability of disease, health, and mental and physical characteristics. The caste system of the Hindus, the hereditary priesthood among the Jews of the tribe of Levi, and later, in Homer's time, the inheritance of the gift of prophecy are a few reflections of ancient thinking about the link between successive generations of humans. Some of those ideas, which of necessity were based primarily on philosophical outlook rather than scientific fact, are discussed briefly in "Early Ideas about Heredity."

The Dawn

The first significant advances toward our current understanding of inheritance came in the late Renaissance with the work of the English physician William Harvey (1578-1657) and the invention of the microscope (circa 1600). Harvey is best known for his discovery of the dynamics of the circulation of the blood, but he also propounded a new view about the relative importance of the contributions of male and female animals to the creation of offspring. Previously, the female contribution, the egg, had been regarded as mere matter, matter that assumes a form dictated entirely by the male's semen. But Harvey proposed that both egg and semen guide the development of an offspring. His observation of the eggs of many species led him to conclude (in De generatione animalium, 1651) that "ex ovo omnia." That everything arises from an egg was meant to apply to humans also, even though Harvey had never seen the eggs of humans or any other live-bearing creature.

Los Alamos Science Number 20 1992

EARLY IDEAS ABOUT HEREDITY

Ancient beliefs about heredity included the idea that inborn characteristics are inherited from parents, as well as the idea that they could be affected by external influences on the parents at conception or during pregnancy. The biblical story of Jacob's wages (Genesis, chapter 30) combines both.
Jacob had agreed to tend the flock of his uncle and father-in-law, Laban, if he could take when he left all the unusually colored animals: the sheep with dark wool and the goats with white streaks or speckles. But Laban, a deceitful and greedy man, took his few such animals three days' journey away. The remaining stock he assumed would not produce offspring of the colorations Jacob had named. However, Jacob peeled tree branches to make them striped and spotted and stood them in the watering troughs when the stronger goats were mating nearby. The kids from those matings, unlike their parents, had the markings that made them his, and they were more vigorous than the offspring of the weaker goats. He herded the sheep so they faced Laban's dark-colored goats; they then bore dark-colored lambs. Today the appearance in offspring of characteristics different from those of either parent can be attributed to the combined effects of the genetic contributions of each parent (see "Mendelian Genetics").

The ancient Greeks gave considerable attention to human inheritance in their writings. Plato, for example, made cogent statements about human traits being determined by both parents. He emphasized that people are not completely equal in physical and mental characteristics and that each person inherits a nature suited to fulfilling only certain societal functions.

Also prominent in the thinking of the early Greeks was the inheritance of acquired characteristics. Aristotle, for example, wrote that children are born resembling their parents in their whole body and their individual parts. Moreover this resemblance is true not only of inherited but also of acquired characters. For it has happened that the children of parents who bore scars are also scarred in just the same way in just the same place.
In Chalcedon, for example, a man who had been branded on the arm had a child who showed the same brand letter, though it was not so distinctly marked and had become blurred.

The idea that external influences play a role in heredity persisted even until the early part of the twentieth century. We now know that the idea contains some truth. For example, ionizing radiation, many chemicals, and infection by some viruses can cause heritable changes, or mutations, but generally those changes are entirely random and cannot be directed toward specific outcomes.

One of the more remarkable theories about human inheritance, pangenesis, was developed in about the fifth century B.C. and espoused by Hippocrates and his followers. According to that theory, semen was formed in every part of the male body and traveled through the blood vessels to the testicles, which were merely repositories. Variations of the theory lasted well into the nineteenth century A.D. and were even accepted by Charles Darwin. Pangenesis was for some reason dominant in the thinking of the philosophers and theologians of the Middle Ages. Albertus Magnus (1193-1280), his pupil Thomas Aquinas (1225-1274), and the naturalist Roger Bacon (circa 1220-1294) all accepted pangenesis as a fact. One variant of the theory was the idea that both male and female produced semen. According to Paracelsus (1493-1541), semen was an extract of the human body containing all the human organs in an ideal form and was thus a physical link between successive generations.

Also prevalent during the Middle Ages was the concept of entelechy, the Aristotelian idea that the way an individual develops is determined by a vital, inner force. The determining force is provided by the male and transmitted in his semen. The female provides no semen but only, so to speak, raw material. Aristotle compared the roles of male and female in the creation of an offspring with the roles of sculptor and stone in the creation of a sculpture. Other forms of vitalism continued to be popular even up to the beginning of the twentieth century, primarily because people lacked knowledge about the nature of the physical connection between generations of animals and plants.

With his naked eye Harvey could see no form in a newly laid, fertilized chicken egg. But he assumed the form that did appear later arose epigenetically from matter that has some sort of inherent, though invisible, organization. The theory of epigenesis (that an organism arises from structural elaboration of formless matter rather than by enlargement of a preformed entity) dates back to Aristotle, but Harvey differed from Aristotle in seriously doubting that the living can arise from the nonliving. Experimental justification for his doubt came about a century later.

Thoughts about heredity would probably not have advanced beyond Harvey's had it not been for the compound microscope, an invention credited sometimes to Zaccharias Janssen and sometimes to Galileo. Other Renaissance men noted for their discoveries with the microscope and improvements to its design are regarded as the founders of microscopy: Nehemiah Grew (1641-1712), Robert Hooke (1635-1703), Antoni van Leeuwenhoek (1632-1723), Marcello Malpighi (1628-1694), and Jan Swammerdam (1637-1680). Their observations (among which were sperms in semen and structural elements, dubbed cells by Hooke, in plant and animal tissues) formed the foundations of the science now called cell biology.

Users of the early, low-resolution microscopes could (and did) let their imaginations run wild.
Some thought they saw miniature humans, homunculi, preformed in human sperms; others saw tiny animals, animalcula, preformed in animal eggs. Those apparitions led to resurrection of the theory of preformation originally propounded by Democritus and other Greeks. In the eighteenth century the preformation theory developed into the encapsulation theory, which stated that, at the time of creation, all future generations were packaged, one inside the other, within the primordial egg or sperm. Logically, all life would come to an end when the last homunculus or animalculum was born. The encapsulation theory died because it was ridiculous, although many eminent biologists were its fierce advocates up to the beginning of the nineteenth century.

The higher-resolution microscopes of the latter half of the eighteenth century allowed Caspar Friedrich Wolff (1734-1794) to observe the development of chicken embryos. His work clearly showed that the components of a new organism are not preformed but, as stated two millennia before by Aristotle and a century before by Harvey, arise from the undifferentiated matter of the fertilized egg.

The Great Awakening

Modern biology may be said to have been born in the nineteenth century, several hundred years after the beginnings of modern chemistry and physics. Earlier biologists were either physicians or naturalists (what we now call botanists and zoologists), and their work focused on structure, physiology, and classification. But the nineteenth century brought several developments that were basic to emergence of the newer branches of biology, including cell biology and genetics.

The Rise of Cell Biology. During the first half of the nineteenth century, evidence accumulated for the so-called cell theory, which states that the cell is the structural and functional unit of all organisms.
The diversity of cell shapes and sizes was noted (see "The Variety of Cells"), and various intracellular structures were observed (see "Components of Eukaryotic Cells"). Of particular importance to genetics is the membrane-bound intracellular structure called the nucleus, which was found to be a common feature of the cells of all organisms more complex than bacteria and blue-green algae. Organisms possessing a nucleus were classified as eukaryotes, and organisms lacking a nucleus were classified as prokaryotes. Later, during the early 1850s, came the momentous finding, embraced in the aphorism omnis cellula e cellula, that cells divide to form new cells. A leading proponent of the idea that all cells come from cells was the German physician Rudolph Virchow (1821-1902). A cancer specialist, among other things, Virchow asserted that cancer cells arise from cells pre-existing in the body and do not, as earlier physicians had thought, arise by spontaneous generation from unorganized matter. Another development was the realization that gametes (sperms and eggs) are also cells, in particular cells specialized for transmitting information from one generation of a sexually reproducing organism to the next. The remarkable difference in size between sperms and eggs was found to be due to cell components other than their nuclei, and that observation, coupled with the belief that sperms and eggs contain the same amount of hereditary information, indicated that hereditary information resides in the nuclei of gametes. The nucleus was found to be the site also of the information transmitted from one cellular generation to the next. The above developments led to formulation of the law of genetic continuity, which succinctly summarizes what was probably the most important advance toward the understanding of living systems up to that time: Life comes only from life through the medium of cells. 
By the late 1880s hereditary information had been localized further to intranuclear elements that can be seen with the microscope during the mitotic phase of the cell cycle, the phase that culminates in cell division (see "The Eukaryotic Cell Cycle"). The elements, which were named chromosomes because they can be stained (selectively colored) with certain dyes, are most easily observed during the portion of the mitotic phase called metaphase. (We now know that each "metaphase chromosome" consists of two duplicates of a single chromosome bound together along a more or less central region.)

Facts accumulated about chromosomes (see "Chromosomes: The Sites of Hereditary Information"). All the somatic cells (cells other than gametes) of a sexually reproducing organism have the same even number of chromosomes, the so-called diploid number, whereas all its gametes have the same so-called haploid number of chromosomes, which is exactly one-half the diploid number. Furthermore, the diploid and

THE VARIETY OF CELLS

Cells vary in shape from the most simple to the indescribably complex. Shown here are electron micrographs of a few examples from nature's cornucopia.

Escherichia coli, the most studied of all bacteria. From Molecular Biology of the Cell, second edition, by Bruce Alberts et al. Copyright 1989 by Garland Publishing, Inc. Reprinted with permission. Courtesy of Tony Brain and the Science Photo Library.

Mouse fibroblast during the final stage of cell division. From Molecular Biology of the Cell, second edition, by Bruce Alberts et al. Copyright 1989 by Garland Publishing, Inc. Reprinted with permission. Courtesy of Guenter Albrecht-Buehler.
COMPONENTS OF EUKARYOTIC CELLS

The organelles of a eukaryotic cell include mitochondria (the sites of energy production by oxidation of nutrients), a Golgi apparatus (where various macromolecules are modified, sorted, and packaged for secretion from the cell or for distribution to other organelles), an endoplasmic reticulum (the principal site of protein synthesis), and a nucleus (the residence of chromosomes and the site of DNA replication and transcription). The nucleolus is the site of ribosomal-RNA synthesis. The organelles unique to plant cells are chloroplasts (the sites of photosynthesis in green plants) and vacuoles (water-filled compartments that serve as space fillers and as storage vessels). Plant cells differ from animal cells also in being surrounded by a cellulose cell wall, a much more rigid form of the extracellular matrix that surrounds animal cells.

[Figure labels: Golgi apparatus, endoplasmic reticulum, vacuole.] Figure adapted (with permission) from an illustration in Genes and Genomes by Maxine Singer and Paul Berg (University Science Books, 1991).

THE EUKARYOTIC CELL CYCLE

[Figure: interphase and the mitotic phase laid out along a time axis spanning one generation time.]

The term "cell cycle" refers collectively to the events that occur within a eukaryotic cell between its birth by mitosis and its division, again by mitosis, into two daughter cells. The cell may be either a one-celled organism such as baker's yeast (Saccharomyces cerevisiae) or a somatic cell of a multicellular organism. Early studies of the eukaryotic cell cycle concentrated on the microscopically visible and dramatic physical events of the cell-division, or mitotic, phase (M). Onset of the mitotic phase is signaled by the appearance of microscopically visible wormlike bodies within the nucleus, that is, by the condensation of duplicated chromosomes into a much less diffuse configuration.
The mitotic phase ends when the cell separates into two daughter cells, each of which then embarks on its own cycle. (Details of the mitotic phase are presented in "Mitosis.")

Because the early microscopic studies revealed little physical activity during the portion of the cell cycle that precedes the mitotic phase (other than a relatively small increase in cell size), that portion was inappropriately named the resting phase, or interphase. We now know that most of the biosynthetic activity required of a cell, both for its own maintenance and reproduction and for its function or functions as a constituent of a multicellular organism, occurs during interphase.

Most of the biochemicals produced by a cell are synthesized throughout interphase. DNA is a notable and easily detected exception, and for that reason interphase is subdivided into three periods: the period between cell birth and the onset of DNA synthesis (G1); the period of DNA synthesis (S), which ends when all the nuclear DNA has been replicated and hence the number of chromosomes has doubled; and the period between the end of DNA synthesis and the beginning of the mitotic phase (G2). After a cell has entered S, it is committed to completing the cell cycle, even when environmental conditions are extremely adverse.

The length of the cell cycle, the generation time, varies with environmental conditions and among species and cell types. For example, epithelial cells, the cells that line the interior and exterior surfaces of the human body, have relatively short generation times (about eight hours); fibroblasts, cells that assist in healing wounds, complete their cell cycle only on demand; mature red blood cells never undergo mitosis; and embryonic cells divide very rapidly. Observed generation times for those cells that do have a regular cycle range from about a few minutes to a few months.
The variation in generation time is due mainly to a variation in the length of G1 and of G2. The mitotic phase of most species and most cell types occupies only about 10 percent of the generation time.

The cell cycle of bacteria, in addition to being shorter (typically less than an hour), is also less complex. In particular, DNA is synthesized continuously, the two copies of the single bacterial chromosome do not undergo extensive condensation before cell division, and a mechanism simpler than the one illustrated in "Mitosis" assures parceling out of one chromosome copy to each daughter cell.

CHROMOSOMES: THE SITES OF HEREDITARY INFORMATION

Within the nucleus of each cell of a eukaryotic organism are a number of chromosomes, each composed of a single molecule of DNA (see "DNA: Its Structure and Components") and a roughly equal mass of proteins (primarily the proteins called histones). The DNA molecule carries hereditary information; the proteins help effect the ordered condensation, or compaction, of the very long, very thin DNA molecule.

During most of a cell's life, its chromosomes are too decondensed to be visible with an optical microscope. However, during metaphase, a phase preparatory to cell division (see "Mitosis" and "Meiosis"), the chromosomes become highly condensed and hence easily visible. Most studies of chromosomes are therefore carried out on chromosomes extracted from cells arrested at metaphase. Each such "metaphase chromosome" consists in reality of two duplicates of a single chromosome bound together along a somewhat constricted region called a centromere. The three micrographs of metaphase chromosomes shown here illustrate some general facts about chromosomes.

Shown above are the metaphase chromosomes extracted from a root-tip cell of maize (Zea mays). The chromosomes were stained with a fluorescent dye and photographed through an optical microscope while being illuminated by a laser that excites the dye's fluorescence.
(The chromosomes could have been stained instead with a nonfluorescent dye.) A total of twenty metaphase chromosomes is visible in the micrograph, and any somatic cell (any cell other than an egg or a sperm) of any Zea mays plant possesses that same number of metaphase chromosomes. In general, all the somatic cells of all the members of a species possess the same even number of metaphase chromosomes, called the diploid chromosome number. The diploid chromosome number varies erratically from species to species: the known values range from 2 to many hundreds. (Note that the diploid chromosome number is not a measure of a species' evolutionary status.)

The twenty metaphase chromosomes of Zea mays obviously exhibit different morphologies, that is, different sizes and centromere positions. However, even the untrained observer might notice that the two highlighted metaphase chromosomes look very much alike. In fact, the twenty metaphase chromosomes of Zea mays can be grouped into ten homologous, or morphologically indistinguishable, pairs. The metaphase chromosomes of all eukaryotic species occur as homologous pairs, and that general fact is due to the occurrence of chromosomes themselves as homologous pairs. Furthermore, the homology of a pair of chromosomes is due to a high degree of similarity between the base sequences of their constituent DNA molecules. (Micrograph courtesy of Paul Jackson and Jerome Conia. Magnification about ×550.)

Shown at right are the metaphase chromosomes extracted from a somatic cell of a house mouse (Mus musculus). To help identify homologous pairs, the chromosomes were stained with a dye called Giemsa that produces a pattern of dark and light bands, a pattern that varies from one homologous pair to another. The chromosome images have been grouped in homologous pairs and arranged in order of decreasing size. Such a display of metaphase chromosomes is called a karyotype.
The last entry in the karyotype is the pair of chromosomes that are involved in determining sex. Because this particular mouse cell possesses two homologous sex chromosomes, it is a cell from a female mouse. Cells of a male mouse possess two nonhomologous sex chromosomes, one X chromosome and a smaller Y chromosome.

Shown at left is the karyotype of a human prepared from the Giemsa-stained metaphase chromosomes of a lymphocyte. Note the twenty-two homologous pairs of autosomes (chromosomes other than sex chromosomes) and the two nonhomologous sex chromosomes. The nonhomology of the sex chromosomes indicates that this is the karyotype of a male human, namely of the well-known cytogeneticist T. C. Hsu of the University of Texas System Cancer Center. (Both of the karyotypes on this page were provided by T. C. Hsu. Their magnifications are about ×650 and about ×750.)

haploid chromosome numbers are constant among different members of the same species but vary among different species. For example, all somatic cells of all members of the species Homo sapiens contain forty-six chromosomes, all somatic cells of all members of the species Drosophila melanogaster (a fruit fly) contain eight chromosomes, all somatic cells of all members of the species Pisum sativum (the garden pea) contain fourteen chromosomes, and all somatic cells of all members of the species Mus musculus (the house mouse) contain forty chromosomes. And all the gametes of all members of each of the above species contain twenty-three, four, seven, and twenty chromosomes, respectively. Second, the metaphase chromosomes within a single cell vary morphologically (in size and shape), but the variations remain constant among all cells of all members of a single species. (We now know that exceptions to the above generalizations occur and that the exceptions are often causes or symptoms of disease.)
The morphological differences among the metaphase chromosomes of a species led to recognition that metaphase chromosomes occur as morphologically indistinguishable (homologous) pairs. Although the members of a pair of homologous metaphase chromosomes are indistinguishable by any low-resolution physical technique, they do differ, as we now know, in fine details of the nucleotide sequences of their constituent DNA molecules. The occurrence of metaphase chromosomes as morphologically indistinguishable pairs is due to the occurrence of chromosomes themselves as homologous pairs, pairs whose constituent DNA molecules have nearly identical nucleotide sequences.

An exception to the occurrence of chromosomes as homologous pairs should be noted. Males of some species, including all mammals and Drosophila melanogaster, possess two chromosomes, called the X and Y chromosomes, that do not form a homologous pair, the Y chromosome generally being much smaller than the X chromosome. Females of such species possess two X chromosomes, each of which is homologous to the other and to the X chromosome of the male. Collectively, the X and Y chromosomes are called sex chromosomes; the remaining chromosomes are called autosomes. In the case of humans and other placental mammals, the presence of a Y chromosome is necessary for maleness (the presence of testes), but in the case of other species, including D. melanogaster, the presence of a Y chromosome, although necessary for fertility, is not necessary for maleness.

Also observed during the late nineteenth century were microscopic details of cell division and the effect of cell division on chromosomes. Mitosis, the type of cell division undergone by all somatic cells other than the immediate precursors of gametes, was found to yield two daughter somatic cells with the same diploid number of chromosomes as the mother cell (see "Mitosis").
Furthermore, the German zoologist Theodor Heinrich Boveri (1862-1915) found that the metaphase chromosomes of a mother cell and a daughter cell had the same morphologies. Those observations indicated that each chromosome in the mother cell is somehow duplicated before the cell undergoes mitosis.

Meiosis, the type of cell division undergone by the precursors of gametes, was found to be a much more complex process than mitosis. It involves two successive cell divisions and can yield four gametes, each containing one-half as many chromosomes as the precursor cell. (Thus meiosis also must be preceded by chromosome duplication.) Furthermore, the haploid set of chromosomes in each gamete is not a haphazard selection from the diploid set of the mother cell. Instead each gamete is endowed with a randomly selected member of each pair of homologous chromosomes in the mother cell (see "Meiosis"). That is, the probability of a gamete's being endowed with one member of a pair of homologous chromosomes is the same as the probability of its being endowed with the other member, and, equally important, the outcome of its endowment with a member of one pair of homologous chromosomes has no effect on the outcome of its endowment with a member of another pair. In other (and more arcane) words, meiosis equally segregates each pair of homologous chromosomes and independently assorts the complete set of homologous chromosomes.

The X chromosome and the Y chromosome of a male also were found to segregate equally during meiosis, even though they are not homologous in the sense of being physically indistinguishable. That fact implies that a male produces two equally probable sperm types, one containing a Y chromosome and the other an X chromosome. Thus fertilization of an egg by a sperm results in two equally probable combinations of sex chromosomes, XY and XX.
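The two rules just stated, equal segregation and independent assortment, can be checked by simple enumeration. The Python sketch below is illustrative only: the function names (`gamete_types`, `gamete_probability`, `sex_chromosome_outcomes`) and the chromosome labels such as "1" and "1'" are assumptions introduced here for the example. It models each homologous pair as an independent, equally likely two-way choice, which is all the two rules assert, and it ignores every other detail of meiosis.

```python
from fractions import Fraction
from itertools import product


def gamete_types(n_pairs):
    """List every gamete type for an organism with n_pairs homologous pairs.

    Pair k is labelled (k, k'), for example ("1", "1'"). Independent
    assortment: a gamete receives one member of each pair, and the choice
    made for one pair does not affect the choice made for another.
    """
    pairs = [(str(k), f"{k}'") for k in range(1, n_pairs + 1)]
    return list(product(*pairs))


def gamete_probability(n_pairs):
    """Probability of any one gamete type.

    Equal segregation: each member of a pair is chosen with probability 1/2,
    so each of the 2**n_pairs gamete types has probability (1/2)**n_pairs.
    """
    return Fraction(1, 2) ** n_pairs


def sex_chromosome_outcomes():
    """Combine an X-bearing egg with the two equally probable sperm types."""
    sperm = {"X": Fraction(1, 2), "Y": Fraction(1, 2)}
    outcomes = {}
    for chromosome, probability in sperm.items():
        # Sort the two letters so the result reads "XX" or "XY".
        outcomes["".join(sorted("X" + chromosome))] = probability
    return outcomes


if __name__ == "__main__":
    for gamete in gamete_types(2):
        print(gamete, gamete_probability(2))
    print(sex_chromosome_outcomes())
```

For two pairs of homologous chromosomes the enumeration gives four gamete types, each with probability 1/4, and the sex-chromosome function gives XX and XY with probability 1/2 each. Exact fractions are used instead of floating-point numbers so that the probabilities can be compared without rounding.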
The equal segregation and independent assortment of chromosomes during meiosis lead to diversity among the chromosome sets of the offspring of sexually reproducing organisms. Consider, for example, an organism that possesses but two pairs of homologous chromosomes denoted by 1 and 1' and 2 and 2'. Such an organism produces, with equal probability, four types of gametes, those containing 1 and 2, 1 and 2', 1' and 2, and 1' and 2'. If the organism is self-fertilizing (as are many plants and lower animals), then of the sixteen possible types of offspring, only four possess a set of chromosomes identical to the parental set. In contrast, bacteria reproduce asexually by a type of cell division that, like mitosis, yields only genetic replicas of the mother cell. (Bacteria are not, however, genetically immutable, since various mechanisms can effect changes in their genetic material, which are then transmitted to their offspring.)

In general, if a sexually reproducing organism has N pairs of homologous chromosomes, it can produce $2^N$ types of gametes, and if it is self-fertilizing, only $2^N$ of the $4^N$ possible types of offspring possess a set of chromosomes identical to the parental set. In other words, the probability of an offspring's possessing a set of chromosomes identical to the parental set is $1/2^N$. When N equals twenty-three, that probability equals 1/8,388,608, a very small number. The probability of human parents producing an offspring with a set of chromosomes identical to that of either parent is even closer to zero, since although humans do possess twenty-three pairs of equally segregating and independently assorting chromosomes, they are not of course self-fertilizing. Discussed later is a process that leads to even more differences among the chromosome sets of sexually

MITOSIS

Mitosis is the type of cell division that produces two daughter cells from a single mother cell.
Each daughter cell has a set of chromosomes identical to the set possessed by the mother cell. Mitosis is the mechanism whereby a multicellular organism increases in size and replaces dead cells and whereby single-celled eukaryotic organisms reproduce asexually. The interested reader can find a striking series of photomicrographs of mitosis in the lily Haemanthus katherinae on page 7 of Genes and Genomes: A Changing Perspective by Maxine Singer and Paul Berg (University Science Books, 1991).

[Figure labels: mother cell, centrosome, nuclear membrane, homologous chromosome pair, centromere, sister-chromatid pair, mitotic spindle, microtubule, daughter cells.]

INTERPHASE

G1: During G1 (see "The Eukaryotic Cell Cycle") the chromosomes of the mother cell are very long and very thin. Only two of the cell's N pairs of homologous chromosomes are shown, and the members of each homologous pair are depicted in different shades of the same color. The centrosome is the source of fibrous proteins called microtubules. One function of microtubules is to direct the motion of chromosomes during mitosis (and meiosis).

G2: The mother cell has replicated its complement of chromosomes (during the preceding S phase) and all other cellular material required for cell division, including the centrosome. The two identical copies of each chromosome are bound together along their centromeres into a so-called sister-chromatid pair.

MITOTIC PHASE

Prophase: The onset of mitosis is signaled by the ordered compaction, or condensation, of chromosomes into microscopically visible threads. Microtubules radiating from the two centrosomes collectively compose the mitotic spindle.

Prometaphase: The chromosomes have condensed further, and the centrosomes have migrated to opposite sides of the cell. Disintegration of the nuclear membrane has allowed microtubules to bind to each chromosome at a region within its centromere.
Metaphase: The chromosomes have assumed their most condensed configuration, and the sister-chromatid pairs have assumed the familiar X shape. Under the influence of opposing forces exerted by microtubules radiating from both centrosomes, each sister-chromatid pair has become aligned along the midplane of the cell.

Anaphase: The bond joining each sister-chromatid pair has broken, and the members of each former sister-chromatid pair have begun moving toward opposite sides of the cell. As a result, a set of chromosomes identical to the set initially possessed by the mother cell becomes segregated in each side of the cell. The cell has begun to elongate and narrow at the midplane.

Telophase: A new nuclear membrane has formed around each segregated set of chromosomes, the chromosomes have begun to decondense, and the cell has begun to divide.

INTERPHASE

G1: Cleavage of the extranuclear cellular material has produced two daughter cells, and the chromosomes in each have decondensed further in preparation for the biosynthetic activities of G1.

PREMEIOTIC PHASE

The germ-line cell, which may be an oogonium in an ovary or a spermatogonium in a testis, appears little different from a somatic cell in G1. Only two of the germ-line cell's N pairs of homologous chromosomes are shown, and the members of each homologous pair are depicted in different shades of the same color.

The germ-line cell has replicated its complement of chromosomes and all other cellular material required for cell division, including the centrosome. The two identical copies of each chromosome are bound together along their centromeres into a sister-chromatid pair.

[Figure: the stages of meiosis. Labeled structures: germ-line cell, centrosome, nuclear membrane, homologous chromosome pair, centromere, sister-chromatid pair.]

Meiosis is the type of cell division that produces the gametes (eggs and sperms) whose union is the first step in the creation of a new human or other sexually reproducing organism.
Only so-called germ-line cells undergo meiosis, and each gamete contains a haploid set of chromosomes, that is, a set composed of one member of each of the N pairs of homologous chromosomes possessed by the diploid germ-line cell.

MEIOTIC PHASE

Prophase I: The onset of meiosis is signaled by a limited condensation of chromosomes. Homologous sister-chromatid pairs have become closely associated, forming N tetrads and allowing "crossing over" to occur, here within only one tetrad. Crossing over results in the exchange of corresponding portions of homologous chromosomes. The germ-line cell now lingers in prophase I for a time that ranges, depending on the species, from a few days to many years.

Metaphase I: The germ-line cell has passed through prometaphase I (not shown) and has entered metaphase I. The chromosomes have fully condensed, and the tetrads have become aligned along the midplane of the cell.

Anaphase I: The members of each tetrad have separated and begun moving toward opposite sides of the cell. Depicted here is but one of the 2^N possible outcomes of the motion of the members of the N tetrads. The equal probability of each possible outcome is the physical basis for Mendel's laws of equal segregation and independent assortment.

Prophase II: The germ-line cell has passed through telophase I (not shown) and has divided into two cells, each of which has entered prophase II. Note that the products of the first meiotic division, like the products of mitosis, have the same number of chromosomes as the original cell. However, a product of mitosis contains N homologous chromosome pairs, whereas a product of the first meiotic division contains two identical copies of each of N nonhomologous chromosomes.

Anaphase II: Both cells have passed through prometaphase II and metaphase II (not shown). Each sister-chromatid pair has separated, and the members of each former sister-chromatid pair have begun migrating to opposite sides of the cell.

POSTMEIOTIC PHASE
Each cell has passed through telophase II (not shown) and divided into two gametes. Thus each meiosis can yield four gametes. However, meiosis of an oogonium usually yields only one egg, because at each division of extranuclear material usually only one cell, the one that receives most of that material, survives.

The transition from diploidy to haploidy is accomplished by two successive partitions of nuclear material. During each partition the motions of the chromosomes are directed, as they are during mitosis, by microtubules radiating from two centrosomes.

reproducing organisms and their offspring: the "crossing over" that occurs between homologous chromosomes during the first stage of meiosis (see "Meiosis"). Together, crossing over and equal segregation and independent assortment essentially guarantee that in the whole history of Homo sapiens, no two individuals (except the pairs of identical twins arising from single fertilized eggs) have been alike genetically.

The facts that accumulated about chromosomes and their behavior during mitosis and meiosis suggested that the link between generations (of cells or organisms) was a substance present in chromosomes. In 1896 the American cell biologist Edmund Beecher Wilson (1856-1939) suggested that the substance of inheritance was the "nuclein" isolated in 1874 by the Swiss chemist Johann Friedrich Miescher (1844-1895) from the nuclei of human pus cells and salmon sperms. Nuclein was found to be composed of two types of chemicals, a nucleic acid and various "albumins," or proteins. By the end of the century, the most advanced thinkers about the mechanism of inheritance, such as Wilson, Boveri, and August Friedrich Leopold Weismann (1834-1915), were of the opinion that nuclein was the stuff of inheritance.

A Theory of Inheritance.
The nineteenth century was also the setting for the elegant work of the Austrian Gregor Johann Mendel (1822-1884), an Augustinian monk better versed in mathematics and physics than in biology. In 1865 Mendel published visionary explanations for the results of his plant-breeding experiments. Among them was the notion that discrete units of heredity (which he called Merkmale and we call genes) are passed unchanged from generation to generation even though each unit is not necessarily expressed as an observable trait in every generation. He also proposed that each plant possesses two such units for each observable trait, one inherited from its male parent and the other from its female parent. Mendel developed statistical laws for predicting how the paired units of heredity are parceled out to offspring. The laws are now known to be applicable (within certain limits) to all sexually reproducing organisms. Furthermore, Mendel's laws parallel the behavior of homologous chromosome pairs during meiosis (the equal segregation of a single chromosome pair and the independent assortment of different chromosome pairs) because, as we now know, Mendel's units of heredity reside on chromosomes. Remarkably, Mendel deduced his theory before chromosomes were identified as the probable carriers of genetic information. His proposals are discussed here out of chronological order because their significance to the emerging science of genetics was not grasped, and probably could not have been grasped, until after the observed behavior of chromosomes during meiosis could provide a physical basis for his abstract theory.
Mendel's publication remained unknown, in fact, until 1900 when, working independently, the German botanist Karl Erich Correns (1864-1933), the Dutch botanist Hugo De Vries (1848-1935), and the Austrian botanist Erich Tschermak von Seysenegg (1871-1962) performed similar experiments, arrived at similar explanations, and brought Mendel's publication to light, garnering him well-deserved albeit posthumous fame.

To best appreciate Mendel's work, one needs to know something about the successes and shortcomings of previous efforts at selective breeding of plants and animals. Selective breeding was certainly well under way in the Neolithic period, and numerous early successes produced most of the strains of domestic plants and animals now in existence. Some of the plant-breeding efforts led to plants so different from their ancestral relatives that they can be considered human-made species. Notable examples are today's Zea mays (maize, or corn) and Solanum tuberosum (the potato plant). Natives of present-day Mexico began developing maize from tiny-eared relatives between 4000 and 5000 years ago, and the pre-Columbian inhabitants of present-day Peru and Bolivia developed a plant producing palatable tubers from relatives producing tubers so bitter as to be inedible. When introduced into the Old World in the sixteenth century, maize and the potato had a tremendous influence on the world's economy. The potato, for example, replaced wheat and rye in the cool areas of northern Europe as a staple food because it produces more calories per acre. (Only rice is as efficient a calorie-producer as the potato, and rice is a warm-climate plant.) The introduction of maize and the potato is thought by some historians to have significantly accelerated the great increase in the rate of population growth of western Europe that began in about the fourteenth century.
Successful as the early breeding efforts were, including those of the noted eighteenth-century plant breeders Josef Gottlieb Koelreuter (1733-1806) and Joseph Gaertner (1732-1791), they certainly were not what we would now call scientific, since in general the outcomes of breedings were quite unpredictable. In contrast, Mendel's aim at the outset of his eight-year effort was to ascertain the statistical rules governing the inheritance of variable traits. Both his methodology and his theoretical conclusions are the foundation for all subsequent studies in genetics.

Mendel chose to work with a plant that exhibits distinct variants of a number of traits, the garden pea (Pisum sativum). He concentrated on two variants of each of seven traits, including pod color (green and yellow) and flower color (violet and white). His unique experimental approach began by allowing plants that bore, say, green pods to self-pollinate for a sufficient number of generations to assure that each new generation of self-pollinated plants would also bear only green pods. Since each of the fourteen purebred strains consistently bore only one variant of a single trait, the purebred strains were advantageous to Mendel's work, providing a certain and observable starting point and amounting, essentially, to a control on his experiments.

Mendel proceeded to study the inheritance of each of the seven traits, first one at a time and then in pairs. All of the experiments on the inheritance of single traits followed the same pattern as that described here for pod color. First, Mendel cross-pollinated the two strains purebred for pod color, the strain bred true for green pods and the strain bred true for yellow pods. (Together the two purebred strains are called the parental generation.) Regardless of which strain he used as the male (pollen-contributing) parent, all the resulting offspring (called here hybrids or members of the first generation) bore only green pods.
Today we would say that all members of the first generation exhibited the same phenotype, a term introduced in 1909 by the Danish botanist Wilhelm Ludwig Johannsen (1857-1927). Symbolically,

    parental generation → first generation,

and in particular,

    purebred green × purebred yellow → hybrids, all green.

The natural question to ask is: Has the capacity to produce the yellow-pod phenotype disappeared altogether, or is it still present but somehow suppressed in the first-generation hybrids? To find out, Mendel selfed the hybrids (that is, he allowed them to self-pollinate), and he observed that the yellow-pod phenotype reappeared among the resulting offspring (the second generation). When Mendel counted the number of second-generation offspring exhibiting each phenotype (a novel procedure at the time), he found that the ratio of green-podded plants to yellow-podded plants was approximately 3 to 1. Symbolically,

    first generation → second generation,

and in particular,

    green hybrid × green hybrid → 3 green : 1 yellow.

To find out whether any members of the second generation had the capacity to produce offspring with the phenotype they themselves did not exhibit, Mendel selfed the members of the second generation. He found that all the yellow-podded members behaved like plants purebred for yellow pod color; that is, they produced only yellow-podded offspring. In contrast, only one-third of the green-podded members of the second generation behaved like plants purebred for green pod color, whereas the remaining two-thirds behaved like the first-generation hybrids, producing both green- and yellow-podded progeny in the ratio of 3 to 1. In other words, the ratio 3 green : 1 yellow exhibited by the second generation is more accurately described as the ratio 1 pure green : 2 hybrid green : 1 pure yellow.
Mendel continued selfing the green-podded members of successive generations and always found that approximately two-thirds of the green-podded progeny of green hybrids were again green hybrids, behaving just like the first-generation hybrids. That is, when those two-thirds were allowed to self-pollinate, they produced green- and yellow-podded progeny in the approximate ratio of 3 to 1.

To explain the mathematical regularity of his results, Mendel advanced a theoretical model of inheritance. First, and most basic, is the idea that the fertilized egg (zygote) from which a plant develops contains two genes, or units of heredity, for pod color, one contributed by the egg and the other contributed by the sperm. ("Gene" is another term coined by Johannsen.) Mendel also proposed that there were two distinct genes for pod color, one for green and one for yellow. The gene for green pod color he called dominant (and designated it by a capital letter, say P) because any plant that carried that gene bore green pods. The gene for yellow pod color he called recessive (and designated it by a lowercase letter, p). Today we say P and p are different forms, or alleles, of the gene for pod color. Since the egg and sperm each contain only one allele, a fertilized egg contains one of three possible allele pairs (or possesses one of three possible genotypes, another word coined by Johannsen): PP, Pp, or pp. Mendel proposed that the plants purebred for green pod color contained the pair PP, those purebred for yellow pod color contained the pair pp, and the hybrid plants, which bore only green pods but produced both green- and yellow-podded progeny when allowed to self-pollinate, contained the pair Pp. In modern terminology plants possessing the genotype PP are said to be homozygous dominant; those possessing the genotype pp are homozygous recessive; and those with the genotype Pp are heterozygous.
This terminology and other nomenclature of genetics are illustrated in the table.

Trait         Phenotypes          Genotypes                  Alleles        Gene
Pod color     Green (dominant)    PP (homozygous dominant)   P (dominant)   Pod-color gene
                                  Pp (heterozygous)          p (recessive)
              Yellow (recessive)  pp (homozygous recessive)
Flower color  Violet (dominant)   FF (homozygous dominant)   F (dominant)   Flower-color gene
                                  Ff (heterozygous)          f (recessive)
              White (recessive)   ff (homozygous recessive)

With those hypotheses and the laws of probability Mendel constructed a probabilistic model that explained the results of his experiments. The model is shown in "Mendelian Genetics." The element of chance is operative both in the formation of gametes (eggs and sperms) and in the formation of zygotes (fertilized eggs). Mendel assumed that during the formation of gametes, the pair of alleles for pod color separates (or segregates) equally; in other words, the probability that a gamete will receive one or the other of the pair is equal to one-half. He therefore predicted correctly that among the gametes produced by a green hybrid (a plant heterozygous for pod color), approximately one-half would contain P and the remainder would contain p. Because, as is now known, each member of the allele pair for a given trait resides at the same location on one or the other of a pair of homologous, equally segregating chromosomes, only one allele enters each gamete. Therefore, the behavior of a single allele pair during meiosis is known as Mendel's law of equal segregation.

The element of chance is also operative in the random union of an egg and a sperm to form a zygote with a particular genotype.
For example, in the formation of offspring of the green hybrids, the probability of forming a zygote with the genotype PP, call it Pr(PP), is the joint probability of two independent events, namely, the probability that an egg contains P and the probability that a sperm contains P. Since the joint probability is the product of the probabilities of the two independent events, we can write Pr(PP) = Pr(P)Pr(P). Mendel applied this rule to predict the probability of finding a given genotype among the progeny of the green hybrids. Since green hybrids produce gametes containing P or p, each with a probability of 1/2, the eggs and sperms combine in four equally probable ways to produce offspring with the genotypes PP, Pp, pP, or pp, and the probability of each of those genotypes is 1/2 times 1/2, or 1/4. Since Pp and pP are equivalent genotypes (it doesn't matter whether a particular allele arrived with the sperm or the egg), the probabilities for Pp and pP are added to predict that the probability of an offspring's having the genotype Pp is 1/2. In other words, the three possible genotypes occur in the ratio 1 PP : 2 Pp : 1 pp. Translating the genotypes into phenotypes yields the ratio 3 green : 1 yellow, in agreement with Mendel's observations.

Having explained the 3 green : 1 yellow ratio by advancing a general model, Mendel went on to test the model by crossing green hybrids (genotype Pp) with plants purebred for yellow pod color (genotype pp). He predicted that the offspring would have the genotypes Pp and pp in the ratio 1 Pp : 1 pp and found, in agreement with the model, that approximately one-half the progeny bore green pods and the remainder bore yellow pods.

Mendel obtained similar results for all seven traits. In other words, he inferred the existence of two alleles for each trait, one dominant and one recessive.
However, we now know that the alleles of a gene do not always exhibit a dominant-recessive relationship. Sometimes the pairing of different alleles leads to a blend (for example, pairing of the snapdragon alleles that specify white and red flowers leads to pink flowers); sometimes it leads to simultaneous exhibition of both phenotypes (for example, pairing of the human alleles that specify A and B blood types, which are characterized by the presence of the antigens A and B, respectively, on the surface of red blood cells, leads to AB blood type, which is characterized by the presence of both antigens). However, the validity of Mendel's research and theoretical conclusions is unaffected by the fact that he focused, presumably by chance, on traits controlled by alleles that do exhibit the phenomenon of dominance.

Mendel next proceeded to study the co-inheritance of two traits, say pod color (specified by dominant and recessive alleles P and p, respectively) and flower color (specified by dominant and recessive alleles F and f, respectively). Again, he first developed two purebred strains, one purebred for green pod color and violet flower color (genotype PPFF) and the other purebred for yellow pod color and white flower color (genotype ppff). As before, Mendel cross-pollinated the purebred strains, thus producing dihybrid offspring, each heterozygous for both traits. He selfed the resulting first dihybrid generation to produce the second dihybrid generation. Each member of the first dihybrid generation exhibited both dominant phenotypes; that is, they bore green pods and violet flowers. Members of the second dihybrid generation exhibited four composite phenotypes in a 9:3:3:1 ratio, as shown below.
Possible Phenotypes among Second Dihybrid Generation    Fraction Exhibiting Phenotype
green pods, violet flowers                              9/16
green pods, white flowers                               3/16
yellow pods, violet flowers                             3/16
yellow pods, white flowers                              1/16

Note that the ratio of green- to yellow-podded members of the second dihybrid generation was still 3 to 1, just as it was in the second generation produced by the experiments on pod color alone. The ratio of violet- to white-flowered members of the second dihybrid generation also was 3 to 1. Mendel realized that the 9:3:3:1 ratio resulted from multiplicative combinations of the two 3:1 ratios. He therefore concluded that the phenotypes for the two traits are inherited independently. In other words, the probability of each composite phenotype is the product of the probabilities of the two "component" (single-trait) phenotypes. For example, the probability that a second-dihybrid-generation member will bear green pods and white flowers (3/16) is the product of the probability of its bearing green pods (3/4) and the probability of its bearing white flowers (1/4).

The independent inheritance of the two traits implies that when members of the first dihybrid generation produce gametes, segregation of the alleles for pod color is independent of the segregation of the alleles for flower color. In other words, the two allele pairs assort independently. The members of the first dihybrid generation have the genotype PpFf, so each gamete receives P or p with a probability of 1/2 and F or f with a probability of 1/2. Since the segregation of each allele pair is an independent event, the individual probabilities are multiplied to predict that the probability of forming each of the four possible types of gametes, those containing PF, Pf, pF, or pf, is 1/2 times 1/2, or 1/4. Random fertilization of eggs by sperms produces the sixteen genotypes shown in the probability table for the second dihybrid generation in "Mendelian Genetics."
Each has a probability of 1/4 times 1/4, or 1/16. The composite phenotype corresponding to each genotype is also shown. Counting the number of times each phenotype appears yields the 9:3:3:1 ratio observed by Mendel. The physical basis for Mendel's law of independent assortment is the independent assortment of the different pairs of homologous chromosomes during meiosis.

MENDELIAN GENETICS

Mendel's experiments on the inheritance of single traits and pairs of traits, illustrated here, led him to postulate the concept of discrete, particulate units of heredity that pass unchanged from generation to generation. He studied seven traits (characteristics) of the garden pea, each of which exhibited two alternative forms. For example, pod color could be either green or yellow, and flower color could be either violet or white. As described in the main text, Mendel found that one form of each trait was dominant and the other recessive and that the progeny of controlled breedings exhibited one form or the other in definite ratios. The observed mathematical regularities led to the model of inheritance described here.

Mendel knew that his plants reproduced sexually, but he did not know that chromosomes exist or that the number of chromosomes is reduced by one-half during the formation of gametes. As a result his terminology was rather imprecise. He did not clearly distinguish the form of a trait from the units of heredity whose actions determine the trait. That distinction was made almost half a century later by Johannsen, who coined the term gene for the particulate units of heredity, the term genotype for the genes whose action determines a trait, and the term phenotype for the form of the trait determined by the genotype. The more precise terminology is used in the following description of Mendel's model and in the accompanying figures.

Mendel's model of inheritance includes four postulates.

1.
Each plant contains a pair of genes for each trait; that is, the genotype for a trait is specified by a pair of genes.

2. During the formation of gametes, the gene pair for a trait segregates equally; that is, the genes in the pair are parceled out to the gametes in such a fashion that each gamete receives only one member of the pair and has an equal chance of receiving either member of the pair (the law of equal segregation).

3. A gene has two forms, or alleles, designated by, say, A and a. Only plants with the genotype aa (homozygous for a) exhibit the recessive phenotype. A plant with the genotype AA (homozygous for A) or the genotype Aa (heterozygous) exhibits the dominant phenotype.

4. During the formation of gametes, segregation of the gene pair for any one trait is independent of the segregation of the other gene pairs. Consequently a plant heterozygous for two traits (genotype AaBb) produces gametes containing AB, Ab, aB, and ab with equal probability (the law of independent assortment). Note that the law of independent assortment holds only if the genes for the different traits are on different pairs of homologous chromosomes.

Mendel's laws of equal segregation and independent assortment can be applied in two ways. If one knows the genotypes of both parents, one can predict the probability of each possible genotype of a future offspring. Or, working backward, if one observes in existing offspring the approximate ratios of phenotypes predicted by Mendel's laws, one can often infer the genotypes of the parents, just as Mendel did.

Mendel's Experiments on Inheritance of One Trait (Pod Color)

Methodology

Step 1: Cross-pollinate two strains of peas, one purebred for green pod color, the other purebred for yellow pod color. Result: All first-generation hybrids bear green pods.

Step 2: Self-pollinate the green hybrids.
Result: Second-generation plants bear either green or yellow pods in the approximate ratio of 3 green to 1 yellow. Further selfing shows that half the second generation (or two-thirds of the green-podded members) are hybrids.

Theoretical Model

[Figure: cross-pollination of the parental generation (purebred strains), meiosis and the probability of each gamete type, the first generation (green hybrids), self-pollination, and the probability table of eggs and sperms for the second generation.]

P = green-pod-color allele
p = yellow-pod-color allele

Parental generation (purebred strains): Mendel assumed that each plant contains a pair of genes for pod color. Therefore, each purebred parent is homozygous; that is, each contains two identical genes for pod color.

Gametes of the parental generation: Since a fertilized egg results from the union of two gametes, each gamete contains one allele for pod color.

First generation (green hybrids): Because all first-generation offspring bore green pods, Mendel called green the dominant pod color and yellow the recessive pod color. Mendel inferred that whenever P, the allele for the dominant pod color, is present, the plant bears green pods (the law of dominance).

Gametes of the first generation: Mendel inferred that the pair Pp segregates equally into the gametes; that is, each gamete (whether egg or sperm) receives P or p with equal probability of one-half (the law of equal segregation).

Second generation: Random union of eggs and sperms produces four possible combinations of alleles in the offspring. As shown by the table, the probabilities of each gamete type are multiplied to yield the probabilities of the four possible genotypes in the second-generation offspring. Since Pp and pP are equivalent genotypes, the probabilities of each are added to yield a probability of one-half for the genotype Pp. Mendel's model predicts, for members of the second generation, phenotypes in the ratio 3 green : 1 yellow (in agreement with Mendel's observations) and genotypes in the ratio 1 PP : 2 Pp : 1 pp.
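The single-trait model can be checked by brute-force enumeration. The following is a minimal Python sketch, not part of the original article; the function names `cross` and `phenotypes` are hypothetical, and the code assumes exactly what the model assumes: each gamete receives either allele with probability 1/2, and eggs and sperms unite at random.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1, parent2):
    """Genotype probabilities for a cross at a single gene.

    Each parent is a two-character genotype such as "Pp". Every gamete
    receives one of the parent's two alleles with probability 1/2
    (equal segregation), and eggs and sperms unite at random.
    """
    counts = Counter()
    for egg, sperm in product(parent1, parent2):
        # Pp and pP are equivalent genotypes, so sort the pair;
        # uppercase sorts before lowercase, giving "Pp".
        counts["".join(sorted(egg + sperm))] += 1
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


def phenotypes(genotype_probs, dominant="green", recessive="yellow"):
    """Collapse genotype probabilities into phenotype probabilities."""
    result = Counter()
    for genotype, prob in genotype_probs.items():
        # One dominant (uppercase) allele suffices for the dominant phenotype.
        shows_dominant = any(allele.isupper() for allele in genotype)
        result[dominant if shows_dominant else recessive] += prob
    return dict(result)


selfed = cross("Pp", "Pp")       # selfing a green hybrid
print(selfed)
print(phenotypes(selfed))
print(cross("Pp", "pp"))         # hybrid crossed with purebred yellow
```

Under these assumptions the selfed hybrid gives PP, Pp, and pp with probabilities 1/4, 1/2, and 1/4 (3 green : 1 yellow), and the cross of a hybrid with a purebred yellow plant gives Pp and pp with probability 1/2 each. Exact fractions are used rather than floating-point numbers so that the ratios can be read off directly.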
Mendel's Experiments on Inheritance of Two Traits (Pod Color and Flower Color)

Methodology

Step 1: Cross-pollinate two strains of peas, one purebred for the two dominant phenotypes (green pods and violet flowers), the other purebred for the two recessive phenotypes (yellow pods and white flowers). Result: All first-generation dihybrids bear green pods and violet flowers.

Step 2: Self-pollinate the first-generation dihybrids. Result: Second-generation plants exhibit four composite phenotypes (pod color, flower color) in the ratio of 9 (green, violet) : 3 (yellow, violet) : 3 (green, white) : 1 (yellow, white).

Theoretical Model

[Figure: cross-pollination of the parental generation (strains purebred for two traits), meiosis and the probability of each gamete type, the first generation (green-pod and violet-flower dihybrids), and the probability table of eggs and sperms for the second generation, with the composite phenotype (pod color, flower color) of each genotype.]

P = green-pod-color allele
p = yellow-pod-color allele
F = violet-flower-color allele
f = white-flower-color allele

Parental generation (strains purebred for two traits): Each purebred parent is homozygous for both pod color and flower color.

Gametes of the parental generation: Each gamete carries only one gene for each trait.

First generation (dihybrids): All first-generation (dihybrid) offspring bear violet flowers and green pods, the dominant phenotypes, in agreement with the law of dominance.

Gametes of the first generation: Independent equal segregation of each allele pair (Pp and Ff) produces gametes containing one of four equally probable combinations of alleles (the law of independent assortment).

Second generation: Random union of eggs and sperms produces offspring containing one of sixteen combinations of alleles. All are equally probable because all gamete types are equally probable. The sixteen combinations reduce to nine different genotypes and four different composite phenotypes, which are predicted from the probability table to occur in the ratio 9:3:3:1, in agreement with Mendel's observations.
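The sixteen-entry probability table can likewise be reproduced by enumeration. The Python sketch below is an illustration added here, not Mendel's or the article's procedure; the names `gametes`, `phenotype`, `self_cross`, and `TRAITS` are hypothetical. It assumes independent assortment, so it applies only to genes on different pairs of homologous chromosomes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """List the equally probable gametes of a genotype such as "PpFf".

    The genotype string holds one allele pair per gene. Each gamete takes
    one allele from every pair, and the pairs assort independently.
    """
    pairs = [genotype[i:i + 2] for i in range(0, len(genotype), 2)]
    return ["".join(choice) for choice in product(*pairs)]


def phenotype(egg, sperm, traits):
    """Composite phenotype of the zygote formed from two gametes."""
    labels = []
    for egg_allele, sperm_allele, (dominant, recessive) in zip(egg, sperm, traits):
        # A single dominant (uppercase) allele gives the dominant form.
        has_dominant = egg_allele.isupper() or sperm_allele.isupper()
        labels.append(dominant if has_dominant else recessive)
    return ", ".join(labels)


def self_cross(genotype, traits):
    """Phenotype probabilities among the offspring of a selfed plant."""
    types = gametes(genotype)
    counts = Counter(phenotype(e, s, traits) for e, s in product(types, types))
    total = len(types) ** 2
    return {name: Fraction(n, total) for name, n in counts.items()}


TRAITS = [("green pods", "yellow pods"), ("violet flowers", "white flowers")]

print(gametes("PpFf"))  # the four gamete types PF, Pf, pF, pf
for name, prob in self_cross("PpFf", TRAITS).items():
    print(f"{name}: {prob}")
```

Selfing the dihybrid PpFf in this sketch yields 9/16 green and violet, 3/16 green and white, 3/16 yellow and violet, and 1/16 yellow and white, which is the product of the two single-trait 3:1 ratios.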
Therefore, the law applies only if the allele pairs for the two traits reside on different pairs of homologous chromosomes. In fact, deviations from Mendelian predictions for the co-inheritance of two traits are evidence that the two traits are specified by allele pairs that reside on the same pair of homologous chromosomes.

This discussion of Mendel's theory of inheritance ends with two points of note. First, although the theory is now known to be applicable to humans as well as to pea plants, it is unlikely that it could have been deduced from data about the outcomes of human breedings. As subjects of inheritance studies, humans pose several disadvantages: The controlled breeding of humans is generally regarded as inappropriate and would be difficult to achieve even if it were not; each pair of human parents typically produces too few data (offspring) for analysis of the sort required; and the rate at which humans produce offspring is too slow to suit most experimenters' taste. Moreover, many human traits are specified not by a single allele pair but by many allele pairs.

The second point of note concerns the utility of Mendel's theory as a predictive tool, particularly for human breedings. The theory can be applied directly only to traits determined by a single allele pair. Such traits are called Mendelian traits because they are inherited in accordance with Mendel's laws. Most Mendelian traits of humans are disorders, some mild and some grave, caused by the presence of a defective allele. Determining the probability that an offspring will be affected by a Mendelian disorder requires knowing the parental genotypes for the disorder and whether the disorder is caused by a dominant or a recessive allele.
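As a rough illustration of that calculation, the Python sketch below computes the probability that a child is affected, given the two parental genotypes and whether the defective allele is dominant or recessive. The function name `risk_affected` and the allele letters are hypothetical choices made for this example, and the sketch assumes an autosomal gene, equal segregation, and random union of gametes.

```python
from fractions import Fraction
from itertools import product


def risk_affected(mother, father, defective, dominant):
    """Probability that a child is affected by a single-gene disorder.

    mother, father: two-letter genotypes, for example "Dn".
    defective: the letter standing for the defective allele.
    dominant: True if one defective allele suffices to cause the
        disorder, False if two copies are required (recessive).
    Assumes an autosomal gene, equal segregation, and random union
    of gametes; X-linked disorders are not handled.
    """
    needed = 1 if dominant else 2
    # Each (egg allele, sperm allele) pair is equally probable.
    outcomes = list(product(mother, father))
    affected = sum(1 for pair in outcomes if pair.count(defective) >= needed)
    return Fraction(affected, len(outcomes))


# Dominant disorder: unaffected mother (nn), affected heterozygous father (Dn).
print(risk_affected("nn", "Dn", defective="D", dominant=True))

# Recessive disorder: both parents unaffected carriers (Cc), defective allele c.
print(risk_affected("Cc", "Cc", defective="c", dominant=False))
```

Under those assumptions the first case gives a risk of 1/2 and the second a risk of 1/4, mirroring the 1 Pp : 1 pp and 3 : 1 ratios of the pea experiments.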
The required genotypic information for the parents can often be inferred from the phenotypes of their existing offspring and of their parents, and information about whether the defective allele is dominant or recessive can often be inferred from the pattern of inheritance of the disorder in other families (see "Inheritance of Mendelian Disorders"). More than three thousand human Mendelian disorders have been identified. One of the goals of the Human Genome Project is to supply the tools necessary to isolate the causative alleles from the vast quantity of human genetic material and to identify the defects in the alleles.

A Theory of Evolution. The nineteenth century brought not only the rise of cell biology and the work of Mendel but also a growing acceptance of the fact of evolution, of the creation of extant organisms by changes in the life forms that first populated this planet. Belief in the ancient principle of the invariability of species waned, and in its place came the conviction that new species had been and are being formed. (A notable holdout to the idea of evolution was the eminent Harvard zoologist Jean Louis Rodolphe Agassiz (1807-1873), who was what we would today call a creationist.) The veering of scientific opinion toward evolution led to development of a theory of evolution based on natural selection. Formulated independently by Charles Robert Darwin (1809-1882) and Alfred Russel Wallace (1823-1913), the theory was presented to the world first in a jointly authored short publication (1858) and later in Darwin's classic book On the Origin of Species (1859).
Crucial to development of the theory were the observations that offspring resembled their parents

Inheritance of Mendelian Disorders

Although some inherited disorders of humans are due to the combined effects of multiple genes (multigenic disorders) or to the combined effects of genes and the environment (multifactorial disorders), a so-called Mendelian disorder is caused by a single defective allele. Over 3000 Mendelian disorders are known. They range from mild conditions such as red-green color blindness to life-threatening diseases such as cystic fibrosis. Because the defective allele can be either dominant or recessive and can reside on either an autosome or a sex chromosome (in particular, the X chromosome, since very few genes reside on the small human Y chromosome), four types of Mendelian disorders are possible: autosomal dominant, autosomal recessive, X-linked dominant, and X-linked recessive. Each type of disorder reveals itself through a distinctive pattern of inheritance in a family pedigree. Illustrated here are the patterns for three of the four types of Mendelian disorders.

Consider first the inheritance of an autosomal dominant Mendelian disorder. Many such disorders are expressed only in adulthood, including Huntington's disease, neurofibromatosis, and polycystic kidney disease. Shown in (a) are the equally probable genotypes and the phenotypes of the offspring of an affected father and an unaffected mother (or of an affected mother and an unaffected father). The genotype of the affected father can be either DD or Dn, where n is the nondefective recessive version of the defective dominant allele D. Because the father's having the genotype DD is the less typical and less interesting situation (all his offspring would be affected), it is assumed in (a) that the father has the genotype Dn. Because the mother is unaffected, her genotype must be nn.
The equal segregation of chromosomes during meiosis implies that the offspring of such a mating can have one of two equally probable genotypes: Dn or nn. Therefore the probability of an offspring's being affected is 1/2. Note carefully, though, that only in the limit of an infinite number of offspring will the ratio of affected to unaffected offspring be equal to 1. Also shown in (a) is the pedigree of a family afflicted with hypercholesterolemia, a dominant disorder that causes excess levels of cholesterol in the blood. A thirty-year-old white male (II-4) suffered a myocardial infarction, a type of heart blockage, and was then found to test positively for hypercholesterolemia. Further tests indicated that his sister (II-1) and his four children (III-6, III-7, III-8, III-9) also had hypercholesterolemia. In addition, a family history revealed that the man's father (I-3) and uncle (I-1) both died of myocardial infarctions before reaching the age of fifty-five. Note that all of II-4's children are affected by the disorder, an outcome that is not inconsistent (although it may appear to be) with the probabilistic predictions based on the chromosome theory of heredity. Note also that the disease appears in all three generations of the pedigree; such a "vertical" pattern is characteristic of dominant disorders.

[Figure (a), Autosomal Dominant Disorder. Probabilistic prediction: offspring genotypes Dn, nn, Dn, nn; a fifty-fifty chance of inheriting the disorder. Observed pedigree: vertical inheritance pattern. Legend: affected, unaffected, carrier; female, male.]

Shown in (b) is the inheritance of an autosomal recessive Mendelian disorder, examples of which include Tay-Sachs disease, cystic fibrosis, and sickle-cell anemia. Assume a typical situation: Both parents are carriers, or, in other words, are unaffected but have the genotype Nd, where N is the nondefective dominant version of d.
The equal segregation of chromosomes during meiosis implies that the probability of an offspring's having the genotype dd and therefore of being affected is 1/4. In addition, the probability of an offspring's having the genotype Nd or dN (and of being a carrier) is 1/2 and of having the genotype NN (and of being unaffected) is 1/4. Also shown in (b) is the pedigree of a family with an autosomal recessive Mendelian disorder. Only two individuals, both in the third generation (III-1 and III-4), are affected. All the other individuals listed are either carriers or unaffected. Since typically siblings in only a single generation are affected by a recessive Mendelian disorder, its inheritance pattern is referred to as horizontal.

[Figure (b), Autosomal Recessive Disorder. Probabilistic prediction: offspring genotypes NN, dN, Nd, dd; a one-in-four chance of inheriting the disorder. Observed pedigree: horizontal inheritance pattern.]

[Figure (c), X-linked Recessive Disorder. Probabilistic prediction: only males at risk of inheriting the disorder. Observed pedigree: disorder passed to male offspring from female carriers.]

Shown in (c) is the inheritance of an X-linked recessive Mendelian disorder. Such disorders include hemophilia, which is the result of a lack of an essential blood-clotting factor, and Duchenne muscular dystrophy, which causes progressive muscle weakness and death in early adulthood from respiratory problems. Again assume a typical situation: The mother is a carrier and therefore has the genotype X^N X^d, and the father is unaffected and therefore has the genotype X^N Y. Any male offspring has a probability of 1/2 of being affected, and any female offspring has a probability of 1/2 of being a carrier. Also shown in (c) is a pedigree of a family with Duchenne muscular dystrophy. One son (II-2) and two daughters (II-1 and II-3) inherited the maternal X chromosome on which the defective allele resides.
The son, possessing only one X chromosome, is affected. On the other hand, the daughters are unaffected carriers, but their sons (III-2, III-6, and III-7) inherited the defective allele. The pedigree illustrates the typical pattern of inheritance of an X-linked recessive disorder: transmission from an affected male through his daughters to his grandsons. Females can inherit the disease if the father is affected and the mother is either affected or a carrier.

only incompletely and that selective breeding had produced plants and animals quite different from the ancestral strains. Darwin arrived at his conclusions in large part by doing a Gedankenexperiment, much as Albert Einstein later arrived at his theory of relativity.

It should be noted that not all of Darwin's thinking was as forward-looking as his theory of evolution. He was an exponent of a form of pangenesis (see "Early Ideas about Heredity") and of blending inheritance (the notion that the characteristics of offspring are the result of a melding of the parental characteristics). Darwin's cousin Francis Galton (1822-1911), in his own way also a genius, tried to point out to Darwin, without success, that neither theory of inheritance made much sense. In doing so Galton came very close to developing the same theory of particulate inheritance as had Mendel, although like Darwin, he was unaware of Mendel's work. Like Mendel, Galton was cognizant of probability and statistics. He can be considered the founder of modern biostatistical theory, which has been an immensely powerful tool in the development of genetic theory.

The cell biologists, Mendel, and Darwin and Wallace made basic contributions to the foundations of modern genetics, but they did so essentially in isolation from each other.
Mendel was influenced to some extent by the findings of the cell biologists and of the evolutionists, but the cell biologists and the evolutionists were influenced neither by him nor by each other. Such isolation among different fields of science, though detrimental to progress, is still today not uncommon.

Things Come Together

The science of genetics was born in the first decade of the twentieth century through fusion of Mendel's theory of inheritance and the cell biologists' knowledge about chromosomes. In 1902 a student of Wilson's, Walter Stanborough Sutton (1877-1916), and Boveri independently recognized the parallels between the real objects called chromosomes and the theoretical constructs called genes (the occurrence of both as pairs, their separation in a similar fashion during gamete formation, and their re-pairing during fertilization) and proposed that each member of a pair of alleles is located on one or the other member of a pair of homologous chromosomes. Thus was born the chromosome theory of heredity. The theory was soon proved, and during the period between 1910 and 1940, the heyday of classical genetics, many allele pairs were localized to particular homologous chromosome pairs.

Classical Genetics. The term "classical genetics" refers to those aspects of genetics that can be studied without reference to the molecular details of genes. The early stars of classical genetics were the American Thomas Hunt Morgan (1866-1945), his students Calvin Blackman Bridges (1889-1938), Hermann Joseph Muller (1890-1967), and Alfred Henry Sturtevant (1891-1970), and last but not least members of the genus Drosophila, most notably the common fruit fly Drosophila melanogaster. Morgan's interest lay (initially at least) in determining whether the changes that result in new species occur gradually or abruptly. He chose to study changes in D.
melanogaster because it reaches sexual maturity so rapidly, produces so many offspring, and is so easily and cheaply raised in the laboratory. The discovery, in the spring of 1910, of a lone white-eyed male fly among thousands upon thousands of red-eyed flies in the Fly Room at Columbia University was a momentous event, leading not only to proof of the chromosome theory of heredity but also to knowledge of previously unknown aspects of meiosis.

Now is an appropriate time to emphasize the critical role of mutants in genetics. (A mutant is a member of a species that exhibits a phenotype different from the "wild-type" phenotype exhibited by most members of a natural population of the species.) Even knowledge of the existence of a gene is usually inferred from the existence of a mutant. When faced, for example, with a vast population of only red-eyed flies, how could anyone suspect that eye color is a manifestation of genes in operation? To be discussed later is another invaluable role of mutants: as tools for learning more specifically what genes do. (That genes determine physically observable traits is certainly true but remarkably vague.)

An early outcome of the discovery of the white-eyed fly was Morgan's proposal that alleles for red and white eye color in D. melanogaster are located on its X chromosomes. Morgan arrived at that proposal by observing the eye colors of the progeny resulting from a series of breedings, a series that began with matings between the white-eyed male and wild-type red-eyed females. (Note that mutants must not only be discovered but also be allowed to survive and breed.) Because all the progeny were red-eyed, Morgan concluded that the red-eye-color allele is dominant. Next he interbred the progeny and found, just as Mendel would have predicted, that three-quarters of the resulting second-generation progeny were red-eyed and one-quarter were white-eyed.
However, among neither the red-eyed nor the white-eyed second-generation flies did he find an equal number of males and females, as would be predicted if the observed segregation of sex chromosomes was independent of the presumed segregation of red- and white-eye-color alleles. Instead two-thirds of the red-eyed second-generation flies were females and all of the white-eyed flies were males. Morgan continued by mating red-eyed males to white-eyed females, a breeding that is the "reciprocal" of the original breeding of the lone white-eyed male. He found that half of the progeny were female and red-eyed and the other half were male and white-eyed, whereas Mendel would have predicted that all of the progeny would be red-eyed, just as all of the progeny resulting from the original breeding were red-eyed. To explain those deviations from Mendelian predictions, Morgan proposed that the red- and white-eye-color alleles are X-linked, or in other words that they are located on the X chromosomes.

The reader can more easily verify that Morgan's hypothesis explains the outcomes of the breedings he carried out by using some symbolism. Let w and W denote, respectively, the recessive white-eye-color allele and the dominant red-eye-color allele. Denote an X chromosome containing w by X^w and an X chromosome containing W by X^W. Then the first breeding Morgan carried out, the breeding between wild-type red-eyed females and the white-eyed male, is denoted by X^W X^W x X^w Y. The progeny of such a breeding contain one of two equally probable combinations of sex chromosomes: X^W X^w and X^W Y. In other words, half the progeny are female and red-eyed and half are male and red-eyed. The reader is urged to verify that Morgan's proposal explains the outcomes of the other breedings he carried out, namely X^W X^w x X^W Y and X^w X^w x X^W Y.
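For readers who would rather let a machine do the verification, the three breedings can be enumerated directly. The Python sketch below is only an illustration of the X-linkage hypothesis as stated above; the function name `cross` and the string labels "XW", "Xw", and "Y" (an X chromosome carrying W, an X chromosome carrying w, and a Y chromosome) are hypothetical conventions chosen for this example.

```python
from collections import Counter
from itertools import product


def cross(mother, father):
    """Count the equally probable (sex, eye color) classes among progeny.

    `mother` and `father` are pairs of sex-chromosome labels, for example
    ("XW", "Xw") for a heterozygous female and ("Xw", "Y") for a
    white-eyed male.
    """
    counts = Counter()
    # Each egg carries one maternal chromosome, each sperm one paternal one.
    for egg, sperm in product(mother, father):
        sex = "male" if sperm == "Y" else "female"
        # W is dominant: one X chromosome carrying W gives red eyes.
        eye = "red" if "XW" in (egg, sperm) else "white"
        counts[(sex, eye)] += 1
    return counts


# Original breeding: wild-type red-eyed females x the white-eyed male.
first = cross(("XW", "XW"), ("Xw", "Y"))

# Interbreeding of the first-generation progeny.
second = cross(("XW", "Xw"), ("XW", "Y"))

# Reciprocal breeding: white-eyed females x red-eyed males.
reciprocal = cross(("Xw", "Xw"), ("XW", "Y"))

for name, result in [("first", first), ("second", second), ("reciprocal", reciprocal)]:
    print(name, dict(result))
```

Each call enumerates the four equally probable egg-sperm unions, so the counts can be read as quarters of the progeny: the second breeding gives three red-eyed classes (two of them female) and one white-eyed male class, and the reciprocal breeding gives only red-eyed females and white-eyed males.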
Morgan's experiments certainly supported the chromosome theory of heredity, but the work of Bridges provided more direct confirmation. Bridges started by repeating, on a large scale, one of the breedings Morgan had carried out, the breeding between white-eyed female flies and red-eyed male flies. If, as Morgan proposed, the w and W alleles reside on the X chromosomes, that breeding can be represented by X^w X^w x X^W Y and, as Morgan had observed, half of the resulting progeny would possess the sex-chromosome combination X^W X^w (would be red-eyed females) and half would possess the sex-chromosome combination X^w Y (would be white-eyed males). But Bridges' large-scale breeding produced a surprise: A very small fraction of the progeny (about one in every two thousand) were either white-eyed females or sterile red-eyed males. Bridges found, by direct microscopic observation of the chromosomes of the unusual progeny, that they possessed an anomalous number of sex chromosomes. The white-eyed females possessed two X chromosomes and one Y chromosome, and the sterile red-eyed males possessed a single X chromosome. Obviously the single X chromosome of a sterile red-eyed male must be the residence of the red-eye-color allele he must possess, and the pair of homologous X chromosomes of a white-eyed female must be the residences of the two white-eye-color alleles she must possess. Thus a combination of cytological data and genotypic and phenotypic data directly confirmed the chromosome theory of heredity. (Note that Bridges' "cytogenetic" evidence also indicated that the Y chromosome of D. melanogaster is involved in determining fertility rather than maleness.)

A question about Bridges' work remains: How could the abnormal numbers of sex chromosomes in the unusual progeny be explained? Bridges proposed that the homologous X chromosomes of a female fruit fly occasionally fail to segregate during meiosis.
Meioses in which such "nondisjunctions" occur would yield two equally probable types of eggs: eggs containing two X chromosomes and eggs containing no X chromosomes. Fertilization of those two types of eggs by the two types of sperms produced by a male fruit fly would result in four types of fertilized eggs: those containing the combination of sex chromosomes XmXmXp, the combination XmXmY, the combination Xp, and the combination Y. (The subscript on each X chromosome, m or p, denotes maternal origin or paternal origin.) The combinations XmXmY and Xp are the combinations Bridges observed in the unusual progeny; he attributed the absence of unusual progeny containing the XmXmXp and Y combinations to a lethal overdose and a lethal underdose of X chromosomes. Nondisjunction is now known to be a rare but medically significant feature of meiosis. The human disorder known as Down syndrome, for example, is caused by nondisjunction of chromosome 21.

It is odd that proof for the existence of a rare meiotic glitch (nondisjunction) antedated clear evidence for the existence of what is now known to be a common feature of meiosis: crossing over. (Nondisjunction occurs once in about every hundred thousand human meioses, whereas crossing over occurs about thirty-three times per human meiosis, or on average more than once per homologous chromosome pair per human meiosis.) As proposed by Morgan, crossing over brings about an exchange, between two homologous chromosomes, of corresponding regions of the chromosomes. (An analogy is the exchange, between two nearly identical yardsticks, of, say, initial seven-inch regions.) Because homologous chromosomes differ from each other in details of their chemical composition, the products of a single crossover are two "recombinant" chromosomes, each different from (but still homologous to) the other and the chromosomes that participated in the crossover.
In particular, if the exchanged regions contained different alleles of two genes, the recombinant chromosomes contain combinations of alleles that are different from the combinations of alleles possessed by the participants (see "Crossing Over: A Special Type of Recombination"). Thus crossing over, like independent assortment, increases the genetic diversity of sexually reproducing organisms. But whereas independent assortment merely creates new combinations of existing chromosomes, crossing over can create new chromosomes, ones containing new combinations of alleles. Crossing over might today be regarded as merely another item in the phenomenology of meiosis were it not that it is the key element of a method for determining a measure of the distance between two genes (or, more precisely, two allele pairs) resident on the same chromosome (or, more precisely, on the same homologous chromosome pair). (Note that the method is applicable only to genes for which two or more alleles exist.) Called classical linkage analysis, the method is far from straightforward. The first step, of course, is to establish that two allele pairs are linked (are resident on the same homologous chromosome pair) by observing deviations from Mendelian predictions for the co-inheritance of the traits specified by the allele pairs. The next step is to measure the fraction of meioses in which crossing over leads to new combinations of alleles. The final step (and one not known to be necessary to the earliest linkage analysts) is to convert the measured "recombination fraction" to a "genetic distance" for the two allele pairs, which is defined as the probability of the occurrence of crossing over anywhere in the chromosomal region between the allele pairs. (Although a genetic distance is a dimensionless number, it is expressed in terms of a unit called a morgan or, more usually, in centimorgans.) 
The relationship between recombination fraction and genetic distance is complex (see "Classical Linkage Mapping" in "Mapping the Genome"), but a recombination fraction is approximately equal to its corresponding genetic distance when the recombination fraction is less than about 0.10. The significance of the genetic distance for two allele pairs is that the genetic distance is proportional to the physical distance between the loci of the allele pairs, provided crossing over occurs with equal probability at any point along the chromosome pair. Despite the fact that the stated proviso is not in general satisfied, genetic distance was until recently the only available measure of the physical distance between gene loci.

Crossing Over: A Special Type of Recombination

DNA molecules, and hence chromosomes, are not immutable, even in the absence of external mutagenic agents. One of the natural mechanisms whereby DNA molecules can change is recombination, which rearranges genetic material by breaking and joining portions of the same DNA molecule or portions of different DNA molecules of the same organism. (Recombination can occur also between the DNA of an organism and the DNA of a virus that infects the organism.) Crossing over is the type of recombination undergone by the similar DNA molecules within two homologous chromosomes. It occurs almost exclusively during prophase I of meiosis, when homologous chromosomes are closely apposed. A single crossover between homologous chromosomes effects an exchange of corresponding chromosome regions and results in the formation of recombinant chromosomes, which differ in their content of hereditary information from the chromosomes that participated in the crossover.
Crossing over also occurs between the identical DNA molecules within the chromosomes of a sister-chromatid pair, but because the recombinant chromosomes so formed are usually identical to the participants, such recombination has little genetic significance.

[Figure: Crossing Over during Prophase I of Meiosis. Closely apposed homologous sister-chromatid pairs; crossover in progress; crossover complete; recombinant chromosomes.]

[Figure: Effect of Crossing Over on Allele Combinations in Gametes. Allele combinations on the homologous chromosome pair in the germ-line cell; a crossover between the loci of the two allele pairs during prophase I of meiosis; allele combinations on single chromosomes in gametes.]

The occurrence of a single crossover between the loci of two allele pairs, say A and a and B and b, resident on a homologous chromosome pair results in the formation of some gametes that possess combinations of alleles different from the combinations possessed by the parent germ-line cell. Crossing over is thus a mechanism for increasing genetic diversity. It also is the basis of a standard method for determining a "distance" between the locus of A and a and the locus of B and b. The first step in the method (see "Determining a Genetic Distance") is to carry out a certain breeding experiment and thereby measure, among a group of gametes produced by one parent, the fraction possessing the new allele combinations (the so-called recombination fraction). When the measured recombination fraction is relatively small (less than about 0.10), it is approximately equal to the "genetic distance" between the two loci, that is, to the average number of crossovers between the two loci per meiosis. The genetic distance between the two loci in turn is a rough measure of the physical distance (the distance along the DNA molecule) between the two loci.

As illustrated in "Determining a Genetic Distance," linkage analysis is facilitated by carrying out either one of two particular breedings. (Each breeding is a "test cross" involving one doubly heterozygous parent and one doubly recessive parent.) Morgan happened to carry out both breedings (between fruit flies, of course) in the early 1910s and thereby not only gathered the first clear evidence for the existence of crossing over but also measured the first recombination fractions.

Then in 1913 Sturtevant measured recombination fractions for various pairwise combinations of six allele pairs known to reside on the X chromosomes of Drosophila. By assuming that the loci of the six allele pairs dot the X chromosome as points dot a line and that the measured recombination fraction for, say, the allele pairs A,a and B,b is directly proportional to the length of the X-chromosome segment between the locus of A,a and the locus of B,b, Sturtevant constructed a diagram (the first "genetic-linkage map") showing the relative locations of the six genes and their pairwise separations. Sturtevant then used his diagram to calculate the recombination fractions for those pairwise combinations of allele pairs that he had measured but not needed to construct the diagram.
The approximate agreement between calculated and measured recombination fractions indicated that both of his assumptions were at least approximately valid. We now know that, although the genes of all eukaryotic organisms lie along linear DNA molecules, the genes of prokaryotic organisms lie instead along circular DNA molecules. Furthermore, as indicated above, recombination fractions are not in general proportional to physical distance.

As noted previously, genetic studies of an organism demand the availability of mutants, that is, of individuals possessing alleles different from those possessed by wild-type individuals. For many years, though, geneticists had to survive on the rare mutants provided by nature. (Fewer than ten out of every million members of a natural population of a species are phenotypically obvious mutants.) Then in 1927 Muller (one of Morgan's trio of brilliant students) demonstrated that x rays induce heritable mutations in the fruit fly, and a year later the American geneticist Lewis John Stadler (1896-1954) used x rays to create new alleles in barley. The availability of x-ray-induced mutants accelerated the pace of gene discovery and genetic-linkage mapping.

The demonstrated power of combining cytological data about the chromosomes of an organism with genotypic and phenotypic data led, in the 1930s, to emergence of cytogenetics as a separate field of biology. Crucial to cytogeneticists is the ability to distinguish one pair of homologous metaphase chromosomes from another. For distinguishing features, early cytogeneticists relied on sizes and shapes, which do not always provide unambiguous identification. (The word "shape" generally means centromere location, but it can also mean an unusual structural feature specific to only certain metaphase chromosomes of certain organisms.
Chromosome 9 of a strain of Zea mays, for example, is sometimes blessed with a conspicuous knob at the end of its short arm, a feature that helped elucidate the mechanism of crossing over.) It was soon learned, however, that each homologous chromosome pair within a metaphase

Determining a Genetic Distance

The classical method for determining the genetic distance between the loci of two allele pairs known to reside on the same homologous chromosome pair of an organism involves observing the phenotypes of the offspring of one of two particular breedings. During the course of Thomas Hunt Morgan's work on fruit flies, he happened to carry out both breedings and was rewarded not only with the first clear evidence of crossing over but also with the first unambiguous genetic-distance data. Morgan's experiments and data are used here to illustrate the procedure.

The allele pairs in question reside on one of the homologous autosome pairs of Drosophila melanogaster. One allele pair affects eye color: a dominant allele A that specifies red eye color and a recessive allele a that specifies purple eye color. The other allele pair affects wing length: a dominant allele B that specifies wild-type wings and a recessive allele b that specifies vestigial (very short) wings.

The participants in the first breeding are a female fruit fly that is heterozygous for both traits (and therefore has red eyes and normal wings) and a male fruit fly that is homozygous for both recessive trait variants (and therefore has purple eyes and vestigial wings). Furthermore, the female is known to be a product of the breeding AABB x aabb. Therefore the distribution of the alleles A, a, B, and b on the homologous autosome pair of the female is known: Both dominant alleles (A and B) reside on one member of the homologous autosome pair, and both recessive alleles (a and b) reside on the other member.
Such an allele distribution is denoted by writing the genotype of the female as AB/ab. The distribution of the alleles a, a, b, and b on the homologous autosome pair of the male is also known (because the male is homozygous for both traits) and is denoted in a similar fashion as ab/ab. Thus the first breeding can be symbolized by AB/ab female x ab/ab male.

Meioses in the heterozygous female that involve no crossovers between the two loci yield two types of eggs: those possessing the chromosome with the allele combination AB and those possessing the chromosome with the allele combination ab. In other words, the two dominant alleles and the two recessive alleles remain linked (resident on the same chromosome), just as they are in the female herself. But those meioses in the female that involve a single crossover between the two loci (or any odd number of crossovers) yield in addition two other types of eggs: those possessing a chromosome with the allele combination Ab and those possessing a chromosome with the allele combination aB. In other words, a single crossover between the two loci establishes linkage between one dominant and one recessive allele. On the other hand, meioses in the doubly homozygous male, whether or not they involve crossovers between the two loci, yield sperms possessing only the allele combination ab. Thus the offspring of breeding 1 possess four genotypes, each corresponding to one of the four possible phenotypes:

AB/ab female x ab/ab male -> AB/ab + ab/ab + Ab/ab + aB/ab.

Morgan examined more than 2800 progeny of breeding 1 and found that 47.2 percent had red eyes and normal wings (AB/ab), 42.1 percent had purple eyes and vestigial wings (ab/ab), 5.3 percent had red eyes and vestigial wings (Ab/ab), and 5.4 percent had purple eyes and normal wings (aB/ab). All the offspring exhibiting the last two phenotypes (the combinations of one recessive trait variant and one dominant trait variant) result only from crossovers during meioses in the female parent. Thus the data indicate that the probability of new allele linkages being formed by crossing over is 0.107 = 0.053 + 0.054. That value for the so-called recombination fraction corresponds to a genetic distance of about 12 centimorgans. (The relationship between recombination fraction and genetic distance is presented in "Classical Linkage Mapping" in "Mapping the Genome.")

The participants in the other breeding that provides unambiguous recombination-fraction data are, like the participants in breeding 1, a doubly heterozygous female and a doubly homozygous-recessive male. However, the second female is known to be a product of the breeding Ab/Ab x aB/aB (rather than the breeding AB/AB x ab/ab). Therefore the distribution of alleles on her homologous autosome pair is Ab/aB (rather than AB/ab). (The difference in allele distributions of the two doubly heterozygous females is often referred to as a difference in linkage phase.) The second breeding is thus symbolized by Ab/aB female x ab/ab male.

Breeding 2 yields offspring that exhibit the same genotypes and phenotypes as breeding 1:

Ab/aB female x ab/ab male -> Ab/ab + aB/ab + AB/ab + ab/ab.

Morgan examined more than 2300 progeny of breeding 2 and found that 41.3 percent had red eyes and vestigial wings (Ab/ab), 45.7 percent had purple eyes and normal wings (aB/ab), 6.7 percent had red eyes and normal wings (AB/ab), and 6.3 percent had purple eyes and vestigial wings (ab/ab). Again, all the offspring exhibiting the last two phenotypes result only from crossovers during meioses in the female parent. Thus the data indicate that the recombination fraction for the two allele pairs is 0.130, which corresponds to a genetic distance of about 15 centimorgans. Note that the two data sets yield different values for the same genetic distance.
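The arithmetic of both breedings can be reproduced in a few lines. The Python sketch below sums the two recombinant phenotype classes to obtain each recombination fraction and then converts the fraction to a genetic distance. The conversion used, Haldane's mapping function d = -(1/2) ln(1 - 2r) with d in morgans, is an assumption made for this example: the text does not say which relationship it applies, although this one happens to reproduce the quoted values of about 12 and 15 centimorgans. The function names are hypothetical.

```python
import math


def recombination_fraction(recombinant_percents):
    """Recombination fraction from the percentages of recombinant progeny."""
    return sum(recombinant_percents) / 100.0


def haldane_centimorgans(r):
    """Genetic distance in centimorgans under Haldane's mapping function.

    Assumption: crossovers occur independently along the chromosome,
    which gives d = -0.5 * ln(1 - 2r) morgans for 0 <= r < 0.5.
    """
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


# Breeding 1 (AB/ab female): recombinant classes are Ab/ab and aB/ab.
r1 = recombination_fraction([5.3, 5.4])
# Breeding 2 (Ab/aB female): recombinant classes are AB/ab and ab/ab.
r2 = recombination_fraction([6.7, 6.3])

d1 = haldane_centimorgans(r1)
d2 = haldane_centimorgans(r2)

print(f"breeding 1: r = {r1:.3f}, distance = {d1:.1f} cM")
print(f"breeding 2: r = {r2:.3f}, distance = {d2:.1f} cM")
```

For small fractions the logarithm is nearly linear, which is why a recombination fraction below about 0.10 is approximately equal to the genetic distance.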
The difference between the values, however, is within the statistical uncertainties associated with measurements of probabilistic events. Note also that the same genetic distance could in principle be determined by carrying out the reciprocal of breeding 1 or breeding 2 (that is, a breeding between a doubly heterozygous male and a doubly homozygous-recessive female). Then, the crossovers detected are those that occur during meioses in the male parent rather than in the female parent. However, for some unknown reason crossing over simply does not occur in male fruit flies. But fruit flies are exceptional in that respect, and genetic distances for other species can be determined by carrying out either breeding 1, say, or its reciprocal.
Breedings 1 and 2 are those that provide unambiguous recombination-fraction data. As an example of the ambiguities that can arise, consider the fruit-fly breeding AB/ab female x AB/ab male (breeding 3). Assume first that crossing over between the two loci does not occur during meioses in the female parent. Then the offspring of breeding 3 exhibit two phenotypes: red eyes and normal wings (AB/AB and AB/ab) and purple eyes and vestigial wings (ab/ab). Now assume that crossing over does occur during meioses in the female parent. Then among the offspring of breeding 3 are some that exhibit the two other possible phenotypes: red eyes and vestigial wings (Ab/ab) and purple eyes and normal wings (aB/ab). All offspring that exhibit those two phenotypes result only from crossing over. However, crossing over also leads to offspring that exhibit one of the phenotypes produced in the absence of crossing over, namely, red eyes and normal wings (Ab/AB and aB/AB). In other words, whereas the offspring produced by breeding 1 or 2 can unambiguously be sorted by phenotype into two categories (those that are the result of crossovers and those that are not), the offspring resulting from breeding 3 cannot be so sorted because meioses that do and do not involve crossovers result in the doubly dominant phenotype.
The reader can accept on faith or verify personally that breedings 1 and 2 are the only breedings that provide unambiguous recombination-fraction and hence genetic-distance data. Note, in addition, that obtaining even ambiguous data requires that one parent be doubly heterozygous. Determining a genetic distance is thus relatively easy when the breeding of the organism in question can be manipulated at will. But determining the genetic distance between the loci of two human allele pairs is much more difficult, since the breeding of humans cannot be manipulated, the genotypes and allele distributions of human parents are not always known, and human breedings generally produce so few offspring that the statistical uncertainty in the measured recombination fraction is large.
cell displays a characteristic pattern of dark and light bands when stained with an appropriate dye (see "Chromosomes: The Sites of Hereditary Information"). Because the banding pattern characteristic of a pair of homologous metaphase chromosomes varies along the length of the chromosomes, it can also be used to identify different regions of the chromosomes. The advent of chromosome banding led to recognition of the occasional occurrence of aberrant chromosomes. (The incidence of aberrant chromosomes, like the incidence of gene mutations, can be increased by exposure to x rays or other mutagenic agents.) Several types of chromosome aberrations, or rearrangements, were noted, including translocations (the exchange of chromosome regions between nonhomologous chromosomes) and inversions (the reversal of the orientation of a chromosome region). Obviously a chromosome rearrangement can lead to changes in the complement of genes present on a chromosome or to changes in their relative locations.
The gene (or genes) affected by a chromosome rearrangement (as determined from genetic data) can then be assigned a locus within the rearranged chromosome region. Although the locus so obtained is inexact, it is better than the alternative of knowing nothing at all about the locus. Knowledge of the whereabouts on a chromosome of a gene then serves to "anchor" a genetic-linkage map including that gene to the chromosome. (Recall that a linkage analysis provides only distances between genes on a chromosome; additional information is required to locate the genes relative to the chromosome itself.)
Chromosome rearrangements and gene mutations are but two examples of naturally rare phenomena that, once noted, are exploited to gain basic information about genes. Another example is the exceptional behavior of the cells that compose the salivary glands of Drosophila (and other insects of the order Diptera). In 1933 the American zoologist Theophilus Shickel Painter (1889-1969) and independently two German geneticists discovered that the chromosomes in those cells were microscopically visible during interphase. (Interphase chromosomes are usually not microscopically visible because they have not yet condensed in preparation for mitosis.) For some unknown reason the salivary cells of Drosophila undergo not a single round but many successive rounds of chromosome duplication during the S phase of interphase (see "The Eukaryotic Cell Cycle"). The numerous (on the order of a thousand) copies of each chromosome remain closely associated along their lengths, forming a fiber sufficiently thick to be microscopically visible. Because such "polytene" chromosomes are not condensed, sites of chromosome rearrangements can be pinpointed with much greater resolution.
The Rise of Molecular Genetics. By 1940 many genes were known to exist, and a goodly number of the known genes had been assigned to particular regions of particular chromosomes. But the gene remained an abstract concept.
No one knew what genes do or even of what they are made. A speculation about what genes do had appeared as early as 1903, when the French geneticist Lucien Claude Cuenot (1866-1951) proposed that inherited coat-color differences in mice were due to the action of different genes. And in 1909 the English physician Archibald Edward Garrod (1857-1936) established that the human disease alkaptonuria was inherited as a recessive trait variant and proposed that the unmistakable symptom of the disease (urine that blackens after being excreted) was due to accumulation in the urine of a metabolic product that normally is degraded with the help of a certain enzyme. (An enzyme is a protein that catalyzes a biochemical reaction.) But Cuenot's and Garrod's proposals were regarded as mere speculation for many years. Then, in 1941, the American geneticist George Wells Beadle (1903-1989) and the American biochemist Edward Lawrie Tatum (1909-1975) clearly demonstrated the connection between the genes an organism possesses and the biochemicals it is able to synthesize. Beadle and Tatum's work focused on the bread mold Neurospora crassa. Because wild-type spores of N. crassa can be cultured in the laboratory on a minimal growth medium (one containing only sucrose, inorganic salts, and the vitamin biotin), they reasoned that the mold must possess enzymes that help convert those simple molecules into all the other necessities of life. By exposing N. crassa to x rays, Beadle and Tatum produced a very few mutant spores that could not be cultured on a minimal growth medium but could be cultured on a growth medium containing a single additional nutrient (vitamin B6, for example). They concluded that the x rays had caused a mutation in a gene that somehow directs the synthesis of an enzyme involved in the synthesis of the nutrient.
Evidence in support of such a conclusion accumulated, and in 1948 the American geneticist and biochemist Norman Harold Horowitz (1915-) propounded the famous one gene-one enzyme hypothesis. Molecular genetics was born. Horowitz's hypothesis has since been modified to state that one gene directs the synthesis of one protein, or, more precisely, one polypeptide chain, since some proteins contain more than one polypeptide chain.
Beadle and Tatum's work on N. crassa demonstrated the value of studying such a simple organism. Attention soon turned to even simpler organisms: bacteria. The bacterium Escherichia coli, a tenant of the vertebrate gut, gained particular favor. As a result of studies begun soon after World War II by Francois Jacob (1920-), Joshua Lederberg (1925-), Jacques Lucien Monod (1910-1976), and Elie Leo Wollman (1917-), more is known about the genes of E. coli, including their regulation, than of any other living organism. Attention also focused on viruses, the simplest of all organisms, and in particular on the viruses that infect bacteria, known as bacteriophages or simply phages. (Viruses are composed of a nucleic acid core encased in a protein coat. They are not living organisms in the sense that they lack the machinery for biosynthesis. They can, however, reproduce, by usurping the biosynthetic machinery of the cells they infect, and pass their characteristics from generation to generation through the medium of genes just as cellular organisms do.) In the United States the so-called Phage Group, led by Max Delbruck (1906-1981), Alfred Day Hershey (1908-), and Salvador Edward Luria (1912-1991), aroused interest in the interaction between phages and bacteria as a model system for studying the fundamental mechanisms of heredity.
Work by the Phage Group included developing quantitative methods for studying the life cycles of phages and later the discovery that phages can transfer bacterial genes from one bacterial strain to another, a process called transduction. (Transduction was to become a progenitor of recombinant-DNA technology.) The promiscuous exchange of genetic material between different strains of bacteria and between bacteria and their viruses facilitated the mapping of genes and the identification of their functions.
What genes are made of was the other big question about genes in the 1940s. In 1925 Wilson, reversing his previous stance, espoused protein as the genetic material. The idea of a proteinaceous genetic material was subsequently widely accepted for more than two decades, primarily because the nonproteinaceous component of chromosomes, DNA (deoxyribonucleic acid), was thought by chemists to have a structure that rendered it incapable of carrying any kind of message. However, in 1944 the American bacteriologist Oswald Theodore Avery (1877-1955) and his colleagues presented strong evidence that the genetic material was DNA. Their evidence was the ability of DNA extracted from dead members of a pathogenic strain of Streptococcus pneumoniae to impart the inherited characteristic of pathogenicity to live members of a nonpathogenic strain of the same bacterium. (We now know that the mechanism involved in the transformation from nonpathogenicity to pathogenicity is DNA recombination, of which crossing over is a specific example.) And in 1952 Hershey and another member of the Phage Group, the American geneticist Martha Chase (1927-), showed that DNA is the component of a phage that enters a bacterium and thus presumably directs the synthesis of new phages within the infected bacterium. Nevertheless, despite the accumulating evidence, DNA was not widely accepted as the genetic material.
Then in 1953 James Dewey Watson (1928-) and Francis Harry Compton Crick (1916-) proposed a structure for DNA that accounted for its ability to self-replicate and to direct the synthesis of proteins. The structure they proposed is of course the famous double helix, which, like two-ply embroidery floss, is composed of two strands coiled helically about a common axis. Each strand is a polymer of deoxyribonucleotides, and each deoxyribonucleotide contains a phosphate group, the residue of the sugar deoxyribose, and the residue of one of four nitrogenous organic bases (adenine, cytosine, guanine, and thymine). The deoxyribonucleotides are linked together in a manner such that alternating phosphate groups and sugar residues form a backbone off which the bases project. Hereditary information is encoded in the order, or sequence, of bases along the strands. The two strands are coiled about the helix axis in a manner such that the backbones form the boundaries of a space within which the bases are contained. Each base on one strand is linked by hydrogen bonds to a base on the other strand; the members of each "base pair" lie in a plane that is essentially perpendicular to the axis of the helix. Of the ten theoretically possible base pairs, only two so-called complementary pairs are found in DNA: the pair adenine and thymine and the pair cytosine and guanine. Thus the order of the bases on one strand is precisely related to the order of the bases on the other strand, and the two strands are said to be complementary. Further details are presented in "DNA: Its Structure and Components."
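Because adenine pairs only with thymine and cytosine only with guanine, the sequence of one strand fixes the sequence of the other. A minimal Python sketch of that inference follows. The function names and the example sequence are illustrative, and the second function assumes the standard convention (not stated in the passage above) that the two strands run in opposite directions and that a sequence is written from its 5' end to its 3' end.

```python
# Complementary base pairs: A with T, C with G.
PAIRING = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Base-by-base partner of `strand`, in the same left-to-right order."""
    return "".join(PAIRING[base] for base in strand.upper())

def reverse_complement(strand):
    """Partner strand written in its own 5'-to-3' direction."""
    return complement(strand)[::-1]

def is_complementary(strand_a, strand_b):
    """True if two 5'-to-3' sequences can pair along their whole length."""
    return reverse_complement(strand_a) == strand_b.upper()

example = "ACT"
print(example, "pairs base by base with", complement(example))
print("partner strand written 5' to 3':", reverse_complement(example))
```

Applying `reverse_complement` twice returns the original sequence, which is one way to see that either strand carries the full information of the pair.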
Watson and Crick arrived at their structure for DNA with the help of x-ray diffraction data for DNA fibers obtained by Maurice Hugh Frederick Wilkins (1916-) and Rosalind Franklin (1920-1958) and of the observation in 1950 by Erwin Chargaff (1905-) that the number of molecules of adenine in any of various DNA samples equals the number of molecules of thymine and that the number of molecules of cytosine equals the number of molecules of guanine. In addition, following the example of the American chemist Linus Carl Pauling (1901-), who in 1951 had worked out the details of a helical polypeptide structure (the so-called α helix), they made liberal use of ball-and-stick models.
Molecules of DNA are exceptional among biological macromolecules in two respects. First, they are very long relative to their width. If the diameter of the double helix could be increased to that of a strand of angel-hair pasta, the length of the DNA molecule in a typical human chromosome would be about 12 kilometers. Second, although single-helical configurations are not uncommon in biological macromolecules, the double-helical configuration of DNA is unique. One might wonder why DNA is double-stranded. After all, normally only one of the strands directs protein synthesis, the two strands are replicated separately, and some viruses manage quite nicely with only single-stranded DNA. The evolutionary advantage of double-stranded DNA is thought to lie in the fact that, if one strand is damaged, the other strand can provide the information required to repair the damaged strand.
The base-pairing feature of DNA immediately suggested that each strand of DNA could serve as the template for directing the synthesis of a complementary strand. The result would be two identical double-stranded DNA molecules, each containing one new and one old strand. The suggestion that DNA replication is "semiconservative" was proved correct (for the DNA of E.
coli and a higher plant) several years after the double-helical DNA structure was proposed. The details of DNA replication, however, are very complex, involving a number of enzymes. One enzyme first uncoils a portion of the DNA molecule, and another separates the two strands. Then an enzyme called a DNA polymerase, using one of the separated DNA strands as a template, catalyzes the polymerization of free deoxyribonucleoside triphosphates into a strand that is complementary to the template. Some features of the process are detailed in "DNA Replication."
Now that genes were known to direct the synthesis of proteins and to be made of DNA, the next problem was to determine the relationship between DNA and proteins. The first clue about the relationship came in 1949 when Pauling presented evidence that the hemoglobin present in humans suffering from sickle-cell anemia differed structurally from the hemoglobin in humans not suffering from that inherited disease. (Hemoglobin is composed of two copies each of two polypeptides, the so-called α and β chains. The α chain contains 141 amino acids, and the β chain contains 146 amino acids.) What features of a protein affect its structure? By the 1940s biochemists were beginning to realize that the structure of a protein is determined not so much by which amino acids it contains but more by the sequence of the amino acids along the
[Figure (a): computer-generated image of DNA, by Mel Prueitt. Labels: deoxyribose residue, phosphate group.]
The usual configuration of DNA is shown in (a). Two chains, or strands, of repeated chemical units are coiled together into a double helix. Each strand has a "backbone" of alternating deoxyribose residues (larger spheres) and phosphate groups (smaller spheres). Free deoxyribose, C5H10O4, is one of a class of organic compounds known as sugars; the phosphate group is a component of many other biochemicals. Attached to each sugar residue is one of four essentially planar nitrogenous organic bases: adenine (A), cytosine (C), guanine (G), or thymine (T). The plane of each base is essentially perpendicular to the helix axis. Encoded in the order of the bases along a strand is the hereditary information that distinguishes, say, a robin from a human and one robin from another.
As shown, the two strands coil about each other in a fashion such that all the bases project inward toward the helix axis. The two strands are held together by hydrogen bonds (pink rods) linking each base projecting from one backbone to its so-called complementary base projecting from the other backbone. The base A always bonds to T (A and T are complementary bases), and C is always linked to G (C and G are complementary bases). Thus the order of the bases along one strand is dictated by and can be inferred from the order of the bases along the other strand. (The two strands are said to be complementary.) The pairing of A only with T and of C only with G is the feature of DNA that allows it to serve as a template not only for its own replication but also for the synthesis of proteins (see "DNA Replication" and "Protein Synthesis"). Note that the members of a base pair are essentially coplanar. All available evidence indicates that each eukaryotic chromosome contains a single long molecule of DNA, only a small portion of which is shown here. Furthermore, the ends of each DNA molecule, called telomeres, have a special base sequence and a somewhat different structure.
Shown in (b) is an uncoiled fragment of (a) containing three complementary base pairs. From the chemist's viewpoint, each strand of DNA is a polymer made up of four repeated units called deoxyribonucleotides, or simply nucleotides. The four nucleotides are regarded as the monomers of DNA (rather than the sugar residue, the phosphate group, and the four base residues) because the nucleotides are the units added as a strand of DNA is being synthesized (see "DNA Replication"). A particular nucleotide is commonly designated by the symbol for the base it contains. Thus T is a symbol not only for the base thymine (more precisely, the thymine residue) but also for the indicated nucleotide. Also shown are chemical and structural details of the backbone components. Note that four carbon atoms of the sugar residue and its one oxygen atom form a pentagon in a plane parallel to the helix axis, and that the fifth carbon atom of the sugar residue projects out of that plane.
Shown in (c) are further chemical and structural details of the DNA segment shown in (b). The planes of the three base pairs have been rotated into the plane of the sugar residues. Details of particular note include the following. Linking any two neighboring sugar residues is an -O-P-O- "bridge" between the 3' carbon atom of one of the sugars and the 5' carbon atom of the other sugar. (The designations 3' (three prime) and 5' (five prime) arise from a standard system for numbering atoms in organic molecules.) When a DNA molecule is broken into fragments, as it must be before it can be studied, the breaks usually occur at one of the four covalent bonds in each bridge. Because deoxyribose has an asymmetric structure, the ends of each strand of a DNA fragment are different. At one end the terminal carbon atom in the backbone is the 5' carbon atom of the terminal sugar (the carbon atom that lies outside the planar portion of the sugar), whereas at the other end the terminal carbon atom is the 3' carbon atom of the terminal sugar (a carbon atom that lies within the planar portion of the sugar). The two complementary strands of DNA are antiparallel.
In other words, arrows drawn from, say, the 5' end to the 3' end of each strand have opposite directions. Most of the enzymes that move along a backbone in the course of catalyzing chemical reactions move in the 5'-to-3' direction.
[Figure (c) legend: carbon atom; covalent bond; hydrogen bond; DNA backbone; 5'-to-3' direction. Hydrogen atoms not involved in hydrogen bonding have been omitted in this drawing. As a result some carbon atoms and some nitrogen atoms appear to be underbonded.]
The composition of a DNA fragment is represented symbolically in a variety of ways. However, all of the representations focus on the order, or sequence, of the nucleotides (and hence the bases) along the strands of the fragment. For example, the most complete representation for the fragment shown above gives the sequences of both strands (the diagram is not reproduced here). The most abbreviated representation, ACT (or, equivalently, AGT), gives the sequence of only one strand (since the sequence of the complementary strand can be inferred from the given sequence) and follows the convention that the left-to-right direction corresponds to the 5'-to-3' direction.
[Figure: a parental DNA molecule undergoes replication to give two identical daughter DNA molecules.]
An overall description of DNA replication is quite simple. Each strand of a parent DNA molecule serves as the template for synthesis of a complementary strand. The result is two daughter DNA molecules, each composed of one parental strand and one newly synthesized strand and each a duplicate of the parent molecule. But this overall simplicity, illustrated above, is misleading, since DNA replication involves the intricate and coordinated interplay of more than twenty enzymes. The most important general feature of DNA replication is its extremely high accuracy. A "proofreading" capability of DNA polymerase, the enzyme that catalyzes the basic chemical reaction involved in replication, guarantees that only about one per billion of the bases in a newly synthesized strand differs from the complement of the corresponding base in the template strand.
A more detailed description of DNA replication should note first that replication of a chromosomal DNA molecule does not begin at one end of the molecule and proceed uninterruptedly to the other end. Instead, scattered along the molecule are numerous occurrences of a particular base sequence, and each occurrence of that sequence serves as an "origin of replication" for a portion of the molecule. Thus different portions of a DNA molecule are replicated separately. Baker's yeast, Saccharomyces cerevisiae, is one of the few eukaryotes for which the base sequence of its origins of replication is now known. Knowledge of the base sequence of an organism's origins of replication is necessary in the creation of artificial chromosomes of the organism, synthetic entities that are treated by the organism's cellular machinery just as its own chromosomes are treated. The cloning vectors known as YACs are an example of artificial chromosomes.
Replication of the portion of a DNA molecule flanked by two origins of replication begins with the action of enzymes that move along the parental DNA, progressively uncoiling and denaturing (separating into single strands) the double helix. Uncoiling and denaturation expose the bases in each parental strand and thereby enable the bases to direct the order in which deoxyribonucleotides are added by DNA polymerase to the strand being synthesized. Because, as shown in the accompanying figure, DNA polymerase elongates a growing chain of deoxyribonucleotides only in the 5'-to-3' direction (arrows), one of the new DNA strands can be synthesized continuously but the other strand must be synthesized in short pieces called Okazaki fragments.
(The Okazaki fragments shown here are much shorter than they are in reality.) The discontinuous synthesis of one of the new strands is the source of additional complexities in replicating the very ends, the telomeres, of a DNA molecule.
[Figure: replication over time, showing the 5' ends of the new strands and the Okazaki fragments.]
[Figure: DNA template strand, RNA primer, DNA polymerase, and growing strand. The deoxyribonucleotide dTTP binds to the first free base, A, in the template strand. DNA polymerase catalyzes the creation of an -O-P-O- bridge, thus extending the backbone and incorporating the new base, T, into the growing strand. The atoms of the newly formed -O-P-O- bridge are shown explicitly and highlighted in red. dCTP binds to the next free base, G, on the DNA template strand. The polymerase continues to extend the growing strand in the 5'-to-3' direction.]
As shown in the accompanying figure, the participants in the chemical reaction by which each portion of a DNA strand is synthesized include a "primer," the enzyme DNA polymerase, a DNA template (a parental strand), and a supply of free deoxyribonucleoside triphosphates (dNTPs). The usual primer is a very short strand of RNA, generally containing between four and twelve ribonucleotides. (RNA is a single-stranded nucleic acid; its structure is very similar to that of a strand of DNA. Because the sugar residue in RNA is derived from ribose rather than deoxyribose, the repeated units in RNA are called ribonucleotides rather than deoxyribonucleotides.) A primer is required because DNA polymerase catalyzes the addition of a deoxyribonucleotide to an existing chain of nucleotides (either ribonucleotides or deoxyribonucleotides) but not the de novo synthesis of a chain of deoxyribonucleotides. The action of each parental strand as a template is based on hydrogen bonding between complementary bases.
In particular, a base in a parental strand hydrogen bonds to the dNTP containing the complementary base. As a result, the dNTP is fixed in a position such that the DNA polymerase can exert its catalytic action on the triphosphate group of the dNTP and the 3' hydroxyl group of the 3'-terminal sugar of the primer. The result is the addition of a deoxyribonucleotide to the primer and the release of a pyrophosphate group, (P2O7)4-. The next deoxyribonucleotide in the template strand fixes its complementary dNTP into position, the DNA polymerase moves further along the chain being elongated, and addition of another deoxyribonucleotide is effected by action of the polymerase on the triphosphate group of the dNTP and the hydroxyl group of the sugar of the deoxyribonucleotide just previously added. Successive repetitions of the process and eventual replacement of the RNA primer with DNA lead to formation of double-stranded DNA identical to the parental DNA.
polypeptide chain. Then in 1957 Vernon Martin Ingram (1924-) demonstrated that the sixth amino acid in the β chain of normal hemoglobin is glutamic acid, whereas the sixth amino acid in the β chain of sickle hemoglobin is valine. Otherwise, the amino-acid sequences of both β chains are identical. Ingram's work suggested that the function of DNA was to determine the order in which amino acids are assembled into proteins. DNA itself could not, however, be the template for the synthesis of proteins, since DNA is sequestered in the nucleus of a eukaryotic cell, whereas proteins were known to be synthesized in the cytoplasm outside the nucleus. Perhaps an intermediary substance was involved, one that receives hereditary information from DNA in the nucleus and then moves to the cytoplasm, where it serves as the template for protein synthesis.
A likely candidate for such an intermediary was the other known nucleic acid, namely ribonucleic acid, or RNA, which is found primarily in the cytoplasm. Like DNA, RNA is a polymer of four different nucleotides, but the nucleotides are ribonucleotides containing the sugar ribose, which differs from deoxyribose in possessing a hydroxyl group on its 2' carbon atom. Another difference is that the base thymine is absent from RNA, being replaced by the base uracil (U), which lacks the extra-ring methyl group of thymine but, like thymine, hydrogen bonds with adenine. The final difference between DNA and RNA is that RNA is usually single-stranded. That RNA is the intermediary between DNA and proteins soon became the working hypothesis of biochemists, and the details of protein synthesis were worked out in the fifties and sixties. Briefly, a segment of DNA (a gene) serves as the template for the synthesis, in the nucleus, of so-called messenger RNA (mRNA), a process called transcription and similar to DNA replication. The mRNA then enters the cytoplasm, where it serves as the template for the ordered assembly of amino acids into a protein, a process called translation. Details of transcription and translation are illustrated in "Protein Synthesis."
The last general problem about the relation between DNA and proteins was to crack the code relating the sequence of deoxyribonucleotides that constitutes a gene to the sequence of amino acids that constitutes a protein. Experiments performed in 1961 by Crick and the British molecular biologist Sydney Brenner (1927-) suggested that the code was a triplet code, or, in other words, that a sequence of three adjacent deoxyribonucleotides (a codon) specifies each amino acid. The genetic code was completely cracked by 1966, thanks primarily to the independent efforts of two groups, one led by Marshall Warren Nirenberg (1929-) and the other by Har Gobind Khorana (1922-).
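A triplet code over four bases provides 4 x 4 x 4 = 64 codons for only twenty amino acids, so most amino acids must be specified by more than one codon. The Python sketch below builds the standard genetic code from a compact string and counts the codons per amino acid. The table is written with DNA letters (T rather than U) and one-letter amino-acid symbols, with "*" marking stop codons; the variable and function names are illustrative and are not taken from the article.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna):
    """Translate a sense-strand DNA sequence codon by codon.
    Trailing bases that do not fill a codon are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

# Count codons per amino acid, leaving out the stop codons.
codons_per_amino_acid = Counter(a for a in CODON_TABLE.values() if a != "*")
redundant = [a for a, n in codons_per_amino_acid.items() if n >= 2]

print(len(CODON_TABLE), "codons,", len(codons_per_amino_acid), "amino acids")
print(len(redundant), "amino acids have two or more codons")
print("GAA ->", translate("GAA"), "| GAG ->", translate("GAG"),
      "| GTG ->", translate("GTG"))
```

In this table GAA and GAG both specify glutamic acid (E), so a substitution between them leaves the protein unchanged, whereas GAG to GTG replaces glutamic acid with valine (V), the same amino-acid replacement Ingram found in sickle hemoglobin.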
As shown in "The Genetic Code," eighteen of the twenty amino acids are specified by two or more codons. The redundancy of the code implies that gene mutations involving single-base substitutions do not necessarily result in a change in an amino acid. Now that what seemed the major questions about the material and mechanisms of heredity had been answered, was anything fascinating left to learn? Or would Los Alamos Science Number 20 1992 (a) Protein Synthesis in Prokaryotic and Eukaryotic Cells Prokaryotic Cell ^ Sense strand Template (non-sense strand) 1 \ 1 Translation A Cell wall A Protein synthesis is the process by which information encoded in a gene is converted into a specific protein. In 1957 Francis Crick proposed two hypotheses about protein synthesis, which later became known as the central dogma of molecular biology. He proposed first that gene sequences are 'collinear" with protein sequences. In other words, the linear arrangement of subunits (deoxyribonucleotides) composing a gene corresponds to the linear arrangement of subunits (amino acids) composing a pro- tein. Second, Crick proposed that a seg- ment of RNA (a ribonucleotide sequence) acts as an intermediate translator between the deoxyribonucleotide sequence and the amino-acid sequence, or, in other words, that genetic information flows from DNA to RNA to protein. Crick had no experimental evidence to support his hypotheses. But very shortly Charles Yanofsky and Seymour Benzer, working independently, provided the first evidence in support of the collinear- ity hypothesis. Their experiments showed that mutations in the genes of E. coiiand of the T4 bacteriophage produced parallel changes in amino-acid sequences. And as details of protein synthesis were worked out, the role of RNA as an intermediary was also established. Eukaryotic A Cell A PROTEIN SYNTHESIS 1 Transcription , 3 Primz , *. trans 7 7 Cytoplasm v Shown in (a) is an overview of protein syn- thesis in a prokaryotic cell. 
In the first stage, called transcription, a DNA segment, a gene, serves as a template for the synthesis of a single-stranded RNA segment called a messenger RNA (mRNA). The base sequence of the mRNA is complementary to the base sequence of one strand of the gene (the template, or "non-sense," strand) and is therefore identical to the base sequence of the other strand of the gene (the "sense" strand). The one exception to the identity is that the base U (uracil) replaces the base T. (Recall that in RNA uracil, rather than thymine, is the base complementary to adenine.) In the second stage of protein synthesis, called translation, the mRNA serves as the template for the stringing together of amino acids into a protein. The protein is assembled according to the genetic code. That is, the succession of codons (triplets of adjacent ribonucleotides) that compose the mRNA dictates the succession of amino acids that compose the protein. (A listing of codons and corresponding amino acids is presented in "The Genetic Code.") Although transcription and translation are depicted here as if they occurred at different times, translation of a prokaryotic mRNA often begins before its synthesis by transcription is complete.

Also shown in (a) is an overview of protein synthesis in a eukaryotic cell. Unlike prokaryotic genes, most eukaryotic genes are composed of stretches of protein-coding sequences (exons) interrupted by longer stretches of noncoding sequences (introns). Both the exons and introns within a eukaryotic gene are transcribed. The resulting primary transcript is then spliced; that is, each intron is removed and the adjacent exons are linked together. The shortened RNA is now an mRNA, an RNA that contains only protein-coding sequences. The mRNA leaves the nucleus and in the cytoplasm is translated into a protein according to the genetic code.
Thus transcription and translation are of necessity temporally separated in eukaryotic cells.

The overviews in (a) illustrate that, as Crick had postulated, genetic information flows from DNA to RNA to protein within both prokaryotic and eukaryotic cells. One important exception to the central dogma is the class of viruses known as retroviruses, of which the AIDS virus is an example. Retroviruses store genetic information in RNA and then convert the information to DNA, a reversal of the usual information flow that is known as reverse transcription.

Details of transcription and translation are shown in (b) and (c) respectively. Transcription begins when an enzyme, an RNA polymerase, binds to a particular segment of a gene called the promoter. The double helix then uncoils and separates into two strands, exposing a small number of bases. The RNA polymerase facilitates hydrogen bonding between an exposed base in the template strand and its complementary base in a free ribonucleoside triphosphate (NTP) and then between the next exposed base in the template strand and its complementary base in another free NTP. While the two NTPs are held in proximity by the hydrogen bonds, the RNA polymerase catalyzes the formation of an -O-P-O- bridge between them, thus forming a chain of two covalently linked ribonucleotides. (See "DNA Replication" for details about formation of -O-P-O- bridges.) A third NTP is hydrogen-bonded to the third exposed base in the template strand and is covalently linked to the second ribonucleotide in the chain. The RNA polymerase moves along the template in the 3'-to-5' direction, continuing to unwind and separate the double helix and to elongate the RNA chain in the 5'-to-3' direction by catalyzing the addition of successive ribonucleotides. At the same time, the distorted DNA in the wake of the polymerase rewinds. After the gene is fully transcribed, the polymerase separates from the double helix.
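The base-pairing rule that underlies transcription can be sketched in a few lines of Python. This is only an illustration of the complementarity described above, not a model of the polymerase; the function names and the twelve-base template are hypothetical.

```python
# Minimal sketch of transcription as base pairing against the template strand.
# All names and the example sequence are illustrative assumptions.

# RNA base that pairs with each DNA template base (U, not T, pairs with A).
RNA_PARTNER = {"A": "U", "T": "A", "G": "C", "C": "G"}
# DNA base that pairs with each DNA base (used to build the sense strand).
DNA_PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA, written 5'-to-3', for a template written 3'-to-5'.

    The polymerase reads the template 3'-to-5' and extends the RNA 5'-to-3',
    so pairing base by base in reading order yields the RNA in 5'-to-3' order.
    """
    return "".join(RNA_PARTNER[base] for base in template_3_to_5.upper())


def sense_strand(template_3_to_5: str) -> str:
    """Return the sense strand, written 5'-to-3', that pairs with the template."""
    return "".join(DNA_PARTNER[base] for base in template_3_to_5.upper())


if __name__ == "__main__":
    template = "TACGTCTTTATT"  # hypothetical template strand, written 3'-to-5'
    print(transcribe(template))    # mRNA, 5'-to-3'
    print(sense_strand(template))  # sense strand, 5'-to-3'
```

Apart from U standing in for T, the mRNA returned by `transcribe` matches the sense strand, which is the identity noted in the overview of panel (a).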
If the gene transcribed is a eukaryotic gene, the newly minted RNA is spliced and the resulting mRNA enters the cytoplasm through pores in the nuclear membrane.

As shown in (c), translation occurs with the help of transfer RNA molecules (tRNAs) and ribosomes. Each tRNA is a tiny, cloverleaf-shaped molecule that serves as an adapter: At one end it contains a triplet of ribonucleotides (an anticodon) that binds with a complementary codon on the mRNA strand, and at the other end it has an attachment site for a single amino acid. Many varieties of tRNAs exist. An important difference between one tRNA and another is the presence of a different anticodon on the central cloverleaf stem. The number of different anticodons found in the various tRNAs is less than the number of codons in the genetic code. That is so because the base pairing between the third base of the mRNA codon and the first base of the tRNA anticodon can depart from the usual Watson-Crick rules. For example, G can pair with U in addition to C.

Ribosomes are very large molecules composed of ribosomal RNA (rRNA) and approximately fifty different proteins. As a ribosome travels along an mRNA it catalyzes the reactions that lead to synthesis of the protein encoded in the mRNA. Thousands of ribosomes exist within each cell.

[Figure (b): Transcription. Labels: sense strand, messenger RNA, template (non-sense) strand.]

[Figure (c): Translation. Labels: amino-acid sequence (protein), anticodon, amino acid, messenger RNA, ribosome, aminoacyl synthetase, attachment site.]

Before a tRNA molecule participates in translation, it must be converted to an aminoacyl-tRNA (become attached to the amino acid corresponding to its anticodon). Each of the twenty amino acids found in proteins can be attached to at least one type of tRNA, and most can be attached to several. The binding between tRNA and amino acid is catalyzed by one of a group of enzymes.
Those exquisitely specific enzymes, called aminoacyl synthetases (more fully, aminoacyl-tRNA synthetases), are in fact the agents by which the genetic information in mRNA is decoded.

Translation begins when an aminoacyl-tRNA containing the amino acid methionine and a ribosome bind to an initiation sequence near the 5' end of the mRNA. The initiation sequence consists of the START codon AUG, to which the aminoacyl-tRNA binds through complementary base pairing. A second aminoacyl-tRNA, which contains an anticodon complementary to the second mRNA codon, binds to the mRNA. Then the amino acid on the first aminoacyl-tRNA is joined by a peptide bond to the amino acid on the second aminoacyl-tRNA, thus creating a chain of two amino acids dangling off the end of the second aminoacyl-tRNA. The process continues as the ribosome moves along the mRNA (in the 5'-to-3' direction) and as peptide bonds are formed between successive amino acids. When the ribosome reaches a STOP codon within the mRNA, the ribosome detaches from the mRNA, and the completed protein is released into the cytoplasm.

The process of translation is fast: A single ribosome can translate up to fifty ribonucleotides per second. Furthermore, at any one time numerous ribosomes may be traveling along a single mRNA, each producing a molecule of the same protein. Thus a protein needed for diverse tasks within the cell can be quickly and efficiently produced.

Note: Published only recently (in June 1992) was strong evidence that the formation of peptide bonds between amino acids during translation is catalyzed not by some protein enzyme within a ribosome but instead by an RNA component of the ribosome. That news is exciting but not completely unexpected, since the ability of RNA to function as a catalyst in other situations had been demonstrated in the early 1980s.
In particular, the primary transcript of a ribosomal-RNA gene of the protozoan Tetrahymena thermophila had been shown to effect its own splicing, and the catalytic action of an RNA-protein complex that processes the primary transcripts of certain transfer-RNA genes had been ascribed to the RNA component of the complex rather than the protein component.

THE GENETIC CODE

(a) RNA Codons for the Twenty Amino Acids

What triplet of ribonucleotides directs the addition of, say, the amino acid alanine to a protein that is being synthesized? Of lysine? Of any one of the twenty amino acids found in proteins? That was the problem to be faced after advancement of the ideas that a gene is a string of deoxyribonucleotide triplets, that the string of deoxyribonucleotide triplets is transcribed into a string of ribonucleotide triplets, and that the string of ribonucleotide triplets is translated into a string of amino acids, that is, a protein. The results of research on the problem are condensed in the genetic code, a listing of the sixty-four possible ribonucleotide triplets and the amino acid (or translation command) corresponding to each. Fortunately for those who worked on the problem, the genetic code is organism-independent. That is, the same genetic code is used by virtually all organisms.

Researchers began to crack the genetic code in the early 1960s. Marshall Nirenberg and his collaborators added a synthetic RNA, consisting entirely of repetitions of a single ribonucleotide, say U, to a bacterial extract that contained everything necessary for protein synthesis except RNA. The result was a string of the amino acid phenylalanine. They concluded that the ribonucleotide triplet UUU codes for phenylalanine. Other ribonucleotide triplets were decoded by performing similar experiments with synthetic RNAs containing only A's, C's, or G's or various combinations of ribonucleotides.
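As a rough illustration of how a codon listing turns an mRNA into a protein, the following Python sketch builds the standard sixty-four-entry table and reads a message three ribonucleotides at a time. The function name, the one-letter amino-acid symbols, and the `require_start` flag are assumptions made for the example; the flag stands in for the fact that the cell-free poly-U experiment described above produced polyphenylalanine from a message with no AUG.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# One-letter amino-acid symbols for the 64 codons in the order UUU, UUC, UUA,
# UUG, UCU, ... (first base varies slowest, third fastest); "*" marks STOP.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str, require_start: bool = True) -> str:
    """Translate an mRNA written 5'-to-3' into one-letter amino-acid symbols.

    With require_start=True, reading begins at the first AUG; otherwise it
    begins at the first base. Reading stops at the first STOP codon.
    """
    mrna = mrna.upper()
    start = mrna.find("AUG") if require_start else 0
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("U" * 12, require_start=False))  # poly-U message
    print(translate("AUGCAGAAAUAA"))                 # hypothetical short mRNA
    # Count the amino acids that are specified by two or more codons.
    counts = Counter(aa for aa in AMINO_ACIDS if aa != "*")
    print(sum(1 for n in counts.values() if n >= 2))
```

In the one-letter symbols used here, F is phenylalanine, M methionine, Q glutamine, and K lysine.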
By 1966 research teams led by Har Gobind Khorana and Marshall Nirenberg had cracked the entire genetic code.

                      Second base
First base    U             C      A      G      Third base
U             Phe           Ser    Tyr    Cys    U
U             Phe           Ser    Tyr    Cys    C
U             Leu           Ser    STOP   STOP   A
U             Leu           Ser    STOP   Trp    G
C             Leu           Pro    His    Arg    U
C             Leu           Pro    His    Arg    C
C             Leu           Pro    Gln    Arg    A
C             Leu           Pro    Gln    Arg    G
A             Ile           Thr    Asn    Ser    U
A             Ile           Thr    Asn    Ser    C
A             Ile           Thr    Lys    Arg    A
A             Met (start)   Thr    Lys    Arg    G
G             Val           Ala    Asp    Gly    U
G             Val           Ala    Asp    Gly    C
G             Val           Ala    Glu    Gly    A
G             Val           Ala    Glu    Gly    G

Amino-acid abbreviations: Ala = Alanine, Arg = Arginine, Asp = Aspartic acid, Asn = Asparagine, Cys = Cysteine, Glu = Glutamic acid, Gln = Glutamine, Gly = Glycine, His = Histidine, Ile = Isoleucine, Leu = Leucine, Lys = Lysine, Met = Methionine, Phe = Phenylalanine, Pro = Proline, Ser = Serine, Thr = Threonine, Trp = Tryptophan, Tyr = Tyrosine, Val = Valine.

Shown in (a) is the usual representation of the genetic code. The letters U, C, A, and G are symbols for the ribonucleotides containing the bases uracil, cytosine, adenine, and guanine, respectively. The symbols in the body of the table are three-letter abbreviations for the amino acids. To find the amino acid specified by a particular codon (say the codon CAG), locate the first nucleotide (C) along the left side of the table and the second nucleotide (A) along the top of the table. Their intersection pinpoints one of four amino acids. Of those four the one aligned with the third nucleotide (G) is the amino acid in question. Thus the amino acid glutamine (Gln) is specified by the three-nucleotide sequence CAG.

Shown in (b) is another version of the genetic code, one expressed in terms of DNA codons instead of RNA codons. Each single-stranded deoxyribonucleotide triplet listed in (b) is the sequence of the so-called sense strand of a DNA codon, the strand that does not serve as a template for synthesis of RNA. Note that most of the amino acids are specified by at least two codons. For example, phenylalanine is specified by two codons: TTT and TTC.
Arginine is specified by a total of six codons: CGT, CGC, CGA, CGG, AGA, and AGG. In general, the more an amino acid is used in protein synthesis the likelier it is to be specified by more than one codon. Note also the start codon (ATG) and the three stop codons (TAA, TGA, and TAG) that are used to signal the beginning and end of protein synthesis. The substantive difference between the two versions of the genetic code is that in (b) the deoxyribonucleotide T replaces the ribonucleotide U.

(b) DNA Codons for the Twenty Amino Acids

Ala: GCA, GCG, GCT, GCC
Arg: AGA, AGG, CGA, CGG, CGT, CGC
Asp: GAT, GAC
Asn: AAT, AAC
Cys: TGT, TGC
Glu: GAA, GAG
Gln: CAA, CAG
Gly: GGA, GGG, GGT, GGC
His: CAT, CAC
Ile: ATA, ATT, ATC
Leu: TTA, TTG, CTA, CTG, CTT, CTC
Lys: AAA, AAG
Met (START): ATG
Phe: TTT, TTC
Pro: CCA, CCG, CCT, CCC
Ser: AGT, AGC, TCA, TCG, TCT, TCC
Thr: ACA, ACG, ACT, ACC
Trp: TGG
Tyr: TAT, TAC
Val: GTA, GTG, GTT, GTC
STOP: TAA, TAG, TGA

molecular genetics degenerate into clearing up details here and details there? Some thought so, and bemoaned the passing of a golden age. But in reality another era, and one just as golden, was opening, thanks to development of techniques for manipulating and analyzing DNA.

The Techniques of Molecular Genetics

The late 1960s mark the beginning of the recombinant-DNA revolution. During the ensuing years it became possible to make billions of identical copies of segments of DNA by cloning (duplicating) each segment individually as a recombinant DNA molecule in the bacterium Escherichia coli. The significance of that breakthrough was enhanced by other new developments, including the ability to separate fragments of DNA that differ in length by only a few nucleotide pairs, to determine the nucleotide sequences of cloned segments of DNA, to create specific mutations in cloned genes, and to introduce cloned eukaryotic genes into experimental organisms.
Those startling developments arose from advances during the previous decade in nucleic-acid biochemistry and in bacterial and phage genetics. Basic features of the replication, repair, and recombination of DNA and of the synthesis of proteins had been elucidated, and identification and isolation of the enzymes that catalyze the chemical reactions involved had allowed those processes to be reproduced in vitro. The action of phages as carriers of genetic material between different strains of E. coli had been utilized to isolate individual E. coli genes. The rates of transcription of E. coli genes had been determined (by measuring the amounts of RNA transcribed from the different genes) and had been found to be regulated, that is, to vary from gene to gene and in response to external stimuli. The observed regulation of gene expression in E. coli had been traced to the interaction of certain proteins with regulatory sequences in its genome. By 1968 about a hundred genes had been ordered on the genetic maps of phages, and about fifteen hundred genes had been ordered on the genetic map of E. coli. On the other hand, essentially nothing was known about the structure of eukaryotic genes, their regulation, or their organization in chromosomal DNA molecules. Even the major difference between prokaryotic and eukaryotic genes-the presence of introns in the latter-had not yet been discovered. Most frustrating was the lack of a methodology for studying eukaryotic genomes analogous to the phage-bacteria system for studying the organization, rearrangement, and functions of phage and bacterial genomes. But in 1968 techniques began to be developed that exploit the cellular machinery and the biosynthetic products of bacteria to replicate, manipulate, and analyze eukaryotic genes and to manufacture eukaryotic proteins. 
Improvements during the past twenty years in recombinant-DNA techniques have produced an explosion of knowledge about eukaryotic genes and about the organization and rearrangements of DNA in eukaryotic genomes, including the human genome.

This section briefly describes some of the techniques that are employed in the study of DNA and points out some of the facts about DNA the techniques have helped to reveal. The chronological approach will be more or less abandoned, and none of the contributions will be attributed to their originators.

A description of the preparation of a sample of DNA is appropriate as a preliminary to this section. The usual preparation procedure involves treating a large number of cells (typically about 5 million) of the organism in question with a detergent, which dissolves cellular membranes and dissociates the proteinaceous component of the chromosomes from the DNA. Then the membrane components and the proteins are removed with an organic solvent such as a chloroform-phenol mixture, and the DNA is precipitated with ethanol as a highly viscous liquid. The mass of the DNA in such a sample is small, about 30 micrograms in the case of human DNA and correspondingly smaller in the case of DNA extracted from organisms with smaller genomes.

It is worth noting that no DNA sample prepared in the above manner contains intact DNA molecules. The mechanical aspects of sample preparation (such as stirring and pipetting) invariably break some of the covalent bonds of the DNA backbones. That accidental fragmentation is usually of little consequence, however, because most of the techniques employed to study DNA at the molecular level are applicable only to stretches of DNA shorter than the intact molecules found in chromosomes. In fact, deliberate fragmentation, by either mechanical or biochemical means, is the first step in many of the techniques to be described below.
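The figure of about 30 micrograms quoted above for a sample of human DNA can be checked with a short back-of-the-envelope calculation. The sketch below is only an estimate: the haploid genome size of about 3 billion base pairs and the average molar mass of about 650 grams per mole of base pairs are assumed round values that are not stated in this section, and the function name is hypothetical.

```python
# Rough estimate of the DNA mass recovered from a sample of diploid cells.
# The constants marked "assumed" are round textbook values, not article data.

AVOGADRO = 6.022e23        # molecules per mole
BP_MOLAR_MASS = 650.0      # assumed average molar mass of one base pair, g/mol
HAPLOID_GENOME_BP = 3.0e9  # assumed size of the haploid human genome, base pairs


def dna_mass_micrograms(cells: float, base_pairs_per_cell: float) -> float:
    """Return the total DNA mass, in micrograms, for the given number of cells."""
    grams = cells * base_pairs_per_cell * BP_MOLAR_MASS / AVOGADRO
    return grams * 1.0e6


if __name__ == "__main__":
    # About 5 million diploid cells, each carrying two copies of the genome.
    mass = dna_mass_micrograms(5.0e6, 2 * HAPLOID_GENOME_BP)
    print(f"{mass:.1f} micrograms")
```

Under these assumptions the estimate comes out near 32 micrograms, in line with the quoted value. Because the mass scales linearly with `base_pairs_per_cell`, an organism with a smaller genome yields correspondingly less DNA.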
The length of a DNA molecule or fragment is expressed in terms of the number of base pairs it contains. (Because the structure of DNA is regular, number of base pairs is directly proportional to physical length.) The average length of the intact DNA molecules within human chromosomes, for example, is about 130 million base pairs, which corresponds to a physical length of about 4.5 centimeters. The lengths of the known human genes are much shorter, ranging from less than a hundred base pairs for the transfer-RNA genes to over a million base pairs for the Duchenne muscular-dystrophy gene and the cystic-fibrosis gene.

We turn now to the means for manipulating and analyzing DNA.

Fractionation by Copy Number and Repetitive DNA. The mid 1960s brought to light a surprising feature of eukaryotic DNAs: their content of multiple identical or nearly identical copies of various sequences. The various repeated sequences are collectively called repetitive DNA, and, depending on the species, repetitive DNA is estimated to constitute between 3 and 80 percent of the total. (Between 25 and 35 percent of the human genome, and of other mammalian genomes, is repetitive DNA.) In contrast, the DNAs of viruses and prokaryotes contain no or very little repetitive DNA.

The phenomenology of repetitive DNA is complex and not yet fully explored. A few of the repeated sequences are genes, but most have no known function. The multiple copies of some repeated sequences are situated one after the other; the known lengths of the repeated units in such tandem repeats range from two base pairs to several thousand base pairs. Some tandem repeats occur at only one location within a genome; others, called interspersed tandem repeats, occur at many locations.
Like the multiple copies of an interspersed tandem repeat, the multiple copies of other repeated sequences are scattered here and there within a genome; the known lengths of such interspersed repeats range from about a hundred base pairs to seven thousand base pairs. And finally the copy numbers of the various repeated sequences range from less than ten to over a million. Two of the many repeated sequences found in the human genome are the GT sequence, an interspersed tandem repeat that consists of between fifteen and thirty tandem repetitions of the sequence 5'-GT and has a copy number on the order of a hundred thousand, and the Alu sequence, an interspersed repeat that is about three hundred base pairs in length and has a copy number close to 2 million.

The existence of repetitive DNA became known from comparison of the renaturation kinetics of prokaryotic and eukaryotic DNAs. Recall that the natural configuration of DNA is double-stranded. However, DNA can be separated into single strands (denatured) by, say, heating an aqueous solution of the DNA to about 100°C. When the temperature of a thermally denatured sample of DNA is lowered, random encounters among the single-stranded fragments lead to renaturation, or the re-establishment of hydrogen bonds between complementary fragments. The kinetics of the renaturation can be monitored by, for example, measuring the time dependence of the absorption of ultraviolet light by the sample, since single- and double-stranded DNA have different capacities to absorb ultraviolet light.

Consider the renaturation of two samples of denatured DNA, one prepared by breaking the genome of E. coli into equal-length fragments and the other prepared by breaking, into fragments of the same length as the E. coli fragments, a hypothetical DNA molecule of the same total length as the E. coli genome but composed of multiple repetitions of a single sequence. Each single-stranded E.
coli fragment is complementary to only one of the many single-stranded fragments in the first sample, whereas each single-stranded hypothetical fragment is complementary to one-half of the equally numerous single-stranded fragments in the second sample. Obviously, then, the hypothetical sample renatures more rapidly, at least initially, than the E. coli sample, and therefore the graphs of fraction renatured versus time for the two samples are different. This example illustrates why renaturation-kinetics data are the source of information about the presence of repetitive DNA.

Other types of information can be extracted from renaturation-kinetics data. Consider the renaturation of the E. coli genome and the genome of the virus known as T4, each broken into fragments of the same length. Both genomes contain essentially no repetitive DNA, but the sample of E. coli DNA contains a greater number of fragments because the E. coli genome (which contains about 5,000,000 base pairs of DNA) is larger than the T4 genome (which contains about 170,000 base pairs of DNA). Therefore the E. coli genome renatures less rapidly than the T4 genome. In other words, renaturation kinetics provides information about the relative sizes of genomes. Furthermore, because the rate at which hydrogen bonds are established between fragments of single-stranded DNA that have similar but not identical base sequences depends on the degree of similarity of the base sequences of the fragments, the kinetics of the joint renaturation of samples of DNA from different species provides an estimate of the overall similarity of the base sequences of the DNAs.

Today renaturation is most often used to fractionate fragments of DNA by copy number, that is, to separate a DNA sample into components containing highly repetitive DNA, less highly repetitive DNA, and single-copy DNA.
Such a separation narrows the search for genes, most of which occur only once within a genome and hence are contained in the single-copy fraction.

Fragmenting DNA with Restriction Enzymes. Until 1970 DNA molecules were of necessity fragmented by mechanical means, such as forcing a sample through a syringe. Mechanical fragmentation has disadvantages: Identical pieces of DNA are not fragmented at the same points, and the lengths of the resulting fragments vary widely. Then came discovery of restriction enzymes (or, more precisely, type II restriction endonucleases), biochemicals capable of "cutting" double-stranded DNA not only in a reproducible manner but also into less widely varying lengths. In particular, a restriction enzyme recognizes and binds to an enzyme-specific, very short sequence within a DNA segment and catalyzes the breaking of two particular oxygen-phosphorus-oxygen (-O-P-O-) bridges, one in each backbone of the segment. The locations along a stretch of DNA of the sequence recognized by a restriction enzyme are called restriction sites. The -O-P-O- bridges broken by a restriction enzyme usually lie within the recognition sequence of the enzyme. For example, the restriction enzyme EcoRI recognizes and binds to the sequence

5'-GAATTC-3'
3'-CTTAAG-5'

and, if allowed to interact with a sample of DNA for a sufficiently long time (to completely "digest" the DNA), cuts the DNA within every occurrence of that sequence. Note that the sequence recognized by EcoRI, like the sequences recognized by many other restriction enzymes, is palindromic; in other words, the 5'-to-3' sequence of one strand is identical to the 5'-to-3' sequence of the other strand.
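The notions of a palindromic recognition sequence and of complete digestion can be made concrete with a small Python sketch. It is a simplification that tracks only one strand, written 5'-to-3'. The function names and the twenty-base test sequence are hypothetical, and the cut position used for EcoRI (between the G and the first A of GAATTC) is the enzyme's well-documented cleavage point, supplied here as a parameter.

```python
# Sketch: test whether a recognition sequence is palindromic in the DNA sense
# and cut one strand at every occurrence of the site. Names are illustrative.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the 5'-to-3' sequence of the strand complementary to seq."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if both strands of the site read the same in the 5'-to-3' direction."""
    return site.upper() == reverse_complement(site)


def find_sites(dna: str, site: str) -> list[int]:
    """Return the start index of every (possibly overlapping) occurrence of site."""
    dna, site = dna.upper(), site.upper()
    positions = []
    index = dna.find(site)
    while index != -1:
        positions.append(index)
        index = dna.find(site, index + 1)
    return positions


def digest(dna: str, site: str, cut_offset: int) -> list[str]:
    """Cut the strand cut_offset bases into each occurrence of site.

    Models complete digestion of a linear molecule, one strand only.
    """
    dna = dna.upper()
    cuts = [position + cut_offset for position in find_sites(dna, site)]
    bounds = [0] + cuts + [len(dna)]
    return [dna[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    ecori_site = "GAATTC"
    sample = "TTGAATTCAAGGGAATTCTT"  # hypothetical sequence with two sites
    print(is_palindromic(ecori_site))
    print(find_sites(sample, ecori_site))
    print(digest(sample, ecori_site, cut_offset=1))
```

Because the site is palindromic, the same cut rule applied to the complementary strand lands at the mirror-image position, so the two cuts are offset from one another.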
The average length of the restriction fragments produced by EcoRI, a "6-base cutter" (a restriction enzyme that recognizes a 6-base-pair sequence), can be estimated to be about 4000 base pairs, since DNA is approximately a random sequence of four base pairs and any given sequence of six base pairs occurs on average every 4^6 = 4096 base pairs within such a sequence. (Note, however, that the observed average length of the fragments produced by an N-base cutter sometimes differs considerably from the estimate of 4^N.) Fragments with a shorter average length can be obtained by complete digestion with, say, a 4-base cutter, and fragments with a longer average length can be obtained by complete digestion with a restriction enzyme that recognizes a sequence longer than 6 base pairs or by partial digestion with a 6-base cutter, which leaves some of the restriction sites uncut.

A majority of the many restriction enzymes available today, including EcoRI, cut DNA in a fashion such that the resulting fragments terminate in a very short section of single-stranded DNA. For example EcoRI cuts the DNA segment

5'- . . . GAATTC . . . -3'
3'- . . . CTTAAG . . . -5'

into the fragments

5'- . . . G-3'
3'- . . . CTTAA-5'

and

5'-AATTC . . . -3'
3'-G . . . -5'

Note that the single-stranded ends of the two EcoRI restriction fragments are complementary. The utility of such "sticky" ends in the creation of recombinant DNA molecules will be described below.

A brief natural history of restriction enzymes is presented in "Restriction Enzymes," as well as a listing of a few of the many available.

Fractionating DNA Fragments by Length: Gel Electrophoresis. Because DNA fragments are negatively charged, they are subject to an electrical force when placed in an electric field. In particular, DNA fragments placed in a gel (a porous, semisolid material) move through the gel in a direction opposite to the direction of an applied electric field.
Furthermore, the rate at which a fragment travels is approximately inversely proportional to the logarithm of its length. Therefore gel electrophoresis is a means for separating DNA fragments by length. Details of the technique are described in "Gel Electrophoresis."

But what is the point of separating fragments of DNA by length? After all, the lengths of the fragments obtained either by breaking a DNA molecule mechanically or by cutting it with a restriction enzyme bear no relation to the functioning of the molecule within a cell. Nevertheless, gel electrophoresis, particularly of restriction fragments, is of great utility in the study of DNA. For example, consider the genome of the phage known as λ (lambda), a double-stranded DNA molecule about 50,000 base pairs in length. When many copies of the λ genome are completely digested with EcoRI and the resulting restriction fragments are subjected to gel electrophoresis, groups of

Like the immune systems of vertebrate eukaryotes, the restriction enzymes of bacteria combat foreign substances. In particular, restriction enzymes render the DNA of, say, an invading bacteriophage harmless by catalyzing its fragmentation, or, more precisely, by catalyzing the breaking of certain -O-P-O- bridges in the backbones of each DNA strand. The evolution of restriction enzymes helped many species of bacteria to survive; their discovery by humans helped precipitate the recombinant-DNA revolution.

Three types of restriction enzymes are known, but the term "restriction enzyme" refers here and elsewhere in this issue to type II restriction endonucleases, the only type commonly used in the study of DNA. (A nuclease is an enzyme that catalyzes the breaking of -O-P-O- bridges in a string of deoxyribonucleotides or ribonucleotides; an endonuclease catalyzes the breaking of internal rather than terminal -O-P-O- bridges.)
Many restriction enzymes have been isolated; more than seventy are available commercially. Each somehow recognizes and binds to its own restriction sites, short stretches of double-stranded DNA with a specific base sequence. Having bound to one of its restriction sites, the enzyme catalyzes the breaking of one particular -O-P-O- bridge in each DNA strand.

The accompanying table lists a few of the more commonly used restriction enzymes and the organism in which each is found. The first three letters of the name of a restriction enzyme are an abbreviation for the species of the source organism and are therefore customarily italicized. The next letter(s) of the name designates the strain of the source organism, and the terminal Roman numeral denotes the order of its discovery in the source organism.

[Table: Restriction Enzyme, Source Organism, Base Sequence of Restriction Site. Enzyme names that survive: BamHI, EcoRI, MboI. Source organisms listed: Bacillus amyloliquefaciens, Escherichia coli, Haemophilus aegyptius, Haemophilus influenzae, Moraxella bovis, Nocardia otitidis, Thermus aquaticus. The restriction-site sequences are not recoverable.]

Also listed in the table are the base sequences of the restriction sites of the enzymes. The red line separates the ends of the resulting fragments. The restriction sites of many of the known restriction enzymes and of all the restriction enzymes listed in the table have palindromic base sequences. That is, the 5'-to-3' base sequence of one strand is the same as the 5'-to-3' base sequence of its complementary strand. Both the bridges broken by a restriction enzyme that recognizes a palindromic sequence lie within or at the ends of the sequence.

Note that most of the restriction enzymes in the table make "staggered" cuts; that is, they produce fragments with protruding single-stranded ends. Those "cohesive," or "sticky," ends are very useful. Suppose that a sample of human DNA and a sample of phage DNA are both fragmented with the same restriction enzyme, one that makes staggered cuts.
When the resulting fragments are mixed, they will tend to hydrogen bond with each other because of the complementarity of their sticky ends. In particular, some human DNA fragments will hydrogen bond to some phage DNA fragments. And that bonding is the first step in the creation of a recombinant DNA molecule.

A final point about restriction enzymes is the problem of how the DNA of a bacterium avoids being chopped up by the friendly fire of the restriction enzyme(s) it produces. Evolution has solved that problem also. A bacterium that produces a type II restriction endonuclease produces in addition another enzyme that catalyzes the modification of restriction sites in its own DNA in a manner such that they cannot serve as binding sites for the restriction enzyme.

GEL ELECTROPHORESIS

Historically gel electrophoresis was first applied to separating proteins essentially according to mass, but the technique was adapted to separating fragments of DNA (or RNA) essentially according to fragment length. The technique works on DNA because the phosphate groups of a DNA fragment are negatively charged, and therefore, under the influence of an electric field, the fragment migrates through a gel (a porous, semisolid medium) in a direction opposite to that of the field. Furthermore, the rate at which the fragment migrates through the gel is approximately inversely proportional to the logarithm of its length.

Gel electrophoresis of DNA is carried out with two types of electric field. Conventional gel electrophoresis employs a field that is temporally constant in both direction and magnitude. In contrast, pulsed-field gel electrophoresis employs a field that is created by pulses of current and therefore varies periodically from zero to some set value. More important, the direction of the electric field also varies because different pulses flow through pairs of electrodes at different locations.
(Note, however, that the time-averaged direction of the electric field is along the length of the gel.) The advantage of such a pulsed field is that it prevents long DNA fragments, fragments longer than about 50,000 base pairs, from jackknifing within the structural framework of the gel and thus allows the long fragments to migrate through the gel in a length-dependent manner, just as shorter fragments migrate in a constant electric field.

The gel employed is usually a solidified aqueous solution of agarose, a purified form of agar. By varying the concentration of agarose in the gel, conventional gel electrophoresis can be applied to samples containing DNA fragments with average lengths between a few hundred base pairs and tens of thousands of base pairs. (Another gel used for conventional electrophoresis is polyacrylamide, which is particularly suited to separating fragments with lengths less than about a thousand base pairs and is therefore the gel of choice for sequencing.) Conventional gel electrophoresis in an agarose gel is illustrated in (a); details of the technique are as follows.

[Figure: GEL ELECTROPHORESIS, panel (a) Conventional Gel Electrophoresis. Labels: DNA fragments, cathode, agarose gel, anode, buffer solution, electrophoresis chamber.]

Agarose is dissolved in a hot buffer solution, and the gel solution is allowed to solidify into a thin slab in a casting tray in which the teeth of a comb-like device are suspended. After the gel has solidified, the comb is removed. The "wells" formed by the teeth of the comb are the receptacles into which the samples of DNA are loaded. The thickness of the gel is about 5 millimeters; its length and width are much greater and vary with the purpose of the electrophoresis. Before being loaded with the DNA sample(s), the gel is immersed in a conducting buffer solution in an electrophoresis chamber.

Before a DNA sample is loaded into a well, it is mixed with a dense solution of sucrose or glycerol to prevent the DNA from escaping into the buffer solution. Into one well is loaded a gel-calibration sample, a sample containing fragments of known lengths. As shown in (a), the flow of electricity through the gel causes the fragments to migrate toward the positive electrode. The shorter fragments move more easily through the gel and therefore travel farther.

The positions of the fragments after electrophoresis can be detected by soaking the gel in a solution of ethidium bromide, which binds strongly to DNA and emits visible light when illuminated with ultraviolet light. In a photograph of the ultraviolet-illuminated gel, the fragments appear as light bands. The ethidium-bromide visualization technique makes the positions of all the fragments in the gel visible. An alternative visualization technique detects only certain fragments (see "Hybridization Techniques").

The above description of gel electrophoresis might suggest that the sample of DNA contains but one copy of each fragment. In reality the sample must contain many copies of each fragment, and each band seen in the image of the length-separated fragments contains many fragments, all of which have the same length but not necessarily the same sequence.

(b) Conventional Gel Electrophoresis of Fragmented Human DNA

Shown in (b) are the results of conventional gel electrophoresis of six different samples of human DNA. Samples 1, 2, and 3 consisted of the restriction fragments produced by cutting the same cloned segment of human DNA with EcoRI alone (a 6-base cutter), with both EcoRI and HindIII (another 6-base cutter), and with HindIII alone, respectively. Samples 4, 5, and 6 consisted of the restriction fragments produced by cutting a different cloned segment of human DNA again with EcoRI alone, with both EcoRI and HindIII, and with HindIII alone, respectively.
The leftmost lane of the gel contains fragments of the lengths indicated. Note that all the restriction fragments are well resolved.

(c) Pulsed-field Electrophoresis of Intact DNA Molecules of Saccharomyces cerevisiae

Shown in (c) are the results of pulsed-field gel electrophoresis of three identical samples, each containing all sixteen of the intact DNA molecules that compose the genome of the yeast Saccharomyces cerevisiae. The four longest chromosomal DNA molecules are not resolved; all four are located in the topmost band. The remaining twelve chromosomal DNA molecules, however, are well resolved. The indicated lengths of the resolved DNA molecules were determined from the positions, in the rightmost lane of the gel, of the fragments in a calibration sample. Even longer fragments, fragments with lengths up to about 5 million base pairs, can be separated by increasing the duration of the pulses.

DNA fragments are found in the gel at locations corresponding to lengths of 3400, 4900, 5300, 6000, 7900, and 22,000 base pairs. That set of six EcoRI restriction-fragment lengths is unique to the λ genome and hence can be used as an identifying characteristic of the genome, a characteristic called its EcoRI restriction-fragment fingerprint.

Only viral genomes can be fingerprinted with a 6-base cutter such as EcoRI. Complete digestion of the much larger bacterial and eukaryotic genomes with a 6-base cutter yields so many restriction fragments that gel electrophoresis produces an essentially continuous smear of fragments rather than a relatively small number of well-separated fragments. However, a short segment of a large genome can be fingerprinted with a 6-base cutter, provided many copies of the segment are available.

Note that the EcoRI restriction-fragment fingerprint of the λ genome provides no information about the order of the restriction fragments along the λ genome.
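The idea of a restriction-fragment fingerprint, and the fact that it discards fragment order, can be illustrated with a minimal sketch. The function name, the toy sequences, and the treatment of the DNA as a linear string are assumptions made for illustration; the default site and cut position are the commonly cited EcoRI values (5'-GAATTC, cut after the G), which are not given in the text above.

```python
def restriction_fingerprint(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Sorted fragment lengths from a complete digest of a linear DNA sequence.

    site and cut_offset default to the commonly cited EcoRI values
    (cut after the first base of GAATTC); both are assumptions of this sketch.
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)   # position of the cut on this strand
        pos = seq.find(site, pos + 1)   # look for the next restriction site
    bounds = [0] + cuts + [len(seq)]
    # Fragment lengths are the gaps between consecutive cut positions.
    return sorted(b - a for a, b in zip(bounds, bounds[1:]))


# Two hypothetical sequences whose fragments occur in different orders
# (5, 14, 7 along toy_a; 5, 7, 14 along toy_b).
toy_a = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"
toy_b = "AAAA" + "GAATTC" + "C" + "GAATTC" + "TTTTTTTTT"

print(restriction_fingerprint(toy_a))
print(restriction_fingerprint(toy_b))
```

Both toy sequences yield the same sorted list of lengths even though their fragments are arranged differently, which is why a fingerprint alone cannot give a restriction-site map.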
More information is needed to order the fragments and thereby construct an EcoRI restriction-site map of the λ genome, a map showing the distances between its EcoRI restriction sites. One way to get the additional information is to carry out two digestions, one of which is complete and the other only partial. The complete digestion produces fragments such that the length of each is equal to the distance between some two adjacent restriction sites; the partial digestion produces some fragments such that the length of each is equal to the distance spanned by three or more adjacent restriction sites. Together the length data obtained from the two digestions provide sufficient information to order the fragments and construct the restriction-site map.

The restriction-fragment fingerprints of cloned segments of a large genome have found application in the efforts to "map" the segments, that is, to arrange the segments in the order in which they appear along the genome. The principle behind this application is as follows. Suppose that the restriction-fragment fingerprints of two segments of a genome include a number of restriction-fragment lengths in common. Calculations based on the distribution of restriction sites along the genome and on the number of restriction-fragment lengths in common lead to a value for the probability that the two segments overlap and therefore contain pieces of DNA that are contiguous along a chromosomal DNA molecule. (See "Physical Mapping-A One-Dimensional Jigsaw Puzzle" in "Mapping the Genome.")

This discussion of gel electrophoresis concludes by noting that the electric field used to carry out the procedure is usually a constant electric field. However, in such a field long DNA fragments (fragments longer than about 50,000 base pairs) tend to become trapped at arbitrary locations in the gel and thus do not migrate through the gel in a length-dependent manner.
But fragments that long or longer are of interest, and separating them by length is sometimes desirable. For example, making a NotI restriction-site map of a human chromosome involves gel electrophoresis of restriction fragments that are on average 1,000,000 base pairs long. (NotI is an 8-base cutter; the estimated average length of the fragments it produces, namely $4^8$ = 65,536 base pairs, differs considerably from the observed average length because the recognition sequence of that restriction enzyme includes several occurrences of the dinucleotide sequence 5'-CG, which happens to be rare in mammalian genomes. NotI is one of a group of "infrequent cutters," all of which contain at least one occurrence of the sequence 5'-CG and produce fragments with average lengths ranging from 100,000 base pairs to 1 million base pairs.)

Length separation of long fragments can be accomplished by using an electric field that varies intermittently in direction but has a time-averaged direction along the length of the gel. Such a "pulsed" field allows long DNA fragments to wind their way through the molecular framework of the gel. As shown in "Gel Electrophoresis," pulsed-field electrophoresis can separate even the very long DNA molecules extracted intact from yeast chromosomes. (Note that pulsed-field gel electrophoresis of long fragments requires preparation of the DNA sample by special methods because the accidental fragmentation involved in the method described at the beginning of this section cannot be tolerated when DNA molecules are to be studied either intact or as the long, reproducibly cut fragments produced by a restriction enzyme such as NotI.)

Amplifying DNA. Most of the techniques currently used to analyze a segment of DNA require the availability of many copies of the segment.
Two methods for "amplifying" a DNA segment are now at hand: molecular cloning, which was developed in the 1970s, and the polymerase chain reaction (PCR), which was developed less than a decade ago.

Amplification by Molecular Cloning. Molecular cloning involves replication of a foreign DNA segment by a host organism, usually the bacterium E. coli. However, a segment of DNA that has entered an E. coli cell will not be replicated by the cell unless the segment has first been combined with a cloning "vector," a DNA molecule that the cell does replicate. The combination of the segment to be cloned, the "insert," and the vector is called a recombinant DNA molecule.

The phenomenon of transduction, discovered in 1952, had shown that DNA from the genome of one strain of E. coli is sometimes incorporated into the genome of a phage without affecting the ability of the phage to be replicated in another strain of E. coli. In other words, the phage genome was known to act as a vector, a DNA molecule that carries foreign DNA into a host cell, where it is then replicated. Nevertheless, the earliest cloning vectors were plasmids, small DNA molecules found in and replicated by bacteria. (Plasmids, like the genomes of bacteria, are circular DNA molecules. They are, however, much smaller than bacterial genomes. Some plasmids are replicated only when their hosts replicate and occur as single copies. The replication of other plasmids is not coordinated with host-cell replication; such plasmids occur as multiple copies.) The plasmid first used was one of a number that had been studied intensively because they contain genes that confer on the bacteria in which they reside the ability to survive in the presence of antibiotics.

Today two vectors in addition to phage genomes and plasmids are also widely used: cosmids, which are replicated in E.
coli, and yeast artificial chromosomes (YACs), which are replicated in the single-celled eukaryotic organism Saccharomyces cerevisiae (baker's yeast). Both cosmids and YACs are synthetic rather than naturally occurring DNA molecules.

The first step in molecular cloning is to make the recombinant DNA molecules in vitro. The following is a description of the procedure employed when the vector is a plasmid that contains a single restriction site for EcoRI embedded within a gene for resistance to ampicillin. Digestion of a population of such plasmids with EcoRI produces "linearized" plasmids with sticky ends. Inserts with identical sticky ends are formed by digesting the DNA to be cloned also with EcoRI. When the linearized plasmids and the inserts are mixed together, along with an enzyme called a DNA ligase, the sticky ends of some inserts hydrogen bond to the sticky ends of the linearized plasmids. The backbones of such hydrogen-bonding products are then covalently linked by the DNA ligase into recombinant DNA molecules (here recombinant plasmids). Note that the ligation mixture also contains some nonrecombinant plasmids because some linearized plasmids simply recyclize.

A more detailed description of the making of recombinant DNA molecules with plasmids and other vectors is presented in the article "DNA Libraries." Here we point out only that different vectors are used to clone inserts of different lengths. Plasmids carry inserts that are usually about 4000 base pairs long, λ phages carry inserts that are usually four to five times longer, and YACs carry inserts that are usually more than one hundred times longer. (The great lengths of the inserts carried by YACs imply that YAC cloning, like pulsed-field gel electrophoresis, requires a special method of DNA preparation.)

The next step in molecular cloning with plasmids is to expose a population of E.
coli cells to the ligation mixture in the hope that one recombinant plasmid will enter each of a reasonable fraction of the cells. Entry of a plasmid into an E. coli cell is said to transform the cell, provided the plasmid is replicated by the cell. The mechanism by which a plasmid (or a YAC) enters a host cell is not completely understood, but several empirical methods have been found that increase the efficiency of transformation (number of cells transformed per unit mass of recombinant DNA molecules). In contrast, the mechanism by which a phage enters (infects) a host cell is fairly well understood and is inherently more efficient.

After the E. coli cells have been exposed to the ligation mixture, the solution containing the exposed cells is diluted, a small amount of the diluted solution is transferred to each of a number of culture dishes containing a solid growth medium, and the cells are allowed to divide. (Dilution of the exposed cells assures that only a relatively small number of cells is transferred to each culture dish.) The aggregate, or colony, of cells produced by successive divisions of a single cell is called a clone of the single cell. Each member of a clone that arises from a transformed cell contains at least one copy of the plasmid and, if the transforming plasmid was a recombinant plasmid, at least one copy of the insert.

Because the goal of molecular cloning is not only to obtain many copies of the insert within a recombinant DNA molecule but also to do so in as short a time as possible, one criterion for a host cell is a short generation time. The generation times of both E. coli and yeast are suitably short. For example, the generation time of E. coli is about 20 minutes. Thus a single E. coli cell can, under suitable conditions, multiply into more than a billion cells in about 10 hours.
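The arithmetic behind that figure can be checked with a short sketch. It assumes idealized, uninterrupted doubling every generation with no cell death; the function name is hypothetical.

```python
def cells_after(hours: float, generation_minutes: float = 20.0) -> int:
    """Cells descended from one cell, assuming uninterrupted doubling."""
    generations = int((hours * 60) // generation_minutes)  # completed generations
    return 2 ** generations


# 10 hours at 20 minutes per generation is 30 generations.
print(cells_after(10))  # 2**30 = 1073741824
```

Thirty doublings give $2^{30}$, or about 1.07 billion cells, consistent with the estimate above.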
The final step in plasmid cloning is to identify the clones arising from cells transformed by recombinant plasmids. Recall that the EcoRI restriction site of the plasmid used in this example lies within its ampicillin-resistance gene. Assume that each clone arose from a host cell transformed by a plasmid, either recombinant or nonrecombinant. Then only those clones that arose from cells transformed by a recombinant plasmid possess an inoperative ampicillin-resistance gene (because the insert interrupts the gene). Using that fact to identify the clones of interest involves transferring a portion of each clone from the culture dish to some other vessel in a manner that preserves the positions of the clones. Ampicillin is then added to the other vessel, and the positions of the clones that die are noted. The clones at the corresponding positions on the culture dish are the clones desired. Other ingenious tricks have been devised to identify the desired clones.

The sample of DNA to be cloned usually consists of many different fragments, all from the same source. Examples are the large sets of fragments obtained by cutting, say, the mouse genome or the human X chromosome with a restriction enzyme. Then each recombinant DNA molecule contains a different fragment of the source DNA, and each host cell entered by a recombinant DNA molecule gives rise to a clone of a different fragment. A collection of such clones is called a DNA library (a mouse-genome DNA library, say, or a human-X-chromosome DNA library). The article "DNA Libraries" describes molecular cloning more fully and discusses the problems it presents.

Amplification by PCR. Unlike cloning, the polymerase chain reaction is carried out entirely in vitro and, more important, is capable of amplifying a specific one of the many fragments that may be present in a DNA sample. The selectivity of the reaction implies that it is also a means for detecting the presence of the fragment being amplified.
Details of the reaction are presented in "The Polymerase Chain Reaction and Sequence-tagged Sites" in "Mapping the Genome."

Sequencing DNA. The ultimate in detailed information about a fragment of DNA is its base sequence. The process of obtaining that information is called sequencing. Two sequencing methods were developed in 1977, both based on essentially the same principle but each realizing the goal in a different way. Let $b_1 b_2 b_3 \ldots b_N$ be the base sequence of the fragment to be sequenced. Consider the set of subfragments $\{b_1,\ b_1 b_2,\ b_1 b_2 b_3,\ \ldots,\ b_1 b_2 b_3 \ldots b_N\}$. Assume that such a set of subfragments can be generated and, equally important, can be separated into four subsets: the subset A consisting of those subfragments that end in the base A; the subset C consisting of those subfragments that end in the base C; the subset G consisting of those subfragments that end in the base G; and the subset T consisting of those subfragments that end in the base T. Note that together the four subsets compose the set $\{b_1,\ b_1 b_2,\ b_1 b_2 b_3,\ \ldots,\ b_1 b_2 b_3 \ldots b_N\}$.

The subsets A, C, G, and T are subjected to electrophoresis, each in a different "lane" of a gel (a different strip of gel parallel to the direction of the applied electric field). After electrophoresis each subfragment is located in one of the four lanes according to its length. Suppose that the shortest subfragment, $b_1$, appears in the A lane of the gel; that the next longer subfragment, $b_1 b_2$, appears in the T lane; that the next longer subfragment, $b_1 b_2 b_3$, appears in the G lane; . . . ; and that the longest subfragment, $b_1 b_2 b_3 \ldots b_N$, appears in the T lane. Then the base sequence of the fragment is ATG . . . T.

Obviously the above description of the principle of the two sequencing methods has avoided the question of how the four subsets of subfragments are generated.
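The read-off step of that principle can be sketched in a few lines. This is only an illustration of the bookkeeping, assuming the gel gives the exact length of every subfragment in each lane; the function names and the dictionary representation of the lanes are hypothetical, and nothing here models how the subsets are produced chemically.

```python
def lanes_from(sequence: str) -> dict[str, list[int]]:
    """Idealized lanes: for each base, the lengths of the subfragments ending in it."""
    lanes = {base: [] for base in "ACGT"}
    for length in range(1, len(sequence) + 1):
        lanes[sequence[length - 1]].append(length)
    return lanes


def read_sequence(lanes: dict[str, list[int]]) -> str:
    """Recover the base sequence by reading subfragments from shortest to longest."""
    lane_of = {}
    for base, lengths in lanes.items():
        for length in lengths:
            if length in lane_of:
                raise ValueError(f"length {length} appears in two lanes")
            lane_of[length] = base
    n = len(lane_of)
    if set(lane_of) != set(range(1, n + 1)):
        raise ValueError("a subfragment length is missing")
    # The lane holding the subfragment of length k names base number k.
    return "".join(lane_of[k] for k in range(1, n + 1))


lanes = lanes_from("ATGCT")
print(lanes)
print(read_sequence(lanes))
```

For the toy fragment ATGCT the subfragment of length 1 falls in the A lane, length 2 in the T lane, and so on, so reading the lanes in order of increasing length returns the original sequence.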
The procedures for generating the four subsets are described in "DNA Sequencing" in "Mapping the Genome."

Although sequencing is still a tedious and expensive process, the information so obtained is crucial to identification of the DNA mutations that cause inherited disorders and to a broad understanding of the functioning and evolution of genes and genomes. Much effort is being devoted to increasing the speed and decreasing the cost of current sequencing methods and to searching for new methods.

Hybridization: Detecting the Presence of Specific DNA Sequences. The two single-stranded DNA fragments produced by denaturation of a (double-stranded) DNA fragment will, under appropriate conditions, renature (form a double-stranded fragment by hydrogen bonding) because the single-stranded fragments are complementary along the entirety of their lengths. (Recall that two single-stranded fragments are complementary if and only if the 5'-to-3' base sequence of one is the complement of the 3'-to-5' base sequence of the other.) Similarly, hydrogen bonding between an RNA fragment and a complementary single-stranded DNA fragment will form a double-stranded DNA-RNA fragment, a phenomenon called hybridization. (Hybridization between the RNA transcript of an E. coli gene and the template strand of the gene was the technique used in the 1960s to measure the rates of transcription of various E. coli genes.) The term "hybridization" now also includes the hydrogen bonding that occurs between any two single-stranded nucleic-acid fragments that are complementary along only some portion (usually a relatively short portion) of their lengths.

Hybridization is widely used to detect the presence of a particular DNA segment in a sample of DNA.
If the sample consists of a set of cloned DNA fragments, each cloned fragment is denatured and then allowed to interact with a solution containing many copies of a radioactively labeled "probe," a relatively short stretch of single-stranded DNA whose sequence is identical to or complementary to some unique portion of the segment of interest. Under the right conditions the probe hybridizes only to the cloned fragment (or fragments) that contains the segment of interest, and the radioactivity of the probe identifies the fragment to which the probe has hybridized. For example, suppose that the sample is a complete set of cloned human DNA fragments and the segment of interest is the interspersed tandem repeat (5'-GT)n. Examples of a probe for that segment are the single-stranded fragments with the sequences (5'-AC)7 and (5'-GT)7. Because the segment (5'-GT)n appears at numerous locations in the human genome, such a probe hybridizes to numerous cloned fragments but only to those containing the interspersed tandem repeat (or a portion thereof).

If the sample to be interrogated with a probe is instead a solution containing many different DNA fragments, the fragments must first be separated and immobilized, usually by gel electrophoresis. If the probe is sufficiently short, hybridization can be carried out directly on the gel. Usually, however, the length-separated fragments are first transferred from the gel to a nitrocellulose filter. The procedure, called Southern (or gel-transfer) hybridization, is illustrated in "Hybridization Techniques."

In-situ hybridization is a variation of hybridization in which the sample to be interrogated with a probe consists of the intact DNA molecules within metaphase chromosomes. The metaphase chromosomes are spread out on a microscope slide and partially denatured. The probe copies are labeled with a fluorescent molecule and allowed to interact with the denatured chromosomes.
The presence of bound probe is detected by observing the chromosomes with a fluorescence microscope. An example of the fluorescence signal obtained by using the technique is shown in "Hybridization Techniques." In-situ hybridization provides information about which chromosome contains the segment of interest and its approximate location on the chromosome.

This section on the techniques of molecular genetics concludes with an application that not only requires the use of almost all the techniques described but also is of particular significance to the efforts to arrange cloned fragments of human DNA in the same order as they appear in the intact DNA molecules of human chromosomes. The application involves the use of long cloned fragments of human DNA to obtain an upper limit on the length of the segment of DNA that separates the chromosomal locations of any two short cloned fragments of human DNA (such as those provided by plasmid, phage, or cosmid cloning). The long fragments, which are produced by cutting human genomic DNA with an infrequent cutter, are subjected to pulsed-field gel electrophoresis and then to Southern hybridization. Two different probes are used separately in the hybridization; each is unique to one of the two short cloned fragments. If both probes hybridize to the same long fragment, then both short fragments lie within the long fragment. In other words, the chromosomal locations of the short fragments are separated by a length of DNA no longer than the length of the long fragment to which both probes hybridized.

Southern hybridization is a technique for identifying, among a sample of many different DNA fragments, the fragment(s) containing a particular nucleotide sequence. As depicted in (a), the sample has typically been fragmented with a restriction enzyme. The restriction fragments are subjected to gel electrophoresis to separate them by length and immobilize them.
The length-separated fragments are then transferred to a filter paper made of nitrocellulose, a procedure called blotting. (Note that blotting preserves the locations of the fragments.) The filter is washed first with a solution that denatures the fragments and then with a solution containing many copies of a radioactively labeled, single-stranded "probe" whose sequence is identical to or complementary to some unique portion of the sequence of interest. The probe hybridizes (hydrogen bonds) to only the denatured fragments containing the complement of its sequence and hence the sequence of interest. The unbound probe is washed away, and the filter is dried and placed in contact with x-ray film. The radioactivity of the bound probe exposes the film and creates an image, an autoradiogram, of the fragment(s) to which the probe has bound. Southern hybridization is particularly useful for detecting variations among different members of a species in the lengths of the restriction fragments originating from a particular region of the organism's genome (see "Modern Linkage Mapping with Polymorphic DNA Markers" in "Mapping the Genome").

The number of fragments "picked out" by a probe depends on the number of times the sequence of interest occurs in the sample DNA. If the sequence occurs only once (if a probe for, say, a single-copy gene is being used), the probe picks out one or at most two fragments (provided the probe is shorter than any of the fragments in the sample).
On the other hand, if the sequence of interest occurs more than once (if a probe for a multiple-copy gene or a repeated sequence is being used), the probe picks out a larger number of fragments. Furthermore, the hybridization conditions (temperature and salinity of the probe solution) can be adjusted so that either exact complementarity or a lesser degree of complementarity is required for binding of the probe.

[Figure: HYBRIDIZATION TECHNIQUES, panel (a) Southern Hybridization. Steps shown: DNA sample; fragmentation with restriction enzyme; restriction fragments; gel electrophoresis; gel containing length-separated restriction fragments; transfer of fragments from gel to nitrocellulose filter; filter with fragments located as they were in the gel; hybridization with radioactively labeled probe; filter with probe bound to complementary fragment; autoradiography; film showing image of hybridized fragment.]

In-situ hybridization is a variation of hybridization in which the sample consists of the complement of chromosomes within a cell arrested at metaphase. The metaphase chromosomes are spread out and partially denatured on a microscope slide, the probe is labeled with a fluorescent dye, and the bound probe is imaged with a fluorescence microscope. Shown in (b) is the fluorescence signal resulting from in-situ hybridization of a probe for the human telomere to human metaphase chromosomes. (A telomere is a special sequence at each end of a eukaryotic DNA molecule that protects the molecule from enzymatic degradation and prevents shortening of the molecule as it is replicated. The sequence of the human telomere was discovered by Robert K. Moyzis and his colleagues, who also provided evidence that all vertebrates share the same telomeric sequence.) Note that, as expected, the probe has bound only to the terminal regions of each chromosome. (Micrograph courtesy of Julie Meyne.)
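The way a probe "picks out" fragments can be sketched as a string search. This is a simplification under stated assumptions: each double-stranded fragment is represented by the 5'-to-3' sequence of one strand, binding requires exact complementarity over the full probe length (no adjustable stringency), and the function names and toy fragments are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """5'-to-3' sequence of the strand complementary to seq."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def fragments_bound_by(probe: str, fragments: list[str]) -> list[int]:
    """Indices of fragments with a strand containing the complement of the probe.

    After denaturation both strands of a fragment are available, so a hit on
    either the given strand or its reverse complement counts as binding.
    """
    probe = probe.upper()
    target = reverse_complement(probe)
    return [i for i, frag in enumerate(fragments)
            if target in frag.upper() or probe in frag.upper()]


probe = "GT" * 7  # the (5'-GT)7 probe mentioned in the main text
fragments = [
    "AAAC" + "GT" * 8 + "TTG",   # contains the repeat on the given strand
    "ACGTACGTACGTAAAT",          # no run of seven GT (or AC) repeats
    "CC" + "AC" * 7 + "GG",      # contains the repeat on the opposite strand
]
print(fragments_bound_by(probe, fragments))
```

With these toy fragments the probe binds the first and third but not the second, mirroring the statement that a probe for a repeated sequence picks out several fragments.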
(b) Results of In-Situ Hybridization of Human-Telomere Probe to Human Chromosomes

[Figure: components of a prototypic eukaryotic protein gene. Labels: upstream region, upstream enhancer, promoter (TATA box), transcription region, exons, stop site, poly A site, downstream enhancer, downstream region.]

Each eukaryotic gene is placed in one of three classes according to which of the three eukaryotic RNA polymerases is involved in its transcription. The genes for RNAs are transcribed by RNA polymerases I and III. The genes for proteins, the class first brought to mind by the word "gene" and the class focused on here, are transcribed by RNA polymerase II (pol II).

Shown above are the components of a prototypic protein gene. By convention the sense strand of the gene, the strand with the sequence of DNA bases corresponding to the sequence of RNA bases in the primary RNA transcript, is depicted with its 5'-to-3' direction coincident with the left-to-right direction. (Often only the sense strand of a gene is displayed.) The left-to-right direction thus coincides with the direction in which the template strand is transcribed.

The terms "upstream" and "downstream" describe the location of one feature of a gene relative to that of another. Their meanings in that context are based on regarding transcription as a directional process analogous to the flow of water in a stream.

The start site is the location of the first deoxyribonucleotide in the template strand that happens to be transcribed. It defines the beginning of the transcription region of the gene. Note that the start site lies upstream of the DNA codon (ATG) corresponding to the RNA codon (AUG) that signals the start of translation of the transcribed RNA. The transcription region ends at some nonspecific deoxyribonucleotide between 500 and 2000 base pairs downstream of the poly A site. Within the poly A site are sequences that, when transcribed, signal the location at which the primary RNA transcript is cleaved and equipped with a "tail" composed of a succession of ribonucleotides containing the base A. (The poly A tail is thought to aid the transport of messenger RNA from the nucleus of a cell to the cytoplasm.) Note that the poly A site lies downstream of the DNA codon (here TAA) corresponding to one of the RNA codons (UAA) that signals the end of translation of the transcribed RNA.

Within the transcription region are exons and introns. Exons tend to be about 300 base pairs long; each is a succession of codons uninterrupted by stop codons. Introns, on the other hand, are not uninterrupted successions of codons, and the RNA segments transcribed from introns are spliced out of the primary RNA transcript before translation. A few protein genes contain no introns (the human α-interferon gene is an example), most contain at least one, and some contain a large number (the human thyroglobulin gene contains about forty). Generally the amount of DNA composing the introns of a protein gene is far greater than the amount composing its exons.

Close upstream of the start site is a promoter sequence, where pol II binds and initiates transcription. A common promoter sequence in eukaryotic genes is the so-called TATA box, which has the consensus sequence 5'-TATAAA and is located at a variable short distance (about 30 base pairs) upstream of the start site.

The region upstream of the promoter and, less frequently, the downstream region or the transcription region itself contain sequences that control the rate of initiation of transcription. Although expression of a protein gene is regulated at a number of stages in the pathway from gene to protein, control of transcription initiation is the dominant regulatory mechanism. (Primary among the other regulatory mechanisms is control of splicing.)
The regulated expression of a gene (the when, where, and degree of expression) is the key to phenotypic differences between the various cells of a multicellular organism and also between organisms that possess similar genotypes.
Initiation of transcription is controlled mainly by DNA sequences (cis elements) and by certain proteins, many but not all of which are sequence-specific DNA-binding proteins (trans-acting transcription factors). Thus both temporal and cellular specificities of transcription control are governed by the availability of the different trans-acting transcription factors. Interactions of transcription factors with cis elements and with each other lead to formation of complex protein assemblies that control the ability of pol II to initiate transcription. Most of the complexes enhance transcription initiation, but some act as repressors. Enhancers and repressors can be located as far as 10,000 base pairs away from the transcription region.
Class I and class III genes differ from protein genes not only in their anatomies but also in the promoters, cis elements, and trans-acting factors involved in their transcription.
Genes and Genomes: What the Future Holds
The techniques described in the preceding section, and others not mentioned, have greatly increased our knowledge of the molecular anatomies of genes. Previously, a gene for a protein was defined narrowly as a segment of DNA that is transcribed into a messenger RNA, which in turn is translated into the protein. The definition considered more appropriate today includes not only the protein-coding segment of the gene (its transcription region) but also its sometimes far-flung regulatory regions (see "The Anatomy of a Eukaryotic Protein Gene"). The regulatory regions contain DNA sequences that help determine whether and at what rate the gene is expressed (or, equivalently, the protein is synthesized).
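The sidebar's point that the availability of trans-acting factors governs where a gene is transcribed can be illustrated with a toy model. The sketch below rests on loud assumptions: the element names, factor names, cell types, and the scoring rule (any occupied repressing element blocks initiation; otherwise the score is the number of occupied enhancing elements) are all invented for illustration and are not drawn from measured biology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CisElement:
    name: str        # label for a regulatory DNA sequence (hypothetical)
    bound_by: str    # trans-acting factor that recognizes it (hypothetical)
    represses: bool  # True if the bound complex acts as a repressor


def initiation_score(elements, available_factors) -> int:
    """Toy rule (an assumption): 0 if any repressing element is occupied,
    otherwise the number of occupied enhancing elements."""
    occupied = [e for e in elements if e.bound_by in available_factors]
    if any(e.represses for e in occupied):
        return 0
    return len(occupied)


# One hypothetical gene with two enhancing elements and one repressing element.
toy_gene = [
    CisElement("upstream_enhancer", "FactorA", represses=False),
    CisElement("downstream_enhancer", "FactorB", represses=False),
    CisElement("upstream_repressor_site", "FactorR", represses=True),
]

# Hypothetical cell types that differ only in which factors are available.
cell_types = {
    "cell_type_1": {"FactorA", "FactorB"},
    "cell_type_2": {"FactorA", "FactorR"},
    "cell_type_3": set(),
}

scores = {cell: initiation_score(toy_gene, factors)
          for cell, factors in cell_types.items()}

for cell, score in scores.items():
    print(cell, "relative initiation score:", score)
```

The same DNA yields different scores in different cells purely because the sets of available factors differ, which is the qualitative behavior the text describes; the numbers themselves carry no biological meaning.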
Some of the genes of a multicellular organism, its "housekeeping" genes, are expressed at more or less the same level in essentially all of its cells, regardless of type. Others are expressed only in certain types of cells or only at certain times. Gene regulation is, in fact, the key not only to appropriate functioning of the organism but also to its development from a single cell. In addition, gene regulation may be responsible for the striking phenotypic differences between higher apes and humans despite the negligible differences between the structures of their proteins. "The Anatomy of a Eukaryotic Protein Gene" also presents a few details about the mechanisms of gene regulation.
Despite the accumulating knowledge, it is safe to say that what is known about genes, particularly human genes, is far less than what remains to be learned. The total number of human genes can now be only crudely estimated, remarkably few have been localized to particular regions of particular chromosomes, and even fewer have been sequenced or studied in sufficient detail to understand their regulation. Other outstanding questions include the mechanisms by which the expression of genes is coordinated and the effects of gene mutations on morphology, physiology, and pathology.
The techniques of molecular genetics are also providing information about genomes as a whole, opening the way to comparative studies of genome anatomy, organization, and evolution. For example, the available evidence indicates remarkable similarities between the mouse genome and the human genome, despite the 60 million years that have elapsed since rodents and primates diverged from a common ancestor. The similarities lie not only in the base sequences of genes but also in their linkages. Perhaps the conserved linked genes represent units of some higher, as yet unknown operational feature. The same may be true also of repetitive DNA, about which we now know so little.
In time, when those and other genomes have been sequenced in their entireties, the observed similarities and differences will be a rich source of answers and new questions about the operation and evolution of genomes.
Further Reading
James A. Peters, editor. 1964. Classic Papers in Genetics. Englewood Cliffs, New Jersey: Prentice-Hall, Inc.
J. Herbert Taylor, editor. 1965. Selected Papers on Molecular Genetics. New York: Academic Press.
John Cairns, Gunther S. Stent, and James D. Watson, editors. 1966. Phage and the Origins of Molecular Biology. Cold Spring Harbor, New York: Cold Spring Harbor Laboratory of Quantitative Biology.
John C. Kendrew. 1968. The Thread of Life: An Introduction to Molecular Biology. Cambridge, Massachusetts: Harvard University Press.
Rene J. Dubos. 1976. The Professor, the Institute, and DNA. New York: The Rockefeller University Press.
Franklin H. Portugal and Jack S. Cohen. 1977. A Century of DNA: A History of the Discovery of the Structure and Function of the Genetic Substance. Cambridge, Massachusetts: The MIT Press.
Horace Freeland Judson. 1979. The Eighth Day of Creation. New York: Simon and Schuster.
James D. Watson. 1980. The Double Helix: A Personal Account of the Discovery of the Structure of DNA. New York: W. W. Norton and Co.
James D. Watson and John Tooze. 1981. The DNA Story: A Documentary History of Gene Cloning. San Francisco: W. H. Freeman and Company.
James D. Watson, Nancy H. Hopkins, Jeffrey W. Roberts, Joan Argetsinger Steitz, and Alan M. Weiner. 1987. Molecular Biology of the Gene. Menlo Park, California: The Benjamin/Cummings Publishing Company, Inc.
David A. Micklos and Greg A. Freyer. 1990. DNA Science: A First Course in Recombinant DNA Technology. New York: Cold Spring Harbor Laboratory Press.
James Darnell, Harvey Lodish, and David Baltimore. 1990. Molecular Cell Biology, second edition. New York: W. H. Freeman and Company.
Maxine Singer and Paul Berg. 1991. Genes & Genomes: A Changing Perspective. Mill Valley, California: University Science Books.
Robert P. Wagner is a consultant to the Laboratory's Life Sciences Division and Professor Emeritus of Zoology at the University of Texas, Austin, the institution from which he received his Ph.D. His work at the Laboratory focuses on the activities of the Center for Human Genome Studies. He has taught undergraduate and graduate genetics for over thirty-five years and has authored or co-authored six books and many research and review articles on various aspects of genetics. His numerous honors and awards include fellowships from the National Research Council and the Guggenheim Foundation and election as a fellow of the American Association for the Advancement of Science and as president of the Genetics Society of America.
To create a stereoscopic image of DNA from the two images on this page, focus your eyes on a distant object above the page and then move the images up into your line of sight, holding the page 12 to 18 inches away and being careful to keep your eyes focused at infinity. If your eyes have not shifted, you should be aware of three images. Concentrate on the middle one, which is the desired stereoscopic image. You may have to practice a few times and should be sure the page and your head are vertical.
Los Alamos Science Number 20 1992