
Introduction to Genetics

• Royal Hemophilia and Romanov DNA
• The Importance of Genetics
  The Role of Genetics in Biology
  Genetic Variation Is the Foundation of Evolution
  Divisions of Genetics
• A Brief History of Genetics
  Prehistory
  Early Written Records
  The Rise of Modern Genetics
  Twentieth-Century Genetics
  The Future of Genetics
• Basic Concepts in Genetics

Alexis, heir to the Russian throne, and his father Tsar Nicholas Romanov II. (Hulton/Archive by Getty Images.)

Royal Hemophilia and Romanov DNA

On August 12, 1904, Tsar Nicholas Romanov II of Russia wrote in his diary: “A great never-to-be forgotten day when the mercy of God has visited us so clearly.” That day Alexis, Nicholas’s first son and heir to the Russian throne, had been born. At birth, Alexis was a large and vigorous baby with yellow curls and blue eyes, but at 6 weeks of age he began spontaneously hemorrhaging from the navel. The bleeding persisted for several days and caused great alarm. As he grew and began to walk, Alexis often stumbled and fell, as all children do. Even his small scrapes bled profusely, and minor bruises led to significant internal bleeding. It soon became clear that Alexis had hemophilia.

Hemophilia results from a genetic deficiency of blood clotting. When a blood vessel is severed, a complex cascade of reactions swings into action, eventually producing a protein called fibrin. Fibrin molecules stick together to form a clot, which stems the flow of blood. Hemophilia, marked by slow clotting and excessive bleeding, is the result if any one of the factors in the clotting cascade is missing or faulty. In those with hemophilia, life-threatening blood loss can occur with minor injuries, and spontaneous bleeding into joints erodes the bone with crippling consequences.

◗ 1.1 Hemophilia was passed down through the royal families of Europe.
Alexis suffered from classic hemophilia, which is caused by a defective copy of a gene on the X chromosome. Females possess two X chromosomes per cell and may be unaffected carriers of the gene for hemophilia. A carrier has one normal version and one defective version of the gene; the normal version produces enough of the clotting factor to prevent hemophilia. A female exhibits hemophilia only if she inherits two defective copies of the gene, which is rare. Because males have a single X chromosome per cell, if they inherit a defective copy of the gene, they develop hemophilia. Consequently, hemophilia is more common in males than in females. Alexis inherited the hemophilia gene from his mother, Alexandra, who was a carrier. The gene appears to have originated with Queen Victoria of England (1819 – 1901), ( ◗ FIGURE 1.1). One of her sons, Leopold, had hemophilia and died at the age of 31 from brain hemorrhage following a minor fall. At least two of Victoria’s daughters were carriers; through marriage, they spread the hemophilia gene to the royal families of Prussia, Spain, and Russia. In all, 10 of Queen Victoria’s male descendants suffered from hemophilia. Six female descendants, including her granddaughter Alexandra (Alexis’s mother), were carriers. Nicholas and Alexandra constantly worried about Alexis’s health. Although they prohibited his participation in sports and other physical activities, cuts and scrapes were inevitable, and Alexis experienced a number of severe bleeding episodes. The royal physicians were helpless during these crises — they had no treatment that would stop the bleeding. Gregory Rasputin, a monk and self-proclaimed “miracle worker,” prayed over Alexis during one bleeding crisis, after which Alexis made a remarkable recovery. Rasputin then gained considerable influence over the royal family. At this moment in history, the Russian Revolution broke out. 
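The X-linked pattern described above, in which an unaffected carrier mother can pass the defective copy of the gene to a son, can be sketched as a small enumeration of crosses. This is an illustrative sketch only: the allele symbols `H` (normal copy) and `h` (defective copy) and the function name `x_linked_cross` are assumptions chosen for the example, and each parental sex chromosome is assumed to be transmitted with equal probability.

```python
from collections import Counter
from itertools import product


def x_linked_cross(mother, father_x):
    """Enumerate offspring classes for an X-linked recessive trait.

    mother   -- tuple of her two X-linked alleles, e.g. ("H", "h")
    father_x -- the allele on his single X chromosome, e.g. "H"
    Allele symbols are hypothetical: "H" = normal copy, "h" = defective copy.
    """
    outcomes = Counter()
    # The mother transmits one of her two X chromosomes;
    # the father transmits either his X (daughter) or his Y (son).
    for maternal_x, paternal in product(mother, (father_x, "Y")):
        if paternal == "Y":
            # Sons have a single X, so one defective copy causes hemophilia.
            status = "affected son" if maternal_x == "h" else "unaffected son"
        else:
            # Daughters are affected only with two defective copies.
            defective = (maternal_x, paternal).count("h")
            status = ("non-carrier daughter", "carrier daughter",
                      "affected daughter")[defective]
        outcomes[status] += 1
    total = sum(outcomes.values())
    return {status: count / total for status, count in outcomes.items()}


# Carrier mother and unaffected father.
result = x_linked_cross(("H", "h"), "H")
for status, probability in sorted(result.items()):
    print(f"{status}: {probability:.2f}")
```

Under these assumptions, a cross between a carrier mother and an unaffected father (the situation of Alexandra and Nicholas) yields four equally likely classes: non-carrier daughter, carrier daughter, unaffected son, and affected son.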
Bolsheviks captured the tsar and his family and held them captive in the city of Ekaterinburg. On the night of July 16, 1918, a firing squad executed the royal family and their attendants, including Alexis and his four sisters. Eight days later, a protsarist army fought its way into Ekaterinburg. Although army investigators searched vigorously for the bodies of Nicholas and his family, they found only a few personal effects and a single finger. The Bolsheviks eventually won the revolution and instituted the world’s first communist state.

Historians have debated the role that Alexis’s illness may have played in the Russian Revolution. Some have argued that the revolution was successful because the tsar and Alexandra were distracted by their son’s illness and under the influence of Rasputin. Others point out that many factors contributed to the overthrow of the tsar. It is probably naive to attribute the revolution entirely to one sick boy, but it is clear that a genetic defect, passed down through the royal family, contributed to the success of the Russian Revolution.

More than 80 years after the tsar and his family were executed, an article in the Moscow News reported the discovery of their skeletons outside Ekaterinburg. The remains had first been located in 1979; however, because of secrecy surrounding the tsar’s execution, the location of the graves was not made public until the breakup of the Soviet government in 1989. The skeletons were eventually recovered and examined by a team of forensic anthropologists, who concluded that they were indeed the remains of the tsar and his wife, three of their five children, and the family doctor, cook, maid, and footman. The bodies of Alexis and his sister Anastasia are still missing.
To prove that the skeletons were those of the royal family, mitochondrial DNA (which is inherited only from the mother) was extracted from the bones and amplified with a molecular technique called the polymerase chain reaction (PCR). DNA samples from the skeletons thought to belong to Alexandra and the children were compared with DNA taken from Prince Philip of England, also a direct descendant of Queen Victoria. Analysis showed that mitochondrial DNA from Prince Philip was identical with that from these four skeletons. DNA from the skeleton presumed to be Tsar Nicholas was compared with that of two living descendants of the Romanov line. The samples matched at all but one nucleotide position: the living relatives possessed a cytosine (C) residue at this position, whereas some of the skeletal DNA possessed a thymine (T) residue and some possessed a C. This difference could be due to normal variation in the DNA; so experts concluded that the skeleton was almost certainly that of Tsar Nicholas. The finding remained controversial, however, until July 1994, when the body of Nicholas’s younger brother Georgij, who died in 1899, was exhumed. Mitochondrial DNA from Georgij also contained both C and T at the controversial position, proving that the skeleton was indeed that of Tsar Nicholas. This chapter introduces you to genetics and reviews some concepts that you may have encountered briefly in a preceding biology course. We begin by considering the importance of genetics to each of us, to society at large, and to students of biology. We then turn to the history of genetics, how the field as a whole developed. The final part of the chapter reviews some fundamental terms and principles of genetics that are used throughout the book. There has never been a more exciting time to undertake the study of genetics than now. Genetics is one of the frontiers of science. 
Pick up almost any major newspaper or news magazine and chances are that you will see something related to genetics: the discovery of cancer-causing genes; the use of gene therapy to treat diseases; or reports of possible hereditary influences on intelligence, personality, and sexual orientation. These findings often have significant economic and ethical implications, making the study of genetics relevant, timely, and interesting.

www.whfreeman.com/pierce More information about the history of Nicholas II and other tsars of Russia and about hemophilia

The Importance of Genetics

Alexis’s hemophilia illustrates the important role that genetics plays in the life of an individual. A difference in one gene, of the 35,000 or so genes that each human possesses, changed Alexis’s life, affected his family, and perhaps even altered history. We all possess genes that influence our lives. They affect our height and weight, our hair color and skin pigmentation. They influence our susceptibility to many diseases and disorders ( ◗ FIGURE 1.2) and even contribute to our intelligence and personality. Genes are fundamental to who and what we are.

◗ 1.2 Genes influence susceptibility to many diseases and disorders. (a) X-ray of the hand of a person suffering from diastrophic dysplasia (bottom), a hereditary growth disorder that results in curved bones, short limbs, and hand deformities, compared with an X-ray of a normal hand (top). (b) This disorder is due to a defect in a gene on chromosome 5. Other genetic disorders encoded by genes on chromosome 5 also are indicated by braces. (Labels on chromosome 5: Laron dwarfism, susceptibility to diphtheria, low-tone deafness, limb–girdle dystrophy, diastrophic dysplasia.) (Part a: top, Biophoto Associates/Science Source Photo Researchers; bottom, courtesy of Eric Lander, Whitehead Institute, MIT.)

Although the science of genetics is relatively new, people have understood the hereditary nature of traits and have “practiced” genetics for thousands of years. The rise of agriculture began when humans started to apply genetic principles to the domestication of plants and animals. Today, the major crops and animals used in agriculture have undergone extensive genetic alterations to greatly increase their yields and provide many desirable traits, such as disease and pest resistance, special nutritional qualities, and characteristics that facilitate harvest. The Green Revolution, which expanded global food production in the 1950s and 1960s, relied heavily on the application of genetics ( ◗ FIGURE 1.3). Today, genetically engineered corn, soybeans, and other crops constitute a significant proportion of all the food produced worldwide.

The pharmaceutical industry is another area where genetics plays an important role. Numerous drugs and food additives are synthesized by fungi and bacteria that have been genetically manipulated to make them efficient producers of these substances. The biotechnology industry employs molecular genetic techniques to develop and mass-produce substances of commercial value. Growth hormone, insulin, and clotting factor are now produced commercially by genetically engineered bacteria ( ◗ FIGURE 1.4). Techniques of molecular genetics have also been used to produce bacteria that remove minerals from ore, break down toxic chemicals, and inhibit damaging frost formation on crop plants.

Genetics also plays a critical role in medicine. Physicians recognize that many diseases and disorders have a hereditary component, including well-known genetic disorders such as sickle-cell anemia and Huntington disease as well as many common diseases such as asthma, diabetes, and hypertension.
Advances in molecular genetics have allowed important insights into the nature of cancer and permitted the development of many diagnostic tests. Gene therapy, the direct alteration of genes to treat human diseases, has become a reality.

www.whfreeman.com/pierce Information about biotechnology, including its history and applications

◗ 1.3 The Green Revolution used genetic techniques to develop new strains of crops that greatly increased world food production during the 1950s and 1960s. (a) Norman Borlaug, a leader in the development of new strains of wheat that led to the Green Revolution, and a family in Ghana. Borlaug received the Nobel Peace Prize in 1970. (b) Traditional rice plant (top) and modern, high-yielding rice plant (bottom). (Part a, UPI/Corbis-Bettman; part b, IRRI.)

◗ 1.4 The biotechnology industry uses molecular genetic methods to produce substances of economic value. In the apparatus shown, growth hormone is produced by genetically engineered bacteria. (James Holmes/Celltech Ltd./Science Photo Library/Photo Researchers.)

The Role of Genetics in Biology

Although an understanding of genetics is important to all people, it is critical to the student of biology. Genetics provides one of biology’s unifying principles: all organisms use nucleic acids for their genetic material and all encode their genetic information in the same way. Genetics undergirds the study of many other biological disciplines. Evolution, for example, is genetic change taking place through time; so the study of evolution requires an understanding of basic genetics. Developmental biology relies heavily on genetics: tissues and organs form through the regulated expression of genes ( ◗ FIGURE 1.5). Even such fields as taxonomy, ecology, and animal behavior are making increasing use of genetic methods. The study of almost any field of biology or medicine is incomplete without a thorough understanding of genes and genetic methods.

◗ 1.5 The key to development lies in the regulation of gene expression. This early fruit-fly embryo illustrates the localized production of proteins from two genes, ftz (stained gray) and eve (stained brown), which determine the development of body segments in the adult fly. (Peter Lawrence, 1992. The Making of a Fly, Blackwell Scientific Publications.)

Genetic Variation Is the Foundation of Evolution

Life on Earth exists in a tremendous array of forms and features that occupy almost every conceivable environment. All life has a common origin (see Chapter 2); so this diversity has developed during Earth’s 4-billion-year history. Life is also characterized by adaptation: many organisms are exquisitely suited to the environment in which they are found. The history of life is a chronicle of new forms of life emerging, old forms disappearing, and existing forms changing.

Life’s diversity and adaptation are a product of evolution, which is simply genetic change through time. Evolution is a two-step process: first, genetic variants arise randomly and, then, the proportion of particular variants increases or decreases. Genetic variation is therefore the foundation of all evolutionary change and is ultimately the basis of all life as we know it. Genetics, the study of genetic variation, is critical to understanding the past, present, and future of life.

Concepts
Heredity affects many of our physical features as well as our susceptibility to many diseases and disorders. Genetics contributes to advances in agriculture, pharmaceuticals, and medicine and is fundamental to modern biology. Genetic variation is the foundation of the diversity of all life.

Divisions of Genetics

Traditionally, the study of genetics has been divided into three major subdisciplines: transmission genetics, molecular genetics, and population genetics ( ◗ FIGURE 1.6).
Also known as classical genetics, transmission genetics encompasses the basic principles of genetics and how traits are passed from one generation to the next. This area addresses the relation between chromosomes and heredity, the arrangement of genes on chromosomes, and gene mapping. Here the focus is on the individual organism: how an individual organism inherits its genetic makeup and how it passes its genes to the next generation.

Molecular genetics concerns the chemical nature of the gene itself: how genetic information is encoded, replicated, and expressed. It includes the cellular processes of replication, transcription, and translation (by which genetic information is transferred from one molecule to another) and gene regulation (the processes that control the expression of genetic information). The focus in molecular genetics is the gene: its structure, organization, and function.

Population genetics explores the genetic composition of groups of individual members of the same species (populations) and how that composition changes over time and space. Because evolution is genetic change, population genetics is fundamentally the study of evolution. The focus of population genetics is the group of genes found in a population.

It is convenient and traditional to divide the study of genetics into these three groups, but we should recognize that the fields overlap and that each major subdivision can be further divided into a number of more specialized fields, such as chromosomal genetics, biochemical genetics, quantitative genetics, and so forth. Genetics can alternatively be subdivided by organism (fruit fly, corn, or bacterial genetics), and each of these organisms can be studied at the level of transmission, molecular, and population genetics. Modern genetics is an extremely broad field, encompassing many interrelated subdisciplines and specializations.

◗ 1.6 Genetics can be subdivided into three interrelated fields. (Panels: transmission genetics, molecular genetics, population genetics.) (Top left, Alan Carey/Photo Researchers; top right, MONA file M0214602 tif; bottom, J. Alcock/Visuals Unlimited.)

Concepts
The three major divisions of genetics are transmission genetics, molecular genetics, and population genetics. Transmission genetics examines the principles of heredity; molecular genetics deals with the gene and the cellular processes by which genetic information is transferred and expressed; population genetics concerns the genetic composition of groups of organisms and how that composition changes over time and space.

www.whfreeman.com/pierce Information about careers in genetics

A Brief History of Genetics

Although the science of genetics is young (almost entirely a product of the past 100 years), people have been using genetic principles for thousands of years.

Prehistory

The first evidence that humans understood and applied the principles of heredity is found in the domestication of plants and animals, which began between approximately 10,000 and 12,000 years ago. Early nomadic people depended on hunting and gathering for subsistence but, as human populations grew, the availability of wild food resources declined. This decline created pressure to develop new sources of food; so people began to manipulate wild plants and animals, giving rise to early agriculture and the first fixed settlements.

Initially, people simply selected and cultivated wild plants and animals that had desirable traits. Archeological evidence of the speed and direction of the domestication process demonstrates that people quickly learned a simple but crucial rule of heredity: like breeds like. By selecting and breeding individual plants or animals with desirable traits, they could produce these same traits in future generations.
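The rule that like breeds like can be illustrated with a minimal numerical sketch. It assumes, purely for illustration, a single gene with two alleles in which the desirable trait is dominant (`A`) and the undesirable trait appears only in `aa` individuals, random mating among the individuals the breeder keeps, and genotype proportions of p², 2pq, and q² at the start of each generation. None of these assumptions or names comes from the archeological record. If the breeder discards every `aa` individual, the frequency q of the undesirable allele among the kept parents becomes q/(1 + q).

```python
def next_recessive_freq(q):
    """Frequency of allele a after every aa individual is discarded.

    Hypothetical one-gene model: genotypes AA, Aa, aa occur in
    proportions p*p, 2*p*q, q*q, and only AA and Aa are kept as parents.
    """
    p = 1.0 - q
    kept = p * p + 2 * p * q          # proportion of AA plus Aa individuals
    if kept == 0:
        raise ValueError("no individuals with the desirable trait to breed from")
    # Among kept parents, copies of a come only from Aa (half of their alleles).
    return (p * q) / kept


def breed(q0, generations):
    """Return the list of a-allele frequencies, starting with q0."""
    freqs = [q0]
    for _ in range(generations):
        freqs.append(next_recessive_freq(freqs[-1]))
    return freqs


for generation, q in enumerate(breed(0.5, 5)):
    print(f"generation {generation}: q = {q:.3f}")
```

Starting from q = 0.5, the sketch gives 1/3 after one generation and 1/4 after two. The undesirable trait becomes rarer in each generation, although under these assumptions it declines ever more slowly because the remaining copies of the allele are hidden in individuals that show the desirable trait.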
The world’s first agriculture is thought to have developed in the Middle East, in what is now Turkey, Iraq, Iran, Syria, Jordan, and Israel, where domesticated plants and animals were major dietary components of many populations by 10,000 years ago. The first domesticated organisms included wheat, peas, lentils, barley, dogs, goats, and sheep. Selective breeding produced woollier and more manageable goats and sheep and seeds of cereal plants that were larger and easier to harvest. By 4000 years ago, sophisticated genetic techniques were already in use in the Middle East. Assyrians and Babylonians developed several hundred varieties of date palms that differed in fruit size, color, taste, and time of ripening. An Assyrian bas-relief from 2880 years ago depicts the use of artificial fertilization to control crosses between date palms ( ◗ FIGURE 1.7). Other crops and domesticated animals were developed by cultures in Asia, Africa, and the Americas in the same period.

◗ 1.7 Ancient peoples practiced genetic techniques in agriculture. (Top) Comparison of ancient (left) and modern (right) wheat. (Bottom) Assyrian bas-relief sculpture showing artificial pollination of date palms at the time of King Assurnasirpalli II, who reigned from 883–859 B.C. (Top left and right, IRRI; bottom, Metropolitan Museum of Art, gift of John D. Rockefeller Jr., 1932.)

Concepts
Humans first applied genetics to the domestication of plants and animals between approximately 10,000 and 12,000 years ago. This domestication led to the development of agriculture and fixed human settlements.

Early Written Records

Ancient writings demonstrate that early humans were aware of their own heredity. Hindu sacred writings dating to 2000 years ago attribute many traits to the father and suggest that differences between siblings can be accounted for by effects from the mother.
These same writings advise that one should avoid potential spouses having undesirable traits that might be passed on to one’s children. The Talmud, the Jewish book of religious laws based on oral traditions dating back thousands of years, presents an uncannily accurate understanding of the inheritance of hemophilia. It directs that, if a woman bears two sons who die of bleeding after circumcision, any additional sons that she bears should not be circumcised; nor should the sons of her sisters be circumcised, although the sons of her brothers should. This advice accurately depicts the X-linked pattern of inheritance of hemophilia (discussed further in Chapter 6).

The ancient Greeks gave careful consideration to human reproduction and heredity. The Greek physician Alcmaeon (circa 520 B.C.) conducted dissections of animals and proposed that the brain was not only the principal site of perception, but also the origin of semen. This proposal sparked a long philosophical debate about where semen was produced and its role in heredity. The debate culminated in the concept of pangenesis, which proposed that specific particles, later called gemmules, carry information from various parts of the body to the reproductive organs, from where they are passed to the embryo at the moment of conception ( ◗ FIGURE 1.8a). Although incorrect, the concept of pangenesis was highly influential and persisted until the late 1800s.
Pangenesis led the ancient Greeks to propose the notion of the inheritance of acquired characteristics, in which traits acquired during one’s lifetime become incorporated into one’s hereditary information and are passed on to offspring; for example, people who developed musical ability through diligent study would produce children who are innately endowed with musical ability. The notion of the inheritance of acquired characteristics also is no longer accepted, but it remained popular through the twentieth century.

◗ 1.8 Pangenesis, an early concept of inheritance, compared with the modern germ-plasm theory.
(a) Pangenesis concept: 1 According to the pangenesis concept, genetic information from different parts of the body… 2 …travels to the reproductive organs… 3 …where it is transferred to the gametes.
(b) Germ-plasm theory: 1 According to the germ-plasm theory, germ-line tissue in the reproductive organs… 2 …contains a complete set of genetic information… 3 …that is transferred directly to the gametes.
(Labels in both panels: sperm, egg, zygote.)

The Greek philosopher Aristotle (384 – 322 B.C.) was keenly interested in heredity. He rejected the concepts of both pangenesis and the inheritance of acquired characteristics, pointing out that people sometimes resemble past ancestors more than their parents and that acquired characteristics such as mutilated body parts are not passed on. Aristotle believed that both males and females made contributions to the offspring and that there was a struggle of sorts between male and female contributions.

Although the ancient Romans contributed little to the understanding of human heredity, they successfully developed a number of techniques for animal and plant breeding; the techniques were based on trial and error rather than any general concept of heredity. Little new was added to the understanding of genetics in the next 1000 years.
The ancient ideas of pangenesis and the inheritance of acquired characteristics, along with techniques of plant and animal breeding, persisted until the rise of modern science in the seventeenth and eighteenth centuries.

The Rise of Modern Genetics

Dutch spectacle makers began to put together simple microscopes in the late 1500s, enabling Robert Hooke (1635 – 1703) to discover cells in 1665. Microscopes provided naturalists with new and exciting vistas on life, and perhaps it was excessive enthusiasm for this new world of the very small that gave rise to the idea of preformationism. According to preformationism, inside the egg or sperm existed a tiny miniature adult, a homunculus, which simply enlarged during development. Ovists argued that the homunculus resided in the egg, whereas spermists insisted that it was in the sperm ( ◗ FIGURE 1.9). Preformationism meant that all traits would be inherited from only one parent: from the father if the homunculus was in the sperm or from the mother if it was in the egg. Although many observations suggested that offspring possess a mixture of traits from both parents, preformationism remained a popular concept throughout much of the seventeenth and eighteenth centuries.

◗ 1.9 Preformationism was a popular idea of inheritance in the seventeenth and eighteenth centuries. Shown here is a drawing of a homunculus inside a sperm. (Science VU/Visuals Unlimited.)

Another early notion of heredity was blending inheritance, which proposed that offspring are a blend, or mixture, of parental traits. This idea suggested that the genetic material itself blends, much as blue and yellow pigments blend to make green paint. Once blended, genetic differences could not be separated out in future generations, just as green paint cannot be separated out into blue and yellow pigments. Some traits do appear to exhibit blending inheritance; however, we realize today that individual genes do not blend.

Nehemiah Grew (1641 – 1712) reported that plants reproduce sexually by using pollen from the male sex cells. With this information, a number of botanists began to experiment with crossing plants and creating hybrids. Foremost among these early plant breeders was Joseph Gottlieb Kölreuter (1733 – 1806), who carried out numerous crosses and studied pollen under the microscope. He observed that many hybrids were intermediate between the parental varieties. Because he crossed plants that differed in many traits, Kölreuter was unable to discern any general pattern of inheritance. In spite of this limitation, Kölreuter’s work set the foundation for the modern study of genetics. Subsequent to his work, a number of other botanists began to experiment with hybridization, including Gregor Mendel (1822 – 1884) ( ◗ FIGURE 1.10), who went on to discover the basic principles of heredity. Mendel’s conclusions, which were unappreciated for more than 30 years, laid the foundation for our modern understanding of heredity, and he is generally recognized today as the father of genetics.

◗ 1.10 Gregor Mendel was the founder of modern genetics. Mendel first discovered the principles of heredity by crossing different varieties of pea plants and analyzing the pattern of transmission of traits in subsequent generations. (Hulton/Archive by Getty Images.)

Developments in cytology (the study of cells) in the 1800s had a strong influence on genetics. Robert Brown (1773 – 1858) described the cell nucleus in 1833. Building on the work of others, Matthias Jacob Schleiden (1804 – 1881) and Theodor Schwann (1810 – 1882) proposed the concept of the cell theory in 1839. According to this theory, all life is composed of cells, cells arise only from preexisting cells, and the cell is the fundamental unit of structure and function in living organisms. Biologists began to examine cells to see how traits were transmitted in the course of cell division.

Charles Darwin (1809 – 1882), one of the most influential biologists of the nineteenth century, put forth the theory of evolution through natural selection and published his ideas in On the Origin of Species in 1859. Darwin recognized that heredity was fundamental to evolution, and he conducted extensive genetic crosses with pigeons and other organisms. However, he never understood the nature of inheritance, and this lack of understanding was a major omission in his theory of evolution.

In the last half of the nineteenth century, the invention of the microtome (for cutting thin sections of tissue for microscopic examination) and the development of improved histological stains stimulated a flurry of cytological research. Several cytologists demonstrated that the nucleus had a role in fertilization. Walter Flemming (1843 – 1905) observed the division of chromosomes in 1879 and published a superb description of mitosis. By 1885, it was generally recognized that the nucleus contained the hereditary information.

Near the close of the nineteenth century, August Weismann (1834 – 1914) finally laid to rest the notion of the inheritance of acquired characteristics. He cut off the tails of mice for 22 consecutive generations and showed that the tail length in descendants remained stubbornly long. Weismann proposed the germ-plasm theory, which holds that the cells in the reproductive organs carry a complete set of genetic information that is passed to the gametes (see Figure 1.8b).

The New Genetics
ETHICS • SCIENCE • TECHNOLOGY

Mapping the Human Genome: Where does it lead, and what does it mean?
by Arthur L. Caplan and Kelly A. Carroll

In June 2000, scientists from the Human Genome Project and Celera Genomics stood at a podium with former President Bill Clinton to announce a stunning achievement: they had successfully constructed a sequence of the entire human genome. Soon this process of identifying and sequencing each and every human gene became characterized as "mapping the human genome". As with maps of the physical world, the map of the human genome provides a picture of locations, terrains, and structures. But, like explorers, scientists must continue to decipher what each location on the map can tell us about diseases, human health, and biology. The map accelerates this process, as it allows researchers to identify key structural dimensions of the gene they are exploring, and reminds them where they have been and where they have yet to explore.

What does the map of the human genome depict? When researchers discuss the sequencing of the genome, they are describing the identification of the patterns and order of the 3 billion human DNA base pairs. While this provides valuable information about overall structure and the evolution of humans in relation to other organisms, researchers really wanted the key information encoded in just 2% of this enormous map: the information that makes most of the proteins that compose you and me. Composed of DNA, genes are the basic units of heredity; they hold all of the information required to make the proteins that regulate most life functions, from digesting food to battling diseases. Proteins stand as the link between genes and pharmaceutical drug development; they show which genes are being expressed at any given moment and provide information about gene function.

Knowing our genes will lead to greater understanding and radically improved treatment of many diseases. However, sequencing the entire human genome, in conjunction with sequencing of various nonhuman genomes under the same project, has raised fundamental questions about what it means to be human. After all, fruit flies possess about one-third as many genes as humans, and an ear of corn has approximately the same number of genes as a human! In addition, the overall DNA sequence of a chimpanzee is about 99% the same as the human genome sequence. As the genomes of other species become available, the similarities to the human genome in both structure and sequence pattern will continue to be identified. At a basic level, the discovery of so many commonalities and links and ancestral trees with other species adds credence to principles of evolution and Darwinism.

Some of the most anticipated developments and potential benefits of the Human Genome Project directly affect human health; researchers, practicing physicians, and the general public eagerly await the development of targeted pharmaceutical agents and more specific diagnostic tests. Pharmacogenomics is at the intersection of genetics and pharmacology; it is the study of how one's genetic makeup will affect his or her response to various drugs. In the future, medicine will potentially be safer, cheaper, and more disease specific, all while causing fewer side effects and acting more effectively the first time around.

There are, however, some hard ethical questions that follow in the wake of new genetic knowledge. Patients will have to undergo genetic testing in order to match drugs to their genetic makeup. Who will have access to these results: just the health care practitioner, or the patient's insurance company, employer/school, and/or family members? While the tests were administered for one case, will the information derived from them be used for other purposes, such as for identification of other conditions/future diseases, or even in research studies?

How should researchers conduct studies in pharmacogenomics? Often they need to group study subjects by some kind of identifiable traits that they believe will assist in separating groups of drugs, and in turn they separate people into populations. The order of almost all of the DNA base pairs (99.9%) is exactly the same in all humans. So, this leaves a small window of difference. There is potential, inherent in genomic research and analysis, for stigmatization of individuals and groups of people based on race and ethnicity. As scientists continue drug development, they must be careful to not further such ideas, especially as studies of nuclear DNA indicate that there is often more genetic variation within "races" or cultures than between "races" or cultures. Stigmatization or discrimination can occur through genetic testing and human subjects research on populations.

These are just a few of the ethical issues arising out of one development of the Human Genome Project. The potential applications of genome research are staggering, and the mapping is just the beginning. Realizing this was simply a starting point, the draft sequences of the human genome released in February 2001 by the publicly funded Human Genome Project and the private company, Celera Genomics, are freely available on the Internet. A long road lies ahead, where scientists will be charged with exploring and understanding the functions of and relationships between genes and proteins. With such exploration comes a responsibility to acknowledge and address the ethical, legal, and social implications of this exciting research.

Twentieth-Century Genetics

The year 1900 was a watershed in the history of genetics. Gregor Mendel’s pivotal 1866 publication on experiments with pea plants, which revealed the principles of heredity, was “rediscovered,” as discussed in more detail in Chapter 3. The significance of his conclusions was recognized, and other biologists immediately began to conduct similar genetic studies on mice, chickens, and other organisms. The results of these investigations showed that many traits indeed follow Mendel’s rules.
Walter Sutton (1877–1916) proposed in 1902 that genes are located on chromosomes. Thomas Hunt Morgan (1866–1945) discovered the first genetic mutant of fruit flies in 1910 and used fruit flies to unravel many details of transmission genetics. Ronald A. Fisher (1890–1962), John B. S. Haldane (1892–1964), and Sewall Wright (1889–1988) laid the foundation for population genetics in the 1930s.

Geneticists began to use bacteria and viruses in the 1940s; the rapid reproduction and simple genetic systems of these organisms allowed detailed study of the organization and structure of genes. At about this same time, evidence accumulated that DNA was the repository of genetic information. James Watson (b. 1928) and Francis Crick (b. 1916) described the three-dimensional structure of DNA in 1953, ushering in the era of molecular genetics.

By 1966, the chemical structure of DNA and the system by which it determines the amino acid sequence of proteins had been worked out. Advances in molecular genetics led to the first recombinant DNA experiments in 1973, which touched off another revolution in genetic research. Walter Gilbert (b. 1932) and Frederick Sanger (b. 1918) developed methods for sequencing DNA in 1977. The polymerase chain reaction, a technique for quickly amplifying tiny amounts of DNA, was developed by Kary Mullis (b. 1944) and others in 1986. In 1990, gene therapy was used for the first time to treat human genetic disease in the United States (◗ FIGURE 1.11), and the Human Genome Project was launched. By 1995, the first complete DNA sequence of a free-living organism (the bacterium Haemophilus influenzae) was determined, and the first complete sequence of a eukaryotic organism (yeast) was reported a year later. At the beginning of the twenty-first century, the human genome sequence was determined, ushering in a new era in genetics.

◗ 1.11 Gene therapy applies genetic engineering to the treatment of human diseases. (1) Cells are removed from the patient. (2) A new or corrected version of a gene is added to the cell, usually with the use of a genetically engineered virus. (3) The cells are then grown in a culture, tested, (4) and implanted into the patient. Figure labels: virus containing functional gene; cells; patient with genetic disease. (J. Coate, MDBD/Science VU/Visuals Unlimited.)

The Future of Genetics
The information content of genetics now doubles every few years. The genome sequences of many organisms are added to DNA databases every year, and new details about gene structure and function are continually expanding our knowledge of heredity. All of this information provides us with a better understanding of numerous biological processes and evolutionary relationships. The flood of new genetic information requires the continuous development of sophisticated computer programs to store, retrieve, compare, and analyze genetic data and has given rise to the field of bioinformatics, a merging of molecular biology and computer science.

In the future, the focus of DNA-sequencing efforts will shift from the genomes of different species to individual differences within species. It is reasonable to assume that each person may some day possess a copy of his or her entire genome sequence. New genetic microchips that simultaneously analyze thousands of RNA molecules will provide information about the activity of thousands of genes in a given cell, allowing a detailed picture of how cells respond to external signals, environmental stresses, and disease states. The use of genetics in the agricultural, chemical, and health-care fields will continue to expand; some predict that biotechnology will be to the twenty-first century what the electronics industry was to the twentieth century. This ever-widening scope of genetics will raise significant ethical, social, and economic issues.
This brief overview of the history of genetics is not intended to be comprehensive; rather it is designed to provide a sense of the accelerating pace of advances in genetics. In the chapters to come, we will learn more about the experiments and the scientists who helped shape the discipline of genetics.

www.whfreeman.com/pierce More information about the history of genetics

Concepts
Developments in plant hybridization and cytology in the eighteenth and nineteenth centuries laid the foundation for the field of genetics today. After Mendel’s work was rediscovered in 1900, the science of genetics developed rapidly and today is one of the most active areas of science.

Basic Concepts in Genetics
Undoubtedly, you learned some genetic principles in other biology classes. Let’s take a few moments to review some of these fundamental genetic concepts.

Cells are of two basic types: eukaryotic and prokaryotic. Structurally, cells are of two basic types, although, evolutionarily, the story is more complex (see Chapter 2). Prokaryotic cells lack a nuclear membrane and possess no membrane-bounded cell organelles, whereas eukaryotic cells are more complex, possessing a nucleus and membrane-bounded organelles such as chloroplasts and mitochondria.

A gene is the fundamental unit of heredity. The precise way in which a gene is defined often varies. At the simplest level, we can think of a gene as a unit of information that encodes a genetic characteristic. We will enlarge this definition as we learn more about what genes are and how they function.

Genes come in multiple forms called alleles. A gene that specifies a characteristic may exist in several forms, called alleles. For example, a gene for coat color in cats may exist in alleles that encode either black or orange fur.

Genes encode phenotypes. One of the most important concepts in genetics is the distinction between traits and genes. Traits are not inherited directly.
Rather, genes are inherited and, along with environmental factors, determine the expression of traits. The genetic information that an individual organism possesses is its genotype; the trait is its phenotype. For example, the A blood type is a phenotype; the genetic information that encodes the blood type A antigen is the genotype.

Genetic information is carried in DNA and RNA. Genetic information is encoded in the molecular structure of nucleic acids, which come in two types: deoxyribonucleic acid (DNA) and ribonucleic acid (RNA). Nucleic acids are polymers consisting of repeating units called nucleotides; each nucleotide consists of a sugar, a phosphate, and a nitrogenous base. The nitrogenous bases in DNA are of four types (abbreviated A, C, G, and T), and the sequence of these bases encodes genetic information. Most organisms carry their genetic information in DNA, but a few viruses carry it in RNA. The four nitrogenous bases of RNA are abbreviated A, C, G, and U.

Genes are located on chromosomes. The vehicles of genetic information within the cell are chromosomes (◗ FIGURE 1.12), which consist of DNA and associated proteins. The cells of each species have a characteristic number of chromosomes; for example, bacterial cells normally possess a single chromosome; human cells possess 46; pigeon cells possess 80. Each chromosome carries a large number of genes.

◗ 1.12 Genes are carried on chromosomes. (Biophoto Associates/Science Source/Photo Researchers.)

Chromosomes separate through the processes of mitosis and meiosis. The processes of mitosis and meiosis ensure that each daughter cell receives a complete set of an organism’s chromosomes. Mitosis is the separation of replicated chromosomes during the division of somatic (nonsex) cells. Meiosis is the pairing and separation of replicated chromosomes during the division of sex cells to produce gametes (reproductive cells).
Genetic information is transferred from DNA to RNA to protein. Many genes encode traits by specifying the structure of proteins. Genetic information is first transcribed from DNA into RNA, and then RNA is translated into the amino acid sequence of a protein.

Mutations are permanent, heritable changes in genetic information. Gene mutations affect only the genetic information of a single gene; chromosome mutations alter the number or the structure of chromosomes and therefore usually affect many genes.

Some traits are affected by multiple factors. Some traits are influenced by multiple genes that interact in complex ways with environmental factors. Human height, for example, is affected by hundreds of genes as well as environmental factors such as nutrition.

Evolution is genetic change. Evolution can be viewed as a two-step process: first, genetic variation arises and, second, some genetic variants increase in frequency, whereas other variants decrease in frequency.

www.whfreeman.com/pierce A glossary of genetics terms

Connecting Concepts Across Chapters
This chapter introduces the study of genetics, outlining its history, relevance, and some fundamental concepts. One of the themes that emerges from our review of the history of genetics is that humans have been interested in, and using, genetics for thousands of years, yet our understanding of the mechanisms of inheritance is relatively new. A number of ideas about how inheritance works have been proposed throughout history, but many of them have turned out to be incorrect. This is to be expected, because science progresses by constantly evaluating and challenging explanations. Genetics, like all science, is a self-correcting process, and thus many ideas that are proposed will be discarded or modified through time.

CONCEPTS SUMMARY
• Genetics is central to the life of every individual: it influences our physical features, susceptibility to numerous diseases, personality, and intelligence.
• Genetics plays important roles in agriculture, the pharmaceutical industry, and medicine. It is central to the study of biology.
• Genetic variation is the foundation of evolution and is critical to understanding all life.
• The study of genetics can be divided into transmission genetics, molecular genetics, and population genetics.
• The use of genetics by humans began with the domestication of plants and animals.
• The ancient Greeks developed the concept of pangenesis and the concept of the inheritance of acquired characteristics. Ancient Romans developed practical measures for the breeding of plants and animals.
• In the seventeenth century, biologists proposed the idea of preformationism, which suggested that a miniature adult is present inside the egg or the sperm and that a person inherits all of his or her traits from one parent.
• Another early idea, blending inheritance, proposed that genetic information blends during reproduction and offspring are a mixture of the parental traits.
• By studying the offspring of crosses between varieties of peas, Gregor Mendel discovered the principles of heredity.
• Darwin developed the concept of evolution by natural selection in the 1800s, but he was unaware of Mendel’s work and was not able to incorporate genetics into his theory.
• Developments in cytology in the nineteenth century led to the understanding that the cell nucleus is the site of heredity.
• In 1900, Mendel’s principles of heredity were rediscovered. Population genetics was established in the early 1930s, followed closely by biochemical genetics and bacterial and viral genetics. Watson and Crick discovered the structure of DNA in 1953, which stimulated the rise of molecular genetics.
• Advances in molecular genetics have led to gene therapy and the Human Genome Project.
• Cells come in two basic types: prokaryotic and eukaryotic.
• Genetics is the study of genes, which are the fundamental units of heredity.
• The genes that determine a trait are termed the genotype; the trait that they produce is the phenotype.
• Genes are located on chromosomes, which are made up of nucleic acids and proteins and are partitioned into daughter cells through the process of mitosis or meiosis.
• Genetic information is expressed through the transfer of information from DNA to RNA to proteins.
• Evolution requires genetic change in populations.

IMPORTANT TERMS
transmission genetics
molecular genetics
population genetics
pangenesis
inheritance of acquired characteristics
preformationism
blending inheritance
cell theory
germ-plasm theory

COMPREHENSION QUESTIONS
Answers to questions and problems preceded by an asterisk will be found at the end of the book.
1. Outline some of the ways in which genetics is important to each of us.
2. Give at least three examples of the role of genetics in society today.
3. Briefly explain why genetics is crucial to modern biology.
4. List the three traditional subdisciplines of genetics and summarize what each covers.
5. When and where did agriculture first arise? What role did genetics play in the development of the first domesticated plants and animals?
6. Outline the notion of pangenesis and explain how it differs from the germ-plasm theory.
7. What does the concept of the inheritance of acquired characteristics propose and how is it related to the notion of pangenesis?
8. What is preformationism? What did it have to say about how traits are inherited?
9. Define blending inheritance and contrast it with preformationism.
10. How did developments in botany in the seventeenth and eighteenth centuries contribute to the rise of modern genetics?
11. How did developments in cytology in the nineteenth century contribute to the rise of modern genetics?
*12. Who first discovered the basic principles that laid the foundation for our modern understanding of heredity?
13. List some advances in genetics that have occurred in the twentieth century.
*14. Briefly define the following terms: (a) gene; (b) allele; (c) chromosome; (d) DNA; (e) RNA; (f) genetics; (g) genotype; (h) phenotype; (i) mutation; (j) evolution.
15. What are the two basic cell types (from a structural perspective) and how do they differ?
16. Outline the relations between genes, DNA, and chromosomes.

APPLICATION QUESTIONS AND PROBLEMS
*17. Genetics is said to be both a very old science and a very young science. Explain what is meant by this statement.
18. Find at least one newspaper article that covers some aspect of genetics. Briefly summarize the article. Does this article focus on transmission, molecular, or population genetics?
19. The following concepts were widely believed at one time but are no longer accepted as valid genetic theories. What experimental evidence suggests that these concepts are incorrect and what theories have taken their place? (a) pangenesis; (b) the inheritance of acquired characteristics; (c) preformationism; (d) blending inheritance.

CHALLENGE QUESTIONS
20. Describe some of the ways in which your own genetic makeup affects you as a person. Be as specific as you can.
21. Pick one of the following ethical or social issues and give your opinion on this issue. For background information, you might read one of the articles on ethics listed and marked with an asterisk in Suggested Readings at the end of this chapter.
(a) Should a person’s genetic makeup be used in determining his or her eligibility for life insurance?
(b) Should biotechnology companies be able to patent newly sequenced genes?
(c) Should gene therapy be used on people?
(d) Should genetic testing be made available for inherited conditions for which there is no treatment or cure?
(e) Should governments outlaw the cloning of people?

SUGGESTED READINGS
Articles on ethical issues in genetics are preceded by an asterisk.
*American Society of Human Genetics Board of Directors and the American College of Medical Genetics Board of Directors. 1995. Points to consider: ethical, legal, psychosocial implications of genetic testing in children. American Journal of Human Genetics 57:1233–1241.
An official statement on some of the ethical, legal, and psychological considerations in conducting genetic tests on children by two groups of professional geneticists.

Dunn, L. C. 1965. A Short History of Genetics. New York: McGraw-Hill.
An excellent history of major developments in the field of genetics.

*Friedmann, T. 2000. Principles for human gene therapy studies. Science 287:2163–2165.
An editorial that outlines principles that serve as the foundation for clinical gene therapy.

Kottak, C. P. 1994. Anthropology: The Exploration of Human Diversity, 6th ed. New York: McGraw-Hill.
Contains a summary of the rise of agriculture and initial domestication of plants and animals.

Lander, E. S., and R. A. Weinberg. 2000. Genomics: journey to the center of biology. Science 287:1777–1782.
A succinct history of genetics and, more specifically, genomics written by two of the leaders of modern genetics.

McKusick, V. A. 1965. The royal hemophilia. Scientific American 213(2):88–95.
Contains a history of hemophilia in Queen Victoria’s descendants.

Massie, R. K. 1967. Nicholas and Alexandra. New York: Atheneum.
One of the classic histories of Tsar Nicholas and his family.

Massie, R. K. 1995. The Romanovs: The Final Chapter. New York: Random House.
Contains information about the finding of the Romanov remains and the DNA testing that verified the identity of the skeletons.

*Rosenberg, K., B. Fuller, M. Rothstein, T. Duster, et al. 1997. Genetic information and workplace: legislative approaches and policy challenges. Science 275:1755–1757.
Deals with the use of genetic information in employment.

*Shapiro, H. T. 1997. Ethical and policy issues of human cloning. Science 277:195–196.
Discussion of the ethics of human cloning.

Stubbe, H. 1972. History of Genetics: From Prehistoric Times to the Rediscovery of Mendel’s Laws. Translated by T. R. W. Waters. Cambridge, MA: MIT Press.
A good history of genetics, especially for pre-Mendelian genetics.

Sturtevant, A. H. 1965. A History of Genetics. New York: Harper and Row.
An excellent history of genetics.

*Verma, I. M., and N. Somia. 1997. Gene therapy: promises, problems, and prospects. Nature 389:239–242.
An update on the status of gene therapy.

2 Chromosomes and Cellular Reproduction

• The Diversity of Life
Basic Cell Types: Structure and Evolutionary Relationships
• Cell Reproduction
Prokaryotic Cell Reproduction
Eukaryotic Cell Reproduction
The Cell Cycle and Mitosis
• Sexual Reproduction and Genetic Variation
Meiosis
Consequences of Meiosis
Meiosis in the Life Cycle of Plants and Animals

(Chapter 2 opener photo: Art Wolfe/Photo Researchers.)

The Diversity of Life
More than by any other feature, life is characterized by diversity: 1.4 million species of plants, animals, and microorganisms have already been described, but this number vastly underestimates the total number of species on Earth. Consider the arthropods: insects, spiders, crustaceans, and related animals with hard exoskeletons. About 875,000 arthropods have been described by scientists worldwide. The results of recent studies, however, suggest that as many as 5 million to 30 million species of arthropods may be living in tropical rain forests alone. Furthermore, many species contain numerous genetically distinct populations, and each population contains genetically unique individuals.

Despite their tremendous diversity, living organisms have an important feature in common: all use the same genetic system. A complete set of genetic instructions for any organism is its genome, and all genomes are encoded in nucleic acids, either DNA or RNA.
The coding system for genomic information also is common to all life: genetic instructions are in the same format and, with rare exceptions, the code words are identical. Likewise, the process by which genetic information is copied and decoded is remarkably similar for all forms of life. This universal genetic system is a consequence of the common origin of living organisms; all life on Earth evolved from the same primordial ancestor that arose between 3.5 billion and 4 billion years ago. Biologist Richard Dawkins describes life as a river of DNA that runs through time, connecting all organisms past and present.

That all organisms have a common genetic system means that the study of one organism’s genes reveals principles that apply to other organisms. Investigations of how bacterial DNA is copied (replicated), for example, provide information that applies to the replication of human DNA. It also means that genes will function in foreign cells, which makes genetic engineering possible. Unfortunately, this common genetic system is also the basis for diseases such as AIDS (acquired immune deficiency syndrome), in which viral genes are able to function, sometimes with alarming efficiency, in human cells.

This chapter explores cell reproduction and how genetic information is transmitted to new cells. In prokaryotic cells, cell division is relatively simple because a prokaryotic cell usually possesses only a single chromosome. In eukaryotic cells, multiple chromosomes must be copied and distributed to each of the new cells. Cell division in eukaryotes takes place through mitosis and meiosis, processes that serve as the foundation for much of genetics; so it is essential to understand them well. Grasping mitosis and meiosis requires more than simply memorizing the sequences of events that take place in each stage, although these events are important.
The key is to understand how genetic information is apportioned during cell reproduction through a dynamic interplay of DNA synthesis, chromosome movement, and cell division. These processes bring about the transmission of genetic information and are the bases of similarities and differences between parents and progeny.

Basic Cell Types: Structure and Evolutionary Relationships
Biologists traditionally classify all living organisms into two major groups, the prokaryotes and the eukaryotes. A prokaryote is a unicellular organism with a relatively simple cell structure (◗ FIGURE 2.1). A eukaryote has a compartmentalized cell structure divided by intracellular membranes; eukaryotes may be unicellular or multicellular.

◗ 2.1 Prokaryotic and eukaryotic cells differ in structure. (Left to right: T.J. Beveridge/Visuals Unlimited; W. Baumeister/Science Photo Library/Photo Researchers; Biophoto Associates/Photo Researchers; G. Murti/Phototake.)

Research indicates that dividing life into two major groups, the prokaryotes and eukaryotes, is incorrect. Although similar in cell structure, prokaryotes include at least two fundamentally distinct types of bacteria. These distantly related groups are termed eubacteria (the true bacteria) and archaea (ancient bacteria). An examination of equivalent DNA sequences reveals that eubacteria and archaea are as distantly related to one another as they are to the eukaryotes. Although eubacteria and archaea are similar in cell structure, some genetic processes in archaea (such as transcription) are more similar to those in eukaryotes, and the archaea may actually be evolutionarily closer to eukaryotes than to eubacteria. Thus, from an evolutionary perspective, there are three major groups of organisms: eubacteria, archaea, and eukaryotes. In this book, the prokaryotic–eukaryotic distinction will be used frequently, but important eubacterial–archaeal differences also will be noted.
From the perspective of genetics, a major difference between prokaryotic and eukaryotic cells is that a eukaryote has a nuclear envelope, which surrounds the genetic material to form a nucleus and separates the DNA from the other cellular contents. In prokaryotic cells, the genetic material is in close contact with other components of the cell — a property that has important consequences for the way in which genes are controlled. Another fundamental difference between prokaryotes and eukaryotes lies in the packaging of their DNA. In eukaryotes, DNA is closely associated with a special class of proteins, the histones, to form tightly packed chromosomes. This complex of DNA and histone proteins is termed chromatin, which is the stuff of eukaryotic chromosomes ( ◗ FIGURE 2.2). Histone proteins limit the accessibility of enzymes and other proteins that copy and read the DNA but they enable the DNA to fit into the nucleus. Eukaryotic DNA must separate from the histones before the genetic information in the DNA can be accessed. Archaea also have some histone proteins that complex with DNA, but the structure of their chromatin is different from that found in eukaryotes. However, eubacteria do not possess histones, so their DNA does not exist in the highly ordered, tightly packed arrangement found in eukaryotic cells ( ◗ FIGURE 2.3). The copying and reading of DNA are therefore simpler processes in eubacteria. Genes of prokaryotic cells are generally on a single, circular molecule of DNA, the chromosome of the prokaryotic cell. In eukaryotic cells, genes are located on multiple, usually linear DNA molecules (multiple chromosomes). Eukaryotic cells therefore require mechanisms that ensure that a copy of each chromosome is faithfully transmitted to each new cell. This generalization — a single, circular chromosome in prokaryotes and multiple, linear chromosomes in eukaryotes — is not always true. 
A few bacteria have more than one chromosome, and important bacterial genes are frequently found on other DNA molecules called plasmids. Furthermore, in some eukaryotes, a few genes are located on circular DNA molecules found outside the nucleus (see Chapter 20).

◗ 2.2 In eukaryotic cells, DNA is complexed to histone proteins to form chromatin. Figure labels: histone proteins; DNA; chromatin.

◗ 2.3 Prokaryotic DNA (a) is not surrounded by a nuclear membrane nor is the DNA complexed with histone proteins; eukaryotic DNA (b) is complexed to histone proteins to form chromosomes that are located in the nucleus. (Part a, Dr. G. Murti/Science Photo Library/Photo Researchers; Part b, Biophoto Associates/Photo Researchers.)

◗ 2.4 A virus consists of DNA or RNA surrounded by a protein coat. (Hans Gelderblam/Visuals Unlimited.)

Concepts
Organisms are classified as prokaryotes or eukaryotes, and prokaryotes comprise archaea and eubacteria. A prokaryote is a unicellular organism that lacks a nucleus; its DNA is not complexed to histone proteins, and its genome is usually a single chromosome. Eukaryotes are either unicellular or multicellular; their cells possess a nucleus, their DNA is complexed to histone proteins, and their genomes consist of multiple chromosomes.

Viruses are relatively simple structures composed of an outer protein coat surrounding nucleic acid (either DNA or RNA; ◗ FIGURE 2.4). Viruses are neither cells nor primitive forms of life: they can reproduce only within host cells, which means that they must have evolved after, rather than before, cells. In addition, viruses are not an evolutionarily distinct group but are most closely related to their hosts: the genes of a plant virus are more similar to those in a plant cell than to those in animal viruses, which suggests that viruses evolved from their hosts, rather than from other viruses.
The close relationship between the genes of virus and host makes viruses useful for studying the genetics of host organisms.

www.whfreeman.com/pierce More information on the diversity of life and the evolutionary relationships among organisms

Cell Reproduction

For any cell to reproduce successfully, three fundamental events must take place: (1) its genetic information must be copied, (2) the copies of genetic information must be separated from one another, and (3) the cell must divide. All cellular reproduction includes these three events, but the processes that lead to these events differ in prokaryotic and eukaryotic cells.

Prokaryotic Cell Reproduction

When prokaryotic cells reproduce, the circular chromosome of the bacterium is replicated ( ◗ FIGURE 2.5). The two resulting identical copies are attached to the plasma membrane, which grows and gradually separates the two chromosomes. Finally, a new cell wall forms between the two chromosomes, producing two cells, each with an identical copy of the chromosome. Under optimal conditions, some bacterial cells divide every 20 minutes. At this rate, a single bacterial cell could produce a billion descendants in a mere 10 hours.

◗ 2.5 Prokaryotic cells reproduce by simple division. A prokaryotic cell contains a single circular chromosome attached to the plasma membrane. The chromosome replicates. As the plasma membrane grows, the two chromosomes separate. The cell divides. Each new cell has an identical copy of the original chromosome. (Micrograph Lee D. Simon/Photo Researchers.)

Eukaryotic Cell Reproduction

Like prokaryotic cell reproduction, eukaryotic cell reproduction requires the processes of DNA replication, copy separation, and division of the cytoplasm. However, the presence of multiple DNA molecules requires a more complex mechanism to ensure that one copy of each molecule ends up in each of the new cells.

Eukaryotic chromosomes are separated from the cytoplasm by the nuclear envelope. The nucleus was once thought to be a fluid-filled bag in which the chromosomes floated, but we now know that the nucleus has a highly organized internal scaffolding called the nuclear matrix. This matrix consists of a network of protein fibers that maintains precise spatial relations among the nuclear components and takes part in DNA replication, the expression of genes, and the modification of gene products before they leave the nucleus. We will now take a closer look at the structure of eukaryotic chromosomes.

Eukaryotic chromosomes

Each eukaryotic species has a characteristic number of chromosomes per cell: potatoes have 48 chromosomes, fruit flies have 8, and humans have 46. There appears to be no special relation between the complexity of an organism and its number of chromosomes per cell.

In most eukaryotic cells, there are two sets of chromosomes. The presence of two sets is a consequence of sexual reproduction; one set is inherited from the male parent and the other from the female parent. Each chromosome in one set has a corresponding chromosome in the other set, together constituting a homologous pair ( ◗ FIGURE 2.6). Human cells, for example, have 46 chromosomes, comprising 23 homologous pairs.

The two chromosomes of a homologous pair are usually alike in structure and size, and each carries genetic information for the same set of hereditary characteristics. (An exception is the sex chromosomes, which will be discussed in Chapter 4.) For example, if a gene on a particular chromosome encodes a characteristic such as hair color, another gene (called an allele) at the same position on that chromosome’s homolog also encodes hair color. However, these two alleles need not be identical: one might produce red hair and the other might produce blond hair. Thus, most cells carry two sets of genetic information; these cells are diploid. But not all eukaryotic cells are diploid: reproductive cells (such as eggs, sperm, and spores) and even nonreproductive cells in some organisms may contain a single set of chromosomes. Cells with a single set of chromosomes are haploid. Haploid cells have only one copy of each gene.

◗ 2.6 Diploid eukaryotic cells have two sets of chromosomes. (a) A set of chromosomes from a human cell. Humans have 23 pairs of chromosomes, including the sex chromosomes, X and Y. Males are XY, females are XX. (b) The chromosomes are present in homologous pairs, which consist of chromosomes that are alike in size and structure and carry information for the same characteristics. Allele A and allele a are two versions of a gene that code for a trait such as hair color. (Courtesy of Dr. Thomas Ried and Dr. Evelin Schrock.)

Concepts

Cells reproduce by copying and separating their genetic information and then dividing. Because eukaryotes possess multiple chromosomes, mechanisms exist to ensure that each new cell receives one copy of each chromosome. Most eukaryotic cells are diploid, and their two chromosome sets can be arranged in homologous pairs. Haploid cells contain a single set of chromosomes.

Chromosome structure

The chromosomes of eukaryotic cells are larger and more complex than those found in prokaryotes, but each unreplicated chromosome nevertheless consists of a single molecule of DNA. Although linear, the DNA molecules in eukaryotic chromosomes are highly folded and condensed; if stretched out, some human chromosomes would be several centimeters long, thousands of times longer than the span of a typical nucleus. To package such a tremendous length of DNA into this small volume, each DNA molecule is coiled again and again and tightly packed around histone proteins, forming the rod-shaped chromosomes.
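The claim that a stretched-out human chromosome would be thousands of times longer than the nucleus can be checked with rough arithmetic. The sketch below is only an illustration: the rise of about 0.34 nm per base pair, the figure of roughly 250 million base pairs for a large human chromosome, and the 6-micrometer nucleus are assumed values that are not given in the text.

```python
# Back-of-the-envelope check of DNA length versus nucleus size.
# All three constants are assumptions for illustration, not values from the text.
NM_PER_BASE_PAIR = 0.34             # approximate rise per base pair in B-form DNA
LARGE_CHROMOSOME_BP = 250_000_000   # rough size of a large human chromosome
NUCLEUS_DIAMETER_UM = 6.0           # assumed span of a typical nucleus


def stretched_length_cm(base_pairs: int) -> float:
    """Length of fully extended DNA, in centimeters."""
    length_nm = base_pairs * NM_PER_BASE_PAIR
    return length_nm * 1e-7  # 1 nm = 1e-7 cm


def length_to_nucleus_ratio(base_pairs: int, nucleus_um: float) -> float:
    """How many times longer the extended DNA is than the nucleus is wide."""
    length_um = base_pairs * NM_PER_BASE_PAIR * 1e-3  # 1 nm = 1e-3 micrometers
    return length_um / nucleus_um


if __name__ == "__main__":
    print(f"{stretched_length_cm(LARGE_CHROMOSOME_BP):.1f} cm")
    print(f"{length_to_nucleus_ratio(LARGE_CHROMOSOME_BP, NUCLEUS_DIAMETER_UM):,.0f} times")
```

Under these assumptions the extended DNA comes to several centimeters and exceeds the width of the nucleus by a factor in the low tens of thousands, which is consistent with the description above.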
Most of the time the chromosomes are thin and difficult to observe but, before cell division, they condense further into thick, readily observed structures; it is at this stage that chromosomes are usually studied ( ◗ FIGURE 2.7). A functional chromosome has three essential elements: a centromere, a pair of telomeres, and origins of replication. The centromere is the attachment point for spindle microtubules, which are the filaments responsible for moving chromosomes during cell division. The centromere appears as a constricted region that often stains less strongly than does the rest of the chromosome. Before cell division, a protein complex called the kinetochore assembles on the centromere, to which spindle microtubules later attach. Chromosomes without a centromere cannot be drawn into the newly formed nuclei; these chromosomes are lost, often with catastrophic consequences to the cell. On the basis of the location of the centromere, chromosomes are classified into four types: metacentric, submetacentric, acrocentric, and telocentric ( ◗ FIGURE 2.8). One of the two arms of a chromosome (the short arm of a submetacentric or acrocentric chromosome) is designated by the letter p and the other arm is designated by q. Telomeres are the natural ends, the tips, of a linear chromosome (see Figure 2.7); they serve to stabilize the chromosome ends. If a chromosome breaks, producing new ends, these ends have a tendency to stick together, and the chromosome is degraded at the newly broken ends. Telomeres provide chromosome stability. The results of research (discussed in Chapter 12) suggest that telomeres also participate in limiting cell division and may play important roles in aging and cancer. Origins of replication are the sites where DNA synthesis begins; they are not easily observed by microscopy. Their structure and function will be discussed in more detail in Chapters 11 and 12. 
In preparation for cell division, each chromosome replicates, making a copy of itself. These two initially identical copies, called sister chromatids, are held together at the centromere (see Figure 2.7). Each sister chromatid consists of a single molecule of DNA.

◗ 2.7 Structure of a eukaryotic chromosome. At times, a chromosome consists of a single chromatid; at other times, it consists of two (sister) chromatids. The telomeres are the stable ends of chromosomes. The centromere is a constricted region of the chromosome where the kinetochore forms and the spindle microtubules attach.

◗ 2.8 Eukaryotic chromosomes exist in four major types: metacentric, submetacentric, acrocentric, and telocentric. (L. Lisco, D. W. Fawcett/Visuals Unlimited.)

Concepts

Sister chromatids are copies of a chromosome held together at the centromere. Functional chromosomes contain centromeres, telomeres, and origins of replication. The kinetochore is the point of attachment for the spindle microtubules; telomeres are the stabilizing ends of a chromosome; origins of replication are sites where DNA synthesis begins.

The Cell Cycle and Mitosis

The cell cycle is the life story of a cell, the stages through which it passes from one division to the next ( ◗ FIGURE 2.9). This process is critical to genetics because, through the cell cycle, the genetic instructions for all characteristics are passed from parent to daughter cells.

A new cycle begins after a cell has divided and produced two new cells. A new cell metabolizes, grows, and develops. At the end of its cycle, the cell divides to produce two cells, which can then undergo additional cell cycles. The cell cycle consists of two major phases. The first is interphase, the period between cell divisions, in which the cell grows, develops, and prepares for cell division.
The second is M phase (mitotic phase), the period of active cell division. M phase includes mitosis, the process of nuclear division, and cytokinesis, or cytoplasmic division. Let’s take a closer look at the details of interphase and M phase.

◗ 2.9 The cell cycle consists of interphase (a period of cell growth) and M phase (the period of nuclear and cell division). (1) During G1, the cell grows. (2) Cells may enter G0, a nondividing phase. (3) After the G1/S checkpoint, the cell is committed to dividing. (4) In S, DNA duplicates. (5) In G2, the cell prepares for mitosis. (6) After the G2/M checkpoint, the cell can divide. (7) Mitosis and cytokinesis (cell division) take place in M phase.

Interphase

Interphase is the extended period of growth and development between cell divisions. Although little activity can be observed with a light microscope, the cell is quite busy: DNA is being synthesized, RNA and proteins are being produced, and hundreds of biochemical reactions are taking place.

By convention, interphase is divided into three phases: G1, S, and G2 (see Figure 2.9). Interphase begins with G1 (for gap 1). In G1, the cell grows, and proteins necessary for cell division are synthesized; this phase typically lasts several hours. A critical point in the cell cycle, termed the G1/S checkpoint, falls in G1; after this checkpoint has been passed, the cell is committed to divide.

Before reaching the G1/S checkpoint, cells may exit from the active cell cycle in response to regulatory signals and pass into a nondividing phase called G0 (see Figure 2.9), which is a stable state during which cells usually maintain a constant size. They can remain in G0 for an extended period of time, even indefinitely, or they can reenter G1 and the active cell cycle. Many cells never enter G0; rather, they cycle continuously.
After G1, the cell enters the S phase (for DNA synthesis), in which each chromosome duplicates. Although the cell is committed to divide after the G1/S checkpoint has been passed, DNA synthesis must take place before the cell can proceed to mitosis. If DNA synthesis is blocked (with drugs or by a mutation), the cell will not be able to undergo mitosis. Before S phase, each chromosome is composed of one chromatid; following S phase, each chromosome is composed of two chromatids.

After the S phase, the cell enters G2 (gap 2). In this phase, several additional biochemical events necessary for cell division take place. The important G2/M checkpoint is reached in G2; after this checkpoint has been passed, the cell is ready to divide and enters M phase. Although the length of interphase varies from cell type to cell type, a typical dividing mammalian cell spends about 10 hours in G1, 9 hours in S, and 4 hours in G2 (see Figure 2.9).

Throughout interphase, the chromosomes are in a relatively relaxed, but by no means uncoiled, state, and individual chromosomes cannot be seen with the use of a microscope. This condition changes dramatically when interphase draws to a close and the cell enters M phase.

M phase

M phase is the part of the cell cycle in which the copies of the cell’s chromosomes (sister chromatids) are separated and the cell undergoes division. A critical process in M phase is the separation of sister chromatids to provide a complete set of genetic information for each of the resulting cells. Biologists usually divide M phase into six stages: the five stages of mitosis (prophase, prometaphase, metaphase, anaphase, and telophase) and cytokinesis ( ◗ FIGURE 2.10). It’s important to keep in mind that M phase is a continuous process, and its separation into these six stages is somewhat artificial.

During interphase, the chromosomes are relaxed and are visible only as diffuse chromatin, but they condense during prophase, becoming visible under a light microscope.
Each chromosome possesses two chromatids because the chromosome was duplicated in the preceding S phase. The mitotic spindle, an organized array of microtubules that move the chromosomes in mitosis, forms. In animal cells, the spindle grows out from a pair of centrosomes that migrate to opposite sides of the cell. Within each centrosome is a special organelle, the centriole, which is also composed of microtubules. (Higher plant cells do not have centrosomes or centrioles, but they do have mitotic spindles). Disintegration of the nuclear membrane marks the start of prometaphase. Spindle microtubules, which until now have been outside the nucleus, enter the nuclear region. The ends of certain microtubules make contact with the chromosome and anchor to the kinetochore of one of the sister chromatids; a microtubule from the opposite centrosome then attaches to the other sister chromatid, and so each chromosome is anchored to both of the centrosomes. The microtubules lengthen and shorten, pushing and pulling the chromosomes about. Some microtubules extend from each centrosome toward the center of the spindle but do not attach to a chromosome. During metaphase, the chromosomes arrange themselves in a single plane, the metaphase plate, between the two centrosomes. The centrosomes, now at opposite ends of the cell with microtubules radiating outward and meeting in the middle of the cell, center at the spindle pole. Anaphase begins when the sister chromatids separate and move toward opposite spindle poles. After the chromatids have separated, each is considered a separate chromosome. Telophase is marked by the arrival of the chromosomes at the spindle poles. The nuclear membrane re-forms around each set of chromosomes, producing two separate nuclei within the cell. The chromosomes relax and lengthen, once again disappearing from view. In many cells, division of the cytoplasm (cytokinesis) is simultaneous with telophase. 
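The order of the phases described above, and the interphase timings quoted for a typical dividing mammalian cell (about 10 hours in G1, 9 in S, and 4 in G2), can be laid out as a small lookup. This is a sketch only: the names `CELL_CYCLE`, `next_stage`, and `interphase_share` are made up for illustration, and M phase is left untimed because the text gives no figure for it.

```python
# Stage order as described in the text. Cytokinesis often overlaps telophase,
# so listing it as a separate final step is a simplification.
CELL_CYCLE = [
    "G1", "S", "G2",
    "prophase", "prometaphase", "metaphase", "anaphase", "telophase",
    "cytokinesis",
]

# Approximate hours for a typical dividing mammalian cell, as quoted in the text.
INTERPHASE_HOURS = {"G1": 10, "S": 9, "G2": 4}


def next_stage(stage: str) -> str:
    """Return the stage that follows `stage`, wrapping from cytokinesis to G1."""
    position = CELL_CYCLE.index(stage)
    return CELL_CYCLE[(position + 1) % len(CELL_CYCLE)]


def interphase_share(phase: str) -> float:
    """Fraction of total interphase time spent in the given interphase phase."""
    return INTERPHASE_HOURS[phase] / sum(INTERPHASE_HOURS.values())


if __name__ == "__main__":
    print("Interphase lasts about", sum(INTERPHASE_HOURS.values()), "hours")
    for phase in INTERPHASE_HOURS:
        print(f"{phase}: {interphase_share(phase):.0%} of interphase")
    print("After G2 comes", next_stage("G2"))
```

With these numbers interphase totals about 23 hours, of which G1 takes a little under half. Cells that exit to G0 are not modeled here.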
The major features of the cell cycle are summarized in Table 2.1.

◗ 2.10 The cell cycle is divided into stages. (Photos © Andrew S. Bajer, University of Oregon.)

Table 2.1 Features of the cell cycle
Stage: Major features
G0 phase: Stable, nondividing period of variable length
Interphase
  G1 phase: Growth and development of the cell; G1/S checkpoint
  S phase: Synthesis of DNA
  G2 phase: Preparation for division; G2/M checkpoint
M phase
  Prophase: Chromosomes condense and mitotic spindle forms
  Prometaphase: Nuclear envelope disintegrates, spindle microtubules anchor to kinetochores
  Metaphase: Chromosomes align on the metaphase plate
  Anaphase: Sister chromatids separate, becoming individual chromosomes that migrate toward spindle poles
  Telophase: Chromosomes arrive at spindle poles, the nuclear envelope re-forms, and the condensed chromosomes relax
  Cytokinesis: Cytoplasm divides; cell wall forms in plant cells

Concepts

The active cell-cycle phases are interphase and M phase. Interphase consists of G1, S, and G2. In G1, the cell grows and prepares for cell division; in the S phase, DNA synthesis takes place; in G2, other biochemical events necessary for cell division take place. Some cells enter a quiescent phase called G0. M phase includes mitosis and cytokinesis and is divided into prophase, prometaphase, metaphase, anaphase, and telophase.

www.whfreeman.com/pierce Mitosis animations, tutorials, and pictures of dividing cells

Movement of Chromosomes in Mitosis

Each microtubule of the spindle is composed of subunits of a protein called tubulin, and each microtubule has direction or polarity. Like a flashlight battery, one end is referred to as plus (+) and the other end as minus (−). The “−” end is always oriented toward the centrosome, and the “+” end is always oriented away from the centrosome; microtubules lengthen and shorten by the addition and removal of subunits primarily at the “+” end.
At one time, chromosomes were viewed as passive carriers of genetic information that were pushed about by the active spindle microtubules. Research findings now indicate that chromosomes actively control and generate the forces responsible for their movement in the course of mitosis and meiosis. Chromosome movement is accomplished through complex interactions between the kinetochore of the chromosome and the microtubules of the spindle apparatus.

The forces responsible for the poleward movement of chromosomes during anaphase are generated at the kinetochore itself but are not completely understood. Located within each kinetochore are specialized proteins called molecular motors, which may help pull a chromosome toward the spindle pole ( ◗ FIGURE 2.11). The poleward force is created by the removal of tubulin primarily at the “+” end of the microtubule. In mitosis, depolymerization of tubulin and perhaps also molecular motors pull the chromosome toward the pole, but this force is initially counterbalanced by the attachment of the two chromatids. Throughout prophase, prometaphase, and metaphase, the sister chromatids are held together by a gluelike material called cohesin. The cohesin breaks down at the onset of anaphase, allowing the two chromatids to separate and the resulting newly formed chromosomes to move toward the spindle pole. While the chromosomal microtubules shorten, other microtubules elongate, pushing the two spindle poles farther apart. As the chromosomes near the spindle poles, they contract to form a compact mass. In spite of much study, the precise role of the poles, kinetochores, and microtubules in the formation and function of the spindle apparatus is still incompletely understood.

Genetic consequences of the cell cycle

What are the genetically important results of the cell cycle? From a single cell, the cell cycle produces two cells that contain the same genetic instructions.
These two cells are identical with each other and with the cell that gave rise to them. They are identical because DNA synthesis in S phase creates an exact copy of each DNA molecule, giving rise to two genetically identical sister chromatids. Mitosis then ensures that one chromatid from each replicated chromosome passes into each new cell. Another genetically important result of the cell cycle is that each of the cells produced contains a full complement of chromosomes; there is no net reduction or increase in chromosome number. Each cell also contains approximately half the cytoplasm and organelle content of the original parental cell, but no precise mechanism analogous to mitosis ensures that organelles are evenly divided. Consequently, not all cells resulting from the cell cycle are identical in their cytoplasmic content.

◗ 2.11 Removal of the tubulin subunits from microtubules at the kinetochore, and perhaps molecular motors, are responsible for the poleward movement of chromosomes during anaphase. (1) Spindle microtubules are composed of tubulin subunits, which are polar. The + end of the microtubule is oriented away from the centrosome, and the − end is oriented toward the centrosome. Microtubules lengthen and shorten primarily at the + end. (2) Molecular motor proteins on the chromosome kinetochore move along the microtubule, (3) and, as they do, tubulin subunits are removed from the positive end, (4) and the chromosome pulls itself toward the centrosome.

Control of the cell cycle

For many years, the biochemical events that controlled the progression of cells through the cell cycle were completely unknown, but research has now revealed many of the details of this process.
Progression of the cell cycle is regulated at several checkpoints, which ensure that all cellular components are present and in good working order before the cell proceeds to the next stage. The checkpoints are necessary to prevent cells with damaged or missing chromosomes from proliferating. One important checkpoint mentioned earlier, the G1/S checkpoint, comes just before the cell enters into S phase and replicates its DNA. When this point has been passed, DNA replicates and the cell is committed to divide. A second critical checkpoint, called the G2/M checkpoint, is at the end of G2, before the cell enters mitosis.

Both the G1/S and the G2/M checkpoints are regulated by a mechanism in which two proteins interact. The concentration of the first protein, cyclin, oscillates during the cell cycle ( ◗ FIGURE 2.12a). The second protein, cyclin-dependent kinase (CDK), cannot function unless it is bound to cyclin. Cyclins and CDKs are called by different names in different organisms, but here we will use the terms applied to these molecules in yeast.

Let’s begin by looking at the G2/M checkpoint. This checkpoint is regulated by cyclin B, which combines with CDK to form M-phase promoting factor (MPF). After MPF is formed, it must be activated by the addition of a phosphate group to one of the amino acids of CDK ( ◗ FIGURE 2.12b). Whereas the amount of cyclin B changes throughout the cell cycle, the amount of CDK remains constant. During G1, cyclin B levels are low; so the amount of MPF also is low (see Figure 2.12a). As more cyclin B is produced, it combines with CDK to form increasing amounts of MPF. Near the end of G2, the amount of active MPF reaches a critical level, which commits the cell to divide. The MPF concentration continues to increase, reaching a peak in mitosis (see Figure 2.12a). The active form of MPF is a protein kinase, an enzyme that adds phosphate groups to certain other proteins.
Active MPF brings about many of the events associated with mitosis, such as nuclear-membrane breakdown, spindle formation, and chromosome condensation. At the end of metaphase, cyclin is abruptly degraded, which lowers the amount of MPF and, initiating anaphase, sets in motion a chain of events that ultimately brings mitosis to a close (see Figure 2.12b). Ironically, active MPF brings about its own demise by destroying cyclin. In brief, high levels of active MPF stimulate mitosis, and low levels of MPF bring a return to interphase conditions.

A number of factors stimulate the synthesis of cyclin B and the activation of MPF, whereas other factors inhibit MPF. Together these factors determine whether the cell passes through the G2/M checkpoint and ensure that mitosis is not initiated until conditions are appropriate for cell division. For example, DNA damage inhibits the activation of MPF; the cell is arrested in G2 and does not undergo division.

The G1/S checkpoint is regulated in a similar manner. In fission yeast (Schizosaccharomyces pombe), the same CDK is used, but it combines with G1 cyclins. Again, the level of CDK remains relatively constant, whereas the level of G1 cyclins increases throughout G1. When the activated CDK–G1-cyclin complex reaches a critical concentration, proteins necessary for replication are activated and the cell enters S phase.

◗ 2.12 Progression through the cell cycle is regulated by cyclins and CDKs. Shown here is regulation of the G2/M checkpoint in yeast. (a) Cyclin B accumulates throughout interphase. Near the end of G2, active MPF reaches a critical level, which causes the cell to progress through the G2/M checkpoint and into mitosis. Degradation of cyclin B near the end of mitosis causes the active MPF level to drop, and the cell reenters interphase. (b) Increasing levels of cyclin B during interphase combine with CDK to produce increasing levels of inactive MPF. Near the end of interphase, activating factors add phosphate groups (P) to MPF, producing active MPF, which brings about the breakdown of the nuclear envelope, chromosome condensation, spindle assembly, and other events associated with M phase. Near the end of metaphase, cyclin B degradation lowers the amount of active MPF, which brings about anaphase, telophase, cytokinesis, and eventually interphase.

Many cancers are caused by defects in the cell cycle’s regulatory machinery. For example, mutation in the gene that encodes cyclin D, which has a role in the human G1/S checkpoint, contributes to the rise of B-cell lymphoma. The overexpression of this gene is associated with both breast and esophageal cancer. Likewise, the tumor-suppressor gene p53, which is mutated in about 75% of all colon cancers, regulates a potent inhibitor of CDK activity.

Concepts

The cell cycle produces two genetically identical cells, with no net change in chromosome number. Progression through the cell cycle is controlled at checkpoints, which are regulated by interactions between cyclins and cyclin-dependent kinases.

Connecting Concepts

Counting Chromosomes and DNA Molecules

The relations among chromosomes, chromatids, and DNA molecules frequently cause confusion.
At certain times, chromosomes are unreplicated; at other times, each possesses two chromatids (see Figure 2.7b). Chromosomes sometimes consist of a single DNA molecule; at other times, they consist of two DNA molecules. How can we keep track of the number of these structures in the cell cycle?

There are two simple rules for counting chromosomes and DNA molecules: (1) to determine the number of chromosomes, count the number of functional centromeres; (2) to determine the number of DNA molecules, count the number of chromatids.

Let’s examine a hypothetical cell as it passes through the cell cycle ( ◗ FIGURE 2.13). At the beginning of G1, this diploid cell has a complete set of four chromosomes, inherited from its parent cell. Each chromosome consists of a single chromatid (a single DNA molecule), so there are four DNA molecules in the cell during G1. In S phase, each DNA molecule is copied. The two resulting DNA molecules combine with histones and other proteins to form sister chromatids. Although the amount of DNA doubles during S phase, the number of chromosomes remains the same, because the two sister chromatids share a single functional centromere. At the end of S phase, this cell still contains four chromosomes, each with two chromatids; so there are eight DNA molecules present.

Through prophase, prometaphase, and metaphase, the cell has four chromosomes and eight DNA molecules. At anaphase, however, the sister chromatids separate. Each now has its own functional centromere, and so each is considered a separate chromosome. Until cytokinesis, the cell contains eight chromosomes, each consisting of a single chromatid; thus, there are still eight DNA molecules present. After cytokinesis, the eight chromosomes (eight DNA molecules) are distributed equally between two cells; so each new cell contains four chromosomes and four DNA molecules, the number present at the beginning of the cell cycle.

◗ 2.13 The number of chromosomes and DNA molecules changes in the course of the cell cycle. The number of chromosomes per cell equals the number of functional centromeres, and the number of DNA molecules per cell equals the number of chromatids. [Figure: number of chromosomes per cell and number of DNA molecules per cell, plotted for G1, S, G2, prophase and prometaphase, metaphase, anaphase, and telophase and cytokinesis.]

Sexual Reproduction and Genetic Variation

If all reproduction were accomplished through the cell cycle, life would be quite dull, because mitosis produces only genetically identical progeny. With only mitosis, you, your children, your parents, your brothers and sisters, your cousins, and many people you didn’t even know would be clones, copies of one another. Only the occasional mutation would introduce any genetic variability. This is how all organisms reproduced for the first 2 billion years of Earth’s existence (and the way in which some organisms still reproduce today). Then, some 1.5 billion to 2 billion years ago, something remarkable evolved: cells that produce genetically variable offspring through sexual reproduction.

The evolution of sexual reproduction is one of the most significant events in the history of life. As will be discussed in Chapters 22 and 23, the pace of evolution depends on the amount of genetic variation present. By shuffling the genetic information from two parents, sexual reproduction greatly increases the amount of genetic variation and allows for accelerated evolution. Most of the tremendous diversity of life on Earth is a direct result of sexual reproduction.

Sexual reproduction consists of two processes. The first is meiosis, which leads to gametes in which chromosome number is reduced by half. The second process is fertilization, in which two haploid gametes fuse and restore chromosome number to its original diploid value.
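Returning briefly to the two counting rules given earlier under Counting Chromosomes and DNA Molecules (chromosomes are counted by functional centromeres, DNA molecules by chromatids), the walk through the hypothetical cell with four chromosomes can be reproduced mechanically. The following is a minimal sketch, not a biological model; the `Chromosome` class and the function names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Chromosome:
    """One functional centromere carrying one or two chromatids."""
    chromatids: int


def count(cell: List[Chromosome]) -> Tuple[int, int]:
    """Rule 1: chromosomes = centromeres. Rule 2: DNA molecules = chromatids."""
    return len(cell), sum(c.chromatids for c in cell)


def s_phase(cell: List[Chromosome]) -> List[Chromosome]:
    """Each chromosome is copied; the sisters share a single centromere."""
    return [Chromosome(chromatids=2) for _ in cell]


def anaphase(cell: List[Chromosome]) -> List[Chromosome]:
    """Sister chromatids separate; each becomes its own chromosome."""
    separated: List[Chromosome] = []
    for chromosome in cell:
        separated.extend(Chromosome(chromatids=1) for _ in range(chromosome.chromatids))
    return separated


def cytokinesis(cell: List[Chromosome]) -> Tuple[List[Chromosome], List[Chromosome]]:
    """Former sisters sit next to each other in the list, so alternate them."""
    return cell[0::2], cell[1::2]


if __name__ == "__main__":
    g1 = [Chromosome(chromatids=1) for _ in range(4)]
    g2 = s_phase(g1)
    late = anaphase(g2)
    left, right = cytokinesis(late)
    for label, cell in [("G1", g1), ("after S", g2), ("anaphase", late),
                        ("daughter 1", left), ("daughter 2", right)]:
        print(label, count(cell))
```

The counts follow the text: four chromosomes and four DNA molecules in G1, four and eight after S phase, eight and eight in anaphase, and four and four in each daughter cell.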
Meiosis

The words mitosis and meiosis are sometimes confused. They sound a bit alike, and both include chromosome division and cytokinesis. Don’t let this deceive you. The outcomes of mitosis and meiosis are radically different, and several unique events that have important genetic consequences take place only in meiosis.

How is meiosis different from mitosis? Mitosis consists of a single nuclear division and is usually accompanied by a single cell division. Meiosis, on the other hand, consists of two divisions. After mitosis, chromosome number in newly formed cells is the same as that in the original cell, whereas meiosis causes chromosome number in the newly formed cells to be reduced by half. Finally, mitosis produces genetically identical cells, whereas meiosis produces genetically variable cells. Let’s see how these differences arise.

Like mitosis, meiosis is preceded by an interphase stage that includes G1, S, and G2 phases. Meiosis consists of two distinct phases: meiosis I and meiosis II, each of which includes a cell division. The first division is termed the reduction division because the number of chromosomes per cell is reduced by half ( ◗ FIGURE 2.14). The second division is sometimes termed the equational division because the events in this phase are similar to those of mitosis. However, meiosis II differs from mitosis in that chromosome number has already been halved in meiosis I, and the cell does not begin with the same number of chromosomes as it does in mitosis (see Figure 2.14).

◗ 2.14 Meiosis includes two cell divisions. In this figure, the original cell is 2n = 4. After two meiotic divisions, each resulting cell is 1n = 2.

The stages of meiosis are outlined in ◗ FIGURE 2.15. During interphase, the chromosomes are relaxed and visible as diffuse chromatin. Prophase I is a lengthy stage, divided into five substages ( ◗ FIGURE 2.16). In leptotene, the chromosomes contract and become visible.
In zygotene, the chromosomes continue to condense; homologous chromosomes begin to pair up and begin synapsis, a very close pairing association. Each homologous pair of synapsed chromosomes consists of four chromatids called a bivalent or tetrad. In pachytene, the chromosomes become shorter and thicker, and a three-part synaptonemal complex develops between homologous chromosomes. Crossing over takes place, in which homologous chromosomes exchange genetic information. The centromeres of the paired chromosomes move apart during diplotene; the two homologs remain attached at each chiasma (plural, chiasmata), which is the result of crossing over. In diakinesis, chromosome condensation continues, and the chiasmata move toward the ends of the chromosomes as the strands slip apart; so the homologs remain paired only at the tips. Near the end of prophase I, the nuclear membrane breaks down and the spindle forms.

Metaphase I is initiated when homologous pairs of chromosomes align along the metaphase plate (see Figure 2.15). A microtubule from one pole attaches to one chromosome of a homologous pair, and a microtubule from the other pole attaches to the other member of the pair. Anaphase I is marked by the separation of homologous chromosomes. The two chromosomes of a homologous pair are pulled toward opposite poles. Although the homologous chromosomes separate, the sister chromatids remain attached and travel together. In telophase I, the chromosomes arrive at the spindle poles and the cytoplasm divides.

The period between meiosis I and meiosis II is interkinesis, in which the nuclear membrane re-forms around the chromosomes clustered at each pole, the spindle breaks down, and the chromosomes relax. These cells then pass through prophase II, in which these events are reversed: the chromosomes recondense, the spindle re-forms, and the nuclear envelope once again breaks down. In interkinesis in some types of cells, the chromosomes remain condensed, and the spindle does not break down. These cells move directly from cytokinesis into metaphase II, which is similar to metaphase of mitosis: the individual chromosomes line up on the metaphase plate, with the sister chromatids facing opposite poles.

In anaphase II, the kinetochores of the sister chromatids separate and the chromatids are pulled to opposite poles. Each chromatid is now a distinct chromosome. In telophase II, the chromosomes arrive at the spindle poles, a nuclear envelope re-forms around the chromosomes, and the cytoplasm divides. The chromosomes relax and are no longer visible.

◗ 2.15 Meiosis is divided into stages. (Photos © C. A. Hasenkampf/BPS.)

◗ 2.16 Crossing over takes place in prophase I. In yeast, rough pairing of chromosomes begins in leptotene and continues in zygotene. The synaptonemal complex forms in pachytene. Crossing over is initiated in zygotene, before the synaptonemal complex develops, and is not completed until near the end of prophase I. [Panels show leptotene, zygotene, pachytene, diplotene, and diakinesis, with the synaptonemal complex, chiasmata, and the bivalent (tetrad) labeled.]

The major events of meiosis are summarized in Table 2.2.
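The reduction in chromosome number across the two meiotic divisions described above can be traced for the 2n = 4 cell of Figure 2.14. The code below is a simplified sketch under stated assumptions: chromosomes are plain tuples, the homolog labels "a" and "b" are arbitrary, crossing over is ignored, and all "a" homologs are sent to the same cell in meiosis I purely for convenience, whereas in a real cell each pair orients independently.

```python
from typing import List, Tuple

# A chromosome is (pair number, homolog label, number of chromatids).
Chromosome = Tuple[int, str, int]
Cell = List[Chromosome]


def counts(cell: Cell) -> Tuple[int, int]:
    """Return (number of chromosomes, number of DNA molecules)."""
    return len(cell), sum(chromatids for _, _, chromatids in cell)


def s_phase(cell: Cell) -> Cell:
    """Replicate every chromosome so that each has two sister chromatids."""
    return [(pair, homolog, 2) for pair, homolog, _ in cell]


def meiosis_one(cell: Cell) -> Tuple[Cell, Cell]:
    """Reduction division: homologs separate, sister chromatids stay together."""
    first = [c for c in cell if c[1] == "a"]
    second = [c for c in cell if c[1] == "b"]
    return first, second


def meiosis_two(cell: Cell) -> Tuple[Cell, Cell]:
    """Equational division: sister chromatids separate into two cells."""
    single = [(pair, homolog, 1) for pair, homolog, _ in cell]
    return single, list(single)


if __name__ == "__main__":
    start: Cell = [(1, "a", 1), (1, "b", 1), (2, "a", 1), (2, "b", 1)]  # 2n = 4
    replicated = s_phase(start)
    cell_1, cell_2 = meiosis_one(replicated)
    gametes = [*meiosis_two(cell_1), *meiosis_two(cell_2)]
    print("after S:", counts(replicated))
    print("after meiosis I:", counts(cell_1), counts(cell_2))
    print("after meiosis II:", [counts(g) for g in gametes])
```

In this sketch the chromosome number drops from four to two in meiosis I, while the number of DNA molecules per cell falls in two steps, from eight to four and then to two, giving four haploid cells.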
Table 2.2 Major events in each stage of meiosis
Stage | Major events
Meiosis I
Prophase I | Chromosomes condense, homologous pairs of chromosomes synapse, crossing over takes place, nuclear envelope breaks down, and spindle forms
Metaphase I | Homologous pairs of chromosomes line up on the metaphase plate
Anaphase I | The two chromosomes (each with two chromatids) of each homologous pair separate and move toward opposite poles
Telophase I | Chromosomes arrive at the spindle poles
Cytokinesis | The cytoplasm divides to produce two cells, each having half the original number of chromosomes
Interkinesis | In some cells the spindle breaks down, chromosomes relax, and a nuclear envelope re-forms, but no DNA synthesis takes place
Meiosis II
Prophase II* | Chromosomes condense, the spindle forms, and the nuclear envelope disintegrates
Metaphase II | Individual chromosomes line up on the metaphase plate
Anaphase II | Sister chromatids separate and migrate as individual chromosomes toward the spindle poles
Telophase II | Chromosomes arrive at the spindle poles; the spindle breaks down and a nuclear envelope re-forms
Cytokinesis | The cytoplasm divides
*Only in cells in which the spindle has broken down, chromosomes have relaxed, and the nuclear envelope has re-formed in telophase I. Other types of cells skip directly to metaphase II after cytokinesis.

Consequences of Meiosis
What are the overall consequences of meiosis? First, meiosis comprises two divisions; so each original cell produces four cells (there are exceptions to this generalization, as, for example, in many female animals; see Figure 2.22b). Second, chromosome number is reduced by half; so cells produced by meiosis are haploid. Third, cells produced by meiosis are genetically different from one another and from the parental cell. Genetic differences among cells result from two processes that are unique to meiosis.
The first is crossing over, which takes place in prophase I. Crossing over refers to the exchange of genes between nonsister chromatids (chromatids from different homologous chromosomes). At one time, this process was thought to take place in pachytene (Figure 2.15b), and the synaptonemal complex was believed to be a requirement for crossing over. However, recent evidence from yeast suggests that the situation is more complex, as shown in Figure 2.16. Crossing over is initiated in zygotene, before the synaptonemal complex develops, and is not completed until near the end of prophase I. After crossing over has taken place, the sister chromatids may no longer be identical. Crossing over is the basis for intrachromosomal recombination, creating new combinations of alleles on a chromatid.
To see how crossing over produces genetic variation, consider two pairs of alleles, which we will abbreviate Aa and Bb. Assume that one chromosome possesses the A and B alleles and its homolog possesses the a and b alleles ( ◗ FIGURE 2.17a). When DNA is replicated in the S stage, each chromosome duplicates, and so the resulting sister chromatids are identical ( ◗ FIGURE 2.17b). In the process of crossing over, breaks occur in the DNA strands and the breaks are repaired in such a way that segments of nonsister chromatids are exchanged ( ◗ FIGURE 2.17c). The molecular basis of this process will be described in more detail in Chapter 12; the important thing here is that, after crossing over has taken place, the two sister chromatids are no longer identical: one chromatid has alleles A and B, whereas its sister chromatid (the chromatid that underwent crossing over) has alleles a and B. Likewise, one chromatid of the other chromosome has alleles a and b, and the other has alleles A and b. Each of the four chromatids now carries a unique combination of alleles: A B, a B, A b, and a b. Eventually, the two homologous chromosomes separate, each going into a different cell. In meiosis II, the two chromatids of each chromosome separate, and thus each of the four cells resulting from meiosis carries a different combination of alleles ( ◗ FIGURE 2.17d).
◗ 2.17 Crossing over produces genetic variation. (a) One chromosome possesses the A and B genes, and the homologous chromosome possesses the a and b genes. (b) DNA replication during S phase produces identical sister chromatids. (c) During crossing over in prophase I, segments of nonsister chromatids are exchanged. (d) After meiosis I and II, each of the resulting cells carries a unique combination of genes.
The second process of meiosis that contributes to genetic variation is the random distribution of chromosomes in anaphase I of meiosis following their random alignment during metaphase I. To illustrate this process, consider a cell with three pairs of chromosomes I, II, and III ( ◗ FIGURE 2.18a). One chromosome of each pair is maternal in origin (Im, IIm, and IIIm); the other is paternal in origin (Ip, IIp, and IIIp). The chromosome pairs line up in the center of the cell in metaphase I and, in anaphase I, the chromosomes of each homologous pair separate. How each pair of homologs aligns and separates is random and independent of how other pairs of chromosomes align and separate ( ◗ FIGURE 2.18b). By chance, all the maternal chromosomes might migrate to one side, with all the paternal chromosomes migrating to the other. After division, one cell would contain chromosomes Im, IIm, and IIIm, and the other, Ip, IIp, and IIIp. Alternatively, the Im, IIm, and IIIp chromosomes might move to one side, and the Ip, IIp, and IIIm chromosomes to the other. The different migrations would produce different combinations of chromosomes in the resulting cells ( ◗ FIGURE 2.18c). There are four ways in which a diploid cell with three pairs of chromosomes can divide, producing a total of eight different combinations of chromosomes in the gametes. In general, the number of possible combinations is 2^n (2 raised to the power n), where n equals the number of homologous pairs. As the number of chromosome pairs increases, the number of combinations quickly becomes very large. In humans, who have 23 pairs of chromosomes, there are 2^23 = 8,388,608 different combinations of chromosomes possible from the random separation of homologous chromosomes. Through the random distribution of chromosomes in anaphase I, alleles located on different chromosomes are sorted into different combinations. The genetic consequences of this process, termed independent assortment, will be explored in more detail in Chapter 3.
◗ 2.18 Genetic variation is produced through the random distribution of chromosomes in meiosis. In this example, the cell shown possesses three homologous pairs of chromosomes. (a) One of each pair is maternal in origin (Im, IIm, IIIm) and the other is paternal (Ip, IIp, IIIp). (b) After DNA replication, there are four possible ways for the three pairs to align in metaphase I. (c) The resulting gametes carry the combinations Im IIm IIIm, Ip IIp IIIp, Im IIm IIIp, Ip IIp IIIm, Im IIp IIIp, Ip IIm IIIm, Im IIp IIIm, and Ip IIm IIIp.
In summary, crossing over shuffles alleles on the same homologous chromosomes into new combinations, whereas the random distribution of maternal and paternal chromosomes shuffles alleles on different chromosomes into new combinations. Together, these two processes are capable of producing tremendous amounts of genetic variation among the cells resulting from meiosis.
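The counting rule for the random distribution of chromosomes (2 raised to the power n combinations for n homologous pairs) can be checked with a short Python sketch. The function names gamete_combinations and count_combinations are illustrative assumptions, not terms from the text, and the sketch deliberately ignores crossing over: each pair simply contributes either its maternal (m) or its paternal (p) chromosome, independently of the other pairs.

```python
from itertools import product


def gamete_combinations(pairs):
    """List every maternal/paternal combination for the named chromosome pairs.

    Hypothetical helper for illustration: each homologous pair contributes
    either its maternal (m) or paternal (p) copy, independently of the others.
    """
    return [
        tuple(f"{name}{origin}" for name, origin in zip(pairs, choice))
        for choice in product("mp", repeat=len(pairs))
    ]


def count_combinations(n_pairs):
    """Number of combinations from random assortment alone: 2 to the power n."""
    return 2 ** n_pairs


# The three-pair cell described in the text (pairs I, II, and III).
combos = gamete_combinations(["I", "II", "III"])
for combo in combos:
    print(" ".join(combo))

print(len(combos))             # eight combinations for three pairs
print(count_combinations(23))  # human case: 23 pairs
```

Enumerating the combinations rather than only computing the power makes it easy to compare the output with the eight gamete types described for the three-pair example; for 23 pairs only the count is practical to print.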
Conclusion (Figure 2.18): Eight different combinations of chromosomes in the gametes are possible, depending on how the chromosomes align and separate in meiosis I and II.
Concepts
Meiosis consists of two distinct divisions: meiosis I and meiosis II. Meiosis (usually) produces four haploid cells that are genetically variable. The two processes responsible for genetic variation are crossing over and the random distribution of maternal and paternal chromosomes.
www.whfreeman.com/pierce A tutorial and animations of meiosis
Connecting Concepts
Comparison of Mitosis and Meiosis
◗ 2.19 Comparison of mitosis and meiosis. Mitosis: a parent cell (2n) passes through prophase, metaphase (individual chromosomes align on the metaphase plate), and anaphase (chromatids separate), producing two daughter cells, each 2n. Meiosis: a parent cell (2n) passes through prophase I (crossing over takes place), metaphase I (homologous pairs of chromosomes align on the metaphase plate), anaphase I (pairs of chromosomes separate), interkinesis, metaphase II (individual chromosomes align), and anaphase II (chromatids separate), producing four daughter cells, each n.
Now that we have examined the details of mitosis and meiosis, let’s compare the two processes ( ◗ FIGURE 2.19). In both mitosis and meiosis, the chromosomes contract and become visible; both processes include the movement of chromosomes toward the spindle poles, and both are accompanied by cell division. Beyond these similarities, the processes are quite different. Mitosis entails a single cell division and usually produces two daughter cells. Meiosis, in contrast, comprises two cell divisions and usually produces four cells. In diploid cells, homologous chromosomes are present before both meiosis and mitosis, but the pairing of homologs takes place only in meiosis. Another difference is that, in meiosis, chromosome number is reduced by half in anaphase I, but no chromosome reduction takes place in mitosis.
Furthermore, meiosis is characterized by two processes that produce genetic variation: crossing over (in prophase I) and the random distribution of maternal and paternal chromosomes (in anaphase I). There are normally no equivalent processes in mitosis. Mitosis and meiosis also differ in the behavior of chromosomes in metaphase and anaphase. In metaphase I of meiosis, homologous pairs of chromosomes line up on the metaphase plate, whereas individual chromosomes line up on the metaphase plate in metaphase of mitosis (and metaphase II of meiosis). In anaphase I of meiosis, paired chromosomes separate, and each of the chromosomes that migrate toward a pole possesses two chromatids attached at the centromere. In contrast, in anaphase of mitosis (and anaphase II of meiosis), chromatids separate, and each chromosome that moves toward a spindle pole consists of a single chromatid.

Meiosis in the Life Cycle of Plants and Animals
The overall result of meiosis is four haploid cells that are genetically variable. Let’s now see where meiosis fits into the life cycle of a multicellular plant and a multicellular animal.
Sexual reproduction in plants Most plants have a complex life cycle that includes two distinct generations (stages): the diploid sporophyte and the haploid gametophyte. These two stages alternate; the sporophyte produces haploid spores through meiosis, and the gametophyte produces haploid gametes through mitosis ( ◗ FIGURE 2.20). This type of life cycle is sometimes called alternation of generations. In this cycle, the immediate products of meiosis are called spores, not gametes; the spores undergo one or more mitotic divisions to produce gametes. Although the terms used for this process are somewhat different from those commonly used in regard to animals (and from some of those employed so far in this chapter), the processes in plants and animals are basically the same: in both, meiosis leads to a reduction in chromosome number, producing haploid cells.
◗ 2.20 Plants alternate between diploid and haploid life stages. (1) Through meiosis, the diploid (2n) sporophyte produces haploid (1n) spores, which become the gametophyte. (2) Through mitosis, the gametophytes produce haploid gametes (3) that fuse during fertilization to form a diploid zygote. (4) Through mitosis, the zygote becomes the diploid sporophyte.
In flowering plants, the sporophyte is the obvious, vegetative part of the plant; the gametophyte consists of only a few haploid cells within the sporophyte. The flower, which is part of the sporophyte, contains the reproductive structures. In some plants, both male and female reproductive structures are found in the same flower; in other plants, they exist in different flowers. In either case, the male part of the flower, the stamen, contains diploid reproductive cells called microsporocytes, each of which undergoes meiosis to produce four haploid microspores ( ◗ FIGURE 2.21a). Each microspore divides mitotically, producing an immature pollen grain consisting of two haploid nuclei. One of these nuclei, called the tube nucleus, directs the growth of a pollen tube. The other, termed the generative nucleus, divides mitotically to produce two sperm cells. The pollen grain, with its two haploid nuclei, is the male gametophyte.
The female part of the flower, the ovary, contains diploid cells called megasporocytes, each of which undergoes meiosis to produce four haploid megaspores ( ◗ FIGURE 2.21b), only one of which survives. The nucleus of the surviving megaspore divides mitotically three times, producing a total of eight haploid nuclei that make up the female gametophyte, the embryo sac. Division of the cytoplasm then produces separate cells, one of which becomes the egg.
When the plant flowers, the stamens open and release pollen grains. Pollen lands on a flower’s stigma, a sticky platform that sits on top of a long stalk called the style. At the base of the style is the ovary. If a pollen grain germinates, it grows a tube down the style into the ovary. The two sperm cells pass down this tube and enter the embryo sac ( ◗ FIGURE 2.21c). One of the sperm cells fertilizes the egg cell, producing a diploid zygote, which develops into an embryo. The other sperm cell fuses with two nuclei enclosed in a single cell, giving rise to a 3n (triploid) endosperm, which stores food that will be used later by the embryonic plant. These two fertilization events are termed double fertilization.
Concepts
In the stamen of a flowering plant, meiosis produces haploid microspores that divide mitotically to produce haploid sperm in a pollen grain. Within the ovary, meiosis produces four haploid megaspores, only one of which divides mitotically three times to produce eight haploid nuclei. During pollination, one sperm fertilizes the egg cell, producing a diploid zygote; the other fuses with two nuclei to form the endosperm.
◗ 2.21 Sexual reproduction in flowering plants. (a) Stamen: (1) In the stamen, diploid microsporocytes undergo meiosis (2) to produce four haploid microspores. (3) Each undergoes mitosis to produce a pollen grain with two haploid nuclei. (4) The tube nucleus directs the growth of a pollen tube. (5) The generative nucleus divides mitotically to produce two sperm cells. (b) Ovary: (6) Diploid megasporocytes undergo meiosis (7) to produce four haploid megaspores, but only one survives. (8) The surviving megaspore divides mitotically three times (9) to produce eight haploid nuclei. (10) The cytoplasm divides, producing separate cells, (11) one of which becomes the egg. (12) Two of the nuclei become enclosed within the same cell, (13) and the other nuclei are partitioned into separate cells. (c) Embryo sac: (14) Double fertilization takes place when the two sperm cells of a pollen grain enter the embryo sac. (15) One sperm cell fertilizes the egg cell, producing a diploid zygote, which becomes the embryo (diploid, 2n). (16) The other sperm cell fuses with the binucleate cell to form triploid (3n) endosperm.
◗ 2.22 Gamete formation in animals. (a) Male gametogenesis (spermatogenesis): Spermatogonia (2n) in the testis can undergo repeated rounds of mitosis, producing more spermatogonia. A spermatogonium may enter prophase I, becoming a primary spermatocyte (2n). Each primary spermatocyte completes meiosis I, producing two secondary spermatocytes (1n) that then undergo meiosis II to produce two haploid spermatids (1n) each. Spermatids mature into sperm. (b) Female gametogenesis (oogenesis): Oogonia (2n) in the ovary may either undergo repeated rounds of mitosis, producing additional oogonia, or enter prophase I, becoming primary oocytes (2n). Each primary oocyte completes meiosis I, producing a large secondary oocyte (1n) and a smaller polar body, which disintegrates. The secondary oocyte completes meiosis II, producing an ovum (1n) and a second polar body, which also disintegrates. A sperm and ovum fuse at fertilization to produce a diploid zygote (2n).
Meiosis in animals The production of gametes in a male animal (spermatogenesis) takes place in the testes. There, diploid primordial germ cells divide mitotically to produce diploid cells called spermatogonia ( ◗ FIGURE 2.22a). Each spermatogonium can undergo repeated rounds of mitosis, giving rise to numerous additional spermatogonia. Alternatively, a spermatogonium can initiate meiosis and enter into prophase I. Now called a primary spermatocyte, the cell is still diploid because the homologous chromosomes have not yet separated. Each primary spermatocyte completes meiosis I, giving rise to two haploid secondary spermatocytes that then undergo meiosis II, with each producing two haploid spermatids. Thus, each primary spermatocyte produces a total of four haploid spermatids, which mature and develop into sperm. The production of gametes in the female (oogenesis) begins much like spermatogenesis. Diploid primordial germ cells within the ovary divide mitotically to produce oogonia ( ◗ FIGURE 2.22b). Like spermatogonia, oogonia can undergo repeated rounds of mitosis or they can enter into meiosis. Once in prophase I, these still-diploid cells are called primary oocytes. Each primary oocyte completes meiosis I and divides. Here the process of oogenesis begins to differ from that of spermatogenesis. In oogenesis, cytokinesis is unequal: most of the cytoplasm is allocated to one of the two haploid cells, the secondary oocyte. The smaller cell, which contains half of the chromosomes but only a small part of the cytoplasm, is called the first polar body; it may or may not divide further. The secondary oocyte completes meiosis II, and again cytokinesis is unequal — most of the cytoplasm passes into one of the cells. The larger cell, which acquires most of the cytoplasm, is the ovum, the mature female gamete. The smaller cell is the second polar body. Only the ovum is capable of being fertilized, and the polar bodies usually disintegrate. 
Oogenesis, then, produces a single mature gamete from each primary oocyte.
We have now examined the place of meiosis in the sexual cycle of two organisms, a flowering plant and a typical multicellular animal. These cycles are just two of the many variations found among eukaryotic organisms. Although the cellular events that produce reproductive cells in plants and animals differ in the number of cell divisions, the number of haploid gametes produced, and the relative size of the final products, the overall result is the same: meiosis gives rise to haploid, genetically variable cells that then fuse during fertilization to produce diploid progeny.
Concepts
In the testes, a diploid spermatogonium undergoes meiosis, producing a total of four haploid sperm cells. In the ovary, a diploid oogonium undergoes meiosis to produce a single large ovum and smaller polar bodies that normally disintegrate.
Connecting Concepts Across Chapters
This chapter focused on the processes that bring about cell reproduction, the starting point of all genetics. We have examined four major concepts: (1) the differences that exist in the organization and packaging of genetic material in prokaryotic and eukaryotic cells; (2) the cell cycle and its genetic results; (3) meiosis, its genetic results, and how it differs from mitosis of the cell cycle; and (4) how meiosis fits into the reproductive cycles of plants and animals.
Several of the concepts presented in this chapter serve as an important foundation for topics in other chapters of this book. The fundamental differences in the organization of genetic material of prokaryotes and eukaryotes are important to keep in mind as we explore the molecular functioning of DNA. The presence of histone proteins in eukaryotes affects the way that DNA is copied (Chapter 12) and read (Chapter 13).
The direct contact between DNA and cytoplasmic organelles in prokaryotes and the separation of DNA by the nuclear envelope in eukaryotes have important implications for gene regulation (Chapter 16) and the way that gene products are modified before they are translated into proteins (Chapter 14). The smaller amount of DNA per cell in prokaryotes also affects the organization of genes on chromosomes (Chapter 11).
A critical concept in this chapter is meiosis, which serves as the cellular basis of genetic crosses in most eukaryotic organisms. It is the basis for the rules of inheritance presented in Chapters 3 through 6 and provides a foundation for almost all of the remaining chapters of this book.

CONCEPTS SUMMARY
• A prokaryotic cell possesses a simple structure, with no nuclear envelope and usually a single, circular chromosome. A eukaryotic cell possesses a more complex structure, with a nucleus and multiple linear chromosomes consisting of DNA complexed to histone proteins.
• Cell reproduction requires the copying of the genetic material, separation of the copies, and cell division.
• In a prokaryotic cell, the single chromosome replicates, and each copy attaches to the plasma membrane; growth of the plasma membrane separates the two copies, which is followed by cell division.
• In eukaryotic cells, reproduction is more complex than in prokaryotic cells, requiring mitosis and meiosis to ensure that a complete set of genetic information is transferred to each new cell.
• In eukaryotic cells, chromosomes are typically found in homologous pairs.
• Each functional chromosome consists of a centromere, a telomere, and multiple origins of replication. Centromeres are the points at which kinetochores assemble and to which microtubules attach. Telomeres are the stable ends of chromosomes. After a chromosome is copied, the two copies remain attached at the centromere, forming sister chromatids.
• The cell cycle consists of the stages through which a eukaryotic cell passes between cell divisions. It consists of: (1) interphase, in which the cell grows and prepares for division and (2) M phase, in which nuclear and cell division take place. M phase consists of mitosis, the process of nuclear division, and cytokinesis, the division of the cytoplasm.
• Interphase begins with G1, in which the cell grows and synthesizes proteins necessary for cell division, followed by S phase, during which the cell’s DNA is replicated. The cell then enters G2, in which additional biochemical events necessary for cell division take place. Some cells exit G1 and enter a nondividing state called G0.
• M phase consists of prophase, prometaphase, metaphase, anaphase, telophase, and cytokinesis. In these stages, the chromosomes contract, the nuclear membrane breaks down, and the spindle forms. The chromosomes line up in the center of the cell. Sister chromatids separate and become independent chromosomes, which then migrate to opposite ends of the cell. The nuclear membrane re-forms around chromosomes at each end of the cell, and the cytoplasm divides.
• The usual result of mitosis is the production of two genetically identical cells.
• Progression through the cell cycle is controlled by interactions between cyclins and cyclin-dependent kinases.
• Sexual reproduction produces genetically variable progeny and allows for accelerated evolution. It includes meiosis, in which haploid sex cells are produced, and fertilization, the fusion of sex cells. Meiosis includes two cell divisions. In meiosis I, crossing over occurs and homologous chromosomes separate. In meiosis II, chromatids separate.
• The usual result of meiosis is the production of four haploid cells that are genetically variable.
• Genetic variation in meiosis is produced by crossing over and by the random distribution of maternal and paternal chromosomes.
• In plants, diploid microsporocytes in the stamens undergo meiosis, each microsporocyte producing four haploid microspores. Each microspore divides mitotically to produce two haploid sperm cells. In the ovary, diploid megasporocytes undergo meiosis, each megasporocyte producing four haploid megaspores, only one of which survives. The surviving megaspore divides mitotically three times to produce eight haploid nuclei, one of which forms the egg. During pollination, one sperm fertilizes the egg cell and the other fuses with two haploid nuclei to form a 3n endosperm.
• In animals, diploid spermatogonia initiate meiosis and become diploid primary spermatocytes, which then complete meiosis I, producing two haploid secondary spermatocytes. Each secondary spermatocyte undergoes meiosis II, producing a total of four haploid sperm cells from each primary spermatocyte. Diploid oogonia in the ovary enter meiosis and become diploid primary oocytes, each of which then completes meiosis I, producing one large haploid secondary oocyte and one small haploid polar body. The secondary oocyte completes meiosis II to produce a large haploid ovum and a smaller second polar body.

IMPORTANT TERMS
genome (p. 16) prokaryote (p. 17) eukaryote (p. 17) eubacteria (p. 18) archaea (p. 18) nucleus (p. 18) histone (p. 18) chromatin (p. 18) homologous pair (p. 20) diploid (p. 20) haploid (p. 20) telomere (p. 21) origin of replication (p. 21) sister chromatid (p. 22) cell cycle (p. 22) interphase (p. 22) M phase (p. 22) mitosis (p. 22) cytokinesis (p. 22) prophase (p. 23) prometaphase (p. 23) metaphase (p. 23) anaphase (p. 23) telophase (p. 23) meiosis (p. 29) fertilization (p. 29) prophase I (p. 29) synapsis (p. 29) bivalent (p. 29) tetrad (p. 29) crossing over (p. 29) metaphase I (p. 30) anaphase I (p. 30) telophase I (p. 30) interkinesis (p. 30) prophase II (p. 30) metaphase II (p. 31) anaphase II (p. 31) telophase II (p. 31) recombination (p. 33) microsporocyte (p. 36) microspore (p. 36) megasporocyte (p. 36) megaspore (p. 36) spermatogenesis (p. 38) spermatogonium (p. 38) primary spermatocyte (p. 38) secondary spermatocyte (p. 38) spermatid (p. 38) oogenesis (p. 38) oogonium (p. 38) primary oocyte (p. 38) secondary oocyte (p. 38) first polar body (p. 38) ovum (p. 38) second polar body (p. 38)

Worked Problems
1. A student examines a thin section of an onion root tip and records the number of cells that are in each stage of the cell cycle. She observes 94 cells in interphase, 14 cells in prophase, 3 cells in prometaphase, 3 cells in metaphase, 5 cells in anaphase, and 1 cell in telophase. If the complete cell cycle in an onion root tip requires 22 hours, what is the average duration of each stage in the cycle? Assume that all cells are in active cell cycle (not G0).
• Solution This problem is solved in two steps. First, we calculate the proportions of cells in each stage of the cell cycle, which correspond to the amount of time that an average cell spends in each stage. For example, if cells spend 90% of their time in interphase, then, at any given moment, 90% of the cells will be in interphase. The second step is to convert the proportions into lengths of time, which is done by multiplying the proportions by the total time of the cell cycle (22 hours).
Step 1. Calculate the proportion of cells at each stage. The proportion of cells at each stage is equal to the number of cells found in that stage divided by the total number of cells examined (94 + 14 + 3 + 3 + 5 + 1 = 120):
Interphase: 94/120 = 0.783
Prophase: 14/120 = 0.117
Prometaphase: 3/120 = 0.025
Metaphase: 3/120 = 0.025
Anaphase: 5/120 = 0.042
Telophase: 1/120 = 0.008
We can check our calculations by making sure that the proportions sum to 1.0, which they do.
Step 2. Determine the average duration of each stage.
To determine the average duration of each stage, multiply the proportion of cells in each stage by the time required for the entire cell cycle:
Interphase: 0.783 × 22 hours = 17.23 hours
Prophase: 0.117 × 22 hours = 2.57 hours
Prometaphase: 0.025 × 22 hours = 0.55 hour
Metaphase: 0.025 × 22 hours = 0.55 hour
Anaphase: 0.042 × 22 hours = 0.92 hour
Telophase: 0.008 × 22 hours = 0.18 hour
2. A cell in G1 of interphase has 8 chromosomes. How many chromosomes and how many DNA molecules will be found per cell as this cell progresses through the following stages: G2, metaphase of mitosis, anaphase of mitosis, after cytokinesis in mitosis, metaphase I of meiosis, metaphase II of meiosis, and after cytokinesis of meiosis II?
• Solution Remember the rules that we learned about counting chromosomes and DNA molecules: (1) to determine the number of chromosomes, count the functional centromeres; and (2) to determine the number of DNA molecules, count the chromatids. Think carefully about when and how the numbers of chromosomes and DNA molecules change in the course of mitosis and meiosis. The number of DNA molecules increases only in S phase, when DNA replicates; the number of DNA molecules decreases only when the cell divides. Chromosome number increases only when sister chromatids separate in anaphase of mitosis and anaphase II of meiosis (homologous chromosomes, not chromatids, separate in anaphase I of meiosis). Chromosome number, like the number of DNA molecules, is reduced only by cell division.
Let us now apply these principles to the problem. A cell in G1 has 8 chromosomes, each consisting of a single chromatid; so 8 DNA molecules are present in G1. DNA replicates in S stage; so, in G2, 16 DNA molecules are present per cell. However, the two copies of each DNA molecule remain attached at the centromere; so there are still only 8 chromosomes present.
As the cell passes through prophase and metaphase of the cell cycle, the number of chromosomes and DNA molecules remains the same; so, at metaphase, there are 16 DNA molecules and 8 chromosomes. In anaphase, the chromatids separate and each becomes an independent chromosome; at this point, the number of chromosomes increases from 8 to 16. This increase is temporary, lasting only until the cell divides in telophase or subsequent to it. The number of DNA molecules remains at 16 in anaphase. The number of DNA molecules and chromosomes per cell is reduced by cytokinesis after telophase, because the 16 chromosomes and DNA molecules are now distributed between two cells. Therefore, after cytokinesis, each cell has 8 DNA molecules and 8 chromosomes, the same numbers that were present at the beginning of the cell cycle.
Now, let’s trace the numbers of DNA molecules and chromosomes through meiosis. At G1, there are 8 chromosomes and 8 DNA molecules. The number of DNA molecules increases to 16 in S stage, but the number of chromosomes remains at 8 (each chromosome has two chromatids). The cell therefore enters metaphase I with 16 DNA molecules and 8 chromosomes. In anaphase I of meiosis, homologous chromosomes separate, but the number of chromosomes remains at 8. After cytokinesis, the original 8 chromosomes are distributed between two cells; so the number of chromosomes per cell falls to 4 (each with two chromatids). The original 16 DNA molecules also are distributed between two cells; so the number of DNA molecules per cell is 8. There is no DNA synthesis during interkinesis, and each cell still maintains 4 chromosomes and 8 DNA molecules through metaphase II. In anaphase II, the two chromatids of each chromosome separate, temporarily raising the number of chromosomes per cell to 8, whereas the number of DNA molecules per cell remains at 8.
After cytokinesis, the chromosomes and DNA molecules are again distributed between two cells, providing 4 chromosomes and 4 DNA molecules per cell. These results are summarized in the following table:

Stage                              Number of chromosomes per cell    Number of DNA molecules per cell
G1                                 8                                 8
G2                                 8                                 16
Metaphase of mitosis               8                                 16
Anaphase of mitosis                16                                16
After cytokinesis of mitosis       8                                 8
Metaphase I of meiosis             8                                 16
Metaphase II of meiosis            4                                 8
After cytokinesis of meiosis II    4                                 4

COMPREHENSION QUESTIONS
1. All organisms have the same universal genetic system. What are the implications of this universal genetic system?
2. Why are the viruses that infect mammalian cells useful for studying the genetics of mammals?
*3. List three fundamental events that must take place in cell reproduction.
4. Outline the process by which prokaryotic cells reproduce.
5. Name three essential structural elements of a functional eukaryotic chromosome and describe their functions.
*6. Sketch and label four different types of chromosomes based on the position of the centromere.
7. List the stages of interphase and the major events that take place in each stage.
*8. List the stages of mitosis and the major events that take place in each stage.
*9. What are the genetically important results of the cell cycle?
10. Why are the two cells produced by the cell cycle genetically identical?
11. What are checkpoints? What two general classes of compounds regulate progression through the cell cycles?
12. What are the stages of meiosis and what major events take place in each stage?
*13. What are the major results of meiosis?
14. What two processes unique to meiosis are responsible for genetic variation? At what point in meiosis do these processes take place?
*15. List similarities and differences between mitosis and meiosis. Which differences do you think are most important and why?
16. Outline the process by which male gametes are produced in plants.
Outline the process of female gamete formation in plants.
17. Outline the process of spermatogenesis in animals. Outline the process of oogenesis in animals.

APPLICATION QUESTIONS AND PROBLEMS
18. A certain species has three pairs of chromosomes: an acrocentric pair, a metacentric pair, and a submetacentric pair. Draw a cell of this species as it would appear in metaphase of mitosis.
19. A biologist examines a series of cells and counts 160 cells in interphase, 20 cells in prophase, 6 cells in prometaphase, 2 cells in metaphase, 7 cells in anaphase, and 5 cells in telophase. If the complete cell cycle requires 24 hours, what is the average duration of M phase in these cells? Of metaphase?
*20. A cell in G1 of interphase has 12 chromosomes. How many chromosomes and DNA molecules will be found per cell when this original cell progresses to the following stages?
(a) G2 of interphase
(b) Metaphase I of meiosis
(c) Prophase of mitosis
(d) Anaphase I of meiosis
(e) Anaphase II of meiosis
(f) Prophase II of meiosis
(g) After cytokinesis following mitosis
(h) After cytokinesis following meiosis II
*21. All of the following cells, shown in various stages of mitosis and meiosis, come from the same rare species of plant. What is the diploid number of chromosomes in this plant? Give the names of each stage of mitosis or meiosis shown.
22. A cell has 1x amount of DNA in G1 of interphase. How much DNA (in multiples or fractions of x) will be present per cell at the following stages?
(a) G2
(b) Anaphase of mitosis
(c) Prophase II of meiosis
(d) After cytokinesis associated with meiosis II
23. A cell in prophase II of meiosis contains 12 chromosomes. How many chromosomes would be present in a cell from the same organism if it were in prophase of mitosis? Prophase I of meiosis?
*24. The fruit fly Drosophila melanogaster has four pairs of chromosomes, whereas the house fly Musca domestica has six pairs of chromosomes. Other things being equal, in which species would you expect to see more genetic variation among the progeny of a cross? Explain your answer.
*25. A cell has two pairs of submetacentric chromosomes, which we will call chromosomes Ia, Ib, IIa, and IIb (chromosomes Ia and Ib are homologs, and chromosomes IIa and IIb are homologs). Allele M is located on the long arm of chromosome Ia, and allele m is located at the same position on chromosome Ib. Allele P is located on the short arm of chromosome Ia, and allele p is located at the same position on chromosome Ib. Allele R is located on chromosome IIa and allele r is located at the same position on chromosome IIb.
(a) Draw these chromosomes, labeling genes M, m, P, p, R, and r, as they might appear in metaphase I of meiosis. Assume that there is no crossing over.
(b) Considering the random separation of chromosomes in anaphase I, draw the chromosomes (with labeled genes) present in all possible types of gametes that might result from this cell going through meiosis. Assume that there is no crossing over.
26. A horse has 64 chromosomes and a donkey has 62 chromosomes. A cross between a female horse and a male donkey produces a mule, which is usually sterile. How many chromosomes does a mule have? Can you think of any reasons for the fact that most mules are sterile?

CHALLENGE QUESTIONS
27. Suppose that life exists elsewhere in the universe. All life must contain some type of genetic information, but alien genomes might not consist of nucleic acids and have the same features as those found in the genomes of life on Earth. What do you think might be the common features of all genomes, no matter where they exist?
28. On average, what proportion of the genome in the following pairs of humans would be exactly the same if no crossing over occurred?
(For the purposes of this question only, we will ignore the special case of the X and Y sex chromosomes and assume that all genes are located on nonsex chromosomes.)
(a) Father and child
(b) Mother and child
(c) Two full siblings (offspring that have the same two biological parents)
(d) Half siblings (offspring that have only one biological parent in common)
(e) Uncle and niece
(f) Grandparent and grandchild
29. Female bees are diploid and male bees are haploid. The haploid males produce sperm and can successfully mate with diploid females. Fertilized eggs develop into females and unfertilized eggs develop into males. How do you think the process of sperm production in male bees differs from sperm production in other animals?
30. Rec8 is a protein that is found in yeast chromosome arms and centromeres. Rec8 persists throughout meiosis I but breaks down at anaphase II. When the gene that encodes Rec8 is deleted, sister chromatids separate in anaphase I.
(a) From these observations, propose a mechanism for the role of Rec8 in meiosis that helps to explain why sister chromatids normally separate in anaphase II but not anaphase I.
(b) Make a prediction about the presence or absence of Rec8 during the various stages of mitosis.

SUGGESTED READINGS
Hawley, R. S., and T. Arbel. 1993. Yeast genetics and the fall of the classical view of meiosis. Cell 72:301–303. Contains information about where in meiosis crossing over takes place and the role of the synaptonemal complex in recombination.
Jarrell, K. F., D. P. Bayley, J. D. Correia, and N. A. Thomas. 1999. Recent excitement about Archaea. Bioscience 49:530–541. An excellent review of differences between eubacteria, archaea, and eukaryotes.
King, R. W., P. K. Jackson, and M. W. Kirschner. 1994. Mitosis in transition. Cell 79:563–571. A good review of how the cell cycle is controlled.
Kirschner, M. 1992. The cell cycle then and now. Trends in Biochemical Sciences 17:281–285.
A good review of the history of research into control of the cell cycle.
Koshland, D. 1994. Mitosis: back to basics. Cell 77:951–954. Reviews research on mitosis and chromosome movement.
McIntosh, J. R., and M. P. Koonce. 1989. Mitosis. Science 246:622–628. A review of the process of mitosis.
McIntosh, J. R., and K. L. McDonald. 1989. The mitotic spindle. Scientific American 261(4):48–56. Review of the mitotic spindle.
McIntosh, J. R., and C. M. Pfarr. 1991. Mini-review: mitotic motors. Journal of Cell Biology 115:577–583. Considers some of the experimental evidence concerning the role of molecular motors in the organization of the spindle and in chromosome movement.
McKim, K. S., and R. S. Hawley. 1995. Chromosomal control of meiotic cell division. Science 270:1595–1601. Reviews evidence that chromosomes actively take part in their own movement and in controlling the cell cycle.
Morgan, D. O. 1995. Principles of CDK regulation. Nature 374:131–134. An excellent short review of cell-cycle control.
Nasmyth, K. 1999. Separating sister chromatids. Trends in Biochemical Sciences 24:98–103. Considers the role of cohesion in the separation of sister chromatids.
Pennisi, E. 1998. Cell division gatekeepers identified. Science 279:477–478. Short review of work on the role of kinetochores in chromosome separation.
Pluta, A. F., A. M. MacKay, A. M. Ainsztein, I. G. Goldberg, and W. C. Earnshaw. 1995. Centromere: the hub of chromosome activities. Science 270:1591–1594. An excellent review of centromere structure and function.
Rothfield, L., S. Justice, and J. Garcia-Lara. 1999. Bacterial cell division. Annual Review of Genetics 33:423–428. Comprehensive review of how bacterial cells divide.
Uhlmann, F., F. Lottspeich, and K. Nasmyth. 1999. Sister-chromatid separation at anaphase onset is promoted by cleavage of the cohesin subunit Scc1. Nature 400:37–42. Report that cleavage of cohesion protein has a role in chromatid separation.
Zickler, D., and N. Kleckner. 1999. Meiotic chromosomes: integrating structure and function. Annual Review of Genetics 33:603–754. A review of chromosomes in meiosis, their structure and function.

3 Basic Principles of Heredity

• Black Urine and First Cousins
• Mendel: The Father of Genetics
  Mendel’s Success
  Genetic Terminology
• Monohybrid Crosses
  What Monohybrid Crosses Reveal
  Predicting The Outcomes of Genetic Crosses
  The Testcross
  Incomplete Dominance
  Genetic Symbols
• Multiple-Loci Crosses
  Dihybrid Crosses
  The Principle of Independent Assortment
  The Relationship of the Principle of Independent Assortment to Meiosis
  Applying Probability and the Branch Diagram to Dihybrid Crosses
  The Dihybrid Testcross
  Trihybrid Crosses
• Observed and Expected Ratios
  The Goodness of Fit Chi-square Test
  Penetrance and Expressivity

Alkaptonuria results from impaired function of homogentisate dioxygenase (shown here), an enzyme required for catabolism of the amino acids phenylalanine and tyrosine. (Courtesy of David E. Timm, Department of Molecular Biology, Indiana School of Medicine, and Miguel Penalva, Centro de Investigaciones Biológicas CSIC, Madrid, Spain.)

Black Urine and First Cousins
Voiding black urine is a rare and peculiar trait. In 1902, Archibald Garrod discovered the hereditary basis of black urine and, in the process, contributed to our understanding of the nature of genes. Garrod was an English physician who was more interested in chemical explanations of disease than in the practice of medicine. He became intrigued by several of his patients who produced black urine, a condition known as alkaptonuria. The urine of alkaptonurics contains homogentisic acid, a compound that, on exposure to air, oxidizes and turns the urine black. Garrod observed that alkaptonuria appears at birth and remains for life.
He noted that often several children in the same family were affected: of the 32 cases that he knew about, 19 appeared in only seven families. Furthermore, the parents of these alkaptonurics were frequently first cousins. With the assistance of geneticist William Bateson, Garrod recognized that this pattern of inheritance is precisely the pattern produced by the transmission of a rare, recessive gene. Garrod later proposed that several other human disorders, including albinism and cystinuria, are inherited in the same way as alkaptonuria. He concluded that each gene encodes an enzyme that controls a biochemical reaction. When there is a flaw in a gene, its enzyme is deficient, resulting in a biochemical disorder. He called these flaws “inborn errors of metabolism.”

Garrod was the first to apply the basic principles of genetics, which we will learn about in this chapter, to the inheritance of a human disease. His idea, that genes code for enzymes, was revolutionary and correct. Unfortunately, Garrod’s ideas were not recognized as being important at the time and were appreciated only after they had been rediscovered 30 years later.

This chapter is about the principles of heredity: how genes are passed from generation to generation. These principles were first put forth by Gregor Mendel, so we begin by examining his scientific achievements. We then turn to simple genetic crosses, those in which a single characteristic is examined. We learn some techniques for predicting the outcome of genetic crosses and then turn to crosses in which two or more characteristics are examined. We will see how the principles applied to simple genetic crosses and the ratios of offspring that they produce serve as the key for understanding more complicated crosses. We end the chapter by considering statistical tests for analyzing crosses and factors that vary their outcome.
Throughout this chapter, a number of concepts are interwoven: Mendel’s principles of segregation and independent assortment, probability, and the behavior of chromosomes. These might at first appear to be unrelated, but they are actually different views of the same phenomenon, because the genes that undergo segregation and independent assortment are located on chromosomes. The principal aim of this chapter is to examine these different views and to clarify their relations.

www.whfreeman.com/pierce: Archibald Garrod’s original paper on the genetics of alkaptonuria

Mendel: The Father of Genetics
In 1902, the basic principles of genetics, which Archibald Garrod successfully applied to the inheritance of alkaptonuria, had just become widely known among biologists. Surprisingly, these principles had been discovered some 35 years earlier by Johann Gregor Mendel (1822–1884).

Mendel was born in what is now part of the Czech Republic. Although his parents were simple farmers with little money, he was able to achieve a sound education and was admitted to the Augustinian monastery in Brno in September 1843. After graduating from seminary, Mendel was ordained a priest and appointed to a teaching position in a local school. He excelled at teaching, and the abbot of the monastery recommended him for further study at the University of Vienna, which he attended from 1851 to 1853. There, Mendel enrolled in the newly opened Physics Institute and took courses in mathematics, chemistry, entomology, paleontology, botany, and plant physiology. It was probably here that Mendel acquired the scientific method, which he later applied so successfully to his genetics experiments. After 2 years of study in Vienna, Mendel returned to Brno, where he taught school and began his experimental work with pea plants. He conducted breeding experiments from 1856 to 1863 and presented his results publicly at meetings of the Brno Natural Science Society in 1865.
Mendel’s paper from these lectures was published in 1866. In spite of widespread interest in heredity, the effect of his research on the scientific community was minimal. At the time, no one seems to have noticed that Mendel had discovered the basic principles of inheritance.

In 1868, Mendel was elected abbot of his monastery, and increasing administrative duties brought an end to his teaching and eventually to his genetics experiments. He died at the age of 61 on January 6, 1884, unrecognized for his contribution to genetics.

The significance of Mendel’s discovery was unappreciated until 1900, when three botanists (Hugo de Vries, Erich von Tschermak, and Carl Correns) began independently conducting similar experiments with plants and arrived at conclusions similar to those of Mendel. Coming across Mendel’s paper, they interpreted their results in terms of his principles and drew attention to his pioneering work.

Concepts
Gregor Mendel put forth the basic principles of inheritance, publishing his findings in 1866. The significance of his work did not become widely appreciated until 1900.

Mendel’s Success
Mendel’s approach to the study of heredity was effective for several reasons. Foremost was his choice of experimental subject, the pea plant Pisum sativum ( ◗ FIGURE 3.1), which offered clear advantages for genetic investigation. It is easy to cultivate, and Mendel had the monastery garden and greenhouse at his disposal. Peas grow relatively rapidly, completing an entire generation in a single growing season. By today’s standards, one generation per year seems frightfully slow (fruit flies complete a generation in 2 weeks and bacteria in 20 minutes), but Mendel was under no pressure to publish quickly and was able to follow the inheritance of individual characteristics for several generations. Had he chosen to work on an organism with a longer generation time, horses for example, he might never have discovered the basis of inheritance.
Pea plants also produce many offspring (their seeds), which allowed Mendel to detect meaningful mathematical ratios in the traits that he observed in the progeny. The large number of varieties of peas that were available to Mendel was also crucial, because these varieties differed in various traits and were genetically pure. Mendel was therefore able to begin with plants of variable, known genetic makeup.

◗ 3.1 Mendel used the pea plant Pisum sativum in his studies of heredity. He examined seven characteristics that appeared in the seeds and in plants grown from the seeds: seed (endosperm) color (yellow or green), seed shape (round or wrinkled), seed coat color (gray or white), flower position (axial, along the stem, or terminal, at the tip of the stem), stem length (short or tall), pod shape (inflated or constricted), and pod color (yellow or green). (Photo from Wally Eberhart/Visuals Unlimited.)

Much of Mendel’s success can be attributed to the seven characteristics that he chose for study (see Figure 3.1). He avoided characteristics that display a range of variation; instead, he focused his attention on those that exist in two easily differentiated forms, such as white versus gray seed coats, round versus wrinkled seeds, and inflated versus constricted pods.

Finally, Mendel was successful because he adopted an experimental approach. Unlike many earlier investigators who just described the results of crosses, Mendel formulated hypotheses based on his initial observations and then conducted additional crosses to test his hypotheses. He kept careful records of the numbers of progeny possessing each type of trait and computed ratios of the different types. He paid close attention to detail, was adept at seeing patterns in detail, and was patient and thorough, conducting his experiments for 10 years before attempting to write up his results.
www.whfreeman.com/pierce: Mendel’s original paper (in German, with an English translation), as well as references, essays, and commentaries on Mendel’s work

Table 3.1 Summary of important genetic terms

Term                           Definition
Gene                           A genetic factor (region of DNA) that helps determine a characteristic
Allele                         One of two or more alternate forms of a gene
Locus                          Specific place on a chromosome occupied by an allele
Genotype                       Set of alleles that an individual possesses
Heterozygote                   An individual possessing two different alleles at a locus
Homozygote                     An individual possessing two of the same alleles at a locus
Phenotype or trait             The appearance or manifestation of a character
Character or characteristic    An attribute or feature

Genetic Terminology
Before we examine Mendel’s crosses and the conclusions that he made from them, it will be helpful to review some terms commonly used in genetics (Table 3.1). The term gene was a word that Mendel never knew. It was not coined until 1909, when the Danish geneticist Wilhelm Johannsen first used it. The definition of a gene varies with the context of its use, and so its definition will change as we explore different aspects of heredity. For our present use in the context of genetic crosses, we will define a gene as an inherited factor that determines a characteristic.

Genes frequently come in different versions called alleles ( ◗ FIGURE 3.2). In Mendel’s crosses, seed shape was determined by a gene that exists as two different alleles: one allele codes for round seeds and the other codes for wrinkled seeds. All alleles for any particular gene will be found at a specific place on a chromosome called the locus for that gene. (The plural of locus is loci; it’s bad form in genetics, and incorrect, to speak of locuses.) Thus, there is a specific place, a locus, on a chromosome in pea plants where the shape of seeds is determined. This locus might be occupied by an allele for round seeds or one for wrinkled seeds. We will use the term allele when referring to a specific version of a gene; we will use the term gene to refer more generally to any allele at a locus.

◗ 3.2 At each locus, a diploid organism possesses two alleles located on different homologous chromosomes. (1) Genes exist in different versions called alleles. (2) One allele (R) codes for round seeds (3) and a different allele (r) codes for wrinkled seeds. (4) Different alleles occupy the same locus on homologous chromosomes.

The genotype is the set of alleles that an individual organism possesses. A diploid organism that possesses two identical alleles is homozygous for that locus. One that possesses two different alleles is heterozygous for the locus.

Another important term is phenotype, which is the manifestation or appearance of a characteristic. A phenotype can refer to any type of characteristic: physical, physiological, biochemical, or behavioral. Thus, the condition of having round seeds is a phenotype, a body weight of 50 kg is a phenotype, and having sickle-cell anemia is a phenotype. In this book, the term characteristic or character refers to a general feature such as eye color; the term trait or phenotype refers to specific manifestations of that feature, such as blue or brown eyes.

A given phenotype arises from a genotype that develops within a particular environment. The genotype determines the potential for development; it sets certain limits, or boundaries, on that development. How the phenotype develops within those limits is determined by the effects of other genes and environmental factors, and the balance between these influences varies from character to character. For some characters, the differences between phenotypes are determined largely by differences in genotype; in other words, the genetic limits for that phenotype are narrow.
Seed shape in Mendel’s peas is a good example of a characteristic for which the genetic limits are narrow and the phenotypic differences are largely genetic. For other characters, environmental differences are more important; in this case, the limits imposed by the genotype are broad. The height that an oak tree reaches at maturity is a phenotype that is strongly influenced by environmental factors, such as the availability of water, sunlight, and nutrients. Nevertheless, the tree’s genotype still imposes some limits on its height: an oak tree will never grow to be 300 m tall no matter how much sunlight, water, and fertilizer are provided. Thus, even the height of an oak tree is determined to some degree by genes. For many characteristics, both genes and environment are important in determining phenotypic differences.

An obvious but important concept is that only the genotype is inherited. Although the phenotype is determined, at least to some extent, by genotype, organisms do not transmit their phenotypes to the next generation. The distinction between genotype and phenotype is one of the most important principles of modern genetics. The next section describes Mendel’s careful observation of phenotypes through several generations of breeding experiments. These experiments allowed him to deduce not only the genotypes of the individual plants, but also the rules governing their inheritance.

Concepts
Each phenotype results from a genotype developing within a specific environment. The genotype, not the phenotype, is inherited.

Monohybrid Crosses
Mendel started with 34 varieties of peas and spent 2 years selecting those varieties that he would use in his experiments. He verified that each variety was genetically pure (homozygous for each of the traits that he chose to study) by growing the plants for two generations and confirming that all offspring were the same as their parents. He then carried out a number of crosses between the different varieties.
Although peas are normally self-fertilizing (each plant crosses with itself), Mendel conducted crosses between different plants by opening the buds before the anthers were fully developed, removing the anthers, and then dusting the stigma with pollen from a different plant.

Mendel began by studying monohybrid crosses, those between parents that differed in a single characteristic. In one experiment, Mendel crossed a pea plant homozygous for round seeds with one that was homozygous for wrinkled seeds ( ◗ FIGURE 3.3). This first generation of a cross is the P (parental) generation.

After crossing the two varieties in the P generation, Mendel observed the offspring that resulted from the cross. In regard to seed characteristics, such as seed shape, the phenotype develops as soon as the seed matures, because the seed traits are determined by the newly formed embryo within the seed. For characters associated with the plant itself, such as stem length, the phenotype doesn’t develop until the plant grows from the seed; for these characters, Mendel had to wait until the following spring, plant the seeds, and then observe the phenotypes on the plants that germinated.

Experiment (Figure 3.3)
Question: When peas with two different traits, round and wrinkled seeds, are crossed, will their progeny exhibit one of those traits, both of those traits, or a “blended” intermediate trait?
Method: (1) To cross different varieties of peas, remove the anthers from flowers to prevent self-fertilization (2) and dust the stigma with pollen from a different plant. (3) The pollen fertilizes ova within the flower, which develop into seeds. (4) The seeds grow into plants.

The offspring from the parents in the P generation are the F1 (first filial) generation. When Mendel examined the F1 of this cross, he found that they expressed only one of the phenotypes present in the parental generation: all the F1 seeds were round.
Mendel carried out 60 such crosses and always obtained this result. He also conducted reciprocal crosses: in one cross, pollen (the male gamete) was taken from a plant with round seeds and, in its reciprocal cross, pollen was taken from a plant with wrinkled seeds. Reciprocal crosses gave the same result: all the F1 were round.

Mendel wasn’t content with examining only the seeds arising from these monohybrid crosses. The following spring, he planted the F1 seeds, cultivated the plants that germinated from them, and allowed the plants to self-fertilize, producing a second generation (the F2 generation). Both of the traits from the P generation emerged in the F2; Mendel counted 5474 round seeds and 1850 wrinkled seeds in the F2 (see Figure 3.3). He noticed that the number of the round and wrinkled seeds constituted approximately a 3 to 1 ratio; that is, about 3/4 of the F2 seeds were round and 1/4 were wrinkled. Mendel conducted monohybrid crosses for all seven of the characteristics that he studied in pea plants, and in all of the crosses he obtained the same result: all of the F1 resembled only one of the two parents, but both parental traits emerged in the F2 in approximately a 3:1 ratio.

What Monohybrid Crosses Reveal

◗ 3.3 Mendel conducted monohybrid crosses. Mendel’s first experiment: (5) Mendel crossed two homozygous varieties of peas (P generation: homozygous round seeds × homozygous wrinkled seeds). (6) All F1 seeds were round. Mendel’s second experiment: (7) Mendel allowed the plants to self-fertilize. (8) Results in the F2 generation: 5474 round seeds and 1850 wrinkled seeds; 3/4 of F2 seeds were round and 1/4 were wrinkled, a 3:1 ratio. Conclusion: The traits of the parent plants do not blend. Although F1 plants display the phenotype of one parent, both traits are passed to F2 progeny in a 3:1 ratio.
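Mendel’s F2 counts for seed shape can be compared with the 3:1 expectation using a few lines of arithmetic. The following Python sketch is only an illustration: the function name compare_to_ratio and its parameters are hypothetical, and the only data used are the 5474 round and 1850 wrinkled seeds reported for this cross.

```python
def compare_to_ratio(dominant_count, recessive_count, expected=(3, 1)):
    """Compare observed counts of two phenotypes with an expected ratio.

    Returns the observed dominant:recessive ratio, the observed fractions,
    and the counts expected if the total were split exactly as `expected`.
    """
    total = dominant_count + recessive_count
    parts = sum(expected)
    return {
        "observed_ratio": dominant_count / recessive_count,
        "observed_fractions": (dominant_count / total, recessive_count / total),
        "expected_counts": (total * expected[0] / parts,
                            total * expected[1] / parts),
    }


# Mendel's F2 counts for seed shape: 5474 round and 1850 wrinkled seeds.
f2_seed_shape = compare_to_ratio(5474, 1850)
print(round(f2_seed_shape["observed_ratio"], 2))  # 2.96
print(f2_seed_shape["expected_counts"])           # (5493.0, 1831.0)
```

The observed ratio of about 2.96 to 1 is close to, but not exactly, 3 to 1; with 7324 seeds in total, an exact 3:1 split would be 5493 round and 1831 wrinkled.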
Mendel drew several important conclusions from the results of his monohybrid crosses. First, he reasoned that, although the F1 plants display the phenotype of only one parent, they must inherit genetic factors from both parents because they transmit both phenotypes to the F2 generation. The presence of both round and wrinkled seeds in the F2 could be explained only if the F1 plants possessed both round and wrinkled genetic factors that they had inherited from the P generation. He concluded that each plant must therefore possess two genetic factors coding for a character.

The genetic factors that Mendel discovered (alleles) are, by convention, designated with letters; the allele for round seeds is usually represented by R, and the allele for wrinkled seeds by r. The plants in the P generation of Mendel’s cross possessed two identical alleles: RR in the round-seeded parent and rr in the wrinkled-seeded parent ( ◗ FIGURE 3.4a).

A second conclusion that Mendel drew from his monohybrid crosses was that the two alleles in each plant separate when gametes are formed, and one allele goes into each gamete. When two gametes (one from each parent) fuse to produce a zygote, the allele from the male parent unites with the allele from the female parent to produce the genotype of the offspring. Thus, Mendel’s F1 plants inherited an R allele from the round-seeded plant and an r allele from the wrinkled-seeded plant ( ◗ FIGURE 3.4b). However, only the trait encoded by the round allele (R) was observed in the F1: all the F1 progeny had round seeds. Those traits that appeared unchanged in the F1 heterozygous offspring Mendel called dominant, and those traits that disappeared in the F1 heterozygous offspring he called recessive.

The New Genetics: ETHICS • SCIENCE • TECHNOLOGY
Should Genetics Researchers Probe Abraham Lincoln’s Genes?
by Art Caplan

Many people agree that no one should be forced to have a genetic test without his or her consent, yet for obvious reasons this ethical principle is difficult to follow when dealing with those who are deceased. There are all sorts of reasons why genetic testing on certain deceased persons might prove important, but one of the primary reasons is for purposes of identification. In anthropology, genetic analysis might help tell us whether we have found the body of a Romanov, Hitler, or Mengele. In cases of war or terrorist attacks, such as those on September 11, 2001, there might be no other way to determine the identity of a deceased person except by matching tissue samples with previously stored biological tissue or with samples from close relatives.

One historically interesting case, which highlights the ethical issues faced when determining genetic facts about the dead, is that which centers on Abraham Lincoln. Medical geneticists and advocates for patients with Marfan syndrome have long wondered whether President Lincoln had this particular genetic disease. After all, Lincoln had the tall gangly build often associated with Marfan syndrome, which affects the connective tissues and cartilage of the body. Biographers and students of this man, whom many consider to be our greatest president, would like to know whether the depression that Lincoln suffered throughout his life might have been linked to the painful, arthritis-like symptoms of Marfan syndrome.

Lincoln was assassinated on April 14, 1865, and died early the next morning. An autopsy was performed, and samples of his hair, bone, and blood were preserved and stored at the National Museum of Health and Medicine; they are still there. The presence of a recently found genetic marker indicates whether someone has Marfan syndrome. With this advancement, it would be possible to use some of the stored remains of Abe Lincoln to see if he had this condition. However, would it be ethical to perform this test?

We must be careful about genetic testing, because often too much weight is assigned to the results of such tests. There is a temptation to see DNA as the essence, the blueprint, of a person: the factor that forms who we are and what we do. Given this tendency, should society be cautious about letting people explore the genes of the deceased? And, if we should not test without permission, then how can we obtain permission in cases where the person in question is dead? In Honest Abe’s case, the “patient” is deceased and has no immediate survivors; there is no one to consent. But allowing testing without consent sets a dangerous precedent.

It may seem a bit strange to apply the notions of privacy and consent to the deceased. But, considering that most people today agree that consent should be obtained before these tests are administered, do researchers have the right to pry into Lincoln’s DNA simply because neither he nor his descendants are around to say that they can’t? Are we to say that anyone’s body is open to examination whenever a genetic test becomes available that might tell us an interesting fact about that person's biological makeup? Many prominent people from the past have taken special precautions to restrict access to their diaries, papers and letters; for instance, Sigmund Freud locked away his personal papers for 100 years. Will future Lincolns and Freuds need to embargo their mortal remains for eternity to prevent unwanted genetic snooping by subsequent generations?

And, when it comes right down to it, what is the point of establishing whether Lincoln had Marfan syndrome? After all, we don’t need to inspect his genes to determine whether he was presidential timber; Marfan or no Marfan, he obviously was. The real questions to ask are, Do we adequately understand what he did as president and what he believed? How did his actions shape our country, and what can we learn from them that will benefit us today?

In the end, the genetic basis for Lincoln’s behavior and leadership might be seen as having no relevance. Some would say that genetic testing might divert our attention from Lincoln’s work, writings, thoughts, and deeds and, instead, require that we see him as a jumble of DNA output. Perhaps it makes more sense to encourage efforts to understand and appreciate Lincoln’s legacy through his actions rather than through reconstituting and analyzing his DNA.

Abraham Lincoln had the tall, gangly build often associated with Marfan syndrome. (Cartoon by Frank Billew, 1864. Bettmann/Corbis.)

◗ 3.4 Mendel’s monohybrid crosses revealed the principle of segregation and the concept of dominance. (a) P generation: (1) Each plant possessed two alleles coding for the character. (2) Mendel crossed a plant homozygous for round seeds (RR) with a plant homozygous for wrinkled seeds (rr). (3) The two alleles in each plant separated when gametes were formed; one allele went into each gamete. (b) F1 generation: (4) Because round is dominant over wrinkled, all the F1 had round seeds. (5) Gametes fused to produce heterozygous F1 plants (Rr) that had round seeds. (6) Mendel self-fertilized the F1 to produce the F2, (c) F2 generation: (7) which appeared in a 3:1 ratio of round to wrinkled (1/4 RR, 1/4 Rr, 1/4 rR, and 1/4 rr; 3/4 round and 1/4 wrinkled). (8) Mendel also self-fertilized the F2 (d) F3 generation: (9) to produce F3 seeds. Homozygous round peas produced plants with only round peas. Heterozygous plants produced round and wrinkled seeds in a 3:1 ratio. Homozygous wrinkled peas produced plants with only wrinkled peas.
When dominant and recessive alleles are present together, the recessive allele is masked, or suppressed. The concept of dominance was a third important conclusion that Mendel derived from his monohybrid crosses. Mendel’s fourth conclusion was that the two alleles of an individual plant separate with equal probability into the gametes. When plants of the F1 (with genotype Rr) produced gametes, half of the gametes received the R allele for round seeds and half received the r allele for wrinkled seeds. The gametes then paired randomly to produce the following genotypes in equal proportions among the F2 : RR, Rr, rR, rr ( ◗ FIGURE 3.4c). Because round (R) is dominant over wrinkled (r), there were three round progeny in the F2 (RR, Rr, rR) for every one wrinkled progeny (rr) in the F2 . This 3:1 ratio of round to wrinkled progeny that Mendel observed in the F2 could occur only if the two alleles of a genotype separated into the gametes with equal probability. The conclusions that Mendel developed about inheritance from his monohybrid crosses have been further developed and formalized into the principle of segregation and the concept of dominance. The principle of segregation (Mendel’s first law) states that each individual diploid organism possesses two alleles for any particular characteristic. These two alleles segregate (separate) when gametes are formed, and one allele goes into each gamete. Furthermore, the two alleles segregate into gametes in equal proportions. The concept of dominance states that, when two different alleles are present in a genotype, only the trait of the dominant allele is observed in the phenotype. Mendel confirmed these principles by allowing his F2 plants to self-fertilize and produce an F3 generation. He found that the F2 plants grown from the wrinkled seeds — those displaying the recessive trait (rr) — produced an F3 in which all plants produced wrinkled seeds. 
Because his wrinkled-seeded plants were homozygous for wrinkled alleles (rr), they could pass on only wrinkled alleles to their progeny ( ◗ FIGURE 3.4d).

The F2 plants grown from round seeds (the dominant trait) fell into two types (Figure 3.4c). On self-fertilization, about 2/3 of the F2 plants produced both round and wrinkled seeds in the F3 generation. These F2 plants were heterozygous (Rr); so they produced 1/4 RR (round), 1/2 Rr (round), and 1/4 rr (wrinkled) seeds, giving a 3:1 ratio of round to wrinkled in the F3. About 1/3 of the F2 plants were of the second type; they produced only the dominant round-seeded trait in the F3. These F2 plants were homozygous for the round allele (RR) and thus could produce only round offspring in the F3 generation. Mendel planted the seeds obtained in the F3 and carried these plants through three more rounds of self-fertilization. In each generation, 2/3 of the round-seeded plants produced round and wrinkled offspring, whereas 1/3 produced only round offspring. These results are entirely consistent with the principle of segregation.

Concepts
The principle of segregation states that each individual organism possesses two alleles coding for a characteristic. These alleles segregate when gametes are formed, and one allele goes into each gamete. The concept of dominance states that, when dominant and recessive alleles are present together, only the trait of the dominant allele is observed.

Connecting Concepts
Relating Genetic Crosses to Meiosis
We have now seen how the results of monohybrid crosses are explained by Mendel’s principle of segregation. Many students find that they enjoy working genetic crosses but are frustrated by the abstract nature of the symbols. Perhaps you feel the same at this point. You may be asking “What do these symbols really represent?
What does the genotype RR mean in regard to the biology of the organism?” The answers to these questions lie in relating the abstract symbols of crosses to the structure and behavior of chromosomes, the repositories of genetic information (Chapter 2). In 1900, when Mendel’s work was rediscovered and biologists began to apply his principles of heredity, the relation between genes and chromosomes was still unclear. The theory that genes are located on chromosomes (the chromosome theory of heredity) was developed in the early 1900s by Walter Sutton, then a graduate student at Columbia University. Through the careful study of meiosis in insects, Sutton documented the fact that each homologous pair of chromosomes consists of one maternal chromosome and one paternal chromosome. Showing that these pairs segregate independently into gametes in meiosis, he concluded that this process is the biological basis for Mendel’s principles of heredity. The German cytologist and embryologist Theodor Boveri came to similar conclusions at about the same time. Sutton knew that diploid cells have two sets of chromosomes. Each chromosome has a pairing partner, its homologous chromosome. One chromosome of each homologous pair is inherited from the mother and the other is inherited from the father. Similarly, diploid cells possess two alleles at each locus, and these alleles constitute the genotype for that locus. The principle of segregation indicates that one allele of the genotype is inherited from each parent. This similarity between the number of chromosomes and the number of alleles is not accidental — the two alleles of a genotype are located on homologous chromosomes. The symbols used in genetic crosses, such as R and r, are just shorthand notations for particular sequences of DNA in the chromosomes that code for particular phenotypes. The two alleles of a genotype are found on different but homologous chromosomes. 
During the S stage of meiotic interphase, each chromosome replicates, producing two copies of each allele, one on each chromatid ( ◗ FIGURE 3.5a). The homologous chromosomes segregate during anaphase I, thereby separating the two different alleles ( ◗ FIGURE 3.5b and c). This chromosome segregation is the basis of the principle of segregation. During anaphase II of meiosis, the two chromatids of each replicated chromosome separate; so each gamete resulting from meiosis carries only a single allele at each locus, as Mendel’s principle of segregation predicts. If crossing over has taken place during prophase I of meiosis, then the two chromatids of each replicated chromosome are no longer identical, and the segregation of different alleles takes place at anaphase I and anaphase II (see Figure 3.5c).

Of course, Mendel didn’t know anything about chromosomes; he formulated his principles of heredity entirely on the basis of the results of the crosses that he carried out. Nevertheless, we should not forget that these principles work because they are based on the behavior of actual chromosomes during meiosis.

Concepts
The chromosome theory of inheritance states that genes are located on chromosomes. The two alleles of a genotype segregate during anaphase I of meiosis, when homologous chromosomes separate. The alleles may also segregate during anaphase II of meiosis if crossing over has taken place.

Predicting the Outcomes of Genetic Crosses
One of Mendel’s goals in conducting his experiments on pea plants was to develop a way to predict the outcome of crosses between plants with different phenotypes. In this section, we will first learn a simple, shorthand method for predicting outcomes of genetic crosses (the Punnett square), and then we will learn how to use probability to predict the results of crosses.

The Punnett square
To illustrate the Punnett square, let’s examine another cross that Mendel carried out.
By crossing two varieties of peas that differed in height, Mendel established that tall (T) was dominant over short (t). He tested his theory concerning the inheritance of dominant traits by crossing an F1 tall plant that was heterozygous (Tt) with the short homozygous parental variety (tt). This type of cross, between an F1 genotype and either of the parental genotypes, is called a backcross.

◗ 3.5 Segregation happens because homologous chromosomes separate in meiosis. (a) (1) The two alleles of genotype Rr are located on homologous chromosomes, (2) which replicate during S phase of meiosis. (3) During prophase I of meiosis, crossing over may or may not take place. (b) No crossing over; (c) crossing over: (4) During anaphase I, the chromosomes separate. (5) If no crossing over takes place, the two chromatids of each chromosome separate in anaphase II and are identical. (6) If crossing over takes place, the two chromatids are no longer identical, and the different alleles segregate in anaphase II.

To predict the types of offspring that result from this cross, we first determine which gametes will be produced by each parent ( ◗ FIGURE 3.6a). The principle of segregation tells us that the two alleles in each parent separate, and one allele passes to each gamete. All gametes from the homozygous tt short plant will receive a single short (t) allele. The tall plant in this cross is heterozygous (Tt); so 50% of its gametes will receive a tall allele (T) and the other 50% will receive a short allele (t). A Punnett square is constructed by drawing a grid, putting the gametes produced by one parent along the upper edge and the gametes produced by the other parent down the left side ( ◗ FIGURE 3.6b).
Each cell (a block within the Punnett square) contains an allele from each of the corresponding gametes, generating the genotype of the progeny produced by fusion of those gametes. In the upper left-hand cell of the Punnett square in Figure 3.6b, a gamete containing T from the tall plant unites with a gamete containing t from the short plant, giving the genotype of the progeny (Tt). It is useful to write the phenotype expressed by each genotype; here the progeny will be tall, because the tall allele is dominant over the short allele.

Concepts
The Punnett square is a shorthand method of predicting the genotypic and phenotypic ratios of progeny from a genetic cross.

◗ 3.6 The Punnett square can be used for determining the results of a genetic cross. (a) P generation: tall (Tt) × short (tt); the tall parent produces gametes T and t, and the short parent produces gametes t and t. (b) F1 generation: the four cells of the Punnett square contain Tt (tall), Tt (tall), tt (short), and tt (short). Conclusion: genotypic ratio 1 Tt : 1 tt; phenotypic ratio 1 tall : 1 short.

This process is repeated for all the cells in the Punnett square. By simply counting, we can determine the types of progeny produced and their ratios. In Figure 3.6b, two cells contain tall (Tt) progeny and two cells contain short (tt) progeny; so the genotypic ratio expected for this cross is 2 Tt to 2 tt (a 1:1 ratio). Another way to express this result is to say that we expect 1/2 of the progeny to have genotype Tt (and phenotype tall) and 1/2 of the progeny to have genotype tt (and phenotype short). In this cross, the genotypic ratio and the phenotypic ratio are the same, but this outcome need not be the case. Try completing a Punnett square for the cross in which the F1 round-seeded plants in Figure 3.4 undergo self-fertilization (you should obtain a phenotypic ratio of 3 round to 1 wrinkled and a genotypic ratio of 1 RR to 2 Rr to 1 rr).

Probability as a tool in genetics
Another method for determining the outcome of a genetic cross is to use the rules of probability, as Mendel did with his crosses.
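As a brief aside before the rules of probability are developed, the Punnett square procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (punnett_square, phenotype, ratios) are our own, genotypes are assumed to be two-letter strings for a single locus, and complete dominance of the uppercase allele is assumed.

```python
from collections import Counter


def punnett_square(top_parent, left_parent):
    """Return the grid of progeny genotypes for a single-locus cross.

    Each parent is a two-letter genotype such as "Tt". Each letter is one
    allele, and each allele goes into its own gamete (segregation).
    """
    top_gametes = list(top_parent)    # written along the upper edge
    left_gametes = list(left_parent)  # written down the left side
    grid = []
    for left in left_gametes:
        row = []
        for top in top_gametes:
            # Sorting puts the uppercase allele first, so "tT" is written "Tt".
            row.append("".join(sorted(top + left)))
        grid.append(row)
    return grid


def phenotype(genotype, dominant="tall", recessive="short"):
    """Complete dominance: one uppercase allele gives the dominant trait."""
    return dominant if any(allele.isupper() for allele in genotype) else recessive


def ratios(grid):
    """Count genotypes and phenotypes over all cells of the square."""
    cells = [cell for row in grid for cell in row]
    return Counter(cells), Counter(phenotype(cell) for cell in cells)


# The backcross of Figure 3.6: short parent (tt) on top, tall parent (Tt) at left.
grid = punnett_square("tt", "Tt")
genotype_counts, phenotype_counts = ratios(grid)
print(grid)
print(dict(genotype_counts), dict(phenotype_counts))
```

Counting the cells gives two Tt and two tt progeny, the 1:1 ratio obtained by hand. Calling punnett_square("Rr", "Rr") in the same way gives the 1 RR : 2 Rr : 1 rr genotypic ratio of the self-fertilized F1.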
Probability expresses the likelihood of a particular event occurring. It is the number of times that a particular event occurs, divided by the number of all possible outcomes. For example, a deck of 52 cards contains only one king of hearts. The probability of drawing one card from the deck at random and obtaining the king of hearts is 1/52, because there is only one card that is the king of hearts (one event) and there are 52 cards that can be drawn from the deck (52 possible outcomes). The probability of drawing a card and obtaining an ace is 4/52, because there are four cards that are aces (four events) and 52 cards (possible outcomes). Probability can be expressed either as a fraction (1/52 in this case) or as a decimal number (0.019).

The probability of a particular event may be determined by knowing something about how the event occurs or how often it occurs. We know, for example, that the probability of rolling a six-sided die (one member of a pair of dice) and getting a four is 1/6, because the die has six sides and any one side is equally likely to end up on top. So, in this case, understanding the nature of the event (the shape of the thrown die) allows us to determine the probability. In other cases, we determine the probability of an event by making a large number of observations. When a weather forecaster says that there is a 40% chance of rain on a particular day, this probability was obtained by observing a large number of days with similar atmospheric conditions and finding that it rains on 40% of those days. In this case, the probability has been determined empirically (by observation).

The multiplication rule
Two rules of probability are useful for predicting the ratios of offspring produced in genetic crosses. The first is the multiplication rule, which states that the probability of two or more independent events occurring together is calculated by multiplying their independent probabilities.
To illustrate the use of the multiplication rule, let’s again consider the roll of dice. The probability of rolling one die and obtaining a four is 1/6. To calculate the probability of rolling a die twice and obtaining 2 fours, we can apply the multiplication rule. The probability of obtaining a four on the first roll is 1/6 and the probability of obtaining a four on the second roll is 1/6; so the probability of rolling a four on both is 1/6 × 1/6 = 1/36 ( ◗ FIGURE 3.7a). The key indicator for applying the multiplication rule is the word and; in the example just considered, we wanted to know the probability of obtaining a four on the first roll and a four on the second roll.

For the multiplication rule to be valid, the events whose joint probability is being calculated must be independent: the outcome of one event must not influence the outcome of the other. For example, the number that comes up on one roll of the die has no influence on the number that comes up on the other roll, so these events are independent. However, if we wanted to know the probability of being hit on the head with a hammer and going to the hospital on the same day, we could not simply multiply the probability of being hit on the head with a hammer by the probability of going to the hospital. The multiplication rule cannot be applied here, because the two events are not independent; being hit on the head with a hammer certainly influences the probability of going to the hospital.

The addition rule
The second rule of probability frequently used in genetics is the addition rule, which states that the probability of any one of two or more mutually exclusive events is calculated by adding the probabilities of these events. Let’s look at this rule in concrete terms. To obtain the probability of throwing a die once and rolling either a three or a four, we would use the addition rule, adding the probability of obtaining a three (1/6) to the probability of obtaining a four (again, 1/6), or 1/6 + 1/6 = 2/6 = 1/3 ( ◗ FIGURE 3.7b). The key indicators for applying the addition rule are the words either and or.

For the addition rule to be valid, the events whose probability is being calculated must be mutually exclusive, meaning that one event excludes the possibility of the other occurring. For example, you cannot throw a single die just once and obtain both a three and a four, because only one side of the die can be on top. These events are mutually exclusive.

◗ 3.7 The multiplication and addition rules can be used to determine the probability of combinations of events. (a) The multiplication rule: (1) If you roll a die, (2) in a large number of sample rolls, on average, one out of six times you will obtain a four, (3) so the probability of obtaining a four in any roll is 1/6. (4) If you roll the die again, (5) your probability of getting a four is again 1/6, (6) so the probability of getting a four on two sequential rolls is 1/6 × 1/6 = 1/36. (b) The addition rule: (1) If you roll a die, (2) on average, one out of six times you’ll get a three (3) and one out of six times you’ll get a four. (4) That is, the probability of getting either a three or a four is 1/6 + 1/6 = 2/6 = 1/3.

Concepts
The multiplication rule states that the probability of two or more independent events occurring together is calculated by multiplying their independent probabilities. The addition rule states that the probability of any one of two or more mutually exclusive events occurring is calculated by adding their probabilities.

The application of probability to genetic crosses
The multiplication and addition rules of probability can be used in place of the Punnett square to predict the ratios of progeny expected from a genetic cross.
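Both rules can be checked against a direct count of outcomes for the dice examples above. The following Python sketch is illustrative only; the helper name probability and the variable names are our own, and a fair six-sided die is assumed. It uses exact fractions so that the results match the hand calculations.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)  # assumption: a fair six-sided die


def probability(event, outcomes):
    """Favorable outcomes divided by all equally likely outcomes."""
    outcomes = list(outcomes)
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))


p_three = probability(lambda face: face == 3, FACES)
p_four = probability(lambda face: face == 4, FACES)

# Multiplication rule: a four on the first roll AND a four on the second roll.
p_two_fours_rule = p_four * p_four
# The same probability by counting all 36 ordered pairs of rolls.
p_two_fours_count = probability(
    lambda rolls: rolls == (4, 4), product(FACES, repeat=2)
)

# Addition rule: EITHER a three OR a four on a single roll.
p_three_or_four_rule = p_three + p_four
# The same probability by counting faces directly.
p_three_or_four_count = probability(lambda face: face in (3, 4), FACES)

print(p_two_fours_rule, p_two_fours_count)          # 1/36 1/36
print(p_three_or_four_rule, p_three_or_four_count)  # 1/3 1/3
```

The rule-based and count-based values agree because the two rolls are independent and because a three and a four are mutually exclusive on a single roll; neither shortcut would be valid without those conditions.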
Let’s first consider a cross between two pea plants heterozygous for the locus that determines height, Tt × Tt. Half of the gametes produced by each plant have a T allele, and the other half have a t allele; so the probability for each type of gamete is 1/2.

The gametes from the two parents can combine in four different ways to produce offspring. Using the multiplication rule, we can determine the probability of each possible type. To calculate the probability of obtaining TT progeny, for example, we multiply the probability of receiving a T allele from the first parent (1/2) by the probability of receiving a T allele from the second parent (1/2). The multiplication rule should be used here because we need the probability of receiving a T allele from the first parent and a T allele from the second parent, two independent events. The four types of progeny from this cross and their associated probabilities are:

TT   (T gamete and T gamete)   1/2 × 1/2 = 1/4   tall
Tt   (T gamete and t gamete)   1/2 × 1/2 = 1/4   tall
tT   (t gamete and T gamete)   1/2 × 1/2 = 1/4   tall
tt   (t gamete and t gamete)   1/2 × 1/2 = 1/4   short

Notice that there are two ways for heterozygous progeny to be produced: a heterozygote can either receive a T allele from the first parent and a t allele from the second or receive a t allele from the first parent and a T allele from the second.

After determining the probabilities of obtaining each type of progeny, we can use the addition rule to determine the overall phenotypic ratios. Because of dominance, a tall plant can have genotype TT, Tt, or tT; so, using the addition rule, we find the probability of tall progeny to be 1/4 + 1/4 + 1/4 = 3/4. Because only one genotype codes for short (tt), the probability of short progeny is simply 1/4.

Two methods have now been introduced to solve genetic crosses: the Punnett square and the probability method. At this point, you may be saying “Why bother with probability rules and calculations?
The Punnett square is easier to understand and just as quick.” For simple monohybrid crosses, the Punnett square is simpler and just as easy to use. However, when tackling more complex crosses concerning genes at two or more loci, the probability method is both clearer and quicker than the Punnett square.

The binomial expansion and probability
When probability is used, it is important to recognize that there may be several different ways in which a set of events can occur. Consider two parents who are both heterozygous for albinism, a recessive condition in humans that causes reduced pigmentation in the skin, hair, and eyes ( ◗ FIGURE 3.8). When two parents heterozygous for albinism mate (Aa × Aa), the probability of their having a child with albinism (aa) is 1/4 and the probability of having a child with normal pigmentation (AA or Aa) is 3/4.

◗ 3.8 Albinism in human beings is usually inherited as a recessive trait. (Richard Dranitzke/SS/Photo Researchers.)

Suppose we want to know the probability of this couple having three children with albinism. In this case, there is only one way in which they can have three children with albinism: their first child has albinism and their second child has albinism and their third child has albinism. Here we simply apply the multiplication rule: 1/4 × 1/4 × 1/4 = 1/64.

Suppose we now ask, What is the probability of this couple having three children, one with albinism and two with normal pigmentation? This situation is more complicated. The first child might have albinism, whereas the second and third are unaffected; the probability of this sequence of events is 1/4 × 3/4 × 3/4 = 9/64. Alternatively, the first and third children might have normal pigmentation, whereas the second has albinism; the probability of this sequence is 3/4 × 1/4 × 3/4 = 9/64. Finally, the first two children might have normal pigmentation and the third albinism; the probability of this sequence is 3/4 × 3/4 × 1/4 = 9/64.
Because either the first sequence or the second sequence or the third sequence produces one child with albinism and two with normal pigmentation, we apply the addition rule and add the probabilities: 9/64 + 9/64 + 9/64 = 27/64.

If we want to know the probability of this couple having five children, two with albinism and three with normal pigmentation, figuring out the different combinations of children and their probabilities becomes more difficult. This task is made easier if we apply the binomial expansion. The binomial takes the form $(a + b)^n$, where a equals the probability of one event, b equals the probability of the alternative event, and n equals the number of times the event occurs. For figuring the probability of two out of five children with albinism:

a = the probability of a child having albinism = 1/4
b = the probability of a child having normal pigmentation = 3/4

The binomial for this situation is $(a + b)^5$ because there are five children in the family ($n = 5$). The expansion is:

$$(a + b)^5 = a^5 + 5a^4b + 10a^3b^2 + 10a^2b^3 + 5ab^4 + b^5$$

The first term in the expansion ($a^5$) equals the probability of having five children all with albinism, because a is the probability of albinism. The second term ($5a^4b$) equals the probability of having four children with albinism and one with normal pigmentation, the third term ($10a^3b^2$) equals the probability of having three children with albinism and two with normal pigmentation, and so forth. To obtain the probability of any combination of events, we insert the values of a and b; so the probability of having two out of five children with albinism is:

$$10a^2b^3 = 10(1/4)^2(3/4)^3 = 270/1024 = .26$$
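The terms of the expansion can also be generated by a short program. The Python sketch below is an illustration under our own naming (binomial_terms is a hypothetical helper, not part of any genetics library); it relies on the standard-library function math.comb for the coefficients and on exact fractions for a and b.

```python
from fractions import Fraction
from math import comb


def binomial_terms(a, b, n):
    """Return the n + 1 terms of (a + b)^n, beginning with a^n.

    Term k (counting from 0) is the probability that the event with
    probability b occurs k times and the event with probability a
    occurs n - k times.
    """
    return [comb(n, k) * a ** (n - k) * b ** k for k in range(n + 1)]


a = Fraction(1, 4)  # probability of a child having albinism
b = Fraction(3, 4)  # probability of a child having normal pigmentation

coefficients = [comb(5, k) for k in range(6)]
terms = binomial_terms(a, b, 5)
p_two_of_five = terms[3]                     # the term 10 a^2 b^3
p_one_of_three = binomial_terms(a, b, 3)[2]  # the term 3 a b^2

print(coefficients)                          # [1, 5, 10, 10, 5, 1]
print(p_two_of_five, float(p_two_of_five))   # 135/512 0.263671875
print(p_one_of_three)                        # 27/64
print(sum(terms))                            # 1
```

The value 135/512 is 270/1024 in lowest terms, and the three-child case reproduces the 27/64 found earlier by listing the sequences one at a time. The terms sum to 1 because together they cover every possible family of five children.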
We could easily figure out the probability of any desired combination of albinism and pigmentation among five children by using the other terms in the expansion.

How did we expand the binomial in this example? In general, the expansion of any binomial $(a + b)^n$ consists of a series of $n + 1$ terms. In the preceding example, $n = 5$; so there are $5 + 1 = 6$ terms: $a^5$, $5a^4b$, $10a^3b^2$, $10a^2b^3$, $5ab^4$, and $b^5$. To write out the terms, first figure out their exponents. The exponent of a in the first term always begins with the power to which the binomial is raised, or n. In our example, n equals 5, so our first term is $a^5$. The exponent of a decreases by one in each successive term; so the exponent of a is 4 in the second term ($a^4$), 3 in the third term ($a^3$), and so forth. The exponent of b is 0 (no b) in the first term and increases by 1 in each successive term, increasing from 0 to 5 in our example.

Next, determine the coefficient of each term. The coefficient of the first term is always 1; so in our example the first term is $1a^5$, or just $a^5$. The coefficient of the second term is always the same as the power to which the binomial is raised; in our example this coefficient is 5 and the term is $5a^4b$. For the coefficient of the third term, look back at the preceding term; multiply the coefficient of the preceding term (5 in our example) by the exponent of a in that term (4) and then divide by the number of that term (second term, or 2). So the coefficient of the third term in our example is (5 × 4)/2 = 20/2 = 10 and the term is $10a^3b^2$. Follow this same procedure for each successive term.

Another way to determine the probability of any particular combination of events is to use the following formula:

$$P = \frac{n!}{s!\,t!}\,a^s b^t$$

where P equals the overall probability of event X with probability a occurring s times and event Y with probability b occurring t times.
For our albinism example, event X would be the occurrence of a child with albinism and event Y the occurrence of a child with normal pigmentation; s would equal the number of children with albinism (2) and t, the number of children with normal pigmentation (3). The ! symbol is termed factorial, and it means the product of all the integers from n to 1. In this example, $n = 5$; so $n! = 5 \times 4 \times 3 \times 2 \times 1$. Applying this formula to obtain the probability of two out of five children having albinism, we obtain:

$$P = \frac{5!}{2!\,3!}\left(\frac{1}{4}\right)^2\left(\frac{3}{4}\right)^3$$

$$P = \frac{5 \times 4 \times 3 \times 2 \times 1}{2 \times 1 \times 3 \times 2 \times 1}\left(\frac{1}{4}\right)^2\left(\frac{3}{4}\right)^3 = .26$$

This value is the same as that obtained with the binomial expansion.

The Testcross
A useful tool for analyzing genetic crosses is the testcross, in which one individual of unknown genotype is crossed with another individual with a homozygous recessive genotype for the trait in question. Figure 3.6 illustrates a testcross (as well as a backcross). A testcross tests, or reveals, the genotype of the first individual.

Suppose you were given a tall pea plant with no information about its parents. Because tallness is a dominant trait in peas, your plant could be either homozygous (TT) or heterozygous (Tt), but you would not know which. You could determine its genotype by performing a testcross. If the plant were homozygous (TT), a testcross would produce all tall progeny (TT × tt → all Tt); if the plant were heterozygous (Tt), the testcross would produce half tall progeny and half short progeny (Tt × tt → 1/2 Tt and 1/2 tt). When a testcross is performed, any recessive allele in the unknown genotype is expressed in the progeny, because it will be paired with a recessive allele from the homozygous recessive parent.

Concepts
The binomial expansion may be used to determine the probability of a particular set of events. A testcross is a cross between an individual with an unknown genotype and one with a homozygous recessive genotype. The outcome of the testcross can reveal the unknown genotype.

Incomplete Dominance
The seven characters in pea plants that Mendel chose to study extensively all exhibited dominance, but Mendel did realize that not all characters have traits that exhibit dominance.
He conducted some crosses concerning the length of time that pea plants take to flower. When he crossed two homozygous varieties that differed in their flowering time by an average of 20 days, the length of time taken by the F1 plants to flower was intermediate between those of the two parents. When the heterozygote has a phenotype intermediate between the phenotypes of the two homozygotes, the trait is said to display incomplete dominance.

Incomplete dominance is also exhibited in the fruit color of eggplants. When a homozygous plant that produces purple fruit (PP) is crossed with a homozygous plant that produces white fruit (pp), all the heterozygous F1 (Pp) produce violet fruit ( ◗ FIGURE 3.9a). When the F1 are crossed with each other, 1/4 of the F2 are purple (PP), 1/2 are violet (Pp), and 1/4 are white (pp), as shown in ◗ FIGURE 3.9b. This 1:2:1 ratio is different from the 3:1 ratio that we would observe if eggplant fruit color exhibited dominance. When a trait displays incomplete dominance, the genotypic ratios and phenotypic ratios of the offspring are the same, because each genotype has its own phenotype. It is impossible to obtain eggplants that are pure breeding for violet fruit, because all plants with violet fruit are heterozygous.

Another example of incomplete dominance is feather color in chickens. A cross between a homozygous black chicken and a homozygous white chicken produces F1 chickens that are gray. If these gray F1 are intercrossed, they produce F2 birds in a ratio of 1 black : 2 gray : 1 white. Leopard white spotting in horses is incompletely dominant over unspotted horses: LL horses are white with numerous dark spots, heterozygous Ll horses have fewer spots, and ll horses have no spots ( ◗ FIGURE 3.10).
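The eggplant crosses above can be tallied with a few lines of Python to show why the genotypic and phenotypic ratios coincide under incomplete dominance. This is a sketch under our own assumptions: the helper name cross and the dictionary FRUIT_COLOR are hypothetical, and genotypes are written as two-letter strings for a single locus.

```python
from collections import Counter
from fractions import Fraction

# Under incomplete dominance each genotype has its own phenotype.
FRUIT_COLOR = {"PP": "purple", "Pp": "violet", "pp": "white"}


def cross(parent1, parent2):
    """Return the probability of each progeny genotype for one locus."""
    counts = Counter(
        # Sorting writes the uppercase allele first, so "pP" becomes "Pp".
        "".join(sorted(allele1 + allele2))
        for allele1 in parent1
        for allele2 in parent2
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


f1 = cross("PP", "pp")  # purple x white
f2 = cross("Pp", "Pp")  # violet F1 crossed with each other
f2_phenotypes = {FRUIT_COLOR[genotype]: p for genotype, p in f2.items()}

print(f1)
print(f2)
print(f2_phenotypes)
```

All F1 progeny are Pp (violet), and the F2 fall into 1/4 PP, 1/2 Pp, and 1/4 pp. Because FRUIT_COLOR maps each genotype to a distinct color, the phenotypic probabilities are the same 1/4 : 1/2 : 1/4; with complete dominance, PP and Pp would share one phenotype and the ratio would collapse to 3:1.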
The concept of dominance and some of its variations are discussed further in Chapter 5.

◗ 3.10 Leopard spotting in horses exhibits incomplete dominance. (Frank Oberle/Bruce Coleman.)

Concepts
Incomplete dominance is exhibited when the heterozygote has a phenotype intermediate between the phenotypes of the two homozygotes. When a trait exhibits incomplete dominance, a cross between two heterozygotes produces a 1:2:1 phenotypic ratio in the progeny.

◗ 3.9 Fruit color in eggplant is inherited as an incompletely dominant trait. Conclusion: genotypic ratio 1 PP : 2 Pp : 1 pp; phenotypic ratio 1 purple : 2 violet : 1 white.

Table 3.2 Phenotypic ratios for simple genetic crosses (crosses for a single locus)
Ratio             Genotypes of parents   Genotypes of progeny        Type of dominance
3:1               Aa × Aa                3/4 A_ : 1/4 aa             Dominance
1:2:1             Aa × Aa                1/4 AA : 1/2 Aa : 1/4 aa    Incomplete dominance
1:1               Aa × aa                1/2 Aa : 1/2 aa             Dominance or incomplete dominance
1:1               Aa × AA                1/2 Aa : 1/2 AA             Incomplete dominance
Uniform progeny   AA × AA                All AA                      Dominance or incomplete dominance
Uniform progeny   aa × aa                All aa                      Dominance or incomplete dominance
Uniform progeny   AA × aa                All Aa                      Dominance or incomplete dominance
Uniform progeny   AA × Aa                All A_                      Dominance
Note: A line in a genotype, such as A_, indicates that any allele is possible.

Genetic Symbols

As we have seen, genetic crosses are usually depicted with the use of symbols to designate the different alleles. Lowercase letters are traditionally used to designate recessive alleles, and uppercase letters are for dominant alleles. Two or three letters may be used for a single allele: the recessive allele for heart-shaped leaves in cucumbers is designated hl, and the recessive allele for abnormal sperm head shape in mice is designated azh. The normal allele for a character (called the wild type because it is the allele most often found in the wild) is often symbolized by one or more letters and a plus sign (+).
The letter(s) chosen are usually based on the phenotype of the mutant. The first letter is lowercase if the mutant phenotype is recessive, uppercase if the mutant phenotype is dominant. For example, the recessive allele for yellow eyes in the Oriental fruit fly is represented by ye, whereas the allele for wild-type eye color is represented by ye+. At times, the letters for the wild-type allele are dropped and the allele is represented simply by a plus sign. Superscripts and subscripts are sometimes added to distinguish between genes: Lfr1 and Lfr2 represent dominant alleles at different loci that produce lacerate leaf margins in opium poppies; El^R represents an allele in goats that restricts the length of the ears. A slash may be used to distinguish alleles present in an individual genotype. The genotype of a goat that is heterozygous for restricted ears might be written El+/El^R or simply +/El^R. If genotypes at more than one locus are presented together, a space may separate them. A goat heterozygous for a pair of alleles that produce restricted ears and heterozygous for another pair of alleles that produce goiter can be designated by El+/El^R G/g.

Connecting Concepts
Ratios in Simple Crosses

Now that we have had some experience with genetic crosses, let's review the ratios that appear in the progeny of simple crosses, in which a single locus is under consideration. Understanding these ratios and the parental genotypes that produce them will allow you to work simple genetic crosses quickly, without resorting to the Punnett square. Later, we will use these ratios to work more complicated crosses entailing several loci. There are only four phenotypic ratios to understand (Table 3.2). The 3:1 ratio arises in a simple genetic cross when both of the parents are heterozygous for a dominant trait (Aa × Aa).
The second phenotypic ratio is the 1:2:1 ratio, which arises in the progeny of crosses between two parents heterozygous for a character that exhibits incomplete dominance (Aa × Aa). The third phenotypic ratio is the 1:1 ratio, which results from the mating of a homozygous parent and a heterozygous parent. If the character exhibits dominance, the homozygous parent in this cross must carry two recessive alleles (Aa × aa) to obtain a 1:1 ratio, because a cross between a homozygous dominant parent and a heterozygous parent (AA × Aa) produces only offspring displaying the dominant trait. For a character with incomplete dominance, a 1:1 ratio results from a cross between the heterozygote and either homozygote (Aa × aa or Aa × AA). The fourth phenotypic ratio is not really a ratio: all the offspring have the same phenotype. Several combinations of parents can produce this outcome (Table 3.2). A cross between any two homozygous parents, either between two of the same homozygotes (AA × AA and aa × aa) or between two different homozygotes (AA × aa), produces progeny all having the same phenotype. Progeny of a single phenotype can also result from a cross between a homozygous dominant parent and a heterozygote (AA × Aa).

Table 3.3 Genotypic ratios for simple genetic crosses (crosses for a single locus)
Ratio             Genotypes of parents   Genotypes of progeny
1:2:1             Aa × Aa                1/4 AA : 1/2 Aa : 1/4 aa
1:1               Aa × aa                1/2 Aa : 1/2 aa
1:1               Aa × AA                1/2 Aa : 1/2 AA
Uniform progeny   AA × AA                All AA
Uniform progeny   aa × aa                All aa
Uniform progeny   AA × aa                All Aa

If we are interested in the ratios of genotypes instead of phenotypes, there are only three outcomes to remember (Table 3.3): the 1:2:1 ratio, produced by a cross between two heterozygotes; the 1:1 ratio, produced by a cross between a heterozygote and a homozygote; and the uniform progeny produced by a cross between two homozygotes.
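The four phenotypic outcomes reviewed above can be reproduced by enumeration. The sketch below is only an illustration under simple assumptions: a single locus with alleles written A and a, equal transmission of each parental allele, and hypothetical helper names (`genotype_counts`, `phenotype_counts`, `phenotypic_ratio`) chosen for this example.

```python
from collections import Counter
from functools import reduce
from itertools import product
from math import gcd

CLASS_ORDER = ["A_", "AA", "Aa", "aa"]  # fixed order in which classes are reported

def genotype_counts(parent1, parent2):
    """Count progeny genotypes over all equally likely gamete pairings."""
    counts = Counter()
    for allele1, allele2 in product(parent1, parent2):
        counts["".join(sorted(allele1 + allele2))] += 1
    return counts

def phenotype_counts(parent1, parent2, complete_dominance):
    """Group genotypes into phenotypic classes."""
    counts = Counter()
    for genotype, n in genotype_counts(parent1, parent2).items():
        if complete_dominance and "A" in genotype:
            counts["A_"] += n          # AA and Aa look alike under dominance
        else:
            counts[genotype] += n      # otherwise each genotype is its own class
    return counts

def phenotypic_ratio(parent1, parent2, complete_dominance=True):
    """Return the reduced ratio as text, or 'uniform' for a single class."""
    counts = phenotype_counts(parent1, parent2, complete_dominance)
    if len(counts) == 1:
        return "uniform"
    divisor = reduce(gcd, counts.values())
    return ":".join(str(counts[c] // divisor) for c in CLASS_ORDER if c in counts)

print(phenotypic_ratio("Aa", "Aa", complete_dominance=True))
print(phenotypic_ratio("Aa", "Aa", complete_dominance=False))
print(phenotypic_ratio("Aa", "aa", complete_dominance=True))
print(phenotypic_ratio("AA", "Aa", complete_dominance=True))
```

Calling `phenotypic_ratio` with `complete_dominance=False` treats every genotype as its own class, so the same function also gives the genotypic ratios.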
These simple phenotypic and genotypic ratios and the parental genotypes that produce them provide the key to understanding crosses for a single locus and, as you will see in the next section, for multiple loci.

Multiple-Loci Crosses

We will now extend Mendel's principle of segregation to more complex crosses for alleles at multiple loci. Understanding the nature of these crosses will require an additional principle, the principle of independent assortment.

Dihybrid Crosses

In addition to his work on monohybrid crosses, Mendel also crossed varieties of peas that differed in two characteristics (dihybrid crosses). For example, he had one homozygous variety of pea that produced round seeds and yellow endosperm; another homozygous variety produced wrinkled seeds and green endosperm. When he crossed the two, all the F1 progeny had round seeds and yellow endosperm. He then self-fertilized the F1 and obtained the following progeny in the F2: 315 round, yellow seeds; 101 wrinkled, yellow seeds; 108 round, green seeds; and 32 wrinkled, green seeds. Mendel recognized that these traits appeared approximately in a 9:3:3:1 ratio; that is, 9/16 of the progeny were round and yellow, 3/16 were wrinkled and yellow, 3/16 were round and green, and 1/16 were wrinkled and green.

The Principle of Independent Assortment

Mendel carried out a number of dihybrid crosses for pairs of characteristics and always obtained a 9:3:3:1 ratio in the F2. This ratio makes perfect sense in regard to segregation and dominance if we add a third principle, which Mendel recognized in his dihybrid crosses: the principle of independent assortment (Mendel's second law).
This principle states that alleles at different loci separate independently of one another. A common mistake is to think that the principle of segregation and the principle of independent assortment refer to two different processes. The principle of independent assortment is really an extension of the principle of segregation. The principle of segregation states that the two alleles of a locus separate when gametes are formed; the principle of independent assortment states that, when these two alleles separate, their separation is independent of the separation of alleles at other loci.

Let's see how the principle of independent assortment explains the results that Mendel obtained in his dihybrid cross. Each plant possesses two alleles coding for each characteristic, so the parental plants must have had genotypes RRYY and rryy (◗ FIGURE 3.11a). The principle of segregation indicates that the alleles for each locus separate, and one allele for each locus passes to each gamete. The gametes produced by the round, yellow parent therefore contain alleles RY, whereas the gametes produced by the wrinkled, green parent contain alleles ry. These two types of gametes unite to produce the F1, all with genotype RrYy. Because round is dominant over wrinkled and yellow is dominant over green, the phenotype of the F1 will be round and yellow.

◗ 3.11 Mendel conducted dihybrid crosses. Conclusion: phenotypic ratio 9 round, yellow : 3 round, green : 3 wrinkled, yellow : 1 wrinkled, green.

When Mendel self-fertilized the F1 plants to produce the F2, the alleles for each locus separated, with one allele going into each gamete. This is where the principle of independent assortment becomes important. Each pair of alleles can separate in two ways: (1) R separates with Y and r separates with y to produce gametes RY and ry or (2) R separates with y and r separates with Y to produce gametes Ry and rY.
The principle of independent assortment tells us that the alleles at each locus separate independently; thus, both kinds of separation occur equally and all four types of gametes (RY, ry, Ry, and rY) are produced in equal proportions (◗ FIGURE 3.11b). When these four types of gametes are combined to produce the F2 generation, the progeny consist of 9/16 round and yellow, 3/16 wrinkled and yellow, 3/16 round and green, and 1/16 wrinkled and green, resulting in a 9:3:3:1 phenotypic ratio (◗ FIGURE 3.11c).

The Relation of the Principle of Independent Assortment to Meiosis

An important qualification of the principle of independent assortment is that it applies to characters encoded by loci located on different chromosomes because, like the principle of segregation, it is based wholly on the behavior of chromosomes during meiosis. Each pair of homologous chromosomes separates independently of all other pairs in anaphase I of meiosis (see Figure 2.18); so genes located on different pairs of homologs will assort independently. Genes that happen to be located on the same chromosome will travel together during anaphase I of meiosis and will arrive at the same destination, within the same gamete (unless crossing over takes place). Genes located on the same chromosome therefore do not assort independently (unless they are located sufficiently far apart that crossing over takes place every meiotic division, as will be discussed fully in Chapter 7).

Concepts
The principle of independent assortment states that genes coding for different characteristics separate independently of one another when gametes are formed, owing to independent separation of homologous pairs of chromosomes during meiosis. Genes located close together on the same chromosome do not, however, assort independently.

Applying Probability and the Branch Diagram to Dihybrid Crosses

When the genes at two loci separate independently, a dihybrid cross can be understood as two monohybrid crosses. Let's examine Mendel's dihybrid cross (RrYy × RrYy) by considering each characteristic separately (◗ FIGURE 3.12a). If we consider only the shape of the seeds, the cross was Rr × Rr, which yields a 3:1 phenotypic ratio (3/4 round and 1/4 wrinkled progeny, see Table 3.2). Next consider the other characteristic, the color of the endosperm. The cross was Yy × Yy, which produces a 3:1 phenotypic ratio (3/4 yellow and 1/4 green progeny). We can now combine these monohybrid ratios by using the multiplication rule to obtain the proportion of progeny with different combinations of seed shape and color. The proportion of progeny with round and yellow seeds is 3/4 (the probability of round) × 3/4 (the probability of yellow) = 9/16. The proportion of progeny with round and green seeds is 3/4 × 1/4 = 3/16; the proportion of progeny with wrinkled and yellow seeds is 1/4 × 3/4 = 3/16; and the proportion of progeny with wrinkled and green seeds is 1/4 × 1/4 = 1/16.

◗ 3.12 A branch diagram can be used for determining the phenotypes and expected proportions of offspring from a dihybrid cross (RrYy × RrYy).
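The multiplication rule applied above to seed shape and seed color can be written as a short calculation. This is a minimal sketch that assumes the two loci assort independently; the dictionaries `shape` and `color` and the helper `combine` are names invented for the illustration.

```python
from fractions import Fraction
from itertools import product

# Monohybrid phenotypic proportions for Rr x Rr and Yy x Yy (3:1 each).
shape = {"round": Fraction(3, 4), "wrinkled": Fraction(1, 4)}
color = {"yellow": Fraction(3, 4), "green": Fraction(1, 4)}

def combine(*loci):
    """Multiply the per-locus probabilities for every combination of traits.

    Valid only if the loci assort independently.
    """
    combined = {}
    for combination in product(*(locus.items() for locus in loci)):
        traits = tuple(name for name, _ in combination)
        probability = Fraction(1)
        for _, p in combination:
            probability *= p
        combined[traits] = probability
    return combined

f2 = combine(shape, color)
for traits, probability in f2.items():
    print(", ".join(traits), probability)
```

The same helper accepts any number of loci, so a third dictionary could be passed in for a third independently assorting character.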
Branch diagrams are a convenient way of organizing all the combinations of characteristics (◗ FIGURE 3.12b). In the first column, list the proportions of the phenotypes for one character (here, 3/4 round and 1/4 wrinkled). In the second column, list the proportions of the phenotypes for the second character (3/4 yellow and 1/4 green) next to each of the phenotypes in the first column: put 3/4 yellow and 1/4 green next to the round phenotype and again next to the wrinkled phenotype. Draw lines between the phenotypes in the first column and each of the phenotypes in the second column. Now follow each branch of the diagram, multiplying the probabilities for each trait along that branch. One branch leads from round to yellow, yielding round and yellow progeny. Another branch leads from round to green, yielding round and green progeny, and so on. The probability of progeny with a particular combination of traits is calculated by using the multiplication rule: the probability of round (3/4) and yellow (3/4) seeds is 3/4 × 3/4 = 9/16. The advantage of the branch diagram is that it helps keep track of all the potential combinations of traits that may appear in the progeny. It can be used to determine phenotypic or genotypic ratios for any number of characteristics.

Using probability is much faster than using the Punnett square for crosses that include multiple loci. Genotypic and phenotypic ratios can quickly be worked out by combining, with the multiplication rule, the simple ratios in Tables 3.2 and 3.3. The probability method is particularly efficient if we need the probability of only a particular phenotype or genotype among the progeny of a cross. Suppose we needed to know the probability of obtaining the genotype Rryy in the F2 of the dihybrid cross in Figure 3.11. The probability of obtaining the Rr genotype in a cross of Rr × Rr is 1/2 and that of obtaining yy progeny in a cross of Yy × Yy is 1/4 (see Table 3.3).
Using the multiplication rule, we find the probability of Rryy to be 1/2 × 1/4 = 1/8.

To illustrate the advantage of the probability method, consider the cross AaBbccDdEe × AaBbCcddEe. Suppose we wanted to know the probability of obtaining offspring with the genotype aabbccddee. If we used a Punnett square to determine this probability, we might be working on the solution for months. However, we can quickly figure the probability of obtaining this one genotype by breaking this cross into a series of single-locus crosses:

Cross      Progeny genotype   Probability
Aa × Aa    aa                 1/4
Bb × Bb    bb                 1/4
cc × Cc    cc                 1/2
Dd × dd    dd                 1/2
Ee × Ee    ee                 1/4

The probability of an offspring from this cross having genotype aabbccddee is now easily obtained by using the multiplication rule: 1/4 × 1/4 × 1/2 × 1/2 × 1/4 = 1/256. This calculation assumes that genes at these five loci all assort independently.

Concepts
A cross including several characteristics can be worked by breaking the cross down into single-locus crosses and using the multiplication rule to determine the proportions of combinations of characteristics (provided the genes assort independently).

The Dihybrid Testcross

Let's practice using the branch diagram by determining the types and proportions of phenotypes in a dihybrid testcross between the round and yellow F1 plants (RrYy) that Mendel obtained in his dihybrid cross and the wrinkled and green plants (rryy) (◗ FIGURE 3.13). Break the cross down into a series of single-locus crosses. The cross Rr × rr yields 1/2 round (Rr) progeny and 1/2 wrinkled (rr) progeny. The cross Yy × yy yields 1/2 yellow (Yy) progeny and 1/2 green (yy) progeny. Using the multiplication rule, we find the proportion of round and yellow progeny to be 1/2 (the probability of round) × 1/2 (the probability of yellow) = 1/4. Four combinations of traits with the following proportions appear in the offspring: 1/4 RrYy, round yellow; 1/4 Rryy, round green; 1/4 rrYy, wrinkled yellow; and 1/4 rryy, wrinkled green.

◗ 3.13 A branch diagram can be used for determining the phenotypes and expected proportions of offspring from a dihybrid testcross (RrYy × rryy).

Trihybrid Crosses

The branch diagram can also be applied to crosses including three characters (called trihybrid crosses). In one trihybrid cross, Mendel crossed a pure-breeding variety that possessed round seeds, yellow endosperm, and gray seed coats with another pure-breeding variety that possessed wrinkled seeds, green endosperm, and white seed coats (◗ FIGURE 3.14). The branch diagram shows that the expected phenotypic ratio in the F2 is 27:9:9:9:3:3:3:1, and the numbers that Mendel obtained from this cross closely fit these expected ones.

◗ 3.14 A branch diagram can be used for determining the phenotypes and expected proportions of offspring from a trihybrid cross (RrYyCc × RrYyCc).

In monohybrid crosses, we have seen that three genotypes (RR, Rr, and rr) are produced in the F2. In dihybrid crosses, nine genotypes (3 genotypes for the first locus × 3 genotypes for the second locus = 9) are produced in the F2: RRYY, RRYy, RRyy, RrYY, RrYy, Rryy, rrYY, rrYy, and rryy. There are three possible genotypes at each locus (when there are two alternative alleles); so the number of genotypes produced in the F2 of a cross between individuals heterozygous for n loci will be 3^n. If there is incomplete dominance, the number of phenotypes also will be 3^n because, with incomplete dominance, each genotype produces a different phenotype. If the traits exhibit dominance, the number of phenotypes will be 2^n.

Worked Problem

Not only are the principles of segregation and independent assortment important because they explain how heredity works, but they also provide the means for predicting the outcome of genetic crosses. This predictive power has made genetics a powerful tool in agriculture and other fields, and the ability to apply the principles of heredity is an important skill for all students of genetics. Practice with genetic problems is essential for mastering the basic principles of heredity; no amount of reading and memorization can substitute for the experience gained by deriving solutions to specific problems in genetics.

Students may have difficulty with genetics problems when they are unsure where to begin or how to organize the problem and plan a solution. In genetics, every problem is different, so there is no common series of steps that can be applied to all genetics problems. One must use logic and common sense to analyze a problem and arrive at a solution. Nevertheless, certain steps can facilitate the process, and solving the following problem will serve to illustrate these steps.

In mice, black coat color (B) is dominant over brown (b), and a solid pattern (S) is dominant over white spotted (s). Color and spotting are controlled by genes that assort independently. A homozygous black, spotted mouse is crossed with a homozygous brown, solid mouse. All the F1 mice are black and solid. A testcross is then carried out by mating the F1 mice with brown, spotted mice.
(a) Give the genotypes of the parents and the F1 mice.
(b) Give the genotypes and phenotypes, along with their expected ratios, of the progeny expected from the testcross.

• Solution

Step 1: Determine the questions to be answered. What question or questions is the problem asking? Is it asking for genotypes, genotypic ratios, or phenotypic ratios? This problem asks you to provide the genotypes of the parents and the F1, the expected genotypes and phenotypes of the progeny of the testcross, and their expected proportions.

Step 2: Write down the basic information given in the problem. This problem provides important information about the dominance relations of the characters and about the mice being crossed. Black is dominant over brown and solid is dominant over white spotted. Furthermore, the genes for the two characters assort independently. In this problem, symbols are provided for the different alleles (B for black, b for brown, S for solid, and s for spotted); had these symbols not been provided, you would need to choose symbols to represent these alleles. It is useful to record these symbols at the beginning of the solution:

B = black     S = solid
b = brown     s = white spotted

Next, write out the crosses given in the problem.

P:          homozygous black, spotted × homozygous brown, solid
F1:         black, solid
Testcross:  black, solid × brown, spotted

Step 3: Write down any genetic information that can be determined from the phenotypes alone. From the phenotypes and the statement that they are homozygous, you know that the P-generation mice must be BBss and bbSS. The F1 mice are black and solid, both dominant traits, so the F1 mice must possess at least one black allele (B) and one solid allele (S). At this point, you cannot be certain about the other alleles, so represent the genotype of the F1 as B?S?. The brown, spotted mice in the testcross must be bbss, because both brown and spotted are recessive traits that will be expressed only if two recessive alleles are present. Record these genotypes on the crosses that you wrote out in step 2:

P:          homozygous black, spotted (BBss) × homozygous brown, solid (bbSS)
F1:         black, solid (B?S?)
Testcross:  black, solid (B?S?) × brown, spotted (bbss)

Step 4: Break down the problem into smaller parts. First, determine the genotype of the F1. After this genotype has been determined, you can predict the results of the testcross and determine the genotypes and phenotypes of the progeny from the testcross.
Second, because this cross includes two independently assorting loci, it can be conveniently broken down into two single-locus crosses: one for coat color and another for spotting. Third, use a branch diagram to determine the proportion of progeny of the testcross with different combinations of the two traits.

Step 5: Work the different parts of the problem. Start by determining the genotype of the F1 progeny. Mendel's first law indicates that the two alleles at a locus separate, one going into each gamete. Thus, the gametes produced by the black, spotted parent contain Bs and the gametes produced by the brown, solid parent contain bS, which combine to produce F1 progeny with the genotype BbSs:

P:        homozygous black, spotted (BBss) × homozygous brown, solid (bbSS)
Gametes:  Bs and bS
F1:       BbSs

Use the F1 genotype to work the testcross (BbSs × bbss), breaking it into two single-locus crosses. First, consider the cross for coat color: Bb × bb. Any cross between a heterozygote and a homozygous recessive genotype produces a 1:1 phenotypic ratio of progeny (see Table 3.2):

Bb × bb → 1/2 Bb black : 1/2 bb brown

Next do the cross for spotting: Ss × ss. This cross also is between a heterozygote and a homozygous recessive genotype and will produce 1/2 solid (Ss) and 1/2 spotted (ss) progeny (see Table 3.2).

Ss × ss → 1/2 Ss solid : 1/2 ss spotted

Finally, determine the proportions of progeny with combinations of these characters by using the branch diagram.

1/2 Bb black   with 1/2 Ss solid    → BbSs black, solid      1/2 × 1/2 = 1/4
1/2 Bb black   with 1/2 ss spotted  → Bbss black, spotted    1/2 × 1/2 = 1/4
1/2 bb brown   with 1/2 Ss solid    → bbSs brown, solid      1/2 × 1/2 = 1/4
1/2 bb brown   with 1/2 ss spotted  → bbss brown, spotted    1/2 × 1/2 = 1/4

Step 6: Check all work. As a last step, reread the problem, checking to see if your answers are consistent with the information provided. You have used the genotypes BBss and bbSS in the P generation. Do these genotypes code for the phenotypes given in the problem? Are the F1 progeny phenotypes consistent with the genotypes that you assigned? The answers are consistent with the information.

Observed and Expected Ratios

When two individuals of known genotype are crossed, we expect certain ratios of genotypes and phenotypes in the progeny; these expected ratios are based on the Mendelian principles of segregation, independent assortment, and dominance. The ratios of genotypes and phenotypes actually observed among the progeny, however, may deviate from these expectations.

For example, in German cockroaches, brown body color (Y) is dominant over yellow body color (y). If we cross a brown, heterozygous cockroach (Yy) with a yellow cockroach (yy), we expect a 1:1 ratio of brown (Yy) and yellow (yy) progeny. Among 40 progeny, we would therefore expect to see 20 brown and 20 yellow offspring. However, the observed numbers might deviate from these expected values; we might in fact see 22 brown and 18 yellow progeny.

Chance plays a critical role in genetic crosses, just as it does in flipping a coin. When you flip a coin, you expect a 1:1 ratio: 1/2 heads and 1/2 tails. If you flip a coin 1000 times, the proportion of heads and tails obtained would probably be very close to that expected 1:1 ratio. However, if you flip the coin 10 times, the ratio of heads to tails might be quite different from 1:1. You could easily get 6 heads and 4 tails, or 3 heads and 7 tails, just by chance. It is possible that you might even get 10 heads and 0 tails. The same thing happens in genetic crosses. We may expect 20 brown and 20 yellow cockroaches, but 22 brown and 18 yellow progeny could arise as a result of chance.

The Goodness-of-Fit Chi-Square Test

If you expected a 1:1 ratio of brown and yellow cockroaches but the cross produced 22 brown and 18 yellow, you probably wouldn't be too surprised even though it wasn't a perfect 1:1 ratio.
In this case, it seems reasonable to assume that chance produced the deviation between the expected and the observed results. But, if you observed 25 brown and 15 yellow, would the ratio still be 1:1? Something other than chance might have caused the deviation. Perhaps the inheritance of this character is more complicated than was assumed or perhaps some of the yellow progeny died before they were counted. Clearly, we need some means of evaluating how likely it is that chance is responsible for the deviation between the observed and the expected numbers.

To evaluate the role of chance in producing deviations between observed and expected values, a statistical test called the goodness-of-fit chi-square test is used. This test provides information about how well observed values fit expected values. Before we learn how to calculate the chi square, it is important to understand what this test does and does not indicate about a genetic cross.

The chi-square test cannot tell us whether a genetic cross has been correctly carried out, whether the results are correct, or whether we have chosen the correct genetic explanation for the results. What it does indicate is the probability that the difference between the observed and the expected values is due to chance. In other words, it indicates the likelihood that chance alone could produce the deviation between the expected and the observed values.

If we expected 20 brown and 20 yellow progeny from a genetic cross, the chi-square test gives the probability that we might observe 25 brown and 15 yellow progeny simply owing to chance deviations from the expected 20:20 ratio. When the probability calculated from the chi-square test is high, we assume that chance alone produced the difference. When the probability is low, we assume that some factor other than chance (some significant factor) produced the deviation.

To use the goodness-of-fit chi-square test, we first determine the expected results. The chi-square test must always be applied to numbers of progeny, not to proportions or percentages. Let's consider a locus for coat color in domestic cats, for which black color (B) is dominant over gray (b). If we crossed two heterozygous black cats (Bb × Bb), we would expect a 3:1 ratio of black and gray kittens. A series of such crosses yields a total of 50 kittens: 30 black and 20 gray. These numbers are our observed values. We can obtain the expected numbers by multiplying the expected proportions by the total number of observed progeny. In this case, the expected number of black kittens is 3/4 × 50 = 37.5 and the expected number of gray kittens is 1/4 × 50 = 12.5. The chi-square (χ²) value is calculated by using the following formula:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

where Σ means the sum of all the squared differences between observed and expected divided by the expected values. To calculate the chi-square value for our black and gray kittens, we would first subtract the number of expected black kittens from the number of observed black kittens (30 − 37.5 = −7.5) and square this value: (−7.5)² = 56.25. We then divide this result by the expected number of black kittens: 56.25/37.5 = 1.5. We repeat the calculations on the number of expected gray kittens: (20 − 12.5)²/12.5 = 4.5. To obtain the overall chi-square value, we sum the (observed − expected)²/expected values: 1.5 + 4.5 = 6.0.
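The arithmetic for the kittens can be reproduced in a few lines. The following is a sketch only; the function name `chi_square` and its arguments are assumptions made for this illustration, and the observed values must be numbers of progeny rather than proportions, as the text requires.

```python
def chi_square(observed, expected_proportions):
    """Goodness-of-fit chi-square: sum of (observed - expected)^2 / expected.

    observed: counts of progeny in each phenotypic class.
    expected_proportions: expected fraction of progeny in each class.
    """
    total = sum(observed)
    expected = [proportion * total for proportion in expected_proportions]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 30 black and 20 gray kittens, tested against the expected 3:1 ratio.
kittens = chi_square([30, 20], [3 / 4, 1 / 4])
print(kittens)
```

Passing proportions and letting the function compute the expected numbers mirrors the order of steps in the text: expected values first, then the squared deviations.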
Table 3.4 Critical values of the χ² distribution

                                      P
df    .995    .975    .9      .5       .1       .05      .025     .01      .005
1     0.000   0.000   0.016   0.455    2.706    3.841    5.024    6.635    7.879
2     0.010   0.051   0.211   1.386    4.605    5.991    7.378    9.210    10.597
3     0.072   0.216   0.584   2.366    6.251    7.815    9.348    11.345   12.838
4     0.207   0.484   1.064   3.357    7.779    9.488    11.143   13.277   14.860
5     0.412   0.831   1.610   4.351    9.236    11.070   12.832   15.086   16.750
6     0.676   1.237   2.204   5.348    10.645   12.592   14.449   16.812   18.548
7     0.989   1.690   2.833   6.346    12.017   14.067   16.013   18.475   20.278
8     1.344   2.180   3.490   7.344    13.362   15.507   17.535   20.090   21.955
9     1.735   2.700   4.168   8.343    14.684   16.919   19.023   21.666   23.589
10    2.156   3.247   4.865   9.342    15.987   18.307   20.483   23.209   25.188
11    2.603   3.816   5.578   10.341   17.275   19.675   21.920   24.725   26.757
12    3.074   4.404   6.304   11.340   18.549   21.026   23.337   26.217   28.300
13    3.565   5.009   7.042   12.340   19.812   22.362   24.736   27.688   29.819
14    4.075   5.629   7.790   13.339   21.064   23.685   26.119   29.141   31.319
15    4.601   6.262   8.547   14.339   22.307   24.996   27.488   30.578   32.801
P, probability; df, degrees of freedom.

The next step is to determine the probability associated with this calculated chi-square value, which is the probability that the deviation between the observed and the expected results could be due to chance. This step requires us to compare the calculated chi-square value (6.0) with theoretical values that have the same degrees of freedom in a chi-square table. The degrees of freedom represent the number of ways in which the observed classes are free to vary. For a goodness-of-fit chi-square test, the degrees of freedom are equal to n − 1, where n is the number of different expected phenotypes. In our example, there are two expected phenotypes (black and gray); so n = 2 and the degrees of freedom equal 2 − 1 = 1. Now that we have our calculated chi-square value and have figured out the associated degrees of freedom, we are ready to obtain the probability from a chi-square table (Table 3.4).
The degrees of freedom are given in the lefthand column of the table and the probabilities are given at the top; within the body of the table are chi-square values associated with these probabilities. First, find the row for the appropriate degrees of freedom; for our example with 1 degree of freedom, it is the first row of the table. Find where our calculated chi-square value (6.0) lies among the theoretical values in this row. The theoretical chi-square values increase from left to right and the probabilities decrease from left to right. Our chi-square value of 6.0 falls between the value of 5.024, associated with a probability of .025, and the value of 6.635, associated with a probability of .01. Thus, the probability associated with our chi-square value is less than .025 and greater than .01. So, there is less than a 2.5% probability that the deviation that we observed between the expected and the observed numbers of black and gray kittens could be due to chance. Most scientists use the .05 probability level as their cutoff value: if the probability of chance being responsible for the deviation is greater than or equal to .05, they accept that chance may be responsible for the deviation between the observed and the expected values. When the probability is less than .05, scientists assume that chance is not responsible and a significant difference exists. The expression significant difference means that some factor other than chance is responsible for the observed values being different from the expected values. In regard to the kittens, perhaps one of the genotypes experienced increased mortality before the progeny were counted or perhaps other genetic factors skewed the observed ratios. In choosing .05 as the cutoff value, scientists have agreed to assume that chance is responsible for the deviations between observed and expected values unless there is strong evidence to the contrary. 
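The table lookup just described can also be sketched in Python. The code below is an illustration under stated assumptions, not a standard routine: the names `P_LEVELS`, `CRITICAL`, `p_bracket`, and `is_significant` are hypothetical, and only the rows for 1 to 3 degrees of freedom are copied from Table 3.4. It brackets the probability between two adjacent columns of the table rather than computing an exact value.

```python
import bisect

# Column headings (probabilities) of Table 3.4, left to right.
P_LEVELS = [.995, .975, .9, .5, .1, .05, .025, .01, .005]

# Critical chi-square values copied from Table 3.4 (rows for df = 1, 2, 3 only).
CRITICAL = {
    1: [0.000, 0.000, 0.016, 0.455, 2.706, 3.841, 5.024, 6.635, 7.879],
    2: [0.010, 0.051, 0.211, 1.386, 4.605, 5.991, 7.378, 9.210, 10.597],
    3: [0.072, 0.216, 0.584, 2.366, 6.251, 7.815, 9.348, 11.345, 12.838],
}


def p_bracket(chi2, df):
    """Return (lower, upper) bounds on P for a chi-square value.

    None for a bound means the value lies beyond that edge of the table.
    """
    row = CRITICAL[df]
    i = bisect.bisect_right(row, chi2)  # number of tabled values <= chi2
    upper = P_LEVELS[i - 1] if i > 0 else None
    lower = P_LEVELS[i] if i < len(row) else None
    return lower, upper


def is_significant(chi2, df, cutoff=0.05):
    """True when chi2 exceeds the critical value in the cutoff column (P < cutoff)."""
    return chi2 > CRITICAL[df][P_LEVELS.index(cutoff)]


# Kitten example: chi-square of 6.0 with 1 degree of freedom.
print(p_bracket(6.0, 1))       # (0.01, 0.025), that is, .01 < P < .025
print(is_significant(6.0, 1))  # True: P is below the .05 cutoff
```

Comparing the chi-square value with the critical value in the .05 column, rather than with the bracket, keeps the boundary case consistent with the rule in the text: a probability exactly equal to .05 is not treated as significant.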
It is important to bear in mind that even if we obtain a probability of, say, .01, there is still a 1% probability that the deviation between the observed and expected numbers is due to nothing more than chance. Calculation of the chi-square value is illustrated in ◗ FIGURE 3.15.

◗ 3.15 A chi-square test is used to determine the probability that the difference between observed and expected values is due to chance.

In the figure, a plant with purple flowers is crossed with a plant with white flowers (P generation), and the purple-flowered F1 are self-fertilized to produce 105 F2 progeny with purple flowers and 45 with white flowers (an apparent 3:1 ratio). The expected values are obtained by multiplying the expected proportion by the total, and then the chi-square value is calculated:

Phenotype   Observed   Expected
Purple      105        3/4 × 150 = 112.5
White        45        1/4 × 150 = 37.5
Total       150

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(105 - 112.5)^2}{112.5} + \frac{(45 - 37.5)^2}{37.5} = \frac{56.25}{112.5} + \frac{56.25}{37.5} = 0.5 + 1.5 = 2.0$$

Degrees of freedom = n − 1 = 2 − 1 = 1
Probability (from Table 3.4): .1 < P < .5

The probability associated with the calculated chi-square value is between .10 and .50, indicating a high probability that the difference between observed and expected values is due to chance. Conclusion: No significant difference between observed and expected values.

Concepts
Differences between observed and expected ratios can arise by chance. The goodness-of-fit chi-square test can be used to evaluate whether deviations between observed and expected numbers are likely to be due to chance or to some other significant factor.

Penetrance and Expressivity

In the genetic crosses considered thus far, we have assumed that every individual with a particular genotype expresses the expected phenotype.
We assumed, for example, that the genotype Rr always produces round seeds and that the genotype rr always produces wrinkled seeds. For some characters, such an assumption is incorrect: the genotype does not always produce the expected phenotype, a phenomenon termed incomplete penetrance. Incomplete penetrance is seen in human polydactyly, the condition of having extra fingers and toes ( ◗ FIGURE 3.16). There are several different forms of human polydactyly, but the trait is usually caused by a dominant allele. Occasionally, people possess the allele for polydactyly (as evidenced by the fact that their children inherit the polydactyly) but nevertheless have a normal number of fingers and toes. In these cases, the gene for polydactyly is not fully penetrant. Penetrance is defined as the percentage of individuals having a particular genotype that express the expected phenotype. For example, if we examined 42 people having an allele for polydactyly and found that only 38 of them were polydactylous, the penetrance would be 38/42 = 0.90 (90%).

A related concept is that of expressivity, the degree to which a character is expressed. In addition to incomplete penetrance, polydactyly exhibits variable expressivity. Some polydactylous persons possess extra fingers and toes that are fully functional, whereas others possess only a small tag of extra skin. Incomplete penetrance and variable expressivity are due to the effects of other genes and to environmental factors that can alter or completely suppress the effect of a particular gene. A gene might encode an enzyme that produces a particular phenotype only within a limited temperature range. At higher or lower temperatures, the enzyme would not function and the phenotype would not be expressed; the allele encoding such an enzyme is therefore penetrant only within a particular temperature range.
Many characters exhibit incomplete penetrance and variable expressivity, emphasizing the fact that the mere presence of a gene does not guarantee its expression.

◗ 3.16 Human polydactyly (extra digits) exhibits incomplete penetrance and variable expressivity. (Biophoto Associates/Science Source/Photo Researchers.)

Concepts
Penetrance is the percentage of individuals having a particular genotype who express the associated phenotype. Expressivity is the degree to which a trait is expressed. Incomplete penetrance and variable expressivity result from the influence of other genes and environmental factors on the phenotype.

Connecting Concepts Across Chapters

This chapter has introduced several important concepts of heredity and presented techniques for making predictions about the types of offspring that parents will produce. Two key principles of inheritance were introduced: the principles of segregation and independent assortment. These principles serve as the foundation for understanding much of heredity. In this chapter, we also learned some essential terminology and techniques for discussing and analyzing genetic crosses. A critical concept is the connection between the behavior of chromosomes during meiosis (Chapter 2) and the seemingly abstract symbols used in genetic crosses. The principles taught in this chapter provide important links to much of what follows in this book. In Chapters 4 through 7, we will learn about additional factors that affect the outcome of genetic crosses: sex, interactions between genes, linkage between genes, and environment. These factors build on the principles of segregation and independent assortment. In Chapters 10 through 21, where we focus on molecular aspects of heredity, the importance of these basic principles is not so obvious, but most nuclear processes are based on the inheritance of chromosomal genes. In Chapters 22 and 23, we turn to quantitative and population genetics.
These chapters build directly on the principles of heredity and can only be understood with a firm grasp of how genes are inherited. The material covered in the present chapter therefore serves as a foundation for almost all of heredity. Finally, this chapter introduces problem solving, which is at the heart of genetics. Developing hypotheses to explain genetic phenomena (such as the types and proportions of progeny produced in a genetic cross) and testing these hypotheses by doing genetic crosses and collecting additional data are common to all of genetics. The ability to think analytically and draw logical conclusions from observations is emphasized throughout this book.

CONCEPTS SUMMARY

• Gregor Mendel, an Austrian monk living in what is now the Czech Republic, first discovered the principles of heredity by conducting experiments on pea plants.
• Mendel’s success can be attributed to his choice of the pea plant as an experimental organism, the use of characters with a few, easily distinguishable phenotypes, his experimental approach, and careful attention to detail.
• Genes are inherited factors that determine a character. Alternate forms of a gene are called alleles. The alleles are located at a specific place, a locus, on a chromosome, and the set of genes that an individual possesses is its genotype. Phenotype is the manifestation or appearance of a characteristic and may refer to physical, biochemical, or behavioral characteristics.
• Phenotypes are produced by the combined effects of genes and environmental factors. Only the genotype, not the phenotype, is inherited.
• The principle of segregation states that an individual possesses two alleles coding for a trait and that these two alleles separate in equal proportions when gametes are formed.
• The concept of dominance indicates that, when dominant and recessive alleles are present in a heterozygote, only the trait of the dominant allele is observed in the phenotype.
• The two alleles of a genotype are located on homologous chromosomes, which separate during anaphase I of meiosis. The separation of homologous chromosomes brings about the segregation of alleles.
• The types of progeny produced from a genetic cross can be predicted by applying the Punnett square or probability.
• Probability is the likelihood of a particular event occurring. The multiplication rule of probability states that the probability of two or more independent events occurring together is calculated by multiplying the probabilities of the independent events. The addition rule of probability states that the probability of any of two or more mutually exclusive events occurring is calculated by adding the probabilities of the events.
• The binomial expansion may be used to determine the probability of a particular combination of events.
• A testcross reveals the genotype (homozygote or heterozygote) of an individual having a dominant trait and consists of crossing that individual with one having the homozygous recessive genotype.
• Incomplete dominance occurs when a heterozygote has a phenotype that is intermediate between the phenotypes of the two homozygotes.
• The principle of independent assortment states that genes coding for different characters assort independently when gametes are formed.
• Independent assortment is based on the random separation of homologous pairs of chromosomes during anaphase I of meiosis; it occurs when genes coding for two characters are located on different pairs of chromosomes.
• When genes assort independently, the multiplication rule of probability can be used to obtain the probability of inheriting more than one trait: a cross including more than one trait can be broken down into simple crosses, and the probabilities of obtaining any combination of traits can be obtained by multiplying the probabilities for each trait.
• Observed ratios of progeny from a genetic cross may deviate from the expected ratios owing to chance.
The goodness-of-fit chi-square test can be used to determine the probability that a difference between observed and expected numbers is due to chance.
• Penetrance is the percentage of individuals with a particular genotype that exhibit the expected phenotype. Expressivity is the degree to which a character is expressed. Incomplete penetrance and variable expressivity result from the influence of other genes and environmental effects on the phenotype.

IMPORTANT TERMS

gene (p. 47)
allele (p. 47)
locus (p. 47)
genotype (p. 48)
homozygous (p. 48)
heterozygous (p. 48)
phenotype (p. 48)
monohybrid cross (p. 48)
P (parental) generation (p. 48)
F1 (filial 1) generation (p. 49)
reciprocal crosses (p. 49)
F2 (filial 2) generation (p. 49)
dominant (p. 51)
recessive (p. 51)
principle of segregation (Mendel’s first law) (p. 51)
concept of dominance (p. 51)
chromosome theory of heredity (p. 52)
backcross (p. 52)
Punnett square (p. 53)
probability (p. 54)
multiplication rule (p. 54)
addition rule (p. 55)
testcross (p. 57)
incomplete dominance (p. 58)
wild type (p. 58)
dihybrid cross (p. 59)
principle of independent assortment (Mendel’s second law) (p. 60)
trihybrid cross (p. 63)
goodness-of-fit chi-square test (p. 66)
incomplete penetrance (p. 68)
penetrance (p. 68)
expressivity (p. 68)

Worked Problems

1. Short hair in rabbits (S) is dominant over long hair (s). The following crosses are carried out, producing the progeny shown. Give all possible genotypes of the parents in each cross.

      Parents          Progeny
(a)   short × short    4 short and 2 long
(b)   short × short    8 short
(c)   short × long     12 short
(d)   short × long     3 short and 1 long
(e)   long × long      2 long

• Solution
For this problem, it is useful to first gather as much information about the genotypes of the parents as possible on the basis of their phenotypes. We can then look at the types of progeny produced to provide the missing information. Notice that the problem asks for all possible genotypes of the parents.
(a) short × short: 4 short and 2 long
Because short hair is dominant over long hair, a rabbit having short hair could be either SS or Ss. The two long-haired offspring must be homozygous (ss) because long hair is recessive and will appear in the phenotype only when both alleles for long hair are present. Because each parent contributes one of the two alleles found in the progeny, each parent must be carrying the s allele and must therefore be Ss.

(b) short × short: 8 short
The short-haired parents could be SS or Ss. All 8 of the offspring are short (S_), and so at least one of the parents is likely to be homozygous (SS); if both parents were heterozygous, 1/4 long-haired (ss) progeny would be expected, but we do not observe any long-haired progeny. The other parent could be homozygous (SS) or heterozygous (Ss); as long as one parent is homozygous, all the offspring will be short haired. It is theoretically possible, although unlikely, that both parents are heterozygous (Ss × Ss). If this were the case, we would expect 2 of the 8 progeny to be long haired. Although no long-haired progeny are observed, it is possible that just by chance no long-haired rabbits would be produced among the 8 progeny of the cross.

(c) short × long: 12 short
The short-haired parent could be SS or Ss. The long-haired parent must be ss. If the short-haired parent were heterozygous (Ss), half of the offspring would be expected to be long haired, but we don’t see any long-haired progeny. Therefore this parent is most likely homozygous (SS). It is theoretically possible, although unlikely, that the parent is heterozygous and just by chance no long-haired progeny were produced.

(d) short × long: 3 short and 1 long
On the basis of its phenotype, the short-haired parent could be homozygous (SS) or heterozygous (Ss), but the presence of one long-haired offspring tells us that the short-haired parent must be heterozygous (Ss). The long-haired parent must be homozygous (ss).
(e) long × long: 2 long
Because long hair is recessive, both parents must be homozygous for a long-hair allele (ss).

2. In cats, black coat color is dominant over gray. A female black cat whose mother is gray mates with a gray male. If this female has a litter of six kittens, what is the probability that three will be black and three will be gray?

• Solution
Because black (G) is dominant over gray (g), a black cat may be homozygous (GG) or heterozygous (Gg). The black female in this problem must be heterozygous (Gg) because her mother is gray (gg) and she must inherit one of her mother’s alleles. The gray male is homozygous (gg) because gray is recessive. Thus the cross is:

Gg (black female) × gg (gray male)
Progeny: 1/2 Gg (black), 1/2 gg (gray)

We can use the binomial expansion to determine the probability of obtaining three black and three gray kittens in a litter of six. Let a equal the probability of a kitten being black and b equal the probability of a kitten being gray. The binomial is $(a + b)^6$, the expansion of which is:

$$(a + b)^6 = a^6 + 6a^5b + 15a^4b^2 + 20a^3b^3 + 15a^2b^4 + 6ab^5 + b^6$$

(See text for an explanation of how to expand the binomial.) The probability of obtaining three black and three gray kittens in a litter of six is provided by the term $20a^3b^3$. The probabilities of a and b are both 1/2, so the overall probability is $20(1/2)^3(1/2)^3 = 20/64 = 5/16$.

3. The following genotypes are crossed: AaBbCcDd × AaBbCcDd. Give the proportion of the progeny of this cross having the following genotypes: (a) AaBbCcDd, (b) aabbccdd, (c) AaBbccDd.
• Solution
This problem is easily worked if the cross is broken down into simple crosses and the multiplication rule is used to find the different combinations of genotypes:

Locus 1   Aa × Aa = 1/4 AA, 1/2 Aa, 1/4 aa
Locus 2   Bb × Bb = 1/4 BB, 1/2 Bb, 1/4 bb
Locus 3   Cc × Cc = 1/4 CC, 1/2 Cc, 1/4 cc
Locus 4   Dd × Dd = 1/4 DD, 1/2 Dd, 1/4 dd

To find the probability of any combination of genotypes, simply multiply the probabilities of the different genotypes:

(a) AaBbCcDd   1/2 (Aa) × 1/2 (Bb) × 1/2 (Cc) × 1/2 (Dd) = 1/16
(b) aabbccdd   1/4 (aa) × 1/4 (bb) × 1/4 (cc) × 1/4 (dd) = 1/256
(c) AaBbccDd   1/2 (Aa) × 1/2 (Bb) × 1/4 (cc) × 1/2 (Dd) = 1/32

4. In corn, purple kernels are dominant over yellow kernels, and full kernels are dominant over shrunken kernels. A corn plant having purple and full kernels is crossed with a plant having yellow and shrunken kernels, and the following progeny are obtained:

purple, full       112
purple, shrunken   103
yellow, full        91
yellow, shrunken    94

What are the most likely genotypes of the parents and progeny? Test your genetic hypothesis with a chi-square test.

• Solution
The best way to begin this problem is by breaking the cross down into simple crosses for a single characteristic (seed color or seed shape):

P     purple × yellow              full × shrunken
F1    112 + 103 = 215 purple       112 + 91 = 203 full
      91 + 94 = 185 yellow         103 + 94 = 197 shrunken

Purple × yellow produces approximately 1/2 purple and 1/2 yellow. A 1:1 ratio is usually caused by a cross between a heterozygote and a homozygote. Because purple is dominant, the purple parent must be heterozygous (Pp) and the yellow parent must be homozygous (pp). The purple progeny produced by this cross will be heterozygous (Pp) and the yellow progeny must be homozygous (pp). Now let’s examine the other character.
Full × shrunken produces 1/2 full and 1/2 shrunken, or a 1:1 ratio, and so these progeny phenotypes also are produced by a cross between a heterozygote (Ff) and a homozygote (ff); the full-kernel progeny will be heterozygous (Ff) and the shrunken-kernel progeny will be homozygous (ff). Now combine the two crosses and use the multiplication rule to obtain the overall genotypes and the proportions of each genotype:

P     purple, full × yellow, shrunken
      PpFf × ppff
F1    PpFf = 1/2 purple × 1/2 full = 1/4 purple, full
      Ppff = 1/2 purple × 1/2 shrunken = 1/4 purple, shrunken
      ppFf = 1/2 yellow × 1/2 full = 1/4 yellow, full
      ppff = 1/2 yellow × 1/2 shrunken = 1/4 yellow, shrunken

Our genetic explanation predicts that, from this cross, we should see 1/4 purple, full-kernel progeny; 1/4 purple, shrunken-kernel progeny; 1/4 yellow, full-kernel progeny; and 1/4 yellow, shrunken-kernel progeny. A total of 400 progeny were produced; so 1/4 × 400 = 100 of each phenotype are expected. The observed numbers do not fit the expected numbers exactly. Could the difference between what we observe and what we expect be due to chance? If the probability is high that chance alone is responsible for the difference between observed and expected, we will assume that the progeny have been produced in the 1:1:1:1 ratio predicted by the cross. If the probability that the difference between observed and expected is due to chance is low, the progeny are not really in the predicted ratio and some other, significant factor must be responsible for the deviation.
The observed and expected numbers are:

Phenotype          Observed   Expected
purple, full       112        1/4 × 400 = 100
purple, shrunken   103        1/4 × 400 = 100
yellow, full        91        1/4 × 400 = 100
yellow, shrunken    94        1/4 × 400 = 100

To determine the probability that the difference between observed and expected is due to chance, we calculate a chi-square value with the formula $\chi^2 = \sum [(\text{observed} - \text{expected})^2/\text{expected}]$:

$$\chi^2 = \frac{(112 - 100)^2}{100} + \frac{(103 - 100)^2}{100} + \frac{(91 - 100)^2}{100} + \frac{(94 - 100)^2}{100}$$

$$= \frac{12^2}{100} + \frac{3^2}{100} + \frac{9^2}{100} + \frac{6^2}{100} = \frac{144}{100} + \frac{9}{100} + \frac{81}{100} + \frac{36}{100}$$

$$= 1.44 + 0.09 + 0.81 + 0.36 = 2.70$$

Now that we have the chi-square value, we must determine the probability of this chi-square value being due to chance. To obtain this probability, we first calculate the degrees of freedom, which for a goodness-of-fit chi-square test are n − 1, where n equals the number of expected phenotypic classes. In this case, there are four expected phenotypic classes; so the degrees of freedom equal 4 − 1 = 3. We must now look up the chi-square value in a chi-square table (see Table 3.4). We select the row corresponding to 3 degrees of freedom and look along this row to find our calculated chi-square value. The calculated chi-square value of 2.7 lies between 2.366 (a probability of .5) and 6.251 (a probability of .1). The probability (P) associated with the calculated chi-square value is therefore .1 < P < .5. This is the probability that the difference between what we observed and what we expect is due to chance, which in this case is relatively high, and so chance is likely responsible for the deviation. We can conclude that the progeny do appear in the 1:1:1:1 ratio predicted by our genetic explanation.

COMPREHENSION QUESTIONS

* 1. Why was Mendel’s approach to the study of heredity so successful?
2. What is the relation between the terms allele, locus, gene, and genotype?
* 3. What is the principle of segregation? Why is it important?
4. What is the concept of dominance? How does dominance differ from incomplete dominance?
5.
Give the phenotypic ratios that may appear among the progeny of simple crosses and the genotypes of the parents that may give rise to each ratio.
6. Give the genotypic ratios that may appear among the progeny of simple crosses and the genotypes of the parents that may give rise to each ratio.
* 7. What is the chromosome theory of inheritance? Why was it important?
8. What is the principle of independent assortment? How is it related to the principle of segregation?
9. How is the principle of independent assortment related to meiosis?
10. How is the goodness-of-fit chi-square test used to analyze genetic crosses? What does the probability associated with a chi-square value indicate about the results of a cross?
11. What is incomplete penetrance and what causes it?

APPLICATION QUESTIONS AND PROBLEMS

*12. In cucumbers, orange fruit color (R) is dominant over cream fruit color (r). A cucumber plant homozygous for orange fruits is crossed with a plant homozygous for cream fruits. The F1 are intercrossed to produce the F2.
(a) Give the genotypes and phenotypes of the parents, the F1, and the F2.
(b) Give the genotypes and phenotypes of the offspring of a backcross between the F1 and the orange parent.
(c) Give the genotypes and phenotypes of a backcross between the F1 and the cream parent.

*13. In rabbits, coat color is a genetically determined characteristic. Some black females always produce black progeny, whereas other black females produce black progeny and white progeny. Explain how these outcomes occur.

*14. In cats, blood type A results from an allele (I^A) that is dominant over an allele (i^B) that produces blood type B. There is no O blood type. The blood types of male and female cats that were mated and the blood types of their kittens follow. Give the most likely genotypes for the parents of each litter.

      Male parent     Female parent   Kittens
(a)   blood type A    blood type B    4 kittens with blood type A, 3 with blood type B
(b)   blood type B    blood type B    6 kittens with blood type B
(c)   blood type B    blood type A    8 kittens with blood type A
(d)   blood type A    blood type A    7 kittens with blood type A, 2 kittens with blood type B
(e)   blood type A    blood type A    10 kittens with blood type A
(f)   blood type A    blood type B    4 kittens with blood type A, 1 kitten with blood type B

15. In sheep, lustrous fleece (L) results from an allele that is dominant over an allele for normal fleece (l). A ewe (adult female) with lustrous fleece is mated with a ram (adult male) with normal fleece. The ewe then gives birth to a single lamb with normal fleece. From this single offspring, is it possible to determine the genotypes of the two parents? If so, what are their genotypes? If not, why not?

*16. In humans, alkaptonuria is a metabolic disorder in which affected persons produce black urine (see the introduction to this chapter). Alkaptonuria results from an allele (a) that is recessive to the allele for normal metabolism (A). Sally has normal metabolism, but her brother has alkaptonuria. Sally’s father has alkaptonuria, and her mother has normal metabolism.
(a) Give the genotypes of Sally, her mother, her father, and her brother.
(b) If Sally’s parents have another child, what is the probability that this child will have alkaptonuria?
(c) If Sally marries a man with alkaptonuria, what is the probability that their first child will have alkaptonuria?

17. Suppose that you are raising Mongolian gerbils. You notice that some of your gerbils have white spots, whereas others have solid coats. What type of crosses could you carry out to determine whether white spots are due to a recessive or a dominant allele?

*18. Hairlessness in American rat terriers is recessive to the presence of hair. Suppose that you have a rat terrier with hair.
How can you determine whether this dog is homozygous or heterozygous for the hairy trait?

19. In snapdragons, red flower color (R) is incompletely dominant over white flower color (r); the heterozygotes produce pink flowers. A red snapdragon is crossed with a white snapdragon, and the F1 are intercrossed to produce the F2.
(a) Give the genotypes and phenotypes of the F1 and F2, along with their expected proportions.
(b) If the F1 are backcrossed to the white parent, what will the genotypes and phenotypes of the offspring be?
(c) If the F1 are backcrossed to the red parent, what are the genotypes and phenotypes of the offspring?

20. What is the probability of rolling one six-sided die and obtaining the following numbers?
(a) 2
(b) 1 or 2
(c) An even number
(d) Any number but a 6

*21. What is the probability of rolling two six-sided dice and obtaining the following numbers?
(a) 2 and 3
(b) 6 and 6
(c) At least one 6
(d) Two of the same number (two 1s, or two 2s, or two 3s, etc.)
(e) An even number on both dice
(f) An even number on at least one die

*22. In a family of seven children, what is the probability of obtaining the following numbers of boys and girls?
(a) All boys
(b) All children of the same sex
(c) Six girls and one boy
(d) Four boys and three girls
(e) Four girls and three boys

23. Phenylketonuria (PKU) is a disease that results from a recessive gene. Two normal parents produce a child with PKU.
(a) What is the probability that a sperm from the father will contain the PKU allele?
(b) What is the probability that an egg from the mother will contain the PKU allele?
(c) What is the probability that their next child will have PKU?
(d) What is the probability that their next child will be heterozygous for the PKU gene?

*24. In German cockroaches, curved wing (c) is recessive to normal wing (c+). A homozygous cockroach having normal wings is crossed with a homozygous cockroach having curved wings. The F1 are intercrossed to produce the F2. Assume that the pair of chromosomes containing the locus for wing shape is metacentric. Draw this pair of chromosomes as it would appear in the parents, the F1, and each class of F2 progeny at metaphase I of meiosis. Assume that no crossing over takes place. At each stage, label a location for the alleles for wing shape (c and c+) on the chromosomes.

*25. In guinea pigs, the allele for black fur (B) is dominant over the allele for brown (b) fur. A black guinea pig is crossed with a brown guinea pig, producing five F1 black guinea pigs and six F1 brown guinea pigs.
(a) How many copies of the black allele (B) will be present in each cell from an F1 black guinea pig at the following stages: G1, G2, metaphase of mitosis, metaphase I of meiosis, metaphase II of meiosis, and after the second cytokinesis following meiosis? Assume that no crossing over takes place.
(b) How many copies of the brown allele (b) will be present in each cell from an F1 brown guinea pig at the same stages? Assume that no crossing over takes place.

26. In watermelons, bitter fruit (B) is dominant over sweet fruit (b), and yellow spots (S) are dominant over no spots (s). The genes for these two characteristics assort independently. A homozygous plant that has bitter fruit and yellow spots is crossed with a homozygous plant that has sweet fruit and no spots. The F1 are intercrossed to produce the F2.
(a) What will be the phenotypic ratios in the F2?
(b) If an F1 plant is backcrossed with the bitter, yellow-spotted parent, what phenotypes and proportions are expected in the offspring?
(c) If an F1 plant is backcrossed with the sweet, nonspotted parent, what phenotypes and proportions are expected in the offspring?

27. In cats, curled ears (Cu) result from an allele that is dominant over an allele for normal ears (cu). Black color results from an independently assorting allele (G) that is dominant over an allele for gray (g).
A gray cat homozygous for curled ears is mated with a homozygous black cat with normal ears. All the F1 cats are black and have curled ears.
(a) If two of the F1 cats mate, what phenotypes and proportions are expected in the F2?
(b) An F1 cat mates with a stray cat that is gray and possesses normal ears. What phenotypes and proportions of progeny are expected from this cross?

*28. The following two genotypes are crossed: AaBbCcddEe × AabbCcDdEe. What will the proportion of the following genotypes be among the progeny of this cross?
(a) AaBbCcDdEe
(b) AabbCcddee
(c) aabbccddee
(d) AABBCCDDEE

29. In mice, an allele for apricot eyes (a) is recessive to an allele for brown eyes (a+). At an independently assorting locus, an allele for tan (t) coat color is recessive to an allele for black (t+) coat color. A mouse that is homozygous for brown eyes and black coat color is crossed with a mouse having apricot eyes and a tan coat. The resulting F1 are intercrossed to produce the F2. In a litter of eight F2 mice, what is the probability that two will have apricot eyes and tan coats?

30. In cucumbers, dull fruit (D) is dominant over glossy fruit (d), orange fruit (R) is dominant over cream fruit (r), and bitter cotyledons (B) are dominant over nonbitter cotyledons (b). The three characters are encoded by genes located on different pairs of chromosomes. A plant homozygous for dull, orange fruit and bitter cotyledons is crossed with a plant that has glossy, cream fruit and nonbitter cotyledons. The F1 are intercrossed to produce the F2.
(a) Give the phenotypes and their expected proportions in the F2.
(b) An F1 plant is crossed with a plant that has glossy, cream fruit and nonbitter cotyledons. Give the phenotypes and expected proportions among the progeny of this cross.

* 31. A and a are alleles located on a pair of metacentric chromosomes. B and b are alleles located on a pair of acrocentric chromosomes.
A cross is made between individuals having the following genotypes: AaBb × aabb. (a) Draw the chromosomes as they would appear in each type of gamete produced by the individuals of this cross. (b) For each type of progeny resulting from this cross, draw the chromosomes as they would appear in a cell at G1, G2, and metaphase of mitosis.
32. Ptosis (droopy eyelid) may be inherited as a dominant human trait. Among 40 people who are heterozygous for the ptosis allele, 13 have ptosis and 27 have normal eyelids. (a) What is the penetrance for ptosis? (b) If ptosis exhibited variable expressivity, what would it mean?
33. In sailfin mollies (fish), gold color is due to an allele (g) that is recessive to the allele for normal color (G). A gold fish is crossed with a normal fish. Among the offspring, 88 are normal and 82 are gold. (a) What are the most likely genotypes of the parents in this cross? (b) Assess the plausibility of your hypothesis by performing a chi-square test.
34. In guinea pigs, the allele for black coat color (B) is dominant over the allele for white coat color (b). At an independently assorting locus, an allele for rough coat (R) is dominant over an allele for smooth coat (r). A guinea pig that is homozygous for black color and rough coat is crossed with a guinea pig that has a white and smooth coat. In a series of matings, the F1 are crossed with guinea pigs having white, smooth coats. From these matings, the following phenotypes appear in the offspring: 24 black, rough guinea pigs; 26 black, smooth guinea pigs; 23 white, rough guinea pigs; and 5 white, smooth guinea pigs. (a) Using a chi-square test, compare the observed numbers of progeny with those expected from the cross. (b) What conclusions can you draw from the results of the chi-square test? (c) Suggest an explanation for these results.
CHALLENGE QUESTIONS
35. Dwarfism is a recessive trait in Hereford cattle.
A rancher in western Texas discovers that several of the calves in his herd are dwarfs, and he wants to eliminate this undesirable trait from the herd as rapidly as possible. Suppose that the rancher hires you as a genetic consultant to advise him on how to breed the dwarfism trait out of the herd. What crosses would you advise the rancher to conduct to ensure that the allele causing dwarfism is eliminated from the herd?
36. A geneticist discovers an obese mouse in his laboratory colony. He breeds this obese mouse with a normal mouse. All the F1 mice from this cross are normal in size. When he interbreeds two F1 mice, eight of the F2 mice are normal in size and two are obese. The geneticist then intercrosses two of his obese mice, and he finds that all of the progeny from this cross are obese. These results lead the geneticist to conclude that obesity in mice results from a recessive allele. A second geneticist at a different university also discovers an obese mouse in her laboratory colony. She carries out the same crosses as the first geneticist did and obtains the same results. She also concludes that obesity in mice results from a recessive allele. One day the two geneticists meet at a genetics conference, learn of each other’s experiments, and decide to exchange mice. They both find that, when they cross two obese mice from the different laboratories, all the offspring are normal; however, when they cross two obese mice from the same laboratory, all the offspring are obese. Explain their results.
37. Albinism is a recessive trait in humans. A geneticist studies a series of families in which both parents are normal and at least one child has albinism. The geneticist reasons that both parents in these families must be heterozygotes and that albinism should appear in 1/4 of the children of these families.
To his surprise, the geneticist finds that the frequency of albinism among the children of these families is considerably greater than 1/4. There is no evidence that normal pigmentation exhibits incomplete penetrance. Can you think of an explanation for the higher-than-expected frequency of albinism among these families?
38. Two distinct phenotypes are found in the salamander Plethodon cinereus: a red form and a black form. Some biologists have speculated that the red phenotype is due to an autosomal allele that is dominant over an allele for black. Unfortunately, these salamanders will not mate in captivity; so the hypothesis that red is dominant over black has never been tested. One day a genetics student is hiking through the forest and finds 30 female salamanders, some red and some black, laying eggs. The student places each female and her eggs (about 20–30 eggs per female) in separate plastic bags and takes them back to the lab. There, the student successfully raises the eggs until they hatch. After the eggs have hatched, the student records the phenotypes of the juvenile salamanders, along with the phenotypes of their mothers. Thus, the student has the phenotypes for 30 females and their progeny, but no information is available about the phenotypes of the fathers. Explain how the student can determine whether red is dominant over black with this information on the phenotypes of the females and their offspring.
SUGGESTED READINGS
Corcos, A., and F. Monaghan. 1985. Some myths about Mendel’s experiments. The American Biology Teacher 47:233–236. An excellent discussion of some misconceptions surrounding Mendel’s life and discoveries.
Dronamraju, K. 1992. Profiles in genetics: Archibald E. Garrod. American Journal of Human Genetics 51:216–219. A brief biography of Archibald Garrod and his contributions to genetics.
Dunn, L. C. 1965. A Short History of Genetics. New York: McGraw-Hill. An older but very good history of genetics.
Garrod, A. E. 1902. The incidence of alkaptonuria: a study in chemical individuality. Lancet 2:1616–1620. Garrod’s original paper on the genetics of alkaptonuria.
Henig, R. M. 2001. The Monk in the Garden: The Lost and Found Genius of Gregor Mendel, the Father of Genetics. Boston: Houghton Mifflin. A creative history of Gregor Mendel, in which the author has used historical research to create a vivid portrait of Mendel’s life and work.
Monaghan, F. V., and A. F. Corcos. 1987. Reexamination of the fate of Mendel’s paper. Journal of Heredity 78:116–118. A good discussion of why Mendel’s paper was unappreciated by his peers.
Orel, V. 1984. Mendel. Oxford: Oxford University Press. An excellent and authoritative biography of Mendel.
Weiling, F. 1991. Historical study: Johann Gregor Mendel 1822–1884. American Journal of Medical Genetics 40:1–25. A fascinating account that contains much recent research on Mendel’s life as a scientist.
4 Sex Determination and Sex-Linked Characteristics
• The Toothless, Hairless Men of Sind
• Sex Determination
  Chromosomal Sex-Determining Systems
  Genic Sex-Determining Systems
  Environmental Sex Determination
  Sex Determination in Drosophila
  Sex Determination in Humans
• Sex-Linked Characteristics
  X-linked White Eyes in Drosophila
  Nondisjunction and the Chromosome Theory of Inheritance
  X-linked Color Blindness in Humans
  Symbols for X-linked Genes
  Dosage Compensation
  Z-linked Characteristics
  Y-linked Characteristics
(Historical Picture Archive/Corbis.)
The Toothless, Hairless Men of Sind
In 1875, Charles Darwin, author of On the Origin of Species, wrote of a peculiar family of Sind, a province in northwest India, “in which ten men, in the course of four generations, were furnished in both jaws taken together, with only four small and weak incisor teeth and with eight posterior molars. The men thus affected have little hair on the body, and become bald early in life. They also suffer much during hot weather from excessive dryness of the skin. It is remarkable that no instance has occurred of a daughter being thus affected. . . . Though daughters in the above family are never affected, they transmit the tendency to their sons; and no case has occurred of a son transmitting it to his sons.”
These men possessed a genetic condition now known as anhidrotic ectodermal dysplasia, which (as noted by Darwin) is characterized by small teeth, no sweat glands, and sparse body hair. Darwin also noted several key features of the inheritance of this disorder: although it occurs primarily in men, fathers never transmit the trait to their sons; unaffected daughters, however, may pass the trait to their sons (the grandsons of affected men). These features of inheritance are the hallmarks of a sex-linked trait, a major focus of this chapter. Although Darwin didn’t understand the mechanism of heredity, his attention to detail and remarkable ability to focus on crucial observations allowed him to identify the essential features of this genetic disease 25 years before Mendel’s principles of heredity became widely known.
Darwin claimed that the daughters of this Hindu family were never affected, but it’s now known that some women do have mild cases of anhidrotic ectodermal dysplasia. In these women, the symptoms of the disorder appear on only some parts of the body. For example, some regions of the jaw are missing teeth, whereas other regions have normal teeth.
There are irregular patches of skin having few or no sweat glands; the placement of these patches varies among affected women ( ◗ FIGURE 4.1). The patchy occurrence of these features is explained by the fact that the gene for anhidrotic ectodermal dysplasia is located on a sex chromosome.
◗ 4.1 Three generations of women heterozygous for anhidrotic ectodermal dysplasia, which is inherited as an X-linked recessive trait. In heterozygous females, there are irregular patches of skin having few or no sweat glands. The placement of these patches varies among affected women owing to random X-chromosome inactivation. [Figure labels: P generation, F1 generation, F2 generation, F3 generation; identical twins.] (After A. P. Mance and J. Mance, Genetics: Human Aspects, Sinauer, 1990, p. 133.)
www.whfreeman.com/pierce Additional information about anhidrotic ectodermal dysplasia, including symptoms, history, and genetics
In Chapter 3, we studied Mendel’s principles of segregation and independent assortment and saw how these principles explain much about the nature of inheritance. After Mendel’s principles were rediscovered in 1900, biologists began to conduct genetic studies on a wide array of different organisms. As they applied Mendel’s principles more widely, exceptions were observed, and it became necessary to devise extensions to his basic principles of heredity. In this chapter, we explore one of the major extensions to Mendel’s principles: the inheritance of characteristics encoded by genes located on the sex chromosomes, which differ in males and females ( ◗ FIGURE 4.2). These characteristics and the genes that produce them are referred to as sex linked. To understand the inheritance of sex-linked characteristics, we must first know how sex is determined: why some members of a species are male and others are female. Sex determination is the focus of the first part of the chapter. The second part examines how characteristics encoded by genes on the sex chromosomes are inherited. In Chapter 5, we will explore some additional ways in which sex and inheritance interact.
◗ 4.2 The sex chromosomes of males (Y) and females (X) are different. (Biophoto Associates/Photo Researchers.)
As we consider sex determination and sex-linked characteristics, it will be helpful to think about two important principles. First, there are several different mechanisms of sex determination and, ultimately, the mechanism of sex determination controls the inheritance of sex-linked characteristics. Second, like other pairs of chromosomes, the X and Y sex chromosomes may pair in the course of meiosis and segregate, but throughout most of their length they are not homologous (their gene sequences don’t code for the same characteristics): most genes on the X chromosome are different from genes on the Y chromosome. Consequently, males and females do not possess the same number of alleles at sex-linked loci. This difference in the number of sex-linked alleles produces the distinct patterns of inheritance in males and females.
Sex Determination
Sexual reproduction is the formation of offspring that are genetically distinct from their parents; most often, two parents contribute genes to their offspring. Among most eukaryotes, sexual reproduction consists of two processes that lead to an alternation of haploid and diploid cells: meiosis produces haploid gametes, and fertilization produces diploid zygotes ( ◗ FIGURE 4.3). The term sex refers to sexual phenotype. Most organisms have only two sexual phenotypes: male and female. The fundamental difference between males and females is gamete size: males produce small gametes; females produce relatively large gametes ( ◗ FIGURE 4.4). The mechanism by which sex is established is termed sex determination.
We define the sex of an individual in terms of the individual’s phenotype: ultimately, the type of gametes that it produces. Sometimes an individual has chromosomes or genes that are normally associated with one sex but a morphology corresponding to the opposite sex. For instance, the cells of female humans normally have two X chromosomes, and the cells of males have one X chromosome and one Y chromosome. A few rare persons have male anatomy, although their cells each contain two X chromosomes. Even though these people are genetically female, we refer to them as male because their sexual phenotype is male.
Concepts
In sexual reproduction, parents contribute genes to produce an offspring that is genetically distinct from both parents. In eukaryotes, sexual reproduction consists of meiosis, which produces haploid gametes, and fertilization, which produces a diploid zygote.
◗ 4.3 In most eukaryotic organisms, sexual reproduction consists of an alternation of haploid (1n) and diploid (2n) cells. (1) Meiosis produces haploid gametes. (2) Fertilization (fusion of gametes) produces a diploid zygote.
◗ 4.4 Male and female gametes (sperm and egg, respectively) differ in size. In this photograph, a human sperm (with flagellum) penetrates a human egg cell. (Francis Leroy, Biocosmos/Science Photo Library/Photo Researchers.)
There are many ways in which sex differences arise. In some species, both sexes are present in the same individual, a condition termed hermaphroditism; organisms that bear both male and female reproductive structures are said to be monoecious (meaning “one house”). Species in which an individual has either male or female reproductive structures are said to be dioecious (meaning “two houses”). Humans are dioecious. Among dioecious species, the sex of an individual may be determined chromosomally, genetically, or environmentally.
Chromosomal Sex-Determining Systems
The chromosome theory of inheritance (discussed in Chapter 3) states that genes are located on chromosomes, which serve as the vehicles for gene segregation in meiosis. Definitive proof of this theory was provided by the discovery that the sex of certain insects is determined by the presence or absence of particular chromosomes.
◗ 4.5 Inheritance of sex in organisms with X and Y chromosomes results in equal numbers of male and female offspring. [Figure: an XY male crossed with an XX female; X-bearing and Y-bearing sperm combine with X-bearing eggs to give XX females and XY males.] Conclusion: A 1:1 sex ratio is produced.
In 1891, Hermann Henking noticed a peculiar structure in the nuclei of cells from male insects. Understanding neither its function nor its relation to sex, he called this structure the X body. Later, Clarence E. McClung studied Henking’s X body in grasshoppers and recognized that it was a chromosome. McClung called it the accessory chromosome, but eventually it became known as the X chromosome, from Henking’s original designation. McClung observed that the cells of female grasshoppers had one more chromosome than the cells of male grasshoppers, and he concluded that accessory chromosomes played a role in sex determination. In 1905, Nettie Stevens and Edmund Wilson demonstrated that, in grasshoppers and other insects, the cells of females have two X chromosomes, whereas the cells of males have a single X. In some insects, they counted the same number of chromosomes in cells of males and females but saw that one chromosome pair was different: two X chromosomes were found in female cells, whereas a single X chromosome plus a smaller chromosome, which they called Y, was found in male cells. Stevens and Wilson also showed that the X and Y chromosomes separate into different cells in sperm formation; half of the sperm receive an X chromosome and half receive a Y.
All egg cells produced by the female in meiosis receive one X chromosome. A sperm containing a Y chromosome unites with an X-bearing egg to produce an XY male, whereas a sperm containing an X chromosome unites with an X-bearing egg to produce an XX female ( ◗ FIGURE 4.5). This accounts for the 50:50 sex ratio observed in most dioecious organisms. Because sex is inherited like other genetically determined characteristics, Stevens and Wilson’s discovery that sex was associated with the inheritance of a particular chromosome also demonstrated that genes are on chromosomes. As Stevens and Wilson found for insects, sex is frequently determined by a pair of chromosomes, the sex chromosomes, which differ between males and females. The nonsex chromosomes, which are the same for males and females, are called autosomes. We think of sex in these organisms as being determined by the presence of the sex chromosomes, but in fact the individual genes located on the sex chromosomes are usually responsible for the sexual phenotypes. XX-XO sex determination The mechanism of sex determination in the grasshoppers studied by McClung is one of the simplest mechanisms of chromosomal sex determination and is called the XX-XO system. In this system, females have two X chromosomes (XX), and males possess a single X chromosome (XO). There is no O chromosome; the letter O signifies the absence of a sex chromosome. In meiosis in females, the two X chromosomes pair and then separate, with one X chromosome entering each haploid egg. In males, the single X chromosome segregates in meiosis to half the sperm cells — the other half receive no sex chromosome. Because males produce two different types of gametes with respect to the sex chromosomes, they are said to be the heterogametic sex. Females, which produce gametes that are all the same with respect to the sex chromosomes, are the homogametic sex. 
In the XX-XO system, the sex of an individual is therefore determined by which type of male gamete fertilizes the egg. X-bearing sperm unite with X-bearing eggs to produce XX zygotes, which eventually develop as females. Sperm lacking an X chromosome unite with X-bearing eggs to produce XO zygotes, which develop into males.
XX-XY sex determination In many species, the cells of males and females have the same number of chromosomes, but the cells of females have two X chromosomes (XX) and the cells of males have a single X chromosome and a smaller sex chromosome called the Y chromosome (XY). In humans and many other organisms, the Y chromosome is acrocentric ( ◗ FIGURE 4.6), not Y shaped as is commonly assumed. In this type of sex-determining system, the male is the heterogametic sex: half of his gametes have an X chromosome and half have a Y chromosome. The female is the homogametic sex: all her egg cells contain a single X chromosome. Many organisms, including some plants, insects, and reptiles, and all mammals (including humans), have the XX-XY sex-determining system.
◗ 4.6 The X and Y chromosomes in humans differ in size and genetic content. They are homologous only at the pseudoautosomal regions, which are essential for X–Y chromosome pairing in meiosis in the male. [Figure labels: primary pseudoautosomal region, secondary pseudoautosomal region, short arms, centromere, long arms, Y chromosome, X chromosome.]
Although the X and Y chromosomes are not generally homologous, they do pair and segregate into different cells in meiosis. They can pair because these chromosomes are homologous at small regions called the pseudoautosomal regions (see Figure 4.6), in which they carry the same genes. Genes found in these regions will display the same pattern of inheritance as that of genes located on autosomal chromosomes. In humans, there are pseudoautosomal regions at both tips of the X and Y chromosomes.
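The 1:1 sex ratio described for the XX-XO and XX-XY systems follows directly from combining every type of egg with every type of sperm. The short Python sketch below makes that bookkeeping explicit. It is an illustration only: the function names are hypothetical, each parent is written as a two-letter string, and the letter O stands for the absence of a sex chromosome, as in the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def gametes(sex_chromosomes):
    """Return the sex-chromosome content of each gamete type.

    Each parent is written as two letters (for example "XX", "XY", "XO");
    'O' marks the absence of a sex chromosome, as in the XX-XO system.
    """
    return list(sex_chromosomes)

def offspring_ratios(mother, father):
    """Combine every egg with every sperm and return zygote proportions."""
    counts = Counter(
        # Sort the letters so that, e.g., "YX" and "XY" count as the same zygote.
        # Only the letters X, Y, and O are handled in this sketch.
        "".join(sorted(egg + sperm, key="XYO".index))
        for egg, sperm in product(gametes(mother), gametes(father))
    )
    total = sum(counts.values())
    return {zygote: Fraction(n, total) for zygote, n in counts.items()}

xx_xy = offspring_ratios("XX", "XY")  # half XX females, half XY males
xx_xo = offspring_ratios("XX", "XO")  # half XX females, half XO males
print(xx_xy)
print(xx_xo)
```

In both crosses the heterogametic male produces two kinds of sperm in equal numbers, so half of the zygotes are XX and half are XY (or XO).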
ZZ-ZW sex determination In this system, the female is heterogametic and the male is homogametic. To prevent confusion with the XX-XY system, the sex chromosomes in this system are labeled Z and W, but the chromosomes do not resemble Zs and Ws. Females in this system are ZW; after meiosis, half of the eggs have a Z chromosome and the other half have a W. Males are ZZ; all sperm contain a single Z chromosome. The ZZ-ZW system is found in birds, moths, some amphibians, and some fishes.
Concepts
In XX-XO sex determination, the male is XO and heterogametic, and the female is XX and homogametic. In XX-XY sex determination, the male is XY and the female is XX; in this system the male is heterogametic. In ZZ-ZW sex determination, the female is ZW and the male is ZZ; in this system the female is the heterogametic sex.
Haplodiploidy Some insects in the order Hymenoptera (bees, wasps, and ants) have no sex chromosomes; instead, sex is based on the number of chromosome sets found in the nucleus of each cell. Males develop from unfertilized eggs, and females develop from fertilized eggs. The cells of male hymenopterans possess only a single set of chromosomes (they are haploid) inherited from the mother. In contrast, the cells of females possess two sets of chromosomes (they are diploid), one set inherited from the mother and the other set from the father ( ◗ FIGURE 4.7).
◗ 4.7 In insects with haplodiploidy, males develop from unfertilized eggs and are haploid; females develop from fertilized eggs and are diploid. [Figure: a haploid (n) male and a diploid (2n) female; unfertilized n eggs develop into males, and eggs fertilized by n sperm form 2n zygotes that develop into females.] Conclusion: In haplodiploidy, sex is determined by the number of chromosome sets (n or 2n).
The haplodiploid method of sex determination produces some odd genetic relationships. When both parents are diploid, siblings on average have half their genes in common because they have a 50% chance of receiving the same allele from each parent. In these insects, males produce sperm by mitosis (they are already haploid); so all offspring receive the same set of paternal genes. The diploid females produce eggs by normal meiosis. Therefore, sisters have a 50% chance of receiving the same allele from their mother and a 100% chance of receiving the same allele from their father; the average relatedness between sisters is therefore 75%. Brothers have a 50% chance of receiving the same copy of each of their mother’s two alleles at any particular locus; so their average relatedness is only 50%. The greater genetic relatedness among female siblings in insects with haplodiploid sex determination may contribute to the high degree of social cooperation that exists among females (the workers) of these insects.
Concepts
Some insects possess haplodiploid sex determination, in which males develop from unfertilized eggs and are haploid; females develop from fertilized eggs and are diploid.
Genic Sex-Determining Systems
In some plants and protozoans, sex is genetically determined, but there are no obvious differences in the chromosomes of males and females: there are no sex chromosomes. These organisms have genic sex determination; genotypes at one or more loci determine the sex of an individual. It is important to understand that, even in chromosomal sex-determining systems, sex is actually determined by individual genes. For example, in mammals, a gene (SRY, discussed later in this chapter) located on the Y chromosome determines the male phenotype. In both genic sex determination and chromosomal sex determination, sex is controlled by individual genes; the difference is that, with chromosomal sex determination, the chromosomes that carry those genes appear different in males and females.
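Returning briefly to haplodiploidy, the relatedness figures quoted earlier (50% for ordinary diploid siblings, 75% for haplodiploid sisters, 50% for haplodiploid brothers) can be checked with a few lines of arithmetic. The Python sketch below is a minimal illustration, assuming that relatedness is simply the weighted average of the chance of sharing the maternal allele and the chance of sharing the paternal allele; the function and parameter names are hypothetical.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def relatedness(maternal_share, paternal_share, maternal_weight=HALF):
    """Average fraction of alleles that two siblings have in common.

    maternal_share / paternal_share: chance that both siblings received the
        same allele from that parent at a given locus.
    maternal_weight: fraction of each sibling's genome inherited from the
        mother (1/2 for diploid offspring, 1 for haploid males).
    """
    paternal_weight = 1 - maternal_weight
    return maternal_weight * maternal_share + paternal_weight * paternal_share

# Both parents diploid: 50% chance of sharing the allele from each parent.
diploid_siblings = relatedness(HALF, HALF)

# Haplodiploid sisters: 50% from the mother, 100% from the haploid father.
haplodiploid_sisters = relatedness(HALF, Fraction(1))

# Haplodiploid brothers: all genes come from the mother; there is no father.
haplodiploid_brothers = relatedness(HALF, Fraction(0), maternal_weight=Fraction(1))

print(diploid_siblings, haplodiploid_sisters, haplodiploid_brothers)
```

The sisters' value is 1/2 × 1/2 + 1/2 × 1 = 3/4, which matches the 75% given in the text.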
Environmental Sex Determination
Genes have had a role in all of the examples of sex determination discussed thus far, but sex is determined fully or in part by environmental factors in a number of organisms. One fascinating example of environmental sex determination is seen in the marine mollusk Crepidula fornicata, also known as the common slipper limpet ( ◗ FIGURE 4.8). Slipper limpets live in stacks, one on top of another. Each limpet begins life as a swimming larva. The first larva to settle on a solid, unoccupied substrate develops into a female limpet. It then produces chemicals that attract other larvae, which settle on top of it. These larvae develop into males, which then serve as mates for the limpet below. After a period of time, the males on top develop into females and, in turn, attract additional larvae that settle on top of the stack, develop into males, and serve as mates for the limpets under them. Limpets can form stacks of a dozen or more animals; the uppermost animals are always male. This type of sexual development is called sequential hermaphroditism; each individual animal can be both male and female, although not at the same time. In Crepidula fornicata, sex is determined environmentally by the limpet’s position in the stack.
◗ 4.8 In Crepidula fornicata, the common slipper limpet, sex is determined by an environmental factor, the limpet’s position in a stack of limpets. (1) A larva that settles on an unoccupied substrate develops into a female, which produces chemicals that attract other larvae. (2) The larvae attracted by the female settle on top of her and develop into males, which become mates for the original female. (3) Eventually the males on top switch sex, developing into females. (4) They then attract additional larvae, which settle on top of the stack and develop into males.
Environmental factors are also important in determining sex in many reptiles. Although most snakes and lizards have sex chromosomes, in many turtles, crocodiles, and alligators, temperature during embryonic development determines sexual phenotype. In turtles, for example, warm temperatures produce females during certain times of the year, whereas cool temperatures produce males. In alligators, the reverse is true.
Concepts
In genic sex determination, sex is determined by genes at one or more loci, but there are no obvious differences in the chromosomes of males and females. In environmental sex determination, sex is determined fully or in part by environmental factors.
Sex Determination in Drosophila
The fruit fly Drosophila melanogaster has eight chromosomes: three pairs of autosomes and one pair of sex chromosomes ( ◗ FIGURE 4.9). Normally, females have two X chromosomes and males have an X chromosome and a Y chromosome. However, the presence of the Y chromosome does not determine maleness in Drosophila; instead, each fly’s sex is determined by a balance between genes on the autosomes and genes on the X chromosome. This type of sex determination is called the genic balance system. In this system, a number of genes seem to influence sexual development. The X chromosome contains genes with female-producing effects, whereas the autosomes contain genes with male-producing effects. Consequently, a fly’s sex is determined by the X:A ratio, the number of X chromosomes divided by the number of haploid sets of autosomal chromosomes.
◗ 4.9 The chromosomes of Drosophila melanogaster (2n = 8) consist of three pairs of autosomes (labelled I, II, and III) and one pair of sex chromosomes (labelled X and Y).
An X:A ratio of 1.0 produces a female fly; an X:A ratio of 0.5 produces a male. If the X:A ratio is less than 0.5, a male phenotype is produced, but the fly is weak and sterile; such flies are sometimes called metamales. An X:A ratio between 1.0 and 0.50 produces an intersex fly, with a mixture of male and female characteristics. If the X:A ratio is greater than 1.0, a female phenotype is produced, but these flies (called metafemales) have serious developmental problems and many never emerge from the pupal case. Table 4.1 presents some different chromosome complements in Drosophila and their associated sexual phenotypes. Flies with two sets of autosomes and XXY sex chromosomes (an X:A ratio of 1.0) develop as fully fertile females, in spite of the presence of a Y chromosome. Flies with only a single X (an X:A ratio of 0.5) develop as males, although they are sterile. These observations confirm that the Y chromosome does not determine sex in Drosophila.
Mutations in genes that affect sexual phenotype in Drosophila have been isolated. For example, the transformer mutation converts a female with an X:A ratio of 1.0 into a phenotypic male, whereas the doublesex mutation transforms normal males and females into flies with intersex phenotypes. Environmental factors, such as the temperature of the rearing conditions, also can affect the development of sexual characteristics.
Concepts
The sexual phenotype of a fruit fly is determined by the ratio of the number of X chromosomes to the number of haploid sets of autosomal chromosomes (the X:A ratio).
www.whfreeman.com/pierce Links to many Internet resources on the genetics of Drosophila melanogaster
Sex Determination in Humans
Humans, like Drosophila, have XX-XY sex determination, but in humans the presence of a gene on the Y chromosome determines maleness. The phenotypes that result from abnormal numbers of sex chromosomes, which arise when the sex chromosomes do not segregate properly in meiosis or mitosis, illustrate the importance of the Y chromosome in human sex determination.
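The contrast between the two rules can be made concrete with a small Python sketch. It encodes the X:A thresholds stated for Drosophila and, for humans, a deliberately simplified rule in which the presence of a Y chromosome (standing in for its male-determining gene) gives a male phenotype. The function names are hypothetical, and the sketch ignores fertility, viability, and the mutations mentioned in the text.

```python
from fractions import Fraction

def drosophila_sex(n_x, autosome_sets):
    """Classify a fly by its X:A ratio, using the thresholds in the text."""
    ratio = Fraction(n_x, autosome_sets)
    if ratio > 1:
        return "metafemale"
    if ratio == 1:
        return "female"
    if ratio > Fraction(1, 2):
        return "intersex"
    if ratio == Fraction(1, 2):
        return "male"
    return "metamale"

def human_sex(sex_chromosomes):
    """Simplified human rule: a Y chromosome gives a male phenotype.

    Assumes the Y carries a functional male-determining gene.
    """
    return "male" if "Y" in sex_chromosomes else "female"

# Compare the two rules for complements with two haploid sets of autosomes.
# "XO" denotes a single X chromosome with no second sex chromosome.
for complement in ("XX", "XY", "XXY", "XO"):
    n_x = complement.count("X")
    print(complement, drosophila_sex(n_x, 2), human_sex(complement))
```

The two rules agree for XX and XY but diverge for XXY (female in Drosophila, male in humans) and XO (male in Drosophila, female in humans), which is why abnormal sex-chromosome numbers are so informative.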
Table 4.1 Chromosome complements and sexual phenotypes in Drosophila

Sex-Chromosome   Haploid Sets    X:A     Sexual
Complement       of Autosomes    Ratio   Phenotype
XX               AA              1.0     Female
XY               AA              0.5     Male
XO               AA              0.5     Male
XXY              AA              1.0     Female
XXX              AA              1.5     Metafemale
XXXY             AA              1.5     Metafemale
XX               AAA             0.67    Intersex
XO               AAA             0.33    Metamale
XXXX             AAA             1.3     Metafemale

Turner syndrome Persons who have Turner syndrome are female; they do not undergo puberty and their female secondary sex characteristics remain immature: menstruation is usually absent, breast development is slight, and pubic hair is sparse. This syndrome is seen in 1 of 3000 female births. Affected women are frequently short and have a low hairline, a relatively broad chest, and folds of skin on the neck ( ◗ FIGURE 4.10). Their intelligence is usually normal. Most women who have Turner syndrome are sterile. In 1959, C. E. Ford used new techniques to study human chromosomes and discovered that cells from a 14-year-old girl with Turner syndrome had only a single X chromosome; this chromosome complement is usually referred to as XO. There are no known cases in which a person is missing both X chromosomes, an indication that at least one X chromosome is necessary for human development. Presumably, embryos missing both Xs are spontaneously aborted in the early stages of development.
◗ 4.10 Persons with Turner syndrome have a single X chromosome in their cells. (a) Characteristic physical features. (b) Chromosomes from a person with Turner syndrome. (Part a, courtesy of Dr. Daniel C. Postellon, Devos Children’s Hospital; Part b, Dept. of Clinical Cytogenetics, Addenbrookes Hospital/Science Photo Library/Photo Researchers.)
Klinefelter syndrome Persons who have Klinefelter syndrome, which occurs with a frequency of about 1 in 1000 male births, have cells with one or more Y chromosomes and multiple X chromosomes.
The cells of most males having this condition are XXY, but cells of a few Klinefelter males are XXXY, XXXXY, or XXYY. Persons with this condition, though male, frequently have small testes, some breast enlargement, and reduced facial and pubic hair ( ◗ FIGURE 4.11). They are often taller than normal and sterile; most have normal intelligence.

◗ 4.11 Persons with Klinefelter syndrome have a Y chromosome and two or more X chromosomes in their cells. (a) Characteristic physical features. (b) Chromosomes of a person with Klinefelter syndrome. (Part a, to come; part b, Biophoto Associates/Science Source/Photo Researchers.)

Poly-X females
In about 1 in 1000 female births, the child’s cells possess three X chromosomes, a condition often referred to as triplo-X syndrome. These persons have no distinctive features other than a tendency to be tall and thin. Although a few are sterile, many menstruate regularly and are fertile. The incidence of mental retardation among triple-X females is slightly greater than in the general population, but most XXX females have normal intelligence. Much rarer are women whose cells contain four or five X chromosomes. These women usually have normal female anatomy but are mentally retarded and have a number of physical problems. The severity of mental retardation increases as the number of X chromosomes increases beyond three.

www.whfreeman.com/pierce Further information about sex-chromosomal abnormalities in humans

The role of sex chromosomes
The phenotypes associated with sex-chromosome anomalies allow us to make several inferences about the role of sex chromosomes in human sex determination.
1. The X chromosome contains genetic information essential for both sexes; at least one copy of an X chromosome is required for human development.
2. The male-determining gene is located on the Y chromosome. A single copy of this chromosome, even in the presence of several X chromosomes, produces a male phenotype.
3. The absence of the Y chromosome results in a female phenotype.
4. Genes affecting fertility are located on the X and Y chromosomes. A female usually needs at least two copies of the X chromosome to be fertile.
5. Additional copies of the X chromosome may upset normal development in both males and females, producing physical and mental problems that increase as the number of extra X chromosomes increases.

The male-determining gene in humans
The Y chromosome in humans and all other mammals is of paramount importance in producing a male phenotype. However, scientists discovered a few rare XX males whose cells apparently lack a Y chromosome. For many years, these males presented a real enigma: How could a male phenotype exist without a Y chromosome? Close examination eventually revealed a small part of the Y chromosome attached to another chromosome. This finding indicates that it is not the entire Y chromosome that determines maleness in humans; rather, it is a gene on the Y chromosome.

Early in development, all humans possess undifferentiated gonads and both male and female reproductive ducts. Then, about 6 weeks after fertilization, a gene on the Y chromosome becomes active. By an unknown mechanism, this gene causes the neutral gonads to develop into testes, which begin to secrete two hormones: testosterone and Mullerian-inhibiting substance. Testosterone induces the development of male characteristics, and Mullerian-inhibiting substance causes the degeneration of the female reproductive ducts. In the absence of this male-determining gene, the neutral gonads become ovaries, and female features develop.

In 1987, David Page and his colleagues at the Massachusetts Institute of Technology located what appeared to be the male-determining gene near the tip of the short arm of the Y chromosome. They had examined the DNA of several XX males and XY females.
The cells of one XX male that they studied possessed a very small piece of a Y chromosome attached to one of the Xs. This piece came from a section, called 1A, of the Y chromosome. Because this person had a male phenotype, they reasoned that the male-determining gene must reside within the 1A section of the Y chromosome. Examination of the Y chromosome of a 12-year-old XY girl seemed to verify this conclusion. Although she possessed more than 99.8% of a Y chromosome, this XY person had a female phenotype. Page and his colleagues assumed that the male-determining gene must reside within the 0.2% of the Y chromosome that she was missing. Further examination showed that this Y chromosome was indeed missing part of section 1A. They then sequenced the DNA within section 1A of normal males and found a gene called ZFY, which appeared to be the testis-determining factor.

Within a few months, however, results from other laboratories suggested that ZFY might not in fact be the male-determining gene. Marsupials (pouched mammals), which also have XX-XY sex determination, were found to possess a ZFY gene on an autosomal chromosome, not on the Y chromosome. Furthermore, several human XX males were found who did not possess a copy of the ZFY gene.

A new candidate for the male-determining gene, called the sex-determining region Y (SRY) gene, was discovered in 1990 ( ◗ FIGURE 4.12). This gene is found in XX males and is missing from all XY females; it is also found on the Y chromosome of all mammals examined to date. Definitive proof that SRY is the male-determining gene came when scientists placed a copy of this gene into XX mice by means of genetic engineering. The XX mice that received this gene, although sterile, developed into anatomical males. The SRY gene encodes a protein that binds to DNA and causes a sharp bend in the molecule.
This alteration of DNA structure may affect the expression of other genes that encode testis formation. Although SRY is the primary determinant of maleness in humans, other genes (some X linked, others Y linked, and still others autosomal) also play a role in fertility and the development of sex differences.

◗ 4.12 The SRY gene is on the Y chromosome and causes the development of male characteristics. (Diagram of the Y chromosome showing the short arm, centromere, and long arm, with the sex-determining region Y (SRY) gene marked. This gene is Y linked because it is found only on the Y chromosome.)

Concepts
The presence of the SRY gene on the Y chromosome causes a human embryo to develop as a male. In the absence of this gene, a human embryo develops as a female.

www.whfreeman.com/pierce Additional information on the SRY gene

Androgen-insensitivity syndrome
Several genes besides SRY influence sexual development in humans, as illustrated by women with androgen-insensitivity syndrome. These persons have female external sexual characteristics and psychological orientation. Indeed, most are unaware of their condition until they reach puberty and fail to menstruate. Examination by a gynecologist reveals that the vagina ends blindly and that the uterus, oviducts, and ovaries are absent. Inside the abdominal cavity lies a pair of testes, which produce levels of testosterone normally seen in males. The cells of a woman with androgen-insensitivity syndrome contain an X and a Y chromosome.

How can a person be female in appearance when her cells contain a Y chromosome and she has testes that produce testosterone? The answer lies in the complex relation between genes and sex in humans. In a human embryo with a Y chromosome, the SRY gene causes the gonads to develop into testes, which produce testosterone. Testosterone stimulates embryonic tissues to develop male characteristics. But, for testosterone to have its effects, it must bind to an androgen receptor. This receptor is defective in females with androgen-insensitivity syndrome; consequently, their cells are insensitive to testosterone, and female characteristics develop. The gene for the androgen receptor is located on the X chromosome; so persons with this condition always inherit it from their mothers. (All XY persons inherit the X chromosome from their mothers.)

Androgen-insensitivity syndrome illustrates several important points about the influence of genes on a person’s sex. First, this condition demonstrates that human sexual development is a complex process, influenced not only by the SRY gene on the Y chromosome, but also by other genes found elsewhere. Second, it shows that most people carry genes for both male and female characteristics, as illustrated by the fact that those with androgen-insensitivity syndrome have the capacity to produce female characteristics, even though they have male chromosomes. Indeed, the genes for most male and female secondary sex characteristics are present not on the sex chromosomes but on autosomes. The key to maleness and femaleness lies not in the genes but in the control of their expression.

www.whfreeman.com/pierce Additional information on androgen-insensitivity syndrome

Sex-Linked Characteristics
Sex-linked characteristics are determined by genes located on the sex chromosomes. Genes on the X chromosome determine X-linked characteristics; those on the Y chromosome determine Y-linked characteristics. Because little genetic information exists on the Y chromosome in many organisms, most sex-linked characteristics are X linked. Males and females differ in their sex chromosomes; so the pattern of inheritance for sex-linked characteristics differs from that exhibited by genes located on autosomal chromosomes.

X-Linked White Eyes in Drosophila
The first person to explain sex-linked inheritance was the American biologist Thomas Hunt Morgan ( ◗ FIGURE 4.13a).
Morgan began his career as an embryologist, but the discovery of Mendel’s principles inspired him to begin conducting genetic experiments, initially on mice and rats. In 1909, Morgan switched to Drosophila melanogaster; a year later, he discovered among the flies of his laboratory colony a single male that possessed white eyes, in stark contrast with the red eyes of normal fruit flies. This fly had a tremendous effect on the future of genetics and on Morgan’s career as a biologist. With his white-eyed male, Morgan unraveled the mechanism of X-linked inheritance, ushering in the “golden age” of Drosophila genetics that lasted from 1910 until 1930.

Morgan’s laboratory, located on the top floor of Schermerhorn Hall at Columbia University, became known as the Fly Room ( ◗ FIGURE 4.13b). To say that the Fly Room was unimpressive is an understatement. The cramped room, only about 16 × 23 feet, was filled with eight desks, each occupied by a student and his experiments. The primitive laboratory equipment consisted of little more than milk bottles for rearing the flies and hand-held lenses for observing their traits. Later, microscopes replaced the hand-held lenses, and crude incubators were added to maintain the fly cultures, but even these additions did little to increase the physical sophistication of the laboratory. Morgan and his students were not tidy: cockroaches were abundant (living off spilled Drosophila food), dirty milk bottles filled the sink, ripe bananas (food for the flies) hung from the ceiling, and escaped fruit flies hovered everywhere.

◗ 4.13 Thomas Hunt Morgan’s work with Drosophila helped unravel many basic principles in genetics, including X-linked inheritance. (a) Morgan. (b) The Fly Room, where Morgan and his students conducted genetic research. (Part a, World Wide Photos; Part b, American Philosophical Society.)
In spite of its physical limitations, the Fly Room was the source of some of the most important research in the history of biology. There was daily excitement among the students, some of whom initially came to the laboratory as undergraduates. The close quarters facilitated informality and the free flow of ideas. Morgan and the Fly Room illustrate the tremendous importance of “atmosphere” in producing good science.

To explain the inheritance of the white-eyed characteristic in fruit flies, Morgan systematically carried out a series of genetic crosses ( ◗ FIGURE 4.14a). First, he crossed pure-breeding, red-eyed females with his white-eyed male, producing F1 progeny that all had red eyes. (In fact, Morgan found three white-eyed males among the 1237 progeny, but he assumed that the white eyes were due to new mutations.) Morgan’s results from this initial cross were consistent with Mendel’s principles: a cross between a homozygous dominant individual and a homozygous recessive individual produces heterozygous offspring exhibiting the dominant trait. His results suggested that white eyes were a simple recessive trait. However, when Morgan crossed the F1 flies with one another, he found that all the female F2 flies possessed red eyes but that half the male F2 flies had red eyes and the other half had white eyes. This finding was clearly not the expected result for a simple recessive trait, which should appear in 1/4 of both male and female F2 offspring.

To explain this unexpected result, Morgan proposed that the locus affecting eye color was on the X chromosome (that eye color was X linked). He recognized that the eye-color alleles were present only on the X chromosome; no homologous allele was present on the Y chromosome. Because the cells of females possess two X chromosomes, females could be homozygous or heterozygous for the eye-color alleles. The cells of males, on the other hand, possess only a single X chromosome and can carry only a single eye-color allele.
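Morgan’s X-linked hypothesis can be checked by enumerating gametes. The following Python is a minimal sketch under stated assumptions, not a reconstruction of Morgan’s own bookkeeping: the function name `x_linked_cross`, the allele labels "+" (red, dominant) and "w" (white, recessive), and the tuple encoding of genotypes are all introduced here for illustration. Each egg carries one of the mother’s two X chromosomes, and each sperm carries either the father’s X or his Y.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def x_linked_cross(mother, father_x, recessive="w"):
    """Expected offspring classes for an X-linked locus.

    mother   -- tuple holding the allele on each of her two X chromosomes
    father_x -- the allele on the father's single X chromosome
    Returns a dict mapping (sex, eye color) to its expected fraction.
    """
    eggs = list(mother)        # every egg receives one maternal X
    sperm = [father_x, "Y"]    # half the sperm carry the X, half the Y
    counts = Counter()
    for egg, contribution in product(eggs, sperm):
        if contribution == "Y":
            sex, alleles = "male", (egg,)              # hemizygous
        else:
            sex, alleles = "female", (egg, contribution)
        # The recessive phenotype shows only if every X carries the allele.
        eyes = "white" if all(a == recessive for a in alleles) else "red"
        counts[(sex, eyes)] += 1
    total = sum(counts.values())
    return {key: Fraction(n, total) for key, n in counts.items()}


if __name__ == "__main__":
    print(x_linked_cross(("+", "+"), "w"))   # original cross
    print(x_linked_cross(("+", "w"), "+"))   # F1 female x F1 male
```

Under these assumptions, the original cross gives only red-eyed progeny, and the F1 × F1 cross gives 1/2 red-eyed females, 1/4 red-eyed males, and 1/4 white-eyed males, which matches the pattern Morgan observed: white eyes in half of the F2 males and in none of the F2 females.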
Because males carry only one X chromosome, they cannot be either homozygous or heterozygous but are said to be hemizygous for X-linked loci.

To verify his hypothesis that the white-eye trait is X linked, Morgan conducted additional crosses. He predicted that a cross between a white-eyed female and a red-eyed male would produce all red-eyed females and all white-eyed males ( ◗ FIGURE 4.14b). When Morgan performed this cross, the results were exactly as predicted. Note that this cross is the reciprocal of the original cross and that the two reciprocal crosses produced different results in the F1 and F2 generations. Morgan also crossed the F1 heterozygous females with their white-eyed father, the red-eyed F2 females with white-eyed males, and white-eyed females with white-eyed males. In all of these crosses, the results were consistent with Morgan’s conclusion that white eyes is an X-linked characteristic.

www.whfreeman.com/pierce More information on the life of Thomas Hunt Morgan

◗ 4.14 Morgan’s X-linked crosses for white eyes in fruit flies. (a) Original and F1 crosses: a red-eyed female (X+X+) crossed with a white-eyed male (XwY) gives red-eyed F1 females (X+Xw) and red-eyed F1 males (X+Y). Conclusion for the F2: 1/2 red-eyed females, 1/4 red-eyed males, 1/4 white-eyed males. (b) Reciprocal crosses: a white-eyed female (XwXw) crossed with a red-eyed male (X+Y) gives red-eyed F1 females (X+Xw) and white-eyed F1 males (XwY). Conclusion for the F2: 1/4 red-eyed females, 1/4 white-eyed females, 1/4 red-eyed males, 1/4 white-eyed males.

Nondisjunction and the Chromosome Theory of Inheritance
When Morgan crossed his original white-eyed male with homozygous red-eyed females, all 1237 of the progeny had red eyes, except for three white-eyed males. As already mentioned, Morgan attributed these white-eyed F1 males to the occurrence of further mutations. However, flies with these unexpected phenotypes continued to appear in his crosses. Although uncommon, they appeared far too often to be due to mutation. Calvin Bridges, one of Morgan’s students, set out to investigate the genetic basis of these exceptions.

Bridges found that, when he crossed a white-eyed female (XwXw) with a red-eyed male (X+Y), about 2.5% of the male offspring had red eyes and about 2.5% of the female offspring had white eyes ( ◗ FIGURE 4.15a). In this cross, every male fly should inherit its mother’s X chromosome and should be XwY with white eyes. Every female fly should inherit a dominant red-eye allele on its father’s X chromosome, along with a white-eye allele on its mother’s X chromosome; thus, all the female progeny should be X+Xw and have red eyes. The appearance of red-eyed males and white-eyed females in this cross was therefore unexpected.

To explain this result, Bridges hypothesized that, occasionally, the two X chromosomes in females fail to separate during anaphase I of meiosis. Bridges termed this failure of chromosomes to separate nondisjunction. When nondisjunction occurs, some of the eggs receive two copies of the X chromosome and others do not receive an X chromosome ( ◗ FIGURE 4.15b). If these eggs are fertilized by sperm from a red-eyed male, four combinations of sex chromosomes are produced. When an egg carrying two X chromosomes is fertilized by a Y-bearing sperm, the resulting zygote is XwXwY. Sex in Drosophila is determined by the X:A ratio (see Table 4.1); in this case the X:A ratio is 1.0, so the XwXwY zygote develops into a white-eyed female.
An egg with two X chromosomes that is fertilized by an X-bearing sperm produces XwXwX+, which usually dies. An egg with no X chromosome that is fertilized by an X-bearing sperm produces X+O, which develops into a red-eyed male. If the egg with no X chromosome is fertilized by a Y-bearing sperm, the resulting zygote with only a Y chromosome and no X chromosome dies. Rare nondisjunction of the X chromosomes among white-eyed females therefore produces a few red-eyed males and white-eyed females, which is exactly what Bridges found in his crosses.

Bridges’s hypothesis predicted that the white-eyed females would possess two X chromosomes and one Y and that red-eyed males would possess a single X chromosome. To verify his hypothesis, Bridges examined the chromosomes of his flies and found precisely what he predicted. The significance of Bridges’s study was not that it explained the appearance of an occasional odd fly in his culture but that he was able to predict a fly’s chromosomal makeup on the basis of its eye-color genotype. This association between genotype and chromosomes gave unequivocal evidence that sex-linked genes were located on the X chromosome and confirmed the chromosome theory of inheritance.

◗ 4.15 Bridges conducted experiments that proved that the gene for white eyes is located on the X chromosome. (a) A white-eyed female was crossed with a red-eyed male. Conclusion: normal separation of chromosomes results in 1/2 red-eyed females and 1/2 white-eyed males. (b) Rare nondisjunction produced a few eggs with two copies of the Xw chromosome and other eggs with no X chromosome. Conclusion: nondisjunction results in white-eyed females (XwXwY, female because their X:A ratio is 1.0) and red-eyed males (X+O, male because their X:A ratio is 0.5); XwXwX+ metafemales and zygotes with a Y but no X die.

Concepts
By showing that the appearance of rare phenotypes was associated with the inheritance of particular chromosomes, Bridges proved that sex-linked genes are located on the X chromosome and that the chromosome theory of inheritance is correct.

X-Linked Color Blindness in Humans
To further examine X-linked inheritance, let’s consider another X-linked characteristic: red-green color blindness in humans. Within the human eye, color is perceived in light-sensing cone cells that line the retina. Each cone cell contains one of three pigments capable of absorbing light of a particular wavelength; one absorbs blue light, a second absorbs red light, and a third absorbs green light. The human eye actually detects only three colors (red, green, and blue), but the brain mixes the signals from different cone cells to create the wide spectrum of colors that we perceive. Each of the three pigments is encoded by a separate locus; the locus for the blue pigment is found on chromosome 7, and those for green and red pigments lie close together on the X chromosome.

The most common types of human color blindness are caused by defects of the red and green pigments; we will refer to these conditions as red-green color blindness. Mutations that produce defective color vision are generally recessive and, because the genes coding for the red and green pigments are located on the X chromosome, red-green color blindness is inherited as an X-linked recessive characteristic.
We will use the symbol Xc to represent an allele for red-green color blindness and the symbol X+ to represent an allele for normal color vision. Females possess two X chromosomes; so there are three possible genotypes among females: X+X+ and X+Xc, which produce normal vision, and XcXc, which produces color blindness. Males have only a single X chromosome and two possible genotypes: X+Y, which produces normal vision, and XcY, which produces color blindness.

If a color-blind man mates with a woman homozygous for normal color vision ( ◗ FIGURE 4.16a), all of the gametes produced by the woman will contain an allele for normal color vision. Half of the man’s gametes will receive the X chromosome with the color-blind allele, and the other half will receive the Y chromosome, which carries no alleles affecting color vision. When an Xc-bearing sperm unites with the X+-bearing egg, a heterozygous female with normal vision (X+Xc) is produced. When a Y-bearing sperm unites with the X+-bearing egg, a hemizygous male with normal vision (X+Y) is produced (see Figure 4.16a).

In the reciprocal cross between a color-blind woman and a man with normal color vision ( ◗ FIGURE 4.16b), the woman produces only Xc-bearing gametes. The man produces some gametes that contain the X chromosome and others that contain the Y chromosome. Males inherit the X chromosome from their mothers; because both of the mother’s X chromosomes bear the Xc allele in this case, all the male offspring will be color blind. In contrast, females inherit an X chromosome from both parents; thus the female offspring of this reciprocal cross will all be heterozygous with normal vision. Females are color blind only when color-blind alleles have been inherited from both parents, whereas a color-blind male need inherit a color-blind allele from his mother only; for this reason, color blindness and most other rare X-linked recessive characteristics are more common in males.

In these crosses for color blindness, notice that an affected woman passes the X-linked recessive trait to her sons but not to her daughters, whereas an affected man passes the trait to his grandsons through his daughters but never to his sons. X-linked recessive characteristics seem to alternate between the sexes, appearing in females one generation and in males the next generation; thus, this pattern of inheritance exhibited by X-linked recessive characteristics is sometimes called crisscross inheritance.

◗ 4.16 Red-green color blindness is inherited as an X-linked recessive trait in humans. (a) Normal female and color-blind male: a normal-color-vision female (X+X+) crossed with a color-blind male (XcY) gives normal-color-vision females (X+Xc) and normal-color-vision males (X+Y). Conclusion: Both males and females have normal color vision. (b) Reciprocal cross: a color-blind female (XcXc) crossed with a normal-color-vision male (X+Y) gives normal-color-vision females (X+Xc) and color-blind males (XcY). Conclusion: Females have normal color vision, males are color blind.

Concepts
Characteristics determined by genes on the sex chromosomes are called sex-linked characteristics. Diploid females have two alleles at each X-linked locus, whereas diploid males possess a single allele at each X-linked locus. Females inherit X-linked alleles from both parents, but males inherit a single X-linked allele from their mothers.

Symbols for X-Linked Genes
There are several different ways to record genotypes for X-linked traits. Sometimes the genotypes are recorded in the same fashion as for autosomal characteristics, with the hemizygous males simply given a single allele: the genotype of a female Drosophila with white eyes would be ww, and the genotype of a white-eyed hemizygous male would be w.
Another method is to include the Y chromosome, designating it with a diagonal slash (/). With this method, the white-eyed female’s genotype would still be ww and the white-eyed male’s genotype would be w/. Perhaps the most useful method is to write the X and Y chromosomes in the genotype, designating the X-linked alleles with superscripts, as we have done in this chapter. With this method, a white-eyed female would be XwXw and a white-eyed male XwY. Using Xs and Ys in the genotype has the advantage of reminding us that the genes are X linked and that the male must always have a single allele, inherited from the mother.

Dosage Compensation
The presence of different numbers of X chromosomes in males and females presents a special problem in development. Because females have two copies of every X-linked gene and males possess one copy, the amount of gene product (protein) from X-linked genes would normally differ in the two sexes: females would produce twice as much gene product as males. This difference could be highly detrimental because protein concentration plays a critical role in development. Animals overcome this potential problem through dosage compensation, which equalizes the amount of protein produced by X-linked genes in the two sexes. In fruit flies, dosage compensation is achieved by a doubling of the activity of the genes on the X chromosome of the male. In the worm Caenorhabditis elegans, it is achieved by a halving of the activity of genes on both of the X chromosomes in the female. Placental mammals use yet another mechanism of dosage compensation; genes on one of the X chromosomes in the female are completely inactivated.

In 1949, Murray Barr observed condensed, darkly staining bodies in the nuclei of cells from female cats ( ◗ FIGURE 4.17); this darkly staining structure became known as a Barr body.
Mary Lyon proposed in 1961 that the Barr body was an inactive X chromosome; her hypothesis (now proved) has become known as the Lyon hypothesis. She suggested that, within each female cell, one of the two X chromosomes becomes inactive; which X chromosome is inactivated is random. If a cell contains more than two X chromosomes, all but one of them are inactivated. The number of Barr bodies present in human cells with different complements of sex chromosomes is shown in Table 4.2.

◗ 4.17 A Barr body is an inactivated X chromosome. (a) Female cell with a Barr body (indicated by arrow). (b) Male cell without a Barr body. (Part a, George Wilder/Visuals Unlimited; part b, M. Abbey/Photo Researchers.)

Table 4.2 Number of Barr bodies in human cells with different complements of sex chromosomes

Sex Chromosomes   Syndrome        Number of Barr Bodies
XX                None            1
XY                None            0
XO                Turner          0
XXY               Klinefelter     1
XXYY              Klinefelter     1
XXXY              Klinefelter     2
XXXXY             Klinefelter     3
XXX               Triplo-X        2
XXXX              Poly-X female   3
XXXXX             Poly-X female   4

As a result of X inactivation, females are functionally hemizygous at the cellular level for X-linked genes. In females that are heterozygous at an X-linked locus, approximately 50% of the cells will express one allele and 50% will express the other allele; thus, in heterozygous females, proteins encoded by both alleles are produced, although not within the same cell. This functional hemizygosity means that cells in females are not identical with respect to the expression of the genes on the X chromosome; females are mosaics for the expression of X-linked genes.

X inactivation takes place relatively early in development; in humans, it occurs within the first few weeks of development. Once an X chromosome becomes inactive in a cell, it remains inactivated and is inactive in all somatic cells that descend from the cell. Thus, neighboring cells tend to have the same X chromosome inactivated, producing a patchy pattern (mosaic) for the expression of an X-linked characteristic in heterozygous females.

This patchy distribution can be seen in tortoiseshell cats ( ◗ FIGURE 4.18). Although many genes contribute to coat color and pattern in domestic cats, a single X-linked locus determines the presence of orange color. There are two possible alleles at this locus: X+, which produces nonorange (usually black) fur, and Xo, which produces orange fur. Males are hemizygous and thus may be black (X+Y) or orange (XoY) but not black and orange. (Rare tortoiseshell males can arise from the presence of two X chromosomes, X+XoY.) Females may be black (X+X+), orange (XoXo), or tortoiseshell (X+Xo), the tortoiseshell pattern arising from a patchy mixture of black and orange fur. Each orange patch is a clone of cells derived from an original cell with the black allele inactivated, and each black patch is a clone of cells derived from an original cell with the orange allele inactivated. The mosaic pattern of gene expression associated with dosage compensation also produces the patchy distribution of sweat glands in women heterozygous for anhidrotic ectodermal dysplasia (see introduction to this chapter).

◗ 4.18 The patchy distribution of color on tortoiseshell cats results from the random inactivation of one X chromosome in females. (David Falconer/Words & Pictures/Picture Quest.)

Lyon’s hypothesis suggests that the presence of variable numbers of X chromosomes should not be detrimental in mammals, because any X chromosomes beyond one should be inactivated. However, persons with Turner syndrome (XO) differ from normal females, and those with Klinefelter syndrome (XXY) differ from normal males. How do these conditions arise in the face of dosage compensation? The reason may lie partly in the fact that there is a short period of time, very early in development, when all X chromosomes are active.
If the number of X chromosomes is abnormal, any X-linked genes expressed during this early period will produce abnormal levels of gene product. Furthermore, the phenotypic abnormalities may arise because some X-linked genes escape inactivation, although how they do so isn’t known.

Exactly how an X chromosome becomes inactivated is not completely understood either, but it appears to entail the addition of methyl groups (-CH3) to the DNA. The XIST (for X inactive-specific transcript) gene, located on the X chromosome, is required for inactivation. Only the copy of XIST on the inactivated X chromosome is expressed, and it continues to be expressed during inactivation (unlike most other genes on the inactivated X chromosome). Interestingly, XIST does not encode a protein; it produces an RNA molecule that binds to the inactivated X chromosome. This binding is thought to prevent the attachment of other proteins that participate in transcription and, in this way, it brings about X inactivation.

Concepts
In mammals, dosage compensation ensures that the same amount of X-linked gene product will be produced in the cells of both males and females. All but one X chromosome is inactivated in each cell; which X chromosome is inactivated is random and varies from cell to cell.

www.whfreeman.com/pierce Current information on XIST and X-chromosome inactivation in humans

Z-Linked Characteristics
In organisms with ZZ-ZW sex determination, the males are the homogametic sex (ZZ) and carry two sex-linked (usually referred to as Z-linked) alleles; thus males may be homozygous or heterozygous. Females are the heterogametic sex (ZW) and possess only a single Z-linked allele. Inheritance of Z-linked characteristics is the same as that of X-linked characteristics, except that the pattern of inheritance in males and females is reversed.

An example of a Z-linked characteristic is the cameo phenotype in Indian blue peafowl (Pavo cristatus).
In these birds, the wild-type plumage is a glossy, metallic blue. The female peafowl is ZW and the male is ZZ. Cameo plumage, which produces brown feathers, results from a Z-linked allele (Zca) that is recessive to the wild-type blue allele (ZCa). If a blue-colored female (ZCaW) is crossed with a cameo male (ZcaZca), all the F1 females are cameo (ZcaW) and all the F1 males are blue (ZCaZca) (◗ FIGURE 4.19). When the F1 are interbred, 1/4 of the F2 are blue males (ZCaZca), 1/4 are blue females (ZCaW), 1/4 are cameo males (ZcaZca), and 1/4 are cameo females (ZcaW). The reciprocal cross of a cameo female with a homozygous blue male produces an F1 generation in which all offspring are blue and an F2 consisting of 1/2 blue males (ZCaZca and ZCaZCa), 1/4 blue females (ZCaW), and 1/4 cameo females (ZcaW).

In organisms with ZZ-ZW sex determination, the female always inherits her W chromosome from her mother, and she inherits her Z chromosome, along with any Z-linked alleles, from her father. In this system, the male inherits Z chromosomes, along with any Z-linked alleles, from both the mother and the father. This pattern of inheritance is the reverse of X-linked alleles in organisms with XX-XY sex determination.

◗ 4.19 The cameo phenotype in Indian blue peafowl is inherited as a Z-linked recessive trait. (a) Blue female crossed with cameo male. (b) Reciprocal cross of cameo female with homozygous blue male.

Y-Linked Characteristics
Y-linked traits exhibit a distinct pattern of inheritance and are present only in males, because only males possess a Y chromosome. All male offspring of a male with a Y-linked trait will display the trait (provided that the penetrance, discussed in Chapter 3, is 100%), because every male inherits the Y chromosome from his father. In humans and many other organisms, there is relatively little genetic information on the Y chromosome, and few characteristics exhibit Y-linked inheritance.
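Before continuing with Y-linked traits, the proportions given for the Z-linked peafowl crosses described above can be checked by enumerating every pairing of parental gametes. The Python sketch below does this for both the original and the reciprocal cross. The allele labels follow the text (ZCa for blue, Zca for cameo); the function names phenotype and cross, and the representation of a genotype as a pair of chromosomes, are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

BLUE, CAMEO = "ZCa", "Zca"  # dominant wild-type allele, recessive cameo allele


def phenotype(genotype):
    """Name the color and sex for a pair of sex chromosomes."""
    sex = "female" if "W" in genotype else "male"     # ZW females, ZZ males
    color = "blue" if BLUE in genotype else "cameo"   # one blue allele suffices
    return f"{color} {sex}"


def cross(mother, father):
    """Phenotype proportions from one female (Z, W) and one male (Z, Z)."""
    counts = Counter(phenotype(pair) for pair in product(mother, father))
    total = sum(counts.values())
    return {p: Fraction(n, total) for p, n in counts.items()}


# (a) Blue female x cameo male, then the F1 interbred
f1_a = cross((BLUE, "W"), (CAMEO, CAMEO))
f2_a = cross((CAMEO, "W"), (BLUE, CAMEO))   # F1 cameo female x F1 blue male

# (b) Reciprocal cross: cameo female x homozygous blue male, then F1 interbred
f1_b = cross((CAMEO, "W"), (BLUE, BLUE))
f2_b = cross((BLUE, "W"), (BLUE, CAMEO))    # F1 blue female x F1 blue male

for label, result in [("F1 (a)", f1_a), ("F2 (a)", f2_a),
                      ("F1 (b)", f1_b), ("F2 (b)", f2_b)]:
    print(label, {p: str(f) for p, f in result.items()})
```

Each F2 call uses a single F1 genotype per sex, which is valid here because in both crosses all F1 females share one genotype and all F1 males share one genotype.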
More than 20 genes have been identified outside the pseudoautosomal region on the human Y chromosome, including the SRY gene and the ZFY gene. A possible Y-linked human trait is hairy ears, a trait that is common among men in some parts of the Middle East and India, affecting as many as 70% of adult men in some regions. This trait displays variable expressivity: some men have only a few hairs on the outer ear, whereas others have ears that are covered with hair. The age at which this trait appears also is quite variable. Only men have hairy ears and, in many families, the occurrence of the trait is entirely consistent with Y-linked inheritance. In a few families, however, not all sons of an affected man display the trait, which implies that the trait has incomplete penetrance. Some investigators have concluded that the hairy-ears trait is not Y-linked, but instead is an autosomal dominant trait expressed only in men (sex-limited expression, discussed more fully in Chapter 5). Distinguishing between a Y-linked characteristic with incomplete penetrance and an autosomal dominant characteristic expressed only in males is difficult, and the pattern of inheritance of hairy ears is consistent with both modes of inheritance.

The function of most Y-linked genes is poorly understood, but some appear to influence male sexual development and fertility. Some Y-linked genes have counterparts on the X chromosome that encode similar proteins in females.

DNA sequences in the Y chromosome undergo mutation over time and vary among individuals. Like Y-linked traits, these variants, called genetic markers, are passed from father to son and can be used to study male ancestry. Although the markers themselves do not code for any physical traits, they can be detected with molecular methods. Much of the Y chromosome is nonfunctional; so mutations readily accumulate.
Many of these mutations are unique; they arise only once and are passed down through the generations without recombination. Individuals possessing the same set of mutations are therefore related, and the distribution of these genetic markers on Y chromosomes provides clues about genetic relationships of present-day people.

Y-linked markers have been used to study the offspring of Thomas Jefferson, principal author of the Declaration of Independence and third president of the United States. In 1802, Jefferson was accused by a political enemy of fathering a child by his slave Sally Hemings, but the evidence was circumstantial. Hemings, who worked in the Jefferson household and accompanied Jefferson on a trip to Paris, had five children. Jefferson was accused of fathering the first child, Tom, but rumors about the paternity of the other children circulated as well. Hemings’s last child, Eston, bore a striking resemblance to Jefferson, and her fourth child, Madison, testified late in life that Jefferson was the father of all Hemings’s children. Descendants of Hemings’s children maintained that they were descended from Jefferson, but some Jefferson descendants refused to recognize their claim.

To resolve this long-standing controversy, geneticists examined markers from the Y chromosomes of male-line descendants of Hemings’s first son (Thomas Woodson), her last son (Eston Hemings), and a paternal uncle of Thomas Jefferson with whom Jefferson had Y chromosomes in common. (Descendants of Jefferson’s uncle were used because Jefferson himself had no verified male descendants.) Geneticists determined that Jefferson possessed a rare and distinctive set of genetic markers on his Y chromosome. The same markers were also found on the Y chromosomes of the male-line descendants of Eston Hemings. The probability of such a match arising by chance is less than 1%. (The markers were not found on the Y chromosomes of the descendants of Thomas Woodson.)
Together with the circumstantial historical evidence, these matching markers strongly suggest that Jefferson fathered Eston Hemings but not Thomas Woodson. Another study utilizing Y-linked genetic markers focused on the origins of the Lemba, an African tribe comprising 50,000 people who reside in South Africa and parts of Zimbabwe. Members of the Lemba tribe are commonly referred to as the black Jews of South Africa. This name derives from cultural practices of the tribe, including circumcision and food taboos, which superficially resemble those of Jewish people. Lemba oral tradition suggests that the tribe came from “Sena in the north by boat,” Sena being variously identified as Sanaa in Yemen, Judea, Egypt, or Ethiopia. Legend says that the original group was entirely male, that half of their number was lost at sea, and that the survivors made their way to the coast of Africa, where they settled. Today, most Lemba belong to Christian churches, are Muslims, or claim to be Lemba in religion. Their religious practices have little in common with Judaism and, with the exception of their oral tradition and a few cultural practices, there is little to suggest a Jewish origin. To reveal the genetic origin of the Lemba, scientists examined genetic markers on their Y chromosomes. Swabs of cheek cells were collected from 399 males in several populations: the Lemba in Africa, Bantu (another South African tribe), two groups from Yemen, and several groups of Jews. DNA was extracted and analyzed for alleles at 12 loci. This analysis of genetic markers revealed that Y chromosomes in the Lemba were of two types: those of Bantu origin and those similar to chromosomes found in Jewish and Yemen populations. Most importantly, members of one Lemba clan carried a large number of Y chromosomes that had a rare combination of alleles also found on the Y chromosomes of members of the Jewish priesthood. This set of alleles is thought to be an important indicator of Judaic origin. 
These findings are consistent with the Lemba oral tradition and strongly suggest a genetic contribution from Jewish populations.

Concepts
Y-linked characteristics exhibit a distinct pattern of inheritance: they are present only in males, and all male offspring of a male with a Y-linked trait inherit the trait.

www.whfreeman.com/pierce An overview of the use of Y-linked markers in studies of ancestry

Connecting Concepts
Recognizing Sex-linked Inheritance
What features should we look for to identify a trait as sex linked? A common misconception is that any genetic characteristic in which the phenotypes of males and females differ must be sex linked. In fact, the expression of many autosomal characteristics differs between males and females. The genes that code for these characteristics are the same in both sexes, but their expression is influenced by sex hormones. The different sex hormones of males and females cause the same genes to generate different phenotypes in males and females.

Another misconception is that any characteristic that is found more frequently in one sex is sex linked. A number of autosomal traits are expressed more commonly in one sex than in the other, because the penetrance of the trait differs in the two sexes; these traits are said to be sex influenced. For some autosomal traits, the penetrance in one sex is so low that the trait is expressed in only one sex; these traits are said to be sex limited. Both sex-influenced and sex-limited characteristics will be discussed in more detail in Chapter 5.

Several features of sex-linked characteristics make them easy to recognize. Y-linked traits are found only in males, but this fact does not guarantee that a trait is Y linked, because some autosomal characteristics are expressed only in males. A Y-linked trait is unique, however, in that all the male offspring of an affected male will express the father’s phenotype, provided the penetrance of the trait is 100%.
This need not be the case for autosomal traits that are sex-limited to males. Even when the penetrance is less than 100%, a Y-linked trait can be inherited only from the father’s side of the family. Thus, a Y-linked trait can be inherited only from the paternal grandfather (the father’s father), never from the maternal grandfather (the mother’s father). X-linked characteristics also exhibit a distinctive pattern of inheritance. X linkage is a possible explanation when the results of reciprocal crosses differ. If a characteristic is X linked, a cross between an affected male and an unaffected female will not give the same results as a cross between an affected female and an unaffected male. For almost all autosomal characteristics, the results of reciprocal crosses are the same. We should not conclude, however, that, when the reciprocal crosses give different results, the characteristic is X linked. Other sex-associated forms of inheritance, discussed in Chapter 5, also produce different results in reciprocal crosses. The key to recognizing X-linked inheritance is to remember that a male always inherits his X chromosome from his mother, not from his father. Thus, an X-linked characteristic is not passed directly from father to son; if a male clearly inherits a characteristic from his father — and the mother is not heterozygous — it cannot be X linked. Connecting Concepts Across Chapters In this chapter, we have examined sex determination and the inheritance of traits encoded by genes located on the sex chromosomes. An important theme has been that sex is determined in a variety of different ways — not all organisms have the familiar XX-XY system seen in humans. Even among organisms with XX-XY sex determination, the sexual phenotype of an individual can be shaped by very different mechanisms. The discussion of sex determination lays the foundation for an understanding of sex-linked inheritance, covered in the last part of the chapter. 
Because males and females differ in sex chromosomes, which are not homologous, they do not possess the same number of alleles at sex-linked loci, and the patterns of inheritance for sex-linked characteristics are different from those for autosomal characteristics. This material augments the principles of inheritance presented in Chapter 3. The chromosome theory of inheritance, which states that genes are located on chromosomes, was first elucidated through the study of sex-linked traits. This theory provided the first clues about the physical basis of heredity, which we will explore in more detail in Chapters 10 and 11. The ways in which sex and heredity interact are explored further in Chapter 5, where we consider additional exceptions to Mendel’s principles, including sex-limited and sex-influenced traits, cytoplasmic inheritance, genetic maternal effect, and genomic imprinting. The inheritance of human sex-linked characteristics will be discussed in Chapter 6, and we will take a more detailed look at chromosome abnormalities, including abnormal sex chromosomes, in Chapter 9.

CONCEPTS SUMMARY
• Sexual reproduction is the production of offspring that are genetically distinct from the parents. Among diploid eukaryotes, sexual reproduction consists of two processes: meiosis, which produces haploid gametes, and fertilization, in which gametes unite to produce diploid zygotes.
• Most organisms have two sexual phenotypes: males and females. Males produce small gametes; females produce large gametes. The sex of an individual normally refers to the individual’s sexual phenotype, not its genetic makeup.
• The mechanism by which sex is specified is termed sex determination. Sex may be determined by differences in specific chromosomes, ploidy level, genotypes, or environment.
• Sex chromosomes differ in number and appearance between males and females; other, nonsex chromosomes are termed autosomes.
In organisms with chromosomal sex-determining systems, the homogametic sex produces gametes that are all identical with regard to sex chromosomes; the heterogametic sex produces two types of gametes, which differ in their sex-chromosome composition.
• In the XX-XO system, females possess two X chromosomes, and males possess a single X chromosome.
• In the XX-XY system, females possess two X chromosomes, and males possess a single X and a single Y chromosome. The X and Y chromosomes are not homologous, except at the pseudoautosomal region, which is essential to pairing in meiosis in males.
• In the ZZ-ZW system of sex determination, males possess two Z chromosomes and females possess a Z and a W chromosome.
• In some organisms, ploidy level determines sex; males develop from unfertilized eggs (and are haploid) and females develop from fertilized eggs (and are diploid). Other organisms have genic sex determination, in which genotypes at one or more loci determine the sex of an individual. Still others have environmental sex determination.
• In Drosophila melanogaster, sex is determined by a balance between genes on the X chromosomes and genes on the autosomes, the X:A ratio. An X:A ratio of 1.0 produces a female; an X:A ratio of 0.5 produces a male; and an X:A ratio between 1.0 and 0.5 produces an intersex.
• In humans, sex is ultimately determined by the presence or absence of the SRY gene located on the Y chromosome.
• Sex-linked characteristics are determined by genes on the sex chromosomes; X-linked characteristics are encoded by genes on the X chromosome, and Y-linked characteristics are encoded by genes on the Y chromosome.
• A female inherits X-linked alleles from both parents; a male inherits X-linked alleles from his female parent only.

IMPORTANT TERMS
sex
sex determination
hermaphroditism
monoecious
dioecious
sex chromosomes
autosomes
heterogametic sex
homogametic sex
pseudoautosomal region
genic sex determination
sequential hermaphroditism
genic balance system
X:A ratio
Turner syndrome
Klinefelter syndrome
triplo-X syndrome
sex-determining region Y (SRY) gene
sex-linked characteristic
X-linked characteristic
Y-linked characteristic
hemizygous
nondisjunction
dosage compensation
Barr body
Lyon hypothesis

Worked Problems
1. A fruit fly has XXXYY sex chromosomes; all the autosomal chromosomes are normal. What sexual phenotype will this fly have?

• Solution
Sex in fruit flies is determined by the X:A ratio, the ratio of the number of X chromosomes to the number of haploid autosomal sets. An X:A ratio of 1.0 produces a female fly; an X:A ratio of 0.5 produces a male. If the X:A ratio is greater than 1.0, the fly is a metafemale; if it is less than 0.5, the fly is a metamale; if the X:A ratio is between 1.0 and 0.5, the fly is an intersex. This fly has three X chromosomes and normal autosomes. Normal diploid flies have two autosomal sets of chromosomes; so the X:A ratio in this case is 3/2 or 1.5. Thus, this fly is a metafemale.

2. Color blindness in humans is most commonly due to an X-linked recessive allele. Betty has normal vision, but her mother is color blind. Bill is color blind. If Bill and Betty marry and have a child together, what is the probability that the child will be color blind?

• Solution
Because color blindness is an X-linked recessive characteristic, Betty’s color-blind mother must be homozygous for the color-blind allele (XcXc). Females inherit one X chromosome from each of their parents; so Betty must have inherited a color-blind allele from her mother. Because Betty has normal color vision, she must have inherited an allele for normal vision (X+) from her father; thus Betty is heterozygous (X+Xc). Bill is color blind.
Because males are hemizygous for X-linked alleles, he must be XcY. A mating between Betty and Bill is represented as:

Betty X+Xc × Bill XcY
Gametes: X+ and Xc from Betty; Xc and Y from Bill

         Xc                           Y
X+       X+Xc normal female           X+Y normal male
Xc       XcXc color-blind female      XcY color-blind male

Thus, 1/4 of the children are expected to be female with normal color vision, 1/4 female with color blindness, 1/4 male with normal color vision, and 1/4 male with color blindness. The overall probability that a child is color blind is therefore 1/4 + 1/4 = 1/2.

3. Chickens, like all birds, have ZZ-ZW sex determination. The bar-feathered phenotype in chickens results from a Z-linked allele that is dominant over the allele for nonbar feathers. A barred female is crossed with a nonbarred male. The F1 from this cross are intercrossed to produce the F2. What will the phenotypes and their proportions be in the F1 and F2 progeny?

• Solution
With the ZZ-ZW system of sex determination, females are the heterogametic sex, possessing a Z chromosome and a W chromosome; males are the homogametic sex, with two Z chromosomes. In this problem, the barred female is hemizygous for the bar phenotype (ZBW). Because bar is dominant over nonbar, the nonbarred male must be homozygous for nonbar (ZbZb). Crossing these two chickens, we obtain:

P: barred female ZBW × nonbarred male ZbZb
Gametes: ZB and W from the female; Zb from the male
F1: ZBZb barred male, ZbW nonbarred female

Thus, all the males in the F1 will be barred (ZBZb), and all the females will be nonbarred (ZbW). The F1 are now crossed to produce the F2:

F1: nonbarred female ZbW × barred male ZBZb
Gametes: Zb and W from the female; ZB and Zb from the male

         ZB                       Zb
Zb       ZBZb barred male         ZbZb nonbarred male
W        ZBW barred female        ZbW nonbarred female

So, 1/4 of the F2 are barred males, 1/4 are nonbarred males, 1/4 are barred females, and 1/4 are nonbarred females.

4. In Drosophila melanogaster, forked bristles are caused by an allele (Xf) that is X linked and recessive to an allele for normal bristles (X+). Brown eyes are caused by an allele (b) that is autosomal and recessive to an allele for red eyes (b+). A female fly that is homozygous for normal bristles and red eyes mates with a male fly that has forked bristles and brown eyes. The F1 are intercrossed to produce the F2. What will the phenotypes and proportions of the F2 flies be from this cross?

• Solution
This problem is best worked by breaking the cross down into two separate crosses, one for the X-linked genes that determine the type of bristles and one for the autosomal genes that determine eye color. Let’s begin with the autosomal characteristics. A female fly that is homozygous for red eyes (b+b+) is crossed with a male with brown eyes. Because brown eyes are recessive, the male fly must be homozygous for the brown-eyed allele (bb). All of the offspring of this cross will be heterozygous (b+b) and will have red eyes:

P: b+b+ red eyes × bb brown eyes
Gametes: b+ from the female; b from the male
F1: b+b red eyes

The F1 are then intercrossed to produce the F2. Whenever two individuals heterozygous for an autosomal recessive characteristic are crossed, 3/4 of the offspring will have the dominant trait and 1/4 will have the recessive trait; thus, 3/4 of the F2 flies will have red eyes and 1/4 will have brown eyes:

F1: b+b red eyes × b+b red eyes
Gametes: b+ and b from each parent
F2: 1/4 b+b+ red, 1/2 b+b red, 1/4 bb brown; that is, 3/4 red and 1/4 brown

Next, we work out the results for the X-linked characteristic. A female that is homozygous for normal bristles (X+X+) is crossed with a male that has forked bristles (XfY). The female F1 from this cross are heterozygous (X+Xf), receiving an X chromosome with a normal-bristle allele from their mother (X+) and an X chromosome with a forked-bristle allele (Xf) from their father. The male F1 are hemizygous (X+Y), receiving an X chromosome with a normal-bristle allele from their mother (X+) and a Y chromosome from their father:

P: X+X+ normal bristles × XfY forked bristles
Gametes: X+ from the female; Xf and Y from the male
F1: 1/2 X+Xf normal-bristle females, 1/2 X+Y normal-bristle males

When these F1 are intercrossed, 1/2 of the F2 will be normal-bristle females, 1/4 will be normal-bristle males, and 1/4 will be forked-bristle males:

F1: X+Xf × X+Y
Gametes: X+ and Xf from the female; X+ and Y from the male

         X+                       Y
X+       X+X+ normal female       X+Y normal male
Xf       X+Xf normal female       XfY forked-bristle male

F2: 1/2 normal female, 1/4 normal male, 1/4 forked-bristle male

To obtain the phenotypic ratio in the F2, we now combine these two crosses by using the multiplicative rule of probability and the branch diagram:

Eye color      Bristle and sex               F2 phenotype                     Probability
red (3/4)      normal female (1/2)           red normal female                3/4 × 1/2 = 3/8 = 6/16
red (3/4)      normal male (1/4)             red normal male                  3/4 × 1/4 = 3/16
red (3/4)      forked-bristle male (1/4)     red forked-bristle male          3/4 × 1/4 = 3/16
brown (1/4)    normal female (1/2)           brown normal female              1/4 × 1/2 = 1/8 = 2/16
brown (1/4)    normal male (1/4)             brown normal male                1/4 × 1/4 = 1/16
brown (1/4)    forked-bristle male (1/4)     brown forked-bristle male        1/4 × 1/4 = 1/16

COMPREHENSION QUESTIONS
* 1. What is the most defining difference between males and females?
2. How do monoecious organisms differ from dioecious organisms?
3. Describe the XX-XO system of sex determination. In this system, which is the heterogametic sex and which is the homogametic sex?
4. How does sex determination in the XX-XY system differ from sex determination in the ZZ-ZW system?
* 5. What is the pseudoautosomal region? How does the inheritance of genes in this region differ from the inheritance of other Y-linked characteristics?
* 6. How is sex determined in insects with haplodiploid sex determination?
7. What is meant by genic sex determination?
8. How does sex determination in Drosophila differ from sex determination in humans?
9.
Give the typical sex chromosomes found in the cells of people with Turner syndrome, Klinefelter syndrome, and androgen insensitivity syndrome, as well as in poly-X females.
* 10. What characteristics are exhibited by an X-linked trait?
11. Explain how Bridges’s study of nondisjunction in Drosophila helped prove the chromosome theory of inheritance.
12. Explain why tortoiseshell cats are almost always female and why they have a patchy distribution of orange and black fur.
13. What is a Barr body? How is it related to the Lyon hypothesis?
* 14. What characteristics are exhibited by a Y-linked trait?

APPLICATION QUESTIONS AND PROBLEMS
* 15. What is the sexual phenotype of fruit flies having the following chromosomes?

       Sex chromosomes    Autosomal chromosomes
(a)    XX                 all normal
(b)    XY                 all normal
(c)    XO                 all normal
(d)    XXY                all normal
(e)    XYY                all normal
(f)    XXYY               all normal
(g)    XXX                all normal
(h)    XX                 four haploid sets
(i)    XXX                four haploid sets
(j)    XXX                three haploid sets
(k)    X                  three haploid sets
(l)    XY                 three haploid sets
(m)    XX                 three haploid sets

16. For parts a through g in problem 15, what would the human sexual phenotype (male or female) be?

* 17. Joe has classic hemophilia, which is an X-linked recessive disease. Could Joe have inherited the gene for this disease from the following persons?

                              Yes     No
(a) His mother’s mother       ____    ____
(b) His mother’s father       ____    ____
(c) His father’s mother       ____    ____
(d) His father’s father       ____    ____

* 18. In Drosophila, yellow body is due to an X-linked gene that is recessive to the gene for gray body.
(a) A homozygous gray female is crossed with a yellow male. The F1 are intercrossed to produce F2. Give the genotypes and phenotypes, along with the expected proportions, of the F1 and F2 progeny.
(b) A yellow female is crossed with a gray male. The F1 are intercrossed to produce the F2. Give the genotypes and phenotypes, along with the expected proportions, of the F1 and F2 progeny.
(c) A yellow female is crossed with a gray male. The F1 females are backcrossed with gray males. Give the genotypes and phenotypes, along with the expected proportions, of the F2 progeny.
(d) If the F2 flies in part b mate randomly, what are the expected phenotypic proportions of flies in the F3?

* 19. Both John and Cathy have normal color vision. After 10 years of marriage to John, Cathy gave birth to a color-blind daughter. John filed for divorce, claiming he is not the father of the child. Is John justified in his claim of nonpaternity? Explain why. If Cathy had given birth to a color-blind son, would John be justified in claiming nonpaternity?

20. Red–green color blindness in humans is due to an X-linked recessive gene. A woman whose father is color blind possesses one eye with normal color vision and one eye with color blindness.
(a) Propose an explanation for this woman’s vision pattern.
(b) Would it be possible for a man to have one eye with normal color vision and one eye with color blindness?

* 21. Bob has XXY chromosomes (Klinefelter syndrome) and is color blind. His mother and father have normal color vision, but his maternal grandfather is color blind. Assume that Bob’s chromosome abnormality arose from nondisjunction in meiosis. In which parent and in which meiotic division did nondisjunction occur? Explain your answer.

22. In certain salamanders, it is possible to alter the sex of a genetic female, making her into a functional male; these salamanders are called sex-reversed males. When a sex-reversed male is mated with a normal female, approximately 2/3 of the offspring are female and 1/3 are male. How is sex determined in these salamanders? Explain the results of this cross.

23. In some mites, males pass genes to their grandsons, but they never pass genes to male offspring. Explain.

24. The Talmud, an ancient book of Jewish civil and religious laws, states that if a woman bears two sons who die of bleeding after circumcision (removal of the foreskin from the penis), any additional sons that she has should not be circumcised. (The bleeding is most likely due to the X-linked disorder hemophilia.) Furthermore, the Talmud states that the sons of her sisters must not be circumcised, whereas the sons of her brothers should. Is this religious law consistent with sound genetic principles? Explain your answer.

* 25. Miniature wings (Xm) in Drosophila result from an X-linked allele that is recessive to the allele for long wings (X+). Give the genotypes of the parents in the following crosses.

       Male parent    Female parent    Male offspring              Female offspring
(a)    long           long             231 long, 250 miniature     560 long
(b)    miniature      long             610 long                    632 long
(c)    miniature      long             410 long, 417 miniature     412 long, 415 miniature
(d)    long           miniature        753 miniature               761 long
(e)    long           long             625 long                    630 long

* 26. In chickens, congenital baldness results from a Z-linked recessive gene. A bald rooster is mated with a normal hen. The F1 from this cross are interbred to produce the F2. Give the genotypes and phenotypes, along with their expected proportions, among the F1 and F2 progeny.

27. In the eastern mosquito fish (Gambusia affinis holbrooki), which has XX-XY sex determination, spotting is inherited as a Y-linked trait. The trait exhibits 100% penetrance when the fish are raised at 22°C, but the penetrance drops to 42% when the fish are raised at 26°C. A male with spots is crossed with a female without spots, and the F1 are intercrossed to produce the F2. If all the offspring are raised at 22°C, what proportion of the F1 and F2 will have spots? If all the offspring are raised at 26°C, what proportion of the F1 and F2 will have spots?

* 28. How many Barr bodies would you expect to see in human cells containing the following chromosomes?
(a) XX
(b) XY
(c) XO
(d) XXY
(e) XXYY
(f) XXXY
(g) XYY
(h) XXX
(i) XXXX

29. Red–green color blindness is an X-linked recessive trait in humans. Polydactyly (extra fingers and toes) is an autosomal dominant trait. Martha has normal fingers and toes and normal color vision. Her mother is normal in all respects, but her father is color blind and polydactylous. Bill is color blind and polydactylous. His mother has normal color vision and normal fingers and toes. If Bill and Martha marry, what types and proportions of children can they produce?

* 30. Miniature wings in Drosophila melanogaster result from an X-linked gene (Xm) that is recessive to an allele for long wings (X+). Sepia eyes are produced by an autosomal gene (s) that is recessive to an allele for red eyes (s+).
(a) A female fly that has miniature wings and sepia eyes is crossed with a male that has normal wings and is homozygous for red eyes. The F1 are intercrossed to produce the F2. Give the phenotypes and their proportions expected in the F1 and F2 flies from this cross.
(b) A female fly that is homozygous for normal wings and has sepia eyes is crossed with a male that has miniature wings and is homozygous for red eyes. The F1 are intercrossed to produce the F2. Give the phenotypes and proportions expected in the F1 and F2 flies from this cross.

31. Suppose that a recessive gene that produces a short tail in mice is located in the pseudoautosomal region. A short-tailed male is mated with a female mouse that is homozygous for a normal tail. The F1 from this cross are intercrossed to produce the F2. What will the phenotypes and proportions of the F1 and F2 mice be from this cross?

* 32. A color-blind female and a male with normal vision have three sons and six daughters. All the sons are color blind. Five of the daughters have normal vision, but one of them is color blind. The color-blind daughter is 16 years old, is short for her age, and has never undergone puberty. Propose an explanation for how this girl inherited her color blindness.

CHALLENGE QUESTIONS
33. On average, what proportion of the X-linked genes in the first individual is the same as that in the second individual?
(a) A male and his mother
(b) A female and her mother
(c) A male and his father
(d) A female and her father
(e) A male and his brother
(f) A female and her sister
(g) A male and his sister
(h) A female and her brother

34. A geneticist discovers a male mouse in his laboratory colony with greatly enlarged testes. He suspects that this trait results from a new mutation that is either Y linked or autosomal dominant. How could he determine whether the trait is autosomal dominant or Y linked?

35. Amanda is a genetics student at a small college in Connecticut. While counting her fruit flies in the laboratory one afternoon, she observed a strange species of fly in the room. Amanda captured several of the flies and began to raise them. After having raised the flies for several generations, she discovered a mutation in her colony that produces yellow eyes, in contrast with normal red eyes, and Amanda determined that this trait is definitely X-linked recessive. Because yellow eyes are X linked, she assumed that either this species has the XX-XY system of sex determination with genic balance similar to Drosophila or it has the XX-XO system of sex determination. How can Amanda determine whether sex determination in this species is XX-XY or XX-XO? The chromosomes of this species are very small and hard for Amanda to see with her student microscope, so she can only conduct crosses with flies having the yellow-eye mutation. Outline the crosses that Amanda should conduct and explain how they will prove XX-XY or XX-XO sex determination in this species.

36. Occasionally, a mouse X chromosome is broken into two pieces and each piece becomes attached to a different autosomal chromosome.
In this event, only the genes on one of the two pieces undergo X inactivation. What does this observation indicate about the mechanism of X-chromosome inactivation?

5 Extensions and Modifications of Basic Principles

• Was Mendel Wrong?
• Dominance Revisited
• Lethal Alleles
• Multiple Alleles
  Duck-Feather Patterns
  The ABO Blood Group
• Gene Interaction
  Gene Interaction That Produces Novel Phenotypes
  Gene Interaction with Epistasis
  The Complex Genetics of Coat Color in Dogs
• The Interaction Between Sex and Heredity
  Sex-Influenced and Sex-Limited Characteristics
  Cytoplasmic Inheritance
  Genetic Maternal Effects
  Genomic Imprinting
• Anticipation
• Interaction Between Genes and Environment
  Environmental Effects on Gene Expression
  The Inheritance of Continuous Characteristics

Chapter 5 opener photo. (Nancy Wexler, HDF/Neurology, Columbia University.)

Was Mendel Wrong?

In 1872, a physician from Long Island, New York, named George Huntington described a medical condition characterized by jerky, involuntary movements. Now known as Huntington disease, the condition typically appears in middle age. The initial symptoms are subtle, consisting of mild behavioral and neurological changes; but, as the disease progresses, speech is impaired, walking becomes difficult, and psychiatric problems develop that frequently lead to insanity.
Most people who have Huntington disease live for 10 to 30 years after the disease begins; there is currently no cure or effective treatment. Huntington disease appears with equal frequency in males and females, rarely skips generations and, when one parent has the disorder, approximately half of the children will be similarly affected. These are the hallmarks of an autosomal dominant trait, with one exception. The disorder occasionally arises before the age of 15 and, in these cases, progresses much more rapidly than it does when it arises in middle age. Among younger patients, the trait is almost always inherited from the father. According to Mendel’s principles of heredity (Chapter 3), males and females transmit autosomal traits with equal frequency, and reciprocal crosses should yield identical results; yet, for juvenile cases of Huntington disease, Mendel’s principles do not apply. Was Mendel wrong?

◗ 5.1 The gene for Huntington disease. (a) James Gusella and colleagues, whose research located the Huntington gene. (b) The gene has been mapped to the tip of chromosome 4. (Part a, Sam Ogden; part b, left courtesy of Dr. Thomas Ried and Dr. Evelin Schrock.)

In 1983, a molecular geneticist at Massachusetts General Hospital named James Gusella determined that the gene causing Huntington disease is located near the tip of the short arm of chromosome 4. Gusella determined its location by analyzing DNA from members of the largest known family with Huntington disease, about 7000 people who live near Lake Maracaibo in Venezuela, more than 100 of whom have Huntington disease. Many experts predicted that, with the general location of the Huntington gene pinned down, the actual DNA sequence would be isolated within a few years. Despite intensive efforts, finding the gene took 10 years.
When it was finally isolated in the spring of 1993 ( ◗ FIGURE 5.1), the gene turned out to be quite different from any of those that code for the traits studied by Mendel. The mutation that causes Huntington disease consists of an unstable region of DNA capable of expanding and contracting as it is passed from generation to generation. When the region expands, Huntington disease results. The degree of expansion affects the severity and age of onset of symptoms; the juvenile form of Huntington disease results from rapid expansion of the region, which occurs primarily when the gene is transmitted from father to offspring. This genetic phenomenon, the earlier appearance of a trait as it is passed from generation to generation, is called anticipation.

Like a number of other genetic phenomena, anticipation does not adhere to Mendel’s principles of heredity. This lack of adherence doesn’t mean that Mendel was wrong; rather, it means that Mendel’s principles are not, by themselves, sufficient to explain the inheritance of all genetic characteristics. Our modern understanding of genetics has been greatly enriched by the discovery of a number of modifications and extensions of Mendel’s basic principles, which are the focus of this chapter.

An important extension of Mendel’s principles of heredity, the inheritance of sex-linked characteristics, was introduced in Chapter 4. In this chapter, we will examine a number of additional refinements of Mendel’s basic tenets. We begin by reviewing the concept of dominance, emphasizing that dominance entails interactions between genes at one locus (allelic genes) and affects the way in which genes are expressed in the phenotype. Next, we consider lethal alleles and their effect on phenotypic ratios, followed by a discussion of multiple alleles. We then turn to interaction among genes at different loci (nonallelic genes). The phenotypic ratios produced by gene interaction are related to the ratios encountered in Chapter 3.
In the latter part of the chapter, we will consider ways in which sex interacts with heredity. Our last stop will be a discussion of environmental influences on gene expression.

The modifications and extensions of hereditary principles discussed in this chapter do not invalidate Mendel’s important contributions; rather, they enlarge our understanding of heredity by building on the framework provided by his principles of segregation and independent assortment. These modifications rarely alter the way in which the genes are inherited; rather, they affect the ways in which the genes determine the phenotype.

www.whfreeman.com/pierce Additional information about Huntington disease

Dominance Revisited

One of Mendel’s important contributions to the study of heredity is the concept of dominance: the idea that an individual possesses two different alleles for a characteristic, but the trait encoded by only one of the alleles is observed in the phenotype. With dominance, the heterozygote possesses the same phenotype as one of the homozygotes. When biologists began to apply Mendel’s principles to organisms other than peas, it quickly became apparent that many characteristics do not exhibit this type of dominance. Indeed, Mendel himself was aware that dominance is not universal, because he observed that a pea plant heterozygous for long and short flowering times had a flowering time that was intermediate between those of its homozygous parents. This situation, in which the heterozygote is intermediate in phenotype between the two homozygotes, is termed incomplete dominance.

Dominance can be understood in regard to how the phenotype of the heterozygote relates to the phenotypes of the homozygotes. In the example presented in ◗ FIGURE 5.2, flower color potentially ranges from red to white. One homozygous genotype, A1A1, codes for red flowers, and another, A2A2, codes for white flowers.
Where the heterozygote falls on the range of phenotypes determines the type of dominance. If the heterozygote (A1A2) has flowers that are the same color as those of the A1A1 homozygote (red), then the A1 allele is completely dominant over the A2 allele; that is, red is dominant over white. If, on the other hand, the heterozygote has flowers that are the same color as the A2A2 homozygote (white), then the A2 allele is completely dominant, and white is dominant over red. When the heterozygote falls in between the phenotypes of the two homozygotes, dominance is incomplete. With incomplete dominance, the heterozygote need not be exactly intermediate (pink in our example) between the two homozygotes; it might be a slightly lighter shade of red or a slightly pink shade of white. As long as the heterozygote’s phenotype can be differentiated and falls within the range of the two homozygotes, dominance is incomplete.

◗ 5.2 The type of dominance exhibited by a trait depends on how the phenotype of the heterozygote relates to the phenotypes of the homozygotes. [Figure: a phenotypic range from red to white. (1) A1A1 codes for red flowers. (2) A2A2 codes for white flowers. (3) If the heterozygote is red, the A1 allele is dominant over the A2 allele. (4) If the heterozygote is white, the A2 allele is dominant over the A1 allele. (5) If the phenotype of the heterozygote falls between the phenotypes of the two homozygotes, dominance is incomplete.]

Table 5.1 Differences between dominance, incomplete dominance, and codominance

Type of Dominance      Definition
Dominance              Phenotype of the heterozygote is the same as the phenotype of one of the homozygotes
Incomplete dominance   Phenotype of the heterozygote is intermediate (falls within the range) between the phenotypes of the two homozygotes
Codominance            Phenotype of the heterozygote includes the phenotypes of both homozygotes
The important thing to remember about dominance is that it affects the phenotype that genes produce, but not the way in which genes are inherited.

Another type of interaction between alleles is codominance, in which the phenotype of the heterozygote is not intermediate between the phenotypes of the homozygotes; rather, the heterozygote simultaneously expresses the phenotypes of both homozygotes. An example of codominance is seen in the MN blood types. The MN locus codes for one of the types of antigens on red blood cells. Unlike antigens foreign to the ABO and Rh blood groups (which also code for red-blood-cell antigens), foreign MN antigens do not elicit a strong immunological reaction, and therefore the MN blood types are not routinely considered in blood transfusions. At the MN locus, there are two alleles: the LM allele, which codes for the M antigen; and the LN allele, which codes for the N antigen. Homozygotes with genotype LMLM express the M antigen on their red blood cells and have the M blood type. Homozygotes with genotype LNLN express the N antigen and have the N blood type. Heterozygotes with genotype LMLN exhibit codominance and express both the M and the N antigens; they have blood type MN. The differences between dominance, incomplete dominance, and codominance are summarized in Table 5.1.

The type of dominance that a character exhibits frequently depends on the level of the phenotype examined. An example is cystic fibrosis, one of the more common genetic disorders found in Caucasians and usually considered to be a recessive disease. People who have cystic fibrosis produce large quantities of thick, sticky mucus, which plugs up the airways of the lungs and clogs the ducts leading from the pancreas to the intestine, causing frequent respiratory infections and digestive problems. Even with medical treatment, patients with cystic fibrosis suffer chronic, life-threatening medical problems.
The gene responsible for cystic fibrosis resides on the long arm of chromosome 7. It encodes a protein termed cystic fibrosis transmembrane conductance regulator, mercifully abbreviated CFTR, which acts as a gate in the cell membrane and regulates the movement of chloride ions into and out of the cell. Patients with cystic fibrosis have a mutated, dysfunctional form of CFTR that causes the channel to stay closed, and so chloride ions build up in the cell. This buildup causes the formation of thick mucus and produces the symptoms of the disease.

Most people have two copies of the normal allele for CFTR, and produce only functional CFTR protein. Those with cystic fibrosis possess two copies of the mutated CFTR allele, and produce only the defective CFTR protein. Heterozygotes, with one normal and one defective CFTR allele, produce both functional and defective CFTR protein. Thus, at the molecular level, the alleles for normal and defective CFTR are codominant, because both alleles are expressed in the heterozygote. However, because one normal allele produces enough functional CFTR protein to allow normal chloride transport, the heterozygote exhibits no adverse effects, and the mutated CFTR allele appears to be recessive at the physiological level.

In summary, several important characteristics of dominance should be emphasized. First, dominance is a result of interactions between genes at the same locus; in other words, dominance is allelic interaction. Second, dominance does not alter the way in which the genes are inherited; it only influences the way in which they are expressed as a phenotype. The allelic interaction that characterizes dominance is therefore interaction between the products of the genes. Finally, dominance is frequently “in the eye of the beholder,” meaning that the classification of dominance depends on the level at which the phenotype is examined.
As we saw with cystic fibrosis, an allele may exhibit codominance at one level and be recessive at another level.

Concepts
Dominance entails interactions between genes at the same locus (allelic genes) and is an aspect of the phenotype; dominance does not affect the way in which genes are inherited. The type of dominance exhibited by a characteristic frequently depends on the level of the phenotype examined.

Lethal Alleles

In 1905, Lucien Cuénot reported a peculiar pattern of inheritance in mice. When he mated two yellow mice, approximately 2/3 of their offspring were yellow and 1/3 were nonyellow. When he test-crossed the yellow mice, he found that all were heterozygous; he was never able to obtain a yellow mouse that bred true. There was a great deal of discussion about Cuénot’s results among his colleagues, but it was eventually realized that the yellow allele must be lethal when homozygous ( ◗ FIGURE 5.3). A lethal allele is one that causes death at an early stage of development, often before birth, and so some genotypes may not appear among the progeny.

Cuénot originally crossed two mice heterozygous for yellow: Yy × Yy. Normally, this cross would be expected to produce 1/4 YY, 1/2 Yy, and 1/4 yy (see Figure 5.3). The homozygous YY mice are conceived but never complete development, which leaves a 2 : 1 ratio of Yy (yellow) to yy (nonyellow) in the observed offspring; all yellow mice are heterozygous (Yy).

◗ 5.3 A 2 : 1 ratio among the progeny of a cross results from the segregation of a lethal allele. [Figure: P generation, yellow (Yy) × yellow (Yy); gametes Y and y from each parent; F1 generation, 1/4 YY (dead), 1/2 Yy (yellow), 1/4 yy (nonyellow). Conclusion: YY mice die, and so 2/3 of progeny are Yy, yellow, and 1/3 of progeny are yy, nonyellow.]

Another example of a lethal allele, originally described by Erwin Baur in 1907, is found in snapdragons. The aurea strain in these plants has yellow leaves.
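The 2 : 1 ratio in the yellow-mouse cross can be checked with a short enumeration. The sketch below is illustrative only; the function name `lethal_cross` and the choice of Python are assumptions made for this example and are not part of Cuénot's analysis. It forms every equally likely zygote from two parents, discards the lethal genotype, and renormalizes the frequencies among the survivors.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def lethal_cross(parent1, parent2, lethal_genotypes):
    """Return genotype frequencies among surviving progeny of a one-locus cross.

    Each parent is a two-character string such as "Yy"; every gamete carries
    one of the two alleles with probability 1/2. (Hypothetical helper.)
    """
    zygotes = Counter()
    for a, b in product(parent1, parent2):
        genotype = "".join(sorted((a, b)))  # "Yy" and "yY" are the same genotype
        zygotes[genotype] += Fraction(1, 4)
    # Remove genotypes that never complete development, then renormalize.
    survivors = {g: p for g, p in zygotes.items() if g not in lethal_genotypes}
    total = sum(survivors.values())
    return {g: p / total for g, p in survivors.items()}


progeny = lethal_cross("Yy", "Yy", lethal_genotypes={"YY"})
for genotype, freq in sorted(progeny.items()):
    print(genotype, freq)
```

Run as written, the sketch reports 2/3 Yy and 1/3 yy among the survivors. Calling `lethal_cross("Yy", "yy", {"YY"})` gives 1/2 Yy and 1/2 yy, the result expected when a yellow heterozygote is test-crossed.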
When two plants with yellow leaves are crossed, 2/3 of the progeny have yellow leaves and 1/3 have green leaves. When green is crossed with green, all the progeny have green leaves; however, when yellow is crossed with green, 1/2 of the progeny are green and 1/2 are yellow, confirming that all yellow-leaved snapdragons are heterozygous. A 2 : 1 ratio is almost always produced by a recessive lethal allele; so observing this ratio among the progeny of a cross between individuals with the same phenotype is a strong clue that one of the alleles is lethal.

In both of these examples, the lethal alleles are recessive because they cause death only in homozygotes. Unlike its effect on survival, the effect of the allele on color is dominant; in both mice and snapdragons, a single copy of the allele in the heterozygote produces a yellow color. Lethal alleles also can be dominant; in this case, homozygotes and heterozygotes for the allele die. Truly dominant lethal alleles cannot be transmitted unless they are expressed after the onset of reproduction, as in Huntington disease.

Concepts
A lethal allele causes death, frequently at an early developmental stage, and so one or more genotypes are missing from the progeny of a cross. Lethal alleles may modify the ratio of progeny resulting from a cross.

Multiple Alleles

Most of the genetic systems that we have examined so far consist of two alleles. In Mendel’s peas, for instance, one allele coded for round seeds and another for wrinkled seeds; in cats, one allele produced a black coat and another produced a gray coat. For some loci, more than two alleles are present within a group of individuals: the locus has multiple alleles. (Multiple alleles may also be referred to as an allelic series.)
Although there may be more than two alleles present within a group, the genotype of each diploid individual still consists of only two alleles. The inheritance of characteristics encoded by multiple alleles is no different from the inheritance of characteristics encoded by two alleles, except that a greater variety of genotypes and phenotypes are possible.

Duck-Feather Patterns

An example of multiple alleles is seen at a locus that determines the feather pattern of mallard ducks. One allele, M, produces the wild-type mallard pattern. A second allele, MR, produces a different pattern called restricted, and a third allele, md, produces a pattern termed dusky. In this allelic series, restricted is dominant over mallard and dusky, and mallard is dominant over dusky: MR > M > md. The six genotypes possible with these three alleles and their resulting phenotypes are:

Genotype   Phenotype
MRMR       restricted
MRM        restricted
MRmd       restricted
MM         mallard
Mmd        mallard
mdmd       dusky

In general, the number of genotypes possible will be [n(n + 1)]/2, where n equals the number of different alleles at a locus. Working crosses with multiple alleles is no different from working crosses with two alleles; Mendel’s principle of segregation still holds, as shown in the cross between a restricted duck and a mallard duck ( ◗ FIGURE 5.4).

◗ 5.4 Mendel’s principle of segregation applies to crosses with multiple alleles. In this example, three alleles determine the type of plumage in mallard ducks: MR (restricted) > M (mallard) > md (dusky). [Figure: P generation, restricted (MRmd) × mallard (Mmd); F1 generation, 1/4 MRM and 1/4 MRmd (restricted), 1/4 Mmd (mallard), 1/4 mdmd (dusky). Conclusion: progeny are 1/2 restricted, 1/4 mallard, and 1/4 dusky.]

The ABO Blood Group

Another multiple-allele system is at the locus for the ABO blood group. This locus determines your ABO blood type and, like the MN locus, codes for antigens on red blood cells. The three common alleles for the ABO blood group locus are: IA, which codes for the A antigen; IB, which codes for the B antigen; and i, which codes for no antigen (O). We can represent the dominance relations among the ABO alleles as follows: IA > i, IB > i, IA = IB. The IA and IB alleles are both dominant over i and are codominant with each other; the AB phenotype is due to the presence of an IA allele and an IB allele, which results in the production of A and B antigens on red blood cells. An individual with genotype ii produces neither antigen and has blood type O.

The six common genotypes at this locus and their phenotypes are shown in ◗ FIGURE 5.5a. Antibodies are produced against any foreign antigens (see Figure 5.5a). For instance, a person having blood type A produces B antibodies, because the B antigen is foreign. A person having blood type B produces A antibodies, and someone having blood type AB produces neither A nor B antibodies, because neither A nor B antigen is foreign. A person having blood type O possesses no A or B antigens; consequently that person produces both A antibodies and B antibodies. The presence of antibodies against foreign ABO antigens means that successful blood transfusions are possible only between persons with certain compatible blood types ( ◗ FIGURE 5.5b).

◗ 5.5 ABO blood types and possible blood transfusions.

(a)
Phenotype (blood type)   Genotype       Antigen type   Antibodies made by body
A                        IAIA or IAi    A              B
B                        IBIB or IBi    B              A
AB                       IAIB           A and B        None
O                        ii             None           A and B

(b) Blood-recipient reactions to donor-blood antibodies. Red blood cells that do not react with the recipient antibody remain evenly dispersed; donor blood and recipient blood are compatible. Blood cells that react with the recipient antibody clump together; donor blood and recipient blood are not compatible. Type O donors can donate to any recipient: they are universal donors. Type AB recipients can accept blood from any donor: they are universal recipients.
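The genotype count for the duck-plumage alleles and the ABO compatibility rule described above can both be written out in a few lines. This is a minimal sketch under stated assumptions: all names (`DUCK_ALLELES`, `genotype_count`, `duck_phenotype`, `cross`, `compatible`) are hypothetical, the dominance order MR > M > md is encoded simply as list position, and compatibility considers only the A and B antigens discussed here.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations_with_replacement, product

# Alleles listed from most to least dominant (MR > M > md).
DUCK_ALLELES = ("MR", "M", "md")
DUCK_PATTERN = {"MR": "restricted", "M": "mallard", "md": "dusky"}


def genotype_count(n):
    """Number of diploid genotypes possible with n alleles: n(n + 1)/2."""
    return n * (n + 1) // 2


def all_genotypes(alleles):
    """Every unordered pair of alleles, homozygotes included."""
    return list(combinations_with_replacement(alleles, 2))


def duck_phenotype(genotype):
    """The most dominant allele present determines the plumage pattern."""
    top = min(genotype, key=DUCK_ALLELES.index)
    return DUCK_PATTERN[top]


def cross(parent1, parent2):
    """Phenotype frequencies among the progeny of two diploid parents."""
    freqs = Counter()
    for a, b in product(parent1, parent2):  # four equally likely zygotes
        freqs[duck_phenotype((a, b))] += Fraction(1, 4)
    return dict(freqs)


# ABO: antigens carried by each blood type; antibodies target the absent ones.
ABO_ANTIGENS = {"A": {"A"}, "B": {"B"}, "AB": {"A", "B"}, "O": set()}


def antibodies(blood_type):
    """Antibodies made against whichever of the A and B antigens is foreign."""
    return {"A", "B"} - ABO_ANTIGENS[blood_type]


def compatible(donor, recipient):
    """True if no donor antigen is attacked by the recipient's antibodies."""
    return not (ABO_ANTIGENS[donor] & antibodies(recipient))


genotypes = all_genotypes(DUCK_ALLELES)
print(len(genotypes), genotype_count(len(DUCK_ALLELES)))
for g in genotypes:
    print(g, duck_phenotype(g))
print(cross(("MR", "md"), ("M", "md")))
print([r for r in ABO_ANTIGENS if compatible("O", r)])
```

The enumeration lists the same six duck genotypes as the text, the cross reproduces the 1/2 restricted, 1/4 mallard, 1/4 dusky result of Figure 5.4, and the last line shows that type O blood can be donated to every ABO type.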
The inheritance of alleles at the ABO locus can be illustrated by a paternity suit involving the famous movie actor Charlie Chaplin. In 1941, Chaplin met a young actress named Joan Barry, with whom he had an affair. The affair ended in February 1942 but, 20 months later, Barry gave birth to a baby girl and claimed that Chaplin was the father. Barry then sued for child support. At this time, blood typing had just come into widespread use, and Chaplin’s attorneys had Chaplin, Barry, and the child blood typed. Barry had blood type A, her child had blood type B, and Chaplin had blood type O. Could Chaplin have been the father of Barry’s child? Your answer should be no. Joan Barry had blood type A, which can be produced by either genotype IAIA or IAi. Her baby possessed blood type B, which can be produced by either genotype IBIB or IBi. The baby could not have inherited the IB allele from Barry (Barry could not carry an IB allele if she were blood type A); therefore the baby must have inherited the i allele from her. Barry must have had genotype IAi, and the baby must have had genotype IBi. Because the baby girl inherited her i allele from Barry, she must have inherited the IB allele from her father. With blood type O, produced only by genotype ii, Chaplin could not have been the father of Barry’s child. In the course of the trial to settle the paternity suit, three pathologists came to the witness stand and declared that it was genetically impossible for Chaplin to have fathered the child. Nevertheless, the jury ruled that Chaplin was the father and ordered him to pay child support and Barry’s legal expenses. Concepts More than two alleles (multiple alleles) may be present within a group of individuals, although each diploid individual still has only two alleles at that locus. Gene Interaction In the dihybrid crosses that we examined in Chapter 3, each locus had an independent effect on the phenotype. 
When Mendel crossed a homozygous round and yellow plant (RRYY) with a homozygous wrinkled and green plant (rryy) and then self-fertilized the F1, he obtained F2 progeny in the following proportions:

9/16   R_Y_   round, yellow
3/16   R_yy   round, green
3/16   rrY_   wrinkled, yellow
1/16   rryy   wrinkled, green

In this example, the genes showed two kinds of independence. First, the genes at each locus are independent in their assortment in meiosis, which is what produces the 9 : 3 : 3 : 1 ratio of phenotypes in the progeny, in accord with Mendel’s principle of independent assortment. Second, the genes are independent in their phenotypic expression; the R and r alleles affect only the shape of the seed and have no influence on the color of the endosperm; the Y and y alleles affect only color and have no influence on the shape of the seed.

Frequently, genes exhibit independent assortment but do not act independently in their phenotypic expression; instead, the effects of genes at one locus depend on the presence of genes at other loci. This type of interaction between the effects of genes at different loci (genes that are not allelic) is termed gene interaction. With gene interaction, the products of genes at different loci combine to produce new phenotypes that are not predictable from the single-locus effects alone. In our consideration of gene interaction, we’ll focus primarily on interaction between the effects of genes at two loci, although interactions among genes at three, four, or more loci are common.

Concepts
In gene interaction, genes at different loci contribute to the determination of a single phenotypic characteristic.

Gene Interaction That Produces Novel Phenotypes

Let’s first examine gene interaction in which genes at two loci interact to produce a single characteristic.
Fruit color in the pepper Capsicum annuum is determined in this way. This plant produces peppers in one of four colors: red, brown, yellow, or green. If a homozygous plant with red peppers is crossed with a homozygous plant with green peppers, all the F1 plants have red peppers ( ◗ FIGURE 5.6a). When the F1 are crossed with one another, the F2 are in a ratio of 9 red : 3 brown : 3 yellow : 1 green ( ◗ FIGURE 5.6b). This dihybrid ratio (Chapter 3) is produced by a cross between two plants that are both heterozygous for two loci (RrCc × RrCc). In peppers, a dominant allele R at the first locus produces a red pigment; the recessive allele r at this locus produces no red pigment. A dominant allele C at the second locus causes decomposition of the green pigment chlorophyll; the recessive allele c allows chlorophyll to persist. The genes at the two loci then interact to produce the colors seen in F2 peppers:

Genotype   Phenotype
R_C_       red
R_cc       brown
rrC_       yellow
rrcc       green

◗ 5.6 Gene interaction in which two loci determine a single characteristic, fruit color, in the pepper Capsicum annuum. [Figure: (a) P generation, red (RRCC) × green (rrcc); F1 generation, red (RrCc). (b) F1 cross, RrCc × RrCc; F2 generation, 9/16 R_C_ red, 3/16 R_cc brown, 3/16 rrC_ yellow, 1/16 rrcc green. Conclusion: 9 red : 3 brown : 3 yellow : 1 green.]

To illustrate how Mendel’s rules of heredity can be used to understand the inheritance of characteristics determined by gene interaction, let’s consider a testcross between an F1 plant from the cross in Figure 5.6 (RrCc) and a plant with green peppers (rrcc). As outlined in Chapter 3 (p. 000) for independent loci, we can work this cross by breaking it down into two simple crosses. At the first locus, the heterozygote Rr is crossed with the homozygote rr; this cross produces 1/2 Rr and 1/2 rr progeny. Similarly, at the second locus, the heterozygous genotype Cc is crossed with the homozygous genotype cc, producing 1/2 Cc and 1/2 cc progeny.
In accord with Mendel’s principle of independent assortment, these single-locus ratios can be combined by using the multiplication rule: the probability of obtaining the genotype RrCc is the probability of Rr (1/2) multiplied by the probability of Cc (1/2), or 1/4. The probability of each progeny genotype resulting from the testcross is:

Progeny genotype   Probability at each locus   Overall probability   Phenotype
RrCc               1/2 × 1/2                   1/4                   red peppers
Rrcc               1/2 × 1/2                   1/4                   brown peppers
rrCc               1/2 × 1/2                   1/4                   yellow peppers
rrcc               1/2 × 1/2                   1/4                   green peppers

When you work problems with gene interaction, it is especially important to determine the probabilities of single-locus genotypes and to multiply the probabilities of genotypes, not phenotypes, because the phenotypes cannot be determined without considering the effects of the genotypes at all the contributing loci.

Another example of gene interaction that produces novel phenotypes is seen in the genes that determine comb shape in chickens. The comb is the fleshy structure found on the head of a chicken. Genes at two loci (R, r and P, p) interact to determine the four types of combs shown in ◗ FIGURE 5.7. A walnut comb is produced when at least one dominant allele R is present at the first locus and at least one dominant allele P is present at the second locus (genotype R_P_). A chicken with at least one dominant allele at the first locus and two recessive alleles at the second locus (genotype R_pp) possesses a rose comb. If two recessive alleles are present at the first locus and at least one dominant allele is present at the second (genotype rrP_), the chicken has a pea comb. Finally, if two recessive alleles are present at both loci (rrpp), the bird has a single comb.

◗ 5.7 A chicken’s comb is determined by gene interaction between two loci. (a) A walnut comb is produced when there is a dominant allele at each of two loci (R_P_). (b) A rose comb occurs when there is a dominant allele only at the first locus (R_pp). (c) A pea comb occurs when there is a dominant allele only at the second locus (rrP_). (d) A single comb is produced by the presence of only recessive alleles at both loci (rrpp). (Parts a and d, R. OSF Dowling/Animals Animals; part b, Robert Maier/Animals Animals; part c, George Godfrey/Animals Animals.)

Gene Interaction with Epistasis

Sometimes the effect of gene interaction is that one gene masks (hides) the effect of another gene at a different locus, a phenomenon known as epistasis. This phenomenon is similar to dominance, except that dominance entails the masking of genes at the same locus (allelic genes). In epistasis, the gene that does the masking is called the epistatic gene; the gene whose effect is masked is a hypostatic gene. Epistatic genes may be recessive or dominant in their effects.

Recessive epistasis
Recessive epistasis is seen in the genes that determine coat color in Labrador retrievers. These dogs may be black, brown, or yellow; their different coat colors are determined by interactions between genes at two loci (although a number of other loci also help to determine coat color; see p. 000). One locus determines the type of pigment produced by the skin cells: a dominant allele B codes for black pigment, whereas a recessive allele b codes for brown pigment. Alleles at a second locus affect the deposition of the pigment in the shaft of the hair; allele E allows dark pigment (black or brown) to be deposited, whereas a recessive allele e prevents the deposition of dark pigment, causing the hair to be yellow. The presence of genotype ee at the second locus therefore masks the expression of the black and brown alleles at the first locus.
The genotypes that determine coat color and their phenotypes are:

Genotype   Phenotype
B_E_       black
bbE_       brown (frequently called chocolate)
B_ee       yellow
bbee       yellow

If we cross a black Labrador homozygous for the dominant alleles with a yellow Labrador homozygous for the recessive alleles and then intercross the F1, we obtain progeny in the F2 in a 9 : 3 : 4 ratio:

P     BBEE (black) × bbee (yellow)
F1    BbEe (black), intercrossed
F2    9/16 B_E_   black
      3/16 bbE_   brown
      3/16 B_ee   yellow
      1/16 bbee   yellow
      (3/16 + 1/16 = 4/16 yellow)

Notice that yellow dogs can carry alleles for either black or brown pigment, but these alleles are not expressed in their coat color. In this example of gene interaction, allele e is epistatic to B and b, because e masks the expression of the alleles for black and brown pigments, and alleles B and b are hypostatic to e. In this case, e is a recessive epistatic allele, because two copies of e must be present to mask the expression of the black and brown pigments.

Dominant epistasis

Dominant epistasis is seen in the interaction of two loci that determine fruit color in summer squash, which is commonly found in one of three colors: yellow, white, or green. When a homozygous plant that produces white squash is crossed with a homozygous plant that produces green squash and the F1 plants are crossed with each other, the following results are obtained:

P     plants with white squash × plants with green squash
F1    plants with white squash, intercrossed
F2    12/16 plants with white squash
      3/16 plants with yellow squash
      1/16 plants with green squash

How can gene interaction explain these results? In the F2, 12/16 or 3/4 of the plants produce white squash and 3/16 + 1/16 = 4/16 = 1/4 of the plants produce squash having color. This outcome is the familiar 3 : 1 ratio produced by a cross between two heterozygous individuals, which suggests that a dominant allele at one locus inhibits the production of pigment, resulting in white progeny.
If we use the symbol W to represent the dominant allele that inhibits pigment production, then genotype W_ inhibits pigment production and produces white squash, whereas ww allows pigment and results in colored squash. Among those ww F2 plants with pigmented fruit, we observe 3/16 yellow and 1/16 green (a 3 : 1 ratio). This outcome arises because a second locus determines the type of pigment produced in the squash, with yellow (Y_) dominant over green (yy). This locus is expressed only in ww plants, which lack the dominant inhibitory allele W. We can assign the genotype wwY_ to plants that produce yellow squash and the genotype wwyy to plants that produce green squash. The genotypes and their associated phenotypes are:

Genotype   Phenotype
W_Y_       white squash
W_yy       white squash
wwY_       yellow squash
wwyy       green squash

Allele W is epistatic to Y and y: it suppresses the expression of these pigment-producing genes. W is a dominant epistatic allele because, in contrast with e in Labrador retriever coat color, a single copy of the allele is sufficient to inhibit pigment production.

Summer squash provides us with a good opportunity for considering how epistasis often arises when genes affect a series of steps in a biochemical pathway. Yellow pigment in the squash is most likely produced in a two-step biochemical pathway ( ◗ FIGURE 5.8). A colorless (white) compound (designated A in Figure 5.8) is converted by enzyme I into green compound B, which is then converted into compound C by enzyme II. Compound C is the yellow pigment in the fruit. Plants with the genotype ww produce enzyme I and may be green or yellow, depending on whether enzyme II is present. When allele Y is present at a second locus, enzyme II is produced and compound B is converted into compound C, producing a yellow fruit. When two copies of y, which does not encode a functional form of enzyme II, are present, squash remain green.
The presence of W at the first locus inhibits the conversion of compound A into compound B; plants with genotype W_ do not make compound B and their fruit remains white, regardless of which alleles are present at the second locus. Many cases of epistasis arise in this way. A gene (such as W) that has an effect on an early step in a biochemical pathway will be epistatic to genes (such as Y and y) that affect subsequent steps, because the effect of the enzyme in the later step depends on the product of the earlier reaction.

◗ 5.8 Yellow pigment in summer squash is produced in a two-step pathway. (1) Plants with genotype ww produce enzyme I, which converts compound A (colorless) into compound B (green). (2) Dominant allele W inhibits the conversion of A into B. (3) Plants with genotype Y_ produce enzyme II, which converts compound B into compound C (yellow). (4) Plants with genotype yy do not encode a functional form of enzyme II. Conclusion: Genotypes W_Y_ and W_yy do not produce enzyme I; wwyy produces enzyme I but not enzyme II; wwY_ produces both enzyme I and enzyme II.

Duplicate recessive epistasis

Let’s consider one more detailed example of epistasis. Albinism is the absence of pigment and is a common genetic trait in many plants and animals. Pigment is almost always produced through a multistep biochemical pathway; thus, albinism may entail gene interaction. Robert T. Dillon and Amy R. Wethington found that albinism in the common freshwater snail Physa heterostropha can result from the presence of either of two recessive alleles at two different loci. Inseminated snails were collected from a natural population and placed in cups of water, where they laid eggs. Some of the eggs hatched into albino snails. When two albino snails were crossed, all of the F1 were pigmented. On intercrossing the F1, the F2 consisted of 9/16 pigmented snails and 7/16 albino snails. How did this 9 : 7 ratio arise?

The 9 : 7 ratio seen in the F2 snails can be understood as a modification of the 9 : 3 : 3 : 1 ratio obtained when two individuals heterozygous for two loci are crossed. The 9 : 7 ratio arises when dominant alleles at both loci (A_B_) produce pigmented snails; any other genotype produces albino snails:

P     aaBB (albino) × AAbb (albino)
F1    AaBb (pigmented), intercrossed
F2    9/16 A_B_   pigmented
      3/16 aaB_   albino
      3/16 A_bb   albino
      1/16 aabb   albino
      (3/16 + 3/16 + 1/16 = 7/16 albino)

The 9 : 7 ratio in these snails is probably produced by a two-step pathway of pigment production ( ◗ FIGURE 5.9). Pigment (compound C) is produced only after compound A has been converted into compound B by enzyme I and after compound B has been converted into compound C by enzyme II. At least one dominant allele A at the first locus is required to produce enzyme I; similarly, at least one dominant allele B at the second locus is required to produce enzyme II. Albinism arises from the absence of compound C, which may happen in three ways. First, two recessive alleles at the first locus (genotype aaB_) may prevent the production of enzyme I, and so compound B is never produced. Second, two recessive alleles at the second locus (genotype A_bb) may prevent the production of enzyme II. In this case, compound B is never converted into compound C. Third, two recessive alleles may be present at both loci (aabb), causing the absence of both enzyme I and enzyme II. In this example of gene interaction, a is epistatic to B, and b is epistatic to A; both are recessive epistatic alleles because the presence of two copies of either allele a or b is necessary to suppress pigment production. This example differs from the suppression of coat color in Labrador retrievers in that recessive alleles at either of two loci are capable of suppressing pigment production in the snails, whereas recessive alleles at a single locus suppress pigment expression in Labs.

◗ 5.9 Pigment is produced in a two-step pathway in snails. (1) A dominant allele at the A locus is required to produce enzyme I, which converts compound A into compound B. (2) A dominant allele at the B locus is required to produce enzyme II, which converts compound B into compound C (pigment). (3) Albinism arises from the absence of enzyme I (aaB_), so compound B is never produced, (4) or from the absence of enzyme II (A_bb), so compound C is never produced, or from the absence of both enzymes (aabb). (5) Pigmented snails must produce enzymes I and II, which requires genotype A_B_.

Concepts
Epistasis is the masking of the expression of one gene by another gene at a different locus. The epistatic gene does the masking; the hypostatic gene is masked. Epistatic genes can be dominant or recessive.

Connecting Concepts
Interpreting Ratios Produced by Gene Interaction

A number of modified ratios that result from gene interaction are shown in Table 5.2. Each of these examples represents a modification of the basic 9 : 3 : 3 : 1 dihybrid ratio. In interpreting the genetic basis of modified ratios, we should keep several points in mind. First, the inheritance of the genes producing these characteristics is no different from the inheritance of genes coding for simple genetic characters. Mendel’s principles of segregation and independent assortment still apply; each individual possesses two alleles at each locus, which separate in meiosis, and genes at the different loci assort independently. The only difference is in how the products of the genotypes interact to produce the phenotype. Thus, we cannot consider the expression of genes at each locus separately, but must take into consideration how the genes at different loci interact.
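The modified ratios discussed so far can be checked by enumerating a dihybrid intercross and applying a phenotype rule to each of the 16 gamete combinations. The following is a minimal sketch, not a method from the text: the function names (gametes, f2_ratio, labrador, squash, snail) are illustrative, and each rule simply restates the genotype-to-phenotype assignments given above, with the two loci treated generically as "first" and "second".

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def gametes(genotype):
    """Return the four gametes of a two-locus genotype such as 'AaBb'."""
    first, second = genotype[:2], genotype[2:]
    return [a + b for a in first for b in second]


def f2_ratio(phenotype_of, parent="AaBb"):
    """Phenotype proportions among the progeny of parent x parent.

    phenotype_of(dom_first, dom_second) receives two booleans that say
    whether at least one dominant (uppercase) allele is present at the
    first and second locus.
    """
    counts = Counter()
    for g1, g2 in product(gametes(parent), repeat=2):
        dom_first = g1[0].isupper() or g2[0].isupper()
        dom_second = g1[1].isupper() or g2[1].isupper()
        counts[phenotype_of(dom_first, dom_second)] += 1
    total = sum(counts.values())
    return {ph: Fraction(n, total) for ph, n in counts.items()}


def labrador(dom_b, dom_e):
    # Recessive epistasis: ee masks the B locus.
    if not dom_e:
        return "yellow"
    return "black" if dom_b else "brown"


def squash(dom_w, dom_y):
    # Dominant epistasis: a single W inhibits pigment.
    if dom_w:
        return "white"
    return "yellow" if dom_y else "green"


def snail(dom_a, dom_b):
    # Duplicate recessive epistasis: pigment needs A_ and B_.
    return "pigmented" if dom_a and dom_b else "albino"


if __name__ == "__main__":
    for name, rule in [("Labrador", labrador), ("squash", squash), ("snail", snail)]:
        in_sixteenths = {ph: int(f * 16) for ph, f in f2_ratio(rule).items()}
        print(name, in_sixteenths)
```

Passing booleans rather than full genotypes keeps each rule close to the wording of the text ("at least one dominant allele at the first locus"); a different encoding of genotypes would work equally well.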
A second point is that in the examples that we have considered, the phenotypic proportions were always in sixteenths because, in all the crosses, pairs of alleles segregated at two independently assorting loci. The probability of inheriting one of the two alleles at a locus is 1/2. Because there are two loci, each with two alleles, the probability of inheriting any particular combination of genes is (1/2)^4 = 1/16. For a trihybrid cross, the progeny proportions should be in sixty-fourths, because (1/2)^6 = 1/64. In general, the progeny proportions should be in fractions of (1/2)^(2n), where n equals the number of loci with two alleles segregating in the cross.

Table 5.2 Modified dihybrid phenotypic ratios due to gene interaction*

Ratio     Genotype classes (in sixteenths)             Type of interaction               Example
9:3:3:1   A_B_ 9; A_bb 3; aaB_ 3; aabb 1               None                              Seed shape and endosperm color in peas
9:3:4     A_B_ 9; A_bb 3; aaB_ + aabb 4                Recessive epistasis               Coat color in Labrador retrievers
12:3:1    A_B_ + A_bb 12; aaB_ 3; aabb 1               Dominant epistasis                Color in squash
9:7       A_B_ 9; A_bb + aaB_ + aabb 7                 Duplicate recessive epistasis     Albinism in snails
9:6:1     A_B_ 9; A_bb + aaB_ 6; aabb 1                Duplicate interaction             (none given)
15:1      A_B_ + A_bb + aaB_ 15; aabb 1                Duplicate dominant epistasis      (none given)
13:3      A_B_ + A_bb + aabb 13; aaB_ 3                Dominant and recessive epistasis  (none given)

*Reading across, each row gives the phenotypic ratios of progeny from a dihybrid cross (AaBb × AaBb).

Crosses rarely produce exactly 16 progeny; therefore, modifications of a dihybrid ratio are not always obvious. Modified dihybrid ratios are more easily seen if the number of individuals of each phenotype is expressed in sixteenths:

x/16 = (number of progeny with a phenotype) / (total number of progeny)

where x/16 equals the proportion of progeny with a particular phenotype. If we solve for x (the proportion of the particular phenotype in sixteenths), we have:

x = (number of progeny with a phenotype × 16) / (total number of progeny)

For example, suppose we cross two homozygous individuals, interbreed the F1, and obtain 63 red, 21 brown, and 28 white F2 individuals.
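The conversion of progeny counts into sixteenths can be sketched in a few lines. This is only an illustration of the formula for x given above; the function name to_sixteenths is hypothetical, and exact fractions are used so that counts that do not fit a dihybrid ratio show up as non-integers.

```python
from fractions import Fraction


def to_sixteenths(counts):
    """Express each phenotype count as x, where x/16 is its proportion."""
    total = sum(counts.values())
    return {phenotype: Fraction(n * 16, total) for phenotype, n in counts.items()}


# F2 counts from the example in the text.
observed = {"red": 63, "brown": 21, "white": 28}
sixteenths = to_sixteenths(observed)

if __name__ == "__main__":
    for phenotype, x in sixteenths.items():
        print(phenotype, x)
```

Real crosses seldom give whole numbers of sixteenths, so in practice the values of x are compared with the nearest expected ratio rather than required to match exactly.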
Using the preceding formula, the phenotypic ratio in the F2 is: red = (63 × 16)/112 = 9; brown = (21 × 16)/112 = 3; and white = (28 × 16)/112 = 4. The phenotypic ratio is 9 : 3 : 4.

A final point to consider is how to assign genotypes to the phenotypes in modified ratios owing to gene interaction. Don’t try to memorize the genotypes associated with all the modified ratios in Table 5.2. Instead, practice relating modified ratios to known ratios, such as the 9 : 3 : 3 : 1 dihybrid ratio. Suppose we obtain 15/16 green progeny and 1/16 white progeny in a cross between two plants. If we compare this 15 : 1 ratio with the standard 9 : 3 : 3 : 1 dihybrid ratio, we see that 9/16 + 3/16 + 3/16 equals 15/16. All the genotypes associated with these proportions in the dihybrid cross (A_B_, A_bb, and aaB_) must give the same phenotype, the green progeny. Genotype aabb makes up 1/16 of the progeny in a dihybrid cross, the white progeny in this cross.

In assigning genotypes to phenotypes in modified ratios, students sometimes become confused about which letters to assign to which phenotype. Suppose we obtain the following phenotypic ratio: 9/16 black : 3/16 brown : 4/16 white. Which genotype do we assign to the brown progeny, A_bb or aaB_? Either answer is correct, because the letters are just arbitrary symbols for the genetic information. The important thing to realize about this ratio is that the brown phenotype arises when two recessive alleles are present at one locus.

Concepts
Gene interaction frequently produces modified phenotypic ratios. These modified ratios can be understood by relating them to other known ratios.

The Complex Genetics of Coat Color in Dogs

Coat color in dogs is an excellent example of how complex interactions between genes may take part in the determination of a phenotype. Domestic dogs come in an amazing variety of shapes, sizes, and colors.
For thousands of years, humans have been breeding dogs for particular traits, producing the large number of types that we see today. Each breed of dog carries a selection of genes from the ancestral dog gene pool; these genes define the features of a particular breed.

One of the most obvious differences between dogs is coat color. The genetics of coat color in dogs is quite complex; many genes participate, and there are numerous interactions between genes at different loci. We will consider seven loci (in the list that follows) that are important in producing many of the noticeable differences in color and pattern among breeds of dogs. In interpreting the genetic basis of differences in coat color of dogs, consider how the expression of a particular gene is modified by the effects of other genes. Keep in mind that additional loci not listed here can modify the colors produced by these seven loci and that not all geneticists agree on the genetics of color variation in some breeds.

1. Agouti (A) locus. This locus has five common alleles that determine the depth and distribution of color in a dog’s coat:
   A^s   Solid black pigment.
   a^w   Agouti, or wolflike gray. Hairs encoded by this allele have a “salt and pepper” appearance, produced by a band of yellow pigment on a black hair.
   a^y   Yellow. The black pigment is markedly reduced; so the entire hair is yellow.
   a^s   Saddle markings (dark color on the back, with extensive tan markings on the head and legs).
   a^t   Bicolor (dark color over most of the body, with tan markings on the feet and eyebrows).
   A^s and a^y are generally dominant over the other alleles, but the dominance relations are complex and not yet completely understood.

2. Black (B) locus. This locus determines whether black pigment can be formed. The actual color of a dog’s fur depends on the effects of genes at other loci (such as the A, C, D, and E loci). Two alleles are common:
   B   Allows black pigment to be produced; the dog will be black if it also possesses certain alleles at the A, C, D, and E loci.
   b   Black pigment cannot be produced; pigmented dogs can be chocolate, liver, tan, or red.
   B is dominant over b.

3. Albino (C) locus. This locus determines whether full color will be expressed. There are five alleles at this locus:
   C      Color fully expressed.
   c^ch   Chinchilla. Less color is expressed, and pigment is completely absent from the base of the long hairs, producing a pale coat.
   c^d    All-white coat with dark nose and dark eyes.
   c^b    All-white coat with blue eyes.
   c      Fully albino. The dogs have an all-white coat with pink eyes and nose.
   The dominance relations among these alleles are presumed to be C > c^ch > c^d > c^b > c, but the c^ch and c alleles are rare, and crosses including all possible genotypes have not been completed.

4. Dilution (D) locus. This locus, with two alleles, determines whether the color will be diluted. For example, diluted black pigment appears bluish, and diluted yellow appears cream. The diluted effect is produced by an uneven distribution of pigment in the hair shaft:
   D   Intense pigmentation.
   d   Dilution of pigment.
   D is dominant over d.

5. Extension (E) locus. Four alleles at this locus determine where the genotype at the A locus is expressed. For example, if a dog has the A^s allele (solid black) at the A locus, then black pigment will either be extended throughout the coat or be restricted to some areas, depending on the alleles present at the E locus. Areas where the A locus is not expressed may appear as yellow, red, or tan, depending on the presence of particular genes at other loci. When A^s is present at the A locus, the four alleles at the E locus have the following effects:
   E^m    Black mask with a tan coat.
   E      The A locus expressed throughout (solid black).
   e^br   Brindle, in which black and yellow are in layers to give a tiger-striped appearance.
   e      No black in the coat, but the nose and eyes may be black.
   The dominance relations among these alleles are poorly known.

6. Spotting (S) locus. Alleles at this locus determine whether white spots will be present. There are four common alleles:
   S     No spots.
   s^i   Irish spotting; numerous white spots.
   s^p   Piebald spotting; various amounts of white.
   s^w   Extreme white piebald; almost all white.
   S is completely dominant over s^i, s^p, and s^w; s^i and s^p are dominant over s^w (S > s^i, s^p > s^w). The relation between s^i and s^p is poorly defined; indeed, they may not be separate alleles. Genes at other poorly known loci also modify spotting patterns.

7. Ticking (T) locus. This locus determines the presence of small colored spots on the white areas, which is called ticking:
   T   Ticking; small colored spots on the areas of white.
   t   No ticking.
   T is dominant over t. Ticking cannot be expressed if a dog has a solid coat (S_).

◗ 5.10 Coat color in dogs is determined by interactions between genes at a number of loci. (a) Most Labrador retrievers are genotype A^sA^sCCDDSStt, varying only at the B and E loci. (b) Most beagles are genotype a^sa^sBBCCDDs^ps^ptt. (c) Dalmatians are genotype A^sA^sCCDDEEs^ws^wTT, varying at the B locus so that the dogs are black (B_) or brown (bb). (Part a, Robert Maier/Animals Animals; part b, Ralph Reinhold/Animals Animals; part c, Robert Percy/Animals Animals.)

To illustrate how genes at these loci interact in determining a dog’s coat color, let’s consider a few examples:

Labrador retriever: Labrador retrievers ( ◗ FIGURE 5.10a) may be black, brown, or yellow. Most are homozygous A^sA^sCCDDSStt; thus, they vary only at the B and E loci. The A^s, C, and D alleles allow dark pigment to be expressed; whether a dog is black depends on which genes are present at the B and E loci.
As discussed earlier in the chapter, all black Labradors must carry at least one B allele and one E allele (B_E_). Brown dogs are homozygous bb and have at least one E allele (bbE_). Yellow dogs are a result of the presence of ee (B_ee or bbee). Labrador retrievers are homozygous for the S allele, which produces a solid color; the few white spots that appear in some dogs of this breed are due to other modifying genes. The allele for ticking, T, is presumed not to exist in Labradors; however, Labrador retrievers have solid coats and ticking is expressed only in spotted dogs; so its absence is uncertain.

Beagle: Most beagles are homozygous a^sa^sBBCCDDs^ps^ptt, although other alleles at these loci are occasionally present. The a^s allele produces the saddle markings (dark back and sides, with tan head and legs) that are characteristic of the breed ( ◗ FIGURE 5.10b). Alleles B, C, and D allow black to be produced, but its distribution is limited by the a^s allele. Genotype ee does occasionally arise, leading to a few all-tan beagles. White spotting in beagles is due to the s^p allele. Ticking can appear, but most beagles are tt.

Dalmatian: Dalmatians ( ◗ FIGURE 5.10c) have an interesting genetic makeup. Most are homozygous A^sA^sCCDDEEs^ws^wTT; so they vary only at the B locus. Notice that these dogs possess genotype A^sA^sCCDDEE, which allows for a solid coat that would be black, if genotype B_ is present, or brown (called liver), if genotype bb is present. However, the presence of the s^w allele produces a white coat, masking the expression of the solid color. The dog’s color appears only in the pigmented spots, which are due to the presence of the ticking allele T.

Table 5.3 gives the common genotypes of other breeds of dogs.

www.whfreeman.com/pierce Information on dog genetics, including the Dog Genome Project

Table 5.3 Common genotypes in different breeds of dogs

Breed                Usual homozygous genes*      Other genes present within the breed
Basset hound         BBCCDDEEtt                   a^y, a^t; S, s^p, s^i
Beagle               a^sa^sBBCCDDs^ps^ptt         E, e
English bulldog      BBCCDDtt                     A^s, a^y, a^t; E^m, E, e^br; S, s^i, s^p, s^w
Chihuahua            tt                           A^s, a^y, a^s, a^t; B, b; C, c^ch; D, d; E^m, E, e^br, e; S, s^i, s^p, s^w
Collie               BBCCEEtt                     a^y, a^t; D, d; s^i, s^w
Dalmatian            A^sA^sCCDDEEs^ws^wTT         B, b
Doberman             a^ta^tCCEESStt               B, b; D, d
German shepherd      BBDDSStt                     a^y, a^g, a^s, a^t; C, c^ch; E^m, E, e
Golden retriever     A^sA^sBBDDSStt               C, c^ch; E, e
Greyhound            BBtt                         A^s, a^y; C, c^ch; D, d; E, e^br, e; S, s^p, s^w, s^i
Irish setter         BBCCDDeeSStt                 A, a^t
Labrador retriever   A^sA^sCCDDSStt               B, b; E, e
Poodle               SStt                         A^s, a^t; B, b; C, c^ch; D, d; E, e
Rottweiler           a^ta^tBBCCDDEESStt
St. Bernard          a^ya^yBBCCDDtt               E^m, E; s^i, s^p, s^w

*Most dogs in the breed are homozygous for these genes; a few individual dogs may possess other alleles at these loci.
Source: Data from M. B. Willis, Genetics of the Dog (London: Witherby, 1989).

Complementation: Determining Whether Mutations Are at the Same or Different Loci

How do we know whether different mutations that affect a characteristic occur at the same locus (are allelic) or at different loci? In fruit flies, for example, white is an X-linked mutation that produces white eyes instead of the red eyes found in wild-type flies. Apricot is an X-linked recessive mutation that produces light orange-colored eyes. Do the white and apricot mutations occur at the same locus or at different loci? We can use the complementation test to answer this question.

To carry out a complementation test, parents that are homozygous for different mutations are crossed, producing offspring that are heterozygous. If the mutations are allelic (occur at the same locus), then the heterozygous offspring have only mutant alleles (a b) and exhibit a mutant phenotype:

a/a × b/b  →  a/b  (mutant phenotype)

If, on the other hand, the mutations occur at different loci, each of the homozygous parents possesses wild-type genes at the other locus (aa b^+b^+ and a^+a^+ bb); so the heterozygous offspring inherit a mutant and a wild-type allele at each locus. In this case, the mutations complement each other and the heterozygous offspring have the wild-type phenotype:

aa b^+b^+ × a^+a^+ bb  →  a^+a b^+b  (wild-type phenotype)

Complementation occurs when an individual possessing two mutant genes has a wild-type phenotype and is an indicator that the mutations are nonallelic genes. When the complementation test is applied to white and apricot mutations, all of the heterozygous offspring have light-colored eyes, demonstrating that white and apricot are produced by mutations that occur at the same locus and are allelic.

Interaction Between Sex and Heredity

In Chapter 4, we considered characteristics encoded by genes located on the sex chromosomes and how their inheritance differs from the inheritance of traits encoded by autosomal genes. Now we will examine additional influences of sex, including the effect of the sex of an individual on the expression of genes on autosomal chromosomes, characteristics determined by genes located in the cytoplasm, and characteristics for which the genotype of only the maternal parent determines the phenotype of the offspring. Finally, we’ll look at situations in which the expression of genes on autosomal chromosomes is affected by the sex of the parent from whom they are inherited.

Sex-Influenced and Sex-Limited Characteristics

Sex-influenced characteristics are determined by autosomal genes and are inherited according to Mendel’s principles, but they are expressed differently in males and females. In this case, a particular trait is more readily expressed in one sex; in other words, the trait has higher penetrance (see p. 000 in Chapter 3) in one of the sexes.
For example, the presence of a beard on some goats is determined by an autosomal gene (B^b) that is dominant in males and recessive in females. In males, a single allele is required for the expression of this trait: both the homozygote (B^bB^b) and the heterozygote (B^bB^+) have beards, whereas the B^+B^+ male is beardless. In contrast, females require two alleles in order for this trait to be expressed: the homozygote B^bB^b has a beard, whereas the heterozygote (B^bB^+) and the other homozygote (B^+B^+) are beardless. The key to understanding the expression of the bearded gene is to look at the heterozygote. In males (for which the presence of a beard is dominant), the heterozygous genotype produces a beard but, in females (for which the presence of a beard is recessive and its absence is dominant), the heterozygous genotype produces a goat without a beard.

◗ FIGURE 5.11a illustrates a cross between a beardless male (B^+B^+) and a bearded female (B^bB^b). The alleles separate into gametes according to Mendel’s principle of segregation, and all the F1 are heterozygous (B^+B^b). Because the trait is dominant in males and recessive in females, all the F1 males will be bearded, and all the F1 females will be beardless. When the F1 are crossed with one another, 1/4 of the F2 progeny are B^bB^b, 1/2 are B^bB^+, and 1/4 are B^+B^+ ( ◗ FIGURE 5.11b). Because male heterozygotes are bearded, 3/4 of the males in the F2 possess beards; because female heterozygotes are beardless, only 1/4 of the females in the F2 are bearded.

◗ 5.11 Genes that encode sex-influenced traits are inherited according to Mendel’s principles but are expressed differently in males and females. (a) P generation: beardless male (B^+B^+) × bearded female (B^bB^b); F1 generation: bearded males and beardless females, all B^+B^b. (b) F1 generation: bearded male (B^+B^b) × beardless female (B^+B^b); F2 generation in each sex: 1/4 B^+B^+, 1/2 B^+B^b, 1/4 B^bB^b. Conclusion: 3/4 of the males are bearded; 1/4 of the females are bearded.

An example of a sex-influenced characteristic in humans is pattern baldness, in which hair is lost prematurely from the front and the top of the head ( ◗ FIGURE 5.12). Pattern baldness is an autosomal character believed to be dominant in males and recessive in females, just like beards in goats. Contrary to a popular misconception, a man does not inherit pattern baldness from his mother’s side of the family (which would be the case if the character were X linked, but it isn’t). Pattern baldness is autosomal; men and women can inherit baldness from either their mothers or their fathers. Men require only a single allele for baldness to become bald, whereas women require two alleles for baldness, and so pattern baldness is much more common among men. Furthermore, pattern baldness is expressed weakly in women; those with the trait usually have only a mild thinning of the hair, whereas men frequently lose all the hair on the top of the head. The expression of the allele for pattern baldness is clearly enhanced by the presence of male sex hormones; males who are castrated at an early age rarely become bald (but castration is not a recommended method for preventing baldness).

◗ 5.12 Pattern baldness is a sex-influenced trait. This trait is seen in three generations of the Adams family: (a) John Adams (1735–1826), the second president of the United States, was father to (b) John Quincy Adams (1767–1848), who was father to (c) Charles Francis Adams (1807–1886). Pattern baldness results from an autosomal gene that is thought to be dominant in males and recessive in females. (Part a, National Museum of American Art, Washington, D.C./Art Resource, N.Y.; part b, National Portrait Gallery, Washington, D.C./Art Resource, N.Y.; part c, Bettmann/Corbis.)

An extreme form of sex-influenced inheritance, a sex-limited characteristic is encoded by autosomal genes that are expressed in only one sex: the trait has zero penetrance in the other sex.
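The sex-influenced crosses described above for beards in goats can be sketched in code. This is an illustrative model only: alleles are written as the single characters "b" (for the beard allele B^b) and "+" (for B^+), and the function names are hypothetical. The phenotype rule encodes the statement that the beard allele is dominant in males and recessive in females.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

BEARD, PLUS = "b", "+"  # stand-ins for the alleles B^b and B^+


def has_beard(genotype, sex):
    """Dominant in males (one copy suffices), recessive in females (two copies)."""
    copies = genotype.count(BEARD)
    return copies >= 1 if sex == "male" else copies == 2


def cross(parent1, parent2):
    """Genotype proportions among offspring of two one-locus genotypes."""
    offspring = Counter("".join(sorted(pair)) for pair in product(parent1, parent2))
    total = sum(offspring.values())
    return {g: Fraction(n, total) for g, n in offspring.items()}


def bearded_fraction(parent1, parent2, sex):
    """Fraction of offspring of the given sex that are bearded."""
    return sum(p for g, p in cross(parent1, parent2).items() if has_beard(g, sex))


if __name__ == "__main__":
    # P cross: beardless male (++) x bearded female (bb); all F1 are +b.
    print(cross(PLUS + PLUS, BEARD + BEARD))
    # F1 intercross: +b x +b.
    print("F2 males bearded:", bearded_fraction("+b", "+b", "male"))
    print("F2 females bearded:", bearded_fraction("+b", "+b", "female"))
```

Because the gene is autosomal, the same genotype proportions are used for both sexes; only the phenotype rule differs.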
In domestic chickens, some males display a plumage pattern called cock feathering ( ◗ FIGURE 5.13a). Other males and all females display a pattern called hen feathering ( ◗ FIGURE 5.13b and c). Cock feathering is an autosomal recessive trait that is sex limited to males. Because the trait is autosomal, the genotypes of males and females are the same, but the phenotypes produced by these genotypes differ in males and females: Genotype HH Hh hh Male phenotype hen feathering hen feathering cock feathering Female phenotype hen feathering hen feathering hen feathering An example of a sex-limited characteristic in humans is male-limited precocious puberty. There are several types of precocious puberty in humans, most of which are not genetic. Male-limited precocious puberty, however, results from an autosomal dominant allele (P) that is expressed only in males; females with the gene are normal in phenotype. Males with precocious puberty undergo puberty at an early age, usually before the age of 4. At this time, the penis enlarges, the voice deepens, and pubic hair develops. There is no impairment of sexual function; affected males are fully fertile. Most are short as adults, because the long bones stop growing after puberty. Because the trait is rare, affected males are usually heterozygous (Pp). A male with precocious puberty who mates Extensions and Modifications of Basic Principles ◗ 5.13 A sex-limited characteristic is encoded by autosomal genes that are expressed in only one sex. An example is cock feathering in chickens, an autosomal recessive trait that is limited to males. (a) Cock-feathered male. (b) and (c) Hen-feathered females. (Part a, Richard Kolar/Animals Animals; part b, Michael Bisceblie/Animals Animals; part c, R. OSF Dowling/Animals Animals.) (a) P generation Precocious puberty   Pp Normal puberty  pp Meiosis p P Gametes p Fertilization F1 generation p Half of the sons have precocious puberty; no daughters have precocious puberty.   
1/2 P p Precocious puberty 1/2 Pp Normal puberty 1/2 pp Normal puberty 1/2 p p Normal puberty (b) P generation Normal puberty   pp Normal puberty  Pp Meiosis p p Gametes P Fertilization F1 generation  1/2 P p 1/2 pp p with a woman who has no family history of this condition will transmit the allele for precocious puberty to 12 of the children ( ◗ FIGURE 5.14a), but it will be expressed only in the sons. If one of the heterozygous daughters (Pp) mates with a male who has normal puberty (pp), 12 of the sons will exhibit precocious puberty ( ◗ FIGURE 5.14b). Thus a sex-limited characteristic can be inherited from either parent, although the trait appears in only one sex. The results of molecular studies reveal that the underlying genetic defect in male-limited precocious puberty affects the receptor for luteinizing hormone (LH). This hormone normally attaches to receptors found on certain cells of the testes and stimulates these cells to produce testosterone. During normal puberty in males, high levels of LH stimulate the increased production of testosterone, which, in turn, stimulates the anatomical and physiological changes associated with puberty. The P allele for precocious puberty codes for a defective LH receptor, which stimulates testosterone production even in the absence of LH. Boys with this allele produce high levels of testosterone at an early age, when levels of LH are low. Defective LH receptors are also found in females who carry the precocious-puberty gene, but their presence does not result in precocious puberty, because additional hormones are required along with LH to induce puberty in girls. Concepts Half of the sons have precocious puberty; no daughters have precocious puberty.  Precocious puberty 1/2 Pp Normal puberty Normal puberty 1/2 p p Normal puberty Conclusion: Both males and females can transmit this sex-limited trait, but it is expressed only in males. 
Concepts
Sex-influenced characteristics are traits encoded by autosomal genes that are more readily expressed in one sex. Sex-limited characteristics are encoded by autosomal genes whose expression is limited to one sex.

◗ 5.14 Sex-limited characteristics are inherited according to Mendel’s principles. Precocious puberty is an autosomal dominant trait that is limited to males.

◗ 5.15 Cytoplasmically inherited characteristics frequently exhibit extensive phenotypic variation because cells and individual offspring contain various proportions of cytoplasmic genes. Mitochondria that have wild-type mtDNA are shown in red; those having mutant mtDNA are shown in blue. [Figure labels: This cell contains an equal number of mitochondria with wild-type genes and mitochondria with mutated genes. Mitochondria segregate randomly in cell division. The random segregation of mitochondria in cell division results in progeny cells that differ in their number of mitochondria with wild-type and mutated genes.]

Cytoplasmic Inheritance
Mendel’s principles of segregation and independent assortment are based on the assumption that genes are located on chromosomes in the nucleus of the cell. For the majority of genetic characteristics, this assumption is valid, and Mendel’s principles allow us to predict the types of offspring that will be produced in a genetic cross. However, not all the genetic material of a cell is found in the nucleus; some characteristics are encoded by genes located in the cytoplasm. These characteristics exhibit cytoplasmic inheritance.

A few organelles, notably chloroplasts and mitochondria, contain DNA. Each human mitochondrion contains about 15,000 nucleotides of DNA, encoding 37 genes.
Compared with nuclear DNA, which contains some 3 billion nucleotides encoding perhaps 35,000 genes, the amount of mitochondrial DNA (mtDNA) is very small; nevertheless, mitochondrial and chloroplast genes encode some important characteristics. The molecular details of this extranuclear DNA are discussed in Chapter 20; here, we will focus on patterns of cytoplasmic inheritance.

Cytoplasmic inheritance differs from the inheritance of characteristics encoded by nuclear genes in several important respects. A zygote inherits nuclear genes from both parents, but typically all of its cytoplasmic organelles, and thus all its cytoplasmic genes, come from only one of the gametes, usually the egg. The sperm generally contributes only a set of nuclear genes from the male parent. In a few organisms, cytoplasmic genes are inherited from the male parent, or from both parents; however, for most organisms, all the cytoplasm is inherited from the egg. In this case, cytoplasmically inherited traits are present in both males and females and are passed from mother to offspring, never from father to offspring. Reciprocal crosses, therefore, give different results when cytoplasmic genes encode a trait.

Cytoplasmically inherited characteristics frequently exhibit extensive phenotypic variation, because there is no mechanism analogous to mitosis or meiosis to ensure that cytoplasmic genes are evenly distributed in cell division. Thus, different cells and individuals will contain various proportions of cytoplasmic genes. Consider mitochondrial genes. There are thousands of mitochondria in each cell, and each mitochondrion contains from 2 to 10 copies of mtDNA. Suppose that half of the mitochondria in a cell contain a normal wild-type copy of mtDNA and the other half contain a mutated copy ( ◗ FIGURE 5.15). In cell division, the mitochondria segregate into progeny cells at random.
Just by chance, one cell may receive mostly mutated mtDNA and another cell may receive mostly wild-type mtDNA (see Figure 5.15). In this way, different progeny from the same mother and even cells within an individual offspring may vary in their phenotype. Traits encoded by chloroplast DNA (cpDNA) are similarly variable.

In 1909, cytoplasmic inheritance was recognized by Carl Correns as one of the first exceptions to Mendel’s principles. Correns, one of the biologists who rediscovered Mendel’s work, studied the inheritance of leaf variegation in the four-o’clock plant, Mirabilis jalapa. Correns found that the leaves and shoots of one variety of four-o’clock were variegated, displaying a mixture of green and white splotches. He also noted that some branches of the variegated strain had all-green leaves; other branches had all-white leaves. Each branch produced flowers; so Correns was able to cross flowers from variegated, green, and white branches in all combinations ( ◗ FIGURE 5.16). The seeds from green branches always gave rise to green progeny, no matter whether the pollen was from a green, white, or variegated branch. Similarly, flowers on white branches always produced white progeny. Flowers on the variegated branches gave rise to green, white, and variegated progeny, in no particular ratio.

[Figure 5.16 grid: branch type of the pollen plant (white, green, variegated) against branch type of the seed plant (white, green, variegated). The phenotype of the branch from which the pollen originated has no effect on the phenotype of the progeny.]

In flowers on variegated branches, the random segregation of chloroplasts in the course of oogenesis produces some egg cells with normal cpDNA, which develop into green progeny; other egg cells with only abnormal cpDNA develop into white progeny; and, finally, still other egg cells with a mixture of normal and abnormal cpDNA develop into variegated progeny.
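The effect of random organelle segregation described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not in the original text: each cell holds 20 organelles, every organelle replicates exactly once before division, and the doubled pool is split evenly at random. The names `divide` and `mutant_fraction` and the seed value are hypothetical choices.

```python
import random

def divide(cell, rng):
    """Replicate a cell's organelles, then split them at random into two cells.

    `cell` is a list of "wt" / "mut" labels, one per organelle.
    """
    pool = cell * 2              # replication: every organelle is copied once
    rng.shuffle(pool)            # random segregation: no mechanism sorts them
    half = len(pool) // 2
    return pool[:half], pool[half:]

def mutant_fraction(cell):
    """Proportion of organelles in the cell that carry the mutant genome."""
    return cell.count("mut") / len(cell)

rng = random.Random(1)                   # fixed seed so runs are repeatable
start = ["wt"] * 10 + ["mut"] * 10       # half wild type, half mutant
cells = [start]
for _ in range(5):                       # five rounds of cell division
    cells = [daughter for cell in cells for daughter in divide(cell, rng)]

fractions = [mutant_fraction(cell) for cell in cells]
```

Across the whole population the mutant proportion stays at one half, but individual cells drift away from it, which is the source of the phenotypic variation among cells and offspring.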
◗ 5.16 Crosses for leaf type in four-o’clocks illustrate cytoplasmic inheritance. [Figure results: seed plants on white branches gave white progeny, seed plants on green branches gave green progeny, and seed plants on variegated branches gave white, green, and variegated progeny, whatever the source of the pollen.] Conclusion: The phenotype of the progeny is determined by the phenotype of the branch from which the seed originated.

Correns’s crosses demonstrated cytoplasmic inheritance of variegation in the four-o’clocks. The phenotypes of the offspring were determined entirely by the maternal parent, never by the paternal parent (the source of the pollen). Furthermore, the production of all three phenotypes by flowers on variegated branches is consistent with the occurrence of cytoplasmic inheritance. Variegation in these plants is caused by a defective gene in the cpDNA, which results in a failure to produce the green pigment chlorophyll. Cells from green branches contain normal chloroplasts only, cells from white branches contain abnormal chloroplasts only, and cells from variegated branches contain a mixture of normal and abnormal chloroplasts.

In recent years, a number of human diseases (mostly rare) that exhibit cytoplasmic inheritance have been identified. These disorders arise from mutations in mtDNA, most of which occur in genes coding for components of the electron-transport chain, which generates most of the ATP (adenosine triphosphate) in aerobic cellular respiration. One such disease is Leber hereditary optic neuropathy. Patients who have this disorder experience rapid loss of vision in both eyes, resulting from the death of cells in the optic nerve. Loss of vision typically occurs in early adulthood (usually between the ages of 20 and 24), but it can occur any time after adolescence. There is much clinical variability in the severity of the disease, even within the same family. Leber hereditary optic neuropathy exhibits maternal inheritance: the trait is always passed from mother to child.

Genetic Maternal Effect
A genetic phenomenon that is sometimes confused with cytoplasmic inheritance is genetic maternal effect, in which the phenotype of the offspring is determined by the genotype of the mother. In cytoplasmic inheritance, the genes for a characteristic are inherited from only one parent, usually the mother. In genetic maternal effect, the genes are inherited from both parents, but the offspring’s phenotype is determined not by its own genotype but by the genotype of its mother.

Genetic maternal effect frequently arises when substances present in the cytoplasm of an egg (encoded by the mother’s genes) are pivotal in early development. An excellent example is shell coiling of the snail Limnaea peregra. In most snails of this species, the shell coils to the right, which is termed dextral coiling. However, some snails possess a left-coiling shell, exhibiting sinistral coiling. The direction of coiling is determined by a pair of alleles; the allele for dextral (s+) is dominant over the allele for sinistral (s). However, the direction of coiling is determined not by that snail’s own genotype, but by the genotype of its mother. The direction of coiling is affected by the way in which the cytoplasm divides soon after fertilization, which in turn is determined by a substance produced by the mother and passed to the offspring in the cytoplasm of the egg.

If a male homozygous for dextral alleles (s+s+) is crossed with a female homozygous for sinistral alleles (ss), all of the F1 are heterozygous (s+s) and have a sinistral shell, because the genotype of the mother (ss) codes for sinistral ( ◗ FIGURE 5.17). If these F1 snails are self-fertilized, the genotypic ratio of the F2 is 1 s+s+ : 2 s+s : 1 ss. The phenotype of all F2 snails will be dextral regardless of their genotypes, because the genotype of their mother (s+s) encodes a right-coiling shell and determines their phenotype.
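The snail cross can be traced step by step in a few lines of code. This is an illustrative sketch only: the function names `shell_coiling` and `offspring_genotypes` are hypothetical, and the single rule encoded is the one given above, that an offspring's phenotype follows its mother's genotype, with s+ (dextral) dominant over s (sinistral).

```python
from collections import Counter

def shell_coiling(mother_genotype):
    """Phenotype of ALL offspring of a mother with the given genotype."""
    # Genetic maternal effect: only the mother's genotype matters.
    return "dextral" if "s+" in mother_genotype else "sinistral"

def offspring_genotypes(father, mother):
    """All four equally likely allele pairings, written with s+ first."""
    return [tuple(sorted((a, b), reverse=True)) for a in father for b in mother]

# P generation: dextral male (s+ s+) crossed with sinistral female (s s).
p_father, p_mother = ("s+", "s+"), ("s", "s")
f1_genotypes = Counter(offspring_genotypes(p_father, p_mother))
f1_phenotype = shell_coiling(p_mother)

# F2 generation: self-fertilization of an F1 snail (s+ s), which is the mother.
f1 = ("s+", "s")
f2_genotypes = Counter(offspring_genotypes(f1, f1))
f2_phenotype = shell_coiling(f1)
```

Note that `shell_coiling` never looks at the offspring's own genotype, so the 1 : 2 : 1 genotypic ratio in the F2 produces no visible segregation until the following generation.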
◗ 5.17 In genetic maternal effect, the genotype of the maternal parent determines the phenotype of the offspring. Shell coiling in snails is a trait that exhibits genetic maternal effect. [Figure steps: (1) Dextral, a right-handed coil, results from an autosomal allele (s+) that is dominant (2) over an allele for sinistral (s), which encodes a left-handed coil. P generation: dextral male (s+s+) crossed with sinistral female (ss). (3) All the F1 are heterozygous (s+s); because the genotype of the mother determines the phenotype of the offspring, all the F1 have a sinistral shell. F2 generation, from self-fertilization of the F1: 1/4 s+s+ dextral, 1/2 s+s dextral, 1/4 ss dextral. Conclusion: Because the mother of the F2 progeny has genotype s+s, all the F2 snails are dextral.]

Notice that the phenotype of the progeny is not necessarily the same as the phenotype of the mother, because the progeny’s phenotype is determined by the mother’s genotype, not her phenotype. Neither the male parent’s nor the offspring’s own genotype has any role in the offspring’s phenotype. A male does influence the phenotype of the F2 generation; by contributing to the genotypes of his daughters, he affects the phenotypes of their offspring. Genes that exhibit genetic maternal effect are therefore transmitted through males to future generations. In contrast, the genes that exhibit cytoplasmic inheritance are always transmitted through only one of the sexes (usually the female).

Concepts
Characteristics exhibiting cytoplasmic inheritance are encoded by genes in the cytoplasm and are usually inherited from one parent, most commonly the mother. In genetic maternal effect, the genotype of the mother determines the phenotype of the offspring.

Genomic Imprinting
One of the basic tenets of Mendelian genetics is that the parental origin of a gene does not affect its expression; reciprocal crosses give identical results.
We have seen that there are some genetic characteristics — those encoded by X-linked genes and cytoplasmic genes — for which reciprocal crosses do not give the same results. In these cases, males and females do not contribute the same genetic material to the offspring. With regard to autosomal genes, males and females contribute the same number of genes, and paternal and maternal genes have long been assumed to have equal effects. The results of recent studies, however, have identified several mammalian genes whose expression is significantly affected by their parental origin. This phenomenon, the differential expression of genetic material depending on whether it is inherited from the male or female parent, is called genomic imprinting. Genomic imprinting has been observed in mice in which a particular gene has been artificially inserted into a mouse’s DNA (to create a transgenic mouse). In these mice, the inserted gene is faithfully passed from generation to generation, but its expression may depend on which parent transmitted the gene. For example, when a transgenic male passes an imprinted gene to his offspring, they express the gene; but, when his daughter transmits the same gene to her offspring, they don’t express it. In turn, her son’s offspring express it, but her daughter’s offspring don’t. Both male and female offspring possess the gene for the trait; the key to whether the gene is expressed is the sex of the parent transmitting the gene. In the present example, the gene is expressed only when it is transmitted by a male parent. The reverse situation, expression of a trait when the gene is transmitted by the female parent, also occurs. Genomic imprinting has been implicated in several human disorders, including Prader-Willi and Angelman syndromes. Children with Prader-Willi syndrome have small hands and feet, short stature, poor sexual development, and mental retardation; they develop voracious appetites and frequently become obese. 
Many persons with Prader-Willi syndrome are missing a small region of chromosome 15 called q11–13. The deletion of this region is always inherited from the father in persons with Prader-Willi syndrome. The deletions of q11–13 on chromosome 15 can also be inherited from the mother, but this inheritance results in a completely different set of symptoms, producing Angelman syndrome. Children with Angelman syndrome exhibit frequent laughter, uncontrolled muscle movement, a large mouth, and unusual seizures. The deletion of segment q11–13 from chromosome 15 has severe effects on the human phenotype, but the specific effects depend on which parent contributes the deletion. For normal development to take place, copies of segment q11–13 of chromosome 15 from both male and female parents are apparently required.

Several other human diseases also appear to exhibit genomic imprinting. Although the precise mechanism of this phenomenon is unknown, methylation of DNA, the addition of methyl (CH3) groups to DNA nucleotides (see Chapters 10 and 16), is essential to the process of genomic imprinting, as demonstrated by the observation that mice deficient in DNA methylation do not exhibit imprinting. Some of the ways in which sex interacts with heredity are summarized in Table 5.4.

Table 5.4 Sex influences on heredity
Genetic Phenomenon               Phenotype Determined by
Sex-linked characteristic        genes located on the sex chromosome
Sex-influenced characteristic    genes on autosomal chromosomes that are more readily expressed in one sex
Sex-limited characteristic       autosomal genes whose expression is limited to one sex
Genetic maternal effect          nuclear genotype of the maternal parent
Cytoplasmic inheritance          cytoplasmic genes, which are usually inherited entirely from only one parent
Genomic imprinting               genes whose expression is affected by the sex of the transmitting parent

Concepts
In genomic imprinting, the expression of a gene is influenced by the sex of the parent who transmits the gene to the offspring.

www.whfreeman.com/pierce: Additional information about genomic imprinting, Prader-Willi syndrome, and Angelman syndrome

Anticipation
Another genetic phenomenon that is not explained by Mendel’s principles is anticipation, in which a genetic trait becomes more strongly expressed or is expressed at an earlier age as it is passed from generation to generation. In the early 1900s, several physicians observed that patients with moderate to severe myotonic dystrophy, an autosomal dominant muscle disorder, frequently had ancestors who were only mildly affected by the disease. These observations led to the concept of anticipation. However, the concept quickly fell out of favor with geneticists because there was no obvious mechanism to explain it; traditional genetics held that genes are passed unaltered from parents to offspring. Geneticists tended to attribute anticipation to observational bias.

The results of recent research have reestablished anticipation as a legitimate genetic phenomenon. The mutation causing myotonic dystrophy consists of an unstable region of DNA that can increase or decrease in size as the gene is passed from generation to generation, much like the gene that causes Huntington disease. The age of onset and the severity of the disease are correlated with the size of the unstable region; an increase in the size of the region through generations produces anticipation. The phenomenon has now been implicated in several genetic diseases. We will examine these interesting types of mutations in more detail in Chapter 17.

Concepts
Anticipation is the stronger or earlier expression of a genetic trait through succeeding generations. It is caused by an unstable region of DNA that increases or decreases in size.
Interaction Between Genes and Environment
In Chapter 3, we learned that each phenotype is the result of a genotype developing within a specific environment; the genotype sets the potential for development, but how the phenotype actually develops within the limits imposed by the genotype depends on environmental effects. Stated another way, each genotype may produce several different phenotypes, depending on the environmental conditions in which development occurs. For example, genotype GG may produce a plant that is 10 cm high when raised at 20°C, but the same genotype may produce a plant that is 18 cm high when raised at 25°C. The range of phenotypes produced by a genotype in different environments (in this case, plant height) is called the norm of reaction ( ◗ FIGURE 5.18).

For most of the characteristics discussed so far, the effect of the environment on the phenotype has been slight. Mendel’s peas with genotype yy, for example, developed green endosperm regardless of the environment in which they were raised. Similarly, persons with genotype I^A I^A have the A antigen on their red blood cells regardless of their diet, socioeconomic status, or family environment. For other phenotypes, however, environmental effects play a more important role.

[Figure 5.18 plot: average wing length (mm) of vg vg flies against environmental temperature during development (18° to 31°, Celsius scale).]

Environmental Effects on Gene Expression
The expression of some genotypes is critically dependent on the presence of a specific environment. For example, the himalayan allele in rabbits produces dark fur at the extremities of the body: on the nose, ears, and feet ( ◗ FIGURE 5.19). The dark pigment develops, however, only when the rabbit is reared at 25°C or less; if a Himalayan rabbit is reared at 30°C, no dark patches develop.
The expression of the himalayan allele is thus temperature dependent: an enzyme necessary for the production of dark pigment is inactivated at higher temperatures. The pigment is normally restricted to the nose, feet, and ears of Himalayan rabbits because the animal’s core body temperature is normally above 25°C and the enzyme is functional only in the cells of the relatively cool extremities. The himalayan allele is an example of a temperature-sensitive allele, an allele whose product is functional only at certain temperatures.

◗ 5.18 Norm of reaction is the range of phenotypes produced by a genotype in different environments. This norm of reaction is for vestigial wings in Drosophila melanogaster. (Data from M. H. Harnly, Journal of Experimental Zoology 56:363–379, 1936.)

Some types of albinism in plants are temperature dependent. In barley, an autosomal recessive allele inhibits chlorophyll production, producing albinism when the plant is grown below 7°C. At temperatures above 18°C, a plant homozygous for the albino allele develops normal chlorophyll and is green. Similarly, among Drosophila melanogaster homozygous for the autosomal mutation vestigial, greatly reduced wings develop at 25°C, but wings near normal size develop at higher temperatures (see Figure 5.18).

Environmental factors also play an important role in the expression of a number of human genetic diseases. Glucose-6-phosphate dehydrogenase is an enzyme taking part in supplying energy to the cell. In humans, there are a number of genetic variants of glucose-6-phosphate dehydrogenase, some of which destroy red blood cells when the body is stressed by infection or by the ingestion of certain drugs or foods. The symptoms of the genetic disease appear only in the presence of these specific environmental factors. Another genetic disease, phenylketonuria (PKU), is due to an autosomal recessive allele that causes mental retardation.
The disorder arises from a defect in an enzyme that normally metabolizes the amino acid phenylalanine. When this enzyme is defective, phenylalanine is not metabolized, and its buildup causes brain damage in children. A simple environmental change, putting an affected child on a low-phenylalanine diet, prevents retardation.

These examples illustrate the point that genes and their products do not act in isolation; rather, they frequently interact with environmental factors. Occasionally, environmental factors alone can produce a phenotype that is the same as the phenotype produced by a genotype; this phenotype is called a phenocopy. In fruit flies, for example, the autosomal recessive mutation eyeless produces greatly reduced eyes. The eyeless phenotype can also be produced by exposing the larvae of normal flies to sodium metaborate.

◗ 5.19 The expression of some genotypes depends on specific environments. The expression of a temperature-sensitive allele, himalayan, is shown in rabbits reared at different temperatures. [Panels: reared at 20°C or less; reared at temperatures above 30°C.]

Concepts
The expression of many genes is modified by the environment. The range of phenotypes produced by a genotype in different environments is called the norm of reaction. A phenocopy is a trait produced by environmental effects that mimics the phenotype produced by a genotype.

The Inheritance of Continuous Characteristics
So far, we’ve dealt primarily with characteristics that have only a few distinct phenotypes. In Mendel’s peas, for example, the seeds were either smooth or wrinkled, yellow or green; the coats of dogs were black, brown, or yellow; blood types were of four distinct types, A, B, AB, or O. Characteristics such as these, which have a few easily distinguished phenotypes, are called discontinuous characteristics. Not all characteristics exhibit discontinuous phenotypes.
Human height is an example of such a character; people do not come in just a few distinct heights but, rather, display a continuum of heights. Indeed, there are so many possible phenotypes of human height that we must use a measurement to describe a person’s height. Characteristics that exhibit a continuous distribution of phenotypes are termed continuous characteristics. Because such characteristics have many possible phenotypes and must be described in quantitative terms, continuous characteristics are also called quantitative characteristics.

Continuous characteristics frequently arise because genes at many loci interact to produce the phenotypes. When a single locus with two alleles codes for a characteristic, there are three genotypes possible: AA, Aa, and aa. With two loci, each with two alleles, there are $3^2 = 9$ genotypes possible. The number of genotypes coding for a characteristic is $3^n$, where n equals the number of loci with two alleles that influence the characteristic. For example, when a characteristic is determined by eight loci, each with two alleles, there are $3^8 = 6561$ different genotypes possible for this character. If each genotype produces a different phenotype, many phenotypes will be possible. The slight differences between the phenotypes will be indistinguishable, and the characteristic will appear continuous. Characteristics encoded by genes at many loci are called polygenic characteristics.

The converse of polygeny is pleiotropy, in which one gene affects multiple characteristics. Many genes exhibit pleiotropy. PKU, mentioned earlier, results from a recessive allele; persons homozygous for this allele, if untreated, exhibit mental retardation, blue eyes, and light skin color.

Frequently the phenotypes of continuous characteristics are also influenced by environmental factors. Each genotype is capable of producing a range of phenotypes; it has a relatively broad norm of reaction.
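As a brief aside on the genotype counts given above for polygenic characteristics, the rule that n loci with two alleles each give 3 to the power n genotypes can be confirmed by listing the genotypes directly. The sketch below is illustrative only; the function names and the use of single letters to label loci are assumptions made for the example.

```python
from itertools import product

def genotype_count(n_loci):
    """Number of genotypes for n loci, each with two alleles (3 per locus)."""
    return 3 ** n_loci

def enumerate_genotypes(loci):
    """List every multilocus genotype for loci labelled by single letters.

    For a locus labelled "a" the three genotypes are AA, Aa, and aa.
    """
    per_locus = []
    for letter in loci:
        upper, lower = letter.upper(), letter.lower()
        per_locus.append((upper + upper, upper + lower, lower + lower))
    # One genotype is chosen at each locus, independently of the others.
    return ["".join(combo) for combo in product(*per_locus)]

two_loci = enumerate_genotypes("ab")          # AABB, AABb, AAbb, AaBB, ...
eight_loci = enumerate_genotypes("abcdefgh")
```

Listing the genotypes for two loci gives nine entries and for eight loci gives 6561, matching the counts in the text.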
In this situation, the particular phenotype that results depends on both the genotype and the environmental conditions in which the genotype develops. For example, there may be only three genotypes coding for a characteristic, but, because each genotype has a broad norm of reaction, the phenotype of the character exhibits a continuous distribution. Many continuous characteristics are both polygenic and influenced by environmental factors; such characteristics are called multifactorial because many factors help determine the phenotype.

The inheritance of continuous characteristics may appear to be complex, but the alleles at each locus follow Mendel’s principles and are inherited in the same way as alleles coding for simple, discontinuous characteristics. However, because many genes participate, environmental factors influence the phenotype, and the phenotypes do not sort out into a few distinct types, we cannot observe the distinct ratios that have allowed us to interpret the genetic basis of discontinuous characteristics. To analyze continuous characteristics, we must employ special statistical tools, as will be discussed in Chapter 22.

Concepts
Discontinuous characteristics exhibit a few distinct phenotypes; continuous characteristics exhibit a range of phenotypes. A continuous characteristic is frequently produced when genes at many loci and environmental factors combine to determine a phenotype.

Connecting Concepts Across Chapters
This chapter introduced a number of modifications and extensions of the basic concepts of heredity that we learned in Chapter 3. A major theme has been gene expression: how interactions between genes, interactions between genes and sex, and interactions between genes and the environment affect the phenotypic expression of genes. The modifications and extensions discussed in this chapter do not alter the way that genes are inherited, but they do modify the way in which the genes determine the phenotype.
A number of topics introduced in this chapter will be explored further in other chapters of the book. Here we have purposefully ignored many aspects of the nature of gene expression because our focus has been on the “big picture” of how these interactions affect phenotypic ratios in genetic crosses. In subsequent chapters, we will explore the molecular details of gene expression, including transcription (Chapter 13), translation (Chapter 15), and the control of gene expression (Chapter 16). The molecular nature of anticipation will be examined in more detail in Chapter 17, and DNA methylation, the basis of genomic imprinting, will be discussed in Chapter 10. Complementation testing will be revisited in Chapter 8, and the role of multiple genes and environmental factors in the inheritance of continuous characteristics will be studied more thoroughly in Chapter 22.

CONCEPTS SUMMARY
• Dominance always refers to genes at the same locus (allelic genes) and can be understood in regard to how the phenotype of the heterozygote relates to the phenotypes of the homozygotes.
• Dominance is complete when a heterozygote has the same phenotype as a homozygote. Dominance is incomplete when the heterozygote has a phenotype intermediate between those of two parental homozygotes. Codominance is the result when the heterozygote exhibits traits of both parental homozygotes.
• The type of dominance does not affect the inheritance of an allele; it does affect the phenotypic expression of the allele. The classification of dominance may depend on the level of the phenotype examined.
• Lethal alleles cause the death of an individual possessing them, usually at an early stage of development, and may alter phenotypic ratios.
• Multiple alleles refers to the presence of more than two alleles at a locus within a group. Their presence increases the number of genotypes and phenotypes possible.
• Gene interaction refers to interaction between genes at different loci to produce a single phenotype. An epistatic gene at one locus suppresses or masks the expression of hypostatic genes at different loci. Gene interaction frequently produces phenotypic ratios that are modifications of dihybrid ratios.
• A complementation test, in which individuals homozygous for different mutations are crossed, can be used to determine if the mutations occur at the same locus or at different loci.
• Sex-influenced characteristics are encoded by autosomal genes that are expressed more readily in one sex.
• Sex-limited characteristics are encoded by autosomal genes expressed in only one sex. Both males and females possess sex-limited genes and transmit them to their offspring.
• In cytoplasmic inheritance, the genes for the characteristic are found in the cytoplasm and are usually inherited from a single (usually maternal) parent.
• Genetic maternal effect is present when an offspring inherits genes from both parents, but the nuclear genes of the mother determine the offspring’s phenotype.
• Genomic imprinting refers to characteristics encoded by autosomal genes whose expression is affected by the sex of the parent transmitting the genes.
• Anticipation refers to a genetic trait that is more strongly expressed or is expressed at an earlier age in succeeding generations.
• Phenotypes are often modified by environmental effects. The range of phenotypes that a genotype is capable of producing in different environments is the norm of reaction. A phenocopy is a phenotype produced by an environmental effect that mimics a phenotype produced by a genotype.
• Discontinuous characteristics are characteristics with a few distinct phenotypes; continuous characteristics are those that exhibit a wide range of phenotypes. Continuous characteristics are frequently produced by the combined effects of many genes and environmental effects.

IMPORTANT TERMS
codominance
lethal allele
multiple alleles
gene interaction
epistasis
epistatic gene
hypostatic gene
complementation test
complementation
sex-influenced characteristic
sex-limited characteristic
cytoplasmic inheritance
genetic maternal effect
genomic imprinting
anticipation
norm of reaction
temperature-sensitive allele
phenocopy
discontinuous characteristic
continuous characteristic
quantitative characteristic
polygenic characteristic
pleiotropy
multifactorial characteristic

Worked Problems
1. The type of plumage found in mallard ducks is determined by three alleles at a single locus: M^R, which codes for restricted plumage; M, which codes for mallard plumage; and m^d, which codes for dusky plumage. The restricted phenotype is dominant over mallard and dusky; mallard is dominant over dusky (M^R > M > m^d). Give the expected phenotypes and proportions of offspring produced by the following crosses.
(a) M^R M × m^d m^d
(b) M^R m^d × M m^d
(c) M^R m^d × M^R M
(d) M^R M × M m^d

• Solution
We can determine the phenotypes and proportions of offspring by (1) determining the types of gametes produced by each parent and (2) combining the gametes of the two parents with the use of a Punnett square.

(a) Parents: M^R M × m^d m^d
Gametes: M^R and M from the first parent; m^d from the second parent
Progeny: M^R m^d restricted; M m^d mallard
Result: 1/2 restricted, 1/2 mallard

(b) Parents: M^R m^d × M m^d
Gametes: M^R and m^d from the first parent; M and m^d from the second parent
Progeny: M^R M restricted; M^R m^d restricted; M m^d mallard; m^d m^d dusky
Result: 1/2 restricted, 1/4 mallard, 1/4 dusky

(c) Parents: M^R m^d × M^R M
Gametes: M^R and m^d from the first parent; M^R and M from the second parent
Progeny: M^R M^R restricted; M^R M restricted; M^R m^d restricted; M m^d mallard
Result: 3/4 restricted, 1/4 mallard

(d) Parents: M^R M × M m^d
Gametes: M^R and M from the first parent; M and m^d from the second parent
Progeny: M^R M restricted; M^R m^d restricted; M M mallard; M m^d mallard
Result: 1/2 restricted, 1/2 mallard

2. A homozygous strain of yellow corn is crossed with a homozygous strain of purple corn. The F1 are intercrossed, producing an ear of corn with 119 purple kernels and 89 yellow kernels (the progeny).
(a) What is the genotype of the yellow kernels?
(b) Give a genetic explanation for the differences in kernel color in this cross.

• Solution
(a) We should first consider whether the cross between yellow and purple strains might be a monohybrid cross for a simple dominant trait, which would produce a 3:1 ratio in the F2 (Aa × Aa gives 3/4 A_ and 1/4 aa). Under this hypothesis, we would expect 156 purple progeny and 52 yellow progeny:

Phenotype    Genotype    Observed number    Expected number
purple       A_          119                3/4 × 208 = 156
yellow       aa          89                 1/4 × 208 = 52
total                    208

We see that the expected numbers do not closely fit the observed numbers. If we performed a chi-square test (see Chapter 3), we would obtain a calculated chi-square value of 35.10, which has a probability much less than 0.05, indicating that it is extremely unlikely that, when we expect a 3:1 ratio, we would obtain 119 purple progeny and 89 yellow progeny. Therefore we can reject the hypothesis that these results were produced by a monohybrid cross.

Another possible hypothesis is that the observed F2 progeny are in a 1:1 ratio. However, we learned in Chapter 3 that a 1:1 ratio is produced by a cross between a heterozygote and a homozygote (Aa × aa) and, from the information given, the cross was not between a heterozygote and a homozygote, because the original parental strains were both homozygous. Furthermore, a chi-square test comparing the observed numbers with an expected 1:1 ratio yields a calculated chi-square value of 4.33, which has a probability of less than 0.05.

Next, we should look to see if the results can be explained by a dihybrid cross (AaBb × AaBb). A dihybrid cross results in phenotypic proportions that are in sixteenths.
We can apply the formula given earlier in the chapter to determine the number of sixteenths for each phenotype: x x(purple) number of progeny with a phenotype  16 total number of progeny 89  16  6.85 208 Genotype ? ? Observed number 119 89 208 16 16 3 16 1 16 9 3 Because 916 of the progeny from the corn cross are purple, purple must be produced by genotypes A_B_; in other words, individual kernels that have at least one dominant allele at the first locus and at least one dominant allele at the second locus are purple. The proportions of all the other genotypes (A_bb, aaB_, and aabb) sum to 716, which is the proportion of the progeny in the corn cross that are yellow, so any individual kernel that does not have a dominant allele at both the first and the second locus is yellow. (b) Kernel color is an example of duplicate recessive epistasis, where the presence of two recessive alleles at either the first locus or the second locus or both suppresses the production of purple pigment. 3. A geneticist crosses two yellow mice with straight hair and obtains the following progeny: 2 yellow, straight 6 yellow, fuzzy 1 4 gray, straight 1 12 gray, fuzzy 1 1 Thus, purple and yellow appear approximately a 9:7 ratio. We can test this hypothesis with a chi-square test: 2   aaB_ aabb 119  16   9.15 208 x(yellow)  Phenotype purple yellow total A_B_ A_bb Expected number 9 16  208  117 7 16  208  91 (observed  expected)2 (119  117)2 (89  91)2   expected 117 91  0.034  0.44  0.078 Degree of freedom  n  1 = 2  1  1 (a) Provide a genetic explanation for the results and assign genotypes to the parents and progeny of this cross. (b) What additional crosses might be carried out to determine if your explanation is correct? • Solution (a) This cross concerns two separate characteristics — color and type of hair; so we should begin by examining the results for each characteristic separately. First, let’s look at the inheritance of color. 
Two yellow mice are crossed producing 1 2  16  36  16  46  23 yellow mice and 14  112  312  1 12  412  13 gray mice. We learned in this chapter that a 2:1 ratio is often produced when a recessive lethal gene is present: Yy  Yy P > .05 The probability associated with the chi-square value is greater than .05, indicating that there is a relatively good fit between the observed results and a 9:7 ratio. We now need to determine how a dihybrid cross can produce a 9:7 ratio and what genotypes correspond to the two phenotypes. A dihybrid cross without epistasis produces a 9:3:3:1 ratio: AaBb  AaBb s p s p YY Yy yy 4 die 2 yellow, becomes 23 1 4 gray, becomes 13 1 1 Now, let’s examine the inheritance of the hair type. Two mice with straight hair are crossed, producing 12  14  24  14  34 mice with straight hair and 16  112  212  112  3 12  14 mice with fuzzy hair. We learned in Chapter 3 that a Extensions and Modifications of Basic Principles 3:1 ratio is usually produced by a cross between two individuals heterozygous for a simple dominant allele: Ss  Ss s p SS Ss ss 4 straight 2 straight 1 4 fuzzy 1 1 } 4 straight 3 We can now combine both loci and assign genotypes to all the individuals in the cross: P yellow, straight  yellow, straight YySs YySs s p Phenotype yellow, straight yellow, fuzzy gray, straight gray, fuzzy Genotype YyS_ Yyss yyS_ yyss Probability at each locus 2 3  34 2 3  14 1 3  34 1 3  14 Combined probability  612  12  212  16  312  14  112 (b) We could carry out a number of different crosses to test our hypothesis that yellow is a recessive lethal and straight is dominant over fuzzy. For example, a cross between any two yellow individuals should always produce 23 yellow and 13 gray, and a cross between two gray individuals should produce all gray offspring. A cross between two fuzzy individuals should always produce all fuzzy offspring. 4. 
In some sheep, the presence of horns is produced by an autosomal allele that is dominant in males and recessive in females. A horned female is crossed with a hornless male. One of the resulting F1 females is crossed with a hornless male. What proportion of the male and female progeny from this cross will have horns?

• Solution
The presence of horns in these sheep is an example of a sex-influenced characteristic. Because the phenotypes associated with the genotypes differ for the two sexes, let’s begin this problem by writing out the genotypes and phenotypes for each sex. We will let H represent the allele that codes for horns and H⁺ represent the allele for hornless. In males, the allele for horns is dominant over the allele for hornless, which means that males homozygous (HH) and heterozygous (HH⁺) for this gene are horned. Only males homozygous for the recessive hornless allele (H⁺H⁺) will be hornless. In females, the allele for horns is recessive, which means that only females homozygous for this allele (HH) will be horned; females heterozygous (HH⁺) and homozygous (H⁺H⁺) for the hornless allele will be hornless. The following table summarizes genotypes and associated phenotypes:

Genotype   Male phenotype   Female phenotype
HH         horned           horned
HH⁺        horned           hornless
H⁺H⁺       hornless         hornless

In the problem, a horned female is crossed with a hornless male. From the preceding table, we see that a horned female must be homozygous for the allele for horns (HH) and a hornless male must be homozygous for the allele for hornless (H⁺H⁺); so all the F1 will be heterozygous; the F1 males will be horned and the F1 females will be hornless, as shown below:

P:  HH (horned female) × H⁺H⁺ (hornless male)
F1: HH⁺ (horned males and hornless females)

A heterozygous hornless F1 female (HH⁺) is then crossed with a hornless male (H⁺H⁺):

HH⁺ (hornless female) × H⁺H⁺ (hornless male)

Progeny    Males      Females
1/2 H⁺H⁺   hornless   hornless
1/2 HH⁺    horned     hornless

Therefore, 1/2 of the male progeny will be horned but none of the female progeny will be horned.
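The chi-square comparisons in Worked Problem 2 can be checked with a few lines of code. The sketch below is only an illustration: the function name `chi_square` and the way ratios are passed in are choices made here, not part of the text. It computes expected counts for a hypothesized ratio and sums (observed − expected)²/expected over the phenotype classes.

```python
def chi_square(observed, ratio):
    """Return (chi-square value, expected counts) for observed counts
    tested against a hypothesized phenotypic ratio such as (9, 7)."""
    total = sum(observed)
    parts = sum(ratio)
    # Expected count for each class = total progeny x that class's share of the ratio
    expected = [total * r / parts for r in ratio]
    value = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return value, expected


# Purple and yellow kernel counts from Worked Problem 2
observed = [119, 89]

for name, ratio in [("3:1", (3, 1)), ("1:1", (1, 1)), ("9:7", (9, 7))]:
    value, expected = chi_square(observed, ratio)
    print(f"{name}: expected {expected}, chi-square = {value:.3f}")
```

With one degree of freedom, the critical chi-square value at P = .05 is 3.841. The 3:1 and 1:1 hypotheses exceed it and are rejected, whereas the 9:7 hypothesis does not. The 3:1 value computed this way differs slightly from the 35.08 quoted in the solution because the solution rounds the intermediate terms.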
COMPREHENSION QUESTIONS

*1. How do incomplete dominance and codominance differ?
*2. Explain how dominance and epistasis differ.
3. What is a recessive epistatic gene?
4. What is a complementation test and what is it used for?
*5. What is genomic imprinting?
6. What characteristics do you expect to see in a trait that exhibits anticipation?
*7. What characteristics are exhibited by a cytoplasmically inherited trait?
8. What is the difference between genetic maternal effect and genomic imprinting?
9. What is the difference between a sex-influenced gene and a gene that exhibits genomic imprinting?
*10. What are continuous characteristics and how do they arise?

APPLICATION QUESTIONS AND PROBLEMS

*11. Palomino horses have a golden yellow coat, chestnut horses have a brown coat, and cremello horses have a coat that is almost white. A series of crosses between the three different types of horses produced the following offspring:

Cross                   Offspring
palomino × palomino     13 palomino, 6 chestnut, 5 cremello
chestnut × chestnut     16 chestnut
cremello × cremello     13 cremello
palomino × chestnut     8 palomino, 9 chestnut
palomino × cremello     11 palomino, 11 cremello
chestnut × cremello     23 palomino

(a) Explain the inheritance of the palomino, chestnut, and cremello phenotypes in horses.
(b) Assign symbols for the alleles that determine these phenotypes, and list the genotypes of all parents and offspring given in the preceding table.

*12. The Lᴹ and Lᴺ alleles at the MN blood group locus exhibit codominance. Give the expected genotypes and phenotypes and their ratios in progeny resulting from the following crosses.
(a) LᴹLᴹ × LᴹLᴺ
(b) LᴺLᴺ × LᴺLᴺ
(c) LᴹLᴺ × LᴹLᴺ
(d) LᴹLᴺ × LᴺLᴺ
(e) LᴹLᴹ × LᴺLᴺ

13. In the pearl millet plant, color is determined by three alleles at a single locus: Rp1 (red), Rp2 (purple), and rp (green). Red is dominant over purple and green, and purple is dominant over green (Rp1 > Rp2 > rp).
Give the expected phenotypes and ratios of offspring produced by the following crosses.
(a) Rp1/Rp2 × Rp1/rp
(b) Rp1/rp × Rp2/rp
(c) Rp1/Rp2 × Rp1/Rp2
(d) Rp2/rp × rp/rp
(e) rp/rp × Rp1/Rp2

*14. Give the expected genotypic and phenotypic ratios for the following crosses for ABO blood types.
(a) Iᴬi × Iᴮi
(b) IᴬIᴮ × Iᴬi
(c) IᴬIᴮ × IᴬIᴮ
(d) ii × Iᴬi
(e) IᴬIᴮ × ii

15. If there are five alleles at a locus, how many genotypes may there be at this locus? How many different kinds of homozygotes will there be? How many genotypes and homozygotes would there be with eight alleles?

16. Turkeys have black, bronze, or black-bronze plumage. Examine the results of the following crosses:

Parents                                   Offspring
Cross 1: black and bronze                 all black
Cross 2: black and black                  3/4 black, 1/4 bronze
Cross 3: black-bronze and black-bronze    all black-bronze
Cross 4: black and bronze                 1/2 black, 1/4 bronze, 1/4 black-bronze
Cross 5: bronze and black-bronze          1/2 bronze, 1/2 black-bronze
Cross 6: bronze and bronze                3/4 bronze, 1/4 black-bronze

Do you think these differences in plumage arise from incomplete dominance between two alleles at a single locus? If yes, support your conclusion by assigning symbols to each allele and providing genotypes for all turkeys in the crosses. If your answer is no, provide an alternative explanation and assign genotypes to all turkeys in the crosses.

17. In rabbits, an allelic series helps to determine coat color: C (full color), cᶜʰ (chinchilla, gray color), cʰ (himalayan, white with black extremities), and c (albino, all white). The C allele is dominant over all others, cᶜʰ is dominant over cʰ and c, cʰ is dominant over c, and c is recessive to all the other alleles. This dominance hierarchy can be summarized as C > cᶜʰ > cʰ > c. The rabbits in the following list are crossed and produce the progeny shown.
Give the genotypes of the parents for each cross:

      Phenotypes of parents      Phenotypes of offspring
(a)   full color × albino        1/2 full color, 1/2 albino
(b)   himalayan × albino         1/2 himalayan, 1/2 albino
(c)   full color × albino        1/2 full color, 1/2 chinchilla
(d)   full color × himalayan     1/2 full color, 1/4 himalayan, 1/4 albino
(e)   full color × full color    3/4 full color, 1/4 albino

18. In this chapter we considered Joan Barry’s paternity suit against Charlie Chaplin and how, on the basis of blood types, Chaplin could not have been the father of her child.
(a) What blood types are possible for the father of Barry’s child?
(b) If Chaplin had possessed one of these blood types, would that prove that he fathered Barry’s child?

*19. A woman has blood type A MM. She has a child with blood type AB MN. Which of the following blood types could not be that of the child’s father? Explain your reasoning.

George   O    NN
Tom      AB   MN
Bill     B    MN
Claude   A    NN
Henry    AB   MM

20. Allele A is epistatic to allele B. Indicate whether each of the following statements is true or false. Explain why.
(a) Alleles A and B are at the same locus.
(b) Alleles A and B are at different loci.
(c) Alleles A and B are always located on the same chromosome.
(d) Alleles A and B may be located on different, homologous chromosomes.
(e) Alleles A and B may be located on different, nonhomologous chromosomes.

*21. In chickens, comb shape is determined by alleles at two loci (R, r and P, p). A walnut comb is produced when at least one dominant allele R is present at one locus and at least one dominant allele P is present at a second locus (genotype R_P_). A rose comb is produced when at least one dominant allele is present at the first locus and two recessive alleles are present at the second locus (genotype R_pp). A pea comb is produced when two recessive alleles are present at the first locus and at least one dominant allele is present at the second (genotype rrP_).
If two recessive alleles are present at the first and at the second locus (rrpp), a single comb is produced. Progeny with what types of combs and in what proportions will result from the following crosses?
(a) RRPP × rrpp
(b) RrPp × rrpp
(c) RrPp × RrPp
(d) Rrpp × Rrpp
(e) Rrpp × rrPp
(f) Rrpp × rrpp

*22. Eye color of the Oriental fruit fly (Bactrocera dorsalis) is determined by a number of genes. A fly having wild-type eyes is crossed with a fly having yellow eyes. All the F1 flies from this cross have wild-type eyes. When the F1 are interbred, 9/16 of the F2 progeny have wild-type eyes, 3/16 have amethyst eyes (a bright, sparkling blue color), and 4/16 have yellow eyes.
(a) Give genotypes for all the flies in the P, F1, and F2 generations.
(b) Does epistasis account for eye color in Oriental fruit flies? If so, which gene is epistatic and which gene is hypostatic?

23. A variety of opium poppy (Papaver somniferum L.) having lacerate leaves was crossed with a variety that has normal leaves. All the F1 had lacerate leaves. Two F1 plants were interbred to produce the F2. Of the F2, 249 had lacerate leaves and 16 had normal leaves. Give genotypes for all the plants in the P, F1, and F2 generations. Explain how lacerate leaves are determined in the opium poppy.

*24. A dog breeder liked yellow and brown Labrador retrievers. In an attempt to produce yellow and brown puppies, he bought a yellow Labrador male and a brown Labrador female and mated them. Unfortunately, all the puppies produced in this cross were black. (See p. 000 for a discussion of the genetic basis of coat color in Labrador retrievers.)
(a) Explain this result.
(b) How might the breeder go about producing yellow and brown Labradors?

25. When a yellow female Labrador retriever was mated with a brown male, half of the puppies were brown and half were yellow. The same female, when mated with a different brown male, produced all brown males. Explain these results.

* 26.
In summer squash, a plant that produces disc-shaped fruit is crossed with a plant that produces long fruit. All the F1 have disc-shaped fruit. When the F1 are intercrossed, F2 progeny are produced in the following ratio: 9/16 disc-shaped fruit : 6/16 spherical fruit : 1/16 long fruit. Give the genotypes of the F2 progeny.

27. In sweet peas, some plants have purple flowers and other plants have white flowers. A homozygous variety of pea that has purple flowers is crossed with a homozygous variety that has white flowers. All the F1 have purple flowers. When these F1 are self-fertilized, the F2 appear in a ratio of 9/16 purple to 7/16 white.
(a) Give genotypes for the purple and white flowers in these crosses.
(b) Draw a hypothetical biochemical pathway to explain the production of purple and white flowers in sweet peas.

28. For the following questions, refer to p. 000 for a discussion of how coat color and pattern are determined in dogs.
(a) Explain why Irish setters are reddish in color.
(b) Will a cross between a beagle and a Dalmatian produce puppies with ticking? Why or why not?
(c) Can a poodle crossed with any other breed produce spotted puppies? Why or why not?
(d) If a St. Bernard is crossed with a Doberman, will the offspring have solid, yellow, saddle, or bicolor coats?
(e) If a Rottweiler is crossed with a Labrador retriever, will the offspring have solid, yellow, saddle, or bicolor coats?

*29. When a Chinese hamster with white spots is crossed with another hamster that has no spots, approximately 1/2 of the offspring have white spots and 1/2 have no spots. When two hamsters with white spots are crossed, 2/3 of the offspring possess white spots and 1/3 have no spots.
(a) What is the genetic basis of white spotting in Chinese hamsters?
(b) How might you go about producing Chinese hamsters that breed true for white spotting?

30.
Male-limited precocious puberty results from a rare, sex-limited autosomal allele (P) that is dominant over the allele for normal puberty (p) and is expressed only in males. Bill undergoes precocious puberty, but his brother Jack and his sister Beth underwent puberty at the usual time, between the ages of 10 and 14. Although Bill’s mother and father underwent normal puberty, two of his maternal uncles (his mother’s brothers) underwent precocious puberty. All of Bill’s grandparents underwent normal puberty. Give the most likely genotypes for all the relatives mentioned in this family.

*31. Pattern baldness in humans is a sex-influenced trait that is autosomal dominant in males and recessive in females. Jack has a full head of hair. JoAnn also has a full head of hair, but her mother is bald. (In women, pattern baldness is usually expressed as a thinning of the hair.) If Jack and JoAnn marry, what proportion of their children are expected to be bald?

32. In goats, a beard is produced by an autosomal allele that is dominant in males and recessive in females. We’ll use the symbol Bᵇ for the beard allele and B for the beardless allele. Another independently assorting autosomal allele that produces a black coat (W) is dominant over the allele for white coat (w). Give the phenotypes and their expected proportions for the following crosses.
(a) BBᵇ Ww male × BBᵇ Ww female
(b) BBᵇ Ww male × BBᵇ ww female
(c) BB Ww male × BᵇBᵇ Ww female
(d) BBᵇ Ww male × BᵇBᵇ ww female

33. In the snail Limnaea peregra, shell coiling results from a genetic maternal effect. An autosomal allele for a right-handed shell (s⁺), called dextral, is dominant over the allele for a left-handed shell (s), called sinistral. A pet snail called Martha is sinistral and reproduces only as a female (the snails are hermaphroditic). Indicate which of the following statements are true and which are false. Explain your reasoning in each case.
(a) Martha’s genotype must be ss.
(b) Martha’s genotype cannot be s⁺s⁺.
(c) All the offspring produced by Martha must be sinistral.
(d) At least some of the offspring produced by Martha must be sinistral.
(e) Martha’s mother must have been sinistral.
(f) All Martha’s brothers must be sinistral.

34. In unicorns, two autosomal loci interact to determine the type of tail. One locus controls whether a tail is present at all; the allele for a tail (T) is dominant over the allele for tailless (t). If a unicorn has a tail, then alleles at a second locus determine whether the tail is curly or straight. Farmer Baldridge has two unicorns with curly tails. When he crosses these two unicorns, 1/2 of the progeny have curly tails, 1/4 have straight tails, and 1/4 do not have a tail. Give the genotypes of the parents and progeny in Farmer Baldridge’s cross. Explain how he obtained the 2:1:1 phenotypic ratio in his cross.

*35. Phenylketonuria (PKU) is an autosomal recessive disease that results from a defect in an enzyme that normally metabolizes the amino acid phenylalanine. When this enzyme is defective, high levels of phenylalanine cause brain damage. In the past, most children with PKU became mentally retarded. Fortunately, mental retardation can be prevented in these children today by carefully controlling the amount of phenylalanine in the diet. As a result of this treatment, many people with PKU are now reaching reproductive age with no mental retardation. By the end of the teen years, when brain development is complete, many people with PKU go off the restrictive diet. Children born to women with PKU (who are no longer on a phenylalanine-restricted diet) frequently have low birth weight, developmental abnormalities, and mental retardation, even though they are heterozygous for the recessive PKU allele. However, children of men with PKU do not have these problems. Provide an explanation for these observations.

36. In 1983, a sheep farmer in Oklahoma noticed a ram in his flock that possessed increased muscle mass in his hindquarters.
Many of the offspring of this ram possessed the same trait, which became known as the callipyge mutant (callipyge is Greek for “beautiful buttocks”). The mutation that caused the callipyge phenotype was eventually mapped to a position on sheep chromosome 18. When the male callipyge offspring of the original mutant ram were crossed with normal females, they produced the following progeny: 1/4 male callipyge, 1/4 female callipyge, 1/4 male normal, and 1/4 female normal. When female callipyge offspring of the original mutant ram were crossed with normal males, all of the offspring were normal. Analysis of the chromosomes of these offspring of callipyge females showed that half of them received a chromosome 18 with the callipyge gene from their mother. Propose an explanation for the inheritance of the callipyge gene. How might you test your explanation?

CHALLENGE QUESTION

37. Suppose that you are tending a mouse colony at a genetics research institute and one day you discover a mouse with twisted ears. You breed this mouse with twisted ears and find that the trait is inherited. Both male and female mice have twisted ears, but when you cross a twisted-eared male with a normal-eared female, you obtain different results from those obtained when you cross a twisted-eared female with a normal-eared male; that is, the reciprocal crosses give different results. Describe how you would go about determining whether this trait results from a sex-linked gene, a sex-influenced gene, a genetic maternal effect, a cytoplasmically inherited gene, or genomic imprinting. What crosses would you conduct and what results would be expected with these different types of inheritance?

6 Pedigree Analysis and Applications

• Lou Gehrig and Superoxide Free Radicals
• The Study of Human Genetic Characteristics
• Analyzing Pedigrees
  Autosomal Recessive Traits
  Autosomal Dominant Traits
  X-Linked Recessive Traits
  X-Linked Dominant Traits
  Y-Linked Traits
• Twin Studies
  Concordance
  Twin Studies and Obesity
• Adoption Studies
  Adoption Studies and Obesity
  Adoption Studies and Alcoholism
• Genetic Counseling and Genetic Testing
  Genetic Counseling
  Genetic Testing

Lou Gehrig and Superoxide Free Radicals

Lou Gehrig was the finest first baseman ever to play major league baseball. A left-handed power hitter who grew up in New York City, Gehrig played for the New York Yankees from 1923 to 1939. Throughout his career, he lived in the shadow of his teammates Babe Ruth and Joe DiMaggio, but Gehrig was a great hitter in his own right: he compiled a lifetime batting average of .340 and drove in more than 100 runs every season for 13 years. During his career, he batted in 1991 runs and hit a total of 23 grand slams (home runs with bases loaded). But Gehrig’s greatest baseball record, which stood for more than 50 years and has been broken only once (by Cal Ripken, Jr., in 1995), is his record of playing 2130 consecutive games.

In the 1938 baseball season, Gehrig fell into a strange slump. For the first time since his rookie year, his batting average dropped below .300 and, in the World Series that year, he managed only four hits, all singles. Nevertheless, he finished the season convinced that he was undergoing a temporary slump that he would overcome in the next season. He returned to training camp in 1939 with high spirits. When the season began, however, it was clear to everyone that something was terribly wrong.
Gehrig had no power in his swing; he was awkward and clumsy at first base. His condition worsened and, on May 2, he voluntarily removed himself from the lineup. The Yankees sent Gehrig to the Mayo Clinic for diagnosis and, on June 20, his medical report was made public: Lou Gehrig was suffering from a rare, progressive disease known as amyotrophic lateral sclerosis (ALS). Within two years, he was dead. Since then, ALS has commonly been known as Lou Gehrig disease.

Gehrig experienced symptoms typical of ALS: progressive weakness and wasting of skeletal muscles due to degeneration of the motor neurons. Most cases of ALS are sporadic, appearing in people with no family history of the disease. However, about 10% of cases run in families, and in these cases the disease is inherited as an autosomal dominant trait. In 1993, geneticists discovered that some familial cases of ALS are caused by a defect in a gene that encodes an enzyme called superoxide dismutase 1 (SOD1). This enzyme helps the cell to break down superoxide free radicals, which are highly reactive and extremely toxic. In families studied by the researchers, people with ALS had a defective allele for SOD1 ( ◗ FIGURE 6.1) that produced an altered form of the enzyme. How the altered enzyme causes the symptoms of the disease has not been firmly established.

◗ 6.1 Some cases of amyotrophic lateral sclerosis are inherited and result from mutations in the gene that encodes the enzyme superoxide dismutase 1. A molecular model of the enzyme.

Amyotrophic lateral sclerosis is just one of a large number of human diseases that are currently the focus of intensive genetic research. This chapter will discuss human genetic characteristics and some of the techniques used to study human inheritance. A number of human characteristics have already been mentioned in discussions of general hereditary principles (Chapters 3 through 5), so by now you know that they follow the same rules of inheritance as those of characteristics in other organisms. So why do we have a separate chapter on human heredity? The answer is that the study of human inheritance requires special techniques, primarily because human biology and culture impose certain constraints on the geneticist. In this chapter, we’ll consider these constraints and examine three important techniques that human geneticists use to overcome them: pedigrees, twin studies, and adoption studies. At the end of the chapter, we will see how the information garnered with these techniques can be used in genetic counseling and prenatal diagnosis.

Keep in mind as you go through this chapter that many important characteristics are influenced by both genes and environment, and separating these factors is always difficult in humans. Studies of twins and adopted persons are designed to distinguish the effects of genes and environment, but such studies are based on assumptions that may be difficult to meet for some human characteristics, particularly behavioral ones. Therefore, it’s always prudent to interpret the results of such studies with caution.

www.whfreeman.com/pierce Information on amyotrophic lateral sclerosis, and more about Lou Gehrig, his outstanding career in baseball, and his fight with amyotrophic lateral sclerosis

The Study of Human Genetic Characteristics

Humans are the best and the worst of all organisms for genetic study. On the one hand, we know more about human anatomy, physiology, and biochemistry than we know about most other organisms; for many families, we have detailed records extending back many generations; and the medical implications of genetic knowledge of humans provide tremendous incentive for genetic studies. On the other hand, the study of human genetic characteristics presents some major obstacles.

First, controlled matings are not possible. With other organisms, geneticists carry out specific crosses to test their hypotheses about inheritance.
We have seen, for example, how the testcross provides a convenient way to determine if an individual with a dominant trait is homozygous or heterozygous. Unfortunately (for the geneticist at least), matings between humans are more frequently determined by romance, family expectations, and (occasionally) accident than they are by the requirements of the geneticist.

Another obstacle is that humans have a long generation time. Human reproductive age is not normally reached until 10 to 14 years after birth, and most humans do not reproduce until they are 18 years of age or older; thus, generation time in humans is usually about 20 years. This long generation time means that, even if geneticists could control human crosses, they would have to wait on average 40 years just to observe the F2 progeny. In contrast, generation time in Drosophila is 2 weeks; in bacteria, it’s a mere 20 minutes.

Finally, human family size is generally small. Observation of even the simple genetic ratios that we learned in Chapter 3 would require a substantial number of progeny in each family. When parents produce only 2 children, it’s impossible to detect a 3:1 ratio. Even an extremely large family with 10 to 15 children would not permit the recognition of a dihybrid 9:3:3:1 ratio.

Although these special constraints make genetic studies of humans more complex, understanding human heredity is tremendously important. So geneticists have been forced to develop techniques that are uniquely suited to human biology and culture.

Concepts
Although the principles of heredity are the same in humans and other organisms, the study of human inheritance is constrained by the inability to control genetic crosses, the long generation time, and the small number of offspring.
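The point about small family size can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that both parents are heterozygous (Aa × Aa) for a recessive trait, so that each child independently has a 1/4 chance of being affected; the function name `affected_distribution` is made up for this example. It lists the binomial probability of each possible number of affected children in a family of a given size.

```python
from fractions import Fraction
from math import comb


def affected_distribution(n_children, p_affected=Fraction(1, 4)):
    """Binomial probabilities of 0..n affected children, assuming each
    birth is independent and has the same chance of being affected."""
    q = 1 - p_affected
    return [
        comb(n_children, k) * p_affected**k * q ** (n_children - k)
        for k in range(n_children + 1)
    ]


for n in (2, 4, 12):
    dist = affected_distribution(n)
    # Show only the first few outcomes to keep the output short
    summary = ", ".join(
        f"P({k} affected) = {float(p):.3f}" for k, p in enumerate(dist[:4])
    )
    print(f"{n} children: {summary}")
```

Under these assumptions, a two-child family has no affected children 9/16 of the time, one affected child 6/16 of the time, and two affected children 1/16 of the time; none of these outcomes is a 3:1 ratio. Even in a four-child family, exactly three unaffected children and one affected child is expected in only 27/64 of such families.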
Analyzing Pedigrees

An important technique used by geneticists to study human inheritance is the pedigree. A pedigree is a pictorial representation of a family history, essentially a family tree that outlines the inheritance of one or more characteristics. The symbols commonly used in pedigrees are summarized in ◗ FIGURE 6.2. The pedigree shown in ◗ FIGURE 6.3a illustrates a family with Waardenburg syndrome, an autosomal dominant type of deafness that may be accompanied by fair skin, a white forelock, and visual problems (◗ FIGURE 6.3b).

◗ 6.2 Standard symbols are used in pedigrees. [Symbols shown: male; female; sex unknown or unspecified; unaffected individual; individual affected with trait; obligate carrier (carries the gene but does not have the trait); asymptomatic carrier (unaffected at this time but may later exhibit trait); multiple individuals (5); deceased individual; proband (first affected family member coming to attention of geneticist); family history of individual unknown; family (parents and three children: one boy and two girls in birth order); adoption (brackets enclose adopted individuals; dashed line denotes adoptive parents; solid line denotes biological parent); twins (identical, nonidentical, unknown); consanguinity (mating between related individuals).]

Males in a pedigree are represented by squares, females by circles. A horizontal line drawn between two symbols representing a man and a woman indicates a mating; children are connected to their parents by vertical lines extending below the parents. Persons who exhibit the trait of interest are represented by filled circles and squares; in the pedigree of Figure 6.3a, the filled symbols represent members of the family who have Waardenburg syndrome. Unaffected persons are represented by open circles and squares.

Let’s look closely at Figure 6.3 and consider some additional features of a pedigree. Each generation in a pedigree is identified by a Roman numeral; within each generation, family members are assigned Arabic numerals, and children in each family are listed in birth order from left to right. Person II-4, a man with Waardenburg syndrome, mated with II-5, an unaffected woman, and they produced five children. The oldest of their children is III-8, a male with Waardenburg syndrome, and the youngest is III-14, an unaffected female. Deceased family members are indicated by a slash through the circle or square, as shown for I-1 and II-1 in Figure 6.3a. Twins are represented by diagonal lines extending from a common point (IV-14 and IV-15; nonidentical twins).

◗ 6.3 Waardenburg syndrome is an autosomal dominant disease characterized by deafness, fair skin, visual problems, and a white forelock. (Photograph courtesy of Guy Rowland.)

When a particular characteristic or disease is observed in a person, a geneticist studies the family of this affected person and draws a pedigree. The person from whom the pedigree is initiated is called the proband and is usually designated by an arrow (IV-1 in Figure 6.3a).

The limited number of offspring in most human families means that it is usually impossible to discern clear Mendelian ratios in a single pedigree.
Pedigree analysis requires a certain amount of genetic sleuthing, based on recognizing patterns associated with different modes of inheritance. For example, autosomal dominant traits should appear with equal frequency in both sexes and should not skip generations, provided that the trait is fully penetrant (see Chapter 3) and not sex influenced (see Chapter 5).

Certain patterns may exclude the possibility of a particular mode of inheritance. For instance, a son inherits his X chromosome from his mother. If we observe that a trait is passed from father to son, we can exclude the possibility of X-linked inheritance. In the following sections, the traits discussed are assumed to be fully penetrant and rare.

Autosomal Recessive Traits

Autosomal recessive traits normally appear with equal frequency in both sexes (unless penetrance differs in males and females), and appear only when a person inherits two alleles for the trait, one from each parent. If the trait is uncommon, most parents carrying the allele are heterozygous and unaffected; consequently, the trait appears to skip generations (◗ FIGURE 6.4). Frequently, a recessive allele may be passed for a number of generations without the trait appearing in a pedigree. Whenever both parents are heterozygous, approximately 1/4 of the offspring are expected to express the trait, but this ratio will not be obvious unless the family is large. In the rare event that both parents are affected by an autosomal recessive trait, all the offspring will be affected.

◗ 6.4 Autosomal recessive traits normally appear with equal frequency in both sexes and seem to skip generations. [In the figure, double lines represent a consanguineous mating between first cousins; autosomal recessive traits are more likely to appear among the progeny of related individuals.]
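The expected proportions for autosomal crosses can be checked with a small Punnett-square sketch. The Python below is a minimal illustration, assuming a single gene with a dominant allele written A and a recessive allele written a, and assuming that every gamete combination is equally likely; the function name `cross` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1: str, parent2: str) -> dict:
    """Return genotype -> expected fraction of offspring for a one-gene cross.

    Each parent is a two-letter genotype such as "Aa". One allele is drawn
    from each parent, and all four combinations are weighted equally.
    """
    counts = Counter(
        "".join(sorted(a + b)) for a, b in product(parent1, parent2)
    )
    total = sum(counts.values())
    return {genotype: Fraction(n, total) for genotype, n in counts.items()}


if __name__ == "__main__":
    print(cross("Aa", "Aa"))  # two heterozygous (carrier) parents
    print(cross("aa", "aa"))  # two affected parents
```

With two heterozygous parents the sketch gives 1/4 aa offspring, and with two affected parents every child is aa, in line with the proportions stated in the text.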
When a recessive trait is rare, persons from outside the family are usually homozygous for the normal allele. Thus, when an affected person mates with someone outside the family (aa × AA), usually none of the children will display the trait, although all will be carriers (i.e., heterozygous). A recessive trait is more likely to appear in a pedigree when two people within the same family mate, because there is a greater chance of both parents carrying the same recessive allele. Mating between closely related people is called consanguinity. In the pedigree shown in Figure 6.4, persons III-3 and III-4 are first cousins, and both are heterozygous for the recessive allele; when they mate, 1/4 of their children are expected to have the recessive trait.

Concepts
Autosomal recessive traits appear with equal frequency in males and females. Affected children are commonly born to unaffected parents, and the trait tends to skip generations. Recessive traits appear more frequently among the offspring of consanguine matings.

A number of human metabolic diseases are inherited as autosomal recessive traits. One of them is Tay-Sachs disease. Children with Tay-Sachs disease appear healthy at birth but become listless and weak at about 6 months of age. Gradually, their physical and neurological conditions worsen, leading to blindness, deafness, and eventually death at 2 to 3 years of age. The disease results from the accumulation of a lipid called GM2 ganglioside in the brain. A normal component of brain cells, GM2 ganglioside is usually broken down by an enzyme called hexosaminidase A, but children with Tay-Sachs disease lack this enzyme. Excessive GM2 ganglioside accumulates in the brain, causing swelling and, ultimately, neurological symptoms. Heterozygotes have only one normal copy of the hexosaminidase A allele and produce only about half the normal amount of the enzyme, but this amount is enough to ensure that GM2 ganglioside is broken down normally, and heterozygotes are usually healthy.

Autosomal Dominant Traits

Autosomal dominant traits appear in both sexes with equal frequency, and both sexes are capable of transmitting these traits to their offspring. Every person with a dominant trait must inherit the allele from at least one parent; autosomal dominant traits therefore do not skip generations (◗ FIGURE 6.5). Exceptions to this rule arise when people acquire the trait as a result of a new mutation or when the trait has reduced penetrance.

If an autosomal dominant allele is rare, most people displaying the trait are heterozygous. When one parent is affected and heterozygous and the other parent is unaffected, approximately 1/2 of the offspring will be affected. If both parents have the trait and are heterozygous, approximately 3/4 of the children will be affected. Provided the trait is fully penetrant, unaffected people do not transmit the trait to their descendants. In Figure 6.5, we see that none of the descendants of II-4 (who is unaffected) have the trait.

◗ 6.5 Autosomal dominant traits normally appear with equal frequency in both sexes and do not skip generations.

Concepts
Autosomal dominant traits appear in both sexes with equal frequency. Affected persons have an affected parent (unless they carry new mutations), and the trait does not skip generations. Unaffected persons do not transmit the trait.

One trait usually considered to be autosomal dominant is familial hypercholesterolemia, an inherited disease in which blood cholesterol is greatly elevated owing to a defect in cholesterol transport. Cholesterol is an essential component of cell membranes and is used in the synthesis of bile salts and several hormones. Most of our cholesterol is obtained through foods, primarily those high in saturated fats. Because cholesterol is a lipid (a nonpolar, or uncharged, compound), it is not readily soluble in the blood (a polar, or charged, solution). Cholesterol must therefore be transported throughout the body in small soluble particles called lipoproteins (◗ FIGURE 6.6); a lipoprotein consists of a core of lipid surrounded by a shell of charged phospholipids and proteins that dissolve easily in blood. One of the principal lipoproteins in the transport of cholesterol is low-density lipoprotein (LDL). When an LDL molecule reaches a cell, it attaches to an LDL receptor, which then moves the LDL through the cell membrane into the cytoplasm, where it is broken down and its cholesterol is released for use by the cell.

◗ 6.6 Low-density lipoprotein (LDL) particles transport cholesterol. The LDL receptor moves LDL through the cell membrane into the cytoplasm.

Familial hypercholesterolemia is due to a defect in the gene (located on human chromosome 19) that normally codes for the LDL receptor. The disease is usually considered an autosomal dominant disorder because heterozygotes are deficient in LDL receptors. In these people, too little cholesterol is removed from the blood, leading to elevated blood levels of cholesterol and increased risk of coronary artery disease. Persons heterozygous for familial hypercholesterolemia have blood LDL levels that are twice normal and usually have heart attacks by the age of 35. About 1 in 500 people is heterozygous for familial hypercholesterolemia and is predisposed to early coronary artery disease.

Very rarely, a person inherits two defective LDL receptor alleles. Such persons don’t make any functional LDL receptors; their blood cholesterol levels are more than six times normal and they may suffer a heart attack as early as age 2 and almost inevitably by age 20.
Because homozygotes are more severely affected than heterozygotes, familial hypercholesterolemia is said to be incompletely dominant. However, homozygotes are rarely seen (occurring with a frequency of only about 1 in 1 million people), and the common heterozygous form of the disease appears as a simple dominant trait in most pedigrees.

X-Linked Recessive Traits

X-linked recessive traits have a distinctive pattern of inheritance (◗ FIGURE 6.7). First, these traits appear more frequently in males, because males need inherit only a single copy of the allele to display the trait, whereas females must inherit two copies of the allele, one from each parent, to be affected. Second, because a male inherits his X chromosome from his mother, affected males are usually born to unaffected mothers who carry an allele for the trait. Because the trait is passed from unaffected female to affected male to unaffected female, it tends to skip generations (see Figure 6.7). When a woman is heterozygous, approximately 1/2 of her sons will be affected and 1/2 of her daughters will be unaffected carriers. For example, we know that females I-2, II-2, and III-7 in Figure 6.7 are all carriers because they transmit the trait to approximately half of their sons.

A third important characteristic of X-linked recessive traits is that they are not passed from father to son, because a son inherits his father’s Y chromosome, not his X. In Figure 6.7, there is no case of a father and son who are both affected. All daughters of an affected man, however, will be carriers (if their mother is homozygous for the normal allele). When a woman displays an X-linked trait, she must be homozygous for the trait, and all of her sons will also display the trait.

◗ 6.7 X-linked recessive traits appear more often in males and are not passed from father to son.

Concepts
Rare X-linked recessive traits appear more often in males than in females and are not passed from father to son. Affected sons are usually born to unaffected mothers; thus X-linked recessive traits tend to skip generations.

An example of an X-linked recessive trait in humans is hemophilia A, also called classical hemophilia (◗ FIGURE 6.8). This disease results from the absence of a protein necessary for blood to clot. The complex process of blood clotting consists of a cascade of reactions that includes more than 13 different factors. For this reason, there are several types of clotting disorders, each due to a glitch in a different step of the clotting pathway.

Hemophilia A results from abnormal or missing factor VIII, one of the proteins in the clotting cascade. The gene for factor VIII is located on the tip of the long arm of the X chromosome; so hemophilia A is an X-linked recessive disorder. People with hemophilia A bleed excessively; even small cuts and bruises can be life threatening. Spontaneous bleeding occurs in joints such as elbows, knees, and ankles, which produces pain, swelling, and erosion of the bone. Fortunately, bleeding in people with hemophilia A can now be controlled by administering concentrated doses of factor VIII.

X-Linked Dominant Traits

X-linked dominant traits appear in males and females, although they often affect more females than males. As with X-linked recessive traits, a male inherits an X-linked dominant trait only from his mother; the trait is not passed from father to son. A female, on the other hand, inherits an X chromosome from both her mother and father; so females can receive an X-linked trait from either parent.
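The transmission rules for X-linked alleles described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: one X-linked gene, allele symbols H (normal) and h (variant) that are chosen here and not taken from the text, and a mother who passes either of her X chromosomes with equal probability. The function name `x_linked_cross` is hypothetical.

```python
from fractions import Fraction


def x_linked_cross(mother: tuple, father_x: str) -> dict:
    """Expected genotype fractions among sons and among daughters.

    mother   -- her two X-linked alleles, e.g. ("H", "h")
    father_x -- the single allele on the father's X chromosome, e.g. "H"
    """
    sons, daughters = {}, {}
    for maternal in mother:
        # A son receives one maternal X and his father's Y (no paternal X).
        sons[maternal] = sons.get(maternal, Fraction(0)) + Fraction(1, 2)
        # A daughter receives one maternal X plus her father's only X.
        genotype = "".join(sorted(maternal + father_x))
        daughters[genotype] = daughters.get(genotype, Fraction(0)) + Fraction(1, 2)
    return {"sons": sons, "daughters": daughters}


if __name__ == "__main__":
    print(x_linked_cross(("H", "h"), "H"))  # heterozygous mother, unaffected father
    print(x_linked_cross(("H", "H"), "h"))  # homozygous normal mother, affected father
```

In the first cross, half of the sons carry h and half of the daughters are heterozygous; in the second, no son receives h but every daughter does. The same bookkeeping applies whether h is treated as recessive or dominant; only the interpretation of which genotypes are affected changes.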
Each child with an X-linked dominant trait must have an affected parent (unless the child possesses a new mutation or the trait has reduced penetrance). X-linked dominant traits do not skip generations (◗ FIGURE 6.9); affected men pass the trait on to all their daughters and none of their sons, as is seen in the children of I-1 in Figure 6.9. In contrast, affected women (if heterozygous) pass the trait on to 1/2 of their sons and 1/2 of their daughters, as seen in the children of II-5 in the pedigree.

◗ 6.8 Classic hemophilia is inherited as an X-linked recessive trait. This pedigree is of hemophilia in the royal families of Europe.

◗ 6.9 X-linked dominant traits affect both males and females. An affected male must have an affected mother.

Concepts
X-linked dominant traits affect both males and females. Affected males must have affected mothers (unless they possess a new mutation), and they pass the trait on to all their daughters.

An example of an X-linked dominant trait in humans is hypophosphatemia, also called familial vitamin D-resistant rickets. People with this trait have features that superficially resemble those produced by rickets: bone deformities, stiff spines and joints, bowed legs, and mild growth deficiencies. This disorder, however, is resistant to treatment with vitamin D, which normally cures rickets. X-linked hypophosphatemia results from the defective transport of phosphate, especially in cells of the kidneys. People with this disorder excrete large amounts of phosphate in their urine, resulting in low levels of phosphate in the blood and reduced deposition of minerals in the bone. As is common with X-linked dominant traits, males with hypophosphatemia are often more severely affected than females.

Y-Linked Traits

Y-linked traits exhibit a specific, easily recognized pattern of inheritance. Only males are affected, and the trait is passed from father to son. If a man is affected, all his male offspring should also be affected, as is the case for I-1, II-4, II-6, III-6, and III-10 of the pedigree in ◗ FIGURE 6.10. Y-linked traits do not skip generations. As discussed in Chapter 4, comparatively few genes reside on the human Y chromosome, and so few human traits are Y linked.

◗ 6.10 Y-linked traits appear only in males and are passed from a father to all his sons.

Concepts
Y-linked traits appear only in males and are passed from a father to all his sons.

www.whfreeman.com/pierce  The Online Mendelian Inheritance in Man, a comprehensive database of human genes and genetic disorders

The major characteristics of autosomal recessive, autosomal dominant, X-linked recessive, X-linked dominant, and Y-linked traits are summarized in Table 6.1.

Table 6.1 Pedigree characteristics of autosomal recessive, autosomal dominant, X-linked recessive, X-linked dominant, and Y-linked traits

Autosomal recessive trait
1. Appears in both sexes with equal frequency.
2. Trait tends to skip generations.
3. Affected offspring are usually born to unaffected parents.
4. When both parents are heterozygous, approximately 1/4 of the offspring will be affected.
5. Appears more frequently among the children of consanguine marriages.

Autosomal dominant trait
1. Appears in both sexes with equal frequency.
2. Both sexes transmit the trait to their offspring.
3. Does not skip generations.
4. Affected offspring must have an affected parent, unless they possess a new mutation.
5. When one parent is affected (heterozygous) and the other parent is unaffected, approximately 1/2 of the offspring will be affected.
6. Unaffected parents do not transmit the trait.

X-linked recessive trait
1. More males than females are affected.
2. Affected sons are usually born to unaffected mothers; thus, the trait skips generations.
3. A carrier (heterozygous) mother produces approximately 1/2 affected sons.
4. Is never passed from father to son.
5. All daughters of affected fathers are carriers.

X-linked dominant trait
1. Both males and females are affected; often more females than males are affected.
2. Does not skip generations. Affected sons must have an affected mother; affected daughters must have either an affected mother or an affected father.
3. Affected fathers will pass the trait on to all their daughters.
4. Affected mothers (if heterozygous) will pass the trait on to 1/2 of their sons and 1/2 of their daughters.

Y-linked trait
1. Only males are affected.
2. Is passed from father to all sons.
3. Does not skip generations.

Worked Problem

The following pedigree represents the inheritance of a rare disorder in an extended family. What is the most likely mode of inheritance for this disease? (Assume that the trait is fully penetrant.)

[Pedigree for the worked problem: four generations, I through IV; figure not reproduced.]

• Solution
To answer this question, we should consider each mode of inheritance and determine which, if any, we can eliminate. Because the trait appears only in males, autosomal dominant and autosomal recessive modes of inheritance are unlikely, because these occur equally in males and females. Additionally, autosomal dominance can be eliminated because some affected persons do not have an affected parent.

The trait is observed only among males in this pedigree, which might suggest Y-linked inheritance. However, with a Y-linked trait, affected men should pass the trait to all their sons, but here this is not the case; II-6 is an affected man who has four unaffected male offspring. We can eliminate Y-linked inheritance.

X-linked dominance can be eliminated because affected men should pass an X-linked dominant trait to all of their female offspring, and II-6 has an unaffected daughter (III-9).

X-linked recessive traits often appear more commonly in males, and affected males are usually born to unaffected female carriers; the pedigree shows this pattern of inheritance. With an X-linked trait, about half the sons of a heterozygous carrier mother should be affected. II-3 and III-9 are suspected carriers, and about 1/2 of their male children (three of five) are affected. Another important characteristic of an X-linked recessive trait is that it is not passed from father to son. We observe no father-to-son transmission in this pedigree. X-linked recessive is therefore the most likely mode of inheritance.

Twin Studies

Another method that geneticists use to analyze the genetics of human characteristics is twin studies.
Twins come in two types: dizygotic (nonidentical) twins arise when two separate eggs are fertilized by two different sperm, producing genetically distinct zygotes; monozygotic (identical) twins result when a single egg, fertilized by a single sperm, splits early in development into two separate embryos.

Because monozygotic twins arise from a single egg and sperm (a single, “mono,” zygote), except for rare somatic mutations, they’re genetically identical, having 100% of their genes in common (◗ FIGURE 6.11a). Dizygotic twins (◗ FIGURE 6.11b), on the other hand, have on average only 50% of their genes in common (the same percentage that any pair of siblings has in common). Like other siblings, dizygotic twins may be of the same or different sexes. The only difference between dizygotic twins and other siblings is that dizygotic twins are the same age and shared a common uterine environment.

◗ 6.11 Monozygotic twins (a) are identical; dizygotic twins (b) are nonidentical. (Part a, Joe Carini/Index Stock Imagery/Picture Quest; Part b, Bruce Roberts/Photo Researchers.)

The frequency with which dizygotic twins are born varies among populations. Among North American Caucasians, about 7 dizygotic twin pairs are born per 1000 births but, among Japanese, the rate is only about 3 pairs per 1000 births; among Nigerians, about 40 dizygotic twin pairs are born per 1000 births. The rate of dizygotic twinning also varies with maternal age (◗ FIGURE 6.12), and dizygotic twinning tends to run in families. In contrast, monozygotic twinning is relatively constant. The frequency of monozygotic twinning in most ethnic groups is about 4 twin pairs per 1000 births, and there is relatively little tendency for monozygotic twins to run in families.

◗ 6.12 Older women tend to have more dizygotic twins than do younger women. Relation between the rate of dizygotic twinning and maternal age. [Axes: frequency of dizygotic twins per 1000 births versus age of mother, from less than 20 to 40 and over.] [Data from J. Yerushalmy and S. E. Sheeras, Human Biology 12:95–113, 1940.]

Concepts
Dizygotic twins develop from two eggs fertilized by two separate sperm; they have, on average, 50% of their genes in common. Monozygotic twins develop from a single egg, fertilized by a single sperm, that splits into two embryos; they have 100% of their genes in common.

Concordance

Comparisons of dizygotic and monozygotic twins can be used to estimate the importance of genetic and environmental factors in producing differences in a characteristic. This is often done by calculating the concordance for a trait. If both members of a twin pair have a trait, the twins are said to be concordant; if only one member of the pair has the trait, the twins are said to be discordant. Concordance is the percentage of twin pairs that are concordant for a trait. Because identical twins have 100% of their genes in common and dizygotic twins have on average only 50% in common, genetically influenced traits should exhibit higher concordance in monozygotic twins. For instance, when one member of a monozygotic twin pair has asthma, the other twin of the pair has asthma about 48% of the time, so the monozygotic concordance for asthma is 48%. However, when a dizygotic twin has asthma, the other twin has asthma only 19% of the time (19% dizygotic concordance). The higher concordance in the monozygotic twins suggests that genes influence asthma, a finding supported by other family studies of this disease. Concordance values for several human traits and diseases are listed in Table 6.2.

The hallmark of a genetic influence on a particular characteristic is higher concordance in monozygotic twins compared with concordance in dizygotic twins. High concordance in monozygotic twins by itself does not signal a genetic influence.
Twins normally share the same environment (they are raised in the same home, have the same friends, attend the same school), so high concordance may be due to common genes or to common environment. If the high concordance is due to environmental factors, then dizygotic twins, who also share the same environment, should have just as high a concordance as that of monozygotic twins. When genes influence the characteristic, however, monozygotic twin pairs should exhibit higher concordance than dizygotic twin pairs, because monozygotic twins have a greater percentage of genes in common. It is important to note that any discordance among monozygotic twins must be due to environmental factors, because monozygotic twins are genetically identical.

The use of twins in genetic research rests on the important assumption that, when there is greater concordance in monozygotic twins than in dizygotic twins, it is because monozygotic twins are more similar in their genes and not because they have experienced a more similar environment. The degree of environmental similarity between monozygotic twins and dizygotic twins is assumed to be the same. This assumption may not always be correct, particularly for human behaviors.

Table 6.2 Concordance of monozygotic and dizygotic twins for several traits

Trait                     Monozygotic        Dizygotic          Reference
                          concordance (%)    concordance (%)
Heart attack (males)      39                 26                 1
Heart attack (females)    44                 14                 1
Bronchial asthma          47                 24                 2
Cancer (all sites)        12                 15                 2
Epilepsy                  59                 19                 2
Rheumatoid arthritis      32                 6                  3
Multiple sclerosis        28                 5                  4

References: (1) B. Havald and M. Hauge, U.S. Public Health Service Publication 1103 (pp. 61–67), 1963. (2) B. Havald and M. Hauge, Genetics and the Epidemiology of Chronic Diseases, U.S. Department of Health, Education, and Welfare, 1965. (3) J. S. Lawrence, Annals of Rheumatic Diseases 26(1970):357–379. (4) G. C. Ebers et al., American Journal of Human Genetics 36(1984):495.
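The concordance calculation and the monozygotic-versus-dizygotic comparison can be sketched as follows. This Python example is a minimal illustration, not the procedure used in the cited studies: it assumes each twin pair is recorded as two true/false values, counts only pairs in which at least one twin has the trait, and compares percentages directly without any test of statistical significance. The function names are hypothetical; the three sample rows are copied from Table 6.2.

```python
def concordance(pairs):
    """Percentage of twin pairs in which both members have the trait.

    pairs -- iterable of (twin1_has_trait, twin2_has_trait) booleans.
    Pairs in which neither twin has the trait are ignored (an assumption
    of this sketch).
    """
    informative = [(a, b) for a, b in pairs if a or b]
    if not informative:
        raise ValueError("no twin pair has the trait")
    both = sum(1 for a, b in informative if a and b)
    return 100.0 * both / len(informative)


def suggests_genetic_influence(mz_percent, dz_percent):
    """True when monozygotic concordance exceeds dizygotic concordance."""
    return mz_percent > dz_percent


# (monozygotic %, dizygotic %) values taken from Table 6.2.
TABLE_6_2_SAMPLE = {
    "Bronchial asthma": (47, 24),
    "Cancer (all sites)": (12, 15),
    "Epilepsy": (59, 19),
}

if __name__ == "__main__":
    for trait, (mz, dz) in TABLE_6_2_SAMPLE.items():
        print(trait, suggests_genetic_influence(mz, dz))
```

For these rows the comparison flags asthma and epilepsy but not cancer (all sites), where the dizygotic value is the higher of the two.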
Because they look alike, identical twins may be treated more similarly by parents, teachers, and peers than are nonidentical twins. Evidence of this similar treatment is seen in the past tendency of parents to dress identical twins alike. In spite of this potential complication, twin studies have played a pivotal role in the study of human genetics.

Twin Studies and Obesity

To illustrate the use of twins in genetic research, let’s consider a genetic study of obesity. Obesity is a serious public-health problem. About 50% of adults in affluent societies are overweight and from 15% to 25% are obese. Obesity increases the risk of a number of medical conditions, including diabetes, gallbladder disease, high blood pressure, some cancers, and heart disease.

Obesity is clearly familial: when both parents are obese, 80% of their children will also become obese; when both parents are not overweight, only 15% of their children will eventually become obese. The familial nature of obesity could result from genes that influence body weight; alternatively, it could be entirely environmental, resulting from the fact that family members usually have similar diets and exercise habits.

A number of genetic studies have examined twins in an effort to untangle the genetic and environmental contributions to obesity. The largest twin study of obesity was conducted on more than 4000 pairs of twins taken from the National Academy of Sciences National Research Council twin registry. This registry is a database of almost 16,000 male twin pairs, born between 1917 and 1927, who served in the U.S. armed forces during World War II or the Korean War. Albert Stunkard and his colleagues obtained weight and height for each of the twins from medical records compiled at the time of their induction into the armed forces. Equivalent data were again collected in 1967, when the men were 40 to 50 years old. The researchers then computed how overweight each man was at induction and at middle age in 1967. Concordance values for monozygotic and dizygotic twins were then computed for several weight categories (Table 6.3).

Table 6.3 Concordance values for body weight among monozygotic twins (MZ) and dizygotic twins (DZ) at induction in the armed services and at follow-up

                        Concordance (%)
Percent         At induction        At follow-up in 1967
overweight*     MZ       DZ         MZ       DZ
15              61       31         68       49
20              57       27         60       40
25              46       24         54       26
30              51       19         47       16
35              44       12         43       9
40              44       0          36       6

*Percent overweight was determined by comparing each man’s actual weight with a standard recommended weight for his height.
Source: After A. J. Stunkard, T. T. Foch, and Z. Hrubec, A twin study of human obesity, Journal of the American Medical Association 256(1986):52.

In each weight category, monozygotic twins had significantly higher concordance than did dizygotic twins at induction and in middle age 25 years later. The researchers concluded that, among the group being studied, body weight appeared to be strongly influenced by genetic factors. Using statistics that are beyond the scope of this discussion, the researchers further concluded that genetics accounted for 77% of variation in body weight at induction and 84% at middle age in 1967. (Because a characteristic such as body weight changes in a lifetime, the effects of genes on the characteristic may vary with age.)

This study shows that genes influence variation in body weight, yet genes alone do not cause obesity. In less affluent societies, obesity is rare, and no one can become overweight unless caloric intake exceeds energy expenditure. One does not inherit obesity; rather, one inherits a predisposition toward a particular body weight; geneticists say that some people are genetically more at risk for obesity than others.

How genes affect the risk of obesity is not yet completely understood. In 1994, scientists at Rockefeller University isolated a gene that causes an inherited form of obesity in mice (◗ FIGURE 6.13).
This gene encodes a protein called leptin, named after the Greek word for “thin.” Leptin is produced by fat tissue and decreases appetite by affecting the hypothalamus, a part of the brain. A decrease in body fat leads to decreased leptin, which stimulates appetite; an increase in body fat leads to increased levels of leptin, which reduces appetite. Obese mice possess two mutated copies of the leptin gene and produce no functional leptin; giving leptin to these mice promotes weight loss. The discovery of the leptin gene raised hopes that obesity in humans might be influenced by defects in the same gene and that the administration of leptin might be an effective treatment for obesity. Unfortunately, most overweight people are not deficient in leptin. Most, in fact, have elevated levels of leptin and appear to be somewhat resistant to its effects. Only a few rare cases of human obesity have been linked to genetic defects in leptin. The results of further studies have revealed that the genetic and hormonal control of body weight is quite complex; several other genes have been identified that also cause obesity in mice, and the molecular underpinnings of weight control are still being elucidated.

◗ 6.13 Obesity in some mice is due to a defect in the gene that encodes the protein leptin. Obese mouse on the left compared with normal-sized mouse on the right. (Remi Banali/Liaison.)

Concepts
Higher concordance in monozygotic twins compared with that in dizygotic twins indicates that genetic factors play a role in determining individual differences of a characteristic. Low concordance in monozygotic twins indicates that environmental factors play a significant role in the characteristic.

www.whfreeman.com/pierce More advanced information on twin research in genetics

Adoption Studies

A third technique that geneticists use to analyze human inheritance is the study of adopted people.
This approach is one of the most powerful for distinguishing the effects of genes and environment on characteristics. For a variety of reasons, many children each year are separated from their biological parents soon after birth and adopted by adults with whom they have no genetic relationship. These adopted persons have no more genes in common with their adoptive parents than do two randomly chosen persons; however, they do share a common environment with their adoptive parents. In contrast, the adopted persons have 50% of the genes possessed by each of their biological parents but do not share the same environment with them. If adopted persons and their adoptive parents show similarities in a characteristic, these similarities can be attributed to environmental factors. If, on the other hand, adopted persons and their biological parents show similarities, these similarities are likely to be due to genetic factors. Comparisons of adopted persons with their adoptive parents and with their biological parents can therefore help to define the roles of genetic and environmental factors in the determination of human variation. Adoption studies assume that the environments of biological and adoptive families are independent (i.e., not more alike than would be expected by chance). This assumption may not always be correct, because adoption agencies carefully choose adoptive parents and may select a family that resembles the biological family. Offspring and their biological mother also share a common environment during prenatal development. Some of the similarity between adopted persons and their biological parents may be due to these similar environments and not due to common genetic factors. Concepts Similarities between adopted persons and their genetically unrelated adoptive parents indicate that environmental factors affect the characteristic; similarities between adopted persons and their biological parents indicate that genetic factors influence the characteristic. 
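The comparison at the heart of an adoption study can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text speaks of "similarities" without naming a statistic, so the Pearson correlation is used here as one possible measure; the function names `pearson` and `compare_resemblance` are hypothetical; and the numbers are invented toy values, not data from any study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / sqrt(var_x * var_y)


def compare_resemblance(adoptees, biological, adoptive):
    """Correlate adoptee trait values with each set of parents.

    Resemblance to biological parents points toward genetic factors;
    resemblance to adoptive parents points toward the rearing environment.
    """
    return {
        "biological": pearson(adoptees, biological),
        "adoptive": pearson(adoptees, adoptive),
    }


# Hypothetical trait values (for example, a body-size index); illustration only.
adoptees = [20.0, 22.0, 24.0, 26.0, 28.0]
biological = [21.0, 22.5, 24.5, 25.5, 29.0]  # mid-parent value, biological parents
adoptive = [24.0, 23.0, 25.0, 23.5, 24.5]    # mid-parent value, adoptive parents

result = compare_resemblance(adoptees, biological, adoptive)
for parents, r in result.items():
    print(f"adoptees vs. {parents} parents: r = {r:.2f}")
```

In this toy data set the adoptees resemble their biological parents much more closely than their adoptive parents, the pattern that would suggest a genetic influence. A real analysis would also have to deal with the caveats noted above, such as selective placement by adoption agencies and the shared prenatal environment.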
Adoption Studies and Obesity

Like twin studies, adoption studies have played an important role in demonstrating that obesity has a genetic influence. In 1986, geneticists published the results of a study of 540 people who had been adopted in Denmark between 1924 and 1947. The geneticists obtained information concerning the adult body weight and height of the adopted persons, along with the adult weight and height of their biological parents and their unrelated adoptive parents. Geneticists used a measurement called the body-mass index to analyze the relation between the weight of the adopted persons and that of their parents. (The body-mass index, which is weight divided by the square of height, provides a measure of weight that is independent of height.) On the basis of body-mass index, sex, and age, the adopted persons were divided into four weight classes: thin, median weight, overweight, and obese. A strong relation was found between the weight classification of the adopted persons and the body-mass index of their biological parents: obese adoptees tended to have heavier biological parents, whereas thin adoptees tended to have lighter biological parents ( ◗ FIGURE 6.14). Because the only connection between the adoptees and their biological parents was the genes that they have in common, the investigators concluded that genetic factors influence adult body weight. There was no clear relation between the weight classification of adoptees and the body-mass index of their adoptive parents (see Figure 6.14), suggesting that the rearing environment has little effect on adult body weight.

◗ 6.14 Adoption studies demonstrate that obesity has a genetic influence. The figure has two panels, one for biological parents and one for adoptive parents, each plotting the body-mass index of fathers and mothers against adoptee weight class (thin to obese). Overweight biological parents tend to have overweight children. There is no consistent association between the weight of children and that of their adoptive parents. (Redrawn with permission of the New England Journal of Medicine 314:195.)

Adoption Studies and Alcoholism

Adoption studies have also been successfully used to assess the importance of genetic factors on alcoholism. Although frequently considered a moral weakness in the past, today alcoholism is more often treated as a disease or as a psychiatric condition. An estimated 10 million people in the United States are problem drinkers, and as many as 6 million are severely addicted to alcohol. Of the U.S. population, 11% are heavy drinkers and consume as much as 50% of all alcohol sold.

A large study of alcoholism was carried out on 1775 Swedish adoptees who had been separated from their mothers at an early age and raised by biologically unrelated adoptive parents. The results of this study, along with those of others, suggest that there are at least two distinct groups of alcoholics. Type I alcoholics include men and women who typically develop problems with alcohol after the age of 25 (usually in middle age). These alcoholics lose control of the ability to drink in moderation (they drink in binges) and tend to be nonaggressive during drinking bouts. Type II alcoholics consist largely of men who begin drinking before the age of 25 (often in adolescence); they actively seek out alcohol, but do not binge, and tend to be impulsive, thrill-seeking, and aggressive while drinking.

The Swedish adoption study also found that alcohol abuse among biological parents was associated with increased alcoholism in adopted persons. Type I alcoholism usually required both a genetic predisposition and exposure to a rearing environment in which alcohol was consumed. Type II alcoholism appeared to be highly hereditary; it developed primarily among males whose biological fathers also were Type II alcoholics, regardless of whether the adoptive parents drank.
A male adoptee whose biological father was a Type II alcoholic was nine times as likely to become an alcoholic as was an adoptee whose biological father was not an alcoholic.

The results of the Swedish adoption study have been corroborated by other investigations, suggesting that some people are genetically predisposed to alcoholism. However, alcoholism is a complex behavioral characteristic that is undoubtedly influenced by many factors. It would be wrong to conclude that alcoholism is strictly a genetic characteristic. Although some people may be genetically predisposed to alcohol abuse, no gene forces a person to drink, and no one becomes alcoholic without the presence of a specific environmental factor, namely alcohol.

Genetic Counseling and Genetic Testing

Our knowledge of human genetic diseases and disorders has expanded rapidly in the past 20 years. Victor McKusick’s Mendelian Inheritance in Man now lists more than 13,000 human genetic diseases, disorders, and traits that have a simple genetic basis. Research has provided a great deal of information about the inheritance, chromosomal location, biochemical basis, and symptoms of many of these genetic traits. This information is often useful to people who have a genetic condition.

Genetic Counseling

Genetic counseling is a new field that provides information to patients and others who are concerned about hereditary conditions. It is also an educational process that helps patients and family members deal with many aspects of a genetic condition. Genetic counseling often includes interpreting a diagnosis of the condition; providing information about symptoms, treatment, and prognosis; helping the patient and family understand the mode of inheritance; and calculating probabilities that family members might transmit the condition to future generations. Good genetic counseling also provides information about the reproductive options that are available to those at risk for the disease.
Finally, genetic counseling tries to help the patient and family cope with the psychological and physical stress that may be associated with their disorder. Clearly, all of these considerations cannot be handled by a single person; so most genetic counseling is done by a team that can include counselors, physicians, medical geneticists, and laboratory personnel. Table 6.4 lists some common reasons for seeking genetic counseling.

Genetic counseling usually begins with a diagnosis of the condition. On the basis of a physical examination, biochemical tests, chromosome analysis, family history, and other information, a physician determines the cause of the condition. An accurate diagnosis is critical, because treatment and the probability of passing on the condition may vary, depending on the diagnosis. For example, there are a number of different types of dwarfism, which may be caused by chromosome abnormalities, single-gene mutations, hormonal imbalances, or environmental factors. People who have dwarfism resulting from an autosomal dominant gene have a 50% chance of passing the condition to their children, whereas people with dwarfism caused by a rare recessive gene have a low likelihood of passing the trait to their children.

When the nature of the condition is known, a genetic counselor sits down with the patient and other family members and explains the diagnosis. A family pedigree may be constructed, and the probability of transmitting the condition to future generations can be calculated for different family members. The counselor helps the family interpret the genetic risks and explains various reproductive options that are available, including prenatal diagnosis, artificial insemination, and in vitro fertilization. A family’s decision about future pregnancies frequently depends on the magnitude of the genetic risk, the severity and effects of the condition, the importance of having children, and religious and cultural views.
The genetic counselor helps the family sort through these factors and facilitates their decision making. Throughout the process, a good genetic counselor uses nondirected counseling, which means that he or she provides information and facilitates discussion but does not bring his or her own opinion and values into the discussion. The goal of nondirected counseling is for the family to reach its own decision on the basis of the best available information.

Genetic conditions are often perceived differently from other diseases and medical problems, because genetic conditions are intrinsic to the individual person and can be passed on to children. Such perceptions may produce feelings of guilt about past reproductive choices and intense personal dilemmas about future choices. Genetic counselors are trained to help patients and family members recognize and cope with these feelings.

Concepts
Genetic counseling is an educational process that provides patients and their families with information about a genetic condition, its medical implications, the probabilities that other family members may have the disease, and reproductive options. It also helps patients and their families cope with the psychological and physical stress associated with a genetic condition.

Table 6.4 Common reasons for seeking genetic counseling
1. A person knows of a genetic disease in the family.
2. A couple has given birth to a child with a genetic disease, birth defect, or chromosomal abnormality.
3. A couple has a child who is mentally retarded or a close relative is mentally retarded.
4. An older woman becomes pregnant or wants to become pregnant. There is disagreement about the age at which a prospective mother who has no other risk factor should seek genetic counseling; many experts suggest that any prospective mother age 35 or older should seek genetic counseling.
5. Husband and wife are closely related (e.g., first cousins).
6. A couple experiences difficulties achieving a successful pregnancy.
7. A pregnant woman is concerned about exposure to an environmental substance (drug, chemical, or virus) that causes birth defects.
8. A couple needs assistance in interpreting the results of a prenatal or other test.
9. Both parents are known carriers for a recessive genetic disease.

www.whfreeman.com/pierce Information on genetic counseling and human genetic diseases, as well as a list of genetic counseling training programs accredited by the American Board of Genetic Counseling

Genetic Testing

Improvements in our understanding of human heredity and the identification of numerous disease-causing genes have led to the development of hundreds of tests for genetic conditions. The ultimate goal of genetic testing is to recognize the potential for a genetic condition at an early stage. In some cases, genetic testing allows early intervention that may lessen or even prevent the development of the condition. In other cases, genetic testing allows people to make informed choices about reproduction. For those who know that they are at risk for a genetic condition, genetic testing may help alleviate anxiety associated with the uncertainty of their situation. Genetic testing includes newborn screening, heterozygote screening, presymptomatic diagnosis, and prenatal testing.

Presymptomatic testing

Evaluating healthy people to determine whether they have inherited a disease-causing allele is known as presymptomatic genetic testing. For example, presymptomatic testing is available for members of families that have an autosomal dominant form of breast cancer. In this case, early identification of the disease-causing allele allows for closer surveillance and the early detection of tumors.
Presymptomatic testing is also available for some genetic diseases for which no treatment is available, such as Huntington disease, an autosomal dominant disease that leads to slow physical and mental deterioration in middle age (see introduction to Chapter 5). Presymptomatic testing for untreatable conditions raises a number of social and ethical questions (Chapter 18).

Several hundred genetic diseases and disorders can now be diagnosed prenatally. The major purpose of prenatal tests is to provide families with the information that they need to make choices during pregnancies and, in some cases, to prepare for the birth of a child with a genetic condition. A number of approaches to prenatal diagnosis are described in the following sections.

Newborn screening

Testing for genetic disorders in newborn infants is called newborn screening. Most states in the United States and many other countries require that newborn infants be tested for phenylketonuria and galactosemia. These metabolic diseases are caused by autosomal recessive alleles; if not treated at an early age, they can result in mental retardation, but early intervention, through the administration of a modified diet, prevents retardation (see Chapter 5). Testing is done by analyzing a drop of the infant’s blood collected soon after birth. Because of widespread screening, the frequency of mental retardation due to these genetic conditions has dropped tremendously. Screening newborns for additional genetic diseases that benefit from treatment, such as sickle-cell anemia and hypothyroidism, also is common.

Heterozygote screening

Testing members of a population to identify heterozygous carriers of recessive disease-causing alleles, who are healthy but have the potential to produce children with the particular disease, is termed heterozygote screening. Testing for Tay-Sachs disease is a successful example of heterozygote screening.
In the general population of North America, the frequency of Tay-Sachs disease is only about 1 person in 360,000. Among Ashkenazi Jews (descendants of Jewish people who settled in eastern and central Europe), the frequency is 100 times as great. A simple blood test is used to detect Ashkenazi Jews who carry the Tay-Sachs allele. If a man and woman are both heterozygotes, approximately one in four of their children is expected to have Tay-Sachs disease. A prenatal test for the Tay-Sachs allele also is available. Screening programs have led to a significant decline in the number of children of Ashkenazi ancestry born with Tay-Sachs disease (now fewer than 10 children per year in the United States).

Ultrasonography

Some genetic conditions can be detected through direct visualization of the fetus. Such visualization is most commonly done with ultrasonography, usually referred to as ultrasound. In this technique, high-frequency sound is beamed into the uterus; when the sound waves encounter dense tissue, they bounce back and are transformed into a picture ( ◗ FIGURE 6.15). The size of the fetus can be determined, as can genetic conditions such as neural tube defects (defects in the development of the spinal column and the skull) and skeletal abnormalities.

◗ 6.15 Ultrasonography can be used to detect some genetic disorders in a fetus and locate the fetus during amniocentesis and chorionic villus sampling. (SIU School of Medicine/Photo Research.)

Amniocentesis

Most prenatal testing requires fetal tissue, which can be obtained in several ways. The most widely used method is amniocentesis, a procedure for obtaining a
The New Genetics
ETHICS • SCIENCE • TECHNOLOGY
Genetic Testing
by Ron Green

A couple are seeking help at a clinic that offers preimplantation genetic diagnosis (PGD), which combines in vitro fertilization with molecular analysis of the DNA from a single cell of the developing embryo, and permits the selection and transfer to the uterus of embryos free of a genetic disease. Before PGD, the only alternative for those wishing to prevent the birth of a child with a serious genetic disorder was early chorionic villus sampling or amniocentesis, followed by abortion if the fetus had a disorder.

Consider a couple at risk of having a second child with severe combined immune deficiency (SCID). A child born with this condition has a seriously impaired immune system. As recently as 20 years ago, those affected died early in life, but the use of bone-marrow transplantation, which can provide the child with a supply of healthy blood stem cells, has greatly extended survival. In general, the earlier the transplantation and the closer the tissue match of the marrow donor, the better a recipient child’s chances.

The couple tell the medical geneticist that they are seeking his help in identifying and transferring only embryos free of the SCID mutation so that they can begin their pregnancy knowing that it will be healthy. Some weeks later they reveal another reason for their interest in this technology: the health of their six-year-old daughter, who is affected with SCID, is on a downward course despite one partly matched bone-marrow transplant earlier in her life. Their child’s best hope of survival is another bone-marrow transplant, using tissue from a compatible donor, preferably a sibling. Is it possible, they ask, to test the healthy embryos for tissue compatibility and transfer only those that match their daughter’s type? The geneticist responds that it is indeed technically possible to do so but he wonders whether helping the couple in this way is ethically appropriate. Is it right to conceive a child for this purpose? In addition, because tissue compatibility is not a disease, would responding to this request constitute an unwise step into a world of positive, or “enhancement,” genetics, where parents’ desires, not medical judgment, dictate the use of genetic knowledge?

PGD offers significant new reproductive opportunities for families or persons affected by genetic disease. However, the very power of this technology raises new ethical issues that will grow in importance as PGD and related embryo manipulation procedures become more widely available.

PGD offers a technology that is medically, psychologically and, in the view of many, morally superior to the existing use of abortion for genetic selection. The case described here is not entirely novel. Even before the advent of PGD, couples who had sought to ensure the birth of a child whose HLA (human leukocyte antigen) status could be compatible with that of an existing sibling would establish a pregnancy, undergo testing, and then abort all fetuses that did not have the appropriate HLA type. Because a woman in the United States has a right to abortion for any reason through the second trimester, this option is legal. However, it is certainly not a desirable one from a medical or psychological point of view. Because pregnancy is never begun, PGD avoids the emotional trauma of abortion. Although some would object to the discarding of human embryos even at this early stage, the selection of viable embryos and the discarding of others is a routine part of in vitro fertilization procedures today and raises few moral questions in the minds of most people. So, from the narrow perspective of parental decision making, the alternative described in this case is a significant medical and moral step forward. We should not lose sight of this fact as we consider other ethically troubling aspects of the case.
This case of parental selection raises at least two distinct questions. First, even if the means of selection is relatively innocuous from a moral standpoint, is it appropriate for parents to bring a child into being at least partly for the purpose of saving the life of a sibling? A second question is whether genetic professionals should cooperate with a selection process that entails a nondisease trait.

With regard to the first question, some people believe that the parents’ wishes in this situation violate the ethical principle not to use a person merely as a means to an end, as well as the modern principle of responsible parenthood, which judges each child to be of inestimable value. They also worry about future psychological harm to a child conceived in this way. Others argue that children are usually born for a specific purpose, whether it’s to gratify the parents’ need for a family, to cement the relationship of a couple, or whatever else. As a result, they argue that the important question is not whether the purpose for which the child was conceived is ethical but whether the parents will be able to accept the child in its own unique identity once it is born.

In response to the second question, some ethicists see any involvement in nondisease testing as a dangerous diversion of genetic testing down paths long since rejected for good reason. They see that a consensus has emerged in popular opinion and among geneticists and ethics advisory boards that nondisease characteristics should not be subject to prenatal testing. HLA testing, however well intentioned, runs counter to this consensus. Departures from these views about genetic tests could raise very difficult questions about eugenics in the future.

sample of amniotic fluid from a pregnant woman ( ◗ FIGURE 6.16). Amniotic fluid, the substance that fills the amniotic sac and surrounds the developing fetus, contains fetal cells that can be used for genetic testing.

Amniocentesis is routinely performed as an outpatient procedure with the use of a local or no anesthetic. First, ultrasonography is used to locate the position of the fetus in the uterus. Next, a long, sterile needle is inserted through the abdominal wall into the amniotic sac (see Figure 6.16), and a small amount of amniotic fluid is withdrawn through the needle. Fetal cells are separated from the amniotic fluid and placed in a culture medium that stimulates them to grow and divide. Genetic tests are then performed on the cultured cells. Complications with amniocentesis (mostly miscarriage) are rare, arising in only about 1 in 400 procedures.

◗ 6.16 Amniocentesis is a procedure for obtaining fetal cells for genetic testing. Steps shown in the figure: (1) Under the guidance of ultrasound, a sterile needle is inserted through the abdominal wall into the amniotic sac. (2) A small amount of amniotic fluid is withdrawn through the needle. (3) The amniotic fluid contains fetal cells, which are separated from the amniotic fluid (4) and cultured. (5) Tests are then performed on the cultured cells to detect errors of metabolism, analyze DNA, (6) or examine chromosomes.

Chorionic villus sampling

A major disadvantage with amniocentesis is that it is routinely performed in about the 16th week of a pregnancy (although many obstetricians now successfully perform amniocentesis several weeks earlier). The cells obtained with amniocentesis must then be cultured before genetic tests can be performed, requiring yet more time. For these reasons, genetic information about the fetus may not be available until the 17th or 18th week of pregnancy. By this stage, abortion carries a risk of complications and may be stressful for the parents.
Chorionic villus sampling (CVS) can be performed earlier (between the 10th and 11th weeks of pregnancy) and collects more fetal tissue, which eliminates the necessity of culturing the cells.

In CVS, a catheter (a soft plastic tube) is inserted into the vagina ( ◗ FIGURE 6.17) and, with the use of ultrasound for guidance, is pushed through the cervix into the uterus. The tip of the tube is placed into contact with the chorion, the outer layer of the placenta. Suction is then applied, and a small piece of the chorion is removed. Although the chorion is composed of fetal cells, it is a part of the placenta that is expelled from the uterus after birth; so the removal of a small sample does not endanger the fetus. The tissue that is removed contains millions of actively dividing cells that can be used directly in many genetic tests. Chorionic villus sampling has a somewhat higher risk of complication than that of amniocentesis; the results of several studies suggest that this procedure may increase the incidence of limb defects in the fetus when performed earlier than 10 weeks of gestation.

Fetal cells obtained by amniocentesis or by CVS can be used to prepare a karyotype, which is a picture of a complete set of metaphase chromosomes. Karyotypes can be studied for chromosome abnormalities (Chapter 9). Biochemical analyses can be conducted on fetal cells to determine the presence of particular metabolic products of genes. For genetic diseases in which the DNA sequence of the causative gene has been determined, the DNA sequence (DNA testing; Chapter 18) can be examined for defective alleles.

Maternal blood tests

Some genetic conditions can be detected by performing a blood test on the mother (maternal blood testing). For instance, α-fetoprotein is normally produced by the fetus during development and is present in the fetal blood, the amniotic fluid, and the mother’s blood during pregnancy.
The level of α-fetoprotein is significantly higher than normal when the fetus has a neural-tube defect or one of several other disorders. Some chromosome abnormalities produce lower-than-normal levels of α-fetoprotein. Measuring the amount of α-fetoprotein in the mother’s blood gives an indication of these conditions. However, because other factors affect the amount of α-fetoprotein in maternal blood, a high or low level by itself does not necessarily indicate a problem. Thus, when a blood test indicates that the amount of α-fetoprotein is abnormal, follow-up tests (additional α-fetoprotein determinations, ultrasound, amniocentesis, or all three) are usually performed.

◗ 6.17 Chorionic villus sampling is another procedure for obtaining fetal cells for genetic testing. Steps shown in the figure: (1) CVS can be performed early in pregnancy (10–11 week fetus). (2) Using ultrasound for guidance, a catheter is inserted through the vagina and cervix and into the uterus, (3) where it is placed into contact with the chorion, the outer layer of the placenta. (4) Suction removes a small piece of the chorion. (5) Cells of the chorion are used directly for many genetic tests, and culturing is not required.

Fetal cell sorting

Prenatal tests that utilize only maternal blood are highly desirable because they are noninvasive and pose no risk to the fetus. During pregnancy, a few fetal cells are released into the mother’s circulatory system, where they mix and circulate with her blood. Recent advances have made it possible to separate fetal cells from a maternal blood sample (a procedure called fetal cell sorting). With the use of lasers and automated cell-sorting machines, fetal cells can be detected and separated from maternal blood cells. The fetal cells obtained can be cultured for chromosome analysis or used as a source of fetal DNA for molecular testing (see Chapter 18).
A large number of genetic diseases can now be detected prenatally (Table 6.5), and the number is growing rapidly as new disease-causing genes are isolated. The Human Genome Project (Chapter 18) has accelerated the rate at which new genes are being isolated and new genetic tests are being developed. In spite of these advances, prenatal tests are still not available for many common genetic diseases, and no test can guarantee that a “perfect” child will be born.

Preimplantation genetic diagnosis

Prenatal genetic tests provide today’s couples with increasing amounts of information about the health of their future children. New reproductive technologies also provide couples with options for using this information. One of these technologies is in vitro fertilization. In this procedure, hormones are used to induce ovulation. The ovulated eggs are surgically removed from the surface of the ovary, placed in a laboratory dish, and fertilized with sperm. The resulting embryo is then implanted into the uterus. Thousands of babies resulting from in vitro fertilization have now been born.

Genetic testing can be combined with in vitro fertilization to allow implantation of embryos that are free of a specific genetic defect. Called preimplantation genetic diagnosis (PGD), this technique allows people who carry a genetic defect to avoid producing a child with the disorder. For example, when a woman is a carrier of an X-linked recessive disease, approximately half of her sons are expected to have the disease. Through in vitro fertilization and preimplantation testing, it is possible to select an embryo without the disorder for implantation in her uterus.

The procedure begins with the production of several single-celled embryos through in vitro fertilization. The embryos are allowed to divide several times until they reach the 8- or 16-cell stage. At this point, one cell is removed from each embryo and tested for the genetic abnormality.

Table 6.5 Examples of genetic diseases and disorders that can be detected prenatally and the techniques used in their detection
Disorder | Method of detection
Chromosome abnormalities | Examination of a karyotype from cells obtained by amniocentesis or CVS
Cleft lip and palate | Ultrasound
Cystic fibrosis | DNA analysis of cells obtained by amniocentesis or CVS
Dwarfism | Ultrasound or X-ray; some forms can be detected by DNA analysis of cells obtained by amniocentesis or CVS
Hemophilia | Fetal blood sampling* or DNA analysis of cells obtained by amniocentesis or CVS
Lesch-Nyhan syndrome (deficiency of purine metabolism leading to spasms, seizures, and compulsive self-mutilation) | Biochemical tests on cells obtained by amniocentesis or CVS
Neural-tube defects | Initial screening with maternal blood test, followed by biochemical tests on amniotic fluid obtained by amniocentesis and ultrasound
Osteogenesis imperfecta (brittle bones) | Ultrasound or X-ray
Phenylketonuria | DNA analysis of cells obtained by amniocentesis or CVS
Sickle-cell anemia | Fetal blood sampling or DNA analysis of cells obtained by amniocentesis or CVS
Tay-Sachs disease | Biochemical tests on cells obtained by amniocentesis or CVS
*A sample of fetal blood is obtained by inserting a needle into the umbilical cord.

Removing a single cell at this early stage does not harm the embryo. After determination of which embryos are free of the disorder, a healthy embryo is selected and implanted in the woman’s uterus. Preimplantation genetic diagnosis requires the ability to conduct a genetic test on a single cell. Such testing is possible with the use of the polymerase chain reaction, through which minute quantities of DNA can be amplified (replicated) quickly (Chapter 18). After amplification of the cell’s DNA, the DNA sequence is examined. Preimplantation diagnosis is still experimental and is available at only a few research centers.
Its use raises a number of ethical concerns, because it provides a means of actively selecting for or against certain genetic traits.

Concepts
Genetic testing is used to screen newborns for genetic diseases, detect persons who are heterozygous for recessive diseases, detect disease-causing alleles in those who have not yet developed symptoms of the disease, and detect defective alleles in unborn babies. Preimplantation genetic diagnosis combined with in vitro fertilization allows for selection of embryos that are free from specific genetic diseases.

www.whfreeman.com/pierce Additional information about genetic testing

Connecting Concepts Across Chapters
This chapter builds on the basic principles of heredity that were introduced in Chapters 1 through 5, extending them to human genetic characteristics. A dominant theme of the chapter is that human inheritance is not fundamentally different from inheritance in other organisms, but the unique biological and cultural characteristics of humans require special techniques for the study of human characteristics. Several topics introduced in this chapter are explored further in later chapters. Molecular techniques used in genetic testing and some of the ethical implications of modern genetic testing are presented in Chapter 18. Chromosome mutations and karyotypes are studied in Chapter 9. In Chapter 22, we examine additional techniques for separating genetic and environmental contributions to characteristics in humans and other organisms.

CONCEPTS SUMMARY
• There are several difficulties in applying traditional genetic techniques to the study of human traits, including the inability to conduct controlled crosses, long generation time, small family size, and the difficulty of separating genetic and environmental influences.
• A pedigree is a pictorial representation of a family history that displays the inheritance of one or more traits through several generations.
• Autosomal recessive traits typically appear with equal frequency in both sexes. If a trait is uncommon, the parents of a child with an autosomal recessive trait are usually heterozygous and unaffected; so the trait tends to skip generations. When both parents are heterozygous, approximately 1/4 of their offspring will have the trait. Recessive traits are more likely to appear in families with consanguinity (mating between closely related persons).
• Autosomal dominant traits usually appear equally in both sexes and do not skip generations. When one parent is affected and heterozygous, approximately 1/2 of the offspring will have the trait. When both parents are affected and heterozygous, approximately 3/4 of the offspring will be affected. Unaffected people do not normally transmit an autosomal dominant trait to their offspring.
• X-linked recessive traits appear more frequently in males than in females. Affected males are usually born to females who are unaffected carriers. When a woman is a heterozygous carrier and a man is unaffected, approximately 1/2 of their sons will have the trait and 1/2 of their daughters will be unaffected carriers. X-linked traits are not passed from father to son.
• X-linked dominant traits appear in males and females, but more frequently in females. They do not skip generations. Affected men pass an X-linked dominant trait to all of their daughters but none of their sons. Heterozygous women pass the trait to 1/2 of their sons and 1/2 of their daughters.
• Y-linked traits appear only in males and are passed from father to all sons.
• Analysis of twins is an important technique for the study of human genetic characteristics. Dizygotic twins arise from two separate eggs fertilized by two separate sperm; monozygotic twins arise from a single egg, fertilized by a single sperm, that splits into two separate embryos early in development.
• Concordance is the percentage of twin pairs in which both members of the pair express a trait.
Higher concordance in monozygotic than in dizygotic twins indicates a genetic influence on the trait; less than 100% concordance in monozygotic twins indicates environmental influences on the trait. • Adoption studies are used to analyze the inheritance of human characteristics. Similarities between adopted children and their biological parents indicate the importance of genetic factors in the expression of a trait; similarities between adopted children and their genetically unrelated adoptive parents indicate the influence of environmental factors. • Genetic counseling provides information and support to people concerned about hereditary conditions in their families. • Genetic testing includes screening for disease-causing alleles in newborns, the detection of people heterozygous for recessive alleles, presymptomatic testing for the presence of a disease-causing allele in at-risk people, and prenatal diagnosis. • Common techniques used for prenatal diagnosis include ultrasound, amniocentesis, chorionic villus sampling, and maternal blood sampling. Preimplantation genetic diagnosis can be used to select for embryos that are free of a genetic disease. IMPORTANT TERMS pedigree (p. 134) proband (p. 135) consanguinity (p. 136) dizygotic twins (p. 141) monozygotic twins (p. 141) concordance (p. 142) genetic counseling (p. 145) newborn screening (p. 147) heterozygote screening (p. 147) presymptomatic genetic testing (p. 147) ultrasonography (p. 147) amniocentesis (p. 147) chorionic villus sampling (p. 149) karyotype (p. 149) maternal blood testing (p. 149) fetal cell sorting (p. 150) preimplantation genetic diagnosis (p. 150) Worked Problems 1. Joanna has short fingers (brachydactyly). She has two older brothers who are identical twins; they both have short fingers. Joanna’s two younger sisters have normal fingers. Joanna’s mother has normal fingers, and her father has short fingers. 
Joanna’s paternal grandmother (her father’s mother) has short fingers; her paternal grandfather (her father’s father), who is now deceased, had normal fingers. Both of Joanna’s maternal grandparents (her mother’s parents) have normal fingers. Joanna marries Tom, who has normal fingers; they adopt a son named Bill who has normal fingers. Bill’s biological parents both have normal fingers. After adopting Bill, Joanna and Tom produce two children: an older daughter with short fingers and a younger son with normal fingers.
(a) Using correct symbols and labels, draw a pedigree illustrating the inheritance of short fingers in Joanna’s family.
(b) What is the most likely mode of inheritance for short fingers in this family?
(c) If Joanna and Tom have another biological child, what is the probability (based on your answer to part b) that this child will have short fingers?

• Solution
(a) In the pedigree for the family, note that persons with the trait (short fingers) are indicated by filled circles (females) and filled squares (males). Joanna’s identical twin brothers are connected to the line above with diagonal lines that have a horizontal line between them. The adopted child of Joanna and Tom is enclosed in brackets and is connected to the biological parents by a dashed diagonal line.
[Pedigree of Joanna’s family, with Joanna (III-6) marked as the proband (P); figure not reproduced here.]
(b) The most likely mode of inheritance for short fingers in this family is autosomal dominant. The trait appears equally in males and females and does not skip generations. When one parent has the trait, it appears in approximately half of that parent’s sons and daughters, although the number of children in the families is small. We can eliminate Y-linked inheritance because the trait is found in females. If short fingers were X-linked recessive, females with the trait would be expected to pass the trait to all their sons, but Joanna (III-6), who has short fingers, produced a son with normal fingers. For X-linked dominant traits, affected men should pass the trait to all their daughters; because male II-1 has short fingers and produced two daughters without short fingers (III-7 and III-8), we know that the trait cannot be X-linked dominant. It is unlikely that the trait is autosomal recessive because it does not skip generations and approximately half of the children of affected parents have the trait.
(c) If having short fingers is autosomal dominant, Tom must be homozygous (bb) because he has normal fingers. Joanna must be heterozygous (Bb) because she and Tom have produced both short- and normal-fingered offspring. In a cross between a heterozygote and homozygote, half of the progeny are expected to be heterozygous and half homozygous (Bb × bb → 1/2 Bb, 1/2 bb); so the probability that Joanna’s and Tom’s next biological child will have short fingers is 1/2.

2. Concordance values for a series of traits were measured in monozygotic twins and dizygotic twins; the results are shown in the following table. For each trait, indicate whether the rates of concordance suggest genetic influences, environmental influences, or both. Explain your reasoning.
Characteristic | Monozygotic concordance (%) | Dizygotic concordance (%)
(a) ABO blood type | 100 | 65
(b) Diabetes | 85 | 36
(c) Coffee drinking | 80 | 80
(d) Smoking | 75 | 42
(e) Schizophrenia | 53 | 16

• Solution
(a) The concordance of ABO blood type in the monozygotic twins is 100%. This high concordance in monozygotic twins does not, by itself, indicate a genetic basis for the trait. An important indicator of a genetic influence on the trait is lower concordance in dizygotic twins. Because concordance for ABO blood type is substantially lower in the dizygotic twins, we would be safe in concluding that genes play a role in determining differences in ABO blood types.
(b) The concordance for diabetes is substantially higher in monozygotic twins than in dizygotic twins; therefore, we can conclude that genetic factors play some role in susceptibility to diabetes. The fact that monozygotic twins show a concordance less than 100% suggests that environmental factors also play a role. (c) Both monozygotic and dizygotic twins exhibit the same high concordance for coffee drinking; so we can conclude that there is little genetic influence on coffee drinking. The fact that monozygotic twins show a concordance less than 100% suggests that environmental factors play a role. (d) There is lower concordance of smoking in dizygotic twins than in monozygotic twins, so genetic factors appear to influence the tendency to smoke. The fact that monozygotic twins show a concordance less than 100% suggests that environmental factors also play a role. (e) Monozygotic twins exhibit substantially higher concordance for schizophrenia than do dizygotic twins; so we can conclude that genetic factors influence this psychiatric disorder. Because the concordance of monozygotic twins is substantially less than 100%, we can also conclude that environmental factors play a role in the disorder as well. 154 Chapter 6 The New Genetics MINING GENOMES INTRODUCTION TO BIOINFORMATICS AND THE NATIONAL CENTER FOR BIOTECHNOLOGY INFORMATION (NCBI) Biology and computer science merge in the field of bioinformatics, allowing biologists to ask questions and develop perspectives that would never be possible with traditional techniques. This project will introduce you to the diverse suite of powerful interactive bioinformatics tools at the National Center for Biotechnology Information (NCBI). COMPREHENSION QUESTIONS * 1. What three factors complicate the task of studying the inheritance of human characteristics? * 2. 
Describe the features that will be exhibited in a pedigree in which a trait is segregating with each of the following modes of inheritance: autosomal recessive, autosomal dominant, X-linked recessive, X-linked dominant, and Y-linked inheritance. * 3. What are the two types of twins and how do they arise? 4. Explain how a comparison of concordance in monozygotic and dizygotic twins can be used to determine the extent to which the expression of a trait is influenced by genes or by environmental factors. 5. How are adoption studies used to separate the effects of genes and environment in the study of human characteristics? * 6. What is genetic counseling? 7. Briefly define newborn screening, heterozygote screening, presymptomatic testing, and prenatal diagnosis. * 8. What are the differences between amniocentesis and chorionic villus sampling? What is the purpose of these two techniques? 9. What is preimplantation genetic diagnosis? APPLICATION QUESTIONS AND PROBLEMS * 10. Joe is color-blind. His mother and father both have normal vision, but his mother’s father (Joe’s maternal grandfather) is color-blind. All Joe’s other grandparents have normal color vision. Joe has three sisters — Patty, Betsy, and Lora — all with normal color vision. Joe’s oldest sister, Patty, is married to a man with normal color vision; they have two children, a 9-year old color-blind boy and a 4-year-old girl with normal color vision. (a) Using correct symbols and labels, draw a pedigree of Joe’s family. (b) What is the most likely mode of inheritance for color blindness in Joe’s family? (c) If Joe marries a woman who has no family history of color blindness, what is the probability that their first child will be a color-blind boy? (d) If Joe marries a woman who is a carrier of the color-blind allele, what is the probability that their first child will be a color-blind boy? (e) If Patty and her husband have another child, what is the probability that the child will be a color-blind boy? 11. 
A man with a specific unusual genetic trait marries an unaffected woman and they have four children. Pedigrees of this family are shown in parts a through e, but the presence or absence of the trait in the children is not indicated. For each type of inheritance, indicate how many children of each sex are expected to express the trait by filling in the appropriate circles and squares. Assume that the trait is rare and fully penetrant.
(a) Autosomal recessive trait
(b) Autosomal dominant trait
(c) X-linked recessive trait
(d) X-linked dominant trait
(e) Y-linked trait
[Blank pedigree for each part; figures not reproduced here.]

* 12. For each of the following pedigrees, give the most likely mode of inheritance, assuming that the trait is rare. Carefully explain your reasoning.
[Pedigrees (a) through (e); figures not reproduced here.]

13. The trait represented in the following pedigree is expressed only in the males of the family. Is the trait Y linked? Why or why not? If you believe the trait is not Y linked, propose an alternate explanation for its inheritance.
[Pedigree; figure not reproduced here.]

* 14. A geneticist studies a series of characteristics in monozygotic twins and dizygotic twins, obtaining the following concordances. For each characteristic, indicate whether the rates of concordance suggest genetic influences, environmental influences, or both. Explain your reasoning.
Characteristic | Monozygotic concordance (%) | Dizygotic concordance (%)
Migraine headaches | 60 | 30
Eye color | 100 | 40
Measles | 90 | 90
Clubfoot | 30 | 10
High blood pressure | 70 | 40
Handedness | 70 | 70
Tuberculosis | 5 | 5

15. In a study of schizophrenia (a mental disorder including disorganization of thought and withdrawal from reality), researchers looked at the prevalence of the disorder in the biological and adoptive parents of people who were adopted as children; they found the following results:
Prevalence of schizophrenia (%)
Adopted persons | Biological parents | Adoptive parents
With schizophrenia | 12 | 2
Without schizophrenia | 6 | 4
(Source: S. S. Kety, et al., The biological and adoptive families of adopted individuals who become schizophrenic: prevalence of mental illness and other characteristics, In The Nature of Schizophrenia: New Approaches to Research and Treatment, L. C. Wynne, R. L. Cromwell, and S. Matthysse, Eds. (New York: Wiley, 1978), pp. 25–37.)
What can you conclude from these results concerning the role of genetics in schizophrenia? Explain your reasoning.

* 16. The following pedigree illustrates the inheritance of Nance-Horan syndrome, a rare genetic condition in which affected persons have cataracts and abnormally shaped teeth.
[Pedigree; figure not reproduced here.]
(Pedigree adapted from D. Stambolian, R. A. Lewis, K. Buetow, A. Bond, and R. Nussbaum. American Journal of Human Genetics 47(1990):15.)
(a) On the basis of this pedigree, what do you think is the most likely mode of inheritance for Nance-Horan syndrome?
(b) If couple III-7 and III-8 have another child, what is the probability that the child will have Nance-Horan syndrome?
(c) If III-2 and III-7 mated, what is the probability that one of their children would have Nance-Horan syndrome?

17. The following pedigree illustrates the inheritance of ringed hair, a condition in which each hair is differentiated into light and dark zones. What mode or modes of inheritance are possible for the ringed-hair trait in this family?
[Pedigree; figure not reproduced here.]
(Pedigree adapted from L. M. Ashley, and R. S. Jacques, Journal of Heredity 41(1950):83.)

* 18. Ectodactyly is a rare condition in which the fingers are absent and the hand is split. This condition is usually inherited as an autosomal dominant trait. Ademar Freire-Maia reported the appearance of ectodactyly in a family in Sao Paulo, Brazil, whose pedigree is shown here. Is this pedigree consistent with autosomal dominant inheritance? If not, what mode of inheritance is most likely? Explain your reasoning.
[Pedigree; figure not reproduced here.]
(Pedigree adapted from A. Freire-Maia, Journal of Heredity 62(1971):53.)

CHALLENGE QUESTIONS
19. Draw a pedigree that represents an autosomal dominant trait, sex-limited to males, and that excludes the possibility that the trait is Y linked.
20. Androgen insensitivity syndrome is a rare disorder of sexual development, in which people with an XY karyotype, genetically male, develop external female features. All persons with androgen insensitivity syndrome are infertile. In the past, some researchers proposed that androgen insensitivity syndrome is inherited as a sex-limited, autosomal dominant trait. (It is sex-limited because females cannot express the trait.) Other investigators suggested that this disorder is inherited as an X-linked recessive trait. Draw a pedigree that would show conclusively that androgen insensitivity syndrome is inherited as an X-linked recessive trait and that excludes the possibility that it is sex-limited, autosomal dominant. If you believe that no pedigree can conclusively differentiate between the two choices (sex-limited, X-linked recessive and sex-limited, autosomal dominant), explain why. Remember that all affected persons are infertile.

SUGGESTED READINGS
Barsh, G. S., I. S. Farooqi, and S. O’Rahilly. 2000. Genetics of body-weight regulation. Nature 404:644–651. An excellent review of the genetics of body weight in humans.
This issue of Nature has a section on obesity, with additional review articles on obesity as a medical problem, on the molecular basis of thermogenesis, on nervous-system control of food intake, and medical strategies for treatment of obesity. Bennett, R. L., K. A. Steinhaus, S. B. Uhrich, C. K. O’Sullivan, R. G. Resta, D. Lochner-Doyle, D. S. Markel, V. Vincent, and J. Hamanishi. 1995. Recommendations for standardized human pedigree nomenclature. American Journal of Human Genetics 56:745–752. Contains recommendations for standardized symbols used in pedigree construction. Brown, M. S., and J. L. Goldstein. 1984. How LDL receptors influence cholesterol and atherosclerosis. Scientific American 251 November: 58–66. Excellent review of the genetics of atherosclerosis by two scientists who received the Nobel Prize for their research on atherosclerosis. Devor, E. J., and C. R. Cloninger. 1990. Genetics of alcoholism. Annual Review of Genetics 23:19–36. A good review of how genes influence alcoholism in humans. Gurney, M. E., A. G. Tomasselli, and R. L. Heinrikson. 2000. Stay the executioner’s hand. Science 288:283–284. Reports new evidence that mutated SOD1 may be implicated in apoptosis (programmed cell death) in people with amyotrophic lateral sclerosis. Harper, P. S. 1998. Practical Genetic Counseling, 5th ed. Oxford: Butterworth Heineman. A classic textbook on genetic counseling. Jorde, L. B., J. C. Carey, M. J. Bamshad, and R. L. White. 1998. Medical Genetics, 2d ed. St. Louis: Mosby. A textbook on medical aspects of human genetics. Lewis, R. 1994. The evolution of a classical genetic tool. Bioscience 44:722–726. A well-written review of the history of pedigree analysis and recent changes in symbols that have been necessitated by changing life styles and new reproductive technologies. 158 Chapter 6 Mange, E. J., and A. P. Mange. 1998. Basic Human Genetics, 2d ed. Sunderland, MA: Sinauer. A well-written textbook on human genetics. MacGregor, A. J., H. Snieder, N. J. 
Schork, and T. D. Spector. 2000. Twins: novel uses to study complex traits and genetic diseases. Trends in Genetics 16:131–134. A discussion of new methods for using twins in the study of genes. Mahowald, M. B., M. S. Verp, and R. R. Anderson. 1998. Genetic counseling: clinical and ethical challenges. Annual Review of Genetics 32:547–559. A review of genetic counseling in light of the Human Genome Project, with special consideration of the role of nondirected counseling. McKusick, V. A. 1998. Mendelian Inheritance in Man: A Catalog of Human Genes and Genetic Disorders, 12th ed. Baltimore: Johns Hopkins University Press. A comprehensive catalog of all known simple human genetic disorders and the genes responsible for them. Pierce, B. A. 1990. The Family Genetic Source Book. New York: Wiley. A book on human genetics written for the layperson; contains a catalog of more than 100 human genetic traits. Stunkard, A. J., T. I. Sorensen, C. Hanis, T. W. Teasdale, R. Chakraborty, W. J. Schull, and F. Schulsinger. 1986. An adoption study of human obesity. The New England Journal of Medicine 314:193–198. Describes the Danish adoption study of obesity. 7 Linkage, Recombination, and Eukaryotic Gene Mapping • Alfred Sturtevant and the First Genetic Map • Genes That Assort Independently and Those That Don’t • Linkage and Recombination Between Two Genes Notation for Crosses with Linkage Complete Linkage Compared with Independent Assortment Crossing Over with Linked Genes Calculation of Recombination Frequency Coupling and Repulsion The Physical Basis of Recombination Predicting the Outcome of Crosses with Linked Genes Testing for Independent Assortment Gene Mapping with Recombination Frequencies Constructing a Genetic Map with Two-Point Testcrosses • Linkage and Recombination Between Three Genes Gene Mapping with the Three-Point Testcross Alfred Henry Sturtevant, an early geneticist, developed the first genetic map. (Institute Archives, California Institute of Technology.) 
Gene Mapping in Humans Mapping with Molecular Markers • Physical Chromosome Mapping Deletion Mapping Somatic-Cell Hybridization In Situ Hybridization Mapping by DNA Sequencing

Alfred Sturtevant and the First Genetic Map
In 1909, Thomas Hunt Morgan taught the introduction to zoology class at Columbia University. Seated in the lecture hall were sophomore Alfred Henry Sturtevant and freshman Calvin Bridges. Sturtevant and Bridges were excited by Morgan’s teaching style and intrigued by his interest in biological problems. They asked Morgan if they could work in his laboratory and, the following year, both young men were given desks in the “fly room,” Morgan’s research laboratory where the study of Drosophila genetics was in its infancy (see p. 000 in Chapter 4). Sturtevant, Bridges, and Morgan’s other research students virtually lived in the laboratory, raising fruit flies, designing experiments, and discussing their results.

In the course of their research, Morgan and his students observed that some pairs of genes did not segregate randomly according to Mendel’s principle of independent assortment but instead tended to be inherited together. Morgan suggested that possibly the genes were located on the same chromosome and thus traveled together during meiosis. He further proposed that closely linked genes, those that are rarely shuffled by recombination, lie close together on the same chromosome, whereas loosely linked genes, those more frequently shuffled by recombination, lie farther apart.

One day in 1911, Sturtevant and Morgan were discussing independent assortment when, suddenly, Sturtevant had a flash of inspiration: variation in the strength of linkage indicated how genes were positioned along a chromosome, providing a way of mapping genes. Sturtevant went home and, neglecting his undergraduate homework, spent most of the night working out the first genetic map ( ◗ FIGURE 7.1). Sturtevant’s first chromosome map was remarkably accurate, and it established the basic methodology used today for mapping genes.

◗ 7.1 Sturtevant’s map included five genes on the X chromosome of Drosophila. The genes are yellow body (y), white eyes (w), vermilion eyes (v), miniature wings (m), and rudimentary wings (r). Sturtevant’s original symbols for the genes are shown above the line; modern symbols are shown below with their current locations on the X chromosome.

Alfred Sturtevant went on to become a leading geneticist. His research included gene mapping and basic mechanisms of inheritance in Drosophila, cytology, embryology, and evolution. Sturtevant’s career was deeply influenced by his early years in the fly room, where Morgan’s unique personality and the close quarters combined to stimulate intellectual excitement and the free exchange of ideas.

www.whfreeman.com/pierce More details about Alfred Sturtevant’s life

This chapter explores the inheritance of genes located on the same chromosome. These linked genes do not strictly obey Mendel’s principle of independent assortment; rather, they tend to be inherited together. This tendency requires a new approach to understanding their inheritance and predicting the types of offspring produced. A critical piece of information necessary for predicting the results of these crosses is the arrangement of the genes on the chromosomes; thus, it will be necessary to think about the relation between genes and chromosomes. A key to understanding the inheritance of linked genes is to make the conceptual connection between the genotypes in a cross and the behavior of chromosomes during meiosis.
We will begin our exploration of linkage by first comparing the inheritance of two linked genes with the inheritance of two genes that assort independently. We will then examine how crossing over breaks up linked genes. This knowledge of linkage and recombination will be used for predicting the results of genetic crosses in which genes are linked and for mapping genes. The last section of the chapter focuses on physical methods of determining the chromosomal locations of genes.

Genes That Assort Independently and Those That Don’t
Chapter 3 introduced Mendel’s principles of segregation and independent assortment. Let’s take a moment to review these two important concepts. The principle of segregation states that each diploid individual possesses two alleles that separate in meiosis, with one allele going into each gamete. The principle of independent assortment provides additional information about the process of segregation: it tells us that the two alleles separate independently of alleles at other loci. The independent separation of alleles produces recombination, the sorting of alleles into new combinations. Consider a cross between individuals homozygous for two different pairs of alleles: AABB × aabb. The first parent, AABB, produces gametes with alleles AB, and the second parent, aabb, produces gametes with the alleles ab, resulting in F1 progeny with genotype AaBb ( ◗ FIGURE 7.2). Recombination means that, when one of the F1 progeny reproduces, the combination of alleles in its gametes may differ from the combinations in the gametes of its parents. In other words, the F1 may produce gametes with alleles Ab or aB in addition to gametes with AB or ab.

Mendel derived his principles of segregation and independent assortment by observing progeny of genetic crosses, but he had no idea of what biological processes produced these phenomena.
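The gamete types described above for an AaBb individual can be enumerated with a short sketch. This is only an illustration under the assumption of independent assortment; the function name `gametes` and the list-of-pairs representation of a genotype are hypothetical choices made here, not notation from the chapter.

```python
from itertools import product

def gametes(genotype):
    """Return every allele combination a diploid individual can place in a gamete.

    `genotype` holds one allele pair per locus, e.g. [("A", "a"), ("B", "b")].
    One allele is drawn from each pair (segregation), and the loci are
    combined freely (independent assortment is assumed).
    """
    return sorted({"".join(combo) for combo in product(*genotype)})

# F1 progeny of the cross AABB x aabb
f1 = [("A", "a"), ("B", "b")]

# Gametes contributed by the two P-generation parents
parental = {"AB", "ab"}

all_gametes = gametes(f1)
nonrecombinant = [g for g in all_gametes if g in parental]
recombinant = [g for g in all_gametes if g not in parental]

print("nonrecombinant:", nonrecombinant)  # original combinations of alleles
print("recombinant:", recombinant)        # new combinations of alleles
```

The sketch lists gamete types only; it says nothing about their frequencies, which is where linked genes depart from independent assortment.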
In 1903, Walter Sutton proposed a biological basis for Mendel’s principles, called the chromosome theory of heredity (Chapter 3). This theory holds that genes are found on chromosomes. Let’s restate Mendel’s two principles in terms of the chromosome theory of heredity. The principle of segregation states that each diploid individual possesses two alleles for a trait, each of which is located at the same position, or locus, on each of the two homologous chromosomes. These chromosomes segregate in meiosis, with each gamete receiving one homolog. The principle of independent assortment states that, in meiosis, each pair of homologous chromosomes assorts independently of other homologous pairs. With this new perspective, it is easy to see that the number of chromosomes in most organisms is limited and that there are certain to be more genes than chromosomes; so some genes must be present on the same chromosome and should not assort independently. Genes located close together on the same chromosome are called linked genes and belong to the same linkage group. As we’ve said, linked genes travel together during meiosis, eventually arriving at the same destination (the same gamete), and are not expected to assort independently.

◗ 7.2 Recombination is the sorting of alleles into new combinations. In the P generation (AA BB × aa bb), the parents form gametes AB and ab, and fertilization produces F1 progeny Aa Bb. The F1 forms gametes AB and ab (original combinations of alleles; nonrecombinant gametes) and gametes Ab and aB (new combinations of alleles; recombinant gametes). Conclusion: Through recombination, gametes contain new combinations of alleles.

However, all of the characteristics examined by Mendel in peas did display independent assortment and, after the rediscovery of Mendel’s work, the first genetic characteristics studied in other organisms also seemed to assort independently.
How could genes be carried on a limited number of chromosomes and yet assort independently? The apparent inconsistency between the principle of independent assortment and the chromosome theory of heredity soon disappeared, as biologists began finding genetic characteristics that did not assort independently. One of the first cases was reported in sweet peas by William Bateson, Edith Rebecca Saunders, and Reginald C. Punnett in 1905. They crossed a homozygous strain of peas having purple flowers and long pollen grains with a homozygous strain having red flowers and round pollen grains. All the F1 had purple flowers and long pollen grains, indicating that purple was dominant over red and long was dominant over round. When they intercrossed the F1, the resulting F2 progeny did not appear in the 9:3:3:1 ratio expected with independent assortment ( ◗ FIGURE 7.3). An excess of F2 plants had purple flowers and long pollen or red flowers and round pollen (the parental phenotypes). Although Bateson, Saunders, and Punnett were unable to explain these results, we now know that the two loci that they examined lie close together on the same chromosome and therefore do not assort independently.

◗ 7.3 Nonindependent assortment of flower color and pollen shape in sweet peas.

Linkage and Recombination Between Two Genes

Genes on the same chromosome are like passengers on a charter bus: they travel together and ultimately arrive at the same destination.

◗ 7.4 Crossing over takes place in meiosis and is responsible for recombination. (Genes may switch from one homologous chromosome to another by crossing over in meiosis I. In meiosis II, genes that are normally linked will then assort independently and end up in different gametes.)
However, genes occasionally switch from one homologous chromosome to another through the process of crossing over (Chapter 2) ( ◗ FIGURE 7.4). Crossing over produces recombination: it breaks up the associations of genes imposed by linkage. As will be discussed later, genes located on the same chromosome can exhibit independent assortment if they are far enough apart. In summary, linkage adds a further complication to interpretations of the results of genetic crosses. With an understanding of how linkage affects heredity, we can analyze crosses for linked genes and successfully predict the types of progeny that will be produced.

Notation for Crosses with Linkage

In analyzing crosses with linked genes, we must know not only the genotypes of the individuals crossed, but also the arrangement of the genes on the chromosomes. To keep track of this arrangement, we will introduce a new system of notation for presenting crosses with linked genes. Consider a cross between an individual homozygous for dominant alleles at two linked loci and another individual homozygous for recessive alleles at those loci. Previously, we would have written these genotypes as:

AABB × aabb

For linked genes, however, it’s necessary to write out the specific alleles as they are arranged on each of the homologous chromosomes:

A     B         a     b
-------         -------
-------    ×    -------
A     B         a     b

In this notation, each line represents one of the two homologous chromosomes. In the first parent of the cross, each homologous chromosome contains A and B alleles; in the second parent, each homologous chromosome contains a and b alleles. Inheriting one chromosome from each parent, the F1 progeny will have the following genotype:

A     B
-------
-------
a     b

Here, the importance of designating the alleles on each chromosome is clear. One chromosome has the two dominant alleles A and B, whereas the homologous chromosome has the two recessive alleles a and b.
The notation can be simplified by drawing only a single line, with the understanding that genes located on the same side of the line lie on the same chromosome:

A     B
-------
a     b

This notation can be simplified further by separating the alleles on each chromosome with a slash: AB/ab. Remember that the two alleles at a locus are always located on different homologous chromosomes and therefore must lie on opposite sides of the line. Consequently, we would never write the genotypes as:

A     a
-------
B     b

because the alleles A and a can never be on the same chromosome. It is also important to always keep the same order of the genes on both sides of the line; thus, we should never write:

A     B
-------
b     a

because this would imply that alleles A and b are allelic (at the same locus).

Complete Linkage Compared with Independent Assortment

We will first consider what happens to genes that exhibit complete linkage, meaning that they are located on the same chromosome and do not exhibit detectable crossing over. Genes rarely exhibit complete linkage but, without the complication of crossing over, the effect of linkage can be seen more clearly. We will then consider what happens when genes assort independently. Finally, we will consider the results obtained if the genes are linked but exhibit some crossing over.

A testcross reveals the effects of linkage. For example, if a heterozygous individual is test-crossed with a homozygous recessive individual (AaBb × aabb), whatever alleles are present in the gametes contributed by the heterozygous parent will be expressed in the phenotype of the offspring, because the homozygous parent could not contribute dominant alleles that might mask them. Consequently, traits that appear in the progeny reveal which alleles were transmitted by the heterozygous parent. Consider a pair of linked genes in tomato plants.
One locus affects the type of leaf: an allele for mottled leaves (m) is recessive to an allele that produces normal leaves (M). Nearby on the same chromosome is another locus that determines the height of the plant: an allele for dwarf (d) is recessive to an allele for tall (D). Testing for linkage can be done with a testcross, which requires a plant heterozygous for both traits. A geneticist might produce this heterozygous plant by crossing a variety of tomato that is homozygous for normal leaves and tall height with a variety that is homozygous for mottled leaves and dwarf height:

P:    M D / M D  ×  m d / m d
F1:   M D / m d

The geneticist would then use these F1 heterozygotes in a testcross, crossing them with plants homozygous for mottled leaves and dwarf height:

M D / m d  ×  m d / m d

The results of this testcross are diagrammed in ◗ FIGURE 7.5a. During gamete formation, the heterozygote produces two types of gametes: some with the M D chromosome and others with the m d chromosome. Because no crossing over occurs, these gametes are the only types produced by the heterozygote. Notice that these gametes contain only combinations of alleles that were present in the original parents: either the allele for normal leaves together with the allele for tall height (M and D) or the allele for mottled leaves together with the allele for dwarf height (m and d). Gametes that contain only original combinations of alleles present in the parents are nonrecombinant gametes, or parental gametes. The homozygous parent in the testcross produces only one type of gamete; it contains chromosome m d and pairs with one of the two gametes generated by the heterozygous parent (see Figure 7.5a). Two types of progeny result: half have normal leaves and are tall (M D / m d), and half have mottled leaves and are dwarf (m d / m d). These progeny display the original combinations of traits present in the P generation and are nonrecombinant progeny, or parental progeny.
No new combinations of the two traits, such as normal leaves with dwarf or mottled leaves with tall, appear in the offspring, because the genes affecting the two characteristics are completely linked and are inherited together. New combinations of traits could arise only if the linkage between M and D or between m and d were broken.

These results are distinctly different from the results that are expected when genes assort independently ( ◗ FIGURE 7.5b). With independent assortment, the heterozygous plant (MmDd) would produce four types of gametes: two nonrecombinant gametes containing the original combinations of alleles (MD and md) and two gametes containing new combinations of alleles (Md and mD). Gametes with new combinations of alleles are called recombinant gametes. With independent assortment, nonrecombinant and recombinant gametes are produced in equal proportions. These four types of gametes join with the single type of gamete produced by the homozygous parent of the testcross to produce four kinds of progeny in equal proportions (see Figure 7.5b). The progeny with new combinations of traits formed from recombinant gametes are termed recombinant progeny.

In summary, a testcross in which one of the plants is heterozygous for two completely linked genes yields two types of progeny, each type displaying one of the original combinations of traits present in the P generation. Independent assortment, in contrast, produces two types of recombinant progeny and two types of nonrecombinant progeny in equal proportions.

◗ 7.5 A testcross reveals the effects of linkage. Results of a testcross for two loci in tomatoes that determine leaf type and plant height.

Crossing Over with Linked Genes

Linkage is rarely complete; usually, there is some crossing over between linked genes (incomplete linkage), producing new combinations of traits. Let’s see how this occurs.
Theory

The effect of crossing over on the inheritance of two linked genes is shown in ◗ FIGURE 7.6. Crossing over, which takes place in prophase I of meiosis, is the exchange of genetic material between nonsister chromatids (see Figures 2.15 and 2.17). After a single crossover has taken place, the two chromatids that did not participate in crossing over are unchanged; gametes that receive these chromatids are nonrecombinants. The other two chromatids, which did participate in crossing over, now contain new combinations of alleles; gametes that receive these chromatids are recombinants. For each meiosis in which a single crossover takes place, then, two nonrecombinant gametes and two recombinant gametes will be produced. This result is the same as that produced by independent assortment (see Figure 7.5b); so, when crossing over between two loci takes place in every meiosis, it is impossible to determine whether the genes were linked and crossing over took place or whether the genes are on different chromosomes.

◗ 7.6 Crossing over produces half nonrecombinant gametes and half recombinant gametes. (a) No crossing over: homologous chromosomes pair in prophase I; if no crossing over occurs, all resulting chromosomes in gametes have original allele combinations and are nonrecombinants. (b) Crossing over: a crossover may occur in prophase I; in this case, half of the resulting gametes will have unchanged chromosomes (nonrecombinants) and half will have recombinant chromosomes.

For closely linked genes, crossing over does not take place in every meiosis. In meioses in which there is no crossing over, only nonrecombinant gametes are produced.
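The gamete bookkeeping described above can be tallied in a few lines. The following is a minimal Python sketch, not part of the original text. It assumes that each meiosis has at most one crossover between the two loci (double crossovers are ignored), and the function name `recombinant_gamete_fraction` is hypothetical.

```python
from fractions import Fraction

GAMETES_PER_MEIOSIS = 4  # one meiosis yields four gametes


def recombinant_gamete_fraction(crossover_fraction):
    """Fraction of all gametes that are recombinant.

    crossover_fraction: fraction of meioses in which a single crossover
    occurs between the two loci (assumption: never more than one).
    """
    crossover_fraction = Fraction(crossover_fraction)
    if not 0 <= crossover_fraction <= 1:
        raise ValueError("crossover_fraction must lie between 0 and 1")
    # A meiosis with a single crossover gives 2 recombinant gametes;
    # a meiosis without a crossover gives 0 recombinant gametes.
    recombinants_per_meiosis = 2 * crossover_fraction + 0 * (1 - crossover_fraction)
    return recombinants_per_meiosis / GAMETES_PER_MEIOSIS


every_meiosis = recombinant_gamete_fraction(Fraction(1))
one_in_four = recombinant_gamete_fraction(Fraction(1, 4))
```

With a crossover in every meiosis the sketch returns 1/2, matching the two-recombinant, two-nonrecombinant outcome described above; with a crossover in one meiosis out of four it returns 1/8.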
In meioses in which there is a single crossover, half the gametes are recombinants and half are nonrecombinants (because a single crossover affects only two of the four chromatids); so the total percentage of recombinant gametes is always half the percentage of meioses in which crossing over takes place. Even if crossing over between two genes takes place in every meiosis, only 50% of the resulting gametes will be recombinants. Thus, the frequency of recombinant gametes is always half the frequency of crossing over, and the maximum proportion of recombinant gametes is 50%.

Concepts

Linkage between genes causes them to be inherited together and reduces recombination; crossing over breaks up the associations of such genes. In a testcross for two linked genes, each crossover produces two recombinant gametes and two nonrecombinants. The frequency of recombinant gametes is half the frequency of crossing over, and the maximum frequency of recombinant gametes is 50%.

Application

Let us apply what we have learned about linkage and recombination to a cross between tomato plants that differ in the genes that code for leaf type and plant height. Assume now that these genes are linked and that some crossing over takes place between them. Suppose a geneticist carried out the testcross outlined earlier:

M D / m d  ×  m d / m d

When crossing over takes place between the genes for leaf type and height, two of the four gametes produced will be recombinants. When there is no crossing over, all four resulting gametes will be nonrecombinants. Thus, overall, the majority of gametes will be nonrecombinants. These gametes then unite with gametes produced by the homozygous recessive parent, which contain only the recessive alleles, resulting in mostly nonrecombinant progeny and a few recombinant progeny ( ◗ FIGURE 7.7). In this cross, we see that 55 of the testcross progeny have normal leaves and are tall and 53 have mottled leaves and are dwarf.
These plants are the nonrecombinant progeny, containing the original combinations of traits that were present in the parents. Of the 123 progeny, 15 have new combinations of traits that were not seen in the parents: 8 are normal leaved and dwarf, and 7 are mottled leaved and tall. These plants are the recombinant progeny.

The results of a cross such as the one illustrated in Figure 7.7 reveal several things. A testcross for two independently assorting genes is expected to produce a 1:1:1:1 phenotypic ratio in the progeny. The progeny of this cross clearly do not exhibit such a ratio; so we might suspect that the genes are not assorting independently. When linked genes undergo crossing over, the result is mostly nonrecombinant progeny and fewer recombinant progeny. This result is what we observe among the progeny of the testcross illustrated in Figure 7.7; so we conclude that the two genes show evidence of linkage with some crossing over.

Calculation of Recombination Frequency

The percentage of recombinant progeny produced in a cross is called the recombination frequency, which is calculated as follows:

\[ \text{recombination frequency} = \frac{\text{number of recombinant progeny}}{\text{total number of progeny}} \times 100\% \]

In the testcross shown in Figure 7.7, 15 progeny exhibit new combinations of traits; so the recombination frequency is:

\[ \frac{8 + 7}{55 + 53 + 8 + 7} \times 100\% = \frac{15}{123} \times 100\% \approx 12\% \]

Thus, 12% of the progeny exhibit new combinations of traits resulting from crossing over.

Coupling and Repulsion

In crosses for linked genes, the arrangement of alleles on the homologous chromosomes is critically important in determining the outcome of the cross. For example, consider the inheritance of two linked genes in the Australian blowfly, Lucilia cuprina. In this species, one locus determines the color of the thorax: purple thorax (p) is recessive to the normal green thorax (p+). A second locus determines the color of the puparium: a black puparium (b) is recessive to the normal brown puparium (b+).
These loci are located close together on the second chromosome. Suppose we test-cross a fly that is heterozygous at both loci with a fly that is homozygous recessive at both. Because these genes are linked, there are two possible arrangements on the chromosomes of the heterozygous fly. The dominant alleles for green thorax (p+) and brown puparium (b+) might reside on the same chromosome, and the recessive alleles for purple thorax (p) and black puparium (b) might reside on the other homologous chromosome:

p+ b+ / p b

This arrangement, in which wild-type alleles are found on one chromosome and mutant alleles are found on the other chromosome, is referred to as coupling, or the cis configuration. Alternatively, one chromosome might bear the alleles for green thorax (p+) and black puparium (b), and the other chromosome would carry the alleles for purple thorax (p) and brown puparium (b+):

p+ b / p b+

This arrangement, in which each chromosome contains one wild-type and one mutant allele, is called the repulsion or trans configuration. Whether the alleles in the heterozygous parent are in coupling or repulsion determines which phenotypes will be most common among the progeny of a testcross. When the alleles are in the coupling configuration, the most numerous progeny types are those with green thorax and brown puparium and those with purple thorax and black puparium ( ◗ FIGURE 7.8a); but, when the alleles of the heterozygous parent are in repulsion, the most numerous progeny types are those with green thorax and black puparium and those with purple thorax and brown puparium ( ◗ FIGURE 7.8b).

◗ 7.7 Crossing over between linked genes produces nonrecombinant and recombinant offspring. In this testcross, genes are linked and there is some crossing over. For comparison, this cross is the same as that illustrated in Figure 7.5.
Notice that the genotypes of the parents in Figure 7.8a and b are the same (p+p b+b × pp bb) and that the dramatic difference in the phenotypic ratios of the progeny in the two crosses results entirely from the configuration, coupling or repulsion, of the chromosomes. It is essential to know the arrangement of the alleles on the chromosomes to accurately predict the outcome of crosses in which genes are linked.

◗ 7.8 The arrangement of linked genes on a chromosome (coupling or repulsion) affects the results of a testcross. Linked loci in the Australian blowfly, Lucilia cuprina, determine the color of the thorax and that of the puparium.

Concepts

In a cross, the arrangement of linked alleles on the chromosomes is critical for determining the outcome. When two wild-type alleles are on one homologous chromosome and two mutant alleles are on the other, they are in the coupling configuration; when each chromosome contains one wild-type allele and one mutant allele, the alleles are in repulsion.

Connecting Concepts

Relating Independent Assortment, Linkage, and Crossing Over

We have now considered three situations concerning genes at different loci. First, the genes may be located on different chromosomes; in this case, they exhibit independent assortment and combine randomly when gametes are formed. An individual heterozygous at two loci (AaBb) produces four types of gametes (AB, ab, Ab, and aB) in equal proportions: two types of nonrecombinants and two types of recombinants. Second, the genes may be completely linked, meaning that they’re on the same chromosome and lie so close together that crossing over between them is rare. In this case, the genes do not recombine. An individual heterozygous for two closely linked genes in the coupling configuration:

A B / a b

produces only the nonrecombinant gametes containing alleles AB or ab. The alleles do not assort into new combinations such as Ab or aB.
The third situation, incomplete linkage, is intermediate between the two extremes of independent assortment and complete linkage. Here, the genes are physically linked on the same chromosome, which prevents independent assortment. However, occasional crossovers break up the linkage and allow them to recombine. With incomplete linkage, an individual heterozygous at two loci produces four types of gametes — two types of recombinants and two types of nonrecombinants — but the nonrecombinants are produced more frequently than the recombinants because crossing over does not take place in every meiosis. Linkage and crossing over are two opposing forces: linkage binds alleles at different loci together, restricting their ability to associate freely, whereas crossing over breaks the linkage and allows alleles to assort into new combinations. Earlier in the chapter, the term recombination was defined as the sorting of alleles into new combinations. We can now distinguish between two types of recombination that differ in the mechanism that generates these new combinations of alleles. Interchromosomal recombination is between genes on different chromosomes. It arises from independent assortment — the random segregation of chromosomes in anaphase I of meiosis. Intrachromosomal recombination is between genes located on the same chromosome. It arises from crossing over — the exchange of genetic material in prophase I of meiosis. Both types of recombination produce new allele combinations in the gametes; so they cannot be distinguished by examining the types of gametes produced. Nevertheless, they can often be distinguished by the frequencies of types of gametes: interchromosomal recombination produces 50% nonrecombinant gametes and 50% recombinant gametes, whereas intrachromosomal recombination frequently produces less than 50% recombinant gametes. 
However, when the genes are very far apart on the same chromosome, intrachromosomal recombination also produces 50% recombinant gametes. The two mechanisms are then genetically indistinguishable.

Concepts

Recombination is the sorting of alleles into new combinations. Interchromosomal recombination, produced by independent assortment, is the sorting of alleles on different chromosomes into new combinations. Intrachromosomal recombination, produced by crossing over, is the sorting of alleles on the same chromosome into new combinations.

The Physical Basis of Recombination

Walter Sutton’s chromosome theory of inheritance, which stated that genes are physically located on chromosomes, was supported by Nettie Stevens and Edmund Wilson’s discovery that sex was associated with a specific chromosome in insects (Chapter 4) and Calvin Bridges’ demonstration that nondisjunction of X chromosomes was related to the inheritance of eye color in Drosophila (Chapter 4). Further evidence for the chromosome theory of heredity came in 1931, when Harriet Creighton and Barbara McClintock ( ◗ FIGURE 7.9) obtained evidence that intrachromosomal recombination was the result of physical exchange between chromosomes. Creighton and McClintock discovered a strain of corn that had an abnormal chromosome 9, containing a densely staining knob at one end and a small piece of another chromosome attached to the other end. This aberrant chromosome allowed them to visually distinguish the two members of a homologous pair.

◗ 7.9 Barbara McClintock (left) and Harriet Creighton (right) provided evidence that genes are located on chromosomes. (Karl Maramorosch/Cold Spring Harbor Laboratory Archives.)

They studied the inheritance of two traits in corn determined by genes on chromosome 9: at one locus, a dominant allele (C) produced colored kernels, whereas a recessive allele (c) produced colorless kernels; at another, linked locus, a dominant allele (Wx) produced starchy kernels, whereas a recessive allele (wx) produced waxy kernels. Creighton and McClintock obtained a plant that was heterozygous at both loci in repulsion, with the alleles for colored and waxy on the aberrant chromosome and the alleles for colorless and starchy on a normal chromosome:

(knob) C wx (extra piece) / c Wx

They crossed this heterozygous plant with a plant that was homozygous for colorless and heterozygous for waxy:

C wx / c Wx  ×  c Wx / c wx

This cross will produce different combinations of traits in the progeny, but the only way that colorless and waxy progeny can arise is through crossing over in the doubly heterozygous parent:

C wx / c Wx, with crossing over, gives recombinant chromosomes C Wx and c wx
Colored, starchy progeny: C Wx / c wx
Colorless, waxy progeny: c wx / c wx

Notice that, if crossing over entails physical exchange between the chromosomes, then the colorless, waxy progeny resulting from recombination should have a chromosome with an extra piece, but not a knob. Furthermore, some of the colored, starchy progeny should possess a knob but not the extra piece. This outcome is precisely what Creighton and McClintock observed, confirming the chromosomal theory of inheritance. Curt Stern provided a similar demonstration by using chromosomal markers in Drosophila at about the same time. We will examine the molecular basis of recombination in more detail in Chapter 12.

Predicting the Outcomes of Crosses with Linked Genes

Knowing the arrangement of alleles on a chromosome allows us to predict the types of progeny that will result from a cross entailing linked genes and to determine which of these types will be the most numerous. Determining the proportions of the types of offspring requires an additional piece of information: the recombination frequency.
The recombination frequency provides us with information about how often the alleles in the gametes appear in new combinations and allows us to predict the proportions of offspring phenotypes that will result from a specific cross entailing linked genes.

In cucumbers, smooth fruit (t) is recessive to warty fruit (T) and glossy fruit (d) is recessive to dull fruit (D). Geneticists have determined that these two genes exhibit a recombination frequency of 16%. Suppose we cross a plant homozygous for warty and dull fruit with a plant homozygous for smooth and glossy fruit and then carry out a testcross by using the F1:

T D / t d  ×  t d / t d

What types and proportions of progeny will result from this testcross? Four types of gametes will be produced by the heterozygous parent, as shown in ◗ FIGURE 7.10: two types of nonrecombinant gametes (T D and t d) and two types of recombinant gametes (T d and t D). The recombination frequency tells us that 16% of the gametes produced by the heterozygous parent will be recombinants. Because there are two types of recombinant gametes, each should arise with a frequency of 16%/2 = 8%. All the other gametes will be nonrecombinants; so they should arise with a frequency of 100% − 16% = 84%. Because there are two types of nonrecombinant gametes, each should arise with a frequency of 84%/2 = 42%. The other parent in the testcross is homozygous and therefore produces only a single type of gamete (t d) with a probability of 1.00.

The progeny of the cross result from the union of two gametes, producing four types of progeny (see Figure 7.10). The expected proportion of each type can be determined by using the multiplication rule, multiplying together the probability of each uniting gamete.
Testcross progeny with warty and dull fruit (T D / t d) appear with a frequency of 0.42 (the probability of inheriting a gamete with chromosome T D from the heterozygous parent) × 1.00 (the probability of inheriting a gamete with chromosome t d from the recessive parent) = 0.42. The proportions of the other types of testcross progeny can be calculated in a similar manner (see Figure 7.10). This method can be used for predicting the outcome of any cross with linked genes for which the recombination frequency is known.

◗ 7.10 The recombination frequency allows a prediction of the proportions of offspring expected for a cross entailing linked genes.

Testing for Independent Assortment

In some crosses, the genes are obviously linked because there are clearly more nonrecombinants than recombinants. In other crosses, the difference between independent assortment and linkage is not so obvious. For example, suppose we did a testcross for two pairs of genes, such as AaBb × aabb, and observed the following numbers of progeny: 54 AaBb, 56 aabb, 42 Aabb, and 48 aaBb. Is this outcome a 1:1:1:1 ratio? Not exactly, but it’s pretty close. Perhaps these genes are assorting independently and chance produced the slight deviations between the observed numbers and the expected 1:1:1:1 ratio. Alternatively, the genes might be linked, with considerable crossing over taking place between them, and so the number of nonrecombinants is only slightly greater than the number of recombinants. How do we distinguish between the roles of chance and of linkage in producing deviations from the results expected with independent assortment?

We encountered a similar problem in crosses in which genes were unlinked: the problem of distinguishing between deviations due to chance and those due to other factors.
We addressed this problem (in Chapter 3) with the goodness-of-fit chi-square test, which serves to evaluate the likelihood that chance alone is responsible for deviations between observed and expected numbers. The chi-square test can also be used to test the goodness of fit between observed numbers of progeny and the numbers expected with independent assortment.

Testing for independent assortment between two genes requires the calculation of a series of three chi-square tests. To illustrate this analysis, we will examine the data from a cross between German cockroaches, in which yellow body (y) is recessive to brown body (y+) and curved wings (cv) are recessive to straight wings (cv+). A testcross (y+y cv+cv × yy cvcv) produced the following progeny:

 63   y+y cv+cv   brown body, straight wings
 77   yy cvcv     yellow body, curved wings
 28   y+y cvcv    brown body, curved wings
 32   yy cv+cv    yellow body, straight wings
200   total progeny

Testing ratios at each locus

To determine if the genes for body color and wing shape are assorting independently, we must examine each locus separately and determine whether the observed numbers differ from the expected (we will consider why this step is necessary at the end of this section). At the first locus (for body color), the cross between heterozygote and homozygote (y+y × yy) is expected to produce 1/2 y+y brown and 1/2 yy yellow progeny; so we expect 100 of each. We observe 63 + 28 = 91 brown progeny and 77 + 32 = 109 yellow progeny. Applying the chi-square test (see Chapter 3) to these observed and expected numbers, we obtain:

\[ \chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(91 - 100)^2}{100} + \frac{(109 - 100)^2}{100} = \frac{81}{100} + \frac{81}{100} = 0.81 + 0.81 = 1.62 \]

The degrees of freedom associated with the chi-square test (Chapter 3) are n − 1, where n equals the number of expected classes. Here, there are two expected phenotypes; so the degree of freedom is 2 − 1 = 1. Looking up our calculated chi-square value in Table 3.4, we find that the probability associated with this chi-square value is between .30 and .20. Because the probability is above .05 (our critical probability for rejecting the hypothesis that chance produces the difference between observed and expected values), we conclude that there is no significant difference between the 1:1 ratio that we expect in the progeny of the testcross and the ratio that we observed.

We next compare the observed and expected ratios for the second locus, which determines the type of wing. At this locus, a heterozygote and homozygote also were crossed (cv+cv × cvcv) and are expected to produce 1/2 cv+cv straight-winged progeny and 1/2 cvcv curved-wing progeny. We actually observe 63 + 32 = 95 straight-winged progeny and 77 + 28 = 105 curved-wing progeny; so the calculated chi-square value is:

\[ \chi^2 = \frac{(95 - 100)^2}{100} + \frac{(105 - 100)^2}{100} = \frac{25}{100} + \frac{25}{100} = 0.25 + 0.25 = 0.50 \]

The degree of freedom associated with this chi-square value also is 2 − 1 = 1, and the associated probability is between .5 and .3. We again assume that there is no significant difference between what we observed and what we expected at this locus in the testcross.

Testing ratios for independent assortment

We are now ready to test for the independent assortment of genes at the two loci. If the genes are assorting independently, we can use the multiplication rule to obtain the probabilities and numbers of progeny inheriting different combinations of phenotypes:
Genotype     Expected phenotype   Expected proportion   Expected number   Observed number
y+y cv+cv    brown, straight      1/2 × 1/2 = 1/4       50                63
yy cvcv      yellow, curved       1/2 × 1/2 = 1/4       50                77
y+y cvcv     brown, curved        1/2 × 1/2 = 1/4       50                28
yy cv+cv     yellow, straight     1/2 × 1/2 = 1/4       50                32

The observed and expected numbers of progeny can now be compared by using the chi-square test:

\[ \chi^2 = \frac{(63 - 50)^2}{50} + \frac{(77 - 50)^2}{50} + \frac{(28 - 50)^2}{50} + \frac{(32 - 50)^2}{50} = 34.12 \]

Here, we have four expected classes of phenotypes; so the degrees of freedom equal 4 − 1 = 3 and the associated probability is considerably less than .001. This very small probability indicates that the phenotypes are not in the proportions that we would expect if independent assortment were taking place. Our conclusion, then, is that these genes are not assorting independently and must be linked.

In summary, testing for linkage between two genes requires a series of chi-square tests: a chi-square test for the segregation of alleles at each individual locus, followed by a test for independent assortment between alleles at the different loci. The chi-square tests for segregation at individual loci should always be carried out before testing for independent assortment, because the probabilities expected with independent assortment are based on the probabilities expected at the separate loci. Suppose that the genes in the cockroach example were assorting independently and that some of the cockroaches with curved wings died in embryonic development; the observed proportion with curved wings was then 1/3 instead of 1/2. In this case, the proportion of offspring with yellow body and curved wings expected under independent assortment should be 1/3 × 1/2 = 1/6 instead of 1/4. Without the initial chi-square test for segregation at the curved-wing locus, we would have no way of knowing that what we expected with independent assortment was 1/6 instead of 1/4. If we carried out only the final test for independent assortment and assumed an expected 1:1:1:1 ratio, we would obtain a high chi-square value.
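The sequence of three chi-square tests on the cockroach data can be sketched in Python as follows. This is a minimal illustration, not the text's own procedure in code form. The function names are hypothetical, the critical values 3.841 (1 degree of freedom) and 7.815 (3 degrees of freedom) are the standard chi-square cutoffs at a probability of .05 and are only assumed to agree with the chapter's Table 3.4, and the sketch assumes a testcross with expected 1:1 segregation at each locus.

```python
# Standard chi-square critical values at p = .05, keyed by degrees of freedom
# (assumed to match the table the chapter refers to).
CRITICAL_05 = {1: 3.841, 3: 7.815}


def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def linkage_analysis(dom_dom, rec_rec, dom_rec, rec_dom):
    """Run the two single-locus tests, then the independent-assortment test.

    Each argument is a progeny count; 'dom_rec' means dominant phenotype at
    the first locus and recessive phenotype at the second, and so on.
    """
    total = dom_dom + rec_rec + dom_rec + rec_dom
    half = total / 2
    locus1 = chi_square([dom_dom + dom_rec, rec_rec + rec_dom], [half, half])
    locus2 = chi_square([dom_dom + rec_dom, rec_rec + dom_rec], [half, half])
    result = {"locus1": locus1, "locus2": locus2,
              "joint": None, "independence_rejected": None}
    # The joint test is meaningful only if both loci segregate as expected.
    if locus1 < CRITICAL_05[1] and locus2 < CRITICAL_05[1]:
        quarter = total / 4
        joint = chi_square([dom_dom, rec_rec, dom_rec, rec_dom], [quarter] * 4)
        result["joint"] = joint
        result["independence_rejected"] = joint > CRITICAL_05[3]
    return result


# Brown/straight 63, yellow/curved 77, brown/curved 28, yellow/straight 32.
cockroach = linkage_analysis(dom_dom=63, rec_rec=77, dom_rec=28, rec_dom=32)
```

For the cockroach counts this reproduces the chi-square values worked out above (about 1.62 for body color, 0.50 for wing shape, and 34.12 for the joint test). Because the joint test is skipped whenever a single-locus test is significant, the sketch also reflects the rule that the final test should not be run when the true expected values are unknown.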
We might conclude, erroneously, that the genes were linked. If a significant chi-square (one that has a probability less than 0.05) is obtained in either of the first two tests for segregation, then the final chi-square for independent assortment should not be carried out, because the true expected values are unknown.

Gene Mapping with Recombination Frequencies

Morgan and his students developed the idea that physical distances between genes on a chromosome are related to the rates of recombination. They hypothesized that crossover events occur more or less at random up and down the chromosome and that two genes that lie far apart are more likely to undergo a crossover than are two genes that lie close together. They proposed that recombination frequencies could provide a convenient way to determine the order of genes along a chromosome and would give estimates of the relative distances between the genes. Chromosome maps calculated by using recombination frequencies are called genetic maps. In contrast, chromosome maps based on physical distances along the chromosome (often expressed in terms of numbers of base pairs) are called physical maps.

Distances on genetic maps are measured in map units (abbreviated m.u.); one map unit equals 1% recombination. Map units are also called centimorgans (cM), in honor of Thomas Hunt Morgan; one morgan equals 100 m.u. Genetic distances measured with recombination rates are approximately additive: if the distance from gene A to gene B is 5 m.u., the distance from gene B to gene C is 10 m.u., and the distance from gene A to gene C is 15 m.u., gene B must be located between genes A and C. On the basis of the map distances just given, we could draw a simple genetic map for genes A, B, and C, as shown here:

A ---- 5 m.u. ---- B ---- 10 m.u. ---- C   (A to C: 15 m.u.)

We could just as plausibly draw this map with C on the left and A on the right:

C ---- 10 m.u. ---- B ---- 5 m.u. ---- A   (C to A: 15 m.u.)

Both maps are correct and equivalent because, with information about the relative positions of only three genes, the most that we can determine is which gene lies in the middle. If we obtained distances to an additional gene, then we could position A and C relative to that gene. An additional gene D, examined through genetic crosses, might yield the following recombination frequencies:

| Gene pair | Recombination frequency (%) |
|---|---|
| A and D | 8 |
| B and D | 13 |
| C and D | 23 |

Notice that C and D exhibit the greatest amount of recombination; therefore, C and D must be farthest apart, with genes A and B between them. Using the recombination frequencies and remembering that 1 m.u. = 1% recombination, we can now add D to our map:

D ---- 8 m.u. ---- A ---- 5 m.u. ---- B ---- 10 m.u. ---- C   (D to B: 13 m.u.; A to C: 15 m.u.; D to C: 23 m.u.)

By doing a series of crosses between pairs of genes, we can construct genetic maps showing the linkage arrangements of a number of genes.

Two points should be emphasized about constructing chromosome maps from recombination frequencies. First, recall that the recombination frequency between two genes cannot exceed 50% and that 50% is also the rate of recombination for genes located on different chromosomes. Consequently, one cannot distinguish between genes on different chromosomes and genes located far apart on the same chromosome. If genes exhibit 50% recombination, the most that can be said about them is that they belong to different groups of linked genes (different linkage groups), either on different chromosomes or far apart on the same chromosome.

[Figure 7.11 panels: (1) A single crossover will switch the alleles on homologous chromosomes, (2) but a second crossover will reverse the effects of the first, restoring the original parental combination of alleles, (3) and producing only nonrecombinant genotypes in the gametes, although parts of the chromosomes have recombined.]
A second point is that a testcross for two genes that are relatively far apart on the same chromosome tends to underestimate the true recombination frequency, because the cross does not reveal double crossovers that might take place between the two genes (◗ FIGURE 7.11). A double crossover arises when two separate crossover events take place between the same two loci. Whereas a single crossover switches the alleles on the homologous chromosomes, producing combinations of alleles that were not present on the original parental chromosomes, a second crossover between the same two genes reverses the effects of the first, thus restoring the original parental combination of alleles (see Figure 7.11). Double crossovers produce only nonrecombinant gametes, and we cannot distinguish between the progeny produced by double crossovers and the progeny produced when there is no crossing over. However, as we shall see in the next section, it is possible to detect double crossovers if we examine a third gene that lies between the two crossovers.

Because double crossovers between two genes go undetected, recombination frequencies will be underestimated whenever double crossovers take place. Double crossovers are more frequent between genes that are far apart; therefore genetic maps based on short distances are always more accurate than those based on longer distances.

Concepts
A genetic map provides the order of the genes on a chromosome and the approximate distances among the genes based on recombination frequencies. In genetic maps, 1% recombination equals 1 map unit, or 1 centimorgan. Double crossovers between two genes go undetected; so map distances between distant genes tend to underestimate genetic distances.

◗ 7.11 A double crossover between two linked genes produces only nonrecombinant gametes.
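The placement of gene D on the map of genes A, B, and C earlier in this section can be checked mechanically. The following is a minimal Python sketch, not a standard mapping tool: the names `RF`, `distance`, `additivity_error`, and `best_order` are hypothetical, and the sketch assumes that map distances are exactly additive, which the text notes is only approximately true once double crossovers go undetected.

```python
from itertools import combinations, permutations

# Pairwise recombination frequencies (%) quoted in the text for genes A, B, C, D.
RF = {
    frozenset("AB"): 5, frozenset("BC"): 10, frozenset("AC"): 15,
    frozenset("AD"): 8, frozenset("BD"): 13, frozenset("CD"): 23,
}

def distance(g1, g2):
    """Observed map distance (m.u.) between two genes, in either order."""
    return RF[frozenset((g1, g2))]

def additivity_error(order):
    """Total mismatch between observed pairwise distances and the distances
    implied by laying the genes out in `order` using adjacent intervals only."""
    positions = [0]
    for left, right in zip(order, order[1:]):
        positions.append(positions[-1] + distance(left, right))
    return sum(
        abs((positions[j] - positions[i]) - distance(order[i], order[j]))
        for i, j in combinations(range(len(order)), 2)
    )

def best_order(genes):
    """Try every gene order and keep the one that is most nearly additive.
    A map and its mirror image are equivalent, so only one orientation is kept."""
    candidates = [p for p in permutations(genes) if p[-1] > p[0]]
    return min(candidates, key=additivity_error)

order = best_order("ABCD")
print(order, additivity_error(order))
```

With the frequencies above, the order found is C, B, A, D with zero error, which is the mirror image of the D, A, B, C map drawn in the text. With real data the error would rarely be exactly zero, because distances between far-apart genes are underestimated.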
Constructing a Genetic Map with Two-Point Testcrosses

Genetic maps can be constructed by conducting a series of testcrosses between pairs of genes and examining the recombination frequencies between them. A testcross between two genes is called a two-point testcross or a two-point cross for short. Suppose that we carried out a series of two-point crosses for four genes, a, b, c, and d, and obtained the following recombination frequencies:

| Gene loci in testcross | Recombination frequency (%) |
|---|---|
| a and b | 50 |
| a and c | 50 |
| a and d | 50 |
| b and c | 20 |
| b and d | 10 |
| c and d | 28 |

We can begin constructing a genetic map for these genes by considering the recombination frequencies for each pair of genes. The recombination frequency between a and b is 50%, which is the recombination frequency expected with independent assortment. Genes a and b may therefore either be on different chromosomes or be very far apart on the same chromosome; so we will place them in different linkage groups with the understanding that they may or may not be on the same chromosome:

Linkage group 1: a
Linkage group 2: b

The recombination frequency between a and c is 50%, indicating that they, too, are in different linkage groups. The recombination frequency between b and c is 20%; so these genes are linked and separated by 20 map units:

Linkage group 1: a
Linkage group 2: b ---- 20 m.u. ---- c

The recombination frequency between a and d is 50%, indicating that these genes belong to different linkage groups, whereas genes b and d are linked, with a recombination frequency of 10%. To decide whether gene d is 10 map units to the left or right of gene b, we must consult the c-to-d distance. If gene d is 10 map units to the left of gene b, then the distance between d and c should be 20 m.u. + 10 m.u. = 30 m.u. This distance will be only approximate because any double crossovers between the two genes will be missed and the recombination frequency will be underestimated. If, on the other hand, gene d lies to the right of gene b, then the distance between gene d and c will be much shorter, approximately 20 m.u. − 10 m.u. = 10 m.u.

By examining the recombination frequency between c and d, we can distinguish between these two possibilities. The recombination frequency between c and d is 28%; so gene d must lie to the left of gene b. Notice that the sum of the recombination between d and b (10%) and between b and c (20%) is greater than the recombination between d and c (28%). (This is what was meant by saying that recombination rates, i.e., map units, are approximately additive.) This discrepancy arises because double crossovers between the two outer genes go undetected, causing an underestimation of the true recombination frequency. The genetic map of these genes is now complete:

Linkage group 1: a
Linkage group 2: d ---- 10 m.u. ---- b ---- 20 m.u. ---- c   (d to c: 30 m.u.)

Linkage and Recombination Between Three Genes

Genetic maps can be constructed from a series of testcrosses for pairs of genes, but this approach is not particularly efficient, because numerous two-point crosses must be carried out to establish the order of the genes and because double crossovers are missed. A more efficient mapping technique is a testcross for three genes (a three-point testcross, or three-point cross). With a three-point cross, the order of the three genes can be established in a single set of progeny and some double crossovers can usually be detected, providing more accurate map distances.

Consider what happens when crossing over takes place among three hypothetical linked genes.
◗ FIGURE 7.12 illustrates a pair of homologous chromosomes from an individual that is heterozygous at three loci (AaBbCc). Notice that the genes are in the coupling configuration; that is, all the dominant alleles are on one chromosome (A B C) and all the recessive alleles are on the other chromosome (a b c). Three types of crossover events can take place between these three genes: two types of single crossovers (see Figure 7.12a and b) and a double crossover (see Figure 7.12c). In each type of crossover, two of the resulting chromosomes are recombinants and two are nonrecombinants.

Notice that, in the recombinant chromosomes resulting from the double crossover, the outer two alleles are the same as in the nonrecombinants, but the middle allele is different. This result provides us with an important clue about the order of the genes. In progeny that result from a double crossover, only the middle allele should differ from the alleles present in the nonrecombinant progeny.

[Figure 7.12 panels: (a) single crossover between A and B; (b) single crossover between B and C; (c) double crossover. Conclusion: Recombinant chromosomes resulting from the double crossover have only the middle gene altered.]

◗ 7.12 Three types of crossovers can take place among three linked loci.

Gene Mapping with the Three-Point Testcross

To examine gene mapping with a three-point testcross, we will consider three recessive mutations in the fruit fly Drosophila melanogaster. In this species, scarlet eyes (st) are recessive to red eyes (st⁺), ebony body color (e) is recessive to gray body color (e⁺), and spineless (ss), that is, the presence of small bristles, is recessive to normal bristles (ss⁺). All three mutations are linked and located on the third chromosome.
We will refer to these three loci as st, e, and ss, but keep in mind that either recessive alleles (st, e, and ss) or the dominant alleles (st⁺, e⁺, and ss⁺) may be present at each locus. So, when we say that there are 10 m.u. between st and ss, we mean that there are 10 m.u. between the loci at which these mutations occur; we could just as easily say that there are 10 m.u. between st⁺ and ss⁺.

To map these genes, we need to determine their order on the chromosome and the genetic distances between them. First, we must set up a three-point testcross, a cross between a fly heterozygous at all three loci and a fly homozygous for recessive alleles at all three loci. To produce flies heterozygous for all three loci, we might cross a stock of flies that are homozygous for normal alleles at all three loci with flies that are homozygous for recessive alleles at all three loci:

P:  st⁺ e⁺ ss⁺ / st⁺ e⁺ ss⁺  ×  st e ss / st e ss
F1: st⁺ e⁺ ss⁺ / st e ss

The order of the genes has been arbitrarily assigned because at this point we do not know which is the middle gene. Additionally, the alleles in these heterozygotes are in coupling configuration (because all the wild-type dominant alleles were inherited from one parent and all the recessive mutations from the other parent), although the testcross can also be done with genes in repulsion.

In the three-point testcross, we cross the F1 heterozygotes with flies that are homozygous for all three recessive mutations. In many organisms, it makes no difference whether the heterozygous parent in the testcross is male or female (provided that the genes are autosomal) but, in Drosophila, no crossing over takes place in males. Because crossing over in the heterozygous parent is essential for determining recombination frequencies, the heterozygous flies in our testcross must be female. So we mate female F1 flies that are heterozygous for all three traits with male flies that are homozygous for all the recessive traits:

st⁺ e⁺ ss⁺ / st e ss (female)  ×  st e ss / st e ss (male)

The progeny produced from this cross are listed in ◗ FIGURE 7.13. For each locus, two classes of progeny are produced: progeny that are heterozygous, displaying the dominant trait, and progeny that are homozygous, displaying the recessive trait. With two classes of progeny possible for each of the three loci, there will be 2³ = 8 classes of phenotypes possible in the progeny. In this example, all eight phenotypic classes are present but, in some three-point crosses, one or more of the phenotypes may be missing if the number of progeny is limited. Nevertheless, the absence of a particular class can provide important information about which combination of traits is least frequent and ultimately the order of the genes, as we will see.

To map the genes, we need information about where and how often crossing over has occurred. In the homozygous recessive parent, the two alleles at each locus are the same; and so crossing over will have no effect on the types of gametes produced; with or without crossing over, all gametes from this parent have a chromosome with three recessive alleles (st e ss). In contrast, the heterozygous parent has different alleles on its two chromosomes; so crossing over can be detected. The information that we need for mapping, therefore, comes entirely from the gametes produced by the heterozygous parent. Because chromosomes contributed by the homozygous parent carry only recessive alleles, whatever alleles are present on the chromosome contributed by the heterozygous parent will be expressed in the progeny.

As a shortcut, we usually do not write out the complete genotypes of the testcross progeny, listing instead only the alleles expressed in the phenotype (as shown in Figure 7.13), which are the alleles inherited from the heterozygous parent.

Concepts
To map genes, information about the location and number of crossovers in the gametes that produced the progeny of a cross is needed.
An efficient way to obtain this information is to use a three-point testcross, in which an individual heterozygous at three linked loci is crossed with an individual that is homozygous recessive at the three loci.

Determining the gene order

The first task in mapping the genes is to determine their order on the chromosome. In Figure 7.13, we arbitrarily listed the loci in the order st, e, ss, but we had no way of knowing which of the three loci was between the other two. We can now identify the middle locus by examining the double-crossover progeny.

First, determine which progeny are the nonrecombinants; they will be the two most-numerous classes of progeny. (Even if crossing over takes place in every meiosis, the nonrecombinants will make up at least 50% of the progeny.) Among the progeny of the testcross in Figure 7.13, the most numerous are those with all three dominant traits (st⁺ e⁺ ss⁺) and those with all three recessive traits (st e ss).

Next, identify the double-crossover progeny. These should always be the two least-numerous phenotypes, because the probability of a double crossover is always less than the probability of a single crossover. The least-common progeny among those listed in Figure 7.13 are progeny with red eyes, gray body, and spineless bristles (st⁺ e⁺ ss) and progeny with scarlet eyes, ebony body, and normal bristles (st e ss⁺); so they are the double-crossover progeny.

Three orders of genes are possible: the eye-color locus could be in the middle (e st ss), the body-color locus could be in the middle (st e ss), or the bristle locus could be in the middle (st ss e). To determine which gene is in the middle, we can draw the chromosomes of the heterozygous parent with all three possible gene orders and then see if a double crossover produces the combination of genes observed in the double-crossover progeny.
The three possible gene orders and the types of progeny produced by their double crossovers are:

| Gene order | Original chromosomes | Chromosomes after double crossover |
|---|---|---|
| 1. e st ss | e⁺ st⁺ ss⁺ and e st ss | e⁺ st ss⁺ and e st⁺ ss |
| 2. st e ss | st⁺ e⁺ ss⁺ and st e ss | st⁺ e ss⁺ and st e⁺ ss |
| 3. st ss e | st⁺ ss⁺ e⁺ and st ss e | st⁺ ss e⁺ and st ss⁺ e |

◗ 7.13 The results of a three-point testcross can be used to map linked genes. In this three-point testcross in Drosophila melanogaster, the recessive mutations scarlet eyes (st), ebony body color (e), and spineless bristles (ss) are at three linked loci. The order of the loci has been designated arbitrarily, as has the sex of the progeny flies.

The only gene order that produces chromosomes with alleles for the traits observed in the double crossovers (st⁺ e⁺ ss and st e ss⁺) is the third one, where the locus for bristle shape lies in the middle. Therefore, this order (st ss e) must be the correct sequence of genes on the chromosome.

With a little practice, it's possible to quickly determine which locus is in the middle without writing out all the gene orders. The phenotypes of the progeny are expressions of the alleles inherited from the heterozygous parent. Recall that, when we looked at the results of double crossovers (see Figure 7.12), only the alleles at the middle locus differed from the nonrecombinants. If we compare the nonrecombinant progeny with double-crossover progeny, they should differ only in alleles of the middle locus.

Let's compare the alleles in the double-crossover progeny st⁺ e⁺ ss with those in the nonrecombinant progeny st⁺ e⁺ ss⁺. We see that both have an allele for red eyes (st⁺) and both have an allele for gray body (e⁺), but the nonrecombinants have an allele for normal bristles (ss⁺), whereas the double crossovers have an allele for spineless bristles (ss). Because the bristle locus is the only one that differs, it must lie in the middle.
We would obtain the same results if we compared the other class of double-crossover progeny (st e ss⁺) with the other nonrecombinant progeny (st e ss). Again the only trait that differs is the one for bristles. Don't forget that the nonrecombinants and the double crossovers should differ only at one locus; if they differ in two loci, the wrong classes of progeny are being compared.

Concepts
To determine the middle locus in a three-point cross, compare the double-crossover progeny with the nonrecombinant progeny. The double crossovers will be the two least-common classes of phenotypes; the nonrecombinants will be the two most-common classes of phenotypes. The double-crossover progeny should have the same alleles as the nonrecombinant types at two loci and different alleles at the locus in the middle.

Determining the locations of crossovers

When we know the correct order of the loci on the chromosome, we should rewrite the phenotypes of the testcross progeny in Figure 7.13 with the loci in the correct order so that we can determine where crossovers have taken place (◗ FIGURE 7.14).

◗ 7.14 Writing the results of a three-point testcross with the loci in the correct order allows the locations of crossovers to be determined. These results are from the testcross illustrated in Figure 7.13, with the loci shown in the correct order. The location of a crossover is indicated with a slash (/). The sex of the progeny flies has been designated arbitrarily.

Among the eight classes of progeny, we have already identified two classes as nonrecombinants (st⁺ ss⁺ e⁺ and st ss e) and two classes as double crossovers (st⁺ ss e⁺ and st ss⁺ e). The other four classes include progeny that resulted from a chromosome that underwent a single crossover: two underwent single crossovers between st and ss, and two underwent single crossovers between ss and e.
To determine where the crossovers took place in these progeny, compare the alleles found in the single-crossover progeny with those found in the nonrecombinants, just as we did for the double crossovers. Some of the alleles in the single-crossover progeny are derived from one of the original (nonrecombinant) chromosomes of the heterozygous parent, but at some place there is a switch (due to crossing over) and the remaining alleles are derived from the homologous nonrecombinant chromosome. The position of the switch indicates where the crossover event took place. For example, consider progeny with chromosome st⁺ ss e. The first allele (st⁺) came from the nonrecombinant chromosome st⁺ ss⁺ e⁺ and the other two alleles (ss and e) must have come from the other nonrecombinant chromosome st ss e through crossing over:

st⁺ ss⁺ e⁺ and st ss e  →  (crossover between st and ss)  →  st⁺ / ss e and st / ss⁺ e⁺

This same crossover also produces the st ss⁺ e⁺ progeny.

This same method can be used to determine the location of crossing over in the other two types of single-crossover progeny. Crossing over between ss and e produces st⁺ ss⁺ e and st ss e⁺ chromosomes:

st⁺ ss⁺ e⁺ and st ss e  →  (crossover between ss and e)  →  st⁺ ss⁺ / e and st ss / e⁺

We now know the locations of all the crossovers; their locations are marked with a slash in Figure 7.14.

Calculating the recombination frequencies

Next, we can determine the map distances, which are based on the frequencies of recombination. Recombination frequency is calculated by adding up all of the recombinant progeny, dividing this number by the total number of progeny from the cross, and multiplying the number obtained by 100%. To determine the map distances accurately, we must include all crossovers (both single and double) that take place between two genes.

Recombinant progeny that possess a chromosome that underwent crossing over between the eye-color locus (st) and the bristle locus (ss) include the single crossovers (st⁺ / ss e and st / ss⁺ e⁺) and the two double crossovers (st⁺ / ss / e⁺ and st / ss⁺ / e); see Figure 7.14. There are a total of 755 progeny; so the recombination frequency between ss and st is:

$$\text{st–ss recombination frequency} = \frac{50 + 52 + 5 + 3}{755} \times 100\% = 14.6\%$$

The distance between the st and ss loci can be expressed as 14.6 m.u.

The map distance between the bristle locus (ss) and the body locus (e) is determined in the same manner. The recombinant progeny that possess a crossover between ss and e are the single crossovers st⁺ ss⁺ / e and st ss / e⁺, and the double crossovers st⁺ / ss / e⁺ and st / ss⁺ / e. The recombination frequency is:

$$\text{ss–e recombination frequency} = \frac{43 + 41 + 5 + 3}{755} \times 100\% = 12.2\%$$

Thus, the genetic distance between ss and e is 12.2 m.u.

Finally, calculate the genetic distance between the outer two loci, st and e. Add up all the progeny with crossovers between the two loci. These progeny include those with a single crossover between st and ss, those with a single crossover between ss and e, and the double crossovers (st⁺ / ss / e⁺ and st / ss⁺ / e). Because the double crossovers have two crossovers between st and e, we must add the double crossovers twice to determine the recombination frequency between these two loci:

$$\text{st–e recombination frequency} = \frac{50 + 52 + 43 + 41 + (2 \times 5) + (2 \times 3)}{755} \times 100\% = 26.8\%$$

Notice that the distances between st and ss (14.6 m.u.) and between ss and e (12.2 m.u.) add up to the distance between st and e (26.8 m.u.). We can now use the map distances to draw a map of the three genes on the chromosome:

st ---- 14.6 m.u. ---- ss ---- 12.2 m.u. ---- e   (st to e: 26.8 m.u.)

A genetic map of D. melanogaster is illustrated in ◗ FIGURE 7.15.

◗ 7.15 Drosophila melanogaster has four linkage groups corresponding to its four pairs of chromosomes. Distances between genes within a linkage group are in map units.

Interference and coefficient of coincidence

Map distances give us information not only about the physical distances that separate genes, but also about the proportions of recombinant and nonrecombinant gametes that will be produced in a cross. For example, knowing that genes st and ss on the third chromosome of D. melanogaster are separated by a distance of 14.6 m.u. tells us that 14.6% of the gametes produced by a fly heterozygous at these two loci will be recombinants. Similarly, 12.2% of the gametes from a fly heterozygous for ss and e will be recombinants.

Theoretically, we should be able to calculate the proportion of double-recombinant gametes by using the multiplication rule of probability (Chapter 3), which states that the probability of two independent events occurring together is the product of their individual probabilities.
Applying this principle, we should find that the proportion (probability) of gametes with double crossovers between st and e is equal to the probability of recombination between st and ss, multiplied by the probability of recombination between ss and e, or 0.146 × 0.122 = 0.0178. Multiplying this probability by the total number of progeny gives us the expected number of double-crossover progeny from the cross: 0.0178 × 755 = 13.4. Only 8 double crossovers, considerably fewer than the 13 expected, were observed in the progeny of the cross (see Figure 7.13).

This phenomenon is common in eukaryotic organisms. The calculation assumes that each crossover event is independent and that the occurrence of one crossover does not influence the occurrence of another. But crossovers are frequently not independent events: the occurrence of one tends to inhibit additional crossovers in the same region of the chromosome, and so double crossovers are less frequent than expected.

The degree to which one crossover interferes with additional crossovers in the same region is termed the interference. To calculate the interference, we first determine the coefficient of coincidence, which is the ratio of observed double crossovers to expected double crossovers:

$$\text{coefficient of coincidence} = \frac{\text{number of observed double crossovers}}{\text{number of expected double crossovers}}$$

For the loci that we mapped on the third chromosome of D. melanogaster (see Figure 7.14), we find that:

$$\text{coefficient of coincidence} = \frac{5 + 3}{0.146 \times 0.122 \times 755} = \frac{8}{13.4} = 0.6$$

which indicates that we are actually observing only 60% of the double crossovers that we expected on the basis of the single-crossover frequencies. The interference is calculated as:

$$\text{interference} = 1 - \text{coefficient of coincidence}$$

So the interference for our three-point cross is:

$$\text{interference} = 1 - 0.6 = 0.4$$

This value of interference tells us that 40% of the double-crossover progeny expected will not be observed because of interference.
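The arithmetic above can be reproduced in a few lines. This is a minimal Python sketch using only the progeny counts quoted in the text (755 progeny in total; 50 and 52 single crossovers between st and ss; 43 and 41 between ss and e; 5 and 3 double crossovers). The function and variable names are hypothetical, chosen for illustration.

```python
def recombination_frequency(recombinant_counts, total):
    """Fraction of progeny carrying a crossover between two loci."""
    return sum(recombinant_counts) / total

total_progeny = 755
singles_st_ss = [50, 52]   # single crossovers between st and ss
singles_ss_e = [43, 41]    # single crossovers between ss and e
doubles = [5, 3]           # double crossovers (one crossover in each interval)

# Double crossovers count as recombinants in both intervals.
rf_st_ss = recombination_frequency(singles_st_ss + doubles, total_progeny)
rf_ss_e = recombination_frequency(singles_ss_e + doubles, total_progeny)

# Multiplication rule: expected double crossovers if crossovers were independent.
expected_doubles = rf_st_ss * rf_ss_e * total_progeny
coincidence = sum(doubles) / expected_doubles
interference = 1 - coincidence

print(f"st-ss: {rf_st_ss:.1%}, ss-e: {rf_ss_e:.1%}")
print(f"expected doubles: {expected_doubles:.1f}, observed: {sum(doubles)}")
print(f"coefficient of coincidence: {coincidence:.1f}, interference: {interference:.1f}")
```

The sketch carries unrounded frequencies through the calculation, so intermediate values can differ slightly in later decimal places from those obtained with the rounded figures 0.146 and 0.122; to one decimal place the results agree with the text.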
When interference is complete and no double-crossover progeny are observed, the coefficient of coincidence is 0 and the interference is 1. Sometimes more double-crossover progeny appear than expected, which happens when a crossover increases the probability of another crossover occurring nearby. In this case, the coefficient of coincidence is greater than 1 and the interference will be negative.

Concepts
The coefficient of coincidence equals the number of double crossovers observed, divided by the number of double crossovers expected on the basis of the single-crossover frequencies. The interference equals 1 − the coefficient of coincidence; it indicates the degree to which one crossover interferes with additional crossovers.

Connecting Concepts
Stepping Through the Three-Point Cross

We have now examined the three-point cross in considerable detail, seeing how the information derived from the cross can be used to map a series of three linked genes. Let's briefly review the steps required to map genes from a three-point cross.

1. Write out the phenotypes and numbers of progeny produced in the three-point cross. The progeny phenotypes will be easier to interpret if you use allelic symbols for the traits (such as st⁺ e⁺ ss).
2. Write out the genotypes of the original parents used to produce the triply heterozygous individual in the testcross and, if known, the arrangement of the alleles on their chromosomes (coupling or repulsion).
3. Determine which phenotypic classes among the progeny are the nonrecombinants and which are the double crossovers. The nonrecombinants will be the two most-common phenotypes; the double crossovers will be the two least-common phenotypes.
4. Determine which locus lies in the middle. Compare the alleles present in the double crossovers with those present in the nonrecombinants; each class of double crossovers should be like one of the nonrecombinants for two loci and should differ for one locus. The locus that differs is the middle one.
5. Rewrite the phenotypes with genes in correct order.
6. Determine where crossovers must have taken place to give rise to the progeny phenotypes by comparing each phenotype with the phenotype of the nonrecombinant progeny.
7. Determine the recombination frequencies. Add the numbers of the progeny that possess a chromosome with a crossover between a pair of loci. Add the double crossovers to this number. Divide this sum by the total number of progeny from the cross, and multiply by 100%; the result is the recombination frequency between the loci, which is the same as the map distance.
8. Draw a map of the three loci, indicating which locus lies in the middle, and label the distances between them.
9. Determine the coefficient of coincidence and the interference. The coefficient of coincidence is the number of observed double-crossover progeny divided by the number of expected double-crossover progeny. The expected number can be obtained by multiplying the product of the two single-recombination probabilities by the total number of progeny in the cross. The interference equals 1 − the coefficient of coincidence.

Worked Problem

In D. melanogaster, cherub wings (ch), black body (b), and cinnabar eyes (cn) result from recessive alleles that are all located on chromosome 2. A homozygous wild-type fly was mated with a cherub, black, and cinnabar fly, and the resulting F1 females were test-crossed with cherub, black, and cinnabar males. The following progeny were produced from the testcross:

| Progeny | Number |
|---|---|
| ch b⁺ cn | 105 |
| ch⁺ b⁺ cn⁺ | 750 |
| ch⁺ b cn | 40 |
| ch⁺ b⁺ cn | 4 |
| ch b cn | 753 |
| ch b⁺ cn⁺ | 41 |
| ch⁺ b cn⁺ | 102 |
| ch b cn⁺ | 5 |
| total | 1800 |

(a) Determine the linear order of the genes on the chromosome (which gene is in the middle).
(b) Calculate the recombinant distances between the three loci.
(c) Determine the coefficient of coincidence and the interference for these three loci.
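The steps listed under Stepping Through the Three-Point Cross can also be sketched as a short Python function, applied here to the progeny counts of the worked problem. This is an illustrative sketch, not an established tool: the function name `three_point_map` and the encoding of each progeny class as a three-character string are hypothetical, and the assignment of wild-type and mutant alleles to each count follows the progeny table as read here, so it should be checked against the original table. The sketch also assumes that all eight classes are present and that the two rarest classes really are the double crossovers.

```python
# Loci in the arbitrary order used in the problem statement.
LOCI = ("ch", "b", "cn")

# Alleles inherited from the heterozygous parent, one character per locus:
# '+' = wild-type allele, 'm' = recessive mutant allele.
progeny = {
    "m+m": 105,  # ch  b+ cn
    "+++": 750,  # ch+ b+ cn+
    "+mm": 40,   # ch+ b  cn
    "++m": 4,    # ch+ b+ cn
    "mmm": 753,  # ch  b  cn
    "m++": 41,   # ch  b+ cn+
    "+m+": 102,  # ch+ b  cn+
    "mm+": 5,    # ch  b  cn+
}

def three_point_map(loci, progeny):
    total = sum(progeny.values())
    ranked = sorted(progeny, key=progeny.get)     # rarest class first
    doubles, parentals = ranked[:2], ranked[-2:]  # step 3

    # Step 4: a double crossover differs from one nonrecombinant class
    # at exactly one locus, and that locus is the middle one.
    dco = doubles[0]
    nearest = min(parentals, key=lambda p: sum(x != y for x, y in zip(p, dco)))
    differing = [i for i in range(3) if dco[i] != nearest[i]]
    if len(differing) != 1:
        raise ValueError("double crossover should differ at exactly one locus")
    mid = differing[0]
    left, right = (i for i in range(3) if i != mid)

    # Steps 6-7: a class is recombinant for loci i and j when the alleles
    # at i and j no longer come from the same parental chromosome.
    ref = parentals[0]
    def rf(i, j):
        recombinant = sum(n for cls, n in progeny.items()
                          if (cls[i] == ref[i]) != (cls[j] == ref[j]))
        return recombinant / total

    rf_left, rf_right = rf(left, mid), rf(mid, right)

    # Step 9: coefficient of coincidence and interference.
    observed = sum(progeny[c] for c in doubles)
    expected = rf_left * rf_right * total
    coincidence = observed / expected
    return {
        "order": (loci[left], loci[mid], loci[right]),
        "rf_left": rf_left,
        "rf_right": rf_right,
        "rf_outer": rf_left + rf_right,  # double crossovers counted twice
        "coincidence": coincidence,
        "interference": 1 - coincidence,
    }

result = three_point_map(LOCI, progeny)
print(result)
```

Because recombinants are defined relative to one of the nonrecombinant classes rather than to a fixed all-wild-type chromosome, the same function should work whether the heterozygous parent carries the alleles in coupling or in repulsion.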
• Solution
(a) We can represent the crosses in this problem as follows:

P          ch⁺ b⁺ cn⁺ / ch⁺ b⁺ cn⁺  ×  ch b cn / ch b cn
F1         ch⁺ b⁺ cn⁺ / ch b cn
Testcross  ch⁺ b⁺ cn⁺ / ch b cn  ×  ch b cn / ch b cn

Note that we do not know, at this point, the order of the genes; we have arbitrarily put b in the middle.

The next step is to determine which of the testcross progeny are nonrecombinants and which are double crossovers. The nonrecombinants should be the most-frequent phenotypes; so they must be the progeny with phenotypes encoded by ch⁺ b⁺ cn⁺ and ch b cn. These genotypes are consistent with the genotypes of the parents, which we outlined earlier. The double crossovers are the least-frequent phenotypes and are encoded by ch⁺ b⁺ cn and ch b cn⁺.

We can determine the gene order by comparing the alleles present in the double crossovers with those present in the nonrecombinants. The double-crossover progeny should be like one of the nonrecombinants at two loci and unlike it at one; the allele that differs should be in the middle. Compare the double-crossover progeny ch b cn⁺ with the nonrecombinant ch b cn. Both have cherub wings (ch) and black body (b), but the double-crossover progeny have wild-type eyes (cn⁺), whereas the nonrecombinants have cinnabar eyes (cn). The locus that determines cinnabar eyes must be in the middle.

(b) To calculate the recombination frequencies among the genes, we first write the phenotypes of the progeny with the genes encoding them in the correct order. We have already identified the nonrecombinant and double-crossover progeny; so the other four progeny types must have resulted from single crossovers. To determine where single crossovers took place, we compare the alleles found in the single-crossover progeny with those in the nonrecombinants. Crossing over must have taken place where the alleles switch from those found in one nonrecombinant to those found in the other nonrecombinant. The locations of the crossovers are indicated with a slash:

ch     cn  /  b⁺     105   single crossover
ch⁺    cn⁺    b⁺     750   nonrecombinant
ch⁺ /  cn     b       40   single crossover
ch⁺ /  cn  /  b⁺       4   double crossover
ch     cn     b      753   nonrecombinant
ch  /  cn⁺    b⁺      41   single crossover
ch⁺    cn⁺ /  b      102   single crossover
ch  /  cn⁺ /  b        5   double crossover
total               1800

Next, we determine the recombination frequencies and draw a genetic map:

ch–cn recombination frequency = (40 + 41 + 4 + 5)/1800 × 100% = 5%
cn–b recombination frequency = (105 + 4 + 102 + 5)/1800 × 100% = 12%
ch–b recombination frequency = (105 + 40 + (2 × 4) + 41 + 102 + (2 × 5))/1800 × 100% = 17%

ch ---- 5 m.u. ---- cn ---- 12 m.u. ---- b     (ch to b: 17 m.u.)

(c) The coefficient of coincidence is the number of observed double crossovers, divided by the number of expected double crossovers. The number of expected double crossovers is obtained by multiplying the probability of a crossover between ch and cn (0.05) × the probability of a crossover between cn and b (0.12) × the total number of progeny in the cross (1800):

coefficient of coincidence = (4 + 5)/(0.05 × 0.12 × 1800) = 0.83

Finally, the interference is equal to 1 − the coefficient of coincidence:

interference = 1 − 0.83 = 0.17

Gene Mapping in Humans
Efforts in mapping the human genome are hampered by the inability to perform desired crosses and the small number of progeny in most human families. Geneticists are restricted to analyses of pedigrees, which are often incomplete and provide limited information. Nevertheless, techniques have been developed that use pedigree data to analyze linkage, and a large number of human traits have been successfully mapped with the use of these methods. Because the number of progeny from any one mating is usually small, data from several families and pedigrees are usually combined to test for independent assortment. The methods used in these types of analysis are beyond the scope of this book, but an example will illustrate how linkage can be detected from pedigree data.

One of the first documented demonstrations of linkage in humans was between the locus for nail-patella syndrome and the locus that determines the ABO blood types. Nail-patella syndrome is an autosomal dominant disorder characterized by abnormal fingernails and absent or rudimentary kneecaps. The ABO blood types are determined by an autosomal locus with multiple alleles (Chapter 5). Linkage between the genes encoding these traits was established in families in which both traits segregate. Part of one such family is illustrated in ◗ FIGURE 7.16. Nail-patella syndrome is relatively rare; so we can assume that people having this trait are heterozygous (Nn); unaffected people are homozygous (nn). The ABO genotypes can be inferred from the phenotypes and the types of offspring produced. Person I-2 in Figure 7.16, for example, has blood type B, which has two possible genotypes: IᴮIᴮ or Iᴮi (see Figure 5.6). Because some of her offspring are blood type O (genotype ii) and must therefore have inherited an i allele from each parent, female I-2 must have genotype Iᴮi. Similarly, the presence of blood type O offspring in generation II indicates that male I-1, with blood type A, also must carry an i allele and therefore has genotype Iᴬi. The ABO and nail-patella genotypes for all persons in the pedigree are given below the squares and circles.

From generation II, we can see that the genes for nail-patella syndrome and the blood types do not appear to assort independently. The parents of this family are:

Iᴬi Nn × Iᴮi nn

If the genes coding for nail-patella syndrome and the ABO blood types assorted independently, we would expect that some children in generation II would have blood type A and nail-patella syndrome, inheriting both the Iᴬ and N genes from their father.
However, all children in generation II with nail-patella syndrome have either blood type B or blood type O; all those with blood type A have normal nails and kneecaps. This outcome indicates that the arrangements of the alleles on the chromosomes of the crossed parents are:

Iᴬ n / i N  ×  Iᴮ n / i n

◗ 7.16 Linkage between ABO blood types and nail-patella syndrome was established by examining families in whom both traits segregate. The pedigree shown here is for one such family. Solid circles and squares represent the presence of nail-patella syndrome; the ABO blood type is indicated in each circle or square. The genotype, inferred from phenotype, is given below each square or circle. Blood-type key: A = IᴬIᴬ or Iᴬi; B = IᴮIᴮ or Iᴮi; O = ii. Figure annotations: All children in generation II with nail-patella syndrome have either blood type B or blood type O. Two outcomes in generation III resulted from crossover events.

There is no recombination among the offspring of these parents (generation II), but there are two instances of recombination among the persons in generation III. Individuals II-1 and II-2 have the following genotypes:

Iᴮ n / i N  ×  Iᴬ n / i n

Their child III-2 has blood type A and does not have nail-patella syndrome; so he must have genotype:

Iᴬ n / i n

and must have inherited both the i and the n alleles from his father. These alleles are on different chromosomes in the father; so crossing over must have taken place. Crossing over also must have taken place to produce child III-3. In the pedigree of Figure 7.16, there are 12 children from matings in which the genes encoding nail-patella syndrome and ABO blood types segregate; 2 of them are recombinants.
On this basis, we might assume that the loci for nail-patella syndrome and ABO blood types are linked, with a recombination frequency of 2/12 = 0.167. However, it is possible that the genes are assorting independently and that the small number of children just makes it seem as though the genes are linked. To determine the probability that genes are actually linked, geneticists often calculate lod (logarithm of odds) scores.

To obtain a lod score, one calculates both the probability of obtaining the observations with a specified degree of linkage and the probability of obtaining the observations with independent assortment. One then determines the ratio of these two probabilities, and the base-10 logarithm of this ratio is the lod score. Suppose that the probability of obtaining a particular set of observations with linkage and a certain recombination frequency is 0.1 and that the probability of obtaining the same observations with independent assortment is 0.0001. The ratio of these two probabilities is 0.1/0.0001 = 1000, the logarithm of which (the lod score) is 3. Thus linkage with the specified recombination frequency is 1000 times as likely to produce what was observed as independent assortment. A lod score of 3 or higher is usually considered convincing evidence for linkage.

Mapping with Molecular Markers
For many years, gene mapping was limited in most organisms by the availability of genetic markers, variable genes with easily observable phenotypes whose inheritance could be studied. Traditional genetic markers include genes that encode easily observable characteristics such as flower color, seed shape, blood types, and biochemical differences. The paucity of these types of characteristics in many organisms limited mapping efforts. In the 1980s, new molecular techniques made it possible to examine variations in DNA itself, providing an almost unlimited number of genetic markers that can be used for creating genetic maps and studying linkage relations.
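Whether the markers are classical traits or variations in DNA, evidence for linkage is commonly summarized with the lod score described above. The following is a minimal sketch of that calculation. The function names are hypothetical, and the `likelihood` helper assumes that each child is independently recombinant with probability theta and that the parental allele arrangement is known. Real pedigree analyses relax both assumptions.

```python
import math


def lod_score(p_linked: float, p_independent: float) -> float:
    """Base-10 log of the odds: P(data | linkage) / P(data | independent assortment)."""
    return math.log10(p_linked / p_independent)


def likelihood(recombinants: int, total: int, theta: float) -> float:
    """Probability of one specific set of children, assuming each child is
    independently recombinant with probability theta (a simplification)."""
    return theta ** recombinants * (1 - theta) ** (total - recombinants)


# The illustration from the text: probabilities of 0.1 and 0.0001.
print(round(lod_score(0.1, 0.0001), 2))  # 3.0

# The pedigree of Figure 7.16: 2 recombinants among 12 children.
theta = 2 / 12  # observed recombination frequency
lod = lod_score(likelihood(2, 12, theta), likelihood(2, 12, 0.5))
print(round(lod, 2))  # 1.26
```

Under these assumptions the single family gives a lod score of about 1.26, below the conventional threshold of 3, which is consistent with the practice of combining data from several pedigrees.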
The earliest of these molecular markers consisted of restriction fragment length polymorphisms (RFLPs), variations in DNA sequence detected by cutting the DNA with restriction enzymes (see Chapter 18). Later, methods were developed for detecting variable numbers of short DNA sequences repeated in tandem, called variable number of tandem repeats (VNTRs). More recently, DNA sequencing allows the direct detection of individual variations in the DNA nucleotides, called single nucleotide polymorphisms (SNPs; see Chapter 19). All of these methods have expanded the availability of genetic markers and greatly facilitated the creation of genetic maps. Gene mapping with molecular markers is done essentially in the same manner as mapping performed with traditional phenotypic markers: the cosegregation of two or more markers is studied and map distances are based on the rates of recombination between markers. These methods and their use in mapping are presented in more detail in Chapter 18. Physical Chromosome Mapping Genetic maps reveal the relative positions of genes on a chromosome on the basis of frequencies of crossing over, but they do not provide information that can allow us to place groups of linked genes on particular chromosomes. Furthermore, the units of a genetic map do not always precisely correspond to physical distances on the chromosome, because a number of factors other than physical distances between genes (such as the type and sex of the organism) can influence rates of crossing over. Because of these limitations, physical-mapping methods that do not rely on rates of crossing over have been developed. Deletion Mapping One method for determining the chromosomal location of a gene is deletion mapping. Special staining methods have been developed that make it possible to detect chromosome deletions, mutations in which a part of a chromosome is missing. 
Genes are assigned to regions of particular chromosomes by studying the association of a gene’s phenotype or product and particular chromosome deletions. In deletion mapping, an individual that is homozygous for a recessive mutation in the gene of interest is crossed with an individual that is heterozygous for a deletion (◗ FIGURE 7.17). If the gene of interest is in the region of the chromosome represented by the deletion (the red part of chromosome in Figure 7.17), approximately half of the progeny will display the mutant phenotype (see Figure 7.17a). If the gene is not within the deleted region, all of the progeny will be wild type (see Figure 7.17b).

◗ 7.17 Deletion mapping can be used to determine the chromosomal location of a gene. An individual homozygous for a recessive mutation in the gene of interest (aa) is crossed with an individual heterozygous for a deletion. (a) If the gene of interest is in the deletion region, half of the progeny will display the mutant phenotype. (b) If the gene is not within the deletion region, all of the progeny will be wild type.

Deletion mapping has been used to reveal the chromosomal locations of a number of human genes. For example, Duchenne muscular dystrophy is a disease that causes progressive weakening and degeneration of the muscles. From its X-linked pattern of inheritance, the mutated allele causing this disorder was known to be on the X chromosome, but its precise location was uncertain. Examination of a number of patients having Duchenne muscular dystrophy, who also possessed small deletions, allowed researchers to position the gene to a small segment of the short arm of the X chromosome.
Somatic-Cell Hybridization Another method used for positioning genes on chromosomes is somatic cell hybridization, which requires the fusion of different types of cells. Most mature somatic (nonsex) cells can undergo only a limited number of divisions and therefore cannot be grown continuously. However, cells that have been altered by viruses or derived from tumors that have lost the normal constraints on cell division will divide indefinitely; these types of cells can be cultured in the laboratory and are referred to as a cell line. Cells from two different cell lines can be fused by treating them with polyethylene glycol or other agents that alter their plasma membranes. After fusion, the cell possesses two nuclei and is called a heterokaryon. The two nuclei of a heterokaryon eventually also fuse, generating a hybrid cell that contains chromosomes from both cell lines. If human and mouse cells are mixed in the presence of polyethylene glycol, fusion results in human – mouse somatic-cell hybrids ( ◗ FIGURE 7.18). The hybrid cells tend to lose chromosomes as they divide and, for reasons that are not understood, chromosomes from one of the species are lost preferentially. In human – mouse somatic-cell hybrids, the human chromosomes tend to be lost, whereas the mouse chromosomes are retained. Eventually, the chromosome number stabilizes when all but a few of the human chromosomes have been lost. Chromosome loss is random and differs among cell lines. The presence of these “extra” human chromosomes in the mouse genome makes it possible to assign human genes to specific chromosomes. In the first step of this procedure, hybrid cells must be separated from original parental cells that have not undergone hybridization. This separation is accomplished by using a selection method that allows hybrid cells to grow while suppressing the growth of parental cells. 
The most commonly used method is called HAT selection (◗ FIGURE 7.19), which stands for hypoxanthine, aminopterin, and thymidine, three chemicals that are used to select for hybrid cells. In the presence of HAT medium, a cell must possess two enzymes to synthesize DNA: thymidine kinase (TK) and hypoxanthine-guanine phosphoribosyl transferase (HPRT). Cells that are tk⁻ or hprt⁻ cannot synthesize DNA and will not grow on HAT medium. The mouse cells used in the hybridization procedure are deficient in TK, but can produce HPRT (the cells are tk⁻ hprt⁺); the human cells can produce TK but are deficient for HPRT (they are tk⁺ hprt⁻). On HAT medium, the mouse cells do not survive, because they are tk⁻; the human cells do not survive, because they are hprt⁻. Hybrid cells, on the other hand, inherit the ability to make HPRT from the mouse cell and the ability to make TK from the human cell; thus, they produce both enzymes (the cells are tk⁺ hprt⁺) and will grow on HAT medium.

◗ 7.18 Somatic-cell hybridization can be used to determine which chromosome contains a gene of interest. Steps shown: (1) Human fibroblasts and mouse tumor cells are mixed in the presence of polyethylene glycol, which facilitates fusion of their membranes, (2) creating hybrid cells called heterokaryons. (3) Human and mouse nuclei in some hybrid cells fuse. (4) Human chromosomes are randomly lost from the nucleus during cell division; all but a few of the human chromosomes are eventually lost. (5) Different human chromosomes are lost in different cell lines (A through E).

◗ 7.19 HAT medium can be used to separate human-mouse hybrid cells from the original hybridized cells. Panel labels: hprt⁻/tk⁺ human fibroblast; hprt⁺/tk⁻ mouse tumor cell; after culture in HAT medium, hprt⁻ cells die, tk⁻ cells die, and hybrid cells survive.
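The logic of HAT selection can be written out as a small truth table. The sketch below is only an illustration: the `Cell` class and the function names are hypothetical, and it assumes that a hybrid makes an enzyme whenever either parental cell contributes a working gene for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    name: str
    tk: bool    # True if the cell makes thymidine kinase (tk+)
    hprt: bool  # True if the cell makes HPRT (hprt+)


def fuse(a: Cell, b: Cell) -> Cell:
    """Assumed rule: the hybrid makes an enzyme if either parent can."""
    return Cell(f"{a.name} x {b.name} hybrid", a.tk or b.tk, a.hprt or b.hprt)


def grows_on_hat(cell: Cell) -> bool:
    """On HAT medium, DNA synthesis requires both TK and HPRT."""
    return cell.tk and cell.hprt


mouse = Cell("mouse tumor cell", tk=False, hprt=True)   # tk- hprt+
human = Cell("human fibroblast", tk=True, hprt=False)   # tk+ hprt-
hybrid = fuse(mouse, human)                             # tk+ hprt+

for cell in (mouse, human, hybrid):
    print(cell.name, "survives" if grows_on_hat(cell) else "dies")
```

Only the hybrid passes both checks, which is why HAT medium removes both unfused parental cell types.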
Mapping genes with somatic-cell hybridization requires a panel of different hybrid cell lines. The cell lines of the panel differ in the human chromosomes that they have retained. For example, one cell line might possess human chromosomes 2, 4, 7, and 8, whereas another might possess chromosomes 4, 19, and 20. Each cell line in the panel is examined for evidence of a particular human gene. The human gene can be detected either by looking for the protein that it produces or by looking for the gene itself with the use of molecular probes (discussed in Chapter 18). Correlation of the presence of the gene with the presence of specific human chromosomes often allows the gene to be assigned to the correct chromosome. For example, if a gene was detected in both of the aforementioned cell lines, the gene must be on chromosome 4, because it is the only human chromosome common to both cell lines (◗ FIGURE 7.20).

◗ 7.20 Somatic-cell hybridization is used to assign a gene to a particular human chromosome. A panel of six cell lines, each line containing a different subset of human chromosomes, is examined for the presence of the gene product (such as an enzyme). A plus sign means that the gene product is present; a minus sign means that the gene product is missing. Four of the cell lines (A, B, D, and F) have the gene product, indicating that the gene is present on one of the chromosomes found in these cell lines. The only chromosome common to all four of these cell lines is chromosome 4, indicating that the gene is located on this chromosome. Table columns: cell line, gene product present, and human chromosomes present (1 through 22 and X). Gene product: A +, B +, C −, D +, E −, F +.

Two genes determined to be on the same chromosome with the use of somatic-cell hybridization are said to be syntenic genes.
This term is used because syntenic genes may or may not exhibit linkage in the traditional genetic sense; remember that two genes can be located on the same chromosome but may be so far apart that they assort independently. Syntenic refers to genes that are physically linked, regardless of whether they exhibit genetic linkage. (Synteny is sometimes also used to refer to gene loci in different organisms located on a chromosome region of common evolutionary origin.)

Sometimes somatic-cell hybridization can be used to position a gene on a specific part of a chromosome. Some hybrid cell lines carry a human chromosome with a chromosome mutation such as a deletion or a translocation. If the gene is present in a cell line with the intact chromosome but missing from a line with a chromosome deletion, the gene must be located in the deleted region (◗ FIGURE 7.21). Similarly, if a gene is usually absent from a chromosome but consistently appears whenever a translocation (a piece of another chromosome that has broken off and attached itself to the chromosome in question) is present, it must be present on the translocated part of the chromosome.

◗ 7.21 Genes can be localized to a specific part of a chromosome by using somatic-cell hybridization. Figure annotations (cell lines 1 through 3): The gene product (an enzyme) is present when there is an intact chromosome 4. The gene product is absent when the entire chromosome 4 is absent or its short arm is missing. Conclusion: If the gene product is present in a cell line with an intact chromosome but missing from a line with a chromosome deletion, the gene for that product must be located in the deleted region.

In Situ Hybridization
Described in more detail in Chapter 18, in situ hybridization is another method for determining the chromosomal location of a particular gene.
This method requires a DNA copy of the gene or its RNA product, which is used to make a molecule (called a probe) that is complementary to the gene of interest. The probe is made radioactive or is attached to a special molecule that fluoresces under ultraviolet (UV) light and is added to chromosomes from specially treated cells that have been spread on a microscope slide. The probe binds to the complementary DNA sequence of the gene on the chromosome. The presence of radioactivity or fluorescence from the bound probe reveals the location of the gene on a particular chromosome (◗ FIGURE 7.22a). Fluorescence in situ hybridization (FISH) has been widely used to identify the chromosomal location of human genes. In spectral karyotyping (SKY) (◗ FIGURE 7.22b), a set of 24 FISH probes, each specific to a different human chromosome and attached to a molecule that fluoresces a different color, allows each chromosome in a karyotype to be identified.

◗ 7.22 In situ hybridization is another technique for determining the chromosomal location of a gene. (a) FISH technique: in this case, the bound probe reveals sequences associated with the centromere. (b) SKY technique: 24 different probes, each specific for a different human chromosome and producing a different color, identify the different human chromosomes. (Courtesy of Dr. Hesed Padilla-Nash and Dr. Thomas Ried, NIH.)

www.whfreeman.com/pierce More on fluorescence in situ hybridization (FISH)

Concepts
Physical-mapping methods determine the physical locations of genes on chromosomes and include deletion mapping, somatic-cell hybridization, in situ hybridization, and direct DNA sequencing.

Mapping by DNA Sequencing
Another means of physically mapping genes is to determine the sequence of nucleotides in the DNA (DNA sequencing, Chapter 19). With this technique, physical distances between genes are measured in numbers of base pairs.
Continuous sequences can be determined for only relatively small fragments of DNA; so, after sequencing, some method is still required to map the individual fragments. This mapping is often done by using the traditional gene mapping that examines rates of crossing over between molecular markers located on the fragments. It can also be accomplished by generating a set of overlapping fragments, sequencing each fragment, and then aligning the fragments by using a computer program that identifies the overlap in the sequence of adjacent fragments. With these methods, complete physical maps of entire genomes have been produced (Chapter 19).

Connecting Concepts Across Chapters
The principle of independent assortment states that alleles at different loci assort (separate) independently in meiosis but only if the genes are located on different chromosomes or are far apart on the same chromosome. This chapter has focused on the inheritance of genes that are physically linked on the same chromosome and do not assort independently. To predict the outcome of crosses entailing linked genes, we must consider not only the genotypes of the parents but also the physical arrangement of the alleles on the chromosomes.

An important principle learned in this chapter is that rates of recombination are related to the physical distances between genes. Crossing over is more frequent between genes that are far apart than between genes that lie close together. This fact provides the foundation for gene mapping in eukaryotic organisms: recombination frequencies are used to determine the relative order and distances between linked genes. Gene mapping therefore requires the setting up of crosses in which recombinant progeny can be detected.
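To make the link between recombination frequencies, gene order, and map distances concrete, here is a minimal sketch that reworks the chapter's three-point worked problem (cherub, black, cinnabar). The function name, the boolean encoding of alleles, and the returned keys are all hypothetical. The progeny counts are those of the worked problem; the wild-type and mutant labels given to the single-crossover classes are an assumption, and the results depend only on which classes are reciprocal pairs.

```python
LOCI = ("ch", "b", "cn")  # order as written in the problem, not the map order

# True = wild-type allele (+), False = recessive mutant allele.
progeny = {
    (False, True, False): 105,   # ch  b+ cn
    (True, True, True): 750,     # ch+ b+ cn+
    (True, False, False): 40,    # ch+ b  cn
    (True, True, False): 4,      # ch+ b+ cn
    (False, False, False): 753,  # ch  b  cn
    (False, True, True): 41,     # ch  b+ cn+
    (True, False, True): 102,    # ch+ b  cn+
    (False, False, True): 5,     # ch  b  cn+
}


def three_point_map(loci, counts):
    total = sum(counts.values())
    ranked = sorted(counts, key=counts.get)  # least to most frequent
    doubles, parental = ranked[:2], ranked[-1]

    # The middle locus is the one at which a double crossover differs from
    # the nonrecombinant class that it otherwise resembles.
    diff = [i for i in range(3) if doubles[0][i] != parental[i]]
    middle = diff[0] if len(diff) == 1 else ({0, 1, 2} - set(diff)).pop()
    left, right = [i for i in range(3) if i != middle]

    def rf(i, j):
        """Fraction of progeny recombinant between loci i and j."""
        same_in_parent = parental[i] == parental[j]
        rec = sum(n for g, n in counts.items() if (g[i] == g[j]) != same_in_parent)
        return rec / total

    rf_left, rf_right = rf(left, middle), rf(middle, right)
    expected_doubles = rf_left * rf_right * total
    coincidence = sum(counts[d] for d in doubles) / expected_doubles
    return {
        "order": (loci[left], loci[middle], loci[right]),
        "map_units": (100 * rf_left, 100 * rf_right),
        "outer_two_point": 100 * rf(left, right),  # misses double crossovers
        "coincidence": coincidence,
        "interference": 1 - coincidence,
    }


result = three_point_map(LOCI, progeny)
print(result["order"])                      # ('ch', 'cn', 'b')
print(round(result["coincidence"], 2))      # 0.83
print(round(result["interference"], 2))     # 0.17
```

The `outer_two_point` value comes out at about 16%, lower than the 17 m.u. obtained by adding the two adjacent intervals, because a direct comparison of the outer loci does not detect double crossovers.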
This chapter also examined several methods of physical mapping that do not rely on recombination rates but use methods to directly observe the association between genes and particular chromosomes or to position genes by determining the nucleotide sequences. Although genetic and physical distances are correlated, they are not identical, because factors other than the distances between genes can influence rates of crossing over.

Gene mapping requires a firm understanding of the behavior of chromosomes (Chapter 2) and basic principles of heredity (Chapters 3 through 5). The discussion of gene mapping with pedigrees assumes knowledge of how families are displayed in pedigrees (Chapter 6). In Chapter 8, we will consider specialized mapping techniques used in bacteria and viruses; in Chapter 18, techniques for detecting molecular markers used in gene mapping are examined in more detail. Techniques for mapping whole genomes are discussed in Chapter 19. Chromosome mutations that play a role in deletion mapping and somatic-cell hybridization are considered in more detail in Chapter 9.

CONCEPTS SUMMARY
• Soon after Mendel’s principles were rediscovered, examples of genes that did not assort independently were discovered. These genes were subsequently shown to be linked on the same chromosome.
• In a testcross for two completely linked genes (which exhibit no crossing over), only nonrecombinant progeny containing the original combinations of alleles present in the parents are produced. When two genes assort independently, recombinant progeny and nonrecombinant progeny are produced in equal proportions. When two genes are linked with some crossing over between them, more nonrecombinant progeny than recombinant progeny are produced.
• Because a single crossover between two linked genes produces two recombinant gametes and two nonrecombinant gametes, crossing over and independent assortment produce the same results.
• Recombination frequency is calculated by summing the number of recombinant progeny, dividing by the total number of progeny produced in the cross, and multiplying by 100%. The recombination frequency is half the frequency of crossing over, and the maximum frequency of recombinant gametes is 50%.
• When two wild-type alleles are found on one homologous chromosome and their mutant alleles are found on the other chromosome, the genes are said to be in coupling configuration. When one wild-type allele and one mutant allele are found on each homologous chromosome, the genes are said to be in repulsion.
• Whether genes are in coupling configuration or in repulsion determines which combination of phenotypes will be most frequent in the progeny of a testcross.
• Linkage and crossing over are two opposing forces: linkage keeps alleles at different loci together, whereas crossing over breaks up linkage and allows alleles to recombine into new associations.
• Interchromosomal recombination takes place among genes located on different chromosomes and occurs through the random segregation of chromosomes in meiosis. Intrachromosomal recombination takes place among genes located on the same chromosome and occurs through crossing over.
• Testing for independent assortment between genes requires a series of chi-square tests, in which segregation is first tested at each locus individually, followed by testing for independent assortment among genes at the different loci.
• Recombination rates can be used to determine the relative order of genes and distances between them on a chromosome. Maps based on recombination rates are called genetic maps; maps based on physical distances are called physical maps.
• One percent recombination equals one map unit, which is also a centiMorgan.
• When genes exhibit 50% recombination, they belong to different linkage groups, which may be either on different chromosomes or far apart on the same chromosome.
• Recombination rates between two genes will underestimate the true distance between them because double crossovers cannot be detected.
• Genetic maps can be constructed by examining recombination rates from a series of two-point crosses or by examining the progeny of a three-point testcross.
• Gene mapping in humans can be accomplished by examining the cosegregation of traits in pedigrees, although the inability to control crosses and the small number of progeny in many families limit mapping with this technique.
• A lod score is obtained by calculating the logarithm of the ratio of the probability of obtaining the observed progeny with a specified degree of linkage to the probability of obtaining the observed progeny with independent assortment. A lod score of 3 or higher is usually considered evidence for linkage.
• Molecular techniques that allow the detection of variable differences in DNA sequence have greatly facilitated gene mapping.
• In deletion mapping, genes are physically associated with particular chromosomes by studying the expression of recessive mutations in heterozygotes that possess chromosome deletions.
• In somatic-cell hybridization, cells from two different cell lines (human and rodent) are fused. The resulting hybrid cells initially contain chromosomes from both species but randomly lose different human chromosomes. The hybrid cells are examined for the presence of specific genes; if a human gene is present in the hybrid cell, it must be present on one of the human chromosomes in that cell line.
• With in situ hybridization, a radioactive or fluorescent label is added to a fragment of DNA that is complementary to a specific gene. This probe is then added to specially prepared chromosomes, where it pairs with the gene of interest. The presence of the label on a particular chromosome reveals the physical location of the gene.
• Nucleotide sequencing is another method of physically mapping genes.

IMPORTANT TERMS

linked genes (p. 161)
linkage group (p. 161)
nonrecombinant (parental) gamete (p. 163)
nonrecombinant (parental) progeny (p. 163)
recombinant gamete (p. 163)
recombinant progeny (p. 163)
recombination frequency (p. 166)
coupling (cis) configuration (p. 167)
repulsion (trans) configuration (p. 167)
interchromosomal recombination (p. 168)
intrachromosomal recombination (p. 168)
genetic map (p. 172)
physical map (p. 172)
map unit (m.u.) (p. 172)
centimorgan (p. 172)
morgan (p. 172)
two-point testcross (p. 173)
three-point testcross (p. 174)
interference (p. 179)
coefficient of coincidence (p. 180)
lod score (p. 183)
genetic marker (p. 183)
deletion mapping (p. 183)
somatic-cell hybridization (p. 184)
cell line (p. 184)
heterokaryon (p. 184)
syntenic gene (p. 186)

Worked Problems

1. In guinea pigs, white coat (w) is recessive to black coat (W) and wavy hair (v) is recessive to straight hair (V). A breeder crosses a guinea pig that is homozygous for white coat and wavy hair with a guinea pig that is black with straight hair. The F1 are then crossed with guinea pigs having white coats and wavy hair in a series of testcrosses. The following progeny are produced from these testcrosses:

black, straight    30
black, wavy        10
white, straight    12
white, wavy        31
total              83

(a) Are the genes that determine coat color and hair type assorting independently? Carry out chi-square tests to test your hypothesis.
(b) If the genes are not assorting independently, what is the recombination frequency between them?

• Solution

(a) Assuming independent assortment, outline the crosses conducted by the breeder:

P:          ww vv × WW VV
F1:         Ww Vv
Testcross:  Ww Vv × ww vv

Progeny:
Ww Vv    1/4  black, straight
Ww vv    1/4  black, wavy
ww Vv    1/4  white, straight
ww vv    1/4  white, wavy

Because a total of 83 progeny were produced in the testcrosses, we expect 1/4 × 83 = 20.75 of each.

The observed numbers of progeny from the testcross (30, 10, 12, 31) do not appear to fit the expected numbers (20.75, 20.75, 20.75, 20.75) well; so independent assortment may not have occurred. To test the hypothesis, carry out a series of three chi-square tests: one for each locus separately and one for the four phenotypic classes together. For the four classes, 20.75 of each phenotype are expected; so the observed and expected numbers and the associated chi-square value are:

Phenotype          Observed   Expected
black, straight    30         20.75
black, wavy        10         20.75
white, straight    12         20.75
white, wavy        31         20.75

$$\chi^2 = \frac{(30 - 20.75)^2}{20.75} + \frac{(10 - 20.75)^2}{20.75} + \frac{(12 - 20.75)^2}{20.75} + \frac{(31 - 20.75)^2}{20.75} = 18.45$$

This value is interpreted only after each locus has been checked. First, look at each locus separately and determine if the observed numbers fit those expected from the testcross. For the locus determining coat color, crossing Ww × ww is expected to produce 1/2 Ww (black) and 1/2 ww (white) progeny, or 41.5 of each from a total of 83 progeny. Ignoring the hair type, we find that 30 + 10 = 40 black progeny and 12 + 31 = 43 white progeny were observed. Thus, the observed and expected values for this chi-square test are:

Phenotype   Observed   Expected
black       40         41.5
white       43         41.5

The chi-square value is:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(40 - 41.5)^2}{41.5} + \frac{(43 - 41.5)^2}{41.5} = 0.108$$

The degrees of freedom for the chi-square goodness-of-fit test are n − 1, where n equals the number of expected classes. There are two expected classes (black and white); so there is 2 − 1 = 1 degree of freedom. For the calculated chi-square value, Table 3.4 shows that the associated probability is greater than .05 (the critical probability for rejecting the hypothesis that chance might be responsible for the differences between observed and expected numbers); so the black and white progeny appear to be in the 1:1 ratio expected in a testcross. Next, compute a second chi-square value comparing the number of straight and wavy progeny with the numbers expected from the testcross.
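Before the second locus is examined, the goodness-of-fit arithmetic just used for coat color can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original solution: the function names `chi_square` and `fits_expectation` are hypothetical, and the critical values are the standard chi-square table entries for P = .05.

```python
def chi_square(observed, expected):
    """Sum of (observed - expected)^2 / expected over all phenotypic classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Critical chi-square values at P = .05 for 1, 2, and 3 degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815}


def fits_expectation(observed, expected):
    """True if chance alone could plausibly explain the deviation (P > .05)."""
    degrees_of_freedom = len(observed) - 1
    return chi_square(observed, expected) < CRITICAL_05[degrees_of_freedom]


# Coat color: 40 black and 43 white observed, 41.5 of each expected.
coat_color = chi_square([40, 43], [41.5, 41.5])
print(round(coat_color, 3))                        # 0.108
print(fits_expectation([40, 43], [41.5, 41.5]))    # True
```

The same two functions can be applied to any of the tests in this problem by passing the observed and expected numbers for the classes being compared.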
From the cross Vv × vv, 1/2 Vv (straight) and 1/2 vv (wavy) progeny are expected:

Phenotype   Observed   Expected
straight    42         41.5
wavy        41         41.5

$$\chi^2 = \frac{(42 - 41.5)^2}{41.5} + \frac{(41 - 41.5)^2}{41.5} = 0.012$$

degrees of freedom = n − 1 = 2 − 1 = 1

In Table 3.4, the probability associated with this chi-square value is much greater than .05; so straight and wavy progeny are in a 1:1 ratio.

Having established that the observed numbers for each trait do not differ from the numbers expected from the testcross, we next test for independent assortment. With independent assortment, 20.75 of each phenotype are expected; so the chi-square value for the four phenotypic classes has

degrees of freedom = n − 1 = 4 − 1 = 3

In Table 3.4, the associated probability is much less than .05, indicating that chance is very unlikely to be responsible for the differences between the observed numbers and the numbers expected with independent assortment. The genes for coat color and hair type have therefore not assorted independently.

(b) To determine the recombination frequencies, identify the recombinant progeny. Using the notation for linked genes, write the crosses:

P:          w v / w v × W V / W V
F1:         W V / w v
Testcross:  W V / w v × w v / w v

Progeny:
W V / w v    30  black, straight (nonrecombinant progeny)
w v / w v    31  white, wavy (nonrecombinant progeny)
W v / w v    10  black, wavy (recombinant progeny)
w V / w v    12  white, straight (recombinant progeny)

The recombination frequency is:

$$\text{recombination frequency} = \frac{\text{number of recombinant progeny}}{\text{total number of progeny}} \times 100\% = \frac{10 + 12}{30 + 10 + 12 + 31} \times 100\% = \frac{22}{83} \times 100\% = 26.5\%$$

2. A series of two-point crosses entailed seven loci (a, b, c, d, e, f, and g), producing the following recombination frequencies. Using these recombination frequencies, map the seven loci, showing their linkage groups and the order and distances between the loci of each linkage group:

Loci       Recombination frequency (%)      Loci       Recombination frequency (%)
a and b    10                               c and d    50
a and c    50                               c and e    8
a and d    14                               c and f    50
a and e    50                               c and g    12
a and f    50                               d and e    50
a and g    50                               d and f    50
b and c    50                               d and g    50
b and d    4                                e and f    50
b and e    50                               e and g    18
b and f    50                               f and g    50
b and g    50

• Solution

To work this problem, remember that 1% recombination equals 1 map unit and a recombination frequency of 50% means that genes at the two loci are assorting independently (located in different linkage groups).

The recombination frequency between a and b is 10%; so these two loci are in the same linkage group, approximately 10 m.u. apart.

Linkage group 1:  a --(10 m.u.)-- b

The recombination frequency between a and c is 50%; so c must lie in a second linkage group.

Linkage group 1:  a --(10 m.u.)-- b
Linkage group 2:  c

The recombination frequency between a and d is 14%; so d is located in linkage group 1. Is locus d 14 m.u. to the right or to the left of gene a? If d is 14 m.u. to the left of a, then the b-to-d distance should be 10 m.u. + 14 m.u. = 24 m.u. On the other hand, if d is to the right of a, then the distance between b and d should be 14 m.u. − 10 m.u. = 4 m.u. The b–d recombination frequency is 4%; so d is 14 m.u. to the right of a. The updated map is:

Linkage group 1:  a --(10 m.u.)-- b --(4 m.u.)-- d    (a to d: 14 m.u.)
Linkage group 2:  c

The recombination frequencies between each of loci a, b, and d, and locus e are all 50%; so e is not in linkage group 1 with a, b, and d. The recombination frequency between e and c is 8%; so e is in linkage group 2:

Linkage group 1:  a --(10 m.u.)-- b --(4 m.u.)-- d    (a to d: 14 m.u.)
Linkage group 2:  c --(8 m.u.)-- e

There is 50% recombination between f and all the other genes; so f must belong to a third linkage group:

Linkage group 1:  a --(10 m.u.)-- b --(4 m.u.)-- d    (a to d: 14 m.u.)
Linkage group 2:  c --(8 m.u.)-- e
Linkage group 3:  f

Finally, position locus g with respect to the other genes. The recombination frequencies between g and loci a, b, and d are all 50%; so g is not in linkage group 1. The recombination frequency between g and c is 12%; so g is a part of linkage group 2.
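The grouping step used so far (loci showing less than 50% recombination belong to the same linkage group) can also be sketched programmatically. The following Python is an illustrative sketch only: the names `RF`, `linkage_groups`, and `middle_locus` are hypothetical, and the data are the recombination frequencies from the table for this problem.

```python
# Recombination frequencies (%) from the two-point crosses in this problem.
RF = {
    ("a", "b"): 10, ("a", "c"): 50, ("a", "d"): 14, ("a", "e"): 50,
    ("a", "f"): 50, ("a", "g"): 50, ("b", "c"): 50, ("b", "d"): 4,
    ("b", "e"): 50, ("b", "f"): 50, ("b", "g"): 50, ("c", "d"): 50,
    ("c", "e"): 8, ("c", "f"): 50, ("c", "g"): 12, ("d", "e"): 50,
    ("d", "f"): 50, ("d", "g"): 50, ("e", "f"): 50, ("e", "g"): 18,
    ("f", "g"): 50,
}


def linkage_groups(rf, threshold=50):
    """Group loci so that any pair with recombination < threshold shares a group."""
    loci = sorted({locus for pair in rf for locus in pair})
    parent = {locus: locus for locus in loci}

    def find(x):
        # Follow parent links to the representative of x's group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (x, y), freq in rf.items():
        if freq < threshold:
            parent[find(x)] = find(y)  # merge the two groups

    groups = {}
    for locus in loci:
        groups.setdefault(find(locus), []).append(locus)
    return sorted(groups.values())


def middle_locus(rf, x, y, z):
    """Of three linked loci, the two farthest apart are the outer ones."""
    def dist(p, q):
        return rf[(p, q)] if (p, q) in rf else rf[(q, p)]

    third = {(x, y): z, (x, z): y, (y, z): x}
    outer = max(third, key=lambda pair: dist(*pair))
    return third[outer]


print(linkage_groups(RF))               # [['a', 'b', 'd'], ['c', 'e', 'g'], ['f']]
print(middle_locus(RF, "a", "b", "d"))  # b
```

The second function mirrors the reasoning used for locus d: because a and d show the largest recombination frequency of the three pairs, b must lie between them.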
To determine whether g is 12 map units to the right or left of c, consult the g–e recombination frequency. Because this recombination frequency is 18%, g must lie to the left of c:

Linkage group 1:  a --(10 m.u.)-- b --(4 m.u.)-- d    (a to d: 14 m.u.)
Linkage group 2:  g --(12 m.u.)-- c --(8 m.u.)-- e    (g to e: 18 m.u.)
Linkage group 3:  f

Note that the g-to-e distance (18 m.u.) is shorter than the sum of the g-to-c (12 m.u.) and c-to-e distances (8 m.u.), because of undetectable double crossovers between g and e.

3. Ebony body color (e), rough eyes (ro), and brevis bristles (bv) are three recessive mutations that occur in fruit flies. The loci for these mutations have been mapped and are separated by the following map distances:

e --(20 m.u.)-- ro --(12 m.u.)-- bv

The interference between these genes is 0.4. A fly with ebony body, rough eyes, and brevis bristles is crossed with a fly that is homozygous for the wild-type traits. The resulting F1 females are test-crossed with males that have ebony body, rough eyes, and brevis bristles; 1800 progeny are produced. Give the phenotypes and expected numbers of phenotypes in the progeny of the testcross.

• Solution

The crosses are:

P:          e+ ro+ bv+ / e+ ro+ bv+ × e ro bv / e ro bv
F1:         e+ ro+ bv+ / e ro bv
Testcross:  e+ ro+ bv+ / e ro bv × e ro bv / e ro bv

In this case, we know that ro is the middle locus because the genes have been mapped. Eight classes of progeny will be produced from this cross:

e+ ro+ bv+        nonrecombinant
e ro bv           nonrecombinant
e+ / ro bv        single crossover between e and ro
e / ro+ bv+       single crossover between e and ro
e+ ro+ / bv       single crossover between ro and bv
e ro / bv+        single crossover between ro and bv
e+ / ro / bv+     double crossover
e / ro+ / bv      double crossover

To determine the numbers of each type, use the map distances, starting with the double crossovers. The expected number of double crossovers is equal to the product of the single-crossover probabilities multiplied by the total number of progeny:

expected number of double crossovers = 0.20 × 0.12 × 1800 = 43.2

However, some interference occurs; so the observed number of double crossovers will be less than the expected. The interference is 1 − coefficient of coincidence; so the coefficient of coincidence is:

coefficient of coincidence = 1 − interference

The interference is given as 0.4; so the coefficient of coincidence equals 1 − 0.4 = 0.6. Recall that the coefficient of coincidence is:

$$\text{coefficient of coincidence} = \frac{\text{number of observed double crossovers}}{\text{number of expected double crossovers}}$$

Rearranging this equation, we obtain:

number of observed double crossovers = coefficient of coincidence × number of expected double crossovers
number of observed double crossovers = 0.6 × 43.2 = 25.92 ≈ 26

A total of 26 double crossovers should be observed. Because there are two classes of double crossovers (e+ / ro / bv+ and e / ro+ / bv), we should observe 13 of each.

Next, we determine the number of single-crossover progeny. The genetic map indicates that there are 20 m.u. between e and ro; so 360 progeny (20% of 1800) are expected to have resulted from recombination between these two loci. Some of them will be single-crossover progeny and some will be double-crossover progeny. We have already determined that the number of double-crossover progeny is 26; so the number of progeny resulting from a single crossover between e and ro is 360 − 26 = 334, which will be divided equally between the two single-crossover phenotypes (e+ / ro bv and e / ro+ bv+); so there will be 334/2 = 167 of each.

There are 12 map units between ro and bv; so the number of progeny resulting from recombination between these two genes is 0.12 × 1800 = 216. Again, some of these recombinants will be single-crossover progeny and some will be double-crossover progeny. To determine the number of progeny resulting from a single crossover, subtract the double crossovers: 216 − 26 = 190. These single-crossover progeny will be divided between the two single-crossover phenotypes (e+ ro+ / bv and e ro / bv+); so there will be 190/2 = 95 of each.
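The arithmetic of this solution can be collected into a short Python sketch. It is an illustration under stated assumptions, not the textbook's own method: the function name `expected_three_point` and its dictionary keys are hypothetical, map units are treated as percent recombination, and the double-crossover count is rounded to a whole number as in the text. The final class (nonrecombinants) is simply whatever remains after the crossover classes are subtracted from the total.

```python
def expected_three_point(dist_1, dist_2, interference, total):
    """Expected progeny counts for a three-point testcross.

    dist_1, dist_2: map distances (m.u.) of the two adjacent intervals.
    interference:   1 - coefficient of coincidence.
    total:          total number of testcross progeny.
    """
    r1, r2 = dist_1 / 100, dist_2 / 100

    # Double crossovers expected without interference, then scaled down
    # by the coefficient of coincidence and rounded to whole progeny.
    expected_doubles = r1 * r2 * total
    doubles = round((1 - interference) * expected_doubles)

    # Recombinants in each interval include the double crossovers,
    # so subtract them to leave the single crossovers.
    singles_1 = round(r1 * total) - doubles
    singles_2 = round(r2 * total) - doubles

    nonrecombinants = total - doubles - singles_1 - singles_2
    return {
        "double_crossover": doubles,
        "single_interval_1": singles_1,
        "single_interval_2": singles_2,
        "nonrecombinant": nonrecombinants,
    }


counts = expected_three_point(20, 12, 0.4, 1800)
print(counts)
# {'double_crossover': 26, 'single_interval_1': 334,
#  'single_interval_2': 190, 'nonrecombinant': 1250}
```

Each category contains two reciprocal phenotypes, so the number expected for an individual phenotype is half the category total.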
The remaining progeny will be nonrecombinants, and they can be obtained by subtraction: 1800 − 26 − 334 − 190 = 1250; there are two nonrecombinants (e+ ro+ bv+ and e ro bv); so there will be 1250/2 = 625 of each. The numbers of the various phenotypes are listed here:

e+ ro+ bv+        625    nonrecombinant
e ro bv           625    nonrecombinant
e+ / ro bv        167    single crossover between e and ro
e / ro+ bv+       167    single crossover between e and ro
e+ ro+ / bv       95     single crossover between ro and bv
e ro / bv+        95     single crossover between ro and bv
e+ / ro / bv+     13     double crossover
e / ro+ / bv      13     double crossover
total             1800

4. The locations of six deletions have been mapped to the Drosophila chromosome as shown in the following diagram. Recessive mutations a, b, c, d, e, f, and g are known to be located in the same regions as the deletions, but the order of the mutations on the chromosome is not known. When flies homozygous for the recessive mutations are crossed with flies homozygous for the deletions, the following results are obtained, where the letter “m” represents a mutant phenotype and a plus sign (+) represents the wild type. On the basis of these data, determine the relative order of the seven mutant genes on the chromosome:

[Diagram: chromosome with the extents of deletions 1 through 6]

[Table: mutant (m) or wild-type (+) phenotype for mutations a through g in flies carrying each of deletions 1 through 6]

• Solution

The offspring of the cross will be heterozygous, possessing one chromosome with the deletion and wild-type alleles and one chromosome without the deletion and recessive mutant alleles. For loci within the deleted region, only the recessive mutations will be present in the offspring, which will exhibit the mutant phenotype. The presence of a mutant trait in the offspring therefore indicates that the locus for that trait is within the region covered by the deletion. We can map the genes by examining the expression of the recessive mutations in the flies with different deletions.

Mutation a is expressed in flies with deletions 4, 5, and 6 but not in flies with other deletions; so a must be in the area that is unique to deletions 4, 5, and 6:

[Diagram: deletions 1 through 6 with mutation a placed in the region shared only by deletions 4, 5, and 6]

Mutation b is expressed only when deletion 1 is present; so it must be located in the region of the chromosome covered by deletion 1 and none of the other deletions:

[Diagram: deletions 1 through 6 with mutations b and a placed]

Using this procedure, we can map the remaining mutations. For each mutation, we look for the area of overlap among deletions that express the mutation and exclude any areas of overlap that are covered by other deletions that do not express the mutation:

[Diagram: deletions 1 through 6 with the mapped positions of mutations a through g]

5. A panel of cell lines was created from mouse–human somatic-cell fusions. Each line was examined for the presence of human chromosomes and for the production of human haptoglobin (a protein). The following results were obtained:

[Table: presence (+) or absence (−) of human haptoglobin and of human chromosomes 1, 2, 3, 14, 15, 16, and 21 in cell lines A, B, C, and D]

On the basis of these results, which human chromosome carries the gene for haptoglobin?

• Solution

Examine those cell lines that are positive for human haptoglobin and see what chromosomes they have in common. Lines B and C produce human haptoglobin; the chromosomes that they have in common are 1 and 16. Next, examine all lines that possess chromosome 1 or chromosome 16 and determine whether they produce haptoglobin. Chromosome 1 is found in cell lines A, B, C, and D. If the gene for human haptoglobin were found on chromosome 1, human haptoglobin would be present in all of these cell lines. However, lines A and D do not produce human haptoglobin; so the gene cannot be on chromosome 1. Chromosome 16 is found only in cell lines B and C, and only these lines produce human haptoglobin; so the gene for human haptoglobin lies on chromosome 16.

COMPREHENSION QUESTIONS

*1. What does the term recombination mean? What are two causes of recombination?
*2. In a testcross for two genes, what types of gametes are produced with (a) complete linkage, (b) independent assortment, and (c) incomplete linkage?
3. What effect does crossing over have on linkage?
4. Why is the frequency of recombinant gametes always half the frequency of crossing over?
*5. What is the difference between genes in coupling configuration and genes in repulsion? What effect does the arrangement of linked genes (whether they are in coupling configuration or in repulsion) have on the results of a cross?
6. How does one test to see if two genes are linked?
7. What is the difference between a genetic map and a physical map?
*8. Why do calculated recombination frequencies between pairs of loci that are located relatively far apart underestimate the true genetic distances between loci?
9. Explain how one can determine which of three linked loci is the middle locus from the progeny of a three-point testcross.
*10. What does the interference tell us about the effect of one crossover on another?
11. List some of the methods for physically mapping genes and explain how they are used to position genes on chromosomes.
12. What is a lod score and how is it calculated?

APPLICATION QUESTIONS AND PROBLEMS

*13. In the snail Cepaea nemoralis, an autosomal allele causing a banded shell (B^B) is recessive to the allele for unbanded shell (B^O). Genes at a different locus determine the background color of the shell; here, yellow (C^Y) is recessive to brown (C^Bw). A banded, yellow snail is crossed with a homozygous brown, unbanded snail. The F1 are then crossed with banded, yellow snails (a testcross).
(a) What will be the results of the testcross if the loci that control banding and color are linked with no crossing over?
(b) What will be the results of the testcross if the loci assort independently?
(c) What will be the results of the testcross if the loci are linked and 20 map units apart?

*14. In silkmoths (Bombyx mori) red eyes (re) and white-banded wing (wb) are encoded by two mutant alleles that are recessive to those that produce wild-type traits (re+ and wb+); these two genes are on the same chromosome. A moth homozygous for red eyes and white-banded wings is crossed with a moth homozygous for the wild-type traits. The F1 have normal eyes and normal wings. The F1 are crossed with moths that have red eyes and white-banded wings in a testcross. The progeny of this testcross are:

wild-type eyes, wild-type wings       418
red eyes, wild-type wings             19
wild-type eyes, white-banded wings    16
red eyes, white-banded wings          426

(a) What phenotypic proportions would be expected if the genes for red eyes and white-banded wings were located on different chromosomes?
(b) What is the genetic distance between the genes for red eyes and white-banded wings?

*15. A geneticist discovers a new mutation in Drosophila melanogaster that causes the flies to shake and quiver. She calls this mutation spastic (sps) and determines that spastic is due to an autosomal recessive gene. She wants to determine if the spastic gene is linked to the recessive gene for vestigial wings (vg). She crosses a fly homozygous for spastic and vestigial traits with a fly homozygous for the wild-type traits and then uses the resulting F1 females in a testcross. She obtains the following flies from this testcross.

vg+ sps+    230
vg sps      224
vg sps+     97
vg+ sps     99
total       650

Are the genes that cause vestigial wings and the spastic mutation linked? Do a series of chi-square tests to determine if the genes have assorted independently.

16. In cucumbers, heart-shaped leaves (hl) are recessive to normal leaves (Hl) and having many fruit spines (ns) is recessive to having few fruit spines (Ns). The genes for leaf shape and number of spines are located on the same chromosome; mapping experiments indicate that they are 32.6 map units apart. A cucumber plant having heart-shaped leaves and many spines is crossed with a plant that is homozygous for normal leaves and few spines. The F1 are crossed with plants that have heart-shaped leaves and many spines. What phenotypes and proportions are expected in the progeny of this cross?

*17. In tomatoes, tall (D) is dominant over dwarf (d) and smooth fruit (P) is dominant over pubescent fruit (p), which is covered with fine hairs. A farmer has two tall and smooth tomato plants, which we will call plant A and plant B. The farmer crosses plants A and B with the same dwarf and pubescent plant and obtains the following numbers of progeny:

            Progeny of
            Plant A    Plant B
Dd Pp       122        2
Dd pp       6          82
dd Pp       4          82
dd pp       124        4

(a) What are the genotypes of plant A and plant B?
(b) Are the loci that determine height of the plant and pubescence linked? If so, what is the map distance between them?
(c) Explain why different proportions of progeny are produced when plant A and plant B are crossed with the same dwarf pubescent plant.

18. A cross between individuals with genotypes a+a b+b × aa bb produces the following progeny:

a+a b+b    83
a+a bb     21
aa b+b     19
aa bb      77

(a) Does the evidence indicate that the a and b loci are linked?
(b) What is the map distance between a and b?
(c) Are the alleles in the parent with genotype a+a b+b in coupling configuration or repulsion? How do you know?

19. In tomatoes, dwarf (d) is recessive to tall (D) and opaque (light green) leaves (op) are recessive to green leaves (Op). The loci that determine the height and leaf color are linked and separated by a distance of 7 m.u. For each of the following crosses, determine the phenotypes and proportions of progeny produced.

(a) D Op / d op × d op / d op
(b) D op / d Op × d op / d op
(c) D Op / d op × D Op / d op
(d) D op / d Op × D op / d Op

*20. In Drosophila melanogaster, ebony body (e) and rough eyes (ro) are encoded by autosomal recessive genes found on chromosome 3; they are separated by 20 map units. The gene that encodes forked bristles (f) is X-linked recessive and assorts independently of e and ro. Give the phenotypes of progeny and their expected proportions when each of the following genotypes is test-crossed.

(a) e+ ro+ / e ro ; f+ / f
(b) e+ ro / e ro+ ; f+ / f

*21. A series of two-point crosses were carried out among seven loci (a, b, c, d, e, f, and g), producing the following recombination frequencies. Map the seven loci, showing their linkage groups, the order of the loci in each linkage group, and the distances between the loci of each group:

Loci       Recombination frequency (%)      Loci       Recombination frequency (%)
a and b    50                               c and d    50
a and c    50                               c and e    26
a and d    12                               c and f    50
a and e    50                               c and g    50
a and f    50                               d and e    50
a and g    4                                d and f    50
b and c    10                               d and g    8
b and d    50                               e and f    50
b and e    18                               e and g    50
b and f    50                               f and g    50
b and g    50

*22. Waxy endosperm (wx), shrunken endosperm (sh), and yellow seedling (v) are encoded by three recessive genes in corn that are linked on chromosome 5. A corn plant homozygous for all three recessive alleles is crossed with a plant homozygous for all the dominant alleles. The resulting F1 are then crossed with a plant homozygous for the recessive alleles in a three-point testcross. The progeny of the testcross are:

wx sh V     87
Wx Sh v     94
Wx Sh V     3,479
wx sh v     3,478
Wx sh V     1,515
wx Sh v     1,531
wx Sh V     292
Wx sh v     280
total       10,756

(a) Determine the order of these genes on the chromosome.
(b) Calculate the map distances between the genes.
(c) Determine the coefficient of coincidence and the interference among these genes.

23. Fine spines (s), smooth fruit (tu), and uniform fruit color (u) are three recessive traits in cucumbers whose genes are linked on the same chromosome. A cucumber plant heterozygous for all three traits is used in a testcross, and the following progeny are produced from this testcross:

S U Tu      2
s u Tu      70
S u Tu      21
s u tu      4
S U tu      82
s U tu      21
s U Tu      13
S u tu      17
total       230

(a) Determine the order of these genes on the chromosome.
(b) Calculate the map distances between the genes.
(c) Determine the coefficient of coincidence and the interference among these genes.
(d) List the genes found on each chromosome in the parents used in the testcross.

*24. In Drosophila melanogaster, black body (b) is recessive to gray body (b+), purple eyes (pr) are recessive to red eyes (pr+), and vestigial wings (vg) are recessive to normal wings (vg+). The loci coding for these traits are linked, with the following map distances:

b --(6 m.u.)-- pr --(13 m.u.)-- vg

The interference among these genes is 0.5. A fly with black body, purple eyes, and vestigial wings is crossed with a fly homozygous for gray body, red eyes, and normal wings. The female progeny are then crossed with males that have black body, purple eyes, and vestigial wings. If 1000 progeny are produced from this testcross, what will be the phenotypes and proportions of the progeny?

*25. The locations of six deletions have been mapped to the Drosophila chromosome shown here. Recessive mutations a, b, c, d, e, and f are known to be located in the same region as the deletions, but the order of the mutations on the chromosome is not known. When flies homozygous for the recessive mutations are crossed with flies homozygous for the deletions, the following results are obtained, where “m” represents a mutant phenotype and a plus sign (+) represents the wild type. On the basis of these data, determine the relative order of the six mutant genes on the chromosome:

[Diagram: chromosome with the extents of deletions 1 through 6]

            Mutations
Deletion    a    b    c    d    e    f
1           m    +    m    +    +    m
2           m    +    +    +    +    +
3           +    m    m    m    m    +
4           +    +    m    m    m    +
5           +    +    +    m    m    +
6           +    m    +    m    +    +

26. A panel of cell lines was created from mouse–human somatic-cell fusions. Each line was examined for the presence of human chromosomes and for the production of an enzyme. The following results were obtained:

[Table: presence (+) or absence (−) of the enzyme and of human chromosomes 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 17, and 22 in cell lines A, B, C, D, and E]

On the basis of these results, which chromosome has the gene that codes for the enzyme?

*27. A panel of cell lines was created from mouse–human somatic-cell fusions. Each line was examined for the presence of human chromosomes and for the production of three enzymes. The following results were obtained.

[Table: presence (+) or absence (−) of enzymes 1, 2, and 3 and of human chromosomes 4, 8, 9, 12, 15, 16, 17, 22, and X in cell lines A, B, C, and D]

On the basis of these results, give the chromosome location of enzyme 1, enzyme 2, and enzyme 3.

CHALLENGE QUESTION

28. In calculating map distances, we did not concern ourselves with whether double crossovers were two stranded, three stranded, or four stranded; yet, these different types of double crossovers produce different types of gametes. Can you explain why we do not need to determine how many strands take part in double crossovers in diploid organisms? (Hint: Draw out the types of gametes produced by the different types of double crossovers and see how they contribute to the determination of map distances.)

SUGGESTED READINGS

Creighton, H. B., and B. McClintock. 1931. A correlation of cytological and genetical crossing over in Zea mays. Proceedings of the National Academy of Sciences U.S.A. 17:492–497.
Paper reporting Creighton and McClintock’s finding that crossing over is associated with exchange of chromosome segments.

Crow, J. 1988. A diamond anniversary: the first genetic map. Genetics 118:1–3.
A brief review of the history of Sturtevant’s first genetic map.

Ruddle, F. H., and R. S. Kucherlapati. 1974. Hybrid cells and human genes. Scientific American 231(1):36–44.
A readable review of somatic-cell hybridization.

Stern, C. 1936. Somatic crossing over and segregation in Drosophila melanogaster. Genetics 21:625–631.
Stern’s finding, similar to Creighton and McClintock’s, of a correlation between crossing over and physical exchange of chromosome segments.

Sturtevant, A. H. 1913. The linear arrangement of six sex-linked factors in Drosophila, as shown by their mode of association. Journal of Experimental Zoology 14:43–59.
Sturtevant’s report of the first genetic map.

8 Bacterial and Viral Genetic Systems

• Pump Handles and Cholera Genes
• Bacterial Genetics
  Techniques for the Study of Bacteria
  The Bacterial Genome
  Plasmids
  Gene Transfer in Bacteria
  Conjugation
  Natural Gene Transfer and Antibiotic Resistance
  Transformation in Bacteria
  Bacterial Genome Sequences
• Viral Genetics
  Techniques for the Study of Bacteriophages
  Gene Mapping in Phages
  Transduction: Using Phages to Map Bacterial Genes
  Fine-Structure Analysis of Bacteriophage Genes
  Overlapping Genes
  RNA Viruses
  Prions: Pathogens Without Genes

A map of London in 1856. (Mylne, Robert W., Map of the Contours of London and Its Environs, Published by Edward Stanford, Charing Cross, London. Engraved and Printed from Stone by Waterlow and Sons, 1856.)

Pump Handles and Cholera Genes

On the night of August 31, 1854, a terrible epidemic of cholera broke out in the Soho neighborhood of London. Hundreds of residents were stricken with severe diarrhea and vomiting and, in the next three days, 127 people living on or near Broad Street died. By September 10, the number of fatalities had climbed to more than 500. It was the worst outbreak of cholera ever seen in England. Residents of Soho fled their homes in terror, leaving businesses closed, homes locked, and streets deserted.

The Soho epidemic was witnessed firsthand by Dr. John Snow, a physician who lived on Sackville Street and saw the devastating effects and rapid spread of the disease. He had conducted research on cholera and suspected that it was spread through the water supply, but most medical authorities dismissed his suspicion. Snow conducted a thorough survey of the Soho neighborhood, identifying all those who were sick with cholera. He plotted the locations of the cases on a map and observed that they clustered around one particular water pump located on Broad Street. Cholera cases did not cluster around other nearby water pumps. Snow contacted the parish officials and convinced them to remove the handle of the Broad Street pump and, with this simple action, the spread of cholera slowed dramatically. Snow later conducted additional studies of cholera outbreaks in London and established that the disease was spread in water contaminated with sewage.

Cholera has existed in Asia for at least 1000 years but, at the time of the Soho epidemic, it was a relatively new disease in England. Today, cholera is recognized as a severe infection of the intestine caused by Vibrio cholerae ( ◗ FIGURE 8.1). This bacterium produces a potent enterotoxin that induces copious diarrhea and vomiting. If untreated, the condition can lead to serious dehydration and death. Although the number of cholera deaths has dropped dramatically since the advent of oral rehydration treatment and antibiotics, the disease continues to be a serious public-health problem, particularly in areas that lack modern water-supply systems.

◗ 8.1 Vibrio cholerae is the bacterium that causes cholera. (CMRI/Science Photo Library/Photo Researchers.)
A recent epidemic in South Africa infected more than 25,000 people in a 6-month period.

Many of cholera’s secrets have now been revealed through the sequencing of V. cholerae’s genome. Most bacteria have a single circular chromosome, but V. cholerae has two. One of the most significant findings to emerge from the sequencing of the V. cholerae genome is that many of the bacterium’s genes for pathogenesis were acquired from other bacteria. Vibrio cholerae apparently has a long history of exchanging genetic material with other bacteria and viruses, and it seems that much of this acquired DNA is responsible for its virulence. The gene that encodes the cholera toxin, for example, is found embedded in a viral genome that infected the bacterium and became a permanent part of its genome long ago. Vibrio cholerae illustrates the importance of gene exchange between bacteria and viruses, a major theme of this chapter.

Since the 1940s, the genetic systems of bacteria and viruses have contributed to the discovery of many important concepts in genetics. The study of molecular genetics initially focused almost entirely on their genes; today, bacteria and viruses are still essential tools for probing the nature of genes in more-complex organisms, in part because they possess a number of characteristics that make them suitable for genetic studies (Table 8.1).

Table 8.1 Advantages of using bacteria and viruses for genetic studies
1. Reproduction is rapid.
2. Many progeny are produced.
3. Haploid genome allows all mutations to be expressed directly.
4. Asexual reproduction simplifies the isolation of genetically pure strains.
5. Growth in the laboratory is easy and requires little space.
6. Genomes are small.
7. Techniques are available for isolating and manipulating their genes.
8. They have medical importance.
9. They can be genetically engineered to produce substances of commercial value.
The genetic systems of bacteria and viruses are also studied because these organisms play important roles in human society. They have been harnessed to produce a number of economically important substances, and they are of immense medical significance, causing many human diseases. In this chapter, we focus on several unique aspects of bacterial and viral genetic systems. Important processes of gene transfer and recombination, like those that contributed to the pathogenesis of the cholera bacterium, will be described, and we will see how these processes can be used to map bacterial and viral genes.

www.whfreeman.com/pierce Information about John Snow and his contributions to the study of cholera

Bacterial Genetics

Heredity in bacteria is fundamentally similar to heredity in more-complex organisms, but the haploid genome of bacteria and their small size (which makes observation of their phenotypes difficult) require different approaches and methods. First, we will consider how bacteria are studied and then examine several processes that transfer genes from one bacterium to another.

◗ 8.2 Bacteria can be grown in liquid medium.

Techniques for the Study of Bacteria

Microbiologists have defined the nutritional needs of a number of bacteria and developed culture media for growing them in the laboratory. Culture media typically contain a carbon source, essential elements such as nitrogen and phosphorus, certain vitamins, and other required ions and nutrients. Wild-type (prototrophic) bacteria can use these simple ingredients to synthesize all the compounds that they need for growth and reproduction. A medium that contains only the nutrients required by prototrophic bacteria is termed minimal medium. Mutant strains called auxotrophs lack one or more enzymes necessary for metabolizing nutrients or synthesizing essential molecules and will grow only on medium supplemented with one or more nutrients.
For example, auxotrophic strains that are unable to synthesize the amino acid leucine will not grow on minimal medium but will grow on medium to which leucine has been added. Complete medium contains all the substances required by bacteria for growth and reproduction.

Cultures of bacteria are often grown in test tubes that contain sterile liquid medium ( ◗ FIGURE 8.2a). A few bacteria are added to the tube, and they grow and divide until all the nutrients are used up or, more commonly, until the concentration of their waste products becomes toxic. Bacteria are also grown in petri plates ( ◗ FIGURE 8.2b). Growth medium suspended in agar is poured into the bottom half of the petri plate, providing a solid, gel-like base for bacterial growth. The chief advantage of this method is that it allows one to isolate and count bacteria, which individually are too small to see without a microscope. In a process called plating, a dilute solution of bacteria is spread over the surface of an agar-filled petri plate. As each bacterium grows and divides, it gives rise to a visible clump of genetically identical cells (a colony). Genetically pure strains of the bacteria can be isolated by collecting bacteria from a single colony and transferring them to a new test tube or petri plate.

Because individual bacteria are too small to be seen directly, it is often easier to study phenotypes that affect the appearance of the colony ( ◗ FIGURE 8.3) or can be detected by simple chemical tests. Nutritional requirements of the bacteria are used to detect some commonly studied phenotypes. Suppose we want to detect auxotrophic mutants that cannot synthesize leucine (leu⁻ mutants). We first spread the bacteria on a petri plate containing complete medium that includes leucine, so both prototrophs that have the leu⁺ allele and auxotrophs that have leu⁻ alleles will grow on it ( ◗ FIGURE 8.4). Next, using a technique called replica plating, we transfer a few cells from each of the colonies on the original plate to two new replica plates, one containing complete medium and the other containing selective medium that lacks leucine. The leu⁺ bacteria will grow on both media, but the leu⁻ mutants will grow only on the complete medium, because they cannot synthesize the leucine that is absent from the selective medium. Any colony that grows on complete medium but not on this selective medium consists of leu⁻ bacteria. The auxotrophic mutants growing on complete medium can then be cultured for further study.

◗ 8.3 Bacteria can be grown on solid media and show a variety of phenotypes. (a) Smooth, circular raised surface. (b) Granular, circular raised surface. (c) Elevated folds on a flat colony with irregular edges. (d) Irregular elevations on a raised colony with an undulating edge. (Parts a and d, Biophoto Associates/Photo Researchers; part b, Dr. E. Bottone/Peter Arnold; part c, Larry Jensen/Visuals Unlimited.)

◗ 8.4 Mutant bacterial strains can be isolated on the basis of their nutritional requirements.

The Bacterial Genome

Bacteria are unicellular organisms that lack a nuclear membrane ( ◗ FIGURE 8.5). Most bacterial genomes consist of a circular chromosome that contains a single DNA molecule several million base pairs in length. For example, the genome of E. coli has approximately 4.6 million base pairs of DNA. However, some bacteria (such as V. cholerae) contain multiple chromosomes, and a few even have linear chromosomes. Bacterial chromosomes are usually organized efficiently, with little DNA between genes.

◗ 8.5 Most bacterial cells possess a single, circular chromosome not bounded by a nuclear membrane. (K.G. Murti/Visuals Unlimited.)
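Returning to the replica-plating screen for leu⁻ mutants described above, the selection logic can be written as a comparison of two sets of colonies. The following is only a minimal sketch: the colony names, the list of supplements standing in for complete medium, and the helper functions `grows` and `replica_plate` are all hypothetical and are not taken from the text.

```python
# Hypothetical colonies on the original (master) plate. Each colony maps to
# the set of nutrients its cells cannot synthesize (empty set = prototroph).
colonies = {
    "c1": set(),          # leu+ prototroph
    "c2": {"leucine"},    # leu- auxotroph
    "c3": set(),          # leu+ prototroph
    "c4": {"leucine"},    # leu- auxotroph
}

# Illustrative stand-in for complete medium: a few supplements only.
COMPLETE = {"leucine", "threonine", "biotin"}
# Selective medium: the same medium with leucine left out.
SELECTIVE = COMPLETE - {"leucine"}


def grows(requirements, medium):
    """A colony grows if the medium supplies everything it cannot make."""
    return requirements <= medium


def replica_plate(colonies, medium):
    """Return the names of the colonies that grow on the given medium."""
    return {name for name, req in colonies.items() if grows(req, medium)}


on_complete = replica_plate(colonies, COMPLETE)
on_selective = replica_plate(colonies, SELECTIVE)

# Colonies present on complete medium but missing from the selective plate
# are the leu- auxotrophs.
leu_auxotrophs = on_complete - on_selective
print(sorted(leu_auxotrophs))
```

The set difference mirrors the visual comparison of the two replica plates: a colony that appears on one plate but not on the other is the mutant of interest.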
www.whfreeman.com/pierce General information on bacteria, bacterial structure, and major groups of bacteria, with some great pictures

Plasmids

In addition to having a chromosome, many bacteria possess plasmids, small, circular DNA molecules ( ◗ FIGURE 8.6). Some plasmids are present in many copies per cell, whereas others are present in only one or two copies. In general, plasmids carry genes that are not essential to bacterial function but that may play an important role in the life cycle and growth of their bacterial hosts. Some plasmids promote mating between bacteria; others contain genes that kill other bacteria. Of great importance, plasmids are used extensively in genetic engineering (Chapter 18) and some of them play a role in the spread of antibiotic resistance among bacteria.

Most plasmids are circular and several thousand base pairs in length, although plasmids consisting of several hundred thousand base pairs also have been found. Possessing its own origin of replication, a plasmid replicates independently of the bacterial chromosome. Replication proceeds from the origin in one or two directions until the entire plasmid is copied. In ◗ FIGURE 8.7, the origin of replication is oriV. A few plasmids have multiple replication origins.

◗ 8.6 Neisseria gonorrhoeae (a bacterium that causes gonorrhea), like many other bacteria, contains plasmids in addition to its chromosome. The connected plasmids (indicated by the arrow) have just replicated. (A. B. Dowsett/Science Photo Library/Photo Researchers.)

Figure 8.7 callouts: (1) Replication in a plasmid begins at the origin of replication, the oriV site. (2) Strands separate and replication takes place in both directions, (3) proceeding around the circle (4) and eventually producing two circular DNA molecules.
◗ 8.7 A plasmid replicates independently of its bacterial chromosome. Replication begins at the origin of replication (oriV) and continues around the circle. In this diagram, replication is taking place in both directions; in some plasmids, replication is in one direction only. (Photo from Photo Researchers.)

◗ 8.8 The F factor, a circular episome of E. coli, contains a number of genes that regulate its transfer to other cells and that control replication. Replication is initiated at oriV. Insertion sequences (Chapter 11) IS3 and IS2 control insertion into the bacterial chromosome and excision from it. (Figure labels: genes that regulate plasmid transfer to other cells; sequences that regulate insertion into the bacterial chromosome, IS2 and IS3; origin of transfer; genes that control plasmid replication, oriV (origin of replication), inc, and rep.)

Episomes are plasmids that are capable of either freely replicating or integrating into the bacterial chromosomes. The F (fertility) factor of E. coli ( ◗ FIGURE 8.8) is an episome that controls mating and gene exchange between E. coli cells, as will be discussed in the next section.

Concepts
The typical bacterial genome consists of a single circular chromosome that contains several million base pairs. Some bacterial genes may be present on plasmids, which are small, circular DNA molecules that replicate independently of the bacterial chromosome.

Gene Transfer in Bacteria

For many years, bacteria were thought to reproduce only by simple binary fission, in which one cell splits into two identical cells without any exchange or recombination of genetic material. In 1946, Joshua Lederberg and Edward Tatum demonstrated that bacteria can transfer and recombine genetic information. This finding paved the way for the use of bacteria as model genetic organisms. Bacteria exchange genetic material by three different mechanisms, all entailing some type of DNA transfer and recombination between the transferred DNA and the bacterial chromosome.

1. Conjugation ( ◗ FIGURE 8.9a) is the direct transfer of genetic material from one bacterium to another. In conjugation, two bacteria lie close together and a connection forms between them. A plasmid or a part of the bacterial chromosome passes from one cell (the donor) to the other (the recipient). Subsequent to conjugation, crossing over takes place between homologous sequences in the transferred DNA and the chromosome of the recipient cell. In conjugation, DNA is transferred only from donor to recipient, with no reciprocal exchange of genetic material.

2. In transformation ( ◗ FIGURE 8.9b), DNA in the medium surrounding a bacterium is taken up. After transformation, recombination may take place between the introduced genes and those of the bacterial chromosome.

3. In transduction ( ◗ FIGURE 8.9c), bacterial viruses (bacteriophages) carry DNA from one bacterium to another. Once inside the bacterium, the newly introduced DNA may undergo recombination with the bacterial chromosome.

Not all bacterial species exhibit all three types of genetic transfer. Conjugation is more frequent for some bacteria than for others. Transformation takes place to a limited extent in many bacteria, but laboratory techniques have been developed that increase the rate of DNA uptake. Most bacteriophages have a limited host range, so transduction normally takes place only between bacteria of the same or closely related species.

These processes of genetic exchange in bacteria differ from the sexual reproduction of diploid eukaryotes in two important ways.
First, DNA exchange and reproduction are not coupled in bacteria. Second, donated genetic material that is not recombined into the host DNA is usually degraded and so the recipient cell remains haploid. Each type of genetic transfer can be used to map genes, as will be discussed in the following sections.

Concepts
DNA may be transferred between bacterial cells through conjugation, transformation, or transduction. Each type of genetic transfer consists of a one-way movement of genetic information to the recipient cell, sometimes followed by recombination. These processes are not connected to cellular reproduction in bacteria.

◗ 8.9 Conjugation, transformation, and transduction are three processes of gene transfer in bacteria. All three processes require transferred DNA to undergo recombination with the bacterial chromosome for the transferred DNA to be stably inherited. (a) Conjugation: A cytoplasmic bridge forms between the donor cell and the recipient cell. DNA replicates and transfers from one cell to the other. The cells separate. A crossover in the recipient cell leads to creation of a recombinant chromosome. (b) Transformation: Naked DNA is taken up by the recipient cell. A crossover in the bacterium leads to creation of a recombinant chromosome. (c) Transduction: A virus infects a bacterial cell, injects its DNA, and replicates, taking up bacterial DNA. The bacterial cell lyses. The virus infects a new bacterium, carrying bacterial DNA with it. A crossover in the recipient cell leads to creation of a recombinant chromosome.

Conjugation

In the course of their research, Lederberg and Tatum studied strains of E. coli possessing auxotrophic mutations.
The Y10 strain required the amino acids threonine (and genotypically was thr⁻) and leucine (leu⁻) and the vitamin thiamine (thi⁻) for growth but did not require the vitamin biotin (bio⁺) or the amino acids phenylalanine (phe⁺) and cysteine (cys⁺); the genotype of this strain can be written as: thr⁻ leu⁻ thi⁻ bio⁺ phe⁺ cys⁺. The Y24 strain required biotin, phenylalanine, and cysteine in its medium, but it did not require threonine, leucine, or thiamine; its genotype was: thr⁺ leu⁺ thi⁺ bio⁻ phe⁻ cys⁻. In one experiment, Lederberg and Tatum mixed Y10 and Y24 bacteria together and plated them on minimal medium ( ◗ FIGURE 8.10). Each strain was also plated separately on minimal medium.

◗ 8.10 Lederberg and Tatum’s experiment demonstrated that bacteria undergo genetic exchange. Question: Do bacteria exchange genetic information? (1) Auxotrophic bacterial strain Y10 cannot synthesize Thr, Leu, or thiamine, (2) and strain Y24 cannot synthesize biotin, Phe, or Cys, (3) and so neither auxotrophic strain can grow on minimal medium. (4) When strains Y10 and Y24 are mixed, (5) some colonies grow (6) because genetic recombination has taken place and bacteria can synthesize all necessary nutrients. Conclusion: Yes, genetic exchange and recombination took place between the two mutant strains.

Alone, neither Y10 nor Y24 grew on minimal medium. Strain Y10 was unable to grow, because it required threonine, leucine, and thiamine, which were absent in the minimal medium; strain Y24 was unable to grow, because it required biotin, phenylalanine, and cysteine, which also were absent from the minimal medium. When Lederberg and Tatum mixed the two strains, however, a few colonies did grow on the minimal medium. These prototrophic bacteria must have had genotype thr⁺ leu⁺ thi⁺ bio⁺ phe⁺ cys⁺. Where had they come from?

If mutations were responsible for the prototrophic colonies, then some colonies should also have grown on the plates containing Y10 or Y24 alone, but no bacteria grew on these plates. Multiple simultaneous mutations (thr⁻ → thr⁺, leu⁻ → leu⁺, and thi⁻ → thi⁺ in strain Y10 or bio⁻ → bio⁺, phe⁻ → phe⁺, and cys⁻ → cys⁺ in strain Y24) would have been required for either strain to become prototrophic by mutation, which was very improbable. Lederberg and Tatum concluded that some type of genetic transfer and recombination had taken place:

Auxotrophic strain Y10: thr⁻ leu⁻ thi⁻ bio⁺ phe⁺ cys⁺
Auxotrophic strain Y24: thr⁺ leu⁺ thi⁺ bio⁻ phe⁻ cys⁻
Prototrophic strain (from the mixture): thr⁺ leu⁺ thi⁺ bio⁺ phe⁺ cys⁺

What they did not know was how it had taken place. To study this problem, Bernard Davis constructed a U-shaped tube ( ◗ FIGURE 8.11) that was divided into two compartments by a filter having fine pores. This filter allowed liquid medium to pass from one side of the tube to the other, but the pores of the filter were too small to allow passage of bacteria. Two auxotrophic strains of bacteria were placed on opposite sides of the filter, and suction was applied alternately to the ends of the U-tube, causing the medium to flow back and forth between the two compartments. Despite hours of incubation in the U-tube, bacteria plated out on minimal medium did not grow; there had been no genetic exchange between the strains. The exchange of bacterial genes clearly required direct contact between the bacterial cells. This type of genetic exchange entailing cell-to-cell contact in bacteria is called conjugation.

F⁺ and F⁻ cells

In most bacteria, conjugation depends on a fertility (F) factor that is present in the donor cell and absent in the recipient cell. Cells that contain F are referred to as F⁺, and cells lacking F are F⁻.
The F factor contains an origin of replication and a number of genes required for conjugation (see Figure 8.8). For example, some of these genes encode sex pili (singular, pilus), slender extensions of the cell membrane. A cell containing F produces the sex pili, which make contact with a receptor on an F⁻ cell ( ◗ FIGURE 8.12) and pull the two cells together. DNA is then transferred from the F⁺ cell to the F⁻ cell. Conjugation can take place only between a cell that possesses F and a cell that lacks F.

◗ 8.11 Davis’s U-tube experiment. Question: How did the genetic exchange seen in Lederberg and Tatum’s experiment take place? When two auxotrophic strains (A and B) were separated by a filter that allowed mixing of medium but not bacteria, no prototrophic bacteria were produced: no growth occurred on minimal medium. Conclusion: Genetic exchange requires direct contact between bacterial cells.

◗ 8.12 A sex pilus connects F⁺ and F⁻ cells during bacterial conjugation. (Dr. Dennis Kunkel/Phototake.)

In most cases, the only genes transferred during conjugation between an F⁺ and F⁻ cell are those on the F factor ( ◗ FIGURE 8.13a and b). Transfer is initiated when one of the DNA strands on the F factor is nicked at an origin (oriT). One end of the nicked DNA separates from the circle and passes into the recipient cell ( ◗ FIGURE 8.13c). Replication takes place on the nicked strand, proceeding around the circular plasmid and replacing the transferred strand ( ◗ FIGURE 8.13d). Because the plasmid in the F⁺ cell is always nicked at the oriT site, this site always enters the recipient cell first, followed by the rest of the plasmid. Thus, the transfer of genetic material has a defined direction. Once inside the recipient cell, the single strand is replicated, producing a circular, double-stranded copy of the F plasmid ( ◗ FIGURE 8.13e). If the entire F factor is transferred to the recipient F⁻ cell, that cell becomes an F⁺ cell.

◗ 8.13 The F factor is transferred during conjugation between an F⁺ and F⁻ cell. (a) During conjugation, a cytoplasmic connection forms between the F⁺ (donor) cell and the F⁻ (recipient) cell. (b) One of the DNA strands on the F factor is nicked at an origin and separates. Replication takes place on the F factor, replacing the nicked strand. (c) The 5′ end of the nicked DNA passes into the recipient cell, (d) where the single strand is replicated, (e) producing a circular, double-stranded copy of the F plasmid. The F⁻ cell now becomes F⁺.

Hfr cells

Conjugation transfers genetic material in the F plasmid from F⁺ to F⁻ cells but does not account for the transfer of chromosomal genes observed by Lederberg and Tatum. In Hfr (high-frequency) strains, the F factor is integrated into the bacterial chromosome ( ◗ FIGURE 8.14). Hfr cells behave as F⁺ cells, forming sex pili and undergoing conjugation with F⁻ cells.

◗ 8.14 The F factor is integrated into the bacterial chromosome in an Hfr cell. Crossing over takes place between the F factor and the chromosome of an F⁺ cell, and the F factor is integrated into the chromosome.

In conjugation between Hfr and F⁻ cells ( ◗ FIGURE 8.15a), the integrated F factor is nicked, and the end of the nicked strand moves into the F⁻ cell ( ◗ FIGURE 8.15b), just as it does in conjugation between F⁺ and F⁻ cells. In the Hfr cells, the F factor is linked to the bacterial chromosome, so the chromosome follows it into the recipient cell. How much of the bacterial chromosome is transferred depends on the length of time that the two cells remain in conjugation.
Once inside the recipient cell, the donor DNA strand is replicated ( ◗ FIGURE 8.15c), and crossing over between it and the original chromosome of the F⁻ cell ( ◗ FIGURE 8.15d) may take place. This gene transfer between Hfr and F⁻ cells is how the recombinant prototrophic cells observed by Lederberg and Tatum were produced. When crossing over has taken place in the recipient cell, the donated chromosome is degraded, and the recombinant recipient chromosome remains ( ◗ FIGURE 8.15e) to be replicated and passed to later generations by binary fission.

In a mating of Hfr × F⁻, the F⁻ cell almost never becomes F⁺ or Hfr, because the F factor is nicked in the middle during the initiation of strand transfer, placing part of F at the beginning and part at the end of the strand to be transferred. To become F⁺ or Hfr, the recipient cell must receive the entire F factor, which requires that the entire bacterial chromosome be transferred. This event happens rarely, because most conjugating cells break apart before the entire chromosome has been transferred.

The F plasmid in F⁺ cells integrates into the bacterial chromosome, causing an F⁺ cell to become Hfr, at a frequency of only about 1/10,000. This low frequency accounts for the low rate of recombination observed by Lederberg and Tatum in their F⁺ cells. The F factor is excised from the bacterial chromosome at a similarly low rate, causing a few Hfr cells to become F⁺.

◗ 8.15 Bacterial genes may be transferred from an Hfr cell to an F⁻ cell in conjugation. (a) In the Hfr cell, the F factor is integrated into the bacterial chromosome. (b) In conjugation, F is nicked and the 5′ end moves into the F⁻ cell. (c) The transferred strand is replicated, (d) the cells separate, and crossing over takes place between the donated Hfr chromosome and the original chromosome of the F⁻ cell. (e) Crossing over may lead to recombination of alleles (bright green in place of black segment). The linear chromosome is degraded.

F′ cells

When an F factor does excise from the bacterial chromosome, a small amount of the bacterial chromosome may be removed with it, and these chromosomal genes will then be carried with the F plasmid ( ◗ FIGURE 8.16). Cells containing an F plasmid with some bacterial genes are called F prime (F′). For example, if an F factor integrates into a chromosome adjacent to the chromosome’s lac operon, the F factor may pick up lac genes when it excises, becoming F′lac. F′ cells can conjugate with F⁻ cells, given that they possess the F plasmid with all the genetic information necessary for conjugation and gene transfer. Characteristics of different mating types of E. coli (cells with different types of F) are summarized in Table 8.2.

◗ 8.16 An Hfr cell may be converted into an F′ cell when the F factor excises from the bacterial chromosome and carries bacterial genes with it. Crossing over takes place within the Hfr chromosome. When the F factor excises from the bacterial chromosome, it may carry some bacterial genes (in this case lac) with it.

Table 8.2 Characteristics of E. coli cells with different types of F factor
Type    F Factor Characteristics                                           Role in Conjugation
F⁺      Present as separate circular DNA                                   Donor
F⁻      Absent                                                             Recipient
Hfr     Present, integrated into bacterial chromosome                      High-frequency donor
F′      Present as separate circular DNA, carrying some bacterial genes    Donor

During conjugation between an F′lac cell and an F⁻ cell, the F plasmid is transferred to the F⁻ cell, which means that any genes on the F plasmid, including those from the bacterial chromosome, may be transferred to F⁻ recipient cells. This process is called sexduction. It produces partial diploids, or merozygotes, which are cells with two copies of some genes, one on the bacterial chromosome and one on the newly introduced F plasmid. The outcomes of conjugation between different mating types of E. coli are summarized in Table 8.3.

Table 8.3 Results of conjugation between cells with different F factors
Conjugating Cells    Cell Types Present After Conjugation
F⁺ × F⁻              Two F⁺ cells (F⁻ cell becomes F⁺)
Hfr × F⁻             One Hfr cell and one F⁻ cell (no change)*
F′ × F⁻              Two F′ cells (F⁻ cell becomes F′)
*Rarely, the F⁻ cell becomes F⁺ in an Hfr × F⁻ conjugation if the entire chromosome is transferred during conjugation.

Concepts
Conjugation in E. coli is controlled by an episome called the F factor. Cells containing F (F⁺ cells) are donors during gene transfer; cells without F (F⁻ cells) are recipients. Hfr cells possess F integrated into the bacterial chromosome; they donate DNA to F⁻ cells at a high frequency. F′ cells contain a copy of F with some bacterial genes.

Mapping bacterial genes with interrupted conjugation

The transfer of DNA that takes place during conjugation between Hfr and F⁻ cells allows bacterial genes to be mapped. During conjugation, the chromosome of the Hfr cell is transferred to the F⁻ cell. Transfer of the entire E. coli chromosome requires about 100 minutes; if conjugation is interrupted before 100 minutes have elapsed, only part of the chromosome will pass into the F⁻ cell and have an opportunity to recombine with the recipient chromosome. Chromosome transfer always begins within the integrated F factor and proceeds in a continuous direction, so genes are transferred according to their sequence on the chromosome. The time required for individual genes to be transferred indicates their relative positions on the chromosome.
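The timing argument above can be turned into a rough calculation. The sketch below combines two figures given in this chapter, a genome of approximately 4.6 million base pairs and a full transfer time of about 100 minutes, and assumes that transfer proceeds at a constant rate. That constant rate is a simplifying assumption made for illustration and is not stated in the text; the function name `bp_transferred` is likewise hypothetical.

```python
GENOME_BP = 4_600_000       # approximate E. coli genome size (from the text)
FULL_TRANSFER_MIN = 100     # approximate time to transfer the whole chromosome

# Assumed constant transfer rate, in base pairs per minute.
TRANSFER_RATE_BP_PER_MIN = GENOME_BP / FULL_TRANSFER_MIN


def bp_transferred(minutes):
    """Approximate base pairs transferred when conjugation is interrupted
    after `minutes`, assuming a constant rate and capping at the full genome."""
    minutes = max(0, min(minutes, FULL_TRANSFER_MIN))
    return GENOME_BP * minutes / FULL_TRANSFER_MIN


print(TRANSFER_RATE_BP_PER_MIN)   # 46000.0 bp per minute
print(bp_transferred(25))         # 1150000.0 bp, a quarter of the chromosome
```

Under these assumptions, one minute on an interrupted-conjugation map corresponds to roughly 46,000 base pairs, which gives a sense of scale for map distances expressed in minutes.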
In most genetic maps, distances are expressed as percent recombination; but, in bacterial maps constructed with interrupted conjugation, the basic unit of distance is a minute.

Worked Problem

To illustrate the method of mapping genes with interrupted conjugation, let’s look at a cross analyzed by François Jacob and Elie Wollman, who first developed this method of gene mapping ( ◗ FIGURE 8.17a). They used donor Hfr cells that were sensitive to the antibiotic streptomycin (genotype strˢ); resistant to sodium azide (aziʳ) and infection by bacteriophage T1 (tonʳ); prototrophic for threonine (thr⁺) and leucine (leu⁺); and able to break down lactose (lac⁺) and galactose (gal⁺). They used F⁻ recipient cells that were resistant to streptomycin (strʳ); sensitive to sodium azide (aziˢ) and to infection by bacteriophage T1 (tonˢ); auxotrophic for threonine (thr⁻) and leucine (leu⁻); and unable to break down lactose (lac⁻) and galactose (gal⁻). Thus, the genotypes of the donor and recipient cells were:

Donor Hfr cells: Hfr strˢ thr⁺ leu⁺ aziʳ tonʳ lac⁺ gal⁺
Recipient F⁻ cells: F⁻ strʳ thr⁻ leu⁻ aziˢ tonˢ lac⁻ gal⁻

◗ 8.17 Jacob and Wollman used interrupted conjugation to map bacterial genes. Question: How can interrupted conjugation be used to map bacterial genes? (a) (1) An Hfr cell with genotype strˢ thr⁺ leu⁺ aziʳ tonʳ lac⁺ gal⁺ (2) was mated with an F⁻ cell with genotype strʳ thr⁻ leu⁻ aziˢ tonˢ lac⁻ gal⁻. (3) Conjugation was interrupted at regular intervals. (b) Percentage of cells displaying a particular trait (Aziʳ, Tonʳ, Lac⁺, Gal⁺) plotted against time (minutes) after the start of conjugation between Hfr and F⁻ cells. Conclusion: The transfer times indicate the order and relative distances between genes and can be used to construct a genetic map.

The two strains were mixed in nutrient medium and allowed to conjugate. After a few minutes, the medium was diluted to prevent any new pairings. At regular intervals, a sample of cells was removed and agitated vigorously in a kitchen blender to halt all conjugation and DNA transfer. The cells were plated on a selective medium that contained streptomycin and lacked leucine and threonine. The original donor cells were streptomycin sensitive (strˢ) and would not grow on this medium. The F⁻ recipient cells were auxotrophic for leucine and threonine and also failed to grow on this medium. Only cells that underwent conjugation and received at least the leu⁺ and thr⁺ genes from the Hfr donors could grow on the selective medium. All strʳ leu⁺ thr⁺ cells were then tested for the presence of other genes that might have been transferred from the donor Hfr strain.

All of the cells that grow on the selective medium are strʳ leu⁺ thr⁺, so we know that these genes were transferred. The percentage of strʳ leu⁺ thr⁺ exconjugates receiving specific alleles (aziʳ, tonʳ, lac⁺, and gal⁺) from the Hfr strain is plotted against the duration of conjugation ( ◗ FIGURE 8.17b). What are the order of the genes and the distances among them?

• Solution
The first donor gene to appear in all of these exconjugates (at about 9 minutes) was aziʳ. Gene tonʳ appeared next (after about 10 minutes), followed by lac⁺ (at about 18 minutes) and by gal⁺ (after 25 minutes). These transfer times indicate the order and relative distances among the genes ( ◗ FIGURE 8.17b).
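The reasoning in the solution can be sketched in a few lines of code: sort the genes by their time of first appearance to get the gene order, then take the differences between consecutive times as map distances in minutes. The entry times below are the approximate values quoted in the solution; the function name `map_from_entry_times` is a hypothetical helper introduced only for this illustration.

```python
# Approximate times (minutes) at which each donor allele first appeared
# among the exconjugates, as quoted in the solution above.
entry_times = {"azi": 9, "ton": 10, "lac": 18, "gal": 25}


def map_from_entry_times(entry_times):
    """Return the gene order and the distance (in minutes) between
    each pair of adjacent genes, based on time of first appearance."""
    ordered = sorted(entry_times.items(), key=lambda item: item[1])
    order = [gene for gene, _ in ordered]
    intervals = [
        (first[0], second[0], second[1] - first[1])
        for first, second in zip(ordered, ordered[1:])
    ]
    return order, intervals


order, intervals = map_from_entry_times(entry_times)
print(order)       # ['azi', 'ton', 'lac', 'gal']
print(intervals)   # [('azi', 'ton', 1), ('ton', 'lac', 8), ('lac', 'gal', 7)]
```

Because the times are read from a graph, the resulting distances are approximate; the gene order is the more robust result.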
Genetic map (time axis from 0 to 25 minutes; direction of transfer from the origin): origin, azi, ton, lac, gal

Notice that the maximum frequency of exconjugates decreased for the more distant genes. For example, about 90% of the exconjugates received the aziʳ allele, but only about 30% received the gal⁺ allele. The lower percentage for gal⁺ is due to the fact that some conjugating cells spontaneously broke apart before they were disrupted by the blender. The probability of spontaneous disruption increases with time, so fewer cells had an opportunity to receive genes that were transferred later.

Directional transfer and mapping

Different Hfr strains have the F factor integrated into the bacterial chromosome at different sites and in different orientations. Gene transfer always begins within F, and the orientation and position of F determine the direction and starting point of gene transfer. In ◗ FIGURE 8.18a, strain Hfr1 has F integrated between leu and azi; the orientation of F at this site dictates that gene transfer will proceed in a counterclockwise direction around the circular chromosome. Genes from this strain will be transferred in the order of:

leu – thr – thi – his – gal – lac – pro – azi

Strain Hfr5 has F integrated between the thi and his genes ( ◗ FIGURE 8.18b) and in the opposite orientation. Here gene transfer will proceed in a clockwise direction:

thi – thr – leu – azi – pro – lac – gal – his

Figure 8.18a callouts: (1) Transfer always begins within F, and the orientation of F determines the direction of transfer. (2) In Hfr1, F is integrated between the leu and azi genes, (3) so the genes are transferred beginning with leu.

Although the starting point and direction of transfer may differ between two strains, the relative distance in time between any pair of genes is constant. Notice that the order of gene transfer is not the same for different Hfr strains ( ◗ FIGURE 8.19a).
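The dependence of transfer order on the position and orientation of F can be illustrated with a short sketch. It treats the chromosome as a circular list of the eight genes named above and generates a transfer order from a starting gene and a direction. The list order, the `+1`/`-1` direction convention, and the function names are assumptions chosen for this illustration; they are not notation used in the text.

```python
# Genes in their circular map order (the list wraps around: thi is next to thr).
CIRCLE = ["thr", "leu", "azi", "pro", "lac", "gal", "his", "thi"]


def transfer_order(circle, first_gene, step):
    """Order in which genes enter the recipient, starting at `first_gene`
    and moving around the circle with step +1 or -1."""
    n = len(circle)
    start = circle.index(first_gene)
    return [circle[(start + step * i) % n] for i in range(n)]


def same_circular_map(order_a, order_b):
    """True if the two transfer orders are rotations of one another,
    read in the same or in opposite directions."""
    if sorted(order_a) != sorted(order_b):
        return False
    n = len(order_a)
    doubled = order_a + order_a
    rotations = [doubled[i:i + n] for i in range(n)]
    return order_b in rotations or order_b[::-1] in rotations


hfr1 = transfer_order(CIRCLE, "leu", -1)   # F between leu and azi
hfr5 = transfer_order(CIRCLE, "thi", +1)   # F between thi and his, opposite orientation

print(hfr1)   # ['leu', 'thr', 'thi', 'his', 'gal', 'lac', 'pro', 'azi']
print(hfr5)   # ['thi', 'thr', 'leu', 'azi', 'pro', 'lac', 'gal', 'his']
print(same_circular_map(hfr1, hfr5))   # True
```

The two strains give different linear orders, yet both are consistent with a single circular arrangement of genes, which is the kind of comparison made when transfer orders from several Hfr strains are aligned.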
For example, azi is transferred just after leu in strain HfrH, but long after leu in strain Hfr1. Aligning the sequences ( ◗ FIGURE 8.19b) shows that the two genes on either side of azi are always the same: leu and pro. That they are the same makes sense when one recognizes that the bacterial chromosome is circular and the starting point of transfer varies from strain to strain. These data provided the first evidence that the bacterial chromosome is circular ( ◗ FIGURE 8.19c).

◗ 8.18 The orientation of the F factor in an Hfr strain determines the direction of gene transfer. Arrowheads indicate the origin and direction of transfer. Figure 8.18b callouts: (4) In Hfr5, F is integrated between thi and his. (5) F has the opposite orientation in this chromosome, so the genes are transferred beginning with thi.

Concepts
Conjugation can be used to map bacterial genes by mixing Hfr and F⁻ cells that differ in genotype and interrupting conjugation at regular intervals. The amount of time required for individual genes to be transferred from the Hfr to the F⁻ cells indicates the relative positions of the genes on the bacterial chromosome.

Natural Gene Transfer and Antibiotic Resistance

Many pathogenic bacteria have developed resistance to antibiotics, particularly in environments where antibiotics are routinely used, such as hospitals and fish farms. (Massive amounts of antibiotics are often used in aquaculture to prevent infection in the fish and enhance their growth.) The continual presence of antibiotics in these environments selects for resistant bacteria, which reduces the effectiveness of antibiotic treatment for medically important infections.

Antibiotic resistance in bacteria frequently results from the action of genes located on R plasmids, small circular plasmids that can be transferred by conjugation.
R plasmids have evolved in the past 50 years (since the beginning of widespread use of antibiotics), and some convey resistance to several antibiotics simultaneously. Ironic but plausible sources of some of the resistance genes found in R plasmids are the microbes that produce antibiotics in the first place.

◗ 8.19 The order of gene transfer in a series of different Hfr strains indicates that the E. coli chromosome is circular. (a) Order of gene transfer (unaligned). (b) Order of gene transfer with genes aligned. (c) Conclusion: The order of the genes on the chromosome is the same, but the position and orientation of the F factor differ among the strains.

The results of recent studies demonstrate that R plasmids and their resistance genes are transferred among bacteria in a variety of natural environments. In one study, plasmids carrying genes for resistance to multiple antibiotics were transferred from a cow udder infected with E. coli to a human strain of E. coli on a hand towel: a farmer wiping his hands after milking an infected cow might unwittingly transfer antibiotic resistance from bovine- to human-inhabiting microbes. Conjugation taking place in minced meat on a cutting board allowed R plasmids to be passed from porcine (pig) to human E. coli.
The transfer of R plasmids also occurs in sewage, soil, lake water, and marine sediments. Perhaps most significantly, the transfer of R plasmids is not restricted to bacteria of the same or even related species. R plasmids with multiple antibiotic resistances have been transferred in marine waters from E. coli and other human-inhabiting bacteria (in sewage) to the fish bacterium Aeromonas salmonicida and then back to E. coli through raw salmon chopped on a cutting board. These results indicate that R plasmids can spread easily through the environment, passing among related and unrelated bacteria in a variety of common situations. That they can do so underscores both the importance of limiting antibiotic use to treating medically important infections and the importance of hygiene in everyday life.

Transformation in Bacteria
A second way that DNA can be transferred between bacteria is through transformation (see Figure 8.9b). Transformation played an important role in the initial identification of DNA as the genetic material, which will be discussed in Chapter 10.

Transformation requires both the uptake of DNA from the surrounding medium and its incorporation into the bacterial chromosome or a plasmid. It may occur naturally when dead bacteria break up and release DNA fragments into the environment. In soil and marine environments, this means may be an important route of genetic exchange for some bacteria.

Cells that take up DNA are said to be competent. Some species of bacteria take up DNA more easily than do others; competence is influenced by growth stage, the concentration of available DNA, and the composition of the medium. The uptake of DNA fragments into a competent bacterial cell appears to be a random process. The DNA need not even be bacterial: virtually any type of DNA (bacterial or otherwise) can be transferred to competent cells under the appropriate conditions.
As a DNA fragment enters the cell in the course of transformation (◗ FIGURE 8.20), one of the strands is hydrolyzed, whereas the other strand associates with proteins as it moves across the membrane. Once inside the cell, this single strand may pair with a homologous region and become integrated into the bacterial chromosome. This integration requires two crossover events, after which the remaining single-stranded DNA is degraded by bacterial enzymes.

◗ 8.20 Genes can be transferred between bacteria through transformation. One strand of a double-stranded DNA fragment enters the cell; the other is hydrolyzed. The single-stranded fragment pairs with the bacterial chromosome and recombination takes place; the remainder of the single-stranded fragment is degraded. When the cell replicates and divides, one of the resulting cells is transformed and the other is not.

Bacterial geneticists have developed techniques to increase the frequency of transformation in the laboratory in order to introduce particular DNA fragments into cells. They have developed strains of bacteria that are more competent than wild-type cells. Treatment with calcium chloride, heat shock, or an electrical field makes bacterial membranes more porous and permeable to DNA, and the efficiency of transformation can also be increased by using high concentrations of DNA. These techniques make it possible to transform bacteria such as E. coli, which are not naturally competent.

Transformation, like conjugation, is used to map bacterial genes, especially in those species that do not undergo conjugation or transduction (see Figure 8.9a and c). Transformation mapping requires two strains of bacteria that differ in several genetic traits; for example, the recipient strain might be a⁻ b⁻ c⁻ (auxotrophic for three nutrients), with the donor cell being prototrophic with alleles a⁺ b⁺ c⁺.
DNA from the donor strain is isolated and purified. The recipient strain is treated to increase competency, and DNA from the donor strain is added to the medium. Fragments of the donor DNA enter the recipient cells and undergo recombination with homologous DNA sequences on the bacterial chromosome. Cells that receive genetic material through transformation are called transformants.

◗ 8.21 Transformation can be used to map bacterial genes. (1) DNA from a donor cell (a⁺ b⁺ c⁺) is fragmented; fragments may contain one or more genes of interest. (2) Fragments are taken up by the recipient cell (a⁻ b⁻ c⁻). (3) After entering the cell, the donor DNA becomes incorporated into the bacterial chromosome through crossing over. (4) Genes that are close to one another on the chromosome are more likely to be present on the same DNA fragment and be recombined together. Conclusion: The rate of cotransformation is inversely related to the distance between genes.

Genes can be mapped by observing the rate at which two or more genes are transferred together (cotransformed) in transformation. When the DNA is fragmented during isolation, genes that are physically close on the chromosome are more likely to be present on the same DNA fragment and transferred together, as shown for genes a and b in ◗ FIGURE 8.21. Genes that are far apart are unlikely to be present on the same DNA fragment and rarely will be transferred together. Once inside the cell, DNA becomes incorporated into the bacterial chromosome through recombination. If two genes are close together on the same fragment, any two crossovers are likely to occur on either side of the two genes, allowing both to become part of the recipient chromosome.
If the two genes are far apart, there may be one crossover between them, allowing one gene but not the other to recombine with the bacterial chromosome. Thus, two genes are more likely to be transferred together when they are close together on the chromosome, and genes located far apart are rarely cotransformed. Therefore, the frequency of cotransformation can be used to map bacterial genes. If genes a and b are frequently cotransformed, and genes b and c are frequently cotransformed, but genes a and c are rarely cotransformed, then gene b must be between a and c; that is, the gene order is a b c.

Concepts
Genes can be mapped in bacteria by taking advantage of transformation, the ability of cells to take up DNA from the environment and incorporate it into their chromosomes through crossing over. The relative rate at which pairs of genes are cotransformed indicates the distance between them: the higher the rate of cotransformation, the closer the genes are on the bacterial chromosome.

Bacterial Genome Sequences
Genetic maps serve as the foundation for more detailed information provided by DNA sequencing, such as gene content and organization (see Chapter 19 for a discussion of gene sequencing). Geneticists have now determined the complete nucleotide sequence of a number of bacterial genomes. The genome of E. coli, one of the most widely studied of all bacteria, is a single circular DNA molecule approximately 1 mm in length. It consists of 4,638,858 nucleotides and an estimated 4300 genes, more than half of which have no known function. These “orphan genes” may play important roles in adapting to unusual environments, coordinating metabolic pathways, organizing the chromosome, or communicating with other bacterial cells.

A number of other bacterial genomes have been completely sequenced (see Table 19.2), and many additional microbial sequencing projects are underway. A substantial proportion of genes in all bacteria have no known function. Certain genes, particularly those with related functions, tend to reside next to one another, but these clusters are in very different locations in different species, suggesting that bacterial genomes are constantly being reshuffled. Comparisons of the gene sequences of pathogenic and nonpathogenic bacteria are helping to identify genes implicated in disease and may suggest new targets for antibiotics and other antimicrobial agents.

www.whfreeman.com/pierce For a current list of completed and partial microbial genome projects and a list of microbial genome projects funded by the U.S. Department of Energy

Viral Genetics
All organisms (plants, animals, fungi, and bacteria) are infected by viruses. A virus is a simple replicating structure made up of nucleic acid surrounded by a protein coat (see Figure 2.3). Viruses come in a great variety of shapes and sizes (◗ FIGURE 8.22). Some have DNA as their genetic material, whereas others have RNA; the nucleic acid may be double stranded or single stranded, linear or circular. Not surprisingly, viruses reproduce in a number of different ways.

◗ 8.22 Viruses come in a great variety of shapes and sizes. (Top, Dr. Dennis Kunkel/Phototake; bottom, R.W. Horne/Photo Researchers.)

Bacteriophages (phages) have played a central role in genetic research since the late 1940s. They are ideal for many types of genetic research because they have small and easily manageable genomes, reproduce rapidly, and produce large numbers of progeny.

Bacteriophages have two alternative life cycles: the lytic and the lysogenic cycles. In the lytic cycle, a phage attaches to a receptor on the bacterial cell wall and injects its DNA into the cell (◗ FIGURE 8.23). Once inside the cell, the phage DNA is replicated, transcribed, and translated, producing more phage DNA and phage proteins. New phage particles are assembled from these components. The phages then produce an enzyme that breaks open the cell, releasing the new phages. Virulent phages reproduce strictly through the lytic cycle and always kill their host cells.

◗ 8.23 Bacteriophages have two alternating life cycles: lytic and lysogenic. Lytic cycle: (1) the phage binds to the bacterium; (2) the phage DNA enters the host cell; (3) the host DNA is digested; (4) phage DNA replicates by using nucleotides from former host DNA; (5) the host cell transcribes and translates the phage DNA, producing phage proteins; (6) assembly of new phages is complete, and a phage-encoded enzyme causes the cell to lyse; (7) new phages are released to start the cycle again. Lysogenic cycle: (1) the phage DNA integrates into the bacterial chromosome and becomes a prophage; (2) the chromosome with the integrated prophage replicates, and this replication can continue through many cell divisions; (3) the prophage may separate, and the cell will enter the lytic cycle.

Temperate phages can use either the lytic or the lysogenic cycle. The lysogenic cycle begins like the lytic cycle (see Figure 8.23) but, inside the cell, the phage DNA integrates into the bacterial chromosome, where it remains as an inactive prophage. The prophage is replicated along with the bacterial DNA and is passed on when the bacterium divides. Certain stimuli cause the prophage to dissociate from the bacterial chromosome and enter into the lytic cycle, producing new phage particles and lysing the cell.

www.whfreeman.com/pierce For additional information on viruses

Techniques for the Study of Bacteriophages
Viruses reproduce only within host cells; so bacteriophages must be cultured in bacterial cells. To do so, phages and bacteria are mixed together and plated on solid medium in a petri plate. A high concentration of bacteria is used so that the colonies grow into one another and produce a continuous layer of bacteria, or “lawn,” on the agar. An individual phage infects a single bacterial cell and goes through its lytic cycle.
Many new phages are released from the lysed cell and infect additional cells; the cycle is then repeated. The bacteria grow on solid medium; so the diffusion of the phages is restricted and only nearby cells are infected. After several rounds of phage reproduction, a clear patch of lysed cells (a plaque) appears on the plate (◗ FIGURE 8.24). Each plaque represents a single phage that multiplied and lysed many cells. Plating a known volume of a dilute solution of phages on a bacterial lawn and counting the number of plaques that appear can be used to determine the original concentration of phage in the solution.

◗ 8.24 Plaques are clear patches of lysed cells on a lawn of bacteria. (E.C.S. Chan/Visuals Unlimited.)

Concepts
Viral genomes may be DNA or RNA, circular or linear, and double or single stranded. Bacteriophages are used in many types of genetic research.

Gene Mapping in Phages
Mapping genes in bacteriophage requires the application of the same principles as those applied to mapping genes in eukaryotic organisms (Chapter 7). Crosses are made between viruses that differ in two or more genes, and recombinant progeny phage are identified and counted. The proportion of recombinant progeny is then used to estimate the distances between the genes and their linear order on the chromosome.
In 1949, Alfred Hershey and Raquel Rotman examined rates of recombination between genes in two strains of the T2 bacteriophage that differed in plaque appearance and host range (the bacterial strains that the phages could infect). One strain was able to infect and lyse type B E. coli cells but not B/2 cells (normal host range, h⁺) and produced an abnormal plaque that was large with distinct borders (r⁻). The second strain was able to infect and lyse both B and B/2 cells (mutant host range, h⁻) and produced normal plaques that were small with fuzzy borders (r⁺).

Hershey and Rotman crossed the h⁺ r⁻ and h⁻ r⁺ strains of T2 by infecting type B E. coli cells with a mixture of the two strains. They used a high concentration of phages so that most cells could be simultaneously infected by both strains (◗ FIGURE 8.25). Homologous recombination occasionally took place between the chromosomes of the different strains, producing h⁺ r⁺ and h⁻ r⁻ chromosomes, which were then packaged into new phage particles. When the cells lysed, the recombinant phages were released, along with the nonrecombinant h⁺ r⁻ phages and h⁻ r⁺ phages.

◗ 8.25 Hershey and Rotman developed a technique for mapping viral genes. (Photo from G.S. Stent, Molecular Biology of Bacterial Viruses. Copyright © 1963 by W.H. Freeman and Company.) Experiment question: How can we determine the position of a gene on a phage chromosome? Progeny phages from the infection of E. coli B were plated on a mixture of E. coli B and E. coli B/2 cells, which allowed all four genotypes of progeny to be identified.

Results of a cross for the h and r genes in phage T2 (h⁻ r⁺ × h⁺ r⁻)
Genotype    Plaques    Designation
h⁻ r⁺       42         Parental progeny (76% in total)
h⁺ r⁻       34         Parental progeny
h⁺ r⁺       12         Recombinant (24% in total)
h⁻ r⁻       12         Recombinant

Conclusion: The recombination frequency between the h and r genes is 24%.

Hershey and Rotman diluted and plated the progeny phages on a bacterial lawn that consisted of a mixture of B and B/2 cells. Phages carrying the h⁺ allele (which conferred the ability to infect only B cells) produced a cloudy plaque because the B/2 cells did not lyse. Phages carrying the h⁻ allele produced a clear plaque because all the cells within the plaque were lysed. The r⁺ phages produced small plaques, whereas the r⁻ phages produced large plaques. The genotypes of these progeny phages could therefore be determined by the appearance of the plaque (see Figure 8.25 and Table 8.4).

Table 8.4 Progeny phage produced from h⁻ r⁺ × h⁺ r⁻
Phenotype           Genotype
Clear and small     h⁻ r⁺
Cloudy and large    h⁺ r⁻
Cloudy and small    h⁺ r⁺
Clear and large     h⁻ r⁻

In this type of phage cross, the recombination frequency (RF) between the two genes can be calculated by using the following formula:

$$\text{RF} = \frac{\text{recombinant plaques}}{\text{total plaques}}$$

In Hershey and Rotman’s cross, the recombinant plaques were h⁺ r⁺ and h⁻ r⁻; so the recombination frequency was

$$\text{RF} = \frac{(h^{+}\,r^{+}) + (h^{-}\,r^{-})}{\text{total plaques}}$$

Recombination frequencies can be used to determine the distances and orders of genes on the phage chromosome, just as recombination frequencies are used to map genes in eukaryotes.
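As a worked example of the RF formula, the plaque counts reported with Figure 8.25 (42, 34, 12, and 12) can be plugged in directly. The sketch below is illustrative only; the dictionary layout and the function name `recombination_frequency` are choices made here, not part of the original experiment.

```python
# Plaque counts from the results reported with Figure 8.25.
plaque_counts = {
    ("h-", "r+"): 42,  # parental genotype
    ("h+", "r-"): 34,  # parental genotype
    ("h+", "r+"): 12,  # recombinant genotype
    ("h-", "r-"): 12,  # recombinant genotype
}

# The two genotypes that were used to set up the cross.
PARENTAL = {("h-", "r+"), ("h+", "r-")}


def recombination_frequency(counts, parental):
    """RF = recombinant plaques / total plaques."""
    total = sum(counts.values())
    recombinant = sum(n for genotype, n in counts.items()
                      if genotype not in parental)
    return recombinant / total


rf = recombination_frequency(plaque_counts, PARENTAL)
print(f"RF = {rf:.0%}")
```

With these counts, 24 of the 100 plaques are recombinant, which matches the 24% stated in the figure's conclusion.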
Concepts
To map phage genes, bacterial cells are infected with viruses that differ in two or more genes. Recombinant plaques are counted, and rates of recombination are used to determine the linear order of the genes on the chromosome and the distances between them.

Transduction: Using Phages to Map Bacterial Genes
In the discussion of bacterial genetics, we identified three mechanisms of gene transfer: conjugation, transformation, and transduction (see Figure 8.9). Let’s take a closer look at transduction, in which genes are transferred between bacteria by viruses. In generalized transduction, any gene may be transferred. In specialized transduction, only a few genes are transferred.

◗ 8.26 The Lederberg and Zinder experiment. Experiment question: Does genetic exchange between bacteria always require cell-to-cell contact? (1) Two auxotrophic strains of Salmonella typhimurium (trp⁺ tyr⁺ met⁻ phe⁺ his⁻ and trp⁻ tyr⁻ met⁺ phe⁻ his⁺) were mixed (2) and plated on minimal medium. (3) Some prototrophic colonies (trp⁺ tyr⁺ met⁺ phe⁺ his⁺) were obtained. (4) When the two strains were placed in a Davis U-tube, (5) which separated the strains by a filter with pores too small for the bacteria to pass through, (6) prototrophic colonies were obtained from only one side of the tube. Conclusion: Genetic exchange did not take place via conjugation. A phage was later shown to be the agent of transfer.

Generalized transduction
Joshua Lederberg and Norton Zinder discovered generalized transduction in 1952. They were trying to produce recombination in the bacterium Salmonella typhimurium by conjugation. They mixed a strain of S. typhimurium that was phe⁺ trp⁺ tyr⁺ met⁻ his⁻ with a strain that was phe⁻ trp⁻ tyr⁻ met⁺ his⁺ (◗ FIGURE 8.26) and plated them on minimal medium. A few prototrophic recombinants (phe⁺ trp⁺ tyr⁺ met⁺ his⁺) appeared, suggesting that conjugation had taken place. However, when they tested the two strains in a U-shaped tube similar to the one used by Davis, some phe⁺ trp⁺ tyr⁺ met⁺ his⁺ prototrophs were obtained on one side of the tube (compare Figure 8.26 with Figure 8.11). This apparatus separated the two strains by a filter with pores too small for the passage of bacteria; so how were genes being transferred between bacteria in the absence of conjugation?
The results of subsequent studies revealed that the agent of transfer was a bacteriophage. In the lytic cycle of phage reproduction, the bacterial chromosome is broken into random fragments (◗ FIGURE 8.27). For some types of bacteriophage, a piece of the bacterial chromosome occasionally gets packaged into a phage coat instead of phage DNA; these phage particles are called transducing phages. The transducing phage infects a new cell, releasing the bacterial DNA, and the introduced genes may then become integrated into the bacterial chromosome by a double crossover. Bacterial genes can, by this process, be moved from one bacterial strain to another, producing recombinant bacteria called transductants.

Not all phages are capable of transduction, a rare event that requires (1) that the phage degrade the bacterial chromosome; (2) that the process of packaging DNA into the phage protein coat not be specific for phage DNA; and (3) that the bacterial genes transferred by the virus recombine with the chromosome in the recipient cell.

Because of the limited size of a phage particle, only about 1% of the bacterial chromosome can be transduced. Only genes located close together on the bacterial chromosome will be transferred together (cotransduced). The overall rate of transduction ranges from only about 1 in 100,000 to 1 in 1,000,000. Because the chance of a cell being transduced by two separate phages is exceedingly small, any cotransduced genes are usually located close together on the bacterial chromosome. Thus, rates of cotransduction, like rates of cotransformation, give an indication of the physical distances between genes on a bacterial chromosome.
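The way cotransfer rates reveal gene order can be sketched for three genes: the pair that is cotransferred least often must be the two outer genes, so the remaining gene lies between them. The frequencies below are invented purely for illustration, and the function name `infer_gene_order` is a hypothetical helper, not an established tool.

```python
def infer_gene_order(cotransfer):
    """Infer the order of three genes from pairwise cotransfer frequencies.

    cotransfer maps a frozenset of two gene names to the observed frequency
    with which that pair is cotransduced (or cotransformed).
    """
    # The pair cotransferred least often is assumed to be farthest apart.
    outer_pair = min(cotransfer, key=cotransfer.get)
    all_genes = set().union(*cotransfer)
    (middle,) = all_genes - outer_pair  # exactly one gene remains
    first, last = sorted(outer_pair)
    return [first, middle, last]


# Hypothetical cotransfer frequencies (made up for this example).
observed = {
    frozenset({"a", "b"}): 0.40,
    frozenset({"b", "c"}): 0.35,
    frozenset({"a", "c"}): 0.02,
}

print(" ".join(infer_gene_order(observed)))
```

This toy rule only orders three genes; it assumes that cotransfer frequency falls steadily with distance and says nothing about the actual distances.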
To map genes by using transduction, two bacterial strains with different alleles at several loci are used. The donor strain is infected with phages (◗ FIGURE 8.28), which reproduce within the cell. When the phages have lysed the donor cells, a suspension of the progeny phage is mixed with a recipient strain of bacteria, which are then plated on several different kinds of media to determine which donor alleles the transducing progeny phages carried into the recipient cells.

◗ 8.27 Genes can be transferred from one bacterium to another through generalized transduction. A donor strain of bacterium is infected with phage. In phage reproduction, the bacterial chromosome is fragmented, and some of the bacterial genes become incorporated into a few phages. After lysis of the cell, transducing phages are released and may transfer the bacterial genes to another bacterium. Recombination between the transferred genes and the bacterial chromosome produces a transduced bacterial cell.

◗ 8.28 Generalized transduction can be used to map genes. (1) A donor strain of bacteria that is a⁺ b⁺ c⁺ is infected with phage. (2) The bacterial chromosome is broken down, and bacterial genes are incorporated into some of the progeny phages, (3) which are used to infect a recipient strain of bacteria that is a⁻ b⁻ c⁻. (4) Transfer of genes from the donor strain and recombination produce transductants in the recipient bacteria (single transductants, cotransductants, and nontransductants). Conclusion: Genes located close to each other are more likely to be cotransduced, so the rate of cotransduction is inversely related to the distance between genes.

Concepts
In transduction, bacterial genes become packaged into a viral coat, are transferred to another bacterium by the virus, and become incorporated into the bacterial chromosome by crossing over. Bacterial genes can be mapped with the use of generalized transduction.

Specialized transduction
Like generalized transduction, specialized transduction requires gene transfer from one bacterium to another through phages, but here only genes near particular sites on the bacterial chromosome are transferred. This process requires lysogenic bacteriophages. The prophage may imperfectly excise from the bacterial chromosome, carrying with it a small part of the bacterial DNA adjacent to the site of prophage integration. A phage carrying this DNA will then inject it into another bacterial cell in the next round of infection. This process resembles the situation in F′ cells, where the F′ plasmid carries genes from one bacterium into another (see Figure 8.16).

One of the best-studied examples of specialized transduction is in bacteriophage lambda (λ), which integrates into the E. coli chromosome at the attachment (att) site. The phage DNA contains a site similar to the att site; a single crossover integrates the phage DNA into the bacterial chromosome (◗ FIGURE 8.29a). The λ prophage is excised through a similar crossover that reverses the process (◗ FIGURE 8.29b and c). An error in excision may cause genes on either side of the bacterial att site to be excised along with some of the phage DNA (◗ FIGURE 8.29d and e). In E. coli, these genes are usually the gal (galactose fermentation) and bio (biotin biosynthesis) genes.

When a transducing phage carrying the gal gene infects another bacterium, the gene may integrate into the bacterial chromosome along with the prophage (◗ FIGURE 8.29f), giving the bacterial chromosome two copies of the gal gene (◗ FIGURE 8.29g). These transductants are unstable, because the prophage DNA may excise from the chromosome, carrying the introduced gene with it. Stable transductants are produced when the gal⁺ gene in the phage is exchanged for the gal⁻ gene in the chromosome through a double crossover (◗ FIGURE 8.29h).
Concepts
Specialized transduction transfers only those bacterial genes located near the site of prophage insertion.

◗ 8.29 Bacteria can exchange genes through specialized transduction. Segments 1, 2, and 3 represent genes on the phage chromosome. (a) The phage chromosome integrates into the bacterial chromosome through a single crossover at the homologous att sites. (b, c) A normal excision through a similar crossover produces a normal phage chromosome. (d, e) An error in excision produces a phage chromosome that carries the bacterial gal⁺, called lambda gal defective (λdgal). (f, g) The λdgal chromosome may integrate into a bacterial chromosome already containing a copy of the λ prophage, producing an unstable transductant. (h, i) The gal⁺ allele on the phage may recombine with the gal⁻ allele on the bacterial chromosome, producing a stable transductant with a gal⁺ allele.

Connecting Concepts
Three Methods for Mapping Bacterial Genes
Three methods of mapping bacterial genes have now been outlined: (1) interrupted conjugation; (2) transformation; and (3) transduction. These methods have important similarities and differences.

Mapping with interrupted conjugation is based on the time required for genes to be transferred from one bacterium to another by means of cell-to-cell contact. The key to this technique is that the bacterial chromosome itself is transferred, and the order of genes and the time required for their transfer provide information about the positions of the genes on the chromosome. In contrast with other mapping methods, the distance between genes is measured not in recombination frequencies but in units of time required for genes to be transferred.
Here, the basic unit of conjugation mapping is a minute.

In gene mapping with transformation, DNA from the donor strain is isolated, broken up, and mixed with the recipient strain. Some fragments pass into the recipient cells, where the transformed DNA may recombine with the bacterial chromosome. The unit of transfer here is a random fragment of the chromosome. Loci that are close together on the donor chromosome tend to be on the same DNA fragment; so the rates of cotransformation provide information about the relative positions of genes on the chromosome.

Transduction mapping also relies on the transfer of genes between bacteria that differ in two or more traits, but here the vehicle of gene transfer is a bacteriophage. In a number of respects, transduction mapping is similar to transformation mapping. Small fragments of DNA are carried by the phage from donor to recipient bacteria, and the rates of cotransduction, like the rates of cotransformation, provide information about the relative distances between the genes.

All three methods use a common strategy for mapping bacterial genes. The movement of genes from donor to recipient is detected by using strains that differ in two or more traits, and the transfer of one gene relative to the transfer of others is examined. Additionally, all three methods rely on recombination between the transferred DNA and the bacterial chromosome. In mapping with interrupted conjugation, the relative order and timing of gene transfer provide the information necessary to map the genes; in transformation and transduction, the rate of cotransfer provides this information.

In conclusion, the same basic strategies are used for mapping with interrupted conjugation, transformation, and transduction.
The methods differ principally in their mechanisms of transfer: in conjugation mapping, DNA is transferred through contact between bacteria; in transformation, DNA is transferred as small naked fragments; and, in transduction, DNA is transferred by bacteriophages.

Fine-Structure Analysis of Bacteriophage Genes
In the 1950s and 1960s, Seymour Benzer conducted a series of experiments to examine the structure of a gene. Because no molecular techniques were available at the time for directly examining nucleotide sequences, Benzer was forced to infer gene structure from analyses of mutations and their effects. The results of his studies showed that different mutational sites within a single gene could be mapped (intragenic mapping) by using techniques similar to those just described. Different sites within a single gene are very close together; so recombination between them takes place at a very low frequency. Because large numbers of progeny are required to detect these recombination events, Benzer used the bacteriophage T4, which reproduces rapidly and produces large numbers of progeny.

Benzer’s mapping techniques
Wild-type T4 phages normally produce small plaques with rough edges when grown on a lawn of E. coli bacteria. Certain mutants, called r for rapid lysis, produce larger plaques with sharply defined edges. Benzer isolated phages with a number of different r mutations, concentrating on one particular subgroup called rII mutants.

◗ 8.30 T4 phage rII mutants produce distinct plaques when grown on E. coli B cells. (Dr. D. P. Snustad, College of Biological Sciences, University of Minnesota.)

Wild-type T4 phages produce typical plaques (◗ FIGURE 8.30) on E. coli strains B and K. In contrast, the rII mutants produce r plaques on strain B and do not form plaques at all on strain K. Benzer recognized the r mutants by their distinctive plaques when grown on E. coli B. He then collected lysate from these plaques and used it to infect E. coli K.
Phages that did not produce plaques on E. coli K were defined as the rII type. Benzer collected thousands of rII mutations. He simultaneously infected bacterial cells with two different mutants and looked for recombinant progeny (◗ FIGURE 8.31).

Consider two rII mutations, a− and b−, and their wild-type alleles, a+ and b+. Benzer infected E. coli B cells with two different strains of phages, one a− b+ and the other a+ b− (Figure 8.31, step 1). While reproducing within the B cells, a few phages of the two strains recombined (Figure 8.31, step 2). A single crossover produces two recombinant chromosomes, one with genotype a+ b+ and the other with genotype a− b−:

Phage 1:       a− b+
Phage 2:       a+ b−
Recombinants:  a− b−  and  a+ b+

The resulting recombinant chromosomes, along with the nonrecombinant (parental) chromosomes, were incorporated into progeny phages (Figure 8.31, steps 3 and 4), which were then used to infect E. coli K cells. The resulting plaques were examined to determine the genotype of the infecting phage. The rII mutants would not grow on E. coli K, but wild-type phages could; so progeny phages with the recombinant genotype a+ b+ produced plaques on E. coli K (Figure 8.31, step 5). Each recombination event produces an equal number of double mutants (a− b−) and wild-type chromosomes (a+ b+); so the number of recombinant progeny should be twice the number of wild-type plaques that appeared on E. coli K. The recombination frequency between the two rII mutants would be:

$$\text{recombination frequency} = \frac{2 \times \text{number of plaques on } E.\ coli\ \text{K}}{\text{total number of plaques on } E.\ coli\ \text{B}}$$

Benzer was able to detect a single recombinant among billions of progeny phages, allowing very low rates of recombination to be detected. Recombination frequencies are proportional to physical distances along the chromosome (see Chapter 7), revealing the positions of the different mutations within the rII region of the phage chromosome. In this way, Benzer eventually mapped more than 2400 rII mutations, many corresponding to single base pairs in the viral DNA. His work provided the first molecular view of a gene.

◗ 8.31 Benzer developed a procedure for mapping rII mutants. Two different rII mutants (a− b+ and a+ b−) are isolated on E. coli B cells. Neither will grow on E. coli K cells. Only the a+ b+ recombinant can grow on E. coli K, allowing these recombinants to be identified. rIIA and rIIB refer to different parts of the gene.
Experiment question: How can rII phage mutants be mapped and what can they reveal about the structure of the gene?
Step 1: E. coli B cells are simultaneously infected with two rII phage mutants.
Step 2: Recombination between the two phage chromosomes within the doubly infected cells produces recombinant chromosomes.
Step 3: One recombinant chromosome contains both mutations (a− b−).
Step 4: The other recombinant chromosome is wild type (a+ b+).
Step 5: Only the a+ b+ recombinants can grow on E. coli K, allowing them to be identified.
Gene-structure map of the rII region (rIIA, rIIB): The frequencies of recombinants were used to map rII mutants. Each box represents one DNA base pair. Mutations were found at each location shown in red.
Conclusion: Mapping more than 2400 rII mutants provided information about the internal structure of a gene at the base-pair level, the first view of the molecular structure of a gene.
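The recombination-frequency calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name and the plaque counts are hypothetical, and the sketch assumes that both counts have already been scaled to the same volume of lysate.

```python
def recombination_frequency(plaques_on_k: int, plaques_on_b: int) -> float:
    """Estimate the recombination frequency between two rII mutations.

    plaques_on_k: wild-type (a+ b+) recombinant plaques counted on E. coli K
    plaques_on_b: total progeny plaques counted on E. coli B
    Both counts are assumed to refer to the same volume of lysate.
    """
    if plaques_on_b <= 0:
        raise ValueError("total number of plaques on E. coli B must be positive")
    if plaques_on_k < 0:
        raise ValueError("number of plaques on E. coli K cannot be negative")
    # The double-mutant recombinants (a- b-) cannot grow on K, so the
    # wild-type count is doubled to account for both recombinant classes.
    return 2 * plaques_on_k / plaques_on_b


# Hypothetical counts, chosen only to show the arithmetic.
rf = recombination_frequency(plaques_on_k=30, plaques_on_b=1_000_000)
print(f"{rf:.6f}")  # 0.000060
```

Doubling the count on E. coli K is the step that is easiest to forget: only half of the recombinant progeny are visible on strain K.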
Concepts
In a series of experiments with the bacteriophage T4, Seymour Benzer showed that recombination could occur within a single gene and created the first molecular map of a gene.

Complementation experiments

At the time Benzer was conducting his experiments, the relationship between genes and DNA structure was unknown. A gene had been defined as a functional unit of heredity that coded for a phenotype. To test whether different rII mutations belonged to different functional genes, Benzer used the complementation (cis-trans) test (see Chapter 5). Individuals heterozygous for two mutations may have the mutations in trans (a+ b− / a− b+), meaning that they are located on different chromosomes, or in cis (a+ b+ / a− b−), meaning that they are located on the same chromosome (see Chapter 7).

Suppose that the a and b mutations occur at different loci, which code for different proteins. In the trans heterozygote (a+ b− / a− b+), one chromosome has a functional allele at the a locus (a+ b−) and the other chromosome has a functional allele at the b locus (a− b+); because a− and b− are recessive mutations, both A and B proteins will be produced. The two mutations complement each other, so the presence of the wild-type trait in the trans heterozygote indicates that these mutations belong to different complementation groups; that is, they come from different loci.

Suppose instead that the two mutations occur within a single locus that codes for one protein. In the trans heterozygote, one chromosome fails to produce a functional protein because it has a defect at the b site (a+ b−) and the other chromosome fails to produce a functional protein because it has a defect at the a site (a− b+). No functional protein is produced by either chromosome, and the trans heterozygote has a mutant phenotype; the mutations are unable to complement each other. The heterozygous individual used in complementation testing must have the mutations in the trans configuration.
When the mutations are in the cis configuration (a+ b+ / a− b−), heterozygotes will have a wild-type phenotype regardless of whether the two mutations occur at the same locus or at different loci, because one chromosome (a+ b+) is mutation free.

To carry out the complementation test in bacteriophage, Benzer infected cells of E. coli K with large numbers of two mutant strains of phage (◗ FIGURE 8.32, step 1). We will refer to the two mutations as rIIa (a− b+) and rIIb (a+ b−). Cells infected with both mutants (a− b+ / a+ b−) were effectively heterozygous for the phage genes, with the mutations in the trans configuration (◗ FIGURE 8.32, step 2). In the complementation testing, the phenotypes of progeny phages were examined on the K strain, rather than the B strain as illustrated in Figure 8.31.

◗ 8.32 Complementation tests are used to determine whether different mutations are at the same functional gene.
Experiment question: How do we determine whether two different rII mutants occur at the same locus?
Step 1: E. coli K cells are simultaneously infected by two different rII mutants (a− and b−),
Step 2: making the cells functionally heterozygous for the mutations.
Step 3: If these two mutations belong to different cistrons,
Step 4: there is complementation and functional proteins are produced,
Step 5: causing the formation of plaques.
Step 6: If the two mutations belong to the same cistron,
Step 7: there is no complementation, no functional proteins are produced,
Step 8: and no plaques are formed.
Conclusion: The complementation test indicates whether two mutations occur at the same locus or at different loci.
◗ 8.33 The genome of bacteriophage φX174 contains overlapping genes. The genome contains nine genes (A through J). (a) Gene B lies entirely within gene A, and gene E within gene D. (b) The reading frame for gene E is shifted one base pair relative to that for gene D; the shifted reading frame encodes different amino acids and therefore a different protein.
Bases in RNA: GUUGAGGCUUGCGUUUAUGGUACGCUGGACUUUG
Reading frame for gene D: Val Glu Ala Cys Val Tyr Gly Thr Leu Asp Phe
Reading frame for gene E: Met Val Arg Trp Thr Leu

If the rIIa and rIIb mutations occur at different loci that code for different proteins, then, in bacterial cells infected by both mutants, the wild-type sequences on the chromosome opposite each mutation will overcome the effects of the recessive mutations; the phages will produce normal plaques on E. coli K cells (◗ FIGURE 8.32, steps 3, 4, and 5). (Benzer coined the term cistron to designate a functional gene defined by the complementation test.) If, on the other hand, the mutations occur at the same locus, no functional protein is produced by either chromosome, and no plaques develop in the E. coli K cells (◗ FIGURE 8.32, steps 6, 7, and 8). Thus, the absence of plaques indicates that the two mutations occur at the same locus.

In the complementation test, the cis heterozygote is used as a control. Benzer simultaneously infected bacteria with wild-type phage (a+ b+) and with phage carrying both mutations (a− b−). This test also produced cells that were heterozygous, but with the mutations in cis (a+ b+ / a− b−). Regardless of whether the rIIa and rIIb mutations are in the same functional unit, these cells contain a copy of the wild-type phage chromosome (a+ b+) and will produce normal plaques in E. coli K.
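The logic of the complementation test lends itself to a short sketch. The Python below is a minimal illustration, not Benzer's procedure: the mutant names (m1 through m4) and the pairwise plaque results are hypothetical, and the sketch assumes that failure to complement is consistent within a group (two mutants that each fail to complement a third also fail to complement each other).

```python
def complementation_groups(mutants, plaques_on_k):
    """Sort mutants into complementation groups (cistrons).

    mutants: list of mutant names (hypothetical labels).
    plaques_on_k: dict mapping frozenset({m1, m2}) to True if cells of
        E. coli K infected with both mutants (trans configuration)
        produce plaques, i.e. the mutations complement each other.
    """
    groups = []
    for mutant in mutants:
        for group in groups:
            # No plaques with a member of the group means no
            # complementation, so the mutant belongs to that cistron.
            if not plaques_on_k[frozenset((mutant, group[0]))]:
                group.append(mutant)
                break
        else:
            # Complements every existing group: start a new cistron.
            groups.append([mutant])
    return groups


# Hypothetical pairwise results, invented for illustration only.
results = {
    frozenset(("m1", "m2")): False,
    frozenset(("m1", "m3")): True,
    frozenset(("m1", "m4")): True,
    frozenset(("m2", "m3")): True,
    frozenset(("m2", "m4")): True,
    frozenset(("m3", "m4")): False,
}
print(complementation_groups(["m1", "m2", "m3", "m4"], results))
# [['m1', 'm2'], ['m3', 'm4']]
```

Each new mutant is compared against only one representative of each group, which is why the consistency assumption stated above matters.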
Benzer carried out complementation testing on many pairs of rII mutants. He found that the rII region consists of two loci, designated rIIA and rIIB. Mutations belonging to the rIIA and rIIB groups complemented each other, but mutations in the rIIA group did not complement others in rIIA; nor did mutations in the rIIB group complement others in rIIB.

Concepts
Benzer used the complementation test to distinguish between functional genes (loci).

At the time of Benzer’s research, many geneticists believed that genes were indivisible and that recombination could not take place within them. Benzer demonstrated that intragenic recombination did indeed take place (although at a very low rate) and gave geneticists their first glimpse at the structure of an individual gene.

Overlapping Genes

The first viral genome to be completely sequenced, that of bacteriophage φX174, revealed surprising information: the nucleotide sequences of several genes overlapped. This genome encodes nine proteins (◗ FIGURE 8.33). Two of the genes are nested within other genes; in both cases, the same DNA sequence codes for two different proteins by using different reading frames (see Chapter 13). In five of the φX174 genes, the initiation codon of one gene overlaps the termination codon of another.

The results of subsequent studies revealed that overlapping genes are found in a number of viruses and bacteria. Viral genome size is strictly limited by the capacity of the viral protein coat; so there is strong selective pressure for economical use of the DNA.

Concepts
Some viruses contain overlapping genes, in which the same base sequence specifies more than one protein.

RNA Viruses

Viral genomes may be encoded in either DNA or RNA. Some medically important human viruses have RNA as their genetic material, including those that cause influenza, common colds, polio, and AIDS. Almost all viruses that infect plants have RNA genomes.
The medical and economic importance of RNA viruses has encouraged their study.

RNA viruses, like bacteriophages, reproduce by infecting cells and making copies of themselves. Most use RNA-dependent RNA polymerases encoded by their own genes. In positive-strand RNA viruses, the genomic RNA molecule carried inside the viral particle codes directly for viral proteins (◗ FIGURE 8.34a). In negative-strand RNA viruses, the virus first makes a complementary copy of its RNA genome, which is then translated into viral proteins (◗ FIGURE 8.34b).

◗ 8.34 The process of reproduction differs in positive-strand RNA viruses and negative-strand RNA viruses. (a) Positive-strand (+) RNA virus: the virus enters the cell and releases its RNA; the positive-strand RNA codes directly for coat proteins; the other RNA strand is synthesized and the RNA is replicated. (b) Negative-strand (–) RNA virus: the negative-strand RNA is copied into complementary RNA, and the complementary positive-strand copy codes for viral proteins. In both cases, progeny RNA and viral proteins assemble into new viruses.

RNA viruses capable of integrating into the genome of their hosts, much as temperate phages insert themselves into bacterial chromosomes, are called retroviruses (◗ FIGURE 8.35a). Because the retroviral genome is RNA, whereas that of the host is DNA, a retrovirus must produce reverse transcriptase, an enzyme that synthesizes complementary DNA (cDNA) from either an RNA or a DNA template. A retrovirus uses reverse transcriptase to make a double-stranded DNA copy from its single-stranded RNA genome. The DNA copy then integrates into the host chromosome to form a provirus, which is replicated by host enzymes when the host chromosome is duplicated (◗ FIGURE 8.35b).

When conditions are appropriate, the provirus undergoes transcription to produce numerous copies of the original RNA genome.
This RNA codes for viral proteins and serves as genomic RNA for new viral particles. As these viruses escape the cell, they collect patches of the cell membrane to use as their envelopes. All known retroviral genomes have in common three genes: gag, pol, and env ( ◗ FIGURE 8.36), each encoding a precursor protein that is cleaved into two or more functional proteins. The gag gene encodes the three or four proteins that make up the viral capsid. The pol gene codes for reverse transcriptase and an enzyme, called integrase, that inserts the viral DNA into the host chromosome. The env gene codes for the glycoproteins, which appear on the viral envelope that surrounds the viral capsid. Some retroviruses contain oncogenes (Chapter 20) that may stimulate cell division and cause the formation of tumors. The first retrovirus to be isolated, the Rous sarcoma virus, was originally recognized by its ability to produce connective-tissue tumors (sarcomas) in chickens. The human immunodeficiency virus (HIV) is a retrovirus that causes acquired immune deficiency syndrome. AIDS was first recognized in 1982, when a number of homosexual males in the United States began to exhibit symptoms of a new immune-system-deficiency disease. In that year, Robert Gallo proposed that AIDS was caused by a retrovirus. Between 1983 and 1984, as the AIDS epidemic became widespread, the HIV retrovirus was isolated from AIDS patients. HIV is thought to have appeared first in Africa in the 1950s or 1960s. It is closely related to several retroviruses found in monkeys and may have evolved when a monkey retrovirus mutated and infected humans. HIV is transmitted by sexual contact between humans and through any type of blood-to-blood contact, such as that caused by the sharing of dirty needles by drug addicts. Until screening tests could identify HIV-infected blood, transfusions and clotting factors used by hemophiliacs also were sources of infection. 
HIV principally attacks a class of blood cells called helper T lymphocytes (◗ FIGURE 8.37). HIV enters a helper T cell, undergoes reverse transcription, and integrates into the chromosome. The virus reproduces rapidly, destroying the T cell as new virus particles escape from the cell. Because helper T cells are central to immune function and are destroyed in the infection, AIDS patients have a diminished immune response; most AIDS patients die of secondary infections that develop because they have lost the ability to fight off pathogens.

The HIV genome is 9749 nucleotides long and carries gag, pol, env, and six other genes that regulate the life cycle of the virus. HIV’s reverse transcriptase is very error prone, giving the virus a high mutation rate and allowing it to evolve rapidly, even within a single host. This rapid evolution makes the development of an effective vaccine against HIV particularly difficult.

◗ 8.35 A retrovirus uses reverse transcription to incorporate its RNA into the host DNA. (a) Structure of a typical retrovirus. Two copies of the single-stranded RNA genome and reverse transcriptase enzyme are shown enclosed within a protein capsid. The capsid is surrounded by a viral envelope that is studded with viral glycoproteins. (b) The retrovirus life cycle.
Step 1: Virus attaches to host cell at receptors in the membrane.
Step 2: The viral core is uncoated as it enters the host cell.
Step 3: Viral RNA uses reverse transcriptase to make complementary DNA.
Step 4: Viral RNA degrades.
Step 5: Reverse transcriptase synthesizes the second DNA strand.
Step 6: The viral DNA enters the nucleus and is integrated into the host chromosome, forming a provirus.
Step 7: On activation, proviral DNA transcribes viral RNA, which is exported to the cytoplasm.
Step 8: In the cytoplasm, the viral RNA is translated.
Step 9: Viral RNA, proteins, new capsids, and envelopes are assembled.
Step 10: An assembled virus buds from the cell membrane.

◗ 8.36 The typical genome of a retrovirus contains gag, pol, and env genes. The RNA genes are copied into DNA by reverse transcription, and the viral flanking sequences at each end are duplicated. Viral genes code for viral proteins: gag for the viral capsid protein, pol for integrase and reverse transcriptase, and env for the envelope proteins.

Concepts
A retrovirus is an RNA virus that integrates into its host chromosome by making a DNA copy of its RNA genome through the process of reverse transcription. Human immunodeficiency virus, the causative agent of AIDS, is a retrovirus.

Prions: Pathogens Without Genes

In 1997, Stanley B. Prusiner was awarded the Nobel Prize in Physiology or Medicine for his discovery and characterization of prions, a novel class of pathogens that cause several rare neurodegenerative diseases and that appear to replicate without any genes. Initially, Prusiner’s proposal that prions were composed entirely of protein and lacked any trace of nucleic acid was met with skepticism. One of the foundations of modern biology is that all living things possess hereditary information in the form of DNA or RNA; so how are prions able to reproduce without nucleic acid?

Prions were first recognized as unusual infectious agents that cause scrapie, a disease of sheep that destroys the brain. In 1982, Prusiner purified the scrapie pathogen and reported that it consisted entirely of protein.
Prusiner and his colleagues eventually showed that the prion protein (PrP) is derived from a normal protein that is encoded by a gene found throughout eukaryotes, including yeast. Normal PrP (PrPC) is folded into a helical shape, but the protein can also fold into a flattened sheet that causes scrapie (PrPSc) (◗ FIGURE 8.38). When PrPSc is present, it interacts with PrPC and causes it to fold into the disease-causing form of the protein; infection with PrPSc converts an individual’s normal PrP protein into abnormal PrP that forms prions. Accumulation of the PrPSc in the brain appears to be responsible for the neurological degeneration associated with diseases caused by prions. This explanation for prion diseases, called the “protein only” hypothesis, is not universally accepted; some scientists still believe that these diseases are caused by an as-yet-unisolated virus.

Prions cause scrapie, bovine spongiform encephalopathy (BSE, or “mad cow” disease), and kuru, an exotic disorder spread among New Guinea aborigines by ritualistic cannibalism. They also play a role in some inherited human neurodegenerative disorders, including Creutzfeldt-Jakob disease and Gerstmann-Sträussler-Scheinker disease. In these inherited diseases, the PrP gene is mutated and produces a type of PrP that is more susceptible to folding into PrPSc. Nearly all those who carry such a mutated gene eventually produce prions and get the disease. Some cases of human prion diseases have been traced to injections of growth hormone, which until recently was obtained from the brains of human cadavers infected with prions. In England, an epidemic of mad cow disease erupted in the late 1980s, the origin of which was traced to cattle feed containing the remains of sheep infected with scrapie.

For more information on prions, prion-caused diseases, and Stanley Prusiner’s account of his hunt for the secret of prions, see www.whfreeman.com/pierce.
◗ 8.37 HIV principally attacks T lymphocytes. Electron micrograph showing a T cell infected with HIV. (Courtesy of Dr. Hans Gelderblom.)

◗ 8.38 The protein-only hypothesis describes a method for the replication of prions.
Step 1: A gene in the host (the PrP gene, or prion protein gene) produces a normal cellular copy of the prion protein (PrPC) that folds into a helical shape.
Step 2: In the presence of the abnormal prion protein (PrPSc), the PrPC
Step 3: folds into a flattened sheet and becomes PrPSc.
Step 4: A cascade of PrPC and PrPSc conversion follows, which causes more of the cellular protein to misfold.
Step 5: Disease may result.
Step 6: In some cases the PrP gene is mutated
Step 7: and PrPC may spontaneously misfold into PrPSc.

Connecting Concepts Across Chapters

Bacteria and viruses have been used extensively in the study of genetics: their rapid reproduction, large numbers of progeny, small haploid genomes, and medical importance make them ideal organisms for many types of genetic investigations.

This chapter examined some of the techniques used to study and map bacterial and viral genomes. Some of these methods are an extension of the principles of recombination and gene mapping explored in Chapter 7. Bacterial reproduction was discussed in Chapter 2, and a number of the principles and techniques covered in this chapter are linked to topics in future chapters. Bacterial chromosomes will be considered in more detail in Chapter 11, and bacterial replication, transcription, translation, and gene regulation will be the topics of Chapters 12 through 16. Bacteria are central to recombinant DNA technology, the topic of Chapter 18, where they are often used in mass producing specific DNA fragments.
Many of the tools of recombinant DNA technology, including plasmids, restriction enzymes, DNA polymerases, and many other enzymes, have been isolated and engineered from natural components of bacterial cells. Engineered viruses are common vehicles for delivering genes to host cells. Some transposable genetic elements (discussed in Chapter 11) are closely related to viruses, and considerable evidence suggests that viruses evolved from such elements. Because their mutations are easily isolated, bacteria also play an important role in the study of gene mutations, a topic examined in Chapter 17. Chapter 20 deals with mitochondrial and chloroplast DNA, which in many respects are more similar to bacterial DNA than to the nuclear DNA of the cells in which these organelles are found. Finally, viruses cause some cancers, and the role of viral genes in cancer development is studied in Chapter 21.

CONCEPTS SUMMARY

• Bacteria and viruses are well suited to genetic studies: they are small, have a small haploid genome, undergo rapid reproduction, and produce large numbers of progeny through asexual reproduction. When spread on a petri plate, individual bacteria grow into colonies of identical cells that can be easily seen.
• The bacterial genome normally consists of a single, circular molecule of double-stranded DNA.
• Plasmids are small pieces of bacterial DNA that can replicate independently of the large chromosome. Episomes are plasmids that can either exist in a freely replicating state or integrate into the bacterial chromosome.
• DNA may be transferred between bacteria by means of conjugation, transformation, and transduction.
• Conjugation is the union and the transfer of genetic material between two bacterial cells and is controlled by a fertility factor called F, which is an episome. F+ cells are donors, and F− cells are recipients during conjugation. An Hfr cell has F incorporated into the bacterial chromosome.
An F′ cell has an F factor that has excised from the bacterial genome and carries some bacterial genes.
• The rate at which individual genes are transferred from Hfr to F− cells during conjugation provides information about the order and distance between the genes on the bacterial chromosome.
• In transformation, bacteria take up DNA from their environment. Frequencies of cotransformation provide information about the physical distances between chromosomal genes.
• Viruses are replicating structures with DNA or RNA genomes that may be double stranded or single stranded, linear or circular. Bacteriophages are viruses that infect bacteria. An individual phage can be identified when it enters a bacterial cell, multiplies, and eventually produces a patch of lysed bacterial cells (a plaque) on an agar plate.
• Phage genes can be mapped by infecting bacterial cells with two different strains of phage. The numbers of recombinant plaques produced by the progeny phages are used to estimate recombination rates between phage genes.
• In generalized transduction, bacterial genes become incorporated into phage coats and are transferred to other bacteria during phage infection. Rates of cotransduction can be used to determine the order and distance between genes on the bacterial chromosome.
• In specialized transduction, DNA near the site of phage integration on the bacterial chromosome is transferred from one bacterium to another.
• Benzer mapped a large number of mutations that occurred within the rII region of phage T4 and showed that intragenic recombination takes place. The results of his complementation studies demonstrated that the rII region consists of two functional units (cistrons).
• A number of viruses have RNA genomes. In positive-strand viruses, the RNA genome codes directly for viral proteins; in negative-strand viruses, a complementary copy of the genome is translated to form viral proteins.
Retroviruses encode a reverse transcriptase enzyme used to make a DNA copy of the viral genome, which then integrates into the host genome as a provirus.
• Prions are infectious agents consisting only of protein; they are thought to cause disease by altering the shape of proteins encoded by the host genome.

IMPORTANT TERMS

minimal medium, complete medium, colony, plasmid, episome, F factor, conjugation, transformation, transduction, pili, competent cell, transformant, cotransformation, virus, virulent phage, temperate phage, prophage, plaque, generalized transduction, specialized transduction, transducing phage, transductants, cotransduction, attachment site, intragenic mapping, positive-strand RNA virus, negative-strand RNA virus, retrovirus, reverse transcriptase, provirus, integrase, oncogene, prion

Worked Problems

1. DNA from a strain of bacteria with genotype a+ b+ c+ d+ e+ was isolated and used to transform a strain of bacteria that was a− b− c− d− e−. The transformed cells were tested for the presence of donated genes. The following genes were cotransformed:

a+ and d+
b+ and e+
c+ and d+
c+ and e+

What is the order of genes a, b, c, d, and e on the bacterial chromosome?

• Solution
The rate at which genes are cotransformed is inversely proportional to the distance between them: genes that are close together are frequently cotransformed, whereas genes that are far apart are rarely cotransformed. In this transformation experiment, gene c+ is cotransformed with both genes e+ and d+, but genes e+ and d+ are not cotransformed; therefore the c locus must be between the d and e loci:

d  c  e

Gene e+ is also cotransformed with gene b+; so the e and b loci must be located close together.
Locus b could be on either side of locus e. To determine whether locus b is on the same side of e as locus c, we look to see whether genes b+ and c+ are cotransformed. They are not; so locus b must be on the opposite side of e from c:

d  c  e  b

Gene a+ is cotransformed with gene d+; so they must be located close together. If locus a were located on the same side of d as locus c, then genes a+ and c+ would be cotransformed. Because these genes display no cotransformation, locus a must be on the opposite side of locus d:

a  d  c  e  b

2. Consider three genes in E. coli: thr+ (the ability to synthesize threonine), ara+ (the ability to metabolize arabinose), and leu+ (the ability to synthesize leucine). All three of these genes are close together on the E. coli chromosome. Phages are grown in a thr+ ara+ leu+ strain of bacteria (the donor strain). The phage lysate is collected and used to infect a strain of bacteria that is thr− ara− leu−. The recipient bacteria are then tested on medium lacking leucine. Bacteria that grow and form colonies on this medium (leu+ transductants) are then replica plated onto medium lacking threonine and medium lacking arabinose to see which are thr+ and which are ara+.

Another group of recipient bacteria are tested on medium lacking threonine. Bacteria that grow and form colonies on this medium (thr+ transductants) are then replica plated onto medium lacking leucine and medium lacking arabinose to see which are ara+ and which are leu+. Results from these experiments are as follows:

Selected marker    Cells with cotransduced genes (%)
leu+                3 thr+
                   76 ara+
thr+                3 leu+
                    0 ara+

How are the loci arranged on the chromosome?

• Solution
Notice that, when we select for leu+ (the top half of the table), most of the selected cells also are ara+. This finding indicates that the leu and ara genes are located close together, because they are usually cotransduced. In contrast, thr+ is only rarely cotransduced with leu+, indicating that leu and thr are much farther apart. On the basis of these observations, we know that leu and ara are closer together than are leu and thr, but we don’t yet know the order of the three genes: whether thr is on the same side of ara as leu or on the opposite side, as shown here:

(thr?)  leu  ara  (thr?)

We can determine the position of thr with respect to the other two genes by looking at the cotransduction frequencies when thr+ is selected (the bottom half of the table). Notice that, although the cotransduction frequency for thr+ and leu+ also is 3%, no thr+ ara+ cotransductants are observed. This finding indicates that thr is closer to leu than to ara, and therefore thr must be to the left of leu, as shown here:

thr  leu  ara

COMPREHENSION QUESTIONS

*1. List some of the characteristics that make bacteria and viruses ideal organisms for many types of genetic studies.
2. Explain how auxotrophic bacteria are isolated.
3. Briefly explain the differences between F+, F−, Hfr, and F′ cells.
*4. What types of matings are possible between F+, F−, Hfr, and F′ cells? What outcomes do these matings produce? What is the role of F factor in conjugation?
*5. Explain how interrupted conjugation, transformation, and transduction can be used to map bacterial genes. How are these methods similar and how are they different?
6. What types of genomes do viruses have?
7. Briefly describe the differences between the lytic cycle of virulent phages and the lysogenic cycle of temperate phages.
8. Briefly explain how genes in phages are mapped.
*9. How does specialized transduction differ from generalized transduction?
*10. Briefly explain the method used by Benzer to determine whether two different mutations occurred at the same locus.
11. What is the difference between a positive-strand RNA virus and a negative-strand RNA virus?
*12. Explain how a retrovirus, which has an RNA genome, is able to integrate its genetic material into that of a host having a DNA genome.
13. Briefly describe the genetic structure of a typical retrovirus.
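The reasoning used in Worked Problem 1 (placing genes next to one another when they are cotransformed) can be sketched in Python. This is a minimal illustration under a simplifying assumption that matches the worked problem but not every data set: each gene is cotransformed only with its immediate neighbors, so the cotransformed pairs form a single linear chain. The function name is hypothetical.

```python
def order_from_cotransformation(pairs):
    """Infer a linear gene order from cotransformed gene pairs.

    Assumes each gene is cotransformed only with its immediate
    neighbors, so the pairs describe one unbranched chain.
    """
    neighbors = {}
    for x, y in pairs:
        neighbors.setdefault(x, set()).add(y)
        neighbors.setdefault(y, set()).add(x)

    # The two genes at the ends of the map have a single neighbor.
    ends = sorted(g for g, n in neighbors.items() if len(n) == 1)
    if len(ends) != 2 or any(len(n) > 2 for n in neighbors.values()):
        raise ValueError("pairs do not describe a simple linear chain")

    order = [ends[0]]
    previous = None
    while len(order) < len(neighbors):
        candidates = [g for g in neighbors[order[-1]] if g != previous]
        if not candidates:
            raise ValueError("pairs do not describe a single connected chain")
        previous = order[-1]
        order.append(candidates[0])
    return order


# Cotransformed pairs from Worked Problem 1.
pairs = [("a", "d"), ("b", "e"), ("c", "d"), ("c", "e")]
print(" ".join(order_from_cotransformation(pairs)))  # a d c e b
```

The map could equally be read from the other end (b e c d a); the sketch simply starts from whichever end gene sorts first alphabetically.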
APPLICATION QUESTIONS AND PROBLEMS

*14. John Smith is a pig farmer. For the past 5 years, Smith has been adding vitamins and low doses of antibiotics to his pig food; he says that these supplements enhance the growth of the pigs. Within the past year, however, several of his pigs died from infections of common bacteria, which failed to respond to large doses of antibiotics. Can you offer an explanation for the increased rate of mortality due to infection in Smith’s pigs? What advice might you offer Smith to prevent this problem in the future?

15. Rarely, conjugation of Hfr and F− cells produces two Hfr cells. Explain how this occurs.

*16. A strain of Hfr cells that is sensitive to the antibiotic streptomycin (str^s) has the genotype gal+ his+ bio+ pur+ gly+. These cells were mixed with an F− strain that is resistant to streptomycin (str^r) and has genotype gal− his− bio− pur− gly−. The cells were allowed to undergo conjugation. At regular intervals, a sample of cells was removed and conjugation was interrupted by placing the sample in a blender. The cells were then plated on medium that contains streptomycin. The cells that grew on this medium were then tested for the presence of genes transferred from the Hfr strain. Genes from the donor Hfr strain first appeared in the recipient F− strain at the times listed here. On the basis of these data, give the order of the genes on the bacterial chromosome and indicate the minimum distances between them:

gly+ | 3 minutes
his+ | 14 minutes
bio+ | 35 minutes
gal+ | 36 minutes
pur+ | 38 minutes

*17. A series of Hfr strains that have genotype m+ n+ o+ p+ q+ r+ are mixed with an F− strain that has genotype m− n− o− p− q− r−. Conjugation is interrupted at regular intervals and the order of appearance of genes from the Hfr strain is determined in the recipient cells. The order of gene transfer for each Hfr strain is:

Hfr5 | m+ q+ p+ n+ r+ o+
Hfr4 | n+ r+ o+ m+ q+ p+
Hfr1 | o+ m+ q+ p+ n+ r+
Hfr9 | q+ m+ o+ r+ n+ p+

What is the order of genes on the circular bacterial chromosome? For each Hfr strain, give the location of the F factor in the chromosome and its polarity.

*18. Crosses of three different Hfr strains with separate samples of an F− strain are carried out, and the following mapping data are provided from studies of interrupted conjugation:

Appearance of genes in F− cells
Hfr1 | Genes | b+ d+ c+ f+ g+
     | Time  | 3 5 16 27 59
Hfr2 | Genes | e+ f+ c+ d+ b+
     | Time  | 6 24 35 46 48
Hfr3 | Genes | d+ c+ f+ e+ g+
     | Time  | 4 15 26 44 58

Construct a genetic map for these genes, indicating their order on the bacterial chromosome and the distances between them.

19. DNA from a strain of Bacillus subtilis with the genotype trp+ tyr+ is used to transform a recipient strain with the genotype trp− tyr−. The following numbers of transformed cells were recovered:

Genotype | Number of transformed cells
trp tyr | 154
trp tyr | 312
trp tyr | 354

What do these results suggest about the linkage of the trp and tyr genes?

20. DNA from a strain of Bacillus subtilis with genotype a+ b+ c+ d+ e+ is used to transform a strain with genotype a− b− c− d− e−. Pairs of genes are checked for cotransformation and the following results are obtained:

Pair of genes | Cotransformation
a and b | no
a and c | no
a and d | yes
a and e | yes
b and c | yes
b and d | no
b and e | yes
c and d | no
c and e | yes
d and e | no

On the basis of these results, what is the order of the genes on the bacterial chromosome?

21. DNA from a bacterial strain that is his+ leu+ lac+ is used to transform a strain that is his− leu− lac−. The following percentages of cells were transformed:

Donor strain: his+ leu+ lac+
Recipient strain: his− leu− lac−

Genotype of transformed cells | Percentage
his leu lac | 0.02
his leu lac | 0.00
his leu lac | 2.00
his leu lac | 4.00
his leu lac | 0.10
his leu lac | 3.00
his leu lac | 1.50

(a) What conclusions can you make about the order of these three genes on the chromosome?
(b) Which two genes are closest?

22.
Two mutations that affect plaque morphology in phages (a and b) have been isolated. Phages carrying both mutations (a b) are mixed with wild-type phages (a b) and added to a culture of bacterial cells. Subsequent to infection and lysis, samples of the phage lysate are collected and cultured on bacterial cells. The following numbers of plaques are observed: Plaque phenotype   a b a b a b a b Number 2043 320 357 2134 What is the frequency of recombination between the a and b genes? * 23. A geneticist isolates two mutations in bacteriophage. One mutation causes the clear plaques (c) and the other produces minute plaques (m). Previous mapping experiments have established that the genes responsible for these two mutations are 8 map units apart. The geneticist mixes phages with genotype c m and genotype c m and uses the mixture to infect bacterial cells. She collects the progeny phages and cultures a sample of them on plated bacteria. A total of 1000 plaques are observed. What numbers of the different types of plaques (c m, c m, c m, c m) should she expect to see? 24. The geneticist carries out the same experiment described in Problem 23, but this time she mixes phages with genotypes c m and c m. What results are expected with this cross? * 25. A geneticist isolates two r mutants (r13 and r2) that cause rapid lysis. He carries out the following crosses and counts the number of plaques listed here: Genotype of parental phage   h r 13  h r 13 h r2  h r2 Progeny 231 Number of plaques  h r 13   h r 13  h r 13   h r 13 total 1 104 110 2 216 h r2 h r2 h r2 h r2 total 6 86 81 7 180 (a) Calculate the recombination frequencies between r2 and h and between r13 and h. (b) Draw all possible linkage maps for these three genes. * 26. E. coli cells are simultaneously infected with two strains of phage . One strain has a mutant host range, is temperature sensitive, and produces clear plaques (genotype  h st c); another strain carries the wild-type alleles (genotype  h st c). 
Progeny phage are collected from the lysed cells and are plated on bacteria. The genotypes of the progeny phage are: Progeny phage genotype h c t h c t h c t h c t h c t h c t h c t h c t Number of plaques 321 338 26 30 106 110 5 6 (a) Determine the order of the three genes on the phage chromosome. (b) Determine the map distances between the genes. (c) Determine the coefficient of coincidence and the interference (see p. 000 in Chapter 7). 27. A donor strain of bacteria with genes a b c is infected with phages to map the donor chromosome with generalized transduction. The phage lysate from the bacterial cells is collected and used to infect a second strain of bacteria that are a b c. Bacteria with the a gene are selected and the percentage of cells with cotransduced b and c genes are recorded. Donor a b c Recipient a b c Selected gene a a Cells with cotransduced gene (%) 25 b 3 c Is the b or c gene closer to a? Explain your reasoning. 232 Chapter 8 28. A donor strain of bacteria with genotype leu gal pro is infected with phages. The phage lysate from the bacterial cells is collected and used to infect a second strain of bacteria that are leu gal pro. The second strain is selected for leu, and the following cotransduction data are obtained: Donor leu gal pro  Recipient leu gal pro Selected gene leu leu Cells with cotransduced gene (%) 47 pro 26 gal Which genes are closest, leu and gal or leu and pro? 29. A geneticist isolates two new mutations from the rII region of bacteriophage T4, called rIIx and rIIy. E. coli B cells are simultaneously infected with phages carrying the rIIx mutation and with phages carrying the rIIy mutation. After the cells have lysed, samples of the phage lysate are collected. One sample is grown on E. coli K cells and a second sample on E. coli B cells. There are 8322 plaques on E. coli B and 3 plaques on E. coli K. What is the recombination frequency between these two mutations? 30. 
A geneticist is working with a new bacteriophage called phage Y3 that infects E. coli. He has isolated eight mutant phages that fail to produce plaques when grown on E. coli strain K. To determine whether these mutations occur at the same functional gene, he simultaneously infects E. coli K cells with paired combinations of the mutants and looks to see whether plaques are formed. He obtains the following results. (A plus sign means that plaques were formed on E. coli K; a minus sign means that no plaques were formed on E. coli K.) Mutant 1 2 3 4 5 6 7 8 1 2 3 4 5 6 7                             8 (a) To how many functional genes (cistrons) do these mutations belong? (b) Which mutations belong to the same functional gene? CHALLENGE QUESTIONS 31. As a summer project, a microbiology student independently isolates two mutations in E. coli that are auxotrophic for glycine (gly). The student wants to know whether these two mutants occur at the same cistron. Outline a procedure that the student could use to determine whether these two gly mutations occur within the same cistron. 32. A group of genetics students mix two auxotrophic strains of bacteria: one is leu trp his met and the other is leu trphis met. After mixing the two strains, they plate the bacteria on minimal medium and observe a few prototrophic colonies (leu trp his met). They assume that some gene transfer has taken place between the two strains. How can they determine whether the transfer of genes is due to conjugation, transduction, or transformation? SUGGESTED READINGS Aguzzi, A., and C. Weissman. 1997. Prion research: the next frontiers. Nature 389:795 – 798. A review of research into the nature of prions. Benzer, S. 1962. The fine structure of the gene. Scientific American 206(1):70 – 84. A good summary of Benzer’s methodology for intragenic mapping, written by Benzer. Birge, E. A. 2000. Bacterial and Bacteriophage Genetics, 4th ed. New York: Springer-Verlag. 
An excellent textbook on the genetics of bacteria and bacteriophage.

Cole, L. A. 1996. The specter of biological weapons. Scientific American 275(6):60–65. Reviews germ warfare and what can be done to discourage it.

Dale, J. 1998. Molecular Genetics of Bacteria, 3rd ed. New York: Wiley. A concise summary of basic and molecular genetics of bacteria and bacteriophage.

Davies, J. 1994. Inactivation of antibiotics and the dissemination of resistance genes. Science 264:275–282. Reviews the crisis of antibiotic resistance in bacteria, with particular emphasis on the physiology and genetics of resistance.

Doolittle, R. F. 1998. Microbial genomes opened up. Nature 392:339–342. Discussion of sequence data on bacterial genomes and what this information provides.

Fraser, C. M., J. A. Eisen, and S. L. Salzberg. 2000. Microbial genome sequencing. Nature 406:799–803. A short review of DNA sequencing of bacterial genomes.

Heidelberg, J. F., et al. 2000. DNA sequence of both chromosomes of the cholera pathogen Vibrio cholerae. Nature 406:477–483. Report of the sequencing and analysis of the genome of the bacterium that causes cholera.

Hershey, A. D., and R. Rotman. 1949. Genetic recombination between host-range and plaque-type mutants of bacteriophage in single bacterial cells. Genetics 34:44–71. Original report of Hershey and Rotman’s mapping experiments with phage.

Ippen-Ihler, K. A., and E. G. Minkley, Jr. 1986. The conjugation system of F, the fertility factor of Escherichia coli. Annual Review of Genetics 20:593–624. A detailed review of the F factor.

Kruse, H., and H. Sørum. 1994. Transfer of multiple drug resistance plasmids between bacteria of diverse origins in natural microenvironments. Applied and Environmental Microbiology 60:4015–4021. Reports experiments demonstrating the transfer of R plasmids between diverse bacteria under natural conditions.

Lederberg, J., and E. L. Tatum. 1946. Gene recombination in Escherichia coli. Nature 158:558. One of the original descriptions of Lederberg and Tatum’s discovery of gene transfer in bacteria. A slightly different set of experiments showing the same result was published in 1946 in Cold Spring Harbor Symposia on Quantitative Biology 11:113–114.

Meselson, M., G. Guillemin, M. Hugh-Jones, A. Langmuir, I. Popova, A. Shelokov, and O. Yampolskaya. 1994. The Sverdlovsk anthrax outbreak of 1979. Science 266:1202–1208. Report of the epidemiological studies concerning the anthrax outbreak in Sverdlovsk.

Miller, R. V. 1998. Bacterial gene swapping in nature. Scientific American 278(1):66–71. Discusses the importance of gene transfer by conjugation, transformation, and transduction in nature.

Novick, R. P. 1980. Plasmids. Scientific American 243(6):103–124. A good summary of plasmids and their importance in drug resistance.

Pace, N. R. 1997. A molecular view of microbial diversity and the biosphere. Science 276:734–740. Good review of the diversity and classification of bacteria based on DNA sequence data.

Scientific American. 1998. Volume 279, issue 1. This issue contains a special report with a number of articles on HIV and AIDS.

Walsh, C. 2000. Molecular mechanisms that confer antibacterial drug resistance. Nature 406:775–781. A very good review of how antibiotic resistance develops and how antibiotics can be developed that are less likely to be resisted by bacteria.

Wollman, E. L., F. Jacob, and W. Hayes. 1956. Conjugation and genetic recombination in Escherichia coli K-12. Cold Spring Harbor Symposia on Quantitative Biology 21:141–162. Original work on the use of interrupted conjugation to map genes in E. coli.
41385_09_p234-265 8/15/02 5:47 PM Page 234 41385 9 Pierce FREEMAN Chapter_09 (First Pages) 08/13/2002 234 Application file

Chromosome Variation

• Once in a Blue Moon
• Chromosome Variation
  Chromosome Morphology
  Types of Chromosome Mutations
• Chromosome Rearrangements
  Duplications
  Deletions
  Inversions
  Translocations
• Aneuploidy
  Types of Aneuploidy
  Effects of Aneuploidy
  Aneuploidy in Humans
  Uniparental Disomy
  Mosaicism
• Polyploidy
  Autopolyploidy
  Allopolyploidy
  The Significance of Polyploidy
• Chromosome Mutations and Cancer

(Chapter-opening photograph: Charles Palek/Animals Animals.)

Once in a Blue Moon

One of the best-known facts of genetics is that a cross between a horse and a donkey produces a mule. Actually, it’s a cross between a female horse and a male donkey that produces the mule; the reciprocal cross, between a male horse and a female donkey, produces a hinny, which has smaller ears and a bushy tail, like a horse (◗ FIGURE 9.1). Both mules and hinnies are sterile because horses and donkeys are different species with different numbers of chromosomes: a horse has 64 chromosomes, whereas a donkey has only 62. There are also considerable differences in the sizes and shapes of the chromosomes that horses and donkeys have in common. A mule inherits 32 chromosomes from its horse mother and 31 chromosomes from its donkey father, giving the mule a chromosome number of 63. The maternal and paternal chromosomes of a mule are not homologous, and so they do not pair and separate properly in meiosis; consequently, a mule’s gametes are abnormal and the animal is sterile.

In spite of the conventional wisdom that mules are sterile, reports of female mules with foals have surfaced over the years, although many of them can be attributed to mistaken identification. In several instances, a chromosome check of the alleged fertile mule has demonstrated that she is actually a donkey.
In other instances, analyses of genetic markers in both mule and foal demonstrated that the foal was not the offspring of the mule; female mules are capable of lactation and sometimes they adopt the foal of a nearby horse or donkey.

In the summer of 1985, a female mule named Krause, who was pastured with a male donkey, was observed with a newborn foal. There were no other female horses or donkeys in the pasture; so it seemed unlikely that the mule had adopted the foal. Blood samples were collected from Krause, her horse and donkey parents, and her male foal, which was appropriately named Blue Moon. A team of geneticists led by Oliver Ryder of the San Diego Zoo examined their chromosomal makeup and analyzed 17 genetic markers from the blood samples.

◗ 9.1 A cross between a female horse and a male donkey produces a mule; a cross between a male horse and a female donkey produces a hinny. (Clockwise from top left, Bonnie Rauch/Photo Researchers; R.J. Erwin/Photo Researchers; Bruce Gaylord/Visuals Unlimited; Bill Kamin/Visuals Unlimited.)

Krause’s karyotype revealed that she was indeed a mule, with 63 chromosomes and blood type genes that were a mixture of those found in donkeys and horses. Blue Moon also had 63 chromosomes and, like his mother, he possessed both donkey and horse genes (◗ FIGURE 9.2). Remarkably, he seemed to have inherited the entire set of horse chromosomes that were present in his mother. A mule’s horse and donkey chromosomes would be expected to segregate randomly when the mule produces its own gametes; so Blue Moon should have inherited a mixture of horse and donkey chromosomes from his mother. The genetic markers that Ryder and his colleagues studied suggested that random segregation had not occurred.
Krause and Blue Moon were therefore not only mother and son, but also sister and brother because they have the same father and they inherited the same maternal genes. The mechanism that allowed Krause to pass only horse chromosomes to her son is not known; possibly all Krause’s donkey chromosomes passed into the polar body during the first division of meiosis (see Figure 2.22), leaving the oocyte with only horse chromosomes.

Krause later gave birth to another male foal named White Lightning. Like his brother, White Lightning possessed mule chromosomes and appeared to have inherited only horse chromosomes from his mother. Additional reports of fertile female mules support the idea that their offspring inherit only horse chromosomes from their mother. When the father of a mule’s offspring is a horse, the offspring is horselike in appearance, because it apparently inherits horse chromosomes from both of its parents. When the father of a mule’s offspring is a donkey, however, the offspring is mulelike in appearance, because it inherits horse chromosomes from its mule mother and donkey chromosomes from its father.

Most species have a characteristic number of chromosomes, each with a distinct size and structure, and all the tissues of an organism (except for gametes) generally have the same set of chromosomes. Nevertheless, variations in chromosome number and structure do periodically arise. Individual chromosomes may lose or gain parts; the sequence of genes within a chromosome may become altered; whole chromosomes can even be lost or gained. These variations in the number and structure of chromosomes are termed chromosome mutations, and they frequently play an important role in evolution.

We begin this chapter by briefly reviewing some basic concepts of chromosome structure, which we learned in Chapter 2. We then consider the different types of chromosome mutations, their definitions, their features, and their phenotypic effects.
Finally, we examine the role of chromosome mutations in cancer.

For more information on mules: www.whfreeman.com/pierce

◗ 9.2 Blue Moon resulted from a cross between a fertile mule and a donkey. The probable pedigree of Blue Moon, the foal of a fertile mule, is shown. (Pedigree labels: donkey, 2n = 62; horse, 2n = 64; mule “Krause,” 2n = 63, XX; mule “Blue Moon,” 2n = 63, XY.)

Chromosome Variation

Before we consider the different types of chromosome mutations, their effects, and how they arise, we will review the basics of chromosome structure.

Chromosome Morphology

Each functional chromosome has a centromere, where spindle fibers attach, and two telomeres that stabilize the chromosome (see Figure 2.7). Chromosomes are classified into four basic types: metacentric, in which the centromere is located approximately in the middle, and so the chromosome has two arms of approximately equal length; submetacentric, in which the centromere is displaced toward one end, creating a long arm and a short arm; acrocentric, in which the centromere is near one end, producing a long arm and a knob, or satellite, at the other; and telocentric, in which the centromere is at or very near the end of the chromosome (see Figure 2.8). On human chromosomes, the short arm is designated by the letter p and the long arm by the letter q.

◗ 9.3 A human karyotype consists of 46 chromosomes. A karyotype for a male is shown here; a karyotype for a female would have two X chromosomes. (ISM/Phototake.)

The complete set of chromosomes that an organism possesses is called its karyotype and is usually presented as a picture of metaphase chromosomes lined up in descending order of their size (◗ FIGURE 9.3). Karyotypes are prepared from actively dividing cells, such as white blood cells, bone marrow cells, or cells from meristematic tissues of plants.
After treatment with a chemical (such as colchicine) that prevents them from entering anaphase, the cells are chemically preserved, spread on a microscope slide, stained, and photographed. The photograph is then enlarged, and the individual chromosomes are cut out and arranged in a karyotype. For human chromosomes, karyotypes are routinely prepared by automated machines, which scan a slide with a video camera attached to a microscope, looking for chromosome spreads. When a spread has been located, the camera takes a picture of the chromosomes, the image is digitized, and the chromosomes are sorted and arranged electronically by a computer.

Preparation and staining techniques have been developed to help distinguish among chromosomes of similar size and shape. For instance, chromosomes may be treated with enzymes that partly digest them; staining with a special dye called Giemsa reveals G bands, which distinguish areas of DNA that are rich in adenine–thymine base pairs (◗ FIGURE 9.4a). Q bands (◗ FIGURE 9.4b) are revealed by staining chromosomes with quinacrine mustard and viewing the chromosomes under UV light. Other techniques reveal C bands (◗ FIGURE 9.4c), which are regions of DNA occupied by centromeric heterochromatin, and R bands (◗ FIGURE 9.4d), which are rich in guanine–cytosine base pairs.

Pictures of karyotypes, including specific chromosome abnormalities, the analysis of human karyotypes, and links to a number of Web sites on chromosomes: www.whfreeman.com/pierce

Types of Chromosome Mutations

Chromosome mutations can be grouped into three basic categories: chromosome rearrangements, aneuploids, and polyploids. Chromosome rearrangements alter the structure of chromosomes; for example, a piece of a chromosome might be duplicated, deleted, or inverted. In aneuploidy, the number of chromosomes is altered: one or more individual chromosomes are added or deleted.
In polyploidy, one or more complete sets of chromosomes are added. Some organisms (such as yeast) possess a single chromosome set (1n) for most of their life cycles and are referred to as haploid, whereas others possess two chromosome sets and are referred to as diploid (2n). A polyploid is any organism that has more than two sets of chromosomes (3n, 4n, 5n, or more).

◗ 9.4 Chromosome banding is revealed by special staining techniques. (Part a, Leonard Lessin/Peter Arnold; parts b and c, Dr. Dorothy Warburton, HICC, Columbia University; part d, Dr. Ram Verma/Phototake.)

Chromosome Rearrangements

Chromosome rearrangements are mutations that change the structure of individual chromosomes. The four basic types of rearrangements are duplications, deletions, inversions, and translocations (◗ FIGURE 9.5).

(Figure 9.5, parts a and b: 1. In a chromosome duplication, a segment of the chromosome is duplicated. 2. In a chromosome deletion, a segment of the chromosome is deleted.)

Duplications

A chromosome duplication is a mutation in which part of the chromosome has been doubled (see Figure 9.5a). Consider a chromosome with segments AB•CDEFG, in which • represents the centromere. A duplication might include the EF segments, giving rise to a chromosome with segments AB•CDEFEFG. This type of duplication, in which the duplicated region is immediately adjacent to the original segment, is called a tandem duplication. If the duplicated segment is located some distance from the original segment, either on the same chromosome or on a different one, this type is called a displaced duplication. An example of a displaced duplication would be AB•CDEFGEF.
A duplication can either be in the same orientation as the original sequence, as in the two preceding examples, or be inverted: AB•CDEFFEG. When the duplication is inverted, it is called a reverse duplication.

◗ 9.5 The four basic types of chromosome rearrangements are duplication, deletion, inversion, and translocation. (Parts c and d: 3. In a chromosome inversion, a segment of the chromosome becomes inverted, turned 180°. 4. In a translocation, a chromosome segment moves from one chromosome to a nonhomologous chromosome or to another place on the same chromosome.)

◗ 9.6 In an individual heterozygous for a duplication, the duplicated chromosome loops out during pairing in prophase I. (a) A normal chromosome and a chromosome with a duplication: one chromosome has a duplication (E and F). (b) Alignment in prophase I of meiosis: the duplicated EF region must loop out to allow the homologous sequences of the chromosomes to align.

◗ 9.7 The Bar phenotype in Drosophila melanogaster results from an X-linked duplication. Panels: (a) wild type, B+B+; (b) heterozygous Bar, B+B; (c) homozygous Bar, BB; (d) heterozygous double Bar, B+BD.

An individual homozygous for a rearrangement carries the rearrangement (the mutated sequence) on both homologous chromosomes, and an individual heterozygous for a rearrangement has one unmutated chromosome and one chromosome with the rearrangement. In the heterozygotes (◗ FIGURE 9.6a), problems arise in chromosome pairing at prophase I of meiosis, because the two chromosomes are not homologous throughout their length. The homologous regions will pair and undergo synapsis, which often requires
that one or both chromosomes loop and twist so that these regions are able to line up (◗ FIGURE 9.6b). The appearance of this characteristic loop structure during meiosis is one way to detect duplications.

(Figure 9.7: (a) Wild-type fruit flies have normal-size eyes. (b) Flies heterozygous and (c) homozygous for the Bar mutation have smaller, bar-shaped eyes. (d) Flies with double Bar have three copies of the duplication and much smaller bar-shaped eyes.)

Duplications may have major effects on the phenotype. In Drosophila melanogaster, for example, a Bar mutant has a reduced number of facets in the eye, making the eye smaller and bar shaped instead of oval (◗ FIGURE 9.7). The Bar mutant results from a small duplication on the X chromosome, which is inherited as an incompletely dominant, X-linked trait: heterozygous female flies have somewhat smaller eyes (the number of facets is reduced; see Figure 9.7b), whereas, in homozygous female and hemizygous male flies, the number of facets is greatly reduced (see Figure 9.7c). Occasionally, a fly carries three copies of the Bar duplication on its X chromosome; in such mutants, which are termed double Bar, the number of facets is extremely reduced (see Figure 9.7d). Bar arises from unequal crossing over, a duplication-generating process (◗ FIGURE 9.8; see also Figure 17.17).

How does a chromosome duplication alter the phenotype? After all, gene sequences are not altered by duplications, and no genetic information is missing; the only change is the presence of additional copies of normal sequences. The answer to this question is not well understood, but the effects are most likely due to imbalances in the amounts of gene products (abnormal gene dosage).
The amount of a particular protein synthesized by a cell is often directly related to the number of copies of its corresponding gene: an individual with three functional copies of a gene often produces 1.5 times as much of the protein encoded by that gene as that produced by an individual with two copies. Because developmental processes often require the interaction of many proteins, they may critically depend on the relative amounts of the proteins. If the amount of one protein increases while the amounts of others remain constant, problems can result (◗ FIGURE 9.9). Although duplications can have severe consequences when the precise balance of a gene product is critical to cell function, duplications have arisen frequently throughout the evolution of many eukaryotic organisms and are a source of new genes that may provide novel functions. Human phenotypes associated with some duplications are summarized in Table 9.1.

Concepts

A chromosome duplication is a mutation that doubles part of a chromosome. In individuals heterozygous for a chromosome duplication, the duplicated region of the chromosome loops out when homologous chromosomes pair in prophase I of meiosis. Duplications often have major effects on the phenotype, possibly by altering gene dosage.

◗ 9.8 Unequal crossing over produces Bar and double-Bar mutations. Chromosomes do not align properly, resulting in unequal crossing over. One chromosome has a Bar duplication and the other a deletion. Unequal crossing over between chromosomes containing two copies of Bar produces a chromosome with three Bar copies (double-Bar mutation) and a wild-type chromosome.
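The duplication types described in this section, and the gene-dosage arithmetic above, can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the chapter's method: it assumes a chromosome is written as a string of segment letters with the centromere marked by •, and the function names are hypothetical.

```python
# Illustrative sketch only: chromosomes are modeled as strings of segment
# letters, with "•" assumed as the centromere marker. All function names
# here are hypothetical and chosen for this example.

def tandem_duplication(chromosome: str, segment: str) -> str:
    """Place a second copy of `segment` immediately after the original."""
    end = chromosome.index(segment) + len(segment)
    return chromosome[:end] + segment + chromosome[end:]

def displaced_duplication(chromosome: str, segment: str, position: int) -> str:
    """Place a copy of `segment` at `position`, away from the original."""
    chromosome.index(segment)  # raises ValueError if the segment is absent
    return chromosome[:position] + segment + chromosome[position:]

def reverse_duplication(chromosome: str, segment: str) -> str:
    """Place an inverted copy of `segment` immediately after the original."""
    end = chromosome.index(segment) + len(segment)
    return chromosome[:end] + segment[::-1] + chromosome[end:]

def relative_dosage(copies: int, normal_copies: int = 2) -> float:
    """Gene product relative to a diploid with `normal_copies` copies,
    assuming product is proportional to copy number."""
    return copies / normal_copies

original = "AB•CDEFG"
print(tandem_duplication(original, "EF"))                    # AB•CDEFEFG
print(displaced_duplication(original, "EF", len(original)))  # AB•CDEFGEF
print(reverse_duplication(original, "EF"))                   # AB•CDEFFEG
print(relative_dosage(3))                                    # 1.5
```

The proportional-dosage assumption in `relative_dosage` mirrors the text's statement that three copies often yield 1.5 times as much protein; real genes need not follow it exactly.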
Table 9.1 Effects of some chromosome rearrangements in humans

Type of rearrangement | Chromosome | Disorder | Symptoms
Duplication | 4, short arm | - | Small head, short neck, low hairline, growth and mental retardation
Duplication | 4, long arm | - | Small head, sloping forehead, hand abnormalities
Duplication | 7, long arm | - | Delayed development, asymmetry of the head, fuzzy scalp, small nose, low-set ears
Duplication | 9, short arm | - | Characteristic face, variable mental retardation, high and broad forehead, hand abnormalities
Deletion | 5, short arm | Cri-du-chat syndrome | Small head, distinctive cry, widely spaced eyes, a round face, mental retardation
Deletion | 4, short arm | Wolf-Hirschhorn syndrome | Small head with high forehead, wide nose, cleft lip and palate, severe mental retardation
Deletion | 4, long arm | - | Small head, mild to moderate mental retardation, cleft lip and palate, hand and foot abnormalities
Deletion | 15, long arm | Prader-Willi syndrome | Feeding difficulty at early age, but becoming obese after 1 year of age, mild to moderate mental retardation
Deletion | 18, short arm | - | Round face, large low-set ears, mild to moderate mental retardation
Deletion | 18, long arm | - | Distinctive mouth shape, small hands, small head, mental retardation

Deletions

A second type of chromosome rearrangement is a deletion, the loss of a chromosome segment (see Figure 9.5b). A chromosome with segments AB•CDEFG that undergoes a deletion of segment EF would generate the mutated chromosome AB•CDG.

A large deletion can be easily detected because the chromosome is noticeably shortened. In individuals heterozygous for deletions, the normal chromosome must loop out during the pairing of homologs in prophase I of meiosis (◗ FIGURE 9.10) to allow the homologous regions of the two chromosomes to align and undergo synapsis. This looping out generates a structure that looks very much like that seen in individuals heterozygous for duplications.

The phenotypic consequences of a deletion depend on which genes are located in the deleted region. If the deletion includes the centromere, the chromosome will not segregate in meiosis or mitosis and will usually be lost. Many deletions are lethal in the homozygous state because all copies of any essential genes located in the deleted region are missing. Even individuals heterozygous for a deletion may have multiple defects for three reasons.

First, the heterozygous condition may produce imbalances in the amounts of gene products, similar to the imbalances produced by extra gene copies. Second, deletions may allow recessive mutations on the undeleted chromosome to be expressed (because there is no wild-type allele to mask their expression). This phenomenon is referred to as pseudodominance. The appearance of pseudodominance in otherwise recessive alleles is an indication that a deletion is present on one of the chromosomes. Third, some genes must be present in two copies for normal function. Such a gene is said to be haploinsufficient; loss-of-function mutations in haploinsufficient genes are dominant.

Notch is a series of X-linked wing mutations in Drosophila that often result from chromosome deletions. Notch deletions behave as dominant mutations: when heterozygous for the Notch deletion, a fly has wings that are notched at the tips and along the edges (◗ FIGURE 9.11). The Notch locus is therefore haploinsufficient: a single copy of the gene is not sufficient to produce a wild-type phenotype. Females that are homozygous for a Notch deletion (or males that are hemizygous) die early in embryonic development. The Notch gene codes for a receptor that normally transmits signals received from outside the cell to the cell’s interior and is important in fly development.
The deletion acts as a recessive lethal because loss of all copies of the Notch gene prevents normal development.

In humans, a deletion on the short arm of chromosome 5 is responsible for cri-du-chat syndrome. The name (French for “cry of the cat”) derives from the peculiar, catlike cry of infants with this syndrome. A child who is heterozygous for this deletion has a small head, widely spaced eyes, a round face, and mental retardation. Deletion of part of the short arm of chromosome 4 results in another human disorder, Wolf-Hirschhorn syndrome, which is characterized by seizures and by severe mental and growth retardation.

◗ 9.9 Unbalanced gene dosage leads to developmental abnormalities. (a) Wild-type chromosome (A B C): 1 Developmental processes often require the interaction of many genes. 2 Development may be affected by the relative amounts of gene products. Outcome: normal development. (b) Mutant chromosome (A B B C): 3 Duplications and other chromosome mutations produce extra copies of some, but not all, genes,… 4 …which alters the relative amounts (doses) of interacting products. 5 If the amount of one product increases but amounts of other products remain the same, developmental problems often result. Outcome: abnormal development.

◗ 9.10 In an individual heterozygous for a deletion, the normal chromosome loops out during chromosome pairing in prophase I. The heterozygote has one normal chromosome… …and one chromosome with a deletion. In prophase I, the normal chromosome must loop out in order for the homologous sequences of the chromosome to align.

◗ 9.11 The Notch phenotype is produced by a chromosome deletion that includes the Notch gene. (Top, normal wing venation; bottom, wing venation produced by Notch mutation.) (Spyros Artavanis-Tsakonas, Kenji Matsuno, and Mark E. Fortini.)

Concepts
A chromosome deletion is a mutation in which a part of the chromosome is lost. In individuals heterozygous for a deletion, the normal chromosome loops out during prophase I of meiosis. Deletions do not undergo reverse mutation. They cause recessive genes on the undeleted chromosome to be expressed and cause imbalances in gene products.

www.whfreeman.com/pierce Information on rare chromosome disorders

Inversions
A third type of chromosome rearrangement is a chromosome inversion, in which a chromosome segment is inverted, that is, turned 180 degrees (see Figure 9.5c). If a chromosome originally had segments AB•CDEFG, then chromosome AB•CFEDG represents an inversion that includes segments DEF. For an inversion to take place, the chromosome must break in two places. Inversions that do not include the centromere, such as AB•CFEDG, are termed paracentric inversions (para meaning “next to”), whereas inversions that include the centromere, such as ADC•BEFG, are termed pericentric inversions (peri meaning “around”).

Individuals with inversions have neither lost nor gained any genetic material; just the gene order has been altered. Nevertheless, these mutations often have pronounced phenotypic effects. An inversion may break a gene into two parts, with one part moving to a new location and destroying the function of that gene. Even when the chromosome breaks are between genes, phenotypic effects may arise from the inverted gene order in an inversion. Many genes are regulated in a position-dependent manner; if their positions are altered by an inversion, they may be expressed at inappropriate times or in inappropriate tissues. This outcome is referred to as a position effect.

When an individual is homozygous for a particular inversion, no special problems arise in meiosis, and the two homologous chromosomes can pair and separate normally. When an individual is heterozygous for an inversion, however, the gene order of the two homologs differs, and the homologous sequences can align and pair only if the two chromosomes form an inversion loop ( ◗ FIGURE 9.12). The presence of an inversion loop in meiosis indicates that an inversion is present.

◗ 9.12 In an individual heterozygous for a paracentric inversion, the chromosomes form an inversion loop during pairing in prophase I. The heterozygote has one normal chromosome… …and one chromosome with an inverted segment. In prophase I of meiosis, the chromosomes form an inversion loop, which allows the homologous sequences to align.

Individuals heterozygous for inversions also exhibit reduced recombination among genes located in the inverted region. The frequency of crossing over within the inversion is not actually diminished but, when crossing over does take place, the result is a tendency to produce gametes that are not viable, and thus no recombinant progeny are observed. Let’s see why this occurs.

◗ 9.13 In a heterozygous individual, a single crossover within a paracentric inversion leads to abnormal gametes. (a) 1 The heterozygote possesses one wild-type chromosome… 2 …and one chromosome with a paracentric inversion. (b) 3 In prophase I, an inversion loop forms. 4 A single crossover within the inverted region… (c) 5 …results in an unusual structure. 6 One of the four chromatids now has two centromeres… 7 …and one lacks a centromere. (d) 8 In anaphase I, the centromeres separate, stretching the dicentric chromatid across the center of the nucleus. The dicentric chromatid breaks… 9 …and the chromosome lacking a centromere is lost. (e) 10 In anaphase II, four gametes are produced. 11 Two gametes contain wild-type nonrecombinant chromosomes. 12 The other two contain recombinant chromosomes that are missing some genes; these gametes will not produce viable offspring. Gamete labels: normal nonrecombinant gamete; nonviable recombinant gametes; nonrecombinant gamete with paracentric inversion. Conclusion: The resulting recombinant gametes are nonviable because they are missing some genes.

◗ FIGURE 9.13 illustrates the results of crossing over within a paracentric inversion. The individual is heterozygous for an inversion (see Figure 9.13a), with one wild-type, unmutated chromosome (AB•CDEFG) and one inverted chromosome (AB•EDCFG). In prophase I of meiosis, an inversion loop forms, allowing the homologous sequences to pair up (see Figure 9.13b). If a single crossover takes place in the inverted region (between segments C and D in Figure 9.13), an unusual structure results (see Figure 9.13c). The two outer chromatids, which did not participate in crossing over, contain original, nonrecombinant gene sequences. The two inner chromatids, which did cross over, are highly abnormal: each has two copies of some genes and no copies of others. Furthermore, one of the four chromatids now has two centromeres and is said to be dicentric; another lacks a centromere and is acentric.

In anaphase I of meiosis, the centromeres are pulled toward opposite poles and the two homologous chromosomes separate.
This stretches the dicentric chromatid across the center of the nucleus, forming a structure called a dicentric bridge (see Figure 9.13d). Eventually the dicentric bridge breaks, as the two centromeres are pulled farther apart. The acentric fragment has no centromere. Spindle fibers do not attach to it, and so this fragment does not segregate into a nucleus in meiosis and is usually lost.

In the second division of meiosis, the chromatids separate and four gametes are produced (see Figure 9.13e). Two of the gametes contain the original, nonrecombinant chromosomes (AB•CDEFG and AB•EDCFG). The other two gametes contain recombinant chromosomes that are missing some genes; these gametes will not produce viable offspring. Thus, no recombinant progeny result when crossing over takes place within a paracentric inversion.

Recombination is also reduced within a pericentric inversion ( ◗ FIGURE 9.14). No dicentric bridges or acentric fragments are produced, but the recombinant chromosomes have too many copies of some genes and no copies of others; so gametes that receive the recombinant chromosomes cannot produce viable progeny.

◗ 9.14 In a heterozygous individual, a single crossover within a pericentric inversion leads to abnormal gametes. (a) 1 The heterozygote possesses one wild-type chromosome… 2 …and one chromosome with a pericentric inversion. (b) 3 In prophase I, an inversion loop forms. 4 If crossing over takes place within the inverted region,…

Figures 9.13 and 9.14 illustrate the results of single crossovers within inversions. Double crossovers, in which both crossovers are on the same two strands (two-strand double crossovers), result in functional, recombinant chromosomes. (Try drawing out the results of a double crossover.)
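The single-crossover outcomes described above for paracentric and pericentric inversions can be traced with a short Python sketch. It is a simplified string model, not a simulation of meiosis: each chromatid is written as a string of segment letters, a period stands in for the centromere (an assumption of this sketch), and the function names are hypothetical. The crossover is placed between segments C and D, as in the text.

```python
CEN = "."  # stand-in for the centromere in this sketch


def invert(chrom, start, end):
    """Return chrom with the slice [start:end) turned 180 degrees."""
    return chrom[:start] + chrom[start:end][::-1] + chrom[end:]


def inversion_type(chrom, start, end):
    """Pericentric if the inverted slice contains the centromere."""
    return "pericentric" if CEN in chrom[start:end] else "paracentric"


def single_crossover(normal, inverted, left, right):
    """Recombinant chromatids from one crossover between the adjacent
    segments left|right, both lying inside the inverted region."""
    i = normal.index(left) + 1   # cut the normal chromatid after `left`
    j = inverted.index(left)     # on the inverted one, `right` precedes `left`
    if normal[i] != right or inverted[j - 1] != right:
        raise ValueError("segments must be adjacent and inside the inversion")
    # Each recombinant follows one chromatid to the crossover point, then
    # continues along the other in the same direction around the loop.
    rec1 = normal[:i] + inverted[:j][::-1]
    rec2 = inverted[j:][::-1] + normal[i:]
    return rec1, rec2


def describe(chromatid, reference):
    """Summarize centromere count and which segments are absent."""
    kind = {0: "acentric", 1: "monocentric", 2: "dicentric"}[chromatid.count(CEN)]
    missing = "".join(sorted(set(reference) - set(chromatid) - {CEN}))
    return f"{chromatid}: {kind}, missing {missing}"


normal = "AB.CDEFG"
paracentric = invert(normal, 3, 6)   # AB.EDCFG, as in the text
pericentric = invert(normal, 1, 5)   # ADC.BEFG, as in the text

for inv in (paracentric, pericentric):
    print(inversion_type(normal, *((3, 6) if inv == paracentric else (1, 5))))
    for rec in single_crossover(normal, inv, "C", "D"):
        print("  " + describe(rec, normal))
```

In this model the paracentric case yields one dicentric and one acentric chromatid, whereas the pericentric case yields two chromatids that each keep a single centromere but lack some segments and carry others twice.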
Thus, even though the overall rate of recombination is reduced within an inversion, some viable recombinant progeny may still be produced through two-strand double crossovers.

Inversion heterozygotes are common in many organisms, including a number of plants, some species of Drosophila, mosquitoes, and grasshoppers. Inversions may have played an important role in human evolution: G-banding patterns reveal that several human chromosomes differ from those of chimpanzees by only a pericentric inversion ( ◗ FIGURE 9.15).

◗ 9.14 (panels c to e) (c) 5 …two of the resulting chromatids have too many copies of some genes and no copies of others. (d) 6 The chromosomes separate in anaphase I. (e) 7 The chromatids separate in anaphase II, forming four gametes… Gamete labels: normal nonrecombinant gamete; nonviable recombinant gametes; nonrecombinant gamete with pericentric inversion. Conclusion: Recombinant gametes are nonviable because genes are missing or present in too many copies.

◗ 9.15 Chromosome 4 differs in humans and chimpanzees in a pericentric inversion. (Labels: centromere; human chromosome 4; pericentric inversion; chimpanzee chromosome 4.)

Concepts
In an inversion, a segment of a chromosome is inverted. Inversions cause breaks in some genes and may move others to new locations. In heterozygotes for a chromosome inversion, the chromosomes form loops in prophase I of meiosis. When crossing over takes place within the inverted region, nonviable gametes are usually produced, resulting in a depression in observed recombination frequencies.

Translocations
A translocation entails the movement of genetic material between nonhomologous chromosomes (see Figure 9.5d) or within the same chromosome. Translocation should not be confused with crossing over, in which there is an exchange of genetic material between homologous chromosomes.
In nonreciprocal translocations, genetic material moves from one chromosome to another without any reciprocal exchange. Consider the following two nonhomologous chromosomes: AB•CDEFG and MN•OPQRS. If chromosome segment EF moves from the first chromosome to the second without any transfer of segments from the second chromosome to the first, a nonreciprocal translocation has taken place, producing chromosomes AB•CDG and MN•OPEFQRS. More commonly, there is a two-way exchange of segments between the chromosomes, resulting in a reciprocal translocation. A reciprocal translocation between chromosomes AB•CDEFG and MN•OPQRS might give rise to chromosomes AB•CDQRG and MN•OPEFS.

Translocations can affect a phenotype in several ways. First, they may create new linkage relations that affect gene expression (a position effect): genes translocated to new locations may come under the control of different regulatory sequences or other genes that affect their expression; an example is found in Burkitt lymphoma, to be discussed later in this chapter. Second, the chromosomal breaks that bring about translocations may take place within a gene and disrupt its function.

Molecular geneticists have used these types of effects to map human genes. Neurofibromatosis is a genetic disease characterized by numerous fibrous tumors of the skin and nervous tissue; it results from an autosomal dominant mutation. Linkage studies first placed the locus for neurofibromatosis on chromosome 17. Geneticists later identified two patients with neurofibromatosis who possessed a translocation affecting chromosome 17. These patients were assumed to have developed neurofibromatosis because one of the chromosome breaks that occurred in the translocation disrupted a particular gene that causes neurofibromatosis.
DNA from the regions around the breaks was sequenced and eventually led to the identification of the gene responsible for neurofibromatosis.

Deletions frequently accompany translocations. In a Robertsonian translocation, for example, the long arms of two acrocentric chromosomes become joined to a common centromere through a translocation, generating a metacentric chromosome with two long arms and another chromosome with two very short arms ( ◗ FIGURE 9.16). The smaller chromosome often fails to segregate, leading to an overall reduction in chromosome number. As we will see, Robertsonian translocations are the cause of some cases of Down syndrome.

◗ 9.16 In a Robertsonian translocation, the short arm of one acrocentric chromosome is exchanged with the long arm of another. 1 The short arm of one acrocentric chromosome… 2 …is exchanged with the long arm of another,… 3 …creating a large metacentric chromosome… 4 …and a fragment that often fails to segregate and is lost. (Labels: break points; Robertsonian translocation; metacentric chromosome + fragment.)

The effects of a translocation on chromosome segregation in meiosis depend on the nature of the translocation. Let us consider what happens in an individual heterozygous for a reciprocal translocation. Suppose that the original chromosome segments were AB•CDEFG and MN•OPQRS (designated N1 and N2), and a reciprocal translocation takes place, producing chromosomes AB•CDQRS and MN•OPEFG (designated T1 and T2). An individual heterozygous for this translocation would possess one normal copy of each chromosome and one translocated copy ( ◗ FIGURE 9.17a). Each of these chromosomes contains segments that are homologous to two other chromosomes. When the homologous sequences pair in prophase I of meiosis, crosslike configurations consisting of all four chromosomes ( ◗ FIGURE 9.17b) form.
Notice that N1 and T1 have homologous centromeres (in both chromosomes the centromere is between segments B and C); similarly, N2 and T2 have homologous centromeres (between segments N and O). Normally, homologous centromeres separate and move toward opposite poles in anaphase I of meiosis. With a reciprocal translocation, the chromosomes may segregate in three different ways. In alternate segregation ( ◗ FIGURE 9.17c), N1 and N2 move toward one pole and T1 and T2 move toward the opposite pole. In adjacent-1 segregation, N1 and T2 move toward one pole and T1 and N2 move toward the other pole. In both alternate and adjacent-1 segregation, homologous centromeres segregate toward opposite poles. Adjacent-2 segregation, in which N1 and T1 move toward one pole and T2 and N2 move toward the other, is rare.

The products of the three segregation patterns are illustrated in ◗ FIGURE 9.17d. As you can see, the gametes produced by alternate segregation possess one complete set of the chromosome segments. These gametes are therefore functional and can produce viable progeny. In contrast, gametes produced by adjacent-1 and adjacent-2 segregation are not viable, because some chromosome segments are present in two copies, whereas others are missing. Adjacent-2 segregation is rare, and so most gametes are produced by alternate and adjacent-1 segregation. If these two patterns take place about equally often, approximately half of the gametes from an individual heterozygous for a reciprocal translocation are expected to be functional.

◗ 9.17 (panel a) 1 An individual heterozygous for this translocation possesses one normal copy of each chromosome (N1 and N2)… 2 …and one translocated copy of each (T1 and T2).
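The three segregation patterns described above can be checked mechanically. The following Python sketch is a simple bookkeeping model rather than a description of meiosis itself: it uses the chromosome compositions given in the text for N1, N2, T1, and T2, writes the centromere as a period (an assumption of this sketch), and calls a gamete balanced when it carries every segment exactly once. The function and variable names are hypothetical.

```python
from collections import Counter

CEN = "."  # stand-in for the centromere in this sketch

N1, N2 = "AB.CDEFG", "MN.OPQRS"   # normal chromosomes
T1, T2 = "AB.CDQRS", "MN.OPEFG"   # translocated chromosomes

# One complete haploid set of segments, each present once.
COMPLETE_SET = Counter((N1 + N2).replace(CEN, ""))

# Which chromosomes travel together to each pole in anaphase I.
SEGREGATION = {
    "alternate": [(N1, N2), (T1, T2)],
    "adjacent-1": [(N1, T2), (T1, N2)],
    "adjacent-2": [(N1, T1), (T2, N2)],
}


def segment_counts(gamete):
    """Count how many times each segment occurs in a pair of chromosomes."""
    return Counter("".join(gamete).replace(CEN, ""))


def is_balanced(gamete):
    """True if every segment is present exactly once."""
    return segment_counts(gamete) == COMPLETE_SET


def imbalance(gamete):
    """Return (segments present twice, segments missing)."""
    counts = segment_counts(gamete)
    extra = "".join(sorted(s for s in counts if counts[s] > 1))
    missing = "".join(sorted(s for s in COMPLETE_SET if counts[s] == 0))
    return extra, missing


for pattern, gametes in SEGREGATION.items():
    for gamete in gametes:
        extra, missing = imbalance(gamete)
        status = "viable" if is_balanced(gamete) else "nonviable"
        print(f"{pattern:10s} {' + '.join(gamete)}: {status}"
              f" (duplicated: {extra or 'none'}, missing: {missing or 'none'})")
```

Only the two alternate-segregation gametes come out balanced in this model, which agrees with the conclusion stated for Figure 9.17.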
◗ 9.17 In an individual heterozygous for a reciprocal translocation, cross-like structures form in homologous pairing. Chromosome labels: N1 = A B C D E F G; N2 = M N O P Q R S; T1 = A B C D Q R S; T2 = M N O P E F G. (b) 3 Because each chromosome has sections that are homologous to two other chromosomes, a crosslike configuration forms in prophase I of meiosis. (c) 4 In anaphase I, the chromosomes separate in one of three different ways: alternate segregation, adjacent-1 segregation, or adjacent-2 segregation (rare). (d) After anaphase II, the gametes are labeled as viable or nonviable. Conclusion: Gametes resulting from adjacent-1 and adjacent-2 segregation are nonviable because some genes are present in two copies whereas others are missing.

◗ 9.18 Human chromosome 2 contains a Robertsonian translocation that is not present in chimps, gorillas, or orangutans. (Panels: human chromosome 2; chimpanzee chromosomes; gorilla chromosomes; orangutan chromosomes. Note that bands on chromosomes of different species are homologous.) G-banding reveals that a Robertsonian translocation in a human ancestor switched the long and short arms of the two acrocentric chromosomes that are still found in the other three primates.
This translocation created the large metacentric human chromosome 2.

Translocations can play an important role in the evolution of karyotypes. Chimpanzees, gorillas, and orangutans all have 48 chromosomes, whereas humans have 46. Human chromosome 2 is a large, metacentric chromosome with G-banding patterns that match those found on two different acrocentric chromosomes of the apes ( ◗ FIGURE 9.18). Apparently, a Robertsonian translocation took place in a human ancestor, creating a large metacentric chromosome from the two long arms of the ancestral acrocentric chromosomes and a small chromosome consisting of the two short arms. The small chromosome was subsequently lost, leading to the reduced human chromosome number.

Concepts
In translocations, parts of chromosomes move to other, nonhomologous chromosomes or other regions of the same chromosome. Translocations may affect the phenotype by causing genes to move to new locations, where they come under the influence of new regulatory sequences, or by breaking genes and disrupting their function.

www.whfreeman.com/pierce For gorilla and other great-ape chromosomes with a comparison of the human karyotype and great-ape karyotypes; animation of the formation of a Robertsonian translocation and types of gametes produced by a translocation carrier; and pictures of karyotypes containing Robertsonian translocations

Fragile Sites
Chromosomes of cells grown in culture sometimes develop constrictions or gaps at particular locations called fragile sites ( ◗ FIGURE 9.19) because they are prone to breakage under certain conditions. A number of fragile sites have been identified on human chromosomes. One of the most intensively studied is a fragile site on the human X chromosome that is associated with mental retardation, the fragile-X syndrome. Exhibiting X-linked inheritance and arising with a frequency of about 1 in 1250 male births, fragile-X syndrome has been shown to result from an increase in the number of repeats of a CGG trinucleotide (see Chapter 19). However, other common fragile sites do not consist of trinucleotide repeats, and their nature is still incompletely understood.

◗ 9.19 Fragile sites are chromosomal regions susceptible to breakage under certain conditions. (Erica Woollatt, Women’s and Children’s Hospital, Adelaide.)

The New Genetics
ETHICS • SCIENCE • TECHNOLOGY
Fragile X Syndrome

Ryan, age 4, is brought to the medical genetics clinic by his 27-year-old mother, Janet. Ryan is developmentally delayed and hyperactive and has undergone many tests, but all the results were normal. Janet and her husband, Terry, very much want another child. The family history is unremarkable with the exception of the 6-year-old son of one of Janet’s cousins, who is apparently “slow.” However, Janet reports that she does not get along with her siblings and that in fact she has little contact with any of the rest of her family. Both her parents are deceased. A friend told her recently that Janet’s 25-year-old sister just found out that she is pregnant with her first child.

The cause of Ryan’s delay is determined to be fragile-X syndrome. As its name suggests, this condition is a sex-linked disorder carried by females and most seriously affecting males, in whom it can cause severe mental retardation. After describing the genetics of fragile-X syndrome and its hereditary risks, the medical geneticist asks Janet to notify her sister of the information and alert her to the availability of prenatal testing. The next week, the geneticist calls Janet, who states that she has not called her sister and does not intend to.

Are there ways that the geneticist can persuade Janet to inform her sister?
Failing that, can the physician alert Janet’s sister to her risk without compromising the obligation to preserve the confidentiality of the relationship with Janet? If there is no other recourse, does the genetic professional have an ethical or legal right to breach confidentiality and inform Janet’s sister — and perhaps others in the family — of the risk? Does the professional have a duty to do so? By law, a professional’s ethical duty to a patient or client can be overridden only if (1) reasonable efforts to gain consent to disclosure have failed; (2) there is a high probability of harm if information is withheld, and the information will be used to avert harm; (3) the harm that persons would suffer is serious; and (4) precautions are taken to ensure that only the genetic information needed for diagnosis or treatment or both is disclosed. However, who determines which genetic risks are among the “serious harms” that would permit breaching confidentiality in medical contexts? Some might argue that all these questions miss the point: the familiar duties of doctor and patient don’t apply in this case because, where genes are concerned, the patient is not the individual, but the entire family to which that patient belongs. Thus, the physician must do whatever best meets the needs of all members of the family. This line of reasoning may be increasingly popular as the powers of genetic medicine grow and physicians are more frequently asked to utilize and interpret genetic tests in the course of routine care. Nevertheless, we should be careful not to hastily discard the traditional ethical principle that the doctor’s and medical team’s first responsibility is to the presenting patient. Replacing it with a generalized responsibility to the whole family takes medical practice into uncharted territory and can impose serious new burdens on medical professionals. 
By Ron Green

In this case, the patient refuses to inform other family members who might benefit from prenatal testing, which would allow them to decide to continue or terminate a pregnancy or prepare for the birth of a child with a genetic disorder. But the same type of problem can arise in many other ways where genetic medicine or research is concerned. In some conditions, testing is aimed at determining whether an individual or family has a genetic susceptibility to a disease, the knowledge of which can help them pursue preventative strategies. Current testing for known breast cancer mutations is an example. In these cases, whenever the specific genes involved have not yet been identified, researchers or clinicians must conduct extensive family linkage studies to determine the pattern of inheritance. One or more family members can block progress by refusing to participate in the study. The principle of respect for autonomy certainly supports such refusals, but should this principle trump research that is needed to improve the health of other members of the family?

Sometimes, the reverse problem arises: some members demand participation by other relatives in ways that exert pressure on them. Alternatively, in some cases one person’s use of a genetic test may harm others in the family. This problem has arisen in connection with Huntington disease, a fatal, later-onset neurological disorder for which no treatment exists. There have been instances when one twin of a pair of identical twins has insisted on testing and the other twin, unwilling to be subjected to the fearful psychosocial harms that testing can bring, has objected.

Social workers, psychologists, genetic counselors, and others who work closely with families know that such disputes often reveal deep fault lines and sources of conflict within a family.
When faced with these cases, they also recognize that it is not a matter of just solving an ethical problem, but of understanding and addressing the underlying problems that give rise to these conflicts. The familial nature of genetic information will undoubtedly increase the number and intensity of conflicts that come before caregivers or counseling professionals.

Aneuploidy
In addition to chromosome rearrangements, chromosome mutations also include changes in the number of chromosomes. Variations in chromosome number can be classified into two basic types: changes in the number of individual chromosomes (aneuploidy) and changes in the number of chromosome sets (polyploidy).

Aneuploidy can arise in several ways. First, a chromosome may be lost in the course of mitosis or meiosis if, for example, its centromere is deleted. Loss of the centromere prevents the spindle fibers from attaching; so the chromosome fails to move to the spindle pole and does not become incorporated into a nucleus after cell division. Second, the small chromosome generated by a Robertsonian translocation may be lost in mitosis or meiosis. Third, aneuploids may arise through nondisjunction, the failure of homologous chromosomes or sister chromatids to separate in meiosis or mitosis (see Chapter 4). Nondisjunction leads to some gametes or cells that contain an extra chromosome and others that are missing a chromosome ( ◗ FIGURE 9.20).

◗ 9.20 Aneuploids can be produced through nondisjunction in (a) meiosis I, (b) meiosis II, and (c) mitosis. The gametes that result from meioses with nondisjunction combine with a gamete (with blue chromosome) that results from normal meiosis to produce the zygotes. (a) Nondisjunction in meiosis I: zygotes are trisomic (2n + 1) or monosomic (2n – 1). (b) Nondisjunction in meiosis II: zygotes are trisomic (2n + 1), monosomic (2n – 1), or normal diploid (2n). (c) Nondisjunction in mitosis, followed by cell proliferation: a somatic clone of monosomic cells (2n – 1) and a somatic clone of trisomic cells (2n + 1).

Types of Aneuploidy
We will consider four types of relatively common aneuploid conditions in diploid individuals: nullisomy, monosomy, trisomy, and tetrasomy.

1. Nullisomy is the loss of both members of a homologous pair of chromosomes. It is represented as 2n – 2, where n refers to the haploid number of chromosomes. Thus, among humans, who normally possess 2n = 46 chromosomes, a nullisomic person has 44 chromosomes.
2. Monosomy is the loss of a single chromosome, represented as 2n – 1. A monosomic person has 45 chromosomes.
3. Trisomy is the gain of a single chromosome, represented as 2n + 1. A trisomic person has 47 chromosomes. The gain of a chromosome means that there are three homologous copies of one chromosome.
4. Tetrasomy is the gain of two homologous chromosomes, represented as 2n + 2. A tetrasomic person has 48 chromosomes. Tetrasomy is not the gain of any two extra chromosomes, but rather the gain of two homologous chromosomes; so there will be four homologous copies of a particular chromosome.

More than one aneuploid mutation may occur in the same individual. An individual that has an extra copy of two different (nonhomologous) chromosomes is referred to as being double trisomic and represented as 2n + 1 + 1. Similarly, a double monosomic has two fewer nonhomologous chromosomes (2n – 1 – 1), and a double tetrasomic has two extra pairs of homologous chromosomes (2n + 2 + 2).
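The notation for the aneuploid conditions listed above reduces to simple arithmetic on the diploid number. The following Python sketch tabulates it for humans (2n = 46); the dictionary and function names are hypothetical, and each condition is stored as the tuple of gains or losses that appears in its formula.

```python
# Gains (+) and losses (-) of chromosomes for each condition named in the text.
ANEUPLOIDY_OFFSETS = {
    "nullisomy": (-2,),
    "monosomy": (-1,),
    "trisomy": (+1,),
    "tetrasomy": (+2,),
    "double trisomy": (+1, +1),
    "double monosomy": (-1, -1),
    "double tetrasomy": (+2, +2),
}


def chromosome_count(condition, diploid_number=46):
    """Total chromosomes for a condition; 46 is the human diploid number."""
    return diploid_number + sum(ANEUPLOIDY_OFFSETS[condition])


def notation(condition):
    """Formula such as '2n + 1 + 1', written with ASCII signs."""
    terms = "".join(f" {'+' if k > 0 else '-'} {abs(k)}"
                    for k in ANEUPLOIDY_OFFSETS[condition])
    return "2n" + terms


for condition in ANEUPLOIDY_OFFSETS:
    print(f"{condition:17s} {notation(condition):12s} "
          f"{chromosome_count(condition)} chromosomes in humans")
```

Note that a nullisomic and a double monosomic both have 44 chromosomes in humans; the formulas differ because the first lacks both members of one homologous pair and the second lacks one member of each of two pairs.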
Effects of Aneuploidy

One of the first aneuploids to be recognized was a fruit fly with a single X chromosome and no Y chromosome, which was discovered by Calvin Bridges in 1913 (see p. 000 in Chapter 4). Another early study of aneuploidy focused on mutants in the Jimson weed, Datura stramonium. A. Francis Blakeslee began breeding this plant in 1913, and he observed that crosses with several Jimson mutants produced unusual ratios of progeny. For example, the globe mutant (having a seedcase globular in shape) was dominant but was inherited primarily from the female parent. When globe plants were self-fertilized, only 25% of the progeny had the globe phenotype, an unusual ratio for a dominant trait. Blakeslee isolated 12 different mutants (◗ FIGURE 9.21) that also exhibited peculiar patterns of inheritance. Eventually, John Belling demonstrated that these 12 mutants are in fact trisomics. Datura stramonium has 12 pairs of chromosomes (2n = 24), and each of the 12 mutants is trisomic for a different chromosome pair. The aneuploid nature of the mutants explained the unusual ratios that Blakeslee had observed in the progeny. Many of the extra chromosomes in the trisomics were lost in meiosis, so fewer than 50% of the gametes carried the extra chromosome, and the proportion of trisomics in the progeny was low. Furthermore, the pollen containing an extra chromosome was not as successful in fertilization, and trisomic zygotes were less viable.

Aneuploidy usually alters the phenotype drastically. In most animals and many plants, aneuploid mutations are lethal. Because aneuploidy affects the number of gene copies but not their nucleotide sequences, the effects of aneuploidy are most likely due to abnormal gene dosage. Aneuploidy alters the dosage for some, but not all, genes, disrupting the relative concentrations of gene products and often interfering with normal development.
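The gene-dosage argument can be made concrete with a small sketch. It assumes, purely as a simplification not stated in the text, that the amount of gene product is proportional to the number of gene copies; `relative_dosage` is a hypothetical helper name.

```python
def relative_dosage(copies_of_chromosome, normal_copies=2):
    """Dosage of genes on one chromosome relative to a normal diploid cell.

    Simplifying assumption (ours): gene product scales linearly with copy number.
    """
    return copies_of_chromosome / normal_copies


if __name__ == "__main__":
    for label, copies in [("monosomy", 1), ("diploid", 2),
                          ("trisomy", 3), ("tetrasomy", 4)]:
        print(f"{label:10s} genes on the affected chromosome: "
              f"{relative_dosage(copies):.2f}x; all other genes: 1.00x")
```

Under this assumption, only the genes on the affected chromosome shift in dosage, which is why the relative concentrations of gene products, rather than their absolute amounts, are disturbed.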
A major exception to the relation between gene number and protein dosage pertains to genes on the mammalian X chromosome. In mammals, X-chromosome inactivation ensures that males (who have a single X chromosome) and females (who have two X chromosomes) receive the same functional dosage for X-linked genes (see p. 000 in Chapter 4 for further discussion of X-chromosome inactivation). Extra X chromosomes in mammals are inactivated; so we might expect that aneuploidy of the sex chromosomes would be less detrimental in these animals. Indeed, this is the case for mice and humans, for whom aneuploids of the sex chromosomes are the most common form of aneuploidy seen in living organisms. Y-chromosome aneuploids are probably common because there is so little information on the Y chromosome.

◗ 9.21 Mutant capsules in Jimson weed (Datura stramonium) result from different trisomies. Each type of capsule is a phenotype that is trisomic for a different chromosome.

Concepts

Aneuploidy, the loss or gain of one or more individual chromosomes, may arise from the loss of a chromosome subsequent to translocation or from nondisjunction in meiosis or mitosis. It disrupts gene dosage and often has severe phenotypic effects.

Aneuploidy in Humans

Aneuploidy in humans usually produces serious developmental problems that lead to spontaneous abortion (miscarriage). In fact, as many as 50% of all spontaneously aborted fetuses carry chromosome defects, and a third or more of all conceptions spontaneously abort, usually so early in development that the mother is not even aware of her pregnancy. Only about 2% of all fetuses with a chromosome defect survive to birth.

Sex-chromosome aneuploids

The most common aneuploidy seen in living humans involves the sex chromosomes.
As is true of all mammals, aneuploidy of the human sex chromosomes is better tolerated than aneuploidy of autosomal chromosomes. Turner syndrome and Klinefelter syndrome (see Figures 4.9 and 4.10) both result from aneuploidy of the sex chromosomes.

Autosomal aneuploids

Autosomal aneuploids resulting in live births are less common than sex-chromosome aneuploids in humans, probably because there is no mechanism of dosage compensation for autosomal chromosomes. Most autosomal aneuploids are spontaneously aborted, with the exception of aneuploids of some of the small autosomes. Because these chromosomes are small and carry fewer genes, the presence of extra copies is less detrimental. For example, the most common autosomal aneuploidy in humans is trisomy 21, also called Down syndrome. The number of genes on different human chromosomes is not precisely known at the present time, but DNA sequence data indicate that chromosome 21 has fewer genes than any other autosome, with perhaps fewer than 300 genes of a total of 30,000 to 35,000 for the entire genome.

The incidence of Down syndrome in the United States is about 1 in 700 human births, although the incidence is higher among children born to older mothers. People with Down syndrome (◗ FIGURE 9.22a) show variable degrees of mental retardation, with an average IQ of about 50 (compared with an average IQ of 100 in the general population). Many people with Down syndrome also have characteristic facial features, some retardation of growth and development, and an increased incidence of heart defects, leukemia, and other abnormalities.

Approximately 92% of those who have Down syndrome have three full copies of chromosome 21 (and therefore a total of 47 chromosomes), a condition termed primary Down syndrome (◗ FIGURE 9.22b). Primary Down syndrome usually arises from random nondisjunction in egg formation: about 75% of the nondisjunction events that cause Down syndrome are maternal in origin, and most arise in meiosis I.
Most children with Down syndrome are born to normal parents, and the failure of the chromosomes to divide has little hereditary tendency. A couple who has conceived one child with primary Down syndrome has only a slightly higher risk of conceiving a second child with Down syndrome (compared with other couples of similar age who have not had any Down-syndrome children). Similarly, the couple’s relatives are not more likely to have a child with primary Down syndrome.

◗ 9.22 Primary Down syndrome is caused by the presence of three copies of chromosome 21. (a) A child who has Down syndrome. (b) Karyotype of a person who has primary Down syndrome. (Part a, Hattie Young/Science Photo Library/Photo Researchers; part b, L. Willatt, East Anglian Regional Genetics Service/Science Photo Library/Photo Researchers.)

◗ 9.23 The incidence of primary Down syndrome increases with maternal age. (Graph: number of children afflicted with Down syndrome per thousand births, plotted against maternal age from 20 to 50. Older mothers are more likely to give birth to a child with Down syndrome than are younger mothers. Labeled points, in order of increasing maternal age: one in 2000, one in 900, one in 100, one in 12.)

Most cases of primary Down syndrome arise from maternal nondisjunction, and the frequency of this occurring correlates with maternal age (◗ FIGURE 9.23). Although the underlying cause of the association between maternal age and nondisjunction remains obscure, recent studies have indicated a strong correlation between nondisjunction and aberrant meiotic recombination. Most chromosomes that failed to separate in meiosis I do not show any evidence of having recombined with one another. Conversely, chromosomes that appear to have failed to separate in meiosis II often show evidence of recombination in regions that do not normally recombine, most notably near the centromere.
Although aberrant recombination appears to play a role in nondisjunction, the maternal age effect is more complex. In female mammals, prophase I begins in all oogonia during fetal development, and recombination is completed prior to birth. Meiosis then arrests in diplotene, and the primary oocytes remain suspended until just before ovulation. As each primary oocyte is ovulated, meiosis resumes and the first division is completed, producing a secondary oocyte. At this point, meiosis is suspended again, and remains so until the secondary oocyte is penetrated by a sperm. The second meiotic division takes place immediately before the nuclei of egg and sperm unite to form a zygote.

An explanation of the maternal age effect must take into account the aberrant recombination that occurs prenatally and the long suspension in prophase I. One theory is that the “best” oocytes are ovulated first, leaving those oocytes that had aberrant recombination to be used later in life. However, evidence indicates that the frequency of aberrant recombination is similar between oocytes that are ovulated in young women and those ovulated in older women. Another possible explanation is that aging of the cellular components needed for meiosis results in nondisjunction of chromosomes that are “at risk,” because they have failed to recombine or had some recombination defect. In younger oocytes, these chromosomes can still be segregated from one another, but in older oocytes, they are sensitive to other perturbations in the meiotic machinery. In contrast, sperm are produced continually after puberty, with no long suspension of the meiotic divisions. This fundamental difference between the meiotic process in females and males may explain why most chromosome aneuploidy in humans is maternal in origin.

About 4% of people with Down syndrome have 46 chromosomes, but an extra copy of part of chromosome 21 is attached to another chromosome through a translocation (◗ FIGURE 9.24).
This condition is termed familial Down syndrome because it has a tendency to run in families. The phenotypic characteristics of familial Down syndrome are the same as those for primary Down syndrome. Familial Down syndrome arises in offspring whose parents are carriers of chromosomes that have undergone a Robertsonian translocation, most commonly between chromosome 21 and chromosome 14: the long arm of 21 and the short arm of 14 exchange places. This exchange produces a chromosome that includes the long arms of chromosomes 14 and 21, and a very small chromosome that consists of the short arms of chromosomes 21 and 14. The small chromosome is generally lost after several cell divisions.

◗ 9.24 The translocation of chromosome 21 onto another chromosome results in familial Down syndrome. (Dr. Dorothy Warburton, HICC, Columbia University.)

Persons with the translocation, called translocation carriers, do not have Down syndrome. Although they possess only 45 chromosomes, their phenotypes are normal because they have two copies of the long arms of chromosomes 14 and 21, and apparently the short arms of these chromosomes (which are lost) carry no essential genetic information. Although translocation carriers are completely healthy, they have an increased chance of producing children with Down syndrome.

◗ 9.25 Translocation carriers are at increased risk for producing children with Down syndrome. (Figure annotations: 1. A parent who is a carrier for a 14–21 translocation is normal. 2. Gametogenesis produces gametes in these possible chromosome combinations, shown as segregation patterns (a), (b), and (c). 3. If a normal person mates with a translocation carrier, 4. two-thirds of their live-born offspring will be healthy and normal, even the translocation carriers, 5. but one-third will have Down syndrome. 6. Other chromosomal combinations (monosomy 21, trisomy 14, and monosomy 14) result in aborted embryos.)

When a translocation carrier produces gametes, the translocation chromosome may segregate in three different ways. First, it may separate from the normal chromosomes 14 and 21 in anaphase I of meiosis (◗ FIGURE 9.25a). In this type of segregation, half of the gametes will have the translocation chromosome and no other copies of chromosomes 21 and 14; the fusion of such a gamete with a normal gamete will give rise to a translocation carrier. The other half of the gametes produced by this first type of segregation will be normal, each with a single copy of chromosomes 21 and 14, and will result in normal offspring. Alternatively, the translocation chromosome may separate from chromosome 14 and pass into the same cell with the normal chromosome 21 (◗ FIGURE 9.25b). This type of segregation produces all abnormal gametes; half will have two functional copies of chromosome 21 (one normal and one attached to chromosome 14) and the other half will lack chromosome 21. The gametes with the two functional copies of chromosome 21 will produce children with familial Down syndrome; the gametes lacking chromosome 21 will result in zygotes with monosomy 21 and will be spontaneously aborted. In the third type of segregation, the translocation chromosome and the normal copy of chromosome 14 segregate together, and the normal chromosome 21 segregates by itself (◗ FIGURE 9.25c).
This pattern is presumably rare, because the two centromeres are both derived from chromosome 14 and usually separate from each other. In any case, all the gametes produced by this process are abnormal: half result in monosomy 14 and the other half result in trisomy 14; all are spontaneously aborted.

Thus, only three of the six types of gametes that can be produced by a translocation carrier will result in the birth of a baby and, theoretically, these gametes should arise with equal frequency. One-third of the offspring of a translocation carrier should be translocation carriers like their parent, one-third should have familial Down syndrome, and one-third should be normal. In reality, however, fewer than one-third of the children born to translocation carriers have Down syndrome, which suggests that some of the embryos with Down syndrome are spontaneously aborted.

www.whfreeman.com/pierce Additional information on Down syndrome

Few autosomal aneuploids besides trisomy 21 result in human live births. Trisomy 18, also known as Edwards syndrome, arises with a frequency of approximately 1 in 8000 live births. Babies with Edwards syndrome are severely retarded and have low-set ears, a short neck, deformed feet, clenched fingers, heart problems, and other disabilities. Few live for more than a year after birth. Trisomy 13 has a frequency of about 1 in 15,000 live births and produces features that are collectively known as Patau syndrome. Characteristics of this condition include severe mental retardation, a small head, sloping forehead, small eyes, cleft lip and palate, extra fingers and toes, and numerous other problems. About half of children with trisomy 13 die within the first month of life, and 95% die by the age of 3. Rarer still is trisomy 8, which arises with a frequency of about 1 in 25,000 to 50,000 live births.
This aneuploid is characterized by mental retardation, contracted fingers and toes, low-set malformed ears, and a prominent forehead. Many who have this condition have normal life expectancy.

Concepts

In humans, sex-chromosome aneuploids are more common than are autosomal aneuploids. X-chromosome inactivation prevents problems of gene dosage for X-linked genes. Down syndrome results from three functional copies of chromosome 21, either through trisomy (primary Down syndrome) or a Robertsonian translocation (familial Down syndrome).

www.whfreeman.com/pierce Additional information on trisomy 13 and trisomy 18

Uniparental Disomy

Normally, the two chromosomes of a homologous pair are inherited from different parents: one from the father and one from the mother. The development of molecular techniques that facilitate the identification of specific DNA sequences (see Chapter 18) has made it possible to determine the parental origins of chromosomes. Surprisingly, sometimes both chromosomes are inherited from the same parent, a condition termed uniparental disomy.

Uniparental disomy violates the rule that children affected with a recessive disorder appear only in families where both parents are carriers. For example, cystic fibrosis is an autosomal recessive disease; typically, both parents of an affected child are heterozygous for the cystic fibrosis mutation on chromosome 7. However, a small proportion of people with cystic fibrosis have only a single parent who is heterozygous for the cystic fibrosis gene. How can this be? These people must have inherited from the heterozygous parent two copies of the chromosome 7 that carries the defective cystic fibrosis allele and no copy of the normal allele from the other parent. Uniparental disomy has also been observed in Prader-Willi syndrome, a rare condition that arises when a paternal copy of a gene on chromosome 15 is missing.
Although most cases of Prader-Willi syndrome result from a chromosome deletion that removes the paternal copy of the gene (see p. 000 in Chapter 4), from 20% to 30% arise when both copies of chromosome 15 are inherited from the mother and no copy is inherited from the father. Many cases of uniparental disomy probably originate as a trisomy. Although most autosomal trisomies are lethal, a trisomic embryo can survive if one of the three chromosomes is lost early in development. If, just by chance, the two remaining chromosomes are both from the same parent, uniparental disomy results. www.whfreeman.com/pierce More on uniparental disomy or links to information on Prader-Willi syndrome Mosaicism Nondisjunction in a mitotic division may generate patches of cells in which every cell has a chromosome abnormality and other patches in which every cell has a normal karyotype. This type of nondisjunction leads to regions of tissue with different chromosome constitutions, a condition known as mosaicism. Growing evidence suggests that mosaicism is relatively common. Only about 50% of those diagnosed with Turner syndrome have the 45,X karyotype (presence of a single X chromosome) in all their cells; most others are mosaics, possessing some 45,X cells and some normal 46,XX cells. A few may even be mosaics for two or more types of abnormal karyotypes. The 45,X/46,XX mosaic usually arises when an X chromosome is lost soon after fertilization in an XX embryo. Fruit flies that are XX/XO mosaics (O designates the absence of a homologous chromosome; XO means the cell has a single X chromosome and no Y chromosome) develop a mixture of male and female traits, because the presence of two X chromosomes in fruit flies produces female traits and the presence of a single X chromosome produces male traits ( ◗ FIGURE 9.26). 
Sex determination in fruit flies occurs independently in each cell during development. Those cells that are XX express female traits; those that have a single X chromosome (XO, like XY) express male traits. Such sexual mosaics are called gynandromorphs. Normally, X-linked recessive genes are masked in heterozygous females but, in XX/XO mosaics, any X-linked recessive genes present in the cells with a single X chromosome will be expressed.

◗ 9.26 Mosaicism for the sex chromosomes produces a gynandromorph. This XX/XO gynandromorph fruit fly carries one wild-type X chromosome and one X chromosome with recessive alleles for white eyes and miniature wings. The left side of the fly has a normal female phenotype (red eye, wild-type wing), because the cells are XX and the recessive alleles on one X chromosome are masked by the presence of wild-type alleles on the other. The right side of the fly has a male phenotype with white eye and miniature wing, because the cells are missing the wild-type X chromosome (are XO), allowing the white and miniature alleles to be expressed.

Concepts

In uniparental disomy, an individual has two copies of a chromosome from one parent and no copy from the other. It may arise when a trisomic embryo loses one of the triplicate chromosomes early in development. In mosaicism, different cells within the same individual have different chromosome constitutions.

Polyploidy

Most eukaryotic organisms are diploid (2n) for most of their life cycles, possessing two sets of chromosomes. Occasionally, whole sets of chromosomes fail to separate in meiosis or mitosis, leading to polyploidy, the presence of more than two genomic sets of chromosomes. Polyploids include triploids (3n), tetraploids (4n), pentaploids (5n), and even higher numbers of chromosome sets.
Polyploidy is common in plants and is a major mechanism by which new plant species have evolved. Approximately 40% of all flowering-plant species and from 70% to 80% of grasses are polyploids. They include a number of agriculturally important plants, such as wheat, oats, cotton, potatoes, and sugar cane. Polyploidy is less common in animals, but is found in some invertebrates, fishes, salamanders, frogs, and lizards. No naturally occurring, viable polyploids are known in birds, but at least one polyploid mammal (a rat from Argentina) has been reported.

We will consider two major types of polyploidy: autopolyploidy, in which all chromosome sets are from a single species; and allopolyploidy, in which chromosome sets are from two or more species.

Autopolyploidy

Autopolyploidy results when accidents of meiosis or mitosis produce extra sets of chromosomes, all derived from a single species. Nondisjunction of all chromosomes in mitosis in an early 2n embryo, for example, doubles the chromosome number and produces an autotetraploid (4n) (◗ FIGURE 9.27a). An autotriploid may arise when nondisjunction in meiosis produces a diploid gamete that then fuses with a normal haploid gamete to produce a triploid zygote (◗ FIGURE 9.27b). Alternatively, triploids may arise from a cross between an autotetraploid that produces 2n gametes and a diploid that produces 1n gametes.

◗ 9.27 Autopolyploidy can arise through nondisjunction in mitosis or meiosis. (a) Autopolyploidy through mitosis: nondisjunction (no cell division) in an early mitotic division of a diploid (2n) early embryonic cell results in an autotetraploid (4n) cell. (b) Autopolyploidy through meiosis: nondisjunction in meiosis produces a 2n gamete that then fuses with a 1n gamete to produce an autotriploid (3n).
◗ 9.28 In meiosis of an autotriploid, homologous chromosomes can pair or not pair in three ways. (a) Two homologous chromosomes pair while the other segregates randomly; two gametes therefore receive two copies of the chromosome and the other two receive one copy. (b) All three chromosomes pair and segregate randomly; two gametes receive one copy of the chromosome and the other two receive two copies. (c) None of the chromosomes pair and all three segregate randomly; all three may segregate together, so two gametes receive three copies of the chromosome and the other two receive no copies.

Because all the chromosome sets in autopolyploids are from the same species, they are homologous and attempt to align in prophase I of meiosis, which usually results in sterility. Consider meiosis in an autotriploid (◗ FIGURE 9.28). In meiosis in a diploid cell, two chromosome homologs pair and align, but, in autotriploids, three homologs are present. One of the three homologs may fail to align with the other two, and this unaligned chromosome will segregate randomly (see Figure 9.28a). Which gamete gets the extra chromosome will be determined by chance and will differ for each homologous group of chromosomes. The resulting gametes will have two copies of some chromosomes and one copy of others. Even if all three chromosomes do align, two chromosomes must segregate to one gamete and one chromosome to the other (see Figure 9.28b). Occasionally, the presence of a third chromosome interferes with normal alignment, and all three chromosomes segregate to the same gamete (see Figure 9.28c).
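How rarely random segregation yields a balanced gamete can be estimated with a rough sketch. It rests on assumptions that go beyond the text: each group of three homologs segregates independently, a gamete receives one or two copies with equal probability (patterns a and b), and the rarer pattern c is ignored. A gamete counts as balanced only if it receives the same number of copies of every chromosome. All function names are hypothetical.

```python
import random


def balanced_gamete_probability(n_groups):
    """Chance that every homologous group contributes the same copy number.

    Assumes each group independently gives 1 or 2 copies with probability 1/2,
    so a gamete is all-1 or all-2 with probability 2 * (1/2) ** n_groups.
    """
    return 2 * 0.5 ** n_groups


def simulate_gamete(n_groups, rng):
    """Draw the copy number (1 or 2) received for each homologous group."""
    return [rng.choice((1, 2)) for _ in range(n_groups)]


def is_balanced(gamete):
    """A gamete is balanced if all chromosomes have the same copy number."""
    return len(set(gamete)) == 1


def estimate(n_groups, trials=100_000, seed=0):
    """Monte Carlo estimate of the balanced-gamete frequency."""
    rng = random.Random(seed)
    hits = sum(is_balanced(simulate_gamete(n_groups, rng)) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    # A triploid with 3n = 33 (as in commercial bananas) has 11 groups.
    print(balanced_gamete_probability(11))
    print(estimate(4), balanced_gamete_probability(4))
```

Under these assumptions, a triploid with 11 homologous groups would produce a balanced gamete only about once in a thousand meioses, which is consistent with the near-complete sterility described for autotriploids.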
No matter how the three homologous chromosomes align, their random segregation will create unbalanced gametes, with various numbers of chromosomes. A gamete produced by meiosis in such an autotriploid might receive, say, two copies of chromosome 1, one copy of chromosome 2, three copies of chromosome 3, and no copies of chromosome 4. When the unbalanced gamete fuses with a normal gamete (or with another unbalanced gamete), the resulting zygote has different numbers of the four types of chromosomes. This difference in number creates unbalanced gene dosage in the zygote, which is often lethal. For this reason, triploids do not usually produce viable offspring.

In even-numbered autopolyploids, such as autotetraploids, it is theoretically possible for the homologous chromosomes to form pairs and divide equally. However, this event rarely happens; so these types of autotetraploids also produce unbalanced gametes.

The sterility that usually accompanies autopolyploidy has been exploited in agriculture. Wild diploid bananas (2n = 22), for example, produce seeds that are hard and inedible, but triploid bananas (3n = 33) are sterile and produce no seeds; they are the bananas sold commercially. Similarly, seedless triploid watermelons have been created and are now widely sold.

Allopolyploidy

Allopolyploidy arises from hybridization between two species; the resulting polyploid carries chromosome sets derived from two or more species. ◗ FIGURE 9.29 shows how allopolyploidy can arise from two species that are sufficiently related that hybridization occurs between them. Species I (AABBCC, 2n = 6) produces haploid gametes with chromosomes ABC, and species II (GGHHII, 2n = 6) produces gametes with chromosomes GHI. If gametes from species I and II fuse, a hybrid with six chromosomes (ABCGHI) is created.
The hybrid has the same chromosome number as that of both diploid species; so the hybrid is considered diploid. However, because the hybrid chromosomes are not homologous, they will not pair and segregate properly in meiosis; so this hybrid is functionally haploid and sterile.

The sterile hybrid is unable to produce viable gametes through meiosis, but it may be able to perpetuate itself through mitosis (asexual reproduction). On rare occasions, nondisjunction takes place in a mitotic division, which leads to a doubling of chromosome number and an allotetraploid, with chromosomes AABBCCGGHHII. This tetraploid is functionally diploid: every chromosome has one and only one homologous partner, which is exactly what meiosis requires for proper segregation. The allopolyploid can now undergo normal meiosis to produce balanced gametes having six chromosomes.

◗ 9.29 Allopolyploids usually arise from hybridization between two species followed by chromosome doubling.

George Karpechenko created polyploids experimentally in the 1920s. Today, as well as in the early twentieth century, cabbage (Brassica oleracea, 2n = 18) and radishes (Raphanus sativus, 2n = 18) are agriculturally important plants, but only the leaves of the cabbage and the roots of the radish are normally consumed. Karpechenko wanted to produce a plant that had cabbage leaves and radish roots so that no part of the plant would go to waste. Because both cabbage and radish possess 18 chromosomes, Karpechenko was able to successfully cross them, producing a hybrid with 2n = 18, but, unfortunately, the hybrid was sterile. After several crosses, Karpechenko noticed that one of his hybrid plants produced a few seeds. When planted, these seeds grew into plants that were viable and fertile. Analysis of their chromosomes revealed that the plants were allotetraploids, with 2n = 36 chromosomes. To Karpechenko’s great disappointment, however, the new plants possessed the roots of a cabbage and the leaves of a radish.

The Significance of Polyploidy

In many organisms, cell volume is correlated with nuclear volume, which, in turn, is determined by genome size. Thus, the increase in chromosome number in polyploidy is often associated with an increase in cell size, and many polyploids are physically larger than diploids. Breeders have used this effect to produce plants with larger leaves, flowers, fruits, and seeds. The hexaploid (6n = 42) genome of wheat probably contains chromosomes derived from three different wild species (◗ FIGURE 9.30). Many other cultivated plants also are polyploid (Table 9.2).

Polyploidy is less common in animals than in plants for several reasons. As discussed, allopolyploids require hybridization between different species, which occurs less frequently in animals than in plants. Animal behavior often prevents interbreeding, and the complexity of animal development causes most interspecific hybrids to be nonviable. Many of the polyploid animals that do arise are in groups that reproduce through parthenogenesis (a type of reproduction in which individuals develop from unfertilized eggs). Thus asexual reproduction may facilitate the development of polyploids, perhaps because the perpetuation of hybrids through asexual reproduction provides greater opportunities for nondisjunction. Only a few human polyploid babies have been reported, and most died within a few days of birth. Polyploidy, usually triploidy, is seen in about 10% of all spontaneously aborted human fetuses. Different types of chromosome mutations are summarized in Table 9.3.
Concepts

Polyploidy is the presence of extra chromosome sets: autopolyploids possess extra chromosome sets from the same species; allopolyploids possess extra chromosome sets from two or more species. Problems in chromosome pairing and segregation often lead to sterility in autopolyploids, but many allopolyploids are fertile.

◗ 9.30 Modern bread wheat, Triticum aestivum, is a hexaploid with genes derived from three different species. Two diploid species, T. monococcum (2n = 14) and probably T. searsii (2n = 14), originally crossed to produce a diploid hybrid (2n = 14) that underwent chromosome doubling to create T. turgidum (4n = 28). A cross between T. turgidum and T. tauschii (2n = 14) produced a triploid hybrid (3n = 21) that then underwent chromosome doubling to produce T. aestivum, which is a hexaploid (6n = 42).

Table 9.2 Examples of polyploid crop plants

Plant          Type of Polyploidy   Ploidy   Chromosome Number
Potato         Autopolyploid        4n       48
Banana         Autopolyploid        3n       33
Peanut         Autopolyploid        4n       40
Sweet potato   Autopolyploid        6n       90
Tobacco        Allopolyploid        4n       48
Cotton         Allopolyploid        4n       52
Wheat          Allopolyploid        6n       42
Oats           Allopolyploid        6n       42
Sugar cane     Allopolyploid        8n       80
Strawberry     Allopolyploid        8n       56

Source: After F. C. Elliot, Plant Breeding and Cytogenetics (New York: McGraw-Hill, 1958), p. ***.
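The chromosome arithmetic behind the origin of bread wheat (Figure 9.30) can be traced with a short sketch. It assumes that each diploid parent has 14 chromosomes, that every parent contributes half of its somatic chromosome number to a hybrid, and that chromosome doubling exactly doubles the count. The function names `hybridize`, `double`, and `wheat_lineage` are hypothetical.

```python
def hybridize(parent_a, parent_b):
    """Chromosome number of a hybrid; each parent contributes half its count."""
    if parent_a % 2 or parent_b % 2:
        raise ValueError("parents must have an even chromosome number")
    return parent_a // 2 + parent_b // 2


def double(count):
    """Chromosome doubling, for example through mitotic nondisjunction."""
    return 2 * count


def wheat_lineage(diploid_count=14):
    """Return the chromosome numbers at each step of the proposed lineage."""
    first_hybrid = hybridize(diploid_count, diploid_count)  # sterile hybrid
    tetraploid = double(first_hybrid)                       # T. turgidum
    second_hybrid = hybridize(tetraploid, diploid_count)    # triploid hybrid
    hexaploid = double(second_hybrid)                       # T. aestivum
    return first_hybrid, tetraploid, second_hybrid, hexaploid


if __name__ == "__main__":
    print(wheat_lineage())
```

The same two operations, hybridization followed by doubling, also describe Karpechenko's cabbage-radish cross: two parents with 18 chromosomes give a hybrid with 18, and doubling gives 36.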
Table 9.3 Different types of chromosome mutations

Chromosome Mutation           Definition
Chromosome rearrangement      Change in chromosome structure
Chromosome duplication        Duplication of a chromosome segment
Chromosome deletion           Deletion of a chromosome segment
Inversion                     Chromosome segment inverted 180 degrees
Paracentric inversion         Inversion that does not include the centromere in the inverted region
Pericentric inversion         Inversion that includes the centromere in the inverted region
Translocation                 Movement of a chromosome segment to a nonhomologous chromosome or region of the same chromosome
Nonreciprocal translocation   Movement of a chromosome segment to a nonhomologous chromosome or region of the same chromosome without reciprocal exchange
Reciprocal translocation      Exchange between segments of nonhomologous chromosomes or regions of the same chromosome
Aneuploidy                    Change in number of individual chromosomes
Nullisomy                     Loss of both members of a homologous pair
Monosomy                      Loss of one member of a homologous pair
Trisomy                       Gain of one chromosome, resulting in three homologous chromosomes
Tetrasomy                     Gain of two homologous chromosomes, resulting in four homologous chromosomes
Polyploidy                    Addition of entire chromosome sets
Autopolyploidy                Polyploidy in which extra chromosome sets are derived from the same species
Allopolyploidy                Polyploidy in which extra chromosome sets are derived from two or more species

Chromosome Mutations and Cancer

Most tumors contain cells with chromosome mutations. For many years, geneticists argued about whether these chromosome mutations were the cause or the result of cancer. Some types of tumors are consistently associated with specific chromosome mutations, suggesting that in these cases the specific chromosome mutation played a pivotal role in the development of the cancer.
However, many cancers are not associated with specific types of chromosome abnormalities, and individual gene mutations are now known to contribute to many types of cancer. Nevertheless, chromosome instability is a general feature of cancer cells, causing them to accumulate chromosome mutations, which then affect individual genes that contribute to the cancer process. Thus, chromosome mutations appear to both cause and be a result of cancer.

◗ 9.31 A reciprocal translocation between chromosomes 9 and 22 causes chronic myeloid leukemia. (Figure labels: BCR, c-ABL, chromosomes 9 and 22, reciprocal translocation, Philadelphia chromosome.)

◗ 9.32 A reciprocal translocation between chromosomes 8 and 14 causes Burkitt lymphoma. (Figure labels: c-MYC, immunoglobulin gene, chromosomes 8 and 14, translocation.)

At least three types of chromosome rearrangements (deletions, inversions, and translocations) are associated with certain types of cancer. Deletions may result in the loss of one or more genes that normally hold cell division in check. When these so-called tumor-suppressor genes are lost, cell division is not regulated and cancer may result. Inversions and translocations contribute to cancer in several ways. First, the chromosomal breakpoints that accompany these mutations may lie within tumor-suppressor genes, disrupting their function and leading to cell proliferation. Second, translocations and inversions may bring together sequences from two different genes, generating a fused protein that stimulates some aspect of the cancer process. Such fusions are seen in most cases of chronic myeloid leukemia, a fatal form of leukemia affecting bone-marrow cells. About 90% of patients with chronic myeloid leukemia have a reciprocal translocation between the long arm of chromosome 22 and the tip of the long arm of chromosome 9 ( ◗ FIGURE 9.31). This translocation produces a
shortened chromosome 22, called the Philadelphia chromosome because it was first discovered in Philadelphia. At the end of a normal chromosome 9 is a potential cancer-causing gene called c-ABL. As a result of the translocation, part of the c-ABL gene is fused with the BCR gene from chromosome 22. The protein produced by this BCR–c-ABL fusion gene is much more active than the protein produced by the normal c-ABL gene; the fusion protein stimulates increased, unregulated cell division and eventually leads to leukemia.

A third mechanism by which chromosome rearrangements may produce cancer is by the transfer of a potential cancer-causing gene to a new location, where it is activated by different regulatory sequences. Burkitt lymphoma is a cancer of the B cells, the lymphocytes that produce antibodies. Many people having Burkitt lymphoma possess a reciprocal translocation between chromosome 8 and chromosome 2, 14, or 22, each of which carries genes for immunological proteins ( ◗ FIGURE 9.32). This translocation relocates a gene called c-MYC from the tip of chromosome 8 to a position in one of the aforementioned chromosomes that is next to a gene for one of the immunoglobulin proteins. At this new location, c-MYC comes under the control of regulatory sequences that normally activate the production of immunoglobulins, and c-MYC is expressed in B cells. The c-MYC protein stimulates the division of the B cells and leads to Burkitt lymphoma.

Concepts
Most tumors contain a variety of types of chromosome mutations. Some tumors are associated with specific deletions, inversions, and translocations.
Deletions can eliminate or inactivate genes that control the cell cycle; inversions and translocations can cause breaks in genes that suppress tumors, fuse genes to produce cancer-causing proteins, or move genes to new locations, where they are under the influence of different regulatory sequences.

www.whfreeman.com/pierce More information on chronic myeloid leukemia

Connecting Concepts Across Chapters

This chapter has focused on variations in the number and structure of chromosomes. Because these chromosome mutations affect many genes simultaneously, they have major effects on the phenotypes and often are not compatible with development. A major theme of this chapter has been that, even when the structure of a gene is not disrupted, changes in gene number and position produced by chromosome mutations can have severe effects on gene expression.

Chromosome mutations most frequently arise through errors in mitosis and meiosis, and so a thorough understanding of these processes and chromosome structure (covered in Chapter 2) is essential for grasping the material in this chapter. The process of crossing over, discussed in Chapter 7, also is helpful for understanding the consequences of recombination in individuals heterozygous for chromosome rearrangements.

This chapter has provided a foundation for understanding the molecular nature of chromosome structure (discussed in Chapter 11). The movement of genes through a process called transposition often produces chromosome mutations, and so the current chapter is also relevant to the discussion of transposition in Chapter 11. The discussion in this chapter of chromosomes and cancer is closely linked to the more extended discussion of cancer genetics found in Chapter 21. Variation produced by chromosome mutations, along with gene mutations and recombination, provides the raw material for evolutionary change, which is covered in Chapters 22 and 23.
CONCEPTS SUMMARY • Three basic types of chromosome mutations are: (1) chromosome rearrangements, which are changes in the structure of chromosomes; (2) aneuploidy, which is an increase or decrease in chromosome number; and (3) polyploidy, which is the presence of extra chromosome sets. • Chromosome rearrangements include duplications, deletions, inversions, and translocations. • Chromosome duplications arise when a chromosome segment is doubled. The segment may be adjacent to the original segment (a tandem duplication) or distant from the original segment (a displaced duplication). Reverse duplications have the duplicated sequence in the reverse order. In individuals heterozygous for a duplication, the duplicated region will form a loop when homologous chromosomes pair in meiosis. Duplications often have pronounced effects on the phenotype owing to unbalanced gene dosage. • Chromosome deletion is the loss of part of a chromosome. In individuals heterozygous for a deletion, one of the chromosomes will loop out during pairing in meiosis. Many chromosome deletions are lethal in the homozygous state and cause deleterious effects in the heterozygous state, because of unbalanced gene dosage. Deletions may cause recessive alleles to be expressed. • A chromosome inversion is the inversion of a chromosome segment. Pericentric inversions include the centromere; paracentric inversions do not. The phenotypic effects caused by inversions are due to the breaking of genes and their movement to new locations, where they may be influenced by different regulatory sequences. In individuals heterozygous for an inversion, the chromosomes form inversion loops in meiosis, with reduced recombination taking place within the inverted region. • A translocation is the attachment of part of one chromosome to a nonhomologous chromosome. In translocation heterozygotes, the chromosomes form crosslike structures in meiosis, and the segregation of chromosomes produces unbalanced gametes. 
• Fragile sites are constrictions or gaps that appear at particular regions on the chromosomes of cells grown in culture and are prone to breakage under certain conditions. • Aneuploidy is the addition or loss of individual chromosomes. Nullisomy refers to the loss of two homologous chromosomes; monosomy is the loss of one homologous chromosome; trisomy is the addition of one homologous chromosome; tetrasomy is the addition of two homologous chromosomes. • Aneuploidy usually causes drastic phenotypic effects because it leads to unbalanced gene dosage. In humans, sex-chromosome aneuploids are less detrimental than autosomal aneuploids because X-chromosome inactivation reduces the problems of unbalanced gene dosage. • The most common autosomal aneuploid in living humans is trisomy 21, which results in Down syndrome. Primary Down syndrome is caused by the presence of three full copies of chromosome 21, whereas familial Down syndrome is caused by the presence of two normal copies of chromosome 21 and a third copy that is attached to another chromosome through a translocation. • Uniparental disomy is the presence of two copies of a chromosome from one parent and no copy from the other. • Mosaicism is caused by nondisjunction in an early mitotic division that leads to different chromosome constitutions in different cells of a single individual. • Polyploidy is the presence of more than two full chromosome sets. In autopolyploidy, all the chromosomes derive from one species; in allopolyploidy, they come from two or more species. • Autopolyploidy arises from nondisjunction in meiosis or mitosis. Here, problems with chromosome alignment and segregation frequently lead to the production of nonviable gametes. • Allopolyploidy arises from nondisjunction that follows hybridization between two species.
Allopolyploids are frequently fertile. • Some types of cancer are associated with specific chromosome deletions, inversions, and translocations. Deletions may cause cancer by removing or disrupting genes that suppress tumors; inversions and translocations may break tumor-suppressing genes or they may move genes to positions next to different regulatory sequences, which alters their expression.

IMPORTANT TERMS

chromosome mutation (p. 000) metacentric chromosome (p. 000) submetacentric chromosome (p. 000) acrocentric chromosome (p. 000) telocentric chromosome (p. 000) chromosome rearrangement (p. 000) aneuploidy (p. 000) polyploidy (p. 000) chromosome duplication (p. 000) tandem duplication (p. 000) displaced duplication (p. 000) reverse duplication (p. 000) chromosome deletion (p. 000) pseudodominance (p. 000) haploinsufficient gene (p. 000) chromosome inversion (p. 000) paracentric inversion (p. 000) pericentric inversion (p. 000) position effect (p. 000) dicentric chromatid (p. 000) acentric chromatid (p. 000) dicentric bridge (p. 000) translocation (p. 000) nonreciprocal translocation (p. 000) reciprocal translocation (p. 000) Robertsonian translocation (p. 000) alternate segregation (p. 000) adjacent-1 segregation (p. 000) adjacent-2 segregation (p. 000) fragile site (p. 000) nullisomy (p. 000) monosomy (p. 000) trisomy (p. 000) tetrasomy (p. 000) Down syndrome (trisomy 21) (p. 000) primary Down syndrome (p. 000) familial Down syndrome (p. 000) translocation carrier (p. 000) Edward syndrome (trisomy 18) (p. 000) Patau syndrome (trisomy 13) (p. 000) trisomy 8 (p. 000) uniparental disomy (p. 000) mosaicism (p. 000) gynandromorph (p. 000) autopolyploidy (p. 000) allopolyploidy (p. 000) unbalanced gametes (p. 000)

Worked Problems

1. A chromosome has the following segments, where • represents the centromere:

A B C D E • F G

What types of chromosome mutations are required to change this chromosome into each of the following chromosomes?
(In some cases, more than one chromosome mutation may be required.)
(a) A B E • F G
(b) A E D C B • F G
(c) A B A B C D E • F G
(d) A F • E D C B G
(e) A B C D E E D C • F G

Solution
The types of chromosome mutations are identified by comparing the mutated chromosome with the original, wild-type chromosome.
(a) The mutated chromosome (A B E • F G) is missing segment C D; so this mutation is a deletion.
(b) The mutated chromosome (A E D C B • F G) has one and only one copy of all the gene segments, but segment B C D E has been inverted 180 degrees. Because the centromere has not changed location and is not in the inverted region, this chromosome mutation is a paracentric inversion.
(c) The mutated chromosome (A B A B C D E • F G) is longer than normal, and we see that segment A B has been duplicated. This mutation is a tandem duplication.
(d) The mutated chromosome (A F • E D C B G) is normal length, but the gene order and the location of the centromere have changed; this mutation is therefore a pericentric inversion of region (B C D E • F).
(e) The mutated chromosome (A B C D E E D C • F G) contains a duplication (C D E) that is also inverted; so this chromosome has undergone a duplication and a paracentric inversion.

2. Species I is diploid (2n = 4) with chromosomes AABB; related species II is diploid (2n = 6) with chromosomes MMNNOO. Give the chromosomes that would be found in individuals with the following chromosome mutations.
(a) Autotriploid for species I
(b) Allotetraploid including species I and II
(c) Monosomic in species I
(d) Trisomic in species II for chromosome M
(e) Tetrasomic in species I for chromosome A
(f) Allotriploid including species I and II
(g) Nullisomic in species II for chromosome N

Solution
To work this problem, we should first determine the haploid genome complement for each species.
For species I, n = 2 with chromosomes AB and, for species II, n = 3 with chromosomes MNO.
(a) An autotriploid is 3n, with all the chromosomes coming from a single species; so an autotriploid of species I will have chromosomes AAABBB (3n = 6).
(b) An allotetraploid is 4n, with the chromosomes coming from more than one species. An allotetraploid could consist of 2n from species I and 2n from species II, giving the allotetraploid (4n = 2 + 2 + 3 + 3 = 10) chromosomes AABBMMNNOO. An allotetraploid could also possess 3n from species I and 1n from species II (4n = 2 + 2 + 2 + 3 = 9; AAABBBMNO) or 1n from species I and 3n from species II (4n = 2 + 3 + 3 + 3 = 11; ABMMMNNNOOO).
(c) A monosomic is missing a single chromosome; so a monosomic for species I would be 2n − 1 = 4 − 1 = 3. The monosomy might include either of the two chromosome pairs, with chromosomes ABB or AAB.
(d) Trisomy requires an extra chromosome; so a trisomic of species II for chromosome M would be 2n + 1 = 6 + 1 = 7 (MMMNNOO).
(e) A tetrasomic has two extra homologous chromosomes; so a tetrasomic of species I for chromosome A would be 2n + 2 = 4 + 2 = 6 (AAAABB).
(f) An allotriploid is 3n with the chromosomes coming from two different species; so an allotriploid could be 3n = 2 + 2 + 3 = 7 (AABBMNO) or 3n = 2 + 3 + 3 = 8 (ABMMNNOO).
(g) A nullisomic is missing both chromosomes of a homologous pair; so a nullisomic of species II for chromosome N would be 2n − 2 = 6 − 2 = 4 (MMOO).

The New Genetics
MINING GENOMES
THE HUMAN GENOME PROJECT

The first successful efforts to clone single genes and to determine the sequence of DNA molecules began in the 1970s. Less than two decades later, techniques and strategies were being developed to organize and sequence clones that covered the entire human genome. Today, the sequence of our genome is freshly available. This exercise will introduce you to some of the tools that are used to organize, retrieve, and understand information derived from the Human Genome Project.

COMPREHENSION QUESTIONS
*1.
List the different types of chromosome mutations and define each one.
*2. Why do extra copies of genes sometimes cause drastic phenotypic effects?
3. Draw a pair of chromosomes as they would appear during synapsis in prophase I of meiosis in an individual heterozygous for a chromosome duplication.
4. How does a deletion cause pseudodominance?
*5. What is the difference between a paracentric and a pericentric inversion?
6. How do inversions cause phenotypic effects?
*7. Draw a pair of chromosomes as they would appear during synapsis in prophase I of meiosis in an individual heterozygous for a paracentric inversion.
8. Explain why recombination is suppressed in individuals heterozygous for paracentric and pericentric inversions.
*9. How do translocations produce phenotypic effects?
10. Sketch the chromosome pairing and the different segregation patterns that can arise in an individual heterozygous for a reciprocal translocation.
11. What is a Robertsonian translocation?
12. List four major types of aneuploidy.
*13. Why are sex-chromosome aneuploids more common in humans than autosomal aneuploids?
*14. What is the difference between primary Down syndrome and familial Down syndrome? How does each arise?
*15. What is uniparental disomy and how does it arise?
16. What is mosaicism and how does it arise?
*17. What is the difference between autopolyploidy and allopolyploidy? How does each arise?

APPLICATION QUESTIONS AND PROBLEMS
*18. Which types of chromosome mutations
(a) increase the amount of genetic material on a particular chromosome?
(b) increase the amount of genetic material for all chromosomes?
(c) decrease the amount of genetic material on a particular chromosome?
(d) change the position of DNA sequences on a single chromosome without changing the amount of genetic material?
(e) move DNA from one chromosome to a nonhomologous chromosome?
*19. A chromosome has the following segments, where • represents the centromere:
A B • C D E F G
What types of chromosome mutations are required to change this chromosome into each of the following chromosomes? (In some cases, more than one chromosome mutation may be required.)
(a) A B A B • C D E F G
(b) A B • C D E A B F G
(c) A B • C F E D G
(d) A • C D E F G
(e) A B • C D E
(f) A B • E D C F G
(g) C • B A D E F G
(h) A B • C F E D F E D G
(i) A B • C D E F C D F E G
*20. A chromosome initially has the following segments:
A B • C D E F G
Draw and label the chromosome that would result from each of the following mutations.
(a) Tandem duplication of DEF
(b) Displaced duplication of DEF
(c) Deletion of FG
(e) Paracentric inversion that includes DEFG
(f) Pericentric inversion of BCDE
21. The following diagram represents two nonhomologous chromosomes:
A B • C D E F G
R S • T U V W X
What type of chromosome mutation would produce the following chromosomes?
(a) A B • C D
    R S • T U V W X E F G
(b) A U V B • C D E F G
    R S • T W X
(c) A B • T U V F G
    R S • C D E W X
(d) A B • C W G
    R S • T U V D E F X
*22. A species has 2n = 16 chromosomes. How many chromosomes will be found per cell in each of the following mutants in this species?
(a) Monosomic
(b) Autotriploid
(c) Autotetraploid
(d) Trisomic
(e) Double monosomic
(f) Nullisomic
(g) Autopentaploid
(h) Tetrasomic
*23. The Notch mutation is a deletion on the X chromosome of Drosophila melanogaster. Female flies heterozygous for Notch have an indentation on the margin of their wings; Notch is lethal in the homozygous and hemizygous conditions. The Notch deletion covers the region of the X chromosome that contains the locus for white eyes, an X-linked recessive trait.
Give the phenotypes and proportions of progeny produced in the following crosses.
(a) A red-eyed, Notch female is mated with a white-eyed male.
(b) A white-eyed, Notch female is mated with a red-eyed male.
(c) A white-eyed, Notch female is mated with a white-eyed male.
24. The green nose fly normally has six chromosomes, two metacentric and four acrocentric. A geneticist examines the chromosomes of an odd-looking green nose fly and discovers that it has only five chromosomes; three of them are metacentric and two are acrocentric. Explain how this change in chromosome number might have occurred.
25. Species I is diploid (2n = 8) with chromosomes AABBCCDD; related species II is diploid (2n = 8) with chromosomes MMNNOOPP. Individuals with the following sets of chromosomes represent what types of chromosome mutations?
(a) AAABBCCDD
(b) MMNNOOOOPP
(c) AABBCDD
(d) AAABBBCCCDDD
(e) AAABBCCDDD
(f) AABBDD
(g) AABBCCDDMMNNOOPP
(h) AABBCCDDMNOP
*26. A wild-type chromosome has the following segments:
A B C • D E F G H I
An individual is heterozygous for the following chromosome mutations. For each mutation, sketch how the wild-type and mutated chromosomes would pair in prophase I of meiosis, showing all chromosome strands.
(a) A B C • D E F D E F G H I
(b) A B C • D H I
(c) A B C • D G F E H I
(d) A B E D • C F G H I
27. An individual that is heterozygous for a pericentric inversion has the following two chromosomes:
A B C D • E F G H I
A B C F E • D G H I
(a) Sketch the pairing of these two chromosomes in prophase I of meiosis, showing all four strands.
(b) Draw the chromatids that would result from a single crossover between the E and F segments.
(c) What will happen when the chromosomes separate in anaphase I of meiosis?
28. Answer part b of problem 27 for a two-strand double crossover between E and F.
*29. An individual heterozygous for a reciprocal translocation possesses the following chromosomes:
A B • C D E F G
A B • C D V W X
R S • T U E F G
R S • T U V W X
(a) Draw the pairing arrangement of these chromosomes in prophase I of meiosis.
(b) Diagram the alternate, adjacent-1, and adjacent-2 segregation patterns in anaphase I of meiosis.
(c) Give the products that result from alternate, adjacent-1, and adjacent-2 segregation.
30. Red–green color blindness is a human X-linked recessive disorder. A young man with a 47,XXY karyotype (Klinefelter syndrome) is color-blind. His 46,XY brother also is color-blind. Both parents have normal color vision. Where did the nondisjunction occur that gave rise to the young man with Klinefelter syndrome?
31. Some people with Turner syndrome are 45,X/46,XY mosaics. Explain how this mosaicism could arise.
*32. Bill and Betty have had two children with Down syndrome. Bill’s brother has Down syndrome and his sister has two children with Down syndrome. On the basis of these observations, which of the following statements is most likely correct? Explain your reasoning.
(a) Bill has 47 chromosomes.
(b) Betty has 47 chromosomes.
(c) Bill and Betty’s children each have 47 chromosomes.
(d) Bill’s sister has 45 chromosomes.
(e) Bill has 46 chromosomes.
(f) Betty has 45 chromosomes.
(g) Bill’s brother has 45 chromosomes.
*33. Tay-Sachs disease is an autosomal recessive disease that causes blindness, deafness, brain enlargement, and premature death in children. It is possible to identify carriers for Tay-Sachs disease by means of a blood test. Mike and Sue have both been tested for the Tay-Sachs gene; Mike is a heterozygous carrier for Tay-Sachs, but Sue is homozygous for the normal allele. Mike and Sue’s baby boy is completely normal at birth, but at age 2 develops Tay-Sachs disease. Assuming that a new mutation has not occurred, how could Mike and Sue’s baby have inherited Tay-Sachs disease?
34.
In mammals, sex-chromosome aneuploids are more common than autosomal aneuploids but, in fishes, sex-chromosome aneuploids and autosomal aneuploids are found with equal frequency. Offer an explanation for these differences in mammals and fishes.
*35. A young couple is planning to have children. Knowing that there have been a substantial number of stillbirths, miscarriages, and fertility problems on the husband’s side of the family, they see a genetic counselor. A chromosome analysis reveals that, whereas the woman has a normal karyotype, the man possesses only 45 chromosomes and is a carrier for a Robertsonian translocation between chromosomes 22 and 13.
(a) List all the different types of gametes that might be produced by the man.
(b) What types of zygotes will develop when each of the gametes produced by the man fuses with a normal gamete from the woman?
(c) If trisomies and monosomies entailing chromosomes 13 and 22 are lethal, what proportion of the surviving offspring will be carriers of the translocation?

CHALLENGE QUESTION
36. Red–green color blindness is a human X-linked recessive disorder. Jill has normal color vision, but her father is color-blind. Jill marries Tom, who also has normal color vision. Jill and Tom have a daughter who has Turner syndrome and is color-blind.
(a) How did the daughter inherit color blindness?
(b) Did the daughter inherit her X chromosome from Jill or from Tom?

SUGGESTED READINGS
Boue, A. 1985. Cytogenetics of pregnancy wastage. Advances in Human Genetics 14:1–58.
A study showing that many human spontaneously aborted fetuses contain chromosome mutations.
Brewer, C., S. Holloway, P. Zawalnyski, A. Schinzel, and D. FitzPatrick. 1998. A chromosomal deletion map of human malformations. American Journal of Human Genetics 63:1153–1159.
A study of human malformations associated with specific chromosome deletions.
Epstein, C. J. 1988. Mechanisms of the effects of aneuploidy in mammals. Annual Review of Genetics 22:51–75.
A review of how aneuploidy produces phenotypic effects in mammals.
Feldman, M., and E. R. Sears. 1981. The wild resources of wheat. Scientific American 244(1):98.
An account of how polyploidy has led to the evolution of modern wheat.
Gardner, R. J. M., and G. R. Sutherland. 1996. Chromosome Abnormalities and Genetic Counseling. Oxford: Oxford University Press.
A guide to chromosome abnormalities for genetic counselors.
Goodman, R. M., and R. J. Gorlin. 1983. The Malformed Infant and Child: An Illustrated Guide. New York: Oxford University Press.
A pictorial compendium of genetic and chromosomal syndromes in humans.
Hall, J. C. 1988. Review and hypothesis: somatic mosaicism: observations related to clinical genetics. American Journal of Human Genetics 43:355–363.
A review of the significance of mosaicism in human genetics.
Hieter, P., and T. Griffiths. 1999. Polyploidy: more is more or less. Science 285:210–211.
Discusses current research that shows that there is some unbalanced gene expression in polyploid cells.
Patterson, D. 1987. The causes of Down syndrome. Scientific American 257(2):52–60.
An excellent review of research concerning the genes on chromosome 21 that cause Down syndrome.
Rabbitts, T. H. 1994. Chromosomal translocations in human cancers. Nature 372:143–149.
Reviews the association of some chromosomal translocations with specific human cancers.
Rowley, J. D. 1998. The critical role of chromosome translocations in human leukemias. Annual Review of Genetics 32:495–519.
A review of molecular analyses of chromosome translocations in leukemias.
Ryder, O. A., L. G. Chemnick, A. T. Bowling, and K. Benirschke. 1985. Male mule foal qualifies as the offspring of a female mule and jack donkey. Journal of Heredity 76:379–381.
A study of a male foal (Blue Moon) born to a mule, which was discussed at the beginning of the chapter.
Sánchez-García, I. 1997. Consequences of chromosome abnormalities in tumor development. Annual Review of Genetics 31:429–453.
Reviews the nature of fusion proteins produced by chromosome translocations that play a role in tumor development.
Schulz-Schaeffer, J. 1980. Cytogenetics: Plants, Animals, Humans. New York: Springer Verlag.
A detailed treatment of chromosomal variation.

10 DNA: The Chemical Nature of the Gene

• The Elegantly Stable Double Helix: Ice Man’s DNA
• Characteristics of Genetic Material
• The Molecular Basis of Heredity
  Early Studies of DNA
  DNA As the Source of Genetic Information
  Watson and Crick's Discovery of the Three-Dimensional Structure of DNA
  RNA as Genetic Material
• The Structure of DNA
  The Primary Structure of DNA
  Secondary Structures of DNA
• Special Structures in DNA and RNA
  DNA Methylation
  Bends in DNA

Ice Man is a 5300-year-old frozen corpse found in the Alps. Analysis of his mitochondrial DNA has revealed that he was a Neolithic hunter related to present-day Europeans living north of the Alps. (Brando Quilicia)

The Elegantly Stable Double Helix: Ice Man’s DNA

DNA, with its gentle double-stranded spiral, is among the most elegant of all biological molecules. But the double helix is not just a beautiful structure; it also gives DNA incredible stability and permanence, as illustrated by the story of Ice Man.

On September 19, 1991, German tourists hiking in the Tyrolean Alps near the border between Austria and Italy spotted a corpse trapped in glacial ice. A copper ax, dagger, bow, and quiver with 14 arrows were found alongside the body. Not realizing its antiquity, local residents made several crude and unsuccessful attempts to free the body from the ice. After 4 days, a team of forensic experts arrived to recover the body and transport it to the University of Innsbruck.
There the mummified corpse, known as Ice Man, was refrozen and subjected to scientific study.

Radiocarbon dating indicates that Ice Man is approximately 5000 years old. Recent evidence from the South Tyrol Museum of Archeology has led to the conclusion that Ice Man was shot in the chest with an arrow and died soon thereafter. The body became dehydrated in the cold high-altitude air, was covered with snow that turned into ice, and remained frozen for the next 5000 years.

Some experts challenged Ice Man’s origin, suggesting that he was a South American mummy who had been planted at the glacier site in an elaborate hoax. To establish his authenticity and ethnic origin, scientists removed eight samples of muscle, connective tissue, and bone from his left hip. Under sterile conditions, the investigators extracted DNA from the samples and used the polymerase chain reaction (see Chapter 18) to amplify a very small region of his mitochondrial DNA a millionfold. They determined the base sequence of this amplified DNA and compared it with mitochondrial sequences from present-day humans. This analysis revealed that Ice Man’s mitochondrial DNA sequences resemble those found in present-day Europeans living north of the Alps and are quite different from those of sub-Saharan Africans, Siberians, and Native Americans. Together, radiocarbon dating, the artifacts, and the DNA analysis all indicate that Ice Man was a Neolithic hunter who died while attempting to cross the Alps 5000 years ago.

That some of Ice Man’s DNA persists and faithfully carries his genetic instructions even after the passage of 5000 years is testimony to the remarkable stability of the double helix. Even more ancient DNA has been isolated from the fossilized bones of Neanderthals that are at least 30,000 years old.

This chapter focuses on how DNA was identified as the source of genetic information and how this elegant molecule encodes the genetic instructions.
We begin by considering the basic requirements of the genetic material and the history of our understanding of DNA — how its relation to genes was uncovered and how its structure was determined. The history of DNA illustrates several important points about the nature of scientific research. As with so many important scientific advances, DNA’s structure and its role as the genetic material were not discovered by any single person but were gradually revealed over a period of almost 100 years, thanks to the work of many investigators. Our understanding of the relation between DNA and genes was enormously enhanced in 1953, when James Watson and Francis Crick proposed a three-dimensional structure for DNA that brilliantly illuminated its role in genetics. As illustrated by Watson and Crick’s discovery, major scientific advances are often achieved not through the collection of new data but through the interpretation of old data in new ways. After reviewing the history of DNA, we will examine DNA structure. DNA structure is important in its own right, but the key genetic concept is the relation between the structure and the function of DNA — how its structure allows it to serve as the genetic material. Characteristics of Genetic Material Life is characterized by tremendous diversity, but the coding instructions of all living organisms are written in the same genetic language — that of nucleic acids. Surprisingly, the idea that genes are made of nucleic acids was not widely accepted until after 1950. This late recognition of the role of nucleic acids in genetics resulted principally from a lack of knowledge about the structure of deoxyribonucleic acid (DNA). Until the structure of DNA was fully elucidated, it wasn’t clear how DNA could store and transmit genetic information. Even before nucleic acids were identified as the genetic material, biologists recognized that, whatever the nature of genetic material, it must possess three important characteristics. 1. 
Genetic material must contain complex information. First and foremost, the genetic material must be capable of storing large amounts of information: instructions for all the traits and functions of an organism. This information must have the capacity to vary, because different species and even individual members of a species differ in their genetic makeup. At the same time, the genetic material must be stable, because most alterations to the genetic instructions (mutations) are likely to be detrimental.
2. Genetic material must replicate faithfully. A second necessary feature is that genetic material must have the capacity to be copied accurately. Every organism begins life as a single cell, which must undergo billions of cell divisions to produce a complex, multicellular creature like yourself. At each cell division, the genetic instructions must be transmitted to descendant cells with great accuracy. When organisms reproduce and pass genes to their progeny, the coding instructions must be copied with fidelity.
3. Genetic material must encode phenotype. The genetic material (the genotype) must have the capacity to “code for” (determine) traits (the phenotype). The product of a gene is often a protein; so there must be a mechanism for genetic instructions to be translated into the amino acid sequence of a protein.

Concepts
The genetic material must be capable of carrying large amounts of information, replicating faithfully, and translating its coding instructions into phenotypes.

The Molecular Basis of Heredity
Although our understanding of how DNA encodes genetic information is relatively recent, the study of DNA structure stretches back 100 years.

Early Studies of DNA
In 1868, Johann Friedrich Miescher ( ◗ FIGURE 10.1) graduated from medical school in Switzerland.
Influenced by an uncle who believed that the key to understanding disease lay in the chemistry of tissues, Miescher traveled to Tübingen, Germany, to study under Ernst Felix Hoppe-Seyler, an early leader in the emerging field of biochemistry. Under Hoppe-Seyler’s direction, Miescher turned his attention to the chemistry of pus, a substance of clear medical importance. Pus contains white blood cells with large nuclei; Miescher developed a method of isolating these nuclei. The minute amounts of nuclear material that he obtained were insufficient for a thorough chemical analysis, but he did establish that it contained a novel substance that was slightly acidic and high in phosphorus. This material, which consisted of DNA and protein, Miescher called nuclein. The substance was later renamed nucleic acid by one of his students.

◗ 10.1 Many people have contributed to our understanding of the structure and function of DNA. (Timeline figure; events shown:)
1833 Brown describes the nucleus of the cell
1839 Schleiden and Schwann propose the cell theory
1866 Mendel’s work is first published
1869 Miescher discovers nuclein (DNA) in the nuclei of white blood cells
1884 Histones isolated from the nucleus
1887 Recognition that the nucleus is the physical basis of heredity
Late 1800s Kossel determines that DNA contains nitrogenous bases
1900 Mendel’s work rediscovered
1910 Levene proposes the tetranucleotide theory
1928 Griffith demonstrates the transforming principle
1944 Avery, MacLeod, and McCarty demonstrate that the transforming principle is DNA
1947 Astbury begins X-ray diffraction studies of DNA
1948 Chargaff and colleagues discover regularity in base ratios of DNA
1952 Hershey and Chase demonstrate that DNA is the genetic material in bacteriophage
1953 Watson and Crick devise the secondary structure for DNA
1956 Fraenkel-Conrat and Singer show that some viruses use RNA as genetic material
By 1887, researchers had concluded that the physical basis of heredity lies in the nucleus. Chromatin was shown to consist of nucleic acid and proteins, but which of these substances is actually the genetic information was not clear. In the late 1800s, further work on the chemistry of DNA was carried out by Albrecht Kossel, who determined that DNA contains four nitrogenous bases: adenine, cytosine, guanine, and thymine (abbreviated A, C, G, and T). In the early twentieth century, the Rockefeller Institute in New York City became a center for nucleic acid research. Phoebus Aaron Levene joined the Institute in 1905 and spent the next 40 years studying the chemistry of DNA. He discovered that DNA consists of a large number of linked, repeating units, each containing a sugar, a phosphate, and a base (together forming a nucleotide). Levene’s tetranucleotide theory contributed to the idea that protein is the genetic material because, with its 20 different amino acids, protein structure could be highly variable.

As additional studies of the chemistry of DNA were completed in the 1940s and 1950s, this notion of DNA as a simple, invariant molecule began to change. Erwin Chargaff and his colleagues carefully measured the amounts of the four bases in DNA from a variety of organisms and found that DNA from different organisms varies greatly in base composition. This finding disproved the tetranucleotide theory. They discovered that, within each species, there is some regularity in the ratios of the bases: the total amount of adenine is always equal to the amount of thymine (A = T), and the amount of guanine is always equal to the amount of cytosine (G = C; Table 10.1). These findings became known as Chargaff’s rules.

Concepts
Details of the structure of DNA were worked out by a number of scientists. At first, DNA was interpreted as being too regular in structure to carry genetic information but, by the 1940s, DNA from different organisms was shown to vary in its base composition.
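Chargaff’s rules can be checked with a few lines of arithmetic. The sketch below is only an illustration: it takes the base compositions reported in Table 10.1 and recomputes the three ratios for each organism. The dictionary layout and the function name `chargaff_ratios` are choices made for this example, not part of the original studies.

```python
# Base composition (percent of total bases) as reported in Table 10.1.
BASE_COMPOSITION = {
    "E. coli":    {"A": 26.0, "T": 23.9, "G": 24.9, "C": 25.2},
    "Yeast":      {"A": 31.3, "T": 32.9, "G": 18.7, "C": 17.1},
    "Sea urchin": {"A": 32.8, "T": 32.1, "G": 17.7, "C": 18.4},
    "Rat":        {"A": 28.6, "T": 28.4, "G": 21.4, "C": 21.5},
    "Human":      {"A": 30.3, "T": 30.3, "G": 19.5, "C": 19.9},
}


def chargaff_ratios(comp):
    """Return the A/T, G/C, and purine/pyrimidine ratios for one organism."""
    return {
        "A/T": comp["A"] / comp["T"],
        "G/C": comp["G"] / comp["C"],
        "(A+G)/(T+C)": (comp["A"] + comp["G"]) / (comp["T"] + comp["C"]),
    }


for organism, comp in BASE_COMPOSITION.items():
    ratios = chargaff_ratios(comp)
    # Each ratio should be close to 1 if Chargaff's rules hold.
    print(organism, {name: round(value, 2) for name, value in ratios.items()})
```

Although the proportion of adenine ranges from 26.0 to 32.8 across these organisms, every ratio stays close to 1. That combination is the point of Chargaff’s work: base composition varies between species, yet A tracks T and G tracks C within each one.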
Levene incorrectly proposed that DNA consists of a series of four-nucleotide units, each unit containing all four bases (adenine, guanine, cytosine, and thymine) in a fixed sequence. This concept, known as the tetranucleotide theory, implied that the structure of DNA is too regular to serve as the genetic material.

DNA As the Source of Genetic Information
While chemists were working out the structure of DNA, biologists were attempting to identify the source of genetic information. Two sets of experiments, one conducted on bacteria and the other on viruses, provided pivotal evidence that DNA, rather than protein, was the genetic material.

Table 10.1 Base composition of DNA from different sources and ratios of bases

Source of DNA   Base composition (percent)     Ratio
                A      T      G      C         A/T    G/C    (A + G)/(T + C)
E. coli         26.0   23.9   24.9   25.2      1.09   0.99   1.04
Yeast           31.3   32.9   18.7   17.1      0.95   1.09   1.00
Sea urchin      32.8   32.1   17.7   18.4      1.02   0.96   1.00
Rat             28.6   28.4   21.4   21.5      1.01   1.00   1.00
Human           30.3   30.3   19.5   19.9      1.00   0.98   0.99

The discovery of the transforming principle
The first clue that DNA was the carrier of hereditary information came with the demonstration that DNA was responsible for a phenomenon called transformation. The phenomenon was first observed in 1928 by Fred Griffith, an English physician whose special interest was the bacterium that causes pneumonia, Streptococcus pneumoniae. Griffith had succeeded in isolating several different strains of S. pneumoniae (type I, II, III, and so forth). In the virulent (disease-causing) forms of a strain, each bacterium is surrounded by a polysaccharide coat, which makes the bacterial colony appear smooth when grown on an agar plate; these forms are referred to as S, for smooth. Griffith found that these virulent forms occasionally mutated to nonvirulent forms, which lack a polysaccharide coat and produce a rough-appearing colony on an agar plate; these forms are referred to as R, for rough.
Griffith was interested in the origins of the different strains of S. pneumoniae and why some types were virulent, whereas others were not. He observed that small amounts of living type IIIS bacteria injected into mice caused the mice to develop pneumonia and die; on autopsy, he found large amounts of type IIIS bacteria in the blood of the mice ( ◗ FIGURE 10.2a). When Griffith injected type IIR bacteria into mice, the mice lived, and no bacteria were recovered from their blood ( ◗ FIGURE 10.2b). Griffith knew that boiling killed all the bacteria and destroyed their virulence; when he injected large amounts of heat-killed type IIIS bacteria into mice, the mice lived and no type IIIS bacteria were recovered from their blood ( ◗ FIGURE 10.2c). The results of these experiments were not unusual. However, Griffith got a surprise when he injected his mice with a small amount of living type IIR bacteria, along with a large amount of heat-killed type IIIS bacteria. Because both the type IIR bacteria and the heat-killed type IIIS bacteria were nonvirulent, he expected these mice to live. Surprisingly, 5 days after the injections, the mice became infected with pneumonia and died ( ◗ FIGURE 10.2d). When Griffith examined blood from the hearts of these mice, he observed live type IIIS bacteria. Furthermore, these bacteria retained their type IIIS characteristics through several generations; so the infectivity was heritable.

◗ 10.2 Griffith’s experiments demonstrated transformation in bacteria.

Griffith’s results had several possible interpretations, all of which he considered. First, it could have been the case that he had not sufficiently sterilized the type IIIS bacteria and thus a few live bacteria remained in the culture. Any live bacteria injected into the mice would have multiplied and caused pneumonia.
Griffith knew that this possibility was unlikely, because he had used only heat-killed type IIIS bacteria in the control experiment, and they never produced pneumonia in the mice. A second interpretation was that the live, type IIR bacteria had mutated to the virulent S form. Such a mutation would cause pneumonia in the mice, but it would produce type IIS bacteria, not the type IIIS that Griffith found in the dead mice. Many mutations would be required for type II bacteria to mutate to type III bacteria, and the chance of all the mutations occurring simultaneously was impossibly low. Griffith finally concluded that the type IIR bacteria had somehow been transformed, acquiring the genetic virulence of the dead type IIIS bacteria. This transformation had produced a permanent, genetic change in the bacteria; though Griffith didn’t understand the nature of transformation, he theorized that some substance in the polysaccharide coat of the dead bacteria might be responsible. He called this substance the transforming principle. Identification of the transforming principle At the time of Griffith’s report, Oswald Avery (see Figure 10.1) was a microbiologist at the Rockefeller Institute. At first Avery was skeptical but, after other microbiologists successfully repeated Griffith’s experiments using other bacteria and showed that transformation took place, Avery set out to identify the nature of the transforming substance. After 10 years of research, Avery, Colin MacLeod, and Maclyn McCarty succeeded in isolating and purifying the transforming substance. They showed that it had a chemical composition closely matching that of DNA and quite different from that of proteins. Enzymes such as trypsin and chymotrypsin, known to break down proteins, had no effect on the transforming substance. Ribonuclease, an enzyme that destroys RNA, also had no effect. Enzymes capable of destroying DNA, however, eliminated the biological activity of the transforming substance ( ◗ FIGURE 10.3). 
Avery, MacLeod, and McCarty showed that purified transforming substance precipitated at about the same rate as purified DNA and that it absorbed ultraviolet light at the same wavelengths as does DNA. These results, published in 1944, provided compelling evidence that the transforming principle, and therefore genetic information, resides in DNA. Many biologists still refused to accept the idea, however, preferring the hypothesis that the genetic material is protein.

◗ 10.3 Avery, MacLeod, and McCarty’s experiment revealed the nature of the transforming principle.
Experiment Question: What is the chemical nature of the transforming substance?
1 Heat kill virulent type IIIS bacteria, homogenize, and filter to obtain a type IIIS bacterial filtrate.
2 Treat samples with enzymes that destroy proteins (protease), RNA (RNase), or DNA (DNase).
3 Add the treated samples to cultures of type IIR bacteria.
4 Cultures treated with protease or RNase contain transformed type IIIS bacteria as well as type IIR bacteria,
5 but the culture treated with DNase contains only type IIR bacteria.
Conclusion: Because only DNase destroyed the transforming substance, the transforming principle is DNA.

Concepts
The process of transformation indicates that some substance, the transforming principle, is capable of genetically altering bacteria. Avery, MacLeod, and McCarty demonstrated that the transforming principle is DNA, providing the first evidence that DNA is the genetic material.

The Hershey-Chase experiment
A second piece of evidence implicating DNA as the genetic material resulted from a study of the T2 virus conducted by Alfred Hershey and Martha Chase. T2 is a bacteriophage (phage) that infects the bacterium Escherichia coli ( ◗ FIGURE 10.4a).
As stated in Chapter 8, a phage reproduces by attaching to the outer wall of a bacterial cell and injecting its DNA into the cell, where it replicates and directs the cell to synthesize phage protein. The phage DNA becomes encapsulated within the proteins, producing progeny phages that lyse (break open) the cell and escape ( ◗ FIGURE 10.4b). At the time of the Hershey-Chase study (their paper was published in 1952), biologists did not understand exactly how phages reproduce. What they did know was that the T2 phage consists of approximately 50% protein and 50% nucleic acid, that a phage infects a cell by first attaching to the cell wall, and that progeny phages are ultimately produced within the cell. Because the progeny carried the same traits as the infecting phage, genetic material from the infecting phage must be transmitted to the progeny, but how this occurs was unknown. Hershey and Chase designed a series of experiments to determine whether the phage protein or the phage DNA was transmitted in phage reproduction. To follow the fate of protein and DNA, they used radioactive forms (isotopes) of phosphorus and sulfur. A radioactive isotope can be used as a tracer to identify the location of a specific molecule, because any molecule containing the isotope will be radioactive and therefore easily detected. DNA contains phosphorus but not sulfur; so Hershey and Chase used 32P to follow phage DNA during reproduction. Protein contains sulfur but not phosphorus; so they used 35S to follow the protein. First, Hershey and Chase grew E. coli in a medium containing 32P and infected the bacteria with T2 so that all the new phages would have DNA labeled with 32P ( ◗ FIGURE 10.5). They grew a second batch of E. coli in a medium containing 35S and infected these bacteria with T2 so that all these new phages would have protein labeled with 35 S. Hershey and Chase then infected separate batches of unlabeled E. coli with the 35S- and 32P-labeled phages. 
◗ 10.4 T2 is a bacteriophage that infects E. coli. (a) T2 phage. All parts of the bacteriophage other than its DNA genome are protein. (b) Its life cycle: 1 Phage attaches to E. coli and injects its chromosome. 2 Bacterial chromosome breaks down and the phage chromosome replicates. 3 Expression of phage genes produces phage structural components. 4 Progeny phage particles assemble. 5 Bacterial wall lyses, releasing progeny phages. (Photo, Harold W. Fisher/Visuals Unlimited.)

◗ 10.5 Hershey and Chase demonstrated that DNA carries the genetic information in bacteriophages.
Experiment Question: Which part of the phage, its DNA or its protein, serves as the genetic material and is transmitted to phage progeny?
Experiment 1: 1 Infect E. coli grown in medium containing 35S. 2 35S is taken up in phage protein, which contains S but not P. The labeled phages are used to infect unlabeled E. coli. 3 Shear off protein coats in blender 4 and separate protein from cells by centrifuging. 5 After centrifugation, 35S is recovered in the fluid containing the virus coats. 6 No radioactivity is detected in progeny phages.
Experiment 2: 1 Infect E. coli grown in medium containing 32P. 2 32P is taken up in phage DNA, which contains P but not S. The labeled phages are used to infect unlabeled E. coli. 3 Shear off protein coats in blender 4 and separate protein from cells by centrifuging. 5 After centrifugation, infected bacteria form a pellet containing 32P in the bottom of the tube. 6 The progeny phages are radioactive.
Conclusion: DNA, not protein, is the genetic material in bacteriophages.

After allowing time for the phages to infect the cells, they placed the E. coli cells in a blender and sheared off the now-empty protein coats (ghosts) from the cell walls. They separated out the protein coats and cultured the infected bacterial
cells. Eventually, the cells burst and new phage particles emerged. When phages labeled with 35S infected the bacteria, most of the radioactivity separated with the protein ghosts and little remained in the cells. Furthermore, when new phages emerged from the cell, they contained almost no radioactivity (see Figure 10.5). This result indicated that, although the protein component of a phage was necessary for infection, it didn’t enter the cell and was not transmitted to progeny phages. In contrast, when Hershey and Chase infected bacteria with 32P-labeled phages and removed the protein ghosts, the bacteria were still radioactive. Most significantly, after the cells lysed and new progeny phages emerged, many of these phages emitted radioactivity from 32P, demonstrating that DNA from the infecting phages had been passed on to the progeny (see Figure 10.5). These results confirmed that DNA, not protein, is the genetic material of phages.

Concepts
Using radioactive isotopes, Hershey and Chase traced the movement of DNA and protein during phage infection. They demonstrated that DNA, not protein, enters the bacterial cell during phage reproduction and that only DNA is passed on to progeny phages.

www.whfreeman.com/pierce A discussion of the requirements of the genetic material and the history of our understanding of DNA structure and function.

◗ 10.6 X-ray diffraction provides information about the structures of molecules. 1 Crystals of a substance are bombarded with X-rays, 2 which are diffracted (bounce off). 3 The spacing of the atoms within the crystal determines the diffraction pattern, which appears as spots on a photographic film. 4 Interpretation of the diffraction pattern produced by DNA provides information about the structure of the molecule. (Photo from M. H. F. Wilkins, Department of Biophysics, King’s College, University of London.)

Watson and Crick’s Discovery of the Three-Dimensional Structure of DNA
The experiments on the nature of the genetic material set the stage for one of the most important advances in the history of biology: the discovery of the three-dimensional structure of DNA by James Watson and Francis Crick in 1953. Watson had studied bacteriophage for his Ph.D.; he was familiar with Avery’s work and thus understood the tremendous importance of DNA to genetics. Shortly after receiving his Ph.D., Watson went to the Cavendish Laboratory at Cambridge University in England, where a number of researchers were studying the three-dimensional structure of large molecules. Among these researchers was Francis Crick, who was still working on his Ph.D. Watson and Crick immediately became friends and colleagues.

Much of the basic chemistry of DNA had already been determined by Miescher, Kossel, Levene, Chargaff, and others, who had established that DNA consisted of nucleotides, and that each nucleotide contained a sugar, base, and phosphate group. However, how the nucleotides fit together in the three-dimensional structure of the molecule was not at all clear. In 1947, William Astbury began studying the three-dimensional structure of DNA by using a technique called X-ray diffraction ( ◗ FIGURE 10.6), but his diffraction pictures did not provide enough resolution to reveal the structure. A research group at King’s College in London, led by Maurice Wilkins and Rosalind Franklin, also was studying the structure of DNA by using X-ray diffraction and obtained strikingly better pictures of the molecule. Wilkins and Franklin, however, were unable to develop a complete structure of the molecule; their progress was impeded by personal discord that existed between them.
Watson and Crick investigated the structure of DNA, not by collecting new data but by using all available information about the chemistry of DNA to construct molecular models ( ◗ FIGURE 10.7). By applying the laws of structural chemistry, they were able to limit the number of possible structures that DNA could assume. Watson and Crick tested various structures by building models made of wire and metal plates. With their models, they were able to see whether a structure was compatible with chemical principles and with the X-ray images. The key to solving the structure came when Watson recognized that an adenine base could bond with a thymine base and that a guanine base could bond with a cytosine base; these pairings accounted for the base ratios that Chargaff had discovered earlier. The model developed by Watson and Crick showed that DNA consists of two strands of nucleotides wound around each other to form a right-handed helix, with the sugars and phosphates on the outside and the bases in the interior. They published an electrifying description of their model in Nature in 1953. At the same time, Wilkins and Franklin published their X-ray diffraction data, which demonstrated experimentally the theory that DNA was helical in structure.

◗ 10.7 Watson and Crick provided a three-dimensional model of the structure of DNA. (A. Barrington Brown/Science Photo Library/Photo Researchers.)

Many have called the solving of DNA’s structure the most important biological discovery of the twentieth century. For their discovery, Watson and Crick, along with Maurice Wilkins, were awarded a Nobel Prize in 1962. (Rosalind Franklin had died of cancer in 1958 and, thus, could not be considered a candidate for the shared prize.)

Concepts
By collecting existing information about the chemistry of DNA and building molecular models, Watson and Crick were able to discover the three-dimensional structure of the DNA molecule.
www.whfreeman.com/pierce A commentary on Watson and Crick’s original paper describing the structure of DNA and more information about some of the key players in the discovery of DNA

RNA As Genetic Material
In most organisms, DNA carries the genetic information. However, a few viruses utilize RNA, not DNA, as their genetic material. This fact was demonstrated in 1956 by Heinz Fraenkel-Conrat and Bea Singer, who worked with tobacco mosaic virus (TMV), a virus that infects and causes disease in tobacco plants. The tobacco mosaic virus possesses a single molecule of RNA surrounded by a helically arranged cylinder of protein molecules. Fraenkel-Conrat found that, after separating the RNA and protein of TMV, he could remix them and obtain intact, infectious viral particles. With Singer, Fraenkel-Conrat then created hybrid viruses by mixing RNA and protein from different strains of TMV ( ◗ FIGURE 10.8). When these hybrid viruses infected tobacco leaves, new viral particles were produced. The new viral progeny were identical to the strain from which the RNA had been isolated and did not exhibit the characteristics of the strain that donated the protein. These results showed that RNA carries the genetic information in TMV. Also in 1956, Alfred Gierer and Gerhard Schramm demonstrated that RNA isolated from TMV is sufficient to infect tobacco plants and direct the production of new TMV particles, confirming that RNA carries genetic instructions.

Concepts
RNA serves as the genetic material in some viruses.

◗ 10.8 Fraenkel-Conrat and Singer’s experiment demonstrated that, in the tobacco mosaic virus, RNA carries the genetic information.

The Structure of DNA
DNA, though relatively simple in structure, has an elegance and beauty unsurpassed by other large molecules. It is useful to consider the structure of DNA at three levels of increasing complexity, known as the primary, secondary, and tertiary structures of DNA.
The primary structure of DNA refers to its nucleotide structure and how the nucleotides are joined together. The secondary structure refers to DNA’s stable three-dimensional configuration, the helical structure worked out by Watson and Crick. In Chapter 11, we will consider DNA’s tertiary structures, which are the complex packing arrangements of double-stranded DNA in chromosomes.

The Primary Structure of DNA
The primary structure of DNA consists of a string of nucleotides joined together by phosphodiester linkages.

Nucleotides DNA is typically a very long molecule and is therefore termed a macromolecule. For example, within each human chromosome is a single DNA molecule that, if stretched out straight, would be several centimeters in length. In spite of its large size, DNA has a relatively simple structure: it is a polymer, a chain made up of many repeating units linked together. As already mentioned, the repeating units of DNA are nucleotides, each comprising three parts: (1) a sugar, (2) a phosphate, and (3) a nitrogen-containing base.

The sugars of nucleic acids, called pentose sugars, have five carbon atoms, numbered 1′, 2′, 3′, and so forth ( ◗ FIGURE 10.9). Four of the carbon atoms are joined by an oxygen atom to form a five-sided ring; the fifth (5′) carbon atom projects upward from the ring. Hydrogen atoms or hydroxyl groups (OH) are attached to each carbon atom.

◗ 10.9 A nucleotide contains either a ribose sugar (in RNA) or a deoxyribose sugar (in DNA). The atoms of the five-sided ring are assigned primed numbers.

The sugars of DNA and RNA are slightly different in structure. RNA’s ribose sugar has a hydroxyl group attached to the 2′-carbon atom, whereas DNA’s sugar, called deoxyribose, has a hydrogen atom at this position and contains one oxygen atom fewer overall. This difference gives rise to the names ribonucleic acid (RNA) and deoxyribonucleic acid (DNA). This minor chemical difference is recognized by all the cellular enzymes that interact with DNA or RNA, thus yielding specific functions for each nucleic acid. Further, the additional oxygen atom in the RNA nucleotide makes it more reactive and less chemically stable than DNA. For this reason, DNA is better suited to serve as the long-term repository of genetic information.

The second component of a nucleotide is its nitrogenous base, which may be of two types: a purine or a pyrimidine ( ◗ FIGURE 10.10). Each purine consists of a six-sided ring attached to a five-sided ring, whereas each pyrimidine consists of a six-sided ring only. DNA and RNA both contain two purines, adenine and guanine (A and G), which differ in the positions of their double bonds and in the groups attached to the six-sided ring. There are three pyrimidines found in nucleic acids: cytosine (C), thymine (T), and uracil (U). Cytosine is present in both DNA and RNA; however, thymine is restricted to DNA, and uracil is found only in RNA. The three pyrimidines differ in the groups or atoms attached to the carbon atoms of the ring and in the number of double bonds in the ring. In a nucleotide, the nitrogenous base always forms a covalent bond with the 1′-carbon atom of the sugar (see Figure 10.9). A deoxyribose (or ribose) sugar and a base together are referred to as a nucleoside.

◗ 10.10 A nucleotide contains either a purine or a pyrimidine base. The atoms of the rings in the bases are assigned unprimed numbers. (Structures shown: purine and pyrimidine basic structures; adenine (A); guanine (G); cytosine (C); thymine (T), present in DNA; uracil (U), present in RNA.)

The third component of a nucleotide is the phosphate group, which consists of a phosphorus atom bonded to four oxygen atoms ( ◗ FIGURE 10.11). Phosphate groups are found in every nucleotide and frequently carry a negative charge, which makes DNA acidic. The phosphate is always bonded to the 5′-carbon atom of the sugar in a nucleotide (see Figure 10.9).

◗ 10.11 A nucleotide contains a phosphate group.

The DNA nucleotides are properly known as deoxyribonucleotides or deoxyribonucleoside 5′-monophosphates. Because there are four types of bases, there are four different kinds of DNA nucleotides ( ◗ FIGURE 10.12). The equivalent RNA nucleotides are termed ribonucleotides or ribonucleoside 5′-monophosphates. RNA molecules sometimes contain additional rare bases, which are modified forms of the four common bases. These modified bases will be discussed in more detail when we examine the function of RNA molecules in Chapter 14.

◗ 10.12 There are four types of DNA nucleotides. (Structures shown: deoxyadenosine 5′-monophosphate (dAMP); deoxyguanosine 5′-monophosphate (dGMP); deoxythymidine 5′-monophosphate (dTMP); deoxycytidine 5′-monophosphate (dCMP).)

Concepts
The primary structure of DNA consists of a string of nucleotides. Each nucleotide consists of a five-carbon sugar, a phosphate, and a base. There are two types of DNA bases: purines (adenine and guanine) and pyrimidines (thymine and cytosine).

Polynucleotide strands DNA is made up of many nucleotides connected by covalent bonds, which join the 5′-phosphate group of one nucleotide to the 3′-carbon atom of the next nucleotide ( ◗ FIGURE 10.13). These bonds, called phosphodiester linkages, are relatively strong covalent bonds; a series of nucleotides linked in this way constitutes a polynucleotide strand. The backbone of the polynucleotide strand is composed of alternating sugars and phosphates; the bases project away from the long axis of the strand. The negative charges of the phosphate groups are frequently neutralized by the association of positive charges on proteins, metals, or other molecules.

An important characteristic of the polynucleotide strand is its direction, or polarity. At one end of the strand a phosphate group is attached only to the 5′-carbon atom of the sugar in the nucleotide. This end of the strand is therefore referred to as the 5′ end. The other end of the strand, referred to as the 3′ end, has an OH group attached to the 3′-carbon atom of the sugar. RNA nucleotides also are connected by phosphodiester linkages to form similar polynucleotide strands (see Figure 10.13).

◗ 10.13 DNA consists of two polynucleotide chains that are antiparallel and complementary, and RNA consists of a single nucleotide chain.

Concepts
The nucleotides of DNA are joined in polynucleotide strands by phosphodiester bonds that connect the 3′ carbon atom of one nucleotide to the 5′ phosphate group of the next. Each polynucleotide strand has polarity, with a 5′ end and a 3′ end.

Secondary Structures of DNA
The secondary structure of DNA refers to its three-dimensional configuration, that is, its fundamental helical structure.
DNA’s secondary structure can assume a variety of configurations, depending on its base sequence and the conditions in which it is placed. H C H O H C A N O O O C O H 3’ O C N H H 3’ O –O N HC H2C 5’ H2C 5’ O N H H O P –O O H O H O H H N C H2C 5’ O C C H O CH n O H C T O H C HC H N C N 3’ H CH3 H C H H O H2C 5’ O N H O H H C H H O C O– P directio 5’-to-3’ N CH CH OH H2C 5’ O C U O H2C 5’ O N H 3’ H n H –O directio O H H 3’ O 5’-to-3’ N O– O H C H O H 3’ C H HC P H H P H N H H H2C 5’ –O N N H O H 3’ G H P n directio 5’-to-3’ H H 3’ O H H C C O H O O C O N H2C 5’ O HN O P O H2C 5’ O N O HC –O H H H O P –O N O 3’ H CH C H H O –O N C H O H 3’ N N C O H H C T N H O C O P The double helix A fundamental characteristic of DNA’s secondary structure is that it consists of two polynucleotide strands wound around each other — it’s a double helix. The sugar – phosphate linkages are on the outside of the helix, and the bases are stacked in the interior of the molecule (see Figure 10.13). The two polynucleotide strands run in opposite directions — they are antiparallel, which means that the 5 end of one strand is opposite the 3 end of the second. The strands are held together by two types of molecular forces. Hydrogen bonds link the bases on opposite strands (see Figure 10.13). These bonds are relatively weak compared with the covalent phosphodiester bonds that connect the sugar and phosphate groups of adjoining nucleotides. As we will see, several important functions of DNA require the separation of its two nucleotide strands, and this separation can be readily accomplished because H 14 Chapter 10 of the relative ease of breaking and reestablishing the hydrogen bonds. The nature of the hydrogen bond imposes a limitation on the types of bases that can pair. Adenine normally pairs only with thymine through two hydrogen bonds, and cytosine normally pairs only with guanine through three hydrogen bonds (see Figure 10.13). 
Because three hydrogen bonds form between C and G and only two hydrogen bonds form between A and T, C – G pairing is stronger than A – T pairing. The specificity of the base pairing means that wherever there is an A on one strand, there must be a T in the corresponding position on the other strand, and wherever there is a G on one strand, a C must be on the other. The two polynucleotide strands of a DNA molecule are therefore not identical but are complementary. The second force that holds the two DNA strands together is the interaction between the stacked base pairs. These stacking interactions contribute to the stability of the DNA molecule and do not require that any particular base follow another. Thus, the base sequence of the DNA molecule is free to vary, allowing DNA to carry genetic information. Concepts DNA consists of two polynucleotide strands. The sugar – phosphate groups of each polynucleotide strand are on the outside of the molecule, and the bases are in the interior. Hydrogen bonding joins the bases of the two strands: guanine pairs with cytosine, and adenine pairs with thymine. The two polynucleotide strands of a DNA molecule are complementary and antiparallel. Different secondary structures As we have seen, DNA normally consists of two polynucleotide strands that are antiparallel and complementary (exceptions are singlestranded DNA molecules in a few viruses). The precise three-dimensional shape of the molecule can vary, however, depending on the conditions in which the DNA is placed and, in some cases, on the base sequence itself. The three-dimensional structure of DNA that Watson and Crick described is termed the B-DNA structure ( ◗ FIGURE 10.14). This structure exists when plenty of water surrounds the molecule and there is no unusual base sequence in the DNA — conditions that are likely to be present in cells. 
The B-DNA structure is the most stable configuration for a random sequence of nucleotides under physiological conditions, and most evidence suggests that it is the predominant structure in the cell. B-DNA is a right-handed helix, meaning that it has a clockwise spiral. It possesses approximately 10 base pairs (bp) per 360-degree rotation of the helix; so each base pair is twisted 36 degrees relative to the adjacent bases (see Figure 10.14a). The base pairs are 0.34 nanometer (nm) apart; so each complete rotation of the molecule encompasses 3.4 nm. The diameter of the helix is 2 nm, and the bases are perpendicular to the long axis of the DNA molecule. A space-filling model shows that B-DNA has a relatively slim and elongated structure (see Figure 10.14b). Spiraling of the nucleotide strands creates major and minor grooves in the helix, features that are important for the binding of some DNA-binding proteins that regulate the expression of genetic information (Chapter 16). Some characteristics of the B-DNA structure, along with characteristics of other secondary structures that exist under certain conditions or with unusual base sequences, are given in Table 10.2.

Figure 10.14. B-DNA consists of a right-handed helix with approximately 10 base pairs per turn. (a) Diagrammatic representation showing that the bases are 0.34 nanometer (nm) apart, that each rotation encompasses 3.4 nm, and that the diameter of the helix is 2 nm. (b) Space-filling model of B-DNA showing major and minor grooves.
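The average B-DNA dimensions quoted above lend themselves to quick arithmetic. The following Python sketch is illustrative only: the constant and function names are assumptions introduced here, and the values are the approximate averages given in the text (about 10 base pairs per turn and 0.34 nm between adjacent base pairs), which vary locally in real molecules.

```python
# Illustrative sketch; names are hypothetical, values are the text's averages.
BP_PER_TURN = 10         # approximate base pairs per 360-degree turn of B-DNA
RISE_NM_PER_BP = 0.34    # approximate distance between adjacent base pairs (nm)


def twist_per_bp_degrees(bp_per_turn=BP_PER_TURN):
    """Average rotation of one base pair relative to its neighbor."""
    return 360 / bp_per_turn


def pitch_nm(bp_per_turn=BP_PER_TURN, rise_nm=RISE_NM_PER_BP):
    """Length of helix covered by one complete turn."""
    return bp_per_turn * rise_nm


def helix_turns(n_bp, bp_per_turn=BP_PER_TURN):
    """Number of helical turns in a molecule of n_bp base pairs."""
    return n_bp / bp_per_turn


def helix_length_nm(n_bp, rise_nm=RISE_NM_PER_BP):
    """End-to-end length of a straight molecule of n_bp base pairs."""
    return n_bp * rise_nm


if __name__ == "__main__":
    print(twist_per_bp_degrees())             # degrees per base pair
    print(round(pitch_nm(), 2))               # nm per turn
    print(helix_turns(3000))                  # turns in a 3000-bp molecule
    print(round(helix_length_nm(3000), 1))    # length in nm
```

With these averages, a 3000-bp molecule makes 300 turns and is about 1020 nm long. Passing different `bp_per_turn` and `rise_nm` values gives the corresponding estimates for other helical forms.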
Table 10.2 Characteristics of DNA secondary structures

Characteristic                            A-DNA           B-DNA            Z-DNA
Conditions required to produce structure  75% H2O         92% H2O          Alternating purine and pyrimidine bases
Helix direction                           Right-handed    Right-handed     Left-handed
Average base pairs per turn               11              10               12
Rotation per base pair                    32.7°           36°              −30°
Distance between adjacent bases           0.26 nm         0.34 nm          0.37 nm
Diameter                                  2.3 nm          1.9 nm           1.8 nm
Overall shape                             Short and wide  Long and narrow  Elongated and narrow

Note: Within each structure, the parameters may vary somewhat owing to local variation and method of analysis.

Another secondary structure that DNA can assume is the A-DNA structure, which exists when less water is present. Like B-DNA, A-DNA is a right-handed helix (Figure 10.15a), but it is shorter and wider than B-DNA (Figure 10.15b) and its bases are tilted away from the main axis of the molecule. There is little evidence that A-DNA exists under physiological conditions.

A radically different secondary structure called Z-DNA (Figure 10.15c) forms a left-handed helix. In this form, the sugar–phosphate backbones zigzag back and forth, giving rise to the name Z-DNA (for zigzag). Z-DNA structures can arise under physiological conditions when particular base sequences are present, such as stretches of alternating C and G sequences. Parts of some active genes form Z-DNA, suggesting that Z-DNA may play a role in regulating gene transcription.

Other secondary structures may exist under special conditions or with special base sequences, and characteristics of some of these structures are given in Table 10.2. Structures other than B-DNA exist rarely, if ever, within cells.

Local variation in secondary structures. DNA is frequently presented as a static, rigid structure that is invariant in its secondary structure.
In reality, the numbers describing the parameters for B-DNA in Figure 10.14 are average values, and the actual measurements vary slightly from one part of the molecule to another. The twist between base pairs within a single molecule of B-DNA, for example, can vary from 27 degrees to as high as 42 degrees. This local variation in DNA structure arises because of differences in local environmental conditions, such as the presence of proteins, metals, and ions that may bind to the DNA. The base sequence also influences DNA structure locally.

Figure 10.15. DNA can assume several different secondary structures (A form, B form, and Z form). These structures depend on the base sequence of the DNA and the conditions under which it is placed. (Lehninger, Biochem 3/e, p. 338, Fig. 10-19.)

Concepts
DNA can assume different secondary structures, depending on the conditions in which it is placed and on its base sequence. B-DNA is thought to be the most common configuration in the cell. Local variation in DNA arises as a result of environmental factors and base sequence.

www.whfreeman.com/pierce More on DNA structure and some interesting images of DNA

Connecting Concepts
Genetic Implications of DNA Structure

After Oswald Avery and his colleagues demonstrated that the transforming principle is DNA, it was clear that the genotype resides within the chemical structure of DNA. Watson and Crick’s great contribution was their elucidation of the genotype’s chemical structure, making it possible for geneticists to begin to examine genes directly, instead of looking only at the phenotypic consequences of gene action. Determining the structure of DNA permitted the birth of molecular genetics, the study of the chemical and molecular nature of genetic information.

Watson and Crick’s structure did more than just create the potential for molecular genetic studies; it was an immediate source of insight into key genetic processes.
At the beginning of this chapter, three fundamental properties of the genetic material were identified. First, it must be capable of carrying large amounts of information; so it must vary in structure. Watson and Crick’s model suggested that genetic instructions are encoded in the base sequence, the only variable part of the molecule. The sequence of the four bases — adenine, guanine, cytosine, and thymine — along the helix encodes the information that ultimately determines the phenotype. Watson and Crick were not sure how the base sequence of DNA determined the phenotype, but their structure clearly indicated that the genetic instructions were encoded in the bases. A second necessary property of genetic material is its ability to replicate faithfully. The complementary polynucleotide strands of DNA make this replication possible. Watson and Crick wrote, “It has not escaped our attention that the specific base pairing we have postulated immediately suggests a possible copying mechanism for the genetic material.” They proposed that, in replication, the two polynucleotide strands unzip, breaking the weak hydrogen bonds between the two strands, and each strand serves as a template on which a new strand is synthesized. The specificity of the base pairing means that only one possible sequence of bases — the complementary sequence — can be synthesized from each template. Newly replicated doublestranded DNA molecules will therefore be identical with the original double-stranded DNA molecule (see Chapter 12 on DNA replication). The third essential property of genetic material is the ability to translate its instructions into the phenotype. For most traits, the immediate phenotype is production of a protein; so the genetic material must be capable of encoding proteins. Proteins, like DNA, are polymers, but their repeating units are amino acids, not nucleotides. 
A protein’s function depends on its amino acid sequence; so the genetic material must be able to specify that sequence in a form that can be transferred in the course of protein synthesis. DNA expresses its genetic instructions by first transferring its information to an RNA molecule, in a process termed transcription (see Chapter 13). The term transcription is appropriate because, although the information is transferred from DNA to RNA, the information remains in the language of nucleic acids. The RNA molecule then transfers the genetic information to a protein by specifying its amino acid sequence. This process is termed translation (see Chapter 15) because the information must be translated from the language of nucleotides into the language of amino acids.

We can now identify three major pathways of information flow in the cell (Figure 10.16): in replication, information passes from one DNA molecule to other DNA molecules; in transcription, information passes from DNA to RNA; and, in translation, information passes from RNA to protein.

Figure 10.16. The three major pathways of information transfer within the cell are replication, transcription, and translation. (Labels: DNA replication, information is transferred from one DNA molecule to another; transcription, information is transferred from DNA to an RNA molecule; translation, information is transferred from RNA to a protein through a code that specifies the amino acid sequence; reverse transcription and RNA replication, in some viruses, information is transferred from RNA to DNA or to another RNA molecule.)

This concept of information flow was formalized by Francis Crick as what he called the central dogma of molecular biology. The central dogma states that genetic information passes from DNA to protein in a one-way information pathway. It indicates that genotype codes for phenotype but phenotype cannot code for genotype.
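The first two pathways, replication and transcription, both rest on templated base pairing and can be sketched in a few lines of Python. This is a minimal illustration rather than a model of the enzymology: the function names are hypothetical, sequences are plain strings written 5′ to 3′, and only the four standard bases are handled. Translation is left out because it requires the codon table, which is not given in this chapter.

```python
# Illustrative sketch; function names are assumptions, not standard terms.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}  # DNA template base -> RNA base


def complementary_strand(template, pairing=DNA_PAIR):
    """Return the strand synthesized on `template`, written 5'->3'.

    The new strand is antiparallel to its template, so the template is
    read in reverse while each base is swapped for its partner.
    """
    return "".join(pairing[base] for base in reversed(template))


def replicate(top, bottom):
    """Unzip a duplex and build a new partner on each old strand."""
    daughter_1 = (top, complementary_strand(top))
    daughter_2 = (complementary_strand(bottom), bottom)
    return daughter_1, daughter_2


def transcribe(template):
    """Return the RNA (5'->3') made on a DNA template strand."""
    return complementary_strand(template, RNA_PAIR)


if __name__ == "__main__":
    top = "CATTGACCGA"
    bottom = complementary_strand(top)
    print(bottom)
    print(replicate(top, bottom))
    print(transcribe(bottom))
```

Because only the complementary sequence can be built on each template, both daughter duplexes returned by `replicate` match the original pair of strands, and the RNA produced by `transcribe` has the same sequence as the non-template DNA strand with U in place of T.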
We now realize, however, that the central dogma is an oversimplification. In addition to the three general information pathways of replication, transcription, and translation, other transfers may take place in certain organisms or under special circumstances, including the transfer of information from RNA to DNA (in reverse transcription) and the transfer of information from RNA to RNA (in RNA replication; see Figure 10.16). Reverse transcription takes place in retroviruses and in some transposable elements; RNA replication takes place in some RNA viruses (see Chapter 8).

Concepts
The genetic information of DNA resides in the base sequence. When DNA replicates, the two strands separate, and each strand serves as a template on which a new strand is synthesized. Three principal pathways transfer genetic information: genetic information can pass from DNA to DNA through replication, from DNA to RNA through transcription, and from RNA to protein through translation.

Special Structures in DNA and RNA

In double-stranded DNA, the pairing of bases on opposite nucleotide strands provides stability and produces the helical secondary structure of the molecule. Single-stranded DNA and RNA (the latter of which is almost always single stranded) lack the stabilizing influence of the paired nucleotide strands; so they exhibit no common secondary structure. Sequences within a single strand of nucleotides may be complementary to each other and can pair by forming hydrogen bonds, producing double-stranded regions (Figure 10.17). This internal base pairing imparts a secondary structure to a single-stranded molecule. In fact, internal base pairing within single strands of nucleotides can result in a great variety of secondary structures.

(Figure 10.17 example sequences: (a) hairpin with stem and loop, formed by TGCGATACTCATCGCA; (b) stem without a loop, formed by GGCAATATTGCC; (d) cruciform, formed by ACGCTACTCATAGCGT paired with TGCGATGAGTATCGCA.)
One common type of secondary structure found in single strands of nucleotides is a hairpin, which forms when sequences of nucleotides on the same strand are inverted complements. The sequences 5′–TGCGAT–3′ and 5′–ATCGCA–3′ are examples of inverted complements and, when these sequences are on the same nucleotide strand, they can pair and form a hairpin (see Figure 10.17a). A hairpin consists of a region of paired bases (the stem) and sometimes includes intervening unpaired bases (the loop). When the complementary sequences are contiguous, the hairpin has a stem but no loop (see Figure 10.17b). Hairpins frequently control aspects of information transfer. RNA molecules may contain numerous hairpins, allowing them to fold up into complex structures (see Figure 10.17c).

Figure 10.17. Both DNA and RNA can form special secondary structures. (a) A hairpin, consisting of a region of paired bases (which forms the stem) and a region of unpaired bases between the complementary sequences (which form a loop at the end of the stem). (b) A stem with no loop. (c) Secondary structure, showing many hairpins, of an RNA component of a riboprotein, commonly referred to as the enzyme RNase P of E. coli. (d) A cruciform structure.

www.whfreeman.com/pierce More information on the central dogma

In double-stranded DNA, sequences that are inverted replicas of each other are called inverted repeats. The following double-stranded sequence is an example of inverted repeats:

5′–AAAG . . . CTTT–3′
3′–TTTC . . . GAAA–5′

Notice that the sequences on the two strands are the same when read from 5′ to 3′ but, because the polarities of the two strands are opposite, their sequences are reversed from left to right.
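The idea of inverted complements on a single strand can be checked mechanically. The Python sketch below is a simplified illustration under stated assumptions: the function names are hypothetical, only DNA bases are handled, and a hairpin is recognized only when the paired regions sit at the two ends of the strand supplied. It uses the hairpin and stem sequences shown in Figure 10.17.

```python
# Illustrative sketch; names and the end-anchored search are assumptions.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the inverted complement of a DNA sequence written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(seq))


def hairpin_stem(strand, min_stem=3):
    """Find the longest stem formed by the two ends of `strand`.

    Returns (stem, loop) when the 5' end is the inverted complement of
    the 3' end over at least `min_stem` bases; otherwise returns None.
    """
    n = len(strand)
    for stem_len in range(n // 2, min_stem - 1, -1):
        left = strand[:stem_len]
        right = strand[n - stem_len:]
        if reverse_complement(right) == left:
            loop = strand[stem_len:n - stem_len]
            return left, loop
    return None


if __name__ == "__main__":
    print(hairpin_stem("TGCGATACTCATCGCA"))  # stem with a loop
    print(hairpin_stem("GGCAATATTGCC"))      # stem with no loop
    print(hairpin_stem("AAAAAAAA"))          # no inverted complement
```

For the first sequence the search finds the stem TGCGAT (pairing with ATCGCA) around the unpaired loop ACTC; for the second, the complementary sequences are contiguous, so the loop is empty.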
An inverted repeat that is complementary to itself, such as:

5′–ATCGAT–3′
3′–TAGCTA–5′

is also a palindrome, defined as a word or sentence that reads the same forward and backward, such as “rotator.” Inverted repeats are palindromes because the sequences on the two strands are the same but in reverse orientation. When an inverted repeat forms a perfect palindrome, the double-stranded sequence reads the same forward and backward. Another secondary structure, called a cruciform, can be made from an inverted repeat when a hairpin forms within each of the two single-stranded sequences (see Figure 10.17d).

Concepts
In DNA and RNA, base pairing between nucleotides on the same strand produces special secondary structures such as hairpins and cruciforms.

DNA Methylation

The primary structure of DNA can be modified in various ways. These modifications are important in the expression of the genetic material, as we will see in the chapters to come. One such modification is DNA methylation, in which methyl groups (–CH3) are added (by specific enzymes) to certain positions on the nucleotide bases. In bacteria, adenine and cytosine are commonly methylated, whereas, in eukaryotes, cytosine is the most commonly methylated base. Bacterial DNA is frequently methylated to distinguish it from foreign, unmethylated DNA that may be introduced by viruses; bacteria use proteins called restriction enzymes to cut up any unmethylated viral DNA (see Chapter 18).

Figure 10.18. In eukaryotic DNA, cytosine bases are often methylated to form 5-methylcytosine.

In eukaryotic DNA, cytosine bases are often methylated to form 5-methylcytosine (Figure 10.18). The extent of cytosine methylation varies; in most animal cells, about 5% of the cytosine bases are methylated, but more than 50% of the cytosine bases in some plants are methylated.
On the other hand, no methylation of cytosine has been detected in yeast cells, and only very low levels of methylation (about 1 methylated cytosine base per 12,500 nucleotides) are found in Drosophila. Why eukaryotic organisms differ so widely in their degree of methylation is not clear. Methylation is most frequent on cytosine nucleotides that sit next to guanine nucleotides on the same strand:

. . . GC . . .
. . . CG . . .

In eukaryotic cells, methylation is often related to gene expression. Sequences that are methylated typically show low levels of transcription, whereas sequences lacking methylation are actively being transcribed (see Chapter 16). Methylation can also affect the three-dimensional structure of the DNA molecule.

Concepts
Methyl groups may be added to certain bases in DNA, depending on their positions in the molecule. Both prokaryotic and eukaryotic DNA can be methylated. In eukaryotes, cytosine bases are most often methylated to form 5-methylcytosine, and methylation is often related to gene expression.

www.whfreeman.com/pierce The latest on DNA methylation, at the Web site of the DNA Methylation Society

Bends in DNA

Some specific base sequences, such as a series of four or more adenine–thymine base pairs, cause the DNA double helix to bend. Bending affects how the DNA binds to certain proteins and may be important in controlling the transcription of some genes. The DNA helix can also be made to bend by the binding of proteins to specific DNA sequences (Figure 10.19). The SRY protein, which is encoded by a Y-linked gene and is responsible for sex determination in mammals (see Chapter 4), binds to certain DNA sequences (along the minor groove) and activates nearby genes that encode male traits. When the SRY protein grips the DNA, it bends the molecule about 80 degrees.
This distortion of the DNA helix apparently facilitates the binding of other proteins that activate the transcription of genes that encode male characteristics.

Figure 10.19. The DNA helix can be bent by the binding of proteins to the DNA molecule.

Connecting Concepts Across Chapters

This chapter has shifted the focus of our study to molecular genetics. The first nine chapters of this book examined various aspects of transmission genetics. In these chapters, the focus was on the individual: which phenotype was produced by an individual genotype, how the genes of an individual were transmitted to the next generation, and what types of offspring were produced when two individuals were crossed. In molecular genetics, our focus now shifts to genes: how they are encoded in DNA, how they are replicated, and how they are expressed.

Much of what follows in this book will depend on your knowledge of DNA. An understanding of all the major processes of information transfer (replication, transcription, and translation) requires an understanding of nucleic acid structure; discussions of recombinant DNA, mutation, gene expression, cancer genetics, and even population genetics are based on the assumption that you understand the basic structure and function of DNA. Thus the information in this chapter provides a critical foundation for much of the remainder of the book.

In this chapter, the history of how DNA’s structure and function were unraveled has been strongly emphasized, because the DNA story illustrates how pivotal scientific discoveries are often made. No one scientist discovered the structure of DNA; rather, numerous persons, over a long period of time, made important contributions to our understanding of its structure.
Watson and Crick’s proposal for DNA’s double-helical structure stands out as a singularly important contribution, because it combined many known facts about the structure into a new model that allowed important inferences about the fundamental nature of genes. The DNA story also illustrates the important lesson that science is a human enterprise, influenced by personalities, relations, and motivation.

CONCEPTS SUMMARY

• Genetic material must contain complex information, be replicated accurately, and have the capacity to be translated into the phenotype.

• Evidence that DNA is the source of genetic information came from the finding by Avery, MacLeod, and McCarty that transformation (the genetic alteration of bacteria) was dependent on DNA and from the demonstration by Hershey and Chase that viral DNA is passed on to progeny phages. The results of experiments with tobacco mosaic virus showed that RNA carries genetic information in some viruses.

• James Watson and Francis Crick proposed a new model for the three-dimensional structure of DNA in 1953.

• A DNA nucleotide consists of a deoxyribose sugar, a phosphate group, and a nitrogenous base. An RNA nucleotide consists of a ribose sugar, a phosphate group, and a nitrogenous base.

• The bases of a DNA nucleotide are of two types: purines (adenine and guanine) and pyrimidines (cytosine and thymine). RNA contains the pyrimidine uracil instead of thymine.

• Nucleotides are joined by phosphodiester linkages in a polynucleotide strand. Each polynucleotide strand has a 5′ end with a phosphate and a 3′ end with a hydroxyl group.

• DNA consists of two nucleotide strands that wind around each other to form a double helix. The sugars and phosphates lie on the outside of the helix, and the bases are stacked in the interior. Bases from the two strands are joined by hydrogen bonding. The two strands are antiparallel and complementary.

• DNA molecules can form a number of different secondary structures, depending on the conditions in which the DNA is placed and on its base sequence. B-DNA, which consists of a right-handed helix with approximately 10 base pairs per turn, is the most common form of DNA in cells.

• The structure of DNA has several important genetic implications. Genetic information resides in the base sequence of DNA, which ultimately specifies the amino acid sequence of proteins. Complementarity of the bases on DNA’s two strands allows genetic information to be replicated.

• Important pathways by which information passes from DNA to other molecules include: (1) replication, in which one molecule of DNA serves as a template for the synthesis of two new DNA molecules; (2) transcription, in which DNA serves as a template for the synthesis of an RNA molecule; and (3) translation, in which RNA codes for protein.

• The central dogma of molecular biology proposes that information flows in a one-way direction, from DNA to RNA to protein. Exceptions to the central dogma are now known, including reverse transcription and RNA replication.

• Pairing between bases on the same nucleotide strand can lead to hairpins and other secondary structures. Inverted repeats are sequences on the same strand that are inverted and complementary; they can lead to cruciform structures.

• DNA methylation is the addition of methyl groups to the nucleotide bases. In bacteria, adenine and cytosine are commonly methylated. Among eukaryotes, cytosine bases are most commonly methylated to form 5-methylcytosine.

• Some sequences, such as a series of four or more adenine–thymine base pairs, can cause DNA to bend, which may affect gene expression.

IMPORTANT TERMS

nucleotide (p. 000) Chargaff’s rules (p. 000) transforming principle (p. 000) isotopes (p. 000) X-ray diffraction (p. 000) ribose (p. 000) deoxyribose (p. 000) nitrogenous base (p. 000) purine (p. 000) pyrimidine (p. 000) adenine (A) (p. 000) guanine (G) (p. 000) cytosine (C) (p. 000) thymine (T) (p.
000) uracil (U) (p. 000) nucleoside (p. 000) phosphate group (p. 000) deoxyribonucleotide (p. 000) ribonucleotide (p. 000) phosphodiester linkage (p. 000) polynucleotide strand (p. 000) 5′ end (p. 000) 3′ end (p. 000) antiparallel (p. 000) complementary (p. 000) B-DNA (p. 000) A-DNA (p. 000) Z-DNA (p. 000) local variation (p. 000) transcription (p. 000) translation (p. 000) replication (p. 000) central dogma (p. 000) reverse transcription (p. 000) RNA replication (p. 000) hairpin (p. 000) inverted repeats (p. 000) palindrome (p. 000) cruciform (p. 000) DNA methylation (p. 000) 5-methylcytosine (p. 000)

Worked Problems

1. The percentage of cytosine in a double-stranded DNA molecule is 40%. What is the percentage of thymine?

• Solution
In double-stranded DNA, A pairs with T, whereas G pairs with C; so the percentage of A equals the percentage of T, and the percentage of G equals the percentage of C. If C = 40%, then G also must be 40%. The total percentage of C + G is therefore 40% + 40% = 80%. All the remaining bases must be either A or T; so the total percentage of A + T = 100% − 80% = 20%; because the percentage of A equals the percentage of T, the percentage of T is 20%/2 = 10%.

2. Which of the following relations will be true for the percentage of bases in double-stranded DNA?
(a) C + T = A + G
(b) C/A = T/G

• Solution
An easy way to determine whether the relations are true is to arbitrarily assign percentages to the bases, remembering that, in double-stranded DNA, A = T and G = C. For example, if the percentages of A and T are each 30%, then the percentages of G and C are each 20%. We can substitute these values into the equations to see if the relations are true.
(a) 20 + 30 = 30 + 20, so this relation is true.
(b) 20/30 ≠ 30/20; so this relation is not true.

The New Genetics
MINING GENOMES
INTRODUCTION TO GENBANK AND PUBMED

This exercise introduces you to some of the genetics databases that are most frequently used by contemporary researchers. You will explore some of the tools available at the National Center for Biotechnology Information (NCBI), which is managed by the National Library of Medicine of the United States.

COMPREHENSION QUESTIONS

*1. What three general characteristics must the genetic material possess?
2. Briefly outline the history of our knowledge of the structure of DNA until the time of Watson and Crick. Which do you think were the principal contributions and developments?
*3. What experiments demonstrated that DNA is the genetic material?
4. What is transformation? How did Avery and his colleagues demonstrate that the transforming principle is DNA?
*5. How did Hershey and Chase show that DNA is passed to new phages in phage reproduction?
6. Why was Watson and Crick’s discovery so important?
*7. Draw and label the three parts of a DNA nucleotide.
8. How does an RNA nucleotide differ from a DNA nucleotide?
9. How does a purine differ from a pyrimidine? What purines and pyrimidines are found in DNA and RNA?
*10. Draw a short segment of a single polynucleotide strand, including at least three nucleotides. Indicate the polarity of the strand by labeling the 5′ end and the 3′ end.
11. Which bases are capable of forming hydrogen bonds with each other?
*12. What is local variation in DNA structure and what causes it?
13. What are some of the important genetic implications of the DNA structure?
14. What are the major transfers of genetic information?
*15. What are hairpins and how do they form?
16. What is DNA methylation?

APPLICATION QUESTIONS AND PROBLEMS

17. A student mixes some heat-killed type IIS Streptococcus pneumoniae bacteria with live type IIR bacteria and injects the mixture into a mouse. The mouse develops pneumonia and dies.
The student recovers some type IIS bacteria from the dead mouse. It is the only experiment conducted by the student. Has the student demonstrated that transformation has taken place? What other explanations might explain the presence of the type IIS bacteria in the dead mouse?

*18. (a) Why did Hershey and Chase choose 32P and 35S for use in their experiment? (b) Could they have used radioactive isotopes of carbon (C) and oxygen (O) instead? Why or why not?

19. What results would you expect if the Hershey and Chase experiment were conducted on tobacco mosaic virus?

*20. Each nucleotide pair of a DNA double helix weighs about 1 × 10⁻²¹ g. The human body contains approximately 0.5 g of DNA. How many nucleotide pairs of DNA are in the human body? If you assume that all the DNA in human cells is in the B-DNA form, how far would the DNA reach if stretched end to end?

21. What aspects of its structure contribute to the stability of the DNA molecule? Why is RNA less stable than DNA?

*22. Which of the following relations will be found in the percentages of bases of a double-stranded DNA molecule?
(a) A + T = G + C
(b) A + G = T + C
(c) A + C = G + T
(d) (A + T)/(C + G) = 1.0
(e) (A + G)/(C + T) = 1.0
(f) A/C = G/T
(g) A/G = T/C
(h) G/T = A/C

*23. If a double-stranded DNA molecule is 15% thymine, what are the percentages of all the other bases?

24. A virus contains 10% adenine, 24% thymine, 30% guanine, and 36% cytosine. Is the genetic material in this virus double-stranded DNA, single-stranded DNA, double-stranded RNA, or single-stranded RNA? Support your answer.

*25. A B-DNA molecule has 1 million nucleotide pairs. (a) How many complete turns are there in this molecule? (b) If this same molecule were in the Z-DNA configuration, how many complete turns would it have?

26. For entertainment on a Friday night, a genetics professor proposed that his children diagram a polynucleotide strand of DNA. Having learned about DNA in preschool, his 5-year-old daughter was able to draw a polynucleotide strand, but she made a few mistakes. The daughter’s diagram (represented here) contained at least 10 mistakes. (a) Make a list of all the mistakes in the structure of this DNA polynucleotide strand. (b) Draw the correct structure for the polynucleotide strand.

(Diagram for question 26: the daughter’s drawing of a polynucleotide strand.)

*27. Chapter 1 considered the theory of the inheritance of acquired characteristics and noted that this theory is no longer accepted. Is the central dogma consistent with the theory of the inheritance of acquired characteristics? Why or why not?

28. Write a sequence of bases in an RNA molecule that would produce a hairpin structure.

29. The following sequence is present in one strand of a DNA molecule:

5′–CATTGACCGA–3′

Write the sequence on the same strand that produces an inverted repeat and the sequence on the complementary strand.

CHALLENGE QUESTIONS

30. Suppose that an automated, unmanned probe is sent into deep space to search for extraterrestrial life. After wandering for many light-years among the far reaches of the universe, this probe arrives on a distant planet and detects life. The chemical composition of life on this planet is completely different from that of life on Earth, and its genetic material is not composed of nucleic acids. What predictions can you make about the chemical properties of the genetic material on this planet?

31. How might 32P and 35S be used to demonstrate that the transforming principle is DNA? Briefly outline an experiment that would show that DNA and not protein is the transforming principle.

32. Scientists have reportedly isolated short fragments of DNA from fossilized dinosaur bones hundreds of millions of years old. The technique used to isolate this DNA is the polymerase chain reaction (PCR), which is capable of amplifying very small amounts of DNA a millionfold (see Chapter 16).
Critics have claimed that the DNA isolated from dinosaur bones is not of ancient origin but instead represents contamination of the samples with DNA from present-day organisms such as bacteria, mold, or humans. What precautions, analyses, and control experiments could be carried out to ensure that DNA recovered from fossils is truly of ancient origin? SUGGESTED READINGS Avery, O. T., C. M. MacLeod, and M. McCarty. 1944. Studies on the chemical nature of the substance inducing transformation of pneumococcal types. Journal of Experimental Medicine 79:137 – 158. Avery, MacLeod, and McCarty’s paper describing their demonstration that the transforming principle is DNA. Crick, F. 1988. What Mad Pursuit: A Personal View of Scientific Discovery. New York: Basic Books. Francis Crick’s personal account of the discovery of the structure of DNA. Dickerson, R. E., H. R. Drew, B. N. Conner, R. M Wing, A. V. Fratini, and M. L. Kopka. 1982. The anatomy of A-, B-, and ZDNA. Science 216:475 – 485. A review of differences in secondary structures of DNA. Fraenkal-Conrat, H., and B. Singer. 1957. Virus reconstitution II: combination of protein and nucleic DNA: The Chemical Nature of the Gene acid from different strains. Biochimica et Biophysica Acta 24:540 – 548. Report of Fraenkal-Conrat and Singer’s well-known experiment showing that RNA is the genetic material in tobacco mosaic virus. Griffith, F. 1928. The significance of pneumoncoccal types. Journal of Hygiene 27:113 – 159. Griffith’s original report of the transforming principle. Handt, O., M. Richards, M. Trommsdorff, et al. 1994. Molecular genetic analysis of the Tyrolean Ice Man. Science 264:1775 – 1778. Describes the isolation and analysis of DNA from a 5000-year-old frozen man found on a glacier in the Alps. Hershey, A. D., and M. Chase. 1952. Independent functions of viral protein and nucleic acid in growth of bacteriophage. Journal of General Physiology 36:39 – 56. 
Original report of Hershey and Chase’s well-known experiment with T2 bacteriophage. Judson, H. F. 1996. The Eighth Day of Creation: Makers of the Revolution in Biology, expanded edition. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press. A comprehensive account of the early years of molecular genetics. Miescher, F. 1871. On the chemical composition of pus cells. Hoppe-Seyler’s Med.-Chem. Untersuch. 4:441 – 460. Abridged 23 and translated in Great Experiments in Biology, M. L. Gabriel, and S. Fogel (Eds.). Englewood Cliffs, NJ: Prentice-Hall, 1955. An abridged and translated version of Miescher’s original paper chemically characterizing DNA. Mirsky, A. E. 1968 The discovery of DNA. Scientific American 2 (6):78 – 88. A good account of the discovery of DNA structure. Rich, A., A. Nordheim, and A. H.-J. Wang. 1984. The chemistry and biology of left-handed Z-DNA. Annual Review of Biochemistry 53:791 – 846. Good review article on the structure and possible function of Z-DNA. Watson, J. D. 1968. The Double Helix. New York: Atheneum. An excellent account of Watson and Crick’s discovery of DNA. Watson, J. D., and F. C. Crick. 1953. Molecular structure of nucleic acids: a structure for deoxyribose nucleic acids. Nature 171:737 – 738. Original paper in which Watson and Crick first presented their new structure for DNA. Zimmerman, S. B. 1982. The three-dimensional structure of DNA. Annual Review of Biochemistry 51:395 – 427. Review of the different secondary structures that DNA can assume. 
11 Chromosome Structure and Transposable Elements

• YACs and the Common Mouse
• Packing DNA into Small Spaces
• The Bacterial Chromosome
• The Eukaryotic Chromosome
  Chromatin Structure
  Centromere Structure
  Telomere Structure
• Variation in Eukaryotic DNA Sequences
  Denaturation and Renaturation of DNA
  Renaturation Reactions and C₀t Curves
  Types of DNA Sequence in Eukaryotes
• The Nature of Transposable Elements
  General Characteristics of Transposable Elements
  Transposition
  Mechanisms of Transposition
  The Mutagenic Effects of Transposition
  The Regulation of Transposition
• The Structure of Transposable Elements
  Transposable Elements in Bacteria
  Transposable Elements in Eukaryotes
• The Evolution of Transposable Elements

The house mouse, Mus musculus, is one of the oldest and most important organisms used for genetic studies. Molecular techniques allow genes to be introduced into mice on yeast artificial chromosomes (YACs). Pigment in the brown mice is produced by a gene for tyrosinase that is carried on a YAC. The white mouse is a littermate, without the introduced gene. (Carolyn A. McKeone/Photo Researchers.)

YACs and the Common Mouse

The common house mouse, Mus musculus, is among the oldest and most valuable subjects for genetic study. It’s an excellent genetic organism: small, prolific, and easy to keep, with a short generation time (about 3 months). It tolerates inbreeding well; so a large number of inbred strains have been developed through the years. Finally, being a mammal, the mouse is genetically and physiologically more similar to humans than are other organisms used in genetics studies, such as bacteria, yeast, corn, and fruit flies.

Powerful tools of molecular biology have enhanced the mouse’s role in probing fundamental questions of heredity. New and altered genes can be added to the mouse genome by injecting DNA directly into embryos that are implanted into surrogate mothers.
The resulting transgenic mice can be bred to produce offspring carrying the new genes.

Today, it is possible to introduce not just individual genes, but entire chromosomes into mouse cells. In 1983, the first artificial chromosomes, made of parts culled from yeast and protozoans, were created for studying chromosome structure and segregation. In 1987, David Burke and Maynard Olson (at Washington University, St. Louis) used yeast to create much larger artificial chromosomes called yeast artificial chromosomes or YACs. Each YAC includes the three essential elements of a chromosome: a centromere, a pair of telomeres, and an origin of replication. These elements ensure that artificial chromosomes will segregate in mitosis and meiosis, will not be degraded, and will replicate successfully. Large chunks of extra DNA from any source can be added to a YAC, and the new artificial chromosome can be inserted into a cell. Eukaryotic centromeres, telomeres, and origins of replication are similar in different organisms; so YACs function well in almost any eukaryotic cell.

In 1993, molecular geneticists successfully modified YACs so that they could be transferred to mouse cells. Previously, transgenic mice could carry only relatively small pieces of DNA, usually no more than 50,000 bp. Now, large genes as well as the surrounding DNA, which may be important in the regulation of those genes, can be added to mouse-cell nuclei. Artificial chromosomes have also been made from chromosomal components of bacteria (BACs) and mammals (MACs).

The successful construction of YACs, BACs, and MACs illustrates the fundamental nature of eukaryotic chromosomes: huge amounts of DNA complexed with proteins and possessing telomeres, centromeres, and origins of replication.
In this chapter, we explore the molecular nature of chromosomes, including details of the DNA – protein complex and the structure of telomeres and centromeres; origins of replication will be discussed in Chapter 12. Much of this chapter focuses on a storage problem: how to cram tremendous amounts of DNA into the limited confines of a cell. Even in those organisms having the smallest amounts of DNA, the length of genetic material far exceeds the length of the cell. Thus, cellular DNA must be highly folded and tightly packed, but this packing creates problems — it renders the DNA inaccessible, unable to be copied or read. Functional DNA must be capable of partly unfolding and expanding so that individual genes can undergo replication and transcription. The flexible, dynamic nature of DNA packing will be a central theme of this chapter. We begin this chapter by considering supercoiling, an important tertiary structure of DNA found in both prokaryotic and eukaryotic cells. After a brief look at the bacterial chromosome, we examine the structure of eukaryotic chromosomes. After considering chromosome structure, we pay special attention to the working parts of a chromosome, specifically centromeres and telomeres. We also consider the types of DNA sequences present in many eukaryotic chromosomes and how DNA sequences are analyzed. The second part of this chapter focuses on genes that move. For many years, biologists viewed genes as static entities that occupied fixed positions on chromosomes. But we now recognize that many genetic elements do not occupy fixed positions. Genes that can move have been given a variety of names, including transposons, transposable genetic elements, mobile DNA, movable genes, controlling elements, and jumping genes. We will refer to mobile DNA sequences as transposable elements, and by this term we mean any DNA sequence that is capable of moving from one place to another place within the genome. 
We begin the second part of the chapter by outlining some of the general features of transposable elements and the processes by which they move from place to place. We then consider several different types of transposable elements found in prokaryotic and eukaryotic genomes. Finally, we consider the evolutionary significance of transposable elements.

www.whfreeman.com/pierce More information about YACs and how genes are cloned into YACs and more information about mouse genetics

Packing DNA into Small Spaces

The packaging of tremendous amounts of genetic information into the small volume of a cell has been called the ultimate storage problem. Consider the chromosome of the bacterium E. coli, a single molecule of DNA with approximately 4.64 million base pairs. Stretched out straight, this DNA would be about 1000 times as long as the cell within which it resides ( ◗ FIGURE 11.1). Human cells contain 6 billion base pairs of DNA, which would measure some 1.8 meters stretched end to end. Even DNA in the smallest human chromosome would stretch 14,000 times the length of the nucleus. Clearly, DNA molecules must be tightly packed to fit into such small spaces.

The structure of DNA can be considered at three hierarchical levels: the primary structure of DNA is its nucleotide sequence; the secondary structure is the double-stranded helix; and the tertiary structure refers to higher-order folding that allows DNA to be packed into the confined space of a cell.

Concepts
Chromosomal DNA exists in the form of very long molecules, which must be tightly packed to fit into the small confines of a cell.

One type of DNA tertiary structure is supercoiling, which occurs when the DNA helix is subjected to strain by being overwound or underwound. The lowest energy state for B-DNA is when it has approximately 10 bp per turn of its helix. In this relaxed state, a stretch of 100 bp of DNA would assume about 10 complete turns ( ◗ FIGURE 11.2a).
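The numbers quoted in this section can be checked with a rough calculation. The following is a minimal sketch rather than anything taken from the text: it assumes a rise of about 0.34 nm per base pair for B-DNA (a commonly quoted value that this section does not state) and uses the figure of approximately 10 bp per helical turn given above. The function names are hypothetical.

```python
# Rough estimates of DNA contour length and helical turns.
# Assumption (not stated in this section): B-DNA rises about 0.34 nm per base pair.
RISE_NM_PER_BP = 0.34
# From the text: relaxed B-DNA has approximately 10 bp per turn.
BP_PER_TURN = 10.0


def contour_length_m(base_pairs, rise_nm=RISE_NM_PER_BP):
    """Length in meters of a DNA molecule stretched out straight."""
    return base_pairs * rise_nm * 1e-9


def relaxed_turns(base_pairs, bp_per_turn=BP_PER_TURN):
    """Number of helical turns in relaxed B-DNA of the given length."""
    return base_pairs / bp_per_turn


if __name__ == "__main__":
    ecoli_bp = 4.64e6  # E. coli chromosome, from the text
    human_bp = 6e9     # DNA in a human cell, from the text
    print(f"E. coli DNA: about {contour_length_m(ecoli_bp) * 1e3:.2f} mm")
    print(f"Human DNA:   about {contour_length_m(human_bp):.2f} m")
    print(f"Turns in 100 bp of relaxed DNA: {relaxed_turns(100):.0f}")
```

Under the 0.34-nm assumption, the E. coli chromosome comes out at roughly 1.6 mm and the DNA of a human cell at roughly 2 m, the same order of magnitude as the 1.8 meters quoted above; the exact value depends on the rise per base pair that is assumed.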
If energy is used to add or remove any turns by rotating one strand around the other, strain is placed on the molecule, causing the helix to supercoil, or twist, on itself ( ◗ FIGURE 11.2b and c). Supercoiling is a natural consequence of the overrotating or underrotating of the helix; it occurs only when the molecule is placed under strain. Molecules that are overrotated exhibit positive supercoiling (see Figure 11.2b). Underrotated molecules exhibit negative supercoiling (see Figure 11.2c), in which the direction of the supercoil is opposite that of the right-handed coil of the DNA helix.

◗ 11.1 The DNA in E. coli is about 1000 times as long as the cell itself. Figure labels: E. coli bacterium; Bacterial chromosome.

Figure 11.2 panel labels: (a) Relaxed circular DNA. A coiled telephone cord is like relaxed circular DNA. (b) Add two turns (overrotate). (c) Remove two turns (underrotate).

Supercoiling occurs only if the two polynucleotide strands of the DNA double helix are unable to rotate about each other freely. If the chains can turn freely, their ends will simply turn as extra rotations are added or removed, and the molecule will spontaneously revert to the relaxed state. Supercoiling takes place when the strain of overrotating or underrotating cannot be compensated by the turning of the ends of the double helix, which is the case if the DNA is circular, that is, there are no free ends. Some viral chromosomes are in the form of simple circles and readily undergo supercoiling. Large molecules of bacterial DNA are typically a series of large loops, the ends of which are held together by proteins. Eukaryotic DNA is normally linear but also tends to fold into loops stabilized by proteins. In these chromosomes, the anchoring proteins prevent free rotation of the ends of the DNA; so supercoiling does take place.
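The relation between added or removed turns and the sign of supercoiling can be expressed as a small sketch. It is an illustration only, assuming the approximate value of 10 bp per turn for relaxed B-DNA given in the text; the function name and the `ends_free` flag are hypothetical conveniences for modeling whether the strands can rotate freely.

```python
# Classify supercoiling from the number of helical turns in a DNA molecule.
BP_PER_TURN = 10.0  # relaxed B-DNA, approximate value from the text


def supercoil_state(base_pairs, actual_turns, ends_free=False):
    """Return (state, excess_turns) for a DNA molecule.

    ends_free=True models strands that can rotate about each other freely:
    the ends turn to absorb the strain and the molecule reverts to relaxed.
    """
    relaxed = base_pairs / BP_PER_TURN
    excess = actual_turns - relaxed
    if ends_free or excess == 0:
        return ("relaxed", 0.0)
    if excess > 0:
        return ("positive", excess)  # overrotated
    return ("negative", excess)      # underrotated


if __name__ == "__main__":
    print(supercoil_state(100, 12))                  # two turns added
    print(supercoil_state(100, 8))                   # two turns removed
    print(supercoil_state(100, 12, ends_free=True))  # strain dissipates
```

The two constrained cases mirror panels (b) and (c) of Figure 11.2, in which two turns are added to or removed from a 100-bp stretch; the third case reflects the point that a molecule with freely rotating ends does not stay supercoiled.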
Supercoiling relies on topoisomerases, enzymes that add or remove rotations from the DNA helix by temporarily breaking the nucleotide strands, rotating the ends around each other, and then rejoining the broken ends. The two classes of topoisomerases are: type I, which breaks only one of the nucleotide strands and reduces supercoiling by removing rotations; and type II, which adds or removes rotations by breaking both nucleotide strands.

Most DNA found in cells is negatively supercoiled, which has two advantages for the cell. First, supercoiling makes the separation of the two strands of DNA easier during replication and transcription. Negatively supercoiled DNA is underrotated; so separation of the two strands during replication and transcription is more rapid and requires less energy. Second, supercoiled DNA can be packed into a smaller space because it occupies less volume than relaxed DNA.

◗ 11.2 Supercoiled DNA is overwound or underwound, causing it to twist on itself. Electron micrographs are of relaxed DNA (top) and supercoiled DNA (bottom). (Dr. Gopal Murti/Phototake.)
Positive supercoil: Positive supercoiling occurs when DNA is overrotated; the helix twists on itself.
Negative supercoil: Negative supercoiling occurs when DNA is underrotated; the helix twists on itself in the opposite direction. If you turn the receiver in the way opposite to how it's coiled when you hang up, you induce a negative supercoil in the cord.

Concepts
Overrotation or underrotation of a DNA double helix places strain on the molecule, causing it to supercoil. Supercoiling is controlled by topoisomerase enzymes. Most cellular DNA is negatively supercoiled, which eases the separation of nucleotide strands during replication and transcription and allows DNA to be packed into small spaces.

The Bacterial Chromosome

Most bacterial genomes consist of a single, circular DNA molecule, although linear DNA molecules have been found in a few species.
In circular bacterial chromosomes, the DNA does not exist in an open, relaxed circle; the 3 million to 4 million base pairs of DNA found in a typical bacterial genome would be much too large to fit into a bacterial cell (see Figure 11.1). Bacterial DNA is not attached to histone proteins (as is eukaryotic DNA, discussed later in the chapter). Consequently, for many years bacterial DNA was called “naked DNA.” However, this term is inaccurate, because bacterial DNA is complexed to a number of proteins that help compact it.

When a bacterial cell is viewed with the electron microscope, its DNA frequently appears as a distinct clump, the nucleoid, which is confined to a definite region of the cytoplasm. If a bacterial cell is broken open gently, its DNA spills out in a series of twisted loops ( ◗ FIGURE 11.3a). The ends of the loops are most likely held in place by proteins ( ◗ FIGURE 11.3b). Many bacteria contain additional DNA in the form of small circular molecules called plasmids, which replicate independently of the chromosome (see Chapter 8).

◗ 11.3 Bacterial DNA is highly folded into a series of twisted loops. (Part a, Dr. Gopal Murti/Photo Researchers.) Figure labels: Twisted loops of DNA; Proteins.

Concepts
The typical bacterial chromosome consists of a large, circular molecule of DNA that is a series of twisted loops. Bacterial DNA appears as a distinct clump, the nucleoid, within the bacterial cell.

www.whfreeman.com/pierce Information about the genome of the common bacterium E. coli

The Eukaryotic Chromosome

Individual eukaryotic chromosomes contain enormous amounts of DNA. Like bacterial chromosomes, each eukaryotic chromosome consists of a single, extremely long molecule of DNA. For all of this DNA to fit into the nucleus, tremendous packing and folding are required, the extent of which must change through time. The chromosomes are in an elongated, relatively uncondensed state during interphase of the cell cycle (see Chapter 2), but the term relatively is an important qualification here. Although the DNA of interphase chromosomes is less tightly packed than DNA in mitotic chromosomes, it is still highly condensed; it’s just less condensed. In the course of the cell cycle, the level of DNA packaging changes: chromosomes progress from a highly packed state to a state of extreme condensation. DNA packaging also changes locally in replication and transcription, when the two nucleotide strands must unwind so that particular base sequences are exposed. Thus, the packaging of eukaryotic DNA (its tertiary, chromosomal structure) is not static but changes regularly in response to cellular processes.

Chromatin Structure

As mentioned in Chapter 2, eukaryotic DNA is closely associated with proteins, creating chromatin. The two basic types of chromatin are: euchromatin, which undergoes the normal process of condensation and decondensation in the cell cycle, and heterochromatin, which remains in a highly condensed state throughout the cell cycle, even during interphase. Euchromatin constitutes the majority of the chromosomal material, whereas heterochromatin is found at the centromeres and telomeres of all chromosomes, at other specific places on some chromosomes, and along the entire inactive X chromosome in female mammals (see Chapter 4).

The most abundant proteins in chromatin are the histones, which are relatively small, positively charged proteins of five major types: H1, H2A, H2B, H3, and H4 (Table 11.1). All histones have a high percentage of arginine and lysine, positively charged amino acids that give them a net positive charge. The positive charges attract the negative charges on the phosphates of DNA and hold the DNA in contact with the histones.

A heterogeneous assortment of nonhistone chromosomal proteins makes up about half of the protein mass of the chromosome.
A fundamental problem in the study of these proteins is that the nucleus is full of all sorts of proteins; so, whenever chromatin is isolated from the nucleus, it may be contaminated by nonchromatin proteins. On the other hand, isolation procedures may also remove proteins that are associated with chromatin. In spite of these difficulties, we know that some groups of nonhistone proteins are clearly associated with chromatin. Nonhistone chromosomal proteins may be broadly divided into those that serve structural roles and those that take part in genetic processes such as transcription and replication.

Table 11.1 Characteristics of histone proteins

Histone Protein    Molecular Weight    Number of Amino Acids
H1                 21,130              223
H2A                13,960              129
H2B                13,774              125
H3                 15,273              135
H4                 11,236              102

Note: The sizes of H1, H2A, and H2B histones vary somewhat from species to species. The values given are for bovine histones.
Source: Data are from A. L. Lehninger, D. L. Nelson, and M. M. Cox, Principles of Biochemistry, 3d ed. (New York: Worth Publishers, 1993), p. 924.

Chromosomal scaffold proteins ( ◗ FIGURE 11.4) are revealed when chromatin is treated with a concentrated salt solution, which removes histones and most other chromosomal proteins, leaving a chromosomal protein “skeleton” to which the DNA is attached. These scaffold proteins may play a role in the folding and packing of the chromosome. Other structural proteins make up the kinetochore, cap the chromosome ends by attaching to telomeres, and constitute the molecular motors that move chromosomes in mitosis and meiosis.

◗ 11.4 Scaffold proteins play a role in the folding and packing of chromosomes. (Professor U. Laemmli/Photo Researchers.)

Other types of nonhistone chromosomal proteins play a role in genetic processes. They are components of the replication machinery (DNA polymerases, helicases, primases; see Chapter 12) and proteins that carry out and regulate transcription (RNA polymerases, transcription factors, acetylases; see Chapter 13). High-mobility-group proteins are small, highly charged proteins that vary in amount and composition, depending on tissue type and stage of the cell cycle. Several of these proteins may play an important role in altering the packing of chromatin during transcription.

The highly organized structure of chromatin is best viewed from several levels. In the next sections, we will examine these levels of chromatin organization.

Concepts
Chromatin, which consists of DNA complexed to proteins, is the material that makes up eukaryotic chromosomes. The most abundant of these proteins are the five types of positively charged histone proteins: H1, H2A, H2B, H3, and H4.

The nucleosome
Chromatin has a highly complex structure with several levels of organization. The simplest level ( ◗ FIGURE 11.5) is the double helical structure of DNA discussed in Chapter 10. At a more complex level, the DNA molecule is associated with proteins and is highly folded to produce a chromosome.

When chromatin is isolated from the nucleus of a cell and viewed with an electron microscope, it frequently looks like beads on a string ( ◗ FIGURE 11.6a). If a small amount of nuclease is added to this structure, the enzyme cleaves the string between the beads, leaving individual beads attached to about 200 bp of DNA ( ◗ FIGURE 11.6b). If more nuclease is added, the enzyme chews up all of the DNA between the beads and leaves a core of proteins attached to a fragment of DNA ( ◗ FIGURE 11.6c). Such experiments demonstrated that chromatin is not a random association of proteins and DNA but has a fundamental repeating structure.

The repeating core of protein and DNA produced by digestion with nuclease enzymes is the simplest level of chromatin structure, the nucleosome (see Figure 11.5).
The nucleosome is a core particle consisting of DNA wrapped about two times around an octamer of eight histone proteins (two copies each of H2A, H2B, H3, and H4), much like thread wound around a spool ( ◗ FIGURE 11.6d). The DNA in direct contact with the histone octamer is between 145 and 147 bp in length, coils around the histones in a left-handed direction, and is supercoiled. It does not wrap around the octamer smoothly; there are four bends, or kinks, in its helical structure as it winds around the histones.

◗ 11.5 Chromatin has a highly complex structure with several levels of organization.
1 At the simplest level is a double-stranded helical structure of DNA.
2 DNA is complexed with histones to form nucleosomes.
3 Each nucleosome consists of eight histone proteins around which the DNA wraps 1.65 times.
4 Nucleosomes form “beads” on DNA “string.”
5 The nucleosomes fold up to produce a 30-nm fiber…
6 …that forms loops averaging 300 nm in length.
7 The 300-nm fibers are compressed and folded to produce a 250-nm-wide fiber.
8 Tight coiling of the 250-nm fiber produces the chromatid of a chromosome.
Figure labels: DNA double helix; Histone H1; Nucleosome core of eight histone molecules. Dimension labels: 2 nm, 11 nm, 30 nm, 300 nm, 250 nm, 700 nm, 1400 nm.

The fifth type of histone, H1, is not a part of the core particle but plays an important role in the nucleosome structure. The precise location of H1 with respect to the core particle is still uncertain. The traditional view is that H1 sits outside the octamer and binds to the DNA where the DNA joins and leaves the octamer (see Figure 11.5). However, the results of recent experiments suggest that the H1 histone sits inside the coils of the nucleosome. Regardless of its position, H1 helps to lock the DNA into place, acting as a clamp around the nucleosome octamer.

Together, the core particle and its associated H1 histone are called the chromatosome, the next level of chromatin organization.
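To give a sense of scale, the nuclease-digestion result and the octamer composition described above can be combined into a back-of-the-envelope estimate. The sketch below is an illustration under simplifying assumptions: it treats the roughly 200 bp released per bead as a uniform repeat along the whole molecule and counts only the eight core histones per nucleosome. The function names are hypothetical.

```python
# Back-of-the-envelope nucleosome count for a stretch of chromatin.
# From the text: light nuclease digestion releases beads carrying about 200 bp,
# and each core particle contains eight histone proteins.
REPEAT_BP = 200           # approximate DNA per nucleosome repeat (assumed uniform)
HISTONES_PER_OCTAMER = 8  # two copies each of H2A, H2B, H3, and H4


def nucleosome_count(base_pairs, repeat_bp=REPEAT_BP):
    """Approximate number of nucleosomes packaging the given amount of DNA."""
    return base_pairs // repeat_bp


def core_histone_count(base_pairs, repeat_bp=REPEAT_BP):
    """Approximate number of core histone molecules (H1 is not counted)."""
    return nucleosome_count(base_pairs, repeat_bp) * HISTONES_PER_OCTAMER


if __name__ == "__main__":
    human_bp = 6_000_000_000  # DNA in a human cell, as given earlier in the chapter
    print(f"Nucleosomes:   about {nucleosome_count(human_bp):,}")
    print(f"Core histones: about {core_histone_count(human_bp):,}")
```

With these inputs the estimate is on the order of 30 million nucleosomes and 240 million core histone molecules per human cell. Because the spacing between nucleosomes varies among cell types, the result should be read as an order of magnitude only.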
The H1 protein is attached to between 20 and 22 bp of DNA, and the nucleosome encompasses an additional 145 to 147 bp of DNA; so about 167 bp of DNA are held within the chromatosome. Chromatosomes are located at regular intervals along the DNA molecule and are separated from one another by linker DNA, which varies in size among cell types; most cells have from about 30 bp to 40 bp of linker DNA. Nonhistone chromosomal proteins may be associated with this linker DNA, and a few also appear to bind directly to the core particle.

Higher-order chromatin structure
In chromosomes, adjacent nucleosomes are not separated by space equal to the length of the linker DNA; rather, nucleosomes fold on themselves to form a dense, tightly packed structure (see Figure 11.5). This structure is revealed when nuclei are gently broken open and their contents are examined with the use of an electron microscope; much of the chromatin that spills out appears as a fiber with a diameter of about 30 nm ( ◗ FIGURE 11.7a). A model of how this 30-nm fiber forms is shown in ◗ FIGURE 11.7b.

The next-higher level of chromatin structure is a series of loops of 30-nm fibers, each anchored at its base by proteins in the nuclear scaffold (see Figure 11.5). On average, each loop encompasses some 20,000 to 100,000 bp of DNA and is about 300 nm in length, but the individual loops vary considerably. The 300-nm fibers are packed and folded to produce a 250-nm-wide fiber. Tight helical coiling of the 250-nm fiber, in turn, produces the structure that appears in metaphase: an individual chromatid approximately 700 nm in width.

◗ 11.6 The nucleosome is the fundamental repeating unit of chromatin. The space-filling model shows that the nucleosome core particle consists of two copies each of H2A, H2B, H3, and H4, around which DNA (white) coils. (Part d, from K. Luger et al., 1997, Nature 389:251; courtesy of T. H. Richmond.)
(a) “Beads-on-a-string” view of chromatin. Labels: Core histones of nucleosome; Linker DNA.
(b) 1 A small amount of nuclease cleaves the “string” between the beads,… 2 …releasing individual beads attached to about 200 bp of DNA.
(c) 3 More nuclease destroys all of the unprotected DNA between the beads,… 4 …leaving a core of proteins attached to 145–147 bp of DNA.
(d) Space-filling model (11 nm). Labels: H2A, H2A′, H2B, H2B′, H3, H4.

◗ 11.7 Adjacent nucleosomes pack together to form a 30-nm fiber. (Part a, Barbara Hamkalo, Molecular Biology and Biochemistry, University of California at Irvine.) Labels in part b: Individual nucleosomes; 30-nm fiber.

Concepts
The nucleosome consists of a core particle of eight histone proteins and DNA, about 146 bp in length, that wraps around the core. Chromatosomes, each including the core particle plus an H1 histone, are separated by linker DNA. Nucleosomes fold up to form a 30-nm chromatin fiber, which appears as a series of loops that pack to create a 250-nm-wide fiber. Helical coiling of the 250-nm fiber produces a 700-nm-wide chromatid.

www.whfreeman.com/pierce A virtual tour of nucleosome and chromatin structure and links to additional sites on chromatin structure and research

Changes in chromatin structure
Although eukaryotic DNA must be tightly packed to fit into the cell nucleus, it must also periodically unwind to undergo transcription and replication. Evidence of the changing nature of chromatin structure is seen in the puffs of polytene chromosomes and in the sensitivity of genes to digestion by DNase I.

Polytene chromosomes are giant chromosomes found in certain tissues of Drosophila and some other organisms ( ◗ FIGURE 11.8). These large, unusual chromosomes arise when repeated rounds of DNA replication take place without accompanying cell divisions, producing thousands of copies of DNA that lie side by side. When polytene chromosomes are stained with dyes, numerous bands are revealed. Under certain conditions, the bands may exhibit chromosomal puffs, localized swellings of the chromosome.
Each puff is a region of the chromatin that has relaxed its structure, assuming a more open state. If radioactively labeled uridine (a precursor to RNA) is briefly added to a Drosophila larva, radioactivity accumulates in chromosomal puffs, indicating that they are regions of active transcription. Additionally, the appearance of puffs at particular locations on the chromosome can be stimulated by exposure to hormones and other compounds that are known to induce the transcription of genes at those locations. This correlation between the occurrence of transcription and the relaxation of chromatin at a puff site indicates that chromatin structure undergoes dynamic change associated with gene activity.

◗ 11.8 Polytene chromosomes are giant chromosomes isolated from the salivary glands of larval Drosophila. Each puff represents a region of relaxed chromatin where transcription is taking place. (Andrew Syred/Science Photo Library/Photo Researchers.)

◗ 11.9 DNase I sensitivity is correlated with the transcription of globin genes in erythroblasts of chick embryos. The U gene codes for embryonic hemoglobin; the αD and αA genes code for adult hemoglobin.
Experiment
Question: Is chromatin structure altered during transcription?
Method: Sensitivity to DNase I was tested on different tissues and at different times in development.
Key: DNA sensitive to DNase I; Highest sensitivity to DNase I.
(a) Chicken DNA: embryonic globin gene (U) and adult globin genes (αD, αA).
(b) Erythroblasts, first 24 hours: Before hemoglobin synthesis, none of the globin genes are sensitive to DNase I digestion.
(c) Erythroblasts, 5 days: After globin synthesis has begun, all genes are sensitive to DNase I, but the embryonic globin gene is the most sensitive.
(d) Erythroblasts, 14 days: In the 14-day-old embryo, when only adult hemoglobin is expressed, adult genes are most sensitive, and the embryonic gene is insensitive.
(e) Brain cells throughout development: Globin genes in the brain, which does not produce globin, remain insensitive throughout development.
Conclusion: Sensitivity of DNA to digestion by DNase I is correlated with gene expression, suggesting that chromatin structure changes during transcription.

A second piece of evidence indicating that chromatin structure changes with gene activity is sensitivity to DNase I, an enzyme that digests DNA. The ability of this enzyme to digest DNA depends on chromatin structure: when DNA is tightly bound to histone proteins, it is less sensitive to DNase I, whereas unbound DNA is more sensitive to digestion by DNase I. The results of experiments that examine the effect of DNase I on specific genes show that DNase sensitivity is correlated with gene activity. For example, globin genes code for hemoglobin in the erythroblasts (precursors of red blood cells) of chickens. The forms of hemoglobin produced in chick embryos and chickens are different and are encoded by different genes ( ◗ FIGURE 11.9a). However, no hemoglobin is synthesized in chick embryos in the first 24 hours after fertilization. If DNase I is applied to chromatin from chick erythroblasts in this first 24-hour period, all the globin genes are insensitive to digestion ( ◗ FIGURE 11.9b). From day 2 to day 6 after fertilization, after hemoglobin synthesis has begun, the globin genes become sensitive to DNase I, and the genes that code for embryonic hemoglobin are the most sensitive ( ◗ FIGURE 11.9c). After 14 days of development, embryonic hemoglobin is replaced by the adult forms of hemoglobin. The most sensitive regions now lie near the genes that produce the adult hemoglobins ( ◗ FIGURE 11.9d). DNA from brain cells, which produce no hemoglobin, remains insensitive to DNase digestion throughout development ( ◗ FIGURE 11.9e).
In summary, when genes become transcriptionally active, they also become sensitive to DNase I, indicating that the chromatin structure is more exposed during transcription.

What is the nature of the change in chromatin structure that produces chromosome puffs and DNase I sensitivity? In both cases, the chromatin relaxes; presumably the histones loosen their grip on the DNA. One process that appears to be implicated in changing chromatin structure is acetylation, a reaction that adds chemical groups called acetyls to the histone proteins. Enzymes called acetyltransferases attach acetyl groups to lysine amino acids at one end (called a tail) of the histone protein. This modification reduces the positive charges that normally exist on lysine and destabilizes the nucleosome structure, and so the histones hold the DNA less tightly. Proteins taking part in transcription can then bind more easily to the DNA and carry out transcription.

www.whfreeman.com/pierce Images of polytene chromosomes

Centromere Structure
The centromere is a constricted region of the chromosome where spindle fibers attach and is essential for proper movement of the chromosome in mitosis and meiosis (Chapter 2). The essential role of the centromere in chromosome movement was recognized by early geneticists, who observed what happens when a chromosome breaks in two. A chromosome break produces two fragments, one with a centromere and one without ( ◗ FIGURE 11.10a). In mitosis, the chromosome fragment containing the centromere attaches to spindle fibers and moves to the spindle pole, whereas the fragment lacking a centromere never connects to a spindle fiber and is usually lost because it fails to move into the nucleus of a daughter cell ( ◗ FIGURE 11.10b).

[Figure 11.10a, chromosome breakage: 1. A chromosome break produces two types of fragments, those with a centromere and those without.]
Although the centromere’s role in chromosome movement has been recognized for some time, its molecular nature has only recently been revealed. The first centromeres to be isolated and studied at the molecular level came from yeast, which have small, linear chromosomes. When molecular biologists attached sequences from yeast centromeres to plasmids (small circular DNA molecules that don’t have centromeres), the plasmids behaved in mitosis as if they were eukaryotic chromosomes. This finding indicated that the sequences from yeast, called centromeric sequences ( ◗ FIGURE 11.11), contain a functional centromere that allows segregation to take place. Centromeric sequences are the binding sites for proteins that function as the kinetochore, a complex that assembles on the centromere and to which the spindle fibers attach.

[Figure 11.10b, mitosis: 2. In mitosis, each fragment with a centromere attaches to a spindle fiber and moves to the spindle pole (anaphase of mitosis), 3. but fragments without a centromere do not attach to a spindle fiber and are usually lost from the nucleus (telophase of mitosis). After cytokinesis, chromosome fragments degrade.]
◗ 11.10 Chromosome fragments that lack a centromere are lost in mitosis.

TCACATGATGATATTTGATTTTATTATATTTTTAAAAAAAGTAAAAAATAAAAAGTAGTTTATTTTTAAAAAATAAAATTTAAAATATTTCACAAAATGATTTCCGAA
AGTGTACTACTATAAACTAAAATAATATAAAAATTTTTTTCATTTTTTATTTTTCATCAAATAAAAATTTTTTATTTTAAATTTTATAAAGTGTTTTACTAAAGGCTT
Region I | Region II (80–90 bp, more than 90% A + T) | Region III
◗ 11.11 Centromeres consist of particular sequences repeated many times. This nucleotide sequence is found in the point centromere of Saccharomyces cerevisiae. It is repeated many times in the centromeric region. Each copy of the sequence has approximately 110 bp and possesses three regions. Region I (9 bp) and region III (11 bp) are located at the ends of the sequence.
Region II, consisting of about 80 to 90 mostly A–T base pairs, is in the middle. No part of the centromeric sequence codes for a protein; specific centromere proteins bind to centromeric sequences and provide anchor sites for spindle fibers.

The centromeres of different organisms exhibit considerable variation in centromeric sequences. Some organisms have chromosomes with diffuse centromeres, and spindle fibers attach along the entire length of the chromosome. Most have chromosomes with localized centromeres; in these organisms, spindle fibers attach at a specific place on the chromosome. Localized centromeres appear constricted, but there also can be secondary constrictions at places that do not have centromeric functions.

Two major classes of localized centromeres are point centromeres and regional centromeres. Point centromeres are relatively small; the point centromere of budding yeast (Saccharomyces cerevisiae) encompasses 125 bp of DNA. Regional centromeres are found on the chromosomes of fission yeast (Schizosaccharomyces pombe) and most plants and animals. In fission yeast, centromeres consist of a central core of 4000–7000 bp. This core is flanked by blocks of centromere-specific sequences that may be repeated several times. Some of these blocks have specialized functions, such as during meiosis.

In Drosophila, Arabidopsis, and humans, centromeres span hundreds of thousands of base pairs. Most of the centromere is made up of short sequences of DNA that are repeated thousands of times in tandem. Within these repeats are “islands” of more complex sequence, primarily transposable element sequences. However, there do not appear to be any sequences that are unique to the centromere, which raises the question of what exactly determines where the centromere is. One possibility is that centromeres are defined not by a specific sequence but by a specific chromatin structure.
In support of this idea, some nucleosomes at centromeres contain variant forms of certain histone proteins.

In addition to their roles in the attachment of the spindle fibers and the movement of chromosomes, centromeres also help control the cell cycle (see p. 000 in Chapter 2). In mitosis, the spindle fibers make contact with the kinetochore of the centromere and orient the chromosomes on the metaphase plate. If anaphase is initiated before each chromosome is attached to the spindle fibers, chromosomes will not move toward the spindle pole and will be lost. Research findings indicate that the commencement of anaphase is inhibited by a signal from the centromere. This inhibitory signal disappears only after the centromere of each chromosome is attached to spindle fibers from opposite poles.

Concepts
The centromere is a region of the chromosome to which spindle fibers attach. Centromeres display considerable variation in structure. In addition to their role in chromosome movement, centromeres also help control the cell cycle by inhibiting anaphase until chromosomes are attached to spindle fibers from both poles.

Telomere Structure
Telomeres are the natural ends of a chromosome (see p. 000 in Chapter 2). Pioneering work by Hermann Muller (on fruit flies) and Barbara McClintock (on corn) showed that chromosome breaks produce unstable ends that have a tendency to stick together and allow the chromosome to be degraded. Because attachment and degradation don’t happen to the ends of a chromosome that has telomeres, each telomere must serve as a cap that stabilizes the chromosome, much like the plastic tips on the ends of a shoelace that prevent the lace from unraveling.

Telomeres also provide a means of replicating the ends of the chromosome. The enzymes that synthesize DNA are unable to replicate the last few nucleotides at the end of each newly synthesized DNA strand (discussed in Chapter 12).
Consequently, a chromosome should get shorter each time its DNA is synthesized, and this progressive shortening would eventually damage genes on the chromosome. Indeed, such chromosome shortening does occur in somatic cells, which are capable of only a limited number of divisions. Germ cells and cells in single-celled organisms, however, must divide continually. Chromosomes in these cells don’t progressively shorten and self-destruct, because the cells possess an enzyme called telomerase that replicates the telomeres. The ability of telomerase to replicate a chromosome end depends on the unique molecular structure of the telomere. We will examine this mechanism of replication in Chapter 12.

Telomeres were first isolated from the protozoan Tetrahymena thermophila and were found to possess multiple copies of the sequence:

5′–CCCCAA–3′
3′–GGGGTT–5′

Telomeres have now been isolated from protozoans, plants, humans, and other organisms; most are similar in structure (Table 11.2). These telomeric sequences usually consist of a series of cytosine nucleotides followed by several adenine or thymine nucleotides or both, taking the form 5′–Cn(A or T)m–3′, where n is 2 or greater and m is from 1 to 4. For example, the repeating unit in human telomeres is CCCTAA, which may be repeated from 250 to 1500 times. The sequence is always oriented with the string of Cs and Gs toward the end of the chromosome, as shown here:

end of chromosome ← 5′–CCCTAA–3′ → toward centromere
                    3′–GGGATT–5′

[Figure 11.12: DNA sequence at end of chromosome. C-rich strand 5′–CCCTAACCCTAA; G-rich strand 3′–GGGATTGGGATTGGGATT.]
◗ 11.12 DNA at the ends of eukaryotic chromosomes consists of telomeric sequences.

The G-rich strand often protrudes beyond the complementary C-rich strand at the end of the chromosome ( ◗ FIGURE 11.12). The length of the telomeric sequence varies from chromosome to chromosome and from cell to cell, suggesting that each telomere is a dynamic structure that actively grows and shrinks.
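The general form of a telomeric repeat given above can be checked mechanically. The following is a minimal Python sketch and not part of the original text: the function names are hypothetical, and the pattern encodes only the rule stated here (n of 2 or greater, m from 1 to 4) for the C-rich strand read 5′ to 3′.

```python
import re

# One repeat unit of the C-rich strand, read 5' to 3':
# two or more C's followed by one to four A's or T's.
TELOMERE_UNIT = re.compile(r"C{2,}[AT]{1,4}")

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def fits_general_form(unit: str) -> bool:
    """True if the unit matches 5'-Cn(A or T)m-3' with n >= 2 and 1 <= m <= 4."""
    return TELOMERE_UNIT.fullmatch(unit.upper()) is not None


def complementary_strand(unit: str) -> str:
    """Base-by-base complement, written 3' to 5' beneath the input strand."""
    return "".join(COMPLEMENT[base] for base in unit.upper())


def telomere_length_bp(unit: str, copies: int) -> int:
    """Length in base pairs of a tract made of `copies` tandem repeats of `unit`."""
    return len(unit) * copies
```

For the human unit CCCTAA, `fits_general_form` returns True, `complementary_strand` gives GGGATT, and the stated range of 250 to 1500 copies corresponds to 1500 to 9000 bp. A unit that begins with a different base, such as the GCCTAA listed for Caenorhabditis in Table 11.2, does not match, which is consistent with the wording that telomeric sequences "usually" take this form.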
The telomeres of Drosophila chromosomes are different in structure. They consist of multiple copies of the two different retrotransposons (discussed later in this chapter), Het-A and Tart, arranged in tandem repeats. Apparently, in Drosophila, loss of telomere sequences during replication is balanced by transposition of additional copies of the Het-A and Tart elements.

Farther away from the end of the chromosome, from several thousand to hundreds of thousands of base pairs form telomere-associated sequences. They, too, contain repeated sequences, but the repeats are longer, more varied, and more complex than those found in telomeric sequences.

Table 11.2 DNA sequences typically found in telomeres of various organisms

Organism                      Sequence
Tetrahymena (protozoan)       5′–CCCCAA–3′ / 3′–GGGGTT–5′
Oxytricha (protozoan)         5′–CCCCAAAA–3′ / 3′–GGGGTTTT–5′
Trypanosoma (protozoan)       5′–CCCTAA–3′ / 3′–GGGATT–5′
Saccharomyces (yeast)         5′–C2–3 ACA1–6–3′ / 3′–G2–3 TGT1–6–5′
Neurospora (fungus)           5′–CCCTAA–3′ / 3′–GGGATT–5′
Caenorhabditis (nematode)     5′–GCCTAA–3′ / 3′–CGGATT–5′
Bombyx (insect)               5′–CCTAA–3′ / 3′–GGATT–5′
Vertebrate                    5′–CCCTAA–3′ / 3′–GGGATT–5′
Arabidopsis (plant)           5′–CCCTAAA–3′ / 3′–GGGATTT–5′

Source: V. A. Zakian, Science 270(1995): 1602.

Concepts
A telomere is the stabilizing end of a chromosome. At the end of each telomere are many short telomeric sequences. Longer, more complex telomere-associated sequences are found adjacent to the telomeric sequences.

www.whfreeman.com/pierce More detailed information on telomeres

Variation in Eukaryotic DNA Sequences
Prokaryotic and eukaryotic cells differ dramatically in the amount of DNA per cell, a quantity termed an organism’s C value (Table 11.3). Each cell of a fruit fly, for example, contains 35 times the amount of DNA found in a cell of the bacterium E. coli. In general, eukaryotic cells contain more DNA than that of prokaryotes, but variability in the C values of different eukaryotes is huge.
Human cells contain more than 10 times the amount of DNA found in Drosophila cells, whereas some salamander cells contain 20 times as much DNA as that of human cells. Clearly, these differences in C value cannot be explained simply by differences in organismal complexity. So what is all this extra DNA in eukaryotic cells doing? We do not yet have a complete answer to this question, but examination of DNA sequences has revealed that eukaryotic DNA has complexity that is absent from prokaryotic DNA.

Table 11.3 Genome sizes of various organisms

Organism                              Approximate Genome Size (bp)
λ (bacteriophage)                     50,000
E. coli (bacterium)                   4,600,000
Saccharomyces cerevisiae (yeast)      13,500,000
Arabidopsis thaliana (plant)          100,000,000
Drosophila melanogaster (insect)      140,000,000
Homo sapiens (human)                  3,000,000,000
Zea mays (corn)                       4,500,000,000
Amphiuma (salamander)                 765,000,000,000

Denaturation and Renaturation of DNA
The first clue that the DNA of eukaryotes contains several types of sequences came from the results of studies in which double-stranded DNA was separated and then allowed to reassociate. When double-stranded DNA in solution is heated, the hydrogen bonds that hold the two strands together are weakened and, with enough heat, the two nucleotide strands separate completely, a process called denaturation or melting ( ◗ FIGURE 11.13).

[Figure 11.13, step 1: If a solution of double-stranded DNA is slowly heated, the nucleotide strands separate.]

DNA is typically denatured within a narrow temperature range. The midpoint of this range, the melting temperature (Tm), depends on the base sequence of a particular sample of DNA: G–C base pairs have three hydrogen bonds, whereas A–T base pairs only have two; so the separation of G–C pairs requires more energy than does the separation of A–T pairs. A DNA molecule with a higher percentage of G–C pairs will therefore have a higher Tm than that of DNA with more A–T pairs.
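The link between base composition and Tm can be illustrated by counting hydrogen bonds. This is a small Python sketch under stated assumptions: the function names are hypothetical, the input is one strand of a fully paired duplex, and the code only tallies bonds (three per G–C pair, two per A–T pair). It does not compute a melting temperature.

```python
# Hydrogen bonds contributed by the base pair that each base takes part in.
HYDROGEN_BONDS = {"G": 3, "C": 3, "A": 2, "T": 2}


def _validated(strand: str) -> str:
    """Upper-case the strand and reject empty input or non-ACGT characters."""
    strand = strand.upper()
    if not strand or any(base not in HYDROGEN_BONDS for base in strand):
        raise ValueError("strand must be a non-empty string of A, C, G, T")
    return strand


def hydrogen_bonds(strand: str) -> int:
    """Total hydrogen bonds in the duplex formed by `strand` and its complement."""
    return sum(HYDROGEN_BONDS[base] for base in _validated(strand))


def gc_fraction(strand: str) -> float:
    """Fraction of base pairs in the duplex that are G-C pairs."""
    strand = _validated(strand)
    return sum(base in "GC" for base in strand) / len(strand)
```

For two duplexes of equal length, the one with the larger `gc_fraction` has more hydrogen bonds to break (for example, GGCC has 12 and AATT has 8), which is the reasoning behind its higher Tm.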
The denaturation of DNA by heating is reversible; if single-stranded DNA is slowly cooled, single strands will collide and hydrogen bonds will again form between complementary base pairs, producing double-stranded DNA (see Figure 11.13). This reaction, called renaturation or reannealing, takes place in two steps. First, single strands in solution collide randomly with their complementary strands. Second, hydrogen bonds form between complementary bases.

[Figure 11.13, step 2: If the solution is then cooled, the complementary single strands will come back together (reanneal).]
◗ 11.13 The slow heating of DNA causes the two strands to separate (denature).

Two single-stranded molecules of DNA from different sources will anneal if they are complementary, a process termed hybridization. For hybridization to take place, the two strands do not have to be complementary at all their bases, just at enough bases to hold the two strands together. The extent of hybridization can be used to measure the similarity of nucleic acids from two different sources and is a common tool for assessing evolutionary relationships. The rate at which hybridization takes place also provides information about the sequence complexity of DNA (see next subsection).

Renaturation Reactions and C0t Curves
In a typical renaturation reaction, DNA molecules are first sheared into fragments several hundred base pairs in length. Next, the fragments are heated to about 100°C, which causes the DNA to denature. The solution is then cooled slowly, and the amount of renaturation is measured by observing optical absorbance. Double-stranded DNA absorbs less UV light than does single-stranded DNA; so the amount of renaturation can be monitored by shining a UV light through the solution and measuring the amount of the light absorbed.

The amount of renaturation depends on two critical factors: (1) initial concentration of single-stranded DNA (C0) and (2) amount of time allowed for renaturation (t). Other things being equal, there will be more renaturation at higher concentrations of DNA, because high concentrations increase the likelihood that the two complementary strands will collide. There will also be more renaturation with increasing time, because there are more opportunities for two complementary sequences to collide. These two factors together form a parameter called C0t, which equals the initial concentration multiplied by the renaturation time (C0 × t = C0t).

A plot of the fraction of single-stranded DNA as a function of C0t during a renaturation reaction is called a C0t curve. A typical C0t curve for a prokaryotic organism is shown in ◗ FIGURE 11.14. The upper left-hand side of the curve represents the start of the renaturation reaction, when all of the DNA is single stranded, and so the proportion of single-stranded DNA is 1. As the reaction proceeds, single-stranded DNA pairs to form double-stranded DNA, represented by the decreasing fraction of single-stranded DNA. At the end of the reaction, the proportion of single-stranded DNA is 0, because all of the DNA is now double stranded. The value at which half of the DNA is reannealed is called C0t1/2.

The rate of renaturation also depends on the size and complexity of the DNA molecules used. Consider the following analogy. Suppose we distribute 100 cards equally among the students in a class. We ask each student to write his or her name on the cards, and we put all the cards in a hat. We then randomly draw two cards from the hat and see if the names on the two cards match. If they don’t match, we put them back in the hat; if they do match, we remove them, and we continue drawing until all the cards have been removed. If there are only four students in the class, each student will receive 25 cards. Because each student’s name is on 25 cards, the chance of drawing two cards that match is high, and we will quickly empty the hat.
If we do the same exercise in another class with 50 students, again using 100 cards, each student’s name will appear on only two cards, and the chance of removing two cards with the same name is much lower. Thus, it will take longer to empty the hat. This exercise resembles what occurs in the renaturation reaction. If we start with the same total amount of DNA, but there are only a few different sequences in the DNA, a chance collision between two complementary fragments is more likely to occur than if there were many different sequences. Therefore DNA from organisms with larger genomes will have a larger C0t1/2 value.

[Figure 11.14: fraction of DNA remaining single stranded (from 1 at the start of the reaction, denatured, to 0 at the end of the reaction, renatured) plotted against concentration × time (C0t, 0.0001 to 100); C0t1/2 is marked where the fraction is 1/2.]
◗ 11.14 A C0t curve represents the fraction of DNA remaining single stranded in a renaturation reaction, plotted as a function of DNA concentration × time (C0t). This graph is a typical C0t curve for a prokaryotic organism.

Thus far, we have considered renaturation reactions in which each DNA sequence is present only once in each molecule. If some sequences are present in multiple copies, these sequences will be more likely to collide with a complementary copy, and renaturation of these sequences will be rapid. Think about our analogy of drawing names from a hat. Imagine that we have 50 students and 100 cards; each student gets two cards. This time, the students write only their first names on the cards. Again, we place the cards in the hat and draw out two cards at random. If there are five students in the class named Scott, this name will appear on ten cards; so the chance of drawing out two cards at random bearing the name Scott is fairly high. On the other hand, if there is only one Susan in the class, this name will appear on only two cards, and the chance of drawing out two cards with the name Susan is low.
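The dependence of C0t1/2 on genome size and copy number can be sketched numerically. The text gives no equation, so the Python sketch below assumes the ideal second-order rate law often used for renaturation, in which the fraction of DNA remaining single stranded is 1 / (1 + k × C0t). The rate constant k, the `scale` parameter, and the rule that k is proportional to copy number divided by genome size are all illustrative assumptions, not values from the text.

```python
def fraction_single_stranded(c0t: float, k: float) -> float:
    """Fraction of DNA still single stranded under an ideal second-order model."""
    if c0t < 0 or k <= 0:
        raise ValueError("c0t must be >= 0 and k must be > 0")
    return 1.0 / (1.0 + k * c0t)


def c0t_half(k: float) -> float:
    """C0t value at which half of the DNA has reannealed (fraction = 1/2)."""
    if k <= 0:
        raise ValueError("k must be > 0")
    return 1.0 / k


def rate_constant(genome_size_bp: float, copies: int = 1, scale: float = 1.0) -> float:
    """Hypothetical rate constant: at a fixed total DNA concentration, the
    concentration of any one sequence rises with its copy number and falls
    with genome size, so complementary collisions become more or less likely."""
    if genome_size_bp <= 0 or copies < 1:
        raise ValueError("genome_size_bp must be > 0 and copies >= 1")
    return scale * copies / genome_size_bp
```

Under these assumptions, `c0t_half(rate_constant(4_600_000))` is smaller than `c0t_half(rate_constant(3_000_000_000))`, matching the statement that larger genomes have larger C0t1/2 values, and raising `copies` lowers C0t1/2, matching the faster renaturation of sequences present in multiple copies.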
The cards with Scott match up more quickly than the cards with Susan, because there are more copies with the name Scott. Similarly, in a renaturation reaction, if some sequences of DNA are present in multiple copies, they will renature more quickly.

Concepts
When double-stranded DNA is heated, it denatures, separating into single-stranded molecules. On cooling, these single-stranded molecules pair and re-form double-stranded DNA, a process called renaturation. A C0t curve is a plot of a renaturation reaction.

Types of DNA Sequences in Eukaryotes
For most eukaryotic organisms, C0t curves similar to the one presented in ◗ FIGURE 11.15 are produced and indicate that eukaryotic DNA consists of at least three types of sequences. Slowly renaturing DNA consists of sequences that are present only once, or at most a few times, in the genome. This nonrepetitive, unique-sequence DNA includes sequences that code for proteins, as well as a great deal of DNA whose function is unknown. The more rapidly renaturing DNA represents two kinds of repetitive DNA, that is, DNA sequences that exist in multiple copies. Although not identical, these copies are similar enough to reanneal.

[Figure 11.15: fraction of DNA remaining single stranded (1.0, denatured, to 0, renatured) plotted against C0t (0.0001 to 10,000), with successive steps labeled highly repetitive, moderately repetitive, and unique.]
◗ 11.15 A typical C0t curve for a eukaryotic organism contains several steps. The first step in the curve represents DNA renaturing at very low C0t values, because these sequences are present in many copies (highly repetitive). The second step represents DNA renaturing at intermediate C0t values; these sequences are present in an intermediate number of copies (moderately repetitive). The last step represents DNA that renatures slowly; these sequences are present singly or in few copies (unique).

Moderately repetitive DNA typically consists of sequences from 150 to 300 bp in length (although they may be longer) that are repeated many thousands of times. Some of these sequences perform important functions for the cell; for example, the genes for ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs) make up a part of the moderately repetitive DNA. However, much of the moderately repetitive DNA has no known function in the cell.

Moderately repetitive DNA itself is of two types of repeats. Tandem repeat sequences appear one after another and tend to be clustered at a few locations on the chromosomes. Interspersed repeat sequences are scattered throughout the genome. An example of an interspersed repeat is the Alu sequence, each of which consists of about 200 bp. The Alu sequence is present more than a million times in the human genome and makes up about 11% of each person’s DNA. Short repeats, such as the Alu sequences, are called SINEs (short interspersed elements). Longer interspersed repeats consisting of several thousand base pairs are called LINEs (long interspersed elements). Most interspersed repeats are transposable genetic elements, sequences that can multiply and move (see next section).

The other major class of repetitive DNA is highly repetitive DNA. These short sequences, often less than 10 bp in length, are present in hundreds of thousands to millions of copies that are repeated in tandem and clustered in certain regions of the chromosome, especially at centromeres and telomeres. Highly repetitive DNA is sometimes called satellite DNA, because it has a different base composition from those of the other DNA sequences and separates as a satellite fraction when centrifuged at high speeds. Highly repetitive DNA is rarely transcribed into RNA. Although these sequences may contribute to centromere and telomere function, most highly repetitive DNA has no known function.

Concepts
Eukaryotic DNA comprises three major classes: unique-sequence DNA, moderately repetitive DNA, and highly repetitive DNA.
Unique-sequence DNA consists of sequences that exist in one or only a few copies; moderately repetitive DNA consists of sequences that may be several hundred base pairs in length and is present in thousands to hundreds of thousands of copies. Highly repetitive DNA consists of very short sequences repeated in tandem and present in hundreds of thousands to millions of copies.

The Nature of Transposable Elements
Transposable elements are mobile DNA sequences found in the genomes of all organisms. In many genomes, they are quite abundant: for example, they make up at least 50% of human DNA. Most transposable elements are able to insert at many different locations, relying on mechanisms that are distinct from homologous recombination. They often cause mutations, either by inserting into another gene and disrupting it or by promoting DNA rearrangements such as deletions, duplications, and inversions (see Chapter 9).

General Characteristics of Transposable Elements
There are many different types of transposable elements: some have simple structures, encompassing only those sequences necessary for their own transposition (movement), whereas others have complex structures and encode a number of functions not directly related to transposition. Despite this variation, many transposable elements have certain features in common.

Short, flanking direct repeats of 3 to 12 base pairs are present on both sides of most transposable elements. They are not a part of a transposable element and do not travel with it. Rather, they are generated in the process of transposition, at the point of insertion. The sequences of these repeats vary, but the length is constant for each type of transposable element.

The presence of flanking direct repeats indicates that staggered cuts are made in the target DNA when a transposable element inserts itself, as shown in ◗ FIGURE 11.16. The staggered cuts leave short, single-stranded pieces of DNA on either side of the transposable element. Replication of the single-stranded DNA then creates the flanking direct repeats.

[Figure 11.16: 1. Staggered cuts are made in the target DNA (top strand CGTCGATAG, bottom strand GCAGCTATC). 2. A transposable element inserts itself into the DNA. 3. The staggered cuts leave short, single-stranded pieces of DNA. 4. Replication of this single-stranded DNA (gaps filled in by DNA polymerase) creates the flanking direct repeats.]
◗ 11.16 Flanking direct repeats are generated when a transposable element inserts into DNA.

At the ends of many, but not all, transposable elements are terminal inverted repeats, which are sequences from 9 to 40 bp in length that are inverted complements of one another. For example, the following sequences are inverted repeats:

5′–ACAGTTCAG . . . CTGAACTGT–3′
3′–TGTCAAGTC . . . GACTTGACA–5′

On the same strand, the two sequences are not simple inversions, as their name might imply; rather, they are both inverted and complementary. (Notice that the sequence from left to right in the top strand is the same as the sequence from right to left in the bottom strand.) Terminal inverted repeats are recognized by enzymes that carry out transposition and are required for transposition to take place. ◗ FIGURE 11.17 summarizes the general characteristics of transposable elements.

[Figure 11.17: (a) a transposable element with a terminal inverted repeat at each end, bordered on each side by a flanking direct repeat; (b) the symbols used for these repeats.]
◗ 11.17 Many transposable elements have common characteristics. (a) Most transposable elements generate flanking direct repeats on each side of the point of insertion into target DNA. Many transposable elements also possess terminal inverted repeats. (b) These representations of direct and indirect repeats are used in illustrations throughout this chapter.

Concepts
Transposable elements are mobile DNA sequences that often cause mutations. There are many different types of transposable elements; most generate short, flanking direct repeats at the target site as they insert. Many transposable elements also possess short terminal inverted repeats.

Transposition
Transposition is the movement of a transposable element from one location to another. Although our understanding of transposition is still incomplete, it’s clear that, rather than a single mechanism, several different mechanisms are required for transposition in both prokaryotic and eukaryotic cells. Nevertheless, all types of transposition have several features in common: (1) staggered breaks are made in the target DNA (see Figure 11.16); (2) the transposable element is joined to single-stranded ends of the target DNA; and (3) DNA is replicated at the single-strand gaps.

Mechanisms of Transposition
Some transposable elements transpose through DNA intermediates, whereas others use RNA intermediates. Among those that transpose through DNA, transposition may be replicative or nonreplicative. In replicative transposition, a new copy of the transposable element is introduced at a new site while the old copy remains behind at the original site; the number of copies of the transposable element increases. In nonreplicative transposition, the transposable element excises from the old site and inserts at a new site without any increase in the number of its copies. Nonreplicative transposition requires replication of only the few nucleotides that constitute the direct repeats.
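The way staggered cuts and gap filling produce flanking direct repeats, and the meaning of "inverted and complementary" for terminal repeats, can be sketched in a few lines of Python. This is an illustrative model of the top strand only, not the chapter's own method: the function names are hypothetical, the element's internal sequence is invented, and the cut position and 5-bp stagger are one reading of the target sequence CGTCGATAG in Figure 11.16.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Complement every base and reverse the order (the 'inverted complement')."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def has_terminal_inverted_repeats(element: str, length: int) -> bool:
    """True if the last `length` bases are the inverted complement of the first."""
    if length < 1 or 2 * length > len(element):
        raise ValueError("repeat length does not fit within the element")
    return element[-length:].upper() == reverse_complement(element[:length])


def insert_element(target: str, element: str, cut_pos: int, stagger: int):
    """Insert `element` into the top strand of `target` after staggered cuts.

    The two strands are cut `stagger` bases apart, so the target site
    target[cut_pos:cut_pos + stagger] is left single stranded on each side of
    the element; filling the gaps duplicates it as a flanking direct repeat.
    Returns the new top strand and the duplicated site.
    """
    if cut_pos < 0 or stagger < 1 or cut_pos + stagger > len(target):
        raise ValueError("cuts must fall inside the target sequence")
    site = target[cut_pos:cut_pos + stagger]
    return target[:cut_pos + stagger] + element + target[cut_pos:], site


# Ends taken from the inverted-repeat example in the text; the middle is invented.
ELEMENT = "ACAGTTCAG" + "GGGTTTAAACCC" + "CTGAACTGT"
```

With `insert_element("CGTCGATAG", ELEMENT, 2, 5)`, the target site TCGAT appears once on each side of the element, giving CGTCGAT, then the element, then TCGATAG. The repeat is made from target DNA rather than carried by the element, which is why its sequence varies between insertions while its length stays fixed for a given element type.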
Replicative transposition
Replicative transposition, sometimes called copy-and-paste transposition, can be either between two different DNA molecules or between two parts of the same DNA molecule. ◗ FIGURE 11.18 summarizes the steps of transposition between two circular DNA molecules. Before transposition (see Figure 11.18a), the transposable element is on one molecule. In the first step, the two DNA molecules are joined, and the transposable element is replicated, producing the cointegrate structure that consists of molecules A + B fused together with two copies of the transposable element (see Figure 11.18b). In a moment, we’ll see how the copy is produced, but let’s first look at the second step of the replicative transposition process. After the cointegrate has formed, crossing over at regions within the transposable elements produces two molecules, each with a copy of the transposable element (see Figure 11.18c). This second step is known as resolution of the cointegrate.

How are the steps of replicative transposition (cointegrate formation and resolution) brought about? Cointegrate formation requires four events. First, a transposase enzyme (often encoded by the transposable element) makes single-strand breaks at each end of the transposable element and on either side of the target sequence where the element inserts ( ◗ FIGURE 11.19a and b). Second, the free ends of the transposable element attach to the free ends of the target sequence ( ◗ FIGURE 11.19c). Third, replication takes place on the single-stranded templates, beginning at the 3′ ends of the single strands and proceeding through the transposable element ( ◗ FIGURE 11.19d and e). This replication creates the cointegrate, with its two copies of both the transposable element and the sequence at the target site, which is now on one side of each copy ( ◗ FIGURE 11.19f). The enzymes that perform the replication and ligation functions are cellular enzymes that function in replication and DNA repair.
Fourth, after the cointegrate has formed, it undergoes resolution, which requires crossing over between sites located within the transposon. Resolution gives rise to two copies of the transposable element ( ◗ FIGURE 11.19g). The resolution step is brought about by resolvase enzymes (encoded in some cases by the transposable element and in other cases by a cellular gene) that function in homologous recombination.

◗ 11.18 Replicative transposition increases the number of copies of the transposable element. (a) Before transposition takes place, a single copy of the transposable element is found on only one molecule. (b) The two DNA molecules are joined and the transposable element is replicated, producing the A + B cointegrate. (c) Resolution: crossing over at a region within the transposable element results in two separate molecules, each with a copy of the transposable element.

◗ 11.19 Replicative transposition requires single-strand breaks, replication, and resolution. Steps 1 to 5: (1) One copy of the transposable element is present. (2) A transposase enzyme makes single-strand breaks at each end of the transposable element (3) and at the ends of the target sequence. (4) The free ends of the transposable element attach to the free ends of the target sequence. (5) Replication takes place on the single-stranded templates, beginning at the free ends of the single strands.

Nonreplicative transposition

In nonreplicative transposition, the transposable element moves from one site to another without replication of the entire transposable element, although short sequences in the target DNA are replicated, generating flanking direct repeats. Sometimes referred to as cut-and-paste transposition, nonreplicative transposition requires only that the transposable element and the target DNA be cleaved and joined together.
Cleavage requires a transposase enzyme produced by the transposable element. The joining of the transposable element and target DNA is probably carried out by normal replication and repair enzymes. If a transposable element moves by nonreplicative transposition, how does it increase in copy number in the genome? The answer comes from examining the fate of the original site of the element. After excision, a break will be left at the original insertion site. Such breaks are harmful to the cell, and so they are repaired efficiently (see Chapter 17). One common method of repair is to copy sequence information from a homologous template; the sister chromatid is the preferred template for this type of repair. Before transposition, both sisters will have a copy of the transposable element. After excision from one chromatid, repair of the break can result in copying the transposable element sequence off the sister chromatid. Thus, the transposable element is moved from the original site to a new site, but a copy is restored to the original site by DNA repair mechanisms. Transposition through an RNA intermediate Eukaryotic transposable elements that transpose through RNA intermediates are called retrotransposons. A retrotransposon in DNA ( ◗ FIGURE 11.20a) is first transcribed into an RNA sequence ( ◗ FIGURE 11.20b), which may be processed. The processed RNA undergoes reverse transcription by a reverse transcriptase enzyme to produce a double-stranded DNA copy of the RNA ( ◗ FIGURE 11.20c). Staggered cuts are made in the target DNA ( ◗ FIGURE 11.20d), and the DNA copy of the retrotransposon inserts into the genome ( ◗ FIGURE 11.20e). Replication fills in the short gaps produced by the staggered cuts, generating flanking direct repeats on both sides of the retrotransposon. Concepts Transposition may be through either a DNA or an RNA intermediate. 
In replicative transposition, a new copy of the transposable element inserts in a new location and the old copy stays behind; in nonreplicative transposition, the old copy excises from the old site and moves to a new site. Transposition through an RNA intermediate requires reverse transcription, in which a retrotransposon is transcribed into RNA, the RNA is copied into DNA, and the new DNA copy is integrated into the target site.

◗ 11.19 (continued) Steps 6 to 10: (6) Replication proceeds through the transposable element and the target sequences (7) to produce the A + B cointegrate, with two copies of the transposable element and two copies of the target sequence. (8) Crossing over between sites within the transposable element (9) gives rise to two separate copies of the transposable element. (10) The new copy is flanked by direct repeats of the target sequence.

◗ 11.20 Retrotransposons transpose through RNA intermediates. (a, b) The retrotransposon sequence, flanked by direct repeats, is transcribed into RNA (c) and undergoes reverse transcription to produce double-stranded DNA. (d) Staggered cuts are made in the target DNA. (e) The retrotransposon integrates into the host DNA at the new site, so that the genome holds both the old copy and the new copy. Replication fills in the gaps at the site of insertion and creates the flanking direct repeats.

The Mutagenic Effects of Transposition

Because transposable elements may insert into other genes and disrupt their function, transposition is generally mutagenic. In fact, more than half of all spontaneously occurring mutations in Drosophila result from the insertion of a transposable element in or near a functional gene. Although most of these mutations are detrimental, transposition may occasionally activate a gene or change the phenotype of the cell in a beneficial way. Additionally, a transposable element may carry information that benefits the cell, such as antibiotic resistance conferred by genes carried on bacterial transposable elements.

In 1991, Francis Collins and his colleagues discovered a 31-year-old man with neurofibromatosis caused by a transposition of the Alu sequence. Neurofibromatosis is a disease that produces numerous tumors of the skin and nerves; it results from mutations in a gene called NF1. Collins and his colleagues found a copy of Alu in one of the introns of this man’s NF1 gene. The Alu had caused an RNA splicing error, with the result that one of the exons was left out of the NF1 mRNA. The absence of the exon caused a shift in the reading frame and resulted in an abnormal protein, which eventually caused the neurofibromatosis. Examination of DNA from the man’s mother and father revealed that the Alu sequence was not present in their NF1 genes; the insertion was new. Cases of hemophilia and muscular dystrophy also have been traced to mutations caused by transposition.

Because transposition entails the exchange of DNA sequences and recombination, it often leads to DNA rearrangements. Homologous recombination between multiple copies of transposons also leads to duplications, deletions, and inversions, as shown in ◗ FIGURE 11.21. The Bar mutation in Drosophila (see Figures 9.7 and 9.8) is a tandem duplication thought to have arisen through homologous recombination between two copies of a transposable element present in different locations on the X chromosome. DNA rearrangements can also be caused by excision of transposable elements in a cut-and-paste transposition. If the broken DNA is not repaired properly, a chromosome rearrangement can be generated.
If it is not repaired at all, the acentric fragment will be lost, resulting in a deletion. This type of chromosome breakage led to the first discovery of transposable elements by Barbara McClintock (described below). She named the gene that appeared at these sites Dissociation because of the tendency for it to cause chromosome breakage and loss of a fragment.

The Regulation of Transposition

Many transposable elements move through replicative transposition and increase in number with each transposition. As the number of copies of the transposon increases, the rate of transposition increases because the concentration of transposase in the cell becomes greater (remember that transposase is produced by the transposon). In the absence of mechanisms to restrict transposition, the number of copies of transposable elements would increase continuously, and the host DNA would be harmed by the resulting high rate of mutation (caused by frequent insertion of transposable elements). Furthermore, large amounts of energy and resources would be required to replicate the “extra” DNA in the proliferating transposable elements. For these reasons, it isn’t surprising that cells have evolved mechanisms to regulate transposition, just as they have mechanisms to regulate gene expression (see Chapter 16).

When a transposable element first enters a cell that possesses no other copies of that element, transposition is frequent. As the number of copies of the transposable element increases, the frequency of transposition diminishes until a steady-state number of transposable elements is reached. This regulation of transposition means that most cells have a characteristic number of copies of a particular transposable element.

Many transposable elements regulate transposition by limiting the production of the transposase enzyme required for movement. In some cases, transcription of the transposase gene is regulated but, more frequently, translation of the transposase mRNA is controlled (see p. 000 in Chapter 16). Other regulatory mechanisms do not affect the level of transposase; rather, they directly inhibit the transposition event.

◗ 11.21 Chromosome rearrangements are often generated by transposition. (a) Pairing by looping and crossing over between two transposable elements oriented in the same direction leads to chromosome deletion. (b) Pairing by bending and crossing over between two transposable elements oriented in opposite directions leads to a chromosome inversion. (c) Misalignment and unequal exchange between transposable elements located on sister chromatids leads to one chromosome with a deletion and one chromosome with a duplication.

Concepts

Transposable elements frequently cause mutations and DNA rearrangements. Many transposable elements regulate their own transposition, either by controlling the amount of transposase produced or by direct inhibition of the transposition event.

The Structure of Transposable Elements

Bacteria and eukaryotic organisms possess a number of different types of transposable elements, the structures of which vary extensively. In this section, we consider the structures of representative types of transposable elements.

◗ 11.22 Insertion sequences are simple transposable elements found in bacteria. (IS1, 768 bp: a transposase gene between 23-bp terminal inverted repeats, with 9-bp flanking direct repeats.)

Transposable Elements in Bacteria

The two major groups of bacterial transposable elements are (1) simple transposable elements that carry only the information required for movement and (2) more-complex transposable elements that contain DNA sequences not directly related to transposition.
Insertion sequences

The simplest type of transposable element in bacterial chromosomes and plasmids is an insertion sequence (IS). This type of element carries only the genetic information necessary for its movement. Insertion sequences are common constituents of bacteria and plasmids. They are designated by IS, followed by an identifying number. For example, IS1 is a common insertion sequence found in E. coli.

Insertion sequences are typically from 800 to 2000 bp in length and possess the two hallmarks of transposable elements: terminal inverted repeats and the generation of flanking direct repeats at the site of insertion. Most insertion sequences contain one or two genes that code for transposase. IS1, a typical insertion sequence, is 768 nucleotide pairs long and has terminal inverted repeats of 23 bp at each end ( ◗ FIGURE 11.22). The flanking direct repeats created by IS1 are each 9 bp long, the most common length for flanking direct repeats. Table 11.4 summarizes these features for several bacterial insertion sequences.

Table 11.4 Structures of some common insertion sequences

Insertion Sequence   Total Length (bp)   Inverted Repeats (bp)   Flanking Direct Repeats (bp)
IS1                  768                 23                      9
IS2                  1327                41                      5
IS4                  1428                18                      11 or 12
IS5                  1195                16                      4

Source: B. Lewin, Genes, 3d ed. (New York: Wiley, 1987), p. 591.

Composite transposons

Any segment of DNA that becomes flanked by two copies of an insertion sequence may itself transpose and is called a composite transposon. Each type of composite transposon is designated by the abbreviation Tn, followed by a number. Tn10 is a composite transposon of about 9300 bp that carries a gene (about 6500 bp) for tetracycline resistance between two IS10 insertion sequences ( ◗ FIGURE 11.23). The insertion sequences have terminal inverted repeats; so the composite transposon also ends in inverted repeats. Composite transposons also generate flanking direct repeats at their sites of insertion (see Figure 11.23). The insertion sequences at the ends of a composite transposon may be in the same orientation or they may be inverted relative to one another (as in Tn10).

The insertion sequences at the ends of a composite transposon are responsible for transposition. The DNA between the insertion sequences is not required for movement and may carry additional information (such as antibiotic resistance). Presumably, composite transposons evolve when one insertion sequence transposes to a location close to another of the same type. The transposase produced by one of the IS sequences catalyzes the transposition of both insertion sequences, allowing them to move together and carry along the DNA that lies between them. In some composite transposons (such as Tn10), one of the insertion sequences may be defective; so its movement depends on the transposase produced by the other. Characteristics of several composite transposons are listed in Table 11.5.

◗ 11.23 Tn10 is a composite transposon in bacteria. (Tn10, 9300 bp: IS10L, the tetR gene, and IS10R, with flanking direct repeats.)

Table 11.5 Characteristics of several composite transposons

Composite Transposon   Total Length (bp)   Associated IS Elements   Other Genes Within the Transposon
Tn9                    2500                IS1                      Chloramphenicol resistance
Tn10                   9300                IS10                     Tetracycline resistance
Tn5                    5700                IS50                     Kanamycin resistance
Tn903                  3100                IS903                    Kanamycin resistance

Noncomposite transposons

As already stated, insertion sequences carry only information for their own movement, whereas bacterial transposons are more complex. Some transposable elements in bacteria lack insertion sequences and are referred to as noncomposite transposons. For instance, Tn3 is a noncomposite transposon that is about 5000 bp long, possesses terminal inverted repeats of 38 bp, and generates flanking direct repeats that are 5 bp in length.
Tn3 carries genes for transposase and resolvase (mentioned earlier in this chapter), plus a gene that codes for the enzyme β-lactamase, which provides resistance to ampicillin.

A few bacteriophage genomes reproduce by transposition and use transposition to insert themselves into a bacterial chromosome in their lysogenic cycle; the best studied of these transposing bacteriophages is Mu ( ◗ FIGURE 11.24). Although Mu does not possess terminal inverted repeats, it does generate short (5-bp) flanking direct repeats when it inserts randomly into DNA. Mu replicates through transposition and causes mutations at the site of insertion, properties characteristic of transposable elements.

Concepts

Insertion sequences are prokaryotic transposable elements that carry only the information needed for transposition. A composite transposon is a more complex element that consists of two insertion sequences plus intervening DNA. Noncomposite transposons in bacteria lack insertion sequences but have terminal inverted repeats and carry information not related to transposition. All of these transposable elements generate flanking direct repeats at their points of insertion.

Transposable Elements in Eukaryotes

Eukaryotic transposable elements can be divided into two groups. One group is structurally similar to transposable elements found in bacteria, typically ending in short inverted repeats and transposing through DNA intermediates. The other group comprises retrotransposons (see Figure 11.20); they use RNA intermediates and are similar in structure and movement to retroviruses (see p. 000 in Chapter 8). On the basis of their structure, function, and genomic sequences, it is clear that some retrotransposons are evolutionarily related to retroviruses.
Although their mechanism of movement is fundamentally different from that of other transposable elements, retrotransposons also generate direct repeats at the point of insertion. Retrotransposons include the Ty elements in yeast, the copia elements in Drosophila, and the Alu sequences in humans.

◗ 11.24 Mu is a transposing bacteriophage. (Mu, 38,000 bp: head and tail genes and other phage genes, with flanking direct repeats.)

◗ 11.25 Ty is a transposable element of yeast. (Ty element, 6300 bp: TyA and TyB, with protease, integrase, reverse transcriptase, and RNase genes, between 334-bp delta sequences that are direct repeats; flanking direct repeats lie on each side.)

Ty elements in yeast

Ty (for transposon yeast) elements are a family of common transposable elements found in yeast; many yeast cells have 30 copies of Ty elements. These elements are retrotransposons that are about 6300 nucleotide pairs in length and generate 5-bp flanking direct repeats when they insert into DNA ( ◗ FIGURE 11.25). At each end of a Ty element are direct repeats called delta sequences, which are 334 bp long. The delta sequences are analogous to the long terminal repeats found in retroviruses (see p. 000 in Chapter 8). These delta sequences contain promoters required for the transcription of Ty genes, and the promoters may also stimulate the transcription of genes that lie downstream of the Ty element. Between the delta sequences at each end of a Ty element are two genes (TyA and TyB, which encode several enzymes) that are related to the gag and pol genes found in retroviruses (see p. 000 in Chapter 8). Many Ty elements are defective and no longer capable of undergoing transposition.

Ac and Ds elements in maize

Transposable elements were first identified in maize (corn) more than 50 years ago by Barbara McClintock ( ◗ FIGURE 11.26).
McClintock spent much of her long career studying their properties, and her work stands among the landmark discoveries of genetics. Her results, however, were misunderstood and ignored for many years. Not until molecular techniques were developed in the late 1960s and 1970s did the importance of transposable elements become widely accepted.

Born in 1902, Barbara McClintock attended Cornell University as an undergraduate and, later, as a graduate student. She was especially interested in genetics, but the subject was taught in the department of plant breeding, which did not accept women students. So she registered for botany instead and studied maize chromosomes for her Ph.D. dissertation. After receiving her degree, McClintock remained at Cornell, continuing her cytogenetic analysis of maize chromosomes. Her discoveries in the next 10 years included the identification of all the chromosomes in maize, the assignment of linkage groups to chromosomes, proof of crossing over, mapping genes to chromosomes by using rearrangements, and associating chromosome elements with the nucleolus.

◗ 11.26 Barbara McClintock was the first to discover transposable elements. (CSHL Archives/Peter Arnold.)

◗ 11.27 Variegated (spotted) kernels in corn are caused by mobile genes. The study of variegated corn led Barbara McClintock to discover transposable elements. (Matt Meadows/Peter Arnold.)

McClintock’s discovery of transposable elements had its genesis in the early work of Rollins A. Emerson on the maize genes that caused variegated (multicolored) kernels. Most corn kernels are either wholly pigmented or colorless (yellow), but Emerson noted that some yellow kernels had spots or streaks of color ( ◗ FIGURE 11.27). He proposed that these kernels resulted from an unstable mutation: a mutation in the wild-type gene for pigment produced a colorless kernel; but, in some cells, the mutation reverted back to the wild type, causing a spot of pigment.
However, Emerson didn’t know why these mutations were unstable.

McClintock discovered that the cause of the unstable mutation was a gene that moved. She noticed that chromosome breakage in maize often occurred at a locus that she called Dissociation (Ds) but only if another gene, the Activator (Ac), also was present. Ds and Ac exhibited unusual patterns of inheritance; occasionally, the genes moved together. McClintock called these moving genes controlling elements, because they controlled the expression of other genes.

McClintock published her conclusion that controlling elements moved in 1948. Although her results were not disputed, they were neither understood nor recognized by most geneticists. Of her work, Alfred Sturtevant, then a prominent geneticist, remarked, “I didn’t understand one word she said, but if she says it is so, it must be so!” He expressed what seems to have been the attitude of many geneticists at the time. McClintock was frustrated by the genetics community’s reaction to her research, but she continued to pursue it nonetheless. In the 1960s, bacteria and bacteriophages were shown to possess transposable elements, and the development of recombinant DNA techniques in the 1970s and 1980s demonstrated that transposable elements exist in all organisms. The significance of McClintock’s early discoveries was finally recognized in 1983, when she was awarded the Nobel Prize in Physiology or Medicine.

www.whfreeman.com/pierce A series of links to Barbara McClintock and her work on transposable elements

Ac and Ds elements in maize have now been examined in detail, and their structure and function are similar to those of transposable elements found in bacteria: they possess terminal inverted repeats and generate flanking direct repeats at the points of insertion. Ac elements are about 4500 bp long, including terminal inverted repeats of 11 bp, and the flanking direct repeats that they generate are 8 bp in length ( ◗ FIGURE 11.28a).
Each Ac element contains a single gene that encodes a transposase enzyme. Thus Ac elements are autonomous, that is, able to transpose. Ds elements are Ac elements with one or more deletions that have inactivated the transposase gene ( ◗ FIGURE 11.28b). Unable to transpose on their own (nonautonomous), Ds elements can transpose in the presence of Ac elements because they still possess terminal inverted repeats recognized by Ac transposase.

◗ 11.28 Ac and Ds are transposable elements in maize. (a) The Ac element (4563 bp) contains a transposase gene. (b) Different Ds elements (Ds9, Ds2d1, Ds2d2, Ds6) have different deletions.

Each kernel in an ear of corn is a separate individual, originating as an ovule fertilized by a pollen grain. A kernel’s pigment pattern is determined by several loci. A pigment-encoding allele at one of these loci can be designated C, and an allele at the same locus that does not confer pigment is designated as c. A kernel with genotype cc will be colorless, that is, yellow or white ( ◗ FIGURE 11.29a); a kernel with genotype CC or Cc will produce pigment and be purple ( ◗ FIGURE 11.29b). A Ds element, transposing under the influence of a nearby Ac element, may insert into the C allele, destroying its ability to produce pigment ( ◗ FIGURE 11.29c). An allele inactivated by a transposable element is designated with a subscript “t”; so in this case it would be designated Ct. After the transposition of Ds into the C allele, the kernel cell has genotype Ctc. This kernel will be colorless (white or yellow), because neither the Ct nor the c allele confers pigment.

◗ 11.29 Transposition results in variegated maize kernels. (a) Genotype cc, no transposition: cells with genotype cc produce no pigment, resulting in a colorless (yellow or white) kernel. (b) Genotype Cc, no transposition: cells with genotype Cc produce pigment, resulting in a pigmented (purple) kernel. (c) Transposition: an Ac element produces transposase, which stimulates transposition of a Ds element into the C allele and disrupts its pigment-producing function. The resulting cells have genotype Ctc and are colorless. (d) Mosaic (transposition during development): an Ac element produces transposase, which stimulates further transposition of the Ds element in some cells. As Ds transposes, it leaves the C allele, restoring the allele’s function. A cell in which Ds has transposed out of the C allele will produce pigment, generating spots of color in an otherwise colorless kernel; early and late transposition give different variegated patterns. Conclusion: Variegated corn kernels result from excision of Ds elements from genes controlling pigment production during development.

As development takes place and the original one-celled maize embryo divides by mitosis, additional transpositions may take place in some cells. In any cell in which the transposable element excises from the Ct allele and moves to a new location, the C allele is rendered functional again: all cells derived from those in which this event has taken place will have the genotype Cc and be purple. The presence of these pigmented cells, surrounded by the colorless (Ctc) cells, produces a purple spot or streak (called a sector) in the otherwise yellow kernel ( ◗ FIGURE 11.29d). The size of the sector varies, depending on when the excision of the transposable element from the Ct allele occurred. If excision occurred early in development, then many cells will contain the functional C allele and the pigmented sector will be large; if excision occurred late in development, few cells will have the functional C allele and the pigmented sector will be small.
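The link between the timing of excision and the size of the pigmented sector can be made concrete with a small calculation. The sketch below rests on idealized assumptions that are not part of the text: the kernel is taken to grow from one cell by synchronous doublings, a single excision event occurs in one cell, and every descendant of that cell is pigmented. The function name and the choice of ten divisions are hypothetical.

```python
def pigmented_fraction(total_divisions, excision_division):
    """Fraction of final cells descended from one cell in which Ds excised.

    Idealized model (an assumption, not from the text): all cells divide
    in step, so after `total_divisions` rounds there are 2**total_divisions
    cells. A cell that regains a functional C allele after
    `excision_division` rounds leaves 2**(total_divisions -
    excision_division) pigmented descendants.
    """
    if not 0 <= excision_division <= total_divisions:
        raise ValueError("excision must occur during development")
    descendants = 2 ** (total_divisions - excision_division)
    return descendants / 2 ** total_divisions


early = pigmented_fraction(total_divisions=10, excision_division=2)
late = pigmented_fraction(total_divisions=10, excision_division=9)
print(early)  # 0.25
print(late)   # 0.001953125
```

Under these assumptions, an excision after the second division colors a quarter of the kernel, whereas one after the ninth division colors only 2 of 1024 cells, which matches the large and small sectors described above.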
Transposable elements in Drosophila

A number of different transposable elements are found in Drosophila. One of the best studied is copia, a retrotransposon about 5000 bp long ( ◗ FIGURE 11.30). Copia has direct (i.e., not inverted) repeats of 276 bp at each end, and within each direct repeat are terminal inverted repeats. When copia transposes, it generates flanking direct repeats that are 5 bp long at the site of insertion. Like Ty elements, copia contains sequences similar to those found in the gag and pol genes of retroviruses (see Figure 8.36). The number of copia elements in a typical fruit fly genome varies from 20 to 60.

◗ 11.30 Copia is a transposable element of Drosophila. (Copia element, ~5000 bp, with a 276-bp long terminal repeat at each end and flanking direct repeats.)

Another family of transposable elements found in Drosophila are the P elements. Most functional P elements are about 2900 bp long, although shorter P elements with deletions also exist. Each P element possesses terminal inverted repeats that are 31 bp long and generates flanking direct repeats of 8 bp at the site of insertion. Like transposable elements in bacteria, P elements transpose through DNA intermediates. Each element encodes both a transposase and a repressor of transposition.

The role of this repressor in controlling transposition is demonstrated dramatically in hybrid dysgenesis, which is the sudden appearance of numerous mutations, chromosome aberrations, and sterility in the offspring of a cross between a P+ male fly (with P elements) and a P– female fly (without them). The reciprocal cross between a P+ female and a P– male produces normal offspring.

Hybrid dysgenesis arises from a burst of transposition that takes place when P elements are introduced into a cell that does not possess them. A cell that contains P elements produces the repressor in the cytoplasm that inhibits transposition.
When a P+ female produces eggs, the repressor protein is incorporated into the egg cytoplasm, which prevents further transposition in the embryo and thus prevents mutations from arising. The resulting offspring are fertile as adults ( ◗ FIGURE 11.31a). However, a P– female does not produce the repressor; so none is stored in the cytoplasm of her eggs. When her eggs are fertilized by sperm from a P+ male, the absence of repression allows the P elements contributed by the sperm to undergo rapid transposition in the embryo, causing hybrid dysgenesis ( ◗ FIGURE 11.31b).

P elements appear to have invaded D. melanogaster within the past 50 years. Today, almost all D. melanogaster caught in the wild possess P elements, but these transposable elements are uncommon in laboratory colonies of flies that were established more than 30 years ago. In fact, no strain of D. melanogaster collected before 1945 possesses them, suggesting that P elements have recently invaded D. melanogaster and have spread rapidly throughout the species. Because P elements are not present in most laboratory stocks, they have been useful experimentally as vectors for introducing modified or foreign DNA into the Drosophila genome. P elements have been extensively manipulated and engineered for a variety of uses.

If P elements are a recent addition to the genome of D. melanogaster, where did they come from? A likely source is Drosophila willistoni, another fruit fly species. D. willistoni appears to have long possessed P elements that are virtually identical with those now found in D. melanogaster. Researchers Marilyn Houck and Margaret Kidwell proposed that the P elements made the leap from D. willistoni to D. melanogaster by hitching a ride on a mite. All fruit flies are infected with a variety of mites. One mite species, Proctolaelaps regalis, infests both D. willistoni and D. melanogaster. This mite has needlelike mouth parts that allow it to pierce and feed on the eggs and larvae of the flies.
Houck and Kidwell suggest that, while feeding on D. willistoni, a mite picked up fruit fly DNA with P elements, which it later injected into a developing D. melanogaster. This hypothesis is supported by the finding that mites do pick up P element DNA from P+ fruit flies.

www.whfreeman.com/pierce More information on copia and P elements

Transposable elements in humans

Almost 50% of the human genome consists of sequences derived from transposable elements, although most of these elements are now inactive and no longer capable of transposing. One of the most common transposable elements in the human genome is Alu, named after a restriction enzyme (AluI), which cleaves the element into two parts. Every human cell contains more than 1 million related, but not identical, copies of Alu in its chromosomes. Unlike the retrotransposons described earlier (Ty elements from yeast and copia elements from Drosophila), Alu sequences are not similar to retroviruses. They do not have genes resembling gag and pol, and are therefore nonautonomous. Rather, Alu sequences are similar to the gene that encodes the 7S RNA molecule, which transports newly synthesized proteins across the endoplasmic reticulum. Alu sequences create short flanking direct repeats when they insert into DNA and have characteristics that suggest that they have transposed through an RNA intermediate.

Alu belongs to a class of repetitive sequences found frequently in mammalian and some other genomes. These sequences are collectively referred to as SINEs (short interspersed sequences). The human genome also has many LINEs (long interspersed sequences), which are somewhat more similar in structure to retroviruses, but not as similar as Ty or copia.

The human genome contains evidence for several classes of transposable elements that transpose through a DNA intermediate, by the cut-and-paste mechanism.
However, these all appear to have been inactive for about 50 million years; the nonfunctional sequences that remain have been referred to as DNA fossils.

◗ 11.31 Hybrid dysgenesis in Drosophila is caused by the transposition of P elements.
(a) No hybrid dysgenesis. (1) Cross a P– male and a P+ female. (2) A repressor in the egg cytoplasm inhibits transposition of P elements. (3) Thus, there is no hybrid dysgenesis and development is normal, (4) producing fertile offspring.
(b) Hybrid dysgenesis. (1) Cross a P+ male and a P– female. (2) There is no repressor in the egg cytoplasm. (3) Thus, P elements on the paternal chromosomes undergo a burst of transposition (hybrid dysgenesis), (4) resulting in mutations, chromosome aberrations, and sterile offspring.
Conclusion: Only the cross between a P+ male and a P– female causes hybrid dysgenesis, because the sperm does not contribute repressor.

Concepts
A great variety of transposable elements exist in eukaryotes. Some resemble transposable elements in prokaryotes, having terminal inverted repeats, and transpose through a DNA intermediate. Others are retrotransposons with long direct repeats at their ends and transpose through an RNA intermediate.

Connecting Concepts
Classes of Transposable Elements
Now that we have examined the process of transposition, let us review the major classes of transposable elements (Table 11.6). Transposable elements can be divided into two major classes on the basis of structure and movement. The first class consists of elements that possess terminal inverted repeats and transpose through DNA intermediates. They all generate flanking direct repeats at their points of insertion into DNA. All active forms of these transposable elements encode transposase, which is required for their movement. Some also encode resolvase, repressors, and other proteins. Their transposition may be replicative or nonreplicative, but they never use RNA intermediates. Examples of transposable elements in this first class include insertion sequences and all complex transposons in bacteria, the Ac and Ds elements of maize, and the P elements of Drosophila.

Table 11.6 Characteristics of two major classes of transposable genetic elements

Class I
  Structure: Short, terminal inverted repeats; short flanking direct repeats at target site
  Genes encoded: Transposase gene (and sometimes others)
  Transposition: By DNA intermediate (replicative or nonreplicative)
  Examples: IS1 (E. coli); Tn3 (E. coli); Ac, Ds (maize); P elements (Drosophila)

Class II (retrotransposon)
  Structure: Long, terminal direct repeats; short flanking direct repeats at target site
  Genes encoded: Reverse transcriptase gene (and sometimes others)
  Transposition: By RNA intermediate
  Examples: Ty (yeast); copia (Drosophila); Alu (human)

The second class of transposable elements are the retrotransposons, which transpose through RNA intermediates. They generate flanking direct repeats at their points of insertion when they transpose into DNA. Retrotransposons do not encode transposase, but some types are similar in structure to retroviruses and carry sequences that produce reverse transcriptase. Transposition takes place when transcription produces an RNA intermediate, which is then transcribed into DNA by reverse transcriptase and inserted into the target site. Examples of retrotransposons in this class include Ty elements in yeast, copia elements in Drosophila, and Alu sequences in humans.
Retrotransposons are not found in prokaryotes.

The Evolution of Transposable Elements
As mentioned earlier, transposable elements exist in all organisms, often in large numbers. Why are they so common? Three principal hypotheses have been proposed to explain their widespread occurrence.

The cellular function hypothesis proposes that transposable elements serve a valuable function within the cell, such as the control of gene expression or the regulation of development. Although the insertion of a transposable element can alter gene expression, there are few data to suggest that transposition plays a routine role in either of these or any other cellular processes.

The genetic variation hypothesis proposes that transposable elements exist because of their mutagenic activity. It suggests that a certain amount of genetic variation is useful because it allows a species to adapt to environmental change. Although some mutations caused by transposable elements may allow species to evolve beneficial traits, the vast majority of mutations generated by random transposition have deleterious effects. Thus, although mutations produced by transposable elements may be useful in the future, their immediate effect is usually deleterious and they will be selected against. The fact that many organisms have evolved mechanisms to regulate transposition suggests that there is selective pressure to limit the extent of transposition. In fact, if their only effect were to generate mutations, transposable genetic elements could be expected to disappear in time.

The selfish DNA hypothesis asserts that transposable elements serve no purpose for the cell; they exist simply because they are capable of replicating and spreading. They can be thought of as “selfish” parasites of DNA that provide no benefit to the cell and may even be somewhat detrimental. Their capacity to reproduce and spread is what makes them common.
Which, if any, of these hypotheses is the correct explanation for the existence of transposable elements is not known. These hypotheses are not mutually exclusive, and all may contribute to the existence of mobile genes.

Regardless of the evolutionary forces responsible for their existence, transposable elements have clearly played an important role in shaping the genomes of many organisms. In some cases, they have even been adopted for useful purposes by their host cells. One example is the mechanism that generates antibody diversity in the immune systems of vertebrates. As will be discussed in Chapter 21, the ability of the immune system to recognize and attack foreign substances (antigens) depends on a mechanism whereby lymphocytes join several DNA segments that code for antigen-recognition proteins. Three DNA segments, called V, D, and J, exist in multiple forms within each cell. In the development of a lymphocyte, particular V, D, and J segments are randomly joined to produce a protein that recognizes a specific antigen. Within different lymphocytes, different V, D, and J segments are joined together in different combinations. The variety of combinations provides a large array of cells, each of which recognizes a particular antigen. Close examination of the V, D, and J joining process reveals that its mechanism is the same as that for transposition. The genes that bring about V, D, and J joining, designated RAG1 and RAG2, may have at one time been transposable elements that inserted into the germ line of a vertebrate ancestor some 450 million years ago.

Another cellular function that may have originated as the result of a transposable element is the process that maintains the ends of chromosomes in eukaryotic organisms. As mentioned earlier in this chapter, DNA polymerases are unable to replicate the ends of chromosomes.
In germ cells and single-celled eukaryotic organisms, chromosome length is maintained by telomerase, an enzyme that extends the chromosome ends by copying repeated DNA sequences from an RNA template that is a part of the telomerase enzyme. The mechanism used by the telomerase enzyme is similar to the reverse transcription process used in retrotransposition, and telomerase is evolutionarily related to the reverse transcriptases encoded by certain retrotransposons. These findings suggest that an invading retrotransposon in an ancestral eukaryotic cell may have provided the ability to copy the ends of chromosomes and eventually evolved into the gene that encodes the modern telomerase enzyme. Drosophila lacks the telomerase enzyme; retrotransposons appear to have resumed the role of telomere maintenance in this case.

Connecting Concepts Across Chapters
The material covered in this chapter has important connections to several topics already covered and to others in chapters yet to come. In Chapter 2, the gross structure of chromosomes and their behavior during mitosis and meiosis were introduced. The present chapter has built on that introduction by examining the molecular details of chromosome structure and the higher-level folding and packing of DNA that allows these very large molecules to maintain their functionality and still fit into the confined space of the cell. The solution to this cellular storage problem and the essential elements of eukaryotic chromosomes have been major themes of this chapter, completing the story of DNA structure introduced in Chapter 8.

Transposable genetic elements, DNA sequences that move, are a part of chromosome structure. Earlier chapters dealt with crossing over, in which homologous DNA sequences switch places, and chromosome rearrangements, in which the breakage and rejoining of chromosome segments moves blocks of genes to new locations.
The movement of transposable elements is fundamentally different from these other mechanisms of gene movement because transposable elements possess sequences that facilitate their movement. Understanding the structure of transposable genetic elements requires a basic knowledge of DNA structure and sequence, topics covered in Chapter 10.

Transposable elements violate a basic premise of classical genetics: that genes have a particular fixed location on a chromosome. This departure from a long-held view helps to explain why the discovery of transposable elements by Barbara McClintock was ignored for many years. A common theme in the history of genetics is that fundamental discoveries are often overlooked or unrecognized, because they require a radical rethinking of basic principles. Transposable elements today are recognized as ubiquitous DNA sequences with important implications for medicine, recombinant DNA technology, and evolution, but the reason for their widespread occurrence is still not completely understood.

This chapter has provided a foundation for topics introduced in several later chapters of the book. Transposition requires the replication of DNA (Chapter 12) or reverse transcription (Chapter 14) and generates gene mutations (Chapter 17). In Chapter 16, we explore the control of gene expression, which requires changes in chromatin structure. Condensed chromatin structure tends to inhibit the transcription of genetic information; some of the proteins that take part in activating and repressing transcription are known to affect the binding of DNA to histones. Transposition is regulated by some of the same mechanisms that regulate the expression of other genes, also discussed in Chapter 16. Additional topics covered in more detail in later chapters include the origins of replication (Chapter 12) and the application of repetitive sequences to DNA fingerprinting (Chapter 18).
Transposable elements are important in the generation of immune-system diversity (Chapter 21) and in molecular evolution (Chapter 23).

CONCEPTS SUMMARY

• Chromosomes contain very long DNA molecules that are tightly packed. Packing is accomplished through tertiary structures and the binding of DNA to proteins.
• Supercoiling results from strain produced when rotations are added or removed from a relaxed DNA molecule. Overrotation produces positive supercoiling; underrotation produces negative supercoiling.
• Topoisomerases control the degree of supercoiling by adding or removing rotations to DNA.
• A bacterial chromosome consists of a single, circular DNA molecule that is bound to proteins and exists as a series of large loops. It usually appears in the cell as a distinct clump known as the nucleoid.
• Each eukaryotic chromosome contains a single, very long linear DNA molecule that is bound to histone and nonhistone chromosomal proteins. Euchromatin undergoes the normal cycle of decondensation and condensation in the cell cycle. Heterochromatin remains highly condensed throughout the cell cycle.
• The nucleosome is a core of eight histone proteins (two each of H2A, H2B, H3, and H4) and DNA (145–147 bp) that wraps around it. The H1 protein holds DNA onto the histone core.
• Nucleosomes are folded into a 30-nm fiber that forms a series of 300-nm-long loops; these loops are anchored at their bases by proteins associated with the nuclear scaffold. The 300-nm loops are condensed to form a fiber that is 250 nm in diameter, which is itself tightly coiled to produce a 700-nm-wide chromatid.
• Chromosomal puffs are regions of localized unpacking of the DNA that are associated with regions of active transcription.
• Chromosome regions that are undergoing active transcription are relatively sensitive to digestion by DNase I, indicating that DNA unfolds during transcription.
• Centromeres are chromosomal regions where spindle fibers attach; chromosomes without centromeres are usually lost in the course of cell division. Centromeres play an important role in the regulation of the cell cycle.
• Telomeres stabilize the ends of chromosomes. Telomeric sequences consist of many copies of short sequences, which usually consist of a series of cytosine nucleotides followed by several adenine nucleotides. Longer telomere-associated sequences are found adjacent to the telomeric sequences.
• The C value is the amount of DNA in an organism’s genome. Eukaryotic organisms exhibit much variation in C value owing to differences in sequence complexity, which can be measured by observing the time required for denatured DNA to reanneal in a hybridization reaction, as plotted by a C0t curve.
• Eukaryotic DNA exhibits three classes of sequences. Unique-sequence DNA exists in very few copies. Moderately repetitive DNA consists of moderately long sequences that are repeated from hundreds to thousands of times. Highly repetitive DNA consists of very short sequences that are repeated in tandem from many thousands to millions of times.
• Transposable elements are mobile DNA sequences that insert into many locations within a genome and often cause mutations and DNA rearrangements. Most transposable elements have two common characteristics: terminal inverted repeats and the generation of short direct repeats in DNA at the point of insertion.
• Transposition may take place through a DNA molecule or through the production of an RNA molecule that is then reverse transcribed into DNA. Transposition may be replicative, in which the transposable element is copied and the copy moves to a new site, or nonreplicative, in which the transposable element excises from the old site and moves to a new site.
• Retrotransposons transpose through RNA molecules that undergo reverse transcription to produce DNA.
• In many transposable elements, transposition is tightly regulated.
• Insertion sequences are small bacterial transposable elements that carry only the information needed for their own movement. Composite transposons in bacteria are more complex elements that consist of DNA between two insertion sequences. Some complex transposable elements in bacteria do not contain insertion sequences.
• Some transposable elements in eukaryotic cells are similar to those found in bacteria, ending in short inverted repeats and producing flanking direct repeats at the point of insertion. Others are retrotransposons, similar in structure to retroviruses and transposing through RNA intermediates.
• Hybrid dysgenesis is the appearance of numerous mutations, chromosome rearrangements, and sterility when transposable P elements undergo a burst of transposition in Drosophila.
• The evolutionary significance of transposable elements is unknown, but three hypotheses have been proposed to explain their common occurrence. The cellular function hypothesis suggests that transposable elements provide some important function for the cell; the genetic variation hypothesis proposes that transposable elements provide evolutionary flexibility by inducing mutations; and the selfish DNA hypothesis suggests that transposable elements do not benefit the cell but are widespread because they can replicate and spread.

IMPORTANT TERMS

transgenic mouse; transposable element; supercoiling; relaxed state of DNA; positive supercoiling; negative supercoiling; topoisomerase; nucleoid; euchromatin; heterochromatin; nonhistone chromosomal proteins; chromosomal scaffold protein; high-mobility-group proteins; nucleosome; chromatosome; linker DNA; polytene chromosome; chromosomal puff; centromeric sequence; telomeric sequence; telomere-associated sequence; C value; denaturation (melting); melting temperature (Tm); renaturation (reannealing); hybridization; unique-sequence DNA; repetitive DNA; moderately repetitive DNA; tandem repeat sequence; interspersed repeat sequences; short interspersed element (SINE); long interspersed element (LINE); highly repetitive DNA; flanking direct repeat; terminal inverted repeats; transposition; replicative transposition; nonreplicative transposition; cointegrate structure; transposase; resolvase; retrotransposon; insertion sequence; composite transposon; delta sequence; hybrid dysgenesis

Worked Problems

1. A diploid plant cell contains 2 billion base pairs of DNA.
(a) How many nucleosomes are present in the cell?
(b) Give the numbers of molecules of each type of histone protein associated with the genomic DNA.

• Solution
Each nucleosome encompasses about 200 bp of DNA: from 144 to 147 bp of DNA wrapped twice around the histone core, from 20 to 22 bp of DNA associated with the H1 protein, and another 30 to 40 bp of linker DNA.
(a) To determine how many nucleosomes are present in the cell, we simply divide the total number of base pairs of DNA ($2 \times 10^{9}$ bp) by the number of base pairs per nucleosome:

$$\frac{2 \times 10^{9}\ \text{bp}}{2 \times 10^{2}\ \text{bp per nucleosome}} = 1 \times 10^{7}\ \text{nucleosomes}$$

Thus, there are approximately 10 million nucleosomes in the cell.
(b) Each nucleosome includes two molecules each of H2A, H2B, H3, and H4 histones. Therefore, there are $2 \times 10^{7}$ molecules each of H2A, H2B, H3, and H4 histones. Each nucleosome has associated with it one copy of the H1 histone; so there are $1 \times 10^{7}$ molecules of H1.

2. A renaturation reaction is carried out on the genomic DNA from three different bacterial species. Species I has a genome size of $2 \times 10^{6}$ bp, species II has a genome size of $1 \times 10^{8}$ bp, and species III has a genome size of $1 \times 10^{6}$ bp. Assume that the same total amount of DNA is used in each renaturation reaction and draw a C0t curve for each species, showing the relative positions of each species on the same graph.

• Solution
Because this DNA is from bacteria, which contain only unique-sequence DNA, the complexity of renaturation with repetitive DNA can be ignored. If the total amount of DNA is the same for all three bacterial species, the number of copies of each sequence will depend on the genome size: there are fewer different sequences in organisms with small genomes, and so a chance collision is more likely to be between two complementary sequences that will anneal. Consequently, renaturation will proceed more rapidly in organisms with smaller genomes. (Recall the analogy of drawing cards from a hat; when there are only a few different names on the cards, the hat empties more quickly.) At the start of the reaction, all the DNA is single stranded; so the proportion of single-stranded DNA is 1. As the reaction proceeds, single-stranded DNA pairs to form double-stranded DNA; so the proportion of single-stranded DNA decreases. This decrease will occur at a low C0t in the organisms with a smaller genome, as shown in the following graph.

[Graph: percentage of single-stranded DNA (y axis) plotted against concentration × time (C0t; x axis). The curve for species III lies at the lowest C0t, followed by species I and then species II.]

3. Genomic DNAs from species I, II, and III have the following base compositions:

Species    %A    %G    %T    %C
I          10    40    10    40
II         27    23    27    23
III        46     4    46     4

DNA from which species has a higher Tm? Explain your reasoning.

• Solution
The melting temperature (Tm) of DNA depends on its base sequence. The three hydrogen bonds of a G–C base pair require more energy to break than do the two hydrogen bonds of an A–T pair; so a molecule with a higher percentage of G–C pairs will have a higher Tm. Species I has the highest G–C content of the three species; so it should exhibit the highest Tm.

4. Certain repeated sequences in eukaryotes are flanked by short direct repeats, suggesting that they originated as transposable elements. These same sequences lack introns and possess a string of thymine nucleotides at their 3′ ends. Have these elements transposed through DNA or RNA sequences? Explain your reasoning.

• Solution
The absence of introns and the string of thymine nucleotides (which would be complementary to adenine nucleotides in RNA) at the 3′ end are characteristics of processed RNA. These similarities to RNA suggest that the element was originally transcribed into mRNA, processed to remove the introns and to add a poly(A) tail, and then reverse transcribed into a complementary DNA that was inserted into the chromosome.

5. Which of the following pairs of sequences might be found at the ends of an insertion sequence?
(a) 5′–TAAGGCCG–3′ and 5′–TAAGGCCG–3′
(b) 5′–AAAGGGCTA–3′ and 5′–ATCGGGAAA–3′
(c) 5′–GATCCCAGTT–3′ and 5′–CTAGGGTCAA–3′
(d) 5′–GATCCAGGT–3′ and 5′–ACCTGGATC–3′
(e) 5′–AAAATTTT–3′ and 5′–TTTTAAAA–3′
(f) 5′–AAAATTTT–3′ and 5′–AAAATTTT–3′

• Solution
The correct answer is d and f. The ends of insertion sequences always have inverted repeats, which are sequences on the same strand that are inverted and complementary. The sequences in part a are direct repeats, which are generated on the outside of an insertion sequence but are not part of the transposable element itself. The sequences in part b are inverted but not complementary. The sequences in part c are complementary but not inverted. The sequences in part d are both inverted and complementary. The sequences in part e are complementary but not inverted. Interestingly, the sequences in part f are both inverted complements and direct repeats.

The New Genetics
MINING GENOMES
INTRODUCTION TO BLAST AND BLAST SEARCHING
This exercise casts you in the role of biological detective, trying to figure out the functions of newly discovered genes.
The simplest way to determine what is encoded by new sequences is to compare them with information already in the databases by using BLAST (Basic Local Alignment Search Tool). You will use the National Center for Biotechnology Information (NCBI) Web site to explore some of the strengths and weaknesses of this powerful approach.

COMPREHENSION QUESTIONS

*1. How does supercoiling arise? What is the difference between positive and negative supercoiling?
2. What functions does supercoiling serve for the cell?
*3. Describe the composition and structure of the nucleosome. How do core particles differ from chromatosomes?
4. Describe in steps how the double helix of DNA, which is 2 nm in width, gives rise to a chromosome that is 700 nm in width.
5. What are polytene chromosomes and chromosomal puffs?
*6. Describe the function and molecular structure of the centromere.
*7. Describe the function and molecular structure of a telomere.
8. What is the C value of an organism?
9. What is a C0t curve? Explain how C0t curves of DNA provide evidence for the existence of repetitive DNA in eukaryotic cells.
*10. Describe the different types of DNA sequences that exist in eukaryotes.
*11. What general characteristics are found in many transposable elements? Describe the differences between replicative and nonreplicative transposition.
*12. What is a retrotransposon and how does it move?
*13. Describe the process of replicative transposition through DNA intermediates. What enzymes are required?
*14. Draw and label the structure of a typical insertion sequence.
15. Draw and label the structure of a typical composite transposon in bacteria.
16. How are composite transposons and retrotransposons alike and how are they different?
17. Explain how Ac and Ds elements produce variegated corn kernels.
18. Briefly explain hybrid dysgenesis and how P elements lead to hybrid dysgenesis.
*19. Briefly summarize three hypotheses for the widespread occurrence of transposable elements.
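Several problems in this chapter, including Worked Problem 5, turn on deciding whether two sequences written 5′ to 3′ on the same strand are inverted repeats, that is, whether one is the reverse complement of the other. The following is a minimal Python sketch of that check, not part of the original text; the function names (`reverse_complement`, `are_inverted_repeats`, `candidate_ends`) are illustrative assumptions, and the test data are the six pairs from Worked Problem 5.

```python
# Illustrative sketch (names are hypothetical, not from the textbook):
# test whether two sequences, both written 5'->3' on the same strand,
# are inverted repeats (each is the reverse complement of the other).

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def are_inverted_repeats(left: str, right: str) -> bool:
    """True if `right` is both inverted and complementary relative to `left`."""
    return reverse_complement(left) == right.upper()


# The six sequence pairs given in Worked Problem 5.
WORKED_PROBLEM_5 = {
    "a": ("TAAGGCCG", "TAAGGCCG"),
    "b": ("AAAGGGCTA", "ATCGGGAAA"),
    "c": ("GATCCCAGTT", "CTAGGGTCAA"),
    "d": ("GATCCAGGT", "ACCTGGATC"),
    "e": ("AAAATTTT", "TTTTAAAA"),
    "f": ("AAAATTTT", "AAAATTTT"),
}


def candidate_ends(pairs: dict) -> list:
    """Return the labels of pairs that could be the ends of an insertion sequence."""
    return [label for label, (left, right) in pairs.items()
            if are_inverted_repeats(left, right)]


if __name__ == "__main__":
    print(candidate_ends(WORKED_PROBLEM_5))  # ['d', 'f']
```

The sketch tests only the inverted-and-complementary criterion stated in the solution to Worked Problem 5; it does not model flanking direct repeats, which lie outside the element itself.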
APPLICATION QUESTIONS AND PROBLEMS

*20. Compare and contrast prokaryotic and eukaryotic chromosomes. How are they alike and how do they differ?

21. (a) In a typical eukaryotic cell, would you expect to find more molecules of the H1 histone or more molecules of the H2A histone? Explain your reasoning. (b) Would you expect to find more molecules of H2A or more molecules of H3? Explain your reasoning.

22. Suppose you examined polytene chromosomes from the salivary glands of fruit fly larvae and counted the number of chromosomal puffs observed in different regions of DNA. (a) Would you expect to observe more puffs from euchromatin or from heterochromatin? Explain your answer. (b) Would you expect to observe more puffs in unique-sequence DNA, moderately repetitive DNA, or repetitive DNA? Why?

*23. A diploid human cell contains approximately 6 billion base pairs of DNA. (a) How many nucleosomes are present in such a cell? (Assume that the linker DNA encompasses 40 bp.) (b) How many histone proteins are complexed to this DNA?

*24. Would you expect to see more or less acetylation in regions of DNA that are sensitive to digestion by DNase I? Why?

25. A YAC that contains only highly repetitive, nonessential DNA is added to mouse cells that are growing in culture. The cells are then divided into two groups, A and B. A laser is then used to damage the centromere on the YACs in cells of group A. The centromeres on the YACs of group B are not damaged. In spite of the fact that the YACs contain no essential DNA, the cells in group A divide more slowly than those in group B. Provide a possible explanation.

26. Species A possesses only unique-sequence DNA. Species B possesses unique-sequence DNA and highly repetitive DNA. Species C possesses only moderately repetitive DNA. The genomes of all three species are similar in size. A student performs typical renaturation reactions with DNA from each species and plots a C0t curve for each. Draw a C0t curve for the renaturation reaction of each species.

*27. Which of the following two molecules of DNA has the lower melting temperature? Why?

AGTTACTAAAGCAATACATC
TCAATGATTTCGTTATGTAG

AGGCGGGTAGGCACCCTTA
TCCGCCCATCCGTGGGAAT

28. DNA was isolated from a newly discovered worm collected near a deep-sea vent in the Pacific Ocean. This DNA was sheared into pieces, heated to melting, and then cooled slowly. The amount of renaturation was measured with optical absorbance, and the following results were obtained.

[Graph: percentage of single-stranded DNA (y axis) plotted against concentration × time (C0t; x axis).]

What conclusions can you draw about the type of sequences found in this DNA?

29. Which of the following pairs of sequences might be found at the ends of an insertion sequence?
(a) 5′–GGGCCAATT–3′ and 5′–CCCGGTTAA–3′
(b) 5′–AAACCCTTT–3′ and 5′–AAAGGGTTT–3′
(c) 5′–TTTCGAC–3′ and 5′–CAGCTTT–3′
(d) 5′–ACGTACG–3′ and 5′–CGTACGT–3′
(e) 5′–GCCCCAT–3′ and 5′–GCCCAT–3′

*30. A particular transposable element generates flanking direct repeats that are 4 bp long. Give the sequence that will be found on both sides of the transposable element if this transposable element inserts at the position indicated on each of the following sequences.
(a) 5′–ATTCGAACTGACCGATCA–3′ (insertion position marked in the original figure by an arrow labeled "Transposable element")
(b) 5′–ATTCGAACTGACCGATCA–3′ (insertion position marked in the original figure by an arrow labeled "Transposable element")

*31. White eyes in Drosophila melanogaster result from an X-linked recessive mutation. Occasionally, white-eyed mutants give rise to offspring that possess white eyes with small red spots. The number, distribution, and size of the red spots are variable. Explain how a transposable element could be responsible for this spotting phenomenon.

*32. An insertion sequence contains a large deletion in its transposase gene. Under what circumstances would this insertion sequence be able to transpose?

*33. What factor do you think determines the length of the flanking direct repeats that are produced in transposition?

34. A transposable element is found to encode a transposase enzyme. On the basis of this information, what conclusions can you make about the likely structure and method of transposition of this element?

35. A transposable element is found to encode a reverse transcriptase enzyme. On the basis of this information, what conclusions can you make about the likely structure and method of transposition of this element?

36. Transposition often produces chromosome rearrangements, such as deletions, inversions, and translocations. Can you suggest a reason why transposition leads to these chromosome mutations?

37. A geneticist studying the DNA of the Japanese bottle fly finds many copies of a particular sequence that appears similar to the copia transposable element in Drosophila. Using recombinant DNA techniques, the geneticist places an intron into a copy of this DNA sequence and inserts it into the genome of a Japanese bottle fly. If the sequence is a transposable element similar to copia, what prediction would you make concerning the fate of the introduced sequence in the genomes of offspring of the fly receiving it?

CHALLENGE QUESTIONS

38. An explorer discovers a strange new species of plant and sends some of the plant tissue to a geneticist to study. The geneticist isolates chromatin from the plant and examines it with the electron microscope. She observes what appear to be beads on a string. She then adds a small amount of nuclease, which cleaves the string into individual beads that each contain 280 bp of DNA. After digestion with more nuclease, she finds that a 120-bp fragment of DNA remains attached to a core of histone proteins.
Analysis of the histone core reveals histones in the following proportions:

H1                    12.5%
H2A                   25%
H2B                   25%
H3                    0%
H4                    25%
H7 (a new histone)    12.5%

On the basis of these observations, what conclusions could the geneticist make about the probable structure of the nucleosome in the chromatin of this plant?

39. Although highly repetitive DNA is common in eukaryotic chromosomes, it does not code for proteins; in fact, it is probably never transcribed into RNA. If highly repetitive DNA does not code for RNA or proteins, why is it present in eukaryotic genomes? Suggest some possible reasons for the widespread presence of highly repetitive DNA.

40. As discussed in the chapter, Alu sequences are retrotransposons that are common in the human genome. Alu sequences are thought to have evolved from the 7S RNA gene, which encodes an RNA molecule that takes part in transporting newly synthesized proteins across the endoplasmic reticulum. The 7S RNA gene is transcribed by RNA polymerase III, which uses an internal promoter (see Chapter 13). How might this observation explain the large number of copies of Alu sequences?

41. Houck and Kidwell proposed that P elements were carried from Drosophila willistoni to D. melanogaster by mites that fed on fruit flies. What evidence do you think would be required to demonstrate that D. melanogaster acquired P elements in this way? Propose a series of experiments to provide such evidence.

SUGGESTED READINGS

Beermann, W., and U. Clever. 1964. Chromosome puffs. Scientific American 210(4):50 – 58. Describes early research on chromosome puffs that led to the conclusion that puffs are areas of active transcription. Blackburn, E. H. 2000. Telomere states and cell fates. Nature 408:53 – 56. Proposal that telomere length is less important to cell aging than telomere capping. Burlingame, R. W., W. E. Love, B. C. Wang, R. Hamlin, H. X. Nguyen, and E. N. Moudrianakis. 1985.
Crystallographic structure of the octameric histone core of the nucleosome at a resolution of 3.3 Å. Science 228:546 – 553. A detailed description of the histone octamer based on X-ray crystallographic data. Cohen, S. N., and J. A. Shapiro. 1980. Transposable genetic elements. Scientific American 242(2):40 – 49. A readable review of transposable elements. Fedoroff, N. V. 1993. Barbara McClintock (June 16, 1902 – September 2, 1992). Genetics 136:1 – 10. An excellent summary of Barbara McClintock’s life and her influence in genetics. Grindley, N. D. F., and R. R. Reed. 1985. Transpositional recombination in prokaryotes. Annual Review of Biochemistry 54:863 – 896. A thorough review of mechanisms of transposition. Greider, C. W., and E. H. Blackburn. 1996. Telomeres, telomerase, and cancer. Scientific American 274(2):92 – 97. An excellent review of telomeres and telomerase. Hagmann, M. 1999. How chromatin changes its shape. Science 285:1200 – 1203. A report of research on how chromatin changes its shape in response to various genetic functions. Houck, M. A., J. B. Clark, K. R. Peterson, and M. G. Kidwell. 1991. Possible horizontal transfer of Drosophila genes by the mite Proctolaelaps regalis. Science 253:1125 – 1129. Reports that P elements may have been transported by mites. Keller, E. F. 1983. A Feeling for the Organism: The Life and Work of Barbara McClintock. New York: W. H. Freeman and Company. A wonderful biography of Barbara McClintock that captures her unique personality, her love for research, and her deep understanding of corn genetics. Kornberg, R. D., and A. Klug. 1981. The nucleosome. Scientific American 244(2):52 – 64. A good review of basic chromatin structure and how it was discovered. Luger, K., A. W. Mäder, R. K. Richmond, D. F. Sargent, and T. J. Richmond. 1997. Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature 389:251 – 260. A report of the detailed structure of the nucleosome as revealed by X-ray crystallography. 
McEachern, M. J., A. Krauskopf, and E. H. Blackburn. 2000. Telomeres and their control. Annual Review of Genetics 34:331 – 358. A review of telomere structure and control. Murray, A. W., and J. W. Szostak. 1987. Artificial chromosomes. Scientific American 257(5):62 – 68. Describes how YACs are created and gives the history of their discovery. Ng, H. H., and A. Bird. 2000. Histone deacetylases: silencers for hire. Trends in Biochemical Sciences 25:121 – 126. Reviews the role of acetylation in the control of chromatin structure and gene expression. Pluta, A. F., A. M. Mackay, A. M. Ainsztein, I. G. Goldberg, and W. C. Earnshaw. 1995. The centromere: hub of chromosomal activities. Science 270:1591 – 1594. An excellent review of centromere structure and function. Syvanen, M. 1984. The evolutionary implications of mobile genetic elements. Annual Review of Genetics 18:271 – 293. A review that discusses the evolutionary significance of transposable elements. Travers, A. 1999. The location of the linker histone on the nucleosome. Trends in Biochemical Sciences 24:4 – 7. Reviews and discusses different models concerning the location of the linker H1 histone in the chromatosome. Voytas, D. F. 1996. Retroelements in genome organization. Science 274:737 – 738. A discussion of the organization of transposable elements in maize. Weiner, A. M., P. L. Deininger, and A. Efstratiadis. 1986. Nonviral retroposons: genes, pseudogenes, and transposable elements generated by the reverse flow of genetic information. Annual Review of Biochemistry 55:631 – 661. A review of retrotransposons and related genetic elements. Wolffe, A. P. 1998. Chromatin: Structure and Function, 3d ed. San Diego: Academic Press. A detailed and current review of chromatin structure and function. Zakian, V. A. 1995. Telomeres: beginning to understand the end. Science 270:1601 – 1606. An excellent review of telomere structure and function.
12 DNA Replication and Recombination

• The Cause of Bloom Syndrome • Semiconservative Replication The Central Problem of Replication Meselson and Stahl’s Experiment Modes of Replication Requirements of Replication Direction of Replication • The Mechanism of Replication Bacterial DNA Replication Eukaryotic DNA Replication Replication in Archaea • The Molecular Basis of Recombination The Holliday Model The Double-Strand-Break Model Enzymes Required for Recombination

The Cause of Bloom Syndrome
Tommy was a full-term baby but weighed only 4.5 pounds (2 kg) at birth. At about 9 months of age, an unusual and persistent rash appeared on his face, and he frequently caught colds and infections. The illnesses caused no serious problems, so his parents were not concerned. Throughout childhood, Tommy remained small; by age 18, he was only 4 feet 6 inches (137 cm) in height. Tommy’s first major health problem arose shortly after he turned 22, when he was diagnosed with intestinal cancer. The tumor was surgically removed, but additional, unrelated tumors appeared spontaneously over the next 10 years. Their appearance startled Tommy’s doctors; the chance of multiple, independent cancers arising in the same person is generally remote. The propensity of Tommy’s cells to become cancerous hinted at a high mutation rate in his genes. Indeed, when pathologists studied Tommy’s chromosomes, they observed a wide range of abnormalities. Tommy had inherited Bloom syndrome. Bloom syndrome is a rare autosomal recessive condition characterized by short stature, a facial rash induced by sun exposure, a small narrow head, and a predisposition to cancers of all types.
The disorder is extremely rare; only several hundred cases have been reported worldwide. Cells from persons with Bloom syndrome exhibit excessive mutations in all genes, and numerous gaps and breaks occur in chromosomes that lead to extensive genetic exchange in cell division. Rates of DNA synthesis are retarded. The characteristics of Bloom syndrome suggest that its underlying cause is a defect in DNA replication. In 1995, researchers at the New York Blood Center traced Bloom syndrome to a gene on chromosome 15 that encodes an enzyme called DNA helicase. A variety of helicase enzymes are responsible for unwinding double-stranded DNA during replication and repair. The cells of a person with Bloom syndrome carry two mutated copies of the gene and possess little or no activity for a particular helicase. Normal DNA replication is disrupted, leading to chromosome breaks and numerous mutations. The genetic damage resulting from faulty DNA replication leads to tumors. It is not yet clear whether the basic defect in DNA synthesis is associated with replication or DNA repair or both. Rapid and accurate DNA replication is fundamental to normal cell function and health. Replication is a complex process in which dozens of proteins, enzymes, and DNA structures take part; a single defective component, such as DNA helicase, can disrupt the whole process. This chapter deals with DNA replication, the process whereby a cell doubles its DNA before division. We begin with the basic mechanism of replication that emerged from the Watson and Crick structure of DNA. We then examine several different modes of replication, the requirements of replication, and the universal direction of DNA synthesis. 
We examine the enzymes and proteins that participate in the process and conclude the chapter by considering the molecular details of recombination, which is closely related to replication and is essential for the segregation of homologous chromosomes in meiosis, production of genetic variation, and for DNA repair. www.whfreeman.com/pierce More information about the symptoms and genetics of Bloom syndrome The Central Problem of Replication In a schoolyard game, a verbal message, such as “John’s brown dog ran away from home,” is whispered to a child, who runs to a second child and repeats the message. The message is relayed from child to child around the schoolyard until it returns to the original sender. Inevitably, the last child returns with an amazingly transformed message, such as “Joe Brown has a pig living under his porch.” The more children playing the game, the more garbled the message becomes. This game illustrates an important principle: errors arise whenever information is copied; the more times it is copied, the greater the number of errors. A complex, multicellular organism faces a problem similar to that of the children in the schoolyard game: how to faithfully transmit genetic instructions each time its cells divide. The solution to this problem is central to replication. A huge amount of genetic information and an enormous number of cell divisions are required to produce a multicellular adult organism; even a low rate of error during copying would be catastrophic. A single-celled human zygote contains 6 billion base pairs of DNA. If a copying error occurred only once per million base pairs, 6000 mistakes would be made every time a cell divided — errors that would be compounded at each of the millions of cell divisions that take place in human development. Not only must the copying of DNA be astoundingly accurate, it must also take place at breakneck speed. The single, circular chromosome of E. coli contains about 4.7 million base pairs. 
At a rate of more than 1000 nucleotides per minute, replication of the entire chromosome would require almost 3 days. Yet, these bacteria are capable of dividing every 20 minutes. E. coli actually replicates its DNA at a rate of 1000 nucleotides per second, with fewer than one in a billion errors. How is this extraordinarily accurate and rapid process accomplished? Semiconservative Replication From the three-dimensional structure of DNA that Watson and Crick proposed in 1953 (see Figure 10.8), several important genetic implications were immediately apparent. The complementary nature of the two nucleotide strands in a DNA molecule suggested that, during replication, each strand can serve as a template for the synthesis of a new strand. The specificity of base pairing (adenine with thymine; guanine with cytosine) implied that only one sequence of bases can be specified by each template, and so two DNA molecules built on the pair of templates will be identical with the original. This process is called semiconservative replication, because each of the original nucleotide strands remains intact (conserved), despite no longer being combined in the same molecule; the original DNA molecule is half (semi) conserved during replication. Initially, three alternative models were proposed for DNA replication. In conservative replication ( ◗ FIGURE 12.1a), the entire double-stranded DNA molecule serves as a template for a whole new molecule of DNA, and the original DNA molecule is fully conserved during replication. In dispersive replication ( ◗ FIGURE 12.1b), both nucleotide strands break down (disperse) into fragments, which serve as templates for the synthesis of new DNA fragments, and then somehow reassemble into two complete DNA molecules. In this model, each resulting DNA molecule is interspersed with fragments of old and new DNA; none of the original molecule is conserved. 
Semiconservative replication ( ◗ FIGURE 12.1c) is intermediate between these two models; the two nucleotide strands unwind and each serves as a template for a new DNA molecule.

◗ 12.1 Three proposed models of replication are conservative replication, dispersive replication, and semiconservative replication. Panels: (a) conservative replication; (b) dispersive replication; (c) semiconservative replication. Each panel shows the original DNA, the first replication, and the second replication.

These three models allow different predictions to be made about the distribution of original DNA and newly synthesized DNA after replication. With conservative replication, after one round of replication, 50% of the molecules would consist entirely of the original DNA and 50% would consist entirely of new DNA. After a second round of replication, 25% of the molecules would consist entirely of the original DNA and 75% would consist entirely of new DNA. With each additional round of replication, the proportion of molecules with new DNA would increase, although the number of molecules with the original DNA would remain constant. Dispersive replication would always produce hybrid molecules, containing some original and some new DNA, but the proportion of new DNA within the molecules would increase with each replication event. In contrast, with semiconservative replication, one round of replication would produce two hybrid molecules, each consisting of half original DNA and half new DNA. After a second round of replication, half the molecules would be hybrid, and the other half would consist of new DNA only. Additional rounds of replication would produce more and more molecules consisting entirely of new DNA, and a few hybrid molecules would persist.

Meselson and Stahl’s Experiment
To determine which of the three models of replication applied to E. coli cells, Matthew Meselson and Franklin Stahl needed a way to distinguish old and new DNA.
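The band proportions predicted by the models can be tabulated with a short simulation. The following Python sketch is illustrative only and is not part of the original experiment: the function names and the "heavy"/"light" strand labels are assumptions chosen for this example, and the dispersive model is reduced to a single number (the fraction of original DNA per molecule) because it predicts a continuum rather than discrete bands.

```python
from collections import Counter

def semiconservative(molecules):
    """Each duplex unwinds; each old strand pairs with a newly made light strand."""
    next_gen = []
    for strand_a, strand_b in molecules:
        next_gen.append((strand_a, "light"))
        next_gen.append((strand_b, "light"))
    return next_gen

def conservative(molecules):
    """Each duplex stays intact and templates an entirely new light duplex."""
    next_gen = []
    for duplex in molecules:
        next_gen.append(duplex)
        next_gen.append(("light", "light"))
    return next_gen

def classify(duplex):
    """Name the density band of a duplex from its number of heavy strands."""
    return {2: "heavy", 1: "hybrid", 0: "light"}[duplex.count("heavy")]

def band_fractions(model, rounds):
    """Fraction of molecules in each band after the given number of rounds."""
    molecules = [("heavy", "heavy")]  # start with fully original (heavy) DNA
    for _ in range(rounds):
        molecules = model(molecules)
    counts = Counter(classify(m) for m in molecules)
    return {band: counts[band] / len(molecules)
            for band in ("heavy", "hybrid", "light")}

def dispersive_original_fraction(rounds):
    """Under the dispersive model every molecule holds this share of old DNA."""
    return 0.5 ** rounds

for rounds in range(4):
    print(rounds,
          band_fractions(conservative, rounds),
          band_fractions(semiconservative, rounds),
          dispersive_original_fraction(rounds))
```

Only the semiconservative function yields a single hybrid band after one round and equal hybrid and light bands after two, while the conservative function never produces a hybrid band at all.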
Meselson and Stahl’s solution was to use two isotopes of nitrogen, 14N (the common form) and 15N (a rare, heavy form). Meselson and Stahl grew a culture of E. coli in a medium that contained 15N as the sole nitrogen source; after many generations, all the E. coli cells had 15N incorporated into the purine and pyrimidine bases of DNA (see Figure 10.10). Meselson and Stahl took a sample of these bacteria, switched the rest of the bacteria to a medium that contained only 14N, and then took additional samples of bacteria over the next few cellular generations. In each sample, the bacterial DNA that was synthesized before the change in medium contained 15N and was relatively heavy, whereas any DNA synthesized after the switch contained 14N and was relatively light.

Meselson and Stahl distinguished between the heavy 15N-laden DNA and the light 14N-containing DNA with the use of equilibrium density gradient centrifugation ( ◗ FIGURE 12.2). In this technique, a centrifuge tube is filled with a heavy salt solution and a substance whose density is to be measured (in this case, DNA fragments). The tube is then spun in a centrifuge at high speeds. After several days of spinning, a gradient of density develops within the tube, with high density at the bottom and low density at the top. Each DNA fragment settles where its density matches that of the salt: light molecules rise and heavy molecules sink.

◗ 12.2 Meselson and Stahl used equilibrium density gradient centrifugation to distinguish between heavy, 15N-laden DNA and lighter, 14N-laden DNA. A centrifuge tube is filled with a heavy salt solution and DNA fragments and is then spun in a centrifuge at high speeds for several days. A density gradient develops within the tube, and DNA moves to where its own density matches that of the salt. Heavy DNA (with 15N) moves toward the bottom; light DNA (with 14N) remains closer to the top.

Meselson and Stahl found that DNA from bacteria grown only on medium containing 15N produced a single band at the position expected of DNA containing only 15N ( ◗ FIGURE 12.3a). DNA from bacteria transferred to the medium with 14N and allowed one round of replication also produced a single band, but at a position intermediate between that expected of DNA with only 15N and that expected of DNA with only 14N ( ◗ FIGURE 12.3b).* After a second round of replication in medium with 14N, two bands of equal intensity appeared, one in the intermediate position and the other at the position expected of DNA that contained only 14N ( ◗ FIGURE 12.3c). All samples taken after additional rounds of replication produced two bands, and the band representing light DNA became progressively stronger ( ◗ FIGURE 12.3d). Meselson and Stahl’s results were exactly as expected for semiconservative replication and are incompatible with those predicted for both conservative and dispersive replication.

*This result is inconsistent with the conservative replication model, which predicts one heavy band (the original DNA molecules) and one light band (the new DNA molecules). A single band of intermediate density is predicted by both the semiconservative and dispersive models. To distinguish between these two models, Meselson and Stahl grew the bacteria in medium containing 14N for a second generation.

◗ 12.3 Meselson and Stahl demonstrated that DNA replication is semiconservative. Experiment question: Which model of DNA replication (conservative, dispersive, or semiconservative) applies to E. coli? (a) DNA from bacteria that had been grown on medium containing 15N appeared as a single band. (b) After transfer to 14N medium and one round of replication, the DNA appeared as a single band intermediate between that expected for DNA with 15N and that expected for DNA with 14N. (c) After a second round of replication, DNA appeared as two bands, one in the position of hybrid DNA (half 15N and half 14N) and the other in the position of DNA that contained only 14N. (d) Samples taken after additional rounds of replication appeared as two bands, as in part c. Conclusion: DNA replication in E. coli is semiconservative.

Concepts
Replication is semiconservative: each DNA strand serves as a template for the synthesis of a new DNA molecule. Meselson and Stahl convincingly demonstrated that replication in E. coli is semiconservative.

www.whfreeman.com/pierce A summary of Meselson and Stahl’s experiment

Modes of Replication
Following Meselson and Stahl’s work, investigators confirmed that other organisms also use semiconservative replication. No evidence was found for conservative or dispersive replication. There are, however, several different ways that semiconservative replication can take place, differing principally in the nature of the template DNA (whether it is linear or circular) and in the number of replication forks. Individual units of replication are called replicons, each of which contains a replication origin. Replication starts at the origin and continues until the entire replicon has been replicated. Bacterial chromosomes have a single replication origin, whereas eukaryotic chromosomes contain many.

Theta replication A common type of replication that takes place in circular DNA, such as that found in E. coli and other bacteria, is called theta replication ( ◗ FIGURE 12.4a), because it generates a structure that resembles the Greek letter theta (θ). In theta replication, double-stranded DNA begins to unwind at the replication origin, producing single-stranded nucleotide strands that then serve as templates on which new DNA can be synthesized. The unwinding of the double helix generates a loop, termed a replication bubble. Unwinding may be at one or both ends of the bubble, making it progressively larger. DNA replication on both of the template strands is simultaneous with unwinding. The point of unwinding, where the two single nucleotide strands separate from the double-stranded DNA helix, is called a replication fork. If there are two replication forks, one at each end of the replication bubble, the forks proceed outward in both directions in a process called bidirectional replication, simultaneously unwinding and replicating the DNA until they eventually meet. If a single replication fork is present, it proceeds around the entire circle to produce two complete circular DNA molecules, each consisting of one old and one new nucleotide strand.

◗ 12.4 Theta replication is a type of replication common in E. coli and other organisms possessing circular DNA. (a) 1. Double-stranded DNA unwinds at the replication origin, 2. producing single-stranded templates for the synthesis of new DNA. A replication bubble forms, usually having a replication fork at each end. 3. The forks proceed around the circle. 4. Eventually two circular DNA molecules are produced. Conclusion: The products of theta replication are two circular DNA molecules. (b) Electron micrographs showing the origin of replication, replication forks, and replication bubble. (Electron micrographs from Bernard Hirt, Institut Suisse de Recherche Expérimentale sur le Cancer.)

John Cairns provided the first visible evidence of theta replication in 1963 by growing bacteria in the presence of radioactive nucleotides. After replication, each DNA molecule consisted of one “hot” (radioactive) strand and one “cold” (nonradioactive) strand. Cairns isolated DNA from the bacteria after replication and placed it on an electron microscope grid, which was then covered with a photographic emulsion. Radioactivity present in the sample exposes the emulsion and produces a picture of the molecule (called an autoradiograph), similar to the way that light exposes a photographic film. Because the newly synthesized DNA contained radioactive nucleotides, Cairns was able to produce an electron micrograph of the replication process, similar to those shown in ◗ FIGURE 12.4b.

◗ 12.5 Rolling-circle replication takes place in some viruses and in the F factor of E. coli. Replication is initiated by a break in one of the nucleotide strands. DNA synthesis begins at the 3′ end of the broken strand; the inner strand is used as a template. The 5′ end of the broken strand is displaced. Cleavage releases a single-stranded linear DNA and a double-stranded circular DNA. The linear DNA may circularize and serve as a template for synthesis of a complementary strand. The cycle may be repeated. Conclusion: The products of rolling-circle replication are multiple circular DNA molecules.

Rolling-circle replication Another form of replication, called rolling-circle replication ( ◗ FIGURE 12.5), takes place in some viruses and in the F factor (a small circle of extrachromosomal DNA that controls mating, discussed in Chapter 8) of E. coli. This form of replication is initiated by a break in one of the nucleotide strands that creates a 3′-OH group and a 5′-phosphate group. New nucleotides are added to the 3′ end of the broken strand, with the inner (unbroken) strand used as a template. As new nucleotides are added to the 3′ end, the 5′ end of the broken strand is displaced from the template, rolling out like thread being pulled off a spool. The 3′ end grows around the circle, giving rise to the name rolling-circle model.
The replication fork may continue around the circle a number of times, producing several linked copies of the same sequence. With each revolution around the circle, the growing 3′ end displaces the nucleotide strand synthesized in the preceding revolution. Eventually, the linear DNA molecule is cleaved from the circle, resulting in a double-stranded circular DNA molecule and a single-stranded linear DNA molecule. The linear molecule circularizes either before or after serving as a template for the synthesis of a complementary strand.

Linear eukaryotic replication Circular DNA molecules that undergo theta or rolling-circle replication have a single origin of replication. Because of the limited size of these DNA molecules, replication starting from one origin can traverse the entire chromosome in a reasonable amount of time. The large linear chromosomes in eukaryotic cells, however, contain far too much DNA to be replicated speedily from a single origin. Eukaryotic replication proceeds at a rate ranging from 500 to 5000 nucleotides per minute at each replication fork (considerably slower than bacterial replication). Even at 5000 nucleotides per minute at each fork, DNA synthesis starting from a single origin would require 7 days to replicate a typical human chromosome consisting of 100 million base pairs of DNA. The replication of eukaryotic chromosomes actually occurs in a matter of minutes or hours, not days. This rate is possible because replication takes place simultaneously from thousands of origins. Typical eukaryotic replicons are from 20,000 to 300,000 base pairs in length (Table 12.1). At each replication origin, the DNA unwinds and produces a replication bubble. Replication takes place on both strands at each end of the bubble, with the two replication forks spreading outward. Eventually, replication forks of adjacent replicons run into each other, and the replicons fuse to form long stretches of newly synthesized DNA ( ◗ FIGURE 12.6). Replication and fusion of all the replicons leads to two identical DNA molecules. Important features of theta replication, rolling-circle replication, and linear eukaryotic replication are summarized in Table 12.2.

Table 12.1 Number and length of replicons
Organism | Number of Replication Origins | Average Length of Replicon (bp)
Escherichia coli (bacterium) | 1 | 4,200,000
Saccharomyces cerevisiae (yeast) | 500 | 40,000
Drosophila melanogaster (fruit fly) | 3,500 | 40,000
Xenopus laevis (toad) | 15,000 | 200,000
Mus musculus (mouse) | 25,000 | 150,000
Source: Data from B. L. Lewin, Genes V (Oxford: Oxford University Press, 1994), p. 536.

Concepts
Theta replication, rolling-circle replication, and linear replication differ with respect to the initiation and progression of replication, but all produce new DNA molecules by semiconservative replication.

Requirements of Replication
Although the process of replication includes many components, they can be combined into three major groups:
1. a template consisting of single-stranded DNA,
2. raw materials (substrates) to be assembled into a new nucleotide strand, and
3. enzymes and other proteins that “read” the template and assemble the substrates into a DNA molecule.

◗ 12.6 Linear DNA replication takes place in eukaryotic chromosomes. 1. Each chromosome contains numerous origins. 2. At each origin, the DNA unwinds, producing a replication bubble. 3. DNA synthesis takes place on both strands at each end of the bubble as the replication forks proceed outward. 4. Eventually, the forks of adjacent bubbles run into each other and the segments of DNA fuse, 5. producing two identical linear DNA molecules. Conclusion: The products of eukaryotic DNA replication are two linear DNA molecules.
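The timing argument for linear eukaryotic chromosomes (7 days from a single origin, versus minutes or hours from many) can be checked with a few lines of arithmetic. The Python sketch below is an idealization, not a model of a real cell: it assumes that every origin fires at the same moment, that origins are evenly spaced, and that each origin has two forks (bidirectional replication), which is the assumption under which the 7-day figure is reproduced. The function name and the choice of 1000 origins are hypothetical, picked only for illustration.

```python
MINUTES_PER_DAY = 60 * 24

def replication_time_minutes(length_bp, rate_nt_per_min, origins=1, forks_per_origin=2):
    """Idealized time to copy length_bp when all forks run in parallel.

    Assumes evenly spaced origins that all start at once (an illustration,
    not a description of real origin firing).
    """
    total_forks = origins * forks_per_origin
    return length_bp / (rate_nt_per_min * total_forks)

chromosome_bp = 100_000_000   # typical human chromosome, as stated in the text
fastest_rate = 5000           # nucleotides per minute per fork, upper value in the text

single_origin = replication_time_minutes(chromosome_bp, fastest_rate)
print(f"one origin: {single_origin / MINUTES_PER_DAY:.1f} days")

# Hypothetical number of origins on one chromosome, chosen for illustration.
many_origins = replication_time_minutes(chromosome_bp, fastest_rate, origins=1000)
print(f"1000 origins: {many_origins:.0f} minutes")
```

Under these assumptions a single bidirectional origin needs 10,000 minutes, which is just under 7 days, and 1000 origins working in parallel cut the time to about 10 minutes, in line with the "minutes or hours" stated in the text.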
Because of the semiconservative nature of DNA replication, a double-stranded DNA molecule must unwind to expose the bases that act as a template for the assembly of new polynucleotide strands, which are made complementary and antiparallel to the template strands. The raw materials from which new DNA molecules are synthesized are deoxyribonucleoside triphosphates (dNTPs), each consisting of a deoxyribose sugar and a base (a nucleoside) attached to three phosphates ( ◗ FIGURE 12.7a). In DNA synthesis, nucleotides are added to the 3′-OH group of the growing nucleotide strand ( ◗ FIGURE 12.7b). The 3′-OH group of the last nucleotide on the strand attacks the 5′-phosphate group of the incoming dNTP. Two phosphates are cleaved from the incoming dNTP, and a phosphodiester bond is created between the two nucleotides.

DNA synthesis does not happen spontaneously. Rather, it requires a host of enzymes and proteins that function in a coordinated manner. We will examine this complex array of proteins and enzymes as we consider the replication process in more detail.

Table 12.2 Characteristics of theta, rolling-circle, and linear eukaryotic replication
Replication Model | DNA Template | Breakage of Nucleotide Strand | Number of Replicons | Unidirectional or Bidirectional | Products
Theta | Circular | No | 1 | Unidirectional or bidirectional | Two circular molecules
Rolling circle | Circular | Yes | 1 | Unidirectional | One circular molecule and one linear molecule that may circularize
Linear eukaryotic | Linear | No | Many | Bidirectional | Two linear molecules

◗ 12.7 In replication, new DNA is synthesized from deoxyribonucleoside triphosphates (dNTPs). (a) A deoxyribonucleoside triphosphate (dNTP), showing the phosphates, the deoxyribose sugar, and the base. (b) 1. New DNA is synthesized from deoxyribonucleoside triphosphates (dNTPs). 2. In replication, the 3′-OH group of the last nucleotide on the strand attacks the 5′-phosphate group of the incoming dNTP. 3. Two phosphates are cleaved off. 4. A phosphodiester bond forms between the two nucleotides, 5. and phosphate ions are released.

Concepts
DNA synthesis requires a single-stranded DNA template, deoxyribonucleoside triphosphates, a growing nucleotide strand, and a group of enzymes and proteins.

Direction of Replication
In DNA synthesis, new nucleotides are joined one at a time to the 3′ end of the newly synthesized strand. DNA polymerases, the enzymes that synthesize DNA, can add nucleotides only to the 3′ end of the growing strand (not the 5′ end), so new DNA strands always elongate in the same 5′-to-3′ direction (5′→3′). Because the two single-stranded DNA templates are antiparallel and strand elongation is always 5′→3′, if synthesis on one template proceeds from, say, right to left, then synthesis on the other template must proceed in the opposite direction, from left to right ( ◗ FIGURE 12.8). As DNA unwinds during replication, the antiparallel nature of the two DNA strands means that one template is exposed in the 5′→3′ direction and the other template is exposed in the 3′→5′ direction (see Figure 12.8); so how can synthesis take place simultaneously on both strands at the fork?

As the DNA unwinds, the template strand that is exposed in the 3′→5′ direction (the lower strand in Figures 12.8 and 12.9) allows the new strand to be synthesized continuously, in the 5′→3′ direction. This new strand, which undergoes continuous replication, is called the leading strand. The other template strand is exposed in the 5′→3′ direction (the upper strand in Figures 12.8 and 12.9). After a short length of the DNA has been unwound, synthesis must proceed 5′→3′; that is, in the direction opposite that of unwinding ( ◗ FIGURE 12.9).
Because only a short length of DNA needs to be unwound before synthesis on this strand gets started, the replication machinery soon runs out of template. By that time, more DNA has unwound, providing new template at the 5′ end of the new strand. DNA synthesis must start anew at the replication fork and proceed in the direction opposite that of the movement of the fork until it runs into the previously replicated segment of DNA. This process is repeated again and again, so synthesis of this strand is in short, discontinuous bursts. The newly made strand that undergoes discontinuous replication is called the lagging strand.

The short lengths of DNA produced by discontinuous replication of the lagging strand are called Okazaki fragments, after Reiji Okazaki, who discovered them. In bacterial cells, each Okazaki fragment ranges in length from about 1000 to 2000 nucleotides; in eukaryotic cells, they are about 100 to 200 nucleotides long. Okazaki fragments on the lagging strand are linked together to create a continuous new DNA molecule.

Let’s relate the direction of DNA synthesis to the modes of replication examined earlier. In the theta model ( ◗ FIGURE 12.10a), the DNA unwinds at one particular location, the origin, and a replication bubble is formed. If the bubble has two forks, one at each end, synthesis takes place simultaneously at both forks (bidirectional replication). At each fork, synthesis on one of the template strands proceeds in the same direction as that of unwinding; the newly replicated strand is the leading strand with continuous replication. On the other template strand, synthesis is proceeding in the direction opposite that of unwinding; this newly synthesized strand is the lagging strand with discontinuous replication. Focus on just one of the template strands within the bubble.
Notice that synthesis on this template strand is continuous at one fork but discontinuous at the other. This difference arises because DNA synthesis is always in the same direction (5′→3′), but the two forks are moving in opposite directions.

◗ 12.8 DNA synthesis takes place simultaneously but in opposite directions on the two DNA template strands. DNA replication at a single replication fork begins when a double-stranded DNA molecule unwinds to provide two single-strand templates.

Replication in the rolling-circle model (◗ FIGURE 12.10b) is somewhat different, because there is no replication bubble.
Replication begins at the 3′ end of the broken nucleotide strand. Continuous replication takes place on the circular template as new nucleotides are added to this 3′ end.

◗ 12.9 DNA synthesis is continuous on one template strand of DNA and discontinuous on the other.

◗ 12.10 The process of replication differs in theta replication, rolling-circle replication, and linear replication. Panels: (a) theta model; (b) rolling-circle model; (c) linear eukaryotic replication.

The replication of linear molecules of DNA, such as those found in eukaryotic cells, produces a series of replication bubbles (◗ FIGURE 12.10c). DNA synthesis in these bubbles is the same as that in the single replication bubble of the theta model; it begins at the center of each replication bubble and proceeds at two forks, one at each end of the bubble. At both forks, synthesis of the leading strand proceeds in the same direction as that of unwinding, whereas synthesis of the lagging strand proceeds in the direction opposite that of unwinding.

Concepts
All DNA synthesis is 5′→3′, meaning that new nucleotides are always added to the 3′ end of the growing nucleotide strand.
At each replication fork, synthesis of the leading strand proceeds continuously and that of the lagging strand proceeds discontinuously.

The Mechanism of Replication
Replication takes place in four stages: initiation, unwinding, elongation, and termination.

Bacterial DNA Replication
The following discussion of the process of replication will focus on bacterial systems, where replication has been most thoroughly studied and is best understood. Although many aspects of replication in eukaryotic cells are similar to those of prokaryotic cells, there are some important differences. We will compare bacterial and eukaryotic replication later in the chapter.

Initiation
The circular chromosome of E. coli has a single replication origin (oriC). The minimal sequence required for oriC to function consists of 245 bp that contain several critical sites. Initiator proteins bind to oriC and cause a short section of DNA to unwind. This unwinding allows helicase and other single-strand-binding proteins to attach to the polynucleotide strand (◗ FIGURE 12.11).

Unwinding
Because DNA synthesis requires a single-stranded template and double-stranded DNA must be unwound before DNA synthesis can take place, the cell relies on several proteins and enzymes to accomplish the unwinding. DNA helicases break the hydrogen bonds that exist between the bases of the two nucleotide strands of a DNA molecule. Helicases cannot initiate the unwinding of double-stranded DNA; the initiator proteins first separate DNA strands at the origin, providing a short stretch of single-stranded DNA to which a helicase binds.
Helicases bind to the lagging-strand template at each replication fork and move in the 5′→3′ direction along this strand, thus also moving the replication fork (◗ FIGURE 12.12).

◗ 12.11 E. coli DNA replication begins when initiator proteins bind to oriC, the origin of replication, causing a short stretch of DNA to unwind.

After DNA has been unwound by helicase, the single-stranded nucleotide chains have a tendency to form hydrogen bonds and reanneal (stick back together). Secondary structures, such as hairpins (see Figure 8.21), also may form between complementary nucleotides on the same strand. To stabilize the single-stranded DNA long enough for replication to take place, single-strand-binding (SSB) proteins attach tightly to the exposed single-stranded DNA (see Figure 12.12). Unlike many DNA-binding proteins, SSBs are indifferent to base sequence: they will bind to any single-stranded DNA. Single-strand-binding proteins form tetramers (groups of four) that together cover from 35 to 65 nucleotides.

◗ 12.12 DNA helicase unwinds DNA by binding to the lagging-strand template at each replication fork and moving in the 5′→3′ direction along the strand.

Another protein essential for the unwinding process is the enzyme DNA gyrase, a topoisomerase. As discussed in Chapter 11, topoisomerases control the supercoiling of DNA.
In replication, DNA gyrase reduces torsional strain (torque) that builds up ahead of the replication fork as a result of unwinding (see Figure 12.12). It reduces torque by making a double-stranded break in one segment of the DNA helix, passing another segment of the helix through the break, and then resealing the broken ends of the DNA. This action removes a twist in the DNA and reduces the supercoiling.

Concepts
Replication is initiated at a replication origin, where an initiator protein binds and causes a short stretch of DNA to unwind. DNA helicase breaks hydrogen bonds at a replication fork, and single-strand-binding proteins stabilize the separated strands. DNA gyrase reduces torsional strain that develops as the two strands of double-helical DNA unwind.

Primers
All DNA polymerases require a nucleotide with a 3′-OH group to which a new nucleotide can be added. Because of this requirement, DNA polymerases cannot initiate DNA synthesis on a bare template; rather, they require a primer (an existing 3′-OH group) to get started. How, then, does DNA synthesis begin?

An enzyme called primase synthesizes short stretches of nucleotides (primers) to get DNA replication started. Primase synthesizes a short stretch of RNA nucleotides (about 10 to 12 nucleotides long), which provides a 3′-OH group to which DNA polymerase can attach DNA nucleotides. (Because primase is an RNA polymerase, it does not require an existing 3′-OH group to which nucleotides can be added.) All DNA molecules initially have short RNA primers embedded within them; these primers are later removed and replaced by DNA nucleotides.

On the leading strand, where DNA synthesis is continuous, a primer is required only at the 5′ end of the newly synthesized strand. On the lagging strand, where replication is discontinuous, a new primer must be generated at the beginning of each Okazaki fragment (◗ FIGURE 12.13).
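The difference in primer demand between the two strands can be sketched with a short calculation. The function name primers_needed and the 100,000-nucleotide stretch of template are assumptions made for illustration; the fragment lengths of 1000 and 2000 nucleotides are the bacterial Okazaki fragment range given earlier in the chapter.

```python
import math


def primers_needed(template_length: int, fragment_length: int) -> dict:
    """Count primers at one fork that copies `template_length` nucleotides per strand.

    Hypothetical helper: the leading strand needs a single primer, whereas the
    lagging strand needs one primer for every Okazaki fragment.
    """
    return {
        "leading": 1,
        "lagging": math.ceil(template_length / fragment_length),
    }


# Illustrative stretch of 100,000 nucleotides of template.
short_fragments = primers_needed(100_000, 1_000)
long_fragments = primers_needed(100_000, 2_000)
print(short_fragments, long_fragments)
```

Under these assumptions the lagging strand needs between 50 and 100 primers for the same stretch that the leading strand copies from one.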
Primase forms a complex with helicase at the replication fork and moves along the template of the lagging strand. The single primer on the leading strand is probably synthesized by the primase-helicase complex on the template of the lagging strand of the other replication fork, at the opposite end of the replication bubble.

◗ 12.13 Primase synthesizes short stretches of RNA nucleotides, providing a 3′-OH group to which DNA polymerase can add DNA nucleotides.

Concepts
Primase synthesizes a short stretch of RNA nucleotides (primers), which provides a 3′-OH group for the attachment of DNA nucleotides to start DNA synthesis.

Elongation
After DNA is unwound and a primer has been added, DNA polymerases elongate the polynucleotide strand by catalyzing DNA polymerization. The best-studied polymerases are those of E. coli, which has at least five different DNA polymerases. Two of them, DNA polymerase I and DNA polymerase III, carry out DNA synthesis associated with replication; the other three have specialized functions in DNA repair (Table 12.3). DNA polymerase III is a large multiprotein complex that acts as the main workhorse of replication.
DNA polymerase III synthesizes nucleotide strands by adding new nucleotides to the 3′ end of growing DNA molecules. This enzyme has two enzymatic activities (Table 12.3). Its 5′→3′ polymerase activity allows it to add new nucleotides in the 5′→3′ direction. Its 3′→5′ exonuclease activity allows it to remove nucleotides in the 3′→5′ direction, enabling it to correct errors. If a nucleotide having an incorrect base is inserted into the growing DNA molecule, DNA polymerase III uses its 3′→5′ exonuclease activity to back up and remove the incorrect nucleotide. It then resumes its 5′→3′ polymerase activity. These two functions together allow DNA polymerase III to efficiently and accurately synthesize new DNA molecules.

Table 12.3 Characteristics of DNA polymerases in E. coli

DNA polymerase   5′→3′ polymerization   3′→5′ exonuclease   5′→3′ exonuclease   Function
I                Yes                    Yes                 Yes                 Removes and replaces primers
II               Yes                    Yes                 No                  DNA repair; restarts replication after damaged DNA halts synthesis
III              Yes                    Yes                 No                  Elongates DNA
IV               Yes                    No                  No                  DNA repair
V                Yes                    No                  No                  DNA repair; translesion DNA synthesis

The first E. coli polymerase to be discovered, DNA polymerase I, also has 5′→3′ polymerase and 3′→5′ exonuclease activities (see Table 12.3), permitting the enzyme to synthesize DNA and to correct errors. Unlike DNA polymerase III, however, DNA polymerase I also possesses 5′→3′ exonuclease activity, which is used to remove the primers laid down by primase and to replace them with DNA nucleotides by moving in a 5′→3′ direction. The removal and replacement of primers appear to constitute the main function of DNA polymerase I. DNA polymerases IV and V function in DNA repair.

Despite their differences, all of E. coli’s DNA polymerases
1. synthesize any sequence specified by the template strand;
2. synthesize in the 5′→3′ direction by adding nucleotides to a 3′-OH group;
3. use dNTPs to synthesize new DNA;
4. require a primer to initiate synthesis;
5. catalyze the formation of a phosphodiester bond by joining the 5′-phosphate group of the incoming nucleotide to the 3′-OH group of the preceding nucleotide on the growing strand, cleaving off two phosphates in the process;
6. produce newly synthesized strands that are complementary and antiparallel to the template strands; and
7. are associated with a number of other proteins.

Concepts
DNA polymerases synthesize DNA in the 5′→3′ direction by adding new nucleotides to the 3′ end of a growing nucleotide strand.

DNA ligase
After DNA polymerase III attaches a DNA nucleotide to the 3′-OH group on the last nucleotide of the RNA primer, each new DNA nucleotide then provides the 3′-OH group needed for the next DNA nucleotide to be added. This process continues as long as template is available (◗ FIGURE 12.14a). DNA polymerase I follows DNA polymerase III and, using its 5′→3′ exonuclease activity, removes the RNA primer. It then uses its 5′→3′ polymerase activity to replace the RNA nucleotides with DNA nucleotides. DNA polymerase I attaches the first nucleotide to the OH group at the 3′ end of the preceding Okazaki fragment and then continues, in the 5′→3′ direction along the nucleotide strand, removing and replacing, one at a time, the RNA nucleotides of the primer (◗ FIGURE 12.14b).
After polymerase I has replaced the last nucleotide of the RNA primer with a DNA nucleotide, a nick remains in the sugar-phosphate backbone of the new DNA strand. The 3′-OH group of the last nucleotide to have been added by DNA polymerase I is not attached to the 5′-phosphate group of the first nucleotide added by DNA polymerase III (◗ FIGURE 12.14c). This nick is sealed by the enzyme DNA ligase, which catalyzes the formation of a phosphodiester bond without adding another nucleotide to the strand (◗ FIGURE 12.14d). Some of the major enzymes and proteins required for replication are summarized in Table 12.4.

◗ 12.14 DNA ligase seals the nick left by DNA polymerase I in the sugar-phosphate backbone after the polymerase has added the final nucleotide.

Table 12.4 Components required for replication in bacterial cells

Component                        Function
Initiator protein                Binds to origin and separates strands of DNA to initiate replication
DNA helicase                     Unwinds DNA at replication fork
Single-strand-binding proteins   Attach to single-stranded DNA and prevent reannealing
DNA gyrase                       Moves ahead of the replication fork, making and resealing breaks in the double-helical DNA to release torque that builds up as a result of unwinding at the replication fork
DNA primase                      Synthesizes short RNA primers to provide a 3′-OH group for attachment of DNA nucleotides
DNA polymerase III               Elongates a new nucleotide strand from the 3′-OH group provided by the primer
DNA polymerase I                 Removes RNA primers and replaces them with DNA
DNA ligase                       Joins Okazaki fragments by sealing nicks in the sugar-phosphate backbone of newly synthesized DNA

Concepts
After primers are removed and replaced, the nick in the sugar-phosphate linkage is sealed by DNA ligase.
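The two finishing steps, primer replacement and nick sealing, can be sketched as simple string operations. This is a toy model under stated assumptions: RNA nucleotides are written in lowercase and DNA nucleotides in uppercase, the fragment sequences are invented, and the names replace_primer and ligate are hypothetical. The sketch also ignores the requirement that DNA polymerase I extend from the 3′-OH group of the preceding fragment.

```python
def replace_primer(fragment: str) -> str:
    """DNA polymerase I step: swap each RNA nucleotide (lowercase) for its DNA equivalent."""
    rna_to_dna = {"a": "A", "u": "T", "g": "G", "c": "C"}
    # DNA nucleotides (uppercase) are not in the mapping and pass through unchanged.
    return "".join(rna_to_dna.get(nt, nt) for nt in fragment)


def ligate(fragments: list[str]) -> str:
    """DNA ligase step: seal the nicks between adjacent fragments without adding nucleotides."""
    return "".join(fragments)


# Three invented Okazaki fragments, listed 5'->3' along the new strand.
# Each begins with a four-nucleotide RNA primer (real primers are longer).
okazaki = ["gcuaTTGACC", "augcGGATCA", "cuagCCATGT"]

dna_only = [replace_primer(fragment) for fragment in okazaki]
nicks_before_ligation = len(dna_only) - 1
lagging_strand = ligate(dna_only)
print(nicks_before_ligation, lagging_strand)
```

The joined strand has exactly as many nucleotides as the fragments had in total, which mirrors the point that ligase forms a phosphodiester bond without adding another nucleotide.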
www.whfreeman.com/pierce More information on helicase, primase, and single-strand-binding proteins

The replication fork
Now that the major enzymatic components of elongation (DNA polymerases, helicase, primase, and ligase) have been introduced, let’s consider how these components interact at the replication fork. Because the synthesis of both strands takes place simultaneously, two units of DNA polymerase III must be present at the replication fork, one for each strand. In one model of the replication process (◗ FIGURE 12.15), the two units of DNA polymerase III are connected, and the lagging-strand template loops around so that, as the DNA polymerase III complex moves along the helix, the two antiparallel strands can undergo 5′→3′ replication simultaneously.

In summary, each active replication fork requires five basic components:
1. helicase to unwind the DNA,
2. single-strand-binding proteins to keep the nucleotide strands separate long enough to allow replication,
3. the topoisomerase gyrase to remove strain ahead of the replication fork,
4. primase to synthesize primers with a 3′-OH group at the beginning of each DNA fragment, and
5. DNA polymerase to synthesize the leading and lagging nucleotide strands.

www.whfreeman.com/pierce Additional information about the mechanism of replication and an animation of a replication fork

Termination
In some DNA molecules, replication is terminated whenever two replication forks meet. In others, specific termination sequences block further replication. A termination protein, called Tus in E. coli, binds to these sequences. Tus blocks the movement of helicase, thus stalling the replication fork and preventing further DNA replication.

The fidelity of DNA replication
Overall, replication results in an error rate of less than one mistake per billion nucleotides. How is this incredible accuracy achieved?
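Before turning to the mechanisms, a short calculation gives a feel for the scale of the rates quoted in this section: about one error per 100,000 nucleotides from nucleotide selection alone, one per 10 million once proofreading is included, and fewer than one per billion overall. The 5-million-nucleotide template length below is an arbitrary assumption chosen for illustration, and the names are hypothetical.

```python
# Nucleotides copied per error, as quoted in this section.
NUCLEOTIDES_PER_ERROR = {
    "nucleotide selection alone": 100_000,
    "selection plus proofreading": 10_000_000,
    "after mismatch repair (upper bound on errors)": 1_000_000_000,
}

TEMPLATE_LENGTH = 5_000_000  # hypothetical template length, for illustration only


def expected_errors(template_length: int, nucleotides_per_error: int) -> float:
    """Average number of errors expected when the template is copied once."""
    return template_length / nucleotides_per_error


for stage, per_error in NUCLEOTIDES_PER_ERROR.items():
    print(f"{stage}: {expected_errors(TEMPLATE_LENGTH, per_error):g} expected errors")

# Fold improvement contributed by each later stage.
proofreading_gain = 10_000_000 // 100_000            # 100-fold
mismatch_repair_gain = 1_000_000_000 // 10_000_000   # at least 100-fold
```

On these figures, proofreading and mismatch repair each reduce the error rate by roughly a factor of 100.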
No single process could produce this level of accuracy; a series of processes are required, each catching errors missed by the preceding ones (◗ FIGURE 12.16).

DNA polymerases are very particular in pairing nucleotides with their complements on the template strand. Errors in nucleotide selection by DNA polymerase arise only about once per 100,000 nucleotides. Most of the errors that do arise in nucleotide selection are corrected in a second process called proofreading. When a DNA polymerase inserts an incorrect nucleotide into the growing strand, the 3′-OH group of the mispaired nucleotide is not correctly positioned for accepting the next nucleotide. The incorrect positioning stalls the polymerization reaction, and the 3′→5′ exonuclease activity of DNA polymerase removes the incorrectly paired nucleotide. DNA polymerase then inserts the correct nucleotide. Together, proofreading and nucleotide selection result in an error rate of only one in 10 million nucleotides.

A third process, called mismatch repair (discussed further in Chapter 17), corrects errors after replication is complete. Any incorrectly paired nucleotides remaining after replication produce a deformity in the secondary structure of the DNA; the deformity is recognized by enzymes that excise an incorrectly paired nucleotide and use the original nucleotide strand as a template to replace the incorrect nucleotide. Mismatch repair requires the ability to distinguish between the old and the new strands of DNA, because the enzymes need some way of determining which of the two incorrectly paired bases to remove. In E.
coli, methyl groups (CH3) are added to particular nucleotide sequences, but only after replication. Thus, methylation lags behind replication; immediately after DNA synthesis, only the old DNA strand is methylated. The old strand can therefore be distinguished from the newly synthesized strand, and mismatch repair takes place preferentially on the unmethylated nucleotide strand.

◗ 12.15 In one model of DNA replication in E. coli, the two units of DNA polymerase III are connected, and the lagging-strand template forms a loop so that replication can take place on the two antiparallel DNA strands. Components of the replication machinery at the replication fork are shown at the top.

Concepts
Replication is extremely accurate, with less than one error per billion nucleotides. This accuracy results from the processes of nucleotide selection, proofreading, and mismatch repair.

Connecting Concepts
The Basic Rules of Replication
Bacterial replication requires a number of enzymes, proteins, and DNA sequences that function together to synthesize a new DNA molecule. These components are important, but it is critical that we not become so immersed in the details of the process that we lose sight of general principles of replication.
1. Replication is always semiconservative.
2. Replication begins at sequences called origins.
3. DNA synthesis is initiated by short segments of RNA called primers.
4. The elongation of DNA strands is always in the 5′→3′ direction.
5.
New DNA is synthesized from dNTPs; in the polymerization of DNA, two phosphates are cleaved from a dNTP and the resulting nucleotide is added to the 3′-OH group of the growing nucleotide strand.
6. Replication is continuous on the leading strand and discontinuous on the lagging strand.
7. New nucleotide strands are made complementary and antiparallel to their template strands.
8. Replication takes place at very high rates and is astonishingly accurate, thanks to precise nucleotide selection, proofreading, and repair mechanisms.

◗ 12.16 A series of processes are required to ensure the incredible accuracy of DNA replication. Among these processes are nucleotide selection, proofreading, and mismatch repair.

Eukaryotic DNA Replication
Although not as well understood, eukaryotic replication resembles bacterial replication in many respects.
The most obvious differences are that eukaryotes have: (1) multiple replication origins in their chromosomes; (2) more types of DNA polymerases, with different functions; and (3) nucleosome assembly immediately following DNA replication.

Eukaryotic origins
Researchers first isolated eukaryotic origins of replication from yeast cells by demonstrating that certain DNA sequences confer the ability to replicate when transferred from a yeast chromosome to small circular pieces of DNA (plasmids). These autonomously replicating sequences (ARSs) enabled any DNA to which they were attached to replicate. They were subsequently shown to be the origins of replication in yeast chromosomes. Yeast ARSs typically consist of 100 to 120 bp of DNA. A multiprotein complex, the origin recognition complex (ORC), binds to the ARS and probably unwinds the DNA in this region. Interestingly, ORCs also function in regulating transcription.

Concepts
Eukaryotic DNA contains many origins of replication. At each origin, a multiprotein origin recognition complex binds to initiate the unwinding of the DNA.

Licensing of DNA replication
Eukaryotic cells utilize thousands of origins, and so the entire genome can be replicated in a timely manner. The use of multiple origins, however, creates a special problem in the timing of replication: the entire genome must be precisely replicated once and only once in each cell cycle so that no genes are left unreplicated and no genes are replicated more than once. How does a cell ensure that replication is initiated at thousands of origins only once per cell cycle?

The precise replication of DNA is accomplished by the separation of the initiation of replication into two distinct steps. In the first step, the origins are licensed, meaning that they are approved for replication. This step is early in the cell cycle when a replication licensing factor attaches to an origin.
In the second step, initiator proteins cause the separation of DNA strands and the initiation of replication at each licensed origin. The key is that initiator proteins function only at licensed origins. As the replication forks move away from the origin, the licensing factor is removed, leaving the origin in an unlicensed state, where replication cannot be initiated again until the license is renewed. To ensure that replication takes place only once each cell cycle, the licensing factor is active only after the cell has completed mitosis and before the initiator proteins become active.

Unwinding
Several helicases that separate double-stranded DNA have been isolated from eukaryotic cells, as have single-strand-binding proteins and topoisomerases (which have a function equivalent to the DNA gyrase in bacterial cells). These enzymes and proteins are assumed to function in unwinding eukaryotic DNA in much the same way as unwinding in bacterial cells.

Eukaryotic DNA polymerases
A significant difference in the processes of bacterial and eukaryotic replication is in the number and functions of DNA polymerases. Eukaryotic cells contain a number of different DNA polymerases that function in replication, recombination, and DNA repair (Table 12.5). DNA polymerase α, which contains primase activity, initiates nuclear DNA synthesis by synthesizing an RNA primer, followed by a short string of DNA nucleotides. After DNA polymerase α has laid down from 30 to 40 nucleotides, DNA polymerase δ completes replication on the leading and lagging strands. DNA polymerase β does not participate in replication but is associated with the repair and recombination of nuclear DNA. DNA polymerase γ replicates mitochondrial DNA; a γ-like polymerase also replicates chloroplast DNA. Similar in structure and function to DNA polymerase δ, DNA polymerase ε appears to take part in nuclear replication of both the leading and the lagging strands, but its precise role is not yet clear. Other DNA polymerases (ζ through σ in Table 12.5) allow replication to bypass damaged DNA (called translesion replication) or play a role in DNA repair. Many of the DNA polymerases have multiple roles in replication and DNA repair (see Table 12.5).

Concepts
There are at least thirteen different DNA polymerases in eukaryotic cells. DNA polymerases α and δ carry out replication on the leading and lagging strands.

Table 12.5 DNA polymerases in eukaryotic cells

DNA polymerase   5′→3′ polymerase activity   3′→5′ exonuclease activity   Cellular function
α (alpha)        Yes                         No                           Initiation of nuclear DNA synthesis and DNA repair
β (beta)         Yes                         No                           DNA repair and recombination of nuclear DNA
γ (gamma)        Yes                         Yes                          Replication of mitochondrial DNA
δ (delta)        Yes                         Yes                          Leading- and lagging-strand synthesis of nuclear DNA, DNA repair, and translesion DNA synthesis
ε (epsilon)      Yes                         Yes                          Unknown; probably repair and replication of nuclear DNA
ζ (zeta)         Yes                         No                           Translesion DNA synthesis
η (eta)          Yes                         No                           Translesion DNA synthesis
θ (theta)        Yes                         No                           DNA repair
ι (iota)         Yes                         No                           Translesion DNA synthesis
κ (kappa)        Yes                         No                           Translesion DNA synthesis
λ (lambda)       Yes                         No                           DNA repair
μ (mu)           Yes                         No                           DNA repair
σ (sigma)        Yes                         No                           Nuclear DNA replication (possibly), DNA repair, and sister-chromatid cohesion

◗ 12.17 This electron micrograph of eukaryotic DNA in the process of replication clearly shows that newly replicated DNA is already covered with nucleosomes (dark circles). (Victoria Foe).

Nucleosome assembly
Eukaryotic DNA is complexed to histone proteins in nucleosome structures that contribute to the stability and packing of the DNA molecule (see Figure 11.6). The disassembly and reassembly of nucleosomes on newly synthesized DNA probably takes place in replication, but the precise mechanism for these processes has not yet been determined. The unwinding of double-stranded DNA and the assembly of the replication enzymes on the single-stranded templates probably require the disassembly of the nucleosome structure. Electron micrographs of eukaryotic DNA show recently replicated DNA already covered with nucleosomes (◗ FIGURE 12.17), indicating that nucleosome structure is reassembled quickly.

Before replication, a single DNA molecule is associated with histone proteins. After replication and nucleosome assembly, two DNA molecules are associated with histone proteins. Do the original histones remain together, attached to one of the new DNA molecules, or do they disassemble and mix with new histones on both DNA molecules?

Techniques similar to those employed by Meselson and Stahl to determine the mode of DNA replication were used to address this question. Cells were cultivated for several generations in a medium containing amino acids labeled with a heavy isotope. The histone proteins incorporated these heavy amino acids and were relatively dense (◗ FIGURE 12.18). The cells were then transferred to a culture medium that contained amino acids with a light isotope. Histones assembled after the transfer possessed the new, relatively light amino acids and were less dense. After allowing replication to take place, the histone octamers were isolated and centrifuged in a density gradient. Results show that, after replication, the octamers were in a continuous band between high density (representing old octamers) and low density (representing new octamers). This finding suggests that newly assembled octamers consist of a random mixture of old and new histones.

Concepts
After DNA replication, new nucleosomes quickly reassemble on the molecules of DNA. Nucleosomes apparently break down in the course of replication and reassemble from a random mixture of old and new histones.

◗ 12.18 Experimental procedure for studying how nucleosomes dissociate and reassociate during replication. Question: What happens to histones during eukaryotic DNA replication? Conclusion: After DNA replication, the new octamers are a random mixture of old and new histones.

Location of replication within the nucleus
The DNA polymerases that carry out replication are frequently depicted as moving down the DNA template, much as a locomotive travels along a train track. Recent evidence suggests that this view is incorrect. A more accurate view is that the polymerase is fixed in location, and template DNA is threaded through it, with newly synthesized DNA molecules emerging from the other end. Techniques of fluorescence microscopy, which are capable of revealing active sites of DNA synthesis, show that most replication in the nucleus of a eukaryotic cell takes place at a limited number of fixed sites, often referred to as replication factories. Time-lapse micrographs reveal that newly duplicated DNA is extruded from these particular sites. Similar results have also been obtained with bacterial cells.

DNA synthesis at the ends of chromosomes
A fundamental difference between eukaryotic and bacterial replication arises because eukaryotic chromosomes are linear and thus have ends.
As already stated, the 3′-OH group needed for replication by DNA polymerases is provided at the initiation of replication by RNA primers that are synthesized by primase. This solution is temporary, because eventually the primers must be removed and replaced by DNA nucleotides. In a circular DNA molecule, elongation around the circle eventually provides a 3′-OH group immediately in front of the primer ( ◗ FIGURE 12.19a). After the primer has been removed, the replacement DNA nucleotides can be added to this 3′-OH group. In linear chromosomes with multiple origins, the elongation of DNA in adjacent replicons also provides a 3′-OH group preceding each primer ( ◗ FIGURE 12.19b). At the very end of a linear chromosome, however, there is no adjacent stretch of replicated DNA to provide this crucial 3′-OH group. Once the primer at the end of the chromosome has been removed, it cannot be replaced with DNA nucleotides, which produces a gap at the end of the chromosome ( ◗ FIGURE 12.19c), suggesting that the chromosome should become progressively shorter with each round of replication. The chromosome would be shortened each generation, leading to the eventual elimination of the entire telomere, destabilization of the chromosome, and cell death. But chromosomes don't become shorter each generation and destabilize; so how are the ends of linear chromosomes replicated?

The ends of chromosomes, the telomeres, possess several unique features, one of which is the presence of many copies of a short repeated sequence. In the protozoan Tetrahymena, this telomeric repeat is CCCCAA (see Table 11.2), with the G-rich strand typically protruding beyond the C-rich strand ( ◗ FIGURE 12.20a):

end of chromosome ←       5′–CCCCAA → toward centromere
                    3′–GGGGTTGGGGTT

The single-stranded protruding end of the telomere can be extended by telomerase, an enzyme with both a protein and an RNA component (also known as a ribonucleoprotein).
The RNA part of the enzyme contains from 15 to 22 nucleotides that are complementary to the sequence on the G-rich strand. This sequence pairs with the overhanging 3′ end of the DNA ( ◗ FIGURE 12.20b) and provides a template for the synthesis of additional DNA copies of the repeats. DNA nucleotides are added to the 3′ end of the strand one at a time ( ◗ FIGURE 12.20c) and, after several nucleotides have been added, the RNA template moves down the DNA and more nucleotides are added to the 3′ end ( ◗ FIGURE 12.20d). Usually, from 14 to 16 nucleotides are added to the 3′ end of the G-rich strand. In this way, the telomerase can extend the 3′ end of the chromosome without the use of a complementary DNA template ( ◗ FIGURE 12.20e). How the complementary C-rich strand is synthesized ( ◗ FIGURE 12.20f) is not yet clear. It may be synthesized by conventional replication, with primase synthesizing an RNA primer on the 5′ end of the extended (G-rich) template. The removal of this primer once again leaves a gap at the 5′ end of the chromosome, but this gap does not matter, because the end of the chromosome is extended at each replication by telomerase; no genetic information is lost, and the chromosome does not become shorter overall. The extended single-strand end may fold back on itself, forming a terminal loop by nonconventional pairing of bases ( ◗ FIGURE 12.21). This loop could provide a 3′-OH group for the attachment of DNA nucleotides along the C-rich strand.

◗ 12.19 DNA synthesis must differ at the ends of circular and linear chromosomes.
(a) Circular DNA: Replication around the circle provides a 3′-OH group in front of the primer, onto which nucleotides can be added when the primer is replaced.
(b) Linear DNA: 1 In linear DNA with multiple origins of replication,... 2 ...elongation of DNA in adjacent replicons provides a 3′-OH group for replacement of primers. 3 Primers at the ends of chromosomes cannot be replaced, because there is no adjacent 3′-OH to which DNA nucleotides can be attached.
(c) End of a linear chromosome: 4 When the primer at the end of a chromosome is removed,... 5 ...there is no 3′-OH group to which DNA nucleotides can be attached, producing a gap.
Conclusion: In the absence of special mechanisms, DNA replication would leave gaps due to the removal of primers at the ends of chromosomes.

[Figure 12.20, panels a through d]
(a) 1 The telomere has a protruding end with a G-rich repeated sequence.
(b) 2 The RNA part of telomerase is complementary to the G-rich strand and pairs with it, providing a template for the synthesis of copies of the repeats.
(c) 3 Nucleotides are added to the 3′ end of the G-rich strand.
(d) 4 After several nucleotides have been added, the RNA template moves along the DNA.

Telomerase is present in single-celled organisms, germ cells, early embryonic cells, and certain proliferative somatic cells (such as bone-marrow cells and cells lining the intestine), all of which must undergo continuous cell division. Most somatic cells have little or no telomerase activity, and chromosomes in these cells progressively shorten with each cell division.
These cells are capable of only a limited number of divisions; once the telomeres shorten beyond a critical point, a chromosome becomes unstable, has a tendency to undergo rearrangements, and is degraded. These events lead to cell death. The shortening of telomeres may contribute to the process of aging. Genetically engineered mice that lack a functional telomerase gene (and therefore do not express telomerase in somatic or germ cells) experience progressive shortening of their telomeres in successive generations. After several generations, these mice show some signs of premature aging, such as graying, hair loss, and delayed wound healing. Through genetic engineering, it is also possible to create somatic cells that express telomerase. In these cells, telomeres do not shorten, cell aging is inhibited, and the cells will divide indefinitely.

◗ 12.20 The enzyme telomerase is responsible for the replication of chromosome ends.
(e) 5 More nucleotides are added.
(f) 6 The telomerase is removed. 7 Synthesis takes place on the complementary strand (see Figure 12.21), filling in the gap due to the removal of the RNA primer at the end.
Conclusion: Telomerase extends the DNA, filling in the gap due to the removal of the RNA primer.

◗ 12.21 The complementary G-rich strand at the end of the telomere must be primed before the extension of the 3′ end of the chromosome by telomerase.
1 The G-rich single-strand end that has been extended by telomerase may fold back on itself,... 2 ...forming a terminal loop by nonconventional base pairing... 3 ...to provide a 3′-OH group for attachment of DNA nucleotides.
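The contrast between cells with and without telomerase can be sketched numerically. The short Python example below is only an illustration: the starting length, the number of repeats lost per division, the number added by telomerase, and the function name `simulate_telomere` are all hypothetical values chosen for readability, not measurements from the text.

```python
def simulate_telomere(initial_repeats, loss_per_division, divisions,
                      telomerase_gain=0, critical_repeats=0):
    """Track telomere length (in repeat units) over successive divisions.

    All parameters are illustrative assumptions. The returned list starts
    with the initial length; the simulation stops early once the telomere
    has shortened to the critical length, standing in for the point at
    which the chromosome becomes unstable.
    """
    lengths = [initial_repeats]
    current = initial_repeats
    for _ in range(divisions):
        # Each replication leaves a gap where the terminal primer was
        # removed; telomerase (if active) adds repeats back.
        current = max(current - loss_per_division + telomerase_gain, 0)
        lengths.append(current)
        if current <= critical_repeats:
            break
    return lengths


# A somatic-like cell with no telomerase activity (hypothetical numbers).
somatic = simulate_telomere(initial_repeats=10, loss_per_division=2, divisions=10)

# A germ-line-like cell in which telomerase restores what is lost.
germline = simulate_telomere(initial_repeats=10, loss_per_division=2,
                             divisions=10, telomerase_gain=2)

print(somatic)   # [10, 8, 6, 4, 2, 0]
print(germline)  # stays at 10 for all ten divisions
```

In this toy model the telomerase-free lineage reaches the critical length after five divisions, whereas the lineage with telomerase keeps a constant length, mirroring the qualitative behavior described above.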
Telomerase also appears to play a role in cancer. Tumor cells have the capacity to divide indefinitely, and many tumor cells express the telomerase enzyme. As will be discussed in Chapter 21, cancer is a complex, multistep process that usually requires mutations in at least several genes. Telomerase activation alone does not lead to cancerous growth in most cells, but it does appear to be required along with other mutations for cancer to develop.

Concepts
The ends of eukaryotic chromosomes are replicated by an RNA-protein enzyme called telomerase. This enzyme adds extra nucleotides to the G-rich DNA strand of the telomere.

www.whfreeman.com/pierce More on telomerase, including an animated cartoon that illustrates the process of replication by telomerase

Replication in Archaea
The process of replication in archaebacteria has a number of features in common with replication in eukaryotic cells; many of the proteins taking part are more similar to those in eukaryotic cells than to those in eubacteria. Although some archaea have a single origin of replication, as do eubacteria, this origin does not contain the typical sequences recognized by bacterial initiator proteins but instead has sequences that are similar to those found in eukaryotic origins. These similarities in replication between archaeal and eukaryotic cells reinforce the conclusion that the archaea are more closely related to eukaryotic cells than to the prokaryotic eubacteria.

The Molecular Basis of Recombination
Recombination is the exchange of genetic information between DNA molecules; when the exchange is between homologous DNA molecules, it is called homologous recombination. This process takes place in crossing over, in which homologous regions of chromosomes are exchanged (see Figure 2.17) and genes are shuffled into new combinations. Recombination is an extremely important genetic process because it increases genetic variation. Rates of recombination provide important information about linkage relations among genes, which is used to create genetic maps (see Figures 7.12 through 7.14). Recombination is also essential for some types of DNA repair (as will be discussed in Chapter 17).

Homologous recombination is a remarkable process: a nucleotide strand of one chromosome aligns precisely with a nucleotide strand of the homologous chromosome, breaks arise in corresponding regions of different DNA molecules, parts of the molecules precisely change place, and then the pieces are correctly joined. In this complicated series of events, no genetic information is lost or gained. Although the precise molecular mechanism of homologous recombination is still poorly known, the exchange is probably accomplished through the pairing of complementary bases. A single-stranded DNA molecule of one chromosome pairs with a single-stranded DNA molecule of another, forming heteroduplex DNA.

In meiosis, homologous recombination (crossing over) could theoretically take place before, during, or after DNA synthesis. Cytological, biochemical, and genetic evidence indicates that it takes place in prophase I of meiosis, whereas DNA replication takes place earlier, in interphase. Thus, crossing over must entail the breaking and rejoining of chromatids when homologous chromosomes are at the four-strand stage ( ◗ FIGURE 12.22). This section explores some theories about how the process of recombination takes place.

◗ 12.22 Genetic evidence suggests that crossing over takes place after DNA synthesis.
If crossing over took place before DNA synthesis, all products of meiosis would be recombinants (A b and a B). If crossing over took place after DNA synthesis, meiosis would produce both nonrecombinant (A B and a b) and recombinant (A b and a B) products.
Conclusion: Because crossing over results in recombinant and nonrecombinant products, it must take place after DNA synthesis.

The Holliday Model

◗ 12.23 In the Holliday model, homologous recombination is accomplished through a single-strand break in each DNA duplex, strand displacement, branch migration, and resolution of a single Holliday junction.
(a) 1 Two double-stranded DNA molecules from homologous chromosomes align.
(b) 2 Single-strand breaks occur in the same position on both DNA molecules.
(c) 3 The free end of each broken strand migrates to the other DNA molecule. 4 Each invading strand joins to the broken end of the other DNA molecule, creating a Holliday junction, and begins to displace the original complementary strand.

One model of homologous recombination, the Holliday model, states that the process is initiated by single-strand breaks in the DNA molecule. This model begins with double-stranded DNA molecules from two homologous chromosomes that carry identical (or nearly identical) nucleotide sequences. These two DNA molecules align precisely, and so their homologous sequences sit side by side ( ◗ FIGURE 12.23a). Single-strand breaks in the same position on both DNA molecules allow the free ends of the strands to move to the other DNA molecule ( ◗ FIGURE 12.23b and c). Each invading strand joins to the broken end of the other homologous DNA molecule and begins to displace the original complementary strand, taking its place by hydrogen bonding to the original strand.
The invasion and joining take place on both DNA molecules, creating two heteroduplex DNAs, each consisting of one original strand plus one new strand from the other DNA molecule. The point at which nucleotide strands pass from one DNA molecule to the other is the cross bridge. In the Holliday model of recombination, there is a single cross bridge. As the two nucleotide strands exchange positions, the cross bridge moves along the molecules in a process called branch migration ( ◗ FIGURE 12.23d). The exchange of nucleotide strands and branch migration create two duplex molecules connected by the cross bridge. This structure is termed the Holliday intermediate. Holliday intermediates in E. coli and yeast have been observed with electron microscopy. If the ends of the two interconnected duplexes illustrated in Figure 12.23d are pulled away from one another, we obtain the structure illustrated in ◗ FIGURE 12.23e. If you carefully compare parts d and e, you will see that the structures in each are the same; the only difference is that, in part e, the ends of the molecules have been pulled apart. The next step in the Holliday model is easier to visualize if we rotate the bottom half of the Holliday intermediate by 180 degrees, producing the structure shown in ◗ FIGURE 12.23f. These interconnected DNA duplexes are then separated by additional cleavage and reunion of the nucleotide strands. The duplexes can be cleaved in one of two ways, as shown by two different pathways in Figure 12.23. Cleavage may be in the horizontal plane ( ◗ FIGURE 12.23g), in which case the nucleotide strands are rejoined as shown in ◗ FIGURE 12.23h, and two DNA molecules are produced. Although both resulting DNA molecules contain a patch of heteroduplex DNA, the genes on either end of the molecules are identical with those originally present (gene A with B, and gene a with b). These DNA molecules are called patched recombinants ( ◗ FIGURE 12.23i). 
On the other hand, cleavage of the Holliday structure in the vertical plane and rejoining of the nucleotide strands ( ◗ FIGURE 12.23j) produces spliced recombinants ( ◗ FIGURE 12.23k). In these recombinants, both resulting DNA molecules are heteroduplex, and recombination has taken place between loci at the ends of the molecules; now gene A is paired with b, and gene a is paired with B. Recombination is equally likely to produce patched and spliced recombinants.

www.whfreeman.com/pierce An animated cartoon that will help you visualize the Holliday structure and its resolution

[Figure 12.23, panels d through g]
(d) 5 Branch migration takes place as the two nucleotide strands exchange positions, creating the two duplex molecules (heteroduplex DNA joined at a branch point).
(e) 6 We can view this structure with the ends of the two interconnected duplexes pulled away from one another (the Holliday intermediate).
(f) 7 Rotation of the bottom half of the structure...
(g) 8 ...produces this structure.

The Double-Strand-Break Model
In the Holliday model, recombination starts with single-strand breaks at the same positions in two homologous DNA molecules. The double-strand-break model, in contrast, begins with double-strand breaks in one of the two aligned DNA molecules ( ◗ FIGURE 12.24a). On both sides of the break, an enzyme nibbles away nucleotides, producing a gap in the DNA, with some single-stranded DNA on each side ( ◗ FIGURE 12.24b). A free 3′ end then invades the other unbroken DNA molecule and displaces the homologous strand ( ◗ FIGURE 12.24c). The 3′ end of the invading strand is elongated by DNA synthesis, which further displaces the original strand of the unbroken molecule ( ◗ FIGURE 12.24d). The displaced strand forms a loop that fills the gap in the broken DNA molecule ( ◗ FIGURE 12.24e) and serves as a template for the synthesis of a complementary DNA strand ( ◗ FIGURE 12.24f).
The result is that two heteroduplex DNA molecules are joined by two cross bridges ( ◗ FIGURE 12.24g), in contrast with the single cross bridge produced in the Holliday single-strand-break model. The interconnected molecules produced in the double-strand-break model can be separated by further cleavage and reunion of the nucleotide strands, in the same way that the Holliday intermediate is separated in the Holliday single-strand-break model (see Figure 12.23g through k). Patched or spliced recombinant products can be produced, depending on whether cleavage is in the vertical or the horizontal plane.

Evidence for the double-strand-break model originally came from results of genetic crosses in yeast that could not be explained by the Holliday model. Subsequent observations in yeast showed that double-strand breaks appear in meiosis during prophase I when crossing over occurs and that mutant strains that are unable to form double-strand breaks do not exhibit meiotic recombination. Although considerable evidence supports the double-strand-break model in yeast, the extent to which it applies to other organisms is not known.

[Figure 12.23, panels g through k]
(g) 9 Cleavage in the horizontal plane... (h) 10 ...and rejoining of the nucleotide strands,... (i) 11 ...produces non-crossover recombinants consisting of two heteroduplex molecules.
(g) 12 Cleavage in the vertical plane... (j) 13 ...and rejoining of the nucleotide strands... (k) 14 ...produces crossover recombinants consisting of two heteroduplex molecules.
Conclusion: The Holliday model predicts non-crossover or crossover recombinant DNA, depending on whether cleavage is in the horizontal or the vertical plane.

[Figure 12.24, panel a]
1 Two double-stranded DNA molecules from homologous chromosomes align. (a) 2 A double-strand break occurs in one of the molecules.
[Figure 12.24, panels b through g]
(b) 3 Nucleotides are enzymatically removed on one of the strands, producing some single-stranded DNA on each side.
(c) 4 A free 3′ end invades and displaces a strand of the unbroken DNA molecule.
(d) 5 The 3′ end then elongates, further displacing the original strand.
(e) 6 The displaced strand forms a loop that base pairs with the broken DNA molecule.
(f) 7 DNA synthesis is initiated at the 3′ end of the bottom strand, the displaced loop being used as a template.
(g) 8 Strand attachment produces two Holliday junctions (cross bridges), which can each be separated by cleavage and reunion.
◗ 12.24 In the double-strand-break model, recombination is accomplished through a double-strand break in one DNA duplex, strand displacement, DNA synthesis, and resolution of two Holliday junctions.

Concepts
Homologous recombination requires the formation of heteroduplex DNA consisting of one nucleotide strand from each of two different chromosomes. In the Holliday model, homologous recombination is accomplished through a single-strand break in the DNA, strand displacement, and branch migration. In the double-strand-break model, recombination is accomplished through double-strand breaks, strand displacement, and branch migration.

Enzymes Required for Recombination
Recombination between DNA molecules requires the unwinding of DNA helices, the cleavage of nucleotide strands, strand invasion, and branch migration, followed by further strand cleavage and union to remove cross bridges. Much of what we know about these processes arises from studies of gene exchange in E. coli. Although bacteria do not undergo meiosis, they do have a type of sexual reproduction (conjugation), in which one bacterium donates its chromosome to another (discussed more fully in Chapter 8). Subsequent to conjugation, the recipient bacterium has two chromosomes, which may undergo homologous recombination.
Geneticists have isolated mutant strains of E. coli that are deficient in recombination; the study of these strains has resulted in the identification of genes and proteins that play a role in bacterial recombination, revealing several different pathways by which it can take place. Three genes that play a pivotal role in E. coli recombination are recB, recC, and recD, which encode three polypeptides that together form the RecBCD protein. This protein unwinds double-stranded DNA and is capable of cleaving nucleotide strands. The recA gene encodes the RecA protein, which allows a single strand to invade a DNA helix and displace one of the original strands. This invasion and displacement are necessary for both the single-strand- and the double-strand-break models of homologous recombination. The ruvA and ruvB genes encode proteins that catalyze branch migration, and the ruvC gene produces a protein, called resolvase, that cleaves Holliday structures. Single-strand-binding proteins, DNA ligase, DNA polymerases, and DNA gyrase also play roles in various types of recombination, in addition to their functions in DNA replication.

Concepts
A number of proteins have roles in recombination, including RecA, RecBCD, RuvA, RuvB, resolvase, single-strand-binding proteins, ligase, DNA polymerases, and gyrase.

Connecting Concepts Across Chapters
This chapter has built on a central concept introduced in Chapter 2, that cell division is preceded by replication of the genetic material. In Chapter 2, we saw that DNA replication takes place in the S phase of the cell cycle and that several checkpoints ensure that division does not take place in the absence of DNA replication. The current chapter examined the process of DNA synthesis. DNA is sometimes said to be a self-replicating molecule, but nothing could be farther from the truth.
Replication requires much more than a DNA template; a large number of proteins and enzymes also are necessary. Despite this complexity, a few rules summarize the process: (1) all replication is semiconservative, (2) new DNA molecules always elongate at the 3′ end (replication is 5′→3′), (3) replication begins at sequences called origins and requires RNA primers for initiation, (4) DNA synthesis takes place continuously on one strand and discontinuously on the other, and (5) newly synthesized nucleotide strands are antiparallel and complementary to their template strands.

As we have seen, replication takes place with a high degree of accuracy; this accuracy is essential to maintain the integrity of genetic information as DNA molecules are copied again and again. The accuracy of replication is maintained by several different mechanisms, including precision in nucleotide selection, the ability of DNA polymerases to proofread and correct mistakes, and the detection and repair of residual mismatches after replication (mismatch repair).

An understanding of DNA replication provides a foundation for several topics that will be introduced in later chapters of this book. Chapter 18 (on recombinant DNA technology) examines the polymerase chain reaction and other techniques (DNA sequence analysis and cloning) that require an understanding of DNA synthesis. In Chapter 17 (on gene mutation and DNA repair), we learn that, in spite of the accuracy of DNA synthesis, errors do arise and sometimes lead to mutations. These errors are addressed by mechanisms of DNA repair, many of which require DNA synthesis. The movement of transposable genetic elements (Chapter 11) also requires DNA synthesis.

CONCEPTS SUMMARY
• Replication is semiconservative: DNA's two nucleotide strands separate and each serves as a template on which a new strand is synthesized.
• A replicon is a unit of replication that contains an origin of replication.
• In theta replication of DNA, the two nucleotide strands of a circular DNA molecule unwind, creating a replication bubble; within each replication bubble, DNA is normally synthesized on both strands and at both replication forks, producing two circular DNA molecules.
• Rolling-circle replication is initiated by a nick in one strand of circular DNA, which produces a 3′-OH group to which new nucleotides are added while the 5′ end of the broken strand is displaced from the circle. Replication proceeds around the circle, producing a circular DNA molecule and a single-stranded linear molecule.
• Linear eukaryotic DNA contains many origins of replication. At each origin, the DNA unwinds, producing two nucleotide strands that serve as templates. Unwinding and replication take place on both templates at both ends of the replication bubble until adjacent replicons meet, resulting in two linear DNA molecules.
• DNA synthesis requires a single-stranded DNA template, deoxyribonucleoside triphosphates, and a group of enzymes and proteins that carry out replication.
• All DNA synthesis is in the 5′→3′ direction. Because the two nucleotide strands of DNA are antiparallel, replication takes place continuously on one strand (the leading strand) and discontinuously on the other (the lagging strand).
• Replication begins when an initiator protein binds to a replication origin and unwinds a short stretch of DNA, to which DNA helicase attaches. DNA helicase unwinds the DNA at the replication fork, single-strand-binding proteins bind to single nucleotide strands to prevent them from reannealing, and DNA gyrase (a topoisomerase) removes the strain ahead of the replication fork that is generated by unwinding.
• During replication, primase synthesizes short primers of RNA nucleotides, providing a 3′-OH group to which DNA polymerase can add DNA nucleotides.
• DNA polymerase adds new nucleotides to the 3′ end of a growing polynucleotide strand.
Bacteria have two DNA polymerases that have primary roles in replication: DNA polymerase III, which synthesizes new DNA on the leading and lagging strands; and DNA polymerase I, which removes and replaces primers.
• DNA ligase seals nicks that remain in the sugar-phosphate backbones when the RNA primers are replaced by DNA nucleotides.
• Several mechanisms ensure the high rate of accuracy in replication, including precise nucleotide selection, proofreading, and mismatch repair.
• Eukaryotic replication is similar to bacterial replication, although eukaryotes have multiple origins of replication and different DNA polymerases.
• Precise replication at multiple origins is ensured by a licensing factor that must attach to an origin before replication can begin. The licensing factor is removed after replication is initiated and renewed after cell division.
• Eukaryotic nucleosomes are quickly assembled on new molecules of DNA; newly assembled nucleosomes consist of a random mixture of old and new histone proteins.
• The ends of linear eukaryotic DNA molecules are replicated by the enzyme telomerase.
• Replication in archaea has a number of features in common with eukaryotic replication.
• Homologous recombination takes place through the exchange of genetic material between homologous DNA molecules. In the Holliday model, homologous recombination begins with single-strand breaks in both DNA molecules, followed by strand displacement, branch migration, and Holliday junction resolution. In the double-strand-break model, it begins with a double-strand break, followed by strand displacement, DNA synthesis, and resolution of two Holliday junctions.
• Homologous recombination in E. coli requires a number of enzymes, including RecA, RecBCD, resolvase, single-strand-binding proteins, ligase, DNA polymerases, and gyrase.

IMPORTANT TERMS
semiconservative replication (p. 000) equilibrium density gradient centrifugation (p. 000) replicon (p.
000) replication origin (p. 000) theta replication (p. 000) replication bubble (p. 000) replication fork (p. 000) bidirectional replication (p. 000) rolling-circle replication (p. 000) DNA polymerase (p. 000) continuous replication (p. 000) leading strand (p. 000) discontinuous replication (p. 000) lagging strand (p. 000) Okazaki fragments (p. 000) initiator protein (p. 000) DNA helicase (p. 000) single-strand-binding protein (SSB) (p. 000) DNA gyrase (p. 000) primase (p. 000) primer (p. 000) DNA polymerase III (p. 000) DNA polymerase I (p. 000) DNA ligase (p. 000) proofreading (p. 000) mismatch repair (p. 000) autonomously replicating sequence (p. 000) replication licensing factor (p. 000) DNA polymerase α (p. 000) DNA polymerase δ (p. 000) DNA polymerase β (p. 000) DNA polymerase γ (p. 000) DNA polymerase ε (p. 000) telomerase (p. 000) homologous recombination (p. 000) heteroduplex DNA (p. 000) Holliday junction (p. 000) branch migration (p. 000) Holliday intermediate (p. 000) double-strand-break model (p. 000)

Worked Problems
1. The following diagram represents the template strands of a replication bubble in a DNA molecule. Draw in the newly synthesized strands and label the leading and lagging strands.

[Diagram: the two template strands of a replication bubble, with their 5′ and 3′ ends labeled, the origin at the center, and unwinding proceeding outward at both forks.]

• Solution
To determine the leading and lagging strands, first note which end of each template strand is 5′ and which end is 3′. With a pencil, draw in the strands being synthesized on these templates, and label their 5′ and 3′ ends, recalling that the newly synthesized strands must be antiparallel to the templates. Next, determine the direction of replication for each new strand, which must be 5′→3′. You might draw arrows on the new strands to indicate the direction of replication.
After you have established the direction of replication for each strand, look at each fork and determine whether the direction of replication for a strand is the same as the direction of unwinding. The strand on which replication proceeds in the same direction as unwinding is the leading strand. The strand on which replication proceeds in the direction opposite that of unwinding is the lagging strand. Make sure that you have one leading strand and one lagging strand for each fork.

[Diagram: the completed replication bubble, with the 5′ and 3′ ends of the new strands marked and one leading and one lagging strand labeled at each fork.]

2. Consider the experiment conducted by Meselson and Stahl in which they used 14N and 15N in cultures of E. coli and equilibrium density gradient centrifugation. Draw pictures to represent the bands produced by bacterial DNA in the density-gradient tube before the switch to medium containing 14N and after one, two, and three rounds of replication after the switch to the medium containing 14N. Use a separate set of drawings to show the bands that would appear if replication were (a) semiconservative; (b) conservative; (c) dispersive.

• Solution
DNA labeled with 15N will be denser than DNA labeled with 14N; therefore 15N-labeled DNA will sink lower in the density-gradient tube. Before the switch to medium containing 14N, all DNA in the bacteria will contain 15N and will produce a single band in the lower end of the tube.

(a) With semiconservative replication, the two strands separate, and each serves as a template on which a new strand is synthesized. After one round of replication, the original template strand of each molecule will contain 15N and the new strand of each molecule will contain 14N; so a single band will appear in the density gradient halfway between the positions expected of DNA with 15N and of DNA with 14N. In the next round of replication, the two strands again separate and serve as templates for new strands. Each of the new strands contains only 14N; thus some DNA molecules will contain one strand with the original 15N and one strand with new 14N, whereas the other molecules will contain two strands with 14N. This labeling will produce two bands, one at the intermediate position and one at a higher position in the tube. Additional rounds of replication should produce increasing amounts of DNA that contains only 14N; so the higher band will get darker.

[Diagram: density-gradient tubes before the switch to 14N and after one, two, and three rounds of replication.]

(b) With conservative replication, the entire molecule serves as a template. After one round of replication, some molecules will consist entirely of 15N, and others will consist entirely of 14N; so two bands should be present. Subsequent rounds of replication will increase the fraction of DNA consisting entirely of new 14N; thus the upper band will get darker. However, the original DNA with 15N will remain, so two bands will still be present.

[Diagram: density-gradient tubes before the switch to 14N and after one, two, and three rounds of replication.]

(c) In dispersive replication, both nucleotide strands break down into fragments that serve as templates for the synthesis of new DNA. The fragments then reassemble into DNA molecules. After one round of replication, all DNA should contain approximately half 15N and half 14N, producing a single band that is halfway between the positions expected of DNA labeled with 15N and of DNA labeled with 14N. With further rounds of replication, the proportion of 14N in each molecule increases; so a single hybrid band remains, but its position in the density gradient will move upward. The band is also expected to get darker as the total amount of DNA increases.

[Diagram: density-gradient tubes before the switch to 14N and after one, two, and three rounds of replication.]

3. The E. coli chromosome contains 4.7 million base pairs of DNA. If synthesis at each replication fork occurs at a rate of 1000 nucleotides per second, how long will it take to completely replicate the E. coli chromosome with theta replication?

• Solution
Bacterial chromosomes contain a single origin of replication, and theta replication usually employs two replication forks, which proceed around the chromosome in opposite directions. Thus, the overall rate of replication for the whole chromosome is 2000 nucleotides per second. With a total of 4.7 million base pairs of DNA, the entire chromosome will be replicated in:

$$4{,}700{,}000\ \text{bp} \times \frac{1\ \text{second}}{2000\ \text{bp}} = 2350\ \text{seconds}$$

$$2350\ \text{seconds} \times \frac{1\ \text{minute}}{60\ \text{seconds}} = 39.17\ \text{minutes}$$

At the beginning of this chapter it was stated that E. coli is capable of dividing every 20 minutes. How is this possible if it takes almost twice as long to replicate its genome? The answer is that a second round of replication begins before the first round has finished. Thus, when an E. coli cell divides, the chromosomes that are passed on to the daughter cells are already partially replicated. This is in contrast to eukaryotic cells, which replicate their entire genome once, and only once, during each cell cycle.

COMPREHENSION QUESTIONS
1. What is semiconservative replication?
*2. How did Meselson and Stahl demonstrate that replication in E. coli takes place in a semiconservative manner?
*3. Draw a molecule of DNA undergoing theta replication. On your drawing, identify (1) origin, (2) polarity (5′ and 3′ ends) of all template strands and newly synthesized strands, (3) leading and lagging strands, (4) Okazaki fragments, and (5) location of primers.
4. Draw a molecule of DNA undergoing rolling-circle replication.
On your drawing, identify (1) origin, (2) polarity (5′ and 3′ ends) of all template and newly synthesized strands, (3) leading and lagging strands, (4) Okazaki fragments, and (5) location of primers.
5. Draw a molecule of DNA undergoing eukaryotic linear replication. On your drawing, identify (1) origin, (2) polarity (5′ and 3′ ends) of all template and newly synthesized strands, (3) leading and lagging strands, (4) Okazaki fragments, and (5) location of primers.
6. What are three major requirements of replication?
*7. What substrates are used in the DNA synthesis reaction?
8. List the different proteins and enzymes taking part in bacterial replication. Give the function of each in the replication process.
9. What similarities and differences exist in the enzymatic activities of DNA polymerases I, II, and III? What is the function of each type of DNA polymerase in bacterial cells?
*10. Why is primase required for replication?
11. What three mechanisms ensure the accuracy of replication in bacteria?
12. How does replication licensing ensure that DNA is replicated only once at each origin per cell cycle?
*13. In what ways is eukaryotic replication similar to bacterial replication, and in what ways is it different?
14. Outline in words and pictures how telomeres at the end of eukaryotic chromosomes are replicated.
15. Briefly outline with diagrams the Holliday model of homologous recombination.
*16. What are some of the enzymes taking part in recombination in E. coli and what roles do they play?

APPLICATION QUESTIONS AND PROBLEMS
*17. Suppose a future scientist explores a distant planet and discovers a novel form of double-stranded nucleic acid. When this nucleic acid is exposed to DNA polymerases from E. coli, replication takes place continuously on both strands. What conclusion can you make about the structure of this novel nucleic acid?
*18. Phosphorus is required to synthesize the deoxyribonucleoside triphosphates used in DNA replication. A geneticist grows some E. coli in a medium containing nonradioactive phosphorus for many generations. A sample of the bacteria is then transferred to a medium that contains a radioactive isotope of phosphorus (32P). Samples of the bacteria are removed immediately after the transfer and after one and two rounds of replication. What will be the distribution of radioactivity in the DNA of the bacteria in each sample? Will radioactivity be detected in neither, one, or both strands of the DNA?
19. A line of mouse cells is grown for many generations in a medium with 15N. Cells in G1 are then switched to a new medium that contains 14N. Draw a pair of homologous chromosomes from these cells at the following stages, showing the two strands of DNA molecules found in the chromosomes. Use different colors to represent strands with 14N and 15N.
(a) Cells in G1, before switching to medium with 14N
(b) Cells in G2, after switching to medium with 14N
(c) Cells in anaphase of mitosis, after switching to medium with 14N
(d) Cells in metaphase I of meiosis, after switching to medium with 14N
(e) Cells in anaphase II of meiosis, after switching to medium with 14N
*20. A circular molecule of DNA contains 1 million base pairs. If DNA synthesis at a replication fork occurs at a rate of 100,000 nucleotides per minute, how long will theta replication require to completely replicate the molecule, assuming that theta replication is bidirectional? How long will replication of this circular chromosome take by rolling-circle replication? Ignore replication of the displaced strand in rolling-circle replication.
21. A bacterium synthesizes DNA at each replication fork at a rate of 1000 nucleotides per second. If this bacterium completely replicates its circular chromosome by theta replication in 30 minutes, how many base pairs of DNA will its chromosome contain?
*22. The following diagram represents a DNA molecule that is undergoing replication.
Draw in the strands of newly synthesized DNA and identify the following:
(a) Polarity of newly synthesized strands
(b) Leading and lagging strands
(c) Okazaki fragments
(d) RNA primers

[Diagram: replication bubble with the origin marked, the 5′ and 3′ ends of the template strands labeled, and the direction of unwinding indicated at each fork.]

*23. What would be the effect on DNA replication of mutations that destroyed each of the following activities in DNA polymerase I?
(a) 3′ → 5′ exonuclease activity
(b) 5′ → 3′ exonuclease activity
(c) 5′ → 3′ polymerase activity

CHALLENGE QUESTIONS
24. Conditional mutations express their mutant phenotype only under certain conditions (the restrictive conditions) and express the normal phenotype under other conditions (the permissive conditions). One type of conditional mutation is a temperature-sensitive mutation, which expresses the mutant phenotype only at certain temperatures. Strains of E. coli have been isolated that contain temperature-sensitive mutations in the genes encoding different components of the replication machinery. In each of these strains, the protein produced by the mutated gene is nonfunctional under the restrictive conditions. These strains are grown under permissive conditions and then abruptly switched to the restrictive condition. After one round of replication under the restrictive condition, the DNA from each strain is isolated and analyzed. What would you predict to see in the DNA isolated from each strain in the following list?
Temperature-sensitive mutation in gene encoding:
(a) DNA ligase
(b) DNA polymerase I
(c) DNA polymerase III
(d) Primase
(e) Initiator protein

13 Transcription
• RNA in the Primeval World
• RNA Molecules
    The Structure of RNA
    Classes of RNA
• Transcription: Synthesizing RNA from a DNA Template
    The Template
    The Substrate for Transcription
    The Transcription Apparatus
• The Process of Bacterial Transcription
    Initiation
    Elongation
    Termination
• The Process of Eukaryotic Transcription
    Transcription and Nucleosome Structure
    Transcription Initiation
    RNA Polymerase II Promoters
    RNA Polymerase I Promoters
    RNA Polymerase III Promoters
    Evolutionary Relationships and the TATA-Binding Protein
    Termination

Molecular image of the hammerhead ribozyme (in blue) bound to RNA (in orange). Ribozymes are catalytic RNA molecules that may have been the first carriers of genetic information. (K. Eward/Biografx/Photo Researchers.)

RNA in the Primeval World
Life requires two basic functions. First, living organisms must be able to store and faithfully transmit genetic information during reproduction.
Second, they must have the ability to catalyze chemical transformations, to fire the reactions that drive life processes. It was long believed that the functions of information storage and chemical transformation are handled by two entirely different types of molecules. Genetic information is stored in nucleic acids. Catalysis of chemical transformations was held to be the exclusive domain of certain proteins that serve as biological catalysts or enzymes, making reactions take place rapidly within the cell.

This biochemical dichotomy (nucleic acids for information, proteins for catalysis) revealed a dilemma in our understanding of the early stages in the evolution of life. Which came first: proteins or nucleic acids? If nucleic acids carry the coding instructions for proteins, how could proteins be generated without them? Because nucleic acids are unable to copy themselves, how could they be generated without proteins? If DNA and proteins each require the other, how could life begin?

This apparent paradox disappeared in 1981 when Thomas Cech and his colleagues discovered that RNA can serve as a biological catalyst. They found that RNA from the protozoan Tetrahymena thermophila can excise 400 nucleotides from its RNA in the absence of any protein. Other examples of catalytic RNAs have now been discovered in different types of cells. Called ribozymes, these RNA molecules can cut out parts of their own sequences, connect some RNA molecules together, replicate others, and even catalyze the formation of peptide bonds between amino acids.

The discovery of ribozymes complements other evidence suggesting that the original genetic material was RNA. Ribozymes that were self-replicating probably first arose between 3.5 billion and 4 billion years ago and may have begun the evolution of life on Earth.
Early life was an RNA world, with RNA molecules serving both as carriers of genetic information and as catalysts that drove the chemical reactions needed to sustain and perpetuate life. These catalytic RNAs may have acquired the ability to synthesize protein-based enzymes, which are more efficient catalysts; with enzymes taking over more and more of the catalytic functions, RNA probably became relegated to the role of information storage and transfer. DNA, with its chemical stability and faithful replication, eventually replaced RNA as the primary carrier of genetic information. In modern cells, RNA still plays a vital role in both DNA replication and protein synthesis.

Transcription is the synthesis of RNA molecules, with DNA as a template, and it is the first step in the transfer of genetic information from genotype to phenotype. The process is complex and requires a number of protein components. As we examine the stages of transcription, try to keep all the detail in perspective; focus on understanding how the details relate to the overall purpose of transcription: the selective synthesis of an RNA molecule.

This chapter begins with a brief review of RNA structure and a discussion of the different classes of RNA. We then consider the major components required for transcription. Finally, we explore the process of transcription in eubacteria and eukaryotic cells. At several points in the text, we'll pause to absorb some general principles that emerge.

www.whfreeman.com/pierce Current research on ribozymes

RNA Molecules
Before we begin our study of transcription, let's review the structure of RNA and consider the different types of RNA molecules.

The Structure of RNA
RNA, like DNA, is a polymer consisting of nucleotides joined together by phosphodiester bonds (see Chapter 10 for a discussion of RNA structure). However, there are several important differences in the structures of DNA and RNA.
Whereas DNA nucleotides contain deoxyribose sugars, RNA nucleotides have ribose sugars (◗ FIGURE 13.1a). With a free hydroxyl group on the 2′-carbon atom of the ribose sugar, RNA is degraded rapidly under alkaline conditions. The deoxyribose sugar of DNA lacks this free hydroxyl group; so DNA is a more stable molecule. Another important difference is that thymine, one of the two pyrimidines found in DNA, is replaced by uracil in RNA.

A final difference in the structures of DNA and RNA is that RNA is usually single stranded, consisting of a single polynucleotide strand (◗ FIGURE 13.1b), whereas DNA normally consists of two polynucleotide strands joined by hydrogen bonding between complementary bases. Some viruses contain double-stranded RNA genomes, as discussed in Chapter 8. Although RNA is usually single stranded, short complementary regions within a nucleotide strand can pair and form secondary structures (see Figure 13.1b). These RNA secondary structures are often called hairpin loops or stem-loop structures. When two regions within a single RNA molecule pair up, the strands in those regions must be antiparallel, with pairing between cytosine and guanine and between adenine and uracil (although occasionally guanine pairs with uracil).

The formation of secondary structures plays an important role in RNA function. Secondary structure is determined by the base sequence of the nucleotide strand; so different RNA molecules can assume different structures. Because their structure determines their function, RNA molecules have the potential for tremendous variation in function. With its two complementary strands forming a helix, DNA is much more restricted in the range of secondary structures that it can assume, and it serves fewer functional roles in the cell. Similarities and differences in DNA and RNA structures are summarized in Table 13.1.
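The pairing rules just described (antiparallel regions, C with G, A with U, and the occasional G with U) can be made concrete with a short sketch. The following Python is only an illustration under stated assumptions: the function names `forms_stem` and `find_hairpin`, the stem length of four bases, the minimum loop of three bases, and the example sequence are all hypothetical choices made for this example and do not come from the text. Real secondary-structure prediction also weighs the stability of alternative foldings, which this sketch ignores.

```python
# Allowed RNA base pairs: C-G and A-U, plus the occasional G-U pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def forms_stem(upstream: str, downstream: str) -> bool:
    """Return True if two regions of one RNA strand can pair with each other.

    Both regions are written 5'->3'. Because paired regions must be
    antiparallel, the first base of the upstream region pairs with the
    last base of the downstream region, and so on.
    """
    if len(upstream) != len(downstream):
        return False
    return all((a, b) in PAIRS for a, b in zip(upstream, reversed(downstream)))


def find_hairpin(rna: str, stem_len: int = 4, min_loop: int = 3):
    """Return (upstream_start, downstream_start) of the first stem found, or None.

    stem_len and min_loop are illustrative assumptions, not values from the text.
    """
    n = len(rna)
    for i in range(n - 2 * stem_len - min_loop + 1):
        for j in range(i + stem_len + min_loop, n - stem_len + 1):
            if forms_stem(rna[i:i + stem_len], rna[j:j + stem_len]):
                return i, j
    return None


if __name__ == "__main__":
    # Hypothetical sequence: GGAC and GUCC can pair, leaving an AAAA loop.
    print(find_hairpin("GGACAAAAGUCC"))
```

In the hypothetical sequence, the first four bases (GGAC) pair with the last four (GUCC) read in the opposite direction, which is the stem of a hairpin; the unpaired AAAA between them forms the loop.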
Table 13.1 The structures of DNA and RNA compared

| Characteristic | DNA | RNA |
|---|---|---|
| Composed of nucleotides | Yes | Yes |
| Type of sugar | Deoxyribose | Ribose |
| Presence of 2′-OH group | No | Yes |
| Bases | A, G, C, T | A, G, C, U |
| Nucleotides joined by phosphodiester bonds | Yes | Yes |
| Double or single stranded | Usually double | Usually single |
| Secondary structure | Double helix | Many types |
| Stability | Quite stable | Easily degraded |

◗ 13.1 RNA has a primary and a secondary structure. [Figure annotations: (a) RNA has a hydroxyl group on the 2′-carbon atom of its sugar component, whereas DNA has a hydrogen atom. RNA is more reactive than DNA. RNA contains uracil in place of thymine. (b) An RNA molecule folds to form secondary structures owing to hydrogen bonding between complementary bases on the same strand.]

Classes of RNA
RNA molecules perform a variety of functions in the cell. Ribosomal RNA (rRNA), along with ribosomal protein subunits, makes up the ribosome, the site of protein assembly. We'll take a more detailed look at the ribosome in Chapter 14. Messenger RNA (mRNA) carries the coding instructions for polypeptide chains from DNA to the ribosome. After attaching to a ribosome, an mRNA molecule specifies the sequence of the amino acids in a polypeptide chain and provides a template for joining amino acids. Large precursor molecules, which are termed pre-messenger RNAs (pre-mRNAs), are the immediate products of transcription in eukaryotic cells.
Pre-mRNAs are modified extensively before they exit the nucleus for translation into protein. Bacterial cells do not possess pre-mRNA; in these cells, transcription takes place concurrently with translation.

Transfer RNA (tRNA) serves as the link between the coding sequence of nucleotides in the mRNA and the amino acid sequence of a polypeptide chain. Each tRNA attaches to one particular type of amino acid and helps to incorporate that amino acid into a polypeptide chain (discussed in Chapter 15).

Additional classes of RNA molecules are found in the nuclei of eukaryotic cells. Small nuclear RNAs (snRNAs) combine with small nuclear protein subunits to form small nuclear ribonucleoproteins (snRNPs, affectionately known as "snurps"). The snRNPs are analogous to ribosomes in structure, only smaller, and they typically contain a single RNA molecule combined with approximately 10 small nuclear protein subunits. Some snRNAs participate in the processing of RNA, converting pre-mRNA into mRNA. Small nucleolar RNAs (snoRNAs) take part in the processing of rRNA. Small RNA molecules also are found in the cytoplasm of eukaryotic cells; these molecules are called small cytoplasmic RNAs (scRNAs). The different classes of RNA molecules are summarized in Table 13.2.

Table 13.2 Locations and functions of different classes of RNA molecules

| Class of RNA | Cell Type | Location of Function* in Eukaryotic Cells | Function |
|---|---|---|---|
| Ribosomal RNA (rRNA) | Bacterial and eukaryotic | Cytoplasm | Structural and functional components of the ribosome |
| Messenger RNA (mRNA) | Bacterial and eukaryotic | Nucleus and cytoplasm | Carries genetic code for proteins |
| Transfer RNA (tRNA) | Bacterial and eukaryotic | Cytoplasm | Helps incorporate amino acids into polypeptide chain |
| Small nuclear RNA (snRNA) | Eukaryotic | Nucleus | Processing of pre-mRNA |
| Small nucleolar RNA (snoRNA) | Eukaryotic | Nucleus | Processing and assembly of rRNA |
| Small cytoplasmic RNA (scRNA) | Eukaryotic | Cytoplasm | Variable |

*All eukaryotic RNAs are transcribed in the nucleus.

Concepts
RNA differs from DNA in that it possesses a hydroxyl group on the 2′-carbon atom of its sugar, contains uracil instead of thymine, and is normally single stranded. Several classes of RNA exist within bacterial and eukaryotic cells.

Transcription: Synthesizing RNA from a DNA Template
All cellular RNAs are synthesized from a DNA template through the process of transcription (◗ FIGURE 13.2). Transcription is in many ways similar to the process of replication, but one fundamental difference relates to the length of the template used. During replication, all the nucleotides in the DNA template are copied, but, during transcription, only small parts of the DNA molecule (usually a single gene or, at most, a few genes) are transcribed into RNA. Because not all gene products are needed at the same time or in the same cell, it would be highly inefficient for a cell to constantly transcribe all of its genes. Furthermore, much of the DNA does not code for a functional product, and transcription of such sequences would be pointless. Transcription is, in fact, a highly selective process: individual genes are transcribed only as their products are needed. But this selectivity imposes a fundamental problem on the cell: how to recognize individual genes and transcribe them at the proper time and place.

◗ 13.2 All cellular types of RNA are transcribed from DNA. [Figure annotations: 1 Some RNAs are transcribed in both prokaryotic and eukaryotic cells: messenger RNA (mRNA), ribosomal RNA (rRNA), and transfer RNA (tRNA). 2 Some are produced only in eukaryotes: pre-messenger RNA (pre-mRNA), small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), and small cytoplasmic RNA (scRNA). 3 Some viruses copy RNA directly from RNA.]

Like replication, transcription requires three major components: 1. a DNA template; 2.
the raw materials (substrates) needed to build a new RNA molecule; and 3. the transcription apparatus, consisting of the proteins necessary to catalyze the synthesis of RNA.

The Template
In 1970, Oscar Miller, Jr., Barbara Hamkalo, and Charles Thomas used electron microscopy to examine cellular contents and demonstrate that RNA is transcribed from a DNA template. The results of this study revealed within the cell the presence of Christmas-tree-like structures: thin central fibers (the trunk of the tree), to which were attached strings (the branches) with granules (◗ FIGURE 13.3). The addition of deoxyribonuclease (an enzyme that degrades DNA) caused the central fibers to disappear, indicating that the "tree trunks" were DNA molecules. Ribonuclease (an enzyme that degrades RNA) removed the granular strings, indicating that the branches were RNA. Their conclusion was that each Christmas tree represented a gene undergoing transcription. The transcription of each gene begins at the top of the tree; there, little of the DNA has been transcribed and the RNA branches are short. As the transcription apparatus moves down the tree, transcribing more of the template, the RNA molecules lengthen, producing the long branches at the bottom.

◗ 13.3 Under the electron microscope, DNA molecules undergoing transcription exhibit Christmas-tree-like structures. The trunk of each "Christmas tree" (a transcription unit) represents a DNA molecule; the tree branches (granular strings attached to the DNA) are RNA molecules that have been transcribed from the DNA. As the transcription apparatus moves down the DNA, transcribing more of the template, the RNA molecules become longer and longer. (O. L. Miller, B. R. Beatty, D. W. Fawcett/Visuals Unlimited.)

The transcribed strand
The template for RNA synthesis, as for DNA synthesis, is a single strand of the DNA double helix.
Unlike replication, however, transcription typically takes place on only one of the two nucleotide strands of DNA (◗ FIGURE 13.4). The nucleotide strand used for transcription is termed the template strand. The other strand, called the nontemplate strand, is not ordinarily transcribed. Thus, in any one section of DNA, only one of the nucleotide strands normally carries the genetic information that is transcribed into RNA (there are some exceptions to this rule).

◗ 13.4 RNA molecules are synthesized that are complementary and antiparallel to one of the two nucleotide strands of DNA, the template strand. [Figure annotations: 1 RNA is synthesized complementary and antiparallel to the template strand. 2 New nucleotides are added to the 3′-OH group of the growing RNA, so transcription proceeds in a 5′ → 3′ direction. 3 The nontemplate strand is not usually transcribed. Sequences shown: nontemplate strand TACGGATACG; template strand ATGCCTATGC; growing RNA 5′ UACGGAUA 3′.]

Evidence that only one DNA strand serves as a template came from several experiments carried out by Julius Marmur and his colleagues in 1963 on the DNA of bacteriophage SP8, which infects the bacterium Bacillus subtilis. This phage carries its genetic information in the form of a double-stranded DNA molecule. The two strands have different base compositions and therefore different densities, which permits the separation of the strands by equilibrium density gradient centrifugation (see Figure 12.2) into "heavy" and "light" DNA strands. Marmur and his colleagues placed some B. subtilis in a medium that contained a radioactively labeled precursor of RNA (◗ FIGURE 13.5). They infected the bacteria with SP8, and the phage injected their DNA into the bacterial cells.
Transcription of the phage DNA within the cells incorporated the radioactive precursor into the newly synthesized RNA, producing radioactively labeled RNA complementary to the phage DNA (step 2), which was then isolated from the cells (step 3). The DNA of another culture of SP8 phage was isolated (step 4) and the heavy and light strands of the DNA were separated (step 5). When the radioactively labeled RNA (obtained in steps 1 through 3 of Figure 13.5) was combined with the heavy strand (step 6), the RNA hybridized to it, indicating that the RNA and DNA were complementary (step 7). However, when radioactively labeled RNA was added to the light strand (step 8), no hybridization took place. These findings led Marmur and his colleagues to conclude that RNA is transcribed from only one of the DNA strands in SP8: in this case, the heavy strand.
SP8 is unusual in that all of its genes are transcribed from the same strand. In most organisms, each gene is transcribed from a single strand, but different genes may be transcribed from different strands ( ◗ FIGURE 13.6). Notice that one of the strands in Figure 13.6 is identified as plus (+) and the other as minus (−). The plus strand is the template for genes a and c, and the minus strand is the template for gene b.
During transcription, an RNA molecule is synthesized that is complementary and antiparallel to the DNA template strand (see Figure 13.4). The RNA transcript has the same polarity and base sequence as does the nontemplate strand, with the exception that U in RNA substitutes for T in DNA.
Experiment
Question: Do both strands of DNA serve as templates for RNA synthesis?
1 Bacillus subtilis was placed in medium containing radioactively labeled substrate for RNA and was infected with SP8 phage.
2 Labeled substrate was incorporated into RNA in its transcription from phage DNA.
3 RNA was then isolated from bacterial cells.
4 DNA of a different culture of SP8 was isolated,…
5 …and the light and heavy strands were separated by equilibrium density gradient centrifugation.
6 Radioactively labeled RNA complementary to SP8 DNA was added to the separated heavy and light phage DNA strands.
7 The RNA hybridized to the heavy strand…
8 …but not to the light strand.
Conclusion: The fact that RNA hybridized to only one of the DNA strands indicates that it was transcribed only from that strand.
◗ 13.5 Marmur and colleagues showed that only one DNA strand serves as template during transcription.
◗ 13.6 RNA is transcribed from one DNA strand. In most organisms, each gene is transcribed from a single DNA strand, but different genes may be transcribed from one or the other of the two DNA strands. In the example shown, genes a and c are transcribed from the (+) strand, and gene b is transcribed from the (−) strand.
Concepts
Within a single gene, only one of the two DNA strands, the template strand, is generally transcribed into RNA.
The transcription unit
A transcription unit is a stretch of DNA that codes for an RNA molecule and the sequences necessary for its transcription. In eukaryotes, as discussed in Chapter 14, alternative RNA molecules can be produced from each transcription unit. How does the complex of enzymes and proteins that performs transcription (the transcription apparatus) recognize a transcription unit? How does it know which DNA strand to read, and where to start and stop? This information is encoded by the DNA sequence.
Included within a transcription unit are three critical regions: a promoter, an RNA coding sequence, and a terminator ( ◗ FIGURE 13.7). The promoter is a DNA sequence that the transcription apparatus recognizes and binds.
It indicates which of the two DNA strands is to be read as the template and the direction of transcription. The promoter also determines the transcription start site, the first nucleotide that will be transcribed into RNA. In most transcription units, the promoter is located next to the transcription start site but is not, itself, transcribed.
The second critical region of the transcription unit is the RNA-coding region, a sequence of DNA nucleotides that is copied into an RNA molecule. A third component of the transcription unit is the terminator, a sequence of nucleotides that signals where transcription is to end. Terminators are usually part of the coding sequence; that is, transcription stops only after the terminator has been copied into RNA.
Molecular biologists often use the terms upstream and downstream to refer to the direction of transcription and the location of nucleotide sequences surrounding the RNA coding sequence. The transcription apparatus is said to move downstream during transcription: it binds to the promoter (which is usually upstream of the start site) and moves toward the terminator (which is downstream of the start site).
When DNA sequences are written out, often the sequence of only one of the two strands is listed. Molecular biologists typically write the sequence of the nontemplate strand, because it will be the same as the sequence of the RNA transcribed from the template (with the exception that U in RNA replaces T in DNA). By convention, the sequence on the nontemplate strand is written with the 5′ end on the left and the 3′ end on the right. The first nucleotide transcribed (the transcription start site) is numbered +1; nucleotides downstream of the start site are assigned positive numbers, and nucleotides upstream of the start site are assigned negative numbers. So, nucleotide +34 would be 34 nucleotides downstream of the start site, whereas nucleotide −75 would be 75 nucleotides upstream of the start site.
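The relation between the two DNA strands and the transcript can be illustrated with a minimal Python sketch. It uses the short sequence from Figure 13.4; the function names are hypothetical, chosen only for illustration, and the sketch models base pairing alone rather than any of the enzymology.

```python
# Watson-Crick pairing, used to derive the template from the nontemplate strand.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# Each template DNA base and the RNA base that is placed opposite it.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def template_from_nontemplate(nontemplate_5to3):
    """Return the template strand, written 3'->5' from left to right."""
    return "".join(DNA_COMPLEMENT[base] for base in nontemplate_5to3)


def transcribe(template_3to5):
    """Return the RNA (written 5'->3') made from a template given 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3to5)


nontemplate = "TACGGATACG"          # 5'->3', as in Figure 13.4
template = template_from_nontemplate(nontemplate)
rna = transcribe(template)
print(template)                     # template strand, 3'->5'
print(rna)                          # transcript, 5'->3'
```

Because the template is written 3′ → 5′ and the RNA 5′ → 3′, the antiparallel arrangement is carried by the notation. The transcript comes out identical to the nontemplate strand with U in place of T, which is why the nontemplate strand is the one usually written down.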
Concepts
A transcription unit is a piece of DNA that encodes an RNA molecule and the sequences necessary for its proper transcription. Each transcription unit includes a promoter, an RNA-coding region, and a terminator.
◗ 13.7 A transcription unit includes a promoter, an RNA-coding region, and a terminator.
The Substrate for Transcription
RNA is synthesized from ribonucleoside triphosphates (rNTPs) ( ◗ FIGURE 13.8). In synthesis, nucleotides are added one at a time to the 3′-OH group of the growing RNA molecule. Two phosphates are cleaved from the incoming ribonucleoside triphosphate; the remaining phosphate participates in a phosphodiester bond that connects the nucleotide to the growing RNA molecule. The overall chemical reaction for the addition of each nucleotide is:
$\mathrm{RNA}_n + \mathrm{rNTP} \rightarrow \mathrm{RNA}_{n+1} + \mathrm{PP}_i$
where PPi represents pyrophosphate (the two phosphate groups released together). Nucleotides are always added to the 3′ end of the RNA molecule, and the direction of transcription is therefore 5′ → 3′ ( ◗ FIGURE 13.9), the same as the direction of DNA synthesis during replication. RNA is made complementary and antiparallel to one of the DNA strands (the template strand).
◗ 13.8 Ribonucleoside triphosphates are substrates used in RNA synthesis.
◗ 13.9 In transcription, nucleotides are always added to the 3′ end of the RNA molecule. (1) Initiation of RNA synthesis does not require a primer. (2) New nucleotides are added to the 3′ end of the RNA molecule. (3) DNA unwinds at the front of the transcription bubble… (4) …and then rewinds.
Concepts
RNA is synthesized from ribonucleoside triphosphates. Transcription is 5′ → 3′: each new nucleotide is joined to the 3′-OH group of the last nucleotide added to the growing RNA molecule. RNA synthesis does not require a primer.
The Transcription Apparatus
Recall that, in replication, a number of different enzymes and proteins are required to bring about DNA synthesis. Transcription might initially appear to be quite different, because a single enzyme, RNA polymerase, carries out all the required steps; on closer inspection, however, the processes are similar. The action of RNA polymerase is enhanced by a number of accessory proteins that join and leave the polymerase at different stages of the process. Each accessory protein is responsible for providing or regulating a special function. Thus, transcription, like replication, requires an array of proteins.
Bacterial RNA polymerase
Bacterial cells typically possess only one type of RNA polymerase, which catalyzes the synthesis of all classes of bacterial RNA: mRNA, tRNA, and rRNA. Bacterial RNA polymerase is a large, multimeric enzyme (meaning that it consists of several polypeptide chains). At the heart of bacterial RNA polymerase are four subunits (individual polypeptide chains) that make up the core enzyme: two copies of a subunit called alpha (α), a single copy of beta (β), and a single copy of beta prime (β′) ( ◗ FIGURE 13.10). The core enzyme catalyzes the elongation of the RNA molecule by the addition of RNA nucleotides. Other functional subunits join and leave the core enzyme at particular stages of the transcription process.
The sigma (σ) factor controls the binding of the RNA polymerase to the promoter. Without sigma, RNA polymerase will initiate transcription at a random point along the DNA. After sigma has associated with the core enzyme (forming a holoenzyme), RNA polymerase binds stably only to the promoter region and initiates transcription at the proper start site.
Sigma is required only for promoter binding and initiation; when a few RNA nucleotides have been joined together, sigma detaches from the core enzyme. Many bacteria possess multiple types of sigma. E. coli, for example, possesses sigma 28 (σ28), sigma 32 (σ32), sigma 54 (σ54), and sigma 70 (σ70), named on the basis of their molecular weights. Each type of sigma initiates the binding of RNA polymerase to a particular set of promoters. For example, σ32 binds to promoters of genes that protect against environmental stress, σ54 binds to promoters of genes used during nitrogen starvation, and σ70 binds to many different promoters. Other subunits provide the core RNA polymerase with additional functions. Rho (ρ) and NusA, for example, facilitate the termination of transcription.
◗ 13.10 In bacterial RNA polymerase, the core enzyme consists of four subunits: two copies of alpha (α), a single copy of beta (β), and a single copy of beta prime (β′). The core enzyme catalyzes the elongation of the RNA molecule by the addition of RNA nucleotides. (a) The sigma factor (σ) joins the core to form the holoenzyme, which is capable of binding to a promoter and initiating transcription. (b) The molecular model shows RNA polymerase (shown in yellow) binding DNA.
Eukaryotic RNA polymerases
Eukaryotic cells possess three distinct types of RNA polymerase, each of which is responsible for transcribing a different class of RNA: RNA polymerase I transcribes rRNA; RNA polymerase II transcribes pre-mRNAs, snoRNAs, and some snRNAs; and RNA polymerase III transcribes small RNA molecules, specifically tRNAs, small rRNA, and some snRNAs (Table 13.3). All three eukaryotic polymerases are large, multimeric enzymes, typically consisting of more than a dozen subunits. Some subunits are common to all three RNA polymerases, whereas others are limited to one of the polymerases. As in bacterial cells, a number of accessory proteins bind to the core enzyme and affect its function.
Table 13.3 Eukaryotic RNA polymerases
Type                  Transcribes
RNA polymerase I      Large rRNAs
RNA polymerase II     Pre-mRNA, some snRNAs, snoRNAs
RNA polymerase III    tRNAs, small rRNA, snRNAs
Concepts
Bacterial cells possess a single type of RNA polymerase, consisting of a core enzyme and other subunits that participate in various stages of transcription. Eukaryotic cells possess three distinct types of RNA polymerase: RNA polymerase I transcribes rRNA; RNA polymerase II transcribes pre-mRNA, snoRNAs, and some snRNAs; and RNA polymerase III transcribes tRNAs, small rRNAs, and some snRNAs.
The Process of Bacterial Transcription
Now that we’ve considered some of the major components of transcription, we’re ready to take a detailed look at the process. Transcription can be conveniently divided into three stages:
1. initiation, in which the transcription apparatus assembles on the promoter and begins the synthesis of RNA;
2. elongation, in which RNA polymerase moves along the DNA, unwinding it and adding new nucleotides, one at a time, to the 3′ end of the growing RNA strand; and
3. termination, the recognition of the end of the transcription unit and the separation of the RNA molecule from the DNA template.
We will first examine each of these steps in bacterial cells, where the process is best understood; then we will consider eukaryotic transcription.
Initiation
Initiation includes all the steps necessary to begin RNA synthesis, including (1) promoter recognition, (2) formation of the transcription bubble, (3) creation of the first bonds between rNTPs, and (4) escape of the transcription apparatus from the promoter. Transcription initiation requires that the transcription apparatus recognize and bind to the promoter.
At this step, the selectivity of transcription is enforced; the binding of RNA polymerase to the promoter determines which parts of the DNA template are to be transcribed and how often. Different genes are transcribed with different frequencies, and promoter binding is primarily responsible for determining the frequency of transcription for a particular gene. Promoters also have different affinities for RNA polymerase. Even within a single promoter, the affinity can vary over time, depending on its interaction with RNA polymerase and a number of other factors.
Bacterial promoters
Essential information for the transcription unit (where it will start transcribing, which strand is to be read, and in what direction the RNA polymerase will move) is embedded in the nucleotide sequence of the promoter. Promoters are sequences in the DNA that are recognized by the transcription apparatus and are required for transcription to take place. In bacterial cells, promoters are usually adjacent to an RNA coding sequence.
The examination of many promoters in E. coli and other bacteria reveals a general feature: although most of the nucleotides within the promoters vary in sequence, short stretches of nucleotides are common to many. Furthermore, the spacing and location of these nucleotides relative to the transcription start site are similar in most promoters. These short stretches of common nucleotides are called consensus sequences. The term “consensus sequence” refers to sequences that possess considerable similarity or consensus. By definition, the consensus sequence comprises the most commonly encountered nucleotides found at a specific location. For example, consider the following nucleotides found near the transcription start site of four prokaryotic genes.
5′–A A T A A A–3′
5′–T T T A A T–3′
5′–T A T T T T–3′
5′–T A A A A T–3′
Consensus sequence: 5′–T A T A A T–3′
If two bases are equally frequent, they are designated by listing both bases separated by a line or a slash, as in 5′–T A T A A A/T–3′. Purines can be indicated by the abbreviation R, pyrimidines by Y, and any nucleotide by N. For example, the consensus sequence 5′–T A Y A R N A–3′ means that the third nucleotide in the consensus sequence (Y) is usually a pyrimidine, but either pyrimidine is equally likely. Similarly, the fifth nucleotide in the sequence (R) is most likely one of the purines, but both are equally frequent. In the sixth position (N), no particular base is more common than any other.
The presence of consensus in a set of nucleotides usually implies that the sequence is associated with an important function. Consensus exists in a sequence because natural selection has favored a restricted set of nucleotides in that position.
◗ 13.11 In bacterial promoters, consensus sequences are found upstream of the start site, approximately at positions −10 and −35.
The most commonly encountered consensus sequence, found in almost all bacterial promoters, is located just upstream of the start site, centered on position −10. Called the −10 consensus sequence or, sometimes, the Pribnow box, its sequence is
5′–TATAAT–3′
3′–ATATTA–5′
often written simply as TATAAT ( ◗ FIGURE 13.11). Remember that TATAAT is just the consensus sequence, representing the most commonly encountered nucleotides at each of these positions. In most prokaryotic promoters, the actual sequence is not TATAAT ( ◗ FIGURE 13.12). Another consensus sequence common to most bacterial promoters is TTGACA, which lies approximately 35 nucleotides upstream of the start site and is termed the −35 consensus sequence (see Figure 13.11).
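The definition of a consensus sequence given above (the most commonly encountered nucleotide at each position, with ties listed together) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name is hypothetical, the input is assumed to be already aligned, and the R, Y, and N abbreviations are not applied.

```python
from collections import Counter


def consensus(sequences):
    """Return the most frequent base at each aligned position.

    Bases that tie for most frequent are joined with a slash (for example A/T).
    """
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must be aligned and of equal length")
    symbols = []
    for column in zip(*sequences):      # one column per aligned position
        counts = Counter(column)
        top = max(counts.values())
        tied = sorted(base for base, n in counts.items() if n == top)
        symbols.append("/".join(tied))
    return " ".join(symbols)


# The four example sequences from the text, read 5'->3'.
genes = ["AATAAA", "TTTAAT", "TATTTT", "TAAAAT"]
print(consensus(genes))
```

For the four sequences in the text this prints T A T A A T, which matches the consensus given there even though none of the four individual sequences is itself TATAAT.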
The nucleotides on either side of the −10 and −35 consensus sequences and those between them vary greatly from promoter to promoter, suggesting that they are relatively unimportant in promoter recognition. The function of these consensus sequences in bacterial promoters has been studied by inducing mutations at various positions within the consensus sequences and observing the effect of the changes on transcription. The results of these studies reveal that most base substitutions within the −10 and −35 consensus sequences reduce the rate of transcription; these substitutions are termed down mutations because they slow down the rate of transcription. Occasionally, a particular change in a consensus sequence increases the rate of transcription; such a change is called an up mutation.
◗ 13.12 In most prokaryotic promoters, the actual sequence is not TATAAT. The sequences shown are found in five E. coli promoters, including those of genes for tryptophan biosynthesis (trp), tyrosine tRNA (tRNATyr), lactose metabolism (lac), a recombination protein (recA), and arabinose metabolism (araB, A, D). These sequences are on the nontemplate strand and read 5′ → 3′, left to right. The −35 and −10 regions are separated by 16–18 base pairs, and the −10 region lies 6–7 base pairs upstream of the transcription start site.
Promoter               −35 region    −10 region
trp                    TTGACA        TTAACT
tRNATyr                TTTACA        TATGAT
lac                    TTTACA        TATGTT
recA                   TTGATA        TATAAT
rrnDI                  TTGTGC        TATAAT
araB,A,D               CTGACG        TACTGT
Consensus sequences    TTGACA        TATAAT
The sigma factor associates with the core enzyme ( ◗ FIGURE 13.13a) to form a holoenzyme, which binds to the −35 and −10 consensus sequences in the DNA promoter ( ◗ FIGURE 13.13b). Although it binds only the nucleotides of consensus sequences, the enzyme extends from −50 to +20 when bound to the promoter. The holoenzyme initially binds weakly to the promoter but then undergoes a change in structure that allows it to bind more tightly and unwind the double-stranded DNA ( ◗ FIGURE 13.13c). Unwinding begins within the −10 consensus sequence and extends downstream for about 17 nucleotides, including the start site.
Some bacterial promoters contain a third consensus sequence that also takes part in the initiation of transcription. Called the upstream element, this sequence contains a number of A–T pairs and is found at about −40 to −60. The alpha subunit of the RNA polymerase interacts directly with this upstream element, greatly enhancing the rate of transcription in those bacterial promoters that possess it. A number of other proteins may bind to sequences in and near the promoter; some stimulate the rate of transcription and others repress it; we will consider the proteins that regulate gene expression in Chapter 16.
Concepts
A promoter is a DNA sequence that is adjacent to a gene and required for transcription. Promoters contain short consensus sequences that are important in the initiation of transcription.
Initial RNA synthesis
After the holoenzyme has attached to the promoter, RNA polymerase is positioned over the start site for transcription (at position +1) and has unwound the DNA to produce a single-stranded template. The orientation and spacing of consensus sequences on a DNA strand determine which strand will be the template for transcription, and thereby determine the direction of transcription.
The start site itself is not marked by a consensus sequence but often has the sequence CAT, with the start site at the A. The position of the start site is determined not by the sequences located there but by the location of the consensus sequences, which positions RNA polymerase so that the enzyme’s active site is aligned for initiation of transcription at +1. If the consensus sequences are artificially moved upstream or downstream, the location of the starting point of transcription correspondingly changes.
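The point that the start site is set by the position of the consensus sequences, not by the sequence at the start site itself, can be shown with a toy Python sketch. It is not a real promoter-prediction method: the function names and the mismatch threshold are hypothetical, and the spacings (16–18 bp between the −35 and −10 regions, 6–7 bp between the −10 region and the start site) are taken from the labels of Figure 13.12.

```python
MINUS_35 = "TTGACA"   # -35 consensus sequence
MINUS_10 = "TATAAT"   # -10 consensus sequence (Pribnow box)


def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def predict_start_sites(nontemplate, max_mismatch=2):
    """Return (index of -35 box, index of -10 box, index of +1) candidates.

    Indices are 0-based positions in the nontemplate strand, read 5'->3'.
    """
    hits = []
    for i in range(len(nontemplate) - 5):
        if mismatches(nontemplate[i:i + 6], MINUS_35) > max_mismatch:
            continue
        for spacer in (16, 17, 18):          # bp between the two boxes
            j = i + 6 + spacer
            box = nontemplate[j:j + 6]
            if len(box) < 6 or mismatches(box, MINUS_10) > max_mismatch:
                continue
            for gap in (6, 7):               # bp between -10 box and +1
                start = j + 6 + gap
                if start < len(nontemplate):
                    hits.append((i, j, start))
    return hits


# Toy promoter: both boxes perfect, 17-bp spacer, filler bases elsewhere.
toy = "GG" + MINUS_35 + "C" * 17 + MINUS_10 + "G" * 6 + "A" + "CCCC"
print(predict_start_sites(toy, max_mismatch=0))
# Moving the consensus sequences two bases downstream moves the start site too.
print(predict_start_sites("TT" + toy, max_mismatch=0))
```

In the second call every reported index is shifted by two, mirroring the experimental observation that moving the consensus sequences moves the starting point of transcription.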
To begin the synthesis of an RNA molecule, RNA polymerase pairs the base on a ribonucleoside triphosphate with its complementary base at the start site on the DNA template strand ( ◗ FIGURE 13.13d). No primer is required to initiate the synthesis of the 5′ end of the RNA molecule. Two of the three phosphates are cleaved from the ribonucleoside triphosphate as the nucleotide is added to the 3′ end of the growing RNA molecule. However, because the 5′ end of the first ribonucleoside triphosphate does not take part in the formation of a phosphodiester bond, all three of its phosphates remain. An RNA molecule therefore possesses, at least initially, three phosphates at its 5′ end ( ◗ FIGURE 13.13e).
◗ 13.13 Transcription in bacteria is carried out by RNA polymerase, which must bind to the sigma factor to initiate transcription.
1 The sigma factor associates with the core enzyme to form a holoenzyme,…
2 …which binds to the −35 and −10 consensus sequences in the promoter, creating a closed complex.
3 The holoenzyme binds the promoter tightly and unwinds the double-stranded DNA, creating an open complex.
4 A nucleoside triphosphate complementary to the DNA at the start site serves as the first nucleotide in the RNA molecule.
5 Two phosphates are cleaved from each subsequent nucleoside triphosphate, creating an RNA nucleotide that is added to the 3′ end of the growing RNA molecule.
6 The sigma factor is released as the RNA polymerase moves beyond the promoter.
Conclusion: RNA transcription is initiated when core RNA polymerase binds to the promoter with the help of sigma.
Elongation
After initiation, RNA polymerase moves downstream along the template, progressively unwinding the DNA at the leading (downstream) edge of the transcription bubble, joining nucleotides to the RNA molecule according to the sequence on the template, and rewinding the DNA at the trailing (upstream) edge of the bubble. In bacterial cells at 37°C, about 40 nucleotides are added per second. This rate of RNA synthesis is much lower than that of DNA synthesis, which is more than 1500 nucleotides per second in bacterial cells.
Transcription takes place within a short stretch of about 18 nucleotides of unwound DNA, the transcription bubble. Within this region, RNA is continuously synthesized, with single-stranded DNA used as a template. About 8 nucleotides of newly synthesized RNA are paired with the DNA-template nucleotides at any one time. As the transcription apparatus moves down the DNA template, it generates positive supercoiling ahead of the transcription bubble and negative supercoiling behind it. Topoisomerase enzymes probably relieve the stress associated with the unwinding and rewinding of DNA in transcription, as they do in DNA replication.
Concepts
Transcription is initiated at the start site, which, in bacterial cells, is set by the binding of RNA polymerase to the consensus sequences of the promoter. Transcription takes place within the transcription bubble. DNA is unwound ahead of the bubble and rewound behind it.
Termination
1 A rho-independent terminator contains an inverted repeat followed by a string of approximately six adenine nucleotides.
2 The inverted repeats are transcribed into RNA,…
3 …and the inverted repeat in RNA folds into a hairpin loop, which causes RNA polymerase to pause.
4 The hydrogen bonds in the A–U base pairs break,…
5 …and the RNA transcript separates from the template, terminating transcription.
Conclusion: Transcription terminates when inverted repeats form a hairpin followed by a string of uracils.
RNA polymerase moves along the template, adding nucleotides to the 3′ end of the growing RNA molecule until it transcribes a terminator. Most terminators are found upstream of the point of termination. Transcription therefore does not suddenly end when polymerase reaches a terminator, like a car stopping in front of a stop sign. Rather, transcription ends after the terminator has been transcribed, like a car that stops only after running over a speed bump. At the terminator, several overlapping events are needed to bring an end to transcription: RNA polymerase must stop synthesizing RNA, the RNA molecule must be released from RNA polymerase, the newly made RNA molecule must dissociate fully from the DNA, and RNA polymerase must detach from the DNA template.
Bacterial cells possess two major types of terminators. Rho-dependent terminators are able to cause the termination of transcription only in the presence of an ancillary protein called the rho factor. Rho-independent terminators are able to cause the end of transcription in the absence of rho.
Rho-independent terminators have two common features. First, they contain inverted repeats (sequences of nucleotides on one strand that are inverted and complementary). When inverted repeats have been transcribed into RNA, a hairpin secondary structure forms ( ◗ FIGURE 13.14). Second, in rho-independent terminators, a string of approximately six adenine nucleotides follows the second inverted repeat in the template DNA. Their transcription produces a string of uracil nucleotides after the hairpin in the transcribed RNA.
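The two features of a rho-independent terminator (an inverted repeat that can fold into a hairpin, followed by a run of uracils in the transcript) can be searched for with a simple Python sketch. It is a simplified illustration, not a validated terminator finder: the function names, the stem length, the loop range, and the requirement of perfect base pairing are all assumptions made for the example.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(segment):
    """Return the RNA sequence that would pair with the segment in a hairpin stem."""
    return "".join(RNA_PAIR[base] for base in reversed(segment))


def find_terminator_hairpins(rna, stem=6, min_loop=3, max_loop=8, u_run=6):
    """Find perfect inverted repeats that are followed directly by a run of U.

    Returns (stem start, loop length, start of the U run) for each hit,
    using 0-based indices into the transcript read 5'->3'.
    """
    hits = []
    for i in range(len(rna) - stem + 1):
        left = rna[i:i + stem]                     # first half of the repeat
        for loop in range(min_loop, max_loop + 1):
            k = i + stem + loop                    # start of the second half
            right = rna[k:k + stem]
            tail = rna[k + stem:k + stem + u_run]  # bases just after the stem
            if (len(right) == stem
                    and right == reverse_complement_rna(left)
                    and tail == "U" * u_run):
                hits.append((i, loop, k + stem))
    return hits


# Toy transcript: 6-bp stem, 4-nt loop, then six uracils.
toy_rna = "CC" + "GCCGCC" + "GAAA" + "GGCGGC" + "UUUUUU" + "G"
print(find_terminator_hairpins(toy_rna))
```

Real terminators tolerate mismatches and vary in stem and loop length, so a practical search would need a scoring scheme for hairpin stability in place of the exact match used here.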
The presence of a hairpin in an RNA transcript causes RNA polymerase to slow down or pause, which creates an opportunity for termination. The adenine–uracil base pairings downstream of the hairpin are relatively unstable compared with other base pairings, and the formation of the hairpin may itself destabilize the DNA–RNA pairing, causing the RNA molecule to separate from its DNA template. When the RNA transcript has separated from the template, RNA synthesis can no longer continue (see Figure 13.14).
◗ 13.14 Termination by bacterial rho-independent terminators is a multistep process.
Rho-dependent terminators have two features: (1) DNA sequences that produce a pause in transcription; and (2) a DNA sequence that encodes a stretch of RNA upstream of the terminator that is devoid of any secondary structures. This unstructured RNA serves as a binding site for the rho protein, which binds the RNA and moves toward its 3′ end, following the RNA polymerase ( ◗ FIGURE 13.15). When RNA polymerase encounters the terminator, it pauses, allowing rho to catch up. The rho protein has helicase activity, which it uses to unwind the RNA–DNA hybrid in the transcription bubble, bringing an end to transcription.
◗ 13.15 The termination of transcription in some bacterial genes requires the presence of the rho protein. Rho binds to an unstructured region of RNA and moves toward its 3′ end, following RNA polymerase. When RNA polymerase encounters a terminator sequence, it pauses,… …and rho catches up. Using helicase activity, rho unwinds the DNA–RNA hybrid and brings an end to transcription.
Concepts
Transcription ends after RNA polymerase transcribes a terminator. Bacterial cells possess two types of terminator: a rho-independent terminator, which RNA polymerase can recognize by itself; and a rho-dependent terminator, which RNA polymerase can recognize only with the help of the rho protein.
Connecting Concepts
The Basic Rules of Transcription
Before we examine the process of eukaryotic transcription, let’s pause to summarize some of the general principles of bacterial transcription.
1. Transcription is a selective process; only certain parts of the DNA are transcribed.
2. RNA is transcribed from single-stranded DNA. Normally, only one of the two DNA strands (the template strand) is copied into RNA.
3. Ribonucleoside triphosphates are used as the substrates in RNA synthesis. Two phosphates are cleaved from a ribonucleoside triphosphate, and the resulting nucleotide is joined to the 3′-OH group of the growing RNA strand.
4. RNA molecules are antiparallel and complementary to the DNA template strand. Transcription is always in the 5′ → 3′ direction, meaning that the RNA molecule grows at the 3′ end.
5. Transcription depends on RNA polymerase, a complex, multimeric enzyme. RNA polymerase consists of a core enzyme, which is capable of synthesizing RNA, and other subunits that may join transiently to perform additional functions.
6. The core enzyme of RNA polymerase requires a sigma factor in order to bind to a promoter and initiate transcription.
7. Promoters contain short sequences crucial in the binding of RNA polymerase to DNA; these consensus sequences are interspersed with nucleotides that play no known role in transcription.
8. RNA polymerase binds to DNA at a promoter, begins transcribing at the start site of the gene, and ends transcription after a terminator has been transcribed.
The Process of Eukaryotic Transcription
The process of eukaryotic transcription is similar to that of bacterial transcription. Eukaryotic transcription also includes initiation, elongation, and termination, and the basic principles of transcription already outlined apply to eukaryotic transcription. However, there are some important differences.
Eukaryotic cells possess three different RNA polymerases, each of which transcribes a different class of RNA and recognizes a different type of promoter. Thus, a generic promoter cannot be described for eukaryotic cells, as was done for bacterial cells; rather, a promoter’s description depends on whether the promoter is recognized by RNA polymerase I, II, or III. Another difference is in the nature of promoter recognition and initiation. Many proteins take part in the binding of eukaryotic RNA polymerases to DNA templates, and the different types of promoters require different proteins.
www.whfreeman.com/pierce: A brief overview of the process of transcription
Transcription and Nucleosome Structure
Transcription requires that sequences on DNA are accessible to RNA polymerase and other proteins. However, in eukaryotic cells, DNA is complexed with histone proteins in highly compressed chromatin (see Figure 11.5). How can the proteins necessary for transcription gain access to eukaryotic DNA when it is complexed with histones?
The answer to this question is that, before transcription, the chromatin structure is modified so that the DNA is in a more open configuration and is more accessible to the transcription machinery. Several types of proteins have roles in chromatin modification. Acetyltransferases add acetyl groups to amino acids at the ends of the histone proteins, which destabilizes the nucleosome structure and makes the DNA more accessible. Other types of histone modification also can affect chromatin packing. In addition, proteins called chromatin-remodeling proteins may bind to the chromatin and displace nucleosomes from promoters and other regions important for transcription. We will take a closer look at the role of changes to chromatin structure associated with gene expression in Chapter 16.
Transcription Initiation

The initiation of transcription is a complex process in eukaryotic cells because of the variety of initiation sequences and because numerous proteins bind to these sequences. Two broad classes of DNA sequences are important for the initiation of transcription: promoters and enhancers. A promoter is always found adjacent to (or sometimes within) the gene that it regulates and has a fixed location with regard to the transcription start point. An enhancer, in contrast, need not be adjacent to the gene; enhancers can affect the transcription of genes that are thousands of nucleotides away, and their positions relative to start sites can vary.

A significant difference between bacterial and eukaryotic transcription is the existence of three different eukaryotic RNA polymerases, which recognize different types of promoters. In bacterial cells, the holoenzyme (RNA polymerase plus sigma) recognizes and binds directly to sequences in the promoter. In eukaryotic cells, promoter recognition is carried out by accessory proteins that bind to the promoter and then recruit a specific RNA polymerase (I, II, or III) to the promoter.

One class of accessory proteins comprises general transcription factors, which, along with RNA polymerase, form the basal transcription apparatus that assembles near the start site and is sufficient to initiate minimal levels of transcription. Another class of accessory proteins consists of transcriptional activator proteins, which bind to specific DNA sequences and bring about higher levels of transcription by stimulating the assembly of the basal transcription apparatus at the start site.

Concepts
Two classes of DNA sequences in eukaryotic cells affect transcription: enhancers and promoters. A promoter is near the gene and has a fixed position relative to the start site of transcription. An enhancer can be distant from the gene and variable in location.
RNA Polymerase II Promoters

We will focus most of our attention on promoters recognized by RNA polymerase II, which transcribes the genes that encode proteins. A promoter for a gene transcribed by RNA polymerase II typically consists of two primary parts: the core promoter and the regulatory promoter.

Core promoter
The core promoter is located immediately upstream of the gene (◗ FIGURE 13.16) and typically includes one or more consensus sequences. The most common of these consensus sequences is the TATA box, which has the consensus sequence TATAAA and is located from 25 to 30 bp upstream of the start site. Mutations in the sequence of the TATA box affect the rate of transcription, and changing its position alters the location of the transcription start site.

Another common consensus sequence in the core promoter is the TFIIB recognition element (BRE), which has the consensus sequence G/C G/C G/C C G C C and is located from 32 to 38 bp upstream of the start site. (TFIIB is the abbreviation for a transcription factor that binds to this element; see next subsection.) Instead of a TATA box, some core promoters have an initiator element (Inr) that directly overlaps the start site and has the consensus Y Y A N T/A Y Y. Another consensus sequence, called the downstream core promoter element (DPE), is found approximately 30 bp downstream of the start site in many promoters that also have Inr; the consensus sequence of DPE is R G A/T C G T G. All of these consensus sequences in the core promoter are recognized by transcription factors that bind to them and serve as a platform for the assembly of the basal transcription apparatus.

◗ 13.16 The promoters of genes transcribed by RNA polymerase II consist of a core promoter and a regulatory promoter that contain consensus sequences. Not all the consensus sequences shown are found in all promoters. (Figure labels: regulatory promoter; core promoter; TFIIB recognition element, G/C G/C G/C CGCC, at –35; TATA box, TATAAA, at –25; initiator element, YYAN T/A YY, at the transcription start site, +1; downstream core promoter element, RG A/T CGTG, at +30.)

Assembly of the basal transcription apparatus
The basic transcriptional machinery that binds to DNA at the start site is called the basal transcription apparatus and is required to initiate minimal levels of transcription. It consists of RNA polymerase, a series of general transcription factors, and a complex of proteins known as the mediator (◗ FIGURE 13.17). The general transcription factors include TFIIA, TFIIB, TFIID, TFIIE, TFIIF, and TFIIH, in which TFII stands for transcription factor for RNA polymerase II and the letter designates the individual factor.

TFIID binds to the TATA box and positions the active site of RNA polymerase II so that it begins transcription at the correct place. TFIID consists of at least nine polypeptides. One of them is the TATA-binding protein (TBP), which recognizes and binds to the TATA box on the DNA template. The TATA-binding protein binds to the minor groove and straddles the DNA as a molecular saddle (◗ FIGURE 13.18), bending the DNA and partly unwinding it. Other proteins, called TBP-associated factors (TAFs), combine with TBP to form the complete TFIID transcription factor.

The large holoenzyme consisting of RNA polymerase, additional transcription factors, and the mediator is thought to preassemble and bind as a unit to TFIID. The other transcription factors provide additional functions: TFIIA helps to stabilize the interaction between TBP and DNA, TFIIB plays a role in the selection of the start site, and TFIIH has helicase activity and unwinds the DNA during transcription. The mediator plays a role in communication between the basal transcription apparatus and transcriptional activator proteins (see next subsection).

◗ 13.17 Transcription is initiated at RNA polymerase II promoters when the TFIID transcription factor binds to the TATA box, followed by the binding of a preassembled holoenzyme containing general transcription factors, RNA polymerase II, and the mediator. (Steps shown in the figure: 1, TFIID binds to the TATA box in the core promoter. 2 and 3, A preassembled holoenzyme consisting of RNA polymerase, transcription factors, and the mediator binds to TFIID. 4, Transcription activator proteins bind to sequences in enhancers. 5, DNA loops out, allowing the proteins bound to the enhancer to interact with the basal transcription apparatus. 6, Transcription activator proteins bind to sequences in the regulatory promoter and interact with the basal transcription apparatus through the mediator.)

◗ 13.18 The TATA-binding protein (TBP) binds to the minor groove of DNA, straddling the double helix of DNA like a saddle.

Regulatory promoter
The regulatory promoter is located immediately upstream of the core promoter. A variety of different consensus sequences may be found in the regulatory promoters, and they can be mixed and matched in different combinations (◗ FIGURE 13.19). Transcriptional activator proteins bind to these sequences and, either directly or indirectly (through the mediation of coactivator proteins), make contact with the mediator in the basal transcription apparatus and affect the rate at which transcription is initiated. Some regulatory promoters also contain repressing sequences, which are bound by proteins that lower the rate of transcription through inhibitory interactions with the mediator.

◗ 13.19 The consensus sequences in promoters of three eukaryotic genes illustrate the principle that different sequences can be mixed and matched to yield a functional promoter. (The figure shows the SV40 early promoter, the thymidine kinase promoter, and the histone H2B promoter, with TATA, CAAT, GC, and OCT boxes located between –120 and –20 relative to the transcription start site.)

Enhancers
DNA sequences that increase the rate of transcription at distant genes are called enhancers. The precise position of an enhancer relative to a gene’s transcriptional start site is not critical; most enhancers can stimulate any promoter in their vicinities, and an enhancer may be upstream or downstream from the affected gene or, in some cases, within an intron of the gene itself. Enhancers also contain sequences that are recognized by transcriptional activator proteins.

How does the binding of a transcriptional activator protein to an enhancer affect the initiation of transcription at a gene thousands of nucleotides away? The answer is that the DNA between the enhancer and the promoter loops out, allowing the enhancer and the promoter to lie close to each other. Transcriptional activator proteins bound to the enhancer interact with proteins bound to the promoter and stimulate the transcription of the adjacent gene (see Figure 13.17). The looping of DNA between the enhancer and the promoter explains how the position of an enhancer can vary with regard to the start site: enhancers that are farther from the start site simply cause a longer length of DNA to loop out.

Sequences having many of the properties possessed by enhancers sometimes take part in repressing transcription instead of enhancing it; such sequences are called silencers. Although enhancers and silencers are characteristic of eukaryotic DNA, some enhancer-like sequences have been found in bacterial cells.

Concepts
General transcription factors assemble into the basal transcription apparatus, which binds to DNA near the start site and is necessary for transcription to take place at minimal levels. Additional proteins called transcriptional activators bind to other consensus sequences in promoters and enhancers, and affect the rate of transcription.

RNA Polymerase I Promoters

RNA polymerase I promoters have two functional sequences near the start site (◗ FIGURE 13.20). A core element surrounds the start site, extending from –45 to +20, and is needed to initiate transcription. An upstream control element extends from –180 to –107 and increases the efficiency of the core element. The DNA sequences of the core element and the upstream control element are rich in guanine and cytosine nucleotides and are similar in sequence.

RNA polymerase I requires two proteins to initiate transcription: SL1 and UBF. SL1 is made up of four subunits, one of which is TBP, the same protein that binds the TATA box in RNA polymerase II promoters (see Figure 13.17). The other protein, UBF, binds to both the core element and the upstream control element (see Figure 13.20) and enables SL1 to bind to the promoter. SL1 then recruits RNA polymerase I to the promoter. Thus, RNA polymerase I promoters function much as RNA polymerase II promoters do: transcription factors bind to a consensus sequence in the promoter and recruit RNA polymerase to the start site.

RNA Polymerase III Promoters

RNA polymerase III transcribes small rRNA, tRNAs, and some snRNAs (see Table 13.3) and recognizes several distinct types of promoters. The promoters of snRNA genes transcribed by RNA polymerase III contain several consensus sequences that are also found in some promoters transcribed by RNA polymerase II (◗ FIGURE 13.21a). These consensus sequences include the TATA box, which is recognized by a transcription factor that contains TBP.
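The core-promoter consensus sequences given earlier for RNA polymerase II promoters (TATA box, BRE, Inr, and DPE) can be written as simple patterns and searched for in a DNA sequence. The sketch below is illustrative only and rests on several assumptions: the dictionary name, the function name, and the example sequence are hypothetical; the ambiguity symbols are read in the usual way (Y = C or T, R = A or G, N = any base, X/Y = either base); and only exact matches to the consensus are reported, whereas real promoter elements often deviate from the consensus. Positions are indices within the supplied string, not coordinates relative to a transcription start site.

```python
import re

# Consensus sequences from the text, written as regular expressions.
# Assumed symbol meanings: Y = C/T, R = A/G, N = any base.
CORE_PROMOTER_ELEMENTS = {
    "BRE": "[GC][GC][GC]CGCC",             # G/C G/C G/C C G C C
    "TATA box": "TATAAA",                  # T A T A A A
    "Inr": "[CT][CT]A[ACGT][TA][CT][CT]",  # Y Y A N T/A Y Y
    "DPE": "[AG]G[AT]CGTG",                # R G A/T C G T G
}


def find_core_promoter_elements(nontemplate_5_to_3: str):
    """List exact consensus matches as (name, start index, matched bases).

    The sequence is assumed to be the nontemplate strand written 5'->3'.
    A lookahead is used so that overlapping matches are all reported.
    """
    seq = nontemplate_5_to_3.upper()
    hits = []
    for name, pattern in CORE_PROMOTER_ELEMENTS.items():
        for match in re.finditer(f"(?=({pattern}))", seq):
            hits.append((name, match.start(), match.group(1)))
    # Report hits in order of position along the sequence.
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Hypothetical sequence containing a BRE followed by a TATA box.
    for hit in find_core_promoter_elements("GCGCGCCTATAAAGG"):
        print(hit)
```

A match found this way is only a candidate: as the text notes, not all consensus sequences are found in all promoters, and the position of an element relative to the start site matters as well as its sequence.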
As in other types of eukaryotic promoters, TBP positions the active site of RNA polymerase over the start site for transcription in these promoters.

Promoters for small rRNA and tRNA genes, also transcribed by RNA polymerase III, contain internal promoters that are downstream of the start site and are actually transcribed into the RNA (◗ FIGURE 13.21b and c). These promoters contain critical sets of nucleotides (boxes A, B, and C) that also are recognized by transcription factors. One of these transcription factors includes TBP, which, again, positions the active site of RNA polymerase III over the upstream start site and ensures that the enzyme initiates transcription at the correct location. Additional transcription factors then bind to the DNA-binding factor and recruit RNA polymerase to the initiation complex.

Concepts
RNA polymerase I promoters have two key components: (1) the core element, which surrounds the start site and is sufficient to initiate transcription, and (2) the upstream control sequence, which increases the efficiency of the core promoter.

Concepts
Some RNA polymerase III promoters are upstream of the start site and contain a TATA box. Others are internal, embedded within the transcribed sequence downstream of the start site.

◗ 13.20 The basal transcription apparatus assembles at RNA polymerase I promoters. (Figure labels: upstream control element, –180 to –107; core element, –45 to +20; transcription start site, +1. Steps shown in the figure: 1, The core element is needed to initiate transcription. 2, The upstream control element increases the efficiency of the core element. 3 and 4, Transcription factor UBF binds to both the core element and the upstream control element and enables SL1 to bind to the promoter. 5, SL1 then recruits RNA polymerase I to the promoter.)

◗ 13.21 RNA polymerase III recognizes several different types of promoters. OCT and PSE are consensus sequences that may also be present in RNA polymerase II promoters. (Figure panels: a, snRNA gene with an upstream promoter containing OCT, PSE, and TATA; b, small (5S) rRNA gene with an internal promoter containing box A and box C; c, tRNA gene with an internal promoter containing box A and box B. Boxes represent specific sequences recognized by transcription factors. Conclusion: Promoters for RNA polymerase III vary in their sequences and positions relative to the gene.)

Connecting Concepts
Characteristics of Eukaryotic Promoters and Transcription Factors

Mastering the details of eukaryotic promoters and their associated transcription factors is a daunting task even for experienced researchers, never mind the beginning genetics student. Let’s step back from the detail for a moment and identify some general principles of eukaryotic promoters and transcription factors:

1. Several types of DNA sequences take part in the initiation of transcription in eukaryotic cells. These sequences generally serve as the binding sites for proteins that interact with RNA polymerase and influence the initiation of transcription.
2. Some sequences that affect transcription, called promoters, are adjacent to or within the RNA coding region and are relatively fixed with regard to the start site of transcription. Promoters consist of a core promoter located adjacent to the gene and a regulatory promoter located farther upstream.
3. Other sequences, called enhancers, are distant from the gene and function independently of position and direction. Enhancers stimulate transcription.
4. General transcription factors bind to the core promoter near the start site and, with RNA polymerase, assemble into a basal transcription apparatus. The TATA-binding protein (TBP) is a critical transcription factor that positions the active site of RNA polymerase over the start site.
5. Transcriptional activator proteins bind to sequences in the regulatory promoter and enhancers and affect transcription by interacting with the basal transcription apparatus.
6. Proteins binding to enhancers interact with the basal transcription apparatus by causing the DNA between the promoter and the enhancer to loop out, bringing the enhancer into close proximity to the promoter.

Evolutionary Relationships and the TATA-Binding Protein

Some 2 billion to 3 billion years ago, life diverged into three lines of evolutionary descent: the eubacteria, the archaea, and the eukaryotes (see Chapter 2). Although eubacteria and archaea are superficially similar (both are unicellular and lack a nucleus), the results of studies of their DNA sequences and other biochemical properties indicate that they are distantly related. The evolutionary distinction between archaea, eubacteria, and eukaryotes is clear. However, did eukaryotes first diverge from an ancestral prokaryote, with the later separation of prokaryotes into eubacteria and archaea, or did the archaea and the eubacteria split first, with the eukaryotes later evolving from one of these groups?

Studies of transcription in eubacteria, archaea, and eukaryotes have yielded important findings about the evolutionary relationships of these organisms. The results of studies in 1994 demonstrated that archaea possess a TATA-binding protein, a critical transcription factor in all three of the eukaryotic polymerases. The binding of TBP to DNA is the first step in the assembly of the eukaryotic transcription apparatus. In earlier studies, TATA-like sequences were found in eukaryotic cells, but no such sequences have been found in eubacteria. TBP binds the TATA box in archaea with the help of another transcription factor, TFIIB, which is also found in eukaryotes but not in eubacteria. Together these findings indicate that transcription, one of the most basic of life processes, has strong similarities in eukaryotes and archaea, suggesting that these two groups are more closely related to each other than either is to the eubacteria.
This conclusion is supported by other data, including those obtained from a comparison of gene sequences.

Termination

The termination of transcription in eukaryotic genes is less well understood than in bacterial genes. The three eukaryotic RNA polymerases use different mechanisms for termination. RNA polymerase I requires a termination factor, like the rho factor utilized in termination of some bacterial genes. Unlike rho, which binds to the newly transcribed RNA molecule, the termination factor for RNA polymerase I binds to a DNA sequence downstream of the termination site.

RNA polymerase III ends transcription after transcribing a terminator sequence that produces a string of Us in the RNA molecule, like that produced by the rho-independent terminators of bacteria. Unlike rho-independent terminators in bacterial cells, RNA polymerase III does not require that a hairpin structure precede the string of Us.

In many of the genes transcribed by RNA polymerase II, transcription can end at multiple sites located within a span of hundreds or thousands of base pairs. As we will see in Chapter 14, the transcription of these genes continues well beyond the coding sequence necessary to produce the mRNA. After transcription, the 3′ end of pre-mRNA is cleaved at a specific site, designated by a consensus sequence, producing the mature mRNA. Research findings suggest that termination is coupled to cleavage, which is carried out by a cleavage complex that probably associates with the RNA polymerase. This complex may suppress termination until the consensus sequence that marks the cleavage site is encountered. The 3′ end of the pre-mRNA is then cleaved by the complex, and transcription is terminated downstream.

Concepts
The different eukaryotic RNA polymerases utilize different mechanisms of termination.

Connecting Concepts Across Chapters

This chapter has focused on the process of transcription, during which an RNA molecule that is complementary and antiparallel to a DNA template is synthesized. Transcription is the first step in gene expression (the transfer of genetic information from genotype to phenotype) and, as we will see in Chapter 16, is an important point at which gene expression is regulated.

Transcription is similar in many respects to replication: it utilizes a DNA template, takes place in the 5′→3′ direction, synthesizes a molecule that is antiparallel and complementary to the template, and utilizes nucleoside triphosphates as substrates. But there are important differences as well: only one strand is typically transcribed, each gene is transcribed separately, and the process is subject to numerous regulatory mechanisms.

This chapter has provided important links to topics discussed in several other chapters of the book. Transcription is the first step in the molecular transfer of genetic information from the genotype to the phenotype and is therefore the starting point for discussions of RNA processing in Chapter 14 and translation in Chapter 15. Knowledge of the details of transcription is also essential for understanding gene regulation (Chapter 16), because transcription is an important point at which the expression of many genes is controlled. Additionally, because transcription factors play an important role in some types of cancer, the information in this chapter will be useful when we consider the molecular basis of cancer in Chapter 21.

CONCEPTS SUMMARY

• RNA molecules can function as biological catalysts and may have been the first carriers of genetic information.
• RNA is a polymer, consisting of nucleotides joined together by phosphodiester bonds. Each RNA nucleotide consists of a ribose sugar, a phosphate, and a base. RNA contains the base uracil; it is usually single stranded, which allows it to form secondary structures.
• Ribosomal RNA is a component of the ribosome, messenger RNA carries coding instructions for proteins, and transfer RNA helps incorporate the amino acids into a polypeptide chain. Other RNA molecules found in eukaryotic cells include pre-mRNAs, the precursors of mRNA; snRNAs, which function in the processing of pre-mRNAs; snoRNAs, which process rRNA; and scRNAs, which exist in the cytoplasm.
• The template for RNA synthesis is single-stranded DNA. In transcription, RNA synthesis is complementary and antiparallel to the DNA template strand.
• A transcription unit consists of a promoter, an RNA-coding region, and a terminator.
• The substrates for RNA synthesis are ribonucleoside triphosphates. In transcription, two phosphates are cleaved from a ribonucleoside triphosphate and the remaining phosphate takes part in a phosphodiester bond with the 3′-OH group at the growing end of the RNA molecule.
• RNA polymerase in bacterial cells consists of a core enzyme, which catalyzes the addition of nucleotides to an RNA molecule, and other subunits, which join the core enzyme to provide additional functions. The sigma factor controls the binding of the core enzyme to the promoter; rho and NusA assist in the termination of transcription.
• Eukaryotic cells contain three RNA polymerases: RNA polymerase I, which transcribes rRNA; RNA polymerase II, which transcribes pre-mRNA and some snRNAs; and RNA polymerase III, which transcribes tRNAs, small rRNA, and some snRNAs.
• The process of transcription consists of three stages: initiation, elongation, and termination.
• Promoters are recognized by the transcription apparatus and are required for transcription. They contain short consensus sequences embedded within longer stretches of DNA.
• Transcription begins at the start site, which is determined by the consensus sequences.
A short stretch of DNA is unwound near the start site, RNA is synthesized from a single strand of DNA as a template, and the DNA is rewound at the lagging end of the transcription bubble.
• Terminators consist of sequences within the RNA coding region; RNA synthesis ceases after the terminator has been transcribed. Bacterial cells have two types of terminators: rho-independent terminators, which RNA polymerase can recognize by itself, and rho-dependent terminators, which RNA polymerase can recognize only with the help of the rho protein.
• In eukaryotic cells, DNA is complexed to histone proteins, which interfere with the binding of transcription factors and RNA polymerase. Chromatin may be modified by acetylation, chromatin-remodeling proteins, and other factors, allowing transcription factors and RNA polymerase to bind to the DNA.
• Two classes of sequences affect transcription in eukaryotic cells: promoters, which are adjacent to genes, and enhancers, which may be distant from the genes that they affect.
• A promoter for RNA polymerase II consists of a core promoter, which is required for minimal levels of transcription, and a regulatory promoter, which affects the rate of transcription.
• General transcription factors bind to the core promoter and are part of the basal transcription apparatus. Transcriptional activator proteins bind to sequences in regulatory promoters and enhancers and interact with the basal transcription apparatus at the core promoter.
• The three types of RNA polymerase in eukaryotic cells recognize different types of promoters, all of which have consensus sequences that serve as binding sites for transcription factors.
• The three RNA polymerases found in eukaryotic cells use different mechanisms of termination.

IMPORTANT TERMS

ribozyme; ribosomal RNA (rRNA); messenger RNA (mRNA); pre-messenger RNA (pre-mRNA); transfer RNA (tRNA); small nuclear RNA (snRNA); small nuclear ribonucleoprotein (snRNP); small nucleolar RNA (snoRNA); small cytoplasmic RNA (scRNA); template strand; nontemplate strand; transcription unit; promoter; RNA-coding region; terminator; ribonucleoside triphosphate (rNTP); RNA polymerase; core enzyme; sigma factor; holoenzyme; RNA polymerase I; RNA polymerase II; RNA polymerase III; consensus sequence; –10 consensus sequence (Pribnow box); –35 consensus sequence; upstream element; rho-dependent terminator; rho factor; rho-independent terminator; general transcription factor; basal transcription apparatus; transcriptional activator protein; core promoter; TATA box; TATA-binding protein (TBP); regulatory promoter; enhancer; silencer; internal promoter

Worked Problems

1. The following diagram represents a sequence of nucleotides surrounding an RNA coding sequence.

5′–CATGTT...TTGATGT–[RNA coding sequence]–GACGA...TTTATA...GGCGCGC–3′
3′–GTACAA...AACTACA–[RNA coding sequence]–CTGCT...AAATAT...CCGCGCG–5′

(a) Is the RNA coding sequence likely to be from a bacterial cell or from a eukaryotic cell? How can you tell?
(b) Which DNA strand will serve as the template strand during transcription of the RNA coding sequence?

• Solution
(a) Bacterial and eukaryotic cells use the same DNA bases (A, T, G, and C); so the bases themselves provide no clue to the origin of the sequence. The RNA coding sequence must have a promoter, and bacterial and eukaryotic cells do differ in the consensus sequences found in their promoters; so we should examine the sequences for the presence of familiar consensus sequences.
On the bottom strand to the right of the RNA coding sequence we find AAATAT, which, written in the conventional manner (5′ on the left), is 5′–TATAAA–3′. This sequence is the TATA box found in most eukaryotic promoters. However, the sequence is also quite similar to the –10 consensus sequence (5′–TATAAT–3′) found in bacterial promoters. Farther to the right on the bottom strand, we also see 5′–GCGCGCC–3′, which is the TFIIB recognition element (BRE) in eukaryotic RNA polymerase II promoters. No similar consensus sequence is found in bacterial promoters; so we can be fairly certain that this sequence is a eukaryotic promoter and RNA coding sequence.

(b) The TATA box and BRE of RNA polymerase II promoters are upstream of the RNA coding sequences; so RNA polymerase must bind to these sequences and then proceed downstream, transcribing the RNA coding sequence. Thus RNA polymerase must proceed from right (upstream) to left (downstream). The RNA molecule is always synthesized in the 5′→3′ direction and is antiparallel to the DNA template strand; so the template strand must be read 3′→5′. If the enzyme proceeds from right to left and reads the template in the 3′→5′ direction, the upper strand must be the template, as shown here.

5′–CATGTT...TTGATGT–[RNA coding sequence]–GACGA...TTTATA...GGCGCGC–3′ (template strand)
3′–GTACAA...AACTACA–[RNA coding sequence]–CTGCT...AAATAT...CCGCGCG–5′
Direction of transcription: RNA polymerase moves from right to left.

2. Suppose that a consensus sequence in the regulatory promoter of a gene that encodes enzyme A were deleted. Which of the following effects would result from this deletion?
(a) Enzyme A would have a different amino acid sequence.
(b) The mRNA for enzyme A would be abnormally short.
(c) Enzyme A would be missing some amino acids.
(d) The mRNA for enzyme A would be transcribed but not translated.
(e) The amount of mRNA transcribed would be affected.
Explain your reasoning.

• Solution
The correct answer is part e. The regulatory promoter contains binding sites for transcriptional activator proteins. These sequences are not part of the RNA coding sequence for enzyme A; so the mutation would have no effect on the length or amino acid sequence of the enzyme, eliminating answers a, b, and c. The TATA box is the binding site for the basal transcription apparatus. Transcriptional activator proteins bind to the regulatory promoter and affect the amount of transcription that takes place through interactions with the basal transcription apparatus at the core promoter.

COMPREHENSION QUESTIONS

*1. Draw an RNA nucleotide and a DNA nucleotide, highlighting the differences. How is the structure of RNA similar to that of DNA? How is it different?
2. What are the major classes of cellular RNA? Where would you expect to find each class of RNA within eukaryotic cells?
*3. What parts of DNA make up a transcription unit? Draw and label a typical transcription unit in a bacterial cell.
4. What is the substrate for RNA synthesis? How is this substrate modified and joined together to produce an RNA molecule?
5. Describe the structure of bacterial RNA polymerase.
*6. Give the names of the three RNA polymerases found in eukaryotic cells and the types of RNA that they transcribe.
7. What are the four basic stages of transcription? Describe what happens at each stage.
*8. Draw and label a typical bacterial promoter. Include any common consensus sequences.
9. What are the two basic types of terminators found in bacterial cells? Describe the structure of each.
10. How is the process of transcription in eukaryotic cells different from that in bacterial cells?
*11. How are promoters and enhancers similar? How are they different?
12. How can an enhancer affect the transcription of a gene that is thousands of nucleotides away?
13. Compare the roles of general transcription factors and transcriptional activator proteins.
14.
What are some of the common consensus sequences found in RNA polymerase II promoters?
*15. What protein associated with a transcription factor is common to all eukaryotic promoters? What is its function in transcription?
*16. Compare and contrast transcription and replication. How are these processes similar and how are they different?

APPLICATION QUESTIONS AND PROBLEMS

17. Write the consensus sequence for the following set of nucleotide sequences.
AGGAGTT
AGCTATT
TGCAATA
ACGAAAA
TCCTAAT
TGCAATT
*18. List at least five properties that DNA polymerases and RNA polymerases have in common. List at least three differences.
19. RNA molecules have three phosphates at the 5′ end, but DNA molecules never do. Explain this difference.
20. An RNA molecule has the following percentages of bases: A 23%, U 42%, C 21%, and G 14%.
(a) Is this RNA single stranded or double stranded? How can you tell?
(b) What would be the percentages of bases in the template strand of the DNA that contains the gene for this RNA?
*21. The following diagram represents DNA that is part of the RNA-coding sequence of a transcription unit. The bottom strand is the template strand. Give the sequence found on the RNA molecule transcribed from this DNA and label the 5′ and 3′ ends of the RNA.
5′–ATAGGCGATGCCA–3′
3′–TATCCGCTACGGT–5′ (template strand)
22. The following sequence of nucleotides is found in a single-stranded DNA template:
ATTGCCAGATCATCCCAATAGAT
Assume that RNA polymerase proceeds along this template from left to right.
(a) Which end of the DNA template is 5′ and which end is 3′?
(b) Give the sequence and label the 5′ and 3′ ends of the RNA copied from this template.
23. Write out a hypothetical sequence of bases that might be found in the first 20 nucleotides of a promoter of a bacterial gene. Include both strands of DNA and label the 5′ and 3′ ends of both strands. Be sure to include the start site for transcription and any consensus sequences found in the promoter.
24. The following diagram represents a transcription unit in a hypothetical DNA molecule.
5′  TTGACA    TATAAT  3′
3′  AACTGT    ATATTA  5′
(a) On the basis of the information given, is this DNA from a bacterium or from a eukaryotic organism?
(b) If this DNA molecule is transcribed, which strand will be the template strand and which will be the nontemplate strand?
(c) Where, approximately, will the start site of transcription be?
*25. What would be the most likely effect of a mutation at the following locations in an E. coli gene?
(a) –8
(b) –35
(c) –20
(d) Start site
26. A strain of bacteria possesses a temperature-sensitive mutation in the gene that encodes the sigma factor. At elevated temperatures, the mutant bacteria produce a sigma factor that is unable to bind to RNA polymerase. What effect will this mutation have on the process of transcription when the bacteria are raised at elevated temperatures?
*27. The following diagram represents a transcription unit on a DNA molecule.
(Diagram: a double-stranded DNA molecule with its 5′ and 3′ ends, the transcription start site, and the template strand labeled.)
(a) Assume that this DNA molecule is from a bacterial cell. Draw in the approximate location of the promoter and terminator for this transcription unit.
(b) Assume that this DNA molecule is from a eukaryotic cell. Draw in the approximate location of an RNA polymerase II promoter.
(c) Assume that this DNA molecule is from a eukaryotic cell. Draw in the approximate location of an internal RNA polymerase III promoter.
28. The following DNA nucleotides are found near the end of a bacterial transcription unit. Find the terminator in this sequence.
3′–AGCATACAGCAGACCGTTGGTCTGAAAAAAGCATACA–5′
(a) Mark the point at which transcription will terminate.
(b) Is this terminator rho independent or rho dependent?
(c) Draw a diagram of the RNA that will be transcribed from this DNA, including its nucleotide sequence and any secondary structures that form.
*29. A strain of bacteria possesses a temperature-sensitive mutation in the gene that encodes the rho subunit of RNA polymerase. At high temperatures, rho is not functional. When these bacteria are raised at elevated temperatures, which of the following effects would you expect to see?
(a) Transcription does not take place.
(b) All RNA molecules are shorter than normal.
(c) All RNA molecules are longer than normal.
(d) Some RNA molecules are longer than normal.
(e) RNA is copied from both DNA strands.
Explain your reasoning for accepting or rejecting each of these five options.
30. Suppose that the string of As following the inverted repeat in a rho-independent terminator were deleted, but the inverted repeat were left intact. How would this deletion affect termination? What would happen when RNA polymerase reached this region?
*31. Through genetic engineering, a geneticist mutates the gene that encodes TBP in cultured human cells. This mutation destroys the ability of TBP to bind to the TATA box. Predict the effect of this mutation on cells that possess it.
32. Elaborate repair mechanisms are associated with replication to prevent permanent mutations in DNA, yet no similar repair is associated with transcription. Can you think of a reason for these differences in replication and transcription? (Hint: Think about the relative effects of a permanent mutation in a DNA molecule compared with one in an RNA molecule.)

CHALLENGE QUESTIONS

33. Enhancers are sequences that affect the initiation of the transcription of genes that are hundreds or thousands of nucleotides away. Enhancer-binding proteins usually interact directly with transcription factors at promoters by causing the intervening DNA to loop out. An enhancer of bacteriophage T4 does not function by looping of the DNA (D. R. Herendeen, et al., 1992, Science 256:1298–1303). Propose some additional mechanisms (other than DNA looping) by which this enhancer might affect transcription at a gene thousands of nucleotides away.
*34.
The location of the TATA box in two species of yeast, Schizosaccharomyces pombe and Saccharomyces cerevisiae, differs dramatically. The TATA box of S. pombe is about 30 nucleotides upstream of the start site, similar to the location for most other eukaryotic cells. However, the TATA box of S. cerevisiae can be as many as 120 nucleotides upstream of the start site. To understand how the TATA box functions in these two species, a series of experiments was conducted to determine which components of the transcription apparatus of these two species could be interchanged. In these experiments, different components of the transcription apparatus were switched in S. pombe and S. cerevisiae, and the effects of the switch on the level of RNA synthesis and on the start point of transcription were observed. TFIID from S. pombe could be used in S. cerevisiae cells and vice versa, without any effect on the transcription start site in either cell type. Switching TFIIB, TFIIE, or RNA polymerase individually did alter the level of transcription. However, the following pairs of components could be exchanged without affecting the level of transcription: TFIIE together with TFIIH; and TFIIB together with RNA polymerase. The exchange of TFIIE – TFIIH did not alter the start point, but the exchange of TFIIB – RNA polymerase did shift it. (Y. Li, P. M. Flanagan, H. Tschochner, and R. D. Kornberg, 1994, Science 263:805 – 807.) On the basis of these results, what conclusions can you draw about how the different components of the transcription apparatus interact and which components are responsible for setting the start site? Propose a mechanism for the determination of the start site in eukaryotic RNA polymerase II promoters.
35. The relation between chromatin structure and transcription activity has been the focus of recent research. In one set of experiments, the level of in vitro transcription of a Drosophila gene by RNA polymerase II was studied with the use of DNA and various combinations of histone proteins.
First, the level of transcription was measured for naked DNA with no associated histone proteins. Then, the level of transcription was measured after nucleosome octamers (without H1) were added to the DNA. The addition of the octamers caused the level of transcription to drop by 50%. When both nucleosome octamers and H1 proteins were added to the DNA, transcription was greatly repressed, dropping to less than 1% of that obtained with naked DNA (see the table below). GAL4-VP16 is a protein that binds to the DNA of certain eukaryotic genes. When GAL4-VP16 is added to DNA, the level of RNA polymerase II transcription is greatly elevated. Even in the presence of the H1 protein, GAL4-VP16 stimulates high levels of transcription. Propose a mechanism for how the H1 protein represses transcription and how GAL4-VP16 overcomes this repression. Explain how your proposed mechanism would produce the results obtained in these experiments.

Treatment                          Relative amount of transcription
Naked DNA                          100
DNA + octamers                     50
DNA + octamers + H1                <1
DNA + GAL4-VP16                    1000
DNA + octamers + GAL4-VP16         1000
DNA + octamers + H1 + GAL4-VP16    1000

(Based on experiments reported in an article by G. E. Croston et al., 1991, Science 251:643 – 649.)

14 RNA Molecules and RNA Processing

• The Immense Dystrophin Gene
• Gene Structure
  Gene Organization
  Introns
  The Concept of the Gene Revisited
• Messenger RNA
  The Structure of Messenger RNA
  Pre-mRNA Processing
  The Addition of the 5′ Cap
  The Addition of the Poly(A) Tail
  RNA Splicing
  Alternative Processing Pathways
  RNA Editing
• Transfer RNA
  The Structure of Transfer RNA
  Transfer RNA Gene Structure and Processing
• Ribosomal RNA
  The Structure of the Ribosome
  Ribosomal RNA Gene Structure and Processing

(Chapter-opening photograph courtesy of the Muscular Dystrophy Association.)

The Immense Dystrophin Gene

The most common and devastating of the muscular dystrophies is Duchenne muscular dystrophy, a fatal disease that strikes nearly 1 in 3500 males. At birth, affected boys appear normal. The first symptom is mild muscle weakness appearing between 3 and 5 years of age: the child stumbles frequently, has difficulty climbing stairs, and is unable to rise from a sitting position. In time, the arm and leg muscles become progressively weaker. By age 11, those affected are usually confined to a wheelchair and, by age 20, most persons with Duchenne muscular dystrophy have died. At present, there is no cure for the disease.

Duchenne muscular dystrophy was first recognized in 1852, and the disease was fully described in 1861 by Benjamin A. Duchenne, a Paris physician. Even before Mendel’s laws were discovered, physicians noticed its X-linked pattern of inheritance, remarking that the disease developed almost exclusively in males and seemed to be inherited through unaffected mothers. In spite of this early recognition of its hereditary basis, the biochemical cause of Duchenne muscular dystrophy remained a mystery until 1987.
In 1985, Louis Kunkel and his colleagues at Harvard Medical School observed a boy with Duchenne muscular dystrophy whose X chromosome had a visible deletion on the short arm. Reasoning that this boy’s disease was caused by the absence of a gene within the deletion, they recognized that the deletion pointed to the location on the X chromosome of the gene responsible for Duchenne muscular dystrophy. Kunkel and his colleagues located and cloned the piece of DNA responsible for the disease. Shortly thereafter, the sequence of the gene was determined, and the protein that it encodes was isolated. This large protein, called dystrophin, consists of nearly 4000 amino acids and is an integral component of muscle cells. Persons with Duchenne muscular dystrophy lack functional dystrophin.

The dystrophin gene is among the most remarkable of all genes yet examined. It’s huge, encompassing more than 2 million nucleotides of DNA. However, only about 12,000 of its nucleotides encode its amino acids. Why is the dystrophin gene so large? What are all those other nucleotides doing? The unusual properties of the dystrophin gene make sense only in the context of RNA processing, the alteration of RNA after it has been transcribed. Dystrophin mRNA, like many eukaryotic RNAs, undergoes extensive processing after transcription, including the removal of large sections that are not required for translation.

Chapter 13 focused on transcription, the process of RNA synthesis. In this chapter, we will examine the function and processing of RNA. We begin by taking a careful look at the nature of the gene. Next, we examine messenger RNA, its structure, and how it is modified in eukaryotes after transcription. We’ll also see how, through alternative pathways of RNA modification, one gene can produce several different proteins. Then, we turn to transfer RNA, the adapter molecule that forms the interface between amino acids and mRNA in protein synthesis.
Finally, we examine ribosomal RNA, the structure and organization of rRNA genes, and how rRNAs are processed.

As we explore the world of RNA and its role in gene function, we will see evidence of two important characteristics of this nucleic acid. First, RNA is extremely versatile, both structurally and biochemically. It can assume a number of different secondary structures, which provide the basis for its functional diversity. Second, RNA processing and function frequently include interactions between two or more RNA molecules.

www.whfreeman.com/pierce: More information about Duchenne muscular dystrophy

Gene Structure

What is a gene? In Chapter 3, it was noted that the definition of gene would appear to change as we explored different aspects of heredity. A gene was defined as an inherited factor that determined a trait. This definition may have seemed vague, because it says nothing about what a gene is, only what it does. Nevertheless, this definition was appropriate for our purposes at the time, because our focus was on how genes influence the inheritance of traits. It wasn’t necessary to consider the physical nature of the gene in learning the rules of inheritance.

Knowing something about the chemical structure of DNA and the process of transcription enables us to be more precise about what a gene is. Chapter 10 described how genetic information is encoded in the base sequence of DNA; so a gene consists of a set of DNA nucleotides. But how many nucleotides are encompassed in a gene, and how is the information in these nucleotides organized? In 1902, Archibald Garrod suggested, correctly, that genes code for proteins (see p. 000). Proteins are made of amino acids; so a gene contains the nucleotides that specify the amino acids of a protein. We could, then, define a gene as a set of nucleotides that specifies the amino acid sequence of a protein, which indeed was for many years the working definition of a gene.
As geneticists learned more about the structure of genes, however, it became clear that this concept of a gene was an oversimplification.

Gene Organization

Early work on gene structure was carried out largely through the examination of mutations in bacteria and viruses. This research led Francis Crick in 1958 to propose that genes and proteins are colinear: that there is a direct correspondence between the nucleotide sequence of DNA and the amino acid sequence of a protein ( ◗ FIGURE 14.1).

◗ 14.1 The concept of colinearity suggests that a continuous sequence of nucleotides in DNA encodes a continuous sequence of amino acids in a protein. [Figure: (1) A continuous sequence of nucleotides in the DNA (CGTGGATACACTTTTGCCGTTTCT, paired with GCACCTATGTGAAAACGGCAAAGA) is transcribed into mRNA (5′ CGUGGAUACACUUUUGCCGUUUCU 3′), and (2) its codons are translated into a continuous sequence of amino acids in the polypeptide chain (Arg Gly Tyr Thr Phe Ala Val Ser). Conclusion: With colinearity, the number of nucleotides in the gene is proportional to the number of amino acids in the protein.]

The concept of colinearity suggests that the number of nucleotides in a gene should be proportional to the number of amino acids in the protein encoded by that gene. In a general sense, this concept is true for genes found in bacterial cells and many viruses, although these genes are slightly longer than expected if colinearity is strictly applied (the mRNAs encoded by the genes contain sequences at their ends that do not specify amino acids). At first, eukaryotic genes and proteins also were generally assumed to be colinear, but there were hints that eukaryotic gene structure was fundamentally different. Eukaryotic cells contain far more DNA than is required to encode proteins (the so-called C-value paradox; see Chapter 11).
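The colinear relation drawn in Figure 14.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the lower DNA strand of the figure is assumed to be the template written 3′ to 5′, and the codon table holds only the eight codons that occur in the figure rather than the full genetic code.

```python
# Illustrative sketch of colinearity using the sequences from Figure 14.1.
# Assumption: the lower DNA strand in the figure is the template, written 3'->5'.
# Only the eight codons that appear in the figure are included in this table.
CODON_TABLE = {
    "CGU": "Arg", "GGA": "Gly", "UAC": "Tyr", "ACU": "Thr",
    "UUU": "Phe", "GCC": "Ala", "GUU": "Val", "UCU": "Ser",
}

# Base-pairing rules for copying a DNA template into RNA.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Return the mRNA (5'->3') complementary to a template given 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


def translate(mrna):
    """Read the mRNA three nucleotides at a time and look up each codon."""
    usable = len(mrna) - len(mrna) % 3
    return [CODON_TABLE[mrna[i:i + 3]] for i in range(0, usable, 3)]


template = "GCACCTATGTGAAAACGGCAAAGA"
mrna = transcribe(template)
protein = translate(mrna)

print(mrna)
print(" ".join(protein))
# Colinearity: three nucleotides of coding sequence per amino acid.
print(len(template), "nucleotides ->", len(protein), "amino acids")
```

With these inputs the 24 template nucleotides yield 8 amino acids, the fixed three-to-one proportion that colinearity predicts.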
Furthermore, many large RNA molecules observed in the nucleus were absent from the cytoplasm, suggesting that nuclear RNAs undergo some type of change before they are exported to the cytoplasm.

Most geneticists were nevertheless surprised by the announcement in the 1970s that four coding sequences in a gene from a eukaryotic virus were interrupted by nucleotides that did not specify amino acids. This discovery was made when the viral DNA was hybridized with the mRNA transcribed from it, and the hybridized structure was examined with the use of an electron microscope ( ◗ FIGURE 14.2). The DNA was clearly much longer than the mRNA, because regions of DNA looped out from the hybridized molecules. These regions contained nucleotides in the DNA that were absent from the coding nucleotides in the mRNA. Many other examples of interrupted genes were subsequently discovered; it quickly became apparent that most eukaryotic genes comprise stretches of coding and noncoding nucleotides.

◗ 14.2 The noncolinearity of eukaryotic genes was discovered by hybridizing DNA and mRNA. (Electromicrograph from O.L. Miller, B.R. Beatty, D.W. Fawcett/Visuals Unlimited.) [Figure: Experiment. Question: Is the coding sequence in a gene always continuous? (1) Mix DNA with complementary RNA and heat to separate DNA strands. (2) Cool the mixture. Complementary sequences pair. DNA may reanneal with its complementary strand or with RNA. Noncoding regions of DNA are seen as loops. Conclusion: Coding sequences in a gene may be interrupted by noncoding sequences.]

Concepts
When a continuous sequence of nucleotides in DNA encodes a continuous sequence of amino acids in a protein, the two are said to be colinear. The discovery of coding and noncoding regions within eukaryotic genes shows that not all genes are colinear with the proteins that they encode.

Introns

Many eukaryotic genes contain coding regions called exons and noncoding regions called intervening sequences or introns. For example, the ovalbumin gene has eight exons and seven introns; the gene for cytochrome b has five exons and four introns ( ◗ FIGURE 14.3). All the introns and the exons are initially transcribed into RNA but, after transcription, the introns are removed by splicing and the exons are joined to yield the mature RNA.

◗ 14.3 The coding sequences of many eukaryotic genes are disrupted by noncoding introns. [Figure: the ovalbumin gene (exons 1 to 8) and the cytochrome b gene (exons 1 to 5), each shown as DNA with exons separated by introns. DNA is transcribed into RNA, and introns are removed by RNA splicing to give the mRNA.]

Introns are common in eukaryotic genes but are rare in bacterial genes. For a number of years after their discovery, introns were thought to be entirely absent from prokaryotic genomes, but they have now been observed in archaea, bacteriophages, and even some eubacteria. Introns are present in mitochondrial and chloroplast genes, as well as nuclear genes. In eukaryotic genomes, the size and number of introns appear to be directly related to increasing organismal complexity. Yeast genes contain only a few short introns; Drosophila introns are longer and more numerous; and most vertebrate genes are interrupted by long introns. All classes of genes (those that code for rRNA, tRNA, and proteins) may contain introns. The number and size of introns vary widely: some eukaryotic genes have no introns, whereas others may have more than 60; intron length varies from fewer than 200 nucleotides to more than 50,000. Introns tend to be longer than exons, and most eukaryotic genes contain more noncoding nucleotides than coding nucleotides. Finally, most introns do not encode proteins (an intron of one gene is not usually an exon for another), although there are exceptions.
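The bookkeeping behind exons and introns can be illustrated with a short Python sketch. Everything here is hypothetical: the toy transcript, the segment labels, and the function names are invented for illustration, and lowercase letters are used for intron bases purely as a visual convention. The sketch models only the outcome of splicing (introns removed, exons joined in their original order), not its chemistry.

```python
# Toy model of a split gene: a transcript made of alternating exons and introns.
# The sequences below are invented for illustration only.
segments = [
    ("exon", "AUGGCC"),
    ("intron", "guaaguuuuuag"),
    ("exon", "UUUGGA"),
    ("intron", "guacccag"),
    ("exon", "UAA"),
]


def locate_exons(parts):
    """Return half-open (start, end) coordinates of each exon in the transcript."""
    coords, position = [], 0
    for kind, sequence in parts:
        if kind == "exon":
            coords.append((position, position + len(sequence)))
        position += len(sequence)
    return coords


def splice(transcript, exon_coords):
    """Join the exon segments in their original order, discarding introns."""
    return "".join(transcript[start:end] for start, end in exon_coords)


pre_rna = "".join(sequence for _, sequence in segments)
exons = locate_exons(segments)
mature_rna = splice(pre_rna, exons)
intron_count = sum(1 for kind, _ in segments if kind == "intron")

print(pre_rna)
print(mature_rna)
print(len(exons), "exons,", intron_count, "introns")
```

In this toy transcript, three exons are separated by two introns, the same pattern (one fewer intron than exons) as the eight exons and seven introns of ovalbumin and the five exons and four introns of cytochrome b.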
There are four major types of introns (Table 14.1). Group I introns, found in some rRNA genes, are self-splicing: they can catalyze their own removal. Group II introns are present in some protein-encoding genes of mitochondria, chloroplasts, and a few eubacteria; they also are self-splicing, but their mechanism of splicing differs from that of the group I introns. Nuclear pre-mRNA introns are the best studied; they include introns located in the protein-encoding genes of the nucleus. The splicing mechanism by which these introns are removed is similar to that of the group II introns, but nuclear introns are not self-splicing; their removal requires snRNAs (discussed later) and a number of proteins. Transfer RNA introns, found in tRNA genes, utilize yet another splicing mechanism that relies on enzymes to cut and reseal the RNA. In addition to these major groups, there are several other types of introns.

Table 14.1 Major types of introns

Type of intron      Location                                                   Type of splicing
Group I             Some rRNA genes                                            Self-splicing
Group II            Protein-encoding genes in mitochondria and chloroplasts    Self-splicing
Nuclear pre-mRNA    Protein-encoding genes in the nucleus                      Spliceosomal
tRNA                tRNA genes                                                 Enzymatic

Note: There are also several types of minor introns, including group III introns, twintrons, and archaeal introns.

We’ll take a detailed look at the chemistry and mechanics of RNA splicing later in the chapter. For now, we should keep in mind two general characteristics of the splicing process: (1) the splicing of all pre-mRNA introns takes place in the nucleus and is probably required for RNA to move to the cytoplasm; and (2) the order of exons in DNA is usually maintained in the spliced RNA: the coding sequences of a gene may be split up, but they are not usually jumbled up.

Concepts
Many eukaryotic genes contain exons and introns, both of which are transcribed into RNA, but introns are later removed by RNA processing.
The number and size of introns vary from gene to gene; they are common in many eukaryotic genes but uncommon in bacterial genes.

The Concept of the Gene Revisited

How does the presence of introns affect our concept of a gene? It no longer seems appropriate to define a gene as a sequence of nucleotides that codes for amino acids in a protein, because this definition excludes from the gene those sequences in introns that don’t specify amino acids. This definition also excludes nucleotides that code for the 5′ and 3′ ends of an mRNA molecule, which are required for translation but do not code for amino acids. And defining a gene in these terms also excludes sequences that encode rRNA, tRNA, and other RNAs that do not encode proteins. In view of our current understanding of DNA structure and function, we need a more satisfactory definition of gene.

Many geneticists have broadened the concept of a gene to include all sequences in the DNA that are transcribed into a single RNA molecule. Defined in this way, a gene includes all exons, introns, and those sequences at the beginning and end of the RNA that are not translated into a protein. This definition also includes DNA sequences that code for rRNAs, tRNAs, and other types of non-messenger RNA. Many geneticists have expanded the definition of a gene even further, to include the entire transcription unit: the promoter, the RNA coding sequence, and the terminator.

Concepts
The discovery of introns forced a reevaluation of the definition of the gene. Today, a gene is often defined as a DNA sequence that codes for an RNA molecule.

Messenger RNA

As soon as DNA was identified as the source of genetic information, it became clear that DNA could not directly encode proteins. In eukaryotic cells, DNA resides in the nucleus, yet most protein synthesis takes place in the cytoplasm. Geneticists recognized that an additional molecule must take part in the transfer of genetic information.

The results of studies of bacteriophage infection conducted in the late 1950s and early 1960s pointed to RNA as a likely candidate for this transport function. Bacteriophages inject their DNA into bacterial cells, where the DNA is replicated, and large amounts of phage protein are produced on the bacterial ribosomes. As early as 1953, Alfred Hershey discovered a type of RNA that was synthesized rapidly after bacteriophage infection. Findings from later studies showed that the bacteriophage T2 produced short-lived RNA having a nucleotide composition similar to that of phage DNA but quite different from that of the bacterial RNA. These observations were consistent with the idea that RNA was copied from DNA and that this RNA then directed the synthesis of proteins.

At the time, ribosomes were known to be somehow implicated in protein synthesis, and much of the RNA in a cell was known to be in the form of ribosomes. Each gene was thought to direct the synthesis of a special type of ribosome in the nucleus, which then moved to the cytoplasm and produced a specific protein. Using equilibrium density-gradient centrifugation (see Figure 10.2), Sydney Brenner, François Jacob, and Matthew Meselson demonstrated in 1961 that new ribosomes are not produced during the burst of protein synthesis that accompanies phage infection ( ◗ FIGURE 14.4). The genetic information needed to produce new phage proteins was not carried by the ribosomes.

◗ 14.4 Brenner, Jacob, and Meselson demonstrated that ribosomes do not carry genetic information. [Figure: Experiment. Question: Do ribosomes carry genetic information? (1) E. coli were grown in medium containing heavy isotopes (15N and 13C) through several generations so that the heavy isotopes would become incorporated into all E. coli ribosomes. (2) The cells were moved into medium containing light isotopes (14N and 12C) (3) and infected with bacteriophage. (4) New ribosomes produced after phage infection would contain 14N and 12C and would be relatively light. (5) After phage proteins were produced, ribosomes were separated by equilibrium density-gradient centrifugation. (6) Only old ribosomes, containing heavy isotopes (15N and 13C), were found. Conclusion: Ribosomes are not produced during phage reproduction.]

In a related experiment, François Gros and his colleagues infected E. coli cells with bacteriophages while radioactively labeled (“hot”) uracil was added to the medium (which would become incorporated into newly produced phage RNA). After a few minutes, they transferred the cells to a medium that contained unlabeled (“cold”) uracil. This type of experiment is called a pulse-chase experiment: the cells are exposed to a brief pulse of label, which is then “chased” by cold, unlabeled precursor. Pulse-chase experiments make it possible to follow, by tracking the presence of the radioactivity, products of short-term biochemical events, such as RNA synthesis immediately following phage infection. Gros and his coworkers found that the newly produced phage RNA was short lived, lasting only a few minutes, and was associated with ribosomes but was distinct from them. They concluded that newly synthesized, short-lived RNA carries the genetic information for protein structure to the ribosome. The term messenger RNA was coined for this carrier.

The Structure of Messenger RNA

Messenger RNA functions as the template for protein synthesis; it carries genetic information from DNA to a ribosome and helps to assemble amino acids in their correct order.
Each amino acid in a protein is specified by a set of three nucleotides in the mRNA, called a codon. Both prokaryotic and eukaryotic mRNAs contain three primary regions ( ◗ FIGURE 14.5). The 5′ untranslated region (5′ UTR; sometimes called the leader) is a sequence of nucleotides at the 5′ end of the mRNA that does not code for the amino acid sequence of a protein. In bacterial mRNA, this region contains a consensus sequence called the Shine-Dalgarno sequence, which serves as the ribosome-binding site during translation; it is found approximately seven nucleotides upstream of the first codon translated into an amino acid (called the start codon). Eukaryotic mRNA has no equivalent consensus sequence in its 5′ untranslated region. In eukaryotic cells, ribosomes bind to a modified 5′ end of mRNA, as discussed later in the chapter.

The next section of mRNA is the protein-coding region, which comprises the codons that specify the amino acid sequence of the protein. The protein-coding region begins with a start codon and ends with a stop codon. The last region of mRNA is the 3′ untranslated region (3′ UTR), a sequence of nucleotides at the 3′ end of mRNA that is not translated into protein. The 3′ untranslated region affects the stability of mRNA and the translation of the mRNA protein-coding sequence.

Concepts
Messenger RNA molecules contain three main regions: a 5′ untranslated region, a protein-coding region, and a 3′ untranslated region. The 5′ and 3′ untranslated regions do not code for the amino acids of a protein.

◗ 14.5 Three primary regions of mature mRNA are the 5′ untranslated region, the protein-coding region, and the 3′ untranslated region. [Figure: an mRNA drawn 5′ to 3′, showing the 5′ untranslated region (with the Shine-Dalgarno sequence, in prokaryotes only), the start codon, the protein-coding region, the stop codon, and the 3′ untranslated region.]
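The three-region layout of Figure 14.5 can be sketched as a small Python function. This is a simplified illustration, not a gene-finding method: the function name and the toy mRNA are hypothetical, and the code assumes (as background facts not stated in this section) that the start codon is AUG, that the stop codons are UAA, UAG, and UGA, and that the first AUG in the molecule is the one used.

```python
# Simplified sketch: split an mRNA into 5' UTR, protein-coding region, and 3' UTR.
# Assumptions (not from the text): the first AUG is the start codon, and the
# coding region ends at the first in-frame stop codon (UAA, UAG, or UGA).
STOP_CODONS = {"UAA", "UAG", "UGA"}


def split_mrna(mrna):
    """Return (five_prime_utr, coding_region, three_prime_utr)."""
    start = mrna.find("AUG")
    if start == -1:
        raise ValueError("no start codon found")
    # Step through the reading frame set by the start codon, one codon at a time.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            end = i + 3  # the stop codon is counted as part of the coding region
            return mrna[:start], mrna[start:end], mrna[end:]
    raise ValueError("no in-frame stop codon found")


# Hypothetical toy mRNA, written 5' to 3'.
toy_mrna = "GGAGGUAACC" + "AUGGCUUUUUAA" + "CCGUAAA"
utr5, coding, utr3 = split_mrna(toy_mrna)

print("5' UTR:", utr5)
print("coding:", coding)
print("3' UTR:", utr3)
```

Note that the UAA inside the toy 3′ untranslated region is never examined: scanning stops at the first in-frame stop codon, which is why only codons in the frame set by the start codon matter.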
Pre-mRNA Processing

In bacterial cells, transcription and translation take place simultaneously; while the 3′ end of an mRNA is undergoing transcription, ribosomes attach to the Shine-Dalgarno sequence near the 5′ end and begin translation. Because transcription and translation are coupled, there is little opportunity for the bacterial mRNA to be modified before protein synthesis. In contrast, transcription and translation are separated in both time and space in eukaryotic cells. Transcription takes place in the nucleus, whereas most translation takes place in the cytoplasm; this separation provides an opportunity for eukaryotic RNA to be modified before it is translated. Indeed, eukaryotic mRNA is extensively altered after transcription. Changes are made to the 5′ end, the 3′ end, and the protein-coding section of the RNA molecule. The initial transcript of protein-encoding genes of eukaryotic cells is called pre-mRNA, whereas the mature, processed transcript is mRNA. We will reserve the term mRNA for RNA molecules that have been completely processed and are ready to undergo translation.

The Addition of the 5′ Cap

Almost all eukaryotic pre-mRNAs are modified at their 5′ ends by the addition of a structure called a 5′ cap. This capping consists of the addition of an extra nucleotide at the 5′ end of the mRNA and methylation by the addition of a methyl group (CH3) to the base in the newly added nucleotide and to the 2′-OH group of the sugar of one or more nucleotides at the 5′ end ( ◗ FIGURE 14.6). Capping takes place rapidly after the initiation of transcription and, as will be discussed in more depth in Chapter 15, the 5′ cap functions in the initiation of translation. Cap-binding proteins recognize the cap and attach to it; a ribosome then binds to these proteins and moves downstream along the mRNA until the start codon is reached and translation begins. The presence of a 5′ cap also increases the stability of mRNA and influences the removal of introns.
In the discussion of transcription in Chapter 13, it was noted that three phosphates are present at the 5′ end of all RNA molecules, because phosphates are not cleaved from the first ribonucleoside triphosphate in the transcription reaction. The 5′ end of pre-mRNA can be represented as 5′-pppNpNpN . . . , in which the letter N represents a ribonucleotide and p represents a phosphate. Shortly after the initiation of transcription, one of these phosphates is removed and a guanine nucleotide is added (see Figure 14.6). This guanine nucleotide is attached to the pre-mRNA by a unique 5′-5′ bond, which is quite different from the usual 5′-3′ phosphodiester bond that joins all the other nucleotides in RNA. One or more methyl groups are then added to the 5′ end; the first of these methyl groups is added to position 7 of the base of the terminal guanine nucleotide, making the base 7-methylguanine. Next, a methyl group may be added to the 2′ position of the sugar in the second and third nucleotides, as shown in Figure 14.6. Rarely, additional methyl groups may be attached to the bases of the second and third nucleotides of the pre-mRNA.

◗ 14.6 Most eukaryotic mRNAs have a 5′ cap. The cap consists of a nucleotide with 7-methylguanine attached to the pre-mRNA by a unique 5′-5′ bond. The cap is added shortly after the initiation of transcription. A methyl group is added to position 7 of the guanine base of the newly added (now the terminal) nucleotide and to the 2′ position of each sugar of the next two nucleotides. Steps shown: (1) one of the three phosphates at the 5′ end of the mRNA is removed, (2) and a guanine nucleotide (with its phosphate) is added; (3) methyl groups are added to position 7 of the base of the terminal guanine nucleotide (4) and to the 2′ position of the sugar in the second and third nucleotides; (5) the base on the initial nucleotide also may be methylated.

The Addition of the Poly(A) Tail

Most mature eukaryotic mRNAs have from 50 to 250 adenine nucleotides at the 3′ end (a poly(A) tail). These nucleotides are not encoded in the DNA but are added after transcription ( ◗ FIGURE 14.7) in a process termed polyadenylation. Many eukaryotic genes transcribed by RNA polymerase II are transcribed well beyond the end of the coding sequence (see Chapter 13); the extra material at the 3′ end is then cleaved and the poly(A) tail is added. For some pre-mRNA molecules, more than 1000 nucleotides may be cleaved from the 3′ end. Processing of the 3′ end of pre-mRNA requires sequences both upstream and downstream of the cleavage site ( ◗ FIGURE 14.8a). The consensus sequence AAUAAA is usually from 11 to 30 nucleotides upstream of the cleavage site (see Figure 14.7) and determines the point at which cleavage will take place. A sequence rich in Us (or Gs and Us) is typically downstream of the cleavage site. In mammals, 3′ cleavage and the addition of the poly(A) tail require a complex consisting of several proteins: cleavage and polyadenylation specificity factor (CPSF); cleavage stimulation factor (CstF); at least two cleavage factors (CFI and CFII); and polyadenylate polymerase (PAP). CPSF binds to the upstream AAUAAA consensus sequence, whereas CstF binds to the downstream sequence ( ◗ FIGURE 14.8b).
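The positional rule for the poly(A) signal can be illustrated with a short Python sketch. It is a toy model under stated assumptions: the function names cleavage_windows and cleave_and_polyadenylate are hypothetical, the gap of 11 to 30 nucleotides is counted from the last base of AAUAAA (one possible convention), and the downstream U-rich sequence and protein factors are ignored.

```python
def cleavage_windows(pre_mrna, signal="AAUAAA", min_gap=11, max_gap=30):
    """Return (first, last) 0-based cut positions allowed by each signal found.

    Assumption: the 11-30 nucleotide gap is measured from the end of the signal.
    """
    windows = []
    pos = pre_mrna.find(signal)
    while pos != -1:
        signal_end = pos + len(signal)
        first = signal_end + min_gap
        last = min(signal_end + max_gap, len(pre_mrna))
        if first <= len(pre_mrna):
            windows.append((first, last))
        pos = pre_mrna.find(signal, pos + 1)
    return windows

def cleave_and_polyadenylate(pre_mrna, cut, tail_length=50):
    """Discard everything 3' of the cut and append a poly(A) tail."""
    return pre_mrna[:cut] + "A" * tail_length

pre_mrna = "GGCC" + "AAUAAA" + "C" * 40
windows = cleavage_windows(pre_mrna)
print(windows)  # [(21, 40)]
mrna = cleave_and_polyadenylate(pre_mrna, cut=windows[0][0], tail_length=5)
print(mrna)
```

The tail is appended rather than copied from the template, which mirrors the point that the adenine nucleotides are not encoded in the DNA.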
◗ 14.7 Most eukaryotic mRNAs have a 3′ poly(A) tail. Pre-mRNA is cleaved at a position from 11 to 30 nucleotides downstream of an AAUAAA consensus sequence in the 3′ untranslated region. The addition of adenine nucleotides (polyadenylation) then takes place at the 3′ end of the pre-mRNA, generating the poly(A) tail. Conclusion: In pre-mRNA processing, a poly(A) tail is added through cleavage and polyadenylation.

The pre-mRNA is cleaved, and CstF and the cleavage factors leave the complex; the cleaved 3′ end of the pre-mRNA is then degraded ( ◗ FIGURE 14.8c). CPSF and PAP remain bound to the pre-mRNA and carry out polyadenylation ( ◗ FIGURE 14.8d). After the addition of approximately 10 adenine nucleotides, a poly(A)-binding protein (PABII) attaches to the poly(A) tail and increases the rate of polyadenylation ( ◗ FIGURE 14.8e). As more of the tail is synthesized, additional molecules of PABII attach to it ( ◗ FIGURE 14.8f). The poly(A) tail confers stability on many mRNAs, increasing the time during which the mRNA remains intact and available for translation before it is degraded by cellular enzymes. The stability conferred by the poly(A) tail depends on the proteins that attach to the tail. Eukaryotic mRNAs that lack a poly(A) tail depend on a different mechanism for 3′ cleavage that requires the formation of a hairpin structure in the pre-mRNA and a small ribonucleoprotein particle (snRNP) called U7 ( ◗ FIGURE 14.9). U7 contains an snRNA with nucleotides that are complementary to a sequence on the pre-mRNA just downstream of the cleavage site, and U7 most likely binds to this sequence.
A hairpin-binding protein binds to the hairpin structure and stabilizes the binding of U7 to the complementary sequence on the pre-mRNA.

◗ 14.8 Processing of the 3′ end of pre-mRNA requires a consensus sequence and several factors. Steps shown: (1) a complex of proteins links the consensus sequence and downstream U-rich sequence; (2) cleavage occurs, and the cleavage factors and the 3′ end of the pre-mRNA are released; (3) the 3′ end of the pre-mRNA is degraded; (4) polyadenylate polymerase adds adenine nucleotides to the 3′ end of the new mRNA, (5) and poly(A)-binding protein (PABII) attaches to the poly(A) tail and increases the rate of polyadenylation; (6) polyadenylation and continued PABII binding elongate the poly(A) tail.

◗ 14.9 Eukaryotic mRNAs that lack a poly(A) tail depend on a different mechanism for 3′ cleavage. Cleavage requires the presence of U7 snRNA, which has bases complementary to a consensus sequence downstream of the 3′ cleavage site. Cleavage depends on the formation of a hairpin structure near the 3′ end of the pre-mRNA; base pairing probably takes place between the complementary regions of the pre-mRNA and the U7 snRNA.

Concepts
Eukaryotic pre-mRNAs are processed at their 5′ and 3′ ends. A cap, consisting of a modified nucleotide and several methyl groups, is added to the 5′ end. The cap facilitates the binding of a ribosome, increases the stability of the mRNA, and may affect the removal of introns. Processing at the 3′ end includes cleavage downstream of an AAUAAA consensus sequence and the addition of a poly(A) tail.

RNA Splicing

The other major type of modification that takes place in eukaryotic pre-mRNA is the removal of introns by RNA splicing. Splicing occurs in the nucleus following transcription but before the RNA moves to the cytoplasm.

Consensus sequences and the spliceosome

Splicing requires the presence of three sequences in the intron. One end of the intron is referred to as the 5′ splice site, and the other end is the 3′ splice site ( ◗ FIGURE 14.10); these splice sites possess short consensus sequences. Most introns in pre-mRNA begin with GU and end with AG, suggesting that these sequences play a crucial role in splicing. Changing a single nucleotide at either of these sites does indeed prevent splicing. A few introns in pre-mRNA begin with AU and end with AC. These introns are spliced by a process that is similar to that seen in GU...AG introns but utilizes a different set of splicing factors. This discussion will focus on splicing of the more common GU...AG introns. The third sequence important for splicing is at the so-called branch point, which is an adenine nucleotide that lies from 18 to 40 nucleotides upstream of the 3′ splice site (see Figure 14.10). The sequence surrounding the branch point does not have a strong consensus but usually takes the form YNYYRAY (Y is any pyrimidine, N is any base, R is any purine, and A is adenine). The deletion or mutation of the adenine nucleotide at the branch point prevents splicing.

Splicing takes place within a large complex called the spliceosome, which consists of several RNA molecules and many proteins. The RNA components are small nuclear RNAs (Chapter 13); these snRNAs associate with proteins to form small ribonucleoprotein particles. Each snRNP contains a single snRNA molecule and multiple proteins.
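The three intron sequences just described (GU at the 5′ splice site, AG at the 3′ splice site, and a branch-point adenine within a YNYYRAY context) can be checked with a short Python sketch. The function name check_intron is hypothetical, and the distance convention is an assumption: the 18 to 40 nucleotides are counted as the intron nucleotides lying downstream of the branch-point adenine.

```python
import re

# YNYYRAY written with RNA bases: Y = C or U, R = A or G, N = any base.
# A lookahead is used so that overlapping candidate sites are all reported.
BRANCH_CONTEXT = re.compile(r"(?=[CU][ACGU][CU][CU][AG]A[CU])")

def check_intron(intron, min_dist=18, max_dist=40):
    """Report the GU...AG ends and candidate branch-point adenines of an intron."""
    branch_points = []
    for match in BRANCH_CONTEXT.finditer(intron):
        a_index = match.start() + 5           # the A in YNYYRAY
        downstream = len(intron) - a_index - 1  # nucleotides 3' of the A
        if min_dist <= downstream <= max_dist:
            branch_points.append(a_index)
    return {
        "begins_with_GU": intron.startswith("GU"),
        "ends_with_AG": intron.endswith("AG"),
        "branch_points": branch_points,
    }

intron = "GU" + "C" * 10 + "UACUAAC" + "C" * 20 + "AG"
report = check_intron(intron)
print(report)
```

For this made-up intron the sketch reports GU and AG ends and a single branch-point candidate at index 17, which lies 23 nucleotides upstream of the 3′ end of the intron. Replacing that adenine with another base leaves no candidate, consistent with the observation that mutating the branch-point adenine prevents splicing.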
The spliceosome is composed of five snRNPs, named for the snRNAs that they contain (U1, U2, U4, U5, and U6), and some proteins not associated with an snRNA.

Concepts
Introns in nuclear genes contain three consensus sequences critical to splicing: a 5′ splice site, a 3′ splice site, and a branch point. Splicing of pre-mRNA takes place within a large complex called the spliceosome, which consists of snRNAs and proteins.

◗ 14.10 Splicing of pre-mRNA requires consensus sequences. In the consensus sequence surrounding the branch point (YNYYRAY), Y is any pyrimidine, R is any purine, A is adenine, and N is any base. (Diagram labels, 5′ to 3′: exon 1; 5′ splice site with the 5′ consensus sequence (C/A)AG/GU(A/G)AGU; intron; branch point YNYYRAY; 3′ consensus sequence ending in AG at the 3′ splice site; exon 2.) Conclusion: Critical consensus sequences are present at the 5′ splice site, the branch point, and the 3′ splice site.

The process of splicing

To illustrate the process of RNA splicing, we'll first consider the chemical reactions that take place. Then we'll see how these splicing reactions constitute a set of coordinated processes within the context of the spliceosome. Before splicing takes place, an upstream exon (exon 1) and a downstream exon (exon 2) are separated by an intron ( ◗ FIGURE 14.11). Pre-mRNA is spliced in two distinct steps. In the first step, the pre-mRNA is cut at the 5′ splice site. This cut frees exon 1 from the intron, and the 5′ end of the intron attaches to the branch point; that is, the intron folds back on itself, forming a structure called a lariat. The guanine nucleotide in the consensus sequence at the 5′ splice site bonds with the adenine nucleotide at the branch point.
This bonding is accomplished through transesterification, a chemical reaction in which the OH group on the 2′-carbon atom of the adenine nucleotide at the branch point attacks the 5′ phosphodiester bond of the guanine nucleotide at the 5′ splice site, cleaving it and forming a new 5′-2′ phosphodiester bond between the guanine and adenine nucleotides. In the second step of RNA splicing, a cut is made at the 3′ splice site and, simultaneously, the 3′ end of exon 1 becomes covalently attached (spliced) to the 5′ end of exon 2. This bond also forms through a transesterification reaction, in which the 3′-OH group attached to the end of exon 1 attacks the phosphodiester bond at the 3′ splice site, cleaving it and forming a new phosphodiester bond between the 3′ end of exon 1 and the 5′ end of exon 2; the intron is released as a lariat. The intron becomes linear when the bond breaks at the branch point and is then rapidly degraded by nuclear enzymes. The mature mRNA consisting of the exons spliced together is exported to the cytoplasm, where it is translated.

◗ 14.11 The splicing of nuclear introns requires a two-step process. First, cleavage takes place at the 5′ splice site, and a lariat is formed by the attachment of the 5′ end of the intron to the branch point. Second, cleavage takes place at the 3′ splice site, and two exons are spliced together. Steps shown: (1) the mRNA is cut at the 5′ splice site; (2) the 5′ end of the intron attaches to the branch point; (3) a cut is made at the 3′ splice site; (4) the intron is released as a lariat, (5) and the two exons are spliced together; (6) the bond holding the lariat is broken, and the linear intron is degraded; (7) the spliced mRNA is exported to the cytoplasm and translated.

Although splicing is illustrated in Figure 14.11 as a two-step process, the reactions are in fact coordinated within the spliceosome. A key feature of the spliceosome is a series of interactions between the mRNA and snRNAs and between different snRNAs (summarized in Table 14.2). These interactions depend on complementary base pairing between the different RNA molecules and bring the essential components of the pre-mRNA transcript and the spliceosome close together, which makes splicing possible.

Table 14.2 RNA-RNA interactions in pre-mRNA splicing
Interaction | Function
U1 with 5′ splice site | U1 attaches to 5′ end of intron; commits intron to splicing; no direct role in splicing
U2 with branch point | Positions 5′ end of intron near branch point for lariat formation
U2 with U6 | Holds 5′ end of intron near branch point
U6 with 5′ splice site | Positions 5′ end of intron near branch point
U5 with 3′ end of first exon | Anchors first exon to spliceosome subsequent to cleavage; juxtaposes two ends of exon for splicing
U5 with 3′ end of one exon and 5′ end of the other | Juxtaposes two ends of exon for splicing
U4 with U6 | Delivers U6 to intron; no direct role in splicing

The spliceosome is assembled on the pre-mRNA transcript in a step-by-step fashion ( ◗ FIGURE 14.12). First, snRNP U1 attaches to the 5′ splice site, and then U2 attaches to the branch point. A complex consisting of U5 and U4-U6 (which form a single snRNP) joins the spliceosome. At this point, the intron loops over and the 5′ splice site is brought close to the branch point. U1 and U4 dissociate from the spliceosome. The 5′ splice site, 3′ splice site, and branch point are in close proximity, held together by the spliceosome. The two transesterification reactions take place, joining the two exons together and releasing the intron as a lariat.

www.whfreeman.com/pierce An animation of the splicing process

Nuclear organization

RNA splicing takes place in the nucleus and must occur before the RNA can move into the cytoplasm.
For many years, the nucleus was viewed as a biochemical soup, in which components such as the spliceosome diffused and reacted randomly. Now, the nucleus is believed to have a highly ordered internal structure, with transcription and RNA processing taking place at particular locations within it. By attaching fluorescent tags to pre-mRNA and using special imaging techniques, researchers have been able to observe the location of pre-mRNA as it is transcribed and processed. The results of these studies revealed that intron removal and other processing reactions take place at the same sites as those of transcription ( ◗ FIGURE 14.13), suggesting that these processes may be physically coupled. This suggestion is supported by the observation that part of RNA polymerase II is also required for the splicing and 3′ processing of pre-mRNA.

◗ 14.12 RNA splicing takes place within the spliceosome. Steps shown: (1) U1 attaches to the 5′ splice site; (2) U2 attaches to the branch point; (3) U5, U4, and U6 join the spliceosome; (4) U1 and U4 are released; (5) the 5′ splice site, 3′ splice site, and branch point are in close proximity (6) and are held together by pairing between the pre-mRNA and the snRNPs; (7) two transesterification reactions join the exons together and release the intron as a lariat with U2, U5, and U6 attached.

Concepts
Intron splicing of nuclear genes is a two-step process: (1) the 5′ end of the intron is cleaved and attached to the branch point to form a lariat and (2) the 3′ end of the intron is cleaved and the two ends of the exon are spliced together. These reactions take place within the spliceosome.

Self-splicing introns

Some introns are self-splicing, meaning that they possess the ability to remove themselves from an RNA molecule.
These self-splicing introns fall into two major categories. Group I introns are found in a variety of genes, including some rRNA genes in protists, some mitochondrial genes in fungi, and even some bacteriophage genes. Although the lengths of group I introns vary, all of them fold into a common secondary structure with nine looped stems ( ◗ FIGURE 14.14a), which are necessary for splicing. Transesterification reactions are required for the splicing of group I introns ( ◗ FIGURE 14.14b).

◗ 14.13 Intron removal, processing, and transcription take place at the same site. RNA tracks can be seen in the nucleus of a eukaryotic cell. Fluorescent tags were attached to DNA (red) and RNA (green). Transcribed RNA does not disperse; rather, it accumulates near the site of synthesis and follows a defined track during processing.

◗ 14.14 Group I introns undergo self-splicing. (a) Secondary structure of a group I intron. (b) Self-splicing of a group I intron. Steps shown: (1) the 3′-OH group of a guanine nucleotide attacks and cleaves the 5′ end of the intron; (2) the guanine nucleotide is added to the 5′ end of the intron, (3) and a free 3′-OH group is generated at the end of exon 1; (4) the 3′-OH at the end of exon 1 attacks the 5′ end of exon 2, (5) cleaving the intron at its 3′ end, releasing the intron, and splicing the two exons together. Conclusion: A group I intron is removed through a unique self-splicing reaction.
◗ 14.15 Group II introns undergo self-splicing by a different mechanism from that for group I introns. (a) Secondary structure of a group II intron. (b) Self-splicing of group II introns, which is similar to the splicing of nuclear introns. Steps shown: (1) an adenine nucleotide within the intron attacks a guanine nucleotide at the 5′ end of the intron, (2) creating a 3′-OH group at the end of exon 1 and a lariat structure within the intron; (3) the intron is removed as a lariat, (4) and the two exons are spliced together. Conclusion: A group II intron is removed through a self-splicing reaction similar to that of nuclear introns.

Group II introns, present in some mitochondrial genes, also have the ability to self-splice. All group II introns fold into similar secondary structures ( ◗ FIGURE 14.15a). The splicing of group II introns is accomplished by a mechanism that has some similarities to the spliceosome-mediated splicing of nuclear genes; splicing takes place through two transesterification reactions that generate a lariat structure ( ◗ FIGURE 14.15b). Because of these similarities, group II introns and nuclear pre-mRNA introns have been suggested to be evolutionarily related; perhaps the nuclear introns evolved from self-splicing group II introns and later adopted the proteins and snRNAs of the spliceosome to carry out the splicing reaction.

Concepts
Self-splicing introns are of two types: group I introns and group II introns. These introns have complex secondary structures that enable them to catalyze their excision from RNA molecules without the aid of enzymes or other proteins.
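The lariat mechanism shared by nuclear pre-mRNA introns and group II introns amounts to two cuts and two joins, which can be followed with a short Python sketch. This is bookkeeping on strings only, not chemistry; the function name splice_lariat and the example coordinates are hypothetical.

```python
def splice_lariat(pre_mrna, intron_start, intron_end, branch_index):
    """Model the two transesterification steps as string operations.

    The intron is pre_mrna[intron_start:intron_end]; branch_index is the
    position (in pre_mrna) of the branch-point adenine inside the intron.
    """
    if not intron_start <= branch_index < intron_end:
        raise ValueError("branch point must lie within the intron")
    if pre_mrna[branch_index] != "A":
        raise ValueError("branch point must be an adenine")

    # Step 1: cut at the 5' splice site; the 5' end of the intron bonds to
    # the branch-point A, closing the loop of the lariat.
    exon1 = pre_mrna[:intron_start]
    loop = pre_mrna[intron_start:branch_index + 1]

    # Step 2: cut at the 3' splice site; the 3' end of exon 1 is joined to
    # the 5' end of exon 2, and the intron leaves as a lariat (loop + tail).
    tail = pre_mrna[branch_index + 1:intron_end]
    exon2 = pre_mrna[intron_end:]
    return exon1 + exon2, {"loop": loop, "tail": tail}

pre_mrna = "AAGG" + "GUCCUACUAACCCAG" + "UUCC"
mrna, lariat = splice_lariat(pre_mrna, intron_start=4, intron_end=19, branch_index=13)
print(mrna)    # AAGGUUCC
print(lariat)  # {'loop': 'GUCCUACUAA', 'tail': 'CCCAG'}
```

The loop runs from the first nucleotide of the intron to the branch-point adenine, and the tail is the stretch between the branch point and the 3′ splice site.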
Alternative Processing Pathways

Another finding that complicates the view of a gene as a sequence of nucleotides that specifies the amino acid sequence of a protein is the existence of alternative processing pathways, in which a single pre-mRNA is processed in different ways to produce alternative types of mRNA, resulting in the production of different proteins from the same DNA sequence. One type of alternative processing is alternative splicing, in which the same pre-mRNA can be spliced in more than one way to yield multiple mRNAs that are translated into proteins with different amino acid sequences ( ◗ FIGURE 14.16a). Another type of alternative processing requires the use of multiple 3′ cleavage sites ( ◗ FIGURE 14.16b); two or more potential sites for cleavage and polyadenylation are present in the pre-mRNA. In our example, cleavage at the first site produces a relatively short mRNA, compared with the mRNAs produced through cleavage at other sites. Both alternative splicing and multiple 3′ cleavage sites can exist in the same pre-mRNA transcript; an example is seen in the mammalian calcitonin gene, which contains six exons and five introns ( ◗ FIGURE 14.17a). The entire gene is transcribed into pre-mRNA ( ◗ FIGURE 14.17b). There are two possible 3′ cleavage sites. In cells of the thyroid gland, 3′ cleavage and polyadenylation take place after the fourth exon, and the first three introns are then removed to produce a mature mRNA consisting of exons 1, 2, 3, and 4 ( ◗ FIGURE 14.17c). This mRNA is translated into the hormone calcitonin. In brain cells, the identical pre-mRNA is transcribed from DNA, but it is processed differently. Cleavage and polyadenylation take place after the sixth exon, yielding an initial transcript that includes all six exons. During splicing, exon 4 (part of the calcitonin mRNA) is removed, along with all the introns; so only exons 1, 2, 3, 5, and 6 are present in the mature mRNA ( ◗ FIGURE 14.17d).
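The two processing pathways of the calcitonin pre-mRNA can be summarized in a short Python sketch. The table PATHWAYS and the function mature_exons are hypothetical names used only for illustration; the exon numbers are those given in the text.

```python
# For each tissue: the exon after which 3' cleavage and polyadenylation take
# place, and any exons removed together with the introns during splicing.
PATHWAYS = {
    "thyroid": {"cleave_after_exon": 4, "skipped_exons": set()},
    "brain": {"cleave_after_exon": 6, "skipped_exons": {4}},
}

def mature_exons(tissue):
    """Return the exon numbers present in the mature mRNA for a tissue."""
    pathway = PATHWAYS[tissue]
    retained = range(1, pathway["cleave_after_exon"] + 1)
    return [n for n in retained if n not in pathway["skipped_exons"]]

print(mature_exons("thyroid"))  # [1, 2, 3, 4]
print(mature_exons("brain"))    # [1, 2, 3, 5, 6]
```

In both tissues the exons that remain keep their original order; only the choice of cleavage site and of skipped exons differs.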
When translated, the brain-cell mRNA produces a protein called calcitonin-gene-related peptide (CGRP), which has an amino acid sequence quite different from that of calcitonin. Alternative splicing may produce different combinations of exons in the mRNA, but the order of the exons is not usually changed. Different processing pathways contribute to gene regulation, as discussed in Chapter 16.

Concepts
Alternative splicing enables exons to be spliced together in different combinations to yield mRNAs that encode different proteins. Alternative 3′ cleavage sites allow pre-mRNA to be cleaved at different sites to produce mRNAs of different lengths.

◗ 14.16 Eukaryotic cells have alternative pathways for processing pre-mRNA. (a) With alternative splicing, pre-mRNA can be spliced in different ways to produce different mRNAs: either two introns are removed to yield one mRNA, or two introns and exon 2 are removed to yield a different mRNA. (b) With multiple 3′ cleavage sites, there are two or more potential sites for cleavage and polyadenylation; use of the different sites produces mRNAs of different lengths. Conclusion: Both alternative splicing and multiple 3′ cleavage sites produce different mRNAs from a single pre-mRNA.

◗ 14.17 Pre-mRNA encoded by the calcitonin gene undergoes alternative processing. Steps shown: (1) in thyroid cells, cleavage and polyadenylation take place at the end of exon 4, (2) producing an mRNA that contains exons 1, 2, 3, and 4; (3) translation produces the hormone calcitonin; (4) in brain cells, 3′ cleavage takes place at the end of exon 6; (5) during splicing, exon 4 is eliminated with the five introns, (6) producing an mRNA that contains exons 1, 2, 3, 5, and 6; (7) translation yields calcitonin-gene-related peptide (CGRP).

RNA Editing

A long-standing principle of molecular genetics is that genetic information ultimately resides in the nucleotide sequence of DNA, except in RNA viruses (Chapter 10). This information is transcribed into mRNA, and mRNA is then translated into a protein. The assumption that all information about the amino acid sequence of a protein resides in DNA is violated by a process called RNA editing. In RNA editing, the coding sequence of an mRNA molecule is altered after transcription, and so the protein has an amino acid sequence that differs from that encoded by the gene. RNA editing was first detected in 1986 when the coding sequences of mRNAs were compared with the coding sequences of the DNAs from which they had been transcribed. Discrepancies were found for some nuclear genes in mammalian cells and for mitochondrial genes in plant cells. In these cases, substitutions had occurred in some of the nucleotides of the mRNA. More extensive RNA editing has been found in the mRNA for some mitochondrial genes in trypanosome parasites (which cause African sleeping sickness).
In some mRNAs of these organisms, more than 60% of the sequence is determined by RNA editing. Different types of RNA editing have now been observed in mRNAs, tRNAs, and rRNAs from a wide range of organisms; they include the insertion and the deletion of nucleotides and the conversion of one base into another. If the modified sequence in edited RNA molecules doesn't come from a DNA template, then how is it specified? A variety of mechanisms may bring about changes in RNA sequences. In some cases, molecules called guide RNAs (gRNAs) play a crucial role. The gRNAs contain sequences that are partly complementary to segments of the preedited RNA, and the two molecules undergo base pairing in these sequences ( ◗ FIGURE 14.18). After the mRNA is anchored to the gRNA, the mRNA undergoes cleavage and nucleotides are added, deleted, or altered according to the template provided by gRNA. The ends of the mRNA are then joined together. In other cases, enzymes bring about base conversion. In humans, for example, a gene is transcribed into mRNA that codes for a lipid-transporting polypeptide called apolipoprotein-B100, which has 4563 amino acids and is synthesized in liver cells. A truncated form of the protein called apolipoprotein-B48, with only 2153 amino acids, is synthesized in intestinal cells. The truncated protein is produced from an edited version of the same mRNA that codes for apolipoprotein-B100. In editing, an enzyme deaminates a cytosine base, converting it into uracil. This conversion changes a codon that specifies the amino acid glutamine into a stop codon that prematurely terminates translation, resulting in the shortened protein.

◗ 14.18 RNA editing is carried out by guide RNAs. Steps shown: (1) the preedited mRNA pairs with guide RNA; (2) the guide RNA serves as a template for the addition, deletion, or alteration of bases; (3) the mature mRNA is then released. Conclusion: Guide RNA adds nucleotides to the pre-mRNA that were not encoded by the DNA.

Concepts
Individual nucleotides in the interior of pre-mRNA may be changed, added, or deleted by RNA editing. The amino acid sequence produced by the edited mRNA is not the same as that encoded by DNA.

www.whfreeman.com/pierce More information on RNA editing and a database of guide RNA sequences

Connecting Concepts
Eukaryotic Gene Structure and Pre-mRNA Processing

Chapters 13 and 14 have introduced a number of different components of genes and RNA molecules, including promoters, 5′ untranslated regions, coding sequences, introns, 3′ untranslated regions, poly(A) tails, and caps. Let's see how some of these components are combined to create a typical eukaryotic gene and how a mature mRNA is produced from them. The promoter, which typically encompasses about 100 nucleotides upstream of the transcription start site, is necessary for transcription to take place but is itself not usually transcribed when protein-encoding genes are transcribed by RNA polymerase II ( ◗ FIGURE 14.19a). Farther upstream or downstream of the start site, there may be enhancers that also regulate transcription. In transcription, all the nucleotides between the transcription start site and the stop site are transcribed into pre-mRNA, including exons, introns, and a long 3′ end that is later cleaved from the transcript ( ◗ FIGURE 14.19b). Notice that the 5′ end of the first exon contains the sequence that codes for the 5′ untranslated region, and the 3′ end of the last exon contains the sequence that codes for the 3′ untranslated region.
The pre-mRNA is then processed to yield a mature mRNA. The first step in this processing is the addition of a cap to the 5′ end of the pre-mRNA ( ◗ FIGURE 14.19c). Next, the 3′ end is cleaved at a site downstream of the AAUAAA consensus sequence in the last exon ( ◗ FIGURE 14.19d). Immediately after cleavage, a poly(A) tail is added to the 3′ end ( ◗ FIGURE 14.19e). Finally, the introns are removed to yield the mature mRNA ( ◗ FIGURE 14.19f). The mRNA now contains 5′ and 3′ untranslated regions, which are not translated into amino acids, and the nucleotides that carry the protein-coding sequences. The nucleotide sequence of a small gene, with these components labeled, is presented in ◗ FIGURE 14.20.

◗ 14.19 Mature eukaryotic mRNA is produced when pre-mRNA is transcribed and undergoes several types of processing. The enhancer is typically upstream, but could be downstream or in an intron. Steps shown: (1) all introns, exons, and a long 3′ end are transcribed into pre-mRNA; (2) a 5′ cap is added; (3) cleavage at the 3′ end is approximately 10 nucleotides downstream of the consensus sequence; (4) polyadenylation at the cleavage site produces the poly(A) tail; (5) finally, the introns are removed, (6) producing the mature mRNA.

Transfer RNA

In 1956, Francis Crick proposed the idea of a molecule that transported amino acids to the ribosome and interacted with codons in mRNA, placing amino acids in their proper order in protein synthesis.
By 1963, the existence of such an adapter molecule, called transfer RNA, had been confirmed. Transfer RNA (tRNA) serves as a link between the genetic code in mRNA and the amino acids that make up a protein. Each tRNA attaches to a particular amino acid and carries it to the ribosome, where the tRNA adds its amino acid to the growing polypeptide chain at the position specified by the genetic instructions in the mRNA. We’ll take a closer look at the mechanism of this process in Chapter 15.

Each tRNA is capable of attaching to only one type of amino acid. The complex of tRNA plus its amino acid can be written in abbreviated form by adding a three-letter superscript representing the amino acid to the term tRNA. For example, a tRNA that attaches to the amino acid alanine is written as tRNA^Ala. Because 20 different amino acids are found in proteins, there must be a minimum of 20 different types of tRNA. In fact, most organisms possess at least 30 to 40 different types of tRNA, each encoded by a different gene (or, in some cases, multiple copies of a gene) in DNA.

◗ 14.20 This representation of the nucleotide sequence of the human interleukin 2 gene includes the TATA box, transcription start site, start and stop codons, introns, exons, poly(A) consensus sequence, and the 3′ cleavage site. [Annotated nucleotide sequence omitted here; in the figure, stretches of about 2400 bp and 1900 bp within the introns are left out.] Noncoding introns occupy large parts of genes, even when large numbers of bases have been left out. The exons code for fewer than 165 amino acids, a small protein.

The Structure of Transfer RNA

A unique feature of tRNA is the occurrence of rare, modified bases. All RNAs have the four standard bases (adenine, cytosine, guanine, and uracil) specified by DNA, but tRNAs have additional bases, including ribothymine, pseudouracil (which is also occasionally present in snRNAs and rRNA), and dozens of others.
The structures of some of these modified bases are shown in ◗ FIGURE 14.21. If there are only four bases in DNA, and all RNA molecules are transcribed from DNA, how do tRNAs acquire these additional bases? Modified bases arise from chemical changes made to the four standard bases after transcription. These changes are carried out by special tRNA-modifying enzymes. For example, the addition of a methyl group to uracil creates the modified base ribothymine.

◗ 14.21 Modified bases are found in tRNAs. All the modified bases are produced by the chemical alteration of the four standard RNA bases. (The figure shows uracil alongside two modified forms: ribothymidine, produced by the addition of a methyl group, and pseudouridine.)

The structures of all tRNAs are similar, a feature critical to tRNA function. Most tRNAs contain between 74 and 95 nucleotides, some of which are complementary to each other and form intramolecular hydrogen bonds. As a result, each tRNA has a cloverleaf structure (◗ FIGURE 14.22). The cloverleaf has four major arms, most of which consist of a stem and a loop. The stem is formed by the pairing of complementary nucleotides, and the loop lies at the terminus of the stem, where there is no nucleotide pairing. If we start at the top and proceed clockwise around the tRNA shown at the right in Figure 14.22, the four major arms are the acceptor arm, the TψC arm, the anticodon arm, and the DHU arm.

◗ 14.22 All tRNAs possess a common secondary structure, the cloverleaf structure. The base sequence in the flattened model is for tRNA^Ala. Panels: 1 This computer-generated, space-filling molecular model shows the three-dimensional structure of a tRNA. 2 This ribbon model emphasizes the internal regions of base pairing. 3 This flattened cloverleaf model shows pairing between complementary nucleotides. Labels: acceptor arm, with the amino acid attachment site (always CCA) at the 3′ end; TψC arm; extra arm (size varies); anticodon arm (the anticodon comprises three bases and interacts with a codon in mRNA); DHU arm; rare bases; hydrogen bonds between paired bases.

The acceptor arm has no loop but contains the 5′ and 3′ ends of the tRNA molecule. All tRNAs have the same sequence (CCA) at the 3′ end, where the amino acid attaches to the tRNA; so clearly this sequence is not responsible for specifying which amino acid will attach to the tRNA. The TψC arm is named for the bases of three nucleotides in the loop of this arm: thymine (T), pseudouracil (ψ), and cytosine (C). The anticodon arm lies at the bottom of the tRNA. Three nucleotides at the end of this arm make up the anticodon, which pairs with the corresponding codon on mRNA to ensure that the amino acids link in the correct order. The DHU arm is so named because it often contains the modified base dihydrouridine.

Although each tRNA molecule folds into a cloverleaf owing to the complementary pairing of bases, the cloverleaf is not the three-dimensional (tertiary) structure of tRNAs found in the cell. The results of X-ray crystallographic studies have shown that the cloverleaf folds upon itself to form an L-shaped structure, as illustrated by the space-filling and ribbon models in Figure 14.22. Notice that the acceptor stem is at one end of the tertiary structure and the anticodon is at the other end.

Transfer RNA Gene Structure and Processing

The genes that produce tRNAs may be scattered about the genome or may be in clusters. In E. coli, the genes for some tRNAs are present in a single copy, whereas the genes for other tRNAs are present in several copies; eukaryotic cells usually have many copies of each tRNA gene. All tRNA molecules in both bacterial and eukaryotic cells undergo processing after transcription. In E.
coli, several tRNAs are usually transcribed together as one large precursor tRNA, which is then cut up into pieces, each containing a single tRNA. Additional nucleotides may then be removed one at a time from the 5′ and 3′ ends of the tRNA in a process known as trimming. Base-modifying enzymes may then change some of the standard bases into modified bases, and additional bases (such as CCA at the 3′ end) may be added (◗ FIGURE 14.23). Different tRNAs are processed in different ways; so it is not possible to outline a generic processing pathway for all tRNAs.

Eukaryotic tRNAs are processed in a manner similar to that for bacterial tRNAs: most are transcribed as larger precursors that are then cleaved, trimmed, and modified to produce mature tRNAs. Some eukaryotic tRNA genes possess introns of variable length that must be removed in processing. For example, about 40 of the 400 tRNA genes in yeast contain a single intron that is always found adjacent to the 3′ side of the anticodon. The splicing process for tRNA genes (see Figure 14.23) is quite different from the spliceosome-mediated reactions that remove introns from protein-encoding genes. The intron in the precursor tRNA is cut at both ends by an endonuclease enzyme, which releases the linear intron from the rest of the tRNA. The two pieces of tRNA, which are held together by intramolecular bonding, are then folded and ligated to produce the mature tRNA.

Figure 14.23 steps: 1 A large precursor tRNA 2 is cleaved to produce an individual tRNA molecule. 3 An intron is removed by splicing, 4 and bases are added to the 3′ end. 5 Modification of several bases produces the mature tRNA. Conclusion: tRNA processing may include cleavage, splicing, base addition, and base modification.
◗ 14.23 Transfer RNAs are processed in both bacterial and eukaryotic cells. Different tRNAs are modified in different ways. One example is shown here.

Concepts
All tRNAs are similar in size and have a common secondary structure known as the cloverleaf. Transfer RNAs contain modified bases and are extensively processed after transcription in both bacterial and eukaryotic cells.

Ribosomal RNA

Within ribosomes, the genetic instructions contained in mRNA are translated into the amino acid sequences of polypeptides. Thus, ribosomes play an integral part in the transfer of genetic information from genotype to phenotype. We will examine the role of ribosomes in the process of translation in Chapter 15. Here, we will consider ribosome structure and examine how ribosomes are processed before becoming functional.

The Structure of the Ribosome

The ribosome is one of the most abundant organelles in the cell: a single bacterial cell may contain as many as 20,000 ribosomes, and eukaryotic cells possess even more. Ribosomes typically contain about 80% of the total cellular RNA. They are complex organelles, each consisting of more than 50 different proteins and RNA molecules (Table 14.3).

A functional ribosome consists of two subunits, a large subunit and a small subunit, each of which consists of one or more pieces of RNA and a number of proteins. The sizes of the ribosomes and their RNA components are given in Svedberg (S) units (a measure of how rapidly an object sediments in a centrifugal field). It is important to note that Svedberg units are not additive; in other words, combining a 10S structure and a 20S structure does not necessarily produce a 30S structure, because the sedimentation rate is affected by the three-dimensional structure as well as the mass.

Table 14.3 Composition of ribosomes in bacterial and eukaryotic cells
Cell Type | Ribosome Size | Subunit | rRNA Component | Proteins
Bacterial | 70S | Large (50S) | 23S (2900 nucleotides), 5S (120 nucleotides) | 31
Bacterial | 70S | Small (30S) | 16S (1500 nucleotides) | 21
Eukaryotic | 80S | Large (60S) | 28S (4700 nucleotides), 5.8S (160 nucleotides), 5S (120 nucleotides) | 49
Eukaryotic | 80S | Small (40S) | 18S (1900 nucleotides) | 33
Note: The letter S stands for Svedberg unit.

Ribosomal RNA Gene Structure and Processing

The genes for rRNA, like those for tRNA, can be present in multiple copies, and the numbers vary among species (Table 14.4); all copies of the rRNA gene in a species are identical or nearly identical. In bacteria, rRNA genes are dispersed, but, in eukaryotic cells, they are clustered, with the genes arrayed in tandem, one after another.

Table 14.4 Number of rRNA genes in different organisms
Species | Copies of rRNA Genes per Genome
Escherichia coli | 7
Yeast | 100–200
Human | 280
Frog | 450

Eukaryotic cells possess two types of rRNA genes: a large one that encodes 18S rRNA, 28S rRNA, and 5.8S rRNA, and a small one that encodes the 5S rRNA. All three bacterial rRNAs (23S rRNA, 16S rRNA, and 5S rRNA) are encoded by a single type of gene.

Ribosomal RNA is processed in both bacterial and eukaryotic cells. In E. coli, the immediate product of transcription is a 30S rRNA precursor (◗ FIGURE 14.24a). Methyl groups (CH3) are added to specific bases and to the 2′ carbon of some of the ribose sugars of this 30S precursor, which is then cleaved into several pieces and trimmed to produce 16S rRNA, 23S rRNA, and 5S rRNA, along with one or more tRNAs. All rRNA genes in E. coli produce the same three rRNA molecules, but the number and location of these rRNAs within the 30S rRNA transcript differ among genes.

Eukaryotic rRNAs undergo similar processing (◗ FIGURE 14.24b). Small nucleolar RNAs (snoRNAs) help to cleave and modify eukaryotic rRNAs (as well as some archaeal rRNAs), and help to assemble the processed rRNAs into mature ribosomes. The snoRNAs have extensive complementarity to the rRNA sequences where modification takes place. Interestingly, some snoRNAs are encoded by sequences in the introns of other protein-encoding genes.

Concepts
A ribosome is a complex organelle consisting of several rRNA molecules and many proteins. Each functional ribosome consists of a large and a small subunit. rRNAs in both bacterial and eukaryotic cells are modified after transcription. In eukaryotes, rRNA processing is carried out by small nucleolar RNAs (snoRNAs).

◗ 14.24 Ribosomal RNA is processed after transcription. Note that eukaryotic rRNA does not undergo trimming and that 5S rRNA is transcribed separately from the small eukaryotic rRNA gene. Panels: (a) Prokaryotic rRNAs: the 30S precursor rRNA transcript yields 16S rRNA, tRNA, 23S rRNA, and 5S rRNA. (b) Eukaryotic rRNAs: the 45S precursor rRNA transcript yields 18S rRNA, 5.8S rRNA, and 28S rRNA. Steps: 1 Methyl groups are added to specific bases and to the 2′-carbon atom of some ribose sugars. 2 The RNA is cleaved into several intermediates 3 and trimmed. 4 Mature rRNA molecules are the result.

Connecting Concepts Across Chapters

Because it is single stranded and can form hydrogen bonds between complementary bases on the same strand, RNA is capable of assuming a number of secondary structures. This ability gives RNA functional flexibility, and it assumes a number of important roles in information transfer within the cell.

A central theme in this chapter has been the nature of the gene. The concept of a gene has changed with time and even today depends on the particular question that is being addressed.
A modern definition used by many geneticists is: a gene is a sequence of nucleotides in DNA that is transcribed into a single RNA molecule.

The details of RNA function and processing covered in this chapter are important for understanding the process of protein synthesis, which is the focus of Chapter 15. Knowledge of the structure of the ribosome and tRNAs will be important for understanding how amino acids are assembled into a protein. In eukaryotic cells, features that are added to pre-mRNA [such as the 5′ cap and the poly(A) tail] and those removed from it (introns) are essential for translation to proceed properly. These features of processed mRNA also play an important role in eukaryotic gene regulation, a subject to be addressed in Chapter 16.

CONCEPTS SUMMARY

• The discovery of introns in eukaryotic genes forced the redefinition of the gene at the molecular level. Today, a gene is often defined as a sequence of DNA nucleotides that is transcribed into a single RNA molecule.
• Some introns found in rRNA genes and mitochondrial genes are self-splicing.
• Introns are noncoding sequences that interrupt the coding sequences (exons) of genes. Common in eukaryotic cells but rare in bacterial cells, introns exist in all types of genes and vary in size and number. They comprise four major types: group I introns, group II introns, nuclear pre-mRNA introns, and tRNA introns.
• Messenger RNAs may also be altered by the addition, deletion, or modification of nucleotides in the coding sequence, a process called RNA editing.
• The results of experiments in the late 1950s and early 1960s suggested that genetic information is carried from DNA to ribosomes by short-lived RNA molecules called messenger RNA. An mRNA molecule has three primary parts: a 5′ untranslated region, a protein-coding sequence, and a 3′ untranslated region.
• Bacterial mRNA is translated immediately after transcription and undergoes little processing.
• The primary transcript (pre-mRNA) of a eukaryotic protein-encoding gene is extensively processed: a modified nucleotide and methyl groups, collectively termed the cap, are added to the 5′ end of pre-mRNA; the 3′ end is cleaved and a poly(A) tail is added; and introns are removed.
• The process of RNA splicing takes place within a structure called the spliceosome, which is composed of several small nuclear RNAs and proteins. RNA splicing takes place in a two-step process that entails RNA–RNA interactions among snRNAs of the spliceosome and the pre-mRNA.
• Some pre-mRNAs undergo alternative splicing, in which different combinations of exons are spliced together or different 3′ cleavage sites are used.
• Transfer RNA serves as a bridge between amino acids and the genetic information carried in mRNA. Transfer RNAs are relatively short molecules that assume a common secondary structure and contain modified bases. Most organisms have multiple copies of tRNA genes; the tRNAs transcribed from these genes are extensively processed in bacterial and eukaryotic cells.
• Ribosomes are the sites of protein synthesis in the cell. Each ribosome is composed of several rRNA molecules and a number of proteins that form a large and a small subunit. Genes for rRNA exist in multiple copies; the primary transcripts from these genes are extensively modified after transcription in bacterial and eukaryotic cells. In eukaryotic cells, rRNA processing is carried out by small nucleolar RNAs.

IMPORTANT TERMS

colinearity; exon; intron; group I intron; group II intron; nuclear pre-mRNA intron; tRNA intron; codon; 5′ untranslated region; Shine-Dalgarno sequence; protein-coding region; 3′ untranslated region; 5′ cap; poly(A) tail; RNA splicing; 5′ splice site; 3′ splice site; branch point; spliceosome; lariat; transesterification; alternative processing pathway; alternative splicing; multiple 3′ cleavage site; RNA editing; guide RNA; modified base; tRNA-modifying enzyme; cloverleaf structure; anticodon; large ribosomal subunit; small ribosomal subunit

Worked Problems

1. DNA from a eukaryotic gene was isolated, denatured, and hybridized to the mRNA transcribed from the gene; the hybridized structure was then observed with the use of an electron microscope. The following structure was observed. [Figure: DNA–mRNA hybrid containing looped-out regions.]
(a) How many introns and exons are there in this gene? Explain your answer.
(b) Identify the exons and introns in this hybridized structure.

• Solution
(a) Each of the loops represents a region where there are sequences in the DNA that do not have corresponding sequences in the RNA; these regions are introns. There are five loops in the hybridized structure; so there must be five introns in the DNA. The five introns separate six exons.
(b) [Figure: the hybridized structure with each of the five loops labeled as an intron and each of the six paired regions labeled as an exon.]

2. Draw a typical bacterial mRNA and the gene from which it was transcribed. Label the 5′ and 3′ ends of the RNA and DNA molecules, and identify the following regions or sequences:
(a) Promoter
(b) 5′ untranslated region
(c) 3′ untranslated region
(d) Protein-coding sequence
(e) Transcription start site
(f) Terminator
(g) Shine-Dalgarno sequence

• Solution
[Figure: the DNA, labeled 5′ and 3′, shows the promoter, the transcription start, the terminator, and the transcription stop. The RNA, labeled 5′ and 3′, shows the 5′ untranslated region with the Shine-Dalgarno sequence, the start codon, the protein-coding sequence, the stop codon, and the 3′ untranslated region.]

3. A test-tube splicing system has been developed that contains all the components (snRNAs, proteins, splicing factors) necessary for the splicing of nuclear genes. When a piece of RNA containing an intron and two exons is added to the system, the intron is removed as a lariat and the exons are spliced together. If the RNA molecule added to the system has the following mutations, what intermediate products of the splicing reactions will accumulate? Explain your answer.
(a) GT at the 5′ splice site is deleted.
(b) A at the branch point is deleted.
(c) AG at the 3′ splice site is deleted.

• Solution
(a) The GT sequence at the 5′ splice site is required for the attachment of the U1 snRNP and the first cleavage reaction. If this sequence is mutated, cleavage will not take place. Thus, the original pre-mRNA with the intron will accumulate.
(b) After cleavage at the 5′ splice site, the 5′ end of the intron attaches to the A at the branch point in a transesterification reaction. If the A at the branch point is deleted, no lariat structure will form. The separated first exon and the intron attached to the second exon will accumulate as intermediate products.
(c) The AG sequence at the 3′ splice site is required for cleavage at the 3′ splice site. If this sequence is mutated, accumulated intermediate products will be: (1) the separated first exon and (2) the intron attached to the second exon, with the 5′ end of the intron attached to the branch point to form a lariat structure.

The New Genetics
MINING GENOMES: RIBOSOMAL RNA STUDIES

Ribosomal RNA is the most plentiful nucleic acid in cells and is widely exploited for a variety of genetic studies. This exercise introduces you to some of the ways that ribosomal RNA sequences are used and to the collection of tools at the Ribosomal Database Project II, managed by Michigan State University’s Center for Microbial Ecology.

COMPREHENSION QUESTIONS

* 1. What is the concept of colinearity? In what way is this concept fulfilled in bacterial and eukaryotic cells?
2. What are some characteristics of introns?
* 3. What are the four basic types of introns? In which genes are they found?
* 4. What are the three principal elements in mRNA sequences in bacterial cells?
5. What is the function of the Shine-Dalgarno consensus sequence?
* 6. (a) What is the 5′ cap?
(b) How is the 5′ cap added to eukaryotic pre-mRNA?
(c) What is the function of the 5′ cap?
7. How is the poly(A) tail added to pre-mRNA? What is the purpose of the poly(A) tail?
* 8. What makes up the spliceosome? What is the function of the spliceosome?
9. Explain the process of pre-mRNA splicing in nuclear genes.
10. Describe two types of alternative processing pathways. How do they lead to the production of multiple proteins from a single gene?
*11. What is RNA editing? Explain the role of guide RNAs in RNA editing.
*12. Summarize the different types of processing that can take place in pre-mRNA.
*13. What are some of the modifications in tRNA processing?
14. Describe the basic structure of ribosomes in bacterial and eukaryotic cells.
*15. Explain how rRNA is processed.

APPLICATION QUESTIONS AND PROBLEMS

*16. At the beginning of the chapter, we considered Duchenne muscular dystrophy and the dystrophin gene. We learned that the gene causing Duchenne muscular dystrophy encompasses more than 2 million nucleotides, but less than 1% of the gene encodes the protein dystrophin. On the basis of what you now know about gene structure and RNA processing in eukaryotic cells, provide a possible explanation for the large size of the dystrophin gene.
17. How do the mRNA of bacterial cells and the pre-mRNA of eukaryotic cells differ? How do the mature mRNAs of bacterial and eukaryotic cells differ?
*18. Draw a typical eukaryotic gene and the pre-mRNA and mRNA derived from it. Assume that the gene contains three exons. Identify the following items and, for each item, give a brief description of its function:
(a) 5′ untranslated region
(b) Promoter
(c) AAUAAA consensus sequence
(d) Transcription start site
(e) 3′ untranslated region
(f) Introns
(g) Exons
(h) Poly(A) tail
(i) 5′ cap
19. How would the deletion of the Shine-Dalgarno sequence affect a bacterial mRNA?
*20. How would the deletion of the following sequences or features most likely affect a eukaryotic pre-mRNA?
(a) AAUAAA consensus sequence
(b) 5′ cap
(c) Poly(A) tail
21. What would be the most likely effect on the amino acid sequence of a protein of a mutation that occurred in an intron of the gene encoding the protein? Explain your answer.
22. A geneticist induces a mutation in the gene that codes for cleavage and polyadenylation specificity factor (CPSF) in a line of cells growing in the laboratory. What would be the immediate effect of this mutation on RNA molecules in the cultured cells?
*23. A geneticist mutates the gene for proteins that bind to the poly(A) tail in a line of cells growing in the laboratory. What would be the immediate effect of this mutation in the cultured cells?
*24. An in vitro (within a test tube) splicing system has been developed that contains all the components (snRNAs, proteins, splicing factors) necessary for the splicing of nuclear pre-mRNA genes. When a piece of RNA containing an intron and two exons is added to the system, the intron is removed as a lariat and the exons are spliced together. What intermediate products of the splicing reaction would accumulate if the following components were omitted from the splicing system? Explain your reasoning.
(a) U1
(b) U2
(c) U6
(d) U5
(e) U4
25. The splicing system introduced in Problem 24 is used to splice an RNA molecule containing two exons and one intron. This time, however, the U2 snRNA used in the splicing reaction contains several mutations in the sequence that pairs with the U6 snRNA. What would be the effect of these mutations on the splicing process?
26. A geneticist isolates a gene that contains five exons. He then isolates the mature mRNA produced by this gene. After making the DNA single stranded, he mixes the single-stranded DNA and RNA. Some of the single-stranded DNA hybridizes (pairs) with the complementary mRNA. Draw a picture of what the DNA–RNA hybrids would look like under the electron microscope.
27. The chemical reagent psoralen can be used to elucidate nucleic acid structure. This chemical attaches itself to nucleic acids and, on exposure to UV light, forms covalent bonds between closely associated nucleotide sequences. Such cross-links provide information about the proximity of RNA molecules to one another in complex structures. Psoralen cross-linking has been used to examine the structure of the spliceosome. In one study, the following cross-linked structures were obtained during splicing. U1, U2, U5, and U6 became cross-linked to pre-mRNA. U2 was cross-linked to U6 and to pre-mRNA. The U1, U5, and U6 cross-links with pre-mRNA were mapped to sequences near the 5′ splice site, whereas the U2 snRNA cross-links with pre-mRNA were mapped to the branch site. After splicing, U2, U5, and U6 were cross-linked to the excised lariat. Explain these results in regard to what is known about the structure of the spliceosome and how it functions in RNA splicing. (Based on D. A. Wassarman and J. A. Steitz, 1992, Interactions of small nuclear RNAs with precursor messenger RNA during in vitro splicing, Science 257:1918–1925.)

CHALLENGE QUESTIONS

28. In addition to snRNAs, the spliceosome contains a number of proteins. Some of these proteins are associated with the snRNAs to form snRNPs. Other proteins are associated with the spliceosome but are not associated with any specific snRNA. One group of spliceosomal proteins comprises the precursor RNA-processing (PRP) proteins. Three PRP proteins that directly take part in splicing are PRP2, PRP16, and PRP22. The results of studies have shown that PRP2 is required for the first step of the splicing reaction, PRP16 acts at the second step, and PRP22 is required for the release of the mRNA from the spliceosome. Other studies have found that these PRP proteins have amino acid sequences similar to the sequences found in RNA helicase enzymes, which are capable of unwinding two paired RNA molecules.
On the basis of this information, propose a functional role for PRP2, PRP16, and PRP22 in RNA splicing. 29. Propose a scenario by which spliceosomal-mediated splicing might have evolved from the splicing of group II introns. RNA Molecules and RNA Processing 27 SUGGESTED READINGS Bjork, G. R., J. U. Erikson, C. E. D. Gustafsson, T. G. Hagervall, Y. H. Jonsson, and P. M. Wikstrom. 1987. Transfer RNA modification. Annual Review of Biochemistry 56:263 – 288. A review of how tRNA is processed. Broker, T. R., L. T. Chow, A. R. Dunn, R. E. Gelinas, J. A. Hassel, D. F. Klessig, J. B. Lewis, R. J. Roberts, and B. S. Zain. 1978. Adenovirus-2 messengers: an example of baroque molecular architecture. Cold Spring Harbor Symposium on Quantatative Biology 42:531 – 534. One of the first reports of introns in eukaryotic genes. Gott, J. M., and R. B. Emerson. 2000. Functions and mechanisms of RNA editing. Annual Review of Genetics 34:499 – 531. An extensive review of the different types of RNA editing and their mechanisms. Hurst, L. D. 1994. The uncertain origin of introns. Nature 371:381 – 382. A discussion of some of the ideas about when and how introns first arose. Keller, W. 1995. No end yet to messenger RNA 3 processing. Cell 81:829 – 832. An excellent review of processing at the 3 end of eukaryotic pre-mRNA. Lake, J. A. 1981. The ribosome. Scientific American 245(2):84 – 97. A review of the structure of ribosomes. Landweber, L. F., P. J. Simon, and T. A. Wagner. 1998. Ribozyme engineering and early evolution. Bioscience 48:94 – 103. A nice review of the idea that early life may have consisted of an RNA world. McKeown, M. 1992. Alternative mRNA splicing. Annual Review of Cell Biology 8:133 – 155. An extensive review that discusses the different types of alternative splicing with specific examples of each type. Misteli, T., J. F. Caceres, and D. L. Spector. 1997. The dynamics of a pre-mRNA splicing factor in living cells. Nature 387:523 – 527. 
Reports that pre-mRNA splicing and transcription take place at the same sites in the nucleus. Nilsen, T. W. 1994. RNA – RNA interactions in the spliceosome: unraveling the ties that bind. Cell 78: 1 – 4. An excellent summary of how RNA – RNA interactions play an important role in the splicing of nuclear pre-mRNAs. Noller, H. F. 1984. Structure of ribosomal RNA. Annual Review of Biochemistry 53:119 – 162. A review of rRNA. Rich, A., and S. H. Kim. 1978. The three-dimensional structure of transfer RNA. Scientific American 238(1):52 – 62. Discusses the structure of tRNA. Scott, J. 1995. A place in the world for RNA editing. Cell 81:833 – 836. A good, succinct review of RNA editing. Smith, C. M., and J. A. Steitz. 1997. Sno storm in the nucleolus: new roles for myriad small RNPs. Cell 89:669 – 672. A good review of the role of snoRNAs in rRNA processing. Volkin, E., and L. Astrachan. 1956. Phosphorous incorporation in Escherichia coli ribonucleic acid after infections with bacteriophage T2. Virology 2:149 – 161. Reports the discovery of short-lived RNA after phage infection. 15 The Genetic Code and Translation • Lesch-Nyhan Syndrome and the Relation Between Genotype and Phenotype The Molecular Relation Between Genotype and Phenotype The One Gene, One Enzyme Hypothesis The Structure and Function of Proteins • The Genetic Code Breaking the Genetic Code The Degeneracy of the Code The Reading Frame and Initiation Codons Termination Codons The Universality of the Code • The Process of Translation The Binding of Amino Acids to Transfer RNAs The Initiation of Translation Elongation This is photo legend x 26 picas width for opening chapter photo for Chapter 15. This is legend copy area for Chapter opening photo. (Phil A. Harrington/Peter Arnold). 
  Termination
  The Energy of Protein
  Connecting Concepts
  RNA–RNA Interactions in Translation
  Polyribosomes
  The Posttranslational Modifications of Proteins
  Translation and Antibiotics

Lesch-Nyhan Syndrome and the Relation Between Genotype and Phenotype

In 1962, Dr. William Nyhan and his student Michael Lesch examined a seriously ill boy with a strange combination of symptoms. The boy had blood in his urine, high concentrations of uric acid in his blood, and uncontrollable spasms in his arms and legs. He was mentally retarded and self-destructively bit his fingers and lips. After carefully studying the boy, Nyhan and Lesch came to the conclusion that he was afflicted by an undescribed disease. Soon other patients with similar symptoms were reported, and the disease became known as the Lesch-Nyhan syndrome.

One of the earliest symptoms of Lesch-Nyhan syndrome is the appearance of orange “sand” (actually uric acid crystals) in diapers a few weeks after birth. Within a year, the child begins to exhibit writhing movements of the hands and feet and involuntary spasms. About half of the children have seizures. After 2 or 3 years, some of the children exhibit compulsive self-mutilation: biting fingers, lips, the tongue, and the insides of the mouth.

Lesch-Nyhan syndrome develops almost exclusively in boys, and the trait is inherited as an X-linked recessive disorder; the presence of a defective gene on a male’s single X chromosome causes the disease. In 1967, a team of scientists at the National Institutes of Health determined that the disease results from a defective copy of the gene that normally encodes the enzyme hypoxanthine-guanine phosphoribosyl transferase (HGPRT). As DNA and RNA are degraded in the cell, purines are liberated, and HGPRT salvages these purines and uses them to synthesize new RNA and DNA nucleotides.
In people who have Lesch-Nyhan syndrome, a mutation in the gene for HGPRT changes the amino acid sequence of the enzyme, rendering it nonfunctional. The result is that purines are not recycled; they accumulate and are converted into uric acid. High levels of uric acid produce the symptoms of the disease.

Lesch-Nyhan syndrome illustrates the link between genotype and phenotype: a mutation in a gene affects a protein, which then produces the symptoms of the disease. Preceding chapters described how DNA encodes genetic information and how that information is transferred from DNA to RNA. In this chapter, we examine translation, the process by which the nucleotide sequence in mRNA specifies the amino acid sequence of a protein. We begin by examining the molecular relation between genotype and phenotype. As with Lesch-Nyhan syndrome, the final phenotype may be complex, including biochemical, physiological, and behavioral traits, but it is ultimately caused by the protein that the gene encodes. We next study the genetic code (the instructions that specify the amino acid sequence of a protein) and then examine the mechanism of protein synthesis. Our primary focus will be on protein synthesis in bacterial cells, but we will highlight some of the differences in eukaryotic cells.

www.whfreeman.com/pierce More information on Lesch-Nyhan syndrome

The Molecular Relation Between Genotype and Phenotype

The first person to suggest the existence of a relation between genotype and proteins was Archibald Garrod. In 1908, Garrod correctly proposed that genes encode enzymes (see p. 000), but, unfortunately, his theory made little impression on his contemporaries. Not until the 1940s, when George Beadle and Edward Tatum examined the genetic basis of biochemical pathways in Neurospora, did the relation between genes and proteins become widely accepted.

The One Gene, One Enzyme Hypothesis

Beadle and Tatum used the bread mold Neurospora to study the biochemical result of mutations. Neurospora is a good model organism because it is easy to cultivate in the laboratory and because the main vegetative part of the fungus is haploid; the haploid state allows the effects of recessive mutations to be easily observed (◗ FIGURE 15.1). Wild-type Neurospora grows on minimal medium, which contains only inorganic salts, nitrogen, a carbon source such as sucrose, and the vitamin biotin. The fungus can synthesize all the biological molecules that it needs from these basic compounds. However, mutations may arise that disrupt fungal growth by destroying the fungus’s ability to synthesize one or more essential biological molecules. These nutritionally deficient mutants, termed auxotrophs, will not grow on minimal medium, but they can grow on medium that contains the substance that they are no longer able to synthesize.

◗ 15.1 Beadle and Tatum used the fungus Neurospora, which has a complex life cycle, to work out the relation of genes to proteins. Figure labels: (1) The vegetative haploid mycelium produces haploid spores through mitosis. (2) These spores may germinate and produce a genetically identical mycelium (asexual reproduction), (3) or they may land on the fruiting body of a different mating type and reproduce sexually. (4) Within the fruiting body, nuclei of opposite mating types fuse to form a diploid meiocyte. (5) The meiocyte undergoes meiosis I and II and one mitosis (6) to produce an ascus containing eight haploid ascospores, four each of the two parental mating types. (7) The spores are released from the ascus and germinate to form new mycelia.

Beadle and Tatum first irradiated spores of Neurospora to induce mutations (◗ FIGURE 15.2). After irradiation, each spore was placed into a different culture tube with complete medium (medium containing all the biological substances needed for growth; see Figure 15.2). Next, they transferred spores from each culture to tubes containing minimal medium. Fungi containing auxotrophic mutations grew on complete medium but would not grow on minimal medium, which allowed Beadle and Tatum to identify cultures that contained mutations.

After they had determined that a particular culture had an auxotrophic mutation, Beadle and Tatum set out to determine the specific effect of the mutation. They transferred spores of each mutant strain from complete medium to a series of tubes (see Figure 15.2), each of which possessed minimal medium plus one of a variety of essential biological molecules, such as an amino acid. If the spores in a tube grew, Beadle and Tatum were able to identify the added substance as the biological molecule whose synthesis had been affected by the mutation. For example, an auxotrophic mutant that would grow only on minimal medium to which arginine had been added must have possessed a mutation that disrupts the synthesis of arginine.

Patient application of this procedure allowed the genetic dissection of multistep biochemical pathways. Adrian Srb and Norman H. Horowitz used this method to investigate genes that control arginine synthesis (◗ FIGURE 15.3).

◗ 15.2 Beadle and Tatum developed a method for isolating auxotrophic mutants in Neurospora.

◗ 15.3 Beadle and Tatum established that each step in a pathway is controlled by a different enzyme. This biochemical pathway leads to the synthesis of arginine in Neurospora.
Steps in the pathway are catalyzed by enzymes affected by mutants.

They first isolated a series of auxotrophic mutants whose growth required arginine. They then tested these mutants for their ability to grow on minimal medium supplemented with three compounds: ornithine, citrulline, and arginine. From the results, they were able to place the mutants into three groups (Table 15.1) based on which of the substances allowed growth. Group I mutants grew on minimal medium supplemented with ornithine, citrulline, or arginine. Group II mutants grew on minimal medium supplemented with either arginine or citrulline but did not grow on medium supplemented only with ornithine. Finally, group III mutants grew only on medium supplemented with arginine.

Table 15.1 Growth of arginine auxotrophic mutants on minimal medium with various supplements

Mutant strain    Ornithine    Citrulline    Arginine
Group I          +            +             +
Group II         −            +             +
Group III        −            −             +

Note: + indicates growth; − indicates no growth.

Srb and Horowitz therefore proposed that the biochemical pathway leading to the amino acid arginine has at least three steps:

precursor → (step 1) → ornithine → (step 2) → citrulline → (step 3) → arginine

They concluded that the mutations in group I affected step 1 of this pathway, mutations in group II affected step 2, and mutations in group III affected step 3. But how did they know that the order of the compounds in the biochemical pathway was correct? Notice that, if step 1 is blocked by a mutation, then the addition of either ornithine or citrulline allows growth, because these compounds can still be converted into arginine (see Figure 15.3). Similarly, if step 2 is blocked, the addition of citrulline allows growth, but the addition of ornithine has no effect. If step 3 is blocked, the spores will grow only if arginine is added to the medium. The underlying principle is that an auxotrophic mutant cannot synthesize any compound that comes after the step blocked by a mutation.

Using this reasoning with the information in Table 15.1, we can see that the addition of arginine to the medium allows all three groups of mutants to grow. Therefore, biochemical steps affected by all the mutants precede the step that results in arginine. The addition of citrulline allows group I and group II mutants to grow but not group III mutants; therefore, group III mutations must affect a biochemical step that takes place after the production of citrulline but before the production of arginine.

Group III mutations: citrulline → arginine

The addition of ornithine allows the growth of group I mutants but not group II or group III mutants; thus, mutations in groups II and III affect steps that come after the production of ornithine. We’ve already established that group II mutations affect a step before the production of citrulline; so group II mutations must block the conversion of ornithine into citrulline.

Group II mutations: ornithine → citrulline
Group III mutations: citrulline → arginine

Because group I mutations affect some step before the production of ornithine, we can conclude that they must affect the conversion of some precursor into ornithine. We can now outline the biochemical pathway yielding ornithine, citrulline, and arginine.

precursor → (group I mutations) → ornithine → (group II mutations) → citrulline → (group III mutations) → arginine

It is important to note that this procedure does not necessarily detect all steps in a pathway; rather, it detects only the steps producing the compounds tested.

Using mutations and this type of reasoning, Beadle and Tatum were able to identify genes that control several biosynthetic pathways in Neurospora.
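The reasoning applied to Table 15.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the growth results are typed in from the description of the three mutant groups, and the names `GROWTH`, `infer_order`, and `blocked_step` are hypothetical helpers, not anything used by Srb and Horowitz.

```python
# Growth of each mutant group on minimal medium plus one supplement,
# as described for Table 15.1 (True = growth, False = no growth).
GROWTH = {
    "Group I":   {"ornithine": True,  "citrulline": True,  "arginine": True},
    "Group II":  {"ornithine": False, "citrulline": True,  "arginine": True},
    "Group III": {"ornithine": False, "citrulline": False, "arginine": True},
}

def infer_order(growth_table):
    """Order the compounds along the pathway.

    A compound made later in the pathway bypasses more blocked steps,
    so it rescues more mutant groups; sorting by the number of groups
    rescued (fewest first) gives the pathway order.
    """
    compounds = list(next(iter(growth_table.values())))
    return sorted(compounds,
                  key=lambda c: sum(row[c] for row in growth_table.values()))

def blocked_step(row, pathway):
    """Return the 1-based step blocked in one mutant group.

    A mutant cannot make anything after its block, so the first compound
    (in pathway order) that restores growth is the product of that step.
    """
    for step, compound in enumerate(pathway, start=1):
        if row[compound]:
            return step
    raise ValueError("no supplement restored growth")

pathway = infer_order(GROWTH)
print(" -> ".join(["precursor"] + pathway))
for group, row in GROWTH.items():
    print(group, "blocks step", blocked_step(row, pathway))
```

As in the text, the sketch can only place steps that produce one of the compounds tested; any steps upstream of ornithine are lumped together as the single step from "precursor".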
They established that each step in a pathway is controlled by a different enzyme, as shown in Figure 15.3 for the arginine pathway. The results of genetic crosses and mapping studies demonstrated that mutations affecting any one step in a pathway always map to the same chromosomal location. Beadle and Tatum reasoned that mutations affecting a particular biochemical step occurred at a single locus that encoded a particular enzyme. This idea became known as the one gene, one enzyme hypothesis: genes function by encoding enzymes, and each gene encodes a separate enzyme. When research showed that some proteins are composed of more than one polypeptide chain and that different polypeptide chains are encoded by separate genes, this model was modified to become the one gene, one polypeptide hypothesis.

Concepts

Beadle and Tatum’s studies of biochemical pathways in the fungus Neurospora established the one gene, one enzyme hypothesis, the idea that each gene encodes a separate enzyme. This hypothesis was later modified to become the one gene, one polypeptide hypothesis.

www.whfreeman.com/pierce Further information about the use of Neurospora in genetics

The Structure and Function of Proteins

Proteins are central to all living processes (◗ FIGURE 15.4). Many proteins are enzymes, the biological catalysts that drive the chemical reactions of the cell; others are structural components, providing scaffolding and support for membranes, filaments, bone, and hair. Some proteins help transport substances; others have a regulatory, communication, or defense function.

All proteins are composed of amino acids, linked end to end. There are 20 common amino acids found in proteins; these amino acids are shown in ◗ FIGURE 15.5 with both their three- and one-letter abbreviations. (Other amino acids that are sometimes found in proteins are modified forms of the common amino acids.)
The 20 common amino acids are similar in structure, differing only in the structures of the R (radical) groups. The amino acids in proteins are joined by peptide bonds (◗ FIGURE 15.6) to form polypeptide chains, and a protein consists of one or more polypeptide chains. Like nucleic acids, polypeptides have polarity with one end having a free amino group (NH2) and the other end possessing a free carboxyl group (CO2H). Some proteins consist of only a few amino acids, whereas others may have thousands.

Like that of nucleic acids, the molecular structure of proteins has several levels of organization. The primary structure of a protein is its sequence of amino acids (◗ FIGURE 15.7a). Through interactions between neighboring amino acids, a polypeptide chain folds and twists into a secondary structure (◗ FIGURE 15.7b); two common secondary structures found in proteins are the beta (β) pleated sheet and the alpha (α) helix. Secondary structures interact and fold further to form a tertiary structure (◗ FIGURE 15.7c), which is the overall, three-dimensional shape of the protein. The secondary and tertiary structures of a protein are ultimately determined by the primary structure (the amino acid sequence) of the protein. Finally, some proteins consist of two or more polypeptide chains that associate to produce a quaternary structure (◗ FIGURE 15.7d).

◗ 15.4 Proteins serve a number of biological functions and are central to all living processes. (a) The light produced by fireflies is the result of a light-producing reaction between luciferin and ATP catalyzed by the enzyme luciferase. (b) The protein fibroin is the major structural component of spider webs. (c) Castor beans contain a highly toxic protein called ricin. (Part a, Gregory K. Scott/Photo Researchers; part b, A. Shay/Animals Animals; part c, Gerald & Buff Corsi/Visuals Unlimited.)

Concepts

The product of many genes is a protein, whose action produces the trait encoded by that gene.
Proteins are polymers, consisting of amino acids linked by peptide bonds. The amino acid sequence of a protein is its primary structure. This structure folds to create the secondary and tertiary structures; two or more polypeptide chains may associate to create a quaternary structure.

www.whfreeman.com/pierce More information on protein structure

◗ 15.5 The common amino acids have similar structures. Each amino acid consists of a central (α) carbon atom attached to: (1) an amino group (NH3+); (2) a carboxyl group (COO−); (3) a hydrogen atom (H); and (4) a radical group, designated R. In the structures of the 20 common amino acids, the parts in black are common to all amino acids and the parts in red are the R groups. [Structural formulas omitted. The figure groups the amino acids by R group: nonpolar, aliphatic; aromatic; polar, uncharged; positively charged; negatively charged. Amino acids shown: glycine (Gly, G), alanine (Ala, A), valine (Val, V), leucine (Leu, L), isoleucine (Ile, I), proline (Pro, P), methionine (Met, M), phenylalanine (Phe, F), tyrosine (Tyr, Y), tryptophan (Trp, W), serine (Ser, S), threonine (Thr, T), cysteine (Cys, C), asparagine (Asn, N), glutamine (Gln, Q), lysine (Lys, K), arginine (Arg, R), histidine (His, H), aspartate (Asp, D), glutamate (Glu, E).]

◗ 15.6 Amino acids are joined together by peptide bonds.
In a peptide bond, the carboxyl group of one amino acid is covalently attached to the amino group of another amino acid. [Figure labels: two amino acids, with side chains R1 and R2, are joined by a peptide bond with the release of H2O.]

◗ 15.7 Proteins have several levels of structural organization. (a) Primary structure: the primary structure of a protein is its sequence of amino acids. (b) Secondary structure: interactions between amino acids cause the primary structure to fold into a secondary structure, such as this alpha helix. (c) Tertiary structure: the secondary structure folds further into a tertiary structure. (d) Quaternary structure: two or more polypeptide chains may associate to create a quaternary structure.

The Genetic Code

In 1953, Watson and Crick solved the structure of DNA and identified the base sequence as the carrier of genetic information. However, the way in which the base sequence of DNA specified the amino acid sequences of proteins (the genetic code) was not immediately obvious and remained elusive for another 10 years.

One of the first questions about the genetic code to be addressed was: How many nucleotides are necessary to specify a single amino acid? This basic unit of the genetic code (the set of bases that encode a single amino acid) is a codon (Chapter 14). Many early investigators recognized that codons must contain a minimum of three nucleotides. Each nucleotide position in mRNA can be occupied by one of four bases: A, G, C, or U. If a codon consisted of a single nucleotide, only four different codons (A, G, C, and U) would be possible, which is not enough to code for the 20 different amino acids commonly found in proteins. If codons were made up of two nucleotides each (i.e., GU, AC, etc.), there would be 4 × 4 = 16 possible codons, still not enough to code for all 20 amino acids. With three nucleotides per codon, there are 4 × 4 × 4 = 64 possible codons, which is more than enough to specify 20 different amino acids.
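The counting argument above can be checked directly. The following is a minimal sketch that simply enumerates every possible codon of one, two, and three nucleotides; the names `BASES`, `AMINO_ACIDS`, and `count_codons` are illustrative only.

```python
from itertools import product

BASES = "AGCU"      # the four bases found in mRNA
AMINO_ACIDS = 20    # common amino acids that the code must specify

def count_codons(length):
    """Count the distinct codons of a given length by listing them all."""
    return len(list(product(BASES, repeat=length)))

for n in (1, 2, 3):
    total = count_codons(n)
    verdict = "enough" if total >= AMINO_ACIDS else "not enough"
    print(f"{n} nucleotide(s) per codon: {total} codons, {verdict}")
```

Enumerating rather than computing 4 ** n is a deliberate choice here: it mirrors the argument in the text, which reasons about the set of codons that could exist at each length.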
Therefore, a triplet code requiring three nucleotides per codon is the most efficient way to encode all 20 amino acids. Using mutations in bacteriophage, Francis Crick and his colleagues confirmed in 1961 that the genetic code is indeed a triplet code.

Concepts

The genetic code is a triplet code, in which three nucleotides code for each amino acid in a protein.

www.whfreeman.com/pierce An electronic table of codons and the amino acids they specify

Breaking the Genetic Code

When it had been firmly established that the genetic code consists of codons that are three nucleotides in length, the next step was to determine which groups of three nucleotides specify which amino acids. This task required the development of a cell-free system for protein synthesis (◗ FIGURE 15.8), which would make it possible to study the translation of a known mRNA.

◗ 15.8 Breaking the genetic code required a cell-free protein-synthesizing system. Preparing a cell-free synthesizing system: (1) Grow bacteria in culture and isolate by centrifugation. (2) Grind the cells to release the cellular contents, including RNA, DNA, ribosomes, enzymes, and other components needed for translation. (3) Add deoxyribonuclease. This enzyme destroys all the cellular DNA, and no more mRNA is produced. Protein synthesis stops. (4) Restart translation by adding mRNA of known sequence and labeled amino acids to the tube, and incubate the solution at 37°C. (5) The protein produced by the system can be precipitated by adding trichloroacetic acid.

Logically, the easiest way to break the code would have been to determine the base sequence of a piece of RNA, add it to a cell-free protein-synthesizing system, and allow it to direct the synthesis of a protein. The amino acid sequence of the newly synthesized protein could then be determined, and its sequence could be compared with that of the RNA.
Unfortunately, there was no way at that time to determine the nucleotide sequence of a piece of RNA; so indirect methods were necessary to break the code. The first clues to the genetic code came in 1961, from the work of Marshall Nirenberg and Johann Heinrich Matthaei. These investigators created synthetic RNAs by using an enzyme called polynucleotide phosphorylase. Unlike RNA polymerase, polynucleotide phosphorylase does not require a template; it randomly links together any RNA nucleotides that happen to be available. The first synthetic mRNAs used by Nirenberg and Matthaei were homopolymers, RNA molecules consisting of a single type of nucleotide. For example, by adding polynucleotide phosphorylase to a solution of uracil nucleotides, they generated RNA molecules that consisted entirely of uracil nucleotides and thus contained only UUU codons ( ◗ FIGURE 15.9). These poly(U) RNAs were then added to 20 tubes, each containing a cell-free protein-synthesizing system and the 20 different amino acids, one of which was radioactively labeled. Translation took place in all 20 tubes, but radioactive protein appeared in only one of the tubes — the one containing labeled phenylalanine (see Figure 15.9). This result showed that the codon UUU specifies the amino acid phenylalanine. The results of similar experiments using poly(C) and poly(A) RNA demonstrated that CCC codes for proline and AAA codes for lysine; for technical reasons, the results from poly(G) were uninterpretable. To gain information about additional codons, Nirenberg and his colleagues created synthetic RNAs containing two or three different bases. Because polynucleotide phosphorylase incorporates nucleotides randomly, these RNAs contained random mixtures of the bases and are thus called random copolymers. For example, when adenine and cytosine nucleotides are mixed with polynucleotide phosphorylase, the RNA molecules produced have eight different codons: AAA, AAC, ACC, ACA, CAA, CCA, CAC, and CCC. 
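A minimal sketch of the codons found in a random copolymer, and of how often each is expected when bases are incorporated independently and at random. The 4:1 mix of C to A used below is only an illustrative ratio, and `codon_probabilities` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

def codon_probabilities(base_fractions):
    """Expected frequency of every codon in a random copolymer.

    base_fractions maps each base to its share of the nucleotide mix.
    Assumes each position is filled independently and at random.
    """
    probs = {}
    for codon in product(sorted(base_fractions), repeat=3):
        p = Fraction(1)
        for base in codon:
            p *= base_fractions[base]
        probs["".join(codon)] = p
    return probs

# Illustrative mix: four C nucleotides for every A nucleotide.
mix = {"A": Fraction(1, 5), "C": Fraction(4, 5)}
probs = codon_probabilities(mix)

for codon, p in sorted(probs.items()):
    print(codon, p, round(float(p), 3))
```

Exact fractions are used instead of floating-point numbers so that the eight probabilities can be compared with hand calculations and checked to sum to exactly 1.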
In cell-free protein-synthesizing systems, these poly(AC) RNAs produced proteins containing six different amino acids: asparagine, glutamine, histidine, lysine, proline, and threonine. The proportions of the different amino acids in the proteins depended on the ratio of the two nucleotides used in creating the synthetic mRNA, and the theoretical probability of finding a particular codon could be calculated from the ratios of the bases. If a 4:1 ratio of C to A were used in making the RNA, then the probability of C occurring at any given position in a codon is 4/5 and the probability of A being in it is 1/5. With random incorporation of bases, the probability of any one of the codons with two Cs and one A (CCA, CAC, or ACC) should be 4/5 × 4/5 × 1/5 = 16/125 ≈ 0.13, or 13%, and the probability of any codon with two As and one C (AAC, ACA, or CAA) should be 1/5 × 1/5 × 4/5 = 4/125 = 0.032, or about 3%. Therefore, an amino acid encoded by two Cs and one A should be more common than an amino acid encoded by two As and one C. By comparing the percentages of amino acids in proteins produced by random copolymers with the theoretical frequencies expected for the codons (◗ FIGURE 15.10), information about the base composition of the codons was derived.

Findings from these experiments revealed nothing, however, about the codon base sequence; histidine was clearly encoded by a codon with two Cs and one A (see Figure 15.10), but whether that codon was ACC, CAC, or CCA was unknown. There were other problems with this method: the theoretical calculations depended on the random incorporation of bases, which did not always occur, and, because the genetic code is redundant, sometimes several different codons specify the same amino acid.

◗ 15.9 Nirenberg and Matthaei developed a method for identifying the amino acid specified by a homopolymer. Experiment. Question: What amino acids are specified by codons composed of only one type of base? (1) A homopolymer (in this case, poly(U) mRNA, made from uracil nucleotides by polynucleotide phosphorylase) was added to a test tube containing a cell-free translation system, 1 radioactively labeled amino acid, and 19 unlabeled amino acids. (2) The tube was incubated at 37°C. (3) Translation took place. (4) The protein was filtered and the filter was checked for radioactivity. (5) The procedure was repeated in 20 tubes, with each tube containing a different labeled amino acid. (6) The tube in which the protein was radioactively labeled contained newly synthesized protein with the amino acid specified by the homopolymer. In this case, UUU specified the amino acid phenylalanine. Conclusion: UUU codes for phenylalanine; in other experiments, AAA was found to code for lysine, and CCC for proline.

◗ 15.10 Nirenberg and Matthaei’s use of random copolymers provided information about the genetic code. The theoretical percentage of codons (vertical axis) is plotted against various percentages of cytosine in random AC copolymers (horizontal axis). Curves show the theoretical frequency of RNA code words (AAC, ACC) and the observed frequency of amino acid incorporation (histidine, asparagine). Notice that the distribution of histidine approximates the theoretical percentage of a codon with two Cs and one A, whereas the distribution of asparagine approximates the percentage expected of a codon with two As and one C.

To overcome the limitations of random copolymers, Nirenberg and Philip Leder developed another technique in 1964 that used ribosome-bound tRNAs. They found that a very short sequence of mRNA, even one consisting of a single codon, would bind to a ribosome.
The codon on the short mRNA would then base pair with the matching anticodon on a tRNA that carries the amino acid specified by the codon (◗ FIGURE 15.11). The ribosome-bound mRNA was mixed with tRNAs and amino acids, and this mixture was passed through a nitrocellulose filter. The tRNAs paired with the ribosome-bound mRNA stuck to the filter, whereas unbound tRNAs passed through. The advantage of this system was that it could be used with very short synthetic mRNA molecules that could be synthesized with a known sequence. Nirenberg and Leder synthesized over 50 short mRNAs with known codons and added them individually to a mixture of ribosomes and tRNAs. They then isolated the ribosome-bound tRNAs and determined which amino acids were present on the bound tRNAs. For example, synthetic RNA with the codon GUU retained a tRNA to which valine was attached, whereas RNAs with the codons UGU and UUG did not. Using this method, Nirenberg and his colleagues were able to determine the amino acids encoded by more than 50 codons.

◗ 15.11 Nirenberg and Leder developed a technique for using ribosome-bound tRNAs to provide additional information about the genetic code. Experiment. Question: With the use of tRNAs, what other matches between codon and amino acid could be determined? (1) Very short mRNAs with known codons were synthesized (2) and added to a mixture of ribosomes and tRNAs attached to amino acids. (3) The ribosome bound the mRNA and the tRNAs that it specified. (4) The mixture was then passed through a nitrocellulose filter. The tRNAs paired with ribosome-bound mRNA stuck to the filter, whereas unbound tRNAs passed through. (5) The tRNAs on the filter were bound to valine. Conclusion: The codon GUU specifies valine; many other codons were determined by using this method.

A third method provided additional information about the genetic code. Gobind Khorana and his colleagues used chemical techniques to synthesize RNA molecules that contained known repeating sequences. They hypothesized that an mRNA that contained, for instance, alternating uracil and guanine nucleotides (UGUG UGUG) would be read during translation as two alternating codons, UGU GUG UGU GUG, producing a protein composed of two alternating amino acids. When Khorana and his colleagues placed this synthetic mRNA in a cell-free protein-synthesizing system, it produced a protein made of alternating cysteine and valine residues. This technique could not determine which of the two codons (UGU or GUG) specified cysteine, but, combined with other methods, it made a crucial contribution to cracking the genetic code.

The genetic code was fully understood by 1968 (◗ FIGURE 15.12). In the next section, we will examine some of the features of the code, which is so important to modern biology that Francis Crick has compared its place to that of the periodic table of the elements in chemistry.

www.whfreeman.com/pierce A brief biography of Marshall Nirenberg

◗ 15.12 The genetic code consists of 64 codons and the amino acids specified by these codons.

First base   Second base U   Second base C   Second base A   Second base G   Third base
U            UUU Phe         UCU Ser         UAU Tyr         UGU Cys         U
U            UUC Phe         UCC Ser         UAC Tyr         UGC Cys         C
U            UUA Leu         UCA Ser         UAA Stop        UGA Stop        A
U            UUG Leu         UCG Ser         UAG Stop        UGG Trp         G
C            CUU Leu         CCU Pro         CAU His         CGU Arg         U
C            CUC Leu         CCC Pro         CAC His         CGC Arg         C
C            CUA Leu         CCA Pro         CAA Gln         CGA Arg         A
C            CUG Leu         CCG Pro         CAG Gln         CGG Arg         G
A            AUU Ile         ACU Thr         AAU Asn         AGU Ser         U
A            AUC Ile         ACC Thr         AAC Asn         AGC Ser         C
A            AUA Ile         ACA Thr         AAA Lys         AGA Arg         A
A            AUG Met         ACG Thr         AAG Lys         AGG Arg         G
G            GUU Val         GCU Ala         GAU Asp         GGU Gly         U
G            GUC Val         GCC Ala         GAC Asp         GGC Gly         C
G            GUA Val         GCA Ala         GAA Glu         GGA Gly         A
G            GUG Val         GCG Ala         GAG Glu         GGG Gly         G

The codons are written 5′ → 3′, as they appear in the mRNA. AUG is an initiation codon; UAA, UAG, and UGA are termination codons.
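With the complete code of Figure 15.12 in hand, the synthetic messages used in these experiments can be read mechanically. The sketch below is an illustration only: the table is packed into a 64-letter string of one-letter amino acid abbreviations (an encoding chosen here for brevity, with "*" standing for a termination codon), and `translate` is a hypothetical helper that reads fixed triplets without modeling initiation or termination.

```python
BASES = "UCAG"
# Standard genetic code with codons ordered UUU, UUC, UUA, UUG, UCU, ..., GGG.
# One-letter amino acid abbreviations; "*" marks a termination codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(mrna, frame=0):
    """Read mrna three bases at a time, starting at offset frame."""
    codons = [mrna[i:i + 3] for i in range(frame, len(mrna) - 2, 3)]
    return "".join(CODON_TABLE[codon] for codon in codons)

poly_ug = "UG" * 9            # repeating copolymer UGUGUG..., 18 nucleotides
print(translate(poly_ug))     # codons UGU GUG UGU GUG UGU GUG
print(translate(poly_ug, 1))  # same message read from the second base
print(translate("U" * 18))    # poly(U) homopolymer
```

Whichever base the reading starts from, the poly(UG) message yields alternating cysteine (C) and valine (V), which is why the repeating-copolymer result alone could not show which codon specified which amino acid.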
The Degeneracy of the Code

One amino acid is encoded by three consecutive nucleotides in mRNA, and each nucleotide can have one of four possible bases (A, G, C, and U) at each nucleotide position, thus permitting 4³ = 64 possible codons (see Figure 15.12). Three of these codons are stop codons, specifying the end of translation. Thus, 61 codons, called sense codons, code for amino acids. Because there are 61 sense codons and only 20 different amino acids commonly found in proteins, the code contains more information than is needed to specify the amino acids and is said to be a degenerate code. This expression does not mean that the genetic code is depraved; degenerate is a term that Francis Crick borrowed from quantum physics, where it describes multiple physical states that have equivalent meaning. The degeneracy of the genetic code means that amino acids may be specified by more than one codon. Only tryptophan and methionine are encoded by a single codon (see Figure 15.12). Other amino acids are specified by two, three, or four codons, and some, such as leucine, are specified by six different codons. Codons that specify the same amino acid are said to be synonymous, just as synonymous words are different words that have the same meaning.

Isoaccepting tRNAs

As we learned in Chapter 14, tRNAs serve as adapter molecules, binding particular amino acids and delivering them to a ribosome, where the amino acids are then assembled into polypeptide chains. Each type of tRNA attaches to a single type of amino acid. The cells of most organisms possess from about 30 to 50 different tRNAs, and yet there are only 20 different amino acids in proteins. Thus, some amino acids are carried by more than one tRNA. Different tRNAs that accept the same amino acid but have different anticodons are called isoaccepting tRNAs. Some synonymous codons code for different isoacceptors.
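The degeneracy of the code can be tallied directly from the codon table of Figure 15.12. The sketch below packs the standard genetic code into a 64-letter string of one-letter amino acid abbreviations (an encoding assumed here for compactness, with "*" marking a termination codon) and counts how many sense codons specify each amino acid.

```python
from collections import Counter

BASES = "UCAG"
# Standard genetic code with codons ordered UUU, UUC, UUA, UUG, UCU, ..., GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Sense codons are all codons that are not termination codons.
sense = {codon: aa for codon, aa in CODON_TABLE.items() if aa != "*"}
codons_per_amino_acid = Counter(sense.values())

single = sorted(aa for aa, n in codons_per_amino_acid.items() if n == 1)
six = sorted(aa for aa, n in codons_per_amino_acid.items() if n == 6)

print("codons:", len(CODON_TABLE), "sense codons:", len(sense))
print("amino acids specified:", len(codons_per_amino_acid))
print("one codon only:", single)   # M = methionine, W = tryptophan
print("six codons:", six)          # L = leucine, R = arginine, S = serine
```

Grouping the synonymous codons this way also makes it easy to see how many of them differ only at the third position, which is the starting point for the wobble hypothesis.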
Wobble

Many synonymous codons differ only in the third position (see Figure 15.12). For example, alanine is encoded by the codons GCU, GCC, GCA, and GCG, all of which begin with GC. When the codon on the mRNA and the anticodon of the tRNA join ( ◗ FIGURE 15.13), the first (5′) base of the codon pairs with the third (3′) base of the anticodon, strictly according to Watson and Crick rules: A with U; C with G. Next, the middle bases of codon and anticodon pair, also strictly following the Watson and Crick rules. After these pairs have hydrogen bonded, the third bases pair weakly: there may be flexibility, or wobble, in their pairing.

In 1966, Francis Crick developed the wobble hypothesis, which proposed that some nonstandard pairings of bases could occur at the third position of a codon. For example, a G in the anticodon may pair with either a C or a U in the third position of the codon (Table 15.2). The important thing to remember about wobble is that it allows some tRNAs to pair with more than one codon on an mRNA; thus from 30 to 50 tRNAs can pair with 61 sense codons. Some codons are synonymous through wobble.

[Figure 15.13: a serine tRNA with the anticodon 3′–AGG–5′ pairs with the codons 5′–UCC–3′ and 5′–UCU–3′. 1 Pairing at the third codon position (the wobble position) is relaxed. 2 G can pair with C… 3 …or with U.]

◗ 15.13 Wobble may exist in the pairing of a codon on mRNA with an anticodon on tRNA. The mRNA and tRNA pair in an antiparallel fashion. Pairing at the first and second codon positions is in accord with the Watson and Crick pairing rules (A with U, G with C); however, pairing rules are relaxed at the third position of the codon, and G on the anticodon can pair with either U or C on the codon in this example.

Concepts

The genetic code consists of 61 sense codons that specify the 20 common amino acids; the code is degenerate and some amino acids are encoded by more than one codon. Isoaccepting tRNAs are different tRNAs with different anticodons that specify the same amino acid.
Wobble exists when more than one codon can pair with the same anticodon.

Table 15.2 The wobble rules, indicating which bases in the third position (3′ end) of the mRNA codon can pair with bases at the first position (5′ end) of the anticodon of the tRNA

First position of anticodon    Third position of codon
C                              G
G                              U or C
A                              U
U                              A or G
I (inosine)                    A, U, or C

In each pairing, the anticodon is written 3′ → 5′ and the codon 5′ → 3′; the first two codon positions follow the standard pairing rules, and only the pairing listed above is relaxed.

The Reading Frame and Initiation Codons

Findings from early studies of the genetic code indicated that it is generally nonoverlapping. An overlapping code is one in which a single nucleotide is included in more than one codon, as shown in ◗ FIGURE 15.14. Usually, however, each nucleotide in an mRNA belongs to a single codon and so helps to specify a single amino acid. A few overlapping codes are found in viruses; in these cases, two different proteins may be encoded within the same sequence of mRNA.

[Figure 15.14: for the nucleotide sequence AUACGAGUC, a nonoverlapping code reads codon 1 as AUA, codon 2 as CGA, and codon 3 as GUC; an overlapping code reads codon 1 as AUA, codon 2 as UAC, and codon 3 as ACG.]

For any sequence of nucleotides, there are three potential sets of codons, that is, three ways that the sequence can be read in groups of three. Each different way of reading the sequence is called a reading frame, and any sequence of nucleotides has three potential reading frames. The three reading frames have completely different sets of codons and therefore will specify proteins with entirely different amino acid sequences. Thus, it is essential for the translational machinery to use the correct reading frame. How is the correct reading frame established? The reading frame is set by the initiation codon, which is the first codon of the mRNA to specify an amino acid.
After the initiation codon, the other codons are read as successive groups of three nucleotides. No bases are skipped between the codons; so there are no punctuation marks to separate the codons.

The initiation codon is usually AUG, although GUG and UUG are used on rare occasions. The initiation codon is not just a punctuation mark; it specifies an amino acid. In bacterial cells, AUG encodes a modified type of methionine, N-formylmethionine; all proteins in bacteria begin with this amino acid, but the formyl group (or, in some cases, the entire amino acid) may be removed after the protein has been synthesized. When the codon AUG is at an internal position in a gene, it codes for unformylated methionine. In archaeal and eukaryotic cells, AUG specifies unformylated methionine both at the initiation position and at internal positions.

◗ 15.14 The genetic code is generally nonoverlapping. In a nonoverlapping code, each nucleotide belongs to only one codon. In an overlapping code, some nucleotides belong to more than one codon. The genetic code used in almost all living organisms is nonoverlapping.

Termination Codons

Three codons (UAA, UAG, and UGA) do not encode amino acids. These codons signal the end of the protein in both bacterial and eukaryotic cells and are called stop codons, termination codons, or nonsense codons. No tRNA molecules have anticodons that pair with termination codons.

The Universality of the Code

For many years the genetic code was assumed to be universal, meaning that each codon specifies the same amino acid in all organisms. We now know that the genetic code is almost, but not completely, universal; a few exceptions have been found. Most of these exceptions are termination codons, but there are a few cases in which one sense codon substitutes for another. The majority of exceptions are found in mitochondrial genes; a few nonuniversal codons have also been detected in nuclear genes of protozoans (Table 15.3).
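The rules for reading frames, initiation codons, and termination codons described above can be sketched in a few lines of code. This is an illustrative sketch under stated assumptions, not a model of the ribosome: it uses the standard code of Figure 15.12, treats the first AUG in the sequence as the initiation codon, and ignores the rare GUG and UUG initiation codons and the exceptions in Table 15.3. The function names reading_frames and translate and the example mRNA are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reading_frames(seq):
    """Return the three ways of reading seq as successive complete codons."""
    return [
        [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        for offset in range(3)
    ]


def translate(mrna, start="AUG"):
    """Translate from the first start codon to the first in-frame stop codon."""
    begin = mrna.find(start)  # the initiation codon sets the reading frame
    if begin == -1:
        return ""  # no initiation codon, so no reading frame is set
    protein = []
    for i in range(begin, len(mrna) - 2, 3):  # no bases are skipped between codons
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # termination codon: no amino acid is added
            break
        protein.append(amino_acid)
    return "".join(protein)


# The nine-nucleotide sequence used in Figure 15.14.
for frame in reading_frames("AUACGAGUC"):
    print(frame)

# A hypothetical mRNA: two leader bases, AUG, two sense codons, and a stop codon.
print(translate("GGAUGCCCACGUAAGG"))
```

For the Figure 15.14 sequence, the three frames are AUA CGA GUC, then UAC GAG, then ACG AGU. The hypothetical mRNA is read as AUG CCC ACG UAA, giving Met-Pro-Thr (printed as MPT); in a bacterial cell the first methionine would be N-formylmethionine, a detail this sketch does not represent.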
Concepts

Each sequence of nucleotides possesses three potential reading frames. The correct reading frame is set by the initiation codon. The end of a protein-encoding sequence is marked by a termination codon. With a few exceptions, all organisms use the same genetic code.

Connecting Concepts

Characteristics of the Genetic Code

We have now considered a number of characteristics of the genetic code. Let’s pause for a moment and review these characteristics.

1. The genetic code consists of a sequence of nucleotides in DNA or RNA. There are four letters in the code, corresponding to the four bases: A, G, C, and U (T in DNA).
2. The genetic code is a triplet code. Each amino acid is encoded by a sequence of three consecutive nucleotides, called a codon.
3. The genetic code is degenerate: there are 64 codons but only 20 amino acids in proteins. Some codons are synonymous, specifying the same amino acid.
4. Isoaccepting tRNAs are tRNAs with different anticodons that accept the same amino acid; wobble allows the anticodon on one type of tRNA to pair with more than one type of codon on mRNA.
5. The code is generally nonoverlapping; each nucleotide in an mRNA sequence belongs to a single reading frame.
6. The reading frame is set by an initiation codon, which is usually AUG.
7. When a reading frame has been set, codons are read as successive groups of three nucleotides.
8. Any one of three termination codons (UAA, UAG, and UGA) can signal the end of a protein; no amino acids are encoded by the termination codons.
9. The code is almost universal.

Table 15.3 Some exceptions to the universal genetic code

Genome                     Codon       Universal code   Altered code
Bacterial DNA
  Mycoplasma capricolum    UGA         Stop             Trp
Mitochondrial DNA
  Human                    UGA         Stop             Trp
  Human                    AUA         Ile              Met
  Human                    AGA, AGG    Arg              Stop
  Yeast                    UGA         Stop             Trp
  Trypanosomes             UGA         Stop             Trp
  Plants                   CGG         Arg              Trp
Nuclear DNA
  Tetrahymena              UAA         Stop             Gln
  Paramecium               UAG         Stop             Gln

The Process of Translation

Now that we are familiar with the genetic code, we can begin to study the mechanism by which amino acids are assembled into proteins. Because more is known about translation in bacteria, we will focus primarily on bacterial translation. In most respects, eukaryotic translation is similar, although there are some significant differences that will be noted as we proceed through the stages of translation.

Translation takes place on ribosomes; indeed, ribosomes can be thought of as moving protein-synthesizing machines. Through a variety of techniques, a detailed view of the structure of the ribosome has been produced in recent years, which has greatly improved our understanding of the translational process. A ribosome attaches near the 5′ end of an mRNA strand and moves toward the 3′ end, translating the codons as it goes ( ◗ FIGURE 15.15). Synthesis begins at the amino end of the protein, and the protein is elongated by the addition of new amino acids to the carboxyl end.

◗ 15.15 The translation of an mRNA molecule takes place on a ribosome.

Protein synthesis can be conveniently divided into four stages: (1) the binding of amino acids to the tRNAs; (2) initiation, in which the components necessary for translation are assembled at the ribosome; (3) elongation, in which amino acids are joined, one at a time, to the growing polypeptide chain; and (4) termination, in which protein synthesis halts at the termination codon and the translation components are released from the ribosome.
The Binding of Amino Acids to Transfer RNAs

The first stage of translation is the binding of tRNA molecules to their appropriate amino acids. When linked to its amino acid, a tRNA delivers that amino acid to the ribosome, where the tRNA’s anticodon pairs with a codon on mRNA. This process enables the amino acids to be joined in the order specified by the mRNA. Proper translation, then, first requires the correct binding of tRNA and amino acid.

As already mentioned, a cell typically possesses from 30 to 50 different tRNAs, and, collectively, these tRNAs are attached to the 20 different amino acids. Each tRNA is specific for a particular kind of amino acid. All tRNAs have the sequence CCA at the 3′ end, and the carboxyl group (COO⁻) of the amino acid is attached to the 2′- or 3′-hydroxyl group of the adenine nucleotide at the end of the tRNA ( ◗ FIGURE 15.16). If each tRNA is specific for a particular amino acid but all amino acids are attached to the same nucleotide (A) at the 3′ end of a tRNA, how does a tRNA link up with its appropriate amino acid?

[Figure 15.16 labels: amino acid (amino group, R group, carboxyl group); tRNA with the sequence CCA at the 3′ end of the amino acid acceptor stem; adenine; anticodon.]

[Figure 15.17 labels: acceptor stem, TψC arm, DHU arm, extra arm, anticodon arm, anticodon. 1 Positions in blue are the same in all tRNAs and cannot be used in differentiating among tRNAs. 2 Positions in red are important in the recognition of tRNAs by one synthetase. 3 Positions in green are used by more than one synthetase.]

◗ 15.17 Certain positions on tRNA molecules are recognized by the appropriate aminoacyl-tRNA synthetase.

The key to specificity between an amino acid and its tRNA is a set of enzymes called aminoacyl-tRNA synthetases. A cell has 20 different aminoacyl-tRNA synthetases, one for each of the 20 amino acids. Each synthetase recognizes a particular amino acid, as well as all the tRNAs that accept that amino acid.
Recognition of the appropriate amino acid by a synthetase is based on the different sizes, charges, and R groups of the amino acids. The tRNAs, however, are all similar in tertiary structure. How does a synthetase distinguish among tRNAs? The recognition of tRNAs by a synthetase depends on the differing nucleotide sequences of tRNAs. Researchers have identified which nucleotides are important in recognition by altering different nucleotides in a particular tRNA and determining whether the altered tRNA is still recognized by its synthetase. The results of these studies revealed that the anticodon loop, the DHU-loop, and the acceptor stem are particularly critical for the identification of most tRNAs ( ◗ FIGURE 15.17).

◗ 15.16 An amino acid attaches to the 3′ end of a tRNA. The carboxyl group (COO⁻) of the amino acid attaches to the hydroxyl group of the 2′- or 3′-carbon atom of the final nucleotide at the 3′ end of the tRNA, in which the base is always an adenine.

[Figure 15.18: 1 In the first step, the amino acid reacts with ATP,… 2 …producing aminoacyl-AMP and PPi. 3 In the second step, the amino acid is transferred to the tRNA,… 4 …and AMP is released. Conclusion: At the end of tRNA charging, an amino acid is linked to its appropriate tRNA.]

◗ 15.18 An amino acid becomes attached to the appropriate tRNA in a two-step reaction.

The attachment of a tRNA to its appropriate amino acid (termed tRNA charging) requires energy, which is supplied by adenosine triphosphate (ATP):

amino acid + tRNA + ATP → aminoacyl-tRNA + AMP + PPi

Two phosphates are cleaved from ATP, producing adenosine monophosphate (AMP) and pyrophosphate (PPi), as well as the aminoacylated tRNA (the tRNA with its attached amino acid). This reaction takes place in two steps ( ◗ FIGURE 15.18).
To identify the resulting aminoacylated tRNA, we write the three-letter abbreviation for the amino acid in front of the tRNA; for example, the amino acid alanine (Ala) attaches to its tRNA (tRNA^Ala), giving rise to its aminoacyl-tRNA (Ala-tRNA^Ala).

Errors in tRNA charging are rare; they occur in only about 1 in 10,000 to 1 in 100,000 reactions. This fidelity is due to the presence of proofreading activity in the synthetases, which detects and removes incorrectly paired amino acids from the tRNAs.

Concepts

Amino acids are attached to specific tRNAs by aminoacyl-tRNA synthetases in a two-step reaction that requires ATP.

The Initiation of Translation

The second stage in the process of protein synthesis is initiation. During initiation, all the components necessary for protein synthesis assemble: (1) mRNA; (2) the small and large subunits of the ribosome; (3) a set of three proteins called initiation factors; (4) initiator tRNA with N-formylmethionine attached (fMet-tRNA^fMet); and (5) guanosine triphosphate (GTP). Initiation comprises three major steps. First, mRNA binds to the small subunit of the ribosome. Second, initiator tRNA binds to the mRNA through base pairing between the codon and anticodon. Third, the large subunit of the ribosome joins the initiation complex. Let’s look at each of these steps more closely.

A functional ribosome exists as two subunits, the small 30S subunit and the large 50S subunit (in bacterial cells). When not actively translating, the two subunits exist in dynamic equilibrium, in which they are constantly joining and separating ( ◗ FIGURE 15.19). An mRNA molecule can bind to the small ribosome subunit only when the subunits are separate. Initiation factor 3 (IF-3) binds to the small subunit of the ribosome and prevents the large subunit from binding during initiation (see Figure 15.19b).
Key sequences on the mRNA required for ribosome binding have been identified in experiments in which the ribosome is allowed to bind to mRNA under conditions that allow initiation but prevent later stages of protein synthesis, thereby stalling the ribosome at the initiation site. After the ribosome has attached to the mRNA in these experiments, ribonuclease is added, which degrades all the mRNA except the region covered by the ribosome. The remaining mRNA can be separated from the ribosome and studied. The sequence covered by the ribosome during initiation is from 30 to 40 nucleotides long and includes the AUG initiation codon. Within the ribosome-binding site is the Shine-Dalgarno consensus sequence ( ◗ FIGURE 15.20) (see Chapter 14), which is complementary to a sequence of nucleotides at the 3′ end of 16S rRNA (part of the small subunit of the ribosome). During initiation, the nucleotides in the Shine-Dalgarno sequence pair with their complementary nucleotides in the 16S rRNA, allowing the small subunit of the ribosome to attach to the mRNA and positioning the ribosome directly over the initiation codon.

Next, the initiator fMet-tRNA^fMet attaches to the initiation codon (see Figure 15.19c). This step requires initiation factor 2 (IF-2), which forms a complex with GTP. A third factor, initiation factor 1 (IF-1), enhances the dissociation of the large and small ribosomal subunits. At this point, the initiation complex consists of (1) the small subunit of the ribosome; (2) the mRNA; (3) the initiator tRNA with its amino acid (fMet-tRNA^fMet); (4) one molecule of GTP; and (5) IF-3, IF-2, and IF-1. These components are collectively known as the 30S initiation complex (see Figure 15.19c). In the final step of initiation, IF-3 dissociates from the small subunit, allowing the large subunit of the ribosome to join the initiation complex. The molecule of GTP (provided by IF-2) is hydrolyzed to guanosine diphosphate (GDP), and IF-1 and IF-2 depart (see Figure 15.19d). When the large subunit has joined the initiation complex, it is called the 70S initiation complex.

[Figure 15.19, panels a through d: 1 The ribosomal subunits (large, 50S; small, 30S) exist in dynamic equilibrium, constantly joining and separating. 2 IF-3 binds to the small subunit of the ribosome, preventing the large subunit from binding,… 3 …which allows the small subunit of the ribosome to attach to mRNA. 4 A tRNA charged with N-formylmethionine forms a complex with IF-2 and GTP… 5 …and joins the small subunit of the ribosome and the mRNA (the 30S initiation complex). 6 IF-1, IF-2, and IF-3 dissociate from the complex, GTP is hydrolyzed to GDP,…]

[Figure 15.20, panel a, sequences around the initiation codon, written 5′ → 3′:
E. coli trpA gene     AGCACGAGGGGAAAUCUGAUGGAACGCUAC
E. coli araB gene     UUUGGAUGGAGUGAAACGAUGGCGAUUGCA
E. coli lacI gene     CAAUUCAGGGUGGUGAAUCUGAAACCAGUA
λ phage CRO gene      AUGUACUAAGGAGGUUGUAUGGAACAAGCG
Panel b: the mRNA 5′ AUGUACUAAGGAGGUUGUAUGGAACAAGACG 3′ pairs through its Shine-Dalgarno sequence with the 16S rRNA sequence 3′ AUUCCUCCA 5′, upstream of the initiation codon.]

◗ 15.20 Shine-Dalgarno consensus sequences in mRNA are required for the attachment of the small subunit of the ribosome. The Shine-Dalgarno sequences are complementary to a sequence of nucleotides found near the 3′ end of 16S rRNA in the small subunit of the ribosome. These complementary nucleotides base pair during the initiation of translation.

Similar events take place in the initiation of translation in eukaryotic cells, but there are some important differences. In bacterial cells, sequences in 16S rRNA of the small subunit of the ribosome bind to the Shine-Dalgarno sequence in mRNA; this binding positions the ribosome over the start codon. No analogous consensus sequence exists in eukaryotic mRNA.
Instead, the cap at the 5′ end of eukaryotic mRNA plays a critical role in the initiation of translation. The small subunit of the eukaryotic ribosome, with the help of initiation factors, recognizes the cap and binds there; the small subunit then migrates along (scans) the mRNA until it locates the first AUG codon. The identification of the start codon is facilitated by the presence of a consensus sequence (called the Kozak sequence) that surrounds the start codon:

5′–ACCAUGG–3′ (Kozak sequence; AUG is the start codon)

[Figure 15.19, continued: the initiation codon pairs with fMet-tRNA^fMet. 7 …and the large subunit joins to create a 70S initiation complex. Conclusion: At the end of initiation, the ribosome is assembled on the mRNA and the first tRNA is attached to the initiation codon.]

◗ 15.19 The initiation of translation in bacterial cells requires several initiation factors and GTP.

Another important difference is that eukaryotic initiation requires more initiation factors. Some factors keep the ribosomal subunits separated, just as IF-3 does in bacterial cells. Others recognize the 5′ cap on mRNA and allow the small subunit of the ribosome to bind there. Still others possess RNA helicase activity, which is used to unwind secondary structures that may exist in the 5′ untranslated region of mRNA, allowing the small subunit to move down the mRNA until the initiation codon is reached. Other initiation factors help bring the initiator tRNA and methionine (Met-tRNA^fMet) to the initiation complex.

The poly(A) tail at the 3′ end of eukaryotic mRNA also plays a role in the initiation of translation. Proteins that attach to the poly(A) tail interact with proteins that bind to the 5′ cap, enhancing the binding of the small subunit of the ribosome to the 5′ end of the mRNA.
This interaction between the 5′ cap and the 3′ tail suggests that the mRNA bends backward during the initiation of translation, forming a circular structure ( ◗ FIGURE 15.21). A few eukaryotic mRNAs contain internal ribosome entry sites, where ribosomes can bind directly without first attaching to the 5′ cap.

[Figure 15.21 labels: start codon (AUG), stop codon (UAA), 5′ untranslated region, 3′ untranslated region, cap-binding proteins, poly(A) proteins, poly(A) tail, ribosome. Proteins that attach to the 3′ poly(A) tail interact with cap-binding proteins…]

Concepts

In the initiation of translation in bacterial cells, the small ribosomal subunit attaches to mRNA, and initiator tRNA attaches to the initiation codon. This process requires several initiation factors (IF-1, IF-2, and IF-3) and GTP. In the final step, the large ribosomal subunit joins the initiation complex.

Elongation

The next stage in protein synthesis is elongation, in which amino acids are joined to create a polypeptide chain. Elongation requires (1) the 70S complex just described; (2) tRNAs charged with their amino acids; (3) several elongation factors (EF-Ts, EF-Tu, and EF-G); and (4) GTP.

A ribosome has three sites that can be occupied by tRNAs: the aminoacyl, or A, site; the peptidyl, or P, site; and the exit, or E, site ( ◗ FIGURE 15.22a). The initiator tRNA immediately occupies the P site (the only site to which the fMet-tRNA^fMet is capable of binding), but all other tRNAs first enter the A site. After initiation, the ribosome is attached to the mRNA, and fMet-tRNA^fMet is positioned over the AUG start codon in the P site; the adjacent A site is unoccupied (see Figure 15.22a).

Elongation occurs in three steps. The first step ( ◗ FIGURE 15.22b) is the delivery of a charged tRNA (tRNA with its amino acid attached) to the A site. This requires the presence of elongation factor Tu (EF-Tu), elongation factor Ts (EF-Ts), and GTP.
EF-Tu first joins with GTP and then binds to a charged tRNA to form a three-part complex. This three-part complex enters the A site of the ribosome, where the anticodon on the tRNA pairs with the codon on the mRNA. After the charged tRNA is in the A site, GTP is cleaved to GDP, and the EF-Tu–GDP complex is released ( ◗ FIGURE 15.22c). Factor EF-Ts regenerates EF-Tu–GDP to EF-Tu–GTP. In eukaryotic cells, a similar set of reactions delivers the charged tRNA to the A site.

[Figure 15.21, continued: …and enhance the binding of the ribosome to the 5′ end of the mRNA.]

◗ 15.21 The poly(A) tail at the 3′ end of eukaryotic mRNA plays a role in the initiation of translation.

The second step of elongation is the creation of a peptide bond between the amino acids that are attached to tRNAs in the P and A sites ( ◗ FIGURE 15.22d). The formation of this peptide bond releases the amino acid in the P site from its tRNA. The activity responsible for peptide-bond formation in the ribosome is referred to as peptidyl transferase. For many years, the assumption was that this activity is carried out by one of the proteins in the large subunit of the ribosome. Evidence, however, now indicates that the catalytic activity is a property of the rRNA in the large subunit of the ribosome; this rRNA acts as a ribozyme (see Chapter 14).

The third step in elongation is translocation ( ◗ FIGURE 15.22e), the movement of the ribosome down the mRNA in the 5′ → 3′ direction. This step positions the ribosome over the next codon and requires elongation factor G (EF-G) and the hydrolysis of GTP to GDP. Because the tRNAs in the P and A sites are still attached to the mRNA through codon-anticodon pairing, they do not move with the ribosome as it translocates. Consequently, the ribosome shifts so that the tRNA that previously occupied the P site now occupies the E site, from which it moves into the cytoplasm where it may be recharged with another amino acid.
Translocation also causes the tRNA that occupied the A site (which is attached to the growing polypeptide chain) to be in the P site, leaving the A site open. Thus, the progress of each tRNA through the ribosome during elongation can be summarized as follows: cytoplasm → A site → P site → E site → cytoplasm. As discussed earlier, the initiator tRNA is an exception: it attaches directly to the P site and never occupies the A site.

[Figure 15.22, panels a through c: 1 fMet-tRNA^fMet occupies the P site of the ribosome. 2 EF-Tu, EF-Ts, GTP, and charged tRNA form a complex… 3 …that enters the A site of the ribosome. 4 After the charged tRNA is placed into the A site, GTP is cleaved to GDP, and the EF-Tu–GDP complex is released. 5 EF-Ts regenerates the EF-Tu–GTP complex, which is then ready to combine with another charged tRNA.]

◗ 15.22 The elongation of translation comprises three steps.

After translocation, the A site of the ribosome is empty and ready to receive the tRNA specified by the next codon. The elongation cycle (Figure 15.22a through d) repeats itself: a charged tRNA and its amino acid occupy the A site, a peptide bond is formed between the amino acids in the A and P sites, and the ribosome translocates to the next codon. Throughout the cycle, the polypeptide chain remains attached to the tRNA in the P site. The ribosome moves down the mRNA in the 5′ → 3′ direction, adding amino acids one at a time according to the order specified by the mRNA’s codon sequence. Elongation in eukaryotic cells takes place in a similar manner.

Concepts

Elongation consists of three steps: (1) a charged tRNA enters the A site, (2) a peptide bond is created between amino acids in the A and P sites, and (3) the ribosome translocates to the next codon. Elongation requires several elongation factors (EF-Tu, EF-Ts, and EF-G) and GTP.
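The three-step elongation cycle summarized above can be traced with a small state model of the E, P, and A sites. This is a simplified sketch, not a biochemical simulation: the function name elongate, the tRNA labels, and the two-codon example are hypothetical, and the GTP tally counts only the two hydrolysis events named in the text (one with EF-Tu delivery, one with EF-G translocation) for each cycle.

```python
def elongate(codons, decode):
    """Trace elongation for the codons that follow the start codon.

    Returns the polypeptide chain and the number of GTP molecules hydrolyzed.
    """
    # State after initiation: fMet-tRNA sits in the P site; A and E are empty.
    sites = {"E": None, "P": {"tRNA": "tRNA-fMet", "chain": ["fMet"]}, "A": None}
    gtp_hydrolyzed = 0

    for codon in codons:
        amino_acid = decode[codon]

        # Step 1: EF-Tu and GTP deliver a charged tRNA to the A site; GTP -> GDP.
        sites["A"] = {"tRNA": f"tRNA-{amino_acid}", "chain": [amino_acid]}
        gtp_hydrolyzed += 1

        # Step 2: peptide-bond formation releases the chain from the P-site tRNA
        # and joins it to the amino acid carried by the A-site tRNA.
        sites["A"]["chain"] = sites["P"]["chain"] + sites["A"]["chain"]
        sites["P"]["chain"] = []

        # Step 3: translocation (EF-G, GTP -> GDP) shifts P -> E and A -> P.
        sites["E"], sites["P"], sites["A"] = sites["P"], sites["A"], None
        gtp_hydrolyzed += 1

        # The uncharged tRNA in the E site returns to the cytoplasm.
        sites["E"] = None

    return sites["P"]["chain"], gtp_hydrolyzed


# Hypothetical example: the two codons that follow AUG, with their amino acids.
chain, gtp = elongate(["CCC", "ACG"], {"CCC": "Pro", "ACG": "Thr"})
print("-".join(chain), "| GTP hydrolyzed:", gtp)
```

After each pass through the loop, the A site is empty and the growing chain is held by the tRNA in the P site, which matches the state described at the end of each elongation cycle. For the two example codons, the trace ends with the chain fMet-Pro-Thr and four GTP molecules hydrolyzed.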
Termination

Protein synthesis terminates when the ribosome translocates to a termination codon. Because there are no tRNAs with anticodons complementary to the termination codons, no tRNA enters the A site of the ribosome when a termination codon is encountered ( ◗ FIGURE 15.23a). Instead, proteins called release factors bind to the ribosome ( ◗ FIGURE 15.23b). E. coli has three release factors: RF1, RF2, and RF3. Release factor 1 recognizes the termination codons UAA and UAG, and RF2 recognizes UGA and UAA. Release factor 3 forms a complex with GTP and binds to the ribosome. The release factors then promote the cleavage of the tRNA in the P site from the polypeptide chain; in the process, the GTP that is complexed to RF3 is hydrolyzed to GDP. Additional factors help bring about the release of the tRNA from the P site, the release of the mRNA from the ribosome, and the dissociation of the ribosome ( ◗ FIGURE 15.23c). Translation in eukaryotic cells terminates in a similar way, except that there are two release factors: eRF1, which recognizes all three termination codons, and eRF3, which binds GTP and stimulates the release of the polypeptide from the ribosome.

Findings from recent studies suggest that the release factors bring about the termination of translation by completing a final elongation cycle of protein synthesis. In this model, RF1 and RF2 are similar in size and shape to tRNAs and occupy the A site of the ribosome, just as the amino acid–tRNA–EF-Tu–GTP complex does during an elongation cycle. Release factor 3 is structurally similar to EF-G; it then translocates RF1 and RF2 to the P site, as well as the last tRNA to the E site, in a way similar to that in which EF-G brings about translocation. When both the A site and the P site of the ribosome are cleared of tRNAs, the ribosome can dissociate. Research findings also indicate that some of the sequences in the rRNA play a role in the recognition of termination codons.

Concepts

Termination takes place when the ribosome reaches a termination codon. Release factors bind to the termination codon, causing the release of the polypeptide from the last tRNA, the tRNA from the ribosome, and the mRNA from the ribosome.

The overall process of protein synthesis, including tRNA charging, initiation, elongation, and termination, is summarized in ◗ FIGURE 15.24, and the components taking part in this process are listed in Table 15.4.

[Figure 15.22, panels d and e: 6 A peptide bond is formed between the amino acids in the P and A sites. 7 The peptide-bond formation releases the amino acid in the P site from its tRNA. 8 The ribosome moves down the mRNA to the next codon (translocation),… 9 …which requires EF-G and GTP. 10 The tRNA that previously occupied the P site is now in the E site, from which it moves into the cytoplasm. 11 The tRNA that occupied the A site is now in the P site, leaving the A site open. 12 The unoccupied A site is now ready to receive another tRNA. Conclusion: At the end of each cycle of elongation, the amino acid in the P site is added to the polypeptide chain and the A site is free to accept another tRNA.]

www.whfreeman.com/pierce A brief overview of translation and how it fits into the central dogma of genetics

RNA–RNA Interactions in Translation

The process of translation is rich in RNA–RNA interactions (which were discussed in Chapter 14 in the context of RNA processing). For example, in bacterial translation, the Shine-Dalgarno consensus sequence at the 5′ end of the mRNA pairs with the 3′ end of the 16S rRNA (see Figure 15.20), which ensures the binding of the ribosome to mRNA.
Mutations that alter the Shine-Dalgarno sequence, so that the mRNA and rRNA are no longer complementary, inhibit translation. Corresponding mutations affecting the rRNA that restore complementarity allow translation to proceed. RNA–RNA interactions also take place between the tRNAs in the A and P sites and the rRNAs found in both the large and the small subunits of the ribosome. Furthermore, association of the large and small subunits of the ribosome may require interactions between the 16S rRNA and the 23S rRNA, although whether ribosomal proteins are implicated is not yet clear. Finally, tRNAs and mRNAs interact through their codon-anticodon pairing.

Polyribosomes

In both prokaryotic and eukaryotic cells, mRNA molecules are translated simultaneously by multiple ribosomes ( ◗ FIGURE 15.25). The resulting structure, an mRNA with several ribosomes attached, is called a polyribosome. Each ribosome successively attaches to the ribosome-binding site at the 5′ end of the mRNA and moves toward the 3′ end; the polypeptide associated with each ribosome becomes progressively longer as the ribosome moves along the mRNA.

In prokaryotic cells, transcription and translation are simultaneous; so multiple ribosomes may be attached to the 5′ end of the mRNA while transcription is still taking place at the 3′ end, as shown in ◗ FIGURE 15.26. Until recently, transcription and translation were thought not to be simultaneous in eukaryotes, because transcription takes place in the nucleus and all translation was assumed to take place in the cytoplasm. However, research findings have now demonstrated that some translation takes place within the eukaryotic nucleus, and evidence suggests that, when the nucleus is the site of translation, transcription and translation may be simultaneous, much as in prokaryotes.

Concepts

In both prokaryotic and eukaryotic cells, multiple ribosomes may be attached to a single mRNA, generating a structure called a polyribosome.
◗ 15.23 Translation ends when a stop codon is encountered. (1) When the ribosome translocates to a stop codon, there is no tRNA with an anticodon that can pair with the codon in the A site. (2) RF1 or RF2 attaches to the A site, (3) and RF3 forms a complex with GTP and binds to the ribosome. (4) The polypeptide is released from the tRNA in the P site. (5) GTP associated with RF3 is hydrolyzed to GDP. (6) The tRNA, mRNA, and release factors are released from the ribosome. Conclusion: When a stop codon is encountered, release factors associate with the ribosome and bring about the termination of translation.

◗ 15.24 The four steps involved in translation are tRNA charging (the binding of amino acids to tRNAs), initiation, elongation, and termination. In this process, amino acids are linked together in the order specified by the mRNA to create a polypeptide chain. A number of initiation, elongation, and release factors take part in the process, and energy is supplied by ATP and GTP. Conclusion: Through the process of translation, amino acids are linked in the order specified by the mRNA.

Table 15.4 Components required for protein synthesis in bacterial cells

Stage: Binding of amino acid to tRNA
  Amino acids: Building blocks of proteins
  tRNAs: Deliver amino acids to ribosomes
  Aminoacyl-tRNA synthetase: Attaches amino acids to tRNAs
  ATP: Provides energy for binding amino acid to tRNA

Stage: Initiation
  mRNA: Carries coding instructions
  fMet-tRNAfMet: Provides first amino acid in peptide
  30S ribosomal subunit: Attaches to mRNA
  50S ribosomal subunit: Stabilizes tRNAs and amino acids
  Initiation factor 1: Enhances dissociation of large and small subunits of ribosome
  Initiation factor 2: Binds GTP; delivers fMet-tRNAfMet to initiation codon
  Initiation factor 3: Binds to 30S subunit and prevents association with 50S subunit

Stage: Elongation
  70S initiation complex: Functional ribosome with A, P, and E sites and peptidyl transferase activity where protein synthesis takes place
  Charged tRNAs: Bring amino acids to ribosome and help assemble them in order specified by mRNA
  Elongation factor Tu: Binds GTP and charged tRNA; delivers charged tRNA to A site
  Elongation factor Ts: Generates active elongation factor Tu
  Elongation factor G: Stimulates movement of ribosome to next codon
  GTP: Provides energy
  Peptidyl transferase: Creates peptide bond between amino acids in A site and P site

Stage: Termination
  Release factors 1, 2, and 3: Bind to ribosome when stop codon is reached and terminate translation

Connecting Concepts

A Comparison of Bacterial and Eukaryotic Translation

We have now considered the process of translation in bacterial cells and noted some distinctive differences that exist in eukaryotic cells. Let's take a few minutes to reflect on some of the important similarities and differences of protein synthesis in bacterial and eukaryotic cells.

First, we should emphasize that the genetic code of bacterial and eukaryotic cells is virtually identical; the only difference is in the amino acid specified by the initiation codon. In bacterial cells, AUG codes for a modified type of methionine, N-formylmethionine, whereas, in eukaryotic cells, AUG codes for unformylated methionine. One consequence of the fact that bacteria and eukaryotes use the same code is that eukaryotic genes can be translated in bacterial systems, and vice versa; this feature makes genetic engineering possible, as we will see in Chapter 18.

Another difference is that transcription and translation take place simultaneously in bacterial cells, but the nuclear envelope may separate these processes in eukaryotic cells. The physical separation of transcription and translation has important implications for the control of gene expression, which we will consider in Chapter 16, and it allows for extensive modification of eukaryotic mRNAs, as discussed in Chapter 14. However, it is now evident that some translation does take place in the eukaryotic nucleus and, there, transcription and translation may be simultaneous. The extent of nuclear translation and how it may affect gene regulation are not yet clear.

Yet another difference is that mRNA in bacterial cells is short lived, typically lasting only a few minutes, but the longevity of mRNA in eukaryotic cells is highly variable and is frequently hours or days. Thus the synthesis of a particular bacterial protein ceases very quickly after transcription of the corresponding mRNA stops, but protein synthesis in eukaryotic cells may continue long after transcription has ended.

In both bacterial and eukaryotic cells, aminoacyl-tRNA synthetases attach amino acids to their appropriate tRNAs; the chemical reaction employed is the same. There are significant differences in the sizes and compositions of bacterial and eukaryotic ribosomal subunits. For example, the large subunit of the eukaryotic ribosome contains three rRNAs, whereas the bacterial ribosome contains only two.
These differences allow antibiotics and other substances to inhibit bacterial translation while having no effect on the translation of eukaryotic nuclear genes, as will be discussed near the end of this chapter.

◗ 15.25 An mRNA molecule may be translated simultaneously by several ribosomes. (a) Four ribosomes are translating a eukaryotic mRNA molecule, moving from the 5′ end to the 3′ end. (b) In this electron micrograph of a polyribosome from the silkworm, the dark-staining spheres are ribosomes, and the long, thin filament connecting the ribosomes is mRNA. The 5′ end of the mRNA is toward the upper left-hand corner of the micrograph. The twisted filament coming out of each ribosome is a polypeptide chain. The polypeptide chains become longer as the ribosomes move toward the 3′ end of the mRNA. (Part b, O. L. Miller, Jr., and Barbara A. Hamkalo.)

Other fundamental differences lie in the process of initiation. In bacterial cells, the small subunit of the ribosome attaches directly to the region surrounding the start codon through hydrogen bonding between the Shine-Dalgarno consensus sequence in the 5′ untranslated region of the mRNA and a sequence at the 3′ end of the 16S rRNA. In contrast, the small subunit of a eukaryotic ribosome first binds to proteins attached to the 5′ cap on mRNA and then migrates down the mRNA, scanning the sequence until it encounters the first AUG initiation codon. (A few eukaryotic mRNAs have internal ribosome-binding sites that utilize a specialized initiation mechanism similar to that seen in bacterial cells.) Additionally, more initiation factors take part in eukaryotic initiation than in bacterial initiation.
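The contrast between the two initiation mechanisms can be made concrete with a small sketch. The eukaryotic function models scanning as a search for the first AUG from the 5′ end; the bacterial function looks for an AUG a short distance downstream of a Shine-Dalgarno-like motif. The motif `AGGAGG`, the `max_spacer` window of 12 nucleotides, the function names, and the example mRNA are all assumptions chosen for illustration.

```python
# Minimal sketch of start-codon selection; names and parameters are
# hypothetical, chosen only to illustrate the two mechanisms.

def eukaryotic_start(mrna):
    """Scanning model: index of the first AUG from the 5' end, or None."""
    index = mrna.find("AUG")
    return index if index != -1 else None


def bacterial_start(mrna, sd="AGGAGG", max_spacer=12):
    """Index of the first AUG that begins within max_spacer nucleotides
    downstream of a Shine-Dalgarno-like motif, or None if there is none."""
    sd_pos = mrna.find(sd)
    if sd_pos == -1:
        return None
    window_start = sd_pos + len(sd)
    # The +3 lets an AUG that begins at the last allowed position fit.
    index = mrna.find("AUG", window_start, window_start + max_spacer + 3)
    return index if index != -1 else None


# Hypothetical mRNA, written 5'->3'
mrna = "GCAUGCCAGGAGGUUAACCAUGGCU"
print(eukaryotic_start(mrna))  # 2: the first AUG encountered by scanning
print(bacterial_start(mrna))   # 19: the AUG positioned by the upstream motif
```

The same sequence yields different start sites under the two models, which is the practical consequence of ribosome positioning by rRNA pairing versus scanning from the cap.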
Elongation and termination are similar in bacterial and eukaryotic cells, although different elongation and termination factors are used. In both types of organisms, mRNAs are translated multiple times and are simultaneously attached to several ribosomes, forming polyribosomes.

What about translation in archaea, which are prokaryotic in structure (see Chapter 2) but are similar to eukaryotes in other genetic processes such as transcription? Much less is known about the process of translation in archaea, but available evidence suggests that they possess a mixture of eubacterial and eukaryotic features. Because archaea lack nuclear membranes, transcription and translation take place simultaneously, just as they do in eubacterial cells. As mentioned earlier, archaea utilize unformylated methionine as the initiator amino acid, a characteristic of eukaryotic translation. Findings from recent studies of DNA sequences that code for initiation and elongation factors in archaea suggest that some of them are similar to those found in eubacteria, whereas others are similar to those found in eukaryotes. Finally, some of the antibiotics that inhibit translation in eubacteria have no effect on translation in archaea.

◗ 15.26 In prokaryotic cells, transcription and translation take place simultaneously. While mRNA is being transcribed from the DNA template at mRNA's 3′ end, translation is taking place simultaneously at mRNA's 5′ end.

The Posttranslational Modifications of Proteins

After translation, proteins in both prokaryotic and eukaryotic cells may undergo alterations termed posttranslational modifications. A number of different types of modifications are possible. As mentioned earlier, the formyl group or the entire methionine residue may be removed from the amino end of a protein. Some proteins are synthesized as larger precursor proteins and must be cleaved and trimmed by enzymes before the proteins can become functional.
For others, the attachment of carbohydrates may be required for activation. The functions of many proteins depend critically on the proper folding of the polypeptide chain; some proteins spontaneously fold into their correct shapes, but, for others, correct folding may initially require the participation of other molecules called molecular chaperones.

In eukaryotic cells, the amino end of a protein is often acetylated after translation. Another modification of some proteins is the removal of 15 to 30 amino acids, called the signal sequence, at the amino end of the protein. The signal sequence helps direct a protein to a specific location within the cell, after which the sequence is removed by special enzymes. Amino acids within a protein may also be modified: phosphates, carboxyl groups, and methyl groups are added to some amino acids.

Concepts

Many proteins undergo posttranslational modifications after their synthesis.

Translation and Antibiotics

Antibiotics are drugs that kill microorganisms. Not just any poison will do: to make an effective antibiotic, the trick is to kill the microbe without harming the patient. Antibiotics must be carefully chosen so that they destroy bacterial cells but not the eukaryotic cells of their host.

Translation is frequently the target of antibiotics because translation is essential to all living organisms and differs significantly between bacterial and eukaryotic cells. For example, bacterial and eukaryotic ribosomes differ in size and composition. A number of antibiotics bind selectively to bacterial ribosomes and inhibit various steps in translation, but they do not affect eukaryotic ribosomes. Tetracyclines, for instance, are a class of antibiotics that bind to the A site of bacterial ribosomes and block the entry of charged tRNAs, yet they have no effect on eukaryotic ribosomes.
Neomycin binds to the ribosome near the A site and induces translational errors, probably by causing mistakes in the binding of charged tRNAs to the A site. Chloramphenicol binds to the large subunit of the ribosome and blocks peptide-bond formation. Streptomycin binds to the small subunit of the ribosome and inhibits initiation, and erythromycin blocks translocation. Although chloramphenicol and streptomycin are potent inhibitors of translation in bacteria, they do not inhibit translation in archaebacteria.

The three-dimensional structure of puromycin resembles the 3′ end of a charged tRNA, permitting puromycin to enter the A site of a ribosome efficiently and inhibit the entry of tRNAs. A peptide bond can form between the puromycin molecule in the A site and an amino acid on the tRNA in the P site of the ribosome, but puromycin cannot bind to the P site and translocation does not take place, blocking further elongation of the protein. Because tRNA structure is similar in all organisms, puromycin inhibits translation in both bacterial and eukaryotic cells; consequently, puromycin kills eukaryotic cells along with bacteria and is sometimes used in cancer therapy to destroy tumor cells.

Many antibiotics act by blocking specific steps in translation, and different antibiotics affect different steps in protein synthesis. Because of this specificity, antibiotics are frequently used to study the process of protein synthesis.

Connecting Concepts Across Chapters

This chapter has focused on the process by which genetic information in an mRNA molecule is transferred to the amino acid sequence of a protein. This process is termed translation because information contained in the language of nucleotides must be "translated" into the language of amino acids. The link between genotype and phenotype is usually a protein: most genes affect phenotypes by encoding proteins.
How the presence of a protein produces a particular anatomical, physiological, or behavioral trait, however, is often far from clear, as was illustrated by the story of Lesch-Nyhan disease. The relation between genes and traits is the subject of much current research and will be explored further in Chapters 16 and 21.

In this chapter, we have examined the nature of the genetic code. It is a very concise code, with each codon consisting of three nucleotides, the minimum number capable of specifying all 20 common amino acids. Breaking the genetic code required great ingenuity and hard work on the part of a number of geneticists.

Much of this chapter has centered on protein synthesis. We learned that translation is a highly complex process: rRNAs, ribosomal proteins, tRNAs, mRNA, initiation factors, elongation factors, release factors, and aminoacyl-tRNA synthetases all help to assemble amino acids into a protein. This complexity might seem surprising, because the peptide bonds that hold amino acids together are simple covalent bonds. Translation is complex not because of any special property of the peptide bond, but rather because the amino acids must be linked in a highly precise order. The amino acid sequence determines the secondary and tertiary structures of a protein, which are critical to its function; so the genetic information in an mRNA molecule must be accurately translated. The complexity of translation has evolved to ensure that few mistakes are made in the course of protein synthesis.

An important theme in protein synthesis is RNA-RNA interaction, which takes place between tRNAs and mRNA, between mRNA and rRNAs, and between tRNAs and rRNAs. The prominence of these RNA-RNA interactions in translation reinforces the proposal that life first evolved in an RNA world, where flexible and versatile RNA molecules carried out many life processes (Chapter 13).
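The statement that three nucleotides is the minimum codon length capable of specifying 20 amino acids follows from counting: with b types of bases and a codon length of n, there are b^n possible codons, and this number must be at least the number of amino acids. A minimal sketch of that calculation is given here; the function name `min_codon_length` is hypothetical.

```python
# Minimal sketch: smallest codon length n such that bases**n >= amino acids.
# The function name is a hypothetical helper, introduced for illustration.

def min_codon_length(num_bases, num_amino_acids):
    """Return the smallest n for which num_bases ** n >= num_amino_acids."""
    if num_bases < 2 or num_amino_acids < 1:
        raise ValueError("need at least 2 base types and 1 amino acid")
    length = 1
    while num_bases ** length < num_amino_acids:
        length += 1
    return length


# Four bases, 20 amino acids: 4 and 16 codons are too few, 64 is enough.
print(min_codon_length(4, 20))  # 3

# A hypothetical five-base code
for amino_acids in (4, 20, 30):
    print(amino_acids, min_codon_length(5, amino_acids))
```

With four bases, codon lengths of one and two give only 4 and 16 codons, so a triplet code (64 codons) is the shortest that can cover 20 amino acids.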
This chapter has built on our understanding of other processes of information transfer covered earlier in the book: replication (Chapter 12), transcription (Chapter 13), and RNA processing (Chapter 14). It also provides a critical foundation for later discussions of gene regulation (Chapter 16), gene mutations (Chapter 17), and the advanced topics of developmental genetics, cancer genetics, and immunological genetics (Chapter 21).

CONCEPTS SUMMARY

• Genes code for phenotypes by specifying the amino acid sequences of proteins.
• The relation between genes and proteins was first suggested by Archibald Garrod.
• Beadle and Tatum developed the one gene, one enzyme hypothesis, which proposed that each gene specifies one enzyme; this hypothesis was later modified to become the one gene, one polypeptide hypothesis.
• Proteins are composed of 20 different amino acids, several or many of which are linked together by peptide bonds. Chains of amino acids fold and associate to produce the secondary, tertiary, and quaternary structures of proteins.
• The genetic code is the way in which genetic information is stored in the nucleotide sequence of a gene.
• Solving the genetic code required several different approaches: the use of synthetic mRNAs with random sequences; short mRNAs that bind tRNAs with their amino acids; and long synthetic mRNAs with regularly repeating sequences.
• The genetic code is a triplet code: three nucleotides specify a single amino acid. It is also degenerate, nonoverlapping, and universal (almost).
• The degeneracy of the code means that more than one codon may specify an amino acid. Different tRNAs (isoaccepting tRNAs) may accept the same amino acid, and different codons may pair with the same anticodon through wobble, which can exist at the third position of the codon and which allows some nonstandard pairing of bases in this position.
• The reading frame is set by the initiation codon.
• The end of the protein-coding section of an mRNA is marked by one of three termination codons.
• Protein synthesis comprises four steps: (1) the binding of amino acids to the appropriate tRNAs, (2) initiation, (3) elongation, and (4) termination.
• The binding of an amino acid to a tRNA requires the presence of a specific aminoacyl-tRNA synthetase and ATP. The amino acid is attached by its carboxyl end to the 3′ end of the tRNA.
• In bacterial translation initiation, the small subunit of the ribosome attaches to the mRNA and is positioned over the initiation codon. It is joined by the first tRNA and its associated amino acid (N-formylmethionine in bacterial cells) and, later, by the large subunit of the ribosome. Initiation requires several initiation factors and GTP.
• In elongation, a charged tRNA enters the A site of a ribosome, a peptide bond is formed between amino acids in the A and P sites, and the ribosome moves (translocates) along the mRNA to the next codon. Elongation requires several elongation factors and GTP.
• Translation is terminated when the ribosome encounters one of the three termination codons. Release factors and GTP are required to bring about termination.
• Like RNA processing, translation requires a number of RNA-RNA interactions.
• Each mRNA may be simultaneously translated by several ribosomes, producing a structure called a polyribosome.
• Many proteins undergo posttranslational modification.

IMPORTANT TERMS

auxotroph
one gene, one enzyme hypothesis
one gene, one polypeptide hypothesis
amino acid
peptide bond
polypeptide
sense codon
degenerate genetic code
synonymous codons
isoaccepting tRNAs
wobble
nonoverlapping genetic code
reading frame
initiation codon
stop (termination or nonsense) codon
universal genetic code
aminoacyl-tRNA synthetase
tRNA charging
initiation factors (IF-1, IF-2, IF-3)
30S initiation complex
70S initiation complex
aminoacyl (A) site
peptidyl (P) site
exit (E) site
elongation factor Tu (EF-Tu)
elongation factor Ts (EF-Ts)
peptidyl transferase
translocation
elongation factor G (EF-G)
release factors (RF1, RF2, RF3)
polyribosome
molecular chaperone
signal sequence

Worked Problems

1. A series of auxotrophic mutants were isolated in Neurospora. Examination of fungi containing these mutations revealed that they grew on minimal medium to which various compounds (A, B, C, D) were added; growth responses to each of the four compounds are presented in the following table. Give the order of compounds A, B, C, and D in a biochemical pathway. Outline a biochemical pathway that includes these four compounds and indicate which step in the pathway is affected by each of the mutations.

[Table: growth responses of mutations 134, 276, 987, 773, 772, 146, 333, and 123 on minimal medium supplemented with compound A, B, C, or D; the individual entries are missing from this copy.]

• Solution

To solve this problem, we should first group the mutations for which compounds allow growth, as follows.

[Table: mutations 276, 773, 134, 146, 333, 123, 987, and 772 arranged into groups I, II, III, and IV according to their growth responses on compounds A, B, C, and D; the individual entries are missing from this copy.]

The underlying principle used to determine the order of the compounds in the pathway is as follows: If a compound is added after the block, it will allow the mutant to grow, whereas, if a compound is added before the block, it will have no effect. Applying this principle to the data in the table, we see that mutants in group I will grow if compound A, B, C, or D is added to the medium; so these mutations must affect a step before the production of all four compounds:

  --[group I mutations]--> compounds A, B, C, D

Group II mutants will grow if compound A, B, or D is added but not if compound C is added. Thus compound C comes before A, B, and D; and group II mutations affect the conversion of compound C into one of the other compounds:

  --[group I mutations]--> compound C --[group II mutations]--> compounds A, B, D

Group III mutants will grow if compound B or D is added but not if compound A or C is added. Thus group III mutations affect steps that follow the production of A and C; we have already determined that compound C precedes A in the pathway; so A must be the next compound in the pathway:

  --[group I mutations]--> compound C --[group II mutations]--> compound A --[group III mutations]--> compounds B, D

Finally, mutants in group IV will grow if compound D is added, but not if compound A, B, or C is added. Thus compound D is the fourth compound in the pathway, and mutations in group IV block the conversion of B into D:

  --[group I mutations]--> compound C --[group II mutations]--> compound A --[group III mutations]--> compound B --[group IV mutations]--> compound D

2. If there were five different types of bases in mRNA instead of four, what would be the minimum codon size (number of nucleotides) required to specify the following numbers of different amino acid types: (a) 4, (b) 20, (c) 30?

• Solution

To answer this question, we must determine the number of combinations (codons) possible when there are different numbers of bases and different codon lengths. In general, the number of different codons possible will be equal to:

  $b^{lg} = \text{number of codons}$

where b equals the number of different types of bases and lg equals the number of nucleotides in each codon (codon length). If there are five different types of bases, then:

  $5^1 = 5$ possible codons
  $5^2 = 25$ possible codons
  $5^3 = 125$ possible codons

The number of possible codons must be greater than or equal to the number of amino acids specified.
Therefore, a codon length of one nucleotide could specify 4 different amino acids, a codon length of 2 nucleotides could specify 20 different amino acids, and a codon length of 3 nucleotides could specify 30 different amino acids: (a) 1, (b) 2, (c) 3.

3. A template strand in bacterial DNA has the following base sequence:

  5′-AGGTTTAACGTGCAT-3′

What amino acids would be encoded by this sequence?

• Solution

To answer this question, we must first work out the mRNA sequence that will be transcribed from this DNA sequence. The mRNA must be antiparallel and complementary to the DNA template strand:

  DNA template strand:   5′-AGGTTTAACGTGCAT-3′
  mRNA copied from DNA:  3′-UCCAAAUUGCACGUA-5′

An mRNA is translated 5′ → 3′; so it will be helpful if we turn the RNA molecule around with the 5′ end on the left:

  mRNA copied from DNA:  5′-AUGCACGUUAAACCU-3′

The codons consist of groups of three nucleotides that are read successively after the first AUG codon; using Figure 15.14, we can determine that the amino acids are:

  5′-AUG  CAC  GUU  AAA  CCU-3′
     fMet His  Val  Lys  Pro

4. The following triplets constitute anticodons found on a series of tRNAs. Give the amino acid carried by each of these tRNAs.

  (a) 5′-UUU-3′
  (b) 5′-GAC-3′
  (c) 5′-UUG-3′
  (d) 5′-CAG-3′

• Solution

To solve this problem, we first determine the codons with which these anticodons pair and then look up the amino acid specified by the codon in Figure 15.14. The codons are antiparallel and complementary to the anticodons. For part a, the anticodon is 5′-UUU-3′. According to the wobble rules in Table 15.2, U in the first position of the anticodon can pair with either A or G in the third position of the codon, so there are two codons that can pair with this anticodon:

  Anticodon: 5′-UUU-3′
  Codon:     3′-AAA-5′
  Codon:     3′-GAA-5′

Listing these codons in the conventional manner, with the 5′ end on the left, we have:

  Codon: 5′-AAA-3′
  Codon: 5′-AAG-3′

According to Figure 15.14, both codons specify the amino acid lysine (Lys). Recall that the wobble in the third position allows more than one codon to specify the same amino acid; so any wobble that exists should produce the same amino acid as the standard base pairings would, and we do not need to figure the wobble to answer this question. The answers for parts b, c, and d are:

  (b) Anticodon: 5′-GAC-3′
      Codon:     3′-CUG-5′
      5′-GUC-3′ codes for Val
  (c) Anticodon: 5′-UUG-3′
      Codon:     3′-AAC-5′
      5′-CAA-3′ codes for Gln
  (d) Anticodon: 5′-CAG-3′
      Codon:     3′-GUC-5′
      5′-CUG-3′ codes for Leu

The New Genetics

MINING GENOMES: THREE-DIMENSIONAL PROTEIN STRUCTURE

The function of proteins and other biological macromolecules is directly related to their shape, and understanding the three-dimensional shape of proteins is an important developing field in bioinformatics. This exercise uses tools available at the Biology Workbench, managed by the San Diego Supercomputer Center at the University of California, San Diego, to visualize the three-dimensional shape of some interesting proteins.

COMPREHENSION QUESTIONS

1. What is the one gene, one enzyme hypothesis? Why was this hypothesis an important advance in our understanding of genetics?
2. What three different methods were used to help break the genetic code? What did each reveal and what were the advantages and disadvantages of each?
3. What are isoaccepting tRNAs?
4. What is the significance of the fact that many synonymous codons differ only in the third nucleotide position?
5. Define the following terms as they apply to the genetic code:
  (a) reading frame
  (b) overlapping code
  (c) nonoverlapping code
  (d) initiation codon
  (e) termination codon
  (f) sense codon
  (g) nonsense codon
  (h) universal code
  (i) nonuniversal codons
6. How is the reading frame of a nucleotide sequence set?
7. How are tRNAs linked to their corresponding amino acids?
8. What role do the initiation factors play in protein synthesis?
9. How does the process of initiation differ in bacterial and eukaryotic cells?
10. Give the elongation factors used in bacterial translation and explain the role played by each factor in translation.
11. What events bring about the termination of translation?
12. Give several examples of RNA-RNA interactions that take place in protein synthesis.
13. What are some types of posttranslational modification of proteins?
14. Explain how some antibiotics work by affecting the process of protein synthesis.
15. Compare and contrast the process of protein synthesis in bacterial and eukaryotic cells, giving similarities and differences in the process of translation in these two types of cells.

APPLICATION QUESTIONS AND PROBLEMS

16. Sydney Brenner isolated Salmonella typhimurium mutants that were implicated in the biosynthesis of tryptophan and would not grow on minimal medium. When these mutants were tested on minimal medium to which one of four compounds (indole glycerol phosphate, indole, anthranilic acid, and tryptophan) had been added, the growth responses shown in the following table were obtained.

[Table: growth of mutants trp-1, trp-2, trp-3, trp-4, trp-6, trp-7, trp-8, trp-9, trp-10, and trp-11 on minimal medium alone and on minimal medium supplemented with anthranilic acid, indole glycerol phosphate, indole, or tryptophan; the individual entries are missing from this copy.]

Give the order of indole glycerol phosphate, indole, anthranilic acid, and tryptophan in a biochemical pathway leading to the synthesis of tryptophan. Indicate which step in the pathway is affected by each of the mutations.

17. The addition of a series of compounds yielded the following biochemical pathway:

  precursor --[enzyme A]--> compound I --[enzyme B]--> compound II --[enzyme C]--> compound III

Mutation a inactivates enzyme A, mutation b inactivates enzyme B, and mutation c inactivates enzyme C. Mutants, each having one of these defects, were tested on minimal medium to which compound I, II, or III was added.
Fill in the results expected of these tests by placing a plus sign (+) for growth or a minus sign (−) for no growth in the following table:

  Minimal medium to which is added
  Strain with mutation    Compound I    Compound II    Compound III
  a
  b
  c

18. Assume that the number of different types of bases in RNA is four. What would be the minimum codon size (number of nucleotides) required if the number of different types of amino acids in proteins were: (a) 2, (b) 8, (c) 17, (d) 45, (e) 75?

19. How many codons would be possible in a triplet code if only three bases (A, C, and U) were used?

20. Using the genetic code given in Figure 15.14, give the amino acids specified by the following bacterial mRNA sequences, and indicate the amino and carboxyl ends of the polypeptide produced.
  (a) 5′-AUGUUUAAAUUUAAAUUUUGA-3′
  (b) 5′-AUGUAUAUAUAUAUAUGA-3′
  (c) 5′-AUGGAUGAAAGAUUUCUCGCUUGA-3′
  (d) 5′-AUGGGUUAGGGGACAUCAUUUUGA-3′

21. A nontemplate strand on DNA has the following base sequence. What amino acid sequence would be encoded by this sequence?

  5′-ATGATACTAAGGCCC-3′

22. The following amino acid sequence is found in a tripeptide: Met-Trp-His. Give all possible nucleotide sequences on the mRNA, on the template strand of DNA, and on the nontemplate strand of DNA that could encode this tripeptide.

23. How many different mRNA sequences can code for a polypeptide chain with the amino acid sequence Met-Leu-Arg? (Be sure to include the stop codon.)

24. A series of tRNAs have the following anticodons. Consider the wobble rules given in Table 15.2 and give all possible codons with which each tRNA can pair.
  (a) 5′-GGC-3′
  (b) 5′-AAG-3′
  (c) 5′-IAA-3′
  (d) 5′-UGG-3′
  (e) 5′-CAG-3′

25. An anticodon on a tRNA has the sequence 5′-GCA-3′.
  (a) What amino acid is carried by this tRNA?
  (b) What would be the effect if the G in the anticodon were mutated to a U?

26. Which of the following amino acid changes could result from a mutation that changed a single base?
For each change that could result from the alteration of a single base, determine which position of the codon (first, second, or third nucleotide) in the mRNA must be altered for the change to result.
(a) Leu → Gln
(b) Phe → Ser
(c) Phe → Ile
(d) Pro → Ala
(e) Asn → Lys
(f) Ile → Asn

27. A synthetic mRNA added to a cell-free protein-synthesizing system produces a peptide with the following amino acid sequence: Met-Pro-Ile-Ser-Ala. What would be the effect on translation if the following components were omitted from the cell-free protein-synthesizing system? What, if any, type of protein would be produced? Explain your reasoning.
(a) initiation factor 1
(b) initiation factor 2
(c) elongation factor Tu
(d) elongation factor G
(f) release factors RF1, RF2, and RF3
(g) ATP
(h) GTP

CHALLENGE QUESTIONS

28. In what ways are spliceosomes and ribosomes similar? In what ways are they different? Can you suggest some possible reasons for their similarities?

29. Several experiments were conducted to obtain information about how the eukaryotic ribosome recognizes the AUG start codon. In one experiment, the gene that codes for methionine initiator tRNA (tRNAiMet) was located and changed. The nucleotides that specify the anticodon on tRNAiMet were mutated so that the anticodon in the tRNA was 5′–CCA–3′ instead of 5′–CAU–3′. When this mutated gene was placed into a eukaryotic cell, protein synthesis took place, but the proteins produced were abnormal. Some of the proteins produced contained extra amino acids, and others contained fewer amino acids.
(a) What do these results indicate about how the ribosome recognizes the starting point for translation in eukaryotic cells? Explain your reasoning.
(b) If the same experiment had been conducted on bacterial cells, what results would you expect?
16 Control of Gene Expression

• Creating Giant Mice Through Gene Regulation
• General Principles of Gene Regulation
    Levels of Gene Control
    Genes and Regulatory Elements
    DNA-Binding Proteins
• Gene Regulation in Bacterial Cells
    Operon Structure
    Negative and Positive Control: Inducible and Repressible Operons
    The lac Operon of E. coli
    lac Mutations
    Positive Control and Catabolite Repression
    The trp Operon of E. coli
    Attenuation: The Premature Termination of Transcription
    Antisense RNA in Gene Regulation
    Transcriptional Control in Bacteriophage Lambda
• Eukaryotic Gene Regulation
    Chromatin Structure and Gene Regulation
    Transcriptional Control in Eukaryotic Cells
    Gene Control Through Messenger RNA Processing
    Gene Control Through RNA Stability
    RNA Silencing
    Translational and Posttranslational Control

The giant transgenic mouse on the left was produced by injecting a rat gene for growth hormone into a mouse embryo; a normal-size mouse is on the right. To ensure expression, the rat gene was linked to a DNA sequence that stimulates the transcription of mouse DNA whenever heavy metals are present. Zinc was provided in the food for the transgenic mouse; some transgenic mice produced 800 times the normal levels of growth hormone. (Courtesy of Dr. Ralph L. Brinster, School of Veterinary Medicine, University of Pennsylvania.)

Creating Giant Mice Through Gene Regulation

In 1982, a group of molecular geneticists led by Richard Palmiter at the University of Washington produced gigantic mice that grew to almost twice the size of normal mice. Palmiter and his colleagues created these large mice through genetic engineering, by injecting the rat gene for growth hormone into the nuclei of fertilized mouse embryos and then implanting these embryos into surrogate mouse mothers. In a few embryos, the rat gene became incorporated into the mouse chromosome and, after birth, these transgenic mice produced growth hormone encoded by the rat gene.
Some of the transgenic mice produced from 100 to 800 times the amount of growth hormone found in normal mice, which caused them to grow rapidly into giants.

Inserting foreign genes into bacteria, plants, mice, and even humans is now a routine procedure for molecular geneticists (see Chapter 18). However, simply putting a gene into a cell does not guarantee that the gene will be transcribed or produce a protein; indeed, most foreign genes are never transcribed or translated, which isn’t surprising. Organisms have evolved complex systems to ensure that genes are expressed at the appropriate time and in the appropriate amounts, and sequences other than the gene itself are required to ensure transcription and translation. In this chapter, we will learn more about these sequences and other mechanisms that control gene expression.

If foreign genes are rarely expressed, why did the transgenic mice with the gene for rat growth hormone grow so big? Palmiter and his colleagues, aware of the need to provide sequences that control gene expression, linked the rat gene with the mouse metallothionein I promoter sequence, a DNA sequence normally found upstream of the mouse metallothionein I gene. When heavy metals such as zinc are present, they activate the metallothionein promoter sequence, thereby stimulating transcription of the metallothionein I gene. By connecting the rat growth-hormone gene to this promoter, Palmiter and his colleagues provided a means of turning on the transcription of the gene, simply by putting extra zinc in the food for the transgenic mice.

This chapter is about gene regulation, the mechanisms and systems that control the expression of genes. We begin by discussing why gene regulation is necessary; the levels at which gene expression is controlled; and the difference between genes and regulatory elements. We then examine gene regulation in bacterial cells.
In the second half of the chapter, we turn to gene regulation in eukaryotic cells, which is often more complex than in bacterial cells.

General Principles of Gene Regulation

One of the major themes of molecular genetics is the central dogma, which states that genetic information flows from DNA to RNA to proteins (see Figure 10.17a) and provides a molecular basis for the connection between genotype and phenotype. Although the central dogma brought coherence to early research in molecular genetics, it failed to address a critical issue: How is the flow of information along the molecular pathway regulated?

Consider E. coli, a bacterium that resides in your large intestine. Your eating habits completely determine the nutrients available to this bacterium: it can’t seek out nourishment when nutrients are scarce; nor can it move away when confronted with unpleasant changes. E. coli makes up for its inability to alter the external environment by being internally flexible. For example, if glucose is present, E. coli uses it to generate ATP; if there’s no glucose, it utilizes lactose, arabinose, maltose, xylose, or any of a number of other sugars. When amino acids are available, E. coli uses them to synthesize proteins; if a particular amino acid is absent, E. coli produces the enzymes needed to synthesize that amino acid. Thus, E. coli responds to environmental changes by rapidly altering its biochemistry. This biochemical flexibility, however, has a high price. Producing all the enzymes necessary for every environmental condition would be energetically expensive. So how does E. coli maintain biochemical flexibility while optimizing energy efficiency?

The answer is through gene regulation. Bacteria carry the genetic information for many proteins, but only a subset of this genetic information is expressed at any time. When the environment changes, new genes are expressed, and proteins appropriate for the new environment are synthesized.
For example, if a carbon source appears in the environment, genes encoding enzymes that take up and metabolize this carbon source are quickly transcribed and translated. When this carbon source disappears, the genes that encode these enzymes are shut off. This type of response, the synthesis of an enzyme stimulated by a specific substrate, is called induction.

Multicellular eukaryotic organisms face a different dilemma. Individual cells in a multicellular organism are specialized for particular tasks. The proteins produced by a nerve cell, for example, are quite different from those produced by a white blood cell. The problem that a eukaryotic cell faces is how to specialize. Although they are quite different in shape and function, a nerve cell and a blood cell still carry the same genetic instructions.

A multicellular organism’s challenge is to bring about the specialization of cells that have a common set of genetic instructions. This challenge is met through gene regulation: all of an organism’s cells carry the same genetic information, but only a subset of genes is expressed in each cell type. Genes needed for other cell types are not expressed. Gene regulation is therefore the key to both unicellular flexibility and multicellular specialization, and it is critical to the success of all living organisms.

Concepts
In bacteria, gene regulation maintains internal flexibility, turning genes on and off in response to environmental changes. In multicellular eukaryotic organisms, gene regulation brings about cellular differentiation.

Levels of Gene Control

A gene may be regulated at a number of points along the pathway of information flow from genotype to phenotype (◗ FIGURE 16.1). First, regulation may be through the alteration of gene structure. Modifications to DNA or its packaging may influence which sequences are available for transcription or the rate at which sequences are transcribed.
DNA methylation and changes in chromatin are two processes that play a pivotal role in gene regulation.

A second point at which a gene can be regulated is at the level of transcription. For the sake of cellular economy, it makes sense to limit protein production early in the transfer of information from DNA to protein, and transcription is an important point of gene regulation in both bacterial and eukaryotic cells.

A third potential point of gene regulation is mRNA processing. Eukaryotic mRNA is extensively modified before it is translated: a 5′ cap is added, the 3′ end is cleaved and polyadenylated, and introns are removed (see Chapter 14). These modifications determine the stability of the mRNA, whether mRNA can be translated, the rate of translation, and the amino acid sequence of the protein produced. There is growing evidence that a number of regulatory mechanisms in eukaryotic cells operate at the level of mRNA processing.

A fourth point for the control of gene expression is the regulation of RNA stability. The amount of protein produced depends not only on the amount of mRNA synthesized, but also on the rate at which the mRNA is degraded; so RNA stability plays an important role in gene expression.

A fifth point of gene regulation is at the level of translation, a complex process requiring a large number of enzymes, protein factors, and RNA molecules (Chapter 15). All of these factors, as well as the availability of amino acids and sequences in mRNA, influence the rate at which proteins are produced and therefore provide points at which gene expression may be controlled.

Finally, many proteins are modified after translation (Chapter 13), and these modifications affect whether the proteins become active; so genes can be regulated through processes that affect posttranslational modification. Gene expression may be affected by regulatory activities at any or all of these points.

◗ 16.1 Gene expression may be controlled at multiple levels. (Figure: compact DNA, alteration of structure, relaxed DNA, transcription, pre-mRNA, mRNA processing, processed mRNA, RNA stability, translation, inactive protein, posttranslational modification, modified active protein.)

Concepts
Gene expression may be controlled at any of a number of points along the molecular pathway from DNA to protein, including gene structure, transcription, mRNA processing, RNA stability, translation, and posttranslational modification.

Genes and Regulatory Elements

In our consideration of gene regulation, it will be necessary to distinguish between the DNA sequences that are transcribed and the DNA sequences that regulate the expression of other sequences. We will refer to any DNA sequence that is transcribed into an RNA molecule as a gene. According to this definition, genes include DNA sequences that encode proteins, as well as sequences that encode rRNA, tRNA, snRNA, and other types of RNA. Structural genes encode proteins that are used in metabolism or biosynthesis or that play a structural role in the cell. Regulatory genes are genes whose products, either RNA or proteins, interact with other sequences and affect their transcription or translation. In many cases, the products of regulatory genes are DNA-binding proteins.

We will also encounter DNA sequences that are not transcribed at all but still play a role in regulating other nucleotide sequences. These regulatory elements affect the expression of sequences to which they are physically linked. Much of gene regulation takes place through the action of proteins produced by regulatory genes that recognize and bind to regulatory elements.

Concepts
Genes are DNA sequences that are transcribed into RNA. Regulatory elements are DNA sequences that are not transcribed but affect the expression of genes.
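The point made under Levels of Gene Control, that the amount of protein depends on how fast mRNA is degraded as well as how fast it is made, can be illustrated with a small calculation. The sketch below is not from the text: it assumes a deliberately simple model in which mRNA and protein are each made at a constant rate and lost in proportion to their amount, and every function and parameter name is hypothetical. The `active_fraction` argument is a crude stand-in for posttranslational modification.

```python
def steady_state_expression(k_transcription, k_mrna_decay,
                            k_translation, k_protein_decay,
                            active_fraction=1.0):
    """Return (mRNA level, active protein level) at steady state.

    Assumed toy model (an illustration, not taken from the chapter):
      d[mRNA]/dt    = k_transcription - k_mrna_decay * [mRNA]
      d[protein]/dt = k_translation * [mRNA] - k_protein_decay * [protein]
    Setting both derivatives to zero gives the expressions below.
    """
    mrna = k_transcription / k_mrna_decay
    protein = k_translation * mrna / k_protein_decay
    # active_fraction models the share of protein activated after translation.
    return mrna, protein * active_fraction


if __name__ == "__main__":
    # Arbitrary rate constants chosen only to make the comparison easy to read.
    baseline = steady_state_expression(10.0, 0.5, 2.0, 0.25)
    stable_mrna = steady_state_expression(10.0, 0.25, 2.0, 0.25)
    print("baseline (mRNA, protein):", baseline)
    print("mRNA decays half as fast:", stable_mrna)
```

Under these assumptions, halving the mRNA decay rate doubles the protein level even though the transcription rate is unchanged, which is the sense in which RNA stability is a control point in its own right.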
DNA-Binding Proteins

◗ 16.2 DNA-binding proteins can be grouped into several types on the basis of their structure, or motif. (a) The helix-turn-helix motif consists of two alpha helices connected by a turn. (b) The zinc-finger motif consists of a loop of amino acids containing a single zinc ion. Most proteins containing zinc fingers have several repeats of the zinc-finger motif. Each zinc finger fits into the major groove of DNA and forms hydrogen bonds with bases in the DNA. (c) The steroid receptor binding motif has two alpha helices, each with a zinc ion surrounded by four cysteine residues. The two alpha helices are perpendicular to one another: one fits into the major groove of the double helix, whereas the other is parallel to the DNA. (d) The leucine-zipper motif consists of a helix of leucine residues and an arm of basic amino acids. DNA-binding proteins usually have two polypeptides; the leucine residues of the two polypeptides face one another, whereas the basic amino acids bind to the DNA. (e) The helix-loop-helix binding motif consists of two alpha helices separated by a loop of amino acids. Two polypeptide chains with this motif join to form a functional DNA-binding protein. A highly basic set of amino acids in one of the helices binds to the DNA. (f) The homeodomain motif consists of three alpha helices; the third helix fits in a major groove of DNA.

Much of gene regulation is accomplished by proteins that bind to DNA sequences and influence their expression. These regulatory proteins generally have discrete functional parts, called domains and typically consisting of 60 to 90 amino acids, that are responsible for binding to DNA.
Within a domain, only a few amino acids actually make contact with the DNA. These amino acids (most commonly asparagine, glutamine, glycine, lysine, and arginine) often form hydrogen bonds with the bases or interact with the sugar–phosphate backbone of the DNA. Many regulatory proteins have additional domains that can bind other molecules such as other regulatory proteins.

DNA-binding proteins can be grouped into several distinct types on the basis of a characteristic structure, called a motif, found within the binding domain. Motifs are simple structures, such as alpha helices, that can fit into the major groove of the DNA. Some common DNA-binding motifs are illustrated in ◗ FIGURE 16.2 and are summarized in Table 16.1.

www.whfreeman.com/pierce: Molecular images of several DNA-binding proteins

Table 16.1 Common DNA-binding motifs
Motif | Location | Characteristics | Binding Site in DNA
Helix-turn-helix | Bacterial regulatory proteins; related motifs in eukaryotic proteins | Two alpha helices | Major groove
Zinc-finger | Eukaryotic regulatory and other proteins | Loop of amino acids with zinc at base | Major groove
Steroid receptor | Eukaryotic proteins | Two perpendicular alpha helices with zinc surrounded by four cysteine residues | Major groove and DNA backbone
Leucine-zipper | Eukaryotic transcription factors | Helix of leucine residues and a basic arm; two leucine residues interdigitate | Two adjacent major grooves
Helix-loop-helix | Eukaryotic proteins | Two alpha helices separated by a loop of amino acids | Major groove
Homeodomain | Eukaryotic regulatory proteins | Three alpha helices | Major groove

Gene Regulation in Bacterial Cells

The mechanisms of gene regulation were first investigated in bacterial cells, where the availability of mutants and the ease of laboratory manipulation made it possible to unravel the mechanisms. When the study of these mechanisms in eukaryotic cells began, it seemed clear that bacterial and eukaryotic gene regulation were quite different.
As more and more information has accumulated about gene regulation, however, a number of common themes have emerged, and today many aspects of gene regulation in bacterial and eukaryotic cells are recognized to be similar. Although we will look at gene regulation in these two cell types separately, the emphasis will be on the common themes that apply to all cells.

Operon Structure

One significant difference in prokaryotic and eukaryotic gene control lies in the organization of functionally related genes. Many bacterial genes that have related functions are clustered and are under the control of a single promoter. These genes are often transcribed together into a single mRNA. Eukaryotic genes, in contrast, are dispersed, and typically, each is transcribed into a separate mRNA. A group of bacterial structural genes that are transcribed together (along with their promoter and additional sequences that control transcription) is called an operon.

The organization of a typical operon is illustrated in ◗ FIGURE 16.3. At one end of the operon is a set of structural genes, shown in Figure 16.3 as gene a, gene b, and gene c. These structural genes are transcribed into a single mRNA, which is translated to produce enzymes A, B, and C. These enzymes carry out a series of biochemical reactions that convert precursor molecule X into product Y. The transcription of structural genes a, b, and c is under the control of a promoter, which lies upstream of the first structural gene. RNA polymerase binds to the promoter and then moves downstream, transcribing the structural genes.

A regulator gene helps to regulate the transcription of the structural genes of the operon. The regulator gene is not considered part of the operon, although it affects operon function. The regulator gene has its own promoter and is transcribed into a relatively short mRNA, which is translated into a small protein.
This regulator protein may bind to a region of DNA called the operator and affect whether transcription can take place. The operator usually overlaps the 3′ end of the promoter and sometimes the 5′ end of the first structural gene (see Figure 16.3).

◗ 16.3 An operon is a single transcriptional unit that includes a series of structural genes, a promoter, and an operator. (1) An operon is a group of structural genes plus sequences that control transcription. (2) A separate regulator gene, with its own promoter, (3) encodes a regulator protein (4) that may bind to the operator site to regulate the transcription (5) of mRNA, (6) whose products catalyze reactions in a biochemical pathway. (7) In some operons, product molecules may, in turn, bind to the regulator protein to either activate it or turn it off.

Concepts
Functionally related genes in bacterial cells are frequently clustered together as a single transcriptional unit termed an operon. A typical operon includes several structural genes, a promoter for the structural genes, and an operator site where the product of a regulator gene binds.

Negative and Positive Control: Inducible and Repressible Operons

There are two types of transcriptional control: negative control, in which a regulatory protein acts as a repressor, binding to DNA and inhibiting transcription; and positive control, in which a regulatory protein acts as an activator, stimulating transcription. In the next sections, we will consider several varieties of these two basic control mechanisms.

Negative inducible operons
In an operon with negative control at the operator site, the regulatory protein is a repressor: the binding of the regulator protein to the operator inhibits transcription. In a negative inducible operon, transcription and translation of the regulator gene produce an active repressor that readily binds to the operator (◗ FIGURE 16.4a). Because the operator site overlaps with the promoter site, the binding of this protein to the operator physically blocks the binding of RNA polymerase to the promoter and prevents transcription. For transcription to take place, something must happen to prevent the binding of the repressor at the operator site. This type of system is said to be inducible, because transcription is normally off (inhibited) and must be turned on (induced).

Transcription is turned on when a small molecule, an inducer, binds to the repressor. ◗ FIGURE 16.4b shows that, when precursor V (acting as the inducer) binds to the repressor, the repressor can no longer bind to the operator. Regulatory proteins frequently have two binding sites: one that binds to DNA and another that binds to a small molecule such as an inducer. Binding of the inducer alters the shape of the repressor, preventing it from binding to DNA. Proteins of this type, which change shape on binding to another molecule, are called allosteric proteins.

When the inducer is absent, the repressor binds to the operator, the structural genes are not transcribed, and enzymes D, E, and F (which metabolize precursor V) are not synthesized (see Figure 16.4a). This is an adaptive mechanism: because no precursor V is available, it would be wasteful for the cell to synthesize the enzymes when they have no substrate to metabolize. As soon as precursor V becomes available, some of it binds to the repressor, rendering the repressor inactive and unable to bind to the operator site. Now RNA polymerase can bind to the promoter and transcribe the structural genes.
The resulting mRNA is then translated into enzymes D, E, and F, which convert substrate V into product W (see Figure 16.4b). So, an operon with negative inducible control regulates the synthesis of the enzymes economically: the enzymes are synthesized only when their substrate (V) is available.

◗ 16.4 Some operons are inducible. Negative inducible operon. (a) No precursor present: (1) The regulator protein is a repressor, produced in an active form, (2) that binds to the operator (3) and prevents transcription of the structural genes. (b) Precursor V (the inducer) present: (4) When precursor V is present, it binds to the regulator protein and makes the regulator protein inactive. (5) The inactive regulator protein is unable to bind to the operator (6) and transcription and translation of structural genes takes place. Conclusion: The operon is turned on (and produces product W) only when precursor V is available.

Negative repressible operons
Some operons with negative control are repressible, meaning that transcription normally takes place and must be turned off, or repressed. The regulator protein in this type of operon also is a repressor but is synthesized in an inactive form that cannot by itself bind to the operator. Because there is no repressor bound to the operator, RNA polymerase readily binds to the promoter and transcription of the structural genes takes place (◗ FIGURE 16.5a).

To turn transcription off, something must happen to make the repressor active. A small molecule called a corepressor binds to the repressor and makes it capable of binding to the operator. In the example illustrated (see Figure 16.5a), the product (U) of the metabolic reaction is the corepressor. As long as the level of product U is high, it is available to bind to and activate the repressor, preventing transcription (◗ FIGURE 16.5b). With the operon repressed, enzymes G, H, and I are not synthesized, and no more U is produced from precursor T. However, when all of product U is used up, the repressor is no longer activated by U and cannot bind to the operator. The inactivation of the repressor allows the transcription of the structural genes and the synthesis of enzymes G, H, and I, resulting in the conversion of precursor T into product U. As with inducible operons, repressible operons are economical: the enzymes are synthesized only as needed.

Note that both the inducible and the repressible systems that we have considered are forms of negative control, in which the regulatory protein is a repressor. We will now consider positive control, in which a regulator protein stimulates transcription.

Positive control
With positive control, a regulatory protein binds to DNA (usually at a site other than the operator) and stimulates transcription. Theoretically, positive control could be inducible or repressible. In a positive inducible operon, transcription would normally be turned off because the regulator protein would be produced in an inactive form. Transcription would take place when an inducer became attached to the regulatory protein, rendering the regulator active. Logically, the inducer should be the precursor of the reaction controlled by the operon so that the necessary enzymes would be synthesized only when the substrate for their reaction was present.

A positive operon could also be repressible; transcription would normally take place and would have to be repressed.
In this case, the regulator protein would be produced in a form that readily binds to DNA and stimulates transcription. Transcription would be inhibited when a substance became attached to the activator and rendered it unable to bind to the DNA so that transcription was no longer stimulated. Here, the product (P) of the reaction controlled by the operon would logically be the repressing substance, because it would be economical for the cell to prevent the transcription of genes that allow the synthesis of P when plenty of P is already available.

◗ 16.5 Some operons are repressible. Negative repressible operon. (a) No product U present: (1) The regulator protein is an inactive repressor, unable to bind to the operator. (2) Transcription of the structural genes therefore takes place. (b) Product U present: (3) Product U binds to the regulator protein, (4) making it active and able to bind to the operator (5) and thus preventing transcription.

Putting it all together
Theoretically, operons might exhibit positive or negative control and be either inducible or repressible. Try sketching out all possible types: negative inducible, negative repressible, positive inducible, and positive repressible. To do so, learn the meanings of positive and negative control and inducible and repressible; then use logic to work out the details of whether the regulatory protein is a repressor or an activator and whether it is produced in an active or inactive form.
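One way to work through this exercise is to reduce each operon type to two yes/no choices: whether the regulator protein is a repressor or an activator, and whether it is synthesized in an active or inactive form. The sketch below does this under the idealized logic described in the preceding sections; the class, field, and function names are hypothetical, and "modulator" is used here as a neutral word for the inducer, corepressor, or other small molecule that binds the regulator protein. It ignores real-world details such as leaky expression.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperonType:
    """Idealized operon logic; names and fields are illustrative assumptions."""
    name: str
    regulator_is_repressor: bool  # True: negative control; False: positive control
    made_active: bool             # form in which the regulator protein is synthesized

    def regulator_active(self, modulator_present: bool) -> bool:
        # Binding of the small molecule switches the regulator to its other form.
        return self.made_active != modulator_present

    def transcribed(self, modulator_present: bool) -> bool:
        active = self.regulator_active(modulator_present)
        # An active repressor blocks transcription; an active activator stimulates it.
        return (not active) if self.regulator_is_repressor else active


OPERON_TYPES = [
    OperonType("negative inducible", regulator_is_repressor=True, made_active=True),
    OperonType("negative repressible", regulator_is_repressor=True, made_active=False),
    OperonType("positive inducible", regulator_is_repressor=False, made_active=False),
    OperonType("positive repressible", regulator_is_repressor=False, made_active=True),
]


def on_off(flag: bool) -> str:
    return "on" if flag else "off"


if __name__ == "__main__":
    for operon in OPERON_TYPES:
        without = on_off(operon.transcribed(modulator_present=False))
        with_mod = on_off(operon.transcribed(modulator_present=True))
        print(f"{operon.name:22s} without modulator: {without:3s}  with modulator: {with_mod}")
```

The state without the modulator corresponds to what the text calls the "normal" state of transcription, so the inducible types come out off and the repressible types come out on, and adding the modulator flips each one.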
You can check your answers against Table 16.2, where the important features of these four types of operons are summarized. Another useful exercise is to think about the effects of mutations at various sites in different types of operon systems. Although it is a useful learning device to think of operons as either positive or negative and either inducible or repressible, in reality both positive and negative controls often exist in the same operon.

Table 16.2 Features of inducible and repressible operons with positive and negative control
Type of Control | Transcription Normally | Regulator Protein | Effect of Regulatory Protein | Action of Modulator
Negative inducible | Off | Active repressor | Inhibits transcription | Substrate makes repressor inactive
Negative repressible | On | Inactive repressor | Inhibits transcription | Product makes repressor active
Positive inducible | Off | Inactive activator | Stimulates transcription | Substrate makes activator active
Positive repressible | On | Active activator | Stimulates transcription | Product makes activator inactive

Concepts
There are two basic types of transcriptional control: negative and positive. In negative control, when a regulatory protein (repressor) binds to DNA, transcription is inhibited; in positive control, when a regulatory protein (activator) binds to DNA, transcription is stimulated. Some operons are inducible; transcription is normally off and must be turned on. Other operons are repressible; transcription is normally on and must be turned off.

The lac Operon of E. coli

In 1961, François Jacob and Jacques Monod described the “operon model” for the genetic control of lactose metabolism in E. coli. This work and subsequent research on the genetics of lactose metabolism established the operon as the basic unit of transcriptional control in bacteria.
Despite the fact that, at the time, no methods were available for determining nucleotide sequences, Jacob and Monod deduced the structure of the operon genetically by analyzing the interactions of mutations that interfered with the normal regulation of lactose metabolism. We will examine the effects of some of these mutations after seeing how the lac operon regulates lactose metabolism.

Lactose (a disaccharide) is one of the major carbohydrates found in milk; it can be metabolized by E. coli bacteria that reside in the gut of mammals. Lactose does not easily diffuse across the E. coli cell membrane and must be actively transported into the cell by the enzyme permease ( ◗ FIGURE 16.6). To utilize lactose as an energy source, E. coli must first break it into glucose and galactose, a reaction catalyzed by the enzyme β-galactosidase. This enzyme can also convert lactose into allolactose, a compound that plays an important role in regulating lactose metabolism. A third enzyme, thiogalactoside transacetylase, also is produced by the lac operon, but its function in lactose metabolism is not yet known.

The enzymes β-galactosidase, permease, and transacetylase are encoded by adjacent structural genes in the lac operon of E. coli. β-Galactosidase is encoded by the lacZ gene, permease by the lacY gene, and transacetylase by the lacA gene ( ◗ FIGURE 16.7a). When lactose is absent from the medium in which E. coli grows, only a few molecules of each enzyme are produced. If lactose is added to the medium and glucose is absent, the rate of synthesis of all three enzymes simultaneously increases about a thousandfold within 2 to 3 minutes. This boost in enzyme synthesis results from transcription of lacZ, lacY, and lacA and exemplifies coordinate induction, the simultaneous synthesis of several enzymes, stimulated by a specific molecule, the inducer ( ◗ FIGURE 16.7b).
Although lactose appears to be the inducer here, allolactose is actually responsible for induction.

◗ 16.6 Lactose, a major carbohydrate found in milk, consists of 2 six-carbon sugars linked together. Permease actively transports lactose into the cell, where the enzyme β-galactosidase breaks it into galactose and glucose. β-Galactosidase also converts lactose into the related compound allolactose and converts allolactose into galactose and glucose.

◗ 16.7 The lac operon regulates lactose metabolism. (a) Absence of lactose: the regulator protein (a repressor) binds to the operator and inhibits transcription. (b) Presence of lactose: some lactose is converted into allolactose, which binds to the regulatory protein and makes it inactive; the regulatory protein cannot bind to the operator, the structural genes are transcribed and translated, and the enzymes convert lactose into glucose and galactose.

In the lac operon, the lacZ, lacY, and lacA genes have a common promoter (lacP in Figure 16.7a) and are transcribed together. Upstream of the promoter is a regulator gene, lacI, which has its own promoter (PI). The lacI gene is transcribed into a short mRNA that is translated into a repressor.
Each repressor consists of four identical polypeptides and has two binding sites; one site binds to allolactose and the other binds to DNA. In the absence of lactose (and, therefore, allolactose), the repressor binds to the lac operator site lacO (see Figure 16.7a). Jacob and Monod mapped the operator to a position adjacent to the lacZ gene; more recent nucleotide sequencing has demonstrated that the operator actually overlaps the 3′ end of the promoter and the 5′ end of lacZ ( ◗ FIGURE 16.8).

Immediately upstream of the structural genes is the lac promoter. RNA polymerase binds to the promoter and moves down the DNA molecule, transcribing the structural genes. When the repressor is bound to the operator, the binding of RNA polymerase is blocked, and transcription is prevented. When lactose is present, some of it is converted into allolactose, which binds to the repressor and causes the repressor to be released from the DNA. In the presence of lactose, then, the repressor is inactivated, the binding of RNA polymerase is no longer blocked, the transcription of lacZ, lacY, and lacA takes place, and the lac enzymes are produced.

Have you spotted the flaw in the explanation just given for the induction of the lac enzymes? You might recall that permease is required to transport lactose into the cell. If the lac operon is repressed and no permease is being produced, how does lactose get into the cell to inactivate the repressor and turn on transcription? Furthermore, the inducer is actually allolactose, which must be produced from lactose by β-galactosidase. If β-galactosidase production is repressed, how can lactose metabolism be induced?
◗ 16.8 In the lac operon, the operator overlaps the promoter and the 5′ end of the first structural gene. The figure shows the DNA nontemplate strand (5′-TAGGCACCCCAGGCTTTACACTTTATGCTTCCGGCTCGTATGTTGTGTGGAATTGTGAGCGGATAACAATTTCAC-3′) and marks the –35 and –10 consensus regions bound by RNA polymerase, the transcription start site, the start of the lacZ gene, and the operator bound by the lac repressor.

The answer is that repression never completely shuts down transcription of the lac operon. Even with active repressor bound to the operator, there is a low level of transcription and a few molecules of β-galactosidase, permease, and transacetylase are synthesized. When lactose appears in the medium, the permease that is present transports a small amount of lactose into the cell. There, the few molecules of β-galactosidase that are present convert some of the lactose into allolactose. The allolactose then attaches to the repressor and alters its shape so that the repressor no longer binds to the operator. When the operator site is clear, RNA polymerase can bind and transcribe the structural genes of the lac operon.

Several compounds related to allolactose also can bind to the lac repressor and induce transcription of the lac operon. One such inducer is isopropylthiogalactoside (IPTG). Although IPTG inactivates the repressor and allows the transcription of lacZ, lacY, and lacA, IPTG is not metabolized by β-galactosidase; for this reason it is often used in research to examine the effects of induction, independent of metabolism.

Concepts

The lac operon of E. coli controls the transcription of three genes in lactose metabolism: the lacZ gene, which encodes β-galactosidase; the lacY gene, which encodes permease; and the lacA gene, which encodes thiogalactoside transacetylase. The lac operon is inducible: a regulator gene produces a repressor that binds to the operator site and prevents the transcription of the structural genes.
The presence of allolactose inactivates the repressor and allows the transcription of the lac operon.

lac Mutations

Jacob and Monod worked out the structure and function of the lac operon by analyzing mutations that affected lactose metabolism. To help define the roles of the different components of the operon, they used partial diploid strains of E. coli. The cells of these strains possessed two different DNA molecules: the full bacterial chromosome and an extra piece of DNA. Jacob and Monod created these strains by allowing conjugation to take place between two bacteria (see Chapter 8). In conjugation, a small circular piece of DNA (a plasmid) is transferred from one bacterium to another. The plasmid used by Jacob and Monod contained the lac operon; so the recipient bacterium became partly diploid, possessing two copies of the lac operon. By using different combinations of mutations on the bacterial and plasmid DNA, Jacob and Monod determined that parts of the lac operon were cis acting (able to control the expression of genes on the same piece of DNA only) or trans acting (able to control the expression of genes on other DNA molecules).

Structural-gene mutations

Jacob and Monod first discovered some mutant strains that had lost the ability to synthesize either β-galactosidase or permease. (They did not study in detail the effects of mutations on the transacetylase enzyme, so it will not be considered here.) These mutations mapped to the lacZ or lacY structural genes and altered the amino acid sequence of the enzymes encoded by the genes. These mutations clearly affected the structure of the enzymes and not the regulation of their synthesis. Through the use of partial diploids, Jacob and Monod were able to establish that mutations at the lacZ and lacY genes were independent and usually affected only the product of the gene in which they occurred.
Partial diploids with lacZ⁺ lacY⁻ on the bacterial chromosome and lacZ⁻ lacY⁺ on the plasmid functioned normally, producing β-galactosidase and permease in the presence of lactose. (The genotype of a partial diploid is written by separating the genes on each DNA molecule with a slash: lacZ⁺ lacY⁻/lacZ⁻ lacY⁺.) In this partial diploid, a single functional β-galactosidase gene (lacZ⁺) is sufficient to produce β-galactosidase; it makes no difference whether the functional β-galactosidase gene is coupled to a functional (lacY⁺) or a defective (lacY⁻) permease gene. The same is true of the lacY⁺ gene.

◗ 16.9 The partial diploid lacI⁺ lacZ⁻/lacI⁻ lacZ⁺ produces β-galactosidase only in the presence of lactose because the lacI⁺ gene is trans dominant. (a) Absence of lactose: the repressor produced by lacI⁺ can bind to both operators and repress transcription. (b) Presence of lactose: lactose inactivates the repressor, and functional β-galactosidase is produced from the lacZ⁺ gene.

Regulator-gene mutations

Jacob and Monod also isolated mutations that affected the regulation of enzyme production. These mutations affected the production of both β-galactosidase and permease, because genes for both enzymes are in the same operon and are regulated coordinately. Some of these mutations were constitutive, causing the lac enzymes to be produced all the time, whether lactose was present or not, and these mutations fell into two classes: regulator and operator.
Jacob and Monod mapped one class to a site upstream of the structural genes; these mutations occurred in the regulator gene and were designated lacI⁻. The construction of partial diploids demonstrated that a lacI⁺ gene was dominant over a lacI⁻ gene; a single copy of lacI⁺ (genotype lacI⁺/lacI⁻) was sufficient to bring about normal regulation of enzyme production. Furthermore, lacI⁺ restored normal control to an operon even if the operon was located on a different DNA molecule, showing that lacI⁺ was able to act in trans. A partial diploid with genotype lacI⁺ lacZ⁻/lacI⁻ lacZ⁺ functioned normally, synthesizing β-galactosidase only when lactose was present ( ◗ FIGURE 16.9). In this strain, the lacI⁺ gene on the bacterial chromosome was functional, but the lacZ⁻ gene was defective; on the plasmid, the lacI⁻ gene was defective, but the lacZ⁺ gene was functional. The fact that a lacI⁺ gene could regulate a lacZ⁺ gene located on a different DNA molecule indicated to Jacob and Monod that the lacI⁺ gene product was able to diffuse to either the plasmid or the chromosome.

Some lacI mutations isolated by Jacob and Monod prevented transcription from taking place even in the presence of lactose and other inducers such as IPTG. These mutations were referred to as superrepressors (lacIˢ), because they produced repressors that could not be inactivated by an inducer. Recall that the repressor has two binding sites, one for the inducer and one for DNA. The lacIˢ mutations produced a repressor with an altered inducer-binding site, which made the inducer unable to bind to the repressor; consequently, the repressor was always able to attach to the operator site and prevent transcription of the lac genes. Superrepressor mutations were dominant over lacI⁺; partial diploids with genotype lacIˢ lacZ⁺/lacI⁺ lacZ⁺ were unable to synthesize either β-galactosidase or permease, whether or not lactose was present ( ◗ FIGURE 16.10).
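The trans-acting behavior of the regulator-gene alleles described above can be sketched as a small Python model. It is an illustration under stated assumptions rather than anything from the original analysis: the function names and the string labels "I+", "I-", "Is", "Z+", and "Z-" are hypothetical, every DNA molecule is assumed to carry a wild-type promoter and operator, and the low basal transcription of a repressed operon is ignored.

```python
def repressor_blocks_operators(laci_alleles, lactose_present):
    """Return True if the cell contains repressor able to bind the operators.

    The repressor is diffusible (trans acting), so all lacI alleles in the
    cell contribute to one shared pool.
    """
    for allele in laci_alleles:
        if allele == "Is":
            # Superrepressor: the inducer cannot bind, so it is always active.
            return True
        if allele == "I+" and not lactose_present:
            # Wild-type repressor is active only when no inducer is present.
            return True
        # "I-" produces no functional repressor.
    return False


def makes_beta_galactosidase(copies, lactose_present):
    """copies: list of (lacI allele, lacZ allele) pairs, one per DNA molecule."""
    laci_alleles = [laci for laci, _ in copies]
    has_functional_lacz = any(lacz == "Z+" for _, lacz in copies)
    blocked = repressor_blocks_operators(laci_alleles, lactose_present)
    return has_functional_lacz and not blocked


if __name__ == "__main__":
    strains = {
        "lacI+ lacZ-/lacI- lacZ+": [("I+", "Z-"), ("I-", "Z+")],
        "lacIs lacZ+/lacI+ lacZ+": [("Is", "Z+"), ("I+", "Z+")],
        "lacI- lacZ+ (haploid)": [("I-", "Z+")],
    }
    for name, copies in strains.items():
        without = makes_beta_galactosidase(copies, lactose_present=False)
        with_lac = makes_beta_galactosidase(copies, lactose_present=True)
        print(f"{name}: no lactose={without}, lactose={with_lac}")
```

Because the repressor pool is shared across both DNA molecules, one lacI⁺ copy is enough to restore inducible control, and one lacIˢ copy is enough to shut down both operons.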
◗ 16.10 The partial diploid lacIˢ lacZ⁺/lacI⁺ lacZ⁺ fails to produce β-galactosidase in the presence and absence of lactose, because the lacIˢ gene encodes a superrepressor. The lacIˢ gene produces a superrepressor that does not bind lactose; because lacIˢ is trans dominant, the superrepressor binds both operators and prevents transcription in the presence and absence of lactose.

Operator mutations

Jacob and Monod mapped the other class of constitutive mutants to a site adjacent to lacZ. These mutations occurred at the operator site, and were labeled lacOᶜ (O stands for operator and c for constitutive). The lacOᶜ mutations altered the sequence of DNA at the operator so that the repressor protein was no longer able to bind. A partial diploid with genotype lacI⁺ lacOᶜ lacZ⁺/lacI⁺ lacO⁺ lacZ⁺ exhibited constitutive synthesis of β-galactosidase, indicating that lacOᶜ was dominant over lacO⁺.

Analysis of other partial diploids showed that the lacO gene was cis acting, affecting only genes on the same DNA molecule. For example, a partial diploid with genotype lacI⁺ lacO⁺ lacZ⁻/lacI⁺ lacOᶜ lacZ⁺ was constitutive, producing β-galactosidase in the presence or absence of lactose ( ◗ FIGURE 16.11a), but a partial diploid with genotype lacI⁺ lacO⁺ lacZ⁺/lacI⁺ lacOᶜ lacZ⁻ produced β-galactosidase only in the presence of lactose ( ◗ FIGURE 16.11b). In the constitutive partial diploid (lacI⁺ lacO⁺ lacZ⁻/lacI⁺ lacOᶜ lacZ⁺; see Figure 16.11a), the lacOᶜ mutation and the functional lacZ⁺ gene are present on the same DNA molecule; but in lacI⁺ lacO⁺ lacZ⁺/lacI⁺ lacOᶜ lacZ⁻ (see Figure 16.11b), the lacOᶜ mutation and the functional lacZ⁺ gene are on different molecules. The lacO mutation affects only genes to which it is physically connected, as is true of all operator mutations.
They prevent the binding of a repressor protein to the operator and thereby allow RNA polymerase to transcribe genes on the same DNA molecule. However, they cannot prevent a repressor from binding to normal operators on other DNA molecules.

Promoter mutations

Mutations affecting lactose metabolism have also been isolated at the promoter site; these mutations are designated lacP⁻, and they interfere with the binding of RNA polymerase to the promoter. Because this binding is essential for the transcription of the structural genes, E. coli strains with lacP⁻ mutations do not produce lac enzymes either in the presence or in the absence of lactose. Like operator mutations, lacP⁻ mutations are cis acting and affect only genes on the same DNA molecule. The partial diploid lacI⁺ lacP⁺ lacZ⁺/lacI⁺ lacP⁻ lacZ⁺ exhibits normal synthesis of β-galactosidase, whereas the lacI⁺ lacP⁻ lacZ⁺/lacI⁺ lacP⁺ lacZ⁻ fails to produce β-galactosidase whether or not lactose is present.

Positive Control and Catabolite Repression

E. coli and many other bacteria will metabolize glucose preferentially in the presence of lactose and other sugars. They do so because glucose enters glycolysis without further modification and therefore requires less energy to metabolize than do other sugars. When glucose is available, genes that participate in the metabolism of other sugars are repressed, in a phenomenon known as catabolite repression. For example, the efficient transcription of the lac operon takes place only if lactose is present and glucose is absent. But how is the expression of the lac operon influenced by glucose? What brings about catabolite repression?

Catabolite repression results from positive control in response to glucose. (This regulation is in addition to the negative control brought about by the repressor binding at the operator site of the lac operon when lactose is absent.)
Positive control is accomplished through the binding of a dimeric protein called the catabolite activator protein (CAP) to a site that is about 22 nucleotides long and is located within or slightly upstream of the promoter of the lac genes ( ◗ FIGURE 16.12). RNA polymerase does not bind efficiently to many promoters unless CAP is first bound to the DNA. Before CAP can bind to DNA, it must form a complex with a modified nucleotide called adenosine-3′,5′-cyclic monophosphate (cyclic AMP or cAMP), which is important in cellular signaling processes in both bacterial and eukaryotic cells. In E. coli, the concentration of cAMP is inversely proportional to the level of available glucose. A high concentration of glucose within the cell lowers the amount of cAMP, and so little cAMP-CAP complex is available to bind to the DNA. Subsequently, RNA polymerase has poor affinity for the lac promoter, and little transcription of the lac operon takes place. Low concentrations of glucose stimulate high levels of cAMP, resulting in increased cAMP-CAP binding to DNA. This increase enhances the binding of RNA polymerase to the promoter and increases transcription of the lac genes by some 50-fold.

◗ 16.11 Mutations in lacO are constitutive and cis acting. (a) The partial diploid lacI⁺ lacO⁺ lacZ⁻/lacI⁺ lacOᶜ lacZ⁺ is constitutive, producing β-galactosidase in the presence and absence of lactose. (b) The partial diploid lacI⁺ lacO⁺ lacZ⁺/lacI⁺ lacOᶜ lacZ⁻ is inducible (produces β-galactosidase only when lactose is present), demonstrating that the lacO gene is cis acting.

The catabolite activator protein exerts positive control in more than 20 operons of E. coli. The response to CAP varies among these promoters; some operons are activated by low levels of CAP, whereas others require high levels. CAP contains a helix-turn-helix DNA-binding motif and, when it binds at the CAP site, it causes the DNA helix to bend ( ◗ FIGURE 16.13). The bent helix enables CAP to interact directly with the RNA polymerase enzyme bound to the promoter and facilitate the initiation of transcription.
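The combined effect of the repressor (negative control) and CAP (positive control) on the lac operon can be summarized in a short Python sketch. It is a qualitative illustration only: the function name `lac_transcription_level` and the labels "basal", "low", and "high" are assumptions introduced here, lactose and glucose are treated as simply present or absent, and cAMP is assumed to be high whenever glucose is absent.

```python
def lac_transcription_level(lactose_present: bool, glucose_present: bool) -> str:
    """Qualitative transcription level of the lac structural genes.

    Assumptions (illustrative, not quantitative):
    - without lactose there is no allolactose, so the repressor stays bound;
    - cAMP is high only when glucose is absent, and CAP binds DNA only
      when complexed with cAMP.
    """
    repressor_bound = not lactose_present
    camp_high = not glucose_present
    cap_bound = camp_high

    if repressor_bound:
        # Repression is never complete: a few enzyme molecules are still made.
        return "basal"
    if cap_bound:
        # Operator is clear and cAMP-CAP helps RNA polymerase bind.
        return "high"
    # Operator is clear, but RNA polymerase binds the promoter poorly.
    return "low"


if __name__ == "__main__":
    for lactose in (False, True):
        for glucose in (False, True):
            level = lac_transcription_level(lactose, glucose)
            print(f"lactose={lactose}, glucose={glucose}: {level}")
```

Only the combination of lactose present and glucose absent gives the "high" label, matching the statement that efficient transcription of the lac operon requires both conditions.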
◗ 16.12 The catabolite activator protein (CAP) binds to the promoter of the lac operon and stimulates transcription. CAP must complex with cAMP before binding to the promoter of the lac operon. The binding of cAMP-CAP to the promoter activates transcription by facilitating the binding of RNA polymerase. Levels of cAMP are inversely related to glucose: low glucose stimulates high cAMP; high glucose stimulates low cAMP.

Concepts

In spite of its name, catabolite repression is a type of positive control in the lac operon. CAP, complexed with cAMP, binds to a site near the promoter and stimulates the binding of RNA polymerase. Cellular levels of cAMP are controlled by glucose; a low glucose level increases the abundance of cAMP and enhances the transcription of the lac structural genes.

www.whfreeman.com/pierce Information on CAP control and the binding of CAP to DNA

◗ 16.13 Binding of the cAMP-CAP complex to DNA produces a sharp bend in DNA that activates transcription.

The trp Operon of E. coli

The lac operon just discussed is an inducible operon, one in which transcription does not normally take place and must be turned on. Other operons are repressible; transcription in these operons is normally turned on and must be repressed. The tryptophan (trp) operon in E. coli, which controls the biosynthesis of the amino acid tryptophan, is an example of a repressible operon. The trp operon contains five structural genes (trpE, trpD, trpC, trpB, and trpA), which produce components of three enzymes (two of the enzymes consist of two polypeptide chains). These enzymes convert chorismate into tryptophan ( ◗ FIGURE 16.14). The first structural gene, trpE, contains a long 5′ untranslated region (5′ UTR) that is transcribed but does not encode any of these enzymes.
Instead, this 5′ UTR plays an important role in another regulatory mechanism, discussed in the next section. Upstream of the structural genes is the trp promoter. When tryptophan levels are low, RNA polymerase binds to the promoter and transcribes the five structural genes into a single mRNA, which is then translated into enzymes that convert chorismate into tryptophan.

Some distance from the trp operon is a regulator gene, trpR, which encodes a repressor that alone cannot bind DNA (see Figure 16.14). Like the lac repressor, the tryptophan repressor has two binding sites, one that binds to DNA at the operator site and another that binds to tryptophan (the corepressor). Binding with tryptophan causes a conformational change in the repressor that makes it capable of binding to DNA at the operator site, which overlaps the promoter (see Figure 16.14). When the operator is occupied by the tryptophan repressor, RNA polymerase cannot bind to the promoter and the structural genes cannot be transcribed. Thus, when cellular levels of tryptophan are low, transcription of the trp operon takes place and more tryptophan is synthesized; when cellular levels of tryptophan are high, transcription of the trp operon is inhibited and the synthesis of more tryptophan does not take place.

◗ 16.14 The trp operon controls the biosynthesis of the amino acid tryptophan in E. coli. When tryptophan is low: the trp repressor is normally inactive and cannot bind to the operator, and transcription takes place. When tryptophan is high: tryptophan binds to the repressor and makes it active; the trp repressor then binds to the operator and shuts transcription off.
Concepts

The trp operon is a repressible operon that controls the biosynthesis of tryptophan. In a repressible operon, transcription is normally turned on and must be repressed. Repression is accomplished through the binding of tryptophan to the repressor, which renders the repressor active. The active repressor binds to the operator and prevents RNA polymerase from transcribing the structural genes.

Attenuation: The Premature Termination of Transcription

We’ve now seen how both positive and negative control regulate the initiation of transcription in an operon. Some operons have an additional level of control that affects the continuation of transcription rather than its initiation. In attenuation, transcription begins at the start site, but termination takes place prematurely, before the RNA polymerase even reaches the structural genes. Attenuation occurs in a number of operons that code for enzymes participating in the biosynthesis of amino acids.

We can understand the process of attenuation most easily by looking at one of the best-studied examples, which is found in the trp operon of E. coli. Several observations by Charles Yanofsky and his colleagues in the early 1970s indicated that repression at the operator site is not the only method of regulation in the trp operon. They isolated a series of mutants that possessed deletions in the transcribed region of the operon. Some of these mutants exhibited increased levels of transcription, yet control at the operator site was unaffected. Furthermore, they observed that two mRNAs of different sizes were transcribed from the trp operon: a long mRNA containing sequences for the structural genes and a much shorter mRNA of only 140 nucleotides. These observations led Yanofsky to propose that another mechanism, one that caused premature termination of transcription, also regulates transcription in the trp operon.
Close examination of the trp operon reveals a region of 162 nucleotides that corresponds to the long 5′ UTR of the mRNA (mentioned earlier) transcribed from the trp operon ( ◗ FIGURE 16.15a). The 5′ UTR (also called a leader) contains four regions: region 1 is complementary to region 2, region 2 is complementary to region 3, and region 3 is complementary to region 4. These complementarities allow the 5′ UTR to fold into two different secondary structures ( ◗ FIGURE 16.15b). Which secondary structure is assumed determines whether attenuation will occur.

◗ 16.15 Two different secondary structures may be formed by the 5′ UTR of the mRNA transcript of the trp operon. (a) The 5′ UTR of the trp operon, showing the ribosome-binding site, the start codon, the Trp codons, regions 1 through 4, and the string of uracil nucleotides. (b) When tryptophan is high, region 3 pairs with region 4 (1+2 and 3+4 secondary structure); this structure terminates transcription (attenuation). When tryptophan is low, region 2 pairs with region 3 (2+3 secondary structure); this structure does not terminate transcription (antitermination).

One of the secondary structures contains one hairpin produced by the base pairing of regions 1 and 2 and another hairpin produced by the base pairing of regions 3 and 4. Notice that a string of uracil nucleotides follows the 3+4 hairpin. Not coincidentally, the structure of a bacterial intrinsic terminator (Chapter 13) includes a hairpin followed by a string of uracil nucleotides; this secondary structure in the 5′ UTR of the trp operon is indeed a terminator and is called an attenuator. When cellular levels of tryptophan are high, regions 3 and 4 of the 5′ UTR base pair, producing the attenuator structure; this base pairing causes transcription to be terminated before the trp structural genes can be transcribed.

The alternative secondary structure of the 5′ UTR is produced by the base pairing of regions 2 and 3 (see Figure 16.15b).
This base pairing also produces a hairpin, but this hairpin is not followed by a string of uracil nucleotides; so this structure does not function as a terminator. When cellular levels of tryptophan are low, regions 2 and 3 base pair, and transcription of the trp structural genes is not terminated. RNA polymerase continues past the 5′ UTR into the coding section of the structural genes, and the enzymes that synthesize tryptophan are produced. Because it prevents the termination of transcription, the 2+3 structure is called an antiterminator.

To summarize, the 5′ UTR of the trp operon can fold into one of two structures. When tryptophan is high, the 3+4 structure forms, transcription is terminated within the 5′ UTR, and no additional tryptophan is synthesized. When tryptophan is low, the 2+3 structure forms, transcription continues through the structural genes, and tryptophan is synthesized. The critical question, then, is: why does the 3+4 structure arise when tryptophan is high and the 2+3 structure when tryptophan is low?

To answer this question, we must take a closer look at the nucleotide sequence of the 5′ UTR. At the 5′ end, upstream of region 1, is a ribosome-binding site. Region 1 actually encodes a small protein (see Figure 16.15b). Within the coding sequence for this protein are two UGG codons, which specify the amino acid tryptophan; so tryptophan is required for the translation of this 5′ UTR sequence. The protein encoded by the 5′ UTR has not been isolated and is presumed to be unstable; its only apparent function is to control attenuation. Although it was stated in Chapter 14 that a 5′ UTR is not translated into a protein, the 5′ UTRs of operons subject to attenuation are exceptions to this rule.

The formation of hairpins in the 5′ UTR of the trp operon is controlled by the interplay of transcription and translation that takes place near the 5′ end of the mRNA.
Recall that, in prokaryotic cells, transcription and translation are coupled: while transcription is taking place at the 3′ end of the mRNA, translation is initiated at the 5′ end. The precise timing and interaction of these two processes in the 5′ UTR determine whether attenuation occurs.

Transcription when tryptophan levels are high

Let’s first consider what happens when intracellular levels of tryptophan are high. RNA polymerase begins transcribing the DNA, producing region 1 of the 5′ UTR ( ◗ FIGURE 16.16a). Following RNA polymerase closely, a ribosome binds to the 5′ UTR (at the Shine-Dalgarno sequence, see Chapter 14) and begins to translate the coding region. Meanwhile, RNA polymerase is transcribing region 2 ( ◗ FIGURE 16.16b). Region 2 is complementary to region 1 but, because the ribosome is translating region 1, the nucleotides in regions 1 and 2 cannot base pair.

As RNA polymerase begins to transcribe region 3, the ribosome is continuing to translate region 1 ( ◗ FIGURE 16.16c). When the ribosome reaches the two UGG tryptophan codons, it doesn’t slow or stall, because tryptophan is abundant and tRNAs charged with tryptophan are readily available. This point is critical to note: because tryptophan is abundant, translation can keep up with transcription.

As it moves past region 1 to the stop codon, the ribosome partly covers region 2 ( ◗ FIGURE 16.16d); meanwhile, RNA polymerase completes the transcription of region 3. Although regions 2 and 3 are complementary, region 2 is partly covered by the ribosome; so it can’t base pair with region 3.

RNA polymerase continues to move along the DNA, eventually transcribing region 4 of the 5′ UTR. Region 4 is complementary to region 3, and, because region 3 cannot base pair with region 2, it pairs with region 4. The pairing of regions 3 and 4 (see Figure 16.16d) produces the attenuator (a hairpin followed by a string of uracil nucleotides), and transcription terminates just beyond region 4.
The structural genes are not transcribed, no tryptophan-producing enzymes are translated, and no additional tryptophan is synthesized.

Transcription when tryptophan levels are low

What happens when tryptophan levels are low? Once again, RNA polymerase begins transcribing region 1 of the 5′ UTR ( ◗ FIGURE 16.16e), and the ribosome binds to the 5′ end of the 5′ UTR and begins to translate region 1 while RNA polymerase continues transcribing region 2 ( ◗ FIGURE 16.16f). When the ribosome reaches the UGG tryptophan codons, it stalls ( ◗ FIGURE 16.16g) because the level of tryptophan is low, and tRNAs charged with tryptophan are scarce or even unavailable. The ribosome sits at the tryptophan codons, awaiting the arrival of a tRNA charged with tryptophan. Stalling of the ribosome does not, however, hinder transcription; RNA polymerase continues to move along the DNA, and transcription gets ahead of translation.

Because the ribosome is stalled at the tryptophan codons in region 1, region 2 is not covered by the ribosome when region 3 has been transcribed. Therefore, nucleotides in region 2 and region 3 base pair, forming the 2+3 hairpin ( ◗ FIGURE 16.16h). This hairpin does not cause termination, and so transcription continues. Because region 3 is already paired with region 2, the 3+4 hairpin (the attenuator) never forms, and so attenuation does not occur. RNA polymerase continues along the DNA, past the 5′ UTR, transcribing all the structural genes into mRNA, which is translated into the enzymes encoded by the trp operon. These enzymes then synthesize more tryptophan. Important events in the process of attenuation are summarized in Table 16.3.

◗ 16.16 The premature termination of transcription (attenuation) takes place in the trp operon, depending on the cellular level of tryptophan. Panels (a) through (d): when tryptophan is high. Panels (e) through (h): when tryptophan is low.

Table 16.3 Events in the process of attenuation

Intracellular level of tryptophan | Ribosome stalls at Trp codons | Position of ribosome when region 3 is transcribed | Secondary structure of 5′ UTR | Termination of transcription of trp operon
High | No | Covers region 2 | 3+4 hairpin | Yes
Low | Yes | Covers region 1 | 2+3 hairpin | No

Several additional points about attenuation need clarification. The key factor controlling attenuation is the number of tRNA molecules charged with tryptophan, because their availability is what determines whether the ribosome stalls at the tryptophan codons. A second point concerns the synchronization of transcription and translation, which is critical to attenuation.
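The outcomes listed in Table 16.3 can be sketched as a small lookup. This is only an illustrative model, not a simulation of the kinetics: the names `AttenuationOutcome` and `attenuation_outcome` are hypothetical, and the supply of tryptophan-charged tRNA is reduced to a single yes/no input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttenuationOutcome:
    ribosome_stalls: bool           # does the ribosome stall at the Trp codons?
    ribosome_covers: str            # region covered when region 3 is transcribed
    hairpin: str                    # "2+3" (antiterminator) or "3+4" (attenuator)
    transcription_terminates: bool  # does transcription stop within the 5' UTR?


def attenuation_outcome(charged_trp_trna_abundant: bool) -> AttenuationOutcome:
    """Return the Table 16.3 row for a given supply of Trp-charged tRNA."""
    if charged_trp_trna_abundant:
        # Translation keeps up with transcription; the ribosome covers region 2.
        return AttenuationOutcome(False, "region 1 to region 2", "3+4", True)
    # The ribosome stalls in region 1, leaving region 2 free to pair with region 3.
    return AttenuationOutcome(True, "region 1", "2+3", False)


if __name__ == "__main__":
    for label, abundant in (("high", True), ("low", False)):
        print(label, attenuation_outcome(abundant))
```

The input is deliberately the availability of charged tRNA rather than free tryptophan, because the text identifies charged tRNA as the key factor controlling attenuation.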
Synchronization is achieved through a pause site located in region 1 of the 5′ UTR. After initiating transcription, RNA polymerase stops temporarily at this site, which allows time for a ribosome to bind to the 5′ end of the mRNA so that translation can closely follow transcription. A third point is that ribosomes do not traverse the convoluted hairpins of the 5′ UTR to translate the structural genes. Ribosomes that attach to the ribosome-binding site at the 5′ end of the mRNA encounter a stop codon at the end of region 1. Ribosomes translating the structural genes attach to a different ribosome-binding site located near the beginning of the trpE gene.

Why does attenuation occur?

Why do bacteria need attenuation in the trp operon? Shouldn’t repression at the operator site prevent transcription from taking place when tryptophan levels in the cell are high? Why does the cell have two types of control? Part of the answer is that repression is never complete: some transcription is initiated even when the trp repressor is active, and repression reduces transcription by only as much as 70-fold. Attenuation can reduce transcription a further 8- to 10-fold; so together the two processes are capable of reducing transcription of the trp operon more than 600-fold. Together, the two mechanisms provide E. coli with a much finer degree of control over tryptophan synthesis than either could achieve alone.

Another reason for the dual control is that attenuation and repression respond to different signals: repression responds to the cellular levels of tryptophan, whereas attenuation responds to the number of tRNAs charged with tryptophan. There may be times when it is advantageous for the cell to be able to respond to these different signals.

Finally, the trp repressor affects several operons other than the trp operon. It’s possible that, at an earlier stage in the evolution of E. coli, the trp operon was controlled only by attenuation.
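The combined effect of repression and attenuation quoted above can be checked with a line of arithmetic. The sketch assumes that the two mechanisms act independently, so that their fold reductions multiply; the text implies this but does not state it, and the function name is hypothetical.

```python
REPRESSION_FOLD = 70              # maximum reduction by the trp repressor (from the text)
ATTENUATION_FOLD_RANGE = (8, 10)  # further reduction by attenuation (from the text)


def combined_fold_reduction(repression_fold, attenuation_fold):
    # Assumption: independent mechanisms, so the fold reductions multiply.
    return repression_fold * attenuation_fold


low, high = (combined_fold_reduction(REPRESSION_FOLD, a) for a in ATTENUATION_FOLD_RANGE)
print(f"combined reduction: {low}- to {high}-fold")
```

Under this assumption the combined reduction spans 560- to 700-fold, which is consistent with the figure of more than 600-fold given in the text.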
The trp repressor may have evolved primarily to control the other operons and only incidentally affects the trp operon.

Attenuation is a complex process to grasp because you must simultaneously visualize how two dynamic processes, transcription and translation, interact, and it’s easy to get the two processes confused. Remember that attenuation entails the early termination of transcription, not translation (although events in translation bring about the termination of transcription). Attenuation often causes confusion because we know that transcription must precede translation. We’re comfortable with the idea that transcription might affect translation, but it’s harder to imagine that the effects of translation could influence transcription, as they do in attenuation. The reality is that transcription and translation are closely coupled in prokaryotic cells, and events in one process can easily affect the other.

Concepts

In attenuation, transcription is initiated but terminates prematurely. When tryptophan levels are low, the ribosome stalls at the tryptophan codons and transcription continues. When tryptophan levels are high, the ribosome does not stall at the tryptophan codons, and the 5′ UTR adopts a secondary structure that terminates transcription before the structural genes can be copied into RNA (attenuation).

More information on attenuation: www.whfreeman.com/pierce

Antisense RNA in Gene Regulation

All the regulators of gene expression that we have considered so far have been proteins. Several examples of RNA regulators have also been discovered. These small RNA molecules are complementary to particular sequences on mRNAs and are called antisense RNA. They control gene expression by binding to sequences on mRNA and inhibiting translation. Translational control by antisense RNA is seen in the regulation of the ompF gene of E. coli ( ◗ FIGURE 16.17a). Two E.
coli genes, ompF and ompC, produce outer-membrane proteins that function as diffusion pores, allowing bacteria to adapt to external osmolarities (the tendency of water to move across a membrane owing to different ion concentrations). Under most conditions, both the ompF and the ompC genes are transcribed and translated. When the osmolarity of the medium increases, a regulator gene named micF (for mRNA-interfering complementary RNA) is activated and micF RNA is produced ( ◗ FIGURE 16.17b). The micF RNA, an antisense RNA, binds to a complementary sequence in the 5′ UTR of the ompF mRNA and inhibits the binding of the ribosome. This inhibition reduces the amount of translation (see Figure 16.17b), which results in fewer OmpF proteins in the outer membrane and thus reduces the detrimental movement of substances across the membrane owing to the changes in osmolarity. A number of examples of antisense RNA controlling gene expression have now been identified in bacteria and bacteriophages.

◗ 16.17 Antisense RNA can regulate translation. Panel (a): low osmolarity. Panel (b): high osmolarity.

Concepts

Antisense RNA is complementary to other RNA or DNA sequences. In bacterial cells, it may inhibit translation by binding to sequences in the 5′ UTR of mRNA and preventing the attachment of the ribosome.

Transcriptional Control in Bacteriophage Lambda

Bacteriophage λ is a virus that infects the bacterium E. coli (Chapter 8).
Bacteriophage λ possesses a single DNA chromosome consisting of 48,502 nucleotides surrounded by a protein coat. A bacteriophage infects a bacterial cell by attaching to the cell wall and injecting its DNA into the cell. Inside the cell, λ phage undergoes either of two life cycles. In the lytic cycle (see Chapter 8), phage genes are transcribed and translated to produce phage coat proteins and enzymes that synthesize from 100 to 200 copies of the phage DNA. The viral components are assembled to produce phage particles, and the phage produces a protein that causes the cell to lyse. The released phage can then infect other bacterial cells.

In the lysogenic cycle, phage genes that encode replication enzymes and phage proteins are not immediately transcribed. Instead, the phage DNA integrates into the bacterial chromosome as a prophage. When the bacterial chromosome replicates, the prophage is duplicated along with the bacterial genes and is passed to the daughter cells in bacterial reproduction. The prophage may later excise from the bacterial chromosome and enter the lytic cycle.

Whether a λ phage enters the lytic or the lysogenic cycle depends on the regulation of the phage genes. In the lytic cycle, the genes for replication enzymes, phage proteins, and bacterial cell lysis are transcribed; but, in the lysogenic cycle, these genes are repressed.

Like bacterial genes, functionally related phage genes are clustered together into operons. There are four major operons in the phage λ chromosome ( ◗ FIGURE 16.18). The early right operon contains genes that are required for DNA replication and are transcribed early in the lytic cycle. The early left operon contains genes necessary for recombination and the integration of phage DNA into the bacterial chromosome as a part of the lysogenic cycle. A third operon, the late operon, contains genes that encode the protein coat of the phage, produced late in the lytic cycle.
The fourth operon is the repressor operon, which produces the λ repressor responsible for maintaining the prophage DNA in a dormant state.

◗ 16.18 The bacteriophage λ chromosome contains four major operons: the early left operon, the early right operon, the late operon, and the repressor operon.

Although there are several additional promoters on the λ chromosome that may be activated at special times, here the emphasis is on three general features of transcriptional control in bacteriophage λ. First, both positive control and negative control are seen in λ gene regulation. Several proteins act as repressors, inhibiting transcription, whereas others act as activators, stimulating transcription. The λ repressor, which plays a major role in λ gene regulation, can act as either an activator or a repressor.

A second feature is that transcription is accomplished through a cascade of reactions. As one operon is transcribed, it produces a protein that regulates the transcription of a second operon, which produces a protein that affects the transcription of a third operon. Thus, the operons are activated and repressed in a particular order, with the use of several different promoters, each with an affinity for specific activators and repressors. As each promoter is activated, only the genes under its control are transcribed; this controlled transcription ensures that genes appropriate to each stage of the lytic or lysogenic cycle are expressed.
A third feature of λ gene regulation is the use of transcriptional antiterminator proteins, which bind to RNA polymerase and alter its structure, allowing it to ignore certain terminators ( ◗ FIGURE 16.19a). In the absence of the antiterminator protein, RNA polymerase stops at a terminator located early in the operon ( ◗ FIGURE 16.19b), and so only some of the genes in the operon are transcribed and translated.

◗ 16.19 Antiterminator proteins bind to RNA polymerase and alter its structure so that it ignores certain terminators. Panel (a): antiterminator present. Panel (b): antiterminator absent.

Concepts

The entry of bacteriophage λ into lysis or lysogeny is controlled by a cascade of reactions, in which the transcription of operons is turned on and off in a specific sequence. The expression of the operons is controlled by the affinity of different promoters for repressor and activator proteins and through transcriptional antiterminators.

Eukaryotic Gene Regulation

Many features of gene regulation are common to both bacterial and eukaryotic cells. For example, in both types of cells, DNA-binding proteins influence the ability of RNA polymerase to initiate transcription. However, there are also some differences, although these differences are often a matter of degree. First, eukaryotic genes are not organized into operons and are rarely transcribed together into a single mRNA molecule; instead, each structural gene typically has its own promoter and is transcribed separately. Second, chromatin structure affects gene expression in eukaryotic cells; DNA must unwind from the histone proteins before transcription can take place. Third, although both repressors and activators function in eukaryotic and bacterial gene regulation, activators seem to be more common in eukaryotic cells. Finally, the regulation of gene expression in eukaryotic cells is characterized by a greater diversity of mechanisms that act at different points in the transfer of information from DNA to protein.

Eukaryotic gene regulation is less well understood than bacterial regulation, partly owing to the larger genomes in eukaryotes, their greater sequence complexity, and the difficulty of isolating and manipulating mutations that can be used in the study of gene regulation. Nevertheless, great advances in our understanding of the regulation of eukaryotic genes have been made in recent years, and eukaryotic regulation continues to be one of the cutting-edge areas of research in genetics.

Chromatin Structure and Gene Regulation

One type of gene control in eukaryotic cells is accomplished through the modification of gene structure. In the nucleus, histone proteins associate to form octamers, around which helical DNA tightly coils to create chromatin (see Figure 11.5). In a general sense, this chromatin structure represses gene expression. For a gene to be transcribed, transcription factors, activators, and RNA polymerase must bind to the DNA. How can these events take place with DNA wrapped tightly around histone proteins? The answer is that before transcription, chromatin structure changes, and the DNA becomes more accessible to the transcriptional machinery.

DNase I hypersensitivity

Several types of changes are observed in chromatin structure when genes become transcriptionally active. One type is an increase in the sensitivity of chromatin to degradation by DNase I, an enzyme that digests DNA. When tightly bound by histone proteins, DNA is resistant to DNase I digestion because the enzyme cannot gain access to the DNA. When DNA is less tightly bound by histones, it becomes sensitive to DNase I degradation. Thus, the ability of DNase I to digest DNA provides an indication of the DNA-histone association.

As genes become transcriptionally active, regions around the genes become highly sensitive to the action of DNase I (see Chapter 11). These regions, called DNase I hypersensitive sites, frequently develop about 1000 nucleotides upstream of the start site of transcription, suggesting that the chromatin in these regions adopts a more open configuration during transcription. This relaxation of the chromatin structure may allow regulatory proteins access to binding sites on the DNA. Indeed, many DNase I hypersensitive sites correspond to known binding sites for regulatory proteins.

◗ 16.20 The acetylation of histone proteins alters chromatin structure and permits some transcription factors to bind to DNA. Positively charged tails of nucleosomal histone proteins probably interact with the negatively charged phosphates of DNA. Acetylation of the tails weakens their interaction with DNA and may permit some transcription factors to bind to DNA.

Histone acetylation

One factor affecting chromatin structure is acetylation, the addition of acetyl groups (CH3CO) to histone proteins. Histones in the octamer core of the nucleosome have two domains: (1) a globular domain that associates with other histones and the DNA and (2) a positively charged tail domain that probably interacts with the negatively charged phosphates on the backbone of DNA ( ◗ FIGURE 16.20). Acetyl groups are added to histone proteins by acetyltransferase enzymes; the acetyl groups destabilize the nucleosome structure, perhaps by neutralizing the positive charges on the histone tails and allowing the DNA to separate from the histones.
Other enzymes called deacetylases strip acetyl groups from histones and restore chromatin repression. Certain transcription factors (see Chapter 13) and other proteins that regulate transcription either have acetyltransferase activity or attract acetyltransferases to the DNA.

Some transcription factors and other regulatory proteins are known to alter chromatin structure without acetylating histone proteins. These chromatin-remodeling complexes bind directly to particular sites on DNA and reposition the nucleosomes, allowing transcription factors to bind to promoters and initiate transcription.

DNA methylation

Another change in chromatin structure associated with transcription is the methylation of cytosine bases, which yields 5-methylcytosine (see Figure 10.19). Heavily methylated DNA is associated with the repression of transcription in vertebrates and plants, whereas transcriptionally active DNA is usually unmethylated in these organisms. DNA methylation is most common on cytosine bases adjacent to guanine nucleotides on the same strand (CpG); so two methylated cytosines sit diagonally across from each other on opposing strands:

5′ ... mC G ... 3′
3′ ... G mC ... 5′

(Here mC denotes 5-methylcytosine.)

Concepts

Sensitivity to DNase I digestion suggests that transcribed DNA assumes an open configuration before transcription. The acetylation of histone proteins disrupts nucleosome structure and may facilitate transcription. The activation of transcription is often preceded by demethylation of DNA; methylated sequences may attract deacetylases, which remove acetyl groups from histone proteins, stabilizing chromatin structure and repressing transcription.

Transcriptional Control in Eukaryotic Cells

DNA regions with many CpG sequences are called CpG islands and are commonly found near transcription start sites. While genes are not being transcribed, these CpG islands are often methylated, but the methyl groups are removed before the initiation of transcription.
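The diagonal arrangement of methylatable cytosines at CpG sites follows from base pairing, and a short sketch can make it concrete. The function names and the example sequence below are hypothetical and serve only as an illustration; the code locates CpG dinucleotides on one strand and shows that the opposing strand, read 5′ to 3′, carries a CpG at each corresponding site.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposing strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def cpg_cytosines(seq):
    """0-based positions of the C in each CpG dinucleotide of seq (read 5' to 3')."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]


def paired_cpg_cytosines(seq):
    """For each top-strand CpG, return (position of the top-strand C,
    top-strand position whose partner is the bottom-strand C).

    The bottom-strand C pairs with the top-strand G at position i + 1, so the
    two cytosines sit diagonally across from each other.
    """
    return [(i, i + 1) for i in cpg_cytosines(seq)]


top = "TTACGGACGCTA"  # hypothetical sequence, for illustration only
print(cpg_cytosines(top))
print(cpg_cytosines(reverse_complement(top)))
print(paired_cpg_cytosines(top))
```

Because CG is its own reverse complement, every CpG on one strand implies a CpG on the other, which is why the same site can carry a methyl group on both strands.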
CpG methylation is also associated with long-term gene repression, such as on the inactivated X chromosome of female mammals (see Chapter 4). Recent evidence suggests an association between DNA methylation and the deacetylation of histones, both of which repress transcription. Certain proteins that bind tightly to methylated CpG sequences form complexes with other proteins that act as histone deacetylases. In other words, methylation appears to attract deacetylases, which remove acetyl groups from the histone tails, stabilizing the nucleosome structure and repressing transcription. Demethylation of DNA would allow acetyltransferases to add acetyl groups, disrupting nucleosome structure and permitting transcription.

Transcription is an important level of control in eukaryotic cells, and this control requires a number of different types of proteins and regulatory elements. The initiation of eukaryotic transcription was discussed in detail in Chapter 13. Recall that general transcription factors and RNA polymerase assemble into a basal transcription apparatus, which binds to a core promoter located immediately upstream of a gene. The basal transcription apparatus is capable of minimal levels of transcription; transcriptional activator proteins are required to bring about normal levels of transcription. These proteins bind to a regulatory promoter, which is located upstream of the core promoter, and to enhancers, which may be located some distance from the gene ( ◗ FIGURE 16.21).

◗ 16.21 Transcriptional activator proteins bind to sites on DNA and stimulate transcription. Most act by stimulating or stabilizing the assembly of the basal transcription apparatus.

Transcriptional activators, coactivators and repressors

Transcriptional activator proteins stimulate transcription by facilitating the assembly or action of the basal transcription apparatus at the core promoter; the activators may interact directly with the basal transcription apparatus or indirectly through protein coactivators. Some activators and coactivators, as well as the general transcription factors, also have acetyltransferase activity and facilitate transcription further by altering chromatin structure (see earlier subsection on histone acetylation).

Transcriptional activator proteins have two distinct functions (see Figure 16.21). First, they are capable of binding DNA at a specific base sequence, usually a consensus sequence in a regulatory promoter or enhancer; for this function, most transcriptional activator proteins contain one or more of the DNA-binding motifs discussed at the beginning of this chapter. A second function is the ability to interact with other components of the transcriptional apparatus and influence the rate of transcription. Most do so by either stabilizing or stimulating the assembly of the basal transcription apparatus.

GAL4 is a transcriptional activator protein that regulates the transcription of several yeast genes in galactose metabolism. GAL4 contains several zinc fingers and binds to a DNA sequence called UASG (upstream activating sequence for GAL4). UASG exhibits the properties of an enhancer, a regulatory sequence that may be some distance from the regulated gene and is independent of the gene in position and orientation (see Chapter 13). When bound to UASG, GAL4 stimulates the transcription of yeast genes needed for metabolizing galactose. A particular region of GAL4 binds another protein called GAL80, which regulates the activity of GAL4 in the presence of galactose.
When galactose is absent, GAL80 binds to GAL4 (two molecules of GAL80 bind to each molecule of GAL4), preventing GAL4 from activating transcription ( ◗ FIGURE 16.22). When galactose is present, however, it binds to GAL80, causing a conformational change in the protein so that it can no longer bind GAL4. The GAL4 protein is then available to activate the transcription of the genes whose products metabolize galactose.

◗ 16.22 Transcription is activated by GAL4 in response to galactose. GAL4 binds to the UASG site and controls the transcription of genes in galactose metabolism.

GAL4 and a number of other transcriptional activator proteins contain multiple amino acids with negative charges that form an acidic activation domain. These acidic activators stimulate transcription by enhancing the ability of TFIIB (see Chapter 13), one of the general transcription factors, to join the basal transcription apparatus. Without the activator, the binding of TFIIB is a slow process; the activator helps “recruit” TFIIB to the initiation complex, thereby stimulating the binding of RNA polymerase and the initiation of transcription. Acidic activators may also enhance other steps in the assembly of the basal transcription apparatus.

Some regulatory proteins in eukaryotic cells act as repressors, inhibiting transcription. These repressors may bind to sequences in the regulatory promoter or to distant sequences called silencers, which, like enhancers, are position and orientation independent. Unlike repressors in bacteria, most eukaryotic repressors do not directly block RNA polymerase. These repressors may compete with activators for DNA binding sites: when a site is occupied by an activator, transcription is stimulated, but, if a repressor occupies that site, no activation occurs. Alternatively, a repressor may bind to sites near an activator site and prevent the activator from contacting the basal transcription apparatus. A third possible mechanism of repressor action is direct interference with the assembly of the basal transcription apparatus, thereby blocking the initiation of transcription.

Concepts

Transcriptional regulatory proteins in eukaryotic cells can influence the initiation of transcription by affecting the stability or assembly of the basal transcription apparatus. Some regulatory proteins are activators and stimulate transcription; others are repressors and inhibit transcription.

◗ 16.23 An insulator blocks the action of an enhancer on a promoter when the insulator lies between the enhancer and the promoter.

Enhancers and insulators

Enhancers are capable of affecting transcription at distant promoters. For example, an enhancer that regulates the gene encoding the alpha chain of the T-cell receptor is located 69,000 bp downstream of the gene’s promoter. Furthermore, the exact position and orientation of an enhancer relative to the promoter can vary. How can an enhancer affect the initiation of transcription taking place at a promoter that is tens of thousands of base pairs away?
The mechanism of action of many enhancers is not known, but evidence suggests that, in some cases, activator proteins bind to the enhancer and cause the DNA between the enhancer and the promoter to loop out, bringing the promoter and enhancer close to one another, so that the transcriptional activator proteins are able to directly interact with the basal transcription apparatus at the core promoter.

Most enhancers are capable of stimulating any promoter in their vicinities. Their effects are limited, however, by insulators (also called boundary elements), which are DNA sequences that block or insulate the effect of enhancers in a position-dependent manner. If the insulator lies between the enhancer and the promoter, it blocks the action of the enhancer; but, if the insulator lies outside the region between the two, it has no effect ( ◗ FIGURE 16.23). Specific proteins bind to insulators and play a role in their blocking activity, but exactly how this takes place is poorly understood. Some insulators also limit the spread of changes in chromatin structure that affect transcription.

Concepts

Some activator proteins bind to enhancers, which are regulatory elements that are distant from the gene whose transcription they stimulate. Insulators are DNA sequences that block the action of enhancers.

Coordinated gene regulation

Although eukaryotic cells do not possess operons, several eukaryotic genes may be activated by the same stimulus. For example, many eukaryotic cells respond to extreme heat and other stresses by producing heat-shock proteins that help to prevent damage from such stressing agents. Heat-shock proteins are produced by approximately 20 different genes. During times of environmental stress, the transcription of all the heat-shock genes is greatly elevated. Groups of bacterial genes are often coordinately expressed (turned on and off together) because they are physically clustered as an operon and have the same promoter, but coordinately expressed genes in eukaryotic cells are not clustered. How, then, is the transcription of eukaryotic genes coordinately controlled if they are not organized into an operon?

Genes that are coordinately expressed in eukaryotic cells are able to respond to the same stimulus because they have regulatory sequences in common in their promoters or enhancers. For example, different eukaryotic heat-shock genes possess a common regulatory element upstream of their start sites. A transcriptional activator protein binds to this regulatory element during stress and elevates transcription. Such common DNA regulatory sequences are called response elements; they typically contain short consensus sequences (Table 16.4) at varying distances from the gene being regulated.

A single eukaryotic gene may be regulated by several different response elements. The metallothionein gene protects cells from the toxicity of heavy metals by encoding a protein that binds to heavy metals and removes them from cells. The basal transcription apparatus assembles around the TATA box, just upstream of the transcription start site for the metallothionein gene, but the apparatus alone is capable of only low rates of transcription. The presence of heavy metals stimulates much higher rates of transcription.

Table 16.4 A few response elements found in eukaryotic cells

Response element | Responds to | Consensus sequence
Heat-shock element | Heat and other stress | CNNGAANNTCCNNG
Glucocorticoid response element | Glucocorticoids | TGGTACAAATGTTCT
Phorbol ester response element | Phorbol esters | TGACTCA
Serum response element | Serum | CCATATTAGG

Source: Adapted from B. Lewin, Genes IV (Oxford: Oxford University Press, 1994), p. 880.
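Searching a sequence for the consensus sequences in Table 16.4 can be sketched as follows. This is a minimal illustration, not an established tool: the function names and the 30-nucleotide test sequence are hypothetical, N is treated as any one of A, C, G, or T, only the strand supplied is scanned, and only exact, non-overlapping matches to the consensus are reported.

```python
import re

# Consensus sequences copied from Table 16.4; N stands for any nucleotide.
RESPONSE_ELEMENTS = {
    "heat-shock element": "CNNGAANNTCCNNG",
    "glucocorticoid response element": "TGGTACAAATGTTCT",
    "phorbol ester response element": "TGACTCA",
    "serum response element": "CCATATTAGG",
}


def consensus_to_regex(consensus):
    """Translate a consensus sequence into a compiled regular expression."""
    return re.compile("".join("[ACGT]" if base == "N" else base for base in consensus))


def find_response_elements(upstream_seq):
    """Return (element name, 0-based start, matched sequence), ordered by position."""
    seq = upstream_seq.upper()
    hits = []
    for name, consensus in RESPONSE_ELEMENTS.items():
        for match in consensus_to_regex(consensus).finditer(seq):
            hits.append((name, match.start(), match.group()))
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical upstream sequence built to contain two of the elements.
example = "AAGCTGACTCATTGCATGAATATCCGAGTT"
for hit in find_response_elements(example):
    print(hit)
```

Real response elements tolerate some departures from the consensus, so an exact-match scan of this kind would miss many functional sites; it is meant only to show how a single short sequence shared by different genes lets one activator protein recognize all of them.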
[Figure 16.24. Labels: enhancer, GRE, MRE, TRE, TATA, +1 (transcription start), metallothionein gene, steroid receptor protein, MRE activator protein, AP1, RNA polymerase, transcription factors, basal transcription apparatus. Annotation: Various proteins may bind to upstream response elements to stimulate transcription.]

◗ 16.24 Multiple response elements (MREs) are found in the upstream region of the metallothionein gene. The basal transcription apparatus binds near the TATA box. In response to heavy metals, activator proteins bind to several MRE elements and stimulate transcription. The TRE response element is the binding site for transcription factor AP1. In response to glucocorticoid hormones, steroid receptors bind to the GRE response element located approximately 250 nucleotides upstream of the metallothionein gene and stimulate transcription.

Other response elements found upstream of the metallothionein gene also contribute to increasing its rate of transcription. For example, several copies of a metal response element (MRE) are upstream of the metallothionein gene ( ◗ FIGURE 16.24). Heavy metals stimulate the binding of an activator protein to MREs, which elevates the rate of transcription of the metallothionein gene. The presence of multiple copies of this response element permits high rates of transcription to be induced by metals. Two enhancers also are located in the upstream region of the metallothionein gene; one enhancer contains a response element known as TRE, which stimulates transcription in the presence of phorbol esters. A third response element called GRE is located approximately 250 nucleotides upstream of the metallothionein gene and stimulates transcription in response to glucocorticoid hormones. This example illustrates a common feature of eukaryotic transcriptional control: a single gene may be activated by several different response elements, found in both promoters and enhancers.
Multiple response elements allow the same gene to be activated by different stimuli. At the same time, the presence of the same response element in different genes allows a single stimulus to activate multiple genes. In this way, response elements allow complex biochemical responses in eukaryotic cells.

Gene Control Through Messenger RNA Processing

Alternative splicing allows a pre-mRNA to be spliced in multiple ways, generating different proteins in different tissues or at different times in development (see Chapter 14). Many eukaryotic genes undergo alternative splicing, and the regulation of splicing is probably an important means of controlling gene expression in eukaryotic cells.

The T-antigen gene of the mammalian virus SV40 serves as a well-studied example of alternative splicing. This gene is capable of encoding two different proteins, the large T and small t antigens. Which of the two proteins is produced depends on which of two alternative 5′ splice sites is used during RNA splicing ( ◗ FIGURE 16.25). The use of one 5′ splice site produces mRNA that encodes the large T antigen, whereas the use of the other 5′ splice site (which is farther downstream) produces an mRNA encoding the small t antigen.

[Figure 16.25. Annotations: (1) Use of the first 5′ splice site produces an mRNA that encodes the large T antigen. (2) Use of the second 5′ splice site produces an mRNA that encodes the small t antigen. (3) The SF2 protein enhances the use of the second splice site.]

◗ 16.25 Alternative splicing leads to the production of the small t antigen and the large T antigen in the mammalian virus SV40.

A protein called splicing factor 2 (SF2) enhances the production of mRNA encoding the small t antigen (see Figure 16.25). Splicing factor 2 has two binding domains: one is an RNA-binding region and the other has alternating serine and arginine amino acids. These two domains are typical of SR proteins, which often play a role in regulating splicing. Splicing factor 2 stimulates the binding of U1 snRNP to the 5′ splice site, one of the earliest steps in RNA splicing (see Chapter 14). The precise mechanism by which SR proteins influence the choice of splice sites is poorly understood. One model suggests that SF2 and other SR proteins bind to specific splice sites on mRNA and stimulate the attachment of snRNPs, which then commit the site to splicing.

Another example of alternative splicing that regulates gene expression determines whether a fruit fly develops as male or female. Sex differentiation in Drosophila arises from a cascade of gene regulation ( ◗ FIGURE 16.26).

[Figure 16.26. Female fly (XX genotype, X:A = 1.0): (1) In X:A = 1.0 embryos, the activated Sxl gene produces a protein (2) that causes tra pre-mRNA to be spliced at a downstream 3′ site (3) to produce Tra protein. (4) Together, Tra and Tra-2 proteins direct the female-specific splicing of dsx pre-mRNA, (5) which produces proteins causing the embryo to develop into a female. Male fly (XY genotype, X:A = 0.5): (1) In X:A = 0.5 embryos, the Sxl gene is not activated, (2) and no Sxl protein is produced. (3) Thus tra pre-mRNA is spliced at an upstream site, (4) producing a nonfunctional Tra protein. (5) Without Tra, the male-specific splicing of dsx pre-mRNA (6) produces male Dsx proteins that cause the embryo to develop into a male.]

◗ 16.26 Alternative splicing controls sex determination in Drosophila.
When the ratio of X chromosomes to the number of haploid sets of autosomes (the X:A ratio; see Chapter 4) is 1, a female-specific promoter is activated early in development and stimulates the transcription of the sex-lethal (Sxl) gene. The protein encoded by Sxl regulates the splicing of the pre-mRNA transcribed from another gene called transformer (tra). The splicing of tra pre-mRNA results in the production of Tra protein. Together with another protein (Tra-2), Tra stimulates the female-specific splicing of pre-mRNA from yet another gene called doublesex (dsx). This event produces a female-specific Dsx protein, which causes the embryo to develop female characteristics.

In male embryos, which have an X:A ratio of 0.5 (see Figure 16.26), the promoter that transcribes the Sxl gene in females is inactive; so no Sxl protein is produced. In the absence of Sxl protein, tra pre-mRNA is spliced at a different 3′ splice site to produce a nonfunctional form of Tra protein ( ◗ FIGURE 16.27). In turn, the presence of this nonfunctional Tra in males causes dsx pre-mRNAs to be spliced differently from those in females (see Figure 16.26), and a male-specific Dsx protein is produced. This event causes the development of male-specific traits.

In summary, the Tra, Tra-2, and Sxl proteins regulate alternative splicing that produces male and female phenotypes in Drosophila. Exactly how these proteins regulate alternative splicing is not yet known, but it is possible that the Sxl protein (produced only in females) blocks the upstream 3′ splice site on the tra pre-mRNA. This blockage would force the spliceosome to use the downstream 3′ splice site, which causes the production of Tra protein and eventually results in female traits (see Figure 16.27).

Concepts
Eukaryotic genes may be regulated through the control of mRNA processing. The selection of alternative splice sites leads to the production of different proteins.
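The regulatory cascade described above can be traced step by step in a few lines of code. This is only a sketch of the logic as stated in the text: it covers the two X:A ratios the text discusses (1.0 and 0.5), assumes that Tra-2 is available in both sexes, and uses a hypothetical function name and dictionary keys chosen for illustration.

```python
def drosophila_sex_cascade(x_to_a_ratio):
    """Trace the Sxl -> tra -> dsx splicing cascade for a given X:A ratio.

    Only the ratios discussed in the text (1.0 and 0.5) are modeled;
    any other value raises ValueError.
    """
    if x_to_a_ratio not in (1.0, 0.5):
        raise ValueError("this sketch models only X:A ratios of 1.0 and 0.5")

    # The female-specific promoter of Sxl is active only when X:A = 1.0.
    sxl_protein = x_to_a_ratio == 1.0

    # Sxl protein leads to use of the downstream 3' splice site of tra pre-mRNA;
    # without it the upstream site is used and Tra is nonfunctional.
    tra_splice_site = "downstream 3' site" if sxl_protein else "upstream 3' site"
    functional_tra = sxl_protein

    # Functional Tra (with Tra-2, assumed present) gives female-specific dsx splicing.
    dsx_protein = "female-specific" if functional_tra else "male-specific"

    return {
        "sxl_protein": sxl_protein,
        "tra_splice_site": tra_splice_site,
        "functional_tra": functional_tra,
        "dsx_protein": dsx_protein,
        "phenotype": "female" if dsx_protein == "female-specific" else "male",
    }


for ratio in (1.0, 0.5):
    print(ratio, drosophila_sex_cascade(ratio))
```

Each variable depends only on the one before it, which mirrors the point of the passage: a single early decision (whether Sxl is transcribed) is propagated through successive splicing choices.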
[Figure 16.27. tra pre-mRNA with alternative 3′ splice sites. In males, the upstream 3′ splice site is used, resulting in the inclusion of a premature stop codon in the mRNA; no functional Tra protein is produced (male phenotype). In females, the presence of Sxl protein causes the downstream 3′ splice site to be used; the termination codon is spliced out with the intron, and a functional Tra protein is produced (female phenotype).]

◗ 16.27 Alternative splicing of tra pre-mRNA. Two alternative 3′ splice sites are present.

Gene Control Through RNA Stability

The amount of a protein that is synthesized depends on the amount of corresponding mRNA available for translation. The amount of available mRNA, in turn, depends on both the rate of mRNA synthesis and the rate of mRNA degradation. Eukaryotic mRNAs are generally more stable than bacterial mRNAs, which typically last only a few minutes before being degraded, but nonetheless there is great variability in the stability of eukaryotic mRNA: some persist for only a few minutes; others last for hours, days, or even months. These variations can result in large differences in the amount of protein that is synthesized.

Cellular RNA is degraded by ribonucleases, enzymes that specifically break down RNA. Most eukaryotic cells contain 10 or more types of ribonucleases, and there are several different pathways of mRNA degradation. In one pathway, the 5′ cap is first removed, followed by 5′→3′ removal of nucleotides. A second pathway begins at the 3′ end of the mRNA and removes nucleotides in the 3′→5′ direction. In a third pathway, the mRNA can be cleaved at internal sites.

Messenger RNA degradation from the 5′ end is most common and begins with the removal of the 5′ cap. This pathway is usually preceded by the shortening of the poly(A) tail. Poly(A)-binding proteins (PABPs) normally bind to the poly(A) tail and contribute to its stability-enhancing effect. The presence of these proteins at the 3′ end of the mRNA protects the 5′ cap. When the poly(A) tail has been shortened below a critical limit, the 5′ cap is removed, and nucleases then degrade the mRNA by removing nucleotides from the 5′ end. These observations suggest that the 5′ cap and 3′ poly(A) tail of eukaryotic mRNA physically interact with each other, most likely by the poly(A) tail bending around so that the PABPs make contact with the 5′ cap (see Chapter 14). Other parts of eukaryotic mRNA, including sequences in the 5′ UTR, the coding region, and the 3′ UTR, also affect mRNA stability.

Poly(A) tails are added to the 3′ ends of some bacterial mRNAs, but they are shorter than those typically associated with eukaryotic mRNA and have the opposite effect; they appear to destabilize most prokaryotic mRNAs.

Concepts
The stability of mRNA influences gene expression by affecting the amount of mRNA available to be translated. The stability of mRNA is affected by the 5′ cap, the poly(A) tail, the 5′ UTR, the coding section, and the 3′ UTR.

RNA Silencing

Recent evidence indicates that the expression of some genes may be suppressed through RNA silencing, also known as RNA interference and posttranscriptional gene silencing. Although many of the details of this mechanism are still poorly understood, it appears to be widespread, existing in fungi, plants, and animals. It may also prove to be a powerful tool for artificially regulating gene expression in genetically engineered organisms.
RNA silencing is initiated by the presence of double-stranded RNA, which may arise in several ways: by the transcription of inverted repeats in DNA into a single RNA molecule that base pairs with itself; by the simultaneous transcription of two different RNA molecules that are complementary to one another and pair; or by the replication of double-stranded RNA viruses ( ◗ FIGURE 16.28a). In Drosophila, an enzyme called Dicer cleaves and processes the double-stranded RNA to produce small pieces of RNA that range in length from 21 to 25 nucleotides; a single strand of each piece serves as the guide ( ◗ FIGURE 16.28b). These small interfering RNAs (siRNAs) then pair with complementary sequences in mRNA and attract an RNA-protein complex that cleaves the mRNA approximately in the middle of the bound siRNA. After cleavage, the mRNA is further degraded. In the nucleus, siRNAs serve as guides for the methylation of complementary sequences in DNA, which then affects transcription. Some related RNA molecules produced through the cleavage of double-stranded RNA bind to complementary sequences in the 3′ UTR of mRNA and inhibit their translation.

[Figure 16.28a. Transcription through an inverted repeat in the DNA (AGTCC ... GGACT) produces an RNA molecule that folds to produce double-stranded RNA.]

RNA silencing is thought to have evolved as a defense against RNA viruses and transposable elements that move through an RNA intermediate (see Chapter 20). The extent to which it contributes to normal gene regulation is uncertain, but dramatic phenotypic effects result from some mutations that occur in the enzymes that carry out RNA silencing.
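The pairing step can be made concrete with a short search for the site on an mRNA that is complementary to an siRNA. This is a minimal sketch under simplifying assumptions that go beyond the text: perfect Watson-Crick complementarity over the whole siRNA, and a cleavage position taken as the midpoint of the paired region (the text says only "approximately in the middle"). The function names and both sequences are hypothetical, invented for illustration.

```python
# Watson-Crick pairing for RNA: A pairs with U, G pairs with C.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def find_sirna_targets(mrna, sirna):
    """Return (site_start, approximate_cleavage_position) for each perfect match.

    Positions are 0-based indexes into the mRNA. The cleavage position is
    simply the midpoint of the paired region, an assumption of this sketch.
    """
    site = reverse_complement(sirna)
    mrna = mrna.upper()
    hits = []
    start = mrna.find(site)
    while start != -1:
        hits.append((start, start + len(site) // 2))
        start = mrna.find(site, start + 1)
    return hits


# Hypothetical 21-nucleotide siRNA and an mRNA built to contain its target site.
sirna = "UUCGAAGUAUUCCGCGUACGU"
mrna = "AUGGC" + reverse_complement(sirna) + "GGAUAA"
print(find_sirna_targets(mrna, sirna))
```

Because the match is defined purely by sequence complementarity, the same siRNA would mark every mRNA that carries the target site, which is why double-stranded RNA from a virus or transposable element can silence all transcripts of that element.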
Concepts
RNA silencing is initiated by double-stranded RNA molecules that are cleaved and processed. The resulting small interfering RNAs bind to complementary sequences in mRNA and bring about their cleavage and degradation. Small interfering RNAs may also stimulate the methylation of complementary sequences in DNA.

Translational and Posttranslational Control

Ribosomes, aminoacyl tRNAs, initiation factors, and elongation factors are all required for the translation of mRNA molecules. The availability of these components affects the rate of translation and therefore influences gene expression. The initiation of translation in some mRNAs is regulated by proteins that bind to the mRNA's 5′ UTR and inhibit the binding of ribosomes, similar to the way in which repressor proteins bind to operators and prevent the transcription of structural genes.

Many eukaryotic proteins are extensively modified after translation by the selective cleavage and trimming of amino acids from the ends, by acetylation, or by the addition of phosphates, carboxyl groups, methyl groups, and carbohydrates to the protein. These modifications affect the transport, function, and activity of the proteins and have the capacity to affect gene expression.

Concepts
The initiation of translation may be affected by proteins that bind to specific sequences at the 5′ end of mRNA. The availability of ribosomes, tRNAs, initiation and elongation factors, and other components of the translational apparatus may affect the rate of translation.

[Figure 16.28b. Double-stranded RNA is cleaved and processed by the enzyme Dicer to produce small interfering RNAs (siRNAs). An siRNA pairs with complementary sequences on mRNA and attracts an RNA-protein complex that cleaves the mRNA in the middle of the bound siRNA; after cleavage, the RNA is degraded. siRNAs may also attach to complementary sequences in DNA and attract methylating enzymes, which methylate cytosine bases in the DNA, affecting transcription. Conclusion: siRNAs produced from double-stranded RNA molecules affect gene expression.]
◗ 16.28 RNA silencing leads to the degradation of mRNA and the methylation of DNA.

Connecting Concepts
A Comparison of Bacterial and Eukaryotic Gene Control

Now that we have considered the major types of gene regulation, let's review some of the similarities and differences of bacterial and eukaryotic gene control.

1. Much of gene regulation in both bacterial and eukaryotic cells is accomplished through proteins that bind to specific sequences in DNA. Regulatory proteins come in a variety of types, but most can be characterized according to a small set of DNA-binding motifs.
2. Regulatory proteins that affect transcription exhibit two basic types of control: repressors inhibit transcription (negative control); activators stimulate transcription (positive control). Both negative control and positive control are found in bacterial and eukaryotic cells.
3. Complex biochemical and developmental events in bacterial and eukaryotic cells may require a cascade of gene regulation, in which the activation of one set of genes stimulates the activation of another set.
4. Most gene regulation in bacterial cells is at the level of transcription (although it does exist at other levels). Gene regulation in eukaryotic cells often takes place at multiple levels, including chromatin structure, transcription, mRNA processing, and RNA stability.
5. In bacterial cells, genes are often clustered in operons and are coordinately expressed by transcription into a single mRNA molecule. In contrast, each eukaryotic gene typically has its own promoter and is transcribed independently. Coordinate regulation in eukaryotic cells takes place through common response elements, present in the promoters and enhancers of the genes. Different genes that have the same response element in common are influenced by the same regulatory protein.
6. Chromatin structure plays a role in eukaryotic (but not bacterial) gene regulation. In general, condensed chromatin represses gene expression; chromatin structure must be altered before transcription. Acetylation of the histone proteins, which may be influenced by the degree of DNA methylation, appears to be important in bringing about these changes in chromatin structure.
7. The initiation of transcription is a relatively simple process in bacterial cells, and regulatory proteins function by blocking or stimulating the binding of RNA polymerase to DNA. Eukaryotic transcription requires complex machinery that includes RNA polymerase, general transcription factors, and transcriptional activators, which allows transcription to be influenced by multiple factors.
8. Some eukaryotic transcriptional activator proteins function at a distance from the gene by binding to enhancers, causing a loop in the DNA, and bringing the promoter and enhancer into close proximity. Some distant-acting sequences analogous to enhancers have been described in bacterial cells, but they appear to be less common.
9. The greater time lag between transcription and translation in eukaryotic cells than in bacterial cells allows mRNA stability and mRNA processing to play larger roles in eukaryotic gene regulation.

Connecting Concepts Across Chapters

The focus of this chapter has been on how the flow of information from genotype to phenotype is controlled. We have seen that there are a number of potential points of control in this pathway of information flow, including changes in gene structure, transcription, mRNA processing, mRNA stability, translation, and posttranslational modifications.

Gene regulation is critically important from a number of perspectives. It is essential to the survival of cells, which cannot afford to simultaneously transcribe and translate all of their genes. The evolution of complex genomes consisting of thousands of genes would not have been possible without some mechanism to selectively control gene expression.
Gene regulation is also important from a practical point of view. A number of human diseases are caused by the breakdown of gene regulation, which produces proteins at inappropriate times or places. Gene regulation is also important to genetic engineering, where the key to success is often not getting genes into a cell, which is relatively easy, but getting them expressed at useful levels. For all of these reasons, there is tremendous interest in how gene expression is controlled, and understanding gene regulation is one of the frontiers of genetic research.

Information presented in this chapter builds on the foundation of molecular genetics developed in Chapters 10 through 15. The mechanisms of gene regulation provide important links to several topics in subsequent chapters. Gene regulation is important to the success of recombinant DNA, which is discussed in Chapter 18. Gene regulation also plays an important role in the genetics of development and cancer, which are discussed in Chapter 21.

CONCEPTS SUMMARY

• Gene expression may be controlled at different levels, including the alteration of gene structure, transcription, mRNA processing, RNA stability, translation, and posttranslational modification. Much of gene regulation is through the action of regulatory proteins binding to specific sequences in DNA.
• Genes in bacterial cells are typically clustered into operons: groups of functionally related structural genes and the sequences that control their transcription. Structural genes in an operon are transcribed together as a single mRNA.
• In negative control, a repressor protein binds to DNA and inhibits transcription. In positive control, an activator protein binds to DNA and stimulates transcription. In inducible operons, transcription is normally off and must be turned on; in repressible operons, transcription is normally on and must be turned off.
• The lac operon of E. coli is a negative inducible operon that controls the metabolism of lactose. In the absence of lactose, a repressor binds to the operator and prevents transcription of the structural genes that encode β-galactosidase, permease, and transacetylase. When lactose is present, some of it is converted into allolactose, which binds to the repressor and makes it inactive, allowing the structural genes to be transcribed and lactose to be metabolized. When all the lactose has been metabolized, the repressor once again binds to the operator and blocks transcription.
• Positive control in the lac operon and other operons is through catabolite repression. When complexed with cAMP, the catabolite activator protein (CAP) binds to a site in or near the promoter and stimulates the transcription of the structural genes. Levels of cAMP are inversely correlated with glucose; so low levels of glucose stimulate transcription and high levels inhibit transcription.
• The trp operon of E. coli is a negative repressible operon that controls the biosynthesis of tryptophan.
• Attenuation is another level of control that allows transcription to be stopped before RNA polymerase has reached the structural genes. It takes place through the close coupling of transcription and translation and depends on the secondary structure of the 5′ UTR sequence.
• Small RNA molecules, called antisense RNA, are complementary to sequences in mRNA and may inhibit translation by binding to these sequences, thereby preventing the attachment or progress of the ribosome.
• Transcriptional control regulates the lytic and lysogenic cycles of bacteriophage λ. The transcription of certain operons stimulates the transcription of some operons and represses the transcription of others. Which operons are stimulated and which are repressed depends on the affinity of promoters for repressor and activator proteins.
• Like gene regulation in bacterial cells, much of eukaryotic regulation is accomplished through the binding of regulatory proteins to DNA. However, there are no operons in eukaryotic cells, and gene regulation is characterized by a greater diversity of mechanisms acting at different levels.
• In eukaryotic cells, chromatin structure represses gene expression. During transcription, chromatin structure may be altered by the acetylation of histone proteins and demethylation.
• The initiation of eukaryotic transcription is controlled by general transcription factors that assemble into the basal transcription apparatus and by transcriptional activator proteins that stimulate normal levels of transcription by binding to regulatory promoters and enhancers.
• Some DNA sequences limit the action of enhancers by blocking their action in a position-dependent manner.
• Coordinately controlled genes in eukaryotic cells respond to the same factors because they have common response elements that are stimulated by the same transcriptional activator.
• Gene expression in eukaryotic cells may be influenced by RNA processing.
• Gene expression may be regulated by changes in RNA stability. The 5′ cap, the coding sequence, the 3′ UTR, and the poly(A) tail are important in controlling the stability of eukaryotic mRNAs. Proteins binding to the 5′ end of eukaryotic mRNA may affect its translation.
• RNA silencing takes place when double-stranded RNA is cleaved and processed to produce small interfering RNAs that bind to complementary mRNAs and bring about their cleavage and degradation.
• Control of the posttranslational modification of proteins also may play a role in gene expression.

IMPORTANT TERMS

gene regulation; induction; structural gene; regulatory gene; regulatory element; domain; operon; regulator gene; regulator protein; operator; negative control; positive control; inducible operon; inducer; allosteric protein; repressible operon; corepressor; coordinate induction; partial diploid; constitutive mutation; attenuator; antiterminator; antisense RNA; transcriptional antiterminator protein; DNase I hypersensitive site; catabolite repression; catabolite activator protein (CAP); adenosine-3′,5′-cyclic monophosphate (cAMP); attenuation; chromatin-remodeling complex; CpG island; coactivator; insulator; heat-shock protein; response element; SR protein; RNA silencing; small interfering RNAs (siRNAs)

Worked Problems

1. A regulator gene produces a repressor in an inducible operon. A geneticist isolates several constitutive mutations affecting this operon. Where might these constitutive mutations occur? How would the mutations cause the operon to be constitutive?

• Solution
An inducible operon is normally not being transcribed, meaning that the repressor is active and binds to the operator, inhibiting transcription. Transcription takes place when the inducer binds to the repressor, making it unable to bind to the operator. Constitutive mutations cause transcription to take place at all times, whether the inducer is present or not. Constitutive mutations might occur in the regulator gene, altering the repressor so that it is never able to bind to the operator. Alternatively, constitutive mutations might occur in the operator, altering the binding site for the repressor so that the repressor is unable to bind under any conditions.

2. For E. coli strains with the lac genotypes, use a plus sign (+) to indicate the synthesis of β-galactosidase and permease and a minus sign (−) to indicate no synthesis of the enzymes.
                                          Lactose absent                Lactose present
Genotype of strain                        β-Galactosidase   Permease    β-Galactosidase   Permease
(a) lacI+ lacP+ lacO+ lacZ+ lacY+
(b) lacI+ lacP+ lacOc lacZ− lacY+
(c) lacI+ lacP− lacO+ lacZ+ lacY+
(d) lacI+ lacP+ lacO+ lacZ− lacY− /
    lacI− lacP+ lacO+ lacZ+ lacY+

• Solution

                                          Lactose absent                Lactose present
Genotype of strain                        β-Galactosidase   Permease    β-Galactosidase   Permease
(a) lacI+ lacP+ lacO+ lacZ+ lacY+         −                 −           +                 +
(b) lacI+ lacP+ lacOc lacZ− lacY+         −                 +           −                 +
(c) lacI+ lacP− lacO+ lacZ+ lacY+         −                 −           −                 −
(d) lacI+ lacP+ lacO+ lacZ− lacY− /       −                 −           +                 +
    lacI− lacP+ lacO+ lacZ+ lacY+

(a) All the genes possess normal sequences, and so the lac operon functions normally: when lactose is absent, the regulator protein binds to the operator and inhibits the transcription of the structural genes, and so β-galactosidase and permease are not produced. When lactose is present, some of it is converted into allolactose, which binds to the repressor and makes it inactive; the repressor does not bind to the operator, and so the structural genes are transcribed, and β-galactosidase and permease are produced.

(b) The structural lacZ gene is mutated; so β-galactosidase will not be produced under any conditions. The lacO gene has a constitutive mutation, which means that the repressor is unable to bind to it, and so transcription takes place at all times. Therefore, permease will be produced in both the presence and the absence of lactose.

(c) In this strain, the promoter is mutated, and so RNA polymerase is unable to bind and transcription does not take place. Therefore β-galactosidase and permease are not produced under any conditions.

(d) This strain is a partial diploid, which consists of two copies of the lac operon: one on the bacterial chromosome and another on a plasmid. The lac operon represented in the upper part of the genotype has mutations in both the lacZ and lacY genes, and so it is not capable of encoding β-galactosidase or permease under any conditions.
The lac operon in the lower part of the genotype has a defective regulator gene, but the normal regulator gene in the upper operon produces a diffusible repressor (trans acting) that binds to the lower operon in the absence of lactose and inhibits transcription. Therefore no β-galactosidase or permease is produced when lactose is absent. In the presence of lactose, the repressor cannot bind to the operator, and so the lower operon is transcribed and β-galactosidase and permease are produced.

3. The fox operon, which has sequences A, B, C, and D, encodes enzymes 1 and 2. Mutations in sequences A, B, C, and D have the following effects, where a plus sign (+) = enzyme synthesized and a minus sign (−) = enzyme not synthesized.

                        Fox absent              Fox present
Mutation in sequence    Enzyme 1   Enzyme 2     Enzyme 1   Enzyme 2
No mutation             −          −            +          +
A                       −          −            −          +
B                       −          −            −          −
C                       −          −            +          −
D                       +          +            +          +

(a) Is the fox operon inducible or repressible?
(b) Indicate which sequence (A, B, C, or D) is part of the following components of the operon:
Regulator gene
Promoter
Structural gene for enzyme 1
Structural gene for enzyme 2

• Solution
Because the structural genes in an operon are coordinately expressed, mutations that affect only one enzyme are likely to occur in the structural genes; mutations that affect both enzymes must occur in the promoter or regulator.

(a) When no mutations are present, enzymes 1 and 2 are produced in the presence of Fox but not in its absence, indicating that the operon is inducible and Fox is the inducer.

(b) Mutation A allows the production of enzyme 2 in the presence of Fox, but enzyme 1 is not produced in the presence or absence of Fox, and so A must have a mutation in the structural gene for enzyme 1. With B, neither enzyme is produced under any conditions, and so this mutation most likely occurs in the promoter and prevents RNA polymerase from binding. Mutation C affects only enzyme 2, which is not produced in the presence or absence of Fox; enzyme 1 is produced normally (only in the presence of Fox), and so mutation C most likely occurs in the structural gene for enzyme 2. Mutation D is constitutive, allowing the production of enzymes 1 and 2 whether or not Fox is present. This mutation most likely occurs in the regulator gene, producing a defective repressor that is unable to bind to the operator under any conditions.

Regulator gene                  D
Promoter                        B
Structural gene for enzyme 1    A
Structural gene for enzyme 2    C

4. A mutation occurs in the 5′ UTR of the trp operon that reduces the ability of region 2 to pair with region 3. What would be the effect of this mutation when the tryptophan level is high and when the tryptophan level is low?

• Solution
When the tryptophan level is high, regions 2 and 3 do not normally pair, and therefore the mutation will have no effect. When the tryptophan level is low, however, the ribosome normally stalls at the Trp codons in region 1 and does not cover region 2, and so regions 2 and 3 are free to pair, which prevents regions 3 and 4 from pairing and forming the terminator that would end transcription. If regions 2 and 3 cannot pair, then regions 3 and 4 will pair even when tryptophan is low, and attenuation will always occur. Therefore, no more tryptophan will be synthesized even in the absence of tryptophan.

The New Genetics
MINING GENOMES: MICROARRAY ANALYSIS AND THE ANALYSIS OF GENE EXPRESSION
This exercise introduces the powerful technique of microarray analysis, one of the most potent tools in bioinformatics. After a general introduction to microarrays, you will explore the use of microarrays in studies of gene expression. You will use SAGE (Serial Analysis of Gene Expression) to try to identify which genes are important in the development of specific diseases.

COMPREHENSION QUESTIONS

*1. Name six different levels at which gene expression might be controlled.
*2.
Draw a picture illustrating the general structure of an operon and identify its parts.
3. What is the difference between positive and negative control? What is the difference between inducible and repressible operons?
*4. Briefly describe the lac operon and how it controls the metabolism of lactose.
5. What is catabolite repression? How does it allow a bacterial cell to use glucose in preference to other sugars?
*6. What is attenuation? What is the mechanism by which the attenuator forms when tryptophan levels are high and the antiterminator forms when tryptophan levels are low?
*7. What is antisense RNA? How does it control gene expression?
8. What general features of transcriptional control are found in bacteriophage λ?
*9. What changes take place in chromatin structure and what role do these changes play in eukaryotic gene regulation?
10. Briefly explain how transcriptional activator proteins and repressors affect the level of transcription of eukaryotic genes.
11. What is an insulator?
12. What is a response element? How do response elements bring about the coordinated expression of eukaryotic genes?
13. Outline the role of alternative splicing in the control of sex differentiation in Drosophila.
*14. What role does RNA stability play in gene regulation? What controls RNA stability in eukaryotic cells?
15. Define RNA silencing. Explain how siRNAs arise and how they potentially affect gene expression.
*16. Compare and contrast bacterial and eukaryotic gene regulation. How are they similar? How are they different?

APPLICATION QUESTIONS AND PROBLEMS
*17. For each of the following types of transcriptional control, indicate whether the protein produced by the regulator gene will be synthesized initially as an active repressor, inactive repressor, active activator, or inactive activator.
(a) Negative control in a repressible operon
(b) Positive control in a repressible operon
(c) Negative control in an inducible operon
(d) Positive control in an inducible operon
*18. A mutation occurs at the operator site that prevents the regulator protein from binding. What effect will this mutation have in the following types of operons?
(a) Regulator protein is a repressor in a repressible operon.
(b) Regulator protein is a repressor in an inducible operon.
19. The blob operon produces enzymes that convert compound A into compound B. The operon is controlled by a regulatory gene S. Normally the enzymes are synthesized only in the absence of compound B. If gene S is mutated, the enzymes are synthesized in the presence and in the absence of compound B. Does gene S produce a repressor or an activator? Is this operon inducible or repressible?
*20. A mutation prevents the catabolite activator protein (CAP) from binding to the promoter in the lac operon. What will be the effect of this mutation on transcription of the operon?
21. Under which of the following conditions would a lac operon produce the greatest amount of β-galactosidase? The least? Explain your reasoning.

               Lactose present    Glucose present
Condition 1    Yes                No
Condition 2    No                 Yes
Condition 3    Yes                Yes
Condition 4    No                 No

22. A mutant strain of E. coli produces β-galactosidase in the presence and in the absence of lactose. Where in the operon might the mutation in this strain occur?
*23. For E. coli strains with the following lac genotypes, use a plus sign (+) to indicate the synthesis of β-galactosidase and permease and a minus sign (−) to indicate no synthesis of the enzymes.

                       Lactose absent                Lactose present
Genotype of strain     β-Galactosidase   Permease    β-Galactosidase   Permease
lacI lacP lacO lacZ lacY
lacI lacP lacO lacZ lacY
lacI lacP lacOc lacZ lacY
lacI lacP lacO lacZ lacY
lacI lacP lacO lacZ lacY
lacI lacP lacO lacZ lacY/ lacI lacP lacO lacZ lacY
lacI lacP lacOc lacZ lacY/ lacI lacP lacO lacZ lacY
lacI lacP lacO lacZ lacY/ lacI lacP lacO lacZ lacY
lacI lacP lacOc lacZ lacY/ lacI lacP lacO lacZ lacY
lacI lacP lacO lacZ lacY/ lacI lacP lacO lacZ lacY
lacIs lacP lacO lacZ lacY/ lacI lacP lacO lacZ lacY
lacIs lacP lacO lacZ lacY/ lacI lacP lacO lacZ lacY

24. Give all possible genotypes of a lac operon that produces β-galactosidase and permease under the following conditions. Do not give partial diploid genotypes.

       Lactose absent                Lactose present
       β-Galactosidase   Permease    β-Galactosidase   Permease
(a) (b) (c) (d) (e) (f) (g)

*25. Explain why mutations in the lacI gene are trans in their effects, but mutations in the lacO gene are cis in their effects.
*26. The mmm operon, which has sequences A, B, C, and D, encodes enzymes 1 and 2. Mutations in sequences A, B, C, and D have the following effects, where a plus sign (+) = enzyme synthesized and a minus sign (−) = enzyme not synthesized.

                        Mmm absent              Mmm present
Mutation in sequence    Enzyme 1    Enzyme 2    Enzyme 1    Enzyme 2
No mutation
A
B
C
D

(a) Is the mmm operon inducible or repressible?
(b) Indicate which sequence (A, B, C, or D) is part of the following components of the operon: regulator gene, promoter, structural gene for enzyme 1, structural gene for enzyme 2.
*27. Listed in parts a through g are some mutations that were found in the 5′ UTR region of the trp operon of E. coli. What would the most likely effect of each of these mutations be on the transcription of the trp structural genes?
(a) A mutation that prevented the binding of the ribosome to the 5′ end of the mRNA 5′ UTR
(b) A mutation that changed the tryptophan codons in region 1 of the mRNA 5′ UTR into codons for alanine
(c) A mutation that created a stop codon early in region 1 of the mRNA 5′ UTR
(d) Deletions in region 2 of the mRNA 5′ UTR
(e) Deletions in region 3 of the mRNA 5′ UTR
(f) Deletions in region 4 of the mRNA 5′ UTR
(g) Deletion of the string of adenine nucleotides that follows region 4 in the 5′ UTR
28. Some mutations in the trp 5′ UTR region increase termination by the attenuator. Where might these mutations occur and how might they affect the attenuator?
29. Some of the mutations mentioned in Question 28 have an interesting property. They prevent the formation of the antiterminator that normally takes place when the tryptophan level is low. In one of the mutations, the AUG start codon for the 5′ UTR peptide has been deleted. How might this mutation prevent antitermination from occurring?
30. Several examples of antisense RNA regulating translation in bacterial cells have been discovered. Molecular geneticists have also used antisense RNA to artificially control transcription in both bacterial and eukaryotic genes. If you wanted to inhibit the transcription of a bacterial gene with antisense RNA, what sequences might the antisense RNA contain?
*31. What would be the effect of deleting the Sxl gene in a newly fertilized Drosophila embryo?
32. What would be the effect of a mutation that destroyed the ability of poly(A)-binding protein (PABP) to attach to a poly(A) tail?

CHALLENGE QUESTIONS
33. Would you expect to see attenuation in the lac operon and other operons that control the metabolism of sugars? Why or why not?
34. A common feature of many eukaryotic mRNAs is the presence of a rather long 3′ UTR, which often contains consensus sequences. Creatine kinase B (CK-B) is an enzyme important in cellular metabolism.
Certain cells, termed U937D cells, have lots of CK-B mRNA, but no CK-B enzyme is present. In these cells, the 5′ end of the CK-B mRNA is bound to ribosomes, but the mRNA is apparently not translated. Something inhibits the translation of the CK-B mRNA in these cells. In recent experiments, numerous short segments of RNA containing only 3′ UTR sequences were introduced into U937D cells. As a result, the U937D cells began to synthesize the CK-B enzyme, but the total amount of CK-B mRNA did not increase. Short segments of other RNA sequences did not stimulate the synthesis of CK-B; only the 3′ UTR sequences turned on the translation of the enzyme. On the basis of these experiments, propose a mechanism for how CK-B translation is inhibited in the U937D cells. Explain how the introduction of short segments of RNA containing the 3′ UTR sequences might remove the inhibition.

17 Gene Mutations and DNA Repair

• The Genetic Legacy of Chernobyl
• The Nature of Mutation
  The Importance of Mutations
  Categories of Mutations
  Types of Gene Mutations
  Mutation Rates
• Causes of Mutations
  Spontaneous Replication Errors
  Spontaneous Chemical Changes
  Chemically Induced Mutations
  Radiation
• The Study of Mutations
  The Analysis of Reverse Mutations
  Detecting Mutations with the Ames Test
  Radiation Exposure in Humans
• DNA Repair
  Mismatch Repair
  Direct Repair
  Base-Excision Repair
  Nucleotide-Excision Repair
  Other Types of DNA Repair
  Genetic Diseases and Faulty DNA Repair

The Genetic Legacy of Chernobyl
Early on the morning of April 26, 1986, unit 4 of the Chernobyl nuclear power plant in northern Ukraine exploded, creating the worst nuclear disaster in history. The explosion blew off the 2000-ton metal plate that sealed the top of the reactor and ignited hundreds of tons of graphite, which burned uncontrollably for 10 days.
The exact amount of radiation released in the explosion and ensuing fire is still unknown, but a minimum estimate is 100 million curies, equal to a medium-sized nuclear strike. A plume of radioactive particles blew west and north from the crippled reactor, raining dangerous levels of radiation down on thousands of square kilometers. Regions as far away as Germany and Norway were affected; even Japan and the United States received measurable increases in radiation.

Immediately after the accident, 31 people, mostly firefighters who heroically battled the blaze, died of acute radiation sickness. More than 400,000 workers later toiled to bury radioactive and chemical wastes from the accident and to entomb the remains of the disabled reactor in a steel and concrete sarcophagus. Many of these workers are now ill, suffering from a variety of problems including immune suppression, increased rates of cancer, and reproductive disorders.

Radiation is a known mutagen, causing damage to DNA. More than 13,000 children in the area surrounding Chernobyl were exposed to the radioactive isotope iodine-131; many had exposures 400 times the maximum annual radiation exposure recommended for workers in the nuclear industry. The rate of thyroid cancer among children in Ukraine is now 10 times the pre-Chernobyl levels. Chromosome mutations have been detected in the cells of many people who resided near Chernobyl at the time of the accident, and birth defects in the population have increased significantly.

To examine germ-line mutations (those passed on to future generations) resulting from the Chernobyl accident, geneticists collected blood samples from 79 families who resided in heavily contaminated districts. These families included children born in 1994 who had not been exposed to radiation but who might possess mutations acquired from their parents.
DNA sequences from these parents and children were analyzed, allowing the researchers to identify possible germ-line mutations. The germ-line mutation rate in these families was found to be twice as high as that in a control group of families in Britain. Furthermore, the mutation rate was correlated with the level of surface radiation: families in which the parents had resided in more-contaminated districts had higher mutation rates than those from less-contaminated districts.

This chapter is about the infidelity of DNA: how errors arise in genetic instructions and how those errors are sometimes repaired. The Chernobyl catastrophe illustrates one cause of mutations (radiation) and the detrimental effects that DNA damage can have. We begin with a brief examination of the different types of mutations, including their phenotypic effects, how they may be suppressed, and mutation rates. The next section explores how mutations spontaneously arise in the course of replication and afterward, as well as how chemicals and radiation induce mutations. We then consider the analysis of mutations. Finally, we take a look at DNA repair and some of the diseases that arise when DNA repair is defective. Throughout the chapter, it will be useful to keep in mind that mutations, by definition, are inherited changes in the DNA sequence; they must be passed on. Mutation requires both that the structure of a DNA molecule be changed and that this change be replicated.

www.whfreeman.com/pierce More information about the health effects of radiation released in the Chernobyl accident

The Nature of Mutation
DNA is a highly stable molecule that replicates with amazing accuracy (see Chapters 10 and 12), but changes in DNA structure and errors of replication do occur. A mutation is defined as an inherited change in genetic information; the descendants may be cells produced by cell division or individual organisms produced by reproduction.
The Importance of Mutations
Mutations are both the sustainer of life and the cause of great suffering. On the one hand, mutation is the source of all genetic variation, the raw material of evolution. Without mutations and the variation that they generate, organisms could not adapt to changing environments and would risk extinction. On the other hand, most mutations have detrimental effects, and mutation is the source of many human diseases and disorders.

Much of genetics focuses on how variants produced by mutation are inherited; genetic crosses are meaningless if all individuals are identically homozygous for the same alleles. Mutations serve as important tools of genetic analysis; the solution to almost any genetic problem begins with a good set of mutants. Much of Gregor Mendel’s success in unraveling the principles of inheritance can be traced to his use of carefully selected variants of the garden pea; similarly, Thomas Hunt Morgan and his students discovered many basic principles of genetics by analyzing mutant fruit flies ( ◗ FIGURE 17.1).

Mutations are also useful for probing fundamental biological processes. Finding mutations that affect different components of a biological system and studying their effects can often lead to an understanding of the system. This method, referred to as genetic dissection, is analogous to figuring out how an automobile works by breaking different parts of a car and observing the effects: for example, smash the radiator and the engine overheats, revealing that the radiator cools the engine. The disruption of function in individual organisms bearing particular mutations likewise can be a source of insight into biological processes. For example, geneticists have begun to unravel the molecular details of development by studying mutations that interrupt various embryonic stages in Drosophila (see Chapter 21).
Although this method of breaking “parts” to determine their function might seem like a crude approach to understanding a system, it is actually very powerful and has been used extensively in biochemistry, developmental biology, physiology, and behavioral science (but this method is not recommended for learning how your car works).

Concepts
Mutations are heritable changes in the genetic coding instructions of DNA. They are essential to the study of genetics and are useful in many other biological fields.

◗ 17.1 Morgan and his students discovered many principles of heredity by studying mutation in Drosophila melanogaster. Shown here are several common mutations. (Figure panels: wild type, Bar eyes, miniature wings, vestigial wings, white eyes, Curly wings, curved, bithorax, Kidney, Dichaete.)

Categories of Mutations
In multicellular organisms, we can distinguish between two broad categories of mutations: somatic mutations and germ-line mutations. Somatic mutations arise in somatic tissues, which do not produce gametes ( ◗ FIGURE 17.2). These mutations are passed on to other cells through the process of mitosis, which leads to a population of genetically identical cells (a clone). The earlier in development that a somatic mutation occurs, the larger the clone of cells within that individual organism that will contain the mutation.

Because of the huge number of cells present in a typical eukaryotic organism, somatic mutations must be numerous. For example, there are about 10^14 cells in the human body. If a mutation arises only once in every million cell divisions (a fairly typical rate of mutation), hundreds of millions of somatic mutations must arise in each person. The effect of these mutations depends on many factors, including the type of cell in which they occur and the developmental stage at which they arise. Many somatic mutations have no obvious effect on the phenotype of the organism, because the function of the mutant cell (even the cell itself) is replaced by that of normal cells. However, cells with a somatic mutation that stimulates cell division can increase in number and spread; this type of mutation can give rise to cells with a selective advantage and is the basis for all cancers (see Chapter 21).

◗ 17.2 There are two basic classes of mutations: somatic mutations and germ-line mutations. (1) Somatic mutations occur in nonreproductive cells (2) and are passed to other cells through mitosis, creating a clone of cells having the mutant gene. (3) Germ-line mutations occur in cells that give rise to gametes. (4) Meiosis and sexual reproduction allow germ-line mutations to be passed to approximately half the members of the next generation, (5) who will carry the mutation in all their cells.

◗ 17.3 Three basic types of gene mutations are base substitutions, insertions, and deletions. A base substitution alters a single codon. An insertion or a deletion alters the reading frame and may change many codons.
Original DNA sequence    GGG AGT GTA GAT CGT
(a) Base substitution    GGG AGT GCA GAT CGT (one codon changed)
(b) Insertion            GGG AGT GTT AGA TCG T
(c) Deletion             GGG AGT GAG ATC GTC

Germ-line mutations arise in cells that ultimately produce gametes. These mutations can be passed to future generations, producing individual organisms that carry the mutation in all their somatic and germ-line cells (see Figure 17.2). When we speak of mutations in multicellular organisms, we’re usually talking about germ-line mutations. In single-cell organisms, however, there is no distinction between germ-line and somatic mutations, because cell division results in new individuals.
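The estimate of somatic mutations given earlier in this section can be checked with simple arithmetic. The sketch below assumes only that building a body of about 10^14 cells from a single fertilized egg takes at least about 10^14 cell divisions (each division adds one cell); the symbol N for the expected number of somatic mutations is introduced here for illustration.

```latex
N \;\approx\; (\text{cell divisions}) \times (\text{mutations per division})
  \;\approx\; 10^{14} \times 10^{-6} \;=\; 10^{8}
```

This is a lower bound of roughly a hundred million mutations; cell turnover during life adds further divisions, which is in line with the figure of hundreds of millions quoted above.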
Historically, mutations have been partitioned into those that affect a single gene, called gene mutations, and those that affect the number or structure of chromosomes, called chromosome mutations. This distinction arose because chromosome mutations could be observed directly, by looking at chromosomes with a microscope, whereas gene mutations could be detected only by observing their phenotypic effects. Now, with the development of DNA sequencing, gene mutations and chromosome mutations are distinguished somewhat arbitrarily on the basis of the size of the DNA lesion. Nevertheless, it is useful to use the term chromosome mutation for a large-scale genetic alteration that affects chromosome structure or the number of chromosomes and the term gene mutation for a relatively small DNA lesion that affects a single gene. This chapter focuses on gene mutations; chromosome mutations were discussed in Chapter 9.

Types of Gene Mutations
There are a number of ways to classify gene mutations. Some classification schemes are based on the nature of the phenotypic effect: whether the mutation alters the amino acid sequence of the protein and, if so, how. Other schemes are based on the causative agent of the mutation, and still others focus on the molecular nature of the defect. The most appropriate scheme depends on the reason for studying the mutation. Here, we will categorize mutations primarily on the basis of their molecular nature, but we will also encounter some terms that relate the causes and the phenotypic effects of mutations.

Base substitutions
The simplest type of gene mutation is a base substitution, the alteration of a single nucleotide in the DNA ( ◗ FIGURE 17.3a). Because of the complementary nature of the two DNA strands (see Figure 10.14), when the base of one nucleotide is altered, the base of the corresponding nucleotide on the opposite strand also will be altered in the next round of replication. A base substitution therefore usually leads to a base-pair substitution.

Base substitutions are of two types. In a transition, a purine is replaced by a different purine or, alternatively, a pyrimidine is replaced by a different pyrimidine ( ◗ FIGURE 17.4). In a transversion, a purine is replaced by a pyrimidine or a pyrimidine is replaced by a purine. The number of possible transversions (see Figure 17.4) is twice the number of possible transitions, but transitions usually arise more frequently.

◗ 17.4 A transition is the substitution of a purine for a purine or a pyrimidine for a pyrimidine; a transversion is the substitution of a pyrimidine for a purine or a purine for a pyrimidine. (Figure: possible base changes. Transitions: A ↔ G and T ↔ C. Transversions: A ↔ C, A ↔ T, G ↔ C, and G ↔ T.)

Insertions and deletions
The second major class of gene mutations contains insertions and deletions: the addition or the removal, respectively, of one or more nucleotide pairs ( ◗ FIGURE 17.3b and c). Although base substitutions are often assumed to be the most common type of mutation, molecular analysis has revealed that insertions and deletions are more frequent. Insertions and deletions within sequences that encode proteins may lead to frameshift mutations, changes in the reading frame (see Chapter 15) of the gene. The initiation codon in mRNA sets the reading frame: after the initiation codon, other codons are read as successive nonoverlapping groups of three nucleotides. The addition or deletion of a nucleotide usually changes the reading frame, altering all amino acids encoded by codons following the mutation (see Figure 17.3b and c). Many amino acids can be affected; so frameshift mutations generally have drastic effects on the phenotype.
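The two ideas just described, the transition/transversion distinction and the way a one-base insertion shifts every later codon, can be sketched in a few lines of Python. This is only an illustration: the function names are invented here, and the sequence from Figure 17.3 is read as successive triplets from its first base, which is an assumption made for simplicity.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(original: str, replacement: str) -> str:
    """Return 'transition' or 'transversion' for a single-base change."""
    original, replacement = original.upper(), replacement.upper()
    bases = PURINES | PYRIMIDINES
    if original not in bases or replacement not in bases:
        raise ValueError("bases must be one of A, C, G, T")
    if original == replacement:
        raise ValueError("no substitution: the two bases are identical")
    # Same chemical class (purine/purine or pyrimidine/pyrimidine) = transition.
    same_class = (original in PURINES) == (replacement in PURINES)
    return "transition" if same_class else "transversion"


def codons(sequence: str) -> list:
    """Split a sequence into successive, nonoverlapping groups of three."""
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]


# Sequence from Figure 17.3, with a one-base insertion (T) after position 8.
original = "GGGAGTGTAGATCGT"
inserted = original[:8] + "T" + original[8:]

print(classify_substitution("T", "C"))  # transition
print(classify_substitution("A", "T"))  # transversion
print(codons(original))  # ['GGG', 'AGT', 'GTA', 'GAT', 'CGT']
print(codons(inserted))  # ['GGG', 'AGT', 'GTT', 'AGA', 'TCG', 'T']
```

Enumerating all twelve ordered base changes with `classify_substitution` gives four transitions and eight transversions, matching the two-to-one ratio stated in the text. In the inserted sequence, every codon from the point of insertion onward differs from the original.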
Not all insertions and deletions lead to frameshifts, however; because codons consist of three nucleotides, insertions and deletions consisting of any multiple of three nucleotides will leave the reading frame intact, although the addition or removal of one or more amino acids may still affect the phenotype. These mutations are called in-frame insertions and deletions, respectively.

Concepts
Gene mutations consist of changes in a single gene and may be base substitutions (a single pair of nucleotides is altered) or insertions or deletions (nucleotides are added or removed). A base substitution may be a transition (substitution of like bases) or a transversion (substitution of unlike bases). Insertions and deletions often lead to a change in the reading frame of a gene.

◗ 17.5 The fragile-X chromosome is associated with a characteristic constriction (fragile site) on the long arm. (Visuals Unlimited.)

Expanding trinucleotide repeats
In 1991, an entirely novel type of mutation was discovered. This mutation occurs in a gene called FMR-1 and causes fragile-X syndrome, the most common hereditary cause of mental retardation. The disorder is so named because, in specially treated cells of persons having the condition, the tip of the X chromosome is attached only by a slender thread ( ◗ FIGURE 17.5). The FMR-1 gene contains a number of adjacent copies of the trinucleotide CGG.
The normal FMR-1 allele (not containing the mutation) has 60 or fewer copies of this trinucleotide but, in persons with fragile-X syndrome, the allele may harbor hundreds or even thousands of copies. Mutations in which copies of a trinucleotide may increase greatly in number are called expanding trinucleotide repeats. Expanding trinucleotide repeats have been found in several other human diseases (Table 17.1).

Table 17.1 Examples of genetic diseases caused by expanding trinucleotide repeats
                                                                            Number of copies of repeat
Disease                                                Repeated sequence    Normal range    Disease range
Spinal and bulbar muscular atrophy                     CAG                  11–33           40–62
Fragile-X syndrome                                     CGG                  6–54            50–1500
Jacobsen syndrome                                      CGG                  11              100–1000
Spinocerebellar ataxia (several types)                 CAG                  4–44            21–130
Autosomal dominant cerebellar ataxia                   CAG                  7–19            37–220
Myotonic dystrophy                                     CTG                  5–37            44–3000
Huntington disease                                     CAG                  9–37            37–121
Friedreich ataxia                                      GAA                  6–29            200–900
Dentatorubral-pallidoluysian atrophy                   CAG                  7–25            49–75
Myoclonus epilepsy of the Unverricht-Lundborg type*    CCCGCCCGCG           2–3             12–13
*Technically not a trinucleotide repeat but does entail a multiple of three nucleotides that expands and contracts in similar fashion to trinucleotide repeats.

The number of copies of the trinucleotide repeat often correlates with the severity or age of onset of the disease. The number of copies of the repeat also correlates with the instability of trinucleotide repeats: when more repeats are present, the probability of expansion to even more repeats increases. This instability leads to a phenomenon known as anticipation (see Chapter 5), in which diseases caused by trinucleotide-repeat expansions become more severe in each generation. Less commonly, the number of trinucleotide repeats may decrease within a family. How an increase in the number of trinucleotides produces disease symptoms is not yet clear.
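Because these alleles are described by the number of adjacent copies of a trinucleotide, it can help to see how such a count might be made on a sequence string. The following is a minimal sketch: the function name and the toy sequence are hypothetical, and real repeat sizing is done with laboratory assays rather than simple string matching.

```python
import re


def longest_repeat_run(sequence: str, unit: str) -> int:
    """Return the largest number of back-to-back copies of `unit` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    # Each match is an unbroken run of the unit; convert its length to a copy count.
    runs = [len(m.group(0)) // len(unit) for m in pattern.finditer(sequence)]
    return max(runs, default=0)


# Hypothetical toy sequence: eight adjacent CGG copies with flanking bases.
toy_allele = "AT" + "CGG" * 8 + "TTA"
print(longest_repeat_run(toy_allele, "CGG"))  # 8
```

A count obtained this way could then be compared with the normal and disease ranges listed in Table 17.1.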
In several of the diseases (e.g., Huntington disease), the trinucleotide CAG expands within the coding part of a gene, producing a toxic protein that has extra glutamine residues (the amino acid encoded by CAG). In other diseases (e.g., fragile-X syndrome and myotonic dystrophy), the repeat is outside the coding region of the gene and therefore must have some other mode of action. At least one disease (a rare type of epilepsy) has now been associated with an expanding repeat of a 12-bp sequence. Although this repeat is not a trinucleotide, it is included as a type of expanding trinucleotide because its repeat is a multiple of three.

The mechanism that leads to the expansion of trinucleotide repeats is still unclear. Strand slippage in DNA replication (see Figure 17.14) and crossing over between misaligned repeats (see Figure 17.15) are two possible sources of expansion. Single-stranded regions of some trinucleotide repeats are known to fold into hairpins ( ◗ FIGURE 17.6) and other special DNA structures. Such structures may promote strand slippage in replication and may prevent these errors from being recognized and corrected, as described later in this chapter in the section on mismatch repair.

◗ 17.6 The number of copies of a trinucleotide may increase by strand slippage in replication. (1) This DNA molecule has eight copies of a CAG repeat. (2) The two strands separate (3) and replicate. (4) During replication, a hairpin forms on the newly synthesized strand, (5) causing part of the template strand to be replicated twice and increasing the number of repeats on the newly synthesized strand. (6) The two strands of the new DNA molecule separate, (7) and the strand with extra CAG copies serves as a template for replication. (8) The resulting DNA molecule contains five additional copies of the CAG repeat.

Concepts
Expanding trinucleotide repeats are regions of DNA that consist of repeated copies of three nucleotides. Increased numbers of trinucleotide repeats are associated with several genetic diseases.

◗ 17.7 Base substitutions can cause (a) missense, (b) nonsense, and (c) silent mutations.
No mutation              DNA TCA/AGT    mRNA UCA    Ser     Wild-type protein produced.
(a) Missense mutation    DNA TTA/AAT    mRNA UUA    Leu     The new codon encodes a different amino acid; there is a change in amino acid sequence.
(b) Nonsense mutation    DNA TAA/ATT    mRNA UAA    Stop    The new codon is a stop codon; there is premature termination of translation.
(c) Silent mutation      DNA TCG/AGC    mRNA UCG    Ser     The new codon encodes the same amino acid; there is no change in amino acid sequence.

Phenotypic effects of mutations
Mutations have a variety of phenotypic effects. The effect of a mutation must be considered with reference to a phenotype against which the mutant can be compared, which is usually the wild-type phenotype, that is, the most common phenotype in natural populations of the organism. For example, most Drosophila melanogaster in nature have red eyes; so red eyes are considered the wild-type eye color; any other genetically determined eye color in fruit flies is considered to be a mutant.
A mutation that alters the wild-type phenotype is called a forward mutation, whereas a reverse mutation (a reversion) changes a mutant phenotype back into the wild type.

Geneticists use special terms to describe the phenotypic effects of mutations. A base substitution that alters a codon in the mRNA, resulting in a different amino acid in the protein, is referred to as a missense mutation ( ◗ FIGURE 17.7a). A nonsense mutation changes a sense codon (one that specifies an amino acid) into a nonsense codon (one that terminates translation; ◗ FIGURE 17.7b). If a nonsense mutation occurs early in the mRNA sequence, the protein will be greatly shortened and will usually be nonfunctional. A silent mutation alters a codon but, thanks to the redundancy of the genetic code, the codon still specifies the same amino acid ( ◗ FIGURE 17.7c). A neutral mutation is a missense mutation that alters the amino acid sequence of the protein but does not change its function. Neutral mutations occur when one amino acid is replaced by another that is chemically similar or when the affected amino acid has little influence on protein function.

Loss-of-function mutations cause the complete or partial absence of normal function. Such a mutation may alter the structure of the protein so much that the protein no longer works correctly, or it may occur in a regulatory region that affects the transcription, translation, or splicing of the gene product. Loss-of-function mutations are frequently recessive, and diploid individuals must be homozygous for the mutation before they can exhibit the effects of the loss of the functional protein. In contrast, a gain-of-function mutation produces an entirely new trait or causes a trait to appear in inappropriate tissues or at inappropriate times in development. These mutations are frequently dominant in their expression.
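The missense, nonsense, and silent categories defined above depend only on what the old and new codons encode, so they can be sorted mechanically. The sketch below assumes the standard genetic code and codons written as the mRNA reads them (T and U are treated alike); the function name `classify_substitution` is a hypothetical choice for illustration. It uses the codons of Figure 17.7 as its examples.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}  # "*" marks a stop (nonsense) codon


def classify_substitution(wild_codon, mutant_codon):
    """Classify a change from one sense codon to another codon."""
    wild = wild_codon.upper().replace("U", "T")
    mutant = mutant_codon.upper().replace("U", "T")
    if wild == mutant:
        return "no mutation"
    before, after = CODON_TABLE[wild], CODON_TABLE[mutant]
    if before == "*":
        return "not a sense codon"   # outside the three categories described here
    if after == "*":
        return "nonsense"            # sense codon becomes a stop codon
    if after == before:
        return "silent"              # synonymous codon, same amino acid
    return "missense"                # different amino acid


for mutant in ("UUA", "UAA", "UCG"):
    print("UCA ->", mutant, ":", classify_substitution("UCA", mutant))
```

Whether a missense mutation is also neutral cannot be read from the codon table; that depends on how the replacement amino acid affects the protein's function.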
Still other types of mutations are conditional mutations, which are expressed only under certain conditions, and lethal mutations, which cause premature death.

Suppressor mutations
A suppressor mutation is a genetic change that hides or suppresses the effect of another mutation. This type of mutation is distinct from a reverse mutation, in which the mutated site changes back into the original wild-type sequence ( ◗ FIGURE 17.8). A suppressor mutation occurs at a site that is distinct from the site of the original mutation; thus, an individual organism with a suppressor mutation is a double mutant, possessing both the original mutation and the suppressor mutation but exhibiting the phenotype of an unmutated wild type.

◗ 17.8 Relation of forward, reverse, and suppressor mutations. [Figure steps: 1. A forward mutation changes the wild type (genotype A+ B+, red eyes) into a mutant phenotype (A- B+, white eyes). 2. A reverse mutation restores the wild-type gene and the phenotype. 3. A suppressor mutation occurs at a site different from that of the original mutation 4. and produces an individual that possesses both the original mutation and the suppressor mutation (A- B-) 5. but has the wild-type phenotype (red eyes).]

Geneticists distinguish between two classes of suppressor mutations: intragenic and intergenic. An intragenic suppressor is in the same gene as that containing the mutation being suppressed and may work in several ways. The suppressor may change a second nucleotide in the same codon that was altered by the original mutation, producing a codon that specifies the same amino acid as the original, unmutated codon ( ◗ FIGURE 17.9).

Intragenic suppressors may also work by suppressing a frameshift mutation. If the original mutation is a one-base deletion, then the addition of a single base elsewhere in the gene will restore the former reading frame (see Figure 17.9). Consider the following nucleotide sequence in DNA and the amino acids that it encodes:

DNA:          AAA TCA CTT GGC GTA CAA
Amino acids:  Phe Ser Glu Pro His Val

Suppose a one-base deletion occurs in the first nucleotide of the second codon. This deletion shifts the reading frame by one nucleotide and alters all the amino acids that follow the mutation.

One-nucleotide deletion (the T at the start of the second codon is lost):
DNA:          AAA CAC TTG GCG TAC AA
Amino acids:  Phe Val Asn Arg Met

If a single nucleotide is added to the third codon (the suppressor mutation), the reading frame is restored, although two of the amino acids differ from those specified by the original sequence.

One-nucleotide duplication (a T is added to the third codon):
DNA:          AAA CAC TTT GGC GTA CAA
Amino acids:  Phe Val Lys Pro His Val

Similarly, a mutation due to an insertion may be suppressed by a subsequent deletion in the same gene.

A third way in which an intragenic suppressor may work is by making compensatory changes in the protein. A first missense mutation may alter the folding of a polypeptide chain by changing the way in which amino acids in the protein interact with one another. A second missense mutation at a different site (the suppressor) may recreate the original folding pattern by restoring interactions between the amino acids.

◗ 17.9 An intragenic suppressor mutation occurs in the same gene that contains the mutation being suppressed. [Figure steps: 1. A missense mutation alters a single codon (DNA AAT, mRNA UUA, Leu, becomes DNA AAA, mRNA UUU, Phe). 2. A second mutation at a different site in the same gene (the intragenic suppressor mutation; DNA GAA, mRNA CUU) 3. may restore the original amino acid (Leu).]

Intergenic suppressors, in contrast, occur in a gene that is different from the one bearing the original mutation. These suppressors sometimes work by changing the way that the mRNA is translated. In the example illustrated in ◗ FIGURE 17.10, the original DNA sequence is AAC (UUG in the mRNA) and specifies leucine. This sequence mutates to ATC (UAG in mRNA), a termination codon. The ATC nonsense mutation could be suppressed by a mutation in a gene that encodes a tRNA molecule, changing the anticodon on the tRNA so that it is capable of pairing with the UAG termination codon. For example, the gene that encodes the tRNA for tyrosine (tRNATyr), which has the anticodon AUA, might be mutated to have the anticodon AUC, which will then pair with the UAG stop codon. Instead of translation terminating at the UAG codon, tyrosine would be inserted into the protein and a full-length protein would be produced, although tyrosine would now substitute for leucine. The effect of this change would depend on the role of this amino acid in the overall structure of the protein, but the effect is likely to be less detrimental than the effect of the nonsense mutation, which would halt translation prematurely.

Because cells in many organisms have multiple copies of tRNA genes, other unmutated copies of tRNATyr would remain available to recognize the tyrosine codons. However, we might expect that the tRNAs that have undergone a suppressor mutation would also suppress the normal termination codons at the ends of coding sequences, resulting in the production of longer-than-normal proteins, but this event does not usually take place. Mutations in tRNA genes can also suppress missense and frameshift mutations.

◗ 17.10 An intergenic suppressor mutation occurs in a different gene from the one bearing the original mutation. (a) The wild-type sequence produces a full-length, functional protein. (b) A base substitution at a site in one gene produces a premature stop codon, resulting in a shortened, nonfunctional protein. (c) A base substitution at a site in another gene, which in this case encodes tRNA, alters the anticodon of tRNATyr so that tRNATyr can pair with the stop codon produced by the original mutation, allowing tyrosine to be incorporated into the protein and translation to continue. Tyrosine replaces the leucine residue present in the original protein.

Intergenic suppressors can also work through genic interactions (see p. 000 in Chapter 5). Polypeptide chains that are produced by two genes may interact to produce a functional protein. A mutation in one gene may alter the encoded polypeptide so that the interaction is destroyed and a functional protein is no longer produced.
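The frameshift example given earlier in this section (the sequence AAA TCA CTT GGC GTA CAA) can be checked by translating each version of the sequence. This is a minimal sketch, assuming the standard genetic code and assuming that the DNA shown is the template strand, so that its base-by-base complement read left to right is the mRNA; that convention reproduces the amino acids listed for the original sequence. The helper name `translate_template` is hypothetical.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Stop",
}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def translate_template(template_dna):
    """Translate complete codons of a template strand; stop at a stop codon."""
    residues = []
    usable = len(template_dna) - len(template_dna) % 3   # ignore a partial last codon
    for start in range(0, usable, 3):
        template_codon = template_dna[start:start + 3]
        # Complement gives the mRNA codon (written with T in place of U).
        mrna_codon = "".join(COMPLEMENT[base] for base in template_codon)
        residue = THREE_LETTER[CODON_TABLE[mrna_codon]]
        residues.append(residue)
        if residue == "Stop":
            break
    return residues


original = "AAATCACTTGGCGTACAA"
deletion = original[:3] + original[4:]            # lose the first base of codon 2
suppressed = deletion[:8] + "T" + deletion[8:]    # add a T within codon 3

for label, sequence in (("original", original), ("deletion", deletion), ("suppressed", suppressed)):
    print(f"{label:10s}", " ".join(translate_template(sequence)))
```

The deletion changes every amino acid after the first, and the later one-base addition restores the frame from the fourth codon onward, leaving two altered residues in between.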
A suppressor mutation in the second gene may produce a compensatory change in its polypeptide, thereby restoring the original interaction. Characteristics of some of the different types of mutations are summarized in Table 17.2.

Table 17.2 Characteristics of different types of mutations
Type of mutation: Definition
Base substitution: Changes the base of a single DNA nucleotide
Transition: Base substitution in which a purine replaces a purine or a pyrimidine replaces a pyrimidine
Transversion: Base substitution in which a purine replaces a pyrimidine or a pyrimidine replaces a purine
Insertion: Addition of one or more nucleotides
Deletion: Deletion of one or more nucleotides
Frameshift mutation: Insertion or deletion that alters the reading frame of a gene
In-frame deletion or insertion: Insertion or deletion of a multiple of three nucleotides that does not alter the reading frame
Expanding trinucleotide repeats: Repeated sequence of three nucleotides (trinucleotide) in which the number of copies of the trinucleotide increases
Forward mutation: Changes the wild-type phenotype to a mutant phenotype
Reverse mutation: Changes a mutant phenotype back to the wild-type phenotype
Missense mutation: Changes a sense codon into a different sense codon, resulting in the incorporation of a different amino acid in the protein
Nonsense mutation: Changes a sense codon into a nonsense codon, causing premature termination of translation
Silent mutation: Changes a sense codon into a synonymous codon, leaving unchanged the amino acid sequence of the protein
Neutral mutation: Changes the amino acid sequence of a protein without altering its ability to function
Loss-of-function mutation: Causes a complete or partial loss of function
Gain-of-function mutation: Causes the appearance of a new trait or function or causes the appearance of a trait in inappropriate tissues or at inappropriate times
Lethal mutation: Causes premature death
Suppressor mutation: Suppresses the effect of an earlier mutation at a different site
Intragenic suppressor mutation: Suppresses the effect of an earlier mutation within the same gene
Intergenic suppressor mutation: Suppresses the effect of an earlier mutation in another gene

Concepts
A suppressor mutation overrides the effect of an earlier mutation at a different site. An intragenic suppressor mutation occurs within the same gene as that containing the original mutation, whereas an intergenic suppressor mutation occurs in a different gene.

www.whfreeman.com/pierce Descriptions and illustrations of different types of mutations

The New Genetics
ETHICS • SCIENCE • TECHNOLOGY
Achondroplasia
by Ron Green

Achondroplasia is an inherited autosomal dominant condition that causes diminished growth in the long bones of the legs, leading to dwarfism. Several years ago, the gene for achondroplasia was identified and cloned. If two people with achondroplasia marry, and each of them is heterozygous for achondroplasia (has one of two possible copies of the gene), chances are that two of every four children that they have will also be heterozygous and dwarfs. On average, one child in every four born to the couple will not inherit the achondroplasia gene and will be of average height, and one child in four will be homozygous for the gene. Homozygosity for this gene is lethal, and these children usually die in infancy.

A researcher who helped identify the gene understandably felt that he had made a significant contribution by allowing short-statured parents the option of aborting fetuses with the lethal double dose of the gene. To his surprise, shortly after news of the discovery was published, he received a call from one member of an achondroplasia couple, asking whether it was possible to test for both the presence and the absence of the gene. The couple wanted this information, they said, because they planned to abort not just all fetuses homozygous for the achondroplasia gene, but any completely unaffected ones as well. They were intent on having only short-statured children like themselves.

This case poses at least two major conflicts for genetic professionals. First, there is the conflict between respect for parental autonomy, which would ordinarily encourage acceding to the parents’ request for assistance and information, and the medical professional’s desire not to visit harm on a child. Children born with achondroplasia frequently must undergo a series of surgical procedures to correct serious bone problems. Throughout life, they also face many social and physical obstacles because of their short stature. Is it right for parents to deliberately bring a child into existence with this condition? Is it appropriate for health professionals to assist such efforts? How do we balance respect for parental autonomy against nonmaleficence?

Matters become more complex when we realize that some people with achondroplasia reject the idea that any harm is being done by the parents in this case. They maintain that most of the problems that they face are socially constructed and are due to society’s marginalization and neglect of those who are different. Some also reject medical or genetic “solutions” to their problems. The proper response, they believe, is not to prevent the birth of a child with a genetic condition but to eliminate the social handicaps and discriminatory attitudes. Thus, the parents in this case may be driven not merely by their personal wishes but by a commitment to social justice.

Traditional, nondirective genetic counseling has assumed that people seek prenatal testing to prevent the birth of a child with a genetic disease, to prepare for the birth and treatment of a child with a recognized genetic disorder, or to reconsider their reproductive plans. What this case reveals is that genetics is opening up the possibility of shaping our children’s lives in ways that go far beyond what is normally associated with the healing role.

Somewhat less dramatic, but perhaps more worrisome, is the fact that the identification of the genetic basis of many traits that are not considered diseases (e.g., height, intelligence, temperament) will offer parents a new range of choices in the “genetic design” of their children. At this moment, research is underway to identify and replace disease-causing genes in human embryos. In the future, such embryonic gene therapy will open up the possibility of enhancing children’s capabilities. Beginning with genes that improve a child’s resistance to cancer or AIDS, genetic interventions may make it possible to increase a child’s height, stamina, or IQ. Science could offer parents who yearn for a champion basketball player or world-class swimmer the means to realize their dreams.

As complex as it may seem, the science here is the easy part. Far more difficult are the ethical questions. To begin with, there is the question of whether we will ever have enough knowledge to “play God” in this way. Do we dare alter the course of human evolution? The history of twentieth-century science is littered with well-intended technologies, from DDT to nuclear power, that eventually brought unforeseen harms. Will our genetic interventions follow this path? Will our clumsy attempts to “improve” the human genome unleash an epidemic of new genetic diseases? And what of the child’s rights in all this? Is it fair to “engineer” a child into a parent’s dream of perfection?

[Photo: A family of three who have achondroplasia. (Gail Burton/AP.)]

Mutation Rates
The frequency with which a gene changes from the wild type to a mutant is referred to as the mutation rate and is generally expressed as the number of mutations per biological unit, which may be mutations per cell division, per gamete, or per round of replication. For example, the mutation rate for achondroplasia (a type of hereditary dwarfism) is about four mutations per 100,000 gametes, usually expressed more simply as 4 × 10⁻⁵. In contrast, mutation frequency is defined as the incidence of a specific type of mutation within a group of individual organisms. For achondroplasia, the mutation frequency in the United States is about 2 × 10⁻⁴, which means that about 1 of every 20,000 persons in the U.S. population carries this mutation.

Mutation rates are affected by three factors. First, they depend on the frequency with which primary changes take place in DNA. Primary change may arise from spontaneous molecular changes in DNA or it may be induced by chemical or physical agents in the environment.

A second factor influencing the mutation rate is the probability that, when a change takes place, it will be repaired. Most cells possess a number of mechanisms to repair altered DNA; so most alterations are corrected before they are replicated. If these repair systems are effective, mutation rates will be low; if they are faulty, mutation rates will be elevated. Some mutations increase the overall rate of mutation at other genes; these mutations usually occur in genes that encode components of the replication machinery or DNA repair enzymes.

A third factor, one that influences our ability to calculate mutation rates, is the probability that a mutation will be recognized and recorded. When DNA is sequenced, all mutations are potentially detectable. In practice, however, sequencing is expensive; so mutations are usually detected by their phenotypic effects. Some mutations may appear to arise at a higher rate simply because they are easier to detect.

Mutation rates vary among organisms and among genes within organisms (Table 17.3), but we can draw several general conclusions about mutation rates.
First, spontaneous mutation rates are low for all organisms studied. Typical mutation rates for viral and bacterial genes range from about 1 to 100 mutations per 10 billion cells (1 × 10⁻⁸ to 1 × 10⁻¹⁰). The mutation rates for most eukaryotic genes are a bit higher, from about 1 to 10 mutations per million gametes (1 × 10⁻⁵ to 1 × 10⁻⁶). These higher values in eukaryotes may be due to the fact that the rates are calculated per gamete, and several cell divisions are required to produce a gamete, whereas mutation rates in prokaryotic cells and viruses are calculated per cell division.

Within each major class of organisms, mutation rates vary considerably. These differences may be due to differing abilities to repair mutations, unequal exposures to mutagens, or biological differences in rates of spontaneously arising mutations. Even within a single species, spontaneous rates of mutation vary among genes. The reason for this variation is not entirely understood, but some regions of DNA are known to be more susceptible to mutation than others.

Table 17.3 Mutation rates of different genes in different organisms
Organism | Mutation | Rate | Unit
Bacteriophage T2 | Lysis inhibition | 1 × 10⁻⁸ | Per replication
Bacteriophage T2 | Host range | 3 × 10⁻⁹ | Per replication
Escherichia coli | Lactose fermentation | 2 × 10⁻⁷ | Per cell division
Escherichia coli | Histidine requirement | 2 × 10⁻⁸ | Per cell division
Neurospora crassa | Inositol requirement | 8 × 10⁻⁸ | Per asexual spore
Neurospora crassa | Adenine requirement | 4 × 10⁻⁸ | Per asexual spore
Corn | Kernel color | 2.2 × 10⁻⁶ | Per gamete
Drosophila | Eye color | 4 × 10⁻⁵ | Per gamete
Drosophila | Allozymes | 5.14 × 10⁻⁶ | Per gamete
Mouse | Albino coat color | 4.5 × 10⁻⁵ | Per gamete
Mouse | Dilution coat color | 3 × 10⁻⁵ | Per gamete
Human | Huntington disease | 1 × 10⁻⁶ | Per gamete
Human | Achondroplasia | 1 × 10⁻⁵ | Per gamete
Human | Neurofibromatosis (Michigan) | 1 × 10⁻⁴ | Per gamete
Human | Hemophilia A (Finland) | 3.2 × 10⁻⁵ | Per gamete
Human | Duchenne muscular dystrophy (Wisconsin) | 9.2 × 10⁻⁵ | Per gamete

Concepts
Mutation rate is the frequency with which a specific mutation arises, whereas mutation frequency is the incidence of a mutation within a defined group of individual organisms. Rates of mutations are generally low and are affected by environmental and genetic factors.

Causes of Mutations
Mutations result from both internal and external factors. Those that are a result of natural changes in DNA structure are termed spontaneous mutations, whereas those that result from changes caused by environmental chemicals or radiation are induced mutations.

Spontaneous Replication Errors
Replication is amazingly accurate: errors are made at a rate of less than one per billion nucleotides in the course of DNA synthesis (Chapter 12). However, spontaneous replication errors do occasionally occur.

The primary cause of spontaneous replication errors was formerly thought to be tautomeric shifts, in which the positions of protons in the DNA bases change. Purine and pyrimidine bases exist in different chemical forms called tautomers ( ◗ FIGURE 17.11a). The two tautomeric forms of each base are in dynamic equilibrium, although one form is more common than the other. The standard Watson and Crick base pairings (adenine with thymine, and cytosine with guanine) are between the common forms of the bases, but, if the bases are in their rare tautomeric forms, other base pairings are possible ( ◗ FIGURE 17.11b). Watson and Crick proposed that tautomeric shifts might produce mutations, and for many years their proposal was the accepted model for spontaneous replication errors, but there has never been convincing evidence that the rare tautomers are the cause of spontaneous mutations. Furthermore, research now shows little evidence of these structures in DNA.

◗ 17.11 Purine and pyrimidine bases exist in different forms called tautomers. (a) A tautomeric shift occurs when a proton changes its position, resulting in a rare tautomeric form. (b) Standard and anomalous base-pairing arrangements occur if bases are in the rare tautomeric forms. Base mispairings due to tautomeric shifts were originally thought to be a major source of errors in replication, but such structures have not been detected in DNA, and most evidence now suggests that other types of anomalous pairings (see Figure 17.14) are responsible for replication errors. [Figure panels: standard pairings are thymine (common form) with adenine (common form) and cytosine (common form) with guanine (common form); anomalous pairings are cytosine (rare form) with adenine (common form) and thymine (common form) with guanine (rare form).]

Mispairing can also occur through wobble, in which normal, protonated, and other forms of the bases are able to pair because of flexibility in the DNA helical structure ( ◗ FIGURE 17.12). These structures have been detected in DNA molecules and are now thought to be responsible for many of the mispairings in replication.

◗ 17.12 Nonstandard base pairings can occur as a result of the flexibility in DNA structure. Thymine and guanine can pair through wobble between normal bases. Cytosine and adenine can pair through wobble when adenine is protonated (has an extra hydrogen).
When a mismatched base has been incorporated into a newly synthesized nucleotide chain, an incorporated error is said to have occurred. Suppose that, in replication, thymine (which normally pairs with adenine) mispairs with guanine through wobble ( ◗ FIGURE 17.13). In the next round of replication, the two mismatched bases separate, and each serves as template for the synthesis of a new nucleotide strand. This time, thymine pairs with adenine, producing another copy of the original DNA sequence. On the other strand, however, the incorrectly incorporated guanine serves as the template and pairs with cytosine, producing a new DNA molecule that has an error: a C·G pair in place of the original T·A pair (a T·A → C·G base substitution). The original incorporated error leads to a replication error, which creates a permanent mutation, because all the base pairings are correct and there is no mechanism for repair systems to detect the error.

◗ 17.13 Wobble base pairing leads to a replicated error. [Figure steps: 1. DNA strands separate for replication. 2. Thymine on the original template strand base pairs with guanine through wobble, leading to an incorporated error. 3. At the next round of replication, the guanine nucleotide pairs with cytosine, leading to a transition mutation. The wild-type duplex TTCG/AAGC gives rise to the mutant duplex TCCG/AGGC.]

Mutations due to small insertions and deletions also may arise spontaneously in replication and crossing over. Strand slippage may occur when one nucleotide strand forms a small loop ( ◗ FIGURE 17.14). If the looped-out nucleotides are on the newly synthesized strand, an insertion results. At the next round of replication, the insertion will be incorporated into both strands of the DNA molecule. If the looped-out nucleotides are on the template strand, then there is a deletion on the newly replicated strand, and this deletion will be perpetuated in subsequent rounds of replication.

During normal crossing over, the homologous sequences of the two DNA molecules align, and crossing over produces no net change in the number of nucleotides in either molecule. Misaligned pairing may cause unequal crossing over, which results in one DNA molecule with an insertion and the other with a deletion ( ◗ FIGURE 17.15). Some DNA sequences are more likely than others to undergo strand slippage or unequal crossing over. Stretches of repeated sequences, such as trinucleotide repeats or homopolymeric repeats (more than five repeats of the same base in a row), are prone to strand slippage. Stretches with more repeats are more likely to undergo strand slippage. Duplicated or repetitive sequences may misalign during pairing, leading to unequal crossing over. Both strand slippage and unequal crossing over produce duplicated copies of sequences, which in turn promote further strand slippage and unequal crossing over. This chain of events may explain the phenomenon of anticipation often observed for expanding trinucleotide repeats.

Concepts
Spontaneous replication errors arise from altered base structures and from wobble base pairing. Small insertions and deletions may occur through strand slippage in replication and through unequal crossing over.
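The way an incorporated error becomes a replicated error can be traced with a toy replication function. This is only a sketch under simplifying assumptions: strands are written position by position against their partners (as in Figure 17.13, TTCG over AAGC) with strand polarity ignored, and the function name `replicate` and its `wobble_at` parameter are hypothetical. The wobble partners follow the pairings named in the text: thymine with guanine, and cytosine with protonated adenine.

```python
WATSON_CRICK = {"A": "T", "T": "A", "G": "C", "C": "G"}
WOBBLE = {"T": "G", "G": "T", "C": "A", "A": "C"}   # T-G and C-A (protonated A) wobble


def replicate(template, wobble_at=None):
    """Return the strand synthesized against `template`.

    If `wobble_at` is given, the base at that position pairs through wobble
    instead of standard pairing, producing an incorporated error.
    """
    new_strand = [WATSON_CRICK[base] for base in template]
    if wobble_at is not None:
        new_strand[wobble_at] = WOBBLE[template[wobble_at]]
    return "".join(new_strand)


template = "TTCG"
correct_copy = replicate(template)                    # error-free replication
incorporated_error = replicate(template, wobble_at=1)  # second T pairs with G
replicated_error = replicate(incorporated_error)      # next round: G pairs with C

print("wild type:", template, "/", correct_copy)
print("round 1  :", template, "/", incorporated_error, "(T-G mismatch)")
print("round 2  :", replicated_error, "/", incorporated_error, "(mutant)")
```

After the second round every pair in the mutant duplex is a standard pair, which is why the change can no longer be recognized as an error.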
◗ 17.14 Insertions and deletions may result from strand slippage.

◗ 17.15 Unequal crossing over produces insertions and deletions. [Figure steps: 1. If homologous chromosomes misalign during crossing over, 2. one crossover product contains an insertion 3. and the other has a deletion.]

Spontaneous Chemical Changes
In addition to spontaneous mutations that arise in replication, mutations also result from spontaneous chemical changes in DNA. One such change is depurination, the loss of a purine base from a nucleotide. Depurination results when the covalent bond connecting the purine to the 1′-carbon atom of the deoxyribose sugar breaks ( ◗ FIGURE 17.16a), producing an apurinic site, a nucleotide that lacks its purine base. An apurinic site cannot act as a template for a complementary base in replication. In the absence of base-pairing constraints, an incorrect nucleotide (most often adenine) is incorporated into the newly synthesized DNA strand opposite the apurinic site ( ◗ FIGURE 17.16b), frequently leading to an incorporated error. The incorporated error is then transformed into a replication error at the next round of replication. Depurination is a common cause of spontaneous mutation; a mammalian cell in culture loses approximately 10,000 purines every day.

Another spontaneously occurring chemical change that takes place in DNA is deamination, the loss of an amino group (NH2) from a base. Deamination may occur spontaneously or be induced by mutagenic chemicals.
◗ 17.16 Depurination, loss of a purine base from the nucleotide, produces an apurinic site. [Figure steps: 3. At the next round of replication, the incorrectly incorporated base will be used as a template, 4. leading to a permanent mutation. 5. A nucleotide is incorporated into the newly synthesized strand opposite the apurinic site.]

◗ 17.17 Deamination alters DNA bases. [Figure panels: (a) deamination of cytosine produces uracil; (b) deamination of 5-methylcytosine (5mC) produces thymine.]

Deamination may alter the pairing properties of a base: the deamination of cytosine, for example, produces uracil ( ◗ FIGURE 17.17a), which pairs with adenine during replication. After another round of replication, the adenine will pair with thymine, creating a T·A pair in place of the original C·G pair (C·G → U·A → T·A); this chemical change is a transition mutation. This type of mutation is usually repaired by enzymes that remove uracil whenever it is found in DNA. The ability to recognize the product of cytosine deamination may explain why thymine, not uracil, is found in DNA.

Some cytosine bases in DNA are naturally methylated and exist in the form of 5-methylcytosine (5mC; see p. 000 in Chapter 10 and Figure 10.19), which when deaminated becomes thymine ( ◗ FIGURE 17.17b). Because thymine pairs with adenine in replication, the deamination of 5-methylcytosine changes an original C·G pair to T·A (C·G → 5mC·A → T·A). This change cannot be detected by DNA repair systems, because it produces a normal base. Consequently, C·G → T·A transitions occur frequently in eukaryotic cells.
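The contrast between the two deamination products can be summarized in a few lines of code. This is a deliberately crude sketch: it assumes that a deamination product is recognizable as damage only if it is not one of the four standard DNA bases, which captures the argument made above but ignores the details of real repair enzymes. The names `deaminate`, `recognizable_as_damage`, and `likely_outcome` are hypothetical.

```python
DEAMINATION_PRODUCT = {"C": "U", "5mC": "T"}   # cytosine -> uracil, 5-methylcytosine -> thymine
STANDARD_DNA_BASES = {"A", "C", "G", "T"}


def deaminate(base):
    """Return the base produced when `base` loses its amino group."""
    return DEAMINATION_PRODUCT[base]


def recognizable_as_damage(product):
    """Crude rule: a product stands out only if it is not a normal DNA base."""
    return product not in STANDARD_DNA_BASES


def likely_outcome(base):
    """Describe what usually happens to a C-G pair after deamination of `base`."""
    product = deaminate(base)
    if recognizable_as_damage(product):
        return f"{base} -> {product}: usually removed by repair, C-G pair restored"
    return f"{base} -> {product}: not detected, C-G becomes T-A after replication"


for base in ("C", "5mC"):
    print(likely_outcome(base))
```

Under this rule uracil is flagged and thymine is not, which is the reasoning the text gives for the high frequency of C·G → T·A transitions at methylated cytosines.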
Concepts
Some mutations arise from spontaneous alterations to DNA structure, such as depurination and deamination, which may alter the pairing properties of the bases and cause errors in subsequent rounds of replication.

Chemically Induced Mutations
Although many mutations arise spontaneously, a number of environmental agents are capable of damaging DNA, including certain chemicals and radiation. Any environmental agent that significantly increases the rate of mutation above the spontaneous rate is called a mutagen.

The first discovery of a chemical mutagen was made by Charlotte Auerbach, who was born in Germany to a Jewish family in 1899. After attending university in Berlin and doing research, she spent several years teaching at various schools in Berlin. Faced with increasing anti-Semitism in Nazi Germany, Auerbach immigrated to Britain, where she conducted research on the development of mutants in Drosophila. There she met Herman Muller, who had shown that radiation induces mutations; he suggested that Auerbach try to obtain mutants by treating Drosophila with chemicals. Her initial attempts met with little success. Other scientists were conducting top-secret research on mustard gas (used as a chemical weapon in World War I) and noticed that it produced many of the same effects as radiation. Auerbach was asked to determine whether mustard gas was mutagenic.

Collaborating with pharmacologist J. M. Robson, Auerbach studied the effects of mustard gas on Drosophila melanogaster. The experimental conditions were crude. They heated liquid mustard gas over a Bunsen burner on the roof of the pharmacology building, and the flies were exposed to the gas in a large chamber. After developing serious burns on her hands from the gas, Auerbach let others carry out the exposures, and she analyzed the flies.
Auerbach and Robson showed that mustard gas is indeed a powerful mutagen, reducing the viability of gametes and increasing the numbers of mutations seen in the offspring of exposed flies. Because the research was part of the secret war effort, publication of their findings was delayed until 1947.

www.whfreeman.com/pierce A brief history of Herman Muller

Base analogs One class of chemical mutagens consists of base analogs, chemicals with structures similar to that of any of the four standard bases of DNA. DNA polymerases cannot distinguish these analogs from the standard bases; so, if base analogs are present during replication, they may be incorporated into newly synthesized DNA molecules. For example, 5-bromouracil (5BU) is an analog of thymine; it has the same structure as that of thymine except that it has a bromine (Br) atom on the 5-carbon atom instead of a methyl group ( ◗ FIGURE 17.18a). Normally, 5-bromouracil pairs with adenine just as thymine does, but it occasionally mispairs with guanine ( ◗ FIGURE 17.18b), leading to a transition (TA → 5BUA → 5BUG → CG), as shown in ◗ FIGURE 17.19. Through mispairing, 5-bromouracil may also be incorporated into a newly synthesized DNA strand opposite guanine. In the next round of replication, 5-bromouracil may pair with adenine, leading to another transition (GC → G5BU → A5BU → AT).

◗ 17.18 5-Bromouracil (a base analog) resembles thymine, except that it has a bromine atom in place of a methyl group on the 5-carbon atom. Because of the similarity in their structures, 5-bromouracil may be incorporated into DNA in place of thymine. Like thymine, 5-bromouracil normally pairs with adenine but, when ionized, it may pair with guanine through wobble.
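Both changes produced by 5-bromouracil are transitions because each swaps a purine for a purine or a pyrimidine for a pyrimidine. The short Python sketch below makes that rule explicit; the function name and the third example (a purine-to-pyrimidine change included only for contrast) are assumptions introduced for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(original: str, mutant: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    for base in (original, mutant):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not a standard DNA base: {base!r}")
    if original == mutant:
        raise ValueError("no substitution: bases are identical")
    # A transition keeps the chemical class (purine or pyrimidine);
    # a transversion switches it.
    same_class = (original in PURINES) == (mutant in PURINES)
    return "transition" if same_class else "transversion"


# TA -> CG: on one strand, T is replaced by C.
print(classify_substitution("T", "C"))
# GC -> AT: on one strand, G is replaced by A.
print(classify_substitution("G", "A"))
# Hypothetical purine-to-pyrimidine change, shown for contrast.
print(classify_substitution("G", "T"))
```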
Another mutagenic chemical is 2-aminopurine (2AP), which is a base analog of adenine ( ◗ FIGURE 17.20). Normally, 2-aminopurine base pairs with thymine, but it may mispair with cytosine, causing a transition mutation (TA → T2AP → C2AP → CG). Alternatively, 2-aminopurine may be incorporated through mispairing into the newly synthesized DNA opposite cytosine and later pair with thymine, leading to a CG → C2AP → T2AP → TA transition. Thus, both 5-bromouracil and 2-aminopurine can produce transition mutations. In the laboratory, mutations by base analogs can be reversed by treatment with the same analog or by treatment with a different analog.

◗ 17.19 5-Bromouracil can lead to a replicated error. 1 In replication, 5-bromouracil may become incorporated into DNA in place of thymine, producing an incorporated error. 2 5-Bromouracil may mispair with guanine in the next round of replication. 3 In the next replication, this guanine nucleotide pairs with cytosine, leading to a permanent mutation. 4 If 5-bromouracil pairs with adenine, no replication error occurs. Conclusion: Incorporation of bromouracil followed by mispairing leads to a TA → CG transition mutation.

Alkylating agents Alkylating agents are chemicals that donate alkyl groups. Alkyl groups include methyl (CH3) and ethyl (CH3-CH2) groups, which are added to nucleotide bases by some chemicals. For example, ethylmethanesulfonate (EMS) adds an ethyl group to guanine, producing 6-ethylguanine, which pairs with thymine (see Figure 17.20a). Thus, EMS produces CG → TA transitions. EMS is also capable of adding an ethyl group to thymine, producing 4-ethylthymine, which then pairs with guanine, leading to a TA → CG transition. Because EMS produces both CG → TA and TA → CG transitions, mutations produced by EMS can be reversed by additional treatment with EMS. Mustard gas is another alkylating agent.

◗ 17.20 Chemicals may alter DNA bases. (a) The alkylating agent ethylmethanesulfonate (EMS) adds an ethyl group to guanine, producing 6-ethylguanine, which pairs with thymine, producing a CG → TA transition mutation. (b) Nitrous acid deaminates cytosine to produce uracil, which pairs with adenine, producing a CG → TA transition mutation. (c) Hydroxylamine converts cytosine into hydroxylaminocytosine, which frequently pairs with adenine, leading to a CG → TA transition mutation.

Deamination In addition to its spontaneous occurrence (see Figure 17.17), deamination can be induced by some chemicals. For instance, nitrous acid deaminates cytosine, creating uracil, which in the next round of replication pairs with adenine (see Figure 17.20b), producing a CG → TA transition mutation. Nitrous acid also changes adenine into hypoxanthine, which pairs with cytosine, leading to a TA → CG transition. In addition, nitrous acid deaminates guanine, producing xanthine, which pairs with cytosine just as guanine does; however, xanthine may also pair with thymine, leading to a CG → TA transition.
Nitrous acid produces exclusively transition mutations and, because both CG → TA and TA → CG transitions are produced, these mutations can be reversed with nitrous acid.

Hydroxylamine Hydroxylamine is a very specific base-modifying mutagen that adds a hydroxyl group to cytosine, converting it into hydroxylaminocytosine (see Figure 17.20c). This conversion increases the frequency of a rare tautomer that pairs with adenine instead of guanine and leads to CG → TA transitions. Because hydroxylamine acts only on cytosine, it will not generate TA → CG transitions; thus, hydroxylamine will not reverse the mutations that it produces.

Oxidative reactions Reactive forms of oxygen (including superoxide radicals, hydrogen peroxide, and hydroxyl radicals) are produced in the course of normal aerobic metabolism, as well as by radiation, ozone, peroxides, and certain drugs. These reactive forms of oxygen damage DNA and induce mutations by bringing about chemical changes to DNA. For example, oxidation converts guanine into 8-oxy-7,8-dihydrodeoxyguanine ( ◗ FIGURE 17.21), which frequently mispairs with adenine instead of cytosine, causing a GC → TA transversion mutation.

◗ 17.21 Oxidative radicals convert guanine into 8-oxy-7,8-dihydrodeoxyguanine, which frequently mispairs with adenine instead of cytosine, producing a GC → TA transversion.

Intercalating agents Intercalating agents, such as proflavin, acridine orange, ethidium bromide, and dioxin, are about the same size as a nucleotide ( ◗ FIGURE 17.22a).
They produce mutations by sandwiching themselves (intercalating) between adjacent bases in DNA, distorting the three-dimensional structure of the helix and causing single-nucleotide insertions and deletions in replication ( ◗ FIGURE 17.22b). These insertions and deletions frequently produce frameshift mutations (which change all amino acids downstream of the mutation), and so the mutagenic effects of intercalating agents are often severe. Because intercalating agents generate both additions and deletions, they can reverse the effects of their own mutations.

◗ 17.22 Intercalating agents such as proflavin and acridine orange insert themselves between adjacent bases in DNA, distorting the three-dimensional structure of the helix and causing single-nucleotide insertions and deletions in replication.

Concepts Chemicals can produce mutations by a number of mechanisms. Base analogs are inserted into DNA and frequently pair with the wrong base. Alkylating agents, deaminating chemicals, hydroxylamine, and oxidative radicals change the structure of DNA bases, thereby altering their pairing properties. Intercalating agents wedge between the bases and cause single-base insertions and deletions in replication.

Radiation In 1927, Herman Muller demonstrated that mutations in fruit flies could be induced by X-rays. The results of subsequent studies showed that X-rays greatly increase mutation rates in all organisms. X-rays, gamma rays, and cosmic rays ( ◗ FIGURE 17.23) have high energies and are all capable of penetrating tissues and damaging DNA.
These forms of radiation, called ionizing radiation, dislodge electrons from the atoms that they encounter, changing stable molecules into free radicals and reactive ions, which then alter the structures of bases and break phosphodiester bonds in DNA. Ionizing radiation also frequently results in double-strand breaks in DNA. Attempts to repair these breaks can produce chromosome mutations (discussed in Chapter 9).

◗ 17.23 In the electromagnetic spectrum, as wavelength decreases, energy increases. Shorter wavelengths (cosmic rays, gamma rays, X-rays, ultraviolet) are more energetic; longer wavelengths (infrared, microwaves, radio waves) are less energetic. (Adapted from Life 6e, figure 8.5.)

◗ 17.24 Pyrimidine dimers result from ultraviolet light. (a) Formation of thymine dimer. (b) Distorted DNA. 1 UV light causes adjacent thymines to be cross-linked by covalent bonds. 2 Thymine dimers distort the configuration of the DNA molecule.

Ultraviolet light has less energy than ionizing radiation and does not eject electrons and cause ionization but is nevertheless highly mutagenic. Purine and pyrimidine bases readily absorb UV light, resulting in the formation of chemical bonds between adjacent pyrimidine molecules on the same strand of DNA and in the creation of structures called pyrimidine dimers ( ◗ FIGURE 17.24a). Pyrimidine dimers consisting of two thymine bases (called thymine dimers) are most frequent, but cytosine dimers and thymine-cytosine dimers also can form. These dimers distort the configuration of DNA ( ◗ FIGURE 17.24b) and often block replication. Most pyrimidine dimers are immediately repaired by mechanisms discussed later in this chapter, but some escape repair and inhibit replication and transcription.
When pyrimidine dimers block replication, cell division is inhibited and the cell usually dies; for this reason, UV light kills bacteria and is an effective sterilizing agent. For a mutation (a hereditary error in the genetic instructions) to occur, the replication block must be overcome. How do bacteria and other organisms replicate despite the presence of thymine dimers?

Bacteria can circumvent replication blocks produced by pyrimidine dimers and other types of DNA damage by means of the SOS system. This system allows replication blocks to be overcome, but in the process makes numerous mistakes and greatly increases the rate of mutation. Indeed, the very reason that replication can proceed in the presence of a block is that the enzymes in the SOS system do not strictly adhere to the base-pairing rules. The trade-off is that replication may continue and the cell survives, but only by sacrificing the normal accuracy of DNA synthesis.

The SOS system is complex, including the products of at least 25 genes. A protein called RecA binds to the damaged DNA at the blocked replication fork and becomes activated. This activation promotes the binding of a protein called LexA, which is a repressor of the SOS system. The activated RecA complex induces LexA to undergo self-cleavage, destroying its repressive activity. This inactivation enables other SOS genes to be expressed, and the products of these genes allow replication of the damaged DNA to proceed. The SOS system allows bases to be inserted into a new DNA strand in the absence of bases on the template strand, but these insertions result in numerous errors in the base sequence.

Eukaryotic cells have a specialized DNA polymerase called polymerase η (eta) that bypasses pyrimidine dimers. Polymerase η preferentially inserts AA opposite a pyrimidine dimer. This strategy seems to be reasonable because about two-thirds of pyrimidine dimers are thymine dimers.
However, the insertion of AA opposite a CT dimer places adenine opposite cytosine and, after the next round of replication, results in a CG → TA transition. Polymerase η is therefore said to be an error-prone polymerase.

Concepts Ionizing radiation such as X-rays and gamma rays damages DNA by dislodging electrons from atoms; these electrons then break phosphodiester bonds and alter the structure of bases. Ultraviolet light causes mutations primarily by producing pyrimidine dimers that disrupt replication and transcription. The SOS system enables bacteria to overcome replication blocks but introduces mistakes in replication.

The Study of Mutations

Because mutations often have detrimental effects, they have been the subject of intense study by geneticists. These studies have included the analysis of reverse mutations, which are often sources of important insight into how mutations cause DNA damage; the development of tests to determine the mutagenic properties of chemical compounds; and the investigation of human populations tragically exposed to high levels of radiation.

The Analysis of Reverse Mutations

The study of reverse mutations (reversions) can provide useful information about how mutagens alter DNA structure. For example, any mutagen that produces both AT → GC and GC → AT transitions should be able to reverse its own mutations. However, if the mutagen produces only GC → AT transitions, then reversion by the same mutagen is not possible. Hydroxylamine (see Figure 17.20c) exhibits this type of one-way mutagenic activity; it causes GC → AT transitions but is incapable of reversing the mutations that it produces; so we know that it does not produce AT → GC transitions. Ethylmethanesulfonate (see Figure 17.20a), on the other hand, produces GC → AT transitions and reverses its own mutations; so we know that it also produces AT → GC transitions. Analyses of the ability of different mutagens to cause reverse mutations can be sources of insight into the molecular nature of the mutations.
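The reversion argument is a simple lookup: a mutagen can reverse a mutation only if it is able to induce the opposite change. The Python sketch below encodes the directions of change that this chapter attributes to each mutagen. The dictionary layout, the `CG>TA` style labels, and the function name are assumptions chosen for the example, and the result is the theoretical expectation only, not a prediction of what a given experiment will show.

```python
# Changes each mutagen is described as inducing in this chapter.
INDUCES = {
    "5-bromouracil": {"CG>TA", "TA>CG"},
    "2-aminopurine": {"CG>TA", "TA>CG"},
    "ethylmethanesulfonate": {"CG>TA", "TA>CG"},
    "nitrous acid": {"CG>TA", "TA>CG"},
    "hydroxylamine": {"CG>TA"},                    # one-way mutagen
    "acridine orange": {"insertion", "deletion"},  # frameshifts
}

# The change needed to undo each kind of mutation.
OPPOSITE = {
    "CG>TA": "TA>CG",
    "TA>CG": "CG>TA",
    "insertion": "deletion",
    "deletion": "insertion",
}


def can_revert(mutation: str, mutagen: str) -> bool:
    """Return True if `mutagen` can, in theory, reverse `mutation`."""
    return OPPOSITE[mutation] in INDUCES[mutagen]


# Hydroxylamine cannot undo the CG>TA changes that it produces...
print(can_revert("CG>TA", "hydroxylamine"))
# ...but ethylmethanesulfonate can reverse its own CG>TA mutations.
print(can_revert("CG>TA", "ethylmethanesulfonate"))
```

Running the same check for every mutation and mutagen pair gives a grid of theoretically possible reversions; a frameshift, for instance, is reversible only by acridine orange in this model.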
We can use reverse mutations to determine whether a mutation results from a base substitution or a frameshift. Base analogs such as 2-aminopurine cause transitions, and intercalating agents such as acridine orange (see Figure 17.22) produce frameshifts. If a chemical reverses mutations produced by 2-aminopurine but not those produced by acridine orange, we can conclude that the chemical causes transitions and not frameshifts. If nitrous acid (which produces both GC → AT and AT → GC transitions) reverses mutations produced by the chemical but hydroxylamine (which causes only GC → AT transitions) does not, we know that, like hydroxylamine, the chemical produces only GC → AT transitions. Table 17.4 illustrates the reverse mutations that are theoretically possible among several mutagenic agents. The actual ability of mutagens to produce reversals is more complex than suggested by Table 17.4 and depends on environmental conditions and the organism tested.

Concepts The study of the ability of mutagenic agents to produce reverse mutations provides important information about how mutagens alter DNA.

Detecting Mutations with the Ames Test

Humans in industrial societies are surrounded by a multitude of artificially produced chemicals: more than 50,000 different chemicals are in commercial and industrial use today, and from 500 to 1000 new chemicals are introduced each year. Some of these chemicals are potential carcinogens and may harm humans. How can we determine which chemicals are hazardous? In a few instances, previous human exposure to a specific chemical is correlated with an increase in cancer incidence, providing good evidence that the chemical is a carcinogen. But, ideally, we would like to know which chemicals are hazardous before we are exposed to them.
One method for testing the cancer-causing potential of chemicals is to administer them to laboratory animals (rats or mice) and compare the incidence of cancer in the treated animals with that of control animals. These tests are unfortunately time consuming and expensive. Furthermore, the ability of a substance to cause cancer in rodents is not always indicative of its effect on humans. After all, we aren't rats!

Table 17.4 Theoretical reverse mutations possible by various mutagenic agents

                                            Reversal of Mutations by
Mutagen                 Type of Mutation    5-Bromouracil   2-Aminopurine   Ethylmethanesulfonate   Nitrous acid   Hydroxylamine   Acridine orange
5-Bromouracil           CG ↔ TA             +               +               +                       +              +/−             −
2-Aminopurine           CG ↔ TA             +               +               +                       +              +/−             −
Nitrous acid            CG ↔ TA             +               +               +                       +              +/−             −
Ethylmethanesulfonate   CG ↔ TA             +               +               +                       +              +/−             −
Hydroxylamine           CG → TA             +               +               +                       +              −               −
Acridine orange         Frameshift          −               −               −                       −              −               +

Note: + indicates that reverse mutations occur, − indicates that reverse mutations do not occur, and +/− indicates that only some mutations are reversed. Not all reverse mutations are equally likely.

In 1974, Bruce Ames developed a simple test for evaluating the potential of chemicals to cause cancer. The Ames test is based on the principle that both cancer and mutations result from damage to DNA, and the results of experiments have demonstrated that 90% of known carcinogens are also mutagens. Ames proposed that mutagenesis in bacteria could serve as an indicator of carcinogenesis in humans.

The Ames test uses four strains of the bacterium Salmonella typhimurium that have defects in the lipopolysaccharide coat, which normally protects the bacteria from chemicals in the environment. Furthermore, their DNA repair system has been inactivated, enhancing their susceptibility to mutagens. One of the four strains used in the Ames test detects base-pair substitutions; the other three detect different types of frameshift mutations.
Each strain carries a mutation that renders it unable to synthesize the amino acid histidine (his−), and the bacteria are plated onto medium that lacks histidine ( ◗ FIGURE 17.25). Only bacteria that have undergone a reverse mutation of the histidine gene (his− → his+) are able to synthesize histidine and grow on the medium. Different dilutions of a chemical to be tested are added to plates inoculated with the bacteria, and the number of mutant bacterial colonies that appear on each plate is compared with the number that appear on control plates with no chemical (on which colonies arise through spontaneous mutation). Any chemical that significantly increases the number of colonies appearing on a treated plate is mutagenic and is probably also carcinogenic.

Some compounds are not active carcinogens but may be converted into cancer-causing compounds in the body. To make the Ames test sensitive for such potential carcinogens, a compound to be tested is first incubated in mammalian liver extract that contains metabolic enzymes.

The Ames test has been applied to thousands of chemicals and commercial products. An early demonstration of its usefulness was the discovery, in 1975, that most hair dyes sold in the United States contained compounds that were mutagenic to bacteria. These compounds were then removed from most hair dyes.

Concepts The Ames test uses his− strains of bacteria to test chemicals for their ability to produce his− → his+ mutations. Because mutagenic activity and carcinogenic potential are closely correlated, the Ames test is widely used to screen chemicals for their cancer-causing potential.

◗ 17.25 The Ames test is used to identify chemical mutagens. 1 Bacterial his− strains are mixed with liver enzymes (which have the ability to convert compounds into potential mutagens). 2 Some of the bacterial strains are also mixed with the chemical to be tested for mutagenic activity. 3 The bacteria are then plated on medium that lacks histidine. 4 After incubation, bacterial colonies that appear on the plates (control plate with no chemical; treatment plate with chemical added) have undergone a his− → his+ mutation. 5 Any chemical that significantly increases the number of colonies appearing on the plate is mutagenic and therefore probably also carcinogenic.

www.whfreeman.com/pierce More information on the Ames test

Radiation Exposure in Humans

People are routinely exposed to low levels of radiation from cosmic, medical, and environmental sources, but there have also been tragic events that produced exposures of much higher degree.

◗ 17.26 Hiroshima was destroyed by an atomic bomb on August 6, 1945. The atomic explosion produced many somatic mutations among the survivors. (Stanley Troutman/AP.)

On August 6, 1945, a high-flying American plane dropped a single atomic bomb on the city of Hiroshima, Japan. The explosion devastated 4.5 square miles of the city, killed from 90,000 to 140,000 people, and injured almost as many ( ◗ FIGURE 17.26). Three days later, the United States dropped an atomic bomb on the city of Nagasaki, this time destroying 1.5 square miles of city and killing between 60,000 and 80,000 people. Huge amounts of radiation were released during these explosions and many people were exposed.

After the war, a joint Japanese-U.S. effort was made to study the biological effects of radiation exposure on the survivors of the atomic blasts and their children. Somatic mutations were examined by studying radiation sickness and cancer among the survivors; germ-line mutations were assessed by looking at birth defects, chromosome abnormalities, and gene mutations in children born to people who had been exposed to radiation.
Geneticist James Neel and his colleagues examined almost 19,000 children of parents who were within 2000 meters of the center of the atomic blast at Hiroshima or Nagasaki, along with a similar number of children whose parents did not receive radiation exposure. Radiation doses were estimated for the child's parents on the basis of careful assessment of the parents' location, posture, and position at the time of the blast. A blood sample was collected from each child, and gel electrophoresis was used to investigate amino acid substitutions in 28 proteins. When rare variants were detected, blood samples from the child's parents also were analyzed to establish whether the variant was inherited or a new mutation.

Of a total of 289,868 genes examined by Neel and his colleagues, only one mutation was found in the children of exposed parents; no mutations were found in the control group. From these findings, a mutation rate of 3.4 × 10⁻⁶ was estimated for the children whose parents were exposed to the blast, which is within the range of spontaneous mutation rates observed for other eukaryotes. Neel and his colleagues also examined the frequency of chromosome mutations, sex ratios of children born to exposed parents, and frequencies of chromosome aneuploidy. There was no evidence in any of these assays for increased mutations among the children of the people who were exposed to radiation from the atomic explosions, suggesting that germ-line mutations were not elevated.

Animal studies clearly show that radiation causes germ-line mutations; so why was there no apparent increase in germ-line mutations among the inhabitants of Hiroshima and Nagasaki? The exposed parents did exhibit an increased incidence of leukemia and other types of cancers; so somatic mutations were clearly induced. The answer to the question is not known, but the lack of germ-line mutations may be due to the fact that those persons who received the largest radiation doses died soon after the blasts.
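The mutation rate quoted above can be checked with one division. The sketch below assumes that the estimate is simply the single new mutation divided by the 289,868 genes examined; the variable names are illustrative.

```python
genes_examined = 289_868   # genes screened, as reported in the text
new_mutations = 1          # new mutations found in children of exposed parents

# Mutation rate = new mutations per gene examined.
rate = new_mutations / genes_examined

print(f"{rate:.1e} mutations per gene")
```

The quotient is roughly 3.4 in a million, which agrees with the 3.4 × 10⁻⁶ estimate reported for the children of exposed parents.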
www.whfreeman.com/pierce Information on studies of the health effects of the nuclear blasts at Hiroshima and Nagasaki

The Techa River in southern Russia is another place where people have been tragically exposed to high levels of radiation. The Mayak nuclear facility, located 60 miles from the city of Chelyabinsk, produced plutonium for nuclear warheads in the early days of the Cold War. Between 1949 and 1956, this plant dumped some 76 million cubic meters of radioactive sludge into the Techa River. People downstream used the river for drinking water and crop irrigation; some received radiation doses 1700 times the annual amount considered safe by today's standards. Radiation in the area was further elevated by a series of nuclear accidents at the Mayak plant; the worst was an explosion of a radioactive liquid storage tank in 1957, which showered radiation over a 27,000-square-kilometer area.

Although Soviet authorities suppressed information about the radiation problems along the Techa until the 1990s, Russian physicians led by Mira Kossenko quietly began studying cancer and other radiation-related illnesses among the inhabitants in the 1960s. They found that the overall incidence of cancer was elevated among people who lived on the banks of the Techa River.

Most data on radiation exposure in humans are from the intensive study of the survivors of the atomic bombing of Hiroshima and Nagasaki. However, the inhabitants of Hiroshima and Nagasaki were exposed in one intense burst of radiation, and these data may not be appropriate for understanding the effects of long-term low-dose radiation. Today, U.S. and Russian scientists are studying the people of the Techa River region, as well as those exposed to radiation in the Chernobyl accident (see the story at the beginning of this chapter), in an attempt to better understand the effects of chronic radiation exposure on human populations.
DNA Repair The integrity of DNA is under constant assault from radiation, chemical mutagens, and spontaneously arising changes. In spite of this onslaught of damaging agents, the rate of mutation remains remarkably low, thanks to the efficiency with which DNA is repaired. It has been estimated that fewer than one in a thousand DNA lesions becomes a mutation; all the others are corrected. There are a number of complex pathways for repairing DNA, but several general statements can be made about DNA repair. First, most DNA repair mechanisms require two nucleotide strands of DNA because most replace whole nucleotides, and a template strand is needed to specify the base sequence. The complementary, double-stranded nature of DNA not only provides stability and efficiency of replication, but also enables either strand to provide the information necessary for correcting the other. A second general feature of DNA repair is redundancy, meaning that many types of DNA damage can be corrected by more than one pathway of repair. This redundancy testifies to the extreme importance of DNA repair to the survival of the cell: it ensures that almost all mistakes are corrected. If a mistake escapes one repair system, it’s likely to be repaired by another system. We will consider four general mechanisms of DNA repair: mismatch repair, direct repair, base-excision repair, and nucleotide-excision repair (Table 17.5). Mismatch Repair Replication is extremely accurate: each new copy of DNA has only one error per billion nucleotides. 
However, in the process of replication, mismatched bases are incorporated into the new DNA with a frequency of about 10⁻⁴ to 10⁻⁵; so most of the errors that initially arise are corrected and never become permanent mutations. Some of these corrections are made in proofreading (see Chapter 12). DNA polymerases have the capacity to recognize and correct mismatched nucleotides. When a mismatched nucleotide is added to a newly synthesized DNA strand, the polymerase stalls. It then uses its 3′ → 5′ exonuclease activity to back up and remove the incorrectly inserted nucleotide before continuing with 5′ → 3′ polymerization.

Table 17.5 Summary of common DNA repair mechanisms

Repair System         Type of Damage Repaired
Mismatch              Replication errors, including mispaired bases and strand slippage
Direct                Pyrimidine dimers; other specific types of alterations
Base-excision         Abnormal bases, modified bases, and pyrimidine dimers
Nucleotide-excision   DNA damage that distorts the double helix, including abnormal bases, modified bases, and pyrimidine dimers

Many incorrectly inserted nucleotides that escape detection by proofreading are corrected by mismatch repair (see Chapter 12). Incorrectly paired bases distort the three-dimensional structure of DNA, and mismatch-repair enzymes detect these distortions. In addition to detecting incorrectly paired bases, the mismatch-repair system corrects small unpaired loops in the DNA, such as those caused by strand slippage in replication (see Figure 17.14). Some trinucleotide repeats may form secondary structures on the unpaired strand (see Figure 17.6d), allowing them to escape detection by the mismatch-repair system. After the incorporation error has been recognized, mismatch-repair enzymes cut out the distorted section of the newly synthesized strand and fill the gap with new nucleotides, by using the original DNA strand as a template.
For this strategy to work, mismatch repair must have some way of distinguishing between the old and the new strands of the DNA so that the incorporation error, and not part of the original strand, is removed. The proteins that carry out mismatch repair in E. coli differentiate between old and new strands by the presence of methyl groups on special sequences of the old strand. After replication, adenine nucleotides in the sequence GATC are methylated by an enzyme called Dam methylase. The process of methylation is delayed and so, immediately after replication, the old strand is methylated and the new strand is not ( ◗ FIGURE 17.27a). In E. coli, the proteins MutS, MutL, and MutH are required for mismatch repair. MutS binds to the mismatched bases and forms a complex with MutL and MutH; this complex is thought to bring an unmethylated GATC sequence in close proximity to the mismatched bases. MutH nicks the unmethylated strand at the GATC site ( ◗ FIGURE 17.27b), and exonucleases degrade the unmethylated strand from the nick to the mismatched bases ( ◗ FIGURE 17.27c). DNA polymerase and DNA ligase fill in the gap on the unmethylated strand with correctly paired nucleotides ( ◗ FIGURE 17.27d).

◗ 17.27 Many incorrectly inserted nucleotides that escape proofreading are corrected by mismatch repair. 1 In DNA replication, a mismatched base was added to the new strand. 2 Methylation at GATC sequences allows old and newly synthesized nucleotide strands to be differentiated: immediately after replication, the old strand will be methylated but the new strand will not.

Mismatch repair in eukaryotic cells is similar to that in E. coli, except that several proteins are related to MutS and several are related to MutL. These proteins function together in different combinations to detect different types of incorporation errors, such as mispaired bases and small unpaired loops.
Eukaryotic cells do not have any proteins related to E. coli MutH. What enzyme makes the nick in eukaryotic cells is not clear. How the old and new strands are recognized in eukaryotic cells is not known, because in some eukaryotes, such as yeast and fruit flies, there is no detectable methylation of DNA.

◗ 17.27 (continued) (b) 3 Protein MutS binds to the mismatched bases and forms a complex with MutH and MutL; the mismatch is brought close to a methylated GATC sequence; and the new strand is identified. (c) 4 Exonucleases remove nucleotides on the new strand between the GATC sequence and the mismatch. (d) 5 DNA polymerase then replaces the nucleotides, correcting the mismatch, and DNA ligase seals the nick in the sugar–phosphate backbone.

Direct Repair

Direct-repair mechanisms do not replace altered nucleotides but instead change them back into their original (correct) structures. One of the best-characterized direct-repair mechanisms is photoreactivation of UV-induced pyrimidine dimers. E. coli and some eukaryotic cells possess an enzyme called photolyase, which uses energy captured from light to break the covalent bonds that link the pyrimidines in a dimer. Direct repair also corrects O6-methylguanine, an alkylation product of guanine that pairs with thymine, producing GC → AT transitions. An enzyme called O6-methylguanine-DNA methyltransferase removes the methyl group from O6-methylguanine, restoring the base to guanine ( ◗ FIGURE 17.28).

◗ 17.28 Direct repair changes nucleotides back into their original structures. (The figure shows methyltransferase converting O6-methylguanine into guanine.)

Base-Excision Repair

In base-excision repair, modified bases are first excised and then the entire nucleotide is replaced. The excision of modified bases is catalyzed by a set of enzymes called DNA glycosylases, each of which recognizes and removes a specific type of modified base by cleaving the bond that links that base to the 1′-carbon atom of deoxyribose ( ◗ FIGURE 17.29a). Uracil glycosylase, for example, recognizes and removes uracil produced by the deamination of cytosine. Other glycosylases recognize hypoxanthine, 3-methyladenine, 7-methylguanine, and other modified bases. After the base has been removed, an enzyme called AP (apurinic or apyrimidinic) endonuclease cuts the phosphodiester bond, and other enzymes remove the deoxyribose sugar ( ◗ FIGURE 17.29b). DNA polymerase then adds new nucleotides to the exposed 3′-OH group ( ◗ FIGURE 17.29c), replacing a section of nucleotides on the damaged strand. The nick in the phosphodiester backbone is sealed by DNA ligase ( ◗ FIGURE 17.29d), and the original intact sequence is restored ( ◗ FIGURE 17.29e).

◗ 17.29 Base-excision repair excises modified bases and then replaces the entire nucleotide. (Panels a through e show DNA glycosylase, AP endonuclease, DNA polymerase, and DNA ligase acting in turn.)

Nucleotide-Excision Repair

The final repair pathway that we'll consider is nucleotide-excision repair, which removes bulky DNA lesions that distort the double helix, such as pyrimidine dimers or large hydrocarbons attached to the DNA. Nucleotide-excision repair is quite versatile and can repair many different types of DNA damage. It is found in cells of all organisms from bacteria to humans and is one of the most important of all repair mechanisms. The process of nucleotide excision is complex; in humans, a large number of genes take part. First, a complex of enzymes scans DNA, looking for distortions of its three-dimensional configuration ( ◗ FIGURE 17.30a and b). When a distortion is detected, additional enzymes separate the two nucleotide strands at the damaged region, and single-strand-binding proteins stabilize the separated strands ( ◗ FIGURE 17.30c). Next, the sugar–phosphate backbone of the damaged strand is cleaved on both sides of the damage. One cut is made 5 nucleotides upstream (on the 3′ side) of the damage, and the other cut is made 8 nucleotides (in prokaryotes) or from 21 to 23 nucleotides (in eukaryotes) downstream (on the 5′ side) of the damage ( ◗ FIGURE 17.30d). Part of the damaged strand is peeled away ( ◗ FIGURE 17.30e), and the gap is filled in by DNA polymerase and sealed by DNA ligase ( ◗ FIGURE 17.30f).

◗ 17.30 (panels a through f) 1 Damage to the DNA distorts the configuration of the molecule. 2 An enzyme complex recognizes the distortion resulting from damage. 3 The DNA is separated, and single-strand-binding proteins stabilize the single strands. 4 An enzyme cleaves the strand on both sides of the damage. 5 A part of the damaged strand is removed, 6 and the gap is filled in by DNA polymerase and sealed by DNA ligase.

Other Types of DNA Repair

The DNA repair pathways described so far respond to damage that is limited to one strand of a DNA molecule, leaving the other strand to be used as a template for the synthesis of new DNA during the repair process. Some types of DNA damage, however, affect both strands of the molecule and therefore pose a more severe challenge to the DNA repair machinery. Ionizing radiation frequently results in double-strand breaks in DNA. Double-strand breaks are frequently repaired by homologous recombination. Models for homologous recombination were described in Chapter 12. Another type of damage that affects both strands is an interstrand cross-link, which arises when the two strands of a duplex are connected through covalent bonds. Interstrand cross-links are extremely toxic to cells because they halt replication. Several drugs commonly used in chemotherapy, including cisplatin, mitomycin C, psoralen, and nitrogen mustard, cause interstrand cross-links. Nitrogen mustard, which is structurally related to the mustard gas used by Charlotte Auerbach to induce mutations in Drosophila, was the first chemical agent to be used in chemotherapy treatment. Little is known about how interstrand cross-links are repaired. One model proposes that double-strand breaks are made on each side of the cross-link and are subsequently repaired by the pathways that repair double-strand breaks.

Connecting Concepts

The Basic Pathway of DNA Repair

We have now examined several different mechanisms of DNA repair. What do these methods have in common? How are they different? Most methods of DNA repair depend on the presence of two strands, because nucleotides in the damaged area are removed and replaced. Nucleotides are replaced in mismatch repair, base-excision repair, and nucleotide-excision repair, but are not replaced by direct-repair mechanisms. Repair mechanisms that include nucleotide removal utilize a common four-step pathway:
1. Detection: The damaged section of the DNA is recognized.
2. Excision: DNA repair endonucleases nick the phosphodiester backbone on one or both sides of the DNA damage.
3. Polymerization: DNA polymerase adds nucleotides to the newly exposed 3′-OH group by using the other strand as a template and replacing damaged (and frequently some undamaged) nucleotides.
4. Ligation: DNA ligase seals the nicks in the sugar–phosphate backbone.
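The four steps listed above can be sketched as four small Python functions applied in sequence. This is only an illustrative model under simplifying assumptions: damaged nucleotides are marked with the placeholder character "X", the two strands are written so that position i of one pairs with position i of the other, and the function names, the `flank` parameter, and the example sequences are hypothetical rather than taken from the chapter.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def detect(damaged: str) -> list[int]:
    """Step 1, detection: positions of damaged nucleotides (marked 'X')."""
    return [i for i, base in enumerate(damaged) if base == "X"]


def excise(damaged: str, start: int, end: int) -> list:
    """Step 2, excision: remove positions start..end-1, leaving a gap (None)."""
    return [None if start <= i < end else base for i, base in enumerate(damaged)]


def polymerize(gapped: list, template: str) -> list:
    """Step 3, polymerization: fill the gap by copying the other strand."""
    return [COMPLEMENT[template[i]] if base is None else base
            for i, base in enumerate(gapped)]


def ligate(filled: list) -> str:
    """Step 4, ligation: join the pieces into one continuous strand."""
    return "".join(filled)


def repair(damaged: str, template: str, flank: int = 1) -> str:
    """Run the four steps; `flank` undamaged nucleotides are removed on each side."""
    hits = detect(damaged)
    if not hits:
        return damaged
    start = max(0, min(hits) - flank)
    end = min(len(damaged), max(hits) + 1 + flank)
    return ligate(polymerize(excise(damaged, start, end), template))


template = "TACGGATC"            # undamaged strand, used as the template
damaged = "ATGXXTAG"             # partner strand with a two-nucleotide lesion
print(repair(damaged, template)) # ATGCCTAG
```

The `flank` parameter reflects the remark in step 3 that some undamaged nucleotides are frequently replaced along with the damaged ones; setting it to 0 would remove only the lesion itself.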
The primary differences in the mechanisms of mismatch, base-excision, and nucleotide-excision repair are in the details of detection and excision. In base-excision and mismatch repair, a single nick is made in the sugar–phosphate backbone on one side of the damaged strand; in nucleotide-excision repair, nicks are made on both sides of the DNA lesion. In base-excision repair, DNA polymerase displaces the old nucleotides as it adds new nucleotides to the 3′ end of the nick; in mismatch repair, the old nucleotides are degraded; and, in nucleotide-excision repair, nucleotides are displaced by helicase enzymes. All three mechanisms use DNA polymerase and ligase to fill in the gap produced by the excision and removal of damaged nucleotides.

www.whfreeman.com/pierce Additional information on DNA repair

◗ 17.30 Nucleotide-excision repair consists of four steps: detection of damage, excision of damage, polymerization, and ligation.

Genetic Diseases and Faulty DNA Repair

◗ 17.31 Xeroderma pigmentosum is a human disease that results from defects in DNA repair. (Ken Greer/Visuals Unlimited.)

Several human diseases are connected to defects in DNA repair. These diseases are often associated with high incidences of specific cancers, because defects in DNA repair lead to increased rates of mutation. This concept is discussed further in Chapter 21. Among the best studied of the human DNA repair diseases is xeroderma pigmentosum ( ◗ FIGURE 17.31), a rare autosomal recessive condition that includes abnormal skin pigmentation and acute sensitivity to sunlight. Persons who have this disease also have a strong predisposition to skin cancer, with an incidence from 1000 to 2000 times that found in unaffected people. Sunlight includes a strong UV component; so exposure to sunlight produces pyrimidine dimers in the DNA of skin cells.
Although human cells lack photolyase (the enzyme that repairs pyrimidine dimers in bacteria), most pyrimidine dimers in humans can be corrected by nucleotide-excision repair. However, the cells of most people with xeroderma pigmentosum are defective in nucleotide-excision repair, and many of their pyrimidine dimers go uncorrected and may lead to cancer. Xeroderma pigmentosum can result from defects in several different genes; studies have identified at least seven different xeroderma pigmentosum complementation groups, meaning that at least seven genes are required for nucleotide-excision repair in humans. Recent molecular research has led to the identification of genetic defects of nucleotide-excision repair associated with these complementation groups. Some persons with xeroderma pigmentosum have mutations in a gene encoding the protein that recognizes and binds to damaged DNA; others have mutations in a gene encoding helicase. Still others have defects in the genes that play a role in cutting the damaged strand on the 5′ or 3′ sides of the pyrimidine dimer. Some persons have a slightly different form of the disease (xeroderma pigmentosum variant) owing to mutations in the gene encoding polymerase η, the DNA polymerase that bypasses pyrimidine dimers by inserting AA. Two other genetic diseases due to defects in nucleotide-excision repair are Cockayne syndrome and trichothiodystrophy (also known as brittle-hair syndrome). Persons who have either of these diseases do not have an increased risk of cancer but do exhibit multiple developmental and neurological problems. Both diseases result from mutations in some of the same genes that cause xeroderma pigmentosum. Several of the genes taking part in nucleotide-excision repair produce proteins that also play a role in recombination and the initiation of transcription. These other functions may account for the developmental symptoms seen in Cockayne syndrome and trichothiodystrophy.
Another genetic disease caused by faulty DNA repair is an inherited form of colon cancer called hereditary nonpolyposis colon cancer (HNPCC). This cancer is one of the most common hereditary cancers, accounting for about 15% of colon cancers. Research indicates that HNPCC arises from mutations in the proteins that carry out mismatch repair (see Figure 17.27). Li-Fraumeni syndrome is caused by mutations in a gene called p53, which plays an important role in regulating the cell cycle. The product encoded by the p53 gene can halt cell division until damage to DNA has been repaired; it can also directly stimulate DNA repair. The p53 gene product may actually cause cells with damaged DNA to self-destruct (undergo apoptosis, or controlled cell death; see Chapter 21), preventing their mutated genetic instructions from being passed on. Patients who have Li-Fraumeni syndrome exhibit multiple independent cancers in different tissues. Some additional genetic diseases associated with defective DNA repair are summarized in Table 17.6.

Concepts

Defects in DNA repair are the underlying cause of several genetic diseases. Many of these diseases are characterized by a predisposition to cancer.

www.whfreeman.com/pierce Additional information about xeroderma pigmentosum

Connecting Concepts Across Chapters

This chapter has been our first comprehensive look at mutations, but we have been considering and using mutations throughout the book. Mutation is a fact of life.
Our DNA is continually assaulted by spontaneously arising and environmentally induced mutations. These mutations are the raw material of evolution and, in the long run, allow organisms to adapt to the environment, a topic that will be taken up in Chapter 23. In spite of their long-term contribution to species evolution, the vast majority of mutations are, in the short term, detrimental to cells. The fact that most are detrimental is evidenced by the number of mechanisms that cells possess to reduce the generation of errors in DNA and to repair those that do arise. A dominant theme of this chapter is that cells go to great lengths to prevent mutations.

Table 17.6 Genetic diseases associated with defects in DNA repair systems

Disease | Symptoms | Genetic Defect
Xeroderma pigmentosum | Frecklelike spots on skin, sensitivity to sunlight, predisposition to skin cancer | Defects in nucleotide-excision repair
Cockayne syndrome | Dwarfism, sensitivity to sunlight, premature aging, deafness, mental retardation | Defects in nucleotide-excision repair
Trichothiodystrophy | Brittle hair, skin abnormalities, short stature, immature sexual development, characteristic facial features | Defects in nucleotide-excision repair
Hereditary nonpolyposis colon cancer | Predisposition to colon cancer | Defects in mismatch repair
Fanconi anemia | Increased skin pigmentation, abnormalities of skeleton, heart, and kidneys, predisposition to leukemia | Possibly defects in the repair of interstrand cross-links
Ataxia telangiectasia | Defective muscle coordination, dilation of blood vessels in skin and eyes, immune deficiencies, sensitivity to ionizing radiation, predisposition to cancer | Defects in DNA damage detection and response
Li-Fraumeni syndrome | Predisposition to cancer in many different tissues | Defects in DNA damage response
This chapter has incorporated information presented in a number of earlier chapters, which you might want to review for a better understanding of the processes and structures discussed in the current chapter. Chromosome mutations and transposable elements (which frequently cause mutations) are discussed in Chapters 9 and 11. Although the structural nature of these mutations is different from that of gene mutations, many fundamental aspects of the mutational process that were introduced in this chapter also apply to these other types of mutations. The study of gene mutations is fundamentally about changes in DNA structure; so the discussion of DNA structure in Chapter 10 is critical for understanding the nature of mutations and how they arise. Some mutations spontaneously arise from errors in replication, and many DNA repair mechanisms include some DNA synthesis; hence, the process of replication outlined in Chapter 12 also is important. The relation between the nucleotide sequences of DNA and the amino acid sequences of proteins, which is discussed in Chapter 15, is particularly relevant for understanding the phenotypic effects of mutations and the nature of intra- and intergenic suppressors. Some of the material covered on bacterial and viral genetics in Chapter 15 is helpful for understanding complementation and the Ames test. The current chapter has provided information that is important for understanding material presented in future chapters. Mutation is the molecular basis of cancer; so the contents of the current chapter will be highly relevant to the discussion of cancer genetics in Chapter 21. The importance of the mutation process to evolution will be revisited in Chapter 23.

CONCEPTS SUMMARY

• Mutations are heritable changes in genetic information. They are important for the study of genetics and can be used to unravel other biological processes.
• Somatic mutations occur in somatic cells; germ-line mutations occur in cells that give rise to gametes. Gene mutations are genetic alterations that affect a single gene; chromosome mutations entail changes in the number or structure of chromosomes. • The simplest type of mutation is a base substitution, a change in a single base pair of DNA. Transitions are base substitutions in which purines are replaced by purines or pyrimidines are replaced by pyrimidines. Transversions are base substitutions in which a purine replaces a pyrimidine or a pyrimidine replaces a purine. • Insertions are the addition of nucleotides, and deletions are the removal of nucleotides; these mutations often change the reading frame of the gene. • Expanding trinucleotide repeats are mutations in which the number of copies of a trinucleotide increases through time; they are responsible for several human genetic diseases. • A missense mutation alters the coding sequence so that one amino acid substitutes for another. A nonsense mutation changes a codon that specifies an amino acid to a termination codon. A silent mutation produces a synonymous codon that specifies the same amino acid as the original sequence, whereas a neutral mutation alters the amino acid sequence but does not change the functioning of the protein. A suppressor mutation reverses the effect of a previous mutation at a different site and may be intragenic (within the same gene as the original mutation) or intergenic (within a different gene). • Mutation rate is the frequency with which a particular mutation arises in a population, whereas mutation frequency is the incidence of a mutation in a population. Mutation rates are usually low and are influenced by both genetic and environmental factors. • Some mutations occur spontaneously. These mutations include the mispairing of bases in replication and spontaneous depurination and deamination. 
• Insertions and deletions may arise from strand slippage in replication or from unequal crossing over. • Base analogs may become incorporated into DNA in replication and pair with the wrong base in subsequent replication events. Alkylating agents and hydroxylamine modify the chemical structure of bases and lead to mutations. Intercalating agents insert into the DNA molecule and cause single-nucleotide additions and deletions. Oxidative reactions alter the chemical structures of bases. • Ionizing radiation is mutagenic, altering base structures and breaking phosphodiester bonds. Ultraviolet light produces pyrimidine dimers, which block replication. Bacteria use the SOS response to overcome replication blocks produced by pyrimidine dimers and other lesions in DNA, but the SOS response causes more replication errors. Pyrimidine dimers in eukaryotic cells can be bypassed by DNA polymerase η, but the bypass may result in the placement of incorrect bases opposite the dimer. • The analysis of reverse mutations provides information about the molecular nature of the original mutation. • The Ames test uses bacteria to assess the mutagenic potential of chemical substances. • Most damage to DNA is corrected by DNA repair mechanisms. These mechanisms include mismatch repair, direct repair, base-excision repair, nucleotide-excision repair, and other repair pathways. Although the details of the different DNA repair mechanisms vary, most require two strands of DNA and exhibit some overlap in the types of damage repaired. Proofreading and mismatch repair correct errors that arise in replication. Direct-repair mechanisms change the altered nucleotides back into their original condition, whereas base-excision and nucleotide-excision repair mechanisms replace nucleotides around the damaged segment of the DNA. • Defects in DNA repair are the underlying cause of several genetic diseases.

IMPORTANT TERMS

mutation; somatic mutation; germ-line mutation; gene mutation; base substitution; transition; transversion; insertion; deletion; frameshift mutation; in-frame insertion; in-frame deletion; expanding trinucleotide repeat; forward mutation; reverse mutation (reversion); missense mutation; nonsense mutation; silent mutation; neutral mutation; loss-of-function mutation; gain-of-function mutation; conditional mutation; lethal mutation; suppressor mutation; intragenic suppressor mutation; intergenic suppressor mutation; mutation rate; mutation frequency; spontaneous mutation; induced mutation; incorporated error; replicated error; strand slippage; unequal crossing over; depurination; deamination; mutagen; base analog; intercalating agent; pyrimidine dimer; SOS system; Ames test; direct repair; base-excision repair; nucleotide-excision repair

Worked Problems

1. A codon that specifies the amino acid Asp undergoes a single-base substitution that yields a codon that specifies Ala. Refer to the genetic code in Figure 15.12 and give all possible DNA sequences for the original and the mutated codon. Is the mutation a transition or a transversion?

• Solution There are two possible RNA codons for Asp: GAU and GAC. The DNA sequences that encode these codons will be complementary to the RNA codons: CTA and CTG. There are four possible RNA codons for Ala: GCU, GCC, GCA, and GCG, which correspond to DNA sequences CGA, CGG, CGT, and CGC.
If we organize the original and mutated sequences as shown in the following table, it is easy to see what type of mutations may have occurred:

Possible original sequence for Asp | Possible mutated sequence for Ala
CTA | CGA
CTG | CGG
 | CGT
 | CGC

If the mutation is confined to a single-base substitution, then the only mutations possible are that CTA mutated to CGA or that CTG mutated to CGG. In both, there is a T → G transversion in the middle nucleotide of the codon.

2. A gene encodes a protein with the following amino acid sequence:

Met-Arg-Cys-Ile-Lys-Arg

A mutation of a single nucleotide alters the amino acid sequence to:

Met-Asp-Ala-Tyr-Lys-Gly-Glu-Ala-Pro-Val

A second single-nucleotide mutation occurs in the same gene and suppresses the effects of the first mutation (an intragenic suppressor). With the original mutation and the intragenic suppressor present, the protein has the following amino acid sequence:

Met-Asp-Gly-Ile-Lys-Arg

What is the nature and location of the first mutation and the intragenic suppressor mutation?

• Solution The first mutation alters the reading frame, because all amino acids after Met are changed. Insertions and deletions affect the reading frame; so the original mutation consists of a single-nucleotide insertion or deletion in the second codon. The intragenic suppressor restores the reading frame; so the intragenic suppressor also is most likely a single-nucleotide insertion or deletion: if the first mutation is an insertion, the suppressor must be a deletion; if the first mutation is a deletion, then the suppressor must be an insertion. Notice that the protein produced by the suppressor still differs from the original protein at the second and third amino acids, but the suppressor's second amino acid is the same as that in the protein produced by the original mutation. Thus the suppressor mutation must have occurred in the third codon, because the suppressor does not alter the second amino acid.

3.
The mutations produced by the following compounds are reversed by the substances shown. What conclusions can you make about the nature of the mutations originally produced by these compounds?

Mutations produced by compound | Reversed by 5-Bromouracil | Reversed by EMS | Reversed by Hydroxylamine | Reversed by Acridine orange
(a) 1 | Yes | Yes | No | No
(b) 2 | Yes | Yes | Some | No
(c) 3 | No | No | No | Yes
(d) 4 | Yes | Yes | Yes | Yes

• Solution The ability of various compounds to produce reverse mutations reveals important information about the nature of the original mutation. (a) Mutations produced by compound 1 are reversed by 5-bromouracil, which produces both AT → GC and GC → AT transitions. This tells us that compound 1 produces single-base substitutions that may include the generation of either AT or GC pairs. The mutations produced by compound 1 are also reversed by EMS, which, like 5-bromouracil, produces both AT → GC and GC → AT transitions; so no additional information is provided here. Hydroxylamine does not reverse the mutations produced by compound 1. Because hydroxylamine produces only CG → TA transitions, we know that compound 1 does not generate CG base pairs. Acridine orange, an intercalating agent that produces frameshift mutations, also does not reverse the mutations, revealing that compound 1 produces only single-base-pair substitutions, not insertions or deletions. In summary, compound 1 appears to cause single-base substitutions that generate TA but not GC base pairs. (b) Compound 2 generates mutations that are reversed by 5-bromouracil and EMS, indicating that it may produce GC or AT base pairs. Some of these mutations are reversed by hydroxylamine, which produces only CG → TA transitions. This indicates that some of the mutations produced by compound 2 are CG base pairs, because only a CG pair can be converted by hydroxylamine. None of the mutations are reversed by acridine orange; so compound 2 does not induce insertions or deletions. In summary, compound 2 produces single-base substitutions that generate both GC and AT base pairs.
(c) Compound 3 produces mutations that are reversed only by acridine orange; so compound 3 appears to produce only insertions and deletions. (d) Mutations produced by compound 4 are reversed by 5-bromouracil, EMS, hydroxylamine, and acridine orange, indicating that this compound produces single-base substitutions, which include both GC and AT base pairs, and insertions and deletions.

The New Genetics

MINING GENOMES: MOLECULAR EVOLUTION

This exercise introduces you to some of the basic principles of molecular evolution, the study of the ways in which molecules evolve, and the reconstruction of the evolutionary history of molecules and organisms. You will use several of the Internet tools most frequently used by contemporary molecular geneticists to analyze analogous sequences from related organisms.

COMPREHENSION QUESTIONS

*1. What is the difference between somatic mutations and germ-line mutations?
*2. What is the difference between a transition and a transversion? Which type of base substitution is usually more common? Why?
*3. Briefly describe expanding trinucleotide repeats. How do they account for the phenomenon of anticipation?
4. What is the difference between a missense mutation and a nonsense mutation? A silent mutation and a neutral mutation?
5. Briefly describe two different ways that intragenic suppressors may reverse the effects of mutations.
*6. How do intergenic suppressors work?
*7. What is the difference between mutation frequency and mutation rate?
*8. What is the cause of errors in DNA replication?
9. How do insertions and deletions arise?
*10. How do base analogs lead to mutations?
11. How do alkylating agents, nitrous acid, and hydroxylamine produce mutations?
12. What types of mutations are produced by ionizing and UV radiation?
*13. What is the SOS system and how does it lead to an increase in mutations?
14. What is the purpose of the Ames test? How are his⁻ bacteria used in this test?
*15. List at least three different types of DNA repair and briefly explain how each is carried out.
16. What features do mismatch repair, base-excision repair, and nucleotide-excision repair have in common?

APPLICATION QUESTIONS AND PROBLEMS

*17. A codon that specifies the amino acid Gly undergoes a single-base substitution to become a nonsense mutation. In accord with the genetic code given in Figure 15.12, is this mutation a transition or a transversion? At which position of the codon does the mutation occur?
*18. (a) If a single transition occurs in a codon that specifies Phe, what amino acids could be specified by the mutated sequence? (b) If a single transversion occurs in a codon that specifies Phe, what amino acids could be specified by the mutated sequence? (c) If a single transition occurs in a codon that specifies Leu, what amino acids could be specified by the mutated sequence? (d) If a single transversion occurs in a codon that specifies Leu, what amino acids could be specified by the mutated sequence?
19. Hemoglobin is a complex protein that contains four polypeptide chains. The normal hemoglobin found in adults (called adult hemoglobin) consists of two α and two β polypeptide chains, which are encoded by different loci. Sickle-cell hemoglobin, which causes sickle-cell anemia, arises from a mutation in the β chain of adult hemoglobin. Adult hemoglobin and sickle-cell hemoglobin differ in a single amino acid: the sixth amino acid from one end in adult hemoglobin is glutamic acid, whereas sickle-cell hemoglobin has valine at this position. After consulting the genetic code provided in Figure 15.12, indicate the type and location of the mutation that gave rise to sickle-cell anemia.
*20. The following nucleotide sequence is found on the template strand of DNA. First, determine the amino acids of the protein encoded by this sequence by using the genetic code provided in Figure 15.12.
Then, give the altered amino acid sequence of the protein that will be found in each of the following mutations.

Sequence of DNA template: 3′–TAC TGG CCG TTA GTT GAT ATA ACT–5′
Nucleotide number: 1 (left end) through 24 (right end)

(a) Mutant 1: A transition at nucleotide 11.
(b) Mutant 2: A transition at nucleotide 13.
(c) Mutant 3: A one-nucleotide deletion at nucleotide 7.
(d) Mutant 4: A T → A transversion at nucleotide 15.
(e) Mutant 5: An addition of TGG after nucleotide 6.
(f) Mutant 6: A transition at nucleotide 9.

21. A polypeptide has the following amino acid sequence:

Met-Ser-Pro-Arg-Leu-Glu-Gly

The amino acid sequence of this polypeptide was determined in a series of mutants listed in parts a through e. For each mutant, indicate the type of change that occurred in the DNA (single-base substitution, insertion, deletion) and the phenotypic effect of the mutation (nonsense mutation, missense mutation, frameshift, etc.).

(a) Mutant 1: Met-Ser-Ser-Arg-Leu-Glu-Gly
(b) Mutant 2: Met-Ser-Pro
(c) Mutant 3: Met-Ser-Pro-Asp-Trp-Arg-Asp-Lys
(d) Mutant 4: Met-Ser-Pro-Glu-Gly
(e) Mutant 5: Met-Ser-Pro-Arg-Leu-Leu-Glu-Gly

*22. A gene encodes a protein with the following amino acid sequence: Met-Trp-His-Val-Ala-Ser-Phe. A mutation occurs in the gene. The mutant protein has the following amino acid sequence: Met-Trp-His-Met-Ala-Ser-Phe. An intragenic suppressor restores the amino acid sequence to that of the original protein: Met-Trp-His-Arg-Ala-Ser-Phe. Give at least one example of base changes that could produce the original mutation and the intragenic suppressor. (Consult the genetic code in Figure 15.12.)

23. A gene encodes a protein with the following amino acid sequence:

Met-Lys-Ser-Pro-Ala-Thr-Pro

A nonsense mutation from a single-base-pair substitution occurs in this gene, resulting in a protein with the amino acid sequence Met-Lys. An intergenic suppressor mutation allows the gene to produce the full-length protein.
With the original mutation and the intergenic suppressor present, the gene now produces a protein with the following amino acid sequence:

Met-Lys-Cys-Pro-Ala-Thr-Pro

Give the location and nature of the original mutation and the intergenic suppressor.

*24. Can nonsense mutations be reversed by hydroxylamine? Why or why not?

25. XG syndrome is a rare genetic disease that is due to an autosomal dominant gene. A complete census of a small European country reveals that 77,536 babies were born in 2000, of whom 3 had XG syndrome. In the same year, this country had a population of 5,964,321 people, and there were 35 living persons with XG syndrome. What are the mutation rate and mutation frequency of XG syndrome for this country?

*26. The following nucleotide sequence is found in a short stretch of DNA:

5′–ATGT–3′
3′–TACA–5′

If this sequence is treated with hydroxylamine, what sequences will result after replication?

27. The following nucleotide sequence is found in a short stretch of DNA:

5′–AG–3′
3′–TC–5′

(a) Give all the mutant sequences that may result from spontaneous depurination in this stretch of DNA. (b) Give all the mutant sequences that may result from spontaneous deamination in this stretch of DNA.

28. In many eukaryotic organisms, a significant proportion of cytosine bases are naturally methylated to 5-methylcytosine. Through evolutionary time, the proportion of AT base pairs in the DNA of these organisms increases. Can you suggest a possible mechanism by which this increase occurs?

*29. A chemist synthesizes four new chemical compounds in the laboratory and names them PFI1, PFI2, PFI3, and PFI4. He gives the PFI compounds to a geneticist friend and asks her to determine their mutagenic potential. The geneticist finds that all four are highly mutagenic. She also tests the capacity of mutations produced by the PFI compounds to be reversed by other known mutagens and obtains the following results. What conclusions can you make about the nature of the mutations produced by these compounds?

Mutations produced by | Reversed by 2-Aminopurine | Reversed by Nitrous acid | Reversed by Hydroxylamine | Reversed by Acridine orange
PFI1 | Yes | Yes | Some | No
PFI2 | No | No | No | No
PFI3 | Yes | Yes | No | No
PFI4 | No | No | No | Yes

*30. A plant breeder wants to isolate mutants in tomatoes that are defective in DNA repair. However, this breeder does not have the expertise or equipment to study enzymes in DNA repair systems. How could the breeder identify tomato plants that are deficient in DNA repair? What are the traits to look for?

31. A genetics instructor designs a laboratory experiment to study the effects of UV radiation on mutation in bacteria. In the experiment, the students expose bacteria plated on petri plates to UV light for different lengths of time, place the plates in an incubator for 48 hours, and then count the number of colonies that appear on each plate. The plates that have received more UV radiation should have more pyrimidine dimers, which block replication; thus, fewer colonies should appear on the plates exposed to UV light for longer periods of time. Before the students carry out the experiment, the instructor warns them that, while the bacteria are in the incubator, the students must not open the incubator door unless the room is darkened. Why should the bacteria not be exposed to light?

CHALLENGE QUESTIONS

32. Ochre and amber are two types of nonsense mutations. Before the genetic code was worked out, Sydney Brenner, Anthony O. Stretton, and Samuel Kaplan applied different types of mutagens to bacteriophages in an attempt to determine the bases present in the codons responsible for amber and ochre mutations. They knew that ochre and amber mutants were suppressed by different types of mutations, demonstrating that each was a different termination codon. They obtained the following results.
(1) A single-base substitution could convert an ochre mutation into an amber mutation.
(2) Hydroxylamine induced both ochre and amber mutations in wild-type phages.
(3) 2-Aminopurine caused ochre to mutate to amber.
(4) Hydroxylamine did not cause ochre to mutate to amber.

These data do not allow the complete nucleotide sequence of the amber and ochre codons to be worked out, but they do provide some information about the bases found in the nonsense mutations.
(a) What conclusions about the bases found in the codons of amber and ochre mutations can be made from these observations?
(b) Of the three nonsense codons (UAA, UAG, UGA), which represents the ochre mutation?

33. To determine whether radiation associated with the atomic bombings of Hiroshima and Nagasaki produced recessive germ-line mutations, scientists examined the sex ratio of the children of the survivors of the blasts. Can you explain why an increase in germ-line mutations might be expected to alter the sex ratio?

34. The results of several studies provide evidence that DNA repair is rapid in genes that are undergoing transcription and that some proteins that play a role in transcription also participate in DNA repair. How are transcription and DNA repair related? Why might a gene that is being transcribed be repaired faster than a gene that is not being transcribed?

18 Recombinant DNA Technology

• PCR and the Arrival of Tuberculosis in America
• Basic Concepts of Recombinant DNA Technology
The Impact of Recombinant DNA Technology
Working at the Molecular Level
• Recombinant DNA Techniques
Cutting and Joining DNA Fragments
Viewing DNA Fragments
Locating DNA Fragments with Southern Blotting and Probes
Cloning Genes
Finding Genes
Using the Polymerase Chain Reaction to Amplify DNA
Analyzing DNA Sequences
• Applications of Recombinant DNA Technology
Pharmaceuticals
Specialized Bacteria
Agricultural Products
Oligonucleotide Drugs
Genetic Testing
Gene Therapy
Gene Mapping
DNA Fingerprinting
Concerns About Recombinant DNA Technology

Disembarkation of the Spanish at Veracruz by Mexican artist Diego Rivera. Anthropologists have suggested that Europeans first transmitted tuberculosis to the Native Americans. The polymerase chain reaction (a technique for amplifying very small amounts of DNA) has now demonstrated the presence of the bacterium that causes tuberculosis in a 1000-year-old mummy from Peru, demonstrating that the disease was present in South America long before Europeans arrived. (Diego Rivera, Disembarkation of the Spanish at Veracruz, 1951. Schalkwijk/Art Resource.)

PCR and the Arrival of Tuberculosis in America

In the early 1600s, soon after Europeans arrived in the New World, devastating epidemics of tuberculosis ravaged many tribes of Native Americans. Anthropologists long argued that this disease was absent from the New World before 1492 and that Europeans first transmitted tuberculosis to the Native Americans. With no prior exposure to the disease and little natural immunity, the indigenous people would, it was argued, have been highly susceptible to tuberculosis.

A few anthropologists challenged this conventional view. On the basis of tuberculosis-like lesions found in a few skeletons and mummified remains that pre-date European contact, they suggested that the disease was present in Native Americans before European contact. On the other hand, many diseases and even bacteria that gain access to the body after death can produce similar marks, and the origin of tuberculosis in America remained controversial.

In 1994, pathologist Arthur C. Aufderheide and molecular biologist Wilmar Salo teamed up to resolve this controversy. Aufderheide obtained access to the remains of a woman who died and was naturally mummified in Peru about 1000 years ago, hundreds of years before Europeans arrived in South America. He removed samples of the woman’s right lung and a lymph node and sent them to Salo, who used the newly developed polymerase chain reaction (PCR, which selectively amplifies sequences of DNA) to search the samples for DNA from Mycobacterium tuberculosis, the bacterium that causes tuberculosis. When applied to DNA from modern-day Mycobacterium tuberculosis, this technique produces copies of a 97-bp fragment of DNA.
Salo detected an identical 97-bp piece of DNA by applying PCR to DNA samples from the ancient lung and lymph tissue, demonstrating unambiguously the presence of tuberculosis in the tissue of this 1000-year-old woman. Although Europeans were not the first to transmit tuberculosis to Native Americans, they were still the most likely cause of the tuberculosis epidemics that accompanied their arrival in the Americas. European settlement was highly disruptive to many indigenous societies, often causing mass displacements of people and radically altering their traditional life styles. Stressful conditions, accompanied by crowding and malnutrition on reservations, probably lowered the resistance of many Native Americans and contributed to the spread of tuberculosis. This story of the discovery of tuberculosis in the remains of a 1000-year-old woman illustrates the power of PCR, one of the techniques of molecular biology discussed in this chapter. We begin the chapter with a discussion of recombinant DNA technology and some of its effects. We then examine a number of methods used to isolate, study, alter, and recombine DNA sequences and place them back into cells. Finally, we explore some of the applications of recombinant DNA technology. In reading this chapter, it will be helpful to understand two things. First, working at the molecular level is quite different from working with whole organisms: different approaches are needed, because the molecular objects of study cannot be seen directly. Second, there are a number of different approaches for isolating DNA sequences, amplifying them, and inserting them into bacteria, each approach with its own strengths and weaknesses. The optimal method depends on the starting materials, how much is known about the sequences to be isolated, and what the final objective is. 
www.whfreeman.com/pierce More information about tuberculosis

Basic Concepts of Recombinant DNA Technology

In 1973, a group of scientists produced the first organisms with recombinant DNA molecules. Stanley Cohen at Stanford University and Herbert Boyer at the University of California School of Medicine at San Francisco and their colleagues inserted a piece of DNA from one plasmid into another, creating an entirely new, recombinant DNA molecule. They then introduced the recombinant plasmid into E. coli cells. Within a short time, they used the same methods to stitch together genes from two different types of bacteria, as well as to transfer genes from a frog to a bacterium. They called the hybrid DNA molecules chimeras, after the mythological Chimera, a creature with the head of a lion, the body of a goat, and the tail of a serpent. These experiments ushered in one of the most momentous revolutions in the history of science.

Recombinant DNA technology is a set of molecular techniques for locating, isolating, altering, and studying DNA segments. The term recombinant is used because frequently the goal is to combine DNA from two distinct sources. Genes from two different bacteria might be joined, for example, or a human gene might be inserted into a viral chromosome. Commonly called genetic engineering, recombinant DNA technology now encompasses an array of molecular techniques that can be used to analyze, alter, and recombine virtually any DNA sequences.

The Impact of Recombinant DNA Technology

Recombinant DNA technology has drastically altered the way that genes are studied. Previously, information about the structure and organization of genes was gained by examining their phenotypic effects, but the new technology makes it possible to read the nucleotide sequences themselves.
Previously, geneticists had to wait for the appearance of random or induced mutations to analyze the effects of genetic differences; now they can create mutations at precisely defined spots and see how they alter the phenotype.

Recombinant DNA technology has provided new information about the structure and function of genes and has altered many fundamental concepts of genetics. For example, whereas the genetic code was once thought to be entirely universal, we now know that nonuniversal codons exist in mitochondrial DNA. Previously, we thought that the organization of eukaryotic genes was like that of prokaryotes, but we now know that many eukaryotic genes are interrupted by introns. Much of what we know today about replication, transcription, translation, RNA processing, and gene regulation has been learned through the use of recombinant DNA techniques. These techniques are also used in many other fields, including biochemistry, microbiology, developmental biology, neurobiology, evolution, and ecology.

Recombinant DNA technology is also used to create a number of commercial products, including drugs, hormones, enzymes, and crops ( ◗ FIGURE 18.1). An entirely new industry, biotechnology, has grown up around the use of these techniques to develop new products. In medicine, recombinant DNA techniques are used to probe the nature of cancer, diagnose genetic and infectious diseases, produce drugs, and treat hereditary disorders.

◗ 18.1 Recombinant DNA technology has been used to create genetically modified crops. Genetically engineered corn, which produces a toxin that kills insect pests, now comprises over 30% of all corn grown in the United States. (Chris Knapton/Photo Researchers.)

Concepts
Recombinant DNA technology is a set of methods used to locate, analyze, alter, study, and recombine DNA sequences.
It is used to probe the structure and function of genes, address questions in many areas of biology, create commercial products, and diagnose and treat diseases.

www.whfreeman.com/pierce Information on genetic engineering and the biotechnology industry

Working at the Molecular Level

The manipulation of genes presents a serious challenge, often requiring strategies that may not, at first, seem obvious. The basic problem is that genes are minute and there are thousands of them in every cell. Even when viewed with the most powerful microscope, DNA appears as a tiny thread; individual nucleotides cannot be seen, and no physical features mark the beginning or the end of a gene.

To illustrate the problem, let’s consider a typical situation faced by a molecular geneticist. Suppose we wanted to isolate a particular human gene, place it inside bacterial cells, and use the bacteria to produce large quantities of the encoded human protein. The first and most formidable problem is to find the desired gene. A haploid human genome consists of 3.3 billion base pairs of DNA. Let’s assume that the gene that we want to isolate is 3000 bp long. Our target gene occupies only one-millionth of the genome; so searching for our gene in the huge expanse of genomic DNA is more difficult than looking for a needle in the proverbial haystack. But, even if we are able to locate the gene, how are we to separate it from the rest of the DNA? No forceps are small enough to pick up a single piece of DNA, and no mechanical scissors precise enough to snip out an individual gene.

If we did succeed in locating and isolating the desired gene, we would next need to insert it into a bacterial cell. Linear fragments of DNA are quickly degraded by bacteria; so the gene must be inserted in a stable form. It must also be able to successfully replicate or it will not be passed on when the cell divides.
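The "one-millionth" figure quoted earlier in this section can be checked with a short calculation. The sketch below is only an illustration: it uses the genome size and gene length given in the text, and the variable names are invented for this example.

```python
# Figures taken from the text; the names are illustrative only.
GENOME_BP = 3_300_000_000   # haploid human genome, in base pairs
GENE_BP = 3_000             # assumed length of the target gene

# Fraction of the genome occupied by the target gene.
fraction = GENE_BP / GENOME_BP
# How many gene-sized stretches would fit in the genome.
one_part_in = GENOME_BP // GENE_BP

print(f"Fraction of genome: {fraction:.2e}")
print(f"Roughly 1 part in {one_part_in:,}")
```

The result is close to one part in a million, which is the order of magnitude the text relies on for its needle-in-a-haystack comparison.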
If we succeed in transferring our gene to bacteria in a stable form, we still must ensure that the gene is properly transcribed and translated. Gene expression is a complex process requiring a number of DNA sequence elements, some of which lie outside the gene itself (Chapters 13 through 16). All of these elements must be present in their proper orientations and positions for the protein to be produced.

Finally, the methods used to isolate and transfer genes are inefficient and, of a million cells that are subjected to these procedures, only one cell might successfully take up and express the human gene. So we must search through many bacterial cells to find the one containing the recombinant DNA. We are back to the problem of the needle in the haystack.

Although these problems might seem insurmountable, molecular techniques have been developed to overcome all of them, and human genes are routinely transferred to bacterial cells, where the genes are expressed.

Concepts
Recombinant DNA technology requires special methods because individual genes make up a tiny fraction of the cellular DNA and they cannot be seen.

Recombinant DNA Techniques

In the sections that follow, we will examine some of the following techniques of recombinant DNA technology and see how they are used to create recombinant DNA molecules:

1. Methods for locating specific DNA sequences
2. Techniques for cutting DNA at precise locations
3. Procedures for amplifying a particular DNA sequence billions of times, producing enough copies of a DNA sequence to carry out further manipulations
4. Methods for mutating and joining DNA fragments to produce desired sequences
5. Procedures for transferring DNA sequences into recipient cells

Cutting and Joining DNA Fragments

The key development that made recombinant DNA technology possible was the discovery in the late 1960s of restriction enzymes (also called restriction endonucleases) that recognize and make double-stranded cuts in the sugar–phosphate backbone of DNA molecules at specific nucleotide sequences. These enzymes are produced naturally by bacteria, where they are used in defense against viruses. In bacteria, restriction enzymes recognize particular sequences in viral DNA and then cut it up. A bacterium protects its own DNA from a restriction enzyme by modifying the recognition sequence, usually by adding methyl groups to its DNA.

Three types of restriction enzymes have been isolated from bacteria (Table 18.1). Type I restriction enzymes recognize specific sequences in the DNA but cut the DNA at random sites that may be some distance (1000 bp or more) from the recognition sequence. Type III restriction enzymes recognize specific sequences and cut the DNA at nearby sites, usually about 25 bp away. Type II restriction enzymes recognize specific sequences and cut the DNA within the recognition sequence. Virtually all work on recombinant DNA is done with type II restriction enzymes; discussions of restriction enzymes throughout this book refer to type II enzymes.

Table 18.1 Types of restriction enzymes

| Type | Activity of Enzyme | ATP Required | Cleavage Site |
|---|---|---|---|
| I | Cleavage and methylation | Yes | Random sites distant from recognition site |
| II | Cleavage only | No | Within recognition site |
| III | Cleavage and methylation | Yes | Random sites near recognition site |

More than 800 different restriction enzymes that recognize and cut DNA at more than 100 different sequences have been isolated from bacteria. Many of these enzymes are commercially available; examples of some commonly used restriction enzymes are given in Table 18.2.
Each restriction enzyme is referred to by a short abbreviation that signifies its bacterial origin.

The sequences recognized by restriction enzymes are usually from 4 to 8 bp long; most enzymes recognize a sequence of 4 or 6 bp. Most recognition sequences are palindromic: they read the same in the 5′-to-3′ direction on both strands. Notice in Table 18.2 that the sequence on the bottom strand is the same as the sequence on the top strand, only reversed. Nearly all type II restriction enzymes used in recombinant DNA work recognize palindromic sequences.

Some of the enzymes make staggered cuts in the DNA. For example, HindIII recognizes the following sequence:

5′–A↓AGCTT–3′
3′–TTCGA↑A–5′

HindIII cuts the sugar–phosphate backbone of each strand at the point indicated by the arrow, generating fragments with short, single-stranded overhanging ends:

5′–A        AGCTT–3′
3′–TTCGA        A–5′

Such ends are called cohesive ends or sticky ends, because they are complementary to each other and can spontaneously pair to connect the fragments. Thus DNA fragments can be “glued” together: any two fragments cleaved by the same enzyme will have complementary ends and will pair ( ◗ FIGURE 18.2). When their cohesive ends have paired, two DNA fragments can be joined together permanently by the enzyme DNA ligase, which seals nicks between the sugar–phosphate groups of the fragments.

Not all restriction enzymes produce staggered cuts and sticky ends. PvuII cuts in the middle of its recognition site, producing blunt-ended fragments:

Cleavage site
5′–CAG↓CTG–3′
3′–GTC↑GAC–5′

5′–CAG    CTG–3′
3′–GTC    GAC–5′

Fragments with blunt ends must be joined together in other ways, which will be discussed later.
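The ideas of a palindromic recognition sequence and of staggered versus blunt cuts can be made concrete with a short sketch. This is a minimal illustration, not a laboratory tool: the function names are invented here, only the top strand of a linear molecule is modeled, and the cut positions (after the first base for HindIII, after the third base for PvuII) are taken from the sequences shown above.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the opposite strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def is_palindromic(site):
    """True if both strands read the same in the 5'-to-3' direction."""
    return site == reverse_complement(site)

def overhang_length(site, cut_offset):
    """Single-stranded overhang left by a symmetric cut in a palindromic site.

    cut_offset is the number of bases of the site to the left of the
    top-strand cut. A result of 0 means blunt ends.
    """
    return abs(len(site) - 2 * cut_offset)

def digest(dna, site, cut_offset):
    """Cut a linear top strand at every occurrence of site; return fragments."""
    fragments = []
    start = 0
    pos = dna.find(site)
    while pos != -1:
        fragments.append(dna[start:pos + cut_offset])
        start = pos + cut_offset
        pos = dna.find(site, pos + 1)
    fragments.append(dna[start:])
    return fragments

# A made-up test sequence containing one HindIII site and one PvuII site.
dna = "GGAAGCTTCCCAGCTGTT"
print(is_palindromic("AAGCTT"), overhang_length("AAGCTT", 1))  # HindIII
print(is_palindromic("CAGCTG"), overhang_length("CAGCTG", 3))  # PvuII
print(digest(dna, "AAGCTT", 1))
print(digest(dna, "CAGCTG", 3))
```

In this sketch HindIII leaves a four-base overhang (the sticky end AGCT), whereas PvuII leaves an overhang of zero, that is, a blunt end. One site in a linear molecule yields two fragments in each digest.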
Table 18.2 Characteristics of some common type II restriction enzymes used in recombinant DNA technology

| Enzyme | Microorganism from Which Enzyme Is Isolated | Recognition Sequence | Type of Fragment End Produced |
|---|---|---|---|
| BamHI | Bacillus amyloliquefaciens | 5′–GGATCC–3′ / 3′–CCTAGG–5′ | Cohesive |
| CofI | Clostridium formicoaceticum | 5′–GCGC–3′ / 3′–CGCG–5′ | Cohesive |
| DraI | Deinococcus radiophilus | 5′–TTTAAA–3′ / 3′–AAATTT–5′ | Blunt |
| EcoRI | Escherichia coli | 5′–GAATTC–3′ / 3′–CTTAAG–5′ | Cohesive |
| EcoRII | Escherichia coli | 5′–CCAGG–3′ / 3′–GGTCC–5′ | Cohesive |
| HaeIII | Haemophilus aegyptius | 5′–GGCC–3′ / 3′–CCGG–5′ | Blunt |
| HindIII | Haemophilus influenzae | 5′–AAGCTT–3′ / 3′–TTCGAA–5′ | Cohesive |
| HpaII | Haemophilus parainfluenzae | 5′–CCGG–3′ / 3′–GGCC–5′ | Cohesive |
| NotI | Nocardia otitidis-caviarum | 5′–GCGGCCGC–3′ / 3′–CGCCGGCG–5′ | Cohesive |
| PstI | Providencia stuartii | 5′–CTGCAG–3′ / 3′–GACGTC–5′ | Cohesive |
| PvuII | Proteus vulgaris | 5′–CAGCTG–3′ / 3′–GTCGAC–5′ | Blunt |
| SmaI | Serratia marcescens | 5′–CCCGGG–3′ / 3′–GGGCCC–5′ | Blunt |

Note: The first three letters of the abbreviation for each restriction enzyme refer to the bacterial species from which the enzyme was isolated (e.g., Eco refers to E. coli). A fourth letter may refer to the strain of bacteria from which the enzyme was isolated (the “R” in EcoRI indicates that this enzyme was isolated from the RY13 strain of E. coli). Roman numerals that follow the letters allow different enzymes from the same species to be identified. For convenience, molecular geneticists have come up with idiosyncratic pronunciations of the names: EcoRI is pronounced “echo-R-one,” HindIII is “hin-D-three,” and HaeIII is “hay-three.” These common pronunciations obey no formal rules and simply have to be learned.

◗ 18.2 Restriction enzymes make double-stranded cuts in the sugar–phosphate backbone of DNA, producing cohesive, or sticky, ends. Figure annotations: (1) Some restriction enzymes, such as HindIII, make staggered cuts in DNA, (2) producing single-stranded, cohesive (sticky) ends. (3) Other restriction enzymes, such as PvuII, (4) cut both strands of DNA straight across, producing blunt ends. (5) DNA molecules cut with the same restriction enzyme have complementary sticky ends that pair if fragments are mixed together. (6) The nicks in the sugar–phosphate backbone of the two fragments can be sealed by DNA ligase.

◗ 18.3 The number of restriction sites is related to the number of fragments produced when DNA is cut by a restriction enzyme. (a) Linear DNA, digestion at four sites by HindIII: with a linear piece of DNA, the number of fragments produced is one more than the number of restriction sites. (b) Circular DNA, digestion at four sites by HindIII: with a circular piece of DNA, the number of fragments produced is equal to the number of restriction sites.

The sequences recognized by a restriction enzyme occur randomly within genomic DNA. Consequently, there is a relation between the length of the recognition sequence and its frequency of occurrence: there are fewer long recognition sequences than short sequences because the probability of all the bases being in the required order is less.

Restriction enzymes are the workhorses of recombinant DNA technology and are used whenever DNA fragments must be cut or joined. In a typical restriction reaction, a concentrated solution of purified DNA is placed in a small tube with a buffer solution and a small amount of restriction enzyme.
The reaction mixture is then heated at the optimal temperature for the enzyme, usually 37°C. Within a few hours, the enzyme cuts all the restriction sites in the DNA, producing a set of DNA fragments ( ◗ FIGURE 18.3).

Concepts
Type II restriction enzymes cut DNA at specific base sequences. Some restriction enzymes make staggered cuts, producing DNA fragments with cohesive ends; others cut both strands straight across, producing blunt-ended fragments. There are fewer long recognition sequences in DNA than short sequences.

www.whfreeman.com/pierce Information on specific restriction enzymes

Viewing DNA Fragments

After the completion of a restriction reaction, a number of questions arise. Did the restriction enzyme cut the DNA? How many times was the DNA cut? What are the sizes of the resulting fragments? Gel electrophoresis provides us with a means of answering these questions.

Electrophoresis is a standard biochemical technique for separating molecules on the basis of their size and electrical charge. There are a number of different types of electrophoresis; to separate DNA molecules, gel electrophoresis is used. A porous gel is often made from agarose (a polysaccharide isolated from seaweed), which is melted in a buffer solution and poured into a plastic mold. As it cools, the agarose solidifies, making a gel that looks something like stiff gelatin.

Small indentations called wells are made at one end of the gel to hold solutions of DNA fragments ( ◗ FIGURE 18.4a), and an electrical current is passed through the gel. Because the phosphate of each DNA nucleotide carries a negative charge, the DNA fragments migrate toward the positive end of the gel ( ◗ FIGURE 18.4b). In this migration, the gel acts as a sieve; as the DNA molecules migrate toward the positive pole, they move through the pores between the gel particles. Small DNA fragments migrate more rapidly than do large ones and, with time, the fragments separate on the basis of their size.
The distance that each fragment migrates depends on its size. Typically, DNA fragments of known length (a marker sample) are placed in another well. By comparing the migration distance of the unknown fragments with the distance traveled by the marker fragments, one can determine the approximate size of the unknown fragments.

After electrophoresis, the DNA fragments are separated according to size ( ◗ FIGURE 18.4c). However, the DNA fragments are still too small to see; so the problem of visualizing the DNA needs to be addressed. Visualization can be accomplished in several ways. The simplest procedure is to stain the gel with a dye specific for nucleic acids, such as ethidium bromide, which wedges itself tightly (intercalates) between the bases of DNA. When exposed to UV light, ethidium bromide fluoresces bright orange; so copies of each DNA fragment appear as a brilliant orange band ( ◗ FIGURE 18.4d). The original concentrated sample of purified DNA contained millions of copies of a DNA molecule, and thus each band represents millions of copies of identical DNA fragments.

◗ 18.4 Gel electrophoresis can be used to separate DNA molecules on the basis of their size and electrical charge. (Photo courtesy of Carol Eng.)

Alternatively, DNA fragments can be visualized by adding a radioactive or chemical label to the DNA before it is placed in the gel. Nucleotides with radioactively labeled phosphate (32P) can be used as the substrate for DNA synthesis and will be incorporated into the newly synthesized DNA strand. In another method called end labeling, the bacteriophage enzyme polynucleotide kinase is used to transfer a single 32P to the 5′ end of each DNA strand. Radioactively labeled DNA can be detected with a technique called autoradiography (see Figure 10.4), in which a piece of X-ray film is placed on top of the gel. Radiation from the labeled DNA exposes the film, just as light exposes photographic film in a camera.
The developed autoradiograph gives a picture of the fragments in the gel; each DNA fragment appears as a dark band on the film. Chemical labels can be detected by adding antibodies or other substances that carry a dye and will attach to the relevant DNA, which can be visualized directly. Gel electrophoresis is used widely in recombinant DNA technology; it is often employed when there is a need to determine the number or size of DNA fragments or to isolate DNA fragments by size. For example, to determine the number and location of BamHI restriction sites in a plasmid, we might cut the plasmid by using the BamHI restriction enzyme and place the products of the restriction reaction in a well of an agarose gel. In another well of the same gel, we would place a set of control fragments of known size. After applying an electrical current to the gel for an hour or more, we would stain the gel with ethidium bromide and place it over a UV light. The appearance of three orange bands on the gel would indicate that the circular plasmid had been cut three times and that there are three BamHI restriction sites in the plasmid. A comparison of the migration distance of the plasmid fragments with the migration distance of the standard fragments would reveal the sizes of the fragments and the distances between the BamHI recognition sites. Concepts DNA fragments can be separated, and their sizes can be determined with the use of gel electrophoresis. The fragments can be viewed by using a dye that is specific for nucleic acids or by labeling the fragments with a radioactive or chemical tag. Locating DNA Fragments with Southern Blotting and Probes If a relatively small piece of DNA, such as a plasmid, is cut by a restriction enzyme, the few fragments produced can be seen as distinct bands on an electrophoretic gel. In contrast, if genomic DNA from a cell is cut by a restriction enzyme, a large number of fragments of different sizes are produced. 
A restriction enzyme that recognizes a four-base sequence would theoretically cut about once every 256 bp. The human genome, with 3.3 billion base pairs, would generate more than 12 million fragments when cut by this restriction enzyme. Separated by electrophoresis and stained, this large set of fragments would appear as a continuous smear on the gel because of the presence of so many fragments of differing size. Usually, one is interested in only a few of these fragments, perhaps those carrying a specific gene. How does one locate the desired fragments in such a large pool of DNA? One approach is to use a probe, which is a DNA or RNA molecule with a base sequence complementary to a sequence in the gene of interest. The bases on a probe will pair only with the bases on a complementary sequence and, if suitably tagged with an identifying label, the probe can be used to locate a specific gene or other DNA sequence. To use a probe, one first cuts the DNA into fragments by using one or more restriction enzymes and then separates the fragments with gel electrophoresis ( ◗ FIGURE 18.5). Next, the separated fragments must be denatured and transferred to a thinner solid medium (such as nitrocellulose or nylon membrane) to prevent diffusion. Southern blotting (named after Edwin M. Southern) is one technique used to transfer the denatured, single-stranded fragments from a gel to a thin solid medium. After the single-stranded DNA fragments have been transferred, the membrane is placed in a hybridization solution of a radioactively or chemically labeled probe (see Figure 18.5). The probe will bind to any DNA fragments on the membrane that bear complementary sequences. The membrane is then washed to remove any unbound probe; bound probe is detected by autoradiography or another method for chemically labeled probes. RNA can be transferred from a gel to a solid support by a related procedure called Northern blotting (not named after anyone but capitalized to match Southern). 
The hybridization of a probe can reveal the size of a particular mRNA molecule, its relative abundance, or the tissues in which the mRNA is transcribed. Western blotting is the transfer of protein from a gel to a membrane. Here, the probe is usually an antibody, used to determine the size of a particular protein and the pattern of the protein's expression.

◗ 18.5 Southern blotting and hybridization with probes can be used to locate a few specific fragments in a large pool of DNA. Steps shown: (1) DNA is cleaved by restriction enzymes, transferred to an agarose gel, and separated by gel electrophoresis; (2) the gel is soaked in an alkali solution to denature the double-stranded DNA and then placed on a platform in a dish containing buffer; (3) a membrane (nitrocellulose or other) is positioned on top of the gel; (4) buffer drawn up into the top layer of blotting paper passes through the gel, carrying DNA onto the membrane; (5) DNA on the membrane is fixed; (6) the membrane is placed in a hybridization bottle with a solution that contains a radioactively labeled probe and is gently rotated; (7) the probe binds to complementary DNA fragments on the membrane; (8) autoradiography detects the fragments with probe attached.

Concepts
Labeled probes, which are sequences of RNA or DNA that are complementary to the sequence of interest, can be used to locate individual genes or DNA sequences. Southern blotting can be used to transfer DNA fragments from a gel to a membrane such as nitrocellulose.

Cloning Genes
Many recombinant DNA methods require numerous copies of a specific DNA fragment. One way to obtain these copies is to place the fragment in a bacterial cell and allow the cell to replicate the DNA. This procedure is termed gene cloning, because identical copies (clones) of the original piece of DNA are produced.

Cloning vectors A cloning vector is a stable, replicating DNA molecule to which a foreign DNA fragment can be attached for introduction into a cell. An effective cloning vector has three important characteristics ( ◗ FIGURE 18.6): (1) an origin of replication, which ensures that the vector is replicated within the cell; (2) selectable markers, which enable any cells containing the vector to be selected or identified; (3) one or more unique restriction sites into which a DNA fragment can be inserted. The restriction sites used for cloning must be unique; if a vector is cut at multiple recognition sites, generating several pieces of DNA, there will be no way to get the pieces back together in the correct order. Three types of cloning vectors are commonly used for cloning genes in bacteria: plasmids, bacteriophages, and cosmids.

Plasmid vectors Plasmids are circular DNA molecules that exist naturally in bacteria (see p. 000 in Chapter 15). They contain origins of replication and are therefore able to replicate independently of the bacterial chromosome. The plasmids typically used in cloning have been constructed from the larger, naturally occurring bacterial plasmids.

The pUC19 plasmid is a typical cloning vector ( ◗ FIGURE 18.7). It has an origin of replication and two selectable markers: an ampicillin-resistance gene and a lacZ gene. Ampicillin is an antibiotic that normally kills bacterial cells, but any bacterium that contains a pUC19 plasmid will be resistant to this antibiotic. The lacZ gene encodes the enzyme β-galactosidase, which normally cleaves lactose to produce glucose and galactose (see p. 000 in Chapter 16). The enzyme will also cleave a chemical called X-gal, producing a blue substance; when X-gal is placed in the medium, any bacterial colonies that contain intact pUC19 plasmids will turn blue and can be easily identified. (In these experiments, the bacterium's own β-galactosidase gene has been inactivated, and so only bacteria with the plasmid turn blue.) The pUC19 plasmid also possesses a number of different unique restriction sites grouped together (a polylinker) that allow DNA fragments to be inserted into the plasmid.

◗ 18.7 The pUC19 plasmid is a typical cloning vector. It contains a cluster of unique restriction sites, an origin of replication, and two selectable markers: an ampicillin-resistance gene and a lacZ gene.

The easiest method for inserting a foreign DNA fragment into a plasmid is to use restriction cloning, in which the foreign DNA and the plasmid are cut by the same restriction enzyme. Restriction cloning produces complementary sticky ends on the foreign DNA and the plasmid ( ◗ FIGURE 18.8a). The DNA and plasmid are then mixed together; some of the foreign DNA fragments will pair with the cut ends of the plasmid. DNA ligase seals the nicks in the sugar–phosphate backbone, creating a recombinant plasmid that contains the foreign DNA fragment.

Although simple, restriction cloning has several disadvantages. First, restriction cloning requires that a single restriction site in the plasmid matches sites on both ends of the foreign sequence to be cloned. If this arrangement of restriction sites is not available, this relatively straightforward method cannot be used.
Second, this technique often leads to undesirable products. The sticky ends of the plasmid are complementary to each other, so the two ends of the plasmid will often simply reanneal, reproducing the intact plasmid. Alternatively, the two complementary ends of the cleaved foreign DNA may anneal, or several pieces of foreign DNA or several plasmids may join. However, these undesirable products do not constitute a serious problem if an efficient method is used for screening bacterial cells for the presence of a recombinant plasmid.

◗ 18.6 Three characteristics of an idealized cloning vector: an origin of replication, one or more selectable markers, and one or more unique restriction sites.

Another method for inserting DNA into a plasmid (a method that gets around the problem of undesired products) is tailing ( ◗ FIGURE 18.8b). In this procedure, complementary sticky ends are created on blunt-ended pieces of DNA. The plasmid and the foreign DNA are first cut by any restriction enzyme. If the restriction enzyme produces sticky ends, these ends are removed by an enzyme that digests single-stranded DNA. Alternatively, the plasmid and foreign DNA can be cut by a restriction enzyme that produces blunt ends. Once the plasmid and the foreign DNA have blunt ends, single-stranded sticky ends are added by an enzyme called terminal transferase, which adds any available nucleotides to the 3′ end of DNA in a template-independent reaction. For example, terminal transferase and deoxyadenosine triphosphate (dATP) might be mixed with the plasmid DNA, creating poly(A) single-stranded tails on the 3′ ends of the plasmid. Terminal transferase and deoxythymidine triphosphate (dTTP) could be mixed with the blunt-ended foreign DNA fragments, creating poly(T) single-stranded tails on their 3′ ends.
The poly(A) tail of the plasmid would be complementary to the poly(T) tail of the foreign DNA, allowing them to anneal and connecting the plasmid and foreign DNA together. DNA polymerase can be used to fill in any missing nucleotides, and DNA ligase can be used to seal the nicks in the sugar–phosphate backbone.

One advantage of tailing is that it prevents the production of the undesired products created by restriction cloning: the single-stranded ends of the plasmid are complementary only to the single-stranded ends of the foreign DNA. Another advantage is that identical restriction sites are not required in plasmid and foreign DNA; any restriction site can be used for cleavage. But tailing has several disadvantages of its own. First, it destroys the restriction site used to cut the original molecule, preventing later cleavage by the same restriction enzyme to retrieve the foreign DNA. Second, the new nucleotides (the complementary tails) introduced at the junctions between plasmid and foreign DNA sometimes interfere with the function of the cloned DNA.

A third method of inserting fragments into plasmids is to use the enzyme T4 ligase, which is capable of connecting any two pieces of blunt-ended DNA. Like tailing, this method requires no specific restriction sites and has great versatility; its chief drawback is that it creates a number of undesired products.

◗ 18.8 A foreign DNA fragment can be inserted into a plasmid with the use of (a) restriction cloning, (b) tailing, or (c) linkers.

A fourth method, and one commonly used today, is the use of linkers to add complementary ends to DNA molecules ( ◗ FIGURE 18.8c). Linkers are small, synthetic DNA fragments that contain one or more restriction sites. The foreign DNA of interest is cut by any restriction enzyme; if sticky ends are created, they are digested to produce blunt ends.
The linkers are then attached to the blunt ends by T4 ligase and cut by a restriction enzyme, generating sticky ends. The plasmid is cut by the same restriction enzyme, so its sticky ends are complementary to those on the foreign DNA. Mixing the plasmid and foreign DNA leads to the formation of recombinant DNA that can be stabilized by ligase. The great advantage of using linkers is that a particular restriction site can be added at almost any desired location, so any two pieces of DNA can be cut and joined.

Transformation When a gene has been placed inside a plasmid, the plasmid must be introduced into bacterial cells. This task is usually accomplished by transformation, which is the capacity of bacterial cells to take up DNA from the external environment (see Chapter 8). Some types of cells undergo transformation naturally; others must be treated chemically or physically before they will undergo transformation. Inside the cell, the plasmids replicate and multiply.

The use of selectable markers Cells bearing recombinant plasmids can be detected by using the selectable markers on the plasmid. One type of selectable marker commonly used with plasmids is a copy of the lacZ gene ( ◗ FIGURE 18.9). The lacZ gene contains a series of unique restriction sites into which a fragment of DNA to be cloned may be inserted. In the absence of an inserted fragment, the lacZ gene is active and produces β-galactosidase. When foreign DNA is inserted into the restriction site, it disrupts the lacZ gene, and β-galactosidase is not produced. The plasmid also usually contains a second selectable marker, which may be a gene that confers resistance to an antibiotic such as ampicillin. Bacteria that are lacZ− are transformed by the plasmids and plated on medium that contains ampicillin. Only cells that have been successfully transformed and contain a plasmid with the ampicillin-resistance gene will survive and grow.
Some of these cells will contain an intact plasmid, whereas others will possess a recombinant plasmid. The medium also contains the chemical X-gal. Bacterial cells with an intact original plasmid (without an inserted fragment) have a functional lacZ gene and can synthesize β-galactosidase, which cleaves X-gal and turns the bacteria blue. Bacterial cells with a recombinant plasmid, however, have a β-galactosidase gene that is disrupted by the inserted DNA; they do not synthesize β-galactosidase and remain white. Thus, the color of the colony allows quick determination of whether a recombinant or intact plasmid is present in the cell.

◗ 18.9 The lacZ gene can be used to screen bacteria containing recombinant plasmids. A special plasmid carries a copy of the lacZ gene and an ampicillin-resistance gene. (Photo: Cytographics/Visuals Unlimited.)

Plasmids make ideal cloning vectors but can hold only DNA less than about 15 kb in size. When large DNA fragments are inserted into a plasmid vector, the plasmid becomes unstable. Cloning DNA fragments that are longer than 15 kb requires the use of different cloning vectors.

Concepts
DNA fragments can be inserted into cloning vectors, stable pieces of DNA that will replicate within a cell. Cloning vectors must have an origin of replication, one or more unique restriction sites, and selectable markers. Plasmids are commonly used as cloning vectors.

Bacteriophage vectors Bacteriophages offer a number of advantages as cloning vectors. The most widely used bacteriophage vector is bacteriophage λ, which infects E. coli. One of its chief advantages is the high efficiency with which it transfers DNA into bacterial cells. A second advantage is that about a third of the λ genome is not essential for infection and reproduction; without these genes, a λ particle will still faithfully inject its DNA into a bacterial cell and reproduce. These nonessential genes, which comprise about 15 kb, can be replaced by as much as about 23 kb of foreign DNA. A third advantage is that DNA will not be packaged into a λ coat unless it is 40 to 50 kb long; so fragments of foreign DNA are not likely to be transferred by the vector unless they are inserted into the λ genome, which ensures that the foreign DNA fragment will be replicated after it enters the cell.

The essential genes of the phage λ genome are located in a cluster. Strains of phage λ, called replacement vectors, have been engineered with unique EcoRI sites on either side of the nonessential genes ( ◗ FIGURE 18.10) so that, by using EcoRI, the nonessential genes can be removed. Foreign DNA cut with EcoRI will have sticky ends that are complementary to those on the ends of the essential λ genes, to which it can be connected by ligase. The λ chromosome possesses short, single-stranded ends called cos sites that are required for packaging λ DNA into a phage head. The recombinant phage chromosomes can then be packaged into protein coats and added to E. coli. The phages inject their recombinant DNA into the cell, where it will be replicated. Only DNA fragments of the proper size and containing essential genes will be packaged into the phage coats, providing an automatic selection system for recombinant vectors.

◗ 18.10 Phage λ is an effective cloning vector. Steps shown: (1) cleavage of the 45-kb phage chromosome at the EcoRI sites allows the nonessential genes (about 15 kb) to be removed; (2) the left and right arms are mixed with foreign DNA (up to 23 kb) also cut by EcoRI; (3) complementary sticky ends of DNA anneal; (4) the recombinant phage chromosome is then packaged into λ protein coats.

Cosmid vectors Although only about 23 kb of DNA can be cloned in λ vectors, DNA fragments as large as about 44 kb can be cloned in cosmids, which combine the properties of plasmids and phage vectors. Cosmids are small plasmids that carry phage λ cos sites; they can be packaged into viral coats and transferred to bacteria by viral infection. Because the viral genes are missing and only the cos sites remain, a cosmid can carry nearly twice as much foreign DNA as can a phage vector. Cosmid vectors have the following components: (1) a plasmid origin of replication (ori); (2) one or more unique restriction sites; (3) one or more selectable markers; and (4) cos sites to allow the packaging of DNA into phage heads. Foreign DNA is inserted into cosmids in the same way that DNA is introduced into plasmids: the cosmid and foreign DNA are both cut by a restriction enzyme that produces complementary (sticky) ends, and they are joined by DNA ligase. Recombinant cosmids are packaged into phage coats, and the phage particles are used to infect bacterial cells, where the cosmid replicates as a plasmid. Table 18.3 compares the properties of plasmids, phage λ vectors, and cosmids.

Concepts
Bacteriophage vectors not only hold more DNA than do plasmids, but also transfer foreign DNA into bacterial cells at a relatively high rate. A cosmid vector consists of a plasmid with cos sites, which allow DNA to be packaged into phage protein coats. Cosmids hold more DNA than do bacteriophage vectors.

Table 18.3 Comparison of plasmids, phage lambda vectors, and cosmids

Cloning Vector    Size of DNA That Can Be Cloned    Method of Propagation    Introduction to Bacteria
Plasmid           As large as 15 kb                 Plasmid replication      Transformation
Phage lambda      As large as 23 kb                 Phage reproduction       Phage infection
Cosmid            As large as 44 kb                 Plasmid reproduction     Phage infection
Note: 1 kb = 1000 bp

Expression vectors Sometimes the goal in gene cloning is not just to replicate the gene, but also to produce the protein that it encodes. One of the first commercial products produced by recombinant DNA technology was the protein insulin. The gene for human insulin was isolated and inserted into bacteria, which were then multiplied and used to synthesize human insulin. However, the successful expression of a human gene in a bacterial cell is not a straightforward matter. Although the universality of the genetic code allows human genes to specify the same protein in both human and bacterial cells, the sequences that regulate transcription and translation are quite different in bacteria and eukaryotes.

To ensure transcription and translation, a foreign gene is usually inserted into an expression vector, which, in addition to the usual origin of replication, restriction sites, and selectable markers, contains sequences required for transcription and translation in bacterial cells ( ◗ FIGURE 18.11). These additional sequences may include:

1. A bacterial promoter, such as the lac promoter. The promoter precedes a restriction site where foreign DNA is to be inserted, allowing transcription of the foreign sequence to be regulated by adding substances that induce the promoter.
2. A DNA sequence that, when transcribed into RNA, produces a prokaryotic ribosome-binding site.
3. Prokaryotic transcription initiation and termination sequences.
4. Sequences that control transcription initiation, such as regulator genes and operators.

The bacterial promoter and ribosome-binding site are usually placed upstream of the restriction site, which allows the foreign DNA to be inserted just downstream of the initiation codon. When the plasmid is placed in a bacterial cell, RNA polymerase binds to the promoter and transcribes the foreign DNA. Bacterial ribosomes attach to the ribosome-binding site on the RNA and translate the sequence into a foreign protein.

Concepts
An expression vector contains a promoter, a ribosome-binding site, and other sequences that allow a cloned gene to be transcribed and translated in bacteria.
◗ 18.11 To ensure transcription and translation, a foreign gene may be inserted into an expression vector (in this example, an E. coli expression vector). Labeled elements: restriction sites, transcription termination sequence, operator (O), bacterial promoter (P) sequences, a gene encoding the repressor that binds O and regulates P, ribosome-binding site, transcription initiation sequences, ori, and a selectable genetic marker (e.g., antibiotic resistance). The vector also includes sequences that regulate (turn on or turn off) the desired gene.

Cloning vectors for eukaryotes The vectors discussed so far allow genes to be cloned in bacterial cells. Other cloning vectors have been developed for transferring genes into eukaryotic cells. Special plasmids, for example, have been developed for cloning in yeast, and retroviral vectors have been developed for cloning in mammals.

Shuttle vectors are used to shuttle genes back and forth between two hosts. For example, plasmids have been engineered that allow gene sequences to be cloned and manipulated in bacteria and then transferred to yeast cells for study. For this reason, they must contain replication origins and selectable markers that work in both hosts.

Yeast artificial chromosomes (YACs) are DNA molecules with a yeast origin of replication, a pair of telomeres, and a centromere. Mitotic spindle fibers attach to the centromere, and YACs segregate in the same way as yeast chromosomes; the telomeres ensure that YACs remain stable within the cell; and the origin of replication allows YACs to be replicated. YACs are particularly useful because they can carry DNA fragments as large as 600 kb, and some special YACs can carry inserts of more than 1000 kb. YACs have been modified so that they can be used in eukaryotic organisms other than yeast (see introduction to Chapter 11). Bacterial artificial chromosomes (BACs), constructed from F factors (see Chapter 8), are used to clone large fragments ranging in length from 100 to 500 kb in bacteria.
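The insert-size limits quoted so far can be collected into a small lookup. The sketch below is illustrative only: the function and table names are made up here, the capacities are the approximate upper limits given in Table 18.3 and in the text on YACs and BACs, and it deliberately ignores lower bounds (such as the 100-kb lower end of the BAC range) and the special YACs that exceed 1000 kb.

```python
# Approximate upper insert sizes in kb, as quoted in Table 18.3 and the
# accompanying text. Lower bounds and host requirements are ignored here.
VECTOR_CAPACITY_KB = [
    ("plasmid", 15),
    ("phage lambda", 23),
    ("cosmid", 44),
    ("BAC", 500),
    ("YAC", 600),
]


def candidate_vectors(insert_kb: float) -> list[str]:
    """Return the vectors whose quoted capacity is at least insert_kb."""
    return [name for name, capacity in VECTOR_CAPACITY_KB if insert_kb <= capacity]


if __name__ == "__main__":
    for size in (10, 30, 550):
        print(f"{size} kb insert: {candidate_vectors(size)}")
```

In practice the choice also depends on the host, the required copy number, and whether the protein must be expressed, none of which this lookup models.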
The soil bacterium Agrobacterium tumefaciens, which invades plants through wounds and induces crown galls (tumors), has been used to transfer genes to plants. This bacterium contains a large plasmid called the Ti plasmid, part of which is transferred to a plant cell when A. tumefaciens infects a plant. In the plant, the Ti plasmid DNA integrates into one of the plant chromosomes, where it is transcribed and translated to produce several enzymes that help support the bacterium ( ◗ FIGURE 18.12a).

Transfer of the DNA segment from the Ti plasmid to a plant chromosome requires two 25-bp sequences that flank the Ti DNA, as well as several genes located in the Ti plasmid. Geneticists have engineered an Agrobacterium–E. coli shuttle vector that contains the flanking sequences required to transfer DNA, a selectable marker, and restriction sites into which foreign DNA can be inserted ( ◗ FIGURE 18.12b). When placed in A. tumefaciens with the Ti plasmid, the shuttle vector will transfer the foreign DNA that it carries into a plant cell, where it will integrate into a plant chromosome. This vector has been used to transfer genes that confer economically significant attributes such as resistances to herbicides, plant viruses, and insect pests.

◗ 18.12 The Ti plasmid can be used to transfer genes into plants. (a) In natural gene transfer, (1) Agrobacterium invades the plant at a wound, (2) part of the Ti plasmid is transferred to the plant cell, and (3) it integrates into one of the plant chromosomes. (b) With a shuttle vector, (4) foreign DNA is inserted into an Agrobacterium–E. coli shuttle vector and (5) transferred to Agrobacterium tumefaciens with the Ti plasmid; (6) the helper Ti plasmid is required for infection; (7) the shuttle vector, along with any foreign DNA that it carries, (8) is then transferred to a plant cell, where it integrates into a plant chromosome.

Concepts
Special cloning vectors are used for introducing genes into eukaryotes; they include shuttle vectors, which can reproduce in two different hosts; yeast and bacterial artificial chromosomes, which hold DNA fragments hundreds of thousands of base pairs in length; and the Ti plasmid, which transfers genes to plants.

Finding Genes
In our consideration of gene cloning, we've glossed over a problem of major significance: How do we find the DNA sequence to be cloned in the first place? In fact, this problem is frequently the most significant one in cloning, because there are often millions or billions of base pairs of DNA in a cell. A discussion of how to solve this problem has been purposely delayed until now because, paradoxically, one must often clone a gene to find it.

This approach (to clone first and search later) is called “shotgun cloning,” because it is like hunting with a shotgun: one sprays one's shots widely in the general direction of the quarry, knowing that there is a good chance that one or more of the pellets will hit the intended target. In shotgun cloning, one first clones a large number of DNA fragments, knowing that one or more contains the DNA of interest, and then searches for the fragment of interest among the clones.

A collection of clones containing all the DNA fragments from one source is called a DNA library. For example, we might isolate genomic DNA from human cells, break it into fragments, and clone all of them in bacterial cells or phages. The set of bacterial colonies or phages containing these fragments is a human genomic library, containing all the DNA sequences found in the human genome.
Creating a genomic library To create a genomic library, cells are collected and disrupted, which causes them to release their DNA and other cellular contents into an aqueous solution. There are several methods for isolating the DNA from the other cellular contents. In one method, phenol (an organic solvent that does not mix well with water) is added to the mixture, which is then shaken. The proteins from the cell associate with phenol, whereas the DNA and RNA remain in the aqueous solution, which is removed with the use of a pipette. The nucleic acids are then precipitated from this solution by adding cold alcohol. RNA can be removed by adding an enzyme that degrades RNA but not DNA.

When DNA has been extracted, it is cut into fragments by using a restriction enzyme to digest it for a limited amount of time only (a partial digestion) so that only some of the restriction sites in each DNA molecule are cut. Because the sites that happen to be cut vary at random from molecule to molecule, different DNA molecules will be cut in different places, and a set of overlapping fragments will be produced ( ◗ FIGURE 18.13). The fragments are then joined to plasmid, phage, or cosmid vectors, which can be transferred to bacteria. This technique produces a set of bacterial cells or phage particles containing the overlapping genomic fragments. A few of the clones contain the entire gene of interest, a few contain parts of the gene, but most contain fragments that have no part of the gene of interest.

A genomic library must contain a large number of clones to ensure that all DNA sequences in the genome are represented in the library. A library of the human genome formed by using cosmids, each carrying a random DNA fragment from 35,000 to 44,000 bp long, would require about 350,000 cosmid clones to provide a 99% chance that every sequence is included in the library.
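The text does not say how the 350,000 figure was obtained. One standard way to estimate library size, offered here as an assumption rather than as the calculation the chapter used, is the relation N = ln(1 − P) / ln(1 − f), where P is the desired probability that a given sequence is present and f is the fraction of the genome carried by one insert. The sketch below applies it with the 3.3-billion-bp genome size quoted earlier in the chapter; the function name is illustrative.

```python
import math


def clones_needed(genome_bp: float, insert_bp: float, probability: float) -> int:
    """Estimate the number of random clones needed so that a given sequence
    is present with the stated probability: N = ln(1 - P) / ln(1 - f)."""
    fraction = insert_bp / genome_bp  # share of the genome in one insert
    # log1p(-x) computes ln(1 - x) accurately when x is very small.
    return math.ceil(math.log1p(-probability) / math.log1p(-fraction))


if __name__ == "__main__":
    human_genome_bp = 3.3e9  # genome size quoted in the chapter
    for insert_bp in (35_000, 44_000):
        n = clones_needed(human_genome_bp, insert_bp, 0.99)
        print(f"{insert_bp} bp inserts: about {n:,} clones")
```

With 44,000-bp inserts the estimate comes out near 345,000 clones, in line with the figure in the text; shorter inserts raise the number, because each clone then covers a smaller fraction of the genome.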
Creating a cDNA library An alternative to creating a genomic library is to create a library consisting only of those DNA sequences that are transcribed into mRNA (called a cDNA library because all the DNA in this library is complementary to mRNA). Much of eukaryotic DNA consists of repetitive (and other DNA) sequences that are not transcribed into mRNA (see p. 000 in Chapter 11), and these sequences are not represented in a cDNA library.

A cDNA library has two additional advantages. First, it is enriched with fragments from actively transcribed genes. Second, introns do not interrupt the cloned sequences; introns would pose a problem when the goal is to produce a eukaryotic protein in bacteria, because most bacteria have no means of removing the introns.

The disadvantage of a cDNA library is that it contains only sequences that are present in mature mRNA. Introns and any other sequences that are altered after transcription are not present; sequences that are not transcribed into RNA, such as promoters and enhancers, also are not present in a cDNA library. It is also important to note that the cDNA library represents only those gene sequences expressed in the tissue from which the RNA was isolated. Furthermore, the frequency of a particular DNA sequence in a cDNA library depends on the abundance of the corresponding mRNA in the given tissue. In contrast, almost all genes are present at the same frequency in a genomic DNA library.

To create a cDNA library, messenger RNA must first be separated from other types of cellular RNA (tRNA, rRNA, snRNA, etc.). Most eukaryotic mRNAs possess a string of adenine nucleotides at the 3′ end, and this poly(A) tail provides a convenient hook for separating eukaryotic mRNA from the other types.
Total cellular RNA is isolated from cells and poured through a column packed with short fragments of DNA consisting entirely of thymine nucleotides [oligo(dT) chains; ◗ FIGURE 18.14a]. As the RNA moves through the column, the poly(A) tails of mRNA molecules pair with the oligo(dT) chains and are retained in the column, whereas the rest of the RNA passes through. The mRNA can then be washed from the column by adding a buffer that breaks the hydrogen bonds between poly(A) tails and oligo(dT) chains.

◗ 18.13 A genomic library contains all of the DNA sequences found in an organism's genome. Steps shown: (1) multiple copies of genomic DNA are digested by a restriction enzyme for a limited time so that only some of the restriction sites in each molecule are cut; (2) different DNA molecules are cut in different places, providing a set of overlapping fragments; (3) each fragment is then joined to a cloning vector (4) and transferred to a bacterial cell, (5) producing a set of clones containing overlapping genomic fragments, some of which may include segments of the gene of interest. Conclusion: Some clones contain the entire gene of interest, others include part of the gene, and most contain none of the gene of interest.

The mRNA molecules are then copied into cDNA by reverse transcription. Short oligo(dT) primers are added to the mRNA. A primer pairs with the poly(A) tail at the 3′ end of the mRNA, providing a 3′-OH group for the initiation of DNA synthesis ( ◗ FIGURE 18.14b). Reverse transcriptase, an enzyme isolated from retroviruses (see p. 000 in Chapter 8), synthesizes single-stranded complementary DNA from the RNA template by adding DNA nucleotides to the 3′-OH group of the primer. The resulting RNA–DNA hybrid molecule is then converted into a double-stranded cDNA molecule by one of several methods. One common method is to treat the RNA–DNA hybrid with RNase to partly digest the RNA strand.
Partial digestion leaves gaps in the RNA–DNA hybrid, allowing DNA polymerase to synthesize a second DNA strand by using the short undigested RNA pieces as primers and the first DNA strand as a template. DNA polymerase eventually displaces all the RNA fragments, replacing them with DNA nucleotides, and nicks in the sugar–phosphate backbone are sealed by DNA ligase.

◗ 18.14 A cDNA library contains only those DNA sequences that are transcribed into mRNA.

Concepts

One method of finding a gene is to create and screen a DNA library. A genomic library is created by cutting genomic DNA into overlapping fragments and cloning each fragment in a separate bacterial cell. A cDNA library is created from mRNA that is converted into cDNA and cloned in bacteria.

Screening DNA libraries

Creating a genomic or cDNA library is relatively easy compared with screening the library to find clones that contain the gene of interest. The screening procedure used depends on what is known about the gene.

The first step in screening is to plate out the clones of the library. If a plasmid or cosmid vector was used to construct the library, the cells are diluted and plated so that each bacterium grows into a distinct colony. If a phage vector was used, the phages are allowed to infect a lawn of bacteria on a petri plate. Each plaque or bacterial colony contains a single, cloned DNA fragment that must be screened for the gene of interest.

One common way to screen libraries is with probes. We’ve seen how probes can be used to find specific fragments of DNA on an electrophoretic gel (see Figure 18.5). In a similar way, probes can be used to find cloned fragments of DNA in bacteria or phages. To use a probe, replicas of the plated colonies or plaques in the library must first be made. ◗ FIGURE 18.15 illustrates this procedure for a cosmid library.

How is a probe obtained when the gene has not yet been isolated? One option is to use a similar gene from another organism as the probe. For example, if we wanted to screen a human genomic library for the growth-hormone gene and the gene had already been isolated from rats, we could use a purified rat gene sequence as the probe to find the human gene for growth hormone. Successful hybridization does not require perfect complementarity between the probe and the target sequence; so a related sequence can often be used as a probe.
The temperature and salt concentration of the hybridization reaction can be adjusted to regulate the degree of complementarity required for pairing to take place.

Alternatively, synthetic probes can be created if the protein produced by the gene has been isolated and its amino acid sequence has been determined. With the use of the genetic code and the amino acid sequence of the protein, possible nucleotide sequences of a small region of the gene can be deduced. Although only one sequence in the gene encodes a particular protein, the presence of synonymous codons means that the same protein could be produced by several different DNA sequences, and it is impossible to know which is correct. To overcome this problem, a mixture of all the possible DNA sequences is used as a probe. To minimize the number of sequences required in the mixture, a region of the protein is selected with relatively little degeneracy in its codons (◗ FIGURE 18.16).

◗ 18.15 Genomic and cDNA libraries may be screened with a probe to find the gene of interest.

When part of the DNA sequence of the gene has been determined, a set of DNA probes can be synthesized chemically by using an automated machine known as an oligonucleotide synthesizer. The resulting probes can be used to screen a library for a gene of interest.

Yet another method of screening a library is to look for the protein product of a gene. This method requires that the DNA library be cloned in an expression vector. The clones can be tested for the presence of the protein by using an antibody that recognizes the protein or by using a chemical test for the protein product. This method depends on the existence of a test for the protein produced by the gene.

Almost any method used to screen a library will identify several clones, some of which will be false positives that do not contain the gene of interest; several screening methods may be needed to determine which clones actually contain the gene.
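The synthetic-probe strategy described above can be illustrated with a short Python sketch. It counts how many nucleotide sequences could encode a stretch of amino acids and picks the least degenerate window of a peptide. The function names (`degeneracy`, `all_coding_sequences`, `best_window`) are hypothetical, introduced only for illustration; the codon table is the standard genetic code written as RNA codons, and the two peptides are those of Figure 18.16 in one-letter amino acid code. This is a minimal sketch under those assumptions, not a probe-design tool: real probes are DNA, and practical design also weighs length, base composition, and hybridization conditions.

```python
from itertools import product
from math import prod

# Standard genetic code (RNA codons), limited to the amino acids
# that appear in the two example peptides of Figure 18.16.
CODONS = {
    "A": ["GCU", "GCC", "GCA", "GCG"],
    "R": ["CGU", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "N": ["AAU", "AAC"],
    "D": ["GAU", "GAC"],
    "Q": ["CAA", "CAG"],
    "E": ["GAA", "GAG"],
    "G": ["GGU", "GGC", "GGA", "GGG"],
    "L": ["UUA", "UUG", "CUU", "CUC", "CUA", "CUG"],
    "M": ["AUG"],
    "F": ["UUU", "UUC"],
    "P": ["CCU", "CCC", "CCA", "CCG"],
    "S": ["UCU", "UCC", "UCA", "UCG", "AGU", "AGC"],
    "T": ["ACU", "ACC", "ACA", "ACG"],
    "W": ["UGG"],
    "Y": ["UAU", "UAC"],
    "V": ["GUU", "GUC", "GUA", "GUG"],
}


def degeneracy(peptide):
    """Number of distinct nucleotide sequences that encode the peptide."""
    return prod(len(CODONS[aa]) for aa in peptide)


def all_coding_sequences(peptide):
    """Every nucleotide sequence that could encode the peptide."""
    return ["".join(codons) for codons in product(*(CODONS[aa] for aa in peptide))]


def best_window(peptide, length):
    """Return the window of the given length with the fewest coding sequences."""
    windows = [peptide[i:i + length] for i in range(len(peptide) - length + 1)]
    best = min(windows, key=degeneracy)
    return best, degeneracy(best)


# The two peptides from Figure 18.16.
left = "GVRMDWNYEPLS"
right = "TWEMNQWFVRA"
print(best_window(left, 6))   # least degenerate six-residue region on the left
print(best_window(right, 6))  # least degenerate six-residue region on the right
```

For the two peptides in the figure, the sketch selects Met-Asp-Trp-Asn-Tyr-Glu (16 sequences) and Trp-Glu-Met-Asn-Gln-Trp (8 sequences), matching the counts given in the figure.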
Concepts

A DNA library can be screened for a specific gene by using complementary probes that hybridize to the gene. Alternatively, the library can be cloned into an expression vector, and the gene can be located by examining the clones for the protein product of the gene.

◗ 18.16 A synthetic probe can be designed on the basis of the genetic code and the known amino acid sequence of the protein encoded by the gene of interest. Because of ambiguity in the code, the same protein can be encoded by several different DNA sequences, and probes consisting of all the possible DNA sequences must be synthesized. To minimize the number of sequences that must be synthesized, a region of the gene with minimal degeneracy is picked. In the first example (known amino acid sequence Gly Val Arg Met Asp Trp Asn Tyr Glu Pro Leu Ser), the region Met Asp Trp Asn Tyr Glu can be encoded by AUG GA(U/C) UGG AA(U/C) UA(U/C) GA(G/A), giving 2 × 2 × 2 × 2 = 16 possible sequences. In the second example (Thr Trp Glu Met Asn Gln Trp Phe Val Arg Ala), the region Trp Glu Met Asn Gln Trp can be encoded by UGG GA(G/A) AUG AA(U/C) CA(G/A) UGG, giving 2 × 2 × 2 = 8 possible sequences. The second sequence would make a better probe because there is less degeneracy than in the first.

Chromosome walking

For many genes with important functions, no associated protein product is yet known. The biochemical bases of many human genetic diseases, for example, are still unknown. How could these genes be isolated? One approach is to first determine the general location of the gene on the chromosome by using recombination frequencies derived from crosses or pedigrees (see Chapter 7). After the gene has been placed on a chromosome map, neighboring genes that have already been cloned can be identified.
With the use of a technique called chromosome walking (◗ FIGURE 18.17), it is possible to move from these neighboring genes to the new gene of interest.

The basis of chromosome walking is the fact that a genomic library consists of a set of overlapping DNA fragments (see Figure 18.13). We start with a cloned gene or DNA sequence that is close to the new gene of interest so that the “walk” will be as short as possible. One end of the clone of a neighboring gene (clone A in Figure 18.17) is used to make a complementary probe. This probe is used to screen the genomic library to find a second clone (clone B) that overlaps with the first and extends in the direction of the gene of interest. This second clone is isolated and purified and a probe is prepared from its end. The second probe is used to screen the library for a third clone (clone C) that overlaps with the second. In this way, one can walk systematically toward the gene of interest, one clone at a time. A number of important human genes and genes of other organisms have been found in this way.

◗ 18.17 In chromosome walking, neighboring genes are used to locate a gene of interest. Conclusion: By making probes complementary to areas of overlap between cloned fragments in a genomic library, we can connect a gene of interest to a previously mapped, linked gene.

Concepts

In chromosome walking, a gene is first mapped in relation to a previously cloned gene. A probe made from one end of the cloned gene is used to find an overlapping clone, which is then used to find another overlapping clone.
In this way, it is possible to walk down the chromosome to the gene of interest.

Connecting Concepts

Cloning Strategies

All gene-cloning experiments have four basic steps:

1. Isolation of a DNA fragment
2. Joining of the fragment to a cloning vector
3. Introduction of the cloning vector, along with the inserted DNA fragment, into host cells
4. Identification of cells containing the recombinant DNA molecule

We’ve now considered a number of different methods for carrying out these four steps. There is no single procedure for cloning a gene but rather a variety of methods, each with its strengths and weaknesses. The particular combination of methods chosen for a cloning experiment is termed the cloning strategy.

In developing a cloning strategy, a number of factors must be taken into consideration. These factors include how much is known about the gene to be cloned, the size and nature of the gene, and the ultimate purpose of the cloning experiment. The procedure for cloning a small, well-characterized DNA fragment for sequencing would be very different from that for cloning a large, poorly known gene for the commercial production of a protein.

The first step in gene cloning is to find the particular gene or DNA fragment of interest. There are two basic approaches. In one approach, a DNA library can be constructed from genomic DNA or cDNA, and the library can be screened to find the gene of interest. In the other approach, the gene can first be isolated and then cloned. Which approach is used depends largely on what is already known about the gene. Has the gene been mapped? Is there a probe available for screening? Is the amino acid sequence of a protein encoded by the gene known?

If one chooses to make and screen a DNA library, the next problem is to select the best source of DNA. Will it be a genomic library or a cDNA library? If the purpose is to clone the gene in an expression vector and produce a protein, then a cDNA library is ideal.
Using a cDNA library means that introns (which bacteria cannot splice out) will be excluded, and fewer colonies will need to be screened. If, on the other hand, the purpose is to examine the regulatory sequences or the introns within a gene, then a genomic library is required.

The next important decision in developing a cloning strategy is to select the cloning vector. The choice depends on a number of factors:

• The need for efficiency of transfer to host cells. If selection methods can be used to screen a large number of cells, then a low rate of transfer may be adequate; but, if screening is less efficient or is costly, a higher rate of transfer may be desirable.
• The length of the sequence to be cloned. For a sequence only a few thousand base pairs in length, a plasmid may be the best choice; if one wants to clone a gene that is 35 kb or longer, a cosmid will be required.
• The organism in which the gene will be cloned. Some vectors are specific for E. coli, whereas others are specific for other bacteria or for eukaryotic cells.
• The selection methods used to find cells containing a plasmid with the inserted gene. One may need a vector with selectable markers so that cells containing the gene can be identified.
• The need for the inserted gene to be expressed. If the protein product is desired, it may be necessary to use an expression vector that contains a promoter and other sequences that ensure transcription and translation of the inserted gene.

A cloning strategy must also take into consideration the best method for joining the DNA fragment and cloning vector. Important points here include the simplicity and ease of the method, the need to retain restriction sites so that the foreign gene can later be retrieved from the vector, and whether the gene sequence must be joined to a promoter and other regulatory sequences to ensure transcription.

The method chosen for moving the vector into the host cell is usually dictated by the type of vector; plasmids are transferred to bacterial cells by transformation, whereas phage vectors and cosmids are transferred by viral infection.

The procedure for screening cells to find those with recombinant molecules depends on how much is known about the cloned fragment, the efficiency of transfer, and the cloning vector used. Considerations used in developing a cloning strategy are summarized in Table 18.4.

Table 18.4 Considerations in developing a cloning strategy

Step in Gene Cloning: Considerations

1. Isolation of DNA fragment
   a. The purpose of cloning (is expression required?). Is the entire sequence needed?
   b. What is known about the gene and the protein (if any) that it encodes?
   c. The size of the gene.
   d. Is the chromosomal location of the gene known?
   e. Size of the genome from which the gene is isolated.
2. Joining DNA fragment to vector
   a. Type of cloning vector used.
      i. The size of the gene.
      ii. The organism into which the gene will be cloned.
      iii. The need for a selection mechanism.
      iv. Whether expression is required.
      v. Efficiency of transfer to host cell required.
      vi. The purpose of cloning.
   b. Method of joining the gene to vector.
      i. Simplicity of method.
      ii. Availability of restriction sites.
      iii. The need to retrieve the fragment from the vector.
      iv. Whether expression is required.
      v. The purpose of cloning.
3. Transfer of recombinant vector to host cell
   a. Type of cloning vector used.
4. Identification of cells carrying recombinant molecule
   a. Known information about the gene.
   b. Type of cloning vector used.
   c. Efficiency of transfer.
   d. Purpose of cloning.

Using the Polymerase Chain Reaction to Amplify DNA

A major problem in working at the molecular level is that each gene is a tiny fraction of the total cellular DNA. Because each gene is rare, it must be isolated and amplified before it can be studied.
Before the mid-1980s, the only procedure available for amplifying DNA was gene cloning: placing the gene in a bacterial cell and multiplying the bacteria. Cloning is labor intensive and requires at least several days to grow the bacteria. In 1983, Kary Mullis of the Cetus Corporation conceptualized a new technique for amplifying DNA in a test tube. The polymerase chain reaction (PCR) allows DNA fragments to be amplified a billionfold within just a few hours. It can be used with extremely small amounts of original DNA, even a single molecule. The polymerase chain reaction has revolutionized molecular biology and is now one of the most widely used of all molecular techniques.

The basis of PCR is replication catalyzed by a DNA polymerase enzyme, which has two essential requirements: (1) a single-stranded DNA template from which a new DNA strand can be copied and (2) a primer with a 3′-OH group to which new nucleotides can be added. Because a DNA molecule consists of two nucleotide strands, each of which can serve as a template to produce a new molecule of DNA, the amount of DNA doubles with each replication event. The starting point of DNA synthesis on the template is determined by the choice of primers. The primers used in PCR are short fragments of DNA, typically from 17 to 25 nucleotides long, that are complementary to known sequences on the template. A different primer is used for each strand.
To carry out PCR, one begins with a solution that includes the target DNA (the DNA to be amplified), DNA polymerase, all four deoxyribonucleoside triphosphates (dNTPs, the substrates for DNA polymerase), primers that are complementary to short sequences on each strand of the target DNA, and magnesium ions and other salts that are necessary for the reaction to proceed. A typical polymerase chain reaction includes three steps (◗ FIGURE 18.18).

◗ 18.18 The polymerase chain reaction is used to amplify even very small samples of DNA. The photograph is of hot springs in Yellowstone National Park, habitat of Thermus aquaticus, the source of Taq enzyme used in PCR. (Photo: Fritz Pölking/Bruce Coleman.)

In step 1, a starting solution of DNA is heated to between 90°C and 100°C to break the hydrogen bonds between the two nucleotide strands and thus produce the necessary single-stranded templates. The reaction mixture is held at this temperature for only a minute or two. In step 2, the DNA solution is cooled quickly to between 30°C and 65°C and held at this temperature for a minute or less. During this short interval, the DNA strands will not have a chance to reanneal, but the primers will be able to attach to their complementary sequences on the template strands. In step 3, the solution is heated to between 60°C and 70°C, the temperature at which DNA polymerase can synthesize new DNA strands by adding nucleotides to the primers. Within a few minutes, two new double-stranded DNA molecules are produced for each original molecule of target DNA.

The whole cycle is then repeated. With each cycle, the amount of target DNA doubles; so the target DNA increases geometrically. One molecule of DNA increases to more than 1000 molecules in 10 PCR cycles, to more than 1 million molecules in 20 cycles, and to more than 1 billion molecules in 30 cycles (Table 18.5).
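The doubling arithmetic above can be checked with a few lines of Python. This is a minimal sketch that assumes ideal amplification, in which every template is copied in every cycle; real reactions are less efficient and eventually plateau. The function names `copies_after` and `cycles_needed` are hypothetical and used only for illustration.

```python
def copies_after(cycles, starting_molecules=1):
    """Double-stranded copies after the given number of ideal PCR cycles (2^n)."""
    return starting_molecules * 2 ** cycles


def cycles_needed(target_copies, starting_molecules=1):
    """Smallest number of ideal cycles that reaches at least target_copies."""
    cycles = 0
    while copies_after(cycles, starting_molecules) < target_copies:
        cycles += 1
    return cycles


# Reproduce the cycle counts quoted in the text for a single starting molecule.
for n in (10, 20, 30):
    print(n, copies_after(n))

# How many ideal cycles are needed to pass one million and one billion copies?
print(cycles_needed(1_000_000), cycles_needed(1_000_000_000))
```

Under this idealized model, 10, 20, and 30 cycles give 1,024, 1,048,576, and 1,073,741,824 copies, the values listed in Table 18.5.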
Each cycle is completed within a few minutes; so a large amplification of DNA can be achieved within a few hours.

Two key innovations facilitated the use of PCR in the laboratory. The first was the discovery of a DNA polymerase that is stable at the high temperatures used in step 1 of PCR. The DNA polymerase from E. coli that was originally used in PCR denatures at 90°C. For this reason, fresh enzyme had to be added to the reaction mixture during each cycle, slowing the process considerably. This obstacle was overcome when DNA polymerase was isolated from the bacterium Thermus aquaticus, which lives in the boiling springs of Yellowstone National Park (see Figure 18.18). This enzyme, dubbed Taq polymerase, is remarkably stable at high temperatures and is not denatured during the strand-separation step of PCR; so it can be added to the reaction mixture at the beginning of the PCR process and will continue to function through many cycles.

The second key innovation was the development of automated thermal cyclers, machines that bring about the rapid temperature changes necessary for the different steps of PCR. Originally, tubes containing reaction mixtures were moved by hand among water baths set at the different temperatures required for the three steps of each cycle. In automated thermal cyclers, the reaction tubes are placed in a metal block that changes temperature rapidly according to a computer program.

The polymerase chain reaction is now often used in place of gene cloning, but it does have several limitations. First, the use of PCR requires prior knowledge of at least part of the sequence of the target DNA to allow construction of the primers. Therefore PCR cannot be used to amplify a gene that has not been at least partly sequenced.
Second, the capacity of PCR to amplify extremely small amounts of DNA makes contamination a significant problem. Minute amounts of DNA from the skin of laboratory workers and even in small particles in the air can enter a reaction tube and be amplified along with the target DNA. Careful laboratory technique and the use of controls are necessary to circumvent this problem.

A third limitation of PCR is accuracy. Unlike other DNA polymerases, Taq polymerase does not have the capacity to proofread (see Chapter 12) and, under standard PCR conditions, it incorporates an incorrect nucleotide about once every 20,000 bp. DNA polymerases with proofreading capacity usually incorporate an incorrect nucleotide only about once every billion base pairs. For many applications, the error rate produced by PCR is not a problem, because only a few DNA molecules of the billions produced will contain an error. However, for other applications such as the cloning of PCR products, the relatively high error rate of PCR can pose significant problems. New heat-stable DNA polymerases with proofreading capacity have been isolated, giving more accurate PCR results.

A fourth limitation of PCR is that the size of the fragments that can be amplified by standard Taq polymerase is usually less than 2000 bp. By using a combination of Taq polymerase and a DNA polymerase with proofreading capacity and by modifying the reaction conditions, investigators have been successful in extending PCR amplification to larger fragments. In spite of its limitations, PCR is used routinely in a wide array of molecular applications.

Table 18.5 Number of copies of DNA fragment in PCR amplification

Number of PCR cycles (n)    Number of double-stranded copies of original DNA (2^n)
 0                          1
 1                          2
 2                          4
 3                          8
 4                          16
 5                          32
 6                          64
 7                          128
 8                          256
 9                          512
10                          1,024
20                          1,048,576
30                          1,073,741,824

Concepts

The polymerase chain reaction is an enzymatic, in vitro method for rapidly amplifying DNA.
In this process, DNA is heated to separate the two strands, short primers attach to the target DNA, and DNA polymerase synthesizes new DNA strands from the primers. Each cycle of PCR doubles the amount of DNA.

Analyzing DNA Sequences

In addition to cloning and amplifying DNA, molecular techniques are used to analyze DNA molecules through a determination of their sequences and an investigation of their functions.

DNA sequencing

A powerful technique to emerge from recombinant DNA technology is the ability to quickly sequence DNA molecules. DNA sequencing is the determination of the sequence of bases in DNA. Sequencing allows the genetic information in DNA to be read, providing an enormous amount of information about gene structure and function. Details of DNA sequencing will be covered in Chapter 19.

In situ hybridization

DNA probes can be used to determine the chromosomal location of a gene or the cellular location of an mRNA in a process called in situ hybridization. The name is derived from the fact that DNA or RNA is visualized while it is in the cell (in situ). This technique requires that the cells be fixed and the chromosomes be spread on a microscope slide. The chromosomes are then briefly exposed to a solution with high pH, which disrupts the pairing of the DNA bases, making them accessible to probes. A labeled probe, which binds to any complementary DNA sequences, is added. Excess probe is washed off, and the location of the bound probe is detected. Originally, probes were radioactively labeled and detected with autoradiography, but now many probes carry attached fluorescent dyes that can be seen directly with the microscope (◗ FIGURE 18.19a). Several probes with different colored dyes can be used simultaneously to investigate different sequences or chromosomes.
In situ hybridization can also be used to determine the tissue distribution of specific mRNA molecules, serving as a source of insight into how gene expression differs among cell types (◗ FIGURE 18.19b). A labeled DNA or RNA probe complementary to a specific mRNA molecule is added to tissue, and the location of the probe is determined by using either autoradiography or fluorescent tags.

◗ 18.19 In in situ hybridization, DNA probes are used to determine the cellular or chromosomal location of a gene. (a) Fluorescent probes are used to mark the locations of specific gene sequences on chromosomes. (b) In situ hybridization can also be used to detect specific mRNA sequences in tissues. Here, the distribution of mRNA produced by the ftz gene (which has a role in controlling development) in a young Drosophila embryo is detected by a radioactive probe and autoradiography. [Autoradiograph from E. Hafen, A. Kuroiwa, and W. J. Gehring, Cell 37:833–841.]

DNA footprinting

Many important DNA sequences serve as binding sites for proteins; for example, consensus sequences in promoters are often binding sites for transcription factors (see Chapter 11). A technique called DNA footprinting can be used to determine which DNA sequences are bound by such proteins. In a typical DNA-footprinting experiment, purified DNA fragments are labeled at one end with a radioactive isotope of phosphorus, ³²P. An enzyme or chemical that makes cuts in DNA is used to cleave the DNA randomly into subfragments, which are then denatured and separated by gel electrophoresis. The positions of the subfragments are visualized with autoradiography. This procedure is carried out both in the presence and in the absence of a particular DNA-binding protein. When the protein is absent, cleavage is random along the DNA, producing a continuous “ladder” of bands on the autoradiograph (◗ FIGURE 18.20).
When the protein is present, it binds to specific nucleotides and protects their phosphodiester bonds from cleavage. Therefore, there is no cleavage in the area protected by the protein, and no labeled fragments terminating in the binding site appear on the autoradiograph. Their omission leaves a gap, or “footprint,” on the ladder of bands (see Figure 18.20), and the position of the footprint identifies those nucleotides bound tightly by the protein.

Concepts

In situ hybridization can be used to visualize the chromosomal location of a gene or to determine the tissue distribution of an mRNA transcribed from a specific gene. DNA footprinting is used to determine the sequences to which DNA-binding proteins attach.

Mutagenesis

A powerful way to study gene function is to create mutations at specific locations in a process called site-directed mutagenesis and then to study the effects of these mutations on the organism. A number of different strategies have been developed for site-directed mutagenesis. One strategy is to cut out a short sequence of nucleotides with restriction enzymes and replace it with a short, synthetic oligonucleotide that contains the desired mutated sequence (◗ FIGURE 18.21). The success of this method depends on the availability of restriction sites flanking the sequence to be altered.

If appropriate restriction sites are not available, oligonucleotide-directed mutagenesis can be used (◗ FIGURE 18.22). In this method, a single-stranded oligonucleotide is produced that differs from the target sequence by one or a few bases. Because they differ in only a few bases, the target DNA and the oligonucleotide will pair under the appropriate conditions.
When successfully paired with the target DNA, the oligonucleotide can act as a primer to initiate DNA synthesis, which produces a double-stranded molecule with a mismatch in the primer region. When this DNA is transferred to bacterial cells, the mismatched bases will be repaired by bacterial enzymes. About half of the time the normal bases will be changed into mutant bases, and about half of the time the mutant bases will be changed into normal bases. The bacteria are then screened for the presence of the mutant gene.

◗ 18.20 DNA footprinting can be used to determine which DNA sequences are bound by binding proteins.

◗ 18.21 In site-directed mutagenesis, restriction enzymes cut out a short sequence of nucleotides that is then replaced by a synthetic mutated DNA sequence.

◗ 18.22 Oligonucleotide-directed mutagenesis is used to study gene function when appropriate restriction sites are not available.

Concepts

Particular mutations can be introduced at specific sites within a gene by means of site-directed mutagenesis.

Transgenic animals

The oocytes of mice and other mammals are large enough that DNA can be injected into them directly. Immediately after penetration by sperm, a fertilized mouse egg contains two pronuclei, one from the sperm and one from the egg; these pronuclei later fuse to form the nucleus of the embryo. Mechanical devices can manipulate extremely fine, hollow glass needles to inject DNA directly into one of the pronuclei of a fertilized egg (◗ FIGURE 18.23). Typically, a few hundred copies of cloned, linear DNA are injected into a pronucleus, and, in a few of the injected eggs, copies of the cloned DNA integrate randomly into one of the chromosomes through a process called nonhomologous recombination. After injection, the embryos are implanted in a pseudopregnant female, a surrogate mother that has been physiologically prepared for pregnancy by mating with a vasectomized male.

Only about 10% to 30% of the eggs survive and, of those that do survive, only a few have a copy of the cloned DNA stably integrated into a chromosome. Nevertheless, if several hundred embryos are injected and implanted, there is a good chance that one or more mice whose chromosomes contain the foreign DNA will be born. Moreover, because the DNA was injected at the one-cell stage of the embryo, these mice usually carry the cloned DNA in every cell of their bodies, including their reproductive cells, and will therefore pass the foreign DNA on to their progeny.
Through interbreeding, a strain of mice that is homozygous for the foreign gene can be created. Animals that have been permanently altered in this way are said to be transgenic, and the foreign DNA that they carry is called a transgene.

Transgenic mice have proved useful in the study of gene function. For example, proof that the SRY gene (see p. 000 in Chapter 4) is the male-determining gene in mice was obtained by injecting a copy of the SRY gene into XX embryos and observing that these mice developed as males. In addition, a number of transgenic mouse strains that serve as experimental models for human genetic diseases have been created by injecting mutated copies of genes into mouse embryos.

◗ 18.23 Transgenic animals have genomes that have been permanently altered through recombinant DNA technology. In this photograph a mouse embryo (red and blue) is being injected with DNA. (Photo: Jon Gordon/Phototake)

www.whfreeman.com/pierce More information on transgenic animals

Knockout mice
A particularly useful variant of the transgenic approach is to produce mice in which a normal gene has been disabled. The phenotypes of these animals, called knockout mice, help geneticists to determine the function of a gene. The creation of these knockout mice begins when a normal gene is cloned in bacteria and then “knocked out,” or disabled. There are a number of ways to disable a gene, but a common method is to insert a gene called neo, which confers resistance to the antibiotic G418, into the middle of the target gene (◗ FIGURE 18.24). The insertion of neo both disrupts (knocks out) the target gene and provides a convenient marker for finding copies of the disabled gene. In addition, a second gene, usually the herpes simplex viral thymidine kinase (tk) gene, is cloned adjacent to the disrupted gene.
The disabled gene is then transferred to cultured embryonic mouse cells, where it may exchange places with the normal chromosomal copy through homologous recombination. After the disabled gene has been transferred to the embryonic cells, the cells are screened by adding the antibiotic G418 to the medium. Only cells with the disabled gene containing the neo insert will survive. Because the frequency of nonhomologous recombination is higher than that of homologous recombination and because the intact target gene is replaced by the disabled copy only through homologous recombination, a means to select for the rarer homologous recombinants is required. The presence of the viral tk gene makes the cells sensitive to gancyclovir. Thus, transfected cells that grow on medium containing G418 and gancyclovir will contain the neo gene (disabled target gene) but not the adjacent tk gene. These cells contain the desired homologous recombinants. The nonhomologous recombinants (random insertions) will contain both the neo and the tk genes, and these transfected cells will die on the selection medium owing to the presence of gancyclovir. The surviving cells are injected into an early-stage mouse embryo, which is then implanted into a pseudopregnant mouse. Cells in the embryo carrying the disabled gene and normal embryonic cells carrying the wild-type gene will develop together, producing a chimera — a mouse that is a genetic mixture of the two cell types. The chimeric mice can be identified easily if the injected embryonic cells came from a black mouse and the embryos into which they are injected came from a white mouse; the resulting chimeras will have variegated black and white fur. The chimeras can then be interbred to produce some progeny that are homozygous for the knockout gene. The effects of disabling a particular gene can be observed in these homozygous mice. 
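The double selection described above reduces to a simple rule: a cell survives only if it carries neo (resistance to G418) and lacks tk (sensitivity to gancyclovir). The following is a minimal sketch of that rule, not a model of real cell culture; the `Cell` class, its field names, and the three example cells are hypothetical and were introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """A cultured embryonic cell after transfection (illustrative model only)."""
    label: str
    has_neo: bool  # neo insert present: resistant to G418
    has_tk: bool   # viral tk gene present: sensitive to gancyclovir


def survives_selection(cell: Cell) -> bool:
    """True if the cell grows on medium containing both G418 and gancyclovir."""
    resists_g418 = cell.has_neo
    killed_by_gancyclovir = cell.has_tk
    return resists_g418 and not killed_by_gancyclovir


cells = [
    Cell("no integration", has_neo=False, has_tk=False),
    Cell("nonhomologous (random) insertion", has_neo=True, has_tk=True),
    Cell("homologous recombinant", has_neo=True, has_tk=False),
]

survivors = [cell.label for cell in cells if survives_selection(cell)]
print(survivors)
```

Only the homologous recombinant passes both filters, which is why the surviving cells are the ones injected into the early-stage embryo.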
Although they are a recent innovation, knockout mice have become important subjects for research in a number of fields. They have been used to study genes implicated in immune function, development, ethology, and human behavior.

◗ 18.24 Knockout mice possess a genome in which a gene has been disabled. (1) A normal gene is disabled by inserting the neo+ gene. The tk+ gene is cloned adjacent to the target gene, and (2) the disabled gene is transferred to embryonic mouse stem cells (here, from a black mouse), (3) where the disabled copy recombines with the normal gene on the mouse chromosome by homologous recombination. (4) The cells are grown on a medium that contains the antibiotic G418 and gancyclovir; only cells that received the neo+ gene by homologous recombination survive. (5) Cells containing a neo+ disabled gene are then injected into early mouse embryos (here, from a white mouse), (6) which are implanted into a pseudopregnant mouse. (7) Variegated progeny contain a mixture of normal cells and cells with the disabled gene. (8) Variegated mice are then interbred and produce some progeny that are homozygous for the knocked-out gene. Conclusion: Through this procedure, mice that contain no functional copy of the gene (that is, mice in which the gene is knocked out) are produced. The phenotype of the knockout mice reveals the function of the gene.

Concepts
Transgenic mice are produced by the injection of cloned DNA into the pronucleus of a fertilized egg, followed by implantation of the egg into a female mouse. In knockout mice, the injected DNA contains a mutation that disables a gene. Inside the mouse embryo, the disabled copy of the gene can exchange with the normal copy of the gene through homologous recombination.

www.whfreeman.com/pierce A catalog of knockout mice

Applications of Recombinant DNA Technology
In addition to providing valuable new information about the nature and function of genes, recombinant DNA technology has many practical applications. These applications include the production of pharmaceuticals and other chemicals, specialized bacteria, agriculturally important plants, and genetically engineered farm animals. The technology is also used extensively in medical testing and, in a few cases, is even being used to correct human genetic defects. Hundreds of firms now specialize in developing products through genetic engineering, and many large multinational corporations have invested enormous sums of money in recombinant DNA research. Recombinant DNA technology is also frequently used in criminal investigations and for the identification of human remains. For example, the remains of people who died in the collapse of the World Trade Center on September 11, 2001, are being identified through DNA comparisons with the use of recombinant DNA technology.

Pharmaceuticals
The first commercial products to be developed by using recombinant DNA technology were pharmaceuticals used in the treatment of human diseases and disorders. In 1982, the Eli Lilly corporation began selling human insulin produced with the use of recombinant DNA technology. Before this time, all the insulin used in the treatment of diabetics was isolated from the pancreases of farm animals slaughtered for meat. Although this source of insulin worked well for many diabetics, it was not human insulin, and some people suffered allergic reactions to the foreign protein. The human insulin gene was inserted into plasmids and transferred to bacteria that then produced human insulin. Pharmaceuticals produced through recombinant DNA technology include human growth hormone (for children with growth deficiencies), clotting factors (for hemophiliacs), and tissue plasminogen activator (used to dissolve blood clots in heart-attack patients).
Specialized Bacteria
Bacteria play an important role in many industrial processes, including the production of ethanol from plant material, the leaching of minerals from ore, and the treatment of sewage and other wastes. The bacteria engaged in these processes are being modified by genetic engineering so that they work more efficiently. New strains of technologically useful bacteria are being developed that will break down toxic chemicals and pollutants, enhance oil recovery, increase nitrogen uptake by plants, and inhibit the growth of pathogenic bacteria and fungi.

Agricultural Products
Recombinant DNA technology has had a major effect on agriculture, where it is now used to create crop plants and domestic animals with valuable traits. For many years, plant pathologists had recognized that plants infected with mild strains of viruses are resistant to infection by virulent strains. Using this knowledge, geneticists have created viral resistance in plants by transferring genes for viral proteins to the plant cells. A genetically engineered squash, called Freedom II, carries genes from the watermelon mosaic virus 2 and the zucchini yellow mosaic virus that protect the squash against viral infections.

Another objective has been to genetically engineer pest resistance into plants to reduce dependence on chemical pesticides. A protein toxin from the bacterium Bacillus thuringiensis selectively kills the larvae of certain insect pests but is harmless to wildlife, humans, and many other insects. The toxin gene has been isolated from the bacteria, linked to active promoters, and transferred into corn, tomato, potato, and cotton plants. The gene produces the insecticidal toxin in the plants, and caterpillars that feed on the plant die.

Recombinant DNA technology has also permitted the development of herbicide resistance in plants.
A major problem in agriculture is the control of weeds, which compete with crop plants for water, sunlight, and nutrients. Although herbicides are effective at killing weeds, they can also damage the crop plants. Genes that provide resistance to broad-spectrum herbicides have been transferred into tomato, soybean, cotton, oilseed rape, and other commercially important crops. When the fields containing these crops are sprayed with herbicides, the weeds are killed but the genetically engineered plants are unaffected. In 1999, more than 21 million hectares (1 hectare = 2.471 acres) of genetically engineered soybeans and 11 million hectares of genetically engineered corn were grown throughout the world.

Recombinant DNA techniques are also applied to domestic animals. For example, the gene for growth hormone was isolated from cattle and cloned in E. coli; these bacteria produce large quantities of bovine growth hormone, which is administered to dairy cattle to increase milk production. Transgenic animals are being developed to carry genes that encode pharmaceutical products. For example, a gene for human clotting factor VIII has been linked to the regulatory region of the sheep gene for β-lactoglobulin, a milk protein. The fused gene was injected into sheep embryos, creating transgenic sheep that produce in their milk the human clotting factor, which is used to treat hemophiliacs. A similar procedure was used to transfer a gene for α1-antitrypsin, a protein used to treat patients with hereditary emphysema, into sheep. Female sheep bearing this gene produce as much as 15 grams of α1-antitrypsin in each liter of their milk, generating $100,000 worth of α1-antitrypsin per year for each sheep.

The genetic engineering of agricultural products is controversial. One area of concern focuses on the potential effects of releasing novel organisms produced by genetic engineering into the environment.
There are many examples in which nonnative organisms released into a new environment have caused ecological disruption because they are free of predators and other natural control mechanisms. Genetic engineering normally transfers only small sequences of DNA, relative to the large genetic differences that often exist between species, but even small genetic differences may alter ecologically important traits that might affect the ecosystem. Another area of concern is that transgenic organisms may hybridize with native organisms and transfer their genetically engineered traits. For example, herbicide resistance engineered into crop plants might be transferred to weeds, which would then be resistant to the herbicides that are now used for their control. The results of some studies have demonstrated gene transfer between engineered plants and native plants, but the extent and effect of this transfer are uncertain. Other concerns focus on health-safety issues associated with the presence of engineered products in natural foods; some critics have advocated required labeling of all genetically engineered foods that contain transgenic DNA or protein. Such labeling is required in countries of the European Union but not in the United States. On the other hand, the use of genetically engineered crops and domestic animals has potential benefits. Genetically engineered crops that are pest resistant have the potential to reduce the use of environmentally harmful chemicals, and research findings indicate that lower amounts of pesticides are used in the United States as a result of the adoption of transgenic plants. Transgenic crops also increase yields, providing more food per acre, which reduces the amount of land that must be used for agriculture. Concepts Recombinant DNA technology is used to create a wide range of commercial products, including pharmaceuticals, specialized bacteria, genetically engineered crops, and transgenic domestic animals. 
www.whfreeman.com/pierce More information about the use of recombinant DNA and biotechnology in agriculture

Oligonucleotide Drugs
A recent application of DNA technology has been the development of oligonucleotide drugs, which are short sequences of synthetic DNA or RNA molecules that can be used to treat diseases. Antisense oligonucleotides are complementary to undesirable RNAs, such as viral RNA. When added to a cell, these antisense DNAs bind to the viral mRNA and inhibit its translation. Single-stranded DNA oligonucleotides bind tightly to other DNA sequences, forming a triplex DNA molecule (◗ FIGURE 18.25). The formation of triplex DNA interferes with the binding of RNA polymerase and other proteins required for transcription. Other oligonucleotides are ribozymes, RNA molecules that function as enzymes (see Chapter 13). These compounds bind to specific mRNA molecules and cleave them into fragments, destroying their ability to encode proteins. Several oligonucleotide drugs are already being tested for the treatment of AIDS and cancer.

Concepts
Oligonucleotide drugs are short pieces of DNA or RNA that prevent the expression of particular genes.

Genetic Testing
The identification and cloning of many important disease-causing human genes has allowed the development of probes for detecting disease-causing mutations. Prenatal testing is already available for several hundred genetic disorders (see Chapter 4). Additionally, presymptomatic genetic tests for adults and children are available for an increasing number of disorders.

The growing availability of genetic tests raises a number of ethical and social issues. For example, is it ethical to test for genetic diseases for which there is no cure or treatment? Huntington disease, an autosomal dominant disorder that appears in middle age, causes slow physical and mental deterioration and eventually death. No effective treatment is currently available.
If one parent is affected, a child has a 50% chance of inheriting the gene for Huntington disease and eventually getting the disorder. Tests are now available that make it possible to determine whether a person carries the Huntington-disease gene, but is it beneficial to tell a young person that he or she has the Huntington-disease gene and will get the disease later in life? Although learning that you do not have the gene might provide great peace of mind, learning that you do have it might lead to despair and depression. Many people at risk for Huntington disease want predictive testing, saying that the uncertainty of not knowing is more debilitating than the certain knowledge that they will get it and, in fact, a number of medical centers now offer predictive testing for Huntington disease. A few people who learned that they have the gene have committed suicide, and others had to be hospitalized for depression, but the results of several studies indicate that most people who undergo predictive testing for Huntington disease are able to cope with the information.

◗ 18.25 Oligonucleotide drugs are short sequences of DNA or RNA that can be used to treat diseases. Antisense DNA may bind to the 5’ end of viral mRNA and prevent the binding of the ribosome; so no viral protein is produced. Antisense DNA may bind to a promoter (forming triplex DNA) and prevent the binding of RNA polymerase; so mRNA is not produced. Antisense RNA in the form of a ribozyme may bind and cleave mRNA, destroying it.

Other ethical and legal questions concern the confidentiality of test results. Who should have access to the results of genetic testing? Should insurance companies be allowed to use results from such tests to deny coverage to healthy people who are at risk for genetic diseases? Should relatives who also might be at risk be informed of the results of genetic testing?

Other concerns focus on whether the benefits of genetic testing justify its cost. In some cases, genetic tests provide clear benefits because early identification allows for better treatment. For example, when phenylketonuria (an autosomal recessive disorder that can cause mental retardation) is identified in infants, the administration of a special diet can prevent mental retardation. Because of this obvious benefit and the low cost of testing for this disorder, all states in the United States and many other countries require newborns to be tested for PKU. Predictive testing for colorectal cancer and breast cancer also may be beneficial for at-risk people, because finding these cancers early improves the chances for successful treatment. Patients with genes that predispose to cancer may require more aggressive treatment than do patients with sporadically arising cancers. In these diseases, genetic testing provides clear benefits.

Another set of concerns is related to the accuracy of genetic tests. For many genetic diseases, the only predictive tests available are those that identify a predisposing mutation in DNA, but many genetic diseases may be caused by dozens or hundreds of different mutations. Probes that detect common mutations can be developed, but they will not detect rare mutations and will therefore give a false-negative result for a person who carries one. Short of sequencing the entire gene, which is expensive and time consuming, there is no way to identify all predisposed persons. These questions and concerns are currently the focus of intense debate by ethicists, physicians, scientists, and patients.

www.whfreeman.com/pierce More information on genetic testing

Gene Therapy
Perhaps the ultimate application of recombinant DNA technology is gene therapy, the direct transfer of genes into humans to treat disease.
When the first recombinant DNA experiments with bacteria were announced, many researchers recognized the potential for using this new technology in the treatment of patients with genetic diseases. But, before recombinant DNA could be used on humans, a number of difficult obstacles had to be overcome. The genes responsible for particular genetic diseases needed to be located and cloned, and special vectors had to be developed that would reliably and efficiently deliver genes to human cells. In 1990, gene therapy became reality. W. French Anderson and his colleagues at the U.S. National Institutes of Health (NIH) transferred a functional gene for adenosine deaminase to a young girl with severe combined immunodeficiency disease, an autosomal recessive condition that produces impaired immune function. Today, thousands of patients have received gene therapy, and many clinical trials are underway. Gene therapy is being used to treat genetic diseases, cancer, heart disease, and even some infectious diseases such as AIDS. 
All of these therapies depend on an introduced gene’s ability to produce a therapeutic protein.

Table 18.6 Vectors used in gene therapy
Retrovirus. Advantages: efficient transfer. Disadvantages: transfers DNA only to dividing cells; inserts randomly; risk of producing wild-type viruses.
Adenovirus. Advantages: transfers to nondividing cells. Disadvantages: causes immune reaction.
Adeno-associated virus. Advantages: does not cause immune reaction. Disadvantages: holds small amount of DNA; hard to produce.
Herpes virus. Advantages: can insert into cells of nervous system; does not cause immune reaction. Disadvantages: hard to produce in large quantities.
Lentivirus. Advantages: can accommodate large genes. Disadvantages: safety concerns.
Liposomes and other lipid-coated vectors. Advantages: no replication; does not stimulate immune reaction. Disadvantages: low efficiency.
Direct injection. Advantages: no replication; directed toward specific tissues. Disadvantages: low efficiency; does not work well in some tissues.
Pressure treatment. Advantages: safe, because tissues are treated outside the body and then transplanted into the patient. Disadvantages: most efficient with small DNA molecules.
Gene gun (DNA coated on small gold particles and shot into tissue). Advantages: no vector required. Disadvantages: low efficiency.
Source: After E. Marshall, Gene therapy’s growing pains, Science 269(1995):1050–1055.

A number of different methods for transferring genes into human cells are currently under development. Commonly used vectors include genetically modified retroviruses, adenoviruses, and adeno-associated viruses (Table 18.6). One method of gene transfer is to remove cells (such as white blood cells) from a patient’s body, add viruses containing recombinant genes, and then reintroduce the cells into the patient’s body. In other cases, vectors are injected directly into the body. In spite of the growing number of clinical trials for gene therapy, significant problems remain in transferring foreign genes into human cells, getting them expressed, and limiting immune responses to the gene products and the vectors used to transfer the genes to the cells.
There are also heightened concerns about safety, especially after the death in 1999 of a patient participating in a gene-therapy trial who had a fatal immune reaction after he was injected with a viral vector carrying a gene to treat his metabolic disorder. Despite this setback, gene-therapy research has moved ahead. Unequivocal results demonstrating positive benefits from gene therapy for a severe combined immunodeficiency disease and for head and neck cancer were announced in 2000.

Gene therapy conducted to date has targeted only nonreproductive, somatic cells. Correcting a genetic defect in these cells (termed somatic gene therapy) may provide positive benefits to patients but will not affect the genes of future generations. Gene therapy that alters reproductive, or germ-line, cells (termed germ-line gene therapy) is technically possible but raises a number of significant ethical issues, because it has the capacity to alter the gene pool of future generations.

Concepts
Gene therapy is the direct transfer of genes into humans to treat disease. Gene therapy was first successfully implemented in 1990 and is now being used to treat genetic diseases, cancer, and infectious diseases.

Gene Mapping
A significant contribution of recombinant DNA technology has been to provide numerous genetic markers that can be used in gene mapping. One group of markers used in gene mapping comprises restriction fragment length polymorphisms (RFLPs, pronounced rifflips). RFLPs are variations (polymorphisms) in the patterns of fragments produced when DNA molecules are cut with the same restriction enzyme. If DNA from two persons is cut with the same restriction enzyme and different patterns of fragments are produced (◗ FIGURE 18.26), these persons must possess differences in their DNA sequences. These differences are inherited and can be used in mapping, similar to the way in which allelic differences are used to map conventional genes.

◗ 18.26 Restriction fragment length polymorphisms are genetic markers that can be used in mapping. (1) The ancestral DNA sequence had two HaeIII restriction sites. (2) A mutation creates a polymorphism. Some copies have both restriction sites and others only one. (3) When DNA from two persons is digested by HaeIII, (4) two different patterns appear on the autoradiograph of the gel. (5) Bob’s DNA is cut into three bands because his chromosomes possess both restriction sites (pattern A). (6) Joe’s DNA is cut into only two bands because his chromosomes possess only one of the two sites (pattern B). (7) This example assumes that Bob is homozygous for the A pattern and Joe is homozygous for the B pattern. A person heterozygous for the RFLP would display bands seen in both the A and the B patterns.

Traditionally, gene mapping has relied on the use of genetic differences that produce easily observable phenotypic differences. Unfortunately, because most traits are influenced by multiple genes and the environment, the number of traits with a simple genetic basis suitable for use in mapping is limited. RFLPs provide a large number of genetic markers that can be used in mapping.

To illustrate mapping with RFLPs, let’s again consider Huntington disease. As mentioned earlier, this disease is caused by an autosomal dominant gene but, until recently, the chromosomal location of the gene was unknown. A team of scientists led by James Gusella (see introduction to Chapter 5) set out to determine the location of the Huntington gene, in the hope that, when the gene was found, its biochemical basis could be determined and possible treatments might be suggested. DNA was collected from members of the largest known family with Huntington disease, who live near Lake Maracaibo in Venezuela.

The basic strategy employed in the search for the Huntington-disease gene and a number of other human disease-causing genes is to look for coinheritance of the disease-causing gene and an RFLP with a known chromosomal location. If the disease gene and the RFLP have been inherited together, they must be physically linked. This approach is summarized in ◗ FIGURE 18.27, which illustrates the coinheritance of two traits: (1) the presence or absence of Huntington disease and (2) the type of restriction pattern produced (pattern A or C). In the family shown, the father is heterozygous for Huntington disease (Hh) and is also heterozygous for a restriction pattern (AC). From the father, each child inherits either a Huntington-disease allele (H) or a normal allele (h); any child inheriting the Huntington-disease allele develops the disease, because it is an autosomal dominant disorder. The child also inherits one of the two RFLP alleles from the father, either A or C, which produces the corresponding RFLP pattern. In ◗ FIGURE 18.27a, there is no correspondence between the inheritance of the RFLP pattern and the inheritance of the disease: children who have inherited Huntington disease (and therefore the H allele) from their father are equally likely to have inherited the A or C RFLP pattern. Because, in this case, the H allele and the RFLP alleles segregate randomly, we know that they are not closely linked. ◗ FIGURE 18.27b, on the other hand, shows that every child who inherits the C pattern from the father also inherits Huntington disease (and therefore the H allele), because the locus for the RFLP is closely linked to the locus for the disease-causing gene.
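The comparison between the two panels amounts to asking how often the father’s C allele travels with the disease. The sketch below tallies that for two small families. It is illustrative only: the function name and both lists of children are hypothetical and are not the genotypes drawn in Figure 18.27, and real linkage studies rely on much larger pedigrees and formal statistics rather than a simple tally.

```python
def fraction_with_allele(children, allele, affected):
    """Fraction of children with the given disease status who inherited
    `allele` from the father.

    `children` is a list of (paternal_rflp_allele, is_affected) pairs.
    """
    group = [a for a, is_affected in children if is_affected == affected]
    return group.count(allele) / len(group)


# Hypothetical families (assumed data, not taken from the figure).
unlinked_family = [("A", True), ("C", True), ("A", True), ("C", True),
                   ("A", False), ("C", False), ("A", False), ("C", False)]
linked_family = [("C", True)] * 4 + [("A", False)] * 4

for name, family in (("unlinked", unlinked_family), ("linked", linked_family)):
    in_affected = fraction_with_allele(family, "C", affected=True)
    in_unaffected = fraction_with_allele(family, "C", affected=False)
    print(f"{name}: C allele in {in_affected:.0%} of affected "
          f"and {in_unaffected:.0%} of unaffected children")
```

When the C allele appears equally often in affected and unaffected children, the marker and the disease gene are segregating independently; when it appears only in affected children, the pattern matches the close linkage described for panel b.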
The chromosomal location of the RFLP provides a general indication of the disease-causing locus. An examination of the cosegregation of other RFLPs from the same region can then determine the location of the gene more precisely. Actual RFLP patterns and part of the Huntington-disease pedigree are shown in ◗ FIGURE 18.28.

◗ 18.27 Restriction fragment length polymorphisms can be used to detect linkage. In this hypothetical pedigree, the father and half of the children are affected (red circles and squares) with Huntington disease, an autosomal dominant disease. The father is heterozygous (Hh) and will pass the chromosome with the Huntington gene to approximately half of his offspring. The father is also heterozygous for RFLP alleles A and C; each child receives one of these two alleles from the father. The mother is homozygous for RFLP allele B, so all children receive the B allele from her. (a) In this case, there is no correspondence between the inheritance of the RFLP allele and inheritance of the disease: children with the disease are just as likely to carry the A allele as they are the C allele. Thus the disease gene and RFLP alleles segregate independently and are not closely linked. (b) In this case, there is a close correspondence between the inheritance of the RFLP alleles and the presence of the disease: every child who inherits the C allele from the father also has the disease. This correspondence indicates that the RFLP is closely linked to the Huntington gene.

DNA Fingerprinting
Restriction fragment length polymorphisms are often found in noncoding regions of DNA and are therefore frequently quite variable in humans. Two randomly chosen people will differ at many RFLPs and, if enough RFLPs are examined, no two people (with the exception of identical twins) will be exactly the same. The use of DNA sequences to identify a person, called DNA fingerprinting, is a powerful tool for criminal investigations and other forensic applications.
In a typical application, DNA fingerprinting might be used to confirm that a suspect was present at the scene of a crime (◗ FIGURE 18.29). A sample of DNA from blood, semen, hair, or other body tissue is collected from the crime scene. If the sample is very small, PCR can be used to amplify it so that enough DNA is available for testing. Additional DNA samples are collected from one or more suspects. Each DNA sample is cut with one or more restriction enzymes, and the resulting DNA fragments are separated by gel electrophoresis. The fragments in the gel are denatured and transferred to nitrocellulose paper by Southern blotting. One or more radioactive probes are then hybridized to the nitrocellulose and detected by autoradiography. The pattern of bands produced by DNA from the sample collected at the crime scene is then compared with the patterns produced by DNA from the suspects.

The probes used in DNA fingerprinting detect highly variable regions of the genome; so the chance of DNA from two people producing exactly the same banding pattern is low. When several probes are used in the analysis, the probability that two people have the same set of patterns becomes vanishingly small (unless they are identical twins). A match between the sample from the crime scene and one from the suspect can provide evidence that the suspect was present at the scene of the crime.

◗ 18.28 Restriction fragment length polymorphisms were used to map the Huntington-disease gene to chromosome 4. (a) Autoradiograph showing different banding patterns revealed by cutting the DNA with HindIII and using a probe to chromosome 4. The RFLP A allele produces five bands. The C allele also produces five bands, but the first band is just below the first band produced by the A allele; AC heterozygotes have both bands, which are very close together. The B allele has an extra band representing a 4.9-kb fragment. (b) Partial pedigree of large family from Lake Maracaibo. Red symbols represent family members with Huntington disease; the RFLP genotypes are indicated below each person represented in the pedigree. Notice that persons with the disease carry the C allele, indicating that the sequences on chromosome 4 revealed by the probe are closely linked to the Huntington-disease gene.

◗ 18.29 The use of DNA sequences to identify a person is called DNA fingerprinting.

The probes most commonly used in DNA fingerprinting are complementary to short sequences repeated in tandem that are widely found in the human genome (see p. 000 in Chapter 11). People vary greatly in the number of copies of these repeats; thus, these polymorphisms are termed variable number of tandem repeats (VNTRs).

Since its introduction in the 1980s, DNA fingerprinting has helped convict a number of suspects in murder and rape cases. Suspects in other cases have been proved innocent when their DNA failed to match that from the crime scenes. Initially there was some controversy over calculating the odds of a match (the probability that two people could have the same pattern) and concerns about quality control (such as the accidental contamination of samples and the reproducibility of results) in laboratories where DNA analysis is done. In spite of the controversy, DNA fingerprinting has become an important tool in forensic investigations.

DNA fingerprinting has also been used to provide information about the relationships and sources of other organisms. For example, DNA fingerprinting was used to determine that several samples of anthrax mailed to different people in 2001 were all from the same source.

Concepts
RFLPs are variations in the pattern of fragments produced by restriction enzymes, which reveal variations in DNA sequences. They are used extensively in gene mapping.
DNA fingerprinting detects genetic differences among people by using probes for highly variable regions of chromosomes.

www.whfreeman.com/pierce More information on DNA fingerprinting

Concerns About Recombinant DNA Technology

In 1971, as researchers were planning some of the first gene-cloning experiments, in which they intended to transfer genes from tumor viruses to E. coli, several scientists raised concerns about the safety of such experiments. E. coli is present in the human intestinal tract, and these scientists questioned whether recombinant bacteria might escape from the laboratory and infect people, eventually transferring tumor-causing genes to them. The risks were thought to be small, but the real hazards were quite unknown.

When the first experiments using recombinant DNA were performed in 1973, concerns about risks associated with recombinant technology were heightened. Although no hazard had been demonstrated, a number of potential dangers could be envisioned. In July 1974, leading molecular biologists published a letter in Science urging scientists to stop conducting certain types of potentially hazardous recombinant DNA experiments until their risks could be evaluated. In February 1975, a group of more than 100 molecular biologists met and agreed that some restrictions on recombinant DNA research were warranted. They formulated a series of recommendations concerning the types of recombinant DNA experiments that should be prohibited.

The National Institutes of Health then appointed a committee to develop guidelines for recombinant DNA research. Different types of cloning experiments were considered to have different degrees of risk, and more precautions were required for the more “risky” experiments. The Recombinant DNA Advisory Committee was established to oversee the safety of this work in the United States, and similar committees were established in Europe.
After years of experience with recombinant DNA experiments, the initial concerns about risks turned out to be largely unfounded, and the NIH guidelines have now been significantly relaxed. Current controversy about recombinant DNA technology revolves largely around the release of genetically modified organisms into the environment and the application of recombinant DNA technology to humans.

Connecting Concepts Across Chapters

This chapter has focused on recombinant DNA technology, a set of methods to isolate, study, and manipulate DNA sequences. Before the development of this technology, geneticists were forced to study genes by examining the phenotypes produced by the genes under study. The power of recombinant DNA technology is that it allows geneticists to read and alter genetic information directly, leading to an entirely new approach to the study of heredity in which genes are studied by altering DNA sequences and observing the associated change in phenotype.

A major theme of this chapter has been that working at the molecular level requires special approaches because DNA and other molecules are too small to see and manipulate directly. A number of recombinant DNA techniques are available and can be mixed and matched in different combinations or strategies; the particular set of methods used depends both on the sequences being manipulated and on the ultimate goal of the researcher.

Mastering the information in this chapter requires an understanding of material presented in many of the preceding chapters, particularly those on molecular genetics. A detailed understanding of DNA structure (Chapter 10), replication (Chapter 12), and the genetic code (Chapter 15) is essential for grasping the details of recombinant DNA technology. Knowledge of bacterial and viral genetics (Chapter 8) is helpful, because much of gene cloning takes place in bacteria, and plasmids and viruses are commonly used as cloning vectors.
Knowledge of gene regulation (Chapter 16) is useful for understanding expression vectors and recombinant DNA applications in which proteins are produced.

The information presented in this chapter will complement and enhance much of the material presented in the remaining chapters of the book. Chapter 19 deals with the use of recombinant DNA technology to compare the organization, content, and expression of genomes of different organisms.

CONCEPTS SUMMARY
• Recombinant DNA technology is a set of molecular techniques for locating, cutting, joining, analyzing, and altering DNA sequences and for inserting the sequences into a cell.
• Restriction endonucleases are enzymes that make double-stranded cuts in DNA at specific base sequences.
• DNA fragments can be separated with the use of gel electrophoresis and visualized by staining the gel with a dye that is specific for nucleic acids or by labeling the fragments with a radioactive or chemical tag.
• Individual genes can be studied by transferring DNA fragments from a gel to nitrocellulose or nylon and applying complementary probes.
• Gene cloning refers to placing a gene or a DNA fragment into a bacterial cell, where it will be multiplied as the cell divides.
• Plasmids, small circular pieces of DNA, are often used as vectors to ensure that a cloned gene is stable and replicated within the recipient cells.
• Bacteriophage λ offers several advantages over plasmids: it can hold larger fragments of foreign DNA and transfers DNA to cells with higher efficiency.
• Cosmids, which combine properties of plasmids and phage vectors, hold even larger amounts of foreign DNA. Yeast and bacterial artificial chromosomes can accommodate large inserts more than 100,000 bp in length.
• Expression vectors contain promoters, ribosome-binding sites, and other sequences necessary for foreign DNA to be transcribed and translated.
• Genes can be isolated by creating a DNA library, a set of bacterial colonies or viral plaques that each contain a different cloned fragment of DNA. A genomic library contains the entire genome of an organism, cloned as a set of overlapping fragments; a cDNA library contains DNA fragments complementary to all the different mRNAs in a cell.
• DNA libraries can be screened with probes complementary to particular genes, or DNA fragments in the library can be cloned into an expression vector and screened by looking for the associated protein product.
• Genes can also be located by chromosome walking, in which a neighboring gene is used to make a probe; a genomic library is screened with this probe to find a clone that overlaps the gene. A probe is made from the end of this clone, and the probe is used to screen the library for a second clone that overlaps the first. The process is continued until the gene of interest is reached.
• The cloning strategy depends on the purpose of the cloning experiment, what is known about the gene, the size of the gene to be cloned, the size of the genome from which it is isolated, and the organism into which it will be cloned.
• The polymerase chain reaction is a method for amplifying DNA enzymatically without cloning. A solution containing DNA is heated, so that the two DNA strands separate, and then quickly cooled, allowing primers to attach to the template DNA. The solution is then heated again, and DNA polymerase synthesizes new strands from the primers. Each time the cycle is repeated, the amount of DNA doubles.
• In situ hybridization can be used to determine the chromosomal location of a gene and the distribution of the mRNA produced by a gene. DNA footprinting reveals the nucleotides that are covered by DNA-binding proteins. Site-directed mutagenesis can be used to produce mutations at specific sites in DNA, allowing genes to be tailored for a particular purpose.
Transgenic animals, produced by injecting DNA into fertilized eggs, contain foreign DNA that is integrated into a chromosome. Knockout mice are transgenic mice that have a normal gene disabled.
• Recombinant DNA technology has many applications, including not only the production of pharmaceuticals and other biological substances in bacteria but also the creation of bacteria that are genetically engineered for economically or medically important tasks. It is also being used in agriculture to transfer particular traits, such as disease and pest resistance, to crop plants. Transgenic domestic animals can be produced with desirable traits. Oligonucleotide drugs (short nucleotide sequences for treating diseases) are another application of recombinant DNA technology.
• In gene therapy, diseases are being treated by altering the genes of human cells.
• Restriction fragment length polymorphisms and variable number tandem repeats facilitate gene mapping by making available numerous genetic markers and are being used to identify people by their DNA sequences (DNA fingerprinting).

IMPORTANT TERMS
recombinant DNA technology; genetic engineering; biotechnology; restriction enzyme; restriction endonuclease; cohesive end; gel electrophoresis; end labeling; autoradiography; probe; Southern blotting; Northern blotting; Western blotting; gene cloning; cloning vector; cosmid; expression vector; shuttle vector; yeast artificial chromosome (YAC); bacterial artificial chromosome (BAC); Ti plasmid; DNA library; genomic library; cDNA library; chromosome walking; cloning strategy; polymerase chain reaction (PCR); Taq polymerase; in situ hybridization; DNA footprinting; site-directed mutagenesis; oligonucleotide-directed mutagenesis; transgene; knockout mice; gene therapy; restriction fragment length polymorphism (RFLP); DNA fingerprinting; variable number of tandem repeats (VNTRs)

Worked Problems

1. A molecule of double-stranded DNA that is 5 million base pairs long has a base composition that is 62% G + C. How many times, on average, are the following restriction sites likely to be present in this DNA molecule?
(a) BamHI (recognition sequence = GGATCC)
(b) HindIII (recognition sequence = AAGCTT)
(c) HpaII (recognition sequence = CCGG)

• Solution
The percentages of G and C are equal in double-stranded DNA; so, if G + C = 62%, then %G = %C = 62%/2 = 31%. The percentage of A + T = (100% − G − C) = 38%, and %A = %T = 38%/2 = 19%. To determine the probability of finding a particular base sequence, we use the multiplicative rule, multiplying together the probability of finding each base at a particular site.
(a) The probability of finding the sequence GGATCC = 0.31 × 0.31 × 0.19 × 0.19 × 0.31 × 0.31 = 0.00033. To determine the average number of recognition sequences in a 5-million-base-pair piece of DNA, we multiply 5,000,000 bp × 0.00033339 ≈ 1667 recognition sequences.
(b) The number of AAGCTT recognition sequences is 0.19 × 0.19 × 0.31 × 0.31 × 0.19 × 0.19 × 5,000,000 ≈ 626 recognition sequences.
(c) The number of CCGG recognition sequences is 0.31 × 0.31 × 0.31 × 0.31 × 5,000,000 ≈ 46,176 recognition sequences.

2. A protein has the following amino acid sequence:

Met-Leu-Arg-Ser-Arg-Met-Tyr-Trp-Asp-His-Glu-Thr

You wish to make a set of probes to screen a cDNA library for the sequence that encodes this protein. Your probes should be at least 18 nucleotides in length.
(a) Which amino acids in the protein should be used so that the smallest number of probes is required? (Consult the genetic code in Figure 15.12.)
(b) How many different sequences must be synthesized to be certain that you will find the correct cDNA sequence that specifies the protein?

• Solution
We first write out all the codons that can specify all the amino acids in the protein, using the genetic code in Figure 15.12 (see table below).

Position   Amino acid   Possible codons
1          Met          AUG
2          Leu          UUA UUG CUU CUC CUA CUG
3          Arg          CGU CGC CGA CGG AGA AGG
4          Ser          UCU UCC UCA UCG AGU AGC
5          Arg          CGU CGC CGA CGG AGA AGG
6          Met          AUG
7          Tyr          UAU UAC
8          Trp          UGG
9          Asp          GAU GAC
10         His          CAU CAC
11         Glu          GAA GAG
12         Thr          ACU ACC ACA ACG

(a) The 18-bp region encoding amino acids 6 through 11 should be used, because this region has the smallest number of possible codons.
(b) For amino acids 6 through 11, there is one possible codon for Met, two for Tyr, one for Trp, two for Asp, two for His, and two for Glu. Thus 1 × 2 × 1 × 2 × 2 × 2 = 16 possible sequences must be synthesized to locate the gene.

The New Genetics
MINING GENOMES
RECOMBINANT DNA PROJECT
This exercise casts you in the role of research geneticist. Your job is to plan a project to clone a specific gene into a plasmid vector, including the selection of the restriction enzymes and vector that you will use. You will utilize the Web sites of some of the major suppliers of biotechnology reagents in the process.

COMPREHENSION QUESTIONS
1. List some of the effects and applications of recombinant DNA technology.
2. What common feature is seen in the sequences recognized by type II restriction enzymes?
3. What role do restriction enzymes play in bacteria? How do bacteria protect their own DNA from the action of restriction enzymes?
4. Explain how gel electrophoresis is used to separate DNA fragments of different lengths.
5. After DNA fragments are separated by gel electrophoresis, how can they be visualized?
6. What is the purpose of Southern blotting? How is it carried out?
7. What are the differences between Southern, Northern, and Western blotting?
8. Give three important characteristics of cloning vectors.
9. Briefly describe four different methods for inserting foreign DNA into plasmids, giving the strengths and weaknesses of each.
10. How are plasmids transferred into bacterial cells?
*11. Briefly explain how an antibiotic-resistance gene and the lacZ gene can be used as markers to determine which cells contain a particular plasmid.
12. How are genes inserted into bacteriophage λ vectors? What advantages do λ vectors have over plasmids?
*13. What is a cosmid? What are the advantages of using cosmids as gene vectors?
14. What are yeast artificial chromosomes and shuttle vectors? When are these cloning vectors used?
*15. How does a genomic library differ from a cDNA library? How is each created?
16. How are probes used to screen DNA libraries? Explain how a synthetic probe can be prepared when the protein product of a gene is known.
17. Explain how chromosome walking can be used to find a gene.
18. Discuss some of the considerations that must go into developing an appropriate cloning strategy.
*19. Briefly explain how the polymerase chain reaction is used to amplify a specific DNA sequence. What are some of the limitations of PCR?
*20. Briefly explain in situ hybridization, giving some applications of this technique.
21. What is DNA footprinting?
22. Briefly explain how site-directed mutagenesis is carried out.
*23. What are knockout mice, how are they produced, and for what are they used?
24. Describe how RFLPs can be used in gene mapping.
*25. What is DNA fingerprinting? What types of sequences are examined in DNA fingerprinting?
26. What is gene therapy?
27. As the first recombinant DNA experiments were being carried out, there was concern among some scientists about this research. What were these concerns and how were they addressed?

APPLICATION QUESTIONS AND PROBLEMS
*28. Suppose that a geneticist discovers a new restriction enzyme in the bacterium Aeromonas ranidae. This restriction enzyme is the first to be isolated from this bacterial species. Using the standard convention for abbreviating restriction enzymes, give this new restriction enzyme a name (for help, see footnote to Table 18.2).
29.
How often, on average, would you expect a type II restriction endonuclease to cut a DNA molecule if the recognition sequence for the enzyme had 5 bp? (Assume that the four types of bases are equally likely to be found in the DNA and that the bases in a recognition sequence are independent.) How often would the endonuclease cut the DNA if the recognition sequence had 8 bp?
*30. A microbiologist discovers a new type II restriction endonuclease. When DNA is digested by this enzyme, fragments that average 1,048,500 bp in length are produced. What is the most likely number of base pairs in the recognition sequence of this enzyme?
31. Will restriction sites for an enzyme that has 4 bp in its restriction site be closer together, farther apart, or similarly spaced, on average, compared with those of an enzyme that has 6 bp in its restriction site? Explain your reasoning.
*32. About 30% of the base pairs in a human DNA molecule are AT. If the human genome has 3 billion base pairs of DNA, about how many times will the following restriction sites be present?
(a) BamHI (restriction site = 5′–GGATCC–3′)
(b) EcoRI (restriction site = 5′–GAATTC–3′)
(c) HaeIII (restriction site = 5′–GGCC–3′)
*33. Restriction mapping of a linear piece of DNA reveals the following EcoRI restriction sites.
[Restriction map figure: a linear DNA with EcoRI site 1 and EcoRI site 2; segment lengths labeled 2 kb, 4 kb, and 5 kb]
(a) This piece of DNA is cut by EcoRI, the resulting fragments are separated by gel electrophoresis, and the gel is stained with ethidium bromide. Draw a picture of the bands that will appear on the gel.
(b) If a mutation that alters EcoRI site 1 occurs in this piece of DNA, how will the banding pattern on the gel differ from the one that you drew in part a?
(c) If mutations that alter EcoRI sites 1 and 2 occur in this piece of DNA, how will the banding pattern on the gel differ from the one that you drew in part a?
(d) If a 1000-bp insertion occurred between the two restriction sites, how would the banding pattern on the gel differ from the one that you drew in part a?
(e) If a 500-bp deletion occurred between the two restriction sites, how would the banding pattern on the gel differ from the one that you drew in part a?
*34. Which vectors (plasmid, phage λ, cosmid) can be used to clone a continuous fragment of DNA with the following lengths?
(a) 4 kb.
(b) 20 kb.
(c) 35 kb.
35. A geneticist uses a plasmid for cloning that has a gene that confers resistance to penicillin and the lacZ gene. The geneticist inserts a piece of foreign DNA into a restriction site that is located within the lacZ gene and transforms bacteria with the plasmid. Explain how the geneticist can identify bacteria that contain a copy of a plasmid with the foreign DNA.
*36. Suppose that you have just graduated from college and have started working at a biotechnology firm. Your first job assignment is to clone the pig gene for the hormone prolactin. Assume that the pig gene for prolactin has not yet been isolated, sequenced, or mapped; however, the mouse gene for prolactin has been cloned and the amino acid sequence of mouse prolactin is known. Briefly explain two different strategies that you might use to find and clone the pig gene for prolactin.
37. A genetic engineer wants to isolate a gene from a scorpion that encodes the deadly toxin found in its stinger, with the ultimate purpose of transferring this gene to bacteria and producing the toxin for use as a commercial pesticide. Isolating the gene requires a DNA library. Should the genetic engineer create a genomic library or a cDNA library? Explain your reasoning.
*38. A protein has the following amino acid sequence:

Met-Tyr-Asn-Val-Arg-Val-Tyr-Lys-Ala-Lys-Trp-Leu-Ile-His-Thr-Pro

You wish to make a set of probes to screen a cDNA library for the sequence that encodes this protein. Your probes should be at least 18 nucleotides in length.
(a) Which amino acids in the protein should be used to construct the probes so that the least degeneracy results? (Consult the genetic code in Figure 15.12.)
(b) How many different probes must be synthesized to be certain that you will find the correct cDNA sequence that specifies the protein?
*39. A gene in mice is discovered that is similar to a gene in yeast. How might it be determined whether this gene is essential for development in mice?
*40. A hypothetical disorder called G syndrome is an autosomal dominant disease characterized by visual, skeletal, and cardiovascular defects. The disorder appears in middle age. Because the symptoms of the disorder are variable, the disorder is difficult to diagnose. Early diagnosis is important, however, because the cardiovascular symptoms can be treated if the disorder is recognized early. The gene for G syndrome is known to reside on chromosome 7, and it is closely linked to two RFLPs on the same chromosome, one at the A locus and one at the C locus. The genes at the G, A, and C loci are very close together, and there is little crossing over between them. The following RFLP alleles are found at the A and C loci:
A locus: A1, A2, A3, A4
C locus: C1, C2, C3
Sally, shown in the following pedigree, is concerned that she might have G syndrome. Her deceased mother had G syndrome, and she has a brother with the disorder. A geneticist genotypes Sally and her immediate family for the A and C loci and obtains the genotypes shown on the pedigree.
[Pedigree figure; genotype labels in extraction order: A1 A1 C2 C3, Sally, A1 A3 C2 C3, A1 A2 C2 C3, A1 A2 C1 C2]
(a) Assume that there is no crossing over between the A, C, and G loci. Does Sally carry the gene that causes G syndrome? Explain why or why not.
(b) Draw the arrangement of the A, C, and G alleles on the chromosomes for all members of the family.

CHALLENGE QUESTIONS
41.
Suppose that you are hired by a biotechnology firm to produce a giant strain of fruit flies, by using recombinant DNA technology, so that genetics students will not be forced to strain their eyes when looking at tiny flies. You go to the library and learn that growth in fruit flies is normally inhibited by a hormone called shorty substance P (SSP). You decide that you can produce giant fruit flies if you can somehow turn off the production of SSP. SSP is synthesized from a compound called XSP in a single-step reaction catalyzed by the enzyme runtase:

XSP → SSP (reaction catalyzed by runtase)

A researcher has already isolated cDNA for runtase and has sequenced it, but the location of the runtase gene in the Drosophila genome is unknown. In attempting to devise a strategy for turning off the production of SSP and producing giant flies by using standard recombinant DNA techniques, you discover that deleting, inactivating, or otherwise mutating this DNA sequence in Drosophila turns out to be extremely difficult. Therefore you must restrict your genetic engineering to gene augmentation (adding new genes to cells). Describe the methods that you will use to turn off SSP and produce giant flies by using recombinant DNA technology.
42. A rare form of polydactyly (extra fingers and toes) in humans is due to an X-linked recessive gene, whose chromosomal location is unknown. Suppose a geneticist studies the family whose pedigree is shown here. She isolates DNA from each member of this family, cuts the DNA with a restriction enzyme, separates the resulting fragments by gel electrophoresis, and transfers the DNA to nitrocellulose by Southern blotting. She then hybridizes the nitrocellulose with a cloned DNA sequence that comes from the X chromosome. The pattern of bands that appear on the autoradiograph is shown below each person in the pedigree.
(a) For each person in the pedigree, give his or her genotype for RFLPs revealed by the probe.
(Remember that males are hemizygous for X-linked genes, and females can be homozygous or heterozygous.)
(b) Is there evidence for close linkage between the probe sequence and the X-linked gene for polydactyly? Explain your reasoning.
(c) How many of the daughters in the pedigree are likely to be carriers of X-linked polydactyly? Explain your reasoning.

SUGGESTED READINGS
Andrews, L. B., J. E. Fullarton, N. A. Holtzman, and A. G. Motulsky. 1994. Assessing Genetic Risks: Implications for Health and Social Policy. Washington, DC: National Academy Press. Discusses some of the legal, ethical, and social issues surrounding gene testing.
Berg, P., D. Baltimore, H. W. Boyer, S. N. Cohen, R. W. Davis, D. S. Hogness, D. Nathans, R. Roblin, J. D. Watson, S. Weissman, and N. D. Zinder. 1974. Potential biohazards of recombinant DNA molecules. Science 185:303. Well-known letter calling for a moratorium on certain types of recombinant DNA experiments.
Cohen, J. S., and M. E. Hogan. 1994. The new genetic medicine. Scientific American 271(6):76–82. A good review of oligonucleotide drugs.
Cohen, S., A. Chang, H. Boyer, and R. Helling. 1973. Construction of biologically functional bacterial plasmids in vitro. Proceedings of the National Academy of Sciences of the United States of America 70:3240–3244. Description of the first gene-cloning experiments.
Enriquez, J. 1998. Genomics and the world’s economy. Science 281:925–926. Discussion of the growing importance of gene sequencing and biotechnology in the world economy.
Friedmann, T. 1997. Overcoming the obstacles to gene therapy. Scientific American 276(6):96–101. Review of some of the current problems in gene therapy.
Gasser, C. S., and R. T. Fraley. 1992. Transgenic crops. Scientific American 266(6):62–69. An excellent and readable account of how genes are put into plants and some of the applications.
Isner, J. M. 2002. Myocardial gene therapy. Nature 415:234–239. Discusses recent research on the use of gene therapy to treat coronary artery disease and heart failure.
Mullis, K. B. 1990. The unusual origin of the polymerase chain reaction. Scientific American 262(4):56–65. Mullis describes his inspiration for the polymerase chain reaction.
Nowak, R. 1994. Forensic DNA goes to court with O. J. Science 265:1352–1354. A report on the use of DNA fingerprinting in the famed O. J. Simpson trial.
Roberts, L. 1992. Science in court: a culture clash. Science 257:732–736. A report on some of the controversy surrounding DNA fingerprinting.
Salo, W. L., A. C. Aufderheide, J. Buikstra, and T. A. Holcomb. 1994. Identification of Mycobacterium tuberculosis DNA in a pre-Columbian Peruvian mummy. Proceedings of the National Academy of Sciences of the United States of America 91:2091–2094. Report of tuberculosis-causing bacteria in a 1000-year-old Peruvian mummy.
Stein, C. A., and Y. C. Cheng. 1993. Antisense oligonucleotides as therapeutic agents: is the bullet really magical? Science 261:1004–1011. A review of the development of oligonucleotide drugs.
Verma, I. M., and N. Somia. 1997. Gene therapy: promises, problems, and prospects. Nature 389:239–242. A good review of the state of gene therapy.
Wolfenbarger, L. L., and P. R. Phifer. 2000. The ecological risks and benefits of genetically engineered plants. Science 290:2088–2093. A review of scientific evidence of benefits and risks associated with the use of genetically engineered organisms in agriculture.
Yan, H., K. W. Kinzler, and B. Vogelstein. 2000. Genetic testing: present and future. Science 289:1890–1892. A review of some of the problems associated with genetic testing and current techniques being developed to overcome them.
Zanjani, E. D., and W. F. Anderson. 1999. Prospects for in utero human therapy. Science 285:2084–2088. A review of the potential use of gene therapy in utero on unborn fetuses to correct human genetic defects.
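The multiplicative rule applied in Worked Problem 1 of this chapter can also be written as a few lines of code. The following is only a minimal sketch under the same simplifying assumptions as the problem: bases are treated as independent, %G = %C and %A = %T, and the number of possible site positions is approximated by the length of the molecule. The function names `base_frequencies` and `expected_sites` are illustrative and do not come from the text.

```python
from math import prod


def base_frequencies(gc_fraction):
    """Per-base frequencies in double-stranded DNA, assuming %G = %C and %A = %T."""
    g = c = gc_fraction / 2          # e.g. 62% G + C gives 31% G and 31% C
    a = t = (1 - gc_fraction) / 2    # the remaining 38% is split between A and T
    return {"A": a, "T": t, "G": g, "C": c}


def expected_sites(site, genome_length, gc_fraction):
    """Expected number of occurrences of a recognition sequence.

    Multiplicative rule: the probability of the whole sequence is the product
    of the probabilities of its bases. As in the worked problem, the number of
    positions is approximated by genome_length.
    """
    freqs = base_frequencies(gc_fraction)
    probability = prod(freqs[base] for base in site)
    return probability * genome_length


if __name__ == "__main__":
    for enzyme, site in [("BamHI", "GGATCC"), ("HindIII", "AAGCTT"), ("HpaII", "CCGG")]:
        count = expected_sites(site, 5_000_000, 0.62)
        print(f"{enzyme} ({site}): about {count:.0f} sites")
```

With 62% G + C, each of A and T has a frequency of 0.19, so a six-base site rich in A and T is expected far less often than a four-base site made only of G and C. The same function can be reused for the application questions that ask about other base compositions and genome sizes.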
19 Genomics

• The Decaying Genome of Mycobacterium leprae
• Structural Genomics: Genetic Maps; Physical Maps; DNA-Sequencing Methods; Sequencing an Entire Genome; The Human Genome Project; Single-Nucleotide Polymorphisms; Expressed-Sequence Tags; Bioinformatics
• Functional Genomics: Predicting Function from Sequence; Gene Expression and Microarrays; Genomewide Mutagenesis
• Comparative Genomics: Prokaryotic Genomes; Eukaryotic Genomes
• The Future of Genomics

A young girl with leprosy, a disease caused by the bacterium Mycobacterium leprae. Leprosy causes characteristic patches of skin with a loss of sensation, as seen on the face of this young patient; if untreated, leprosy may lead to nerve damage and disfigurement. Genomic studies reveal that the genome of M. leprae has undergone extensive gene loss, mutation, and rearrangement over evolutionary time. (WHO/OMS.)

The Decaying Genome of Mycobacterium leprae

Leprosy, one of the most feared diseases of history, was well known in ancient times and is still a major public health problem today; from 2 million to 3 million people are affected worldwide, and approximately 650,000 new cases are reported each year. In its severest form, leprosy causes paralysis, blindness, and disfigurement. Although human genes play some role in susceptibility to leprosy, the disease is caused by the bacterium Mycobacterium leprae, which infects cells of the nervous system and causes nerve damage, sensory loss, and disfigurement. In 1873, Armauer Hansen observed these bacteria in tissue samples taken from people with leprosy, but to this day no one has successfully cultured the bacterium in laboratory media, severely restricting the study of the disease agent.

In 2001, scientists in Britain and France determined the sequence of the entire genome of M. leprae. Comparing its genome with that of its close relative M.
tuberculosis (the pathogen that causes tuberculosis) and other mycobacteria has been a source of important insight into the unique properties of this pathogen. The genome of M. leprae is 3,268,203 bp in size, more than 1 million base pairs smaller than the genomes of other mycobacteria. In most bacterial genomes, the vast majority of the DNA encodes proteins; there is little noncoding DNA between genes. In contrast, only about 50% of the DNA of M. leprae encodes proteins (Table 19.1), and M. leprae has about 2300 fewer genes than M. tuberculosis. An incredible 27% of M. leprae’s genome consists of pseudogenes, nonfunctional copies of genes that have been inactivated by mutations. M. leprae has 1116 pseudogenes, whereas its close relative, M. tuberculosis, has just 6.

Table 19.1 Comparison of the genomes of Mycobacterium leprae, which causes leprosy, and Mycobacterium tuberculosis, which causes tuberculosis

Characteristic                                 M. leprae     M. tuberculosis
Genome size (bp)                               3,268,203     4,411,532
Percentage of genome that encodes proteins     49.5%         90.8%
Protein-encoding genes (number)                1604          3959
Pseudogenes (number)                           1116          6
Gene density (bp/gene)                         2037          1114
Average length of gene (bp)                    1011          1012

Source: S. T. Cole et al., Massive gene decay in the leprosy bacillus, Nature 409 (2001), p. 1007.

The reduced DNA content, fewer functional genes, and the large number of pseudogenes suggest that, evolutionarily, the genome of M. leprae has undergone massive decay through time, losing DNA and acquiring mutations that have inactivated many of its genes. Furthermore, the genome of M. leprae has undergone extensive rearrangement; comparison with the genome of M. tuberculosis has identified at least 65 gene segments that are arranged in different order and distribution. The mechanisms responsible for gene decay and genomic rearrangement in M.
leprae are not known, although the loss of proofreading ability in the bacterium’s DNA polymerase III (the enzyme responsible for most bacterial DNA replication, see Chapter 12) may contribute to a high rate of mutation and the large number of pseudogenes. Because the leprosy bacterium resides in a highly specialized habitat (human nerve cells), it may have lost the need for many enzymatic functions found in other bacteria. When a function is no longer required for survival, genes encoding that function usually accumulate mutations and deletions. Regardless of the mechanism for gene inactivation and loss, this genomic decay helps explain some of the bacterium’s unique properties. Genes for many metabolic enzymes and structural proteins have been lost, which may explain why the bacterium cannot be cultured on synthetic media containing traditional carbon sources; it may also account for the bacterium’s slow growth, with a doubling time of 14 days, compared with a doubling time of 20 minutes for E. coli. A comparison of M. leprae’s genome with those of other related bacteria has identified a few unique genes that may contribute to its pathogenesis. The study of these genes has opened the door to an improved understanding of leprosy, better diagnostic tests, and the development of new drugs for the disease. The information gleaned from sequencing the genome of M. leprae illustrates the power of genomics, which is the focus of this chapter. Genomics is the field of genetics that attempts to understand the content, organization, function, and evolution of genetic information contained in whole genomes. Genomics consists of two complementary fields: structural genomics and functional genomics. Structural genomics determines the organization and sequence of the genetic information contained within a genome, and functional genomics characterizes the function of sequences elucidated by structural genomics. 
A third area, comparative genomics, compares the gene content, function, and organization of genomes of different organisms. The field of genomics is at the cutting edge of modern biology; information resulting from research in this field has made significant contributions to human health, agriculture, and numerous other areas. It has also provided gene sequences necessary for producing medically important proteins through recombinant DNA technology. Comparisons of genome sequences from different organisms are leading to a better understanding of evolution and the history of life. We begin this chapter by examining genetic and physical maps and methods for sequencing entire genomes. Next, we explore functional genomics — how genes are identified in genomic sequences and how their functions are defined. Some of the genomes that have been sequenced are then examined in detail. We end the chapter by briefly considering the future of genomics. Concepts The field of genomics comprises structural genomics, which focuses on the content and organization of genomic information, and functional genomics, which attempts to understand the function of information in genomes. Comparative genomics compares the content and organization of genomes of different organisms. Genomics makes important contributions to human health, agriculture, biotechnology, and our understanding of evolution. 
www.whfreeman.com/pierce Internet sources of information on leprosy

Structural Genomics

Structural genomics is concerned with sequencing and understanding the content of genomes. Often, one of the first steps in characterizing a genome is to prepare genetic and physical maps of its chromosomes. These maps provide information about the relative locations of genes, molecular markers, and chromosome segments, which are often essential for positioning chromosome segments and aligning stretches of sequenced DNA into a whole-genome sequence.

Genetic Maps

Everyone has used a map at one time or another. Maps are indispensable for finding a new friend’s house, the way to an unfamiliar city in your state, or the location of a country on the globe. Each of these examples requires a map with a different scale. For finding a friend’s house, you would probably use a city street map; for finding your way to an unknown city, you might pick up a state highway map; for finding a country such as Kazakhstan, you would need a world atlas. Similarly, navigating a genome requires maps of different types and scales.

Genetic maps (also called linkage maps) provide a rough approximation of the locations of genes relative to the locations of other known genes ( ◗ FIGURE 19.1). These maps are based on the genetic function of recombination (hence the name genetic map). The basic principles of constructing genetic maps are discussed in detail in Chapter 7. In short, individuals heterozygous at two or more genetic loci are crossed, and the frequency of recombination between loci is determined by examining the progeny. If the recombination frequency between two loci is 50%, then the loci are located on different chromosomes or are far apart on the same chromosome. If the recombination frequency is less than 50%, the loci are located close together on the same chromosome (they belong to the same linkage group). For linked genes, the rate of recombination is proportional to the physical distance between the loci. Distances on genetic maps are measured in percent recombination (centimorgans, cM) or map units. Data from multiple two-point or three-point crosses can be integrated into linkage maps for whole chromosomes.

◗ 19.1 Genetic maps are based on rates of recombination. Shown here is a genetic map of human chromosome 1, with DNA markers (D1S243 through D1S180) placed along the chromosome and the distances between them given in centimorgans. (1) Distances on a genetic map are measured in centimorgans. (2) DNA markers and a few genes (in blue) of known phenotypes can be used to determine the positions of genes. (3) Bands visible on a metaphase chromosome are numbered. The locations of some DNA markers relative to chromosome bands have been determined. (4) This gene (ACTN2) encodes α-actinin, an actin-binding protein found in muscle cells. Mutation in this gene may be associated with muscular dystrophy.

For many years, genes could be detected only by observing their influence on a trait (the phenotype), and construction of genetic maps was limited by the availability of single-locus traits that could be examined for evidence of recombination. Eventually, this limitation was overcome by the development of molecular techniques such as restriction fragment length polymorphisms, the polymerase chain reaction, and DNA sequencing (see Chapter 18) that are able to provide molecular markers that can be used to construct and refine genetic maps.

Genetic maps have several limitations, the first of which is resolution or detail. The human genome includes 3.4 billion base pairs of DNA and has a total genetic distance of about 4000 cM, an average of 850,000 bp/cM. Even if a marker occurred every centimorgan (which is unrealistic), the resolution in regard to the physical structure of the DNA would still be quite low. In other words, the detail of the map is very limited. A second problem with genetic maps is that they do not always accurately correspond to physical distances between genes. Genetic maps are based on rates of crossing over, which vary somewhat from one part of a chromosome to another; so the distances on a genetic map are only approximations of real physical distances along a chromosome. ◗ FIGURE 19.2 compares the genetic map of chromosome III of yeast with a physical map determined by DNA sequencing. There are some discrepancies between the distances and even among the positions of some genes. In spite of these limitations, genetic maps have been critical to the development of physical maps and the sequencing of whole genomes.

Callouts for Figure 19.2: (1) The physical map shows the actual location of the genes; it has greater resolution and accuracy, (2) whereas the genetic map is created from recombination frequency data and has limited accuracy.
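The arithmetic behind genetic-map distances and map resolution can be checked with a few lines of Python. This is a minimal sketch, not a mapping tool: the function names are invented for illustration, the progeny counts in the usage note are hypothetical, and the genome size and map length are simply the figures quoted in the text.

```python
# Figures quoted in the text for the human genome.
HUMAN_GENOME_BP = 3.4e9        # about 3.4 billion base pairs
HUMAN_MAP_LENGTH_CM = 4000     # about 4000 centimorgans


def bp_per_centimorgan(genome_bp, map_length_cm):
    """Average number of base pairs represented by one centimorgan."""
    return genome_bp / map_length_cm


def recombination_frequency(recombinant_progeny, total_progeny):
    """Percent recombination between two loci, from progeny counts."""
    return 100.0 * recombinant_progeny / total_progeny


def describe_linkage(rf_percent):
    """Interpret a recombination frequency as the text does."""
    if rf_percent >= 50.0:
        return "on different chromosomes or far apart on the same chromosome"
    return "in the same linkage group"


if __name__ == "__main__":
    print(bp_per_centimorgan(HUMAN_GENOME_BP, HUMAN_MAP_LENGTH_CM))
    rf = recombination_frequency(18, 200)   # hypothetical cross
    print(rf, "cM:", describe_linkage(rf))
```

With the quoted figures, one centimorgan corresponds on average to 850,000 bp, which is why even a marker at every centimorgan would leave the map coarse relative to the DNA sequence.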
◗ 19.2 Genetic and physical maps may differ in relative distances and even in the position of genes on a chromosome. Genetic and physical maps of yeast chromosome III reveal such differences.

Physical Maps

Physical maps are based on the direct analysis of DNA, and they place genes in relation to distances measured in number of base pairs, kilobases, or megabases ( ◗ FIGURE 19.3). A common type of physical map is one that connects isolated pieces of genomic DNA that have been cloned in bacteria or yeast. Physical maps generally have higher resolution and are more accurate than genetic maps. A physical map is analogous to a neighborhood map that shows the location of every house along a street, whereas a genetic map is analogous to a highway map that shows the locations of major towns and cities.

A number of techniques exist for creating physical maps, including restriction mapping, which determines the positions of restriction sites on DNA; sequence-tagged site (STS) mapping, which locates the positions of short unique sequences of DNA on a chromosome; fluorescent in situ hybridization (FISH), by which markers can be visually mapped to locations on chromosomes (see Figure 7.22); and DNA sequencing.

◗ 19.3 Physical maps are often used to order cloned DNA fragments. A part of a physical map of a set of overlapping YAC clones from one end of the human Y chromosome. The figure aligns the YAC clones (labeled yOX) against numbered sequence-tagged sites.
Concepts
Both genetic and physical maps provide information about the relative positions and distances between genes, molecular markers, and chromosome segments. Genetic maps are based on rates of recombination and are measured in percent recombination, or centimorgans. Physical maps are based on the physical distances and are measured in base pairs.

Restriction mapping determines the relative positions of restriction sites on a piece of DNA. When a piece of DNA is cut with a restriction enzyme and the fragments are separated by gel electrophoresis, the number of restriction sites in the DNA and the distances between them can be determined by the number and positions of bands on the gel (see Chapter 18), but this information does not tell us the order or the precise location of the restriction sites. To map restriction sites, a sample of the DNA is cut with one restriction enzyme, and another sample is cut with a different restriction enzyme. A third sample is cut with both restriction enzymes together (a double digest). The DNA fragments produced by these restriction digests are then separated by gel electrophoresis, and their sizes are compared. Overlap in size of fragments produced by the digests can be used to position the restriction sites on the original DNA molecule.

◗ 19.4 Restriction sites can be mapped by comparing DNA fragments produced by digestion by restriction enzymes used alone and in various combinations. A sample of a linear piece of DNA was first digested with EcoRI alone. Another sample was digested by BamHI alone, and finally a third sample was digested by both EcoRI and BamHI. The resulting fragments were separated by gel electrophoresis and stained with ethidium bromide. Gel lanes: EcoRI alone; BamHI alone; EcoRI + BamHI (double digest); size standards (1 kb to 10 kb).

Worked Problem
One sample of a linear 13,000-bp (13-kb) DNA fragment is cut with the restriction enzyme EcoRI; a second sample of the same DNA is cut with BamHI; and a third sample is cut with both EcoRI and BamHI together. The resulting fragments are separated and sized by gel electrophoresis ( ◗ FIGURE 19.4). Determine the positions of the EcoRI and BamHI restriction sites on the original 13-kb fragment.

• Solution
Using the sizes of the fragments produced from the three digests in Figure 19.4, we can order the positions of the restriction sites on the original 13-kb piece of DNA. First, note that digestion with EcoRI alone produced 8-kb, 3-kb, and 2-kb fragments, indicating that there are two EcoRI restriction sites in the original linear piece of DNA. Digestion with BamHI produced 9-kb and 4-kb fragments, indicating that there is only one BamHI site. The BamHI restriction site must be 9 kb from one end and 4 kb from the other end.

The double digest produced four pieces of DNA: 7-kb, 3-kb, 2-kb, and 1-kb fragments. Neither of the fragments generated by BamHI alone is present in the double digest, and so EcoRI must have cut both of the BamHI fragments. Consider the 9-kb fragment. How could this fragment be cut by EcoRI to produce the fragments found in the double digest? Two of the fragments produced by the double digest, the 7-kb and 2-kb fragments, add up to 9 kb, the length of one fragment produced by digestion by BamHI alone. Similarly, the 3-kb fragment and the 1-kb fragment of the double digest add up to 4 kb, the length of the other fragment produced by BamHI alone. Therefore, EcoRI cut the first BamHI fragment into 7-kb and 2-kb fragments and cut the second BamHI fragment into 3-kb and 1-kb fragments:

Digestion with BamHI:            9 kb            |       4 kb
Further digestion with EcoRI:    7 kb and 2 kb   |       3 kb and 1 kb

We now know that the BamHI site lies between these two EcoRI sites. Considering the four fragments produced by the double digest, there are several possible arrangements by which a BamHI site could fit in between the two EcoRI sites.
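The elimination among candidate arrangements can also be carried out mechanically. The sketch below is an illustrative brute-force check rather than a standard mapping program: it tries every ordering of the EcoRI fragments and both placements of the single BamHI site on the 13-kb molecule, keeps the arrangements whose predicted double digest matches the observed 7-, 3-, 2-, and 1-kb fragments, and treats a map and its mirror image as the same map. All function names are assumptions made for this example.

```python
from itertools import permutations

LENGTH_KB = 13
ECORI_FRAGMENTS = (8, 3, 2)      # EcoRI alone
BAMHI_FRAGMENTS = (9, 4)         # BamHI alone
DOUBLE_DIGEST = (7, 3, 2, 1)     # EcoRI + BamHI


def digest(length, sites):
    """Fragment sizes (sorted) from cutting a linear molecule at the given sites."""
    cuts = [0] + sorted(sites) + [length]
    return sorted(right - left for left, right in zip(cuts, cuts[1:]))


def canonical(ecori_sites, bamhi_site, length):
    """A map read from either end is the same map; keep one orientation."""
    forward = (tuple(sorted(ecori_sites)), bamhi_site)
    mirrored = (tuple(sorted(length - s for s in ecori_sites)), length - bamhi_site)
    return min(forward, mirrored)


def consistent_maps():
    """All site arrangements that reproduce the three observed digests."""
    maps = set()
    for first, second, _ in permutations(ECORI_FRAGMENTS):
        ecori_sites = (first, first + second)
        # The single BamHI cut lies 9 kb or 4 kb from the left end.
        for bamhi_site in BAMHI_FRAGMENTS:
            predicted = digest(LENGTH_KB, ecori_sites + (bamhi_site,))
            if predicted == sorted(DOUBLE_DIGEST):
                maps.add(canonical(ecori_sites, bamhi_site, LENGTH_KB))
    return sorted(maps)


if __name__ == "__main__":
    for ecori_sites, bamhi_site in consistent_maps():
        print("EcoRI sites at", ecori_sites, "kb; BamHI site at", bamhi_site, "kb")
```

Only one arrangement survives: EcoRI sites 2 kb and 10 kb from one end, with the BamHI site at 9 kb, which gives fragments of 2, 7, 1, and 3 kb in order along the molecule.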
To determine which of the arrangements is correct, compare the results of the EcoRI digestion with the double digest. When the original 13-kb DNA fragment was cut by EcoRI alone, the three fragments produced were 8 kb, 3 kb, and 2 kb in length. The 2-kb and 3-kb bands are also present in the double digest, indicating that these fragments do not contain a BamHI site. The 8-kb fragment present in the EcoRI digest disappears in the double digest and is replaced by the 7-kb fragment and the 1-kb fragment, indicating that the 8-kb fragment has the BamHI site. Thus the 7-kb and 1-kb fragments must lie next to each other, and the 2-kb and 3-kb fragments are on the ends. Thus, the correct arrangement of the restriction sites is:

2 kb | EcoRI site | 7 kb | BamHI site | 1 kb | EcoRI site | 3 kb

In the example in the Worked Problem, we can map the restriction sites in our heads or with a few simple sketches. Most restriction mapping is done with several restriction enzymes, used alone and in various combinations, producing many restriction fragments. With long pieces of DNA (greater than 30 kb), computer programs are used to determine the restriction maps, and restriction mapping may be facilitated by tagging one end of a large DNA fragment with radioactivity or by identifying the end with the use of a probe.

Physical maps, such as restriction maps of DNA fragments or even whole chromosomes, are often created for genomic analysis. These lengthy maps are often put together by combining maps of shorter, overlapping genomic fragments.

Concepts
The locations of restriction sites can be mapped by cutting DNA with several restriction enzymes, first with each restriction enzyme alone and then with combinations of restriction enzymes.

DNA-Sequencing Methods

The most detailed physical maps are based on direct DNA sequence information. The first methods for quickly sequencing DNA were developed between 1975 and 1977. Frederick Sanger and his colleagues created the dideoxy sequencing method based on the elongation of DNA; Allan Maxam and Walter Gilbert developed a second method based on the chemical degradation of DNA. The Sanger method quickly became the standard procedure for sequencing any purified fragment of DNA.

The Sanger, or dideoxy, method of DNA sequencing is based on the process of replication. The fragment to be sequenced is used as a template to make a series of new DNA molecules. In the process, replication is sometimes (but not always) terminated when a specific base is encountered, producing DNA strands of different length, each of which ends in the same base.

The method relies on the use of a special substrate for DNA synthesis. Normally, DNA is synthesized from deoxyribonucleoside triphosphates (dNTPs), which have an OH group on the 3′-carbon atom ( ◗ FIGURE 19.5a). In DNA synthesis, two phosphate groups on the 5′-carbon atom of a dNTP are removed, and a phosphodiester bond is formed between the remaining 5′-phosphate group of the dNTP and the 3′-OH group of the last nucleotide on the growing DNA chain (see Chapter 12). In the Sanger method, a special nucleotide, called a dideoxyribonucleoside triphosphate (ddNTP; ◗ FIGURE 19.5b), is used as substrate. The ddNTPs are identical with dNTPs, except that they lack a 3′-OH group. Like dNTPs, ddNTPs possess three phosphate groups on their 5′ ends, and so they are incorporated into a growing DNA chain. When a ddNTP has been incorporated into a DNA chain, however, no more nucleotides can be added, because there is no 3′-OH group to form a phosphodiester bond with an incoming nucleotide. Thus, ddNTPs terminate DNA synthesis.

◗ 19.5 The dideoxy sequencing reaction requires a special substrate for DNA synthesis. (a) Structure of deoxyribonucleoside triphosphate, the normal substrate for DNA synthesis. (b) Structure of dideoxyribonucleoside triphosphate, which lacks an OH group on the 3′ carbon atom.

A single DNA molecule cannot be sequenced; so any DNA fragment to be sequenced must first be amplified by PCR or by cloning in bacteria. Copies of the target DNA are isolated and split into four parts ( ◗ FIGURE 19.6). Each part is placed in a different tube, to which are added:

1. many copies of a primer that is complementary to one end of the target DNA strand;
2. all four deoxyribonucleoside triphosphates (dCTP, dATP, dGTP, and dTTP), the normal precursors of DNA synthesis;
3. a small amount of one of the four types of dideoxyribonucleoside triphosphates (ddCTP, ddATP, ddGTP, or ddTTP), which will terminate DNA synthesis as soon as it is incorporated into any growing chain (each of the four tubes receives a different ddNTP); and
4. DNA polymerase.

Either the primer or one of the dNTPs is radioactively or chemically labeled so that newly produced DNA can be detected.

◗ 19.6 The dideoxy method of DNA sequencing is based on the termination of DNA synthesis. (1) Each of four reactions contains single-stranded target DNA to be sequenced, (2) a primer, (3) all four deoxyribonucleoside triphosphates (dCTP, dGTP, dATP, and dTTP), DNA polymerase, (4) and one type of dideoxyribonucleoside triphosphate (ddNTP). (5) During DNA synthesis, nucleotides are added to the 3′ end of the primer with the target DNA being used as a template. (6) When a dideoxynucleotide is incorporated into the growing chain, synthesis terminates because the dideoxynucleotide lacks a 3′ OH. (7) Both the normal substrate and a ddNTP are used in the reaction; so synthesis terminates at different positions on different strands, (8) which generates a set of DNA fragments of various length, each ending in a dideoxynucleotide with the same base. (9) The fragments produced in each reaction are separated by gel electrophoresis. (10) The sequence can be read directly from the bands that appear on the autoradiograph of the gel, starting from the bottom. (11) The sequence obtained is the complement of the original template strand. In the example shown, the template CTAAGCTCGACT yields the chains GddA, GATTCGddA, and GATTCGAGCTGddA in the ddATP reaction; GATTddC and GATTCGAGddC in the ddCTP reaction; ddG, GATTCddG, and GATTCGAGCTddG in the ddGTP reaction; and GAddT, GATddT, and GATTCGAGCddT in the ddTTP reaction.

Within each of the four tubes, the DNA polymerase enzyme carries out DNA synthesis. Let’s consider the reaction in one of the four tubes: the one that received ddATP. Within this tube, each of the single strands of target DNA serves as template for DNA synthesis. The primer pairs to its complementary sequence at one end of each template strand, providing a 3′-OH group for the initiation of DNA synthesis. DNA polymerase elongates a new strand of DNA from this primer, by using the target DNA strand as a template. Wherever DNA polymerase encounters a T on the template strand, it uses at random either a dATP or a ddATP to introduce an A in the newly synthesized strand. Because there is more dATP than ddATP in the reaction mixture, dATP is incorporated most often, allowing DNA synthesis to continue. Occasionally, however, ddATP is incorporated into the strand and synthesis terminates. The incorporation of ddA into the new strand occurs randomly at different positions in different copies, producing a set of DNA chains of different length (12, 7, and 2 nucleotides long in the example illustrated in Figure 19.6), each ending in a nucleotide with adenine. Equivalent reactions take place in the other three tubes.
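The logic of the four termination reactions can be sketched in Python using the template from Figure 19.6. This is a simplified model under stated assumptions: the primer is ignored, every possible termination position is listed rather than sampled at random, and the function names are invented for illustration.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Template strand from Figure 19.6, written in the 3'-to-5' direction.
TEMPLATE = "CTAAGCTCGACT"


def new_strand(template):
    """Newly synthesized strand (5'-to-3'), complementary to the template."""
    return "".join(COMPLEMENT[base] for base in template)


def termination_lengths(template, dd_base):
    """Lengths of chains that can end in the given dideoxynucleotide."""
    strand = new_strand(template)
    return [i + 1 for i, base in enumerate(strand) if base == dd_base]


def read_gel(template):
    """Read bands from shortest (bottom of gel) to longest (top of gel)."""
    lane_of_band = {}
    for dd_base in "ACGT":                      # one reaction tube per ddNTP
        for length in termination_lengths(template, dd_base):
            lane_of_band[length] = dd_base
    return "".join(lane_of_band[length] for length in sorted(lane_of_band))


if __name__ == "__main__":
    print("ddATP chain lengths:", termination_lengths(TEMPLATE, "A"))
    print("sequence read from gel:", read_gel(TEMPLATE))
```

The ddATP reaction gives chains of 2, 7, and 12 nucleotides, as in the text, and the sequence read from the gel is the complement of the template rather than the template itself.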
In the tube that received ddCTP, all the chains terminate in a nucleotide with cytosine; in the tube that received ddGTP, all the chains terminate in a nucleotide with guanine; and, in the tube that received ddTTP, all the chains terminate in a nucleotide with thymine.

After the completion of the polymerization reactions, all the DNA in the tubes is denatured, and the single-strand products of each reaction are separated by gel electrophoresis. The contents of the four tubes are separated side by side on an acrylamide gel so that DNA strands differing in length by only a single nucleotide can be distinguished. After electrophoresis, the locations of the DNA strands in the gel are revealed by autoradiography. The shortest strands, which terminated at positions early in the DNA sequence, migrate quickly and end up near the bottom of the gel; longer fragments, which terminated late in the sequence, migrate more slowly and end up near the top of the gel.

Reading the DNA sequence is simple and the shortest part of the procedure. In Figure 19.6, you can see that the band closest to the bottom of the gel is from the tube that contained the ddGTP reaction, which means that the first nucleotide synthesized had guanine (G). The next band up is from the tube that contained ddATP; so the next nucleotide in the sequence is adenine (A), and so forth. In this way, the sequence is read from the bottom to the top of the gel, with the nucleotides near the bottom corresponding to the 5′ end of the newly synthesized DNA strand and those near the top corresponding to the 3′ end. Keep in mind that the sequence obtained is not that of the target DNA but that of its complement.

You may have wondered how the primers used in dideoxy sequencing are constructed, because the sequence of the target DNA may not be known ahead of time. The trick is to insert a sequence that will be recognized by the primer into the target DNA.
This is often done by first cloning the target DNA in a vector that contains sequences recognized by a common primer (called universal sequencing primer sites) on either side of the site where the target DNA will be inserted. The target DNA is then isolated from the vector and will contain universal sequencing primer sites at each end ( ◗ FIGURE 19.7).

◗ 19.7 Sites recognized by sequencing primers are added to the target DNA by cloning the DNA in a vector that contains universal sequencing primer sites on either side of the site where the target DNA will be inserted.

Sequencing is often carried out by automated machines that use fluorescent dyes and laser scanners to sequence thousands of base pairs in a few hours ( ◗ FIGURE 19.8). The dideoxy reaction is also used here, but the ddNTPs used in the reaction are labeled with a fluorescent dye, and a different colored dye is used for each type of dideoxynucleotide. For example, a red dye might be used for nucleotides with thymine, a green dye for those with adenine, a black dye for those with guanine, and a blue dye for those with cytosine. In this case, the four sequencing reactions can take place in the same test tube and can be placed in the same well during electrophoresis, given that each ddNTP is distinctively marked. The most recently developed sequencing machines carry out electrophoresis in gel-containing capillary tubes. The different-sized fragments produced by the sequencing reaction separate within a tube and migrate past a laser beam and detector. As the fragments pass the laser, their fluorescent dyes are activated and the resulting fluorescence is detected by an optical scanner. Each colored dye emits fluorescence of a characteristic wavelength, which is read by the optical scanner. The information is fed into a computer for interpretation, and the results are printed out as a set of peaks on a graph (see Figure 19.8).
Automated sequencing machines may contain 96 or more capillary tubes, allowing from 50,000 to 60,000 bp of sequence to be read in a few hours.

Concepts
DNA can be rapidly sequenced by the dideoxy method, in which ddNTPs are used to terminate DNA synthesis at specific bases. Automated sequencing methods allow tens of thousands of base pairs to be read in just a few hours.

◗ 19.8 The dideoxy sequencing method can be automated. (1) A single-stranded DNA fragment whose base sequence is to be determined (the template, 5′ CCTATTATGACACAACCGCA 3′ in this example) is isolated. (2) Each of the four ddNTPs is tagged with a fluorescent dye, and the Sanger sequencing reaction is carried out, starting from a primer of known sequence (3′ GCGT 5′). (3) The fragments that end in the same base have the same colored dye attached. (4) The products are denatured, and the DNA fragments produced by the four reactions are mixed and loaded into a single well on an electrophoresis gel. The fragments migrate through the gel according to size, (5) and the fluorescent dye on the DNA is detected by using a laser beam and detector. (6) Each fragment appears as a peak on the computer printout; the color of the peak indicates which base the peak represents. (7) The sequence information is read directly into the computer, which converts it into the complementary (target) sequence.

Sequencing an Entire Genome

The ultimate goal of structural genomics is to determine the ordered nucleotide sequences of entire genomes of organisms. The main obstacle to this task is the immense size of most genomes.
Bacterial genomes are usually at least several million base pairs long; many eukaryotic genomes are billions of base pairs long and are distributed among dozens of chromosomes. In addition, for technical reasons, it is not possible to begin sequencing at one end of a chromosome and continue straight through to the other end; only small fragments of DNA (usually from 500 to 700 nucleotides) can be sequenced at one time. Therefore, determining the sequence for an entire genome requires that the DNA be broken into thousands or millions of smaller fragments that can then be sequenced. The difficulty lies in putting these short sequences back together in the correct order. As we will see, two different approaches have been used to assemble the short, sequenced fragments into a complete genome.

The first genomes to be sequenced were small genomes of some viruses. The genome of bacteriophage λ, consisting of 49,000 bp, was completed in 1982. In 1995, the first genome of a free-living organism (Haemophilus influenzae) was sequenced by Craig Venter and Claire Fraser of The Institute for Genomic Research (TIGR) and Hamilton Smith of Johns Hopkins University. This bacterium has a relatively small genome of 1.8 million base pairs ( ◗ FIGURE 19.9). By 1996, the genome of the first eukaryotic organism (yeast) had been determined, followed by the genomes of Escherichia coli (1997), Caenorhabditis elegans (1998), and Drosophila melanogaster (2000). The first draft of the human genome was completed in June 2000.

Map-based sequencing
The first method for assembling short, sequenced fragments into a whole-genome sequence, called a map-based approach, requires the initial creation of detailed genetic and physical maps of the genome, which provide known locations of genetic markers (restriction sites, other genes, or known DNA sequences) at regularly spaced intervals along each chromosome. These markers can later be used to help align the short, sequenced fragments into their correct order.
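The scale of the fragmentation problem described in this section can be estimated with simple arithmetic. The sketch below is a rough lower bound under stated assumptions: the function name and the `coverage` parameter are introduced here for illustration, reads are assumed to be uniformly 500 nucleotides long (the low end of the range quoted in the text), and the default of reading each base exactly once ignores the overlap that real assembly requires.

```python
import math


def reads_needed(genome_bp, read_length_bp, coverage=1):
    """Minimum number of reads needed to cover a genome `coverage` times."""
    return math.ceil(genome_bp * coverage / read_length_bp)


if __name__ == "__main__":
    # Genome sizes quoted in the text.
    print("H. influenzae:", reads_needed(1_800_000, 500))
    print("Human:", reads_needed(3_400_000_000, 500))
```

Even at a single pass, a 1.8-million-bp bacterial genome calls for thousands of reads and a human-sized genome for millions, which matches the "thousands or millions of smaller fragments" noted in the text.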
◗ 19.9 The bacterium Haemophilus influenzae was the first free-living organism to be sequenced. Numbers around the circular map represent a scale in base pairs (1 to 1,800,000). Sites where restriction enzymes cleave DNA are shown with the enzyme name (SmaI, NotI, RsrII). The outer circle shows genes whose function is indicated in the key: amino acid biosynthesis; biosynthesis of cofactors, prosthetic groups, and carriers; cell envelope; cellular processes; central intermediary metabolism; energy metabolism; fatty acid and phospholipid metabolism; purines, pyrimidines, nucleosides, and nucleotides; regulatory functions; replication; transport/binding proteins; translation; transcription; other categories; hypothetical; unknown. The second circle indicates the base composition in that area: red > 42% GC; blue > 40% GC; black > 66% AT; green > 64% AT. The third circle shows coverage of some clones used in sequencing the genome. The fourth circle shows ribosomal operons (green), tRNAs (black), and µ-like prophage (blue). The fifth circle shows positions of simple tandem repeats. (From R. D. Fleischmann et al., 1995, Science 269:496; scan courtesy of TIGR.)

After the genetic and physical maps are available, chromosomes or large pieces of chromosomes are separated by pulsed-field gel electrophoresis (PFGE) or by flow cytometry. In pulsed-field gel electrophoresis (which is similar to standard gel electrophoresis), large molecules of DNA or whole chromosomes are separated in a gel by periodically alternating the orientation of an electrical current. In flow cytometry, chromosomes are sorted optically by size ( ◗ FIGURE 19.10).
Each chromosome (or sometimes the entire genome) is then cut up by partial digestion with restriction enzymes ( ◗ FIGURE 19.11). Partial digestion means that the restriction enzymes are allowed to act only for a limited time so that not all restriction sites in every DNA molecule are cut. Thus partial digestion produces a set of large overlapping DNA fragments, which are then cloned by using cosmids, yeast artificial chromosomes (YACs), or bacterial artificial chromosomes (BACs).

Next, these large-insert clones are put together in their correct order on the chromosome (see Figure 19.11). This assembly can be done in several ways. One method relies on the presence of a high-density map of genetic markers. A complementary DNA probe is made for each genetic marker, and a library of the large-insert clones is screened with the probe, which will hybridize to any colony containing a clone with the marker. The library is then screened for neighboring markers. Because the clones are much larger than the markers used as probes, some clones will have more than one marker. For example, clone A might have markers M1 and M2, clone B markers M2, M3, and M4, and clone C markers M4 and M5. Such a result would indicate that these clones contain areas of overlap, as shown here.

Clone A: M1 M2
Clone B: M2 M3 M4
Clone C: M4 M5
Contig: M1 M2 M3 M4 M5

A set of two or more overlapping DNA fragments that form a contiguous stretch of DNA is called a contig. This approach was used in 1993 to create a contig of the human Y chromosome consisting of 196 overlapping YAC clones (see Figure 19.3).

[Figure 19.10: 1 Cells are broken open to release chromosomes… 2 …and are stained with fluorescent dye. 3 The dye taken up is proportional to chromosome size. 4 Chromosomes, one per droplet, pass a laser, which causes them to fluoresce. 5 A fluorescence detector determines a particular chromosome’s identity from its unique fluorescence… 6 …and signals a charge ring to apply a charge to the designated drops,… 7 …which are deflected by deflecting plates into a separate receptacle, giving one sample with the desired chromosome and one with the other chromosomes.]

◗ 19.10 Flow cytometry is used to separate individual chromosomes.

It is also possible to determine the order of clones without the use of preexisting genetic maps. For example, each clone can be cut with a series of restriction enzymes, and the resulting fragments are then separated by gel electrophoresis. This method generates a unique set of restriction fragments, called a fingerprint, for each clone. The restriction patterns for the clones are stored in a database. A computer program is then used to examine the restriction patterns of all the clones and look for areas of overlap. The overlap is then used to arrange the clones in order, as shown here:

[Diagram: clones A, B, and C aligned by their shared restriction sites to form a contig.]

Other genetic markers can be used to help position contigs along the chromosome.

When the large-insert clones have been assembled into the correct order on the chromosome, a subset of overlapping clones that efficiently cover the entire chromosome can be chosen for sequencing. Each of the selected large-insert clones is fractured into smaller overlapping fragments, which are themselves cloned with the use of phages or cosmids (Figure 19.11). These smaller clones (called small-insert clones) are then sequenced. The sequences of the small-insert clones are examined for overlap, which allows them to be correctly assembled to give the sequence of the large-insert clones. Enough overlapping small-insert clones are usually sequenced to ensure that the entire genome is sequenced several times. Finally, the whole genome is assembled by putting together the sequences of all overlapping contigs (Figure 19.11). Often, gaps still exist in the genome map that must be filled in by using other methods.
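The marker-based ordering of large-insert clones described above (clone A with M1 and M2, clone B with M2, M3, and M4, clone C with M4 and M5) can be illustrated with a minimal Python sketch. It is not the software used by any sequencing project; the function name `order_clones` is hypothetical, and the sketch assumes the simplest case, in which the clones form a single linear chain and each clone shares markers only with its immediate neighbors.

```python
def order_clones(clone_markers):
    """Order clones into one linear contig by chaining clones that share markers.

    clone_markers maps a clone name to the set of markers it hybridizes with.
    Assumption (for illustration only): the overlaps form a simple chain.
    """
    names = list(clone_markers)
    # Two clones are neighbors if they have at least one marker in common.
    neighbors = {
        a: [b for b in names if b != a and clone_markers[a] & clone_markers[b]]
        for a in names
    }
    # A clone at the end of the chain overlaps exactly one other clone.
    ends = [n for n in names if len(neighbors[n]) == 1]
    if not ends:
        raise ValueError("no chain end found; overlaps are not a simple chain")
    order = [ends[0]]
    while True:
        unvisited = [n for n in neighbors[order[-1]] if n not in order]
        if not unvisited:
            break
        order.append(unvisited[0])
    return order


clones = {"A": {"M1", "M2"}, "B": {"M2", "M3", "M4"}, "C": {"M4", "M5"}}
print(order_clones(clones))  # ['A', 'B', 'C']
```

Real clone libraries contain repeated sequences, missing markers, and chimeric clones, so practical contig building needs error handling that this sketch leaves out.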
Whole-genome shotgun sequencing

The second approach to genome sequencing does not map and assemble the large-insert clones. In this approach, called whole-genome shotgun sequencing ( ◗ FIGURE 19.12), small-insert clones are prepared directly from genomic DNA and sequenced. Powerful computer programs then assemble the entire genome by examining overlap among the small-insert clones. The requirement for overlap means that most of the genome will be sequenced multiple (often from 10 to 15) times.

Concepts
Sequencing a genome requires breaking it up into small overlapping fragments whose DNA sequences can be determined in a sequencing reaction. The sequences can be ordered into the final genome sequence by a map-based approach (large fragments are ordered with the use of genetic and physical maps) or by whole-genome shotgun sequencing (overlap between the sequences of small fragments is compared by computers).

[Figure 19.11: 1 Partial digestion of DNA results in overlapping fragments (A, B, C) that are then cloned in bacteria. 2 These large-insert clones are analyzed for markers or overlapping restriction sites,… 3 …which allows the large-insert clones to be assembled into a contig, a continuous stretch of DNA. 4 A subset of overlapping clones that cover the entire chromosome are selected and fractured. These pieces are then cloned. 5 Each of these small-insert clones (subclones) is sequenced, and overlap in sequences is used to assemble them in the correct order; for example, ATGCCTG, TGGCTT, and TTATGCCA assemble into ATGCCTGGCTTATGCCA. 6 The final sequence is assembled by putting together the sequences of the large clones and filling in any gaps.]

◗ 19.11 Map-based approaches to whole-genome sequencing rely on detailed genetic and physical maps to align sequenced fragments.

◗ 19.12 Whole-genome shotgun sequencing utilizes sequence overlap to align sequenced fragments.
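Assembly by sequence overlap, the core of the shotgun approach, can be sketched in a few lines of Python. The greedy strategy below (repeatedly merge the two reads with the longest suffix-to-prefix overlap) is a textbook simplification and an assumption of this example, not a description of the assemblers actually used in genome projects. The function names and the minimum overlap of 3 bases are hypothetical, and the four reads are modeled on the fragments shown in Figure 19.12.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise, always taking the largest available overlap first."""
    reads = list(reads)
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # no remaining overlaps: the reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


reads = ["TCCTGCGAGAC", "TTACCACGGGGA", "AGACGTGTCAA", "GGGGACGATCCT"]
print(greedy_assemble(reads))  # ['TTACCACGGGGACGATCCTGCGAGACGTGTCAA']
```

Greedy merging can misassemble repeated sequences, which is one reason real projects sequence each region many times and use additional mapping information.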
[Figure 19.12: 1 Genomic DNA is cut into numerous small overlapping fragments and cloned in bacteria. 2 Each fragment is sequenced. 3 Overlap in sequence is used to order the clones,… 4 …and the entire genomic sequence is assembled by powerful computer programs; in the example shown, the fragments TTACCACGGGGA, GGGGACGATCCT, TCCTGCGAGAC, and AGACGTGTCAA assemble into TTACCACGGGGACGATCCTGCGAGACGTGTCAA.]

www.whfreeman.com/pierce More about current developments in DNA sequencing

The Human Genome Project

By 1980, methods for mapping and sequencing DNA fragments had been sufficiently developed that geneticists began seriously proposing that the entire human genome could be sequenced. An international collaboration was planned to undertake the Human Genome Project ( ◗ FIGURE 19.13); initial estimates suggested that 15 years and $3 billion would be required to accomplish the task. As a part of the effort, the genomes of several model organisms, including Escherichia coli, Saccharomyces cerevisiae (yeast), Drosophila melanogaster (fruit fly), Arabidopsis thaliana (a plant), and Caenorhabditis elegans (a nematode) were to be sequenced as well. The genomes of these model organisms were sequenced to help develop methods that could then be applied to the sequencing of the human genome and to provide sequenced genomes with which to compare the organization and structure of the human genome.

◗ 19.13 The Human Genome Project has produced an initial draft of the sequence for the human genome. (Mario Tama/Getty.)

The Human Genome Project officially got underway in October 1990. Initial efforts focused on developing new and automated methods for cloning and sequencing DNA and on generating detailed physical and genetic maps of the human genome. The methods described earlier for mapping, sequencing, and assembling DNA fragments were pivotal in these early stages of the project. By 1993, large-scale physical maps were completed for all 24 human chromosomes (the 22 autosomes plus X and Y).
At the same time, automated sequencing techniques ( ◗ FIGURE 19.14) had been developed that made large-scale sequencing feasible. The initial effort to sequence the genome was a public project consisting of the international collaboration of 20 research groups and hundreds of individual researchers who formed the International Human Genome Sequencing Consortium. In 1998, Craig Venter announced that he would lead a company called Celera Genomics in a private effort to sequence the human genome. The public and private efforts moved forward simultaneously but used different approaches. The Human Genome Consortium used a map-based approach; many copies of the human genome were cut up into fragments of about 150,000 bp each, which were inserted into bacterial artificial chromosomes. Yeast artificial chromosomes and cosmids had been used in early stages of the project but did not prove to be as stable as the BAC clones, although YAC clones were instrumental in putting together some of the larger contigs. Restriction fingerprints were used to assemble the BAC clones into contigs, which were positioned on the chromosomes by using genetic markers and probes. The individual BAC clones were sheared into smaller overlapping fragments and sequenced, and the whole genome was assembled by putting together the sequence of the BAC clones. Celera Genomics used a whole-genome shotgun approach to determine the human genome sequence, although the genetic and physical maps produced by the public effort helped Celera assemble the final sequence. In this approach, small-insert clones were prepared directly from genomic DNA and then sequenced. The overlapping of DNA sequences among these small-insert clones was then used to assemble the entire genome. Both public and private sequencing projects announced the completion of a rough draft that included most of the sequence of the human genome in the summer of 2000, 5 years ahead of schedule. 
Analysis of this sequence was published 6 months later.

The availability of the complete sequence of the human genome is proving to be of enormous benefit. It is greatly facilitating the identification and isolation of genes that contribute to many human diseases and is providing probes that can be used in genetic testing, diagnosis, and drug development. The sequence is also providing important information about many basic cellular processes. Comparisons of the human genome with those of other organisms are adding to our understanding of evolution and the history of life.

◗ 19.14 Automated sequencers and powerful computers allowed a rough draft of the human genome sequence to be completed in just 10 years. (Whitehead/MIT Genome Center, 2001; from Nature 409:860–921.)

The New Genetics
ETHICS • SCIENCE • TECHNOLOGY
Mapping the Human Genome: Where It Leads, What It Means

In June 2000, scientists from the Human Genome Project and Celera Genomics stood at a podium with President Bill Clinton to announce a stunning achievement: they had successfully constructed a sequence of the entire human genome. Soon this process of identifying and sequencing each and every human gene became characterized as “mapping the human genome.” As with maps of the physical world, the map of the human genome provides a picture of locations, terrains, and structures. But, like explorers, scientists must continue to decipher what each location on the map can tell us about diseases, human health, and biology. The map accelerates this process because it allows researchers to identify key structural dimensions of the gene that they are exploring and reminds them where they have been and where they have yet to explore.

What does the map of the human genome depict? When researchers discuss the sequencing of the genome, they are describing the identification of the patterns and order of the 3 billion human DNA base pairs.
Although such identification provides valuable information about overall structure and the evolution of humans in relation to other organisms, researchers really want the key information encoded in just 2% of this enormous map: the information that encodes most of the proteins of which you and I are composed. Proteins stand as the link between genes and pharmaceutical drug development, they show which genes are being expressed at any given moment, and they provide information about gene function.

Knowing our genes will lead to a greater understanding and radically improved treatment of many diseases. However, sequencing the entire human genome, in conjunction with sequencing of various nonhuman genomes under the same project, has raised fundamental questions about what it means to be human. After all, fruit flies possess about one-third the number of genes possessed by humans, and an ear of corn has approximately the same number of genes as a human. In addition, the overall DNA sequence of a chimpanzee is about 99% the same as the human genome sequence. As the genomes of other species become available, the similarities to the human genome in both structure and sequence pattern will continue to be identified. At a basic level, the discovery of so many commonalities and links and ancestral trees with other species adds credence to principles of evolution and Darwinism.

[Photo: Dr. Craig Venter (Celera Genomics), President Clinton, and Dr. Francis Collins (NIH). (Ron Edmonds/AP)]

Some of the most expected developments and potential benefits of the Human Genome Project directly affect human health; researchers, practicing physicians, and the general public eagerly await the development of targeted pharmaceutical agents and more specific diagnostic tests. Pharmacogenomics is at the intersection of genetics and pharmacology; it is the study of how one’s genetic makeup will affect one’s response to various drugs. In the future, medicine will potentially be safer, cheaper, and more disease specific, all while causing fewer side effects and acting more effectively the first time around.

There are, however, some hard ethical questions that follow in the wake of new genetic knowledge. Patients will have to undergo genetic testing in order to match drugs to their genetic makeup. Who will have access to these results: just the health care practitioner? Or will the patient’s insurance company, employer, school, or family have access? Although the tests may have been administered for one case, will information derived from them be used for other purposes, such as for the identification of other conditions or future diseases or even in research studies?

How should researchers conduct studies in pharmacogenomics? Often, they need to study subjects by some kind of identifiable trait that they believe will assist in separating groups of drugs, and in turn they separate people into populations. The order of almost all the DNA base pairs (99.9%) is exactly the same in all humans, leaving a small window of difference. The potential for the stigmatization of individuals and groups of people based on race and ethnicity is inherent in genomic research and analysis. As scientists continue drug development, they must be careful not to further such ideas, especially because studies of nuclear DNA indicate that there is often more genetic variation within ethnic groups or cultures than between ethnic groups or cultures.

These are just a few of the ethical issues arising out of one development of the Human Genome Project. The potential applications of genome research are staggering, and the mapping is just the beginning.

Arthur L. Caplan and Kelly A. Carroll

Concepts
The Human Genome Project is an effort to sequence the entire human genome.
Begun in 1990, the project yielded a rough draft of the genome sequence in 2000, completed by two competing teams: an international consortium of publicly supported investigators and a private company.

www.whfreeman.com/pierce Information about the Human Genome Project and numerous links to it

Single-Nucleotide Polymorphisms

In addition to the DNA sequence of an entire genome, several other types of data are useful for genomic projects and have been the focus of sequencing efforts. One consists of single-nucleotide polymorphisms (SNPs, pronounced “snips”), which are single-base-pair differences in DNA sequence between individual members of a species. Arising through mutation, SNPs are inherited as allelic variants (just like alleles that produce phenotypic differences, such as blood types), although SNPs do not usually produce a phenotypic difference. Single-nucleotide polymorphisms are numerous and are present throughout genomes. In a comparison of the same chromosome from two different people, a SNP can be found approximately every 1000 bp.

Because of their variability and widespread occurrence throughout the genome, SNPs are valuable as markers in linkage studies. For example, human SNPs are being cataloged and mapped for use in identifying genes that contribute to disease. When a SNP is physically close to a disease-causing locus, it will tend to be inherited along with the disease-causing allele. Thus the SNP marks the location of a genetic locus that causes the disease. A SNP can also be useful for determining family relationships: most SNPs are unique within a population, having arisen only once by mutation. Thus the presence of the same SNP in two persons often indicates that they have a common ancestor.

Expressed-Sequence Tags

Another type of data identified by sequencing projects consists of databases of expressed-sequence tags (ESTs).
In most eukaryotic organisms, only a small percentage of the DNA actually encodes proteins; in humans, less than 2% of the DNA encodes the amino acids of proteins. If only protein-encoding genes are of interest, it is often more efficient to examine RNA than the entire DNA genomic sequence. RNA can be examined by using ESTs, markers associated with DNA sequences that are expressed as RNA. Expressed-sequence tags are obtained by isolating RNA from a cell and subjecting it to reverse transcription, producing a set of cDNA fragments that correspond to RNA molecules from the cell. Short stretches of these cDNA fragments are then sequenced, and the sequence obtained (called a tag) provides a marker that identifies the DNA fragment. Expressed-sequence tags can be used to find active genes in a particular tissue or at a particular point in development.

Concepts
In addition to the genomic-sequence data, genomic projects are collecting databases of nucleotides that vary among individuals (single-nucleotide polymorphisms, SNPs) and markers associated with transcribed sequences (expressed-sequence tags, ESTs).

Bioinformatics

By the time this book is published, complete genome sequences will have been determined for more than 100 different organisms, with many additional projects underway. These studies are producing tremendous quantities of sequence data. GenBank, one of the major databases of DNA sequence information, now contains more than 19 billion base pairs of sequence, and this number increases every month. Cataloging, storing, retrieving, and analyzing this huge data set pose a major challenge for modern genetics. Bioinformatics is an emerging field combining molecular biology and computer science that centers on developing databases, computer-search algorithms, gene-prediction software, and other analytical tools that are used to make sense of DNA, RNA, and protein sequence data.
Bioinformatics develops and applies these tools to “mine the data,” extracting the useful information from sequencing projects.

Before a genome is sequenced, the locations of few of its genes have usually been determined, which, coupled with the enormous amount of DNA in a genome and the complexities of gene structure, makes finding genes a difficult task. Computer programs have been developed to look for specific sequences in DNA that are associated with certain genes. For example, protein-encoding genes are characterized by an open reading frame (ORF), which includes a start codon and a stop codon in the same reading frame. Specific sequences mark the splice sites at the beginning and end of introns; other specific sequences are present in promoters immediately upstream of start codons. Still other sequences are associated with particular functions in certain classes of proteins. Computer programs have been developed that scan the DNA for these sequences and identify genes on the basis of their presence and position. Some of these programs are capable of examining databases of EST and protein sequences to see if there is evidence that a potential gene is expressed.

It is important to recognize that the programs that have been developed to identify genes on the basis of DNA sequence are not perfect. Therefore, the numbers of genes reported in most genome projects are estimates. The presence of multiple introns, alternative splicing, multiple copies of some genes, and much noncoding DNA between genes makes accurate identification and counting of genes difficult.

www.whfreeman.com/pierce Information on ESTs, SNPs, and bioinformatics

Functional Genomics

A genomic sequence is, by itself, of limited use. It would be like having a huge set of encyclopedias without being able to read: you could recognize the different letters but the text would be meaningless. Functional genomics is, in essence, probing genome sequences for meaning: identifying genes, recognizing their organization, and understanding their function.

The goals of functional genomics include identifying all the RNA molecules transcribed from a genome (the transcriptome) and all the proteins encoded by the genome (the proteome). Functional genomics exploits both bioinformatics and laboratory-based experimental approaches in its search to define the function of DNA sequences.

Chapter 18 considered several methods for identifying genes and assessing their functions, including in situ hybridization, DNA footprinting, experimental mutagenesis, and the use of transgenic animals and knockouts. These methods can be applied to individual genes and can provide important information about the locations and functions of genetic information. In this section, we will focus primarily on methods that rely on knowing the sequences of other genes or that can be applied to large numbers of genes simultaneously.

Predicting Function from Sequence

The nucleotide sequence of a gene can be used to predict the amino acid sequence of the protein that it encodes. The protein can then be synthesized or isolated and its properties studied to determine its function. However, this biochemical approach to understanding gene function is both time consuming and expensive. A major goal of functional genomics has been to develop computational methods that allow gene function to be identified from DNA sequence alone, bypassing the laborious process of isolating and characterizing individual proteins.

Homology searches

One computational method (often the first employed) for determining gene function is to conduct a homology search, which relies on comparing DNA and protein sequences from the same and different organisms. Genes that are evolutionarily related are said to be homologous. Homologous genes found in different species that evolved from the same gene in a common ancestor are called orthologs ( ◗ FIGURE 19.15). For example, both mouse and human genomes contain a gene that encodes the alpha subunit of hemoglobin; the mouse and human alpha-hemoglobin genes are said to be orthologs, because both genes evolved from an alpha-hemoglobin gene in a mammalian ancestor common to mice and humans. Homologous genes in the same organism (arising by duplication of a single gene in the evolutionary past) are called paralogs (see Figure 19.15). Within the human genome is a gene that encodes the alpha subunit of hemoglobin and another homologous gene that encodes the beta subunit of hemoglobin. These two genes arose because an ancestral gene underwent duplication and the resulting two genes diverged through evolutionary time, giving rise to the alpha- and beta-subunit genes; these two genes are paralogs. Homologous genes (both orthologs and paralogs) often have the same or related functions; so, after a function has been assigned to a particular gene, it can provide a clue to the function of a homologous gene.

[Figure 19.15: gene A undergoes duplication to give genes A1 and A2; after speciation and further evolution, one species carries genes A1 and A2 and the other carries genes B1 and B2. Conclusion: Genes A1 and A2 are paralogs. Genes B1 and B2 are paralogs. Genes A1 and B1 are orthologs. Genes A2 and B2 are orthologs.]

◗ 19.15 Homologous sequences are evolutionarily related. Orthologs are homologous sequences found in different species; paralogs are homologous genes in the same species that arise from gene duplication.

Databases containing genes and proteins found in a wide array of organisms are available for homology searches. Powerful computer programs have been developed for scanning these databases to look for particular sequences. A commonly used homology search program is BLAST (Basic Local Alignment Search Tool).
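The idea of scanning a database for sequences similar to a query can be illustrated with a toy Python sketch. This is emphatically not how BLAST works internally; it is a deliberately simple stand-in that scores each database entry by its best ungapped percent identity with the query. The function names and the short protein sequences are hypothetical and serve only as an illustration.

```python
def best_ungapped_identity(query, subject):
    """Best fraction of identical positions over all ungapped placements.

    The shorter sequence is slid along the longer one; no gaps are allowed.
    """
    if len(query) > len(subject):
        query, subject = subject, query
    best = 0.0
    for offset in range(len(subject) - len(query) + 1):
        window = subject[offset:offset + len(query)]
        matches = sum(q == s for q, s in zip(query, window))
        best = max(best, matches / len(query))
    return best


def rank_hits(query, database):
    """Return (name, identity) pairs for every database entry, best hit first."""
    scored = [(name, best_ungapped_identity(query, seq)) for name, seq in database.items()]
    return sorted(scored, key=lambda hit: hit[1], reverse=True)


# Hypothetical sequences, invented for this example.
database = {"unrelated": "GGGGGGGGGGG", "candidate_homolog": "MVLSGEDKSNV"}
for name, identity in rank_hits("MVLSPADKTNV", database):
    print(name, round(identity, 2))
```

Real homology searches also allow gaps, weight amino acid substitutions by their similarity, and report the statistical significance of each hit, none of which this sketch attempts.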
Suppose a geneticist sequences a genome and locates a gene that encodes a protein of unknown function. A homology search conducted on databases containing the DNA or protein sequences of other organisms may identify one or more orthologous sequences. If a function is known for one of these sequences, that function may provide information about the function of the newly discovered protein. In a similar way, computer programs can search a single genome for paralogs. Eukaryotic organisms often contain families of genes that have arisen by duplication of a single gene. If a paralog is found and its function has been previously assigned, this function can provide information about a possible function of the unknown gene. However, paralogs often evolve new functions; so information about their functions must be used cautiously. Of the genes newly identified through genomic-sequencing projects, 50% are significantly similar to orthologs and paralogs whose function has already been described. The 50% of newly identified genes that cannot be assigned a function on the basis of homology searches will undoubtedly decrease in number as functions are assigned to more and more genes and as more genomes are sequenced. Other sequence comparisons Complex proteins often contain regions that have specific shapes or functions called protein domains. For example, certain DNA-binding proteins attach to DNA in the same way; these proteins have in common a domain that provides the DNA-binding function. Each protein domain has an arrangement of amino acids common to that domain. There are probably a limited, though large, number of protein domains, which have mixed and matched through evolutionary time to yield the protein diversity seen in present-day organisms. Many protein domains have been characterized, and their molecular functions have been determined. The sequence from a newly identified gene can be scanned against a database of known domains. 
If the gene sequence encodes one or more domains whose functions have been previously determined, the function of the domain can provide important information about a possible function of the new gene.

Another computational method for predicting protein function is a phylogenetic profile. In this method, the presence-and-absence pattern of a particular protein is examined across a set of organisms whose genomes have been sequenced. If two proteins are either both present or both absent in all genomes surveyed, the two proteins may be functionally related. For example, the two proteins might function as consecutive steps in a biochemical pathway. The idea is that the two proteins depend on each other and will evolve together. One protein cannot function without the other, and they will either both be present or both be absent. Consider the following proteins in four bacterial species ( ◗ FIGURE 19.16a):

E. coli: protein 1, protein 2, protein 3, protein 4, protein 5, protein 6
Species A: protein 1, protein 2, protein 3, protein 6
Species B: protein 1, protein 3, protein 4, protein 6
Species C: protein 2, protein 4, protein 5

We can create a phylogenetic profile by constructing a table comparing the presence (+) or absence (−) of the proteins in the four bacterial species ( ◗ FIGURE 19.16b). The phylogenetic profile reveals that proteins 1, 3, and 6 are either all present or all absent in all species; so these proteins might be functionally related.

Examining fusion patterns among proteins is another method for predicting functional relations; this technique is sometimes called the Rosetta Stone method. Functionally related, separate proteins in one organism sometimes exist as a single, fused protein in another organism. Thus, the presence of a fused A-B protein in one species suggests that separate proteins A and B in another organism may be functionally related.
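The phylogenetic profile for the four bacterial species listed above can be built with a short Python sketch. The function names are hypothetical; the data are exactly the protein lists given in the text, and the grouping rule (identical presence-and-absence patterns across all genomes) is the simplest possible reading of the method.

```python
def phylogenetic_profiles(genomes):
    """Map each protein to a tuple of True/False values, one per genome."""
    proteins = sorted(set().union(*genomes.values()))
    return {p: tuple(p in present for present in genomes.values()) for p in proteins}


def co_occurring_groups(profiles):
    """Group proteins that share an identical presence-and-absence pattern."""
    groups = {}
    for protein, pattern in profiles.items():
        groups.setdefault(pattern, []).append(protein)
    return [members for members in groups.values() if len(members) > 1]


genomes = {
    "E. coli": {1, 2, 3, 4, 5, 6},
    "Species A": {1, 2, 3, 6},
    "Species B": {1, 3, 4, 6},
    "Species C": {2, 4, 5},
}

profiles = phylogenetic_profiles(genomes)
for protein, pattern in profiles.items():
    print(protein, "".join("+" if present else "-" for present in pattern))
print(co_occurring_groups(profiles))  # [[1, 3, 6]]
```

With only four genomes, identical patterns can easily arise by chance; the method becomes more informative as more sequenced genomes are included.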
Yet another method for determining the function of an unknown gene is gene neighbor analysis ( ◗ FIGURE 19.17). Genes that encode functionally related proteins are often closely linked in bacteria. For example, if two genes are consistently linked in the genomes of several bacteria, they might be functionally related. Functionally related genes are sometimes also linked in eukaryotes; examples are the Hox genes, which play an important role in embryonic development (Chapter 21).

It is important to recognize that functions suggested by computational methods such as homology searches, phylogenetic profiling, fusion proteins, and neighbor analysis do not define a protein’s function; rather, these computational methods provide hints about possible functions that can be pursued through detailed analyses of the biochemistry and cellular location of the protein. Nevertheless, these computational methods and others like them have proved to be invaluable in determining the functions of genes revealed in genomic studies.

◗ 19.16 Phylogenetic profiling can be used to infer protein function. (Micrographs from: top, CNRI/SPL/Photo Researchers; middle left and center, Gary Gaugler/Visuals Unlimited; middle right, M. Abbey/Visuals Unlimited.)

[Figure 19.17: gene orders in the genomes of species A, B, C, and D. Genes 4 and 5 are consistently linked. Genes 1 and 8 are consistently linked. Conclusion: Genes 4 and 5 may be functionally related. Genes 1 and 8 may be functionally related.]

◗ 19.17 The gene neighbor method infers gene function on the basis of the linkage arrangements of the genes. Genes that are consistently linked in different genomes may be functionally related.

Concepts
Genes can be identified by computer programs that look for characteristic features of genes, such as start and stop codons in the same reading frame, sequences that mark the beginning and the end of introns, and sequences found within promoters. Clues to the functions of genes can be obtained by homology searches, comparing protein domains, phylogenetic profiling, protein fusion patterns, and gene neighbor analysis.

Gene Expression and Microarrays

Many important clues about gene function come from knowing when and where the genes are expressed. The development of microarrays has allowed the expression of thousands of genes to be monitored simultaneously.

Microarrays rely on nucleic acid hybridization (see Chapter 18), in which a known DNA fragment is used as a probe to find complementary sequences ( ◗ FIGURE 19.18). The probe is usually fixed to some type of solid support, such as a nylon filter or a glass slide. A solution containing a mixture of DNA or RNA is applied to the solid support; any nucleic acid that is complementary to the probe will bind to it. Nucleic acids in the mixture are labeled with a radioactive or fluorescent tag so that molecules bound to the probe can be easily detected.

In a microarray (also called a gene chip), numerous known DNA fragments are fixed to a solid support in an orderly pattern or array, usually as a series of dots. These DNA fragments (the probes) usually correspond to known genes. When the microarray has been constructed, mRNA, DNA, or cDNA isolated from experimental cells is labeled with fluorescent nucleotides and applied to the array. Any of the DNA or RNA molecules that are complementary to probes on the array will hybridize with them and emit fluorescence, which can be detected by an automated scanner. An array containing tens of thousands of probes can be applied to a glass slide or silicon wafer just a few square centimeters in size.

[Figure 19.18: 1 A microarray consists of DNA probes fixed to a solid support, such as a nylon membrane or glass slide. 2 Each spot has a different DNA probe. 3 RNA is extracted from cells,… 4 …and reverse transcription in the presence of a labeled nucleotide produces single-stranded cDNA molecules with a fluorescence tag. 5 The tagged cDNA will pair with any complementary probe. 6 After hybridization, the color of the dot indicates the relative amount of mRNA in the samples. 7 A microarray can be constructed with thousands of different DNA probes.]

◗ 19.18 Microarrays are used to simultaneously detect the expression of many genes. (D. Lockhart and E. Winzeler, 2000, Nature 405:827.)

One type of DNA chip is illustrated in ◗ FIGURE 19.19. For this chip, mRNA from experimental cells is converted into cDNA and labeled with red fluorescent nucleotides. Messenger RNA from control cells is converted into cDNA and labeled with green fluorescent nucleotides. The labeled cDNAs are mixed and hybridized to the DNA chip, which contains DNA probes from different genes. Hybridization of the red (experimental) and green (control) cDNAs is proportional to the relative amounts of mRNA in the samples. The fluorescence of each spot is assessed with microscopic scanning and appears as a single color. Red indicates the overexpression of a gene in the experimental cells relative to that in the control cells (more red-labeled cDNA hybridizes), whereas green indicates the underexpression of a gene in the experimental cells relative to that in the control cells (more green-labeled cDNA hybridizes). Yellow indicates equal expression in experimental and control cells (equal hybridization of red- and green-labeled cDNAs), and no color indicates no expression in either experimental or control cells. Microarrays that allow the detection of specific alleles, SNPs, and even particular proteins also have been created.
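The red, green, yellow, and no-color reading of a two-color chip described above can be expressed as a small Python sketch. The function name `classify_spot`, the background cutoff of 100 intensity units, and the twofold threshold are all assumptions chosen for illustration; real analyses normalize the two channels and use statistical tests rather than fixed cutoffs.

```python
import math


def classify_spot(red, green, background=100.0, fold=2.0):
    """Classify one microarray spot from its red and green intensities.

    red   : signal from experimental-cell cDNA (hypothetical arbitrary units)
    green : signal from control-cell cDNA
    Returns "red", "green", "yellow", or "none".
    """
    if red < background and green < background:
        return "none"  # no detectable expression in either sample
    # The log2 ratio is symmetric: +1 means twofold up, -1 means twofold down.
    log_ratio = math.log2((red + 1.0) / (green + 1.0))
    cutoff = math.log2(fold)
    if log_ratio >= cutoff:
        return "red"  # overexpressed in experimental cells
    if log_ratio <= -cutoff:
        return "green"  # underexpressed in experimental cells
    return "yellow"  # roughly equal expression


spots = {"gene 1": (4000, 500), "gene 2": (500, 4000), "gene 3": (2000, 2100), "gene 4": (20, 30)}
for gene, (red, green) in spots.items():
    print(gene, classify_spot(red, green))
```

Adding 1 to each intensity before taking the ratio avoids division by zero when one channel has no signal.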
◗ 19.19 Microarrays can be used to compare levels of gene expression in different types of cells.

Microarrays allow the expression of thousands of genes to be monitored simultaneously, enabling scientists to study which genes are active in particular tissues. They can also be used to investigate how gene expression changes in the course of biological processes such as development or disease progression.

In one study, researchers examined gene expression to predict the long-term outcome for women who had undergone treatment for breast cancer. Breast cancer affects 1 of 10 women in the United States, and half of those with the disease die from it. Current treatment depends on a number of factors, including a woman’s age, the size of the tumor, the characteristics of tumor cells, and whether the cancer has already spread to nearby lymph nodes. Many women whose cancer has not spread are treated by removal of the tumor and radiation therapy, yet the cancer later reappears in some of the women thus treated. These women might benefit from more-aggressive treatment when the cancer is first detected.

Using microarrays, researchers examined the expression patterns of 25,000 genes from primary tumors of 78 young women who had breast cancer. In 34 of these patients, the cancer later spread to other sites; the other 44 patients remained free of breast cancer for 5 years after their initial diagnoses. The researchers identified a subset of 70 genes whose expression patterns in the initial tumors accurately predicted whether the cancer would later spread (◗ FIGURE 19.20). This degree of prediction was much higher than that of traditional predictive measures, which are based on the size and histology of the tumor. These results, though preliminary and confined to a small sample of cancer patients, suggest that gene-expression data obtained from microarrays can be a powerful tool in determining the nature of cancer treatment.
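To give a feel for how an expression pattern could be turned into a prediction, the following Python sketch assigns a new tumor to whichever group's average profile it correlates with more strongly. This is a generic nearest-centroid illustration, not the procedure used in the study described above; the four-gene profiles and all function names are invented for the example.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length profiles (must not be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def centroid(profiles):
    """Average expression of each gene across a group of tumors."""
    return [sum(values) / len(profiles) for values in zip(*profiles)]


def predict(profile, good_centroid, poor_centroid):
    """Label a tumor by the group centroid it resembles more closely."""
    if pearson(profile, poor_centroid) > pearson(profile, good_centroid):
        return "spread"
    return "disease-free"


# Hypothetical log expression ratios for four genes (toy data, not study data).
spread_tumors = [[2.0, 1.5, -1.0, -2.0], [1.8, 1.0, -1.2, -1.5]]
free_tumors = [[-1.5, -1.0, 1.2, 2.0], [-2.0, -1.2, 0.8, 1.6]]

poor_centroid = centroid(spread_tumors)
good_centroid = centroid(free_tumors)

new_tumor = [1.6, 1.2, -0.8, -1.9]
print(predict(new_tumor, good_centroid, poor_centroid))  # spread
```

A real analysis would use many more genes and patients and would test the classifier on tumors that were not used to build it.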
Concepts
Microarrays, consisting of DNA probes attached to a solid support, can be used to determine which RNA and DNA sequences are present in a mixture of nucleic acids. They are capable of determining which RNA molecules are being synthesized and thus can be used to examine changes in gene expression.

◗ 19.20 Microarrays can be used to examine gene expression associated with disease progression. Shown here are expression patterns of 70 genes in the initial tumors from patients whose cancer later spread to other sites and from other patients who remained free of breast cancer for 5 years after their initial diagnosis. Red indicates higher gene expression; green indicates lower gene expression; black indicates no change in gene expression; and gray indicates no data available. Each row represents the primary tumor from a patient and each column represents a different gene. Tumors below the solid yellow line came primarily from patients in whom the cancer spread to distant sites within 5 years of diagnosis; tumors above the solid line came primarily from patients who remained cancer free for at least 5 years. (L.J. Van’t Veer, 2002, Nature 405:532.)

www.whfreeman.com/pierce More on microarrays

Genomewide Mutagenesis

One of the best methods for determining the function of a gene is to examine the phenotypes of individual organisms that possess a mutation in the gene. Traditionally, genes encoding naturally occurring variations in a phenotype were mapped, the causative genes were isolated, and their products were studied. But this procedure was limited by the number of naturally occurring mutations and the difficulty of mapping genes with a limited number of chromosomal markers. The number of naturally occurring mutations can be increased by exposure to mutagenic agents, and the accuracy of mapping is increased dramatically by the availability of mapped molecular markers, such as RFLPs, microsatellites, STSs, ESTs, and SNPs.
These two methods (random induction of mutations on a genomewide basis and mapping with molecular markers) are coupled and automated in a mutagenesis screen. Mutagenesis screens can be used to search for specific genes encoding a particular function or trait. For example, mutagenesis screens of mice are being used to identify genes having roles in cardiovascular function. When genes that affect cardiovascular function are located in mice, homology searches are carried out to determine if similar genes exist in humans. These genes can then be studied to better understand cardiac disease in humans.

To conduct a mutagenesis screen, random mutations are induced in a population of organisms, creating new phenotypes. The mutations are induced by exposing the organisms to radiation, a chemical mutagen (Chapter 17), or transposable elements (DNA sequences that insert randomly into the DNA; Chapter 20). The procedure for a typical mutagenesis screen is illustrated in ◗ FIGURE 19.21. Here, male zebra fish are treated with ethyl methanesulfonate (EMS), a chemical that induces mutations in their sperm. The treated males are mated with wild-type female fish. The offspring are heterozygous for mutations induced by EMS and are screened for any variant phenotypes that might be the product of dominant mutations expressed in these heterozygous fish. Recessive mutations will not be expressed in the F1 progeny but can be revealed with further breeding. The F1 offspring are mated with wild-type fish, and the offspring from this cross are then backcrossed with their male parents, producing fish that are homozygous for recessive mutations. The offspring of the backcross are then screened for variant phenotypes. The fish with variant phenotypes undergo further breeding experiments to verify that their variant phenotype is, in fact, due to a single-gene mutation.
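The expected outcome of the crosses in such a screen follows from simple Mendelian ratios. The Python sketch below works them out for a single recessive mutation; the allele labels `m2` and `+` follow Figure 19.21, whereas the function `cross` and the assumption that the backcross parent is a heterozygous F1 male are introduced here only for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def cross(parent1, parent2):
    """Expected genotype frequencies among the offspring of a single-gene cross.

    Each parent is a pair of alleles, for example ("m2", "+").
    """
    counts = Counter()
    for a, b in product(parent1, parent2):
        # Write the mutant allele first so that "m2/+" and "+/m2" are pooled.
        genotype = "/".join(sorted((a, b), key=lambda allele: allele == "+"))
        counts[genotype] += 1
    total = sum(counts.values())
    return {g: Fraction(c, total) for g, c in counts.items()}


# F1 carrier (m2/+) mated with a wild-type fish (+/+).
f2 = cross(("m2", "+"), ("+", "+"))
print(f2["m2/+"])  # 1/2

# A carrier daughter backcrossed to her heterozygous father (assumed m2/+).
backcross = cross(("m2", "+"), ("m2", "+"))
print(backcross["m2/m2"])  # 1/4

# Chance that an offspring of a randomly chosen daughter is homozygous.
print(f2["m2/+"] * backcross["m2/m2"])  # 1/8
```

Because only half of the daughters carry the mutation and only a quarter of a carrier's backcross offspring are homozygous, many families must be screened to recover each recessive mutation.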
After the genetic nature of an abnormal phenotype has been verified, the gene that causes the phenotype can be located by positional cloning. The first step in positional cloning is to demonstrate linkage between the trait and one or more already mapped genetic markers. The progeny of genetic crosses that include the mutant phenotype are examined for a large number of molecular markers that cover the entire genome. The cosegregation of markers and the mutant phenotype provides evidence of linkage, indicating that the marker and the gene encoding the mutant phenotype are physically linked on the same chromosome. Cosegregating markers provide information about the general chromosome region in which the gene is located. The next step is to localize the mutated gene to a smaller region of the chromosome, which is usually done by examining a linkage map of the chromosome region to identify other molecular markers in close proximity to the gene of interest. The gene causing the mutant phenotype is then mapped in relation to these markers. Next is the creation of a physical map, which requires a set of overlapping clones from the area of interest. A physical map of these overlapping clones that includes information about the molecular markers allows the identification of one or more clones that contain the gene of interest. These clones are then sequenced to find potential candidate genes that might encode the mutant phenotype. Candidate genes are evaluated by studying their expression patterns, protein products, and homology to genes of known function. This information might suggest that one or more of the candidate genes is likely to be the cause of the phenotype. The candidate genes can be examined for the presence of mutations in the gene sequences carried by those individuals having a mutant phenotype. Further proof that a particular gene causes the phenotype can be obtained by mutating a specific gene and observing the phenotype in the offspring.

[Figure 19.21. (1) Male zebrafish are treated with EMS to produce mutations in their sperm (2) and are then crossed with wild-type females. (3) Progeny fish are screened for mutant phenotypes. Recessive mutations (m2) will not be expressed in the F1 phenotype. (4) Variant fish may possess a dominant mutation (M1). (5) Fish with normal phenotypes are mated with wild-type fish (6) and backcrossed to reveal recessive mutations. (7) Some fish homozygous for recessive mutations are produced. Variant fish undergo further breeding and positional cloning.]

◗ 19.21 Genes affecting a particular characteristic or function can be identified by a genomewide mutagenesis screen. In this illustration, M1 represents a dominant mutation and m2 represents a recessive mutation.

Concepts
Genomewide mutagenesis screening coupled with positional cloning can be used to identify genes that affect a specific characteristic or function.

Comparative Genomics

Genome-sequencing projects provide detailed information about gene content and organization in different species and even in different members of the same species, allowing inferences about how genes function and genomes evolve. They also provide important information about evolutionary relationships among organisms and about factors that influence the speed and direction of evolution.

Prokaryotic Genomes

A large number of bacterial genomes have now been sequenced (Table 19.2). Most prokaryotic genomes consist of a single circular chromosome, but there are exceptions, such as Vibrio cholerae (the bacterium that causes cholera; see introduction to Chapter 8), which has two circular chromosomes, and Borrelia burgdorferi, which has one large linear chromosome and 21 smaller chromosomes.
Table 19.2 Characteristics of some completely sequenced representative prokaryotic genomes

Species                                  Size (Millions of Base Pairs)   Number of Predicted Genes   G + C (%)
Archaea
  Archaeoglobus fulgidus                 2.18                            2407                        49
  Methanobacterium thermoautotrophicum   1.75                            1869                        50
  Methanococcus jannaschii               1.66                            1715                        32
  Thermoplasma acidophilum               1.56                            1478                        46
Eubacteria
  Bacillus subtilis                      4.21                            4100                        44
  Bordetella parapertussis               4.75                            *                           69
  Buchnera species                       0.64                            564                         27
  Campylobacter jejuni                   1.64                            1654                        31
  Escherichia coli                       4.64                            4289                        51
  Haemophilus influenzae                 1.83                            1709                        39
  Mesorhizobium loti                     7.04                            6752                        63
  Mycobacterium tuberculosis             4.41                            3918                        66
  Mycoplasma genitalium                  0.58                            480                         32
  Staphylococcus aureus                  2.88                            2697                        33
  Treponema pallidum                     1.14                            1031                        53
  Ureaplasma urealyticum                 0.75                            611                         26
  Vibrio cholerae                        4.03                            3828                        48

Source: Data from the Genome Atlas of the Center for Biological Sequence Analysis, http://www.cbs.dtu.dk/services/GenomeAtlas/
* Data not available.

The total amount of DNA in prokaryotic genomes ranges from more than 7 million base pairs in Mesorhizobium loti to only 580,000 bp in Mycoplasma genitalium. Escherichia coli, the most widely used bacterium for genetic studies, has 4.6 million base pairs (◗ FIGURE 19.22a). The number of genes is usually from 1000 to 2000, but some species have as many as 6700, and others as few as 480. The density of genes is rather constant across all species, with about 1 gene for every 1000 bp. Thus bacteria with larger genomes usually have more genes. Only about half of the genes identified in prokaryotic genomes can be assigned a function. Almost a quarter of the genes have no significant sequence similarity to any other known genes in bacteria, suggesting that there is considerable genetic diversity among bacteria. The number of genes that encode biological functions such as transcription and translation tends to be similar among species, even when their genomes differ greatly in size.
This similarity suggests that these functions are encoded by a basic set of proteins that does not vary among species. On the other hand, the number of genes taking part in biosynthesis, energy metabolism, transport, and regulatory functions varies greatly among species and tends to be higher in larger genomes. The functions of predicted genes (i.e., genes identified by computer programs) and known genes in E. coli are presented in ◗ FIGURE 19.22b. A substantial part of the “extra” DNA found in the larger bacterial genomes is made up of paralogous genes that have arisen by duplication.

The G + C content (percentage of bases that consist of guanine or cytosine) of prokaryotic genomes varies widely, from 26% to 69%. This more-than-twofold difference in G + C content affects the frequency of particular amino acids in the proteins produced by different bacterial species. For example, glycine, alanine, proline, and arginine are encoded by codons that have G and C nucleotides; so these amino acids are incorporated into proteins with higher frequency in organisms whose genomes have a high G + C content. On the other hand, isoleucine, phenylalanine, tyrosine, and methionine are encoded by codons that tend to have A and T (U in RNA) nucleotides; so these amino acids are found more frequently in proteins encoded by species whose genome has a low G + C content. Which synonymous codons are used is also affected by the G + C content; some synonymous codons have more G and C nucleotides than do others, and these codons tend to be used more frequently in those species with high G + C content.

The results of genomic studies of prokaryotic species support the conclusion that archaea and eubacteria are evolutionarily unique (see Chapter 2). The results also reveal that both closely and distantly related bacterial species periodically exchange genetic information over evolutionary time, a process called horizontal gene exchange.
Such exchange may take place through bacterial uptake of DNA in the environment (transformation), through the exchange of plasmids, and through viral vectors (see Chapter 8). Horizontal gene exchange has been recognized for some time, but analyses of many microbial genomes now indicate that it is more extensive than was previously recognized. For example, an analysis of two eubacteria species demonstrated that from 20% to 25% of their genes were more similar to genes from archaea than to those from other eubacterial species.

www.whfreeman.com/pierce Information on prokaryotic and other genomes

[Figure 19.22. (a) Escherichia coli (common bacteria): one circular chromosome; genome size: 4.64 million bp; number of genes: 4289; G + C content: 51%. (b) Pie chart of gene functions, with sectors numbered as follows: 1. Fatty acids, phospholipid metabolism; 2. Transcription, RNA metabolism; 3. Nucleotide metabolism; 4. Phage, transposon, plasmid functions; 5. DNA replication, recombination, repair; 6. Carbon compounds metabolism; 7. Amino acid metabolism; 8. Other genes with known functions; 9. Regulatory functions; 10. Translation, protein metabolism; 11. Central intermediary metabolism; 12. Adaptation, protection functions; 13. Cell, wall, membrane structural components; 14. Energy metabolism; 15. Putative enzymes; 16. Transport proteins; 17. Genes with unknown functions.]

◗ 19.22 Genomic characteristics of the bacterium E. coli. (a) Genome size, number of genes, and G + C content. (b) Percentages of genes affecting various known and unknown functions.

Table 19.3 Characteristics of Some Eukaryotic Genomes That Have Been Completely Sequenced

Species                               Genome Size (Millions of Base Pairs)   Number of Predicted Genes   Number of Protein-Domain Families
Saccharomyces cerevisiae (yeast)      12                                     6,144                       851
Arabidopsis thaliana (plant)          125                                    25,706                      1012
Caenorhabditis elegans (roundworm)    100                                    18,266                      1014
Drosophila melanogaster (fruit fly)   180                                    13,338                      1035
Homo sapiens (human)                  3400                                   32,000                      1262

Source: Number of genes and protein-domain families from International Human Genome
Sequencing Consortium, Initial sequencing and analysis of the human genome, Nature 409 (2001), Table 23.

Eukaryotic Genomes

The genomes of only a few eukaryotic organisms have been completely sequenced, but some tentative statements can be made about the content and organization of eukaryotic genetic information from these organisms. The genomes of eukaryotic organisms (Table 19.3) are larger than those of prokaryotes, and, in general, multicellular eukaryotes have more DNA than do simple, single-celled eukaryotes such as yeast (see Chapter 11). There is no close relation, however, between genome size and complexity among the multicellular eukaryotes. For example, the roundworm Caenorhabditis elegans is structurally more complex than the plant Arabidopsis but has considerably less DNA.

Eukaryotic genomes also contain more genes than do prokaryotes, and the genomes of multicellular eukaryotes have more genes than do the genomes of single-celled eukaryotes. The number of genes among multicellular eukaryotes also is not obviously related to phenotypic complexity: humans have more genes than do invertebrates but only twice as many as fruit flies and only slightly more than the plant Arabidopsis. Eukaryotic genomes contain multiple copies of many genes, indicating that gene duplication has been an important process in genome evolution.

A substantial part of the genomes of multicellular organisms consists of moderately and highly repetitive sequences (see Chapter 11), and the percentage of repetitive sequences is usually higher in those species with larger genomes (Table 19.4). Most of these repetitive sequences appear to have arisen through transposition. This is particularly evident in the human genome, where 45% of the DNA is derived from transposable elements, many of which are defective and no longer able to move. The majority of DNA in multicellular organisms is noncoding, and many genes are interrupted by introns.
In the more complex eukaryotes, both the number and the length of the introns are greater.

In spite of only a modest increase in gene number, vertebrates have considerably more protein diversity than do invertebrates. The human genome does not encode many new protein domains; there are 1262 domains in humans compared with 1035 in fruit flies (see Table 19.3). However, the existing domains in humans are assembled into more combinations, leading to many more types of proteins. For example, the human genome contains almost two times as many arrangements of protein domains as worms or flies contain and almost six times as many as yeast contains. Humans, worms, and flies have many of the same families of genes in common, but each family in the human genome has a greater number of different genes, suggesting that gene duplication has been an important process in vertebrate evolution.

Table 19.4 Percentage of genome consisting of interspersed repeats derived from transposable elements

Organism                          Percentage of Genome
Plant (Arabidopsis thaliana)      10.5
Worm (Caenorhabditis elegans)     6.5
Fly (Drosophila melanogaster)     3.1
Human (Homo sapiens)              44.4

Concepts
Comparative genomics compares the content and organization of whole genomic sequences from different organisms. Prokaryotic genomes are small, usually ranging from 1 million to 3 million base pairs of DNA, with several thousand genes. Among multicellular eukaryotic organisms, there is no clear relation between organismal complexity and amount of DNA or gene number. A substantial part of the genome in eukaryotic organisms consists of repetitive DNA, much of which is derived from transposable elements.

[Figure 19.23. (a) Saccharomyces cerevisiae (yeast): 16 pairs of linear chromosomes; genome size: 12.1 million bp; number of genes: 6100; G + C content: 38%. (b) Pie chart of gene functions, with sectors numbered 1 to 11:]
[Figure 19.23b legend: 1. Cellular organization and biogenesis; 2. Intracellular transport; 3. Transport facilitation; 4. Protein destination; 5. Protein synthesis; 6. Transcription; 7. Cell growth, cell division, and DNA synthesis; 8. Energy; 9. Metabolism; 10. Cell rescue; 11. Signal transduction.]

◗ 19.23 Genomic characteristics of yeast, Saccharomyces cerevisiae. (a) Number of chromosomes, genome size, number of genes, and G + C content. (b) Percentages of genes affecting various known and unknown functions.

Yeast genome As mentioned earlier, Saccharomyces cerevisiae (yeast) was the first eukaryotic genome to be completely sequenced. Its genome consists of 12.1 million base pairs of DNA and 6100 potential genes, of which about 5900 encode proteins (◗ FIGURE 19.23a), giving a gene density of about one gene for every 2000 bp of DNA. The distribution of gene functions in yeast is displayed in ◗ FIGURE 19.23b. The yeast genome contains considerable redundancy; there are a number of blocks of repeated sequences in the genome, and 30% of the genes exist in two or more copies.

Worm genome Caenorhabditis elegans, a roundworm, has a genome consisting of 97 million base pairs of DNA (◗ FIGURE 19.24). More than 18,000 protein-encoding genes have been identified in the C. elegans genome, of which more than 40% are homologous with genes found in other organisms. There is one gene for about every 5000 bp of DNA, and gene density is more uniform across chromosomes than it is in most eukaryotes.

Plant genome The genome of Arabidopsis thaliana, a small mustardlike plant, consists of 167 million base pairs of DNA (◗ FIGURE 19.25a), encoding 25,706 predicted genes. Although Arabidopsis has many proteins in common with yeast, worm, fly, and humans, it has roughly 150 protein families not seen in other eukaryotes, including structural proteins, transcription factors, enzymes, and proteins of unknown function. ◗ FIGURE 19.25b shows the distribution of gene functions in Arabidopsis.
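The gene densities quoted for yeast and worm can be checked by dividing genome size by gene number. The short Python sketch below does this with the figures given in this section; the dictionary layout and the helper name `bp_per_gene` are choices made for the example, and the worm gene count of 18,266 is taken from Table 19.3 and Figure 19.24.

```python
# Genome size in base pairs and number of genes, as quoted in this section.
genomes = {
    "Saccharomyces cerevisiae": (12_100_000, 6_100),
    "Caenorhabditis elegans": (97_000_000, 18_266),
    "Arabidopsis thaliana": (167_000_000, 25_706),
}


def bp_per_gene(genome_size_bp, gene_count):
    """Average number of base pairs of DNA per gene."""
    return genome_size_bp / gene_count


for species, (size_bp, genes) in genomes.items():
    print(f"{species}: about {bp_per_gene(size_bp, genes):,.0f} bp per gene")
# Saccharomyces cerevisiae: about 1,984 bp per gene
# Caenorhabditis elegans: about 5,310 bp per gene
# Arabidopsis thaliana: about 6,497 bp per gene
```

The results agree with the rounded values of about 2000 bp per gene for yeast and about 5000 bp per gene for the worm, and both are far lower densities than the roughly 1000 bp per gene typical of prokaryotes.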
Gene duplication has played an important role in the evolution of Arabidopsis, with 60% of its genome consisting of duplicated segments. Seventeen percent of the genes exist in tandem arrays, which are multiple copies of the same gene positioned one after another. One of the processes that produce tandem arrays of duplicated genes is unequal crossing over (see Chapter 9). A number of large duplicated regions, encompassing hundreds of thousands or millions of base pairs of DNA, also are present. The large extent of duplication in the Arabidopsis genome suggests that this species had a tetraploid (4N) ancestor (see Chapter 9) and that all genes were duplicated in the past, followed by extensive gene rearrangement and divergence. Thus, at least two different mechanisms seem to have led to the large number of duplications seen in the Arabidopsis genome: (1) duplication of the whole genome through polyploidy; and (2) duplication of individual genes arrayed in tandem through unequal crossing over.

[Figure 19.24. Caenorhabditis elegans (roundworm): six pairs of linear chromosomes; genome size: 97 million bp; number of genes: 18,266; G + C content: 49%.]

◗ 19.24 Genomic characteristics of the roundworm, Caenorhabditis elegans.

[Figure 19.25. (a) Arabidopsis thaliana (mustard-like weed): five pairs of linear chromosomes; genome size: 167 million bp; number of genes: 25,706; G + C content: 47%. (b) Pie chart of gene functions, with sectors numbered as follows: 1. Metabolism; 2. Unclassified; 3. Ionic homeostasis; 4. Protein synthesis; 5. Energy; 6. Transport facilitation; 7. Cellular biogenesis; 8. Intracellular transport; 9. Protein destination; 10. Cellular communication and signal transduction; 11. Cell rescue, defense, cell death, aging; 12. Cell growth, cell division, and DNA synthesis; 13. Transcription.]

◗ 19.25 Genomic characteristics of the mustard plant, Arabidopsis thaliana. (a) Number of chromosomes, genome size, number of genes, and G + C content. (b) Percentages of genes affecting various known and unknown functions.
Transposable elements are common in the Arabidopsis genome and make up about 10% of the genome but are much less frequent than in the human genome and in some other plant genomes. Most of these transposable elements are not transcribed, and many are concentrated in the regions surrounding the centromere. Although Arabidopsis, C. elegans, and Drosophila have similar numbers of proteins, the Arabidopsis genome has more genes. This difference can be explained by the large number of duplicated copies of genes found in the Arabidopsis genome.

Fly genome Drosophila melanogaster, the fruit fly, has a genome of 180 million base pairs of DNA located on four chromosomes (◗ FIGURE 19.26). A third of its genome is made up of heterochromatin, which contains few genes. This extensive heterochromatin, consisting mainly of short simple repeats, made sequencing the genome of Drosophila difficult (because the repeats lead to much overlap in sequence among cloned fragments, making it difficult to assemble the clones in the correct order). Drosophila has more than 13,000 predicted genes. There are 14,113 RNA transcripts produced from these genes, with some genes encoding multiple transcripts through alternative splicing. Drosophila genes average four exons per gene, although this number is probably an underestimate. The average RNA molecule encoded by a gene is 3058 nucleotides in length.

[Figure 19.26. Drosophila melanogaster (fruit fly): four pairs of linear chromosomes; genome size: 180 million bp; number of genes: 13,338; G + C content: 41%.]

◗ 19.26 Genomic characteristics of Drosophila melanogaster.

Human genome The human genome is 3.4 billion base pairs in length (◗ FIGURE 19.27a). Only about 25% of the DNA is transcribed into RNA, and less than 2% actually encodes proteins (◗ FIGURE 19.27b). Active genes are often separated by vast deserts of noncoding DNA, much of which consists of repeated sequences derived from transposable elements.
The average gene in the human genome is approximately 27,000 bp in length, with about 9 exons (Table 19.5). (One exceptional gene has 234 exons.) The introns of human genes are much longer, and there are more of them than in other genomes (◗ FIGURE 19.27c). The human genome does not encode substantially more protein domains (see Table 19.3), but the domains are combined in more ways to produce a relatively diverse proteome. Gene functions encoded by the human genome are presented in Figure 19.27b. A single gene often encodes multiple proteins through alternative splicing; each gene may encode, on the average, two or three different mRNAs, meaning that the human genome, with approximately 32,000 genes, might encode as many as 96,000 proteins.

Gene density varies among human chromosomes; chromosomes 17, 19, and 22 have the highest density and chromosomes X, 4, 18, 13, and Y have the lowest density. Some proteins encoded by the human genome that are not found in other animals include those affecting immune function; neural development, structure and function; intercellular and intracellular signaling pathways in development; hemostasis; and apoptosis.

Transposable elements are much more common in the human genome than in worm, plant, and fruit-fly genomes (Table 19.4). The density of transposable elements varies, depending on chromosome location. In one region of the X chromosome, 89% of the DNA is made up of transposable elements, whereas other regions are largely devoid of these elements. There are a variety of types of transposable elements in the human genome, including LINEs, SINEs, retrotransposons, and DNA transposons (see Chapter 11). Most appear to be evolutionarily old and are defective, containing mutations and deletions so that they are no longer capable of transposition.

[Figure 19.27. (a) Homo sapiens (human): 23 pairs of linear chromosomes; genome size: 3.4 billion bp; number of genes: ~32,000; G + C content: 41%. (b) Pie chart of gene functions, with sectors numbered as follows: 1. Miscellaneous; 2. Viral protein; 3. Transfer/carrier protein; 4. Transcription factor; 5. Nucleic acid enzyme; 6. Signaling molecule; 7. Receptor; 8. Kinase; 9. Select regulatory molecule; 10. Transferase; 11. Synthase and synthetase; 12. Oxidoreductase; 13. Lyase; 14. Ligase; 15. Isomerase; 16. Hydrolase; 17. Molecular function unknown; 18. Transporter; 19. Intracellular transporter; 20. Select calcium-binding protein; 21. Protooncogene; 22. Structural protein of muscle; 23. Motor; 24. Ion channel; 25. Immunoglobulin; 26. Extracellular matrix; 27. Cytoskeletal structural protein; 28. Chaperone; 29. Cell adhesion. (c) Bar chart of percentage of introns (0 to 60) against intron length in kb (<1, 1–2, 2–5, 5–30, >30) for human, worm, and fly.]

◗ 19.27 Genomic characteristics of Homo sapiens. (a) Number of chromosomes, genome size, number of genes, and G + C content. (b) Percentages of genes affecting various known and unknown functions. (c) Intron length of genes in humans, worm, and fly.

Table 19.5 Average characteristics of genes in the human genome

Characteristic                      Average
Number of exons                     8.8
Size of internal exon               145 bp
Size of intron                      3365 bp
Size of 5′ untranslated region      300 bp
Size of 3′ untranslated region      770 bp
Size of coding region               1340 bp
Total length of gene                27,000 bp

www.whfreeman.com/pierce Information on the human genome and other genome-sequencing projects, including animal, plant, protozoan, fungal, and bacterial genomes

The Future of Genomics

The genomes of numerous organisms are in the process of being sequenced. These sequencing efforts, combined with the large amount of known DNA sequence that now exists, provide information that is tremendously useful for agriculture, human health, and biotechnology.
The complete genome sequences of the mouse and the chimpanzee will serve as important sources of insight into the function and evolution of the human genome, inasmuch as these organisms are related to humans and are often used in studies of human health. Having complete genome sequences of crop plants and domestic animals will make it easier to identify genes that affect yield, disease and pest resistance, and other agriculturally important traits, which can then be manipulated by traditional breeding or genetic engineering to produce greater quantities of food and more-nutritious foods.

In the future, whole or partial genomic sequence information will be used in individual patient care. Currently, newborn babies are screened for a few treatable genetic diseases, such as phenylketonuria, which can be identified with the use of simple biochemical tests. In the future, newborns may be screened for a large number of variations in genetic sequence that confer a high risk of treatable diseases, such as coronary artery disease, hypertension, asthma, and certain types of cancer. For those persons who are identified as genetically at risk, preventive treatment may be started early. In what has been called “personalized medicine,” a person’s DNA sequence may be used to predict responses to different treatment regimes, and drug therapy may then be fine-tuned to a person’s genetic background. Genetic testing of both patients and pathogens will allow faster and more-precise diagnoses of many diseases.

Along with the many potential benefits of having complete sequence information are concerns about the misuse of this information. With the knowledge gained from genomic sequencing, many more genes for diseases, disorders, and behavioral and physical traits will be identified, increasing the number of genetic tests that can be performed to make predictions about the future phenotype and health of a person.
There is concern that information from genetic testing might be used to discriminate against people who are carriers of disease-causing genes or who might be at risk for some future disease. Questions arise about who owns a person’s genome sequence. Should employers and insurance companies have access to this information? What about relatives, who have similar genomes and who might also be at risk for some of the same diseases? There are also questions about the use of this information to select for specific traits in future offspring. All of these concerns are legitimate and must be addressed if we are to use the information from genome sequencing responsibly.

www.whfreeman.com/pierce Ethical issues associated with the Human Genome Project and genomics in general

Connecting Concepts Across Chapters

Genomics, the focus of this chapter, uses many of the techniques described in Chapter 18 for studying individual genes and applies them to the entire genome. What is different about genomics is the tremendous amount of information that is produced by using these techniques, requiring special computational tools. Although the details of many of these methods are beyond the scope of this book, an understanding of the underlying principles of genomics and the general trends emerging from the results of genomic studies is important to a student in a general genetics course.

Genomics holds great potential for understanding biological processes and for applications in health, agriculture, and biotechnology. It will undoubtedly be one of the most important areas of future genetic research. A surprising result to emerge from the study of genomics is the finding that organisms that differ greatly in phenotype and complexity may possess many similar genes and, in fact, may not differ greatly in the total number of genes that they possess.
This finding suggests that differences in phenotype are often due more to differing patterns of gene expression than to differences in the protein-coding information of their genomes.

Much of what has already been covered in this book is relevant to the study of genomics. Information on gene mapping (Chapter 7), DNA structure (Chapter 10), chromosome organization (Chapter 11), transcription (Chapter 13), protein synthesis (Chapter 15), and recombinant DNA (Chapter 18) is particularly critical for understanding the concepts presented in this chapter. Comprehension of some of the topics covered in subsequent chapters will be facilitated by an understanding of the information in this chapter; such topics include organelle DNA in Chapter 20 and evolutionary genetics in Chapter 23.

CONCEPTS SUMMARY

• Genomics is the field of genetics that attempts to understand the content, organization, and function of genetic information contained in whole genomes.
• Structural genomics concerns the organization and sequence of the genome. Functional genomics studies the biological function of genomic information. Comparative genomics compares the genomic information in different organisms.
• Genetic maps position genes relative to other genes by determining rates of recombination and are measured in percent recombination. Physical maps are based on the physical distances between genes and are measured in base pairs.
• The location of sites recognized by restriction enzymes can be determined by cutting the DNA with each restriction enzyme separately and in combinations and then comparing the restriction fragments produced.
• DNA sequencing determines the base sequence of nucleotides along a stretch of DNA. The Sanger (dideoxy) method uses special substrates for DNA synthesis (dideoxynucleoside triphosphates, ddNTPs) that terminate synthesis after they are incorporated into the newly made DNA. Four reactions, each with a different ddNTP, are set up.
In each reaction, DNA fragments of varying length are produced, all of which terminate in nucleotides with the same base. The products of the four reactions are separated by gel electrophoresis, and the sequence of the DNA synthesized is read from the pattern of bands on the gel.
• Sequencing a whole genome requires breaking the genome into small overlapping fragments whose DNA sequence can be determined in sequencing reactions. The individual sequences can be ordered into a whole genome sequence with the use of a map-based approach, in which fragments are assembled in order by using previously created genetic and physical maps, or with the use of a whole-genome shotgun approach, in which overlap between fragments is used to assemble them into a whole-genome sequence.
• The Human Genome Project is an effort to determine the entire sequence of the human genome. The project began officially in 1990; rough drafts of the human genome sequence were completed in 2000.
• Single-nucleotide polymorphisms are single-base differences in DNA between individuals and are valuable as markers in linkage studies.
• Expressed-sequence tags are markers associated with expressed (transcribed) DNA sequences. RNA from a cell is subjected to reverse transcription, producing cDNA molecules. A short stretch of the cDNA is then sequenced, which provides a marker that tags (identifies) the DNA fragment. Expressed-sequence tags can be used to find the genes expressed in a genome.
• Bioinformatics is a synthesis of molecular biology and computer science that develops tools to store, retrieve, and analyze DNA, cDNA, and protein sequence data.
• A transcriptome is the set of all RNA molecules transcribed from a genome; a proteome is the set of all the proteins encoded by the genome.
• Computer programs can identify genes by looking for characteristic features of genes within a sequence.
• Homologous genes are evolutionarily related.
Orthologs are homologous sequences found in different organisms, whereas paralogs are homologous sequences found in the same organism. Gene function may be determined by looking for homologous sequences (both orthologs and paralogs) whose function has been previously determined.
• Functions of unknown genes may be inferred by searching databases for protein domains in genes that have been previously characterized.
• The functions of unknown genes can be inferred by using methods that compare DNA sequences, including phylogenetic profiling, protein fusion patterns, and linkage arrangements of genes in different organisms.
• A microarray consists of DNA fragments fixed in an orderly pattern to a solid support, such as a nylon filter or glass slide. When a solution containing a mixture of DNA or RNA is applied to the array, any nucleic acid that is complementary to the probe being used will bind to the probe. Microarrays can be used to monitor the expression of thousands of genes simultaneously.
• Genes affecting a particular function or trait can be identified through whole-genome mutagenesis screens. In this process, a group of organisms is screened for abnormal phenotypes subsequent to mutagenesis, and the mutated genes causing the abnormal phenotypes are identified by positional cloning.
• The genomes of many prokaryotic organisms have been determined. Most species have between 1 million and 3 million base pairs of DNA and from 1000 to 2000 genes. Compared with that of eukaryotic genomes, the density of genes in prokaryotic genomes is relatively uniform, with about one gene per 1000 bp. There is relatively little noncoding DNA between prokaryotic genes. Horizontal gene transfer (the movement of genes between different species) has been an important evolutionary process in prokaryotes.
• Eukaryotic genomes are larger and more variable in size than prokaryotic genomes.
There is no clear relation between organismal complexity and the amount of DNA or number of genes among multicellular organisms. Much of the genome of eukaryotic organisms consists of repetitive DNA. Transposable elements are very common in most eukaryotic genomes.
• Genomics is making important contributions to human health, agriculture, biotechnology, and our understanding of evolution.

IMPORTANT TERMS

genomics
structural genomics
functional genomics
comparative genomics
genetic map
physical map
restriction mapping
DNA sequencing
dideoxyribonucleoside triphosphate (ddNTP)
map-based sequencing
contig
whole-genome shotgun sequencing
single-nucleotide polymorphism (SNP)
expressed-sequence tag (EST)
bioinformatics
open reading frame
transcriptome
proteome
homologous genes
orthologous genes
paralogous genes
protein domain
phylogenetic profile
fusion pattern
gene neighbor analysis
microarray
mutagenesis screen
positional cloning
horizontal gene exchange

Worked Problems

1. A linear piece of DNA that is 30 kb long is first cut with BamHI, then with HpaII, and finally with both BamHI and HpaII together. Fragments of the following sizes were obtained from these digestions.

BamHI: 20-kb, 6-kb, and 4-kb fragments
HpaII: 21-kb and 9-kb fragments
BamHI and HpaII: 20-kb, 5-kb, 4-kb, and 1-kb fragments

Draw a restriction map of the 30-kb piece of DNA, indicating the locations of the BamHI and HpaII restriction sites.

• Solution

This problem can be solved correctly through a variety of approaches; this solution applies one possible approach.

When cut by BamHI alone, the linear piece of DNA is cleaved into three fragments; so there must be two BamHI restriction sites. When cut with HpaII alone, a clone of the same piece of DNA is cleaved into only two fragments; so there is a single HpaII site.

Let’s begin to determine the location of these sites by examining the HpaII fragments. Notice that the 21-kb fragment produced when the DNA is cut by HpaII is not present in the fragments produced when the DNA is cut by BamHI and HpaII together (the double digest); this result indicates that the 21-kb HpaII fragment has within it a BamHI site. If we examine the fragments produced by the double digest, we see that the 20-kb and 1-kb fragments sum to 21 kb; so a BamHI site must be 20 kb from one end of the fragment and 1 kb from the other end.

20 kb -- BamHI site -- 1 kb

Similarly, we see that the 9-kb HpaII fragment does not appear in the double digest and that the 5-kb and 4-kb fragments in the double digest add up to 9 kb; so another BamHI site must be 5 kb from one end of this fragment and 4 kb from the other end.

5 kb -- BamHI site -- 4 kb

Now, let’s examine the fragments produced when the DNA is cut by BamHI alone. The 20-kb and 4-kb fragments are also present in the double digest; so neither of these fragments contains an HpaII site. The 6-kb fragment, however, is not present in the double digest, and the 5-kb and 1-kb fragments in the double digest sum to 6 kb; so this fragment contains an HpaII site that is 5 kb from one end and 1 kb from the other end.

5 kb -- HpaII site -- 1 kb

We have accounted for all the restriction sites, but we must still determine the order of the sites on the original 30-kb fragment. Notice that the 5-kb fragment must be adjacent to both the 1-kb and 4-kb fragments; so it must be in between these two fragments.

1 kb -- HpaII site -- 5 kb -- BamHI site -- 4 kb

We have also established that the 1-kb and 20-kb fragments are adjacent; because the 5-kb fragment is on one side, the 20-kb fragment must be on the other, completing the restriction map:

20 kb -- BamHI site -- 1 kb -- HpaII site -- 5 kb -- BamHI site -- 4 kb

2. You are given the following DNA fragment to sequence: 5′–GCTTAGCATC–3′. You first clone the fragment in bacterial cells to produce sufficient DNA for sequencing. You isolate the DNA from the bacterial cells and carry out the dideoxy sequencing method. You then separate the products of the polymerization reactions by gel electrophoresis. Draw the bands that should appear on the gel from the four sequencing reactions.

• Solution

In the dideoxy sequencing reaction, the original fragment is used as a template for the synthesis of a new DNA strand; and it is the sequence of the new strand that is actually determined. The first task, therefore, is to write out the sequence of the newly synthesized fragment, which will be complementary and antiparallel to the original fragment. The sequence of the newly synthesized strand, written 5′ → 3′, is: 5′–GATGCTAAGC–3′. Bands representing this sequence will appear on the gel, with the bands representing nucleotides near the 5′ end of the molecule at the bottom of the gel.

[Figure: gel with one lane for each of the reactions containing ddATP, ddTTP, ddCTP, and ddGTP, with the origin marked.]

The New Genetics
MINING GENOMES
GENOME ANALYSIS AND COMPARATIVE GENOMICS

Recent developments in genomics are revolutionizing our understanding of life, evolution, and medicine. This exercise allows you to explore and compare completed genomes at the Comprehensive Microbial Resources site at the Institute for Genomic Research. You will also explore the Human Genome and the Mouse Genome by using tools at the National Center for Biotechnology Information (NCBI).

COMPREHENSION QUESTIONS

1. (a) What is genomics and how does structural genomics differ from functional genomics?
(b) What is comparative genomics?
*2. What is the difference between a genetic map and a physical map? Which generally has higher resolution and accuracy and why?
3. What is the purpose of the dideoxynucleoside triphosphate in the dideoxy sequencing reaction?
*4. What is the difference between a map-based approach to sequencing a whole genome and a whole-genome shotgun approach?
5. How are DNA fragments ordered into a contig by using restriction sites?
*6. Describe the different approaches to sequencing the human genome that were taken by the international collaboration and Celera Genomics.
7. (a) What is an expressed-sequence tag (EST)?
(b) How are ESTs created?
(c) How are ESTs used in genomics studies?
8. What is a single-nucleotide polymorphism (SNP), and how are SNPs used in genomic studies?
9. How are genes recognized within genomic sequences?
*10. What are homologous sequences? What is the difference between orthologs and paralogs?
11. Describe several different methods for inferring the function of a gene by examining its DNA sequence.
12. What is a microarray and how can it be used to obtain information about gene function?
*13. Briefly outline how a mutagenesis screen is carried out.
14. Eukaryotic genomes are typically much larger than prokaryotic genomes. What accounts for the increased amount of DNA seen in eukaryotic genomes?
15. What is one consequence of differences in the G + C content of different genomes?
*16. What is horizontal gene exchange? How might it take place between different species of bacteria?
17. DNA content varies considerably among different multicellular organisms. Is this variation closely related to the number of genes and the complexity of the organism? If not, what accounts for the differences?
*18. More than half of the genome of Arabidopsis thaliana consists of duplicated sequences. What mechanisms are thought to have been responsible for these extensive duplications?
19.
The human genome does not encode substantially more protein domains than do invertebrate genomes, and yet it encodes many more proteins. How are more proteins encoded when the number of domains does not differ substantially?
20. What are some of the ethical concerns arising out of the information produced by the Human Genome Project?

APPLICATION QUESTIONS AND PROBLEMS

*21. A 22-kb piece of DNA has the following restriction sites.

[Figure: restriction map of the 22-kb piece of DNA, marked with two HindIII sites and three HpaI sites and with segment lengths of 3 kb, 4 kb, 5 kb, 7 kb, 2 kb, and 1 kb.]

A batch of this DNA is first fully digested by HpaI alone, then another batch is fully digested by HindIII alone, and finally a third batch is fully digested by both HpaI and HindIII together. The fragments resulting from each of the three digestions are placed in separate wells of an agarose gel, separated by gel electrophoresis, and stained by ethidium bromide. Draw the bands as they would appear on the gel.

*22. A piece of DNA that is 14 kb long is cut first by EcoRI alone, then by SmaI alone, and finally by both EcoRI and SmaI together. The following results are obtained.

Digestion by EcoRI alone: 3-kb, 5-kb, and 6-kb fragments
Digestion by SmaI alone: 7-kb and 7-kb fragments
Digestion by both EcoRI and SmaI: 2-kb, 3-kb, 4-kb, and 5-kb fragments

Draw a map of the EcoRI and SmaI restriction sites on this 14-kb piece of DNA, indicating the relative positions of the restriction sites and the distances between them.

23. Suppose that you want to sequence the following DNA fragment:

Fragment to be sequenced: 5′–TCCCGGGAAA-primer site–3′

You first clone the fragment in bacterial cells to produce sufficient DNA for sequencing. You isolate the DNA from the bacterial cells and carry out the dideoxy sequencing method. You then separate the products of the polymerization reactions by gel electrophoresis. Draw the bands that should appear on the gel from the four sequencing reactions.

[Figure: blank gel with one lane for each of the reactions containing ddATP, ddTTP, ddCTP, and ddGTP, with the origin marked.]

*24. Suppose that you are given a short fragment of DNA to sequence. You clone the fragment, isolate the cloned DNA fragments, and set up a series of four dideoxy reactions. You then separate the products of the reactions by gel electrophoresis and obtain the following banding pattern:

[Figure: gel showing the banding pattern in lanes for the reactions containing ddATP, ddTTP, ddCTP, and ddGTP, with the origin marked.]

Write out the base sequence of the original fragment that you were given.

Original sequence: 5′–_______________–3′

25. Microarrays can be used to determine the levels of gene expression. In one type of microarray, hybridization of the red (experimental) and green (control) cDNAs is proportional to the relative amounts of mRNA in the samples. Red indicates the overexpression of a gene and green indicates the underexpression of a gene in the experimental cells relative to the control cells, yellow indicates equal expression in experimental and control cells, and no color indicates no expression in either experimental or control cells.

In one experiment, mRNA from a strain of antibiotic-resistant bacteria (experimental cells) is converted into cDNA and labeled with red fluorescent nucleotides; mRNA from a nonresistant strain of the same bacteria (control cells) is converted into cDNA and labeled with green fluorescent nucleotides. The cDNAs from the resistant and nonresistant cells are mixed and hybridized to a chip containing spots of DNA from genes 1 through 25. The results are shown in the adjoining illustration. What conclusions can you make about which genes might be implicated in antibiotic resistance in these bacteria? How might this information be used to design new antibiotics that are less vulnerable to resistance?

[Figure: microarray chip with spots numbered 1 through 25.]

*26. Genes for the following proteins are found in five different species whose genomes have been completely sequenced. On the basis of the presence-and-absence patterns of these proteins in the genomes of the five species, which proteins are most likely to be functionally related? (Hint: Create a table listing the presence or absence of each protein in the five species.)

Species    Proteins
A          P1, P2, P3, P4, P5
B          P1, P2, P3, P5
C          P2, P4
D          P3, P5
E          P1, P3, P4, P5

27. The physical locations of several genes determined from genomic sequences are shown here for three bacterial species. On the basis of this information, which genes might be functionally related?

[Figure: locations of genes 1 through 7 in the genomes of species A, species B, and species C.]

28. The presence (+) or absence (−) of six sequence-tagged sites (STSs) in each of five bacterial artificial chromosome (BAC) clones (A–E) is indicated in the following table. Using these markers, put the BAC clones in their correct order and indicate the locations of the STS sites within them.

[Table: STSs 1 through 6 scored as present or absent in BAC clones A through E.]

29. How does the density of genes found on chromosome 22 compare with the density of genes found on chromosome 21, two similar-sized chromosomes? How does the number of genes on chromosome 22 compare with the number found on the Y chromosome?

To answer these questions, go to the Ensembl Web site: http://www.ensembl.org/

Under the heading Ensembl Species, click Human. On the left-hand side of the next page are pictures of the human chromosomes. Click on chromosome 22. You will be shown a picture of this chromosome and a histogram illustrating the density of total genes (uncolored bars) and known genes (colored bars). The number of known and novel (uncharacterized) genes is given in the upper right-hand side of the page.

Now go to chromosome 21 by pulling down the Change Chromosome menu and selecting chromosome 21. Examine the density and total number of genes for chromosome 21. Now do the same for the Y chromosome.
(a) Which chromosome has the highest density and greatest number of genes? Which has the fewest?
(b) Examine in more detail the genes at the tip of the short arm of the Y chromosome by clicking on the top bar in the histogram of genes. A more detailed view will be shown. What known genes are found in this region? How many novel genes are there in this region?

*30. Some researchers have proposed creating an entirely new, free-living organism with a minimal genome, the smallest set of genes that allows for replication of the organism in a particular environment. This organism could be used to design and create, from “scratch,” novel organisms that might perform specific tasks such as the breakdown of toxic materials in the environment.
(a) How might the minimal genome required for life be determined?
(b) What, if any, social and ethical concerns might be associated with the creation of novel organisms by constructing an entirely new organism with a minimal genome?

31. What are some of the major differences between the ways in which genetic information is organized in the genomes of prokaryotes versus eukaryotes?

32. How do the following genomic features of prokaryotic organisms compare with those of eukaryotic organisms? How do they compare among eukaryotes?
(a) Genome size
(b) Number of genes
(c) Gene density (bp/gene)
(d) G + C content
(e) Number of exons
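A proposed restriction map can be checked by computing the fragments that each digest would produce and comparing them with the observed sizes. The short Python sketch below does this for the map derived in Worked Problem 1 of this chapter. The function name `digest` and the coordinate convention (site positions in kb, measured from the left end of the linear molecule) are assumptions made for illustration only.

```python
def digest(length_kb, cut_sites):
    """Return fragment sizes (largest first) for a linear molecule cut at the given positions."""
    positions = [0] + sorted(cut_sites) + [length_kb]
    fragments = [right - left for left, right in zip(positions, positions[1:])]
    return sorted(fragments, reverse=True)

# Map from Worked Problem 1, read left to right:
# 20 kb -- BamHI -- 1 kb -- HpaII -- 5 kb -- BamHI -- 4 kb
LENGTH = 30
BAMHI_SITES = [20, 26]   # assumed coordinates, kb from the left end
HPAII_SITES = [21]

bamhi_only = digest(LENGTH, BAMHI_SITES)
hpaii_only = digest(LENGTH, HPAII_SITES)
double = digest(LENGTH, BAMHI_SITES + HPAII_SITES)

# Compare with the fragment sizes stated in the problem.
print(bamhi_only == [20, 6, 4])     # True
print(hpaii_only == [21, 9])        # True
print(double == [20, 5, 4, 1])      # True
```

The same function can be used to test a candidate answer to any single-molecule digest problem: a map is consistent with the data only if all three predicted fragment lists match the observed ones. The mirror-image map (sites at 4, 9, and 10 kb) gives identical fragments, which is why a restriction map of a linear molecule is determined only up to orientation.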
20 Organelle DNA

• Coyote Genes in Declining Wolves
• The Biology of Mitochondria and Chloroplasts
  Mitochondrion and Chloroplast Structure
  The Genetics of Organelle-Encoded Traits
  The Endosymbiotic Theory
• Mitochondrial DNA
  Gene Structure and Organization of mtDNA
  Nonuniversal Codons in mtDNA
  Replication, Transcription, and Translation of mtDNA
  Evolution of mtDNA
• Chloroplast DNA
  Gene Structure and Organization of cpDNA
  Replication, Transcription, and Translation of cpDNA
  Evolution of cpDNA
• The Intergenomic Exchange of Genetic Information
• Mitochondrial DNA and Aging in Humans

(John Shaw/Bruce Coleman.)

Coyote Genes in Declining Wolves

North America is home to two wild canids: the gray wolf (Canis lupus) and the coyote (Canis latrans). Before European settlement, gray wolves ranged across much of North America, occupying forest, plains, desert, and tundra habitat ( ◗ FIGURE 20.1). Coyotes had a more limited distribution, being confined primarily to plains and deserts.

With the expansion of European settlement and the development of intensive agriculture in the eighteenth and nineteenth centuries, the distribution of wolves and coyotes changed dramatically (see Figure 20.1). Wolf populations in North America declined precipitously owing to habitat destruction and deliberate extermination. In contrast, coyote populations expanded, probably because competition from wolves was eliminated and because coyotes were better able to adapt to human disruption of the ecosystem. Today coyotes are found throughout most of North America.

Habitat alteration and changes in their distributions have increased interactions between wolves and coyotes in recent times, providing more opportunities for hybridization between the two species. In captivity, wolves and coyotes will interbreed and produce fertile hybrids. Large coyotes in New England and southeastern Canada may be the result of hybridization between wolves and coyotes in these areas.
◗ 20.1 Original and current ranges of the gray wolf (Canis lupus) and the coyote (Canis latrans). (a) Wolf populations. (b) Coyote populations. Key: original range; current range. Originally, the gray wolf occupied most of North America, but its current range is restricted to northern Minnesota, Canada, and Alaska. The coyote originally occupied the plains and desert habitat in the midwestern United States and Mexico. Today, the coyote is found throughout most of North America. The results of studies of mitochondrial DNA reveal that hybridization is occurring between wolves and coyotes.

To what extent is hybridization occurring between coyotes and wolves in nature? To answer this question, Niles Lehman and his colleagues studied DNA in the mitochondria of wolves and coyotes. Mitochondrial DNA (mtDNA) can be helpful in determining hybridization between animals for two reasons: (1) in animals, it is inherited only from the female parent and (2) it evolves rapidly. Lehman and his colleagues gathered tissue and blood samples from more than 500 gray wolves and coyotes in North America, extracted mtDNA from the samples, and analyzed restriction fragment length polymorphisms (see Chapter 18) in the mtDNA. The results of their study revealed two major clusters of mtDNA among the canids: one consisting of wolf mtDNA and another of coyote-like mtDNA. Surprisingly, the coyote-like mtDNA cluster included several samples that had been obtained from wolves, indicating that some wolves possessed coyote-like mtDNA. The wolves with coyote-like mtDNA were all from the U.S.–Canadian border area, which has recently been invaded by coyotes. No wolves from Alaska or northern Canada possessed coyote-like mtDNA. All the coyotes had only coyote-like mtDNA. These results indicate that unidirectional hybridization has taken place between coyotes and wolves: coyote mtDNA has entered wolf populations, but wolf mtDNA has not entered coyote populations.
The fact that in animals mtDNA is inherited only from the female parent implies that female coyotes are mating successfully with male wolves and the wolf–coyote hybrids are backcrossing with wolves, introducing coyote genes into wolf populations.

These findings have important implications for the future of gray wolves in North America. Hybridization between wolves and coyotes threatens to erode the genetic integrity of wolves. As human activities encroach on areas occupied by wolves, wolves and coyotes will be forced into closer contact and there will be more hybridization; the wolf genome (both mitochondrial and nuclear) will become increasingly diluted by coyote DNA. Current efforts to reintroduce wolves into former territories, which are now occupied by coyotes, may lead to further hybridization, ultimately harming, rather than helping, wolf populations.

DNA sequences found in mitochondria and other organelles possess unique properties that make these sequences useful in the fields of conservation biology, evolution, and genetic diseases. Uniparental inheritance exhibited by genes found in mitochondria and chloroplasts was discussed in Chapter 5; the present chapter examines molecular aspects of organelle DNA. We begin by briefly considering the structures of mitochondria and chloroplasts, the inheritance of traits encoded by their genes, and their evolutionary origin. We then examine the general characteristics of mtDNA, followed by a discussion of the organization and function of different types of mitochondrial genomes. Finally, we turn to chloroplast DNA (cpDNA), examining its characteristics, organization, and function.

◗ 20.2 Comparison of the structures of mitochondria and chloroplasts. Labeled features: outer membrane, inner membrane, matrix, stroma, grana, thylakoid membrane, DNA, and ribosomes. Scale: mitochondrion, 0.5–1.0 µm; chloroplast, 4–6 µm. (Left, Don Fawcett/Visuals Unlimited; right, Biophoto Associates; Photo Researchers.)
The Biology of Mitochondria and Chloroplasts

Mitochondria and chloroplasts are membrane-bounded organelles located in the cytoplasm of eukaryotic cells ( ◗ FIGURE 20.2). Mitochondria are present in almost all eukaryotic cells, whereas chloroplasts are found in multicellular plants and some algae. Both organelles generate ATP, the universal energy carrier of cells.

Mitochondrion and Chloroplast Structure

Mitochondria are from 0.5 to 1.0 micrometer (µm) in diameter, about the size of a typical bacterium; chloroplasts are typically from about 4 to 6 µm in diameter. Both are surrounded by two membranes that enclose a region (called the matrix in mitochondria and the stroma in chloroplasts) that contains enzymes, ribosomes, RNA, and DNA. In mitochondria, the inner membrane is highly folded; embedded within it are the enzymes that catalyze electron transport and oxidative phosphorylation. Chloroplasts have a third membrane, called the thylakoid membrane, which is highly folded and stacked to form aggregates called grana. This membrane bears the pigments and enzymes required for photophosphorylation.

New mitochondria and chloroplasts arise by the division of existing organelles ( ◗ FIGURE 20.3). Mitochondria and chloroplasts possess DNA that encodes polypeptides used by the organelle, as well as rRNAs and tRNAs needed for the translation of these proteins.

◗ 20.3 New mitochondria arise by division of existing mitochondria. (a) DNA molecules within the mitochondria segregate randomly in organelle division. Steps shown: (1) a mitochondrion grows, and its DNA replicates; (2) organelle division starts with constriction of the outer membrane; (3) during division, cellular DNA segregates randomly. (b) Electron micrograph of a dividing mitochondrion from a liver cell; scale bar, 1 µm. (T. Kanaseki and D. Fawcett/Visuals Unlimited.)
The Genetics of Organelle-Encoded Traits

Mitochondria and chloroplasts are present in the cytoplasm and are usually inherited from a single parent. Thus, traits encoded by mtDNA and cpDNA exhibit uniparental inheritance. In animals, mtDNA is inherited almost exclusively from the female parent, although occasional male transmission of mtDNA has been documented. Paternal inheritance of organelles is common in gymnosperms and occurs occasionally in angiosperms as well. Some plants even exhibit biparental inheritance of mtDNA and cpDNA.

Individual cells may contain from dozens to hundreds of organelles, each with numerous copies of the organelle genome; so each cell typically possesses from hundreds to thousands of copies of mitochondrial and chloroplast genomes (◗ FIGURE 20.4). A mutation arising within one organellar DNA molecule generates a mixture of mutant and wild-type DNA sequences within that cell. The occurrence of two distinct varieties of DNA within the cytoplasm of a single cell is termed heteroplasmy. When a heteroplasmic cell divides, the organelles segregate randomly into the two progeny cells in a process called replicative segregation (◗ FIGURE 20.5), and chance determines the proportion of mutant organelles in each cell. Although most progeny cells will inherit a mixture of mutant and normal organelles, just by chance some cells may receive organelles with only mutant or only wild-type sequences; this situation is known as homoplasmy.
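Replicative segregation can be illustrated with a small simulation. The following Python sketch is a simplified model, not a description of real cell biology: it assumes that every cell carries a fixed number of organelles, that each organelle replicates exactly once before division, and that the doubled pool is split evenly at random between the two progeny cells. The function names (`divide`, `classify`, `simulate`) and the default parameter values are hypothetical and chosen only for illustration.

```python
import random

def divide(cell, rng):
    """Split one cell into two progeny cells by replicative segregation.

    `cell` is a list of organelle genotypes ("mut" or "wt"). Each organelle
    replicates once (assumption), the doubled pool is shuffled, and half of
    the pool goes to each progeny cell, so chance alone sets each mixture.
    """
    pool = cell * 2          # every organelle replicates once
    rng.shuffle(pool)        # random segregation of organelles
    half = len(pool) // 2
    return pool[:half], pool[half:]

def classify(cell):
    """Label a cell as heteroplasmic or as one of the two homoplasmic states."""
    kinds = set(cell)
    if kinds == {"mut"}:
        return "homoplasmic mutant"
    if kinds == {"wt"}:
        return "homoplasmic wild type"
    return "heteroplasmic"

def simulate(n_organelles=8, n_mutant=2, generations=6, seed=1):
    """Follow all descendants of one heteroplasmic cell for several divisions."""
    rng = random.Random(seed)
    start = ["mut"] * n_mutant + ["wt"] * (n_organelles - n_mutant)
    cells = [start]
    for _ in range(generations):
        cells = [daughter for cell in cells for daughter in divide(cell, rng)]
    counts = {}
    for cell in cells:
        label = classify(cell)
        counts[label] = counts.get(label, 0) + 1
    return cells, counts

if __name__ == "__main__":
    cells, counts = simulate()
    for label, n in sorted(counts.items()):
        print(f"{label}: {n} of {len(cells)} cells")
```

Running the script prints how many descendant cells fall into each class. Changing `seed`, `n_organelles`, or `generations` shows how the balance between heteroplasmic and homoplasmic cells shifts by chance under these assumptions.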
◗ 20.4 Individual cells may contain many mitochondria, each with several copies of the mitochondrial genome. Shown is a cell of Euglena gracilis, stained so that the nucleus appears red, mitochondria green, and mtDNA yellow. (From Y. Huyashi and K. Veda, Journal of Cell Sciences 93, 1989, 565.)

◗ 20.5 Organelles in a heteroplasmic cell divide randomly into the progeny cells. This diagram illustrates replicative segregation in mitosis; the same process also takes place in meiosis. Conclusion: Most cells are heteroplasmic, but, just by chance, some cells may receive only one type of organelle (e.g., they may receive all normal or all mutant).

When replicative segregation occurs in somatic cells, it may create phenotypic variation within a single organism; different cells of the organism may possess different proportions of mutant and wild-type sequences, resulting in different degrees of phenotypic expression among tissues. When replicative segregation occurs in the germ cells of a heteroplasmic cytoplasmic donor, the offspring may show quite different phenotypes.

The disease known as myoclonic epilepsy and ragged-red fiber disease syndrome (MERRF) is caused by a mutation in an mtDNA gene. A 20-year-old person who carried this mutation in 85% of his mtDNAs displayed a normal phenotype, whereas a cousin who had the mutation in 96% of his mtDNAs was severely affected. In diseases caused by mutations in mtDNA, the severity of the disease is frequently related to the proportion of mutant mtDNA sequences inherited at birth.

A number of traits encoded by organellar DNA have been studied. One of the first to be examined in detail was the phenotype produced by petite mutations in yeast (◗ FIGURE 20.6). In the late 1940s, Boris Ephrussi and his colleagues noticed that, when grown on solid medium, some colonies of yeast were much smaller than normal.
Examination of these petite colonies revealed that growth rates of the cells within the colonies were greatly reduced. The results of biochemical studies demonstrated that petite mutants were unable to carry out aerobic respiration; they obtained all of their energy from anaerobic respiration (glycolysis), which is much less efficient than aerobic respiration and results in the smaller colony size. Some petite mutations are inherited from both parents and are defects in nuclear DNA. However, most petite mutations are inherited from only a single parent; such mutants possess large deletions in mtDNA or, in some cases, are missing mtDNA entirely. Because much of their mtDNA encodes enzymes that catalyze aerobic respiration, the petite mutants are unable to carry out aerobic respiration and therefore cannot produce normal quantities of ATP, which inhibits their growth.

Another known mtDNA mutation occurs in Neurospora. Isolated by Mary Mitchell in 1952, poky mutants grow slowly, display cytoplasmic inheritance, and have abnormal amounts of cytochromes. Cytochromes are protein components of the electron-transport chain of the mitochondria and play an integral role in the production of ATP. Most organisms have three primary types of cytochromes: cytochrome a, cytochrome b, and cytochrome c. Poky mutants have cytochrome c but no cytochrome a or b. Like petite mutants, poky mutants are defective in ATP synthesis and therefore grow more slowly than normal wild-type cells.

In recent years, a number of genetic diseases that result from mutations in mtDNA have been identified in humans. Leber hereditary optic neuropathy (LHON), which typically leads to sudden loss of vision in middle age, results from mutations in the mtDNA genes that encode electron-transport proteins. Another disease caused by mitochondrial mutations is neurogenic muscle weakness, ataxia, and retinitis pigmentosa (NARP), which is characterized by seizures, dementia, and developmental delay.
Other mitochondrial diseases include Kearns-Sayre syndrome (KSS) and chronic external ophthalmoplegia (CEOP), both of which result in paralysis of the eye muscles, droopy eyelids, and, in severe cases, vision loss, deafness, and dementia. All of these diseases exhibit cytoplasmic inheritance and variable expression (see Chapter 5).

A trait in plants that is produced by mutations in mitochondrial genes is cytoplasmic male sterility, a mutant phenotype found in more than 140 different plant species and inherited only from the maternal parent. These mutations inhibit pollen development but do not affect female fertility.

A number of cpDNA mutants also have been discovered. One of the first to be recognized was leaf variegation in Mirabilis jalapa, which was studied by Carl Correns in 1909 (see p. 000 in Chapter 5). In the green alga Chlamydomonas, streptomycin-resistant mutations occur in cpDNA, and a number of mutants exhibiting altered pigmentation and growth in higher plants have been traced to defects in cpDNA.

◗ 20.6 The petite mutants have large deletions in their mtDNA and are unable to carry out oxidative phosphorylation. Electron micrograph of mitochondria in (a) a normal yeast cell and (b) a petite mutant. (Part a, David M. Phillips/Visuals Unlimited; part b, .)

Concepts
In most organisms, genes encoded by mtDNA and cpDNA are inherited entirely from a single parent. A gamete may contain more than one distinct type of mtDNA or cpDNA; in these cases, random segregation of the organelle DNA may produce phenotypic variation within a single organism or it may produce different degrees of phenotypic expression among progeny of a cross.

www.whfreeman.com/pierce More information about mitochondria and chloroplasts

The Endosymbiotic Theory

Chloroplasts and mitochondria are in many ways similar to bacteria. This resemblance is not superficial; indeed, there is compelling evidence that these organelles evolved from eubacteria (see p. 000 in Chapter 2). The endosymbiotic theory (◗ FIGURE 20.7) proposes that mitochondria and chloroplasts were once free-living bacteria that became internal inhabitants (endosymbionts) of early eukaryotic cells. According to this theory, between 1 billion and 1.5 billion years ago, a large, anaerobic eukaryotic cell engulfed an aerobic eubacterium, one that possessed the enzymes necessary for oxidative phosphorylation. The eubacterium provided the formerly anaerobic cell with the capacity for oxidative phosphorylation and allowed it to produce more ATP for each organic molecule digested. With time, the endosymbiont became an integral part of the eukaryotic host cell, and its descendants evolved into present-day mitochondria. Sometime later, a similar relation arose between photosynthesizing eubacteria and eukaryotic cells, leading to the evolution of chloroplasts.

◗ 20.7 The endosymbiotic theory proposes that mitochondria and chloroplasts in eukaryotic cells arose from eubacteria. (1) Approximately 1 billion to 1.5 billion years ago, an anaerobic eukaryotic cell engulfed an aerobic eubacterial cell (an α-proteobacterium) through endocytosis. (2) The aerobic endosymbiont evolved into mitochondria. (3) Likewise, endocytosis of a photosynthesizing eubacterium (a cyanobacterium) (4) led to the evolution of modern eukaryotic cells with mitochondria and chloroplasts. An original ancestral cell gave rise to prokaryotic and eukaryotic cells.

A great deal of evidence supports the idea that mitochondria and chloroplasts originated as eubacterial cells. Many modern, single-celled eukaryotes (protists) are hosts to endosymbiotic bacteria.
Mitochondria and chloroplasts are similar in size to present-day eubacteria and possess their own DNA, which has many characteristics in common with eubacterial DNA. Mitochondria and chloroplasts possess ribosomes, some of which are similar in size and structure to eubacterial ribosomes. Finally, antibiotics that inhibit protein synthesis in eubacteria but do not affect protein synthesis in eukaryotic cells also inhibit protein synthesis in these organelles.

The strongest evidence for the endosymbiotic theory comes from the study of DNA sequences in organellar DNA. Ribosomal RNA and protein-encoding gene sequences in mitochondria and chloroplasts have been found to be more closely related to sequences in the genes of eubacteria than they are to those found in the eukaryotic nucleus. Mitochondrial DNA sequences are most similar to sequences found in a group of eubacteria called the α-proteobacteria, suggesting that the original bacterial endosymbiont came from this group. Chloroplast DNA sequences are most closely related to sequences found in cyanobacteria, a group of photosynthesizing eubacteria. All of this evidence indicates that mitochondria and chloroplasts are more closely related to eubacterial cells than they are to the eukaryotic cells in which they are now found.

Concepts
Mitochondria and chloroplasts are membrane-bounded organelles of eukaryotic cells that generally possess their own DNA. The well-supported endosymbiotic theory proposes that these organelles began as free-living eubacteria that developed stable endosymbiotic relations with early eukaryotic cells.

Mitochondrial DNA

In animals and most fungi, the mitochondrial genome consists of a single, highly coiled, circular DNA molecule. Plant mitochondrial genomes often exist as a complex collection of multiple circular DNA molecules.
Each mitochondrion contains multiple copies of the mitochondrial genome, and a cell may contain many mitochondria. A typical rat liver cell, for example, has from 5 to 10 mtDNA molecules in each of about 1000 mitochondria; so each cell possesses from 5000 to 10,000 copies of the mitochondrial genome, and mtDNA constitutes about 1% of the total cellular DNA in a rat liver cell. Like eubacterial chromosomes, mtDNA lacks the histone proteins normally associated with eukaryotic nuclear DNA. The guanine-cytosine (GC) content of mtDNA is often sufficiently different from that of nuclear DNA that mtDNA can be separated from nuclear DNA by density gradient centrifugation.

Mitochondrial genomes are small compared with nuclear genomes and vary greatly in size among different organisms (Table 20.1). Most of this size variation is in noncoding sequences such as introns and intergenic regions.

Table 20.1 Sizes of mitochondrial genomes in selected organisms

Organism                                  Size of mtDNA in nucleotide pairs
Ascaris suum (nematode worm)              14,284
Drosophila melanogaster (fruit fly)       19,517
Lumbricus terrestris (earthworm)          14,998
Xenopus laevis (frog)                     17,553
Mus musculus (house mouse)                16,295
Canis familiaris (dog)                    16,728
Homo sapiens (human)                      16,569
Pichia canadensis (fungus)                27,694
Podospora anserina (fungus)               100,314
Schizosaccharomyces pombe (fungus)        19,431
Saccharomyces cerevisiae (fungus)         85,779
Chlamydomonas reinhardtii (green alga)    15,758
Paramecium aurelia (protist)              40,469
Reclinomonas americana (protist)          69,034
Arabidopsis thaliana (plant)              166,924
Brassica hirta (plant)                    208,000
Cucumis melo (plant)                      2,400,000

Gene Structure and Organization of mtDNA

The nucleotide sequence of the mitochondrial genome has been determined for a variety of different organisms, including protists, fungi, plants, and animals. The genes for many of the structural proteins and enzymes found in mitochondria are actually encoded by nuclear DNA, translated on cytoplasmic ribosomes, and then transported into the mitochondria; the mitochondrial genome typically encodes only a few rRNA and tRNA molecules needed for mitochondrial protein synthesis. The organization of these mitochondrial genes and how they are expressed is extremely diverse across organisms.

Ancestral and derived mitochondrial genomes
Mitochondrial genomes can be divided into two basic types, ancestral genomes and derived genomes, although there is much variation within each type and the mtDNA of some organisms does not fit well into either category. Ancestral mitochondrial genomes are found in some plants and protists and retain many characteristics of their eubacterial ancestors. These mitochondrial genomes contain more genes than do derived genomes, have rRNA genes that encode eubacterial-like ribosomes, and have a complete or almost complete set of tRNA genes. They possess few introns and little noncoding DNA between genes, generally use universal codons, and have their genes organized into clusters similar to those found in eubacteria.

Derived mitochondrial genomes, in contrast, are usually smaller than ancestral genomes and contain fewer genes. Their rRNA genes and ribosomes differ substantially from those found in typical eubacteria. The DNA sequences found in derived mitochondrial genomes differ more from typical eubacterial sequences than do ancestral genomes, and they contain nonuniversal codons. Most animal and fungal mitochondrial genomes fit into this category.

Human mtDNA
Human mtDNA is a circular molecule encompassing 16,569 bp that encode two rRNAs, 22 tRNAs, and 13 proteins. The two nucleotide strands of the molecule differ in their base composition: the heavy (H) strand has more guanine nucleotides, whereas the light (L) strand has more cytosine nucleotides. The H strand is the template for both rRNAs, 14 of the 22 tRNAs, and 12 of the 13 proteins, whereas the L strand serves as template for only 8 of the tRNAs and one protein.
The origin of replication for the H strand is within a region known as the D loop (◗ FIGURE 20.8), which also contains promoters for both the H and L strands. Human mtDNA is highly economical in its organization: there are few noncoding nucleotides between the genes; almost all the mRNA is translated (there are no 5′ and 3′ untranslated regions); and there are no introns. Each strand has only a single promoter; so transcription produces two very large RNA precursors that are later cleaved into individual RNA molecules. Many of the genes that encode polypeptides even lack a complete termination codon, ending in either U or UA; the addition of a poly(A) tail to the 3′ end of the mRNA provides a UAA termination codon that halts translation. Human mtDNA also contains very little repetitive DNA. The one region of the human mtDNA that does contain some noncoding nucleotides is the D loop.

◗ 20.8 The human mitochondrial genome, consisting of 16,569 bp, is highly economical in its organization. (a) The outer circle represents the heavy (H) strand, and the inner circle represents the light (L) strand. The origins of replication for the H and L strands are ori H and ori L, respectively. (b) Electron micrograph of isolated mtDNA. (Part b, CNRI/Photo Researchers.)

www.whfreeman.com/pierce Information on genes of the human mitochondrial genome

Yeast mtDNA
The organization of yeast mtDNA is quite different from that of human mtDNA. Although the yeast mitochondrial genome with 78,000 bp is nearly five times as large, it encodes only six additional genes, for a total of 2 rRNAs, 25 tRNAs, and 16 polypeptides (◗ FIGURE 20.9). Most of the extra DNA in the yeast mitochondrial genome consists of noncoding sequences. Yeast mitochondrial genes are separated by long intergenic spacer regions that have no known functions. The genes encoding polypeptides often include regions that encode 5′ and 3′ untranslated regions of the mRNA; there are also short repetitive sequences and some duplications.

◗ 20.9 The yeast mitochondrial genome, consisting of 78,000 bp, contains much noncoding DNA. Map labels include tRNA genes, the large and small ribosomal RNA genes, a ribosome-associated protein, cytochrome b, cytochrome c oxidase I, II, and III, and ATPase subunit 9; the key distinguishes tRNA genes, protein genes, and rRNA genes.

www.whfreeman.com/pierce Information on the Fungal Mitochondrial Genome Project (FMGP), whose goals are to sequence and analyze complete mitochondrial genomes from all major groups of fungi

Flowering plant mtDNA
Flowering plants (angiosperms) have the largest and most complex mitochondrial genomes known; their mitochondrial genomes range in size from 186,000 bp in white mustard to 2,400,000 bp in muskmelon. Even closely related plant species may differ greatly in the sizes of their mtDNA.

Part of the extensive size variation in the mtDNA of flowering plants can be explained by the presence of large direct repeats, which constitute large parts of the mitochondrial genome. Crossing over between these repeats can generate multiple circular chromosomes of different sizes. The mitochondrial genome in turnip, for example, is a “master circle” of 218,000 bp that has direct repeats (◗ FIGURE 20.10). Homologous recombination between the repeats can generate two smaller circles of 135,000 bp and 83,000 bp. Other species contain several direct repeats, providing possibilities for complex crossing-over events that may increase or decrease the number and sizes of the circles.

◗ 20.10 Size variation in plant mtDNA can be generated through recombination between direct repeats.
In turnips, the mitochondrial genome consists of a “master circle” of 218,000 bp, which has direct repeats that are separated by 135,000 bp on one side and 83,000 bp on the other. Crossing over between the direct repeats produces two smaller circles of 135,000 bp and 83,000 bp.

Nonuniversal Codons in mtDNA
In the vast majority of bacterial and eukaryotic DNA, the same codons specify the same amino acids (see p. 000 in Chapter 15). However, there are exceptions to this universal code, and many of these exceptions are in mtDNA (Table 20.2). There is not a “mitochondrial code”; rather, exceptions to the universal code exist in mitochondria, and these exceptions often differ among organisms. For example, AGA specifies arginine in the universal code, but AGA codes for serine in Drosophila mtDNA and is a stop codon in mammalian mtDNA.

Concepts
The mitochondrial genome consists of circular DNA with no associated histone proteins. The size and structure of mtDNA differ greatly among organisms. Human mtDNA exhibits extreme economy, but mtDNAs found in yeast and flowering plants contain many noncoding nucleotides and repetitive sequences. Mitochondrial DNA in most flowering plants is large and typically has one or more large direct repeats that can recombine to generate smaller or larger molecules.

Replication, Transcription, and Translation of mtDNA

Mitochondrial DNA does not replicate in the orderly, regulated manner of nuclear DNA. Mitochondrial DNA is synthesized throughout the cell cycle and is not coordinated with the synthesis of nuclear DNA. Which mtDNA molecules are replicated at any particular moment appears to be random; within the same mitochondrion, some molecules are replicated two or three times, whereas others are not replicated at all. Furthermore, the two strands in human mtDNA may not replicate synchronously. Mitochondrial DNA is replicated by a special DNA polymerase called DNA polymerase γ (gamma).
Presumably, helicases and topoisomerases are required for mitochondrial DNA replication, just as they are in eubacterial and nuclear DNA replication.

The processes of transcription and translation of mitochondrial genes exhibit extensive variation among different organisms. In human mtDNA, eubacterial-like operons are absent, and there are two promoters, one for each nucleotide strand, within the D loop. Transcription of the two strands proceeds in opposite directions, generating two giant precursor RNAs that are then cleaved to yield individual rRNAs, tRNAs, and mRNAs. As the tRNAs are transcribed, they fold up into three-dimensional configurations. These configurations are recognized and cut out by enzymes. The tRNA genes generally flank the protein and rRNA genes; so cleavage of the tRNAs releases mRNAs and rRNAs. In the mitochondrial genomes of fungi, plants, and protists, there are multiple promoters, although genes are occasionally arranged and transcribed in operons.

Most mRNA molecules produced by the transcription of mtDNA are not capped at their 5′ ends, unlike mRNA transcribed from nuclear genes (see Figure 14.6). Poly(A) tails are added to the 3′ end of some mRNAs encoded by animal mtDNA, but poly(A) tails are missing from those encoded by mtDNA in fungi, plants, and protists. The poly(A) tails added to animal mitochondrial mRNAs are shorter than those attached to nuclear-encoded mRNA and are probably added by an entirely different mechanism. Some of the genes in yeast and plant mitochondrial DNA contain introns, many of which are self-splicing. RNA encoded by some mitochondrial genomes undergoes extensive editing (see p. 000 in Chapter 14).

Table 20.2 Nonuniversal codons found in mtDNA

Codon    Universal code    Vertebrate mtDNA    Drosophila mtDNA    Yeast mtDNA
UGA      Stop              Tryptophan          Tryptophan          Tryptophan
AUA      Isoleucine        Methionine          Methionine          Methionine
AGA      Arginine          Stop                Serine              Arginine

Source: After T. D. Fox, Annual Review of Genetics 21 (1987), p. 69.
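The effect of the nonuniversal codons in Table 20.2 can be made concrete with a short translation example. The Python sketch below builds the universal code and then overrides only the three codons listed in the table (UGA, AUA, and AGA) for vertebrate, Drosophila, and yeast mtDNA. It is an illustration under stated assumptions: real mitochondrial codes contain further exceptions that are not in the table and are not modeled here, and the names `UNIVERSAL`, `MT_EXCEPTIONS`, `code_for`, and `translate`, as well as the test sequence, are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Universal code in the conventional UCAG ordering (third base varies fastest);
# "*" marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
UNIVERSAL = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Only the exceptions given in Table 20.2 (W = tryptophan, M = methionine,
# S = serine, R = arginine, * = stop).
MT_EXCEPTIONS = {
    "vertebrate": {"UGA": "W", "AUA": "M", "AGA": "*"},
    "drosophila": {"UGA": "W", "AUA": "M", "AGA": "S"},
    "yeast":      {"UGA": "W", "AUA": "M", "AGA": "R"},
}

def code_for(organism=None):
    """Return the universal code, optionally patched with mtDNA exceptions."""
    table = dict(UNIVERSAL)
    if organism is not None:
        table.update(MT_EXCEPTIONS[organism])
    return table

def translate(mrna, organism=None):
    """Translate an mRNA from its first codon until the first stop codon."""
    table = code_for(organism)
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = table[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

if __name__ == "__main__":
    mrna = "AUGAUAUGAAGAUAA"  # codons: AUG AUA UGA AGA UAA
    for organism in (None, "vertebrate", "drosophila", "yeast"):
        print(organism or "universal", translate(mrna, organism))
```

The same message is read differently by each table: under the universal code translation stops at UGA, whereas the mitochondrial tables read through UGA as tryptophan and differ from one another at AGA.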
Translation in mitochondria has some similarities to eubacterial translation, but there are also important differences. In mitochondria, protein synthesis is initiated at AUG start codons by N-formylmethionine, just as in eubacterial initiation of translation. Mitochondrial translation also employs elongation factors similar to those seen in eubacteria, and the same antibiotics that inhibit translation in eubacteria also inhibit translation in mitochondria. However, mitochondrial ribosomes are variable in structure and are often different from those seen in both eubacterial and eukaryotic cells. Additionally, the initiation of translation in mitochondria must be different from that of both eubacterial and eukaryotic cells, because animal mitochondrial mRNA contains no Shine-Dalgarno ribosome-binding site and no 5′ cap. (A Shine-Dalgarno sequence has been observed in mitochondrial mRNA of the protozoan Reclinomonas americana, which has a very primitive, eubacterial-like mitochondrion.)

There is also much diversity in the tRNAs encoded by various mitochondrial genomes. Human mtDNA encodes only 22 tRNAs, whereas 32 tRNAs are required for translation in the cytoplasm. (Only 32 are required in cytoplasmic translation because wobble at the third position of the codon allows tRNAs to pair with more than one codon; see p. 000 in Chapter 15.) In human mitochondrial translation, there is even more wobble than in cytoplasmic translation; many mitochondrial tRNAs will recognize any of the four nucleotides in the third position of the codon, permitting translation to take place with even fewer tRNAs. The increased wobble also means that, for codons read by such tRNAs, any change in a DNA nucleotide at the third position of the codon will be a silent mutation (see p. 000 in Chapter 17) and will not alter the amino acid sequence of the protein. Thus more of the changes that occur in mtDNA are silent and accumulate over time, contributing to a higher rate of evolution.
In some organisms, fewer than 22 tRNAs are encoded by mtDNA; in these organisms, nuclear-encoded tRNAs are imported from the cytoplasm to help carry out translation. In yet other organisms, the mitochondrial genome encodes a complete set of all 32 tRNAs.

Concepts
The processes of replication, transcription, and translation vary widely among mitochondrial genomes and exhibit a curious mix of eubacterial, eukaryotic, and unique characteristics.

Evolution of mtDNA

As already mentioned, comparisons of DNA sequences in mitochondrial genomes with homologous sequences in other organisms strongly support a common eubacterial origin for all mtDNA. Nevertheless, patterns of evolution seen in mtDNA vary greatly among different groups of organisms. The sequences of vertebrate mtDNA exhibit an accelerated rate of change: mammalian mtDNA, for example, typically evolves from 5 to 10 times as fast as mammalian nuclear DNA. The gene content and organization of vertebrate mitochondrial genomes, however, are relatively constant. In contrast, sequences of plant mtDNA evolve slowly, at a rate only one-tenth that of the nuclear genome, but their gene content and organization change rapidly. The reason for these basic differences in rates of evolution is not yet known.

One possible reason for the accelerated rate of evolution seen in vertebrate mtDNA is a high mutation rate in mtDNA, which would allow DNA sequences to change quickly. Increased errors associated with replication, the absence of DNA repair functions, and the frequent replication of mtDNA may increase the number of mutations. The large amount of wobble in mitochondrial translation may also allow mutations to accumulate over time, as discussed earlier. The use of mtDNA in evolutionary studies will be described in more detail in Chapter 23.

Concepts
All mtDNA appears to have evolved from a common eubacterial ancestor, but the patterns of evolution seen in different mitochondrial genomes vary greatly.
Vertebrate mtDNA exhibits rapid change in sequence but little change in gene content and organization, whereas the mtDNA of plants exhibits little change in sequence but much variation in gene content and organization.

www.whfreeman.com/pierce Data on mitochondrial genomes that have been completely sequenced, and more information on human diseases and disorders caused by defects in mitochondria

Chloroplast DNA

Geneticists have long recognized that many traits associated with chloroplasts exhibit cytoplasmic inheritance, indicating that these traits are not encoded by nuclear genes. In 1963, chloroplasts were shown to have their own DNA (◗ FIGURE 20.11).

Among different plants, the chloroplast genome ranges in size from 80,000 to 600,000 bp, but most chloroplast genomes range from 120,000 to 160,000 bp (Table 20.3). Chloroplast DNA is usually contained on a single, double-stranded DNA molecule that is circular, is highly coiled, and lacks associated histone proteins. As in mtDNA, multiple copies of the chloroplast genome are found in each chloroplast, and there are multiple organelles per cell; so there are several hundred to several thousand copies of cpDNA in a typical plant cell.

Table 20.3 Sizes of the chloroplast genomes in selected organisms

Organism                               Size of cpDNA in nucleotide pairs
Euglena gracilis (protist)             143,172
Porphyra purpurea (red alga)           191,028
Chlorella vulgaris (green alga)        150,613
Marchantia polymorpha (liverwort)      121,024
Nicotiana tabacum (tobacco)            155,939
Zea mays (corn)                        140,387
Pinus thunbergii (black pine)          119,707

◗ 20.11 Chloroplast DNA of rice. The map shows the Oryza sativa (rice) chloroplast DNA of 134,525 bp; the key distinguishes tRNA genes, protein genes, rRNA genes, and unassigned open reading frames (ORFs).

Gene Structure and Organization of cpDNA

The chloroplast genomes from a number of plant and algal species have been sequenced, and cpDNA is now recognized to be basically eubacterial in its organization: the order of some groups of genes is the same as that observed in E. coli, and many chloroplast genes are organized into operon-like clusters. Among vascular plants, chloroplast chromosomes are similar in gene content and gene order. A typical chloroplast genome encodes 4 rRNA genes, from 30 to 35 tRNA genes, a number of ribosomal proteins, many proteins engaged in photosynthesis, and several proteins having roles in nonphotosynthetic processes. A key protein encoded by cpDNA is ribulose-1,5-bisphosphate carboxylase-oxygenase (abbreviated RuBisCO), which participates in carbon fixation of photosynthesis.
RuBisCO makes up about 50% of the protein found in green plants and is therefore considered the most abundant protein on earth. It is a complex protein consisting of eight identical large subunits and eight identical small subunits. The large subunit is encoded by chloroplast DNA, whereas the small subunit is encoded by nuclear DNA.

The circular chloroplast genome has genes on both of its strands. Some chloroplast genes have been identified on the basis of a start and stop codon in the same reading frame, but no protein products have yet been isolated for these genes. These sequences are referred to as open reading frames. A prominent feature of most chloroplast genomes is the presence of a large inverted repeat. In rice, this repeat includes genes for 23S rRNA, 4.5S rRNA, and 5S rRNA, as well as several genes for tRNAs and proteins (see Figure 20.11). In some plants, these repeats include the majority of the genome, whereas, in others, the repeats are absent entirely. Much of cpDNA consists of noncoding sequences, and introns are found in many chloroplast genes. Finally, many of the sequences in cpDNA are quite similar to those found in equivalent eubacterial genes.

Concepts
Most chloroplast genomes consist of a single, circular DNA molecule not complexed with histone proteins. Although there is considerable size variation, the cpDNAs found in most vascular plants are about 150,000 bp. Genes are scattered in the circular chloroplast genome, and many contain introns. Most cpDNAs contain a large inverted repeat.

Replication, Transcription, and Translation of cpDNA

Little is known about the process of replication of cpDNA. The results of studies viewing cpDNA replication with electron microscopy suggest that replication begins within two D loops and spreads outward to form a theta-like structure. After an initial round of replication, DNA synthesis may switch to a rolling-circle-type mechanism (see Figure 12.5).
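The open reading frames described in the discussion of chloroplast gene structure are identified from sequence alone: a start codon followed, in the same reading frame, by a stop codon. The Python sketch below shows one simple way such a scan might be carried out on both strands of a DNA sequence. It is a minimal illustration, not the method used to annotate any actual chloroplast genome: it assumes universal start and stop codons, treats the sequence as linear (ORFs that span the origin of a circular genome are missed), ignores introns, and uses hypothetical names (`orfs_in_strand`, `find_orfs`, `min_codons`) and a made-up test sequence.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_strand(seq, min_codons=3):
    """Return (start, end) positions of ORFs in the three frames of one strand.

    An ORF here runs from the first ATG in a frame through the next
    in-frame stop codon; `min_codons` counts the start and stop codons.
    """
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # open a candidate ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    found.append((start, i + 3))
                start = None                    # look for the next ATG
    return found

def find_orfs(seq, min_codons=3):
    """Scan both strands and return the ORF sequences found on each."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    return {
        "+": [seq[s:e] for s, e in orfs_in_strand(seq, min_codons)],
        "-": [rc[s:e] for s, e in orfs_in_strand(rc, min_codons)],
    }

if __name__ == "__main__":
    print(find_orfs("CCATGAAATGGTAACC"))
```

A scan of this kind only proposes candidate genes; as the text notes, an open reading frame remains unassigned until a protein product is identified.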
The transcription and translation of chloroplast genes are similar in many respects to these processes in eubacteria. For example, promoters found in cpDNA are virtually identical with those found in eubacteria and possess sequences similar to the −10 and −35 consensus sequences of eubacterial promoters. The same antibiotics that inhibit protein synthesis in eubacteria (as well as in mitochondria) inhibit protein synthesis in chloroplasts, indicating that protein synthesis in eubacteria and chloroplasts is similar. Chloroplast translation is initiated by N-formylmethionine, just as it is in eubacteria. Most genes in cpDNA are transcribed in groups; only a few genes have their own promoters and are transcribed as separate mRNA molecules. The RNA polymerase that transcribes cpDNA is more similar to eubacterial RNA polymerase than to any of the RNA polymerases that transcribe eukaryotic nuclear genes. Like eubacterial mRNAs, chloroplast mRNAs are not capped at the 5′ ends, and poly(A) tails are not added to the 3′ ends. However, introns are removed from some RNA molecules after transcription, and the 5′ and 3′ ends may undergo some additional processing before the molecules are translated. Like eubacterial mRNAs, many chloroplast mRNAs have a Shine-Dalgarno sequence in the 5′ untranslated region, which may serve as a ribosome-binding site. Chloroplasts, like eubacteria, contain 70S ribosomes that consist of two subunits, a large 50S subunit and a smaller 30S subunit. The small subunit includes a single RNA molecule that is 16S in size, similar to that found in the small subunit of eubacterial ribosomes. The larger 50S subunit includes three rRNA molecules: a 23S rRNA, a 5S rRNA, and a 4.5S rRNA. In eubacterial ribosomes, the large subunit possesses only two rRNA molecules, which are 23S and 5S in size.
The 4.5S rRNA molecule found in the large subunit of chloroplast ribosomes is homologous to the 3′ end of the 23S rRNA found in eubacteria; so the structure of the chloroplast ribosome is very similar to that of ribosomes found in eubacteria. Initiation factors, elongation factors, and termination factors function in chloroplast translation and eubacterial translation in similar ways. Most chloroplast chromosomes encode from 30 to 35 different tRNAs, suggesting that the expanded wobble seen in mitochondria does not exist in chloroplast translation. Only universal codons have been found in cpDNA, and translation in chloroplasts starts with N-formylmethionine as the first amino acid.
Evolution of cpDNA
The DNA sequences of chloroplasts are very similar to those found in cyanobacteria; so chloroplast genomes clearly have a eubacterial ancestry. Overall, cpDNA sequences evolve slowly compared with sequences in nuclear DNA and some mtDNA. For most chloroplast genomes, size and gene organization are similar, although there are some notable exceptions.
Concepts
Many aspects of the transcription and translation of cpDNA are similar to those of eubacteria. Chloroplast DNA sequences are most similar to DNA sequences in cyanobacteria, which supports the endosymbiotic theory. Most cpDNA evolves slowly in sequence and structure.
www.whfreeman.com/pierce Information on chloroplast genomes that have been sequenced
Connecting Concepts
Genome Comparisons
A theme running through the preceding discussions of mitochondrial and chloroplast genomes has been a comparison of these genomes with those found in eubacterial and eukaryotic cells (Table 20.4). The endosymbiotic theory indicates that mitochondria and chloroplasts evolved from eubacterial ancestors, and one might therefore assume that mtDNA and cpDNA would be similar to DNA found in eubacterial cells.
The actual situation is more complex: mitochondrial DNA and chloroplast DNA possess a mix of eubacterial, eukaryotic, and unique characteristics. The mitochondrial and chloroplast genomes are similar to those of eubacterial cells in that they are relatively small, lack histone proteins, and are usually on circular DNA molecules. Gene organization and the expression of organelle genomes, however, display some similarities to eubacterial genomes and some similarities to eukaryotic genomes. Introns are present in some organelle genomes but are absent from others. Pre-mRNA introns (see p. 000 in Chapter 14 for a discussion of different types of introns) are absent from mitochondrial and chloroplast genes, as they are from eubacterial genes. Group II introns are present in some organelle and eubacterial genomes but are absent from eukaryotic nuclear genomes. Group I introns are common in some mtDNA and in most cpDNA, and these introns are also found in eubacterial, archaeal, and eukaryotic genomes. Polycistronic mRNA, which is common in eubacteria but uncommon in eukaryotes, is also found in mitochondria and especially chloroplasts. Human mtDNA, which has little noncoding DNA between genes and little repetitive DNA, is similar in organization to that of typical eubacterial chromosomes, but other mitochondrial and chloroplast genomes possess long noncoding sequences between genes. Antibiotics that inhibit eubacterial translation also inhibit organelle translation, and the 5′ cap, which is added to eukaryotic mRNA after transcription, is absent from organelle mRNA. A 3′ poly(A) tail, characteristic of most nuclear mRNAs, is present only in some animal mitochondrial mRNA, and it appears to be fundamentally different from that found in nuclear mRNAs. Shine-Dalgarno sequences, the ribosome-binding sites characteristic of eubacterial DNA, are present in some cpDNA but are absent in mtDNA.
Finally, some mitochondrial genomes use nonuniversal codons and have extended wobble, which is rare in both eubacterial and eukaryotic DNA. What conclusions can we draw from these comparisons? Clearly, the genomes of mitochondria and chloroplasts are not typical of the nuclear genomes of the eukaryotic cells in which they reside. In sequence, organelle DNA is most similar to eubacterial DNA, but many aspects of organization and expression in organelle genomes are unique. It is important to remember that the endosymbiotic theory does not propose that mitochondria and chloroplasts are eubacterial in nature but that they arose from eubacterial ancestors more than a billion years ago. Through time, the genomes of the endosymbiont have undergone considerable evolutionary change and have evolved characteristics that distinguish them from contemporary eubacterial and eukaryotic genomes.
Table 20.4 Comparison of nuclear eukaryotic, eubacterial, mitochondrial, and chloroplast genomes
Characteristic | Eukaryotic Genome | Eubacterial Genome | Mitochondrial Genome | Chloroplast Genome
Genome consists of double-stranded DNA | Yes | Yes | Yes | Yes
Circular | No | Yes | Most | Yes
Histone proteins | Yes | No | No | No
Size | Large | Small | Small | Small
Single molecule per genome | No | Yes | Yes in animals; No in some plants | Yes
Pre-mRNA introns | Common | Absent | Absent | Absent
Group I introns | Present | Present | Present | Present
Group II introns | Absent | Present | Present | Present
Polycistronic mRNA | Uncommon | Common | Present | Common
5′ cap added to mRNA | Yes | No | No | No
3′ poly(A) tail added to mRNA | Yes | No | Some in animals | No
Shine-Dalgarno sequence in 5′ untranslated region of mRNA | No | Yes | Rare | Some
Nonuniversal codons | Rare | Rare | Yes | No
Extended wobble | No | No | Yes | No
Translation inhibited by tetracycline | No | Yes | Yes | Yes
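The claim that organelle genomes show a mix of eubacterial, eukaryotic, and unique characteristics can be checked roughly by tallying the rows of Table 20.4. The sketch below transcribes the table as data and, for each organelle genome, counts how many entries match the eubacterial column, the eukaryotic column, both, or neither. The matching rule is an assumption made for illustration: it compares the table's wording exactly, so qualified entries such as "Most" or "Some in animals" are counted as matching neither reference genome. The names `TABLE_20_4`, `ORGANELLE_COLUMN`, and `tally` are hypothetical.

```python
# Table 20.4 as data. Column order in each tuple:
# (eukaryotic, eubacterial, mitochondrial, chloroplast).
TABLE_20_4 = {
    "Double-stranded DNA": ("Yes", "Yes", "Yes", "Yes"),
    "Circular": ("No", "Yes", "Most", "Yes"),
    "Histone proteins": ("Yes", "No", "No", "No"),
    "Size": ("Large", "Small", "Small", "Small"),
    "Single molecule per genome": (
        "No", "Yes", "Yes in animals; No in some plants", "Yes"),
    "Pre-mRNA introns": ("Common", "Absent", "Absent", "Absent"),
    "Group I introns": ("Present", "Present", "Present", "Present"),
    "Group II introns": ("Absent", "Present", "Present", "Present"),
    "Polycistronic mRNA": ("Uncommon", "Common", "Present", "Common"),
    "5' cap added to mRNA": ("Yes", "No", "No", "No"),
    "3' poly(A) tail added to mRNA": ("Yes", "No", "Some in animals", "No"),
    "Shine-Dalgarno sequence": ("No", "Yes", "Rare", "Some"),
    "Nonuniversal codons": ("Rare", "Rare", "Yes", "No"),
    "Extended wobble": ("No", "No", "Yes", "No"),
    "Translation inhibited by tetracycline": ("No", "Yes", "Yes", "Yes"),
}

ORGANELLE_COLUMN = {"mitochondrial": 2, "chloroplast": 3}


def tally(organelle):
    """Count rows by which reference genome(s) the organelle entry matches.

    Assumption: exact string equality. Qualified entries ("Most", "Rare",
    "Some in animals") therefore fall under "neither".
    """
    counts = {"both": 0, "eubacterial only": 0,
              "eukaryotic only": 0, "neither": 0}
    column = ORGANELLE_COLUMN[organelle]
    for row in TABLE_20_4.values():
        eukaryotic, eubacterial, value = row[0], row[1], row[column]
        if value == eukaryotic and value == eubacterial:
            counts["both"] += 1
        elif value == eubacterial:
            counts["eubacterial only"] += 1
        elif value == eukaryotic:
            counts["eukaryotic only"] += 1
        else:
            counts["neither"] += 1
    return counts
```

Under this strict matching, the chloroplast column agrees with the eubacterial column alone in 10 of 15 rows and the mitochondrial column in 6 of 15, and neither organelle has a row that matches only the eukaryotic column. The 7 mitochondrial rows counted as "neither" are the qualified or unique entries, which is consistent with the point that organelle genomes are not simply eubacterial.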
The Intergenomic Exchange of Genetic Information
Many proteins found in modern mitochondria and chloroplasts are encoded by nuclear genes, which suggests that much of the original genetic material in the endosymbiont has probably been transferred to the nucleus. This assumption is supported by the observation that some DNA sequences normally found in mtDNA have been detected in nuclear DNA of some strains of yeast and maize. Likewise, chloroplast sequences have been found in the nuclear DNA of spinach. Furthermore, the sequences of nuclear genes that encode organelle proteins are most similar to their eubacterial counterparts. There is also evidence that genetic material has moved from chloroplasts to mitochondria. For example, DNA fragments from the 16S rRNA gene and two tRNA genes that are normally encoded by cpDNA have been found in the mtDNA of maize. Sequences from the gene that encodes the large subunit of RuBisCO, which is normally encoded by cpDNA, are duplicated in maize mtDNA. And there is even evidence that some nuclear genes have moved into mitochondrial genomes. The exchange of genetic material between the nuclear, mitochondrial, and chloroplast genomes has given rise to the term “promiscuous DNA” to describe this phenomenon. The mechanism by which this exchange takes place is not entirely clear.
Mitochondrial DNA and Aging in Humans
Symptoms of many human genetic diseases caused by defects in mtDNA first appear in middle age or later and increase in severity as people age. One hypothesis to explain the late onset and progressive worsening of mitochondrial diseases is related to the decline in oxidative phosphorylation with aging. Oxidative phosphorylation is the process that generates ATP, the primary carrier of energy in the cell. This process takes place on the inner membrane of the mitochondrion and requires a number of different proteins, some encoded by mtDNA and others encoded by nuclear genes.
Oxidative phosphorylation normally declines with age and, if it falls below some critical threshold, tissues do not make enough ATP to sustain vital functions and disease symptoms appear. Most people start life with an excess capacity for oxidative phosphorylation; this capacity decreases with age, but most people reach old age or die before the critical threshold is passed. Persons born with mitochondrial diseases carry mutations in their mtDNA that lower their oxidative phosphorylation capacity. At birth, their capacity may be sufficient to support their ATP needs but, as their oxidative phosphorylation capacity declines with age, they cross the critical threshold and begin to experience symptoms. These symptoms usually first appear in tissues that are most critically dependent on mitochondrial energy: the central nervous system, heart and skeletal muscle, pancreatic islets, kidneys, and the liver. Why does oxidative phosphorylation capacity decline with age? One possible explanation is that damage to mtDNA accumulates with age; deletions and base substitutions in mtDNA increase with age. For example, a common 5000-bp deletion in mtDNA is absent in normal heart muscle cells before the age of 40, but afterward this deletion is present with increasing frequency. The same deletion is found at a low frequency in normal brain tissue before age 75 but is found in 11% to 12% of mtDNAs in the basal ganglia by age 80. People with mtDNA genetic diseases may age prematurely because they begin life with damaged mtDNA. The mechanism of age-related increases in mtDNA damage is not yet known. Oxygen radicals, highly reactive compounds that are natural by-products of oxidative phosphorylation, are known to damage DNA (see p. 000 in Chapter 17). Because mtDNA is physically close to the enzymes taking part in oxidative phosphorylation, it may be more prone to oxidative damage than nuclear DNA. When mtDNA has been damaged, the cell’s capacity to produce ATP drops. 
To produce sufficient ATP to meet the cell’s energy needs, even more oxidative phosphorylation must occur, which in turn may stimulate further production of oxygen radicals, leading to a vicious cycle. Significantly elevated levels of mtDNA defects have been observed in some patients with late-onset degenerative diseases, such as diabetes mellitus, ischemic heart disease, Parkinson disease, Alzheimer disease, and Huntington disease. All of these diseases appear in middle to old age and have symptoms associated with tissues that critically depend on oxidative phosphorylation for ATP production. However, because Huntington disease and some cases of Alzheimer disease are inherited as autosomal dominant conditions, mtDNA defects cannot be the primary cause of these diseases, although they may contribute to their progression.
Connecting Concepts Across Chapters
This chapter is about the unique properties of organelle DNA, which is part of the cytoplasm and usually exhibits uniparental inheritance. A unifying theme has been that mitochondria and chloroplasts evolved from free-living eubacteria that entered into an endosymbiotic relation with the eukaryotic cells in which they are found. Endosymbiosis helps to explain many of the characteristics of mitochondrial DNA and chloroplast DNA, which resemble eubacterial DNA more than they do nuclear eukaryotic DNA. However, not all aspects of mtDNA and cpDNA are similar to eubacterial DNA; organelle DNA has a number of properties that are unique. Another prominent theme that runs through this chapter is that cpDNA and mtDNA display a bewildering diversity of variation in size and organization. The reason for this variation is unknown, but the variation makes summarizing mitochondrial and chloroplast genomes difficult. Traits encoded by mitochondrial and chloroplast genes are inherited in a very different manner from those encoded by nuclear genes.
Because organelle DNA is located in the cytoplasm, the traits that it encodes exhibit cytoplasmic inheritance and are typically inherited from a single parent, most often the mother. Many traits encoded by mtDNA and cpDNA exhibit phenotypic variation among progeny of a single cross and even among cells and tissues within an individual organism; the latter occurs when there are two or more genetic variants in a single cell and random segregation of the organelles in cell division produces cells with different proportions of the two types of DNA. Understanding the inheritance of mitochondrial- and chloroplast-encoded traits builds on earlier discussions of uniparental inheritance in Chapter 5 and biparental inheritance (with which it is contrasted) in Chapter 3. Material in the present chapter is closely linked to information on DNA structure and organization found in Chapters 10 and 11 and to discussions of replication, transcription, RNA processing, and translation found in Chapters 12 through 15. Molecular techniques described in this chapter are covered more thoroughly in Chapter 18. The use of mtDNA in evolutionary studies will be discussed in more detail in Chapter 23.
CONCEPTS SUMMARY
• Mitochondria and chloroplasts are eukaryotic organelles that possess their own DNA. Traits encoded by mtDNA and cpDNA exhibit cytoplasmic inheritance and usually are inherited from a single parent, most often the mother. Random segregation of organelles in cell division may produce phenotypic variation among cells within a single individual and among the offspring of a single female.
• The endosymbiotic theory proposes that mitochondria and chloroplasts originated as free-living prokaryotic (specifically eubacterial) organisms that entered into a beneficial association with eukaryotic cells. Similarities in the gene sequences of organelle and eubacterial DNA support a eubacterial origin for mitochondrial and chloroplast DNA.
• The mitochondrial genome usually consists of a single, circular DNA molecule that lacks histone proteins, although plants may have multiple circular molecules. Mitochondrial DNA varies in size among different groups of organisms; most of this variation is due to noncoding DNA. Each cell contains many copies of mtDNA.
• The organization of genes in the mitochondrial genome differs among organisms. Ancestral mitochondrial genomes typically have characteristics of eubacterial genomes, including eubacterial-like ribosomes, a complete or almost complete set of tRNA genes, few introns, little noncoding DNA between genes, genes organized into eubacterial-like clusters, and the use of only universal codons. Derived mitochondrial genomes are smaller and contain fewer genes. Their rRNA genes and ribosomes differ from those found in eubacteria, and they use some nonuniversal codons.
• Human mtDNA is highly economical, with few noncoding nucleotides. Fungal and plant mtDNAs contain much noncoding DNA between genes, introns within genes, and extensive 5′ and 3′ untranslated regions. Most plant mitochondrial genomes contain one or more large direct repeats, which may recombine to produce smaller or larger DNA molecules.
• Mitochondrial DNA is synthesized throughout the cell cycle, and its synthesis is not coordinated with the replication of nuclear DNA.
• The transcription of mitochondrial genes varies among different organisms. Messenger RNAs produced by the transcription of mtDNA are not capped at their 5′ ends; poly(A) tails are added to the 3′ ends of some animal mRNAs, but these tails are different from the poly(A) tails found on nuclear-encoded mRNAs.
• Antibiotics that inhibit eubacterial ribosomes also inhibit mitochondrial ribosomes. Protein synthesis in mitochondria is initiated at AUG start codons by N-formylmethionine and employs eubacterial-like elongation factors.
Many mitochondrial genomes encode a limited number of tRNAs, with relaxed codon–anticodon pairing rules and extended wobble.
• Comparisons of mtDNA sequences suggest that mitochondria evolved from a eubacterial ancestor. Vertebrate mtDNA exhibits rapid change in sequence but little change in gene content and organization. Plant mtDNA exhibits little change in sequence but much variation in gene content and organization.
• Chloroplast genomes consist of a single, circular DNA molecule that varies little in size and lacks histone proteins. Each plant cell contains multiple copies of cpDNA.
• Most chloroplast chromosomes possess large inverted repeats; some chloroplast genes contain introns.
• Transcription and translation are similar in chloroplasts and eubacteria: most chloroplast genes are transcribed as polycistronic units, their mRNAs are not capped, no poly(A) tails are added, and they possess a Shine-Dalgarno ribosome-binding sequence.
• Chloroplast DNA sequences are most similar to those in cyanobacteria. Chloroplast DNA sequences tend to evolve slowly.
• Through evolutionary time, many mitochondrial and chloroplast genes have moved to nuclear chromosomes. In some plants, there is evidence that copies of chloroplast genes have moved to the mitochondrial genome.
IMPORTANT TERMS
mitochondrial DNA (mtDNA)
chloroplast DNA (cpDNA)
heteroplasmy
replicative segregation
homoplasmy
endosymbiotic theory
D loop
Worked Problems
1. A physician examines a young man who has a progressive muscle disorder and visual abnormalities. A number of the patient’s relatives have the same condition, as shown in the adjoining pedigree. The degree of expression of the trait is highly variable among members of the family: some are only slightly affected, whereas others developed severe symptoms at an early age. The physician concludes that this disorder is due to a mutation in the mitochondrial genome.
Do you agree with the physician’s conclusion? Why or why not? Could the disorder be due to a mutation in a nuclear gene? Explain your reasoning.
[Pedigree: generations I (1–2), II (1–5), III (1–11), and IV (1–5).]
• Solution
The conclusion that the disorder is caused by a mutation in the mitochondrial genome is supported by the pedigree and the observation of variable expression in affected members of the same family. The disorder is passed only from affected mothers to offspring; when fathers are affected, none of their children have the trait (as seen in the children of II-2 and III-6). This outcome is expected of traits determined by mutations in mtDNA, because mitochondria are in the cytoplasm and usually inherited only from a single (in humans, the maternal) parent. The facts that some offspring of affected mothers do not show the trait (III-9 and IV-5) and that expression varies from one person to another suggest that affected persons are heteroplasmic, with both mutant and wild-type mitochondria. Random segregation of mitochondria in meiosis may produce gametes having different proportions of mutant and wild-type sequences, resulting in different degrees of phenotypic expression among the offspring. Most likely, symptoms of the disorder develop when some minimum proportion of the mitochondria are mutant. Just by chance, some of the gametes produced by an affected mother contain few mutant mitochondria and result in offspring that lack the disorder.
Another possible explanation for the disorder is that it results from an autosomal dominant gene. When an affected (heterozygous) person mates with an unaffected (homozygous) person, about half of the offspring are expected to have the trait, but just by chance some affected parents will have no affected offspring. It is possible that individuals II-2 and III-6 in the pedigree just happened to be male and their sex is unrelated to the mode of transmission.
The variable expression could be explained by variable expressivity (see p. 000 in Chapter 3).
2. Suppose that a new organelle is discovered in an obscure group of protists. This organelle contains a small DNA genome and some scientists are arguing that, like chloroplasts and mitochondria, this organelle originated as a free-living eubacterium that entered into an endosymbiotic relation with the protist. Outline a research plan to determine if the new organelle evolved from a free-living eubacterium. What kinds of data would you collect and what predictions would you make if the theory is correct?
• Solution
We could examine the structure, organization, and sequences of the organelle genome. If the organelle shows only characteristics of eukaryotic DNA, then it most likely has a eukaryotic origin but, if it displays some characteristics of eubacterial DNA, then this finding supports the theory of a eubacterial origin. However, on the basis of our knowledge of mitochondrial and chloroplast genomes, we should not expect the organelle genome to be entirely eubacterial in its characteristics. We could start by examining the overall characteristics of the organelle DNA. If it has a eubacterial origin, we might expect that the organelle genome will consist of a circular molecule and will lack histone proteins. We might then sequence the organelle DNA to determine its gene content and organization. The presence of any group II introns would suggest a eubacterial origin, because these introns have been found only in eubacterial genomes and genomes derived from eubacteria. The presence of any pre-mRNA introns, on the other hand, would suggest a eukaryotic origin, because these introns have been found only in nuclear eukaryotic genomes. If the organelle genome has a eubacterial origin, we might expect to see polycistronic mRNA, the absence of a 5′ cap, and inhibition of translation by those antibiotics that typically inhibit eubacterial translation.
Finally, we could compare the DNA sequences found in the organelle genome with homologous sequences from eubacteria and eukaryotic genomes. If the theory of an endosymbiotic origin is correct, then the organelle sequences should be most similar to homologous sequences found in eubacteria.
The New Genetics
MINING GENOMES
EVOLUTIONARY ANALYSIS WITH THE USE OF MITOCHONDRIAL GENOMES
Phylogenetic trees are graphic representations of the relationships between different organisms. Traditionally based on morphological, physiological, and behavioral data, evolutionary analysis has been substantially changed by the use of molecular information. In this exercise, you will pose and evaluate a question relating to human evolution. You will use mitochondrial DNA sequences and the tools available at the Biology Workbench, managed by the San Diego Supercomputer Center at the University of California, San Diego.
COMPREHENSION QUESTIONS
*1. Briefly describe the general structures of mtDNA and cpDNA. How are they similar? How do they differ? How do their structures compare with the structures of eubacterial and eukaryotic (nuclear) DNA?
2. Explain why many traits encoded by mtDNA and cpDNA exhibit considerable variation in their expression, even among members of the same family.
*3. What is the endosymbiotic theory? How does it help to explain some of the characteristics of mitochondria and chloroplasts?
4. What evidence supports the endosymbiotic theory?
5. How are genes organized in the mitochondrial genome? How does this organization differ between ancestral and derived mitochondrial genomes?
*6. What are nonuniversal codons? Where are they found?
7. How does replication of mtDNA differ from replication of nuclear DNA in eukaryotic cells?
*8. The human mitochondrial genome encodes only 22 tRNAs, whereas at least 32 tRNAs are required for cytoplasmic translation. Why are fewer tRNAs needed in mitochondria?
9.
What are some possible explanations for an accelerated rate of evolution in the sequences of vertebrate mtDNA?
*10. Briefly describe the organization of genes on the chloroplast genome.
11. What is meant by the term “promiscuous DNA”?
APPLICATION QUESTIONS AND PROBLEMS
12. A wheat plant that is light green in color is found growing in a field. Biochemical analysis reveals that chloroplasts in this plant produce only 50% of the chlorophyll normally found in wheat chloroplasts. Propose a set of crosses to determine whether the light-green phenotype is caused by a mutation in a nuclear gene or a chloroplast gene.
*13. A rare neurological disease is found in the family illustrated in the following pedigree. What is the most likely mode of inheritance for this disorder? Explain your reasoning.
[Pedigree: generations I (1–2), II (1–8), III (1–11), and IV (1–8).]
14. In a particular strain of Neurospora, a poky mutation exhibits biparental inheritance, whereas poky mutations in other strains are inherited only from the maternal parent. Explain these results.
15. Antibiotics such as chloramphenicol, tetracycline, and erythromycin inhibit protein synthesis in eubacteria but have no effect on protein synthesis encoded by nuclear genes. Cycloheximide inhibits protein synthesis encoded by nuclear genes but has no effect on eubacterial protein synthesis. How might these compounds be used to determine which proteins are encoded by the mitochondrial and chloroplast genomes?
16. A scientist collects cells at various points in the cell cycle and isolates DNA from them. Using density gradient centrifugation, she separates the nuclear and mtDNA. She then measures the amount of mtDNA and nuclear DNA present at different points in the cell cycle. On the following graph, draw a line to represent the relative amounts of nuclear DNA that you expect her to find per cell throughout the cell cycle. Then, draw a dotted line on the same graph to indicate the relative amount of mtDNA that you would expect to see at different points throughout the cell cycle.
[Graph: y-axis, Relative amount of DNA (1.0, 1.5, 2.0); x-axis, Cell cycle (G1, S, G2, Mitosis, Cytokinesis).]
17. The introduction to Chapter 1 described how bones found in 1979 outside Ekaterinburg, Russia, were shown to be those of Tsar Nicholas and his family, who were executed in 1918 by a Bolshevik firing squad in the Russian Revolution. To prove that the skeletons were those of the royal family, mtDNA was extracted from the bone samples, amplified by PCR, and compared with mtDNA from living relatives of the tsar’s family. Why was DNA from the mitochondria analyzed instead of nuclear DNA? What are some of the advantages of using mtDNA for this type of study?
18. From Figure 20.8, determine as best you can the percentage of human mtDNA that is coding (transcribed into RNA) and the percentage that is noncoding (not transcribed).
CHALLENGE QUESTIONS
19. Mitochondrial DNA sequences have been detected in the nuclear genomes of many organisms, and cpDNA sequences are sometimes found in the mitochondrial genome. Propose a mechanism for how such “promiscuous DNA” might move between nuclear, mitochondrial, and chloroplast genomes.
20. Steven A. Frank and Laurence D. Hurst argued that a cytoplasmically inherited mutation in humans that has severe effects in males but no effect in females will not be eliminated from a population by natural selection, because only females pass on mtDNA. Using this argument, explain why males with Leber hereditary optic neuropathy are more severely affected than females.
21. Several families have been described that exhibit vision problems, muscle weakness, and deafness. This disorder is inherited as an autosomal dominant trait and the disease-causing gene has been mapped to chromosome 10 in the nucleus. Analysis of the mtDNA from affected persons in these families reveals that large numbers of their mitochondrial genomes possess deletions of varying length.
Different members of the same family and even different mitochondria from the same person possess deletions of different sizes; so the underlying defect appears to be a tendency for the mtDNA of affected persons to have deletions. Propose an explanation for how a mutation in a nuclear gene might lead to deletions in mtDNA.
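Replicative segregation, described in this chapter's summary and used in the solution to Worked Problem 1, can be illustrated with a small simulation. The sketch below is a deliberately simple model and not a description of real organelle dynamics: it assumes a fixed number of organelles per cell, assumes that each organelle in a daughter cell is drawn independently with probability equal to the parent's mutant fraction, and ignores selection and mutation. The function names and parameter values are hypothetical.

```python
import random


def segregate(mutant_fraction, organelles_per_cell, rng):
    """Mutant fraction in one daughter cell after random segregation.

    Assumption: each organelle is mutant with probability equal to the
    parent cell's mutant fraction (binomial sampling).
    """
    mutant = sum(
        1 for _ in range(organelles_per_cell)
        if rng.random() < mutant_fraction
    )
    return mutant / organelles_per_cell


def daughters(mutant_fraction, organelles_per_cell, count, seed=0):
    """Mutant fractions among several daughters of one heteroplasmic cell."""
    rng = random.Random(seed)
    return [segregate(mutant_fraction, organelles_per_cell, rng)
            for _ in range(count)]


def simulate_lineage(start_fraction, organelles_per_cell, divisions, seed=0):
    """Follow one cell line; return the mutant fraction after each division."""
    rng = random.Random(seed)
    fractions = [start_fraction]
    for _ in range(divisions):
        fractions.append(
            segregate(fractions[-1], organelles_per_cell, rng))
    return fractions
```

Calling `daughters(0.5, 20, 50)` gives a spread of mutant fractions among the daughters of a single heteroplasmic cell, which is the kind of variation the pedigree solution attributes to chance. In this model a lineage that reaches a fraction of 0.0 or 1.0 stays there, which corresponds to homoplasmy.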
21 Advanced Topics in Genetics: Developmental Genetics, Immunogenetics, and Cancer Genetics

• Flies with Extra Eyes
• Developmental Genetics
  Cloning Experiments
  The Genetics of Pattern Formation in Drosophila
  Homeobox Genes in Other Organisms
  Programmed Cell Death in Development
  Evo-Devo: The Study of Evolution and Development
• Immunogenetics
  The Organization of the Immune System
  Immunoglobulin Structure
  The Generation of Antibody Diversity
  T-Cell-Receptor Diversity
  Major Histocompatibility Complex Genes
  Genes and Organ Transplants
• Cancer Genetics
  The Nature of Cancer
  Cancer As a Genetic Disease
  Genes That Contribute to Cancer
  The Molecular Genetics of Colorectal Cancer

Flies have been genetically engineered to have extra eyes on their legs, wings, and elsewhere. Through genetic engineering, the eyeless gene can be expressed in cells of body parts where eyes do not normally appear.

Flies with Extra Eyes

We can all imagine situations where an extra set of eyes might come in handy: eyeing members of the opposite sex while still paying attention to the professor during lecture, looking both ways at the same time before crossing the street, or watching your backside in a barroom brawl. However useful extra eyes might be, creating them at selected locations is no simple matter. An eye is, after all, an exceedingly complex structure, consisting of photoreceptors, lens, nerves, and other tissues. It would be very unlikely for all of these structures to develop at a site where eyes don’t normally exist. Nevertheless, in 1995, a group of geneticists succeeded in genetically engineering fruit flies with extra eyes on their wings, legs, and antennae. How was this amazing feat accomplished?

The story of creating flies with extra eyes began in 1915, when Mildred Hoge discovered a mutant fruit fly with small eyes due to a recessive mutation in a gene called eyeless.
The product of the normal allele of the eyeless locus is required for proper development of the fruit-fly eye. In 1993, Walter Gehring and his collaborators were investigating Drosophila genes that encode transcription factors (see Chapter 13). One of these genes mapped to the same location as that of the eyeless gene and, in fact, turned out to be the eyeless gene. To see what effect eyeless might have on development, Gehring’s group genetically engineered cells that expressed the eyeless gene in parts of the fly where the gene is not normally expressed. When these flies hatched, they had huge eyes on their wings, antennae, and legs. These structures were not just tissue that resembled eyes; they were complete eyes with a cornea, cone cells, and photoreceptors that responded to light, although the flies could not use them to see, because they were not connected to the nervous system. The eyeless gene appears to be one of the long-sought master control switches of development: its protein activates a set of other genes that are responsible for making a complete eye.

The eyeless gene has counterparts in mice and humans that affect the development of mammalian eyes. There is a striking similarity between the eyeless gene of Drosophila and the Small eye gene that exists in mice. In mice, a mutation in one copy of Small eye causes small eyes; a mouse that is homozygous for the Small eye mutation has no eyes. There is also a similarity between the eyeless gene in Drosophila and the Aniridia gene in humans; a mutation in Aniridia produces a severely malformed human eye. Similarities in the sequences of eyeless, Small eye, and Aniridia suggest that all three genes evolved from a common ancestral sequence. This possibility is surprising, because the eyes of insects and mammals were thought to have evolved independently.
Similarities among eyeless, Small eye, and Aniridia suggest that a common pathway underlies eye development in flies, mice, and humans.

This chapter focuses on three specialized topics in genetics: developmental genetics, immunogenetics, and cancer genetics. We begin with a discussion of the genetic control of the early development of Drosophila embryos, one of the best-understood developmental systems. We then turn to the genetics of the immune system in vertebrates. This system is capable of generating proteins that recognize virtually any foreign substance in the body. The generation of this huge diversity of proteins relies on a special type of genetic recombination unique to the immune system. Last, we consider the genetic basis of cancer and how mutations in particular types of genes contribute to the growth of tumors in humans.

Developmental Genetics

Every multicellular organism begins life as a unicellular, fertilized egg. This single-celled zygote undergoes repeated cell divisions, eventually producing millions or trillions of cells that constitute a complete adult organism. Initially, each cell in the embryo is totipotent: it has the potential to develop into any cell type. Many cells in plants and fungi remain totipotent, but animal cells usually become committed to developing into specific types of cells after just a few early embryonic divisions. This commitment often comes well before a cell begins to exhibit any characteristics of a particular cell type; once the cell becomes committed, it cannot reverse its fate and develop into a different cell type. A cell becomes committed by a process called determination, the mechanism of which is still unknown.

For many years, the work of developmental biologists was limited to describing the changes that take place in the course of development, because techniques for probing the intracellular processes behind these changes were unavailable. But, in recent years, powerful genetic and molecular techniques have had a tremendous influence on the study of development. In a few model systems such as Drosophila, the molecular mechanisms underlying developmental change are now beginning to be understood.

Cloning Experiments

If all cells in a multicellular organism are derived from the same original cell, how do different cell types arise? One possibility is that, throughout development, genes might be selectively lost or altered, causing different cell types to have different genomes. Alternatively, each cell might contain the same genetic information, but different genes might be expressed in each cell type. Early cloning experiments helped to answer this question.

In the 1950s, Frederick Steward developed methods for cloning plants. He disrupted phloem tissue from the root of a carrot, separating and isolating individual cells. He then placed individual cells in a sterile medium that contained nutrients. Steward was successful in getting the cells to grow and divide, and eventually he obtained whole edible carrots from single cells (Figure 21.1). Because all parts of the plant were regenerated from a specialized phloem cell, Steward concluded that each phloem cell contained the genetic potential for a whole plant; none of the original genetic material was lost during determination.

Figure 21.1 Many plants can be cloned from isolated single cells. Thus none of the original genetic material is lost during development. Steps shown: (1) Phloem tissue from the carrot is disrupted, (2) and single cells are isolated. (3) A single cell is placed in a nutritive medium that contains growth hormones, (4) and eventually gives rise to a complete carrot plant.
The results of later studies demonstrated that most animal cells also retain a complete set of genetic information during development. In 1952, Robert Briggs and Thomas King removed the nuclei from unfertilized oocytes of the frog Rana pipiens. They then isolated nuclei from frog blastulas (an early embryonic stage) and injected these nuclei individually into the oocytes. The eggs were then pricked with a needle to stimulate them to divide. Although most were damaged in the process, a few eggs developed into complete tadpoles that eventually metamorphosed into frogs.

In the late 1960s, John Gurdon used these methods to successfully clone a few frogs with nuclei isolated from intestinal cells of tadpoles. This accomplishment suggested that the differentiated intestinal cells carried the genetic information necessary to encode traits found in all other cells. However, Gurdon’s successful clonings may have resulted from the presence of a few undifferentiated stem cells in the intestinal tissue, which were inadvertently used as the nuclei donors.

In 1997, researchers at the Roslin Institute of Scotland announced that they had successfully cloned a sheep by using the genetic material from a differentiated cell of an adult animal. To perform this experiment, they fused an udder cell from a white-faced Finn Dorset ewe with an enucleated egg cell and stimulated the egg electrically to initiate development. After growing it in the laboratory for a week, they implanted the embryo into a Scottish black-faced surrogate mother. Dolly, the first mammal cloned from an adult cell, was born on July 5, 1996 (Figure 21.2). Since the cloning of Dolly, other sheep, mice, and calves have been cloned from differentiated adult cells.

Figure 21.2 In 1996, researchers at the Roslin Institute of Scotland successfully cloned a sheep named Dolly. They used the genetic material from a differentiated cell of an adult animal.
These cloning experiments demonstrated that genetic material is not lost or permanently altered during development; development must therefore require the selective expression of genes. But how do cells regulate their gene expression in a coordinated manner to give rise to a complex, multicellular organism? Research has now begun to provide some answers to this important question.

Concepts
The ability to clone plants and animals from single specialized cells demonstrates that genes are not lost or permanently altered during development.

www.whfreeman.com/pierce Information about cloning, nuclear transfer research, and the ethics of cloning

The Genetics of Pattern Formation in Drosophila

One of the best-studied systems for the genetic control of pattern formation is the early embryonic development of Drosophila melanogaster. Geneticists have isolated a large number of mutations in fruit flies that influence all aspects of their development, and these mutations have been subjected to molecular analysis, providing much information about how genes control early development in Drosophila.

The development of the fruit fly

An adult fruit fly possesses three basic body parts: head, thorax, and abdomen (Figure 21.3). The thorax consists of three segments: the first thoracic segment carries a pair of legs; the second thoracic segment carries a pair of legs and a pair of wings; and the third thoracic segment carries a pair of legs and the halteres (rudiments of the second pair of wings found in most other insects). The abdomen contains nine segments.

When a Drosophila egg has been fertilized, its diploid nucleus (Figure 21.4a) immediately divides nine times without division of the cytoplasm, creating a single, multinucleate cell (Figure 21.4b). These nuclei are scattered throughout the cytoplasm but later migrate toward the periphery of the embryo and divide several more times (Figure 21.4c).
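As a quick check on the arithmetic of these nuclear divisions, the short Python sketch below counts the nuclei in the syncytium. It rests on an idealized assumption that every nucleus divides at every cycle, and the function name is hypothetical.

```python
def nuclei_after(divisions: int, start: int = 1) -> int:
    """Count nuclei after a number of synchronous divisions.

    Idealized assumption: every nucleus divides once per cycle,
    so the count doubles each time.
    """
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    return start * 2 ** divisions


# The zygote nucleus divides nine times without division of the cytoplasm.
print(nuclei_after(9))
```

Under this assumption, nine rounds of division turn the single zygote nucleus into 512 nuclei sharing one cytoplasm.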
Next, the cell membrane grows inward and around each nucleus, creating a layer of approximately 6000 cells at the outer surface of the embryo (Figure 21.4d). Four nuclei at one end of the embryo develop into pole cells, which eventually give rise to germ cells. The early embryo then undergoes further development in three distinct stages: (1) the anterior–posterior axis and the dorsal–ventral axis of the embryo are established (Figure 21.5a); (2) the number and orientation of the body segments are determined (Figure 21.5b); and (3) the identity of each individual segment is established (Figure 21.5c). Different sets of genes control each of these three stages (Table 21.1).

Figure 21.3 The fruit fly, Drosophila melanogaster, passes through three larval stages and a pupa before developing into an adult fly. The three major body parts of the adult are head, thorax, and abdomen. Steps shown: (1) A Drosophila egg hatches and develops into a hollow cylinder of cells. (2) Within a few hours, segmentation appears. (3) The embryo develops into a larva that passes through three stages (4) before becoming a pupa. The pupa undergoes metamorphosis (5) and an adult emerges. Labeled parts of the adult: eye, antennae, head, thorax, abdomen, wing, haltere, leg.

Egg-polarity genes

The egg-polarity genes play a crucial role in establishing the two main axes of development in fruit flies. You can think of these axes as the longitude and latitude of development: any location in the Drosophila embryo can be defined in relation to these two axes. There are two sets of egg-polarity genes: one set determines the anterior–posterior axis and the other determines the dorsal–ventral axis. These genes work by setting up concentration gradients of morphogens within the developing embryo.
A morphogen is a protein whose concentration gradient affects the developmental fate of the surrounding region. The egg-polarity genes are transcribed into mRNAs during egg formation in the maternal parent, and these mRNAs become incorporated into the cytoplasm of the egg. After fertilization, the mRNAs are translated into proteins that play an important role in determining the anterior – posterior and dorsal – ventral axes of the embryo. Because the mRNAs of the polarity genes are produced by the female parent and influence the phenotype of their offspring, the traits encoded by them are examples of genetic maternal effects (see p. 000 in Chapter 4). Egg-polarity genes function by producing proteins that become asymmetrically distributed in the cytoplasm, giving the egg polarity, or direction. This asymmetrical distribution may take place in a couple of ways. The mRNA may be localized to particular regions of the egg cell, leading to an abundance of the protein in those regions when the mRNA is translated. Alternatively, the mRNA may be randomly distributed, but the protein that it encodes may become asymmetrically distributed, either by a transport system that delivers it to particular regions of the cell or by its removal from particular regions by selective degradation. Determination of the dorsal – ventral axis The dorsal – ventral axis defines the back (dorsum) and belly (ventrum) of a fly (see Figure 21.5). At least 12 different genes determine this axis, one of the most important being a gene called dorsal. The dorsal gene is transcribed and translated in the maternal ovary, and the resulting mRNA and protein are transferred to the egg during oogenesis. In a newly laid egg, mRNA and protein encoded by the dorsal gene are uniformly distributed throughout the cytoplasm but, after the nuclei migrate to the periphery of the embryo (see Figure 21.4c), Dorsal protein becomes redistributed. 
Along one side of the embryo, Dorsal protein remains in the cytoplasm; this side will become the dorsal surface. Along the other side, Dorsal protein is taken up into the nuclei; this side will become the ventral surface. At this point, there is a smooth gradient of increasing nuclear Dorsal concentration from the dorsal to the ventral side (Figure 21.6).

Figure 21.4 Early development of a Drosophila embryo. (a) Single-celled diploid zygote: (1) Sperm and egg nuclei fuse to create a single-celled diploid zygote with a single 2n nucleus. (b) Multinucleate syncytium: (2) Multiple nuclear divisions create a single multinucleate cell, the syncytium. (c) Syncytial blastoderm: (3) The nuclei migrate to the periphery of the embryo and divide several more times, creating the syncytial blastoderm. (d) Cellular blastoderm: (4) The cell membrane grows around each nucleus, producing a layer of cells that surrounds the embryo. The resulting structure is the cellular blastoderm. (5) Nuclei at one end of the blastoderm develop into pole cells, which become the primordial germ cells.

Figure 21.5 panels: (a) 2-hour embryo: (1) The anterior–posterior and dorsal–ventral axes of the embryo are established. (b) 10-hour embryo: (2) The number and orientation of the body segments (head, thoracic segments, abdominal segments) are established. (c) Adult: (3) The identity of each individual segment is established.

The nuclear uptake of Dorsal protein is thought to be governed by a protein called Cactus, which binds to Dorsal protein and traps it in the cytoplasm. The presence of yet another protein, called Toll, can alter Dorsal, allowing it to dissociate from Cactus and move into the nucleus. Together, Cactus and Toll regulate the nuclear distribution of Dorsal protein, which in turn determines the dorsal–ventral axis of the embryo.
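The logic of this regulation can be sketched in a few lines of Python. This is a toy model and not a measured one: the Toll activity profile, the function names, and all numbers are assumptions chosen only for illustration. Position 0.0 stands for the dorsal midline and 1.0 for the ventral midline. Dorsal protein held by Cactus stays in the cytoplasm, and Toll activity frees a fraction of it to enter the nucleus.

```python
def toll_activity(position: float) -> float:
    """Hypothetical Toll activity profile (an assumption for illustration).

    position runs from 0.0 (dorsal midline) to 1.0 (ventral midline).
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie between 0.0 and 1.0")
    return position ** 2  # arbitrary smooth curve that rises toward the ventral side


def dorsal_distribution(position: float, total_dorsal: float = 100.0,
                        toll_functional: bool = True) -> tuple[float, float]:
    """Return (nuclear, cytoplasmic) Dorsal protein in arbitrary units.

    Total Dorsal is the same everywhere; only its location changes.
    Without functional Toll, Cactus keeps all Dorsal in the cytoplasm.
    """
    released = toll_activity(position) if toll_functional else 0.0
    nuclear = total_dorsal * released
    return nuclear, total_dorsal - nuclear


for pos in (0.0, 0.25, 0.5, 0.75, 1.0):
    nuclear, cytoplasmic = dorsal_distribution(pos)
    print(f"position {pos:.2f}: nuclear {nuclear:6.2f}, cytoplasmic {cytoplasmic:6.2f}")
```

In this sketch the total amount of Dorsal protein is uniform, as in the newly laid egg, yet the nuclear fraction rises smoothly from the dorsal to the ventral side. Setting `toll_functional=False` leaves all Dorsal protein in the cytoplasm at every position.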
Inside the nucleus, Dorsal protein acts as a transcription factor, binding to regulatory sites on the DNA and activating or repressing the expression of other genes (Table 21.2). A high nuclear concentration of Dorsal protein (as on the ventral side of the embryo) activates a gene called twist, which causes mesoderm to develop. A low nuclear concentration of Dorsal protein (as in cells on the dorsal side of the embryo) allows the expression of a gene called decapentaplegic, which Dorsal protein otherwise represses and which specifies dorsal structures. In this way, the ventral and dorsal sides of the embryo are determined.

Figure 21.5 In an early Drosophila embryo, the major body axes are established, the number and orientation of the body segments are determined, and the identity of each individual segment is established. Different sets of genes control each of these three stages.

Table 21.1 Stages in the early development of fruit flies and the genes that control each stage
Developmental Stage | Genes
Establishment of main body axes | Egg-polarity genes
Determination of number and polarity of body segments | Segmentation genes
Establishment of identity of each segment | Homeotic genes

Figure 21.6 Dorsal protein in the nuclei helps to determine the dorsal–ventral axis of the Drosophila embryo. (a) Relative concentrations of Dorsal protein in the cytoplasm and nuclei of cells in the early Drosophila embryo. (b) Micrograph of a cross section of the embryo showing the Dorsal protein, darkly stained, in the nuclei along the ventral surface.

Determination of the anterior–posterior axis

Establishing the anterior–posterior axis of the embryo is a crucial step in early development. We will consider several genes in this pathway (Table 21.3). One important gene is bicoid, which is first transcribed in the ovary of an adult female during oogenesis.
Bicoid mRNA becomes incorporated into the cytoplasm of the egg and, as it passes into the egg, bicoid mRNA becomes anchored to the anterior end of the egg by part of its 3′ end. This anchoring causes bicoid mRNA to become concentrated at the anterior end (Figure 21.7a). (A number of other genes that are active in the ovary are required for proper localization of bicoid mRNA in the egg.) When the egg has been laid, bicoid mRNA is translated into Bicoid protein. Because most of the mRNA is at the anterior end of the egg, Bicoid protein is synthesized there and forms a concentration gradient along the anterior–posterior axis of the embryo, with a high concentration at the anterior end and a low concentration at the posterior end. This gradient is maintained by the continuous synthesis of Bicoid protein and its short half-life.

The high concentration of Bicoid protein at the anterior end induces the development of anterior structures such as the head of the fruit fly. Bicoid, like Dorsal, is a morphogen. It stimulates the development of anterior structures by binding to regulatory sequences in the DNA and influencing the expression of other genes. One of the most important of the genes stimulated by Bicoid protein is hunchback, which is required for the development of the head and thoracic structures of the fruit fly.
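One simple way to picture how synthesis at one end combined with a short half-life yields a stable gradient is a concentration that falls off exponentially with distance from the source. The Python sketch below assumes such a profile. The exponential form, the length constant, the threshold for stimulating hunchback, and the function names are all illustrative assumptions, not values taken from this chapter.

```python
import math


def bicoid_concentration(x: float, c0: float = 1.0,
                         length_constant: float = 0.2) -> float:
    """Relative Bicoid concentration at position x along the embryo.

    x = 0.0 is the anterior end and x = 1.0 the posterior end.
    Assumption: the concentration decays exponentially away from the
    anterior source; a shorter protein half-life would correspond to a
    smaller length_constant and a steeper gradient.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie between 0.0 and 1.0")
    return c0 * math.exp(-x / length_constant)


def hunchback_stimulated(x: float, threshold: float = 0.1) -> bool:
    """True where Bicoid is at or above a hypothetical activation threshold."""
    return bicoid_concentration(x) >= threshold


for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"x = {x:.2f}: Bicoid = {bicoid_concentration(x):.3f}, "
          f"hunchback stimulated = {hunchback_stimulated(x)}")
```

With these made-up parameters, hunchback is stimulated only in the anterior part of the embryo, where the Bicoid concentration is high. Changing the threshold or the length constant moves the boundary, which is the essence of how a morphogen gradient can give positional information.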
Table 21.2 Key genes that control development of the dorsal–ventral axis in fruit flies and their action
Gene | Where Expressed | Action of Gene Product
dorsal | Ovary | Affects expression of genes such as twist and decapentaplegic
cactus | Ovary | Traps Dorsal protein in cytoplasm
toll | Ovary | Alters Dorsal protein, allowing it to dissociate from Cactus protein and move into nuclei of ventral cells
twist | Embryo | Takes part in development of mesodermal tissues
decapentaplegic | Embryo | Takes part in development of gut structures

Table 21.3 Some key genes that determine the anterior–posterior axis in fruit flies
Gene | Where Expressed | Action
bicoid | Ovary | Regulates expression of genes responsible for anterior structures; stimulates hunchback
nanos | Ovary | Regulates expression of genes responsible for posterior structures; inhibits translation of hunchback mRNA
hunchback | Embryo | Regulates transcription of genes responsible for anterior structures

The development of the anterior–posterior axis is also greatly influenced by a gene called nanos, an egg-polarity gene that acts at the posterior end of the axis. The nanos gene is transcribed in the adult female, and the resulting mRNA becomes localized at the posterior end of the egg (Figure 21.7b). After fertilization, nanos mRNA is translated into Nanos protein, which diffuses slowly toward the anterior end. The Nanos protein gradient is opposite that of Bicoid protein: Nanos is most concentrated at the posterior end of the embryo and is least concentrated at the anterior end. Nanos protein inhibits the formation of anterior structures by repressing the translation of hunchback mRNA. The synthesis of the Hunchback protein is therefore stimulated at the anterior end of the embryo by Bicoid protein and is repressed at the posterior end by Nanos protein.
This combined stimulation and repression results in a Hunchback protein concentration gradient along the anterior–posterior axis that, in turn, affects the expression of other genes and helps determine the anterior and posterior structures.

Concepts
The major axes of development in early fruit-fly embryos are established as a result of initial differences in the distribution of specific mRNAs and proteins encoded by genes in the female parent (genetic maternal effect). These differences in distribution establish concentration gradients of morphogens, which cause different genes to be activated in different parts of the embryo.

Figure 21.7 The anterior–posterior axis in a Drosophila embryo is determined by concentrations of Bicoid and Nanos proteins.

Segmentation genes

Like all insects, the fruit fly has a segmented body plan. When the basic dorsal–ventral and anterior–posterior axes of the fruit-fly embryo have been established, segmentation genes control the differentiation of the embryo into individual segments. These genes affect the number and organization of the segments, and mutations in them usually disrupt whole sets of segments. The approximately 25 segmentation genes in Drosophila are transcribed after fertilization; so they don’t exhibit a genetic maternal effect, and their expression is regulated by the Bicoid and Nanos protein gradients.

The segmentation genes fall into three groups, as shown in Table 21.4 and Figure 21.8. Gap genes define large sections of the embryo; mutations in these genes eliminate whole groups of adjacent segments. Mutations in the Krüppel gene, for example, cause the absence of several adjacent segments. Pair-rule genes define regional sections of the embryo and affect alternate segments. Mutations in the even-skipped gene cause the deletion of even-numbered segments, whereas mutations in the fushi tarazu gene cause the absence of odd-numbered segments. Segment-polarity genes affect the organization of segments. Mutations in these genes cause part of each segment to be deleted and replaced by a mirror image of part or all of an adjacent segment. For example, mutations in the gooseberry gene cause the posterior half of each segment to be replaced by a mirror image of the anterior half of an adjacent segment.

The gap genes, pair-rule genes, and segment-polarity genes act sequentially, affecting progressively smaller regions of the embryo.

Table 21.4 Segmentation genes and the effects of mutations in them
Class of Gene | Effect of Mutations | Examples of Genes
Gap genes | Delete adjacent segments | hunchback, Krüppel, knirps, giant, tailless
Pair-rule genes | Delete same part of pattern in every other segment | runt, hairy, fushi tarazu, even-skipped, odd-paired, paired, sloppy-paired, odd-skipped
Segment-polarity genes | Affect polarity of segment; part of segment replaced by mirror image of part of another segment | engrailed, wingless, gooseberry, cubitus interruptus, patched, hedgehog, disheveled, costal-2, fused

Figure 21.8 Segmentation genes control the differentiation of the Drosophila embryo into individual segments. The gap genes affect large sections of the embryo. The pair-rule genes affect alternate segments. The segment-polarity genes affect the polarity of segments. Panels compare a normal larva (head, thoracic segments, abdominal segments) with mutant larvae: (a) Gap genes: (1) Mutation of Krüppel causes the elimination of anterior segments. (b) Pair-rule genes: (2) Mutation of even-skipped causes the deletion of even-numbered segments. (c) Segment-polarity genes: (3) Mutation of gooseberry causes the posterior half of each segment to be replaced by a mirror image of the anterior half of an adjacent segment.
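The pair-rule phenotypes described for even-skipped and fushi tarazu can be mimicked with a simple list operation. The Python sketch below numbers the segments 1 through 14 purely for illustration; the numbering scheme, the names, and the block range used in the gap-gene example are assumptions, not the labeling used in the figure.

```python
# Hypothetical numbering of body segments, used only for this illustration.
SEGMENTS = list(range(1, 15))

# Which segments are missing in each pair-rule mutant, as described in the text.
PAIR_RULE_DELETIONS = {
    "even-skipped": lambda n: n % 2 == 0,   # even-numbered segments deleted
    "fushi tarazu": lambda n: n % 2 == 1,   # odd-numbered segments deleted
}


def pair_rule_mutant(gene: str, segments: list[int] = SEGMENTS) -> list[int]:
    """Return the segments that remain in a pair-rule mutant."""
    is_deleted = PAIR_RULE_DELETIONS[gene]
    return [n for n in segments if not is_deleted(n)]


def gap_mutant(first: int, last: int, segments: list[int] = SEGMENTS) -> list[int]:
    """Return the segments that remain when a contiguous block is deleted.

    The block boundaries are supplied by the caller; the text says only that
    gap mutations remove several adjacent segments.
    """
    return [n for n in segments if not first <= n <= last]


print(pair_rule_mutant("even-skipped"))
print(pair_rule_mutant("fushi tarazu"))
print(gap_mutant(4, 8))  # arbitrary block chosen for illustration
```

The contrast between the two functions reflects the contrast between the gene classes: a gap mutation removes one continuous stretch of the body plan, whereas a pair-rule mutation removes every other unit along its whole length.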
First, the egg-polarity genes activate or repress the gap genes, which divide the embryo into broad regions. The gap genes, in turn, regulate the pair-rule genes, which affect the development of pairs of segments. Finally, the pair-rule genes influence the segment-polarity genes, which guide the development of individual segments.

Concepts
When the major axes of the fruit-fly embryo have been established, segmentation genes determine the number, orientation, and basic organization of the body segments.

Homeotic genes

After the segmentation genes have established the number and orientation of the segments, homeotic genes become active and determine the identity of individual segments. Eyes normally arise only on the head segment, whereas legs develop only on the thoracic segments. The products of homeotic genes activate other genes that encode these segment-specific characteristics. Mutations in the homeotic genes cause body parts to appear in the wrong segments.

Homeotic mutations were first identified in 1894, when William Bateson noticed that floral parts of plants occasionally appeared in the wrong place: he found, for example, flowers in which stamens grew in the normal place of petals. In the late 1940s, Edward Lewis began to study homeotic mutations in Drosophila, which caused bizarre rearrangements of body parts. Mutations in the Antennapedia gene, for example, cause legs to develop on the head of a fly in place of the antenna (Figure 21.9).

Homeotic genes create addresses for the cells of particular segments, telling the cells where they are within the regions defined by the segmentation genes. When a homeotic gene is mutated, the address is wrong and cells in the segment develop as though they were somewhere else in the embryo. Homeotic genes are expressed after fertilization and are activated by specific concentrations of the proteins produced by the gap, pair-rule, and segment-polarity genes.
The homeotic gene Ultrabithorax (Ubx), for example, is activated when the concentration of Hunchback protein (a product of a gap gene) is within certain values. These concentrations exist only in the middle region of the embryo; so Ubx is expressed only in these segments.

The homeotic genes encode regulatory proteins that bind to DNA; each gene contains a subset of nucleotides, called a homeobox, that are similar in all homeotic genes. The homeobox consists of 180 nucleotides and, at three nucleotides per codon, encodes 60 amino acids that serve as a DNA-binding domain; this domain is related to the helix-turn-helix motif (see Figure 16.2a). Homeoboxes are also present in segmentation genes and other genes that play a role in spatial development.

There are two major clusters of homeotic genes in Drosophila. One cluster, the Antennapedia complex, affects the development of the adult fly’s head and anterior thoracic segments. The other cluster consists of the bithorax complex and includes genes that influence the adult fly’s posterior thoracic and abdominal segments. Together, the bithorax and Antennapedia genes are termed the homeotic complex (HOM-C). In Drosophila, the bithorax complex contains three genes, and the Antennapedia complex has five; they are all located on the same chromosome (Figure 21.10). In addition to these eight genes, HOM-C contains many sequences that regulate the homeotic genes. Remarkably, the order of the genes in the HOM-C is the same as the order in which the genes are expressed along the anterior–posterior axis of the body. The genes that are expressed in the more anterior segments are found at one end of the complex, whereas those expressed in the more posterior end of the embryo are found at the other end of the complex (see Figure 21.10). The reason for this correlation is unknown.

Figure 21.9 The homeotic mutation Antennapedia substitutes legs for the antenna of a fruit fly. (a) Normal, wild-type antenna. (b) Antennapedia mutant.

Figure 21.10 Homeotic genes, which determine the identity of individual segments in Drosophila, are present in two complexes. The Antennapedia complex has five genes, and the bithorax complex has three genes. Genes shown on chromosome 3, from anterior to posterior: Antennapedia complex: labial (lab), proboscipedia (pb), Deformed (Dfd), Sex combs reduced (Scr), Antennapedia (Antp); bithorax complex: Ultrabithorax (Ubx), abdominal A (abdA), Abdominal B (AbdB). The arrangement of the genes on the chromosome corresponds to the sequence in which the genes are expressed along the anterior–posterior axis of the body.

Concepts
Homeotic genes help determine the identity of individual segments in Drosophila embryos by producing DNA-binding proteins that activate other genes. Each homeotic gene contains a consensus sequence called a homeobox, which encodes the DNA-binding domain.

Homeobox Genes in Other Organisms

After homeotic genes in Drosophila had been isolated and cloned, molecular geneticists set out to determine if similar genes exist in other animals; probes complementary to the homeobox of Drosophila genes were used to search for homologous genes that might play a role in the development of other animals. The search was hugely successful: homeobox-containing (Hox) genes have been found in all animals studied so far, including nematodes, beetles, sea urchins, frogs, birds, and mammals. They have even been discovered in fungi and plants, indicating that Hox genes arose early in the evolution of eukaryotes.

In vertebrates, there are four clusters of Hox genes, each of which contains from 9 to 11 genes. Interestingly, the Hox genes of other organisms exhibit the same relation between order on the chromosome and order of their expression along the anterior–posterior axis of the embryo as that of Drosophila (Figure 21.11). Mammalian Hox genes, like those in Drosophila, encode transcription factors that help determine the identity of body regions along an anterior–posterior axis.

Figure 21.11 Homeotic genes in mammals are similar to those found in Drosophila. The complexes are arranged so that genes with similar sequences lie in the same column. See Figure 21.10 for the full names of the Drosophila genes. Rows shown: Drosophila (lab, pb, Dfd, Scr, Antp, Ubx, abdA, AbdB) and the mammalian clusters HoxA, HoxB, HoxC, and HoxD. (1) Genes shown in the same color are homologous. (2) There are four clusters of Hox genes in mammals, each cluster containing 9–11 genes. (3) The mammalian Hox genes are similar in sequence to the homeotic genes found in Drosophila, and they are in the same order.

Concepts
Homeobox-containing genes are found in many organisms, in which they regulate development.

www.whfreeman.com/pierce More information about Hox genes

Connecting Concepts
The Control of Development

Development is a complex process consisting of numerous events that must take place in a highly specific sequence. The results of studies in fruit flies and other organisms reveal that this process is regulated by a large number of genes. In Drosophila, the dorsal–ventral axis and the anterior–posterior axis are established by maternal genes; these genes encode mRNAs and proteins that are localized to specific regions within the egg and cause specific genes to be expressed in different regions of the embryo. The proteins of these genes then stimulate other genes, which in turn stimulate yet other genes in a cascade of control. As might be expected, most of the gene products in the cascade are regulatory proteins, which bind to DNA and activate other genes. In the course of development, successively smaller regions of the embryo are determined (Figure 21.12).
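The cascade of control can be written down as a small ordered data structure. The Python sketch below is only a bookkeeping aid: it lists the gene classes in the order in which this chapter says they act and treats the cascade as strictly linear, which is a simplification. The names `CASCADE` and `downstream_of` are hypothetical.

```python
# Gene classes in the order in which they act, with what each one establishes.
CASCADE = [
    ("egg-polarity genes", "major body axes"),
    ("gap genes", "large sections of the embryo"),
    ("pair-rule genes", "alternate segments"),
    ("segment-polarity genes", "organization of individual segments"),
    ("homeotic genes", "identity of individual segments"),
]


def downstream_of(gene_class: str) -> list[str]:
    """Return the gene classes that act later in the cascade than gene_class."""
    names = [name for name, _ in CASCADE]
    return names[names.index(gene_class) + 1:]


for name, role in CASCADE:
    print(f"{name}: {role}")

print(downstream_of("gap genes"))
```

Reading the structure from top to bottom gives progressively smaller regions of the embryo. A call such as `downstream_of("gap genes")` lists the classes whose expression depends, directly or indirectly, on the gap genes, which is one way to see why a mutation early in the cascade can disturb everything that follows.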
In Drosophila, first, the major axes and regions of the embryo are established by egg polarity genes. Next, patterns within each region are determined by the action of segmentation genes: the gap genes define large sections; the pair-rule genes define regional sections of the embryo and affect alternate segments; and the segment-polarity genes affect individual segments. Finally, the homeotic genes provide each segment with a unique identity. Initial gradients in proteins and mRNA stimulate localized gene expression, which produces more finely located gradients that stimulate even more localized gene expression. Developmental regulation thus becomes more and more narrowly defined. The processes by which limbs, organs, and tissues form (called morphogenesis) are less well understood, although this pattern of generalized-to-localized gene expression is encountered frequently. www.whfreeman.com/pierce Drosophila development, images of fruit-fly anatomy and development, images of mammalian embryos, and many resources on development biology Individual segments defined Segment-polarity genes Polarity of individual segments defined Homeotic genes Identity of individual segments defined ◗ 21.12 A cascade of gene regulation establishes the polarity and identity of individual segments of Drosophila. In development, successively smaller regions of the embryo are determined. Programmed Cell Death in Development Cell death is an integral part of multicellular life. Cells in many tissues have a limited life span, and they die and are replaced continually by new cells. Cell death shapes many body parts during development: it is responsible for the disappearance of a tadpole’s tail during metamorphosis and causes the removal of tissue between the digits to produce the human hand. Cell death is also used to eliminate dangerous cells that have escaped normal controls (see next section on cancer). 
Cell death in animals is often initiated by the cell itself in a kind of cellular suicide termed apoptosis. In this process, a cell’s DNA is degraded, its nucleus and cytoplasm shrink, and the cell undergoes phagocytosis by other cells without any leakage of its contents ( ◗ FIGURE 21.13a). Cells that are injured, on the other hand, die in a relatively uncontrolled manner called necrosis. In this process, a cell swells and bursts, spilling its contents over neighboring cells and eliciting an inflammatory response ( ◗ FIGURE 21.13b). Apoptosis is essential to embryogenesis; most multicellular animals cannot complete development if the process is inhibited.

Surprisingly, most cells are programmed to undergo apoptosis and will survive only if the internal death program is continually held in check. The process of apoptosis is highly regulated and depends on numerous signals inside and outside the cell. Geneticists have identified a number of genes having roles in various stages of the regulation of apoptosis. Some of these genes encode enzymes called caspases, which cleave other proteins at specific sites (after aspartic acid). Each caspase is synthesized as a large, inactive precursor (a procaspase) that is activated by cleavage, often by another caspase. When one caspase is activated, it cleaves other procaspases that trigger even more caspase activity. The resulting cascade of caspase activity eventually cleaves proteins essential to cell function, such as those supporting the nuclear membrane and cytoskeleton. Caspases also cleave a protein that normally keeps an enzyme that degrades DNA (DNase) in an inactive form. Cleavage of this protein activates DNase and leads to the breakdown of cellular DNA, which eventually leads to cell death.

Procaspases and other proteins required for cell death are continuously produced by healthy cells, so the potential for cell suicide is always present. A number of different signals can trigger apoptosis; for instance, infection by a virus can activate immune cells to secrete substances onto an infected cell, causing that cell to undergo apoptosis. This process is believed to be a defense mechanism designed to prevent the reproduction and spread of viruses. Similarly, DNA damage can induce apoptosis and thus prevent the replication of mutated sequences. Damage to mitochondria and the accumulation of a misfolded protein in the endoplasmic reticulum also stimulate programmed cell death.

Apoptosis in animal development is still poorly understood but is believed to be controlled through cell–cell signaling. The cell death that causes the disappearance of a tadpole’s tail, for example, is triggered by thyroxin, a hormone produced by the thyroid gland that increases in concentration during metamorphosis. The elimination of cells between developing fingers in humans is thought to result from localized signals from nearby cells.

The symptoms of many diseases and disorders are caused by apoptosis or, in some cases, its absence. In neurodegenerative diseases such as Parkinson disease and Alzheimer disease, symptoms are caused by a loss of neurons through apoptosis. In heart attacks and stroke, some cells die through necrosis, but many others undergo apoptosis. Cancer is often stimulated by mutations in genes that regulate apoptosis, leading to a failure of apoptosis that would normally eliminate cancer cells.

Concepts
Cells are capable of apoptosis (programmed cell death), a highly regulated process that depends on enzymes called caspases. Apoptosis plays an important role in animal development and is implicated in a number of diseases.

www.whfreeman.com/pierce Additional information on apoptosis

◗ 21.13 Programmed cell death by apoptosis is distinct from uncontrolled cell death through necrosis. (a) Apoptosis: (1) DNA is degraded; (2) the cell and nucleus shrink, and the nucleus fragments; (3) shrinking continues and the cell is engulfed by a macrophage; (4) the macrophage phagocytizes the apoptotic cell. (b) Necrosis: (1) the cell swells; (2) the cell lyses and releases cytoplasmic material.

Evo-Devo: The Study of Evolution and Development

“Ontogeny recapitulates phylogeny” is a familiar phrase that was coined in the 1860s by German zoologist Ernst Haeckel to describe his belief (now considered wrong) that organisms repeat their evolutionary history during development. According to Haeckel’s theory, a human embryo passes through fish, amphibian, reptilian, and mammalian stages before developing human traits.

Although ontogeny does not recapitulate phylogeny, many evolutionary biologists today are turning to the study of development for a better understanding of the processes and patterns of evolution. Sometimes called “evo-devo,” the study of evolution through the analysis of development is revealing that the same genes often shape developmental pathways in distantly related organisms. In humans and insects, for example, the same gene controls the development of eyes, despite the fact that insect and mammalian eyes are thought to have evolved independently. Similarly, biologists once thought that segmentation in vertebrates and invertebrates was only superficially similar, but we now know that, in both Drosophila and amphioxus (a marine organism closely related to vertebrates), a gene called engrailed divides the embryo into specific segments. A gene called distalless, which creates the legs of a fruit fly, has also been found to play a role in the development of crustacean branched appendages. This same gene stimulates body outgrowths of many other organisms, from polychaete worms to starfish.

Similar genes may be part of a developmental pathway common to two different species but have quite different effects.
For example, a Hox gene called AbdB helps define the posterior end of a Drosophila embryo; a similar group of genes in birds divides the wing into three segments. In another example, the sog gene in fruit flies stimulates cells to assume a ventral orientation in the embryo, but the expression of a similar gene called chordin in vertebrates causes cells to assume dorsal orientation, exactly the opposite of the situation in fruit flies. The theme emerging from these studies is that a small, common set of genes may underlie many basic developmental processes in many different organisms. Although Haeckel’s euphonious phrase “ontogeny recapitulates phylogeny” was incorrect, evo-devo is proving that development can reveal much about the process of evolution. Immunogenetics A basic assumption of developmental biology is that every somatic cell carries an identical set of genetic information and that no genes are lost during development. Although this assumption holds for most cells, there are some important exceptions, one of which concerns genes that encode immune function in vertebrates. The immune system provides protection against infection by specific bacteria, viruses, fungi, and parasites. The focus of an immune response is an antigen, defined as any molecule that elicits an immune reaction. Although any molecule can be an antigen, most are proteins. The immune system is remarkable in its ability to recognize an almost unlimited number of potential antigens. The body is full of proteins, so it is essential that the immune system be able to distinguish between self-antigens and foreign antigens. Occasionally, the ability to make this distinction breaks down, and the body produces an immune reaction to its own antigens, resulting in an autoimmune disease (Table 21.5). 
The Organization of the Immune System The immune system contains a number of different components and uses several mechanisms to provide protection against pathogens, but most immune responses can be grouped into two major classes: humoral immunity and cellular immunity. Although it is convenient to think of these classes as separate systems, they interact and influence each other significantly. Humoral immunity centers on the production of antibodies by special lymphocytes called B cells ( ◗ FIGURE 21.14), which mature in the bone marrow. Antibodies are proteins that circulate in the blood and other body fluids, binding to specific antigens and marking them for destruction by phagocytic cells. Antibodies also activate a set of proteins called complement that help to lyse cells and attract macrophages. Cellular immunity is conferred by T cells (see Figure 21.14), which are specialized lymphocytes that mature in the thymus and respond only to antigens found on the surfaces of the body’s own cells. After a pathogen such as a virus has infected a host cell, some viral antigens appear on the cell surface. Proteins, called T-cell receptors, on the surfaces of T cells bind to these antigens and mark the infected cell for destruction. T-cell receptors must simultaneously bind a foreign antigen and a self-antigen called a major histocompatibility complex (MHC) antigen on the cell surface. Not all T cells attack cells having foreign antigens; some help regulate immune responses, providing communication among different components of the immune system. How can the immune system recognize an almost unlimited number of foreign antigens? 
Remarkably, each mature lymphocyte is genetically programmed to attack one and only one specific antigen: each mature B cell produces antibodies against a single antigen, and each T cell is capable of attaching to only one type of foreign antigen.

Table 21.5 Examples of autoimmune diseases

Disease                                   Tissues attacked
Graves disease, Hashimoto thyroiditis     Thyroid gland
Rheumatic fever                           Heart muscle
Systemic lupus erythematosus              Joints, skin, and other organs
Rheumatoid arthritis                      Joints
Insulin-dependent diabetes mellitus       Insulin-producing cells in pancreas
Multiple sclerosis                        Myelin sheath around nerve cells

◗ 21.14 Immune responses are divided into humoral immunity, in which antibodies are produced by B cells, and cellular immunity produced by T cells. (1) Lymphocytes originate from stem cells in the bone marrow. (2) B cells mature in the bone marrow. (3) B cells enter the bloodstream. When they encounter antigens, they mature into plasma cells, which secrete antibodies that confer humoral immunity to the antigen. (4) T cells mature in the thymus and enter circulation. (5) They attack by binding host cells and lysing them (cellular immunity).

If each lymphocyte is specific for only one type of antigen, how does an immune response develop? The theory of clonal selection proposes that initially there is a large pool of millions of different lymphocytes, each capable of binding only one antigen ( ◗ FIGURE 21.15); so millions of different foreign antigens can be detected. To illustrate clonal selection, let’s imagine that a foreign protein enters the body. Only a few lymphocytes in the pool will be specific for this particular foreign antigen. When one of these lymphocytes encounters the foreign antigen and binds to it, that lymphocyte is stimulated to divide.
The lymphocyte proliferates rapidly, producing a large population of genetically identical cells (a clone), each of which is specific for that particular antigen. This initial proliferation of antigen-specific B and T cells is known as a primary immune response (see Figure 21.15); in most cases, the primary response destroys the foreign antigen.

Subsequent to the primary immune response, most of the lymphocytes in the clone die, but a few continue to circulate in the body. These memory cells may remain in circulation for years or even for the rest of one’s life. Should the same antigen reappear at some time in the future, memory cells specific to that antigen become activated and quickly give rise to another clone of cells capable of binding the antigen. The rise of this second clone is termed a secondary immune response (see Figure 21.15). The ability to quickly produce a second clone of antigen-specific cells permits the long-lasting immunity that often follows recovery from a disease. For example, people who have chicken pox usually have life-long immunity to the disease. The secondary immune response is also the basis for vaccination, which stimulates a primary immune response to an antigen and results in memory cells that can quickly produce a secondary response if that same antigen appears in the future.

Three sets of proteins are required for immune responses: antibodies, T-cell receptors, and the major histocompatibility antigens. The next section explores how the enormous diversity in these proteins is generated.

Concepts
Each B cell and T cell of the immune system is genetically capable of binding one type of foreign antigen. When a lymphocyte binds to an antigen, the lymphocyte undergoes repeated division, giving rise to a clone of genetically identical lymphocytes (the primary response), all of which are specific for that same antigen.
Memory cells remain in circulation for long periods of time; if the antigen reappears, the memory cells undergo rapid proliferation and generate a secondary immune response.

◗ 21.15 An immune response to a specific antigen is produced through clonal selection. (1) In a large pool of B lymphocytes, each is specific for one antigen. (2) When an antigen binds to a B cell, the B cell divides (3) and gives rise to a clone of B cells, all specific for the same antigen. (4) This proliferation of lymphocytes is the primary immune response. (5) Some cells differentiate into antibody-secreting plasma cells. (6) Antibodies are specific for the antigen. (7) Memory cells remain in circulation. (8) If a second exposure of the same antigen occurs, (9) the antigen binds to the memory cells, (10) which rapidly give rise to a secondary immune response.

Immunoglobulin Structure

The principal products of the humoral immune response are antibodies, also called immunoglobulins. Each immunoglobulin (Ig) molecule consists of four polypeptide chains (two identical light chains and two identical heavy chains), which form a Y-shaped structure ( ◗ FIGURE 21.16). Disulfide bonds link the two heavy chains in the stem of the Y and attach a light chain to a heavy chain in each arm of the Y. Binding sites for antigens are at the ends of the two arms.

The light chains of an immunoglobulin come in two basic types, called kappa chains and lambda chains. An immunoglobulin molecule can have two kappa chains or two lambda chains, but it cannot have one of each type. Both the light and the heavy chains have a variable region at one end and a constant region at the other end; the variable regions of different immunoglobulin molecules vary in amino acid sequence, whereas the constant regions of different immunoglobulins are similar in sequence. The variable regions of both light and heavy chains make up the antigen-binding region and specify the type of antigen that the antibody can bind.

◗ 21.16 Each immunoglobulin molecule consists of four polypeptide chains (two light chains and two heavy chains) that combine to form a Y-shaped structure. (a) Structure of an immunoglobulin, showing the light chains, the heavy chains, the disulfide (S–S) bonds, the antigen-binding sites, and the variable (V), joining (J), and constant (C) regions. (b) Folded, space-filling model.

Mammals have five basic classes of immunoglobulins, known as IgM, IgD, IgE, IgG, and IgA. Each class is defined by the type of heavy chain found in the immunoglobulin. The different classes of antibodies have different functions, appear at different times during an immune response, or both. For example, in a primary response, all B cells initially make IgM but, as the immune response develops, they switch to producing a combination of IgM and IgD. Later, the B cells may switch to one of the other immunoglobulin classes.

The Generation of Antibody Diversity

The immune system is capable of making antibodies against virtually any antigen that might be encountered in one’s lifetime: each human is capable of making about 10^15 different antibody molecules. Antibodies are proteins; so the amino acid sequences of all 10^15 potential antibodies must be encoded in the human genome. However, there are fewer than 1 × 10^5 genes in the human genome and, in fact, only 3 × 10^9 total base pairs; so how can this huge diversity of antibodies be encoded? The answer lies in the fact that antibody genes are composed of segments.
There are a number of copies of each type of segment, each differing slightly from the others. In the maturation of a lymphocyte, the segments are joined to create an immunoglobulin gene. The particular copy of each segment used is random and, because there are multiple copies of each type, there are many possible combinations of the segments. A limited number of segments can therefore encode a huge diversity of antibodies.

To illustrate this process of antibody assembly, let’s consider the immunoglobulin light chains. Kappa and lambda chains are encoded by separate genes on different chromosomes. Each gene is composed of three types of segments: V, for variable; J, for joining; and C, for constant. The V segments encode most of the variable region of the light chains, the C segment encodes the constant region of the chain, and the J segments encode a short set of nucleotides that join the V segment and the C segment together.

The number of V, J, and C segments differs among species. For the human kappa gene, there are from 30 to 35 different functional V gene segments, 5 different J gene segments, and a single C gene segment, all of which are present in the germ-line DNA ( ◗ FIGURE 21.17a). The V gene segments, which are about 400 bp in length, are located on the same chromosome and are separated from one another by about 7000 bp. The J gene segments are about 30 bp in length and all together encompass about 1400 bp.

Initially, an immature lymphocyte inherits all of the V gene segments and all of the J gene segments present in the germ line. In the maturation of the lymphocyte, somatic recombination within a single chromosome moves one of the V gene segments to a position next to one of the J gene segments ( ◗ FIGURE 21.17b). In Figure 21.17b, V2 (the second of approximately 35 different V gene segments) undergoes somatic recombination, which places it next to J3 (the third of 5 J gene segments); the intervening segments are lost. After somatic recombination has taken place, the combined V-J-C gene is transcribed and processed ( ◗ FIGURE 21.17c and d). The mature mRNA that results contains only sequences for a single V, J, and C segment; this mRNA is translated into a functional light chain ( ◗ FIGURE 21.17e).

◗ 21.17 Antibody diversity is produced by somatic recombination. Shown here is recombination among gene segments that encode the human kappa light chain. (a) Germ-line DNA contains 30–35 different V gene segments, 5 J gene segments, and 1 C gene segment. (b) In this example, V2 is moved next to J3 through somatic recombination, producing the DNA found in a mature B cell. (c, d) After transcription, the V-J-C pre-mRNA is processed so that the mature mRNA contains sequences for only one V, J, and C gene segment. (e) This mRNA is translated into a functional light chain.

In this way, each mature human B cell produces a unique type of kappa light chain, and different B cells produce slightly different kappa chains, depending on the combination of V and J segments that are joined.

The gene that encodes the lambda light chain is organized in a similar way but differs from the kappa gene in the number of copies of the different segments. In the human gene for the lambda light chain, there are from 29 to 33 different functional V gene segments and 4 or 5 different functional J and C gene segments (each C gene segment is attached to a different J segment). Somatic recombination takes place among the segments in the same way as that in the kappa gene, generating many possible combinations of lambda light chains.
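The segment counts quoted above lend themselves to a quick tally. The following Python sketch is an illustration only: it multiplies the number of V gene segments by the number of J (or J-C) choices to estimate how many distinct light-chain genes somatic recombination alone could assemble. The function name `combination_range` is hypothetical, and the sketch assumes that any V segment can be joined to any J segment; junctional diversity and somatic hypermutation are ignored.

```python
def combination_range(v_range, j_range):
    """Return the (min, max) number of V-J combinations.

    v_range and j_range are (low, high) counts of functional gene segments.
    Assumption for illustration: any V segment can join any J segment.
    """
    v_low, v_high = v_range
    j_low, j_high = j_range
    return v_low * j_low, v_high * j_high


# Human kappa gene: 30-35 V segments, 5 J segments, a single C segment.
kappa = combination_range((30, 35), (5, 5))

# Human lambda gene: 29-33 V segments and 4 or 5 J-C pairs
# (each C segment is attached to a different J segment).
lambda_ = combination_range((29, 33), (4, 5))

print("kappa V-J combinations: ", kappa)    # (150, 175)
print("lambda V-J combinations:", lambda_)  # (116, 165)
```

Under these assumptions, segment joining by itself yields only a few hundred different light chains, far short of the roughly 10^15 antibodies a person can make; the rest of the diversity has to come from the heavy chain and the additional mechanisms the chapter goes on to describe.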
The gene that encodes the immunoglobulin heavy chain is arranged in V, J, and C segments, but this gene also possesses D (for diversity) segments. Somatic recombination taking place in lymphocyte maturation joins one D gene segment to one J gene segment, and then a V gene segment is joined to this combined D-J gene segment ( ◗ FIGURE 21.18a and b). Transcription and RNA processing of this gene produce an mRNA that encodes only one particular type of heavy chain ( ◗ FIGURE 21.18c–e). Thus, many different types of light and heavy chains are possible.

◗ 21.18 Somatic recombination also produces variation in the heavy chain of the immunoglobulin molecule. (a) The germ-line DNA contains multiple V, D, J, and C gene segments. (b) In somatic recombination, V, D, J, and C segments are joined to produce the DNA found in the B cell. (c, d) The combined V-D-J-C pre-mRNA is processed so that the mature mRNA contains sequences for only one V, D, J, and C gene segment. (e) This mRNA is translated into a functional heavy chain with a variable region and a constant region.

Somatic recombination is brought about by RAG1 and RAG2 proteins, which generate double-strand breaks at specific nucleotide sequences called recombination signal sequences that flank the V, D, J, and C gene segments. DNA repair proteins then process and join the ends of particular segments together ( ◗ FIGURE 21.19).

In addition to somatic recombination, other mechanisms add to antibody diversity.
First, each type of light chain can potentially combine with each type of heavy chain to make a functional immunoglobulin molecule, increasing the amount of possible variation in antibodies. Second, the recombination process that joins V, J, D, and C gene segments in the developing B cell is imprecise, and a few random nucleotides are frequently lost or gained at the junctions of the recombining segments. This junctional diversity greatly enhances variation among antibodies. Third, a high rate of mutation, called somatic hypermutation (the cause of which is unknown), is characteristic of the immunoglobulin genes.

Concepts
The genes encoding the antibody chains are organized in segments, and germ-line DNA contains multiple versions of each segment. The many possible combinations of V, J, and D segments permit an immense variety of different antibodies to be generated. This diversity is augmented by the different combinations of light and heavy chains, the random addition and deletion of nucleotides at the junctions of the segments, and the high mutation rates in the immunoglobulin genes.

T-Cell-Receptor Diversity

Like B cells, each mature T cell has genetically determined specificity for one type of antigen that is mediated through the cell’s receptors. T-cell receptors are structurally similar to immunoglobulins ( ◗ FIGURE 21.20) and are located on the cell surface; most T-cell receptors are composed of one alpha and one beta polypeptide chain held together by disulfide bonds. One end of each chain is embedded in the cell membrane; the other end projects away from the cell and binds antigens. Like the immunoglobulin chains, each chain of the T-cell receptor possesses a constant region and a variable region (see Figure 21.20); the variable regions of the two chains provide the antigen-binding site.
The genes that encode the alpha and beta chains of the T-cell receptor are organized much like those that encode the heavy and light chains of immunoglobulins: each gene is made up of segments that undergo somatic recombination before the gene is transcribed. For example, the human gene for the alpha chain initially consists of 44 to 46 V gene segments, 50 J gene segments, and a single C gene segment. The organization of the gene for the beta chain is similar, except that it also contains D segments. Random combination of alpha and beta chains and junctional diversity take place, but there is no evidence for somatic hypermutation in T-cell-receptor genes.

◗ 21.19 Somatic recombination is brought about by RAG1 and RAG2 proteins and DNA repair proteins. (1) RAG1 and RAG2 proteins make double-stranded breaks at the recombination signal sequences flanking the V and J segments, (2) generating DNA fragments having hairpin ends and blunt ends. (3) Proteins that repair DNA open the hairpins, (4) process the ends, (5) and join the gene segments.

◗ 21.20 A T-cell receptor is composed of two polypeptide chains, each having a variable and constant region. Most T-cell receptors are composed of alpha (α) and beta (β) polypeptide chains held together by disulfide bonds. One end of each chain traverses the cell membrane; the other end projects away from the cell and binds antigens.

Concepts
Like the genes that encode antibodies, the genes for the T-cell-receptor chains consist of segments that undergo somatic recombination, generating an enormous diversity of antigen-binding sites.

Major Histocompatibility Complex Genes

When tissues are transferred from one species to another or even from one individual member to another within a species, the transplanted tissues are usually rejected by the host animal. The results of early studies demonstrated that this graft rejection is due to an immune response that occurs when antigens on the surface of the grafted tissue are detected and attacked by T cells in the host organism. The antigens that elicit graft rejection are referred to as histocompatibility antigens, and they are encoded by a cluster of genes called the major histocompatibility complex.

T cells are activated only when the T-cell receptor simultaneously binds both a foreign antigen and the host cell’s own histocompatibility antigen. The reason for this requirement is not clear; it may reserve T cells for action against pathogens that have invaded cells. When a foreign body, such as a virus, is ingested by a macrophage or other cell, partly digested pieces of the foreign body containing antigens are displayed on the macrophage’s surface ( ◗ FIGURE 21.21). Through their T-cell receptors, T cells bind to both the histocompatibility protein and the foreign antigen and secrete substances that either destroy the antigen-containing cell or activate other B and T cells or both.

◗ 21.21 T cells are activated by binding to a foreign antigen and a histocompatibility antigen on the surface of a self-cell. (1) A virus is ingested by a macrophage, (2) which processes and displays foreign antigens on its cell surface. (3) A T-cell receptor binds the foreign antigen and the host cell’s own histocompatibility (MHC) molecules. (4) This binding stimulates the T cell to release perforins (5) that lyse the antigen-presenting cell.

The MHC genes are among the most variable genes known: there are more than 100 different alleles for some MHC loci.
Because each person possesses five or more MHC loci and because many alleles are possible at each locus, no two people (with the exception of identical twins) produce the same set of histocompatibility antigens. The variation in histocompatibility antigens provides each of us with a unique identity for our own cells, which allows our immune systems to distinguish self from nonself. This variation is also the cause of rejection in organ transplants.

Concepts
The MHC genes encode proteins that provide identity to the cells of each individual organism. To bring about an immune response, a T-cell receptor must simultaneously bind both a histocompatibility (self) antigen and a specific foreign antigen.

www.whfreeman.com/pierce Additional information about the genetics of the immune system

Genes and Organ Transplants

For a person with a seriously impaired organ, a transplant operation may offer the only hope of survival. Successful transplantation requires more than the skills of a surgeon; it also requires a genetic match between the patient and the person donating the organ. The fate of transplanted tissue depends largely on the type of antigens present on the surface of its cells. Because foreign tissues are usually rejected by the host, the successful transplantation of tissues between different persons is very difficult. Tissue rejection can be partly inhibited by drugs that interfere with cellular immunity. Unfortunately, this treatment can create serious problems for transplant patients, because they may have difficulty fighting off common pathogens and thus may die of infection. The only other option for controlling the immune reaction is to carefully match the donor and the recipient, maximizing the genetic similarities.

The tissue antigens that elicit the strongest immune reaction are the very ones used by the immune system to mark its own cells, those encoded by the major histocompatibility complex. The MHC spans a region of more than 3 million base pairs on human chromosome 6 and has many alleles, providing different MHC antigens on the cells of different people and allowing the immune system to recognize foreign cells. The severity of an immune rejection of a transplanted organ depends on the number of mismatched MHC antigens on the cells of the transplanted tissue. The ABO red-blood-cell antigens also are important because they elicit a strong immune reaction. The ideal donor is the patient’s own identical twin, who will have exactly the same MHC and ABO antigens. Unfortunately, most patients don’t have an identical twin. The next best donor is a sibling with the same major MHC and ABO antigens. If a sibling is not available, donors from the general population are considered. An attempt is made to match as many of the MHC antigens of the donor and recipient as possible, and immunosuppressive drugs are used to control rejection that occurs because of the mismatches. The long-term success of transplants depends on the closeness of the match. Survival rates after kidney transplants (the most successful of the major organ transplants) increase from 63% with zero or one MHC match to 90% with four matches.

Cancer Genetics

Cancer kills one of every five Americans, and cancer treatments cost billions of dollars every year. Cancer is not a single disease; rather, it is a heterogeneous group of disorders characterized by the presence of cells that do not respond to the normal controls on division. Cancer cells divide rapidly and continuously, creating tumors that crowd out normal cells and eventually rob healthy tissues of nutrients. The cells of an advanced tumor can separate from the tumor and travel to distant sites in the body, where they may take up residence and develop into new tumors. The most common cancers in the United States are those of the prostate, breast, lung, colon and rectum, and blood (Table 21.6).
The Nature of Cancer Normal cells grow, divide, mature, and die in response to a complex set of internal and external signals. A normal cell receives both stimulatory and inhibitory signals, and its growth and division are regulated by a delicate balance between these opposing forces. In a cancer cell, one or more of the signals has been disrupted, which causes the cell to proliferate at an abnormally high rate. As they lose their response to the normal controls, cancer cells gradually lose their regular shape and boundaries, eventually forming a distinct mass of abnormal cells — a tumor. If the cells of the tumor remain localized, the tumor is said to be benign; if the cells invade other tissues, the tumor is said to be malignant. Cells that travel to other sites in the body, where they establish secondary tumors, have undergone metastasis. Cancer As a Genetic Disease Cancer arises as a result of fundamental defects in the regulation of cell division, and its study therefore has significance not only for public health, but also for our basic understanding of cell biology. Through the years, a large number of theories have been put forth to explain cancer, but we now recognize that most, if not all, cancers arise from defects in DNA. 
Table 21.6 Estimated incidences of various cancers and cancer mortality in the United States in 2002

Type of Cancer                            Estimated New Cases per Year   Estimated Deaths per Year
Prostate                                  189,000                        30,200
Breast                                    205,000                        40,000
Lung and bronchus                         169,400                        154,900
Colon and rectum                          148,300                        56,600
Lymphoma                                  60,900                         25,800
Bladder                                   56,500                         12,600
Melanoma                                  53,600                         7,400
Uterus                                    39,300                         6,600
Leukemias                                 30,800                         21,700
Oral cavity and pharynx                   28,900                         7,400
Pancreas                                  30,300                         29,700
Ovary                                     23,300                         13,900
Stomach                                   21,600                         12,400
Brain and nervous system                  17,000                         13,100
Liver                                     16,600                         14,100
Uterine cervix                            13,000                         4,100
Cancers of soft tissues including heart   8,300                          3,900
All cancers                               1,268,000                      553,400

Source: American Cancer Society, Cancer Facts and Figures, 2002 (Atlanta: American Cancer Society, 2001), p. 80

Early observations suggested that cancer might result from genetic damage. First, it was recognized that many agents such as ionizing radiation and chemicals that cause mutations also cause cancer (are carcinogens). Second, some cancers are consistently associated with particular chromosome abnormalities. About 90% of people with chronic myeloid leukemia, for example, have a reciprocal translocation between chromosome 22 and chromosome 9 (see Figure 9.34). Third, some specific types of cancers tend to run in families. Retinoblastoma, a rare childhood cancer of the retina, appears with high frequency in a few families and is inherited as an autosomal dominant trait, suggesting that a single gene is responsible for these cases of the disease. Although these observations hinted that genes play some role in cancer, the theory of cancer as a genetic disease had several significant problems. If cancer is inherited, every cell in the body should receive the cancer-causing gene, and therefore every cell should become cancerous. In those types of cancer that run in families, however, tumors typically appear only in certain tissues and often only when the person reaches an advanced age.
Finally, many cancers do not run in families at all and, even in regard to those cancers that generally do, isolated cases crop up in families with no history of the disease. In 1971, Alfred Knudson proposed a model to explain the genetic basis of cancer. Knudson was studying retinoblastoma, a cancer that usually develops in only one eye but occasionally appears in both. Knudson found that, when retinoblastoma appears in both eyes, onset is at an early age, and affected children often have close relatives who also have retinoblastoma. Knudson proposed that retinoblastoma results from two separate genetic defects, both of which are necessary for cancer to develop ( ◗ FIGURE 21.22). He suggested that, in the cases in which the disease affects just one eye, a single cell in one eye undergoes two successive mutations. Because the chance of these two mutations occurring in a single cell is remote, retinoblastoma is rare and typically develops in only one eye. For bilateral cases, Knudson proposed that the child inherited one of the two mutations required for the cancer, and so every cell contains this initial mutation. In these cases, all that is required for cancer to develop is for one eye cell to undergo the second mutation. Because each eye possesses millions of cells, there is a high probability that the second mutation will occur in at least one cell of each eye, producing tumors in both eyes at an early age. Knudson’s hypothesis suggests that cancer is the result of a multistep process that requires several mutations. If one or more of the required mutations is inherited, fewer additional mutations are required to produce cancer, and the cancer will tend to run in families. The idea that cancer results from multiple mutations turns out to be correct for most cancers. Knudson’s genetic theory for cancer has been confirmed by the identification of genes that, when mutated, cause cancer. Today, we recognize that cancer is fundamentally a genetic disease, although few cancers are actually inherited. Most tumors arise from somatic mutations that accumulate during our life span, either through spontaneous mutation or in response to environmental mutagens.

◗ 21.22 Alfred Knudson proposed that retinoblastoma results from two separate genetic defects, both of which are necessary for cancer to develop. [Figure labels: 1 Rarely, a single cell undergoes two somatic mutations,… 2 …resulting in a single tumor, for example, in one eye. 3 A predisposed person inherits one mutation. 4 Some cells undergo a single somatic mutation that produces cancer. 5 Because only a single mutation is required to produce cancer, the likelihood of it occurring twice (in both eyes, for example) increases. Conclusion: Multiple mutations are required to produce cancerous cells.]

The clonal evolution of tumors
Cancer begins when a single cell undergoes a mutation that causes the cell to divide at an abnormally rapid rate. The cell proliferates, giving rise to a clone of cells, each of which carries the same mutation. Because the cells of the clone divide more rapidly than normal, they soon outgrow other cells. Additional mutations that arise in the clone may further enhance the ability of those cells to proliferate, and cells carrying both mutations soon become dominant in the clone. Eventually, they may be overtaken by cells that contain yet more mutations that enhance proliferation. In this process, called clonal evolution, the tumor cells acquire more mutations that allow them to become increasingly aggressive in their proliferative properties ( ◗ FIGURE 21.23). The rate of clonal evolution depends on the frequency with which new mutations arise. Any genetic defect that allows more mutations to arise will accelerate cancer progression.
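Knudson’s two-mutation argument, and the dependence of cancer progression on the mutation rate, can be made concrete with a rough calculation. The sketch below is only an illustration under stated assumptions: mutations are treated as independent events with the same probability in every cell, and the mutation probability (MU) and the number of cells per eye (CELLS_PER_EYE) are hypothetical values chosen for the example, not figures from the text. The function names are likewise invented here.

```python
import math


def p_cell_has_all_hits(mu: float, hits_needed: int) -> float:
    """Probability that one cell acquires every remaining mutation,
    assuming independent mutations that each occur with probability mu."""
    return mu ** hits_needed


def p_at_least_one_cell(mu: float, hits_needed: int, n_cells: int) -> float:
    """Probability that at least one of n_cells acquires all remaining mutations."""
    p = p_cell_has_all_hits(mu, hits_needed)
    if p >= 1.0:
        return 1.0
    # 1 - (1 - p)**n_cells, computed with log1p/expm1 to limit rounding loss for tiny p
    return -math.expm1(n_cells * math.log1p(-p))


# Hypothetical values, chosen only for illustration (not taken from the text)
MU = 1e-6                  # assumed chance of the relevant mutation in one cell
CELLS_PER_EYE = 2_000_000  # assumed number of susceptible cells in one eye

sporadic = p_at_least_one_cell(MU, 2, CELLS_PER_EYE)   # nothing inherited: two hits needed
inherited = p_at_least_one_cell(MU, 1, CELLS_PER_EYE)  # one mutation inherited: one hit needed

print(f"one eye, sporadic:    {sporadic:.2e}")
print(f"one eye, inherited:   {inherited:.2f}")
print(f"both eyes, sporadic:  {sporadic ** 2:.2e}")
print(f"both eyes, inherited: {inherited ** 2:.2f}")
```

Under these assumed numbers, a tumor in a given eye is very unlikely when both mutations must arise somatically and likely when one mutation is inherited, which is the pattern Knudson described. Raising MU, as a defect in DNA repair might, increases both probabilities.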
Genes that regulate DNA repair are often found to have been mutated in the cells of advanced cancers, and inherited disorders of DNA repair are usually characterized by increased incidences of cancer. Because DNA repair mechanisms normally eliminate many of the mutations that arise, without DNA repair, mutations are more likely to persist in all genes, including those that regulate cell division. Xeroderma pigmentosum, for example, is a rare disorder caused by a defect in DNA repair (see p. 000 in Chapter 17). People with this condition have elevated rates of skin cancer when exposed to sunlight (which induces mutation). Mutations in genes that affect chromosome segregation also may contribute to the clonal evolution of tumors. Many cancer cells are aneuploid, and it is clear that chromosome mutations contribute to cancer progression by duplicating some genes (those on extra chromosomes) and eliminating others (those on deleted chromosomes). Cellular defects that interfere with chromosome separation increase aneuploidy and therefore may accelerate cancer progression.

◗ 21.23 Through clonal evolution, tumor cells acquire multiple mutations that allow them to become increasingly aggressive and proliferative. [Figure labels: First mutation: 1 A cell is predisposed to proliferate at an abnormally high rate. Second mutation: 2 A second mutation causes the cell to divide rapidly. Third mutation: 3 After a third mutation, the cell undergoes structural changes. Fourth mutation (malignant cell): 4 A fourth mutation causes the cell to divide uncontrollably and invade other tissues.]

Concepts
Cancer is fundamentally a genetic disease. Mutations in several genes are usually required to produce cancer. If one of these mutations is inherited, fewer somatic mutations are necessary for cancer to develop, and the person may have a predisposition to cancer. Clonal evolution is the accumulation of mutations in a clone of cells.

The role of environment in cancer
Although cancer is fundamentally a genetic disease, most cancers are not inherited, and there is little doubt that many cancers are influenced by environmental factors. The role of environmental factors in cancer is suggested by differences in the incidence of specific cancers throughout the world (Table 21.7). The results of studies show that migrant populations typically take on the cancer incidence of their host country. For example, the overall rates of cancer are considerably lower in Japan than in Hawaii. However, within a single generation after migration to Hawaii, Japanese people develop cancer at rates similar to those of native Hawaiians. Smoking is a good example of an environmental factor that is strongly associated with cancer. Other environmental factors such as chemicals, ultraviolet light, ionizing radiation, and viruses are known carcinogens and are associated with variation in the incidence of many cancers.

Genes That Contribute to Cancer
The signals that regulate cell division fall into two basic types: molecules that stimulate cell division and those that inhibit it. These control mechanisms are similar to the accelerator and brake of an automobile. In normal cells (but hopefully not your car), both accelerators and brakes are applied at the same time, causing cell division to proceed at the proper speed. Because cell division is affected by both accelerators and brakes, cancer can arise from mutations in either type of signal, and there are several fundamentally different routes to cancer ( ◗ FIGURE 21.24). A stimulatory gene can be made hyperactive or active at inappropriate times, analogously to having the accelerator of an automobile stuck in the floored position. Mutations in stimulatory genes are usually dominant, because a mutation in a single copy of the gene is usually sufficient to produce a stimulatory effect. Dominant-acting stimulatory genes that cause cancer are termed oncogenes.
Cell division may also be stimulated when inhibitory genes are made inactive, analogously to having a defective brake in an automobile. Mutated inhibitory genes generally have recessive effects, because both copies must be mutated to remove all inhibition. Inhibitory genes in cancer are termed tumor-suppressor genes. Although oncogenes or mutated tumor-suppressor genes or both are required to produce cancer, mutations in DNA repair genes can increase the likelihood of acquiring mutations in these genes. Having mutated DNA repair genes is analogous to having a lousy car mechanic who does not make the necessary repairs to a broken accelerator or brake.

Oncogenes and tumor-suppressor genes
Oncogenes were the first cancer-causing genes to be identified. In 1910, Peyton Rous described a virus that caused connective-tissue tumors (sarcomas) in chickens; this virus became known as the Rous sarcoma virus. A number of other cancer-causing viruses were subsequently isolated from various animal tissues. These viruses were generally assumed to carry a cancer-causing gene that was transferred to the host cell. The first oncogene, called src, was isolated from the Rous sarcoma virus in 1970. In 1975, Michael Bishop, Harold Varmus, and their colleagues began to use probes for viral oncogenes to search for related sequences in normal cells. They discovered that the genomes of all normal cells carry DNA sequences that are closely related to viral oncogenes. These cellular genes are called proto-oncogenes. They are responsible for basic cellular functions in normal cells but, when mutated, they become oncogenes that contribute to the development of cancer. When a virus infects a cell, a proto-oncogene may become incorporated into the viral genome through recombination. Within the viral genome, the proto-oncogene may mutate to an oncogene that, when inserted back into a cell, causes rapid cell division and cancer. Because the proto-oncogenes are more likely to undergo mutation or recombination within a virus, viral infection is often associated with cancer.

Table 21.7 Examples of geographic variation in the incidence of cancer

Type of Cancer   Location                                             Incidence Rate*
Lip              Canada (Newfoundland)                                15.1
                 Brazil (Fortaleza)                                   1.2
Nasopharynx      Hong Kong                                            30.0
                 United States (Utah)                                 0.5
Colon            United States (Iowa)                                 30.1
                 India (Bombay)                                       3.4
Lung             United States (New Orleans, African Americans)      110.0
                 Costa Rica                                           17.8
Prostate         United States (Utah)                                 70.2
                 China (Shanghai)                                     1.8
Bladder          United States (Connecticut, Whites)                  25.2
                 Philippines (Rizal)                                  2.8
All cancer       Switzerland (Basel)                                  383.3
                 Kuwait                                               76.3

Source: C. Muir et al., Cancer incidence in Five Continents, vol. 5 (Lyon: International Agency for Research on Cancer, 1987), Table 12-2.
*The incidence rate is the age-standardized rate in males per 100,000 population.

◗ 21.24 Both oncogenes and tumor-suppressor genes contribute to cancer but differ in their modes of action and dominance. [Figure labels: (a) Oncogenes, dominant-acting mutation (mutation in either allele): 1 Proto-oncogenes normally produce factors that stimulate cell division. 2 Mutant alleles (oncogenes) tend to be dominant: one copy of the mutant allele is sufficient to induce excessive cell proliferation. (b) Tumor-suppressor genes, recessive-acting mutation (mutation in both alleles, or mutation in one and deletion in one): 3 Tumor-suppressor genes normally produce factors that inhibit cell division. 4 Mutant alleles are recessive (both alleles must be mutated to produce excessive cell proliferation).]

Proto-oncogenes can be converted into oncogenes in viruses in several different ways. The sequence of the proto-oncogene may be altered or truncated as it is being incorporated into the viral genome. This mutated copy of the gene may then produce an altered protein that causes uncontrolled cell proliferation. Alternatively, through recombination, a proto-oncogene may end up next to a viral promoter or enhancer, which then causes the gene to be overexpressed. Finally, sometimes the function of a proto-oncogene in the host cell may be altered when a virus inserts its own DNA into the gene, disrupting its normal function. Many oncogenes have been identified by experiments in which selected fragments of DNA are added to cells in culture. Some of the cells take up the DNA and, if these cells become cancerous, then the DNA fragment that was added to the culture must contain an oncogene. The fragments can then be sequenced, and the oncogene can be identified. More than 70 oncogenes have now been discovered (Table 21.8).

Table 21.8 Some oncogenes and functions of their corresponding proto-oncogenes

Oncogene   Cellular Location   Function of Proto-oncogene
sis        Secreted            Growth factor
erbB       Cell membrane       Part of growth-factor receptor
erbA       Cytoplasm           Thyroid hormone receptor
src        Cell membrane       Protein tyrosine kinase
ras        Cell membrane       GTP binding and GTPase
myc        Nucleus             Transcription factor
fos        Nucleus             Transcription factor
jun        Nucleus             Transcription factor
bcl-1      Nucleus             Cell cycle
Tumor-suppressor genes are more difficult than oncogenes to identify because they inhibit cancer and are recessive; both alleles must be mutated before the inhibition of cell division is removed. Because it is the failure of their function that promotes cell proliferation, tumor-suppressor genes cannot be identified by adding them to cells and looking for cancer. One of the first tumor-suppressor genes to be identified was the retinoblastoma gene. In 1985, Raymond White and Webster Cavenee showed that large segments of chromosome 13 were missing in cells of retinoblastoma tumors, and later the tumor-suppressor gene was isolated from these segments. A number of tumor-suppressor genes have now been discovered in this way (Table 21.9).

Genes controlling the cell cycle
Genes that control the cell cycle often serve as proto-oncogenes or tumor-suppressor genes. Let’s briefly revisit the regulation of the cell cycle, which was discussed in Chapter 2. The cell cycle is regulated by cyclins, whose concentration oscillates during the cell cycle, and cyclin-dependent kinases (CDKs), which have a relatively constant concentration. Cyclins bind to CDKs, producing activated protein kinases that initiate key events in the cell cycle. Genes that encode cyclins and factors that stimulate the formation of activated CDKs are often oncogenes, whereas genes that encode factors that inhibit the formation of activated CDKs are often tumor-suppressor genes. Mutated cyclin genes have been associated with cancers of the immune system, breast, stomach, and esophagus; genes, such as p16 and p21, that encode inhibitors of CDKs are mutated or missing in many cancer cells. Some proto-oncogenes and tumor-suppressor genes have roles in apoptosis. Cells have the ability to assess themselves and, when they are abnormal or damaged, they normally undergo apoptosis (see p. 000).
Cancer cells frequently have chromosome mutations, DNA damage, and other cellular anomalies that would normally stimulate apoptosis and prevent their proliferation. Often these cells have mutations in genes that regulate apoptosis, and therefore they do not undergo programmed cell death. The ability of a cell to initiate apoptosis in response to DNA damage, for example, depends on a gene called p53, which is inactive in many human cancers.

Table 21.9 Some tumor-suppressor genes and their functions

Gene   Cellular Location   Function
NF1    Cytoplasm           GTPase activator
p53    Nucleus             Transcription factor, regulates apoptosis
RB     Nucleus             Transcription factor
WT-1   Nucleus             Transcription factor

Source: J. Marx, Learning how to suppress cancer, Science 261(1993):1385.

www.whfreeman.com/pierce Additional information about p53 and its role in cancer

DNA repair genes
Cancer arises from the accumulation of multiple mutations in a single cell. Some cancer cells have normal rates of mutation, and multiple mutations accumulate because each mutation gives the cell a replicative advantage over cells without the mutations. Other cancer cells may have higher-than-normal rates of mutation in all of their genes, which leads to more frequent mutation of oncogenes and tumor-suppressor genes. What might be the source of these high rates of mutation in some cancer cells? Two processes control the rate at which mutations arise within a cell: (1) the rate at which errors arise during and after replication; and (2) the efficiency with which these errors are corrected. The error rate during replication is controlled by the fidelity of DNA polymerases and other proteins in the replication process (see Chapter 12). However, defects in genes encoding replication proteins have not been strongly linked to cancer. The mutation rate is also strongly affected by whether errors are corrected by DNA repair systems (see p. 000 in Chapter 17). Defects in genes that encode components of these repair systems have been consistently associated with a number of cancers. People with xeroderma pigmentosum, for example, are defective in nucleotide-excision repair, an important cellular repair system that normally corrects DNA damage caused by a number of mutagens, including ultraviolet light. Likewise, about 13% of colorectal, endometrial, and stomach cancers have cells that are defective in mismatch repair, another major repair system in the cell. Some types of colon cancer are inherited as an autosomal dominant trait. In families with this condition, a person can inherit one mutated and one normal allele of a gene that controls mismatch repair. The normal allele provides sufficient levels of the protein for mismatch repair to function, but it is highly likely that this normal allele will become mutated or lost in at least a few cells. If it does so, there is no mismatch repair, and these cells undergo higher-than-normal rates of mutation, leading to defects in oncogenes and tumor-suppressor genes that cause the cells to proliferate.

www.whfreeman.com/pierce Additional information on DNA repair

Genes affecting chromosome segregation
Most advanced tumors contain cells that exhibit a variety of chromosome anomalies, including extra chromosomes, missing chromosomes, and chromosome rearrangements. Aneuploidy in somatic cells usually arises when chromosomes do not segregate properly in mitosis. Normal cells have a checkpoint that monitors the proper assembly of the mitotic spindle; if chromosomes are not properly attached to the microtubules at metaphase, the onset of anaphase is blocked. Some aneuploid cancer cells contain mutant alleles for genes that encode proteins having roles in this checkpoint; these cells enter anaphase even when the spindle is improperly assembled or not assembled at all, and chromosome abnormalities result.
The tumor-suppressor gene p53, in addition to controlling apoptosis, plays a role in the duplication of the centrosome, which is required for proper formation of the spindle and for chromosome segregation. Normally, the centrosome duplicates once per cell cycle. If p53 is mutated or missing, however, the centrosome may undergo extra duplications, resulting in the unequal segregation of chromosomes. In this way, mutation of the p53 gene may generate chromosome mutations that contribute to cancer. The p53 gene is also a tumor-suppressor gene that prevents cell division when the DNA is damaged.

Sequences that regulate telomerase
Another factor that may contribute to the progression of cancer is the inappropriate activation of an enzyme called telomerase. Telomeres are special sequences at the ends of eukaryotic chromosomes (see p. 000 in Chapter 11). In DNA replication in somatic cells, DNA polymerases require a 3′-OH group to add new nucleotides. For this reason, the ends of chromosomes cannot be replicated, and telomeres become shorter with each cell division. This shortening eventually leads to the destruction of the chromosome and cell death; so somatic cells are capable of a limited number of cell divisions. In germ cells, telomerase replicates the chromosome ends (see p. 000 in Chapter 12), thereby maintaining the telomeres, but this enzyme is not normally expressed in somatic cells. In many tumor cells, however, sequences that regulate the expression of the telomerase gene are mutated so that the enzyme is expressed, and the cell is capable of unlimited cell division. Although the expression of telomerase appears to contribute to the development of many cancers, its precise role in tumor progression is still being investigated.

Genes that promote vascularization and the spread of tumors
A final set of factors that contribute to the progression of cancer includes genes that affect the growth and spread of tumors.
Oxygen and nutrients, which are essential to the survival and growth of tumors, are supplied by blood vessels, and the growth of new blood vessels (angiogenesis) is important to tumor progression. Angiogenesis is stimulated by growth factors and other proteins encoded by genes whose expression is carefully regulated in normal cells. In tumor cells, genes encoding these proteins are often overexpressed compared with normal cells, and inhibitors of angiogenesis-promoting factors may be inactivated or underexpressed. At least one inherited cancer syndrome (von Hippel-Lindau disease, in which people develop multiple types of tumors) is caused by the mutation of a gene that affects angiogenesis. In the development of many cancers, the primary tumor gives rise to cells that spread to distant sites, producing secondary tumors. This process of metastasis is the cause of death in 90% of human cancer cases; it is influenced by cellular changes induced by somatic mutation. By using microarrays to measure levels of gene expression (see Chapter 19), researchers have identified several genes that are transcribed at a significantly higher rate in metastatic cells compared with nonmetastatic cells. These genes encode components of the extracellular matrix and the cytoskeleton, which are thought to affect the migration of cells. Other genes that affect metastasis include those encoding adhesion proteins that help hold cells together.

Concepts
Oncogenes are dominant in their action and stimulate cell proliferation. Tumor-suppressor genes are recessive in their action and inhibit cell proliferation. Defects in DNA repair genes allow a higher-than-normal rate of mutation in oncogenes and tumor-suppressor genes. Mutations in genes that control chromosome segregation allow chromosome mutations to accumulate, which may then contribute to cancer progression.
Mutations that allow telomerase to be expressed in somatic cells and that affect vascularization and metastasis also may contribute to cancer progression.

www.whfreeman.com/pierce General information about cancer, the genetics of cancer, and telomerase

The Molecular Genetics of Colorectal Cancer
Mutations that contribute to colorectal cancer have received extensive study, and this cancer is an excellent example of how cancer often arises through the accumulation of successive genetic defects. Colorectal cancers arise in the cells lining the colon and rectum. More than 135,000 new cases of colorectal cancer are diagnosed in the United States each year, where this cancer is responsible for more than 56,000 deaths annually. If detected early, colorectal cancer can be treated successfully; consequently, there has been much interest in identifying the molecular events responsible for the initial stages of colorectal cancer. Colorectal cancer is thought to originate as benign tumors called adenomatous polyps. Initially, these polyps are microscopic, but in time they enlarge, and the cells of the polyp acquire the abnormal characteristics of cancer cells. In the later stages of the disease, the tumor may invade the muscle layer surrounding the gut and metastasize. The progression of the disease is slow; from 10 to 35 years may be required for a benign tumor to develop into a malignant tumor. Most cases of colorectal cancer are sporadic, developing in people with no family history of the disease, but a few families display a clear genetic predisposition to this disease. In one form of hereditary colon cancer, known as familial adenomatous polyposis coli, hundreds or thousands of polyps develop in the colon and rectum; if these polyps are not removed, one or more almost invariably becomes malignant.
Because polyps and tumors of the colon and rectum can be easily observed and removed with a colonoscope (a fiber optic instrument that is used to view the interior of the rectum and colon), much is known about the progression of colorectal cancer, and some of the genes responsible for its clonal evolution have been identified. About 75% of colorectal cancers have mutations in tumor-suppressor gene p53, and many also have a mutation in the ras proto-oncogene. Families with adenomatous polyposis coli carry a defect in a gene called APC, and mutations in APC are found in the cells of tumors that arise sporadically (in persons without a family history). Additional genes that are frequently mutated in colorectal cancer include the oncogenes myc and neu and the tumor-suppressor gene HNPCC. Mutations in these genes are responsible for the different steps of colorectal cancer progression. One of the earliest steps is a mutation that inactivates the APC gene, which increases the rate of cell division, leading to polyp formation ( ◗ FIGURE 21.25). A person with familial adenomatous polyposis coli inherits one defective copy of the APC gene, and defects in this gene are associated with the numerous polyps that appear in those who have this disorder. Mutations in APC are also found in the polyps that develop in people who do not have adenomatous polyposis coli. Mutations of the ras oncogene usually occur later, in larger polyps comprising cells that have acquired some genetic mutations. The protein produced by the normal ras proto-oncogene sits inside the cell membrane. From there it relays signals from growth factors that stimulate cell division. When ras is mutated, the protein that it encodes continually relays a stimulatory signal for cell division, even when growth factor is absent. Mutations in p53 and other genes appear still later in tumor progression; these mutations are rare in polyps but common in malignant cells.
Because p53 prevents the replication of cells with genetic damage and controls proper chromosome segregation, mutations in p53 may allow a cell to rapidly acquire further gene and chromosome mutations, which then contribute to further proliferation and invasion into surrounding tissues. The sequence of steps just outlined is not the only route to colorectal cancer, and the mutations need not occur in the order presented here, but this sequence is a common pathway by which colon and rectal cells become cancerous.

◗ 21.25 Mutations in multiple genes contribute to the progression of colorectal cancer. [Figure labels: Section through normal colon, normal cells. Loss of normal tumor-suppressor gene APC: 1 A polyp (small growth) forms on the colon wall. 2 A benign, precancerous tumor grows. Activation of oncogene ras: 3 An adenoma (benign tumor) grows. Loss of tumor-suppressor gene p53: 4 A carcinoma (malignant tumor) develops. Other changes; loss of antimetastasis gene: 5 The cancer metastasizes (spreads to other tissue through the bloodstream).]

Connecting Concepts Across Chapters
This chapter has focused on three specialized but important topics: the genetics of development, the immune system, and cancer. In addition to their relevance to genetics, these topics have obvious medical importance and all are the subject of intense research. The results of early experiments demonstrated that genes are not usually lost or permanently altered in the course of development; rather, development proceeds through the regulation of gene expression. The basic question for development is how are different sets of genes expressed in different parts of the embryo? Our study of pattern formation in Drosophila revealed that many genes take part and that they are regulated in a highly sequential manner.
The process is initiated by maternally produced mRNA and proteins that become localized to particular regions of the egg. Sets of genes are successively activated, each set controlling the expression of other sets, so that successively smaller regions of the embryo are determined. The immune system also is encoded by a complex set of genes whose products interact closely. Unlike those in pattern development, genes encoding antibodies and T-cell receptors are permanently altered in lymphocyte maturation. Lymphocytes violate the general principle that all cells contain the same set of genetic information. Cancer also is influenced by complex interactions among multiple genes. Paradoxically, cancer is fundamentally a genetic disease, but most cancers are not inherited, because cancer usually requires somatic mutations at multiple genes. Even for those cancers for which a predisposition is clearly inherited, additional somatic mutations are required for cancer to arise. These mutations, each rare, accumulate because they provide the cell with a growth advantage. This chapter has synthesized much of the information provided in preceding chapters. Gene regulation (Chapter 16) is the basis of development, the understanding of which also requires knowledge of genetic maternal effects (Chapter 5), transcription (Chapter 13), and translation (Chapter 15). The rearrangement of segments in genes of the immune system builds on our understanding of recombination (Chapter 12) and RNA processing (Chapter 14). Chromosome and gene mutations (Chapters 9 and 17) are essential to understanding cancer progression. Many oncogenes and tumor-suppressor genes control the cell cycle (Chapter 2), and predisposition to some cancers may be inherited as single-gene traits (Chapter 3). Cancer may also entail mutations in DNA repair genes (Chapter 17), genes affecting chromosome segregation (Chapter 2), and the regulation of telomerase (Chapter 12). 
Recombinant DNA techniques (Chapter 18) have contributed tremendously to our understanding of all of these processes.

The New Genetics ETHICS • SCIENCE • TECHNOLOGY
Breast Cancer

Scientists in a medical genetics program of a major university medical school enroll a 54-year-old woman with metastatic breast cancer into a research protocol. The patient reports that several of her maternal relatives have breast cancer, but no pathological specimens from affected relatives are available for verifying the diagnoses. The patient dies before research studies are completed. The patient has two daughters who are identical twins. Shortly after her mother’s death, one of the twins requests access to her mother’s test results. She explains that she wants this information because it will help her learn whether she carries the same mutation that might have contributed to her mother’s disease. She has been informed that laboratories will not test for a mutation in a person before the identification of a known cancer-causing mutation in another family member affected by breast cancer. She wishes to learn whether she has a mutation because she is considering a prophylactic mastectomy to reduce her risk of developing breast cancer. The research team learns from the head of the Institutional Review Board that there are no legal obstacles to this request, because the legal rights of the mother are not being compromised: she is deceased, and the deceased are not considered research subjects under existing federal regulations. On discovering her sister’s intention to request their mother’s results, the second twin objects to the release of their mother’s genetic information and says that she does not want to know whether she has inherited a greater risk of developing breast cancer. However, she adds, there is no way that the information could be kept from her, because she would inevitably learn of her sister’s decision to have surgery.
What should the research team do?

Ron Green

The case raises questions that are common in genetic research and clinical care today. Because of the nature and complexity of the questions raised, many of them require interdisciplinary examination. The GenEthics Consortium (GEC) was formed to bring scientists, bioethicists, lawyers, genetic counselors, and consumers together to discuss ethical issues emerging from research associated with the Human Genome Project. In the late 1990s, the GEC convened to consider this particular case. In the course of their discussion, some members expressed the opinion that the clinical setting should focus on meeting the needs of the individual person, and the research setting should focus on gathering evidence to support a hypothesis. Others disagreed, arguing that the results of genetic testing in research settings often have clinical implications for subjects.

No explicit guidance was given by the mother. Should researchers release this information to a child who requests it? Some argued that the fact that the mother’s DNA was being tested for mutations that are markers for breast cancer implies that the mother was not opposed to this type of testing. Some also stated that we can safely presume that the mother expected to share that information with her husband and children. But whether children have a right to access the mother’s test results was not the only issue that merited discussion. What should clinicians and researchers do when family members disagree about whether such information should be made available? Some felt that the sister who first came to the researchers with a request for her mother’s genetic information has a privileged position. Others objected strenuously, saying that one’s right to information in complex cases such as this one cannot merely be a matter of who gets to the doctor first.
Still other issues arose in the course of this discussion, most of which fell under the umbrella question of whether the risks of breast cancer for the twins could really be determined with precision. Persons in high-risk families with hereditary breast cancer and cancer-causing mutations face a much greater than average lifetime risk. However, in the absence of a thorough family history, the presence of a certain mutation and breast cancer in the mother alone do not provide the basis for assuming the existence of “hereditary” breast cancer in the family. A second major area of uncertainty was highlighted when genetic professionals from different institutions disagreed about the statistics regarding risk reduction from prophylactic mastectomy. Some argued that the weighing of benefits and harms must include a comparison of the protection of one twin from the risk of premature death with the risk of psychological distress for the second twin. On this view, the research team would be morally justified in choosing the action that is most likely to reduce the risk of death by cancer. Others disagreed, citing the inconclusive scientific evidence that mastectomy is a valid risk-reduction strategy. Although there was no agreement on how genetic professionals should respond to this situation, there was a sense that, at this stage of genetic research, individual professionals might ethically and responsibly come to different conclusions about what they would do when faced with competing requests of this sort.

CONCEPTS SUMMARY

• Each multicellular organism begins as a single cell that has the potential to develop into any cell type. As development proceeds, cells become committed to particular fates. The results of early cloning experiments demonstrated that this process arises from differential gene expression.
• In the early Drosophila embryo, determination is effected through a cascade of gene control. • The dorsal– ventral and anterior – posterior axes of the Drosophila embryo are established by egg-polarity genes. These genes are expressed in the female parent and produce RNA and proteins that are deposited in the egg cytoplasm. Initial differences in the distribution of these molecules regulate gene expression in various parts of the embryo. The dorsal – ventral axis is defined by a concentration gradient of the Dorsal protein, and the anterior – posterior axis is defined by concentration gradients of Bicoid and Nanos proteins. • After the establishment of the major axes of development, three types of segmentation genes act sequentially to determine the number and organization of the embryonic segments in Drosophila. The gap genes establish large sections of the embryo, the pair-rule genes affect alternate segments, and the segment-polarity genes affect the organization of individual segments. • Homeotic genes then define the identity of individual Drosophila segments. All these genes contain a consensus sequence called a homeobox that encodes a DNA-binding domain; the products of homeotic genes are DNA-binding proteins that regulate the expression of other genes. Genes with homeoboxes are found in many other organisms. • Apoptosis, or programmed cell death, plays an important role in the development of many animals. In apoptosis, DNA is degraded, the nucleus and cytoplasm shrink, and the cell undergoes phagocytosis by other cells. Apoptosis is a highly regulated process that depends on caspases — proteins that cleave proteins. Each caspase is originally synthesized as an inactive precursor that must be activated, often through cleavage by another caspase. • The immune system is the primary defense network in vertebrates. In humoral immunity, B cells produce antibodies that bind foreign antigens; in cellular immunity, T cells attack cells carrying foreign antigens. 
• Each B and T cell is capable of binding only one type of foreign antigen. There are vast numbers of different types of B and T cells, and any potential antigen can be bound. When a lymphocyte binds to an antigen, the lymphocyte divides and gives rise to a clone of cells, each specific for that same antigen. This process is a primary immune response. A few memory cells remain in circulation for long periods of time. If the same antigen is encountered again, memory cells can proliferate rapidly and generate a secondary immune response.
• Immunoglobulins (antibodies) consist of two light chains and two heavy chains, each containing variable and constant regions. Light chains are of two basic types: kappa and lambda chains.
• The genes that encode the immunoglobulin chains consist of several types of gene segments; germ-line DNA contains multiple copies of these gene segments, which differ slightly in sequence. In B-cell maturation, somatic recombination randomly brings together one version of each segment to produce a single complete gene. Many combinations of the different segments are possible.
• The potential for diversity of antibodies is further increased by the random addition and deletion of nucleotides at the junctions of the segments. A high mutation rate also increases the potential diversity of antibodies.
• T-cell receptors are composed of alpha and beta chains. The germ-line genes for these proteins consist of segments with multiple varying copies. Somatic recombination allows many different types of T-cell receptors in different cells. Junctional diversity also adds to T-cell receptor variability.
• The major histocompatibility complex encodes a number of histocompatibility antigens. Each T cell simultaneously binds a foreign antigen and a host MHC antigen. The MHC antigen allows the immune system to distinguish self from nonself. Each locus for the MHC contains many alleles.
• Cancer is fundamentally a genetic disorder, arising from somatic mutations in multiple genes that affect cell division and proliferation. If one or more mutations are inherited, then fewer additional mutations are required for cancer to develop.
• A mutation that allows a cell to divide rapidly provides the cell with a growth advantage; this cell gives rise to a clone of cells with the same mutation. Within this clone, other mutations occur that provide additional growth advantages, and cells with these additional mutations become dominant in the clone. In this way, the clone evolves.
• Environmental factors play an important role in the development of many cancers by increasing the rate of somatic mutations.
• Several types of genes contribute to cancer progression. Oncogenes are dominant mutated copies of genes that normally stimulate cell division. Tumor-suppressor genes normally inhibit cell division; recessive mutations in these genes may contribute to cancer. Oncogenes and tumor-suppressor genes often control the cell cycle or regulate apoptosis.
• Defects in DNA repair genes and genes that control chromosome segregation often increase the overall mutation rate of other genes, leading to defects in proto-oncogenes and tumor-suppressor genes that may contribute to cancer progression.
• Mutations in sequences that regulate telomerase, an enzyme that replicates the ends of chromosomes, are often associated with cancer. Telomerase allows cells to divide indefinitely but is not usually expressed in somatic cells. Mutations in tumor cells allow telomerase to be expressed.
• Tumor progression is also affected by mutations in genes that promote vascularization and the spread of tumors.
• Colorectal cancer offers a model system for understanding tumor progression in humans. Initial mutations stimulate cell division, leading to a small benign polyp.
Additional mutations allow the polyp to enlarge, invade the muscle layer of the gut, and eventually spread to other sites. Mutations in particular genes affect different stages of this progression.

IMPORTANT TERMS

totipotent, determination, egg-polarity gene, morphogen, segmentation gene, gap gene, pair-rule gene, segment-polarity gene, homeotic gene, homeobox, Antennapedia complex, bithorax complex, homeotic complex, Hox gene, apoptosis, caspase, antigen, autoimmune disease, humoral immunity, B cell, antibody, cellular immunity, T cell, T-cell receptor, major histocompatibility complex (MHC) antigen, theory of clonal selection, primary immune response, memory cell, secondary immune response, somatic recombination, junctional diversity, somatic hypermutation, malignant tumor, metastasis, clonal evolution, oncogene, tumor-suppressor gene, proto-oncogene

Worked Problems

1. If a fertilized Drosophila egg is punctured at the anterior end and a small amount of cytoplasm is allowed to leak out, what will be the most likely effect on the development of the fly embryo?

• Solution The egg-polarity genes determine the major axes of development in the Drosophila embryo. One of these genes is bicoid, which is transcribed in the maternal parent. As bicoid mRNA passes into the egg, the mRNA becomes anchored to the anterior end of the egg. After the egg is laid, bicoid mRNA is translated into Bicoid protein, which forms a concentration gradient along the anterior–posterior axis of the embryo. The high concentration of Bicoid protein at the anterior end induces the development of anterior structures such as the head of the fruit fly.
If the anterior end of the egg is punctured, cytoplasm containing high concentrations of Bicoid protein will leak out, reducing the concentration of Bicoid protein at the anterior end. The result will be that the embryo fails to develop head and thoracic structures at the anterior end.

2. In some cancer cells, a specific gene has become duplicated many times. Is this gene likely to be an oncogene or a tumor-suppressor gene? Explain your reasoning.

• Solution The gene is likely to be an oncogene. Oncogenes stimulate cell proliferation and act in a dominant manner. Therefore, extra copies of an oncogene will result in cell proliferation and cancer. Tumor-suppressor genes, on the other hand, suppress cell proliferation and act in a recessive manner; a single copy of a tumor-suppressor gene is sufficient to prevent cell proliferation. Therefore extra copies of the suppressor gene will not lead to cancer.

3. The immunoglobulin molecules of a particular mammalian species have kappa and lambda light chains and heavy chains. The kappa gene consists of 250 V and 8 J segments. The lambda gene contains 200 V and 4 J segments. The gene for the heavy chain consists of 300 V, 8 J, and 4 D segments. Considering just somatic recombination and random combinations of light and heavy chains, how many different types of antibodies can be produced by this species?

• Solution For the kappa light chain, there are 250 × 8 = 2000 combinations; for the lambda light chain, there are 200 × 4 = 800 combinations; so a total of 2000 + 800 = 2800 different types of light chains are possible. For the heavy chains, there are 300 × 8 × 4 = 9600 possible types. Any of the 2800 light chains can combine with any of the 9600 heavy chains; so there are 2800 × 9600 = 26,880,000 different types of antibodies possible from somatic recombination and random combination alone. Junctional diversity and somatic hypermutation would greatly increase this diversity.

COMPREHENSION QUESTIONS

*1.
What experiments suggested that genes are not lost or permanently altered in development?
2. Briefly explain how the Dorsal protein is redistributed in the formation of the Drosophila embryo and how this redistribution helps to establish the dorsal–ventral axis of the early embryo.
*3. Briefly describe how the bicoid and nanos genes help to determine the anterior–posterior axis of the fruit fly.
*4. List the three major classes of segmentation genes and outline the function of each.
5. What role do homeotic genes play in the development of fruit flies?
*6. What is apoptosis and how is it regulated?
*7. Explain how each of the following processes contributes to antibody diversity. (a) Somatic recombination. (b) Junctional diversity. (c) Hypermutation.
8. What is the function of the MHC antigens? Why are the genes that encode these antigens so variable?
*9. Outline Knudson’s multistage theory of cancer and describe how it helps to explain unilateral and bilateral cases of retinoblastoma.
10. Briefly explain how cancer arises through clonal evolution.
*11. What is the difference between an oncogene and a tumor-suppressor gene? Give some examples of the functions of proto-oncogenes and tumor suppressors in normal cells.
12. Why do mutations in genes that encode DNA repair enzymes and in genes that affect chromosome segregation often produce a predisposition to cancer?
*13. What role do telomeres and telomerase play in cancer progression?

APPLICATION QUESTIONS AND PROBLEMS

14. If telomeres are normally shortened after each round of replication in somatic cells, what prediction would you make about the length of telomeres in Dolly, the first cloned sheep?
*15. Give examples of genes that affect development in fruit flies by regulating gene expression at the level of (a) transcription and (b) translation.
16.
What would be the most likely effect on development of puncturing the posterior end of a Drosophila egg, allowing a small amount of cytoplasm to leak out, and then injecting that cytoplasm into the anterior end of another egg?
*17. What would be the most likely result of injecting bicoid mRNA into the posterior end of a Drosophila embryo and inhibiting the translation of nanos mRNA?
18. What would be the most likely effect of inhibiting the translation of hunchback mRNA throughout the embryo?
*19. Molecular geneticists have performed experiments in which they altered the number of copies of the bicoid gene in flies, affecting the amount of Bicoid protein produced. (a) What would be the effect on development of an increased number of copies of the bicoid gene? (b) What would be the effect of a decreased number of copies of bicoid? Justify your answers.
20. What would be the most likely effect on fruit-fly development of a deletion in the nanos gene?
21. Give an example of a gene found in each of the categories of genes (egg-polarity, gap, pair-rule, and so forth) listed in Figure 21.12.
*22. In a particular species, the gene for the kappa light chain has 200 V gene segments and 4 J segments. In the gene for the lambda light chain, this species has 300 V segments and 6 J segments. Considering only the variability arising from somatic recombination, how many different types of light chains are possible?
23. In the fictional book Chromosome 6 by Robin Cook, a biotechnology company genetically engineers individual bonobos (a type of chimpanzee) to serve as future organ donors for clients. The genes of the bonobos are altered so that no tissue rejection takes place when their organs are transplanted into a client. What genes would need to be altered for this scenario to work? Explain your answer.
*24. A couple has one child with bilateral retinoblastoma. The mother is free from cancer, but the father had unilateral retinoblastoma and he has a brother who has bilateral retinoblastoma. (a) If the couple has another child, what is the probability that this next child will have retinoblastoma? (b) If the next child has retinoblastoma, is it likely to be bilateral or unilateral? (c) Propose an explanation for why the father’s case of retinoblastoma was unilateral, whereas his son’s and brother’s cases are bilateral.
25. Some cancers are consistently associated with the deletion of a particular part of a chromosome. Does the deleted region contain an oncogene or a tumor-suppressor gene? Explain why.
26. Cells in a tumor contain mutated copies of a particular gene that promotes tumor growth. Gene therapy can be used to introduce a normal copy of this gene into the tumor cells. Would you expect this therapy to be effective if the mutated gene were an oncogene? A tumor-suppressor gene? Explain your reasoning.

CHALLENGE QUESTIONS

27. As we have learned in this chapter, the Nanos protein inhibits the translation of hunchback mRNA, thus lowering the concentration of Hunchback protein at the posterior end of a fruit-fly embryo and stimulating the differentiation of posterior characteristics. The results of experiments have demonstrated that the action of Nanos on hunchback mRNA depends on the presence of an 11-base sequence that is located in the 3′ untranslated region of hunchback mRNA. This sequence has been termed the Nanos response element (NRE). There are two copies of the NRE in the trailer of hunchback mRNA. If a copy of the NRE is added to the 3′ untranslated region of another mRNA produced by a different gene, that mRNA becomes repressed by Nanos. The repression is greater if several NREs are added. On the basis of these observations, propose a mechanism for how Nanos inhibits Hunchback translation.
28. Offer a possible explanation for the widespread distribution of Hox genes among animals.
29.
Many cancer cells are immortal (will divide indefinitely) because they have mutations that allow telomerase to be expressed. How might this knowledge be used to design anticancer drugs?

22 Quantitative Genetics

◗ Quantitative genetic methods are being used to improve and predict racing speed in thoroughbred horses.

• Thoroughbred Winners Through Quantitative Genetics • Quantitative Characteristics The Relation Between Genotype and Phenotype Types of Quantitative Characteristics Polygenic Inheritance Kernel Color in Wheat Determining Gene Number for a Polygenic Characteristic • Statistical Methods for Analyzing Quantitative Characteristics Distributions Samples and Populations The Mean The Variance and Standard Deviation Correlation Regression
Applying Statistics to the Study of a Polygenic Characteristic • Heritability Phenotypic Variance Types of Heritability The Calculation of Heritability The Limitations of Heritability Locating Genes That Affect Quantitative Characteristics • Response to Selection Predicting the Response to Selection Limits to Selection Response Correlated Responses

Thoroughbred Winners Through Quantitative Genetics

For more than 300 years, thoroughbred horses have been raised for a single purpose: to win at the racetrack. The origin of these horses can be traced to a small group that was imported to England from North Africa and the Middle East in the 1600s. The population of racing horses remained small until the 1800s, when horse racing became increasingly popular; today there are approximately half a million thoroughbred horses worldwide. Breeding and racing thoroughbred horses is a multibillion-dollar industry that relies on the premise that a horse’s speed is inherited. Speed is not, however, a simple genetic characteristic such as seed shape in peas. Numerous genes and nongenetic factors such as diet, training, and the jockey who rides the horse all contribute to a horse’s success or failure in a race. The inheritance of racing speed in thoroughbreds is more complex than that of any of the characteristics that we have studied up to this point.

Can the inheritance of a complex characteristic such as racing speed be studied? Is it possible to predict the speed of a horse on the basis of its pedigree? The answers are yes, at least in part, but these questions cannot be addressed with the methods that we used for simple genetic characteristics. Instead, we must use statistical procedures that have been developed for analyzing complex characteristics. The genetic analysis of complex characteristics such as racing speed of thoroughbreds is known as quantitative genetics.

Although the mathematical methods for analyzing complex characteristics may seem imposing at first, most people can intuitively grasp the underlying logic of quantitative genetics. We all recognize family resemblance: we talk about inheriting our father’s height or our mother’s intelligence. Family resemblance lies at the heart of the statistical methods used in quantitative genetics. When genes influence a characteristic, related individuals resemble one another more than unrelated individuals. Closely related individuals (such as siblings) should resemble one another more than distantly related individuals (such as cousins). Comparing individuals with different degrees of relatedness, then, provides information about the extent to which genes influence a characteristic.

This type of analysis has been applied to the inheritance of racing speed in thoroughbreds. In 1988, Patrick Cunningham and his colleagues examined records of more than 30,000 3-year-old horses that raced between 1961 and 1985. They reasoned that, if genes influence racing success, a horse’s racing success should be more similar to that of its parents than to that of unrelated horses. Similarly, the racing speeds of half-brothers and half-sisters should be more similar than the speeds of unrelated horses are. When Cunningham and his colleagues statistically analyzed the racing records for thoroughbreds, they found that a considerable amount of variation in track performance was due to genetic differences; that is, racing speed is heritable. With the use of statistics, it is possible to estimate, with some degree of accuracy, the track performance of a horse from the performance of its relatives.

This chapter is about the genetic analysis of complex characteristics such as racing speed. We begin by considering the differences between quantitative and qualitative characteristics and why the expression of some characteristics varies continuously. We’ll see how quantitative characteristics are often influenced by many genes, each of which has a small effect on the phenotype. Next, we will examine statistical procedures for describing and analyzing quantitative characteristics. We will consider the question of how much of phenotypic variation can be attributed to genetic and environmental influences and will conclude by looking at the effects of selection on quantitative characteristics. It’s important to recognize that the methods of quantitative genetics are not designed to identify individual genes and genotypes. Rather, the focus is on statistical predictions based on groups of individuals.

www.whfreeman.com/pierce More information on horse genetics, including the genetic basis of coat color, gene mapping, chromosomes, and genetic disorders

Quantitative Characteristics

Qualitative, or discontinuous, characteristics possess only a few distinct phenotypes ( ◗ FIGURE 22.1a); these characteristics are the types studied by Mendel and have been the focus of our attention thus far. However, many characteristics vary continuously along a scale of measurement with many overlapping phenotypes ( ◗ FIGURE 22.1b). They are referred to as continuous characteristics; they are also called quantitative characteristics because any individual’s phenotype must be described with a quantitative measurement. Quantitative characteristics might include height, weight, and blood pressure in humans, growth rate in mice, seed weight in plants, and milk production in cattle.

◗ 22.1 Discontinuous and continuous characteristics differ in the number of phenotypes exhibited. (Both panels plot number of individuals against phenotype, here height. (a) Discontinuous characteristic: a discontinuous (qualitative) characteristic exhibits only a few, easily distinguished phenotypes; the plants are either tall or dwarf. (b) Continuous characteristic: a continuous (quantitative) characteristic exhibits a continuous range of phenotypes; the plants exhibit a wide range of heights.)

Quantitative characteristics arise from two phenomena. First, many are polygenic: they are influenced by genes at many loci. If many loci take part, many genotypes are possible, each producing a slightly different phenotype. Second, quantitative characteristics often arise when environmental factors affect the phenotype, because environmental differences result in a single genotype producing a range of phenotypes. Most continuously varying characteristics are both polygenic and influenced by environmental factors, and these characteristics are said to be multifactorial.

The Relation Between Genotype and Phenotype

For many discontinuous characteristics, there is a relatively straightforward relation between genotype and phenotype. Each genotype produces a single phenotype, and most phenotypes are encoded by a single genotype. Dominance and epistasis may allow two or three genotypes to produce the same phenotype, but the relation remains relatively simple. This simple relation between genotype and phenotype allowed Mendel to decipher the basic rules of inheritance from his crosses with pea plants; it also permits us both to predict the outcome of genetic crosses and to assign genotypes to individuals.

For quantitative characteristics, the relation between genotype and phenotype is often more complex. If the characteristic is polygenic, many different genotypes are possible, several of which may produce the same phenotype. For instance, consider a plant whose height is determined by three loci (A, B, and C), each of which has two alleles. Assume that one allele at each locus (A⁺, B⁺, and C⁺) encodes a plant hormone that causes the plant to grow 1 cm above its baseline height of 10 cm. The second allele at each locus (A⁻, B⁻, and C⁻) encodes no plant hormone and does not contribute to additional height. Considering only the two alleles at a single locus, 3 genotypes are possible (A⁺A⁺, A⁺A⁻, and A⁻A⁻).
If all three loci are taken into account, there are a total of $3^3 = 27$ possible multilocus genotypes (A⁺A⁺B⁺B⁺C⁺C⁺, A⁺A⁻B⁺B⁺C⁺C⁺, etc.). Although there are 27 genotypes, they produce only seven phenotypes (10 cm, 11 cm, 12 cm, 13 cm, 14 cm, 15 cm, and 16 cm in height). Some of the genotypes produce the same phenotype (Table 22.1); for example, genotypes A⁺A⁻B⁻B⁻C⁻C⁻, A⁻A⁻B⁺B⁻C⁻C⁻, and A⁻A⁻B⁻B⁻C⁺C⁻ all have one gene that encodes plant hormone. These genotypes produce one dose of the hormone and a plant that is 11 cm tall. Even in this simple example of only three loci, the relation between genotype and phenotype is quite complex. The more loci encoding a characteristic, the greater the complexity.

The influence of environment on a characteristic also can complicate the relation between genotype and phenotype. Because of environmental effects, the same genotype may produce a range of potential phenotypes (the norm of reaction; see p. 00 in Chapter 5). The phenotypic ranges of different genotypes may overlap, making it difficult to know whether individuals differ in phenotype because of genetic or environmental differences ( ◗ FIGURE 22.2).

Table 22.1 Hypothetical example of plant height determined by pairs of alleles at each of three loci

Doses of plant hormone   Height (cm)   Genotypes
0                        10            A⁻A⁻B⁻B⁻C⁻C⁻
1                        11            A⁺A⁻B⁻B⁻C⁻C⁻, A⁻A⁻B⁺B⁻C⁻C⁻, A⁻A⁻B⁻B⁻C⁺C⁻
2                        12            A⁺A⁺B⁻B⁻C⁻C⁻, A⁻A⁻B⁺B⁺C⁻C⁻, A⁻A⁻B⁻B⁻C⁺C⁺, A⁺A⁻B⁺B⁻C⁻C⁻, A⁺A⁻B⁻B⁻C⁺C⁻, A⁻A⁻B⁺B⁻C⁺C⁻
3                        13            A⁺A⁺B⁺B⁻C⁻C⁻, A⁺A⁺B⁻B⁻C⁺C⁻, A⁺A⁻B⁺B⁺C⁻C⁻, A⁻A⁻B⁺B⁺C⁺C⁻, A⁺A⁻B⁻B⁻C⁺C⁺, A⁻A⁻B⁺B⁻C⁺C⁺, A⁺A⁻B⁺B⁻C⁺C⁻
4                        14            A⁺A⁺B⁺B⁺C⁻C⁻, A⁺A⁺B⁻B⁻C⁺C⁺, A⁻A⁻B⁺B⁺C⁺C⁺, A⁺A⁺B⁺B⁻C⁺C⁻, A⁺A⁻B⁺B⁺C⁺C⁻, A⁺A⁻B⁺B⁻C⁺C⁺
5                        15            A⁺A⁺B⁺B⁺C⁺C⁻, A⁺A⁺B⁺B⁻C⁺C⁺, A⁺A⁻B⁺B⁺C⁺C⁺
6                        16            A⁺A⁺B⁺B⁺C⁺C⁺

Note: Each + allele contributes 1 cm in height above a baseline of 10 cm.

In summary, the simple relation between genotype and phenotype that exists for many qualitative (discontinuous) characteristics is absent in quantitative characteristics, and it is impossible to assign a genotype to an individual on the basis of its phenotype alone.
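The three-locus example can be checked by brute-force enumeration. The following is a minimal Python sketch, not part of the original text: the function and constant names are illustrative, and it assumes exactly what the example assumes (three loci, two alleles each, every + allele adding 1 cm to a 10 cm baseline).

```python
from itertools import product

BASELINE_CM = 10   # baseline height given in the example
CM_PER_DOSE = 1    # each + allele adds 1 cm
LOCI = "ABC"       # the three hypothetical loci

# A single-locus genotype carries 2, 1, or 0 copies of the + allele.
SINGLE_LOCUS = {2: "{0}+{0}+", 1: "{0}+{0}-", 0: "{0}-{0}-"}

def genotypes_by_height():
    """Group every multilocus genotype by the height it produces."""
    groups = {}
    for doses in product((2, 1, 0), repeat=len(LOCI)):
        label = "".join(
            SINGLE_LOCUS[d].format(locus) for locus, d in zip(LOCI, doses)
        )
        height = BASELINE_CM + CM_PER_DOSE * sum(doses)
        groups.setdefault(height, []).append(label)
    return groups

by_height = genotypes_by_height()
n_genotypes = sum(len(labels) for labels in by_height.values())
class_sizes = {h: len(labels) for h, labels in sorted(by_height.items())}

print(n_genotypes, "genotypes,", len(by_height), "phenotypes")
for height, count in class_sizes.items():
    print(height, "cm:", count, "genotype(s)")
```

The enumeration gives 27 genotypes in seven height classes, with most classes containing several genotypes, which is why a genotype cannot be read off from the phenotype alone.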
The methods used for analyzing qualitative characteristics (examining the phenotypic ratios of progeny from a genetic cross) will not work with quantitative characteristics. Our goal remains the same: we wish to make predictions about the phenotypes of offspring produced in a genetic cross. We may also want to know how much of the variation in a characteristic results from genetic differences and how much results from environmental differences. To answer these questions, we must turn to statistical methods that allow us to make predictions about the inheritance of phenotypes in the absence of information about the underlying genotypes.

◗ 22.2 For a quantitative characteristic, each genotype may produce a range of possible phenotypes. In this hypothetical example, the phenotypes produced by genotypes AA, Aa, and aa overlap. It is impossible to know whether an individual with a phenotype in the region of overlap is genotype AA or Aa. (Axes: phenotype, from dwarf to tall, against number of individuals.)

www.whfreeman.com/pierce Information about some current research in quantitative genetics

Types of Quantitative Characteristics

Before we look more closely at polygenic characteristics and relevant statistical methods, we need to more clearly define what is meant by a quantitative characteristic. Thus far, we have considered only quantitative characteristics that vary continuously in a population. A continuous characteristic can theoretically assume any value between two extremes; the number of phenotypes is limited only by our ability to precisely measure the phenotype. Human height is a continuous characteristic because, within certain limits, people can theoretically have any height.
Although the number of phenotypes possible with a continuous characteristic is infinite, we often group similar phenotypes together for convenience; we may say that two people are both 5 feet 11 inches tall, but careful measurement may show that one is slightly taller than the other.

Some characteristics are not continuous but are nevertheless considered quantitative because they are determined by multiple genetic and environmental factors. Meristic characteristics, for instance, are measured in whole numbers. An example is litter size: a female mouse may have 4, 5, or 6 pups but not 4.13 pups. A meristic characteristic has a limited number of distinct phenotypes, but the underlying determination of the characteristic may still be quantitative. These characteristics must therefore be analyzed with the same techniques that we use to study continuous quantitative characteristics.

Another type of quantitative characteristic is a threshold characteristic, which is simply present or absent. Although threshold characteristics exhibit only two phenotypes, they are considered quantitative because they, too, are determined by multiple genetic and environmental factors. The expression of the characteristic depends on an underlying susceptibility (usually referred to as liability or risk) that varies continuously. When the susceptibility is larger than a threshold value, a specific trait is expressed ( ◗ FIGURE 22.3). Diseases are often threshold characteristics because many factors, both genetic and environmental, contribute to disease susceptibility. If enough of the susceptibility factors are present, the disease develops; otherwise, it is absent.

◗ 22.3 Threshold characteristics display only two possible phenotypes (the trait is either present or absent), but they are quantitative because the underlying susceptibility to the characteristic varies continuously. When the susceptibility exceeds a threshold value, the characteristic is expressed.
Although we focus on the genetics of continuous characteristics in this chapter, the same principles apply to many meristic and threshold characteristics.

It is important to note that a characteristic that can be measured on a continuous scale does not necessarily exhibit quantitative variation. One of the characteristics studied by Mendel was height of the pea plant, which can be described by measuring the length of the plant’s stem. However, Mendel’s particular plants exhibited only two distinct phenotypes (some were tall and others short), and these differences were determined by alleles at a single locus. The differences that Mendel studied were therefore discontinuous in nature.

Concepts

Characteristics whose phenotypes vary continuously are called quantitative characteristics. For most quantitative characteristics, the relation between genotype and phenotype is complex. Some characteristics whose phenotypes do not vary continuously also are considered quantitative because they are influenced by multiple genes and environmental factors.

Polygenic Inheritance

The rediscovery of Mendel’s work in 1900 provided a cohesive theory of inheritance, but the characteristics that Mendel studied were all discontinuous. Questions soon arose about the inheritance of continuously varying characteristics. These characteristics had already been the focus of a group of biologists and statisticians, led by Francis Galton, who were known as biometricians. They examined the inheritance of quantitative characteristics such as human height and intelligence by using statistical procedures. The results of these studies showed that quantitative characteristics are inherited, although the mechanism of inheritance was as yet unknown. After Mendel’s work was rediscovered, a bitter dispute broke out about whether Mendel’s principles applied to quantitative characteristics.
Some biometricians argued that the inheritance of quantitative characteristics could not be explained by Mendelian principles, whereas others felt that Mendel’s principles acting on numerous genes (polygenes) could adequately account for the inheritance of quantitative characteristics. This conflict began to be resolved by the work of Wilhelm Johannsen, who showed that continuous variation in the weight of beans was influenced by both genetic and environmental factors. George Udny Yule, a mathematician, proposed in 1906 that several genes acting together could produce continuous characteristics. This hypothesis was later confirmed by Herman Nilsson-Ehle, working on wheat and tobacco, and by Edward East, working on corn. The argument was finally laid to rest in 1918, when Ronald Fisher demonstrated that the inheritance of quantitative characteristics could indeed be explained by the cumulative effects of many genes, each following Mendel’s rules. Kernel Color in Wheat To illustrate how multiple genes acting on a characteristic can produce a continuous range of phenotypes, let us examine one of the first demonstrations of polygenic inheritance. Nilsson-Ehle studied kernel color in wheat and found that the intensity of red pigmentation was determined by three unlinked loci, each of which had two alleles. Nilsson-Ehle obtained several homozygous varieties of wheat that differed in color. Like Mendel, he performed crosses between these homozygous varieties and studied the ratios of phenotypes in the progeny. 
In one experiment, he crossed a variety of wheat that possessed white kernels with a variety that possessed purple (very dark red) kernels and obtained the following results:

P     plants with white kernels × plants with purple kernels
F1    plants with red kernels
F2    1/16 plants with purple kernels
      4/16 plants with dark-red kernels
      6/16 plants with red kernels
      4/16 plants with light-red kernels
      1/16 plants with white kernels

Nilsson-Ehle interpreted this phenotypic ratio as the result of segregation of alleles at two loci. (Although he found alleles at three loci that affected kernel color, the two varieties used in this cross differed only at two of the loci.) He proposed that there were two alleles at each locus: one that produced red pigment and another that produced no pigment. We’ll designate the alleles that encoded pigment A⁺ and B⁺ and the alleles that encoded no pigment A⁻ and B⁻. Nilsson-Ehle recognized that the effects of the genes were additive. Each gene seemed to contribute equally to color; so the overall phenotype could be determined by adding the effects of all the genes, as shown in this table.

Genotype                            Doses of pigment   Phenotype
A⁺A⁺B⁺B⁺                            4                  purple
A⁺A⁺B⁺B⁻, A⁺A⁻B⁺B⁺                  3                  dark red
A⁺A⁺B⁻B⁻, A⁻A⁻B⁺B⁺, A⁺A⁻B⁺B⁻        2                  red
A⁺A⁻B⁻B⁻, A⁻A⁻B⁺B⁻                  1                  light red
A⁻A⁻B⁻B⁻                            0                  white

Notice that the purple and white phenotypes are each encoded by a single genotype, but other phenotypes may result from several different genotypes. From these results, we see that five phenotypes are possible when alleles at two loci influence the phenotype and the effects of the genes are additive. When alleles at more than two loci influence the phenotype, more phenotypes are possible, and this would make the color appear to vary continuously between white and purple. If environmental factors had influenced the characteristic, individuals of the same genotype would vary somewhat in color, making it even more difficult to distinguish between discrete phenotypic classes.
Luckily, environment played little role in determining kernel color in Nilsson-Ehle’s crosses, and only a few loci encoded color; so Nilsson-Ehle was able to distinguish among the different phenotypic classes. This ability allowed him to see the Mendelian nature of the characteristic.

Let’s now see how Mendel’s principles explain the ratio obtained by Nilsson-Ehle in his F2 progeny. Remember that Nilsson-Ehle crossed a homozygous purple variety (A⁺A⁺B⁺B⁺) with the homozygous white variety (A⁻A⁻B⁻B⁻), producing F1 progeny that were heterozygous at both loci (A⁺A⁻B⁺B⁻). All the F1 plants possessed two pigment-producing alleles that allowed two doses of color to make red kernels. The types and proportions of progeny expected in the F2 can be found by applying Mendel’s principles of segregation and independent assortment.

Let’s first examine the effects of each locus separately. At the first locus, two heterozygous F1s are crossed (A⁺A⁻ × A⁺A⁻). As we learned in Chapter 3, when two heterozygotes are crossed, we expect progeny in the proportions 1/4 A⁺A⁺, 1/2 A⁺A⁻, and 1/4 A⁻A⁻. At the second locus, two heterozygotes also are crossed, and again we expect progeny in the proportions 1/4 B⁺B⁺, 1/2 B⁺B⁻, and 1/4 B⁻B⁻.

To obtain the probability of combinations of genes at both loci, we must use the multiplication rule of probability (p. 000 in Chapter 3), which is based on Mendel’s principle of independent assortment. The expected proportion of F2 progeny with genotype A⁺A⁺B⁺B⁺ is the product of the probability of obtaining genotype A⁺A⁺ (1/4) and the probability of obtaining genotype B⁺B⁺ (1/4), or 1/4 × 1/4 = 1/16 ( ◗ FIGURE 22.4). The probabilities of each of the phenotypes can then be obtained by adding the probabilities of all the genotypes that produce that phenotype. For example, the red phenotype is produced by three genotypes:
Genotype      Probability
A⁺A⁺B⁻B⁻      1/16
A⁻A⁻B⁺B⁺      1/16
A⁺A⁻B⁺B⁻      1/4

Thus, the overall probability of obtaining red kernels in the F2 progeny is 1/16 + 1/16 + 1/4 = 6/16. Figure 22.4 shows that the phenotypic ratio expected in the F2 is 1/16 purple, 4/16 dark red, 6/16 red, 4/16 light red, and 1/16 white. This phenotypic ratio is precisely what Nilsson-Ehle observed in his F2 progeny, demonstrating that the inheritance of a continuously varying characteristic such as kernel color indeed follows Mendel’s basic principles.

Figure 22.4 (branch diagram): the F1 cross A⁺A⁻B⁺B⁻ × A⁺A⁻B⁺B⁻ is broken into two simple crosses (A⁺A⁻ × A⁺A⁻ and B⁺B⁻ × B⁺B⁻), and the results are combined to give the F2 generation:

F2 genotype   Probability         Number of pigment genes   Phenotype
A⁺A⁺B⁺B⁺      1/4 × 1/4 = 1/16    4                         purple
A⁺A⁺B⁺B⁻      1/4 × 1/2 = 2/16    3                         dark red
A⁺A⁺B⁻B⁻      1/4 × 1/4 = 1/16    2                         red
A⁺A⁻B⁺B⁺      1/2 × 1/4 = 2/16    3                         dark red
A⁺A⁻B⁺B⁻      1/2 × 1/2 = 4/16    2                         red
A⁺A⁻B⁻B⁻      1/2 × 1/4 = 2/16    1                         light red
A⁻A⁻B⁺B⁺      1/4 × 1/4 = 1/16    2                         red
A⁻A⁻B⁺B⁻      1/4 × 1/2 = 2/16    1                         light red
A⁻A⁻B⁻B⁻      1/4 × 1/4 = 1/16    0                         white

Nilsson-Ehle’s crosses demonstrated that the difference between the inheritance of genes influencing quantitative characteristics and the inheritance of genes influencing discontinuous characteristics is in the number of loci that determine the characteristic. When multiple loci affect a character, more genotypes are possible; so the relation between the genotype and the phenotype is less obvious. As the number of loci affecting a character increases, the number of phenotypic classes in the F2 increases ( ◗ FIGURE 22.5).
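The F2 ratio can also be derived mechanically: apply the multiplication rule locus by locus, then add the probabilities of genotypes that carry the same number of pigment alleles. The sketch below is illustrative Python rather than anything from the original text. It assumes unlinked loci with equal, additive effects, as in Nilsson-Ehle’s cross, and the function name is hypothetical.

```python
from fractions import Fraction

# Single-locus F2 of a heterozygote x heterozygote cross:
# number of + alleles carried -> probability (principle of segregation).
SINGLE_LOCUS_F2 = {2: Fraction(1, 4), 1: Fraction(1, 2), 0: Fraction(1, 4)}

def f2_dose_distribution(n_loci):
    """Probability of each total number of + alleles in the F2,
    assuming n_loci unlinked loci with equal, additive effects."""
    dist = {0: Fraction(1)}
    for _ in range(n_loci):
        combined = {}
        for total, p_total in dist.items():
            for dose, p_dose in SINGLE_LOCUS_F2.items():
                # Multiplication rule across independent loci; genotypes
                # that give the same total dose are then added together.
                key = total + dose
                combined[key] = combined.get(key, Fraction(0)) + p_total * p_dose
        dist = combined
    return dict(sorted(dist.items(), reverse=True))

KERNEL_COLORS = {4: "purple", 3: "dark red", 2: "red", 1: "light red", 0: "white"}

two_loci = f2_dose_distribution(2)
for dose, prob in two_loci.items():
    print(f"{prob.numerator * 16 // prob.denominator}/16 {KERNEL_COLORS[dose]}")

five_loci = f2_dose_distribution(5)
print(len(five_loci), "phenotypic classes with five loci")
```

For two loci this reproduces the 1/16 : 4/16 : 6/16 : 4/16 : 1/16 ratio. With n loci the same calculation gives 2n + 1 phenotypic classes, so the classes multiply quickly as loci are added.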
Several conditions of Nilsson-Ehle’s crosses greatly simplified the polygenic inheritance of kernel color and made it possible for him to recognize the Mendelian nature of the characteristic. First, genes affecting color segregated at only two or three loci. If genes at many loci had been segregating, he would have had difficulty in distinguishing the phenotypic classes. Second, the genes affecting kernel color had strictly additive effects, making the relation between genotype and phenotype simple. Third, environment played almost no role in the phenotype; had environmental factors modified the phenotypes, distinguishing between the five phenotypic classes would have been difficult. Finally, the loci that Nilsson-Ehle studied were not linked; so the genes assorted independently. Nilsson-Ehle was fortunate: for many polygenic characteristics, these simplifying conditions are not present and Mendelian inheritance of these characteristics is not obvious.

◗ 22.4 Nilsson-Ehle demonstrated that kernel color in wheat is inherited according to Mendelian principles. He crossed two varieties of wheat that differed in pairs of alleles at two loci affecting kernel color. A purple strain (A⁺A⁺B⁺B⁺) was crossed with a white strain (A⁻A⁻B⁻B⁻), and the F1 was intercrossed to produce F2 progeny. The ratio of phenotypes in the F2 can be determined by breaking the dihybrid cross into two simple single-locus crosses and combining the results with the multiplication rule. F2 ratio (frequency, number of pigment genes, phenotype): 1/16, 4, purple; 4/16, 3, dark red; 6/16, 2, red; 4/16, 1, light red; 1/16, 0, white. Conclusion: Polygenic characteristics are inherited according to Mendel’s principles.

◗ 22.5 The results of crossing individuals heterozygous for different numbers of loci affecting a characteristic. Panels: one locus (Aa × Aa); two loci (AaBb × AaBb); five loci (AaBbCcDdEe × AaBbCcDdEe); many loci. As the number of loci affecting the trait increases, the number of phenotypic classes increases. (Axes: phenotype classes against relative number of progeny.)

Determining Gene Number for a Polygenic Characteristic

When two individuals homozygous for different alleles at a single locus are crossed (A1A1 × A2A2) and the resulting F1 are interbred (A1A2 × A1A2), one-fourth of the F2 should be homozygous like each of the original parents. If the original parents are homozygous for different alleles at two loci, as are those in Nilsson-Ehle’s crosses, then 1/4 × 1/4 = 1/16 of the F2 should resemble one of the original homozygous parents. Generally, $(1/4)^n$ will be the proportion of individuals in the F2 progeny that should resemble each of the original homozygous parents, where n equals the number of loci with a segregating pair of alleles that affects the characteristic. This equation provides us with a possible means of determining the number of loci influencing a quantitative characteristic.

To illustrate the use of this equation, assume that we cross two different homozygous varieties of pea plants that differ in height by 16 cm, interbreed the F1, and find that approximately 1/256 of the F2 are similar to one of the original homozygous parental varieties. This outcome would suggest that 4 loci with segregating pairs of alleles ($1/256 = (1/4)^4$) are responsible for the height difference between the two varieties. Because the two homozygous strains differ in height by 16 cm and there are 4 loci each with two alleles (8 alleles in all), each of the alleles contributes 16 cm/8 = 2 cm in height.

This method for determining the number of loci affecting phenotypic differences requires the use of homozygous strains, which may be difficult to obtain in some organisms. It also assumes that all the genes influencing the characteristic have equal effects, that their effects are additive, and that the loci are unlinked. For many polygenic characteristics, these assumptions are not valid, so this method of determining the number of genes affecting a characteristic has limited application.

Concepts

The principles that determine the inheritance of quantitative characteristics are the same as the principles that determine the inheritance of discontinuous characteristics, but more genes take part in the determination of quantitative characteristics.

Statistical Methods for Analyzing Quantitative Characteristics

Because quantitative characteristics are described by a measurement and are influenced by multiple factors, their inheritance must be analyzed statistically. This section will explain the basic concepts of statistics that are used to analyze quantitative characteristics.

Distributions

Understanding the genetic basis of any characteristic begins with a description of the numbers and kinds of phenotypes present in a group of individuals. Phenotypic variation in a group, such as the progeny of a cross, can be conveniently represented by a frequency distribution, which is a graph of the frequencies (numbers or proportions) of the different phenotypes ( ◗ FIGURE 22.6). In a typical frequency distribution, the phenotypic classes are plotted on the horizontal (x) axis and the numbers (or proportions) of individuals in each class on the vertical (y) axis. Unlike qualitative characteristics ( ◗ FIGURE 22.6a), quantitative characteristics often exhibit many phenotypes, so a frequency distribution is a concise method of summarizing them all ( ◗ FIGURE 22.6b).

Connecting the points of a frequency distribution with a line creates a curve that is characteristic of the distribution ( ◗ FIGURE 22.7). Many quantitative characteristics exhibit a symmetrical (bell-shaped) curve called a normal distribution ( ◗ FIGURE 22.7a). Normal distributions arise when a large number of independent factors contribute to a measurement. Quantitative characteristics are frequently affected by numerous genes and environmental factors; so their phenotypes often exhibit normal distributions. Two other common types of distributions (skewed and bimodal) are illustrated in ◗ FIGURE 22.7b and c.

◗ 22.6 A frequency distribution is a graph that displays the number or proportion of different phenotypes. Phenotypic values are plotted on the horizontal axis and the numbers (or proportions) of individuals in each class are plotted on the vertical axis. (a) Qualitative (discontinuous) characteristic: phenotype color (white, pink, red). (b) Quantitative (continuous) characteristic: phenotype (body weight).

◗ 22.7 Distributions of phenotypes may assume several different shapes. (a) Sugar beet percentage of sucrose: this type of symmetrical (bell-shaped) distribution is called a normal distribution. (b) Squash fruit length: the distribution of fruit length among the F2 progeny is skewed to the right. (c) Earwig forceps length: a distribution with two peaks is bimodal. (Vertical axes: frequency in percent.)

Samples and Populations

Biologists frequently need to describe the distribution of phenotypes exhibited by some group of individuals. We might want to describe the height of students at the University of Texas (UT), but there are more than 40,000 students at UT, and measuring every one of them would be impractical. Scientists are constantly confronted with this problem: the group of interest, called the population, is too large for a complete census. One solution is to measure a smaller collection of individuals, called a sample, and use measurements made on the sample to describe the population.

To provide an accurate description of the population, a good sample must have several characteristics. First, it must be representative of the whole population. If our sample consisted entirely of members of the UT basketball team, for instance, we would probably overestimate the true height of the students. One way to ensure that a sample is representative of the population is to select the members of the sample randomly. Second, the sample must be large enough that chance differences between individuals in the sample and the overall population do not distort the estimate of the population measurements. If we measured only three students at UT and just by chance all three were short, we would underestimate the true height of the student population. Statistics can provide information about how much confidence to place in estimates based on random samples.

Concepts

In statistics, the population is the group of interest; a sample is a subset of the population. The sample should be representative of the population and large enough to minimize chance differences between the population and the sample.

◗ 22.8 The mean provides information about the center of a distribution. Both distributions of heights of 10-year-old and 18-year-old boys are normal, but they have different locations along a continuum of height, which makes their means different. (Means shown: 135 cm for 10-year-old boys and 175 cm for 18-year-old boys; axes: height in cm against percentage.)

The Mean

The mean, also called the average, provides information about the center of the distribution.
If we measured the heights of 10-year-old and 18-year-old boys and plotted a frequency distribution for each group, we would find that both distributions are normal, but the two distributions would be centered at different heights, and this difference would be indicated in their different means ( ◗ FIGURE 22.8). If we represent a group of measurements as $x_1$, $x_2$, $x_3$, and so forth, then the mean ($\bar{x}$) is calculated by adding all the individual measurements and dividing by the total number of measurements in the sample (n):

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} \qquad (22.1)$$

A shorthand way to represent this formula is

$$\bar{x} = \frac{\sum x_i}{n} \qquad (22.2)$$

or

$$\bar{x} = \frac{1}{n} \sum x_i \qquad (22.3)$$

where the symbol $\sum$ means “the summation of” and $x_i$ represents individual x values.

The Variance and Standard Deviation

A statistic that provides key information about a distribution is the variance, which indicates the variability of a group of measurements (how spread out the distribution is). Distributions may have the same mean but different variances ( ◗ FIGURE 22.9). The larger the variance, the greater the spread of measurements in a distribution about its mean. The variance ($s^2$) is defined as the average squared deviation from the mean:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \qquad (22.4)$$

To calculate the variance, we (1) subtract the mean from each measurement and square the value obtained, (2) add all the squared deviations, and (3) divide this sum by the number of original measurements minus one.

Another statistic that is closely related to the variance is the standard deviation (s), which is defined as the square root of the variance:

$$s = \sqrt{s^2} \qquad (22.5)$$

Whereas the variance is expressed in units squared, the standard deviation is in the same units as the original measurements; so the standard deviation is often preferred for describing the variability of a measurement.

A normal distribution is symmetrical; so the mean and standard deviation are sufficient to describe its shape. The mean plus or minus one standard deviation ($\bar{x} \pm s$) includes approximately 68% of the measurements in a normal distribution; the mean plus or minus two standard deviations ($\bar{x} \pm 2s$) includes approximately 95% of the measurements, and the mean plus or minus three standard deviations ($\bar{x} \pm 3s$) includes more than 99% of the measurements ( ◗ FIGURE 22.10). Thus, less than 1% of a normally distributed population lies outside the range of $\bar{x} \pm 3s$.

◗ 22.9 The variance provides information about the variability of a group of phenotypes. Shown here are three distributions with the same mean but different variances ($s^2 = 0.25$, $s^2 = 1.0$, and $s^2 = 4.0$). The greater the variance, the more spread out the distribution is about the mean. (Axes: length against frequency.)

◗ 22.10 The proportions of a normal distribution occupied by plus or minus one, two, and three standard deviations from the mean. (Axes: phenotype, from −3s to +3s about the mean, against frequency.)

Concepts

The mean and variance describe a distribution of measurements: the mean provides information about the location of the center of a distribution, and the variance provides information about its variability.

Worked Problem

The following table lists yearly amounts (in hundreds of pounds) of milk produced by 10 two-year-old Jersey cows. Calculate the mean, variance, and standard deviation of milk production for this sample of 10 cows.

Annual milk production (hundreds of pounds): 60, 74, 58, 61, 56, 55, 54, 57, 65, 42

Solution

The mean is calculated by using the following formula: $\bar{x} = \sum x_i / n$. The value of $\sum x_i$ is obtained by summing all the individual measurements, which equals 582; n is the total number of measurements, which equals 10; so $\bar{x} = 582/10 = 58.20$, or 5,820 pounds per year.

The variance is calculated by using the following formula:

$$s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

so we need to determine the deviation of each individual measurement from the mean ($x_i - \bar{x}$), square each value, and sum the squared deviations from the mean.

Annual milk production     Deviation      Squared deviation
(hundreds of pounds)       (xi − x̄)       (xi − x̄)²
60                           1.80           3.24
74                          15.80         249.64
58                          −0.20           0.04
61                           2.80           7.84
56                          −2.20           4.84
55                          −3.20          10.24
54                          −4.20          17.64
57                          −1.20           1.44
65                           6.80          46.24
42                         −16.20         262.44
                                  Σ(xi − x̄)² = 603.60

The variance is therefore:

$$s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{603.60}{9} = 67.07$$

The standard deviation is the square root of the variance:

$$s_x = \sqrt{s_x^2} = \sqrt{67.07} = 8.19$$

Correlation

The mean and the variance can be used to describe an individual characteristic, but geneticists are frequently interested in more than one characteristic. Often, two or more characteristics vary together. For instance, both the number and the weight of eggs produced by hens are important to the poultry industry. These two characteristics are not independent of each other. There is an inverse relation between egg number and weight: hens that lay more eggs produce smaller eggs. This kind of relation between two characteristics is called a correlation. When two characteristics are correlated, a change in one characteristic is likely to be associated with a change in the other.

Correlations between characteristics are measured by a correlation coefficient (designated r), which measures the strength of their association. Consider two characteristics, such as human height (x) and arm length (y). To determine how these characteristics are correlated, we first obtain the covariance (cov) of x and y:

$$\text{cov}_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} \qquad (22.6)$$

The covariance is computed by (1) taking an x value for an individual and subtracting the mean of x ($\bar{x}$) from it; (2) taking the y value for the same individual and subtracting the mean of y ($\bar{y}$) from it; (3) multiplying the results of these two subtractions; (4) adding the results for all the xy pairs; and (5) dividing this sum by n − 1 (where n equals the number of xy pairs).
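Before moving on to the correlation coefficient, the arithmetic in the milk-production worked problem can be checked numerically. The following is a small Python sketch of equations 22.1, 22.4, and 22.5. The function names are illustrative, and the comparison with the standard library’s `statistics` module is an added cross-check, not something the text prescribes.

```python
import math
import statistics

# Annual milk production (hundreds of pounds) for the 10 cows in the worked problem
milk = [60, 74, 58, 61, 56, 55, 54, 57, 65, 42]

def sample_mean(values):
    """Equation 22.1: sum of the measurements divided by n."""
    return sum(values) / len(values)

def sample_variance(values):
    """Equation 22.4: summed squared deviations divided by n - 1."""
    m = sample_mean(values)
    squared_deviations = [(x - m) ** 2 for x in values]
    return sum(squared_deviations) / (len(values) - 1)

def sample_sd(values):
    """Equation 22.5: square root of the variance."""
    return math.sqrt(sample_variance(values))

mean = sample_mean(milk)
variance = sample_variance(milk)
sd = sample_sd(milk)

print(round(mean, 2), round(variance, 2), round(sd, 2))

# Cross-check against the standard library, which also divides by n - 1.
print(math.isclose(variance, statistics.variance(milk)))
```

Rounded to two decimal places, the results agree with the worked problem: a mean of 58.2, a variance of 67.07, and a standard deviation of 8.19 (all in hundreds of pounds or their square).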
The correlation coefficient ($r$) is obtained by dividing the covariance of $x$ and $y$ by the product of the standard deviations of $x$ and $y$:

$$r = \frac{\text{cov}_{xy}}{s_x s_y} \tag{22.7}$$

A correlation coefficient can theoretically range from −1 to +1 ( ◗ FIGURE 22.11). A positive value indicates that there is a direct association between the variables ( ◗ FIGURE 22.11a); as one variable increases, the other variable also tends to increase. A positive correlation exists for human height and weight: tall people tend to weigh more. A negative correlation coefficient indicates that there is an inverse relation between the two variables ( ◗ FIGURE 22.11b); as one variable increases, the other tends to decrease (as is the case for egg number and egg weight).

The absolute value of the correlation coefficient (the size of the coefficient, ignoring its sign) provides information about the strength of association between the variables. A coefficient of −1 or +1 indicates a perfect correlation between the variables, meaning that a change in $x$ is always accompanied by a proportional change in $y$. Correlation coefficients close to −1 or close to +1 indicate a strong association between the variables; a change in $x$ is almost always associated with a proportional change in $y$, as seen in ◗ FIGURE 22.11c. On the other hand, a correlation coefficient closer to 0 indicates a weak correlation; a change in $x$ is associated with a change in $y$ but not always ( ◗ FIGURE 22.11d). A correlation of 0 indicates that there is no association between variables ( ◗ FIGURE 22.11e).

A correlation coefficient can be computed for two variables measured for the same individual, such as height ($x$) and weight ($y$). A correlation coefficient can also be computed for a single variable measured for pairs of individuals.
For example, we can calculate for fish the correlation between the number of vertebrae of a parent ($x$) and the number of vertebrae of its offspring ($y$), as shown in ◗ FIGURE 22.12. This approach is often used in quantitative genetics.

◗ 22.11 The correlation coefficient describes the relation between two or more variables. (a) $r = .7$: a positive correlation indicates that there is a direct association between variables. (b) $r = -.7$: a negative correlation indicates that there is an inverse association between variables. (c) $r = .9$: a strong positive correlation. (d) $r = .3$: a weak positive correlation. (e) $r = 0$: a correlation of zero indicates that there is no association between variables.

◗ 22.12 A correlation coefficient can be computed for a single variable measured for pairs of individuals. Here, the number of vertebrae in mothers and offspring of the fish Zoarces viviparus is compared. The number of vertebrae in offspring is strongly positively correlated with the number of vertebrae in mother. Axes: number of vertebrae in mother (x), mean number of vertebrae in offspring (y).

A correlation between two variables indicates only that the variables are associated; it does not imply a cause and effect relation. Correlation also does not mean that the values of two variables are the same; it means only that a change in one variable is associated with a proportional change in the other variable. For example, the $x$ and $y$ variables in the following list are almost perfectly correlated, with a correlation coefficient of .99.

| | $x$ value | $y$ value |
|---|---|---|
| | 12 | 123 |
| | 14 | 140 |
| | 10 | 110 |
| | 6 | 61 |
| | 3 | 32 |
| Average: | 9 | 93.2 |

A high correlation is found between these $x$ and $y$ variables; larger values of $x$ are always associated with larger values of $y$. Note that the $y$ values are about 10 times as large as the corresponding $x$ values; so, although $x$ and $y$ are correlated, they are not identical. The distinction between correlation and identity becomes important when we consider the effects of heredity and environment on the correlation of characteristics.

Regression
Correlation provides information only about the strength and direction of association between variables, but often we want to know more than just whether two variables are associated; we want to be able to predict the value of one variable, given a value of the other.

A positive correlation exists between body weight of parents and body weight of their offspring; this correlation exists in part because genes influence body weight, and parents and children share some of the same genes. Because of this association between phenotypes of parent and offspring, we can predict the weight of an individual on the basis of the weights of its parents. This type of statistical prediction is called regression. This technique plays an important role in quantitative genetics because it allows us to predict characteristics of offspring from a given mating, even without knowledge of the genotypes that encode the characteristic.

Regression can be understood by plotting a series of $x$ and $y$ values. ◗ FIGURE 22.13 illustrates the relation between the weight of fathers ($x$) and the weight of their offspring ($y$). Each father-offspring pair is represented by a point on the graph. The overall relation between these two variables is depicted by the regression line, which is the line that best fits all the points on the graph (deviations of the points from the line are minimized).

◗ 22.13 A regression line defines the relation between two variables. Illustrated here is a regression of the weights of fathers against the weights of sons. Each father-offspring pair is represented by a point on the graph: the $x$ value of a point is the father's weight and the $y$ value of the point is the offspring's weight. The regression line is the line that best fits all the points on the graph. Axes: weight of father in kg (x), weight of son in kg (y).
The regression line defines the relation between the $x$ and $y$ variables and can be represented by

$$y = a + bx \tag{22.8}$$

In Equation 22.8, $x$ and $y$ represent the $x$ and $y$ variables (in this case, the father's weight and the offspring's weight, respectively). The variable $a$ is the $y$ intercept of the line, which is the expected value of $y$ when $x$ is 0. Variable $b$ is the slope of the regression line, also called the regression coefficient; it indicates how much $y$ increases, on average, per unit increase in $x$.

Trying to position a regression line by eye is not only very difficult but also inaccurate when there are many points scattered over a wide area. Fortunately, the regression coefficient and $y$ intercept can be obtained mathematically. The regression coefficient ($b$) can be computed from the covariance of $x$ and $y$ ($\text{cov}_{xy}$) and the variance of $x$ ($s_x^2$) by

$$b = \frac{\text{cov}_{xy}}{s_x^2} \tag{22.9}$$

Several regression lines with different regression coefficients are illustrated in ◗ FIGURE 22.14. After the regression coefficient has been calculated, the $y$ intercept can be calculated by substituting the regression coefficient and the mean values of $x$ and $y$ into the following equation:

$$a = \bar{y} - b\bar{x} \tag{22.10}$$

The regression equation ($y = a + bx$) can then be used to predict the value of any $y$ given the value of $x$.

◗ 22.14 The regression coefficient ($b$) represents the change in $y$ per unit change in $x$. Shown here are regression lines with different regression coefficients ($b = 1$, $b = .4$, and $b = .2$).

Concepts
A correlation coefficient measures the strength of association between two variables. The sign (positive or negative) indicates the direction of the correlation; the absolute value measures the strength of the association. Regression is used to predict the value of one variable on the basis of the value of a correlated variable.

Worked Problem
Body weights of 11 female fish and the numbers of eggs that they produce are given in the following table.
| Weight (mg) $x$ | Eggs (thousands) $y$ |
|---|---|
| 14 | 61 |
| 17 | 37 |
| 24 | 65 |
| 25 | 69 |
| 27 | 54 |
| 33 | 93 |
| 34 | 87 |
| 37 | 89 |
| 40 | 100 |
| 41 | 90 |
| 42 | 97 |

What are the correlation coefficient and the regression coefficient for body weight and egg number in these 11 fish?

• Solution
The computations needed to answer this question are given in the table below. To calculate the correlation and regression coefficients, we first obtain the sum of all the $x_i$ values ($\sum x_i$) and the sum of all the $y_i$ values ($\sum y_i$); these sums are shown in the last row of the table. We can calculate the means of the two variables by dividing the sums by the number of measurements, which is 11:

$$\bar{x} = \frac{\sum x_i}{n} = \frac{334}{11} = 30.36 \qquad \bar{y} = \frac{\sum y_i}{n} = \frac{842}{11} = 76.55$$

After the means have been calculated, the deviations of each value from the means are computed; these deviations are shown in columns B and E of the table. The deviations are then squared (columns C and F) and summed (last row of columns C and F). Next, the products of the deviation of the $x$ values and the deviation of the $y$ values [$(x_i - \bar{x})(y_i - \bar{y})$] are calculated; these products are shown in column G, and their sum is shown in the last row of column G.

| A: Weight (mg) $x$ | B: $x_i - \bar{x}$ | C: $(x_i - \bar{x})^2$ | D: Eggs (thousands) $y$ | E: $y_i - \bar{y}$ | F: $(y_i - \bar{y})^2$ | G: $(x_i - \bar{x})(y_i - \bar{y})$ |
|---|---|---|---|---|---|---|
| 14 | −16.36 | 267.65 | 61 | −15.55 | 241.80 | 254.40 |
| 17 | −13.36 | 178.49 | 37 | −39.55 | 1564.20 | 528.39 |
| 24 | −6.36 | 40.45 | 65 | −11.55 | 133.40 | 73.46 |
| 25 | −5.36 | 28.73 | 69 | −7.55 | 57.00 | 40.47 |
| 27 | −3.36 | 11.29 | 54 | −22.55 | 508.50 | 75.77 |
| 33 | 2.64 | 6.97 | 93 | 16.45 | 270.60 | 43.43 |
| 34 | 3.64 | 13.25 | 87 | 10.45 | 109.20 | 38.04 |
| 37 | 6.64 | 44.09 | 89 | 12.45 | 155.00 | 82.67 |
| 40 | 9.64 | 92.93 | 100 | 23.45 | 549.90 | 226.06 |
| 41 | 10.64 | 113.21 | 90 | 13.45 | 180.90 | 143.11 |
| 42 | 11.64 | 135.49 | 97 | 20.45 | 418.20 | 238.04 |
| $\sum x_i = 334$ | | $\sum (x_i - \bar{x})^2 = 932.55$ | $\sum y_i = 842$ | | $\sum (y_i - \bar{y})^2 = 4188.70$ | $\sum (x_i - \bar{x})(y_i - \bar{y}) = 1743.84$ |

Source: R. R. Sokal and F. J. Rohlf, Biometry, 2d ed. (San Francisco: W. H. Freeman and Company, 1981.)

To calculate the covariance, we use Formula 22.6:

$$\text{cov}_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{1743.84}{10} = 174.38$$

To calculate the correlation and regression coefficients requires the variances and standard deviations of $x$ and $y$:

$$s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{932.55}{10} = 93.26 \qquad s_x = \sqrt{s_x^2} = \sqrt{93.26} = 9.66$$

$$s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1} = \frac{4188.70}{10} = 418.87 \qquad s_y = \sqrt{s_y^2} = \sqrt{418.87} = 20.47$$

We can now compute the correlation and regression coefficients.

Correlation coefficient:

$$r = \frac{\text{cov}_{xy}}{s_x s_y} = \frac{174.38}{9.66 \times 20.47} = 0.88$$

Regression coefficient:

$$b = \frac{\text{cov}_{xy}}{s_x^2} = \frac{174.38}{93.26} = 1.87$$

Applying Statistics to the Study of a Polygenic Characteristic
Edward East carried out one early statistical study of polygenic inheritance on the length of flowers in tobacco (Nicotiana longiflora). He obtained two varieties of tobacco that differed in flower length: one variety had a mean flower length of 40.5 mm, and the other had a mean flower length of 93.3 mm ( ◗ FIGURE 22.15). These two varieties had been inbred for many generations and were homozygous at all loci contributing to flower length. Thus, there was no genetic variation in the original parental strains; the small differences in flower length within each strain were due to environmental effects on flower length.

When East crossed the two strains, he found that flower length in the F1 was about halfway between that in the two parents (see Figure 22.15), as would be expected if the genes determining the differences in the two strains were additive in their effects. The variance of flower length in the F1 was similar to that seen in the parents, because the F1 were, like their parents, uniform in genotype (the F1 were all heterozygous at the genes that differed between the two parental varieties). East then interbred the F1 to produce F2 progeny.
The mean flower length of the F2 was similar to that of the F1, but the variance of the F2 was much greater (see Figure 22.15). This greater variability indicates that there were genetic differences within the F2 progeny. East selected some F2 plants and interbred them to produce F3 progeny. He found that flower length of the F3 depended on flower length in the plants selected as their parents. This finding demonstrated that flower-length differences in the F2 were partly genetic and thus were passed to the next generation. None of the 444 F2 plants that East raised exhibited flower lengths similar to those of the two parental strains. This result suggested that more than four loci with pairs of alleles affected flower length in his varieties, because four allelic pairs are expected to produce 1 of 256 progeny [$(1/4)^4 = 1/256$] having one or the other of the original parental phenotypes.

◗ 22.15 Edward East conducted an early statistical study of the inheritance of flower length in tobacco. Panels: P generation (two parental strains), F1 generation, and F2 generation. The mean of the F2 was similar to that observed for the F1, but the variance in the F2 was greater, indicating the presence of different genotypes among the F2 progeny. Axes: flower length in mm (x), frequency (y).

Heritability
In addition to being polygenic, quantitative characteristics are frequently influenced by environmental factors. It is often useful to know how much of the variation in a quantitative characteristic is due to genetic differences and how much is due to environmental differences. That proportion of the total phenotypic variation that is due to genetic differences is known as the heritability.

Consider a dairy farmer who owns several hundred milk cows. The farmer notices that some cows consistently produce more milk than others. The nature of these differences is important to the profitability of his dairy operation.
If the differences in milk production are largely genetic in origin, then the farmer may be able to boost milk production by selectively breeding the cows that produce the most milk. On the other hand, if the differences are largely environmental in origin, selective breeding will have little effect on milk production, and the farmer might better boost milk production by adjusting the environmental factors associated with higher milk production. To determine the extent of genetic and environmental influences on variation in a characteristic, phenotypic variation in the characteristic must be partitioned into components attributable to different factors.

Phenotypic Variance
To determine how much of phenotypic differences in a population is due to genetic and environmental factors, we must first have some quantitative measure of the phenotype under consideration. Consider a population of wild plants that differ in size. We could collect a representative sample of plants from the population, weigh each plant in the sample, and calculate the mean and variance of plant weight. This phenotypic variance is represented by $V_P$.

Components of phenotypic variance
Phenotypic variance, which represents the phenotypic differences among individual members of a group, can be attributed to several factors. First, some of the differences in phenotype may be due to differences in genotypes among individual members of the population. These differences are termed the genetic variance and are represented by $V_G$.

Second, some of the differences in phenotype may be due to environmental differences among the plants; these differences are termed the environmental variance, $V_E$. Environmental variance includes differences that can be attributed to specific environmental factors, such as the amount of light or water that the plant receives; it also includes random differences in development that cannot be attributed to any specific factor.
Any variation in phenotype that is not inherited is, by definition, a part of the environmental variance.

Third, genetic-environmental interaction variance ($V_{GE}$) arises when the effect of a gene depends on the specific environment in which it is found. An example is shown in ◗ FIGURE 22.16. In a dry environment, genotype AA produces a plant that averages 12 g in weight, and genotype aa produces a smaller plant that averages 10 g. In a wet environment, genotype aa produces the larger plant, averaging 24 g in weight, whereas genotype AA produces a plant that averages 20 g. In this example, there are clearly differences in the two environments: both genotypes produce heavier plants in the wet environment. There are also differences in the weights of the two genotypes, but the relative performances of the genotypes depend on whether the plants are grown in a wet or dry environment. In this case, the influences on phenotype cannot be neatly allocated into genetic and environmental components, because the expression of the genotype depends on the environment in which the plant grows. The phenotypic variance must therefore include a component that accounts for the way in which genetic and environmental factors interact.

◗ 22.16 Genetic-environmental interaction variance occurs when the effect of a gene depends on the specific environment in which it is found. In this example, the genotype affects plant weight, but the environmental conditions determine which genotype produces the heavier plant. Axes: environment, dry or wet (x); plant weight (y); one line each for genotypes AA and aa.

In summary, the total phenotypic variance can be apportioned into three components:

$$V_P = V_G + V_E + V_{GE} \tag{22.11}$$

Components of genetic variance
Genetic variance can be further subdivided into components consisting of different types of genetic effects. First, additive genetic variance ($V_A$) comprises the additive effects of genes on the phenotype, which can be summed to determine the overall effect on the phenotype.
For example, suppose that, in a plant, allele $A^1$ contributes 2 g in weight and allele $A^2$ contributes 4 g. If the alleles are strictly additive, then the genotypes would have the following weights:

$A^1A^1 = 2 + 2 = 4$ g
$A^1A^2 = 2 + 4 = 6$ g
$A^2A^2 = 4 + 4 = 8$ g

The genes that Nilsson-Ehle studied, which affected kernel color in wheat, were additive in this way.

Second, there is dominance genetic variance ($V_D$) when some genes have a dominance component. In this case, the alleles at a locus are not additive; rather, the effect of an allele depends on the identity of the other allele at that locus. Here, we cannot simply add the effects of the alleles together. Instead, we must add a component ($V_D$) to the genetic variance to account for the way that alleles interact.

Third, genes at different loci may interact in the same way that alleles at the same locus interact. When this genic interaction occurs, the effects of genes are not additive, and we must include a third component, called genic interaction variance ($V_I$), in the genetic variance:

$$V_G = V_A + V_D + V_I \tag{22.12}$$

Summary equation
We can now integrate these components into one equation to represent all the potential contributions to the phenotypic variance:

$$V_P = V_A + V_D + V_I + V_E + V_{GE} \tag{22.13}$$

This equation provides us with a model that describes the potential causes of differences that we observe among individual phenotypes. It's important to note that this model deals strictly with the observable differences (variance) in phenotypes among individual members of a population; it says nothing about the absolute value of the characteristic or about the underlying genotypes that produce these differences.

Types of Heritability
The model of phenotypic variance that we've just developed can be used to address the question of how much of the phenotypic variance in a characteristic is due to genetic differences.
Broad-sense heritability ($H^2$) represents the proportion of phenotypic variance that is due to genetic variance and is calculated by dividing the genetic variance by the phenotypic variance:

$$\text{broad-sense heritability} = H^2 = \frac{V_G}{V_P} \tag{22.14}$$

It is symbolized $H^2$ because it is a measure of variance, which is in units squared.

Broad-sense heritability can potentially range from 0 to 1. A value of 0 indicates that none of the phenotypic variance results from differences in genotype and all of the differences in phenotype result from environmental variation. A value of 1 indicates that all of the phenotypic variance results from differences in genotype. A heritability value between 0 and 1 indicates that both genetic and environmental factors influence the phenotypic variance.

Often, we are more interested in the proportion of the phenotypic variance that results from the additive genetic variance, because the additive genetic variance primarily determines the resemblance between parents and offspring. Narrow-sense heritability ($h^2$) is equal to the additive genetic variance divided by the phenotypic variance:

$$\text{narrow-sense heritability} = h^2 = \frac{V_A}{V_P} \tag{22.15}$$

The Calculation of Heritability
Having considered the components that contribute to phenotypic variance and having developed a general concept of heritability, we can ask, How does one go about estimating these different components and calculating heritability?

There are several ways to measure the heritability of a characteristic. They include eliminating one or more variance components, comparing the resemblance of parents and offspring, comparing the phenotypic variances of individuals with different degrees of relatedness, and measuring the response to selection. The mathematical theory that underlies these calculations of heritability is complex and beyond the scope of this book. Nevertheless, we can develop a general understanding of how heritability is measured.
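Before turning to the estimation methods, the definitions in Equations 22.12 through 22.15 can be illustrated with a small sketch. The variance components below are hypothetical numbers chosen only to make the arithmetic easy to follow; they do not come from any study, and the function name is an assumption.

```python
def heritabilities(v_a, v_d, v_i, v_e, v_ge):
    """Return (H2, h2) from variance components.

    V_G = V_A + V_D + V_I          (Equation 22.12)
    V_P = V_G + V_E + V_GE         (Equations 22.11 and 22.13)
    H2  = V_G / V_P                (Equation 22.14, broad sense)
    h2  = V_A / V_P                (Equation 22.15, narrow sense)
    """
    v_g = v_a + v_d + v_i
    v_p = v_g + v_e + v_ge
    return v_g / v_p, v_a / v_p

# Hypothetical variance components (arbitrary units squared).
broad, narrow = heritabilities(v_a=30.0, v_d=10.0, v_i=5.0, v_e=50.0, v_ge=5.0)

print(f"H2 = {broad:.2f}, h2 = {narrow:.2f}")
# prints: H2 = 0.45, h2 = 0.30
```

Because $V_A$ is only one part of $V_G$, the narrow-sense value can never exceed the broad-sense value computed from the same components.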
Heritability by elimination of variance components
One way of calculating the broad-sense heritability is to eliminate one of the variance components. We have seen that $V_P = V_G + V_E + V_{GE}$. If we eliminate all environmental variance ($V_E = 0$), then $V_{GE} = 0$ (because, if either $V_G$ or $V_E$ is zero, no genetic-environmental interaction can take place), and $V_P = V_G$. In theory, we might make $V_E$ equal to 0 by ensuring that all individuals were raised in exactly the same environment but, in practice, it is virtually impossible. Instead, we could make $V_G$ equal to 0 by raising genetically identical individuals, causing $V_P$ to be equal to $V_E$.

In a typical experiment, we might raise cloned or highly inbred, identically homozygous individuals in a defined environment and measure their phenotypic variance to estimate $V_E$. We could then raise a group of genetically variable individuals and measure their phenotypic variance ($V_P$). Using $V_E$ calculated on the genetically identical individuals, we could obtain the genetic variance of the variable individuals by subtraction:

$$V_{G\,(\text{of genetically varying individuals})} = V_{P\,(\text{of genetically varying individuals})} - V_{E\,(\text{of genetically identical individuals})} \tag{22.16}$$

The broad-sense heritability of the genetically variable individuals would then be calculated as follows:

$$H^2 = \frac{V_{G\,(\text{of genetically varying individuals})}}{V_{P\,(\text{of genetically varying individuals})}} \tag{22.17}$$

Sewall Wright used this method to estimate the heritability of white spotting in guinea pigs. He first measured the phenotypic variance for white spotting in a genetically variable population and found that $V_P = 573$. Then he inbred the guinea pigs for many generations so that they were essentially homozygous and genetically identical. When he measured their phenotypic variance in white spotting, he obtained $V_P$ equal to 340. Because $V_G = 0$ in this group, their $V_P = V_E$.
Wright assumed this value of environmental variance for the original (genetically variable) population and estimated their genetic variance:

$$V_P - V_E = V_G$$
$$573 - 340 = 233$$

He then estimated the broad-sense heritability from the genetic and phenotypic variance:

$$H^2 = \frac{V_G}{V_P} = \frac{233}{573} = .41$$

This value implies that 41% of the variation in spotting of guinea pigs in Wright's population was due to differences in genotype.

Estimating heritability by using this method assumes that the environmental variance of genetically identical individuals is the same as the environmental variance of the genetically variable individuals, which may not be true. Additionally, this approach can be applied only to organisms for which it is possible to create genetically identical individuals.

Heritability by parent-offspring regression
Another method for estimating heritability is to compare the phenotypes of parents and offspring. When genetic differences are responsible for phenotypic variance, offspring should resemble their parents more than they resemble unrelated individuals, because offspring and parents have some genes in common that help determine their phenotype. Correlation and regression can be used to analyze the association of phenotypes in different individuals.

To calculate the narrow-sense heritability in this way, we first measure the characteristic on a series of parents and offspring. The data are arranged into families, and the mean parental phenotype is plotted against the mean offspring phenotype ( ◗ FIGURE 22.17). Each data point in the graph represents one family; the value on the $x$ (horizontal) axis is the mean phenotypic value of the parents in a family, and the value on the $y$ (vertical) axis is the mean phenotypic value of the offspring for the family.

Let's assume that there is no narrow-sense heritability for the characteristic ($h^2 = 0$); genetic differences do not contribute to the phenotypic differences among individuals.
In this case, offspring will be no more similar to their parents than they are to unrelated individuals, and the data points will be scattered randomly, generating a regression coefficient of zero ( ◗ FIGURE 22.17a). Next, let's assume that all of the phenotypic differences are due to additive genetic differences ($h^2 = 1.0$). In this case, the mean phenotype of the offspring will be equal to the mean phenotype of the parents, and the regression coefficient will be 1 ( ◗ FIGURE 22.17b). If genes and environment both contribute to the differences in phenotype, both heritability and the regression coefficient will lie between 0 and 1 ( ◗ FIGURE 22.17c). The regression coefficient therefore provides information about the magnitude of the heritability.

◗ 22.17 The narrow-sense heritability ($h^2$) equals the regression coefficient ($b$) in a regression of the mean phenotype of the offspring on the mean phenotype of the parents. (a) $b = h^2 = 0$: there is no relation between the parental phenotype and the offspring phenotype. (b) $b = h^2 = 1$: the offspring phenotype is the same as the parental phenotypes. (c) $b = h^2 = .5$: both genes and environment contribute to the differences in phenotype. Axes: mean parental phenotype (x), mean offspring phenotype (y).

A complex mathematical proof (which we will not go into here) demonstrates that, in a regression of the mean phenotype of the offspring against the mean phenotype of the parents, narrow-sense heritability ($h^2$) equals the regression coefficient ($b$):

$$h^2 = b \quad \text{(regression of mean offspring against mean of both parents)} \tag{22.18}$$

An example of calculating heritability by regression of the phenotypes of parents and offspring is illustrated in ◗ FIGURE 22.18.
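Equation 22.18 turns the estimation of narrow-sense heritability into a slope calculation. The sketch below computes the slope, intercept, and correlation from Equations 22.6, 22.7, 22.9, and 22.10. Because the text gives no raw parent-offspring measurements, the functions are checked here against the fish weight and egg-number data from the regression worked problem earlier in this chapter; the function names are assumptions made for illustration.

```python
import math

def mean(values):
    return sum(values) / len(values)

def covariance(xs, ys):
    """Sample covariance (Equation 22.6)."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)

def regression(xs, ys):
    """Return (a, b): y intercept (Equation 22.10) and slope (Equation 22.9)."""
    b = covariance(xs, ys) / covariance(xs, xs)  # cov_xy / s_x^2
    a = mean(ys) - b * mean(xs)
    return a, b

def correlation(xs, ys):
    """Correlation coefficient (Equation 22.7): cov_xy / (s_x * s_y)."""
    s_x = math.sqrt(covariance(xs, xs))
    s_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (s_x * s_y)

# Fish data from the worked problem: body weight (mg) and eggs (thousands).
weight_mg = [14, 17, 24, 25, 27, 33, 34, 37, 40, 41, 42]
eggs_thousands = [61, 37, 65, 69, 54, 93, 87, 89, 100, 90, 97]

a, b = regression(weight_mg, eggs_thousands)
r = correlation(weight_mg, eggs_thousands)

print(f"b = {b:.2f}, a = {a:.2f}, r = {r:.2f}")
# prints: b = 1.87, a = 19.77, r = 0.88
```

If `xs` held the mean phenotype of the two parents in each family and `ys` the mean phenotype of their offspring, the slope `b` returned by `regression` would be the estimate of $h^2$ described by Equation 22.18.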
In a regression of the mean offspring phenotype against the phenotype of only one parent, the narrow-sense heritability equals twice the regression coefficient:

$$h^2 = 2b \quad \text{(regression of mean offspring against one parent)} \tag{22.19}$$

With only one parent, the heritability is twice the regression coefficient because only half the genes of the offspring come from one parent; thus, we must double the regression coefficient to obtain the full heritability.

Heritability and degrees of relatedness
A third method for calculating heritability is to compare the phenotypes of individuals having different degrees of relatedness. This method is based on the concept that, the more closely related two individuals are, the more genes they have in common.

Monozygotic (identical) twins have 100% of their genes in common, whereas dizygotic (nonidentical) twins have, on average, 50% of their genes in common. If genes are important in determining variability in a characteristic, then monozygotic twins should be more similar in a particular characteristic than dizygotic twins. By using correlation to compare the phenotypes of monozygotic and dizygotic twins, we can estimate broad-sense heritability. A rough estimate of the broad-sense heritability can be obtained by taking twice the difference of the correlation coefficients for a quantitative characteristic in monozygotic and dizygotic twins:

$$H^2 = 2(r_{MZ} - r_{DZ}) \tag{22.20}$$

where $r_{MZ}$ equals the correlation coefficient among monozygotic twins and $r_{DZ}$ equals the correlation coefficient among dizygotic twins. This calculation assumes that the two individuals of a monozygotic twin pair experience environments that are no more similar to each other than those experienced by the two individuals of a dizygotic twin pair, which is often not the case, unless the twins have been reared apart.
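Equations 22.19 and 22.20 are simple rescalings of a regression coefficient or of a pair of correlation coefficients, as the following sketch shows. The input values are hypothetical and chosen only for illustration, and the function names are assumptions; real estimates would also need the environmental-similarity assumption stated above to hold.

```python
def h2_from_one_parent(b):
    """Narrow-sense heritability from a one-parent regression slope (Equation 22.19)."""
    return 2 * b

def broad_h2_from_twins(r_mz, r_dz):
    """Rough broad-sense heritability from twin correlations (Equation 22.20)."""
    return 2 * (r_mz - r_dz)

# Hypothetical inputs, not taken from any study.
slope_on_one_parent = 0.35          # regression of mean offspring on one parent
r_monozygotic, r_dizygotic = 0.90, 0.60

h2_estimate = h2_from_one_parent(slope_on_one_parent)
H2_estimate = broad_h2_from_twins(r_monozygotic, r_dizygotic)

print(f"h2 = {h2_estimate:.2f}, H2 = {H2_estimate:.2f}")
# prints: h2 = 0.70, H2 = 0.60
```

Note that nothing in Equation 22.20 keeps the result between 0 and 1; when $r_{MZ}$ exceeds $r_{DZ}$ by more than .5, the formula returns a value above 1, one reason it is described as a rough estimate.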
Narrow-sense heritability can also be estimated by comparing the phenotypic variances for a characteristic in full sibs (who have both parents in common, as well as 50% of their genes on the average) and half sibs (who have only one parent in common and thus 25% of their genes on the average).

All estimates of heritability depend on the assumption that the environments of related individuals are not more similar than those of unrelated individuals. This assumption is difficult to meet in human studies, because related people are usually reared together. Heritability estimates for humans should therefore always be viewed with caution.

◗ 22.18 The heritability of shell breadth in snails can be determined by regression of the phenotype of offspring against the mean phenotype of the parents. The regression coefficient, which equals the heritability, is .70. Axes: mean shell breadth of parents in mm (x), mean shell breadth of offspring in mm (y). (From L. M. Cook, 1965. Evolution 19:86-94.)

Concepts
Broad-sense heritability is the proportion of phenotypic variance that is due to genetic variance. Narrow-sense heritability is the proportion of phenotypic variance that is due to additive genetic variance. Heritability can be measured by eliminating one of the variance components, analyzing parent-offspring regression, or comparing individuals having different degrees of relatedness.

The Limitations of Heritability
Knowledge of heritability has great practical value, because it allows us to statistically predict the phenotypes of offspring on the basis of their parents' phenotypes. It also provides useful information about how characteristics will respond to selection (see next section). In spite of its importance, heritability is frequently misunderstood. Heritability does not provide information about an individual's genes or the environmental factors that control the development of a characteristic, and it says nothing about the nature of differences between groups.
This section outlines some limitations and common misconceptions concerning broad- and narrow-sense heritability.

Heritability does not indicate the degree to which a characteristic is genetically determined
Heritability is the proportion of the phenotypic variance that is due to genetic variance; it says nothing about the degree to which genes determine a characteristic. Heritability indicates only the degree to which genes determine variation in a characteristic. The determination of a characteristic and the determination of variation in a characteristic are two very different things.

Consider polydactyly (the presence of extra digits) in rabbits, which can be caused either by environmental factors or by a dominant gene. Suppose we have a group of rabbits all homozygous for a gene that produces normal numbers of digits. None of the rabbits in this group carries a gene for polydactyly, but a few of the rabbits are polydactylous because of environmental factors. Broad-sense heritability for polydactyly in this group is zero, because there is no genetic variation for polydactyly; all of the variation is due to environmental factors. However, it would be incorrect for us to conclude that genes play no role in determining the number of digits in rabbits. Indeed, we know that there are specific genes that can produce extra digits. Heritability indicates nothing about whether genes control the development of a characteristic; it only provides information about causes of the variation in a characteristic within a defined group.

An individual does not have heritability
Broad- and narrow-sense heritabilities are statistical values based on the genetic and phenotypic variances found in a group of individuals. It is impossible to calculate heritability for an individual, and heritability has no meaning for a specific individual. Suppose we calculate the narrow-sense heritability of adult body weight for the students in a biology class and obtain a value of .6.
We could conclude that 60% of the variation in adult body weight among the students in this class is determined by additive genetic variation. We could not, however, conclude that 60% of any particular student’s body weight is due to additive genes. There is no universal heritability for a characteristic The value of heritability for a characteristic is specific for a given population in a given environment. Recall that broad-sense heritability is genetic variance divided by phenotypic variance. Genetic variance depends on which genes are present, which often differs between populations. In the example of polydactyly in rabbits, there were no genes for polydactyly in the group; so the heritability of the characteristic was zero. A different group of rabbits might contain many genes for polydactyly, and the heritability of the characteristic might be high. Environmental differences may affect heritability, because VP is composed of both genetic and environmental variance. When the environmental differences that affect a characteristic differ between two groups, the heritabilities for the two groups also will often differ. Because heritability is specific to a defined population in a given environment, it is important not to extrapolate heritabilities from one population to another. For example, human height is determined by environmental factors (such as nutrition and health) and by genes. If we measured the heritability of height in a developed country, we might obtain a value of .8, indicating that the variation in height in this population is largely genetic. This population has a high heritability because most people have adequate nutrition and health care (VE is low); so most of the phenotypic variation in height is genetically determined. It would be incorrect for us to assume that height has a high heritability in all human populations. 
In developing countries, there may be more variation in a range of environmental factors; some people may enjoy good nutrition and health, whereas others may have a diet deficient in protein and suffer from diseases that affect stature. If we measured the heritability of height in such a country, we would undoubtedly obtain a lower value than we observed in the developed country, because there is more environmental variation and the genetic variance in height constitutes a smaller proportion of the phenotypic variation, making the heritability lower. The important point to remember is that heritability must be calculated separately for each population and each environment. Even when heritability is high, environmental factors may influence a characteristic High heritability does not mean that environmental factors cannot influence the expression of a characteristic. High heritability indicates only that the environmental variation to which the population is currently exposed is not responsible for variation in the characteristic. Let’s look again at human height. In most developed countries, heritability of human height is high, indicating that genetic differences are responsible for most of the variation in height. It would be wrong for us to conclude that human height cannot be changed by alteration of the environment. Indeed, height decreased in several European cities during World War II owing to hunger and disease, and height can be increased dramatically by the administration of growth hormone to children. The absence of environmental variation in a characteristic does not mean that the characteristic will not respond to environmental change. Heritabilities indicate nothing about the nature of population differences in a characteristic A common misconception about heritability is that it provides information about population differences in a characteristic. 
Heritability is specific for a given population in a given environment, so it cannot be used to draw conclusions about why populations differ in a characteristic. Suppose we measured heritability for human height in two groups. One group is from a small town in a developed country, where everyone consumes a high-protein diet. Because there is little variation in the environmental factors that affect human height and there is some genetic variation, the heritability of height in this group is high. The second group comprises the inhabitants of a single village in a developing country. The consumption of protein by these people is only 25% of that consumed by those in the first group; so their average adult height is several centimeters less than that in the developed country. Again, there is little variation in the environmental factors that determine height in this group, because everyone in the village eats the same types of food and is exposed to the same diseases. Because there is little environmental variation and there is some genetic variation, the heritability of height in this group also is high. Thus, the heritability of height in both groups is high, and the average height in the two groups is considerably different. We might be tempted to conclude that the difference in height between the two groups is genetically based, that is, that the people in the developed country are genetically taller than the people in the developing country. This conclusion is wrong, however, because the differences in height are due largely to diet, an environmental factor. Heritability provides no information about the causes of differences between populations. These limitations of heritability have often been ignored, particularly in arguments about possible social implications of genetic differences between humans.
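The two-group height example can be made concrete with a small numerical sketch. It assumes the simplest partition of phenotypic variance, V_P = V_G + V_E, with no genetic–environmental interaction term, and every variance and mean below is an invented illustrative value rather than a measurement.

```python
def broad_sense_heritability(v_g: float, v_e: float) -> float:
    """Return V_G / V_P under the assumed partition V_P = V_G + V_E."""
    v_p = v_g + v_e
    if v_p <= 0:
        raise ValueError("phenotypic variance must be positive")
    return v_g / v_p


# Hypothetical values (not measurements): both groups have the same genetic
# variance and little environmental variance, but diet lowers the village mean.
town = {"mean_height_cm": 175.0, "v_g": 40.0, "v_e": 10.0}
village = {"mean_height_cm": 168.0, "v_g": 40.0, "v_e": 10.0}

broad_h2_town = broad_sense_heritability(town["v_g"], town["v_e"])
broad_h2_village = broad_sense_heritability(village["v_g"], village["v_e"])
mean_gap_cm = town["mean_height_cm"] - village["mean_height_cm"]

# A third hypothetical population with far more environmental variation:
# the same genetic variance now makes up a smaller share of V_P.
broad_h2_variable_env = broad_sense_heritability(40.0, 60.0)

print(f"town: {broad_h2_town:.2f}, village: {broad_h2_village:.2f}")
print(f"difference in mean height: {mean_gap_cm:.1f} cm")
print(f"with more environmental variance: {broad_h2_variable_env:.2f}")
```

In this sketch both groups have the same high heritability, yet the gap between their means was put there entirely by an environmental shift, so the within-group values say nothing about the cause of the between-group difference.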
Soon after Mendel’s principles of heredity were rediscovered, some geneticists began to claim that many human behavioral characteristics are determined entirely by genes. This claim led to debates about whether characteristics such as human intelligence are determined by genes or environment. Many of the early claims of genetically based human behavior were based on poor research; unfortunately, the results of these studies were often accepted at face value and led to a number of eugenic laws that discriminated against certain groups of people. Today, geneticists recognize that many behavioral characteristics are influenced by a complex interaction of genes and environment and that it is very difficult to separate genetic effects from those of the environment. The results of a number of modern studies indicate that human intelligence as measured by IQ and other intelligence tests has a moderately high heritability (usually from .4 to .8). On the basis of this observation, some people have argued that intelligence is innate and that enhanced educational opportunities cannot boost intelligence. This argument is based on the misconception that, when heritability is high, changing the environment will not alter the characteristic. In addition, because heritabilities of intelligence range from .4 to .8, a considerable amount of the variance in intelligence originates from environmental differences. Another argument based on a misconception about heritability is that ethnic differences in measures of intelligence are genetically based. Because the results of some genetic studies show that IQ has moderate heritability and because other studies find differences in the average IQ of ethnic groups, some people have suggested that ethnic differences in IQ are genetically based. 
As in the example of the effects of diet on height, heritability provides no information about causes of differences among groups; it indicates only the degree to which phenotypic variance within a single group is genetically based. High heritability for a characteristic does not mean that phenotypic differences between ethnic groups are genetic. We should also remember that separating genetic and environmental effects in humans is very difficult; so heritability estimates themselves may be unreliable.

Concepts
Heritability provides information only about the degree to which variation in a characteristic is genetically determined. There is no universal heritability for a characteristic; heritability is specific for a given population in a specific environment. Environmental factors can potentially affect characteristics with high heritability, and heritability says nothing about the nature of population differences in a characteristic.

Locating Genes That Affect Quantitative Characteristics
The statistical methods described for use in analyzing quantitative characteristics can be used both to make predictions about the average phenotype expected in offspring and to estimate the overall contribution of genes to variation in the characteristic. These methods do not, however, allow us to identify and determine the influence of individual genes that affect quantitative characteristics. The genes that control polygenic characteristics are referred to as quantitative trait loci (QTLs). Although quantitative genetics has made important contributions to basic biology and to plant and animal breeding, the inability to identify QTLs and measure their individual effects has severely limited the application of quantitative genetic methods.

Mapping QTLs
In recent years, numerous genetic markers have been identified and mapped with the use of recombinant DNA techniques, making it possible to identify QTLs by linkage analysis.
The underlying idea is simple: if the inheritance of a genetic marker is associated consistently with the inheritance of a particular characteristic (such as increased height), then that marker must be linked to a QTL that affects height. The key is to have enough genetic markers so that QTLs can be detected throughout the genome. With the introduction of restriction fragment length polymorphisms and microsatellite variations (see Chapter 18), variable markers are now available for mapping QTLs in a number of different organisms ( ◗ FIGURE 22.19). A common procedure for mapping QTLs is to cross two homozygous strains that differ in alleles at many loci. The resulting F1 progeny are then intercrossed or backcrossed to allow the genes to recombine through independent assortment and crossing over. Genes on different chromosomes and genes that are far apart on the same chromosome will recombine freely; genes that are closely linked will be inherited together. The offspring are measured for one or more quantitative characteristics; at the same time, they are genotyped for numerous genetic markers that span the genome. Any correlation between the inheritance of a particular marker allele and a quantitative phenotype indicates that a QTL is linked to that marker. If enough markers are used, it is theoretically possible to detect all the QTLs affecting a characteristic. This approach has been used to detect genes affecting various characteristics in several plant and animal species (Table 22.2).
Table 22.2 Quantitative characteristics for which QTLs have been detected

Organism       Quantitative Characteristic    Number of QTLs Detected
Tomato         Soluble solids                 7
               Fruit mass                     13
               Fruit pH                       9
               Growth                         5
               Leaflet shape                  9
               Height                         9
Corn           Height                         11
               Leaf length                    7
               Tiller number                  1
               Glume hardness                 5
               Grain yield                    18
               Number of ears                 9
               Thermotolerance                6
Common bean    Number of nodules              4
Mung bean      Seed weight                    4
Cow pea        Seed weight                    2
Wheat          Preharvest sprout              4
Pig            Growth                         2
               Length of small intestine      1
               Average back fat               1
               Abdominal fat                  1
Mouse          Epilepsy                       2
Rat            Hypertension                   2

Source: After S. D. Tanksley, Mapping polygenes, Annual Review of Genetics 27 (1993):218.

◗ 22.19 The availability of molecular markers makes the mapping of QTLs possible in many organisms.

Applications of QTL mapping
The number of genes affecting a quantitative characteristic can be estimated by locating QTLs with genetic markers and adding up the number of QTLs detected. This method will always yield an underestimate, because QTLs that are located close together on the same chromosome will be counted together, and those with small effects are likely to be missed. QTL mapping also provides information about the magnitude of the effects that individual genes have on a quantitative characteristic. The polygenic model assumes that many genes affect a quantitative characteristic, that the effect of each gene is small, and that the effects of the genes are equal and additive. The results of studies of QTLs in a number of organisms now show that these assumptions are not always valid. Polygenes appear to vary widely in their effects. In many of the characteristics that have been studied, a few QTLs account for much of the phenotypic variation. In some instances, individual QTLs have been mapped that account for more than 20% of the variance in the characteristic.
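The marker-phenotype association that underlies QTL mapping, and the share of variance attributed to a single QTL, can be illustrated with a minimal sketch. The function name, the genotype labels, and the six data points are all hypothetical, and the variance share is computed here as between-class variance divided by total variance, a simplification chosen for illustration. A real analysis would use many more offspring, many markers, and a formal significance test.

```python
from statistics import mean, pvariance


def marker_association(marker_genotypes, phenotypes):
    """Group phenotypes by marker genotype.

    Returns (mean phenotype per genotype class, fraction of the total
    phenotypic variance that lies between the genotype classes).
    """
    classes = {}
    for genotype, value in zip(marker_genotypes, phenotypes):
        classes.setdefault(genotype, []).append(value)

    class_means = {g: mean(values) for g, values in classes.items()}
    grand_mean = mean(phenotypes)
    total_variance = pvariance(phenotypes)

    # Between-class variance: squared deviation of each class mean from the
    # grand mean, weighted by the number of individuals in the class.
    between = sum(
        len(values) * (class_means[g] - grand_mean) ** 2
        for g, values in classes.items()
    ) / len(phenotypes)

    fraction = between / total_variance if total_variance > 0 else 0.0
    return class_means, fraction


# Hypothetical F2 offspring scored at one marker (labels are illustrative).
genotypes = ["AA", "AA", "Aa", "Aa", "aa", "aa"]
heights = [12.0, 14.0, 10.0, 12.0, 8.0, 10.0]

class_means, fraction_between = marker_association(genotypes, heights)
print(class_means)
print(f"fraction of variance between marker classes: {fraction_between:.2f}")
```

A consistent difference in mean phenotype among marker classes is the kind of association that suggests a linked QTL; a marker that is not linked to any QTL would be expected to show class means close to one another.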
Concepts
The availability of numerous genetic markers revealed by molecular methods makes it possible to map individual genes that contribute to polygenic characteristics.

www.whfreeman.com/pierce More on QTLs

Response to Selection
Evolution is genetic change. Several different forces are potentially capable of producing evolution, and we will explore these forces and the process of evolution more fully in the next chapter. Here, we consider how one of these forces, natural selection, may bring about genetic change in a quantitative characteristic. Charles Darwin proposed the idea of natural selection in his book On the Origin of Species in 1859. Natural selection arises through the differential reproduction of individuals with different genotypes, allowing individuals with certain genotypes to produce more offspring than others. Natural selection is one of the most important of the forces that bring about evolutionary change and can be summarized as follows:

Observation 1: Many more individuals are produced each generation than are capable of surviving long enough to reproduce.
Observation 2: There is much phenotypic variation within natural populations.
Observation 3: Some phenotypic variation is heritable. In the terminology of quantitative genetics, some of the phenotypic variation in these characteristics is due to genetic variation, and these characteristics have heritability.
Logical consequence: Individuals with certain characters (called adaptive traits) survive and reproduce better than others. Because the adaptive traits are heritable, offspring will tend to resemble their parents with regard to these traits, and there will be more individuals with these adaptive traits in the next generation. Thus, adaptive traits will tend to increase in the population through time.
In this way, organisms become genetically suited to their environments; as environments change, organisms change in ways that make them better able to survive and reproduce. For thousands of years, humans have practiced a form of selection by promoting the reproduction of organisms with traits perceived as desirable. This form of selection is artificial selection, and it has produced the domestic plants and animals that make modern agriculture possible. The power of artificial selection, the first application of genetic principles by humans, is illustrated by the tremendous diversity of shapes, colors, and behaviors of modern domesticated dogs ( ◗ FIGURE 22.20). Predicting the Response to Selection When a quantitative characteristic is subjected to natural or artificial selection, it will increase with the passage of time, provided there is genetic variation for that characteristic in the population. Suppose a dairy farmer breeds only those cows in his herd that have the highest milk production. If there is genetic variation in milk production, the mean milk production in the offspring of the selected cows should be higher than the mean milk production of the original herd. This increased production is due to the fact that the selected cows possess more genes for high milk production than does the average cow, and these genes are passed on to the offspring. The offspring of the selected cows possess a higher proportion of genes for greater milk yield and therefore produce more milk than the average cow in the initial herd. The extent to which a characteristic subjected to selection changes in one generation is termed the response to selection. Suppose that the average cow in a dairy herd produces 80 liters of milk per week. A farmer selects for increased milk production by breeding the highest milk producers, and the progeny of these selected cows produce 100 liters of milk per week on average. 
The response to selection is calculated by subtracting the mean phenotype of the original population (80 liters) from the mean phenotype of the offspring (100 liters), obtaining a response to selection of $100 - 80 = 20$ liters per week. The response to selection is determined primarily by two factors. First, it is affected by the narrow-sense heritability, which largely determines the degree of resemblance between parents and offspring. When the narrow-sense heritability is high, offspring will tend to resemble their parents; conversely, when the narrow-sense heritability is low, there will be little resemblance between parents and offspring. The second factor that determines the response to selection is how much selection there is. If the farmer is very stringent in the choice of parents and breeds only the highest milk producers in the herd (say, the top 2 cows), then all the offspring will receive genes for high-quality milk production. If the farmer is less selective and breeds the top 20 milk producers in the herd, then the offspring will not carry as many superior genes for high milk production, and they will not, on average, produce as much milk as the offspring of the top 2 producers. The response to selection depends on the phenotypic difference of the individuals that are selected as parents; this phenotypic difference is measured by the selection differential, defined as the difference between the mean phenotype of the selected parents and the mean phenotype of the original population.
If the average milk production of the original herd is 80 liters and the farmer breeds cows with an average milk production of 120 liters, then the selection differential is $120 - 80 = 40$ liters. The response to selection ($R$) depends on the narrow-sense heritability ($h^2$) and the selection differential ($S$):

$R = h^2 \times S$ (22.21)

This equation can be used to predict the magnitude of change in a characteristic when a given selection differential is applied. G. Clayton and his colleagues estimated the response to selection that would take place in abdominal bristle number of Drosophila melanogaster. Using several different methods, including parent–offspring regression, they first estimated the narrow-sense heritability of abdominal bristle number in one population of fruit flies to be .52. The mean number of bristles in the original population was 35.3. They selected individual flies with a mean bristle number of 40.6 and intercrossed them to produce the next generation. The selection differential was $40.6 - 35.3 = 5.3$; so they predicted a response to selection to be

$R = .52 \times 5.3 = 2.8$

The response to selection of 2.8 represents the expected increase in the characteristic of the offspring above that of the original population. They therefore expected the average number of abdominal bristles in the offspring of their selected flies to be $35.3 + 2.8 = 38.1$. Indeed, they found an average bristle number of 37.9 in these flies.

Rearranging Equation 22.21 provides another way to calculate the narrow-sense heritability:

$h^2 = \dfrac{R}{S}$ (22.22)

In this way, $h^2$ can be calculated by conducting a response-to-selection experiment. First, the selection differential is obtained by subtracting the population mean from the mean of selected parents. The selected parents are then interbred, and the mean phenotype of their offspring is measured. The difference between the mean of the offspring and that of the initial population is the response to selection, which can be used with the selection differential to estimate the heritability. Heritability determined by a response-to-selection experiment is usually termed the realized heritability. If certain assumptions are met, the realized heritability is identical with the narrow-sense heritability.

◗ 22.20 Artificial selection has produced the tremendous diversity of shape, size, color, and behavior seen today among breeds of domestic dogs.

One of the longest selection experiments is a study of oil and protein content in corn seeds ( ◗ FIGURE 22.21). This experiment began at the University of Illinois on 163 ears of corn with an oil content ranging from 4% to 6%. Corn plants having high oil content and those having low oil content were selected and interbred. Response to selection for increased oil content (the upper line in Figure 22.21) reached about 20%, whereas response to selection for decreased oil content reached a lower limit near zero.
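Returning to the bristle-number example, the relation among the response to selection, the narrow-sense heritability, and the selection differential can be checked with a short sketch. The function names are illustrative choices, the input numbers are those quoted in the text, and the realized heritability printed at the end is simply computed from those quoted means; it is not a figure reported in the text.

```python
def selection_differential(mean_selected: float, mean_population: float) -> float:
    """S: mean of the selected parents minus mean of the original population."""
    return mean_selected - mean_population


def predicted_response(h2: float, s: float) -> float:
    """R = h^2 * S (the prediction equation for response to selection)."""
    return h2 * s


def realized_heritability(mean_offspring: float, mean_population: float, s: float) -> float:
    """h^2 = R / S, with R measured as offspring mean minus original mean."""
    if s == 0:
        raise ValueError("selection differential must be nonzero")
    return (mean_offspring - mean_population) / s


# Numbers quoted in the text for abdominal bristle number in fruit flies.
population_mean = 35.3
selected_mean = 40.6
h2_estimate = 0.52
observed_offspring_mean = 37.9

s = selection_differential(selected_mean, population_mean)
r = predicted_response(h2_estimate, s)
expected_offspring_mean = population_mean + r
realized_h2 = realized_heritability(observed_offspring_mean, population_mean, s)

print(f"S = {s:.1f}, predicted R = {r:.1f}")
print(f"expected offspring mean = {expected_offspring_mean:.1f}")
print(f"realized heritability from the observed mean = {realized_h2:.2f}")
```

The predicted offspring mean (about 38.1) and the observed mean (37.9) are close, which is why the heritability recovered from the observed response comes out near the value estimated beforehand.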
Genetic analysis of the high- and low-oil-content strains revealed that at least 20 loci take part in determining oil content.

◗ 22.21 In a long-term response-to-selection experiment, selection for oil content in corn increased oil content in one line to about 20%, while almost eliminating it altogether in another line. (Axes: percentage of oil content against generation; upper line, selection for high oil content; lower line, selection for low oil content.)

Concepts
The response to selection is influenced by narrow-sense heritability and the selection differential.

Limits to Selection Response
When a characteristic has been selected for many generations, the response eventually levels off, and the characteristic no longer responds to selection ( ◗ FIGURE 22.22). A potential reason for this leveling off is that the genetic variation in the population may be exhausted; at some point, all individuals in the population have become homozygous for alleles that encode the selected trait. When there is no more additive genetic variation, heritability equals zero, and no further response to selection can occur. The response to selection may level off even while some genetic variation remains in the population, however, because natural selection opposes further change in the characteristic. Response to selection for small body size in mice, for example, eventually levels off because the smallest animals are sterile and cannot pass on their genes for small body size. In this case, artificial selection for small size is opposed by natural selection for fertility, and the population can no longer respond to the artificial selection.

◗ 22.22 The response of a population to selection often levels off at some point in time. In a response-to-selection experiment for increased abdominal chaetae bristle number in female fruit flies, the number of bristles increased steadily for about 20 generations and then leveled off. (Axes: mean number of bristles against generations, for a selected line and a control line.)

Correlated Responses
Two or more characteristics are often correlated. Human height and weight exhibit a positive correlation: tall people, on the average, weigh more than short people. This correlation is a phenotypic correlation, because the association is between two phenotypes of the same person. Phenotypic correlations may be due to environmental or genetic correlations. Environmental correlations refer to two or more characteristics that are influenced by the same environmental factor. Moisture availability, for example, may affect both the size of a plant and the number of seeds produced by the plant. Plants growing in environments with lots of water are large and produce many seeds, whereas plants growing in environments with limited water are small and have few seeds. Alternatively, a phenotypic correlation may result from a genetic correlation, which means that the genes affecting two characteristics are associated. The primary genetic cause of phenotypic correlations is pleiotropy, which is due to the effect of one gene on two or more characteristics (see Chapter 5). In humans, for example, many body structures respond to growth hormone, and there are genes that affect the amount of growth hormone secreted by the pituitary gland. People with certain genes produce high levels of growth hormone, which increases both height and hand size. Others possess genes that produce lower levels of growth hormone, which leads to both short stature and small hands. Height and hand size are therefore phenotypically correlated in humans, and this correlation is due to a genetic correlation, the fact that both characteristics are affected by the same genes that control the amount of growth hormone.
Genetically speaking, height and hand size are the same characteristic, because they are the phenotypic manifestation of a single set of genes. When two characteristics are influenced by the same genes they are genetically correlated. Genetic correlations are quite common (Table 22.3) and may be positive or negative. A positive genetic correlation between two characteristics means that genes that cause an increase in one characteristic also produce an increase in the other characteristic. Thorax length and wing length in Drosophila are positively correlated because the genes that increase thorax length also increase wing length. A negative genetic correlation means that genes that cause an increase in one characteristic produce a decrease in the other characteristic. Milk yield and percentage of butterfat are negatively correlated in cattle: genes that cause higher milk production result in milk with a lower percentage of butterfat.

Table 22.3 Genetic correlations in various organisms

Organism     Characteristics                                              Genetic Correlation
Cattle       Milk yield and percentage of butterfat                       −.38
Pig          Weight gain and back-fat thickness                           .13
             Weight gain and efficiency                                   .69
Chicken      Body weight and egg weight                                   .42
             Body weight and egg production                               −.17
             Egg weight and egg production                                −.31
Mouse        Body weight and tail length                                  .29
Fruit fly    Abdominal bristle number and sternopleural bristle number    .41

Source: After D. S. Falconer, Introduction to Quantitative Genetics (London: Longman, 1981), p. 284.

Genetic correlations are important in animal and plant breeding because they produce a correlated response to selection, which means that, when one characteristic is selected, genetically correlated characteristics also change. Correlated responses to selection occur because both characteristics are influenced by the same genes; selection for one characteristic causes a change in the genes affecting that characteristic, and these genes also affect the second characteristic, causing it to change at the same time. Correlated responses may well be undesirable and may limit the ability to alter a characteristic by selection. From 1944 to 1964, domestic turkeys were subjected to intense selection for growth rate and body size. At the same time, fertility, egg production, and egg hatchability all declined. These correlated responses were due to negative genetic correlations between body size and fertility; eventually, these genetic correlations limited the extent to which the growth rate of turkeys could respond to selection. Genetic correlations may also limit the ability of natural populations to respond to selection in the wild and adapt to their environments.

Concepts
Genetic correlations result from pleiotropy. When two characteristics are genetically correlated, selection for one characteristic will produce a correlated response in the other characteristic.

www.whfreeman.com/pierce Information and links on the use of quantitative genetics in plant and animal breeding

Connecting Concepts Across Chapters
In this chapter, our perspective has shifted from individual genotypes (emphasized in transmission genetics) and the physical nature of the gene (emphasized in molecular genetics) to the genetic properties of groups of individuals. This shift will also be our perspective in Chapter 23, on population genetics. Many of the most important characteristics in nature are those that display complex phenotypes and vary continuously. Body weight, reproductive output, susceptibility to diseases, and behavioral attributes often have continuous phenotypes. These types of characteristics are important in agriculture and are frequently significant in human health and evolution.
An important theme of this chapter has been that such complex characteristics are inherited according to Mendelian principles, but more genes take part and environmental factors modify the phenotype. Because many factors influence the phenotypes of these complex characteristics, individual genes are difficult to identify, and we cannot predict precise phenotypic ratios among the offspring of a particular cross. Nevertheless, statistical procedures can be used to predict the average offspring phenotype and to assess the extent to which genetic and environmental factors are responsible for phenotypic differences in a characteristic. Because the genes that influence quantitative characteristics are inherited according to Mendelian principles, the study of quantitative genetics requires a thorough understanding of the basic principles of heredity, which were covered in Chapters 3 through 7. Twin studies, which can be used to calculate heritability, are discussed in detail in Chapter 6; restriction fragment length polymorphisms and microsatellite variants, used to map quantitative trait loci, are explained in Chapter 18. The study of quantitative genetics depends on the genetic composition of populations and how that composition changes with time, which is the focus of Chapter 23.

CONCEPTS SUMMARY
• Quantitative genetics focuses on the inheritance of complex characteristics whose phenotype varies continuously. For many quantitative characteristics, the relation between genotype and phenotype is complex because many genes and environmental factors influence a characteristic.
• Quantitative characteristics also include meristic (counting) characteristics and threshold characteristics whose underlying genetic basis is influenced by multiple factors.
• Many quantitative characteristics are polygenic.
The individual genes that influence a polygenic characteristic follow the same Mendelian principles that govern discontinuous characteristics, but, because many genes participate, the expected ratios of phenotypes are obscured.
• A population is the group of interest, and a sample is a subset of the population used to describe it.
• A frequency distribution, in which the phenotypes are represented on one axis and the number of individuals possessing the phenotype is represented on the other, is a convenient means of summarizing phenotypes found in a group of individuals.
• The mean and variance provide key information about a distribution: the mean gives the central location of the distribution, and the variance provides information about how the phenotype varies within a group.
• The correlation coefficient measures the direction and strength of association between two variables. Regression can be used to predict the value of one variable on the basis of the value of a correlated variable.
• Phenotypic variance in a characteristic can be divided into components that are due to additive genetic variance, dominance genetic variance, genic interaction variance, environmental variance, and genetic–environmental interaction variance.
• Broad-sense heritability is the proportion of the phenotypic variance that is due to genetic variance; narrow-sense heritability is the proportion of the phenotypic variance due to additive genetic variance.
• Broad-sense heritability can be estimated by eliminating the environmental variance component. Narrow-sense heritability can be estimated by comparing the phenotypes of parents and offspring or by comparing phenotypes of individuals with different degrees of relatedness, such as identical twins and nonidentical twins.
• Heritability provides information only about the degree to which variation in a characteristic results from genetic differences. It does not indicate the degree to which a characteristic is genetically determined.
Heritability is based on the variances present within a group of individuals, and an individual does not have heritability. Heritability of a characteristic varies among populations and among environments. Even if heritability for a characteristic is high, the characteristic may still be altered by changes in the environment. Heritabilities provide no information about the nature of population differences in a characteristic.
• Quantitative trait loci are genes that control polygenic characteristics. QTLs can be mapped by examining the association between the inheritance of a quantitative characteristic and the inheritance of genetic markers. The mapping of numerous genetic markers with molecular techniques has made QTL mapping feasible for many organisms.
• When selection is applied to a quantitative characteristic, the characteristic will change if additive genetic variation for the characteristic is present. The amount that a quantitative characteristic changes in a single generation when subjected to selection (the response to selection) is directly related to the selection differential and narrow-sense heritability. By applying a selection differential and measuring the response to selection, narrow-sense heritability can be calculated.
• After selection has been applied to a quantitative characteristic for a number of generations, the response to selection may level off because no additive genetic variation in the characteristic remains. Alternatively, the response to selection may level off because of genetic correlations between the selected trait and other traits that affect fitness.
• A genetic correlation may be present when the same gene affects two or more characteristics (pleiotropy). Genetic correlations produce correlated responses to selection.

IMPORTANT TERMS
quantitative genetics
meristic characteristic
threshold characteristic
frequency distribution
normal distribution
population
sample
mean
variance
standard deviation
correlation
correlation coefficient
regression
regression coefficient
heritability
phenotypic variance
genetic variance
environmental variance
genetic-environmental interaction variance
additive genetic variance
dominance genetic variance
genic interaction variance
broad-sense heritability
narrow-sense heritability
quantitative trait locus (QTL)
natural selection
artificial selection
response to selection
selection differential
realized heritability
phenotypic correlation
genetic correlation

Worked Problems

1. Seed weight in a particular plant species is determined by pairs of alleles at two loci (a⁺a⁻ and b⁺b⁻) that are additive and equal in their effects. Plants with genotype a⁻a⁻b⁻b⁻ have seeds that average 1 g in weight, whereas plants with genotype a⁺a⁺b⁺b⁺ have seeds that average 3.4 g in weight. A plant with genotype a⁻a⁻b⁻b⁻ is crossed with a plant of genotype a⁺a⁺b⁺b⁺.
(a) What is the predicted weight of seeds from the F1 progeny of this cross?
(b) If the F1 plants are intercrossed, what are the expected seed weights and proportions of the F2 plants?

• Solution
The difference in average seed weight of the two parental genotypes is 3.4 g − 1 g = 2.4 g. These two genotypes differ in four genes; so, if the genes have equal and additive effects, each gene difference contributes an additional 2.4 g/4 = 0.6 g of weight to the 1-g weight of a plant with none of these contributing genes (a⁻a⁻b⁻b⁻).
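The additive model used in this solution can also be checked by brute-force enumeration. The Python sketch below is illustrative only: it assumes that the two loci assort independently (as the multiplication of single-locus probabilities in the solution implies), and the names `BASE_WEIGHT`, `EFFECT`, and `f2_distribution` are hypothetical, not taken from the text.

```python
from fractions import Fraction
from itertools import product

BASE_WEIGHT = Fraction(1)      # g, genotype with no contributing alleles
EFFECT = Fraction(6, 10)       # g added per contributing allele: (3.4 - 1) / 4


def f2_distribution(n_loci=2):
    """Return {number of contributing alleles: probability} for the F2 of
    F1 parents heterozygous at n_loci loci (assumed to assort independently)."""
    # One locus, F1 x F1: 0, 1, or 2 contributing alleles in a 1:2:1 ratio.
    per_locus = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
    dist = {}
    for combo in product(per_locus.items(), repeat=n_loci):
        count = sum(alleles for alleles, _ in combo)
        prob = Fraction(1)
        for _, p in combo:
            prob *= p
        dist[count] = dist.get(count, 0) + prob
    return dist


dist = f2_distribution()
for count in sorted(dist):
    weight = BASE_WEIGHT + count * EFFECT
    print(f"{count} contributing alleles: {float(weight):.1f} g, proportion {dist[count]}")
```

Fractions print in lowest terms, so a proportion of 4/16 appears as 1/4 and 6/16 as 3/8.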
The cross between the two homozygous genotypes produces the following F1 and F2 progeny:

P     a⁻a⁻b⁻b⁻ (1 g) × a⁺a⁺b⁺b⁺ (3.4 g)
F1    a⁺a⁻b⁺b⁻ (2.2 g)

F2
Genotype      Probability          Contributing genes   Class total         Average seed weight
a⁻a⁻b⁻b⁻      1/4 × 1/4 = 1/16     0                    1/16                1 g + (0 × 0.6 g) = 1 g
a⁺a⁻b⁻b⁻      1/2 × 1/4 = 1/8      1                    2/8 = 4/16          1 g + (1 × 0.6 g) = 1.6 g
a⁻a⁻b⁺b⁻      1/4 × 1/2 = 1/8      1
a⁺a⁺b⁻b⁻      1/4 × 1/4 = 1/16     2                    2/16 + 1/4 = 6/16   1 g + (2 × 0.6 g) = 2.2 g
a⁻a⁻b⁺b⁺      1/4 × 1/4 = 1/16     2
a⁺a⁻b⁺b⁻      1/2 × 1/2 = 1/4      2
a⁺a⁺b⁺b⁻      1/4 × 1/2 = 1/8      3                    2/8 = 4/16          1 g + (3 × 0.6 g) = 2.8 g
a⁺a⁻b⁺b⁺      1/2 × 1/4 = 1/8      3
a⁺a⁺b⁺b⁺      1/4 × 1/4 = 1/16     4                    1/16                1 g + (4 × 0.6 g) = 3.4 g

(a) The F1 are heterozygous at both loci (a⁺a⁻b⁺b⁻) and possess two genes that contribute an additional 0.6 g each to the 1-g weight of a plant with no contributing genes. Therefore, the seeds of the F1 should average 1 g + 2(0.6 g) = 2.2 g.
(b) The F2 will have the following phenotypes and proportions: 1/16 1 g; 4/16 1.6 g; 6/16 2.2 g; 4/16 2.8 g; and 1/16 3.4 g.

2. Phenotypic variation is analyzed for milk production in a herd of dairy cattle and the following variance components are obtained.

Additive genetic variance ($V_A$) = .4
Dominance genetic variance ($V_D$) = .1
Genic interaction variance ($V_I$) = .2
Environmental variance ($V_E$) = .5
Genetic-environmental interaction variance ($V_{GE}$) = .0

(a) What is the narrow-sense heritability of milk production?
(b) What is the broad-sense heritability of milk production?

• Solution
To determine the heritabilities, we first need to calculate $V_P$ and $V_G$.

$V_P = V_A + V_D + V_I + V_E + V_{GE} = .4 + .1 + .2 + .5 + 0 = 1.2$
$V_G = V_A + V_D + V_I = .7$

(a) The narrow-sense heritability is:
$h^2 = \frac{V_A}{V_P} = \frac{0.4}{1.2} = .33$
(b) The broad-sense heritability is:
$H^2 = \frac{V_G}{V_P} = \frac{0.7}{1.2} = .58$

3. The heights of parents and their offspring are measured for 10 families:

Mean height of parents (cm)    Mean height of offspring (cm)
150                            152
157                            163
188                            193
165                            163
160                            152
142                            157
170                            183
183                            175
152                            163
173                            180

From these data, determine:
(a) the mean, variance, and standard deviation of height of parents and offspring;
(b) the correlation and regression coefficients for a regression of mean offspring height on mean parental height; and
(c) the narrow-sense heritability of height in these families.
(d) What conclusions can be drawn from the heritability value determined in part c?

• Solution
(a) The best way to begin is by constructing a table, as shown below. To calculate the means, we need to sum the values of x and y, which are shown in the last rows of columns A and D of the table.

Column A: mean height of parents, $x_i$ (cm)
Column B: $x_i - \bar{x}$
Column C: $(x_i - \bar{x})^2$
Column D: mean height of offspring, $y_i$ (cm)
Column E: $y_i - \bar{y}$
Column F: $(y_i - \bar{y})^2$
Column G: $(x_i - \bar{x})(y_i - \bar{y})$

A       B      C      D       E        F         G
150    −14    196    152    −16.1    259.21    225.4
157     −7     49    163     −5.1     26.01     35.7
188     24    576    193     24.9    620.01    597.6
165      1      1    163     −5.1     26.01     −5.1
160     −4     16    152    −16.1    259.21     64.4
142    −22    484    157    −11.1    123.21    244.2
170      6     36    183     14.9    222.01     89.4
183     19    361    175      6.9     47.61    131.1
152    −12    144    163     −5.1     26.01     61.2
173      9     81    180     11.9    141.61    107.1
Σ 1640        1944   1681            1750.9    1551

For the mean of parental height,
$\bar{x} = \frac{\sum x_i}{n} = \frac{1640}{10} = 164$ cm
For the mean of the offspring height,
$\bar{y} = \frac{\sum y_i}{n} = \frac{1681}{10} = 168.1$ cm

For the variance, we subtract the mean from each x and y value (columns B and E) and square these differences (columns C and F). The sums of these squared deviations are shown in the last row of columns C and F. For the regression, we need the covariance, which requires that we take the difference between each x value and its mean and multiply it by the difference between each y value and its mean [$(x_i - \bar{x})(y_i - \bar{y})$, column G] and then sum these products (last row of column G).
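The column totals and the statistics derived from them in this solution can be verified numerically. The following Python sketch is only a check under the solution's own definitions (sample variance and covariance with n − 1 in the denominator); the variable names are illustrative assumptions rather than notation from the text.

```python
from math import sqrt

# Mean heights (cm) for the 10 families in Worked Problem 3
parents = [150, 157, 188, 165, 160, 142, 170, 183, 152, 173]
offspring = [152, 163, 193, 163, 152, 157, 183, 175, 163, 180]

n = len(parents)
x_bar = sum(parents) / n
y_bar = sum(offspring) / n

# Sums of squared deviations and of cross products about the means
ss_x = sum((x - x_bar) ** 2 for x in parents)
ss_y = sum((y - y_bar) ** 2 for y in offspring)
sp_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(parents, offspring))

var_x = ss_x / (n - 1)           # sample variance of parental height
var_y = ss_y / (n - 1)           # sample variance of offspring height
cov_xy = sp_xy / (n - 1)         # sample covariance

r = cov_xy / (sqrt(var_x) * sqrt(var_y))   # correlation coefficient
b = cov_xy / var_x                         # regression of offspring on parents

print(f"means: {x_bar:.1f}, {y_bar:.1f}")
print(f"variances: {var_x:.2f}, {var_y:.2f}; covariance: {cov_xy:.2f}")
print(f"r = {r:.2f}, b = {b:.2f}")
```

With these data the three sums reproduce the column totals of 1944, 1750.9, and 1551, and r and b round to .84 and .80.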
The variance is the sum of the squared deviations from the mean divided by n − 1, where n is the number of measurements:

$s_x^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{1944}{9} = 216$
$s_y^2 = \frac{\sum (y_i - \bar{y})^2}{n - 1} = \frac{1750.9}{9} = 194.54$

The standard deviation is the square root of the variance:

$s_x = \sqrt{s_x^2} = \sqrt{216} = 14.70$
$s_y = \sqrt{s_y^2} = \sqrt{194.54} = 13.95$

(b) To calculate the correlation coefficient and regression coefficient, we need the covariance:

$\mathrm{cov}_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{1551}{9} = 172.33$

The correlation coefficient is the covariance divided by the standard deviation of x and the standard deviation of y:

$r = \frac{\mathrm{cov}_{xy}}{s_x s_y} = \frac{172.33}{(14.70)(13.95)} = .84$

The regression coefficient is the covariance divided by the variance of x:

$b = \frac{\mathrm{cov}_{xy}}{s_x^2} = \frac{172.33}{216} = .80$

(c) In a regression of the mean phenotype of the offspring against the mean phenotype of the parents, the regression coefficient equals the narrow-sense heritability, which is .80.
(d) We conclude that 80% of the variance in height among the members of these families results from additive genetic variance.

4. A farmer is raising rabbits. The average body weight in his population of rabbits is 3 kg. The farmer selects the 10 largest rabbits in his population, whose average body weight is 4 kg, and interbreeds them. If the heritability of body weight in the rabbit population is .7, what is the expected body weight among offspring of the selected rabbits?

• Solution
The farmer has carried out a response-to-selection experiment, in which the response to selection will equal the selection differential times the narrow-sense heritability. The selection differential equals the difference in average weights of the selected rabbits and the entire population: 4 kg − 3 kg = 1 kg. The narrow-sense heritability is given as .7; so the expected response to selection is:

$R = h^2 \times S = .7 \times 1\ \text{kg} = 0.7\ \text{kg}$

This is the increase in weight that is expected in the offspring of the selected parents; so the average weight of the offspring is expected to be 3 kg + 0.7 kg = 3.7 kg.

COMPREHENSION QUESTIONS

*1.
How does a quantitative characteristic differ from a discontinuous characteristic?
2. Briefly explain why the relation between genotype and phenotype is frequently complex for quantitative characteristics.
3. Why do polygenic characteristics have many phenotypes?
*4. Explain the relation between a population and a sample. What characteristics should a sample have to be representative of the population?
5. What information do the mean and variance provide about a distribution?
6. How is the standard deviation related to the variance?
*7. What information does the correlation coefficient provide about the association between two variables?
8. What is regression? How is it used?
*10. List all the components that contribute to the phenotypic variance and define each component.
*11. How do the broad-sense and narrow-sense heritabilities differ?
12. Briefly outline some of the ways that heritability can be calculated.
13. Briefly discuss common misunderstandings or misapplications of the concept of heritability.
14. Briefly explain how genes affecting a polygenic characteristic are located with the use of QTL mapping.
*15. How is the response to selection related to the narrow-sense heritability and the selection differential? What information does the response to selection provide?
16. Why does the response to selection often level off after many generations of selection?
*17. What is the difference between phenotypic and genetic correlations?

APPLICATION QUESTIONS AND PROBLEMS

*18. For each of the following characteristics, indicate whether it would be considered a discontinuous characteristic or a quantitative characteristic. Briefly justify your answer.
(a) Kernel color in a strain of wheat, in which two codominant alleles segregating at a single locus determine the color. Thus, there are three phenotypes present in this strain: white, light red, and medium red.
(b) Body weight in a family of Labrador retrievers. An autosomal recessive allele that causes dwarfism is present in this family. Two phenotypes are recognized: dwarf (less than 13 kg) and normal (greater than 13 kg).
(c) Presence or absence of leprosy; susceptibility to leprosy is determined by multiple genes and numerous environmental factors.
(d) Number of toes in guinea pigs, which is influenced by genes at many loci.
(e) Number of fingers in humans; extra (more than five) fingers are caused by the presence of an autosomal dominant allele.

*19. The following data are the numbers of digits per foot in 25 guinea pigs. Construct a frequency distribution for these data.
4, 4, 4, 5, 3, 4, 3, 4, 4, 5, 4, 4, 3, 2, 4, 4, 5, 6, 4, 4, 3, 4, 4, 4, 5

20. Ten male Harvard students were weighed in 1916. Their weights are given in the following table. Calculate the mean, variance, and standard deviation for these weights.

Weight (kg) of Harvard students (class of 1920)
51   69   69   57   61   57   75   105   69   63

*21. In Moab, Utah, temperature and rainfall exhibit a correlation of .67. Assuming that this correlation is significant (not due to chance), is there more rainfall, on the average, in the summer or in the winter at this location?

22. Among a population of tadpoles, the correlation coefficient for size at metamorphosis and time required for metamorphosis is .74. On the basis of this correlation, what conclusions can you make about the relative sizes of tadpoles that metamorphose quickly and those that metamorphose more slowly?

*23. A researcher studying alcohol consumption in North American cities finds a significant, positive correlation between the number of Baptist preachers and alcohol consumption. Is it reasonable for the researcher to conclude that the Baptist preachers are consuming most of the alcohol? Why or why not?

24. Body weight and length were measured on six mosquito fish; these measurements are given in the following table. Calculate the correlation coefficient for weight and length in these fish.

Wet weight (g)   115   130   210   110   140   185
Length (mm)       18    19    22    17    20    21

*25. The heights of mothers and daughters are given in the following table.
(a) Calculate the correlation coefficient for the heights of the mothers and daughters.
(b) Using regression, predict the expected height of a daughter whose mother is 67 inches tall.

Height of mother (in)     64   65   66   64   63   63   59   62   61   60
Height of daughter (in)   66   66   68   65   65   62   62   64   63   62

*26. Assume that plant weight is determined by a pair of alleles at each of two independently assorting loci (A and a, B and b) that are additive in their effects. Further, assume that each allele represented by an uppercase letter contributes 4 g to weight and each allele represented by a lowercase letter contributes 1 g to weight.
(a) If a plant with genotype AABB is crossed with a plant with genotype aabb, what weights are expected in the F1 progeny?
(b) What is the distribution of weight expected in the F2 progeny?

27. Assume that three loci, each with two alleles (A and a, B and b, C and c), determine the differences in height between two homozygous strains of a plant. These genes are additive and equal in their effects on plant height. One strain (aabbcc) is 10 cm in height. The other strain (AABBCC) is 22 cm in height. The two strains are crossed, and the resulting F1 are interbred to produce F2 progeny. Give the phenotypes and the expected proportions of the F2 progeny.

*28. A farmer has two homozygous varieties of tomatoes. One variety, called Little Pete, has fruits that average only 2 cm in diameter. The other variety, Big Boy, has fruits that average a whopping 14 cm in diameter. The farmer crosses Little Pete and Big Boy; he then intercrosses the F1 to produce F2 progeny. He grows 2000 F2 tomato plants and doesn't find any F2 offspring that produce fruits as small as Little Pete or as large as Big Boy. If we assume that the differences in fruit size of these varieties are produced by genes with equal and additive effects, what conclusion can we make about the minimum number of loci with pairs of alleles determining the differences in fruit size of the two varieties?

29. Seed size in a plant is a polygenic characteristic. A grower crosses two pure-breeding varieties of the plant and measures seed size in the F1 progeny. He then backcrosses the F1 plants to one of the parental varieties and measures seed size in the backcross progeny. The grower finds that seed size in the backcross progeny has a higher variance than does seed size in the F1 progeny. Explain why the backcross progeny are more variable.

*30. Phenotypic variation in tail length of mice has the following components:

Additive genetic variance ($V_A$) = .5
Dominance genetic variance ($V_D$) = .3
Genic interaction variance ($V_I$) = .1
Environmental variance ($V_E$) = .4
Genetic-environmental interaction variance ($V_{GE}$) = .0

(a) What is the narrow-sense heritability of tail length?
(b) What is the broad-sense heritability of tail length?

31. The narrow-sense heritability of ear length in Reno rabbits is .4. The phenotypic variance ($V_P$) is .8 and the environmental variance ($V_E$) is .2. What is the additive genetic variance ($V_A$) for ear length in these rabbits?

*32. Assume that human ear length is influenced by multiple genetic and environmental factors. Suppose you measured ear length on three groups of people, in which group A consists of five unrelated persons, group B consists of five siblings, and group C consists of five first cousins.
(a) Assuming that the environment for each group is similar, which group should have the highest phenotypic variance? Explain why.
(b) Is it realistic to assume that the environmental variance for each group is similar? Explain your answer.

33. A characteristic has a narrow-sense heritability of .6.
(a) If the dominance variance ($V_D$) increases and all other variance components remain the same, what will happen to the narrow-sense heritability? Will it increase, decrease, or remain the same? Explain.
(b) What will happen to the broad-sense heritability? Explain.
(c) If the environmental variance ($V_E$) increases and all other variance components remain the same, what will happen to the narrow-sense heritability? Explain.
(d) What will happen to the broad-sense heritability? Explain.

34. Flower color in the pea plants that Mendel studied is controlled by alleles at a single locus. A group of peas homozygous for purple flowers is grown in a garden. Careful study of the plants reveals that all their flowers are purple, but there is some variability in the intensity of the purple color. If heritability were estimated for this variation in flower color, what would it be? Explain your answer.

*35. A graduate student is studying a population of bluebonnets along a roadside. The plants in this population are genetically variable. She counts the seeds produced by 100 plants and measures the mean and variance of seed number. The variance is 20. Selecting one plant, the graduate student takes cuttings from it, and cultivates these cuttings in the greenhouse, eventually producing many genetically identical clones of the same plant. She then transplants these clones into the roadside population, allows them to grow for 1 year, and then counts the number of seeds produced by each of the cloned plants. The graduate student finds that the variance in seed number among these cloned plants is 5. From the phenotypic variance of the genetically variable and genetically identical plants, she calculates the broad-sense heritability.
(a) What is the broad-sense heritability of seed number for the roadside population of bluebonnets?
(b) What might cause this estimate of heritability to be inaccurate?

*36. The length of the middle joint of the right index finger was measured on 10 sets of parents and their adult offspring. The mean parental lengths and the mean offspring lengths for each family are listed in the following table. Calculate the regression coefficient for regression of mean offspring length against mean parental length and estimate the narrow-sense heritability for this characteristic.

Mean parental length (mm)     30   35   28   33   26   32   31   29   40   33
Mean offspring length (mm)    31   36   31   35   27   30   34   28   38   34

*37. Mr. Jones is a pig farmer. For many years, he has fed his pigs the food left over from the local university cafeteria, which is known to be low in protein, deficient in vitamins, and downright untasty. However, the food is free, and his pigs don't complain. One day a salesman from a feed company visits Mr. Jones. The salesman claims that his company sells a new, high-protein, vitamin-enriched feed that enhances weight gain in pigs. Although the food is expensive, the salesman claims that the increased weight gain of the pigs will more than pay for the cost of the feed, increasing Mr. Jones's profit. Mr. Jones responds that he took a genetics class when he went to the university and that he has conducted some genetic experiments on his pigs; specifically, he has calculated the narrow-sense heritability of weight gain for his pigs and found it to be .98. Mr. Jones says that this heritability value indicates that 98% of the variance in weight gain among his pigs is determined by genetic differences, and therefore the new pig feed can have little effect on the growth of his pigs. He concludes that the feed would be a waste of his money. The salesman doesn't dispute Mr. Jones's heritability estimate, but he still claims that the new feed can significantly increase weight gain in Mr. Jones's pigs. Who is correct and why?

*38. Joe is breeding cockroaches in his dorm room. He finds that the average wing length in his population of cockroaches is 4 cm. He picks six cockroaches that have the largest wings; the average wing length among these selected cockroaches is 10 cm. Joe interbreeds these selected cockroaches. From previous studies, he knows that the narrow-sense heritability for wing length in his population of cockroaches is .6.
(a) Calculate the selection differential and expected response to selection for wing length in these cockroaches.
(b) What should be the average wing length of the progeny of the selected cockroaches?

39. Three characteristics in beef cattle (body weight, fat content, and tenderness) are measured and the following variance components are estimated.

            Body weight   Fat content   Tenderness
$V_A$           22            45            12
$V_D$           10            25             5
$V_I$            3             8             2
$V_E$           42            64             8
$V_{GE}$         0             0             1

In this population, which characteristic would respond best to selection? Explain your reasoning.

*40. A rancher determines that the average amount of wool produced by a sheep in his flock is 22 kg per year. In an attempt to increase the wool production of his flock, the rancher picks five male and five female sheep with the greatest wool production; the average amount of wool produced per sheep by those selected is 30 kg. He interbreeds these selected sheep and finds that the average wool production among the progeny of the selected sheep is 28 kg. What is the narrow-sense heritability for wool production among the sheep in the rancher's flock?

41. The narrow-sense heritability of wing length in a population of Drosophila melanogaster is .8. The narrow-sense heritability of head width in the same population is .9. The genetic correlation between wing length and head width is .86. If a geneticist selects for increased wing length in these flies, what will happen to head width?

CHALLENGE QUESTIONS

42. We have explored some of the difficulties in separating genetic and environmental components of human behavioral characteristics.
Considering these difficulties and what you know about calculating heritability, propose an experimental design for accurately measuring the heritability of musical ability.

43. A student who has just learned about quantitative genetics says, "Heritability estimates are worthless! They don't tell you anything about the genes that affect a characteristic. They don't provide any information about the types of offspring to expect from a cross. Heritability estimates measured in one population can't be used for other populations; so they don't even give you any general information about how much of a characteristic is genetically determined. I can't see that heritabilities do anything other than make undergraduate students sweat during tests." How would you respond to this statement? Is the student correct? What good are heritabilities, and why do geneticists bother to calculate them?

44. A geneticist selects for increased size in a population of fruit flies that she is raising in her laboratory. She starts with the two largest males and the two largest females and uses them as the parents for the next generation. From the progeny produced by these selected parents, she selects the two largest males and the two largest females and mates them. She repeats this procedure each generation. The average weight of flies in the initial population was 1.1 mg. The flies respond to selection, and their body size steadily increases. After 20 generations of selection, the average weight is 2.3 mg. However, after about 20 generations, the response to selection in subsequent generations levels off, and the average size of the flies no longer increases. At this point, the geneticist takes a long vacation; while she is gone, the fruit flies in her population interbreed randomly. When she returns from vacation, she finds that the average size of the flies in the population has decreased to 2.0 mg.
(a) Provide an explanation for why the response to selection leveled off after 20 generations.
(b) Why did the average size of the fruit flies decrease when selection was no longer applied during the geneticist's vacation?

45. Manic-depressive illness is a psychiatric disorder that has a strong hereditary basis, but the exact mode of inheritance is not known. Previous research has shown that siblings of patients with manic-depressive illness are more likely also to develop the disorder than are siblings of unaffected persons. A recent study demonstrated that the ratio of manic-depressive brothers to manic-depressive sisters is higher when the patient is male than when the patient is female. In other words, relatively more brothers of manic-depressive patients also have the disease when the patient is male than when the patient is female. What does this new observation suggest about the inheritance of manic-depressive illness?

23
Population and Evolutionary Genetics

• The Genetic History of Tristan da Cunha
• Genetic Variation
    Calculation of Genotypic Frequencies
    Calculation of Allelic Frequencies
• The Hardy-Weinberg Law
    Closer Examination of the Assumptions of the Hardy-Weinberg Law
    Implications of the Hardy-Weinberg Law
    Extensions of the Hardy-Weinberg Law
    Testing for Hardy-Weinberg Proportions
    Estimating Allelic Frequencies with the Hardy-Weinberg Law
• Nonrandom Mating
• Changes in Allelic Frequencies
    Mutation
    Migration
    Genetic Drift
    Natural Selection
• Molecular Evolution
    Protein Variation
    DNA Sequence Variation
    Molecular Evolution of HIV in a Florida Dental Practice
    Patterns of Molecular Variation
    The Molecular Clock
    Molecular Phylogenies

The inhabitants of the island of Tristan da Cunha have one of the highest incidences of asthma in the world due to the population's unique genetic history. (John Eckwall.)

The Genetic History of Tristan da Cunha

In the fall of 1993, geneticist Noé Zamel arrived at Tristan da Cunha, a small remote island in the South Atlantic (◗ FIGURE 23.1). It had taken Zamel 9 days to make the trip from his home in Canada, first by plane from Toronto to South Africa and then aboard a small research vessel to the island. Because of its remote location, the people of Tristan da Cunha call their home "the loneliest island," but isolation was not what attracted Zamel to Tristan da Cunha. Zamel was looking for a gene that causes asthma, and the inhabitants of Tristan da Cunha have one of the world's highest incidences of hereditary asthma: more than half of the islanders display some symptoms of the disease.

◗ 23.1 Tristan da Cunha is a small island in the South Atlantic.

The high frequency of asthma on Tristan da Cunha derives from the unique history of the island's gene pool. The population traces its origin to William Glass, a Scot who moved his family there in 1817. They were joined by some shipwrecked sailors and a few women who migrated from the island of St. Helena but, owing to its remote location and lack of a deep harbor, the island population remained largely isolated. The descendants of Glass and the other settlers intermarried, and slowly the island population increased in number; by 1855, about 100 people inhabited the island. However, Tristan da Cunha's population dropped markedly when, after William Glass's death in 1856, many islanders migrated to South America and South Africa. By 1857, only 33 people remained, and the population grew slowly afterward. It was reduced again in 1885 when a small boat carrying 15 men was capsized by a huge wave, drowning all on board. Many of the widows and their children left the island, and the population dropped from 106 to 59. In 1961, a volcanic eruption threatened the main village. Fortunately, all of the islanders were rescued and transported to England, where they spent 2 years before returning to Tristan da Cunha. Today, just a little more than 300 people permanently inhabit the island.

These islanders have many genes in common and, in fact, all the island's inhabitants are no less closely related than cousins. Because the founders of the colony were few in number and many were already related, many of the genes in today's population can be traced to just a few original settlers.
The population has always been small, which also gives rise to inbreeding and allows chance factors to have a large effect on the frequencies of the alleles in the population. The abrupt population reductions in 1856 and 1885 eliminated some alleles from the population and elevated the frequencies of others. As will be discussed in this chapter, the events affecting these islanders (small number of founders, limited population size, inbreeding, and population reduction) affect the proportions of alleles in a population. All of these factors have contributed to the high proportion of alleles that cause asthma among the inhabitants of Tristan da Cuna. Tristan da Cuna illustrates how the history of a population shapes its genetic makeup. Population genetics is the branch of genetics that studies the genetic makeup of groups of individuals and how a group’s genetic composition changes with time. Population geneticists usually focus their attention on a Mendelian population, which is a group of interbreeding, sexually reproducing individuals that have a common set of genes, the gene pool. A population evolves through changes in its gene pool; so population genetics is therefore also the study of evolution. Population geneticists study the variation in alleles within and between groups and the evolutionary forces responsible for shaping the patterns of genetic variation found in nature. In this chapter, we will learn how the gene pool of a population is measured and what factors are responsible for shaping it. In the later part of the chapter, we will examine molecular studies of genetic variation and evolution. Genetic Variation An obvious and pervasive feature of life is variability. Consider a group of students in a typical college class, the members of which vary in eye color, hair color, skin pigmentation, height, weight, facial features, blood type, and susceptibility to numerous diseases and disorders. 
No two students in the class are likely to be even remotely similar in appearance ( ◗ FIGURE 23.2a). Humans are not unique in their extensive variability; almost all organisms exhibit variation in phenotype. For instance, lady beetles are highly variable in their patterns of spots ( ◗ FIGURE 23.2b), mice vary in body size, snails have different numbers of stripes on their shells, and plants vary in their susceptibility to pests. Much of this phenotypic variation is hereditary. Recognition of the extent of phenotypic variation and its genetic basis led Charles Darwin to the idea of evolution through natural selection. In fact, even more genetic variation exists in populations than is visible in the phenotype. Much variation exists at the molecular level owing to the redundancy of the genetic code, which allows different codons to specify the same amino acids. Thus two individuals can produce the same protein even if their DNA sequences are different. DNA sequences between the genes and introns within genes do not encode proteins; so much of the variation in these sequences also has little effect on the phenotype. The amount of genetic variation within natural populations and the forces that limit and shape it are of primary interest to population geneticists. Genetic variation is the basis of all evolution, and the extent of genetic variation within a population affects its potential to adapt to environmental change. An important, but frequently misunderstood, tool used in population genetics is the mathematical model. Let’s take a moment to consider what a model is and how it can be used. A mathematical model usually describes a process in terms of an equation. Factors that may influence the process are represented by variables in the equation; the equation defines the way in which the variables influence the process. 
Most models are simplified representations of a process, because it is impossible to simultaneously consider all of the influencing factors; some must be ignored in order to examine the effects of others. At first, a model might consider only one or a few factors, but, after their effects are understood, the model can be improved by the addition of more details. It is important to realize that even a simple model can be a source of valuable insight into how a process is influenced by key variables.

◗ 23.2 All organisms exhibit genetic variation. (a) Extensive variation among humans. (b) Variation in spotting patterns of Asian lady beetles. (Part a, Paul Warner/AP)

www.whfreeman.com/pierce More information on genetic diversity within the human species

Before we can explore the evolutionary processes that shape genetic variation, we must be able to describe the genetic structure of a population. The usual way of doing so is to enumerate the types and frequencies of genotypes and alleles in a population.

Calculation of Genotypic Frequencies

A frequency is simply a proportion or a percentage, usually expressed as a decimal fraction. For example, if 20% of the alleles at a particular locus in a population are A, we would say that the frequency of the A allele in the population is .20. For large populations, where it is not practical to determine the genes of all individuals, a sample of individuals from the population is usually taken and the genotypic and allelic frequencies are calculated for this sample (see Chapter 22 for a discussion of samples). The genotypic and allelic frequencies of the sample are then used to represent the gene pool of the population. To calculate a genotypic frequency, we simply add up the number of individuals possessing the genotype and divide by the total number of individuals in the sample (N).
For a locus with three genotypes AA, Aa, and aa, the frequency (f) of each genotype is:

$$
\begin{aligned}
f(AA) &= \frac{\text{number of } AA \text{ individuals}}{N} \\
f(Aa) &= \frac{\text{number of } Aa \text{ individuals}}{N} \\
f(aa) &= \frac{\text{number of } aa \text{ individuals}}{N}
\end{aligned}
\tag{23.1}
$$

The sum of all the genotypic frequencies always equals 1.

Calculation of Allelic Frequencies

The gene pool of a population can also be described in terms of the allelic frequencies. There are always fewer alleles than genotypes; so the gene pool of a population can be described in fewer terms when the allelic frequencies are used. In a sexually reproducing population, the genotypes are only temporary assemblages of the alleles: the genotypes break down each generation when individual alleles are passed to the next generation through the gametes, and so it is the types and numbers of alleles, not genotypes, that have real continuity from one generation to the next and that make up the gene pool of a population.

Allelic frequencies can be calculated from (1) the numbers or (2) the frequencies of the genotypes. To calculate the allelic frequency from the numbers of genotypes, we count the number of copies of a particular allele present in a sample and divide by the total number of all alleles in the sample:

$$
\text{frequency of an allele} = \frac{\text{number of copies of the allele}}{\text{number of copies of all alleles at the locus}}
\tag{23.2}
$$

For a locus with only two alleles (A and a), the frequencies of the alleles are usually represented by the symbols p and q, and can be calculated as follows:

$$
\begin{aligned}
p = f(A) &= \frac{2n_{AA} + n_{Aa}}{2N} \\
q = f(a) &= \frac{2n_{aa} + n_{Aa}}{2N}
\end{aligned}
\tag{23.3}
$$

where $n_{AA}$, $n_{Aa}$, and $n_{aa}$ represent the numbers of AA, Aa, and aa individuals, and N represents the total number of individuals in the sample. We divide by 2N because each diploid individual has two alleles at a locus. The sum of the allelic frequencies always equals 1 ($p + q = 1$); so after p has been obtained, q can be determined by subtraction: $q = 1 - p$.

Alternatively, allelic frequencies can also be calculated from the genotypic frequencies. To do so, we add the frequency of the homozygote for each allele to half the frequency of the heterozygote (because half of the heterozygote’s alleles are of each type):

$$
\begin{aligned}
p = f(A) &= f(AA) + \tfrac{1}{2} f(Aa) \\
q = f(a) &= f(aa) + \tfrac{1}{2} f(Aa)
\end{aligned}
\tag{23.4}
$$

We obtain the same values of p and q whether we calculate the allelic frequencies from the numbers of genotypes (Equation 23.3) or from the genotypic frequencies (Equation 23.4).

Loci with multiple alleles

We can use the same principles to determine the frequencies of alleles for loci with more than two alleles. To calculate the allelic frequencies from the numbers of genotypes, we count up the number of copies of an allele by adding twice the number of homozygotes to the number of heterozygotes that possess the allele and divide this sum by twice the number of individuals in the sample. For a locus with three alleles (A1, A2, and A3) and six genotypes (A1A1, A1A2, A1A3, A2A2, A2A3, and A3A3), the frequencies (p, q, and r) of the alleles are:

$$
\begin{aligned}
p = f(A1) &= \frac{2n_{A1A1} + n_{A1A2} + n_{A1A3}}{2N} \\
q = f(A2) &= \frac{2n_{A2A2} + n_{A1A2} + n_{A2A3}}{2N} \\
r = f(A3) &= \frac{2n_{A3A3} + n_{A1A3} + n_{A2A3}}{2N}
\end{aligned}
\tag{23.5}
$$

Alternatively, we can calculate the frequencies of multiple alleles from the genotypic frequencies by extending Equation 23.4. Once again, we add the frequency of the homozygote to half the frequency of each heterozygous genotype that possesses the allele:

$$
\begin{aligned}
p = f(A1) &= f(A1A1) + \tfrac{1}{2} f(A1A2) + \tfrac{1}{2} f(A1A3) \\
q = f(A2) &= f(A2A2) + \tfrac{1}{2} f(A1A2) + \tfrac{1}{2} f(A2A3) \\
r = f(A3) &= f(A3A3) + \tfrac{1}{2} f(A1A3) + \tfrac{1}{2} f(A2A3)
\end{aligned}
\tag{23.6}
$$

X-linked loci

To calculate allelic frequencies for genes at X-linked loci, we apply these same principles. However, we must remember that a female possesses two X chromosomes and therefore has two X-linked alleles, whereas a male has only a single X chromosome and has one X-linked allele.

Suppose there are two alleles at an X-linked locus, $X^A$ and $X^a$. Females may be either homozygous ($X^AX^A$ or $X^aX^a$) or heterozygous ($X^AX^a$). All males are hemizygous ($X^AY$ or $X^aY$). To determine the frequency of the $X^A$ allele (p), we first count the number of copies of $X^A$: we multiply the number of $X^AX^A$ females by two and add the number of $X^AX^a$ females and the number of $X^AY$ males. We then divide the sum by the total number of alleles at the locus, which is twice the total number of females plus the number of males:

$$
p = f(X^A) = \frac{2n_{X^AX^A} + n_{X^AX^a} + n_{X^AY}}{2n_{\text{females}} + n_{\text{males}}}
\tag{23.7a}
$$

Similarly, the frequency of the $X^a$ allele is:

$$
q = f(X^a) = \frac{2n_{X^aX^a} + n_{X^AX^a} + n_{X^aY}}{2n_{\text{females}} + n_{\text{males}}}
\tag{23.7b}
$$

The frequencies of X-linked alleles can also be calculated from genotypic frequencies by adding the frequency of the females that are homozygous for the allele, half the frequency of the females that are heterozygous for the allele, and the frequency of males hemizygous for the allele:

$$
\begin{aligned}
p = f(X^A) &= f(X^AX^A) + \tfrac{1}{2} f(X^AX^a) + f(X^AY) \\
q = f(X^a) &= f(X^aX^a) + \tfrac{1}{2} f(X^AX^a) + f(X^aY)
\end{aligned}
\tag{23.8}
$$

If you remember the logic behind all of these calculations, you can determine allelic frequencies for any set of genotypes, and it will not be necessary to memorize the formulas.

Concepts
Population genetics is concerned with the genetic composition of a population and how it changes with time. The gene pool of a population can be described by the frequencies of genotypes and alleles in the population.

Worked Problem

The human MN blood type antigens are determined by two codominant alleles, $L^M$ and $L^N$ (see p. 000 in Chapter 5). The MN blood types and corresponding genotypes of 398 Finns from Karjala are tabulated here.

Phenotype   Genotype    Number
MM          L^M L^M     182
MN          L^M L^N     172
NN          L^N L^N     44

Source: W. C. Boyd, Genetics and the Races of Man (Boston: Little, Brown, 1950).

Calculate the allelic and genotypic frequencies at the MN locus for the Karjala population.

• Solution

The genotypic frequencies for the population are calculated with the following formula:

$$
\text{genotypic frequency} = \frac{\text{number of individuals with genotype}}{\text{total number of individuals in sample } (N)}
$$

$$
\begin{aligned}
f(L^ML^M) &= \frac{\text{number of } L^ML^M \text{ individuals}}{N} = \frac{182}{398} = .457 \\
f(L^ML^N) &= \frac{\text{number of } L^ML^N \text{ individuals}}{N} = \frac{172}{398} = .432 \\
f(L^NL^N) &= \frac{\text{number of } L^NL^N \text{ individuals}}{N} = \frac{44}{398} = .111
\end{aligned}
$$

The allelic frequencies can be calculated from either the numbers or the frequencies of the genotypes. To calculate allelic frequencies from numbers of genotypes, we add the number of copies of the allele and divide by the number of copies of all alleles at that locus:

$$
\text{frequency of an allele} = \frac{\text{number of copies of the allele}}{\text{number of copies of all alleles}}
$$

$$
\begin{aligned}
p = f(L^M) &= \frac{2n_{L^ML^M} + n_{L^ML^N}}{2N} = \frac{2(182) + 172}{2(398)} = \frac{536}{796} = .673 \\
q = f(L^N) &= \frac{2n_{L^NL^N} + n_{L^ML^N}}{2N} = \frac{2(44) + 172}{2(398)} = \frac{260}{796} = .327
\end{aligned}
$$

To calculate the allelic frequencies from genotypic frequencies, we add the frequency of the homozygote for that allele to half the frequency of each heterozygote that contains that allele:

$$
\begin{aligned}
p = f(L^M) &= f(L^ML^M) + \tfrac{1}{2} f(L^ML^N) = .457 + \tfrac{1}{2}(.432) = .673 \\
q = f(L^N) &= f(L^NL^N) + \tfrac{1}{2} f(L^ML^N) = .111 + \tfrac{1}{2}(.432) = .327
\end{aligned}
$$

The Hardy-Weinberg Law

The primary goal of population genetics is to understand the processes that shape a population’s gene pool. First, we must ask what effects reproduction and Mendelian principles have on the genotypic and allelic frequencies: How do the segregation of alleles in gamete formation and the combining of alleles in fertilization influence the gene pool? The answer to this question lies in the Hardy-Weinberg law, one of the most important principles of population genetics.
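Before turning to the law itself, the frequency arithmetic of the Karjala worked problem above can be checked with a short Python sketch. The function names and the tuple-keyed dictionary layout are illustrative choices, not part of the text; the counting rule is the one in Equations 23.1 and 23.2, and because it simply counts allele copies it also works for loci with more than two alleles.

```python
def genotype_frequencies(counts):
    """Genotypic frequency = number with the genotype / total sampled (Eq. 23.1)."""
    total = sum(counts.values())
    return {genotype: n / total for genotype, n in counts.items()}


def allele_frequencies(counts):
    """Allelic frequency = copies of the allele / copies of all alleles (Eq. 23.2).

    `counts` maps a genotype, written as a pair of allele names, to the number
    of diploid individuals carrying it. Each individual contributes two copies.
    """
    copies = {}
    for (allele_1, allele_2), n in counts.items():
        copies[allele_1] = copies.get(allele_1, 0) + n
        copies[allele_2] = copies.get(allele_2, 0) + n
    total_copies = sum(copies.values())  # equals 2N for a diploid sample
    return {allele: c / total_copies for allele, c in copies.items()}


# Counts from the worked problem (398 Finns from Karjala).
karjala = {("LM", "LM"): 182, ("LM", "LN"): 172, ("LN", "LN"): 44}

geno = genotype_frequencies(karjala)
allele = allele_frequencies(karjala)

for genotype, f in geno.items():
    print("".join(genotype), round(f, 3))
for name, f in allele.items():
    print(name, round(f, 3))
```

Counting copies directly, rather than hard-coding the two-allele formula, keeps the sketch close to the logic the text recommends remembering instead of the individual formulas.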
The Hardy-Weinberg law was formulated independently by Godfrey H. Hardy and Wilhelm Weinberg in 1908. (Similar conclusions were reached by several other geneticists about the same time.) The law is actually a mathematical model that evaluates the effect of reproduction on the genotypic and allelic frequencies of a population. It makes several simplifying assumptions about the population and provides two key predictions if these assumptions are met. For an autosomal locus with two alleles, the Hardy-Weinberg law can be stated as follows:

Assumptions: If a population is large, randomly mating, and not affected by mutation, migration, or natural selection, then:

Prediction 1: the allelic frequencies of a population do not change; and

Prediction 2: the genotypic frequencies stabilize (will not change) after one generation in the proportions $p^2$ (the frequency of AA), $2pq$ (the frequency of Aa), and $q^2$ (the frequency of aa), where p equals the frequency of allele A and q equals the frequency of allele a.

The Hardy-Weinberg law indicates that, when the assumptions are met, reproduction alone does not alter allelic or genotypic frequencies and the allelic frequencies determine the frequencies of genotypes.

The statement that genotypic frequencies stabilize after one generation means that they may change in the first generation after random mating, because one generation of random mating is required to produce Hardy-Weinberg proportions of the genotypes. Afterward, the genotypic frequencies, like allelic frequencies, do not change as long as the population continues to meet the assumptions of the Hardy-Weinberg law. When genotypes are in the expected proportions of $p^2$, $2pq$, and $q^2$, the population is said to be in Hardy-Weinberg equilibrium.

Concepts
The Hardy-Weinberg law describes how reproduction and Mendelian principles affect the allelic and genotypic frequencies of a population.
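The two predictions can be checked numerically with a minimal Python sketch that enumerates every mating pair in an idealized population (infinitely large, randomly mating, no mutation, migration, or selection). The function names and the starting frequencies of .6, .3, and .1 are illustrative assumptions; each pair is weighted by the product of the parental genotype frequencies, and offspring follow Mendelian segregation.

```python
from itertools import product


def offspring(parent_1, parent_2):
    """Mendelian offspring distribution for one mating, e.g. "Aa" x "aa"."""
    dist = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    # Each parent transmits either of its two alleles with probability 1/2,
    # so each of the four allele combinations has probability 1/4.
    for allele_1, allele_2 in product(parent_1, parent_2):
        child = "".join(sorted(allele_1 + allele_2))  # "A" sorts before "a"
        dist[child] += 0.25
    return dist


def next_generation(freqs):
    """One generation of random mating: pairs form in proportion to frequency."""
    new = {"AA": 0.0, "Aa": 0.0, "aa": 0.0}
    for g1, f1 in freqs.items():
        for g2, f2 in freqs.items():
            for child, prob in offspring(g1, g2).items():
                new[child] += f1 * f2 * prob
    return new


def allele_p(freqs):
    """Frequency of allele A: homozygote plus half the heterozygote."""
    return freqs["AA"] + 0.5 * freqs["Aa"]


start = {"AA": 0.6, "Aa": 0.3, "aa": 0.1}  # not in Hardy-Weinberg proportions
gen1 = next_generation(start)
gen2 = next_generation(gen1)

print("p:", allele_p(start), allele_p(gen1), allele_p(gen2))
print("generation 1:", gen1)
print("generation 2:", gen2)
```

In this sketch the allelic frequency stays the same across generations (Prediction 1), while the genotypic frequencies shift once to $p^2$, $2pq$, and $q^2$ and then stop changing (Prediction 2).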
Closer Examination of the Assumptions of the Hardy-Weinberg Law

Before we consider the implications of the Hardy-Weinberg law, we need to take a closer look at the three assumptions that it makes about a population. First, it assumes that the population is large. How big is “large”? Theoretically, the Hardy-Weinberg law requires that a population be infinitely large in size, but this requirement is obviously unrealistic. In practice, a population need only be large enough that chance deviations from expected ratios do not cause significant changes in allelic frequencies. Later in the chapter, we will examine the effects of small population size on allelic frequencies.

A second assumption of the Hardy-Weinberg law is that individuals in the population mate randomly, which means that each genotype mates in proportion to its frequency. For example, suppose that three genotypes are present in a population in the following proportions: f(AA) = .6, f(Aa) = .3, and f(aa) = .1. With random mating, the frequency of mating between two AA homozygotes (AA × AA) will be equal to the product of their frequencies: .6 × .6 = .36, whereas the frequency of mating between two aa homozygotes (aa × aa) will be only .1 × .1 = .01.

A third assumption of the Hardy-Weinberg law is that the allelic frequencies of the population are not affected by natural selection, migration, and mutation. Although mutation occurs in every population, its rate is so low that it has little effect on the predictions of the Hardy-Weinberg law. Although natural selection and migration are significant factors in real populations, we must remember that the purpose of the Hardy-Weinberg law is to examine only the effect of reproduction on the gene pool. When this effect is known, the effects of other factors (such as migration and natural selection) can be examined.

A final point that should be mentioned is that the assumptions of the Hardy-Weinberg law apply to a single locus.
No real population mates randomly for all traits; nor is a population completely free of natural selection for all traits. The Hardy-Weinberg law, however, does not require random mating and the absence of selection, migration, and mutation for all traits; it requires these conditions only for the locus under consideration. A population may be in Hardy-Weinberg equilibrium for one locus but not for others.

Implications of the Hardy-Weinberg Law

The Hardy-Weinberg law has several important implications for the genetic structure of a population. One implication is that a population cannot evolve if it meets the Hardy-Weinberg assumptions, because evolution consists of change in the allelic frequencies of a population. Therefore the Hardy-Weinberg law tells us that reproduction alone will not bring about evolution. Other processes such as natural selection, mutation, migration, or chance in small populations are required for populations to evolve.

A second important implication is that, when a population is in Hardy-Weinberg equilibrium, the genotypic frequencies are determined by the allelic frequencies. For a locus with two alleles, the frequency of the heterozygote is greatest when allelic frequencies are between .33 and .66 and is at a maximum when allelic frequencies are each .5 ( ◗ FIGURE 23.3). The heterozygote frequency also never exceeds .5. Furthermore, when the frequency of one allele is low, homozygotes for that allele will be rare, and most of the copies of a rare allele will be present in heterozygotes. As you can see from Figure 23.3, when the frequency of allele a is .2, the frequency of the aa homozygote is only .04 ($q^2$), but the frequency of Aa heterozygotes is .32 ($2pq$); 80% of the a alleles are in heterozygotes.

A third implication of the Hardy-Weinberg law is that a single generation of random mating produces the equilibrium frequencies of $p^2$, $2pq$, and $q^2$.
The fact that genotypes are in Hardy-Weinberg proportions does not prove that the population is free from natural selection, mutation, and migration. It means only that these forces have not acted since the last time random mating took place.

◗ 23.3 When a population is in Hardy-Weinberg equilibrium, the proportions of genotypes are determined by frequencies of alleles. (Plot of genotypic frequency for AA, Aa, and aa against allelic frequency: the frequency of the heterozygote is greatest when allelic frequencies are equal, p = q = .5; when the frequency of one allele is high, most of the individuals are homozygotes.)

Extensions of the Hardy-Weinberg Law

The Hardy-Weinberg law can also be applied to multiple alleles and X-linked alleles. The genotypic frequencies expected under Hardy-Weinberg equilibrium will differ according to the situation.

Hardy-Weinberg expectations for loci with multiple alleles

In general, the genotypic frequencies expected at equilibrium are the square of the allelic frequencies. For an autosomal locus with two alleles, these frequencies are $(p + q)^2 = p^2 + 2pq + q^2$. We can also use the square of the allelic frequencies to calculate the equilibrium frequencies for a locus with multiple alleles. An autosomal locus with three alleles, A1, A2, and A3, has six genotypes: A1A1, A1A2, A2A2, A1A3, A2A3, and A3A3. According to the Hardy-Weinberg law, the frequencies of the genotypes at equilibrium depend on the frequencies of the alleles. If the frequencies of alleles A1, A2, and A3 are p, q, and r, respectively, then the equilibrium genotypic frequencies will be the square of the allelic frequencies $(p + q + r)^2 = p^2 + 2pq + q^2 + 2pr + 2qr + r^2$, where:

$$
\begin{aligned}
p^2 &= f(A1A1) \\
2pq &= f(A1A2) \\
q^2 &= f(A2A2) \\
2pr &= f(A1A3) \\
2qr &= f(A2A3) \\
r^2 &= f(A3A3)
\end{aligned}
\tag{23.9}
$$

The square of the allelic frequencies can also be used to calculate the expected genotypic frequencies for loci with four or more alleles.

Hardy-Weinberg expectations for X-linked loci

For an X-linked locus with two alleles, $X^A$ and $X^a$, there are five possible genotypes: $X^AX^A$, $X^AX^a$, $X^aX^a$, $X^AY$, and $X^aY$. Females possess two X-linked alleles, and the expected proportions of the female genotypes can be calculated by using the square of the allelic frequencies. If the frequencies of $X^A$ and $X^a$ are p and q, respectively, then the equilibrium frequencies of the female genotypes are $(p + q)^2 = p^2$ (frequency of $X^AX^A$) $+\ 2pq$ (frequency of $X^AX^a$) $+\ q^2$ (frequency of $X^aX^a$). Males have only a single X-linked allele, and so the frequencies of the male genotypes are p (frequency of $X^AY$) and q (frequency of $X^aY$). Notice that these expected frequencies are the proportions of the genotypes among males and females rather than the proportions among the entire population. Thus, $p^2$ is the expected proportion of females with the genotype $X^AX^A$; if females make up 50% of the population, then the expected proportion of this genotype in the entire population is $.5 \times p^2$.

The frequency of an X-linked recessive trait among males is q, whereas the frequency among females is $q^2$. When an X-linked allele is uncommon, the trait will therefore be much more frequent in males than in females. Consider hemophilia A, a clotting disorder caused by an X-linked recessive allele with a frequency (q) of approximately 1 in 10,000, or .0001. At Hardy-Weinberg equilibrium, this frequency will also be the frequency of the disease among males. The frequency of the disease among females, however, will be $q^2 = (.0001)^2 = .00000001$, which is only 1 in 100 million. Hemophilia is therefore 10,000 times as frequent in males as in females (the ratio $q/q^2 = 1/q$).

Testing for Hardy-Weinberg Proportions

If a population is in equilibrium, then it is randomly mating for the locus in question, and selection, migration, mutation, and small population size have not significantly influenced the genotypic frequencies since random mating last took place. To determine whether these conditions are met, the genotypic proportions expected under the Hardy-Weinberg law must be compared with the observed genotypic frequencies. To do so, we first calculate the allelic frequencies, then find the expected genotypic frequencies by using the square of the allelic frequencies, and finally compare the observed and expected genotypic frequencies by using a chi-square test.

Worked Problem

Jeffrey Mitton and his colleagues found three genotypes (R2R2, R2R3, and R3R3) at a locus encoding the enzyme peroxidase in ponderosa pine trees growing in Colorado. The observed numbers of these genotypes at Glacier Lake, Colorado, were:

Genotype    Number observed
R2R2        135
R2R3        44
R3R3        11

Are the ponderosa pine trees at Glacier Lake, Colorado, in Hardy-Weinberg equilibrium at the peroxidase locus?

• Solution

If the frequency of the R2 allele equals p and the frequency of the R3 allele equals q, the frequency of the R2 allele is:

$$
p = f(R2) = \frac{2n_{R2R2} + n_{R2R3}}{2N} = \frac{2(135) + 44}{2(190)} = .826
$$

The frequency of the R3 allele is obtained by subtraction:

$$
q = f(R3) = 1 - p = .174
$$

The frequencies of the genotypes expected under Hardy-Weinberg equilibrium are then calculated by using $p^2$, $2pq$, and $q^2$:

$$
\begin{aligned}
R2R2 &= p^2 = (.826)^2 = .683 \\
R2R3 &= 2pq = 2(.826)(.174) = .287 \\
R3R3 &= q^2 = (.174)^2 = .03
\end{aligned}
$$

Multiplying each of these expected genotypic frequencies by the total number of observed individuals in the sample (190), we obtain the numbers expected for each genotype:

$$
\begin{aligned}
R2R2 &= .683 \times 190 = 129.7 \\
R2R3 &= .287 \times 190 = 54.5 \\
R3R3 &= .03 \times 190 = 5.7
\end{aligned}
$$

Comparing these expected numbers with the observed numbers of each genotype, we see that there are more R2R2 homozygotes and fewer R2R3 heterozygotes than we expect at equilibrium; R3R3 homozygotes are also more numerous than expected (11 observed versus 5.7 expected). A goodness-of-fit chi-square test is used to determine whether the differences between the observed and the expected numbers of each genotype are due to chance:

$$
\begin{aligned}
\chi^2 &= \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \\
&= \frac{(135 - 129.7)^2}{129.7} + \frac{(44 - 54.5)^2}{54.5} + \frac{(11 - 5.7)^2}{5.7} \\
&= 0.22 + 2.02 + 4.93 = 7.17
\end{aligned}
$$

The calculated chi-square value is 7.17; to obtain the probability associated with this chi-square value, we determine the appropriate degrees of freedom. Up to this point, the chi-square test for assessing Hardy-Weinberg equilibrium has been identical with the chi-square tests that we used in Chapter 3 to assess progeny ratios in a genetic cross, where the degrees of freedom were $n - 1$ and n equaled the number of expected genotypes. For the Hardy-Weinberg test, however, we must subtract an additional degree of freedom, because the expected numbers are based on the observed allelic frequencies; therefore, the observed numbers are not completely free to vary. In general, the degrees of freedom for a chi-square test of Hardy-Weinberg equilibrium equal the number of expected genotypic classes minus the number of associated alleles. For this particular Hardy-Weinberg test, the degrees of freedom are $3 - 2 = 1$.

Once we have calculated both the chi-square value and degrees of freedom, the probability associated with this value can be sought in a chi-square table (Table 3.4). With one degree of freedom, a chi-square value of 7.17 has a probability between .01 and .001. It is very unlikely that the peroxidase genotypes observed at Glacier Lake are in Hardy-Weinberg proportions.

Concepts
The observed number of genotypes in a population can be compared to the Hardy-Weinberg expected proportions by using a goodness-of-fit chi-square test.

Estimating Allelic Frequencies with the Hardy-Weinberg Law

A practical use of the Hardy-Weinberg law is that it allows us to calculate allelic frequencies when dominance is present. For example, cystic fibrosis is an autosomal recessive disorder characterized by respiratory infections, incomplete digestion, and abnormal sweating (see p. 000 in Chapter 6).
Among North American Caucasians, the incidence of the disease is approximately 1 person in 2000. The formula for calculating allelic frequency (Equation 23.3) requires that we know the numbers of homozygotes and heterozygotes, but cystic fibrosis is a recessive disease, and so we cannot easily distinguish between homozygous normal persons and heterozygous carriers. Although molecular tests are available for identifying heterozygous carriers of the cystic fibrosis gene, the low frequency of the disease makes widespread screening impractical. In such situations, the Hardy-Weinberg law can be used to estimate the allelic frequencies.

If we assume that a population is in Hardy-Weinberg equilibrium with regard to this locus, then the frequency of the recessive genotype (aa) will be $q^2$, and the allelic frequency is the square root of the genotypic frequency:

$$
q = \sqrt{f(aa)}
\tag{23.10}
$$

The frequency of cystic fibrosis in North American Caucasians is approximately 1 in 2000, or .0005; so $q = \sqrt{.0005} = .02$. Thus, about 2% of the alleles in the Caucasian population encode cystic fibrosis. We can calculate the frequency of the normal allele by subtracting: $p = 1 - q = 1 - .02 = .98$. After we have calculated p and q, we can use the Hardy-Weinberg law to determine the frequencies of homozygous normal people and heterozygous carriers of the gene:

$$
\begin{aligned}
f(AA) &= p^2 = (.98)^2 = .960 \\
f(Aa) &= 2pq = 2(.02)(.98) = .0392
\end{aligned}
$$

Thus about 4% (1 of 25) of Caucasians are heterozygous carriers of the allele that causes cystic fibrosis.

Concepts
Although allelic frequencies cannot be calculated directly for traits that exhibit dominance, the Hardy-Weinberg law can be used to estimate the allelic frequencies if the population is in Hardy-Weinberg equilibrium for that locus. The frequency of the recessive allele will be equal to the square root of the frequency of the recessive trait.
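Two of the calculations in the preceding sections lend themselves to a short Python sketch: the goodness-of-fit test from the ponderosa pine worked problem and the square-root estimate of Equation 23.10. The function names are illustrative, not from the text, and the sketch keeps full precision throughout. Because the worked problem rounds p, q, and the expected numbers before computing chi-square, the unrounded statistic comes out slightly lower (about 7.1 rather than 7.17), which does not change the conclusion at one degree of freedom.

```python
import math


def hardy_weinberg_chi_square(n_hom_1, n_het, n_hom_2):
    """Goodness-of-fit chi-square for a two-allele locus.

    Allelic frequencies are estimated from the observed counts (Eq. 23.3),
    expected counts are p^2 N, 2pq N, and q^2 N, and the statistic is the sum
    of (observed - expected)^2 / expected. Degrees of freedom: 3 - 2 = 1.
    """
    n = n_hom_1 + n_het + n_hom_2
    p = (2 * n_hom_1 + n_het) / (2 * n)
    q = 1 - p
    observed = [n_hom_1, n_het, n_hom_2]
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, q, expected, chi2


def estimate_from_recessive(incidence):
    """Estimate q, p, and carrier frequency from recessive-trait incidence.

    Assumes the locus is in Hardy-Weinberg equilibrium, so f(aa) = q^2.
    """
    q = math.sqrt(incidence)
    p = 1 - q
    return q, p, 2 * p * q


# Ponderosa pine peroxidase genotypes at Glacier Lake: R2R2, R2R3, R3R3.
p_pine, q_pine, expected_pine, chi2_pine = hardy_weinberg_chi_square(135, 44, 11)
print("p, q:", round(p_pine, 3), round(q_pine, 3))
print("expected:", [round(e, 1) for e in expected_pine])
print("chi-square:", round(chi2_pine, 2))

# Cystic fibrosis incidence of about 1 in 2000.
q_cf, p_cf, carriers_cf = estimate_from_recessive(1 / 2000)
print("q, p, carriers:", round(q_cf, 4), round(p_cf, 4), round(carriers_cf, 4))
```

The critical chi-square value for a probability of .01 with one degree of freedom is 6.635, so either version of the statistic leads to the same rejection of Hardy-Weinberg proportions.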
Nonrandom Mating

An assumption of the Hardy-Weinberg law is that mating is random with respect to genotype. Nonrandom mating affects the way in which alleles combine to form genotypes and alters the genotypic frequencies of a population.

We can distinguish between two types of nonrandom mating. Positive assortative mating refers to a tendency for like individuals to mate. For example, humans exhibit positive assortative mating for height: tall people mate preferentially with other tall people; short people mate preferentially with other short people. Negative assortative mating refers to a tendency for unlike individuals to mate. If people engaged in negative assortative mating for height, tall and short people would preferentially mate.

One form of nonrandom mating is inbreeding, which is preferential mating between related individuals. Inbreeding is actually positive assortative mating for relatedness, but it differs from other types of assortative mating because it affects all genes, not just those that determine the trait for which the mating preference occurs. Inbreeding causes a departure from the Hardy-Weinberg equilibrium frequencies of $p^2$, $2pq$, and $q^2$. More specifically, it leads to an increase in the proportion of homozygotes and a decrease in the proportion of heterozygotes in a population. Outcrossing is the avoidance of mating between related individuals.

Inbreeding is usually measured by the inbreeding coefficient, designated F, which is a measure of the probability that two alleles are "identical by descent." In a diploid organism, a homozygous individual has two copies of the same allele. These two copies may be the same in state, which means that the two alleles are alike in structure and function but do not have a common origin. Alternatively, the two alleles in a homozygous individual may be the same because they are identical by descent: the copies are descended from a single allele that was present in an ancestor (◗ FIGURE 23.4). If we go back far enough in time, many alleles are likely to be identical by descent but, for calculating the effects of inbreeding, we consider identity by descent by going back only a few generations.

◗ 23.4 Individuals may be homozygous by state or by descent. The inbreeding coefficient is a measure of the probability that two alleles are identical by descent. [Diagram: an ancestral population and the homozygotes in the present population. If the two alleles are the same in structure and function but do not have a common origin, they are identical by state. If the two alleles are descended from a single allele present in the ancestral population, they are identical by descent.]

Inbreeding coefficients can range from 0 to 1. A value of 0 indicates that mating in a large population is random; a value of 1 indicates that all alleles are identical by descent. Inbreeding coefficients can be calculated from analyses of pedigrees or they can be determined from the reduction in the heterozygosity of a population. Although we will not go into the details of how F is calculated, it's important to understand how inbreeding affects genotypic frequencies. When inbreeding occurs, the frequency of the genotypes will be:

$$
\begin{aligned}
f(AA) &= p^2 + Fpq \\
f(Aa) &= 2pq - 2Fpq \\
f(aa) &= q^2 + Fpq
\end{aligned}
\tag{23.11}
$$

With inbreeding, the proportion of heterozygotes decreases by $2Fpq$, and half of this value ($Fpq$) is added to the proportion of each homozygote.

Consider a population that reproduces by self-fertilization (so F = 1). We will assume that this population begins with genotypic frequencies in Hardy-Weinberg proportions ($p^2$, $2pq$, and $q^2$). With selfing, each homozygote produces progeny only of the same homozygous genotype (AA × AA produces all AA; and aa × aa produces all aa), whereas only half the progeny of a heterozygote will be like the parent (Aa × Aa produces 1/4 AA, 1/2 Aa, and 1/4 aa). Selfing therefore reduces the proportion of heterozygotes in the population by half with each generation, until all genotypes in the population are homozygous (Table 23.1 and ◗ FIGURE 23.5).

Table 23.1 Generational increase in frequency of homozygotes in a self-fertilizing population starting with p = q = .5

| Generation | AA | Aa | aa |
|---|---|---|---|
| 1 | 1/4 | 1/2 | 1/4 |
| 2 | 1/4 + 1/8 = 3/8 | 1/4 | 1/4 + 1/8 = 3/8 |
| 3 | 3/8 + 1/16 = 7/16 | 1/8 | 3/8 + 1/16 = 7/16 |
| 4 | 7/16 + 1/32 = 15/32 | 1/16 | 7/16 + 1/32 = 15/32 |
| n | [1 − (1/2)^n]/2 | (1/2)^n | [1 − (1/2)^n]/2 |
| ∞ | 1/2 | 0 | 1/2 |

◗ 23.5 Inbreeding increases the percentage of homozygous individuals in a population. [Graph: homozygotes in population (%) versus generation of inbreeding, with curves for self-fertilization (F = 1), siblings (F = .25), and first cousins (F = .0625). Selfing reduces the proportion of heterozygotes by half in each generation, leading to a completely homozygous population. Matings between siblings increase the percentage of homozygotes. Mating between cousins also increases the percentage of homozygotes, but at a slower rate.]

For most outcrossing species, close inbreeding is harmful because it increases the proportion of homozygotes and thereby boosts the probability that deleterious and lethal recessive alleles will combine to produce homozygotes with a harmful trait. Assume that a recessive allele (a) that causes a genetic disease has a frequency (q) of .01. If the population mates randomly (F = 0), the frequency of individuals affected with the disease (aa) will be $q^2 = (.01)^2 = .0001$; so only 1 in 10,000 individuals will have the disease. However, if F = .25 (the equivalent of brother-sister mating), then Equation 23.11 gives the expected frequency of the homozygote genotype as $q^2 + Fpq = (.01)^2 + (.25)(.99)(.01) = .0026$; thus, the genetic disease is 26 times as frequent at this level of inbreeding. This increased appearance of lethal and deleterious traits with inbreeding is termed inbreeding depression; the more intense the inbreeding, the more severe the inbreeding depression.
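The arithmetic in this example can be checked with a short script. The following is a minimal sketch in Python, assuming only Equation 23.11 and the halving of heterozygotes under selfing described above; the function names are hypothetical and chosen for illustration.

```python
def inbred_genotype_frequencies(p, F):
    """Genotypic frequencies under inbreeding, following Equation 23.11."""
    if not 0.0 <= p <= 1.0 or not 0.0 <= F <= 1.0:
        raise ValueError("p and F must both lie between 0 and 1")
    q = 1.0 - p
    return {
        "AA": p * p + F * p * q,          # homozygotes gain Fpq
        "Aa": 2 * p * q - 2 * F * p * q,  # heterozygotes lose 2Fpq
        "aa": q * q + F * p * q,          # homozygotes gain Fpq
    }


def heterozygotes_after_selfing(generation, initial=0.5):
    """Heterozygote frequency in a selfing population.

    Generation 1 holds `initial`; each later generation holds half as much.
    """
    return initial * 0.5 ** (generation - 1)


# Recessive disease allele with q = .01, so p = .99.
random_mating = inbred_genotype_frequencies(0.99, 0.0)
sib_mating = inbred_genotype_frequencies(0.99, 0.25)
print(f"f(aa) with F = 0:   {random_mating['aa']:.6f}")
print(f"f(aa) with F = .25: {sib_mating['aa']:.6f}")
print(f"fold increase:      {sib_mating['aa'] / random_mating['aa']:.2f}")
```

The second frequency comes to about .0026, in line with the value quoted in the text, and the three genotypic frequencies always sum to 1 because the 2Fpq removed from the heterozygotes is split evenly between the two homozygotes.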
The harmful effects of inbreeding have been recognized by humans for thousands of years and are the basis of cultural taboos against mating between close relatives. William Schull and James Neel found that, for each 10% increase in F, the mean IQ of Japanese children dropped six points. Child mortality also increases with close inbreeding (Table 23.2); children of first cousins have a 40% increase in mortality over that seen among the children of randomly mated people. Inbreeding also has deleterious effects on crops (◗ FIGURE 23.6) and domestic animals.

Inbreeding depression is most often studied in humans, as well as in plants and animals reared in captivity, but the negative effects of inbreeding may be more severe in natural populations. Julie Jimenez and her colleagues collected wild mice from a natural population in Illinois and bred them in the laboratory for three to four generations. Laboratory matings were chosen so that some mice had no inbreeding, whereas others had an inbreeding coefficient of .25. When both types of mice were released back into the wild, the weekly survival of the inbred mice was only 56% of that of the noninbred mice. Inbred male mice also continuously lost weight after release into the wild, whereas noninbred male mice regained their body weight within a few days after release.

In spite of the fact that inbreeding is generally harmful for outcrossing species, a number of plants and animals regularly inbreed and are successful (◗ FIGURE 23.7). Inbreeding is commonly used to produce domesticated plants and animals having desirable traits. As stated earlier, inbreeding increases homozygosity, and eventually all individuals in the population become homozygous for the same allele. If a species undergoes inbreeding for a number of generations, many deleterious recessive alleles are weeded out by natural or artificial selection so that the population becomes homozygous for beneficial alleles.
In this way, the harmful effects of inbreeding may eventually be eliminated, leaving a population that is homozygous for beneficial traits.

Concepts

Nonrandom mating alters the frequencies of the genotypes but not the frequencies of the alleles. Inbreeding is preferential mating between related individuals. With inbreeding, the frequency of homozygotes increases while the frequency of heterozygotes decreases.

Table 23.2 Effects of inbreeding on Japanese children

| Genetic Relationship of Parents | F | Mortality of Children (Through 12 Years of Age) |
|---|---|---|
| Unrelated | 0 | .082 |
| Second cousins | .016 (1/64) | .108 |
| First cousins | .0625 (1/16) | .114 |

Source: After D. L. Hartl and A. G. Clark, Principles of Population Genetics, 2d ed. (Sunderland, MA: Sinauer, 1989), Table 2. Original data from W. J. Schull and J. V. Neel, The Effects of Inbreeding on Japanese Children (New York: Harper & Row, 1965).

Changes in Allelic Frequencies

The Hardy-Weinberg law indicates that allelic frequencies do not change as a result of reproduction; thus, other processes must cause alleles to increase or decrease in frequency. Processes that bring about change in allelic frequency include mutation, migration, genetic drift (random effects due to small population size), and natural selection.

Mutation

Before evolution can occur, genetic variation must exist within a population; consequently, all evolution depends on processes that generate genetic variation. Although new combinations of existing genes may arise through recombination in meiosis, all genetic variants ultimately arise through mutation.

The effect of mutation on allelic frequencies

Mutation can influence the rate at which one genetic variant increases at the expense of another. Consider a single locus in a population of 25 diploid individuals.
Each individual possesses two alleles at the locus under consideration; so the gene pool of the population consists of 50 allelic copies. Let us assume that there are two different alleles, designated $G_1$ and $G_2$, with frequencies p and q, respectively. If there are 45 copies of $G_1$ and 5 copies of $G_2$ in the population, p = .90 and q = .10. Now suppose that a mutation changes a $G_1$ allele into a $G_2$ allele. After this mutation, there are 44 copies of $G_1$ and 6 copies of $G_2$, and the frequency of $G_2$ has increased from .10 to .12. Mutation has changed the allelic frequency. If copies of $G_1$ continue to mutate to $G_2$, the frequency of $G_2$ will increase and the frequency of $G_1$ will decrease (◗ FIGURE 23.8).

The amount that $G_2$ will change ($\Delta q$) as a result of mutation depends on: (1) the rate of $G_1$-to-$G_2$ mutation ($\mu$); and (2) p, the frequency of $G_1$ in the population. When p is large, there are many copies of $G_1$ available to mutate to $G_2$, and the amount of change will be relatively large. As more mutations occur and p decreases, there will be fewer copies of $G_1$ available to mutate to $G_2$. The change in $G_2$ as a result of mutation equals the mutation rate times the allelic frequency:

$$\Delta q = \mu p \tag{23.12}$$

As the frequency of p decreases as a result of mutation, the change in frequency due to mutation will be less and less.

◗ 23.6 Inbreeding often has deleterious effects on crops. As inbreeding increases, the average yield of corn, for example, decreases. [Graph: average yield of corn (bushels/acre) versus inbreeding coefficient (F).]

◗ 23.7 Although inbreeding is generally harmful, a number of inbreeding organisms are successful.

So far we have considered only the effects of $G_1 \rightarrow G_2$ forward mutations. Reverse $G_2 \rightarrow G_1$ mutations also occur at rate $\nu$, which will probably be different from the forward mutation rate, $\mu$. Whenever a reverse mutation occurs, the frequency of $G_2$ decreases and the frequency of $G_1$ increases (see Figure 23.8). The rate of change due to reverse mutations equals the reverse mutation rate times the allelic frequency of $G_2$ ($\Delta q = \nu q$). The overall change in allelic frequency is a balance between the opposing forces of forward mutation and reverse mutation:

$$\Delta q = \mu p - \nu q \tag{23.13}$$

◗ 23.8 Recurrent mutation changes allelic frequencies. Forward and reverse mutations eventually lead to a stable equilibrium. [Diagram: forward mutation ($\mu$) converts $G_1$ (p) to $G_2$ (q), and reverse mutation ($\nu$) converts $G_2$ to $G_1$. Because most alleles are $G_1$, there are more forward mutations than reverse mutations. Forward mutations increase the frequency of $G_2$. As the frequency of $G_2$ increases, the number of alleles undergoing reverse mutation increases. Eventually, an equilibrium is reached, where the number of forward mutations equals the number of reverse mutations. Conclusion: At equilibrium, the allelic frequencies do not change even though mutation in both directions continues.]

Reaching equilibrium of allelic frequencies

Consider a population that begins with a high frequency of $G_1$ and a low frequency of $G_2$. In this population, many copies of $G_1$ are initially available to mutate to $G_2$, and the increase in $G_2$ due to forward mutation will be relatively large. However, as the frequency of $G_2$ increases as a result of forward mutations, fewer copies of $G_1$ are available to mutate; so the number of forward mutations decreases. On the other hand, few copies of $G_2$ are initially available to undergo a reverse mutation to $G_1$ but, as the frequency of $G_2$ increases, the number of copies of $G_2$ available to undergo reverse mutation to $G_1$ increases; so the number of genes undergoing reverse mutation will increase. Eventually, the number of genes undergoing forward mutation will be counterbalanced by the number of genes undergoing reverse mutation. At this point, the increase in q due to forward mutation will be equal to the decrease in q due to reverse mutation, and there will be no net change in allelic frequency ($\Delta q = 0$), in spite of the fact that forward and reverse mutations continue to occur. The point at which there is no change in the allelic frequency of a population is referred to as equilibrium (see Figure 23.8).

Factors determining allelic frequencies at equilibrium

We can determine the allelic frequencies at equilibrium by manipulating Equation 23.13. Recall that p = 1 − q. Substituting 1 − q for p in Equation 23.13, we get:

$$
\begin{aligned}
\Delta q &= \mu(1 - q) - \nu q \\
&= \mu - \mu q - \nu q \\
&= \mu - q(\mu + \nu)
\end{aligned}
\tag{23.14}
$$

At equilibrium, $\Delta q$ will be 0; so:

$$
\begin{aligned}
0 &= \mu - q(\mu + \nu) \\
q(\mu + \nu) &= \mu \\
\hat{q} &= \frac{\mu}{\mu + \nu}
\end{aligned}
\tag{23.15}
$$

where $\hat{q}$ equals the frequency of $G_2$ at equilibrium. This final equation tells us that the allelic frequency at equilibrium is determined solely by the forward and reverse mutation rates.

Summary of effects

When the only evolutionary force acting on a population is mutation, allelic frequencies change with the passage of time because some alleles mutate into others. Eventually, these allelic frequencies reach equilibrium and are determined only by the forward and reverse mutation rates. When the allelic frequencies reach equilibrium, the Hardy-Weinberg law tells us that genotypic frequencies also will remain the same.

The mutation rates for most genes are low; so change in allelic frequency due to mutation in one generation is very small, and long periods of time are required for a population to reach mutational equilibrium. For example, if the forward and reverse mutation rates for alleles at a locus are $1 \times 10^{-5}$ and $0.3 \times 10^{-5}$ per generation, respectively (rates that have actually been measured at several loci in mice), and the allelic frequencies are p = .9 and q = .1, then the net change in allelic frequency per generation due to mutation is:

$$
\begin{aligned}
\Delta q &= \mu p - \nu q \\
&= (1 \times 10^{-5})(.9) - (.3 \times 10^{-5})(.1) \\
&= 8.7 \times 10^{-6} = .0000087
\end{aligned}
$$

With these rates, Equation 23.15 places the equilibrium at $\hat{q} = (1 \times 10^{-5})/(1.3 \times 10^{-5}) \approx .77$, far from the starting value of q = .1. Therefore, change due to mutation in a single generation is extremely small and, as the frequency of p drops as a result of mutation, the amount of change will become even smaller (◗ FIGURE 23.9). The effect of typical mutation rates on Hardy-Weinberg equilibrium is negligible, and many generations are required for a population to reach mutational equilibrium. Nevertheless, if mutation is the only force acting on a population for long periods of time, mutation rates will determine allelic frequencies.

◗ 23.9 Change due to recurrent mutation slows as the frequency of p drops. Allelic frequencies are approaching mutational equilibrium at typical low mutation rates. The allelic frequency of $G_1$ decreases as a result of forward ($G_1 \rightarrow G_2$) mutation at rate $\mu$ (.0001) and increases as a result of reverse ($G_2 \rightarrow G_1$) mutation at rate $\nu$ (.00001). Owing to the low rate of mutations, eventual equilibrium takes many generations to be reached. [Graph: allelic frequency of $G_1$ (p) versus number of generations. p decreases owing to forward mutation and increases owing to reverse mutation; at equilibrium, forward mutation is balanced by reverse mutation.]

Concepts

Recurrent mutation causes changes in the frequencies of alleles. At equilibrium, the allelic frequencies are determined by the forward and reverse mutation rates. Because mutation rates are low, the effect of mutation per generation is very small.

Migration

Another process that may bring about change in the allelic frequencies is the influx of genes from other populations, commonly called migration or gene flow. One of the assumptions of the Hardy-Weinberg law is that migration does not take place, but many natural populations do experience migration from other populations. The overall effect of migration is twofold: (1) it prevents genetic divergence between populations and (2) it increases genetic variation within populations.

The effect of migration on allelic frequencies

Let us consider the effects of migration by looking at a simple, unidirectional model of migration between two populations that differ in the frequency of an allele a. Say the frequency of this allele in population I is $q_I$ and in population II is $q_{II}$ (◗ FIGURE 23.10a and b). In each generation, a representative sample of the individuals in population I migrates to population II (◗ FIGURE 23.10c) and reproduces, adding its genes to population II's gene pool. Migration is only from population I to population II (is unidirectional), and all the conditions of the Hardy-Weinberg law apply, except the absence of migration.

After migration, population II consists of two types of individuals (◗ FIGURE 23.10d). Some are migrants; they make up proportion m of population II, and they carry genes from population I; so the frequency of allele a in the migrants is $q_I$. The other individuals in population II are the original residents. If the migrants make up proportion m of population II, then the residents make up 1 − m; because the residents originated in population II, the frequency of allele a in this group is $q_{II}$. After migration, the frequency of allele a in the merged population II ($q'_{II}$) is:

$$q'_{II} = q_I(m) + q_{II}(1 - m) \tag{23.16}$$

where $q_I(m)$ is the contribution to q made by the copies of allele a in the migrants and $q_{II}(1 - m)$ is the contribution to q made by copies of allele a in the residents. The change in the allelic frequency due to migration ($\Delta q$) will be equal to the new frequency of allele a ($q'_{II}$) minus the original frequency of the allele ($q_{II}$). In Equation 23.16, we determined that $q'_{II}$ equals $q_I(m) + q_{II}(1 - m)$. Substituting this value for $q'_{II}$ into that difference, we get:
$$
\begin{aligned}
\Delta q &= q'_{II} - q_{II} \\
&= q_I(m) + q_{II}(1 - m) - q_{II}
\end{aligned}
$$

Expanding the term $q_{II}(1 - m)$, we get:

$$\Delta q = q_I m + q_{II} - q_{II} m - q_{II}$$

In this last equation, we are subtracting $q_{II}$ from $q_{II}$, which gives us zero; so the equation simplifies to:

$$\Delta q = q_I m - q_{II} m = m(q_I - q_{II}) \tag{23.17}$$

Equation 23.17 summarizes the factors that determine the amount of change in allelic frequency due to migration. The amount of change in q is directly proportional to the amount of migration (m); as the amount of migration increases, the change in allelic frequency increases. The magnitude of change is also affected by the difference in allelic frequencies of the two populations ($q_I - q_{II}$); when the difference is large, the change in allelic frequency will be large.

With each generation of migration, the frequencies of the two populations become more and more similar until, eventually, the allelic frequency of population II equals that of population I. When $q_I - q_{II} = 0$, there will be no further change in the allelic frequency of population II, in spite of the fact that migration continues. If migration between two populations takes place for a number of generations with no other evolutionary forces present, an equilibrium is reached at which the allelic frequency of the recipient population equals that of the source population.

The simple model of unidirectional migration between two populations just outlined can be expanded to accommodate multidirectional migration between several populations (◗ FIGURE 23.11).

◗ 23.10 The amount of change in allelic frequency due to migration between populations depends on the difference in allelic frequency and the extent of migration. Shown here is a model of the effect of unidirectional migration on allelic frequencies. (a) The frequency of allele a in the source population (population I) is $q_I$. (b) The frequency of this allele in the recipient population (population II) is $q_{II}$. (c) Each generation, a random sample of individuals migrates from population I to population II. (d) After migration, population II consists of migrants and residents. The migrants constitute proportion m and have a frequency of a equal to $q_I$; the residents constitute proportion 1 − m and have a frequency of a equal to $q_{II}$. Conclusion: The frequency of allele a in population II after migration is $q'_{II} = q_I m + q_{II}(1 - m)$.

The overall effect of migration

Migration has two major effects. First, it causes the gene pools of populations to become more similar. Later, we will see how genetic drift and natural selection lead to genetic differences between populations; migration counteracts this tendency and tends to keep populations homogeneous in their allelic frequencies. Second, migration adds genetic variation to populations. Different alleles may arise in different populations owing to rare mutational events, and these alleles can be spread to new populations by migration, increasing the genetic variation within the recipient population.

Concepts

Migration causes changes in the allelic frequency of a population by introducing alleles from other populations. The magnitude of change due to migration depends on both the extent of migration and the difference in allelic frequencies between the source and the recipient populations. Migration decreases genetic differences between populations and increases genetic variation within populations.

Genetic Drift

The Hardy-Weinberg law assumes random mating in an infinitely large population; only when population size is infinite will the gametes carry genes that perfectly represent the parental gene pool.
But no real population is infinitely large, and when population size is limited, the gametes that unite to form individuals of the next generation carry a sample of alleles present in the parental gene pool. Just by chance, the composition of this sample may deviate from that of the parental gene pool, and this deviation may cause allelic frequencies to change. The smaller the gametic sample, the greater the chance that its composition will deviate from that of the entire gene pool.

The role of chance in altering allelic frequencies is analogous to flipping a coin. Each time we flip a coin, we have a 50% chance of getting a head and a 50% chance of getting a tail. If we flip a coin 1000 times, the observed ratio of heads to tails will be very close to the expected 50:50 ratio. If, however, we flip a coin only 10 times, there is a good chance that we will obtain not exactly 5 heads and 5 tails, but rather maybe 7 heads and 3 tails or 8 tails and 2 heads. This kind of deviation from an expected ratio due to limited sample size is referred to as sampling error.

Sampling error occurs when gametes unite to produce progeny. Many organisms produce a large number of gametes but, when population size is small, a limited number of gametes unite to produce the individuals of the next generation. Chance influences which alleles are present in this limited sample and, in this way, sampling error may lead to changes in allelic frequency, which is called genetic drift. Because the deviations from the expected ratios are random, the direction of change is unpredictable. We can nevertheless predict the magnitude of the changes.

◗ 23.11 Model of multidirectional migration among three populations, A, B, and C, with initial frequency of allele a equal to $q_A$, $q_B$, and $q_C$, respectively. The proportion of a population made up of migrants from other populations is designated by m, where the subscripts represent the source and recipient populations. For example, $m_{AC}$ represents the proportion of population C that consists of individuals that moved from A to C. The allelic frequencies in populations A, B, and C after migration are represented by $q'_A$, $q'_B$, and $q'_C$:

$$
\begin{aligned}
q'_A &= q_B m_{BA} + q_C m_{CA} + q_A(1 - m_{BA} - m_{CA}) \\
q'_B &= q_A m_{AB} + q_C m_{CB} + q_B(1 - m_{AB} - m_{CB}) \\
q'_C &= q_A m_{AC} + q_B m_{BC} + q_C(1 - m_{AC} - m_{BC})
\end{aligned}
$$

The magnitude of genetic drift

The amount of sampling error resulting from genetic drift can be estimated from the variance in allelic frequency. Variance is a statistical measure that describes the degree of variability in a trait (see Chapter 22). Suppose that we observe a large number of separate populations, each with N individuals and allelic frequencies of p and q. After one generation of random mating, genetic drift expressed in terms of the variance in allelic frequency among the populations ($s_p^2$) will be:

$$s_p^2 = \frac{pq}{2N} \tag{23.18}$$

The amount of change resulting from genetic drift (the variance in allelic frequency) is determined by two parameters: the allelic frequencies (p and q) and the population size (N). Genetic drift will be maximal when p and q are equal (each .5) and when the population size is small.

The effect of population size on genetic drift is illustrated by a study conducted by Luca Cavalli-Sforza and his colleagues. They studied variation in blood types among villagers in the Parma Valley of Italy, where the amount of migration between villages was limited. They found that variation in allelic frequency was greatest between small isolated villages in the upper valley but decreased between larger villages and towns farther down the valley.
This result is exactly what we expect with genetic drift: there should be more genetic drift and thus more variation among villages when population size is small.

For ecological and demographic studies, population size is usually defined as the number of individuals in a group. The evolution of a gene pool depends, however, only on those individuals who contribute genes to the next generation. Population geneticists usually define population size as the equivalent number of breeding adults, the effective population size ($N_e$). Several factors determine the equivalent number of breeding adults. One factor is the sex ratio. When the numbers of males and females in the population are equal, the effective population size is simply the sum of reproducing males and females. When they are unequal, then the effective population size is:

$$N_e = \frac{4 \times n_{\text{males}} \times n_{\text{females}}}{n_{\text{males}} + n_{\text{females}}} \tag{23.19}$$

Table 23.3 gives the effective population size for a theoretical population of 100 individuals with different proportions of males and females. Notice that, when the number of males and females is unequal, the effective population size is smaller than it is when the number of males and females is the same. For example, when a population consists of 90 males and 10 females, the effective population size is only 36, and genetic drift will occur as though the actual population consisted of only 36 individuals, equally divided between males and females. A population with 90 males and 10 females has the same effective population size as a population with 10 males and 90 females; it makes no difference which sex is in excess.

Table 23.3 Effective population size ($N_e$) in theoretical populations of 100 individuals, each with a different sex ratio

| Sex Ratio* | Number of Males | Number of Females | $N_e$ |
|---|---|---|---|
| 1.00 | 50 | 50 | 100 |
| 3.00 | 75 | 25 | 75 |
| 0.33 | 25 | 75 | 75 |
| 9.00 | 90 | 10 | 36 |
| 0.11 | 10 | 90 | 36 |
| 99.00 | 99 | 1 | 3.96 |
| 0.01 | 1 | 99 | 3.96 |

*The sex ratio is the ratio of the number of males to the number of females.
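The entries in Table 23.3 can be reproduced directly from Equation 23.19. The sketch below is illustrative only: the function names are hypothetical, and feeding the effective population size into the variance formula of Equation 23.18 is an assumption based on the statement that drift proceeds as though the population contained only $N_e$ individuals.

```python
def effective_population_size(n_males, n_females):
    """Effective population size for a given sex ratio (Equation 23.19)."""
    total = n_males + n_females
    if n_males < 0 or n_females < 0 or total == 0:
        raise ValueError("need non-negative counts and at least one breeding adult")
    return 4 * n_males * n_females / total


def drift_variance(p, population_size):
    """Variance in allelic frequency after one generation (Equation 23.18)."""
    return p * (1 - p) / (2 * population_size)


# Sex ratios from Table 23.3; every population has 100 individuals.
table_rows = [(50, 50), (75, 25), (25, 75), (90, 10), (10, 90), (99, 1), (1, 99)]
for males, females in table_rows:
    ne = effective_population_size(males, females)
    # Assumption: Ne stands in for N when estimating the amount of drift.
    variance = drift_variance(0.5, ne)
    print(f"{males:>3} males, {females:>3} females: Ne = {ne:g}, variance = {variance:.5f}")
```

Because the formula is symmetric in the two counts, swapping the number of males and females leaves $N_e$ unchanged, which is why paired rows of the table share a value.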
The reason that the sex ratio influences genetic drift is that half the genes in the gene pool come from males and half come from females. When one sex is present in low numbers, genetic drift increases because half of the genes are coming from a small number of individuals. In a population consisting of 10 males and 90 females, the overall population size is relatively large (100), but only 10 males contribute half the genes to the next generation. Sampling error therefore affects the range of genes present in the male gametes, and chance will have a major effect on the allelic frequencies of the next generation.

Other factors that influence effective population size include variation between individuals in reproductive success, fluctuations in population size, the age structure of the population, and whether mating is random.

Concepts

Genetic drift is change in allelic frequency due to chance factors. The amount of change in allelic frequency due to genetic drift is inversely related to the effective population size (the equivalent number of breeding adults in a population). Effective population size decreases when there are unequal numbers of breeding males and females.

Causes of genetic drift

All genetic drift arises from sampling error, but there are several different ways in which sampling error can arise. First, a population may be reduced in size for a number of generations because of limitations in space, food, or some other critical resource. Genetic drift in a small population for multiple generations can significantly affect the composition of a population's gene pool.

A second way that sampling error can arise is through the founder effect, which is due to the establishment of a population by a small number of individuals; the population of Tristan da Cunha, discussed in the introduction to this chapter, underwent a founder effect.
Although a population may increase and become quite large, the genes carried by all its members are derived from the few genes originally present in the founders (assuming no migration or mutation). Chance events affecting which genes were present in the founders will have an important influence on the makeup of the entire population. The small number of founders of Tristan da Cunha included two sisters and a daughter who suffered from asthma; the high incidence of asthma on the island today can be traced to alleles carried by these founders.

A third way that genetic drift arises is through a genetic bottleneck, which develops when a population undergoes a drastic reduction in population size. A genetic bottleneck developed in northern elephant seals (◗ FIGURE 23.12). Before 1800, thousands of elephant seals were found along the California coast, but the population was devastated by hunting between 1820 and 1880. By 1884, as few as 20 seals survived on a remote beach of Isla de Guadalupe, west of Baja California. Restrictions on hunting enacted by the United States and Mexico allowed the seals to recover, and there are now more than 30,000 seals in the population. All seals in the population today are genetically similar, because they have genes that were carried by the few survivors of the population bottleneck.

The effects of genetic drift

Genetic drift has several important effects on the genetic composition of a population. First, it produces change in allelic frequencies within a population. Because drift is random, allelic frequency is just as likely to increase as it is to decrease and will wander with the passage of time (hence the name genetic drift). ◗ FIGURE 23.13 illustrates a computer simulation of genetic drift in five populations over 30 generations, starting with q = .5 and maintaining a constant population size of 10 males and 10 females. These allelic frequencies change randomly from generation to generation.
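A simulation of this kind is easy to sketch. The version below is not the program behind Figure 23.13, whose details the text does not give; it assumes a simple model in which each generation draws 2N gene copies at random from the previous gene pool, and it treats the 10 males and 10 females as 20 interchangeable individuals. The function name, the seed, and the use of Python's standard `random` module are illustrative choices.

```python
import random


def simulate_drift(q0, n_individuals, generations, rng):
    """Follow the frequency of one allele under drift alone.

    Each generation, 2N gene copies are sampled at random from the current
    gene pool, so the new frequency differs from the old one by sampling error.
    """
    copies = 2 * n_individuals
    q = q0
    trajectory = [q]
    for _generation in range(generations):
        # Count how many of the sampled copies carry the allele of interest.
        count = sum(1 for _copy in range(copies) if rng.random() < q)
        q = count / copies
        trajectory.append(q)
    return trajectory


rng = random.Random(23)  # fixed seed so that a run can be repeated
populations = [simulate_drift(0.5, 20, 30, rng) for _ in range(5)]
for number, trajectory in enumerate(populations, start=1):
    print(f"Population {number}: q after 30 generations = {trajectory[-1]:.3f}")
```

Different seeds give different trajectories, which is the point: the direction of change in any one population cannot be predicted. Once a trajectory reaches 0 or 1 it stays there, because no copies of the other allele remain to be sampled.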
A second effect of genetic drift is to reduce genetic variation within populations. Through random change, an allele may eventually reach a frequency of either 1 or 0, at which point all individuals in the population are homozygous for one allele. When an allele has reached a frequency of 1, we say that it has reached fixation. Other alleles are lost (reach a frequency of 0) and can be restored only by migration from another population or by mutation. Fixation, then, leads to a loss of genetic variation within a population. This loss can be seen in northern elephant seals. Today, these seals have low levels of genetic variation; a study of 24 protein-encoding genes found no individual or population differences in these genes.

Given enough time, all small populations will become fixed for one allele. Which allele becomes fixed is a matter of chance, but the probability of fixation depends on the initial frequency of the allele. If the population begins with two alleles, each with a frequency of .5, both alleles have an equal probability of fixation. However, if one allele is initially common, it is more likely to become fixed.

◗ 23.12 Northern elephant seals underwent a severe genetic bottleneck between 1820 and 1880. Today, these seals have low levels of genetic variation. (Lisa Husar/DRK Photo.)

A third effect of genetic drift is that different populations diverge genetically with time. In Figure 23.13, all five populations begin with the same allelic frequency (q = .5) but, because drift occurs randomly, the frequencies in different populations do not change in the same way, and so populations gradually acquire genetic differences. Notice that, although the variance in allelic frequency among the populations increases, the average allelic frequency remains basically the same. Eventually, all the populations reach fixation; some will become fixed for one allele and others will become fixed for the alternative allele.
This divergence of populations through genetic drift is strikingly illustrated in the results of an experiment carried out by Peter Buri on fruit flies ( ◗ FIGURE 23.14). The three results of genetic drift (allelic frequency change, loss of variation within populations, and genetic divergence between populations) occur simultaneously, and all result from sampling error. The first two results occur within populations, whereas the third occurs between populations.

◗ 23.13 Genetic drift changes allelic frequencies within populations, leading to a reduction in genetic variation through fixation and genetic divergence among populations. Shown here is a computer simulation of changes in the frequency of allele A2 (q) in five different populations due to random genetic drift. Each population consists of 10 males and 10 females and begins with q = .5. [Plot: f(A2) = q, from 0 (fixation of allele A1) to 1 (fixation of allele A2), against generation (0 to 30) for Populations 1 through 5.]

◗ 23.14 Populations diverge in allelic frequency and become fixed for one allele as a result of genetic drift. In this experiment, Buri examined the frequency of two alleles (bw75 and bw) that affect Drosophila eye color in 107 replicate populations. Each population consisted of 8 males and 8 females; each population began with the frequency of bw75 equal to .5. [Panels: generations 1 through 17.]

Concepts
Genetic drift results from continuous small population size, founder effect (establishment of a population by a few founders), and bottleneck effect (population reduction). Genetic drift causes change in allelic frequencies within a population, loss of genetic variation through fixation of alleles, and genetic divergence between populations.

Natural Selection
A final process that brings about changes in allelic frequencies is natural selection, the differential reproduction of genotypes (see Chapter 22). Natural selection takes place when individuals with adaptive traits produce more offspring. If the adaptive traits have a genetic basis, they are inherited by the offspring and appear with greater frequency in the next generation. A trait that provides a reproductive advantage thereby increases over time, enabling populations to become better suited to their environments, that is, to become better adapted. Natural selection is unique among evolutionary forces in that it promotes adaptation ( ◗ FIGURE 23.15).

◗ 23.15 Natural selection produces adaptations, such as those seen in polar bears that inhabit the extreme Arctic environment. These bears blend into the snowy background, which helps them in hunting seals. The hairs of their fur stay erect even when wet, and thick layers of blubber provide insulation, which protects against subzero temperatures. Their digestive tracts are adapted to a seal-based carnivorous diet. (Tom and Pat Leeson/DRK Photo.)

Fitness and selection coefficient
The effect of natural selection on the gene pool of a population depends on the fitness values of the genotypes in the population. Fitness is defined as the relative reproductive success of a genotype. Here the term relative is critically important: fitness is the reproductive success of one genotype compared with the reproductive successes of other genotypes in the population. Fitness (W) ranges from 0 to 1. Suppose the number of viable offspring produced by three genotypes is:

| Genotype | A1A1 | A1A2 | A2A2 |
|---|---|---|---|
| Mean number of offspring produced | 10 | 5 | 2 |

To calculate fitness for each genotype, we take the mean number of offspring produced by a genotype and divide it by the mean number of offspring produced by the most prolific genotype:

$$W_{11} = \frac{10}{10} = 1.0 \qquad W_{12} = \frac{5}{10} = .5 \qquad W_{22} = \frac{2}{10} = .2 \tag{23.20}$$

The fitness of the genotype A1A1 is designated $W_{11}$, that of A1A2 is $W_{12}$, and that of A2A2 is $W_{22}$.
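The fitness calculation above is a single division per genotype. As a small sketch (the function name `relative_fitness` and the dictionary layout are illustrative, not from the text), it can be written as:

```python
def relative_fitness(mean_offspring):
    """Divide each genotype's mean offspring number by the largest value."""
    best = max(mean_offspring.values())
    return {genotype: n / best for genotype, n in mean_offspring.items()}


# Offspring numbers from the example in the text.
offspring = {"A1A1": 10, "A1A2": 5, "A2A2": 2}
fitness = relative_fitness(offspring)
print(fitness)  # {'A1A1': 1.0, 'A1A2': 0.5, 'A2A2': 0.2}
```

Because every value is divided by the largest one, the most prolific genotype always has a fitness of 1 and the others fall between 0 and 1.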
A related variable is the selection coefficient (s), which is the relative intensity of selection against a genotype. The selection coefficient is equal to 1 − W; so the selection coefficients for the preceding three genotypes are:

| Genotype | A1A1 | A1A2 | A2A2 |
|---|---|---|---|
| Selection coefficient (1 − W) | $s_{11} = 0$ | $s_{12} = .5$ | $s_{22} = .8$ |

We usually speak of selection for a particular genotype, but keep in mind that, when selection is for one genotype, selection is automatically against at least one other genotype.

Concepts
Natural selection is the differential reproduction of genotypes. It is measured as fitness, which is the reproductive success of a genotype compared with other genotypes in a population.

The general selection model
Differential fitness among genotypes over time leads to changes in the frequencies of the genotypes, which, in turn, lead to changes in the frequencies of the alleles that make up the genotypes. We can predict the effect of natural selection on allelic frequencies by using a general selection model, which is outlined in Table 23.4. Use of this model requires knowledge of both the initial allelic frequencies and the fitness values of the genotypes. It assumes that mating is random and that the only force acting on a population is natural selection. We have defined fitness in terms of relative reproduction, but it will be easier to understand the logic behind the general selection model if we think of the fitness of the genotypes as differences in survival. The model applies equally to fitnesses representing differential reproduction.

Let’s apply the general selection model outlined in Table 23.4. Imagine a flock of sparrows overwintering in Rochester, New York. Assume that we can determine the genotypes for a locus that affects the ability of the birds to survive the winter; perhaps the genes at this locus determine the amount of fat that a bird accumulates before the onset of winter. For genotypes A1A1, A1A2, and A2A2, p represents the frequency of A1 and q represents the frequency of A2. On the first line of the table, we record the initial genotypic frequencies before selection has acted, before the onset of winter. If mating has been random (an assumption of the model), the genotypes will have the Hardy-Weinberg equilibrium frequencies of $p^2$, $2pq$, and $q^2$. On the second row of the table, we put the fitness values of the corresponding genotypes. Some of the birds die in the winter; so here the fitness values represent the relative survival of the three genotypes. The proportion of the population represented by each genotype after selection is obtained by multiplying the initial genotypic frequency times its fitness (third row of Table 23.4). Now the genotypes are no longer in Hardy-Weinberg equilibrium. The mean fitness ($\bar{W}$) of the population is the sum of the proportionate contributions of the three genotypes:

$$\bar{W} = p^2W_{11} + 2pqW_{12} + q^2W_{22} \tag{23.21}$$

The mean fitness $\bar{W}$ is the average fitness of all individuals in the population and allows the frequencies of the genotypes after selection to be obtained. In our flock of birds, these frequencies will be those of the three genotypes after the winter mortality. The frequency of a genotype after selection will be equal to its proportionate contribution divided by the mean fitness of the population ($p^2W_{11}/\bar{W}$ for genotype A1A1, $2pqW_{12}/\bar{W}$ for genotype A1A2, and $q^2W_{22}/\bar{W}$ for genotype A2A2), as shown in the fourth line of Table 23.4.
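The four rows of Table 23.4 translate directly into a few lines of arithmetic. The sketch below is illustrative only: the function name `select_one_generation`, the starting frequency p = .5, and the survival fitnesses (1.0, .5, .2) are assumptions chosen for the example, not data on the sparrows. The allelic frequency among survivors is obtained with the chapter's Equation 23.4, f(A1) = f(A1A1) + ½ f(A1A2).

```python
def select_one_generation(p, w11, w12, w22):
    """Apply the general selection model once.

    Returns the mean fitness, the genotypic frequencies after selection
    (A1A1, A1A2, A2A2), and the frequency of A1 among the survivors.
    """
    q = 1.0 - p
    # Rows 1-3 of Table 23.4: Hardy-Weinberg frequencies times fitnesses.
    contributions = (p * p * w11, 2 * p * q * w12, q * q * w22)
    # Equation 23.21: mean fitness is the sum of the contributions.
    mean_fitness = sum(contributions)
    # Row 4: divide each contribution by the mean fitness.
    after = tuple(c / mean_fitness for c in contributions)
    # Equation 23.4: f(A1) = f(A1A1) + 1/2 f(A1A2).
    p_new = after[0] + 0.5 * after[1]
    return mean_fitness, after, p_new


# Hypothetical values: p = .5 and survival fitnesses 1.0, .5, .2.
mean_w, genotypes_after, p_after = select_one_generation(0.5, 1.0, 0.5, 0.2)
print(f"mean fitness = {mean_w:.3f}")
print("after selection:", [round(f, 3) for f in genotypes_after])
print(f"p after selection = {p_after:.3f}")
```

Feeding the returned frequency of A1 back into the function gives the trajectory over successive generations, provided the survivors mate randomly, as the model assumes.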
When the new genotypic frequencies have been calculated, the new allelic frequency of A1 ($p'$) can be determined by using the now-familiar formula (Equation 23.4):

$$p' = f(A_1) = f(A_1A_1) + \tfrac{1}{2} f(A_1A_2)$$

and the new frequency of A2 ($q'$) can be obtained by subtraction:

$$q' = 1 - p'$$

Table 23.4 Method for determining changes in allelic frequency due to selection

| | A1A1 | A1A2 | A2A2 |
|---|---|---|---|
| Initial genotypic frequencies | $p^2$ | $2pq$ | $q^2$ |
| Fitnesses | $W_{11}$ | $W_{12}$ | $W_{22}$ |
| Proportionate contribution of genotypes to population | $p^2W_{11}$ | $2pqW_{12}$ | $q^2W_{22}$ |
| Relative genotypic frequency after selection | $p^2W_{11}/\bar{W}$ | $2pqW_{12}/\bar{W}$ | $q^2W_{22}/\bar{W}$ |

Note: $\bar{W} = p^2W_{11} + 2pqW_{12} + q^2W_{22}$
Allelic frequencies after selection: $p' = f(A_1) = f(A_1A_1) + \tfrac{1}{2} f(A_1A_2)$; $q' = 1 - p'$

The last step is to determine the genotypic frequencies in the next generation. In regard to our birds, these genotypic frequencies are those of the offspring of the birds that survived the winter. If the survivors mate randomly, the genotypic frequencies in the next generation will be $p'^2$, $2p'q'$, and $q'^2$.

The general selection model can be used to calculate the allelic frequencies after any type of selection. It is also possible to work out formulas for determining the change in allelic frequency when selection is against recessive, dominant, and codominant traits, as well as traits in which the heterozygote has highest fitness (Table 23.5).

Concepts
The change in allelic frequency due to selection can be determined for any type of genetic trait by using the general selection model.

The results of selection
The results of selection depend on the relative fitnesses of the genotypes. If we have three genotypes (A1A1, A1A2, and A2A2) with fitnesses $W_{11}$, $W_{12}$, and $W_{22}$, we can identify six different types of natural selection (Table 23.6). In type 1 selection, a dominant allele A1 confers a fitness advantage; in this case, the fitnesses of genotypes A1A1 and A1A2 are equal and higher than the fitness of A2A2 ($W_{11} = W_{12} > W_{22}$).
Because the heterozygote and the A1A1 homozygote both have copies of the A1 allele and produce more offspring than the A2A2 homozygote does, the frequency of the A1 allele will increase over time, whereas the frequency of the A2 allele will decrease. This form of selection, in which one allele or trait is favored over another, is termed directional selection.

Type 2 selection (Table 23.6) is directional selection against a dominant allele A1 ($W_{11} = W_{12} < W_{22}$). In this case, the A2 allele increases and the A1 allele decreases. Type 3 and type 4 selection also are directional selection, but in these cases there is incomplete dominance and the heterozygote has a fitness that is intermediate between the two homozygotes ($W_{11} > W_{12} > W_{22}$ for type 3; $W_{11} < W_{12} < W_{22}$ for type 4). When A1A1 has the highest fitness (type 3), over time the A1 allele increases and the A2 allele decreases. When A2A2 has the highest fitness (type 4), over time the A2 allele increases and the A1 allele decreases. Eventually, directional selection leads to fixation of the favored allele and elimination of the other allele, as long as no other evolutionary forces act on the population.

Two types of selection (types 5 and 6) are special situations that lead to equilibrium, where there is no further change in allelic frequency. Type 5 selection is referred to as overdominance or heterozygote advantage. Here, the heterozygote has higher fitness than the fitnesses of the two homozygotes ($W_{11} < W_{12} > W_{22}$). With overdominance, both alleles are favored in the heterozygote, and neither allele is eliminated from the population. Initially, the allelic frequencies may change because one homozygote has higher fitness than the other; the direction of change will depend on the relative fitness values of the two homozygotes. The allelic frequencies change with overdominant selection until a stable equilibrium is reached, at which point there is no further change.
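The approach to a stable equilibrium under overdominance can be checked numerically by applying the general selection model repeatedly. This is a sketch under stated assumptions: the fitness values (.8, 1.0, .6), the starting frequencies, and the function names `next_q` and `iterate` are hypothetical choices for illustration.

```python
def next_q(q, w11, w12, w22):
    """Frequency of A2 after one generation of selection and random mating."""
    p = 1.0 - q
    mean_fitness = p * p * w11 + 2 * p * q * w12 + q * q * w22
    # A2 copies come from all A2A2 survivors and half of the A1A2 survivors.
    return (q * q * w22 + p * q * w12) / mean_fitness


def iterate(q, w11, w12, w22, generations):
    """Apply next_q repeatedly and return the final frequency of A2."""
    for _ in range(generations):
        q = next_q(q, w11, w12, w22)
    return q


# Hypothetical overdominant fitnesses: the heterozygote is the fittest.
W11, W12, W22 = 0.8, 1.0, 0.6
for q_start in (0.05, 0.5, 0.95):
    q_end = iterate(q_start, W11, W12, W22, 200)
    print(f"start q = {q_start:.2f} -> q after 200 generations = {q_end:.4f}")
```

With these values the selection coefficients are $s_{11} = .2$ and $s_{22} = .4$, and all three starting points settle at the same frequency, which agrees with the value $s_{11}/(s_{11} + s_{22})$ of Equation 23.22. Neither allele is lost, regardless of where the population starts.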
The allelic frequency at equilibrium ($\hat{q}$) depends on the relative fitnesses (usually expressed as selection coefficients) of the two homozygotes:

$$\hat{q} = f(A_2) = \frac{s_{11}}{s_{11} + s_{22}} \tag{23.22}$$

where $s_{11}$ represents the selection coefficient of the A1A1 homozygote and $s_{22}$ represents the selection coefficient of the A2A2 homozygote.

The last type of selection (type 6) is underdominance, in which the heterozygote has lower fitness than both homozygotes ($W_{11} > W_{12} < W_{22}$). Underdominance leads to an unstable equilibrium; here allelic frequencies will not change as long as they are at equilibrium but, if they are disturbed from the equilibrium point by some other evolutionary force, they will move away from equilibrium until one allele eventually becomes fixed.

Table 23.5 Formulas for calculating change in allelic frequencies with different types of selection

| Type of selection | Fitness of A1A1 | Fitness of A1A2 | Fitness of A2A2 | Change in q |
|---|---|---|---|---|
| Selection against a recessive trait | $1$ | $1$ | $1 - s$ | $\dfrac{-spq^2}{1 - sq^2}$ |
| Selection against a dominant trait | $1$ | $1 - s$ | $1 - s$ | $\dfrac{-sp^2q}{1 - s + sp^2}$ |
| Selection against a trait with no dominance | $1$ | $1 - \tfrac{1}{2}s$ | $1 - s$ | $\dfrac{-\tfrac{1}{2}spq}{1 - sq}$ |
| Selection against both homozygotes (overdominance) | $1 - s_{11}$ | $1$ | $1 - s_{22}$ | $\dfrac{pq(s_{11}p - s_{22}q)}{1 - s_{11}p^2 - s_{22}q^2}$ |

Table 23.6 Types of natural selection

| Type | Fitness relation | Form of selection | Result |
|---|---|---|---|
| 1 | $W_{11} = W_{12} > W_{22}$ | Directional selection against recessive allele A2 | A1 increases, A2 decreases |
| 2 | $W_{11} = W_{12} < W_{22}$ | Directional selection against dominant allele A1 | A2 increases, A1 decreases |
| 3 | $W_{11} > W_{12} > W_{22}$ | Directional selection against incompletely dominant allele A2 | A1 increases, A2 decreases |
| 4 | $W_{11} < W_{12} < W_{22}$ | Directional selection against incompletely dominant allele A1 | A2 increases, A1 decreases |
| 5 | $W_{11} < W_{12} > W_{22}$ | Overdominance | Stable equilibrium, both alleles maintained |
| 6 | $W_{11} > W_{12} < W_{22}$ | Underdominance | Unstable equilibrium |

Note: $W_{11}$, $W_{12}$, and $W_{22}$ represent the fitnesses of genotypes A1A1, A1A2, and A2A2, respectively.

Concepts
Natural selection changes allelic frequencies; the direction and magnitude of change depend on the intensity of selection, the dominance relations of the alleles, and the allelic frequencies. Directional selection favors one allele over another and eventually leads to fixation of the favored allele. Overdominance leads to a stable equilibrium with maintenance of both alleles in the population. Underdominance produces an unstable equilibrium because the heterozygote has lower fitness than those of the two homozygotes.

The rate of change in allelic frequency due to natural selection
The rate at which an allele changes in frequency owing to selection depends on the intensity of selection and the dominance relations among the genotypes ( ◗ FIGURE 23.16). Under directional selection, dominant alleles will increase much more rapidly than recessive alleles, because homozygotes and heterozygotes are favored. With incomplete dominance, the heterozygote has a selective advantage, but not as much as the homozygote; so incompletely dominant alleles increase in frequency at a lower rate than that of dominant alleles. Recessive alleles increase at the lowest rate, because only the homozygote is favored by selection.

◗ 23.16 The rate of change in allelic frequency due to selection depends on the dominance relations among the genotypes. Here, change in the frequency of an allele is shown for different types of dominance with a constant selection coefficient. [Plot: frequency of the allele (.5 to 1) against generations (0 to 6) for a dominant allele, an allele with incomplete dominance, and a recessive allele.]

The rate at which selection changes allelic frequencies also depends on the allelic frequency itself. If an allele (A2) is lethal and recessive, $W_{11} = W_{12} = 1$, whereas $W_{22} = 0$. The
frequency of the A2 allele will decrease over time (because the A2A2 homozygote produces no offspring), and the rate of decrease will be proportional to the frequency of the recessive allele. When the frequency of the allele is high, the change in each generation is relatively large but, as the frequency of the allele drops, a higher proportion of the alleles are in the heterozygous genotypes, where they are immune to the action of natural selection (the heterozygotes have the same phenotype as the favored homozygote). Thus, selection against a rare recessive allele is very inefficient and its removal from the population is slow.

The relation between the frequency of a recessive allele and its rate of change under natural selection has an important implication. Some people believe that the medical treatment of patients with rare recessive diseases will cause the disease gene to increase, eventually leading to degeneration of the human gene pool. This mistaken belief was the basis of eugenic laws that were passed in the early part of the twentieth century prohibiting the marriage of persons with certain genetic conditions and allowing the involuntary sterilization of others. However, most copies of rare recessive alleles are present in heterozygotes, and selection against the homozygotes will have little effect on the frequency of a recessive allele. Thus, whether the homozygotes reproduce or not has little effect on the frequency of the disorder.

Mutation and natural selection
Recurrent mutation and natural selection act as opposing forces on detrimental alleles; mutation increases their frequency and natural selection decreases their frequency. Eventually, these two forces reach an equilibrium, in which the number of alleles added by mutation is balanced by the number of alleles removed by selection. Table 23.5 shows that the change in allelic frequency due to selection against a recessive allele is $\Delta q = -spq^2/(1 - sq^2)$.
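The opposition between the two forces can also be explored numerically before working through the algebra. The sketch below is an illustration under stated assumptions: selection acts first and forward mutation second within each generation, reverse mutation is ignored, and the values s = 1 (a recessive lethal) and a forward mutation rate of .0001 are hypothetical. The names `step`, `s`, and `mu` are illustrative.

```python
def step(q, s, mu):
    """One generation of selection against a recessive allele, then mutation."""
    # Selection with fitnesses 1, 1, 1 - s leaves A2 at q(1 - sq)/(1 - sq^2).
    q_selected = q * (1.0 - s * q) / (1.0 - s * q * q)
    # Forward mutation converts a fraction mu of the A1 copies to A2.
    return q_selected + mu * (1.0 - q_selected)


# Hypothetical values: a recessive lethal (s = 1) and a mutation rate of .0001.
s, mu = 1.0, 1e-4
q = 0.5
for generation in range(5000):
    q = step(q, s, mu)
print(f"frequency of A2 after 5000 generations: {q:.5f}")
```

The frequency falls quickly while the allele is common, then ever more slowly, and finally stops changing at a low value where the copies removed by selection are exactly replaced by new mutations.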
When q is very low, $q^2$ is near zero; so $1 - sq^2$ will be approximately $1 - 0 = 1$. Thus, when q is very low, the change in frequency due to selection is approximately $-spq^2$. The increase in frequency of an allele due to forward mutations is $\mu p$ (Equation 23.12). At equilibrium, the effects of mutation and selection are balanced; so

$$spq^2 = \mu p$$

This equation can be rearranged:

$$q^2 = \frac{\mu p}{sp} = \frac{\mu}{s}$$

Taking the square root of each side, we get

$$\hat{q} = \sqrt{\frac{\mu}{s}} \tag{23.23}$$

The frequency of the allele at equilibrium ($\hat{q}$) is therefore equal to the square root of the ratio of the mutation rate to the selection coefficient. With the use of the equation for selection acting on a dominant allele (see Table 23.5) and similar reasoning, the frequency of a dominant allele at equilibrium can be shown to be

$$\hat{q} = \frac{\mu}{s} \tag{23.24}$$

Achondroplasia (discussed in Chapter 17) is a common type of human dwarfism that results from a dominant gene. People with this condition are fertile, although they produce only about 74% as many children as are produced by people without achondroplasia. The fitness of people with achondroplasia therefore averages .74, and the selection coefficient (s) is 1 − W, or .26. If we assume that the mutation rate for achondroplasia is about $3 \times 10^{-5}$ (a typical mutation rate in humans), then we can predict that the equilibrium frequency for the achondroplasia allele will be $\hat{q} = .00003/.26 = .0001153$. This frequency is close to the actual frequency of the disease.

Concepts
Mutation and natural selection act as opposing forces on detrimental alleles: mutation tends to increase their frequency and natural selection tends to decrease their frequency, eventually producing an equilibrium.

Connecting Concepts
The General Effects of Evolutionary Forces
You now know that four processes bring about change in the allelic frequencies of a population: mutation, migration, genetic drift, and natural selection. Their short- and long-term effects on allelic frequencies are summarized in Table 23.7.
In some cases, these changes continue until one allele is eliminated and the other becomes fixed in the population. Genetic drift and directional selection will eventually result in fixation, provided these forces are the only ones acting on a population. With the other evolutionary forces, allelic frequencies change until an equilibrium point is reached, and then there is no additional change in allelic frequency. Mutation, migration, and some forms of natural selection can lead to stable equilibria (see Table 23.7).

Table 23.7 Effects of different evolutionary forces on allelic frequencies within populations

| Force | Short-term effect | Long-term effect |
|---|---|---|
| Mutation | Change in allelic frequency | Equilibrium reached between forward and reverse mutations |
| Migration | Change in allelic frequency | Equilibrium reached when allelic frequencies of source and recipient population are equal |
| Genetic drift | Change in allelic frequency | Fixation of one allele |
| Natural selection | Change in allelic frequency | Directional selection: fixation of one allele. Overdominant selection: equilibrium reached |

The different evolutionary forces affect both genetic variation within populations and genetic divergence between populations. Evolutionary forces that maintain or increase genetic variation within populations are listed in the upper-left quadrant of ◗ FIGURE 23.17. These forces include some types of natural selection, such as overdominance in which both alleles are favored. Mutation and migration also increase genetic variation within populations because they introduce new alleles to the population. Evolutionary forces that decrease genetic variation within populations are listed in the lower-left quadrant of Figure 23.17. These forces include genetic drift, which decreases variation through fixation of alleles, and some forms of natural selection such as directional selection.
The various evolutionary forces also affect the amount of genetic divergence between populations. Natural selection increases divergence among populations if different alleles are favored in the different populations, but it can also decrease divergence between populations by favoring the same allele in the different populations. Mutation almost always increases divergence between populations because different mutations arise in each population. Genetic drift also increases divergence between populations because changes in allelic frequencies due to drift are random and are likely to change in different directions in separate populations. Migration, on the other hand, decreases divergence between populations because it makes populations similar in their genetic composition.

◗ 23.17 Mutation, migration, genetic drift, and natural selection have different effects on genetic variation within populations and on genetic divergence between populations.

| | Within populations | Between populations |
|---|---|---|
| Increase genetic variation | Mutation; migration; some types of natural selection | Mutation; genetic drift; some types of natural selection |
| Decrease genetic variation | Genetic drift; some types of natural selection | Migration; some types of natural selection |

Migration and genetic drift act in opposite directions: migration increases genetic variation within populations and decreases divergence between populations, whereas genetic drift decreases genetic variation within populations and increases divergence among populations. Mutation increases both variation within populations and divergence between populations. Natural selection can either increase or decrease variation within populations, and it can increase or decrease divergence between populations. It is important to keep in mind that real populations are simultaneously affected by many evolutionary forces.
This discussion has examined the effects of mutation, migration, genetic drift, and natural selection in isolation so that the influence of each process would be clear. However, in the real world, populations are commonly affected by several evolutionary forces at the same time, and evolution results from the complex interplay of numerous processes.

www.whfreeman.com/pierce Web addresses for a number of resources on evolution

Molecular Evolution
For many years, it was not possible to examine genes directly, and evolutionary biology was confined largely to the study of how phenotypes change with the passage of time. The tremendous advances in molecular genetics in recent years have made it possible to investigate evolutionary change directly by analyzing protein and nucleic acid sequences. These molecular data offer a number of advantages for studying the process and pattern of evolution:

1. Molecular data are genetic. Evolution results from genetic change over time. Anatomical, behavioral, and physiological traits often have a genetic basis, but the relation between the underlying genes and the trait may be complex. Protein and nucleic acid sequence variation has a clear genetic basis that is easy to interpret.
2. Molecular methods can be used with all organisms. Early studies of population genetics relied on simple genetic traits such as human blood types or banding patterns in snails, which are restricted to a small group of organisms. However, all living organisms have proteins and nucleic acids; so molecular data can be collected from any organism.
3. Molecular methods can be applied to a huge amount of genetic variation. An enormous amount of data can be accessed by molecular methods. The human genome, for example, contains more than 3 billion base pairs of DNA, which constitutes a large pool of information about our evolution.
4. All organisms can be compared with the use of some molecular data.
Trying to assess the evolutionary history of distantly related organisms is often difficult because they have few characteristics in common. The evolutionary relationships between angiosperms were traditionally assessed by comparing floral anatomy, whereas the evolutionary relationships of bacteria were determined by their nutritional and staining properties. Because plants and bacteria have so few structural characteristics in common, evaluating how they are related to one another was difficult in the past. All organisms have certain molecular traits in common, such as ribosomal RNA sequences and some fundamental proteins. These molecules offer a valid basis for comparisons among all organisms.
5. Molecular data are quantifiable. Protein and nucleic acid sequence data are precise, accurate, and easy to quantify, which facilitates the objective assessment of evolutionary relationships.
6. Molecular data often provide information about the process of evolution. Molecular data can reveal important clues about the process of evolution. For example, the results of a study of DNA sequences have revealed that one type of insecticide resistance in mosquitoes probably arose from a single mutation that subsequently spread throughout the world.
7. The database of molecular information is large and growing. Today, this database of DNA and protein sequences can be used for making evolutionary comparisons and inferring mechanisms of evolution.

Studies of molecular evolution fall into three primary areas. First, much past research has focused on determining the extent and causes of genetic variation in natural populations. Molecular techniques allow these matters to be addressed directly by examining sequence variation in proteins and DNA.
A second area of research examines molecular processes that influence evolutionary events, and the results of these studies have elucidated new mechanisms and processes of evolution that were not suspected before the application of molecular techniques to evolutionary biology. A third area of research in molecular evolution applies molecular techniques to constructing phylogenies (evolutionary trees) of various groups of organisms. A detailed evolutionary history is found in the DNA sequences of every organism, and molecular techniques allow this history to be read. Concepts Molecular techniques and data offer a number of advantages for evolutionary studies. Molecular data (1) are genetic in nature and can be investigated in all organisms; (2) provide potentially large data sets; (3) allow all organisms to be compared, by using the same characteristics; (4) are easily quantifiable; and (5) provide information about the process of evolution. Protein Variation The study of the amounts and kinds of genetic variation in natural populations is central to the study of evolution. For many traits, a complex interaction of many genes and environmental factors determines the phenotype, and assessing the amount of genetic variation by examining phenotypic variation was difficult. Early population geneticists were forced to rely on the phenotypic traits that had a simple genetic basis, such as human blood types or spotting patterns in butterflies ( ◗ FIGURE 23.18). The initial breakthrough that first allowed the direct examination of molecular evolution was the application of electrophoresis (see Figure 18.4) to population studies. This technique separates macromolecules, such as proteins or nucleic acids, on the basis of their size and charge. In 1966, Richard Lewontin and John Hubby extracted proteins from wild fruit flies, separated the proteins by electrophoresis, and stained for specific enzymes. 
Examining the pattern of bands on gels enabled them to assign genotypes to individual flies and to quantify the amount of genetic variation in natural populations. In the same year, Harry Harris quantified genetic variation in human populations by using the same technique. Protein variation has now been examined in hundreds of different species by using protein electrophoresis ( ◗ FIGURE 23.19).

◗ 23.18 Early population geneticists were forced to rely on the phenotypic traits that had a simple genetic basis. Variation in the spotting patterns of the butterfly Panaxia dominula is an example. [Panels: normal homozygotes, heterozygotes, and the recessive bimacula phenotype.]

Measures of genetic variation
The amount of genetic variation in populations is commonly measured by two parameters. The proportion of polymorphic loci is the proportion of examined loci in which more than one allele is present in a population. If we examined 30 different loci and found two or more alleles present at 15 of these loci, the proportion of polymorphic loci would be 15/30 = 0.5. The expected heterozygosity is the proportion of individuals that are expected to be heterozygous at a locus under the Hardy-Weinberg conditions, which is 2pq when there are two alleles present in the population. The expected heterozygosity is often preferred to the observed heterozygosity because expected heterozygosity is independent of the breeding system of an organism. For example, if a species self-fertilizes, it may have little or no heterozygosity but still have considerable genetic variation, which will be detected by the expected heterozygosity. Expected heterozygosity is typically calculated for a number of loci and is then averaged over all the loci examined.

◗ 23.19 Molecular variation in proteins is revealed by electrophoresis. Tissue samples from three fruit flies have been subjected to electrophoresis and stained for malate dehydrogenase.
Homozygotes are represented as single bands; heterozygotes as triple bands. The genotype of each fly is given below each sample. The percentage of polymorphic loci and the expected heterozygosity have been determined by protein electrophoresis for a number of species (Table 23.8). About one-third of all protein loci are polymorphic, and expected heterozygosity averages about 10%, although there is considerable diversity among species. These measures actually underestimate the true amount of genetic variation, though, because protein electrophoresis does not detect some amino acid substitutions; nor does it detect genetic variation in DNA that does not alter the amino acids of a protein (synonymous codons and variation in noncoding regions of the DNA). Explanations for protein variation By the late 1970s, geneticists recognized that most populations possess large amounts of genetic variation, although the evolutionary significance of this fact was not at all clear. Two opposing hypotheses arose to account for the presence of the extensive molecular variation in proteins. The neutral-mutation hypothesis proposed that the molecular variation revealed by protein electrophoresis is adaptively neutral; that is, individuals with different molecular variants have equal fitness. This hypothesis does not propose that the proteins are functionless; rather, it suggests that most variants revealed by protein electrophoresis are functionally equivalent. Because these variants are functionally equivalent, natural selection does not differentiate between them, and their evolution is shaped largely by the random processes of genetic drift and mutation. 
The neutral-mutation hypothesis accepts that natural selection is an important force in evolution, but views selection as a process that favors the “best” allele while eliminating others. It proposes that, when selection is important, there will be little genetic variation.

The balance hypothesis proposes, on the other hand, that the genetic variation in natural populations is maintained by selection that favors variation (balancing selection). Overdominance, in which the heterozygote has higher fitness than that of either homozygote, is one type of balancing selection. Under this hypothesis, the molecular variants are not physiologically equivalent and do not have the same fitness. Instead, genetic variation within natural populations is shaped largely by selection, and, when selection is important, there will be much variation.

Table 23.8 Proportion of polymorphic loci and heterozygosity for different organisms, as determined by protein electrophoresis

| Group | Number of species | Proportion of polymorphic loci: mean | Proportion of polymorphic loci: SD* | Heterozygosity: mean | Heterozygosity: SD* |
|---|---|---|---|---|---|
| Plants | 15 | 0.26 | 0.17 | 0.07 | 0.07 |
| Invertebrates (excluding insects) | 28 | 0.40 | 0.28 | 0.10 | 0.07 |
| Insects (excluding Drosophila) | 23 | 0.33 | 0.20 | 0.07 | 0.08 |
| Drosophila | 32 | 0.43 | 0.13 | 0.14 | 0.05 |
| Fish | 61 | 0.15 | 0.01 | 0.05 | 0.04 |
| Amphibians | 12 | 0.27 | 0.13 | 0.08 | 0.04 |
| Reptiles | 15 | 0.22 | 0.13 | 0.05 | 0.02 |
| Birds | 10 | 0.15 | 0.11 | 0.05 | 0.04 |
| Mammals | 46 | 0.15 | 0.10 | 0.04 | 0.02 |

* SD, standard deviation from the mean.
Source: After L. E. Mettler, T. G. Gregg, and H. E. Schaffer, Population Genetics and Evolution, 2d ed. (Englewood Cliffs, NJ: Prentice Hall, 1988), Table 9.2. Original data from E. Nevo, Genetic variation in natural populations: patterns and theory, Theoretical Population Biology 13(1978):121-177.
Many attempts to prove one hypothesis or the other failed, because precisely how much variation was actually present was not clear (remember that protein electrophoresis detects only some genetic variation) and because both hypotheses are capable of explaining many different patterns of genetic variation. The controversy over the forces that control variation revealed by protein electrophoresis continues today, but the results of more-recent studies that provide direct information about DNA sequence variation demonstrate that much variation at the level of DNA has little obvious effect on the phenotype and therefore is likely to be neutral.

Concepts
The application of electrophoresis to the study of protein variation in natural populations revealed that most organisms possess large amounts of genetic variation. The neutral-mutation hypothesis proposes that most molecular variation is neutral with regard to natural selection and is shaped largely by mutation and genetic drift. The balance hypothesis proposes that genetic variation is maintained by balancing selection.

www.whfreeman.com/pierce Information on genetic variation in natural populations

DNA Sequence Variation
The development of techniques for isolating, restricting, and sequencing DNA in the 1970s and 1980s provided powerful tools for detecting, quantifying, and investigating genetic variation. The application of these techniques has provided a detailed view of molecular variation. Restriction enzymes are one tool that can be used to detect genetic variation in DNA and examine patterns of genetic variation in nature. Each restriction enzyme recognizes and cuts a particular sequence of DNA nucleotides, known as that enzyme’s restriction site (see Chapter 18). Variation in the presence of a restriction site is called a restriction fragment length polymorphism (RFLP; see Figure 18.26).
Each restriction enzyme recognizes a limited number of nucleotide sites in a particular piece of DNA but, if a number of different restriction enzymes are used and the sites recognized by the enzymes are assumed to be random sequences, RFLPs can be used to estimate the amount of variation in the DNA and the proportion of nucleotides that differ between organisms.

Methods for determining the complete nucleotide sequences of DNA fragments (see p. 000 in Chapter 19) provide the most detailed evolutionary information, although they are both time consuming and expensive. DNA sequencing in evolutionary studies is therefore usually limited to a few individuals or to short sequences. Nevertheless, the high resolution of information provided by sequencing is often invaluable for understanding molecular processes that influence evolution and for determining phylogenies of closely related organisms. For example, DNA sequencing has been used to study the evolution of human immunodeficiency virus (HIV), the virus that causes AIDS. Like many other RNA viruses, HIV evolves rapidly, often changing its sequences within a single host over a period of several years. Evolutionary comparisons of HIV sequences in a dentist and seven of his HIV-positive patients demonstrated that five of the patients contracted HIV from the dentist, whereas the other two patients probably acquired their HIV infection elsewhere.

Concepts
Restriction fragment length polymorphisms and DNA sequencing can be used to directly examine genetic variation.

Molecular Evolution of HIV in a Florida Dental Practice
In July 1990, the U.S. Centers for Disease Control (CDC) reported that a young woman in Florida (later identified as Kimberly Bergalis) had become HIV positive after undergoing an invasive dental procedure performed by a dentist who had AIDS. Bergalis had no known risk factors for HIV infection and no known contact with other HIV-positive persons.
The CDC acknowledged that Bergalis might have acquired the infection from her dentist. Subsequently, the dentist wrote to all of his patients, suggesting that they be tested for HIV infection. By 1992, 7 of the dentist’s patients had tested positive for HIV, and this number eventually increased to 10. Originally diagnosed with HIV infection in 1986, the dentist began to develop symptoms of AIDS in 1987 but continued to practice dentistry for another 2 years. All of his HIV-positive patients had received invasive dental procedures, such as root canals and tooth extractions, in the period when the dentist was infected. Among the seven patients originally studied by the CDC (patients A–G, Table 23.9), two had known risk factors for HIV infection (intravenous drug use, homosexual behavior, or sexual relations with HIV-infected persons), and a third had possible but unconfirmed risk factors.

To determine whether the dentist had infected his patients, the CDC conducted a study of the molecular evolution of HIV isolates from the dentist and the patients. HIV undergoes rapid evolution, making it possible to trace the path of its transmission. This rapid evolution also allows HIV to develop drug resistance quickly, making the development of a treatment for AIDS difficult. Blood specimens were collected from the dentist, the patients, and a group of 35 local controls (other HIV-infected people who lived within 90 miles of the dental practice but who had no known contact with the dentist).

Table 23.9 HIV-positive persons included in study of HIV isolates from a Florida dental practice

                                  Average Differences in DNA Sequences (%)
                    Known Risk    From HIV        From HIV
Person       Sex    Factors       from Dentist    from Controls
Dentist      M      Yes                           11.0
Patient A    F      No            3.4             10.9
Patient B    F      No            4.4             11.2
Patient C    M      No            3.4             11.1
Patient E    F      No            3.4             10.8
Patient G    M      No            4.9             11.8
Patient D    M      Yes           13.6            13.1
Patient F    M      Yes           10.7            11.9

Source: After C. Ou, et al., Science 256(1992):1165–1171, Table 1.
DNA was extracted from white blood cells, and a 680-bp fragment of the envelope gene of the virus was amplified by PCR (see p. 000 in Chapter 16). The fragments from the dentist, the patients, and the local controls were then sequenced and compared.

The divergence between the viral sequences taken from the dentist, the seven patients, and the controls is shown in Table 23.9. Viral DNA taken from patients with no confirmed risk factors (patients A, B, C, E, and G) differed from the dentist’s viral DNA by 3.4% to 4.9%, whereas the viral DNA from the controls differed from the dentist’s by an average of 11%. The viral sequences collected from these five patients were more closely related to the viral sequences collected from the dentist than to viral sequences from the general population, strongly suggesting that these patients acquired their HIV infection from the dentist. The viral isolates from patients D and F (patients with confirmed risk factors), however, differed from that of the dentist by 13.6% and 10.7%, respectively, suggesting that these two patients did not acquire their infection from the dentist.

A phylogenetic tree depicting the evolutionary relationships of the viral sequences (◗ FIGURE 23.20) confirmed that the virus taken from the dentist had a close evolutionary relationship to viruses taken from patients A, B, C, E, and G. The viruses from patients D and F, with known risk factors, were no more similar to the virus from the dentist than to viruses from local controls, indicating that the dentist most likely infected five of his patients, whereas the other two patients probably acquired their infections elsewhere. Of three additional HIV-positive patients who have been identified since 1992, only one has viral sequences that are closely related to those from the dentist.

The study of HIV isolates from the dentist and his patients provides an excellent example of the relevance of molecular evolutionary studies to real-world problems.
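The comparison in Table 23.9 can be sketched in a few lines of Python. This is only an illustration, not the CDC's method: the study relied on phylogenetic analysis of the sequences, whereas the `margin` threshold and the function names below are hypothetical choices made for this sketch. The divergence values are copied from Table 23.9; the two short sequences in the test are made up.

```python
def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * mismatches / len(seq_a)


# Average sequence differences (%) from Table 23.9:
# patient -> (from the dentist's HIV, from the local controls' HIV)
TABLE_23_9 = {
    "A": (3.4, 10.9), "B": (4.4, 11.2), "C": (3.4, 11.1), "E": (3.4, 10.8),
    "G": (4.9, 11.8), "D": (13.6, 13.1), "F": (10.7, 11.9),
}


def closer_to_dentist(from_dentist: float, from_controls: float,
                      margin: float = 2.0) -> bool:
    """Hypothetical rule of thumb: the patient's virus counts as 'linked'
    when it is at least `margin` percentage points closer to the dentist's
    virus than to the local controls'. The margin is an assumption."""
    return from_controls - from_dentist >= margin


linked = sorted(p for p, (d, c) in TABLE_23_9.items() if closer_to_dentist(d, c))
print("Patients whose virus resembles the dentist's:", linked)
```

With the table's values, the rule singles out the same five patients (A, B, C, E, and G) that the text identifies, while D and F fall outside it. A real analysis would compare full alignments and build a tree rather than apply a fixed cutoff.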
How the dentist infected his patients during their visits to his office remains a mystery, but this case is clearly unusual. A study of almost 16,000 patients treated by HIV-positive health-care workers failed to find a single case of confirmed transmission of HIV from the health-care worker to the patient.

www.whfreeman.com/pierce Web site of the U.S. Centers for Disease Control

◗ 23.20 Evolutionary tree showing the relationships of HIV isolates from a dentist, seven of his patients (A through G), and other HIV-positive persons from the same region (local controls, LC). The letters x and y represent different isolates from the same patient. The phylogeny is based on DNA sequences taken from the envelope gene of the virus. Viral sequences from patients A, B, C, E, and G cluster with those of the dentist, indicating a close evolutionary relationship. Sequences from patients D and F, along with those of local controls, are more distantly related. Figure notes: (1) DNA sequences of HIV from patients A, B, C, E, and G are most similar to the HIV sequence from the dentist; he probably infected them. (2) DNA sequences from patients D and F are no more similar to that from the dentist than to those from local controls (LC); these patients were probably not infected by the dentist. [C. Ou et al. Molecular epidemiology of HIV transmission in a dental practice, Science 256(1992): 1167.]

Patterns of Molecular Variation
The results of molecular studies of numerous genes have demonstrated that different genes and different parts of the same gene often evolve at different rates. Rates of evolutionary change in nucleotide sequences are usually measured as the rate of nucleotide substitution, which is the number of substitutions taking place per nucleotide site per year.
To calculate the rate of nucleotide substitution, we begin by looking at homologous sequences from different organisms. We compare the homologous sequences and determine the number of nucleotides that differ between the two sequences. We might compare the growth-hormone sequences for mice and rats, which diverged from a common ancestor some 15 million years ago. From the number of different nucleotides in their growth-hormone genes, we compute the number of nucleotide substitutions that must have taken place since they diverged. Because the same site may have mutated more than once, the number of nucleotide substitutions may be larger than the number of nucleotide differences in two sequences; so special mathematical methods have been developed for inferring the actual number of substitutions likely to have taken place. When we have the number of nucleotide substitutions per nucleotide site, we divide by the amount of evolutionary time that separates the two organisms (usually obtained from the fossil record) to obtain an overall rate of nucleotide substitution. For the mouse and rat growth-hormone gene, the overall rate of nucleotide substitution is approximately 8 × 10⁻⁹ substitutions per site per year.

Nucleotide changes in a gene that alter the amino acid sequence of a protein are referred to as nonsynonymous substitutions. Nucleotide changes, particularly those at the third position of the codon, that do not alter the amino acid sequence are called synonymous substitutions. The rate of nonsynonymous substitution varies widely among mammalian genes. The rate for the α-actin protein is only 0.01 × 10⁻⁹ substitutions per site per year, whereas the rate for interferon γ is 2.79 × 10⁻⁹, nearly 300 times as high. The rate of synonymous substitution also varies among genes, but not to the extent of variation in the nonsynonymous rate.
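The calculation described above can be sketched as follows. Two assumptions are made here that the text does not spell out. First, the Jukes-Cantor correction stands in for the "special mathematical methods" used to infer multiple substitutions at a site; it is one common choice, not necessarily the one behind the text's figure. Second, the time separating two lineages is taken as twice the time since their common ancestor, because both lineages have been accumulating substitutions since the split. The observed proportion of differing sites (`p_observed = 0.20`) is a hypothetical value chosen only for illustration.

```python
import math


def jukes_cantor_distance(p: float) -> float:
    """Estimated substitutions per site, given the observed proportion p of
    sites that differ between two aligned sequences (Jukes-Cantor model)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75) for the Jukes-Cantor model")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def substitution_rate(k: float, years_since_divergence: float) -> float:
    """Substitutions per site per year. The divisor is 2T because each of
    the two lineages has evolved independently for T years."""
    return k / (2.0 * years_since_divergence)


p_observed = 0.20          # hypothetical proportion of differing sites
divergence_years = 15e6    # mouse-rat divergence time given in the text

k = jukes_cantor_distance(p_observed)
rate = substitution_rate(k, divergence_years)
print(f"substitutions per site: {k:.3f}")
print(f"rate: {rate:.2e} substitutions per site per year")
```

Note that the corrected distance `k` exceeds the raw proportion of differences, reflecting sites that changed more than once. With these illustrative inputs the rate comes out on the order of 10⁻⁹ substitutions per site per year, the same order of magnitude as the mouse-rat growth-hormone figure quoted in the text.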
For most protein-encoding genes, the synonymous rate of change is considerably higher than the nonsynonymous rate because synonymous mutations are tolerated by natural selection (Table 23.10). Nonsynonymous mutations, on the other hand, alter the amino acid sequence of the protein and in many cases are detrimental to the fitness of the organism, so most of these mutations are eliminated by natural selection.

Different parts of a gene also evolve at different rates, with the highest rates of substitutions in regions of the gene that have the least effect on function, such as the third position of a codon, flanking regions, and introns (◗ FIGURE 23.21). The 5′ and 3′ flanking regions of genes are not transcribed into RNA, and therefore substitutions in these regions do not alter the amino acid sequence of the protein, although they may affect gene expression (see Chapter 16). Rates of substitution in introns are nearly as high. Although these nucleotides do not encode amino acids, introns must be spliced out of the pre-mRNA for a functional protein to be produced, and particular sequences are required at the 5′ splice site, 3′ splice site, and branch point for correct splicing (see Chapter 14). Substitution rates are somewhat lower in the 5′ and 3′ untranslated regions of a gene. These regions are transcribed into RNA but do not encode amino acids. The 5′ untranslated region contains the ribosome-binding site, which is essential for translation, and the 3′ untranslated region contains sequences that may function in regulating mRNA stability and translation; so substitutions in these regions may have deleterious effects on organismal fitness and will not be tolerated.
The lowest rates of substitution are seen in nonsynonymous changes in the coding region, because these substitutions always alter the amino acid sequence of the protein and are often deleterious. The highest rates of substitution are in pseudogenes, which are duplicated nonfunctional copies of genes that have acquired mutations. Such a gene no longer produces a functional product; so mutations in pseudogenes have no effect on the fitness of the organism.

Table 23.10 Rates of nonsynonymous and synonymous substitutions in mammalian genes based on human-rodent comparisons

                                                Nonsynonymous Rate     Synonymous Rate
                                                (per Site per          (per Site per
Gene                                            10⁹ Years)             10⁹ Years)
α-Actin                                         0.01                   3.68
β-Actin                                         0.03                   3.13
Albumin                                         0.91                   6.63
Aldolase A                                      0.07                   3.59
Apoprotein E                                    0.98                   4.04
Creatine kinase                                 0.15                   3.08
Erythropoietin                                  0.72                   4.34
α-Globin                                        0.55                   5.14
β-Globin                                        0.80                   3.05
Growth hormone                                  1.23                   4.95
Histone 3                                       0.00                   6.38
Immunoglobulin heavy chain (variable region)    1.07                   5.66
Insulin                                         0.13                   4.02
Interferon α1                                   1.41                   3.53
Interferon γ                                    2.79                   8.59
Luteinizing hormone                             1.02                   3.29
Somatostatin-28                                 0.00                   3.97

Source: After W. Li and D. Graur, Fundamentals of Molecular Evolution (Sunderland, MA: Sinauer, 1991), p. 69, Table 1.

◗ 23.21 Different parts of genes evolve at different rates. The highest rates of nucleotide substitution are in sequences that have the least effect on protein function. [Bar graph of nucleotide substitutions per site per year (× 10⁻⁹) for each part of a gene: 5′ flanking region, 5′ untranslated region, exons (nonsynonymous and synonymous sites), introns, 3′ untranslated region, 3′ flanking region, and pseudogenes.] Figure notes: (1) Nonsynonymous nucleotide substitutions alter the amino acid, but synonymous ones do not. (2) Rates of substitution are lower in amino acid coding and gene-regulation regions, (3) but are much higher in nonfunctional DNA, such as pseudogenes.
In summary, there is a relation between the function of a sequence and its rate of evolution; higher rates are found where they have the least effect on function. This observation fits with the neutral-mutation hypothesis, which predicts that molecular variation is not affected by natural selection.

The Molecular Clock
The neutral-mutation theory proposes that evolutionary change at the molecular level occurs primarily through the fixation of neutral mutations by genetic drift. The rate at which one neutral mutation replaces another depends only on the mutation rate, which should be fairly constant for any particular gene. If the rate at which a protein evolves is roughly constant over time, the amount of molecular change that a protein has undergone can be used as a molecular clock to date evolutionary events.

For example, the enzyme cytochrome c could be examined in two organisms known from fossil evidence to have had a common ancestor 400 million years ago. By determining the number of differences in the cytochrome c amino acid sequences in each organism, we could calculate the number of substitutions that have occurred per amino acid site. The occurrence of 20 amino acid substitutions since the two organisms diverged indicates an average rate of 5 substitutions per 100 million years. Knowing how fast the molecular clock ticks allows us to use molecular changes in cytochrome c to date other evolutionary events: if we found that cytochrome c in two organisms differed by 15 amino acid substitutions, our molecular clock would suggest that they diverged some 300 million years ago. If we assumed some error in our estimate of the rate of amino acid substitution, statistical analysis would show that the true divergence time might range from 160 million to 440 million years. The molecular clock is analogous to geological dating based on the radioactive decay of elements.
The molecular clock was proposed by Emile Zuckerkandl and Linus Pauling in 1965 as a possible means of dating evolutionary events on the basis of molecules in present-day organisms. A number of studies have examined the rate of evolutionary change in proteins (◗ FIGURE 23.22), and the molecular clock has been widely used to date evolutionary events when the fossil record is absent or ambiguous. However, the results of several studies have shown that the molecular clock does not always tick at a constant rate, particularly over shorter time periods, and this method remains controversial.

Concepts
Different genes and different parts of the same gene evolve at different rates. Those parts of genes that have the least effect on function tend to evolve at the highest rates. The idea of the molecular clock is that individual proteins and genes evolve at a constant rate and that the differences in the sequences of present-day organisms can be used to date past evolutionary events.

Molecular Phylogenies
As already mentioned, a phylogeny is an evolutionary history of a group of organisms, usually represented as a tree (◗ FIGURE 23.23). The branches of the phylogenetic tree represent the ancestral relationships between the organisms, and the length of each branch is proportional to the amount of evolutionary change that separates the members of the phylogeny.

Before the rise of molecular biology, phylogenies were based largely on anatomical, morphological, or behavioral traits. Evolutionary biologists attempted to gauge the relationships among organisms by assessing the overall degree of similarity or by tracing the appearance of key characteristics of these traits.
The first phylogenies constructed from molecular data were based on amino acid sequences of proteins such as cytochrome c (◗ FIGURE 23.24), but, more recently, phylogenies have been based on DNA sequences. One example is the use of DNA sequences to study the relationship of humans to the other apes. Charles Darwin originally proposed that chimpanzees and gorillas were closely related to humans. However, subsequent study has placed humans in the family Hominidae and the great apes (chimpanzees, gorillas, orangutans, and gibbons) in the family Pongidae. Some researchers suggested that gibbons belong to a third family; others proposed that humans are most closely related to orangutans. Molecular data support the hypothesis that humans, chimpanzees, and gorillas are most closely related and that orangutans and gibbons diverged from the other apes at a much earlier date. Growing evidence favors a close relationship between humans and chimpanzees (◗ FIGURE 23.25).

Because molecular data can be collected from virtually any organism, comparisons can be made between evolutionarily distant organisms. For example, DNA sequences have been used to examine the primary divisions of life and to construct universal phylogenies. On the basis of 16S rRNA, Norman Pace and his colleagues constructed a universal tree of life that included all the major groups of organisms (◗ FIGURE 23.26). The results of their studies revealed that there are three divisions of life: the eubacteria (the common bacteria), the archaea (a distinct group of lesser-known prokaryotes), and the eukaryotes.

◗ 23.22 The molecular clock is based on the assumption of a constant rate of change in protein or DNA sequence. (a) Relation between the rate of amino acid substitution and time since divergence, based on amino acid sequences of hemoglobin from the eight species shown in part b. The constant rate of evolution in protein and DNA sequences has been used as a molecular clock to date past evolutionary events. (b) Phylogeny of eight species (human, dog, kangaroo, echidna, chicken, newt, carp, and shark) and their approximate times of divergence, based on the fossil record. [Panel a plots the number of amino acid substitutions against time since divergence in millions of years; panel b is drawn on a time scale in millions of years ago.]

◗ 23.23 A phylogeny is the evolutionary history (the ancestral relationships) of a group of organisms. This branching diagram shows the phylogeny of horses based on mitochondrial DNA sequences. DNA of the extinct quagga was extracted from skins from preserved museum specimens. [The tree includes the zebras (quagga, Burchell, Grevy, and mountain), the wild ass, the half ass, and the horses (domestic and Przewalski), drawn on a scale of sequence divergence (%).]

◗ 23.24 A phylogeny based on amino acid sequences of the cytochrome c molecule. (a) Aligned amino acid sequences (positions 1 through 104) of cytochrome c from 19 organisms: human, monkey, horse, donkey, pig, dog, rabbit, kangaroo, chicken, pigeon, duck, turtle, rattlesnake, tuna, Samia cynthia (moth), screwworm fly, Saccharomyces (baker's yeast), Candida krusei (yeast), and Neurospora crassa (mold). Above each position is the number of different amino acids found at that position among the organisms. The number 1 indicates an invariant position in the cytochrome c molecule (i.e., all the organisms have the same amino acid in this position); the position is probably functionally very significant. Multiple amino acids at a position indicate a great deal of change; the position is probably less significant than others. Side chains marked by red arrows interact with the heme group. (b) Phylogeny of the same organisms, drawn on a scale of average minimal substitutions.

◗ 23.25 Two possible phylogenies of the human, chimpanzee, and gorilla relationships. One phylogeny suggests (a) that humans and chimpanzees have the more recent ancestor. The other phylogeny suggests (b) that humans split from the group first and that chimpanzees and gorillas have the more recent ancestor. Conclusion: Most molecular data support the phylogeny in part a.

◗ 23.26 A universal tree of life can be constructed from 16S rRNA sequences. Note that sequences from corn mitochondria and chloroplasts are most similar to sequences from eubacteria, confirming the endosymbiotic hypothesis that these eukaryotic organelles evolved from bacteria (see Chapter 20). [Eukaryotes: human, Xenopus laevis (frog), corn, Saccharomyces (yeast), Oxytricha nova, Dictyostelium (slime mold), Trypanosoma brucei. Eubacteria: Escherichia coli, Pseudomonas testosteroni, Agrobacterium tumefaciens, corn mitochondria, Bacillus subtilis, Anacystis nidulans, corn chloroplast, Flavobacterium heparinum. Archaea: Halobacterium volcanii, Methanospirillum hungatei, Methanobacterium formicicum, Methanococcus vannielii, Thermoproteus tenax, Sulfolobus solfataricus.]

Concepts
Molecular data can be used to infer phylogenies (evolutionary histories) of groups of living organisms.

www.whfreeman.com/pierce Current research in molecular evolution

Connecting Concepts Across Chapters
The central theme of this chapter has been genetic evolution: how the genetic composition of a population changes with time. Unlike transmission and molecular genetics, which focus on individuals and particular genes, this chapter has focused on the genetic makeup of groups of individuals. To describe the genes in these groups, we must rely on mathematics and statistical tools; population genetics is therefore fundamentally quantitative in nature. Mathematical models are commonly used in population genetics to describe processes that bring about change in genotypic and allelic frequencies.
These models are, by necessity, simplified representations of the real world, but they nevertheless can be sources of insight into how various factors influence the processes of genetic change.

Our study of population genetics depends on and synthesizes much of the information that we have covered in other parts of this book. Describing the genetic composition of a population requires an understanding of the principles of heredity (Chapters 3 through 5) and how genes are changed by mutation (Chapter 17). Our examination of molecular evolution in the second half of the chapter presupposes an understanding of how genes are encoded in DNA, replicated, and expressed (Chapters 10 through 15). It includes the use of molecular tools, such as restriction enzymes, DNA sequencing, and PCR, which are covered in Chapter 18.

CONCEPTS SUMMARY
• Population genetics examines the genetic composition of groups of individuals and how this composition changes with time.
• A Mendelian population is a group of interbreeding, sexually reproducing individuals, whose set of genes constitutes the population’s gene pool. Evolution occurs through changes in this gene pool.
• Genetic variation and the forces that shape it are important in population genetics. A population’s genetic composition can be described by its genotypic and allelic frequencies.
• The Hardy-Weinberg law describes the effect of reproduction and Mendel’s laws on the allelic and genotypic frequencies of a population. It assumes that a population is large, randomly mating, and free from the effects of mutation, migration, and natural selection. When these conditions are met, the allelic frequencies do not change and the genotypic frequencies stabilize after one generation in the Hardy-Weinberg equilibrium proportions p², 2pq, and q², where p and q equal the frequencies of the alleles.
• Nonrandom mating affects the frequencies of genotypes but not alleles. Positive assortative mating is preferential mating between like individuals; negative assortative mating is preferential mating between unlike individuals.
• Inbreeding, a type of positive assortative mating, increases the frequency of homozygotes while decreasing the frequency of heterozygotes. Inbreeding is frequently detrimental because it increases the appearance of lethal and deleterious recessive traits.
• Mutation, migration, genetic drift, and natural selection can change allelic frequencies.
• Recurrent mutation eventually leads to an equilibrium, with the allelic frequencies being determined by the relative rates of forward and reverse mutation. Change due to mutation in a single generation is usually very small because mutation rates are low.
• Migration, the movement of genes between populations, increases the amount of genetic variation within populations and decreases differences between populations. The magnitude of change depends both on the differences in allelic frequencies between the populations and on the magnitude of migration.
• Genetic drift, the change in allelic frequencies due to chance factors, is important when the effective population size is small. Genetic drift occurs when a population consists of a small number of individuals, is established by a small number of founders, or undergoes a major reduction in size. Genetic drift changes allelic frequencies, reduces genetic variation within populations, and causes genetic divergence among populations.
• Natural selection is the differential reproduction of genotypes; it is measured by the relative reproductive successes of genotypes (fitnesses). The effects of natural selection on allelic frequency can be determined by applying the general selection model. Directional selection leads to the fixation of one allele. The rate of change in allelic frequency due to selection depends on the intensity of selection, the dominance relations, and the initial frequencies of the alleles.
• Mutation and natural selection can produce an equilibrium, in which the number of new alleles introduced by mutation is balanced by the elimination of alleles through natural selection.
• Molecular methods offer a number of advantages for the study of evolution. The use of protein electrophoresis to study genetic variation in natural populations showed that most natural populations have large amounts of genetic variation in their proteins. Two hypotheses arose to explain this variation. The neutral-mutation hypothesis proposed that molecular variation is selectively neutral and is shaped largely by mutation and genetic drift. The balance model proposed that molecular variation is maintained largely by balancing selection.
• Different parts of the genome show different amounts of genetic variation. In general, those that have the least effect on function evolve at the highest rates.
• The molecular-clock hypothesis proposes a constant rate of nucleotide substitution, providing a means of dating evolutionary events by looking at nucleotide differences between organisms.
• Molecular data are often used for constructing phylogenies.

IMPORTANT TERMS

Mendelian population
gene pool
genotypic frequency
allelic frequency
Hardy-Weinberg law
Hardy-Weinberg equilibrium
positive assortative mating
negative assortative mating
inbreeding
outcrossing
inbreeding coefficient
inbreeding depression
equilibrium
migration (gene flow)
sampling error
genetic drift
effective population size
founder effect
genetic bottleneck
fixation
fitness
selection coefficient
directional selection
overdominance
underdominance
phylogeny
proportion of polymorphic loci
expected heterozygosity
neutral-mutation hypothesis
balance hypothesis
molecular clock

Worked Problems

1. The following genotypes were observed in a population:

Genotype    HH    Hh    hh
Number      40    45    50

(a) Calculate the observed genotypic and allelic frequencies for this population.
(b) Calculate the numbers of genotypes expected if this population were in Hardy-Weinberg equilibrium.
(c) Using a chi-square test, determine whether the population is in Hardy-Weinberg equilibrium.

• Solution

(a) The observed genotypic and allelic frequencies are calculated by using Equations 23.1 and 23.3:

\[ f(HH) = \frac{\text{number of } HH \text{ individuals}}{N} = \frac{40}{135} = .30 \]

\[ f(Hh) = \frac{\text{number of } Hh \text{ individuals}}{N} = \frac{45}{135} = .33 \]

\[ f(hh) = \frac{\text{number of } hh \text{ individuals}}{N} = \frac{50}{135} = .37 \]

\[ p = f(H) = \frac{2n_{HH} + n_{Hh}}{2N} = \frac{2(40) + (45)}{2(135)} = .46 \]

\[ q = f(h) = (1 - p) = (1 - .46) = .54 \]

(b) If the population is in Hardy-Weinberg equilibrium, the expected numbers of genotypes are:

\[ HH = p^2 \times N = (.46)^2 \times 135 = 28.57 \]

\[ Hh = 2pq \times N = 2(.46)(.54) \times 135 = 67.07 \]

\[ hh = q^2 \times N = (.54)^2 \times 135 = 39.37 \]

(c) The observed and expected numbers of the genotypes are:

Genotype           HH       Hh       hh
Number observed    40       45       50
Number expected    28.57    67.07    39.37

These numbers can be compared by using a chi-square test:

\[ \chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} = \frac{(40 - 28.57)^2}{28.57} + \frac{(45 - 67.07)^2}{67.07} + \frac{(50 - 39.37)^2}{39.37} = 4.57 + 7.26 + 2.87 = 14.70 \]

The degrees of freedom associated with this chi-square value are $n - 2$, where $n$ equals the number of expected genotypes, or $3 - 2 = 1$. By examining Table 3.4, we see that the probability associated with this chi-square and the degrees of freedom is $P < .001$, which means that the difference between the observed and expected values is unlikely to be due to chance. Thus, there is a significant difference between the observed numbers of genotypes and the numbers that we would expect if the population were in Hardy-Weinberg equilibrium. We conclude that the population is not in equilibrium.

2. A recessive allele for red hair (r) has a frequency of .2 in population I and a frequency of .01 in population II.
A famine in population I causes a number of people in population I to migrate to population II, where they reproduce randomly with the members of population II. Geneticists estimate that, after migration, 15% of the people in population II are migrants from population I. What will be the frequency of red hair in population II after the migration?

• Solution

From Equation 23.16, the allelic frequency in a population after migration ($q'_{II}$) is

\[ q'_{II} = q_I(m) + q_{II}(1 - m) \]

where $q_I$ and $q_{II}$ are the allelic frequencies in population I (migrants) and population II (residents), respectively, and $m$ is the proportion of population II that consists of migrants. In this problem, the frequency of the allele for red hair is .2 in population I and .01 in population II. Because 15% of population II consists of migrants, $m = .15$. Substituting these values into Equation 23.16, we obtain:

\[ q'_{II} = .2(.15) + (.01)(1 - .15) = .03 + .0085 = .0385 \]

This is the expected frequency of the allele for red hair in population II after migration. Red hair is a recessive trait; if mating is random for hair color, the frequency of red hair in population II after migration will be:

\[ f(rr) = q^2 = (.0385)^2 = .0015 \]

3. Two populations have the following numbers of breeding adults:

Population A: 60 males, 40 females
Population B: 5 males, 95 females

(a) Calculate the effective population sizes for populations A and B.
(b) What predictions can you make about the effects of the different sex ratios of these populations on their gene pools?

• Solution

(a) The effective population size can be calculated by using Equation 23.19:

\[ N_e = \frac{4 \times n_{\text{males}} \times n_{\text{females}}}{n_{\text{males}} + n_{\text{females}}} \]

For population A:

\[ N_e = \frac{4 \times 60 \times 40}{60 + 40} = 96 \]

For population B:

\[ N_e = \frac{4 \times 5 \times 95}{5 + 95} = 19 \]

Although each population has a total of 100 breeding adults, the effective population size of population B is much smaller because it has a greater disparity between the numbers of males and females.
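The arithmetic in Worked Problems 1, 2, and 3(a) can be checked with a short script. What follows is a minimal Python sketch, not part of the original text: the function names are hypothetical labels chosen for illustration, and the formulas are the Hardy-Weinberg expectations with the chi-square sum, the migration equation (Equation 23.16), and the effective-population-size equation (Equation 23.19) as they are used in the solutions above. Because the sketch carries $p$ unrounded instead of rounding it to .46, its chi-square value for Problem 1 comes out near 14.67 rather than 14.70; the conclusion ($P < .001$ with 1 degree of freedom) is unchanged.

```python
def hardy_weinberg_chi_square(n_AA, n_Aa, n_aa):
    """Return (p, expected_counts, chi_square) for one locus with two alleles.

    Hypothetical helper: p is estimated from the observed genotype counts,
    and the expected counts are p^2 N, 2pq N, and q^2 N.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)   # frequency of the first allele
    q = 1 - p                         # frequency of the second allele
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, expected, chi_square


def allele_frequency_after_migration(q_migrants, q_residents, m):
    """Migration equation: q' = q_migrants * m + q_residents * (1 - m)."""
    return q_migrants * m + q_residents * (1 - m)


def effective_population_size(n_males, n_females):
    """Sex-ratio equation: N_e = 4 * n_males * n_females / (n_males + n_females)."""
    return 4 * n_males * n_females / (n_males + n_females)


if __name__ == "__main__":
    # Worked Problem 1: HH = 40, Hh = 45, hh = 50
    p, expected, chi2 = hardy_weinberg_chi_square(40, 45, 50)
    print(f"p = {p:.3f}, chi-square = {chi2:.2f}")

    # Worked Problem 2: q_I = .2, q_II = .01, m = .15
    q_new = allele_frequency_after_migration(0.2, 0.01, 0.15)
    print(f"q' = {q_new:.4f}, f(rr) = {q_new ** 2:.4f}")

    # Worked Problem 3(a): populations A and B
    print(effective_population_size(60, 40), effective_population_size(5, 95))
```

Keeping each equation in its own small function makes it easy to rerun the same check on the end-of-chapter problems by changing only the counts or frequencies passed in.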
(b) The effective population size determines the amount of genetic drift that will occur. Because the effective population size of population B is much smaller than that of population A, we can predict that population B will undergo more genetic drift, leading to greater changes in allelic frequency, greater loss of genetic variation, and greater genetic divergence from other populations.

4. Alcohol is a common substance in rotting fruit, where fruit fly larvae grow and develop; larvae use the enzyme alcohol dehydrogenase (ADH) to detoxify the effects of this alcohol. In some fruit fly populations, two alleles are present at the locus that encodes ADH: $ADH^F$, which encodes a form of the enzyme that migrates rapidly (fast) on an electrophoretic gel; and $ADH^S$, which encodes a form of the enzyme that migrates slowly on an electrophoretic gel. Female fruit flies with different ADH genotypes produce the following numbers of offspring when alcohol is present:

Genotype           Mean number of offspring
$ADH^F ADH^F$      120
$ADH^F ADH^S$      60
$ADH^S ADH^S$      30

(a) Calculate the relative fitnesses of females having these genotypes.
(b) If a population of fruit flies has an initial frequency of $ADH^F$ equal to .2, what will be the frequency in the next generation when alcohol is present?

• Solution

(a) Fitness is the relative reproductive output of a genotype and is calculated by dividing the average number of offspring produced by that genotype by the mean number of offspring produced by the most prolific genotype. The fitnesses of the three ADH genotypes therefore are:

Genotype           Mean number of offspring    Fitness
$ADH^F ADH^F$      120                         $W_{FF} = \frac{120}{120} = 1$
$ADH^F ADH^S$      60                          $W_{FS} = \frac{60}{120} = .5$
$ADH^S ADH^S$      30                          $W_{SS} = \frac{30}{120} = .25$

(b) To calculate the frequency of the $ADH^F$ allele after selection, we can use the table method. The frequencies of the three genotypes before selection are the Hardy-Weinberg equilibrium frequencies of $p^2$, $2pq$, and $q^2$. We multiply each of these frequencies by the fitness of each genotype to obtain the frequencies after selection. These products are summed to obtain the mean fitness of the population ($\bar{W}$), and the products are then divided by the mean fitness to obtain the relative genotypic frequencies after selection, as shown here:

Initial genotypic frequencies:
  $ADH^F ADH^F$: $p^2 = (.2)^2 = .04$
  $ADH^F ADH^S$: $2pq = 2(.2)(.8) = .32$
  $ADH^S ADH^S$: $q^2 = (.8)^2 = .64$

Fitnesses:
  $ADH^F ADH^F$: $W_{FF} = 1$
  $ADH^F ADH^S$: $W_{FS} = .5$
  $ADH^S ADH^S$: $W_{SS} = .25$

Proportionate contribution of genotypes to population:
  $ADH^F ADH^F$: $p^2 W_{FF} = .04(1) = .04$
  $ADH^F ADH^S$: $2pq W_{FS} = (.32)(.5) = .16$
  $ADH^S ADH^S$: $q^2 W_{SS} = (.64)(.25) = .16$

Mean fitness:
  $\bar{W} = .04 + .16 + .16 = .36$

Relative genotypic frequency after selection:
  $ADH^F ADH^F$: $\frac{p^2 W_{FF}}{\bar{W}} = \frac{.04}{.36} = .11$
  $ADH^F ADH^S$: $\frac{2pq W_{FS}}{\bar{W}} = \frac{.16}{.36} = .44$
  $ADH^S ADH^S$: $\frac{q^2 W_{SS}}{\bar{W}} = \frac{.16}{.36} = .44$

To calculate the allelic frequency after selection, we use Equation 23.4:

\[ p' = f(ADH^F) = f(ADH^F ADH^F) + \tfrac{1}{2} f(ADH^F ADH^S) = .11 + \tfrac{1}{2}(.44) = .33 \]

We predict that the frequency of $ADH^F$ will increase from .2 to .33.

The New Genetics

MINING GENOMES

POPULATION GENETICS: ANALYSES AND SIMULATIONS

In this exercise, you will analyze real molecular data, primarily generated by high-school and college students, to learn how allele frequencies and genotype distributions can be used to study human populations. To do so, you will use the databases and statistical tools at the Dolan DNA Learning Center of Cold Spring Harbor Laboratory. In addition, you will use simulations to explore how factors such as population size, selection pressure, and genetic drift interact to cause allele frequencies to change.

COMPREHENSION QUESTIONS

1. What is a Mendelian population? How is the gene pool of a Mendelian population usually described? What are the predictions given by the Hardy-Weinberg law?
*2. What assumptions must be met for a population to be in Hardy-Weinberg equilibrium?
3. What is random mating?
*4. Give the Hardy-Weinberg expected genotypic frequencies for (a) an autosomal locus with three alleles, and (b) an X-linked locus with two alleles.
5.
Define inbreeding and briefly describe its effects on a population.
6. What determines the allelic frequencies at mutational equilibrium?
*7. What factors affect the magnitude of change in allelic frequencies due to migration?
8. Define genetic drift and give three ways that it can arise. What effect does genetic drift have on a population?
*9. What is effective population size? How does it affect the amount of genetic drift?
10. Define natural selection and fitness.
11. Briefly discuss the differences between directional selection, overdominance, and underdominance. Describe the effect of each type of selection on the allelic frequencies of a population.
12. What factors affect the rate of change in allelic frequency due to natural selection?
*13. Compare and contrast the effects of mutation, migration, genetic drift, and natural selection on genetic variation within populations and on genetic divergence between populations.
14. Give some of the advantages of using molecular data in evolutionary studies.
*15. What is the key difference between the neutral-mutation hypothesis and the balance hypothesis?
16. Outline the different rates of evolution that are typically seen in different parts of a protein-encoding gene. What might account for these differences?
*17. What is the molecular clock?

APPLICATION QUESTIONS AND PROBLEMS

18. How would you respond to someone who said that models are useless in studying population genetics because they represent oversimplifications of the real world?
*19. Voles (Microtus ochrogaster) were trapped in old fields in southern Indiana and were genotyped for a transferrin locus. The following numbers of genotypes were recorded.

Genotype     Number
$T^E T^E$    407
$T^E T^F$    170
$T^F T^F$    17

Calculate the genotypic and allelic frequencies of the transferrin locus for this population.
20. Orange coat color in cats is due to an X-linked allele ($X^O$) that is codominant with the allele for black ($X^+$). Genotypes of the orange locus of cats in Minneapolis and St.
Paul, Minnesota, were determined and the following data were obtained.

Genotype             Number
$X^O X^O$ females    11
$X^O X^+$ females    70
$X^+ X^+$ females    94
$X^O Y$ males        36
$X^+ Y$ males        112

Calculate the frequencies of the $X^O$ and $X^+$ alleles for this population.
21. A total of 6129 North American Caucasians were blood typed for the MN locus, which is determined by two codominant alleles, $L^M$ and $L^N$. The following data were obtained:

Blood type    Number
M             1787
MN            3039
N             1303

Carry out a chi-square test to determine whether this population is in Hardy-Weinberg equilibrium at the MN locus.
22. Genotypes of leopard frogs from a population in central Kansas were determined for a locus that encodes the enzyme malate dehydrogenase. The following numbers of genotypes were observed:

Genotype     Number
$M^1 M^1$    20
$M^1 M^2$    45
$M^2 M^2$    42
$M^1 M^3$    4
$M^2 M^3$    8
$M^3 M^3$    6
Total        125

(a) Calculate the genotypic and allelic frequencies for this population.
(b) What would be the expected numbers of genotypes if the population were in Hardy-Weinberg equilibrium?
23. Full color (D) in domestic cats is dominant over dilute color (d). Of 325 cats observed, 194 have full color and 131 have dilute color.
(a) If these cats are in Hardy-Weinberg equilibrium for the dilution locus, what is the frequency of the dilute allele?
(b) How many of the 194 cats with full color are likely to be heterozygous?
24. Tay-Sachs disease is an autosomal recessive disorder. Among Ashkenazi Jews, the frequency of Tay-Sachs disease is 1 in 3600. If the Ashkenazi population is mating randomly for the Tay-Sachs gene, what proportion of the population consists of heterozygous carriers of the Tay-Sachs allele?
25. In the plant Lotus corniculatus, cyanogenic glycoside protects the plants against insect pests and even grazing by cattle. This glycoside is due to a simple dominant allele. A population of L. corniculatus consists of 77 plants that possess cyanogenic glycoside and 56 that lack the compound. What is the frequency of the dominant allele that results in the presence of cyanogenic glycoside in this population?
*26. Color blindness in humans is an X-linked recessive trait. Approximately 10% of the men in a particular population are color blind.
(a) If mating is random for the color-blind locus, what is the frequency of the color-blind allele in this population?
(b) What proportion of the women in this population are expected to be color blind?
(c) What proportion of the women in the population are expected to be heterozygous carriers of the color-blind allele?
*27. The human MN blood type is determined by two codominant alleles, $L^M$ and $L^N$. The frequency of $L^M$ in Eskimos on a small Arctic island is .80. If the inbreeding coefficient for this population is .05, what are the expected frequencies of the M, MN, and N blood types on the island?
28. Demonstrate mathematically that full sib mating ($F = \frac{1}{4}$) reduces the heterozygosity by $\frac{1}{4}$ with each generation.
29. The forward mutation rate for piebald spotting in guinea pigs is $8 \times 10^{-5}$; the reverse mutation rate is $2 \times 10^{-6}$. Assuming that no other evolutionary forces are present, what is the expected frequency of the allele for piebald spotting in a population that is in mutational equilibrium?
*30. In German cockroaches, curved wing ($cv$) is recessive to normal wing ($cv^+$). Bill, who is raising cockroaches in his dorm room, finds that the frequency of the gene for curved wings in his cockroach population is .6. In the apartment of his friend Joe, the frequency of the gene for curved wings is .2. One day Joe visits Bill in his dorm room, and several cockroaches jump out of Joe’s hair and join the population in Bill’s room. Bill estimates that 10% of the cockroaches in his dorm room now consist of individual roaches that jumped out of Joe’s hair. What will be the new frequency of curved wings among cockroaches in Bill’s room?
31. A population of water snakes is found on an island in Lake Erie. Some of the snakes are banded and some are unbanded; banding is caused by an autosomal allele that is recessive to an allele for no bands. The frequency of banded snakes on the island is .4, whereas the frequency of banded snakes on the mainland is .81. One summer, a large number of snakes migrate from the mainland to the island. After this migration, 20% of the island population consists of snakes that came from the mainland.
(a) Assuming that both the mainland population and the island population are in Hardy-Weinberg equilibrium for the alleles that affect banding, what is the frequency of the allele for bands on the island and on the mainland before migration?
(b) After migration has taken place, what will be the frequency of the banded allele on the island?
*32. Calculate the effective size of a population with the following numbers of reproductive adults:
(a) 20 males and 20 females
(b) 30 males and 10 females
(c) 10 males and 30 females
(d) 2 males and 38 females
33. Pikas are small mammals that live at high elevation in the talus slopes of mountains. Populations located on mountain tops in Colorado and Montana in North America are relatively isolated from one another, because the pikas don’t occupy the low-elevation habitats that separate the mountain tops and don’t venture far from the talus slopes. Thus, there is little gene flow between populations. Furthermore, each population is small in size and was founded by a small number of pikas. A group of population geneticists propose to study the amount of genetic variation in a series of pika populations and to compare the allelic frequencies in different populations. On the basis of biology and the distribution of pikas, what do you predict the population geneticists will find concerning the within- and between-population genetic variation?
34. In a large, randomly mating population, the frequency of the allele (s) for sickle-cell hemoglobin is .028. The results of studies have shown that people with the following genotypes at the beta-chain locus produce the average numbers of offspring given:

Genotype    Average number of offspring produced
SS          5
Ss          6
ss          0

(a) What will be the frequency of the sickle-cell allele (s) in the next generation?
(b) What will be the frequency of the sickle-cell allele at equilibrium?
35. Two chromosomal inversions are commonly found in populations of Drosophila pseudoobscura: Standard (ST) and Arrowhead (AR). When treated with the insecticide DDT, the genotypes for these inversions exhibit overdominance, with the following fitnesses:

Genotype    Fitness
ST/ST       .47
ST/AR       1
AR/AR       .62

What will be the frequency of ST and AR after equilibrium has been reached?
*36. In a large, randomly mating population, the frequency of an autosomal recessive lethal allele is .20. What will be the frequency of this allele in the next generation?
37. A certain form of congenital glaucoma results from an autosomal recessive allele. Assume that the mutation rate is $10^{-5}$ and that persons having this condition produce, on the average, only about 80% of the offspring produced by persons who do not have glaucoma.
(a) At equilibrium between mutation and selection, what will be the frequency of the gene for congenital glaucoma?
(b) What will be the frequency of the disease in a randomly mating population that is at equilibrium?

CHALLENGE QUESTIONS

38. The Barton Springs salamander is an endangered species found only in a single spring in the city of Austin, Texas. There is growing concern that a chemical spill on a nearby freeway could pollute the spring and wipe out the species. To provide a source of salamanders to repopulate the spring in the event of such a catastrophe, a proposal has been made to establish a captive breeding population of the salamander in a local zoo.
You are asked to provide a plan for the establishment of this captive breeding population, with the goal of maintaining as much of the genetic variation of the species as possible in the captive population. What factors might cause loss of genetic variation in the establishment of the captive population? How could loss of such variation be prevented? Assuming that it is feasible to maintain only a limited number of salamanders in captivity, what procedures should be instituted to ensure the long-term maintenance of as much of the variation as possible?

SUGGESTED READINGS

Avise, J. C. 1994. Molecular Markers, Natural History, and Evolution. New York: Chapman and Hall.
An excellent review of how molecular techniques are being used to examine evolutionary questions.

Buri, P. 1956. Gene frequency in small populations of mutant Drosophila. Evolution 10:367–402.
Buri’s famous experiment demonstrating the effects of genetic drift on allelic frequencies.

Hardy, G. H. 1908. Mendelian proportions in a mixed population. Science 28:49–50.
Original paper by G. H. Hardy outlining the Hardy-Weinberg law.

Hartl, D. L., and A. G. Clark. 1997. Principles of Population Genetics, 3d ed. Sunderland, MA: Sinauer.
An advanced textbook in population genetics.

MacIntyre, R. J., Ed. 1985. Molecular Evolutionary Genetics. New York: Plenum.
Contributors treat various aspects of molecular evolution.

Mettler, L. E., T. G. Gregg, and H. S. Schaffer. 1998. Population Genetics and Evolution, 2d ed. Englewood Cliffs, NJ: Prentice Hall.
A short, readable textbook on population genetics.

Nei, M., and S. Kumar. 2000. Molecular Evolution and Phylogenetics. Oxford: Oxford University Press.
An advanced textbook on the methods used in the study of molecular evolution.

Ou, C., C. A. Ciesielski, G. Myers, C. I. Bandea, et al. 1992. Molecular epidemiology of HIV transmission in a dental practice. Science 256:1165–1171.
Study of molecular evolution of HIV in a Florida dental practice.

Provine, W. B. 2002. The Origins of Theoretical Population Genetics, 2d ed. Chicago: University of Chicago Press.
A complete history of the origins of population genetics as a field of study.

Saccheri, I., M. Kuussaari, M. Kankare, P. Vikman, W. Fortelius, and I. Hanski. 1998. Inbreeding and extinction in a butterfly metapopulation. Nature 392:491–494.
Discusses the role of inbreeding in population extinction of butterflies.

Vilà, C., P. Savolainen, J. E. Maldonado, I. R. Amorim, J. E. Rice, R. L. Honeycutt, K. A. Crandall, J. Lundeberg, and R. K. Wayne. 1997. Multiple and ancient origins of the domestic dog. Science 276:1687–1689.
Using the molecular clock and mitochondrial DNA sequences, these geneticists estimate that the dog was domesticated more than 100,000 years ago.