Chapter 4 - Alberts - Lessons 1+2
BASIC GENETIC MECHANISMS
CHAPTER 4
DNA, Chromosomes, and Genomes
IN THIS CHAPTER
THE STRUCTURE AND FUNCTION OF DNA
CHROMOSOMAL DNA AND ITS PACKAGING IN THE CHROMATIN FIBER
CHROMATIN STRUCTURE AND FUNCTION
THE GLOBAL STRUCTURE OF CHROMOSOMES
HOW GENOMES EVOLVE
Life depends on the ability of cells to store, retrieve, and translate the genetic
instructions required to make and maintain a living organism. This hereditary
information is passed on from a cell to its daughter cells at cell division, and from
one generation of an organism to the next through the organism’s reproductive
cells. The instructions are stored within every living cell as its genes, the
information-containing elements that determine the characteristics of a species as a
whole and of the individuals within it.
As soon as genetics emerged as a science at the beginning of the twentieth
century, scientists became intrigued by the chemical structure of genes. The
information in genes is copied and transmitted from cell to daughter cell millions of times
during the life of a multicellular organism, and it survives the process essentially
unchanged. What form of molecule could be capable of such accurate and almost
unlimited replication and also be able to exert precise control, directing
multicellular development as well as the daily life of every cell? What kind of
instructions does the genetic information contain? And how can the enormous amount
of information required for the development and maintenance of an organism fit
within the tiny space of a cell?
The answers to several of these questions began to emerge in the 1940s. At
this time researchers discovered, from studies in simple fungi, that genetic infor-
mation consists largely of instructions for making proteins. Proteins are phenom-
enally versatile macromolecules that perform most cell functions. As we saw in
Chapter 3, they serve as building blocks for cell structures and form the enzymes
that catalyze most of the cell’s chemical reactions. They also regulate gene expres-
sion (Chapter 7), and they enable cells to communicate with each other (Chapter
15) and to move (Chapter 16). The properties and functions of cells and organisms
are determined to a great extent by the proteins that they are able to make.
Painstaking observations of cells and embryos in the late nineteenth century
had led to the recognition that the hereditary information is carried on chro-
mosomes—threadlike structures in the nucleus of a eukaryotic cell that become
visible by light microscopy as the cell begins to divide (Figure 4–1). Later, when
biochemical analysis became possible, chromosomes were found to consist of
deoxyribonucleic acid (DNA) and protein, with both being present in roughly the
same amounts. For many decades, the DNA was thought to be merely a structural
element. However, the other crucial advance made in the 1940s was the identifica-
tion of DNA as the likely carrier of genetic information. This breakthrough in our
understanding of cells came from studies of inheritance in bacteria (Figure 4–2).
But still, as the 1950s began, both how proteins could be specified by instructions
in the DNA and how this information might be copied for transmission from cell
to cell seemed completely mysterious. The puzzle was suddenly solved in 1953,
when James Watson and Francis Crick derived the mechanism from their model
of DNA structure. As outlined in Chapter 1, the determination of the double-helical
structure of DNA immediately solved the problem of how the information
in this molecule might be copied, or replicated. It also provided the first clues as
to how a molecule of DNA might use the sequence of its subunits to encode the
instructions for making proteins. Today, the fact that DNA is the genetic material
is so fundamental to biological thought that it is difficult to appreciate the enor-
mous intellectual gap that was filled by this breakthrough discovery.
We begin this chapter by describing the structure of DNA. We see how, despite
its chemical simplicity, the structure and chemical properties of DNA make it
ideally suited as the raw material of genes. We then consider how the many pro-
teins in chromosomes arrange and package this DNA. The packing has to be done
in an orderly fashion so that the chromosomes can be replicated and apportioned
correctly between the two daughter cells at each cell division. And it must also
allow access to chromosomal DNA, both for the enzymes that repair DNA damage
and for the specialized proteins that direct the expression of its many genes.
In the past two decades, there has been a revolution in our ability to deter-
mine the exact order of subunits in DNA molecules. As a result, we now know the
sequence of the 3.2 billion nucleotide pairs that provide the information for pro-
ducing a human adult from a fertilized egg, as well as having the DNA sequences
for thousands of other organisms. Detailed analyses of these sequences are pro-
viding exciting insights into the process of evolution, and it is with this subject that
the chapter ends.
This is the first of four chapters that deal with basic genetic mechanisms—the
ways in which the cell maintains, replicates, and expresses the genetic informa-
tion carried in its DNA. In the next chapter (Chapter 5), we shall discuss the mech-
anisms by which the cell accurately replicates and repairs DNA; we also describe
how DNA sequences can be rearranged through the process of genetic recombi-
nation. Gene expression—the process through which the information encoded in
DNA is interpreted by the cell to guide the synthesis of proteins—is the main topic
of Chapter 6. In Chapter 7, we describe how this gene expression is controlled by
the cell to ensure that each of the many thousands of proteins and RNA molecules
encrypted in its DNA is manufactured only at the proper time and place in the life
of a cell.
Figure 4–3 DNA and its building blocks. DNA is made of four types of nucleotides,
which are linked covalently into a polynucleotide chain (a DNA strand) with a
sugar-phosphate backbone from which the bases (A, C, G, and T) extend. A DNA
molecule is composed of two antiparallel DNA strands held together by hydrogen
bonds between the paired bases. The arrowheads at the ends of the DNA strands
indicate the polarities of the two strands. In the diagram at the bottom left of the
figure, the DNA molecule is shown straightened out; in reality, it is twisted into a
double helix, as shown on the right. For details, see Figure 4–5 and Movie 4.1.
[Figure panels: building blocks of DNA (sugar phosphate + base = nucleotide); DNA
strand; double-stranded DNA; DNA double helix. Labels: sugar-phosphate backbone;
hydrogen-bonded base pairs.]
Figure 4–5 The DNA double helix. (A) A space-filling model of 1.5 turns of the DNA
double helix. Each turn of DNA is made up of 10.4 nucleotide pairs, and the
center-to-center distance between adjacent nucleotide pairs is 0.34 nm. The coiling of
the two strands around each other creates two grooves in the double helix: the wider
groove is called the major groove, and the smaller the minor groove, as indicated.
(B) A short section of the double helix viewed from its side, showing four base
pairs. The nucleotides are linked together covalently by phosphodiester bonds that
join the 3ʹ-hydroxyl (–OH) group of one sugar to the 5ʹ-hydroxyl group of the next
sugar. Thus, each polynucleotide strand has a chemical polarity; that is, its two
ends are chemically different. The 5ʹ end of the DNA polymer is by convention often
illustrated carrying a phosphate group, while the 3ʹ end is shown with a hydroxyl.
[Figure labels: (A) major groove, minor groove; scale bar 2 nm. (B) 5ʹ end, 3ʹ end,
bases, sugar, hydrogen bond, phosphodiester bond; 0.34 nm spacing. Additional
labels: guanine, cytosine, hydrogen bond; scale bar 1 nm.]
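The dimensions quoted in the caption of Figure 4–5 can be combined in a short calculation. The sketch below is only an illustration: it treats the helix as a straight rod, and the function and constant names are hypothetical. The figure of 3.2 billion nucleotide pairs is the human genome size given in the chapter introduction.

```python
# Dimensions taken from the Figure 4-5 caption.
RISE_PER_PAIR_NM = 0.34   # center-to-center distance between adjacent nucleotide pairs
PAIRS_PER_TURN = 10.4     # nucleotide pairs in one full turn of the helix


def helix_length_nm(nucleotide_pairs: float) -> float:
    """Length along the helix axis, assuming a straight, unbent double helix."""
    return nucleotide_pairs * RISE_PER_PAIR_NM


# Length of one helical turn.
pitch_nm = helix_length_nm(PAIRS_PER_TURN)

# Length of the 1.5-turn model shown in panel (A).
model_length_nm = helix_length_nm(1.5 * PAIRS_PER_TURN)

# Stretched-out length of 3.2 billion nucleotide pairs, converted from nm to m.
genome_length_m = helix_length_nm(3.2e9) * 1e-9

print(f"one turn: {pitch_nm:.2f} nm")
print(f"1.5 turns: {model_length_nm:.2f} nm")
print(f"3.2e9 nucleotide pairs: {genome_length_m:.2f} m")
```

Under these assumptions one turn spans about 3.5 nm, and the haploid human genome, laid end to end, would be roughly a meter long.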
S can serve as a template for making a new strand Sʹ, while strand Sʹ can serve as a
template for making a new strand S (Figure 4–6). Thus, the genetic information in
DNA can be accurately copied by the beautifully simple process in which strand
S separates from strand Sʹ, and each separated strand then serves as a template
for the production of a new complementary partner strand that is identical to its
former partner.
The ability of each strand of a DNA molecule to act as a template for producing
a complementary strand enables a cell to copy, or replicate, its genome before
passing it on to its descendants. We shall describe the elegant machinery that the
cell uses to perform this task in Chapter 5.
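The templating logic described above can be sketched in a few lines. This is a minimal illustration, not a model of the replication machinery covered in Chapter 5: the function names are hypothetical, strands are written as plain strings in the 5ʹ-to-3ʹ direction, and the only rules used are the A-T and G-C pairings and the antiparallel orientation of the two strands.

```python
# Base-pairing rules: A pairs with T, and G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(template: str) -> str:
    """Return the partner of `template`, with both strands written 5'->3'."""
    # Pair every base, reading the template backward because the strands are antiparallel.
    return "".join(PAIRING[base] for base in reversed(template.upper()))


def replicate(strand_s: str):
    """Copy a double helix (S, S') by using each old strand as a template."""
    strand_s_prime = complementary_strand(strand_s)      # existing partner S'
    new_s_prime = complementary_strand(strand_s)         # new strand templated by old S
    new_s = complementary_strand(strand_s_prime)         # new strand templated by old S'
    # Each daughter helix keeps one old strand and gains one new strand.
    return (strand_s, new_s_prime), (new_s, strand_s_prime)


daughter_1, daughter_2 = replicate("GATTACA")
print(daughter_1)
print(daughter_2)
```

Both daughter helices carry the same pair of sequences as the parent, which is the point of the passage: each strand holds enough information to rebuild its partner.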
Organisms differ from one another because their respective DNA molecules
have different nucleotide sequences and, consequently, carry different biological
messages. But how is the nucleotide alphabet used to make messages, and what
do they spell out?
As discussed above, it was known well before the structure of DNA was deter-
mined that genes contain the instructions for producing proteins. If genes are
made of DNA, the DNA must therefore somehow encode proteins (Figure 4–7).
As discussed in Chapter 3, the properties of a protein, which are responsible for its
biological function, are determined by its three-dimensional structure. This struc-
ture is determined in turn by the linear sequence of the amino acids of which it is
composed. The linear sequence of nucleotides in a gene must therefore somehow
spell out the linear sequence of amino acids in a protein. The exact correspon-
dence between the four-letter nucleotide alphabet of DNA and the twenty-letter
amino acid alphabet of proteins—the genetic code—is not at all obvious from the
DNA structure, and it took over a decade after the discovery of the double helix
before it was worked out. In Chapter 6, we will describe this code in detail in the
course of elaborating the process of gene expression, through which a cell converts
the nucleotide sequence of a gene first into the nucleotide sequence of an RNA
molecule, and then into the amino acid sequence of a protein.
The complete store of information in an organism’s DNA is called its genome,
and it specifies all the RNA molecules and proteins that the organism will ever
synthesize. (The term genome is also used to describe the DNA that carries this
information.) The amount of information contained in genomes is staggering. The
nucleotide sequence of a very small human gene, written out in the four-letter
nucleotide alphabet, occupies a quarter of a page of text (Figure 4–8), while the
complete sequence of nucleotides in the human genome would fill more than a
thousand books the size of this one. In addition to other critical information, it
includes roughly 21,000 protein-coding genes, which (through alternative splicing;
see p. 415) give rise to a much greater number of distinct proteins.
Figure 4–7 The relationship between genetic information carried in DNA and
proteins. (Discussed in Chapter 1.)
[Figure labels: gene A, gene B, gene C; DNA double helix; GENE EXPRESSION.]
CHROMOSOMAL DNA AND ITS PACKAGING IN THE CHROMATIN FIBER
As described in Chapter 1, nearly all the DNA in a eukaryotic cell is sequestered in
a nucleus, which in many cells occupies about 10% of the total cell volume. This
compartment is delimited by a nuclear envelope formed by two concentric lipid
Summary
Genetic information is carried in the linear sequence of nucleotides in DNA. Each
molecule of DNA is a double helix formed from two complementary antiparallel
strands of nucleotides held together by hydrogen bonds between G-C and A-T base
pairs. Duplication of the genetic information occurs by the use of one DNA strand
as a template for the formation of a complementary strand. The genetic information
stored in an organism’s DNA contains the instructions for all the RNA molecules and
proteins that the organism will ever synthesize and is said to comprise its genome.
In eukaryotes, DNA is contained in the cell nucleus, a large membrane-bound com-
partment.
[Figure labels: nucleolus; centrosome; microtubule; nuclear lamina; nuclear pore.]
The display of the 46 human chromosomes at mitosis is called the human
DNA may do, it seems clear that it is not a great handicap for a eukaryotic cell to
carry a large amount of it.
How the genome is divided into chromosomes also differs from one eukaryotic
species to the next. For example, while the cells of humans have 46 chromosomes,
those of some small deer have only 6, while those of the common carp contain
over 100. Even closely related species with similar genome sizes can have very
different numbers and sizes of chromosomes (Figure 4–14). Thus, there is no
simple relationship between chromosome number, complexity of the organism, and
total genome size. Rather, the genomes and chromosomes of modern-day species
have each been shaped by a unique history of seemingly random genetic events,
acted on by poorly understood selection pressures over long evolutionary times.
Figure 4–14 Two closely related species of deer with very different chromosome
numbers. In the evolution of the Indian muntjac, initially separate chromosomes
fused, without having a major effect on the animal. These two species contain a
similar number of genes. (Chinese muntjac photo courtesy of Deborah Carreno,
Natural Wonders Photography.)
[Figure panels: Chinese muntjac; Indian muntjac.]
The Nucleotide Sequence of the Human Genome Shows How
Our Genes Are Arranged
With the publication of the full DNA sequence of the human genome in 2004, it
became possible to see in detail how the genes are arranged along each of our
chromosomes (Figure 4–15). It will be many decades before the information
contained in the human genome sequence is fully analyzed, but it has already
stimulated new experiments that have had major effects on the content of every chapter
in this book.
Figure 4–15 The organization of genes on a human chromosome. (A) Chromosome
22, one of the smallest human chromosomes, contains 48 × 10⁶ nucleotide pairs and
makes up approximately 1.5% of the human genome. Most of the left arm of
chromosome 22 consists of short repeated sequences of DNA that are packaged in a
particularly compact form of chromatin (heterochromatin) discussed later in this
chapter. (B) A tenfold expansion of a portion of chromosome 22, with about 40 genes
indicated. Those in dark brown are known genes and those in red are predicted
genes. (C) An expanded portion of (B) showing four genes. (D) The intron–exon
arrangement of a typical gene is shown after a further tenfold expansion. Each exon
(red) codes for a portion of the protein, while the DNA sequence of the introns
(gray) is relatively unimportant, as discussed in detail in Chapter 6.
The human genome (3.2 × 10⁹ nucleotide pairs) is the totality of genetic information
belonging to our species. Almost all of this genome is distributed over the 22
different autosomes and 2 sex chromosomes (see Figures 4–10 and 4–11) found within
the nucleus. A minute fraction of the human genome (16,569 nucleotide pairs, in
multiple copies per cell) is found in the mitochondria (introduced in Chapter 1, and
discussed in detail in Chapter 14). The term human genome sequence refers to the
complete nucleotide sequence of DNA in the 24 nuclear chromosomes and the
mitochondria. Being diploid, a human somatic cell nucleus contains roughly twice
the haploid amount of DNA, or 6.4 × 10⁹ nucleotide pairs, when not duplicating its
chromosomes in preparation for division. (Adapted from International Human Genome
Sequencing Consortium, Nature 409:860–921, 2001. With permission from Macmillan
Publishers Ltd.)
[Figure panels: (A) human chromosome 22 in its mitotic conformation, composed of
two double-stranded DNA molecules, each 48 × 10⁶ nucleotide pairs long;
heterochromatin. (B) 10% of chromosome arm, ~40 genes. (C) 1% of chromosome arm
containing 4 genes. (D) one gene of 3.4 × 10⁴ nucleotide pairs; exon, intron,
regulatory DNA sequences; gene expression: RNA, protein, folded protein.]
The first striking feature of the human genome is how little of it (only a few
percent) codes for proteins (Table 4–1 and Figure 4–16). It is also notable that
nearly half of the chromosomal DNA is made up of mobile pieces of DNA that
have gradually inserted themselves in the chromosomes over evolutionary time,
multiplying like parasites in the genome (see Figure 4–62). We discuss these trans-
posable elements in detail in later chapters.
A second notable feature of the human genome is the large average gene
size—about 27,000 nucleotide pairs. As discussed above, a typical gene carries in
its linear sequence of nucleotides the information for the linear sequence of the
amino acids of a protein. Only about 1300 nucleotide pairs are required to encode
a protein of average size (about 430 amino acids in humans). Most of the
remaining sequence in a gene consists of long stretches of noncoding DNA that interrupt
the relatively short segments of DNA that code for protein. As will be discussed in
detail in Chapter 6, the coding sequences are called exons; the intervening
(noncoding) sequences in genes are called introns (see Figure 4–15 and Table 4–1).
The majority of human genes thus consist of a long string of alternating exons and
introns, with most of the gene consisting of introns. In contrast, the majority of
genes from organisms with concise genomes lack introns. This accounts for the
much smaller size of their genes (about one-twentieth that of human genes), as
well as for the much higher fraction of coding DNA in their chromosomes.
Figure 4–16 Scale of the human genome. If drawn with a 1 mm space between each
nucleotide pair, as in (A), the human genome would extend 3200 km (approximately
2000 miles), far enough to stretch across the center of Africa, the site of our human
origins (red line in B). At this scale, there would be, on average, a protein-coding
gene every 150 m. An average gene would extend for 30 m, but the coding sequences
in this gene would add up to only just over a meter.
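The numbers quoted in this passage and in Figure 4–16 can be checked against one another with simple arithmetic. The sketch below uses only figures from the text, plus one assumption that is not stated in this section: that each amino acid is specified by three nucleotide pairs (the genetic code is described in Chapter 6). The variable names are illustrative.

```python
# Figures quoted in the text.
GENOME_NP = 3.2e9               # haploid human genome, nucleotide pairs
PROTEIN_CODING_GENES = 21_000   # approximate number of protein-coding genes
AVERAGE_GENE_NP = 27_000        # average gene size, nucleotide pairs
AVERAGE_PROTEIN_AA = 430        # average protein size, amino acids

# Assumption (not stated in this section): three nucleotide pairs per amino acid.
NP_PER_AMINO_ACID = 3

# Drawing scale used in Figure 4-16: 1 mm between adjacent nucleotide pairs.
MM_PER_NP = 1.0

# Coding sequence needed for an average protein, and its share of an average gene.
coding_np = AVERAGE_PROTEIN_AA * NP_PER_AMINO_ACID
coding_fraction_of_gene = coding_np / AVERAGE_GENE_NP

# Distances at the Figure 4-16 scale.
genome_km = GENOME_NP * MM_PER_NP / 1e6                                # mm -> km
gene_spacing_m = GENOME_NP / PROTEIN_CODING_GENES * MM_PER_NP / 1000   # mm -> m
gene_length_m = AVERAGE_GENE_NP * MM_PER_NP / 1000
coding_length_m = coding_np * MM_PER_NP / 1000

print(f"coding sequence per average protein: {coding_np} nucleotide pairs")
print(f"coding share of an average gene: {coding_fraction_of_gene:.1%}")
print(f"genome at 1 mm per pair: {genome_km:.0f} km")
print(f"one protein-coding gene every {gene_spacing_m:.0f} m")
print(f"average gene: {gene_length_m:.0f} m, of which coding: {coding_length_m:.2f} m")
```

The results agree with the rounded values in the text: about 1300 nucleotide pairs of coding sequence per average protein, 3200 km for the whole genome, a gene roughly every 150 m, and an average gene of roughly 30 m containing just over a meter of coding sequence.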
In addition to introns and exons, each gene is associated with regulatory DNA
sequences, which are responsible for ensuring that the gene is turned on or off at
the proper time, expressed at the appropriate level, and only in the proper type of
cell. In humans, the regulatory sequences for a typical gene are spread out over
tens of thousands of nucleotide pairs. As would be expected, these regulatory
sequences are much more compressed in organisms with concise genomes. We
discuss how regulatory DNA sequences work in Chapter 7.
Research in the last decade has surprised biologists with the discovery that,
in addition to 21,000 protein-coding genes, the human genome contains many
thousands of genes that encode RNA molecules that do not produce proteins, but
instead have a variety of other important functions. What is thus far known about
these molecules will be presented in Chapters 6 and 7. Last, but not least, the
nucleotide sequence of the human genome has revealed that the archive of infor-
mation needed to produce a human seems to be in an alarming state of chaos. As
one commentator described our genome, “In some ways it may resemble your
garage/bedroom/refrigerator/life: highly individualistic, but unkempt; little evi-
dence of organization; much accumulated clutter (referred to by the uninitiated
as ‘junk’); virtually nothing ever discarded; and the few patently valuable items
indiscriminately, apparently carelessly, scattered throughout.” We shall discuss
how this is thought to have come about in the final sections of this chapter entitled
“How Genomes Evolve.”
Each DNA Molecule That Forms a Linear Chromosome Must
Contain a Centromere, Two Telomeres, and Replication Origins
To form a functional chromosome, a DNA molecule must be able to do more than
simply carry genes: it must be able to replicate, and the replicated copies must be
separated and reliably partitioned into daughter cells at each cell division. This
process occurs through an ordered series of stages, collectively known as the cell
cycle, which provides for a temporal separation between the duplication of
chromosomes and their segregation into two daughter cells. The cell cycle is briefly
summarized in Figure 4–17, and it is discussed in detail in Chapter 17. Briefly,
during a long interphase, genes are expressed and chromosomes are replicated,
with the two replicas remaining together as a pair of sister chromatids.
Throughout this time, the chromosomes are extended and much of their chromatin exists
as long threads in the nucleus so that individual chromosomes cannot be easily
distinguished. It is only during a much briefer period of mitosis that each
chromosome condenses so that its two sister chromatids can be separated and
distributed to the two daughter nuclei. The highly condensed chromosomes in a
dividing cell are known as mitotic chromosomes (Figure 4–18). This is the form
in which chromosomes are most easily visualized; in fact, the images of
chromosomes shown so far in the chapter are of chromosomes in mitosis.
Each chromosome operates as a distinct structural unit: for a copy to be passed
on to each daughter cell at division, each chromosome must be able to replicate,
and the newly replicated copies must subsequently be separated and partitioned
correctly into the two daughter cells. These basic functions are controlled by three
types of specialized nucleotide sequences in the DNA, each of which binds
specific proteins that guide the machinery that replicates and segregates
chromosomes (Figure 4–19).
Figure 4–17 A simplified view of the eukaryotic cell cycle. During interphase,
the cell is actively expressing its genes and is therefore synthesizing proteins.
Also, during interphase and before cell division, the DNA is replicated and each
chromosome is duplicated to produce two closely paired sister DNA molecules (called
sister chromatids). A cell with only one type of chromosome, present in maternal and
paternal copies, is illustrated here. Once DNA replication is complete, the cell can
enter M phase, when mitosis occurs and the nucleus is divided into two daughter
nuclei. During this stage, the chromosomes condense, the nuclear envelope breaks
down, and the mitotic spindle forms from microtubules and other proteins. The
condensed mitotic chromosomes are captured by the mitotic spindle, and one
complete set of chromosomes is then pulled to each end of the cell by separating
the members of each sister-chromatid pair. A nuclear envelope re-forms around each
chromosome set, and in the final step of M phase, the cell divides to produce two
daughter cells. Most of the time in the cell cycle is spent in interphase; M phase is
brief in comparison, occupying only about an hour in many mammalian cells.
[Figure labels: nuclear envelope surrounding the nucleus; mitotic chromosome.]
Experiments in yeasts, whose chromosomes are relatively small and easy to
manipulate, have identified the minimal DNA sequence elements responsible for
each of these functions. One type of nucleotide sequence acts as a DNA repli-
cation origin, the location at which duplication of the DNA begins. Eukaryotic
chromosomes contain many origins of replication to ensure that the entire chro-
mosome can be replicated rapidly, as discussed in detail in Chapter 5.
After DNA replication, the two sister chromatids that form each chromosome
remain attached to one another and, as the cell cycle proceeds, are condensed
further to produce mitotic chromosomes. The presence of a second specialized
DNA sequence, called a centromere, allows one copy of each duplicated and con-
densed chromosome to be pulled into each daughter cell when a cell divides. A
protein complex called a kinetochore forms at the centromere and attaches the
duplicated chromosomes to the mitotic spindle, allowing them to be pulled apart
(discussed in Chapter 17).
Figure 4–18 A mitotic chromosome. A mitotic chromosome is a condensed
duplicated chromosome in which the two new chromosomes, called sister
chromatids, are still linked together (see Figure 4–17). The constricted region
indicates the position of the centromere. (Courtesy of Terry D. Allen.)
[Scale bar: 1 µm.]
The third specialized DNA sequence forms telomeres, the ends of a
chromosome. Telomeres contain repeated nucleotide sequences that enable the ends of
chromosomes to be efficiently replicated. Telomeres also perform another
function: the repeated telomere DNA sequences, together with the regions adjoining
them, form structures that protect the end of the chromosome from being
mistaken by the cell for a broken DNA molecule in need of repair. We discuss both this
type of repair and the structure and function of telomeres in Chapter 5.
In yeast cells, the three types of sequences required to propagate a
chromosome are relatively short (typically less than 1000 base pairs each) and therefore
use only a tiny fraction of the information-carrying capacity of a chromosome.
Although telomere sequences are fairly simple and short in all eukaryotes, the
DNA sequences that form centromeres and replication origins in more complex
organisms are much longer than their yeast counterparts. For example, experi-
ments suggest that a human centromere can contain up to a million nucleotide
pairs and that it may not require a stretch of DNA with a defined nucleotide
sequence. Instead, as we shall discuss later in this chapter, a human centromere
is thought to consist of a large, regularly repeating protein–nucleic acid structure
that can be inherited when a chromosome replicates.
Figure 4–20 Chromatin. As illustrated, chromatin consists of DNA bound to both
histone and non-histone proteins. The mass of histone protein present is about
equal to the total mass of non-histone protein, but, as schematically indicated
here, the latter class is composed of an enormous number of different species. In
total, a chromosome is about one-third DNA and two-thirds protein by mass.
[Figure labels: chromatin; DNA; histone; non-histone proteins.]
H2B, H3, and H4—and double-stranded DNA that is 147 nucleotide pairs long.
The histone octamer forms a protein core around which the double-stranded DNA
is wound (Figure 4–22).
The region of linker DNA that separates each nucleosome core particle from
the next can vary in length from a few nucleotide pairs up to about 80. (The term
nucleosome technically refers to a nucleosome core particle plus one of its adjacent
DNA linkers, but it is often used synonymously with nucleosome core particle.)
On average, therefore, nucleosomes repeat at intervals of about 200 nucleotide
pairs. For example, a diploid human cell with 6.4 × 10⁹ nucleotide pairs contains
approximately 30 million nucleosomes. The formation of nucleosomes converts a
DNA molecule into a chromatin thread about one-third of its initial length.
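The nucleosome count quoted above follows directly from the repeat length. The short calculation below is a sketch using only the figures in the text; the variable names are illustrative, and the average linker length is simply the difference between the repeat length and the 147 nucleotide pairs of the core particle.

```python
# Figures quoted in the text.
DIPLOID_GENOME_NP = 6.4e9   # nucleotide pairs in a diploid human cell
CORE_PARTICLE_NP = 147      # DNA wrapped around one histone octamer
REPEAT_LENGTH_NP = 200      # average spacing of nucleosomes along the DNA

# One nucleosome per repeat length.
nucleosome_count = DIPLOID_GENOME_NP / REPEAT_LENGTH_NP

# Average linker implied by the repeat length and the core particle size.
average_linker_np = REPEAT_LENGTH_NP - CORE_PARTICLE_NP

print(f"nucleosomes per diploid cell: {nucleosome_count / 1e6:.0f} million")
print(f"average linker length: {average_linker_np} nucleotide pairs")
```

The result, about 32 million, matches the rounded value of approximately 30 million in the text, and the implied average linker of 53 nucleotide pairs falls within the stated range of a few to about 80.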
Figure 4–22 Structural organization of the nucleosome. A nucleosome
contains a protein core made of eight histone molecules. In biochemical
experiments, the nucleosome core particle can be released from isolated
chromatin by digestion of the linker DNA with a nuclease, an enzyme that
breaks down DNA. (The nuclease can degrade the exposed linker DNA but
cannot attack the DNA wound tightly around the nucleosome core.) After
dissociation of the isolated nucleosome into its protein core and DNA, the
length of the DNA that was wound around the core can be determined.
This length of 147 nucleotide pairs is sufficient to wrap 1.7 times around the
histone core.
[Figure labels: “beads-on-a-string” form of chromatin; nucleosome includes ~200
nucleotide pairs of DNA; NUCLEASE DIGESTS LINKER DNA; released nucleosome core
particle, 11 nm; DISSOCIATION WITH HIGH CONCENTRATION OF SALT; octameric histone
core; 147-nucleotide-pair DNA double helix; DISSOCIATION; H2A, H2B, H3, H4.]
The Structure of the Nucleosome Core Particle Reveals How DNA
Is Packaged
The high-resolution structure of a nucleosome core particle, solved in 1997,
revealed a disc-shaped histone core around which the DNA was tightly wrapped
in a left-handed coil of 1.7 turns (Figure 4–23). All four of the histones that make
up the core of the nucleosome are relatively small proteins (102–135 amino acids),
and they share a structural motif, known as the histone fold, formed from three α
helices connected by two loops (Figure 4–24). In assembling a nucleosome, the
histone folds first bind to each other to form H3–H4 and H2A–H2B dimers, and
the H3–H4 dimers combine to form tetramers. An H3–H4 tetramer then further
combines with two H2A–H2B dimers to form the compact octamer core, around
which the DNA is wound.
The interface between DNA and histone is extensive: 142 hydrogen bonds are
formed between DNA and the histone core in each nucleosome. Nearly half of
these bonds form between the amino acid backbone of the histones and the
sugar-phosphate backbone of the DNA. Numerous hydrophobic interactions and salt
linkages also hold DNA and protein together in the nucleosome. More than
one-fifth of the amino acids in each of the core histones are either lysine or arginine
(two amino acids with basic side chains), and their positive charges can effectively
Figure 4–24 The overall structural organization of the core histones. (A) Each of the core
histones contains an N-terminal tail, which is subject to several forms of covalent modification, and
a histone fold region, as indicated. (B) The structure of the histone fold, which is formed by all four
of the core histones. (C) Histones 2A and 2B form a dimer through an interaction known as the
“handshake.” Histones H3 and H4 form a dimer through the same type of interaction. (D) The final
histone octamer on DNA. Note that all eight N-terminal tails of the histones protrude from the disc-
shaped core structure. Their conformations are highly flexible, and they serve as binding sites for
sets of other proteins.
core. The bending requires a substantial compression of the minor groove of the
DNA helix. Certain dinucleotides in the minor groove are especially easy to compress,
and some nucleotide sequences bind the nucleosome more tightly than
others (Figure 4–25). This probably explains some striking, but unusual, cases
of very precise positioning of nucleosomes along a stretch of DNA. However, the
sequence preference of nucleosomes must be weak enough to allow other factors
to dominate, inasmuch as nucleosomes can occupy any one of a number of positions
relative to the DNA sequence in most chromosomal regions.
In addition to its histone fold, each of the core histones has an N-terminal
amino acid “tail,” which extends out from the DNA–histone core (see Figure
4–24D). These histone tails are subject to several different types of covalent
modifications that in turn control critical aspects of chromatin structure and function,
as we shall discuss shortly.
Figure 4–25 The bending of DNA in a nucleosome. The DNA helix makes
1.7 tight turns around the histone octamer. This diagram illustrates how the minor
groove is compressed on the inside of the turn. Owing to structural features of the
DNA molecule, the indicated dinucleotides are preferentially accommodated in such
a narrow minor groove, which helps to explain why certain DNA sequences
will bind more tightly than others to the nucleosome core.
[Figure 4–25 labels: G-C preferred here (minor groove outside); AA, TT, and TA
dinucleotides preferred here (minor groove inside); histone core of nucleosome
(histone octamer); DNA of nucleosome.]
As a reflection of their fundamental role in DNA function through controlling
chromatin structure, the histones are among the most highly conserved eukaryotic
proteins. For example, the amino acid sequence of histone H4 from a pea
differs from that of a cow at only 2 of the 102 positions. This strong evolutionary
conservation suggests that the functions of histones involve nearly all of their
amino acids, so that a change in any position is deleterious to the cell. But in addition
to this remarkable conservation, eukaryotic organisms also produce smaller
amounts of specialized variant core histones that differ in amino acid sequence
from the main ones. As discussed later, these variants, combined with the surprisingly
large number of covalent modifications that can be added to the histones in
nucleosomes, give rise to a variety of chromatin structures in cells.
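The degree of conservation quoted for histone H4 can be restated as a percent identity. The short calculation below is only an illustrative sketch: the function name `percent_identity` is our own, and its only inputs are the two numbers given in the text (2 differences in 102 positions).

```python
def percent_identity(differences: int, length: int) -> float:
    """Return the percentage of aligned positions that are identical."""
    if length <= 0 or not 0 <= differences <= length:
        raise ValueError("need length > 0 and 0 <= differences <= length")
    return 100.0 * (length - differences) / length


# Values from the text: pea and cow histone H4 differ at 2 of 102 positions.
h4_identity = percent_identity(differences=2, length=102)
print(f"Histone H4, pea vs. cow: {h4_identity:.1f}% identical")
```

The result, about 98%, says nothing by itself about which positions differ or why; it simply puts the "2 of 102" figure on a scale that can be compared with other proteins.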
[Figure labels: ATP-dependent chromatin remodeling complex; histone chaperone;
DNA lacking nucleosome; EXCHANGE OF NUCLEOSOME CORE (HISTONE OCTAMER).]
How nucleosomes are organized into condensed arrays is unclear. The structure
of a tetranucleosome (a complex of four nucleosomes) obtained by x-ray
crystallography and high-resolution electron microscopy of reconstituted chromatin
have been used to support a zigzag model for the stacking of nucleosomes
in a 30-nm fiber (Figure 4–28). But cryoelectron microscopy of carefully prepared
nuclei suggests that most regions of chromatin are less regularly structured.
What causes nucleosomes to stack so tightly on each other? Nucleosome-to-nucleosome
linkages that involve histone tails, most notably the H4 tail, constitute
one important factor (Figure 4–29). Another important factor is an additional
histone that is often present in a 1-to-1 ratio with nucleosome cores, known as
histone H1. This so-called linker histone is larger than the individual core histones
and it has been considerably less well conserved during evolution. A single histone
H1 molecule binds to each nucleosome, contacting both DNA and protein,
and changing the path of the DNA as it exits from the nucleosome. This change in
the exit path of DNA is thought to help compact nucleosomal DNA (Figure 4–30).
Figure 4–28 A zigzag model for the 30-nm chromatin fiber. (A) The conformation
of two of the four nucleosomes in a tetranucleosome, from a structure
determined by x-ray crystallography. (B) Schematic of the entire tetranucleosome;
the fourth nucleosome is not visible, being stacked on the bottom nucleosome and
behind it in this diagram. (C) Diagrammatic illustration of a possible zigzag structure
that could account for the 30-nm chromatin fiber. (A, PDB code: 1ZBB; C, adapted
from C.L. Woodcock, Nat. Struct. Mol. Biol. 12:639–640, 2005. With permission from
Macmillan Publishers Ltd.)
Figure 4–29 A model for the role played by histone tails in the compaction of
chromatin. (A) A schematic diagram shows the approximate exit points of
the eight histone tails, one from each histone protein, that extend from each
nucleosome. The actual structure is shown to its right. In the high-resolution
structure of the nucleosome, the tails are largely unstructured, suggesting that they
are highly flexible. (B) As indicated, the histone tails are thought to be involved in
interactions between nucleosomes that help to pack them together. (A, PDB
code: 1KX5.)
[Figure 4–29 labels: H2A tail; H2B tail; H3 tail; H4 tail.]
Most eukaryotic organisms make several histone H1 proteins of related but quite
distinct amino acid sequences. The presence of many other DNA-binding proteins,
as well as proteins that bind directly to histones, is certain to add important
additional features to any array of nucleosomes.
Summary
A gene is a nucleotide sequence in a DNA molecule that acts as a functional unit
for the production of a protein, a structural RNA, or a catalytic or regulatory RNA
molecule. In eukaryotes, protein-coding genes are usually composed of a string of
alternating introns and exons associated with regulatory regions of DNA. A chromosome
is formed from a single, enormously long DNA molecule that contains a
linear array of many genes, bound to a large set of proteins. The human genome
contains 3.2 × 10⁹ DNA nucleotide pairs, divided between 22 different autosomes
(present in two copies each) and 2 sex chromosomes. Only a small percentage of this
DNA codes for proteins or functional RNA molecules. A chromosomal DNA mole-
cule also contains three other types of important nucleotide sequences: replication
origins and telomeres allow the DNA molecule to be efficiently replicated, while a
centromere attaches the sister DNA molecules to the mitotic spindle, ensuring their
accurate segregation to daughter cells during the M phase of the cell cycle.
The DNA in eukaryotes is tightly bound to an equal mass of histones, which
form repeated arrays of DNA–protein particles called nucleosomes. The nucleosome
is composed of an octameric core of histone proteins around which the DNA dou-
ble helix is wrapped. Nucleosomes are spaced at intervals of about 200 nucleotide
pairs, and they are usually packed together (with the aid of histone H1 molecules)
into quasi-regular arrays to form a 30-nm chromatin fiber. Even though compact,
the structure of chromatin must be highly dynamic to allow access to the DNA.
There is some spontaneous DNA unwrapping and rewrapping in the nucleosome
itself; however, the general strategy for reversibly changing local chromatin struc-
ture features ATP-driven chromatin remodeling complexes. Cells contain a large set
of such complexes, which are targeted to specific regions of chromatin at appropri-
ate times. The remodeling complexes collaborate with histone chaperones to allow
nucleosome cores to be repositioned, reconstituted with different histones, or com-
pletely removed to expose the underlying DNA.
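Two of the numbers in this summary can be combined into rough order-of-magnitude estimates. The sketch below is only a back-of-the-envelope calculation: it treats all of the DNA as nucleosomal at a uniform 200-nucleotide-pair spacing, and it assumes the standard B-DNA rise of 0.34 nm per nucleotide pair, a value that is not quoted in this summary.

```python
GENOME_BP = 3.2e9        # nucleotide pairs in the human genome (from the summary)
REPEAT_BP = 200          # approximate nucleosome spacing (from the summary)
RISE_NM_PER_BP = 0.34    # assumption: standard rise per nucleotide pair in B-DNA

# Number of nucleosomes needed to package one copy of the genome.
nucleosomes = GENOME_BP / REPEAT_BP

# End-to-end length of the same DNA if fully extended, in meters.
dna_length_m = GENOME_BP * RISE_NM_PER_BP * 1e-9

print(f"about {nucleosomes:.1e} nucleosomes per genome copy")
print(f"about {dna_length_m:.2f} m of extended DNA per genome copy")
```

A diploid cell carries two copies of the genome, so both estimates would double for a typical human cell nucleus.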
[Figure 4–31 labels: barrier; genes 1–5; heterochromatin; euchromatin; CHROMOSOME
TRANSLOCATION; early in the developing embryo, heterochromatin forms and spreads
into neighboring euchromatin to different extents in different cells; cell proliferation.
Panels (A) and (B).]
Figure 4–31 The cause of position effect variegation in Drosophila. (A) Heterochromatin (green) is normally prevented from
spreading into adjacent regions of euchromatin (red) by barrier DNA sequences, which we shall discuss shortly. In flies that
inherit certain chromosomal rearrangements, however, this barrier is no longer present. (B) During the early development of such
flies, heterochromatin can spread into neighboring chromosomal DNA, proceeding for different distances in different cells. This
spreading soon stops, but the established pattern of heterochromatin is subsequently inherited, so that large clones of progeny
cells are produced that have the same neighboring genes condensed into heterochromatin and thereby inactivated (hence the
“variegated” appearance of some of these flies; see Figure 4–32). Although “spreading” is used to describe the formation of
new heterochromatin close to previously existing heterochromatin, the term may not be wholly accurate. There is evidence that
during expansion, the condensation of DNA into heterochromatin can “skip over” some regions of chromatin, sparing the genes
that lie within them from repressive effects.
Figure 4–33 Some prominent types of covalent amino acid side-chain
modifications found on nucleosomal histones. (A) Three different levels
of lysine methylation are shown; each can be recognized by a different
binding protein and thus each can have a different significance for the cell.
Note that acetylation removes the plus charge on lysine, and that, most
importantly, an acetylated lysine cannot be methylated, and vice versa.
(B) Serine phosphorylation adds a negative charge to a histone. Modifications
of histones not shown here include the mono- or dimethylation of an arginine,
the phosphorylation of a threonine, the addition of ADP-ribose to a glutamic
acid, and the addition of a ubiquityl, sumoyl, or biotin group to a lysine.
[Figure 4–33 labels: (B) SERINE PHOSPHORYLATION; serine; phosphoserine.]
As a first step, one can carry out a search for the molecules that are involved.
This has been done by means of genetic screens, in which large numbers of
mutants are generated, after which one picks out those that show an abnormal-
ity of the process in question. Extensive genetic screens in Drosophila, fungi, and
mice have identified more than 100 genes whose products either enhance or sup-
press the spread of heterochromatin and its stable inheritance—in other words,
genes that serve as either enhancers or suppressors of position effect variegation.
Many of these genes turn out to code for non-histone chromosomal proteins that
interact with histones and are involved in modifying or maintaining chromatin
structure. We shall discuss how they work in the sections that follow.
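[Figure 4–34 panel (A): side view of the nucleosome with the tails of H2A, H2B, H3, and H4
labeled. Panel (B): N-terminal tail sequences with the numbered modified positions listed
below; the symbols A, M, and P drawn above each position are not reproduced here.]
H2A  SGRGKQGGKARAKAKTRSSRAGLQFPVGRV        positions 1, 5, 9, 13, 15
H2B  PEPAKSAPAPKKGSKKAVTKAQKKDGKKRK        positions 5, 12, 14, 15, 20, 23, 24
H3   ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVK  positions 2, 4, 9, 10, 14, 17, 18, 23, 26, 27, 28, 36
H4   SGRGKGGKGLGKGGAKRHRKVLRDNIQGIT        positions 1, 3, 5, 8, 12, 16, 20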
Figure 4–34 The covalent modification of core histone tails. (A) The structure of the nucleosome highlighting the location of
the first 30 amino acids in each of its eight N-terminal histone tails (green). These tails are unstructured and highly mobile, and
thus will change their conformation depending on other bound proteins. (B) Well-documented modifications of the four histone
core proteins are indicated. Although only a single symbol is used here for methylation (M), each lysine (K) or arginine (R) can be
methylated in several different ways. Note also that some positions (e.g., lysine 9 of H3) can be modified either by methylation
or by acetylation, but not both. Most of the modifications shown add a relatively small molecule onto the histone tails; the
exception is ubiquitin, a 76-amino-acid protein also used for other cell processes (see Figure 3–69). Not shown are more than
20 possible modifications located in the globular core of the histones. (A, PDB: 1KX5; B, adapted from H. Santos-Rosa and
C. Caldas, Eur. J. Cancer 41:2381–2402, 2005. With permission from Elsevier.)
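One simple consistency check on panel (B) is to look up which amino acid sits at each numbered position. The sketch below uses only the tail sequences and position numbers printed in the figure; the dictionary names and the helper `residues_at` are our own, and positions are counted from 1 at the N-terminal residue shown.

```python
# N-terminal tail sequences as printed in Figure 4-34B (one-letter amino acid code).
TAILS = {
    "H2A": "SGRGKQGGKARAKAKTRSSRAGLQFPVGRV",
    "H2B": "PEPAKSAPAPKKGSKKAVTKAQKKDGKKRK",
    "H3": "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVK",
    "H4": "SGRGKGGKGLGKGGAKRHRKVLRDNIQGIT",
}

# Modified positions (1-based) as numbered under each sequence in the figure.
MODIFIED = {
    "H2A": [1, 5, 9, 13, 15],
    "H2B": [5, 12, 14, 15, 20, 23, 24],
    "H3": [2, 4, 9, 10, 14, 17, 18, 23, 26, 27, 28, 36],
    "H4": [1, 3, 5, 8, 12, 16, 20],
}


def residues_at(histone: str) -> str:
    """Return the residues found at the modified positions of one histone tail."""
    seq = TAILS[histone]
    return "".join(seq[pos - 1] for pos in MODIFIED[histone])


for name in TAILS:
    print(name, residues_at(name))
```

Every numbered position falls on a lysine (K), arginine (R), or serine (S), which fits the caption's description of lysine and arginine methylation and the modification types listed for Figure 4–33.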
different times and places in the life of an organism, thereby determining where
and when the chromatin-modifying enzymes will act. In this way, the DNA
sequence ultimately determines how histones are modified. But in at least some
cases, the covalent modifications on nucleosomes can persist long after the
transcription regulator proteins that first induced them have disappeared, thereby
providing the cell with a memory of its developmental history. Most remarkably,
as in the related phenomenon of position effect variegation discussed above, this
memory can be transmitted from one cell generation to the next.
Very different patterns of covalent modification are found on different groups
of nucleosomes, depending both on their exact position in the genome and on
the history of the cell. The modifications of the histones are carefully controlled,
and they have important consequences. The acetylation of lysines on the N-ter-
minal tails loosens chromatin structure, in part because adding an acetyl group
to lysine removes its positive charge, thereby reducing the affinity of the tails for
adjacent nucleosomes. However, the most profound effects of the histone modifi-
cations lie in their ability to recruit specific other proteins to the modified stretch
of chromatin. Trimethylation of one specific lysine on the histone H3 tail, for
instance, attracts the heterochromatin-specific protein HP1 and contributes to
the establishment and spread of heterochromatin. More generally, the recruited
proteins act with the modified histones to determine how and when genes will be
expressed, as well as other chromosome functions. In this way, the precise struc-
ture of each domain of chromatin governs the readout of the genetic information
that it contains, and thereby the structure and function of the eukaryotic cell.
Figure 4–35 The structure of some histone variants compared with the major histone
that they replace. The histone variants are inserted into nucleosomes at specific
sites on chromosomes by ATP-dependent chromatin remodeling enzymes that act in
concert with histone chaperones (see Figure 4–27). The CENP-A (Centromere Protein-A)
variant of histone H3 is discussed later in this chapter (see Figure 4–42); other variants
are discussed in Chapter 7. The sequences in each variant that are colored differently
(compared to the major histone above it) denote regions with an amino acid sequence
different from this major histone. (Adapted from K. Sarma and D. Reinberg, Nat. Rev.
Mol. Cell Biol. 6:139–149, 2005. With permission from Macmillan Publishers Ltd.)
MAJOR HISTONE   VARIANT    SPECIAL FUNCTION
H3              H3.3       transcriptional activation
H3              CENP-A     centromere function and kinetochore assembly
H2A             H2AX       DNA repair and recombination
H2A             H2AZ       gene expression, chromosome segregation
H2A             macroH2A   transcriptional repression, X-chromosome inactivation
[Other Figure 4–35 labels: histone fold; loop insert.]
[Figure 4–36 labels: (A) trimethyl group, three CH3 groups on N+; (C) Zn; N-terminus; Ala;
Arg2; Thr3; Lys4; Gln5; Thr6. Panels (A), (B), and (C).]
Figure 4–36 How a mark on a nucleosome is read. The figure shows the structure of a protein module (called an ING PHD
domain) that specifically recognizes histone H3 trimethylated on lysine 4. (A) A trimethyl group. (B) Space-filling model of an ING
PHD domain bound to a histone tail (green, with the trimethyl group highlighted in yellow). (C) A ribbon model showing how
the N-terminal six amino acids in the H3 tail are recognized. The red lines represent hydrogen bonds. This is one of a family of
PHD domains that recognize methylated lysines on histones; different members of the family bind tightly to lysines located at
different positions, and they can discriminate between a mono-, di-, and trimethylated lysine. In a similar way, other small protein
modules recognize specific histone side chains that have been marked with acetyl groups, phosphate groups, and so on.
(Adapted from P.V. Peña et al., Nature 442:100–103, 2006. With permission from Macmillan Publishers Ltd.)
see Figure 7–20). But after a modifying enzyme “writes” its mark on one or a few
neighboring nucleosomes, events that resemble a chain reaction can ensue. In
such a case, the “writer enzyme” works in concert with a “reader protein” located
in the same protein complex. The reader protein contains a module that recognizes
the mark and binds tightly to the newly modified nucleosome (see Figure
Figure 4–39 Some specific meanings of histone modifications. (A) The
modifications on the histone H3 N-terminal tail are shown, repeated from Figure
4–34. (B) The H3 tail can be marked by different sets of modifications that act in
combination to convey a specific meaning. Only a small number of the meanings
are known, including the three examples shown. Not illustrated is the fact that, as
just implied (see Figure 4–38), reading a histone mark generally involves the joint
recognition of marks at other sites on the nucleosome along with the indicated H3
tail recognition. In addition, specific levels of methylation (mono-, di-, or trimethyl
groups) are generally required. Thus, for example, the trimethylation of lysine
9 attracts the heterochromatin-specific protein HP1, which induces a spreading
wave of further lysine 9 trimethylation followed by further HP1 binding, according
to the general scheme that will be illustrated shortly (see Figure 4–40). Also
important in this process, however, is a synergistic trimethylation of the histone H4
N-terminal tail on lysine 20.
[Figure 4–39 panel (A): modified positions on the histone H3 tail: R2, K4, K9, S10, K14,
R17, K18, K23, R26, K27, S28, K36.]
[Figure 4–39 panel (B):]
MODIFICATION STATE           “MEANING”
K9 trimethyl                 heterochromatin formation, gene silencing
K4 trimethyl, K9 acetyl      gene expression
K27 trimethyl                gene silencing (Polycomb repressive complex)
[Figure labels: repeats; higher-order repeat.]
Figure 4–43 Evidence for the plasticity of human centromere formation. (A) A series of A-T-rich alpha satellite DNA
sequences is repeated many thousands of times at each human centromere (red), and is surrounded by pericentric
heterochromatin (brown). However, due to an ancient chromosome breakage-and-rejoining event, some human chromosomes
contain two blocks of alpha satellite DNA, each of which presumably functioned as a centromere in its original chromosome.
Usually, chromosomes with two functional centromeres are not stably propagated because they attach improperly to the
spindle and are broken apart during mitosis. In chromosomes that do survive, however, one of the centromeres has somehow
become inactivated, even though it contains all the necessary DNA sequences. This allows the chromosome to be stably
propagated. (B) In a small fraction (1/2000) of human births, extra chromosomes are observed in cells of the offspring. Some of
these extra chromosomes, which have formed from a breakage event, lack alpha satellite DNA altogether, yet new centromeres
(neocentromeres) have arisen from what was originally euchromatic DNA.
The complexity of centromeric chromatin is not illustrated in these diagrams. The alpha satellite DNA that forms centromeric
chromatin in humans is packaged into alternating blocks of chromatin. One block is formed from a long string of nucleosomes
containing the CENP-A H3 variant histone; the other block contains nucleosomes that are specially marked with dimethyl lysine
4 on the normal H3 histone. Each block is more than a thousand nucleosomes long. This centromeric chromatin is flanked by
pericentric heterochromatin, as shown. The pericentric chromatin contains methylated lysine 9 on its H3 histones, along with
HP1 protein, and it is an example of “classical” heterochromatin (see Figure 4–39).
related, often have different numbers of chromosomes; see Figure 4–14 for an
extreme example. As we shall discuss below, detailed genome comparisons show
that in many cases the changes in chromosome numbers have arisen through
chromosome breakage-and-rejoining events, creating novel chromosomes, some
of which must initially have contained abnormal numbers of centromeres—either
more than one, or none at all. Yet stable inheritance requires that each chromo-
some should contain one centromere, and one only. It seems that surplus cen-
tromeres must have been inactivated, and/or new centromeres created, so as to
allow the rearranged chromosome sets to be stably maintained.
Some Chromatin Structures Can Be Directly Inherited
The changes in centromere activity just discussed, once established, need to be
perpetuated through subsequent cell generations. What could be the mechanism
of this type of epigenetic inheritance?
It has been proposed that de novo centromere formation requires an initial
seeding event, involving the formation of a specialized DNA–protein structure that
contains nucleosomes formed with the CENP-A variant of histone H3. In humans,
this seeding event happens more readily on arrays of alpha satellite DNA than
on other DNA sequences. The H3–H4 tetramers from each nucleosome on the
parental DNA helix are directly inherited by the sister DNA helices at a replication
fork (see Figure 5–32). Therefore, once a set of CENP-A-containing nucleosomes
has been assembled on a stretch of DNA, it is easy to understand how a new cen-
tromere could be generated in the same place on both daughter chromosomes
following each round of cell division. One need only assume that the presence of
the CENP-A histone in an inherited nucleosome selectively recruits more CENP-A
histone to its newly formed neighbors.
There are some striking similarities between the formation and maintenance
of centromeres and the formation and maintenance of some other regions of
Summary
In the chromosomes of eukaryotes, DNA is uniformly assembled into nucleosomes,
but a variety of different chromatin structures is possible. This variety is based on a
large set of reversible covalent modifications of the four histones in the nucleosome
core. These modifications include the mono-, di-, and trimethylation of many differ-
ent lysine side chains, an important reaction that is incompatible with the acetyla-
tion that can occur on the same lysines. Specific combinations of the modifications
mark many nucleosomes, governing their interactions with other proteins. These
marks are read when protein modules that are part of a larger protein complex
bind to the modified nucleosomes in a region of chromatin. These reader proteins
then attract additional proteins that perform various functions.
Some reader protein complexes contain a histone-modifying enzyme, such as a
histone lysine methylase, that “writes” the same mark that the reader recognizes. A
reader–writer–remodeling complex of this type can spread a specific form of chro-
matin along a chromosome. In particular, large regions of condensed heterochro-
matin are thought to be formed in this way. Heterochromatin is commonly found
around centromeres and near telomeres, but it is also present at many other posi-
tions in chromosomes. The tight packaging of DNA into heterochromatin usually
silences the genes within it.
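The spreading behavior described here can be illustrated with a deliberately simple toy model. The sketch below is not a description of the actual biochemistry: it assumes a one-dimensional row of nucleosomes, a mark that passes only to immediate neighbors in each round, and a single barrier position that can never be marked. The function name `spread_mark` and all of its parameters are hypothetical, and the model ignores the evidence that heterochromatin formation can skip over regions.

```python
def spread_mark(n_nucleosomes, seed, barrier=None, rounds=10):
    """Toy reader-writer spreading along a row of nucleosomes.

    In each round, every marked nucleosome marks its immediate neighbors,
    except a neighbor at the barrier position. Returns a string with 'H'
    for marked (heterochromatin) and '.' for unmarked nucleosomes.
    """
    marked = {seed}
    for _ in range(rounds):
        frontier = {
            j
            for i in marked
            for j in (i - 1, i + 1)
            if 0 <= j < n_nucleosomes and j != barrier
        }
        if frontier <= marked:  # nothing new can be marked
            break
        marked |= frontier
    return "".join("H" if i in marked else "." for i in range(n_nucleosomes))


print(spread_mark(12, seed=2, barrier=6))     # spreading halts at the barrier
print(spread_mark(12, seed=2, barrier=None))  # no barrier: the mark covers the row
```

With a barrier at position 6 the mark stops at position 5; without one, the whole row becomes marked, which is the toy analog of heterochromatin invading neighboring euchromatin when a barrier sequence is lost.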
The phenomenon of position effect variegation provides strong evidence for the
inheritance of condensed states of chromatin from one cell generation to the next. A
similar mechanism appears to be responsible for maintaining the specialized chro-
matin at centromeres. More generally, the ability to propagate specific chromatin
structures across cell generations makes possible an epigenetic cell memory process
that plays a role in maintaining the set of different cell states required by complex
multicellular organisms.
Having discussed the DNA and protein molecules from which the chromatin fiber
is made, we now turn to the organization of the chromosome on a more global
scale and the way in which its various domains are arranged in space. As a 30-nm
fiber, a typical human chromosome would still be 0.1 cm in length and able to
span the nucleus more than 100 times. Clearly, there must be a still higher level
of folding, even in interphase chromosomes. Although the molecular details are
still largely a mystery, this higher-order packaging almost certainly involves the
folding of the chromatin into a series of loops and coils. This chromatin packing is
fluid, frequently changing in response to the needs of the cell.
We begin this section by describing some unusual interphase chromosomes
that can be easily visualized. Exceptional though they are, these special cases
reveal features that are thought to be representative of all interphase chromosomes.
Moreover, they provide ways to investigate some fundamental aspects of
chromatin structure that we have touched on in the previous section. Next, we
describe how a typical interphase chromosome is arranged in the mammalian
cell nucleus. Finally, we shall discuss the additional tenfold compaction that chromosomes
undergo in the passage from interphase to mitosis.
Chromosomes Are Folded into Large Loops of Chromatin
Insight into the structure of the chromosomes in interphase cells has come from
studies of the stiff and enormously extended chromosomes in growing amphibian
oocytes (immature eggs). These very unusual lampbrush chromosomes (the
largest chromosomes known), paired in preparation for meiosis, are clearly visible
even in the light microscope, where they are seen to be organized into a series
of large chromatin loops emanating from a linear chromosomal axis (Figure 4–46
and Figure 4–47).
Figure 4–46 A model for the chromatin domains in a lampbrush chromosome.
Shown is a small portion of one pair of sister chromatids. Here, two identical DNA
double helices are aligned side by side, packaged into different types of chromatin.
The set of lampbrush chromosomes in many amphibians contains a total of
about 10,000 loops resembling those shown here. The rest of the DNA in each
chromosome (the great majority) remains highly condensed. Four copies of each
loop are present in the cell, since each lampbrush chromosome consists of two
aligned sets of paired chromatids. This four-stranded structure is characteristic of
this stage of development of the oocyte, which has arrested at the diplotene stage
of meiosis; see Figure 17–56.
[Figure 4–46 labels: sister chromatids; less condensed chromatin; highly condensed
chromatin; scale bar, 10 µm.]
In these chromosomes, a given loop always contains the same DNA sequence
that remains extended in the same manner as the oocyte grows. These chromosomes
are producing large amounts of RNA for the oocyte, and most of the genes
present in the DNA loops are being actively expressed. The majority of the DNA,
however, is not in loops but remains highly condensed on the chromosome axis,
where genes are generally not expressed.
It is thought that the interphase chromosomes of all eukaryotes are similarly
arranged in loops. Although these loops are normally too small and fragile to be
easily observed in a light microscope, other methods can be used to infer their
presence. For example, modern DNA technologies have made it possible to assess
the frequency with which any two loci along an interphase chromosome are held
together, thus revealing likely candidates for the sites on chromatin that form the
bases of loop structures (Figure 4–48). These experiments and others suggest that
the DNA in human chromosomes is likely to be organized into loops of various
lengths. A typical loop might contain between 50,000 and 200,000 nucleotide
pairs of DNA, although loops of a million nucleotide pairs have also been sug-
gested (Figure 4–49).
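The loop sizes quoted above imply a rough number of loops per chromosome. The estimate below is a sketch only: it assumes an average-sized chromosome (the 3.2 × 10⁹ nucleotide pairs of the human genome divided evenly among its 24 different chromosomes) and, unrealistically, that all of the DNA is organized into loops of a single size.

```python
GENOME_BP = 3.2e9                   # nucleotide pairs in the human genome
N_CHROMOSOMES = 24                  # 22 autosomes plus 2 sex chromosomes
LOOP_BP_RANGE = (50_000, 200_000)   # typical loop sizes quoted in the text

# Assumption: treat every chromosome as having the average length.
mean_chromosome_bp = GENOME_BP / N_CHROMOSOMES

# Small loops give the upper bound on loop number, large loops the lower bound.
loops_max = mean_chromosome_bp / LOOP_BP_RANGE[0]
loops_min = mean_chromosome_bp / LOOP_BP_RANGE[1]

print(f"average chromosome: about {mean_chromosome_bp:.2e} nucleotide pairs")
print(f"about {loops_min:.0f} to {loops_max:.0f} loops per average chromosome")
```

Real chromosomes differ several-fold in length, so the range should be read as an order-of-magnitude guide: hundreds to a few thousand loops per chromosome.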
[Figure labels: (A); scale bar, 100 µm; DNA-binding proteins; cross-link formed; looped
domain; folded chromatin fiber; high-level expression of genes in loop; histone-modifying
enzymes; chromatin remodeling complexes; RNA polymerase.]
Figure 4–49 A model for the organization of an interphase chromosome. A section of an interphase chromosome is shown folded into a series
of looped domains, each containing perhaps 50,000–200,000 or more nucleotide pairs of double-helical DNA condensed into a chromatin fiber.
The chromatin in each individual loop is further condensed through poorly understood
folding processes that are reversed when the cell requires
direct access to the DNA packaged in the loop. Neither the composition of the postulated chromosomal axis nor how the folded chromatin fiber is
anchored to it is clear. However, in mitotic chromosomes, the bases of the chromosomal loops are enriched both in condensins (discussed below)
and in DNA topoisomerase II enzymes (discussed in Chapter 5), two proteins that may form much of the axis at metaphase.
The set of proteins bound as part of the chromatin at a given locus varies
depending on the cell type and its stage of development. These variations make
the accessibility of specific genes different in different tissues, helping to generate
the cell diversification that accompanies embryonic development (described in
Chapter 21).
closely associated with the nuclear lamina, regardless of the chromosome exam-
ined. And DNA probes that preferentially stain gene-rich regions of human chro-
mosomes produce a striking picture of the interphase nucleus that presumably
reflects different average positions for active and inactive genes (Figure 4–54).
How is most of the chromatin in each interphase chromosome condensed
when its genes are not being expressed? A powerful extension of the chromosome
conformation capture method described previously (see Figure 4–48), which
exploits a high-throughput DNA sequencing technology called massive parallel
sequencing (see Panel 8–1, pp. 478–481), allows the connections between all of
the different one-megabase (1 Mb) segments of the human genome to be mapped
in human interphase chromosomes. The results reveal that most regions of our
chromosomes are folded into a conformation referred to as a fractal globule: a
knot-free arrangement that facilitates maximally dense packing while, at the same
time, preserving the ability of the chromatin fiber to unfold and fold (Figure 4–55).
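The mapping step described here amounts to counting, for every pair of 1 Mb segments, how often the two segments are recovered cross-linked together. The sketch below shows that binning step only. It assumes the sequencing reads have already been reduced to pairs of coordinates on one chromosome; the function name `contact_counts` and the example coordinates are invented for illustration and are not real data.

```python
from collections import Counter

BIN_SIZE = 1_000_000  # one-megabase (1 Mb) segments, as in the text


def contact_counts(read_pairs, bin_size=BIN_SIZE):
    """Count cross-linked read pairs for each pair of genome segments.

    read_pairs: iterable of (pos_a, pos_b) coordinates in nucleotide pairs.
    Returns a Counter keyed by (bin_i, bin_j) with bin_i <= bin_j.
    """
    counts = Counter()
    for pos_a, pos_b in read_pairs:
        bin_i, bin_j = sorted((pos_a // bin_size, pos_b // bin_size))
        counts[(bin_i, bin_j)] += 1
    return counts


# Made-up coordinates, for illustration only.
pairs = [
    (150_000, 2_400_000),
    (900_000, 2_050_000),
    (3_100_000, 3_900_000),
    (2_999_999, 120_000),
]
counts = contact_counts(pairs)
for (bin_i, bin_j), n in sorted(counts.items()):
    print(f"segments {bin_i} and {bin_j}: {n} contact(s)")
```

At this resolution the 3.2 × 10⁹ nucleotide pairs of the human genome divide into 3200 segments, so a genome-wide map is a 3200 × 3200 table of such counts. Storing only the observed pairs, as the `Counter` does, is one simple way to handle a table that is mostly empty.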
only as needed, and they create a high local concentration of the many different
enzymes and RNA molecules needed for a particular process. In an analogous
way, when DNA is damaged by irradiation, the set of enzymes needed to carry out
DNA repair are observed to congregate in discrete foci inside the nucleus, creating
“repair factories” (see Figure 5–52). And nuclei often contain hundreds of discrete
foci representing factories for DNA or RNA synthesis (see Figure 6–47).
It seems likely that all of these entities make use of the type of tethering illus-
trated in Figure 4–58B, where long flexible lengths of polypeptide chain and/or
long noncoding RNA molecules are interspersed with specific binding sites that
concentrate the multiple proteins and other molecules that are needed to catalyze
a particular process. Not surprisingly, tethers are similarly used to help to speed
biological processes in the cytoplasm, increasing specific reaction rates there (for
example, see Figure 16–18).
Is there also an intranuclear framework, analogous to the cytoskeleton, on
which chromosomes and other components of the nucleus are organized? The
nuclear matrix, or scaffold, has been defined as the insoluble material left in the
nucleus after a series of biochemical extraction steps. Many of the proteins and
RNA molecules that form this insoluble material are likely to be derived from the
fibrous subcompartments of the nucleus just discussed, while others may be pro-
teins that help to form the base of chromosomal loops or to attach chromosomes
to other structures in the nucleus.
Mitotic Chromosomes Are Especially Highly Condensed
Having discussed the dynamic structure of interphase chromosomes, we now
turn to mitotic chromosomes. The chromosomes from nearly all eukaryotic cells
become readily visible by light microscopy during mitosis, when they coil up to
form highly condensed structures. This condensation reduces the length of a
typical interphase chromosome only about tenfold, but it produces a dramatic
change in chromosome appearance.
Figure 4–59 A typical mitotic chromosome at metaphase. Each sister
chromatid contains one of two identical sister DNA molecules generated earlier in
the cell cycle by DNA replication (see also Figure 17–21).
[Figure 4–59 labels: chromosome; centromere; chromatid.]
Figure 4–59 depicts a typical mitotic chromosome at the metaphase stage
of mitosis (for the stages of mitosis, see Figure 17–3). The two DNA molecules
produced by DNA replication during interphase of the cell-division cycle are
separately folded to produce two sister chromosomes, or sister chromatids, held
together at their centromeres, as mentioned earlier. These chromosomes are normally
covered with a variety of molecules, including large amounts of RNA–protein
complexes. Once this covering has been stripped away, each chromatid can be
seen in electron micrographs to be organized into loops of chromatin emanating
from a central scaffolding (Figure 4–60). Experiments using DNA hybridization chromatid 2
to detect specific DNA sequences demonstrate that the order of visible features
along a mitotic chromosome at least roughly reflects the order of genes along the
DNA molecule. Mitotic chromosome condensation can thus be thought of as the
final level in the hierarchy of chromosome packaging (Figure 4–61).
The compaction of chromosomes during mitosis is a highly organized and
dynamic process that serves at least two important purposes. First, when conden-
sation is complete (in metaphase), sister chromatids have been disentangled from
each other and lie side by side. Thus, the sister chromatids can easily separate
when the mitotic apparatus begins pulling them apart. Second, the compaction
of chromosomes protects the relatively fragile DNA molecules from being broken
as they are pulled to separate daughter cells.
The condensation of interphase chromosomes into mitotic chromosomes
begins in early M phase, and it is intimately connected with the progression of
the cell cycle. During M phase, gene expression shuts down, and specific mod-
ifications are made to histones that help to reorganize the chromatin as it com-
pacts. Two classes of ring-shaped proteins, called cohesins and condensins, aid
this compaction. How they help to produce the two separately folded chromatids
of a mitotic chromosome will be discussed in Chapter 17, along with the details
of the cell cycle.
Figure 4–61 Chromatin packing. This model shows some of the many levels
of chromatin packing postulated to give rise to the highly condensed mitotic
chromosome. Levels shown, with the dimension given for each:
short region of DNA double helix, 2 nm
“beads-on-a-string” form of chromatin, 11 nm
chromatin fiber of packed nucleosomes, 30 nm
chromatin fiber folded into loops, 700 nm
entire mitotic chromosome (centromere indicated), 1400 nm
Net result: each DNA molecule has been packaged into a mitotic chromosome that
is 10,000-fold shorter than its fully extended length.
Summary
Chromosomes are generally decondensed during interphase, so that the details
of their structure are difficult to visualize. Notable exceptions are the specialized
lampbrush chromosomes of vertebrate oocytes and the polytene chromosomes in
the giant secretory cells of insects. Studies of these two types of interphase chromo-
somes suggest that each long DNA molecule in a chromosome is divided into a large
number of discrete domains organized as loops of chromatin that are compacted by
further folding. When genes contained in a loop are expressed, the loop unfolds and
allows the cell’s machinery access to the DNA.
Interphase chromosomes occupy discrete territories in the cell nucleus; that is,
they are not extensively intertwined. Euchromatin makes up most of interphase
chromosomes and, when not being transcribed, it probably exists as tightly folded
fibers of compacted nucleosomes. However, euchromatin is interrupted by stretches
of heterochromatin, in which the nucleosomes are subjected to additional packing
that usually renders the DNA resistant to gene expression. Heterochromatin exists in
several forms, some of which are found in large blocks in and around centromeres
and near telomeres. But heterochromatin is also present at many other positions on
chromosomes, where it can serve to help regulate developmentally important genes.
The interior of the nucleus is highly dynamic, with heterochromatin often posi-
tioned near the nuclear envelope and loops of chromatin moving away from their
chromosome territory when genes are very highly expressed. This reflects the exis-
tence of nuclear subcompartments, where different sets of biochemical reactions
are facilitated by an increased concentration of selected proteins and RNAs. The
components involved in forming a subcompartment can self-assemble into discrete
organelles such as nucleoli or Cajal bodies; they can also be tethered to fixed struc-
tures such as the nuclear envelope.
During mitosis, gene expression shuts down and all chromosomes adopt a
highly condensed conformation in a process that begins early in M phase to pack-
age the two DNA molecules of each replicated chromosome as two separately folded
chromatids. The condensation is accompanied by histone modifications that facil-
itate chromatin packing, but satisfactory completion of this orderly process, which
reduces the end-to-end distance of each DNA molecule from its interphase length by
an additional factor of ten, requires additional proteins.
nucleotide sequences with the sequences of genes that have been characterized
in other more readily studied organisms.
In general, the sequences of individual genes are much more tightly con-
served than is overall genome structure. Features of genome organization such
as genome size, number of chromosomes, order of genes along chromosomes,
abundance and size of introns, and amount of repetitive DNA are found to differ
greatly when comparing distant organisms, as does the number of genes that each
organism contains.
Aligned sequences for Figure 4–65, shown in two rows (exon labeled at the left
of the alignment, intron at the right; dashes mark gaps):
mouse GTGCCTATCCAGAAAGTCCAGGATGACACCAAAACCCTCATCAAGACCATTGTCACCAGGATCAATGACATTTCACACACGGTA-GGAGTCTCATGGGGGGACAAAGATGTAGGACTAGA
human GTGCCCATCCAAAAAGTCCAAGATGACACCAAAACCCTCATCAAGACAATTGTCACCAGGATCAATGACATTTCACACACGGTAAGGAGAGT-ATGCGGGGACAAA---GTAGAACTGCA
mouse ACCAGAGTCTGAGAAACATGTCATGCACCTCCTAGAAGCTGAGAGTTTAT-AAGCCTCGAGTGTACAT-TATTTCTGGTCATGGCTCTTGTCACTGCTGCCTGCTGAAATACAGGGCTGA
human GCCAG--CCC-AGCACTGGCTCCTAGTGGCACTGGACCCAGATAGTCCAAGAAACATTTATTGAACGCCTCCTGAATGCCAGGCACCTACTGGAAGCTGA--GAAGGATTTGAAAGCACA
Figure 4–65 The very different rates of evolution of exons and introns, as illustrated by comparing a portion of the
mouse and human leptin genes. Positions where the sequences differ by a single nucleotide substitution are boxed in green,
and positions that differ by the addition or deletion of nucleotides are boxed in yellow. Note that, thanks to purifying selection,
the coding sequence of the exon is much more conserved than is the adjacent intron sequence.
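The degree of conservation in an alignment such as this one can be put into numbers by counting the aligned positions at which two sequences agree. The following is a minimal sketch, not the method used to produce the figure: the function names are hypothetical, and treating every gap column as a difference is an assumption made here for simplicity. The example input is the first 25 aligned columns of the mouse and human sequences in Figure 4–65.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned columns at which two equal-length sequences agree.

    Assumption: a column containing a gap ('-') in either sequence counts
    as a difference.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b and a != "-")
    return 100.0 * matches / len(seq_a)


def block_identities(seq_a, seq_b, block=25):
    """Percent identity for consecutive, non-overlapping blocks of columns."""
    return [
        percent_identity(seq_a[i:i + block], seq_b[i:i + block])
        for i in range(0, len(seq_a), block)
    ]


# First 25 aligned columns of the top row of the figure (mouse, then human).
mouse = "GTGCCTATCCAGAAAGTCCAGGATG"
human = "GTGCCCATCCAAAAAGTCCAAGATG"
print(percent_identity(mouse, human))  # 88.0 (3 substitutions in 25 columns)
```

Applying `block_identities` to the full rows of the alignment would give one value per 25-column block, which makes the contrast between the two ends of the alignment easy to see.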
Good evidence for the loss of DNA sequences in small blocks during evolution
can be obtained from a detailed comparison of regions of synteny in the human
and mouse genomes. The comparative shrinkage of the mouse genome can be
clearly seen from such comparisons, with the net loss of sequences scattered
throughout the long stretches of DNA that are otherwise homologous (Figure
4–68).
DNA is added to genomes both by the spontaneous duplication of chromo-
somal segments that are typically tens of thousands of nucleotide pairs long
(as will be discussed shortly) and by insertion of new copies of active transposons.
Most transposition events are duplicative, because the original copy of the
transposon stays where it was when a copy inserts at the new site; see, for exam-
ple, Figure 5–63. Comparison of the DNA sequences derived from transposons
in the human and the mouse readily reveals some of the sequence additions
(Figure 4–69).
It remains a mystery why all mammals have maintained genome sizes of
roughly 3 billion nucleotide pairs that contain nearly identical sets of genes,
even though only approximately 150 million nucleotide pairs appear to be under
sequence-specific functional constraints.
blocks from many different species, thereby identifying large numbers of so-called
multispecies conserved sequences: some of these code for protein, but most of
them do not (Figure 4–73).
Most of the noncoding conserved sequences discovered in this way turn out
to be relatively short, containing between 50 and 200 nucleotide pairs. Among the
most mysterious are the so-called “ultraconserved” noncoding sequences, exem-
plified by more than 5000 DNA segments over 100 nucleotides long that are exactly
the same in human, mouse, and rat. Most have undergone little or no change
since mammalian and bird ancestors diverged about 300 million years ago. The
strict conservation implies that even though the sequences do not encode pro-
teins, each nevertheless has an important function maintained by purifying selec-
tion. The puzzle is to unravel what those functions are.
Many of the conserved sequences that do not code for protein are now known
to produce untranslated RNA molecules, such as the thousands of long noncoding
RNAs (lncRNAs) that are thought to have important functions in regulating gene
transcription. As we shall also see in Chapter 7, others are short regions of DNA
scattered throughout the genome that directly bind proteins involved in gene reg-
ulation. But it is uncertain how much of the conserved noncoding DNA can be
accounted for in these ways, and the function of most of it remains a mystery. This
enigma highlights how much more we need to learn about the fundamental bio-
logical mechanisms that operate in animals and other complex organisms, and its
solution is certain to have profound consequences for medicine.
How can cell biologists tackle the mystery of noncoding conserved DNA? Tra-
ditionally, attempts to determine the function of a puzzling DNA sequence begin
by looking at the consequences of its experimental disruption. But many DNA
sequences that are crucial for an organism in the wild can be expected to have no
noticeable effect on its phenotype under laboratory conditions: what is required
for a mouse to survive in a laboratory cage is very much less than what is required
Figure 4–73 The detection of multispecies conserved sequences. In this
example, genome sequences for each of the organisms shown have been compared
with the indicated region of the human CFTR (cystic fibrosis transmembrane
conductance regulator) gene; this region contains one exon plus a large amount
of intronic DNA. For each organism, the percent identity with human for each
25-nucleotide block is plotted in green. In addition, a computational algorithm
has been used to detect the sequences within this region that are most highly
conserved when the sequences from all of the organisms are taken into account.
Besides the exon (dark blue on the line at the top of the figure), the
positions of three other blocks of multispecies conserved sequences are
indicated (pale blue). The function of most such sequences in the human genome
is not known. (Courtesy of Eric D. Green.)
Organisms plotted: chimpanzee, orangutan, baboon, marmoset, lemur, rabbit,
horse, cat, dog, mouse, opossum, chicken, Fugu. Vertical axis: percent
identity, 50% to 100%. Scale bars: 100 nucleotide pairs and 10,000 nucleotide
pairs.
Figure 4–74 The types of changes in gene regulation inferred to have
predominated during the evolution of our vertebrate ancestors. To produce
the information summarized in this plot, wherever possible the type of gene
regulated by each conserved noncoding sequence was inferred from the identity of
its closest protein-coding gene. The fixation time for each conserved sequence
was then used to derive the conclusions shown. (Based on C.B. Lowe et al.,
Science 333:1019–1024, 2011. With permission from AAAS.)
Categories labeled in the plot: reception of extracellular signals; development
and transcription regulation; post-translational protein modification.
Lineages shown: human, mouse, cow, platypus, chicken, frog, fish. Horizontal
axis: millions of years before present, from 500 to 0.
other. Genes without homologous counterparts are relatively scarce even when
we compare such divergent organisms as a mammal and a worm. On the other
hand, we frequently find gene families that have different numbers of members in
different species. To create such families, genes have been repeatedly duplicated,
and the copies have then diverged to take on new functions that often vary from
one species to another.
Gene duplication occurs at high rates in all evolutionary lineages, contributing
to the vigorous process of DNA addition discussed previously. In a detailed study
of spontaneous duplications in yeast, duplications of 50,000 to 250,000 nucleotide
pairs were commonly observed, most of which were tandemly repeated. These
appeared to result from DNA replication errors that led to the inexact repair of
double-strand chromosome breaks. A comparison of the human and chimpanzee
genomes reveals that, since the time that these two organisms diverged, such seg-
mental duplications have added about 5 million nucleotide pairs to each genome
every million years, with the average duplication spanning about 50,000 nucleotide
pairs (although some duplications are five times larger). In fact, if one
counts nucleotides, duplication events have created more differences between
our two species than have single-nucleotide substitutions.
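The claim that duplications outweigh substitutions when one counts nucleotides can be checked with rough arithmetic. The sketch below is a back-of-the-envelope estimate only. The duplication figures come from the paragraph above; the 6-million-year separation, the 3-billion-nucleotide-pair genome, and the roughly 1% substitution difference are taken from other passages of this chapter and are combined here as an assumption. All variable names are illustrative.

```python
# Figures quoted in the paragraph above
added_per_genome_per_myr = 5_000_000  # nucleotide pairs added per genome per million years
average_duplication = 50_000          # nucleotide pairs per duplication

# Assumptions borrowed from other passages of the chapter
divergence_myr = 6                    # human-chimpanzee separation, in million years
genome_size = 3_000_000_000           # nucleotide pairs in a mammalian genome
substitution_fraction = 0.01          # "about 1%" human-chimpanzee sequence difference

# How many average-sized duplications per million years does the rate imply?
duplications_per_myr = added_per_genome_per_myr // average_duplication

# Nucleotide pairs added by duplication, summed over both lineages
added_both_lineages = 2 * added_per_genome_per_myr * divergence_myr

# Nucleotide pairs that differ because of single-nucleotide substitutions
substituted = round(substitution_fraction * genome_size)

print(duplications_per_myr)   # 100
print(added_both_lineages)    # 60000000
print(substituted)            # 30000000
```

Under these assumptions, duplications account for roughly twice as many differing nucleotides as substitutions do, in line with the statement in the text.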
Figure 4–75 A comparison of the structure of one-chain and four-chain
globins. The four-chain globin shown is hemoglobin, which is a complex of
two α-globin and two β-globin chains. The one-chain globin present in some
primitive vertebrates represents an intermediate in the evolution of the
four-chain globin. With oxygen bound it exists as a monomer; without oxygen it
dimerizes. Labels in the figure: single-chain globin binds one oxygen molecule;
oxygen-binding site on heme; evolution of a second globin chain by gene
duplication followed by mutation; β chains.
copies have remained active. In many cases, the most obvious functional
difference between the duplicated genes is that they are expressed in different
tissues or at different stages of development. One attractive theory to explain
such an end result imagines that different, mildly deleterious mutations occur
quickly in both copies of a duplicated gene set. For example, one copy might
lose expression in a particular tissue as a result of a regulatory mutation,
while the other copy loses expression in a second tissue. Following such an
occurrence, both gene copies would be required to provide the full range of
functions that were once supplied by a single gene; hence, both copies would
now be protected from loss through inactivating mutations. Over a longer
period, each copy could then undergo further changes through which it could
acquire new, specialized features.
molecule consisting of two α chains and two β chains (Figure 4–75). The four
oxygen-binding sites in the α₂β₂ molecule interact, allowing a cooperative
allosteric change in the molecule as it binds and releases oxygen, which
enables hemoglobin to take up and release oxygen more efficiently than the
single-chain version.
Still later, during the evolution of mammals, the β-chain gene apparently
underwent duplication and mutation to give rise to a second β-like chain that
is synthesized specifically in the fetus. The resulting hemoglobin molecule has
a higher affinity for oxygen than adult hemoglobin and thus helps in the
transfer of oxygen from the mother to the fetus. The gene for the new β-like
chain subsequently duplicated and mutated again to produce two new genes, ε and
γ, the ε chain being produced earlier in development (to form α₂ε₂) than the
fetal γ chain, which forms α₂γ₂. A duplication of the adult β-chain gene
occurred still later, during primate evolution, to give rise to a δ-globin gene
and thus to a minor form of hemoglobin (α₂δ₂) that is found only in adult
primates (Figure 4–76).
Each of these duplicated genes has been modified by point mutations that
affect the properties of the final hemoglobin molecule, as well as by changes in
regulatory regions that determine the timing and level of expression of the gene.
Figure 4–76 An evolutionary scheme for the globin chains that carry oxygen
in the blood of animals. The scheme emphasizes the β-like globin gene family.
A relatively recent gene duplication of the γ-chain gene produced γG and γA,
which are fetal β-like chains of identical function. The location of the globin
genes in the human genome is shown at the top of the figure. Labels in the
figure: single-chain globin; translocation separating α and β genes; scale
marks at 300, 500, and 700.
compared. For example, typical human and chimpanzee DNA sequences differ
from one another by about 1%. In contrast, when the same region of the genome
is sampled from two randomly chosen humans, the differences are typically about
0.1%. For more distantly related organisms, the interspecies differences outshine
intraspecies variation even more dramatically. However, each “fixed difference”
between the human and the chimpanzee (in other words, each difference that is
now characteristic of all or nearly all individuals of each species) started out as a
new mutation in a single individual. If the size of the interbreeding population in
which the mutation occurred is N, the initial allele frequency for a new mutation
would be 1/(2N) for a diploid organism. How does such a rare mutation become
fixed in the population, and hence become a characteristic of the species rather
than of a few scattered individuals?
The answer to this question depends on the functional consequences of the
mutation. If the mutation has a significantly deleterious effect, it will simply be
eliminated by purifying selection and will not become fixed. (In the most extreme
case, the individual carrying the mutation will die without producing progeny.)
Conversely, the rare mutations that confer a major reproductive advantage on
individuals who inherit them can spread rapidly in the population. Because
humans reproduce sexually and genetic recombination occurs each time a gam-
ete is formed (discussed in Chapter 5), the genome of each individual who has
inherited the mutation will be a unique recombinational mosaic of segments
inherited from a large number of ancestors. The selected mutation along with a
modest amount of neighboring sequence—ultimately inherited from the individ-
ual in which the mutation occurred—will simply be one piece of this huge mosaic.
The great majority of mutations that are not harmful are not beneficial either.
These selectively neutral mutations can also spread and become fixed in a pop-
ulation, and they make a large contribution to evolutionary change in genomes.
For example, as we saw earlier, they account for most of the DNA sequence dif-
ferences between apes and humans. The spread of neutral mutations is not as
rapid as the spread of the rare strongly advantageous mutations. It depends on
a random variation in the number of mutation-bearing progeny produced by
each mutation-bearing individual, causing changes in the relative frequency of
the mutant allele in the population. Through a sort of “random walk” process, the
mutant allele may eventually become extinct, or it may become commonplace.
This can be modeled mathematically for an idealized interbreeding population,
on the assumption of constant population size and random mating, as well as
selective neutrality for the mutations. While neither of the first two assumptions
is a good description of human population history, study of this idealized case
reveals the general principles in a clear and simple way.
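The “random walk” of a neutral allele can also be explored by simulation. The sketch below assumes a Wright–Fisher style idealization (constant population size, random mating, non-overlapping generations). The text does not name a specific model, so this choice, the function names, and the small population size used to keep the run fast are all assumptions.

```python
import random


def neutral_mutation_fixes(n_individuals, rng):
    """Follow one new neutral mutation until it is lost or fixed.

    Returns True if the mutant allele reaches fixation.
    """
    total_copies = 2 * n_individuals   # diploid population: 2N gene copies
    mutant_copies = 1                  # a new mutation starts as a single copy
    while 0 < mutant_copies < total_copies:
        frequency = mutant_copies / total_copies
        # Each gene copy in the next generation is drawn at random from the
        # current pool, so it is mutant with probability equal to the
        # current mutant frequency.
        mutant_copies = sum(
            1 for _ in range(total_copies) if rng.random() < frequency
        )
    return mutant_copies == total_copies


def estimate_fixation_probability(n_individuals, trials, seed=0):
    """Fraction of independent new mutations that eventually become fixed."""
    rng = random.Random(seed)
    fixed = sum(neutral_mutation_fixes(n_individuals, rng) for _ in range(trials))
    return fixed / trials


# Small demonstration: N = 10 individuals, so 2N = 20 gene copies.
estimate = estimate_fixation_probability(10, 5000, seed=1)
print(f"simulated: {estimate:.3f}, starting frequency 1/(2N): {1 / 20:.3f}")
```

In runs of this kind most new mutations are lost within a few generations, and the fraction that reach fixation should lie close to the starting frequency of 1/(2N).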
When a new neutral mutation occurs in a population of constant size N that
is undergoing random mating, the probability that it will ultimately become fixed
is approximately 1/(2N). This is because there are 2N copies of the gene in the
diploid population, and each of them has an equal chance of becoming the pre-
dominant version in the long run. For those mutations that do become fixed, the
mathematics shows that the average time to fixation is approximately 4N gener-
ations. Detailed analyses of data on human genetic variation have suggested an
ancestral population size of approximately 10,000 at the time when the current
pattern of genetic variation was largely established. With a population that has
reached this size, the probability that a new, selectively neutral mutation would
become fixed is small (1/20,000), while the average time to fixation would be on
the order of 800,000 years (assuming a 20-year generation time). Thus, while we
know that the human population has grown enormously since the development
of agriculture approximately 15,000 years ago, most of the present-day set of com-
mon human genetic variants reflects the mixture of variants that was already pres-
ent long before this time, when the human population was still small.
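The figures in this paragraph follow directly from the two approximations stated above. As a small worked check (the function names are illustrative; the population size and generation time are the values given in the text):

```python
def fixation_probability(n_individuals):
    """Approximate chance that a new neutral mutation is ultimately fixed: 1/(2N)."""
    return 1 / (2 * n_individuals)


def mean_generations_to_fixation(n_individuals):
    """Approximate average time to fixation, in generations: 4N."""
    return 4 * n_individuals


N = 10_000             # ancestral population size suggested in the text
GENERATION_YEARS = 20  # generation time assumed in the text

years_to_fixation = mean_generations_to_fixation(N) * GENERATION_YEARS

print(fixation_probability(N))  # 5e-05, that is, 1/20,000
print(years_to_fixation)        # 800000
```

Both results scale linearly with N, which is why fixation of a neutral mutation is effectively out of reach in the very large modern human population.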
Similar arguments explain another phenomenon with important practical
implications for genetic counseling. In an isolated community descended from
a small group of founders, such as the people of Iceland or the Jews of Eastern
Europe, genetic variants that are rare in the human population as a whole can
often be present at a high frequency, even if those variants are mildly deleterious
(Figure 4–78).
A Great Deal Can Be Learned from Analyses of the Variation
Among Humans
Even though the common variant gene alleles among modern humans originate
from variants present in a comparatively tiny group of ancestors, the total number
of variants now encountered, including those that are individually rare, is very
large. New neutral mutations are constantly occurring and accumulating, even
though no single one of them has had enough time to become fixed in the vast
modern human population.
From detailed comparisons of the DNA sequences of a large number of mod-
ern humans located around the globe, scientists can estimate how many gener-
ations have elapsed since the origin of a particular neutral mutation. From such
data, it has been possible to map the routes of ancient human migrations. For
example, by combining this type of genetic analysis with archaeological findings,
scientists have been able to deduce the most probable routes that our ancestors
took when they left Africa 60,000 to 80,000 years ago (Figure 4–79).
We have been focusing on mutations that affect a single gene, but these are not
the only source of variation. Another source, perhaps even more important but
missed for many years, lies in the many duplications and deletions of large blocks
of human DNA. When one compares any individual human with the standard
reference genome in the database, one will generally find roughly 100 differences
involving gain or loss of long sequence blocks, totaling perhaps 3 million nucleo-
tide pairs. Some of these copy number variations (CNVs) will be very common,
presumably reflecting relatively ancient origins, while others will be present
in only a small minority of people (Figure 4–80). On average, nearly half of the
CNVs contain known genes. CNVs have been implicated in many human traits,
including color blindness, infertility, hypertension, and a wide variety of
disease susceptibilities. In retrospect, this type of variation is not
surprising, given the prominent role of DNA addition and DNA loss in vertebrate
evolution.
The intraspecies variations that have been most extensively characterized,
however, are single-nucleotide polymorphisms (SNPs). These are simply points
in the genome sequence where one large fraction of the human population has
one nucleotide, while another substantial fraction has another. To qualify as
Figure 4–79 Tracing the course of human history by analyses of genome
sequences. The map shows the routes of the earliest successful human
migrations. Dotted lines indicate two alternative routes that our ancestors are
thought to have taken out of Africa. DNA sequence comparisons suggest that
modern Europeans descended from a small ancestral population that existed
about 30,000 to 50,000 years ago. In agreement, archaeological findings
suggest that the ancestors of modern native Australians (solid red arrows), and
of modern European and Middle Eastern populations, reached their destinations
about 45,000 years ago. Even more recent studies, comparing the genome sequences
of living humans with those of Neanderthals and another extinct population from
southern Siberia (the Denisovans), suggest that our exit from Africa was a bit
more convoluted, while also revealing that a number of our ancestors interbred
with these hominid neighbors as they made their way across the globe. (Modified
from P. Forster and S. Matsumura, Science 308:965–966, 2005.)
Summary
Comparisons of the nucleotide sequences of present-day genomes have
revolutionized our understanding of gene and genome evolution. Because of the
extremely high fidelity of DNA replication and DNA repair processes, random
errors in maintaining the nucleotide sequences in genomes occur so rarely that
only about one nucleotide in a thousand is altered in every million years in any
particular eukaryotic line of descent. Not surprisingly, therefore, a comparison
of human and chimpanzee chromosomes, which are separated by about 6 million
years of evolution, reveals very few changes. Not only are our genes essentially
the same, but their order on each chromosome is almost identical. Although a
substantial number of segmental duplications and segmental deletions have
occurred in the past 6 million years, even the positions of the transposable
elements that make up a major portion of our noncoding DNA are mostly unchanged.
When one compares the genomes of two more distantly related organisms (such
as a human and a mouse, separated by about 80 million years), one finds many
more changes. Now the effects of natural selection can be clearly seen: through
purifying selection, essential nucleotide sequences, both in regulatory regions
and in coding sequences (exons), have been highly conserved. In contrast,
nonessential sequences (for example, much of the DNA in introns) have been
altered to such an extent that one can no longer see any family resemblance.
Because of purifying selection, the comparison of the genome sequences of
multiple related species is an especially powerful way to find DNA sequences with
important functions. Although about 5% of the human genome has been conserved
as a result of purifying selection, the function of the majority of this DNA
(tens of thousands of multispecies conserved sequences) remains mysterious.
Future experiments characterizing its functions should teach us many new lessons
about vertebrate biology.
Other sequence comparisons show that a great deal of the genetic complexity of
present-day organisms is due to the expansion of ancient gene families. DNA
duplication followed by sequence divergence has clearly been a major source of
genetic novelty during evolution. On a more recent time scale, the genomes of
any two humans will differ from each other both because of nucleotide
substitutions (single-nucleotide polymorphisms, or SNPs) and because of
inherited DNA gains and DNA losses that cause copy number variations (CNVs).
Understanding the effects of these differences will improve both medicine and
our understanding of human biology.
WHAT WE DON’T KNOW
• How many different types of chromatin structure are important for cells? How
is each of these structures established and maintained, and which ones tend to
be inherited following DNA replication?
• Why are there so many different chromatin remodeling complexes in cells? What
are their essential roles, and how do they get loaded onto chromatin at specific
places and at specific times?
• How do chromosomal loops form during interphase, and what happens to these
loops in condensed mitotic chromosomes?
• What genetic changes made us uniquely human? What further aspects of our
recent evolutionary development can be reconstructed by sequencing DNA from
remains of ancient hominids?
• How much of the enormous complexity that we find in cell biology is
unnecessary, having evolved by random drift?
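The Summary's substitution rate can be cross-checked against the human and chimpanzee comparison with one line of arithmetic. The sketch below makes two simplifying assumptions that the text does not state: changes accumulate independently along both lines of descent (hence the factor of two), and repeated changes at the same position are ignored. The function name is illustrative.

```python
def expected_divergence(rate_per_site_per_myr, separation_myr):
    """Expected fraction of differing nucleotides between two lineages.

    Assumes both lineages accumulate changes at the same rate since their
    split and that no position changes more than once.
    """
    return 2 * rate_per_site_per_myr * separation_myr


rate = 1 / 1000   # about one nucleotide in a thousand per million years
separation = 6    # human-chimpanzee separation, in million years

divergence = expected_divergence(rate, separation)
print(f"{divergence:.1%}")  # 1.2%
```

An estimate of this size is consistent with the observation that typical human and chimpanzee DNA sequences differ by about 1%.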
PROBLEMS
Which statements are true? Explain why or why not.
4–1 Human females have 23 different chromosomes, whereas human males have 24.
4–2 The four core histones are relatively small proteins with a very high
proportion of positively charged amino acids; the positive charge helps the
histones bind tightly to DNA, regardless of its nucleotide sequence.
4–3 Nucleosomes bind DNA so tightly that they cannot move from the positions
where they are first assembled.
4–4 In a comparison between the DNAs of related organisms such as humans and
mice, identifying the conserved DNA sequences facilitates the search for
functionally important regions.
4–5 Gene duplication and divergence is thought to have played a critical role
in the evolution of increased biological complexity.
Discuss the following problems.
4–6 DNA isolated from the bacterial virus M13 contains 25% A, 33% T, 22% C,
and 20% G. Do these results strike you as peculiar? Why or why not? How might
you explain these values?
4–13 Look at the two yeast colonies in Figure Q4–3. Each of these colonies
contains about 100,000 cells descended from a single yeast cell, originally
somewhere in the middle of the clump. A white colony arises when the Ade2 gene
is expressed from its normal chromosomal location. When the Ade2 gene is moved
to a location near a telomere, it is packed into heterochromatin and inactivated
in most cells, giving rise to colonies that are mostly red. In these largely red
colonies, white sectors fan out from the middle of the colony. In both the red
and white sectors, the Ade2
Figure Q4–4 Transposable elements and genes in 1-Mb regions of chromosomes 2
and 22 (Problem 4–14). Blue lines that project upward indicate exons of known
genes. Red lines that project downward indicate transposable elements; they are
so numerous (constituting more than 40% of the human genome) that they merge
into nearly a solid block outside the Hox clusters. (Adapted from E. Lander et
al., Nature 409:860–921, 2001. With permission from Macmillan Publishers Ltd.)
Labels in the figure: chromosome 2; HoxD cluster; scale bar, 100 kb.