Physical Mapping
a one-dimensional jigsaw puzzle
four cloned inserts as probes produced the bright spots on the metaphase chromosomes
in the micrograph shown on the page opposite.
Using these contig maps, we can localize any cloned fragment or other DNA probe,
again by hybridization, to a much smaller portion of the genome, namely to one of
the cloned fragments in one of the maps. Moreover, we can determine the position
of any DNA probe relative to all other landmarks that have been similarly localized.
Once contig maps are constructed, the entire genome will be available as cloned
fragments, and we will be able to use these clones to analyze any region down to
the level of its base sequence.
ern Linkage Mapping”).
fragments that make up the map also provide the material needed to sequence the human
genome.
[Figure: Contig map of overlapping cloned fragments]
Question: How do we obtain the clones that compose the contig maps?
Example: The figure illustrates the steps in preparing a library of cloned DNA
fragments. We start by isolating the DNA from many human cells. Then we break
up the DNA into a large set of overlapping fragments by partial digestion of the
DNA with a restriction enzyme. A restriction enzyme digests a DNA molecule
by recognizing and cleaving the molecule at every occurrence of a particular short
sequence, usually four to eight base pairs long. Such a site is called a restriction
site and is marked on the figure by a dot. Since complete digestion would yield
nonoverlapping fragments (every copy of a particular DNA molecule would be
cleaved at the same places), we interrupt the digestion process before it reaches
completion, thereby leaving many restriction sites intact at random locations along
each molecule. (In the figure, cleavage is indicated by a vertical line through the
restriction site.) Such partial digestion ensures that each resulting fragment will
overlap other fragments in the set.
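The contrast between complete and partial digestion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 36-base molecule is hypothetical, the recognition sequence GAATTC (that of the enzyme EcoRI) is our choice rather than one named in the figure, the cut is placed at the start of each site for simplicity, and the sites left uncut in each copy are picked by hand, whereas in the laboratory they are left intact at random.

```python
def find_sites(dna, site):
    """Return the start index of every occurrence of a recognition site."""
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)
    return positions


def digest(dna, cut_positions):
    """Cut a molecule at the given positions; return (start, end) intervals."""
    bounds = [0] + sorted(cut_positions) + [len(dna)]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def overlap(a, b):
    """True if two distinct fragments share part of the molecule."""
    return a != b and a[0] < b[1] and b[0] < a[1]


# Hypothetical 36-base molecule with three GAATTC sites (illustration only).
SITE = "GAATTC"
dna = "AAAA" + SITE + "CCCCCC" + SITE + "TTTT" + SITE + "GGGG"
sites = find_sites(dna, SITE)

# Complete digestion: every copy is cut at every site, so all copies
# yield the same nonoverlapping fragments.
complete = digest(dna, sites)

# Partial digestion: each copy keeps a different subset of sites uncut
# (chosen by hand here; in the laboratory the choice is random).
copy_1 = digest(dna, [sites[0], sites[2]])
copy_2 = digest(dna, [sites[1]])

# Fragments from different copies now share stretches of the molecule.
overlapping_pairs = [(f, g) for f in copy_1 for g in copy_2 if overlap(f, g)]
```

In this toy case the fragments of the complete digest never overlap one another, while the long middle fragment of the first partially digested copy overlaps both fragments of the second copy.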
To create a contig map of a single human chromosome, many groups are starting with
a chromosome-specific library of cloned fragments constructed by starting with many
copies of a particular chromosome. Chromosome-specific libraries are being made by
the National Laboratory Gene Library Project at Los Alamos and Livermore and are
available to research groups throughout the world (see “Libraries from Flow-sorted
Chromosomes”).
Mapping the Genome/Physical Mapping
the current state of sequencing technology.
that approach is totally impractical.
each clone have separated into distinct bands, each band consisting of all the
restriction fragments with a particular length. (The bands are made visible by staining,
and each gel is calibrated with fragments of known lengths.)
The region of overlap between the two clones shown in the figure yields four
restriction fragments with lengths of 4, 2, 6.5, and 5 kbp. Thus the fingerprints
of the two clones have four bands in common at the gel positions corresponding to
those lengths. Suppose these two fingerprints were the only information we had about
the two clones shown in the figure. We might suspect that the clones overlap one
another and that the overlap region included four restriction fragments with lengths
of 2, 4, 5, and 6.5 kbp. We might then partition the restriction fragments into a region
of overlap and two regions of nonoverlap as shown in part (c) of the figure. Note that
we would have no way to impose any further ordering on the restriction fragments
present in the fingerprint. Shown in (d) is a photograph of actual fingerprint data.
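The partitioning just described can be sketched as a comparison of two lists of fragment lengths. The sketch below is a minimal illustration, not the procedure used by any mapping group. The four shared lengths (2, 4, 5, and 6.5 kbp) come from the example above; the remaining lengths in each fingerprint, the function name, and the matching tolerance of 0.1 kbp are assumptions made for illustration.

```python
def partition_fingerprints(fp_a, fp_b, tolerance=0.1):
    """Split two fingerprints (lists of fragment lengths in kbp) into
    bands seen only in A, bands common to both, and bands seen only in B.

    Two bands are treated as the same band when their lengths agree to
    within `tolerance`; each band in B is matched at most once.
    """
    only_a = []
    common = []
    only_b = sorted(fp_b)
    for length in sorted(fp_a):
        match = next((b for b in only_b if abs(length - b) <= tolerance), None)
        if match is None:
            only_a.append(length)
        else:
            common.append(length)
            only_b.remove(match)
    return only_a, common, only_b


# Shared lengths are from the text; the other lengths are hypothetical.
clone_1 = [9.0, 3.1, 4.0, 2.0, 6.5, 5.0]
clone_2 = [4.0, 2.0, 6.5, 5.0, 7.2, 1.4]

only_1, shared, only_2 = partition_fingerprints(clone_1, clone_2)
# shared is [2.0, 4.0, 5.0, 6.5]: the candidate region of overlap.
```

As in the text, the result is an unordered partition: the code says which bands might lie in the overlap, not the order of the fragments within it, and matching lengths alone do not prove that the fragments are the same.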
Question: Can we infer that two clones overlap solely on the basis of their
restriction-fragment fingerprints?
Question: How are pairs of clones with a high likelihood of overlap assembled into
contigs, sets of contiguous overlapping clones?
standard procedure is to find pairs of clones, link those pairs into groups, and then
attempt to order all the restriction fragments within each group of clones in a
self-consistent manner. The method is essentially an incremental approach. As each new
clone is added to a contig, one tries to retain as much of the existing construction as
possible even in the face of contradictory data.
A significant departure from the incremental procedure has recently been developed
at Los Alamos. Map construction is treated as an optimization problem in which
all available data are taken into account rather than only the data yielding high
overlap probabilities. This global approach to map construction is described
in “Computation and the Human Genome Project.” Here we illustrate the
more standard procedure.
Example 2: Suppose that the fingerprints of clones A, B, and C reveal that clones
A and B have five fragment lengths in common, A and C have six fragment lengths
in common, and B and C have one fragment length in common. Furthermore, we
have calculated from those data that the likelihood of A and B overlapping is 90
percent, of A and C overlapping is 95 percent, and of B and C overlapping is 10
percent. We would then assemble the three clones into a contig as shown in the
figure, where some restriction fragments are placed in regions of overlap and the
remaining ones are placed in the regions of nonoverlap. As we add other clones to
the contig, we might have to revise the partitioning of the fragments into overlapping
and nonoverlapping regions to construct a consistent ordering for the entire contig.
[Figure: Assembly of a Contig, showing clones B, A, and C and the regions of overlap
between B and A and between A and C]
Because of the uncertainties in fragment lengths and the possibility that fragments of
equal length are not necessarily the same fragment, complicated computer algorithms
are necessary to determine the most likely order of the clones in a contig. When the
number of clones in a contig is much larger than the number required to span the
region covered by the contig, we can order many of the restriction fragments that
appear in each fingerprint and thereby help to avoid some false overlaps.
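The pair-linking step of the standard procedure can be sketched for the three clones of Example 2. This is a deliberately simplified illustration, far short of the algorithms mentioned above. The overlap likelihoods are those given in the example; the acceptance threshold of 0.5, the function names, and the restriction to simple unbranched chains are our assumptions.

```python
def link_clones(pair_probs, threshold=0.5):
    """Group clones joined by pairs whose overlap likelihood is at least
    `threshold`. Returns (groups, accepted_links)."""
    parent = {}

    def find(clone):
        parent.setdefault(clone, clone)
        while parent[clone] != clone:
            parent[clone] = parent[parent[clone]]  # path compression
            clone = parent[clone]
        return clone

    accepted = []
    # Consider the most likely overlaps first.
    for (a, b), prob in sorted(pair_probs.items(), key=lambda kv: -kv[1]):
        root_a, root_b = find(a), find(b)
        if prob >= threshold:
            parent[root_a] = root_b
            accepted.append((a, b))

    groups = {}
    for clone in list(parent):
        groups.setdefault(find(clone), set()).add(clone)
    return list(groups.values()), accepted


def chain_order(links):
    """Order clones along a contig, assuming the accepted links form a
    simple unbranched chain (an assumption of this sketch)."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    ends = sorted(c for c, n in neighbours.items() if len(n) == 1)
    order, previous, current = [], None, ends[0]
    while current is not None and current not in order:
        order.append(current)
        following = [n for n in neighbours[current] if n != previous]
        previous, current = current, (following[0] if following else None)
    return order


# Overlap likelihoods from Example 2.
likelihoods = {("A", "B"): 0.90, ("A", "C"): 0.95, ("B", "C"): 0.10}
groups, links = link_clones(likelihoods)
order = chain_order(links)
```

With these numbers the pairs A-C and A-B are accepted and B-C is rejected, so the three clones fall into one group with A in the middle, in agreement with the arrangement described in the example. A real assembly must also order the restriction fragments within each clone and cope with contradictory data, which this sketch does not attempt.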
100,000 base pairs. Also shown
Answer: No. In a random fingerprinting strategy, both the numbers and sizes of the
contigs grow fairly rapidly at first, but the rates of growth decrease after the existing
contigs cover about two-thirds of the region to be mapped. The decrease in growth
rate is due to the increasing probability that a randomly selected clone falls within a
region for which a contig has already been assembled. Contig growth is also limited
because small overlaps typically go undetected and some portions of the region being
mapped may not have survived the cloning process. In fact, contigs assembled from
cosmid clones typically stop growing after reaching lengths of 100 kbp.
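The slowdown can be illustrated with a simple probability estimate. If clones of length L are drawn at random from a region of length G, a given point is missed by one clone with probability 1 - L/G, so after n clones the expected covered fraction is 1 - (1 - L/G)^n (edge effects ignored). The sketch below evaluates that expression. It is a simplified model, not the analysis used in the mapping projects: it counts a point as covered as soon as any clone contains it, whereas a real contig also requires detectable overlaps. The region size of 1,000,000 base pairs and the clone length of 40,000 base pairs (typical of a cosmid insert) are assumed values.

```python
def expected_coverage(n_clones, clone_len, region_len):
    """Expected fraction of the region covered by n randomly placed
    clones, ignoring edge effects."""
    return 1.0 - (1.0 - clone_len / region_len) ** n_clones


def clones_needed(target, clone_len, region_len):
    """Smallest number of random clones whose expected coverage reaches
    the target fraction."""
    n = 0
    while expected_coverage(n, clone_len, region_len) < target:
        n += 1
    return n


# Assumed values for illustration: a 1,000,000-bp region, 40,000-bp clones.
REGION = 1_000_000
CLONE = 40_000

n_two_thirds = clones_needed(2 / 3, CLONE, REGION)
n_ninety_five = clones_needed(0.95, CLONE, REGION)

# Chance that the next random clone starts in already-covered DNA is
# roughly the covered fraction itself.
redundancy_at_two_thirds = expected_coverage(n_two_thirds, CLONE, REGION)
```

Under these assumptions, raising the expected coverage from two-thirds to 95 percent takes far more additional clones than reaching two-thirds did, because most new clones land in regions that are already covered.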
Question: How do we order disconnected contigs along the chromosome and how
do we check their accuracy?
Answer: Many types of lower-resolution maps can be used to position the contigs
along a chromosome and to check that all the clones in a contig come from
approximately the same region of the genome.
[Figure: Alignment of the genetic-linkage map (axis: genetic distance in centimorgans)
with the restriction maps of yeast chromosome I. Caption: The high-resolution
restriction map for yeast chromosome I was derived from a completed contig map of the
chromosome. The Xs mark the beginning of the subtelomeric regions, which are known to
lie a few thousand base pairs away from the telomeres (ends) of the chromosome.
Restriction sites for the thirteen-base cutter SfiI and the eight-base cutter NotI and
markers on the linkage map of chromosome I are localized to particular restriction
fragments on the high-resolution restriction map. (Courtesy of Maynard Olson,
Washington University)]
Example: The contigs constructed for yeast chromosomes, which had an average
length of 100 kbp, were ordered relative to a high-density genetic-linkage map
containing 400 markers spaced at an average physical distance of 30,000 base pairs.
To check the integrity of each contig, the clones that form it were hybridized to very
long (over 100,000 base pairs) restriction fragments of DNA that had been separated
by pulsed-field gel electrophoresis. If the clones assigned to a contig do in fact come
from a single region of the genome, it is likely that all of them will hybridize to a
single large fragment on the gel.
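The logic of this integrity check can be sketched as a small consistency test on hybridization results. The clone names, fragment labels, and the rule of flagging any clone that misses the most frequently hit fragment are all assumptions made for illustration; the source does not describe how the results were scored.

```python
from collections import Counter


def suspect_clones(hybridization):
    """Given a mapping from clone name to the set of large restriction
    fragments it hybridized to, return the clones that do not hybridize
    to the fragment hit by the most clones in the contig."""
    counts = Counter(f for frags in hybridization.values() for f in frags)
    if not counts:
        return []
    majority_fragment = counts.most_common(1)[0][0]
    return sorted(clone for clone, frags in hybridization.items()
                  if majority_fragment not in frags)


# Hypothetical results for one contig: three clones light up fragment F3,
# one lights up a different fragment and is therefore suspect.
contig_results = {
    "clone_1": {"F3"},
    "clone_2": {"F3"},
    "clone_3": {"F3"},
    "clone_4": {"F7"},
}
suspects = suspect_clones(contig_results)
```

A flagged clone is only a candidate for closer inspection: a clone that spans a cutting site of the enzyme used to make the large fragments could legitimately hybridize to two neighboring fragments.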
The figure shows the high-resolution restriction map deduced from the completed
contig map of yeast chromosome I. Also shown is the alignment of the restriction
map with two other maps: (1) the genetic-linkage map and (2) a long-range restriction
map showing the distances between the eight-base restriction sites of the enzyme NotI
and the thirteen-base restriction site of SfiI. (The latter map was constructed using
pulsed-field gel electrophoresis.) Markers on the genetic-linkage map and restriction
sites on the long-range restriction map have been localized to particular restriction
fragments on the contig map. Those correspondences are indicated by dotted lines.
The contigs being assembled for human chromosomes are being checked by a variety
of techniques including in-situ hybridization and hybridization to the DNA from
hybrid cells containing increasingly longer portions of the chromosome being mapped
(see “The Mapping of Chromosome 16”).
Question: After the contigs are ordered and checked for accuracy, how do we fill
in the gaps between the contigs?
[Figure: Four walking steps are needed to bridge the gap between two contigs. The
panels show contigs W, X, Y, and Z; the legend distinguishes a clone in a contig, a
probe used to find the next clone in a walk, and the next clone in a walk.]
cle, which is amenable to automation and improved data-analysis algorithms,
is only a small fraction of the work. The rest of the work required to close gaps
between contigs and to track down inconsistencies such as the branching of
one contig into two or more contigs involves many standard molecular-biology
procedures, which, in the case of the human genome, must be carried out on
an unprecedented scale. It is estimated that the completion of the yeast map
took about 20 person-years of work, and the mapping of each human chromosome
will take about 100 person-years. Further, mapping of human chromosomes
presents some new challenges:
● An average human chromosome is ten times the size of the yeast genome, and the
increased size calls for more efficient mapping strategies, such as working with
larger clones.
● Unlike the genomes of yeast and E. coli, human DNA contains repetitive elements
that require a new fingerprinting strategy to avoid inferring overlaps between clones
containing long repetitive stretches of DNA near their ends.
● Experience has shown that regions containing repetitive sequences are often lost in
the cloning process. Consequently, parts of the puzzle for each human chromosome
may be missing, in which case completion of the map will require specialized
techniques.
These challenges are being met in a variety of ways including the use of YAC
cloning vectors, which accept DNA inserts eight to ten times larger than the inserts
accepted by cosmids, and the use of STS markers, which, unlike restriction-fragment
fingerprints, identify unique landmarks on the map and therefore eliminate the need
for complicated probabilistic analyses to infer overlap between two clones. ■
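Why unique landmarks simplify the inference can be seen in a short sketch: if each STS occurs at only one place in the genome, two clones that both contain the same STS must overlap there, and no likelihood calculation is needed. The clone and marker names below are hypothetical, and the sketch assumes error-free detection of each STS.

```python
from itertools import combinations


def sts_overlaps(clone_sts):
    """Given a mapping from clone name to the set of STS markers detected
    in it, return every pair of clones sharing at least one STS, together
    with the shared markers."""
    shared = {}
    for a, b in combinations(sorted(clone_sts), 2):
        common = clone_sts[a] & clone_sts[b]
        if common:
            shared[(a, b)] = common
    return shared


# Hypothetical STS content of three YAC clones.
yac_content = {
    "yac_1": {"sts_1", "sts_2"},
    "yac_2": {"sts_2", "sts_3"},
    "yac_3": {"sts_4"},
}
overlaps = sts_overlaps(yac_content)
```

Here yac_1 and yac_2 are joined through sts_2, while yac_3 shares no marker with either and remains unlinked until another marker or clone connects it.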