
Mapping the Genome/Physical Mapping

Physical Mapping
a one-dimensional jigsaw puzzle

The human genome consists of forty-six double-stranded DNA molecules. Each
molecule is made up, on average, of 130 million base pairs strung in a linear order
between two sugar-phosphate backbones, and each is wound around proteins to
form a chromosome. In order to study genes and other interesting regions of the
genome at the molecular level, standard practice is to isolate the DNA and break up
the long molecules into many fragments. We then make many identical copies of
each fragment by cloning and pick out the clones of interest. Almost all methods
for analyzing DNA at the molecular level require many copies of the fragment of
interest. Therefore, cloning is essential for procedures such as finding the positions
of restriction-enzyme cutting sites, determining the sequence of nucleotide bases in a
particular DNA fragment, and identifying polymorphic DNA markers. However, in
fragmenting the DNA molecules prior to cloning, we lose all information about the
physical locations of fragments along the genome itself.

Problem: How do we find the chromosomal positions of known genes, polymorphic
markers, and other cloned portions of the human genome?

Low-Resolution Physical Mapping by In-Situ Hybridization


In contrast to a linkage map, which specifies statistical distances between variable
DNA markers and genes in terms of recombination fractions (see “Classical Linkage
Mapping”), a physical map specifies physical distances between landmarks on the
DNA molecule of each chromosome.

One standard low-resolution method for finding the physical position of a cloned
fragment is in-situ hybridization on metaphase chromosomes. We first find a segment
within the cloned region whose base sequence occurs nowhere else in the genome. We then
synthesize many copies of a single strand of that unique segment and label each copy
with a fluorescent tag to make it useful as a DNA probe. A solution containing the DNA
probe is then applied to a spread of chromosomes that have been arrested at metaphase
and fixed to a microscope slide. (Metaphase is the phase of cell division during which
chromosomes have condensed to form the wormlike shapes easily visible under a light
microscope.) Under appropriate conditions the probe binds, or hybridizes, only to the
chromosomal DNA with a base sequence exactly complementary to that of the probe (see
“Hybridization” in “Understanding Inheritance”). The position on a metaphase chromosome
where the probe has hybridized is imaged with a fluorescence microscope as a bright
spot. Because DNA molecules are wound very tightly during metaphase, the resolution
achieved with in-situ hybridization is low, about 3 million base pairs. In other words,
the hybridization signals from two probes less than 3 million base pairs apart will
overlap one another and cannot be resolved into two distinct spots. In-situ
hybridization using four cloned inserts as probes produced the bright spots on the
metaphase chromosomes in the micrograph shown on the page opposite.

[Figure: In-Situ Hybridization on Human Chromosome 21] Four DNA probes labeled with a
fluorescent dye produce positive hybridization signals at four locations along
chromosome 21. Because metaphase chromosomes are made up of two nearly identical sister
chromatids, each probe produces a pair of signals.

112 Los Alamos Science Number 20 1992
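The 3-million-base-pair resolution limit can be made concrete with a small sketch. It assumes the positions of the probes along one chromosome are already known (the positions below are hypothetical) and treats probes closer than the limit to their neighbour as a single visible spot, chaining neighbours together; real signal overlap also depends on chromosome condensation and optics, which this ignores.

```python
# Approximate resolution of in-situ hybridization on metaphase chromosomes,
# as quoted in the text (about 3 million base pairs).
RESOLUTION_BP = 3_000_000

def resolvable_spots(probe_positions_bp, resolution_bp=RESOLUTION_BP):
    """Group probe positions into the distinct spots a microscope could resolve.

    A probe closer than `resolution_bp` to the previous probe joins that
    probe's spot; otherwise it starts a new spot.
    """
    spots = []
    for pos in sorted(probe_positions_bp):
        if spots and pos - spots[-1][-1] < resolution_bp:
            spots[-1].append(pos)  # too close to its neighbour: same spot
        else:
            spots.append([pos])    # far enough away: a new, distinct spot
    return spots

# Hypothetical probe positions along one chromosome, in base pairs.
positions = [10_000_000, 11_500_000, 20_000_000, 35_000_000]
print(len(resolvable_spots(positions)))  # 3
```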

High-Resolution Physical Mapping by Construction of Contig Maps of Overlapping Clones

To determine the positions of genomic landmarks with much greater resolution, we
can replace the chromosomes themselves with twenty-four contig maps, one for each
of our twenty-two homologous chromosome pairs and one for each of our two sex
chromosomes. A contig map is a set of contiguous overlapping cloned fragments
that have been positioned relative to one another. In a complete contig map for a
human chromosome, the cloned fragments would include all the DNA present in
the chromosome and follow the same order found on the DNA molecule of the
chromosome. As in any physical map, distances are measured in base pairs.

Using these contig maps, we can localize any cloned fragment or other DNA probe,
again by hybridization, to a much smaller portion of the genome, namely to one of
the cloned fragments in one of the maps. Moreover, we can determine the position
of any DNA probe relative to all other landmarks that have been similarly localized.
Once contig maps are constructed, the entire genome will be available as cloned
fragments, and we will be able to use these clones to analyze any region down to
the level of its base sequence.
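As a minimal sketch of that localization step, assume a contig map is stored as clone names with start and end coordinates in base pairs (the names and coordinates below are hypothetical). A probe from a unique sequence must lie inside every clone it hybridizes to, so its position is narrowed to the intersection of those clones' intervals.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical representation: clone name -> (start, end) in base pairs.
ContigMap = Dict[str, Tuple[int, int]]

def localize_probe(contig_map: ContigMap, hybridizing: Set[str]) -> Optional[Tuple[int, int]]:
    """Return the interval consistent with every clone the probe hybridized to.

    Returns None if no clone hybridized or if the intervals do not intersect
    (which would point to an error in the map or in the hybridization data).
    """
    if not hybridizing:
        return None
    start = max(contig_map[c][0] for c in hybridizing)
    end = min(contig_map[c][1] for c in hybridizing)
    return (start, end) if start < end else None

contig = {"c1": (0, 40_000), "c2": (25_000, 65_000), "c3": (50_000, 90_000)}
print(localize_probe(contig, {"c1", "c2"}))  # (25000, 40000)
```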

Example: The figure at right is a schematic of a contig map for one chromosome. Right
now, the top priority of the Human Genome Project is to construct a contig map for each
of the twenty-four different chromosomes in the human genome. Those maps, when
integrated with the corresponding genetic-linkage maps, will provide a means of finding
the segments of DNA that contain disease genes (see “Modern Linkage Mapping”). The
clones that make up the map also provide the material needed to sequence the human
genome.

[Figure: Physical Map of a Human Chromosome. A human chromosome (stained to reveal its
banding pattern) above a contig map of overlapping cloned fragments spanning 130 million
base pairs.] The contig map spans the single DNA molecule contained in the chromosome.

Many different strategies are being developed to make contig maps of human
chromosomes. (Details of the Los Alamos effort to map a human chromosome are presented
in “The Mapping of Chromosome 16.”) Here we introduce the basic principles of
contig-map construction.




Question: How do we obtain the clones that compose the contig maps?

Answer: We prepare a collection, or library, of cloned human DNA fragments in a
manner such that (1) essentially all parts of the genome are probably present in the
library and (2) the human DNA fragments in the clones overlap one another. Overlaps
among the cloned fragments are essential because they allow us to reconstruct the
order in which the fragments appear along the genome.
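How many clones make "essentially all parts" of the genome "probably present"? A common back-of-the-envelope answer, assumed here rather than taken from this article, treats clone start points as uniformly random, so a given base is missed by every clone with probability about e^(-c), where the coverage c is (number of clones × insert length) ÷ target length. The 130-million-base-pair target is the average chromosome length quoted earlier; the 32,500-base-pair insert is an assumed typical cosmid insert.

```python
import math

def fraction_uncovered(n_clones: int, insert_bp: float, target_bp: float) -> float:
    """Expected fraction of the target absent from every clone.

    Assumes clones start at uniformly random positions (a Poisson
    approximation), so a base is missed with probability exp(-coverage).
    """
    coverage = n_clones * insert_bp / target_bp  # average clones per base
    return math.exp(-coverage)

TARGET = 130_000_000  # average human chromosome, from the text
INSERT = 32_500       # assumed typical cosmid insert (text: 25,000-45,000 bp)
for n in (4_000, 20_000, 40_000):
    cov = n * INSERT / TARGET
    print(n, f"{cov:.0f}x", f"{fraction_uncovered(n, INSERT, TARGET):.2e}")
```

Under these idealized assumptions, one-fold coverage would leave roughly 37 percent of the chromosome unrepresented, while tenfold coverage leaves an expected fraction below 0.01 percent. Real libraries do worse, because some regions do not survive cloning.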

Example: The figure illustrates the steps in preparing a library of cloned DNA
fragments. We start by isolating the DNA from many human cells. Then we break
up the DNA into a large set of overlapping fragments by partial digestion of the
DNA with a restriction enzyme. A restriction enzyme digests a DNA molecule
by recognizing and cleaving the molecule at every occurrence of a particular short
sequence, usually four to eight base pairs long. Such a site is called a restriction
site and is marked on the figure by a dot. Since complete digestion would yield
nonoverlapping fragments (every copy of a particular DNA molecule would be
cleaved at the same places), we interrupt the digestion process before it reaches
completion, thereby leaving many restriction sites intact at random locations along
each molecule. (In the figure, cleavage is indicated by a vertical line through the
restriction site.) Such partial digestion ensures that each resulting fragment will
overlap other fragments in the set.
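Partial digestion can be simulated in a few lines. This sketch assumes each restriction site on each copy is cut independently with a fixed probability (a simplification of real enzyme kinetics) and uses hypothetical, evenly spaced sites. With a cut probability of 1.0 every copy yields the same nonoverlapping pieces; with a lower probability each copy gets its own set of cuts, so fragments from different copies overlap.

```python
import random

def partial_digest(site_positions, molecule_length, cut_prob, rng):
    """Cut one copy of a molecule at each site with probability `cut_prob`.

    `site_positions` must be sorted. Returns fragments as half-open
    (start, end) intervals in base pairs.
    """
    cuts = [p for p in site_positions if rng.random() < cut_prob]
    bounds = [0] + cuts + [molecule_length]
    return list(zip(bounds[:-1], bounds[1:]))

def overlaps(a, b):
    """True if two half-open intervals share at least one base pair."""
    return a[0] < b[1] and b[0] < a[1]

rng = random.Random(1)                      # fixed seed for a repeatable run
sites = list(range(4_000, 200_000, 4_000))  # hypothetical sites every 4 kbp
copy1 = partial_digest(sites, 200_000, 0.3, rng)
copy2 = partial_digest(sites, 200_000, 0.3, rng)
full = partial_digest(sites, 200_000, 1.0, rng)  # complete digestion

# Fragment pairs from the two copies that overlap without being identical.
shared = [(f1, f2) for f1 in copy1 for f2 in copy2 if overlaps(f1, f2) and f1 != f2]
print(len(copy1), len(copy2), len(full), len(shared))
```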

Next, each of these fragments is joined to a cloning vector to form a recombinant
DNA molecule. A cloning vector is a small DNA molecule that, after entering a host
organism (such as yeast or bacteria), is replicated by the cellular machinery of the
host organism. The cloning vector shown here is a small circular DNA molecule that
has been engineered to include a single cutting site for the restriction enzyme chosen
to digest the sample of human DNA. Copies of the cloning vectors are cut at that
site and are mixed with the human DNA fragments, and the enzyme DNA ligase is
added to the mixture. The “sticky ends” of a cloning vector (which are formed by
restriction-enzyme cleavage) bind to the “sticky ends” of a human DNA fragment,
and the ligase catalyzes the chemical union of the sugar-phosphate backbones of
the two DNAs into a recombinant DNA molecule. We then expose a population of
the host organism to the recombinant DNA molecules, and, if we are lucky, each
recombinant DNA molecule enters a host organism and is there replicated as the host
replicates. Each host colony containing clones of a particular fragment is individually
plucked and stored in a well of a 96-well microtiter dish, where the cells can be grown
up again and again. This library of clones provides a renewable supply of all the
fragments that have survived the cloning process.
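The "sticky ends" idea can be expressed as a compatibility check. The sketch assumes an enzyme with a palindromic recognition site that leaves a 5' overhang; EcoRI, used later in this article, cuts G^AATTC and leaves the overhang AATT. Two such ends can anneal when one overhang is the reverse complement of the other, which is why a vector and an insert cut by the same enzyme join. The helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G, then reversed)."""
    return seq.translate(COMPLEMENT)[::-1]

def five_prime_overhang(site: str, top_cut: int) -> str:
    """5' overhang left by an enzyme with a palindromic site.

    `top_cut` is the cut offset in the top strand; for a palindromic site the
    bottom strand is cut at len(site) - top_cut. Valid when top_cut < len(site) / 2.
    """
    return site[top_cut:len(site) - top_cut]

def ends_compatible(overhang_a: str, overhang_b: str) -> bool:
    """Two 5' overhangs can base-pair if one is the reverse complement of the other."""
    return overhang_a == revcomp(overhang_b)

ecori = five_prime_overhang("GAATTC", 1)  # EcoRI cuts G^AATTC
print(ecori)                              # AATT
print(ends_compatible(ecori, ecori))      # True: vector and insert ends anneal
```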

To create a contig map of a single human chromosome, many groups are starting with
a chromosome-specific library of cloned fragments constructed by starting with many
copies of a particular chromosome. Chromosome-specific libraries are being made by
the National Laboratory Gene Library Project at Los Alamos and Livermore and are
available to research groups throughout the world (see “Libraries from Flow-sorted
Chromosomes”).

The cloned fragments in a DNA library are “anonymous”; that is, we know nothing
about them except their approximate length, which is determined by the length of the
DNA insert that can be successfully incorporated into the cloning vector we have
chosen. Until recently cosmids were the cloning vectors most often used for map
construction. Cosmids reproduce in the bacterial host E. coli, and they accept DNA
inserts ranging from about 25,000 to 45,000 base pairs in length. Therefore about 4000
cosmid clones could accommodate all the DNA in an average human chromosome.
However, to achieve the overlaps among cloned fragments required in the construction
of a contig map and to better assure that all the chromosomal DNA is represented in
the clone library, the usual practice is to construct a library with up to ten times
that number of cosmid clones.

[Figure: Construction of a Library of Cloned DNA Fragments]
Step 1: (a) Isolate many copies of the human DNA molecule to be mapped. (Dots mark
restriction sites for a restriction enzyme.) (b) Partially digest the molecules with the
restriction enzyme to create overlapping fragments.
Step 2: (a) Linearize the circular cloning vectors with the restriction enzyme used in
step 1b. (b) Ligate cloning vectors and human DNA fragments to create recombinant DNA
molecules.
Step 3: Facilitate the entry of recombinant DNA molecules into host cells, here the
bacterium E. coli, and grow each host cell into an isolated colony, thereby producing
many identical copies of that recombinant DNA molecule.

Question: How do we position the cloned DNA fragments along the DNA molecules in the
genome?

Answer: Positioning cloned DNA fragments is analogous to solving a one-dimensional
jigsaw puzzle, but rather than looking for interlocking pieces, we look for detectable
overlaps between clones, that is, for clones that have a unique stretch of human DNA in
common. Because the number of pieces in the puzzle is so large, we need a rapid method
for detecting overlaps between pairs of clones. If we could sequence each clone, we
could identify overlaps unambiguously, provided the overlapping region is not a sequence
that repeats elsewhere in the genome. However, given the current state of sequencing
technology, that approach is totally impractical.
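For comparison, the sequence-based overlap test that the answer calls impractical (with 1992 technology) is simple to state in code. This sketch assumes both clone sequences are known, on the same strand, and that clone a lies to the left of clone b; it finds the longest suffix of a that equals a prefix of b. The minimum length stands in for the requirement that the shared region be unique: a very short match, or one inside a repeated sequence, would not be trustworthy evidence of overlap.

```python
def longest_overlap(a: str, b: str, min_len: int = 20) -> int:
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 when no such overlap of at least `min_len` bases exists.
    """
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:  # try the longest candidate overlap first
            return k
    return 0

clone_a = "GATTACAGGCCTTA"  # hypothetical clone sequences
clone_b = "GCCTTAGGAT"
print(longest_overlap(clone_a, clone_b, min_len=4))  # 6
```

To test both possible orders, call the function with the arguments swapped as well.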

A practical and successful probabilistic method for detecting overlaps is to make a
“fingerprint” of each clone (more precisely, of the human DNA insert within each clone)
and compare the fingerprints. The simplest fingerprint of a cloned fragment is the one
obtained by completely digesting about 10¹⁰ copies of the clone with a restriction
enzyme and then determining the lengths of the resulting restriction fragments by gel
electrophoresis. The restriction-fragment lengths determined from the gel constitute
the restriction-fragment fingerprint of the clone.

Suppose we obtain restriction-fragment fingerprints of our clones by using the
restriction enzyme EcoRI, which can cut DNA at every occurrence of the six-base-pair
sequence GAATTC. Within a random sequence of the four DNA bases, any six-base-pair
sequence occurs, on average, every 4⁶, or about 4000, base pairs. Therefore the average
length of the restriction fragments produced by EcoRI from a random sequence of the DNA
bases is about 4000 base pairs. Now the sequence of bases in the human genome is not
random, but nonetheless the average length of the restriction fragments in the EcoRI
fingerprints of a set of clones is about 4000 base pairs. Thus we expect that the human
DNA inserts in two cosmid clones, each of which is, say, about 30,000 base pairs long,
will have at least one restriction fragment in common if they overlap by more than
about 15 percent.

Example: To illustrate the information content of fingerprints made by using the
restriction enzyme EcoRI, consider two clones that are known to overlap as shown in
part (a) of the figure. The cleavage sites for EcoRI are marked by arrows, and the
distances between restriction sites are given in thousands of base pairs (kbp). Part
(b) shows the restriction-fragment fingerprints obtained by completely digesting many
copies of each clone with EcoRI. After several hours of electrophoresis, the
restriction fragments of each clone have separated into distinct bands, each band
consisting of all the restriction fragments with a particular length. (The bands are
made visible by staining, and each gel is calibrated with fragments of known lengths.)

[Figure: Restriction-Fragment Fingerprints]
(a) Clone 1 overlapping clone 2; arrows mark EcoRI restriction sites. Distances between
sites (kbp): clone 1: 3.5, 6, 4, 2, 6.5, 5; clone 2: 4, 2, 6.5, 5, 7, 3.
(b) Fingerprints (gel patterns) of clones 1 and 2, band sizes in kbp: clone 1: 6.5,
6.0, 5.0, 4.0, 3.5, 2.0; clone 2: 7.0, 6.5, 5.0, 4.0, 3.0, 2.0.
(c) Regions of overlap and nonoverlap inferred from the fingerprint data in (b).
Fragments are arbitrarily ordered, from largest to smallest, within each region.
Clone 1: nonoverlap 6, 3.5; overlap 6.5, 5, 4, 2. Clone 2: overlap 6.5, 5, 4, 2;
nonoverlap 7, 3.
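The restriction-fragment fingerprint and the 4⁶ ≈ 4000 estimate can both be checked with a short simulation. The sketch performs a complete in-silico EcoRI digest (cutting after the G in GAATTC), returns the fragment lengths sorted from largest to smallest, and then digests a hypothetical uniformly random sequence. Real genomic DNA is not random, as the text notes, so this only illustrates the random-sequence argument.

```python
import random
import re

def ecori_fingerprint(seq: str) -> list:
    """Lengths of the fragments from a complete EcoRI digest, largest first.

    EcoRI recognizes GAATTC and cuts between the G and the first A.
    """
    cuts = [m.start() + 1 for m in re.finditer("GAATTC", seq)]
    bounds = [0] + cuts + [len(seq)]
    return sorted((end - start for start, end in zip(bounds[:-1], bounds[1:])), reverse=True)

print(ecori_fingerprint("AAGAATTCAA"))  # [7, 3]

# Digest a hypothetical uniformly random sequence of one million bases.
rng = random.Random(0)
random_seq = "".join(rng.choices("ACGT", k=1_000_000))
lengths = ecori_fingerprint(random_seq)
print(len(lengths), sum(lengths) / len(lengths))
```

The printed mean depends on the seed but should land near 4⁶ = 4096 base pairs, since about n/4⁶ sites are expected in n random bases.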

The region of overlap between the two clones shown in the figure yields four
restriction fragments with lengths of 4, 2, 6.5, and 5 kbp. Thus the fingerprints
of the two clones have four bands in common at the gel positions corresponding to
those lengths. Suppose these two fingerprints were the only information we had about
the two clones shown in the figure. We might suspect that the clones overlap one
another and that the overlap region included four restriction fragments with lengths
of 2, 4, 5, and 6.5 kbp. We might then partition the restriction fragments into a region
of overlap and two regions of nonoverlap as shown in part (c) of the figure. Note that
we would have no way to impose any further ordering on the restriction fragments
present in the fingerprint. Shown in (d) is a photograph of actual fingerprint data.
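The partition in part (c) can be reproduced from the two fingerprints alone. This sketch greedily pairs lengths that agree within a relative tolerance (1 percent, the measurement error quoted in the answer that follows) and reports shared lengths and lengths unique to each clone. It inherits the limitation just described: matched lengths are only candidates for the overlap, and a greedy pairing is not guaranteed to be the best one when many lengths are similar.

```python
def partition_fingerprints(fp1, fp2, rel_tol=0.01):
    """Split two fingerprints into shared lengths and lengths unique to each.

    Lengths are paired one-to-one when they agree within `rel_tol`.
    All three returned lists are ordered from largest to smallest.
    """
    unmatched2 = sorted(fp2, reverse=True)
    shared, only1 = [], []
    for x in sorted(fp1, reverse=True):
        match = next((y for y in unmatched2 if abs(x - y) <= rel_tol * max(x, y)), None)
        if match is None:
            only1.append(x)
        else:
            shared.append(x)
            unmatched2.remove(match)  # each band can be paired only once
    return shared, only1, unmatched2

clone1 = [3.5, 6, 4, 2, 6.5, 5]  # kbp, clone 1 in the figure
clone2 = [4, 2, 6.5, 5, 7, 3]    # kbp, clone 2 in the figure
shared, only1, only2 = partition_fingerprints(clone1, clone2)
print(shared, only1, only2)  # [6.5, 5, 4, 2] [6, 3.5] [7, 3]
```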

Question: Can we infer that two clones overlap solely on the basis of their
restriction-fragment fingerprints?

Answer: Since a restriction-fragment fingerprint is, in essence, just a list of
restriction-fragment lengths, it gives us no information about the order of the
fragments within each clone. Also, we can’t tell whether the restriction fragments of
the same length in two different fingerprints are copies of the same fragment. So the
fact that the fingerprints of two clones have one or more restriction-fragment lengths
in common does not provide unambiguous evidence that the two clones overlap. On the
other hand, by taking into account statistical properties of restriction-fragment
lengths, we can estimate the likelihood of overlap given the data. David Torney of Los
Alamos has developed a rigorous formulation of the likelihood calculation that takes
into account the distribution of the distances between cleavage sites in the genome
(the distribution of EcoRI cleavage sites appears to be a Poisson distribution with an
average spacing of 4000 base pairs), the errors in the measurement of
restriction-fragment lengths (about 1 percent), and all possible ways in which the two
clones might overlap. Since the declaration of a false overlap would lead to the
merging of pieces of the map that are not contiguous on the genome, and since such
mistakes are very time-consuming to correct, a conservative approach is to declare an
overlap only if the likelihood of overlap is 90 percent or greater. Given the simple
restriction-fragment fingerprints shown on the page opposite, two clones must overlap
by about 50 percent to yield such high likelihoods of overlap. Thus small overlaps are
typically not detected with this conservative approach. As described in “The Mapping
of Chromosome 16,” the Los Alamos mapping group has devised a fingerprint that includes
information about the presence of repetitive DNA sequences on the restriction fragments
in each fingerprint. That additional information facilitates the detection of much
smaller overlaps and therefore requires the fingerprinting of fewer clones to complete
the contig map.
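Why shared lengths are not proof of overlap can be seen with a small Monte Carlo experiment. This sketch is not Torney's likelihood calculation; it only simulates pairs of non-overlapping 30,000-base-pair clones whose cut sites follow a Poisson process with 4000-base-pair mean spacing (the model quoted above), treats the clone ends as fragment ends (a simplification), and counts how many lengths coincide within 1 percent purely by chance.

```python
import random

def random_clone_fingerprint(rng, clone_bp=30_000, mean_spacing=4_000):
    """Fragment lengths of a clone whose cut sites form a Poisson process.

    Gaps between successive sites are exponential with mean `mean_spacing`;
    the two clone ends are treated as fragment ends.
    """
    cuts = []
    pos = rng.expovariate(1 / mean_spacing)
    while pos < clone_bp:
        cuts.append(pos)
        pos += rng.expovariate(1 / mean_spacing)
    bounds = [0.0] + cuts + [float(clone_bp)]
    return [end - start for start, end in zip(bounds[:-1], bounds[1:])]

def count_shared(fp1, fp2, rel_tol=0.01):
    """Number of one-to-one length matches within a relative tolerance."""
    remaining = list(fp2)
    shared = 0
    for x in fp1:
        for y in remaining:
            if abs(x - y) <= rel_tol * max(x, y):
                remaining.remove(y)  # each length may be matched only once
                shared += 1
                break
    return shared

rng = random.Random(42)  # fixed seed for a repeatable run
trials = 5_000
chance = [count_shared(random_clone_fingerprint(rng), random_clone_fingerprint(rng))
          for _ in range(trials)]
print("mean lengths shared by chance:", sum(chance) / trials)
print("fraction of pairs sharing at least one:", sum(c >= 1 for c in chance) / trials)
```

A nonzero chance-match rate for clones that share no DNA at all is why a single shared length is weak evidence and why a likelihood threshold is needed.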

Question: How are pairs of clones with a high likelihood of overlap assembled into
contigs, sets of contiguous overlapping clones?

Answer: Given the uncertainties in fingerprint data, assembling pairs of overlapping
clones into contigs from those data alone is a difficult computational problem. The
standard procedure is to find pairs of clones, link those pairs into groups, and then
attempt to order all the restriction fragments within each group of clones in a
self-consistent manner. The method is essentially an incremental approach. As each new
clone is added to a contig, one tries to retain as much of the existing construction as
possible even in the face of contradictory data.
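The step "link those pairs into groups" is, in graph terms, finding connected components. This sketch uses a union-find structure and the conservative 90 percent threshold described earlier; the likelihood values are those of the three-clone example that follows, and clone D is hypothetical. It covers only grouping, not the harder problem of ordering fragments within a group.

```python
def group_into_contigs(clones, overlap_likelihood, threshold=0.90):
    """Link clones whose pairwise overlap likelihood meets `threshold` into groups.

    `overlap_likelihood` maps (clone, clone) pairs to probabilities. Groups are
    the connected components of the declared-overlap graph (union-find).
    """
    parent = {c: c for c in clones}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps trees shallow
            c = parent[c]
        return c

    for (a, b), p in overlap_likelihood.items():
        if p >= threshold:
            parent[find(a)] = find(b)  # declare the overlap: merge the groups

    groups = {}
    for c in clones:
        groups.setdefault(find(c), []).append(c)
    return sorted(sorted(g) for g in groups.values())

likelihood = {("A", "B"): 0.90, ("A", "C"): 0.95, ("B", "C"): 0.10}
print(group_into_contigs(["A", "B", "C", "D"], likelihood))  # [['A', 'B', 'C'], ['D']]
```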

A significant departure from the incremental procedure has recently been developed
at Los Alamos. Map construction is treated as an optimization problem in which
all available data are taken into account rather than only the data yielding high
overlap probabilities. This global approach to map construction is described in
“Computation and the Human Genome Project.” Here we illustrate the
more standard procedure.

Example 2: Suppose that the fingerprints of clones A, B, and C reveal that clones
A and B have five fragment lengths in common, A and C have six fragment lengths
in common, and B and C have one fragment length in common. Furthermore, we
have calculated from those data that the likelihood of A and B overlapping is 90
percent, of A and C overlapping is 95 percent, and of B and C overlapping is 10
percent. We would then assemble the three clones into a contig as shown in the
figure, where some restriction fragments are placed in regions of overlap and the
remaining ones are placed in the regions of nonoverlap. As we add other clones to
the contig, we might have to revise the partitioning of the fragments into overlapping
and nonoverlapping regions to construct a consistent ordering for the entire contig.
Because of the uncertainties in fragment lengths and the possibility that fragments of
equal length are not necessarily the same fragment, complicated computer algorithms
are necessary to determine the most likely order of the clones in a contig. When the
number of clones in a contig is much larger than the number required to span the
region covered by the contig, we can order many of the restriction fragments that
appear in each fingerprint and thereby help to avoid some false overlaps.

[Figure: Assembly of a Contig. Clones A, B, and C with their regions of overlap
marked.] Likelihood analysis of fingerprint data suggests that clone A overlaps clone B
and clone C and that clone B and clone C do not overlap. However, clone B and clone C
do share one restriction fragment, and that fragment can be placed in the overlap
between clones B and C with no loss of consistency. Preliminary assignments of
restriction fragments to overlap or nonoverlap regions might be altered as more clones
are added to the contig.
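Once overlaps are declared, the order of clones follows from which clones overlap which. This sketch assumes a minimal tiling path, in which each clone overlaps only its immediate neighbours: end clones then have one neighbour, interior clones have two, and a clone with three or more neighbours signals a branch, the kind of inconsistency that must be tracked down by hand. In a deeply redundant contig, where one clone legitimately overlaps many others, this simple degree test does not apply.

```python
from collections import defaultdict

def order_chain(overlaps):
    """Order clones along an unbranched contig from declared overlap pairs.

    Assumes a minimal tiling path. Raises ValueError on a branch or if the
    pairs do not form one open chain.
    """
    nbrs = defaultdict(set)
    for a, b in overlaps:
        nbrs[a].add(b)
        nbrs[b].add(a)
    branching = sorted(c for c, n in nbrs.items() if len(n) > 2)
    if branching:
        raise ValueError(f"contig branches at {branching}")
    ends = sorted(c for c, n in nbrs.items() if len(n) == 1)
    if len(ends) != 2:
        raise ValueError("overlaps do not form a single open chain")
    order, prev = [ends[0]], None
    while len(order) < len(nbrs):
        nxt = [c for c in nbrs[order[-1]] if c != prev]
        if not nxt:  # reached the other end before visiting every clone
            raise ValueError("overlaps do not form a single open chain")
        prev = order[-1]
        order.append(nxt[0])
    return order

# Declared overlaps from the three-clone example: A-B and A-C, but not B-C.
print(order_chain([("A", "B"), ("A", "C")]))  # ['B', 'A', 'C']
```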




Example 2: Shown at right is a contig assembled on the basis of restriction-fragment
fingerprints. The contig spans about 100,000 base pairs. Also shown is a restriction
map deduced from the contig. The restriction map shows the order of and distances
between restriction sites in thousands of base pairs, or kbp. The exact positions of
some restriction sites (marked by the longer vertical lines that extend through the
cloned fragments) have been determined by the fact that each lies at the end of one of
the clones in the contig and therefore separates a region of overlap between two clones
from a region of nonoverlap. Other restriction sites (marked by the shorter vertical
lines) have been localized to a single overlap region but cannot be ordered further.
Such sites have been arbitrarily located left to right on the contig in order of
decreasing inter-site distance. This contig is representative of those used in
constructing the recently completed physical map of the genome of baker’s yeast
(Saccharomyces cerevisiae). That map is, on average, eight clones deep. That is, any
region is present in, on average, eight clones. Such great redundancy provided
information about the order of a large fraction of the restriction sites and greatly
reduced the chance of a false overlap.

[Figure: Typical Contig for the Yeast Genome. Scale 0 to 120 kb; restriction map above
the contig of overlapping clones.] This contig spans 108,000 base pairs (108 kbp) of
yeast chromosome V. (Courtesy of Maynard Olson, Washington University)
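"Eight clones deep" is an average depth of coverage. A sweep over clone endpoints computes the depth along a contig; in this sketch the clone intervals are hypothetical, in kbp, and treated as half-open.

```python
def coverage_depth(clones):
    """Piecewise-constant clone depth along a contig, via a sweep over endpoints.

    Returns (start, end, depth) segments from the first clone start to the
    last clone end. Ends sort before starts at the same coordinate.
    """
    events = sorted([(s, 1) for s, _ in clones] + [(e, -1) for _, e in clones])
    segments, depth, prev = [], 0, None
    for pos, delta in events:
        if prev is not None and pos > prev:
            segments.append((prev, pos, depth))
        depth += delta
        prev = pos
    return segments

def mean_depth(clones):
    """Length-weighted average depth over the contig span."""
    segs = coverage_depth(clones)
    span = segs[-1][1] - segs[0][0]
    return sum((e - s) * d for s, e, d in segs) / span

clones_kbp = [(0, 40), (20, 60), (30, 70)]  # hypothetical clones
print(coverage_depth(clones_kbp))
print(round(mean_depth(clones_kbp), 2))  # 1.71
```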

Question: Do the disconnected contigs assembled by fingerprinting randomly
selected clones steadily increase in length until they become connected?

Answer: No. In a random fingerprinting strategy, both the numbers and sizes of the
contigs grow fairly rapidly at first, but the rates of growth decrease after the existing
contigs cover about two-thirds of the region to be mapped. The decrease in growth
rate is due to the increasing probability that a randomly selected clone falls within a
region for which a contig has already been assembled. Contig growth is also limited
because small overlaps typically go undetected and some portions of the region being
mapped may not have survived the cloning process. In fact, contigs assembled from
cosmid clones typically stop growing after reaching lengths of 100 kbp.
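The slowdown can be illustrated with a toy model. In this sketch, an assumption-laden illustration rather than a model of any real project, clones of one length are placed uniformly at random on a hypothetical 1-million-base-pair region, and an overlap is detected only if it spans at least half a clone (echoing the roughly 50 percent overlap the text says simple fingerprints require). At several library sizes it reports the number of detected contigs and the fraction of the region covered by clones.

```python
import random

def merge_clones(clones, min_overlap):
    """Merge (start, end) clones into contigs; a clone joins the current contig
    only if it overlaps the contig's right end by at least `min_overlap` bp."""
    contigs = []
    for start, end in sorted(clones):
        if contigs and contigs[-1][1] - start >= min_overlap:
            contigs[-1][1] = max(contigs[-1][1], end)
        else:
            contigs.append([start, end])
    return contigs

def random_fingerprinting(checkpoints, region_bp=1_000_000, clone_bp=35_000,
                          min_overlap_frac=0.5, seed=0):
    """Add randomly placed clones one at a time; at each checkpoint report
    (clones so far, detected contigs, fraction of the region covered)."""
    rng = random.Random(seed)
    clones, report = [], []
    for n in range(1, max(checkpoints) + 1):
        start = rng.randrange(region_bp - clone_bp + 1)
        clones.append((start, start + clone_bp))
        if n in checkpoints:
            contigs = merge_clones(clones, min_overlap_frac * clone_bp)
            union = merge_clones(clones, 0)  # any contact counts: true coverage
            covered = sum(e - s for s, e in union) / region_bp
            report.append((n, len(contigs), round(covered, 3)))
    return report

for n, n_contigs, covered in random_fingerprinting([10, 30, 60, 120, 240]):
    print(f"{n:4d} clones  {n_contigs:3d} contigs  {covered:.1%} covered")
```

Because the clone list only grows, the covered fraction never decreases, but each additional batch of clones adds less new coverage once most of the region is covered.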




Question: How do we order disconnected contigs along the chromosome and how
do we check their accuracy?

Answer: Many types of lower-resolution maps can be used to position the contigs
along a chromosome and to check that all the clones in a contig come from approx-
imately the same region of the genome.

Example: The contigs constructed for yeast chromosomes, which had an average
length of 100 kbp, were ordered relative to a high-density genetic-linkage map
containing 400 markers spaced at an average physical distance of 30,000 base pairs.
To check the integrity of each contig, the clones that form it were hybridized to very
long (over 100,000 base pairs) restriction fragments of DNA that had been separated
by pulsed-field gel electrophoresis. If the clones assigned to a contig do in fact come
from a single region of the genome, it is likely that all of them will hybridize to a
single large fragment on the gel.

[Figure: Complete High-Resolution Restriction Map of Yeast Chromosome I. Tracks, top to
bottom: long-range restriction map (SfiI and NotI sites, subtelomeric regions marked);
high-resolution restriction map, physical distance 0 to 250 kbp; linkage map, genetic
distance in centimorgans.] The high-resolution restriction map for yeast chromosome I
was derived from a completed contig map of the chromosome. The Xs mark the beginning of
the subtelomeric regions, which are known to lie a few thousand base pairs away from the
telomeres (ends) of the chromosome. Restriction sites for the thirteen-base cutter SfiI
and the eight-base cutter NotI and markers on the linkage map of chromosome I are
localized to particular restriction fragments on the high-resolution restriction map.
(Courtesy of Maynard Olson, Washington University)
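Both steps in this example reduce to simple bookkeeping once the experimental results are in hand. This sketch assumes each contig has already been tested for known linkage markers and each clone has been assigned to the large pulsed-field gel fragment it hybridizes to. Contigs are ordered by the mean genetic position of their markers, and a contig passes the integrity check only if all its clones hit the same fragment. All names and positions are hypothetical.

```python
from statistics import mean

def order_contigs(contig_markers, marker_cm):
    """Order contigs by the mean linkage-map position (cM) of their markers.

    contig_markers: contig -> list of marker names detected in its clones.
    marker_cm: marker -> position on the genetic-linkage map in centimorgans.
    Returns (ordered contigs, contigs with no marker that cannot be placed).
    """
    placed = {c: mean(marker_cm[m] for m in ms)
              for c, ms in contig_markers.items() if ms}
    unplaced = sorted(c for c, ms in contig_markers.items() if not ms)
    return sorted(placed, key=placed.get), unplaced

def contig_is_consistent(clone_to_fragment):
    """Integrity check: every clone in the contig hybridized to one gel fragment."""
    return len(set(clone_to_fragment.values())) == 1

markers = {"contig X": ["m3"], "contig Y": ["m1", "m2"], "contig Z": []}
positions = {"m1": 5.0, "m2": 9.0, "m3": 40.0}
print(order_contigs(markers, positions))  # (['contig Y', 'contig X'], ['contig Z'])
print(contig_is_consistent({"c1": "frag 2", "c2": "frag 2", "c3": "frag 5"}))  # False
```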

The figure shows the high-resolution restriction map deduced from the completed
contig map of yeast chromosome I. Also shown is the alignment of the restriction
map with two other maps: (1) the genetic-linkage map and (2) a long-range restriction
map showing the distances between the eight-base restriction sites of the enzyme NotI
and the thirteen-base restriction sites of SfiI. (The latter map was constructed using
pulsed-field gel electrophoresis.) Markers on the genetic-linkage map and restriction
sites on the long-range restriction map have been localized to particular restriction
fragments on the contig map. Those correspondences are indicated by dotted lines.
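Locating NotI and SfiI sites in a known sequence is a pattern search. NotI recognizes the eight bases GCGGCCGC, and SfiI recognizes GGCCNNNNNGGCC, where N is any base, so only eight of its thirteen positions are fixed. In a random sequence each site would therefore occur roughly once per 4⁸ = 65,536 base pairs, far less often than the roughly 4000-base-pair spacing of EcoRI sites, which is why such enzymes produce the long fragments used for long-range maps. The sketch uses a regular-expression lookahead so that overlapping sites are all reported; the sequence is hypothetical.

```python
import re

# Recognition sequences; SfiI's five N positions may be any base.
SITES = {
    "NotI": "GCGGCCGC",           # eight-base site
    "SfiI": "GGCC[ACGT]{5}GGCC",  # thirteen-base site, eight bases fixed
}

def find_sites(seq: str, pattern: str) -> list:
    """0-based start positions of every match, including overlapping ones."""
    return [m.start() for m in re.finditer(f"(?={pattern})", seq)]

seq = "AAGCGGCCGCTTGGCCAAAAAGGCCTT"  # hypothetical sequence
for enzyme, pattern in SITES.items():
    print(enzyme, find_sites(seq, pattern))
# NotI [2]
# SfiI [12]
```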

The contigs being assembled for human chromosomes are being checked by a variety
of techniques, including in-situ hybridization and hybridization to the DNA from
hybrid cells containing increasingly longer portions of the chromosome being mapped
(see “The Mapping of Chromosome 16”).

Question: After the contigs are ordered and checked for accuracy, how do we fill
in the gaps between the contigs?

Answer: As mentioned earlier, the fingerprinting of randomly selected clones is not
an efficient way to fill in the gaps between contigs after the existing contigs cover
a large fraction of the region being mapped. Instead it is time to employ a directed
strategy. One directed strategy involves identifying unique regions within the clones
at the ends of a contig and using those regions as probes to pick out other clones
that will extend the contig. If the contigs cover a very large fraction (95 percent) of
the region being mapped, a single probe from the end of a clone may identify a new
clone that spans the distance between two existing contigs and thus merges them into
one. If not, then one must continue stepwise by creating an end probe from each
added clone and screening the library of clones to find the next clone that extends
the contig a bit farther. This procedure is called walking, and it is extremely
time-consuming. Nevertheless, it has been used successfully to complete physical maps
of the E. coli and yeast genomes. Those genomes are relatively small (containing
5 million base pairs and 13 million base pairs, respectively), and the gaps between
contigs were small before walking was attempted.
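Walking can be sketched as a loop over a library of clones with known coordinates. The coordinates are hypothetical; in the laboratory they are exactly what is unknown, and each step is a hybridization screen. The end probe is modeled as a position a fixed distance inside the current end, and at each step the clone that contains the probe and reaches farthest is chosen. The walk succeeds once it passes the start of the next contig and stalls if no clone extends past the current end.

```python
def walk(library, contig_end, next_contig_start, probe_offset=5_000):
    """Greedy walk across a gap using a library of (start, end) clones in bp.

    Each step places an end probe `probe_offset` bp inside the current end and
    keeps, among clones containing the probe and extending past the current
    end, the one reaching farthest. Returns the clones used, or None if stalled.
    """
    steps, end = [], contig_end
    while end <= next_contig_start:  # not yet overlapping the next contig
        probe = end - probe_offset
        hits = [c for c in library if c[0] <= probe and c[1] > end]
        if not hits:
            return None
        best = max(hits, key=lambda c: c[1])
        steps.append(best)
        end = best[1]
    return steps

library = [(50_000, 99_000), (90_000, 130_000), (95_000, 125_000), (120_000, 165_000)]
print(walk(library, contig_end=100_000, next_contig_start=160_000))
# [(90000, 130000), (120000, 165000)]
```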


Example: The figure illustrates the merging of two contigs by either a single clone or
several walking steps.

[Figure: Closing the Gap between Two Contigs. Top: only one walking step is needed to
bridge the gap between contig X and contig Y. Bottom: four walking steps are needed to
bridge the gap between contig W and contig Z. Legend: clone in a contig; probe used to
find the next clone in a walk; next clone in a walk.]

CAVEAT: A physical map is a very difficult puzzle to complete. As mentioned in the round
table (see pages 108–109 in “Mapping the Genome”), the generic
clone-to-fingerprint-to-contig cycle, which is amenable to automation and improved
data-analysis algorithms, is only a small fraction of the work. The rest of the work
required to close gaps between contigs and to track down inconsistencies such as the
branching of one contig into two or more contigs involves many standard
molecular-biology procedures, which, in the case of the human genome, must be carried
out on an unprecedented scale. It is estimated that the completion of the yeast map
took about 20 person-years of work, and the mapping of each human chromosome will take
about 100 person-years. Further, mapping of human chromosomes presents some new
challenges:

● An average human chromosome is ten times the size of the yeast genome, and the
increased size calls for more efficient mapping strategies, such as working with
larger clones.

● Unlike the genomes of yeast and E. coli, human DNA contains repetitive elements
that require a new fingerprinting strategy to avoid inferring overlaps between clones
containing long repetitive stretches of DNA near their ends.

● Experience has shown that regions containing repetitive sequences are often lost in
the cloning process. Consequently, parts of the puzzle of each human chromosome
may be missing, in which case completion of the map will require specialized
techniques.

These challenges are being met in a variety of ways, including the use of YAC
cloning vectors, which accept DNA inserts eight to ten times larger than the inserts
accepted by cosmids, and the use of STS markers, which, unlike restriction-fragment
fingerprints, identify unique landmarks on the map and therefore eliminate the need
for complicated probabilistic analyses to infer overlap between two clones. ■
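The advantage of STS markers is that overlap detection becomes a set operation rather than a likelihood calculation. In this sketch, which assumes error-free STS screening, each clone is represented by the set of STS markers detected in it, and any two clones sharing a marker are declared to overlap; clone and marker names are hypothetical.

```python
from itertools import combinations

def sts_overlaps(clone_sts):
    """Infer overlapping clone pairs: two clones overlap if they share an STS.

    clone_sts maps clone -> set of STS names detected in it. Because each STS
    is unique in the genome, a shared STS implies shared DNA.
    """
    pairs = []
    for a, b in combinations(sorted(clone_sts), 2):
        shared = clone_sts[a] & clone_sts[b]
        if shared:
            pairs.append((a, b, sorted(shared)))
    return pairs

clone_sts = {"yac1": {"sts1", "sts2"}, "yac2": {"sts2", "sts3"}, "yac3": {"sts3"}}
print(sts_overlaps(clone_sts))
# [('yac1', 'yac2', ['sts2']), ('yac2', 'yac3', ['sts3'])]
```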
