lt04 06cmn


Session 4

Anonymous markers: Microsatellites and RFLPs


The concept of a polymorphic anonymous marker
The restriction fragment polymorphism
Microsatellites
Microsatellites are excellent markers in the absence of very
high throughput techniques
How microsatellites are typed
Session 4
Learning Objectives

Explain the need for anonymous markers in whole genome
studies

Understand how polymorphic anonymous markers are found

Explain why microsatellites are excellent markers in the
absence of very high throughput techniques

Understand how microsatellite markers are used in practice


Polymorphic Markers
In any individual (you, for example) the two chromosomes of
a particular pair (chromosome 6, for example) each originate
from a different parent. Though homologous and on average
99.9% identical, they carry small differences (of many types).

Identified pieces of sequence that are from known locations
(loci) and that are measurably different between different
copies of a chromosome are called polymorphic markers.

Polymorphic markers can be used to trace the inheritance
of a locus and hence neighbouring parts of the
chromosome. This process underpins recombinational
mapping.
Genetic Markers for Trait Mapping
A series of polymorphic markers can be used to build a
framework genetic map, against which disease trait
genes and other features can be located, even if a
genome has not been sequenced.

The first step in the identification of a genetic trait gene
is its mapping to a known segment of the genome in
relation to an established framework of polymorphic
markers.

Given a complete genomic sequence, all markers can
be located exactly and the position of all genes can be
found.
Why are markers useful?
As a result of the Human Genome Project
• Any marker can be mapped to a position in the DNA.
• All the known genes have been located on the sequence.

Once you know (as a result of recombinational analysis) that
a trait gene (such as a disease gene) maps between a pair
of markers (the “interval between flanking markers”), you
know that the trait gene must be one of the genes that are
present between the markers.
See Levran et al. (2005) The BRCA1-interacting helicase BRIP1 is deficient in Fanconi anemia.
Nature Genetics 37:931-933 for a good example of the modern integration of online data and
genetic markers.

Therefore you can locate all of the possible genes that could
be the trait gene by genetic mapping and examining online
HGP data.
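
The lookup described above can be sketched in a few lines of Python. This is an illustration only: the gene names, coordinates and marker positions below are hypothetical placeholders, not real HGP annotation, and the helper `genes_between` is not part of any browser or database API.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int  # first base of the gene on the chromosome (bp)
    end: int    # last base of the gene on the chromosome (bp)


def genes_between(genes, marker_a_bp, marker_b_bp):
    """Return names of genes that overlap the interval between two flanking markers."""
    lo, hi = sorted((marker_a_bp, marker_b_bp))
    return [g.name for g in genes if g.end >= lo and g.start <= hi]


# Hypothetical annotation for one chromosome (made-up names and positions).
annotation = [
    Gene("GENE_A", 1_000_000, 1_050_000),
    Gene("GENE_B", 2_400_000, 2_480_000),
    Gene("GENE_C", 3_900_000, 4_100_000),
    Gene("GENE_D", 7_500_000, 7_620_000),
]

# Hypothetical flanking markers at 2.0 Mb and 4.0 Mb.
candidates = genes_between(annotation, 2_000_000, 4_000_000)
print(candidates)
```

A gene that straddles a marker (GENE_C here) is kept as a candidate, since part of it lies inside the interval.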
DNA Markers
DNA based:

RFLPs            Biallelic. Southern blot, now PCR.

Microsatellites  Multiallelic, highly informative.
                 PCR and electrophoresis.
                 Every 50 kb.

SNPs             Millions per genome, very stable,
                 biallelic: low information content. Can
                 combine into highly informative haplotypes.
“Anonymous” Markers
When detecting DNA differences, it doesn’t matter
whether the sequence difference is functional or not.

It doesn’t matter that you know nothing about the sequence,
except how to detect the alleles.
Hence “anonymous markers” (David Botstein)

What does matter in recombinational mapping?

• The marker must contain unique sequence.
• The sequence must be polymorphic.
• The polymorphism must be detectable by a simple
method (such as PCR and size determination and/or
microarray hybridisation).
• All alleles must be detectable by the same method.
Polymorphic anonymous markers: RFLP
Originally “Restriction Fragment Length Polymorphisms”.
Usually bimorphic (2 forms).

A specific fragment of DNA is flanked by restriction sites. If a
base change (a SNP) creates a new site or abolishes an old
site, or if extra sequence (such as a ‘minisatellite’ repeat) is
inserted between the two sites, then the fragment occurs in
two lengths (it is bimorphic).

Detection originally by Southern blot, now can be done by
PCR. Either way, RFLPs are now considered old-fashioned.
Polymorphic anonymous markers: RFLP
Restriction Fragment Length Polymorphisms. Usually
bimorphic (2 forms).

Restriction sites (e.g. HindIII, AAGCTT)

Base change removes or adds a restriction site

Insertion of extra sequence segment

Detection originally by Southern blot, now can be done by
PCR.
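
How a single base change turns into a fragment length difference can be sketched as follows. This is a minimal illustration, not a lab protocol: the two allele sequences are made-up toy sequences, and the only outside fact used is that HindIII cuts after the first A of its AAGCTT site.

```python
def digest(seq, site="AAGCTT", cut_offset=1):
    """Return fragment lengths after cutting seq at every occurrence of site.

    cut_offset is the position of the cut within the site
    (1 for HindIII, which cuts A^AGCTT).
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [right - left for left, right in zip(bounds, bounds[1:])]


# Toy alleles (hypothetical sequence). Allele 2 differs by one base,
# which abolishes the second HindIII site (AAGCTT -> AAGCTC).
allele_1 = "CCGG" * 5 + "AAGCTT" + "ACGT" * 10 + "AAGCTT" + "TTAA" * 5
allele_2 = "CCGG" * 5 + "AAGCTT" + "ACGT" * 10 + "AAGCTC" + "TTAA" * 5

fragments_1 = digest(allele_1)
fragments_2 = digest(allele_2)
print(fragments_1)  # two sites present: three fragments
print(fragments_2)  # one site lost: two fragments, one of them longer
```

On a Southern blot or after PCR and digestion, the two alleles would be distinguished by the presence of the longer fragment.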
Microsatellite Markers
Useful markers are di-, tri-, and tetra-nucleotide repeats.

These are highly unstable (in population terms).
Any given CA dinucleotide repeat changes at an average
rate of one dinucleotide per 100 generations, generating
new alleles.

CA (=TG) dinucleotide repeats represent 0.25% of the
entire human genome.
(International Human Genome Sequencing Consortium, 2001; Nature)

>10,000 microsatellites are characterised. There are
probably >60,000 polymorphic microsatellites in the
genome, mostly not characterised (but all sequenced).
Microsatellite Markers
Different alleles are different lengths
Variable length short tandem repeats:

dinucleotide repeats (e.g.
CACACACACACACACACACACACACA): commonest and most
polymorphic

trinucleotide repeats (e.g.
GGTGGTGGTGGTGGTGGTGGTGGTGGT)

tetranucleotide repeats (e.g.
CTATCTATCTATCTATCTATCTATCTAT)

All are only useful as markers when they are embedded in
unique sequence
Microsatellite Markers
Revolutionised genetic mapping in humans.

Common and easily recognisable alleles of the marker.

• Primers for PCR recognise the unique flanking sequence
(not the repeat!).
• Microsatellite markers are a subset of
STSs (mentioned before).
• Frequently there are 5 or 6 common alleles of
dinucleotide repeats, so that (in genetic analysis) it is
usually possible to trace the origin of an allele to one
parent. Heterozygosity is usually 0.7 - 0.8 (most
people are heterozygous at a given locus).
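
The link between the number of common alleles and a heterozygosity of 0.7 to 0.8 can be checked with the usual expected-heterozygosity formula, H = 1 - Σ pᵢ². The sketch below assumes random mating (Hardy-Weinberg proportions); the allele frequencies are illustrative, not measured values for any real marker.

```python
def expected_heterozygosity(allele_freqs):
    """Expected fraction of heterozygotes, H = 1 - sum(p_i^2), under random mating."""
    if abs(sum(allele_freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_freqs)


# A biallelic marker (RFLP or SNP) can never exceed 0.5.
h_biallelic = expected_heterozygosity([0.5, 0.5])

# Five equally common alleles (illustrative microsatellite).
h_five_equal = expected_heterozygosity([0.2] * 5)

# Six alleles with uneven, made-up frequencies.
h_six_uneven = expected_heterozygosity([0.40, 0.25, 0.15, 0.10, 0.05, 0.05])

print(round(h_biallelic, 2), round(h_five_equal, 2), round(h_six_uneven, 2))
```

The comparison shows why a multiallelic microsatellite is more informative per marker than a biallelic one: even with uneven frequencies, five or six alleles give a heterozygosity well above the biallelic ceiling of 0.5.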
Details of D3S1217 (again)

Primer 1: TGACAAGTTT AAAGGGTCCC A >>>
Primer 2: TGTCAAAGTC CCCTTCCTTG >>> (reverse complement of the capitalised site on the line ending 66,884,575)

Primer sites are shown in capitals in the genomic sequence:

aagaacccaa tgaggccctc agaggtcatt tcaataggat ttttgggccc 66,884,375
TGACAAGTTT AAAGGGTCCC Agtgtatcta gagacatttc agagaaggat 66,884,425
atgtgtgtgt gtgtatgtgt gtgtgtgtgt gtgtgtgtgt gtgtgtgtgt 66,884,475
gcagtcacag caatgactca ctgggagaac agtgggacaa cagcagccca 66,884,525
aaacctgaca gttaccatag CAAGGAAGGG GACTTTGACA aacggaaccc 66,884,575
tttcatgata atctcaacaa tgtaatctgt aaatgtgttt gaaactccac 66,884,625
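
The way the two primers define the PCR product can be sketched as a small "in silico PCR" on the sequence shown on this slide. This is a simplified illustration that assumes exact primer matches on a single template; the function names are my own and are not taken from any genome browser tool.

```python
SLIDE_SEQUENCE = """
aagaacccaa tgaggccctc agaggtcatt tcaataggat ttttgggccc
TGACAAGTTT AAAGGGTCCC Agtgtatcta gagacatttc agagaaggat
atgtgtgtgt gtgtatgtgt gtgtgtgtgt gtgtgtgtgt gtgtgtgtgt
gcagtcacag caatgactca ctgggagaac agtgggacaa cagcagccca
aaacctgaca gttaccatag CAAGGAAGGG GACTTTGACA aacggaaccc
tttcatgata atctcaacaa tgtaatctgt aaatgtgttt gaaactccac
"""

PRIMER_1 = "TGACAAGTTTAAAGGGTCCCA"
PRIMER_2 = "TGTCAAAGTCCCCTTCCTTG"


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def in_silico_pcr(template, forward, reverse):
    """Return the amplified product, or None if either primer site is missing."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer anneals to the other strand, so on the strand
    # shown we search for its reverse complement downstream of primer 1.
    site = reverse_complement(reverse)
    site_start = template.find(site, start + len(forward))
    if site_start == -1:
        return None
    return template[start:site_start + len(site)]


# Strip the spacing used on the slide before searching.
template = "".join(SLIDE_SEQUENCE.split())
product = in_silico_pcr(template, PRIMER_1, PRIMER_2)
print(len(product), "bp product for the allele shown")
```

Alleles with more or fewer GT/CA repeat units between the primer sites would give products that are longer or shorter in steps of 2 bp, which is what the gel separates.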


Sources of Human DNA
Blood. Normal source. Can do unlimited tests on DNA from
10 ml blood. High molecular weight (HMW) DNA is not usually
obtained.

Buccal epithelial cells. Inside of mouth is brushed with a
special toothbrush. Often preferable as considered
non-invasive. Good for children and in unhygienic
conditions. Not used for HMW DNA.

Blood-spot. Less invasive than a blood sample. Limited
number of tests. DNA will not be HMW.

Purified human leukocytes from blood. Usual source of HMW
DNA for genomic cloning. DNA must be very gently isolated
from undamaged leukocytes. A 50 ml blood sample would be
needed for making a genomic library.

Sperm. DNA very abundant. HMW DNA easily isolated. Now
mainly for (male) recombination studies. Sperm cells are
amplified separately. Each one is the product of a separate
recombination event.
Usual Raw Material: Human Blood

Simple methods devised to lyse anuclear erythrocytes,
collect leucocytes (nucleated) and extract crude DNA.

NB: This material usually consists of small DNA
fragments only. This is adequate for PCR reactions but
cannot be used to construct a genomic library.
Typing microsatellites

Extract DNA from subject. For a whole genome screen to
identify the gene for a novel trait, you need to do perhaps 1000
PCR reactions, so you need tens of micrograms of genomic
DNA, so you need a few ml of blood from each subject.

To establish linkage to known microsatellites you might
need to do only tens of PCR reactions, and stored spots of blood
on filter paper (Guthrie spots) or DNA extracted from buccal
swabs or brushes can suffice.

DNA is extracted and PCR reactions performed using
marker-specific primers.
Typing Microsatellites
[Figure: Allele 1 and Allele 2, each a CA-repeat flanked by
Primer A and Primer B; PCR product 80 to 400 bp. Gel lanes for
father (F), mother (M) and daughter (D), with direction of
migration.]

In this most informative possibility, father and
mother have 4 distinguishable alleles.
With most microsatellite markers, this will happen
>50% of the time.
Typing Microsatellites
[Figure: Allele 1 and Allele 2, each a CA-repeat flanked by
Primer A and Primer B; PCR product 80 to 400 bp. Gel lanes
D, F, M showing “stutter” bands, with direction of migration.]

“Stutter” bands are caused by slippage of polymerase on the
repeat. Tri- and tetra-nucleotide repeats stutter less, but
are less polymorphic.
Distribution of CA repeats in Genome
CA/TG repeats occur roughly 1 per 50,000 nt.
Polymorphisms result from DNA replication errors:
slippage of DNA polymerase (occurs in all eukaryotes).

[Figure: distribution of (CA)≥9 in a 180.0 kb segment of known
sequence (AC007271)]

Rapid evolution (1 change per locus in about 1/1000
meioses): found in all populations.
Framework Map
When mapping a novel trait, people with the trait are usually
scarce and family structures are seldom ideal for analysis.

Traits are mapped against a precise and accurate genomic map
prepared by examining recombinations in families between
polymorphic anonymous markers.

Map distances on the genomic map have been derived from large,
well documented families of many generations.

The order of the markers can be defined precisely from the human
genome sequence.
Framework Map
Centre d’Étude du Polymorphisme Humain (CEPH)

CEPH identified thousands of dinucleotide repeat markers (ca
1990).
CEPH (around 1984) collected 40 large three-generation
families totalling several hundred individuals. Used to map
>10,000 markers genetically. The principle was that the same
families would be used for all mapping.

DNA prepared from EBV-immortalised B cells: effectively
limitless supply of DNA

Resolution average 3 cM
Initial programme of CEPH (1990) explained: Dausset et al., 1990. Genomics 6, 575-577.
Cloning Microsatellite Markers at Random

Select DNA of interest: whole genome, a chromosome, a genomic
clone (PAC, BAC etc.).

Subclone it, array the subclones, hybridise them with
microsatellite core sequence (e.g. oligo-CA).

Select individual clones and sequence the unique flanking
sequences. Design unique primers.

Do PCR reactions on individuals and families.
Catalogue allele sizes.

Deposit data, primer sequences and PCR conditions.
The CEPH High Density Map
Centre d’Étude du Polymorphisme Humain

• Discovered and screened a set of >9000 highly
polymorphic markers: mean frequency of
heterozygotes is 0.64.
• Collected DNA from a collection of large families (CEPH)
for linkage analysis.
• Created a genetic map at 1 cM resolution of the entire
genome.
• Accumulated new markers for further detailed study by
other groups.

http://www.cephb.fr
The CEPH Genetic Map
Resolution of map can be no better than dictated by
recombinational events. In family sizes from CEPH, this is
seldom better than 1 cM.

CA-repeat markers occur about every 50 kb. Many markers are
“binned” together: they are being separated into smaller
bins by physical mapping.

Published in 1993 with 300 markers, 1996 with 3000;
version 8.2 (1998) had >9000 microsatellites out of 12000
total markers.
Example of CEPH Family

61 three-generation families distributed as
immortalised B cell clones.
[Figure: example pedigree; * = needed for phase]
Several families are actually joined.
http://locus.umdnj.edu/nigms/ceph/ceph.html
Important!
Microsatellites are generally used only for tracking a
piece of DNA through a family.
Disease-causing mutations of a trait gene can arise
many times against a background of different
microsatellite alleles.
Microsatellites also change rapidly (1% of meioses).
An allele of a microsatellite close to the trait gene does
not indicate the status of the trait gene, except within a
family.
=> The linked allele is likely to be different in every family
you examine.
Reminder of Principle of Linkage Analysis
For each offspring in a family and for each of several
hundred markers we are asking:

Did the offspring inherit one particular allele of the marker
along with the disease?

Mapping software is used to determine whether there is
significant linkage between each marker and the trait
given that markers are also linked to one another.
Applied Biosystems® (ABI) Standard Primer Sets for
Multiplex Analysis

Standard sets of primers work in PCR reactions under the
same conditions. Primers have fluorescent dye attached.

Colour and size range of PCR product are chosen so that up
to 10 different markers can be pooled from one subject into
the same lane. (see ABI.pdf on WEB CT Vista)

Whole genome sets of 192 pairs give 20 cM and 811 pairs
give 5 cM resolution, respectively. Rough linkage analysis
(20 cM) can be done in 20 gel lanes per subject.
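
The lane count quoted above follows from simple arithmetic, sketched below. The calculation assumes every lane is filled with the maximum of 10 pooled markers, which is a simplification; the function name is illustrative.

```python
import math


def lanes_per_subject(n_primer_pairs, markers_per_lane=10):
    """Minimum number of gel lanes needed to run all markers for one subject."""
    return math.ceil(n_primer_pairs / markers_per_lane)


lanes_20cM = lanes_per_subject(192)  # rough linkage set
lanes_5cM = lanes_per_subject(811)   # dense linkage set

print(lanes_20cM, lanes_5cM)
```

With 192 primer pairs this gives the 20 lanes per subject mentioned on the slide; under the same assumption the 811-pair set would need about four times as many.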
ABI Standard Primer Sets

[Figure: ABI standard primer set chart, annotated with marker
name and map separation (cM)]
Genotyping

[Figure: genotyping output with markers labelled R1, B1, G1,
R2, R3, B2, R4, B3, G2]
Genotyping

[Figure: lanes D, F, M; markers labelled R1, B1, G1, R2, R3,
B2, R4, B3, G2]

Real data for one marker from a
heterozygote. This is a CA repeat. Stutter
bands are marked. Tetranucleotide
repeats show little stutter but also fewer
alleles, less polymorphism.
An informative marker
A marker is informative in a particular family if it is possible
to tell whether a recombination occurred between the marker
and the trait.

Can we tell whether a recombination has or has not occurred
between the marker and the trait?

[Figure: four pedigrees; genotype labels as they appear on the slide]
A1A2 A1A2 A1A2 A1A2
A1A1 A2A2 A1A2 A1A2 A1A2 A1A2 A1A2 A2A2
A1A2 A1A2 A1A1 A2A2

Pedigree 1. Uninformative: the trait came from father, who had two A1 alleles.
Pedigree 2. Uninformative: the trait came from father, but we can’t tell which allele of A is from father.
Pedigree 3. Informative: A1 is inherited from father. Non-recombinant between A and trait.
Pedigree 4. Recombinant: A2 inherited from father.
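
The reasoning behind the four cases can be sketched as a small classifier. Several things here are assumptions rather than statements from the slide: the parental genotypes are my reading of the flattened figure labels, the trait is taken to travel with the father's A1 allele (phase known), and the child is taken to be affected.

```python
def paternal_allele(father, mother, child):
    """Return the allele the child must have received from the father,
    or None if more than one assignment is consistent with the genotypes."""
    first, second = child
    candidates = set()
    for from_father, from_mother in ((first, second), (second, first)):
        if from_father in father and from_mother in mother:
            candidates.add(from_father)
    return candidates.pop() if len(candidates) == 1 else None


def classify(father, mother, child, trait_allele="A1"):
    """Classify one affected child for a trait transmitted by the father.

    trait_allele is the father's marker allele assumed to lie on the
    same chromosome as the trait (phase assumed known).
    """
    if len(set(father)) == 1:
        return "uninformative: father homozygous"
    allele = paternal_allele(father, mother, child)
    if allele is None:
        return "uninformative: paternal allele ambiguous"
    return "non-recombinant" if allele == trait_allele else "recombinant"


# Father, mother, child genotypes for the four pedigrees (assumed reading).
pedigrees = [
    (("A1", "A1"), ("A2", "A2"), ("A1", "A2")),
    (("A1", "A2"), ("A1", "A2"), ("A1", "A2")),
    (("A1", "A2"), ("A1", "A2"), ("A1", "A1")),
    (("A1", "A2"), ("A2", "A2"), ("A2", "A2")),
]
results = [classify(f, m, c) for f, m, c in pedigrees]
for number, result in enumerate(results, start=1):
    print(number, result)
```

A marker with many alleles, such as a microsatellite, makes the two uninformative outcomes much less likely than in this two-allele example.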
Need more markers?
The precision to which an unknown trait locus can be mapped is
limited by rare recombinational events around the locus in the
affected family. Look only at families where recombination has
occurred close to the peak of LOD score.

You can find new markers within the region:

Either use reported microsatellites, or

screen the minimal region of DNA that you have for a
microsatellite core sequence such as (CA)10 that has not been
reported as a marker: this is likely to be polymorphic. Construct
a PCR genotyping assay for yourself.
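
Screening a region for a core sequence such as (CA)10 can be sketched with a regular expression. The sequence below is a made-up toy example, not real genomic sequence, and the function name is illustrative. TG runs are searched as well because a CA repeat on one strand reads as TG on the other.

```python
import re


def find_ca_repeats(seq, min_repeats=10):
    """Return (start, motif, repeat_count) for each CA or TG run of at
    least min_repeats units in seq (0-based start positions)."""
    pattern = re.compile(
        r"(?:CA){%d,}|(?:TG){%d,}" % (min_repeats, min_repeats)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        run = match.group()
        hits.append((match.start(), run[:2], len(run) // 2))
    return hits


# Toy region (hypothetical): one (CA)12, one short (TG)5, one (TG)11.
region = (
    "GATTAGG" + "CA" * 12 + "GGCTTAGC" + "TG" * 5 + "AAT" + "TG" * 11 + "C"
)
hits = find_ca_repeats(region)
print(hits)
```

Each hit is only a candidate marker: primers would still have to be designed in the unique flanking sequence and the locus tested for polymorphism in real individuals.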
Microsatellite Markers in a Megabase

[Figure: tetranucleotide (ATTC), trinucleotide (CCT) and
dinucleotide (TG) repeats in one megabase]

See http://www.ensembl.org
Once you have the map position
Suppose you see, in 30 meioses, that the trait is always
inherited from the affected parent along with the affected
parent’s alleles of D10S1790 and D10S1652. These
markers are about 4 cM apart on the genetic map. (They
are 9.2 Mb apart.)
This means that it is highly likely that the trait gene lies
between these two markers.
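
How strong the evidence from 30 meioses is can be sketched with a two-point LOD score. This is a simplified calculation that assumes every meiosis is fully informative and phase-known; real mapping software handles unknown phase, missing data and several markers at once.

```python
import math


def lod_score(n_meioses, n_recombinants, theta):
    """Two-point LOD score for phase-known, fully informative meioses.

    LOD = log10( L(theta) / L(0.5) ), where
    L(theta) = theta^R * (1 - theta)^(N - R).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    likelihood = (theta ** n_recombinants) * (
        (1.0 - theta) ** (n_meioses - n_recombinants)
    )
    if likelihood == 0.0:
        return float("-inf")
    return math.log10(likelihood / 0.5 ** n_meioses)


# 30 meioses, no recombinants between marker and trait, theta = 0.
lod_no_recombinants = lod_score(30, 0, 0.0)
print(round(lod_no_recombinants, 2))

# A single recombinant rules out theta = 0 but still supports close linkage.
lod_one_recombinant = lod_score(30, 1, 0.05)
print(round(lod_one_recombinant, 2))
```

A LOD score above 3 is the conventional threshold for declaring linkage, so 30 recombination-free meioses give very strong support under these assumptions.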

Go to either of the following servers

http://genome.ucsc.edu
http://www.ensembl.org
Gene annotation
Both browsers will show you a wealth of annotation that has
been mapped to specific sequence on the genome.

This includes known and predicted genes:


• reported cDNA sequences
• spliced expressed sequence tags (ESTs--more later)
• predicted homologues to known genes (human and other).
• genes predicted from splicing patterns and open reading
frames (these are not reliable yet).

Annotation includes links to more data and to data sources.


UCSC Genome Browser
Easy to use genome browser

[Figure: UCSC Genome Browser screenshot showing the Genes track]
Conclusion 4
Any form of common polymorphism can be used for
genetic mapping.

Restriction fragment length polymorphism mapping is
seldom used today, because it is laborious.

Microsatellites are highly polymorphic, frequent and
can be genotyped cheaply in simple apparatus, and the
method can be scaled up. These were used to
establish framework maps of the genome.
They are being superseded by SNPs…
