Wah Lund
Introduction
So far we've focused on inbreeding as one important way that populations may fail to mate
at random, but there's another way in which virtually all populations and species fail to mate
at random. Individuals tend to mate with those that are nearby. Even within a fairly small
area, phenomena like nearest-neighbor pollination in flowering plants or home-site fidelity in
animals can cause mates to be selected in a geographically non-random way. What are the
population genetic consequences of this form of non-random mating?
Well, if you think about it a little, you can probably figure it out. Since individuals that
occur close to one another tend to be more genetically similar than those that occur far
apart, the impacts of local mating will mimic those of inbreeding within a single, well-mixed
population.
A numerical example
For example, suppose we have two subpopulations of green lacewings, one of which occurs
in forests, the other of which occurs in adjacent meadows. Suppose further that within each
subpopulation mating occurs completely at random, but that there is no mating between
forest and meadow individuals. Suppose we've determined allele frequencies in each population at a locus coding for phosphoglucoisomerase (PGI), which conveniently has only two
alleles. The frequency of A1 in the forest is 0.4 and in the meadow is 0.7. We can easily
calculate the expected genotype frequencies within each population, namely

          A1A1    A1A2    A2A2
Forest    0.16    0.48    0.36
Meadow    0.49    0.42    0.09
Suppose, however, we were to consider a combined population consisting of 100 individuals from the forest subpopulation and 100 individuals from the meadow subpopulation.
Then we'd get the following:

               A1A1    A1A2    A2A2
From forest      16      48      36
From meadow      49      42       9
Total            65      90      45
So the frequency of A1 is (2(65) + 90)/(2(65 + 90 + 45)) = 0.55. Notice that this is just
the average allele frequency in the two subpopulations, i.e., (0.4 + 0.7)/2. Since each subpopulation has genotypes in Hardy-Weinberg proportions, you might expect the combined
population to have genotypes in Hardy-Weinberg proportions, but if you did you'd be wrong.
Just look.

                               A1A1           A1A2           A2A2
Expected (from p = 0.55)       (0.3025)200    (0.4950)200    (0.2025)200
                               60.5           99.0           40.5
Observed (from table above)    65             90             45
The expected and observed don't match, even though there is random mating within both
subpopulations. They don't match because there isn't random mating involving the combined population. Forest lacewings choose mates at random from other forest lacewings,
but they never mate with a meadow lacewing (and vice versa). Our sample includes two
populations that don't mix. This is an example of what's known as the Wahlund effect [2].
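The arithmetic in the lacewing example can be reproduced with a short script. This is only a sketch of the calculation described above: the function name `hardy_weinberg` and the dictionary layout are illustrative choices, and, as in the text, it assumes the genotype counts match their expectations exactly (no sampling error).

```python
def hardy_weinberg(p):
    """Return (A1A1, A1A2, A2A2) frequencies for allele frequency p of A1."""
    q = 1.0 - p
    return (p * p, 2.0 * p * q, q * q)

# Allele frequency of A1 and number of individuals sampled, from the example.
subpops = {"forest": (0.4, 100), "meadow": (0.7, 100)}

# Genotype counts contributed by each subpopulation (no sampling error assumed).
counts = {
    name: tuple(n * g for g in hardy_weinberg(p))
    for name, (p, n) in subpops.items()
}

# Observed counts in the combined sample: sum each genotype column.
observed = tuple(sum(col) for col in zip(*counts.values()))
total = sum(observed)

# Frequency of A1 in the combined sample.
p_combined = (2 * observed[0] + observed[1]) / (2 * total)

# Counts expected if the combined sample were in Hardy-Weinberg proportions.
expected = tuple(total * g for g in hardy_weinberg(p_combined))

for label, obs, exp in zip(("A1A1", "A1A2", "A2A2"), observed, expected):
    print(f"{label}: observed {obs:.1f}, expected {exp:.1f}")
```

Changing the two allele frequencies in `subpops` shows how the heterozygote deficit grows as the subpopulations diverge and vanishes when they are identical.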
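In general, with k subpopulations contributing equally to the combined sample and $p_i$ the frequency of A1 in the ith subpopulation ($q_i = 1 - p_i$), the same comparison reads:

Genotype    Expected              Observed
A1A1        $\bar{p}^2$           $\frac{1}{k}\sum_i p_i^2$
A1A2        $2\bar{p}\bar{q}$     $\frac{1}{k}\sum_i 2 p_i q_i$
A2A2        $\bar{q}^2$           $\frac{1}{k}\sum_i q_i^2$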
2. For the time being, I'm going to assume that we know the allele frequencies without error, i.e., that
we didn't have to estimate them from data. Next time we'll deal with real life, i.e., how we can detect the
Wahlund effect when we have to estimate allele frequencies from data.
3. We'd get the same result by relaxing this assumption, but the algebra gets messier, so why bother?
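where $\bar{p} = \frac{1}{k}\sum_i p_i$ and $\bar{q} = 1 - \bar{p}$. Now

\begin{align}
\frac{1}{k}\sum_i p_i^2 &= \frac{1}{k}\sum_i (p_i - \bar{p} + \bar{p})^2 \tag{1} \\
  &= \frac{1}{k}\sum_i \left[ (p_i - \bar{p})^2 + 2\bar{p}(p_i - \bar{p}) + \bar{p}^2 \right] \tag{2} \\
  &= \frac{1}{k}\sum_i (p_i - \bar{p})^2 + \bar{p}^2 \tag{3} \\
  &= \mathrm{Var}(p) + \bar{p}^2 \tag{4}
\end{align}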
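Similarly,

\begin{align}
\frac{1}{k}\sum_i 2 p_i q_i &= 2\bar{p}\bar{q} - 2\mathrm{Var}(p) \tag{5} \\
\frac{1}{k}\sum_i q_i^2 &= \bar{q}^2 + \mathrm{Var}(p) \tag{6}
\end{align}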
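For readers who want the "similarly" step spelled out, here is one way to get equations (5) and (6) from equation (4). It uses only $q_i = 1 - p_i$ and the fact that the three genotype frequencies sum to one; it is a sketch of one possible route, not necessarily the one intended.

```latex
\begin{align*}
\frac{1}{k}\sum_i 2 p_i q_i
  &= \frac{1}{k}\sum_i \left( 2 p_i - 2 p_i^2 \right)
   = 2\bar{p} - 2\left( \mathrm{Var}(p) + \bar{p}^2 \right) \\
  &= 2\bar{p}(1 - \bar{p}) - 2\mathrm{Var}(p)
   = 2\bar{p}\bar{q} - 2\mathrm{Var}(p) \\[1ex]
\frac{1}{k}\sum_i q_i^2
  &= 1 - \frac{1}{k}\sum_i p_i^2 - \frac{1}{k}\sum_i 2 p_i q_i \\
  &= 1 - \bar{p}^2 - \mathrm{Var}(p) - 2\bar{p}\bar{q} + 2\mathrm{Var}(p)
   = \bar{q}^2 + \mathrm{Var}(p)
\end{align*}
```

The last step uses $1 - \bar{p}^2 - 2\bar{p}\bar{q} = \bar{q}^2$.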
Since Var(p) ≥ 0 by definition, with equality holding only when all subpopulations have
the same allele frequency, we can conclude that

- Homozygotes will be more frequent and heterozygotes will be less frequent than expected based on the allele frequency in the combined population.
- The magnitude of the departure from expectations is directly related to the magnitude of the variance in allele frequencies across populations, Var(p).
- The effect will apply to any mixing of samples in which the subpopulations combined have different allele frequencies (footnote 4).
- The same general phenomenon will occur if there are multiple alleles at a locus, although it is possible for one or a few heterozygotes to be more frequent than expected if there is positive covariance in the constituent allele frequencies across populations (footnote 5).
- The effect is analogous to inbreeding. Homozygotes are more frequent and heterozygotes are less frequent than expected (footnote 6).
4. For example, if we combine samples from different years or across age classes of long-lived organisms, we
may see a deficiency of heterozygotes in the sample purely as a result of allele frequency differences across
years.
5. If you're curious about this, feel free to ask, but I'll have to dig out my copy of Li [1] to answer. I don't
carry those details around in my head.
6. And this is what we predicted when we started.
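For the lacewing example,

\begin{align}
\bar{p} &= (0.4 + 0.7)/2 = 0.55 \tag{7} \\
\mathrm{Var}(p) &= \frac{1}{2}\left[ (0.4 - 0.55)^2 + (0.7 - 0.55)^2 \right] = 0.0225 \tag{8}
\end{align}

Genotype    Expected    Observed
A1A1        0.3025      + 0.0225     = 0.3250
A1A2        0.4950      - 2(0.0225)  = 0.4500
A2A2        0.2025      + 0.0225     = 0.2250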
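Equations (4)-(6) can also be checked numerically for any number of subpopulations. The sketch below computes the combined genotype frequencies two ways, once from the mean and variance of the allele frequencies and once by averaging the within-subpopulation Hardy-Weinberg frequencies directly. The function names are made up for illustration. It assumes, as the text does, that every subpopulation contributes equally and that allele frequencies are known without error.

```python
def wahlund_frequencies(ps):
    """Combined genotype frequencies via equations (4)-(6).

    ps: frequency of A1 in each of k subpopulations, each assumed to be
    in Hardy-Weinberg proportions and to contribute equally.
    Returns (p_bar, var_p, (A1A1, A1A2, A2A2)).
    """
    k = len(ps)
    p_bar = sum(ps) / k
    q_bar = 1.0 - p_bar
    # Variance with divisor k, matching the definition used in equation (4).
    var_p = sum((p - p_bar) ** 2 for p in ps) / k
    frequencies = (
        p_bar ** 2 + var_p,
        2.0 * p_bar * q_bar - 2.0 * var_p,
        q_bar ** 2 + var_p,
    )
    return p_bar, var_p, frequencies


def direct_average(ps):
    """Average the within-subpopulation Hardy-Weinberg frequencies directly."""
    k = len(ps)
    return (
        sum(p * p for p in ps) / k,
        sum(2.0 * p * (1.0 - p) for p in ps) / k,
        sum((1.0 - p) ** 2 for p in ps) / k,
    )


p_bar, var_p, via_variance = wahlund_frequencies([0.4, 0.7])
via_average = direct_average([0.4, 0.7])
print("mean:", p_bar, "variance:", var_p)
print("from equations (4)-(6):", via_variance)
print("direct average:        ", via_average)
```

The two routes should agree up to floating-point rounding for any list of allele frequencies between 0 and 1.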
Wright's F-statistics
One limitation of the way I've described things so far is that Var(p) doesn't provide a
convenient way to compare population structure from different samples. Var(p) can be
much larger if both alleles are about equally common in the whole sample than if one occurs
at a mean frequency of 0.99 and the other at a frequency of 0.01. Moreover, if you stare at
equations (4)-(6) for a while, you begin to realize that they look a lot like some equations
we've already encountered. Namely, if we were to define $F_{st}$ (footnote 7) as $\mathrm{Var}(p)/(\bar{p}\bar{q})$, then we could
rewrite equations (4)-(6) as
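\begin{align}
\frac{1}{k}\sum_i p_i^2 &= \bar{p}^2 + F_{st}\bar{p}\bar{q} \tag{9} \\
\frac{1}{k}\sum_i 2 p_i q_i &= 2\bar{p}\bar{q}(1 - F_{st}) \tag{10} \\
\frac{1}{k}\sum_i q_i^2 &= \bar{q}^2 + F_{st}\bar{p}\bar{q} \tag{11}
\end{align}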
And it's not even completely artificial to define $F_{st}$ the way I did. After all, the effect of
geographic structure is to cause matings to occur among genetically similar individuals. It's
rather like inbreeding. Moreover, the extent to which this local mating matters depends on
the extent to which populations differ from one another. $\bar{p}\bar{q}$ is the maximum allele frequency
variance possible, given the observed mean frequency. So one way of thinking about $F_{st}$ is
that it measures the amount of allele frequency variance in a sample relative to the maximum
possible (footnote 8).
7. The reason for the subscript will become apparent later. It's also very important to notice that I'm
defining $F_{st}$ here in terms of the population parameters $\bar{p}$ and Var(p). Again, we'll return to the problem
of how to estimate $F_{st}$ from data next time.
8. I say "one way" because there are several other ways to talk about $F_{st}$, too. But we won't talk about
them until later.
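A small sketch may help make the "relative to the maximum possible" reading concrete. It computes F_st as Var(p) divided by the product of the mean allele frequencies, treating the subpopulation frequencies as known parameters rather than estimates, exactly as the definition above does. The function name `fst_from_parameters` is hypothetical, and equal contributions from every subpopulation are assumed.

```python
def fst_from_parameters(ps):
    """F_st = Var(p) / (p_bar * q_bar) for known subpopulation frequencies.

    Assumes equal contributions from every subpopulation and treats the
    frequencies as parameters, not estimates from sampled data.
    """
    k = len(ps)
    p_bar = sum(ps) / k
    q_bar = 1.0 - p_bar
    if p_bar * q_bar == 0.0:
        raise ValueError("F_st is undefined when one allele is fixed everywhere")
    # Variance with divisor k, as in equation (4).
    var_p = sum((p - p_bar) ** 2 for p in ps) / k
    return var_p / (p_bar * q_bar)


lacewings = fst_from_parameters([0.4, 0.7])    # the forest/meadow example
complete = fst_from_parameters([0.0, 1.0])     # subpopulations fixed for different alleles
identical = fst_from_parameters([0.55, 0.55])  # no differentiation at all
print(lacewings, complete, identical)
```

The two extreme cases bracket the statistic: identical subpopulations give 0, and subpopulations fixed for alternative alleles reach the maximum variance and give 1. The lacewing example falls at roughly 0.09.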
There may, of course, be inbreeding within populations, too. But it's easy to incorporate this into the framework as well. Let $H_i$ be the actual heterozygosity in individuals
within subpopulations, $H_s$ be the expected heterozygosity within subpopulations assuming
Hardy-Weinberg within populations, and $H_t$ be the expected heterozygosity in the combined population assuming Hardy-Weinberg over the whole sample. Then thinking of f
as a measure of departure from Hardy-Weinberg and assuming that all populations depart
from Hardy-Weinberg to the same degree, i.e., that they all have the same f, we can define

\[ F_{it} = 1 - \frac{H_i}{H_t} \]

so that

\[ 1 - F_{it} = \frac{H_i}{H_t} = \frac{H_i}{H_s} \cdot \frac{H_s}{H_t} = (1 - F_{is})(1 - F_{st}) \]

where $F_{is}$ is the inbreeding coefficient within populations, i.e., f, and $F_{st}$ has the same
definition as before. $H_t$ is often referred to as the genetic diversity in a population. So
another way of thinking about $F_{st} = (H_t - H_s)/H_t$ is that it's the proportion of the diversity
in the sample that's due to allele frequency differences among populations.
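The relationship among the three heterozygosities can be illustrated with a short calculation. This is a sketch under stated assumptions: it takes F_is = 1 - H_i/H_s, F_st = 1 - H_s/H_t, and F_it = 1 - H_i/H_t, assumes equal contributions from every subpopulation, and applies the same within-population inbreeding coefficient f everywhere. The value f = 0.1 is purely hypothetical and is not taken from any data in these notes.

```python
def f_statistics(ps, f):
    """Return (F_is, F_st, F_it) from allele frequencies and a shared f.

    ps: frequency of A1 in each subpopulation (equal contributions assumed).
    f:  within-population inbreeding coefficient, assumed identical in
        every subpopulation (hypothetical input for illustration).
    """
    k = len(ps)
    p_bar = sum(ps) / k
    # Expected heterozygosity in the combined population under Hardy-Weinberg.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    # Mean expected heterozygosity within subpopulations under Hardy-Weinberg.
    h_s = sum(2.0 * p * (1.0 - p) for p in ps) / k
    # Actual heterozygosity: inbreeding reduces each subpopulation's by (1 - f).
    h_i = (1.0 - f) * h_s
    f_is = 1.0 - h_i / h_s
    f_st = 1.0 - h_s / h_t
    f_it = 1.0 - h_i / h_t
    return f_is, f_st, f_it


f_is, f_st, f_it = f_statistics([0.4, 0.7], f=0.1)
print("F_is:", f_is, "F_st:", f_st, "F_it:", f_it)
print("1 - F_it:              ", 1.0 - f_it)
print("(1 - F_is)(1 - F_st):  ", (1.0 - f_is) * (1.0 - f_st))
```

With f set to 0 the function reduces to the random-mating case, where F_it equals F_st.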
References
[1] C. C. Li. First Course in Population Genetics. Boxwood Press, Pacific Grove, CA, 1976.
[2] S. Wahlund. Zusammensetzung von population und korrelationserscheinung vom standpunkt der vererbungslehre aus betrachtet. Hereditas, 11:65-106, 1928.