PR Student Manual

Students Solutions Manual
Probability and Statistics

This manual contains solutions to odd-numbered exercises from the book Probability and Statistics by
Miroslav Lovric, published by Nelson Publishing.
Keep in mind that the solutions provided represent one way of answering a question or solving an
exercise. In many cases there are alternatives, so make sure that you don't dismiss your solution just
because it does not look like the solution in this manual.
This solutions manual is not meant to be read! Think, try to solve an exercise on your own, investigate
different approaches, experiment, see how far you get. If you get stuck and don't know how to proceed,
try to understand why you are having difficulties before looking up the solution in this manual. If you
just read a solution you might fail to recognize the hard part(s); even worse, you might completely
miss the point of the exercise.
I accept full responsibility for errors in this text and will be grateful to anybody who brings them to
my attention. Your comments and suggestions will be greatly appreciated.
Miroslav Lovric September 2011
Department of Mathematics and Statistics
McMaster University
e-mail: [email protected]
Section 2 [Solutions] P1-1
Section 2 Stochastic Models
1. (a) The deterministic part $p_{t+1} = p_t$ models a population which does not change in size (a dead
lion is immediately replaced by another lion). The stochastic term $I_t$ represents a possible influx of 6
new lions per year. There is a 50% chance that the influx (and thus an increase in population) occurs
in any given year. To make a prediction, it is reasonable to assume that in a period of 10 years, an
influx of 6 new lions will occur in 5 years. Thus, the most likely value for $p_{10}$ is $100 + 5 \cdot 6 = 130$.
The most likely values are those close to the 50-50 split: 4, 5, or 6 years with an influx of 6 new lions per
year. So, the three most likely values for $p_{10}$ are 124, 130, and 136.
(b) Assume that heads (H) means an influx of 6 new lions, and tails (T) represents no influx.
First simulation: HTHTHHHTTT; the corresponding values of $p_t$, starting with $p_0 = 100$, are
100, 106, 106, 112, 112, 118, 124, 130, 130, 130, 130.
Second simulation: HTTTHTHHHT; the corresponding values of $p_t$, starting with $p_0 = 100$, are
100, 106, 106, 106, 106, 112, 112, 118, 124, 130, 130.
Third simulation: TTTHHTTTHT; the corresponding values of $p_t$, starting with $p_0 = 100$, are
100, 100, 100, 100, 106, 112, 112, 112, 112, 118, 118.
(c) The two extreme cases are: no immigration in any of the 10 years (in which case $p_{10} = 100$)
and immigration in every year (in which case $p_{10} = 160$). In between are the cases of immigration
occurring anywhere from once in 10 years to nine times in 10 years. Each immigration year adds 6 lions,
so the values of $p_{10}$ (and thus the sample space) are
100, 106, 112, 118, 124, 130, 136, 142, 148, 154, 160.
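The counting in parts (a) and (c) can be checked with a short script. This is only a sketch, assuming (as the solution does) that each of the 10 years is an independent 50-50 event; the names `P0`, `INFLUX`, and `YEARS` are illustrative and not taken from the textbook.

```python
from math import comb

P0 = 100      # initial population
INFLUX = 6    # lions added in an immigration year
YEARS = 10

# Probability that exactly k of the 10 years are immigration years,
# keyed by the resulting population p_10 = P0 + 6k.
distribution = {
    P0 + INFLUX * k: comb(YEARS, k) / 2**YEARS
    for k in range(YEARS + 1)
}

sample_space = sorted(distribution)
# Three values of p_10 with the highest probability.
most_likely = sorted(distribution, key=distribution.get, reverse=True)[:3]

print(sample_space)
print(sorted(most_likely))
```

The sample space has 11 elements (one for each possible number of immigration years), and the three most probable values correspond to 4, 5, and 6 immigration years.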
3. (a) There is a 50% chance that $m_1 = 2$ and a 50% chance that $m_1 = 1$. If $m_1 = 2$, then there is
a 50% chance that $m_2 = 2 m_1 = 4$ and a 50% chance that $m_2 = 1 \cdot m_1 = 2$. If $m_1 = 1$, then there is
a 50% chance that $m_2 = 2 m_1 = 2$ and a 50% chance that $m_2 = 1 \cdot m_1 = 1$. Thus, there are three
outcomes for $m_2$: 4, 2, and 1. The value $m_2 = 4$ can happen in only one way; $m_2 = 2$ can happen
in two ways; $m_2 = 1$ can happen in one way. Thus, the chance that $m_2 = 1$ is 1/4. (For the record:
the chance that $m_2 = 4$ is 1/4, and the chance that $m_2 = 2$ is 2/4 = 1/2.)
(b) To get $m_4$, we have to multiply $m_0 = 1$ a total of four times by a combination of the
two factors 2 and 1. If we multiply 1 by 2 four times, we get $m_4 = 16$. If we multiply 1 by 2 three
times, then the fourth multiplication is by 1; we get $m_4 = 8$. If we multiply 1 by 2 two times, the
remaining two multiplications are by 1; we get $m_4 = 4$. If we multiply 1 by 2 once, the remaining
three factors are 1 and we get $m_4 = 2$. Finally, if we multiply 1 by 1 four times, we get $m_4 = 1$.
Thus, the sample space for $m_4$ is the set $\{1, 2, 4, 8, 16\}$.
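The case analysis in (a) and (b) amounts to enumerating all equally likely sequences of factors. A minimal sketch, assuming the factors 2 and 1 each occur with a 50% chance at every step (the helper name `outcome_counts` is hypothetical):

```python
from collections import Counter
from itertools import product
from math import prod

FACTORS = (2, 1)   # each step multiplies by 2 or by 1
M0 = 1             # initial value m_0

def outcome_counts(steps):
    """Count how many equally likely factor sequences lead to each value of m_steps."""
    return Counter(M0 * prod(seq) for seq in product(FACTORS, repeat=steps))

counts_m2 = outcome_counts(2)
counts_m4 = outcome_counts(4)

print(dict(counts_m2))      # number of ways to reach each value of m_2
print(sorted(counts_m4))    # sample space for m_4
```

Dividing each count by the total number of sequences ($2^2 = 4$ for $m_2$) gives the probabilities quoted in part (a).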
5. (a) The deterministic part $p_{t+1} = p_t$ models a population which does not change in size (a dead
leopard is immediately replaced by another leopard). The stochastic term $I_t$ represents the change in
the number of leopards. There is a 75% chance that the influx (i.e., an increase in the population by
3 leopards) occurs in any given year. With a 25% chance, 3 leopards leave in any given year.
(b) Take a four-year interval. In three of the four years, we expect an influx of 3 leopards per year. In
one of the four years, we expect that 3 leopards will leave. Thus, the total change in population in the
four years is $3 + 3 + 3 - 3 = 6$ leopards; equivalently, the increase in population is, on average, $6/4 = 1.5$
leopards per year. Thus, we predict that in 10 years the population will increase by 15 leopards.
In the long term, the population of leopards will increase (at an average of 1.5 leopards per year).
(c) We declare that diamonds (D for decrease) represent 3 leopards leaving in a given year, and the
remaining three suits (spades, hearts, and clubs; call them I for increase) represent an influx of 3
leopards in a given year. Assuming that the deck of cards is complete and fair, the chance of picking
a diamond is 1/4 = 25%.
First simulation: DIIIDIIIII; the corresponding values of $p_t$, starting with $p_0 = 100$, are
100, 97, 100, 103, 106, 103, 106, 109, 112, 115, 118.
Second simulation: DIDDIIIDII; the corresponding values of $p_t$, starting with $p_0 = 100$, are
100, 97, 100, 97, 94, 97, 100, 103, 100, 103, 106.
Third simulation: IIIDDDDDID; the corresponding values of $p_t$, starting with $p_0 = 100$, are
100, 103, 106, 109, 106, 103, 100, 97, 94, 97, 94.
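The simulations in (c) and the average in (b) can be replayed mechanically. The sketch below assumes the encoding used in the solution (I adds 3 leopards, D removes 3); the function name `trajectory` is illustrative only.

```python
def trajectory(draws, start=100, step=3):
    """Replay a string of draws: 'I' adds `step` leopards, 'D' removes `step`."""
    values = [start]
    for draw in draws:
        values.append(values[-1] + (step if draw == "I" else -step))
    return values

# Average yearly change: +3 with a 75% chance, -3 with a 25% chance.
expected_change_per_year = 0.75 * 3 + 0.25 * (-3)

first = trajectory("DIIIDIIIII")
second = trajectory("DIDDIIIDII")
third = trajectory("IIIDDDDDID")

print(expected_change_per_year)
print(first)
print(second)
print(third)
```

Replacing the fixed strings with draws from `random.choices("ID", weights=[3, 1], k=10)` would produce fresh simulations under the same assumptions.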
7. (a) We use a deck of cards and declare that one suit (say, diamonds) represents the no-immigration
year, and the remaining three suits (spades, hearts, and clubs) represent immigration of 12 new lions
in a year. Assuming that the four suits are equally likely to be drawn, the chance of one suit (say,
diamonds) to be picked is 1/4 = 25%.
An alternative is to use a mechanism capable of randomly generating numbers between 0 and
99 (there are 100 outcomes). We declare any number between 0 and 24 (total of 25 numbers) to
represent no-immigration, and the remaining 75 numbers (from 25 to 99) to represent immigration.
(This mechanism could be software or home-made: we could write the numbers on pieces of paper,
place them in a bowl and randomly pick a number, keeping in mind that we have to return the number
back into the bowl before picking another number.)
(b) In our simulation, we obtained the following immigration values $I_0, \ldots, I_5$ (as used in the
calculations below): 0, 12, 12, 12, 0, 12. The corresponding number of lions is, starting with
$p_0 = 160$ (we perform calculations using decimal numbers, and round off when we are done):
$p_1 = 0.95 p_0 + I_0 = 0.95(160) + 0 = 152$
$p_2 = 0.95 p_1 + I_1 = 0.95(152) + 12 = 156.40$
$p_3 = 0.95 p_2 + I_2 = 0.95(156.40) + 12 = 160.58$
$p_4 = 0.95 p_3 + I_3 = 0.95(160.58) + 12 = 164.55$
$p_5 = 0.95 p_4 + I_4 = 0.95(164.55) + 0 = 156.32$
$p_6 = 0.95 p_5 + I_5 = 0.95(156.32) + 12 = 160.50$
Thus, $p_6 = 160$ (or $p_6 = 161$). We expect $p_6$ to be larger than the values in Figure 2.1, since the
chance of immigration is higher (75%, compared to 50%).
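The recursion $p_{t+1} = 0.95 p_t + I_t$ is easy to iterate in code. This sketch uses the immigration values that appear in the calculation above (0, 12, 12, 12, 0, 12) and keeps full precision until the end, so intermediate values can differ from the two-decimal figures in the last digit.

```python
SURVIVAL = 0.95
immigration = [0, 12, 12, 12, 0, 12]   # I_0, ..., I_5 from the worked calculation

population = [160.0]                   # p_0
for influx in immigration:
    # p_{t+1} = 0.95 * p_t + I_t
    population.append(SURVIVAL * population[-1] + influx)

for t, p in enumerate(population):
    print(t, round(p, 2))
```

Swapping the fixed list for random draws (12 with a 75% chance, 0 otherwise) turns this into a simulation of the model.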
9. (a) The distribution of genotypes among the first generation is: 1/4 of all offspring are AA, 1/2 of
all offspring are AB, and 1/4 of all offspring are BB.
(b) The ratio of genotype BB offspring in the second generation is: 1/4 (since all offspring of a
genotype BB plant are of genotype BB) + (1/4)(1/2) (since one quarter of offspring of genotype AB
parents are of genotype BB). Thus, in the second generation, $1/4 + (1/4)(1/2) = 3/8$ of all offspring
are BB. For AA offspring, we use exactly the same reasoning; the ratio is 3/8 as well. The ratio of AB
offspring is 1 minus the sum of the ratios of AA and BB offspring, which is $1 - 3/8 - 3/8 = 2/8 = 1/4$.
(c) We continue in the same way: all BB plants and 1/4 of AB plants from the second generation will
produce BB offspring. Thus, the ratio of BB offspring in the third generation is $3/8 + (1/4)(1/4) = 7/16$.
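The generation-by-generation bookkeeping can be written as a single update rule. The sketch below assumes the model implied by the solution: AA and BB plants produce only their own genotype, while offspring of AB plants are 1/4 AA, 1/2 AB, and 1/4 BB. The function name `next_generation` is illustrative.

```python
from fractions import Fraction

def next_generation(aa, ab, bb):
    """One generation step: AA -> AA, BB -> BB, AB -> 1/4 AA, 1/2 AB, 1/4 BB."""
    quarter = Fraction(1, 4)
    return aa + quarter * ab, ab / 2, bb + quarter * ab

first = (Fraction(1, 4), Fraction(1, 2), Fraction(1, 4))
second = next_generation(*first)
third = next_generation(*second)

print(second)   # ratios (AA, AB, BB) in the second generation
print(third)    # ratios (AA, AB, BB) in the third generation
```

Using `Fraction` keeps the ratios exact, so they can be compared directly with the hand calculation.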
11. (a) All offspring of AA and BB parents are of genotype AB, and so have long ears. Thus the
chance that an offspring of AA and BB parents has short ears is 0%.
(b) Making all possible combinations, we get AB, AB, BB, BB. Thus, an offspring of AB and BB
parents is of genotype AB (with a chance of 50%) or of genotype BB (with a chance of 50%). Thus,
the chance that an offspring of AB and BB parents is BB, i.e., has short ears, is 50%.
13. Denote by $p_t$ the chance that the molecule is still inside the region after $t$ hours.
Thus, $p_0 = 1$ (initially, the molecule is inside the region). After one hour, the molecule is still
inside the region with a chance of 75%. Thus, $p_1 = 0.75$. After two hours, the molecule is still
inside the region if it was inside the region during the first hour and during the second hour. Thus,
$p_2 = 0.75 \cdot 0.75 = 0.75 p_1$
Continuing in the same way, we obtain the dynamical system $p_{t+1} = 0.75 p_t$, whose solution is $p_t = 0.75^t$. From
$0.75^t < 0.1$
$t \ln 0.75 < \ln 0.1$
$t > \frac{\ln 0.1}{\ln 0.75}$ (the inequality reverses because $\ln 0.75 < 0$)
$t > 8.0039$
we conclude that the chance the molecule is still inside the region falls below 10% just after 8 hours.
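The threshold can be confirmed numerically. A small sketch, using only the decay factor 0.75 and the 10% level from the exercise:

```python
import math

RETENTION = 0.75   # chance of staying inside the region during one hour
LEVEL = 0.1        # target probability (10%)

# Solve 0.75**t < 0.1 for t; dividing by ln(0.75) < 0 flips the inequality.
threshold = math.log(LEVEL) / math.log(RETENTION)
first_whole_hour = math.floor(threshold) + 1

print(threshold)
print(RETENTION**8, RETENTION**9)
print(first_whole_hour)
```

At exactly 8 hours the probability is still marginally above 10%; if time is measured in whole hours, 9 is the first hour at which it is below.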
15. The chance that the molecule is inside the region after 2 minutes is $0.25 \cdot 0.25 = 0.0625$ (the
molecule needs to be inside the region during the first minute and during the second minute). The
chance that the molecule is inside the region after 3 minutes is $0.25 \cdot 0.25 \cdot 0.25 = 0.015625$, i.e.,
about 1.56%.
17. (a) By adding $-1$ and $1$ to all elements of the sample space at time $t$ we obtain the sample space
at time $t + 1$. When $t = 0$, the sample space is $\{0\}$. When $t = 1$, the sample space is $\{-1, 1\}$. When
$t = 2$, the sample space is $\{-2, 0, 2\}$. When $t = 3$, the sample space is $\{-3, -1, 1, 3\}$. When $t = 4$, the
sample space is $\{-4, -2, 0, 2, 4\}$. When $t = 5$, the sample space is $\{-5, -3, -1, 1, 3, 5\}$.
(b) Continuing part (a), we find the sample space at time $t = 6$ to be $\{-6, -4, -2, 0, 2, 4, 6\}$.
(c) Looking at the pattern in (a) and (b), we see that the sample space at time $t$ (i.e., after $t$ steps
have been completed) is the set $\{-t, -t + 2, -t + 4, \ldots, t - 4, t - 2, t\}$.
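The construction in (a), adding $-1$ and $1$ to every element of the previous sample space, translates directly into a loop. A minimal sketch (the function name `walk_sample_space` is illustrative):

```python
def walk_sample_space(t):
    """Positions reachable after t steps of a walk that moves -1 or +1 each step."""
    positions = {0}
    for _ in range(t):
        positions = {x + step for x in positions for step in (-1, 1)}
    return sorted(positions)

for t in range(7):
    print(t, walk_sample_space(t))
```

The output matches the pattern in (c): every second integer from $-t$ to $t$.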
Section 3 Basics of Probability Theory
1. Examples of experiments whose sample space consists of three simple events that are not equally
likely: (1) Modify the random walk routine: assume that a particle moves from its present position to
the left for one unit of distance with a 70% chance, to the right for one unit of distance with a 20%
chance, and remains where it is with a 10% chance. Declare the outcome of the experiment to be the
location of the particle starting at $x = 0$ after one step of this modified random walk. The sample space
is $S = \{-1, 0, 1\}$, and the three simple events occur with different probabilities. (2) Only one of three
molecules diffuses out of a cell. Molecule A diffuses out with probability 0.4, molecule B diffuses out
with probability 0.1, and molecule C diffuses out with probability 0.5. (3) A wolf is hunting for food.
It catches a rabbit with probability 0.4, a mouse with probability 0.5, and does not catch anything
with probability 0.1.
Examples of experiments whose sample space consists of three simple events that are equally likely:
(1) Modify the random walk routine: with equal probability (1/3) the particle moves to the right, to
the left, or stays where it is. Declare the outcome of the experiment to be the location of the particle
starting at $x = 0$ after one step of this modified random walk. The sample space is $S = \{-1, 0, 1\}$, and
the three outcomes have equal chance of occurring. (2) Only one of three molecules diffuses out of a
cell. Each molecule diffuses with probability 1/3. (3) A person randomly picks one of the three
flights available from Toronto to Vancouver.
3. The sample space S consists of all mutual products of numbers 1, 2, 3, 4, 5 and 6: Multiplying this
sequence by 1, we get 1, 2, 3, 4, 5, 6; multiplying by 2, we get 2, 4, 6, 8, 10, 12; multiplying by 3, we get
3, 6, 9, 12, 15, 18; multiplying by 4, we get 4, 8, 12, 16, 20, 24; multiplying by 5, we get 5, 10, 15, 20, 25, 30;
multiplying by 6, we get 6, 12, 18, 24, 30, 36. Listing each number once, we write the sample space as
S = {1, 2, 3, 4, 5, 6, 8, 9, 10, 12, 15, 16, 18, 20, 24, 25, 30, 36}. Counting the number of elements in S, we
see that |S| = 18.
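The list of distinct products can be generated directly. A short sketch, assuming (as in the solution) that the outcome is the product of two numbers from 1 to 6:

```python
# All distinct products m * n with m and n ranging over 1..6.
sample_space = sorted({m * n for m in range(1, 7) for n in range(1, 7)})

print(sample_space)
print(len(sample_space))
```

Using a set removes the repeated products (such as 6 = 1 * 6 = 2 * 3) automatically.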
5. The sample space S consists of all numbers from 0 to 8. Its size is |S| = 9.
7. Let's look at a small value for $n$ first, say $n = 3$. To construct the sample space, we think of forming
three-letter sequences where each letter is either H or T; for instance, HTH, TTH, and so on. We
have two choices for the first letter (H or T), two choices for the second letter, and two choices for the
third letter. Thus, the total number of three-letter sequences is $2 \cdot 2 \cdot 2 = 2^3 = 8$. By reasoning in the
same way, we conclude that the sample space for the experiment of tossing a coin $n$ times consists of
$2^n$ elements.
9. Four years: the sample space consists of all four-letter sequences of letters, where each letter is either
I or N. Because we have two choices for each letter, the total number of sequences is $2 \cdot 2 \cdot 2 \cdot 2 = 2^4 = 16$.
Its elements are: IIII, IIIN, IINI, INII, IINN, INNI, ININ, INNN, NIII, NIIN, NINI, NNII, NINN,
NNNI, NNIN, NNNN. (This, and the next part of the question, are exercises in organizing the list:
note that the first eight elements have I as the first letter, and the remaining eight elements were
obtained from those by changing that first letter from I to N.)
Five years: the sample space consists of all five-letter sequences of letters, where each letter is either
I or N. Since we have two choices for each letter, the total number of sequences is $2 \cdot 2 \cdot 2 \cdot 2 \cdot 2 = 2^5 = 32$.
To obtain a list of all elements, we use the list for the four years and prepend I as the first letter,
and then use the same list and prepend N as the first letter: IIIII, IIIIN, IIINI, IINII, IIINN, IINNI,
IININ, IINNN, INIII, INIIN, ININI, INNII, ININN, INNNI, INNIN, INNNN, NIIII, NIIIN, NIINI,
NINII, NIINN, NINNI, NININ, NINNN, NNIII, NNIIN, NNINI, NNNII, NNINN, NNNNI, NNNIN,
NNNNN.
Reasoning in the same way, we conclude that the sample space for $n$ years of the immigration/
no-immigration dynamics contains $2^n$ elements.
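Listing the sequences by hand is an exercise in organization; a program can produce them in a fixed order. The sketch below is one way to do it with the standard library, and it also mirrors the "prepend I, then prepend N" construction described above. The function names are illustrative.

```python
from itertools import product

def sequences(n):
    """All length-n strings over the letters I (immigration) and N (no immigration)."""
    return ["".join(letters) for letters in product("IN", repeat=n)]

def extend(shorter):
    """Build the (n+1)-year list from the n-year list by prepending I, then N."""
    return ["I" + s for s in shorter] + ["N" + s for s in shorter]

four = sequences(4)
five = extend(four)

print(len(four), len(five))
print(five[:4])
```

Both constructions give the same set of sequences, and the list length doubles with every additional year.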
11. We find
$A \cup B = \{1, 3, 5, 6, 7, 8, 9\} \cup \{1, 3, 4\} = \{1, 3, 4, 5, 6, 7, 8, 9\}$
$A \cap B = \{1, 3, 5, 6, 7, 8, 9\} \cap \{1, 3, 4\} = \{1, 3\}$
$A^c = \{1, 3, 5, 6, 7, 8, 9\}^c = \{2, 4\}$
$A \cap B^c = \{1, 3, 5, 6, 7, 8, 9\} \cap \{1, 3, 4\}^c = \{1, 3, 5, 6, 7, 8, 9\} \cap \{2, 5, 6, 7, 8, 9\} = \{5, 6, 7, 8, 9\}$
13. Since all numbers divisible by 4 are even (i.e., $B \subseteq A$), it follows that $A \cup B = A$ and $A \cap B = B$.
The complement of A consists of all odd (non-negative) numbers, $A^c = \{1, 3, 5, 7, 9, \ldots\}$. Finally,
$A \cap B^c = \{0, 2, 4, 6, 8, 10, \ldots\} \cap \{0, 4, 8, 12, 16, 20, \ldots\}^c$
$= \{0, 2, 4, 6, 8, 10, \ldots\} \cap \{1, 2, 3, 5, 6, 7, 9, 10, 11, \ldots\} = \{2, 6, 10, 14, 18, 22, \ldots\}$
So $A \cap B^c$ is the set of all (non-negative) even numbers which are not divisible by 4.
15. See below.
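[Figure: five Venn diagrams of the sets A and B inside the sample space S. The shaded regions are a combination of A and B, the complement of that combination, $A^c$, $B^c$, and the corresponding combination of $A^c$ and $B^c$.]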
17. Looking at De Morgan's law $(A \cup B)^c = A^c \cap B^c$, we realize that we can find $P(A^c \cap B^c)$ if we
can find $P(A \cup B)$; recall that $P((A \cup B)^c) = 1 - P(A \cup B)$. Thus
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = 0.4 + 0.2 - 0.1 = 0.5$
$P((A \cup B)^c) = 1 - 0.5 = 0.5$, and so $P(A^c \cap B^c) = P((A \cup B)^c) = 0.5$.
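19. See the figure below for a proof that $B = A \cup (A^c \cap B)$. Because $A$ and $A^c \cap B$ are disjoint
($A^c \cap B$ is a subset of $A^c$), we conclude that
$P(B) = P(A \cup (A^c \cap B)) = P(A) + P(A^c \cap B)$
Since A is a proper subset of B, the set $A^c \cap B$ is non-empty, and therefore $P(A^c \cap B) > 0$. Thus,
$P(A) < P(B)$.
[Figure: Venn diagrams in the sample space S showing A as a subset of B, and B split into A and $A^c \cap B$.]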
21. Interpreting probability as area (or using the argument presented in Exercise 19), we realize
that $A \cap B \subseteq B$ implies that $P(A \cap B) \le P(B)$. The data given ($P(A \cap B) = 0.4$ and $P(B) = 0.2$)
contradict this formula.
23. (a) The probabilities add up to 1. Thus
$P(4) = 1 - P(1) - P(2) - P(3) - P(5) = 1 - 0.4 - 0.15 - 0.2 - 0.1 = 0.15$
(Because the meaning is clear, we drop the curly braces from the notation for the probability of an
event which consists of a single element, and write $P(1)$ instead of $P(\{1\})$, $P(2)$ instead of $P(\{2\})$,
and so on.)
(b) We compute
$P(A) = P(\{1, 2\}) = P(1) + P(2) = 0.4 + 0.15 = 0.55$
$P(B) = P(\{2, 3, 4\}) = 0.15 + 0.2 + 0.15 = 0.5$
$P(A \cup B) = P(\{1, 2, 3, 4\}) = P(1) + P(2) + P(3) + P(4) = 0.4 + 0.15 + 0.2 + 0.15 = 0.9$
(Or, $P(A \cup B) = P(\{1, 2, 3, 4\}) = 1 - P(5) = 1 - 0.1 = 0.9$.)
(c) A and B are not disjoint, and therefore $P(A \cup B) \ne P(A) + P(B)$. To verify, we use the probabilities
we found in (b): $P(A) + P(B) = 1.05$ whereas $P(A \cup B) = 0.9$.
25. (a) Since the sum of all probabilities is 1, we get
$P(2) = 1 - P(1) - P(3) - P(4) - P(5) = 1 - 0.2 - 0.4 - 0.3 - 0.1 = 0$
(Because the meaning is clear, we drop the curly braces from the notation for the probability of an
event which consists of a single element, and write $P(1)$ instead of $P(\{1\})$, $P(2)$ instead of $P(\{2\})$,
and so on.)
(b) We compute
$P(A) = P(2) = 0$ [Thus, $A = \{2\}$ is an impossible event.]
$P(A^c) = 1 - P(A) = 1 - 0 = 1$
$P(B) = P(\{1, 3, 4, 5\}) = 0.2 + 0.4 + 0.3 + 0.1 = 1$
$P(B^c) = 1 - P(B) = 1 - 1 = 0$
(c) Consider the formula $P(A \cup C) = P(A) + P(C) - P(A \cap C)$. Since $A \cap C \subseteq A$, it follows that
$P(A \cap C) \le P(A)$. But since $P(A) = 0$, the probability $P(A \cap C) = 0$, and therefore $P(A \cup C) =
P(A) + P(C)$ is true.
27. (a) The sample space S is
$S = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$
Let A = "exactly two heads in a row occurred". Then $A = \{HHT, THH\}$ and $P(A) = |A|/|S| =
2/8 = 1/4$.
(b) The sample space consists of four-letter sequences in which each letter is either H or T. Since
there are two choices for each of the four locations in the sequence, there is a total of $2^4 = 16$
distinct sequences. Thus, $|S| = 16$. Let A = "exactly two heads in a row occurred". Then $A =
\{HHTT, HHTH, THHT, TTHH, HTHH\}$ and $P(A) = |A|/|S| = 5/16$.
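The event lists in (a) and (b) can be reproduced by filtering all toss sequences. The sketch below reads "exactly two heads in a row" as "the sequence contains a run of heads of length exactly two", which is the reading consistent with the sets listed in the solution; the regular expression and function name are illustrative.

```python
import re
from itertools import product

# "HH" that is neither preceded nor followed by another H, i.e. a run of exactly two heads.
EXACTLY_TWO = re.compile(r"(?<!H)HH(?!H)")

def two_in_a_row(n):
    """Return all length-n toss sequences that contain a run of exactly two heads."""
    sequences = ("".join(s) for s in product("HT", repeat=n))
    return [s for s in sequences if EXACTLY_TWO.search(s)]

event_3 = two_in_a_row(3)
event_4 = two_in_a_row(4)

print(event_3, len(event_3), "out of", 2**3)
print(event_4, len(event_4), "out of", 2**4)
```

Sequences such as HHH or HHHT are excluded because their run of heads is longer than two.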
29. The sample space consists of 36 simple events (see Example 3.12 and Table 3.2). A simple event
is an ordered pair $(m, n)$ where $m$ is the number that came up on the first die and $n$ is the number
that came up on the second die ($1 \le m, n \le 6$). Let A = "maximum of the two numbers is 4". A
simple event (an ordered pair $(m, n)$) belongs to A if neither of its entries is larger than 4. There are 4
choices for $m$, and 4 choices for $n$, and so $|A| = 16$. [For the record: the following ordered pairs belong
to A: (1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3), (2, 4), (3, 1), (3, 2), (3, 3), (3, 4), (4, 1), (4, 2), (4, 3),
(4, 4).] We conclude that $P(A) = 16/36 = 4/9$.
31. Let A = "at least one child is a girl". Then $A^c$ = "all children are boys". The sample space
consists of eight equally likely events (G for a girl and B for a boy): $S = \{GGG, BGG, GBG, GGB,
BBG, BGB, GBB, BBB\}$. Thus $P(A^c) = P(BBB) = 1/8$, and so $P(A) = 1 - 1/8 = 7/8$.
33. We follow the strategy of Exercise 31. The sample space consists of six-letter sequences, where
each letter is either G or B. Since there are two choices for the first letter, two choices for the second
letter, and so on, the total number of these six-letter sequences is $2^6 = 64$. Let A = "at least one child
is a girl" and $A^c$ = "all children are boys". Since $A^c$ consists of one event (BBBBBB), it follows that
$P(A^c) = 1/64$, and so $P(A) = 1 - 1/64 = 63/64$.
35. From $p/(1 - p) = 2/100$ we get $100p = 2 - 2p$ and $102p = 2$. Thus, the corresponding probability
is $p = 2/102 = 1/51$.
37. (a) The sample space is $\{-1, 0, 1\}$. By assumption, the three simple events are equally likely:
$P(-1) = P(0) = P(1) = 1/3$.
(b) Assume that the particle is at $-1$ at $t = 1$. At $t = 2$, with equal probability, it is located at $-2$, or
$-1$, or 0. We proceed by listing the remaining cases: a particle which is at 0 at $t = 1$ will be at $-1$, 0,
or 1 at $t = 2$ (with equal probability). A particle which is at 1 at $t = 1$ will be at 0, 1, or 2 at $t = 2$
(with equal probability).
Summarizing the above information: 1 path leads to $-2$, 2 paths lead to $-1$, 3 paths lead to 0,
2 paths lead to 1, and 1 path leads to 2 (note the symmetry for the locations $x$ and $-x$). There are
$1 + 2 + 3 + 2 + 1 = 9$ equally likely paths, and so $P(-2) = P(2) = 1/9$, $P(-1) = P(1) = 2/9$, and
$P(0) = 3/9$.
(c) We proceed as in (b). A particle located at $-2$ when $t = 2$ is at $-3$, $-2$, or $-1$ when $t = 3$. A
particle located at $-1$ when $t = 2$ is at $-2$, $-1$, or 0 when $t = 3$. A particle located at 0 when $t = 2$
is at $-1$, 0, or 1 when $t = 3$. A particle located at 1 when $t = 2$ is at 0, 1, or 2 when $t = 3$. A
particle located at 2 when $t = 2$ is at 1, 2, or 3 when $t = 3$. This time, there is a total of $3^3 = 27$
paths (for each $t$, there are three choices (move left, don't move, and move right), so there is a total
of $3 \cdot 3 \cdot 3 = 3^3$ choices).
One path leads to $-3$. How many paths lead to $-2$? One from $-2$ (to which a particle can arrive
along one path; see (b)) and one from $-1$ (to which a particle can arrive along two paths; see (b)).
Thus, there are three paths that end at $-2$ at $t = 3$.
How many paths lead to $-1$? One from $-2$ (to which a particle can arrive along one path), one
from $-1$ (to which a particle can arrive along two paths), and one from 0 (to which a particle can
arrive along three paths). Thus, there are six paths that end at $-1$ at $t = 3$. Due to symmetry, we get
that one path leads to 3, three paths lead to 2, and six paths lead to 1.
To count the paths that lead to 0 we can proceed as above, or subtract from 27 the number of
paths that lead to all other locations; thus, there are $27 - (1 + 3 + 6 + 6 + 3 + 1) = 7$ paths that
lead to 0. It follows that $P(-3) = P(3) = 1/27$, $P(-2) = P(2) = 3/27$, $P(-1) = P(1) = 6/27$, and
$P(0) = 7/27$.
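The path counting in (b) and (c) is a small dynamic program: the number of paths ending at a position is the sum of the path counts of the positions it can be reached from. A minimal sketch, assuming the three moves (left, stay, right) are equally likely; `path_counts` is an illustrative name.

```python
from collections import Counter
from fractions import Fraction

def path_counts(t):
    """Number of equally likely paths ending at each position after t steps."""
    counts = Counter({0: 1})
    for _ in range(t):
        following = Counter()
        for position, ways in counts.items():
            for step in (-1, 0, 1):
                following[position + step] += ways
        counts = following
    return counts

for t in (2, 3):
    counts = path_counts(t)
    total = sum(counts.values())
    print(t, {x: Fraction(counts[x], total) for x in sorted(counts)})
```

Note that `Fraction` prints values in lowest terms, so 3/27 appears as 1/9 and 6/27 as 2/9.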
39. (a) and (b) See below.
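[Figure: Venn diagrams in the sample space S showing $A \cup B$ as the disjoint union of $B$ and $B^c \cap A$, and $A$ as the disjoint union of $A \cap B$ and $A \cap B^c$.]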
(c) Since $A \cup B$ is a disjoint union of $B$ and $B^c \cap A$, it follows that
$P(A \cup B) = P(B) + P(B^c \cap A)$
Likewise, $A$ is a disjoint union of $A \cap B$ and $A \cap B^c$, and thus
$P(A) = P(A \cap B) + P(A \cap B^c)$
Eliminating $P(A \cap B^c)$ we get
$P(A \cup B) = P(B) + P(B^c \cap A)$
$= P(B) + (P(A) - P(A \cap B))$
$= P(A) + P(B) - P(A \cap B)$
Section 4 Conditional Probability and the Law of Total Probability
1. Take C to be a subset of A, so that $A \cap C = C$; in that case, $P(A | C) = P(A \cap C)/P(C) =
P(C)/P(C) = 1$. (Since we are asked to supply a specific example, we pick $A = \{2, 3, 4\}$ and $C =
\{2, 4\}$.) If we take disjoint sets B and D, then $P(B | D) = P(B \cap D)/P(D) = P(\emptyset)/P(D) = 0$. (For
example, $B = \{2, 3\}$ and $D = \{4, 5\}$.)
3. We compute $P(A \cap B) = P(1) = 0.2$, $P(A) = P(\{1, 2, 3\}) = 0.2 + 0.1 + 0.15 = 0.45$, and
$P(B) = P(\{1, 4, 5\}) = 0.2 + 0.45 + 0.1 = 0.75$. Thus, $P(A | B) = P(A \cap B)/P(B) = 0.2/0.75 = 4/15$
and $P(B | A) = P(B \cap A)/P(A) = 0.2/0.45 = 4/9$.
5. We compute $P(A \cap B) = P(\{4, 5\}) = 0.4$, $P(A) = P(\{1, 2, 4, 5\}) = 0.8$, and $P(B) = P(\{4, 5\}) =
0.4$. Thus, $P(A | B) = P(A \cap B)/P(B) = 0.4/0.4 = 1$ (not a surprise, since $B \subseteq A$) and $P(B | A) =
P(B \cap A)/P(A) = 0.4/0.8 = 1/2$.
7. Looking at the formulas
$P(A | B) = \frac{P(A \cap B)}{P(B)}$ and $P(B | A) = \frac{P(B \cap A)}{P(A)}$
we notice that $P(A | B)$ and $P(B | A)$ have equal numerators. Thus, it is the denominators that we
need to think about.
(a) Since all simple events are equally likely, to make $P(A) \ne P(B)$ we pick A and B to be of different
sizes (with a non-empty intersection). For instance, if $A = \{1, 2, 3\}$ and $B = \{3, 4\}$ then
$P(A | B) = \frac{P(A \cap B)}{P(B)} = \frac{P(3)}{P(\{3, 4\})} = 1/2$
$P(B | A) = \frac{P(B \cap A)}{P(A)} = \frac{P(3)}{P(\{1, 2, 3\})} = 1/3$
(b) To make $P(A) = P(B)$ we pick sets of the same size. For instance, if $A = \{2, 3, 4\}$ and $B = \{3, 4, 5\}$
then
$P(A | B) = \frac{P(A \cap B)}{P(B)} = \frac{P(\{3, 4\})}{P(\{3, 4, 5\})} = 2/3$
$P(B | A) = \frac{P(B \cap A)}{P(A)} = \frac{P(\{3, 4\})}{P(\{2, 3, 4\})} = 2/3$
Alternatively, we can pick two disjoint sets for A and B (not necessarily of the same size), in which
case both conditional probabilities are zero.
9. Define A = "two children are girls" and B = "third child is a boy". The probability that the third
child is a boy given that two children are girls is $P(B | A)$.
If "two children are girls" means "exactly two children are girls", then $P(B | A) = 1$.
If "two children are girls" means "at least two children are girls", then we proceed as follows. The
sample space consists of eight equally likely events (G for a girl and B for a boy): $S = \{GGG, BGG,
GBG, GGB, BBG, BGB, GBB, BBB\}$. Thus
$P(A) = P(\{GGG, BGG, GBG, GGB\}) = 4/8$
and
$P(A \cap B) = P(\{BGG, GBG, GGB\}) = 3/8$
and therefore $P(B | A) = P(A \cap B)/P(A) = (3/8)/(4/8) = 3/4$.
11. Define A = "three children are of the same sex" and B = "fourth child is a girl". The probability
that the fourth child is a girl given that three children are of the same sex is $P(B | A)$. The sample
space consists of four-letter sequences, where each letter is either G (girl) or B (boy). Since there are
two choices for the first letter, two choices for the second letter, and so on, the total number of these
four-letter sequences is $2^4 = 16$.
If A = "three children are of the same sex" means A = "exactly three children are of the same
sex", then
$P(B | A) = \frac{P(A \cap B)}{P(A)} = \frac{P(\{BBBG, BBGB, BGBB, GBBB\})}{P(\{BBBG, BBGB, BGBB, GBBB, GGGB, GGBG, GBGG, BGGG\})} = \frac{1}{2}$
If A = "three children are of the same sex" means A = "at least three children are of the same
sex", then
$P(B | A) = \frac{P(A \cap B)}{P(A)} = \frac{P(\{BBBG, BBGB, BGBB, GBBB, GGGG\})}{P(\{BBBB, BBBG, BBGB, BGBB, GBBB, GGGB, GGBG, GBGG, BGGG, GGGG\})} = \frac{1}{2}$
13. Define A = "one toss is H" and B = "at least two H". The probability we are looking for is
$P(B | A)$. The sample space (tossing a coin three times) is
$S = \{HHH, HHT, HTH, THH, HTT, THT, TTH, TTT\}$
Thus
$P(A) = 1 - P(\text{all tosses are T}) = 1 - 1/8 = 7/8$
and
$P(A \cap B) = P(\text{at least two H}) = P(\{HHH, HHT, HTH, THH\}) = 4/8$
and therefore $P(B | A) = P(A \cap B)/P(A) = (4/8)/(7/8) = 4/7$.
15. Define A = "one die shows a number larger than 3" and B = "sum is equal to 7"; we are looking
for $P(B | A)$. The sample space consists of 36 elements (see Example 3.12 and Table 3.2 in Section 3).
To find $P(A)$ we can list all ordered pairs $(m, n)$ such that one or both numbers are equal to
4, 5, or 6. Alternatively, we look at the complementary event $A^c$ = "both dice show 1, or 2, or 3".
Since there are 3 choices for each of the two numbers, $|A^c| = 3^2 = 9$, and $P(A^c) = 9/36$; thus,
$P(A) = 1 - 9/36 = 27/36$. Now
$P(A \cap B) = P(\text{sum is 7 and one die shows a number larger than 3})$
$= P(\{(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)\}) = 6/36$
and therefore $P(B | A) = P(A \cap B)/P(A) = (6/36)/(27/36) = 6/27 = 2/9$.
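With only 36 equally likely outcomes, the conditional probability can be verified by brute-force enumeration. This sketch reads "one die shows a number larger than 3" as "at least one die", matching the complement argument in the solution; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))      # all 36 ordered pairs (m, n)

A = [r for r in rolls if max(r) > 3]              # at least one die shows 4, 5, or 6
A_and_B = [r for r in A if sum(r) == 7]           # ... and the sum equals 7

p_A = Fraction(len(A), len(rolls))
p_B_given_A = Fraction(len(A_and_B), len(A))

print(p_A, p_B_given_A)
```

Because the outcomes are equally likely, the ratio of counts $|A \cap B| / |A|$ equals $P(A \cap B)/P(A)$.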
17. Define A = "baby tiger has one T allele" and B = "baby tiger has a striped tail"; we are looking
for $P(B | A)$. The sample space of genotypes consists of three events $\{PP, PT, TT\}$ with probabilities
$P(PP) = 1/4$, $P(PT) = 1/2$, and $P(TT) = 1/4$. Thus $P(A) = P(\{PT, TT\}) = 1/2 + 1/4 = 3/4$,
$P(A \cap B) = P(TT) = 1/4$, and therefore $P(B | A) = P(A \cap B)/P(A) = (1/4)/(3/4) = 1/3$.
19. (a) We have to pick at least one pair of non-disjoint subsets (i.e., they have to have a non-empty
intersection). For instance, take A = {1, 2, 3}, B = {3, 4}, and C = {5}. The union of these sets is S,
but A and B are not mutually exclusive.
(b) Let A = {1}, B = {2}, and C = {3}. The three sets are mutually exclusive, but their union
{1, 2, 3} is not equal to S.
(c) Let A = {1, 4}, B = {2, 5}, and C = {3}. The three sets are mutually exclusive and their union is
equal to the universal set S.
21. The diagram below will help us calculate the probabilities. We use the following: F = "female",
M = "male", S = "smoker", and NS = "non-smoker". The subsets F and M form a partition of the
surveyed population.
[Tree diagram: the surveyed population splits into F with $P(F) = 0.6$ and M with $P(M) = 0.4$. Within F, $P(S | F) = 0.2$ and $P(NS | F) = 0.8$; within M, $P(S | M) = 0.35$ and $P(NS | M) = 0.65$.]
(a) By the law of total probability,
$P(S) = P(S | F)P(F) + P(S | M)P(M) = (0.2)(0.6) + (0.35)(0.4) = 0.26$
(b) Using Bayes' formula, we get
$P(M | S) = \frac{P(S | M)P(M)}{P(S | F)P(F) + P(S | M)P(M)} = \frac{(0.35)(0.4)}{0.26} = \frac{14}{26} = \frac{7}{13}$
23. We use the following: C = "child", Y = "adolescent", A = "adult", and F = "flu". The subsets
C, Y, and A form a partition of the given population.
$P(F) = P(F | C)P(C) + P(F | Y)P(Y) + P(F | A)P(A)$
$= (0.45)(0.2) + (0.2)(0.3) + (0.15)(0.5) = 0.225$
$P(A | F) = \frac{P(F | A)P(A)}{P(F | C)P(C) + P(F | Y)P(Y) + P(F | A)P(A)} = \frac{(0.15)(0.5)}{0.225} = \frac{0.075}{0.225} = \frac{1}{3}$
25. Let F = "female", M = "male", and A = "asthma". The subsets F and M form a partition
of the population of young adults. It is given that $P(F) = P(M) = 0.5$, $P(A | F) = 0.064$, and
$P(A | M) = 0.045$.
$P(A) = P(A | F)P(F) + P(A | M)P(M) = (0.064)(0.5) + (0.045)(0.5) = 0.0545$
$P(F | A) = \frac{P(A | F)P(F)}{P(A | F)P(F) + P(A | M)P(M)} = \frac{(0.064)(0.5)}{0.0545} = \frac{0.032}{0.0545} \approx 0.587$
27. Let R = "rain", NR = "no rain", and C = "car available". The subsets R and NR form a partition
of the set of possible weather conditions tomorrow. It is given that $P(R) = 0.6$ (thus $P(NR) = 0.4$),
$P(C | R) = 0.3$, and $P(C | NR) = 0.9$. By the law of total probability,
$P(C) = P(C | R)P(R) + P(C | NR)P(NR) = (0.3)(0.6) + (0.9)(0.4) = 0.54$
29. Let M = "person has meningitis", NM = "person does not have meningitis", and A = "test for
meningitis is positive". The subsets M and NM form a partition of the population of Canada. It is
given that $P(M) = 3.4/100{,}000$ (thus $P(NM) = 1 - 3.4/100{,}000 = 99{,}996.6/100{,}000$), $P(A | M) =
0.85$, and $P(A | NM) = 0.07$.
$P(A) = P(A | M)P(M) + P(A | NM)P(NM)$
$= 0.85 \cdot \frac{3.4}{100{,}000} + 0.07 \cdot \frac{99{,}996.6}{100{,}000} = \frac{7{,}002.652}{100{,}000} \approx 0.0700$
So, the probability that a randomly selected person tests positive for meningitis is about 7%.
$P(M | A) = \frac{P(A | M)P(M)}{P(A | M)P(M) + P(A | NM)P(NM)} = \frac{0.85 \cdot \frac{3.4}{100{,}000}}{\frac{7{,}002.652}{100{,}000}} = \frac{2.890}{7{,}002.652} \approx 0.00041270$
So if a person tests positive for bacterial meningitis, the probability that they have it is very small,
about 0.04%.
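Exercises 21 through 29 all apply the same two formulas, so they can be checked with a pair of small helper functions. This is a sketch under the assumption that the events in `priors` form a partition; the function names `total_probability` and `posterior` are illustrative and not from the textbook.

```python
def total_probability(priors, likelihoods):
    """Law of total probability: sum of P(A | E_i) * P(E_i) over a partition E_1..E_n."""
    return sum(p * l for p, l in zip(priors, likelihoods))

def posterior(priors, likelihoods, index):
    """Bayes' formula: P(E_index | A)."""
    return priors[index] * likelihoods[index] / total_probability(priors, likelihoods)

# Exercise 29: partition (M, NM), with P(A | M) = 0.85 and P(A | NM) = 0.07.
p_m = 3.4 / 100_000
meningitis_priors = [p_m, 1 - p_m]
meningitis_likelihoods = [0.85, 0.07]

p_positive = total_probability(meningitis_priors, meningitis_likelihoods)
p_sick_given_positive = posterior(meningitis_priors, meningitis_likelihoods, 0)

# Exercise 21: partition (F, M), with P(S | F) = 0.2 and P(S | M) = 0.35.
p_smoker = total_probability([0.6, 0.4], [0.2, 0.35])
p_male_given_smoker = posterior([0.6, 0.4], [0.2, 0.35], 1)

print(p_positive, p_sick_given_positive)
print(p_smoker, p_male_given_smoker)
```

The very small posterior in Exercise 29 reflects the low prior: false positives among the many healthy people far outnumber true positives.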
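31. (a) See below. Because $E_1$, $E_2$, and $E_3$ form a partition, they are disjoint, and thus the sets
$A \cap E_1$, $A \cap E_2$, and $A \cap E_3$ are disjoint as well (because $A \cap E_1$ is a subset of $E_1$, $A \cap E_2$ is a subset
of $E_2$, and $A \cap E_3$ is a subset of $E_3$). Therefore, $P(A) = P(A \cap E_1) + P(A \cap E_2) + P(A \cap E_3)$ is true.
[Figure: Venn diagram of the partition $E_1$, $E_2$, $E_3$, with the set A split into the pieces $A \cap E_1$, $A \cap E_2$, and $A \cap E_3$.]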
(b) From $P(A | E_1) = P(A \cap E_1)/P(E_1)$ it follows that $P(A \cap E_1) = P(A | E_1)P(E_1)$. Likewise,
$P(A \cap E_2) = P(A | E_2)P(E_2)$ and $P(A \cap E_3) = P(A | E_3)P(E_3)$. So, the equation
$P(A) = P(A \cap E_1) + P(A \cap E_2) + P(A \cap E_3)$
implies that
$P(A) = P(A | E_1)P(E_1) + P(A | E_2)P(E_2) + P(A | E_3)P(E_3)$
(c) Assume that $E_1, E_2, \ldots, E_n$ form a partition of S. As in (a), a Venn diagram shows that A can be
written as a union of disjoint sets
$A = (A \cap E_1) \cup (A \cap E_2) \cup \cdots \cup (A \cap E_n)$
Thus
$P(A) = P(A \cap E_1) + P(A \cap E_2) + \cdots + P(A \cap E_n)$
Repeating the calculation in (b), we show that $P(A \cap E_i) = P(A | E_i)P(E_i)$ for all $i = 1, 2, \ldots, n$.
Therefore,
$P(A) = P(A | E_1)P(E_1) + P(A | E_2)P(E_2) + \cdots + P(A | E_n)P(E_n)$
and we are done.
Section 5 Independence
1. No. If A and B are disjoint, then $P(A \cap B) = P(\emptyset) = 0$, and the condition for independence
$P(A \cap B) = P(A)P(B)$ reads $0 = P(A)P(B)$. This equation implies that either $P(A) = 0$ or
$P(B) = 0$, which contradicts the assumption that $P(A) > 0$ and $P(B) > 0$.
3. We compute
$P(B | A) = \frac{P(B \cap A)}{P(A)} = \frac{P(5)}{P(\{4, 5\})} = \frac{0.1}{0.4 + 0.1} = \frac{0.1}{0.5} = 0.2$
$P(B) = P(\{2, 5\}) = 0.1 + 0.1 = 0.2$
Since $P(B | A) = P(B)$, it follows that B and A are independent. (Because the meaning is clear,
we drop the curly braces from the notation for the probability of an event which consists of a single
element, and write $P(5)$ instead of $P(\{5\})$.)
5. We compute
$P(A | B) = \frac{P(A \cap B)}{P(B)} = \frac{P(\{1, 5\})}{P(\{1, 2, 5\})} = \frac{0.2 + 0.1}{0.2 + 0.1 + 0.1} = \frac{0.3}{0.4} = 0.75$
$P(A) = P(\{1, 5\}) = 0.2 + 0.1 = 0.3$
Since $P(A | B) \ne P(A)$, it follows that A and B are not independent.
7. We compute
$P(A \cap B) = P(\{1\}) = 0.2$
$P(A) = P(\{1, 3\}) = 0.2 + 0.2 = 0.4$
$P(B) = P(\{1, 4\}) = 0.2 + 0.3 = 0.5$
Since $P(A)P(B) = (0.4)(0.5) = 0.2 = P(A \cap B)$, it follows that A and B are independent.
9. Because $A = \{1, 3\}$ and $B = \{2, 3\}$ are independent,
$P(A \cap B) = P(A)P(B)$
$P(\{3\}) = (0.5)(0.4)$
and so $P(3) = 0.2$. (Since the meaning is clear, we drop the curly brace notation for the probability
of an event which consists of a single element and write $P(3)$ instead of $P(\{3\})$, $P(1)$ instead of
$P(\{1\})$, and so on.) From $P(A) = P(\{1, 3\}) = P(1) + P(3)$ we get $0.5 = P(1) + 0.2$ and $P(1) = 0.3$.
Likewise, $P(B) = P(\{2, 3\}) = P(2) + P(3)$ implies $0.4 = P(2) + 0.2$ and $P(2) = 0.2$. Finally,
$P(4) = 1 - P(1) - P(2) - P(3) = 0.3$.
11. We find $P(A) = 0.2 + 0.3 = 0.5$. The relation for independence $P(A \cap B) = P(A)P(B)$ tells us to
look for a two-element set B (thus $P(B) > 0$) so that $P(A \cap B) = 0.5P(B)$.
Note that B cannot be disjoint from A (since in that case $P(A \cap B) = 0$ and the relation
$P(A \cap B) = 0.5P(B)$ does not hold). If $B = A$, then $P(A \cap B) = P(B)$, and again $P(A \cap B) = 0.5P(B)$
does not hold. This analysis shows that B must have one element in common with A.
Now, it's a matter of trial and error. Assume that $A \cap B = \{2\}$ and take $B = \{2, 1\}$. Then
$P(A \cap B) = P(2) = 0.2$ and $P(B) = P(2) + P(1) = 0.5$, so $P(A \cap B) = 0.5P(B)$ does not hold. In
the same way, we learn that the choice $B = \{2, 3\}$ does not work either.
Thus, it must be that $A \cap B = \{4\}$. Take $B = \{4, 1\}$. Then $P(A \cap B) = P(4) = 0.3$ and
$P(B) = P(4) + P(1) = 0.6$, and $P(A \cap B) = 0.5P(B)$ is satisfied. Thus, $B = \{4, 1\}$. By showing that
$B = \{4, 3\}$ does not satisfy $P(A \cap B) = 0.5P(B)$, we show that the answer for B is unique.
13. Let $Q_i$ = "student answers the ith question correctly," where i = 1, 2, ..., 10. The context implies
that the $Q_i$ are independent events and $P(Q_i) = 1/2$ for all i. The probability of the complementary events
$Q_i^c$ = "student answers the ith question incorrectly" is $P(Q_i^c) = 1/2$, i = 1, 2, ..., 10.
(a) As usual, the phrase "at least" suggests that we use a complementary event. If A = "student
answers at least one question correctly," then $A^c$ = "student answers all questions incorrectly." From
$$A^c = Q_1^c \cap Q_2^c \cap \cdots \cap Q_{10}^c$$
we get (by independence)
$$P(A^c) = P(Q_1^c)P(Q_2^c) \cdots P(Q_{10}^c) = \left(\frac{1}{2}\right)^{10} = \frac{1}{1024}$$
Thus,
$$P(A) = 1 - P(A^c) = 1 - \frac{1}{1024} = \frac{1023}{1024} \approx 0.99902$$
(b) Let B = "student answers all questions correctly." Then from
$$B = Q_1 \cap Q_2 \cap \cdots \cap Q_{10}$$
we compute
$$P(B) = \left(\frac{1}{2}\right)^{10} = \frac{1}{1024} \approx 0.00098$$
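The two answers in Exercise 13 can be checked with exact rational arithmetic. The short Python sketch below is only an illustration; the variable names are chosen here and do not come from the text.

```python
from fractions import Fraction

n_questions = 10               # number of questions in the exercise
p_wrong = Fraction(1, 2)       # P(Q_i^c): one question answered incorrectly
p_right = Fraction(1, 2)       # P(Q_i): one question answered correctly

# (a) complement rule plus independence
p_all_wrong = p_wrong ** n_questions
p_at_least_one = 1 - p_all_wrong

# (b) all questions answered correctly
p_all_right = p_right ** n_questions

print(p_at_least_one, float(p_at_least_one))
print(p_all_right, float(p_all_right))
```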
15. Let $G_i$ = "ith child is a girl" and $B_i$ = "ith child is a boy." It is given that $P(G_i) = 0.45$ and
$P(B_i) = 0.55$. In part (a), i = 1, 2, 3; in part (b), i = 1, 2, 3, 4.
(a) Let A = "two girls." Then
$$A = (G_1 \cap G_2 \cap B_3) \cup (G_1 \cap B_2 \cap G_3) \cup (B_1 \cap G_2 \cap G_3)$$
Note that A is a union of three disjoint sets. By the mutual exclusivity property and then by the
independence, we get
$$\begin{aligned}
P(A) &= P(G_1 \cap G_2 \cap B_3) + P(G_1 \cap B_2 \cap G_3) + P(B_1 \cap G_2 \cap G_3) \\
&= P(G_1)P(G_2)P(B_3) + P(G_1)P(B_2)P(G_3) + P(B_1)P(G_2)P(G_3) \\
&= (0.45)(0.45)(0.55) + (0.45)(0.55)(0.45) + (0.55)(0.45)(0.45) \\
&= 3(0.45)^2(0.55) \approx 0.334
\end{aligned}$$
(b) Let C = "at least two children are boys." The event C includes all combinations involving 2, 3,
and 4 boys. To reduce the number of combinations, we consider $C^c$ = "no boys or one boy." Since
$$\begin{aligned}
C^c = {} & (G_1 \cap G_2 \cap G_3 \cap G_4) \cup (B_1 \cap G_2 \cap G_3 \cap G_4) \cup (G_1 \cap B_2 \cap G_3 \cap G_4) \\
& \cup (G_1 \cap G_2 \cap B_3 \cap G_4) \cup (G_1 \cap G_2 \cap G_3 \cap B_4)
\end{aligned}$$
it follows that (again, by the mutual exclusivity and the independence of the events)
$$\begin{aligned}
P(C^c) &= P(G_1)P(G_2)P(G_3)P(G_4) + P(B_1)P(G_2)P(G_3)P(G_4) + P(G_1)P(B_2)P(G_3)P(G_4) \\
&\quad + P(G_1)P(G_2)P(B_3)P(G_4) + P(G_1)P(G_2)P(G_3)P(B_4) \\
&= (0.45)^4 + (0.45)^3(0.55) + (0.45)^3(0.55) + (0.45)^3(0.55) + (0.45)^3(0.55) \\
&= (0.45)^4 + 4(0.45)^3(0.55) \approx 0.241
\end{aligned}$$
The probability that at least two children are boys is
$$P(C) = 1 - P(C^c) \approx 1 - 0.241 = 0.759$$
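Both parts of Exercise 15 can also be verified by brute force: list every birth sequence, multiply the per-child probabilities (independence), and add up the sequences that belong to the event (mutual exclusivity). The Python sketch below does this; the helper name `prob_of` is introduced here for illustration.

```python
from itertools import product

P = {"G": 0.45, "B": 0.55}   # per-child probabilities given in the exercise

def prob_of(sequence):
    """Probability of one birth sequence, assuming independent births."""
    result = 1.0
    for child in sequence:
        result *= P[child]
    return result

# (a) exactly two girls among three children
p_two_girls = sum(prob_of(s) for s in product("GB", repeat=3)
                  if s.count("G") == 2)

# (b) at least two boys among four children
p_at_least_two_boys = sum(prob_of(s) for s in product("GB", repeat=4)
                          if s.count("B") >= 2)

print(round(p_two_girls, 3), round(p_at_least_two_boys, 3))
```

Summing directly over the sequences with two or more boys gives the same value as the complement argument used in part (b).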
17. Let $H_i$ = "ith house dust mite survives in laundry washed at 60°C," where i = 1, 2, ..., 100. It is
given that $P(H_i) = 0.01$ for all i; thus, $P(H_i^c) = 0.99$ for all i. We are looking for the probability of
A = "at least one house dust mite survives." Consider the complementary event $A^c$ = "none of the
100 house dust mites survives," i.e.,
$$A^c = H_1^c \cap H_2^c \cap \cdots \cap H_{100}^c$$
Then (using the independence)
$$P(A^c) = P(H_1^c)P(H_2^c) \cdots P(H_{100}^c) = (0.99)^{100}$$
and so $P(A) = 1 - (0.99)^{100} \approx 0.634$.
19. Let $F_i$ = "test result for the ith person is false-negative," where i = 1, 2, ..., 50. It is given
that $P(F_i) = 0.012$ for all i; thus, $F_i^c$ = "test result for the ith person is not false-negative" and
$P(F_i^c) = 0.988$ for all i. We are looking for the probability of F = "at least one false-negative test
result in a group of 50 people." Consider the complementary event $F^c$ = "no one in a group of 50
people receives a false-negative test result":
$$F^c = F_1^c \cap F_2^c \cap \cdots \cap F_{50}^c$$
Assuming the independence of testing,
$$P(F^c) = P(F_1^c)P(F_2^c) \cdots P(F_{50}^c) = (0.988)^{50}$$
and $P(F) = 1 - (0.988)^{50} \approx 0.453$. Thus, the probability of at least one false-negative test result is
quite high, about 45.3%.
21. Let $C_i$ = "use of a condom prevents pregnancy in year i," where i = 1, 2, 3, 4, 5. It is given that
$P(C_i) = 0.86$ for all i. We are looking for the probability of A = "sexually active woman who uses
condoms regularly gets pregnant at least once in a 5-year period." Consider the complementary event
$A^c$ = "sexually active woman who uses condoms regularly does not get pregnant in a 5-year period."
We write
$$A^c = C_1 \cap C_2 \cap C_3 \cap C_4 \cap C_5$$
Assuming the independence of events,
$$P(A^c) = P(C_1)P(C_2)P(C_3)P(C_4)P(C_5) = (0.86)^5$$
and $P(A) = 1 - (0.86)^5 \approx 0.530$. The probability of at least one pregnancy in five years is about 53%.
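Exercises 17, 19, and 21 all use the same pattern: the probability of "at least one" occurrence in n independent trials is 1 minus the probability of no occurrence. A minimal Python sketch of that pattern follows; the function name `at_least_one` is introduced here and is not from the text.

```python
def at_least_one(p_event, n_trials):
    """P(event occurs at least once in n_trials independent trials)."""
    return 1 - (1 - p_event) ** n_trials

mites = at_least_one(0.01, 100)       # Exercise 17: a mite survives
false_neg = at_least_one(0.012, 50)   # Exercise 19: a false-negative result
pregnancy = at_least_one(1 - 0.86, 5) # Exercise 21: prevention fails in a year

print(round(mites, 3), round(false_neg, 3), round(pregnancy, 3))
```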
23. We use abbreviated symbols to represent independence conditions: we write XY for $P(X \cap Y) = P(X)P(Y)$,
XYZ for $P(X \cap Y \cap Z) = P(X)P(Y)P(Z)$, and so on.
(a) To prove that the four events A, B, C, and D are independent, we need to check: pairs of events
AB, AC, AD, BC, BD, CD; triples of events ABC, ABD, ACD, BCD; and the quadruple of events
ABCD. Thus, we need to check the total of 6 + 4 + 1 = 11 conditions.
(b) To prove that the five events A, B, C, D, and E are independent, we need to check: pairs of events
AB, AC, AD, AE, BC, BD, BE, CD, CE, DE; triples of events ABC, ABD, ABE, ACD, ACE,
ADE, BCD, BCE, BDE, CDE; quadruples of events ABCD, ABCE, ABDE, ACDE, BCDE;
and the quintuple of events ABCDE. Thus, we need to check the total of 10 + 10 + 5 + 1 = 26
conditions.
(c) (Section 10 reasoning.) Assume that there are n events. The number of conditions involving 2
events is the number of ways we can pick a group of 2 symbols out of the group of n symbols, which is
$\binom{n}{2}$; the number of conditions involving 3 events is the number of ways we can pick a group of 3 symbols
out of the group of n symbols, which is $\binom{n}{3}$; and so on. (So, the sum in (b) is
$\binom{5}{2} + \binom{5}{3} + \binom{5}{4} + \binom{5}{5}$.)
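The counting argument in part (c) is easy to check numerically. The sketch below adds the binomial coefficients for group sizes 2 through n; the function name is chosen here for illustration. Because the coefficients for all group sizes 0 through n add up to 2^n, the total also equals 2^n - n - 1.

```python
from math import comb

def independence_conditions(n):
    """Number of product conditions needed for n events to be independent."""
    return sum(comb(n, k) for k in range(2, n + 1))

for n in (4, 5, 6):
    print(n, independence_conditions(n), 2**n - n - 1)
```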
25. (a) Using $g_{t+1} = a g_t$, we compute $g_1 = a g_0$, $g_2 = a g_1 = a(a g_0) = a^2 g_0$, $g_3 = a g_2 = a(a^2 g_0) = a^3 g_0$,
and so on. Thus, $g_t = a^t g_0$. When $a = 1 - m$, we get $g_t = g_0 (1 - m)^t$.
(b) We check that the left side $r_{t+1}$ is equal to the right side $(1 - m) r_t + m$ when we substitute
$r_t = (r_0 - 1)(1 - m)^t + 1$:
$$r_{t+1} = (r_0 - 1)(1 - m)^{t+1} + 1$$
$$\begin{aligned}
(1 - m) r_t + m &= (1 - m)\left[(r_0 - 1)(1 - m)^t + 1\right] + m \\
&= \left[(r_0 - 1)(1 - m)^{t+1} + 1 - m\right] + m \\
&= (r_0 - 1)(1 - m)^{t+1} + 1
\end{aligned}$$
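The algebraic check in part (b) can be complemented by a numerical one: iterate the recursion and compare each term with the closed-form expression. In the Python sketch below, the starting value r0 = 0.2 and the parameter m = 0.1 are hypothetical values picked only for the test; they are not taken from the exercise.

```python
def closed_form(r0, m, t):
    """Proposed solution r_t = (r0 - 1)(1 - m)^t + 1."""
    return (r0 - 1) * (1 - m) ** t + 1

def iterate(r0, m, steps):
    """Apply r_{t+1} = (1 - m) r_t + m repeatedly, returning r_0..r_steps."""
    r = r0
    values = [r]
    for _ in range(steps):
        r = (1 - m) * r + m
        values.append(r)
    return values

r0, m = 0.2, 0.1          # hypothetical values for illustration
values = iterate(r0, m, 20)
max_gap = max(abs(v - closed_form(r0, m, t)) for t, v in enumerate(values))
print(max_gap)            # should be at the level of rounding error
```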
Section 6 Discrete Random Variables
1. The range of X is the set {5, 6, 7, 8, 9, 10, ...}; it is an infinite, countable set (since its elements can
be listed in a sequence). Thus, X is a discrete random variable.
3. Since $p(1) + p(2) + p(3) = 0.16 + 0.54 + 0.29 = 0.99 \neq 1$, p cannot be a probability mass function
of a random variable.
5. Because F(x) = 0.32 if $0 \le x < 1$ and F(x) = 0.31 if $1 \le x < 2$, it follows that F(x) is decreasing
on a part of its domain. Thus one of the properties of a cumulative distribution function (F(x) is
non-decreasing for all x) fails to hold.
7. The sample space S consists of four-letter sequences, where each letter is either H or T. Thus, S
contains $2 \cdot 2 \cdot 2 \cdot 2 = 2^4 = 16$ elements:
S = {THHH, TTHH, THTH, THHT, TTTH, TTHT, THTT, TTTT,
HHHH, HTHH, HHTH, HHHT, HTTH, HTHT, HHTT, HTTT}
Since all 16 events are equally likely, the probability of any one occurring is 1/16.
The range of X is {0, 1, 2, 3, 4}. The probabilities are:
P(X = 0) = P({HHHH}) = 1/16
P(X = 1) = P({THHH, TTHH, THTH, THHT, TTTH, TTHT, THTT, TTTT}) = 8/16
P(X = 2) = P({HTHH, HTTH, HTHT, HTTT}) = 4/16
P(X = 3) = P({HHTH, HHTT}) = 2/16
P(X = 4) = P({HHHT}) = 1/16
The probability mass function of X is given in the table below.
x P(X = x)
0 1/16
1 1/2
2 1/4
3 1/8
4 1/16
9. The sample space of the experiment consists of 36 simple events
S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), . . . , (6, 5), (6, 6)}
(where (m, n) means that the number m came up on the first die and n came up on the second die).
Since all simple events are equally likely, the probability that any one occurs is 1/36.
The range of X is {1, 2, 3, 4, 5, 6}. The probabilities are:
P(X = 1) = P({(1, 1)}) = 1/36
P(X = 2) = P({(1, 2), (2, 1), (2, 2)}) = 3/36
P(X = 3) = P({(1, 3), (3, 1), (2, 3), (3, 2), (3, 3)}) = 5/36
P(X = 4) = P({(1, 4), (4, 1), (2, 4), (4, 2), (3, 4), (4, 3), (4, 4)}) = 7/36
P(X = 5) = P({(1, 5), (5, 1), (2, 5), (5, 2), (3, 5), (5, 3), (4, 5), (5, 4), (5, 5)}) = 9/36
P(X = 6) = P({(1, 6), (6, 1), (2, 6), (6, 2), (3, 6), (6, 3), (4, 6), (6, 4), (5, 6), (6, 5), (6, 6)}) = 11/36
The probability mass function of X is given in the table below.
x P(X = x)
1 1/36
2 3/36
3 5/36
4 7/36
5 9/36
6 11/36
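The table in Exercise 9 can be reproduced by enumerating all 36 outcomes. The sketch below assumes that X is the larger of the two numbers rolled, which is what the listed outcomes suggest (the exercise statement itself is not reproduced in this manual).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes give each value of X,
# assuming X = max(m, n).
counts = Counter(max(m, n) for m, n in product(range(1, 7), repeat=2))
pmf = {x: Fraction(counts[x], 36) for x in sorted(counts)}

for x, p in pmf.items():
    print(x, p)
```

Fractions are printed in lowest terms, so 3/36 appears as 1/12 and 9/36 as 1/4.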
11. We use the mutual exclusivity of events to calculate the probabilities. The probability distribution
for Y: P(Y = 0) = 1/8, P(Y = 1) = 4/8, P(Y = 2) = 2/8, P(Y = 3) = 1/8. The probability
distribution for Z: P(Z = 0) = 1/8, P(Z = 1) = 4/8, P(Z = 2) = 2/8, P(Z = 3) = 1/8.
The probability distribution for W: P(W = 0) = 1/8, P(W = 1) = 3/8, P(W = 4) = 3/8,
P(W = 9) = 1/8.
13. Denote the presence of a virus by V and its absence by N. The sample space is
S = {VV, NV, VN, NN}
For instance, NV describes the event that the population (which starts virus-free, by assumption) is
virus-free for a month; then the virus appears.
The random variable X counts the number of virus-free months in the 2-month period; thus
X(VV) = 0, X(NV) = 1, X(VN) = 1, and X(NN) = 2.
Using independence, we compute
P(VV) = P(V during the first month and V during the second month)
= P(V during the first month)P(V during the second month)
= (0.3)(0.4) = 0.12
P(NV) = P(N)P(V) = (0.7)(0.3) = 0.21
P(VN) = P(V)P(N) = (0.3)(0.6) = 0.18
P(NN) = P(N)P(N) = (0.7)(0.7) = 0.49
Thus, the probability mass function for X is given by P(X = 0) = 0.12, P(X = 1) = 0.21+0.18 = 0.39,
and P(X = 2) = 0.49.
15. Denote a dark brown baby monkey by D and a light brown baby monkey by L. It is given that,
in every year, P(D) = 0.65 and P(L) = 0.35. The sample space is
S = {DDD, LDD, DLD, DDL, LLD, LDL, DLL, LLL}
and the probabilities of the simple events in S are (using independence):
$P(DDD) = P(D)P(D)P(D) = (0.65)^3 = 0.274625$
$P(LDD) = P(DLD) = P(DDL) = P(D)P(D)P(L) = (0.65)^2(0.35) = 0.147875$
$P(LLD) = P(LDL) = P(DLL) = P(D)P(L)P(L) = (0.65)(0.35)^2 = 0.079625$
$P(LLL) = P(L)P(L)P(L) = (0.35)^3 = 0.042875$
The range of R is {0, 1, 2, 3}. Using mutual exclusivity, we compute
P(R = 0) = 0.042875
P(R = 1) = 3(0.079625) = 0.238875
P(R = 2) = 3(0.147875) = 0.443625
P(R = 3) = 0.274625
17. Denote a red-eyed baby monkey by D, a blue-eyed baby monkey by E, a brown-eyed baby monkey
by N (for lack of good notation, we use the last letter of the word for the colour). It is given that, in
every year, P(D) = 0.15, P(E) = 0.05, and P(N) = 0.8. The sample space is
S = {DD, EE, NN, DE, ED, NE, EN, DN, ND}
and the probabilities of the events in S are (using independence):
$P(DD) = P(D)P(D) = (0.15)^2 = 0.0225$
$P(EE) = P(E)P(E) = (0.05)^2 = 0.0025$
$P(NN) = P(N)P(N) = (0.8)^2 = 0.64$
P(DE) = P(ED) = P(D)P(E) = (0.15)(0.05) = 0.0075
P(NE) = P(EN) = P(N)P(E) = (0.8)(0.05) = 0.04
P(DN) = P(ND) = P(D)P(N) = (0.15)(0.8) = 0.12
The probability mass function of B = number of blue-eyed baby monkeys born to the couple in a
2-year period is given in the table below.
x P(B = x)
0 P(DD, NN, DN, ND) = 0.0225 + 0.64 + 2(0.12) = 0.9025
1 P(DE, ED, NE, EN) = 2(0.0075) + 2(0.04) = 0.095
2 P(EE) = 0.0025
19. See below. The word that best describes the histogram is uniform.
[Figure: histogram of the distribution; horizontal axis from 1 to 8, vertical axis from 0 to 0.2.]
21. See below. The word that best describes the histogram is skewed right.
[Figure: histogram of the distribution; horizontal axis from 1 to 8, vertical axis from 0 to 0.5.]
23. The discontinuities of F(x) occur at x = 0.7, 1, and 1.2. The sizes of the jumps determine the
non-zero probabilities, and hence the probability mass function of X: P(X = 0.7) = 0.3 - 0 = 0.3,
P(X = 1) = 0.7 - 0.3 = 0.4, and P(X = 1.2) = 1 - 0.7 = 0.3.
25. The discontinuities of F(x) occur at x = 1/2, 1, 3/2, and 3. The sizes of the jumps determine the
non-zero probabilities, and hence the probability mass function of X: P(X = 1/2) = 0.1 - 0 = 0.1,
P(X = 1) = 0.5 - 0.1 = 0.4, P(X = 3/2) = 0.8 - 0.5 = 0.3, and P(X = 3) = 1 - 0.8 = 0.2.
27. The values of F(x) are zero until x (moving from $-\infty$ to $\infty$) reaches the smallest value in the
range of X, which is x = 0. There, $F(0) = P(X \le 0) = P(X = 0) = 0.25$. Then, F(x) remains
constant until x reaches the next value in the range of X, which is x = 1. The value of F is
$$F(1) = P(X \le 1) = P(X = 0) + P(X = 1) = 0.25 + 0.25 = 0.5$$
Continuing in this way, we obtain the following:
$$F(x) = \begin{cases}
0 & x < 0 \\
0.25 & 0 \le x < 1 \\
0.5 & 1 \le x < 2 \\
0.75 & 2 \le x < 3 \\
1 & x \ge 3
\end{cases}$$
See the graph below.
[Figure: graph of the step function F(x); x from 0 to 3 on the horizontal axis, values 0, 0.25, 0.5, 0.75, and 1 on the vertical axis.]
29. The values of F(x) are zero until x (moving from $-\infty$ to $\infty$) reaches the smallest value in the
range of X, which is x = 0. There, $F(0) = P(X \le 0) = P(X = 0) = 0.8$. Then, F(x) remains constant
until x reaches the next value in the range of X, which is x = 1. The value of F is
$$F(1) = P(X \le 1) = P(X = 0) + P(X = 1) = 0.8 + 0.05 = 0.85$$
Continuing in the same way, we obtain the following:
$$F(x) = \begin{cases}
0 & x < 0 \\
0.8 & 0 \le x < 1 \\
0.85 & 1 \le x < 2 \\
0.9 & 2 \le x < 3 \\
0.95 & 3 \le x < 4 \\
1 & x \ge 4
\end{cases}$$
See the graph below.
[Figure: graph of the step function F(x); x from 0 to 4 on the horizontal axis, values 0.8, 0.85, 0.9, 0.95, and 1 on the vertical axis.]
31. (a) Initial location: 0; locations after 1 step: -1 and 1; locations after 2 steps: -2, 0, and 2;
locations after 3 steps: -3, -1, 1, and 3; locations after 4 steps: -4, -2, 0, 2, and 4; locations
after 5 steps: -5, -3, -1, 1, 3, and 5. To move one step ahead, we add 1 to the locations in the
previous step or subtract 1 from the locations in the previous step. Adding 1 to an even number
(or subtracting 1 from an even number) makes it odd, and vice versa. A particle starts at an
even-numbered location (x = 0). Thus, after an even (odd) number of steps, the particle arrives at an
even-numbered (odd-numbered) location.
(b) To be absorbed, the particle needs to reach -3 or 3, which are odd numbers. The particle can
reach -3 or 3 in 3 steps (in which case X = 3). If it does not, it means that it ended at -1 or 1 after 3
steps (since after an odd number of steps a particle can only be in an odd-numbered location). Thus,
the particle needs 2 more steps to reach -3 or 3 (in which case X = 5); if it does not, it means that
it ended at -1 or 1. Repeating this routine, we see that X can assume only odd-numbered values.
33. We read the values from the table. The probability mass function of X is given by P(X = 1) = 0.3,
P(X = 2) = 0.1, P(X = 3) = 0.2, P(X = 4) = 0.1, and P(X = 5) = 0.3. The discontinuities of the
cumulative distribution function F(x) of X occur at x = 1, 2, 3, 4, and 5. We find that
$$F(x) = \begin{cases}
0 & x < 1 \\
0.3 & 1 \le x < 2 \\
0.4 & 2 \le x < 3 \\
0.6 & 3 \le x < 4 \\
0.7 & 4 \le x < 5 \\
1 & x \ge 5
\end{cases}$$
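The construction used in Exercises 27 to 33 (accumulate the probabilities in increasing order of x, and hold the value constant between jumps) can be sketched in a few lines of Python. The example uses the probability mass function from Exercise 33; the function name `F` mirrors the notation of the text, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction
from itertools import accumulate

values = [1, 2, 3, 4, 5]
probs = [Fraction(3, 10), Fraction(1, 10), Fraction(2, 10),
         Fraction(1, 10), Fraction(3, 10)]

# Running totals: the height of F just after each jump.
cumulative = list(accumulate(probs))

def F(x):
    """Cumulative distribution function P(X <= x) as a step function."""
    total = Fraction(0)
    for v, c in zip(values, cumulative):
        if x >= v:
            total = c
    return total

for x in (0.5, 1, 2.5, 3, 4.9, 5):
    print(x, F(x))
```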
Section 7 The Mean, the Median, and the Mode
1. Ordering $S_1$, we get $S_1$ = {2, 3, 4, 5, 6, 7, 10}; the median is 5. Ordering $S_2$, we get
$S_2$ = {2, 3, 4, 5, 6, 700,000, 1,000,000}; the median is 5 as well. The median fails to capture the large
difference in the values at the right ends of the two distributions.
3. Adding up the values of all elements in $S_1$ and dividing by the number of elements in $S_1$ we get the
mean of $S_1$. To calculate the mean of $S_2$, the numerator doubles whereas the denominator remains
the same. Thus, the mean of $S_2$ is double the mean of $S_1$.
The location of the midpoint of the two distributions does not change, since multiplication by 2
does not change the order (assume that $S_1$ and $S_2$ are ordered; if a is before b in the list for $S_1$, then
2a is before 2b in the list for $S_2$). Thus, the median of $S_2$ is double the median of $S_1$.
If a is the value that appears most often in $S_1$, then the value 2a appears most often in $S_2$. So,
the mode of $S_2$ is double the mode of $S_1$.
5. Intuitively: since all outcomes are equally likely, the mean is (1 + 2 + 3 + ... + 10)/10 = 55/10 = 5.5.
Formally,
$$E(X) = \sum_{k=1}^{10} k \cdot P(X = k) = \sum_{k=1}^{10} k \cdot \frac{1}{10} = \frac{1}{10}(1 + 2 + 3 + \cdots + 10) = \frac{1}{10} \cdot \frac{10 \cdot 11}{2} = 5.5$$
(Recall the formula: $\sum_{k=1}^{n} k = 1 + 2 + 3 + \cdots + n = n(n + 1)/2$.)
7. No. Consider the random variable X given by P(X = 0) = 0.5 and P(X = 6) = 0.5. Then
$E(X) = 0 \cdot 0.5 + 6 \cdot 0.5 = 3$. The distribution of $X^2$ is $P(X^2 = 0) = 0.5$ and $P(X^2 = 36) = 0.5$, and so
$E(X^2) = 0 \cdot 0.5 + 36 \cdot 0.5 = 18$. (This is just one of many counterexamples.)
9. Using properties of the expected value,
$$E(2X^2 - 4X + 1) = E(2X^2) - E(4X) + E(1) = 2E(X^2) - 4E(X) + E(1)$$
Since
$$E(1) = \sum_x 1 \cdot P(X = x) = \sum_x P(X = x) = 1$$
we get $E(2X^2 - 4X + 1) = 2(3) - 4(2) + 1 = -1$.
11. Using the properties of the expected value,
$$E(Y) = E\left(\frac{1}{\sigma}(X - \mu)\right) = \frac{1}{\sigma} E(X - \mu) = \frac{1}{\sigma}\left(E(X) - \mu\right) = 0,$$
since, by assumption, $E(X) = \mu$. (Recall that E(X + b) = E(X) + b for a real number b; replacing b
by $-\mu$, we get $E(X - \mu) = E(X) - \mu$, which is how the second last equality was obtained.)
13. We compute
$$E(X) = \sum_{x=0}^{3} x \cdot P(X = x) = 0 \cdot 0.25 + 1 \cdot 0.25 + 2 \cdot 0.25 + 3 \cdot 0.25 = 6(0.25) = 1.5$$
$$E(X^2) = \sum_{x=0}^{3} x^2 \cdot P(X = x) = 0 \cdot 0.25 + 1 \cdot 0.25 + 4 \cdot 0.25 + 9 \cdot 0.25 = 14(0.25) = 3.5$$
$$E(X(X - 1)) = \sum_{x=0}^{3} x(x - 1) \cdot P(X = x) = 0 \cdot 0.25 + 0 \cdot 0.25 + 2 \cdot 0.25 + 6 \cdot 0.25 = 8(0.25) = 2$$
To check:
$$E(X(X - 1)) = E(X^2 - X) = E(X^2) - E(X) = 3.5 - 1.5 = 2$$
15. We compute
$$E(X) = \sum_{x=0}^{4} x \cdot P(X = x) = 0 \cdot 0.8 + 1 \cdot 0.05 + 2 \cdot 0.05 + 3 \cdot 0.05 + 4 \cdot 0.05 = 10(0.05) = 0.5$$
$$E(X^2) = \sum_{x=0}^{4} x^2 \cdot P(X = x) = 0 \cdot 0.8 + 1 \cdot 0.05 + 4 \cdot 0.05 + 9 \cdot 0.05 + 16 \cdot 0.05 = 30(0.05) = 1.5$$
$$E(X(X - 1)) = \sum_{x=0}^{4} x(x - 1) \cdot P(X = x) = 0 \cdot 0.8 + 0 \cdot 0.05 + 2 \cdot 0.05 + 6 \cdot 0.05 + 12 \cdot 0.05 = 20(0.05) = 1$$
To check:
$$E(X(X - 1)) = E(X^2 - X) = E(X^2) - E(X) = 1.5 - 0.5 = 1$$
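The three expected values in Exercise 15, and the linearity check that follows them, can be reproduced with one small helper that computes E(g(X)) as the sum of g(x)P(X = x). The helper name `expect` is introduced here for illustration.

```python
from fractions import Fraction

# Probability mass function used in Exercise 15.
pmf = {0: Fraction(80, 100), 1: Fraction(5, 100), 2: Fraction(5, 100),
       3: Fraction(5, 100), 4: Fraction(5, 100)}

def expect(g, pmf):
    """E(g(X)) for a discrete random variable with the given pmf."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expect(lambda x: x, pmf)
second_moment = expect(lambda x: x**2, pmf)
factorial_moment = expect(lambda x: x * (x - 1), pmf)

print(mean, second_moment, factorial_moment)
print(factorial_moment == second_moment - mean)   # linearity of E
```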
17. Instead of ordering the list (call it S), we record the frequencies:
Value 14 18 19 20 22 25 27 29 30
Frequency 1 15 3 5 1 1 3 2 5
Clearly, the mode is 18.
The data set S contains 36 elements. In identifying the median, we calculate the mean of the
18th and the 19th entries. Since both are equal to 19, the median of S is 19. The mean is
$$\bar{S} = \frac{1}{36}(1 \cdot 14 + 15 \cdot 18 + 3 \cdot 19 + 5 \cdot 20 + 1 \cdot 22 + 1 \cdot 25 + 3 \cdot 27 + 2 \cdot 29 + 5 \cdot 30) = \frac{777}{36} \approx 21.58$$
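19. From
$$E(X) = \sum_{x=1}^{4} x \cdot P(X = x) = 1 \cdot 0.2 + 2 \cdot 0.4 + 3 \cdot 0.3 + 4 \cdot 0.1 = 2.3$$
$$E(\sin(X)) = \sum_{x=1}^{4} \sin x \cdot P(X = x) = \sin 1 \cdot 0.2 + \sin 2 \cdot 0.4 + \sin 3 \cdot 0.3 + \sin 4 \cdot 0.1 \approx 0.49867$$
we compute $E(\sin X) - \sin(E(X)) = 0.49867 - \sin 2.3 \approx -0.24704$.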
21. From
$$E(\ln(X)) = \sum_{x=1}^{4} \ln x \cdot P(X = x) = \ln 1 \cdot 0.2 + \ln 2 \cdot 0.4 + \ln 3 \cdot 0.3 + \ln 4 \cdot 0.1 \approx 0.74547$$
we compute $e^{E(\ln X)} = e^{0.74547} \approx 2.10743$.
23. We compute
$$E(1/X) = \sum_{x=1}^{4} \frac{1}{x} \cdot P(X = x) = \frac{1}{1} \cdot 0.2 + \frac{1}{2} \cdot 0.4 + \frac{1}{3} \cdot 0.3 + \frac{1}{4} \cdot 0.1 = 0.525$$
25. Let R represent the per capita production rate of the fish population. Its probability mass
function is P(R = 1.25) = 0.7 and P(R = 0.1) = 0.3. From
$$\begin{aligned}
E(\ln(R)) &= \ln 1.25 \cdot P(R = 1.25) + \ln 0.1 \cdot P(R = 0.1) \\
&= (\ln 1.25)(0.7) + (\ln 0.1)(0.3) \\
&\approx -0.53458
\end{aligned}$$
we get the geometric mean $e^{E(\ln R)} = e^{-0.53458} \approx 0.58591$. The geometric mean predicts a decline in
the population at the rate of 1 - 0.58591 = 0.41409 per year.
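The geometric mean in Exercise 25 is the exponential of the expected logarithm. The Python sketch below computes it from the probability mass function given in the solution and, for comparison, also prints the ordinary (arithmetic) mean, which is not part of the original solution.

```python
from math import exp, log

# Probability mass function of the per capita production rate R.
pmf = {1.25: 0.7, 0.1: 0.3}

mean_log = sum(log(r) * p for r, p in pmf.items())   # E(ln R)
geometric_mean = exp(mean_log)                       # e^{E(ln R)}
arithmetic_mean = sum(r * p for r, p in pmf.items()) # E(R), for comparison

print(round(mean_log, 5), round(geometric_mean, 5), round(arithmetic_mean, 3))
```

Both means are below 1 here, but the geometric mean is noticeably smaller than the arithmetic mean.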
27. The mode consists of three values: 2, 6, and 7. The probability mass function is given below.
x P(X = x)
1 0.15
2 0.2
5 0.1
6 0.2
7 0.2
8 0.15
The mean is
E(X) = (1)(0.15) + 2(0.2) + 5(0.1) + 6(0.2) + 7(0.2) + 8(0.15) = 4.85
To find the median, we keep calculating the values of the cumulative distribution function until we
reach 0.5: F(1) = 0.15, F(2) = 0.35, F(5) = 0.45, F(6) = 0.65. The median is (5 + 6)/2 = 5.5.
29. The mode consists of two values: 3 and 6. The probability mass function is given in the table
below.
x P(X = x)
1 0.2
3 0.3
6 0.3
8 0.2
The mean is
E(X) = (1)(0.2) + 3(0.3) + 6(0.3) + 8(0.2) = 4.5
To find the median, we keep calculating the values of the cumulative distribution function until we
reach 0.5: F(1) = 0.2, F(3) = 0.5, F(6) = 0.8. The median is (3 + 6)/2 = 4.5.
Section 8 The Spread of a Distribution
1. All three samples share the same mean: $\mu_A = \mu_B = \mu_C = 3$. The sample B is least spread out
(the two values which differ from the mean are one unit away from it). The sample A is less spread
out than C: the four values in A which differ from 3 are closer to the mean than the four values in C
which differ from 3. Thus, B has the smallest standard deviation, followed by A; the sample C has
the largest standard deviation of the three samples.
We confirm our reasoning by calculating the three standard deviations:
$$\begin{aligned}
\mathrm{var}(A) &= \sum_a (a - \mu_A)^2 P(A = a) = \frac{1}{5} \sum_a (a - 3)^2 \\
&= \frac{1}{5}\left[(2 - 3)^2 + (2 - 3)^2 + (3 - 3)^2 + (4 - 3)^2 + (4 - 3)^2\right] = \frac{4}{5}
\end{aligned}$$
and $\sigma_A = \sqrt{\mathrm{var}(A)} = 2/\sqrt{5}$. Likewise,
$$\begin{aligned}
\mathrm{var}(B) &= \sum_b (b - \mu_B)^2 P(B = b) = \frac{1}{5} \sum_b (b - 3)^2 \\
&= \frac{1}{5}\left[(2 - 3)^2 + (3 - 3)^2 + (3 - 3)^2 + (3 - 3)^2 + (4 - 3)^2\right] = \frac{2}{5}
\end{aligned}$$
and $\sigma_B = \sqrt{\mathrm{var}(B)} = \sqrt{2}/\sqrt{5}$. Finally,
$$\begin{aligned}
\mathrm{var}(C) &= \sum_c (c - \mu_C)^2 P(C = c) = \frac{1}{5} \sum_c (c - 3)^2 \\
&= \frac{1}{5}\left[(1 - 3)^2 + (1 - 3)^2 + (3 - 3)^2 + (5 - 3)^2 + (5 - 3)^2\right] = \frac{16}{5}
\end{aligned}$$
and $\sigma_C = \sqrt{\mathrm{var}(C)} = 4/\sqrt{5}$. Thus, $\sigma_B < \sigma_A < \sigma_C$.
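The three variances can be checked numerically. The samples below are read off the squared deviations listed in the solution (A = 2, 2, 3, 4, 4; B = 2, 3, 3, 3, 4; C = 1, 1, 3, 5, 5), and each value is given probability 1/5, as in the text. The function name `variance` is chosen here for illustration.

```python
from math import sqrt

samples = {
    "A": [2, 2, 3, 4, 4],
    "B": [2, 3, 3, 3, 4],
    "C": [1, 1, 3, 5, 5],
}

def variance(sample):
    """Variance with equal weight 1/n on every element of the sample."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / len(sample)

for name, sample in samples.items():
    var = variance(sample)
    print(name, var, round(sqrt(var), 5))
```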
3. Consider multiplying X by a real number a. The formula $\mathrm{var}(aX) = a^2 \mathrm{var}(X)$ gives $2 = a^2(22)$,
so $a^2 = 1/11$ and $a = 1/\sqrt{11}$. Define $Y = (1/\sqrt{11})X$; the variance of Y is 2. To check:
$$\mathrm{var}(Y) = \mathrm{var}\left(\frac{1}{\sqrt{11}} X\right) = \left(\frac{1}{\sqrt{11}}\right)^2 \mathrm{var}(X) = \frac{1}{11} \cdot 22 = 2$$
(Note that adding a real number to X does not change its variance; that's why we considered the
multiplication by a real number.)
5. The expected value of X is zero:
$$E(X) = \sum_{k=-4}^{4} k P(X = k) = \frac{1}{9} \sum_{k=-4}^{4} k = \frac{1}{9}(-4 - 3 - 2 - 1 + 0 + 1 + 2 + 3 + 4) = 0$$
Therefore
$$\mathrm{var}(X) = \sum_{k=-4}^{4} (k - E(X))^2 P(X = k) = \frac{1}{9} \sum_{k=-4}^{4} k^2 = \frac{1}{9}(16 + 9 + 4 + 1 + 0 + 1 + 4 + 9 + 16) = \frac{60}{9}$$
7. From
$$E(X) = \sum_x x P(X = x) = (0)(0.15) + (1)(0.15) + (2)(0.15) + (4)(0.15) = (7)(0.15) = 1.05$$
and
$$E(X^2) = \sum_x x^2 P(X = x) = (0)(0.15) + (1)(0.15) + (4)(0.15) + (16)(0.15) = (21)(0.15) = 3.15$$
we compute
$$\mathrm{var}(X) = E(X^2) - [E(X)]^2 = 3.15 - (1.05)^2 = 2.0475$$
and $\sigma_X = \sqrt{2.0475} \approx 1.43091$.
9. From
$$E(X) = \sum_x x P(X = x) = (-2)(0.25) + (-1)(0.2) + (0)(0.1) + (1)(0.2) + (2)(0.25) = 0$$
and
$$E(X^2) = \sum_x x^2 P(X = x) = (4)(0.25) + (1)(0.2) + (0)(0.1) + (1)(0.2) + (4)(0.25) = 2.4$$
we compute
$$\mathrm{var}(X) = E(X^2) - [E(X)]^2 = 2.4 - (0)^2 = 2.4$$
and $\sigma_X = \sqrt{2.4} \approx 1.54919$.
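11. Let $E(X) = \mu$ and $X_1 = X - E(X) = X - \mu$.
Direct proof:
$$\begin{aligned}
E(X_1) &= \sum_{x_1} x_1 P(X_1 = x_1) \\
&= \sum_x (x - \mu) P(X - \mu = x - \mu) \\
&= \sum_x (x - \mu) P(X = x) \\
&= \sum_x x P(X = x) - \mu \sum_x P(X = x) \\
&= E(X) - \mu \sum_x P(X = x) \\
&= \mu - \mu \cdot 1 = 0
\end{aligned}$$
Using Theorem 7:
$$E(X_1) = E(X - \mu) = E(X) - \mu = \mu - \mu = 0$$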
13. Replacing X in $\mathrm{var}(X) = E(X^2) - [E(X)]^2$ by aX, we get
$$\begin{aligned}
\mathrm{var}(aX) &= E[(aX)^2] - [E(aX)]^2 \\
&= E[a^2 X^2] - [aE(X)]^2 \\
&= a^2 E[X^2] - a^2 [E(X)]^2 \\
&= a^2 \left(E(X^2) - [E(X)]^2\right) = a^2 \mathrm{var}(X)
\end{aligned}$$
15. The sample of 12 healthy adults, sorted:
110, 116, 120, 122, 123, 125, 125, 128, 132, 138, 138, 140
The minimum is 110, and the maximum is 140. The median is the mean of the sixth and the
seventh numbers: 125. The lower quartile is the mean of the third and the fourth numbers:
$Q_1 = (120 + 122)/2 = 121$, and the upper quartile is the mean of the ninth and the tenth numbers:
$Q_3 = (132 + 138)/2 = 135$.
The sample of 12 adults with a history of cardiovascular problems, sorted:
136, 142, 148, 150, 154, 154, 154, 158, 160, 160, 162, 166
The minimum is 136, and the maximum is 166. The median is the mean of the sixth and the
seventh numbers: 154. The lower quartile is the mean of the third and the fourth numbers:
$Q_1 = (148 + 150)/2 = 149$, and the upper quartile is the mean of the ninth and the tenth numbers:
$Q_3 = 160$.
See the figure below for the boxplots.
[Figure: boxplots of blood pressure for the two samples, labelled "Healthy" and "Cardiovascular problems."]
17. The sample, sorted:
14, 16, 17, 18, 19, 20, 20, 20, 22, 22, 24, 24, 24, 25
The sample contains 14 numbers. The minimum is 14, and the maximum is 25. The median is the
mean of the seventh and the eighth numbers: 20. The lower quartile is the fourth number: $Q_1 = 18$,
and the upper quartile is the eleventh number: $Q_3 = 24$.
[Figure: boxplot of lifespan for lions in captivity.]
19. The sample, sorted:
12, 20, 20, 20, 21, 23, 23, 24, 24, 25, 25, 26, 27, 28
The sample contains 14 numbers. The minimum is 12, and the maximum is 28. The median is the
mean of the seventh and the eighth numbers: 23.5. The lower quartile is the fourth number: $Q_1 = 20$,
and the upper quartile is the eleventh number: $Q_3 = 25$.
[Figure: boxplot of lifespan for moose.]
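The five-number summaries in Exercises 15 to 19 follow one recipe: sort the sample, take the median, and take the quartiles as the medians of the lower and upper halves. The Python sketch below implements that recipe. All samples in these exercises have an even number of entries; the handling of odd-sized samples (dropping the middle entry from both halves) is an assumption made here, since the text does not show that case.

```python
def median(sorted_values):
    """Median of an already sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    # Assumption: for an odd-sized sample the middle entry is left out.
    upper = s[half:] if len(s) % 2 == 0 else s[half + 1:]
    return (s[0], median(lower), median(s), median(upper), s[-1])

healthy = [110, 116, 120, 122, 123, 125, 125, 128, 132, 138, 138, 140]
moose = [12, 20, 20, 20, 21, 23, 23, 24, 24, 25, 25, 26, 27, 28]

print(five_number_summary(healthy))
print(five_number_summary(moose))
```

Statistical software often uses interpolation-based quartile definitions, so library routines may return slightly different quartiles from the ones computed by hand here.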
21. We extract the probability mass function from the histogram.
x P(X = x)
1 0.15
2 0.1
3 0.05
4 0.15
5 0.1
6 0.2
7 0.05
8 0.2
From
$$\begin{aligned}
E(X) &= \sum_x x P(X = x) \\
&= (1)(0.15) + (2)(0.1) + (3)(0.05) + (4)(0.15) + (5)(0.1) + (6)(0.2) + (7)(0.05) + (8)(0.2) \\
&= 4.75
\end{aligned}$$
and
$$\begin{aligned}
E(X^2) &= \sum_x x^2 P(X = x) \\
&= (1)(0.15) + (4)(0.1) + (9)(0.05) + (16)(0.15) + (25)(0.1) + (36)(0.2) + (49)(0.05) + (64)(0.2) \\
&= 28.35
\end{aligned}$$
we compute
$$\mathrm{var}(X) = E(X^2) - [E(X)]^2 = 28.35 - (4.75)^2 = 5.7875$$
and $\sigma_X = \sqrt{5.7875} \approx 2.40572$.
23. We extract the probability mass function from the histogram.
x P(X = x)
1 0.05
2 0.05
3 0.1
4 0.1
5 0.15
6 0.15
7 0.2
8 0.2
From
$$\begin{aligned}
E(X) &= \sum_x x P(X = x) \\
&= (1)(0.05) + (2)(0.05) + (3)(0.1) + (4)(0.1) + (5)(0.15) + (6)(0.15) + (7)(0.2) + (8)(0.2) = 5.5
\end{aligned}$$
and
$$\begin{aligned}
E(X^2) &= \sum_x x^2 P(X = x) \\
&= (1)(0.05) + (4)(0.05) + (9)(0.1) + (16)(0.1) + (25)(0.15) + (36)(0.15) + (49)(0.2) + (64)(0.2) \\
&= 34.5
\end{aligned}$$
we compute
$$\mathrm{var}(X) = E(X^2) - [E(X)]^2 = 34.5 - (5.5)^2 = 4.25$$
and $\sigma_X = \sqrt{4.25} \approx 2.06155$.
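25. The mean of all three distributions is 24.5. For the Milky Way Farm,
$$\begin{aligned}
\mathrm{MAD}(X_1) &= E(|X_1 - E(X_1)|) = E(|X_1 - 24.5|) \\
&= |18 - 24.5| \tfrac{6}{30} + |20 - 24.5| \tfrac{5}{30} + |22 - 24.5| \tfrac{2}{30} + |24 - 24.5| \tfrac{1}{30} \\
&\quad + |25 - 24.5| \tfrac{1}{30} + |27 - 24.5| \tfrac{4}{30} + |29 - 24.5| \tfrac{4}{30} + |30 - 24.5| \tfrac{7}{30} = \frac{134}{30}
\end{aligned}$$
For the Milkshake Farm,
$$\begin{aligned}
\mathrm{MAD}(X_2) &= E(|X_2 - 24.5|) \\
&= |22 - 24.5| \tfrac{4}{30} + |23 - 24.5| \tfrac{6}{30} + |24 - 24.5| \tfrac{3}{30} \\
&\quad + |25 - 24.5| \tfrac{8}{30} + |26 - 24.5| \tfrac{6}{30} + |27 - 24.5| \tfrac{3}{30} = \frac{41}{30}
\end{aligned}$$
For the Butterscotch Farm,
$$\begin{aligned}
\mathrm{MAD}(X_3) &= E(|X_3 - 24.5|) \\
&= |17 - 24.5| \tfrac{2}{30} + |18 - 24.5| \tfrac{7}{30} + |19 - 24.5| \tfrac{4}{30} + |20 - 24.5| \tfrac{3}{30} \\
&\quad + |30 - 24.5| \tfrac{2}{30} + |31 - 24.5| \tfrac{5}{30} + |32 - 24.5| \tfrac{7}{30} = \frac{192}{30}
\end{aligned}$$
Thus, the MAD is able to detect the differences in the spreads of the three distributions. Note that
the order of the three distributions from the smallest to the largest standard deviation is the same as
the order of the three distributions from the smallest to the largest mean absolute deviation.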
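The three mean absolute deviations can be recomputed from the frequencies that appear in the sums above (each frequency is out of 30 observations). The Python sketch below uses exact fractions; the dictionary layout and function names are chosen here for illustration.

```python
from fractions import Fraction

# value -> frequency (out of 30), read off the sums in the solution
farms = {
    "Milky Way": {18: 6, 20: 5, 22: 2, 24: 1, 25: 1, 27: 4, 29: 4, 30: 7},
    "Milkshake": {22: 4, 23: 6, 24: 3, 25: 8, 26: 6, 27: 3},
    "Butterscotch": {17: 2, 18: 7, 19: 4, 20: 3, 30: 2, 31: 5, 32: 7},
}

def mean(freq):
    total = sum(freq.values())
    return Fraction(sum(v * c for v, c in freq.items()), total)

def mad(freq):
    """Mean absolute deviation E(|X - E(X)|)."""
    total = sum(freq.values())
    mu = mean(freq)
    return sum(abs(v - mu) * c for v, c in freq.items()) / total

for name, freq in farms.items():
    print(name, mean(freq), mad(freq), round(float(mad(freq)), 3))
```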
Section 9 Joint Distributions
1. Using independence, we find
P(X = 1, Y = 1) = P(X = 1)P(Y = 1) = (0.2)(0.7) = 0.14
P(X = 1, Y = 2) = P(X = 1)P(Y = 2) = (0.2)(0.3) = 0.06
P(X = 2, Y = 1) = P(X = 2)P(Y = 1) = (0.8)(0.7) = 0.56
P(X = 2, Y = 2) = P(X = 2)P(Y = 2) = (0.8)(0.3) = 0.24
These four probabilities form the joint probability distribution of X and Y. See below.
X = 1 X = 2
Y = 1 0.14 0.56
Y = 2 0.06 0.24
3. Denote the missing entries by a and b and expand the table to include the horizontal and the
vertical totals:
          X = 0                 X = 1
Y = 0     0.1                   0.3                   P(Y = 0) = 0.4
Y = 1     a                     b                     P(Y = 1) = a + b = 0.6
          P(X = 0) = 0.1 + a    P(X = 1) = 0.3 + b
By independence
P(X = 0, Y = 0) = P(X = 0)P(Y = 0)
0.1 = (0.1 + a)(0.4)
0.25 = 0.1 + a
and thus a = 0.15. From P(Y = 1) = a + b = 0.6 we get b = 0.45.
5. Using independence, we find
P(X = 1, Y = 1) = P(X = 1)P(Y = 1) = (0.2)(0.9) = 0.18
P(X = 1, Y = 2) = P(X = 1)P(Y = 2) = (0.2)(0.1) = 0.02
P(X = 2, Y = 1) = P(X = 2)P(Y = 1) = (0.8)(0.9) = 0.72
P(X = 2, Y = 2) = P(X = 2)P(Y = 2) = (0.8)(0.1) = 0.08
The four probabilities form the joint probability distribution of X and Y, shown in the table below
(expanded, to include horizontal and vertical totals):
          X = 1             X = 2
Y = 1     0.18              0.72              P(Y = 1) = 0.9
Y = 2     0.02              0.08              P(Y = 2) = 0.1
          P(X = 1) = 0.2    P(X = 2) = 0.8
We find
$$P(X = 1 \mid Y = 1) = \frac{P(X = 1, Y = 1)}{P(Y = 1)} = \frac{0.18}{0.9} = 0.2$$
$$P(X = 1 \mid Y = 2) = \frac{P(X = 1, Y = 2)}{P(Y = 2)} = \frac{0.02}{0.1} = 0.2$$
Recall the law of total probability: If A is an event, and $E_1$ and $E_2$ form a partition, then
$$P(A) = P(A \mid E_1)P(E_1) + P(A \mid E_2)P(E_2)$$
Substituting A = {X = 1}, $E_1$ = {Y = 1}, and $E_2$ = {Y = 2}, we obtain the desired relation
$$P(X = 1) = P(X = 1 \mid Y = 1)P(Y = 1) + P(X = 1 \mid Y = 2)P(Y = 2)$$
between $P(X = 1 \mid Y = 1)$, $P(X = 1 \mid Y = 2)$, and P(X = 1). We illustrate it by substituting the
probabilities we calculated:
0.2 = (0.2)(0.9) + (0.2)(0.1)
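The steps of Exercise 5 (build the joint distribution from independent marginals, compute conditional probabilities, and confirm the law of total probability) can be traced in code. The Python sketch below uses exact fractions; the names `joint` and `conditional_x_given_y` are introduced here for illustration.

```python
from fractions import Fraction

p_x = {1: Fraction(2, 10), 2: Fraction(8, 10)}   # marginal distribution of X
p_y = {1: Fraction(9, 10), 2: Fraction(1, 10)}   # marginal distribution of Y

# Independence: each joint probability is the product of the marginals.
joint = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}

def conditional_x_given_y(x, y):
    """P(X = x | Y = y), computed from the joint table."""
    p_y_marginal = sum(joint[(xx, y)] for xx in p_x)
    return joint[(x, y)] / p_y_marginal

# Law of total probability for the event {X = 1}.
total = sum(conditional_x_given_y(1, y) * p_y[y] for y in p_y)

print(joint)
print(conditional_x_given_y(1, 1), conditional_x_given_y(1, 2), total)
```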
7. We need to find $P(R = + \text{ and } G = B \mid G = B)$. From P(G = B) = 0.076 + 0.014 = 0.09 we get
$$P(R = + \text{ and } G = B \mid G = B) = \frac{P(R = +, G = B)}{P(G = B)} = \frac{0.076}{0.09} \approx 0.844$$
9. The two probabilities
$$P(R = + \mid G = B) = \frac{P(R = +, G = B)}{P(G = B)} = \frac{0.076}{0.09} = \frac{76}{90}$$
$$P(R = - \mid G = B) = \frac{P(R = -, G = B)}{P(G = B)} = \frac{0.014}{0.09} = \frac{14}{90}$$
define the distribution of R conditional on G = B. (Note that P(G = B) = 0.076 + 0.014 = 0.09.)
11. By adding up the entries horizontally, we obtain the marginal distribution for A:
P(A = allergy) = P(A = allergy, T = positive)
+P(A = allergy, T = negative) +P(A = allergy, T = inconclusive)
= 0.3 + 0.07 + 0.1 = 0.47
P(A = no allergy) = P(A = no allergy, T = positive)
+P(A = no allergy, T = negative) +P(A = no allergy, T = inconclusive)
= 0.03 + 0.45 + 0.05 = 0.53
Thus, there is a 47% chance that a randomly selected person from the group has allergy.
By adding up the entries vertically, we obtain the marginal distribution for T:
P(T = positive) = P(A = allergy, T = positive) +P(A = no allergy, T = positive)
= 0.3 + 0.03 = 0.33
P(T = negative) = P(A = allergy, T = negative) +P(A = no allergy, T = negative)
= 0.07 + 0.45 = 0.52
P(T = inconclusive) = P(A = allergy, T = inconclusive) +P(A = no allergy, T = inconclusive)
= 0.1 + 0.05 = 0.15
Thus, for 33% of the population the test is positive, and for 52% it is negative; for 15% of the
population the test is inconclusive.
13. We compute
$$P(A = \text{allergy} \mid T = \text{negative}) = \frac{P(A = \text{allergy}, T = \text{negative})}{P(T = \text{negative})} = \frac{0.07}{0.07 + 0.45} = \frac{7}{52} \approx 0.13462$$
15. We need to find the probabilities a, b, c, and d which define the joint distribution:
          X = 1    X = 2
Y = 3     a        b
Y = 4     c        d
From the given information, we get the following equations:
P(X = 1) = 0.4 implies that a + c = 0.4
P(X = 2) = 0.6 implies that b + d = 0.6
$P(Y = 3 \mid X = 1) = 0.7$ implies that $\dfrac{P(Y = 3, X = 1)}{P(X = 1)} = \dfrac{a}{a + c} = 0.7$
$P(Y = 3 \mid X = 2) = 0.1$ implies that $\dfrac{P(Y = 3, X = 2)}{P(X = 2)} = \dfrac{b}{b + d} = 0.1$
Combining the first and the third equation we get a/0.4 = 0.7 and thus a = 0.28. From a + c = 0.4
it follows that c = 0.12. Combining the second and the fourth equation we get b/0.6 = 0.1 and thus
b = 0.06. From b + d = 0.6 it follows that d = 0.54. The joint distribution is
          X = 1    X = 2
Y = 3     0.28     0.06
Y = 4     0.12     0.54
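The solution of Exercise 15 amounts to multiplying each conditional probability by the corresponding marginal probability of X. A short Python sketch of that computation, with exact fractions and illustrative variable names, is given below.

```python
from fractions import Fraction

p_x = {1: Fraction(4, 10), 2: Fraction(6, 10)}            # P(X = x)
p_y3_given_x = {1: Fraction(7, 10), 2: Fraction(1, 10)}   # P(Y = 3 | X = x)

joint = {}
for x in p_x:
    # P(Y = 3, X = x) = P(Y = 3 | X = x) P(X = x)
    joint[(x, 3)] = p_y3_given_x[x] * p_x[x]
    # The rest of P(X = x) belongs to Y = 4.
    joint[(x, 4)] = p_x[x] - joint[(x, 3)]

for key in sorted(joint):
    print(key, float(joint[key]))
```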
17. By adding up the entries along the rows we obtain the distribution for F:
P(F = fish) = P(F = fish, P = brown bear)
+ P(F = fish, P = wolf) + P(F = fish, P = fox)
= 0.2 + 0.02 + 0.03 = 0.25
P(F = insects) = P(F = insects, P = brown bear)
+ P(F = insects, P = wolf) + P(F = insects, P = fox)
= 0.1 + 0.05 + 0.05 = 0.2
P(F = small mammals) = P(F = small mammals, P = brown bear)
+ P(F = small mammals, P = wolf) + P(F = small mammals, P = fox)
= 0.2 + 0.25 + 0.1 = 0.55
By adding up the entries vertically we obtain the distribution for P:
P(P = brown bear) = P(P = brown bear, F = fish)
+ P(P = brown bear, F = insects)
+ P(P = brown bear, F = small mammals)
= 0.2 + 0.1 + 0.2 = 0.5
P(P = wolf) = P(P = wolf, F = fish)
+ P(P = wolf, F = insects) + P(P = wolf, F = small mammals)
= 0.02 + 0.05 + 0.25 = 0.32
P(P = fox) = P(P = fox, F = fish)
+ P(P = fox, F = insects) + P(P = fox, F = small mammals)
= 0.03 + 0.05 + 0.1 = 0.18
19. The conditional probabilities are:
$$P(F = \text{fish} \mid P = \text{wolf}) = \frac{P(F = \text{fish}, P = \text{wolf})}{P(P = \text{wolf})} = \frac{0.02}{0.02 + 0.05 + 0.25} = \frac{0.02}{0.32}$$
$$P(F = \text{insects} \mid P = \text{wolf}) = \frac{P(F = \text{insects}, P = \text{wolf})}{P(P = \text{wolf})} = \frac{0.05}{0.02 + 0.05 + 0.25} = \frac{0.05}{0.32}$$
$$P(F = \text{small mammals} \mid P = \text{wolf}) = \frac{P(F = \text{small mammals}, P = \text{wolf})}{P(P = \text{wolf})} = \frac{0.25}{0.02 + 0.05 + 0.25} = \frac{0.25}{0.32}$$
The probabilities add up to 1, because a wolf will have one of the three for food.
21. The probability that a bear will prey on a small mammal is
$$P(F = \text{small mammals} \mid P = \text{bear}) = \frac{P(F = \text{small mammals}, P = \text{bear})}{P(P = \text{bear})} = \frac{0.2}{0.2 + 0.1 + 0.2} = \frac{2}{5}$$
or 40%.
23. (a) The marginal distribution for X is given by
P(X = 0) = P(X = 0, Y = 0) +P(X = 0, Y = 1) = 0.05 + 0.45 = 0.5
P(X = 1) = P(X = 1, Y = 0) +P(X = 1, Y = 1) = 0.1 + 0.4 = 0.5
The marginal distribution for Y is given by
P(Y = 0) = P(X = 0, Y = 0) +P(X = 1, Y = 0) = 0.05 + 0.1 = 0.15
P(Y = 1) = P(X = 0, Y = 1) +P(X = 1, Y = 1) = 0.45 + 0.4 = 0.85
(b) The random variables X and Y are not independent; for instance, P(X = 0, Y = 0) = 0.05 is not
equal to P(X = 0)P(Y = 0) = (0.5)(0.15) = 0.075.
25. (a) The marginal distribution for X is given by
P(X = 0) = P(X = 0, Y = 0) +P(X = 0, Y = 1) +P(X = 0, Y = 2) = 0.12 + 0.22 + 0.02 = 0.36
P(X = 1) = P(X = 1, Y = 0) +P(X = 1, Y = 1) +P(X = 1, Y = 2) = 0.18 + 0.28 + 0.18 = 0.64
The marginal distribution for Y is given by
P(Y = 0) = P(X = 0, Y = 0) +P(X = 1, Y = 0) = 0.12 + 0.18 = 0.3
P(Y = 1) = P(X = 0, Y = 1) +P(X = 1, Y = 1) = 0.22 + 0.28 = 0.5
P(Y = 2) = P(X = 0, Y = 2) +P(X = 1, Y = 2) = 0.02 + 0.18 = 0.2
(b) The random variables X and Y are not independent; for instance, P(X = 0, Y = 1) = 0.22 is not
equal to P(X = 0)P(Y = 1) = (0.36)(0.5) = 0.18.
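Marginal distributions and the independence test from Exercise 25 can be automated: sum the joint table over the other variable, then compare every joint entry with the product of the marginals. The Python sketch below uses the joint probabilities quoted in the solution; the variable names are chosen here for illustration.

```python
from fractions import Fraction

# Joint distribution P(X = x, Y = y) from Exercise 25.
joint = {
    (0, 0): Fraction("0.12"), (0, 1): Fraction("0.22"), (0, 2): Fraction("0.02"),
    (1, 0): Fraction("0.18"), (1, 1): Fraction("0.28"), (1, 2): Fraction("0.18"),
}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}   # marginal of X
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}   # marginal of Y

# X and Y are independent only if EVERY entry factors into the marginals.
independent = all(joint[(x, y)] == p_x[x] * p_y[y] for x in xs for y in ys)

print({x: float(p) for x, p in p_x.items()})
print({y: float(p) for y, p in p_y.items()})
print(independent)
```

A single entry that fails to factor is enough to rule out independence, which is why the solution exhibits only one such entry.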
27. We find
$$P(Y = 0 \mid X = 0) = \frac{P(Y = 0, X = 0)}{P(X = 0)} = \frac{0.2}{0.2 + 0.08 + 0.12} = \frac{0.2}{0.4} = 0.5$$
$$P(Y = 0 \mid X = 1) = \frac{P(Y = 0, X = 1)}{P(X = 1)} = \frac{0.3}{0.3 + 0.12 + 0.18} = \frac{0.3}{0.6} = 0.5$$
We see that $P(Y = 0 \mid X = 0) + P(Y = 0 \mid X = 1) = 0.5 + 0.5 = 1$. From the joint distribution table we
compute P(Y = 0) = 0.2 + 0.3 = 0.5. By examining the joint distribution closer, we realize that X and
Y are independent. Thus $P(Y = 0 \mid X = 0) + P(Y = 0 \mid X = 1) = P(Y = 0) + P(Y = 0) = 2P(Y = 0)$,
which is illustrated by their numeric values above.
29. The probabilities P(X = 0) = 0.05 + 0.1 + 0.4 = 0.55 and P(X = 1) = 0.1 + 0.1 + 0.25 = 0.45
define the marginal probability distribution of X.
31. The probabilities
$P(X = 0 \mid Y = 2) = \dfrac{P(X = 0, Y = 2)}{P(Y = 2)} = \dfrac{0.4}{0.4 + 0.25} = \dfrac{0.4}{0.65} = \dfrac{8}{13}$
$P(X = 1 \mid Y = 2) = \dfrac{P(X = 1, Y = 2)}{P(Y = 2)} = \dfrac{0.25}{0.4 + 0.25} = \dfrac{0.25}{0.65} = \dfrac{5}{13}$
define the distribution of X conditional on Y = 2.
33. The marginal probability distributions of X and Y are given in the last column and the last row,
respectively, of the table below.
         Y = 1              Y = 2
X = 2    0                  0.12               P(X = 2) = 0.12
X = 1    0.1                0.38               P(X = 1) = 0.48
X = 0    0.26               0.14               P(X = 0) = 0.4
         P(Y = 1) = 0.36    P(Y = 2) = 0.64
35. Assume that $P(X = 1) = p_1$ and $P(X = 2) = p_2$ is the probability distribution of X and
$P(Y = 3) = q_1$, $P(Y = 4) = q_2$, and $P(Y = 5) = q_3$ is the probability distribution of Y. Then
$E(X) = p_1 + 2p_2$ and $E(Y) = 3q_1 + 4q_2 + 5q_3$. The range of XY is {3, 4, 5, 6, 8, 10} and its distribution
is given by (here we use independence)
$P(XY = 3) = P(X = 1 \text{ and } Y = 3) = P(X = 1)P(Y = 3) = p_1 q_1$
$P(XY = 4) = P(X = 1 \text{ and } Y = 4) = P(X = 1)P(Y = 4) = p_1 q_2$
$P(XY = 5) = P(X = 1 \text{ and } Y = 5) = P(X = 1)P(Y = 5) = p_1 q_3$
$P(XY = 6) = P(X = 2 \text{ and } Y = 3) = P(X = 2)P(Y = 3) = p_2 q_1$
$P(XY = 8) = P(X = 2 \text{ and } Y = 4) = P(X = 2)P(Y = 4) = p_2 q_2$
$P(XY = 10) = P(X = 2 \text{ and } Y = 5) = P(X = 2)P(Y = 5) = p_2 q_3$
It follows that
$E(XY) = 3p_1q_1 + 4p_1q_2 + 5p_1q_3 + 6p_2q_1 + 8p_2q_2 + 10p_2q_3$
Since
$E(X)E(Y) = (p_1 + 2p_2)(3q_1 + 4q_2 + 5q_3) = 3p_1q_1 + 4p_1q_2 + 5p_1q_3 + 6p_2q_1 + 8p_2q_2 + 10p_2q_3$
we conclude that E(XY) = E(X)E(Y).
In general: the range of X is $\{x_1, x_2, \ldots, x_m\}$; assume that its distribution is given by
$P(X = x_i) = p_i$. The range of Y is $\{y_1, y_2, \ldots, y_n\}$; assume that its distribution is given by $P(Y = y_j) = q_j$.
Then $E(X) = p_1x_1 + p_2x_2 + \cdots + p_mx_m$ and $E(Y) = q_1y_1 + q_2y_2 + \cdots + q_ny_n$. The range of XY
consists of all products $x_iy_j$, where i = 1, 2, . . . , m and j = 1, 2, . . . , n. The probability distribution is
(by independence)
$P(XY = x_iy_j) = P(X = x_i \text{ and } Y = y_j) = P(X = x_i)P(Y = y_j) = p_iq_j$
and
$E(XY) = x_1y_1p_1q_1 + x_1y_2p_1q_2 + \cdots + x_1y_np_1q_n + x_2y_1p_2q_1 + x_2y_2p_2q_2 + \cdots + x_2y_np_2q_n + \cdots + x_my_1p_mq_1 + x_my_2p_mq_2 + \cdots + x_my_np_mq_n$
Computing the product
$E(X)E(Y) = (p_1x_1 + p_2x_2 + \cdots + p_mx_m)(q_1y_1 + q_2y_2 + \cdots + q_ny_n)$
term by term gives exactly the same sum, so we see that E(XY) = E(X)E(Y).
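The identity E(XY) = E(X)E(Y) for independent variables can also be checked numerically. The sketch below uses hypothetical values for the probabilities $p_i$ and $q_j$ (they are chosen only for illustration and are not from the exercise); the joint probabilities are built as products, which is exactly the independence assumption.

```python
from itertools import product

# Hypothetical marginal distributions, chosen only for illustration.
dist_x = {1: 0.3, 2: 0.7}          # P(X = x)
dist_y = {3: 0.2, 4: 0.5, 5: 0.3}  # P(Y = y)


def expectation(dist):
    """Expected value of a discrete distribution given as {value: probability}."""
    return sum(value * prob for value, prob in dist.items())


# Under independence, P(X = x and Y = y) = P(X = x) P(Y = y).
e_xy = sum(
    x * y * px * py
    for (x, px), (y, py) in product(dist_x.items(), dist_y.items())
)

e_x_times_e_y = expectation(dist_x) * expectation(dist_y)
print(e_xy, e_x_times_e_y)
```

The two printed numbers agree up to floating-point rounding; replacing the product `px * py` by a joint table that is not a product of its marginals would generally break the equality.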
Section 10 The Binomial Distribution
1. No. The binomial distribution requires that the same experiment (with the same probability of
success) be repeated. In this case, the probability of success (a male is interviewed) changes: initially,
the probability that a male is selected for an interview is 1/2. Assuming independence, the probability
that the second interviewee is a male is 10/19 (if the first interviewee was a woman) or 9/19 (if the
first interviewee was a man); however, neither is equal to 50%.
3. Define
$B = \begin{cases} 1 & \text{goshawk catches a small mammal (success)} \\ 0 & \text{goshawk does not catch a small mammal} \end{cases}$
B is a Bernoulli random variable with the probability of success equal to 0.6. Repeat the experiment
10 times; by assumption, the outcomes are independent. The random variable X = "number of
small mammals captured" counts the number of successes in ten independent repetitions of the same
experiment. Thus, X is a binomial variable.
5. Let S represent success and F represent a no-success (failure). Exactly two successes in four trials
occur in the following six cases: SSFF, SFSF, SFFS, FSSF, FSFS, and FFSS. They are all equally
likely, with the probability
$P(SSFF) = P(S)P(S)P(F)P(F) = (0.3)(0.3)(0.7)(0.7) = (0.3)^2(0.7)^2$
Thus, the probability of obtaining exactly two successes in four trials is $6P(SSFF) = 6(0.3)^2(0.7)^2 = 0.2646$.
Now the binomial distribution approach: the probability of success in a single experiment is 0.3.
The probability of 2 successes in 4 independent repetitions of the experiment is given by
$b(2, 4; 0.3) = \binom{4}{2}(0.3)^2(0.7)^2 = 6(0.3)^2(0.7)^2$
Clearly, the two answers match.
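The two approaches in Exercise 5 can be compared by brute force. This is a small sketch, assuming outcomes are written as strings of the letters S and F; it lists every sequence of four trials, keeps those with exactly two successes, and compares the summed probability with the binomial formula.

```python
from itertools import product
from math import comb

p = 0.3  # probability of success in a single trial

# Enumerate all 2**4 outcome sequences and keep those with exactly two successes.
favourable = [
    "".join(seq) for seq in product("SF", repeat=4) if seq.count("S") == 2
]

# Each favourable sequence has probability p^2 (1 - p)^2.
prob_enumeration = sum(
    p ** s.count("S") * (1 - p) ** s.count("F") for s in favourable
)

# Binomial formula b(2, 4; 0.3).
prob_formula = comb(4, 2) * p**2 * (1 - p) ** 2

print(favourable)
print(prob_enumeration, prob_formula)
```

The enumeration is only practical for small n (the number of sequences doubles with each extra trial), which is the reason the binomial coefficient is used instead.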
7. $\binom{12}{3}$ represents the number of ways to choose a group of three objects from a group of 12 objects
(say, the number of ways of picking three shirts from a collection of 12 shirts in different colours). We
compute
$\binom{12}{3} = \dfrac{12 \cdot 11 \cdot 10}{1 \cdot 2 \cdot 3} = 2 \cdot 11 \cdot 10 = 220$
9. We compute
$C(8, 0) = \binom{8}{0} = \dfrac{8!}{0!\,(8 - 0)!} = 1$
since 0! = 1. In theory, C(8, 0) represents the number of ways to choose zero objects from a group
of eight objects (we can define that there is one way of not picking any object from a group of 8
objects).
11. The number b(1, 4; 0.6) represents the probability of one success in 4 independent repetitions of
the same Bernoulli experiment with the probability of success equal to 0.6. We compute
$b(1, 4; 0.6) = \binom{4}{1}(0.6)^1(1 - 0.6)^{4-1} = 4(0.6)(0.4)^3 = 0.1536$
13. The number b(1, 7; 0.2) represents the probability of one success in 7 independent repetitions of
the same Bernoulli experiment whose probability of success is 0.2. We compute
$b(1, 7; 0.2) = \binom{7}{1}(0.2)^1(1 - 0.2)^{7-1} = 7(0.2)(0.8)^6 \approx 0.3670$
15. Label the tosses by numbers S = {1, 2, 3, 4, 5, 6, 7, 8}. Picking a group of three numbers from S
corresponds to one event in which 3 tails occurred in 8 tosses (for instance, picking 4, 5 and 8 describes
the event in which tails occurred on the 4th, 5th and 8th tosses). Thus, the number of ways of getting
three tails in eight tosses of a coin is equal to the number of ways of selecting a group of 3 numbers
from the set S of 8 numbers, which is
$\binom{8}{3} = \dfrac{8 \cdot 7 \cdot 6}{1 \cdot 2 \cdot 3} = 56$
17. The number of ways of selecting a team of 4 students from a group of 20 students is
$\binom{20}{4} = \dfrac{20 \cdot 19 \cdot 18 \cdot 17}{1 \cdot 2 \cdot 3 \cdot 4} = 5 \cdot 19 \cdot 3 \cdot 17 = 4845$
19. "At least three successes" means 3, 4, or 5 successes. Thus, the probability of at least three
successes in five trials is given by b(3, 5; 0.6) + b(4, 5; 0.6) + b(5, 5; 0.6).
21. The number of successes could be 5, 6, 7, 8, or 9. The probability is given by b(5, 25; 0.6) +
b(6, 25; 0.6) +b(7, 25; 0.6) +b(8, 25; 0.6) +b(9, 25; 0.6).
23. Formula (10.3) states that
$\binom{n}{k} = \dfrac{n!}{k!\,(n-k)!}$
Replacing k by n − k, we get
$\binom{n}{n-k} = \dfrac{n!}{(n-k)!\,(n-(n-k))!} = \dfrac{n!}{(n-k)!\,k!} = \binom{n}{k}$
Thus,
$\binom{22}{20} = \binom{22}{22-20} = \binom{22}{2} = \dfrac{22 \cdot 21}{1 \cdot 2} = 231$
25. (a) The probability distribution of X is given by:
$P(X = 0) = b(0, 2; 0.4) = \binom{2}{0}(0.4)^0(1 - 0.4)^2 = (0.6)^2 = 0.36$
$P(X = 1) = b(1, 2; 0.4) = \binom{2}{1}(0.4)^1(1 - 0.4)^1 = 2(0.4)(0.6) = 0.48$
$P(X = 2) = b(2, 2; 0.4) = \binom{2}{2}(0.4)^2(1 - 0.4)^0 = (0.4)^2 = 0.16$
(b) [Figure: histogram of P(X = k) against k, with bar heights 0.36, 0.48, and 0.16 at k = 0, 1, 2.]
(c) The mean is
$E(X) = 0 \cdot 0.36 + 1 \cdot 0.48 + 2 \cdot 0.16 = 0.8$
From
$E(X^2) = 0 \cdot 0.36 + 1 \cdot 0.48 + 4 \cdot 0.16 = 1.12$
we compute the variance
$\mathrm{var}(X) = E(X^2) - (E(X))^2 = 1.12 - 0.8^2 = 0.48$
(d) Using (10.4), $E(X) = np = 2 \cdot 0.4 = 0.8$; using (10.5), $\mathrm{var}(X) = np(1 - p) = 2 \cdot 0.4 \cdot 0.6 = 0.48$.
27. (a) The probability distribution of X is given by:
$P(X = 0) = b(0, 4; 0.4) = \binom{4}{0}(0.4)^0(0.6)^4 = (0.6)^4 = 0.1296$
$P(X = 1) = b(1, 4; 0.4) = \binom{4}{1}(0.4)^1(0.6)^3 = 4(0.4)(0.6)^3 = 0.3456$
$P(X = 2) = b(2, 4; 0.4) = \binom{4}{2}(0.4)^2(0.6)^2 = 6(0.4)^2(0.6)^2 = 0.3456$
$P(X = 3) = b(3, 4; 0.4) = \binom{4}{3}(0.4)^3(0.6)^1 = 4(0.4)^3(0.6) = 0.1536$
$P(X = 4) = b(4, 4; 0.4) = \binom{4}{4}(0.4)^4(0.6)^0 = (0.4)^4 = 0.0256$
(b) [Figure: histogram of P(X = k) against k, with bar heights 0.1296, 0.3456, 0.3456, 0.1536, and 0.0256 at k = 0, 1, 2, 3, 4.]
(c) The mean is
$E(X) = 0 \cdot 0.1296 + 1 \cdot 0.3456 + 2 \cdot 0.3456 + 3 \cdot 0.1536 + 4 \cdot 0.0256 = 1.6$
From
$E(X^2) = 0 \cdot 0.1296 + 1 \cdot 0.3456 + 4 \cdot 0.3456 + 9 \cdot 0.1536 + 16 \cdot 0.0256 = 3.52$
we compute the variance
$\mathrm{var}(X) = E(X^2) - (E(X))^2 = 3.52 - 1.6^2 = 0.96$
(d) Using (10.4), $E(X) = np = 4 \cdot 0.4 = 1.6$; using (10.5), $\mathrm{var}(X) = np(1 - p) = 4 \cdot 0.4 \cdot 0.6 = 0.96$.
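The table in Exercise 27 and the comparison in parts (c) and (d) can be reproduced in a few lines. This is a minimal sketch; the function name `binomial_pmf` is an illustrative choice and simply mirrors the notation b(k, n; p) used above.

```python
from math import comb


def binomial_pmf(k, n, p):
    """b(k, n; p): probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 4, 0.4
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Mean and variance computed directly from the distribution, as in part (c).
mean = sum(k * pk for k, pk in enumerate(pmf))
second_moment = sum(k**2 * pk for k, pk in enumerate(pmf))
variance = second_moment - mean**2

print([round(pk, 4) for pk in pmf])
print("from the distribution:", mean, variance)
print("from np and np(1 - p):", n * p, n * p * (1 - p))
```

Changing `n` and `p` to 2 and 0.4 reproduces the numbers of Exercise 25 in the same way.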
29. (a) Define
$H = \begin{cases} 1 & \text{a chocolate has a hazelnut (success)} \\ 0 & \text{a chocolate has no hazelnut} \end{cases}$
H is a Bernoulli random variable with probability of success equal to 0.03. Let N = "number of
chocolates with a hazelnut in a box of 20 chocolates." Since N counts the number of successes in
20 repetitions of H (assumed independent), N is a binomially distributed random variable with n = 20
and p = 0.03.
The expected number of chocolates with a hazelnut per box is E(N) = np = 20(0.03) = 0.6.
(b) The probability that there are no chocolates with a hazelnut in one box of 20 chocolates is the
probability of no successes in 20 repetitions:
$b(0, 20; 0.03) = \binom{20}{0}(0.03)^0(0.97)^{20} \approx 0.54379$
i.e., a bit over 54%.
(c) Define
$B = \begin{cases} 1 & \text{a box of chocolates has no chocolates with a hazelnut (success)} \\ 0 & \text{a box of chocolates has a chocolate with a hazelnut} \end{cases}$
B is a Bernoulli random variable with probability of success equal to 0.54379. Let M = "number of
boxes of chocolates which do not contain a chocolate with a hazelnut." Since M counts the number
of successes in 15 independent repetitions of B, M is a binomially distributed random variable with
n = 15 and p = 0.54379.
The expected number of boxes that contain no chocolates with a hazelnut is E(M) = np =
15(0.54379) = 8.15685; i.e., about 8 boxes.
31. Define
$T = \begin{cases} 1 & \text{a tomato plant has been infested with hornworms (success)} \\ 0 & \text{a tomato plant has not been infested with hornworms} \end{cases}$
T is a Bernoulli random variable with probability of success equal to 0.15. Let N = "number of
tomato plants which have been infested with hornworms." Since N counts the number of successes in
independent repetitions of T, N is a binomially distributed random variable; it is given that n = 10
and p = 0.15. The probability that none of the ten randomly picked tomato plants have been infested
with hornworms is (zero successes in ten trials)
$b(0, 10; 0.15) = \binom{10}{0}(0.15)^0(0.85)^{10} \approx 0.19687$
33. The probability distribution of the genotype of a puppy of SC parents is P(SS) = 0.25, P(SC) =
0.5, and P(CC) = 0.25. Thus,
P(puppy has straight hair) = P(SS) +P(SC) = 0.75
P(puppy has curly hair) = P(CC) = 0.25
Define
$H = \begin{cases} 1 & \text{a puppy has curly hair (success)} \\ 0 & \text{a puppy does not have curly hair} \end{cases}$
H is a Bernoulli random variable with probability of success equal to 0.25. Let N = "number of
puppies which have curly hair." Since N counts the number of successes in independent repetitions
of H, N is a binomially distributed random variable; it is given that n = 8 and p = 0.25.
The expected number of puppies with curly hair is E(N) = np = 8(0.25) = 2. The probability
that exactly 2 puppies have curly hair is
$b(2, 8; 0.25) = \binom{8}{2}(0.25)^2(0.75)^6 \approx 0.31146$
35. The probability distribution of the genotype of an offspring of LS parents is P(LL) = 0.25 =
P(long), P(LS) = 0.5 = P(medium-sized), and P(SS) = 0.25 = P(short). Define
$T = \begin{cases} 1 & \text{an offspring is medium-sized (success)} \\ 0 & \text{an offspring is not medium-sized} \end{cases}$
T is a Bernoulli random variable with probability of success equal to 0.5. Let N = "number of
medium-sized offspring." Since N counts the number of successes in repetitions of T, N is a binomially
distributed random variable; it is given that n = 12 and p = 0.5.
(a) The expected number of medium-sized offspring is E(N) = np = 12(0.5) = 6.
(b) The probability that there are at most two medium-sized offspring is
$b(0, 12; 0.5) + b(1, 12; 0.5) + b(2, 12; 0.5)$
$= \binom{12}{0}(0.5)^0(0.5)^{12} + \binom{12}{1}(0.5)^1(0.5)^{11} + \binom{12}{2}(0.5)^2(0.5)^{10}$
$= (1 + 12 + 66)(0.5)^{12} \approx 0.01929$
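37. (a) We approximate
$50! \approx \sqrt{2\pi \cdot 50}\left(\dfrac{50}{e}\right)^{50} = 10\sqrt{\pi}\left(\dfrac{50}{e}\right)^{50} \approx 3.036344619 \cdot 10^{64}$
The true value is
50! = 30414093201713378043612608166064768844377641568960512000000000000
(b) We get
$\log_{10}\left(\sqrt{2\pi n}\left(\dfrac{n}{e}\right)^{n}\right) = \log_{10}\sqrt{2\pi n} + \log_{10}\left(\dfrac{n}{e}\right)^{n}$
$= \dfrac{1}{2}\left(\log_{10}(2\pi) + \log_{10} n\right) + n\left(\log_{10} n - \log_{10} e\right)$
$= \dfrac{1}{2}\log_{10}(2\pi) + \left(n + \dfrac{1}{2}\right)\log_{10} n - n\log_{10} e$
When n = 120,
$\log_{10} 120! \approx \dfrac{1}{2}\log_{10}(2\pi) + 120.5\log_{10} 120 - 120\log_{10} e \approx 198.8250922$
(c) Using (b), we get
$\log_{10}\binom{120}{36} = \log_{10}\dfrac{120!}{36!\,84!} = \log_{10} 120! - \left[\log_{10} 36! + \log_{10} 84!\right]$
$\approx \dfrac{1}{2}\log_{10}(2\pi) + 120.5\log_{10} 120 - 120\log_{10} e - \left[\dfrac{1}{2}\log_{10}(2\pi) + 36.5\log_{10} 36 - 36\log_{10} e + \dfrac{1}{2}\log_{10}(2\pi) + 84.5\log_{10} 84 - 84\log_{10} e\right]$
$= -\dfrac{1}{2}\log_{10}(2\pi) + 120.5\log_{10} 120 - 36.5\log_{10} 36 - 84.5\log_{10} 84$
$\approx 30.7356092$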
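The Stirling estimates in Exercise 37 are easy to compare with exact values, since Python integers have arbitrary precision. The sketch below is one way to do it; the function names `stirling` and `log10_stirling` are illustrative, and the second one implements the formula derived in part (b).

```python
from math import comb, e, factorial, log10, pi, sqrt


def stirling(n):
    """Stirling's approximation sqrt(2 pi n) (n / e)^n to n!."""
    return sqrt(2 * pi * n) * (n / e) ** n


def log10_stirling(n):
    """Base-10 logarithm of Stirling's approximation, as derived in part (b)."""
    return 0.5 * log10(2 * pi) + (n + 0.5) * log10(n) - n * log10(e)


# Part (a): compare with the exact factorial.
approx_50 = stirling(50)
exact_50 = factorial(50)
relative_error = abs(approx_50 - exact_50) / exact_50

# Part (c): logarithm of the binomial coefficient C(120, 36).
log_binom_stirling = log10_stirling(120) - log10_stirling(36) - log10_stirling(84)
log_binom_exact = log10(comb(120, 36))

print(approx_50, relative_error)
print(log10_stirling(120), log_binom_stirling, log_binom_exact)
```

Working with logarithms avoids overflow for large n, where $(n/e)^n$ no longer fits in a floating-point number.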
Section 11 The Multinomial and the Geometric Distributions
1. (a) We can do it in $\dfrac{4!}{1!\,3!} = \dfrac{24}{6} = 4$ ways: {1 | 2, 3, 4}, {2 | 1, 3, 4}, {3 | 1, 2, 4}, and {4 | 1, 2, 3}.
(b) We can do it in $\dfrac{4!}{2!\,2!} = \dfrac{24}{4} = 6$ ways: {1, 2 | 3, 4}, {1, 3 | 2, 4}, {1, 4 | 2, 3}, {2, 3 | 1, 4}, {2, 4 | 1, 3},
and {3, 4 | 1, 2}.
(c) We can do it in $\dfrac{4!}{1!\,1!\,2!} = \dfrac{24}{2} = 12$ ways: {1 | 2 | 3, 4}, {2 | 1 | 3, 4}, {1 | 3 | 2, 4}, {3 | 1 | 2, 4}, {1 | 4 | 2, 3},
{4 | 1 | 2, 3}, {2 | 3 | 1, 4}, {3 | 2 | 1, 4}, {2 | 4 | 1, 3}, {4 | 2 | 1, 3}, {3 | 4 | 1, 2}, and {4 | 3 | 1, 2}.
3. (a) The probability that the 80 wolves will prey on 10 deer, 70 beavers, no moose, and no animals
from the "other" group is
$P(N_1 = 10, N_2 = 70, N_3 = 0, N_4 = 0) = \dfrac{80!}{10!\,70!\,0!\,0!}\,0.33^{10}\,0.55^{70}\,0.05^{0}\,0.07^{0} = \dfrac{80!}{10!\,70!}\,0.33^{10}\,0.55^{70}$
(b) It is given that $N_2 = 60$, $N_4 = 16$ and $N_1 + N_3 = 4$. The probability is (we go through all
combinations of $N_1$ and $N_3$ whose sum is 4):
$P(N_1 = 0, N_2 = 60, N_3 = 4, N_4 = 16) + P(N_1 = 1, N_2 = 60, N_3 = 3, N_4 = 16)$
$+ P(N_1 = 2, N_2 = 60, N_3 = 2, N_4 = 16) + P(N_1 = 3, N_2 = 60, N_3 = 1, N_4 = 16)$
$+ P(N_1 = 4, N_2 = 60, N_3 = 0, N_4 = 16)$
$= \dfrac{80!}{0!\,60!\,4!\,16!}\,0.33^{0}\,0.55^{60}\,0.05^{4}\,0.07^{16} + \dfrac{80!}{1!\,60!\,3!\,16!}\,0.33^{1}\,0.55^{60}\,0.05^{3}\,0.07^{16}$
$+ \dfrac{80!}{2!\,60!\,2!\,16!}\,0.33^{2}\,0.55^{60}\,0.05^{2}\,0.07^{16} + \dfrac{80!}{3!\,60!\,1!\,16!}\,0.33^{3}\,0.55^{60}\,0.05^{1}\,0.07^{16}$
$+ \dfrac{80!}{4!\,60!\,0!\,16!}\,0.33^{4}\,0.55^{60}\,0.05^{0}\,0.07^{16}$
5. The probability distribution of the genotype of an offspring of AB parents is P(AA) = 0.25,
P(AB) = 0.5, and P(BB) = 0.25. There is a total of 9 offspring. The probability is
$P(\text{three AA, two AB, four BB}) = \dfrac{9!}{3!\,2!\,4!}\,0.25^{3}\,0.5^{2}\,0.25^{4} = \dfrac{9!}{6 \cdot 2 \cdot 24}\,0.25^{8} \approx 0.01923$
i.e., close to 2%.
7. The probability distribution of the genotype of an offspring of LS parents is P(LL) = 0.25 =
P(long), P(LS) = 0.5 = P(medium length), and P(SS) = 0.25 = P(short).
(a) The probability is
$P(\text{two LL, two LS, two SS}) = \dfrac{6!}{2!\,2!\,2!}\,0.25^{2}\,0.5^{2}\,0.25^{2} = \dfrac{6!}{8}\,0.25^{5} \approx 0.08789$
(b) The probability is
$P(\text{two LL, zero LS, four SS}) + P(\text{two LL, one LS, three SS})$
$= \dfrac{6!}{2!\,0!\,4!}\,0.25^{2}\,0.5^{0}\,0.25^{4} + \dfrac{6!}{2!\,1!\,3!}\,0.25^{2}\,0.5^{1}\,0.25^{3}$
$\approx 0.00366 + 0.02930 = 0.03296$
9. The probability distribution of the genotype of an offspring of AB parents is
P(AA) = 0.25 = P(neither carrier nor has the trait)
P(AB) = 0.5 = P(carrier)
P(BB) = 0.25 = P(has the trait)
The probability that one child will have attached earlobes, two will be carriers, and one will neither
be a carrier nor have attached earlobes is
$P(\text{one AA, two AB, one BB}) = \dfrac{4!}{1!\,2!\,1!}(0.25)^{1}(0.5)^{2}(0.25)^{1} = 12\,(0.25)^{3} = 0.1875$
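The multinomial probabilities in Exercises 5 to 9 all follow the same pattern, which the following sketch captures. The function name `multinomial_pmf` and its argument layout (a tuple of counts and a matching tuple of category probabilities) are assumptions made for this illustration.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of observing the given category counts in sum(counts) trials."""
    n = sum(counts)
    coefficient = factorial(n)
    for c in counts:
        coefficient //= factorial(c)  # stays an integer at every step
    return coefficient * prod(p**c for p, c in zip(probs, counts))


genotype_probs = (0.25, 0.5, 0.25)  # P(AA), P(AB), P(BB)

# Exercise 9: one AA, two AB, one BB among four children.
print(multinomial_pmf((1, 2, 1), genotype_probs))

# Exercise 5: three AA, two AB, four BB among nine offspring.
print(multinomial_pmf((3, 2, 4), genotype_probs))

# Exercise 7(b): sum over the two admissible splits.
print(multinomial_pmf((2, 0, 4), genotype_probs)
      + multinomial_pmf((2, 1, 3), genotype_probs))
```

With only two categories the function reduces to the binomial formula b(k, n; p) of Section 10.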
11. (a) Consider the geometric distribution with probability of success p = 0.15. The probability of
the first success occurring on the fourth trial is
$P(X = 4) = (1 - 0.15)^{3}(0.15) = (0.85)^{3}(0.15) \approx 0.092$
(b) [Figure: histogram of the geometric distribution with p = 0.15, for k = 1 to 20; vertical axis from 0 to 0.18.]
13. (a) Consider the geometric distribution with probability of success p = 0.2. The probability of
the first success occurring on the third trial is
$P(X = 3) = (1 - 0.2)^{2}(0.2) = (0.8)^{2}(0.2) = 0.128$
(b) [Figure: histogram of the geometric distribution with p = 0.2, for k = 1 to 12; vertical axis from 0 to 0.2.]
15. (a) Consider the geometric distribution with probability of success p = 0.6. The probability that
the first success occurs on or after the fourth trial is (use the complementary event "success occurs
before the fourth trial")
$P(X \ge 4) = 1 - P(X < 4)$
$= 1 - [P(X = 1) + P(X = 2) + P(X = 3)]$
$= 1 - [0.6 + (1 - 0.6)(0.6) + (1 - 0.6)^{2}(0.6)] = 0.064$
(b) [Figure: histogram of the geometric distribution with p = 0.6, for k = 1 to 8; vertical axis from 0 to 0.6.]
17. (a) Consider the geometric distribution with probability of success p = 0.6. The probability that
the first success occurs on or before the fourth trial is
$P(X \le 4) = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)$
$= 0.6 + (1 - 0.6)(0.6) + (1 - 0.6)^{2}(0.6) + (1 - 0.6)^{3}(0.6)$
$= 0.6\,[1 + 0.4 + 0.4^{2} + 0.4^{3}]$
$= 0.6 \cdot \dfrac{1 - 0.4^{4}}{1 - 0.4} = 1 - 0.4^{4} = 0.9744$
(In calculating the sum in the end, we used the formula $1 + q + q^{2} + q^{3} + \cdots + q^{n} = (1 - q^{n+1})/(1 - q)$
with q = 0.4 and n = 3.)
(b) [Figure: histogram of the geometric distribution with p = 0.6, for k = 1 to 8; vertical axis from 0 to 0.6.]
19. A geometric distribution with larger p is less spread out than the one with smaller p (look at the
histograms in Figure 11.1). Thus, the geometric distribution with $p_2 = p/2$ is more spread out than
the one with $p_1 = p$.
Formally: the variances are $\mathrm{var}_1 = (1 - p)/p^{2}$ and
$\mathrm{var}_2 = \dfrac{1 - \frac{p}{2}}{\left(\frac{p}{2}\right)^{2}} = \dfrac{1 - \frac{p}{2}}{\frac{p^{2}}{4}} = \dfrac{4 - 2p}{p^{2}}$
From 4 − 2p > 1 − p (which is true whenever p < 3) we conclude
$\mathrm{var}_1 = \dfrac{1 - p}{p^{2}} < \dfrac{4 - 2p}{p^{2}} = \mathrm{var}_2$
So, $p_2 = p/2$ yields larger variance (thus, wider spread) than $p_1 = p$.
21. From E(X) = 1/p = 5 we get p = 0.2. The variance is $\mathrm{var}(X) = (1 - p)/p^{2} = 0.8/0.04 = 20$ and
the standard deviation is $\sqrt{20} \approx 4.472$.
23. From $\mathrm{var} = (1 - p)/p^{2} = 2$ we get $2p^{2} = 1 - p$ and $2p^{2} + p - 1 = (2p - 1)(p + 1) = 0$. Thus p = 1/2
(the remaining solution p = −1 makes no sense) and so the mean is 1/(1/2) = 2.
25. Let X = "number of trials until gene mutates." X is a geometrically distributed random variable
with the probability of success p = 0.001.
The probability that a gene will mutate during the 20th cell division is
$P(X = 20) = (1 - 0.001)^{19}(0.001) \approx 0.00098$
The probability that the gene will mutate before or during the 20th cell division is
$P(X \le 20) = P(X = 1) + P(X = 2) + P(X = 3) + \cdots + P(X = 20)$
$= 0.001 + (0.999)(0.001) + (0.999)^{2}(0.001) + \cdots + (0.999)^{19}(0.001)$
$= 0.001\,[1 + 0.999 + (0.999)^{2} + \cdots + (0.999)^{19}]$
$= 0.001 \cdot \dfrac{1 - 0.999^{20}}{1 - 0.999} = 1 - 0.999^{20} \approx 0.0198$
(In calculating the sum we used the formula $1 + q + q^{2} + q^{3} + \cdots + q^{n} = (1 - q^{n+1})/(1 - q)$ with
q = 0.999 and n = 19.)
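The geometric probabilities used in Exercises 11 to 25 can be checked with two short helper functions. This is a sketch under the convention used above (X counts the trial on which the first success occurs, so X = 1, 2, 3, ...); the names `geometric_pmf` and `geometric_cdf` are illustrative.

```python
def geometric_pmf(k, p):
    """P(X = k): first success occurs on trial k (k = 1, 2, 3, ...)."""
    return (1 - p) ** (k - 1) * p


def geometric_cdf(k, p):
    """P(X <= k) = 1 - (1 - p)^k, the closed form of the finite geometric sum."""
    return 1 - (1 - p) ** k


p = 0.001  # mutation probability per cell division (Exercise 25)

term_by_term = sum(geometric_pmf(j, p) for j in range(1, 21))
closed_form = geometric_cdf(20, p)

print(geometric_pmf(20, p))
print(term_by_term, closed_form)
print(geometric_cdf(4, 0.6))  # Exercise 17
```

Summing the terms one by one and using the closed form give the same value up to rounding, which is the content of the geometric-sum formula quoted in the solution.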
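27. (a) We compute
$s_n - qs_n = 1 + q + q^{2} + q^{3} + \cdots + q^{n} - q(1 + q + q^{2} + q^{3} + \cdots + q^{n})$
$s_n(1 - q) = 1 + q + q^{2} + q^{3} + \cdots + q^{n} - q - q^{2} - q^{3} - \cdots - q^{n} - q^{n+1}$
$s_n(1 - q) = 1 - q^{n+1}$
$s_n = \dfrac{1 - q^{n+1}}{1 - q}$
(b) Since |q| < 1, it follows that the limit of $q^{n+1}$ as $n \to \infty$ is zero. Thus,
$\lim_{n \to \infty} s_n = \lim_{n \to \infty} \dfrac{1 - q^{n+1}}{1 - q} = \dfrac{1}{1 - q}$
i.e.,
$1 + q + q^{2} + q^{3} + \cdots = \dfrac{1}{1 - q}$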
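29. (a) Differentiating
$1 + q + q^{2} + q^{3} + \cdots = \dfrac{1}{1 - q}$
with respect to q, we get
$1 + 2q + 3q^{2} + 4q^{3} + \cdots = -(1 - q)^{-2}(-1) = \dfrac{1}{(1 - q)^{2}}$
Replacing q by 1 − p yields
$1 + 2(1 - p) + 3(1 - p)^{2} + 4(1 - p)^{3} + \cdots = \dfrac{1}{(1 - (1 - p))^{2}}$
$\sum_{k=1}^{\infty} k(1 - p)^{k-1} = \dfrac{1}{p^{2}}$
(b) Differentiating
$1 + q + q^{2} + q^{3} + \cdots = \dfrac{1}{1 - q}$
with respect to q, then multiplying by q and differentiating with respect to q again, we obtain
$1 + 2q + 3q^{2} + 4q^{3} + \cdots = \dfrac{1}{(1 - q)^{2}}$
$q + 2q^{2} + 3q^{3} + 4q^{4} + \cdots = \dfrac{q}{(1 - q)^{2}}$
$1 + 2^{2}q + 3^{2}q^{2} + 4^{2}q^{3} + \cdots = \dfrac{(1 - q)^{2} - q \cdot 2(1 - q)(-1)}{(1 - q)^{4}} = \dfrac{(1 - q) + 2q}{(1 - q)^{3}}$
$1 + 2^{2}q + 3^{2}q^{2} + 4^{2}q^{3} + \cdots = \dfrac{q + 1}{(1 - q)^{3}}$
Replacing q by 1 − p and then multiplying by p yields
$1 + 2^{2}(1 - p) + 3^{2}(1 - p)^{2} + 4^{2}(1 - p)^{3} + \cdots = \dfrac{(1 - p) + 1}{(1 - (1 - p))^{3}}$
$\sum_{k=1}^{\infty} k^{2}(1 - p)^{k-1} = \dfrac{2 - p}{p^{3}}$
$p\sum_{k=1}^{\infty} k^{2}(1 - p)^{k-1} = \dfrac{2 - p}{p^{2}}$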
Section 12 The Poisson Distribution
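1. It is given that $X \sim \mathrm{Po}(2.5)$. Using
$P(X = k) = \dfrac{e^{-\lambda}\lambda^{k}}{k!} = \dfrac{e^{-2.5}(2.5)^{k}}{k!}$
we obtain
$P(X = 0) = \dfrac{e^{-2.5}(2.5)^{0}}{0!} = e^{-2.5} \approx 0.0820850$
$P(X = 1) = \dfrac{e^{-2.5}(2.5)^{1}}{1!} \approx 0.205212$
$P(X = 2) = \dfrac{e^{-2.5}(2.5)^{2}}{2!} \approx 0.256516$
$P(X = 3) = \dfrac{e^{-2.5}(2.5)^{3}}{3!} \approx 0.213763$
$P(X = 4) = \dfrac{e^{-2.5}(2.5)^{4}}{4!} \approx 0.133602$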
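The Poisson probabilities in this section are all evaluations of the same formula, so a small helper makes them easy to check. The sketch below assumes nothing beyond the formula quoted in Exercise 1; the function name `poisson_pmf` is an illustrative choice.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam**k / factorial(k)


# Exercise 1: the first five probabilities for lambda = 2.5.
table = [poisson_pmf(k, 2.5) for k in range(5)]
for k, prob in enumerate(table):
    print(k, round(prob, 6))

# A tail probability is obtained through the complement,
# e.g. P(X > 5) for lambda = 3 (the pattern used in Exercise 21).
tail = 1 - sum(poisson_pmf(k, 3) for k in range(6))
print(round(tail, 6))
```

The complement trick in the last lines matters because the event X > 5 contains infinitely many values of k, while its complement contains only six.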
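3. It is given that $X \sim \mathrm{Po}(12)$. We find
$P(4 \le X \le 7) = P(X = 4) + P(X = 5) + P(X = 6) + P(X = 7)$
$= \dfrac{e^{-12}12^{4}}{4!} + \dfrac{e^{-12}12^{5}}{5!} + \dfrac{e^{-12}12^{6}}{6!} + \dfrac{e^{-12}12^{7}}{7!}$
$= e^{-12}\left(\dfrac{12^{4}}{4!} + \dfrac{12^{5}}{5!} + \dfrac{12^{6}}{6!} + \dfrac{12^{7}}{7!}\right) \approx 0.087213$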
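5. It is given that $X \sim \mathrm{Po}(4)$. We find
$P(0 \le X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)$
$= \dfrac{e^{-4}4^{0}}{0!} + \dfrac{e^{-4}4^{1}}{1!} + \dfrac{e^{-4}4^{2}}{2!} + \dfrac{e^{-4}4^{3}}{3!}$
$= e^{-4}\left(1 + 4 + 8 + \dfrac{32}{3}\right) \approx 0.433470$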
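7. Look at Figure 12.1. There are two identical probabilities, corresponding to the values $P(X = \lambda - 1)$
and $P(X = \lambda)$. Thus, the given graph represents the Poisson distribution with $\lambda = 4$.
We now prove that the observation we made is indeed true. Assume that $\lambda$ is an integer, $\lambda \ge 1$,
and that $X \sim \mathrm{Po}(\lambda)$. Then
$P(X = \lambda - 1) = \dfrac{e^{-\lambda}\lambda^{\lambda - 1}}{(\lambda - 1)!} = \dfrac{e^{-\lambda}\lambda^{\lambda - 1}}{(\lambda - 1)!} \cdot \dfrac{\lambda}{\lambda} = \dfrac{e^{-\lambda}\lambda^{\lambda}}{\lambda!} = P(X = \lambda)$
(In the above, we used the fact that $(\lambda - 1)! \cdot \lambda = \lambda!$.)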
9. Define X = "number of people with a respiratory infection in a group of 5000 people." The
occurrence of 3 out of 2,000 translates to (multiply by 2.5) 7.5 out of 5,000. Thus, X has a Poisson
distribution with parameter $\lambda = 7.5$. The probability that 12 out of 5,000 people are diagnosed with
the infection is
$P(X = 12) = \dfrac{e^{-7.5}(7.5)^{12}}{12!} \approx 0.036575$
11. Let X = "number of more serious traffic accidents per week." Then X has a Poisson distribution
with parameter $\lambda = 4$. The probability that at least two more serious accidents happen in a week is
$P(X \ge 2) = 1 - P(X < 2) = 1 - (P(X = 0) + P(X = 1))$
$= 1 - \left(\dfrac{e^{-4}(4)^{0}}{0!} + \dfrac{e^{-4}(4)^{1}}{1!}\right) = 1 - 5e^{-4} \approx 0.908422$
13. Define X = "number of spoiled apples in a bag of 15 apples." The occurrence of 2 spoiled
apples in a bag of 30 apples translates to 1 spoiled apple in a bag of 15 apples. Thus, X has a Poisson
distribution with parameter $\lambda = 1$. The probability that there are no more than two spoiled apples in
the bag of 15 apples is
$P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2)$
$= \dfrac{e^{-1}(1)^{0}}{0!} + \dfrac{e^{-1}(1)^{1}}{1!} + \dfrac{e^{-1}(1)^{2}}{2!}$
$= e^{-1}(1 + 1 + 0.5) = 2.5e^{-1} \approx 0.919699$
15. Define X = "number of heavy metal particles in a half-litre glass of tap water." The occurrence
of six heavy metal particles in 1 L of tap water translates to three heavy metal particles in 1/2 L of
tap water. Thus, X has a Poisson distribution with parameter $\lambda = 3$. The probability that there are no
heavy metal particles in a half-litre glass of tap water is
$P(X = 0) = \dfrac{e^{-3}(3)^{0}}{0!} = e^{-3} \approx 0.049787$
i.e., a bit less than 5%.
17. Define X = "number of molecules leaving the region by the end of the second hour." The rate
of 0.4 molecules per hour translates to the rate of 0.8 molecules per two hours. Thus, X has a Poisson
distribution with parameter $\lambda = 0.8$. The probability that three or fewer molecules leave by the end
of the second hour is
$P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)$
$= \dfrac{e^{-0.8}(0.8)^{0}}{0!} + \dfrac{e^{-0.8}(0.8)^{1}}{1!} + \dfrac{e^{-0.8}(0.8)^{2}}{2!} + \dfrac{e^{-0.8}(0.8)^{3}}{3!} \approx 0.990920$
19. Define X = "number of hits by cosmic rays in an eight-hour interval." The rate of one cosmic ray
per day translates to the rate of 1/3 cosmic rays per eight hours. Thus, X has a Poisson distribution
with parameter $\lambda = 1/3$. The probability that we will be hit at least once in an eight-hour interval is
$P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - \dfrac{e^{-1/3}(1/3)^{0}}{0!} = 1 - e^{-1/3} \approx 0.283469$
21. Let X = "number of text messages received in an hour." The context implies that X has a Poisson
distribution with parameter $\lambda = 3$. The probability that the student receives more than five messages
in an hour is
$P(X > 5) = 1 - P(X \le 5)$
$= 1 - (P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5))$
$= 1 - \left(\dfrac{e^{-3}(3)^{0}}{0!} + \dfrac{e^{-3}(3)^{1}}{1!} + \dfrac{e^{-3}(3)^{2}}{2!} + \dfrac{e^{-3}(3)^{3}}{3!} + \dfrac{e^{-3}(3)^{4}}{4!} + \dfrac{e^{-3}(3)^{5}}{5!}\right)$
$= 1 - \dfrac{92}{5}e^{-3} \approx 0.083918$
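23. Since $X \sim \mathrm{Po}(1)$ and $Y \sim \mathrm{Po}(9)$, it follows that (assuming independence) $X + Y \sim \mathrm{Po}(1 + 9)$,
i.e., $X + Y \sim \mathrm{Po}(10)$. Thus
$P(X + Y = 2) = \dfrac{e^{-10}(10)^{2}}{2!} = 50e^{-10} \approx 0.002270$
and
$P(Y = 2 \mid X + Y = 2) = \dfrac{P(Y = 2 \text{ and } X + Y = 2)}{P(X + Y = 2)} = \dfrac{P(Y = 2 \text{ and } X = 0)}{P(X + Y = 2)} = \dfrac{P(Y = 2)P(X = 0)}{P(X + Y = 2)}$
$= \dfrac{\dfrac{e^{-9}(9)^{2}}{2!} \cdot \dfrac{e^{-1}(1)^{0}}{0!}}{50e^{-10}} = \dfrac{(9)^{2}}{50 \cdot 2!} = 0.81$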
25. Define T = "number of text messages in an hour" and C = "number of phone calls in an hour."
It is given that $T \sim \mathrm{Po}(4)$ and $C \sim \mathrm{Po}(2)$. Let I = T + C = "number of interruptions in an hour."
Assuming independence, $I \sim \mathrm{Po}(6)$. The probability that the student will experience no interruptions
in 1 hour is
$P(I = 0) = \dfrac{e^{-6}(6)^{0}}{0!} = e^{-6} \approx 0.002479$
Let J = "number of interruptions in a ten-minute interval." Then $J \sim \mathrm{Po}(1)$, and the probability
that the student will experience exactly one interruption in a ten-minute interval is
$P(J = 1) = \dfrac{e^{-1}(1)^{1}}{1!} = e^{-1} \approx 0.367879$
27. Define
$A = \begin{cases} 1 & \text{a person experiences serious side effects from allergy medication (success)} \\ 0 & \text{a person does not experience serious side effects from allergy medication} \end{cases}$
A is a Bernoulli random variable with probability of success equal to 0.003. Let N = "number of
people experiencing serious side effects from allergy medication in a group of 200 people." Since N
counts the number of successes in 200 independent repetitions of the event A, it follows that N is a
binomially distributed random variable with n = 200 and p = 0.003. Thus, the probability that in a
group of 200 people nobody experiences serious side effects is
$b(0, 200; 0.003) = \binom{200}{0}(0.003)^{0}(0.997)^{200} = (0.997)^{200} \approx 0.548317$
Using Poisson approximation (recall that $b(k, n; p) \approx P(X = k)$ if $X \sim \mathrm{Po}(np)$), we get
$b(0, 200; 0.003) \approx P(X = 0)$
where $X \sim \mathrm{Po}(200 \cdot 0.003 = 0.6)$. Thus
$P(X = 0) = \dfrac{e^{-0.6}(0.6)^{0}}{0!} = e^{-0.6} \approx 0.548812$
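The quality of the Poisson approximation can be inspected side by side with the exact binomial value. The following sketch uses the numbers of Exercise 27; the helper names are illustrative, and the loop over several values of k goes slightly beyond what the exercise asks.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """Exact binomial probability b(k, n; p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability P(X = k) for X ~ Po(lam)."""
    return exp(-lam) * lam**k / factorial(k)


n, p = 200, 0.003
lam = n * p  # parameter of the approximating Poisson distribution

exact = binomial_pmf(0, n, p)
approx = poisson_pmf(0, lam)
print(exact, approx, abs(exact - approx))

# The two distributions stay close for other small values of k as well.
for k in range(4):
    print(k, round(binomial_pmf(k, n, p), 6), round(poisson_pmf(k, lam), 6))
```

The approximation works here because n is large and p is small while np stays moderate; with a large p the two columns would drift apart.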
29. Define
$L = \begin{cases} 1 & \text{a person has serious consequences from lactose intolerance (success)} \\ 0 & \text{a person does not have serious consequences from lactose intolerance} \end{cases}$
L is a Bernoulli random variable with probability of success equal to 0.002. Let N = "number of
people who have serious consequences from lactose intolerance in a group of 500 people." Since N
counts the number of successes in 500 independent repetitions of the event L, it follows that N is a
binomially distributed random variable with n = 500 and p = 0.002. Thus, the probability that in a
group of 500 people one person experiences serious consequences from lactose intolerance is
$b(1, 500; 0.002) = \binom{500}{1}(0.002)^{1}(0.998)^{499} = (0.998)^{499} \approx 0.368248$
Using Poisson approximation (recall that $b(k, n; p) \approx P(X = k)$ if $X \sim \mathrm{Po}(np)$), we get
$b(1, 500; 0.002) \approx P(X = 1)$
where $X \sim \mathrm{Po}(500 \cdot 0.002 = 1)$. Thus
$P(X = 1) = \dfrac{e^{-1}(1)^{1}}{1!} = e^{-1} \approx 0.367879$
Section 13 Continuous Random Variables
1. The function $f(x) = 1 - x^{2}$, $x \in [0, 2]$, cannot be a probability density function because $f(x) \ge 0$
does not hold on [0, 2]. For instance, $f(1.5) = 1 - (1.5)^{2} = -1.25$.
3. To satisfy $f(x) \ge 0$, we need $a \ge 0$ (actually, we need a > 0; if a = 0, then f is identically zero and
cannot be a probability density function). As well, the integral of f has to be equal to 1:
$\int_{1}^{10} \dfrac{a}{x}\,dx = a\ln|x|\,\Big|_{1}^{10} = a\ln 10 - a\ln 1 = a\ln 10 = 1$
Thus, a = 1/ln 10.
5. To satisfy $f(x) \ge 0$, we need $a \ge 0$ (actually we need a > 0; if a = 0, then f is identically zero and
cannot be a probability density function). As well, the integral of f has to be 1:
$\int_{0}^{\infty} \dfrac{a}{1 + x^{2}}\,dx = a\arctan x\,\Big|_{0}^{\infty} = a\arctan(\infty) - a\arctan 0 = a\,\dfrac{\pi}{2} = 1$
(since arctan 0 = 0). Thus, $a = 2/\pi$.
In the above, we abbreviated the calculation of the improper integral. Without skipping steps:
$\int_{0}^{\infty} \dfrac{a}{1 + x^{2}}\,dx = a\lim_{T \to \infty}\int_{0}^{T} \dfrac{1}{1 + x^{2}}\,dx$
$= a\lim_{T \to \infty} \arctan x\,\Big|_{0}^{T}$
$= a\lim_{T \to \infty} (\arctan T - \arctan 0) = a\,\dfrac{\pi}{2}$
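7. Clearly, $f(x) = 2/x^{3}$ is positive for $x \in [1, \infty)$. As well,
$\int_{1}^{\infty} \dfrac{2}{x^{3}}\,dx = \lim_{T \to \infty}\int_{1}^{T} 2x^{-3}\,dx$
$= \lim_{T \to \infty} 2\,\dfrac{x^{-2}}{-2}\,\Big|_{1}^{T} = \lim_{T \to \infty}\left(-\dfrac{1}{x^{2}}\right)\Big|_{1}^{T}$
$= \lim_{T \to \infty}\left(-\dfrac{1}{T^{2}} + \dfrac{1}{1^{2}}\right) = 0 + 1 = 1$
The mean is
$\mu = \int_{1}^{\infty} x\,\dfrac{2}{x^{3}}\,dx = \lim_{T \to \infty}\int_{1}^{T} 2x^{-2}\,dx$
$= \lim_{T \to \infty} 2\,\dfrac{x^{-1}}{-1}\,\Big|_{1}^{T} = \lim_{T \to \infty}\left(-\dfrac{2}{x}\right)\Big|_{1}^{T}$
$= \lim_{T \to \infty}\left(-\dfrac{2}{T} + \dfrac{2}{1}\right) = 0 + 2 = 2$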
9. No. Let f(x) = a for $x \in [0, \infty)$, where a > 0 is a constant. Since
$\int_{0}^{\infty} a\,dx = \lim_{T \to \infty}\int_{0}^{T} a\,dx = \lim_{T \to \infty} ax\,\Big|_{0}^{T} = \lim_{T \to \infty}(aT) = \infty$
the integral of f cannot be equal to 1, no matter what value of a is used.
11. Using the probability density function, we compute
$P(0.5 \le X \le 2) = \int_{0.5}^{2} (0.3 + 0.2x)\,dx = (0.3x + 0.1x^{2})\,\Big|_{0.5}^{2} = (0.6 + 0.4) - (0.15 + 0.025) = 0.825$
The cumulative distribution function of f(x) is
$F(x) = \int_{0}^{x} (0.3 + 0.2t)\,dt = (0.3t + 0.1t^{2})\,\Big|_{0}^{x} = (0.3x + 0.1x^{2}) - 0 = 0.3x + 0.1x^{2}$
for x in [0, 2]. Thus,
$P(0.5 \le X \le 2) = F(2) - F(0.5) = [(0.3)(2) + (0.1)(2)^{2}] - [(0.3)(0.5) + (0.1)(0.5)^{2}] = 0.825$
13. Using the probability density function, we compute
$P(1 \le X \le 2) = \int_{1}^{2} \dfrac{1}{x}\,dx = \ln|x|\,\Big|_{1}^{2} = \ln 2 - \ln 1 = \ln 2$
The cumulative distribution function of f(x) is
$F(x) = \int_{1}^{x} \dfrac{1}{t}\,dt = \ln|t|\,\Big|_{1}^{x} = \ln x - \ln 1 = \ln x$
for x in [1, e]. Thus,
$P(1 \le X \le 2) = F(2) - F(1) = \ln 2 - \ln 1 = \ln 2$
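15. We check the properties listed in Theorem 13:
(a) Since $e^{-2x} \le 1$ for $x \ge 0$, it follows that $F(x) = 1 - e^{-2x} \ge 0$ for all $x \in [0, \infty)$. As well, $e^{-2x} > 0$,
and thus $F(x) = 1 - e^{-2x} \le 1$ for all $x \in [0, \infty)$.
(b) The function F(x) is continuous for all x, as the difference of two continuous functions. The fact
that $F'(x) = -e^{-2x}(-2) = 2e^{-2x} > 0$ implies that F(x) is increasing (thus, it is non-decreasing) for
all $x \in [0, \infty)$.
(c) The limits:
$\lim_{x \to 0} F(x) = \lim_{x \to 0} (1 - e^{-2x}) = 1 - e^{0} = 0$
and
$\lim_{x \to \infty} F(x) = \lim_{x \to \infty} (1 - e^{-2x}) = 1 - e^{-\infty} = 1$
Thus $F(x) = 1 - e^{-2x}$, $x \in [0, \infty)$, is indeed a cumulative distribution function. The corresponding
probability density function is $f(x) = F'(x) = 2e^{-2x}$.
The expected value is given by
$\mu = \int_{0}^{\infty} x(2e^{-2x})\,dx = 2\int_{0}^{\infty} xe^{-2x}\,dx$
First we calculate the indefinite integral (using integration by parts): let u = x and $v' = e^{-2x}$. Then
$u' = 1$, $v = -e^{-2x}/2$, and
$\int xe^{-2x}\,dx = uv - \int vu'\,dx$
$= -\dfrac{1}{2}xe^{-2x} + \dfrac{1}{2}\int e^{-2x}\,dx$
$= -\dfrac{1}{2}xe^{-2x} - \dfrac{1}{4}e^{-2x} = -\dfrac{1}{4}(2x + 1)e^{-2x}$
Thus
$\mu = 2\int_{0}^{\infty} xe^{-2x}\,dx = 2\lim_{T \to \infty}\int_{0}^{T} xe^{-2x}\,dx$
$= 2\lim_{T \to \infty}\left(-\dfrac{1}{4}(2x + 1)e^{-2x}\right)\Big|_{0}^{T}$
$= 2\left(\lim_{T \to \infty}\left(-\dfrac{1}{4}(2T + 1)e^{-2T}\right) - \left(-\dfrac{1}{4}\right)\right)$
$= 2\left(0 + \dfrac{1}{4}\right) = \dfrac{1}{2}$
Recall that
$\lim_{T \to \infty} e^{-2T} = 0$
and, by L'Hôpital's rule,
$\lim_{T \to \infty} Te^{-2T} = \lim_{T \to \infty} \dfrac{T}{e^{2T}} = \lim_{T \to \infty} \dfrac{1}{2e^{2T}} = 0$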
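17. (a) Clearly, $f(x) = 2x \ge 0$ for $x \in [0, 1]$. As well,
$\int_{0}^{1} f(x)\,dx = \int_{0}^{1} 2x\,dx = x^{2}\,\Big|_{0}^{1} = 1 - 0 = 1$
(b) The cumulative distribution function is
$F(x) = \int_{0}^{x} 2t\,dt = t^{2}\,\Big|_{0}^{x} = x^{2}$
for $x \in [0, 1]$.
(c) The expected value of X is
$\mu = E(X) = \int_{0}^{1} x(2x)\,dx = \dfrac{2x^{3}}{3}\,\Big|_{0}^{1} = \dfrac{2}{3}$
(d) We find
$P(X \le \mu) = P(X \le 2/3) = F(2/3) = \left(\dfrac{2}{3}\right)^{2} = \dfrac{4}{9}$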
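19. (a) Clearly, $f(x) = 3x^{2} \ge 0$ for $x \in [0, 1]$. As well,
$\int_{0}^{1} f(x)\,dx = \int_{0}^{1} 3x^{2}\,dx = x^{3}\,\Big|_{0}^{1} = 1 - 0 = 1$
(b) The cumulative distribution function is
$F(x) = \int_{0}^{x} 3t^{2}\,dt = t^{3}\,\Big|_{0}^{x} = x^{3}$
for $x \in [0, 1]$.
(c) The expected value of X is
$\mu = E(X) = \int_{0}^{1} x(3x^{2})\,dx = \dfrac{3x^{4}}{4}\,\Big|_{0}^{1} = \dfrac{3}{4}$
(d) We find
$P(X \le \mu) = P(X \le 3/4) = F(3/4) = \left(\dfrac{3}{4}\right)^{3} = \dfrac{27}{64}$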
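21. (a) From $0 \le x \le 3$ we get (after multiplying by 2/9) $0 \le 2x/9 \le 2/3$. Thus, $2/3 - 2x/9 \ge 0$ for
$x \in [0, 3]$. As well,
$\int_{0}^{3} f(x)\,dx = \int_{0}^{3} \left(\dfrac{2}{3} - \dfrac{2x}{9}\right)dx = \left(\dfrac{2x}{3} - \dfrac{x^{2}}{9}\right)\Big|_{0}^{3} = (2 - 1) - 0 = 1$
(b) The cumulative distribution function is
$F(x) = \int_{0}^{x} \left(\dfrac{2}{3} - \dfrac{2t}{9}\right)dt = \left(\dfrac{2t}{3} - \dfrac{t^{2}}{9}\right)\Big|_{0}^{x} = \dfrac{2x}{3} - \dfrac{x^{2}}{9}$
for $x \in [0, 3]$.
(c) The expected value of X is
$\mu = E(X) = \int_{0}^{3} x\left(\dfrac{2}{3} - \dfrac{2x}{9}\right)dx = \int_{0}^{3} \left(\dfrac{2x}{3} - \dfrac{2x^{2}}{9}\right)dx = \left(\dfrac{x^{2}}{3} - \dfrac{2x^{3}}{27}\right)\Big|_{0}^{3} = (3 - 2) - 0 = 1$
(d) We find
$P(X \le \mu) = P(X \le 1) = F(1) = \dfrac{2}{3} - \dfrac{1}{9} = \dfrac{5}{9}$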
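23. We compute
$\mu = E(X) = \int_{0}^{1} x(3x^{2})\,dx = \dfrac{3x^{4}}{4}\,\Big|_{0}^{1} = \dfrac{3}{4} = 0.75$
$E(X^{2}) = \int_{0}^{1} x^{2}(3x^{2})\,dx = \dfrac{3x^{5}}{5}\,\Big|_{0}^{1} = \dfrac{3}{5}$
$\mathrm{var}(X) = E(X^{2}) - (E(X))^{2} = \dfrac{3}{5} - \dfrac{9}{16} = \dfrac{3}{80}$
and $\sigma = \sqrt{\mathrm{var}(X)} = \sqrt{3/80} \approx 0.19365$. The probability that the values of X are at most one standard
deviation away from the mean is
$P(\mu - \sigma \le X \le \mu + \sigma) = \int_{\mu - \sigma}^{\mu + \sigma} 3x^{2}\,dx$
$= x^{3}\,\Big|_{\mu - \sigma}^{\mu + \sigma} = (\mu + \sigma)^{3} - (\mu - \sigma)^{3}$
$= (0.75 + 0.19365)^{3} - (0.75 - 0.19365)^{3} \approx 0.668093$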
25. The cumulative distribution function is
$F(x) = \int_{0}^{x} 3t^{2}\,dt = t^{3}\,\Big|_{0}^{x} = x^{3}$
for $x \in [0, 1]$. The median is the value x where F(x) = 1/2, i.e., where $x^{3} = 1/2$. Thus, the median is
$\sqrt[3]{1/2}$.
27. We are looking for a number $Q_3$ such that $P(X \le Q_3) = 0.75$.
$\int_{0}^{Q_3} \left(\dfrac{2}{3} - \dfrac{2x}{9}\right)dx = 0.75$
$\left(\dfrac{2x}{3} - \dfrac{x^{2}}{9}\right)\Big|_{0}^{Q_3} = 0.75$
$\dfrac{2Q_3}{3} - \dfrac{Q_3^{2}}{9} = 0.75$
Multiplying by 9 and using the quadratic formula, we get
$Q_3^{2} - 6Q_3 + 6.75 = 0$
$Q_3 = \dfrac{6 \pm \sqrt{36 - 27}}{2}$
So $Q_3 = 1.5$ or $Q_3 = 4.5$. Since the probability density function and the cumulative distribution
function are defined on $0 \le x \le 3$, the upper quartile is 1.5.
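When the equation F(x) = q cannot be solved by hand, the same quantile can be found numerically. The sketch below uses bisection on the cumulative distribution function $F(x) = 2x/3 - x^{2}/9$ from this exercise; bisection is a swapped-in technique for illustration (the solution above uses the quadratic formula), and the function names are hypothetical.

```python
def cdf(x):
    """F(x) = 2x/3 - x^2/9 on [0, 3], the CDF used in Exercise 27."""
    return 2 * x / 3 - x**2 / 9


def quantile(prob, lo=0.0, hi=3.0, tol=1e-12):
    """Solve cdf(x) = prob by bisection; valid because cdf is increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < prob:
            lo = mid   # the solution lies to the right of mid
        else:
            hi = mid   # the solution lies at mid or to its left
    return (lo + hi) / 2


upper_quartile = quantile(0.75)
median = quantile(0.5)
print(upper_quartile, median)
```

Restricting the search to [0, 3] plays the same role as discarding the root 4.5 in the algebraic solution: only values inside the support of X are candidates.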
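29. The average lifetime of the tree is given by the integral
$\int_{0}^{\infty} t\,f(t)\,dt = \int_{0}^{\infty} t \cdot 0.01e^{-0.01t}\,dt = \lim_{T \to \infty}\int_{0}^{T} 0.01te^{-0.01t}\,dt$
To calculate the indefinite integral, we use integration by parts with u = t and $v' = e^{-0.01t}$. Then
$u' = 1$, $v = -e^{-0.01t}/0.01 = -100e^{-0.01t}$, and
$\int 0.01te^{-0.01t}\,dt = 0.01\left(uv - \int vu'\,dt\right)$
$= 0.01\left(-100te^{-0.01t} + \int 100e^{-0.01t}\,dt\right)$
$= 0.01\left(-100te^{-0.01t} - 10000e^{-0.01t}\right)$
$= -te^{-0.01t} - 100e^{-0.01t}$
Thus,
$\lim_{T \to \infty}\int_{0}^{T} 0.01te^{-0.01t}\,dt = \lim_{T \to \infty}\left(-te^{-0.01t} - 100e^{-0.01t}\right)\Big|_{0}^{T}$
$= \lim_{T \to \infty}\left[\left(-Te^{-0.01T} - 100e^{-0.01T}\right) - (0 - 100)\right]$
$= 100$
because
$\lim_{T \to \infty} e^{-0.01T} = 0$
and, by L'Hôpital's rule,
$\lim_{T \to \infty} Te^{-0.01T} = \lim_{T \to \infty} \dfrac{T}{e^{0.01T}} = \lim_{T \to \infty} \dfrac{1}{0.01e^{0.01T}} = 0$
Thus, the average lifetime of a tree is 100 years.
The probability that a tree will live longer than 70 years is
$P = \int_{70}^{\infty} f(t)\,dt = \int_{70}^{\infty} 0.01e^{-0.01t}\,dt$
$= \lim_{T \to \infty}\int_{70}^{T} 0.01e^{-0.01t}\,dt$
$= \lim_{T \to \infty}\left(-e^{-0.01t}\right)\Big|_{70}^{T}$
$= \lim_{T \to \infty}\left(-e^{-0.01T} + e^{-0.01(70)}\right)$
$= e^{-0.7} \approx 0.49659$
i.e., about 50%.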
31. The probability is
$P(\text{distance} \le 10) = \int_{0}^{10} \dfrac{2}{\pi(1 + x^{2})}\,dx$
$= \dfrac{2}{\pi}\arctan x\,\Big|_{0}^{10}$
$= \dfrac{2}{\pi}\arctan 10 - 0 \approx 0.936550$
33. (a) When x 0, f(x) = 1 |x| = 1 x; thus,
P(1/2 X 3/4) =
_
3/4
1/2
(1 |x|) dx =
_
3/4
1/2
(1 x) dx
=
_
x
x
2
2
_
3/4
1/2
=
_
3
4

9
32
_
_
1
2

1
8
_
=
3
32
When x < 0, f(x) = 1 |x| = 1 (x) = 1 +x; so
P(1/2 X 0) =
_
0
1/2
(1 |x|) dx =
_
0
1/2
(1 +x) dx
=
_
x +
x
2
2
_
0
1/2
= (0)
_
1
2
+
1
8
_
=
3
8
(b) To find the expected value, we need to find the integral
\[ E(X) = \int_{-1}^{1} x(1 - |x|)\,dx = \int_{-1}^{0} x(1 + x)\,dx + \int_{0}^{1} x(1 - x)\,dx \]
We can proceed as usual, calculating antiderivatives and evaluating. But there is a shortcut: the function $x(1 - |x|)$ is odd, and therefore its integral from $-1$ to $1$ is zero. Thus, $E(X) = 0$. We need to find
\[ E(X^2) = \int_{-1}^{1} x^2(1 - |x|)\,dx = \int_{-1}^{0} x^2(1 + x)\,dx + \int_{0}^{1} x^2(1 - x)\,dx \]
\[ = \int_{-1}^{0} (x^2 + x^3)\,dx + \int_{0}^{1} (x^2 - x^3)\,dx \]
\[ = \left[ \frac{x^3}{3} + \frac{x^4}{4} \right]_{-1}^{0} + \left[ \frac{x^3}{3} - \frac{x^4}{4} \right]_{0}^{1} \]
\[ = (0) - \left( -\frac{1}{3} + \frac{1}{4} \right) + \left( \frac{1}{3} - \frac{1}{4} \right) - (0) = \frac{1}{6} \]
Thus, the variance is $\mathrm{var}(X) = E(X^2) - (E(X))^2 = 1/6$.
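The integrals in Exercise 33 involve only polynomials, so they can be verified exactly with rational arithmetic. The following sketch is an illustration only; the helper `integrate_poly` and the coefficient-list representation are not from the text.

```python
from fractions import Fraction


def integrate_poly(coeffs, a, b):
    """Exactly integrate sum(coeffs[k] * x**k) over [a, b]."""
    a, b = Fraction(a), Fraction(b)
    return sum(
        Fraction(c) * (b ** (k + 1) - a ** (k + 1)) / (k + 1)
        for k, c in enumerate(coeffs)
    )


# f(x) = 1 + x on [-1, 0] and f(x) = 1 - x on [0, 1], as coefficient lists.
left, right = [1, 1], [1, -1]

# Part (a): the two probabilities.
p_half_to_three_quarters = integrate_poly(right, Fraction(1, 2), Fraction(3, 4))
p_minus_half_to_zero = integrate_poly(left, Fraction(-1, 2), 0)

# Part (b): multiplying a polynomial by x shifts its coefficients by one place.
mean = integrate_poly([0] + left, -1, 0) + integrate_poly([0] + right, 0, 1)
second_moment = (integrate_poly([0, 0] + left, -1, 0)
                 + integrate_poly([0, 0] + right, 0, 1))
variance = second_moment - mean ** 2

print(p_half_to_three_quarters, p_minus_half_to_zero, mean, variance)
```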
35. The Intermediate Value Theorem states that a continuous function $f$ defined on a closed interval $[a, b]$ assumes all values between $f(a)$ and $f(b)$. The cumulative distribution function is continuous, and by assumption in this exercise, it is defined on a closed interval $[a, b]$ (and not on an interval that includes $-\infty$ or $\infty$). Any such cumulative distribution function $F(x)$ satisfies $F(a) = 0$ and $F(b) = 1$. Thus, the Intermediate Value Theorem implies that $F$ assumes all values between 0 and 1, in particular the value 1/2. In other words, there is a number $x$ where $F(x) = 1/2$; this number is the median of $X$.
Section 14 The Normal Distribution
1. Assume that $X$ is normally distributed with mean $\mu$ and variance $\sigma^2$. The z-score of a number $a$ is the number $(a - \mu)/\sigma$; it is used to convert a probability related to a normal distribution to a probability related to the standard normal distribution.
If $X \sim N(3, 16)$ then $\mu = 3$ and $\sigma = 4$. To calculate $P(0 \le X \le 7)$ we convert the numbers to their z-scores:
\[ P(0 \le X \le 7) = P\left( \frac{0 - 3}{4} \le \frac{X - 3}{4} \le \frac{7 - 3}{4} \right) = P(-3/4 \le Z \le 1) \]
The random variable $Z = (X - 3)/4$ has the standard normal distribution.
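3. The notation $X \sim N(0, 2^2)$ says that the mean is $\mu = 0$ and the standard deviation is $\sigma = 2$. Thus,
\[ P(-1 \le X \le 2) = P\left( \frac{-1 - 0}{2} \le \frac{X - 0}{2} \le \frac{2 - 0}{2} \right) = P(-1/2 \le Z \le 1) \]
Below is the graph of the standard normal distribution; the area of the shaded region is equal to $P(-1 \le X \le 2)$.
[Figure: standard normal density for $-5 \le z \le 5$, vertical axis from 0 to 0.4, with $z = -1/2$ and $z = 1$ marked on the horizontal axis.]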
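5. It is given that $\mu = 5$ and $\sigma = 10$. Thus
\[ P(X < 9) = P\left( \frac{X - 5}{10} < \frac{9 - 5}{10} \right) = P(Z < 0.4) = F(0.4) = 0.655422 \]
\[ P(X > 25) = P\left( \frac{X - 0}{10} > \frac{25 - 0}{10} \right) = P(Z > 2.5) \]
\[ = 1 - P(Z \le 2.5) \]
\[ = 1 - F(2.5) = 1 - 0.993790 = 0.006210 \]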
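The table values $F(0.4)$ and $F(2.5)$ used above can be reproduced without a table, since the standard normal cumulative distribution function can be written in terms of the error function: $F(z) = \frac{1}{2}\left(1 + \mathrm{erf}(z/\sqrt{2})\right)$. A minimal sketch follows; the function names are chosen for this illustration, and the parameters in the two calls are the ones appearing in the z-score computations above.

```python
import math


def std_normal_cdf(z):
    """F(z) = P(Z <= z) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), computed through the z-score."""
    return std_normal_cdf((x - mu) / sigma)


p_below_9 = normal_cdf(9, mu=5, sigma=10)            # F(0.4)
p_above_25 = 1.0 - normal_cdf(25, mu=0, sigma=10)    # 1 - F(2.5)

print(f"{p_below_9:.6f} {p_above_25:.6f}")
```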
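9. It is given that $\mu = -5$ and $\sigma = 10$. We find
\[ P(X < -10) = P\left( \frac{X - (-5)}{10} < \frac{-10 - (-5)}{10} \right) = P(Z < -0.5) \]
\[ = F(-0.5) \]
\[ = 1 - F(0.5) = 1 - 0.691462 = 0.308538 \]
\[ P(0 \le X \le 5) = P\left( \frac{0 - 2}{5} \le \frac{X - 2}{5} \le \frac{5 - 2}{5} \right) = P(-0.4 \le Z \le 0.6) \]
\[ = F(0.6) - F(-0.4) \]
\[ = F(0.6) - (1 - F(0.4)) = 0.725747 - (1 - 0.655422) = 0.381169 \]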
13. Let $W$ denote the weight of a pink salmon. It is given that $W \sim N(1.7, 0.1^2)$. The fraction of pink salmon heavier than 1.9 kg is given by
\[ P(W > 1.9) = P\left( \frac{W - 1.7}{0.1} > \frac{1.9 - 1.7}{0.1} \right) = P(Z > 2) \]
\[ = 1 - P(Z \le 2) \]
\[ = 1 - F(2) = 1 - 0.977250 = 0.022750 \]
So, about 2.3% of salmon are heavier than 1.9 kg.
15. The mean of $I$ is $\mu = 100$ and the standard deviation is $\sigma = 15$. We compute
\[ P(I > 120) = P\left( Z > \frac{120 - 100}{15} \right) = P(Z > 4/3) \]
\[ \approx 1 - P(Z \le 1.33) \]
\[ = 1 - F(1.33) \]
\[ \approx 1 - F(1.35) = 1 - 0.911492 = 0.088508 \]
The probability that someone's IQ is more than 120 is about 8.85%.
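The solution rounds the z-score 4/3 to the table entry 1.35. To see how much that rounding matters, the probability can also be computed without rounding. The sketch below uses `statistics.NormalDist` from the Python standard library; it is a side calculation, not the method used in the text.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Without rounding: z = (120 - 100) / 15 = 4/3.
p_exact_z = 1.0 - iq.cdf(120)

# As in the solution: z replaced by the table entry 1.35.
p_table_z = 1.0 - NormalDist().cdf(1.35)

print(f"{p_exact_z:.4f} {p_table_z:.6f}")
```

The two values differ in the third decimal place, which is the price of using a table with a coarse grid of z-values.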
17. Given $S \sim N(44, 5^2)$, we compute
\[ P(S > 50) = P\left( Z > \frac{50 - 44}{5} \right) = P(Z > 1.2) \]
\[ = 1 - P(Z \le 1.2) \]
\[ = 1 - F(1.2) = 1 - 0.884930 = 0.115070 \]
About 11.5% of moose can run faster than 50 km/h.
19. The fraction of the population in the interval $(\mu - \sigma, \mu + \sigma)$ is 0.683. The fraction of the population in the interval $(\mu - 2\sigma, \mu + 2\sigma)$ which is outside of $(\mu - \sigma, \mu + \sigma)$ is $0.955 - 0.683 = 0.272$. The fraction of the population in $(\mu - \sigma, \mu + 2\sigma)$ is the fraction of the population in $(\mu - \sigma, \mu + \sigma)$ plus (because of symmetry) one half of the population in the interval $(\mu - 2\sigma, \mu + 2\sigma)$ which is outside of $(\mu - \sigma, \mu + \sigma)$. Thus, the required fraction is $0.683 + 0.272/2 = 0.819$.
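21. Let $X$ denote the given population. From $P(\mu - \sigma \le X \le \mu + \sigma) = 0.683$ it follows that $P(\mu \le X \le \mu + \sigma) = 0.683/2$ (because of the symmetry of the graph). Thus,
\[ P(X \le \mu + \sigma) = P(X \le \mu) + P(\mu \le X \le \mu + \sigma) = 0.5 + 0.683/2 = 0.8415 \]
23. Let $X$ denote the given population. From $P(\mu - \sigma \le X \le \mu + \sigma) = 0.683$ it follows that $P(\mu - \sigma \le X \le \mu) = 0.683/2$ (because of the symmetry of the graph). Thus,
\[ P(X \le \mu - \sigma) = P(X \le \mu) - P(\mu - \sigma \le X \le \mu) = 0.5 - 0.683/2 = 0.1585 \]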
25. $X$ is normally distributed with mean $E(X) = 2 + 4 = 6$ and variance $\mathrm{var}(X) = 12^2 + 6^2 = 180$ (so the standard deviation of $X$ is $\sigma = \sqrt{180}$).
27. Reducing to z-scores, we obtain
\[ P(X \le x) = 0.56 \]
\[ P\left( Z \le \frac{x - 2}{12} \right) = 0.56 \]
In Table 14.4 we find
\[ P(Z \le 0.15) = 0.559618 \]
which is the closest value to 0.56. Thus, $(x - 2)/12 \approx 0.15$, and $x \approx 12(0.15) + 2 = 3.8$.
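29. Reducing to z-scores, we obtain
\[ P(X > x) = 0.2 \]
\[ P\left( Z > \frac{x - 2}{12} \right) = 0.2 \]
\[ 1 - P\left( Z \le \frac{x - 2}{12} \right) = 0.2 \]
\[ P\left( Z \le \frac{x - 2}{12} \right) = 0.8 \]
In Table 14.4 we find
\[ P(Z \le 0.85) = 0.802337 \]
which is the closest value to 0.8. Thus, $(x - 2)/12 \approx 0.85$, and $x \approx 12(0.85) + 2 = 12.2$.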
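Exercises 27 and 29 invert the cumulative distribution function by looking up the closest table entry. With software the inversion can be done directly. The sketch below uses `statistics.NormalDist.inv_cdf` from the Python standard library, with $\mu = 2$ and $\sigma = 12$ as in the z-scores above; the variable names are chosen for this illustration.

```python
from statistics import NormalDist

x_dist = NormalDist(mu=2, sigma=12)

x_27 = x_dist.inv_cdf(0.56)   # solves P(X <= x) = 0.56
x_29 = x_dist.inv_cdf(0.80)   # solves P(X > x) = 0.2, i.e. P(X <= x) = 0.8

print(round(x_27, 2), round(x_29, 2))
```

The results are close to, but not identical with, the table-based answers 3.8 and 12.2, because the table only lists selected z-values.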
31. Denote by $S$ the grades on the test. It is given that $S \sim N(72, 8^2)$. The fraction of students who scored more than 90% on the test is
\[ P(S > 90) = P\left( Z > \frac{90 - 72}{8} \right) = P(Z > 18/8 = 2.25) \]
\[ = 1 - P(Z \le 2.25) \]
\[ = 1 - F(2.25) \]
\[ = 1 - 0.987776 = 0.012224 \]
Thus, about 1.2% of students scored more than 90% on the test.
33. Denote by $S$ the grades on the test. It is given that $S \sim N(72, 8^2)$. We are asked to find $s$ so that $P(S \ge s) = 0.05$. We compute
\[ P(S \ge s) = 0.05 \]
\[ P\left( Z > \frac{s - 72}{8} \right) = 0.05 \]
\[ 1 - P\left( Z \le \frac{s - 72}{8} \right) = 0.05 \]
\[ P\left( Z \le \frac{s - 72}{8} \right) = 0.95 \]
In Table 14.4 we find $P(Z \le 1.65) = 0.950529$, which is the closest value to 0.95. Thus, $(s - 72)/8 = 1.65$, and $s = 8(1.65) + 72 = 85.2$. So the minimum score of the highest 5% of the test scores is 85.2 (out of 100).
35. Define the Bernoulli experiment
\[ T_i = \begin{cases} 1 & \text{$i$th tree is infested by canker-rot fungus (success)} \\ 0 & \text{$i$th tree is not infested by canker-rot fungus} \end{cases} \]
It is given that $p = P(T_i = 1) = 0.014$ and $P(T_i = 0) = 0.986$ for $i = 1, 2, \ldots, 200$ (we find $E(T_i) = p = 0.014$ and $\mathrm{var}(T_i) = p(1 - p) = (0.014)(0.986) = 0.013804$ for all $i$). The random variable
\[ M = \sum_{i=1}^{200} T_i \]
counts the number of trees infested by canker-rot fungus.
The random variables $T_i$ are identically distributed and (assumed to be) independent. The mean of $M$ is (see Theorem 7 in Section 7) $E(M) = np = (200)(0.014) = 2.8$ and the variance is (see Theorem 9 in Section 9) $\mathrm{var}(M) = np(1 - p) = 200(0.014)(0.986) = 2.7608$. Using the Central Limit Theorem, we approximate $M$ by the normal distribution $M \sim N(2.8, 2.7608)$.
The probability that fewer than 25 trees are infested with the fungus is (approximately)
\[ P(M \le 25) = P\left( Z \le \frac{25 - 2.8}{\sqrt{2.7608}} \right) \approx P(Z \le 13.36089) \]
\[ = F(13.36089) \]
\[ \approx 0.999999 \]
(We don't have this value in the tables, but we know that it's very close to 1.)
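With software, the normal approximation in Exercise 35 can be compared against the exact binomial probability. The sketch below is a check under the same assumptions as the solution (200 independent trees, $p = 0.014$); the helper name `binom_pmf` is chosen for this illustration.

```python
import math

n, p = 200, 0.014


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Exact binomial value of P(M <= 25).
exact = sum(binom_pmf(k, n, p) for k in range(26))

# Normal approximation M ~ N(np, np(1 - p)), as in the solution.
mean, var = n * p, n * p * (1 - p)
z = (25 - mean) / math.sqrt(var)
approx = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(mean, var, round(z, 5), exact, approx)
```

Both values agree with the conclusion in the text: the probability is indistinguishable from 1 at six decimal places.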
37. Consider the random variable $B_i$ = number of surviving offspring from the $i$th bacterium, where $i = 1, 2, 3, \ldots, 10{,}000$. It is given that, for all $i$, $P(B_i = 2) = 0.15$, $P(B_i = 1) = 0.75$, and $P(B_i = 0) = 0.1$. We compute
\[ E(B_i) = 2(0.15) + 1(0.75) + 0(0.1) = 1.05 \]
From
\[ E(B_i^2) = 4(0.15) + 1(0.75) + 0(0.1) = 1.35 \]
we obtain
\[ \mathrm{var}(B_i) = E(B_i^2) - (E(B_i))^2 = 1.35 - 1.05^2 = 0.2475. \]
The random variable
\[ M = \sum_{i=1}^{10{,}000} B_i \]
counts the number of surviving offspring.
The random variables $B_i$ are identically distributed (with mean $\mu = 1.05$ and variance $\sigma^2 = 0.2475$) and assumed to be independent. The mean of $M$ is (see Theorem 7 in Section 7) $E(M) = n\mu = (10{,}000)(1.05) = 10{,}500$ and the variance is (see Theorem 9 in Section 9) $\mathrm{var}(M) = n\sigma^2 = 10{,}000(0.2475) = 2{,}475$. Using the Central Limit Theorem, we approximate $M$ by the normal distribution $M \sim N(10{,}500,\ 2{,}475)$.
The probability that the population will be larger than 10,000 is (approximately)
\[ P(M > 10{,}000) = P\left( Z > \frac{10{,}000 - 10{,}500}{\sqrt{2{,}475}} \right) = P(Z > -10.05) \]
\[ = 1 - P(Z \le -10.05) \]
\[ = 1 - F(-10.05) \]
\[ = 1 - (1 - F(10.05)) = F(10.05) \approx 0.999999 \]
($F(10.05)$ is very close to 1.)
39. Define the Bernoulli experiment
\[ V = \begin{cases} 1 & \text{virus is present (success)} \\ 0 & \text{virus is absent} \end{cases} \]
It is given that $p = P(V = 1) = 0.2$ and $P(V = 0) = 0.8$. Repeat the experiment 120 times, and let $N$ count the number of successes (number of months the virus is present). The probability that the virus will be present in between 30 and 36 months during a 10-year period is given by the sum of the probabilities of 30, 31, 32, 33, 34, 35 and 36 successes in 120 repetitions:
\[ P(30 \le N \le 36) = b(30, 120; 0.2) + b(31, 120; 0.2) + b(32, 120; 0.2) + b(33, 120; 0.2) + b(34, 120; 0.2) + b(35, 120; 0.2) + b(36, 120; 0.2) \]
The major difficulty in evaluating the seven expressions consists of dealing with products of very large numbers (factorials) with very small numbers (coming from the probabilities). For instance,
\[ b(33, 120; 0.2) = \binom{120}{33} (0.2)^{33} (0.8)^{87} = \frac{120!}{33!\,87!} (0.2)^{33} (0.8)^{87} \]
However, this is not a real problem if instead of a pocket calculator we use Maple, Matlab, Mathematica, or similar software.
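As an illustration of the last remark, the seven-term sum can be evaluated in a few lines. The sketch below uses Python rather than the packages named in the text; `math.comb` computes the binomial coefficient exactly as an integer, so the large factorials cause no overflow. The function name `b` mirrors the notation $b(k, n; p)$ used above.

```python
import math


def b(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# P(30 <= N <= 36) for n = 120 repetitions with success probability 0.2.
prob_30_to_36 = sum(b(k, 120, 0.2) for k in range(30, 37))

print(prob_30_to_36)
```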
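41. Let $t = -u^2$. Then $dt/du = -2u$, and $u\,du = -dt/2$; we get
\[ \int ue^{-u^2}\,du = \int e^{t} \left( -\frac{dt}{2} \right) = -\frac{1}{2} \int e^{t}\,dt = -\frac{1}{2} e^{t} + C = -\frac{1}{2} e^{-u^2} + C. \]
The definite integral is computed to be
\[ \int_0^\infty ue^{-u^2}\,du = \lim_{T\to\infty} \int_0^T ue^{-u^2}\,du = \lim_{T\to\infty} \left[ -\frac{1}{2} e^{-u^2} \right]_0^T \]
\[ = -\frac{1}{2} \lim_{T\to\infty} \left( e^{-T^2} - e^{0} \right) = -\frac{1}{2}(0 - 1) = \frac{1}{2} \]
Let $f(u) = ue^{-u^2}$. Then $f(-u) = (-u)e^{-(-u)^2} = -ue^{-u^2} = -f(u)$; i.e., $f(u)$ is an odd function.
Because $\int_0^\infty ue^{-u^2}\,du$ is a convergent integral (equal to 1/2) it follows that $\int_{-\infty}^0 ue^{-u^2}\,du$ is convergent as well, and equal to $-1/2$. Thus,
\[ \int_{-\infty}^{\infty} ue^{-u^2}\,du = \int_{-\infty}^{0} ue^{-u^2}\,du + \int_{0}^{\infty} ue^{-u^2}\,du = -\frac{1}{2} + \frac{1}{2} = 0 \]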
43. (a) The calculation $g(-x) = e^{-(-x)^2} = e^{-x^2} = g(x)$ proves that $g$ is an even function.
(b) We compute $g'(x) = -2xe^{-x^2}$. If $x > 0$, then $g'(x) < 0$ (keep in mind that $e^{-x^2} > 0$ for all $x$), and so $g$ is decreasing. If $x < 0$, then $g'(x) > 0$ and $g$ is increasing.
The equation $g'(x) = -2xe^{-x^2} = 0$ implies that $x = 0$ is the only critical point of $g$. Since $g$ changes from increasing to decreasing at $x = 0$, it follows that $g(0) = 1$ is a local maximum.
Because $-x^2 \le 0$ for all $x$, we conclude that $e^{-x^2} \le e^{0} = 1$ for all real numbers $x$. Thus, $g(0) = 1$ is also a global maximum of $g$.
(c) Differentiating $g'$, we obtain
\[ g''(x) = -2e^{-x^2} - 2xe^{-x^2}(-2x) = -2e^{-x^2}(1 - 2x^2) \]
From $g''(x) = 0$ we get $1 - 2x^2 = 0$, $x^2 = 1/2$ and $x = \pm 1/\sqrt{2}$.
If $x < -1/\sqrt{2}$, then $g''(x) > 0$ and $g$ is concave up. If $-1/\sqrt{2} < x < 1/\sqrt{2}$, then $g''(x) < 0$ and $g$ is concave down. If $x > 1/\sqrt{2}$, then $g''(x) > 0$ and $g$ is concave up. Thus, $x = \pm 1/\sqrt{2}$ are points of inflection of $g$.
(d) We find
\[ \lim_{x\to\infty} g(x) = \lim_{x\to\infty} e^{-x^2} = e^{-\infty} = 0 \]
\[ \lim_{x\to-\infty} g(x) = \lim_{x\to-\infty} e^{-x^2} = e^{-\infty} = 0 \]
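45. It is assumed that $X \sim N(\mu, \sigma^2)$. We use z-scores to convert to calculations involving the standard normal distribution:
\[ P(\mu - \sigma \le X \le \mu + \sigma) = P\left( \frac{(\mu - \sigma) - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{(\mu + \sigma) - \mu}{\sigma} \right) = P(-1 \le Z \le 1) \]
\[ = F(1) - F(-1) \]
\[ = F(1) - (1 - F(1)) \]
\[ = 2F(1) - 1 = 2(0.841345) - 1 = 0.682690 \approx 0.683 \]
Likewise,
\[ P(\mu - 2\sigma \le X \le \mu + 2\sigma) = P\left( \frac{(\mu - 2\sigma) - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{(\mu + 2\sigma) - \mu}{\sigma} \right) = P(-2 \le Z \le 2) \]
\[ = F(2) - F(-2) \]
\[ = 2F(2) - 1 = 2(0.977250) - 1 = 0.9545 \approx 0.955 \]
and
\[ P(\mu - 3\sigma \le X \le \mu + 3\sigma) = P\left( \frac{(\mu - 3\sigma) - \mu}{\sigma} \le \frac{X - \mu}{\sigma} \le \frac{(\mu + 3\sigma) - \mu}{\sigma} \right) = P(-3 \le Z \le 3) \]
\[ = F(3) - F(-3) \]
\[ = 2F(3) - 1 = 2(0.998650) - 1 = 0.9973 \approx 0.997 \]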
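The three values in Exercise 45 can be confirmed numerically. Since $F(z) = \frac{1}{2}\left(1 + \mathrm{erf}(z/\sqrt{2})\right)$, the quantity $2F(k) - 1$ simplifies to $\mathrm{erf}(k/\sqrt{2})$. A short sketch (the function name is chosen for this illustration):

```python
import math


def within_k_sigma(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) = 2F(k) - 1 = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(k, f"{within_k_sigma(k):.6f}")
```

Note that the result does not depend on $\mu$ or $\sigma$, which is exactly what the z-score conversion shows.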
Section 15 The Uniform and the Exponential Distributions
1. From $\mathrm{var}(U) = (b - 0)^2/12 = 12$ we get $b^2 = 12^2$ and $b = 12$ (since $b > 0$). The mean of $U$ is $E(U) = (0 + 12)/2 = 6$.
3. (a) The probability density function is $f(t) = 0.2e^{-0.2t}$ and the cumulative distribution function is $F(t) = 1 - e^{-0.2t}$. The probability that the first event occurs between times 2 and 6 is
\[ P(2 \le T \le 6) = \int_2^6 0.2e^{-0.2t}\,dt = \left[ -e^{-0.2t} \right]_2^6 = -e^{-1.2} + e^{-0.4} \approx 0.369126 \]
Alternatively, using the cumulative distribution function,
\[ P(2 \le T \le 6) = F(6) - F(2) = \left( 1 - e^{-0.2(6)} \right) - \left( 1 - e^{-0.2(2)} \right) = -e^{-1.2} + e^{-0.4} \approx 0.369126 \]
(b) See below.
[Figure omitted: plot for $0 \le t \le 10$, vertical axis from 0 to 0.2.]
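The calculation in Exercise 3(a) of this section follows a pattern that recurs throughout the section: a probability for an exponentially distributed time is a difference of two values of $F(t) = 1 - e^{-\lambda t}$. A minimal sketch of that pattern, with a helper name chosen for this illustration:

```python
import math


def exp_cdf(t, rate):
    """F(t) = 1 - e^{-rate * t} for t >= 0 (exponential distribution)."""
    return 1.0 - math.exp(-rate * t)


# P(2 <= T <= 6) for rate 0.2, as in Exercise 3(a).
p_2_to_6 = exp_cdf(6, 0.2) - exp_cdf(2, 0.2)

print(f"{p_2_to_6:.6f}")
```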
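5. (a) The probability density function is $f(t) = 1.5e^{-1.5t}$ and the cumulative distribution function is $F(t) = 1 - e^{-1.5t}$. The probability that the first event occurs before $t = 3$ is
\[ P(T < 3) = \int_0^3 1.5e^{-1.5t}\,dt = \left[ -e^{-1.5t} \right]_0^3 = -e^{-4.5} + 1 \approx 0.988891 \]
Alternatively, using the cumulative distribution function, we obtain
\[ P(T < 3) = F(3) = 1 - e^{-1.5(3)} = 1 - e^{-4.5} \approx 0.988891 \]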
(b) See below.
[Figure omitted: plot for $0 \le t \le 5$, vertical axis from 0 to 1.5.]
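7. (a) The probability density function is $f(t) = 2.4e^{-2.4t}$ and the cumulative distribution function is $F(t) = 1 - e^{-2.4t}$. The probability that the first event occurs before $t = 0.3$ or after $t = 1.2$ is
\[ P(T < 0.3) + P(T > 1.2) = P(T < 0.3) + (1 - P(T \le 1.2)) \]
\[ = F(0.3) + 1 - F(1.2) \]
\[ = \left( 1 - e^{-2.4(0.3)} \right) + 1 - \left( 1 - e^{-2.4(1.2)} \right) \approx 0.569383 \]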
(b) See below.
[Figure omitted: plot for $0 \le t \le 3$ with $t = 0.3$ and $t = 1.2$ marked, vertical axis from 0 to 2.5.]
9. Given $s(t) = e^{-0.4t}$, we identify $\lambda = 0.4$/month. The mean lifetime is $1/\lambda = 1/0.4 = 2.5$ months. From $s(3) = e^{-0.4(3)} = e^{-1.2} \approx 0.301194$ we conclude that about 30.1% of insects will survive 3 months.
11. Denote the lifespan of the atom by $T$. Since the expected lifespan is 4 hours, it follows that $\lambda = 1/4 = 0.25$/hour. The probability density function of $T$ is $f(t) = 0.25e^{-0.25t}$, the cumulative distribution function is $F(t) = 1 - e^{-0.25t}$, and the survivorship function is $s(t) = e^{-0.25t}$.
The probability that the atom will not decay during the first 3 hours is
\[ P(T > 3) = 1 - P(T \le 3) = 1 - F(3) = s(3) = e^{-0.25(3)} = e^{-0.75} \approx 0.472367 \]
Repeating this calculation, we obtain the probability that the atom will decay after 6 hours:
\[ P(T > 6) = s(6) = e^{-0.25(6)} = e^{-1.5} \approx 0.223130 \]
13. (a) The average lifespan of a guinea pig is $1/0.18 \approx 5.56$ years.
(b) The survivorship function for the guinea pig is $s(t) = e^{-0.18t}$. Thus, the chance that a guinea pig will live longer than 6 years is $s(6) = e^{-0.18(6)} = e^{-1.08} \approx 0.339596$.
(c) Let $T$ represent the lifetime of a guinea pig. We find
\[ P(T > 8 \mid T > 2) = \frac{P((T > 8) \cap (T > 2))}{P(T > 2)} = \frac{P(T > 8)}{P(T > 2)} = \frac{s(8)}{s(2)} = \frac{e^{-0.18(8)}}{e^{-0.18(2)}} = e^{-0.18(6)} = s(6) \]
The answer is the same as in (b).
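Part (c) is an instance of the memoryless property of the exponential distribution: having already survived 2 years does not change the chance of surviving 6 more. The sketch below checks this numerically for the rate 0.18 used in this exercise; the function and variable names are chosen for this illustration.

```python
import math

RATE = 0.18  # per year, as in Exercise 13


def survival(t, rate=RATE):
    """Survivorship function s(t) = P(T > t) = e^{-rate * t}."""
    return math.exp(-rate * t)


conditional = survival(8) / survival(2)   # P(T > 8 | T > 2)
unconditional = survival(6)               # P(T > 6)

print(f"{conditional:.6f} {unconditional:.6f}")
```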
15. Young and old organisms are more likely to die, since the survivorship curve is sharply decreasing for them. After the initial sharp drop, the curve continues with a small negative slope. Thus, an adult organism has a good chance of living a bit longer (until it reaches the age where the survivorship curve drops quickly again).
17. This is not hard to guess: the function $f(x) = 5x$ stretches by a factor of 5: it maps the interval $(0, 1)$ to the interval $(0, 5)$. Now we shift by 3 units, so the answer is $f(x) = 5x + 3$.
(Formally: we are looking for a linear function that maps the initial point of the first interval (0) to the initial point of the second interval (3) and the terminal point of the first interval (1) to the terminal point of the second interval (8). In other words, we are looking for an equation of a line through the points $(0, 3)$ and $(1, 8)$. Using the point-slope equation, we get $y - 3 = \frac{8 - 3}{1 - 0}(x - 0)$, i.e., $y = 5x + 3$.)
By generating random numbers in the interval $(0, 1)$ and then applying $f(x)$ to them, we generate random numbers in the interval $(3, 8)$.
The length of the interval $(a, b)$ is $b - a$. Thus $f(x) = (b - a)x$ transforms the interval $(0, 1)$ to $(0, b - a)$. Now we move it so that it starts at $a$; the function $f(x) = (b - a)x + a$ maps the interval $(0, 1)$ to $(0 + a, b - a + a) = (a, b)$. So, composing a random number generator on the interval $(0, 1)$ with $f(x)$ we obtain a random number generator on the interval $(a, b)$.
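The construction in Exercise 17 translates directly into code. The sketch below composes Python's `random.random()` (which returns numbers in $[0, 1)$) with $f(x) = (b - a)x + a$; the function name `uniform_on` and the fixed seed are choices made for this illustration.

```python
import random


def uniform_on(a, b, rng=random.random):
    """Map a random number from (0, 1) to (a, b) via f(x) = (b - a) x + a."""
    return (b - a) * rng() + a


random.seed(0)  # fixed seed so the run is reproducible
samples = [uniform_on(3, 8) for _ in range(10_000)]

print(min(samples), max(samples), sum(samples) / len(samples))
```

All samples fall between 3 and 8, and their average is close to the midpoint 5.5, as expected for a uniform distribution on $(3, 8)$.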
19. The half-life of a radioactive substance is the time $t_h$ for which $P(T > t_h) = s(t_h) = 1/2$. From $e^{-\lambda t_h} = 1/2$ we obtain
\[ -\lambda t_h = \ln(1/2) = \ln 1 - \ln 2 = -\ln 2 \]
and $t_h = \ln 2/\lambda$.
The median is the time $t_m$ such that $F(t_m) = 1 - e^{-\lambda t_m} = 1/2$, i.e., $e^{-\lambda t_m} = 1/2$.
We see that $t_m = t_h$. From $F(t) = 1 - e^{-\lambda t} = 1 - s(t)$ we conclude that $F(t) + s(t) = 1$. So, if one of $F(t)$ or $s(t)$ is 1/2, so is the other. Or: the half-life is the time $t$ when the probability of surviving $s(t)$ is the same as the probability of dying $F(t)$.
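The relation $t_h = \ln 2/\lambda$ and the equality of half-life and median can be illustrated numerically. The sketch below uses $\lambda = 0.25$ per hour purely as an example value (an assumption for this illustration, borrowed from Exercise 11 of this section).

```python
import math


def half_life(rate):
    """Half-life t_h = ln(2) / rate of an exponential lifetime."""
    return math.log(2) / rate


rate = 0.25  # example value, assumed for illustration
t_h = half_life(rate)

survival_at_t_h = math.exp(-rate * t_h)   # s(t_h)
cdf_at_t_h = 1.0 - survival_at_t_h        # F(t_h)

print(round(t_h, 6), survival_at_t_h, cdf_at_t_h)
```

Both $s(t_h)$ and $F(t_h)$ come out as 1/2 (up to floating-point rounding), so the half-life is also the median.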
