Poly Proba Stat
COURSE HANDOUT
Mohamed BOUKELOUA
Conditional mean of X given Y = yj
Conditional variance of X given Y = yj
2.3 Covariance of two characters
2.3.1 Definition
2.3.2 Properties of the covariance
2.3.3 Correlation coefficient
2.4 Fittings
2.4.1 Fitting of type Y = aX + b
2.4.2 Fitting of type Y = B × A^X
2.4.3 Fitting of type Y = B × X^a
4 Solutions to exercises
4.1 Solutions to exercises of chapter 1
Part I
Descriptive Statistics
Chapter 1
1.1 Introduction
1.1.1 Generalities
Descriptive statistics is a collection of methods used to describe, summarize, interpret and
analyse the datasets that arise in a given study. It helps analysts to better understand
the data and to draw conclusions from them. The datasets may be treated using tables,
graphs and numerical characteristics such as the mean, the variance, the quantiles, etc. The
statistical analysis may be univariate or multivariate. Univariate analysis focuses on one
character of the data. The main aspects of interest in this framework are the distribution,
the central tendency and the dispersion. Multivariate analysis, in contrast, focuses on the
relationship between two or more characters. The main aspects in this framework are the
covariance, the coefficient of correlation and the conditional distributions. Another important
topic in descriptive statistics is regression. This notion deals with the possibility of
establishing an equation that links two (or more) variables. Such an equation may be linear,
exponential, polynomial or may have other forms.
1.1.2 Definitions
We will start with some basic definitions of descriptive statistics.
Population
The population is a set of similar items on which the statistical study is based. The number
of elements within a population is called the size of the population.
Sample
A sample is a subset of the population that shares its characteristics. Samples are
used when the population is so large that it becomes impossible to include all possible
observations. A sample should represent the population as a whole and not reflect any
bias toward a specific attribute.
Statistical unit
Each element in the population is called a statistical unit or an individual.
Statistical character
The character is a particular feature of the observations, in which the statistical study is
interested.
Modalities of a character
The modalities of a character are the different situations taken by this character.
Qualitative character
A qualitative character measures "types" and may be represented by names or symbols. It is
related to categorical variables. The modalities of a qualitative character are words or symbols.
Qualitative characters may also be represented by number codes.
Quantitative character
They are measures of values or counts and are expressed as numbers. They are related to
numeric variables. Quantitative characters may be discrete or continuous.
- Discrete quantitative character (or discrete statistical variable): It is a variable that takes
on distinct and countable values. The set of values of such a variable is finite or countable
(at most countable). The modalities are distinct numbers.
- Continuous quantitative character (or continuous statistical variable): It is a variable that
takes on an infinite number of possible values within a given range. The set of values of such
a variable is infinite and uncountable. The modalities are intervals called "Class intervals".
• $\forall\, i \in \{1, \dots, k\},\ 0 \le f_i \le 1$.
• $\sum_{i=1}^{k} f_i = 1$.
1.2.1 Qualitative case
Example 1 (continued):
The study of the blood group of 150 students in a university gave the following results.
Blood group Number of students
A 45
B 25
AB 9
O 71
This statistical series can be represented by the following statistical table.
Modalities ni fi
A 45 0.3
B 25 0.167
AB 9 0.06
O 71 0.473
Total 150 1
\[
f_i = \frac{n_i}{n} = \frac{n_i}{150}, \quad \forall i \in \{1, 2, 3, 4\}.
\]
The (xi )1≤i≤6 are the values of the studied statistical variable X (the number of children).
\[
f_i = \frac{n_i}{n} = \frac{n_i}{60}, \quad \forall i \in \{1, 2, \dots, 6\}.
\]
Bar chart
This graphic consists of bars representing the modalities of the character. The height of each
bar is determined by either the absolute frequency or the relative frequency of the respective
modality.
Pie chart
A pie chart is a circle partitioned into segments, where each of the segments represents a
modality. The size of each segment depends upon the relative frequency and is determined
by the angle θi = fi × 360◦ .
We will represent our statistical series of Example 1 (Blood group) using these graphics.
Example 1 (continued):
The bar chart (using the relative frequencies) of this statistical series is as follows.
- To draw the pie chart of this series, we need to calculate the angle θi = fi × 360◦ for all
i ∈ {1, 2, 3, 4}.
Modalities ni fi θi
A 45 0.3 108◦
B 25 0.167 60.12◦
AB 9 0.06 21.6◦
O 71 0.473 170.28◦
Total 150 1 360◦
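The frequency table and the pie chart angles above can be reproduced with a short script. The following is a minimal sketch, assuming the blood group counts of Example 1; the function name `frequency_table` is illustrative and not part of the handout. The angles are computed from unrounded frequencies, so they can differ slightly from values obtained with frequencies rounded to three decimals.

```python
# Blood group counts from Example 1 (150 students).
counts = {"A": 45, "B": 25, "AB": 9, "O": 71}


def frequency_table(counts):
    """Return {modality: (n_i, f_i, theta_i)} with theta_i = f_i * 360 degrees."""
    n = sum(counts.values())  # size of the population
    table = {}
    for modality, n_i in counts.items():
        f_i = n_i / n  # relative frequency
        table[modality] = (n_i, f_i, f_i * 360)  # pie chart angle in degrees
    return table


table = frequency_table(counts)
for modality, (n_i, f_i, theta_i) in table.items():
    print(f"{modality:>2}  n_i={n_i:3d}  f_i={f_i:.3f}  theta_i={theta_i:.2f}")
```

The relative frequencies sum to 1, so the angles sum to 360 degrees whatever the counts are.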
1.3.2 Discrete quantitative case
Let X be a discrete statistical variable taking the values {x1 , x2 , . . . , xk }, with x1 < x2 <
· · · < xk . For all i ∈ {1, 2, . . . , k}, we denote by ni (resp. fi ) the absolute (resp. the relative)
frequency of xi . In this case, the statistical series can be represented by two types of graphics:
The differential diagram and the integral diagram.
Definition 6. (The empirical cumulative distribution function)
The empirical cumulative distribution function (ECDF) of X is the function F : R −→ [0, 1]
defined for all x ∈ R by
\[
F(x) =
\begin{cases}
0 & \text{if } x < x_1 \\
F_i & \text{if } x \in [x_i, x_{i+1}[,\ \text{for } 1 \le i \le k-1 \\
1 & \text{if } x \ge x_k.
\end{cases}
\]
• ∀ x ∈ R, 0 ≤ F (x) ≤ 1.
Now, we will represent our statistical series of Example 2 (Number of children) using the
above graphics.
Example 2 (continued):
The line graph (using the relative frequencies) of this statistical series is as follows.
- To draw the cumulative frequency curve of this series, we need to calculate the cumulative
relative frequencies (Fi )1≤i≤6 .
x_i     n_i   f_i     N_i   F_i
0       5     0.083   5     0.083
1       10    0.167   15    0.25
2       11    0.183   26    0.433
3       18    0.3     44    0.733
4       11    0.183   55    0.916
5       5     0.083   60    1
Total   60    1
So, the cumulative frequency curve is as follows.
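The step function of Definition 6 can also be evaluated numerically. The following is a minimal sketch, assuming the values and counts of Example 2 (number of children); the helper name `make_ecdf` is illustrative only.

```python
from bisect import bisect_right

# Example 2: number of children (values x_i) and absolute frequencies n_i.
values = [0, 1, 2, 3, 4, 5]
counts = [5, 10, 11, 18, 11, 5]


def make_ecdf(values, counts):
    """Build the empirical cumulative distribution function of a discrete variable."""
    n = sum(counts)
    cumulative = []  # cumulative relative frequencies F_1, ..., F_k
    running = 0
    for n_i in counts:
        running += n_i
        cumulative.append(running / n)

    def F(x):
        idx = bisect_right(values, x)  # number of values x_i with x_i <= x
        return 0.0 if idx == 0 else cumulative[idx - 1]

    return F


F = make_ecdf(values, counts)
print(F(-1), F(2.5), F(5))
```

Because `bisect_right` counts the values less than or equal to x, the resulting function is right continuous, as required for the ECDF of a discrete variable.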
the height of the ith bar is $h_i = f_i / d_i$, where $d_i = e_i - e_{i-1}$ denotes the magnitude of the ith
class $[e_{i-1}, e_i[$. An important consideration for this concept is that the area of each bar is
proportional to the corresponding relative frequency.
Remark 3.
If all the classes have the same magnitude, we can take hi = fi for all i ∈ {1, 2, . . . , k}.
Definition 7.
The empirical cumulative distribution function (ECDF) of the continuous variable X is the
function F : R −→ [0, 1] defined for all x ∈ R by
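\[
F(x) =
\begin{cases}
0 & \text{if } x < e_0 \\
F_{i-1} + \dfrac{f_i}{e_i - e_{i-1}}\,(x - e_{i-1}) & \text{if } x \in [e_{i-1}, e_i[,\ \text{for } 1 \le i \le k \\
1 & \text{if } x \ge e_k,
\end{cases}
\]
with $F_0 = 0$.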
F is a piecewise linear function and it satisfies the same properties as the ECDF in the
discrete case, except that it is continuous on R (and not only right continuous).
Now, we will represent our statistical series of Example 3 (Size of students) using the above
graphics.
Example 3 (continued):
To draw the graphical representations of this statistical series, we need to calculate the
magnitudes (di )1≤i≤5 and the cumulative relative frequencies (Fi )1≤i≤5 .
Class [e_{i-1}, e_i[   c_i     n_i   f_i     N_i   F_i     d_i    f_i/d_i
[1.50, 1.60[           1.55    20    0.1     20    0.1     0.1    1
[1.60, 1.65[           1.625   45    0.225   65    0.325   0.05   4.5
[1.65, 1.75[           1.70    85    0.425   150   0.75    0.1    4.25
[1.75, 1.85[           1.80    40    0.2     190   0.95    0.1    2
[1.85, 1.90[           1.875   10    0.05    200   1       0.05   1
Total                          200   1
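The bar heights f_i/d_i and the piecewise linear ECDF of Definition 7 can be computed from the class boundaries and the counts. The following is a minimal sketch, assuming the data of Example 3; the function names are illustrative and not part of the handout.

```python
# Example 3: class boundaries e_0, ..., e_k (in metres) and absolute frequencies n_i.
edges = [1.50, 1.60, 1.65, 1.75, 1.85, 1.90]
counts = [20, 45, 85, 40, 10]


def histogram_heights(edges, counts):
    """Height of each bar: h_i = f_i / d_i, with d_i = e_i - e_{i-1}."""
    n = sum(counts)
    return [(n_i / n) / (hi - lo)
            for n_i, lo, hi in zip(counts, edges[:-1], edges[1:])]


def ecdf_continuous(x, edges, counts):
    """Piecewise linear ECDF: linear interpolation inside each class."""
    n = sum(counts)
    if x < edges[0]:
        return 0.0
    if x >= edges[-1]:
        return 1.0
    cumulative = 0  # N_{i-1}, cumulative count before the current class
    for n_i, lo, hi in zip(counts, edges[:-1], edges[1:]):
        if lo <= x < hi:
            return cumulative / n + (n_i / n) * (x - lo) / (hi - lo)
        cumulative += n_i
    return 1.0


heights = histogram_heights(edges, counts)
print([round(h, 3) for h in heights])
print(round(ecdf_continuous(1.70, edges, counts), 4))
```

Dividing by the class width is what makes the area of each bar, rather than its height, proportional to the relative frequency.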
The histogram of our statistical series is as follows.
We can also add the frequency polygon by joining the midpoints of the tops of the rectangles.
We also plot the previous and next points on the x-axis to start and end the polygon. These
two points correspond respectively to $e_0 - \frac{d_1}{2}$ and $e_k + \frac{d_k}{2}$.
1.4 Parameters of a statistical series
In this section, we will study some parameters that measure the central tendency and the
dispersion of a statistical series with a quantitative character. We will deal with the discrete
and the continuous cases separately.
Arithmetic mean
The arithmetic mean of X is defined by
\[
\bar{X} = \frac{1}{n}\sum_{i=1}^{k} n_i x_i = \sum_{i=1}^{k} f_i x_i.
\]
Remark 4.
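If we use a transformation Y = aX + b with a, b ∈ R, then $\bar{Y} = a\bar{X} + b$.
Indeed, for all i ∈ {1, . . . , k} we have $y_i = a x_i + b$, so
\[
\begin{aligned}
\bar{Y} &= \frac{1}{n}\sum_{i=1}^{k} n_i y_i \\
&= \frac{1}{n}\sum_{i=1}^{k} n_i (a x_i + b) \\
&= \frac{1}{n}\sum_{i=1}^{k} (a n_i x_i + b n_i) \\
&= \frac{1}{n}\sum_{i=1}^{k} a n_i x_i + \frac{1}{n}\sum_{i=1}^{k} b n_i \\
&= a\left(\frac{1}{n}\sum_{i=1}^{k} n_i x_i\right) + b\left(\frac{1}{n}\sum_{i=1}^{k} n_i\right) \\
&= a\bar{X} + b \times \frac{n}{n} \\
&= a\bar{X} + b.
\end{aligned}
\]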
Mode
The mode of X, denoted by M , is the value(s) having the largest absolute frequency. The
mode may not be unique.
Quantiles
Let p ∈ [0, 1], the quantile of order p (or pth quantile) of X is the value xp of X which divides
the dataset in two parts such that p−proportion of the data are less than or equal to xp and
(1 − p)−proportion of the data are greater than xp . In other words
Dispersion parameters
Dispersion parameters are statistical parameters that describe the dispersion of the
observations around any particular value.
The variance of X is defined by
\[
\mathrm{Var}(X) = \frac{1}{n}\sum_{i=1}^{k} n_i \left(x_i - \bar{X}\right)^2 = \sum_{i=1}^{k} f_i \left(x_i - \bar{X}\right)^2.
\]
It measures how far the data are spread out from their average value.
The variance is always non-negative and the standard deviation of X is defined by
\[
\sigma_X = \sqrt{\mathrm{Var}(X)}.
\]
The standard deviation has the same unit of measurement as the data whereas the unit of
the variance is the square of the units of the observations.
Remark 5.
i) We have
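\[
\mathrm{Var}(X) = \frac{1}{n}\sum_{i=1}^{k} n_i x_i^2 - \bar{X}^2 = \sum_{i=1}^{k} f_i x_i^2 - \bar{X}^2.
\]
Indeed
\[
\begin{aligned}
\mathrm{Var}(X) &= \frac{1}{n}\sum_{i=1}^{k} n_i \left(x_i - \bar{X}\right)^2 \\
&= \frac{1}{n}\sum_{i=1}^{k} n_i \left(x_i^2 - 2 x_i \bar{X} + \bar{X}^2\right) \\
&= \frac{1}{n}\left(\sum_{i=1}^{k} n_i x_i^2 - \sum_{i=1}^{k} 2 n_i x_i \bar{X} + \sum_{i=1}^{k} n_i \bar{X}^2\right) \\
&= \frac{1}{n}\sum_{i=1}^{k} n_i x_i^2 - \frac{2\bar{X}}{n}\sum_{i=1}^{k} n_i x_i + \frac{\bar{X}^2}{n}\sum_{i=1}^{k} n_i \\
&= \frac{1}{n}\sum_{i=1}^{k} n_i x_i^2 - 2\bar{X}^2 + \frac{\bar{X}^2}{n} \times n \\
&= \frac{1}{n}\sum_{i=1}^{k} n_i x_i^2 - \bar{X}^2.
\end{aligned}
\]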
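Indeed
\[
\begin{aligned}
\mathrm{Var}(Y) &= \frac{1}{n}\sum_{i=1}^{k} n_i \left(y_i - \bar{Y}\right)^2 \\
&= \frac{1}{n}\sum_{i=1}^{k} n_i \left(a x_i + b - a\bar{X} - b\right)^2 \\
&= \frac{1}{n}\sum_{i=1}^{k} n_i a^2 \left(x_i - \bar{X}\right)^2 \\
&= \frac{a^2}{n}\sum_{i=1}^{k} n_i \left(x_i - \bar{X}\right)^2 \\
&= a^2\, \mathrm{Var}(X)
\end{aligned}
\]
and
\[
\sigma_Y = \sqrt{\mathrm{Var}(Y)} = \sqrt{a^2\, \mathrm{Var}(X)} = |a|\,\sigma_X.
\]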
Range
The range of X is defined as the difference between the maximum and minimum values of X.
\[
R = \max_{1 \le i \le k} x_i - \min_{1 \le i \le k} x_i.
\]
Interquartile range
The interquartile range of X is defined as the difference between the third and the first
quartiles of X.
IQ = Q3 − Q1 .
Now, we will calculate the central tendency and the dispersion parameters of the statistical
series of Example 2 (Number of children).
Example 2 (continued):
To calculate the parameters of this statistical series, we need to add the following columns
in the statistical table.
x_i       n_i   f_i     N_i   F_i     n_i x_i   n_i x_i^2
0         5     0.083   5     0.083   0         0
1         10    0.167   15    0.25    10        10
2         11    0.183   26    0.433   22        44
3         18    0.3     44    0.733   54        162
4         11    0.183   55    0.916   44        176
5         5     0.083   60    1       25        125
Total     60    1                     155       517
Total/n                               2.583     8.617
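The two extra columns of the table are exactly what is needed to evaluate the mean and the variance through the formula of Remark 5. The following is a minimal sketch, assuming the data of Example 2; the function name `summarize` and the dictionary keys are illustrative choices.

```python
from math import sqrt

# Example 2: number of children (values x_i) and absolute frequencies n_i.
values = [0, 1, 2, 3, 4, 5]
counts = [5, 10, 11, 18, 11, 5]


def summarize(values, counts):
    """Mean, variance, standard deviation, mode(s) and range of a discrete series."""
    n = sum(counts)
    mean = sum(n_i * x for x, n_i in zip(values, counts)) / n        # (1/n) sum n_i x_i
    mean_sq = sum(n_i * x * x for x, n_i in zip(values, counts)) / n  # (1/n) sum n_i x_i^2
    variance = mean_sq - mean ** 2                                    # Remark 5 i)
    largest = max(counts)
    modes = [x for x, n_i in zip(values, counts) if n_i == largest]
    return {
        "mean": mean,
        "variance": variance,
        "std": sqrt(variance),
        "modes": modes,
        "range": max(values) - min(values),
    }


summary = summarize(values, counts)
print(summary)
```

The sketch keeps full floating-point precision throughout; squaring a mean that has already been rounded to three decimals can change the last digits of the variance.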
Central tendency parameters
Arithmetic mean
The arithmetic mean of X is defined by
\[
\bar{X} = \frac{1}{n}\sum_{i=1}^{k} n_i c_i = \sum_{i=1}^{k} f_i c_i.
\]
Modal class
The modal class of X, denoted by M , is the class(es) that correspond(s) to the largest ni /di
(or fi /di ). It may not be unique.
Quantiles
The quantiles are defined in the same way as in the discrete case. To calculate them, we
use the method of linear interpolation. For example, to calculate the median, we determine
i such that Fi−1 ≤ 0.5 < Fi which means that M ed ∈ [ei−1 , ei [, then we apply the formula
\[
\frac{Med - e_{i-1}}{e_i - e_{i-1}} = \frac{0.5 - F_{i-1}}{F_i - F_{i-1}}
\implies Med = e_{i-1} + (e_i - e_{i-1})\,\frac{0.5 - F_{i-1}}{F_i - F_{i-1}}.
\]
For any p ∈ [0, 1], we apply the same method to calculate the quantile xp , using the
appropriate proportion p.
Dispersion parameters
Variance and standard deviation
The variance of X is defined by
\[
\mathrm{Var}(X) = \frac{1}{n}\sum_{i=1}^{k} n_i \left(c_i - \bar{X}\right)^2 = \frac{1}{n}\sum_{i=1}^{k} n_i c_i^2 - \bar{X}^2
\]
and the standard deviation of X is defined by
\[
\sigma_X = \sqrt{\mathrm{Var}(X)}.
\]
Range
The range of X is defined by
R = ek − e0 .
Interquartile range
The interquartile range of X is defined by
IQ = Q3 − Q1 .
Now, we will calculate the central tendency and the dispersion parameters of the statistical
series of Example 3 (Size of students).
Example 3 (continued):
To calculate the parameters of this statistical series, we need to add the following columns
in the statistical table.
Class [e_{i-1}, e_i[   c_i     n_i   f_i     N_i   F_i     d_i    f_i/d_i   n_i c_i   n_i c_i^2
[1.50, 1.60[           1.55    20    0.1     20    0.1     0.1    1         31        48.05
[1.60, 1.65[           1.625   45    0.225   65    0.325   0.05   4.5       73.125    118.828
[1.65, 1.75[           1.70    85    0.425   150   0.75    0.1    4.25      144.5     245.65
[1.75, 1.85[           1.80    40    0.2     190   0.95    0.1    2         72        129.6
[1.85, 1.90[           1.875   10    0.05    200   1       0.05   1         18.75     35.156
Total                          200   1                                      339.375   577.284
Total/n                                                                     1.697     2.886
- The central tendency parameters are:
• The arithmetic mean
\[
\bar{X} = \frac{1}{n}\sum_{i=1}^{5} n_i c_i = \frac{339.375}{200} = 1.697
\]
• The modal class is M = [1.60, 1.65[ because the largest fi /di is 4.5
• The median:
We have 0.325 < 0.5 < 0.75, then M ed ∈ [1.65, 1.75[ and the method of linear interpolation
gives
\[
\begin{aligned}
& \frac{Med - 1.65}{1.75 - 1.65} = \frac{0.5 - 0.325}{0.75 - 0.325} \\
\implies\ & \frac{Med - 1.65}{0.1} = \frac{0.175}{0.425} = 0.412 \\
\implies\ & Med = 0.412 \times 0.1 + 1.65 = 1.691
\end{aligned}
\]
• The quartiles:
We have 0.1 < 0.25 < 0.325, then Q1 ∈ [1.60, 1.65[ and the method of linear interpolation
gives
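\[
\begin{aligned}
& \frac{Q_1 - 1.60}{1.65 - 1.60} = \frac{0.25 - 0.1}{0.325 - 0.1} \\
\implies\ & \frac{Q_1 - 1.60}{0.05} = \frac{0.15}{0.225} = 0.667 \\
\implies\ & Q_1 = 0.667 \times 0.05 + 1.60 = 1.633
\end{aligned}
\]
We remark that 0.75 appears in the column of the cumulative relative frequencies (F3 = 0.75,
reached at e3 = 1.75), so Q3 = 1.75.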
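The linear interpolation used for the median and the quartiles can be written once for any order p. The following is a minimal sketch, assuming the class boundaries and counts of Example 3; the function name `grouped_quantile` is illustrative and not part of the handout.

```python
# Example 3: class boundaries e_0, ..., e_k (in metres) and absolute frequencies n_i.
edges = [1.50, 1.60, 1.65, 1.75, 1.85, 1.90]
counts = [20, 45, 85, 40, 10]


def grouped_quantile(p, edges, counts):
    """Quantile of order p of grouped data, by linear interpolation inside the class."""
    n = sum(counts)
    cumulative = 0  # N_{i-1}
    for n_i, lo, hi in zip(counts, edges[:-1], edges[1:]):
        F_prev = cumulative / n          # F_{i-1}
        F_i = (cumulative + n_i) / n     # F_i
        if F_prev <= p < F_i:
            return lo + (hi - lo) * (p - F_prev) / (F_i - F_prev)
        cumulative += n_i
    return edges[-1]  # reached only when p = 1


median = grouped_quantile(0.5, edges, counts)
q1 = grouped_quantile(0.25, edges, counts)
q3 = grouped_quantile(0.75, edges, counts)
print(round(median, 3), round(q1, 3), round(q3, 3), round(q3 - q1, 3))
```

When p coincides with a cumulative frequency, as for p = 0.75 here, the condition F_{i-1} ≤ p < F_i selects the class that starts at that boundary and the interpolation returns the boundary itself.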
- Furthermore, the dispersion parameters are:
• The variance
\[
\mathrm{Var}(X) = \frac{1}{n}\sum_{i=1}^{5} n_i c_i^2 - \bar{X}^2 = \frac{577.284}{200} - \left(\frac{339.375}{200}\right)^2 = 2.88642 - 2.87938 = 0.007
\]
• The standard deviation $\sigma_X = \sqrt{\mathrm{Var}(X)} = \sqrt{0.00704} = 0.084$
• The range
R = ek − e0 = 1.90 − 1.50 = 0.40
• The interquartile range
IQ = Q3 − Q1 = 1.75 − 1.633 = 0.117
1.5 Exercises
Exercise 1:
In each case, determine the population, the statistical unit, the studied character and its
type.
1. A teacher recorded the scores of the test of mathematics obtained by the pupils of a
class.
2. A survey on the marital status has been conducted among the employees of a company.
3. The study of the weight of the students of the Preparatory class department.
4. The study of the maximum temperature in a specific day in the 58 wilayas of Algeria.
5. A survey conducted among the employees of a company dealt with the means of trans-
port used to get to work.
6. The study of the number of mobiles in each house of a neighbourhood.
7. The study of the monthly salary of the employees of a company.
Exercise 2:
A survey on the hobbies of 80 inhabitants of a city gave the following results.
Hobbies Number of inhabitants
Reading 20
Sport 24
Cinema 20
Theatre 16
1. Determine the population, the studied character, its type and its modalities.
2. Draw up the statistical table with absolute and relative frequencies.
3. Draw the appropriate graphical representations.
Exercise 3:
A survey conducted among 120 employees of a company dealt with the means of transport
used to get to work. The results of this survey are given in the following table.
Means of transport Number of employees
Private car 18
Taxi 24
Bus 30
Tramway 42
Motorcycle 6
1. Determine the population, the studied character, its type and its modalities.
Exercise 4:
A study on the number of milk litres bought each week by 100 consumers gives the following
results.
Number of bought milk litres 0 1 2 3 4 5
Number of consumers 5 20 35 25 10 5
1. Determine the population, the studied character, its type and its modalities.
Exercise 5:
The shoe sizes of the pupils of a school have been recorded in the following table.
Shoe size 36 37 38 39 40 41 42
Number of pupils 8 20 32 32 30 24 14
1. Determine the population, the studied character, its type and its modalities.
Exercise 6:
A farmer recorded the mass of the eggs laid in a specific day. The masses are given in the
following table.
Mass (in gram) [38, 47[ [47, 52[ [52, 57[ [57, 62[ [62, 72[ [72, 82[
Number of eggs 51 74 112 92 62 9
1. Determine the population, the studied character, its type and its modalities.
Exercise 7:
The areas of 100 housings are recorded in the following table.
Area (in m2 ) [30, 40[ [40, 60[ [60, 80[ [80, 100[ [100, 140[ [140, 200[
Number of housings 13 20 22 19 21 5
1. Determine the population, the studied character, its type and its modalities.
Exercise 8:
The sizes X of 100 students are recorded in the following table.
Size (in cm) [150, 160[ [160, 165[ [165, 170[ [170, 175[ [175, 180[ [180, 190[
Number of students 8 24 42 14 10 2
1. Calculate X and σX .
Chapter 2
2.1 Introduction
In the previous chapter, we have studied the distribution of a statistical variable and we have
seen how to describe it using numerical and graphical tools. However, in many situations we
may be interested in the relation between two (or more) statistical variables. In particular,
we need to know whether the value taken by a variable affects the other one, i.e. whether
there is a correlation between the two variables. We may also be interested in fitting one
variable with respect to the other using a mathematical equation. This allows us to predict the
value of the fitted variable knowing the value of the other one.
X \ Y   y1    y2    ...   yj    ...   yl    Total
x1      n11   n12   ...   n1j   ...   n1l
x2      n21   n22   ...   n2j   ...   n2l
...     ...   ...   ...   ...   ...   ...
xi      ni1   ni2   ...   nij   ...   nil
...     ...   ...   ...   ...   ...   ...
xk      nk1   nk2   ...   nkj   ...   nkl
Total
Example 1:
A study on the number of pupils and the number of teachers in 200 secondary schools gave
the following results, where X represents the number of pupils and Y represents the number
of teachers.
X \ Y   20   22   25   27   29   31   32   Total
400     14   10   6    8    5    2    3    48
450     4    14   5    3    4    3    1    34
500     0    3    8    18   10   1    2    42
550     2    4    1    16   20   5    5    53
600     1    2    1    3    2    4    10   23
Total   21   33   21   48   41   15   21   200
- The values of X are {400, 450, 500, 550, 600} and the values of Y are {20, 22, 25, 27, 29, 31, 32}.
- For example, we have:
n34 = 18: There are 18 schools with 500 pupils (X = x3 = 500) and 27 teachers (Y = y4 = 27),
and the corresponding relative frequency is $f_{34} = \frac{n_{34}}{n} = \frac{18}{200} = 0.09$.
n42 = 4: There are 4 schools with 550 pupils (X = x4 = 550) and 22 teachers (Y = y2 = 22),
and the corresponding relative frequency is $f_{42} = \frac{n_{42}}{n} = \frac{4}{200} = 0.02$.
- Summing over rows or columns gives the same result, which is the sample size n = 200.
Definition 9. (Marginal relative frequencies of X)
The ith marginal relative frequency of X is the proportion of individuals for which X = xi
regardless of the value of Y . It is given by
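\[
f_{i.} = \sum_{j=1}^{l} f_{ij} = \frac{n_{i.}}{n}.
\]
We define in the same way the marginal absolute and relative frequencies of Y, which are
given by
\[
n_{.j} = \sum_{i=1}^{k} n_{ij} \quad \text{and} \quad f_{.j} = \sum_{i=1}^{k} f_{ij} = \frac{n_{.j}}{n}.
\]
We have
\[
\sum_{i=1}^{k} n_{i.} = \sum_{j=1}^{l} n_{.j} = n \quad \text{and} \quad \sum_{i=1}^{k} f_{i.} = \sum_{j=1}^{l} f_{.j} = 1.
\]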
To calculate ni. (resp. n.j ) from the contingency table, we sum the nij over the ith row (resp.
the jth column). In the previous example, we have n1. = 48, n2. = 34, n3. = 42, n4. = 53
and n5. = 23. Moreover, n.1 = 21, n.2 = 33, n.3 = 21, n.4 = 48, n.5 = 41, n.6 = 15 and
n.7 = 21.
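The row and column sums above are easy to check by machine. The following is a minimal sketch, assuming the contingency table of Example 1 (secondary schools) stored as a list of rows; the variable names are illustrative only.

```python
# Example 1: rows are the values of X (pupils), columns the values of Y (teachers).
x_values = [400, 450, 500, 550, 600]
y_values = [20, 22, 25, 27, 29, 31, 32]
table = [
    [14, 10, 6, 8, 5, 2, 3],
    [4, 14, 5, 3, 4, 3, 1],
    [0, 3, 8, 18, 10, 1, 2],
    [2, 4, 1, 16, 20, 5, 5],
    [1, 2, 1, 3, 2, 4, 10],
]

row_totals = [sum(row) for row in table]        # marginal absolute frequencies n_i.
col_totals = [sum(col) for col in zip(*table)]  # marginal absolute frequencies n_.j
n = sum(row_totals)                             # sample size

row_freqs = [n_i / n for n_i in row_totals]     # marginal relative frequencies f_i.
col_freqs = [n_j / n for n_j in col_totals]     # marginal relative frequencies f_.j

print(row_totals, col_totals, n)
```

`zip(*table)` transposes the list of rows, so summing each resulting tuple gives the column totals.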
Remark 6.
If one of the variables X and Y (or the two of them) is qualitative, we define the marginal
distributions in the same way.
Marginal variance
The marginal variance of X is defined by
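\[
\mathrm{Var}(X) = \frac{1}{n}\sum_{i=1}^{k} n_{i.} \left(x_i - \bar{X}\right)^2 = \frac{1}{n}\sum_{i=1}^{k} n_{i.}\, x_i^2 - (\bar{X})^2
\]
and the marginal standard deviation of X is $\sigma_X = \sqrt{\mathrm{Var}(X)}$.
Similarly, the marginal variance and standard deviation of Y are defined by
\[
\mathrm{Var}(Y) = \frac{1}{n}\sum_{j=1}^{l} n_{.j} \left(y_j - \bar{Y}\right)^2 = \frac{1}{n}\sum_{j=1}^{l} n_{.j}\, y_j^2 - (\bar{Y})^2
\]
and $\sigma_Y = \sqrt{\mathrm{Var}(Y)}$.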
Remark 7.
If one of the variables X and Y (or the two of them) is continuous, we replace the values xi
and/or yj by the centres of the class intervals.
Example 1 (continued):
We will calculate the marginal means and variances in the example of secondary schools.
X \ Y       20     22      25      27      29      31      32      n_i.   n_i. x_i   n_i. x_i^2
400         14     10      6       8       5       2       3       48     19200      7680000
450         4      14      5       3       4       3       1       34     15300      6885000
500         0      3       8       18      10      1       2       42     21000      10500000
550         2      4       1       16      20      5       5       53     29150      16032500
600         1      2       1       3       2       4       10      23     13800      8280000
n_.j        21     33      21      48      41      15      21      200    98450      49377500
n_.j y_j    420    726     525     1296    1189    465     672     5293
n_.j y_j^2  8400   15972   13125   34992   34481   14415   21504   142889
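\[
\bar{X} = \frac{1}{n}\sum_{i=1}^{5} n_{i.}\, x_i = \frac{98450}{200} = 492.25
\]
\[
\mathrm{Var}(X) = \frac{1}{n}\sum_{i=1}^{5} n_{i.}\, x_i^2 - (\bar{X})^2 = \frac{49377500}{200} - (492.25)^2 = 4577.437
\]
and $\sigma_X = \sqrt{\mathrm{Var}(X)} = \sqrt{4577.437} = 67.657$.
Moreover,
\[
\bar{Y} = \frac{1}{n}\sum_{j=1}^{7} n_{.j}\, y_j = \frac{5293}{200} = 26.465
\]
\[
\mathrm{Var}(Y) = \frac{1}{n}\sum_{j=1}^{7} n_{.j}\, y_j^2 - (\bar{Y})^2 = \frac{142889}{200} - (26.465)^2 = 14.049
\]
and $\sigma_Y = \sqrt{\mathrm{Var}(Y)} = 3.748$.
We can also present the marginal distributions of X and Y as follows.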
xi ni. fi. ni. xi ni. x2i
400 48 0.24 19200 7680000
450 34 0.17 15300 6885000
500 42 0.21 21000 10500000
550 53 0.265 29150 16032500
600 23 0.115 13800 8280000
Total 200 1 98450 49377500
Total/n 492.25 246887.5
Definition 10.
The ith conditional relative frequency of X given Y = yj is the proportion of individuals for
which X = xi in the sub-population constituted of individuals for which Y = yj . It is given
by
\[
f_{i/Y=y_j} = \frac{n_{ij}}{n_{.j}}.
\]
xi ni3 fi/Y =y3
400 6 0.286
450 5 0.238
500 8 0.381
550 1 0.048
600 1 0.048
Total 21 1
\[
f_{i/Y=y_3} = \frac{n_{i3}}{n_{.3}} = \frac{n_{i3}}{21}.
\]
- Determine the conditional distribution of the number of teachers of the schools having 450
pupils.
We have to determine the conditional distribution of Y given X = x2 = 450. This conditional
distribution is as follows.
yj n2j fj/X=x2
20 4 0.118
22 14 0.412
25 5 0.147
27 3 0.088
29 4 0.118
31 3 0.088
32 1 0.029
Total 34 1
\[
f_{j/X=x_2} = \frac{n_{2j}}{n_{2.}} = \frac{n_{2j}}{34}.
\]
Conditional variance of X given Y = yj
The conditional variance of X given Y = yj is defined by
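\[
\mathrm{Var}(X/Y = y_j) = \frac{1}{n_{.j}}\sum_{i=1}^{k} n_{ij} \left(x_i - \bar{X}_{/Y=y_j}\right)^2 = \frac{1}{n_{.j}}\sum_{i=1}^{k} n_{ij}\, x_i^2 - \left(\bar{X}_{/Y=y_j}\right)^2
\]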
yj n2j fj/X=x2 n2j yj n2j yj2
20 4 0.118 80 1600
22 14 0.412 308 6776
25 5 0.147 125 3125
27 3 0.088 81 2187
29 4 0.118 116 3364
31 3 0.088 93 2883
32 1 0.029 32 1024
Total 34 1 835 20959
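\[
\bar{Y}_{/X=x_2} = \frac{1}{n_{2.}}\sum_{j=1}^{7} n_{2j}\, y_j = \frac{835}{34} = 24.559
\]
\[
\mathrm{Var}(Y/X = x_2) = \frac{1}{n_{2.}}\sum_{j=1}^{7} n_{2j}\, y_j^2 - \left(\bar{Y}_{/X=x_2}\right)^2 = \frac{20959}{34} - (24.559)^2 = 13.297
\]
and $\sigma(Y/X = x_2) = \sqrt{\mathrm{Var}(Y/X = x_2)} = 3.647$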
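A conditional distribution is simply one row (or one column) of the contingency table treated as a series of its own. The following is a minimal sketch, assuming the row X = 450 of Example 1; the function name `conditional_summary` is illustrative. It works with unrounded intermediate values, so the third decimal of the variance can differ slightly from a hand calculation that squares a rounded mean.

```python
from math import sqrt

# Example 1: values of Y and the row of the contingency table for X = 450.
y_values = [20, 22, 25, 27, 29, 31, 32]
row_450 = [4, 14, 5, 3, 4, 3, 1]


def conditional_summary(values, counts):
    """Conditional relative frequencies, mean, variance and standard deviation."""
    total = sum(counts)                                   # n_2. for this row
    freqs = [c / total for c in counts]                   # f_{j/X=x_2}
    mean = sum(c * v for v, c in zip(values, counts)) / total
    variance = sum(c * v * v for v, c in zip(values, counts)) / total - mean ** 2
    return freqs, mean, variance, sqrt(variance)


freqs, cond_mean, cond_var, cond_std = conditional_summary(y_values, row_450)
print([round(f, 3) for f in freqs])
print(round(cond_mean, 3), round(cond_var, 3), round(cond_std, 3))
```

The same function applies to the condition X ≤ 500 once the three corresponding rows have been summed column by column.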
- Determine the conditional distribution as well as the conditional mean and standard devi-
ation of the number of teachers of the schools having at most 500 pupils.
We have to determine the conditional distribution of Y given X ≤ x3 (x3 = 500).
Y /X ≤ x3
X \ Y                   20   22   25   27   29   31   32
400                     14   10   6    8    5    2    3
450                     4    14   5    3    4    3    1
500                     0    3    8    18   10   1    2
Total (n_{j/X≤x3})      18   27   19   29   19   6    6
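So
\[
\bar{Y}_{/X \le x_3} = \frac{1}{124}\sum_{j=1}^{7} n_{j/X \le x_3}\, y_j = \frac{3141}{124} = 25.331
\]
\[
\mathrm{Var}(Y/X \le x_3) = \frac{1}{124}\sum_{j=1}^{7} n_{j/X \le x_3}\, y_j^2 - \left(\bar{Y}_{/X \le x_3}\right)^2 = \frac{81173}{124} - (25.331)^2 = 12.961
\]
and $\sigma(Y/X \le x_3) = \sqrt{\mathrm{Var}(Y/X \le x_3)} = 3.600$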
Example 1 (continued):
Calculate the covariance of the number of pupils and the number of teachers.
Each cell gives the product nij xi yj , with nij in parentheses.
X \ Y   20            22            25            27            29            31            32            n_i.
400     112000 (14)   88000 (10)    60000 (6)     86400 (8)     58000 (5)     24800 (2)     38400 (3)     467600 (48)
450     36000 (4)     138600 (14)   56250 (5)     36450 (3)     52200 (4)     41850 (3)     14400 (1)     375750 (34)
500     0 (0)         33000 (3)     100000 (8)    243000 (18)   145000 (10)   15500 (1)     32000 (2)     568500 (42)
550     22000 (2)     48400 (4)     13750 (1)     237600 (16)   319000 (20)   85250 (5)     88000 (5)     814000 (53)
600     12000 (1)     26400 (2)     15000 (1)     48600 (3)     34800 (2)     74400 (4)     192000 (10)   403200 (23)
n_.j    182000 (21)   334400 (33)   245000 (21)   652050 (48)   609000 (41)   241800 (15)   364800 (21)   2629050 (200)
So
\[
\mathrm{cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{5}\sum_{j=1}^{7} n_{ij}\, x_i y_j - \bar{X}\,\bar{Y} = \frac{2629050}{200} - (492.25)(26.465) = 117.854
\]
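The double sum in the covariance is tedious by hand but short in code. The following is a minimal sketch, assuming the contingency table of Example 1; the function name `covariance_from_table` is illustrative and not part of the handout.

```python
# Example 1: rows are the values of X (pupils), columns the values of Y (teachers).
x_values = [400, 450, 500, 550, 600]
y_values = [20, 22, 25, 27, 29, 31, 32]
table = [
    [14, 10, 6, 8, 5, 2, 3],
    [4, 14, 5, 3, 4, 3, 1],
    [0, 3, 8, 18, 10, 1, 2],
    [2, 4, 1, 16, 20, 5, 5],
    [1, 2, 1, 3, 2, 4, 10],
]


def covariance_from_table(x_values, y_values, table):
    """cov(X, Y) = (1/n) sum_i sum_j n_ij x_i y_j - mean(X) * mean(Y)."""
    n = sum(sum(row) for row in table)
    mean_x = sum(x * sum(row) for x, row in zip(x_values, table)) / n
    mean_y = sum(y * sum(col) for y, col in zip(y_values, zip(*table))) / n
    cross = sum(n_ij * x * y
                for x, row in zip(x_values, table)
                for y, n_ij in zip(y_values, row))  # sum of n_ij x_i y_j
    return cross / n - mean_x * mean_y, cross


cov_xy, cross_total = covariance_from_table(x_values, y_values, table)
print(cross_total, round(cov_xy, 3))
```

The intermediate value `cross_total` corresponds to the grand total of the products in the table above.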
Proposition 1.
Let X and Y be two discrete statistical variables taking respectively the values x1 < x2 <
· · · < xk and y1 < y2 < · · · < yl and let a, b, c, d ∈ R be some constants. We have
1. cov(X, Y ) = cov(Y, X).
2. cov(X, X) = V ar(X).
3. V ar(X + Y ) = V ar(X) + V ar(Y ) + 2 cov(X, Y ).
4. cov(aX + b, cY + d) = ac cov(X, Y ).
5. |cov(X, Y )| ≤ σX σY .
Proof.
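1. We have
\[
\mathrm{cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{l} n_{ij}\, x_i y_j - \bar{X}\,\bar{Y} = \frac{1}{n}\sum_{j=1}^{l}\sum_{i=1}^{k} n_{ij}\, y_j x_i - \bar{Y}\,\bar{X} = \mathrm{cov}(Y, X).
\]
2. We have
\[
\mathrm{cov}(X, X) = \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{k} n_{ij}\, x_i x_j - \bar{X}^2.
\]
Since X cannot take two different values at the same time, we have
\[
n_{ij} =
\begin{cases}
n_i & \text{if } i = j \\
0 & \text{if } i \ne j.
\end{cases}
\]
Thus
\[
\mathrm{cov}(X, X) = \frac{1}{n}\sum_{i=1}^{k} n_i x_i^2 - \bar{X}^2 = \mathrm{Var}(X).
\]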
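3. We have
\[
\begin{aligned}
\mathrm{Var}(X + Y) &= \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{l} n_{ij} \left(x_i + y_j - \overline{X + Y}\right)^2 \\
&= \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{l} n_{ij} \left(\left(x_i - \bar{X}\right) + \left(y_j - \bar{Y}\right)\right)^2 \\
&= \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{l} n_{ij} \left(x_i - \bar{X}\right)^2 + \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{l} n_{ij} \left(y_j - \bar{Y}\right)^2 + \frac{2}{n}\sum_{i=1}^{k}\sum_{j=1}^{l} n_{ij} \left(x_i - \bar{X}\right)\left(y_j - \bar{Y}\right) \\
&= \frac{1}{n}\sum_{i=1}^{k} \left(x_i - \bar{X}\right)^2 \sum_{j=1}^{l} n_{ij} + \frac{1}{n}\sum_{j=1}^{l} \left(y_j - \bar{Y}\right)^2 \sum_{i=1}^{k} n_{ij} + 2\,\mathrm{cov}(X, Y) \\
&= \frac{1}{n}\sum_{i=1}^{k} n_{i.} \left(x_i - \bar{X}\right)^2 + \frac{1}{n}\sum_{j=1}^{l} n_{.j} \left(y_j - \bar{Y}\right)^2 + 2\,\mathrm{cov}(X, Y) \\
&= \mathrm{Var}(X) + \mathrm{Var}(Y) + 2\,\mathrm{cov}(X, Y).
\end{aligned}
\]
4. We have
\[
\begin{aligned}
\mathrm{cov}(aX + b, cY + d) &= \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{l} n_{ij} \left(a x_i + b - \overline{aX + b}\right)\left(c y_j + d - \overline{cY + d}\right) \\
&= \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{l} n_{ij} \left(a x_i + b - a\bar{X} - b\right)\left(c y_j + d - c\bar{Y} - d\right) \\
&= \frac{1}{n}\sum_{i=1}^{k}\sum_{j=1}^{l} ac\, n_{ij} \left(x_i - \bar{X}\right)\left(y_j - \bar{Y}\right) \\
&= \frac{ac}{n}\sum_{i=1}^{k}\sum_{j=1}^{l} n_{ij} \left(x_i - \bar{X}\right)\left(y_j - \bar{Y}\right) \\
&= ac\, \mathrm{cov}(X, Y).
\end{aligned}
\]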
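5. To prove this property, we need the following Cauchy-Schwarz inequality, which we will
first establish.
For all real numbers a1 , a2 , . . . , ap and b1 , b2 , . . . , bp , we have
\[
\left|\sum_{i=1}^{p} a_i b_i\right| \le \sqrt{\sum_{i=1}^{p} a_i^2 \sum_{i=1}^{p} b_i^2}.
\]
Set for t ∈ R
\[
P(t) = \sum_{i=1}^{p} (a_i t + b_i)^2 = \sum_{i=1}^{p} \left(a_i^2 t^2 + 2 a_i b_i t + b_i^2\right) = t^2 \sum_{i=1}^{p} a_i^2 + 2t \sum_{i=1}^{p} a_i b_i + \sum_{i=1}^{p} b_i^2.
\]
As a sum of squares, the polynomial P (t) is non-negative for all t ∈ R, so its discriminant ∆ is
non-positive. Thus
\[
\begin{aligned}
& \Delta = \left(2\sum_{i=1}^{p} a_i b_i\right)^2 - 4\left(\sum_{i=1}^{p} a_i^2\right)\left(\sum_{i=1}^{p} b_i^2\right) = 4\left(\sum_{i=1}^{p} a_i b_i\right)^2 - 4\left(\sum_{i=1}^{p} a_i^2\right)\left(\sum_{i=1}^{p} b_i^2\right) \le 0 \\
\implies\ & 4\left(\sum_{i=1}^{p} a_i b_i\right)^2 \le 4\left(\sum_{i=1}^{p} a_i^2\right)\left(\sum_{i=1}^{p} b_i^2\right) \\
\implies\ & \left(\sum_{i=1}^{p} a_i b_i\right)^2 \le \left(\sum_{i=1}^{p} a_i^2\right)\left(\sum_{i=1}^{p} b_i^2\right) \\
\implies\ & \sqrt{\left(\sum_{i=1}^{p} a_i b_i\right)^2} \le \sqrt{\left(\sum_{i=1}^{p} a_i^2\right)\left(\sum_{i=1}^{p} b_i^2\right)} \\
\implies\ & \left|\sum_{i=1}^{p} a_i b_i\right| \le \sqrt{\left(\sum_{i=1}^{p} a_i^2\right)\left(\sum_{i=1}^{p} b_i^2\right)}.
\end{aligned}
\]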
2.3.3 Correlation coefficient
The correlation coefficient of the two statistical variables X and Y is defined by
\[
\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}.
\]
In Example 1 above, we have
\[
\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{117.854}{(67.657)(3.748)} = 0.465
\]
Remark 8.
Since |cov(X, Y )| ≤ σX σY , we have |ρX,Y | ≤ 1, thus −1 ≤ ρX,Y ≤ 1.
If ρX,Y is equal to or close to 1, there is a linear relation between X and Y with positive slope.
If ρX,Y is equal to or close to −1, there is a linear relation between X and Y with negative
slope.
If ρX,Y is equal to or close to 0, there is no linear relation between X and Y . In this case, the
points in the scatter plot of the two variables X and Y are arbitrarily placed.
2.4 Fittings
2.4.1 Fitting of type Y = aX + b
The line of best fit (or the regression line) of Y on X using the least squares method is given
by Y = aX + b, where
\[
\begin{cases}
a = \dfrac{\mathrm{cov}(X, Y)}{\mathrm{Var}(X)} \\[2mm]
b = \bar{Y} - a\bar{X}.
\end{cases}
\]
To determine the regression line of Y on X, we have to calculate the coefficients a and b.
xi yi x2i yi2 xi yi
2 2 4 4 4
5 3 25 9 15
7 8 49 64 56
8 9 64 81 72
11 8 121 64 88
13 10 169 100 130
14 13 196 169 182
16 14 256 196 224
20 13 400 169 260
24 19 576 361 456
Total 120 99 1860 1217 1487
Total/n 12 9.9 186 121.7 148.7
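\[
\bar{X} = \frac{1}{n}\sum_{i=1}^{10} x_i = \frac{120}{10} = 12, \qquad \bar{Y} = \frac{1}{n}\sum_{i=1}^{10} y_i = \frac{99}{10} = 9.9
\]
\[
\mathrm{Var}(X) = \frac{1}{n}\sum_{i=1}^{10} x_i^2 - \bar{X}^2 = \frac{1860}{10} - (12)^2 = 42, \qquad \mathrm{Var}(Y) = \frac{1}{n}\sum_{i=1}^{10} y_i^2 - \bar{Y}^2 = \frac{1217}{10} - (9.9)^2 = 23.69
\]
\[
\mathrm{cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{10} x_i y_i - \bar{X}\,\bar{Y} = \frac{1487}{10} - 12 \times 9.9 = 29.9
\]
Therefore
\[
\begin{cases}
a = \dfrac{\mathrm{cov}(X, Y)}{\mathrm{Var}(X)} = \dfrac{29.9}{42} = 0.712 \\[2mm]
b = \bar{Y} - a\bar{X} = 9.9 - (0.712) \times 12 = 1.356
\end{cases}
\]
\[
Y = (0.712)X + 1.356
\]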
In the next figure, we represent the scatter plot of (X, Y ) as well as the regression line.
We remark that the points (xi , yi ) lie close to the regression line. This
is confirmed by the correlation coefficient
\[
\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{29.9}{\sqrt{42 \times 23.69}} = 0.948
\]
which is close to 1.
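The coefficients a and b and the correlation coefficient can be recomputed directly from the ten pairs (xi , yi ) of the table. The following is a minimal sketch; the helper names `mean`, `covariance`, `linear_fit` and `correlation` are illustrative, and the input x = 27 is used only as an example of prediction. Unrounded arithmetic is used, so the last digit of b can differ from a value obtained with a already rounded to 0.712.

```python
from math import sqrt

# Data of the regression example: ten pairs (x_i, y_i).
x = [2, 5, 7, 8, 11, 13, 14, 16, 20, 24]
y = [2, 3, 8, 9, 8, 10, 13, 14, 13, 19]


def mean(v):
    return sum(v) / len(v)


def covariance(u, v):
    """cov(U, V) = mean(UV) - mean(U) * mean(V); covariance(u, u) is Var(U)."""
    return mean([a * b for a, b in zip(u, v)]) - mean(u) * mean(v)


def linear_fit(x, y):
    """Least squares line Y = aX + b."""
    a = covariance(x, y) / covariance(x, x)
    b = mean(y) - a * mean(x)
    return a, b


def correlation(x, y):
    return covariance(x, y) / sqrt(covariance(x, x) * covariance(y, y))


a, b = linear_fit(x, y)
rho = correlation(x, y)
y_at_27 = a * 27 + b  # predicted Y for a value of X outside the table
print(round(a, 3), round(b, 3), round(rho, 3), round(y_at_27, 2))
```

Swapping the arguments, `linear_fit(y, x)`, gives the coefficients of the regression line of X on Y, which is in general a different line.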
- The regression line can be used to estimate (predict) the value of Y corresponding to a
value of X which does not exist in the table. For example, to estimate the number of days
of absence of an employee with 27 years of service, we calculate
Y = (0.712) × 27 + 1.356 = 20.58
Remark 9.
1. The scatter plot can take many forms depending on the value of ρX,Y . When ρX,Y = 1
(resp. ρX,Y = −1) the points (xi , yi ) are collinear and lie on a line with positive (resp.
negative) slope. When ρX,Y ≈ 1 (resp. ρX,Y ≈ −1), the scatter plot is close to a line
with positive (resp. negative) slope and when ρX,Y ≈ 0, the points of the scatter plot
are arbitrarily placed. Here are some examples.
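2. The regression line of X on Y should not be obtained by solving the equation Y = aX + b
for X. Instead, we have to calculate the coefficients c and d of the equation X = cY + d, using
their formulas.
3. The regression line of Y on X is Y = aX + b, with $a = \frac{\mathrm{cov}(X, Y)}{\mathrm{Var}(X)}$ and $b = \bar{Y} - a\bar{X}$, so
\[
\begin{aligned}
& Y = aX + \bar{Y} - a\bar{X} \\
\implies\ & Y - \bar{Y} = a\left(X - \bar{X}\right) \\
& \phantom{Y - \bar{Y}} = \frac{\mathrm{cov}(X, Y)}{\mathrm{Var}(X)}\left(X - \bar{X}\right) \\
& \phantom{Y - \bar{Y}} = \frac{\mathrm{cov}(X, Y)}{\sigma_X^2\, \sigma_Y}\, \sigma_Y \left(X - \bar{X}\right) \\
& Y - \bar{Y} = \rho_{X,Y}\, \frac{\sigma_Y}{\sigma_X}\left(X - \bar{X}\right)
\end{aligned}
\]
and we can prove in the same way that the regression line of X on Y can be written
as
\[
X - \bar{X} = \rho_{X,Y}\, \frac{\sigma_X}{\sigma_Y}\left(Y - \bar{Y}\right).
\]
So, if ρX,Y = 1 or ρX,Y = −1, the two regression lines coincide, and only in this case
can we solve one equation to determine the other one.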
2.4.2 Fitting of type Y = B × A^X
The regression line is not always appropriate to describe the relation between two statistical
variables X and Y . In some situations, the scatter plot may suggest other forms of functions
such as an exponential function of the form $Y = B \times A^X$. We will illustrate this fitting
through an example.
Example 3:
The number X of open checkouts in a hypermarket and the average wait time Y (in minutes)
have been recorded in the following table.
Number of open checkouts (X) 3 4 5 6 8 10 12
Average wait time (Y ) 16 12 9.6 7.9 6 4.7 4
We will fit Y on X using the equation $Y = B \times A^X$, where A, B > 0 are constants. In order
to determine the coefficients A and B, we use the natural logarithm to linearise the equation:
ln(Y ) = ln(B) + X ln(A). Setting Z = ln(Y ), a = ln(A) and b = ln(B), we obtain
Z = aX + b.
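\[
\bar{X} = \frac{1}{n}\sum_{i=1}^{7} x_i = \frac{48}{7} = 6.857, \qquad \bar{Z} = \frac{1}{n}\sum_{i=1}^{7} z_i = \frac{14.313}{7} = 2.045
\]
\[
\mathrm{Var}(X) = \frac{1}{n}\sum_{i=1}^{7} x_i^2 - \bar{X}^2 = \frac{394}{7} - (6.857)^2 = 9.268
\]
\[
\mathrm{cov}(X, Z) = \frac{1}{n}\sum_{i=1}^{7} x_i z_i - \bar{X}\,\bar{Z} = \frac{88.419}{7} - (6.857)(2.045) = -1.392
\]
Therefore
\[
\begin{cases}
a = \dfrac{\mathrm{cov}(X, Z)}{\mathrm{Var}(X)} = -\dfrac{1.392}{9.268} = -0.150 \\[2mm]
b = \bar{Z} - a\bar{X} = 2.045 + (0.150)(6.857) = 3.074
\end{cases}
\]
and
\[
\begin{cases}
A = e^{a} = 0.861 \\
B = e^{b} = 21.628
\end{cases}
\]
\[
Y = (21.628) \times (0.861)^X.
\]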
In the next figure, we represent this function as well as the scatter plot of (X, Y ).
We remark that the points (xi , yi ) lie close to the fitting curve.
- Question: Estimate the average wait time when 9 checkouts are open.
Response:
$Y = (21.628) \times (0.861)^9 = 5.624$ minutes.
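The exponential fitting amounts to a linear regression of Z = ln(Y) on X followed by an exponentiation of the coefficients. The following is a minimal sketch, assuming the checkout data of Example 3; the function name `exponential_fit` is illustrative. It keeps full precision for the logarithms, so B and the prediction can differ in the second decimal from values built on logarithms rounded to three decimals.

```python
from math import exp, log

# Example 3: number of open checkouts (X) and average wait time in minutes (Y).
x = [3, 4, 5, 6, 8, 10, 12]
y = [16, 12, 9.6, 7.9, 6, 4.7, 4]


def mean(v):
    return sum(v) / len(v)


def covariance(u, v):
    return mean([s * t for s, t in zip(u, v)]) - mean(u) * mean(v)


def exponential_fit(x, y):
    """Fit Y = B * A**X through the linear regression of Z = ln(Y) on X."""
    z = [log(v) for v in y]                   # linearisation: Z = aX + b
    a = covariance(x, z) / covariance(x, x)
    b = mean(z) - a * mean(x)
    return exp(a), exp(b)                     # A = e^a, B = e^b


A, B = exponential_fit(x, y)
wait_at_9 = B * A ** 9  # estimated average wait time with 9 open checkouts
print(round(A, 3), round(B, 2), round(wait_at_9, 2))
```

Note that the least squares criterion is applied to ln(Y), not to Y itself, so the fitted curve minimises the squared errors on the logarithmic scale.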
T = aZ + b.
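\[
\mathrm{Var}(Z) = \frac{1}{n}\sum_{i=1}^{5} z_i^2 - \bar{Z}^2 = \frac{17.715}{5} - (1.831)^2 = 0.190
\]
\[
\mathrm{cov}(Z, T) = \frac{1}{n}\sum_{i=1}^{5} z_i t_i - \bar{Z}\,\bar{T} = \frac{37.361}{5} - (1.831)(4.318) = -0.434
\]
Therefore
\[
\begin{cases}
a = \dfrac{\mathrm{cov}(Z, T)}{\mathrm{Var}(Z)} = -\dfrac{0.434}{0.190} = -2.284 \\[2mm]
b = \bar{T} - a\bar{Z} = 4.318 + (2.284)(1.831) = 8.500 \implies B = e^{b} = 4914.769
\end{cases}
\]
\[
Y = (4914.769) \times X^{-2.284}.
\]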
In the next figure, we represent this function as well as the scatter plot of (X, Y ).
precisely, if we have a set of data (xi , yi )1≤i≤n and we want to know the best type of fitting
to describe this scatter plot among the linear fitting Y = aX + b, the exponential fitting
$Y = B \times A^X$ and the power fitting $Y = B \times X^a$, we proceed as follows.
- We calculate $\rho^2_{X,Y}$, which represents the degree of strength of the linear relation between X
and Y (corresponding to the linear fitting Y = aX + b).
- We calculate $\rho^2_{X,Z}$, where Z = ln(Y ), which represents the degree of strength of the linear
relation between X and Z = ln(Y ) (corresponding to the exponential fitting $Y = B \times A^X$).
- We calculate $\rho^2_{U,V}$, where U = ln(X) and V = ln(Y ), which represents the degree of strength
of the linear relation between U = ln(X) and V = ln(Y ) (corresponding to the power fitting
$Y = B \times X^a$).
- The largest value among $\rho^2_{X,Y}$, $\rho^2_{X,Z}$ and $\rho^2_{U,V}$ determines the best type of fitting to be used
to describe our scatter plot (xi , yi )1≤i≤n .
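The selection rule above can be sketched as a small function that computes the three squared correlation coefficients and returns the largest. This is only an illustration, assuming strictly positive data (the logarithms must exist) and reusing the checkout data of the exponential fitting example as input; the names `rho_squared` and `compare_fits` are not part of the handout.

```python
from math import log

# Illustrative input: the checkout data (open checkouts, average wait time).
x = [3, 4, 5, 6, 8, 10, 12]
y = [16, 12, 9.6, 7.9, 6, 4.7, 4]


def mean(v):
    return sum(v) / len(v)


def covariance(u, v):
    return mean([s * t for s, t in zip(u, v)]) - mean(u) * mean(v)


def rho_squared(u, v):
    """Squared correlation coefficient of the pairs (u_i, v_i)."""
    return covariance(u, v) ** 2 / (covariance(u, u) * covariance(v, v))


def compare_fits(x, y):
    """Return the fitting type with the largest squared correlation, and all scores."""
    ln_x = [log(v) for v in x]
    ln_y = [log(v) for v in y]
    scores = {
        "linear": rho_squared(x, y),          # Y = aX + b
        "exponential": rho_squared(x, ln_y),  # Y = B * A**X
        "power": rho_squared(ln_x, ln_y),     # Y = B * X**a
    }
    best = max(scores, key=scores.get)
    return best, scores


best, scores = compare_fits(x, y)
print(best, {name: round(value, 4) for name, value in scores.items()})
```

Since the three coefficients are computed on different scales (Y, ln Y), the comparison is a practical guide rather than a formal test.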
Part II
Probability
Chapter 3
3.1.1 k−permutation
A k−permutation of E is an ordered selection of k items from E. The items are selected
without repetition.
The permutation coefficient $P_n^k$ is the number of all k−permutations of a set of n items.
We have
\[
P_n^k = \frac{n!}{(n - k)!},
\]
with n! = n × (n − 1) × (n − 2) × · · · × 1, for n ∈ N∗ (by convention 0! = 1).
Example 1:
Five athletes take part in a race. The first to cross the finish line wins a gold medal, the second a silver medal and the third a bronze medal. The organizing committee wants the list of the winners' names to be ready when they arrive, so it must prepare all possible lists in advance. How many lists are there?
Solution:
The order of athletes is important and there is no repetition in the lists, so the number of
possible lists is
\[ P_5^3 = \frac{5!}{(5-3)!} = \frac{5!}{2!} = \frac{5 \times 4 \times 3 \times 2!}{2!} = 60 \text{ possible lists.} \]
Example 2:
The students of a faculty want to create an association. They organized elections to choose
the members of the committee of this association. The committee consists of
- A president.
- A vice-president.
- A treasurer.
These positions are allocated according to the number of votes received by each candidate. Six candidates stand in the election. How many possible committees can we expect?
Solution:
The order of candidates is important and there is no repetition in the committees, so the
number of possible committees is
\[ P_6^3 = \frac{6!}{(6-3)!} = \frac{6!}{3!} = \frac{6 \times 5 \times 4 \times 3!}{3!} = 120 \text{ possible committees.} \]
Example 3:
An urn contains 3 white balls and 4 black balls. We draw 3 balls from this urn successively and without replacement. What is the number of possible cases?
Solution:
The total number of balls is 4 + 3 = 7.
The order of balls is important since the sampling is successive and there is no repetition
because the sampling is without replacement. So, the number of possible cases is
\[ P_7^3 = \frac{7!}{(7-3)!} = \frac{7!}{4!} = \frac{7 \times 6 \times 5 \times 4!}{4!} = 210 \text{ possible cases.} \]
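The three counts above can be checked with a short Python sketch. The function name `k_permutations` and the athlete labels are ours and purely illustrative; the brute-force enumeration simply lists every ordered selection without repetition.

```python
from itertools import permutations
from math import factorial


def k_permutations(n, k):
    """Number of k-permutations of n items: n! / (n - k)!."""
    return factorial(n) // factorial(n - k)


print(k_permutations(5, 3))  # Example 1: 60
print(k_permutations(6, 3))  # Example 2: 120
print(k_permutations(7, 3))  # Example 3: 210

# Brute-force check of Example 1 with hypothetical athlete labels.
athletes = ["A", "B", "C", "D", "E"]
podiums = list(permutations(athletes, 3))  # ordered (gold, silver, bronze) lists
print(len(podiums))                        # 60
```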
3.1.2 Permutation
An n−permutation of E is simply called a permutation. It is an arrangement of all the items of E in a certain order. The number of permutations of a set of n items is $P_n^n = n!$.
3.1.3 Combination
A combination of k items of E is any subset of k items of E.
In a combination, the items are selected without repetition and their order is not important.
The number of combinations of k items from a set of n items is given by
\[ C_n^k = \frac{n!}{k!\,(n-k)!}. \]
Example 4:
We return to Example 1, but we assume that the first three finishers share the same award equally. How many possible winner groups are there?
Solution:
Here there is no repetition and the order of athletes is not important, so the number of
possible winner groups is
\[ C_5^3 = \frac{5!}{3! \times 2!} = \frac{5 \times 4 \times 3!}{3! \times 2} = 10 \text{ possible winner groups.} \]
Example 5:
We return to Example 2, but here the students decide that the committee of the association
operates in a collegial manner, without difference between members. How many possible
committees are there?
Solution:
Here there is no repetition and the order of candidates is not important, so the number of
possible committees is
\[ C_6^3 = \frac{6!}{3! \times 3!} = \frac{6 \times 5 \times 4 \times 3!}{3! \times 3 \times 2} = 20 \text{ possible committees.} \]
Example 6:
We return to Example 3, but here the sampling is simultaneous. What is the number of
possible cases?
Solution:
Here there is no repetition and the order of balls is not important because the sampling is
simultaneous. So, the number of possible cases is
\[ C_7^3 = \frac{7!}{3! \times 4!} = \frac{7 \times 6 \times 5 \times 4!}{3 \times 2 \times 4!} = 35 \text{ possible cases.} \]
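Examples 4 to 6 can be checked in the same spirit. In the sketch below, the function name `n_combinations` and the ball labels are ours; the last check illustrates that each subset of 3 items can be ordered in 3! ways, so that $C_n^3 = P_n^3 / 3!$.

```python
from itertools import combinations, permutations
from math import factorial


def n_combinations(n, k):
    """Number of combinations of k items among n: n! / (k! (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


print(n_combinations(5, 3))  # Example 4: 10
print(n_combinations(6, 3))  # Example 5: 20
print(n_combinations(7, 3))  # Example 6: 35

# Brute-force check of Example 6 with hypothetical ball labels.
balls = ["W1", "W2", "W3", "B1", "B2", "B3", "B4"]
simultaneous = list(combinations(balls, 3))  # order ignored
successive = list(permutations(balls, 3))    # order taken into account
print(len(simultaneous), len(successive) // factorial(3))  # 35 35
```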
Sample space
A sample space is the set of all possible outcomes of a random experiment. It is denoted by
Ω.
Example:
- For the throw of a die, the sample space is Ω = {1, 2, 3, 4, 5, 6}.
- In the urn examples (Examples 3 and 6 above), the cardinality of Ω is 210 in Example 3 (successive sampling without replacement) and 35 in Example 6 (simultaneous sampling).
Remark 11.
1. The sample space may be finite, as in the case of the die throw, or infinite, as in the following example.
Example:
We throw a die repeatedly until we obtain the face "6".
The random experiment consists in observing the number of throws needed to obtain the face "6". Thus, the sample space is $\Omega = \mathbb{N}^*$.
2. The sample space may be discrete as in the previous example, or continuous as in the
following example.
Example :
If the random experiment consists in observing the lifetime of an electronic component,
this lifetime T can take any value in the interval [0, +∞[, so Ω = [0, +∞[.
Elementary event
The elements of the sample space are called elementary events.
Example:
In the example of the die throw, the elementary events are: 1, 2, 3, 4, 5 and 6.
Composite event
Any subset of the sample space is called an event or a composite event.
Example:
In the example of the die throw, we denote
A: "The outcome of the throw is even".
A = {2, 4, 6} is a composite event.
Operations on events
Let A and B be two events.
- The union A ∪ B is realized when at least one of the two events A and B is realized.
Remark 12.
In view of De Morgan’s laws, we have
1. $(A \cap B)^c = A^c \cup B^c$.
2. $(A \cup B)^c = A^c \cap B^c$.
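De Morgan's laws can be illustrated on the die-throw sample space with Python sets. This is only a sketch on one particular pair of events, not a proof; the events A and B and the helper `complement` are our own choices.

```python
omega = {1, 2, 3, 4, 5, 6}   # sample space of a die throw
A = {2, 4, 6}                # illustrative event: the outcome is even
B = {3, 6}                   # illustrative event: the outcome is a multiple of 3


def complement(event):
    """Complement of an event with respect to omega."""
    return omega - event


law_1 = complement(A & B) == complement(A) | complement(B)
law_2 = complement(A | B) == complement(A) & complement(B)
print(law_1, law_2)  # True True
```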
51
3.2.2 General definition of a probability
Definition 11.
Let Ω be a finite sample space. We call probability any map P from P(Ω) to [0, 1] satisfying
a) P (Ω) = 1.
2. $P(A^c) = 1 - P(A)$.
3. P (A \ B) = P (A) − P (A ∩ B).
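2. We have $A \cap A^c = \emptyset$ and $A \cup A^c = \Omega$, so
\[
\begin{aligned}
& P(A \cup A^c) = P(A) + P(A^c) \\
\implies\ & P(\Omega) = P(A) + P(A^c) \\
\implies\ & 1 = P(A) + P(A^c) \\
\implies\ & P(A^c) = 1 - P(A).
\end{aligned}
\]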
5. If B ⊆ A, we have from the previous property
P (A) − P (B) = P (A \ B) ≥ 0 =⇒ P (A) ≥ P (B).
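6. We have $A \cup B = (A \setminus B) \cup (B \setminus A) \cup (A \cap B)$, and the three events $A \setminus B$, $B \setminus A$ and $A \cap B$ are pairwise disjoint. Thus
\[
\begin{aligned}
P(A \cup B) &= P((A \setminus B) \cup (B \setminus A) \cup (A \cap B)) \\
&= P(A \setminus B) + P(B \setminus A) + P(A \cap B) \\
&= P(A) - P(A \cap B) + P(B) - P(B \cap A) + P(A \cap B) \quad \text{(in view of property 3)} \\
&= P(A) + P(B) - 2P(A \cap B) + P(A \cap B) \\
&= P(A) + P(B) - P(A \cap B).
\end{aligned}
\]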
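These properties can be checked numerically on a small example. The sketch below assumes a fair die, so that the probability of an event is its number of outcomes divided by 6; the helper `prob` and the events A and B are our own illustrative choices. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Uniform probability on omega (assumption: fair die)."""
    return Fraction(len(event), len(omega))


A = {2, 4, 6}   # even outcome
B = {3, 6}      # multiple of 3

# Property 2: P(A^c) = 1 - P(A)
print(prob(omega - A), 1 - prob(A))                  # 1/2 1/2
# Property 3: P(A \ B) = P(A) - P(A ∩ B)
print(prob(A - B), prob(A) - prob(A & B))            # 1/3 1/3
# Property 6: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
print(prob(A | B), prob(A) + prob(B) - prob(A & B))  # 2/3 2/3
```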
Remark 13.
We say that the events A and B are incompatible if $A \cap B = \emptyset$.
Independence
We say that the two events A and B are independent if P (A ∩ B) = P (A) × P (B).
If A and B are independent, then $A^c$ and $B$ (resp. $A$ and $B^c$; $A^c$ and $B^c$) are also independent.
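The definition of independence and its stability under complements can be illustrated on a fair die. The events below and the helpers `prob` and `independent` are our own illustrative choices; the uniform probability is an assumption of the sketch.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Uniform probability on omega (assumption: fair die)."""
    return Fraction(len(event), len(omega))


def independent(e1, e2):
    """True when P(e1 ∩ e2) = P(e1) * P(e2)."""
    return prob(e1 & e2) == prob(e1) * prob(e2)


A = {2, 4, 6}   # even outcome
B = {1, 2}      # outcome at most 2
D = {4, 5, 6}   # outcome at least 4

Ac, Bc = omega - A, omega - B
print(independent(A, B))                                              # True
print(independent(Ac, B), independent(A, Bc), independent(Ac, Bc))    # True True True
print(independent(A, D))                                              # False
```

Here A and D are not independent, because $P(A \cap D) = 1/3$ whereas $P(A) \times P(D) = 1/4$.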
53
3.2.4 Conditional probabilities
Let A and B be two events such that $P(A) \neq 0$. The conditional probability of B given A is defined by
\[ P(B/A) = \frac{P(A \cap B)}{P(A)}. \]
Remark 14.
Example:
We throw a fair die. Given that the outcome is even, what is the probability of getting a multiple of 3?
Solution:
Denote by
A: "Getting an even number" and B: "Getting a multiple of 3".
We have A = {2, 4, 6} and B = {3, 6}. Thus
\[ P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{P(\{6\})}{P(\{2, 4, 6\})} = \frac{1/6}{3/6} = \frac{1}{3}. \]
Theorem 1.
Let A and B be two events with non-zero probability. We have
\[ P(A/B) = \frac{P(B/A) \times P(A)}{P(B)}. \]
Proof.
The proof follows immediately from the definition of the conditional probability. Indeed, we have
\[ \frac{P(B/A) \times P(A)}{P(B)} = \frac{\dfrac{P(A \cap B)}{P(A)} \times P(A)}{P(B)} = \frac{P(A \cap B)}{P(B)} = P(A/B). \]
Example:
We return to the previous example of throwing a fair die. Given that the outcome is a multiple of 3, what is the probability of getting an even number?
Solution:
In view of Bayes' formula, we have
\[ P(A/B) = \frac{P(B/A) \times P(A)}{P(B)} = \frac{\frac{1}{3} \times \frac{3}{6}}{\frac{2}{6}} = \frac{1}{2}. \]
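The two die examples can be verified with exact fractions. The sketch below is illustrative: the helpers `prob` and `cond_prob` are our own names, and the uniform probability reflects the assumption that the die is fair. It computes P(A/B) both through Bayes' formula and directly from the definition.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}


def prob(event):
    """Uniform probability on omega (fair die)."""
    return Fraction(len(event), len(omega))


def cond_prob(event, given):
    """P(event / given) = P(event ∩ given) / P(given), assuming P(given) != 0."""
    return prob(event & given) / prob(given)


A = {2, 4, 6}   # getting an even number
B = {3, 6}      # getting a multiple of 3

p_b_given_a = cond_prob(B, A)
via_bayes = p_b_given_a * prob(A) / prob(B)
direct = cond_prob(A, B)

print(p_b_given_a)        # 1/3
print(via_bayes, direct)  # 1/2 1/2
```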
Law of total probability
Theorem 2.
Let $A_1, A_2, \ldots, A_k$ be a partition of Ω such that $P(A_i) \neq 0$ for all $i \in \{1, 2, \ldots, k\}$. We have for all $B \in \mathcal{P}(\Omega)$
\[ P(B) = \sum_{i=1}^{k} P(B/A_i) \times P(A_i). \]
Proof.
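We have
\[
\begin{aligned}
P(B) &= P(B \cap \Omega) \\
&= P\left(B \cap \left(\bigcup_{i=1}^{k} A_i\right)\right) \\
&= P\left(\bigcup_{i=1}^{k} (B \cap A_i)\right) \\
&= \sum_{i=1}^{k} P(B \cap A_i) \quad \text{(because the events $(B \cap A_i)_{1 \le i \le k}$ are incompatible)} \\
&= \sum_{i=1}^{k} \frac{P(B \cap A_i)}{P(A_i)} \times P(A_i) \\
&= \sum_{i=1}^{k} P(B/A_i) \times P(A_i).
\end{aligned}
\]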
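As an illustration of the law of total probability, consider a hypothetical two-urn experiment (not taken from the course): we choose one of two urns with equal probability, then draw one ball from it. Urn 1 contains 3 white and 4 black balls, urn 2 contains 5 white and 2 black balls. The events "urn 1 is chosen" and "urn 2 is chosen" form a partition of Ω, and B is the event "the ball is white".

```python
from fractions import Fraction

# Hypothetical setup: prior[u] = P(A_u), white_given[u] = P(B / A_u).
prior = {"urn 1": Fraction(1, 2), "urn 2": Fraction(1, 2)}
white_given = {"urn 1": Fraction(3, 7), "urn 2": Fraction(5, 7)}

# The A_u must form a partition, so their probabilities sum to 1.
total_prior = sum(prior.values())

# Law of total probability: P(B) = sum over u of P(B / A_u) * P(A_u).
p_white = sum(white_given[u] * prior[u] for u in prior)
print(total_prior, p_white)  # 1 4/7
```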
Theorem 3.
Let $A_1, A_2, \ldots, A_k$ be a partition of Ω such that $P(A_i) \neq 0$ for all $i \in \{1, 2, \ldots, k\}$. We have for all $B \in \mathcal{P}(\Omega)$ such that $P(B) \neq 0$
\[ P(A_i/B) = \frac{P(B/A_i) \times P(A_i)}{\sum_{j=1}^{k} P(B/A_j) \times P(A_j)}, \quad \forall i \in \{1, \ldots, k\}. \]
Proof.
In view of Bayes' formula, we have
\[ P(A_i/B) = \frac{P(B/A_i) \times P(A_i)}{P(B)} \]
and the law of total probability allows us to write
\[ P(B) = \sum_{j=1}^{k} P(B/A_j) \times P(A_j). \]
Thus
\[ P(A_i/B) = \frac{P(B/A_i) \times P(A_i)}{\sum_{j=1}^{k} P(B/A_j) \times P(A_j)}. \]
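The formula of Theorem 3 can be illustrated on a hypothetical quality-control example (the figures below are invented for the illustration). Three machines produce 50%, 30% and 20% of the items of a factory, with defect rates of 1%, 2% and 3% respectively. Given that an item is defective (event B), we compute the probability that it comes from each machine.

```python
from fractions import Fraction

# Hypothetical data: prior[m] = P(A_m), defect_given[m] = P(B / A_m).
prior = {"M1": Fraction(50, 100), "M2": Fraction(30, 100), "M3": Fraction(20, 100)}
defect_given = {"M1": Fraction(1, 100), "M2": Fraction(2, 100), "M3": Fraction(3, 100)}

# Denominator: law of total probability.
p_defect = sum(defect_given[m] * prior[m] for m in prior)

# Numerators divided by the common denominator give P(A_m / B).
posterior = {m: defect_given[m] * prior[m] / p_defect for m in prior}

print(p_defect)  # 17/1000
for machine, value in posterior.items():
    print(machine, value)  # M1 5/17, M2 6/17, M3 6/17
```

As expected for a partition, the three conditional probabilities sum to 1.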
Chain rule
Theorem 4.
Let $A_1, A_2, \ldots, A_k$ be a sequence of events such that $P(A_1 \cap A_2 \cap \cdots \cap A_k) \neq 0$. We have
\[ P(A_1 \cap A_2 \cap \cdots \cap A_k) = P(A_1)\, P(A_2/A_1)\, P(A_3/A_1 \cap A_2) \times \cdots \times P(A_k/A_1 \cap A_2 \cap \cdots \cap A_{k-1}). \]
Proof.
We proceed by induction on k.
- For k = 2, we have
\[ P(A_2/A_1) = \frac{P(A_1 \cap A_2)}{P(A_1)} \implies P(A_1 \cap A_2) = P(A_1)\, P(A_2/A_1). \]
So, the relation is satisfied.
- We assume that the relation is satisfied for k and we prove it for k + 1.
We have
\[ P(A_{k+1}/A_1 \cap A_2 \cap \cdots \cap A_k) = \frac{P(A_1 \cap A_2 \cap \cdots \cap A_{k+1})}{P(A_1 \cap A_2 \cap \cdots \cap A_k)} \]
\[
\begin{aligned}
\implies P(A_1 \cap A_2 \cap \cdots \cap A_{k+1}) &= P(A_1 \cap A_2 \cap \cdots \cap A_k)\, P(A_{k+1}/A_1 \cap A_2 \cap \cdots \cap A_k) \\
&= P(A_1)\, P(A_2/A_1)\, P(A_3/A_1 \cap A_2) \times \cdots \times P(A_k/A_1 \cap A_2 \cap \cdots \cap A_{k-1}) \\
&\quad \times P(A_{k+1}/A_1 \cap A_2 \cap \cdots \cap A_k).
\end{aligned}
\]
So, the relation is satisfied for k + 1.
Thus, the relation is satisfied for all k ≥ 2.
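To illustrate the chain rule, we return to the urn of Example 3 (3 white balls and 4 black balls, 3 successive draws without replacement) and compute the probability that the three balls drawn are black, with $A_i$: "the i-th ball drawn is black". The sketch below is illustrative and assumes that all ordered draws are equally likely; it compares the chain-rule product with a brute-force count over the 210 possible cases.

```python
from fractions import Fraction
from itertools import permutations

whites, blacks = 3, 4
total = whites + blacks

# Chain rule: P(A1 ∩ A2 ∩ A3) = P(A1) * P(A2/A1) * P(A3/A1 ∩ A2).
# After i black balls have been removed, blacks - i remain among total - i balls.
p_chain = Fraction(1)
for i in range(3):
    p_chain *= Fraction(blacks - i, total - i)

# Brute-force check: enumerate all ordered draws of 3 distinct balls.
colours = ["W"] * whites + ["B"] * blacks
draws = list(permutations(range(total), 3))
favourable = sum(all(colours[i] == "B" for i in draw) for draw in draws)
p_count = Fraction(favourable, len(draws))

print(len(draws), favourable)  # 210 24
print(p_chain, p_count)        # 4/35 4/35
```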