
PEOPLE’S DEMOCRATIC REPUBLIC of ALGERIA

MINISTRY of HIGHER EDUCATION and SCIENTIFIC RESEARCH


NATIONAL POLYTECHNIC INSTITUTE MALEK BENNABI of CONSTANTINE

COURSE HANDOUT

PROBABILITY and STATISTICS

Mohamed BOUKELOUA

Academic year 2023/2024


Contents

Part I: Descriptive Statistics 3

1 Statistical series with one character 4


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.1 Generalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.3 Types of characters . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Statistical series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.1 Qualitative case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.2 Discrete quantitative case . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.3 Continuous quantitative case . . . . . . . . . . . . . . . . . . . . . . . 8
1.3 Representation of a statistical series . . . . . . . . . . . . . . . . . . . . . 8
1.3.1 Qualitative case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.2 Discrete quantitative case . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3.3 Continuous quantitative case . . . . . . . . . . . . . . . . . . . . . . . 12
1.4 Parameters of a statistical series . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4.1 Discrete case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Central tendency parameters . . . . . . . . . . . . . . . . . . . . . . . 15
Dispersion parameters . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.4.2 Continuous case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Central tendency parameters . . . . . . . . . . . . . . . . . . . . . . . 20
Dispersion parameters . . . . . . . . . . . . . . . . . . . . . . . . . . 20
1.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2 Statistical series with two characters 25


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2 Distributions and characteristics . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.1 Marginal distributions . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.2.2 Marginal characteristics . . . . . . . . . . . . . . . . . . . . . . . . . 27
Marginal mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Marginal variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.2.3 Conditional distribution . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2.4 Conditional Characteristics . . . . . . . . . . . . . . . . . . . . . . . 30

Conditional mean of X given Y = yj . . . . . . . . . . . . . . . . . . 30
Conditional variance of X given Y = yj . . . . . . . . . . . . . . . . . 31
2.3 Covariance of two characters . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.3.2 Properties of the covariance . . . . . . . . . . . . . . . . . . . . . . . 35
2.3.3 Correlation coefficient . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.4 Fittings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.4.1 Fitting of type Y = aX + b . . . . . . . . . . . . . . . . . . . . . . . 38
2.4.2 Fitting of type Y = B × A^X . . . . . . . . . . . . . . . . . . . . . . . 42
2.4.3 Fitting of type Y = B × X^a . . . . . . . . . . . . . . . . . . . . . . . 44

Part II: Probability 47

3 Introduction to probability calculus 48


3.1 Reminders on combinatorial analysis . . . . . . . . . . . . . . . . . . . . . . 48
3.1.1 k−permutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.1.2 Permutation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.1.3 Combination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
3.2 Probability of events . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.2.1 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.2.2 General definition of a probability . . . . . . . . . . . . . . . . . . . . 52
3.2.3 Study of the equiprobability . . . . . . . . . . . . . . . . . . . . . . . 53
3.2.4 Conditional probabilities . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.2.5 Law of total probability and chain rule . . . . . . . . . . . . . . . . . 54

4 Solutions to exercises 57
4.1 Solutions to exercises of chapter 1 . . . . . . . . . . . . . . . . . . . . . . . . 57


Part I

Descriptive Statistics

Chapter 1

Statistical series with one character

1.1 Introduction
1.1.1 Generalities
Descriptive statistics is a collection of methods used to describe, summarize, interpret and
analyse datasets which can be found in a given study. It helps analysts to better understand
the data and to draw conclusions from them. The datasets may be treated using tables,
graphs and numerical characteristics such as the mean, the variance, the quantiles, etc. The
statistical analysis may be univariate or multivariate. Univariate analysis focuses on one
character of the data. The main aspects of interest in this framework are the distribution,
the central tendency and the dispersion. Multivariate analysis, on the other hand, focuses on the
relationship between two or more characters. The main aspects in this framework are the
covariance, the correlation coefficient and the conditional distributions. Another important
topic in descriptive statistics is regression. This notion deals with the possibility of
establishing an equation that links two (or more) variables. Such an equation may be linear,
exponential, polynomial or may have other forms.

1.1.2 Definitions
We will start with some basic definitions of descriptive statistics.

Population
The population is a set of similar items on which the statistical study is based. The number
of elements within a population is called the size of the population.

Sample
A sample is a subset of the population having the same characteristics as it. Samples are
used when the population is too large for all its elements to be observed. A sample should
represent the population as a whole and not reflect any bias toward a specific attribute.

Statistical unit
Each element in the population is called a statistical unit or an individual.

Statistical character
The character is a particular feature of the observations, in which the statistical study is
interested.

Modalities of a character
The modalities of a character are the different situations taken by this character.

We will illustrate these definitions by some examples.


Example 1:
The study of the blood group of 150 students in a university.
In this situation:
- The population is comprised of the 150 students of the university. Each student is a
statistical unit.
- The character is the blood group.
- The modalities of this character are A, B, AB and O.
Example 2:
The study of the number of children in 60 families of a city.
In this situation:
- The population is comprised of the 60 families of the city. Each family is a statistical unit.
- The character is the number of children.
- The modalities of this character are for example: 0, 1, 2, 3, 4 and 5.
Example 3:
The study of the size of 200 students in a university.
For this example:
- The population consists of the 200 students of the university. Each student is a statistical
unit.
- The character is the size.
- The modalities of this character may be any values between 1.50 and 1.90 m. Instead of
using all the 200 sizes of the students, it is preferable to group them into classes such as
[1.50, 1.60[, [1.60, 1.65[, [1.65, 1.75[, [1.75, 1.85[ and [1.85, 1.90[.

1.1.3 Types of characters


There are two types of characters: qualitative characters and quantitative characters.

Qualitative character
They are measures of "types" and may be represented by names or symbols. They are re-
lated to categorical variables. The modalities of a qualitative character are words or symbols.
Qualitative characters may also be represented by number codes.

Quantitative character
They are measures of values or counts and are expressed as numbers. They are related to
numeric variables. Quantitative characters may be discrete or continuous.
- Discrete quantitative character (or discrete statistical variable): It is a variable that takes
on distinct and countable values. The set of values of such a variable is finite or countable
(at most countable). The modalities are distinct numbers.
- Continuous quantitative character (or continuous statistical variable): It is a variable that
takes on an infinite number of possible values within a given range. The set of values of such
a variable is infinite and uncountable. The modalities are intervals called "Class intervals".

For the previous examples, we have


- In example 1: The character (blood type) is qualitative.
- In example 2: The character (number of children) is discrete quantitative.
- In example 3: The character (size) is continuous quantitative.

1.2 Statistical series


Consider a statistical population including n individuals. Assume that we are interested in
a statistical character related to this population, with k modalities M1 , M2 , . . . , Mk .
Definition 1. (Absolute frequency)
The absolute frequency of the modality Mi (1 ≤ i ≤ k) is the number of individuals corre-
sponding to this modality. It is noted ni .
Definition 2. (Relative frequency)
The relative frequency of the modality Mi (1 ≤ i ≤ k) is the proportion of individuals
corresponding to this modality and it is noted fi . So, we have
fi = ni / n, ∀ i ∈ {1, . . . , k}.
Remark 1.
We have
• Σ_{i=1}^{k} ni = n.

• ∀ i ∈ {1, . . . , k}, 0 ≤ fi ≤ 1.

• Σ_{i=1}^{k} fi = 1.

Definition 3. (Statistical series)


The set {(M1 , n1 ), (M2 , n2 ), . . . , (Mk , nk )} is called a statistical series. It is generally repre-
sented by a statistical table. When the character is quantitative, its values (modalities) are
sorted in ascending order.
In the sequel, we will study in detail the different types of characters, using some examples.

1.2.1 Qualitative case
Example 1 (continued):
The study of the blood group of 150 students in a university gave the following results.
Blood group Number of students
A 45
B 25
AB 9
O 71
This statistical series can be represented by the following statistical table.
Modalities   ni    fi
A            45    0.3
B            25    0.167
AB           9     0.06
O            71    0.473
Total        150   1

fi = ni / n = ni / 150, ∀ i ∈ {1, 2, 3, 4}.
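The passage from absolute to relative frequencies can be sketched in a few lines of Python (a minimal illustration using the counts of Example 1; the dictionary layout is my own choice, not the handout's):

```python
# Relative frequencies f_i = n_i / n for the blood-group series of Example 1.
counts = {"A": 45, "B": 25, "AB": 9, "O": 71}   # absolute frequencies n_i

n = sum(counts.values())                         # population size n = 150
rel_freq = {m: ni / n for m, ni in counts.items()}

print(n)                                  # 150
print(rel_freq["A"])                      # 0.3
print(round(sum(rel_freq.values()), 10))  # 1.0 (the f_i always sum to 1)
```

Note that rounding each fi to three decimals, as in the table above, makes the displayed column sum only approximately equal to 1.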

1.2.2 Discrete quantitative case


Example 2 (continued):
The study of the number of children in 60 families of a city gave the following results.
Number of children Number of families
0 5
1 10
2 11
3 18
4 11
5 5
This statistical series can be represented by the following statistical table.
xi ni fi
0 5 0.083
1 10 0.167
2 11 0.183
3 18 0.3
4 11 0.183
5 5 0.083
Total 60 1

The (xi )1≤i≤6 are the values of the studied statistical variable X (the number of children).
fi = ni / n = ni / 60, ∀ i ∈ {1, 2, . . . , 6}.

1.2.3 Continuous quantitative case


Example 3 (continued):
The study of the size (in m) of 200 students in a university gave the following results.
Class intervals Number of students
[1.50, 1.60[ 20
[1.60, 1.65[ 45
[1.65, 1.75[ 85
[1.75, 1.85[ 40
[1.85, 1.90[ 10
In this continuous quantitative case, the statistical table has the following form.
Class           ci      ni    fi
[1.50, 1.60[    1.55    20    0.1
[1.60, 1.65[    1.625   45    0.225
[1.65, 1.75[    1.70    85    0.425
[1.75, 1.85[    1.80    40    0.2
[1.85, 1.90[    1.875   10    0.05
Total                   200   1

The (ei)0≤i≤5 are the limits of the class intervals.
For all i ∈ {1, 2, . . . , 5}, ci = (ei−1 + ei) / 2 is the centre of the ith class [ei−1, ei[ and
fi = ni / n = ni / 200.

1.3 Representation of a statistical series


In this section, we will study some graphical representations of a statistical series for the
different types of characters.

1.3.1 Qualitative case


In this case, a statistical series can be represented by two types of graphics: The bar chart
and the pie chart.

Bar chart
This graphic consists of bars representing the modalities of the character. The height of each
bar is determined by either the absolute frequency or the relative frequency of the respective
modality.

Pie chart
A pie chart is a circle partitioned into segments, where each of the segments represents a
modality. The size of each segment depends upon the relative frequency and is determined
by the angle θi = fi × 360◦ .

We will represent our statistical series of Example 1 (Blood group) using these graphics.

Example 1 (continued):
The bar chart (using the relative frequencies) of this statistical series is as follows.

- To draw the pie chart of this series, we need to calculate the angle θi = fi × 360◦ for all
i ∈ {1, 2, 3, 4}.

Modalities ni fi θi
A 45 0.3 108◦
B 25 0.167 60.12◦
AB 9 0.06 21.6◦
O 71 0.473 170.28◦
Total 150 1 360◦

So, the pie chart is as follows.
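The angles θi can be checked quickly in Python (a small sketch reusing Example 1's counts; the exact angle for B is 60°, and the 60.12° in the table comes from rounding fi to 0.167 before multiplying):

```python
# Pie-chart angles theta_i = f_i * 360 degrees for the blood-group series.
counts = {"A": 45, "B": 25, "AB": 9, "O": 71}
n = sum(counts.values())

angles = {m: ni / n * 360 for m, ni in counts.items()}

print(angles["A"])                        # 108.0
print(round(angles["AB"], 6))             # 21.6
print(round(sum(angles.values()), 6))     # 360.0 (a full circle)
```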

1.3.2 Discrete quantitative case
Let X be a discrete statistical variable taking the values {x1 , x2 , . . . , xk }, with x1 < x2 <
· · · < xk . For all i ∈ {1, 2, . . . , k}, we denote by ni (resp. fi ) the absolute (resp. the relative)
frequency of xi . In this case, the statistical series can be represented by two types of graphics:
The differential diagram and the integral diagram.

Differential diagram (Line graph)


The line graph consists of vertical lines representing the values of X. The height of the line
corresponding to xi is determined by either ni or fi .

Integral diagram (Cumulative frequency curve)


To define this graphic, we need first to define the cumulative absolute frequencies and the
cumulative relative frequencies.

Definition 4. (Cumulative absolute frequency)


For all i ∈ {1, 2, . . . , k}, the cumulative absolute frequency of the ith value xi of X is defined
by

Ni = Σ_{j=1}^{i} nj.

Definition 5. (Cumulative relative frequency)


For all i ∈ {1, 2, . . . , k}, the cumulative relative frequency of the ith value xi of X is defined
by

Fi = Σ_{j=1}^{i} fj = Ni / n.

Definition 6. (The empirical cumulative distribution function)
The empirical cumulative distribution function (ECDF) of X is the function F : R −→ [0, 1]
defined for all x ∈ R by

F(x) = 0 if x < x1;
F(x) = Fi if x ∈ [xi, xi+1[, for 1 ≤ i ≤ k − 1;
F(x) = 1 if x ≥ xk.

The cumulative frequency curve is the graph of the ECDF.


Remark 2.
The ECDF F satisfies the following properties:

• ∀ x ∈ R, 0 ≤ F (x) ≤ 1.

• F is a non-decreasing right continuous function.

• limx→−∞ F (x) = 0 and limx→+∞ F (x) = 1.
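These definitions translate directly into code. A minimal sketch of the ECDF as a right-continuous step function, using the data of Example 2 (the function name `ecdf` and the list layout are my own choices):

```python
import itertools
import bisect

# ECDF of a discrete series {(x_i, n_i)}: Example 2 (number of children).
xs = [0, 1, 2, 3, 4, 5]           # values x_i (sorted in ascending order)
ns = [5, 10, 11, 18, 11, 5]       # absolute frequencies n_i

n = sum(ns)
N = list(itertools.accumulate(ns))    # cumulative absolute frequencies N_i
F = [Ni / n for Ni in N]              # cumulative relative frequencies F_i

def ecdf(x):
    """F(x): proportion of observations less than or equal to x."""
    i = bisect.bisect_right(xs, x)    # number of values x_j <= x
    return 0.0 if i == 0 else F[i - 1]

print(ecdf(-1))            # 0.0
print(round(ecdf(2), 3))   # 0.433  (= 26/60)
print(ecdf(10))            # 1.0
```

The `bisect_right` call makes the function right continuous, in agreement with Remark 2.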

Now, we will represent our statistical series of Example 2 (Number of children) using the
above graphics.
Example 2 (continued):
The line graph (using the relative frequencies) of this statistical series is as follows.

- To draw the cumulative frequency curve of this series, we need to calculate the cumu-
lative relative frequencies (Fi )1≤i≤6 .

xi      ni    fi      Ni    Fi
0       5     0.083   5     0.083
1       10    0.167   15    0.25
2       11    0.183   26    0.433
3       18    0.3     44    0.733
4       11    0.183   55    0.916
5       5     0.083   60    1
Total   60    1
So, the cumulative frequency curve is as follows.

1.3.3 Continuous quantitative case


Let X be a continuous statistical variable. We assume that the class intervals of X are
[e0 , e1 [, [e1 , e2 [, . . . , [ek−1 , ek [. For all i ∈ {1, 2, . . . , k}, we denote by ni (resp. fi ) the
absolute (resp. the relative) frequency of the class [ei−1 , ei [. In this case, the statistical
series can be represented by two types of graphics: The differential diagram and the integral
diagram.

Differential diagram (Histogram)


A histogram consists of bars that correspond to the class intervals. For all i ∈ {1, 2, . . . , k},
the height of the ith bar is hi = fi / di, where di = ei − ei−1 denotes the magnitude of the ith
class [ei−1, ei[. An important consequence of this definition is that the area of each bar is
proportional to the corresponding relative frequency.
Remark 3.
If all the classes have the same magnitude, we can take hi = fi for all i ∈ {1, 2, . . . , k}.
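The height rule hi = fi/di matters precisely when the classes have unequal magnitudes, as in Example 3. A small sketch (my own loop structure; the limits and counts are the handout's):

```python
# Histogram heights h_i = f_i / d_i for classes of unequal magnitude.
# With this choice, bar i has area h_i * d_i = f_i, so the areas sum to 1.
edges = [1.50, 1.60, 1.65, 1.75, 1.85, 1.90]   # class limits e_0, ..., e_k
ns = [20, 45, 85, 40, 10]                      # absolute frequencies n_i

n = sum(ns)
heights = []
for i in range(len(ns)):
    d = edges[i + 1] - edges[i]    # class magnitude d_i
    f = ns[i] / n                  # relative frequency f_i
    heights.append(f / d)

print([round(h, 2) for h in heights])   # [1.0, 4.5, 4.25, 2.0, 1.0]

total_area = sum(h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights))
print(round(total_area, 10))            # 1.0
```

Taking hi = fi instead (as Remark 3 allows for equal widths) would make the narrow classes [1.60, 1.65[ and [1.85, 1.90[ look misleadingly small here.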

Integral diagram (Cumulative frequency curve)


The cumulative absolute and relative frequencies can be defined in the same way as in the
discrete case. The cumulative frequency curve is the graph of the ECDF, which is defined
in the continuous case as follows.

Definition 7.
The empirical cumulative distribution function (ECDF) of the continuous variable X is the
function F : R −→ [0, 1] defined for all x ∈ R by


F(x) = 0 if x < e0;
F(x) = Fi−1 + (fi / (ei − ei−1)) (x − ei−1) if x ∈ [ei−1, ei[, for 1 ≤ i ≤ k;
F(x) = 1 if x ≥ ek,

with F0 = 0.

F is a piecewise linear function and it satisfies the same properties as the ECDF in the
discrete case, except that it is continuous on R (and not only right continuous).
Now, we will represent our statistical series of Example 3 (Size of students) using the above
graphics.
Example 3 (continued):
To draw the graphical representations of this statistical series, we need to calculate the
magnitudes (di )1≤i≤5 and the cumulative relative frequencies (Fi )1≤i≤5 .
Class           ci      ni    fi      Ni     Fi      di     hi = fi/di
[1.50, 1.60[    1.55    20    0.1     20     0.1     0.10   1
[1.60, 1.65[    1.625   45    0.225   65     0.325   0.05   4.5
[1.65, 1.75[    1.70    85    0.425   150    0.75    0.10   4.25
[1.75, 1.85[    1.80    40    0.2     190    0.95    0.10   2
[1.85, 1.90[    1.875   10    0.05    200    1       0.05   1
Total                   200   1

The histogram of our statistical series is as follows.

We can also add the frequency polygon by joining the midpoints of the tops of the rectangles.
We also plot the previous and next points on the x−axis to start and end the polygon. These
two points correspond respectively to e0 − d1/2 and ek + dk/2.

Moreover, the cumulative frequency curve is as follows.

1.4 Parameters of a statistical series
In this section, we will study some parameters that measure the central tendency and the
dispersion of a statistical series with a quantitative character. We will deal with the discrete
and the continuous cases separately.

1.4.1 Discrete case


Let {(x1 , n1 ), (x2 , n2 ), . . . , (xk , nk )} (with x1 < x2 < · · · < xk ) be a statistical series corre-
sponding to a discrete statistical variable X.

Central tendency parameters


Central tendency parameters (or location parameters) are statistical parameters that describe
the average or centre of the data such as the arithmetic mean, the mode and the quantiles.

Arithmetic mean
The arithmetic mean of X is defined by
X̄ = (1/n) Σ_{i=1}^{k} ni xi = Σ_{i=1}^{k} fi xi.

Remark 4.

If we use a transformation Y = aX + b with a, b ∈ R, then Ȳ = aX̄ + b.
Indeed, we have yi = a xi + b for all i ∈ {1, . . . , k}, then

Ȳ = (1/n) Σ_{i=1}^{k} ni yi
  = (1/n) Σ_{i=1}^{k} ni (a xi + b)
  = a (1/n) Σ_{i=1}^{k} ni xi + b (1/n) Σ_{i=1}^{k} ni
  = a X̄ + b × (n/n)
  = a X̄ + b.

Mode
The mode of X, denoted by M , is the value(s) having the largest absolute frequency. The
mode may not be unique.

Quantiles
Let p ∈ [0, 1], the quantile of order p (or pth quantile) of X is the value xp of X which divides
the dataset in two parts such that p−proportion of the data are less than or equal to xp and
(1 − p)−proportion of the data are greater than xp . In other words

xp = inf {x ∈ R/F (x) ≥ p},

where F is the ECDF of X.


-Particular cases:
For p = 0.5, x0.5 is called the median of X, denoted by M ed.
For p = 0.25, x0.25 is called the first quartile of X, denoted by Q1 .
For p = 0.75, x0.75 is called the third quartile of X, denoted by Q3 .
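The definition xp = inf{x : F(x) ≥ p} can be implemented as a simple scan over the cumulative relative frequencies. A sketch on the data of Example 2 (function name and list layout are my own):

```python
import itertools

# Quantile x_p = inf{x : F(x) >= p} for the discrete series of Example 2.
xs = [0, 1, 2, 3, 4, 5]
ns = [5, 10, 11, 18, 11, 5]

n = sum(ns)
F = [Ni / n for Ni in itertools.accumulate(ns)]

def quantile(p):
    """Smallest value of X whose cumulative relative frequency reaches p."""
    for x, Fi in zip(xs, F):
        if Fi >= p:
            return x
    return xs[-1]

print(quantile(0.25))   # 1  (first quartile; F(1) = 15/60 = 0.25 exactly)
print(quantile(0.5))    # 3  (median)
print(quantile(0.75))   # 4  (third quartile)
```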

Dispersion parameters
Dispersion parameters are statistical parameters that describe the dispersion of the obser-
vations around any particular value.

Variance and standard deviation

The variance of X is defined by
Var(X) = (1/n) Σ_{i=1}^{k} ni (xi − X̄)² = Σ_{i=1}^{k} fi (xi − X̄)².

It measures how far the data are spread out from their mean.
The variance is always non-negative, and the standard deviation of X is defined by

σX = √Var(X).

The standard deviation has the same unit of measurement as the data whereas the unit of
the variance is the square of the units of the observations.
Remark 5.

i) We have

Var(X) = (1/n) Σ_{i=1}^{k} ni xi² − X̄² = Σ_{i=1}^{k} fi xi² − X̄².

Indeed,

Var(X) = (1/n) Σ_{i=1}^{k} ni (xi − X̄)²
       = (1/n) Σ_{i=1}^{k} ni (xi² − 2 xi X̄ + X̄²)
       = (1/n) (Σ_{i=1}^{k} ni xi² − 2 X̄ Σ_{i=1}^{k} ni xi + X̄² Σ_{i=1}^{k} ni)
       = (1/n) Σ_{i=1}^{k} ni xi² − 2 X̄² + X̄²
       = (1/n) Σ_{i=1}^{k} ni xi² − X̄².

ii) If Y = aX + b with a, b ∈ R, then Var(Y) = a² Var(X) and σY = |a| σX.

Indeed,

Var(Y) = (1/n) Σ_{i=1}^{k} ni (yi − Ȳ)²
       = (1/n) Σ_{i=1}^{k} ni (a xi + b − a X̄ − b)²
       = (1/n) Σ_{i=1}^{k} ni a² (xi − X̄)²
       = a² × (1/n) Σ_{i=1}^{k} ni (xi − X̄)²
       = a² Var(X)

and

σY = √Var(Y) = √(a² Var(X)) = |a| σX.
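This property of the variance under affine transformations can be verified numerically. A small sketch on Example 2's data, with arbitrarily chosen a and b (the helper functions `mean` and `var` are my own):

```python
# Numerical check: mean(aX + b) = a*mean(X) + b and Var(aX + b) = a^2 Var(X).
xs = [0, 1, 2, 3, 4, 5]
ns = [5, 10, 11, 18, 11, 5]
n = sum(ns)

def mean(vals):
    return sum(ni * v for ni, v in zip(ns, vals)) / n

def var(vals):
    m = mean(vals)
    return sum(ni * (v - m) ** 2 for ni, v in zip(ns, vals)) / n

a, b = 3, 7                       # an arbitrary affine transformation
ys = [a * x + b for x in xs]      # y_i = a x_i + b

print(abs(mean(ys) - (a * mean(xs) + b)) < 1e-9)   # True
print(abs(var(ys) - a ** 2 * var(xs)) < 1e-9)      # True
```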

Range
The range of X is defined as the difference between the maximum and the minimum values of X:

R = max_{1≤i≤k} xi − min_{1≤i≤k} xi.

Interquartile range
The interquartile range of X is defined as the difference between the third and the first
quartiles of X:

IQ = Q3 − Q1.
Now, we will calculate the central tendency and the dispersion parameters of the statistical
series of Example 2 (Number of children).
Example 2 (continued):
To calculate the parameters of this statistical series, we need to add the following columns
in the statistical table.

xi       ni    fi      Ni    Fi      ni xi    ni xi²
0        5     0.083   5     0.083   0        0
1        10    0.167   15    0.25    10       10
2        11    0.183   26    0.433   22       44
3        18    0.3     44    0.733   54       162
4        11    0.183   55    0.916   44       176
5        5     0.083   60    1       25       125
Total    60    1                     155      517
Total/n                              2.583    8.617

- The central tendency parameters are:


• The arithmetic mean:

X̄ = (1/n) Σ_{i=1}^{6} ni xi = 155/60 = 2.583.

• The mode: M = 3, because the largest absolute frequency is n4 = 18.
• The median: we have F(2) = 0.433 < 0.5 ≤ 0.733 = F(3), then Med = 3.
• The quartiles:
We remark that F(1) = 0.25 exactly, so we take the smallest value x with F(x) ≥ 0.25, namely Q1 = 1.
We have F(3) = 0.733 < 0.75 ≤ 0.916 = F(4), then Q3 = 4.
- Furthermore, the dispersion parameters are:
• The variance:

Var(X) = (1/n) Σ_{i=1}^{6} ni xi² − X̄² = 517/60 − (2.583)² = 8.617 − (2.583)² = 1.945.

• The standard deviation: σX = √Var(X) = √1.945 = 1.395.
• The range:

R = max_{1≤i≤6} xi − min_{1≤i≤6} xi = 5 − 0 = 5.

• The interquartile range


IQ = Q3 − Q1 = 4 − 1 = 3.
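The computations of this example can be reproduced exactly in a few lines of Python (a sketch with my own variable names; note that the exact variance is 517/60 − (155/60)² ≈ 1.943, the 1.945 above coming from rounding X̄ to 2.583 before squaring):

```python
import math

# Reproducing the parameters of Example 2 (number of children) exactly.
xs = [0, 1, 2, 3, 4, 5]
ns = [5, 10, 11, 18, 11, 5]
n = sum(ns)                                              # 60

mean = sum(ni * xi for ni, xi in zip(ns, xs)) / n        # 155/60
var = sum(ni * xi ** 2 for ni, xi in zip(ns, xs)) / n - mean ** 2
std = math.sqrt(var)
mode = xs[ns.index(max(ns))]                             # value with largest n_i
rng = max(xs) - min(xs)

print(round(mean, 3))   # 2.583
print(round(var, 3))    # 1.943 (exact; the handout's 1.945 uses the rounded mean)
print(mode, rng)        # 3 5
```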

1.4.2 Continuous case


Let {([e0 , e1 [, n1 ), ([e1 , e2 [, n2 ), . . . , ([ek−1 , ek [, nk )} be a statistical series corresponding to a
continuous statistical variable X.

Central tendency parameters
Arithmetic mean
The arithmetic mean of X is defined by
X̄ = (1/n) Σ_{i=1}^{k} ni ci = Σ_{i=1}^{k} fi ci.

Modal class
The modal class of X, denoted by M , is the class(es) that correspond(s) to the largest ni /di
(or fi /di ). It may not be unique.

Quantiles
The quantiles are defined in the same way as in the discrete case. To calculate them, we
use the method of linear interpolation. For example, to calculate the median, we determine
i such that Fi−1 ≤ 0.5 < Fi which means that M ed ∈ [ei−1 , ei [, then we apply the formula
(Med − ei−1) / (ei − ei−1) = (0.5 − Fi−1) / (Fi − Fi−1)
⟹ Med = ei−1 + (ei − ei−1) (0.5 − Fi−1) / (Fi − Fi−1).

For any p ∈ [0, 1], we apply the same method to calculate the quantile xp, using the appropriate proportion p.
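The linear-interpolation method can be written as a short function. A sketch on the data of Example 3 (the function name `quantile` is my own; the sentinel F_0 = 0 matches Definition 7):

```python
import itertools

# Quantile of order p by linear interpolation over the class intervals
# (Example 3: sizes of 200 students).
edges = [1.50, 1.60, 1.65, 1.75, 1.85, 1.90]
ns = [20, 45, 85, 40, 10]
n = sum(ns)
F = [0.0] + [Ni / n for Ni in itertools.accumulate(ns)]   # F_0 = 0, ..., F_k = 1

def quantile(p):
    """x_p such that the piecewise-linear ECDF equals p."""
    for i in range(1, len(F)):
        if F[i] >= p:               # x_p lies in the class [e_{i-1}, e_i[
            return edges[i - 1] + (edges[i] - edges[i - 1]) * (p - F[i - 1]) / (F[i] - F[i - 1])
    return edges[-1]

print(round(quantile(0.5), 3))    # 1.691  (median)
print(round(quantile(0.25), 3))   # 1.633  (first quartile)
print(round(quantile(0.75), 3))   # 1.75   (third quartile)
```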

Dispersion parameters
Variance and standard deviation
The variance of X is defined by

Var(X) = (1/n) Σ_{i=1}^{k} ni (ci − X̄)² = (1/n) Σ_{i=1}^{k} ni ci² − X̄²

and the standard deviation of X is defined by

σX = √Var(X).
Range
The range of X is defined by
R = ek − e0 .
Interquartile range
The interquartile range of X is defined by
IQ = Q3 − Q1 .
Now, we will calculate the central tendency and the dispersion parameters of the statistical
series of Example 3 (Size of students).
Example 3 (continued):
To calculate the parameters of this statistical series, we need to add the following columns
in the statistical table.

Class           ci      ni    fi      Ni     Fi      di     fi/di   ni ci      ni ci²
[1.50, 1.60[    1.55    20    0.1     20     0.1     0.10   1       31         48.05
[1.60, 1.65[    1.625   45    0.225   65     0.325   0.05   4.5     73.125     118.828
[1.65, 1.75[    1.70    85    0.425   150    0.75    0.10   4.25    144.5      245.65
[1.75, 1.85[    1.80    40    0.2     190    0.95    0.10   2       72         129.6
[1.85, 1.90[    1.875   10    0.05    200    1       0.05   1       18.75      35.156
Total                   200   1                                     339.375    577.284
Total/n                                                             1.697      2.886
- The central tendency parameters are:
• The arithmetic mean:

X̄ = (1/n) Σ_{i=1}^{5} ni ci = 339.375/200 = 1.697.

• The modal class is M = [1.60, 1.65[, because the largest fi/di is 4.5.
• The median:
We have 0.325 < 0.5 < 0.75, then Med ∈ [1.65, 1.75[ and the method of linear interpolation gives

(Med − 1.65) / (1.75 − 1.65) = (0.5 − 0.325) / (0.75 − 0.325)
⟹ (Med − 1.65) / 0.1 = 0.175 / 0.425 = 0.412
⟹ Med = 0.412 × 0.1 + 1.65 = 1.691.

• The quartiles:
We have 0.1 < 0.25 < 0.325, then Q1 ∈ [1.60, 1.65[ and the method of linear interpolation gives

(Q1 − 1.60) / (1.65 − 1.60) = (0.25 − 0.1) / (0.325 − 0.1)
⟹ (Q1 − 1.60) / 0.05 = 0.15 / 0.225 = 0.667
⟹ Q1 = 0.667 × 0.05 + 1.60 = 1.633.

We remark that F(1.75) = 0.75 exactly, so Q3 = 1.75.
- Furthermore, the dispersion parameters are:
• The variance:

Var(X) = (1/n) Σ_{i=1}^{5} ni ci² − X̄² = 577.284/200 − (1.697)² = 2.886 − (1.697)² = 0.006.

• The standard deviation: σX = √Var(X) = √0.006 = 0.077.
• The range: R = ek − e0 = 1.90 − 1.50 = 0.40.
• The interquartile range: IQ = Q3 − Q1 = 1.75 − 1.633 = 0.117.
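The mean and variance of this example can be recomputed from the class centres; a sketch (my own variable names; working with unrounded totals gives a variance of about 0.0070, the 0.006 above resulting from the rounded intermediate values 2.886 and 1.697):

```python
import math

# Reproducing Example 3's mean and variance from the class centres c_i.
edges = [1.50, 1.60, 1.65, 1.75, 1.85, 1.90]
ns = [20, 45, 85, 40, 10]
n = sum(ns)
cs = [(edges[i] + edges[i + 1]) / 2 for i in range(len(ns))]   # class centres

mean = sum(ni * ci for ni, ci in zip(ns, cs)) / n     # 339.375 / 200
var = sum(ni * ci ** 2 for ni, ci in zip(ns, cs)) / n - mean ** 2
std = math.sqrt(var)
rng = edges[-1] - edges[0]

print(round(mean, 3))   # 1.697
print(round(var, 4))    # 0.007  (unrounded; the handout's 0.006 uses rounded totals)
print(round(rng, 2))    # 0.4
```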

1.5 Exercises
Exercise 1:
In each case, determine the population, the statistical unit, the studied character and its
type.
1. A teacher recorded the scores of the test of mathematics obtained by the pupils of a
class.
2. A survey on the marital status has been conducted among the employees of a company.
3. The study of the weight of the students of the Preparatory class department.
4. The study of the maximum temperature on a specific day in the 58 wilayas of Algeria.
5. A survey conducted among the employees of a company dealt with the means of trans-
port used to get to work.
6. The study of the number of mobiles in each house of a neighbourhood.
7. The study of the monthly salary of the employees of a company.
Exercise 2:
A survey on the hobbies of 80 inhabitants of a city gave the following results.
Hobbies Number of inhabitants
Reading 20
Sport 24
Cinema 20
Theatre 16

1. Determine the population, the studied character, its type and its modalities.
2. Draw up the statistical table with absolute and relative frequencies.
3. Draw the appropriate graphical representations.

Exercise 3:
A survey conducted among 120 employees of a company dealt with the means of transport
used to get to work. The results of this survey are given in the following table.

Means of transport Number of employees
Private car 18
Taxi 24
Bus 30
Tramway 42
Motorcycle 6

1. Determine the population, the studied character, its type and its modalities.

2. Draw up the statistical table with absolute and relative frequencies.

3. Draw the appropriate graphical representations.

Exercise 4:
A study on the number of milk litres bought each week by 100 consumers gives the following
results.
Number of milk litres bought 0 1 2 3 4 5
Number of consumers 5 20 35 25 10 5

1. Determine the population, the studied character, its type and its modalities.

2. Calculate the central tendency parameters.

3. Calculate the dispersion parameters.

4. Draw the appropriate graphical representations.

Exercise 5:
The shoe sizes of the pupils of a school have been recorded in the following table.
Shoe size 36 37 38 39 40 41 42
Number of pupils 8 20 32 32 30 24 14

1. Determine the population, the studied character, its type and its modalities.

2. Calculate the central tendency parameters.

3. Calculate the dispersion parameters.

4. Draw the appropriate graphical representations.

Exercise 6:
A farmer recorded the mass of the eggs laid on a specific day. The masses are given in the
following table.
Mass (in gram) [38, 47[ [47, 52[ [52, 57[ [57, 62[ [62, 72[ [72, 82[
Number of eggs 51 74 112 92 62 9

1. Determine the population, the studied character, its type and its modalities.

2. Calculate the central tendency parameters.

3. Calculate the dispersion parameters.

4. Draw the appropriate graphical representations.

Exercise 7:
The areas of 100 housings are recorded in the following table.
Area (in m2 ) [30, 40[ [40, 60[ [60, 80[ [80, 100[ [100, 140[ [140, 200[
Number of housings 13 20 22 19 21 5

1. Determine the population, the studied character, its type and its modalities.

2. Calculate the central tendency parameters.

3. Calculate the dispersion parameters.

4. Draw the appropriate graphical representations.

Exercise 8:
The sizes X of 100 students are recorded in the following table.

Size (in cm) [150, 160[ [160, 165[ [165, 170[ [170, 175[ [175, 180[ [180, 190[
Number of students 8 24 42 14 10 2

1. Calculate X and σX .

2. Calculate the percentage of individuals belonging to the interval [X − σX , X + σX ].

Chapter 2

Statistical series with two characters

2.1 Introduction
In the previous chapter, we have studied the distribution of a statistical variable and we have
seen how to describe it using numerical and graphical tools. However, in many situations we
may be interested in the relation between two (or more) statistical variables. In particular,
we need to know whether the value taken by a variable affects the other one, i.e. whether
there is a correlation between the two variables. We may also be interested in fitting one
variable with respect to the other using a mathematical equation. This allows us to predict
the value of the fitted variable knowing the value of the other one.

2.2 Distributions and characteristics


To present the joint distribution and the marginal distributions of two statistical variables,
we will consider the case of two discrete statistical variables. If one of the variables is
continuous (or the two of them), we replace the numeric values of the variable by the class
intervals.
Consider a statistical population including n individuals. Assume that we are interested
in two discrete statistical variables X and Y , related to this population. We denote by
x1 < x2 < · · · < xk (resp. y1 < y2 < · · · < yl ) the values of the variable X (resp. Y ).
For any i ∈ {1, . . . , k} and j ∈ {1, . . . , l}, we define the absolute frequency nij as the number
of individuals for which X takes the value xi and Y takes the value yj .
We also define the relative frequency fij as fij = nij / n.
We have Σ_{i=1}^{k} Σ_{j=1}^{l} nij = n and Σ_{i=1}^{k} Σ_{j=1}^{l} fij = 1.
We can represent the statistical series of the couple (X, Y) by a double entry table called a
contingency table. It has the following form.

X \ Y   y1    y2    ...    yj    ...    yl    Total
x1 n11 n12 ... n1j ... n1l
x2 n21 n22 ... n2j ... n2l
.. .. .. .. .. .. ..
. . . . . . .
xi ni1 ni2 ... nij ... nil
.. .. .. .. .. .. ..
. . . . . . .
xk nk1 nk2 ... nkj ... nkl
Total
Example 1:
A study on the number of pupils and the number of teachers in 200 secondary schools gave
the following results, where X represents the number of pupils and Y represents the number
of teachers.
X \ Y   20    22    25    27    29    31    32    Total
400 14 10 6 8 5 2 3 48
450 4 14 5 3 4 3 1 34
500 0 3 8 18 10 1 2 42
550 2 4 1 16 20 5 5 53
600 1 2 1 3 2 4 10 23
Total 21 33 21 48 41 15 21 200
- The values of X are {400, 450, 500, 550, 600} and the values of Y are {20, 22, 25, 27, 29, 31, 32}.
- For example, we have:
n34 = 18: there are 18 schools with 500 pupils (X = x3 = 500) and 27 teachers (Y = y4 = 27),
and the corresponding relative frequency is f34 = n34 / n = 18/200 = 0.09.
n42 = 4: there are 4 schools with 550 pupils (X = x4 = 550) and 22 teachers (Y = y2 = 22),
and the corresponding relative frequency is f42 = n42 / n = 4/200 = 0.02.
- Summing all the absolute frequencies, whether row by row or column by column, gives the
same result: the sample size n = 200.

2.2.1 Marginal distributions


Marginal distributions are the statistical distributions of each variable (X or Y ) alone.
Definition 8. (Marginal absolute frequencies of X)
The ith marginal absolute frequency of X is the number of individuals for which X = xi
regardless of the value of Y . It is given by
ni. = ∑_{j=1}^{l} nij

Definition 9. (Marginal relative frequencies of X)
The ith marginal relative frequency of X is the proportion of individuals for which X = xi
regardless of the value of Y . It is given by
fi. = ∑_{j=1}^{l} fij = ni. / n.

We define in the same way the marginal absolute and relative frequencies of Y which are
given by
n.j = ∑_{i=1}^{k} nij and f.j = ∑_{i=1}^{k} fij = n.j / n.
We have
∑_{i=1}^{k} ni. = ∑_{j=1}^{l} n.j = n and ∑_{i=1}^{k} fi. = ∑_{j=1}^{l} f.j = 1.

To calculate ni. (resp. n.j ) from the contingency table, we sum the nij over the ith row (resp.
the jth column). In the previous example, we have n1. = 48, n2. = 34, n3. = 42, n4. = 53
and n5. = 23. Moreover, n.1 = 21, n.2 = 33, n.3 = 21, n.4 = 48, n.5 = 41, n.6 = 15 and
n.7 = 21.
Remark 6.
If one of the variables X and Y (or the two of them) is qualitative, we define the marginal
distributions in the same way.
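As an illustrative sketch (our own code, not part of the handout), the marginal frequencies of Example 1 can be computed directly from the contingency table:

```python
# Contingency table of Example 1: rows are X = 400, 450, 500, 550, 600 pupils,
# columns are Y = 20, 22, 25, 27, 29, 31, 32 teachers.
N = [
    [14, 10,  6,  8,  5,  2,  3],
    [ 4, 14,  5,  3,  4,  3,  1],
    [ 0,  3,  8, 18, 10,  1,  2],
    [ 2,  4,  1, 16, 20,  5,  5],
    [ 1,  2,  1,  3,  2,  4, 10],
]

# Marginal absolute frequencies: ni. sums the ith row, n.j sums the jth column.
n_i_dot = [sum(row) for row in N]
n_dot_j = [sum(col) for col in zip(*N)]
n = sum(n_i_dot)  # sample size, also equal to sum(n_dot_j)

# Marginal relative frequencies fi. = ni./n and f.j = n.j/n.
f_i_dot = [ni / n for ni in n_i_dot]
f_dot_j = [nj / n for nj in n_dot_j]
```

Running this reproduces n1. = 48, ..., n5. = 23 and n.1 = 21, ..., n.7 = 21, with n = 200.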

2.2.2 Marginal characteristics


Marginal mean
The marginal mean of X is defined by
X̄ = (1/n) ∑_{i=1}^{k} ni. xi

Similarly, the marginal mean of Y is defined by


Ȳ = (1/n) ∑_{j=1}^{l} n.j yj

Marginal variance
The marginal variance of X is defined by
Var(X) = (1/n) ∑_{i=1}^{k} ni. (xi − X̄)² = (1/n) ∑_{i=1}^{k} ni. xi² − X̄²

and the marginal standard deviation of X is σX = √Var(X).
Similarly, the marginal variance and standard deviation of Y are defined by
Var(Y) = (1/n) ∑_{j=1}^{l} n.j (yj − Ȳ)² = (1/n) ∑_{j=1}^{l} n.j yj² − Ȳ²
and σY = √Var(Y).
Remark 7.
If one of the variables X and Y (or the two of them) is continuous, we replace the values xi
and/or yj by the centres of the class intervals.
Example 1 (continued):
We will calculate the marginal means and variances in the example of secondary schools.

X \ Y   |   20    22    25    27    29    31    32  |  ni.   ni.xi     ni.xi²
400     |   14    10     6     8     5     2     3  |   48   19200    7680000
450     |    4    14     5     3     4     3     1  |   34   15300    6885000
500     |    0     3     8    18    10     1     2  |   42   21000   10500000
550     |    2     4     1    16    20     5     5  |   53   29150   16032500
600     |    1     2     1     3     2     4    10  |   23   13800    8280000
n.j     |   21    33    21    48    41    15    21  |  200   98450   49377500
n.j yj  |  420   726   525  1296  1189   465   672  | 5293
n.j yj² | 8400 15972 13125 34992 34481 14415 21504  | 142889
X̄ = (1/n) ∑_{i=1}^{5} ni. xi = 98450/200 = 492.25
Var(X) = (1/n) ∑_{i=1}^{5} ni. xi² − X̄² = 49377500/200 − (492.25)² = 4577.437
and σX = √Var(X) = √4577.437 = 67.657.
Moreover,
Ȳ = (1/n) ∑_{j=1}^{7} n.j yj = 5293/200 = 26.465
Var(Y) = (1/n) ∑_{j=1}^{7} n.j yj² − Ȳ² = 142889/200 − (26.465)² = 14.049
and σY = √Var(Y) = 3.748.
We can also present the marginal distributions of X and Y as follows.

xi ni. fi. ni. xi ni. x2i
400 48 0.24 19200 7680000
450 34 0.17 15300 6885000
500 42 0.21 21000 10500000
550 53 0.265 29150 16032500
600 23 0.115 13800 8280000
Total 200 1 98450 49377500
Total/n 492.25 246887.5

yj n.j f.j n.j yj n.j yj2


20 21 0.105 420 8400
22 33 0.165 726 15972
25 21 0.105 525 13125
27 48 0.24 1296 34992
29 41 0.205 1189 34481
31 15 0.075 465 14415
32 21 0.105 672 21504
Total 200 1 5293 142889
Total/n 26.465 714.445
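The marginal characteristics above can be checked with a short script (a sketch with our own variable names, using the marginal frequencies of Example 1):

```python
from math import sqrt

# Marginal characteristics of Example 1, computed from the marginal
# absolute frequencies.
x = [400, 450, 500, 550, 600]
y = [20, 22, 25, 27, 29, 31, 32]
n_i_dot = [48, 34, 42, 53, 23]          # ni.
n_dot_j = [21, 33, 21, 48, 41, 15, 21]  # n.j
n = 200

def marginal_stats(values, freqs, n):
    """Return (mean, variance, standard deviation) of a marginal distribution."""
    mean = sum(f * v for f, v in zip(freqs, values)) / n
    var = sum(f * v ** 2 for f, v in zip(freqs, values)) / n - mean ** 2
    return mean, var, sqrt(var)

x_bar, var_x, sigma_x = marginal_stats(x, n_i_dot, n)  # 492.25, 4577.44, 67.66
y_bar, var_y, sigma_y = marginal_stats(y, n_dot_j, n)  # 26.465, 14.049, 3.748
```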

2.2.3 Conditional distribution


The conditional distribution of X is the distribution of X that corresponds to a fixed value
yj of Y .

Definition 10.
The ith conditional relative frequency of X given Y = yj is the proportion of individuals for
which X = xi in the sub-population constituted of individuals for which Y = yj . It is given
by
fi/Y=yj = nij / n.j .

We have ∑_{i=1}^{k} fi/Y=yj = 1.
We define in the same way the conditional distribution of Y given X = xi and the conditional
relative frequency
fj/X=xi = nij / ni.
and we have ∑_{j=1}^{l} fj/X=xi = 1.
Example 1 (continued):
- Determine the conditional distribution of the number of pupils of the schools having 25
teachers.
We have to determine the conditional distribution of X given Y = y3 = 25. This conditional
distribution is as follows.

xi ni3 fi/Y =y3
400 6 0.286
450 5 0.238
500 8 0.381
550 1 0.048
600 1 0.048
Total 21 1
fi/Y=y3 = ni3 / n.3 = ni3 / 21.
- Determine the conditional distribution of the number of teachers of the schools having 450
pupils.
We have to determine the conditional distribution of Y given X = x2 = 450. This conditional
distribution is as follows.
yj n2j fj/X=x2
20 4 0.118
22 14 0.412
25 5 0.147
27 3 0.088
29 4 0.118
31 3 0.088
32 1 0.029
Total 34 1
fj/X=x2 = n2j / n2. = n2j / 34.
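Conditional distributions like the two above can be extracted from the contingency table mechanically; here is a sketch (the helper names are ours):

```python
# Conditional relative frequencies from the contingency table of Example 1.
N = [
    [14, 10,  6,  8,  5,  2,  3],
    [ 4, 14,  5,  3,  4,  3,  1],
    [ 0,  3,  8, 18, 10,  1,  2],
    [ 2,  4,  1, 16, 20,  5,  5],
    [ 1,  2,  1,  3,  2,  4, 10],
]

def cond_X_given_Y(N, j):
    """f_{i/Y=yj} = n_ij / n_.j for every row i."""
    column = [row[j] for row in N]
    n_dot_j = sum(column)
    return [nij / n_dot_j for nij in column]

def cond_Y_given_X(N, i):
    """f_{j/X=xi} = n_ij / n_i. for every column j."""
    n_i_dot = sum(N[i])
    return [nij / n_i_dot for nij in N[i]]
```

Calling `cond_X_given_Y(N, 2)` reproduces the first table (Y = y3 = 25) and `cond_Y_given_X(N, 1)` the second (X = x2 = 450); each returned list sums to 1.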

2.2.4 Conditional Characteristics


Conditional mean of X given Y = yj
The conditional mean of X given Y = yj is defined by
X̄/Y=yj = (1/n.j) ∑_{i=1}^{k} nij xi

The conditional mean of Y given X = xi is defined in the same way


Ȳ/X=xi = (1/ni.) ∑_{j=1}^{l} nij yj

Conditional variance of X given Y = yj
The conditional variance of X given Y = yj is defined by
Var(X/Y = yj) = (1/n.j) ∑_{i=1}^{k} nij (xi − X̄/Y=yj)² = (1/n.j) ∑_{i=1}^{k} nij xi² − (X̄/Y=yj)²

and the conditional standard deviation of X given Y = yj is defined by σ(X/Y = yj) = √Var(X/Y = yj).
The conditional variance and standard deviation of Y given X = xi are defined in the same
way:
Var(Y /X = xi) = (1/ni.) ∑_{j=1}^{l} nij (yj − Ȳ/X=xi)² = (1/ni.) ∑_{j=1}^{l} nij yj² − (Ȳ/X=xi)²
and σ(Y /X = xi) = √Var(Y /X = xi).
Example 1 (continued):
- Determine the conditional mean and standard deviation of the number of pupils of the
schools having 25 teachers.
We have to calculate X̄/Y=y3 and σ(X/Y = y3).

xi ni3 fi/Y =y3 ni3 xi ni3 x2i


400 6 0.286 2400 960000
450 5 0.238 2250 1012500
500 8 0.381 4000 2000000
550 1 0.048 550 302500
600 1 0.048 600 360000
Total 21 1 9800 4635000
X̄/Y=y3 = (1/n.3) ∑_{i=1}^{5} ni3 xi = 9800/21 = 466.667
Var(X/Y = y3) = (1/n.3) ∑_{i=1}^{5} ni3 xi² − (X̄/Y=y3)² = 4635000/21 − (466.667)² = 2936.197
and σ(X/Y = y3) = √Var(X/Y = y3) = 54.187.
- Determine the conditional mean and standard deviation of the number of teachers of the
schools having 450 pupils.
We have to calculate Ȳ/X=x2 and σ(Y /X = x2).

yj n2j fj/X=x2 n2j yj n2j yj2
20 4 0.118 80 1600
22 14 0.412 308 6776
25 5 0.147 125 3125
27 3 0.088 81 2187
29 4 0.118 116 3364
31 3 0.088 93 2883
32 1 0.029 32 1024
Total 34 1 835 20959
Ȳ/X=x2 = (1/n2.) ∑_{j=1}^{7} n2j yj = 835/34 = 24.559
Var(Y /X = x2) = (1/n2.) ∑_{j=1}^{7} n2j yj² − (Ȳ/X=x2)² = 20959/34 − (24.559)² = 13.297
and σ(Y /X = x2) = √Var(Y /X = x2) = 3.647.
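The conditional mean and standard deviation of Y given X = x2 can be sketched in a few lines (our own variable names; the row n2j comes from the contingency table):

```python
from math import sqrt

# Conditional mean and standard deviation of Y given X = x2 = 450 (Example 1).
y = [20, 22, 25, 27, 29, 31, 32]
n_2j = [4, 14, 5, 3, 4, 3, 1]  # second row of the contingency table
n_2_dot = sum(n_2j)            # 34

mean = sum(nij * yj for nij, yj in zip(n_2j, y)) / n_2_dot
var = sum(nij * yj ** 2 for nij, yj in zip(n_2j, y)) / n_2_dot - mean ** 2
sigma = sqrt(var)
# mean ≈ 24.559, sigma ≈ 3.648
```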
- Determine the conditional distribution as well as the conditional mean and standard devi-
ation of the number of teachers of the schools having at most 500 pupils.
We have to determine the conditional distribution of Y given X ≤ x3 (x3 = 500).

X \ Y           |  20  22  25  27  29  31  32
400             |  14  10   6   8   5   2   3
450             |   4  14   5   3   4   3   1
500             |   0   3   8  18  10   1   2
Total (nj/X≤x3) |  18  27  19  29  19   6   6

yj nj/X≤x3 fj/X≤x3 nj/X≤x3 yj nj/X≤x3 yj2


20 18 0.145 360 7200
22 27 0.218 594 13068
25 19 0.153 475 11875
27 29 0.234 783 21141
29 19 0.153 551 15979
31 6 0.048 186 5766
32 6 0.048 192 6144
Total 124 1 3141 81173

So
Ȳ/X≤x3 = (1/124) ∑_{j=1}^{7} nj/X≤x3 yj = 3141/124 = 25.331
Var(Y /X ≤ x3) = (1/124) ∑_{j=1}^{7} nj/X≤x3 yj² − (Ȳ/X≤x3)² = 81173/124 − (25.331)² = 12.961
and σ(Y /X ≤ x3) = √Var(Y /X ≤ x3) = 3.600.

2.3 Covariance of two characters


2.3.1 Definition
The covariance of the two statistical variables X and Y is defined by
cov(X, Y) = (1/n) ∑_{i=1}^{k} ∑_{j=1}^{l} nij (xi − X̄)(yj − Ȳ) = (1/n) ∑_{i=1}^{k} ∑_{j=1}^{l} nij xi yj − X̄ Ȳ.

The two expressions are equal since


cov(X, Y) = (1/n) ∑_{i=1}^{k} ∑_{j=1}^{l} nij (xi − X̄)(yj − Ȳ)
= (1/n) ∑_{i=1}^{k} ∑_{j=1}^{l} (nij xi yj − nij xi Ȳ − nij X̄ yj + nij X̄ Ȳ)
= (1/n) ∑_{i=1}^{k} ∑_{j=1}^{l} nij xi yj − Ȳ (1/n) ∑_{i=1}^{k} xi ∑_{j=1}^{l} nij − X̄ (1/n) ∑_{j=1}^{l} yj ∑_{i=1}^{k} nij + (X̄ Ȳ / n) ∑_{i=1}^{k} ∑_{j=1}^{l} nij
= (1/n) ∑_{i=1}^{k} ∑_{j=1}^{l} nij xi yj − Ȳ (1/n) ∑_{i=1}^{k} ni. xi − X̄ (1/n) ∑_{j=1}^{l} n.j yj + (X̄ Ȳ / n) × n
= (1/n) ∑_{i=1}^{k} ∑_{j=1}^{l} nij xi yj − 2 X̄ Ȳ + X̄ Ȳ
= (1/n) ∑_{i=1}^{k} ∑_{j=1}^{l} nij xi yj − X̄ Ȳ.

Example 1 (continued):
Calculate the covariance of the number of pupils and the number of teachers.

(each cell contains the product nij xi yj ; the counts nij are those of the contingency table above)

X \ Y |     20     22     25     27     29     31     32 |   Total
400   | 112000  88000  60000  86400  58000  24800  38400 |  467600
450   |  36000 138600  56250  36450  52200  41850  14400 |  375750
500   |      0  33000 100000 243000 145000  15500  32000 |  568500
550   |  22000  48400  13750 237600 319000  85250  88000 |  814000
600   |  12000  26400  15000  48600  34800  74400 192000 |  403200
Total | 182000 334400 245000 652050 609000 241800 364800 | 2629050

So
cov(X, Y) = (1/n) ∑_{i=1}^{5} ∑_{j=1}^{7} nij xi yj − X̄ Ȳ = 2629050/200 − (492.25)(26.465) = 117.854
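The same covariance can be obtained programmatically from the contingency table (a sketch; the variable names are ours):

```python
# Covariance of X and Y computed from the contingency table of Example 1.
x = [400, 450, 500, 550, 600]
y = [20, 22, 25, 27, 29, 31, 32]
N = [
    [14, 10,  6,  8,  5,  2,  3],
    [ 4, 14,  5,  3,  4,  3,  1],
    [ 0,  3,  8, 18, 10,  1,  2],
    [ 2,  4,  1, 16, 20,  5,  5],
    [ 1,  2,  1,  3,  2,  4, 10],
]
n = sum(sum(row) for row in N)  # 200

sum_xy = sum(N[i][j] * x[i] * y[j] for i in range(len(x)) for j in range(len(y)))
x_bar = sum(sum(N[i]) * x[i] for i in range(len(x))) / n
y_bar = sum(sum(N[i][j] for i in range(len(x))) * y[j] for j in range(len(y))) / n
cov_xy = sum_xy / n - x_bar * y_bar  # ≈ 117.854
```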

2.3.2 Properties of the covariance


The following proposition gives some properties of the covariance.

Proposition 1.
Let X and Y be two discrete statistical variables taking respectively the values x1 < x2 <
· · · < xk and y1 < y2 < · · · < yl and let a, b, c, d ∈ R be some constants. We have

1. cov(X, Y ) = cov(Y, X).

2. cov(X, X) = V ar(X).

3. V ar(X + Y ) = V ar(X) + V ar(Y ) + 2 cov(X, Y ).

4. cov(aX + b, cY + d) = ac cov(X, Y ).

5. |cov(X, Y )| ≤ σX σY .

Proof.

1. We have
cov(X, Y) = (1/n) ∑_{i=1}^{k} ∑_{j=1}^{l} nij xi yj − X̄ Ȳ = (1/n) ∑_{j=1}^{l} ∑_{i=1}^{k} nij yj xi − Ȳ X̄ = cov(Y, X).

2. We have
cov(X, X) = (1/n) ∑_{i=1}^{k} ∑_{j=1}^{k} nij xi xj − X̄²
Since X cannot take two different values at the same time, we have
nij = ni if i = j, and nij = 0 if i ≠ j.
Thus
cov(X, X) = (1/n) ∑_{i=1}^{k} ni xi² − X̄² = Var(X).

3. We have
Var(X + Y) = (1/n) ∑_{i=1}^{k} ∑_{j=1}^{l} nij (xi + yj − (X̄ + Ȳ))²
= (1/n) ∑_{i=1}^{k} ∑_{j=1}^{l} nij ((xi − X̄) + (yj − Ȳ))²
= (1/n) ∑_{i=1}^{k} ∑_{j=1}^{l} nij (xi − X̄)² + (1/n) ∑_{i=1}^{k} ∑_{j=1}^{l} nij (yj − Ȳ)² + (2/n) ∑_{i=1}^{k} ∑_{j=1}^{l} nij (xi − X̄)(yj − Ȳ)
= (1/n) ∑_{i=1}^{k} (xi − X̄)² ∑_{j=1}^{l} nij + (1/n) ∑_{j=1}^{l} (yj − Ȳ)² ∑_{i=1}^{k} nij + 2 cov(X, Y)
= (1/n) ∑_{i=1}^{k} ni. (xi − X̄)² + (1/n) ∑_{j=1}^{l} n.j (yj − Ȳ)² + 2 cov(X, Y)
= Var(X) + Var(Y) + 2 cov(X, Y).

4. We have
cov(aX + b, cY + d) = (1/n) ∑_{i=1}^{k} ∑_{j=1}^{l} nij (axi + b − (aX̄ + b))(cyj + d − (cȲ + d))
= (1/n) ∑_{i=1}^{k} ∑_{j=1}^{l} nij (axi + b − aX̄ − b)(cyj + d − cȲ − d)
= (1/n) ∑_{i=1}^{k} ∑_{j=1}^{l} ac nij (xi − X̄)(yj − Ȳ)
= (ac/n) ∑_{i=1}^{k} ∑_{j=1}^{l} nij (xi − X̄)(yj − Ȳ)
= ac cov(X, Y).

5. To prove this property, we need the following Cauchy–Schwarz inequality, which we will
first establish.
For all real numbers a1, a2, . . . , ap and b1, b2, . . . , bp, we have
|∑_{i=1}^{p} ai bi| ≤ √(∑_{i=1}^{p} ai²) √(∑_{i=1}^{p} bi²).
Set for t ∈ R
P(t) = ∑_{i=1}^{p} (ai t + bi)² = ∑_{i=1}^{p} (ai² t² + 2 ai bi t + bi²) = t² ∑_{i=1}^{p} ai² + 2t ∑_{i=1}^{p} ai bi + ∑_{i=1}^{p} bi².

By definition, the polynomial P(t) is nonnegative for all t ∈ R, so its discriminant ∆ is
nonpositive. Thus
∆ = (2 ∑_{i=1}^{p} ai bi)² − 4 (∑_{i=1}^{p} ai²)(∑_{i=1}^{p} bi²) = 4 (∑_{i=1}^{p} ai bi)² − 4 (∑_{i=1}^{p} ai²)(∑_{i=1}^{p} bi²) ≤ 0
⟹ (∑_{i=1}^{p} ai bi)² ≤ (∑_{i=1}^{p} ai²)(∑_{i=1}^{p} bi²)
⟹ |∑_{i=1}^{p} ai bi| ≤ √(∑_{i=1}^{p} ai²) √(∑_{i=1}^{p} bi²).

Now, we can prove property 5. We have
|cov(X, Y)| = |(1/n) ∑_{i=1}^{k} ∑_{j=1}^{l} nij (xi − X̄)(yj − Ȳ)|
= |∑_{i=1}^{k} ∑_{j=1}^{l} (√(nij/n) (xi − X̄)) (√(nij/n) (yj − Ȳ))|
≤ √(∑_{i=1}^{k} ∑_{j=1}^{l} (nij/n)(xi − X̄)²) √(∑_{i=1}^{k} ∑_{j=1}^{l} (nij/n)(yj − Ȳ)²)   (Cauchy–Schwarz inequality)
= √((1/n) ∑_{i=1}^{k} (xi − X̄)² ∑_{j=1}^{l} nij) √((1/n) ∑_{j=1}^{l} (yj − Ȳ)² ∑_{i=1}^{k} nij)
= √((1/n) ∑_{i=1}^{k} ni. (xi − X̄)²) √((1/n) ∑_{j=1}^{l} n.j (yj − Ȳ)²)
= √(Var(X) Var(Y))
= σX σY.

2.3.3 Correlation coefficient
The correlation coefficient of the two statistical variables X and Y is defined by
ρX,Y = cov(X, Y) / (σX σY).
In Example 1 above, we have
ρX,Y = cov(X, Y) / (σX σY) = 117.854 / ((67.657)(3.748)) = 0.465
Remark 8.
Since |cov(X, Y )| ≤ σX σY , we have |ρX,Y | ≤ 1, thus −1 ≤ ρX,Y ≤ 1.
If ρX,Y = 1 or near to 1, there exists a linear relation between X and Y with positive slope.
If ρX,Y = −1 or near to −1, there exists a linear relation between X and Y with negative
slope.
If ρX,Y = 0 or near to 0, there is no linear relation between X and Y . In this case, the points
in the scatter plot of the two variables X and Y are arbitrarily placed.
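The computation of ρX,Y for Example 1 can be sketched from the values obtained in the text (our own variable names):

```python
from math import sqrt

# Correlation coefficient of Example 1, using the covariance and the
# marginal variances computed earlier in this chapter.
cov_xy = 117.854
var_x, var_y = 4577.437, 14.049

rho = cov_xy / (sqrt(var_x) * sqrt(var_y))
# rho ≈ 0.465: a moderate positive linear relation between pupils and teachers.
```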

2.4 Fittings
2.4.1 Fitting of type Y = aX + b
The line of best fit (or the regression line) of Y on X using the least squares method is given
by Y = aX + b, where
a = cov(X, Y) / Var(X)
b = Ȳ − a X̄.

X is called the explanatory variable and Y is called the response variable.


The regression line of X on Y is given by X = cY + d, where
c = cov(X, Y) / Var(Y)
d = X̄ − c Ȳ.

The two lines pass through the point (X̄, Ȳ).


Example 2:
On a sample of 10 employees of a company, the number of years of service (denoted by X)
and the number of days of absence for medical reasons during the last year (denoted by Y )
have been recorded in the following table.
Number of years of service (X) 2 5 7 8 11 13 14 16 20 24
Number of days of absence (Y ) 2 3 8 9 8 10 13 14 13 19

To determine the regression line of Y on X, we have to calculate the coefficients a and b.
xi yi x2i yi2 xi yi
2 2 4 4 4
5 3 25 9 15
7 8 49 64 56
8 9 64 81 72
11 8 121 64 88
13 10 169 100 130
14 13 196 169 182
16 14 256 196 224
20 13 400 169 260
24 19 576 361 456
Total 120 99 1860 1217 1487
Total/n 12 9.9 186 121.7 148.7
X̄ = (1/n) ∑_{i=1}^{10} xi = 120/10 = 12,  Ȳ = (1/n) ∑_{i=1}^{10} yi = 99/10 = 9.9
Var(X) = (1/n) ∑_{i=1}^{10} xi² − X̄² = 1860/10 − (12)² = 42,  Var(Y) = (1/n) ∑_{i=1}^{10} yi² − Ȳ² = 1217/10 − (9.9)² = 23.69
cov(X, Y) = (1/n) ∑_{i=1}^{10} xi yi − X̄ Ȳ = 1487/10 − 12 × 9.9 = 29.9
Therefore
a = cov(X, Y) / Var(X) = 29.9/42 = 0.712
b = Ȳ − a X̄ = 9.9 − (0.712) × 12 = 1.356
So the regression line of Y on X is given by

Y = (0.712)X + 1.356

In the next figure, we represent the scatter plot of (X, Y ) as well as the regression line.

We remark that the points (xi, yi) lie close to the regression line. This is confirmed by the
correlation coefficient
ρX,Y = cov(X, Y) / (σX σY) = 29.9 / √(42 × 23.69) = 0.948
which is near to 1.
- The regression line can be used to estimate (predict) the value of Y corresponding to a
value of X which does not appear in the table. For example, to estimate the number of days
of absence of an employee with 27 years of service, we calculate
Y = (0.712) × 27 + 1.356 = 20.58 ≈ 21 days.
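The whole fitting procedure of Example 2, including the prediction, can be sketched as follows (the variable names are ours):

```python
# Least-squares fit Y = aX + b for Example 2 and the prediction for X = 27.
x = [2, 5, 7, 8, 11, 13, 14, 16, 20, 24]
y = [2, 3, 8, 9, 8, 10, 13, 14, 13, 19]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
var_x = sum(v ** 2 for v in x) / n - x_bar ** 2
cov_xy = sum(u * v for u, v in zip(x, y)) / n - x_bar * y_bar

a = cov_xy / var_x     # ≈ 0.712
b = y_bar - a * x_bar  # ≈ 1.357
pred_27 = a * 27 + b   # ≈ 20.6 days, rounded to 21
```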

Remark 9.

1. The scatter plot can take many forms depending on the value of ρX,Y . When ρX,Y = 1
(resp. ρX,Y = −1) the points (xi , yi ) are collinear and lie on a line with positive (resp.
negative) slope. When ρX,Y ≈ 1 (resp. ρX,Y ≈ −1), the scatter plot is near to a line
with positive (resp. negative) slope, and when ρX,Y ≈ 0, the points of the scatter plot
are arbitrarily placed. Here are some examples.

2. The equation Y = aX + b should not be solved to determine the regression line of X
on Y , we have to calculate the coefficients c and d of the equation X = cY + d, using
their formulas.
3. The regression line of Y on X is Y = aX + b, with a = cov(X, Y)/Var(X) and b = Ȳ − a X̄, so
Y = aX + Ȳ − a X̄
⟹ Y − Ȳ = a (X − X̄)
= (cov(X, Y)/Var(X)) (X − X̄)
= (cov(X, Y)/(σX σY)) (σY/σX) (X − X̄)
⟹ Y − Ȳ = ρX,Y (σY/σX) (X − X̄)
and we can prove in the same way that the regression line of X on Y can be written
as
X − X̄ = ρX,Y (σX/σY) (Y − Ȳ).
So, if ρX,Y = 1 or ρX,Y = −1, the two regression lines coincide and only in this case,
we can solve one equation to determine the other one.

2.4.2 Fitting of type Y = B × A^X
The regression line is not always appropriate to describe the relation between two statistical
variables X and Y . In some situations, the scatter plot may suggest other forms of functions
such as an exponential function of the form Y = B × A^X. We will illustrate this fitting
through an example.
Example 3:
The number X of open checkouts in a hypermarket and the average wait time Y (in minutes)
have been recorded in the following table.
Number of open checkouts (X) 3 4 5 6 8 10 12
Average wait time (Y ) 16 12 9.6 7.9 6 4.7 4

We will fit Y on X using the equation Y = B × A^X, where A, B > 0 are constants. In order
to determine the coefficients A and B, we use the natural logarithm to linearise the equation

Y = B × A^X ⟹ ln(Y) = ln(B × A^X) = ln(B) + X ln(A).

Setting Z = ln(Y ), a = ln(A) and b = ln(B), we get

Z = aX + b.

So, we will calculate a and b using the relations
a = cov(X, Z) / Var(X)
b = Z̄ − a X̄.
Then, we can deduce A and B since
a = ln(A) ⟺ A = e^a
b = ln(B) ⟺ B = e^b.

To calculate a and b, we need the following table.


xi yi zi = ln(yi ) x2i xi zi
3 16 2.773 9 8.319
4 12 2.485 16 9.94
5 9.6 2.262 25 11.31
6 7.9 2.067 36 12.402
8 6 1.792 64 14.336
10 4.7 1.548 100 15.48
12 4 1.386 144 16.632
Total 48 14.313 394 88.419
Total/n 6.857 2.045 56.286 12.631

X̄ = (1/n) ∑_{i=1}^{7} xi = 48/7 = 6.857,  Z̄ = (1/n) ∑_{i=1}^{7} zi = 14.313/7 = 2.045
Var(X) = (1/n) ∑_{i=1}^{7} xi² − X̄² = 394/7 − (6.857)² = 9.268
cov(X, Z) = (1/n) ∑_{i=1}^{7} xi zi − X̄ Z̄ = 88.419/7 − (6.857)(2.045) = −1.392
Therefore
a = cov(X, Z) / Var(X) = −1.392/9.268 = −0.150
b = Z̄ − a X̄ = 2.045 + (0.150)(6.857) = 3.074
and
A = e^a = 0.861
B = e^b = 21.628

So, the equation of Y on X is given by

Y = (21.628) × (0.861)^X.

In the next figure, we represent this function as well as the scatter plot of (X, Y ).

We remark that the points (xi , yi ) are near to the fitting curve.
- Question: Estimate the average wait time when 9 checkouts are open.
Response:
Y = (21.628) × (0.861)^9 = 5.624 minutes.
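The log-linearisation steps of Example 3 can be sketched in code (our own variable names):

```python
from math import exp, log

# Exponential fitting Y = B * A**X for Example 3, via Z = ln(Y).
x = [3, 4, 5, 6, 8, 10, 12]
y = [16, 12, 9.6, 7.9, 6, 4.7, 4]
n = len(x)

z = [log(v) for v in y]
x_bar = sum(x) / n
z_bar = sum(z) / n
var_x = sum(v ** 2 for v in x) / n - x_bar ** 2
cov_xz = sum(u * w for u, w in zip(x, z)) / n - x_bar * z_bar

a = cov_xz / var_x     # ≈ -0.150
b = z_bar - a * x_bar  # ≈ 3.07
A, B = exp(a), exp(b)  # A ≈ 0.861, B ≈ 21.6

wait_9 = B * A ** 9    # estimated wait with 9 open checkouts, ≈ 5.6 minutes
```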

2.4.3 Fitting of type Y = B × X^a


We will present another fitting type, where the fitting curve has an equation of the form
Y = B × X^a.
Example 4:
Five different doses of an insecticide were used in five parcels with the same area and situated
in the same field. Then, after several days, the number of the insects that are still on the
parcels was recorded in the following table.
Amount of insecticide in decilitres (X) 3 5 7 9 10
Number of insects (Y ) 312 220 42 33 25
We will fit Y on X using the equation Y = B × X^a, where a ∈ R and B > 0 are constants.
We have
Y = B × X^a ⟹ ln(Y) = ln(B × X^a) = ln(B) + a ln(X).
Setting Z = ln(X), T = ln(Y ) and b = ln(B), we get

T = aZ + b.

So, a and b can be calculated as follows.
a = cov(Z, T) / Var(Z)
b = T̄ − a Z̄.
Then, B can be calculated by
b = ln(B) ⟺ B = e^b.
xi yi zi = ln(xi ) ti = ln(yi ) zi2 zi ti
3 312 1.099 5.743 1.208 6.312
5 220 1.609 5.394 2.589 8.679
7 42 1.946 3.738 3.787 7.274
9 33 2.197 3.497 4.827 7.683
10 25 2.303 3.219 5.304 7.413
Total 9.154 21.591 17.715 37.361
Total/n 1.831 4.318 3.543 7.472
Z̄ = (1/n) ∑_{i=1}^{5} zi = 9.154/5 = 1.831,  T̄ = (1/n) ∑_{i=1}^{5} ti = 21.591/5 = 4.318

Var(Z) = (1/n) ∑_{i=1}^{5} zi² − Z̄² = 17.715/5 − (1.831)² = 0.190
cov(Z, T) = (1/n) ∑_{i=1}^{5} zi ti − Z̄ T̄ = 37.361/5 − (1.831)(4.318) = −0.434
Therefore
a = cov(Z, T) / Var(Z) = −0.434/0.190 = −2.284
b = T̄ − a Z̄ = 4.318 + (2.284)(1.831) = 8.500 ⟹ B = e^b = 4914.769

So, the equation of Y on X is given by

Y = (4914.769) × X^(−2.284).

In the next figure, we represent this function as well as the scatter plot of (X, Y ).

We remark that the regression curve fits the data well.


Remark 10.
The coefficient of determination of two statistical variables X and Y is defined by ρ²X,Y . It
can be used to determine the best type of fitting to be used to describe a set of data. More

precisely, if we have a set of data (xi, yi)1≤i≤n and we want to know the best type of fitting
to describe this scatter plot among the linear fitting Y = aX + b, the exponential fitting
Y = B × A^X and the power fitting Y = B × X^a, we proceed as follows.
- We calculate ρ²X,Y , which represents the degree of strength of the linear relation between X
and Y (corresponding to the linear fitting Y = aX + b).
- We calculate ρ²X,Z , where Z = ln(Y), which represents the degree of strength of the linear
relation between X and Z = ln(Y) (corresponding to the exponential fitting Y = B × A^X).
- We calculate ρ²U,V , where U = ln(X) and V = ln(Y), which represents the degree of strength
of the linear relation between U = ln(X) and V = ln(Y) (corresponding to the power fitting
Y = B × X^a).
- The largest value among ρ²X,Y , ρ²X,Z and ρ²U,V determines the best type of fitting to be used
to describe our scatter plot (xi, yi)1≤i≤n.
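This model-selection procedure can be sketched in code; we apply it, as an illustration, to the hypermarket data of Example 3 (the helper names are ours):

```python
from math import log

# Comparing the three fitting types via the coefficient of determination rho^2.
def rho2(u, v):
    """Squared correlation coefficient of two samples u and v."""
    n = len(u)
    u_bar, v_bar = sum(u) / n, sum(v) / n
    cov = sum(a * b for a, b in zip(u, v)) / n - u_bar * v_bar
    var_u = sum(a ** 2 for a in u) / n - u_bar ** 2
    var_v = sum(b ** 2 for b in v) / n - v_bar ** 2
    return cov ** 2 / (var_u * var_v)

x = [3, 4, 5, 6, 8, 10, 12]
y = [16, 12, 9.6, 7.9, 6, 4.7, 4]

r2_linear = rho2(x, y)                                   # linear fit Y = aX + b
r2_exp = rho2(x, [log(v) for v in y])                    # exponential fit Y = B * A**X
r2_pow = rho2([log(u) for u in x], [log(v) for v in y])  # power fit Y = B * X**a
best = max(r2_linear, r2_exp, r2_pow)
```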

Part II

Probability
Chapter 3

Introduction to probability calculus

3.1 Reminders on combinatorial analysis


Combinatorial analysis is a branch of mathematics serving to count objects. It is very useful
for probability calculus.
Let E be a non empty set with cardinal number n and let k ∈ {1, . . . , n}.

3.1.1 k−permutation
A k−permutation of E is an ordered selection of k items from E. The items are selected
without repetition.
The permutation coefficient Pnk is the number of all k−permutations of a set of n items.
We have
Pnk = n! / (n − k)!,
with n! = n × (n − 1) × (n − 2) × · · · × 1, for n ∈ N* (by convention 0! = 1).
Example 1:
5 athletes participate in a race. The first one who crosses the finish line wins a gold medal,
the second one wins a silver medal and the third wins a bronze medal. At their arrival, the
winners have to find the list of their names already written. So, the organizing committee
must prepare all possible lists. How many lists are there?
Solution:
The order of athletes is important and there is no repetition in the lists, so the number of
possible lists is
P53 = 5!/(5 − 3)! = 5!/2! = (5 × 4 × 3 × 2!)/2! = 60 possible lists.

Example 2:
The students of a faculty want to create an association. They organized elections to choose
the members of the committee of this association. The committee consists of

- A president.
- A vice-president.
- A treasurer.
These positions will be attributed in accordance with the number of votes received by each
candidate. 6 candidates are presented to the elections. How many possible committees can
we expect?
Solution:
The order of candidates is important and there is no repetition in the committees, so the
number of possible committees is
P63 = 6!/(6 − 3)! = 6!/3! = (6 × 5 × 4 × 3!)/3! = 120 possible committees.
Example 3:
One urn contains 3 white balls and 4 black ones. We draw from this urn, 3 balls successively
and without replacement. What is the number of possible cases?
Solution:
The total number of balls is 4 + 3 = 7.
The order of balls is important since the sampling is successive and there is no repetition
because the sampling is without replacement. So, the number of possible cases is
P73 = 7!/(7 − 3)! = 7!/4! = (7 × 6 × 5 × 4!)/4! = 210 possible cases.

3.1.2 Permutation
An n−permutation of E is simply called permutation. It is a selection of all the items of E
in a certain order. The number of permutations of a set of n items is Pnn = n!.

3.1.3 Combination
A combination of k items of E is any subset of k items of E.
In a combination, the items are selected without repetition and their order is not important.
The number of combinations of k items from a set of n items is given by
Cnk = n! / (k!(n − k)!).
Example 4:
We return to Example 1, but we assume that the first three winners share the same award
with equal parts. How many possible winner groups are there?
Solution:
Here there is no repetition and the order of athletes is not important, so the number of
possible winner groups is
C53 = 5!/(3! × 2!) = (5 × 4 × 3!)/(3! × 2) = 10 possible winner groups.

Example 5:
We return to Example 2, but here the students decide that the committee of the association
operates in a collegial manner, without difference between members. How many possible
committees are there?
Solution:
Here there is no repetition and the order of candidates is not important, so the number of
possible committees is
C63 = 6!/(3! × 3!) = (6 × 5 × 4 × 3!)/(3! × 3 × 2) = 20 possible committees.
Example 6:
We return to Example 3, but here the sampling is simultaneous. What is the number of
possible cases?
Solution:
Here there is no repetition and the order of balls is not important because the sampling is
simultaneous. So, the number of possible cases is
C73 = 7!/(3! × 4!) = (7 × 6 × 5 × 4!)/(3 × 2 × 4!) = 35 possible cases.
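The permutation and combination coefficients of this section can be sketched with two small helpers (the function names are ours):

```python
from math import factorial

def P(n, k):
    """Number of k-permutations of a set of n items: n! / (n - k)!."""
    return factorial(n) // factorial(n - k)

def C(n, k):
    """Number of combinations of k items among n: n! / (k! (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

# The six examples of this section:
# P(5, 3) = 60, P(6, 3) = 120, P(7, 3) = 210
# C(5, 3) = 10, C(6, 3) = 20,  C(7, 3) = 35
```

Note that P(n, n) = n! gives the number of permutations of a set of n items.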

3.2 Probability of events


3.2.1 Definitions
Random experiment
An experiment is said to be a random experiment if all the possible outcomes of the experiment
are known, but at any execution of the experiment, the final outcome is not known in
advance.
Example:
Throwing a fair dice.

Sample space
A sample space is the set of all possible outcomes of a random experiment. It is denoted by
Ω.
Example:
- For the throwing of a dice, the sample space is Ω = {1, 2, 3, 4, 5, 6}.
- In the examples of the urn (Examples 3 and 6 above), the cardinal number of Ω in Ex-
ample 3 (successive sampling without replacement) is 210 and in Example 6 (simultaneous
sampling), it is equal to 35.
Remark 11.

1. The sample space may be finite as in the case of the dice throwing or infinite as in
the following example.

Example:
We throw successively a dice until we obtain the face ”6”.
The random experiment consists in observing the number of necessary throws to obtain
the face ”6”. Thus, the sample space is Ω = N∗ .

2. The sample space may be discrete as in the previous example, or continuous as in the
following example.
Example :
If the random experiment consists in observing the lifetime of an electronic component,
this lifetime T can take any value in the interval [0, +∞[, so Ω = [0, +∞[.

Elementary event
The elements of the sample space are called elementary events.
Example:
In the example of the dice throwing, the elementary events are: 1, 2, 3, 4, 5 and 6.

Composite event
Any subset of the sample space is called an event or a composite event.
Example:
In the example of the dice throwing, we denote
A: "The outcome of the throw is even".
A = {2, 4, 6} is a composite event.

Operations on events
Let A and B be two events.

- The complement of A, denoted by Ac is realized when A is not realized.

- The intersection A ∩ B is realized when A and B are simultaneously realized.

- The union A ∪ B is realized when at least one of the two events A and B is realized.

- The set difference A \ B = A ∩ B c is realized when A is realized and B is not realized.

- The symmetric difference A∆B = (A \ B) ∪ (B \ A) is realized when exactly one of


the two events A and B is realized.

Remark 12.
In view of De Morgan’s laws, we have

1. (A ∩ B)c = Ac ∪ B c .

2. (A ∪ B)c = Ac ∩ B c .

3.2.2 General definition of a probability
Definition 11.
Let Ω be a finite sample space. We call probability any map P from P(Ω) to [0, 1] satisfying
a) P (Ω) = 1.

b) ∀A, B ∈ P(Ω) such that A ∩ B = Φ, we have P (A ∪ B) = P (A) + P (B).


The following proposition gives some properties of a probability P .
Proposition 2.
Let A, B ∈ P(Ω) be two events. We have
1. P (Φ) = 0.

2. P (Ac ) = 1 − P (A).

3. P (A \ B) = P (A) − P (A ∩ B).

4. If B ⊆ A, then P (A \ B) = P (A) − P (B).

5. If B ⊆ A, then P (B) ≤ P (A).

6. P (A ∪ B) = P (A) + P (B) − P (A ∩ B).

7. P (A∆B) = P (A) + P (B) − 2P (A ∩ B).


Proof.
1. We have Ω ∩ Φ = Φ, so

P (Ω ∪ Φ) = P (Ω) + P (Φ) =⇒ P (Ω) = P (Ω) + P (Φ) =⇒ 1 = 1 + P (Φ) =⇒ P (Φ) = 0.

2. We have A ∩ Ac = Φ, so

P (A ∪ Ac ) = P (A) + P (Ac )
=⇒ P (Ω) = P (A) + P (Ac )
=⇒ 1 = P (A) + P (Ac )
=⇒ P (Ac ) = 1 − P (A).

3. We have A = (A \ B) ∪ (A ∩ B) and (A \ B) ∩ (A ∩ B) = Φ. Thus
P(A) = P((A \ B) ∪ (A ∩ B)) = P(A \ B) + P(A ∩ B) ⟹ P(A \ B) = P(A) − P(A ∩ B).

4. If B ⊆ A, then A ∩ B = B. Thus, the previous property implies that

P (A \ B) = P (A) − P (A ∩ B) = P (A) − P (B).

5. If B ⊆ A, we have from the previous property
P (A) − P (B) = P (A \ B) ≥ 0 =⇒ P (A) ≥ P (B).

6. We have A ∪ B = (A \ B) ∪ (B \ A) ∪ (A ∩ B), where the events A \ B, B \ A and A ∩ B
are pairwise incompatible. Thus
P (A ∪ B) = P ((A \ B) ∪ (B \ A) ∪ (A ∩ B))
= P (A \ B) + P (B \ A) + P (A ∩ B)
= P (A) − P (A ∩ B) + P (B) − P (B ∩ A) + P (A ∩ B) (in view of property 3)
= P (A) + P (B) − 2P (A ∩ B) + P (A ∩ B)
= P (A) + P (B) − P (A ∩ B).

7. We have A∆B = (A \ B) ∪ (B \ A) and (A \ B) ∩ (B \ A) = Φ. Thus


P (A∆B) = P ((A \ B) ∪ (B \ A))
= P (A \ B) + P (B \ A)
= P (A) − P (A ∩ B) + P (B) − P (B ∩ A) (in view of property 3)
= P (A) + P (B) − 2P (A ∩ B).

Remark 13.
We say that the events A and B are incompatible if A ∩ B = Φ.
Independence

We say that the two events A and B are independent if P (A ∩ B) = P (A) × P (B).
If A and B are independent, then Ac and B (resp. A and B c ; Ac and B c ) are also independent.

3.2.3 Study of the equiprobability


Equiprobability means that all elementary events have the same probability.
Example:
Throwing a fair dice.
We have Ω = {1, 2, 3, 4, 5, 6}.
For all ω ∈ Ω, P({ω}) = 1/6, so in this case, we have equiprobability.
- In the case of equiprobability, we have for all A ∈ P(Ω)
P(A) = |A| / |Ω| = Number of favorable cases / Total number of possible cases,
where |A| denotes the cardinal number of the set A.
In the example of the throwing of a fair dice, if A: "The outcome of the throw is even", we
have A = {2, 4, 6}, so
P(A) = |A| / |Ω| = 3/6 = 1/2.

3.2.4 Conditional probabilities
Let A and B be two events such that P (A) ̸= 0. The conditional probability of B given A
is defined by
P(B/A) = P(A ∩ B) / P(A).
Remark 14.

1. P (B c /A) = 1 − P (B/A) (P (B/Ac ) ̸= 1 − P (B/A)).

2. If A and B are independent, then P (B/A) = P (B).

Example :
We throw a fair dice. Given that the outcome is even, what is the probability of getting a
multiple of 3?
Solution :
Denote by
A: "Getting an even number" and B: "Getting a multiple of 3".
We have A = {2, 4, 6} and B = {3, 6}. Thus

P(B/A) = P(A ∩ B) / P(A) = P({6}) / P({2, 4, 6}) = (1/6)/(3/6) = 1/3.
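The dice example can be checked with exact arithmetic (a sketch; the helper names are ours):

```python
from fractions import Fraction

# The dice example with conditional probabilities, using exact fractions.
omega = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}  # "getting an even number"
B = {3, 6}     # "getting a multiple of 3"

def prob(event):
    """Equiprobability on omega: P(E) = |E| / |Omega|."""
    return Fraction(len(event), len(omega))

p_B_given_A = prob(A & B) / prob(A)  # P(B/A) = 1/3
p_A_given_B = prob(A & B) / prob(B)  # P(A/B) = 1/2
```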

3.2.5 Law of total probability and chain rule


Bayes formula

Theorem 1.
Let A and B be two events with non-zero probability, we have

P (B/A) × P (A)
P (A/B) = .
P (B)

Proof.
The proof follows immediately from the definition of the conditional probability. Indeed, we
have
P(B/A) × P(A) / P(B) = (P(A ∩ B)/P(A)) × P(A) / P(B) = P(A ∩ B) / P(B) = P(A/B).

Example:
We return to the previous example of throwing a fair dice. Given that the outcome is a
multiple of 3, what is the probability of getting an even number?

Solution:
In view of Bayes formula, we have
P(A/B) = P(B/A) × P(A) / P(B) = ((1/3) × (3/6)) / (2/6) = 1/2.
Law of total probability

Definition 12. (Partition)


A family of non empty sets A1, A2, . . . , Ak is a partition of Ω if
∪_{i=1}^{k} Ai = A1 ∪ A2 ∪ · · · ∪ Ak = Ω
Ai ∩ Aj = Φ, if i ≠ j (i, j ∈ {1, . . . , k}).

Theorem 2.
Let A1 , A2 , . . . , Ak be a partition of Ω such that P (Ai ) ̸= 0 for all i ∈ {1, 2, . . . , k}. We have
for all B ∈ P(Ω)
P(B) = ∑_{i=1}^{k} P(B/Ai) × P(Ai).

Proof.
We have
P(B) = P(B ∩ Ω)
= P(B ∩ (∪_{i=1}^{k} Ai))
= P(∪_{i=1}^{k} (B ∩ Ai))
= ∑_{i=1}^{k} P(B ∩ Ai)   (because the events (B ∩ Ai)1≤i≤k are incompatible)
= ∑_{i=1}^{k} (P(B ∩ Ai)/P(Ai)) × P(Ai)
= ∑_{i=1}^{k} P(B/Ai) × P(Ai).

Generalized Bayes formula

Theorem 3.
Let A1 , A2 , . . . , Ak be a partition of Ω such that P (Ai ) ̸= 0 for all i ∈ {1, 2, . . . , k}. We have
for all B ∈ P(Ω) such that P (B) ̸= 0
P(Ai/B) = (P(B/Ai) × P(Ai)) / (∑_{j=1}^{k} P(B/Aj) × P(Aj)),  ∀i ∈ {1, . . . , k}.

Proof.
In view of Bayes formula, we have
P(Ai/B) = P(B/Ai) × P(Ai) / P(B)
and the law of total probability allows us to write
P(B) = ∑_{j=1}^{k} P(B/Aj) × P(Aj).
Thus
P(Ai/B) = (P(B/Ai) × P(Ai)) / (∑_{j=1}^{k} P(B/Aj) × P(Aj)).
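The law of total probability and the generalized Bayes formula can be sketched together in code. The numerical values below are our own illustration, not taken from the handout:

```python
# A partition A1, A2, A3 of Omega with illustrative (assumed) probabilities.
p_A = [0.5, 0.3, 0.2]          # P(A1), P(A2), P(A3); they sum to 1
p_B_given_A = [0.1, 0.2, 0.4]  # P(B/A1), P(B/A2), P(B/A3)

# Law of total probability: P(B) = sum_i P(B/Ai) P(Ai)
p_B = sum(pb * pa for pb, pa in zip(p_B_given_A, p_A))

# Generalized Bayes formula: P(Ai/B) = P(B/Ai) P(Ai) / P(B)
posterior = [pb * pa / p_B for pb, pa in zip(p_B_given_A, p_A)]
```

Here p_B = 0.19 and the posterior probabilities P(Ai/B) sum to 1, as the formula guarantees.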

Chain rule
Theorem 4.
Let A1 , A2 , . . . , Ak be a sequence of events such that P (A1 ∩ A2 ∩ · · · ∩ Ak ) ̸= 0. We have
P (A1 ∩ A2 ∩ · · · ∩ Ak ) = P (A1 )P (A2 /A1 )P (A3 /A1 ∩ A2 ) × · · · × P (Ak /A1 ∩ A2 ∩ · · · ∩ Ak−1 ).
Proof.
We proceed by induction on k.
- For k = 2, we have
P(A2/A1) = P(A1 ∩ A2) / P(A1) ⟹ P(A1 ∩ A2) = P(A1)P(A2/A1).
So, the relation is satisfied.
- We assume that the relation is satisfied for k and we prove it for k + 1.
We have
P(Ak+1/A1 ∩ A2 ∩ · · · ∩ Ak) = P(A1 ∩ A2 ∩ · · · ∩ Ak+1) / P(A1 ∩ A2 ∩ · · · ∩ Ak)
⟹ P(A1 ∩ A2 ∩ · · · ∩ Ak+1) = P(A1 ∩ A2 ∩ · · · ∩ Ak) P(Ak+1/A1 ∩ A2 ∩ · · · ∩ Ak)
= P(A1)P(A2/A1)P(A3/A1 ∩ A2) × · · · × P(Ak/A1 ∩ A2 ∩ · · · ∩ Ak−1) × P(Ak+1/A1 ∩ A2 ∩ · · · ∩ Ak).
So, the relation is satisfied for k + 1.
Thus, the relation is satisfied for all k ≥ 2.
