Beamer Mémoire-1


Basic descriptive statistics vocabulary

Bivariate descriptive statistics

Probabilities-Statistics

Dr.HEBCHI Chaima

22 October 2023

1/33 Dr.HEBCHI Chaima Proba-stat



Introduction
The term 'statistics' refers both to a collection of numerical data
(observational data) about a specific subject and to the activities
involved in gathering, processing, and interpreting these data.
Descriptive statistics consists of ordering, classifying, and
representing observed data in a suitable format.
Course Objectives
- Review important concepts and vocabulary.
- Learn how to describe and represent a set of data effectively.
- Study statistical distributions involving two variables.
- Draw conclusions from the analysis of the studied population.



Representation of a series
Basic descriptive statistics vocabulary
Measures of Central Tendency
Bivariate descriptive statistics
Measures of Central Tendency

Statistical population:
The statistical population is the set of elements that we intend to
study. This set is denoted Ω.
Example: the Algerian population, all the companies in a region, a
set of geographical sites, ...
Statistical individual (or statistical unit):
A statistical individual ω is an element of the population under
consideration.
Example: a person, a company, a geographical site, ...
Character (statistical variable):
A character is a property observed on each individual of the
population. Each individual can generally be described by one or
more characters; the mapping that assigns to each individual its
observed value is denoted X.
Example: for people: gender, age, salary, ...
For companies: number of employees, sector of activity, etc.

For geographical sites: altitude, type of vegetation, ...


Remark
To characterize a population appropriately, one must focus on the
most relevant and discriminating characters.

Modalities $x_i$: The modalities of a statistical variable are the
different values it can take.
Example: the statistical variable is 'marital status', and its
modalities are 'single', 'married', 'divorced', 'widowed' and
'undeclared (no response)'.
Classification of statistical variables: There are two main
categories, each subdivided into two types: quantitative variables
(discrete or continuous) and qualitative, or categorical, variables
(nominal or ordinal).


Remark: Two situations can arise when presenting statistical data:
- When conducting a survey, one initially has raw data. It is then
  necessary, first of all, to arrange the values of the series
  (usually from smallest to largest) and then to classify them in a
  summary table.
- When a statistical study is conducted on existing data, these data
  are already organized in tables. In this case (and in the previous
  case, from this stage on), the objective is to determine the main
  elements that characterize the studied series, namely: the
  statistical population, the statistical unit, the variable, and
  the type of variable (qualitative, discrete quantitative, or
  continuous quantitative).


The frequency $n_i$ of a modality $x_i$: the number of observations
that have the modality $x_i$.
The total number of data points in the dataset, $n$:
\[ n = \sum_i n_i \]
Relative frequency $f_i$:
\[ f_i = \frac{n_i}{n} \]

Remark
\[ \sum_i f_i = 1 \]

Example of a qualitative criterion: consider a population of 450
students, 260 of them female and 190 male. Let's express this
information in the language of descriptive statistics.
Example of a quantitative criterion: consider a sample of 12
students who took an exam. They achieved the following scores:
{16, 5, 10, 5, 19, 13, 7, 12, 5, 13, 9, 17}

Example of a qualitative criterion:

Population                     Ω               population size n = 450
Statistical individual         i               each student, i = 1, 2, ..., n
Statistical variable           X               gender
Modalities                     $x_F$, $x_M$    female or male
Frequency of each modality     $n_F$, $n_M$    260 women, 190 men

Example of a quantitative criterion:

Sample                         E                            sample size n = 12
Statistical individual         i                            each student, i = 1, 2, ..., n
Statistical variable           X                            scores
Values                         {$x_1, x_2, ..., x_k$}       {5, 7, 9, 10, 12, 13, 16, 17, 19} (k = 9 distinct values)
Frequency of each value        {$n_1, n_2, ..., n_k$}       {3, 1, 1, 1, 1, 2, 1, 1, 1}
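The quantitative example can be reproduced with a short Python sketch. The function name `frequency_table` is chosen here for illustration and is not part of the course material; it counts each distinct score and divides by the sample size to obtain the relative frequencies.

```python
from collections import Counter

# Raw scores of the 12 students (from the example above).
scores = [16, 5, 10, 5, 19, 13, 7, 12, 5, 13, 9, 17]

def frequency_table(data):
    """Return rows (value x_i, frequency n_i, relative frequency f_i), sorted by value."""
    n = len(data)
    counts = Counter(data)  # maps each distinct value to its frequency
    return [(x, counts[x], counts[x] / n) for x in sorted(counts)]

table = frequency_table(scores)
for x, n_i, f_i in table:
    print(f"{x:>3} {n_i:>3} {f_i:.3f}")
```

The frequencies in the second column sum to n = 12 and the relative frequencies in the third column sum to 1, as stated in the remark on relative frequencies.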


Tables and graphs are the two main methods of presenting
statistical data.
Tables: The general form of a statistical table (single-variable
series) is as follows:

Modality $x_i$    Frequency $n_i$               Relative frequency $f_i$
$x_1$             $n_1$                         $f_1$
$x_2$             $n_2$                         $f_2$
...               ...                           ...
$x_i$             $n_i$                         $f_i$
...               ...                           ...
$x_k$             $n_k$                         $f_k$
Total             $n = \sum_{i=1}^{k} n_i$      $1 = \sum_{i=1}^{k} f_i$


Graphs: The method used to represent a statistical variable depends
on the nature of the variable (qualitative or quantitative).
Bar charts: Bar charts are reserved for qualitative variables and
discrete quantitative variables.
Example: the scores of a written exam

Score    Frequency
7        1
8        2
9        5
11       3
12       9
16       1


[Figure: bar chart of the exam scores; horizontal axis: score, vertical axis: frequency.]


Histogram: Histograms are primarily used for continuous variables,
although some authors also use them for discrete variables. This
representation, which depicts the distribution of frequencies (or
relative frequencies) over the classes of the variable under study,
consists of a set of adjacent rectangles.
Example: the size of fish in cm (see: Grim, Z. (2013), p. 57).

Class            Class mark $m_i$    Frequency $n_i$
[9.5 − 10.5[     10                  5
[10.5 − 11.5[    11                  5
[11.5 − 12.5[    12                  10
[12.5 − 13.5[    13                  20
[13.5 − 14.5[    14                  20
[14.5 − 15.5[    15                  20
[15.5 − 16.5[    16                  10
[16.5 − 17.5[    17                  5
[17.5 − 18.5[    18                  5

Cumulative polygon: This graphical representation depicts the
distribution of cumulative frequencies (or cumulative relative
frequencies) over the classes of the studied character.
Example: The following table gives the distribution of weights, in
kilograms, of 200 individuals.


Class limit    Class         Frequency    Cumulative frequency
40                                        0
               [40 − 50[     32
50                                        32
               [50 − 60[     47
60                                        79
               [60 − 70[     51
70                                        130
               [70 − 80[     36
80                                        166
               [80 − 90[     19
90                                        185
               [90 − 100[    15
100                                       200
Total                        200
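The cumulative column of this table is a running total of the class frequencies. A minimal Python sketch follows; the variable names are chosen for illustration. It rebuilds that column and the points that the cumulative polygon joins.

```python
from itertools import accumulate

limits = [40, 50, 60, 70, 80, 90, 100]   # class limits in kg
frequencies = [32, 47, 51, 36, 19, 15]   # frequency of each class [limits[j], limits[j+1][

# Cumulative frequency at each class limit: 0 at the first limit,
# then the running total of the class frequencies.
cumulative = [0] + list(accumulate(frequencies))

# Points (class limit, cumulative frequency) joined by the cumulative polygon.
polygon_points = list(zip(limits, cumulative))
print(polygon_points)
```

Each point pairs a class limit with the number of individuals weighing less than that limit, so the last point reaches the total of 200.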


Arithmetic mean:

Case of discrete variables

Definition: The arithmetic mean is the value obtained by dividing
the sum of the observed values $x_i$ by the number $n$ of observations.
The formula for the simple arithmetic mean:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
where $n$ is the total number of data points in the dataset.
The formula for the weighted arithmetic mean (weighted by the frequencies):
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{k} n_i x_i \]
where $k$ is the number of distinct modalities and $n_i$ is the
frequency of the modality $x_i$.


Example: Consider the series of numbers {4, 5, 8, 11, 17}. Its
arithmetic mean is:
\[ \bar{x} = \frac{4 + 5 + 8 + 11 + 17}{5} = 9 \]
Example: Consider the series of numbers
{8, 20, 11, 26, 4, 8, 6, 26, 14, 8, 4, 19, 14, 11}. Its weighted
arithmetic mean is:
\[ \bar{x} = \frac{(4 \times 2) + (6 \times 1) + (8 \times 3) + (11 \times 2) + (14 \times 2) + (19 \times 1) + (20 \times 1) + (26 \times 2)}{14} = \frac{179}{14} \approx 12.79 \]
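The two examples can be checked with a short Python sketch. The function name `weighted_mean` is an illustrative choice; it groups the raw values into modalities and frequencies before applying the weighted formula.

```python
from collections import Counter

def weighted_mean(data):
    """Weighted arithmetic mean: (1/n) * sum of n_i * x_i over the distinct values."""
    counts = Counter(data)       # modality x_i -> frequency n_i
    n = sum(counts.values())     # total number of observations
    return sum(n_i * x_i for x_i, n_i in counts.items()) / n

series = [8, 20, 11, 26, 4, 8, 6, 26, 14, 8, 4, 19, 14, 11]
print(round(weighted_mean(series), 2))  # 12.79
```

Weighting by the frequencies gives the same result as summing all 14 raw values and dividing by 14; the weighted form is simply more convenient when the data are already summarized in a frequency table.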


Case of continuous variables

We determine the center of a class by halving the sum of its
limits: the class mark $m_i$ of the class $[x_i, x_{i+1}[$ is
\[ m_i = \frac{x_i + x_{i+1}}{2} \]
The formula for the weighted arithmetic mean is:
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{k} n_i m_i \]
where $k$ is the number of classes and $m_i$ is the class mark of
the $i$-th class, whose frequency is $n_i$.


Example:

Class         $n_i$    $m_i$    $n_i m_i$
[5 − 15[      3        10       30
[15 − 25[     12       20       240
[25 − 35[     2        30       60
[35 − 45[     9        40       360
[45 − 55[     1        50       50
Total         27                740

\[ \bar{x} = \frac{740}{27} \approx 27.41 \]
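The same calculation can be sketched in Python. The representation of each class as a tuple `(lower limit, upper limit, frequency)` and the name `grouped_mean` are assumptions made for this illustration.

```python
def grouped_mean(classes):
    """Mean of grouped data; classes is a list of (lower limit, upper limit, frequency n_i)."""
    n = sum(n_i for _, _, n_i in classes)
    # Each class contributes n_i times its class mark m_i = (lower + upper) / 2.
    total = sum(n_i * (lower + upper) / 2 for lower, upper, n_i in classes)
    return total / n

classes = [(5, 15, 3), (15, 25, 12), (25, 35, 2), (35, 45, 9), (45, 55, 1)]
print(round(grouped_mean(classes), 2))  # 27.41
```

Because every observation in a class is replaced by the class mark, the result is an approximation of the mean of the underlying raw data.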


Mode:
Case of discrete variables
Definition: The mode is the modality $x_i$ with the highest
frequency $n_i$ (or the highest relative frequency $f_i$) in the
data.
Example: Consider the series of numbers
{5, 5, 5, 5, 10, 10, 15, 7, 7, 8, 13, 13, 13}. The most frequent value is
5 (it appears 4 times), so Mode = 5.
Case of continuous variables
Definition: The modal class of a continuous statistical series is
any class with the highest frequency. To calculate the mode (for
data grouped into classes of equal width), apply the following
formula:
\[ \text{Mode} = l_1 + \frac{f_1 - f_0}{(f_1 - f_0) + (f_1 - f_2)} \times i \]

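where $l_1$ is the lower limit of the modal class, $i$ is the class
width, $f_1$ is the frequency of the modal class, and $f_0$ and
$f_2$ are the frequencies of the classes immediately before and
after it.
Example: returning to the previous table, the modal class is
[15 − 25[, with $f_1 = 12$, $f_0 = 3$ and $f_2 = 2$:
\[ \text{Mode} = 15 + \frac{12 - 3}{(12 - 3) + (12 - 2)} \times 10 = 15 + \frac{9}{9 + 10} \times 10 \approx 19.74 \]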
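A minimal Python sketch of the grouped-data mode formula is given below. The function name `grouped_mode`, the tuple layout `(lower limit, upper limit, frequency)`, and the convention of treating a missing neighbouring class as having frequency 0 are assumptions made for this illustration; the slides do not cover the case where the modal class is first or last.

```python
def grouped_mode(classes):
    """Mode of data grouped into equal-width classes given in ascending order.

    classes: list of (lower limit, upper limit, frequency).
    """
    freqs = [f for _, _, f in classes]
    j = freqs.index(max(freqs))                      # first class with the highest frequency
    lower, upper, f1 = classes[j]
    f0 = freqs[j - 1] if j > 0 else 0                # class before (assumed 0 if none)
    f2 = freqs[j + 1] if j + 1 < len(freqs) else 0   # class after (assumed 0 if none)
    width = upper - lower
    return lower + (f1 - f0) / ((f1 - f0) + (f1 - f2)) * width

classes = [(5, 15, 3), (15, 25, 12), (25, 35, 2), (35, 45, 9), (45, 55, 1)]
print(round(grouped_mode(classes), 2))  # 19.74
```

When several classes share the highest frequency, this sketch uses the first one; the definition in the text allows any of them to be called a modal class.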
The median $Q_2$:
Definition: The median is the value of the variable $X$ that
divides the ordered series into two subsets of equal frequency.

Case of discrete variables

The determination of the median depends on the parity (even or
odd) of the total frequency $n$, the values being sorted in
ascending order.
When $n$ is odd, the median is the value at the
$\frac{n+1}{2}$-th position:
\[ Q_2 = x_{\frac{n+1}{2}} \]
When $n$ is even, the median is the average of the values at the
$\frac{n}{2}$-th and $\left(\frac{n}{2} + 1\right)$-th positions:
\[ Q_2 = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} \]

Example: Consider the series {20, 2, 15, 2, 2}. To find the median,
first arrange the series in ascending order, which gives
{2, 2, 2, 15, 20}. Since $n = 5$ is odd, the median is at position
$\frac{n+1}{2} = \frac{5+1}{2} = 3$. The third value of the sorted
series is 2, so $Q_2 = 2$.
Example: Consider the series {4, 8, 10, 13, 16, 16, 20, 50}. Here
$n = 8$ is even, so
\[ Q_2 = \frac{x_{\frac{n}{2}} + x_{\frac{n}{2}+1}}{2} = \frac{x_4 + x_5}{2} = \frac{13 + 16}{2} = 14.5 \]
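The odd and even cases translate directly into code. The sketch below uses an illustrative function name, `median`, and converts the 1-based positions of the formulas into Python's 0-based list indices.

```python
def median(data):
    """Median of a discrete series, following the odd/even rule."""
    values = sorted(data)   # the series must be in ascending order
    n = len(values)
    if n % 2 == 1:
        # Odd n: value at position (n + 1) / 2 (1-based), i.e. index (n + 1) // 2 - 1.
        return values[(n + 1) // 2 - 1]
    # Even n: average of the values at positions n/2 and n/2 + 1 (1-based).
    return (values[n // 2 - 1] + values[n // 2]) / 2

print(median([20, 2, 15, 2, 2]))                 # 2
print(median([4, 8, 10, 13, 16, 16, 20, 50]))    # 14.5
```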

Case of continuous variables

In this case, the median is calculated with the following formula:
\[ Q_2 = l_1 + \frac{\frac{n}{2} - c.f}{f} \times i \]


where:
$l_1$ = lower limit of the median class;
$n$ = total number of data points;
$c.f$ = cumulative frequency of the class preceding the median class;
$f$ = frequency of the median class;
$i$ = width of the median class.
Example: Calculation of the median when values are grouped into
classes of equal width.


Class limit    Class        $n_i$    Cumulative frequency
10                                   0
               [10 − 20[    3
20                                   3
               [20 − 30[    12
30                                   15
               [30 − 40[    2
40                                   17
               [40 − 50[    9
50                                   26
               [50 − 60[    2
60                                   28
Total                       28

Here $n/2 = 14$, so the 14th observation lies in the class
[20 − 30[, which is therefore the median class.

The value of the median is:
\[ Q_2 = 20 + \frac{14 - 3}{12} \times 10 \approx 29.17 \]
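The search for the median class and the interpolation inside it can be sketched as follows. The function name `grouped_median` and the tuple layout `(lower limit, upper limit, frequency)` are assumptions made for this illustration.

```python
def grouped_median(classes):
    """Median of grouped data; classes is a list of (lower limit, upper limit, frequency)
    given in ascending order."""
    n = sum(f for _, _, f in classes)
    if n == 0:
        raise ValueError("the series contains no observations")
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cf + f >= n / 2:
            # Median class found: interpolate linearly inside it.
            return lower + (n / 2 - cf) / f * (upper - lower)
        cf += f

classes = [(10, 20, 3), (20, 30, 12), (30, 40, 2), (40, 50, 9), (50, 60, 2)]
print(round(grouped_median(classes), 2))  # 29.17
```

The loop stops at the first class whose cumulative frequency reaches n/2, which is [20 − 30[ in the example, with 3 observations before it and 12 inside it.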
The quartiles:
Definition (see: Ambrosini, CH. (2011), p. 75): A quartile is one of
the 3 values of a statistical variable that split the total
population into 4 subsets of equal frequency. The 3 quartiles are
denoted $Q_1$, $Q_2$, and $Q_3$.
Remark: $Q_2$ is the median.


The variance:
Let $X$ be a statistical character that takes $k$ modalities $x_i$,
with associated frequencies $n_i$. The variance $\sigma^2(X)$ of
this series is:
\[ \sigma^2(X) = \frac{1}{n} \sum_{i=1}^{k} n_i (x_i - \bar{X})^2 \]
The standard deviation:
It is the square root of the variance:
\[ \sigma(X) = \sqrt{\sigma^2(X)} = \sqrt{\frac{1}{n} \sum_{i=1}^{k} n_i (x_i - \bar{X})^2} \]
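As an illustration, the two formulas can be applied to a frequency table stored as a dictionary. The sketch below reuses the written-exam scores from the bar chart example; the function names `variance` and `standard_deviation` are illustrative choices.

```python
from math import sqrt

def variance(table):
    """Variance of a series given as a dict mapping each modality x_i to its frequency n_i."""
    n = sum(table.values())
    mean = sum(n_i * x_i for x_i, n_i in table.items()) / n
    # Frequency-weighted average of the squared deviations from the mean.
    return sum(n_i * (x_i - mean) ** 2 for x_i, n_i in table.items()) / n

def standard_deviation(table):
    """Standard deviation: square root of the variance."""
    return sqrt(variance(table))

exam_scores = {7: 1, 8: 2, 9: 5, 11: 3, 12: 9, 16: 1}  # score -> frequency
print(variance(exam_scores), standard_deviation(exam_scores))
```

Note that the sum is divided by n, as in the definition above, and not by n − 1.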


Properties:
Let $a \in \mathbb{R}$, $b \in \mathbb{R}$ and $Y = aX + b$. Then:
1) $\sigma^2(Y) = \sigma^2(aX + b) = a^2 \sigma^2(X)$
2) $\sigma(Y) = \sigma(aX + b) = |a| \sigma(X)$
3) $\sigma^2(X) = \frac{1}{n} \sum_i n_i x_i^2 - (\bar{X})^2$
4) $\sigma^2(X) = \overline{X^2} - (\bar{X})^2$
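Property 3 is stated without proof. One way to obtain it, sketched here, is to expand the square in the definition of the variance and use $\frac{1}{n} \sum_i n_i x_i = \bar{X}$ and $\sum_i n_i = n$:

```latex
\begin{align*}
\sigma^2(X) &= \frac{1}{n} \sum_i n_i (x_i - \bar{X})^2 \\
            &= \frac{1}{n} \sum_i n_i x_i^2
               - 2 \bar{X} \cdot \frac{1}{n} \sum_i n_i x_i
               + (\bar{X})^2 \cdot \frac{1}{n} \sum_i n_i \\
            &= \frac{1}{n} \sum_i n_i x_i^2 - 2 (\bar{X})^2 + (\bar{X})^2 \\
            &= \frac{1}{n} \sum_i n_i x_i^2 - (\bar{X})^2
\end{align*}
```

Property 4 is the same identity written with $\overline{X^2} = \frac{1}{n} \sum_i n_i x_i^2$, the mean of the squared values.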
