AI_Chap2 (1)

Apprentissage non supervisé
(Unsupervised Learning)
Youcef CHIBANI, Prof., Dr.

[email protected]
Laboratoire d’Ingénierie des Systèmes Intelligents et Communicants
Faculté de Génie Electrique

Université des Sciences et de la Technologie
“Houari Boumediene”
Alger, Algérie
Prof. Youcef CHIBANI 1

Plan du cours
1. Introduction
2. Taxonomie des algorithmes d’apprentissage non supervisé
3. Algorithme K-Means
4. Applications

Introduction (1)
Collecte des données : Ensemble d’échantillons
Echantillon Caractéristique 1 (Feature #1)

Caractéristique 2 (Feature #2)
(Sample)
Echantillon #1 Caractéristique 1 Caractéristique 2

0 64281 11.21
1 54003 14.86 Cluster 1
2 65578 11.24 Cluster 2
3 51489 14.44
Caractéristique 2
4 53467 13.59
5 55920 14.70
6 41231 12.97
7 48570 13.33
8 64616 11.64
9 54917 14.33
10 59689 14.68 Cluster 3
11 61443 15.43
12 63892 10.28
13 62502 15.62
… … …
Caratéristique 1

Introduction (2)
Fruit • Couleur
• Texture
Algorithme de
regroupement
Données : Ensemble d’échantillons
Clusters
Apprentissage non supervisée pour le regroupement

Introduction (3)
Objectif de l’apprentissage non supervisé est de former des classes ou clusters
(groupes) contenant les échantillons qui se ressemblent en utilisant un critère ou une
mesure de similarité.
Cluster 1
Cluster 2 Le processus d’apprentissage non supervisé
est appelé Clustering ou regroupement.
Caractéristique 2
Cluster 3 La formation des clusters impose deux conditions :

• Définir le nombre de clusters à former.
• Choisir un algorithme de clustering.
Caractéristique 1

Taxonomie des algorithms d’apprentissage (1)
Algorithmes d’apprentissage sont réparties en quatre approches :

1. Clustering partionnel (Partional clustering)
2. Clustering hiérarchique (Hierarchical clustering)
3. Clustering fondé sur la densité (Density-based clustering)
4. Clustering fondé sur une loi de distribution (Distribution-based clustering)

Clustering partionnel (Partional clustering) est fondé sur la séparation des données
en utilisant les centroides des clusters.
Centroïde
Centroïde
Cluster 1
Cluster 2
Principe :
Caractéristique 2
Affectation de chaque échantillon au

centroïde le plus proche en utilisant
une mesure de similarité.
Cluster 3
Centroïde
Caractéristique 1
Difficulté :
Détermination des centroïdes.

Clustering hiérarchique (Hierarchical clustering) est fondé sur le regroupement
d’échantillons similaires dans des clusters en utilisant un regroupement
hiérarchique via une mesure de similarité.
Principe :
1. Au départ, chaque échantillon est considéré
comme un cluster.
Caractéristique 2
2. On cherche deux clusters les plus proches

pour former un nouveau cluster.
3. On répète l’opération plusieurs fois jusqu’à
former un seul cluster.
Caractéristique 1

Exemple de Clustering hiérarchique
Etape 1 :
Caractéristique 2
Caractéristique 2
Caratéristique 1 Caratéristique 1
Identification des clusters Regroupement des clusters

les plus proches. les plus proches

Etape 2 :
Caractéristique 2
Caractéristique 2
Caractéristique 1 Caractéristique 1


Etape 3 :
Caractéristique 2
Caractéristique 2


Etape 4 :
Caractéristique 2
Caractéristique 2


Etape 5 :
Caractéristique 2
Caractéristique 2


Caractéristique 2 2
1
4
3
6
5
1 2 4 3 5 6
Caractéristique 1 Dendogramme
(Représentation hiérarchique)
Combien de clusters ?

Clustering fondé sur la densité (Density-based clustering) permet de former

les clusters selon la densité (ou concentration) des échantillons en utilisant
une mesure de similarité et un seuil de décision.
Outliers
Caractéristique 2
Caractéristique 1

Clustering fondé sur la distribution (Distribution-based clustering) permet de

former les clusters en se basant sur l’hypothèse que les données appartenant à un
cluster obeïssent à une loi de distribution comme par exemple la loi gaussienne.
Chaque cluster est défini par une

distribution gaussienne ayant une moyenne
et une variance.
Caractéristique 2
Chaque échantillon est rattaché à un cluster en

sélectionnant la distance minimale par rapport à
la moyenne et la variance des distributions.
Caractéristique 1

K-Means Algorithm (1)
Principle: Assigning each sample to the nearest centroid using a similarity

measure.
Optimization criterion: Expectation-Maximization.
Algorithm:
Centroid
1: Define the number of K clusters.
Centroid Cluster 1 2: Initialize randomly the K centroids.
Cluster 2
3: Repeat:
4: Expectation: Assign each sample to the
Feature 2
nearest centroid.
5: Maximization : Calculate the new
Cluster 3 centroids using the average in each cluster.
Centroid 6: Until Stabilization of centroids
Feature 1

Notations :
• = , ,….., where defines the sample of the index = 1, … , and is the
number of samples.
• = , ,…, where is the cluster of the index = 2, … , and defines the
number of clusters.
• Each cluster is composed of a subset ={ , , ,…, } having samples.
• Each cluster has a centroid defined by the mean :
1
=

Initialize randomly the centroids: Random selection of samples used as centroids.
Feature 2
Feature 2
Feature 1 Feature 1

Assign samples to a cluster : Each sample is assigned to the nearest cluster defined by
the centroid using the similarity measure expressed via the Euclidian distance as:
, =
Feature 2
Feature 1

Calculate and shift the new centroids using the mean equation.
Feature 2
Feature 2
Feature 1 Feature 1

Recalculate and shift new centroids until stabilization.
Feature 2
Feature 2
Feature 1 Feature 1

Drawback of the K-Means algorithm: Random selection of the centroids to start the
algorithm.
Solution : Running the algorithm many times to determine the best number of clusters by
minimizing the sum of distances of samples to its centroid called inertia based on the Sum
of Squared Errors (SSE).
Notation:
={ , , ,…, }, : Number of samples in a cluster .
Calculation of the inertia for a cluster:

Feature 2
= ( , )
Calculation of the inertia for all clusters:
= + +…+
Feature 1

Feature 2
Feature 2
Feature 2
Feature 1 Feature 1 Feature 1
First running Second running Third running
= 180 = 80 = 100

Finding the number of clusters via the Elbow method
Elbow algorithm:
1: Define the range of the cluster’s number.
2: Calculate the inertia (SSE) for each number of clusters.
3: Plot the inertia versus the number of clusters.
4: Select the optimal number of clusters by searching the Elbow of the curve.
Elbow Curve
600
500
400
SSE (Inertia)
300
200
100
0
2 3 4 5 6 7 8 9 10 11 12
K Clusters

Finding the number of clusters via the Silhouette method
Notations :
• = , ,….., where defines the sample of the index = 1, … , and is the
number of samples.
• = , ,…, where is the uster of the index = 2, … , and defines the
number of clusters
• Each cluster is composed of a subset of samples namely .

Finding the number of clusters via the Silhouette score
• Silhouette score is used to evaluate the quality of clusters
when using the K-Means algorithm.
• Silhouette score is calculated for each sample using two

Cluster 1
mean distances namely ( ) and , which is defined:
()
=
max{ , )}
Feature 2
( ): Mean distance between the sample and all other

samples in the same cluster called the mean intra-cluster
distance.
( ): Mean distance between the sample and all other
samples of the nearest cluster called the mean nearest-
Cluster 2 cluster distance.
The mean Silhouette score is then deduced for all samples as:
Feature 1
+1
+1: Good separation.

0: Bad separation (Overlapping of clusters).
1: Samples are assigned to the wrong clusters.
Illustrative example Sample Cluster
1
1
1
Cluster 1
2
2
2
Feature 2
Distance Table Score Table

() () ()
Cluster 2
0.0 0.2 0.3 1.2 1.4 1.5 0.25 1.37 0.82
0.2 0.0 0.1 1.1 1.2 0.9 0.15 1.07 0.86

Feature 1
0.3 0.1 0.0 1.6 1.8 1.4 0.2 1.60 0.88
1.2 1.1 1.6 0.0 0.4 0.8 0.6 1.3 0.54
1.4 1.2 1.8 0.4 0.0 0.6 0.5 1.47 0.66
1.5 0.9 1.4 0.8 0.6 0.0 0.7 1.27 0.45
Mean Score 0.70

Finding the number of clusters via the Silhouette method
Silhouette algorithm:
1: Define the range of the cluster’s number.
2: Calculate the mean distance of the points belonging to the same cluster.
3: Calculate the mean distance of the points belonging to other clusters.
4: Calculate the Silhouette score.
5: Plot the Silhouette score versus the cluster’s number.
6: Select the optimal cluster’s number by finding the maximum score.
Score Silhouette Curve

0.7
0.6
0.5
Score
0.4
0.3
0.2
0.1
0
2 3 4 5 6 7 8 9 10 11 12
K Clusters

Application
BTS distribution in Algeria

AI_Chap2 (1)

Transféré par

Droits d'auteur :

Formats disponibles

AI_Chap2 (1)

Transféré par

Informations du document

Copyright

Formats disponibles

Partager ce document

Partager ou intégrer le document

Options de partage

Avez-vous trouvé ce document utile ?

Ce contenu est-il inapproprié ?

Droits d'auteur :

Formats disponibles

AI_Chap2 (1)

Transféré par

Droits d'auteur :

Formats disponibles

Apprentissage non supervisé

Youcef CHIBANI, Prof., Dr.

Laboratoire d’Ingénierie des Systèmes Intelligents et Communicants

Faculté de Génie Electrique

Prof. Youcef CHIBANI 1

Prof. Youcef CHIBANI 2

Echantillon Caractéristique 1 (Feature #1)

Echantillon #1 Caractéristique 1 Caractéristique 2

Prof. Youcef CHIBANI 3

Données : Ensemble d’échantillons

Apprentissage non supervisée pour le regroupement

Prof. Youcef CHIBANI 4

Cluster 3 La formation des clusters impose deux conditions :

Prof. Youcef CHIBANI 5

Algorithmes d’apprentissage sont réparties en quatre approches :

Prof. Youcef CHIBANI 6

Affectation de chaque échantillon au

Prof. Youcef CHIBANI 7

2. On cherche deux clusters les plus proches

Prof. Youcef CHIBANI 8

Identification des clusters Regroupement des clusters

Prof. Youcef CHIBANI 9

Identification des clusters Regroupement des clusters

Prof. Youcef CHIBANI 10

Identification des clusters Regroupement des clusters

Prof. Youcef CHIBANI 11

Identification des clusters Regroupement des clusters

Prof. Youcef CHIBANI 12

Identification des clusters Regroupement des clusters

Prof. Youcef CHIBANI 13

Prof. Youcef CHIBANI 14

Clustering fondé sur la densité (Density-based clustering) permet de former

Prof. Youcef CHIBANI 15

Clustering fondé sur la distribution (Distribution-based clustering) permet de

Chaque cluster est défini par une

Chaque échantillon est rattaché à un cluster en

Prof. Youcef CHIBANI 16

Principle: Assigning each sample to the nearest centroid using a similarity

Optimization criterion: Expectation-Maximization.

Centroid 6: Until Stabilization of centroids

Prof. Youcef CHIBANI 17

• Each cluster has a centroid defined by the mean :

Prof. Youcef CHIBANI 18

Prof. Youcef CHIBANI 19

Prof. Youcef CHIBANI 20

Prof. Youcef CHIBANI 21

Prof. Youcef CHIBANI 22

={ , , ,…, }, : Number of samples in a cluster .

Calculation of the inertia for a cluster:

Calculation of the inertia for all clusters:

Prof. Youcef CHIBANI 23

First running Second running Third running

Prof. Youcef CHIBANI 24

Prof. Youcef CHIBANI 25

Prof. Youcef CHIBANI 26

• Silhouette score is calculated for each sample using two

( ): Mean distance between the sample and all other

+1: Good separation.

Distance Table Score Table

0.2 0.0 0.1 1.1 1.2 0.9 0.15 1.07 0.86