KMeans Clustering Bidimensional Daniel Ames Camayo
June 7, 2024
1 LABORATORY GUIDE
1.1 SURNAMES, Given names: Ames Camayo Daniel Vides
Date: July 7, 2024
2 STANDARD LIBRARIES:
[ ]: import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns  # needed by the sns.countplot and sns.scatterplot cells
3 CUSTOMIZED LIBRARIES:
[ ]: from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler
4 DATA EXTRACTION:
[ ]: pd.Timestamp.today().strftime('%Y-%m-%d %H:%M:%S')  # Capture the current date and time
[ ]: '2024-06-07 05:59:10'
Saving ClusteringBidimensional.csv to ClusteringBidimensional (4).csv
2 3 0.0119 28 0.1439 1
3 4 0.5362 76 0.2878 0
4 5 0.8096 14 0.1641 1
5 6 0.5323 35 0.1502 1
6 7 0.7870 90 0.0360 2
7 8 0.9125 31 0.1158 1
8 9 0.6307 91 0.0601 2
9 10 0.8131 22 0.0595 1
10 11 0.7874 17 0.1261 1
11 12 0.0579 70 0.0996 2
12 13 0.3477 37 0.0843 1
13 14 0.1617 24 0.0644 1
14 15 0.9018 81 0.3518 0
15 16 0.8550 32 0.1218 1
16 17 0.9065 22 0.1765 1
17 18 0.2867 63 0.0383 2
18 19 0.7480 28 0.1585 1
19 20 0.7381 99 0.0357 2
20 21 0.4254 22 0.1685 1
21 22 0.0550 70 0.1002 2
22 23 0.7066 19 0.0931 1
23 24 0.1206 35 0.0981 1
24 25 0.5469 75 0.3965 0
25 26 0.7344 18 0.0561 1
26 27 0.8721 17 0.1319 1
27 28 0.2066 82 0.0496 2
28 29 0.1000 28 0.1406 1
29 30 0.5946 98 0.0678 2
30 31 0.1220 32 0.1686 1
31 32 0.5110 82 0.4803 0
32 33 0.2753 86 0.4155 0
33 34 0.7862 80 0.4860 0
34 35 0.1394 94 0.4062 0
35 36 0.6563 70 0.4168 0
36 37 0.4410 74 0.3937 0
37 38 0.4346 85 0.0789 2
38 39 0.8532 74 0.3001 0
39 40 0.0828 69 0.0394 2
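The cell that builds `df` does not survive in this export. A minimal sketch of that step, assuming the uploaded file is read with `pd.read_csv` and that it contains `Risk` and `Return` columns (the only column names confirmed by later cells). The in-memory text below is a stand-in for the uploaded CSV and reuses three rows of the table above, reading the integer column as `Risk` and the following decimal column as `Return`:

```python
import io
import pandas as pd

# Stand-in for the uploaded CSV. Only 'Risk' and 'Return' are column names
# confirmed by the notebook; the three rows are copied from the table above.
csv_text = """Risk,Return
28,0.1439
76,0.2878
90,0.0360
"""

# With the real file, the name of the uploaded CSV would be passed instead
# of the StringIO object.
df_sample = pd.read_csv(io.StringIO(csv_text))

print(df_sample.dtypes)
print(df_sample.head())
```

Checking `dtypes` right after loading is a cheap way to confirm that both columns were parsed as numbers before scaling or clustering.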
5 NumPy ARRAY
[ ]: import numpy as np
array = np.array([1, 2, 3, 4, 5])
print(array)
[1 2 3 4 5]
6 E.D.A - Exploratory Data Analysis: Descriptive Statistical Analysis
[ ]: df.describe()
8 Univariate Analysis of Qualitative Variables: Analysis of each nominal and ordinal categorical variable independently
[ ]: sns.countplot(x='Risk', data=df)  # countplot draws one bar per distinct value, so the column should be categorical
9 Bivariate and Multivariate Analysis: Analysis of each variable “x(i)” versus the target variable “y”
[ ]: sns.scatterplot(x='Risk', y='Return', data=df)
10 TRANSFORMATION AND/OR CLEANING - PREPROCESSING: ETL
[ ]: df_no_normalizado = df.copy()
scaler = MinMaxScaler(feature_range=(0, 1))
df[['Risk', 'Return']] = scaler.fit_transform(df[['Risk', 'Return']])
df
11 12 0.0579 0.674157 0.141905 2
12 13 0.3477 0.303371 0.107928 1
13 14 0.1617 0.157303 0.063735 1
14 15 0.9018 0.797753 0.701976 0
15 16 0.8550 0.247191 0.191206 1
16 17 0.9065 0.134831 0.312680 1
17 18 0.2867 0.595506 0.005774 2
18 19 0.7480 0.202247 0.272707 1
19 20 0.7381 1.000000 0.000000 2
20 21 0.4254 0.134831 0.294915 1
21 22 0.0550 0.674157 0.143238 2
22 23 0.7066 0.101124 0.127471 1
23 24 0.1206 0.280899 0.138574 1
24 25 0.5469 0.730337 0.801244 0
25 26 0.7344 0.089888 0.045303 1
26 27 0.8721 0.078652 0.213635 1
27 28 0.2066 0.808989 0.030868 2
28 29 0.1000 0.202247 0.232956 1
29 30 0.5946 0.988764 0.071286 2
30 31 0.1220 0.247191 0.295137 1
31 32 0.5110 0.808989 0.987342 0
32 33 0.2753 0.853933 0.843438 0
33 34 0.7862 0.786517 1.000000 0
34 35 0.1394 0.943820 0.822785 0
35 36 0.6563 0.674157 0.846325 0
36 37 0.4410 0.719101 0.795026 0
37 38 0.4346 0.842697 0.095936 2
38 39 0.8532 0.719101 0.587164 0
39 40 0.0828 0.662921 0.008217 2
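`MinMaxScaler` with `feature_range=(0, 1)` maps each column through (x - min) / (max - min). A minimal sketch that checks this by hand on a small sample. The `Return` extremes 0.0357 and 0.4860 and the `Risk` maximum 99 appear in the tables of this notebook; the `Risk` minimum of 10 is an assumption inferred from the scaled values, since the first rows of the table are not shown:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Small illustrative sample, not the full dataset. Risk = 10 is an assumed
# minimum; the remaining values appear in the notebook's tables.
sample = pd.DataFrame({
    'Risk': [10.0, 37.0, 70.0, 99.0],
    'Return': [0.0357, 0.0843, 0.0996, 0.4860],
})

# Manual min-max scaling, column by column: (x - min) / (max - min).
manual = (sample - sample.min()) / (sample.max() - sample.min())

# The same transformation through scikit-learn.
scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(sample)

print(manual)
print(np.allclose(manual.to_numpy(), scaled))
```

Because the scaler learns the minimum and maximum from the data it is fitted on, the scaled values depend on which rows are present when `fit_transform` is called.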
/usr/local/lib/python3.10/dist-packages/sklearn/cluster/_kmeans.py:870:
FutureWarning: The default value of `n_init` will change from 10 to 'auto' in
1.4. Set the value of `n_init` explicitly to suppress the warning
warnings.warn(
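The cell that produced this warning, and that created the `Cluster_identificado` column used in the plots, is missing from the export. A minimal sketch of what such a step could look like, assuming K-means with three clusters on the two scaled columns. The data values are hypothetical, and `n_init=10` and `random_state=0` are choices made here (the first silences the `FutureWarning`, the second makes the run repeatable), not settings taken from the notebook:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical scaled sample with three well-separated groups.
data = pd.DataFrame({
    'Risk':   [0.10, 0.15, 0.20, 0.80, 0.85, 0.90, 0.75, 0.80, 0.85],
    'Return': [0.10, 0.20, 0.15, 0.05, 0.10, 0.02, 0.85, 0.90, 0.80],
})

num_clusters = 3

# n_init is passed explicitly so scikit-learn does not emit the FutureWarning.
kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)

# fit_predict fits the model and returns one cluster label per row.
data['Cluster_identificado'] = kmeans.fit_predict(data[['Risk', 'Return']])

# One centroid per cluster, in the same (Risk, Return) coordinates.
centroids = kmeans.cluster_centers_

print(data)
print(centroids)
```

The label numbers themselves are arbitrary: K-means may call the same group 0 in one run and 2 in another, so only the grouping is meaningful.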
13 INTERPRETATION: Results
[ ]: # Plot the clusters on a scatter plot
plt.scatter(df['Risk'], df['Return'], c=df['Cluster_identificado'], cmap='viridis')
[ ]: from scipy.spatial import ConvexHull
from matplotlib.patches import Polygon
hull = ConvexHull(cluster_points)
polygon = plt.Polygon(cluster_points[hull.vertices], edgecolor='blue', linewidth=1, fill=None)
ax.add_patch(polygon)
# Create a legend
legend_elements = scatter.legend_elements()[0]
legend_labels = ['Cluster {}'.format(i) for i in range(num_clusters)]
ax.legend(legend_elements, legend_labels, loc='upper left')
# Plot the cluster centers as red circles
ax.scatter(centroids[:, 0], centroids[:, 1], marker='o', s=200, c='red')
ax.add_patch(polygon)
# Create a legend
legend_elements = scatter.legend_elements()[0]
legend_labels = ['Cluster {}'.format(i) for i in range(num_clusters)]
ax.legend(legend_elements, legend_labels, loc='upper left')
[ ]: %%time
# Perform clustering using K-means (the %%time cell magic must be the first line of the cell)
num_clusters = 3
kmeans2 = KMeans(n_clusters=num_clusters)
kmeans2.fit(df_no_normalizado[['Risk', 'Return']])
/usr/local/lib/python3.10/dist-packages/sklearn/cluster/_kmeans.py:870:
FutureWarning: The default value of `n_init` will change from 10 to 'auto' in
1.4. Set the value of `n_init` explicitly to suppress the warning
warnings.warn(
CPU times: user 572 ms, sys: 45.8 ms, total: 618 ms
Wall time: 644 ms
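Fitting K-means on the unscaled columns lets the feature with the larger numeric range dominate the Euclidean distance. A quick check on nine rows copied from the table in section 4, assuming (as the scaled output suggests) that the integer column is `Risk` and the decimal column after it is `Return`:

```python
import numpy as np

# Nine (Risk, Return) pairs copied from the unscaled table in section 4.
rows = np.array([
    [28, 0.1439], [76, 0.2878], [14, 0.1641],
    [35, 0.1502], [90, 0.0360], [31, 0.1158],
    [91, 0.0601], [22, 0.0595], [17, 0.1261],
])

# Range (max - min) of each column in this sample.
risk_range, return_range = rows.max(axis=0) - rows.min(axis=0)

# How many times wider the Risk axis is than the Return axis.
ratio = risk_range / return_range

print(risk_range, return_range, ratio)
```

With a ratio in the hundreds, differences in `Return` contribute almost nothing to the distances, so the unscaled fit behaves much like a one-dimensional clustering on `Risk`.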
for cluster in range(num_clusters):
    cluster_points = df_no_normalizado[df_no_normalizado['Cluster_identificado'] == cluster][['Risk', 'Return']].values
    hull = ConvexHull(cluster_points)
    polygon = plt.Polygon(cluster_points[hull.vertices], edgecolor='blue', linewidth=1, fill=None)
    ax.add_patch(polygon)
# Create a legend
legend_elements = scatter.legend_elements()[0]
legend_labels = ['Cluster {}'.format(i) for i in range(num_clusters)]
ax.legend(legend_elements, legend_labels, loc='upper left')
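The plotting cells above survive only as fragments: `ax`, `scatter`, and `centroids` are used but their definitions are not shown. A self-contained sketch of how the pieces could fit together, on hypothetical random data rather than the notebook's dataset. The blob centers, `n_init=10`, `random_state=0`, the `Agg` backend, and the output file name are all assumptions made for this example:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for scripts; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
from scipy.spatial import ConvexHull
from sklearn.cluster import KMeans

# Hypothetical data: three compact blobs of ten points each in [0, 1] x [0, 1].
rng = np.random.default_rng(0)
blob_centers = np.array([[0.2, 0.2], [0.8, 0.1], [0.7, 0.8]])
points = np.vstack([c + rng.normal(scale=0.05, size=(10, 2)) for c in blob_centers])

num_clusters = 3
kmeans = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)
labels = kmeans.fit_predict(points)
centroids = kmeans.cluster_centers_

fig, ax = plt.subplots()

# Points colored by cluster label.
scatter = ax.scatter(points[:, 0], points[:, 1], c=labels, cmap='viridis')

# Outline each cluster with the convex hull of its points.
for cluster in range(num_clusters):
    cluster_points = points[labels == cluster]
    hull = ConvexHull(cluster_points)  # needs at least 3 non-collinear points
    polygon = plt.Polygon(cluster_points[hull.vertices], edgecolor='blue',
                          linewidth=1, fill=False)
    ax.add_patch(polygon)

# Plot the cluster centers as red circles.
ax.scatter(centroids[:, 0], centroids[:, 1], marker='o', s=200, c='red')

# Create a legend with one entry per cluster.
legend_elements = scatter.legend_elements()[0]
legend_labels = ['Cluster {}'.format(i) for i in range(num_clusters)]
ax.legend(legend_elements, legend_labels, loc='upper left')

ax.set_xlabel('Risk')
ax.set_ylabel('Return')
fig.savefig('clusters_sketch.png')
```

`hull.vertices` lists the hull points in counterclockwise order for 2-D input, which is why they can be passed directly to `Polygon` without sorting.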