K Means
Machine Learning
S4 2023-2024
Lab (K-means)
Exercise 1:
Implement K-means clustering in Python:
Create a custom dataset with make_blobs (from sklearn.datasets) and plot it (type
the following code):
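import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
X, y = make_blobs(n_samples=500, n_features=2, centers=3, random_state=23)  # assumed parameters
fig = plt.figure(0)
plt.grid(True)
plt.scatter(X[:,0],X[:,1])
plt.show()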
The algorithm
1) Initialize the k cluster centers and plot them with the data points.
k = 3  # assumed number of clusters (not fixed by the handout)
clusters = {}
np.random.seed(23)
for i in range(k):
    # Random center with each coordinate in [-2, 2)
    center = 2*(2*np.random.random((X.shape[1],))-1)
    cluster = {'center' : center,'points' : [] }
    clusters[i] = cluster
print(clusters)
plt.scatter(X[:,0],X[:,1])
plt.grid(True)
for i in clusters:
    center = clusters[i]['center']
    plt.scatter(center[0],center[1],marker = '*',c = 'red')
plt.show()
2) For each observation:
o Calculate the distance between each observation and the k center points.
def distance(p1,p2):
    return np.sqrt(np.sum((p1-p2)**2))
o Assign the observation to the cluster of the nearest center point.
def assign_clusters(X, clusters):
    for i in range(X.shape[0]):
        dist = []
        curr_x = X[i]
        # Distance from the current observation to every center
        for j in range(len(clusters)):
            dis = distance(curr_x,clusters[j]['center'])
            dist.append(dis)
        # Index of the nearest center
        curr_cluster = np.argmin(dist)
        clusters[curr_cluster]['points'].append(curr_x)
    return clusters
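The two sub-steps above (compute distances, then pick the nearest center) can also be written without explicit loops. The following is only a sketch using NumPy broadcasting; the function name nearest_center and the small demo arrays are hypothetical and not part of the lab code.

```python
import numpy as np

def nearest_center(X, centers):
    """Return, for each row of X, the index of the closest row of centers."""
    # diff[i, j, :] = X[i] - centers[j], obtained by broadcasting
    diff = X[:, np.newaxis, :] - centers[np.newaxis, :, :]
    # Euclidean distances, shape (n_samples, n_centers)
    dists = np.sqrt((diff ** 2).sum(axis=2))
    return dists.argmin(axis=1)

# Hypothetical toy data: two points near the origin, one near (5, 5)
X_demo = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0]])
centers_demo = np.array([[0.0, 0.0], [4.0, 4.0]])
print(nearest_center(X_demo, centers_demo))  # [0 0 1]
```

The loop version is easier to follow step by step; the broadcast version is usually faster on large datasets because the distance computation happens inside NumPy.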
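3) Move each center point to the mean of the observations assigned to its cluster.
def update_clusters(X, clusters):
    for i in range(len(clusters)):
        points = np.array(clusters[i]['points'])
        if points.shape[0] > 0:
            # New center = mean of the assigned observations
            new_center = points.mean(axis=0)
            clusters[i]['center'] = new_center
        # Empty the list so the next assignment pass starts fresh
        clusters[i]['points'] = []
    return clusters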
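Steps 2 and 3 are normally repeated until the centers stop moving. The handout does not show this loop, so the following is a minimal, self-contained sketch under some assumptions: the function name kmeans, the parameters max_iter and tol, and the choice to start from k randomly chosen data points (instead of random coordinates as in step 1) are all illustrative.

```python
import numpy as np

def kmeans(X, k, max_iter=100, tol=1e-6, seed=23):
    """Alternate assignment and update steps until the centers stabilize."""
    rng = np.random.default_rng(seed)
    # Assumption: initial centers are k distinct observations from X
    centers = X[rng.choice(X.shape[0], size=k, replace=False)].astype(float)

    def assign(centers):
        # Distance of every observation to every center, then nearest index
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        return dists.argmin(axis=1)

    for _ in range(max_iter):
        labels = assign(centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if members.shape[0] > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:  # centers barely moved: stop
            break
    return centers, assign(centers)

# Hypothetical toy data: two well-separated groups of three points
X_toy = np.array([[0, 0], [0, 1], [1, 0],
                  [10, 10], [10, 11], [11, 10]], dtype=float)
centers_toy, labels_toy = kmeans(X_toy, k=2)
print(centers_toy)
print(labels_toy)
```

With the lab dataset, the call would be kmeans(X, k), and the returned labels can be passed to plt.scatter through the c argument to color the points by cluster.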
Exercise 2:
Load the Iris dataset. Build and train a K-means model (use KMeans from
sklearn.cluster, with k=5).
Find the optimal number of clusters (k) in the dataset (plot an elbow curve for
k in range(1,10)).
Build the model with the optimal k-value.
Find the cluster centers.
Predict the cluster of each observation.
Predict the cluster of this observation: [[0.8, 0.8, 0.8, 0.8]].
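One possible way to go through the steps of Exercise 2 is sketched below. It is not a reference solution: the values n_init=10 and random_state=0, the helper plot_elbow, and the choice best_k = 3 are assumptions made for illustration, and the elbow should be read from your own plot.

```python
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans

X_iris = load_iris().data  # 150 observations, 4 features

# First model, with k=5 as asked in the exercise
model5 = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_iris)

# Elbow curve data: inertia (within-cluster sum of squares) for k = 1..9
ks = list(range(1, 10))
inertias = [
    KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_iris).inertia_
    for k in ks
]

def plot_elbow(ks, inertias):
    """Hypothetical helper: plot inertia against k."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it
    plt.plot(ks, inertias, marker='o')
    plt.xlabel('k')
    plt.ylabel('inertia')
    plt.grid(True)
    plt.show()

# Assumption: the elbow is read at k=3; adjust after looking at the curve
best_k = 3
model = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X_iris)

centers = model.cluster_centers_                       # one row per cluster
labels = model.predict(X_iris)                         # cluster of each observation
new_label = model.predict([[0.8, 0.8, 0.8, 0.8]])      # cluster of the new observation
print(centers)
print(new_label)
```

Calling plot_elbow(ks, inertias) displays the curve. Note that the cluster indices returned by predict are arbitrary labels and do not correspond to the Iris species numbering.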