K Means


ESTIN

Machine Learning
S4 2023-2024

Lab (K-means)

Exercise 1:
Implement K-means clustering using Python:

Create the custom dataset with make_blobs (from sklearn.datasets) and plot it (type
the following code):

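import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# 500 two-dimensional points drawn around 3 centers
X,y = make_blobs(n_samples = 500,n_features = 2,centers = 3,random_state = 1)

fig = plt.figure(0)
plt.grid(True)
plt.scatter(X[:,0],X[:,1])
plt.show()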

The algorithm

1) Initialize the k cluster centers and plot them with data points.
k = 3  # number of clusters (matches centers = 3 in make_blobs)
clusters = {}
np.random.seed(23)
for i in range(k):
    # random center with each coordinate in [-2, 2)
    center = 2*(2*np.random.random((X.shape[1],))-1)
    cluster = {'center' : center,'points' : [] }
    clusters[i] = cluster
print(clusters)

plt.scatter(X[:,0],X[:,1])
plt.grid(True)
for i in clusters:
    center = clusters[i]['center']
    plt.scatter(center[0],center[1],marker = '*',c = 'red')
plt.show()
2) For each observation:
   o Calculate the distance between each observation and the k center points.
def distance(p1,p2):
    # Euclidean distance between two points
    return np.sqrt(np.sum((p1-p2)**2))
   o Assign the observation to the cluster of the nearest center point.
def assign_clusters(X, clusters):
    for i in range(X.shape[0]):
        dist = []
        curr_x = X[i]
        for j in range(k):
            dis = distance(curr_x,clusters[j]['center'])
            dist.append(dis)
        curr_cluster = np.argmin(dist)
        clusters[curr_cluster]['points'].append(curr_x)
    return clusters
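
As a quick check of the distance and nearest-center logic, here is a minimal worked example. The point and the two centers are hypothetical values chosen so the result is easy to verify by hand; they are not part of the lab data.

```python
import numpy as np

def distance(p1, p2):
    # Euclidean distance, same formula as in the lab
    return np.sqrt(np.sum((p1 - p2) ** 2))

# Hypothetical observation and centers (illustration only)
point = np.array([3.0, 4.0])
centers = [np.array([0.0, 0.0]), np.array([3.0, 5.0])]

# Distance from the point to every center, then index of the smallest one
dists = [float(distance(point, c)) for c in centers]
nearest = int(np.argmin(dists))
print(dists, nearest)  # [5.0, 1.0] 1
```

The point lies 5 units from the first center and 1 unit from the second, so it is assigned to cluster 1.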
3) The center points are moved to the means (i.e., centers) of their respective clusters.
def update_clusters(X, clusters):
    for i in range(k):
        points = np.array(clusters[i]['points'])
        if points.shape[0] > 0:
            new_center = points.mean(axis =0)
            clusters[i]['center'] = new_center
        # empty the list so the next assignment pass starts fresh
        clusters[i]['points'] = []
    return clusters

4) Steps 2) and 3) are repeated until no observation changes in cluster membership.
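
The handout gives no code for this step, so the following is only one possible sketch of the stopping rule. It is a self-contained, vectorized version that stores centers in an array instead of the clusters dictionary; the function name run_kmeans, the max_iter safety cap, and the small demo data are assumptions made for illustration.

```python
import numpy as np

def run_kmeans(X, init_centers, max_iter=100):
    """Repeat assignment and update until cluster membership stops changing."""
    centers = np.asarray(init_centers, dtype=float).copy()
    labels = None
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        # Step 2: distance from every point to every center, then nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: stop once no observation has changed cluster
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each non-empty cluster's center to the mean of its points
        for j in range(len(centers)):
            members = X[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers, labels, n_iter

# Hypothetical demo data: two well-separated groups of three points
X_demo = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
init = np.array([[0.0, 0.0], [10.0, 10.0]])
centers, labels, n_iter = run_kmeans(X_demo, init)
print(labels, n_iter)
```

With the lab's own functions, the same idea amounts to calling assign_clusters and then update_clusters in a loop, and stopping when the labels returned by pred_cluster (step 5) are identical on two consecutive passes.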


5) Predict the cluster for the data points.
def pred_cluster(X, clusters):
    pred = []
    for i in range(X.shape[0]):
        dist = []
        for j in range(k):
            dist.append(distance(X[i],clusters[j]['center']))
        # index of the nearest center is the predicted cluster
        pred.append(np.argmin(dist))
    return pred

• Plot the data points, colored by predicted cluster, together with the cluster centers.


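# call the function first: c expects one label per point, not the function itself
pred = pred_cluster(X, clusters)
plt.scatter(X[:,0],X[:,1],c = pred)
for i in clusters:
    center = clusters[i]['center']
    plt.scatter(center[0],center[1],marker = '+',c = 'red')
plt.show()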

Exercise 2:

• Load the Iris dataset. Build and train a K-means model (use KMeans from
sklearn.cluster, with k=5).
• Find the optimal number of clusters (k) for the dataset (plot an elbow curve for
K=range(1,10)).
• Build the model with the optimal k value.
• Find the cluster centers.
• Predict the cluster of each observation.
• Predict the cluster of this observation: [[0.8, 0.8, 0.8, 0.8]].
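
One possible outline of this workflow with scikit-learn is sketched below. It is not the expected solution: the variable names, n_init=10, random_state=0, and above all the choice best_k = 3 are assumptions, and the value of k should be read from your own elbow curve. The initial k=5 model and the plot itself are left out to keep the sketch short.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X_iris = load_iris().data  # 150 observations, 4 features

# Inertia (within-cluster sum of squared distances) for each candidate k
K = range(1, 10)
inertias = []
for n in K:
    model = KMeans(n_clusters=n, n_init=10, random_state=0).fit(X_iris)
    inertias.append(model.inertia_)

# Assumption: the elbow is read at k = 3; check this against your own curve
best_k = 3
kmeans = KMeans(n_clusters=best_k, n_init=10, random_state=0).fit(X_iris)

centers = kmeans.cluster_centers_   # one row per cluster, one column per feature
labels = kmeans.predict(X_iris)     # cluster index of each observation
new_label = kmeans.predict([[0.8, 0.8, 0.8, 0.8]])
print(centers.shape, labels[:5], new_label)
```

To draw the elbow curve, plot list(K) against inertias with plt.plot and look for the value of k after which the inertia stops dropping sharply.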
