The document defines functions for preprocessing data, applying clustering algorithms, and evaluating clustering performance. It loads clustering algorithms like MiniBatchKMeans, SpectralClustering, and OPTICS. For different datasets, it applies the algorithms, calculates the Davies-Bouldin Index metric, and identifies the best performing algorithm and number of clusters. It plots the clustered data and prints the results.
import pandas as pd
from sklearn.cluster import MiniBatchKMeans, SpectralClustering, OPTICS
from sklearn.metrics import davies_bouldin_score
import matplotlib.pyplot as plt
# Data preprocessing function
def preprocess_data(file_path):
    # Reading the data from CSV
    data = pd.read_csv(file_path, header=None)
    # Normalizing the data (z-score standardization)
    data_normalized = (data - data.mean()) / data.std()
    return data_normalized
# Function to apply clustering and calculate the Davies-Bouldin Index
def apply_clustering_and_evaluate(data, algorithm):
    # Applying the clustering algorithm to the data
    labels = algorithm.fit_predict(data)
    # Calculating Davies-Bouldin Index
    db_index = davies_bouldin_score(data, labels)
    return labels, db_index
# Clustering algorithms to compare (parameter values are representative;
# the original settings are not shown in the source)
algorithms = {
    'MiniBatchKMeans': MiniBatchKMeans(n_clusters=3),
    'SpectralClustering': SpectralClustering(n_clusters=3),
    'OPTICS': OPTICS()
}

for path in [
    r"C:\Users\Vishal\Desktop\DM2\D01.csv",
    r"C:\Users\Vishal\Desktop\DM2\D02.csv",
    r"C:\Users\Vishal\Desktop\DM2\D03.csv"
]:
    data = preprocess_data(path)

    # Tracking the best (lowest) Davies-Bouldin Index across algorithms
    best_davies_bouldin_score = float('inf')
    best_algorithm_name = None
    best_labels = None

    for name, alg in algorithms.items():
        labels, db_index = apply_clustering_and_evaluate(data, alg)
        if db_index < best_davies_bouldin_score:
            best_davies_bouldin_score = db_index
            best_algorithm_name = name
            best_labels = labels

    # Counting number of clusters
    num_clusters = len(set(best_labels))

    # Plotting the results
    plt.scatter(data[0], data[1], c=best_labels, cmap='viridis', marker='o')
    plt.title(f'{best_algorithm_name} - Davies-Bouldin Index: '
              f'{best_davies_bouldin_score:.2f}, Number of Clusters: {num_clusters}')
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.show()
    print(f"The best algorithm for {path} is {best_algorithm_name} with a "
          f"Davies-Bouldin Index of {best_davies_bouldin_score:.2f} and "
          f"{num_clusters} clusters")
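As a standalone illustration of how the Davies-Bouldin Index ranks clusterings (lower scores indicate better-separated, more compact clusters), the sketch below scores KMeans at two candidate cluster counts on synthetic blob data. The synthetic dataset and the choice of k values are illustrative assumptions, not taken from the original CSV files:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Synthetic data with three well-separated clusters (illustrative only)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=42)

# Score KMeans at two candidate cluster counts
scores = {}
for k in (2, 3):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = davies_bouldin_score(X, labels)

# On this data the true k=3 should yield the lower (better) index
print(scores)
```

The same pattern generalizes to comparing different algorithms, as in the loop above: compute the index for each candidate and keep the one with the minimum score.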