Davies-Bouldin Index for Clustering Evaluation
Clustering DS practice problem on Onlearn.
Difficulty: medium.
Topics: Davies-Bouldin Index for Clustering Evaluation, Davies-Bouldin Index, Intra-cluster Dispersion, Inter-cluster Separation, Euclidean Distance, Centroid Proximity, Unsupervised Learning, Information Theory, Statistical Analysis, Computational Geometry, Performance Evaluation, Clustering Validation, Distance Metrics, Centroid-based Modeling, Cluster Separation Analysis, Dimensionality Reduction.
Implement a function to calculate the Davies-Bouldin Index (DBI) for evaluating clustering quality. The DBI measures the average similarity between each cluster and its most similar cluster, where similarity is defined as the ratio of within-cluster scatter to between-cluster separation.

For each cluster, calculate the scatter: the average Euclidean distance from the cluster's points to its centroid. Then, for each pair of distinct clusters, compute the similarity ratio as the sum of their two scatters divided by the Euclidean distance between their centroids. For each cluster, take the maximum similarity ratio over all other clusters. The DBI is the average of these maxima across all clusters.

Given:
X: a numpy array of shape (n_samples, n_features) containing the data points
labels: a numpy array of shape (n_samples,) containing the cluster assignment of each point

Return the Davies-Bouldin Index rounded to 4 decimal places. If there is only one cluster, return 0.0. Lower values indicate better clustering, with more compact and better-separated clusters.
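One possible solution, offered as a minimal sketch rather than a reference answer. The function name davies_bouldin_index is an assumption, since the problem does not fix one. The handling of two clusters whose centroids coincide (their ratio is treated as 0 to avoid dividing by zero) is also an assumption; the problem statement does not cover that case.

```python
import numpy as np


def davies_bouldin_index(X, labels):
    """Davies-Bouldin Index of a clustering, rounded to 4 decimal places.

    The function name is hypothetical; the problem does not specify one.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    unique_labels = np.unique(labels)
    k = len(unique_labels)

    # With a single cluster there is no other cluster to compare against.
    if k < 2:
        return 0.0

    # Centroid of each cluster, shape (k, n_features).
    centroids = np.array([X[labels == c].mean(axis=0) for c in unique_labels])

    # Scatter: mean Euclidean distance from a cluster's points to its centroid.
    scatters = np.array([
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(unique_labels)
    ])

    # Pairwise Euclidean distances between centroids, shape (k, k).
    diff = centroids[:, None, :] - centroids[None, :, :]
    dist = np.linalg.norm(diff, axis=2)

    # Similarity ratio (S_i + S_j) / d_ij for every pair with d_ij > 0.
    # Assumption: pairs with coincident centroids (and the diagonal i == j,
    # where d_ii = 0) keep a ratio of 0 instead of dividing by zero.
    sums = scatters[:, None] + scatters[None, :]
    ratios = np.zeros((k, k))
    valid = dist > 0
    ratios[valid] = sums[valid] / dist[valid]

    # Worst-case (maximum) ratio per cluster, averaged over all clusters.
    return round(float(ratios.max(axis=1).mean()), 4)
```

As a quick check by hand: for X = [[0, 0], [0, 2], [10, 0], [10, 2]] with labels [0, 0, 1, 1], the centroids are (0, 1) and (10, 1), each scatter is 1, and the centroid distance is 10, so the function returns (1 + 1) / 10 = 0.2.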