Silhouette Score for Clustering Evaluation

Clustering DS practice problem on Onlearn.

Difficulty: medium.

Topics: Silhouette Score for Clustering Evaluation, Intra-cluster Distance, Nearest-cluster Distance, Centroid-based Partitioning, Euclidean Space, Coefficient Normalization, Unsupervised Learning, Statistical Inference, Data Preprocessing, Performance Evaluation, Computational Geometry, Clustering Algorithms, Distance Metrics, Cluster Validation, Dimensionality Reduction, Feature Scaling.

Implement a function to calculate the Silhouette Score, a metric used to evaluate the quality of clustering results. The silhouette score measures how similar each data point is to its own cluster compared to other clusters.

For each sample i:

a(i) = mean distance from sample i to all other points in the same cluster
b(i) = minimum, over all other clusters, of the mean distance from sample i to the points in that cluster

The silhouette coefficient for sample i is:

s(i) = (b(i) - a(i)) / max(a(i), b(i))

The overall silhouette score is the mean of all individual silhouette coefficients.

The function should take a 2D numpy array X of data points and a 1D numpy array of cluster labels, and return the silhouette score rounded to 4 decimal places. Handle edge cases: if there is only one cluster or each point is its own cluster, return 0.0.
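
One possible solution is sketched below. It is a minimal sketch, not a reference answer. The function name silhouette_score is chosen here for illustration, and the sketch makes three assumptions the problem statement leaves open: distances are Euclidean (as the topic list suggests), a point that is alone in its cluster gets s(i) = 0, and a point with max(a(i), b(i)) = 0 also gets s(i) = 0.

```python
import numpy as np


def silhouette_score(X, labels):
    """Mean silhouette coefficient, rounded to 4 decimal places."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]
    unique = np.unique(labels)

    # Edge cases from the problem statement:
    # a single cluster, or every point in its own cluster.
    if unique.size < 2 or unique.size >= n:
        return 0.0

    # Pairwise Euclidean distances (assumed metric), shape (n, n).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    scores = np.zeros(n)
    for i in range(n):
        same = labels == labels[i]
        n_same = int(same.sum())
        if n_same == 1:
            # Assumption: a singleton cluster contributes s(i) = 0.
            continue

        # a(i): mean distance to the other points in the same cluster.
        # dist[i, i] is 0, so summing over the whole cluster is safe.
        a = dist[i, same].sum() / (n_same - 1)

        # b(i): smallest mean distance to the points of any other cluster.
        b = min(dist[i, labels == c].mean() for c in unique if c != labels[i])

        denom = max(a, b)
        if denom > 0:  # Assumption: s(i) = 0 when a(i) = b(i) = 0.
            scores[i] = (b - a) / denom

    return round(float(scores.mean()), 4)


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
    labels = np.array([0, 0, 1, 1])
    print(silhouette_score(X, labels))
```

For the two well-separated pairs in the usage example, every point has a(i) = 1 and b(i) of about 10.0249, so the result is 0.9002. The full distance matrix takes O(n^2) memory, which is fine for a practice problem but would need chunking on large datasets.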