Calinski-Harabasz Index for Clustering Evaluation

Clustering DS practice problem on Onlearn.

Difficulty: medium.

Topics: Calinski-Harabasz Index for Clustering Evaluation, Between-cluster Dispersion, Within-cluster Variance, Trace of Covariance Matrix, Euclidean Distance, Cluster Centroid, Unsupervised Learning, Statistical Inference, Linear Algebra, Information Theory, Computational Geometry, Clustering Validation, Distance Metrics, Matrix Decompositions, Centroid-based Modeling, Dimensionality Reduction.

Implement a function to compute the Calinski-Harabasz Index (also known as the Variance Ratio Criterion) for evaluating clustering quality. This metric is the ratio of between-cluster dispersion to within-cluster dispersion, each normalized by its degrees of freedom; higher values indicate better-defined clusters.

Given:
- A 2D numpy array X of shape (n_samples, n_features) containing the data points
- A 1D numpy array labels of shape (n_samples,) containing the cluster assignment for each point

Return the Calinski-Harabasz score as a float. If there is only one cluster, or if the number of clusters equals the number of samples, return 0.0.
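One possible solution is sketched below; it is not a reference answer. It uses the standard definition CH = [B / (k - 1)] / [W / (n - k)], where B is the sum over clusters of (cluster size) x (squared Euclidean distance from the cluster centroid to the overall mean), W is the sum of squared distances from each point to its own centroid, k is the number of clusters, and n is the number of samples. The function name calinski_harabasz_score is chosen here for illustration. The handling of W = 0 (every point coincides with its centroid) is an assumption, since the problem statement does not cover that case.

```python
import numpy as np


def calinski_harabasz_score(X, labels):
    """Return the Calinski-Harabasz index for data X and cluster labels."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)

    n_samples = X.shape[0]
    unique_labels = np.unique(labels)
    k = unique_labels.size

    # Degenerate cases named in the problem statement.
    if k <= 1 or k >= n_samples:
        return 0.0

    overall_mean = X.mean(axis=0)
    between = 0.0  # B: between-cluster dispersion
    within = 0.0   # W: within-cluster dispersion

    for lab in unique_labels:
        members = X[labels == lab]
        centroid = members.mean(axis=0)
        # Size-weighted squared distance from centroid to the overall mean.
        between += members.shape[0] * np.sum((centroid - overall_mean) ** 2)
        # Squared distances from each member to its own centroid.
        within += np.sum((members - centroid) ** 2)

    # Assumption: the ratio is undefined when W == 0; return 0.0 here.
    if within == 0.0:
        return 0.0

    return float((between / (k - 1)) / (within / (n_samples - k)))


# Small hand-checkable example: B = 100, W = 4, k = 2, n = 4.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
print(calinski_harabasz_score(X, labels))  # 50.0
```

In the example, both centroids lie 5 units from the overall mean (5, 1), so B = 2 * 25 + 2 * 25 = 100, and every point lies 1 unit from its centroid, so W = 4. The score is (100 / 1) / (4 / 2) = 50.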