Hierarchical Clustering (Agglomerative)
Clustering DS practice problem on Onlearn.
Difficulty: medium.
Topics: Hierarchical Clustering (Agglomerative), Ward's Method, Cophenetic Correlation Coefficient, Euclidean Distance, Min-Max Normalization, Single Linkage, Unsupervised Learning, Distance Metrics, Data Preprocessing, Computational Complexity, Model Evaluation, Agglomerative Clustering, Linkage Criteria, Dendrogram Analysis, Feature Scaling, Proximity Matrices.
Task: Implement Agglomerative Hierarchical Clustering Implement a function agglomerative clustering(X, n clusters, linkage) that performs bottom up hierarchical clustering on a dataset. Overview Agglomerative clustering starts with each data point as its own cluster and iteratively merges the closest pairs of clusters until the desired number of clusters is reached. Requirements: 1. Use Euclidean distance to measure the distance between individual data points 2. Support three linkage methods to compute inter cluster distances: 'single': Minimum distance between any two points in different clusters 'complete': Maximum distance between any two points in different clusters 'average': Average distance between all pairs of points in different clusters 3. When merging clusters, always merge into the cluster with the smaller index 4. In case of distance ties, prefer the cluster pair where the first cluster has the smallest index, then the second cluster has the smallest index 5. Assign final cluster labels based on the sorted order of remaining cluster indices (smallest index gets label 0, next gets label 1, etc.) Input: X: Data points as a 2D list of shape (n samples, n features) n clusters: Target number of clusters (int) linkage: Linkage method one of 'single', 'complete', or 'average' Output: A list of integer cluster labels for each sample