K-Means++ Initialization
Clustering DS practice problem on Onlearn.
Difficulty: medium.
Topics: Understanding K-Means++ Initialization Strategy, Squared Euclidean Distance, Weighted Random Sampling, Cumulative Distribution Function, Proximity-based Selection, Iterative Centroid Seeding, Unsupervised Learning, Probability Theory, Distance Metrics, Optimization Algorithms, Vector Spaces, Centroid Initialization, Stochastic Processes, Euclidean Geometry, Initialization Heuristics, Convergence Analysis.
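Implement the K-Means++ initialization algorithm. Given a dataset X (as a list of lists) and the number of clusters k, select k initial centroids as follows: choose the first uniformly at random from X, then choose each subsequent centroid with probability proportional to D(x)^2, the squared Euclidean distance from point x to its nearest already-chosen centroid. This D^2 weighting tends to spread the initial centroids apart, which improves convergence speed and final cluster quality compared with purely uniform seeding.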
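One possible approach is sketched below in plain Python, using a running cumulative sum of the D^2 weights to do the weighted random draw. The names `kmeans_pp_init` and `squared_distance`, the optional `seed` parameter, and the uniform fallback when every point already coincides with a chosen centroid are assumptions made for illustration; the problem statement does not prescribe them.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(X, k, seed=None):
    """Pick k initial centroids from X using D^2-weighted sampling.

    `seed` is a hypothetical convenience parameter for reproducibility.
    """
    if not 1 <= k <= len(X):
        raise ValueError("k must be between 1 and len(X)")

    rng = random.Random(seed)

    # Step 1: first centroid chosen uniformly at random.
    centroids = [list(X[rng.randrange(len(X))])]

    # d2[i] holds the squared distance from X[i] to its nearest centroid so far.
    d2 = [squared_distance(p, centroids[0]) for p in X]

    while len(centroids) < k:
        total = sum(d2)
        if total == 0:
            # Assumption: if all points coincide with chosen centroids,
            # fall back to a uniform draw.
            idx = rng.randrange(len(X))
        else:
            # Step 2: walk the cumulative distribution until it passes a
            # uniform threshold in [0, total). Zero-weight points never
            # advance the running sum, so they cannot be selected.
            threshold = rng.random() * total
            idx = max(i for i, w in enumerate(d2) if w > 0)  # rounding guard
            cumulative = 0.0
            for i, w in enumerate(d2):
                cumulative += w
                if cumulative > threshold:
                    idx = i
                    break

        centroids.append(list(X[idx]))

        # Step 3: refresh nearest-centroid distances with the new centroid.
        d2 = [min(old, squared_distance(p, centroids[-1]))
              for p, old in zip(X, d2)]

    return centroids
```

For example, `kmeans_pp_init([[0, 0], [0, 1], [10, 10], [10, 11]], 2, seed=0)` returns two points drawn from the dataset. Because a point that is already a centroid has weight zero, the two are always distinct here, and the second is most likely to come from the cluster far from the first. Keeping `d2` as a running minimum avoids recomputing distances to every earlier centroid on each round.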