The Best Gini-Based Split for a Binary Decision Tree

Tree Models & Ensembles DS practice problem on Onlearn.

Difficulty: medium.

Topics: Gini Impurity, Weighted Average, Thresholding, Feature Scanning, Tie-breaking Logic, Supervised Learning, Information Theory, Computational Complexity, Data Structures, Numerical Analysis, Decision Tree Induction, Impurity Measures, Feature Selection, Binary Classification, Algorithmic Optimization.

Implement a function that scans every feature and every candidate threshold in a small data set and returns the split that minimises the weighted Gini impurity. The implementation should support binary class labels (0 or 1) and resolve ties with a fixed rule.

You will write one function. Its inputs are X, an $n\times d$ NumPy array of numeric features, and y, a length-$n$ NumPy array of 0/1 labels. It returns (best feature index, best threshold) for the split with the lowest weighted Gini impurity. If several splits share the same impurity, return the first one encountered while scanning features and thresholds.
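A minimal sketch of the scan follows. For a node whose fraction of class-1 labels is $p$, the Gini impurity is $G = 1 - p^2 - (1-p)^2 = 2p(1-p)$, and a split into left and right children of sizes $n_L$ and $n_R$ scores $\frac{n_L}{n} G_L + \frac{n_R}{n} G_R$. The problem statement does not fix the split convention, so this sketch assumes that rows with `X[:, j] <= t` go left, that candidate thresholds are each feature's distinct values in ascending order (skipping the largest, which would leave the right child empty), and that `(None, None)` is returned when no valid split exists. The names `gini` and `best_gini_split` and the `tol` parameter are hypothetical.

```python
import numpy as np


def gini(labels: np.ndarray) -> float:
    """Gini impurity of a 0/1 label array: 1 - p0^2 - p1^2 = 2 p (1 - p)."""
    if labels.size == 0:
        return 0.0
    p = labels.mean()  # fraction of class-1 labels
    return 2.0 * p * (1.0 - p)


def best_gini_split(X: np.ndarray, y: np.ndarray, tol: float = 1e-12):
    """Return (feature_index, threshold) minimising weighted Gini impurity.

    Assumed convention (not fixed by the problem statement): rows with
    X[:, j] <= t go to the left child and the rest go to the right child.
    Candidate thresholds are the distinct values of each feature in
    ascending order, excluding the largest, so neither child is empty.
    Returns (None, None) if every feature is constant.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    best_feature, best_threshold = None, None
    best_score = np.inf

    for j in range(d):                    # scan features in index order
        values = np.unique(X[:, j])       # sorted distinct values
        for t in values[:-1]:             # thresholds in ascending order
            left = X[:, j] <= t
            n_left = left.sum()
            n_right = n - n_left
            score = (n_left / n) * gini(y[left]) + (n_right / n) * gini(y[~left])
            # Only a strict improvement (beyond a small tolerance for
            # floating-point noise) replaces the current best, so the
            # first split found wins when several tie.
            if score < best_score - tol:
                best_score = score
                best_feature, best_threshold = j, float(t)

    return best_feature, best_threshold
```

For example, with `X = np.array([[2.0, 1.0], [3.0, 5.0], [1.0, 4.0], [4.0, 2.0]])` and `y = np.array([0, 1, 0, 1])`, the threshold 2.0 on feature 0 separates the classes perfectly (weighted impurity 0), so `best_gini_split(X, y)` returns `(0, 2.0)`. This brute-force version recomputes both children for every threshold, costing roughly $O(d \cdot n \cdot u)$ for $u$ distinct values per feature. Sorting each feature once and sweeping with running class counts brings this down to $O(d \cdot n \log n)$.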