XGBoost Objective Function Calculation
Tree Models & Ensembles DS practice problem on Onlearn.
Difficulty: medium.
Topics: XGBoost Objective Function Calculation, Hessian Matrix, L2 Regularization Term, Newton-Raphson Step, Gradient Statistics, Objective Function Curvature, Supervised Learning, Numerical Optimization, Ensemble Methods, Information Theory, Computational Statistics, Gradient Boosting Frameworks, Taylor Series Approximation, Regularized Loss Functions, Decision Tree Induction, Second-Order Optimization.
Implement a function to calculate key components of the XGBoost objective function for a potential tree split. XGBoost uses a second-order Taylor expansion to approximate the loss function, which requires both the first-order gradient (g) and the second-order Hessian (h) for each sample.

When evaluating a split, we need to compute:
1. Optimal leaf weights: the weight that minimizes the objective for each leaf node
2. Split gain: the reduction in the objective function from making the split

Input:
gradients: 1D numpy array of first-order gradient values for each sample
hessians: 1D numpy array of second-order Hessian values for each sample
left_indices: 1D numpy array of sample indices assigned to the left child
right_indices: 1D numpy array of sample indices assigned to the right child
lambda_reg: L2 regularization parameter (default 1.0)
gamma: tree complexity penalty for adding a new leaf (default 0.0)

Output:
A dictionary containing:
'left_weight': optimal weight for the left leaf (rounded to 4 decimal places)
'right_weight': optimal weight for the right leaf (rounded to 4 decimal places)
'gain': gain from this split (rounded to 4 decimal places)

A positive gain indicates that the split improves the objective. A negative gain indicates that the improvement in loss does not cover the regularization penalties, so the split is likely not worthwhile.
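The quantities described above can be sketched as follows. This is one possible solution, not a reference implementation: it assumes the standard XGBoost formulas, where G and H are the sums of gradients and Hessians in a node, the optimal leaf weight is w = -G / (H + lambda), and the gain is 0.5 * [G_L^2 / (H_L + lambda) + G_R^2 / (H_R + lambda) - (G_L + G_R)^2 / (H_L + H_R + lambda)] - gamma. The function name `xgboost_split_stats` and the helper `score` are hypothetical, chosen for illustration only.

```python
import numpy as np


def xgboost_split_stats(gradients, hessians, left_indices, right_indices,
                        lambda_reg=1.0, gamma=0.0):
    """Leaf weights and split gain for one candidate split (illustrative sketch)."""
    gradients = np.asarray(gradients, dtype=float)
    hessians = np.asarray(hessians, dtype=float)
    left_indices = np.asarray(left_indices, dtype=int)
    right_indices = np.asarray(right_indices, dtype=int)

    # Gradient statistics: sums of g and h over the samples in each child.
    G_L, H_L = gradients[left_indices].sum(), hessians[left_indices].sum()
    G_R, H_R = gradients[right_indices].sum(), hessians[right_indices].sum()

    def score(G, H):
        # Structure score of a node, G^2 / (H + lambda).
        # Assumes H + lambda_reg > 0; this sketch does not guard against zero.
        return G * G / (H + lambda_reg)

    # Optimal leaf weight is a Newton step: w = -G / (H + lambda).
    left_weight = -G_L / (H_L + lambda_reg)
    right_weight = -G_R / (H_R + lambda_reg)

    # Gain: children's scores minus the parent's score, halved,
    # minus the complexity penalty gamma for the extra leaf.
    gain = 0.5 * (score(G_L, H_L) + score(G_R, H_R)
                  - score(G_L + G_R, H_L + H_R)) - gamma

    return {
        'left_weight': round(float(left_weight), 4),
        'right_weight': round(float(right_weight), 4),
        'gain': round(float(gain), 4),
    }


# Example with made-up data: two samples per child, unit Hessians.
result = xgboost_split_stats(
    gradients=[-1.0, -0.5, 0.5, 1.0],
    hessians=[1.0, 1.0, 1.0, 1.0],
    left_indices=[0, 1],
    right_indices=[2, 3],
)
print(result)
```

In this example the left child has G = -1.5 and H = 2, so its weight is 1.5 / 3 = 0.5, and the right child mirrors it at -0.5. The parent's gradients cancel, so its score is 0 and the gain is 0.5 * (0.75 + 0.75) = 0.75. Raising gamma to 1.0 pushes the gain down to -0.25, which illustrates how the complexity penalty can make an otherwise useful split unattractive.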