Out-of-Bag Score Calculation
Tree Models & Ensembles DS practice problem on Onlearn.
Difficulty: medium.
Topics: Out-of-Bag Score Calculation, Out-of-Bag Error, Bootstrap Sampling, Unseen Data Estimation, Feature Importance, Subsampling Bias, Ensemble Learning, Statistical Validation, Supervised Learning, Decision Theory, Computational Complexity, Bagging Algorithms, Resampling Techniques, Model Generalization, Performance Metrics, Bootstrap Aggregation.
In bagging ensemble methods such as Random Forest, each base estimator is trained on a bootstrap sample (a random sample drawn with replacement) of the training data. As a result, each estimator leaves some samples of the original dataset unused during training; these are that estimator's out-of-bag (OOB) samples. The OOB score gives a nearly unbiased estimate of the ensemble's generalization performance without requiring a separate validation set. For each sample in the dataset, we aggregate the predictions of all estimators for which that sample was OOB, then compare the aggregated prediction with the true label.

Your task is to implement a function calculate_oob_score that computes the OOB accuracy score for a classification task.

The function takes:
n_samples: The total number of samples in the original dataset.
bootstrap_indices: A list of lists, where each inner list contains the indices of the samples used to train one estimator (samples NOT in this list are OOB for that estimator).
predictions: A list of lists, where each inner list contains one estimator's predictions for ALL samples in the dataset.
y_true: The true labels for all samples.

The function should:
1. Identify which samples are OOB for each estimator.
2. Collect the OOB predictions for each sample.
3. Aggregate the predictions for each sample using majority voting.
4. Return the accuracy score over all samples that have at least one OOB prediction.

If a sample has no OOB predictions (it was in-bag for all estimators), exclude it from the calculation. If no samples have OOB predictions, return 0.0.
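
The four steps above can be sketched as follows. This is one possible solution rather than a reference implementation, and it assumes the identifiers are spelled with underscores (calculate_oob_score, n_samples, bootstrap_indices, y_true). The problem statement does not say how to break ties in the majority vote, so the sketch assumes a tie goes to the label that was collected first; a different rule may be expected by the grader.

```python
from collections import Counter


def calculate_oob_score(n_samples, bootstrap_indices, predictions, y_true):
    # Steps 1-2: for every sample, collect the predictions of the estimators
    # that did not see it during training.
    oob_votes = [[] for _ in range(n_samples)]
    for in_bag, preds in zip(bootstrap_indices, predictions):
        in_bag_set = set(in_bag)  # repeated indices from resampling collapse here
        for i in range(n_samples):
            if i not in in_bag_set:
                oob_votes[i].append(preds[i])

    correct = 0
    scored = 0
    for i, votes in enumerate(oob_votes):
        if not votes:
            continue  # in-bag for every estimator: excluded from the score
        # Step 3: majority vote.
        # Assumption: on a tie, the label collected first wins.
        counts = Counter(votes)
        majority = max(counts, key=counts.get)
        scored += 1
        if majority == y_true[i]:
            correct += 1

    # Step 4: accuracy over the samples that received at least one OOB vote.
    return correct / scored if scored else 0.0


# Small hand-checkable example (made-up data for illustration only).
example_score = calculate_oob_score(
    4,
    [[0, 1, 1, 2], [1, 2, 3, 3], [0, 0, 3, 3]],
    [[0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 0, 0]],
    [0, 1, 0, 1],
)
print(example_score)  # 0.75
```

In the example, sample 3 is OOB only for the first estimator, sample 0 only for the second, and samples 1 and 2 only for the third. Three of those four single-vote predictions match the true labels, which gives 0.75.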