Prediction Distribution Monitoring

Data Pipelines, Monitoring & Reliability DS practice problem on Onlearn.

Difficulty: medium.

Topics: Prediction Distribution Monitoring, Kullback-Leibler Divergence, Population Stability Index, Kolmogorov-Smirnov Test, Prediction Confidence Intervals, Concept Drift Detection, Statistical Process Control, Model Observability, Data Drift Analysis, Software Reliability Engineering, Probabilistic Modeling, Distribution Shift Detection, Performance Metric Tracking, Uncertainty Quantification, Feature Attribution Analysis, Alerting Threshold Optimization.

Implement a function to monitor changes in model prediction distributions between a reference (baseline) period and a current period. This is a critical MLOps task for detecting model drift in production.

Given two lists of prediction scores (probabilities between 0 and 1), compute the following monitoring metrics:

1. Mean Shift: the difference between the mean of the current predictions and the mean of the reference predictions
2. Standard Deviation Ratio: the ratio of the current standard deviation to the reference standard deviation
3. Jensen-Shannon Divergence: a symmetric measure of distribution similarity based on histogram comparison
4. Drift Detected: a boolean flag indicating whether the JS divergence exceeds 0.1 (the significant-drift threshold)

For the Jensen-Shannon divergence calculation:

- Create histograms using n_bins equally spaced bins between 0 and 1
- Apply Laplace smoothing to handle empty bins: P(bin) = (count + 1) / (total + n_bins)
- Compute the JS divergence as the average of the KL divergences from each distribution to their mixture

Return a dictionary with keys 'mean_shift', 'std_ratio', 'js_divergence', and 'drift_detected'.
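
One possible way to put the requirements together is sketched below. It is not a reference solution, and it rests on several assumptions the problem statement leaves open: the function name `monitor_prediction_distribution` and the default of 10 bins are hypothetical, the dictionary keys are spelled with underscores, the standard deviation is the population form, the KL divergence uses the natural logarithm, the mixture is the bin-wise average of the two smoothed histograms, both inputs are non-empty, and a zero reference standard deviation yields a ratio of 1.0 (if the current one is also zero) or infinity.

```python
import math
from statistics import fmean, pstdev


def monitor_prediction_distribution(reference, current, n_bins=10):
    """Compare two lists of scores in [0, 1] and return drift metrics."""

    def smoothed_hist(scores):
        # Count scores per equal-width bin; a score of exactly 1.0 goes in the last bin.
        counts = [0] * n_bins
        for s in scores:
            counts[min(int(s * n_bins), n_bins - 1)] += 1
        total = len(scores)
        # Laplace smoothing: P(bin) = (count + 1) / (total + n_bins)
        return [(c + 1) / (total + n_bins) for c in counts]

    def kl(p, q):
        # KL(p || q) with natural log; smoothing keeps every probability positive.
        return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

    mean_shift = fmean(current) - fmean(reference)

    # Population standard deviation (assumption: the task does not say which form).
    ref_std = pstdev(reference)
    cur_std = pstdev(current)
    if ref_std > 0:
        std_ratio = cur_std / ref_std
    else:
        # Assumed convention for a constant reference sample.
        std_ratio = 1.0 if cur_std == 0 else float("inf")

    p = smoothed_hist(reference)
    q = smoothed_hist(current)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    js_divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)

    return {
        "mean_shift": mean_shift,
        "std_ratio": std_ratio,
        "js_divergence": js_divergence,
        "drift_detected": js_divergence > 0.1,
    }


if __name__ == "__main__":
    reference_scores = [0.1, 0.2, 0.3, 0.4] * 25
    current_scores = [0.6, 0.7, 0.8, 0.9] * 25
    print(monitor_prediction_distribution(reference_scores, current_scores))
```

With the natural logarithm the JS divergence is bounded above by ln 2, so the 0.1 threshold sits on that scale. With very small samples, the Laplace smoothing pulls both histograms toward uniform, which can keep the divergence below the threshold even when the raw scores look quite different.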