Analyze Canary Deployment Health for Model Rollout
Data Pipelines, Monitoring & Reliability DS practice problem on Onlearn.
Difficulty: medium.
Topics: Analyze Canary Deployment Health for Model Rollout, A/B Testing Significance, Latency Percentile Tracking, Canary Weight Shifting, Model Drift Detection, Rollback Automation, MLOps and Infrastructure, Statistical Inference, Software Reliability Engineering, Distributed Systems, Data Engineering, Continuous Deployment Pipelines, Hypothesis Testing, Observability and Telemetry, Traffic Routing Strategies, Model Performance Evaluation.
In production ML systems, canary deployments are a critical strategy for safely rolling out new model versions. A small percentage of traffic is routed to the new (canary) model while the majority continues to use the existing (baseline) model. By comparing their performance, you can decide whether to promote the canary to full production or roll back.

Given prediction results from both canary and baseline models, compute key comparison metrics to determine if the canary deployment is healthy.

Each result in both lists is a dictionary with:
'latency_ms': Response latency in milliseconds (float)
'prediction': The model's predicted value
'ground_truth': The actual correct value

Write a function analyze_canary_deployment(canary_results, baseline_results, accuracy_tolerance, latency_tolerance) that computes:
1. canary_accuracy: Fraction of correct predictions for the canary model (0 to 1)
2. baseline_accuracy: Fraction of correct predictions for the baseline model (0 to 1)
3. accuracy_change_pct: Relative change in accuracy as a percentage
4. canary_avg_latency: Average latency of the canary model (ms)
5. baseline_avg_latency: Average latency of the baseline model (ms)
6. latency_change_pct: Relative change in latency as a percentage
7. promote_recommended: Boolean, True if canary accuracy did not degrade beyond accuracy_tolerance AND latency did not increase beyond latency_tolerance

If either input list is empty, return an empty dictionary. All numeric values should be rounded to 2 decimal places, except accuracy values, which should be rounded to 4 decimal places.
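One possible reading of the requirements above is sketched below in Python. It is not a reference solution, and several details are assumptions the problem statement does not settle: relative change is taken as (canary - baseline) / baseline * 100, both tolerances are treated as percentages, a zero baseline yields a reported change of 0.0, and the promotion decision uses the unrounded changes. The helper names _accuracy, _avg_latency, and _pct_change are hypothetical.

```python
from typing import Any, Dict, List

Result = Dict[str, Any]


def _accuracy(results: List[Result]) -> float:
    # Fraction of records whose prediction equals the ground truth.
    correct = sum(1 for r in results if r["prediction"] == r["ground_truth"])
    return correct / len(results)


def _avg_latency(results: List[Result]) -> float:
    # Arithmetic mean of the per-request latencies in milliseconds.
    return sum(r["latency_ms"] for r in results) / len(results)


def _pct_change(new: float, old: float) -> float:
    # Assumption: relative change is measured against the baseline value.
    # Assumption: a zero baseline is reported as 0.0 to avoid dividing by zero.
    if old == 0:
        return 0.0
    return (new - old) / old * 100.0


def analyze_canary_deployment(
    canary_results: List[Result],
    baseline_results: List[Result],
    accuracy_tolerance: float,
    latency_tolerance: float,
) -> Dict[str, Any]:
    # The problem statement asks for an empty dictionary on empty input.
    if not canary_results or not baseline_results:
        return {}

    canary_acc = _accuracy(canary_results)
    baseline_acc = _accuracy(baseline_results)
    canary_lat = _avg_latency(canary_results)
    baseline_lat = _avg_latency(baseline_results)

    acc_change = _pct_change(canary_acc, baseline_acc)
    lat_change = _pct_change(canary_lat, baseline_lat)

    # Assumption: tolerances are percentages. Accuracy may drop by at most
    # accuracy_tolerance percent, latency may rise by at most
    # latency_tolerance percent, compared on the unrounded changes.
    promote = acc_change >= -accuracy_tolerance and lat_change <= latency_tolerance

    return {
        "canary_accuracy": round(canary_acc, 4),
        "baseline_accuracy": round(baseline_acc, 4),
        "accuracy_change_pct": round(acc_change, 2),
        "canary_avg_latency": round(canary_lat, 2),
        "baseline_avg_latency": round(baseline_lat, 2),
        "latency_change_pct": round(lat_change, 2),
        "promote_recommended": promote,
    }
```

Under these assumptions, a canary that is 25% less accurate and 25% slower than the baseline is rejected with tolerances of 5 and 10, and accepted with tolerances of 30 and 30. If the intended grader treats tolerances as fractions or as absolute differences, only the promote comparison needs to change.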