Model Inference Statistics for Monitoring
Data Pipelines, Monitoring & Reliability DS practice problem on Onlearn.
Difficulty: easy.
Topics: Model Inference Statistics for Monitoring, P99 Latency Percentile, Kullback-Leibler Divergence, GPU Memory Fragmentation, HTTP 5xx Error Frequency, Inference Request Batching, Statistical Process Control, Distributed Systems Engineering, Software Observability, Data Stream Processing, Model Performance Evaluation, Latency Distribution Analysis, Concept Drift Detection, Resource Utilization Telemetry, Error Rate Thresholding, Throughput Bottleneck Identification.
In production ML systems, monitoring model inference performance is essential for maintaining service quality. Given a list of inference latency measurements (in milliseconds), compute key statistics that are commonly used in MLOps dashboards:

1. Throughput: the number of requests that can be processed per second, assuming single-threaded sequential processing.
2. Average latency: the mean latency across all measurements.
3. Percentiles (p50, p95, p99): the latency values below which 50%, 95%, and 99% of requests fall.

Write a function calculate_inference_stats(latencies_ms) that takes a list of latency measurements and returns a dictionary with the computed statistics. Use linear interpolation for percentile calculations. If the input list is empty, return an empty dictionary.
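One possible approach is sketched below; it is a minimal illustration under stated assumptions, not a reference solution. The problem statement does not name the dictionary keys, so `throughput_per_sec`, `avg_latency_ms`, `p50_ms`, `p95_ms`, and `p99_ms` are hypothetical, as is the helper `percentile`. The function name is assumed to be `calculate_inference_stats(latencies_ms)` with underscores. Throughput is taken to be 1000 divided by the mean latency in milliseconds, which follows from processing requests one after another. Percentiles use the rank `(p / 100) * (n - 1)` over the sorted values, interpolating linearly between the two neighbouring measurements. No rounding is applied, and returning infinity when every latency is zero is also an assumption.

```python
import math


def percentile(sorted_values, p):
    """Return the p-th percentile of an ascending, non-empty list.

    Uses linear interpolation between the two closest ranks.
    (Hypothetical helper, not named in the problem statement.)
    """
    rank = (p / 100) * (len(sorted_values) - 1)
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    low_value = sorted_values[lower]
    high_value = sorted_values[upper]
    return low_value + fraction * (high_value - low_value)


def calculate_inference_stats(latencies_ms):
    """Compute throughput, mean latency, and p50/p95/p99 latency."""
    if not latencies_ms:
        return {}

    ordered = sorted(latencies_ms)
    avg_latency = sum(ordered) / len(ordered)

    # Sequential processing: one request finishes every avg_latency ms,
    # so requests per second = 1000 ms / avg_latency ms.
    # Assumption: report infinity if the mean latency is zero.
    throughput = 1000.0 / avg_latency if avg_latency > 0 else float("inf")

    # Key names below are assumptions; the problem does not specify them.
    return {
        "throughput_per_sec": throughput,
        "avg_latency_ms": avg_latency,
        "p50_ms": percentile(ordered, 50),
        "p95_ms": percentile(ordered, 95),
        "p99_ms": percentile(ordered, 99),
    }


if __name__ == "__main__":
    print(calculate_inference_stats([10, 20, 30, 40, 50]))
```

For the five measurements in the usage line, the p95 rank is 0.95 × 4 = 3.8, so the result lies 80% of the way from the fourth value (40) to the fifth (50). Sorting a copy of the input keeps the caller's list unchanged.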