Autoscaling Replica Simulator with SLA Tracking

Data Pipelines, Monitoring & Reliability DS practice problem on Onlearn.

Difficulty: medium.

Topics: Autoscaling Replica Simulator with SLA Tracking, Horizontal Pod Autoscaling, P99 Latency Thresholding, Proportional-Integral-Derivative (PID) Tuning, Exponential Backoff Retries, Service Level Objective (SLO) Budgeting, Cloud Infrastructure Engineering, Distributed Systems Architecture, Performance Monitoring & Observability, Control Theory, Software Reliability Engineering, Dynamic Resource Provisioning, Load Balancing Algorithms, Time-Series Telemetry Analysis, Feedback Loop Controllers, Fault Tolerance Patterns.

Implement a function that simulates an autoscaling system for a model serving deployment. The simulator processes a time series of incoming request rates (requests per second, RPS) and dynamically adjusts the number of active replicas based on utilization thresholds, while tracking Service Level Agreement (SLA) compliance.

Your function should accept:

  rps_series: A list of integers representing the incoming requests per second at each time step.
  capacity_per_replica: An integer representing the maximum requests per second each replica can handle.
  min_replicas: The minimum number of replicas that must remain active.
  max_replicas: The maximum number of replicas allowed.
  scale_up_threshold: A float; if utilization exceeds this value, a scale-up action is triggered.
  scale_down_threshold: A float; if utilization drops below this value, a scale-down action is triggered.
  cooldown_steps: An integer representing how many time steps must pass after a scaling action before another scaling action is permitted.

The simulation starts with min_replicas active. At each time step, the system computes utilization as the ratio of incoming RPS to total capacity, where total capacity is the number of active replicas multiplied by capacity_per_replica. An SLA violation occurs when incoming requests exceed total capacity, and any requests beyond capacity are considered dropped. After computing metrics for the current step, the system evaluates whether to scale up or down by one replica, subject to the replica bounds and the cooldown constraint. A scaling action resets the cooldown only if the replica count actually changes, so a scale-up attempted at max_replicas or a scale-down attempted at min_replicas leaves the cooldown untouched.

Your function should return a dictionary with:

  'final_replicas': The replica count at the end of the simulation.
  'total_sla_violations': The number of time steps where demand exceeded capacity.
  'average_utilization': The mean utilization across all time steps, rounded to 2 decimal places.
  'max_replicas_used': The peak replica count observed during the simulation.
  'total_dropped_requests': The total number of requests that could not be served due to insufficient capacity.
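The loop described above can be sketched as follows. This is one possible reading of the statement, not a reference solution. The function name simulate_autoscaling and the underscored parameter and key names are assumptions. The cooldown handling is also an assumption, since the statement does not pin it down: here a successful scaling action sets a counter to cooldown_steps, and each later step decrements the counter instead of scaling until it reaches zero. The sketch also assumes total capacity is always positive, and it counts a replica added at the last step toward the peak.

```python
from typing import Dict, List, Union


def simulate_autoscaling(
    rps_series: List[int],
    capacity_per_replica: int,
    min_replicas: int,
    max_replicas: int,
    scale_up_threshold: float,
    scale_down_threshold: float,
    cooldown_steps: int,
) -> Dict[str, Union[int, float]]:
    replicas = min_replicas
    cooldown_remaining = 0  # assumed semantics: steps left before scaling is allowed
    violations = 0
    dropped = 0
    utilization_sum = 0.0
    max_used = replicas

    for rps in rps_series:
        # Metrics for the current step use the replica count before any scaling.
        capacity = replicas * capacity_per_replica  # assumed to be > 0
        utilization = rps / capacity
        utilization_sum += utilization
        if rps > capacity:
            violations += 1
            dropped += rps - capacity

        # Scaling decision: at most one replica per step, within bounds.
        if cooldown_remaining > 0:
            cooldown_remaining -= 1
        elif utilization > scale_up_threshold and replicas < max_replicas:
            replicas += 1
            cooldown_remaining = cooldown_steps  # reset only on a real change
        elif utilization < scale_down_threshold and replicas > min_replicas:
            replicas -= 1
            cooldown_remaining = cooldown_steps

        max_used = max(max_used, replicas)

    # An empty series is reported as 0.0 average utilization (an assumption).
    average = round(utilization_sum / len(rps_series), 2) if rps_series else 0.0
    return {
        "final_replicas": replicas,
        "total_sla_violations": violations,
        "average_utilization": average,
        "max_replicas_used": max_used,
        "total_dropped_requests": dropped,
    }


result = simulate_autoscaling([50, 120, 250, 250, 80, 20], 100, 1, 3, 0.8, 0.3, 1)
print(result)
```

Under these assumptions, the example call scales up after steps 1 and 3, drops 20 + 50 + 50 = 120 requests across three violating steps, and scales down once at the final step, so it reports final_replicas 2, max_replicas_used 3, and average_utilization 0.76. A different cooldown convention, such as counting the scaling step itself toward the cooldown, would shift when the second scale-up happens and change these numbers.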