Continuous Batching (In-Flight Batching) Simulator

LLM Inference & Memory Systems DS practice problem on Onlearn.

Difficulty: hard.

Topics: Understanding Continuous Batching (In-Flight Batching) in LLM Inference, In-flight Batching, KV Cache Management, Iteration-level Scheduling, Context Window Constraints, Request Preemption, Computer Architecture, Operating Systems, Distributed Systems, Algorithm Design, Performance Engineering, Throughput Optimization, Latency Analysis, Queueing Theory, GPU Resource Scheduling, Memory Management.

Implement a 'ContinuousBatchingSimulator' class that manages a waiting queue of requests and a fixed-size batch buffer (a fixed number of active slots). The simulator processes tokens iteratively: each iteration (one time step) generates one token for every request currently in the batch. When a request finishes generating, the simulator must immediately pull the next request from the waiting queue into the freed batch slot, so that no slot sits idle on the GPU while requests are still pending. Return the total number of time steps taken to complete all requests.
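A minimal Python sketch of the simulator follows, under assumptions the problem statement leaves open: each request is represented only by the number of tokens it must generate, one time step emits exactly one token per active request (no separate prefill cost), and a slot freed at the end of a step is refilled before the next step begins. The method names add_request and run, and the comparison helper static_batching_steps, are hypothetical choices for illustration, not a prescribed interface.

```python
from collections import deque


class ContinuousBatchingSimulator:
    """Sketch of iteration-level (continuous) batching.

    Assumptions (hypothetical, not from the problem statement):
    - a request is described only by how many tokens it must generate;
    - one time step generates exactly one token for every active request;
    - slots freed at the end of a step are refilled before the next step.
    """

    def __init__(self, batch_size: int):
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.batch_size = batch_size
        self.waiting = deque()  # FIFO queue of token counts for pending requests
        self.active = []        # one entry per occupied slot: tokens still to generate

    def add_request(self, num_tokens: int) -> None:
        if num_tokens <= 0:
            raise ValueError("num_tokens must be positive")
        self.waiting.append(num_tokens)

    def _fill_slots(self) -> None:
        # Pull waiting requests into free slots so no slot idles while work is pending.
        while len(self.active) < self.batch_size and self.waiting:
            self.active.append(self.waiting.popleft())

    def run(self) -> int:
        steps = 0
        self._fill_slots()
        while self.active:
            steps += 1
            # One decode iteration: every active request emits one token.
            self.active = [left - 1 for left in self.active]
            # Evict finished requests, then immediately backfill their slots.
            self.active = [left for left in self.active if left > 0]
            self._fill_slots()
        return steps


def static_batching_steps(lengths, batch_size):
    # Hypothetical baseline: a static batch occupies the GPU until its
    # longest request finishes, and only then is the next batch admitted.
    return sum(max(lengths[i:i + batch_size])
               for i in range(0, len(lengths), batch_size))


if __name__ == "__main__":
    lengths = [5, 1, 1, 1, 1]
    sim = ContinuousBatchingSimulator(batch_size=2)
    for n in lengths:
        sim.add_request(n)
    print(sim.run())                          # 5
    print(static_batching_steps(lengths, 2))  # 7
```

With two slots and requests of lengths 5, 1, 1, 1, 1, the short requests stream through the second slot while the long one occupies the first, so the run ends after 5 steps instead of the 7 a static batch would need. Under these assumptions, no schedule can finish in fewer than max(longest request, ceil(total tokens / batch size)) steps, which here is max(5, ceil(9 / 2)) = 5.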