Continuous Batching vs Static Batching Throughput Comparison
LLM Inference & Memory Systems DS practice problem on Onlearn.
Difficulty: medium.
Topics: Understanding throughput optimization in LLM serving through batching strategies, Iteration-level Scheduling, Padding Overhead, KV Cache Management, Batch Fragmentation, Time-to-First-Token (TTFT), Distributed Systems, Computer Architecture, Parallel Computing, Queueing Theory, Performance Analysis, GPU Utilization, Memory Bandwidth Bottlenecks, Scheduling Algorithms, Compute-bound Tasks, Latency vs Throughput Trade-offs.
Implement a simulator that compares the total execution time (latency) of static batching and continuous batching. Given a list of sequence lengths (in tokens) and a fixed batch capacity, compute the total number of time units each strategy needs to finish every sequence. Assume each token takes 1 unit of time to process and that all sequences in a batch advance by one token in parallel during each time unit. In static batching, a batch runs until its longest sequence finishes, so its duration equals the maximum sequence length in that batch, and the next batch starts only after the current one completes. In continuous batching, processing proceeds iteration by iteration, and a waiting request joins the batch as soon as a slot becomes available.
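One way to approach the problem is sketched below in Python. It is a minimal sketch, not a reference solution, and it rests on assumptions the statement does not spell out: all requests are available at time 0, requests are admitted in input order (FIFO), static batches are formed from consecutive chunks of the input list, and a freed slot is refilled at the start of the next iteration at no extra cost. The function names `static_batching_time` and `continuous_batching_time` are hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import List


def static_batching_time(lengths: List[int], batch_size: int) -> int:
    """Total time when each batch runs until its longest sequence finishes."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    total = 0
    # Assumption: batches are consecutive chunks of the input, in order.
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        total += max(batch)  # shorter sequences wait for the longest one
    return total


def continuous_batching_time(lengths: List[int], batch_size: int) -> int:
    """Total time when freed slots are refilled before every iteration."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Assumption: zero-length requests need no processing and are skipped.
    waiting = deque(n for n in lengths if n > 0)
    active: List[int] = []  # remaining tokens for each occupied slot
    time = 0
    while waiting or active:
        # Admit waiting requests (FIFO) into any free slots.
        while waiting and len(active) < batch_size:
            active.append(waiting.popleft())
        # One iteration: every active sequence processes one token.
        time += 1
        active = [remaining - 1 for remaining in active if remaining > 1]
    return time


if __name__ == "__main__":
    lengths = [5, 1, 1, 5]
    print(static_batching_time(lengths, 2))      # 10
    print(continuous_batching_time(lengths, 2))  # 7
```

With lengths [5, 1, 1, 5] and a batch size of 2, static batching forms the batches [5, 1] and [1, 5], each lasting 5 units, for a total of 10. Under continuous batching the two short requests pass through the second slot back to back, the last long request starts at time 2, and everything finishes at time 7. A different admission policy (for example, shortest-first) would give different numbers, so the FIFO choice here is only one reasonable reading of the problem.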