KV Cache Tiered Offloading Simulator
LLM Inference & Memory Systems DS practice problem on Onlearn.
Difficulty: hard.
Topics: Understanding KV Cache Tiered Offloading and Memory Hierarchy, KV Cache Block Allocation, LRU Eviction Policy, Tiered Memory Hierarchy, GPU-CPU Interconnect Latency, Cache Hit/Miss Ratio Tracking, Computer Architecture, Operating Systems, Memory Management, Data Structures, Algorithms, Cache Coherency, Page Replacement Policies, Virtual Memory, Latency Optimization, Throughput Analysis.
Design a KV Cache Tiered Offloading Simulator. You are given a sequence of block IDs representing KV cache access patterns. The simulator has a fixed GPU capacity (in blocks) and a CPU capacity. When the GPU is full, offload the least recently used block to the CPU. If the CPU is full, evict the oldest block from the CPU. Implement a function 'simulate offloading(access pattern, gpu capacity, cpu capacity)' that returns the final list of block IDs currently in the GPU cache after all requests are processed.