KV Cache Tiered Offloading Simulator

LLM Inference & Memory Systems DS practice problem on Onlearn.

Difficulty: hard.

Topics: Understanding KV Cache Tiered Offloading and Memory Hierarchy, KV Cache Block Allocation, LRU Eviction Policy, Tiered Memory Hierarchy, GPU-CPU Interconnect Latency, Cache Hit/Miss Ratio Tracking, Computer Architecture, Operating Systems, Memory Management, Data Structures, Algorithms, Cache Coherency, Page Replacement Policies, Virtual Memory, Latency Optimization, Throughput Analysis.

Design a KV Cache Tiered Offloading Simulator. You are given a sequence of block IDs representing KV cache access patterns. The simulator has a fixed GPU capacity (in blocks) and a CPU capacity. When the GPU is full, offload the least recently used block to the CPU. If the CPU is full, evict the oldest block from the CPU. Implement a function 'simulate offloading(access pattern, gpu capacity, cpu capacity)' that returns the final list of block IDs currently in the GPU cache after all requests are processed.

dsLLM Inference & Memory Systems

Tutor

Waking the tutor…

LLM Inference & Memory Systems

0 of 23 solved

Back to roadmap