KV Cache Memory Budget and Eviction Policy

LLM Inference & Memory Systems DS practice problem on Onlearn.

Difficulty: hard.

Topics: Understanding KV Cache Management and Least Recently Used (LRU) Eviction Strategies in LLMs, KV Cache State, Least Recently Used (LRU) Algorithm, Cache Eviction Triggers, Doubly Linked List for O(1) Updates, Hash Map for O(1) Lookup, Computer Architecture, Operating Systems, Deep Learning Optimization, Data Structures, Memory Management, KV Cache Paging, Cache Replacement Policies, Memory Fragmentation, Transformer Inference Latency, LRU Implementation.

Implement a KV Cache Manager that supports a fixed memory budget (measured in number of blocks). When the cache reaches its capacity, implement an LRU (Least Recently Used) eviction policy to remove the least recently accessed request's KV blocks. The manager should handle 'allocate' (add a request) and 'access' (update timestamp) operations.