KV Cache for Efficient Autoregressive Attention

LLM Inference & Memory Systems DS practice problem on Onlearn.

Difficulty: medium.

Topics: Understanding KV Cache for Efficient Autoregressive Attention, Key-Value Projection, Sequence Dimension Concatenation, Time-Complexity Reduction, Cache Warm-up, Memory-Compute Trade-off, Deep Learning, Natural Language Processing, Computational Complexity, Memory Management, Transformer Architecture, Autoregressive Decoding, Attention Mechanism, Tensor Buffering, Inference Optimization, Stateful Computation.

Implement a simplified KV cache class for a single attention head. The class should store the Key (K) and Value (V) tensors and provide a method that appends the K and V rows of new tokens along the sequence dimension and returns the full K and V sequences for attention computation.
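
One possible shape for such a class is sketched below. It is a minimal illustration under stated assumptions, not a reference solution: the names `KVCache`, `update`, and `attend` are hypothetical, and plain Python lists of rows (one row per token) stand in for tensors so the sketch runs without any dependencies. A real implementation would more likely concatenate framework tensors along the sequence dimension.

```python
import math


class KVCache:
    """Simplified KV cache for one attention head (hypothetical sketch).

    Keys and values are stored as lists of rows, one row per token, so the
    cached K has shape (seq_len, d_k) and the cached V has shape (seq_len, d_v).
    """

    def __init__(self):
        self.keys = []    # cached key rows, oldest token first
        self.values = []  # cached value rows, oldest token first

    def __len__(self):
        # Number of tokens currently held in the cache.
        return len(self.keys)

    def update(self, new_keys, new_values):
        """Append the rows for new tokens and return the full (K, V)."""
        if len(new_keys) != len(new_values):
            raise ValueError("new_keys and new_values must cover the same tokens")
        # Concatenate along the sequence dimension. Rows are copied so that
        # later mutation of the caller's lists cannot change the cache.
        self.keys.extend(list(row) for row in new_keys)
        self.values.extend(list(row) for row in new_values)
        return self.keys, self.values


def attend(query, keys, values):
    """Scaled dot-product attention of one query row over cached K and V."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    d_v = len(values[0])
    return [
        sum(w * row[j] for w, row in zip(weights, values)) / total
        for j in range(d_v)
    ]


cache = KVCache()
# Warm-up (prefill): cache the K/V rows of a 2-token prompt in one call.
cache.update([[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
# Decode step: only the new token's K/V rows are computed and appended.
K, V = cache.update([[1.0, 1.0]], [[5.0, 6.0]])
# A zero query gives uniform attention weights, so the output is the mean of V.
output = attend([0.0, 0.0], K, V)
```

With a cache like this, each decode step projects K and V only for the newest token instead of recomputing them for the whole prefix. The cost is memory that grows linearly with sequence length, which is the memory-compute trade-off named in the topics.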