KV Cache for Efficient Autoregressive Attention
LLM Inference & Memory Systems DS practice problem on Onlearn.
Difficulty: medium.
Topics: Understanding KV Cache for Efficient Autoregressive Attention, Key-Value Projection, Sequence Dimension Concatenation, Time-Complexity Reduction, Cache Warm-up, Memory-Compute Trade-off, Deep Learning, Natural Language Processing, Computational Complexity, Memory Management, Transformer Architecture, Autoregressive Decoding, Attention Mechanism, Tensor Buffering, Inference Optimization, Stateful Computation.
Implement a simplified KV cache class for a single attention head. The class should store the key (K) and value (V) tensors, provide a method that appends the K and V of new tokens along the sequence dimension, and return the full cached K and V sequences for use in attention computation.
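One possible shape for such a class is sketched below. It is a minimal illustration under stated assumptions, not a reference solution: it assumes NumPy arrays of shape (seq_len, head_dim), and the names `KVCache`, `update`, `head_dim`, and the helper `attend` are hypothetical choices, not part of the problem statement. The cache grows by concatenation along the sequence axis, so each decoding step only needs to project the new token rather than recompute K and V for the whole prefix.

```python
import numpy as np


class KVCache:
    """Simplified KV cache for a single attention head (illustrative sketch)."""

    def __init__(self, head_dim: int):
        self.head_dim = head_dim
        # Empty (0, head_dim) buffers so the first update can concatenate too.
        self.k = np.zeros((0, head_dim), dtype=np.float32)
        self.v = np.zeros((0, head_dim), dtype=np.float32)

    def update(self, new_k, new_v):
        """Append K/V rows for new tokens and return the full cached K and V."""
        new_k = np.asarray(new_k, dtype=np.float32)
        new_v = np.asarray(new_v, dtype=np.float32)
        if (new_k.ndim != 2 or new_k.shape != new_v.shape
                or new_k.shape[1] != self.head_dim):
            raise ValueError("expected new_k and new_v of shape (n_new, head_dim)")
        # Concatenate along the sequence dimension (axis 0).
        self.k = np.concatenate([self.k, new_k], axis=0)
        self.v = np.concatenate([self.v, new_v], axis=0)
        return self.k, self.v

    def __len__(self):
        # Number of token positions currently cached.
        return self.k.shape[0]


def attend(q, k, v):
    """Scaled dot-product attention for one query vector over cached positions."""
    scores = k @ q / np.sqrt(q.shape[-1])      # shape: (seq_len,)
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights = weights / weights.sum()
    return weights @ v                         # shape: (head_dim,)


# Usage: warm up the cache with a 3-token prompt, then decode one more token.
rng = np.random.default_rng(0)
cache = KVCache(head_dim=4)
cache.update(rng.normal(size=(3, 4)), rng.normal(size=(3, 4)))
k, v = cache.update(rng.normal(size=(1, 4)), rng.normal(size=(1, 4)))
out = attend(rng.normal(size=4), k, v)         # attends over all 4 positions
```

Concatenation reallocates the buffers on every update, which keeps the sketch short; a preallocated buffer with a write index would trade extra memory for fewer copies, which is the memory-compute trade-off listed in the topics.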