Implementing PagedAttention: Block-wise Attention Computation

LLM Inference & Memory Systems DS practice problem on Onlearn.

Difficulty: hard.

Topics: Understanding PagedAttention Memory Management, Block Table Indexing, Physical Block Pool, Non-contiguous KV Storage, Reference Counting, Allocation Free-List Management, Computer Architecture, Operating Systems, Memory Management, Deep Learning Systems, Transformer Architecture, KV Cache Optimization, Virtual Memory Paging, Memory Fragmentation, Dynamic Allocation, Sequence Parallelism.

Implement a simplified PagedAttention allocator. You are given a total number of physical blocks and a block size. Your task is to manage a 'BlockTable' for a sequence of tokens. When a new token arrives, if the current block is full, you must allocate a new physical block index from a free list. Return the final block table mapping for a sequence of N tokens.