Grouped Query Attention (GQA)

Attention Mechanisms DS practice problem on Onlearn.

Difficulty: hard.

Topics: Understanding Grouped Query Attention (GQA) Implementation, Head Grouping, KV Projection, Query-Key Dot Product, Softmax Normalization, Head Dimension Alignment, Deep Learning, Linear Algebra, Transformer Architecture, Attention Mechanisms, Computational Efficiency, Multi-Head Attention, KV Caching, Memory Bandwidth Optimization, Tensor Broadcasting, Attention Scaling.

Implement a function grouped_query_attention(q, k, v, num_query_groups) that simulates the GQA mechanism. Given query (Q), key (K), and value (V) tensors, compute attention in which the query heads are split into groups and all query heads in a group share a single key-value head. Assume inputs are 4D tensors of shape (batch, heads, seq_len, head_dim).
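
One possible reading of the task can be sketched in NumPy as follows. This is a minimal sketch, not a reference solution, and it rests on assumptions the problem statement does not spell out: the underscore spelling of the names, that q carries the full number of query heads while k and v carry exactly num_query_groups heads, that consecutive query heads belong to the same group, that scores are scaled by the square root of head_dim, and that no causal mask or dropout is applied.

```python
import numpy as np

def grouped_query_attention(q, k, v, num_query_groups):
    """Sketch of GQA under the assumptions stated above.

    q:    (batch, num_heads, q_len, head_dim)
    k, v: (batch, num_query_groups, kv_len, head_dim)   # assumed layout
    Returns an array of shape (batch, num_heads, q_len, head_dim).
    """
    batch, num_heads, q_len, head_dim = q.shape
    if num_heads % num_query_groups != 0:
        raise ValueError("num_heads must be divisible by num_query_groups")
    if k.shape[1] != num_query_groups or v.shape[1] != num_query_groups:
        raise ValueError("k and v must have num_query_groups heads")

    heads_per_group = num_heads // num_query_groups

    # Repeat each K/V head so that consecutive query heads share it:
    # (batch, groups, kv_len, dim) -> (batch, num_heads, kv_len, dim)
    k_exp = np.repeat(k, heads_per_group, axis=1)
    v_exp = np.repeat(v, heads_per_group, axis=1)

    # Scaled query-key dot product: (batch, num_heads, q_len, kv_len)
    scores = q @ k_exp.transpose(0, 1, 3, 2) / np.sqrt(head_dim)

    # Numerically stable softmax over the key axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    # Weighted sum of values: (batch, num_heads, q_len, head_dim)
    return weights @ v_exp


# Example: 4 query heads sharing 2 key-value heads
rng = np.random.default_rng(0)
q = rng.standard_normal((2, 4, 3, 8))
k = rng.standard_normal((2, 2, 3, 8))
v = rng.standard_normal((2, 2, 3, 8))
out = grouped_query_attention(q, k, v, num_query_groups=2)
print(out.shape)
```

Repeating K and V with np.repeat keeps the sketch easy to read but materializes copies. Reshaping q to (batch, groups, heads_per_group, q_len, head_dim) and broadcasting against the unexpanded K and V would avoid them. If the intended inputs instead give k and v the same head count as q, the shape check and the expansion step would need to change accordingly.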