NoPE (No Positional Embedding) with iRoPE Attention
Attention Mechanisms DS practice problem on Onlearn.
Difficulty: hard.
Topics: Understanding NoPE (No Positional Embedding) with iRoPE (Implicit Rotary Positional Embedding) Attention, Complex-valued rotation matrices, Interleaved feature pairing, Frequency-based positional encoding, Query-Key dot product transformation, Implicit spatial bias in attention, Linear Algebra, Deep Learning, Natural Language Processing, Attention Mechanisms, Vector Calculus, Rotary Positional Embeddings, Relative Position Encoding, Complex Number Representation in Neural Nets, Transformer Decoder Architectures, Attention Score Normalization.
Implement a simplified iRoPE attention mechanism that computes attention scores without an explicit positional embedding layer. Assume the input dimension $d$ is even, and rotate each interleaved pair of features using the rotary formula $q_{\text{rot}} = q \cos\theta + q_{\perp} \sin\theta$, where $q_{\perp}$ is $q$ rotated by 90 degrees within each pair, so that $(q_{2i}, q_{2i+1})$ maps to $(-q_{2i+1}, q_{2i})$.
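One way the rotation and the score computation might be sketched in NumPy is shown below. The problem statement does not fix several details, so the following are assumptions made only for illustration: the function names `rotate_pairs` and `irope_scores` are hypothetical, the angle for pair i at position m is taken as m * base^(-2i/d) with base = 10000, inputs have shape (seq_len, d), and the raw scores are scaled by 1/sqrt(d) with no softmax or causal mask applied.

```python
import numpy as np

def rotate_pairs(x, base=10000.0):
    """Rotate interleaved pairs (x[2i], x[2i+1]) by position-dependent angles.

    x: array of shape (seq_len, d), d even.
    base: assumed frequency base; not specified by the problem statement.
    """
    x = np.asarray(x, dtype=float)
    seq_len, d = x.shape
    if d % 2 != 0:
        raise ValueError("feature dimension d must be even")
    # Assumed frequency schedule: one frequency per pair, base^(-2i/d).
    inv_freq = base ** (-np.arange(0, d, 2) / d)               # (d/2,)
    angles = np.arange(seq_len)[:, None] * inv_freq[None, :]   # (seq_len, d/2)
    # Repeat each angle twice so both members of a pair share it.
    cos = np.repeat(np.cos(angles), 2, axis=1)                 # (seq_len, d)
    sin = np.repeat(np.sin(angles), 2, axis=1)
    # Orthogonal partner: (x0, x1) -> (-x1, x0) within each pair.
    x_perp = np.empty_like(x)
    x_perp[:, 0::2] = -x[:, 1::2]
    x_perp[:, 1::2] = x[:, 0::2]
    # Rotary formula: x_rot = x * cos(theta) + x_perp * sin(theta).
    return x * cos + x_perp * sin

def irope_scores(q, k, base=10000.0):
    """Scaled dot-product scores between rotated queries and keys."""
    d = np.asarray(q).shape[-1]
    q_rot = rotate_pairs(q, base)
    k_rot = rotate_pairs(k, base)
    # The 1/sqrt(d) scaling is an assumption; softmax is left to the caller.
    return q_rot @ k_rot.T / np.sqrt(d)
```

Because each pair undergoes a pure rotation, vector norms are preserved, and the score between positions m and n depends only on the offset n - m. That is why no separate positional embedding layer is needed in this sketch.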