Rotary Positional Embeddings (RoPE)
Attention Mechanisms DS practice problem on Onlearn.
Difficulty: hard.
Topics: Understanding Rotary Positional Embeddings (RoPE) implementation, Theta Base Scaling, Interleaved Rotation Pairs, Complex Number Representation, Broadcasting Constraints, Frequency Decay, Linear Algebra, Deep Learning Architectures, Numerical Methods, Trigonometry, Tensor Operations, Attention Mechanisms, Positional Encoding, Complex Plane Rotation, Vector Transformations, Transformer Optimization.
Implement a function apply_rotary_pos_emb(x, seq_len, dim) that takes an input tensor x of shape (batch, seq_len, head_dim) and applies Rotary Positional Embedding (RoPE). Assume head_dim is even. Use the standard theta base of 10000. Return the rotated tensor.
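One possible approach is sketched below in NumPy. It is a minimal illustration under stated assumptions, not a reference solution. The assumptions: dim is taken to be the same as head_dim; features are rotated in interleaved pairs (indices 2i and 2i+1), following the "Interleaved Rotation Pairs" topic, although some implementations pair the first half of the features with the second half instead; and pair i at position p is rotated by the angle p * base^(-2i/dim). The base keyword argument is a hypothetical addition for illustration and defaults to 10000.

```python
import numpy as np


def apply_rotary_pos_emb(x, seq_len, dim, base=10000.0):
    """Rotate interleaved feature pairs of x by position-dependent angles.

    x is expected to have shape (batch, seq_len, dim) with dim even.
    `base` is an illustrative extra parameter, not part of the problem statement.
    """
    x = np.asarray(x, dtype=np.float64)
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    if x.shape[-2:] != (seq_len, dim):
        raise ValueError("x must have shape (batch, seq_len, dim)")

    # One frequency per pair: theta_i = base^(-2i/dim), i = 0 .. dim/2 - 1.
    # Frequencies decay as i grows, so later pairs rotate more slowly.
    inv_freq = base ** (-np.arange(0, dim, 2, dtype=np.float64) / dim)

    # Angle for every (position, pair) combination: shape (seq_len, dim/2).
    positions = np.arange(seq_len, dtype=np.float64)
    angles = np.outer(positions, inv_freq)
    cos, sin = np.cos(angles), np.sin(angles)

    # Interleaved pairing (assumption): even indices are the "real" parts,
    # odd indices the "imaginary" parts of dim/2 complex numbers.
    x_even = x[..., 0::2]
    x_odd = x[..., 1::2]

    # 2D rotation of each pair; cos/sin broadcast over the batch axis.
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out
```

Because each pair undergoes a pure rotation, the vector at position 0 is returned unchanged and the norm of every vector is preserved, which gives two quick sanity checks for any implementation. For example, with dim = 2 the only frequency is 1, so the input [1, 0] at position 1 becomes [cos(1), sin(1)].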