Learned Positional Embeddings
Attention Mechanisms DS practice problem on Onlearn.
Difficulty: easy.
Topics: Learned Positional Embeddings, Absolute Positional Encoding, Sinusoidal Basis Functions, Learnable Weight Matrices, Sequence Length Constraints, Relative Position Bias, Deep Learning Architectures, Natural Language Processing, Signal Processing, Information Theory, Computational Complexity, Attention Mechanisms, Sequence Modeling, Representation Learning, Vector Space Embeddings, Parameter Optimization.
Implement a function that applies learned positional embeddings to a batch of token embeddings. In many transformer architectures (such as BERT and GPT), positional information is injected into the model using a trainable embedding table. Unlike fixed sinusoidal encodings, learned positional embeddings are stored as a matrix in which each row corresponds to a specific sequence position and is optimized during training.

Your Task: Write a function learned_positional_encoding(token_embeddings, position_embedding_table, start_pos) that takes:

token_embeddings: A 3D NumPy array of shape (batch_size, seq_len, d_model) holding the embedded token vectors for a batch of sequences.

position_embedding_table: A 2D NumPy array of shape (max_seq_len, d_model) holding the learned embedding lookup table, indexed by position.

start_pos: An integer giving the position index of the first token in the sequence (default 0). This is useful in autoregressive generation, where decoding may continue from a nonzero position.

The function should return a NumPy array of the same shape as token_embeddings with the appropriate positional information incorporated. The same positional embeddings must be applied identically to every sample in the batch.
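One possible approach is sketched below. It is not the reference solution. It assumes that "incorporated" means element-wise addition (the convention in BERT and GPT) and that position i of each sequence uses table row start_pos + i. The shape and bounds checks that raise ValueError are also assumptions; the problem statement does not say how out-of-range positions should be handled.

```python
import numpy as np


def learned_positional_encoding(token_embeddings, position_embedding_table, start_pos=0):
    """Add learned positional embeddings to a batch of token embeddings."""
    token_embeddings = np.asarray(token_embeddings)
    position_embedding_table = np.asarray(position_embedding_table)

    batch_size, seq_len, d_model = token_embeddings.shape
    max_seq_len, table_dim = position_embedding_table.shape

    # Assumption: mismatched widths or out-of-range positions are errors.
    if table_dim != d_model:
        raise ValueError("embedding widths of the two inputs differ")
    if start_pos < 0 or start_pos + seq_len > max_seq_len:
        raise ValueError("requested positions fall outside the table")

    # Rows start_pos .. start_pos + seq_len - 1, shape (seq_len, d_model).
    positions = position_embedding_table[start_pos:start_pos + seq_len]

    # A leading axis of size 1 lets broadcasting reuse the same rows
    # for every sample in the batch.
    return token_embeddings + positions[np.newaxis, :, :]


# Usage: zero token embeddings make the added rows easy to see.
tokens = np.zeros((2, 3, 4))
table = np.arange(20, dtype=float).reshape(5, 4)
out = learned_positional_encoding(tokens, table, start_pos=1)
print(out.shape)  # (2, 3, 4)
```

With zero token embeddings, each sample in the output equals table rows 1 through 3, which shows both the start_pos offset and the sharing of positions across the batch.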