Build a Transformer Encoder Layer

Attention Mechanisms DS practice problem on Onlearn.

Difficulty: hard.

Topics: Understanding the architectural components of a Transformer Encoder Layer, Scaled Dot-Product Attention, Multi-Head Attention Projection, Layer Normalization, Residual Connections, Positional Encoding integration, Dropout regularization, Deep Learning, Sequence Modeling, Linear Algebra, Optimization, Neural Network Architectures, Attention Mechanisms, Encoder-Decoder Architectures, Normalization Techniques, Gradient Flow, Feed-Forward Networks.

Implement a Transformer Encoder Layer class in PyTorch. The layer must include: 1. Multi-Head Self-Attention, 2. a Position-wise Feed-Forward Network (two linear layers with a ReLU activation between them), 3. two Layer Normalization steps, and 4. residual connections around both the attention and feed-forward sublayers. Assume the input dimension is 'd_model'.
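
One possible starting point is sketched below. It is a minimal sketch, not a reference solution, and it makes several assumptions the problem statement does not fix: the post-norm ordering (normalize after each residual addition), the use of PyTorch's built-in nn.MultiheadAttention rather than a hand-written attention module, a batch-first input of shape (batch, seq_len, d_model), and the hypothetical hyperparameter names n_heads, d_ff, and dropout with their default values.

```python
import torch
import torch.nn as nn


class TransformerEncoderLayer(nn.Module):
    """Post-norm encoder layer: x -> LN(x + Attn(x)) -> LN(x + FFN(x))."""

    def __init__(self, d_model: int, n_heads: int = 8, d_ff: int = 2048,
                 dropout: float = 0.1):
        super().__init__()
        # Assumption: the built-in module is allowed. It performs the Q/K/V
        # projections, scaled dot-product attention, and output projection.
        self.self_attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )
        # Position-wise feed-forward network: two linear layers with ReLU.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model),
        )
        # One LayerNorm per sublayer.
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, key_padding_mask=None):
        # x: (batch, seq_len, d_model)
        # key_padding_mask: optional bool tensor (batch, seq_len), True = ignore.
        attn_out, _ = self.self_attn(
            x, x, x, key_padding_mask=key_padding_mask, need_weights=False
        )
        # Residual connection around attention, then normalization.
        x = self.norm1(x + self.dropout(attn_out))
        # Residual connection around the feed-forward network, then normalization.
        x = self.norm2(x + self.dropout(self.ffn(x)))
        return x


if __name__ == "__main__":
    layer = TransformerEncoderLayer(d_model=64, n_heads=4, d_ff=256)
    x = torch.randn(2, 10, 64)   # (batch, seq_len, d_model)
    print(layer(x).shape)        # output keeps the input shape
```

Note that d_model must be divisible by n_heads, since each head works on a slice of size d_model / n_heads. A pre-norm variant (normalizing before each sublayer instead of after) would also satisfy the four listed requirements; the choice here is only one of the two common orderings.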
