Build a Transformer Encoder Layer
Attention Mechanisms DS practice problem on Onlearn.
Difficulty: hard.
Topics: Understanding the architectural components of a Transformer Encoder Layer, Scaled Dot-Product Attention, Multi-Head Attention Projection, Layer Normalization, Residual Connections, Positional Encoding integration, Dropout regularization, Deep Learning, Sequence Modeling, Linear Algebra, Optimization, Neural Network Architectures, Attention Mechanisms, Encoder-Decoder Architectures, Normalization Techniques, Gradient Flow, Feed-Forward Networks.
Implement a Transformer Encoder Layer class in PyTorch. The layer must include: 1. multi-head self-attention, 2. a position-wise feed-forward network (two linear layers with a ReLU activation between them), 3. two layer normalization steps, and 4. residual connections around both the attention and feed-forward sublayers. Assume the input dimension is 'd_model'.
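
One possible solution is sketched below, not a reference answer. The problem does not say where the normalization goes, so this sketch assumes the post-norm ordering (residual add first, then LayerNorm). The names num_heads, d_ff, dropout and key_padding_mask, and their default values, are assumptions added for illustration. Only d_model comes from the problem statement. Inputs are assumed to have shape (batch, seq_len, d_model).

```python
import torch
import torch.nn as nn


class TransformerEncoderLayer(nn.Module):
    """Post-norm encoder layer: x -> LN(x + Attn(x)) -> LN(x + FFN(x))."""

    def __init__(self, d_model, num_heads=8, d_ff=2048, dropout=0.1):
        super().__init__()
        # 1. Multi-head self-attention (d_model must be divisible by num_heads).
        self.self_attn = nn.MultiheadAttention(
            d_model, num_heads, dropout=dropout, batch_first=True
        )
        # 2. Position-wise feed-forward network: Linear -> ReLU -> Linear.
        #    The inner dropout is an assumed extra, not required by the task.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model),
        )
        # 3. One LayerNorm per sublayer.
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, key_padding_mask=None):
        # Self-attention: queries, keys and values all come from x.
        attn_out, _ = self.self_attn(
            x, x, x, key_padding_mask=key_padding_mask, need_weights=False
        )
        # 4a. Residual connection around attention, then LayerNorm.
        x = self.norm1(x + self.dropout(attn_out))
        # 4b. Residual connection around the feed-forward network, then LayerNorm.
        x = self.norm2(x + self.dropout(self.ffn(x)))
        return x


# Usage: the layer preserves the input shape.
layer = TransformerEncoderLayer(d_model=64, num_heads=4, d_ff=256)
layer.eval()  # disable dropout for a deterministic forward pass
x = torch.randn(2, 10, 64)  # (batch, seq_len, d_model)
out = layer(x)
print(out.shape)  # torch.Size([2, 10, 64])
```

A pre-norm variant (LayerNorm applied before each sublayer, with the residual added afterwards) would also meet the four requirements. Positional encodings are assumed to be added to the input before it reaches this layer.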