Masked Self-Attention

Attention Mechanisms DS practice problem on Onlearn.

Difficulty: medium.

Topics: Implementing Masked Self-Attention, Softmax Normalization, Causal Masking, Scaled Dot-Product, Query-Key Alignment, Triangular Matrix Indexing, Linear Algebra, Deep Learning Architectures, Natural Language Processing, Probability and Statistics, Computational Complexity, Matrix Operations, Attention Mechanisms, Sequence Modeling, Tensor Transformations, Autoregressive Decoding.

Implement masked self-attention, a variant of the attention mechanism used in sequence modeling tasks such as text generation, where each position must be prevented from attending to later positions. Your task is to compute masked self-attention from the query (Q), key (K), and value (V) matrices and an attention mask.
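
One possible shape for a solution is sketched below in NumPy. The problem statement does not say how the mask is encoded, so this sketch assumes an additive mask of shape (seq_len, seq_len) holding 0 where attention is allowed and -inf where it is blocked. The function names `masked_attention` and `causal_mask` are illustrative and not part of the problem. The computation is scaled dot-product attention: scores = Q Kᵀ / sqrt(d_k), plus the mask, followed by a row-wise softmax and a weighted sum of V.

```python
import numpy as np


def masked_attention(Q, K, V, mask):
    """Scaled dot-product attention with an additive mask.

    Assumption: `mask` is (seq_len, seq_len), 0 where attention is allowed
    and -inf where it is blocked. Every row must leave at least one position
    unmasked, otherwise the softmax for that row is undefined (NaN).
    """
    d_k = Q.shape[-1]
    # Query-key similarity, scaled to keep the softmax well-conditioned.
    scores = Q @ K.T / np.sqrt(d_k)
    # Blocked positions become -inf and receive zero weight after softmax.
    scores = scores + mask
    # Subtract the row maximum for numerical stability before exponentiating.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V


def causal_mask(seq_len):
    """Additive causal mask: -inf strictly above the diagonal, 0 elsewhere."""
    return np.triu(np.full((seq_len, seq_len), -np.inf), k=1)


# Usage: 4 positions, key/query dimension 8, value dimension 3.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 3))
out = masked_attention(Q, K, V, causal_mask(4))
print(out.shape)
```

With a causal mask, the first position can attend only to itself, so the first output row equals the first row of V. If the exercise instead supplies a boolean mask, it can be converted to the additive form first, for example with `np.where(allowed, 0.0, -np.inf)`, where `allowed` is a hypothetical boolean array marking permitted positions.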