Sliding Window Attention
Attention Mechanisms DS practice problem on Onlearn.
Difficulty: medium.
Topics: Understanding Sliding Window Attention, Local Context Window, Quadratic Complexity Reduction, Banded Matrix Structures, Softmax Normalization, Token Dependency Constraints, Linear Algebra, Deep Learning, Natural Language Processing, Computational Complexity, Matrix Operations, Attention Mechanisms, Memory Efficiency, Sequence Modeling, Transformer Architectures, Masking Strategies.
Implement a function sliding_window_attention_mask(seq_len, window_size) that generates a boolean mask matrix of shape (seq_len, seq_len). The entry at row i, column j should be True if the token at column j can attend to the token at row i (i.e., |i - j| <= window_size), and False otherwise.
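
One possible solution is sketched below in plain Python. It is a minimal sketch, not a reference answer: the problem statement does not name a library or a return type, so returning a nested list of booleans is an assumption, as is rejecting negative arguments with a ValueError. Because the condition |i - j| <= window_size is symmetric in i and j, the resulting mask is a symmetric band around the main diagonal.

```python
def sliding_window_attention_mask(seq_len, window_size):
    """Return a seq_len x seq_len nested list of booleans (a banded mask).

    Assumption: a nested list stands in for the "matrix"; the problem
    statement does not specify a concrete array type.
    """
    # Assumption: negative sizes are treated as invalid input.
    if seq_len < 0 or window_size < 0:
        raise ValueError("seq_len and window_size must be non-negative")

    # Entry [i][j] is True exactly when |i - j| <= window_size.
    return [
        [abs(i - j) <= window_size for j in range(seq_len)]
        for i in range(seq_len)
    ]


# Usage: 4 tokens, each attending to itself and one neighbor on each side.
mask = sliding_window_attention_mask(4, 1)
for row in mask:
    print(" ".join("T" if allowed else "F" for allowed in row))
# T T F F
# T T T F
# F T T T
# F F T T
```

With window_size = 0 the mask reduces to the identity pattern (each token attends only to itself), and once window_size >= seq_len - 1 every entry is True, which corresponds to full attention. Each row holds at most 2 * window_size + 1 True entries, which is where the reduction from quadratic cost comes from when the window is fixed.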