Guidance Attention Mask for Chunked Video
Detection, Video & Advanced Vision DS practice problem on Onlearn.
Difficulty: hard.
Topics: Understanding Guidance Attention Masks in Spatiotemporal Video Processing, Block-Diagonal Masking, Causal Self-Attention, Temporal Chunking Strategy, Global Context Injection, Sparse Attention Patterns, Linear Algebra, Deep Learning Architectures, Computer Vision, Optimization Theory, Signal Processing, Self-Attention Mechanisms, Temporal Convolutional Networks, Video Representation Learning, Transformer Memory Bottlenecks, Spatiotemporal Feature Extraction.
Implement a function create_guidance_mask(chunk_size, num_chunks, guidance_indices) that generates a 2D attention mask of shape (T, T), where T = chunk_size * num_chunks. The mask should enforce: 1) Intra-chunk attention: each frame can attend to every frame in its own chunk. 2) Inter-chunk guidance: all frames in chunk i can also attend to the global guidance frames listed in guidance_indices. 3) Return the mask as a binary matrix where 1 allows attention and 0 blocks it.
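A minimal NumPy sketch follows. It assumes that guidance_indices are global frame indices in [0, T) and that the mask is indexed as mask[query, key], so row q lists the frames that frame q may attend to. The problem does not fix a library or these conventions. The np.kron block-diagonal construction and the range check are one possible choice, not a reference solution.

```python
import numpy as np


def create_guidance_mask(chunk_size, num_chunks, guidance_indices):
    """Build a (T, T) binary attention mask with T = chunk_size * num_chunks.

    mask[q, k] == 1 means query frame q may attend to key frame k.
    Assumption: guidance_indices are global frame indices in [0, T).
    """
    if chunk_size <= 0 or num_chunks <= 0:
        raise ValueError("chunk_size and num_chunks must be positive")
    T = chunk_size * num_chunks

    # 1) Intra-chunk attention: a block-diagonal matrix of all-ones
    #    (chunk_size x chunk_size) blocks. np.kron puts one ones-block
    #    at each diagonal entry of the num_chunks x num_chunks identity.
    mask = np.kron(
        np.eye(num_chunks, dtype=np.int64),
        np.ones((chunk_size, chunk_size), dtype=np.int64),
    )

    # 2) Inter-chunk guidance: every query row (i.e. every frame in every
    #    chunk) may also attend to the guidance frames, so set those columns to 1.
    idx = np.asarray(list(guidance_indices), dtype=np.int64)
    if idx.size:
        if idx.min() < 0 or idx.max() >= T:
            raise IndexError("guidance index out of range [0, T)")
        mask[:, idx] = 1

    # 3) Entries are only ever 0 or 1, so the result is already binary.
    return mask


if __name__ == "__main__":
    # 3 chunks of 2 frames each, with frame 0 as the single guidance frame.
    print(create_guidance_mask(2, 3, [0]))
```

With chunk_size=2, num_chunks=3 and guidance_indices=[0], row 4 (a frame in the last chunk) comes out as [1, 0, 0, 0, 1, 1]: its own chunk plus guidance frame 0. Under this reading a guidance frame's own row still covers only its chunk plus the guidance columns. If guidance frames should also attend globally, additionally setting mask[idx, :] = 1 is a possible variant. When the binary mask feeds a softmax attention layer, positions where mask == 0 are typically filled with -inf (or a large negative value) before the softmax.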