Sequential Video Generation with Diffusion Models
Detection, Video & Advanced Vision DS practice problem on Onlearn.
Difficulty: hard.
Topics: Understanding Temporal Consistency in Video Diffusion Models, Denoising Score Matching, Classifier-Free Guidance, Temporal Self-Attention, Latent Diffusion Process, Frame-wise Noise Injection, Deep Learning, Computer Vision, Generative Modeling, Linear Algebra, Probability Theory, Diffusion Probabilistic Models, Latent Space Representations, Temporal Dynamics, Attention Mechanisms, Spatial-Temporal Convolutions.
Implement a simplified temporal attention module for a video diffusion model. Given a tensor of shape (batch, frames, channels, height, width), reshape it so that self-attention runs across the temporal (frames) dimension, encouraging coherence between frames.
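A minimal sketch of one possible solution follows. It makes several assumptions that go beyond the problem statement. It uses NumPy rather than a deep-learning framework. It uses a single attention head with a residual connection. The projection matrices `w_q`, `w_k`, `w_v` and `w_o` are hypothetical stand-ins for learned parameters. It also treats every spatial location as its own sequence of frame tokens by folding `height * width` into the batch, which is one common choice; flattening `channels * height * width` into one token per frame is another. The helper names `softmax` and `temporal_self_attention` are illustrative only.

```python
import numpy as np


def softmax(scores: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def temporal_self_attention(x, w_q, w_k, w_v, w_o):
    """Single-head self-attention across frames, applied per spatial location.

    x: array of shape (batch, frames, channels, height, width).
    w_q, w_k, w_v, w_o: (channels, channels) projection matrices
    (hypothetical parameters; a trained model would learn them).
    Returns an array with the same shape as x.
    """
    b, f, c, h, w = x.shape
    # (B, F, C, H, W) -> (B, H, W, F, C): put frames next to channels.
    tokens = x.transpose(0, 3, 4, 1, 2)
    # Fold spatial positions into the batch: each of the B*H*W rows is a
    # sequence of F frame tokens, each with C features.
    tokens = tokens.reshape(b * h * w, f, c)

    q = tokens @ w_q  # (N, F, C)
    k = tokens @ w_k  # (N, F, C)
    v = tokens @ w_v  # (N, F, C)

    # Scaled dot-product scores between every pair of frames: (N, F, F).
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(c)
    attn = softmax(scores, axis=-1)  # each row sums to 1 over frames
    mixed = (attn @ v) @ w_o  # (N, F, C)

    # Residual connection, then undo the reshape: (N, F, C) -> (B, F, C, H, W).
    out = tokens + mixed
    return out.reshape(b, h, w, f, c).transpose(0, 3, 4, 1, 2)


# Usage with random inputs and randomly initialised (untrained) weights.
rng = np.random.default_rng(0)
B, F, C, H, W = 2, 8, 16, 4, 4
x = rng.standard_normal((B, F, C, H, W))
weights = [rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(4)]
y = temporal_self_attention(x, *weights)
print(y.shape)  # (2, 8, 16, 4, 4)
```

Because the spatial positions are folded into the batch, each location only exchanges information with the same location in other frames, and the attention matrices have size `frames x frames` (total cost proportional to `B * H * W * F**2`). In factorized video architectures a layer like this is typically interleaved with spatial layers, which handle mixing within each frame.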