Frame-Aware Corrupt for Drift Simulation

Data Pipelines, Monitoring & Reliability DS practice problem on Onlearn.

Difficulty: medium.

Topics: Frame-Aware Corrupt for Drift Simulation, Covariate Shift, Gaussian Noise Injection, Temporal Frame Alignment, Distributional Divergence Metrics, Deterministic Corruption Kernels, Data Engineering, Statistical Process Control, Software Reliability Engineering, Computer Vision, Machine Learning Operations, Data Drift Detection, Synthetic Data Augmentation, Pipeline Observability, Image Preprocessing Pipelines, Fault Injection Testing.

Autoregressive video models condition each new chunk on historical frames. At inference time, those historical frames are model-generated and may contain accumulated errors. If the model was only ever trained on clean history, it becomes brittle when its own imperfect outputs are fed back as context, leading to drift: color shifts, inconsistent motion, and scene resets.

Frame-Aware Corrupt addresses this by randomly perturbing historical frames during training, simulating the kinds of errors a model accumulates over long generation. Each frame is independently processed through up to three corruption types applied in sequence: additive Gaussian noise, a per-channel color shift, and a mean blur.

Write a function frame_aware_corrupt that applies this corruption pipeline to a batch of video frames. The function takes a numpy array frames of shape (T, H, W, C) representing T historical frames, along with the following parameters:

gaussian_prob (float): probability of applying Gaussian noise to each frame.
gaussian_std (float): standard deviation of the Gaussian noise.
color_shift_prob (float): probability of applying a color shift to each frame.
color_shift_range (float): each channel is shifted by a value sampled uniformly from [-color_shift_range, color_shift_range].
blur_prob (float): probability of applying a mean blur to each frame.
blur_kernel_size (int): side length of the square mean blur kernel.
rng (numpy.random.Generator): a seeded random generator to use for all stochastic decisions.

For each frame independently and in order, apply: Gaussian noise if rng.random() < gaussian_prob, then color shift if rng.random() < color_shift_prob, then blur if rng.random() < blur_prob. For blur, use edge padding and apply the mean kernel to each channel independently. Return the corrupted frames as a float numpy array of the same shape.
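The pipeline described above can be sketched as follows. This is one possible reading of the statement, not a reference solution, and it rests on several assumptions: the function and parameter names are written with underscores, the noise is zero-mean, the color shift is additive with one offset per channel per frame, the random values for a corruption are drawn immediately after its decision draw, and an even kernel size is padded asymmetrically. The helper mean_blur is a hypothetical name introduced here. A grader that compares seeded outputs may expect a different draw order.

```python
import numpy as np


def mean_blur(frame, k):
    """Mean blur of one (H, W, C) frame with edge padding (hypothetical helper)."""
    H, W, _ = frame.shape
    before = k // 2
    after = k - 1 - before  # assumption: even k is padded asymmetrically
    # Pad only the spatial axes, so each channel is blurred independently.
    padded = np.pad(frame, ((before, after), (before, after), (0, 0)), mode="edge")
    out = np.zeros(frame.shape, dtype=np.float64)
    # Sum the k*k shifted views of the padded frame, then divide by k*k.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + H, dx:dx + W, :]
    return out / (k * k)


def frame_aware_corrupt(frames, gaussian_prob, gaussian_std,
                        color_shift_prob, color_shift_range,
                        blur_prob, blur_kernel_size, rng):
    # Work on a float copy so the caller's array is left untouched.
    out = np.asarray(frames, dtype=np.float64).copy()
    T, H, W, C = out.shape
    for t in range(T):
        # 1) Additive Gaussian noise (zero mean is an assumption).
        if rng.random() < gaussian_prob:
            out[t] += rng.normal(0.0, gaussian_std, size=(H, W, C))
        # 2) Per-channel shift; the length-C vector broadcasts over H and W.
        if rng.random() < color_shift_prob:
            out[t] += rng.uniform(-color_shift_range, color_shift_range, size=C)
        # 3) Mean blur applied to the already noised and shifted frame.
        if rng.random() < blur_prob:
            out[t] = mean_blur(out[t], blur_kernel_size)
    return out
```

A call such as frame_aware_corrupt(frames, 0.5, 0.05, 0.5, 0.1, 0.5, 3, np.random.default_rng(0)) returns a float array with the same shape as frames. Passing the generator in, rather than creating it inside the function, keeps the corruption reproducible for a given seed.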