Core MDN Residualization

Vision-Language & Cross-Modal Systems DS practice problem on Onlearn.

Difficulty: hard.

Topics: Core MDN Residualization, Gaussian Mixture Components, Skip-Connection Residuals, Kullback-Leibler Divergence, Maximum Likelihood Estimation, Softmax Gating Networks, Multimodal Representation Learning, Probabilistic Graphical Modeling, Deep Neural Network Architecture, Statistical Optimization Theory, Information Theory, Mixture Density Networks, Residual Learning Frameworks, Cross-Modal Alignment, Latent Variable Modeling, Gradient-Based Parameter Estimation.

Implement the core Metadata Normalization (MDN) layer from the CVPR 2021 paper 'Metadata Normalization' by Lu et al. MDN removes the effects of extraneous variables (metadata) from learned features. Unlike Batch Normalization, which standardizes features using batch statistics, MDN uses regression analysis to remove the linear relationship between the features and the specified metadata variables. The goal is to produce residualized features that are orthogonal to the metadata subspace, meaning the metadata can no longer linearly explain any variance in the output features.

For efficient batch learning, the paper precomputes the inverse covariance matrix of the metadata from the full training set, then uses batch-level statistics during training, with a scaling factor that accounts for the difference between batch size and population size.

Your task: Given the features f, the metadata matrix X, the precomputed inverse covariance matrix sigma_inv, and the total sample count N, return the residualized features.
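The residualization described above can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not a reference solution: it assumes f has shape (B, D) for a batch of B samples, X has shape (B, K), sigma_inv is the inverse of X_full^T X_full computed over all N training samples, and the scaling factor is N / B. The function name mdn_residualize and the variable names are hypothetical, and the exact formula expected by the grader may differ.

```python
import numpy as np


def mdn_residualize(f, X, sigma_inv, N):
    """Remove the linear effect of metadata X from features f (a sketch).

    Assumed shapes: f is (B, D), X is (B, K), sigma_inv is (K, K).
    Assumed scaling: N / B, the ratio of population size to batch size.
    """
    f = np.asarray(f, dtype=float)
    X = np.asarray(X, dtype=float)
    sigma_inv = np.asarray(sigma_inv, dtype=float)

    batch_size = X.shape[0]
    scale = N / batch_size  # assumption: rescales the batch sum to population level

    # Batch estimate of the regression coefficients, shape (K, D).
    beta = scale * (sigma_inv @ (X.T @ f))

    # Residual: the part of f that the metadata does not explain linearly.
    return f - X @ beta


# Illustrative data (made up for this example only).
rng = np.random.default_rng(0)
N, K, D = 64, 3, 5
X_full = np.column_stack([rng.normal(size=(N, K - 1)), np.ones(N)])
f_full = rng.normal(size=(N, D))
sigma_inv = np.linalg.inv(X_full.T @ X_full)

# With the full set as one batch, the scale is 1 and this is ordinary least squares.
r_full = mdn_residualize(f_full, X_full, sigma_inv, N)

# With a mini-batch, the same sigma_inv is reused and the scale is N / 16.
r_batch = mdn_residualize(f_full[:16], X_full[:16], sigma_inv, N)
```

When the batch is the entire training set, the residuals are exactly orthogonal to the columns of X (up to floating-point error). For a mini-batch the coefficients are only an estimate, so orthogonality holds approximately rather than exactly.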