MDN with Label Collinearity Control

Vision-Language & Cross-Modal Systems DS practice problem on Onlearn.

Difficulty: hard.

Topics: MDN with Label Collinearity Control, Gaussian Mixture Components, Label Collinearity Penalty, Negative Log-Likelihood, Heteroscedastic Aleatoric Uncertainty, Covariance Matrix Conditioning, Probabilistic Machine Learning, Multimodal Representation Learning, Statistical Signal Processing, Optimization Theory, Information Geometry, Mixture Density Networks, Cross-Modal Alignment, Regularization Techniques, Latent Variable Modeling, Multivariate Distribution Estimation.

Implement an extended MDN that handles confounding when metadata correlates with both the features and the prediction labels.

The problem: In many real scenarios, metadata affects the features and also correlates with the target label. For example, in medical imaging, patient age affects both brain-scan features and disease probability. A naive MDN that regresses the features on the metadata alone may remove disease-predictive information along with the metadata effects.

The solution: Model the metadata X and the labels y jointly in a multiple regression framework, but remove only the metadata component so that the label-related variance is preserved. This requires three steps: form an augmented design matrix that contains both X and y, compute the joint regression coefficients, and subtract only the metadata-related component from the features.

Your task: Implement this extended MDN so that it preserves label-informative variance while removing confounding metadata effects.
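The three steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not a reference solution: the function name `remove_metadata_keep_label`, the added intercept column, and the small `ridge` term (included only to keep the normal equations well conditioned) are hypothetical choices that do not come from the problem statement. With augmented design matrix M = [X, y, 1] and feature matrix F, it solves (M^T M) B = M^T F for the joint coefficients B, then returns F - X B_X, where B_X holds only the rows of B that belong to the metadata columns.

```python
import numpy as np


def remove_metadata_keep_label(features, metadata, labels, ridge=1e-6):
    """Residualize features against metadata while controlling for labels.

    features: (n, d) array of features F.
    metadata: (n,) or (n, k) array of confounders X.
    labels:   (n,) array of targets y.
    ridge:    small stabilizer added to the Gram matrix (an assumption,
              not part of the problem statement).
    """
    F = np.asarray(features, dtype=float)
    X = np.asarray(metadata, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(labels, dtype=float).reshape(-1, 1)
    n, k = X.shape

    # Step 1: augmented design matrix [metadata | label | intercept].
    M = np.hstack([X, y, np.ones((n, 1))])

    # Step 2: joint regression coefficients from the normal equations.
    gram = M.T @ M + ridge * np.eye(M.shape[1])
    B = np.linalg.solve(gram, M.T @ F)

    # Step 3: subtract only the metadata component. The label and
    # intercept contributions are left in the features.
    B_meta = B[:k]
    return F - X @ B_meta
```

Because the label column takes part in the fit, variance shared between metadata and label is not attributed to the metadata alone, so the subtraction leaves the label-related signal in place. One way to check an implementation is to regress the returned residuals on [X, y, 1] again: the metadata coefficients should be close to zero, while the label coefficients should stay close to their original values.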