Domain Expert Model Fusion
Vision-Language & Cross-Modal Systems DS practice problem on Onlearn.
Difficulty: medium.
Topics: Domain Expert Model Fusion, Contrastive Loss, Knowledge Distillation, Late Fusion, Multi-Head Self-Attention, Weight Decay, Multimodal Representation Learning, Ensemble Methods, Transfer Learning, Information Theory, Probabilistic Graphical Models, Cross-Modal Alignment, Model Distillation, Feature Fusion Architectures, Attention Mechanisms, Regularization Techniques.
Implement a Domain Expert Fusion algorithm inspired by LongCat Flash Thinking's domain-parallel training scheme. In this approach, instead of training a single model on all domains simultaneously (which can be unstable), separate expert models are trained for different domains (e.g., STEM, Code, Agentic). These experts are then fused into a single model that achieves near-Pareto-optimal performance across all domains.

Given:
expert scores: A dictionary mapping expert names to their performance scores across domains. Each expert has scores for multiple domains as a dictionary {domain: score}.
domain weights: A dictionary mapping domain names to their importance weights (weights sum to 1.0).
fusion method: Either 'weighted average' or 'best per domain'.

Your function should:
1. For 'weighted average': Compute a weighted combination of all experts, where the fused model's score for each domain is the average of all experts' scores for that domain, weighted by how well each expert performs on that domain relative to others.
2. For 'best per domain': For each domain, select the score from the expert that performs best on that domain.

Finally, compute the overall fused score as the weighted sum of domain scores using domain weights.

Return the overall fused score as a float.
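
The steps above can be sketched in Python as follows. This is one possible reading, not a reference solution. The function name fuse_experts and the parameter names expert_scores, domain_weights and fusion_method are assumptions, since the statement does not fix identifiers. For 'weighted average', the sketch assumes that an expert's weight on a domain is its score divided by the sum of all experts' scores on that domain; the statement leaves the exact weighting open. It also assumes that an expert with no score for a domain is skipped, and that a domain no expert covers contributes 0.

```python
from typing import Dict


def fuse_experts(
    expert_scores: Dict[str, Dict[str, float]],
    domain_weights: Dict[str, float],
    fusion_method: str,
) -> float:
    """Fuse per-domain expert scores into a single overall score."""
    # Accept both 'weighted average' and 'weighted_average' spellings
    # (assumption: the statement does not say which form is passed in).
    method = fusion_method.replace("_", " ").strip().lower()
    if method not in ("weighted average", "best per domain"):
        raise ValueError(f"Unknown fusion method: {fusion_method!r}")

    overall = 0.0
    for domain, weight in domain_weights.items():
        # Assumption: experts that report no score for this domain are skipped.
        scores = [s[domain] for s in expert_scores.values() if domain in s]
        if not scores:
            continue  # Assumption: an uncovered domain contributes 0.

        if method == "best per domain":
            fused = max(scores)
        else:
            # Assumption: each expert's weight is its share of the total
            # score on this domain, so fused = sum(s_i^2) / sum(s_i).
            total = sum(scores)
            fused = sum(x * x for x in scores) / total if total > 0 else 0.0

        overall += weight * fused

    return float(overall)


# Hypothetical example data, invented for illustration only.
example_scores = {
    "stem_expert": {"STEM": 0.9, "Code": 0.6},
    "code_expert": {"STEM": 0.5, "Code": 0.8},
}
example_weights = {"STEM": 0.5, "Code": 0.5}

print(fuse_experts(example_scores, example_weights, "best per domain"))
print(fuse_experts(example_scores, example_weights, "weighted average"))
```

Under this reading, 'best per domain' is always at least as large as 'weighted average' on every domain, because a weighted average of scores cannot exceed the maximum score. If the intended weighting differs (for example, a softmax over scores), only the line computing fused in the else branch needs to change.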