Sigmoid MoE Router with Bias Correction

MoE, Compression & Scaling DS practice problem on Onlearn.

Difficulty: hard.

Topics: Understanding Sparse Mixture of Experts Routing Dynamics, Sigmoid Gating, Top-K Selection, Learnable Bias Parameters, Expert Routing Logic, Weight Initialization Strategies, Deep Learning, Linear Algebra, Optimization Theory, Neural Network Architectures, Probability Distributions, Sparse Mixture of Experts, Gating Mechanisms, Tensor Operations, Gradient-based Learning, Load Balancing.

Implement a sigmoid-based MoE router class. The router takes an input tensor of shape (batch_size, seq_len, hidden_dim) and projects it with a weight matrix of shape (hidden_dim, num_experts), giving one logit per expert for each token. Add a learnable bias vector of shape (num_experts,) to the projection, then apply the sigmoid function elementwise to the result. Finally, return the top-k expert indices for each token together with their corresponding gating scores.
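
The steps above can be sketched as follows. This is a minimal forward-pass illustration in NumPy, not a reference solution: the class name `SigmoidMoERouter`, the `top_k` and `seed` arguments, and the initialization (small Gaussian weights, zero bias) are assumptions made for the example. In a deep learning framework, the weight and bias would be registered as trainable parameters instead of plain arrays.

```python
import numpy as np


class SigmoidMoERouter:
    """Forward-only sketch of a sigmoid-gated top-k router (hypothetical names)."""

    def __init__(self, hidden_dim, num_experts, top_k, seed=0):
        if not 1 <= top_k <= num_experts:
            raise ValueError("top_k must be between 1 and num_experts")
        rng = np.random.default_rng(seed)
        # Assumed initialization: small Gaussian weights, zero bias.
        self.weight = rng.normal(0.0, 0.02, size=(hidden_dim, num_experts))
        self.bias = np.zeros(num_experts)
        self.top_k = top_k

    def __call__(self, x):
        # x: (batch_size, seq_len, hidden_dim)
        logits = x @ self.weight + self.bias      # (batch_size, seq_len, num_experts)
        scores = 1.0 / (1.0 + np.exp(-logits))    # elementwise sigmoid, each in (0, 1)
        # Sort experts by descending score and keep the first top_k per token.
        order = np.argsort(-scores, axis=-1)
        indices = order[..., : self.top_k]        # (batch_size, seq_len, top_k)
        gates = np.take_along_axis(scores, indices, axis=-1)
        return indices, gates


# Example usage with arbitrary sizes.
router = SigmoidMoERouter(hidden_dim=8, num_experts=4, top_k=2)
x = np.random.default_rng(1).normal(size=(2, 3, 8))
indices, gates = router(x)
print(indices.shape, gates.shape)  # (2, 3, 2) (2, 3, 2)
```

Unlike softmax gating, the sigmoid scores are computed independently per expert, so the selected gates for a token do not sum to 1. Whether to renormalize them afterwards is a design choice the problem statement leaves open.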
