The Noisy Top-K Gating Function

MoE, Compression & Scaling DS practice problem on Onlearn.

Difficulty: medium.

Topics: Noisy Top-K Gating, Top-K Selection, Gaussian Noise Injection, Softmax Normalization, Expert Routing, Sparsity Constraints, Deep Learning Architectures, Numerical Linear Algebra, Probabilistic Modeling, Distributed Computing, Information Theory, Mixture-of-Experts Systems, Sparse Tensor Operations, Stochastic Regularization, Activation Functions, Parallel Model Training.

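Implement the Noisy Top-K gating mechanism used in Mixture-of-Experts (MoE) models. Given an input matrix, the weight matrices (one for the gating logits and one for the noise scale), pre-sampled noise, and a sparsity constraint k, compute the final matrix of gating probabilities, in which each row assigns nonzero weight to at most k experts.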
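A minimal NumPy sketch follows. It assumes the standard formulation from Shazeer et al. (2017): noisy logits H = X W_g + N * softplus(X W_noise) (elementwise product), keep the k largest entries of each row of H and set the rest to -inf, then apply a row-wise softmax. The names `noisy_top_k_gating`, `W_g`, `W_noise`, `N`, the array shapes, and the tie-breaking rule are assumptions, since the problem statement does not fix them; the grader's exact convention may differ.

```python
import numpy as np


def softplus(x: np.ndarray) -> np.ndarray:
    # Numerically stable softplus, log(1 + exp(x)), via logaddexp(0, x)
    return np.logaddexp(0.0, x)


def noisy_top_k_gating(X, W_g, W_noise, N, k):
    """Noisy Top-K gating (sketch; names and shapes are assumptions).

    X:       (batch, d_model) inputs
    W_g:     (d_model, n_experts) gating weights
    W_noise: (d_model, n_experts) weights for the per-expert noise scale
    N:       (batch, n_experts) pre-sampled standard-normal noise
    k:       number of experts kept per row
    Returns a (batch, n_experts) matrix whose rows sum to 1, with
    zeros outside each row's top-k experts.
    """
    n_experts = W_g.shape[1]
    if not 1 <= k <= n_experts:
        raise ValueError(f"k must be in [1, {n_experts}], got {k}")

    # Noisy logits: clean logits plus noise scaled by softplus(X @ W_noise)
    H = X @ W_g + N * softplus(X @ W_noise)

    # Indices of the k largest logits per row; the stable sort breaks
    # ties in favour of the lower expert index (an assumed convention)
    topk_idx = np.argsort(-H, axis=1, kind="stable")[:, :k]

    # Keep the top-k logits and set every other entry to -inf
    masked = np.full(H.shape, -np.inf)
    rows = np.arange(H.shape[0])[:, None]
    masked[rows, topk_idx] = H[rows, topk_idx]

    # Row-wise softmax; exp(-inf) == 0, so dropped experts get exactly 0
    shifted = masked - masked.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((2, 4))
    W_g = rng.standard_normal((4, 3))
    W_noise = rng.standard_normal((4, 3))
    N = rng.standard_normal((2, 3))
    G = noisy_top_k_gating(X, W_g, W_noise, N, k=2)
    print(G)              # at most 2 nonzero entries per row
    print(G.sum(axis=1))  # each row sums to 1 (up to rounding)
```

Masking with -inf before the softmax, rather than zeroing probabilities afterwards, means the k surviving weights are renormalised among themselves. A full `argsort` is used for clarity; `np.argpartition` would avoid sorting every row when the number of experts is large.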