The Noisy Top-K Gating Function
MoE, Compression & Scaling DS practice problem on Onlearn.
Difficulty: medium.
Topics: Implementing the Noisy Top-K Gating Function, Top-K Selection, Gaussian Noise Injection, Softmax Normalization, Expert Routing, Sparsity Constraints, Deep Learning Architectures, Numerical Linear Algebra, Probabilistic Modeling, Distributed Computing, Information Theory, Mixture-of-Experts Systems, Sparse Tensor Operations, Stochastic Regularization, Activation Functions, Parallel Model Training.
Implement the Noisy Top-K gating mechanism used in Mixture-of-Experts (MoE) models. Given an input matrix, weight matrices, pre-sampled noise, and a sparsity constraint k, compute the final matrix of gating probabilities.
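A minimal NumPy sketch follows. It assumes the problem uses the standard noisy top-k formulation from the sparsely-gated MoE layer (Shazeer et al., 2017). First, compute noisy logits H = X @ W_g + N * softplus(X @ W_noise), where N is the pre-sampled standard-normal noise. Next, set every entry of each row of H outside its top k to -inf. Finally, apply a row-wise softmax. The function name `noisy_topk_gating`, the argument names `W_g`, `W_noise`, `N`, and the matrix shapes are illustrative assumptions, not the problem's required signature.

```python
import numpy as np


def softplus(z: np.ndarray) -> np.ndarray:
    # Numerically stable softplus: log(1 + exp(z)) computed as log(exp(0) + exp(z)).
    return np.logaddexp(0.0, z)


def noisy_topk_gating(
    X: np.ndarray,        # (batch, d_model) input matrix
    W_g: np.ndarray,      # (d_model, n_experts) gating weights (assumed name)
    W_noise: np.ndarray,  # (d_model, n_experts) noise-scale weights (assumed name)
    N: np.ndarray,        # (batch, n_experts) pre-sampled standard-normal noise
    k: int,               # number of experts kept per row
) -> np.ndarray:
    n_experts = W_g.shape[1]
    if not 1 <= k <= n_experts:
        raise ValueError(f"k must be in [1, {n_experts}], got {k}")

    # 1) Noisy logits: clean logits plus noise scaled by an input-dependent std.
    H = X @ W_g + N * softplus(X @ W_noise)

    # 2) KeepTopK: indices of the k largest logits per row (exactly k, even with ties).
    topk_idx = np.argpartition(H, -k, axis=1)[:, -k:]
    masked = np.full_like(H, -np.inf)
    np.put_along_axis(masked, topk_idx, np.take_along_axis(H, topk_idx, axis=1), axis=1)

    # 3) Row-wise softmax; exp(-inf) == 0, so unselected experts get exactly zero weight.
    shifted = masked - masked.max(axis=1, keepdims=True)  # subtract row max for stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)
```

Because the masked entries are -inf before the softmax, unselected experts receive exactly zero probability and each row still sums to 1. Using `np.argpartition` keeps exactly k indices per row even when logits tie, whereas comparing against the k-th largest value would admit extra experts on ties.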