Computational Efficiency of MoE
MoE, Compression & Scaling DS practice problem on Onlearn.
Difficulty: easy.
Topics: Computational Efficiency of MoE, Gating Network Routing, FLOPs Estimation, Expert Capacity Factor, Matrix Multiplication Sparsity, Conditional Computation, Deep Learning Architectures, Computational Complexity Theory, Distributed Systems, Numerical Linear Algebra, Hardware-Aware Optimization, Transformer Scaling Laws, Sparse Neural Networks, Floating Point Arithmetic, Parallel Computing Paradigms, Model Compression Techniques.
Calculate the computational cost savings of a Mixture-of-Experts (MoE) layer relative to a dense layer, as discussed in the paper 'Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer' (Shazeer et al., 2017). Given the number of experts, the number of experts active per input (the sparsity), and the input and output dimensions, compute the floating-point operations (FLOPs) required by each layer and the resulting savings as a percentage.
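The problem statement does not pin down the FLOP-counting conventions, so the following is only a minimal sketch under stated assumptions. Each expert is taken to be a single d_in x d_out linear map, one multiply-add counts as 2 FLOPs, and the "dense" baseline is assumed to evaluate all n_experts experts for every token. The gating network's cost is ignored by default, and the optional include_gating flag adds a d_in x n_experts gating projection. The function name moe_flops_savings and its parameters are hypothetical, chosen here for illustration.

```python
def moe_flops_savings(n_experts: int, k_active: int, d_in: int, d_out: int,
                      include_gating: bool = False) -> dict:
    """Per-token FLOPs of a dense layer vs. a top-k MoE layer (a sketch).

    Assumptions (not fixed by the problem statement):
      * every expert is one d_in x d_out linear map (bias ignored);
      * one multiply-add counts as 2 FLOPs;
      * the dense baseline evaluates all n_experts experts;
      * the gating network is ignored unless include_gating is True,
        in which case it is modelled as a d_in x n_experts projection.
    """
    if not 0 < k_active <= n_experts:
        raise ValueError("need 0 < k_active <= n_experts")

    flops_per_expert = 2 * d_in * d_out        # one matrix-vector product
    dense_flops = n_experts * flops_per_expert  # all experts run
    moe_flops = k_active * flops_per_expert     # only the k selected experts run
    if include_gating:
        moe_flops += 2 * d_in * n_experts       # router logits x @ W_g

    savings_pct = (dense_flops - moe_flops) / dense_flops * 100
    return {"dense_flops": dense_flops,
            "moe_flops": moe_flops,
            "savings_pct": savings_pct}


result = moe_flops_savings(n_experts=1000, k_active=2, d_in=512, d_out=512)
print(f"dense={result['dense_flops']}, moe={result['moe_flops']}, "
      f"savings={result['savings_pct']:.1f}%")
# prints: dense=524288000, moe=1048576, savings=99.8%
```

With the gating cost ignored, the savings reduce to (1 - k/n) x 100 and do not depend on d_in or d_out, so the factor of 2 per multiply-add cancels out. Setting include_gating=True shows that, for many experts and small k, the router's own cost can be of the same order as the active experts' cost.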