Muon Optimizer Step with Matrix Preconditioning

Backpropagation, Training & Optimization DS practice problem on Onlearn.

Difficulty: hard.

Topics: Understanding Muon Optimizer and Matrix Preconditioning, Newton-Schulz Iteration, Polar Decomposition, Weight Orthogonalization, Inverse Square Root Approximation, Spectral Normalization, Linear Algebra, Optimization Theory, Deep Learning, Matrix Calculus, Numerical Analysis, Gradient Preconditioning, Second-Order Optimization, Orthogonal Manifold Optimization, Matrix Decomposition, Iterative Refinement.

Implement a simplified Muon optimizer step. Given a weight gradient matrix G, compute the preconditioned update by using the Newton-Schulz iteration to approximate the inverse square root of G^T G. Specifically, compute the update U = G (G^T G)^(-1/2). Use 3 iterations for the approximation.
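One possible reading of this task is sketched below in NumPy. It is not a reference solution. The problem statement does not fix the Newton-Schulz variant, the starting point, or the scaling, so the choices here are assumptions: the coupled iteration (Y, Z) with Y_0 = A / c and Z_0 = I, where A = G^T G and c is the Frobenius norm of A, and the function names `inv_sqrt_newton_schulz` and `muon_step` are hypothetical. The sketch also assumes G has full column rank, so that A is positive definite and its inverse square root exists. In exact arithmetic, if G = P S Q^T is a thin SVD, then G (G^T G)^(-1/2) = P Q^T, the orthogonal polar factor of G, so a converged result should have nearly orthonormal columns.

```python
import numpy as np


def inv_sqrt_newton_schulz(A, num_iters=3):
    """Approximate A^(-1/2) for a symmetric positive-definite matrix A.

    Assumed variant: the coupled Newton-Schulz iteration, in which
    Y_k tends to sqrt(A / c) and Z_k tends to (A / c)^(-1/2).
    """
    identity = np.eye(A.shape[0])
    c = np.linalg.norm(A)  # Frobenius norm; puts the eigenvalues of A / c in (0, 1]
    Y = A / c
    Z = identity.copy()
    for _ in range(num_iters):
        T = 0.5 * (3.0 * identity - Z @ Y)
        Y = Y @ T
        Z = T @ Z
    # Undo the scaling: A^(-1/2) = (A / c)^(-1/2) / sqrt(c)
    return Z / np.sqrt(c)


def muon_step(G, num_iters=3):
    """Return the approximate update U = G (G^T G)^(-1/2)."""
    G = np.asarray(G, dtype=float)
    A = G.T @ G
    return G @ inv_sqrt_newton_schulz(A, num_iters)


# Small check: for G = 2 * I the exact update is the identity matrix.
G = np.array([[2.0, 0.0], [0.0, 2.0]])
U = muon_step(G)
print(U)  # close to the 2x2 identity after 3 iterations
```

Three iterations are enough when the eigenvalues of G^T G are close to one another, as in the check above. For an ill-conditioned G the result after 3 iterations can still be far from orthogonal, and raising `num_iters` tightens the approximation.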