Muon Optimizer Update with Newton-Schulz Iteration

Calculus & Optimization DS practice problem on Onlearn.

Difficulty: hard.

Topics: Understanding Matrix Square Root Approximation via Newton-Schulz Iteration for Neural Network Weight Updates, Newton-Schulz convergence criteria, Identity matrix initialization, Matrix multiplication efficiency, Stability of inverse square root approximations, Weight update preconditioners in Muon, Numerical Linear Algebra, Convex Optimization, Matrix Calculus, Deep Learning Optimization, Iterative Methods, Matrix Inversion, Spectral Radius, Matrix Square Root, Gradient Preconditioning, Fixed-Point Iteration.

Implement a function that performs one iteration of the Newton-Schulz algorithm to approximate the inverse square root of a square matrix G. The update rule is $X_{k+1} = \tfrac{1}{2} X_k (3I - G X_k^2)$, where I is the identity matrix of the same size as G. Assume the matrix is pre-scaled so that its spectral radius lies within the convergence range.
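A minimal sketch of one such step, assuming NumPy arrays as input. The function name `newton_schulz_step` and its argument order are illustrative choices, not part of the problem statement, and the example matrix is made up for demonstration. No pre-scaling is performed here, since the problem says to assume it has already been done.

```python
import numpy as np


def newton_schulz_step(G, X):
    """Return X_next = 0.5 * X @ (3I - G @ X @ X), one step toward G^{-1/2}.

    Hypothetical helper: G is assumed to be square and already pre-scaled,
    and X is the current iterate (the identity is a common starting point).
    """
    G = np.asarray(G, dtype=float)
    X = np.asarray(X, dtype=float)
    n = G.shape[0]
    if G.shape != (n, n) or X.shape != (n, n):
        raise ValueError("G and X must be square matrices of the same size")
    identity = np.eye(n)
    # X @ X is X_k^2; the parenthesised term is (3I - G X_k^2).
    return 0.5 * X @ (3.0 * identity - G @ X @ X)


# Illustrative usage with a made-up, already-scaled diagonal matrix.
G = np.diag([0.5, 1.0])
X = np.eye(2)                    # identity initialization
X = newton_schulz_step(G, X)     # a single iteration, as the problem asks
print(X)
```

Starting from the identity, the first step reduces to 0.5 (3I - G). Calling the function repeatedly on its own output should drive G X^2 toward I when the pre-scaling assumption holds.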