RMSNorm (Root Mean Square Layer Normalization)

Initialization, Normalization & Regularization DS practice problem on Onlearn.

Difficulty: medium.

Topics: Understanding and Implementing Root Mean Square Layer Normalization (RMSNorm), RMS calculation, Numerical Stability (Epsilon), Element-wise Scaling, Feature Dimension Reduction, Learnable Gain Parameters, Deep Learning, Neural Network Architectures, Numerical Optimization, Linear Algebra, Tensor Calculus, Normalization Techniques, Transformer Components, Gradient Stability, Activation Scaling, Broadcasting Operations.

Implement the RMSNorm (Root Mean Square Layer Normalization) function. Unlike LayerNorm, RMSNorm does not subtract the mean; it only divides the inputs by the root mean square of the activations. The formula is: y = (x / sqrt(mean(x^2) + eps)) * gamma, where the mean is taken over the feature dimension, eps is a small constant added for numerical stability, and gamma is a learnable gain (weight) parameter applied element-wise. Assume gamma is a vector of ones for this implementation.
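One possible solution is sketched below in NumPy, under a few assumptions that the problem statement does not fix: the function name rms_norm, the default eps of 1e-6, and the choice of the last axis as the feature dimension are all illustrative. It follows the formula above term by term and is not the only valid implementation.

```python
import numpy as np


def rms_norm(x, gamma=None, eps=1e-6):
    """Apply RMSNorm over the last axis of x (assumed to be the feature axis)."""
    x = np.asarray(x, dtype=np.float64)
    if gamma is None:
        # The problem statement assumes gamma is a vector of ones.
        gamma = np.ones(x.shape[-1], dtype=x.dtype)
    # Mean of squares over the feature axis; keepdims=True keeps a size-1
    # axis so the result broadcasts against x in the division below.
    mean_sq = np.mean(x * x, axis=-1, keepdims=True)
    # eps sits inside the square root, matching the formula above.
    return x / np.sqrt(mean_sq + eps) * gamma


# Two rows of four features; the second row is all zeros.
x = np.array([[1.0, 2.0, 3.0, 4.0],
              [0.0, 0.0, 0.0, 0.0]])
y = rms_norm(x)
```

With gamma set to ones, each non-zero row of y has a mean square very close to 1. The all-zero row stays all zeros instead of producing NaN, because eps keeps the denominator strictly positive.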