Weight Decay as L2 Regularization

Backpropagation, Training & Optimization DS practice problem on Onlearn.

Difficulty: medium.

Topics: Understanding Weight Decay and L2 Regularization in Neural Network Optimization, L2 Penalty (Ridge), Weight Decay Coefficient, Gradient Update Rules, Overfitting Mitigation, Parameter Sparsity, Deep Learning Foundations, Optimization Theory, Numerical Analysis, Calculus, Machine Learning, Backpropagation, Regularization Techniques, Loss Function Optimization, Gradient Descent Variants, Hyperparameter Tuning.

Implement a function that performs a single weight update step using Stochastic Gradient Descent (SGD) with L2 weight decay. Given the current weights, the gradient of the loss, the learning rate, and a weight decay coefficient (lambda), compute the new weights. The update rule is: w_new = w - learning_rate * (gradient + lambda * w).
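
One possible solution is sketched below in plain Python. The function name sgd_weight_decay_step, its parameter names, and the choice to represent weights and gradients as flat lists of floats are assumptions made for illustration; the problem statement does not specify an interface or input type.

```python
from typing import List


def sgd_weight_decay_step(
    weights: List[float],
    gradient: List[float],
    learning_rate: float,
    weight_decay: float,
) -> List[float]:
    """Return the weights after one SGD step with L2 weight decay.

    Hypothetical interface: flat lists of floats are assumed here.
    Applies w_new = w - learning_rate * (gradient + weight_decay * w)
    element-wise and leaves the input lists unmodified.
    """
    if len(weights) != len(gradient):
        raise ValueError("weights and gradient must have the same length")

    new_weights = []
    for w, g in zip(weights, gradient):
        # The decay term weight_decay * w is the gradient of the
        # L2 penalty (weight_decay / 2) * w**2 with respect to w.
        effective_gradient = g + weight_decay * w
        new_weights.append(w - learning_rate * effective_gradient)
    return new_weights


# Example call with made-up values.
updated = sgd_weight_decay_step(
    weights=[1.0, -2.0],
    gradient=[0.5, 0.5],
    learning_rate=0.1,
    weight_decay=0.01,
)
print(updated)
```

For these inputs the result is approximately [0.949, -2.048]. With a zero gradient the rule reduces to w_new = w * (1 - learning_rate * lambda), which is why the term is called weight decay: each step shrinks the weights toward zero by a constant factor.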