AdamW Optimizer Step
Backpropagation, Training & Optimization DS practice problem on Onlearn.
Difficulty: hard.
Topics: Understanding AdamW Optimizer Step, AdamW Decoupled Weight Decay, First Moment Exponential Moving Average, Second Moment Exponential Moving Average, Bias Correction, Epsilon Smoothing, Numerical Optimization, Calculus, Deep Learning, Stochastic Gradient Descent, Functional Programming, Gradient Descent Variants, Adaptive Learning Rates, Regularization Techniques, Momentum Accumulation, Hyperparameter Tuning.
Implement a single AdamW update step for one parameter. Given the parameter 'w', its gradient 'grad', the first-moment buffer 'm', the second-moment buffer 'v', the learning rate 'lr', the weight decay coefficient 'wd', and the numerical-stability constant 'eps', compute the new weight and the updated moment buffers according to the AdamW formula. The weight decay is decoupled: it is applied directly to the weight rather than added to the gradient.
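A minimal sketch of one such step in Python, assuming the standard AdamW formulation with bias-corrected moments. The decay rates `beta1` and `beta2` and the 1-based step count `t` are not listed in the problem statement; they are assumptions here, set to the commonly used defaults 0.9 and 0.999. The function name `adamw_step` is also hypothetical.

```python
import math


def adamw_step(w, grad, m, v, lr, wd, eps, beta1=0.9, beta2=0.999, t=1):
    """Return (new_w, new_m, new_v) after one AdamW step on a scalar parameter.

    beta1, beta2 and t are assumed inputs: the problem statement does not
    list them, so the usual defaults are used for illustration.
    """
    # First moment: exponential moving average of the gradient.
    m = beta1 * m + (1.0 - beta1) * grad
    # Second moment: exponential moving average of the squared gradient.
    v = beta2 * v + (1.0 - beta2) * grad * grad

    # Bias correction compensates for the zero-initialised buffers.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)

    # Decoupled weight decay: shrink the weight directly,
    # independently of the gradient-based update.
    w = w * (1.0 - lr * wd)
    # Adaptive step; eps keeps the denominator away from zero.
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps)

    return w, m, v


new_w, new_m, new_v = adamw_step(
    w=1.0, grad=0.5, m=0.0, v=0.0, lr=0.1, wd=0.01, eps=1e-8
)
print(new_w, new_m, new_v)
```

With zero-initialised buffers and `t=1`, bias correction recovers the raw gradient and its square, so the adaptive term is close to `lr` times the sign of the gradient and the new weight comes out at roughly 0.899. Because the decay term multiplies the weight rather than entering `grad`, it is not rescaled by the second-moment estimate, which is what distinguishes AdamW from Adam with L2 regularization.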