Cosine Annealing with Warm Restarts

Initialization, Normalization & Regularization DS practice problem on Onlearn.

Difficulty: medium.

Topics: Cosine Annealing with Warm Restarts, Cosine Annealing, Warm Restarts, Cyclical Learning Rates, Learning Rate Decay, Global Minimum Convergence, Optimization Theory, Deep Learning Foundations, Numerical Analysis, Hyperparameter Tuning, Stochastic Processes, Learning Rate Scheduling, Gradient Descent Variants, Convergence Analysis, Non-convex Optimization, Training Dynamics.

Implement a learning rate scheduler based on the SGDR (Stochastic Gradient Descent with Warm Restarts) technique from Loshchilov & Hutter, 2017. Within each cycle, the scheduler uses cosine annealing to decay the learning rate from a maximum value to a minimum value. At the end of each cycle, the learning rate "restarts" at the maximum. Optionally, the cycle length can grow by a multiplicative factor after each restart.

Write a function cosine_annealing_warm_restarts(eta_max, eta_min, T_0, T_mult, total_epochs) that computes the learning rate for each epoch from 0 to total_epochs - 1.

Parameters:
- eta_max (float): Maximum (initial) learning rate at the start of each cycle.
- eta_min (float): Minimum learning rate at the end of each cycle.
- T_0 (int): Number of epochs in the first cycle.
- T_mult (int): Multiplicative factor applied to the cycle length after each restart. The second cycle has length T_0 * T_mult, the third has length T_0 * T_mult^2, and so on. When T_mult is 1, all cycles have the same length T_0.
- total_epochs (int): Total number of epochs for which to compute learning rates.

The function should return a list of floats, where each float is the learning rate for the corresponding epoch, rounded to 4 decimal places.

For each epoch, determine which cycle it belongs to, compute how far into that cycle it is (the number of epochs since the last restart), and apply the cosine annealing formula.
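A minimal Python sketch of one possible solution follows. It assumes T_0 >= 1 and T_mult >= 1; the problem does not say how to handle other values, so the sketch rejects them. It applies the SGDR cosine annealing formula eta_t = eta_min + 0.5 * (eta_max - eta_min) * (1 + cos(pi * T_cur / T_i)), where T_cur is the number of epochs since the last restart and T_i is the length of the current cycle. The helper names cycle_start, cycle_len and t_cur are illustrative only and are not part of the problem statement.

```python
import math
from typing import List


def cosine_annealing_warm_restarts(
    eta_max: float,
    eta_min: float,
    T_0: int,
    T_mult: int,
    total_epochs: int,
) -> List[float]:
    """Return the SGDR learning rate for epochs 0 .. total_epochs - 1."""
    # Assumption: cycle lengths must be positive and non-shrinking,
    # otherwise the cycle search below would never terminate.
    if T_0 < 1 or T_mult < 1:
        raise ValueError("T_0 and T_mult must both be >= 1")

    lrs: List[float] = []
    cycle_start = 0  # first epoch of the current cycle (hypothetical helper name)
    cycle_len = T_0  # length T_i of the current cycle

    for epoch in range(total_epochs):
        # Move forward to the cycle that contains this epoch.
        while epoch >= cycle_start + cycle_len:
            cycle_start += cycle_len
            cycle_len *= T_mult

        t_cur = epoch - cycle_start  # epochs since the last restart (T_cur)
        lr = eta_min + 0.5 * (eta_max - eta_min) * (
            1 + math.cos(math.pi * t_cur / cycle_len)
        )
        lrs.append(round(lr, 4))  # round only when recording the output

    return lrs
```

For example, cosine_annealing_warm_restarts(1.0, 0.0, 2, 2, 7) uses cycles of length 2, 4 and 8 starting at epochs 0, 2 and 6, and returns [1.0, 0.5, 1.0, 0.8536, 0.5, 0.1464, 1.0]. Because T_cur only takes the values 0 to T_i - 1 at integer epochs, the learning rate approaches eta_min but does not reach it exactly before the restart. Walking forward through the cycles with a while loop keeps the logic simple and is cheap for realistic epoch counts.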