Warmup + Cosine Decay Schedule

Calculus & Optimization DS practice problem on Onlearn.

Difficulty: medium.

Topics: Understanding Learning Rate Schedulers: Linear Warmup and Cosine Decay, Linear Warmup Phase, Cosine Annealing, Min/Max Learning Rate Bounds, Step-wise vs Epoch-wise Scheduling, Floating Point Precision in Decay, Optimization Theory, Deep Learning Foundations, Gradient Descent Dynamics, Hyperparameter Tuning, Neural Network Training, Learning Rate Scheduling, Convergence Acceleration, Weight Initialization Impacts, Stochastic Gradient Descent, Training Stability.

Implement a learning rate scheduler function that returns the learning rate at a given current step. The schedule starts with a linear warmup from 0 to 'max lr' over 'warmup steps', followed by a cosine decay from 'max lr' to 'min lr' over the remaining steps, up to 'total steps'. If the current step exceeds 'total steps', return 'min lr'.
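A minimal sketch in Python follows. It assumes integer steps counted from 0, that warmup reaches 'max lr' exactly at step 'warmup steps', and that the cosine phase runs from 'warmup steps' to 'total steps' inclusive. The problem statement does not fix these boundary conventions; some implementations use (step + 1) / warmup_steps instead. During decay it uses the standard cosine annealing form lr = min_lr + 0.5 * (max_lr - min_lr) * (1 + cos(pi * p)), where p = (step - warmup_steps) / (total_steps - warmup_steps) goes from 0 to 1. The function name `get_lr` and its parameter names are hypothetical.

```python
import math


def get_lr(current_step: int, warmup_steps: int, total_steps: int,
           max_lr: float, min_lr: float) -> float:
    """Hypothetical helper: linear warmup followed by cosine decay.

    Assumes 0-indexed, non-negative integer steps; warmup hits max_lr at
    step == warmup_steps, and decay ends at min_lr at step == total_steps.
    """
    # Past the end of the schedule: hold at the floor.
    if current_step > total_steps:
        return min_lr

    # Linear warmup: 0 at step 0, rising linearly toward max_lr.
    # (If warmup_steps == 0 this branch is skipped for any step >= 0.)
    if current_step < warmup_steps:
        return max_lr * current_step / warmup_steps

    decay_steps = total_steps - warmup_steps
    # Assumption: with no room for decay, treat the step as the end of the
    # schedule (progress = 1) to avoid dividing by zero.
    progress = 1.0 if decay_steps <= 0 else (current_step - warmup_steps) / decay_steps

    # Cosine annealing: cos(0) = 1 gives max_lr, cos(pi) = -1 gives min_lr.
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Usage: 10 warmup steps, 110 total steps, peak 1.0, floor 0.0.
for step in (0, 5, 10, 60, 110, 200):
    print(step, get_lr(step, warmup_steps=10, total_steps=110, max_lr=1.0, min_lr=0.0))
# Prints 0.0, 0.5, 1.0, 0.5, 0.0, 0.0 for the six steps above.
```

Because math.cos(math.pi) evaluates to exactly -1.0, the decay lands on 'min lr' at 'total steps' without floating-point drift, so the hand-off to the "exceeds 'total steps'" branch is continuous.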