PPO Clipped Surrogate Loss with Clip Diagnostics
Advanced & Deep RL DS practice problem on Onlearn.
Difficulty: hard.
Topics: Understanding Proximal Policy Optimization (PPO) Clipped Surrogate Loss and Stability Diagnostics, Clipped Surrogate Objective, Probability Ratio r_t(theta), Advantage Normalization, Clip Fraction Calculation, KL-Divergence Constraint, Trust Region Policy Optimization, Reinforcement Learning, Deep Learning, Optimization Theory, Probability Theory, Numerical Analysis, Policy Gradient Methods, Trust Region Methods, Stochastic Gradient Descent, Actor-Critic Architectures, Advantage Estimation.
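Implement a function that computes the PPO clipped surrogate loss, -mean(min(r_t(theta) * A_t, clip(r_t(theta), 1 - epsilon, 1 + epsilon) * A_t)), given the probability ratios r_t(theta) = pi_theta(a_t | s_t) / pi_theta_old(a_t | s_t), the advantage estimates A_t, and a clipping parameter epsilon. Additionally, return the clip fraction, the fraction of samples whose ratio falls outside [1 - epsilon, 1 + epsilon], as a diagnostic of how far each update moves the policy and therefore of training stability.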
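One possible solution is sketched below in plain Python. It assumes the ratios and advantages arrive as equal-length sequences of floats (already normalized, if advantage normalization is applied upstream) and that the loss should be returned with the sign an optimizer minimizes. The function name `ppo_clipped_loss` and the default epsilon of 0.2 are illustrative choices, not part of the problem statement.

```python
from typing import Sequence, Tuple


def ppo_clipped_loss(
    ratios: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,  # hypothetical default; the problem does not fix a value
) -> Tuple[float, float]:
    """Return (loss, clip_fraction) for the PPO clipped surrogate objective.

    loss          = -mean(min(r * A, clip(r, 1 - eps, 1 + eps) * A))
    clip_fraction = fraction of samples with |r - 1| > eps
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if len(ratios) == 0:
        raise ValueError("inputs must be non-empty")
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")

    lo, hi = 1.0 - epsilon, 1.0 + epsilon
    surrogate_sum = 0.0
    n_clipped = 0
    for r, a in zip(ratios, advantages):
        clipped_r = min(max(r, lo), hi)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        surrogate_sum += min(r * a, clipped_r * a)
        # Count the sample as clipped when its ratio leaves the trust band.
        if abs(r - 1.0) > epsilon:
            n_clipped += 1

    n = len(ratios)
    # Negate: PPO maximizes the surrogate, while optimizers minimize a loss.
    return -surrogate_sum / n, n_clipped / n


if __name__ == "__main__":
    loss, clip_frac = ppo_clipped_loss([0.5, 1.0, 1.5], [1.0, -1.0, 2.0], 0.2)
    print(f"loss={loss:.4f} clip_fraction={clip_frac:.4f}")
```

For the example inputs the call returns approximately (-0.6333, 0.6667). Note that this sketch counts a sample as clipped whenever |r - 1| > epsilon, even if the min selects the unclipped term (as for the first sample, where r = 0.5 and A > 0). That is a common convention for the diagnostic; an alternative counts only samples where the clipped term is active and its gradient is therefore zero, and the two can differ.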