PPO Clipped Surrogate Loss with Clip Diagnostics

Advanced & Deep RL DS practice problem on Onlearn.

Difficulty: hard.

Topics: Understanding Proximal Policy Optimization (PPO) Clipped Surrogate Loss and Stability Diagnostics, Clipped Surrogate Objective, Probability Ratio r_t(theta), Advantage Normalization, Clip Fraction Calculation, KL-Divergence Constraint, Trust Region Policy Optimization, Reinforcement Learning, Deep Learning, Optimization Theory, Probability Theory, Numerical Analysis, Policy Gradient Methods, Trust Region Methods, Stochastic Gradient Descent, Actor-Critic Architectures, Advantage Estimation.

Implement a function that computes the PPO clipped surrogate loss from the policy probability ratios r_t(theta), the advantage estimates, and a clipping parameter epsilon. The function should also return the clip fraction (the fraction of samples that were clipped) as a diagnostic of training stability.
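
One possible reading of the task is sketched below in plain Python. It is a minimal sketch under stated assumptions, not a reference solution. The function name ppo_clipped_loss, its argument names, and the default epsilon of 0.2 are hypothetical choices. The loss is taken as the negative mean of min(r_t * A_t, clip(r_t, 1 - epsilon, 1 + epsilon) * A_t), so that minimizing it maximizes the surrogate objective. A sample is assumed to count as clipped when its ratio lies strictly outside [1 - epsilon, 1 + epsilon]; the problem statement does not pin this down, and another common convention counts only samples where the clipped term is the one selected by the min.

```python
from typing import Sequence, Tuple


def ppo_clipped_loss(
    ratios: Sequence[float],
    advantages: Sequence[float],
    epsilon: float = 0.2,
) -> Tuple[float, float]:
    """Return (loss, clip_fraction) for one batch of samples.

    Assumptions (not fixed by the problem statement):
    - loss is the negated mean of the clipped surrogate objective;
    - a sample is "clipped" when its ratio is strictly outside
      [1 - epsilon, 1 + epsilon].
    """
    if len(ratios) != len(advantages):
        raise ValueError("ratios and advantages must have the same length")
    if len(ratios) == 0:
        raise ValueError("at least one sample is required")
    if epsilon < 0:
        raise ValueError("epsilon must be non-negative")

    low, high = 1.0 - epsilon, 1.0 + epsilon
    surrogate_sum = 0.0
    clipped_count = 0

    for ratio, advantage in zip(ratios, advantages):
        # Clamp the ratio into the trust interval [low, high].
        clipped_ratio = min(max(ratio, low), high)
        # Pessimistic bound: the smaller of the two surrogate terms.
        surrogate_sum += min(ratio * advantage, clipped_ratio * advantage)
        # Diagnostic: did the ratio leave the trust interval?
        if ratio < low or ratio > high:
            clipped_count += 1

    n = len(ratios)
    # Negate so that gradient descent on the loss maximizes the surrogate.
    return -surrogate_sum / n, clipped_count / n


if __name__ == "__main__":
    loss, clip_fraction = ppo_clipped_loss(
        [1.0, 1.5, 0.5, 1.1], [1.0, 1.0, -1.0, -2.0], epsilon=0.2
    )
    print(loss, clip_fraction)
```

With the sample inputs above, the four surrogate terms are 1.0, 1.2, -0.8, and -2.2, so the loss is approximately 0.2 and the clip fraction is 0.5 (the ratios 1.5 and 0.5 fall outside [0.8, 1.2]). A persistently high clip fraction suggests the policy is moving far from the one that collected the data in each update. An array-based version using NumPy or a deep learning framework would follow the same steps with vectorized minimum and clip operations.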