Asynchronous PPO Training Pipeline
Advanced & Deep RL DS practice problem on Onlearn.
Difficulty: hard.
Topics: Understanding Asynchronous Proximal Policy Optimization (APPO) Architectures, Clipped Surrogate Objective, Generalized Advantage Estimation (GAE), V-trace Importance Sampling, Entropy Regularization, Asynchronous SGD, Reinforcement Learning, Distributed Systems, Parallel Computing, Optimization Theory, Stochastic Processes, Policy Gradient Methods, Actor-Critic Architectures, Parameter Synchronization, Trajectory Sampling, Experience Replay Management.
Design a simplified asynchronous PPO training loop. Implement a class that keeps a single shared global model synchronized with multiple worker actors. Workers should push trajectories to a buffer, and the global policy should be updated from those trajectories using PPO's clipped surrogate objective.
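A minimal sketch of one way to structure this. It rests on a toy setup that is not part of the problem statement: a tabular softmax policy (hypothetical `N_STATES` x `N_ACTIONS` logits) stands in for the global network, a one-step contextual bandit stands in for the environment, and a fixed 0.5 baseline replaces a learned critic and GAE. Worker threads snapshot the global parameters together with a version counter, roll out, and push `Trajectory` objects into a bounded `queue.Queue`. The learner pops them and runs a few epochs of gradient ascent on the clipped surrogate while holding a lock. All class, method and parameter names (`AsyncPPO`, `snapshot`, `max_lag`, and so on) are illustrative.

```python
import math
import queue
import random
import threading
from dataclasses import dataclass

# Hypothetical stand-in for the shared global network: a tabular softmax policy
# with one row of logits per state. A real system would use a neural network.
N_STATES, N_ACTIONS = 3, 2


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def clipped_surrogate(ratio, adv, eps=0.2):
    """Per-sample PPO objective: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    return min(ratio * adv, min(max(ratio, 1 - eps), 1 + eps) * adv)


@dataclass
class Trajectory:
    states: list
    actions: list
    old_probs: list   # behaviour-policy probability of each taken action
    advantages: list
    version: int      # global parameter version the worker acted with


class AsyncPPO:
    def __init__(self, lr=0.5, eps=0.2, epochs=4, buffer_size=4, max_lag=None):
        self.logits = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
        self.version = 0
        self.lock = threading.Lock()                    # guards logits and version
        self.buffer = queue.Queue(maxsize=buffer_size)  # bounded: limits staleness
        self.lr, self.eps, self.epochs, self.max_lag = lr, eps, epochs, max_lag
        self.dropped = 0

    def snapshot(self):
        """Workers copy the global parameters and their version under the lock."""
        with self.lock:
            return [row[:] for row in self.logits], self.version

    def worker(self, rounds, n_steps, seed):
        rng = random.Random(seed)
        for _ in range(rounds):
            params, version = self.snapshot()
            traj = Trajectory([], [], [], [], version)
            for _ in range(n_steps):
                s = rng.randrange(N_STATES)
                probs = softmax(params[s])
                a = rng.choices(range(N_ACTIONS), weights=probs)[0]
                # Toy contextual bandit (assumption): action s % N_ACTIONS pays 1.
                reward = 1.0 if a == s % N_ACTIONS else 0.0
                traj.states.append(s)
                traj.actions.append(a)
                traj.old_probs.append(probs[a])
                # Fixed 0.5 baseline stands in for a learned critic / GAE.
                traj.advantages.append(reward - 0.5)
            self.buffer.put(traj)  # blocks while the buffer is full

    def update(self, traj):
        """A few epochs of full-batch gradient ascent on the clipped surrogate."""
        with self.lock:
            if self.max_lag is not None and self.version - traj.version > self.max_lag:
                self.dropped += 1  # too stale: discard the whole trajectory
                return
            n = len(traj.states)
            for _ in range(self.epochs):
                grads = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
                for s, a, old_p, adv in zip(traj.states, traj.actions,
                                            traj.old_probs, traj.advantages):
                    probs = softmax(self.logits[s])
                    ratio = probs[a] / old_p  # importance ratio vs. behaviour policy
                    # Gradient is zero where the clipped term is the active minimum.
                    active = ratio <= 1 + self.eps if adv >= 0 else ratio >= 1 - self.eps
                    if not active:
                        continue
                    for b in range(N_ACTIONS):
                        grad_logp = (1.0 if b == a else 0.0) - probs[b]
                        grads[s][b] += adv * ratio * grad_logp / n
                for s in range(N_STATES):
                    for b in range(N_ACTIONS):
                        self.logits[s][b] += self.lr * grads[s][b]
            self.version += 1

    def train(self, n_workers=4, rounds=25, n_steps=32):
        threads = [threading.Thread(target=self.worker, args=(rounds, n_steps, i))
                   for i in range(n_workers)]
        for t in threads:
            t.start()
        for _ in range(n_workers * rounds):  # learner consumes every trajectory
            self.update(self.buffer.get())
        for t in threads:
            t.join()
```

For example, `AsyncPPO().train(n_workers=4, rounds=25)` runs 100 learner updates, after which `softmax(agent.logits[s])` should place most of its mass on action `s % N_ACTIONS`. Because the queue is bounded, a worker blocks on `put` once the learner falls behind, which caps how stale its snapshot can get. Samples that have still drifted too far from the behaviour policy are neutralized by the clip (their gradient is zero), and `max_lag` optionally discards whole trajectories. V-trace, listed in the topics, would be an alternative correction for this lag and is not implemented here.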