RL Training Experiment Tracking
RL Environments, Games & Applications DS practice problem on Onlearn.
Difficulty: medium.
Topics: RL Training Experiment Tracking, WandB Artifacts, Proximal Policy Optimization, Git LFS, Matplotlib Heatmaps, Kubernetes Pod Scaling, Experiment Management, Reinforcement Learning Theory, Software Engineering, Data Visualization, Cloud Infrastructure, Hyperparameter Optimization, Policy Gradient Methods, Version Control Systems, Time-Series Analytics, Distributed Computing.
During reinforcement learning training, it is essential to log and summarize performance metrics across episodes to understand learning progress, detect issues, and compare experiments.

Implement a function track_rl_experiment that takes episode-level training data and produces a summary report.

The function receives:
- episode_rewards (list of floats): The total reward obtained in each episode
- episode_lengths (list of ints): The number of timesteps in each episode
- window_size (int, default 10): The size of the sliding window for computing recent statistics

Return a dictionary with the following keys:
- total_episodes: The number of episodes recorded
- mean_reward: The mean of all episode rewards
- std_reward: The population standard deviation of all episode rewards
- max_reward: The highest reward achieved in any episode
- min_reward: The lowest reward achieved in any episode
- best_episode: The 0-indexed episode number with the highest reward (if tied, return the earliest)
- mean_length: The mean episode length across all episodes
- final_moving_avg: The mean reward over the last window_size episodes (or all episodes if fewer than window_size exist)
- reward_improvement: The difference between the mean reward of the last window_size episodes and the mean reward of the first window_size episodes (using the effective window, i.e. all episodes, if fewer than window_size exist)

All floating-point values in the output should be rounded to 4 decimal places.
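The specification above can be sketched in Python roughly as follows. This is one possible reading, not a reference solution, and it makes several assumptions the problem statement does not cover: the two input lists are non-empty and of equal length, an empty input returns a dictionary holding only total_episodes, and a window_size below 1 is clamped to 1.

```python
import math
from typing import Dict, List, Union


def track_rl_experiment(
    episode_rewards: List[float],
    episode_lengths: List[int],
    window_size: int = 10,
) -> Dict[str, Union[int, float]]:
    """Summarize episode-level RL training data (illustrative sketch)."""
    n = len(episode_rewards)
    if n == 0:
        # Assumption: empty-input behavior is not defined by the problem.
        return {"total_episodes": 0}

    mean_reward = sum(episode_rewards) / n
    # Population standard deviation: divide by n, not n - 1.
    variance = sum((r - mean_reward) ** 2 for r in episode_rewards) / n
    std_reward = math.sqrt(variance)

    max_reward = max(episode_rewards)
    min_reward = min(episode_rewards)
    # list.index returns the first match, so ties resolve to the earliest episode.
    best_episode = episode_rewards.index(max_reward)

    mean_length = sum(episode_lengths) / len(episode_lengths)

    # Effective window: never larger than the number of episodes.
    # Assumption: a window_size below 1 is clamped to 1.
    w = max(1, min(window_size, n))
    first_avg = sum(episode_rewards[:w]) / w
    last_avg = sum(episode_rewards[-w:]) / w

    return {
        "total_episodes": n,
        "mean_reward": round(mean_reward, 4),
        "std_reward": round(std_reward, 4),
        "max_reward": round(max_reward, 4),
        "min_reward": round(min_reward, 4),
        "best_episode": best_episode,
        "mean_length": round(mean_length, 4),
        "final_moving_avg": round(last_avg, 4),
        "reward_improvement": round(last_avg - first_avg, 4),
    }
```

For example, calling track_rl_experiment([1.0, 2.0, 3.0, 4.0], [10, 20, 30, 40], window_size=2) gives mean_reward 2.5, std_reward 1.118, best_episode 3, final_moving_avg 3.5, and reward_improvement 2.0. When there are fewer episodes than window_size, the first and last windows both cover every episode, so reward_improvement is 0.0.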