Episodic Info Dictionary Aggregation

Planning, Dynamics & Decision Systems DS practice problem on Onlearn.

Difficulty: medium.

Topics: Episodic Info Dictionary Aggregation, Dictionary Key Collision Resolution, Temporal Difference Error Accumulation, Circular Buffer Eviction Policy, Recursive Mean Estimation, Sparse Reward Vectorization, Reinforcement Learning, Control Theory, Data Structures, Probability Theory, System Identification, Episodic Memory Management, State-Space Modeling, Hash Map Optimization, Bayesian Inference, Trajectory Rollout Analysis.

In reinforcement learning training loops, environments return an info dictionary at each timestep. When an episode terminates at a given step, the info dictionary typically contains an "episode" key whose value is a sub-dictionary holding summary statistics for the completed episode: "r" for the total episodic reward and "l" for the episode length (number of steps). When no episode ends at a step, the "episode" key is simply absent from that info dictionary.

When training across many timesteps (and potentially many parallel environments), these info dictionaries accumulate. To monitor training progress, we need to detect which steps correspond to completed episodes and compute aggregate statistics over all completed episodes.

Implement a function aggregate_episodic_info that takes a list of info dictionaries and returns a single summary dictionary with the following keys:

"num_episodes": integer count of completed episodes found
"mean_reward": mean of all episode rewards, rounded to 4 decimal places
"mean_length": mean of all episode lengths, rounded to 4 decimal places
"min_reward": minimum episode reward, rounded to 4 decimal places
"max_reward": maximum episode reward, rounded to 4 decimal places
"min_length": minimum episode length (integer)
"max_length": maximum episode length (integer)

If no episodes were completed (no info dict contains the "episode" key), return all numeric values as 0 (use 0.0 for float fields and 0 for integer fields).
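One possible way to approach the task is sketched below in Python. It is a minimal sketch rather than a reference solution, and it rests on a few assumptions: the function name and the summary keys are written with underscores (aggregate_episodic_info, "num_episodes", "mean_reward", and so on), the parameter name infos is hypothetical, and every "episode" sub-dictionary is assumed to contain both "r" and "l".

```python
from typing import Any, Dict, List


def aggregate_episodic_info(infos: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Keep only the "episode" sub-dictionaries; steps where no episode
    # ended simply lack the key and are skipped.
    episodes = [info["episode"] for info in infos if "episode" in info]

    # No completed episodes: 0.0 for float fields, 0 for integer fields.
    if not episodes:
        return {
            "num_episodes": 0,
            "mean_reward": 0.0,
            "mean_length": 0.0,
            "min_reward": 0.0,
            "max_reward": 0.0,
            "min_length": 0,
            "max_length": 0,
        }

    # Assumption: "r" is numeric and "l" is an integer step count.
    rewards = [float(ep["r"]) for ep in episodes]
    lengths = [int(ep["l"]) for ep in episodes]
    n = len(episodes)

    return {
        "num_episodes": n,
        "mean_reward": round(sum(rewards) / n, 4),
        "mean_length": round(sum(lengths) / n, 4),
        "min_reward": round(min(rewards), 4),
        "max_reward": round(max(rewards), 4),
        "min_length": min(lengths),
        "max_length": max(lengths),
    }


# Usage: two of the four steps end an episode.
infos = [
    {},
    {"episode": {"r": 10.0, "l": 5}},
    {"other": 1},
    {"episode": {"r": 20.0, "l": 15}},
]
summary = aggregate_episodic_info(infos)
# summary["num_episodes"] is 2 and summary["mean_reward"] is 15.0
```

Filtering once and then branching on the empty case keeps the division by the episode count safe. With parallel environments, the per-environment info dictionaries would presumably be flattened into a single list before the call.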