RL Environment Wrapper Profiling

RL Environments, Games & Applications DS practice problem on Onlearn.

Difficulty: medium.

Topics: RL Environment Wrapper Profiling, Gymnasium Wrapper Pattern, Wall-clock Latency Measurement, Memory Footprint Tracking, Decorator Design Pattern, Context Switching Overhead, Reinforcement Learning, Software Engineering, Performance Analysis, Systems Architecture, Data Structures, Environment Abstraction, Execution Profiling, Computational Complexity, Middleware Design, Asynchronous Processing.

In reinforcement learning, environments are often wrapped in multiple layers (observation normalization, frame stacking, reward clipping, etc.). Each wrapper adds processing overhead to every environment step, and profiling these wrappers helps identify bottlenecks in the data-collection pipeline.

Implement a function profile_env_wrappers that takes timing measurements collected at each wrapper level and computes per-wrapper profiling statistics.

The function receives:
- wrapper_names: a list of strings naming the wrappers, ordered from outermost to innermost (the last entry is always the base environment).
- cumulative_step_times: a list of lists of floats, where cumulative_step_times[i] contains multiple timing measurements (in milliseconds) recorded at wrapper level i. Each measurement at level i includes the execution time of all inner wrappers and the base environment.

The function should compute and return a dictionary with:
- wrappers: a list of dictionaries (one per wrapper, in the same order), each containing:
  - name: the wrapper name
  - mean_cumulative_ms: the mean of the cumulative times at that level (rounded to 2 decimal places)
  - overhead_ms: the overhead introduced by this wrapper alone (rounded to 2 decimal places)
  - overhead_pct: the percentage of the total step time attributable to this wrapper (rounded to 2 decimal places)
- bottleneck: the name of the wrapper with the highest overhead (if tied, use the one that appears first in the list)
- total_overhead_ms: the total average step time, equal to the outermost mean_cumulative_ms (rounded to 2 decimal places)
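A minimal sketch of profile_env_wrappers follows. The statement leaves a few details open, so the sketch makes these assumptions (also repeated in the docstring): a wrapper's own overhead is its mean cumulative time minus the mean at the next inner level, and the base environment's overhead is its own mean; means are subtracted unrounded and only the reported values are rounded; the bottleneck is chosen from the rounded overhead_ms values and may be the base environment; overhead_pct is 0.0 when the total is zero. The wrapper names in the usage lines are hypothetical.

```python
from statistics import fmean
from typing import Any


def profile_env_wrappers(
    wrapper_names: list[str],
    cumulative_step_times: list[list[float]],
) -> dict[str, Any]:
    """Compute per-wrapper latency statistics from cumulative step timings.

    Assumptions (not fixed by the problem statement):
      * overhead at level i = mean(level i) - mean(level i + 1);
        the innermost entry (base env) keeps its own mean as overhead;
      * subtraction uses unrounded means; only reported values are rounded;
      * the bottleneck is picked from the rounded overhead_ms values;
      * overhead_pct is 0.0 when the total step time is zero.
    """
    if not wrapper_names or len(wrapper_names) != len(cumulative_step_times):
        raise ValueError("expected one list of timings per wrapper")

    # Mean cumulative latency per level, outermost first (fmean fails on empty lists).
    means = [fmean(times) for times in cumulative_step_times]
    total = means[0]  # the outermost level already includes every inner layer

    wrappers = []
    for i, name in enumerate(wrapper_names):
        inner = means[i + 1] if i + 1 < len(means) else 0.0
        overhead = means[i] - inner  # time spent in this layer alone
        pct = overhead / total * 100 if total else 0.0
        wrappers.append({
            "name": name,
            "mean_cumulative_ms": round(means[i], 2),
            "overhead_ms": round(overhead, 2),
            "overhead_pct": round(pct, 2),
        })

    # max() returns the first maximal element, which implements the tie-break rule.
    bottleneck = max(wrappers, key=lambda w: w["overhead_ms"])["name"]

    return {
        "wrappers": wrappers,
        "bottleneck": bottleneck,
        "total_overhead_ms": round(total, 2),
    }


if __name__ == "__main__":
    names = ["RewardClip", "FrameStack", "NormalizeObs", "BaseEnv"]
    times = [[10.0, 12.0], [9.0, 11.0], [4.0, 6.0], [3.0, 3.0]]
    report = profile_env_wrappers(names, times)
    print("bottleneck:", report["bottleneck"])  # bottleneck: FrameStack
```

In this example the level means are 11.0, 10.0, 5.0 and 3.0 ms, so the per-layer overheads are 1.0, 5.0, 2.0 and 3.0 ms (9.09%, 45.45%, 18.18% and 27.27% of 11.0 ms). Measurement noise can make a computed overhead slightly negative; the sketch reports it as-is rather than clamping it.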
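To see why the measurements at each level are cumulative, here is a sketch of how they might be collected with the decorator pattern. It is an illustration under stated assumptions, not the problem's harness: BaseEnv, SlowWrapper, TimedLayer and build_profiled_stack are hypothetical names, the classes are plain duck-typed objects (not gymnasium.Wrapper subclasses) so the sketch runs without Gymnasium installed, and step returns a simplified (observation, reward) pair instead of Gymnasium's five-tuple. A TimedLayer probe is placed above every layer and records the wall-clock time of each step() call with time.perf_counter.

```python
import time
from typing import Any


class BaseEnv:
    """Toy base environment (hypothetical); sleeps to mimic simulation cost."""

    def step(self, action: Any) -> tuple[int, float]:
        time.sleep(0.002)
        return 0, 1.0  # simplified (observation, reward)


class SlowWrapper:
    """Toy wrapper (hypothetical) that adds a fixed post-processing delay."""

    def __init__(self, env: Any, delay_s: float) -> None:
        self.env = env
        self.delay_s = delay_s

    def step(self, action: Any) -> tuple[int, float]:
        obs, reward = self.env.step(action)
        time.sleep(self.delay_s)  # stand-in for normalization, stacking, ...
        return obs, reward


class TimedLayer:
    """Probe that records the wall-clock latency of env.step() in ms.

    Each sample spans the wrapped object and everything inside it,
    so the samples are cumulative in the sense used by the problem.
    """

    def __init__(self, env: Any) -> None:
        self.env = env
        self.samples_ms: list[float] = []

    def step(self, action: Any) -> Any:
        start = time.perf_counter()
        result = self.env.step(action)
        self.samples_ms.append((time.perf_counter() - start) * 1000.0)
        return result


def build_profiled_stack(delays_s: list[float]) -> tuple[Any, list[TimedLayer]]:
    """Wrap BaseEnv in SlowWrappers (delays ordered outermost -> innermost),
    with a probe above every layer; probes are returned outermost first."""
    probe = TimedLayer(BaseEnv())
    probes = [probe]
    for delay in reversed(delays_s):  # build from the inside out
        probe = TimedLayer(SlowWrapper(probe, delay))
        probes.append(probe)
    probes.reverse()
    return probe, probes


if __name__ == "__main__":
    env, probes = build_profiled_stack([0.001, 0.003])
    for _ in range(5):
        env.step(0)
    cumulative_step_times = [p.samples_ms for p in probes]
    for level, samples in enumerate(cumulative_step_times):
        print(level, [round(s, 2) for s in samples])
```

Because each probe's time window encloses the probe below it, every outer sample is at least as large as the matching inner sample. The probes themselves add a small cost that is attributed to the enclosing level, which is one reason per-layer overheads derived from these samples are approximate.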