Gym/Gymnasium API Migration Layer
Planning, Dynamics & Decision Systems DS practice problem on Onlearn.
Difficulty: medium.
Topics: Gym/Gymnasium API Migration Layer, Gymnasium Step API, Observation Space Mapping, Action Space Normalization, Wrapper Pattern, Reset Method Signature, Reinforcement Learning, Software Engineering, Control Theory, Robotics Simulation, API Design, Environment Abstraction, Interface Versioning, State Space Modeling, Middleware Architecture, Backward Compatibility.
Reinforcement learning environments have historically used two different Python API conventions for their reset() and step() methods. Your task is to implement a conversion function that translates outputs between these two formats.

Old API format:
reset() returns: observation
step(action) returns: (observation, reward, done, info)

New API format:
reset() returns: (observation, info)
step(action) returns: (observation, reward, terminated, truncated, info)

The key semantic difference is that the old done flag conflated two distinct concepts: (1) the episode ended because a terminal state was reached (terminated), and (2) the episode ended because an external time limit was hit (truncated). In the old format, truncation was sometimes signaled by including "TimeLimit.truncated": True in the info dictionary.

Implement gym_api_convert(data, call_type, direction), which converts a single API output.

Arguments:
data: the output to convert (a bare observation for an old-format reset, a tuple for everything else)
call_type: either 'reset' or 'step'
direction: either 'old_to_new' or 'new_to_old'

Returns: the converted output in the target format.

Conversion details:
For reset, old to new: the info dictionary in the new format should be empty.
For reset, new to old: only the observation is returned.
For step conversions: correctly map done to and from terminated/truncated, using the "TimeLimit.truncated" info key to recover truncated when splitting done and writing it back when merging terminated and truncated into done.
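A minimal Python sketch of gym_api_convert follows. The statement leaves the exact "TimeLimit.truncated" handling open, so this version assumes one convention: when splitting done, the key is read and removed from a copy of info; when merging, the key is written only if the episode ended, and a genuine terminal state takes precedence over the time limit. The TRUNC_KEY constant and the ValueError for unrecognized arguments are illustrative choices, not part of the original spec.

```python
from typing import Any

# Hypothetical constant name for the legacy truncation flag stored in info.
TRUNC_KEY = "TimeLimit.truncated"


def gym_api_convert(data: Any, call_type: str, direction: str) -> Any:
    """Convert a single reset()/step() output between the old and new APIs."""
    if call_type == "reset":
        if direction == "old_to_new":
            # Old reset returns a bare observation; pair it with an empty info dict.
            return data, {}
        if direction == "new_to_old":
            # New reset returns (observation, info); the old API keeps only the observation.
            obs, _info = data
            return obs
    elif call_type == "step":
        if direction == "old_to_new":
            obs, reward, done, info = data
            info = dict(info)  # copy so the caller's dict is not mutated
            # Assumption: the legacy flag is consumed once it becomes the truncated field.
            time_limit = bool(info.pop(TRUNC_KEY, False))
            # done is split into exactly one of terminated / truncated.
            truncated = bool(done) and time_limit
            terminated = bool(done) and not time_limit
            return obs, reward, terminated, truncated, info
        if direction == "new_to_old":
            obs, reward, terminated, truncated, info = data
            info = dict(info)  # copy so the caller's dict is not mutated
            done = bool(terminated or truncated)
            if done:
                # Assumption: a real terminal state takes precedence over the time limit.
                info[TRUNC_KEY] = bool(truncated and not terminated)
            return obs, reward, done, info
    raise ValueError(
        f"unsupported call_type={call_type!r} / direction={direction!r}"
    )
```

For example, gym_api_convert((obs, 1.0, True, {"TimeLimit.truncated": True}), "step", "old_to_new") returns (obs, 1.0, False, True, {}), and converting that result with "new_to_old" restores (obs, 1.0, True, {"TimeLimit.truncated": True}). Copying info keeps the function free of side effects on the caller's dictionary. Note that the mapping is lossy in one case: a new-format step with both terminated and truncated set becomes done=True with "TimeLimit.truncated": False, so converting it back yields terminated=True, truncated=False.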