Hierarchical Action Space Flattening

Planning, Dynamics & Decision Systems DS practice problem on Onlearn.

Difficulty: medium.

Topics: Hierarchical Action Space Flattening, Action Flattening, Discrete-Continuous Hybridization, Bellman Equation, Jacobian Transformation, Embedding Dimensionality, Reinforcement Learning, Control Theory, Optimization Methods, Robotics Kinematics, Computational Geometry, Action Space Representation, Policy Gradient Methods, State-Space Modeling, Dynamic Programming, Manifold Learning.

In many reinforcement learning environments, actions are organized hierarchically: an agent first selects a high level action category, then chooses a specific sub action within that category. For example, a robot might first decide to "move" or "grasp", then select a direction or grip type. The number of sub actions can differ across top level actions. However, most standard RL algorithms (e.g., DQN, policy gradient methods) expect a flat, single integer action space. To bridge this gap, we need to flatten the hierarchical action space into contiguous integer indices and maintain bidirectional mappings. Implement a function flatten action space that takes: sub action counts: A list of integers where sub action counts[i] is the number of sub actions available under top level action i. flat probs (optional): A list of floats representing a probability distribution over the flat action space. The function should return a dictionary with: "n flat": The total number of flat actions. "flat to hier": A dictionary mapping each flat action index to a (top action, sub action) tuple. "hier to flat": A dictionary mapping each (top action, sub action) tuple to its flat index. Flat indices should be assigned by iterating through top level actions in order, and within each top level action, iterating through sub actions in order. If flat probs is provided, the result should additionally contain: "marginal probs": A list of floats giving the marginal probability of each top level action (sum of flat probabilities for all sub actions under that top level action), each rounded to 4 decimal places. "conditional probs": A list of lists, where entry i gives the conditional probability distribution over sub actions given top level action i was chosen, each value rounded to 4 decimal places. If the marginal probability for a top level action is zero, distribute probability uniformly across its sub actions.