Asynchronous Dynamic Programming for Value Iteration

Advanced RL Theory, Planning & TD Learning DS practice problem on Onlearn.

Difficulty: medium.

Topics: Asynchronous Dynamic Programming for Value Iteration, In-place Updates, Gauss-Seidel Iteration, Contraction Mapping, Asynchronous Parallelism, State-Space Partitioning, Reinforcement Learning, Dynamic Programming, Numerical Optimization, Stochastic Processes, Computational Complexity, Value Iteration, Bellman Operators, Markov Decision Processes, Convergence Analysis, Parallel Computing.

Implement asynchronous value iteration for a Markov Decision Process (MDP). Unlike synchronous dynamic programming, where all states are updated simultaneously in each sweep using values from the previous iteration, asynchronous DP updates states one at a time in a given order, and each update immediately uses the most recent value estimates (in place updates). Given an MDP specified by: num states: the number of states transitions: a nested list where transitions[s][a] is a list of (probability, next state, reward) tuples representing the dynamics for taking action a in state s gamma: the discount factor update order: a list of state indices specifying which state to update at each step (states may repeat) Your function should: 1. Initialize the value function to zeros for all states 2. Process each state in update order sequentially, performing a Bellman optimality backup using the current value estimates 3. Return the final value function as a list of floats If a state has no available actions (empty action list), its value remains unchanged when selected for update. The Bellman optimality backup for a state computes the maximum expected return over all available actions.

dsAdvanced RL Theory, Planning & TD Learning

Tutor

Waking the tutor…

Advanced RL Theory, Planning & TD Learning

0 of 43 solved

Back to roadmap