Priority Queue in Planning

Advanced RL Theory, Planning & TD Learning DS practice problem on Onlearn.

Difficulty: medium.

Topics: Priority Queue in Planning, Binary Heap, Dijkstra's Algorithm, Prioritized Sweeping, TD Error Magnitude, Fibonacci Heap, Reinforcement Learning, Data Structures, Dynamic Programming, Graph Theory, Computational Complexity, Temporal Difference Learning, Heuristic Search Algorithms, Priority Queue Implementations, Bellman Equation Solvers, State Space Traversal.

Implement a planning function that uses a priority queue to efficiently perform value updates on a Q table using a learned environment model. In model based reinforcement learning, an agent can learn a model of the environment and use it for planning. Instead of updating all state action pairs uniformly, a priority queue can focus computation on the most impactful updates first. The priority of each state action pair is determined by the magnitude of its Bellman error how much the current Q value differs from the target value implied by the model. Your function should: 1. Accept a Q table (num states x num actions), a model dictionary mapping (state, action) to (reward, next state), a predecessors dictionary mapping each state to a list of (state, action) pairs that lead to it, a discount factor gamma, a priority threshold theta, and a maximum number of update steps. 2. Initialize the priority queue by computing the Bellman error for every (state, action) pair in the model, adding those with error exceeding theta. 3. Repeatedly pop the highest priority (state, action) pair from the queue and perform the Bellman update on Q(s, a). After each update, compute the Bellman error for all predecessor (state, action) pairs of the updated state and add them to the queue if their error exceeds theta. 4. Stop when the queue is empty or when max steps updates have been performed. 5. Return the updated Q table as a list of lists, rounded to 4 decimal places. Use a max priority queue (highest priority popped first). Handle stale entries in the priority queue appropriately.

dsAdvanced RL Theory, Planning & TD Learning

Tutor

Waking the tutor…

Advanced RL Theory, Planning & TD Learning

0 of 43 solved

Back to roadmap