Gridworld Policy Evaluation

Foundations & Tabular RL DS practice problem on Onlearn.

Difficulty: medium.

Topics: Understanding Gridworld Policy Evaluation, Policy Evaluation, Value Function Approximation, In-place Updates, Terminal State Handling, Stopping Criteria, Reinforcement Learning, Dynamic Programming, Numerical Analysis, Probability Theory, Discrete Mathematics, Markov Decision Processes, Iterative Methods, Bellman Equations, State Space Modeling, Convergence Analysis.

Implement policy evaluation for a 5x5 gridworld. Given a policy (a mapping from each state to action probabilities), compute the state-value function $V(s)$ for every cell using the Bellman expectation equation. The agent can move up, down, left, or right and receives a constant reward of 1 for each move. The terminal states (the four corners) have their values fixed at 0. Iterate until the largest change in $V$ over a full sweep of the grid is less than a given threshold. Use only Python built-ins; do not use external RL libraries.
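
One way to approach this is sketched below. It is not a reference solution, and it rests on several assumptions that the problem statement does not fix: a discount factor `gamma` (default 0.9), a policy given as a dict from `(row, col)` to a dict of action name to probability, and the convention that a move off the grid leaves the agent in place. The function name `evaluate_policy` and the parameters `size` and `reward` are hypothetical. With deterministic moves, each sweep applies the update $V(s) \leftarrow \sum_a \pi(a \mid s)\,[r + \gamma V(s')]$ in place, where $s'$ is the cell reached from $s$ by action $a$.

```python
def evaluate_policy(policy, gamma=0.9, threshold=1e-4, size=5, reward=1.0):
    """Iterative policy evaluation with in-place updates (illustrative sketch).

    policy: dict mapping (row, col) -> dict mapping action name -> probability.
    gamma, size, and reward are assumed parameters, not given by the problem.
    """
    # Action name -> (row delta, column delta).
    moves = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    # The four corners are terminal; their values stay at 0.
    terminals = {(0, 0), (0, size - 1), (size - 1, 0), (size - 1, size - 1)}
    V = [[0.0] * size for _ in range(size)]

    while True:
        delta = 0.0  # largest change seen in this sweep
        for r in range(size):
            for c in range(size):
                if (r, c) in terminals:
                    continue
                new_v = 0.0
                for action, prob in policy[(r, c)].items():
                    dr, dc = moves[action]
                    nr, nc = r + dr, c + dc
                    # Assumption: a move off the grid leaves the agent in place.
                    if not (0 <= nr < size and 0 <= nc < size):
                        nr, nc = r, c
                    new_v += prob * (reward + gamma * V[nr][nc])
                delta = max(delta, abs(new_v - V[r][c]))
                V[r][c] = new_v  # in-place: later cells in this sweep see it
        if delta < threshold:
            return V


# Example: evaluate the uniform random policy.
actions = ("up", "down", "left", "right")
uniform = {(r, c): {a: 0.25 for a in actions}
           for r in range(5) for c in range(5)}
V = evaluate_policy(uniform, gamma=0.9, threshold=1e-4)
```

Updating `V` in place means each sweep already uses the newest estimates for cells visited earlier in that sweep, which typically converges in fewer sweeps than keeping a separate copy of the old values. With `gamma` below 1 the loop terminates for any policy. With `gamma = 1` it terminates only if the policy reaches a terminal state with probability 1.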