The Bellman Equation for Value Iteration
Foundations & Tabular RL DS practice problem on Onlearn.
Difficulty: medium.
Topics: Bellman Equation for Value Iteration, State-Value Function, Transition Probability Matrix, Discount Factor, Expected Utility, Argmax Operator, Reinforcement Learning, Dynamic Programming, Probability Theory, Numerical Analysis, Decision Theory, Markov Decision Processes, Bellman Optimality, Iterative Methods, Stochastic Processes, Policy Evaluation.
Write a function that performs one step of value iteration for a given Markov Decision Process (MDP) using the Bellman equation. For each state s, the function should update the state-value function V(s) to the maximum, over the available actions, of the expected reward plus the discounted value of the next state, computed from the transition probabilities, the rewards, and the discount factor gamma. Use only NumPy.
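
The following is a minimal sketch of one such update, not a reference solution. It applies the standard Bellman optimality backup, V_new(s) = max_a sum_{s'} P(s' | s, a) * (R(s, a, s') + gamma * V(s')). The problem statement does not fix an input format, so the function name `bellman_update` and the dense array layout (`P` and `R` with shape (states, actions, next states)) are assumptions made for illustration.

```python
import numpy as np


def bellman_update(V, P, R, gamma):
    """Return the state values after one value-iteration step.

    Assumed (hypothetical) layout, not given by the problem statement:
      V: shape (S,)       current value of each state
      P: shape (S, A, S)  P[s, a, s2] = probability of s -> s2 under action a
      R: shape (S, A, S)  R[s, a, s2] = reward received on that transition
      gamma: discount factor in [0, 1]
    """
    V = np.asarray(V, dtype=float)
    P = np.asarray(P, dtype=float)
    R = np.asarray(R, dtype=float)

    # Q[s, a] = sum over s2 of P[s, a, s2] * (R[s, a, s2] + gamma * V[s2]).
    # V is broadcast along the last (next-state) axis.
    Q = np.sum(P * (R + gamma * V[None, None, :]), axis=2)

    # Bellman optimality backup: keep the best action value in each state.
    return Q.max(axis=1)


# Toy MDP with two states and two actions (values chosen for illustration).
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],   # state 0: action 0 stays, action 1 moves to 1
    [[0.5, 0.5], [0.0, 1.0]],   # state 1: action 0 is a coin flip, action 1 stays
])
R = np.array([
    [[0.0, 0.0], [0.0, 1.0]],   # moving 0 -> 1 pays 1
    [[2.0, 0.0], [0.0, 0.0]],   # landing in state 0 from state 1 pays 2
])

V0 = np.zeros(2)
V1 = bellman_update(V0, P, R, gamma=0.9)   # first step from all-zero values
V2 = bellman_update(V1, P, R, gamma=0.9)   # second step reuses the result
print(V1, V2)
```

Returning a new array instead of overwriting `V` in place keeps the update synchronous: every state is backed up from the same previous values. Calling the function repeatedly until the values stop changing gives full value iteration.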