Expected Value in a Markov Decision Process

Advanced RL Theory, Planning & TD Learning DS practice problem on Onlearn.

Difficulty: medium.

Topics: Expected Value in a Markov Decision Process, Discount Factor, Transition Probability Matrix, State-Value Function, Stationary Distribution, Recursive Decomposition, Probability Theory, Dynamic Programming, Stochastic Processes, Functional Analysis, Decision Theory, Bellman Equations, Markov Chains, Expectation Operators, Value Function Approximation, Dynamic Programming Recursion.

Given a Markov decision process (MDP) specified by a set of states, actions, transition probabilities, and rewards, write a function that computes the expected value of taking a particular action in a particular state (the action value Q(s, a)), assuming a discount factor gamma. Use only NumPy.
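
One possible reading of the task is the one-step Bellman backup: Q(s, a) = sum over s' of P(s' | s, a) * [R(s, a, s') + gamma * V(s')]. The sketch below follows that reading. The problem statement does not fix the input layout, so the array shapes, the function name expected_action_value, and the assumption that an estimate V of the next-state values is supplied are all illustrative choices rather than part of the original problem.

```python
import numpy as np


def expected_action_value(P, R, V, state, action, gamma=0.9):
    """One-step expected value of taking `action` in `state`.

    Assumed (hypothetical) layout, not given by the problem statement:
      P[s, a, s2] : probability of moving from s to s2 under action a
      R[s, a, s2] : reward received on that transition
      V[s2]       : current estimate of the value of state s2
    """
    P = np.asarray(P, dtype=float)
    R = np.asarray(R, dtype=float)
    V = np.asarray(V, dtype=float)

    probs = P[state, action]          # distribution over next states
    if not np.isclose(probs.sum(), 1.0):
        raise ValueError("transition probabilities must sum to 1")

    # Expectation of immediate reward plus discounted next-state value.
    return float(np.dot(probs, R[state, action] + gamma * V))


# Tiny made-up MDP with 2 states and 2 actions, for illustration only.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.0, 0.0], [0.0, 1.0]]])
V = np.array([0.0, 10.0])

q = expected_action_value(P, R, V, state=0, action=1, gamma=0.5)
```

For this toy input, the next-state distribution is [0.1, 0.9] and the bracketed term is [0, 7], so q comes out at approximately 6.3. If rewards depend only on (s, a), the same function works with R[s, a, s2] holding the same number for every s2.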