First-Visit Monte Carlo Control with Exploring Starts
Foundations & Tabular RL DS practice problem on Onlearn.
Difficulty: hard.
Topics: Understanding First-Visit Monte Carlo Control with Exploring Starts, First-Visit Averaging, Exploring Starts Assumption, Greedy Policy Improvement, Return Accumulation, State-Action Value Function (Q-table), Reinforcement Learning, Probability Theory, Stochastic Processes, Dynamic Programming, Optimization, Monte Carlo Methods, Policy Iteration, Action-Value Estimation, Generalized Policy Iteration (GPI), Exploration Strategies.
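Implement First-Visit Monte Carlo Control with Exploring Starts for a generic environment. The implementation should handle state-action pairs, return calculation, and greedy policy improvement. Given a list of episodes (each episode is a list of (state, action, reward) tuples), compute the Q-table, where each entry is the average of the returns that follow the first visit to that state-action pair in each episode, and the deterministic policy that is greedy with respect to it. Under the exploring starts assumption, every state-action pair has a nonzero probability of starting an episode, so every pair is eventually visited.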
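One possible approach is sketched below in Python. It is not a reference solution. The function name `first_visit_mc_control`, the `gamma` discount parameter (defaulting to 1.0), and the tie-breaking rule are assumptions, since the problem statement does not specify them. Because the episodes are supplied up front, the sketch accumulates returns over all episodes and extracts the greedy policy once at the end; with a fixed episode list this gives the same final policy as improving it after every episode.

```python
from collections import defaultdict


def first_visit_mc_control(episodes, gamma=1.0):
    """Estimate Q by first-visit averaging and return (Q, greedy policy).

    episodes: list of episodes, each a list of (state, action, reward) tuples.
    gamma:    discount factor (an assumption; not given in the problem text).
    """
    returns_sum = defaultdict(float)   # total first-visit return per (s, a)
    returns_count = defaultdict(int)   # number of first visits per (s, a)
    Q = {}

    for episode in episodes:
        # Record the time step at which each (s, a) pair first occurs.
        first_idx = {}
        for t, (s, a, _) in enumerate(episode):
            first_idx.setdefault((s, a), t)

        # Walk backwards so the return G can be accumulated incrementally.
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            s, a, r = episode[t]
            G = gamma * G + r
            if first_idx[(s, a)] == t:  # only the first visit counts
                returns_sum[(s, a)] += G
                returns_count[(s, a)] += 1
                Q[(s, a)] = returns_sum[(s, a)] / returns_count[(s, a)]

    # Greedy policy improvement: pick the highest-valued action per state.
    # Ties keep whichever action was recorded first (an arbitrary choice).
    policy = {}
    for (s, a), q in Q.items():
        if s not in policy or q > Q[(s, policy[s])]:
            policy[s] = a

    return Q, policy


if __name__ == "__main__":
    episodes = [
        [("A", "left", 0), ("B", "right", 1)],
        [("A", "right", 0), ("B", "left", 0)],
        [("A", "left", 0), ("A", "left", 0), ("B", "right", 2)],
    ]
    Q, policy = first_visit_mc_control(episodes)
    print(Q)
    print(policy)
```

In the third example episode, the pair ("A", "left") occurs twice, but only the return from its first occurrence is added to the average. Only state-action pairs that actually appear in the episodes receive a Q-value, so the policy is defined only for visited states.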