First-Visit Monte Carlo Control with Exploring Starts

Foundations & Tabular RL DS practice problem on Onlearn.

Difficulty: hard.

Topics: Understanding First-Visit Monte Carlo Control with Exploring Starts, First-Visit Averaging, Exploring Starts Assumption, Greedy Policy Improvement, Return Accumulation, State-Action Value Function (Q-table), Reinforcement Learning, Probability Theory, Stochastic Processes, Dynamic Programming, Optimization, Monte Carlo Methods, Policy Iteration, Action-Value Estimation, Generalized Policy Iteration (GPI), Exploration Strategies.

Implement a First-Visit Monte Carlo Control algorithm with Exploring Starts for a generic environment. The implementation should handle state-action pairs, return calculation, and greedy policy improvement. Given a list of episodes (each episode is a list of (state, action, reward) tuples), compute the estimated optimal Q-table, averaging returns over first visits only, and the resulting deterministic greedy policy.
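
The task above can be sketched roughly as follows. This is a minimal illustration, not a reference solution. The problem statement does not fix several details, so these are assumptions: the function name first_visit_mc_control, the discount parameter gamma (defaulting to 1.0), the convention that the reward in each tuple is the one received after taking that action in that state, and the tie-breaking rule (the action whose estimate was recorded first wins).

```python
from collections import defaultdict


def first_visit_mc_control(episodes, gamma=1.0):
    """Estimate Q from first-visit returns and derive a greedy policy.

    episodes: list of episodes, each a list of (state, action, reward).
    gamma: discount factor (an assumption; not given in the problem).
    Returns (q, policy) where q maps (state, action) -> mean return
    and policy maps state -> greedy action.
    """
    returns_sum = defaultdict(float)   # total first-visit return per pair
    returns_count = defaultdict(int)   # number of first visits per pair
    q = {}

    for episode in episodes:
        # Record the time step at which each pair first occurs.
        first_visit = {}
        for t, (state, action, _) in enumerate(episode):
            first_visit.setdefault((state, action), t)

        # Walk backwards so the return G_t is built incrementally.
        g = 0.0
        for t in range(len(episode) - 1, -1, -1):
            state, action, reward = episode[t]
            g = gamma * g + reward
            pair = (state, action)
            if first_visit[pair] == t:  # only the first visit counts
                returns_sum[pair] += g
                returns_count[pair] += 1
                q[pair] = returns_sum[pair] / returns_count[pair]

    # Greedy policy improvement: pick the highest-valued action per state.
    policy = {}
    for (state, action), value in q.items():
        if state not in policy or value > q[(state, policy[state])]:
            policy[state] = action
    return q, policy
```

Because the episodes are supplied up front, the exploring-starts assumption is a property of how they were generated rather than something this function can enforce. States or actions that never appear in the data simply have no entry in the Q-table or the policy.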
