Every-Visit Monte Carlo Prediction

Foundations & Tabular RL DS practice problem on Onlearn.

Difficulty: medium.

Topics: Understanding Every-Visit Monte Carlo Prediction in Reinforcement Learning, Every-Visit vs First-Visit, Discount Factor (gamma), Cumulative Returns, Incremental Mean Updating, Sample Average Convergence, Reinforcement Learning, Probability Theory, Dynamic Programming, Stochastic Processes, Statistical Estimation, Monte Carlo Methods, State-Value Functions, Return Estimation, Policy Evaluation, Episode Trajectories.

Implement the Every-Visit Monte Carlo prediction algorithm to estimate the state-value function V for a given policy. You are given a list of episodes, where each episode is a list of (state, reward) tuples. Estimate the value of each state as the average of the returns that follow every occurrence of that state, across all episodes. Unlike First-Visit Monte Carlo, a state that appears several times in one episode contributes one return per occurrence.
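One possible approach is sketched below in Python. It is a minimal illustration under stated assumptions, not a reference solution. The function name every_visit_mc_prediction and the gamma parameter (defaulting to 1.0) are hypothetical, since the problem statement does not fix a signature. The sketch also assumes that the reward in each (state, reward) tuple is included in the return credited to that state, so the return is accumulated backwards as G = reward + gamma * G.

```python
from collections import defaultdict


def every_visit_mc_prediction(episodes, gamma=1.0):
    """Estimate V(s) by averaging returns over every visit to s.

    Assumption: the reward paired with a state counts toward that
    state's return (G_t = r_t + gamma * G_{t+1}).
    """
    returns_sum = defaultdict(float)  # total of returns observed per state
    visit_count = defaultdict(int)    # number of visits per state

    for episode in episodes:
        g = 0.0
        # Walk the episode backwards so each return is built in O(1)
        # from the return of the following time step.
        for state, reward in reversed(episode):
            g = reward + gamma * g
            # Every-visit: record the return at each occurrence,
            # not only at the first one in the episode.
            returns_sum[state] += g
            visit_count[state] += 1

    return {s: returns_sum[s] / visit_count[s] for s in returns_sum}


episodes = [
    [("A", 1.0), ("B", 0.0), ("A", 2.0)],
    [("B", 1.0)],
]
values = every_visit_mc_prediction(episodes, gamma=0.5)
print(values)
```

In this example, state A is visited twice in the first episode with returns 1.5 and 2.0, so its estimate is 1.75. State B is visited once in each episode with return 1.0 both times, so its estimate is 1.0. Keeping a running sum and count is equivalent to the incremental mean update V(s) <- V(s) + (G - V(s)) / N(s).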