First-Visit Monte Carlo Prediction

Foundations & Tabular RL DS practice problem on Onlearn.

Difficulty: medium.

Topics: First-Visit Monte Carlo Prediction, First-Visit Bias, Return Averaging, State-Value Function, Episode Trajectory, Empirical Mean, Reinforcement Learning, Probability Theory, Dynamic Programming, Statistical Estimation, Stochastic Processes, Monte Carlo Methods, Temporal Difference Learning, Markov Decision Processes, Policy Evaluation, Sampling Theory.

Implement first-visit Monte Carlo (MC) prediction for estimating state values. Monte Carlo methods learn directly from complete episodes of experience, without bootstrapping from other value estimates. In first-visit MC, the value of a state is estimated as the average of the returns that follow the first visit to that state in each episode; later visits to the same state within an episode are ignored. Your task is to process a set of episodes and compute the state-value estimates.
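The procedure described above can be sketched as follows. This is a minimal illustration, not the reference solution: the problem statement does not specify the input format, so the function name `first_visit_mc_prediction`, the representation of an episode as a list of `(state, reward)` pairs (the reward being the one received after leaving that state), and the discount parameter `gamma` are all assumptions made for this example.

```python
from collections import defaultdict


def first_visit_mc_prediction(episodes, gamma=1.0):
    """Estimate state values by averaging first-visit returns.

    Assumed input format (not given in the problem statement):
    `episodes` is a list of episodes, and each episode is a list of
    (state, reward) pairs, where `reward` follows the visit to `state`.
    """
    returns_sum = defaultdict(float)  # sum of first-visit returns per state
    returns_count = defaultdict(int)  # number of episodes that visited the state

    for episode in episodes:
        # Record the time step at which each state is first visited.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)

        # Walk the episode backwards, accumulating the discounted return.
        g = 0.0
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            g = gamma * g + reward
            # Only the return from the first visit counts toward the average.
            if first_visit[state] == t:
                returns_sum[state] += g
                returns_count[state] += 1

    # The value estimate is the empirical mean of the recorded returns.
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}


if __name__ == "__main__":
    demo_episodes = [
        [("A", 1.0), ("B", 2.0), ("A", 3.0)],  # "A" is visited twice
        [("B", 4.0)],
    ]
    print(first_visit_mc_prediction(demo_episodes))
```

In the demo with `gamma=1.0`, state "A" gets the return 6.0 from its first visit in the first episode (the second visit is ignored), and state "B" averages the returns 5.0 and 4.0, giving 4.5. Walking each episode backwards keeps the return computation linear in the episode length.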