Random Walk: TD vs Monte Carlo
Foundations & Tabular RL DS practice problem on Onlearn.
Difficulty: medium.
Topics: Temporal Difference (TD) learning vs Monte Carlo (MC) methods in a random walk environment, TD Error, Episode Trajectory, Terminal State Handling, State Value Function, Incremental Mean Estimation, Reinforcement Learning, Probability Theory, Dynamic Programming, Stochastic Processes, Statistics, Value Function Approximation, Policy Evaluation, Markov Decision Processes, Bootstrapping, Sampling Methods.
Implement a 1D random walk environment with 7 states (5 non-terminal, 2 terminal). Create two functions: 'monte carlo update', which updates the value of each visited state toward the return observed over a full episode, and 'td update', which updates a state's value toward a one-step bootstrapped target (the immediate reward plus the discounted value estimate of the next state). Assume a learning rate alpha = 0.1 and a discount factor gamma = 1.0.
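One possible sketch of the two updates is shown below. The problem statement leaves several details open, so the following are assumptions made only for illustration: states are indexed 0 to 6 with 0 and 6 terminal, each episode starts in the center state 3, moves left or right with equal probability, and pays a reward of 1 only on entering the right terminal state. The Python names monte_carlo_update, td_update, and generate_episode are hypothetical spellings, and the Monte Carlo variant shown is every-visit with a constant step size.

```python
import random

N_STATES = 7          # states 0..6 (assumed indexing)
TERMINALS = (0, 6)    # assumed: the two end states are terminal
ALPHA = 0.1           # learning rate from the problem statement
GAMMA = 1.0           # discount factor from the problem statement


def generate_episode(rng, start=3):
    """Run one random walk; return a list of (state, reward, next_state)."""
    state = start
    episode = []
    while state not in TERMINALS:
        next_state = state + rng.choice((-1, 1))  # assumed: equal-probability steps
        # Assumed reward scheme: 1 on reaching the right terminal, else 0.
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        episode.append((state, reward, next_state))
        state = next_state
    return episode


def monte_carlo_update(values, episode, alpha=ALPHA, gamma=GAMMA):
    """Every-visit, constant-alpha MC: move V(s) toward the full return G."""
    g = 0.0
    # Walk the episode backwards so the return accumulates incrementally.
    for state, reward, _ in reversed(episode):
        g = reward + gamma * g
        values[state] += alpha * (g - values[state])
    return values


def td_update(values, state, reward, next_state, alpha=ALPHA, gamma=GAMMA):
    """TD(0): move V(s) toward r + gamma * V(s'); returns the TD error."""
    # Terminal states have value 0 by definition, so do not bootstrap from them.
    bootstrap = 0.0 if next_state in TERMINALS else values[next_state]
    td_error = reward + gamma * bootstrap - values[state]
    values[state] += alpha * td_error
    return td_error


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed initialization: 0.5 for non-terminal states, 0 for terminals.
    v_mc = [0.0] + [0.5] * 5 + [0.0]
    v_td = list(v_mc)
    for _ in range(100):
        episode = generate_episode(rng)
        monte_carlo_update(v_mc, episode)
        for state, reward, next_state in episode:
            td_update(v_td, state, reward, next_state)
    print("MC:", [round(v, 3) for v in v_mc])
    print("TD:", [round(v, 3) for v in v_td])
```

The key difference is when each update can run: monte_carlo_update needs the whole episode before it can compute a return, while td_update can be applied after every single transition. Under the reward scheme assumed here, the true values of the five non-terminal states are 1/6, 2/6, 3/6, 4/6, and 5/6, which gives a reference point for comparing the two estimates.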