Random Walk: TD vs Monte Carlo

Foundations & Tabular RL DS practice problem on Onlearn.

Difficulty: medium.

Topics: Temporal Difference (TD) learning vs Monte Carlo (MC) methods in a random walk environment, TD Error, Episode Trajectory, Terminal State Handling, State Value Function, Incremental Mean Estimation, Reinforcement Learning, Probability Theory, Dynamic Programming, Stochastic Processes, Statistics, Value Function Approximation, Policy Evaluation, Markov Decision Processes, Bootstrapping, Sampling Methods.

Implement a 1D random walk environment with 7 states (5 non-terminal, 2 terminal). Create two functions: 'monte carlo update', which updates state values from the return of a full episode, and 'td update', which updates state values using a one-step bootstrap. Assume a learning rate alpha = 0.1 and a discount factor gamma = 1.0.
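
One possible sketch of the two updates in Python is shown here; it is a starting point under stated assumptions, not a reference solution. The problem statement does not give the reward scheme, the start state, the initial values, or the exact function signatures, so the following are all assumptions: states are numbered 0 to 6 with 0 and 6 terminal, episodes start in state 3, the only non-zero reward is +1 on entering state 6, and non-terminal values start at 0.5. The names generate_episode and run are hypothetical helpers, and the underscore spellings monte_carlo_update and td_update are a guess at the intended identifiers.

```python
import random

N_STATES = 7          # states 0..6
TERMINALS = (0, 6)    # assumed: the two end states are terminal
START = 3             # assumed: every episode starts in the centre state


def generate_episode(rng):
    """Hypothetical helper: return a list of (state, reward, next_state)."""
    state, steps = START, []
    while state not in TERMINALS:
        next_state = state + rng.choice((-1, 1))  # step left or right, p = 0.5
        # Assumed reward scheme: +1 on entering the right terminal, else 0.
        reward = 1.0 if next_state == N_STATES - 1 else 0.0
        steps.append((state, reward, next_state))
        state = next_state
    return steps


def monte_carlo_update(values, episode, alpha=0.1, gamma=1.0):
    """Every-visit constant-alpha MC: move V(s) toward the full return G."""
    g = 0.0
    for state, reward, _ in reversed(episode):
        g = reward + gamma * g                    # return from this step onward
        values[state] += alpha * (g - values[state])
    return values


def td_update(values, state, reward, next_state, alpha=0.1, gamma=1.0):
    """TD(0): move V(s) toward reward + gamma * V(next_state)."""
    # Terminal states contribute no future value to the bootstrap target.
    bootstrap = 0.0 if next_state in TERMINALS else values[next_state]
    td_error = reward + gamma * bootstrap - values[state]
    values[state] += alpha * td_error
    return td_error


def run(num_episodes=100, seed=0):
    """Hypothetical driver: train both estimators on the same episodes."""
    rng = random.Random(seed)
    v_mc = [0.0] + [0.5] * 5 + [0.0]              # assumed initial values
    v_td = list(v_mc)
    for _ in range(num_episodes):
        episode = generate_episode(rng)
        monte_carlo_update(v_mc, episode)         # needs the finished episode
        for state, reward, next_state in episode:
            td_update(v_td, state, reward, next_state)  # one step at a time
    return v_mc, v_td
```

Under the reward assumption above, the true values of states 1 to 5 are 1/6, 2/6, 3/6, 4/6 and 5/6, so the output of run can be compared against those targets. The structural difference is that monte_carlo_update needs the whole episode before it can change any value, while td_update can be applied after each transition.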
