N-Step TD Prediction

Foundations & Tabular RL DS practice problem on Onlearn.

Difficulty: medium.

Topics: Understanding n-Step Temporal Difference (TD) Learning for Value Prediction, n-step Return, Truncated Returns, Discounted Cumulative Reward, Incremental Update Rule, Convergence Properties of TD(n), Reinforcement Learning, Dynamic Programming, Statistical Estimation, Stochastic Processes, Optimization Theory, Temporal Difference Learning, Bootstrapping, Bias-Variance Trade-off, Markov Decision Processes, Value Function Approximation.

Implement a function that computes the n-step TD update for a state-value function. Given a sequence of rewards, a discount factor (gamma), a step size (alpha), the current value estimate V(S_t) for the state at time t, and the value estimate V(S_{t+n}) for the state at time t+n, return the updated value of V(S_t).
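One possible solution is sketched below, using the standard n-step return G = R_{t+1} + gamma * R_{t+2} + ... + gamma^(n-1) * R_{t+n} + gamma^n * V(S_{t+n}) and the incremental update V(S_t) <- V(S_t) + alpha * (G - V(S_t)). The function name `n_step_td_update` and its parameter names are hypothetical, and the sketch assumes that the rewards are ordered R_{t+1}, ..., R_{t+n}, so that n is simply the length of the reward sequence. The problem statement does not fix these details.

```python
from typing import Sequence


def n_step_td_update(
    rewards: Sequence[float],
    gamma: float,
    alpha: float,
    v_t: float,
    v_t_plus_n: float,
) -> float:
    """Return the updated estimate of V(S_t) after one n-step TD update.

    Assumption: rewards holds R_{t+1}, ..., R_{t+n} in order, so n = len(rewards).
    """
    n = len(rewards)

    # Truncated discounted sum of the n observed rewards.
    n_step_return = 0.0
    for k, reward in enumerate(rewards):
        n_step_return += (gamma ** k) * reward

    # Bootstrap from the current estimate of the state reached after n steps.
    n_step_return += (gamma ** n) * v_t_plus_n

    # Move V(S_t) a fraction alpha of the way toward the n-step return.
    return v_t + alpha * (n_step_return - v_t)


# Usage: three rewards of 1, gamma = 0.5, alpha = 0.5, V(S_t) = 2, V(S_{t+3}) = 4.
# G = 1 + 0.5 + 0.25 + 0.125 * 4 = 2.25, so V(S_t) becomes 2 + 0.5 * 0.25 = 2.125.
updated = n_step_td_update([1.0, 1.0, 1.0], gamma=0.5, alpha=0.5, v_t=2.0, v_t_plus_n=4.0)
print(updated)
```

With a single reward the same function reduces to the one-step TD(0) update, which is a convenient sanity check.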