The Discounted Return for a Given Trajectory
Foundations & Tabular RL DS practice problem on Onlearn.
Difficulty: easy.
Topics: Understanding the Cumulative Discounted Return in Reinforcement Learning, Discount Factor, Episodic Tasks, Cumulative Summation, Geometric Series, Trajectory Sampling, Reinforcement Learning, Probability Theory, Dynamic Programming, Stochastic Processes, Mathematical Optimization, Markov Decision Processes, Reward Functions, Return Definition, Time Horizons, Value Function Estimation.
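In reinforcement learning, the return $G_t$ is defined as the discounted sum of future rewards: $G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$. Given a list of rewards $[r_0, r_1, \ldots, r_n]$ obtained from a trajectory, where $r_k$ is the reward $R_{k+1}$ received after time step $k$, and a discount factor $\gamma$ with $0 \le \gamma \le 1$, write a function that computes the total discounted return starting from time step 0. Because the trajectory is finite, the infinite sum truncates to $G_0 = \sum_{k=0}^{n} \gamma^k r_k$.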
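A minimal Python sketch of one possible solution follows. It assumes the rewards arrive as a finite sequence of numbers with `rewards[k]` playing the role of $r_k$. The function name `discounted_return` is hypothetical; the problem does not prescribe a name or signature. Rather than computing $\gamma^k$ for each term, it accumulates backward using the recursion $G_k = r_k + \gamma G_{k+1}$ with $G_{n+1} = 0$. This is Horner's rule applied to the polynomial in $\gamma$ and gives the same value as the direct sum.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Compute G_0 = sum_k gamma**k * rewards[k] for a finite trajectory.

    Hypothetical helper: rewards[k] is assumed to be r_k, the reward
    received after time step k, so rewards[0] is not discounted.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")

    g = 0.0  # G_{n+1} = 0: no reward after the last step
    # Walk the trajectory backward: G_k = r_k + gamma * G_{k+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    print(discounted_return([1, 1, 1], 0.5))  # 1 + 0.5 + 0.25 = 1.75
```

With $\gamma = 1$ the function returns the plain sum of rewards. With $\gamma = 0$ it returns only $r_0$. An empty trajectory yields 0.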