Exponential Weighted Average of Rewards
Foundations & Tabular RL DS practice problem on Onlearn.
Difficulty: medium.
Topics: Understanding Exponential Weighted Average of Rewards, Exponential Decay, Step Size Parameter, Recency Bias, Geometric Series Summation, Initial Value Bias, Reinforcement Learning, Probability Theory, Numerical Analysis, Time Series Analysis, Stochastic Processes, Temporal Difference Learning, Recursive Estimation, Signal Processing, Non-Stationary Environments, Weighted Moving Averages.
Given an initial value $Q_1$, a list of $k$ observed rewards $R_1, R_2, \ldots, R_k$, and a step size $\alpha$, implement a function to compute the exponentially weighted average as: $$(1 - \alpha)^k Q_1 + \sum_{i=1}^k \alpha (1 - \alpha)^{k-i} R_i$$ This weighting gives more importance to recent rewards, while the influence of the initial estimate $Q_1$ decays over time. Do not use running/incremental updates; instead, compute directly from the formula. (This is called the exponential recency-weighted average.)
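One way the direct computation might look in Python is sketched below. The function name `exp_weighted_average` and its parameter names are illustrative assumptions, not part of the problem statement, and the sketch assumes $0 \le \alpha \le 1$ without validating it.

```python
from typing import Sequence


def exp_weighted_average(q1: float, rewards: Sequence[float], alpha: float) -> float:
    """Evaluate (1 - alpha)^k * Q_1 + sum_{i=1}^k alpha * (1 - alpha)^(k - i) * R_i.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    k = len(rewards)
    # Contribution of the initial estimate, decayed over all k steps.
    total = (1.0 - alpha) ** k * q1
    # Reward R_i (1-indexed) receives weight alpha * (1 - alpha)^(k - i),
    # so the most recent reward (i = k) gets the largest weight, alpha.
    for i, r in enumerate(rewards, start=1):
        total += alpha * (1.0 - alpha) ** (k - i) * r
    return total
```

For example, with $Q_1 = 2$, rewards $[5, 9]$, and $\alpha = 0.5$, the terms are $0.25 \cdot 2 + 0.25 \cdot 5 + 0.5 \cdot 9 = 6.25$. With an empty reward list the function returns $Q_1$ unchanged, since only the $(1 - \alpha)^0 Q_1$ term remains.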