Bootstrap Parameter Lambda Study

Representation Learning, Advanced Theory & Miscellaneous DS practice problem on Onlearn.

Difficulty: medium.

Topics: Bootstrap Parameter Lambda Study, Bootstrap Bias Correction, Lambda Hyperparameter Tuning, Empirical Risk Minimization, Variance Reduction, Asymptotic Consistency, Statistical Learning Theory, Optimization Theory, Representation Learning, Probabilistic Graphical Models, Computational Complexity, Resampling Methods, Regularization Techniques, Stochastic Approximation, Latent Variable Modeling, Convergence Analysis.

Implement a function that computes lambda-returns for every time step in a trajectory across multiple values of the bootstrapping parameter lambda, enabling analysis of how lambda controls the trade-off between bootstrapping and Monte Carlo estimation.

The lambda-return blends n-step returns using an exponentially decaying weighting scheme controlled by lambda. When lambda = 0, the return reduces to the 1-step TD target (maximum bootstrapping from value estimates). When lambda = 1, it equals the full Monte Carlo return (no bootstrapping). Intermediate values interpolate between these extremes.

Given:

- rewards: A list of T float rewards [r_0, r_1, ..., r_{T-1}] collected along a trajectory.
- values: A list of T float value estimates [V(s_0), V(s_1), ..., V(s_{T-1})], one for each non-terminal state visited. The terminal state after the last reward has value 0.
- gamma: The discount factor (float between 0 and 1).
- lambdas: A list of float lambda values (each between 0 and 1) to evaluate.

Your function should compute the lambda-return for each time step t = 0, 1, ..., T-1 for each lambda value in the list, and return the results as a list of lists. The outer list corresponds to the lambda values in order, and each inner list contains the lambda-returns for time steps 0 through T-1, with each value rounded to 4 decimal places.
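One possible approach, sketched below, avoids summing the n-step returns explicitly and instead uses the equivalent backward recursion G_t = r_t + gamma * ((1 - lambda) * V(s_{t+1}) + lambda * G_{t+1}), with both V and G taken to be 0 at the terminal state. The function name `lambda_returns` and its signature are assumptions made for illustration; the problem statement does not prescribe them.

```python
from typing import List


def lambda_returns(
    rewards: List[float],
    values: List[float],
    gamma: float,
    lambdas: List[float],
) -> List[List[float]]:
    """Sketch: lambda-returns per time step, one inner list per lambda.

    Hypothetical helper (name and signature are not from the problem text).
    Uses the backward recursion
        G_t = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}),
    with V(terminal) = 0 and G at the terminal step = 0.
    """
    T = len(rewards)
    results = []
    for lam in lambdas:
        returns = [0.0] * T
        next_return = 0.0  # lambda-return of the step after t (0 past the end)
        next_value = 0.0   # V(s_{t+1}); the terminal state has value 0
        # Walk the trajectory backwards so each step reuses the next one.
        for t in range(T - 1, -1, -1):
            g = rewards[t] + gamma * ((1.0 - lam) * next_value + lam * next_return)
            returns[t] = g
            next_return = g
            next_value = values[t]
        # Round only at the end so rounding error does not propagate.
        results.append([round(g, 4) for g in returns])
    return results


if __name__ == "__main__":
    out = lambda_returns([1.0, 2.0, 3.0], [0.5, 1.0, 1.5], 0.9, [0.0, 1.0])
    print(out)
```

For the small trajectory in the usage example, lambda = 0 gives the 1-step TD targets [1.9, 3.35, 3.0] and lambda = 1 gives the Monte Carlo returns [5.23, 4.7, 3.0], matching the two extremes described above. The recursion costs O(T) per lambda value, whereas summing every n-step return directly would cost O(T^2).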