Types of Eligibility Traces
Representation Learning, Advanced Theory & Miscellaneous DS practice problem on Onlearn.
Difficulty: medium.
Topics: Types of Eligibility Traces, Accumulating Traces, Replacing Traces, Dutch Traces, Eligibility Decay Rate, Lambda-return, Reinforcement Learning, Temporal Difference Learning, Stochastic Approximation, Dynamic Programming, Function Approximation, Credit Assignment Mechanisms, Policy Evaluation Algorithms, Bootstrap Methods, On-policy Control Methods, Off-policy Correction Techniques.
Implement a function compute eligibility traces that computes the final eligibility trace vector after processing a sequence of state visits. Eligibility traces are a fundamental mechanism in reinforcement learning that keep track of which states have been visited recently and how often. There are three main variants that differ in how they handle revisits to the same state: 1. Accumulating traces ("accumulating"): Each time a state is visited, its trace value increases by 1, on top of any decayed value. This means frequently visited states can have trace values greater than 1. 2. Replacing traces ("replacing"): When a state is visited, its trace is set to 1 regardless of its current value. This caps the trace at 1 and prevents unbounded growth. 3. Dutch traces ("dutch"): A parameterized blend that uses a step size parameter alpha to partially update the trace toward 1 upon a visit. After decaying, if state s is visited: e(s) = e(s) + alpha (1 e(s)). For all three types, at each timestep every trace first decays by a factor of gamma lam, and then the visited state's trace is updated according to the chosen rule. Your function should: Initialize all traces to zero Process each state visit in order Return the final eligibility trace vector as a list of floats Parameters: state visits: List of integer state indices visited at each timestep num states: Total number of states gamma: Discount factor lam: Trace decay parameter (lambda) trace type: One of "accumulating", "replacing", or "dutch" alpha: Step size parameter (only used for Dutch traces, default 0.1)