General Value Functions

Representation Learning, Advanced Theory & Miscellaneous DS practice problem on Onlearn.

Difficulty: medium.

Topics: General Value Functions, Cumulant Functions, Successor Representations, Eligibility Traces, Target Policy Alignment, Temporal Difference Error, Reinforcement Learning, Temporal Difference Learning, Predictive State Representations, Function Approximation, Stochastic Processes, Off-Policy Evaluation, Bootstrapping Methods, Neural Network Architectures, Discount Factor Dynamics, Policy Gradient Estimation.

Implement a function gvf_td_learning that learns multiple General Value Functions (GVFs) simultaneously from a single stream of experience using TD(0) updates.

A GVF generalizes the standard value function in reinforcement learning. Instead of predicting cumulative discounted reward under one fixed discount factor, each GVF defines its own cumulant signal (what to predict) and its own continuation function (how to discount) at each transition. This lets an agent make many different predictions about its interaction with the environment at the same time.

Your function receives:

transitions: a list of (state, next_state) tuples representing the agent's experience.
num_states: the total number of states.
gvf_cumulants: a list of lists. Each inner list corresponds to one GVF and contains the cumulant value observed at each transition.
gvf_gammas: a list of lists. Each inner list corresponds to one GVF and contains the continuation (discount) value at each transition. A value of 0.0 indicates termination for that GVF at that step.
alpha: the learning rate for the TD updates.

Initialize all value estimates to zero. Then, for each transition t = (s, s') in order, update every GVF i with the TD(0) rule:

delta = c_i[t] + gamma_i[t] * V_i(s') - V_i(s)
V_i(s) <- V_i(s) + alpha * delta

where c_i[t] and gamma_i[t] are entry t of GVF i's cumulant and continuation lists. A continuation of 0.0 therefore removes the bootstrap term V_i(s') for that GVF at that step.

Return a list of lists containing the learned value estimates: one inner list per GVF, each holding the estimates for all num_states states.
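A minimal tabular sketch in Python follows, assuming states are integer indices in range(num_states) and that entry t of each GVF's cumulant and continuation lists belongs to transition t. The internal names (values, td_error) and the length check that raises ValueError are illustrative assumptions, not part of the problem statement. Because each GVF keeps its own value table, the order in which GVFs are updated within one transition does not affect the result.

```python
from typing import List, Sequence, Tuple


def gvf_td_learning(
    transitions: Sequence[Tuple[int, int]],
    num_states: int,
    gvf_cumulants: Sequence[Sequence[float]],
    gvf_gammas: Sequence[Sequence[float]],
    alpha: float,
) -> List[List[float]]:
    """Tabular TD(0) for several GVFs learning from one shared stream."""
    num_gvfs = len(gvf_cumulants)
    # Assumed sanity check: every GVF needs both a cumulant and a gamma list.
    if len(gvf_gammas) != num_gvfs:
        raise ValueError("gvf_cumulants and gvf_gammas must have the same length")

    # One value table per GVF, all estimates initialized to zero.
    values = [[0.0] * num_states for _ in range(num_gvfs)]

    for t, (s, s_next) in enumerate(transitions):
        # Every GVF learns from the same transition t.
        for i in range(num_gvfs):
            cumulant = gvf_cumulants[i][t]
            gamma = gvf_gammas[i][t]  # 0.0 means termination: no bootstrap
            td_error = cumulant + gamma * values[i][s_next] - values[i][s]
            values[i][s] += alpha * td_error

    return values


# Usage: three states in a cycle, two GVFs with different cumulants/gammas.
result = gvf_td_learning(
    transitions=[(0, 1), (1, 2), (2, 0)],
    num_states=3,
    gvf_cumulants=[[1, 0, 0], [0, 0, 1]],
    gvf_gammas=[[0.9, 0.9, 0.9], [0.5, 0.5, 0.0]],
    alpha=0.5,
)
print(result)  # [[0.5, 0.0, 0.225], [0.0, 0.0, 0.5]]
```

In the usage run, the second GVF's continuation is 0.0 on the last transition, so its update for state 2 uses only the cumulant (1 * 0.5 = 0.5) and ignores V(0).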