Successor Representation Learning
Representation Learning, Advanced Theory & Miscellaneous DS practice problem on Onlearn.
Difficulty: medium.
Topics: Successor Representation Learning, Discounted Future State Occupancy, Eigenvector Centrality, Transition Dynamics Matrix, Generalized Value Functions, Successor Feature Decomposition, Reinforcement Learning, Linear Algebra, Stochastic Processes, Representation Learning, Dynamic Programming, Temporal Difference Learning, Spectral Decomposition, Markov Reward Processes, Feature Map Construction, Bellman Equations.
Implement a function that learns the Successor Representation (SR) from a stream of experience in a finite-state environment. The Successor Representation decomposes the value function into two parts: a representation matrix M that captures the expected discounted future state occupancy under a policy, and a reward weight vector w that captures the expected immediate reward at each state. The value function is then reconstructed as V = M w.

Given:
experience: A list of (state, reward, next_state, done) tuples representing transitions
n_states: Number of states in the environment
gamma: Discount factor
alpha_sr: Learning rate for the SR matrix updates
alpha_w: Learning rate for the reward weight vector updates

For each transition in the experience stream, your function should:
1. Update the reward weight for the current state toward the observed reward using the reward learning rate.
2. Update the SR matrix row for the current state using a temporal difference update. The TD target for the SR at the current state combines the one-hot indicator for the current state with the discounted SR of the next state (when the episode has not terminated). When the transition is terminal, no bootstrapping from the next state should occur.
3. After processing all experience, compute the value function by combining M and w.

Return a tuple of three elements: the SR matrix M (as a list of lists), the reward weight vector w (as a list), and the value function V (as a list), all with values rounded to 4 decimal places.
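The steps above can be sketched in plain Python as follows. This is a minimal sketch, not a reference solution. The function name `learn_successor_representation` is hypothetical, the parameter names are written with underscores, and both M and w are assumed to start at zero, since the problem statement does not specify an initialization.

```python
from typing import List, Tuple

# (state, reward, next_state, done)
Transition = Tuple[int, float, int, bool]


def learn_successor_representation(
    experience: List[Transition],
    n_states: int,
    gamma: float,
    alpha_sr: float,
    alpha_w: float,
) -> Tuple[List[List[float]], List[float], List[float]]:
    # Assumption: zero initialization for both M and w (not given in the problem).
    M = [[0.0] * n_states for _ in range(n_states)]
    w = [0.0] * n_states

    for state, reward, next_state, done in experience:
        # Step 1: move the reward weight of the current state toward the observed reward.
        w[state] += alpha_w * (reward - w[state])

        # Step 2: TD update of row M[state].
        # Target = one-hot(state) + gamma * M[next_state], with no bootstrap if terminal.
        for j in range(n_states):
            indicator = 1.0 if j == state else 0.0
            bootstrap = 0.0 if done else gamma * M[next_state][j]
            M[state][j] += alpha_sr * (indicator + bootstrap - M[state][j])

    # Step 3: reconstruct the value function as V = M w.
    V = [sum(M[s][j] * w[j] for j in range(n_states)) for s in range(n_states)]

    M_out = [[round(x, 4) for x in row] for row in M]
    w_out = [round(x, 4) for x in w]
    V_out = [round(x, 4) for x in V]
    return M_out, w_out, V_out
```

Under these assumptions, calling it on the two transitions `[(0, 1.0, 1, False), (1, 2.0, 0, True)]` with `n_states=2`, `gamma=0.9`, and both learning rates set to `0.5` gives `M = [[0.5, 0.0], [0.0, 0.5]]`, `w = [0.5, 1.0]`, and `V = [0.25, 0.5]`. If the intended solution initializes M differently (for example, to the identity matrix), the numbers will differ.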