State Aggregation for Value Approximation

Representation Learning, Advanced Theory & Miscellaneous DS practice problem on Onlearn.

Difficulty: medium.

Topics: State Aggregation for Value Approximation, Tiling Coding, Bellman Error Minimization, Voronoi Tessellation, Feature Hashing, Bias-Variance Tradeoff, Reinforcement Learning, Information Theory, Statistical Learning Theory, Dynamic Programming, Functional Analysis, Value Function Approximation, Markov Decision Processes, Dimensionality Reduction, Stochastic Approximation, Partitioning Algorithms.

Implement state aggregation as a simple form of value function approximation. In many reinforcement learning problems, the state space can be very large. State aggregation addresses this by grouping multiple states together so that all states within the same group share a single value estimate. Given a list of episodes (each a list of (state, reward) tuples), a total number of states, a group assignment array that maps each state index to its group index, and a discount factor gamma, estimate the value of each state using Monte Carlo returns. The process works as follows: Compute the discounted return from every visit to every state across all episodes. Pool the returns by group: all returns from states in the same group are collected together. The value of each group is the average of all returns assigned to that group. Each state inherits the value of its assigned group. States belonging to groups that are never visited should have a value of 0.0. The reward at index i in an episode is the reward received after leaving the state at index i.