Signature Tables for Value Functions

Retrieval & Ranking Systems DS practice problem on Onlearn.

Difficulty: medium.

Topics: Signature Tables for Value Functions, Signature Table Compression, Bellman Equation Decomposition, Hash-based State Mapping, Quantization Error Bounds, Lookup Table Sparsity, Reinforcement Learning, Information Retrieval, Numerical Analysis, Data Structures, Probability Theory, Dynamic Programming, Approximate Value Iteration, Memory-Efficient Indexing, Stochastic Approximation, State Space Discretization.

Implement a signature table approach for estimating state values using TD(0) learning. In large or structured state spaces, it is often beneficial to compress the value function by grouping states that share common characteristics. A signature table achieves this by mapping each state to a signature derived from its binary feature vector. States that produce identical signatures share a single value estimate in a compact lookup table. Given a binary feature matrix where each row represents a state's feature vector, the signature of a state is determined by its feature pattern. States with the same pattern are treated as equivalent for value estimation purposes any TD update to one state's value also affects all states that share its signature. Your function should: 1. Determine each state's signature from its binary feature vector 2. Maintain a single shared value entry for each unique signature (all initialized to 0) 3. Process the given episodes using TD(0) updates, where the value entry for the current state's signature is updated based on the observed reward and the bootstrapped value of the next state's signature 4. Return the estimated value for every state Parameters: episodes: List of episodes. Each episode is a list of (state, reward, next state) tuples. next state is None for terminal transitions (terminal state value is 0). n states: Total number of states (states are integers 0 to n states 1) features: A numpy array of shape (n states, n features) with binary entries. features[s] is the feature vector for state s. gamma: Discount factor alpha: Learning rate Returns: A list of length n states with the estimated value for each state, rounded to 4 decimal places.