Rote Learning for Game States
Representation Learning, Advanced Theory & Miscellaneous DS practice problem on Onlearn.
Difficulty: medium.
Topics: Rote Learning for Game States, State-Action Value Function, Kolmogorov Complexity, Asymptotic Time Complexity, Rademacher Complexity, Least Recently Used Eviction, Reinforcement Learning, Information Theory, Computational Complexity, Statistical Learning Theory, Memory Systems Architecture, Markov Decision Processes, Entropy and Compression, Algorithmic Efficiency, Generalization Bounds, Cache Hierarchy Management.
Task: Implement Rote Learning for Game State Evaluation One of the simplest approaches to game playing AI is rote learning : the agent plays through complete games. For every state it encounters along the way, it computes the discounted return from that point forward and stores it. If the same state appears across multiple games (or even multiple times within a single game), all observed returns are recorded and their average becomes the state's estimated value. This produces a lookup table that maps every previously seen game state to its memorized value. States never encountered remain unknown. Implement the function rote learning that processes a collection of game episodes and builds this memorized value table. Args: episodes: A list of episodes. Each episode is a list of (state, reward) tuples, where state is a hashable identifier (e.g., a string) and reward is a float received at that timestep. gamma: Discount factor (float between 0 and 1 inclusive). Returns: A dictionary mapping each encountered state to its average discounted return, with values rounded to 4 decimal places. The discounted return from timestep $t$ within an episode of length $T$ is computed backwards from the end of the episode. Every occurrence of a state (including repeated visits within the same episode) contributes an observed return to that state's record.