Minimax with Learned Value Function

Planning, Dynamics & Decision Systems DS practice problem on Onlearn.

Difficulty: medium.

Topics: Minimax with Learned Value Function, Alpha-Beta Pruning, Bellman Equation, Value Function Approximation, Backpropagation, Markov Decision Process, Game Theory, Reinforcement Learning, Robotics Control, Function Approximation, Search Algorithms, Adversarial Planning, Deep Q-Learning, Dynamic Programming, Neural Network Architectures, State-Space Modeling.

In two player zero sum games, the minimax algorithm finds optimal play by recursively evaluating game states. However, for games with deep or infinite game trees, searching all the way to terminal states is infeasible. A practical solution is to combine minimax search with a learned value function that estimates the value of non terminal states when the search reaches a predefined depth limit. Implement a function minimax with learned value that performs depth limited minimax search on a game tree. The function should: Return the exact value for terminal nodes (regardless of remaining depth). Use a linear value function $V(s) = \phi(s)^T \mathbf{w}$ to evaluate non terminal nodes when the depth limit is reached (depth equals 0) or when a non terminal node has no children. Alternate between maximizing and minimizing players at each level of the tree. Parameters: node: int, the current node ID. children: dict mapping node id (int) to a list of child node ids. Nodes not in this dict or with empty lists have no children. terminal values: dict mapping node id to a float value for terminal game states. features: numpy array of shape (num nodes, d), where features[node id] is the feature vector for that node. weights: numpy array of shape (d,), the weight vector for the learned value function. depth: int, the remaining search depth allowed. maximizing: bool, True if the current player is the maximizer. Returns: A float representing the minimax value at the given node.