Monte Carlo Tree Search
Planning, Dynamics & Decision Systems DS practice problem on Onlearn.
Difficulty: hard.
Topics: Monte Carlo Tree Search, Upper Confidence Bound applied to Trees (UCT), Backpropagation of Value Estimates, Tree Policy Selection, Rollout Simulation, Expansion and Node Creation, Reinforcement Learning, Search and Optimization, Probabilistic Graphical Models, Game Theory, Robotic Motion Planning, Sequential Decision Making, Heuristic Search Algorithms, Simulation-Based Inference, Multi-Armed Bandit Problems, State-Space Representation.
Implement Monte Carlo Tree Search (MCTS) for a simple sequential game. The game: start at a number, players alternate adding either 1 or 2, and whoever reaches exactly the target value wins (going over means you lose). MCTS should explore possible moves through random simulations and return the best first action. The algorithm has four phases: Selection (pick promising nodes), Expansion (add new nodes), Simulation (random playout), and Backpropagation (update statistics).