Backgammon Feature Engineering
RL Environments, Games & Applications DS practice problem on Onlearn.
Difficulty: medium.
Topics: Backgammon Feature Engineering, TD-lambda Algorithm, Pip-count Calculation, One-hot Board Encoding, Markov Decision Process, Radial Basis Function Networks, Reinforcement Learning, Game Theory, Feature Engineering, Probability Theory, Statistical Modeling, Temporal Difference Learning, Combinatorial Game Analysis, Dimensionality Reduction, Stochastic Processes, Supervised Representation Learning.
Implement a feature extraction function for backgammon board positions that produces a fixed-size feature vector suitable for value function approximation in reinforcement learning.

Backgammon has two players (white and black) with 15 checkers each, distributed across 24 points. The board state must be encoded as a numeric feature vector so that a learning algorithm can estimate the value of any position.

Encoding scheme (198 features total):

For each of the 24 points (indexed 0 to 23), produce 4 features per player (8 features per point, 192 total). Given n checkers of a specific player on a point:
Feature 1: 1.0 if n >= 1, else 0.0
Feature 2: 1.0 if n >= 2, else 0.0
Feature 3: 1.0 if n >= 3, else 0.0
Feature 4: (n - 3) / 2.0 if n > 3, else 0.0

The first 96 features are for white (4 per point, points 0 to 23); the next 96 are for black (same scheme). White's features for point p therefore occupy indices 4p to 4p + 3, and black's occupy indices 96 + 4p to 96 + 4p + 3.

After the 192 point features, append 6 additional features:
Index 192: white checkers on bar, divided by 2.0
Index 193: black checkers on bar, divided by 2.0
Index 194: white checkers borne off, divided by 15.0
Index 195: black checkers borne off, divided by 15.0
Index 196: 1.0 if it is white's turn, else 0.0
Index 197: 1.0 if it is black's turn, else 0.0

Inputs:
board: list of 24 integers. Positive values = white checkers, negative values = black checkers on that point.
bar: tuple (white_bar, black_bar) with the number of checkers on the bar for each player.
borne_off: tuple (white_off, black_off) with the number of checkers borne off for each player.
turn: string 'white' or 'black' indicating whose turn it is.

Returns: a numpy array of 198 float features.
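The encoding described above can be sketched as follows. This is a minimal illustration, not a reference solution. It assumes the per-point units are cumulative thresholds (n >= 1, n >= 2, n >= 3) with an overflow unit of (n - 3) / 2.0 for n > 3, as in the TD-Gammon-style 198-unit encoding. The function name `extract_features`, the parameter spelling `borne_off`, and the input validation are illustrative assumptions, not part of the problem statement.

```python
import numpy as np


def extract_features(board, bar, borne_off, turn):
    """Encode a backgammon position as a 198-element float vector.

    Assumed layout: indices 0-95 white points, 96-191 black points,
    192-197 bar, borne-off, and turn features.
    """
    if len(board) != 24:
        raise ValueError("board must have exactly 24 points")
    if turn not in ("white", "black"):
        raise ValueError("turn must be 'white' or 'black'")

    features = np.zeros(198, dtype=np.float64)

    def point_features(n):
        # Assumption: cumulative threshold units, then a scaled overflow unit.
        return [
            1.0 if n >= 1 else 0.0,
            1.0 if n >= 2 else 0.0,
            1.0 if n >= 3 else 0.0,
            (n - 3) / 2.0 if n > 3 else 0.0,
        ]

    for point, count in enumerate(board):
        # Positive counts are white checkers, negative counts are black.
        white_n = count if count > 0 else 0
        black_n = -count if count < 0 else 0
        features[4 * point:4 * point + 4] = point_features(white_n)
        features[96 + 4 * point:96 + 4 * point + 4] = point_features(black_n)

    features[192] = bar[0] / 2.0         # white checkers on the bar
    features[193] = bar[1] / 2.0         # black checkers on the bar
    features[194] = borne_off[0] / 15.0  # white checkers borne off
    features[195] = borne_off[1] / 15.0  # black checkers borne off
    features[196] = 1.0 if turn == "white" else 0.0
    features[197] = 1.0 if turn == "black" else 0.0
    return features


# Hypothetical position used only to exercise the function.
example_board = [0] * 24
example_board[0] = 2     # two white checkers on point 0
example_board[5] = -5    # five black checkers on point 5
example_board[11] = 4    # four white checkers on point 11
example_vector = extract_features(example_board, (1, 0), (3, 0), "white")
```

In this example, white's block for point 11 (indices 44 to 47) holds 1.0, 1.0, 1.0, 0.5, and black's block for point 5 (indices 116 to 119) holds 1.0, 1.0, 1.0, 1.0. Filling a preallocated array by slice keeps the index arithmetic visible, which makes the layout easy to check against the specification.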