Blackjack with Monte Carlo Prediction
RL Environments, Games & Applications DS practice problem on Onlearn.
Difficulty: medium.
Topics: Blackjack with Monte Carlo Prediction, First-Visit Estimation, Episode Trajectory Sampling, State-Action Value Function, Discount Factor Gamma, Monte Carlo Control, Reinforcement Learning, Probability Theory, Stochastic Processes, Computational Statistics, Game Theory, Monte Carlo Methods, Temporal Difference Learning, Markov Decision Processes, Policy Evaluation, Value Function Approximation.
Implement a function that estimates state values for a simplified Blackjack card game using first-visit Monte Carlo prediction.

The simplified Blackjack rules are:

Cards are drawn randomly from values 1 to 13, where values 11, 12, and 13 (face cards) count as 10. An ace (card value 1) can count as 1 or 11.

The player receives two cards and sees one of the dealer's cards (the showing card). The dealer also has a hidden card.

A hand's value is the sum of its cards. If the hand contains an ace and counting it as 11 does not cause a bust (a sum exceeding 21), the ace is "usable" and is counted as 11.

The player follows a fixed deterministic policy: hit (draw another card) if the hand sum is below a given threshold, otherwise stick.

After the player sticks or busts, the dealer reveals the hidden card and hits until reaching 17 or above, then sticks.

If the player busts (sum exceeds 21), the reward is -1. If the dealer busts, the reward is +1. Otherwise, the hand closer to 21 wins (+1 for a player win, -1 for a dealer win), and equal sums are a draw (0).

A state is a tuple: (player sum, dealer showing card, has usable ace), where has usable ace is a boolean.

Your function should:
1. Simulate the specified number of Blackjack episodes using the given random seed.
2. Apply first-visit Monte Carlo prediction to estimate the value of each visited state under the fixed policy.
3. Return a dictionary mapping each state to its estimated value.

Use NumPy for random number generation (np.random.randint) with the provided seed.
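
The three steps above can be sketched roughly as follows. This is a minimal illustration, not a reference solution: the function name mc_blackjack_prediction, its parameters (num_episodes, threshold, seed, gamma), and the helper names are hypothetical, since the problem statement does not give a signature. It also assumes a particular draw order (two player cards, then the dealer's showing and hidden cards), that the dealer draws nothing further once the player has busted, and that the only nonzero reward arrives at the end of the episode. A grader that draws cards in a different order would produce different numbers for the same seed.

```python
from collections import defaultdict

import numpy as np


def draw_card():
    # Values 1..13 (upper bound of randint is exclusive); 11, 12, 13 count as 10.
    return min(int(np.random.randint(1, 14)), 10)


def hand_value(cards):
    # Returns (total, usable_ace). An ace counts as 11 only if that does not bust.
    total = sum(cards)
    if 1 in cards and total + 10 <= 21:
        return total + 10, True
    return total, False


def play_episode(threshold):
    # ASSUMED draw order: player, player, dealer showing, dealer hidden.
    player = [draw_card(), draw_card()]
    dealer = [draw_card(), draw_card()]
    showing = dealer[0]
    states = []
    while True:
        total, usable = hand_value(player)
        if total > 21:
            # ASSUMPTION: the dealer draws no more cards after a player bust.
            return states, -1.0
        states.append((total, showing, usable))
        if total >= threshold:
            break  # fixed policy: stick at or above the threshold
        player.append(draw_card())
    # Dealer hits until reaching 17 or above.
    while hand_value(dealer)[0] < 17:
        dealer.append(draw_card())
    player_total = hand_value(player)[0]
    dealer_total = hand_value(dealer)[0]
    if dealer_total > 21 or player_total > dealer_total:
        return states, 1.0
    if player_total < dealer_total:
        return states, -1.0
    return states, 0.0


def mc_blackjack_prediction(num_episodes, threshold, seed, gamma=1.0):
    # Hypothetical signature; gamma discounts the single terminal reward.
    np.random.seed(seed)
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for _ in range(num_episodes):
        states, reward = play_episode(threshold)
        seen = set()
        last = len(states) - 1
        for t, state in enumerate(states):
            if state in seen:
                continue  # first-visit: count each state once per episode
            seen.add(state)
            # Return from step t: the terminal reward discounted back to t.
            returns_sum[state] += (gamma ** (last - t)) * reward
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}
```

A call such as mc_blackjack_prediction(num_episodes=10000, threshold=18, seed=42) would return a dictionary keyed by (player sum, dealer showing card, has usable ace). Averaging with running sums and counts keeps memory proportional to the number of distinct states instead of storing every return.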