Mountain Car with Function Approximation
RL Environments, Games & Applications DS practice problem on Onlearn.
Difficulty: medium.
Topics: Mountain Car with Function Approximation, Tile Coding, Experience Replay, Epsilon-Greedy Policy, Radial Basis Functions, Bellman Equation, Reinforcement Learning, Function Approximation, Control Theory, Stochastic Optimization, Computational Geometry, Temporal Difference Learning, Neural Network Architectures, Exploration Strategies, Value Function Estimation, Policy Gradient Methods.
Implement episodic semi-gradient Sarsa for the Mountain Car problem using linear function approximation with grid-based binary features.

The Mountain Car environment has the following specification:

State: (position, velocity), where position is in [-1.2, 0.5] and velocity is in [-0.07, 0.07]
Actions: 0 (reverse thrust), 1 (no thrust), 2 (forward thrust)
Dynamics: velocity' = clip(velocity + 0.001 * (action - 1) - 0.0025 * cos(3 * position), -0.07, 0.07), then position' = clip(position + velocity', -1.2, 0.5). If position' hits the left boundary (-1.2), velocity is reset to 0.0.
Reward: -1.0 per time step
Termination: position' >= 0.5
Start state: position sampled uniformly from [-0.6, -0.4), velocity = 0.0

For function approximation, use one-hot binary features over a uniform grid:

Divide the position range into n_bins equal intervals and the velocity range into n_bins equal intervals.
Each state-action pair maps to a single active feature out of (n_bins * n_bins * 3) total features.
Use numpy's searchsorted on the bin edges (excluding the leftmost edge) to determine the bin index, clamped to [0, n_bins - 1].

The agent uses an epsilon-greedy policy with ties broken by selecting the lowest action index. Use numpy.random.RandomState initialized with the given seed for all randomness (epsilon-greedy decisions and start state sampling).

Your function should return a tuple of:
1. A list of integers representing steps taken per episode
2. The sum of the final weight vector, rounded to 4 decimal places
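
The specification above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference solution. The problem statement does not give the function signature, so the names `semi_gradient_sarsa`, `step`, `feature_index`, and `bin_edges` are hypothetical, as are the parameters `n_episodes`, `alpha`, `gamma`, `epsilon`, and `max_steps` (a cap on episode length). The sketch also assumes zero-initialized weights, the default `side="left"` for `searchsorted`, one `rng.rand()` draw per action choice followed by `rng.randint` for a random action, and a particular flattening of (position bin, velocity bin, action) into a feature index. A grader may expect a different order of random draws, which would change the exact numbers returned.

```python
import numpy as np

POS_MIN, POS_MAX = -1.2, 0.5
VEL_MIN, VEL_MAX = -0.07, 0.07
N_ACTIONS = 3


def step(position, velocity, action):
    """Apply one transition of the Mountain Car dynamics from the problem statement."""
    velocity = np.clip(
        velocity + 0.001 * (action - 1) - 0.0025 * np.cos(3 * position),
        VEL_MIN, VEL_MAX)
    position = np.clip(position + velocity, POS_MIN, POS_MAX)
    if position <= POS_MIN:  # hitting the left wall resets the velocity
        velocity = 0.0
    done = bool(position >= POS_MAX)
    return float(position), float(velocity), -1.0, done


def bin_edges(low, high, n_bins):
    """Edges of n_bins equal intervals, with the leftmost edge dropped."""
    return np.linspace(low, high, n_bins + 1)[1:]


def feature_index(position, velocity, action, pos_edges, vel_edges, n_bins):
    """Index of the single active one-hot feature (flattening order is an assumption)."""
    p = min(max(int(np.searchsorted(pos_edges, position)), 0), n_bins - 1)
    v = min(max(int(np.searchsorted(vel_edges, velocity)), 0), n_bins - 1)
    return (p * n_bins + v) * N_ACTIONS + action


def semi_gradient_sarsa(n_episodes, n_bins, alpha, gamma, epsilon, max_steps, seed):
    """Episodic semi-gradient Sarsa with one-hot grid features (hypothetical signature)."""
    rng = np.random.RandomState(seed)
    pos_edges = bin_edges(POS_MIN, POS_MAX, n_bins)
    vel_edges = bin_edges(VEL_MIN, VEL_MAX, n_bins)
    w = np.zeros(n_bins * n_bins * N_ACTIONS)  # assumed zero initialization

    def index(position, velocity, action):
        return feature_index(position, velocity, action, pos_edges, vel_edges, n_bins)

    def choose(position, velocity):
        if rng.rand() < epsilon:
            return int(rng.randint(N_ACTIONS))
        q = [w[index(position, velocity, a)] for a in range(N_ACTIONS)]
        return int(np.argmax(q))  # argmax returns the first maximum: lowest index on ties

    steps_per_episode = []
    for _ in range(n_episodes):
        position, velocity = rng.uniform(-0.6, -0.4), 0.0
        action = choose(position, velocity)
        steps = 0
        while steps < max_steps:
            idx = index(position, velocity, action)
            position, velocity, reward, done = step(position, velocity, action)
            steps += 1
            if done:
                # Terminal transition: the target is the reward alone.
                w[idx] += alpha * (reward - w[idx])
                break
            next_action = choose(position, velocity)
            next_idx = index(position, velocity, next_action)
            # With one-hot features the gradient is 1 at idx and 0 elsewhere.
            w[idx] += alpha * (reward + gamma * w[next_idx] - w[idx])
            action = next_action
        steps_per_episode.append(steps)
    return steps_per_episode, round(float(w.sum()), 4)
```

A call such as `semi_gradient_sarsa(n_episodes=50, n_bins=8, alpha=0.1, gamma=1.0, epsilon=0.1, max_steps=1000, seed=0)` returns the per-episode step counts and the rounded weight sum. The weight sum does not depend on how the three indices are flattened, so only the bin assignment and the order of random draws affect the returned values.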