Windy Gridworld with Sarsa
Foundations & Tabular RL DS practice problem on Onlearn.
Difficulty: medium.
Topics: Understanding Temporal Difference Learning in Stochastic Environments, Epsilon-Greedy Action Selection, Q-Table Initialization, Windy Environment Dynamics, Episode Termination Conditions, TD Error Calculation, Reinforcement Learning, Markov Decision Processes, Stochastic Processes, Dynamic Programming, Probability Theory, On-policy Control, Temporal Difference Learning, Exploration vs Exploitation, Gridworld Environments, Value Function Approximation.
Implement a Sarsa agent that navigates a 10x7 gridworld (10 columns, 7 rows) with a wind effect. The wind pushes the agent upward by 0, 1, or 2 cells, depending on the column it is in. Starting from (0, 3), the agent must reach the goal state (7, 3); coordinates are zero-indexed and given as (column, row). Use a discount factor of 1.0, a step size of 0.5, and an epsilon of 0.1.
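One way the task could be approached is sketched below in Python. This is a minimal tabular Sarsa loop, not a reference solution, and it relies on several assumptions that the problem statement does not spell out: the per-column wind strengths (taken here from the classic Sutton and Barto layout), four movement actions, a reward of -1 per step, y increasing upward so that wind adds to y, wind taken from the column the agent is leaving, and random tie-breaking among equally valued actions. The names step, epsilon_greedy, and train_sarsa are hypothetical helpers introduced only for illustration.

```python
import random

# Assumptions (not given in the problem text): WIND values follow the classic
# Sutton & Barto layout, four moves, reward -1 per step, y grows upward.
WIDTH, HEIGHT = 10, 7
WIND = [0, 0, 0, 1, 1, 1, 2, 2, 1, 0]
START, GOAL = (0, 3), (7, 3)
ACTIONS = [(0, 1), (0, -1), (-1, 0), (1, 0)]  # up, down, left, right
GAMMA, ALPHA, EPSILON = 1.0, 0.5, 0.1


def step(state, action):
    """Apply the move plus the wind of the current column, clipped to the grid."""
    x, y = state
    dx, dy = ACTIONS[action]
    new_x = min(max(x + dx, 0), WIDTH - 1)
    new_y = min(max(y + dy + WIND[x], 0), HEIGHT - 1)
    next_state = (new_x, new_y)
    return next_state, -1.0, next_state == GOAL


def epsilon_greedy(q, state, rng):
    """Random action with probability EPSILON, otherwise a greedy one."""
    if rng.random() < EPSILON:
        return rng.randrange(len(ACTIONS))
    values = q[state]
    best = max(values)
    # Break ties randomly so the zero-initialised table does not favour "up".
    return rng.choice([a for a, v in enumerate(values) if v == best])


def train_sarsa(episodes=200, seed=0):
    rng = random.Random(seed)
    # Q-table initialised to zero for every (state, action) pair.
    q = {(x, y): [0.0] * len(ACTIONS)
         for x in range(WIDTH) for y in range(HEIGHT)}
    lengths = []
    for _ in range(episodes):
        state = START
        action = epsilon_greedy(q, state, rng)
        steps, done = 0, False
        while not done:
            next_state, reward, done = step(state, action)
            # On-policy: the next action is chosen by the same policy
            # and its value is used in the TD target.
            next_action = epsilon_greedy(q, next_state, rng)
            target = reward + (0.0 if done else GAMMA * q[next_state][next_action])
            q[state][action] += ALPHA * (target - q[state][action])
            state, action = next_state, next_action
            steps += 1
        lengths.append(steps)
    return q, lengths
```

Calling train_sarsa() returns the learned Q-table and the number of steps taken in each episode, which can be plotted to check that episodes get shorter as learning progresses. The value of the terminal state is treated as zero in the TD target, so the goal's Q-table entries are never updated.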