Cart-Pole Balancing with Linear Policy Simulation

Planning, Dynamics & Decision Systems DS practice problem on Onlearn.

Difficulty: medium.

Topics: Cart-Pole Balancing with Linear Policy Simulation, Reward Function Shaping, Eigenvalue Decomposition, Jacobian Matrix Computation, Hyperparameter Tuning, Inverted Pendulum Kinematics, Reinforcement Learning, Control Theory, Linear Algebra, Numerical Optimization, Classical Mechanics, Policy Gradient Methods, State-Space Representation, Matrix Factorization, Stochastic Gradient Descent, Lagrangian Dynamics.

Implement a function that simulates the classic cart-pole balancing task using a linear policy. The cart-pole system consists of a pole attached to a cart that moves along a frictionless track. The system has a 4-dimensional state (x, x_dot, theta, theta_dot): cart position, cart velocity, pole angle (in radians from vertical), and pole angular velocity.

At each time step, the agent selects one of two actions:

- Action 0: apply a force of -10.0 N (push left)
- Action 1: apply a force of +10.0 N (push right)

Action selection uses a linear policy: choose action 1 if the dot product of the policy weight vector and the current state is >= 0; otherwise choose action 0.

The environment uses Euler integration with step size dt = 0.02 to update the state based on the standard cart-pole equations of motion, with the following constants: gravity = 9.8, cart mass = 1.0, pole mass = 0.1, pole half-length = 0.5.

An episode terminates when any of the following holds:

- The cart position leaves the range [-2.4, 2.4] (i.e., |x| > 2.4)
- The pole angle leaves the range [-12 degrees, 12 degrees] (i.e., |theta| > 12 * pi / 180 radians)
- The maximum number of steps is reached

The agent receives a reward of +1 for each step at which the state is within bounds. The termination check happens at the start of each step, before the state update.

Return a tuple (total reward, final state), where final state is a list of 4 floats, each rounded to 4 decimal places.
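The specification above can be sketched in Python as follows. This is a minimal illustration, not a reference solution: the function name `simulate_cartpole`, its argument order, and the integer reward are assumptions, since the problem statement does not fix a signature. The "standard" equations of motion are taken to be the Barto, Sutton and Anderson form used by the common CartPole benchmark, which is also an assumption.

```python
import math
from typing import List, Sequence, Tuple

# Constants from the problem statement.
GRAVITY = 9.8
CART_MASS = 1.0
POLE_MASS = 0.1
POLE_HALF_LENGTH = 0.5
FORCE_MAG = 10.0
DT = 0.02
X_LIMIT = 2.4
THETA_LIMIT = 12 * math.pi / 180  # 12 degrees in radians


def simulate_cartpole(
    weights: Sequence[float],
    initial_state: Sequence[float],
    max_steps: int,
) -> Tuple[int, List[float]]:
    """Hypothetical signature: run one episode under a linear policy."""
    x, x_dot, theta, theta_dot = (float(v) for v in initial_state)
    total_mass = CART_MASS + POLE_MASS
    pole_mass_length = POLE_MASS * POLE_HALF_LENGTH
    total_reward = 0

    for _ in range(max_steps):
        # Termination check at the start of the step, before any update.
        if abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT:
            break
        total_reward += 1

        # Linear policy: action 1 (push right) if w . s >= 0, else action 0.
        state = (x, x_dot, theta, theta_dot)
        score = sum(w * s for w, s in zip(weights, state))
        force = FORCE_MAG if score >= 0 else -FORCE_MAG

        # Assumed equations of motion (Barto-Sutton-Anderson form).
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        temp = (force + pole_mass_length * theta_dot ** 2 * sin_t) / total_mass
        theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
            POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / total_mass)
        )
        x_acc = temp - pole_mass_length * theta_acc * cos_t / total_mass

        # Explicit Euler step: every update uses the old state values.
        x, x_dot, theta, theta_dot = (
            x + DT * x_dot,
            x_dot + DT * x_acc,
            theta + DT * theta_dot,
            theta_dot + DT * theta_acc,
        )

    return total_reward, [round(v, 4) for v in (x, x_dot, theta, theta_dot)]


if __name__ == "__main__":
    # One step from rest with zero weights: score is 0, so the cart is pushed right.
    print(simulate_cartpole([0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], 1))
    # (1, [0.0, 0.1951, 0.0, -0.2927])
```

Because the bounds check runs before the update, the returned final state may itself be out of bounds: the last update is applied, and the violation is only detected at the start of the next step (or never, if the step limit is reached first).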