Error-Dependent Multiple Weight Updates

Representation Learning, Advanced Theory & Miscellaneous DS practice problem on Onlearn.

Difficulty: medium.

Topics: Error-Dependent Multiple Weight Updates, Hessian Matrix Approximation, Error Surface Curvature, Momentum Decay Factor, Gradient Clipping Threshold, Weight Update Synchronicity, Optimization Theory, Statistical Learning, Neural Network Architectures, Information Theory, Numerical Analysis, Stochastic Gradient Descent, Backpropagation Algorithms, Weight Initialization Strategies, Regularization Techniques, Adaptive Learning Rate Methods.

Implement a function error dependent updates that performs Q learning with a variable number of weight updates per transition, where the number of updates is determined by the magnitude of the temporal difference (TD) error. The core idea is that transitions with large TD errors represent surprising experiences that warrant more learning iterations, while small errors only need a single update. This adaptive approach allocates more computation where it matters most. Your function receives: transitions: A list of tuples (state, action, reward, next state, done) representing the agent's experience stream. States and actions are integers. If done is True, the transition is terminal. num states: Total number of states. num actions: Total number of actions. alpha: Learning rate for Q value updates. gamma: Discount factor. error threshold: A positive float that controls how the TD error magnitude maps to the number of updates. max updates: An integer cap on the maximum number of updates per transition. For each transition in order: 1. Compute the initial TD error. For non terminal transitions use the Q learning target (greedy over next state). For terminal transitions the target is just the reward. 2. Determine the number of updates k to perform: use integer floor division of the absolute TD error by the threshold, then clamp to be at least 1 and at most max updates. 3. Perform k successive updates to Q(s, a), recomputing the TD error from the current Q values before each individual update. Initialize all Q values to zero. Return a dictionary with: 'Q': The final Q table as a list of lists, with values rounded to 4 decimal places. 'update counts': A list of integers, one per transition, indicating how many updates were performed for that transition.