Prioritized Experience Replay

Advanced & Deep RL DS practice problem on Onlearn.

Difficulty: medium.

Topics: Prioritized Experience Replay, Temporal Difference Error, Sum-Tree Data Structure, Annealing Bias Correction, Stochastic Rank-based Sampling, Proportional Prioritization, Reinforcement Learning, Stochastic Optimization, Data Structures, Probability Theory, Deep Learning, Temporal Difference Learning, Importance Sampling, Stochastic Prioritization, Binary Heap Operations, Experience Buffer Management.

Implement the sampling function for prioritized experience replay, as used in deep reinforcement learning. In standard experience replay, transitions are sampled uniformly at random from a buffer. Prioritized experience replay instead samples each transition with a probability that grows with its priority (typically the magnitude of its TD error), so that more 'surprising' or 'informative' experiences are replayed more frequently.

Your function should:

1. Compute sampling probabilities from the priorities using proportional prioritization. Each priority is raised to the exponent alpha, which controls the degree of prioritization, and the results are normalized to sum to 1.
2. Sample a batch of indices from the buffer according to these probabilities, without replacement.
3. Compute importance sampling (IS) weights that correct for the bias introduced by non-uniform sampling. The IS exponent beta controls how much correction is applied. Normalize the weights by dividing by the maximum weight in the batch.

The function should return a dictionary with:

'indices': list of sampled experience indices
'probabilities': list of sampling probabilities for ALL experiences (rounded to 4 decimal places)
'weights': list of normalized IS weights for the sampled experiences (rounded to 4 decimal places)
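The three steps can be sketched in NumPy as follows. This is a minimal sketch, not a reference solution: the function name, the parameter names, the default values for alpha and beta, and the seed argument are all assumptions, since the problem statement does not fix a signature. It uses the usual formulation P(i) = p_i^alpha / sum_k p_k^alpha for the probabilities and w_i = (N * P(i))^(-beta) for the weights, where N is the buffer size, and it assumes at least batch_size priorities are strictly positive.

```python
import numpy as np


def prioritized_sample(priorities, batch_size, alpha=0.6, beta=0.4, seed=None):
    """Hypothetical signature; the problem statement does not specify one."""
    p = np.asarray(priorities, dtype=float)
    n = p.size

    # Step 1: proportional prioritization, P(i) = p_i^alpha / sum_k p_k^alpha.
    scaled = p ** alpha
    probs = scaled / scaled.sum()

    # Step 2: draw distinct indices according to probs (no replacement).
    # Assumes at least batch_size entries of probs are non-zero.
    rng = np.random.default_rng(seed)
    indices = rng.choice(n, size=batch_size, replace=False, p=probs)

    # Step 3: IS weights w_i = (N * P(i))^(-beta), scaled by the batch maximum
    # so that the largest weight in the batch equals 1.
    weights = (n * probs[indices]) ** (-beta)
    weights = weights / weights.max()

    return {
        "indices": indices.tolist(),
        "probabilities": np.round(probs, 4).tolist(),
        "weights": np.round(weights, 4).tolist(),
    }
```

Calling prioritized_sample([1.0, 2.0, 3.0, 4.0], batch_size=2, alpha=1.0, beta=1.0, seed=0) returns two distinct indices, with the probabilities proportional to the raw priorities. Setting alpha to 0 makes every probability equal, which recovers uniform sampling and gives every sampled transition a weight of 1. The weights are computed from the unrounded probabilities; rounding is applied only to the returned values.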