Shared Memory Circular Buffer for RL

Infrastructure, Parallelism & Hardware Efficiency DS practice problem on Onlearn.

Difficulty: medium.

Topics: Shared Memory Circular Buffer for RL, Ring Buffer Indexing, Atomic Compare-and-Swap, Memory-Mapped Files, Producer-Consumer Synchronization, Cache Line Alignment, High-Performance Computing, Reinforcement Learning, Distributed Systems, Memory Management, Concurrency Control, Experience Replay Mechanisms, Inter-Process Communication, Lock-Free Data Structures, Shared Memory Architectures, Asynchronous Parallelism.

In reinforcement learning, experience replay buffers are essential for storing and resampling past transitions. When training across multiple processes, these buffers are often backed by fixed size shared memory regions. A circular (ring) buffer is ideal because it uses a constant amount of memory and naturally overwrites the oldest data when full. Implement a class SharedCircularBuffer that stores RL transitions (state, action, reward, next state, done) in pre allocated numpy arrays (simulating shared memory segments). The class should support: 1. init (self, capacity, state dim): Initialize fixed size numpy arrays for all transition fields. States and next states should be float64 2D arrays of shape (capacity, state dim). Actions should be int64. Rewards should be float64. Dones should be boolean. 2. push(self, state, action, reward, next state, done): Write a transition at the current write position and advance the pointer circularly. When the buffer is full, the oldest transition should be overwritten. 3. sample(self, batch size, seed=None): Return a dictionary of randomly sampled transitions. Use np.random.RandomState(seed) to generate random indices into the valid portion of the buffer. Dictionary keys should be 'states', 'actions', 'rewards', 'next states', 'dones', each containing a Python list (via .tolist()). 4. latest(self, n): Return the n most recently pushed transitions as a dictionary (same format as sample), ordered from oldest to newest. If n exceeds the current buffer size, return all stored transitions. 5. len (self): Return the current number of valid transitions stored.