Zero-Copy Batch Data Loading from Shared Memory
Infrastructure, Parallelism & Hardware Efficiency DS practice problem on Onlearn.
Difficulty: medium.
Topics: Zero-Copy Batch Data Loading from Shared Memory, POSIX Shared Memory, Memory Mapping (mmap), Zero-Copy Serialization, Ring Buffer Synchronization, NUMA-Aware Allocation, Distributed Systems, Operating Systems, High-Performance Computing, Data Engineering, Computer Architecture, Inter-Process Communication, Memory Management, Parallel Data Pipelines, Kernel-Level Optimization, I/O Throughput Engineering.
In high-performance ML training pipelines, data loading can be a critical bottleneck. One optimization is to store the dataset in a shared, contiguous memory buffer and serve batches as views (zero-copy) rather than creating a new copy for every batch request.

Implement a ZeroCopyBatchLoader class with the following methods:

1. __init__(self, data, batch_size): Accepts a 2D NumPy array data of shape (n_samples, n_features) and an integer batch_size. Internally stores the data in a flat (1D) contiguous buffer that simulates a shared memory region.
2. num_batches(self) -> int: Returns the total number of batches. The last batch may have fewer samples than batch_size.
3. get_batch(self, batch_idx) -> np.ndarray: Returns a 2D array of shape (batch_rows, n_features) for the requested batch. This must be a memory view into the internal buffer, not a copy. Raises IndexError for invalid batch indices.
4. is_zero_copy(self, batch_idx) -> bool: Returns True if the batch returned by get_batch shares memory with the internal buffer (i.e., is truly zero-copy), False otherwise.
5. get_batch_means(self) -> list: Returns a list of per-batch mean values (the scalar mean across all elements in each batch), each rounded to 4 decimal places.
6. write_to_buffer(self, row, col, value): Writes a value directly into the internal flat buffer at the position corresponding to (row, col) in the original 2D layout.

Key constraint: get_batch must return arrays that share the same underlying memory as the internal buffer. A mutation made via write_to_buffer must be immediately visible through any batch view, whether it was retrieved before or after the write, and a mutation made through a batch view must be visible in the buffer.
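The requirements above can be sketched as follows. This is one possible reading of the specification, not a reference solution. It assumes that negative batch indices count as invalid, that the constructor copies the input once so the buffer is independent of the caller's array, and that out-of-range (row, col) writes should raise IndexError. The attribute name _buffer and the use of np.shares_memory for the zero-copy check are choices made for this sketch only.

```python
import numpy as np


class ZeroCopyBatchLoader:
    """Serves batches as views into a flat, contiguous buffer (a sketch)."""

    def __init__(self, data, batch_size):
        data = np.asarray(data)
        if data.ndim != 2:
            raise ValueError("data must be a 2D array")
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self.n_samples, self.n_features = data.shape
        self.batch_size = batch_size
        # flatten() always returns a C-ordered 1D copy; this one-time copy
        # stands in for loading the dataset into a shared memory region.
        self._buffer = data.flatten()

    def num_batches(self):
        # Ceiling division: the last batch may be partial.
        return (self.n_samples + self.batch_size - 1) // self.batch_size

    def get_batch(self, batch_idx):
        # Assumption: negative indices are rejected rather than wrapped.
        if not 0 <= batch_idx < self.num_batches():
            raise IndexError(f"batch index {batch_idx} out of range")
        start = batch_idx * self.batch_size
        stop = min(start + self.batch_size, self.n_samples)
        # A basic slice of a 1D array is a view, and reshaping a contiguous
        # view is also a view, so no element data is copied here.
        flat = self._buffer[start * self.n_features:stop * self.n_features]
        return flat.reshape(stop - start, self.n_features)

    def is_zero_copy(self, batch_idx):
        return bool(np.shares_memory(self.get_batch(batch_idx), self._buffer))

    def get_batch_means(self):
        return [round(float(self.get_batch(i).mean()), 4)
                for i in range(self.num_batches())]

    def write_to_buffer(self, row, col, value):
        # Assumption: out-of-range coordinates raise instead of wrapping.
        if not (0 <= row < self.n_samples and 0 <= col < self.n_features):
            raise IndexError("row or col out of range")
        # Row-major layout: element (row, col) sits at row * n_features + col.
        self._buffer[row * self.n_features + col] = value


data = np.arange(10, dtype=np.float64).reshape(5, 2)
loader = ZeroCopyBatchLoader(data, batch_size=2)
first = loader.get_batch(0)         # view retrieved before the write
loader.write_to_buffer(0, 1, 100.0)
print(first[0, 1])                  # the earlier view reflects the write
print(loader.is_zero_copy(0))
```

Because every batch is a slice of the same 1D array, writes in either direction are visible without any synchronization step inside a single process. Across processes, the same slicing logic could in principle be applied to an array backed by a real shared segment (for example, the standard library's multiprocessing.shared_memory), but that is outside what the problem asks for.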