Multi-Instance GPU (MIG) Resource Allocation

Infrastructure, Parallelism & Hardware Efficiency DS practice problem on Onlearn.

Difficulty: medium.

Topics: Multi-Instance GPU (MIG) Resource Allocation, GPU Slice Isolation, Single Root I/O Virtualization, Quality of Service (QoS) Enforcement, Memory Bandwidth Throttling, Compute Instance Profiling, Distributed Systems, Computer Architecture, Cloud Computing, Virtualization Technology, Performance Engineering, Hardware Partitioning, Resource Orchestration, Memory Management, Compute Scheduling, System Telemetry.

NVIDIA's Multi-Instance GPU (MIG) technology allows a single physical GPU to be partitioned into multiple isolated instances, each with dedicated compute slices and memory. This enables efficient sharing of expensive GPU hardware across multiple inference or training workloads.

Implement a function mig_resource_allocation that allocates MIG instances to a set of workloads on a single GPU.

Inputs

- gpu_config (dict): GPU specifications with keys 'total_compute_slices' (int) and 'total_memory_gb' (float).
- mig_profiles (list of dicts): Available MIG partition profiles. Each dict has keys 'name' (str), 'compute_slices' (int), 'memory_gb' (float). Multiple workloads can use the same profile type.
- workloads (list of dicts): Workloads requesting GPU resources. Each dict has keys 'name' (str), 'min_compute_slices' (int), 'min_memory_gb' (float).

Allocation Strategy

Use a greedy approach:

1. Process workloads in order of decreasing resource demand (sort by compute slices descending as primary key, then by memory descending as secondary key).
2. For each workload, find a compatible MIG profile where both compute slices and memory meet or exceed the workload's minimums.
3. Among compatible profiles, prefer the one that wastes the fewest resources (smallest compute slices first, then smallest memory).
4. Only allocate the profile if the GPU has enough remaining compute slices AND memory to create that instance.
5. If no compatible profile can be allocated, reject the workload.

Output

Return a dictionary with:

- 'allocations': list of dicts, each with 'workload' (str), 'profile' (str), 'compute_slices' (int), 'memory_gb' (float)
- 'total_compute_used': int, total compute slices allocated
- 'total_memory_used': float, total memory allocated (rounded to 2 decimals)
- 'compute_utilization': float, percentage of compute used (rounded to 2 decimals)
- 'memory_utilization': float, percentage of memory used (rounded to 2 decimals)
- 'workloads_served': int, number of successfully allocated workloads
- 'workloads_rejected': list of workload name strings that could not be allocated
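
The greedy strategy above can be sketched in Python as follows. This is one possible reading of the statement, not a reference solution, and it rests on several assumptions: dictionary keys are written with underscores (for example 'total_compute_slices'); when the best-fitting profile does not fit in the remaining capacity, the next compatible profile in preference order is tried before rejecting; ties in the workload sort keep their input order; and utilization is reported as 0.0 when a GPU total is zero. The sample GPU, profile, and workload values in the usage example are illustrative only.

```python
from typing import Any


def mig_resource_allocation(
    gpu_config: dict[str, Any],
    mig_profiles: list[dict[str, Any]],
    workloads: list[dict[str, Any]],
) -> dict[str, Any]:
    total_compute = gpu_config["total_compute_slices"]
    total_memory = gpu_config["total_memory_gb"]

    # Step 1: largest demand first (compute, then memory, both descending).
    # sorted() is stable, so equal workloads keep their input order.
    ordered = sorted(
        workloads,
        key=lambda w: (-w["min_compute_slices"], -w["min_memory_gb"]),
    )
    # Step 3 preference order, computed once: least compute, then least memory.
    by_size = sorted(
        mig_profiles, key=lambda p: (p["compute_slices"], p["memory_gb"])
    )

    allocations: list[dict[str, Any]] = []
    rejected: list[str] = []
    compute_used = 0
    memory_used = 0.0
    eps = 1e-9  # tolerance for accumulated float error in memory sums

    for w in ordered:
        chosen = None
        for p in by_size:
            # Step 2: the profile must meet both minimums.
            meets = (
                p["compute_slices"] >= w["min_compute_slices"]
                and p["memory_gb"] >= w["min_memory_gb"]
            )
            # Step 4: the GPU must still have room for the whole profile.
            fits = (
                compute_used + p["compute_slices"] <= total_compute
                and memory_used + p["memory_gb"] <= total_memory + eps
            )
            if meets and fits:
                chosen = p
                break
        if chosen is None:
            # Step 5: nothing compatible could be created.
            rejected.append(w["name"])
            continue
        compute_used += chosen["compute_slices"]
        memory_used += chosen["memory_gb"]
        allocations.append({
            "workload": w["name"],
            "profile": chosen["name"],
            "compute_slices": chosen["compute_slices"],
            "memory_gb": chosen["memory_gb"],
        })

    def pct(used: float, total: float) -> float:
        # Assumption: report 0.0 rather than dividing by a zero total.
        return round(used / total * 100, 2) if total else 0.0

    return {
        "allocations": allocations,
        "total_compute_used": compute_used,
        "total_memory_used": round(memory_used, 2),
        "compute_utilization": pct(compute_used, total_compute),
        "memory_utilization": pct(memory_used, total_memory),
        "workloads_served": len(allocations),
        "workloads_rejected": rejected,
    }


if __name__ == "__main__":
    # Illustrative inputs: a 7-slice, 40 GB GPU and three profiles.
    gpu = {"total_compute_slices": 7, "total_memory_gb": 40.0}
    profiles = [
        {"name": "1g.5gb", "compute_slices": 1, "memory_gb": 5.0},
        {"name": "2g.10gb", "compute_slices": 2, "memory_gb": 10.0},
        {"name": "3g.20gb", "compute_slices": 3, "memory_gb": 20.0},
    ]
    jobs = [
        {"name": "train", "min_compute_slices": 3, "min_memory_gb": 16.0},
        {"name": "infer-a", "min_compute_slices": 2, "min_memory_gb": 8.0},
        {"name": "infer-b", "min_compute_slices": 1, "min_memory_gb": 4.0},
        {"name": "batch", "min_compute_slices": 3, "min_memory_gb": 18.0},
    ]
    print(mig_resource_allocation(gpu, profiles, jobs))
```

With these inputs, batch and train are processed first and each receives a 3g.20gb instance, which uses all 40 GB of memory. infer-a and infer-b are then rejected even though one compute slice remains, which shows that both resources must have headroom for an allocation to succeed.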