Estimate Minimum GPU Count for Model Deployment

Infrastructure, Parallelism & Hardware Efficiency DS practice problem on Onlearn.

Difficulty: easy.

Topics: Estimate Minimum GPU Count for Model Deployment, Tensor Sharding, VRAM Fragmentation, Batch Size Scaling, Peak Memory Footprint, Inter-node Bandwidth, Distributed Systems, Hardware Architecture, Performance Engineering, Capacity Planning, Cloud Infrastructure, Model Parallelism, Memory Management, Throughput Optimization, Resource Provisioning, Latency Profiling.

When deploying large language models (LLMs) or other deep learning models in production, a critical planning step is estimating how many GPUs are needed to host the model. The model's parameters must fit in GPU memory along with runtime overhead for activations, KV cache, and framework buffers.

Write a function estimate_min_gpus that calculates the minimum number of GPUs required to deploy a model given:

- num_params_billion: The number of model parameters in billions (e.g., 7 for a 7B model).
- bytes_per_param: The number of bytes used per parameter based on the chosen precision (e.g., 4 for FP32, 2 for FP16/BF16, 1 for INT8).
- gpu_memory_gb: The available memory per GPU in gigabytes (using the convention 1 GB = 10^9 bytes, consistent with GPU manufacturer specs).
- overhead_fraction: A fraction representing additional memory needed beyond raw model weights for runtime costs such as KV cache, activations, and framework overhead (e.g., 0.2 means 20% extra memory on top of model weight memory).

The function should return a dictionary with three keys:

- model_memory_gb: The memory required for model weights alone (rounded to 2 decimal places).
- total_memory_gb: The total memory required including overhead (rounded to 2 decimal places).
- min_gpus: The minimum number of GPUs needed (integer, rounded up since you cannot use a fraction of a GPU).
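
One possible solution is sketched below in Python. It rests on a few assumptions that the problem statement does not spell out: the function and key names use underscores (estimate_min_gpus, model_memory_gb, and so on), the GPU count is computed from the unrounded total rather than the rounded one, and non-positive gpu_memory_gb raises a ValueError. Because 1 GB is taken as 10^9 bytes, the billions of parameters and the bytes-per-GB factor cancel, so the weight memory in GB is simply num_params_billion * bytes_per_param.

```python
import math


def estimate_min_gpus(num_params_billion, bytes_per_param, gpu_memory_gb, overhead_fraction):
    """Estimate the minimum number of GPUs needed to host a model.

    Uses 1 GB = 10^9 bytes, so (billions of params) * (bytes per param)
    is already the weight memory in GB.
    """
    # Assumption: the problem does not define behaviour for invalid GPU sizes,
    # so this sketch rejects them explicitly.
    if gpu_memory_gb <= 0:
        raise ValueError("gpu_memory_gb must be positive")

    # Memory for the weights alone, in GB (float() keeps the output type consistent).
    model_memory_gb = float(num_params_billion) * bytes_per_param

    # Add runtime overhead (KV cache, activations, framework buffers).
    total_memory_gb = model_memory_gb * (1 + overhead_fraction)

    # Round up: a fraction of a GPU cannot be provisioned.
    # Assumption: the count is derived from the unrounded total.
    min_gpus = math.ceil(total_memory_gb / gpu_memory_gb)

    return {
        "model_memory_gb": round(model_memory_gb, 2),
        "total_memory_gb": round(total_memory_gb, 2),
        "min_gpus": min_gpus,
    }


# Example: a 70B model in FP16 on 80 GB GPUs with 20% overhead.
print(estimate_min_gpus(70, 2, 80, 0.2))
# → {'model_memory_gb': 140.0, 'total_memory_gb': 168.0, 'min_gpus': 3}
```

In the example, the weights need 70 * 2 = 140 GB, the 20% overhead brings the total to 168 GB, and 168 / 80 = 2.1 rounds up to 3 GPUs. This estimate treats memory as perfectly divisible across GPUs; it does not account for fragmentation or uneven sharding.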