Block-wise FP8 Quantization

MoE, Compression & Scaling DS practice problem on Onlearn.

Difficulty: hard.

Topics: Understanding Block-wise FP8 Quantization for LLM Inference, E4M3 vs E5M2 Formats, Block-wise vs Per-tensor Quantization, Quantization Clipping, De-quantization Error Analysis, Scaling Factor Quantization, Numerical Linear Algebra, Computer Architecture, Deep Learning Optimization, Information Theory, Floating Point Arithmetic, Quantization-Aware Training, Weight Compression, Hardware Acceleration, Dynamic Range Management, Outlier Mitigation.

Implement a function block_wise_fp8_quantize(tensor, block_size) that takes a 1D tensor (represented as a list of floats) and a block size. The function should divide the tensor into chunks of block_size elements, compute a scaling factor for each block (max_val / 448.0 for E4M3, where 448 is the largest finite value E4M3 can represent), and return the quantized FP8 values (simulated as integers) along with the scale factor for each block.
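One possible reading of the task is sketched below in Python. It is not a reference solution, and it rests on several assumptions the problem statement does not spell out: "max val" is taken to be the largest absolute value in the block, the "simulated" FP8 value is the scaled input rounded to the nearest integer and clipped to [-448, 448] (real E4M3 values are not evenly spaced), an all-zero block gets a scale of 1.0 to avoid dividing by zero, and a short final block is quantized on its own.

```python
E4M3_MAX = 448.0  # largest finite magnitude representable in E4M3


def block_wise_fp8_quantize(tensor, block_size):
    """Return (quantized_values, scales) for a 1D list of floats.

    Assumptions (not stated in the problem): absolute max per block,
    integer rounding as the FP8 simulation, scale 1.0 for all-zero blocks.
    """
    if block_size <= 0:
        raise ValueError("block_size must be a positive integer")

    quantized = []
    scales = []
    for start in range(0, len(tensor), block_size):
        block = tensor[start:start + block_size]  # last block may be shorter

        # One scale per block, chosen so the block's largest magnitude maps to 448.
        max_abs = max(abs(x) for x in block)
        scale = max_abs / E4M3_MAX if max_abs > 0 else 1.0
        scales.append(scale)

        for x in block:
            q = round(x / scale)
            # Clip to the E4M3 range in case rounding overshoots.
            q = max(-int(E4M3_MAX), min(int(E4M3_MAX), q))
            quantized.append(q)

    return quantized, scales


values, block_scales = block_wise_fp8_quantize([1.0, -2.0, 4.0, 0.5], 2)
print(values)        # [224, -448, 448, 56]
print(block_scales)  # two scales: 2/448 and 4/448
```

Multiplying each quantized value by its block's scale gives the de-quantized approximation. Because every block has its own scale, an outlier in one block does not reduce the resolution available to the others, which is the main advantage over a single per-tensor scale.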
