FP4 Quantization with Microscaling (MXFP4)

MoE, Compression & Scaling DS practice problem on Onlearn.

Difficulty: hard.

Topics: Understanding MXFP4 Quantization and Microscaling, Tile-based scaling, Shared exponent representation, 4-bit quantization mapping, Max-abs normalization, Quantization error analysis, Numerical Linear Algebra, Computer Architecture, Information Theory, Optimization, Deep Learning Hardware, Quantization-Aware Training, Floating Point Arithmetic, Tensor Compression, Hardware-Software Co-design, Memory Bandwidth Optimization.

Implement a function that simulates MXFP4 quantization of a given vector of floating-point numbers. The function should divide the input into tiles of 32 elements, compute a shared scale factor for each tile from the maximum absolute value within that tile, and return the quantized 4-bit representations together with the associated scale factors.
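
The problem statement does not fix the exact scale convention or rounding rule, so the following is only a minimal sketch under stated assumptions: elements use the FP4 E2M1 grid (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6), the shared scale is a power of two stored as an integer exponent computed as floor(log2(max_abs)) minus 2 (the convention described in the OCP Microscaling specification), and a final short tile is quantized as-is. The names quantize_mxfp4 and dequantize_mxfp4 are hypothetical, and NaN or infinite inputs are not handled.

```python
import math

# Magnitudes representable by FP4 E2M1; the list index is the 3-bit magnitude code.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2      # exponent of the largest power of two in E2M1 (4.0 == 2**2)
TILE_SIZE = 32     # tile size given in the problem statement


def quantize_mxfp4(values, tile_size=TILE_SIZE):
    """Return (codes, scale_exponents).

    codes: one 4-bit integer per input value (bit 3 = sign, bits 0-2 = magnitude code).
    scale_exponents: one integer e per tile; the tile's scale factor is 2**e.
    """
    codes, scale_exponents = [], []
    for start in range(0, len(values), tile_size):
        tile = values[start:start + tile_size]
        max_abs = max(abs(v) for v in tile)
        if max_abs == 0.0:
            shared_exp = 0  # assumption: any exponent works for an all-zero tile
        else:
            # frexp gives max_abs = m * 2**e with 0.5 <= m < 1, so floor(log2) = e - 1.
            shared_exp = (math.frexp(max_abs)[1] - 1) - E2M1_EMAX
            # Assumption: keep the exponent inside the E8M0 range [-127, 127].
            shared_exp = max(-127, min(127, shared_exp))
        scale = 2.0 ** shared_exp
        scale_exponents.append(shared_exp)
        for v in tile:
            scaled = abs(v) / scale
            # Nearest grid magnitude; values above 6 saturate to 6. min() returns the
            # first minimum, so exact ties go to the smaller magnitude (a simplification;
            # real implementations may use another tie rule such as round-to-nearest-even).
            idx = min(range(8), key=lambda i: abs(E2M1_MAGNITUDES[i] - scaled))
            sign_bit = 1 if v < 0 else 0
            codes.append((sign_bit << 3) | idx)
    return codes, scale_exponents


def dequantize_mxfp4(codes, scale_exponents, tile_size=TILE_SIZE):
    """Reconstruct approximate floats, e.g. to measure quantization error."""
    out = []
    for i, code in enumerate(codes):
        scale = 2.0 ** scale_exponents[i // tile_size]
        magnitude = E2M1_MAGNITUDES[code & 0x7]
        out.append(-magnitude * scale if code & 0x8 else magnitude * scale)
    return out
```

For instance, a tile whose largest magnitude is 6.0 gets a shared exponent of 0, so 6.0, -3.0 and 0.5 are stored exactly as codes 7, 13 and 1. Comparing the input with the output of dequantize_mxfp4 gives the per-element quantization error for a tile.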