Yonsei MLSys Student Group

Preserve-Then-Quantize

Yoonjun Cho - AI-ISL, CS, Yonsei University

Low-bit quantization of large language models often suffers from reconstruction error, because prior methods such as quantization error reconstruction (QER) allocate all of their low-rank capacity to approximating the error while neglecting dominant structure. We propose Structured Residual Reconstruction (SRR), which splits a fixed rank budget between preserving key subspace directions and reconstructing the quantization error, with the split chosen by a principled one-shot criterion. This captures the trade-off without a costly search. Empirically, SRR outperforms QER, integrates with GPTQ and QuIP#, and provides a strong initialization for quantized parameter-efficient fine-tuning (QPEFT), highlighting the importance of balancing preservation and reconstruction.
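
To make the rank-budget split concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the speaker's implementation: the quantizer is a plain round-to-nearest stand-in, "dominant structure" is taken to mean the top singular directions of the weight matrix, and the split k is swept by hand because the abstract does not spell out the one-shot criterion. All function names are hypothetical.

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Symmetric per-tensor round-to-nearest quantizer (assumed stand-in)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = np.abs(w).max()
    if max_abs == 0:
        return w.copy()
    scale = max_abs / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def low_rank(m, rank):
    """Best rank-`rank` approximation of m via truncated SVD."""
    if rank <= 0:
        return np.zeros_like(m)
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

def preserve_then_quantize(w, rank, k, bits=4):
    """Spend k of `rank` on preserved directions and rank - k on the error."""
    if not 0 <= k <= rank:
        raise ValueError("k must lie between 0 and rank")
    preserved = low_rank(w, k)           # kept in full precision
    residual = w - preserved
    q = quantize_rtn(residual, bits)     # quantize only what is left
    correction = low_rank(residual - q, rank - k)  # fit the quantization error
    return q + preserved + correction

# Toy weight matrix: strong low-rank structure plus small dense noise.
rng = np.random.default_rng(0)
w = 0.5 * rng.normal(size=(64, 8)) @ rng.normal(size=(8, 64))
w += 0.1 * rng.normal(size=(64, 64))

rank = 8
errors = {k: np.linalg.norm(w - preserve_then_quantize(w, rank, k))
          for k in range(rank + 1)}
for k, err in errors.items():
    print(f"k={k}: reconstruction error {err:.4f}")
```

Setting k = 0 recovers the QER-style baseline in this sketch, where the whole budget goes to the quantization error; larger k shifts capacity toward preservation. The sweep over k is exactly the kind of search that, according to the abstract, SRR replaces with a one-shot choice.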

Catering Courtesy of CIP Lab