Preserve-Then-Quantize

15 Apr 2026

Yoonjun Cho - AI-ISL, CS, Yonsei University

Low-bit quantization of large language models introduces reconstruction error. Prior methods such as QER spend their entire low-rank budget on approximating that error and neglect the dominant structure of the weights. We propose Structured Residual Reconstruction (SRR), which splits a fixed rank budget between preserving key subspace directions and reconstructing the quantization error. The split is chosen by a principled one-shot criterion, so the trade-off is captured without a costly search. Empirically, SRR outperforms QER, integrates with GPTQ and QuIP#, and provides a strong initialization for QPEFT, which highlights the importance of balancing preservation and reconstruction.
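To make the rank-budget split concrete, here is a minimal NumPy sketch under stated assumptions, not the implementation from the talk. The quantizer is a toy per-tensor round-to-nearest scheme standing in for GPTQ or QuIP#, and the names quantize_rtn, truncated_svd and split_rank_reconstruct are hypothetical. The abstract does not spell out SRR's one-shot criterion, so the sketch simply sweeps every split by brute force; k_preserve = 0 corresponds to the QER-style baseline that spends the whole budget on the quantization error.

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Toy symmetric per-tensor round-to-nearest quantizer (assumption: a stand-in, not GPTQ/QuIP#)."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    if scale == 0:
        return np.zeros_like(w)
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def truncated_svd(m, k):
    """Best rank-k approximation of m in Frobenius norm (zero matrix when k == 0)."""
    if k == 0:
        return np.zeros_like(m)
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k]

def split_rank_reconstruct(w, rank, k_preserve, bits=4):
    """Spend k_preserve ranks on dominant directions of w and the rest on quantization error."""
    if not 0 <= k_preserve <= rank:
        raise ValueError("k_preserve must lie in [0, rank]")
    preserved = truncated_svd(w, k_preserve)       # kept in full precision
    residual = w - preserved                       # what is left to quantize
    q = quantize_rtn(residual, bits)
    error = residual - q                           # quantization error of the residual
    correction = truncated_svd(error, rank - k_preserve)
    return q, preserved + correction               # low-rank part has rank <= rank

def relative_error(w, q, low_rank):
    """Frobenius reconstruction error of q + low_rank, relative to the norm of w."""
    return np.linalg.norm(w - (q + low_rank)) / np.linalg.norm(w)

# Synthetic weight: a strong rank-4 component plus small dense noise (illustrative data only).
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 64))
w += 0.05 * rng.standard_normal((64, 64))

rank = 8
# Brute-force sweep over all splits; SRR is described as avoiding this search.
errors = {k: relative_error(w, *split_rank_reconstruct(w, rank, k)) for k in range(rank + 1)}
for k, e in errors.items():
    print(f"k_preserve={k}  relative error={e:.4f}")
```

The sweep is there only to expose the trade-off on a toy matrix. Which split wins depends on the weight spectrum and the quantizer, and that choice is what the one-shot criterion is meant to make without the loop.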

Catering Courtesy of CIP Lab

Yonsei MLSys Student Group
