Keywords: Quantization, Hyper-scale LLMs, Attention, Hessian
TL;DR: We propose a novel post-training quantization algorithm that considers inter-layer dependencies inside the attention module without relying on backpropagation.
Abstract: Quantization offers a promising solution for deploying large language models (LLMs) on resource-constrained devices. However, early quantization methods, developed for smaller networks like ResNet, rely on gradient-based optimization, which becomes impractical for hyper-scale LLMs with billions of parameters. While recently proposed backpropagation-free post-training quantization (PTQ) methods alleviate this issue, their performance is limited because they do not account for inter-layer dependencies. In this paper, we introduce a novel PTQ algorithm that incorporates inter-layer dependencies without relying on backpropagation. The key innovation is the development of attention-aware Hessian matrices that capture inter-layer interactions within the attention module. Extensive experiments demonstrate that our approach significantly outperforms conventional PTQ methods, particularly at low bit-widths.
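For readers unfamiliar with the quantities the abstract refers to, the following is a minimal, illustrative sketch of a generic Hessian-weighted, layer-wise PTQ objective (in the style of GPTQ-like backpropagation-free methods). It is not the paper's attention-aware Hessian construction, which is not reproduced here; the function names and the damping parameter are hypothetical.

```python
# Illustrative sketch only: a generic Hessian-weighted layer-wise PTQ objective.
# This does NOT implement the paper's attention-aware Hessians; it only shows the
# kind of second-order, backprop-free quantities such methods work with.
import torch


def layerwise_hessian(calib_inputs: torch.Tensor, damp: float = 1e-2) -> torch.Tensor:
    """Proxy Hessian H = 2 * X X^T from calibration activations X
    (shape: [in_features, num_samples]), with diagonal damping for stability."""
    H = 2.0 * calib_inputs @ calib_inputs.T
    H += damp * torch.diag(H).mean() * torch.eye(
        H.shape[0], device=H.device, dtype=H.dtype
    )
    return H


def quantization_loss(W: torch.Tensor, W_q: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
    """Second-order reconstruction error tr((W - W_q) H (W - W_q)^T), the per-layer
    objective that backprop-free PTQ methods typically minimize."""
    delta = W - W_q  # weight perturbation introduced by quantization
    return torch.einsum("ij,jk,ik->", delta, H, delta)
```

In this standard formulation, each layer's Hessian depends only on that layer's own inputs; the paper's contribution, per the abstract, is to build Hessians that additionally capture interactions between layers inside the attention module.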
Primary Area: other topics in machine learning (i.e., none of the above)
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3463