Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models
Keywords: Large Language Models, Quantization, Safety Alignment
TL;DR: We conduct the first systematic safety evaluation for quantization on LLMs and propose a novel safety-patching algorithm for quantized LLMs.
Abstract: Quantized large language models (LLMs) are in surging demand because they broaden the deployment scenarios of LLMs, particularly to resource-constrained applications that would otherwise be infeasible due to the substantial resource overhead incurred by astronomical model sizes. Propelled by this vast application potential, various quantization techniques have been developed to convert high-precision LLMs into low-precision quantized counterparts, aiming to preserve strong capabilities at reduced bit-widths. While these techniques have made significant strides in preserving utility, their implications for safety remain insufficiently studied. Recent findings highlight the fragility of safety mechanisms in both high-precision and quantized LLMs, underscoring the need for systematic safety evaluations and targeted interventions for quantized models.
In this paper, we present a comprehensive safety evaluation of quantized LLMs to complement existing efforts, covering four mainstream quantization techniques across diverse settings, including varying quantization bit-widths and different quantization-assisting datasets, using widely accepted safety measurements. Our empirical evaluation reveals concerning safety degradation across all quantization methods and settings. To address this, we propose a quantization-aware safety patching framework, Q-resafe, which efficiently restores the safety capabilities of quantized LLMs while minimizing any adverse impact on utility. Extensive experiments demonstrate that Q-resafe effectively restores the safety of quantized LLMs obtained from diverse quantization processes, aligning them closely with their pre-quantization counterparts, even when evaluated against challenging datasets. We will make our implementation publicly available at https://anonymous.4open.science/r/Qresafe-D085/.
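To make the quantization process described above concrete, the following is a minimal, illustrative round-to-nearest (RTN) weight-quantization sketch. The helper names are hypothetical and not from the paper; the mainstream techniques the paper evaluates are calibration-aware and considerably more sophisticated than this per-tensor scheme.

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Round-to-nearest per-tensor weight quantization (illustrative sketch).

    Maps float weights to signed integers of the given bit-width using a
    single scale factor; real LLM quantizers typically use per-group scales
    and calibration data to minimize output error.
    """
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for symmetric INT4
    scale = np.abs(w).max() / qmax          # per-tensor scale factor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from integers and the scale."""
    return q.astype(np.float32) * scale

# Toy example: error per weight is bounded by roughly half the scale,
# so lower bit-widths (larger scales) incur larger reconstruction error.
w = np.array([0.12, -0.5, 0.33, 0.9], dtype=np.float32)
q, s = quantize_rtn(w, bits=4)
w_hat = dequantize(q, s)
```

The safety degradation the paper studies arises because such rounding perturbs every weight slightly, and safety-relevant behaviors can be sensitive to those perturbations even when aggregate utility metrics are preserved.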
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6792