Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models

Published: 01 May 2025 · Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: We conduct a systematic safety evaluation of quantization for LLMs and propose a novel safety-patching algorithm for quantized LLMs.
Abstract: Quantized large language models (LLMs) have gained increasing attention and significance for enabling deployment in resource-constrained environments. However, emerging studies on a few calibration-dataset-free quantization methods suggest that quantization may compromise the safety capabilities of LLMs, underscoring the urgent need for systematic safety evaluations and effective mitigation strategies. In this paper, we present comprehensive safety evaluations across various mainstream quantization techniques and diverse calibration datasets, using widely accepted safety benchmarks. To address the identified safety vulnerabilities, we propose a quantization-aware safety patching framework, Q-resafe, to efficiently restore the safety capabilities of quantized LLMs while minimizing any adverse impact on utility. Extensive experimental results demonstrate that Q-resafe successfully re-aligns the safety of quantized LLMs with their pre-quantization counterparts, even under challenging evaluation scenarios. Project page: https://github.com/Thecommonirin/Qresafe.
Lay Summary: Large language models (LLMs) like ChatGPT are powerful AI systems that generate text and answer questions. However, they are so big that they require a lot of computing power, making it hard to use them on smaller devices. One solution is to compress these models, a process called quantization, which makes them more efficient. But we found that compressing LLMs can make them less safe, meaning they might generate inappropriate or harmful outputs more easily. In our work, we tested several common compression methods and found clear safety issues. To fix this, we designed a new system called Q-resafe. It acts like a safety patch, restoring the LLMs' protective filters even after compression, without sacrificing their usefulness. Our tests show that Q-resafe helps compressed models stay as safe as they were before compression, even under tough conditions. This research helps ensure that as LLMs become more widely used, they stay both efficient and responsible.
Application-Driven Machine Learning: This submission is on Application-Driven Machine Learning.
Link To Code: https://github.com/Thecommonirin/Qresafe
Primary Area: Social Aspects->Safety
Keywords: Large Language Model, Preference Alignment, Safety Evaluation
Submission Number: 9330