On the Efficiency-Safety Dilemma in Large Reasoning Models

ICLR 2026 Conference Submission 15240 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Large Reasoning Models; Efficiency Methods; Model Safety; Jailbreak Attacks
Abstract: Large reasoning models (LRMs) excel at complex reasoning tasks but incur high inference costs, and efficiency techniques such as quantization, pruning, and KV cache compression are widely used to reduce them. However, the impact of these techniques on model safety has remained largely unexplored. This study offers the first comprehensive analysis of how such efficiency techniques affect the interplay between efficiency, safety, and reasoning performance in LRMs. It finds that most efficiency methods appear to improve model safety on jailbreak benchmarks, but this improvement is superficial: it arises because reduced reasoning ability yields more attempted but failed malicious responses, not because of genuine alignment gains. Analysis across multiple representative LRMs confirms a consistent correlation between reasoning performance and vulnerability to jailbreak attacks. The study further evaluates combinations of efficiency methods and identifies quantization combined with pruning as the strategy that best balances efficiency, safety, and reasoning performance, outperforming any single method. These findings fill a gap in the understanding of how efficiency techniques affect LRM safety and provide an empirical foundation for the safe and efficient deployment of LRMs.
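As a concrete illustration of the abstract's recommended pairing, the following is a minimal sketch of applying magnitude pruning followed by post-training dynamic quantization using PyTorch's built-in utilities. This is not the authors' pipeline: the model checkpoint, the 30% sparsity level, and the int8 dtype are illustrative assumptions, not settings reported in the paper.

```python
# Minimal sketch: prune-then-quantize on a causal LM (illustrative settings only).
import torch
import torch.nn.utils.prune as prune
from transformers import AutoModelForCausalLM

# Hypothetical choice of LRM checkpoint; substitute the model under study.
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Qwen-7B")
model.eval()

# Step 1: unstructured L1 (magnitude) pruning of every linear layer's weights.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)  # 30% sparsity (assumed)
        prune.remove(module, "weight")  # bake the mask in, making the pruning permanent

# Step 2: post-training dynamic quantization of the (now sparse) linear layers
# to int8. Dynamic quantization in PyTorch targets CPU inference.
model = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```

Pruning first and quantizing second is the natural ordering here, since quantizing a layer replaces its float weights with packed int8 parameters that the pruning utilities cannot reparametrize.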
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 15240