Energy-Oriented Alignment for Large Language Models

27 Sept 2024 (modified: 01 Dec 2024)ICLR 2025 Conference Withdrawn SubmissionEveryoneRevisionsBibTeXCC BY 4.0
Keywords: datasets and benchmarks, jailbreak, energy attack, large language model, efficiency, denial-of-service
Abstract: Large language models (LLMs) have showcased remarkable capabilities on a variety of natural language processing (NLP) tasks, powering various real-world applications. Ensuring the safe and effective deployment of LLMs requires careful alignment to mitigate risks associated with malicious inputs, which now mainly involve toxic content and misinformation. In this study, we expand this focus by identifying and exploring a novel category of energy-oriented malicious instructions, akin to Denial-of-Service (DoS) attacks. These instructions provoke LLMs to generate excessively lengthy responses through impractical tasks, resulting in high energy and computational resource consumption, and even risking system overload. To address this gap, we curate EnergyAlign, the first energy-oriented malicious instruction dataset with 8 diverse categories. Then, we conduct a comprehensive evaluation of 5 advanced proprietary LLMs and 24 open-source LLMs. The results reveal a notable disparity: while proprietary LLMs can refuse such malicious inputs, most open-source LLMs are extremely vulnerable with a failure rate of up to 96.8\%. Additionally, we assess the effectiveness of jailbreak techniques in bypassing the energy-related safety measures of proprietary models. Lastly, we highlight the inadequacies of existing defense mechanisms and propose energy-oriented alignment data against EnergyAlign for future research.
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10252
Loading

OpenReview is a long-term project to advance science through improved peer review with legal nonprofit status. We gratefully acknowledge the support of the OpenReview Sponsors. © 2025 OpenReview