PEARL: Towards Permutation-Resilient LLMs

ICLR 2025 Conference Submission12884 Authors

28 Sept 2024 (modified: 27 Nov 2024), ICLR 2025 Conference Submission, CC BY 4.0
Keywords: LLM, Distributionally Robust Optimization, OT
Abstract: The in-context learning (ICL) ability of large language models (LLMs) enables them to undertake challenging tasks using provided demonstrations. However, ICL is prone to instability: different orderings of demonstrations can significantly influence predictions, revealing LLMs’ limitations in processing combinatorial inputs. This paper shows that this vulnerability can be exploited to design a natural attack that is almost imperceptible to the model provider and achieves a success rate of nearly 80% on the SOTA open-source model, LLaMA, simply by permuting the demonstrations. Overcoming this ordering sensitivity is therefore important for improving the performance of LLMs. However, current mitigation methods focus on post-processing and fail to enhance models’ inherent robustness to the vast space of possible input permutations. To address this issue, we propose a novel permutation-resilient learning framework (PEARL) based on distributionally robust optimization (DRO), which optimizes model performance against the worst case among all possible permutations. Specifically, PEARL consists of a hard permutation mining network (P-Net) and the LLM. The P-Net identifies the most challenging permutations by formulating the task as an optimal transport problem, which is solved using an entropy-constrained Sinkhorn algorithm. Through minimax optimization, the P-Net progressively generates harder samples to enhance the LLM’s worst-case performance. Experiments with synthetic data and instruction tuning tasks demonstrate that the proposed PEARL framework effectively mitigates permutation attacks and improves overall performance.
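Illustrative note: the abstract mentions a Sinkhorn algorithm for finding hard permutations but gives no implementation details. The sketch below shows only the generic Sinkhorn normalization step, which turns a square score matrix into an approximately doubly stochastic matrix (a "soft" permutation). It is not the paper's P-Net. The function name `sinkhorn`, the temperature `tau`, the iteration count, and the example scores are all assumptions made for illustration.

```python
import math


def sinkhorn(scores, tau=0.5, n_iters=50):
    """Alternately normalize rows and columns of exp(scores / tau).

    Generic Sinkhorn normalization (hypothetical helper, not the paper's P-Net).
    Returns an approximately doubly stochastic n x n matrix as nested lists.
    """
    n = len(scores)
    # Shift by the global maximum before exponentiating to avoid overflow;
    # a constant rescaling does not change the normalized result.
    top = max(max(row) for row in scores)
    p = [[math.exp((s - top) / tau) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row step: make every row sum to 1.
        normalized = []
        for row in p:
            total = sum(row)
            normalized.append([v / total for v in row])
        p = normalized
        # Column step: make every column sum to 1.
        col_sums = [sum(p[i][j] for i in range(n)) for j in range(n)]
        p = [[p[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return p


# Made-up scores: entry (i, j) rates placing demonstration i at position j.
scores = [
    [1.0, 0.2, 0.1],
    [0.3, 0.9, 0.4],
    [0.2, 0.5, 1.2],
]
soft_perm = sinkhorn(scores, tau=0.5)
for row in soft_perm:
    print([round(v, 3) for v in row])
```

Permutation matrices are the extreme points of the set of doubly stochastic matrices, so lowering `tau` pushes the output toward a hard permutation, while a larger `tau` keeps it smooth and easier to differentiate through.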
Primary Area: foundation or frontier models, including LLMs
Submission Number: 12884