Towards Formally Verifying LLMs: Taming the Nonlinearity of the Transformer

26 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: large language models, formal verification, set-based computing, matrix polynomial zonotopes, neural networks
Abstract: Large language models are increasingly used across many domains, which raises important safety concerns, particularly regarding adversarial attacks. While recent advances in formal neural network verification have shown promising results, the complexity of transformers, the backbone of large language models, poses unique challenges for formal robustness verification. Traditional convex relaxation methods often incur large approximation errors because of the transformer's parallel, nonlinear attention heads. In this work, we address these limitations with a novel approach based on non-convex, set-based computing that preserves nonlinear dependencies through a transformer. Our approach generalizes previous methods for robustness verification of transformers, and a single parameter tunes the desired precision at the cost of additional computation time.
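To illustrate why set-based representations can be tighter than convex interval relaxations, the sketch below uses a minimal zonotope (a set {c + Gβ : β ∈ [−1, 1]^p}); the paper uses the richer matrix polynomial zonotopes, so this is only an illustrative analogy, not the authors' method. The class name and API here are hypothetical. The key point: because both coordinates of the stacked variable share the same generator, the map x − x evaluates to exactly {0}, whereas dependency-free interval arithmetic gives [0, 2] − [0, 2] = [−2, 2].

```python
import numpy as np

class Zonotope:
    """Minimal zonotope {c + G @ beta : beta in [-1, 1]^p} (illustrative only)."""

    def __init__(self, c, G):
        self.c = np.asarray(c, dtype=float)   # center, shape (n,)
        self.G = np.asarray(G, dtype=float)   # generator matrix, shape (n, p)

    def interval_hull(self):
        # Tightest axis-aligned box containing the set.
        r = np.abs(self.G).sum(axis=1)
        return self.c - r, self.c + r

    def linear_map(self, A):
        # Exact image under a linear map: A Z = <A c, A G>.
        A = np.asarray(A, dtype=float)
        return Zonotope(A @ self.c, A @ self.G)

# Stack (x, x) with x in [0, 2]; both coordinates share ONE generator,
# so the dependency between them is preserved.
xx = Zonotope([1.0, 1.0], [[1.0], [1.0]])

# Compute x - x via the linear map [1, -1].
diff = xx.linear_map([[1.0, -1.0]])
lo, hi = diff.interval_hull()
print(lo, hi)  # [0.] [0.] -- exact, unlike interval arithmetic's [-2, 2]
```

The same dependency tracking is what set-based verification exploits when propagating perturbation sets through a network layer by layer; the nonlinear attention layers additionally require polynomial generators, which is where the paper's matrix polynomial zonotopes come in.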
Supplementary Material: zip
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6381
