Language Models are Injective and Hence Invertible

ICLR 2026 Conference Submission16613 Authors

19 Sept 2025 (modified: 08 Oct 2025) | ICLR 2026 Conference Submission | CC BY 4.0

Keywords: transformers, language models, invertibility, injectivity, inversion, privacy

TL;DR: We prove that transformers are almost surely injective and propose an algorithm that provably inverts their hidden representations back to the original input prompt.

Abstract: Transformer components such as non-linear activations and normalization are inherently non-injective, which suggests that different inputs could map to the same output, preventing exact recovery of the input from a model’s representations. In this paper, we challenge this view. First, we prove mathematically that transformer language models, viewed as maps from discrete input sequences to the corresponding sequences of continuous representations, are injective and therefore lossless, a property established at initialization and preserved during training. Second, we confirm this result empirically through billions of collision tests on six state-of-the-art language models, and observe no collisions. Third, we operationalize injectivity: we introduce SipIt, the first algorithm that provably and efficiently reconstructs the exact input text from hidden activations, establishing linear-time guarantees and demonstrating exact invertibility in practice. Overall, our work establishes injectivity as a fundamental and exploitable property of language models, with direct implications for transparency, interpretability, and safe deployment.
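To make the two ideas in the abstract concrete (checking for collisions and recovering a prompt from hidden states), the following is a minimal toy sketch. It is not the SipIt algorithm and does not use a real transformer: the causal update `step`, the random parameters `EMBED` and `MIX`, and the helpers `hidden_states`, `invert`, and `min_pairwise_distance` are all hypothetical names introduced here for illustration. The sketch only assumes that the state at each position depends on the previous state and the current token, so a prompt can be recovered left to right by testing each vocabulary entry once per position.

```python
import math
import random

# Hypothetical toy sizes; a real language model is far larger.
VOCAB_SIZE = 50
DIM = 8

_rng = random.Random(0)
# Assumed toy parameters: random token embeddings and a random mixing matrix.
EMBED = [[_rng.gauss(0, 1) for _ in range(DIM)] for _ in range(VOCAB_SIZE)]
MIX = [[_rng.gauss(0, 1) / math.sqrt(DIM) for _ in range(DIM)] for _ in range(DIM)]


def step(prev_state, token):
    """One causal update: the new state depends on the previous state and the token."""
    return [
        math.tanh(sum(MIX[i][j] * prev_state[j] for j in range(DIM)) + EMBED[token][i])
        for i in range(DIM)
    ]


def hidden_states(tokens):
    """Map a token sequence to its sequence of continuous states."""
    state = [0.0] * DIM
    states = []
    for token in tokens:
        state = step(state, token)
        states.append(state)
    return states


def invert(states, tol=1e-9):
    """Recover the tokens left to right by testing every vocabulary entry per position."""
    state = [0.0] * DIM
    recovered = []
    for target in states:
        best_token, best_dist = None, float("inf")
        for token in range(VOCAB_SIZE):
            candidate = step(state, token)
            dist = max(abs(a - b) for a, b in zip(candidate, target))
            if dist < best_dist:
                best_token, best_dist = token, dist
        if best_dist > tol:
            raise ValueError("no vocabulary entry reproduces the observed state")
        recovered.append(best_token)
        state = target  # continue from the observed state at this position
    return recovered


def min_pairwise_distance(prompts):
    """Smallest max-norm distance between final states of distinct prompts."""
    finals = [hidden_states(p)[-1] for p in prompts]
    return min(
        max(abs(a - b) for a, b in zip(finals[i], finals[j]))
        for i in range(len(finals))
        for j in range(i + 1, len(finals))
    )


prompt = [3, 17, 17, 42, 0, 9]
recovered = invert(hidden_states(prompt))
print("recovered prompt matches:", recovered == prompt)
```

In this toy setting the search performs at most `VOCAB_SIZE` candidate evaluations per position, so its cost grows linearly with the prompt length. A strictly positive `min_pairwise_distance` over a set of distinct prompts is the toy analogue of observing no collisions; it says nothing about real models beyond illustrating the shape of such a test.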

Supplementary Material: zip

Primary Area: foundation or frontier models, including LLMs

Submission Number: 16613
