Alice: An Interpretable Neural Architecture for Generalization in Substitution Ciphers

Jeff Shen; Lindsay M. Smith

13 Sept 2025 (modified: 04 Jan 2026) | ICLR 2026 Conference Withdrawn Submission | CC BY 4.0

Keywords: reasoning, generalization, interpretability, cryptograms, permutation learning, substitution cipher

TL;DR: New model with interpretable bijective decoding head achieves SOTA performance and strong generalization on cryptogram deciphering, a novel reasoning task

Abstract: We present cryptogram solving as an ideal testbed for studying neural network reasoning and generalization; models must decrypt text encoded with substitution ciphers, choosing from 26! possible mappings without explicit access to the cipher. We develop ALICE (an Architecture for Learning Interpretable Cryptogram dEcipherment), a simple encoder-only Transformer that sets a new state-of-the-art for both accuracy and speed on this decryption problem. Surprisingly, ALICE generalizes to unseen ciphers after training on only ${\sim}1500$ unique ciphers, a minute fraction ($3.7 \times 10^{-24}$) of the possible cipher space. To enhance interpretability, we introduce a novel bijective decoding head that explicitly models permutations via the Gumbel-Sinkhorn method, enabling direct extraction of learned cipher mappings. Through early exit and probing experiments, we reveal how ALICE progressively refines its predictions in a way that appears to mirror common human strategies: early layers place greater emphasis on letter frequencies, while later layers form word-level structures. Our architectural innovations and analysis methods are applicable beyond cryptograms and offer new insights into neural network generalization and interpretability.
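
For readers unfamiliar with the two quantitative ideas in the abstract, here is a minimal pure-Python sketch. It is not the authors' implementation. It first checks the stated fraction of the cipher space (1500 out of 26!), then illustrates a generic Gumbel-Sinkhorn relaxation: Gumbel noise is added to a square score matrix, which is scaled by a temperature and normalized alternately over rows and columns until it is approximately doubly stochastic. The names `gumbel_sinkhorn`, `logsumexp`, `tau`, and `n_iters`, the 4-letter toy alphabet, and the row-wise argmax used to read off a mapping are all assumptions made for illustration.

```python
import math
import random

# Fraction of the 26! cipher space covered by roughly 1500 training ciphers.
fraction = 1500 / math.factorial(26)

def logsumexp(xs):
    """Numerically stable log(sum(exp(x))) for a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def gumbel_sinkhorn(logits, tau=1.0, n_iters=50, noise=True, rng=None):
    """Relax square `logits` into an approximately doubly stochastic matrix.

    Hypothetical helper for illustration only; hyperparameters are assumed.
    """
    rng = rng or random.Random(0)
    n = len(logits)
    log_p = []
    for row in logits:
        new_row = []
        for x in row:
            # Gumbel(0, 1) sample: -log(-log(u)) with u drawn from (0, 1).
            u = rng.uniform(1e-12, 1.0 - 1e-12)
            g = -math.log(-math.log(u)) if noise else 0.0
            new_row.append((x + g) / tau)
        log_p.append(new_row)
    for _ in range(n_iters):
        # Normalize rows in log space so each row sums to 1.
        log_p = [[x - logsumexp(row) for x in row] for row in log_p]
        # Normalize columns in log space so each column sums to 1.
        col = [logsumexp([log_p[i][j] for i in range(n)]) for j in range(n)]
        log_p = [[log_p[i][j] - col[j] for j in range(n)] for i in range(n)]
    return [[math.exp(x) for x in row] for row in log_p]

# Toy example: scores that favor the mapping 0->2, 1->0, 2->3, 3->1.
perm = [2, 0, 3, 1]
scores = [[5.0 if j == perm[i] else 0.0 for j in range(4)] for i in range(4)]
soft = gumbel_sinkhorn(scores, tau=0.5, noise=False)
# Read off a hard mapping by taking the largest entry in each row.
mapping = [max(range(4), key=lambda j: soft[i][j]) for i in range(4)]
```

In this toy setting the row-wise argmax recovers a valid permutation because the scores are sharply peaked. In general a row-wise argmax is not guaranteed to be a bijection, and an assignment solver would be the safer way to extract a hard mapping.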

Primary Area: interpretability and explainable AI

Submission Number: 4907
