ALICE: An Inherently Interpretable Transformer for Cryptogram Solving

ACL ARR 2026 January Submission 5342 Authors

05 Jan 2026 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: explainability, reasoning, generalization, interpretability, cryptograms, permutation learning, substitution cipher, architecture
Abstract: While neural networks can solve complex symbolic puzzles, their internal decision-making on such tasks is often opaque. To address the lack of models that are interpretable by design, we propose cryptogram solving as a testbed for studying neural network reasoning and explainability. We introduce ALICE (an Architecture for Learning Interpretable Cryptogram dEcipherment), an encoder-only Transformer designed to solve cryptograms with both high accuracy (~99%) and explainability. To make the model inherently interpretable and to enforce the bijectivity constraint of the task without post-hoc approximations, we incorporate a novel bijective decoding head via the Gumbel-Sinkhorn method. Furthermore, we employ early-exit and probing experiments---the first of their kind for this task---to deconstruct ALICE's internal decision-making process, revealing that ALICE progressively refines its predictions in a way that appears to mirror common human strategies, with early layers placing greater emphasis on letter frequencies and later layers forming word-level structures.
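The bijective decoding head mentioned in the abstract relies on the Gumbel-Sinkhorn relaxation, which turns an unconstrained score matrix into an (approximately) doubly stochastic matrix that approaches a permutation as the temperature goes to zero. The sketch below illustrates that general technique only; the 26x26 alphabet size, temperature, iteration count, and variable names are illustrative assumptions, not the paper's actual settings or code.

```python
import torch

def sinkhorn(log_alpha: torch.Tensor, n_iters: int = 20) -> torch.Tensor:
    """Sinkhorn normalization in log space: alternately normalize rows and
    columns so that exp(log_alpha) approaches a doubly stochastic matrix."""
    for _ in range(n_iters):
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-1, keepdim=True)  # rows
        log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=-2, keepdim=True)  # columns
    return log_alpha.exp()

def gumbel_sinkhorn(scores: torch.Tensor, tau: float = 0.5, n_iters: int = 20) -> torch.Tensor:
    """Perturb the letter-to-letter score matrix with Gumbel noise and apply
    Sinkhorn normalization; as tau -> 0 the result approaches a hard
    permutation matrix, i.e. a bijective substitution key."""
    gumbel = -torch.log(-torch.log(torch.rand_like(scores) + 1e-20) + 1e-20)
    return sinkhorn((scores + gumbel) / tau, n_iters)

# Hypothetical usage: scores would come from the model's decoding head.
scores = torch.randn(26, 26)               # cipher-letter x plaintext-letter scores
soft_key = gumbel_sinkhorn(scores)          # soft, approximately bijective mapping
hard_key = soft_key.argmax(dim=-1)          # per-cipher-letter plaintext guess
```

A soft key of this form can be trained end-to-end with ordinary gradient descent, which is what makes it attractive for enforcing bijectivity without post-hoc correction.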
Paper Type: Long
Research Area: Special Theme (conference specific)
Research Area Keywords: Explainability of NLP Models, probing, hierarchical & concept explanations
Contribution Types: Model analysis & interpretability, NLP engineering experiment
Languages Studied: English, French, German, Italian, Latin, Portuguese, Spanish
Submission Number: 5342