Mission Accomplished? Recovering Information from ‘Impossible’ Languages with LLMs

ACL ARR 2026 January Submission 749 Authors

24 Dec 2025 (modified: 20 Mar 2026) · ACL ARR 2026 January Submission · CC BY 4.0
Keywords: Large Language Models, Information Locality, Impossible Languages, Cognitive Modeling
Abstract: Whether large language models operate under constraints comparable to those of human linguistic cognition remains a central question in AI and cognitive science. Prior work has asked whether LLMs can learn linguistically possible and impossible languages, but it is less clear whether they can recover linguistic structure and meaning from systematically degraded input. In this work, we investigate whether LLMs can translate impossible languages back into possible forms, and whether different types of impossible language differ in how recoverable they are. By fine-tuning GPT-2 on several perturbation types, we find that models can reconstruct grammatically well-formed output, with performance systematically modulated by the nature of the perturbation. Models trained on longer sentences benefit from richer training contexts, although longer sequences also make non-local dependencies harder to resolve. Overall, our findings indicate that LLMs prefer local over distant dependencies, yet can still overcome structural violations that render input unintelligible, revealing a partial alignment between neural architectural constraints and human linguistic biases.
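The abstract describes the setup but not the recipe. Below is a minimal, hypothetical sketch of the recovery task it outlines, assuming HuggingFace `transformers`, the public `gpt2` checkpoint, a toy word-reversal perturbation (`reverse_perturb`), and a `=>` separator format; none of these details are specified by the authors.

```python
# A minimal sketch (not the authors' code): build (perturbed -> original)
# pairs with a hypothetical word-reversal perturbation, then fine-tune
# GPT-2 to map "impossible" input back into well-formed English.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def reverse_perturb(sentence: str) -> str:
    """One illustrative perturbation type: fully reverse the word order."""
    return " ".join(reversed(sentence.split()))

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Toy parallel data; the paper's corpus and exact formatting are assumptions.
originals = ["the cat sat on the mat", "she gave him the book"]
pairs = [f"{reverse_perturb(s)} => {s}{tokenizer.eos_token}" for s in originals]

batch = tokenizer(pairs, return_tensors="pt", padding=True)
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100  # ignore pad positions in the loss

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for _ in range(3):  # a few toy epochs for illustration only
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Recovery at test time: prompt with a perturbed sentence plus the separator.
model.eval()
prompt = tokenizer(reverse_perturb("the dog chased the ball") + " =>",
                   return_tensors="pt")
out = model.generate(**prompt, max_new_tokens=12,
                     pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

On real data, recovery accuracy would then be compared across perturbation types and sentence lengths, as the abstract's claims about locality suggest.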
Paper Type: Long
Research Area: Linguistic theories, Cognitive Modeling and Psycholinguistics
Research Area Keywords: linguistic theories, cognitive modeling, language modeling, linguistic universals, inductive biases
Contribution Types: Model analysis & interpretability, Data analysis, Theory
Languages Studied: English
Submission Number: 749