Unwrapping Circularity: Can Transformers Learn Languages with Circular Schemes?

Published: 22 Jun 2025, Last Modified: 22 Jun 2025 | ACL-SRW 2025 Poster | CC BY 4.0
Keywords: Transformer models, non-linear languages, constructed languages, language learnability, inductive biases.
TL;DR: This study explores GPT-2 Small's ability to learn artificially synthesized circular languages, revealing complex interactions between token order, local context, and language learnability.
Abstract: The success of Transformer-based language models in NLP has sparked debate about their ability to simulate human language learning. Chomsky contends that these models indiscriminately acquire both natural and "impossible" languages. While recent studies have challenged this claim, the capacity of Transformers to handle unconventional linguistic structures remains underexplored. Inspired by natural and speculative languages with circular structural properties, this study examines the ability of GPT-2 to learn languages featuring circular schemes. We synthesize such circular languages by mapping original sequences onto textual circles and then relinearize them using parametric, mathematically invertible procedures that "unwrap" the circles into linear sequences. We train GPT-2 models on these relinearized corpora and assess the impact of linearization parameters by tracking structural distortion and measuring perplexity. Interestingly, high levels of distortion relative to the original structures do not necessarily correspond to increased perplexity, suggesting that GPT-2 is relatively insensitive to global token order during language acquisition. Instead, preserving local context during linearization plays a more critical role in model learning. Further analysis using surprisal differences reveals that positional shifts pose greater challenges to the model than changes in stride or direction, underscoring the nuanced effects of linearization strategies. These findings offer new insights into the inductive biases of Transformer-based models in acquiring unconventional linguistic structures.
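To make the "unwrapping" procedure described above more concrete, the sketch below shows one way such a parametric, invertible relinearization of a circular token arrangement might be implemented. This is an illustrative assumption, not the authors' code: the function names (`unwrap_circle`, `rewrap_circle`) and the exact parameterization by start offset (positional shift), stride, and direction are hypothetical, chosen to mirror the linearization parameters mentioned in the abstract.

```python
# Illustrative sketch (not the authors' implementation): "unwrap" a token
# sequence arranged on a textual circle into a linear sequence, controlled by
# a start offset, a stride, and a direction. The mapping is invertible when
# the stride is coprime with the sequence length.
from math import gcd

def unwrap_circle(tokens, start=0, stride=1, direction=1):
    """Relinearize `tokens` placed on a circle.

    start:     index on the circle where unwrapping begins (positional shift)
    stride:    number of positions advanced per step along the circle
    direction: +1 for clockwise, -1 for counter-clockwise
    """
    n = len(tokens)
    if gcd(stride, n) != 1:
        raise ValueError("stride must be coprime with sequence length for invertibility")
    return [tokens[(start + direction * stride * i) % n] for i in range(n)]

def rewrap_circle(linearized, start=0, stride=1, direction=1):
    """Invert `unwrap_circle`, recovering the original token order."""
    n = len(linearized)
    restored = [None] * n
    for i, tok in enumerate(linearized):
        restored[(start + direction * stride * i) % n] = tok
    return restored

# Example: positional shift of 2, stride 3, read counter-clockwise.
sentence = "the cat sat on the warm mat".split()
circular = unwrap_circle(sentence, start=2, stride=3, direction=-1)
assert rewrap_circle(circular, start=2, stride=3, direction=-1) == sentence
```

Under this reading, varying `start`, `stride`, and `direction` would correspond to the positional shifts, stride changes, and direction changes whose effects on surprisal the abstract compares.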
Archival Status: Non‑archival
Paper Length: Long Paper (up to 8 pages of content)
Submission Number: 203