Keywords: llm, continuity, spatiotemporal transformers, linguistics
TL;DR: LLMs implicitly behave like continuous models, even though they are trained on discrete token sequences.
Abstract: Language is typically modelled with discrete sequences. However, the most successful approaches to language modelling, namely neural networks, are continuous and smooth function approximators.
In this work, we show that Transformer-based language models implicitly learn to represent sentences as continuous-time functions defined over a continuous input space.
This phenomenon occurs in most state-of-the-art Large Language Models (LLMs), including Llama2, Llama3, Phi3, Gemma, Gemma2, and Mistral, and suggests that LLMs reason about language in ways that fundamentally differ from how humans do.
Our work formally extends Transformers to capture temporal and spatial continuity in both the input and output spaces.
Our results challenge the traditional interpretation of how LLMs understand language, with several linguistic and engineering implications.
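To give intuition for continuity in the input space, consider that standard sinusoidal positional encodings are smooth functions of position and can be evaluated at non-integer "times" between token slots. The minimal sketch below (a hypothetical illustration, not the paper's construction) evaluates the standard sinusoidal encoding at fractional positions:

```python
# Minimal sketch: sinusoidal positional encodings are smooth in position,
# so a Transformer's input space can in principle be queried at fractional
# positions. This illustrates the general idea only; it is not the method
# proposed in the submission.
import numpy as np

def sinusoidal_encoding(position: float, d_model: int = 8) -> np.ndarray:
    """Evaluate the standard sinusoidal positional encoding at a
    (possibly non-integer) position."""
    i = np.arange(d_model // 2)
    freqs = 1.0 / (10000.0 ** (2 * i / d_model))  # per-dimension frequencies
    angles = position * freqs
    enc = np.empty(d_model)
    enc[0::2] = np.sin(angles)  # even dimensions
    enc[1::2] = np.cos(angles)  # odd dimensions
    return enc

# Positions 2, 2.5, and 3: the encoding varies smoothly between token slots.
for t in (2.0, 2.5, 3.0):
    print(t, np.round(sinusoidal_encoding(t), 3))
```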
Primary Area: foundation or frontier models, including LLMs
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 11994