When can isotropy help adapt LLMs' next word prediction to numerical domains?

ICLR 2025 Conference Submission 12874 Authors

28 Sept 2024 (modified: 13 Oct 2024) · ICLR 2025 Conference Submission · CC BY 4.0
Keywords: Contextual embedding space, clusters, isotropy, language model representation, numeric downstream task
Abstract: Recent studies have shown that vector embeddings learned by pre-trained large language models (LLMs) are effective in various downstream tasks in numerical domains. Despite these benefits, the tendency of LLMs to hallucinate in such domains can have severe consequences in applications such as finance, energy, retail, climate science, wireless networks, and synthetic tabular data generation. Ensuring prediction reliability and accuracy in numerical domains therefore requires performance guarantees grounded in explainability. However, there is little theoretical understanding of when pre-trained language models help solve numeric downstream tasks. This paper seeks to bridge this gap by characterizing, through the lens of isotropy, when the next-word prediction capability of LLMs can be adapted to numerical domains. Specifically, we first describe a general numeric data generation process that captures the core characteristics of numeric data across various numerical domains. We then consider a log-linear model for LLMs in which numeric data are predicted from their context through a network with a softmax last layer. We demonstrate that, to achieve state-of-the-art performance in numerical domains, the hidden representations of the LLM embeddings must possess a structure that accounts for the shift-invariance of the softmax function. We show how the isotropic property of LLM embeddings preserves the underlying structure of the representations, thereby resolving the shift-invariance problem of the softmax function. In other words, isotropy allows numeric downstream tasks to effectively leverage pre-trained representations, thus providing performance guarantees in the numerical domain. Experiments show that different characteristics of numeric data can have different impacts on isotropy.
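For intuition about the two properties the abstract leans on, here is a minimal sketch (not from the submission, which is anonymous and has no public code): it demonstrates numerically that softmax is invariant to a constant shift of its logits, and it computes the partition-function isotropy score of Mu & Viswanath (2018), one common proxy for the isotropy of an embedding matrix; the paper's own measure may differ.

```python
# Hypothetical illustration, assuming NumPy; not the paper's implementation.
import numpy as np

def softmax(z):
    # Subtracting max(z) is the standard stability trick; it works precisely
    # because softmax(z + c) == softmax(z) for any scalar c (shift-invariance).
    e = np.exp(z - z.max())
    return e / e.sum()

z = np.array([1.0, 2.0, 3.0])
assert np.allclose(softmax(z), softmax(z + 5.0))  # shift-invariance of softmax

def isotropy_score(E):
    """Partition-function isotropy proxy (Mu & Viswanath, 2018):
    ratio min_u Z(u) / max_u Z(u), with Z(u) = sum_i exp(u . e_i) and u
    ranging over the eigenvectors of E^T E. Scores near 1 indicate the
    embedding cloud is close to isotropic."""
    _, V = np.linalg.eigh(E.T @ E)      # columns of V are eigen-directions u
    Z = np.exp(E @ V).sum(axis=0)       # Z(u) for each direction
    return Z.min() / Z.max()

rng = np.random.default_rng(0)
E_iso = rng.normal(size=(1000, 16))     # roughly isotropic point cloud
E_aniso = E_iso + 3.0                   # shared mean direction -> anisotropic
print(isotropy_score(E_iso))            # close to 1
print(isotropy_score(E_aniso))          # close to 0
```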
Primary Area: learning on time series and dynamical systems
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 12874