LLM Embeddings Improve Test-Time Adaptation to Tabular $Y|X$-Shifts

27 Sept 2024 (modified: 05 Feb 2025) · Submitted to ICLR 2025 · CC BY 4.0
Keywords: LLM embeddings, distribution shifts, tabular data
TL;DR: Through a thorough empirical study, we demonstrate that LLM embeddings are inherently powerful for dealing with $Y|X$-shifts on tabular data, and that even a few target samples have a significant impact on target generalization performance.
Abstract: For tabular datasets, changes in the relationship between the label and covariates ($Y|X$-shifts) are common due to missing variables. Since it is impossible to generalize to a completely new and unknown domain, we study models that are easy to adapt to the target domain even with few labeled examples. We focus on building more informative representations of tabular data that can mitigate $Y|X$-shifts, and propose to leverage the prior world knowledge in LLMs by serializing the tabular data and encoding it. We find that LLM embeddings alone provide inconsistent improvements in robustness, but models trained on them can be adapted well to the target domain even with only 32 labeled observations. Our findings are based on a systematic study of 7650 source-target pairs and a benchmark against 261,000 model configurations trained with 20 algorithms. Our observations hold when varying the amount of accessible target data and across different adaptation strategies.
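The general recipe the abstract describes (serialize each tabular row into text, embed it with a pretrained language model, then adapt a lightweight head on a handful of labeled target rows) can be sketched as below. The serialization template, the `all-MiniLM-L6-v2` encoder, and the logistic-regression head are illustrative assumptions for this sketch, not the paper's exact pipeline.

```python
# Sketch: serialize tabular rows -> embed with a text encoder -> train a head
# on the source domain -> re-fit the head on a few labeled target rows.
# NOTE: encoder choice, template, and head are assumptions, not the paper's setup.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

def serialize_row(row: dict) -> str:
    # e.g. {"age": 42, "education": "Bachelors"} -> "age is 42. education is Bachelors."
    return " ".join(f"{k.replace('_', ' ')} is {v}." for k, v in row.items())

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any pretrained text embedding model

def embed(rows):
    return encoder.encode([serialize_row(r) for r in rows])

# Toy source-domain data (hypothetical binary income task).
source_rows = [{"age": 42, "education": "Bachelors", "hours_per_week": 50},
               {"age": 23, "education": "HS-grad", "hours_per_week": 20}]
source_labels = [1, 0]
head = LogisticRegression(max_iter=1000).fit(embed(source_rows), source_labels)

# Test-time adaptation: re-fit the head on a few (e.g. 32) labeled target rows.
target_rows = [{"age": 35, "education": "Masters", "hours_per_week": 60},
               {"age": 19, "education": "HS-grad", "hours_per_week": 10}]
target_labels = [1, 0]
head_adapted = LogisticRegression(max_iter=1000).fit(embed(target_rows), target_labels)
```

Keeping the encoder frozen and only re-fitting the head is what makes adaptation with very few target labels feasible in this kind of setup.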
Primary Area: alignment, fairness, safety, privacy, and societal considerations
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 10485