Abstract: Distribution shifts between source and target domains pose significant challenges for machine learning, and different types of shift call for distinct interventions. After analyzing 7,650 distribution shift pairs across three real-world tabular datasets, we find that $Y|X$-shifts are more prevalent in tabular data, in contrast to image data, where $X$-shifts dominate. In this work, we conduct a comprehensive and systematic study of leveraging recent large language models (LLMs) to generate improved feature embeddings for backend neural network models. Specifically, we develop a large-scale testbed comprising these 7,650 distribution shift pairs, drawn from the ACS Income, ACS Mobility, and ACS Public Coverage datasets, under a standard training-validation-testing protocol. Through an extensive analysis of 20 models and learning strategies across more than 261,000 model configurations, we find that while LLM embeddings are inherently powerful, they do not consistently outperform state-of-the-art tree-ensemble methods. Interestingly, even a small number of target samples can have a significant impact under tabular $Y|X$-shifts. Additionally, we examine the influence of target sample size, fine-tuning strategies, and methods of integrating supplementary information.
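The distinction between $X$-shifts and $Y|X$-shifts that the abstract draws can be illustrated with a minimal synthetic sketch (not from the paper; the Gaussian covariates, threshold concept, and 30% flip rate are illustrative assumptions): under a pure covariate shift, a source-optimal rule can remain accurate, whereas a $Y|X$-shift lowers the accuracy ceiling of any source-trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Source domain: X ~ N(0, 1), clean concept Y = 1[X > 0].
x_src = rng.normal(0.0, 1.0, n)
y_src = (x_src > 0).astype(int)

# A classifier "trained" on the source: predict 1 iff x > 0.
def predict(x):
    return (x > 0).astype(int)

# X-shift target: covariates move (X ~ N(1, 1)), but P(Y|X) is unchanged.
x_cov = rng.normal(1.0, 1.0, n)
y_cov = (x_cov > 0).astype(int)

# Y|X-shift target: same covariate distribution, but 30% of labels
# flip given X, i.e. the conditional P(Y|X) itself changes.
x_con = rng.normal(0.0, 1.0, n)
flip = rng.random(n) < 0.3
y_clean = (x_con > 0).astype(int)
y_con = np.where(flip, 1 - y_clean, y_clean)

acc_x_shift = (predict(x_cov) == y_cov).mean()   # stays near 1.0
acc_yx_shift = (predict(x_con) == y_con).mean()  # drops toward 0.7
print(acc_x_shift, acc_yx_shift)
```

The gap suggests why the two shift types need different interventions: no amount of source data reveals the new labeling rule under a $Y|X$-shift, so even a few labeled target samples carry information that source training cannot supply.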
Keywords: distribution shift, tabular prediction, large language model embeddings
TL;DR: Through a thorough empirical study, we demonstrate that LLM embeddings are inherently powerful for handling $Y|X$-shifts on tabular data, and that even a few target samples can significantly improve target generalization performance.
Submission Number: 15