Keywords: LLMs, tabular prediction, in-context learning
TL;DR: We distill tree ensembles into concise textual representations for in-context learning in LLMs.
Abstract: Large Language Models (LLMs) are widely used in natural language processing and other real-world applications due to their ability to generalize across a broad range of tasks. However, they often underperform on tabular prediction problems, where traditional machine learning methods such as gradient boosting remain the state of the art \cite{shwartz2022tabular}. In this paper, we introduce \textsc{TreePrompt}, a framework that aims to bridge this gap. \textsc{TreePrompt} distills a tree ensemble into a concise textual representation and uses this representation for in-context learning in LLMs. This allows an LLM to effectively incorporate structured tabular information from the tree ensemble without any expensive model re-fitting or fine-tuning. Across several benchmark datasets, we show that \textsc{TreePrompt} consistently improves LLM performance on tabular prediction tasks and outperforms other in-context learning strategies under a fixed token budget.
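To make the high-level pipeline in the abstract concrete, the following is a minimal sketch of the general idea (fit a small tree ensemble, serialize a few of its trees as textual rules, and prepend them to a prompt), not the authors' actual \textsc{TreePrompt} implementation; the dataset choice, the helper names `distill_to_text` and `build_prompt`, and the truncation policy are illustrative assumptions.

```python
# Sketch: distill a small gradient-boosted ensemble into text rules and
# build an in-context prompt from them. Assumptions: helper names and the
# prompt wording are hypothetical, not taken from the paper.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
feature_names = list(X.columns)

# A small ensemble with shallow trees keeps the textual distillation concise,
# which matters under a fixed token budget.
ensemble = GradientBoostingClassifier(n_estimators=8, max_depth=2, random_state=0)
ensemble.fit(X, y)

def distill_to_text(model, feature_names, max_trees=4):
    """Serialize the first few trees as human-readable if/else rules."""
    blocks = []
    for i, tree in enumerate(model.estimators_[:max_trees, 0]):
        rules = export_text(tree, feature_names=feature_names, decimals=2)
        blocks.append(f"Tree {i}:\n{rules}")
    return "\n".join(blocks)

def build_prompt(tree_text, row, feature_names):
    """Compose an in-context prompt: distilled tree rules plus the query row."""
    row_desc = ", ".join(f"{name}={val:.2f}" for name, val in zip(feature_names, row))
    return (
        "The following decision rules were distilled from a tree ensemble "
        "trained on this tabular task:\n"
        f"{tree_text}\n\n"
        "Classify the following example (0 = malignant, 1 = benign):\n"
        f"{row_desc}\nAnswer:"
    )

prompt = build_prompt(distill_to_text(ensemble, feature_names),
                      X.iloc[0].values, feature_names)
print(prompt)  # this string would be sent to an LLM for in-context prediction
```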
Submission Number: 23