Keywords: LLMs, tabular prediction, in-context learning
TL;DR: We distill tree ensembles into concise textual representations for in-context learning in LLMs.
Abstract: Large Language Models (LLMs) are widely used in natural language processing and other real-world applications due to their ability to generalize across a broad range of tasks. However, they often underperform on tabular prediction problems, where traditional machine learning methods such as gradient boosting remain the state of the art \cite{shwartz2022tabular}. In this paper, we introduce \textsc{TreePrompt}, a framework that aims to bridge this gap. \textsc{TreePrompt} distills a tree ensemble into a concise textual representation and uses this representation for in-context learning in LLMs. This allows an LLM to effectively incorporate structured tabular information from the tree ensemble without any expensive model re-fitting or fine-tuning. Across several benchmark datasets, we show that \textsc{TreePrompt} consistently improves LLM performance on tabular prediction tasks and outperforms other in-context learning strategies under a fixed token budget.
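To make the high-level pipeline in the abstract concrete, the following is a minimal sketch of the general idea (fit a small tree ensemble, serialize a few of its trees as textual rules, and prepend them to a prompt), not the authors' actual \textsc{TreePrompt} implementation; the dataset choice, the helper names `distill_to_text` and `build_prompt`, and the truncation policy are illustrative assumptions.

```python
# Sketch: distill a small gradient-boosted ensemble into text rules and
# build an in-context prompt from them. Assumptions: helper names and the
# prompt wording are hypothetical, not taken from the paper.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.tree import export_text

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
feature_names = list(X.columns)

# A small ensemble with shallow trees keeps the textual distillation concise,
# which matters under a fixed token budget.
ensemble = GradientBoostingClassifier(n_estimators=8, max_depth=2, random_state=0)
ensemble.fit(X, y)

def distill_to_text(model, feature_names, max_trees=4):
    """Serialize the first few trees as human-readable if/else rules."""
    blocks = []
    for i, tree in enumerate(model.estimators_[:max_trees, 0]):
        rules = export_text(tree, feature_names=feature_names, decimals=2)
        blocks.append(f"Tree {i}:\n{rules}")
    return "\n".join(blocks)

def build_prompt(tree_text, row, feature_names):
    """Compose an in-context prompt: distilled tree rules plus the query row."""
    row_desc = ", ".join(f"{name}={val:.2f}" for name, val in zip(feature_names, row))
    return (
        "The following decision rules were distilled from a tree ensemble "
        "trained on this tabular task:\n"
        f"{tree_text}\n\n"
        "Classify the following example (0 = malignant, 1 = benign):\n"
        f"{row_desc}\nAnswer:"
    )

prompt = build_prompt(distill_to_text(ensemble, feature_names),
                      X.iloc[0].values, feature_names)
print(prompt)  # this string would be sent to an LLM for in-context prediction
```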
Submission Number: 23