LLATAS: Large LAnguage models as Tabular Auxiliary feature Synthesizer

Published: 05 Mar 2026, Last Modified: 25 Apr 2026
Venue: ICLR 2026 Workshop LLM Reasoning
License: CC BY 4.0
Track: tiny / short paper (up to 4 pages)
Keywords: Auxiliary Feature Generation, LLM Reasoning, Tabular Learning
Abstract: While classical models such as gradient boosting remain state-of-the-art for tabular data, their performance is often bottlenecked by the limitations of heuristic feature engineering. To address this, we introduce LLATAS, a framework that leverages Large Language Models (LLMs) to synthesize semantic reasoning traces as auxiliary features. Following the Learning Using Privileged Information (LUPI) paradigm, we use these generated signals to train a teacher model, which then guides a lightweight student model that operates solely on the original inputs. This distillation process allows the student to inherit complex reasoning capabilities without incurring the computational cost of LLMs at inference. Empirical evaluations on disease prediction tasks demonstrate that LLATAS significantly outperforms baselines, reducing test error rates by 17.6% for XGBoost and 22.0% for MLP models.
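The teacher-student flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain logistic regression in place of XGBoost or an MLP, a synthetic numeric column (`x_aux`) as a stand-in for encoded LLM reasoning traces, and a hypothetical `imitation` weight that blends hard labels with the teacher's soft predictions. All function and variable names are illustrative.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, row):
    # weights[0] is the bias; the rest align with the feature columns.
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], row)))


def fit_logreg(rows, targets, lr=0.5, epochs=300):
    # Batch gradient descent on cross-entropy; targets may be soft (in [0, 1]).
    weights = [0.0] * (len(rows[0]) + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for row, t in zip(rows, targets):
            err = predict_proba(weights, row) - t
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


def distill(x_orig, x_aux, labels, imitation=0.5):
    # 1) Teacher sees original + auxiliary (privileged) features.
    teacher_rows = [xo + xa for xo, xa in zip(x_orig, x_aux)]
    teacher = fit_logreg(teacher_rows, labels)
    soft = [predict_proba(teacher, r) for r in teacher_rows]
    # 2) Student sees original features only and fits a blend of
    #    hard labels and the teacher's soft predictions.
    blended = [(1 - imitation) * y + imitation * s for y, s in zip(labels, soft)]
    student = fit_logreg(x_orig, blended)
    return teacher, student


random.seed(0)
x_orig = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
# Assumption: a synthetic numeric feature standing in for an encoded LLM trace.
x_aux = [[r[0] + r[1] + random.gauss(0, 0.1)] for r in x_orig]
labels = [1.0 if a[0] > 0 else 0.0 for a in x_aux]

teacher, student = distill(x_orig, x_aux, labels)
# Inference uses only the original features; no auxiliary signal is needed.
student_probs = [predict_proba(student, r) for r in x_orig]
```

The point of the sketch is the asymmetry between training and inference: `x_aux` is consumed only when fitting the teacher, so the student's prediction path never touches it.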
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Funding: Yes, the presenting author of this submission falls under ICLR’s funding aims, and funding would significantly impact their ability to attend the workshop in person.
Submission Number: 107