[Short] Few-Shot Cross-Table Data Mixture in Tabular In-Context Learning: Benefits, Failure Modes, and Alignment
Keywords: Tabular foundation models
Abstract: Tabular foundation models show promise for structured data prediction, but unlike text and images, tabular datasets exhibit heterogeneous schemas and label semantics. This raises a critical question: does mixing tables during few-shot training improve in-context learning (ICL)? We systematically investigate cross-table training under controlled few-shot protocols, comparing single-table training against training augmented with auxiliary datasets. We identify severe negative transfer under naive mixing and propose two alignment strategies: feature-level matching via optimal transport (OT) and label semantics alignment via pseudo-labeling. Our key finding reveals an architectural divide: TabPFN-v2 and MITRA fail to benefit from cross-table augmentation, while representation-based models (TabICL) achieve a +1.02% average improvement. This indicates that cross-table learning requires learned embedding spaces in which semantic correspondences can be preserved across heterogeneous schemas.
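To make the feature-level OT alignment concrete, below is a minimal, hypothetical sketch of one plausible instantiation: columns of an auxiliary table are matched to columns of the target table by solving optimal transport over per-column summary statistics, then mapped barycentrically into the target schema. The helper names (`column_profile`, `match_columns`), the choice of statistics, and the use of the POT library are all illustrative assumptions, not the paper's actual method.

```python
# Hypothetical sketch of feature-level matching via optimal transport (OT).
# Assumption: columns are matched by OT over per-column summary statistics;
# the submission's actual alignment procedure may differ.
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

def column_profile(X: np.ndarray) -> np.ndarray:
    """Summarize each column by simple distributional statistics."""
    qs = np.quantile(X, [0.1, 0.25, 0.5, 0.75, 0.9], axis=0).T  # (d, 5)
    return np.hstack([X.mean(0, keepdims=True).T,   # per-column mean (d, 1)
                      X.std(0, keepdims=True).T,    # per-column std  (d, 1)
                      qs])                          # -> profile matrix (d, 7)

def match_columns(X_target: np.ndarray, X_aux: np.ndarray) -> np.ndarray:
    """Return an OT plan matching auxiliary columns to target columns."""
    P_t, P_a = column_profile(X_target), column_profile(X_aux)
    M = ot.dist(P_a, P_t)                  # pairwise squared-Euclidean cost
    a = np.full(P_a.shape[0], 1.0 / P_a.shape[0])  # uniform column masses
    b = np.full(P_t.shape[0], 1.0 / P_t.shape[0])
    return ot.emd(a, b, M)                 # exact OT plan, shape (d_a, d_t)

# Usage: project auxiliary rows into the target schema via the plan.
rng = np.random.default_rng(0)
X_t, X_a = rng.normal(size=(64, 5)), rng.normal(size=(128, 8))
G = match_columns(X_t, X_a)                        # (8, 5) transport plan
X_a_aligned = X_a @ (G / G.sum(0, keepdims=True))  # barycentric column map
```

Each aligned auxiliary column is a convex combination of the original columns weighted by the transport plan, so the augmented few-shot context shares the target table's dimensionality.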
Submission Number: 118