SeBA: Semi-supervised few-shot learning via Separated-at-Birth Alignment for tabular data

Published: 18 Nov 2025, Last Modified: 18 Nov 2025
Venue: AITD@EurIPS 2025 Poster
License: CC BY 4.0
Submission Type: Short paper (4 pages)
Keywords: Self-supervised learning, tabular data, few-shot learning
TL;DR: A novel augmentation-free joint-embedding self-supervised pretraining algorithm for tabular data.
Abstract: Learning from scarce labeled data alongside a larger pool of unlabeled samples, known as semi-supervised few-shot learning (SS-FSL), remains critical for applications involving tabular data in domains such as medicine, finance, and science. Existing SS-FSL methods often rely on self-supervised learning (SSL) frameworks developed for vision or language, which assume that natural data augmentations are available. For tabular data, defining meaningful augmentations is non-trivial and can easily distort semantics, limiting the effectiveness of conventional SSL. In this work, we rethink SSL for tabular data and propose Separated-at-Birth Alignment (SeBA), a joint-embedding framework for SS-FSL that eliminates the dependence on augmentations. Our core idea is to separate the data into two independent but complementary views and align the representations of one view to mirror the nearest-neighbor correspondence of the data in the second view. A type-aware separation scheme ensures robust handling of mixed categorical and numerical attributes, while a lightweight architecture with ensemble aggregation improves generalization and reduces sensitivity to suboptimal hyperparameter choices. An experimental study conducted on various benchmark datasets demonstrates that SeBA often achieves state-of-the-art performance in tabular SS-FSL, opening a new avenue for the SSL paradigm in the domain of tabular data.
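The core idea described in the abstract can be illustrated with a minimal NumPy sketch. Everything here is an assumption for illustration: the column split stands in for the paper's type-aware separation scheme, the random linear projections stand in for the lightweight encoders, and the loss is one plausible reading of "align the representations of one view to mirror the nearest-neighbor correspondence of the data in the second view"; the actual SeBA objective and architecture are not specified in this abstract.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy tabular batch: 8 samples, 6 numeric features (hypothetical data).
X = rng.normal(size=(8, 6))

# "Separated at birth": split the columns into two complementary views
# (a stand-in for the paper's type-aware separation of categorical and
# numerical attributes).
view_a, view_b = X[:, :3], X[:, 3:]

# Lightweight encoders: fixed random linear projections (illustrative only).
W_a = rng.normal(size=(3, 4))
W_b = rng.normal(size=(3, 4))
Z_a, Z_b = view_a @ W_a, view_b @ W_b

def nearest_neighbors(Z):
    """Index of each row's nearest other row by Euclidean distance."""
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude self-matches
    return d.argmin(axis=1)

# Neighbor correspondence computed in view B's embedding space.
nn_b = nearest_neighbors(Z_b)

# Augmentation-free alignment loss: pull each view-A embedding toward the
# view-A embedding of its view-B nearest neighbor, so that view A mirrors
# the neighborhood structure discovered in view B.
loss = float(np.mean(np.sum((Z_a - Z_a[nn_b]) ** 2, axis=1)))
```

In a real training loop the encoders would be learned and the roles of the two views could be made symmetric, but the sketch captures why no hand-crafted augmentation is needed: the second view itself supplies the positive pairs.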
Submission Number: 39