Representation Space Augmentation for Effective Self-Supervised Learning on Tabular Data

Published: 01 Jan 2025 · Last Modified: 21 Jul 2025 · AAAI 2025 · CC BY-SA 4.0
Abstract: Tabular data, though widely used across industries, remains underexplored in deep learning. Self-supervised learning (SSL) shows promise for pre-training deep neural networks (DNNs) on tabular data, but its potential is limited by the difficulty of designing suitable augmentations. Unlike images and text, where SSL can exploit inherent spatial or semantic structure, tabular data lacks such explicit structure. This makes traditional input-level augmentations, such as modifying or removing features, less effective, because it is hard to preserve critical information while still introducing variability. To address these challenges, we propose RaTab, a novel method that shifts augmentation from the input level to the representation level using matrix factorization, specifically truncated SVD. This approach preserves the essential structure of the data while generating diverse representations by applying dropout at various stages of the representation, thereby significantly improving SSL performance on tabular data.
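To make the core idea concrete, here is a minimal sketch (not the authors' implementation) of representation-level augmentation as the abstract describes it: a low-rank representation is obtained via truncated SVD, and two stochastic views are produced by applying dropout to that representation, as one would for a contrastive SSL objective. The rank `k`, dropout rate, and all function names are illustrative assumptions.

```python
import numpy as np

def truncated_svd(X, k):
    # Rank-k truncated SVD of the data matrix X (n samples x d features).
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * S[:k]  # low-rank representation, shape (n, k)

def dropout(Z, rate, rng):
    # Inverted dropout: randomly zero entries of Z and rescale the rest.
    mask = rng.random(Z.shape) >= rate
    return Z * mask / (1.0 - rate)

def make_views(X, k=16, rate=0.3, seed=0):
    # Two stochastic views of the same samples, both derived from the
    # shared truncated-SVD representation rather than from the raw inputs.
    rng = np.random.default_rng(seed)
    Z = truncated_svd(X, k)
    return dropout(Z, rate, rng), dropout(Z, rate, rng)

# Usage: 128 synthetic samples with 32 tabular features.
X = np.random.default_rng(1).normal(size=(128, 32))
view_a, view_b = make_views(X)  # paired views for an SSL loss
```

Because the factorization is shared and only the dropout masks differ, the two views retain the same low-rank structure of the data while varying stochastically, which is the trade-off the abstract says input-level augmentations struggle to achieve.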