Abstract: Self-supervised learning has drawn recent interest for learning generalizable, transferable, and robust representations from unlabeled tabular data. Unfortunately, unlike images and language, which carry exploitable spatial or semantic structure, tabular data lacks common structure and is highly diverse, making it difficult to design augmentations that are generically beneficial to downstream tasks. Moreover, most existing augmentation methods are domain-specific (such as rotation in vision, token masking in NLP, and edge dropping for graphs), making them less effective for real-world tabular data. This significantly limits tabular self-supervised learning and hinders progress in this domain. Aiming to fill this crucial gap, we propose STab, an augmentation-free self-supervised representation learning method based on stochastic regularization that does not rely on negative pairs, to capture the highly heterogeneous and non-structured information in tabular data. Our experiments show that STab achieves state-of-the-art performance compared to existing contrastive and pretext-task self-supervised methods.
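As one plausible reading of the abstract (the paper's actual architecture and loss are not specified here), the sketch below shows how stochastic regularization can replace augmentation: two forward passes of the same row through a dropout-regularized encoder yield two differing "views", which are aligned with a SimSiam-style negative-cosine loss using stop-gradient instead of negative pairs. All class and function names are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Hypothetical tabular encoder; dropout is the stochastic regularizer."""
    def __init__(self, in_dim: int, hid: int = 256, out_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hid), nn.ReLU(),
            nn.Dropout(p=0.3),               # randomness replaces data augmentation
            nn.Linear(hid, out_dim),
        )
    def forward(self, x):
        return self.net(x)

class Predictor(nn.Module):
    """Small asymmetric head; in SimSiam this helps avoid collapse."""
    def __init__(self, dim: int = 128, hid: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hid), nn.ReLU(), nn.Linear(hid, dim))
    def forward(self, z):
        return self.net(z)

def neg_cosine(p, z):
    # Stop-gradient on z: no negative pairs are needed for this objective.
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

encoder, predictor = Encoder(in_dim=32), Predictor()
encoder.train()                              # keep dropout active during pretraining
x = torch.randn(64, 32)                      # a batch of (standardized) tabular rows
z1, z2 = encoder(x), encoder(x)              # two dropout-perturbed views of the same rows
loss = 0.5 * neg_cosine(predictor(z1), z2) + 0.5 * neg_cosine(predictor(z2), z1)
loss.backward()
```

Because the two views come from the model's own stochasticity rather than hand-designed transforms, this recipe sidesteps the domain-specific augmentation problem the abstract describes.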