Informed Augmentation Selection Improves Tabular Contrastive Learning

Published: 13 Oct 2024, Last Modified: 02 Dec 2024, NeurIPS 2024 Workshop SSL, CC BY 4.0
Keywords: Self-supervised Learning, Contrastive Learning, Tabular Data, Deep Representation Learning
TL;DR: This study investigates how tabular data augmentations influence contrastive-learned feature spaces, and proposes a novel framework for systematically selecting and combining suitable augmentations.
Abstract: While contrastive learning (CL) has demonstrated success in image data, its application to tabular data remains relatively unexplored. The effectiveness of CL heavily depends on data augmentations, yet the suitability of tabular augmentation techniques for contrastive learning remains unclear. In this study, we assess the compatibility of various tabular augmentation techniques with CL by examining their impact on feature space characteristics (i.e., uniformity and alignment) which serve as proxies for downstream performance. Our investigation reveals that augmentations impact feature space quality, and that achieving a balance between uniformity and alignment is essential for good downstream performance. We then propose a novel framework for selecting augmentation combinations that strike this balance. Experimental results on 21 tabular datasets from the OpenML-CC18 benchmark and on the TCGA cancer genomics dataset consistently demonstrate the effectiveness of our proposed framework in enhancing downstream performance.
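The abstract measures feature-space quality through uniformity and alignment. As a hedged illustration (the paper's exact formulation is not shown here), these are commonly defined as the mean distance between positive-pair embeddings (alignment) and the log mean Gaussian potential over all embedding pairs (uniformity), computed on L2-normalized features:

```python
import numpy as np

def alignment(z1, z2, alpha=2):
    """Mean distance between positive-pair embeddings; lower means better aligned.

    z1, z2: L2-normalized embeddings of positive pairs, shape (N, D).
    """
    return np.mean(np.linalg.norm(z1 - z2, axis=1) ** alpha)

def uniformity(z, t=2):
    """Log of the mean pairwise Gaussian potential; lower means more uniform.

    z: L2-normalized embeddings, shape (N, D).
    """
    # Squared Euclidean distances between all distinct pairs.
    sq_dists = ((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1)
    i, j = np.triu_indices(len(z), k=1)
    return np.log(np.mean(np.exp(-t * sq_dists[i, j])))
```

Identical positive pairs give an alignment of 0, while spreading embeddings apart drives uniformity toward more negative values; a framework like the one proposed would trade these two quantities off when selecting augmentation combinations.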
Submission Number: 67