Self-supervision for Tabular Data by Learning to Predict Additive Homoskedastic Gaussian Noise as Pretext

27 Aug 2023 · OpenReview Archive Direct Upload · Readers: Everyone
Abstract: The lack of scalability in data annotation creates a need to reduce dependency on labels. Self-supervision offers a solution by letting the data provide their own supervisory signal. However, it has received relatively little attention for tabular data, which drives a large proportion of business and application domains. This work, which we name the Statistical Self-Supervisor (SSS), proposes a method for self-supervision on tabular data that defines a continuous perturbation as the pretext task. It enables a neural network to learn representations by learning to predict the level of additive isotropic Gaussian noise added to its inputs. The choice of pretext transformation is motivated by two intrinsic characteristics: a neural network fundamentally performs linear fits under the widely adopted assumption of Gaussian fitting error, and small random perturbations preserve the locality of a data example on the data manifold. The transform condenses information in the generated representations, making them more useful for subsequent task-specific prediction, as evidenced by the improved performance of the downstream classifier. To evaluate how well performance persists under low-annotation settings, SSS is evaluated across different levels of label availability to the downstream classifier (1% to 100%) and benchmarked against self- and semi-supervised methods. At the most label-constrained setting (1%), we report an increase of at least 2.5% over the next-best competing semi-supervised method, and an increase of more than 1.5% over the self-supervised state of the art. Ablation studies also reveal that increasing label availability from 0% to 1% yields a maximum increase of up to 50% on any of the five performance metrics, and of up to 15% thereafter, indicating diminishing returns from additional annotation.
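
The following is a minimal sketch of the noise-level-prediction pretext described in the abstract, assuming a PyTorch setup. The module names (Encoder, NoiseLevelHead), layer sizes, and the max_sigma range are illustrative assumptions, not the authors' implementation.

```python
# Sketch of the SSS-style pretext: corrupt tabular inputs with additive
# homoskedastic Gaussian noise of a randomly drawn level, and train the
# network to regress that level. All names and hyperparameters are assumptions.
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Maps a tabular feature vector to a latent representation."""

    def __init__(self, n_features: int, d_latent: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(),
            nn.Linear(128, d_latent), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)


class NoiseLevelHead(nn.Module):
    """Regresses the standard deviation of the noise added to the input."""

    def __init__(self, d_latent: int = 64):
        super().__init__()
        self.out = nn.Linear(d_latent, 1)

    def forward(self, z):
        return self.out(z).squeeze(-1)


def pretrain_step(encoder, head, x, optimizer, max_sigma: float = 0.5):
    """One self-supervised step on a batch x of shape (batch, n_features)."""
    # Draw one noise level per example; the same level is applied to every
    # feature of that example (homoskedastic across features).
    sigma = torch.rand(x.size(0), device=x.device) * max_sigma
    x_noisy = x + sigma.unsqueeze(1) * torch.randn_like(x)

    # Predict the noise level from the corrupted input.
    pred_sigma = head(encoder(x_noisy))
    loss = nn.functional.mse_loss(pred_sigma, sigma)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this reading, the encoder pretrained on the pretext would then feed a downstream classifier trained on whatever fraction of labels is available (1% to 100% in the paper's evaluation).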