Learned Mixing Weights for Transferable Tabular Data Augmentation

Tal Shaharabany; Lior Wolf

22 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission

Keywords: Tabular data, mix augmentation, data augmentation

Abstract: We present an architecture-agnostic method for tabular data augmentation that mixes pairs of samples from the training set. The mixing procedure is based on a set of per-feature weights assigned by a learned network $g$, which is separate from the primary classification model $f$. Each feature of the mixed sample is taken at random from one of the two samples, and the sum of the weights that $g$ assigns to the features taken from each sample determines the mixing of the target label. $g$ itself is trained with two loss terms: one encourages variability in the weights assigned to different features, and the other encourages the model $f$ to be agnostic, for every training sample, to the features to which $g$ assigns low weights. Our experiments show that this learned data augmentation method improves multiple neural architectures designed for tabular data. More notably, the network $g$ trained alongside an MLP produces mixed samples that also improve non-differentiable methods, including classical methods and gradient-boosting decision tree methods. This is done without any further tuning and with the default parameters of the classifiers. The result achieved this way with the cutting-edge CatBoost method now represents the state of the art.
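The mixing step described in the abstract can be illustrated with a small sketch. The abstract does not say how the label coefficient is normalized or whether $g$'s weights depend on the input sample, so this sketch makes several loud assumptions. It assumes $g$ maps each sample to positive per-feature weights that sum to 1 (a softmax). It draws each feature from the first sample with probability `p`. It then normalizes the weight mass of the features contributed by each sample into a label coefficient. The names `learned_mix`, `weight_net`, and `p` are hypothetical and are not the authors' code. Training $g$ with its two loss terms is not sketched here.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Turn raw per-feature scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def learned_mix(
    x_a: Sequence[float], y_a: Sequence[float],
    x_b: Sequence[float], y_b: Sequence[float],
    weight_net: Callable[[Sequence[float]], Vector],
    rng: random.Random,
    p: float = 0.5,
) -> Tuple[Vector, Vector]:
    """Mix two samples feature-wise; the label mix follows the learned weights.

    weight_net is a hypothetical stand-in for the network g: it is assumed to
    map a sample to positive per-feature weights that sum to 1.
    y_a and y_b are label vectors (e.g. one-hot class targets).
    """
    w_a = weight_net(x_a)
    w_b = weight_net(x_b)
    # Randomly decide, per feature, whether it is copied from sample a or b.
    take_a = [rng.random() < p for _ in x_a]
    x_mix = [xa if t else xb for xa, xb, t in zip(x_a, x_b, take_a)]
    # Weight mass of the features that each sample contributed.
    mass_a = sum(w for w, t in zip(w_a, take_a) if t)
    mass_b = sum(w for w, t in zip(w_b, take_a) if not t)
    # Assumed normalization: the two shares sum to 1. Positive weights make
    # the denominator nonzero whenever there is at least one feature.
    lam = mass_a / (mass_a + mass_b)
    y_mix = [lam * ya + (1.0 - lam) * yb for ya, yb in zip(y_a, y_b)]
    return x_mix, y_mix


if __name__ == "__main__":
    # A toy stand-in for a trained g: larger |value| gets a larger weight.
    toy_g = lambda x: softmax([abs(v) for v in x])
    x_mix, y_mix = learned_mix(
        [1.0, 0.2, 3.0], [1.0, 0.0],
        [0.5, 2.0, 0.1], [0.0, 1.0],
        toy_g, random.Random(0),
    )
    print(x_mix, y_mix)
```

Because the label coefficient is the share of weight mass rather than the share of copied features, copying a single feature that $g$ considers important can shift the label more than copying several low-weight features. That difference from uniform feature-count mixing is the idea the abstract describes.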

Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 5409
