OmniMixup: Generalize Mixup with Mixing-Pair Sampling Distribution

17 Sept 2023 (modified: 11 Feb 2024), Submitted to ICLR 2024
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Mixup, Machine Learning, molecule property prediction, image classification, data augmentation
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: A new Mixup method that generalizes vanilla Mixup and allows multiple previous works to be incorporated into a unified analysis framework.
Abstract: Mixup is a widely adopted data augmentation technique for mitigating overfitting in empirical risk minimization. Existing modifications of Mixup are modality-specific, which limits their applicability across diverse modalities. Although alternative approaches try to circumvent this barrier by mixing up latent features according to a sampling distribution, they still require domain knowledge to design that distribution. Moreover, a unified theoretical framework for analyzing the generalization bound of this line of research is still lacking. In this paper, we introduce OmniMixup, which generalizes prior works through a Mixing-Pair Sampling Distribution (MPSD) and is accompanied by a holistic theoretical analysis framework. We find, both theoretically and empirically, that the Mahalanobis distance (M-Score) derived from the sampling distribution offers significant insight into OmniMixup's generalization capability. Accordingly, we propose OmniEval, an evaluation framework designed to automatically identify the optimal sampling distribution. Empirical studies on both images and molecules demonstrate that 1) OmniEval is adept at determining the appropriate sampling distribution for OmniMixup, and 2) OmniMixup shows promise for application across various modalities and domains.
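The abstract frames OmniMixup as replacing Mixup's pairing rule with a Mixing-Pair Sampling Distribution (MPSD). A minimal sketch of that idea, under stated assumptions: the MPSD is represented as a row-stochastic matrix `P`, where `P[i][j]` is the probability of pairing example i with example j, and each pair is combined with the standard Mixup interpolation x̃ = λx_i + (1 − λ)x_j, ỹ = λy_i + (1 − λ)y_j with λ ~ Beta(α, α). The uniform matrix roughly mirrors vanilla Mixup's random pairing. `distance_weighted_pair_probs` is a hypothetical feature-distance MPSD chosen only for illustration; it is not the distribution proposed in the paper, and nothing here computes the M-Score. All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def uniform_pair_probs(n: int) -> Matrix:
    """Every example is equally likely to be paired with any example (itself included).

    This roughly mirrors vanilla Mixup, which pairs each example with a randomly
    shuffled example from the same batch.
    """
    return [[1.0 / n] * n for _ in range(n)]


def distance_weighted_pair_probs(feats: Sequence[Sequence[float]], tau: float) -> Matrix:
    """Hypothetical MPSD (illustration only): favour partners close in feature space.

    P[i][j] is proportional to exp(-||f_i - f_j||^2 / tau) for j != i, and P[i][i] = 0.
    """
    n = len(feats)
    if n < 2:
        raise ValueError("need at least two examples to form pairs")
    probs: Matrix = []
    for i in range(n):
        logits = []
        for j in range(n):
            if j == i:
                logits.append(-math.inf)  # never pair an example with itself
            else:
                sq_dist = sum((a - b) ** 2 for a, b in zip(feats[i], feats[j]))
                logits.append(-sq_dist / tau)
        top = max(logits)  # subtract the largest logit to avoid underflow
        weights = [math.exp(z - top) for z in logits]
        total = sum(weights)
        probs.append([w / total for w in weights])
    return probs


def sample_partner(i: int, pair_probs: Matrix, rng: random.Random) -> int:
    """Draw the index j of example i's mixing partner from row i of the MPSD."""
    row = pair_probs[i]
    return rng.choices(range(len(row)), weights=row, k=1)[0]


def mixup_batch(
    xs: Sequence[Sequence[float]],
    ys: Sequence[Sequence[float]],
    pair_probs: Matrix,
    alpha: float,
    rng: random.Random,
) -> Tuple[Matrix, Matrix]:
    """Mix each (x_i, y_i) with a partner drawn from pair_probs, using lam ~ Beta(alpha, alpha)."""
    mixed_x: Matrix = []
    mixed_y: Matrix = []
    for i in range(len(xs)):
        j = sample_partner(i, pair_probs, rng)
        lam = rng.betavariate(alpha, alpha)  # one coefficient per pair in this sketch
        mixed_x.append([lam * a + (1.0 - lam) * b for a, b in zip(xs[i], xs[j])])
        mixed_y.append([lam * a + (1.0 - lam) * b for a, b in zip(ys[i], ys[j])])
    return mixed_x, mixed_y


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    ys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one-hot labels
    for name, probs in [("uniform", uniform_pair_probs(3)),
                        ("distance-weighted", distance_weighted_pair_probs(xs, tau=1.0))]:
        mx, my = mixup_batch(xs, ys, probs, alpha=0.4, rng=rng)
        print(name, mx, my)
```

Running the file prints one mixed batch per distribution. Only the pair-probability matrix changes between the two runs, which is one way to read the abstract's claim that a sampling distribution over pairs generalizes the fixed random pairing of vanilla Mixup.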
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 999