Simplicial SMOTE: Oversampling Solution to the Imbalanced Learning Problem

22 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: data augmentation, oversampling, imbalanced learning problem
Abstract: SMOTE is the established geometric approach to random oversampling for balancing classes in the imbalanced learning problem, and it has been followed by many extensions. Its idea is to introduce synthetic data points of the minority class, each new point being a convex combination of an existing data point and one of its k nearest neighbors. This can be viewed as sampling from the edges of a geometric neighborhood graph. Borrowing tools from topological data analysis, we propose a generalization of this approach that samples from the simplices of the geometric neighborhood simplicial complex. That is, a new point is defined by its barycentric coordinates with respect to a simplex spanned by an arbitrary number of sufficiently close data points, rather than by a pair. We evaluate the generalized technique, which we call Simplicial SMOTE, on 23 benchmark datasets and conclude that it outperforms the original SMOTE and its extensions. Moreover, we show how simplicial sampling can be integrated into several popular SMOTE extensions; our simplicial generalization of Borderline SMOTE further improves performance on the benchmark datasets.
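The abstract contrasts two sampling rules: classic SMOTE draws a point on the segment between a minority point and one of its k nearest neighbors (an edge of the neighborhood graph), while the simplicial version draws barycentric weights over a whole simplex of close points. The pure-Python sketch below illustrates that contrast under assumptions the abstract does not fix: the neighborhood complex is taken to be the clique (flag) complex of the symmetrised kNN graph, a simplex is chosen uniformly among its maximal cliques (optionally cut down to a random face with at most `max_dim + 1` vertices), and weights come from a flat Dirichlet distribution. All function names and the `max_dim` parameter are hypothetical; this is not the authors' implementation.

```python
import math
import random


def knn_lists(points, k):
    """Indices of the k nearest neighbours (Euclidean) of every point."""
    n = len(points)
    return [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(points[i], points[j]))[:k]
        for i in range(n)
    ]


def symmetric_graph(knn):
    """Undirected neighbourhood graph: edge i-j if either is among the other's kNN."""
    adj = [set() for _ in knn]
    for i, nbrs in enumerate(knn):
        for j in nbrs:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def maximal_cliques(adj):
    """Bron-Kerbosch without pivoting; each maximal clique is a maximal
    simplex of the clique (flag) complex of the graph (assumed complex)."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(tuple(sorted(r)))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(range(len(adj))), set())
    return cliques


def flat_dirichlet(m, rng):
    """Barycentric weights uniform on the (m-1)-simplex (normalised Exp(1) draws)."""
    w = [rng.expovariate(1.0) for _ in range(m)]
    total = sum(w)
    return [wi / total for wi in w]


def combine(points, idx, weights):
    """Convex combination sum_i w_i * x_i of the selected points."""
    dim = len(points[0])
    return [sum(w * points[i][d] for i, w in zip(idx, weights))
            for d in range(dim)]


def smote(points, n_new, k, rng):
    """Classic SMOTE rule x + lam * (x_nn - x), i.e. a point on a kNN-graph edge."""
    knn = knn_lists(points, k)
    out = []
    for _ in range(n_new):
        i = rng.randrange(len(points))
        j = rng.choice(knn[i])
        lam = rng.random()
        out.append(combine(points, (i, j), (1.0 - lam, lam)))
    return out


def simplicial_smote(points, n_new, k, rng, max_dim=None):
    """Hypothetical simplicial variant: barycentric sample from a simplex of
    the clique complex of the symmetrised kNN graph."""
    simplices = maximal_cliques(symmetric_graph(knn_lists(points, k)))
    out = []
    for _ in range(n_new):
        simplex = list(rng.choice(simplices))
        if max_dim is not None and len(simplex) > max_dim + 1:
            # Any subset of a simplex's vertices spans a face, itself a simplex.
            simplex = rng.sample(simplex, max_dim + 1)
        out.append(combine(points, simplex, flat_dirichlet(len(simplex), rng)))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.5)]
    print(smote(minority, 3, k=2, rng=rng))
    print(simplicial_smote(minority, 3, k=3, rng=rng, max_dim=2))
```

Setting `max_dim=1` restricts the sketch to edges with a uniform weight along each one, which mirrors the SMOTE rule up to how the edge is picked; larger values let new points fill the interior of triangles and higher simplices. Maximal-clique enumeration is exponential in the worst case, so this is only meant for small illustrative inputs.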
Primary Area: unsupervised, self-supervised, semi-supervised, and supervised representation learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 6425