AE-SMOTE: A Multi-Modal Minority Oversampling Framework

28 Sept 2020 (modified: 05 May 2023) · ICLR 2021 Conference Withdrawn Submission
Keywords: Data Augmentation, Binary Classification, Autoencoder, Tabular Data, Imbalanced Data
Abstract: Real-world binary classification tasks are in many cases imbalanced, i.e., the minority class is much smaller than the majority class. This skew is challenging for machine learning algorithms, which tend to focus on the majority class and badly misclassify the minority. Oversampling the minority class with \emph{SMOTE} before training the model is a popular way to address this challenge. Inspired by \emph{SMOTE}, we propose \emph{AE-SMOTE}, which uses an autoencoder to (1) map the features to a dense continuous latent space, (2) oversample by interpolation in that latent space, and (3) map the synthetic samples back to the original feature space. While \emph{SMOTE} supports discrete (categorical) features, almost all variants and extensions of \emph{SMOTE} do not; wrapping any one of these \emph{SMOTE} variants with an autoencoder enables it to support multi-modal datasets that include discrete features. We empirically demonstrate the effectiveness of the proposed approach on 35 publicly available datasets.
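The core of step (2) is ordinary SMOTE-style interpolation, applied to the encoded minority samples rather than the raw features. A minimal sketch of that interpolation step, assuming the encoder has already produced a continuous latent matrix (the function name and the NumPy-only k-NN search are illustrative, not the paper's implementation):

```python
import numpy as np

def smote_interpolate(latent_minority, n_samples, k=5, rng=None):
    """SMOTE-style oversampling in a continuous latent space.

    For each synthetic sample: pick a random minority point, pick one of
    its k nearest minority neighbours, and return a random point on the
    segment between them.
    """
    rng = np.random.default_rng(rng)
    Z = np.asarray(latent_minority, dtype=float)
    n = len(Z)
    k = min(k, n - 1)
    # Pairwise Euclidean distances between minority latents.
    d = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]  # indices of k nearest neighbours

    synthetic = np.empty((n_samples, Z.shape[1]))
    for i in range(n_samples):
        a = rng.integers(n)            # base minority point
        b = nn[a, rng.integers(k)]     # one of its nearest neighbours
        gap = rng.random()             # interpolation factor in [0, 1]
        synthetic[i] = Z[a] + gap * (Z[b] - Z[a])
    return synthetic
```

In the full pipeline, these synthetic latent vectors would then be passed through the decoder (step 3), which is what lets the scheme emit valid discrete feature values even though the interpolation itself is purely continuous.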
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Reviewed Version (pdf): https://openreview.net/references/pdf?id=yXbEE-M4Vp