Abstract: Classifying datasets that contain noise, outliers, and imbalanced classes is a major challenge for the data mining community. The twin support vector machine (TSVM) is a well-known plane-based learning technique for classification; however, it performs poorly on such datasets. To address this issue, we propose a novel class probability and generalized bell fuzzy twin SVM for imbalanced data (CGFTSVM-ID). The proposed CGFTSVM-ID assigns membership values to the data points using a new membership function called the class probability and generalized bell (CPGB) function. The membership function for the majority class combines the generalized bell (gbell) function, the class probability, and the imbalance ratio. The gbell function suppresses the negative impact of outliers in the training data by assigning them lower membership values. A lower class probability for a majority class data point indicates a higher likelihood that the point is noise. Including the imbalance ratio of the classes in the membership function addresses the class imbalance of the datasets. To ensure that the minority class samples remain important in model learning, relatively high memberships are assigned to them. Thus, the proposed CPGB function handles the class imbalance learning problem in the presence of noise and outliers. We employ the successive overrelaxation (SOR) technique to solve the proposed optimization problem. Extensive numerical experiments and statistical analysis on imbalanced real-world UCI and KEEL datasets show that the proposed CGFTSVM-ID has superior generalization performance compared with the baseline models. Experiments are also conducted on the publicly available ADNI dataset for Alzheimer's disease classification, and the results demonstrate the superiority of the proposed CGFTSVM-ID.
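To illustrate how a gbell function can assign less weight to outliers, the following is a minimal sketch using the standard generalized bell form 1 / (1 + |(x - c) / a|^(2b)). It is not the CPGB function itself: the abstract does not state how the gbell value is combined with the class probability and the imbalance ratio, so those terms are omitted. The choice of distance to the class centroid as the input, the sample points, and the parameter values a, b, c are all assumptions made only for illustration.

```python
import math


def gbell(x, a, b, c):
    """Standard generalized bell function: 1 / (1 + |(x - c) / a|^(2b))."""
    if a == 0:
        raise ValueError("width parameter a must be non-zero")
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def distances_to_centroid(points):
    """Euclidean distance of each point to the mean of all points."""
    n = len(points)
    dim = len(points[0])
    centroid = [sum(p[j] for p in points) / n for j in range(dim)]
    return [math.dist(p, centroid) for p in points]


# Hypothetical majority-class samples; the last one lies far from the rest.
majority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]

# Assumption: the gbell input is the distance to the class centroid.
dists = distances_to_centroid(majority)

# a, b, c are illustrative values, not settings taken from the paper.
memberships = [gbell(d, a=5.0, b=2.0, c=0.0) for d in dists]

for point, m in zip(majority, memberships):
    print(point, round(m, 3))
```

In this sketch the point far from the centroid receives the smallest membership, which is the qualitative behaviour the abstract attributes to the gbell term.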