Adaptive Data Pruning for Support Vector Machines

Yasuhiro Fujiwara, Junya Arai, Sekitoshi Kanai, Yasutoshi Ida, Naonori Ueda

2018 (modified: 17 Apr 2023), IEEE BigData 2018

Abstract: The Support Vector Machine (SVM) is one of the most popular classification algorithms. An SVM separates data points into two classes by using the hyper-plane that is maximally distant from both classes. Because SVM is grounded in statistical learning theory and the principle of structural risk minimization, it offers highly accurate classification. However, its training process is computationally expensive. This paper proposes Sahara, an efficient training algorithm for SVM. Sahara identifies data points that have no influence on SVM classification by computing upper and lower bounds of a parameter that determines the hyper-plane. It computes these bounds efficiently by using Singular Value Decomposition (SVD) and a sparse data matrix. Theoretically, the approach is guaranteed to yield the optimal SVM hyper-plane for any given set of data points. Experiments show that Sahara is significantly faster than previous approaches.
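
The property that makes pruning safe can be illustrated with a small sketch. This is not Sahara itself: the abstract does not give the bound computation, so the code below simply trains a toy SVM, finds the points whose dual multiplier is zero after the fact, and checks that retraining without them gives the same hyper-plane. The solver (dual coordinate descent on a bias-free, hinge-loss linear SVM), the function name `train_linear_svm`, the constant `C`, and the six data points are all assumptions chosen for illustration.

```python
def train_linear_svm(points, labels, C=10.0, sweeps=200):
    """Dual coordinate descent for a bias-free, hinge-loss linear SVM.

    Returns the weight vector w and the dual multipliers alpha.
    Illustrative only; not the algorithm proposed in the paper.
    """
    w = [0.0] * len(points[0])
    alpha = [0.0] * len(points)
    for _ in range(sweeps):
        for i, (x, y) in enumerate(zip(points, labels)):
            q_ii = sum(v * v for v in x)  # diagonal entry of the dual Hessian
            if q_ii == 0.0:
                continue  # an all-zero point cannot change w
            # Gradient of the dual objective with respect to alpha[i].
            grad = y * sum(wj * xj for wj, xj in zip(w, x)) - 1.0
            # Newton step on one coordinate, projected onto [0, C].
            new_alpha = min(max(alpha[i] - grad / q_ii, 0.0), C)
            step = (new_alpha - alpha[i]) * y
            w = [wj + step * xj for wj, xj in zip(w, x)]
            alpha[i] = new_alpha
    return w, alpha


# Hypothetical toy data: two classes in the plane, labels are +1 / -1.
POINTS = [(2.0, 0.0), (0.0, -2.0), (3.0, 3.0), (-2.0, -1.0), (-1.0, -4.0), (4.0, 1.0)]
LABELS = [1, -1, 1, -1, -1, 1]

w_full, alpha = train_linear_svm(POINTS, LABELS)

# Points with a zero multiplier do not contribute to w = sum(alpha_i * y_i * x_i).
keep = [i for i, a in enumerate(alpha) if a > 1e-9]

w_pruned, _ = train_linear_svm([POINTS[i] for i in keep], [LABELS[i] for i in keep])

print("kept", len(keep), "of", len(POINTS), "points")
print("w (all points):   ", w_full)
print("w (pruned points):", w_pruned)
```

On this toy data only two of the six points carry a nonzero multiplier, and dropping the other four leaves the weight vector unchanged. The sketch has to solve the full problem first to learn which points are removable; according to the abstract, Sahara instead identifies such points from upper and lower bounds, which is where the training-time savings would come from.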
