Abstract: Training a support vector machine (SVM) on large data sets is computationally intensive. In this paper, we study the problem of selecting a subset of the data for training an SVM classifier, under the requirement that the loss of performance caused by reducing the training data is low. We propose a function that quantifies the suitability of a selected subset and introduce a greedy algorithm for solving the subset selection problem. The algorithm is evaluated on handwritten digit recognition and other binary classification tasks, and its performance is compared with that of stratified sampling methods.
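
The greedy selection idea can be illustrated with a minimal sketch. The abstract does not define the suitability function, so `coverage_score` below is a hypothetical stand-in (negative total squared distance from every point to its nearest selected point), and `greedy_select` is an illustrative helper, not the paper's algorithm. The sketch shows only the general pattern: at each step, add the candidate that most improves the score of the current subset.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
ScoreFn = Callable[[Sequence[Point], List[int]], float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def coverage_score(points: Sequence[Point], subset: List[int]) -> float:
    """ASSUMPTION: a stand-in suitability function, not the one from the paper.

    Returns the negative sum, over all points, of the squared distance to the
    nearest selected point. Higher (closer to zero) means better coverage.
    """
    if not subset:
        return float("-inf")
    return -sum(min(sq_dist(p, points[j]) for j in subset) for p in points)


def greedy_select(points: Sequence[Point], k: int, score: ScoreFn) -> List[int]:
    """Greedily pick up to k indices, each time adding the best-scoring candidate."""
    selected: List[int] = []
    remaining = set(range(len(points)))
    while len(selected) < k and remaining:
        # Evaluate each remaining candidate by the score of the enlarged subset.
        best = max(sorted(remaining), key=lambda i: score(points, selected + [i]))
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Two well-separated clusters; a subset of size 2 should cover both.
    data = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0), (10.1, 10.0)]
    print(greedy_select(data, 2, coverage_score))
```

The selected indices would then be used to build the reduced training set passed to an SVM trainer. Each greedy step here rescores every remaining candidate, so the cost grows quickly with the data set size; a practical suitability function would need to be cheap to update incrementally.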