Small Coresets to Represent Large Training Data for Support Vector Machines
Cenk Baykal, Murad Tukan, Dan Feldman, Daniela Rus
Feb 15, 2018 (modified: Feb 15, 2018) · ICLR 2018 Conference Blind Submission
Abstract: Support Vector Machines (SVMs) are among the most popular algorithms for classification and regression analysis. Despite their popularity, even efficient implementations have proven computationally expensive to train at large scale, especially in streaming settings. In this paper, we propose a novel coreset construction algorithm for efficiently generating compact representations of massive data sets to speed up SVM training. A coreset is a weighted subset of the original data points such that SVMs trained on the coreset are provably competitive with those trained on the original (massive) data set. We provide both lower and upper bounds on the number of samples required to obtain accurate approximations to the SVM problem as a function of the complexity of the input data. Our analysis also establishes sufficient conditions for the existence of sufficiently compact and representative coresets for the SVM problem. We empirically evaluate the practical effectiveness of our algorithm on synthetic and real-world data sets.
TL;DR: We present an algorithm for speeding up SVM training on massive data sets by constructing compact representations that provide efficient and provably approximate inference.
Keywords: coresets, data compression
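
The construction the abstract describes, sampling a weighted subset and training a weighted SVM on it, can be sketched in a few lines. The sketch below is a minimal illustration, not the authors' algorithm: the distance-to-class-mean score is a hypothetical stand-in for the paper's sensitivity-based sampling probabilities (not reproduced on this page), and scikit-learn's LinearSVC stands in for any SVM solver that accepts per-point weights.

    # Minimal sketch of coreset-based SVM training via importance sampling.
    # The `scores` heuristic is a hypothetical proxy for the paper's
    # sensitivity bounds, which are not given on this page.
    import numpy as np
    from sklearn.svm import LinearSVC

    def build_coreset(X, y, m, seed=0):
        """Sample m points with probability proportional to a per-point
        importance score, attaching inverse-probability weights so that
        weighted sums over the coreset match the full data in expectation."""
        rng = np.random.default_rng(seed)
        n = len(X)
        scores = np.empty(n)
        for c in np.unique(y):
            mask = y == c
            # Heuristic: points far from their class mean are more likely
            # to influence the SVM margin.
            scores[mask] = np.linalg.norm(X[mask] - X[mask].mean(axis=0), axis=1)
        p = scores / scores.sum()
        idx = rng.choice(n, size=m, replace=True, p=p)
        weights = 1.0 / (m * p[idx])  # standard importance-sampling weights
        return X[idx], y[idx], weights

    # Usage: train on the m-point coreset instead of all n points.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(100_000, 10))
    y = (X[:, 0] + 0.1 * rng.normal(size=100_000) > 0).astype(int)
    X_c, y_c, w = build_coreset(X, y, m=2_000)
    clf = LinearSVC(dual=False).fit(X_c, y_c, sample_weight=w)

Sampling with replacement keeps the weights simple: each point's weight 1/(m * p_i) makes the coreset an unbiased estimator of any weighted sum over the full data set, which is the property a provable coreset guarantee is built on.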