Keywords: Kernel machines, large-scale kernel machines, Preconditioned-SGD, Nyström approximation
TL;DR: We introduce a new SGD-based algorithm with delayed projection for training kernel machines that achieves comparable or superior performance while reducing training time from days to under an hour.
Abstract: Classical kernel machines have historically faced significant challenges in scaling to large datasets and model sizes, a key ingredient that has driven the success of neural networks. In this paper, we present a new methodology for building kernel machines that can scale efficiently with both data size and model size. Our algorithm introduces delayed projections to Preconditioned Stochastic Gradient Descent (PSGD), allowing the training of much larger models than was previously feasible. We validate our algorithm, EigenPro4, across multiple datasets, demonstrating drastic training speedups without compromising performance. Our implementation is publicly available at: https://github.com/EigenPro/EigenPro.
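To make the delayed-projection idea concrete, here is a minimal, hypothetical NumPy sketch, not the released EigenPro implementation: minibatch SGD on a kernel model where gradient updates are accumulated as coefficients on temporary batch points and only every few steps projected back onto a fixed center set, keeping the model size constant. The RBF kernel, the least-squares projection step, and all function and parameter names are illustrative assumptions; see the linked repository for the actual algorithm and preconditioning.

```python
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    # Gaussian (RBF) kernel matrix between rows of X and rows of Z.
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def train_delayed_projection(X, y, centers, epochs=5, batch=64,
                             lr=0.5, project_every=10, gamma=1.0, seed=0):
    """Hypothetical sketch of SGD with delayed projection.

    Model: f(x) = K(x, centers) @ alpha. Each SGD step adds the batch
    points as temporary terms with coefficients -lr/|b| * residual; every
    `project_every` steps those terms are folded back onto the fixed
    center set via a least-squares (Nystrom-style) projection.
    """
    rng = np.random.default_rng(seed)
    n, p = len(X), len(centers)
    alpha = np.zeros(p)
    # Center-center kernel, regularized for a stable solve.
    Kcc = rbf_kernel(centers, centers, gamma) + 1e-8 * np.eye(p)
    aux_X, aux_c = [], []          # temporary points and their coefficients
    step = 0
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(1, n // batch)):
            Xb, yb = X[idx], y[idx]
            # Prediction uses centers plus any not-yet-projected terms.
            pred = rbf_kernel(Xb, centers, gamma) @ alpha
            if aux_X:
                pred += rbf_kernel(Xb, np.vstack(aux_X), gamma) @ np.hstack(aux_c)
            resid = pred - yb
            aux_X.append(Xb)
            aux_c.append(-lr / len(idx) * resid)
            step += 1
            if step % project_every == 0:
                # Delayed projection: fold auxiliary terms into the centers.
                Z, c = np.vstack(aux_X), np.hstack(aux_c)
                alpha += np.linalg.solve(Kcc, rbf_kernel(centers, Z, gamma) @ c)
                aux_X, aux_c = [], []
    if aux_X:
        # Final projection of any remaining auxiliary terms.
        Z, c = np.vstack(aux_X), np.hstack(aux_c)
        alpha += np.linalg.solve(Kcc, rbf_kernel(centers, Z, gamma) @ c)
    return alpha

# Toy usage (assumed setup): fit a 1-D sine curve with 52 centers.
X = np.linspace(0, 6, 512)[:, None]
y = np.sin(X[:, 0])
centers = X[::10]
alpha = train_delayed_projection(X, y, centers, gamma=2.0)
print(np.abs(rbf_kernel(X, centers, 2.0) @ alpha - y).mean())
```

Projecting only every `project_every` steps is what amortizes the cost of the solve against the centers; projecting after every minibatch would recover ordinary projected SGD at much higher per-step cost.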
Supplementary Material: zip
Primary Area: General machine learning (supervised, unsupervised, online, active, etc.)
Submission Number: 17592