Joker: Joint Optimization Framework for Lightweight Kernel Machines

Junhong Zhang; Zhihui Lai

Published: 01 May 2025, Last Modified: 23 Jul 2025, Venue: ICML 2025 poster, License: CC BY-SA 4.0

TL;DR: We propose a unified framework for kernel methods (not limited to KRR) with low computation requirements.

Abstract: Kernel methods are powerful tools for nonlinear learning with well-established theory, but scalability has been a long-standing challenge. Despite existing successes, large-scale kernel methods have two limitations: (i) the memory overhead is too high for users to afford; (ii) existing efforts mainly focus on kernel ridge regression (KRR), while other models remain understudied. In this paper, we propose **Joker**, a joint optimization framework for diverse kernel models, including KRR, logistic regression, and support vector machines. We design a dual block coordinate descent method with trust region (DBCD-TR) and adopt kernel approximation with randomized features, leading to low memory costs and high efficiency in large-scale learning. Experiments show that **Joker** saves up to 90% memory while achieving training time and performance comparable to, or even better than, state-of-the-art methods.
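To make the idea of dual block coordinate descent concrete, the following is a minimal sketch for the KRR case only. It is not the authors' DBCD-TR implementation: the dual objective convention, the RBF kernel, the function names (`rbf_kernel`, `block_cd_krr`), and the simple step-norm clipping used as a stand-in for a trust region are all assumptions made for illustration. The sketch minimizes $\tfrac{1}{2}\alpha^\top (K + \lambda I)\alpha - y^\top \alpha$ one block at a time, and only ever materializes a `|B| x n` slice of the kernel matrix rather than the full `n x n` matrix.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """RBF kernel block between rows of A and rows of B (assumed kernel choice)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def block_cd_krr(X, y, lam=1.0, gamma=1.0, block_size=8,
                 epochs=300, radius=10.0, seed=0):
    """Illustrative block coordinate descent on an assumed KRR dual:
        min_alpha 0.5 * alpha^T (K + lam*I) alpha - y^T alpha.
    `radius` clips the block step norm (a simplified trust-region stand-in)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    alpha = np.zeros(n)
    for _ in range(epochs):
        perm = rng.permutation(n)
        for start in range(0, n, block_size):
            B = perm[start:start + block_size]
            # Only a |B| x n kernel slice is formed, never the full matrix.
            K_B = rbf_kernel(X[B], X, gamma)
            grad = K_B @ alpha + lam * alpha[B] - y[B]
            H = K_B[:, B] + lam * np.eye(len(B))
            # Exact minimizer of the block subproblem (Newton step).
            step = np.linalg.solve(H, -grad)
            norm = np.linalg.norm(step)
            if norm > radius:
                # Shrinking a Newton step on a convex quadratic still descends.
                step *= radius / norm
            alpha[B] += step
    return alpha

# Small usage example on synthetic data (hypothetical, for illustration only).
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(40, 2))
y_demo = np.sin(X_demo[:, 0])
alpha_demo = block_cd_krr(X_demo, y_demo)
```

On a problem this small, the result can be checked against the closed-form solution of $(K + \lambda I)\alpha = y$; the point of the block scheme is that peak memory scales with the block size times `n` instead of `n` squared.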

Lay Summary: This work studies kernel methods, a powerful tool for machine learning. In the big data era, however, these methods usually require top-tier computers, which makes them unaffordable for most users. In addition, their application has been limited to a small number of machine learning models. To address these problems, we propose a new strategy named **Joker** to improve how kernel methods are trained. In short, **Joker** divides a big task into several small ones and handles them iteratively, so that each small task can be processed with little effort and the burden on the computer is greatly reduced. In practice, **Joker** can handle a large dataset (with 5 million images) in about 2 hours using a consumer GPU. To summarize, **Joker** lowers the hardware requirements, making kernel methods easy for the public to use.

Link To Code: https://github.com/Apple-Zhang/Joker-paper

Primary Area: General Machine Learning->Kernel methods

Keywords: unified optimization, large-scale kernel method, lightweight model

Submission Number: 2004
