Using RKHS Weight Functions in Random Feature Models

TMLR Paper6844 Authors

06 Jan 2026 (modified: 08 Apr 2026) | Under review for TMLR | CC BY 4.0
Abstract: We examine the consequences of assuming that the weight function $\alpha$ in the classical random feature model $f(x) = \mathbb{E}_{w\sim p}\left[\alpha(w)\phi(w,x)\right]$ belongs to a reproducing kernel Hilbert space (RKHS). For suitable choices of the random feature model's parameters, this assumption makes it possible to compute the model exactly instead of approximating it with the random kitchen sinks method; we present several such examples. Moreover, in this form of the model, an unbiased estimate of the functional gradient of the loss can be obtained by sampling random features. This enables a stochastic functional gradient descent (SFGD) to learn the weight function, and we show that it converges under mild assumptions. Further theoretical analysis shows that the empirical risk minimizer converges at the same $\mathcal{O}\left(\frac{1}{\sqrt{m}} + \frac{1}{\sqrt{T}}\right)$ rate as in Rahimi & Recht (2009). We also present two other algorithms for learning the weight function. We run experiments comparing these three learning algorithms, and comparing this random feature model variant with the original random kitchen sinks and other state-of-the-art algorithms.
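To make the abstract's two computational claims concrete, the sketch below uses a setting of our own choosing. This setting is an assumption and not necessarily one of the paper's examples or its Algorithm 3: $p = \mathcal{N}(0, \sigma^2 I_d)$, $\phi(w,x) = \cos(w\cdot x)$, and a Gaussian kernel $k(w,w') = \exp\left(-\|w-w'\|^2/(2\ell^2)\right)$ on weight space. If $\alpha = \sum_s c_s k(w_s,\cdot)$, then $f$ can be computed exactly, because a Gaussian integral gives $\mathbb{E}_{w\sim p}[k(w_s,w)\cos(w\cdot x)] = \left(\frac{\ell^2}{\sigma^2+\ell^2}\right)^{d/2}\exp\left(-\frac{\|w_s\|^2+\sigma^2\ell^2\|x\|^2}{2(\sigma^2+\ell^2)}\right)\cos\left(\frac{\sigma^2\, w_s\cdot x}{\sigma^2+\ell^2}\right)$. By the reproducing property, $f(x) = \langle \alpha, \mathbb{E}_{w}[\phi(w,x)k(w,\cdot)]\rangle$, so for the squared loss $\frac{1}{2n}\sum_j (f(x_j)-y_j)^2$ the functional gradient is $\frac{1}{n}\sum_j (f(x_j)-y_j)\,\mathbb{E}_w[\phi(w,x_j)k(w,\cdot)]$. Replacing the expectation by a single draw $w_t\sim p$ gives an unbiased estimate, so each descent step appends one kernel center $w_t$. The function names (`expected_feature`, `predict`, `sfgd`, `mse`), the loss, and the step size are hypothetical.

```python
import math
import random

# Illustrative assumptions (hypothetical, not necessarily the paper's setup):
#   p = N(0, sigma^2 I_d), phi(w, x) = cos(w . x),
#   k(w, w') = exp(-||w - w'||^2 / (2 ell^2)) is the RKHS kernel for alpha.


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def kernel(w, w_prime, ell):
    """Gaussian kernel on weight space."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(w, w_prime)) / (2 * ell ** 2))


def expected_feature(w_s, x, sigma, ell):
    """Closed form of E_{w~p}[k(w_s, w) cos(w . x)] for the setup above."""
    s = sigma ** 2 + ell ** 2
    scale = (ell ** 2 / s) ** (len(x) / 2)
    decay = math.exp(-(dot(w_s, w_s) + sigma ** 2 * ell ** 2 * dot(x, x)) / (2 * s))
    return scale * decay * math.cos(sigma ** 2 * dot(w_s, x) / s)


def predict(centers, coefs, x, sigma, ell):
    """Exact f(x) = E_w[alpha(w) phi(w, x)] for alpha = sum_s c_s k(w_s, .)."""
    return sum(c * expected_feature(w_s, x, sigma, ell) for w_s, c in zip(centers, coefs))


def mse(centers, coefs, X, y, sigma, ell):
    """Mean squared error of the exactly computed model on (X, y)."""
    return sum((predict(centers, coefs, x, sigma, ell) - t) ** 2 for x, t in zip(X, y)) / len(X)


def sfgd(X, y, steps, lr, sigma, ell, seed=0):
    """Stochastic functional gradient descent on (1/2n) sum_j (f(x_j) - y_j)^2.

    This sketch draws a single w_t ~ p per step; the unbiased gradient estimate is
    g_t * k(w_t, .) with g_t = (1/n) sum_j (f(x_j) - y_j) cos(w_t . x_j),
    so the update alpha <- alpha - lr * g_t * k(w_t, .) appends one center.
    """
    rng = random.Random(seed)
    d, n = len(X[0]), len(X)
    centers, coefs = [], []
    for _ in range(steps):
        w_t = [rng.gauss(0.0, sigma) for _ in range(d)]
        residuals = [predict(centers, coefs, x, sigma, ell) - t for x, t in zip(X, y)]
        g_t = sum(r * math.cos(dot(w_t, x)) for r, x in zip(residuals, X)) / n
        centers.append(w_t)
        coefs.append(-lr * g_t)
    return centers, coefs
```

At iteration $t$ this naive sketch evaluates the closed form $O(nt)$ times, because every prediction sums over all centers added so far. That quadratic growth in the number of steps is a property of this illustration, not a statement about the complexity analysis in the paper.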
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission:
1. A new list of contributions in the introduction.
2. New Tables 5 and 6 breaking down the algorithmic complexities of Algorithms 3 (SFGD) and 4 (least squares fit).
3. A new Section 5.1.1 explaining why SFGD samples only one random feature each iteration.
4. A new Section 5.4 linking our three algorithms to similar existing algorithms.
5. Section 6.3 has been streamlined by moving both intermediate lemmas to the appendix.
6. A new Table 7 summarizing the three main assumptions on the weight function and their implications.
7. A bug fix has improved the results for SFGD in Figure 1. The text has been modified accordingly.
8. A new Figure 2 showing more clearly the algorithmic complexities of Algorithms 3 and 4.
9. A new Section 7.2 and Figures 3 and 4 showing the relevance of our proposed method for hyperparameter selection.
10. Other minor changes to the text.
Assigned Action Editor: ~Mauricio_A_Álvarez1
Submission Number: 6844