Keywords: Automatic Differentiation, Numerical Linear Algebra, Constrained Optimization, Implicit Differentiation, Gaussian Process
TL;DR: This paper presents a high-performance, differentiable least-squares solver that can be used like a neural network layer and demonstrates its usefulness by enforcing arbitrary constraints in neural networks and calibrating Gaussian processes.
Abstract: This paper argues that the method of least squares has significant unfulfilled potential in modern machine learning, far beyond merely being a tool for fitting linear models. To unlock this potential, we derive custom gradients that turn the solver into a differentiable operator, usable like a neural network layer, which enables many diverse applications. Empirically, we demonstrate: (i) scalability, by enforcing weight sparsity on a 50-million-parameter model; (ii) imposing conservativeness constraints in score-based generative models; and (iii) hyperparameter tuning of Gaussian processes based on predictive performance. In doing so, our work represents the next step in developing differentiable linear-algebra tools and making them widely accessible to machine learning practitioners.
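For intuition, below is a minimal, self-contained JAX sketch (not the paper's implementation) of how a least-squares solve can be exposed as a differentiable operator: the backward pass is obtained by implicitly differentiating the normal equations rather than by unrolling the solver. The function name `lstsq_layer` and the dense normal-equations solve are illustrative assumptions, and the sketch assumes a well-conditioned, overdetermined system.

```python
# Minimal sketch (illustrative only): a least-squares solve as a differentiable
# operator in JAX, with gradients from implicit differentiation of the
# normal equations A^T A x = A^T b rather than from unrolling the solver.
import jax
import jax.numpy as jnp


@jax.custom_vjp
def lstsq_layer(A, b):
    # Forward pass: any least-squares solver could be substituted here.
    return jnp.linalg.solve(A.T @ A, A.T @ b)


def _lstsq_fwd(A, b):
    x = jnp.linalg.solve(A.T @ A, A.T @ b)
    return x, (A, b, x)


def _lstsq_bwd(residuals, g):
    A, b, x = residuals
    # Implicit function theorem applied to A^T (A x - b) = 0: one adjoint
    # solve yields the vector-Jacobian products w.r.t. both A and b.
    u = jnp.linalg.solve(A.T @ A, g)    # (A^T A)^{-1} g
    r = A @ x - b                       # least-squares residual
    grad_b = A @ u
    grad_A = -jnp.outer(A @ u, x) - jnp.outer(r, u)
    return grad_A, grad_b


lstsq_layer.defvjp(_lstsq_fwd, _lstsq_bwd)

# Usage: gradients of any downstream loss flow through the solve.
A = jnp.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = jnp.array([0.1, 0.9, 2.1])
loss = lambda A, b: jnp.sum(lstsq_layer(A, b) ** 2)
grad_A, grad_b = jax.grad(loss, argnums=(0, 1))(A, b)
```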
Primary Area: neurosymbolic & hybrid AI systems (physics-informed, logic & formal reasoning, etc.)
Submission Number: 16273