Keywords: linear attention, numerical methods, linear system of equations
TL;DR: A novel, efficient Transformer-based approach to solving small linear systems.
Abstract: This paper investigates the potential of linear Transformers as solvers for systems of linear equations. We propose a novel approach in which the Transformer encodes each equation as a separate token, allowing the model to process the system in a permutation-invariant manner. To enhance generalizability and reduce the parameter count, we introduce a block-wise re-parameterization technique for the attention weight matrices. This technique decouples the problem dimension from the model's parameter count, enabling the Transformer to handle systems of varying sizes effectively. Our experiments show that the Transformer performs competitively with established classical methods such as Conjugate Gradient, especially for small systems. We further explore the model's ability to extrapolate to larger systems, providing evidence for its potential as a versatile and efficient solver for linear equations.
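To make the equation-as-token encoding and the block-wise re-parameterization concrete, here is a minimal PyTorch sketch. The class name `BlockLinearAttention`, the four-scalar block structure, and the single-layer, softmax-free update are illustrative assumptions based only on the abstract, not the authors' implementation.

```python
# Minimal sketch, assuming: each equation a_i^T x = b_i of an n x n system
# becomes one token [a_i, b_i], and each (n+1)x(n+1) attention weight matrix
# is re-parameterized by a few scalars so parameters do not grow with n.
import torch
import torch.nn as nn

class BlockLinearAttention(nn.Module):
    """One linear-attention layer with block-wise re-parameterized weights
    (hypothetical structure; the paper's exact blocks are not specified)."""

    def __init__(self):
        super().__init__()
        # Four scalars per projection: coefficient block, two mixing blocks,
        # and the right-hand-side entry. Count is independent of n.
        self.q = nn.Parameter(torch.randn(4) * 0.1)
        self.k = nn.Parameter(torch.randn(4) * 0.1)
        self.v = nn.Parameter(torch.randn(4) * 0.1)

    @staticmethod
    def _block_matrix(p, n):
        # Build W = [[p0 * I_n, p1 * 1], [p2 * 1^T, p3]] of shape (n+1, n+1):
        # p0 scales the coefficient entries, p3 the right-hand side,
        # p1 / p2 mix the two blocks.
        W = torch.zeros(n + 1, n + 1)
        W[:n, :n] = p[0] * torch.eye(n)
        W[:n, n] = p[1]
        W[n, :n] = p[2]
        W[n, n] = p[3]
        return W

    def forward(self, tokens):
        # tokens: (batch, n, n+1); row i holds [a_i, b_i].
        n = tokens.shape[1]
        Wq = self._block_matrix(self.q, n)
        Wk = self._block_matrix(self.k, n)
        Wv = self._block_matrix(self.v, n)
        Q, K, V = tokens @ Wq.T, tokens @ Wk.T, tokens @ Wv.T
        # Linear (softmax-free) attention with no positional encoding,
        # so the update is equivariant to permuting the equations.
        return tokens + (Q @ K.transpose(-2, -1)) @ V / n

# Usage: encode a batch of 3x3 systems Ax = b and apply one layer.
A = torch.randn(8, 3, 3)
b = torch.randn(8, 3, 1)
tokens = torch.cat([A, b], dim=-1)  # each equation is one token
layer = BlockLinearAttention()
print(layer(tokens).shape)          # torch.Size([8, 3, 4])
```

Because the weight matrices are assembled from scalars rather than stored at a fixed size, the same trained parameters can be applied to systems of a different dimension n, which is the mechanism the abstract credits for extrapolation to larger systems.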
Submission Number: 22