Keywords: efficient architectures, structured matrices, molecular dynamics
Abstract: Direct application of Transformer architectures in scientific domains poses computational challenges due to the quadratic scaling of attention with the number of inputs. In this work, we propose an alternative method based on hierarchical semi-separable matrices (HSS), a class of rank-structured operators with linear-time evaluation algorithms. By exploiting connections between linearized attention and HSS, we devise an implicit hierarchical parametrization strategy that interpolates between linear and quadratic attention, achieving both subquadratic scaling and high accuracy. We demonstrate the effectiveness of the proposed approach on the approximation of potentials from computational physics.
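For readers unfamiliar with the linearized-attention side of this construction, the sketch below (not taken from the submission; the feature map, shapes, and function names are illustrative assumptions) contrasts standard softmax attention, whose N x N score matrix gives quadratic cost, with a kernelized linearization whose cost grows linearly in the sequence length N.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: materializes an N x N score matrix,
    # so both time and memory scale quadratically in N.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, feature_map=lambda x: np.maximum(x, 0.0) + 1e-6):
    # Linearized attention (assumed kernel feature map): apply the map to
    # queries and keys, then reassociate the product so only d x d_v
    # summaries are formed, giving cost linear in N.
    Qf, Kf = feature_map(Q), feature_map(K)
    KV = Kf.T @ V                      # d x d_v summary, independent of N
    normalizer = Qf @ Kf.sum(axis=0)   # per-query normalization term
    return (Qf @ KV) / normalizer[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, d = 512, 32
    Q, K, V = (rng.standard_normal((N, d)) for _ in range(3))
    print(softmax_attention(Q, K, V).shape)  # (512, 32)
    print(linear_attention(Q, K, V).shape)   # (512, 32)
```

The abstract's HSS-based parametrization can be read as sitting between these two extremes: off-diagonal blocks are compressed in a hierarchical low-rank fashion rather than globally linearized, which is how it trades off between the two regimes above.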
Submission Track: Original Research
Submission Number: 140