TensorGRaD: Tensor Gradient Robust Decomposition for Memory-Efficient Neural Operator Training

TMLR Paper 7680 Authors

25 Feb 2026 (modified: 03 Mar 2026) · Under review for TMLR · CC BY 4.0
Abstract: Scientific problems require resolving multi-scale phenomena across different resolutions and learning solution operators in infinite-dimensional function spaces. Neural operators provide a powerful framework for this, using tensor-parameterized layers to capture complex, multi-dimensional relationships. However, scaling neural operators to high-resolution problems leads to significant computational demands, making the training of industrial-scale models prohibitive. In this work, we introduce TensorGRaD, a novel method that directly addresses the memory challenges of optimizing large tensor-structured weights. Our approach, based on a robust tensor decomposition, factorizes gradients into the sum of a low-rank tensor and a sparse tensor, efficiently capturing the information in optimizer states, including outliers. Additionally, we provide a recipe for mixed-precision training of TensorGRaD, achieving further memory savings without sacrificing accuracy. We showcase the effectiveness of TensorGRaD on Fourier Neural Operators, a class of models crucial for solving partial differential equations (PDEs). We provide theoretical guarantees for TensorGRaD, demonstrating its fundamental advantage over matrix-based gradient compression methods. We empirically demonstrate large improvements across various PDE tasks, including the challenging turbulent Navier-Stokes case at a Reynolds number of $10^5$. TensorGRaD reduces total memory usage by over 50% while maintaining, and sometimes even improving, accuracy.
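To make the gradient split concrete, below is a minimal numpy sketch of the kind of low-rank-plus-sparse decomposition the abstract describes: a truncated Tucker (HOSVD-style) low-rank component plus a sparse residual that keeps the largest-magnitude entries. The function names, the HOSVD-based factorization, and the rank/sparsity budgets are illustrative assumptions, not the paper's actual implementation; in TensorGRaD the two compressed components would be stored in place of full optimizer states.

```python
import numpy as np

def mode_product(T, M, mode):
    """n-mode product: multiply tensor T by matrix M along axis `mode`."""
    Tm = np.moveaxis(T, mode, 0)
    out = (M @ Tm.reshape(Tm.shape[0], -1)).reshape((M.shape[0],) + Tm.shape[1:])
    return np.moveaxis(out, 0, mode)

def tensorgrad_split(G, ranks, sparse_frac):
    """Illustrative split of a gradient tensor G into a low-rank Tucker part L
    (via truncated HOSVD) plus a sparse residual S of the largest entries."""
    # Factor matrices: top-r left singular vectors of each mode unfolding.
    factors = []
    for m, r in enumerate(ranks):
        unfolding = np.moveaxis(G, m, 0).reshape(G.shape[m], -1)
        u, _, _ = np.linalg.svd(unfolding, full_matrices=False)
        factors.append(u[:, :r])
    # Core tensor: project G onto the factor subspaces, then expand back to L.
    core = G
    for m, u in enumerate(factors):
        core = mode_product(core, u.T, m)
    L = core
    for m, u in enumerate(factors):
        L = mode_product(L, u, m)
    # Sparse part: keep the largest-magnitude residual entries (the outliers).
    residual = G - L
    k = max(1, int(sparse_frac * residual.size))
    threshold = np.partition(np.abs(residual).ravel(), -k)[-k]
    S = np.where(np.abs(residual) >= threshold, residual, 0.0)
    return L, S

# Mock 4-way gradient, e.g. for a spectral-convolution weight tensor.
G = np.random.randn(16, 16, 8, 8)
L, S = tensorgrad_split(G, ranks=(4, 4, 4, 4), sparse_frac=0.01)
print(np.linalg.norm(G - (L + S)) / np.linalg.norm(G))  # relative compression error
```

Note that the low-rank part is computed over the full multi-way tensor rather than a matricization, which is the distinction the abstract draws against matrix-based gradient compression.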
Submission Type: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Guillaume_Dalle1
Submission Number: 7680