Keywords: Geometric Deep Learning, Equivariance, Weight Space Symmetries, Weight Space Learning, Learned Optimizers
TL;DR: GradMetaNet is a neural architecture that efficiently processes gradients of other networks by exploiting their neuron permutation symmetries and rank-1 decomposition structure, enabling improved learned optimization, model editing, and loss curvature estimation
Abstract: Gradients of neural networks encode valuable information for optimization, editing, and analysis of models. Therefore, practitioners often treat gradients as inputs to task-specific algorithms, e.g., using gradient statistics for pruning or optimization. Recent works explore *learning* algorithms that operate directly on gradients but use architectures that are not specifically designed for gradient processing, hindering their applicability. In this paper, we present a principled approach for designing architectures that process gradients. Our approach is guided by three principles: (1) equivariant design that preserves neuron permutation symmetries, (2) processing sets of gradients across multiple data points to capture curvature information, and (3) efficient gradient representation through rank-1 decomposition. Based on these principles, we introduce GradMetaNet, a novel architecture for learning on gradients, constructed from simple equivariant blocks. We prove universality results for GradMetaNet, and show that previous approaches cannot approximate natural gradient-based functions that GradMetaNet can. We then demonstrate GradMetaNet's effectiveness on a diverse set of gradient-based tasks for *MLPs* and *transformers*, such as learned optimization, INR editing, and loss landscape curvature estimation.
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 15925
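Note on principle (3) in the abstract: the rank-1 decomposition refers to the standard fact that a per-example gradient of a linear layer's weight matrix is an outer product of the backpropagated error and the layer input. The sketch below is only an illustration of that fact (it is not code from the submission), assuming a bias-free linear layer and a squared-error loss.

```python
# Minimal illustration (assumption: bias-free linear layer, squared-error loss):
# for a single data point, dL/dW = outer(delta, x), where x is the layer input
# and delta is the gradient of the loss w.r.t. the layer's pre-activation output.
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(5, 3, bias=False)
x = torch.randn(5)                      # a single data point
target = torch.randn(3)

out = layer(x)                          # pre-activation output, shape (3,)
loss = 0.5 * ((out - target) ** 2).sum()
loss.backward()

delta = out.detach() - target           # dL/d(out) for the squared-error loss
rank1 = torch.outer(delta, x)           # outer product, shape (3, 5), rank 1

assert torch.allclose(layer.weight.grad, rank1)
print(torch.linalg.matrix_rank(rank1))  # -> 1
```

Over a batch, the weight gradient is a sum of such rank-1 terms (one per data point), which is the structure the abstract's efficient gradient representation exploits.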