Keywords: Geometric Deep Learning, Equivariance, Weight Space Symmetries, Weight Space Learning, Learned Optimizers
TL;DR: GradMetaNet is a neural architecture that efficiently processes gradients of other networks by exploiting their neuron permutation symmetries and rank-1 decomposition structure, enabling improved learned optimization, model editing, and loss curvature estimation
Abstract: Gradients of neural networks encode valuable information for optimization, editing, and analysis of models. Therefore, practitioners often treat gradients as inputs to task-specific algorithms, e.g., using gradient statistics for pruning or optimization. Recent works explore *learning* algorithms that operate directly on gradients but use architectures that are not specifically designed for gradient processing, hindering their applicability. In this paper, we present a principled approach for designing architectures that process gradients. Our approach is guided by three principles: (1) equivariant design that preserves neuron permutation symmetries, (2) processing sets of gradients across multiple data points to capture curvature information, and (3) efficient gradient representation through rank-1 decomposition. Based on these principles, we introduce GradMetaNet, a novel architecture for learning on gradients, constructed from simple equivariant blocks. We prove universality results for GradMetaNet, and show that previous approaches cannot approximate natural gradient-based functions that GradMetaNet can. We then demonstrate GradMetaNet's effectiveness on a diverse set of gradient-based tasks for *MLPs* and *transformers*, such as learned optimization, INR editing, and loss landscape curvature estimation.
Primary Area: Deep learning (e.g., architectures, generative models, optimization for deep networks, foundation models, LLMs)
Submission Number: 15925
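Note on principle (3) in the abstract: the rank-1 decomposition refers to the standard fact that a per-example gradient of a linear layer's weight matrix is an outer product of the backpropagated error and the layer input. The sketch below is only an illustration of that fact (it is not code from the submission), assuming a bias-free linear layer and a squared-error loss.

```python
# Minimal illustration (assumption: bias-free linear layer, squared-error loss):
# for a single data point, dL/dW = outer(delta, x), where x is the layer input
# and delta is the gradient of the loss w.r.t. the layer's pre-activation output.
import torch

torch.manual_seed(0)
layer = torch.nn.Linear(5, 3, bias=False)
x = torch.randn(5)                      # a single data point
target = torch.randn(3)

out = layer(x)                          # pre-activation output, shape (3,)
loss = 0.5 * ((out - target) ** 2).sum()
loss.backward()

delta = out.detach() - target           # dL/d(out) for the squared-error loss
rank1 = torch.outer(delta, x)           # outer product, shape (3, 5), rank 1

assert torch.allclose(layer.weight.grad, rank1)
print(torch.linalg.matrix_rank(rank1))  # -> 1
```

Over a batch, the weight gradient is a sum of such rank-1 terms (one per data point), which is the structure the abstract's efficient gradient representation exploits.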