Training a Convergent Energy Transformer with Equilibrium Propagation

Published: 03 Mar 2026, Last Modified: 23 Apr 2026 · NFAM 2026 Oral · CC BY 4.0
Keywords: Energy transformer, Equilibrium propagation, EBM, Hopfield network, attention
TL;DR: We introduce a Convergent version of the Energy Transformer (CET) that is compatible with Equilibrium Propagation based training.
Abstract: Equilibrium Propagation (EP) is a learning framework for energy-based models, i.e. models whose dynamics evolve toward minima (or more generally critical points) of an energy functional. Because it relies on equilibration dynamics and local learning rules, EP is well suited to computing platforms based on analog physics, which may offer substantial energy-efficiency gains. Although standard Transformers are not usually framed as energy-based models, the recently introduced Energy Transformer (ET) implements transformer-like computations through dynamics that minimize a global energy function. In its original form, however, the ET is not directly compatible with EP, because it is designed to perform only a small number of energy-minimization steps rather than to converge to equilibrium. We therefore develop a convergent variant, the Convergent Energy Transformer (CET), and train it with EP on a masked image completion task. This work takes a step toward physically inspired, hardware-friendly training methods for transformer-like models.
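The EP training principle the abstract describes can be illustrated on a toy convergent model. The sketch below is a generic EP example, not the paper's CET: it assumes a hypothetical scalar energy E(s; x, θ) = ½s² − θxs (free equilibrium s* = θx) and cost C(s, y) = ½(s − y)². EP compares a free equilibrium with a weakly nudged one and estimates the parameter gradient from the difference, with no backpropagation through the dynamics.

```python
# Toy Equilibrium Propagation sketch (illustrative; not the paper's CET).
# Energy E(s; x, theta) = 0.5*s**2 - theta*x*s, cost C(s, y) = 0.5*(s - y)**2.

def free_equilibrium(theta, x):
    # argmin_s E: dE/ds = s - theta*x = 0  =>  s* = theta*x
    return theta * x

def nudged_equilibrium(theta, x, y, beta):
    # argmin_s E + beta*C: s - theta*x + beta*(s - y) = 0
    return (theta * x + beta * y) / (1.0 + beta)

def ep_gradient(theta, x, y, beta):
    # dE/dtheta = -x*s; EP estimate: (1/beta) * (dE/dtheta at nudged - at free)
    s_free = free_equilibrium(theta, x)
    s_nudged = nudged_equilibrium(theta, x, y, beta)
    return (-x * s_nudged + x * s_free) / beta

# Train theta so that the free equilibrium matches the target y.
theta, x, y = 0.1, 1.0, 0.7
for _ in range(200):
    theta -= 0.5 * ep_gradient(theta, x, y, beta=0.01)

print(abs(free_equilibrium(theta, x) - y))  # close to 0 after training
```

As the nudging strength β → 0, the EP estimate converges to the true gradient of the loss at equilibrium; both phases use only locally available quantities, which is what makes EP attractive for analog hardware.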
Submission Number: 19