Keywords: Hopfield, Energy, Transformer, Graph, Anomaly Detection, Dynamic
TL;DR: Our work combines attention, energy-based models, and associative memory (Hopfield) models through a novel architecture called the Energy Transformer, which we evaluate on image and graph domains.
Abstract: Our work combines aspects of three promising paradigms in
machine learning, namely, the attention mechanism, energy-based models, and
associative memory. Attention is the powerhouse driving modern deep
learning successes, but it lacks clear theoretical foundations. Energy-based
models allow a principled approach to discriminative and generative
tasks, but the design of the energy functional is not straightforward. At
the same time, Dense Associative Memory models or Modern Hopfield Networks
have a well-established theoretical foundation, and allow an intuitive
design of the energy function. We propose a novel architecture, called the
Energy Transformer (or ET for short), which uses a sequence of attention
layers purposely designed to minimize a specifically engineered
energy function, which is responsible for representing the relationships
between the tokens. In this work, we introduce the theoretical foundations
of ET, explore its empirical capabilities using the image completion task,
and obtain strong quantitative results on the graph anomaly detection and
graph classification tasks.
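
The following is a minimal, self-contained JAX sketch of the core idea described in the abstract: each attention "layer" is a gradient-descent step on a single engineered energy defined over all tokens. The specific energy terms, parameter shapes, and names (attention_energy, hopfield_energy, et_block) are illustrative assumptions for exposition, not the paper's actual formulation.

# Sketch only: token updates as gradient descent on a shared energy function.
# Energy terms and hyperparameters below are assumed for illustration.
import jax
import jax.numpy as jnp

def attention_energy(x, Wq, Wk):
    # Pairwise token term: lower when tokens attend coherently to one another.
    q = x @ Wq
    k = x @ Wk
    scores = q @ k.T / jnp.sqrt(q.shape[-1])
    return -jnp.sum(jax.nn.logsumexp(scores, axis=-1))

def hopfield_energy(x, memories):
    # Dense-associative-memory term: lower when tokens align with stored patterns.
    return -0.5 * jnp.sum(jax.nn.relu(x @ memories.T) ** 2)

def total_energy(x, Wq, Wk, memories):
    # One scalar energy representing the relationships between all tokens.
    return attention_energy(x, Wq, Wk) + hopfield_energy(x, memories)

def et_block(x, Wq, Wk, memories, step_size=0.01, n_steps=12):
    # Each step is one descent update on the shared energy; stacking steps
    # runs the token dynamics toward a low-energy configuration.
    grad_fn = jax.grad(total_energy)
    for _ in range(n_steps):
        x = x - step_size * grad_fn(x, Wq, Wk, memories)
    return x

key = jax.random.PRNGKey(0)
tokens = jax.random.normal(key, (16, 64))        # 16 tokens, 64-dim embeddings
Wq = 0.02 * jax.random.normal(key, (64, 64))
Wk = 0.02 * jax.random.normal(key, (64, 64))
memories = 0.02 * jax.random.normal(key, (128, 64))
refined = et_block(tokens, Wq, Wk, memories)

In this sketch the same parameters are reused at every step, so the forward pass amounts to iterating a dynamical system over the tokens rather than applying independently parameterized layers; this is the sense in which attention updates can be read as minimizing an energy function over token relationships.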
Submission Number: 29