Thermodynamic Binding: Freezing Chimeric States in Multi-Modal Associative Memories

Published: 03 Mar 2026, Last Modified: 26 Mar 2026 · NFAM 2026 Poster · CC BY 4.0
Keywords: associative memory, multi-modal learning, binding problem, energy-based models, transformer attention
TL;DR: mTAM resolves the multi-modal binding problem by enforcing global representational consistency through a shared consensus bottleneck derived from a global energy functional.
Abstract: Multi-modal inference requires heterogeneous perceptual streams to converge to a single, internally consistent interpretation. Standard cross-attention does not enforce this consistency: each modality maintains an independent posterior over a shared memory bank, which admits chimeric states, namely stable configurations in which different modalities retrieve different prototypes from the shared memory. We introduce the Multi-modal Transformer Associative Memory (mTAM), an energy-based architecture that precludes chimeric states by construction. Its core mechanism, Consensus Split-Bank Attention (CSA), aggregates query-key evidence across modalities into a single global score, produces one shared distribution over memory, and broadcasts it synchronously to every modality. The resulting dynamics correspond to the Concave-Convex Procedure applied to a Difference-of-Convex energy, which guarantees monotonic descent and convergence of each trajectory to a stationary point. A graph-lifting construction maps the model to a Modern Hopfield Network and yields a topology-dependent critical load through an extreme-value capacity analysis in the spirit of the Random Energy Model. Synthetic experiments show retrieval transitions, one-step chimera resolution where standard baselines fail, and topology-dependent capacity scaling consistent with the theory.
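The abstract's description of Consensus Split-Bank Attention can be illustrated with a minimal sketch. This is not the authors' implementation: the function name, the dictionary-based interface, and the dot-product scoring are illustrative assumptions; the sketch only shows the stated mechanism of summing per-modality query-key evidence into one global score per memory slot, taking a single softmax, and broadcasting that one distribution to every modality's readout (so no two modalities can retrieve different prototypes).

```python
import numpy as np

def consensus_split_bank_attention(queries, keys, values, beta=1.0):
    """Hedged sketch of Consensus Split-Bank Attention (CSA).

    queries: dict modality -> (d,) query vector
    keys:    dict modality -> (N, d) key bank (one split per modality,
             all splits indexing the same N shared memory slots)
    values:  dict modality -> (N, d) value bank

    Each modality scores the same N slots; the scores are summed into a
    single global score, one softmax is taken, and the resulting shared
    distribution is broadcast to every modality's value readout.
    """
    modalities = list(queries)
    n_slots = next(iter(keys.values())).shape[0]

    # Aggregate query-key evidence across modalities into one global score.
    global_score = np.zeros(n_slots)
    for m in modalities:
        global_score += keys[m] @ queries[m]

    # One shared distribution over the memory bank (stable softmax).
    logits = beta * global_score
    p = np.exp(logits - logits.max())
    p /= p.sum()

    # Synchronous broadcast: every modality reads with the same weights,
    # ruling out chimeric retrievals by construction.
    readouts = {m: p @ values[m] for m in modalities}
    return readouts, p
```

Because the softmax is computed once on the aggregated score, a slot that is strongly supported by one modality and weakly contradicted by another still yields a single consensus weight, rather than two conflicting per-modality posteriors.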
Submission Number: 36