Parallel Manifold Steering: Efficient Adaptation of Large Associative Memories via Residual Energy Shaping

Published: 03 Mar 2026, Last Modified: 06 Mar 2026, NFAM 2026 Poster, CC BY 4.0
Keywords: Associative Memory, Energy-Based Models, Transformers, Parameter-Efficient Fine-Tuning (PEFT), Manifold Learning, Attractor Dynamics, Catastrophic Interference
TL;DR: We propose H-Res, a method for adapting frozen associative memories via residual vector fields that steer token trajectories into task-specific attractors, preserving the pre-trained energy landscape while eliminating prompt tuning's quadratic cost.
Abstract: Large Transformer models function as Dense Associative Memories (DAMs), retrieving knowledge via high-dimensional attractor dynamics driven by the self-attention mechanism. Adapting these frozen memory systems to new tasks, however, presents a fundamental stability-plasticity dilemma: current methods either risk catastrophic interference by modifying synaptic weights directly (e.g., LoRA) or degrade associative capacity by clogging the retrieval buffer with static prompt tokens (e.g., VPT). In this work, we propose H-Res (Hierarchical Residual Steering), a mechanism that modulates the effective energy landscape of the Transformer without altering its global equilibrium or expanding its sequence length. By formulating adaptation as a control problem on the activation manifold, H-Res learns a state-dependent vector field that steers token trajectories into task-specific basins of attraction. We formally prove that H-Res preserves the attention entropy of the foundation model and facilitates Neural Collapse. Empirically, H-Res outperforms global weight modification by 26% on associative retrieval tasks and eliminates the computational overhead of prompt-based methods, scaling effectively to structured domains.
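To make the steering idea concrete, below is a minimal PyTorch sketch of one plausible reading of the abstract: a small trainable module adds a state-dependent residual correction to the frozen model's activations, leaving the frozen weights and the sequence length untouched. The module name `ResidualSteeringField`, the low-rank bottleneck, the `tanh` nonlinearity, and the zero-initialized output projection are illustrative assumptions, not the paper's actual parameterization.

```python
import torch
import torch.nn as nn

class ResidualSteeringField(nn.Module):
    """Hypothetical sketch of a state-dependent residual vector field.

    Assumed reading of the abstract: a trainable map g(h) is added to the
    frozen model's hidden states, h' = h + alpha * g(h), steering token
    trajectories without modifying frozen weights or appending prompt
    tokens. The low-rank bottleneck is an illustrative assumption.
    """

    def __init__(self, d_model: int, rank: int = 16):
        super().__init__()
        self.down = nn.Linear(d_model, rank, bias=False)
        self.up = nn.Linear(rank, d_model, bias=False)
        nn.init.zeros_(self.up.weight)  # zero field at init: identity map, no steering
        self.alpha = nn.Parameter(torch.tensor(1.0))  # learned steering strength

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # The correction depends on the current activation h, so the
        # field can steer different states toward different attractors.
        return h + self.alpha * self.up(torch.tanh(self.down(h)))


# Usage: wrap a frozen block's output; only the steering field is trained.
d_model = 768
frozen_block = nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True)
for p in frozen_block.parameters():
    p.requires_grad_(False)  # pre-trained energy landscape stays fixed

steer = ResidualSteeringField(d_model)
x = torch.randn(2, 128, d_model)   # (batch, tokens, dim); no extra prompt tokens
h = steer(frozen_block(x))         # steered activations, same sequence length
```

Because no tokens are appended, the attention computation is identical to the frozen model's, which is consistent with the claim that the method avoids the sequence-length overhead of prompt-based adaptation; the zero initialization keeps the adapted model exactly at the pre-trained equilibrium before training begins.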
Submission Number: 30