Keywords: Transformer, Lipschitz stability, channel coupling, Ising model
Abstract: Statistical physics has played a pivotal role in the formulation of neural networks and in understanding their behaviour. However, efforts to apply physical principles to the transformer architecture remain underexplored. In this work, we first show that spectral feature learning with self-attention is prone to instability. Inspired by the Ising model, we then propose a transformer-based network that uses adjacently coupled spectral attention to learn the spectral mapping from RGB images. We further analyse its stability using the theory of Lipschitz constants. The method is evaluated and compared against state-of-the-art methods on multiple standard datasets.
Submission Number: 4