Adaptive Complex Wavelet Informed Transformer Operator

Xiaotong Li, Licheng Jiao, Fang Liu, Shuyuan Yang, Hao Zhu, Xu Liu, Lingling Li, Wenping Ma

Published: 2025, Last Modified: 25 Mar 2026, IEEE Trans. Multim. 2025, CC BY-SA 4.0
Abstract: Visual transformers have achieved great success in representation learning, mainly due to efficient token dependency modeling via self-attention. However, the computational burden increases sharply as the number of input pixels grows. Although recent Fourier-based global frequency-domain mixing methods attempt to improve the efficiency of transformers for high-resolution image inputs, the Fourier operator has limited ability to capture local geometric structure. Complex wavelets, by contrast, can perform local attention in both the spatial domain and the frequency domain. We therefore propose the complex wavelet informed transformer operator, which uses the real and imaginary wavelets of the dual-tree complex wavelet transform to simulate the interaction in the attention kernel. To further reduce the computational burden of the operators, we introduce an adaptive local block shared attention mechanism in the channel domain for our wavelet informed operators. We then construct a deep multi-head operator network consisting of a hybrid stack of complex wavelet informed transformer operators and self-attention layers. This enables the Transformer to capture multi-scale and multi-directional structured features more sparsely while learning dependencies. Extensive experimental results show that our adaptive complex wavelet informed transformer operator, under the Transformer architecture, achieves highly competitive accuracy on multiple image classification benchmark datasets. The proposed operators can also be flexibly and effectively migrated to vision tasks in dynamic video scenarios.