Complex Dual-Tree Pyramid Scattering Transformer

Xiaotong Li, Licheng Jiao, Lingling Li, Fang Liu, Hao Zhu, Xin Zhang, Xu Liu, Shuyuan Yang

Published: 2025, Last Modified: 25 Mar 2026. IEEE Trans. Neural Networks Learn. Syst. 2025. License: CC BY-SA 4.0
Abstract: Attention-based Transformer networks have recently played an increasingly important role in computer vision tasks. However, because pixel-by-pixel attention imposes no constraint assumptions such as spatial invariance, its computational complexity grows quadratically with the number of input pixels. This article therefore proposes a complex pyramid scattering Transformer in dense scale space, which introduces sparse scattering constraints using a small number of wavelet basis parameters. It enhances the Transformer’s flexibility and sparsity in multiscale space and, to a certain extent, slows the growth in computational complexity caused by multiresolution input. In addition, compared with the general single-tree real wavelet transform, the dual-tree complex scattering method mitigates aliasing in the scattering attention layer and helps obtain a more robust feature representation. At the same time, the multihead stepwise pyramid scattering coupling mechanism enriches the directional priors. We conduct experiments in image classification and video tracking scenarios and verify the reliability and superiority of our dual-tree complex pyramid scattering Transformer for visual tasks with different scale requirements. At the same parameter scale, it outperforms the baseline Transformer and other advanced wavelet scattering networks. The code is available at https://github.com/Dawn5786/CPSTFormer.
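The quadratic growth mentioned in the abstract can be made concrete with a little arithmetic. The sketch below is not taken from the paper or its repository: the helper names (`num_tokens`, `attention_pairs`, `decimated_size`) are hypothetical, and the 2x decimation per level is a generic assumption about wavelet-style pyramids rather than a description of the authors' scattering layer. It only counts query-key pairs in full self-attention.

```python
def num_tokens(height, width, patch=1):
    """Number of attention tokens when an image is cut into patch x patch tiles."""
    return (height // patch) * (width // patch)


def attention_pairs(height, width, patch=1):
    """Query-key pairs in full self-attention: N * N for N tokens."""
    n = num_tokens(height, width, patch)
    return n * n


def decimated_size(height, width, levels):
    """Spatial size after `levels` of 2x decimation (illustrative assumption)."""
    return height >> levels, width >> levels


# Doubling the resolution quadruples the tokens and multiplies the pairs by 16.
base = attention_pairs(32, 32)
doubled = attention_pairs(64, 64)
ratio = doubled // base

# One level of 2x decimation brings a 64x64 input back to the 32x32 cost.
coarse_h, coarse_w = decimated_size(64, 64, 1)
coarse = attention_pairs(coarse_h, coarse_w)

print(ratio, coarse == base)
```

Under these assumptions, attending within coarser pyramid levels instead of over every input pixel is one way the cost increase from higher-resolution input could be slowed; the actual mechanism is described in the paper and the linked code.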