Abstract: Highlights
• A hierarchical spatial-frequency transformer is proposed for the first time.
• k-NN-based self- and cross-attention promote both local and global information.
• The proposed fusion transformer performs well and can support other similar tasks.
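
To illustrate the k-NN-based attention mentioned in the highlights, the following is a minimal sketch of one common formulation, in which each query attends only to its k most similar keys. The highlights do not give the authors' exact formulation, so the function name `knn_attention`, the scaled dot-product similarity, and the top-k selection rule are all assumptions made for illustration.

```python
import math


def knn_attention(queries, keys, values, k):
    """Top-k (k-NN) attention over lists of vectors.

    Hypothetical helper, not the paper's implementation: each query
    attends only to the k keys with the highest scaled dot-product score.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        # Scaled dot-product similarity between this query and every key.
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d)
                  for key in keys]
        # Indices of the k most similar keys; all other keys get zero weight.
        top = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)[:k]
        # Numerically stable softmax restricted to the selected keys.
        m = max(scores[j] for j in top)
        exp = {j: math.exp(scores[j] - m) for j in top}
        z = sum(exp.values())
        # Weighted sum of the selected value vectors.
        out.append([sum(exp[j] / z * values[j][c] for j in top)
                    for c in range(len(values[0]))])
    return out
```

Under these assumptions, self-attention corresponds to drawing `queries`, `keys`, and `values` from the same feature set, while cross-attention takes `queries` from one branch and `keys` and `values` from another. Setting `k` to the number of keys recovers ordinary dense softmax attention.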