Abstract: Highlights
• A Siamese feature extractor is proposed to jointly extract static and motion features.
• A deep level set method is used to bridge the semantic gap.
• A cross-attention transformer is proposed to refine and fuse the static and motion features.
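The third highlight can be illustrated with a minimal sketch of cross-attention fusion between a static (appearance) feature stream and a motion feature stream. It assumes single-head scaled dot-product attention, residual connections, and channel-wise concatenation. The function names (`cross_attention`, `fuse`), the `params` layout, and the toy sizes are hypothetical and are not taken from the paper. The highlights do not describe the actual transformer design (number of heads, normalization, feed-forward layers).

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product cross-attention.

    queries: (Nq, D) tokens that ask; context: (Nc, D) tokens that answer.
    Returns an (Nq, D) array: each query token becomes a weighted mix of context values.
    """
    q = queries @ w_q
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (Nq, Nc) similarity matrix
    return softmax(scores, axis=-1) @ v


def fuse(static, motion, params):
    """Hypothetical bidirectional fusion: each stream is refined by attending to the other."""
    # Static tokens query the motion tokens, with a residual connection.
    static_ref = static + cross_attention(static, motion, *params["static_to_motion"])
    # Motion tokens query the static tokens, symmetrically.
    motion_ref = motion + cross_attention(motion, static, *params["motion_to_static"])
    # Concatenate the refined streams along the channel axis.
    return np.concatenate([static_ref, motion_ref], axis=-1)


# Toy usage with random features: 16 spatial tokens, 8 channels per stream.
rng = np.random.default_rng(0)
N, D = 16, 8
static = rng.standard_normal((N, D))
motion = rng.standard_normal((N, D))
params = {
    name: tuple(0.1 * rng.standard_normal((D, D)) for _ in range(3))
    for name in ("static_to_motion", "motion_to_static")
}
fused = fuse(static, motion, params)
print(fused.shape)  # (16, 16)
```

Attending in both directions lets each stream draw context from the other, while the residual path preserves its original signal. If the value projections are set to zero, the fusion reduces to plain concatenation.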