Abstract: Highlights•Based on CNN and Transformer, the FAFuse framework can effectively integrate fine-grained local features and high-level semantic context information through FAF.•Four-Axis attention can effectively capture non-local and multi-directional contextual information. At the same time, long-range dependencies between feature maps can be captured.
Loading