Abstract: Unlike general semantic segmentation, aerial image segmentation poses its own particular challenges, three of the most prominent being large variation in object scale, many tiny objects scattered across a complex background, and imbalance between foreground and background. Previous affinity learning-based methods introduced intractable background noise and lost key-point information because of the additional interaction between features of different levels in their Feature Pyramid Network (FPN)-like structures, which led to inferior results. We argue that multi-scale information can be further exploited within each FPN level individually, without cross-level interaction, and therefore propose a Multi-scale Attention Cascade (MAC) model that leverages local spatial contextual information through a non-overlapping window self-attention module with multiple window sizes, mitigating the effect of the complex and imbalanced background. Moreover, the multi-scale contextual cues are propagated in a cascade manner to tac
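The core operation described above, self-attention restricted to non-overlapping windows of several sizes and applied to a single FPN level without cross-level interaction, can be illustrated with a minimal NumPy sketch. This is not the authors' MAC implementation. Several details are assumptions made only for illustration: queries, keys and values are the raw features (a real model would use learned projections), each stage is added back residually, the window sizes `(2, 4, 8)` are hypothetical, and "cascade" is read as running the stages sequentially from small to large windows. The abstract is truncated and does not specify any of these choices.

```python
import numpy as np


def window_partition(x: np.ndarray, w: int) -> np.ndarray:
    """Split an (H, W, C) map into non-overlapping w x w windows -> (num_windows, w*w, C)."""
    h, wd, c = x.shape
    assert h % w == 0 and wd % w == 0, "H and W must be divisible by the window size"
    x = x.reshape(h // w, w, wd // w, w, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (nH, nW, w, w, C)
    return x.reshape(-1, w * w, c)


def window_reverse(windows: np.ndarray, w: int, h: int, wd: int) -> np.ndarray:
    """Inverse of window_partition: (num_windows, w*w, C) -> (H, W, C)."""
    c = windows.shape[-1]
    x = windows.reshape(h // w, wd // w, w, w, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (nH, w, nW, w, C)
    return x.reshape(h, wd, c)


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def window_self_attention(x: np.ndarray, w: int) -> np.ndarray:
    """Scaled dot-product self-attention computed independently inside each w x w window.

    Assumption: identity Q/K/V projections (no learned weights), for illustration only.
    """
    h, wd, c = x.shape
    tokens = window_partition(x, w)                                # (N, w*w, C)
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(c)       # (N, w*w, w*w)
    out = softmax(scores, axis=-1) @ tokens                        # (N, w*w, C)
    return window_reverse(out, w, h, wd)


def multi_scale_window_attention(x: np.ndarray, window_sizes=(2, 4, 8)) -> np.ndarray:
    """Hypothetical cascade over one FPN level: each stage attends within windows of
    one size and feeds its (residually updated) output to the next, larger-window stage.
    No features from other FPN levels are used."""
    out = x
    for w in window_sizes:
        out = out + window_self_attention(out, w)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat = rng.standard_normal((16, 16, 32))  # one FPN level as (H, W, C)
    out = multi_scale_window_attention(feat, window_sizes=(2, 4, 8))
    print(out.shape)  # (16, 16, 32)
```

Because attention is confined to each window, a pixel in one window cannot influence the output of another window within a single stage. Context grows only as the window size increases across the cascade, which is one way to gather multi-scale local context inside a single pyramid level.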