DiagSWin: A multi-scale vision transformer with diagonal-shaped windows for object detection and segmentation
Abstract: Highlights
• Diagonal-shaped window attention has lower computational cost and fewer parameters.
• Combines multi-scale feature extraction within a single self-attention layer.
• The proposed method can easily capture multi-scale objects in high-resolution images.