Abstract: This paper proposes an approach to improving object detection in high-resolution images using the Detection Transformer (DETR) with an MLP-like architecture. The method uses the Vision Permutator as the backbone network, adopts focal loss as the loss function, and incorporates local attention mechanisms into the encoder and decoder blocks of the Transformer. With these components, the enhanced Detection Transformer achieves significant improvements in detection accuracy on high-resolution imagery while mitigating information loss during processing. Experimental results demonstrate the effectiveness of the proposed approach and indicate its potential for applications in computer vision and surveillance systems. Source code is available at: https://github.com/nguyendung622/iwis2024
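For reference, focal loss is commonly defined (Lin et al., 2017) as FL(p_t) = -α_t (1 - p_t)^γ log(p_t), where p_t is the predicted probability of the true class. The modulating factor (1 - p_t)^γ down-weights easy, well-classified examples so training concentrates on hard ones. The sketch below implements the binary (sigmoid) form, assuming the default α = 0.25 and γ = 2 from that work. The exact variant, normalization, and hyperparameters used in the linked repository are not stated in this abstract and may differ; the function names are illustrative only.

```python
import math


def sigmoid_focal_loss(logit: float, target: int,
                       alpha: float = 0.25, gamma: float = 2.0) -> float:
    """Binary focal loss for one prediction (illustrative sketch).

    logit:  raw classifier score for the positive class
    target: 1 for a positive example, 0 for a negative one
    alpha, gamma: assumed defaults from the original focal loss paper
    """
    p = 1.0 / (1.0 + math.exp(-logit))           # sigmoid probability
    p_t = p if target == 1 else 1.0 - p           # probability of the true class
    alpha_t = alpha if target == 1 else 1.0 - alpha
    p_t = min(max(p_t, 1e-12), 1.0)               # clamp to avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(logits: list[float], targets: list[int], **kw) -> float:
    """Average focal loss over a batch of independent binary predictions."""
    losses = [sigmoid_focal_loss(z, t, **kw) for z, t in zip(logits, targets)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # A confident correct prediction contributes far less than an uncertain one.
    print(sigmoid_focal_loss(4.0, 1), sigmoid_focal_loss(0.0, 1))
```

Setting γ = 0 recovers α-weighted binary cross-entropy, which makes it easy to compare the two objectives on the same predictions.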