Transformer with MLP-like Approach for Improving Object Detection Efficiency

Published: 01 Jan 2024, Last Modified: 19 Jun 2025 · IWIS 2024 · CC BY-SA 4.0
Abstract: This paper proposes an approach to improving object detection performance on high-resolution images using the Detection Transformer (DETR) model with an MLP-like architecture. The method adopts the Vision Permutator model as the backbone network, uses focal loss as the loss function, and incorporates local attention mechanisms within the Encoder-Decoder blocks of the Transformer architecture. With these components, the enhanced Detection Transformer model achieves notable gains in detecting objects in high-resolution imagery while mitigating information loss during processing. Experimental results demonstrate the effectiveness of the proposed approach and highlight its potential for applications in computer vision and surveillance systems. Source code is available at: https://github.com/nguyendung622/iwis2024
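The linked repository contains the authors' actual implementation; as a rough illustration of the focal-loss component mentioned in the abstract, here is a minimal sketch of the standard binary focal loss (Lin et al., 2017). The function name, parameter defaults, and NumPy usage are assumptions for illustration, not the paper's code:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).
    Down-weights well-classified examples so training focuses on
    hard ones (a common motivation for using it in detection).
    p: predicted probability of the positive class, y: label in {0, 1}.
    Defaults alpha=0.25, gamma=2.0 follow the original focal-loss paper,
    not necessarily this work's settings."""
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
```

With gamma = 0 and alpha = 1 this reduces to plain cross-entropy; increasing gamma shrinks the loss on confident correct predictions relative to hard ones.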