Abstract: Accurate traffic sign recognition is crucial for enhancing safety in autonomous driving systems. However, recognizing traffic signs in natural scenes is challenging because of factors such as dim lighting, occlusion, and blur, which make traditional deep learning algorithms inefficient: these algorithms need a large number of parameters in their network models to achieve high recognition accuracy. Balancing parameter count against accuracy to build an efficient road traffic sign recognition system is therefore an important research topic in autonomous driving. To address these challenges, we introduce a novel architecture named Dynamic Feature Extraction-Efficient Vision Transformer (DFE-EViT). The design comprises two components: a dynamic feature extraction network and an Efficient Vision Transformer (EViT) classifier. By combining local information with global receptive fields, the architecture is equipped to handle intricate and dynamic traffic sign recognition scenarios. Experimental results show that the proposed method improves the efficiency of road traffic sign recognition in natural scenes while keeping the network small and accurate: the model has only 0.859M parameters, yet its accuracy reaches 98.4%.
External IDs: dblp:conf/icai4/GeNCZ23