LAtt-Yolov8-seg: Video Real-time Instance Segmentation for Urban Street Scenes Based on Focused Linear Attention Mechanism

Xinqi Zhang, Tuo Dong, Liqi Yan, Zhenglei Yang, Jianhui Zhang

Published: 2024, Last Modified: 19 Feb 2025, CVDL 2024, CC BY-SA 4.0

Abstract: Instance segmentation models with complex architectures and large parameter counts have recently achieved impressive precision. From a practical perspective, however, a balance between precision and speed is more desirable, and real-time instance segmentation faces both efficiency and quality challenges in complex urban street scenes. We propose LAtt-Yolov8-seg, a model based on YOLOv8-seg. Its key contribution is the introduction of Focused Linear Attention, a mechanism that reduces the computational complexity of traditional attention while maintaining representational capacity. First, a focusing function adjusts the directions of the query and key features, pulling similar features together and pushing dissimilar features apart, thereby mimicking the distribution of Softmax attention. Second, depthwise convolutions restore the rank of the linear attention matrix, improving feature diversity. On the Cityscapes dataset, LAtt-Yolov8-seg achieves the best balance between real-time performance and segmentation quality among the convolutional and transformer models compared. This work provides an effective and practical instance segmentation solution for resource-constrained real-world applications.
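The two steps of Focused Linear Attention described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and the abstract gives no formulas, so several details are assumptions: the focusing function is taken to be f_p(x) = (||x|| / ||x^p||) * x^p applied to ReLU features with p = 3, the depthwise convolution runs in 1-D along the token axis rather than in 2-D over a feature map, and the names focus, depthwise_conv1d, and focused_linear_attention are hypothetical.

```python
import numpy as np


def focus(x, p=3, eps=1e-6):
    """Assumed focusing function: sharpen each row's direction, keep its norm."""
    x = np.maximum(x, 0.0) + eps              # non-negative features (ReLU)
    xp = x ** p                               # element-wise power sharpens direction
    scale = (np.linalg.norm(x, axis=-1, keepdims=True)
             / np.linalg.norm(xp, axis=-1, keepdims=True))
    return xp * scale                         # rescale back to the original norm


def depthwise_conv1d(v, kernel):
    """Per-channel filter along the token axis with zero padding.

    v: (N, d) values; kernel: (k, d) with odd k, one filter per channel.
    Computed as cross-correlation, as deep learning conv layers do.
    """
    n, d = v.shape
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(v, ((pad, pad), (0, 0)))
    out = np.zeros((n, d), dtype=float)
    for i in range(k):
        out += padded[i:i + n] * kernel[i]
    return out


def focused_linear_attention(q, k, v, kernel, p=3, eps=1e-6):
    """Linear attention with focused q/k plus a depthwise branch on v.

    q, k: (N, d); v: (N, d_v). K^T V is formed first, so the N x N
    attention matrix is never materialised.
    """
    q, k = focus(q, p), focus(k, p)
    kv = k.T @ v                              # (d, d_v)
    z = q @ k.sum(axis=0)                     # (N,) row normalisers
    out = (q @ kv) / (z[:, None] + eps)
    return out + depthwise_conv1d(v, kernel)  # branch intended to restore rank
```

Computing K^T V before multiplying by Q makes the cost grow linearly with the number of tokens N instead of quadratically. The result is the same as normalising the full N x N similarity matrix and then multiplying by V, up to floating-point error.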