Abstract: Panoptic segmentation, a crucial computer vision task for scene understanding, combines semantic and instance segmentation to classify every pixel and identify individual object instances in an image. Despite significant progress in the field, deep learning methods have yet to reach their full real-time potential. In this paper, we present a novel adaptation of the "You Only Segment Once" (YOSO) architecture designed for real-time panoptic segmentation. Our primary contribution is replacing YOSO's original ResNet50 backbone with the time-efficient STDC networks. The resulting "Real-time-YOSO" (RT-YOSO) model significantly improves real-time performance, making it well suited to robotic systems and autonomous driving scenarios. Thorough experiments on the Cityscapes dataset confirm the robustness and accuracy of our model, which achieves 59.2% PQ at 18.5 FPS and 51.2% PQ at 27.8 FPS. This establishes the best accuracy-speed trade-off among comparable methods: accuracy is maintained while FPS is, on average, 50% higher. The source code is publicly available at https://github.com/AbdallahReda/RT-YOSO.
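To illustrate the kind of backbone substitution the abstract describes, the following PyTorch sketch swaps a heavy feature extractor for a lightweight STDC-style one. This is a minimal sketch under stated assumptions: `STDCBackbone` is a simplified stand-in (real STDC networks stack Short-Term Dense Concatenate modules rather than plain conv stages), and the `yoso_model.backbone` attribute is a hypothetical name, not the actual YOSO or RT-YOSO API; see the linked repository for the real implementation.

```python
import torch
import torch.nn as nn

class STDCBackbone(nn.Module):
    """Illustrative stand-in for an STDC feature extractor.

    A real STDC network uses Short-Term Dense Concatenate modules; here
    we only mimic the interface a panoptic head consumes: an image in,
    multi-scale feature maps out. Stage widths loosely follow STDC's
    (32, 64, 256, 512, 1024).
    """
    def __init__(self, channels=(32, 64, 256, 512, 1024)):
        super().__init__()
        self.stages = nn.ModuleList()
        in_ch = 3
        for out_ch in channels:
            # Each stage halves spatial resolution (stride 2).
            self.stages.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2,
                          padding=1, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            ))
            in_ch = out_ch

    def forward(self, x):
        features = []
        for stage in self.stages:
            x = stage(x)
            features.append(x)
        # Keep the stride-8/16/32 maps, as panoptic heads typically expect.
        return features[-3:]

if __name__ == "__main__":
    backbone = STDCBackbone()
    feats = backbone(torch.randn(1, 3, 512, 1024))  # Cityscapes-like input
    print([tuple(f.shape) for f in feats])  # stride-8, -16, -32 feature maps
    # Hypothetical swap into a YOSO-style model (attribute name assumed):
    # yoso_model.backbone = STDCBackbone()
```

The speed gain comes from the backbone choice alone: the segmentation heads are untouched, so any backbone exposing the same multi-scale feature interface can be dropped in this way.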