Keywords: panoptic segmentation, semantic segmentation, convolutional networks, mobile models
TL;DR: Universal panoptic segmentation model with pure convolutions tailored for mobile devices.
Abstract: Universal panoptic segmentation models have achieved state-of-the-art quality by using transformers to predict masks. In mobile applications, however, transformer models are not computation-friendly due to their quadratic complexity with respect to input length. In this work, we present MaskConver, a unified panoptic and semantic segmentation model built with pure convolutions and optimized for mobile devices. We propose a novel lightweight mask embedding decoder that predicts mask embeddings, which are then used to predict a set of binary masks for both things and stuff classes. MaskConver achieves a \textbf{37.2\%} panoptic quality score on the COCO validation set, \textbf{6.4\%} better than Panoptic DeepLab with the same MobileNet backbone. After mobile-specific optimizations, MaskConver runs in real time at \textbf{30} FPS on a Pixel 6 while delivering a 29.7\% panoptic quality score, 10$\times$ faster than Panoptic DeepLab with the same backbone.
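The abstract's key mechanism, decoding mask embeddings into binary masks, matches the common embedding-times-pixel-features formulation used by mask-transformer models. Below is a minimal sketch of that idea with hypothetical shapes (N region embeddings of dimension D against a D×H×W convolutional feature map); none of the names or sizes come from the paper.

```python
import torch

# Hypothetical sizes: N predicted regions, embedding dim D, output H x W.
N, D, H, W = 100, 256, 160, 160

mask_embeddings = torch.randn(N, D)    # one embedding per thing/stuff region
pixel_features = torch.randn(D, H, W)  # per-pixel features from a conv decoder

# Each mask logit map is the dot product of a mask embedding with every
# pixel feature -- equivalent to a 1x1 convolution, so it stays conv-friendly.
mask_logits = torch.einsum('nd,dhw->nhw', mask_embeddings, pixel_features)

# Sigmoid gives soft binary masks covering things and stuff uniformly.
binary_masks = mask_logits.sigmoid() > 0.5
```

Because this decoding step reduces to a matrix multiply (a 1×1 convolution over the feature map), it involves no attention, which is consistent with the abstract's pure-convolution, mobile-friendly claim.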
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Applications (e.g., speech processing, computer vision, NLP)