Referring Expression Matters: Multi-referring Feature Aggregation for Referring Video Object Segmentation

24 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: Referring Video Object Segmentation, Referring expression segmentation, Multimodal representation learning
Abstract: Referring Video Object Segmentation aims to segment the object instances in a video sequence that are referred to by natural language expressions. This interaction style is simple and flexible, and it can produce high-quality segmentation masks. However, the expressions provided by users vary unpredictably, so existing state-of-the-art models still misidentify the referred object. To address this issue, we present a novel referring video object segmentation network that takes multiple referring expressions as input. Specifically, we propose a simple but effective neural expression generation module that maps the features of multiple referring expressions to complementary features with less redundancy. This interaction among multiple referring expressions not only helps identify the referred object but also speeds up training convergence. We evaluate the proposed method on popular referring video object segmentation datasets; experimental results show that it outperforms state-of-the-art methods by a significant margin in segmentation quality and achieves considerable gains in training convergence speed. Our code and pre-trained models will be made available.
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8803