Synonym relations affect object detection learned on vision-language data

Anonymous

16 Dec 2023 (ACL ARR 2023 December Blind Submission)
Readers: Everyone
TL;DR: We introduce two ways to improve the performance of vision-language open-vocabulary object detectors when synonyms are used to refer to objects.
Abstract: We analyze whether multi-modal object detectors trained on vision-language data learn effective visual representations for synonyms. Since many current vision-language models accept user-provided textual input, we highlight the need for such models to learn feature representations that are robust to changes in how that input is phrased. Specifically, we analyze changes in the synonyms used to refer to objects. We study object detectors trained on vision-language data and investigate how to make their performance less dependent on which synonym is used to refer to an object. We propose two approaches to achieve this goal: data augmentation by back-translation and class embedding enrichment. We show the promise of these approaches, improving mAP@0.5 on synonyms from 33.87% to 37.93%.
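The abstract names class embedding enrichment but does not say how it is carried out. As a minimal sketch of one possible reading, assumed here purely for illustration and not taken from the paper, the text embedding of a class name can be averaged with the embeddings of its synonyms and renormalized, so that a synonym query lies closer to the class prototype. The function names and the toy 3-dimensional vectors below are hypothetical stand-ins for the outputs of a real text encoder.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def enrich_class_embedding(name_embedding, synonym_embeddings):
    """Assumed enrichment rule: mean of the unit-normalized embeddings of the
    class name and its synonyms, renormalized to unit length."""
    vectors = [l2_normalize(v) for v in [name_embedding, *synonym_embeddings]]
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all embeddings must have the same dimension")
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return l2_normalize(mean)


# Hypothetical toy embeddings; a real system would obtain these from a text encoder.
couch = [1.0, 0.0, 0.0]  # class name
sofa = [0.0, 1.0, 0.0]   # synonym

enriched = enrich_class_embedding(couch, [sofa])

# The enriched class embedding is more similar to the synonym than the original one.
print(cosine(couch, sofa), cosine(enriched, sofa))
```

In this toy setup the original class embedding is orthogonal to the synonym, while the enriched embedding sits halfway between the two, which is the kind of effect such an enrichment step would aim for.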
Paper Type: short
Research Area: Multimodality and Language Grounding to Vision, Robotics and Beyond
Contribution Types: Model analysis & interpretability, Data analysis
Languages Studied: English