Keywords: Vision, Segmentation
TL;DR: Utilizing the visual properties of target objects to achieve better feature representations of them.
Abstract: In recent years, large vision models have advanced significantly and excel at tasks such as detection, segmentation, and tracking, partly because they learn good representations of visual objects. Although the recently proposed SAM (Segment Anything Model) and the one/few-shot models built on it are applicable to a wide range of tasks, researchers have found that they perform poorly on certain downstream tasks. In this paper, we focus on a specific group of such objects, which can be summarized as glass-like objects, and quantitatively study the inadequacies in vision models' feature representations of glass-like objects using the representation accuracy (RA) metric we propose. We then propose a novel, extremely simple method that introduces almost no additional computation to address these inadequacies. The main idea is to exploit the visual properties of the target objects to find the representation dimensions that dominate in recognizing them, and to leverage this information to obtain better representations of the target objects. Evaluated with representation accuracy, and using these representations as references in one-shot segmentation tasks, our experiments demonstrate the substantial effectiveness of our method.
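The abstract does not spell out how the dominant representation dimensions are found or reused, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes per-pixel backbone features (e.g., from a SAM image encoder), scores each channel by its foreground/background separation on a reference image, and up-weights the top channels; the function name, the gap-based scoring, and the 2x boost are all illustrative choices.

```python
import torch

def reweight_dominant_dims(feats, ref_feats, ref_mask, top_k=64):
    """Hypothetical sketch: emphasize feature dimensions that dominate
    in recognizing a target object (e.g., a glass-like object).

    feats:     (C, H, W) backbone features of the query image
    ref_feats: (C, H, W) backbone features of the reference image
    ref_mask:  (H, W) binary mask of the target object in the reference
    """
    mask = ref_mask.bool()
    fg = ref_feats[:, mask].mean(dim=1)    # (C,) foreground mean per channel
    bg = ref_feats[:, ~mask].mean(dim=1)   # (C,) background mean per channel

    # Assumption: channels with the largest foreground/background gap
    # "dominate" recognition of the target object.
    gap = (fg - bg).abs()
    top = gap.topk(top_k).indices

    weight = torch.ones(ref_feats.shape[0])
    weight[top] = 2.0  # boost dominant channels; the factor is illustrative

    query = feats * weight[:, None, None]  # reweighted query features
    proto = fg * weight                    # reweighted foreground prototype
    return query, proto
```

Under these assumptions, cosine similarity between the reweighted query features and the prototype would yield a one-shot segmentation heat map, which is consistent with the abstract's use of the improved representations as references in one-shot segmentation.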
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3125