Keywords: scene graph generation, visual relationship detection, visual scene understanding
Abstract: As an important component of visual scenes, visual relationships have received extensive attention in recent years.
Most existing works directly use coarse visual appearance to represent visual relationships.
Although these works have made tremendous progress, the study of visual relationships remains far from perfect.
This common approach has three potential problems:
1) Similar spatial layouts aggravate the ambiguity of predicate representations.
2) The differences between many visual relationships are subtle.
3) The approach lacks interpretability.
To address these problems, we propose a novel method, Progressive Visual Relationship Inference (PVRI), which considers both coarse visual appearance and fine-grained visual cues to infer visual relationships progressively.
It consists of the following three steps:
1) Known Cues Collection: we use a Large Language Model (LLM) to collect cues that may help infer visual relationships;
2) Unknown Cues Extraction: we design an Unknown Cues Extraction (UCE) strategy to extract cues that are not defined in text;
3) Progressive Inference: we use the obtained cues to infer visual relationships.
We demonstrate the effectiveness and efficiency of our method on the Visual Genome and Open Images V6 datasets.
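The abstract does not specify how cues are represented or how the three steps are fused. As a rough illustration only, the Python sketch below assumes that every predicate has a coarse appearance prototype, that "known" (LLM-described) cues and "unknown" (learned) cues are embedding vectors, and that the stages are combined by adding cosine similarities. The function names, toy features, and additive fusion are all hypothetical and are not the PVRI implementation.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Numerically stable softmax over predicate scores."""
    top = max(scores.values())
    exps = {p: math.exp(s - top) for p, s in scores.items()}
    total = sum(exps.values())
    return {p: e / total for p, e in exps.items()}


def progressive_inference(
    visual: Vector,
    prototypes: Dict[str, Vector],
    known_cues: Dict[str, List[Vector]],
    unknown_cues: Dict[str, Vector],
) -> List[Dict[str, float]]:
    """Hypothetical 3-stage scoring; returns the predicate distribution after each stage."""
    # Stage 1: coarse appearance, i.e. similarity to a per-predicate prototype.
    scores = {p: cosine(visual, proto) for p, proto in prototypes.items()}
    stages = [softmax(scores)]
    # Stage 2: add the mean similarity to text-derived ("known") cue embeddings.
    for p in scores:
        cues = known_cues.get(p, [])
        if cues:
            scores[p] += sum(cosine(visual, c) for c in cues) / len(cues)
    stages.append(softmax(scores))
    # Stage 3: add the similarity to a learned ("unknown") cue vector (hypothetical).
    for p in scores:
        if p in unknown_cues:
            scores[p] += cosine(visual, unknown_cues[p])
    stages.append(softmax(scores))
    return stages


# Toy 3-d features (made up): [vertical contact, bent legs, straight legs].
visual = [1.0, 0.6, 0.0]
prototypes = {"sitting on": [1.0, 0.0, 0.0], "standing on": [1.0, 0.1, 0.0]}
known_cues = {"sitting on": [[0.0, 1.0, 0.0]], "standing on": [[0.0, 0.0, 1.0]]}
unknown_cues = {"sitting on": [0.5, 0.5, 0.0], "standing on": [0.5, 0.0, 0.5]}

stages = progressive_inference(visual, prototypes, known_cues, unknown_cues)
for i, dist in enumerate(stages, start=1):
    print(i, max(dist, key=dist.get))
# 1 standing on
# 2 sitting on
# 3 sitting on
```

In this toy setting the coarse stage prefers "standing on" because the two prototypes are nearly identical, and adding the text-derived "bent legs" cue flips the decision to "sitting on". The numbers are invented purely to show how fine-grained cues can refine a coarse prediction; additive fusion was chosen for brevity.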
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1428