LLM2Labels: Zero-shot dataset summarization and labeling using foundation LLMs

21 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: Vocabulary Creation, zero-shot multi-label classification
Abstract: We introduce LLM2Labels, a framework for systematically generating a label vocabulary tailored to image segmentation over an entire image dataset, leveraging the capabilities of Visual Language Models (VLMs) and Large Language Models (LLMs). Our methodology unfolds in two stages. First, per-image processing covers the Image Label Proposal and Filtering stage, comprising the Label Proposal Module (LPM) and the Label Filtering Module (LFM): the LPM employs VLMs to suggest candidate labels for each image, taking the context of the task into account, and the suggested labels then undergo rigorous filtering in the LFM, guided by a predetermined filtering strategy. Second, the Logical Grouping stage leverages well-established LLMs, notably Llama 2, to categorize the filtered candidate labels logically, organizing them into coherent groups akin to WordNet synonym sets. We assess the effectiveness of our framework on segmentation datasets, focusing primarily on ground-truth segmentation labels in a closed-set scenario while also revisiting open-set evaluation. Notably, this research pioneers the application of VLMs and LLMs to zero-shot vocabulary discovery without manual annotators or domain experts. Our results reveal performance levels that rival trained closed-set multi-label classification while surpassing naive zero-shot models. This work marks a significant step in harnessing advanced language models for vocabulary generation in computer vision; beyond its immediate application to vocabulary creation for image segmentation, it promises to benefit image analysis and research across the field.
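The two-stage pipeline described in the abstract can be sketched as follows. The paper provides no code, so all names here are illustrative assumptions: the VLM-backed Label Proposal Module is replaced by a trivial caption tokenizer, the Label Filtering Module uses a simple cross-image frequency threshold as one possible "predetermined filtering strategy", and the LLM-driven Logical Grouping stage (Llama 2 in the paper) is stood in for by a fixed synonym map.

```python
from collections import Counter

# Hypothetical sketch of the LLM2Labels pipeline; module names follow the
# abstract, but every implementation detail below is an illustrative stand-in.

STOPWORDS = {"a", "an", "the", "on", "and", "near"}

def label_proposal_module(image_captions):
    """Stand-in for the VLM-based LPM: propose candidate labels per image.
    A real implementation would query a VLM with the task context."""
    return [w.strip(".,").lower()
            for caption in image_captions
            for w in caption.split()
            if w.lower() not in STOPWORDS]

def label_filtering_module(candidates, min_count=2):
    """Stand-in for the LFM: keep labels proposed for at least `min_count`
    images (one possible predetermined filtering strategy)."""
    counts = Counter(candidates)
    return {label for label, n in counts.items() if n >= min_count}

def logical_grouping(filtered_labels, synonym_map):
    """Stand-in for the Logical Grouping stage: an LLM would cluster labels
    into WordNet-like synonym sets; here a fixed map plays that role."""
    groups = {}
    for label in filtered_labels:
        canonical = synonym_map.get(label, label)
        groups.setdefault(canonical, set()).add(label)
    return groups

# Toy dataset: one caption per image.
captions = ["a dog on grass", "a puppy and a dog", "a puppy on grass"]
candidates = label_proposal_module(captions)            # per-image proposals
filtered = label_filtering_module(candidates)           # {'dog', 'puppy', 'grass'}
vocab = logical_grouping(filtered, {"puppy": "dog"})    # synonym-set grouping
```

With the toy synonym map, `vocab` groups "puppy" under "dog" while "grass" stays its own set, mirroring how the Logical Grouping stage would collapse the filtered candidates into a compact closed-set vocabulary.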
Primary Area: generative models
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 3328