Keywords: cifar, objects, classification, computer-vision, llm, context, feedback, imagenet
TL;DR: We introduce a system that uses LLM-generated context hints in feedback loops to improve object classification.
Abstract: In this work, we present ELF (Evolving LLM-Based Schemas for Mid-Vision Feedback), a framework that integrates schema evolution with Mid-Vision Feedback (MVF) for visual learning. We leverage Large Language Models (LLMs) to automatically generate schemas: executable semantic programs operating over sets of context categories (e.g., "animate" or "inanimate"). We integrate schemas into visual processing via MVF, a method that uses top-down feedback connections to inform mid-level visual processing with high-level contextual knowledge. To optimize these schemas, we use EvoPrompt, an evolutionary algorithm that refines schemas through iterative search, yielding improvements in accuracy and contextual consistency. We demonstrate the effectiveness of ELF across multiple datasets and multiple architectures for the task of object classification.
Primary Area: applications to computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8940