DemoReranker: Enhancing the In-context Learning Capability of Multi-modal Large Models via Demonstration Reranking
Keywords: large vision-language models, in-context learning, visual question answering
Abstract: When deploying Large Multi-modal Models (LMMs), researchers and practitioners often rely on simplistic strategies for in-context learning (ICL), such as reusing a fixed set of demonstrations across diverse samples or retrieving candidates directly with the CLIP model. These heuristics do not guarantee that the selected demonstrations are the ones most useful to the LMM. To bridge this gap, we introduce DemoReranker, a novel framework that fine-tunes a specialized reranker module to improve demonstration selection for LMMs. First, we assess the quality of each candidate demonstration by measuring its influence on the model's output. Second, our reranker adds a scoring head on top of the CLIP embedding model to evaluate the compatibility between a test sample and each candidate demonstration. Third, we optimize the reranker with a list-wise ranking loss while keeping the CLIP backbone frozen. Extensive experiments on 7 datasets spanning 3 multi-modal tasks confirm that DemoReranker enhances LMM performance in ICL by reranking demonstrations to surface the most suitable candidates.
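A minimal PyTorch sketch of the architecture and objective the abstract describes, assuming a small MLP scoring head over precomputed (frozen) CLIP embeddings and a ListNet-style list-wise loss; the names (DemoScorer, listwise_ranking_loss), the head architecture, the embedding dimension, and the exact loss form are illustrative assumptions, not the authors' implementation:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DemoScorer(nn.Module):
        """Hypothetical scoring head over frozen CLIP embeddings: scores the
        compatibility of a test sample with each candidate demonstration."""

        def __init__(self, embed_dim: int = 512, hidden_dim: int = 256):
            super().__init__()
            # MLP over the concatenated (query, candidate) embeddings.
            self.head = nn.Sequential(
                nn.Linear(2 * embed_dim, hidden_dim),
                nn.GELU(),
                nn.Linear(hidden_dim, 1),
            )

        def forward(self, query_emb: torch.Tensor,
                    cand_embs: torch.Tensor) -> torch.Tensor:
            # query_emb: (D,) CLIP embedding of the test sample.
            # cand_embs: (K, D) CLIP embeddings of K candidate demonstrations.
            q = query_emb.unsqueeze(0).expand_as(cand_embs)        # (K, D)
            scores = self.head(torch.cat([q, cand_embs], dim=-1))  # (K, 1)
            return scores.squeeze(-1)                              # (K,)

    def listwise_ranking_loss(pred_scores: torch.Tensor,
                              target_scores: torch.Tensor) -> torch.Tensor:
        """ListNet-style top-1 loss: cross-entropy between the softmax
        distributions induced by predicted and target (influence) scores."""
        target_dist = F.softmax(target_scores, dim=-1)
        log_pred = F.log_softmax(pred_scores, dim=-1)
        return -(target_dist * log_pred).sum()

    # Usage: rank K=8 candidates for one test sample (dummy tensors).
    scorer = DemoScorer()
    query = torch.randn(512)          # precomputed CLIP embedding (frozen)
    candidates = torch.randn(8, 512)  # precomputed candidate embeddings
    influence = torch.randn(8)        # influence-based quality targets
    loss = listwise_ranking_loss(scorer(query, candidates), influence)
    loss.backward()  # gradients reach only the scoring head

Because the CLIP embeddings are precomputed and detached, gradients flow only into the scoring head, which matches the frozen-backbone training the abstract describes.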
Primary Area: foundation or frontier models, including LLMs
Submission Number: 13083