Keywords: VLM, LLM, FSDA, OOD
TL;DR: Instead of fine-tuning VLMs on novel concepts they can't represent, IVL extracts and reasons over human-interpretable visual traits from few examples.
Abstract: Few-shot visual reasoning requires models not only to learn from limited supervision but also to adapt across domains, including those far from pretraining distributions. Modern vision-language models (VLMs) such as Qwen and LLaVA excel at zero-shot tasks but collapse in distant out-of-distribution (OOD) settings, where standard adaptation methods provide limited gains. We introduce $\textbf{I}$nductive $\textbf{V}$isual $\textbf{L}$ogic (IVL), a trait-based reasoning framework that extracts visual traits through dual-mode prompting (semantic and low-level features) and organizes them into compact, interpretable dictionaries. At inference, IVL applies inductive–deductive reasoning over these traits and grounds predictions in transferable explanations without updating model weights. By reasoning over traits rather than memorizing examples, IVL enables training-free few-shot adaptation that explicitly addresses both near-domain shifts and distant OOD shifts. Experiments across multiple datasets demonstrate that IVL improves few-shot performance while producing more interpretable predictions. Our results and analysis highlight trait-level reasoning as a scalable and complementary path toward robust OOD adaptation in foundation-scale VLMs.
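To make the pipeline described in the abstract concrete, here is a minimal conceptual sketch of trait-based few-shot inference in that spirit. It assumes a generic VLM query interface; every name here (query_vlm, TRAIT_PROMPTS, extract_traits, build_trait_dictionary, classify) is a hypothetical illustration, not the authors' implementation, and the simple set-overlap matching stands in for the paper's inductive–deductive reasoning step.

```python
# Conceptual sketch only: trait extraction via dual-mode prompting, a per-class
# trait dictionary built from a few labeled examples, and training-free
# prediction by matching query traits against the dictionary. All names and
# prompts are placeholders, not the actual IVL implementation.

from collections import defaultdict


def query_vlm(image, prompt: str) -> str:
    """Hypothetical VLM wrapper: send an image and a text prompt, get text back."""
    raise NotImplementedError("Plug in a VLM such as Qwen-VL or LLaVA here.")


# Dual-mode prompts: one targeting semantic traits, one targeting low-level features.
TRAIT_PROMPTS = {
    "semantic": "List the distinctive semantic attributes visible in this image, comma-separated.",
    "low_level": "Describe low-level visual features (texture, color, shape, layout), comma-separated.",
}


def extract_traits(image) -> set:
    """Collect short trait phrases from an image under both prompting modes."""
    traits = set()
    for prompt in TRAIT_PROMPTS.values():
        answer = query_vlm(image, prompt)
        traits.update(t.strip().lower() for t in answer.split(",") if t.strip())
    return traits


def build_trait_dictionary(support_set) -> dict:
    """Induce a compact per-class trait dictionary from few labeled examples."""
    dictionary = defaultdict(set)
    for image, label in support_set:
        dictionary[label] |= extract_traits(image)
    return dict(dictionary)


def classify(query_image, dictionary):
    """Predict the class whose traits best overlap the query's traits.

    Returns the label together with the shared traits, which serve as an
    interpretable explanation for the prediction. No model weights are updated.
    """
    query_traits = extract_traits(query_image)
    best_label, best_overlap = None, set()
    for label, traits in dictionary.items():
        overlap = query_traits & traits
        if len(overlap) > len(best_overlap):
            best_label, best_overlap = label, overlap
    return best_label, best_overlap
```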
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 24