Keywords: VLM, LLM, FSDA, OOD
TL;DR: Instead of fine-tuning VLMs on novel concepts they can't represent, IVL extracts and reasons over human-interpretable visual traits from few examples.
Abstract: Few-shot visual reasoning requires models not only to learn from limited supervision but also to adapt across domains, including those far from pretraining distributions. Modern vision-language models (VLMs) such as Qwen and LLaVA excel at zero-shot tasks but collapse in distant out-of-distribution (OOD) settings, where standard adaptation methods provide limited gains. We introduce $\textbf{I}$nductive $\textbf{V}$isual $\textbf{L}$ogic (IVL), a trait-based reasoning framework that extracts visual traits through dual-mode prompting (semantic and low-level features) and organizes them into compact, interpretable dictionaries. At inference, IVL applies inductive–deductive reasoning over these traits and grounds predictions in transferable explanations without updating model weights. By reasoning over traits rather than memorizing examples, IVL enables training-free few-shot adaptation that explicitly addresses both near-domain shifts and distant OOD shifts. Experiments across multiple datasets demonstrate that IVL improves few-shot performance while producing more interpretable predictions. Our results and analysis highlight trait-level reasoning as a scalable and complementary path toward robust OOD adaptation in foundation-scale VLMs.
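To make the pipeline described in the abstract concrete, here is a minimal conceptual sketch of trait-based few-shot inference in that spirit. It assumes a generic VLM query interface; every name here (query_vlm, TRAIT_PROMPTS, extract_traits, build_trait_dictionary, classify) is a hypothetical illustration, not the authors' implementation, and the simple set-overlap matching stands in for the paper's inductive–deductive reasoning step.

```python
# Conceptual sketch only: trait extraction via dual-mode prompting, a per-class
# trait dictionary built from a few labeled examples, and training-free
# prediction by matching query traits against the dictionary. All names and
# prompts are placeholders, not the actual IVL implementation.

from collections import defaultdict


def query_vlm(image, prompt: str) -> str:
    """Hypothetical VLM wrapper: send an image and a text prompt, get text back."""
    raise NotImplementedError("Plug in a VLM such as Qwen-VL or LLaVA here.")


# Dual-mode prompts: one targeting semantic traits, one targeting low-level features.
TRAIT_PROMPTS = {
    "semantic": "List the distinctive semantic attributes visible in this image, comma-separated.",
    "low_level": "Describe low-level visual features (texture, color, shape, layout), comma-separated.",
}


def extract_traits(image) -> set:
    """Collect short trait phrases from an image under both prompting modes."""
    traits = set()
    for prompt in TRAIT_PROMPTS.values():
        answer = query_vlm(image, prompt)
        traits.update(t.strip().lower() for t in answer.split(",") if t.strip())
    return traits


def build_trait_dictionary(support_set) -> dict:
    """Induce a compact per-class trait dictionary from few labeled examples."""
    dictionary = defaultdict(set)
    for image, label in support_set:
        dictionary[label] |= extract_traits(image)
    return dict(dictionary)


def classify(query_image, dictionary):
    """Predict the class whose traits best overlap the query's traits.

    Returns the label together with the shared traits, which serve as an
    interpretable explanation for the prediction. No model weights are updated.
    """
    query_traits = extract_traits(query_image)
    best_label, best_overlap = None, set()
    for label, traits in dictionary.items():
        overlap = query_traits & traits
        if len(overlap) > len(best_overlap):
            best_label, best_overlap = label, overlap
    return best_label, best_overlap
```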
Primary Area: transfer learning, meta learning, and lifelong learning
Submission Number: 24