Logic Channel Validation and Enhancement of Zero-Shot Vision-Language Comprehension on Vision Language Models

18 Sept 2025 (modified: 11 Feb 2026) · Submitted to ICLR 2026 · CC BY 4.0
Keywords: Vision-Language Model, Visual-Language Comprehension, logic reasoning, zero-shot
Abstract: Frontier Large Vision-Language Models (LVLMs) exhibit remarkable capabilities on Visual-Language Comprehension (VLC) tasks, enabled by pretraining on vast visual-textual corpora. However, they are often deployed as zero-shot solutions in a black-box manner, since retraining remains challenging due to data privacy or model inaccessibility. Validating and understanding the behavior of these models therefore becomes important for generalization to new tasks. We propose a Logic Channel, operating in parallel with the black-box model channel, that performs explicit logical reasoning for validation and enhancement. The frontier LVLM, encapsulating latent vision-language knowledge, can be viewed as an Implicit Logic Channel. The proposed Explicit Logic Channel, mimicking human logical reasoning, incorporates a Large Language Model (LLM), a Visual Foundation Model (VFM), and a logical reasoning module that performs novel probabilistic inference for factual, counterfactual, relational, and causal-condition reasoning over the extracted and grounded visual-textual facts. Cross-channel logic consistency analysis enables model validation and selection, even without ground-truth annotations, and cross-channel integration further improves performance on zero-shot tasks over SOTA models. Our experiments on three recent challenging VLC benchmarks, NegBench, HC-RefCOCOg, and HC-RefLoCo, demonstrate the effectiveness of the proposed Logic Channel for logic-based model validation, selection, and improvement of LVLMs with enhanced explainability and trustworthiness.
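For intuition, below is a minimal sketch of what cross-channel consistency analysis and integration could look like; it is not the authors' implementation. It assumes each channel emits a per-query probability for the positive answer, and all function and variable names (consistency_rate, integrate, lvlm_scores, logic_scores) are hypothetical.

```python
# Minimal sketch (not the paper's code) of cross-channel consistency
# validation and integration, assuming each channel outputs a per-query
# probability for the positive answer. All names are hypothetical.
from typing import List

def consistency_rate(implicit_probs: List[float],
                     explicit_probs: List[float],
                     threshold: float = 0.5) -> float:
    """Fraction of queries on which the Implicit Logic Channel (black-box
    LVLM) and the Explicit Logic Channel agree after binarizing at the
    threshold. A higher rate can serve as an annotation-free proxy score
    for model validation and selection."""
    agree = sum(
        (p >= threshold) == (q >= threshold)
        for p, q in zip(implicit_probs, explicit_probs)
    )
    return agree / len(implicit_probs)

def integrate(implicit_p: float, explicit_p: float,
              alpha: float = 0.5) -> float:
    """Simple confidence-weighted fusion of the two channels; alpha trades
    off the black-box LVLM against the explicit reasoning channel."""
    return alpha * implicit_p + (1.0 - alpha) * explicit_p

# Usage: rank candidate LVLMs by agreement with the explicit channel,
# then fuse the selected model's scores for the final zero-shot decision.
lvlm_scores = [0.9, 0.2, 0.7]    # hypothetical Implicit Channel outputs
logic_scores = [0.8, 0.3, 0.4]   # hypothetical Explicit Channel outputs
print(consistency_rate(lvlm_scores, logic_scores))  # 0.666...
print([integrate(p, q) for p, q in zip(lvlm_scores, logic_scores)])
```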
Supplementary Material: pdf
Primary Area: interpretability and explainable AI
Submission Number: 11168