Identifying and Mitigating Reasoning Errors in VLM Verifiers via Activation Decomposition
Track: long paper (up to 8 pages)
Keywords: vision language models, verifiers, activation analysis, steering vectors
TL;DR: We decompose VLM verifier activations into independent error components and adaptively steer along them to improve performance.
Abstract: Vision language model (VLM) verifiers exhibit reasoning errors that compromise their reliability. While existing mitigation strategies based on prompting or finetuning can improve performance, they overlook the connection between these errors and the model's internal representations, often leaving the interventions imprecise or opaque. We address this gap by examining the structure of the errors in the model's activations using independent component analysis (ICA). We identify components corresponding to previously known position and length preferences, as well as error patterns beyond these established biases. Steering experiments confirm that intervening along these components predictably alters the associated behaviors, validating their influence on verifier decisions. We train a lightweight adapter that steers the model along the component directions, adjusting the correction per token and task. The adapter improves verification accuracy and reduces position bias, demonstrating the effectiveness of activation-level intervention informed by error decomposition.
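The core pipeline in the abstract — decomposing activations into independent components and steering along a component direction — can be sketched as follows. This is a minimal illustration only: the synthetic data, hidden dimension, number of components, and the `scale` hyperparameter are all assumptions, not the authors' actual setup.

```python
# Hedged sketch: ICA decomposition of verifier activations, then steering a
# hidden state along one recovered component direction. Shapes and names are
# illustrative assumptions, not the paper's configuration.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 64))  # synthetic (n_samples, hidden_dim) activations

# 1) Decompose activations into statistically independent components.
ica = FastICA(n_components=8, random_state=0)
sources = ica.fit_transform(acts)   # (n_samples, n_components) component strengths
directions = ica.mixing_.T          # (n_components, hidden_dim) component directions

# 2) Steer: shift a hidden state along a unit-normalized component direction.
def steer(hidden, component_idx, scale):
    d = directions[component_idx]
    return hidden + scale * d / np.linalg.norm(d)

steered = steer(acts[0], component_idx=3, scale=2.0)
print(steered.shape)  # (64,)
```

In the paper, a learned adapter would choose the steering scale per token and task rather than using a fixed constant as here.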
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 54