Identifying and Mitigating Reasoning Errors in VLM Verifiers via Activation Decomposition
Track: long paper (up to 8 pages)
Keywords: vision language models, verifiers, activation analysis, steering vectors
TL;DR: We decompose VLM verifier activations into independent error components and adaptively steer along them to improve performance.
Abstract: Vision language model (VLM) verifiers exhibit reasoning errors that compromise their reliability. While existing mitigation strategies based on prompting or finetuning can improve performance, they overlook the connection between these errors and the model's internal representations, often leaving the interventions imprecise or opaque. We address this gap by examining the structure of the errors in the model's activations using independent component analysis (ICA). We identify components corresponding to previously known position and length preferences, as well as error patterns beyond these established biases. Steering experiments confirm that intervening along these components predictably alters the associated behaviors, validating their influence on verifier decisions. We train a lightweight adapter that steers the model along the component directions, adjusting the correction per token and task. The adapter improves verification accuracy and reduces position bias, demonstrating the effectiveness of activation-level intervention informed by error decomposition.
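The core pipeline in the abstract — decomposing activations into independent components and steering along a component direction — can be sketched as follows. This is a minimal illustration only: the synthetic data, hidden dimension, number of components, and the `scale` hyperparameter are all assumptions, not the authors' actual setup.

```python
# Hedged sketch: ICA decomposition of verifier activations, then steering a
# hidden state along one recovered component direction. Shapes and names are
# illustrative assumptions, not the paper's configuration.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 64))  # synthetic (n_samples, hidden_dim) activations

# 1) Decompose activations into statistically independent components.
ica = FastICA(n_components=8, random_state=0)
sources = ica.fit_transform(acts)   # (n_samples, n_components) component strengths
directions = ica.mixing_.T          # (n_components, hidden_dim) component directions

# 2) Steer: shift a hidden state along a unit-normalized component direction.
def steer(hidden, component_idx, scale):
    d = directions[component_idx]
    return hidden + scale * d / np.linalg.norm(d)

steered = steer(acts[0], component_idx=3, scale=2.0)
print(steered.shape)  # (64,)
```

In the paper, a learned adapter would choose the steering scale per token and task rather than using a fixed constant as here.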
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 54