Keywords: Visual Question Answering (VQA), Causal Debiasing, Uncertainty Estimation, Counterfactual Reasoning, Adaptive Intervention, Curriculum Learning
Abstract: Visual Question Answering (VQA) models often exploit spurious correlations, hindering true multimodal reasoning. While causal inference offers principled debiasing methods, current approaches pair complex causal graphs with overly simplistic, static counterfactual interventions (e.g., feature subtraction), which limits their effectiveness. We challenge this by proposing a novel framework that synergistically integrates uncertainty estimation with causal counterfactual reasoning for robust VQA debiasing. To our knowledge, this is the first work to leverage uncertainty within a causal VQA framework. We systematically explore uncertainty quantification techniques (entropy, prediction margin) to assess model confidence, and use this estimated uncertainty to dynamically modulate the counterfactual intervention, adaptively adjusting the influence of biased information sources based on real-time confidence rather than applying a rigid, fixed intervention. Furthermore, we introduce a tailored Curriculum Learning strategy that dynamically assesses sample difficulty using uncertainty-aware metrics, complementing the adaptive mechanism. Our uncertainty-guided intervention module is architecture-agnostic, enabling integration into diverse VQA networks. This adaptive, uncertainty-aware approach offers a more flexible, robust, and theoretically grounded pathway towards mitigating VQA biases.
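As an illustration of the core idea in the abstract, the following is a minimal sketch of how per-sample uncertainty (normalized entropy or prediction margin) could scale a counterfactual subtraction of bias logits. The function names, the choice of subtraction as the intervention, and the linear scaling by uncertainty are all assumptions for illustration; the paper's actual intervention and modulation scheme are not specified here.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def entropy_uncertainty(logits):
    """Normalized predictive entropy in [0, 1]; 1 = maximally uncertain."""
    p = softmax(logits)
    h = -(p * np.log(p + 1e-12)).sum(axis=-1)
    return h / np.log(p.shape[-1])

def margin_uncertainty(logits):
    """1 minus the top-1 vs. top-2 probability margin; higher = less confident."""
    p = np.sort(softmax(logits), axis=-1)
    return 1.0 - (p[..., -1] - p[..., -2])

def adaptive_counterfactual(fused_logits, bias_logits, uncertainty):
    """Hypothetical uncertainty-modulated intervention: instead of a fixed
    subtraction of bias-branch logits, scale the subtraction per sample
    by the model's estimated uncertainty."""
    alpha = np.asarray(uncertainty)[..., None]  # broadcast over answer classes
    return fused_logits - alpha * bias_logits
```

Here, confident predictions (low uncertainty) are left nearly untouched, while uncertain ones receive a stronger counterfactual correction, which is one plausible reading of "adaptive adjustment of biased information sources based on real-time confidence."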
Supplementary Material: zip
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 1520