Abstract: Document Visual Question Answering (DocVQA) systems often produce overconfident or ethically misaligned responses, especially under uncertainty. Existing models such as LayoutLMv3, UDOP, and DONUT optimize for accuracy but lack ethical calibration. We propose HonestVQA, a model-agnostic, self-supervised framework that aligns model confidence with answer correctness using a weighted loss and contrastive learning. We introduce two new metrics, the Honesty Score (H-Score) and the Ethical Confidence Index (ECI), to evaluate ethical alignment. HonestVQA improves accuracy and F1 by up to 4.3% across SpDocVQA, InfographicsVQA, and SROIE while reducing overconfidence, and it generalizes across domains, achieving 78.9% accuracy and a 76.1% F1 score. Our code is available at: https://anonymous.4open.science/r/HonestVQA-B454/README.md.
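To make the confidence-correctness alignment idea concrete, below is a minimal PyTorch sketch of a loss that combines a confidence-weighted cross-entropy term with a batch-level contrastive hinge. This is an illustrative assumption, not the paper's actual formulation: the function name `honesty_alignment_loss` and the hyperparameters `alpha` and `margin` are hypothetical, since the abstract does not specify the loss in detail.

```python
import torch
import torch.nn.functional as F

def honesty_alignment_loss(logits, labels, alpha=0.5, margin=0.2):
    """Illustrative confidence-correctness alignment loss (hypothetical,
    not the paper's exact formulation). It up-weights the penalty on
    confident wrong answers and encourages confidence on correct answers
    to exceed confidence on wrong answers by a margin."""
    probs = F.softmax(logits, dim=-1)
    conf, preds = probs.max(dim=-1)            # per-example confidence
    correct = preds.eq(labels)

    # Weighted term: errors are scaled by the model's own confidence,
    # so overconfident mistakes incur the largest penalty.
    ce = F.cross_entropy(logits, labels, reduction="none")
    weights = torch.where(correct, torch.ones_like(conf), 1.0 + conf)
    weighted = (weights * ce).mean()

    # Contrastive term: batch-level hinge pushing the mean confidence of
    # correct predictions above that of incorrect ones by `margin`.
    if correct.any() and (~correct).any():
        gap = conf[correct].mean() - conf[~correct].mean()
        contrastive = F.relu(margin - gap)
    else:
        contrastive = logits.new_zeros(())

    return weighted + alpha * contrastive
```

Because the loss only needs the model's logits and the gold labels, a sketch like this stays model-agnostic and could wrap any DocVQA backbone's answer head during fine-tuning.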
Paper Type: Long
Research Area: Interpretability and Analysis of Models for NLP
Research Area Keywords: DocVQA, LLM, NLP
Contribution Types: Model analysis & interpretability
Languages Studied: N/A
Submission Number: 220