Confidence Calibration in Vision-Language-Action Models

TMLR Paper9023 Authors

18 May 2026 (modified: 31 May 2026), Under review for TMLR, CC BY 4.0

Abstract: Trustworthy robot behavior requires not only high levels of task success but also that the robot can reliably quantify how likely it is to succeed. To this end, we present a first-of-its-kind study of confidence calibration in vision-language-action (VLA) foundation models, which map visual observations and natural language instructions to low-level robot motor commands. We establish a confidence estimation baseline for VLAs, examine how task success relates to calibration error and how calibration evolves over time, and introduce two lightweight techniques to remedy the miscalibration we observe: prompt ensembles and action-wise Platt scaling. Our aim in this study is to begin to develop the tools and conceptual understanding necessary to render VLAs trustworthy via reliable uncertainty quantification.
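To make the terms "calibration error" and "Platt scaling" concrete, the following is a minimal sketch of the standard textbook versions of both: expected calibration error (ECE) over equal-width confidence bins, and a two-parameter Platt fit on the logit of the reported confidence. It is not the paper's action-wise variant or its prompt-ensemble method, whose details are not given in the abstract. The function names, the binary per-episode success labels, and the gradient-descent settings are all assumptions made for illustration.

```python
import numpy as np


def expected_calibration_error(confidences, successes, n_bins=10):
    """Standard ECE: bin-weighted mean |success rate - mean confidence|."""
    confidences = np.asarray(confidences, dtype=float)
    successes = np.asarray(successes, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Right-closed bins; the interior edges map each value to 0..n_bins-1.
    bin_ids = np.digitize(confidences, edges[1:-1], right=True)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(successes[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return float(ece)


def fit_platt(confidences, successes, lr=0.1, steps=2000):
    """Fit sigmoid(a * logit(c) + b) by gradient descent on the log loss.

    Hypothetical helper: plain Platt scaling on a held-out calibration set,
    starting from the identity map (a=1, b=0).
    """
    c = np.clip(np.asarray(confidences, dtype=float), 1e-6, 1 - 1e-6)
    y = np.asarray(successes, dtype=float)
    z = np.log(c / (1.0 - c))  # logit of the raw confidence
    a, b = 1.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a * z + b)))
        a -= lr * np.mean((p - y) * z)  # d(log loss)/da
        b -= lr * np.mean(p - y)        # d(log loss)/db
    return a, b


def apply_platt(confidences, a, b):
    """Map raw confidences through the fitted Platt transform."""
    c = np.clip(np.asarray(confidences, dtype=float), 1e-6, 1 - 1e-6)
    z = np.log(c / (1.0 - c))
    return 1.0 / (1.0 + np.exp(-(a * z + b)))


if __name__ == "__main__":
    # Synthetic, overconfident toy data (not results from the paper).
    conf = [0.9] * 10 + [0.6] * 10
    succ = [1] * 6 + [0] * 4 + [1] * 5 + [0] * 5
    a, b = fit_platt(conf, succ)
    print("ECE before:", expected_calibration_error(conf, succ))
    print("ECE after: ", expected_calibration_error(apply_platt(conf, a, b), succ))
```

In practice the Platt parameters would be fit on a held-out calibration split and evaluated on separate episodes; fitting and scoring on the same toy data here only illustrates the mechanics.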

Submission Type: Regular submission (no more than 12 pages of main content)

Assigned Action Editor: ~Zhiwen_Fan2

Submission Number: 9023
