Keywords: robotics, VLA, uncertainty quantification, calibration
TL;DR: We present the first systematic study of confidence calibration in vision-language-action (VLA) foundation models, developing the tools and conceptual understanding needed to render VLAs trustworthy via reliable uncertainty quantification.
Abstract: Trustworthy robot behavior requires not only high levels of task success but also that the robot can reliably quantify how likely it is to succeed. To this end, we present the first systematic study of confidence calibration in vision-language-action (VLA) foundation models, which map visual observations and natural language instructions to low-level robot motor commands. We examine how task success relates to calibration error and how calibration evolves over time, and we introduce two lightweight fixes for the miscalibration we observe: prompt ensembles and action-wise Platt scaling. With this study, we aim to develop the tools and conceptual understanding needed to render VLAs both highly performant and highly trustworthy via reliable uncertainty quantification.
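The abstract mentions Platt scaling as one of the two calibration fixes. As a rough illustration of the underlying idea (not the paper's implementation, whose "action-wise" details are not given here), Platt scaling fits a two-parameter logistic map `p = sigmoid(a*s + b)` from raw confidence scores `s` to calibrated success probabilities, by minimizing negative log-likelihood on held-out (score, success) pairs. A minimal NumPy sketch, with all data and hyperparameters hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_platt(scores, labels, lr=0.1, steps=2000):
    """Fit Platt scaling parameters (a, b) by gradient descent on the
    negative log-likelihood of binary success labels."""
    a, b = 1.0, 0.0
    for _ in range(steps):
        p = sigmoid(a * scores + b)
        grad = p - labels                  # d(NLL)/d(logit) per example
        a -= lr * np.mean(grad * scores)   # gradient w.r.t. slope a
        b -= lr * np.mean(grad)            # gradient w.r.t. intercept b
    return a, b

# Toy data: raw model scores and simulated binary task-success labels.
rng = np.random.default_rng(0)
scores = rng.normal(0.0, 1.0, 500)
labels = (sigmoid(0.5 * scores) > rng.uniform(0.0, 1.0, 500)).astype(float)

a, b = fit_platt(scores, labels)
calibrated = sigmoid(a * scores + b)
```

At the optimum of the intercept, the mean calibrated probability matches the empirical success rate, which is one simple sanity check for the fit. An action-wise variant would presumably fit separate `(a, b)` per action dimension; that split is an assumption here.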
Supplementary Material: zip
Primary Area: applications to robotics, autonomy, planning
Submission Number: 938