SafeGround: Know When to Trust GUI Grounding Models via Uncertainty Calibrations

Published: 02 Mar 2026, Last Modified: 29 Mar 2026
Venue: Agentic AI in the Wild: From Hallucinations to Reliable Autonomy (Poster)
License: CC BY 4.0
Keywords: Agents, GUI Grounding, Uncertainty Calibration
Abstract: Graphical User Interface (GUI) grounding translates natural language instructions into executable screen coordinates, enabling automated GUI interaction. However, incorrect grounding can trigger costly, hard-to-reverse actions (e.g., erroneous payment approvals), raising concerns about model reliability. In this paper, we introduce ***SafeGround***, an uncertainty-aware framework for GUI grounding models that enables risk-aware predictions through calibration performed before test time. ***SafeGround*** uses a distribution-aware uncertainty quantification method that captures the spatial dispersion of stochastic samples drawn from the outputs of any given model. Through a calibration procedure, ***SafeGround*** then derives a test-time decision threshold with statistically guaranteed false discovery rate (FDR) control. We apply ***SafeGround*** to multiple GUI grounding models on the challenging ScreenSpot-Pro benchmark. Experimental results show that our uncertainty measure consistently outperforms existing baselines at distinguishing correct from incorrect predictions, while the calibrated threshold enables rigorous risk control and substantial system-level accuracy improvements. Across multiple GUI grounding models, ***SafeGround*** improves system-level accuracy by up to 5.38 percentage points over Gemini-only inference.
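The two ingredients described in the abstract — a dispersion-based uncertainty score over stochastic samples, and a calibrated decision threshold targeting a bounded false discovery rate — can be illustrated with a minimal sketch. This is not the paper's method: the dispersion measure (mean distance to the centroid of sampled click coordinates) and the threshold search (largest cutoff whose empirical FDR on a held-out calibration set stays below a target level) are simplified stand-ins for SafeGround's distribution-aware quantification and statistically guaranteed FDR control. All function names here are hypothetical.

```python
import numpy as np

def dispersion_uncertainty(samples):
    """Score uncertainty as the spatial dispersion of k stochastic decodes.

    `samples`: array-like of shape (k, 2) holding (x, y) coordinates produced
    by repeated stochastic sampling from the same grounding model on one
    instruction. Illustrative measure: mean Euclidean distance to the centroid.
    """
    samples = np.asarray(samples, dtype=float)
    centroid = samples.mean(axis=0)
    return float(np.mean(np.linalg.norm(samples - centroid, axis=1)))

def calibrate_threshold(uncertainties, correct, alpha=0.1):
    """Pick the largest uncertainty cutoff whose *empirical* FDR on a
    labeled calibration set stays at or below `alpha`.

    Predictions with uncertainty <= threshold would be accepted at test time;
    the rest would be deferred. (Sketch only: a statistical guarantee, as in
    SafeGround, requires a proper risk-control procedure, not just an
    empirical pass over the calibration set.)
    """
    order = np.argsort(uncertainties)
    u = np.asarray(uncertainties, dtype=float)[order]
    c = np.asarray(correct, dtype=bool)[order]
    best = -np.inf
    accepted_wrong = 0
    for i in range(len(u)):
        accepted_wrong += int(not c[i])
        fdr = accepted_wrong / (i + 1)  # wrong accepts / total accepts
        if fdr <= alpha:
            best = u[i]
    return best
```

For example, two samples at (0, 0) and (2, 0) have centroid (1, 0) and dispersion 1.0; tight clusters of samples score low (confident), scattered ones score high (defer).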
Submission Number: 32