Keywords: alignment, human-AI interaction, minimax, robustness, trustworthy AI
Abstract: We study an agent who combines her private information with recommendations from an informed but potentially misaligned adviser. The adviser observes a signal and, with known probability, reveals it truthfully; otherwise he can send an arbitrary message. We characterize the agent’s inference-and-action rule that delivers the maximal guaranteed payoff. Any optimal rule admits a trust-region representation in belief space: advice is taken at face value when it induces a posterior within the trust region, and otherwise the agent acts as if the posterior lay on the trust region’s boundary. We show that commitment has no value to the agent, and we derive thresholds on the truthfulness probability above which the adviser’s presence strictly benefits the agent.
Track: Long Paper
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 78