Keywords: foundation-model agents, compositional coherence, probabilistic forecasting, uncertainty quantification, logical consistency, Dutch-book coherence, projection methods, agent benchmarking
TL;DR: Local coherence does not guarantee that composed LLM-agent beliefs are coherent. We define a runtime residual, prove when composition fails, and repair it by projection, which improves Brier scores and betting performance.
Abstract: Multi-component foundation-model agents compose several LLM calls into a joint belief; even when each component is internally coherent on its own questions, the assembled belief need not be. We define the compositional residual ε⋆, the L2 distance from the composed quote to the joint coherent polytope, computable at runtime from system output and cross-component constraints alone. A product-structure dichotomy characterizes when local coherence implies global coherence, and a Rayleigh-quotient prediction recovers the typical residual from the specialist panel covariance. Across 1,876 ensemble cliques on four LLMs, ε⋆ > 0 on 33–94% of cliques by relation type; a hierarchical Boyle–Dykstra projection eliminates it at 1×m elicitation cost while retaining specialist routing, improves Brier against resolved labels by up to 0.014 (p < 10^-15), and yields +0.115 nats per bet on 1,770 resolved bets. A planner-discretion harness confirms ε⋆ > 0 on 20/20 partitions under deployed routing. Code, prompts, and raw per-clique responses are released anonymously at https://anonymous.4open.science/r/ctb-compositional-incoherence-6D6C.
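Illustrative sketch (not the authors' released code): the residual described in the abstract, an L2 distance from a composed quote to a coherent set, can be approximated with plain Dykstra alternating projections when the coherent set is written as an intersection of simple convex sets. The toy constraint set below (the unit box plus one implication constraint P(A) ≤ P(B)) and all function names are assumptions made for illustration; the hierarchical variant mentioned in the abstract is not reproduced here.

```python
import math

def project_box(x):
    """Euclidean projection onto the unit box [0, 1]^n."""
    return [min(1.0, max(0.0, v)) for v in x]

def make_halfspace(a, b):
    """Return the Euclidean projector onto the halfspace {x : a.x <= b}."""
    norm_sq = sum(ai * ai for ai in a)
    def project(x):
        violation = sum(ai * xi for ai, xi in zip(a, x)) - b
        if violation <= 0.0:
            return list(x)  # already inside the halfspace
        step = violation / norm_sq
        return [xi - step * ai for ai, xi in zip(a, x)]
    return project

def dykstra(q, projectors, iters=500):
    """Dykstra's algorithm: approximates the projection of q onto the
    intersection of the convex sets behind `projectors`.
    Runs a fixed number of cycles (no early stopping, for simplicity)."""
    x = list(q)
    # One correction vector per set; this is what distinguishes Dykstra
    # from naive alternating projections.
    corrections = [[0.0] * len(q) for _ in projectors]
    for _ in range(iters):
        for k, proj in enumerate(projectors):
            shifted = [xi + ci for xi, ci in zip(x, corrections[k])]
            y = proj(shifted)
            corrections[k] = [si - yi for si, yi in zip(shifted, y)]
            x = y
    return x

def residual(q, projectors):
    """Return (L2 distance from q to the coherent set, repaired quote)."""
    p = dykstra(q, projectors)
    return math.dist(q, p), p

# Hypothetical example: two components quote P(A) = 0.7 and P(B) = 0.5,
# while a cross-component constraint says A implies B, i.e. P(A) - P(B) <= 0.
constraints = [project_box, make_halfspace([1.0, -1.0], 0.0)]
eps, repaired = residual([0.7, 0.5], constraints)
print(eps, repaired)
```

In this toy case each quote is a valid probability on its own, yet the pair violates the implication, so the residual is strictly positive and the repaired quote moves both values to the boundary P(A) = P(B).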
Paper Type: Long (8 pages)
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 154