Epistemic Context Learning: Building Trust the Right Way in LLM Multi-Agent Systems

Ruiwen Zhou; Maojia Song; Xiaobao Wu; Sitao Cheng; Xunjian Yin; Yuxi Xie; Zoey Hao; Wenyue Hua; Liangming Pan; Soujanya Poria; Min-Yen Kan

Published: 02 Mar 2026, Last Modified: 23 Mar 2026

Venue: Agentic AI in the Wild: From Hallucinations to Reliable Autonomy (Poster)

License: CC BY 4.0

Keywords: Large Language Models, Multi-Agent Systems, Reinforcement Learning

TL;DR: We identify the limitations of LLMs when reasoning with reference to peer responses in multi-agent systems, and develop a structured reasoning framework that first builds peer profiles from history and then conditions final reasoning on them.

Abstract: Individual agents in multi-agent (MA) systems often lack robustness, tending to blindly conform to misleading peers. We show this weakness stems from both sycophancy and an inadequate ability to evaluate peer reliability. To address this, we first formalize the learning problem of history-aware reference, introducing the historical interactions of peers as additional input, so that agents can estimate peer reliability and learn from trustworthy peers when uncertain. This shifts the task from evaluating peer reasoning quality to estimating peer reliability based on interaction history. We then develop Epistemic Context Learning (ECL): a reasoning framework that conditions predictions on peer profiles explicitly built from history. We further optimize ECL by reinforcement learning using auxiliary rewards. Our experiments reveal that ECL enables small models like Qwen 3-4B to outperform a history-agnostic baseline 8x its size (Qwen 3-30B) by accurately identifying reliable peers. ECL also boosts frontier models to near-perfect (100%) performance. We show that ECL generalizes well to various MA configurations, and we find that trust is modeled well by LLMs, revealing a strong correlation between trust modeling accuracy and final answer quality.
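
The two-step idea in the abstract (first estimate peer reliability from interaction history, then condition the final answer on those estimates) can be illustrated with a toy numeric analogue. This is only a sketch under stated assumptions, not the paper's method: ECL itself is an LLM reasoning framework trained with reinforcement learning, whereas the code below uses a smoothed accuracy count as a stand-in "peer profile" and a reliability-weighted vote as a stand-in for the final reasoning step. The function names, the history format, the smoothing, and the confidence weighting are all hypothetical.

```python
from collections import defaultdict


def build_peer_profiles(history):
    """Estimate each peer's reliability from past interactions.

    `history` is an iterable of (peer_id, was_correct) pairs; this format
    is an assumption made for illustration only.
    """
    counts = defaultdict(lambda: [0, 0])  # peer_id -> [num_correct, num_total]
    for peer_id, was_correct in history:
        counts[peer_id][0] += int(was_correct)
        counts[peer_id][1] += 1
    # Laplace smoothing keeps estimates away from 0 and 1 on short histories.
    return {p: (c + 1) / (n + 2) for p, (c, n) in counts.items()}


def answer_with_profiles(own_answer, own_confidence, peer_answers, profiles, prior=0.5):
    """Pick the answer with the highest reliability-weighted support.

    The agent's own answer is weighted by its confidence; each peer's answer
    is weighted by that peer's profile (or `prior` for unseen peers).
    """
    votes = defaultdict(float)
    votes[own_answer] += own_confidence
    for peer_id, answer in peer_answers.items():
        votes[answer] += profiles.get(peer_id, prior)
    return max(votes, key=votes.get)


# Hypothetical history: peer A was right four times, peer B wrong four times.
profiles = build_peer_profiles([("A", True)] * 4 + [("B", False)] * 4)
peers_now = {"A": "y", "B": "x"}

# An uncertain agent defers to the historically reliable peer A.
uncertain_choice = answer_with_profiles("x", 0.3, peers_now, profiles)
# A confident agent keeps its own answer.
confident_choice = answer_with_profiles("x", 0.9, peers_now, profiles)
```

In this sketch an uncertain agent follows the peer with the better track record, while a confident agent is not swayed, which mirrors the abstract's goal of learning from trustworthy peers only when uncertain.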

Submission Number: 85
