Mitigating Knowledge Entropy: A Multi-Agent Framework with Decoupled Reranking and Governance-by-Design

Published: 22 Sept 2025, Last Modified: 22 Sept 2025 · WiML @ NeurIPS 2025 · CC BY 4.0
Keywords: llm application, generative ai, enterprise ai, vector databases, similarity matching
Abstract: Enterprise knowledge bases (KBs) are critical assets, yet they consistently degrade under the strain of unscalable manual curation. This process creates a significant operational bottleneck, leading to knowledge entropy—a state where KBs become unreliable repositories of outdated, overlapping, and contradictory information. This decay directly impacts employee productivity, increases support costs, and erodes trust in internal systems. While AI automation presents a potential solution, simplistic implementations often fail in production environments. They struggle with the nuanced ambiguity of mature KBs and, more critically, operate as black boxes without "governance-by-design." This lack of a mandatory human-in-the-loop (HITL) process creates an unacceptable risk of silently polluting the KB with unverified information, replacing a problem of neglect with one of active misinformation. To address these challenges, we propose a multi-agent framework that automates the knowledge lifecycle through a novel, decoupled architecture that separates user-facing retrieval from internal content governance. This framework is built on three core components, each targeting a specific failure point of existing systems. First, a Context-Aware Reranking (CAR) algorithm overcomes the ambiguity of overlapping content where standard vector search fails, dramatically improving article matching. Second, a multi-step Actionable Verdict Engine (AVE) replaces brittle, single-shot classifiers with a robust logical pipeline, significantly improving the accuracy of determining necessary actions (e.g., Update Article, Draft New). Finally, and most critically, a dynamic AI Reliability Score provides the essential governance layer for the HITL process. This score, updated based on expert feedback, empowers human reviewers to efficiently prioritize their oversight on the AI's contributions, ensuring the long-term integrity of the knowledge base. 
Deployed in a production environment, this approach improved verdict classification accuracy from a 65.7% baseline to 88.2%, reduced article authoring time by 75%, and yielded an estimated $430k in annualized savings. This work presents a robust and practical model for deploying generative AI, demonstrating that automation efficiency can be achieved without sacrificing the critical human oversight required for safe and reliable enterprise-scale operation.
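The abstract does not specify the update rule behind the AI Reliability Score, so the sketch below illustrates one plausible reading of "updated based on expert feedback": an exponential moving average over binary reviewer verdicts, with low-scoring agents flagged for prioritized human review. The class name, the `alpha` smoothing factor, and the review threshold are all hypothetical, not part of the described system.

```python
# Hypothetical sketch of a feedback-driven reliability score for HITL
# governance. Assumptions (not from the paper): binary accept/reject
# feedback, EMA smoothing with alpha=0.1, priority threshold of 0.7.

class ReliabilityScore:
    def __init__(self, initial: float = 0.5, alpha: float = 0.1):
        self.score = initial  # prior belief in the agent's reliability
        self.alpha = alpha    # weight given to the most recent verdict

    def record_feedback(self, accepted: bool) -> float:
        """Blend the latest expert verdict into the running score."""
        observation = 1.0 if accepted else 0.0
        self.score = (1 - self.alpha) * self.score + self.alpha * observation
        return self.score

    def needs_priority_review(self, threshold: float = 0.7) -> bool:
        """Route low-scoring agents' contributions to humans first."""
        return self.score < threshold


rs = ReliabilityScore()
for verdict in [True, True, False, True]:
    rs.record_feedback(verdict)
# After mixed feedback the score sits below the review threshold,
# so this agent's drafts would be queued for human oversight.
```

A score like this lets reviewers concentrate effort where the AI has recently erred, which is the governance property the abstract emphasizes; any production version would also need to handle cold starts and per-article-type calibration.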
Submission Number: 399