Keywords: fact verification, process reward, GRPO, structured output, reinforcement learning, hallucination detection, trustworthy AI, information integrity
TL;DR: SEVA produces auditable, structured verification output rather than binary labels. A process reward resolves GRPO's advantage collapse on multi-component output, and SEVA-3B matches GPT-4o-mini on ClearFacts (69.0 vs. 69.8 F1).
Abstract: Hallucination remains a critical reliability bottleneck for LLM-based agents, and fact attribution verifiers serve as the last line of defense. Yet current verifiers produce opaque binary labels, revealing nothing about which evidence was considered or why the judgment was made — leaving agents unable to correct themselves and operators unable to audit decisions. We introduce SEVA (Self-Evolving Verification Agent), which outputs structured evidence alignments, step-by-step reasoning chains, calibrated confidence, and error diagnoses from a six-category taxonomy. To train SEVA beyond SFT, we design a process reward function that scores five verification components independently, allocating 70% of weight to process signals (evidence grounding, reasoning quality, format compliance) and 30% to outcome (label, diagnosis). This design is not optional: we show that standard binary reward suffers from advantage collapse on structured output, where GRPO receives no gradient signal and training stagnates entirely. Process reward resolves this and induces an implicit curriculum — the agent first masters verification behavior (alignment: 0.917→0.997; format: 72%→100%), then improves verification outcomes (F1: 64.9→69.0). Structured output further enables a self-evolution loop where the agent's own errors — now transparent and diagnosable — drive targeted adversarial data generation for iterative improvement. On ClearFacts, SEVA-3B matches GPT-4o-mini (69.0 vs. 69.8 F1) while producing substantially richer, auditable output.
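Illustration (not taken from the paper): the advantage-collapse argument in the abstract can be sketched with a few lines of Python. The sketch assumes the usual group-normalized GRPO advantage, (reward - group mean) / group standard deviation, and reads "advantage collapse" as the case where every sampled output in a group receives the same reward. The abstract fixes only the 70/30 process/outcome split; the per-component weights, the component score values, and all function names below are hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical per-component weights. Only the 70% process / 30% outcome
# split comes from the abstract; the division inside each group is assumed.
WEIGHTS = {
    "evidence_grounding": 0.30,  # process
    "reasoning_quality": 0.25,   # process
    "format_compliance": 0.15,   # process
    "label": 0.20,               # outcome
    "diagnosis": 0.10,           # outcome
}


def process_reward(scores):
    """Weighted sum of five component scores, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def binary_reward(scores):
    """Outcome-only reward: 1 if the predicted label is correct, else 0."""
    return 1.0 if scores["label"] == 1.0 else 0.0


def group_advantages(rewards, eps=1e-8):
    """Group-normalized advantages; population std is an assumption here."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A made-up group of four sampled outputs for one claim. All four predict the
# wrong label, but they differ in how well they ground evidence and reason.
group = [
    {"evidence_grounding": 0.9, "reasoning_quality": 0.8,
     "format_compliance": 1.0, "label": 0.0, "diagnosis": 0.0},
    {"evidence_grounding": 0.4, "reasoning_quality": 0.5,
     "format_compliance": 1.0, "label": 0.0, "diagnosis": 0.0},
    {"evidence_grounding": 0.7, "reasoning_quality": 0.3,
     "format_compliance": 0.0, "label": 0.0, "diagnosis": 0.0},
    {"evidence_grounding": 0.2, "reasoning_quality": 0.2,
     "format_compliance": 0.0, "label": 0.0, "diagnosis": 0.0},
]

binary_adv = group_advantages([binary_reward(s) for s in group])
process_adv = group_advantages([process_reward(s) for s in group])
```

Under these assumptions every entry of `binary_adv` is zero, so the group contributes no policy gradient, while `process_adv` still ranks the four outputs by how well they ground evidence, reason, and follow the format.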
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 495