Expectation–Evidence Prompting: Structuring Verification by Comparing Expected and Observed Evidence

ICLR 2026 Conference Submission 19036 Authors

19 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Large Language Models (LLMs), Factual Verification, Prompt Engineering, Cognitive Psychology–Inspired Prompting, Expectation–Evidence Alignment, Contradiction Detection, Abstention Mechanism
Abstract: Large language models (LLMs) often fail in factual verification due to hallucinations, unreliable truthfulness judgments, and opaque reasoning. We identify a structural limitation underlying these failures: LLMs directly compare claims with evidence without accounting for expected refutational alternatives. Specifically, we demonstrate that this omission leads to ambiguity in contradiction detection and unreliable abstention. Leveraging this observation, we introduce Expectation–Evidence Prompting (EEP), a cognitively inspired strategy that first generates supportive and refutational expectations from a claim and then aligns them with the observed evidence. This bidirectional reasoning process enforces logical symmetry, reduces bias toward agreement, and provides a principled abstention mechanism. Across three fact-checking benchmarks (FEVER, PubHealth, and SciFact), EEP achieves consistent gains over strong prompting baselines, including 86.3 macro-F1 on FEVER (+3.6 over Chain-of-Thought), 82.1 precision on PubHealth (highest among all methods), and 76.1 F1 on the Supports class in SciFact. These results demonstrate that embedding expectation–evidence alignment into prompt design yields more interpretable, robust, and trustworthy factual reasoning in LLMs.
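To make the two-stage idea concrete, the sketch below shows what an EEP-style pipeline could look like in Python. It is a minimal illustration, not the authors' actual templates: the prompt wording, the function names, and the `llm` text-in/text-out callable are all assumptions; only the overall structure (generate supportive and refutational expectations, then align them with observed evidence and abstain when neither matches) follows the abstract.

```python
# Hypothetical sketch of Expectation-Evidence Prompting (EEP).
# Prompt wording and function names are illustrative; `llm` is any
# callable that maps a prompt string to a model response string.

def build_expectation_prompt(claim: str) -> str:
    """Step 1: ask what evidence would look like if the claim were
    true (supportive expectation) vs. false (refutational expectation)."""
    return (
        f"Claim: {claim}\n"
        "1. Describe the evidence you would expect if this claim were TRUE.\n"
        "2. Describe the evidence you would expect if this claim were FALSE.\n"
    )

def build_alignment_prompt(claim: str, expectations: str, evidence: str) -> str:
    """Step 2: align both expectations with the observed evidence."""
    return (
        f"Claim: {claim}\n"
        f"Expected evidence (supportive and refutational):\n{expectations}\n"
        f"Observed evidence:\n{evidence}\n"
        "Which expectation does the observed evidence match better?\n"
        "Answer SUPPORTS, REFUTES, or NOT ENOUGH INFO if neither "
        "expectation is clearly matched.\n"
    )

def verify(claim: str, evidence: str, llm) -> str:
    """Run the two-stage pipeline and return a verdict label."""
    expectations = llm(build_expectation_prompt(claim))
    verdict = llm(build_alignment_prompt(claim, expectations, evidence))
    for label in ("SUPPORTS", "REFUTES", "NOT ENOUGH INFO"):
        if label in verdict.upper():
            return label
    return "NOT ENOUGH INFO"  # abstain when no label can be parsed
```

In this reading, abstention falls out of the structure: if the observed evidence matches neither the supportive nor the refutational expectation, the model is steered toward NOT ENOUGH INFO rather than a biased agreement.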
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 19036