One POMDP, Many Stories: LLM-driven Framework for Equivalence Verification

ACL ARR 2026 January Submission 10571 Authors

06 Jan 2026 (modified: 20 Mar 2026) · License: CC BY 4.0
Keywords: POMDP, model equivalence verification, large language models, bidirectional translation, iterative refinement, noise-invariant decision models
Abstract: Different formulations of the same sequential decision process make decision-making under uncertainty inefficient. Revealing deeper knowledge about the structure of decision problems by identifying and verifying equivalence across diverse formulations, such as Partially Observable Markov Decision Processes (POMDPs), can help, but typically requires significant expert intervention. We propose a bidirectional translation-iteration approach using large language models (LLMs) to systematically verify the equivalence between POMDPs and their textual descriptions. Our approach involves (1) generating a textual description from a mathematical POMDP model, (2) reconstructing a mathematical model from that description, and (3) comparing the reconstructed model with the original. To further test robustness, we inject noise elements into states, actions, and observations and assess whether our approach still recognizes the models as equivalent. 73\% of our POMDPs were perfectly reconstructed from textual descriptions, and 89\% remained structurally equivalent even under noise, yielding two distinct but equivalent textual versions for 63\% of all models. Most discrepancies manifested as false negatives, reflecting the inherent ambiguity of natural language. Iterative refinement proved effective in reducing errors, though rare cases required additional iterations or more accurate initial descriptions. These results demonstrate that multiple descriptions can imply the same POMDP model and that complex POMDPs can be reconstructed from textual descriptions with reasonable accuracy. They also show that iteratively refining LLM outputs can significantly reduce omissions and inaccuracies.
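The paper's own implementation is not shown on this page, but step (3) of the pipeline, comparing a reconstructed model against the original, can be sketched as follows. This is a minimal illustration under assumed representations: the `POMDP` dataclass, the `structurally_equivalent` function, and the toy two-state models are all hypothetical names introduced here, not artifacts from the paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class POMDP:
    """Minimal structural view of a POMDP (rewards omitted for brevity)."""
    states: frozenset
    actions: frozenset
    observations: frozenset
    transitions: dict   # (state, action) -> {next_state: prob}
    obs_fn: dict        # (action, next_state) -> {observation: prob}

def _dists_close(d1, d2, tol):
    # Compare two dict-of-distribution tables entrywise within tolerance.
    if d1.keys() != d2.keys():
        return False
    return all(
        abs(d1[k].get(x, 0.0) - d2[k].get(x, 0.0)) <= tol
        for k in d1
        for x in set(d1[k]) | set(d2[k])
    )

def structurally_equivalent(m1: POMDP, m2: POMDP, tol: float = 1e-6) -> bool:
    # Step (3): check that the reconstructed model matches the original
    # in its component sets and (up to tolerance) in its probability tables.
    return (
        m1.states == m2.states
        and m1.actions == m2.actions
        and m1.observations == m2.observations
        and _dists_close(m1.transitions, m2.transitions, tol)
        and _dists_close(m1.obs_fn, m2.obs_fn, tol)
    )

# Toy two-state example standing in for an original and a reconstructed model.
original = POMDP(
    states=frozenset({"s0", "s1"}),
    actions=frozenset({"listen"}),
    observations=frozenset({"left", "right"}),
    transitions={("s0", "listen"): {"s0": 1.0}, ("s1", "listen"): {"s1": 1.0}},
    obs_fn={("listen", "s0"): {"left": 0.85, "right": 0.15},
            ("listen", "s1"): {"left": 0.15, "right": 0.85}},
)
reconstructed = POMDP(
    original.states, original.actions, original.observations,
    dict(original.transitions), dict(original.obs_fn),
)
perturbed = POMDP(
    original.states, original.actions, original.observations,
    original.transitions,
    {("listen", "s0"): {"left": 0.80, "right": 0.20},
     ("listen", "s1"): {"left": 0.15, "right": 0.85}},
)
```

A faithful reconstruction compares equal (`structurally_equivalent(original, reconstructed)` is true), while a model whose observation probabilities drifted during text-to-model translation does not, which would flag the description for another refinement iteration.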
Paper Type: Long
Research Area: Mathematical, Symbolic, Neurosymbolic, and Logical Reasoning
Research Area Keywords: symbolic reasoning, logical reasoning, deductive reasoning, symbolic AI, Markov logics
Contribution Types: Model analysis & interpretability, Theory
Languages Studied: English
Submission Number: 10571