Partial Observation Inversion and Batched Belief-State Planning for Information Gathering POMDPs


TMLR Paper8035 Authors

22 Mar 2026 (modified: 16 Jun 2026) | Under review for TMLR | CC BY 4.0

Abstract: We present the inversion variational autoencoder ($\mathcal{I}$-VAE), a conditional generative model for efficient belief-state planning in partially observable sequential decision-making problems. The $\mathcal{I}$-VAE maps partial observations to stochastic posterior state samples by learning an observation-conditioned latent prior, enabling consistent belief updates without an explicit likelihood model. We further fine-tune the belief model with a trajectory-based mutual information objective to improve latent space consistency across observation sequences. To support scalable planning with these learned beliefs, we formulate the batched belief-state Markov decision process, which is designed to parallelize rollouts while preserving optimality in expectation. We analyze heuristic policies that maximize the expected entropy reduction of the updated belief and show that these heuristics result in the optimal one-step expected Bayesian information gain. Our approach is evaluated on a benchmark masked-pixel task and a real-world intrusion discovery task using indirect muon tomography data, showing improved estimation accuracy and planning efficiency over conventional methods.
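The link between expected entropy reduction and one-step expected information gain can be illustrated on a toy discrete problem. The sketch below is not the authors' method: it assumes a finite state space and an explicit observation likelihood `p(o | s, a)`, whereas the $\mathcal{I}$-VAE is described as avoiding an explicit likelihood model. The function names (`entropy`, `expected_entropy_reduction`, `greedy_action`) and the two-state example are hypothetical, chosen only for illustration.

```python
import numpy as np


def entropy(p):
    """Shannon entropy in nats of a discrete distribution; 0 * log(0) is treated as 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(-np.sum(p[nz] * np.log(p[nz])))


def expected_entropy_reduction(belief, likelihood):
    """Expected entropy reduction H(b) - E_o[H(b' | o)] for one fixed action.

    belief:     shape (S,), current belief over states.
    likelihood: shape (S, O), likelihood[s, o] = p(o | s, a) (assumed known here).
    The returned value equals the mutual information I(S; O | a).
    """
    belief = np.asarray(belief, dtype=float)
    likelihood = np.asarray(likelihood, dtype=float)
    joint = belief[:, None] * likelihood  # p(s, o)
    p_obs = joint.sum(axis=0)             # p(o)
    gain = entropy(belief)
    for o, po in enumerate(p_obs):
        if po > 0:
            posterior = joint[:, o] / po  # Bayes update b'(s | o)
            gain -= po * entropy(posterior)
    return gain


def greedy_action(belief, likelihoods):
    """Pick the action with the largest one-step expected entropy reduction."""
    gains = [expected_entropy_reduction(belief, lik) for lik in likelihoods]
    return int(np.argmax(gains)), gains


# Hypothetical two-state example: action 0 is uninformative, action 1 reveals the state.
belief = np.array([0.5, 0.5])
likelihoods = [
    np.array([[0.5, 0.5], [0.5, 0.5]]),
    np.array([[1.0, 0.0], [0.0, 1.0]]),
]
best, gains = greedy_action(belief, likelihoods)
print(best, gains)
```

In this toy case the uninformative action yields zero expected gain, while the fully revealing action yields the full prior entropy, $\log 2$ nats, so the greedy heuristic selects it.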

Submission Type: Long submission (more than 12 pages of main content)

Changes Since Last Submission: We thank all three reviewers for their high-quality feedback, which we have sought to implement in full. Major changes to the attached revised manuscript include:

**1.)** Restructuring the title, contributions, and main body to reflect the $\mathcal{I}$-VAE as the principal contribution and batched belief-state planning as a secondary, supporting contribution. This includes moving the discussion of batched value decomposition and optimality preservation, the construction of the batched planning model, action selection, interpretation, and the value function to the technical appendix.

**2.)** Introducing a section on relevant decision processes, namely the POMDP and EMDP, along with a formal definition of the EMDP problem statement and planning objective in Problem 1.

**3.)** Improving notation consistency throughout and minimizing statistical reporting issues.

**4.)** Introducing an ablation on the TCL weighting parameter $\alpha$ using the MNIST benchmark, in the technical appendix.

**5.)** Introducing a Gaussian-mixture benchmark with a tractable posterior, alongside a conditional diffusion baseline, in the technical appendix.

**6.)** Adding a notation table to the technical appendix.

**7.)** Adding a paragraph on parallel and batched online planning to the related work section of the introduction.

We hope that the revised manuscript is better suited for publication, and we are prepared to make further revisions as appropriate to address any remaining concerns.

Assigned Action Editor: ~Thiago_D._Simão1

Submission Number: 8035
