Bayesian Inverse Transition Learning: Learning Dynamics from Near-Optimal Trajectories

Leo Benac; Abhishek Sharma; Sonali Parbhoo; Finale Doshi-Velez

Published: 03 Feb 2026, Last Modified: 02 May 2026, Venue: AISTATS 2026 Poster, Readers: Everyone, License: CC BY 4.0

TL;DR: Estimating transition dynamics using demonstrations in Reinforcement Learning.

Abstract: We consider the problem of estimating the transition dynamics from near-optimal expert trajectories in the context of offline model-based reinforcement learning. We develop a novel constraint-based method, Inverse Transition Learning, that treats the limited coverage of the expert trajectories as a feature: we use the fact that the expert is near-optimal to inform our estimate of the transition dynamics. We integrate our constraints into a Bayesian approach. Across both synthetic environments and real healthcare scenarios such as Intensive Care Unit (ICU) patient management in hypotension, we demonstrate not only significant improvements in decision-making but also that our posterior can inform when transfer will be successful.
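
Illustrative sketch (not the authors' implementation): the idea of using expert near-optimality as a constraint on the dynamics can be illustrated in a tiny tabular setting. The code below assumes a known reward table, a Dirichlet posterior built from transition counts, and a simple rejection step that keeps only sampled dynamics under which each observed expert action is within `eps` of the best Q-value. Rejection sampling is a stand-in chosen for brevity; the function names, the toy MDP, and all parameters are hypothetical.

```python
import numpy as np

def q_values(T, R, gamma=0.9, iters=500):
    """Tabular value iteration. T: (S, A, S) transition tensor, R: (S, A) rewards."""
    Q = np.zeros_like(R, dtype=float)
    for _ in range(iters):
        V = Q.max(axis=1)          # greedy state values, shape (S,)
        Q = R + gamma * (T @ V)    # (S, A, S) @ (S,) -> (S, A)
    return Q

def expert_is_near_optimal(T, R, expert_actions, eps, gamma=0.9):
    """True if, under dynamics T, every observed expert action is eps-optimal."""
    Q = q_values(T, R, gamma)
    return all(Q[s].max() - Q[s, a] <= eps for s, a in expert_actions.items())

def constrained_posterior_samples(counts, R, expert_actions, eps,
                                  n_samples=50, prior=1.0, gamma=0.9, seed=0):
    """Sample dynamics from a Dirichlet posterior and keep those satisfying
    the near-optimality constraint (plain rejection sampling, an assumption)."""
    rng = np.random.default_rng(seed)
    S, A, _ = counts.shape
    kept = []
    for _ in range(n_samples):
        T = np.array([[rng.dirichlet(counts[s, a] + prior) for a in range(A)]
                      for s in range(S)])
        if expert_is_near_optimal(T, R, expert_actions, eps, gamma):
            kept.append(T)
    return kept

# Hypothetical toy MDP: state 1 is rewarding and absorbing.
R = np.array([[0.0, 0.0], [1.0, 1.0]])
T_true = np.array([[[0.1, 0.9], [0.9, 0.1]],   # state 0: action 0 usually reaches state 1
                   [[0.0, 1.0], [0.0, 1.0]]])  # state 1: always stays
T_flip = T_true.copy()
T_flip[0] = T_true[0][::-1]                    # swap the effects of the two actions in state 0

expert_actions = {0: 0}                        # the expert was seen taking action 0 in state 0
counts = np.zeros((2, 2, 2))
counts[0, 0] = [1, 4]                          # only the expert's action has data (limited coverage)
samples = constrained_posterior_samples(counts, R, expert_actions, eps=0.1)
```

In this toy setup the unvisited action in state 0 has no data, so the unconstrained posterior over its outcomes is just the prior; the constraint rules out sampled dynamics in which that action would clearly beat the expert's choice.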

Code Dataset Promise: No

Signed Copyright Form: pdf

Format Confirmation: I agree that I have read and followed the formatting instructions for the camera ready version.

Submission Number: 986
