Belief-Aware Decision Transformers for Offline-to-Online Decision-Making under Partial Observability: A Geosteering Case Study
Keywords: Decision Transformer, Offline Reinforcement Learning, Decision Making, Uncertainty Quantification, Geosteering
TL;DR: We propose a belief-aware Decision Transformer with two heads, a structured belief head and an action head conditioned on that belief, and test it on geosteering. Coupling the action to the belief improves decision quality; a strict belief bottleneck improves belief calibration but degrades decision quality.
Abstract: Sequence-modeling approaches such as Decision Transformers learn offline policies directly from trajectories, but their internal representation of hidden state remains implicit and difficult to inspect or calibrate. We introduce a belief-aware Decision Transformer that makes hidden state explicit within a single offline sequence model: a shared transformer encoder feeds a structured belief head that predicts physically meaningful hidden variables and an action head conditioned on this belief, with the two heads supervised jointly against simulator ground truth and offline actions, respectively. We exemplify the framework in geosteering, where the agent must steer a drilling trajectory through a thin reservoir layer using only noisy indirect measurements, with each decision irreversibly committing part of the well path. Comparing four architectural variants, we find that conditioning the action head on the predicted belief yields the strongest decision quality, while a strict belief bottleneck improves belief calibration at the cost of decision quality. These findings suggest that belief structure should inform but not entirely constrain offline policies under structured partial observability, and that physically meaningful beliefs offer a natural interface for future offline-to-online adaptation.
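The two-head coupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the shared transformer encoder is replaced by a given feature vector, both heads are single linear layers, and the joint objective is assumed to be a weighted sum of mean squared errors. All names (`BeliefAwareHeads`, `joint_loss`, `strict_bottleneck`, `belief_weight`) are hypothetical. Only two of the four variants are shown: the belief-conditioned action head, which sees the encoder features and the belief, and the strict bottleneck, which sees the belief alone.

```python
import random


def init_linear(n_out, n_in, rng):
    """Create a small random weight matrix (list of rows) and a zero bias."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def linear(weights, bias, x):
    """Apply y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


class BeliefAwareHeads:
    """Belief head plus action head on top of (assumed) encoder features."""

    def __init__(self, feat_dim, belief_dim, action_dim, strict_bottleneck=False, seed=0):
        rng = random.Random(seed)
        self.strict_bottleneck = strict_bottleneck
        self.belief_w, self.belief_b = init_linear(belief_dim, feat_dim, rng)
        # Strict bottleneck: the action head sees only the belief.
        # Otherwise it sees the encoder features concatenated with the belief.
        action_in = belief_dim if strict_bottleneck else feat_dim + belief_dim
        self.action_w, self.action_b = init_linear(action_dim, action_in, rng)

    def forward(self, features):
        belief = linear(self.belief_w, self.belief_b, features)
        action_input = belief if self.strict_bottleneck else features + belief
        action = linear(self.action_w, self.action_b, action_input)
        return belief, action


def joint_loss(belief, action, true_hidden, offline_action, belief_weight=1.0):
    """Assumed objective: belief error against simulator ground truth
    plus action error against the logged offline action."""
    return belief_weight * mse(belief, true_hidden) + mse(action, offline_action)


# Stand-in for the encoder output at one timestep (hypothetical values).
features = [0.2, -0.1, 0.4, 0.0]
heads = BeliefAwareHeads(feat_dim=4, belief_dim=2, action_dim=1)
belief, action = heads.forward(features)
loss = joint_loss(belief, action, true_hidden=[1.0, 0.5], offline_action=[0.0])
```

Passing `strict_bottleneck=True` routes every action through the belief vector alone, which is the setting the abstract reports as better for belief calibration but worse for decisions.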
Submission Number: 82