Augmentations in Offline Reinforcement Learning for Active Positioning

ICLR 2026 Conference Submission 17132 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Offline Reinforcement Learning, Reinforcement Learning, Active Positioning, Off-Policy Learning, Value Function Geometry
TL;DR: We introduce a trajectory-based data augmentation method that improves offline reinforcement learning in active positioning tasks by leveraging task structure and geometric properties of rewards, values, and logging policies.
Abstract: We propose a method for data augmentation in offline reinforcement learning applied to active positioning problems. The approach enables the training of off-policy models from a limited number of trajectories generated by a suboptimal logging policy. Our method is a trajectory-based augmentation technique that exploits task structure and quantifies the effect of admissible perturbations on the data using the geometric interplay of the reward, the value function, and the logging policy. Moreover, we show that by training an off-policy model with our augmentation while collecting data, the suboptimal logging policy can be supported during collection, leading to higher data quality and improved offline reinforcement learning performance. We provide theoretical justification for these strategies and validate them empirically across positioning tasks of varying dimensionality and under partial observability.
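To make the idea of trajectory-based augmentation concrete, the following is a minimal, hypothetical sketch of how logged trajectories in a positioning task could be perturbed by an admissible transformation (here, a rotation about the goal) while rewards are recomputed consistently. All names (`Transition`, `reward_fn`, `rotate_2d`, `augment_trajectory`) and the distance-based reward are illustrative assumptions, not the submission's actual method or API.

```python
# Hypothetical sketch of symmetry-based trajectory augmentation for a 2D
# positioning task; not the paper's implementation.
from dataclasses import dataclass
import numpy as np


@dataclass
class Transition:
    state: np.ndarray       # agent position (x, y)
    action: np.ndarray      # displacement command
    reward: float
    next_state: np.ndarray


def reward_fn(state: np.ndarray, goal: np.ndarray) -> float:
    # Assumed reward: negative distance to the goal (a common positioning choice).
    return -float(np.linalg.norm(state - goal))


def rotate_2d(v: np.ndarray, theta: float) -> np.ndarray:
    # Rotate a 2D vector by angle theta.
    c, s = np.cos(theta), np.sin(theta)
    return np.array([c * v[0] - s * v[1], s * v[0] + c * v[1]])


def augment_trajectory(traj, goal, theta):
    """Apply an admissible perturbation (rotation about the goal) to every
    transition in a logged trajectory and recompute rewards consistently."""
    augmented = []
    for t in traj:
        s = goal + rotate_2d(t.state - goal, theta)
        s_next = goal + rotate_2d(t.next_state - goal, theta)
        a = rotate_2d(t.action, theta)
        augmented.append(Transition(s, a, reward_fn(s_next, goal), s_next))
    return augmented


if __name__ == "__main__":
    # Enlarge a small logged dataset with rotated copies of one toy trajectory.
    goal = np.zeros(2)
    rng = np.random.default_rng(0)
    states = np.cumsum(rng.normal(size=(5, 2)), axis=0)  # suboptimal random walk
    traj = [
        Transition(states[i], states[i + 1] - states[i],
                   reward_fn(states[i + 1], goal), states[i + 1])
        for i in range(4)
    ]
    dataset = list(traj)
    for theta in np.linspace(0.0, 2 * np.pi, 8, endpoint=False)[1:]:
        dataset.extend(augment_trajectory(traj, goal, theta))
    print(f"original transitions: {len(traj)}, augmented dataset: {len(dataset)}")
```

The augmented transitions could then be appended to the offline replay buffer used to train an off-policy model; how admissible perturbations are chosen and weighted via the reward, value function, and logging policy is specific to the submission and not reproduced here.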
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 17132