Laplacian Flows for Policy Learning from Experience

Published: 02 Mar 2026, Last Modified: 11 Mar 2026 · ICLR 2026 Workshop GRaM Poster · CC BY 4.0
Track: long paper (up to 8 pages)
Keywords: Policy Optimization, Machine Learning, Wasserstein Geometry (Manifold), Laplacian Learning
Abstract: Many learning and decision-making systems output conditional distributions rather than point predictions, yet they are trained via locally “reasonable” myopic gradient updates that implicitly assume their composition remains globally stable and feasible. In RL (policy gradient/actor–critic) and LLMs (cross-entropy), drift is typically controlled by KL/Fisher trust regions, which need not reflect the true behavioral scale of policy change; small per-step moves can therefore accumulate into large transport-scale shifts that break stability, long-horizon evidence integration, and robustness (much as a millimeter-scale map error can cause a catastrophic fall in physical space). We propose the Policy Laplacian Trace (PLT): retrieved historical policies define an OT-induced local graph, and each update solves a variational OT+KL proximal step that couples a Wasserstein barycenter term with KL regularization, yielding experience-induced Laplacian smoothing of task-gradient drift. Geometrically, PLT connects to Laplace learning in Wasserstein space: its discrete graph energy approximates a p-Dirichlet/Laplace–Beltrami energy on the realizable policy subset. Empirically, PLT is plug-and-play: it improves PPO/MAPPO stability, sample efficiency, and robustness under controlled shifts, and it strengthens LLM-as-policy performance on counterfactual trust, long-range factual recall, and few-shot novel-category learning across GPT-family models, while maintaining or improving base performance and calibration.
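To make the update concrete, below is a minimal sketch of one PLT-style step for categorical policies. It is an illustration under simplifying assumptions, not the paper's implementation: the OT-induced graph weights come from pairwise 1-Wasserstein distances (ground cost |i − j| over action indices), and the Wasserstein barycenter term is approximated by pulling the logits toward the OT-weighted average of historical policy logits. Since an additive step in logit space followed by softmax is exactly a multiplicative-weights/KL-mirror step on the simplex, this also supplies the KL-proximal coupling. The names `plt_step`, `ot_weights` and the hyperparameters `eta`, `lam`, `sigma` are hypothetical.

```python
import numpy as np
from scipy.special import softmax
from scipy.stats import wasserstein_distance

def ot_weights(p, history, sigma=0.5):
    """Gaussian-kernel graph weights from 1-Wasserstein distances between
    the current categorical policy p and each retrieved historical policy.
    Supports are action indices 0..K-1, i.e. ground cost |i - j|."""
    support = np.arange(p.shape[-1])
    d = np.array([wasserstein_distance(support, support, p, h) for h in history])
    w = np.exp(-(d / sigma) ** 2)
    return w / w.sum()

def plt_step(logits, task_grad, hist_logits, eta=0.1, lam=0.3, sigma=0.5):
    """One PLT-style update (sketch, not the paper's variational OT solve):
    a task-gradient step in logit space, smoothed toward the OT-graph-weighted
    average of historical policy logits (experience-induced Laplacian smoothing).
    softmax(logits + delta) is proportional to p * exp(delta), so the additive
    logit step is a KL-proximal (mirror-descent) step on the simplex."""
    p = softmax(logits)
    w = ot_weights(p, softmax(hist_logits, axis=-1), sigma)
    neighbor_avg = w @ hist_logits          # barycenter surrogate in logit space
    return logits - eta * task_grad + lam * (neighbor_avg - logits)

# Toy usage: a 4-action policy with 3 retrieved historical policies.
rng = np.random.default_rng(0)
logits = rng.normal(size=4)
hist_logits = rng.normal(size=(3, 4))
task_grad = rng.normal(size=4)
print(softmax(plt_step(logits, task_grad, hist_logits)))
```

The design intent this sketch mirrors is that the retrieval-weighted smoothing term, not the KL radius alone, controls transport-scale drift: policies far from the current one in Wasserstein distance receive exponentially small graph weight and thus little influence on the update.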
Anonymization: This submission has been anonymized for double-blind review via the removal of identifying information such as names, affiliations, and identifying URLs.
Submission Number: 11