DeepWaveRL: Self-Supervised Full Waveform Inversion via Reinforcement Learning

ICLR 2026 Conference Submission 22537 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Full Waveform Inversion, Self-Supervised Learning, Reinforcement Learning, Computational Imaging
Abstract: Full Waveform Inversion (FWI) is a fundamental technique for estimating subsurface geophysical properties, such as velocity, from seismic measurements. Supervised deep learning methods have recently shown promising performance by directly mapping seismic data to velocity maps, but they require ground-truth velocity maps, which are costly and impractical to obtain at scale. A recent self-supervised approach (UPFWI) removes this dependency by using a differentiable forward operator to reconstruct seismic data from predicted velocity maps. However, in some practical settings the forward operator can only be accessed as a black box (e.g., legacy or commercial software), and in complex scenarios the operator may even be non-differentiable. In this paper, we address this limitation (i.e., the dependency on derivatives of the forward operator) by introducing reinforcement learning (RL) into self-supervised FWI. Our method, named DeepWaveRL, reformulates FWI as a policy learning problem in which the model generates velocity maps as actions and the forward operator is used only to compute rewards. This design avoids backpropagation through the forward operator and thus eliminates the need to compute its derivatives. Furthermore, we identify key strategies to stabilize reinforcement learning in this challenging setting. Without ground-truth labels or a differentiable forward operator, our method achieves performance competitive with supervised counterparts. We believe our approach provides a more flexible solution for the FWI research community.
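The policy-learning formulation described in the abstract can be illustrated with a minimal score-function (REINFORCE) sketch. This is not the authors' implementation: the toy travel-time operator, the linear Gaussian policy, the mean-reward baseline, and every name below (`forward_operator`, `reward`, `reinforce_step`) are assumptions chosen for illustration. It only shows that the policy can be updated using forward evaluations of the operator, with no derivatives of it.

```python
import numpy as np

def forward_operator(velocity):
    """Hypothetical black-box stand-in: cumulative vertical travel time
    through unit-thickness layers. Only its outputs are ever used."""
    return np.cumsum(1.0 / velocity, axis=-1)

def reward(velocity, observed):
    """Negative data misfit between simulated and observed measurements."""
    return -np.sum((forward_operator(velocity) - observed) ** 2, axis=-1)

def reinforce_step(W, b, observed, rng, sigma=0.1, n_samples=64, lr=1e-2):
    """One score-function update of a linear Gaussian policy (assumed form)."""
    mean = W @ observed + b                        # policy mean, one value per layer
    noise = rng.standard_normal((n_samples, mean.size))
    actions = mean + sigma * noise                 # sampled velocity maps (actions)
    # Clipping keeps velocities positive; it is treated as part of the environment.
    rewards = reward(np.clip(actions, 0.5, None), observed)
    advantages = rewards - rewards.mean()          # mean baseline (assumed variance reduction)
    score = noise / sigma                          # gradient of log-density w.r.t. the mean
    grad_mean = (advantages[:, None] * score).mean(axis=0)
    W = W + lr * np.outer(grad_mean, observed)     # chain rule through mean = W @ observed + b
    b = b + lr * grad_mean
    return W, b, rewards.mean()

# Usage on a three-layer toy model (illustrative values only).
rng = np.random.default_rng(0)
true_velocity = np.array([1.5, 2.0, 2.5])
observed = forward_operator(true_velocity)
W, b = np.zeros((3, 3)), np.full(3, 2.0)
for _ in range(200):
    W, b, mean_reward = reinforce_step(W, b, observed, rng)
```

The gradient flows only through the log-probability of the sampled actions, so `forward_operator` could be replaced by any callable, including a non-differentiable one.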
Primary Area: applications to physical sciences (physics, chemistry, biology, etc.)
Submission Number: 22537