Model-free reinforcement learning with noisy actions for automated experimental control in optics

Lea Richtmann; Viktoria-S. Schmiesing; Dennis Wilken; Jan Heine; Aaron D Tranter; Avishek Anand; Tobias J. Osborne; Michèle Heurs

Published: 08 Jul 2025, Last Modified: 08 Jul 2025. Accepted by TMLR. License: CC BY 4.0

Abstract: Setting up and controlling optical systems is often a challenging and tedious task. The high number of degrees of freedom to control mirrors, lenses, or phases of light makes automatic control challenging, especially when the complexity of the system cannot be adequately modeled due to noise or non-linearities. Here, we show that reinforcement learning (RL) can overcome these challenges when coupling laser light into an optical fiber, using a model-free RL approach that trains directly on the experiment without pre-training on simulations. By utilizing the sample-efficient algorithms Soft Actor-Critic (SAC), Truncated Quantile Critics (TQC), or CrossQ, our agents learn to couple with 90% efficiency. A human expert reaches this efficiency, but the RL agents are quicker. In particular, the CrossQ agent outperforms the other agents in coupling speed while requiring only half the training time. We demonstrate that direct training on an experiment can replace extensive system modeling. Our result exemplifies RL's potential to tackle problems in optics, paving the way for more complex applications where full noise modeling is not feasible.

Submission Length: Regular submission (no more than 12 pages of main content)

Changes Since Last Submission: As suggested by the reviewers, we have repeated our experiment using the more recent algorithm CrossQ. It significantly outperforms the previously tested algorithms TQC and SAC. Using CrossQ, the training time can be halved, and the trained agent requires fewer steps to reach the goal. To provide a better comparison between algorithms, we also ran a training with the highest goal of 90% coupling efficiency using SAC. We now provide a comprehensive comparison of three algorithms employed in a real-world scenario with agents trained in situ. This has significantly increased the quality of our manuscript, and we are grateful for this suggestion. We have updated the paper accordingly. In particular, we have adapted Section 5 and replaced Figure 2 to incorporate these new findings. We have also made a few minor editorial changes to improve readability.

Code: https://github.com/ViktoriaSchmiesing/RL_Fiber_Coupling

Supplementary Material: zip

Assigned Action Editor: ~Zheng_Wen1

Submission Number: 4186
