RL-I2IT: Image-to-Image Translation with Deep Reinforcement Learning

Jing Hu, Ziwei Luo, Chengming Feng, Shu Hu, Bin Zhu, Xi Wu, Xin Li, Hongtu Zhu, Siwei Lyu, Xin Wang

Published: 01 Oct 2025 · Last Modified: 10 Nov 2025 · Neural Networks · License: CC BY-SA 4.0
Abstract: Most existing Image-to-Image Translation (I2IT) methods generate images in a single run of a deep learning (DL) model. However, designing a single-step model often requires many parameters and suffers from overfitting. Inspired by the analogy between diffusion models and reinforcement learning, we reformulate I2IT as an iterative decision-making problem via deep reinforcement learning (DRL) and propose a computationally efficient RL-based I2IT (RL-I2IT) framework. The key feature of the RL-I2IT framework is the decomposition of a monolithic learning process into small steps, using a lightweight model to progressively transform the source image into the target image. To address the challenge of handling high-dimensional continuous state and action spaces in the conventional RL framework, we introduce a meta policy with a new concept, the "plan", into the standard actor-critic model. The plan has a lower dimension than the original image, which makes it easier for the actor to generate a tractable high-dimensional action. Within the RL-I2IT framework, we also employ a task-specific auxiliary learning strategy to stabilize training and improve performance on the corresponding task. Experiments on several I2IT tasks demonstrate the effectiveness and robustness of the proposed method when facing high-dimensional continuous action-space problems. Our implementation of the RL-I2IT framework is available at https://github.com/lesley222/RL-I2IT.
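To make the planner-actor-critic decomposition described above concrete, the following is a minimal PyTorch sketch, not the authors' implementation (see the linked repository for that): a meta policy ("planner") encodes the current state image into a low-dimensional stochastic plan, the actor decodes the plan into a high-dimensional image-sized action, and the critic scores (state, plan) pairs. All module names, layer sizes, and the residual state update are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Planner(nn.Module):
    """Meta policy: encodes the current state into a low-dimensional plan."""
    def __init__(self, in_ch=3, plan_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.mu = nn.Linear(64, plan_dim)
        self.log_std = nn.Linear(64, plan_dim)

    def forward(self, state):
        h = self.encoder(state)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-5, 2)
        # Stochastic plan via the reparameterization trick.
        return mu + log_std.exp() * torch.randn_like(mu)

class Actor(nn.Module):
    """Decodes a low-dimensional plan into an image-sized action."""
    def __init__(self, plan_dim=128, out_ch=3, size=64):
        super().__init__()
        self.size = size
        self.fc = nn.Linear(plan_dim, 64 * (size // 4) ** 2)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, out_ch, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, plan):
        h = self.fc(plan).view(-1, 64, self.size // 4, self.size // 4)
        return self.decoder(h)

class Critic(nn.Module):
    """Q-function over (state, plan), so the value acts on the low-dim plan."""
    def __init__(self, in_ch=3, plan_dim=128):
        super().__init__()
        self.encoder = Planner(in_ch, plan_dim).encoder  # reuse the conv trunk
        self.q = nn.Sequential(nn.Linear(64 + plan_dim, 128), nn.ReLU(),
                               nn.Linear(128, 1))

    def forward(self, state, plan):
        return self.q(torch.cat([self.encoder(state), plan], dim=-1))

# One iterative translation step: plan -> action -> next state (residual update).
state = torch.randn(1, 3, 64, 64)
planner, actor, critic = Planner(), Actor(), Critic()
plan = planner(state)                       # low-dimensional plan
action = actor(plan)                        # high-dimensional action
next_state = (state + action).clamp(-1, 1)  # progressively refined image
q_value = critic(state, plan)               # critic evaluates the plan
```

Evaluating the critic on the low-dimensional plan rather than on the full image-sized action is what keeps the value estimation tractable in this decomposition; repeating the step above for several iterations corresponds to the progressive source-to-target transformation described in the abstract.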