Keywords: Reinforcement Learning, Blind Face Restoration
Abstract: Blind Face Restoration (BFR) encounters inherent challenges in exploring its large solution space, leading to common artifacts like missing details and identity ambiguity in the restored images.
To tackle these challenges, we propose a **L**ikelihood-**R**egularized **P**olicy **O**ptimization (LRPO) framework, the first to apply online reinforcement learning (RL) to the BFR task. LRPO leverages rewards from sampled candidates to refine the policy network, increasing the likelihood of high-quality outputs and thereby improving restoration of low-quality inputs.
However, directly applying RL to BFR introduces a compatibility problem: the restored results can deviate significantly from the ground truth. To balance perceptual quality with fidelity, we propose three key strategies: 1) a composite reward function tailored to face restoration assessment, 2) ground-truth-guided likelihood regularization, and 3) noise-level advantage assignment.
Extensive experiments demonstrate that our proposed LRPO significantly improves face restoration quality over baseline methods and achieves _state-of-the-art_ performance. The source code and models are available at: https://anonymous.4open.science/r/LRPO-5874.
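As a rough illustration only (not the paper's implementation), the sketch below shows the general shape of a likelihood-regularized policy update of the kind the abstract describes: reward-weighted likelihood of sampled candidates, a ground-truth likelihood regularizer, and advantages scaled by a noise level. All names here (`ToyPolicy`, `lrpo_step`, `sigma`, `lam`, the Gaussian policy, and the placeholder reward) are assumptions for the sketch; the actual LRPO objective, reward, and diffusion-based policy are defined in the paper and released code.

```python
import torch
import torch.nn as nn

# Hypothetical toy policy: predicts the mean of a Gaussian over restored images
# from a low-quality input; stands in for the actual restoration network.
class ToyPolicy(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, lq):
        return self.net(lq)

def lrpo_step(policy, optimizer, lq, gt, reward_fn, noise_level,
              n_samples=4, sigma=0.1, lam=0.1):
    """One illustrative update (assumed form, not the paper's exact objective):
    reward-weighted log-likelihood of sampled candidates, plus a ground-truth
    likelihood regularizer, with advantages scaled by the noise level."""
    mean = policy(lq)                                      # policy mean for this input
    log_probs, rewards = [], []
    for _ in range(n_samples):
        candidate = mean + sigma * torch.randn_like(mean)  # sample a restoration candidate
        # Gaussian log-likelihood of the candidate under the current policy
        log_prob = -((candidate - mean) ** 2).sum(-1) / (2 * sigma ** 2)
        log_probs.append(log_prob)
        rewards.append(reward_fn(candidate.detach(), gt))  # quality reward (stand-in)
    log_probs = torch.stack(log_probs)                     # [n_samples, batch]
    rewards = torch.stack(rewards)
    # Noise-level advantage assignment (assumed): scale centered rewards by noise level
    advantages = noise_level * (rewards - rewards.mean(0, keepdim=True))
    policy_loss = -(advantages.detach() * log_probs).mean()
    # Ground-truth-guided likelihood regularization: keep the GT likely under the policy
    gt_reg = ((gt - mean) ** 2).sum(-1).mean() / (2 * sigma ** 2)
    loss = policy_loss + lam * gt_reg
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Minimal usage with random tensors and a placeholder (negative-MSE) reward.
policy = ToyPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
lq, gt = torch.randn(8, 64), torch.randn(8, 64)
reward = lambda x, y: -((x - y) ** 2).mean(-1)
lrpo_step(policy, opt, lq, gt, reward, noise_level=0.5)
```

The regularizer term is what keeps reward maximization from drifting away from the ground truth, which is the fidelity concern the abstract raises; how the composite reward and noise-level weighting are actually constructed is specified in the paper.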
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 5007