Keywords: reinforcement learning, quantum machine learning, robotics
Abstract: We present a novel hybrid quantum-classical actor-critic reinforcement learning (RL) model. In the noisy intermediate-scale quantum (NISQ) era, limited qubit resources make fully quantum models impractical. To address this, we propose Quantum-Critic Proximal Policy Optimization (QC-PPO), in which the critic is a quantum neural network (QNN) and the actor is a conventional neural network. We further argue that allocating quantum capacity to the critic, rather than the actor, is the more natural lever for performance gains in actor-critic RL: the critic's bootstrapped value estimates determine the advantages, and the advantages in turn set the direction of every policy update. Evaluations on multiple MuJoCo environments show consistent improvements; on Humanoid-v4, QC-PPO improves the median return by 52.3% over PPO with the same number of environment steps, demonstrating its potential for on-board applications.
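The argument that the critic steers every policy update can be made concrete with a small sketch. It assumes generalized advantage estimation (GAE) and PPO's clipped surrogate, as used in common PPO implementations; the submission's exact estimator and hyperparameters are not stated here, and the function names `gae_advantages` and `ppo_clip_objective` are hypothetical. Critic outputs are plain floats standing in for any value function, classical or quantum.

```python
from typing import List


def gae_advantages(rewards: List[float], values: List[float], last_value: float,
                   dones: List[bool], gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized advantage estimation over one rollout (assumed estimator).

    values[t] is the critic's estimate V(s_t); last_value bootstraps V(s_T).
    dones[t] is True if the episode terminated after step t.
    """
    T = len(rewards)
    advantages = [0.0] * T
    gae = 0.0
    for t in reversed(range(T)):
        next_value = last_value if t == T - 1 else values[t + 1]
        nonterminal = 0.0 if dones[t] else 1.0
        # TD residual: the critic enters twice, as V(s_t) and as the bootstrap V(s_{t+1}).
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        # Exponentially weighted sum of future residuals, cut at episode boundaries.
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    return advantages


def ppo_clip_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """PPO clipped surrogate for one sample; the advantage's sign sets the update direction."""
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    dones = [False, False, True]
    # Two hypothetical critics evaluating the same trajectory (true returns: 3, 2, 1).
    low_critic = [2.5, 1.5, 0.5]
    high_critic = [3.5, 2.5, 1.5]
    print(gae_advantages(rewards, low_critic, 0.0, dones, gamma=1.0, lam=1.0))
    print(gae_advantages(rewards, high_critic, 0.0, dones, gamma=1.0, lam=1.0))
```

With identical rewards and actions, the critic that underestimates the returns yields advantages of `[0.5, 0.5, 0.5]`, while the one that overestimates them yields `[-0.5, -0.5, -0.5]`. The same sampled actions are therefore reinforced in one case and suppressed in the other, which is why critic accuracy directly affects the direction of each policy update.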
Primary Area: reinforcement learning
Submission Number: 22542