Reinforcement Learning Agents in Quantum Code Discovery with Argmax-Preserving Quantization

18 Sept 2025 (modified: 11 Feb 2026) | Submitted to ICLR 2026 | CC BY 4.0
Keywords: Quantum Technology, Agents, Quantization
TL;DR: We propose a quantization scheme that preserves the action rankings of reinforcement learning agents, allowing efficient yet accurate discovery of quantum error-correcting codes.
Abstract: Reinforcement learning (RL) has recently been employed to autonomously discover quantum error-correcting codes and their encoders tailored to specific noise models and hardware constraints. However, RL policies are highly sensitive to approximation errors, and conventional quantization often disrupts action ranking, leading to degraded exploration and suboptimal codes. We propose Argmax-Preserving Quantization (APQ), a quantization method that directly regularizes action ranking during quantization-aware training. APQ minimizes ranking errors between full-precision and quantized policies, ensuring stable action selection even under low-bit representations. To further safeguard correctness, we integrate a reward-safe constraint that bounds perturbations of the Knill–Laflamme conditions under quantization. Experiments with policy-gradient agents on Clifford-simulated environments show that APQ maintains discovery of [[n, k, d]] codes with distance up to 5 using INT8 networks, matching the logical error suppression of FP16 baselines while reducing inference cost by 3.8×. Our approach demonstrates that decision-consistent quantization can substantially accelerate RL-based quantum code discovery without sacrificing the quality of discovered codes.
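The abstract does not state the exact form of the ranking regularizer, so the following is only a minimal sketch of one plausible reading: a hinge penalty that asks the quantized policy to keep the full-precision policy's top action ahead of every other action by a margin. The function names (`fake_quantize`, `argmax_hinge_loss`, `argmax_agreement`), the symmetric uniform quantizer, and the margin value are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence


def fake_quantize(values: Sequence[float], num_bits: int = 8) -> List[float]:
    """Symmetric uniform quantize-then-dequantize (assumed stand-in for QAT)."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0.0 for _ in values]
    scale = max_abs / qmax
    # Round to the integer grid, clamp to the representable range, rescale.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest entry (first index wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def argmax_hinge_loss(
    fp_logits: Sequence[float], q_logits: Sequence[float], margin: float = 0.1
) -> float:
    """Hypothetical regularizer: penalize every action whose quantized logit
    comes within `margin` of (or exceeds) the quantized logit of the action
    that the full-precision policy ranks first."""
    top = argmax(fp_logits)
    return sum(
        max(0.0, margin - (q_logits[top] - q))
        for i, q in enumerate(q_logits)
        if i != top
    )


def argmax_agreement(
    fp_batch: Sequence[Sequence[float]], q_batch: Sequence[Sequence[float]]
) -> float:
    """Fraction of states where both policies select the same greedy action."""
    matches = sum(argmax(f) == argmax(q) for f, q in zip(fp_batch, q_batch))
    return matches / len(fp_batch)
```

In a training loop, a term like `argmax_hinge_loss` would presumably be added to the policy-gradient objective with a weighting coefficient, while `argmax_agreement` would serve as a diagnostic for how often the quantized and full-precision policies choose the same action.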
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 11552