HeaPA: Difficulty-Aware Heap Sampling and On-Policy Query Augmentation for LLM Reinforcement Learning.

Weiqi Wang 0001, Xin Liu 0039, Binxuan Huang, Hejie Cui, Rongzhi Zhang, Changlong Yu, Shuowei Jin, Jingfeng Yang 0001, Qingyu Yin, Zhengyang Wang, Zheng Li 0018, Yifan Gao 0001, Priyanka Nigam, Bing Yin, Lihong Li 0001, Yangqiu Song

01 Apr 2026 | CoRR 2026 | Everyone | CC BY-SA 4.0