Automatically Benchmarking LLM Code Agents through Agent-Driven Annotation and Evaluation.

Lingyue Fu, Bolun Zhang, Hao Guan, Yaoming Zhu, Lin Qiu, Weiwen Liu, Xuezhi Cao, Xunliang Cai, Weinan Zhang, Yong Yu

06 Jan 2026 (modified: 21 Jan 2026), CoRR 2025, CC BY-SA 4.0