A large language model-driven reward design framework via dynamic feedback for reinforcement learning
Abstract: Highlights
• We introduce CARD, an LLM-based framework for reward code design and refinement.
• Our method lowers human costs, token usage, and training time.
• Results show that our method outperforms baselines and exceeds the human oracle.
External IDs: dblp:journals/kbs/SunLLYZL25