Refining Bias and Reward in LLM Recommender Agents through Meta-Controlled Tool Invocation

Yi Zhang; Ruihong Qiu; Jiajun Liu; Guansong Pang; Sen Wang

20 Sept 2025 (modified: 11 Feb 2026). Submitted to ICLR 2026. License: CC BY 4.0

Keywords: LLM Agent, Offline Reinforcement Learning for Recommender Systems

TL;DR: We design a tool-augmented LLM agent for offline recommendation, where a meta-controller adaptively invokes bias detection, reward refinement, and action grounding to improve both accuracy and fairness.

Abstract: Large language model (LLM) agents have recently been applied to recommender systems because of their flexible tool-use capabilities. Although existing approaches adopt reasoning-and-acting paradigms for profiling, planning, and memory augmentation, they remain ad hoc and overlook core recommendation challenges in agent-environment interactions, including debiasing and reward estimation in offline learning scenarios. In this paper, we introduce BARO (Bias And Reward Optimization), a meta-controlled, tool-augmented LLM agent framework that explicitly addresses these challenges. BARO employs a two-stage recommendation process. First, a coarse recommender generates a candidate slate based on user history. Second, a meta-controller adaptively invokes three specialized tools to refine the results: a bias detector assesses and mitigates bias in the candidate set, a reward estimator calibrates noisy offline rewards, and an action grounder selects the final recommendations from the candidate pool. This design injects bias correction and reward refinement directly into the agent’s decision loop. Empirical results on two benchmark datasets show that BARO consistently improves over state-of-the-art methods on accuracy, diversity, and fairness metrics. The code will be made publicly available upon acceptance.
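The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Item`, `coarse_recommender`, `bias_detector`, `reward_estimator`, `action_grounder`, `meta_controller`), the thresholds, and the toy heuristics are assumptions made for illustration. In particular, a simple rule-based controller stands in for the LLM meta-controller, category skew stands in for bias, and shrinkage toward the mean stands in for reward calibration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    item_id: str
    category: str
    logged_reward: float  # noisy reward taken from offline logs (toy value)


def coarse_recommender(history, catalog, k):
    """Stage 1 (toy): top-k unseen items ranked by logged reward."""
    seen = set(history)
    pool = [it for it in catalog if it.item_id not in seen]
    return sorted(pool, key=lambda it: it.logged_reward, reverse=True)[:k]


def bias_detector(candidates):
    """Hypothetical bias signal: share of the most frequent category."""
    counts = Counter(it.category for it in candidates)
    return max(counts.values()) / len(candidates)


def reward_estimator(candidates, shrink=0.5):
    """Hypothetical calibration: shrink each logged reward toward the mean."""
    mean = sum(it.logged_reward for it in candidates) / len(candidates)
    return {it.item_id: mean + shrink * (it.logged_reward - mean)
            for it in candidates}


def action_grounder(candidates, scores, n, per_category_cap=None):
    """Pick the final n items by score, optionally capping each category."""
    ranked = sorted(candidates, key=lambda it: scores[it.item_id], reverse=True)
    chosen, used = [], Counter()
    for it in ranked:
        if per_category_cap is not None and used[it.category] >= per_category_cap:
            continue  # skip items from a category that already hit its cap
        chosen.append(it)
        used[it.category] += 1
        if len(chosen) == n:
            break
    return chosen


def meta_controller(history, catalog, k=5, n=3,
                    bias_threshold=0.5, spread_threshold=0.3):
    """Stage 2 (toy): rule-based stand-in for the LLM meta-controller."""
    candidates = coarse_recommender(history, catalog, k)
    trace = ["coarse_recommender"]

    # Always measure skew; only constrain the slate when it is too high.
    trace.append("bias_detector")
    cap = max(1, n // 2) if bias_detector(candidates) > bias_threshold else None

    # Calibrate rewards only when the logged values are widely spread.
    rewards = [it.logged_reward for it in candidates]
    if max(rewards) - min(rewards) > spread_threshold:
        scores = reward_estimator(candidates)
        trace.append("reward_estimator")
    else:
        scores = {it.item_id: it.logged_reward for it in candidates}

    trace.append("action_grounder")
    return action_grounder(candidates, scores, n, per_category_cap=cap), trace


catalog = [
    Item("a1", "action", 0.9), Item("a2", "action", 0.8),
    Item("a3", "action", 0.7), Item("c1", "comedy", 0.6),
    Item("d1", "drama", 0.5), Item("a4", "action", 0.95),
]
final, trace = meta_controller(history=["a4"], catalog=catalog)
print([it.item_id for it in final], trace)
```

In this toy run the candidate slate is dominated by one category, so the controller applies a per-category cap before grounding the final selection. A real system would replace each heuristic with a learned or LLM-driven component.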

Primary Area: other topics in machine learning (i.e., none of the above)

Submission Number: 24833
