Keywords: LLM Agent, Offline Reinforcement Learning for Recommender Systems
TL;DR: We design a tool-augmented LLM agent for offline recommendation, where a meta-controller adaptively invokes bias detection, reward refinement, and action grounding to improve both accuracy and fairness.
Abstract: Large language model (LLM) agents have recently been applied to recommender systems because of their flexible tool-use capabilities. Although existing approaches adopt reasoning-and-acting paradigms for profiling, planning, and memory augmentation, they remain ad hoc and overlook core recommendation challenges in agent-environment interactions, including debiasing and reward estimation in offline learning scenarios. In this paper, we introduce BARO (Bias And Reward Optimization), a meta-controlled, tool-augmented LLM agent framework that explicitly addresses these challenges. BARO employs a two-stage recommendation process. First, a coarse recommender generates a candidate slate based on user history. Second, a meta-controller adaptively invokes three specialized tools to refine the results: a bias detector assesses and mitigates bias in the candidate set, a reward estimator calibrates noisy offline rewards, and an action grounder selects the final recommendations from the candidate pool. This design injects bias correction and reward refinement directly into the agent's recommendation decision loop. Empirical results on two benchmark datasets show that BARO consistently improves over state-of-the-art methods on accuracy, diversity, and fairness metrics. The code will be made publicly available upon acceptance.
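The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how the tools or the meta-controller work, so every name here (`Candidate`, `coarse_recommend`, `bias_detector`, `reward_estimator`, `action_grounder`, `recommend`) and every parameter (`bias_threshold`, `shrink`, `penalty`) is hypothetical. The LLM meta-controller is replaced by a simple rule, bias is approximated by the share of the slate held by one item group, and reward calibration is approximated by shrinkage toward the slate mean.

```python
from collections import Counter
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Candidate:
    item_id: str
    group: str     # hypothetical item group, e.g. a popularity bucket
    reward: float  # noisy reward estimated from logged (offline) feedback


def coarse_recommend(history, catalog, k):
    """Stage 1 stand-in: the k unseen items with the highest logged reward."""
    seen = set(history)
    unseen = [c for c in catalog if c.item_id not in seen]
    return sorted(unseen, key=lambda c: c.reward, reverse=True)[:k]


def bias_detector(slate):
    """Toy bias signal: the dominant group and its share of the slate."""
    group, count = Counter(c.group for c in slate).most_common(1)[0]
    return group, count / len(slate)


def reward_estimator(slate, shrink=0.5):
    """Toy calibration: shrink each reward toward the slate mean."""
    mean = sum(c.reward for c in slate) / len(slate)
    return [replace(c, reward=mean + shrink * (c.reward - mean)) for c in slate]


def action_grounder(slate, n, penalized_group=None, penalty=0.1):
    """Pick the final n items, down-weighting the penalized group if any."""
    def score(c):
        return c.reward - (penalty if c.group == penalized_group else 0.0)
    return [c.item_id for c in sorted(slate, key=score, reverse=True)[:n]]


def recommend(history, catalog, k=4, n=2, bias_threshold=0.6):
    """Rule-based stand-in for the meta-controller (an LLM in the paper)."""
    slate = coarse_recommend(history, catalog, k)
    if not slate:
        return []
    group, share = bias_detector(slate)
    # Adaptive step: mitigate only when one group dominates the slate.
    penalized = group if share > bias_threshold else None
    slate = reward_estimator(slate)
    return action_grounder(slate, n, penalized_group=penalized)


# Made-up catalog, for illustration only.
catalog = [
    Candidate("a", "popular", 0.9),
    Candidate("b", "popular", 0.8),
    Candidate("c", "popular", 0.7),
    Candidate("d", "niche", 0.65),
    Candidate("e", "niche", 0.2),
]
print(recommend(["e"], catalog))  # ['a', 'd']
```

In this toy run the candidate slate is three-quarters "popular" items, so the rule triggers the penalty and the niche item `d` displaces `b` in the final list. With `bias_threshold=1.0` the penalty is never applied and the result is `['a', 'b']`.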
Primary Area: other topics in machine learning (i.e., none of the above)
Submission Number: 24833