Aligning Agent Policies with Preferences: Human-Centered Interpretable Reinforcement Learning

Published: 22 Sept 2025, Last Modified: 03 Jan 2026, WiML @ NeurIPS 2025, CC BY 4.0
Keywords: reinforcement learning, interpretability, explainability, transparency, AI agents, interpretable reinforcement learning, learning from human feedback
Submission Number: 22