Keywords: Reinforcement Learning, Reward Mechanism, Causal AI, Knowledge Representation, Knowledge Graph, Counterfactual Reasoning
TL;DR: We propose KARMA, a framework that integrates domain knowledge and causal inference to dynamically adjust rewards in RL, improving efficiency, robustness, and generalization compared to standard and spurious reward signals.
Abstract: Designing effective reward functions is a fundamental challenge in reinforcement learning (RL), and poorly specified or spurious reward signals can severely hinder generalization and robustness. Recent findings in reinforcement learning from human feedback and from verifiable rewards (RLHF/RLVR) show that even incorrect or random rewards can yield short-term gains but fail to provide reliable training signals. We introduce KARMA, a causally informed reward adjustment framework that integrates structured domain knowledge with causal representation learning to refine the reward signal. KARMA dynamically estimates causal effects via counterfactual reasoning and adapts rewards accordingly, mitigating the impact of misleading correlations. We provide theoretical guarantees on convergence and sample efficiency, and demonstrate on controlled benchmarks (grid navigation, robotic skill acquisition, and traffic control) that KARMA achieves substantial gains in final performance (up to 30%) and converges significantly faster than strong baselines. Moreover, KARMA exhibits improved out-of-distribution generalization and robustness under noisy observations, pointing to a new paradigm for reliable reward design in RL.
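The abstract does not spell out KARMA's update rule, so the following is only a minimal sketch of the general idea it describes: estimating the causal effect of an action via a counterfactual baseline and using that estimate to adjust the raw reward. Every name here (`LinearCausalModel`, `CausalRewardAdjuster`, the blending weight `alpha`, the no-op `reference_action`) is a hypothetical stand-in for illustration, not the paper's actual method.

```python
import numpy as np


class LinearCausalModel:
    """Toy outcome model: E[outcome | state, action] = w . state + b[action]."""

    def __init__(self, w, b, reference_action=0):
        self.w = np.asarray(w, dtype=float)
        self.b = np.asarray(b, dtype=float)
        self.reference_action = reference_action  # stands in for do(action = a0)

    def predict(self, state, action):
        return float(self.w @ np.asarray(state, dtype=float) + self.b[action])


class CausalRewardAdjuster:
    """Blends the raw environment reward with the estimated causal effect of
    the taken action, discounting reward attributable to spurious correlates."""

    def __init__(self, causal_model, alpha=0.5):
        self.causal_model = causal_model
        self.alpha = alpha  # 0 = raw reward only, 1 = causal estimate only

    def adjusted_reward(self, state, action, raw_reward):
        # Factual prediction: expected outcome under the action actually taken.
        factual = self.causal_model.predict(state, action)
        # Counterfactual baseline: expected outcome had the reference action
        # been taken instead, holding the state fixed.
        counterfactual = self.causal_model.predict(
            state, self.causal_model.reference_action
        )
        causal_effect = factual - counterfactual
        # Reward attributable to the action's causal effect replaces part of
        # the raw (possibly spurious) signal.
        return (1.0 - self.alpha) * raw_reward + self.alpha * causal_effect


if __name__ == "__main__":
    model = LinearCausalModel(w=[0.2, -0.1], b=[0.0, 1.0])
    adjuster = CausalRewardAdjuster(model, alpha=0.5)
    # Factual 1.15 vs. counterfactual 0.15 -> causal effect 1.0; output 1.5.
    print(adjuster.adjusted_reward(state=[1.0, 0.5], action=1, raw_reward=2.0))
```

In this toy form, an action whose outcome matches the counterfactual baseline contributes no causal credit, so reward driven purely by correlated state features is attenuated rather than reinforced.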
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 2830