Keywords: machine learning, model-based reinforcement learning, reinforcement learning, segment anything model, model interpretability, model explanation
TL;DR: Noisy explainability signals, aggregated via a segmentation model, can select what's important for the world model during MBRL training.
Abstract: Model-based reinforcement learning (MBRL) offers sample-efficient policy optimization but is susceptible to distractions. We address this by developing Policy-Shaped Prediction (PSP), a method that empowers agents to interpret their own policies and shape their world models accordingly. By combining gradient-based interpretability, pretrained segmentation models, and adversarial learning, PSP outperforms existing distractor-reduction approaches. This work represents an interpretability-driven advance towards robust MBRL.
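A minimal sketch may make the abstract's pipeline concrete. The PyTorch snippet below is an illustrative assumption of how noisy gradient saliency from the policy could be averaged within segmentation masks and used to weight a world-model loss; `TinyPolicy`, `segment_aggregated_weights`, the mask format, and the weighting scheme are hypothetical stand-ins, not the paper's implementation, and the adversarial-learning component is not shown.

```python
import torch
import torch.nn as nn

# Hypothetical policy head: maps a flattened observation to action logits.
class TinyPolicy(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, x):
        return self.net(x)

def saliency_map(policy, obs):
    """Gradient-based importance of each observation pixel for the policy."""
    obs = obs.clone().requires_grad_(True)
    logits = policy(obs.flatten(1))
    # Gradient of a simple scalar target (sum of max logits) w.r.t. the
    # observation yields a noisy per-pixel saliency signal.
    logits.max(dim=1).values.sum().backward()
    return obs.grad.abs()

def segment_aggregated_weights(saliency, masks):
    """Average noisy saliency within each segment, then paint the
    per-segment mean back onto that segment's pixels."""
    weights = torch.zeros_like(saliency)
    for m in masks:  # m: boolean (H, W) mask, e.g. from a pretrained segmenter
        m = m.unsqueeze(0).expand_as(saliency)
        weights[m] = saliency[m].mean()
    return weights / (weights.max() + 1e-8)  # normalize to [0, 1]

# --- toy usage on random data ---
H = W = 8
obs = torch.rand(1, H, W)
policy = TinyPolicy(H * W, n_actions=4)
sal = saliency_map(policy, obs)

# Two dummy masks standing in for segmentation-model output.
masks = [torch.zeros(H, W, dtype=torch.bool), torch.ones(H, W, dtype=torch.bool)]
masks[0][:, : W // 2] = True
masks[1][:, : W // 2] = False

w = segment_aggregated_weights(sal, masks)

# Weight the world model's per-pixel reconstruction loss by importance,
# so pixels the policy ignores (e.g. distractors) contribute less.
recon = torch.rand(1, H, W)
weighted_loss = (w * (recon - obs) ** 2).mean()
print(weighted_loss.item())
```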
Track: Main track
Submitted Paper: Yes
Published Paper: No
Published Venue: NeurIPS
Submission Number: 57