Policy-shaped prediction: improving world modeling through interpretability

Published: 10 Oct 2024 · Last Modified: 03 Dec 2024 · IAI Workshop @ NeurIPS 2024 · CC BY 4.0
Keywords: machine learning, model based reinforcement learning, reinforcement learning, segment anything model, model interpretability, model explanation
TL;DR: Noisy explainability signals, aggregated over the masks of a segmentation model, can identify what matters to the policy and focus world-model training on it during MBRL.
Abstract: Model-based reinforcement learning (MBRL) offers sample-efficient policy optimization but is susceptible to distractions. We address this by developing Policy-Shaped Prediction (PSP), a method that empowers agents to interpret their own policies and shape their world models accordingly. By combining gradient-based interpretability, pretrained segmentation models, and adversarial learning, PSP outperforms existing distractor-reduction approaches. This work represents an interpretability-driven advance towards robust MBRL.
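The abstract describes combining noisy gradient-based importance scores with a pretrained segmentation model's masks to decide which parts of an observation the world model should focus on. The snippet below is a minimal sketch of the aggregation step only, under assumptions not spelled out in the abstract: the per-pixel saliency map and the `aggregate_saliency` helper are hypothetical stand-ins, and the masks are assumed to come from a segmentation model such as Segment Anything.

```python
import numpy as np

def aggregate_saliency(saliency, masks):
    """Pool noisy per-pixel saliency within each segmentation mask.

    saliency: (H, W) non-negative importance scores, e.g. magnitudes of
        policy gradients w.r.t. input pixels (hypothetical signal here).
    masks: list of (H, W) boolean arrays from a pretrained segmentation
        model (assumed; e.g. Segment Anything).
    Returns an (H, W) weight map that could modulate a world model's
    per-pixel prediction loss, denoising saliency by averaging per segment.
    """
    weights = np.zeros_like(saliency, dtype=float)
    for m in masks:
        if m.any():
            # Every pixel in a segment shares the segment's mean importance.
            weights[m] = saliency[m].mean()
    return weights

# Toy example: one noisy high-saliency pixel inside a task-relevant object.
sal = np.zeros((4, 4))
sal[0, 0] = 1.0
obj = np.zeros((4, 4), dtype=bool)
obj[:2, :2] = True          # segment for the relevant object
bg = ~obj                   # segment for the distractor background
w = aggregate_saliency(sal, [obj, bg])
# The whole object segment receives the pooled importance; background stays 0.
```

The per-segment averaging is what makes a noisy pixel-level attribution usable: a single salient pixel lifts the weight of the entire object it belongs to, while distractor regions with no policy-relevant gradient remain down-weighted.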
Track: Main track
Submitted Paper: Yes
Published Paper: No
Published Venue: NeurIPS
Submission Number: 57