Policy-shaped prediction: improving world modeling through interpretability

Published: 10 Oct 2024 · Last Modified: 03 Dec 2024 · IAI Workshop @ NeurIPS 2024 · CC BY 4.0
Keywords: machine learning, model based reinforcement learning, reinforcement learning, segment anything model, model interpretability, model explanation
TL;DR: Noisy explainability signals, aggregated over the masks of a segmentation model, can identify what matters to the policy and focus world-model training on it during MBRL.
Abstract: Model-based reinforcement learning (MBRL) offers sample-efficient policy optimization but is susceptible to distractions. We address this by developing Policy-Shaped Prediction (PSP), a method that empowers agents to interpret their own policies and shape their world models accordingly. By combining gradient-based interpretability, pretrained segmentation models, and adversarial learning, PSP outperforms existing distractor-reduction approaches. This work represents an interpretability-driven advance towards robust MBRL.
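The abstract describes combining noisy gradient-based importance scores with a pretrained segmentation model's masks to decide which parts of an observation the world model should focus on. The snippet below is a minimal sketch of the aggregation step only, under assumptions not spelled out in the abstract: the per-pixel saliency map and the `aggregate_saliency` helper are hypothetical stand-ins, and the masks are assumed to come from a segmentation model such as Segment Anything.

```python
import numpy as np

def aggregate_saliency(saliency, masks):
    """Pool noisy per-pixel saliency within each segmentation mask.

    saliency: (H, W) non-negative importance scores, e.g. magnitudes of
        policy gradients w.r.t. input pixels (hypothetical signal here).
    masks: list of (H, W) boolean arrays from a pretrained segmentation
        model (assumed; e.g. Segment Anything).
    Returns an (H, W) weight map that could modulate a world model's
    per-pixel prediction loss, denoising saliency by averaging per segment.
    """
    weights = np.zeros_like(saliency, dtype=float)
    for m in masks:
        if m.any():
            # Every pixel in a segment shares the segment's mean importance.
            weights[m] = saliency[m].mean()
    return weights

# Toy example: one noisy high-saliency pixel inside a task-relevant object.
sal = np.zeros((4, 4))
sal[0, 0] = 1.0
obj = np.zeros((4, 4), dtype=bool)
obj[:2, :2] = True          # segment for the relevant object
bg = ~obj                   # segment for the distractor background
w = aggregate_saliency(sal, [obj, bg])
# The whole object segment receives the pooled importance; background stays 0.
```

The per-segment averaging is what makes a noisy pixel-level attribution usable: a single salient pixel lifts the weight of the entire object it belongs to, while distractor regions with no policy-relevant gradient remain down-weighted.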
Track: Main track
Submitted Paper: Yes
Published Paper: No
Published Venue: NeurIPS
Submission Number: 57