Object-Centric Learning of Neural Policies for Zero-shot Transfer over Domains with Varying Quantities of Interest

TMLR Paper 2745 Authors

25 May 2024 (modified: 20 Sept 2024) · Rejected by TMLR · License: CC BY 4.0
Abstract: Our goal is to learn policies that generalize in a zero-shot manner across variations in quantities of interest in the domain (e.g., the number of objects, motion dynamics, or distance to the goal). Recent work on object-centric approaches for image and video processing has shown significant promise in building models that generalize well to unseen settings. In this work, we present {\em Object-Centric Reinforcement Learning Agent (ORLA)}, an object-centric approach for model-free RL in perceptual domains. ORLA works in three phases. First, it learns to extract a variable number of object masks via an expert trained with an encoder-decoder architecture, which in turn generates data for fine-tuning a YOLO-based model that extracts bounding boxes in unseen settings. Second, the bounding boxes are used to construct a symbolic state consisting of object positions across a sequence of frames. Finally, a Graph Attention Network (GAT)-based architecture is applied to the extracted object positions to learn a dense state embedding, which is then decoded to obtain the final policy that generalizes to unseen environments. Our experiments over a number of domains show that ORLA learns significantly better policies that transfer across variations in the quantities of interest than existing baselines, which often fail to transfer meaningfully.
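To make the third phase concrete, below is a minimal sketch of a GAT-based policy over extracted object positions. The abstract does not specify ORLA's exact architecture, so everything here is an assumption: the class name `ObjectGATPolicy`, the layer sizes, the fully connected object graph, and the use of PyTorch Geometric's `GATConv` with mean pooling are illustrative choices, not the authors' implementation. The pooling step is what makes the embedding independent of the number of detected objects, which is the property that enables transfer across object counts.

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv, global_mean_pool


class ObjectGATPolicy(nn.Module):
    """Hypothetical sketch of a GAT-based policy over object positions.

    Each node carries one object's (x, y) position stacked over k frames;
    a fully connected edge set lets attention decide which objects matter.
    """

    def __init__(self, node_dim: int = 8, hidden: int = 64, heads: int = 4, n_actions: int = 6):
        super().__init__()
        self.gat1 = GATConv(node_dim, hidden, heads=heads)    # attention over objects
        self.gat2 = GATConv(hidden * heads, hidden, heads=1)  # second round of message passing
        self.policy_head = nn.Linear(hidden, n_actions)       # decode pooled embedding to action logits

    def forward(self, x, edge_index, batch):
        h = torch.relu(self.gat1(x, edge_index))
        h = torch.relu(self.gat2(h, edge_index))
        state = global_mean_pool(h, batch)  # pooling makes the embedding size-invariant
        return self.policy_head(state)      # logits over discrete actions


# Example: 5 detected objects, (x, y) positions over 4 frames (node_dim = 8),
# connected as a fully connected directed graph without self-loops.
n = 5
x = torch.randn(n, 8)
src, dst = zip(*[(i, j) for i in range(n) for j in range(n) if i != j])
edge_index = torch.tensor([src, dst], dtype=torch.long)
batch = torch.zeros(n, dtype=torch.long)  # all nodes belong to one graph (one state)
logits = ObjectGATPolicy()(x, edge_index, batch)  # shape: (1, n_actions)
```

Because the same weights process any number of nodes, the identical network can be evaluated on states with more (or fewer) objects than were seen during training, which is the zero-shot transfer setting the paper targets.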
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Dinesh_Jayaraman2
Submission Number: 2745