Unsupervised Learning of Temporal Abstractions with Slot-based Transformers

TMLR Paper5 Authors

24 Mar 2022 (modified: 17 Sept 2024) · Rejected by TMLR · CC BY 4.0
Abstract: The discovery of reusable sub-routines simplifies decision-making and planning in complex reinforcement learning problems. Previous approaches propose to learn such temporal abstractions in a purely unsupervised fashion by observing state-action trajectories gathered from executing a policy. However, a current limitation is that they process each trajectory in an entirely sequential manner, which prevents them from revising earlier decisions about sub-routine boundary points in light of new incoming information. In this work we propose SloTTAr, a fully parallel approach that integrates sequence-processing Transformers with a Slot Attention module and adaptive computation for inferring the number of such sub-routines in an unsupervised fashion. We demonstrate that SloTTAr outperforms strong baselines at boundary point discovery, even for sequences containing variable numbers of sub-routines, while being up to $7\mathrm{x}$ faster to train on existing benchmarks.
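The abstract only sketches the architecture, so the following is a minimal illustrative sketch of the Slot Attention component (after Locatello et al., 2020) applied to a sequence of encoded state-action features, showing how slots could compete to explain timesteps and thereby induce soft sub-routine assignments. All names, dimensions, and the trajectory-encoding setup are assumptions for illustration, not the paper's actual SloTTAr implementation.

```python
# Hypothetical sketch: Slot Attention over an encoded trajectory (PyTorch).
# Not the authors' code; dimensions and module names are illustrative.
import torch
import torch.nn as nn

class SlotAttention(nn.Module):
    def __init__(self, num_slots, dim, iters=3, hidden=128):
        super().__init__()
        self.num_slots, self.iters, self.scale = num_slots, iters, dim ** -0.5
        # Slots are sampled from a learned Gaussian at the start of each forward pass.
        self.slots_mu = nn.Parameter(torch.randn(1, 1, dim))
        self.slots_logsigma = nn.Parameter(torch.zeros(1, 1, dim))
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k = nn.Linear(dim, dim, bias=False)
        self.to_v = nn.Linear(dim, dim, bias=False)
        self.gru = nn.GRUCell(dim, dim)
        self.mlp = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
        self.norm_inputs = nn.LayerNorm(dim)
        self.norm_slots = nn.LayerNorm(dim)
        self.norm_mlp = nn.LayerNorm(dim)

    def forward(self, inputs):
        # inputs: (batch, seq_len, dim) encoded state-action features,
        # e.g. the output of a Transformer encoder over the trajectory.
        b, n, d = inputs.shape
        inputs = self.norm_inputs(inputs)
        k, v = self.to_k(inputs), self.to_v(inputs)
        slots = self.slots_mu + self.slots_logsigma.exp() * torch.randn(
            b, self.num_slots, d, device=inputs.device)
        for _ in range(self.iters):
            q = self.to_q(self.norm_slots(slots))
            # Softmax over the slot axis: slots compete for each timestep.
            attn = torch.softmax(torch.einsum('bsd,bnd->bsn', q, k) * self.scale, dim=1)
            # Renormalize over timesteps to take a weighted mean per slot.
            attn = attn / attn.sum(dim=-1, keepdim=True)
            updates = torch.einsum('bsn,bnd->bsd', attn, v)
            slots = self.gru(updates.reshape(-1, d),
                             slots.reshape(-1, d)).reshape(b, -1, d)
            slots = slots + self.mlp(self.norm_mlp(slots))
        # attn rows give soft timestep-to-slot assignments, from which
        # sub-routine boundary points could be read off.
        return slots, attn
```

A usage sketch under the same assumptions: `features = encoder(trajectory)` for a Transformer encoder producing `(batch, seq_len, dim)` features, then `slots, attn = SlotAttention(num_slots=4, dim=64)(features)`. The paper's adaptive-computation mechanism for inferring the number of active slots is not modeled here.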
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: Please find the updated manuscript incorporating the changes suggested by the reviewers. Major changes include: 1) a new visualization of the mask generation process (Figure 6) in the Appendix.
Assigned Action Editor: ~Debadeepta_Dey1
Submission Number: 5