Long Horizon Episodic Decision Making for Cognitively Inspired Robots

23 Sept 2023 (modified: 11 Feb 2024) · Submitted to ICLR 2024
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: Representation learning, Reinforcement learning, Human-Robot Collaboration
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Abstract: The human decision-making process works by recollecting past sequences of observations and using them to decide the best possible action in the present. These past observations are stored in a derived form that retains only the information the brain deems potentially useful in the future, while the rest is forgotten. Transformers have shown strong results in multi-modal robotic navigation and human-robot collaboration tasks, but they struggle to scale to large memory sizes and to learn long-horizon tasks efficiently, because their computational requirements grow non-linearly with memory length. Our model tries to mimic the human brain and improve the memory efficiency of transformers by using a modified TransformerXL architecture with Automatic Chunking, which partitions past memories into chunks and attends only to the relevant chunks in each transformer block. On top of this, we use ForgetSpan, a technique that removes memories that do not contribute to learning. We also theorize a technique of similarity-based forgetting, in which current observations are compared with the elements in memory and only novel observations are stored, similar to how humans do not store repetitive memories. We test our model on various visual and audio-visual tasks that demand long-horizon recollection, audio-visual instruction deciphering, and robotic navigation; these tasks probe the abilities a robot would need in a human-robot collaboration scenario. We demonstrate that Automatic Chunking with ForgetSpan improves memory efficiency, helps models memorize important information, and achieves better performance than the baseline TransformerXL on these tasks. We also show that our model generalizes well by testing the trained models on modified versions of the tasks.
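
To make the similarity-based forgetting idea concrete, here is a minimal illustrative sketch (not the paper's implementation) of how a memory buffer might skip near-duplicate observation embeddings; the function name `update_memory` and the parameters `sim_threshold` and `max_len` are hypothetical choices for this example.

```python
import torch
import torch.nn.functional as F

def update_memory(memory, new_obs, sim_threshold=0.9, max_len=512):
    """Hypothetical similarity-based forgetting: store an observation
    embedding only if it is not a near-duplicate of something already kept.

    memory:  (M, d) tensor of stored observation embeddings (may be empty)
    new_obs: (N, d) tensor of observation embeddings from the current step
    """
    for obs in new_obs:
        if memory.numel() > 0:
            # Cosine similarity between the candidate and every stored memory.
            sims = F.cosine_similarity(memory, obs.unsqueeze(0), dim=-1)
            if sims.max() >= sim_threshold:
                # Near-duplicate: skip it, mimicking how repetitive
                # observations would not be stored.
                continue
        memory = torch.cat([memory, obs.unsqueeze(0)], dim=0)
    # Drop the oldest entries if the memory grows past its budget.
    return memory[-max_len:]

# Usage: start with an empty memory and feed per-step observation embeddings.
memory = torch.empty(0, 128)
memory = update_memory(memory, torch.randn(4, 128))
```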
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 7693