Keywords: RoboNLP, Interpretability, Summarization
TL;DR: We develop a system that learns to summarize a virtual robot's actions from text or video input
Abstract: We demonstrate the task of generating natural language summaries of a robotic agent's actions in a virtual environment. Existing datasets that pair robot actions with natural language descriptions, originally designed for instruction-following tasks, can be repurposed as training data for robot action summarization. We propose and test several methods of learning to generate such summaries, starting from either egocentric video frames of the robot taking actions or text representations of those actions, and find that a two-stage summarization process, which uses structured language as an intermediate step, improves accuracy. Quantitative and qualitative evaluations of the results are provided to serve as a baseline for future work.