Keywords: Food Manipulation, Robot Scooping, Active Perception
TL;DR: A food scooping robot learning framework with active perception to improve generalization.
Abstract: The ability to successfully scoop up food items presents a significant challenge for existing robot systems due to the complex states and physical properties of food. To overcome this challenge, we believe it is crucial to encode food items into task-relevant and meaningful representations. However, the distinctive properties of food items, including deformability, fragility, fluidity, and granularity, pose significant challenges for existing representations. In this paper, we investigate the potential of active perception for implicitly learning meaningful food representations. To this end, we present SCONE, a food-scooping robot learning framework that leverages representations gained from active perception to inform the food-scooping model. SCONE consists of two essential encoding modules: the interactive encoder and the state retrieval module. This encoding process allows the model to capture the characteristics of food items and the essential features of their states. In real-world food-scooping experiments, SCONE achieves a 71% task success rate on 6 unseen food items across 3 levels of difficulty, outperforming the baselines. It also exhibits higher stability, with the task success rate for every food item surpassing 50%.
Student First Author: yes
Supplementary Material: zip
Instructions: I have read the instructions for authors (https://corl2023.org/instructions-for-authors/)