Task Planning for Long-Horizon Cooking Tasks Based on Large Language Models

Published: 01 Jan 2024, Last Modified: 12 Nov 2025, Venue: IROS 2024, License: CC BY-SA 4.0
Abstract: In robot manipulation, learnable task planners are gaining attention, especially for long-horizon tasks such as cooking. However, existing methods that rely predominantly on symbolic representations generalize poorly, particularly to unseen objects. Because objects vary in real-world environments, this limitation may constrain their practical applicability. To address this issue, we propose a novel task-planning framework that leverages a pretrained large language model (LLM) for environmental interpretation. The framework extracts semantic features directly from textual data, enabling the planner to accommodate unfamiliar objects. We further incorporate a transformer-based encoder-decoder to interpret the environmental attributes derived from the language model and to generate sequential predictions in line with object-oriented subgoals. To validate our model, we use a dataset of cooking recipes. Going a step further, we propose a method that automatically generates object-oriented data from natural-language descriptions using a recurrent LLM, enabling the framework to handle previously unseen targets as well. Our framework achieves an average success rate of 95% on test sets that involve unseen objects. Providing the automatically generated dataset to the framework yields a 27% increase in success rate on unknown target recipes. We also demonstrate the real-world viability of our planner by deploying it on a robot platform.