Self-CriTeach: LLM Self-Teaching and Self-Critiquing for Improving Robotic Planning via Automated Domain Generation

Jinbang Huang; Zhiyuan Li; Yuanzhao Hu; Zhanguang Zhang; Mark Coates; Xingyue Quan; Yingxue Zhang

Published: 30 Apr 2026, Last Modified: 24 Jun 2026. Venue: ICML 2026 regular. License: CC BY 4.0

TL;DR: We enable LLM self-training and self-critiquing by using self-generated PDDL planning domains as both data generators and structured reward signals for robotic planning.

Abstract: Large Language Models (LLMs) have recently shown strong promise for robotic task planning, particularly through automatic planning domain generation. However, prior approaches largely treat generated planning domains as planning utilities, which are brittle under imperfect logical states and perception noise, and overlook their potential as scalable sources of reasoning supervision and structured reward signals. At the same time, reasoning LLMs depend on chain-of-thought (CoT) supervision that is expensive to collect for robotic tasks, and reinforcement learning (RL) is hampered by the difficulty of reward engineering. We propose Self-CriTeach, an LLM self-teaching and self-critiquing framework in which an LLM autonomously generates symbolic planning domains that serve a dual role: (i) enabling large-scale generation of robotic planning problem–plan pairs, and (ii) providing structured reward functions. First, the self-written domains enable large-scale generation of symbolic task plans, which are automatically transformed into extended CoT trajectories for supervised fine-tuning. Second, the self-written domains are reused as structured reward functions, providing dense feedback for reinforcement learning without manual reward engineering. This unified training pipeline yields a planning-enhanced LLM with higher planning success rates, stronger cross-task generalization, reduced inference cost, and robustness to imperfect logical states.
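
To make the "domain as reward function" idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy STRIPS-style encoding in which each ground action has precondition, add, and delete sets; the `Action` class, the `plan_reward` function, the pick-and-place facts, and the 50/50 weighting between executable steps and satisfied goal facts are all hypothetical choices made for illustration. The step-by-step trace it returns hints at how a symbolic plan could be rendered as reasoning text, though the paper's actual CoT conversion is not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A ground STRIPS-style action (hypothetical toy representation)."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset


def plan_reward(actions, init, goal, plan):
    """Simulate `plan` against the domain and return (reward, trace).

    The reward lies in [0, 1]. The weighting is an assumption:
    half for the fraction of steps that execute, half for the
    fraction of goal facts that hold in the final state.
    """
    state = set(init)
    goal = set(goal)
    trace = []
    valid_steps = 0
    for step in plan:
        action = actions.get(step)
        if action is None:
            trace.append(f"{step}: unknown action, stop")
            break
        missing = action.preconditions - state
        if missing:
            trace.append(f"{step}: missing {sorted(missing)}, stop")
            break
        # Apply effects: remove deleted facts, then add new ones.
        state = (state - action.del_effects) | action.add_effects
        valid_steps += 1
        trace.append(f"{step}: ok, state = {sorted(state)}")
    step_score = valid_steps / len(plan) if plan else 0.0
    goal_score = len(goal & state) / len(goal) if goal else 1.0
    return 0.5 * step_score + 0.5 * goal_score, trace


# Hypothetical pick-and-place domain with two ground actions.
ACTIONS = {
    "pick(block)": Action(
        "pick(block)",
        frozenset({"hand_empty", "on_table(block)"}),
        frozenset({"holding(block)"}),
        frozenset({"hand_empty", "on_table(block)"}),
    ),
    "place(block, shelf)": Action(
        "place(block, shelf)",
        frozenset({"holding(block)"}),
        frozenset({"on(block, shelf)", "hand_empty"}),
        frozenset({"holding(block)"}),
    ),
}
INIT = {"hand_empty", "on_table(block)"}
GOAL = {"on(block, shelf)"}

if __name__ == "__main__":
    reward, trace = plan_reward(
        ACTIONS, INIT, GOAL, ["pick(block)", "place(block, shelf)"]
    )
    print(reward)
    for line in trace:
        print(line)
```

Under these assumptions, a plan that stops after `pick(block)` still earns partial credit for its executable step, whereas a plan that starts with `place(block, shelf)` fails its first precondition check and earns none. This graded signal is what distinguishes dense feedback from a binary success flag.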

Lay Summary: Robots can use large language models to help plan, but they often fail when the task is long and complex. Recent methods allow language models to automatically write planning rules for robots, yet these rules are usually used only as tools for generating plans. In this work, we show that the same rules can also teach the model how to reason and provide feedback during training. We propose Self-CriTeach, a framework in which a language model generates planning rules, uses them to create robot planning examples, and converts these examples into step-by-step reasoning data for supervised fine-tuning. The generated rules are also reused as structured reward functions for reinforcement learning, reducing the need for manually designed rewards. As a result, the trained model solves robot planning tasks more successfully, generalizes better to new tasks, costs less to run at inference time, and remains more robust when the robot's logical state is imperfect.

Link To Code: https://markli1hoshipu.github.io/Plan_LLM/

Primary Area: Applications->Robotics

Keywords: Task and Motion Planning, PDDL, Robot Task Planning, LLMs for Planning, LLM Post-training, Embodied AI

Submission Number: 27510
