RDD: Retrieval-Based Demonstration Decomposer for Planner Alignment in Long-Horizon Tasks

Mingxuan Yan; Yuping Wang; Zechun Liu; Jiachen Li

Published: 27 May 2026, Last Modified: 04 Jun 2026

Venue: FMEA @ CVPR 2026 Poster

License: CC BY 4.0

Keywords: Sub-task Discovery, Robotic Learning

TL;DR: RDD is a controllable and efficient visual sub-task labelling algorithm.

Abstract: Long-horizon embodied agents increasingly combine foundation-model planners with low-level visuomotor policies, but the two levels are often trained from differently segmented data. In hierarchical vision-language-action (VLA) frameworks, a vision-language model (VLM)-based planner must decompose complex manipulation tasks into simpler sub-tasks that the low-level policy can execute. Finetuning such planners for a new task requires demonstrations segmented into sub-tasks, yet human annotation is expensive and heuristic segmentations can deviate from the visuomotor policy's training distribution, degrading embodied decision-making. We propose a Retrieval-based Demonstration Decomposer (RDD), a training-free method that decomposes video demonstrations by retrieving visually similar sub-task intervals from the low-level policy's training data. RDD formulates sub-task identification as an optimal partitioning problem and solves it efficiently with dynamic programming, directly aligning the planner's finetuning data with the policy's learned capabilities. Experiments on simulation and real-world manipulation benchmarks show that RDD outperforms state-of-the-art heuristic decomposition methods and improves planner-policy coordination for long-horizon embodied tasks.
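The optimal-partitioning idea in the abstract can be illustrated with a small dynamic program. The sketch below is not the authors' implementation: the abstract does not specify the interval features or the scoring function, so `interval_embedding` (concatenated first and last frame features), `retrieval_score` (interval length times the best cosine similarity against a reference bank), and the `min_len`/`max_len` bounds are all assumptions chosen for illustration. It only shows how a demonstration might be split into contiguous intervals that maximize a summed retrieval score.

```python
from math import sqrt
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return dot / denom if denom > 0 else 0.0


def interval_embedding(frames: Sequence[Vector], start: int, end: int) -> List[float]:
    # Assumption: describe the interval [start, end) by its first and last frame features.
    return list(frames[start]) + list(frames[end - 1])


def retrieval_score(frames: Sequence[Vector], start: int, end: int,
                    bank: Sequence[Vector]) -> float:
    # Assumption: length-weighted similarity to the closest reference interval.
    query = interval_embedding(frames, start, end)
    return (end - start) * max(cosine(query, ref) for ref in bank)


def decompose(frames: Sequence[Vector], bank: Sequence[Vector],
              min_len: int = 1, max_len: Optional[int] = None) -> List[Tuple[int, int]]:
    """Split frames into contiguous [start, end) intervals maximizing the summed score."""
    if not bank:
        raise ValueError("reference bank must not be empty")
    n = len(frames)
    max_len = n if max_len is None else max_len
    neg_inf = float("-inf")
    best = [neg_inf] * (n + 1)  # best[e]: best total score for frames[:e]
    back = [-1] * (n + 1)       # back[e]: start index of the last interval ending at e
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end - min_len + 1):
            if best[start] == neg_inf:
                continue  # frames[:start] cannot be partitioned under the length bounds
            candidate = best[start] + retrieval_score(frames, start, end, bank)
            if candidate > best[end]:
                best[end], back[end] = candidate, start
    if best[n] == neg_inf:
        raise ValueError("no valid partition under the given length bounds")
    intervals = []
    end = n
    while end > 0:  # walk the back-pointers from the final frame to the first
        intervals.append((back[end], end))
        end = back[end]
    intervals.reverse()
    return intervals
```

With `n` frames, this sketch evaluates O(n · max_len) candidate intervals, each compared against every entry in the bank. A real system would presumably replace the toy feature vectors with learned visual embeddings and the linear scan with an indexed nearest-neighbor search, but those choices are outside what the abstract states.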

Submission Number: 18
