Learning Diffusion Policy from Primitive Skills for Robot Manipulation
Abstract: Diffusion policies (DP) have recently shown great promise
for generating actions in robotic manipulation. However, existing
approaches often rely on global instructions to produce short-term
control signals, which can lead to misaligned action generation. We
conjecture that primitive skills, i.e., fine-grained, short-horizon
manipulations such as “move up” and “open the gripper”, provide a
more intuitive and effective interface for robot learning. To
bridge this gap, we propose SDP, a skill-conditioned DP that
integrates interpretable skill learning with conditional action
planning. SDP abstracts eight reusable primitive skills across
tasks and employs a vision-language model to extract discrete
representations from visual observations and language instructions.
Based on these representations, a lightweight router network assigns
a desired primitive skill to each state, which helps construct a
single-skill policy that generates skill-aligned
actions. By decomposing complex tasks into a sequence of
primitive skills and selecting a single-skill policy, SDP ensures
skill-consistent behavior across diverse tasks. Extensive
experiments on two challenging simulation benchmarks and
real-world robot deployments demonstrate that SDP consistently
outperforms state-of-the-art methods, providing a new paradigm
for skill-based robot learning with diffusion policies.
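
The routing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name LinearRouter, the single linear layer with softmax, the feature dimension, and the placeholder skill names are all assumptions made for illustration. The abstract only states that there are eight primitive skills, names two of them, and says that a lightweight router assigns one skill to each state.

```python
import math
import random

NUM_SKILLS = 8  # the abstract states eight primitive skills

# Only two skill names appear in the abstract; the remaining six are
# placeholders, not the names used by the authors.
SKILL_NAMES = ["move up", "open the gripper"] + [
    f"skill_{i}" for i in range(2, NUM_SKILLS)
]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class LinearRouter:
    """Hypothetical stand-in for the router: one linear layer plus softmax."""

    def __init__(self, feature_dim, num_skills=NUM_SKILLS, seed=0):
        rng = random.Random(seed)
        # Randomly initialised weights; a real router would be trained.
        self.weights = [
            [rng.gauss(0.0, 0.1) for _ in range(feature_dim)]
            for _ in range(num_skills)
        ]
        self.bias = [0.0] * num_skills

    def skill_probs(self, features):
        """Return a probability for each skill given state features."""
        logits = [
            sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(self.weights, self.bias)
        ]
        return softmax(logits)

    def route(self, features):
        """Pick the single most likely skill index for this state."""
        probs = self.skill_probs(features)
        return max(range(len(probs)), key=probs.__getitem__)


router = LinearRouter(feature_dim=4)
# Placeholder vector standing in for features derived from the
# vision-language model's representations (an assumption).
state_features = [0.2, -1.0, 0.5, 0.3]
skill_id = router.route(state_features)
print(SKILL_NAMES[skill_id])
```

In such a setup, the selected skill index would serve as the conditioning signal for the action-generating policy, so that each state is handled by exactly one skill rather than by the global instruction directly.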