Keywords: imitation learning, reinforcement learning from human feedback
Abstract: Diffusion policies have recently achieved impressive results in robotic manipulation tasks. However, their rigid reliance on demonstration data makes it difficult to adapt behavior to evolving user preferences or dynamic deployment environments. We propose FDPP (Fine-tuning Diffusion Policy with Human Preference), a simple yet effective method that leverages human preference labels to train a reward function, which is then used to fine-tune pre-trained diffusion policies via reinforcement learning. This approach allows robots to align with new task constraints or personalized objectives while retaining core task competence. We further incorporate Kullback–Leibler (KL) regularization during fine-tuning to prevent overfitting and preserve the original policy distribution. Experiments on diverse robotic tasks demonstrate that FDPP successfully reshapes policy behavior in alignment with human intent without sacrificing task success.
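A KL-regularized fine-tuning objective of the kind described in the abstract can be sketched as follows; the notation here is illustrative and not taken from the paper (a learned reward \(r_\psi\), a fine-tuned policy \(\pi_\theta\), the pre-trained diffusion policy \(\pi_{\mathrm{pre}}\), and a regularization weight \(\beta\)):

\[
\max_{\theta}\;\; \mathbb{E}_{s,\; a \sim \pi_\theta}\big[\, r_\psi(s, a) \,\big] \;-\; \beta\, D_{\mathrm{KL}}\!\big(\, \pi_\theta(\cdot \mid s) \,\big\|\, \pi_{\mathrm{pre}}(\cdot \mid s) \,\big)
\]

The first term steers the policy toward behavior preferred by humans, while the KL term penalizes drift from the pre-trained policy, which is how regularization of this form preserves the original policy distribution.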
Submission Number: 17