Keywords: Robotic manipulation, diffusion policy, geometric conditioning, trajectory refinement, physical consistency
Abstract: Learning physically consistent and robust manipulation policies from perceptual inputs remains a key challenge in robot learning. Most existing diffusion-based approaches condition only on raw point clouds or robot states, failing to exploit the underlying geometric relations that govern feasible object interactions. To address this gap, we propose GeoDiff, a geometry-conditioned diffusion policy for refined robotic trajectory generation. GeoDiff constructs object-centric geometric representations via clustering-based point cloud segmentation and encodes relational features capturing spatial dependencies between the robot and surrounding objects. Conditioned on these geometric features, the diffusion policy generates multiple stochastic trajectory candidates under consistent initial conditions. A physics-aware evaluation module then scores each candidate on smoothness, goal accuracy, and collision safety, and selects the optimal physically valid trajectory. We further employ a composite loss that combines a denoising reconstruction term with a differentiable physical-consistency term to enforce smooth, goal-directed, and collision-free motion generation. Extensive experiments on three widely used simulated manipulation benchmarks demonstrate that GeoDiff improves task success rate and motion smoothness by over 15% compared with state-of-the-art diffusion-based and optimization-based baselines. These results highlight the importance of geometric conditioning and physics-guided refinement for reliable diffusion-based robotic manipulation.
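To make the candidate-selection step concrete, the sketch below illustrates one plausible way the physics-aware evaluation described in the abstract could score and rank sampled trajectories. It is not the authors' implementation: the waypoint representation (arrays of shape (T, D)), the cost weights, the safety margin, and the obstacle-clearance callback `min_clearance` are all assumptions introduced for illustration.

```python
# Illustrative sketch (not the paper's code) of scoring diffusion-sampled
# trajectory candidates on smoothness, goal accuracy, and collision safety,
# then selecting the lowest-cost one. Weights, margin, and `min_clearance`
# are hypothetical.
import numpy as np

def score_candidate(traj, goal, min_clearance,
                    w_smooth=1.0, w_goal=1.0, w_collision=10.0,
                    safe_margin=0.02):
    """Return a scalar cost for one trajectory of shape (T, D); lower is better."""
    # Smoothness: mean squared second difference (discrete acceleration).
    accel = np.diff(traj, n=2, axis=0)
    smooth_cost = np.mean(np.sum(accel ** 2, axis=-1))

    # Goal accuracy: distance between the final waypoint and the goal.
    goal_cost = np.linalg.norm(traj[-1] - goal)

    # Collision safety: penalize waypoints whose clearance to the nearest
    # obstacle drops below the safety margin. `min_clearance(x)` is assumed
    # to return the distance from waypoint x to the closest obstacle.
    clearances = np.array([min_clearance(x) for x in traj])
    collision_cost = np.sum(np.maximum(0.0, safe_margin - clearances))

    return w_smooth * smooth_cost + w_goal * goal_cost + w_collision * collision_cost

def select_best(candidates, goal, min_clearance):
    """Pick the lowest-cost trajectory among the sampled candidates."""
    costs = [score_candidate(c, goal, min_clearance) for c in candidates]
    return candidates[int(np.argmin(costs))]
```

A weighted-sum cost of this form is only one possible design; the paper's differentiable physical-consistency loss could use the same terms during training, but the exact formulation is not specified in the abstract.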
Submission Number: 116