Keywords: 3D Gaussian Reconstruction, Diffusion Policy, 3D Scene Representation, Robot Action Generation
Abstract: Despite significant advances in robotic policy generation, effective coordination in embodied multi-agent systems remains a fundamental challenge—particularly in scenarios where agents must balance individual perspectives with global environmental awareness.
Existing approaches often struggle to reconcile fine-grained local control with comprehensive scene understanding, resulting in limited scalability and compromised collaboration quality.
In this paper, we present GauDP, a novel Gaussian-image synergistic representation that facilitates scalable, perception-aware imitation learning in multi-agent collaborative systems.
Specifically, GauDP reconstructs a globally consistent 3D Gaussian field from local-view RGB images, allowing all agents to dynamically query task-relevant features from a shared scene representation.
This design supports both fine-grained control and globally coherent behavior without requiring additional sensing modalities.
We evaluate GauDP on the RoboFactory benchmark, which includes diverse multi-arm manipulation tasks.
Our method achieves superior performance over existing image-based methods and approaches the effectiveness of point-cloud-driven methods, while maintaining strong scalability as the number of agents increases.
Extensive ablations and visualizations further demonstrate the robustness and efficiency of our unified local-global perception framework for multi-agent embodied learning.
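The core idea above (one shared 3D Gaussian field reconstructed from local RGB views, which every agent queries for task-relevant features) can be caricatured in a few lines. This is a minimal sketch, not the paper's implementation: the `GaussianField` class, the nearest-neighbor `query` rule, and all shapes are illustrative assumptions.

```python
import numpy as np

# Hypothetical sketch of a shared Gaussian field queried by multiple agents.
# All names and the k-NN feature-query rule are illustrative, not from GauDP.
class GaussianField:
    """Toy stand-in for a globally reconstructed 3D Gaussian scene."""
    def __init__(self, means, features):
        self.means = means        # (N, 3) Gaussian centers
        self.features = features  # (N, D) per-Gaussian feature vectors

    def query(self, points, k=4):
        """Features at 3D query points via inverse-distance-weighted k-NN."""
        out = np.zeros((len(points), self.features.shape[1]))
        for i, p in enumerate(points):
            d = np.linalg.norm(self.means - p, axis=1)
            idx = np.argsort(d)[:k]
            w = 1.0 / (d[idx] + 1e-8)
            out[i] = (w[:, None] * self.features[idx]).sum(0) / w.sum()
        return out

# One shared field, several agents: each arm queries its own points
rng = np.random.default_rng(0)
field = GaussianField(rng.normal(size=(100, 3)), rng.normal(size=(100, 16)))
agent_queries = [rng.normal(size=(5, 3)) for _ in range(3)]  # 3 agents
per_agent_feats = [field.query(q) for q in agent_queries]
print([f.shape for f in per_agent_feats])
```

The point of the sketch is the data flow: agents share one scene representation but issue independent queries, which is what lets local control coexist with global awareness as the agent count grows.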
Supplementary Material: zip
Primary Area: Applications (e.g., vision, language, speech and audio, Creative AI)
Submission Number: 16697