Keywords: imitation learning, robotics, point clouds, point maps
TL;DR: Fourier feature projections improve all 3D modalities for diffusion imitation learning of high-precision tasks, but are especially beneficial for point cloud policies.
Abstract: Various 3D modalities have been proposed for high-precision imitation learning tasks to compensate for the shortcomings of RGB-only policies.
Modalities that explicitly represent positions in Cartesian space, such as most point cloud encoder architectures, have an inherent advantage over purely image-based ones, since they allow policies to reason about geometry.
Despite the effectiveness of such architectures, a number of hybrid 2D/3D architectures have been proposed in the literature, indicating that the performance of purely 3D architectures is often task-dependent.
We hypothesize that this discrepancy may be due to the spectral bias of neural networks towards learning low-frequency functions, which especially affects architectures conditioned on slow-moving Cartesian features.
We thus propose to use a parametric projection to map point clouds from Cartesian space into high-dimensional Fourier space when using a point cloud encoder.
We experimentally validate the use of these Fourier features on challenging manipulation tasks from the RoboCasa and ManiSkill3 benchmarks, and on a real robot setup.
Despite their simplicity, we find that Fourier features provide robust and significant benefits across diverse encoder architectures and tasks.
These results indicate that Fourier features let policies leverage geometric details more effectively than Cartesian features, showing their potential as a general-purpose tool for point cloud-based imitation learning.
The overview and demos are available on our [project page](https://fourier-il.github.io/fourier-il/).
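
To make the idea of projecting Cartesian points into Fourier space concrete, the following is a minimal sketch and not the authors' implementation. It assumes a commonly used form of Fourier features, in which each 3D point is multiplied by a frequency matrix `B` and passed through sine and cosine. The function name `fourier_features`, the Gaussian initialization of `B`, and the values of `num_frequencies` and `scale` are all hypothetical choices for illustration; the abstract does not specify how the projection is parameterized or whether it is learned.

```python
import numpy as np


def fourier_features(points, num_frequencies=64, scale=10.0, seed=0):
    """Map (N, 3) Cartesian points to (N, 2 * num_frequencies) Fourier features.

    Assumption: the frequency matrix B is sampled from a Gaussian and kept
    fixed here; the paper's actual projection may be parameterized differently.
    """
    rng = np.random.default_rng(seed)
    # B has one column per frequency; larger `scale` means higher frequencies.
    B = rng.normal(0.0, scale, size=(points.shape[-1], num_frequencies))
    proj = 2.0 * np.pi * points @ B
    # Concatenate sine and cosine responses along the feature axis.
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)


# Usage: a synthetic point cloud with 1024 points in [-1, 1]^3.
cloud = np.random.default_rng(1).uniform(-1.0, 1.0, size=(1024, 3))
features = fourier_features(cloud)
```

Under these assumptions, `features` would replace the raw `(x, y, z)` coordinates as the per-point input to a point cloud encoder. Small changes in position then produce larger changes in the input features, which is the property the spectral-bias argument relies on.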
Primary Area: applications to robotics, autonomy, planning
Submission Number: 13670