Fourier Features Let Agents Learn High Precision Policies with Imitation Learning

ICLR 2026 Conference Submission 13670 Authors

18 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: imitation learning, robotics, point clouds, point maps
TL;DR: Fourier feature projections improve all 3D modalities for diffusion imitation learning of high-precision tasks, but are especially beneficial for point cloud policies.
Abstract: Various 3D modalities have been proposed for high-precision imitation learning tasks to compensate for the shortcomings of RGB-only policies. Modalities that explicitly represent positions in Cartesian space have an inherent advantage over purely image-based ones, since they allow policies to reason about geometry. Point clouds are a common way to represent geometric information, and have several benefits such as permutation invariance and flexible observation size. Despite the effectiveness of point clouds, a number of hybrid 2D/3D architectures have been proposed in the literature, indicating that their performance can often be task-dependent. We hypothesize that this may be due to the spectral bias of neural networks towards learning low-frequency functions, which especially affects models conditioned on slow-moving Cartesian features. Building on prior work that uses a parametric projection from Cartesian space into a high-dimensional Fourier space to overcome the innate low-pass filtering characteristic of neural networks, we apply Fourier features to several representative point cloud encoder architectures. We validate this approach on challenging manipulation tasks from the RoboCasa and ManiSkill3 benchmarks, and find that adding Fourier feature projections provides benefits across diverse encoder architectures and tasks, with meaningful improvements in the vast majority of tasks. We show that Fourier features are a general-purpose tool for point cloud-based imitation learning, which consistently improves performance by enabling policies to leverage geometric details more effectively than models conditioned on Cartesian features.
Primary Area: applications to robotics, autonomy, planning
Submission Number: 13670
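
Illustrative sketch: the abstract describes projecting Cartesian point coordinates into a high-dimensional Fourier space before they reach the point cloud encoder. The snippet below is a minimal illustration of that idea, not the authors' implementation. It assumes a fixed, randomly sampled Gaussian frequency matrix, whereas the submission refers to a parametric projection whose exact form is not stated on this page. The function names (`make_frequencies`, `fourier_features`, `encode_cloud`) and the values of `sigma` and `num_features` are hypothetical choices made for the example.

```python
import math
import random


def make_frequencies(num_features, in_dim=3, sigma=10.0, seed=0):
    """Sample a (num_features x in_dim) Gaussian frequency matrix B.

    Assumption: frequencies are fixed and random; sigma sets the bandwidth.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, sigma) for _ in range(in_dim)]
            for _ in range(num_features)]


def fourier_features(point, freqs):
    """Map one Cartesian point p to [sin(2*pi*B p), cos(2*pi*B p)]."""
    proj = [2.0 * math.pi * sum(b * x for b, x in zip(row, point))
            for row in freqs]
    return [math.sin(v) for v in proj] + [math.cos(v) for v in proj]


def encode_cloud(points, freqs):
    """Apply the projection to each point independently.

    The mapping is per-point, so the cloud stays permutation-equivariant
    and may contain any number of points.
    """
    return [fourier_features(p, freqs) for p in points]


freqs = make_frequencies(num_features=16)
# Two points 1 mm apart along x (coordinates in metres, an assumed unit).
cloud = [(0.100, 0.200, 0.300), (0.101, 0.200, 0.300)]
feats = encode_cloud(cloud, freqs)
print(len(feats), len(feats[0]))
```

Each 3D point becomes a vector of `2 * num_features` values in [-1, 1], which would replace or accompany the raw coordinates as encoder input. A larger `sigma` makes nearby points differ more in feature space, which is the mechanism by which such projections are thought to counteract spectral bias.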