Abstract: Robotic manipulation requires learning a generalizable policy that can adapt to complex new environments. However, existing methods typically overlook inherent task complexity and apply a policy with the same computational budget to tasks of varying difficulty, which leads to inefficient allocation of computational resources and limits zero-shot generalization. In this work, we identify three facets of complexity imbalance in current manipulation tasks, at the inter-task, intra-task, and noise-timestep levels. To address this gap, we introduce the Complexity-Aware Policy (CAP), a novel approach that integrates flow matching with a Transformer-based backbone and a Mixture of Heterogeneous Experts (MoHE) structure for policy learning. By leveraging Rectified Flow and dynamically adjusting model capacity according to task complexity, which is assessed through features such as object count and precision requirements, our method allocates computational resources efficiently. This results in faster convergence, better use of computational resources, and improved precision across diverse manipulation tasks. Our method achieves state-of-the-art performance on the widely used CALVIN, LIBERO, and SimplerEnv benchmarks. It is further validated in six real-world experiments, where it consistently outperforms baseline methods across all tasks.
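For readers unfamiliar with Rectified Flow, the following is a minimal sketch of the general technique only, not of CAP itself: it shows the straight-line interpolation between noise and a target action, the constant-velocity regression target, and fixed-step Euler sampling. All names (`interpolate`, `target_velocity`, `euler_sample`), the three-dimensional action, and the oracle velocity field standing in for a learned network are illustrative assumptions; the abstract does not specify any of them.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to action x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Rectified Flow regression target: the constant velocity x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def euler_sample(x0, velocity_fn, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

rng = random.Random(0)
action = [0.5, -0.2, 0.1]                      # hypothetical target action
noise = [rng.gauss(0.0, 1.0) for _ in action]  # sample from the noise prior

# Oracle velocity field (assumption): in practice a trained network would
# predict this from the observation, the current sample, and the timestep.
ideal_v = target_velocity(noise, action)
oracle = lambda x, t: ideal_v

sample = euler_sample(noise, oracle, num_steps=4)
midpoint = interpolate(noise, action, 0.5)
```

Because the oracle velocity is constant along the straight path, Euler integration reaches the target action for any number of steps; a learned velocity field is only approximately straight, so the number of steps and the model capacity spent per step become a real trade-off.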
External IDs:dblp:journals/tcsv/YangHYZPX26