Keywords: VLA, Diffusion, Manipulation, Cluttered Scene, Multi-task, Cross-embodiment
TL;DR: We propose GAM, a unified diffusion-based VLA model that generates and scores diverse manipulation actions across embodiments via language prompts, validated at >95% success over 10M real-world cycles.
Abstract: We present the Generalized Action Model (GAM), a production-grade foundation model that unifies robotic action generation across diverse tasks and embodiments through a vision-language-action (VLA) pipeline. GAM addresses two fundamental barriers in scaling robotic manipulation: the lack of a unified representation for diverse robot end-effectors and the prohibitive cost of acquiring high-quality interaction data at scale. Our approach introduces (1) a unified language-prompted policy and critic that generates and scores diverse manipulation actions---including suction grasps, pinch grasps, caging, and placements---from a single model, (2) a scalable offline data generation pipeline that recomputes dense action candidates and quality labels in simulation from real-world observations, and (3) an end-effector encoding that enables zero-shot transfer to unseen hardware. We validate GAM on a fleet of robotic work-cells, where it has executed over 10 million pick-and-place cycles with greater than $95\%$ pick and greater than $90\%$ place success rates. The same model generalizes to hybrid end-effectors with distinct grasping modes at greater than $90\%$ success.
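The abstract describes a generate-then-score pipeline: a language-prompted diffusion policy proposes candidate actions conditioned on the observation and an end-effector encoding, and a critic ranks them. The sketch below is a minimal illustration of that interface only; the class and function names (`EndEffectorEncoding`, `propose_actions`, `score_actions`) are hypothetical, and the diffusion sampler and critic are replaced by stubs rather than the paper's actual model.

```python
# Hypothetical sketch of a generate-then-score interface (not the authors' code).
# The diffusion policy and critic are stubbed; all names here are illustrative.
from dataclasses import dataclass
import numpy as np

@dataclass
class EndEffectorEncoding:
    """Illustrative end-effector descriptor (assumed, not from the paper)."""
    mode: str                 # e.g. "suction", "pinch", "caging", "placement"
    features: np.ndarray      # fixed-size descriptor of the hardware

def propose_actions(rgbd: np.ndarray, prompt: str, ee: EndEffectorEncoding,
                    num_candidates: int = 64) -> np.ndarray:
    """Stand-in for the diffusion policy: returns candidate 6-DoF actions."""
    rng = np.random.default_rng(0)
    return rng.uniform(-1.0, 1.0, size=(num_candidates, 6))

def score_actions(actions: np.ndarray, rgbd: np.ndarray, prompt: str,
                  ee: EndEffectorEncoding) -> np.ndarray:
    """Stand-in for the critic: returns one quality score per candidate."""
    return -np.linalg.norm(actions[:, :3], axis=1)   # toy heuristic only

# Usage: sample diverse candidates, score them, pick the best for execution.
obs = np.zeros((480, 640, 4), dtype=np.float32)      # placeholder RGB-D observation
ee = EndEffectorEncoding(mode="suction", features=np.zeros(16))
prompt = "pick the red box"
candidates = propose_actions(obs, prompt, ee)
best = candidates[int(np.argmax(score_actions(candidates, obs, prompt, ee)))]
print("selected action:", best)
```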
Submission Number: 46