World Model Implanting for Test-time Adaptation of Embodied Agents

Published: 01 May 2025, Last Modified: 18 Jun 2025 · ICML 2025 poster · CC BY 4.0
TL;DR: World Model Implanting for Test-time Adaptation of Embodied Agents
Abstract: In embodied AI, a persistent challenge is enabling agents to robustly adapt to novel domains without requiring extensive data collection or retraining. To address this, we present a world model implanting framework (WorMI) that combines the reasoning capabilities of large language models (LLMs) with independently learned, domain-specific world models through test-time composition. By allowing seamless implantation and removal of the world models, the embodied agent's policy achieves and maintains cross-domain adaptability. In the WorMI framework, we employ a prototype-based world model retrieval approach, utilizing efficient trajectory-based abstract representation matching, to incorporate relevant models into test-time composition. We also develop a world-wise compound attention method that not only integrates the knowledge from the retrieved world models but also aligns their intermediate representations with the reasoning model's representation within the agent's policy. This framework design effectively fuses domain-specific knowledge from multiple world models, ensuring robust adaptation to unseen domains. We evaluate our WorMI on the VirtualHome and ALFWorld benchmarks, demonstrating superior zero-shot and few-shot performance compared to several LLM-based approaches across a range of unseen domains. These results highlight the framework’s potential for scalable, real-world deployment in embodied agent scenarios where adaptability and data efficiency are essential.
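The abstract's prototype-based world model retrieval can be illustrated with a minimal sketch. The paper's actual trajectory abstraction is not specified here, so the code below assumes mean-pooled step features as the trajectory representation and cosine similarity for prototype matching; the function names (`embed_trajectory`, `retrieve_world_models`) are illustrative, not from the paper.

```python
import numpy as np

def embed_trajectory(trajectory: np.ndarray) -> np.ndarray:
    """Abstract a trajectory (T x d array of per-step features) into one
    vector. Mean pooling is a simple stand-in for the paper's
    trajectory-based abstract representation."""
    return trajectory.mean(axis=0)

def retrieve_world_models(recent_traj: np.ndarray,
                          prototypes: dict[str, np.ndarray],
                          k: int = 2) -> list[str]:
    """Return the k domain world models whose prototypes are closest
    (by cosine similarity) to the agent's recent trajectory, so they
    can be implanted into the test-time composition."""
    q = embed_trajectory(recent_traj)
    q = q / np.linalg.norm(q)
    scores = {
        name: float(q @ (p / np.linalg.norm(p)))
        for name, p in prototypes.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

For example, a trajectory dominated by kitchen-like features would rank a "kitchen" prototype above a "workshop" one, and only the top-k specialists would be composed with the reasoning model.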
Lay Summary: Robots and virtual assistants often stumble when they are moved from the home environment where they were trained to a brand-new setting. Gathering fresh data or retraining them every time is costly and slow. We tackle this by giving the agent a "plug-and-play memory" called WorMI (World-Model Implanting). WorMI lets a large language model reason as usual while seamlessly adding or removing smaller, specialist world models, each learned in a different domain (like a kitchen, a workshop, or a game). At test time the agent quickly retrieves the most relevant specialists using compact prototypes taken from its recent experience, then fuses their knowledge with a new attention mechanism that keeps all pieces talking to one another. Because nothing is retrained, the agent adapts on the fly, even to places it has never seen. In two challenging benchmarks, WorMI outperforms other zero-shot and few-shot methods, showing that this modular approach could make future household robots and game characters far more flexible without endless data collection.
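The fusion step described above (the attention mechanism that "keeps all pieces talking to one another") can be sketched generically. The paper's world-wise compound attention is not detailed here, so this is a plain single-head cross-attention stand-in: the reasoning model's hidden states act as queries over each implanted world model's hidden states, and the per-model readouts are added back into the reasoning stream. All names and shapes are assumptions for illustration.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_with_cross_attention(reason_h: np.ndarray,
                              world_hs: list[np.ndarray]) -> np.ndarray:
    """Fuse implanted world models into the reasoning stream.

    reason_h: (n, d) hidden states of the reasoning model (queries).
    world_hs: list of (m_i, d) hidden states, one per retrieved world
              model (keys/values). Each model's attention readout is
              summed residually into the reasoning representation, so
              models can be implanted or removed without retraining.
    """
    d = reason_h.shape[-1]
    fused = reason_h.copy()
    for wh in world_hs:
        attn = softmax(reason_h @ wh.T / np.sqrt(d), axis=-1)
        fused = fused + attn @ wh
    return fused
```

Because each world model contributes through its own residual attention term, removing a specialist simply drops its term from the sum, matching the plug-and-play behavior the summary describes.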
Link To Code: https://github.com/mjyoo2/WorMI
Primary Area: Reinforcement Learning->Everything Else
Keywords: Embodied AI, Model implanting, World models, Large language model
Submission Number: 10236