Keywords: Embodied AI, Vision-Language Models, Memory-Augmented Reasoning, LLM Agents
TL;DR: MOSAIC is a memory-conditioned agent that, for each embodied task, retrieves similar past designs and uses a frozen frontier meta-agent to synthesize a task-specific module-graph from a typed library.
Abstract: Embodied agents demand a heterogeneous mix of capabilities: spatial precision for manipulation, persistent mapping for long-horizon navigation, and multi-step planning under natural-language instruction.
The dominant remedy folds every such capability into one foundation model through spatial pre-training and geometry-aware fine-tuning, shipping a single fixed pipeline for every task in the benchmark. We contend that the question is not how to grow capability into the model, but how to compose, per task, the context that lets the model's existing capability bear on the task at hand. We formulate embodied agent design as per-task module-graph synthesis: a joint search over which modules to include, which concrete tool implements each, and how each tool's parameters bind to environment observations and upstream outputs.
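To make the three coupled choices (which modules, which tool for each, how parameters bind) concrete, here is a minimal Python sketch of one possible module-graph representation. The catalog contents, the `obs.<field>` / `<module>.out` binding syntax, and the names `CATALOG`, `Node`, and `ModuleGraph` are hypothetical illustrations, not MOSAIC's actual schema. The sketch only checks that each chosen tool implements its module type and that every parameter binds either to an observation or to an upstream module's output.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical typed catalog: module type -> concrete tools that implement it.
CATALOG: Dict[str, List[str]] = {
    "perception": ["detector_a", "detector_b"],
    "mapping": ["occupancy_map", "topological_map"],
    "planner": ["llm_planner"],
}

@dataclass
class Node:
    module: str  # which module slot is included (must be a catalog key)
    tool: str    # which concrete tool implements that slot
    # parameter name -> source, either "obs.<field>" or "<upstream_module>.out"
    bindings: Dict[str, str] = field(default_factory=dict)

@dataclass
class ModuleGraph:
    nodes: List[Node]  # assumed to be listed in execution order

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the graph is well-typed."""
        errors: List[str] = []
        seen = set()
        for node in self.nodes:
            if node.tool not in CATALOG.get(node.module, []):
                errors.append(f"{node.tool} does not implement {node.module}")
            for param, src in node.bindings.items():
                head = src.partition(".")[0]
                # A binding is legal if it reads an observation or an earlier module.
                if head != "obs" and head not in seen:
                    errors.append(f"{node.module}.{param} binds to unavailable {src}")
            seen.add(node.module)
        return errors
```

Under this sketch, a per-task design is just a different `ModuleGraph` instance drawn from the same fixed `CATALOG`, so "activating every module" and "per-task selection" differ only in which nodes appear.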
Solving this search by full self-evolution is infeasible in embodied control, where validating each candidate requires a multi-step rollout and credit arrives only at episode end. We therefore introduce MOSAIC, a memory-conditioned agent that retains the load-bearing slice of self-evolution and drops the rest: the typed module catalog, the meta-agent, and the model weights stay fixed, while a memory of past pipeline designs and their outcomes evolves with deployment.
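The memory that evolves with deployment can be pictured as an append-only store of (task, pipeline, outcome) records queried by similarity. The sketch below assumes tasks are already embedded as vectors and ranks past designs by plain cosine similarity; `DesignRecord`, `DesignMemory`, and the embedding itself are hypothetical stand-ins, since the abstract does not specify how MOSAIC indexes or ranks past designs.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class DesignRecord:
    task_embedding: List[float]  # hypothetical: vector embedding of the task instruction
    pipeline: str                # serialized module-graph (e.g. JSON)
    success: bool                # episode-level outcome, the only credit signal

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0  # zero vectors score 0

class DesignMemory:
    """Append-only store of past designs; in this sketch, the only state that changes."""

    def __init__(self) -> None:
        self.records: List[DesignRecord] = []

    def add(self, record: DesignRecord) -> None:
        self.records.append(record)

    def retrieve(self, query: Sequence[float], k: int = 3) -> List[DesignRecord]:
        # Rank every stored design by similarity to the new task and keep the top k.
        ranked = sorted(self.records,
                        key=lambda r: cosine(query, r.task_embedding),
                        reverse=True)
        return ranked[:k]
```

Keeping failed designs alongside successful ones (rather than filtering on `success`) lets the composing model see what did not work for similar tasks, which is one plausible reading of "designs and their outcomes".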
A frontier meta-agent runs once per episode to compose a pipeline conditioned on retrievals from this memory, and a small executor runs every step. On EB-Nav, MOSAIC reaches 50.0% average success across five capability subsets, an 18.4% gain over the default EmbodiedBench agent with the same Qwen3-VL-8B backbone and a 5.3% gain over a baseline that activates every module in the same catalog without per-task selection.
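The cost split between the two models follows from the call pattern: the expensive meta-agent is invoked once per episode and the small executor once per step. A minimal sketch of that loop, with every component passed in as a hypothetical callable (`retrieve`, `meta_agent`, `executor`, `env_reset`, and `env_step` are illustrative names, not EmbodiedBench or MOSAIC APIs):

```python
from typing import Callable, List, Tuple

def run_episode(
    task: str,
    retrieve: Callable[[str], List[str]],         # hypothetical memory lookup
    meta_agent: Callable[[str, List[str]], str],  # frozen frontier model: composes a pipeline
    executor: Callable[[str, dict], str],         # small model: one action per step
    env_reset: Callable[[], dict],
    env_step: Callable[[str], Tuple[dict, bool, bool]],  # -> (obs, done, success)
    max_steps: int = 20,
) -> Tuple[str, bool]:
    examples = retrieve(task)
    pipeline = meta_agent(task, examples)  # called exactly once per episode
    obs = env_reset()
    success = False
    for _ in range(max_steps):
        action = executor(pipeline, obs)   # called once per step
        obs, done, success = env_step(action)
        if done:
            break
    # Credit is only known here, at episode end; the caller can store
    # (task, pipeline, success) back into the design memory.
    return pipeline, success
```

Writing the returned pipeline and its outcome back into memory is what lets the memory evolve while the catalog, the meta-agent, and the model weights stay fixed.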
Submission Number: 39