Keywords: Embodied AI, Domain-Adaptive Planning, Symbolic Reasoning, Physical Grounding, Reinforcement Learning
TL;DR: JTEmbodiedAgent aligns optimization with domain constraints, using SFT for the symbolic rigidity of VirtualHome and RL for the physical grounding of Behavior.
Abstract: Translating natural language instructions into executable plans requires balancing symbolic rigidity with physical adaptability. In this technical report for the Embodied Agent Interface (EAI) Challenge, we present a domain-adaptive methodology that aligns optimization strategies with environmental constraints. We demonstrate that while Supervised Fine-Tuning (SFT) is sufficient for the deterministic, logic-driven nature of the VirtualHome environment, the high-variance physics of the Behavior environment necessitates a Reinforcement Learning (RL) paradigm to prevent out-of-distribution forgetting and hallucination. To this end, we propose a hybrid framework that integrates Gemini 2.5 Pro for high-level reasoning and sequencing with a Qwen3-14B backbone, optimized via SFT and GRPO/GMPO, to handle precise symbolic tasks and state grounding. Our results demonstrate a distinct trade-off: learned policies excel at transition modeling and goal interpretation, whereas prompt engineering outperforms in long-horizon sequencing for sparse-reward domains.
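To make the GRPO component of the abstract concrete, below is a minimal sketch of the group-relative advantage and clipped surrogate loss that characterize GRPO; the function names, tensor shapes, and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: each sampled plan's reward is normalized
    against the mean/std of its sampling group, so no learned value
    function (critic) is needed. rewards: [num_groups, group_size]."""
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

def grpo_loss(logp_new: torch.Tensor,
              logp_old: torch.Tensor,
              advantages: torch.Tensor,
              clip_eps: float = 0.2) -> torch.Tensor:
    """PPO-style clipped surrogate objective driven by the
    group-relative advantages above (KL penalty omitted for brevity)."""
    ratio = (logp_new - logp_old).exp()
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    # Negative sign: we minimize the loss to maximize the surrogate reward.
    return -torch.min(unclipped, clipped).mean()
```

Normalizing within each group rather than training a critic is what makes this style of RL tractable for sparse-reward, high-variance settings like the Behavior environment described above.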
Submission Number: 6