Empowering Channel-of-Mobile-Experts with Informative Hybrid-Capabilities Reasoning

ICLR 2026 Conference Submission24905 Authors

20 Sept 2025 (modified: 08 Oct 2025), ICLR 2026 Conference Submission, CC BY 4.0
Keywords: Mobile Agent, Hybrid-Capabilities Reasoning
TL;DR: We propose Channel-of-Mobile-Experts (CoME) to enhance hybrid-capabilities reasoning for mobile task automation via information-gain-driven DPO.
Abstract: Mobile agents can autonomously execute user instructions, which requires hybrid-capabilities reasoning spanning screen summary, subtask planning, action decision, and action function. However, existing agents struggle to achieve both decoupled enhancement and balanced integration of these capabilities. While Mixture-of-Experts (MoE) supports capability decoupling, its input-oriented activation prevents selecting the expert that matches the current reasoning stage. To address these challenges, we propose Channel-of-Mobile-Experts (CoME), a novel agent architecture consisting of four distinct experts, each aligned with a specific reasoning stage. Through output-oriented activation, CoME activates the corresponding expert to generate the output tokens of each reasoning stage. To empower CoME with hybrid-capabilities reasoning, we introduce a progressive training strategy: Expert-FT decouples and enhances each expert's capability; Router-FT aligns expert activation with the corresponding reasoning stage; and CoT-FT facilitates seamless collaboration and balanced optimization across multiple capabilities. To mitigate error propagation in hybrid-capabilities reasoning, we propose InfoGain-Driven DPO (Info-DPO), which uses information gain to evaluate the contribution of each intermediate step, thereby guiding CoME toward more informative reasoning. Comprehensive experiments show that CoME outperforms dense mobile agents and MoE methods on both the AITZ and AMEX datasets.
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 24905
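
The abstract contrasts input-oriented activation (the expert is chosen from the input token) with output-oriented activation (the expert is chosen by the reasoning stage whose tokens are being generated). The toy sketch below illustrates only that routing idea. It is not the authors' implementation: the stage identifiers, the `make_expert` helper, and the scalar "experts" are all hypothetical stand-ins chosen for illustration, and a real system would presumably use learned expert layers and a trained router.

```python
from typing import Callable, Dict, List

# The four reasoning stages named in the abstract (identifiers are assumed).
STAGES = ["screen_summary", "subtask_planning", "action_decision", "action_function"]

Expert = Callable[[List[float]], List[float]]


def make_expert(scale: float) -> Expert:
    """Hypothetical toy expert: scales the hidden state by a fixed factor."""
    return lambda hidden: [scale * x for x in hidden]


# One expert per reasoning stage; scales 1.0..4.0 are arbitrary placeholders.
EXPERTS: Dict[str, Expert] = {
    stage: make_expert(i + 1.0) for i, stage in enumerate(STAGES)
}


def route_by_output_stage(hidden: List[float], stage: str) -> List[float]:
    """Pick the expert from the stage being generated, not from the input."""
    return EXPERTS[stage](hidden)


def decode(hidden: List[float], stage_of_token: List[str]) -> List[List[float]]:
    """Apply the stage-matched expert at every output position."""
    return [route_by_output_stage(hidden, stage) for stage in stage_of_token]
```

In this sketch the same hidden state yields different outputs depending only on the stage label, which is the property that distinguishes output-oriented from input-oriented routing.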