Keywords: Entropy-based Multimodal Adaptive Reasoning
TL;DR: ARES is a multimodal adaptive reasoning framework that curbs overthinking on easy tasks and promotes deeper exploration on hard ones, achieving state-of-the-art performance efficiently.
Abstract: Recent advances in multimodal large reasoning models (MLRMs) have substantially
improved their ability to solve complex textual and visual tasks. However, these
models tend to *overthink* on
simple problems, producing unnecessarily lengthy reasoning traces, while
*under-exploring* on challenging ones, leading to missed solutions. To
address this imbalance, we propose **ARES**, a unified open-source framework
for *adaptive reasoning* that dynamically allocates exploration effort based
on task difficulty. Our approach is motivated by two key empirical findings:
(i) while single-token entropy is noisy, *high window-entropy (HWE)
tokens* (token-level entropies averaged under a sliding window) can reliably capture reasoning-critical moments; and (ii) reducing HWE usage
benefits easy problems, while increasing it is essential for solving hard ones.
Building on these insights, ARES introduces a two-stage training pipeline. In the
*Adaptive Cold-Start* stage, we curate multimodal and textual data paired
with reasoning traces of length proportional to problem difficulty, equipping the
model with initial difficulty awareness. In the second stage, we develop
*Adaptive Entropy Policy Optimization (AEPO)*, which uses HWE tokens as
exploration triggers to decide *when to explore*, and a hierarchical entropy
reward with dynamic KL control to decide *how much to explore*. Extensive
experiments demonstrate that ARES achieves state-of-the-art performance and
reasoning efficiency across diverse mathematical, logical, and multimodal
benchmarks, while closing the gap to leading commercial systems at
significantly lower inference cost. The anonymous code repository is available at https://anonymous.4open.science/r/ARES-60728M.
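The abstract defines HWE tokens as positions whose token-level entropies, averaged under a sliding window, are high. A minimal sketch of that computation is below; the window size, threshold, and function names are illustrative assumptions, not the paper's exact definition:

```python
import math

def token_entropies(logit_rows):
    """Shannon entropy (in nats) of the softmax distribution at each token position.

    logit_rows: list of per-position logit vectors (list of floats).
    """
    ents = []
    for logits in logit_rows:
        m = max(logits)  # subtract max for numerical stability
        exps = [math.exp(x - m) for x in logits]
        z = sum(exps)
        probs = [e / z for e in exps]
        ents.append(-sum(p * math.log(p) for p in probs if p > 0))
    return ents

def high_window_entropy_tokens(entropies, window=5, threshold=1.0):
    """Average entropies over a trailing sliding window and flag positions
    whose window-averaged entropy exceeds the threshold (candidate HWE tokens).

    `window` and `threshold` are hypothetical hyperparameters for illustration.
    """
    flags = []
    for i in range(len(entropies)):
        lo = max(0, i - window + 1)
        w = entropies[lo:i + 1]
        flags.append(sum(w) / len(w) > threshold)
    return flags
```

In AEPO, positions flagged this way would serve as the exploration triggers the abstract describes; the hierarchical entropy reward and dynamic KL control operate on top of this signal.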
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 3054