Learning Utility‑Calibrated Routing for Hierarchical Multi-Agents in Portfolio Decision‑Making

20 Sept 2025 (modified: 26 Jan 2026) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Financial LLM; portfolio management
Abstract: We study how tool‑using agents can make high‑stakes decisions under uncertainty and costs, with a focus on portfolio allocation. We introduce a hierarchical agent with a learned router that dispatches market contexts to specialized tools (e.g., event extractors, forecasters, options pricers) and an allocator that turns probabilistic predictions into trades under explicit risk and transaction constraints. Our training objective couples proper scoring rules for probabilistic calibration with risk‑sensitive portfolio utility and cost regularization, yielding utility‑calibrated predictions that are decision‑aware by construction. To enable reliable offline assessment, we derive a doubly‑robust off‑policy evaluation procedure tailored to backtesting with market frictions, which reduces bias and provides uncertainty estimates. In two challenging settings (options‑only allocation over large‑cap technology names, and multi‑asset allocation across U.S. S&P 500 sectors), our approach delivers consistent gains in expected utility and Sharpe ratio, markedly improved probability calibration, and lower turnover while satisfying risk and exposure constraints. The architecture is modular and data‑agnostic, so new tools and experts can be integrated while preserving end‑to‑end differentiability through the router and allocator. We release code and reproducible benchmarks to support rigorous evaluation of risk‑aware, tool‑using agents for financial decision‑making and beyond.
Primary Area: reinforcement learning
Submission Number: 23544
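
For readers unfamiliar with the doubly‑robust off‑policy evaluation mentioned in the abstract, the following is a minimal sketch of the generic one‑step (contextual‑bandit) doubly‑robust estimator. It is not the paper's friction‑aware procedure, which is not specified on this page. The function name `doubly_robust_value` and all argument names are hypothetical, and the sketch assumes the logged rewards are already net of transaction costs.

```python
from math import sqrt, nan
from typing import Sequence, Tuple


def doubly_robust_value(
    rewards: Sequence[float],         # logged per-period rewards (assumed net of costs)
    behavior_probs: Sequence[float],  # probability the logging policy gave the logged action
    target_probs: Sequence[float],    # probability the evaluated policy gives that action
    q_logged: Sequence[float],        # reward-model prediction for the logged action
    q_target: Sequence[float],        # reward-model expectation under the evaluated policy
) -> Tuple[float, float]:
    """Return (estimate, standard_error) of the evaluated policy's value.

    Each per-sample term is: model prediction + importance-weighted residual.
    The estimate is consistent if either the reward model or the
    behavior probabilities are correct.
    """
    n = len(rewards)
    if n == 0:
        raise ValueError("need at least one logged sample")

    terms = []
    for r, mu, pi, q_a, q_pi in zip(
        rewards, behavior_probs, target_probs, q_logged, q_target
    ):
        if mu <= 0.0:
            raise ValueError("behavior probability must be positive")
        weight = pi / mu  # importance weight
        terms.append(q_pi + weight * (r - q_a))

    estimate = sum(terms) / n
    if n > 1:
        variance = sum((t - estimate) ** 2 for t in terms) / (n - 1)
        std_error = sqrt(variance / n)
    else:
        std_error = nan  # a standard error is undefined for a single sample
    return estimate, std_error
```

When the reward model is exact on the logged actions, the residual term vanishes and the estimate reduces to the model‑based value. When the model is poor, the importance‑weighted residual corrects it, at the price of higher variance.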