Keywords: Multi-Objective Optimization, Context Engineering, LLM-judge, ML Reusability
TL;DR: Zero-shot multi-objective sequential decision making via LLM-driven orchestration of pre-trained single-objective policies.
Abstract: Multi-objective sequential decision making (MO-SDM) is central to many real-world tasks, where an agent must make a sequence of decisions that balance multiple, often conflicting objectives. Multi-objective reinforcement learning (MORL) is a common approach to solving MO-SDM problems, but typically requires training from scratch under known objective configurations. In this paper, we propose a zero-shot paradigm: reusing a set of existing or pre-trained single-objective (SO) policies through large language model (LLM)-driven orchestration. We formalize this setting using context engineering and develop three types of orchestrators that vary in the context components they observe, such as knowledge, tools, and reflection, and in their ability to reason over policy behavior. Experiments on two domains (education and control) demonstrate that our method achieves competitive Pareto quality and per-objective performance while reducing computational cost by over $3\times$ compared to MORL methods. An ablation study further reveals how context richness and reflective foresight influence zero-shot decision quality.
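The orchestration idea from the abstract, reusing single-objective policies without retraining, can be illustrated with a toy sketch. Everything here is hypothetical: the LLM orchestrator is mocked by a simple rule that acts with the policy of the currently most-neglected objective, and `make_so_policy`, `orchestrate`, and the scalar state/action types are illustrative stand-ins, not the paper's implementation.

```python
# Hypothetical sketch of zero-shot orchestration of single-objective (SO)
# policies. An orchestrator chooses, at each step, which pre-trained SO
# policy acts next; the LLM-driven judge is mocked by a rule over
# per-objective running scores. All names are illustrative.
from typing import Callable, Dict

Policy = Callable[[int], float]  # maps a toy integer state to a scalar action

def make_so_policy(weight: float) -> Policy:
    """Stand-in for a pre-trained single-objective policy."""
    return lambda state: weight * state

def orchestrate(policies: Dict[str, Policy],
                scores: Dict[str, float],
                state: int) -> float:
    """Mock orchestrator: act with the policy whose objective lags most,
    a crude proxy for LLM reasoning over context (knowledge, reflection)."""
    lagging = min(scores, key=scores.get)  # most-neglected objective
    action = policies[lagging](state)
    scores[lagging] += 1.0  # toy progress credit for that objective
    return action

policies = {"speed": make_so_policy(2.0), "safety": make_so_policy(0.5)}
scores = {"speed": 0.0, "safety": 0.0}
trace = [orchestrate(policies, scores, s) for s in range(4)]
print(trace)   # the orchestrator alternates between objectives as they lag
print(scores)  # both objectives end with equal credit
```

No single-objective policy is retrained here; only the selection logic changes, which is the sense in which the reuse is zero-shot.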
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 8803