Keywords: Multi-Objective Optimization, Context Engineering, LLM-judge, ML Reusability
TL;DR: Zero-shot multi-objective sequential decision making via LLM-driven orchestration of pre-trained single-objective policies.
Abstract: Multi-objective sequential decision making (MO-SDM) is central to many real-world tasks, where an agent must make a sequence of decisions that balance multiple, often conflicting objectives. Multi-objective reinforcement learning (MORL) is a common approach to solving MO-SDM problems, but typically requires training from scratch under known objective configurations. In this paper, we propose a zero-shot paradigm: reusing a set of existing or pre-trained single-objective (SO) policies through large language model (LLM)-driven orchestration. We formalize this setting using context engineering and develop three types of orchestrators that vary in the context components they observe, such as knowledge, tools, and reflection, and in their ability to reason over policy behavior. Experiments on two domains (education and control) demonstrate that our method achieves competitive Pareto quality and per-objective performance while reducing computational cost by over $3\times$ compared to MORL methods. An ablation study further reveals how context richness and reflective foresight influence zero-shot decision quality.
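The orchestration idea from the abstract, reusing single-objective policies without retraining, can be illustrated with a toy sketch. Everything here is hypothetical: the LLM orchestrator is mocked by a simple rule that acts with the policy of the currently most-neglected objective, and `make_so_policy`, `orchestrate`, and the scalar state/action types are illustrative stand-ins, not the paper's implementation.

```python
# Hypothetical sketch of zero-shot orchestration of single-objective (SO)
# policies. An orchestrator chooses, at each step, which pre-trained SO
# policy acts next; the LLM-driven judge is mocked by a rule over
# per-objective running scores. All names are illustrative.
from typing import Callable, Dict

Policy = Callable[[int], float]  # maps a toy integer state to a scalar action

def make_so_policy(weight: float) -> Policy:
    """Stand-in for a pre-trained single-objective policy."""
    return lambda state: weight * state

def orchestrate(policies: Dict[str, Policy],
                scores: Dict[str, float],
                state: int) -> float:
    """Mock orchestrator: act with the policy whose objective lags most,
    a crude proxy for LLM reasoning over context (knowledge, reflection)."""
    lagging = min(scores, key=scores.get)  # most-neglected objective
    action = policies[lagging](state)
    scores[lagging] += 1.0  # toy progress credit for that objective
    return action

policies = {"speed": make_so_policy(2.0), "safety": make_so_policy(0.5)}
scores = {"speed": 0.0, "safety": 0.0}
trace = [orchestrate(policies, scores, s) for s in range(4)]
print(trace)   # the orchestrator alternates between objectives as they lag
print(scores)  # both objectives end with equal credit
```

No single-objective policy is retrained here; only the selection logic changes, which is the sense in which the reuse is zero-shot.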
Supplementary Material: zip
Primary Area: reinforcement learning
Submission Number: 8803