Keywords: Principal–agent games, Multi-agent reward design, Theory of compound AI systems
TL;DR: We theoretically study the power and limitations of aggregation in compound AI systems by building on a principal-agent framework.
Abstract: When designing AI systems for complex tasks, it is becoming increasingly common to query a model in different ways and aggregate the outputs to create a compound AI system. In this work, we mathematically study the power and limitations of aggregation within a stylized principal-agent framework. This framework models how the system designer can partially steer each agent's output through reward specification, while remaining limited by their prompt engineering ability and by model capabilities.
Our analysis identifies three mechanisms—feasibility expansion, support expansion, and binding set contraction—through which aggregation can expand the set of elicitable outputs. We prove that any aggregation operation must implement at least one of these mechanisms to provide benefit, though no single mechanism is sufficient on its own. To sharpen this picture, we establish necessary and sufficient conditions for when aggregation expands the set of elicitable outputs. Altogether, our results take a step towards characterizing when compound AI systems can overcome limitations in model capabilities and in prompt engineering.
Primary Area: learning theory
Submission Number: 13523