Keywords: Principal–agent games, Multi-agent reward design, Theory of compound AI systems
TL;DR: We theoretically study the power and limitations of aggregation in compound AI systems by building on a principal-agent framework.
Abstract: When designing AI systems for complex tasks, it is becoming increasingly common to aggregate multiple copies of the same model to create a compound AI system. Given the homogeneity of these models, this raises the question of when aggregation unlocks greater performance than querying a single model. In this work, we mathematically study the power and limitations of aggregation within a stylized principal-agent framework. This framework models how the system designer can partially steer each agent's output through reward specification, but still faces limitations in prompt engineering ability and model capabilities. To analyze the power of aggregation, we characterize when an aggregation operation expands the set of outputs that the system designer can elicit. We also analyze the limitations of aggregation, showing tight conditions under which aggregation does not expand the set of elicitable outputs, regardless of how limited the designer's prompt engineering ability is. Finally, we apply these characterizations to simple intersection-based and union-based aggregation rules. Altogether, our results take a step towards characterizing when compound AI systems can overcome limitations in model capabilities and in prompt engineering.
Primary Area: learning theory
Submission Number: 13523