Keywords: alignment, preference modeling, large language models, LLMs, human-centered AI
TL;DR: We integrate an LLM with a latent factor model to predict individuals' agreement with text perspectives, with efficiency that increases at scale
Abstract: In multi-principal-agent alignment scenarios including governance, markets, conflict resolution, and AI decision-making, it is infeasible to elicit every principal's view on all perspectives relevant to an agent's decisions. Elicitation inference optimization (EIO) aims to minimize the number of elicitations $n$ needed to approximate $N$ principals' views across $K$ perspectives. In this work, we demonstrate an EIO approach whose data efficiency ($NK/n$) increases with scale. We introduce STUMP: an elicitation inference model that integrates a large language model with a latent factor model to enable learning transfer across samples, contexts, and languages. We characterize STUMP's performance on a set of elicitation primitives from which scalable elicitation (sampling) protocols can be constructed. Building on these results, we design and demonstrate two elicitation protocols for STUMP where, surprisingly, data efficiency scales as $O(n)$ in the number of elicitations $n$. In other words, the number of elicitations needed per principal remains constant even as the number of perspectives and principals grows. This makes it possible to approximate complex, high-dimensional preference signals spanning principal populations at scale---which may then be incorporated into agent decision-making.