Elicitation Inference Optimization for Multi-Principal-Agent Alignment

Published: 01 Feb 2023, Last Modified: 13 Feb 2023. Submitted to ICLR 2023.
Keywords: alignment, large language models, LLMs, NLP, transfer learning, human-centered AI, preference modeling
TL;DR: We integrate an LLM with a latent factor model to predict individuals' agreement with text perspectives, with increasing efficiency at scale.
Abstract: In multi-principal-agent alignment scenarios spanning governance, markets, diplomacy, and AI, it is infeasible to elicit every principal's view on all perspectives relevant to agent decisions. Elicitation inference optimization (EIO) aims to minimize the number of elicitations $n$ needed to approximate $N$ principals' views across $K$ perspectives. In this work, we demonstrate an EIO approach whose data efficiency ($NK/n$) increases with scale. We introduce STUMP, an elicitation inference model that integrates an LLM with a latent factor model to enable learning transfer across samples, contexts, and languages. We then characterize STUMP's performance on a set of elicitation primitives from which scalable elicitation (sampling) protocols can be constructed. Building on these results, we design and demonstrate two scalable elicitation protocols for STUMP in which data efficiency grows without bound, scaling as $O(n)$ in the number of elicitations $n$. This makes it possible to obtain complex, high-dimensional preference signals spanning principal populations at any scale.
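As a rough illustration of the abstract's core idea, the sketch below shows one way an LLM text embedding could be combined with a per-principal latent factor to predict agreement. The page does not specify STUMP's actual architecture, so every name, dimension, and design choice here is an assumption, not the authors' method.

```python
# Illustrative sketch only: a latent-factor head over (assumed frozen) LLM
# perspective embeddings. All names, dimensions, and the dot-product form
# are hypothetical; the submission does not publish these details.
import torch
import torch.nn as nn

class LatentFactorHead(nn.Module):
    def __init__(self, n_principals: int, embed_dim: int, rank: int = 8):
        super().__init__()
        self.principal_factors = nn.Embedding(n_principals, rank)  # latent vector u_i per principal
        self.project = nn.Linear(embed_dim, rank)                  # maps LLM embedding e_k into factor space
        self.bias = nn.Embedding(n_principals, 1)                  # per-principal agreement offset

    def forward(self, principal_ids, perspective_embeddings):
        u = self.principal_factors(principal_ids)        # (B, rank)
        v = self.project(perspective_embeddings)         # (B, rank)
        logit = (u * v).sum(-1, keepdim=True) + self.bias(principal_ids)
        return torch.sigmoid(logit)                      # predicted agreement in [0, 1]

# Toy usage: N = 100 principals; 768-d stand-ins for LLM sentence embeddings.
model = LatentFactorHead(n_principals=100, embed_dim=768)
ids = torch.tensor([0, 1, 2])
emb = torch.randn(3, 768)
pred = model(ids, emb)  # (3, 1) agreement probabilities
```

Under this reading, elicitations for one principal inform predictions for others through the shared projection, which is one plausible route to the transfer across samples, contexts, and languages that the abstract claims.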
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics
Submission Guidelines: Yes
Please Choose The Closest Area That Your Submission Falls Into: Social Aspects of Machine Learning (e.g., AI safety, fairness, privacy, interpretability, human-AI interaction, ethics)