Abstract: Human-AI value alignment has emerged as a central challenge in the rapid deployment of Artificial Intelligence (AI). In many applications, such as Large Language Models (LLMs), the goals and constraints for the AI model's desired behavior are known only to a (potentially very small) group of humans. The objective, then, is to develop mechanisms that allow humans to communicate these goals both efficiently and reliably to AI models like LLMs prior to their deployment. This requires explicitly reasoning about human strategies for shaping the behavior of AI agents, and about how these strategies are derived from the humans' prior beliefs about the AI's own learning process. In this position paper, we argue that it is natural to view the alignment problem from the perspective of multiagent systems (MAS). We briefly survey open alignment challenges in the fine-tuning of large language models and in zero-shot learning with LLMs. We then connect these open questions to concepts developed for multiagent problems (particularly for ad hoc coordination), and discuss how these ideas may be applied to address misalignment in LLMs.
Paper Type: short
Research Area: Machine Learning for NLP
Contribution Types: Position papers, Surveys
Languages Studied: English