Keywords: alignment, extrapolated volition, economics, mechanism design, information economics
Abstract: We propose a simple formal definition of AI alignment motivated by an economic analogy. Market failures (such as imperfect information and imperfect assurance) correspond naturally to modes of misalignment. In particular, we believe that a "recursive information markets" mechanism leads to a formalization of extrapolated volition in terms of a value-of-information expression, though we leave this for future work.
Serve As Reviewer: ~Abhimanyu_Pallavi_Sudhir1
Confirmation: I confirm that I and my co-authors have read the policies and are releasing our work under a CC-BY 4.0 license.
Submission Number: 23