Moral High Ground: A text-based games benchmark for moral evaluation

24 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission
Keywords: Text-based Games, LLM Evaluation, LLM Tuning
TL;DR: Text-based games for evaluating and tuning large language models
Abstract: This paper introduces a benchmark for evaluating large language models on moral values and business principles. The framework evaluates the moral and ethical reasoning ability of large language models using text-based games, which can be played by both human players and models. We present each game to the player as an interaction between the player and the environment. Each action in a game is associated with a reward based on moral and ethical values: a higher reward indicates a more moral choice, and a lower reward a less moral one. We score the game trajectory taken by a player by combining the rewards of the individual actions, with the highest score corresponding to the most moral or ethical path possible. This enables us to compare different models and human players on moral values. In addition, the framework can be used to teach or tune large language models on desired moral values and business principles using these text-based games. Through this framework, we hope to expand the diverse area of alignment techniques and help ensure that future models grasp the often nuanced topics of moral and ethical values.
Supplementary Material: zip
Primary Area: transfer learning, meta learning, and lifelong learning
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8903
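Illustrative sketch (not from the submission): the abstract says a trajectory is scored by combining the rewards of its individual actions, but does not state how they are combined. The Python below assumes a plain sum. The names Step, score_trajectory, and best_trajectory, the example actions, and the reward values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Step:
    """One action taken in a game, with its moral reward (hypothetical values)."""
    action: str
    reward: float


def score_trajectory(steps: List[Step]) -> float:
    # Assumption: per-action rewards are combined by summation.
    # The abstract only says the rewards are "combined".
    return sum(step.reward for step in steps)


def best_trajectory(trajectories: Dict[str, List[Step]]) -> str:
    # Return the name of the trajectory with the highest combined score.
    return max(trajectories, key=lambda name: score_trajectory(trajectories[name]))


# Two hypothetical playthroughs of the same game.
honest = [Step("report the billing error", 1.0), Step("refund the customer", 1.0)]
evasive = [Step("ignore the billing error", -1.0), Step("refund the customer", 1.0)]

if __name__ == "__main__":
    for name, steps in {"honest": honest, "evasive": evasive}.items():
        print(name, score_trajectory(steps))
```

Under this assumption, two players (human or model) who play the same game can be compared directly by their trajectory scores. A different combination rule, such as a discounted or normalized sum, would change only score_trajectory.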