Trolley Problems for Large Language Models Across 100+ Languages
Primary Area: societal considerations including fairness, safety, privacy
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Keywords: AI Safety, Moral Decision-Making, Cognitive Science, Large Language Models
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
TL;DR: Our paper investigates the moral tendencies of large language models when addressing trolley problems.
Abstract: As large language models (LLMs) are deployed in an increasing number of systems, it is crucial to inspect the moral tendencies in their decision-making. In this paper, we use the largest set of moral judgments collected to date, the trolley problems, to examine the preferences stated in LLMs' responses as well as the reasons given to support them. We first conduct a large-scale inspection of over 10K trolley problem scenarios in English and compare the models' responses with human preferences. We then extend the questions to other cultures by changing the language of the prompts and posing the questions from the perspective of different personas. We find that LLMs tend to generate responses more closely aligned with human preferences for English-speaking cultures, while also exhibiting notable tendencies for other cultures.
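To make the evaluation setup concrete, here is a minimal, hypothetical sketch (not the authors' code) of how trolley-problem prompts might be constructed across personas and then sent to a model; `query_llm`, the scenario template, and the persona strings are illustrative placeholders, and the paper's 10K+ scenarios and language variants would come from its dataset rather than the single English template shown here.

```python
# Illustrative sketch only: building a trolley-problem prompt with an optional
# persona framing, then querying a model for its stated choice and reasons.

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-completion call; replace with a real client."""
    raise NotImplementedError

# Hypothetical English scenario template; prompts in other languages would be
# translated or natively authored versions of such templates.
SCENARIO_EN = (
    "A runaway trolley will hit {group_a} unless you pull a lever, "
    "in which case it will hit {group_b}. Do you pull the lever? "
    "State your choice and explain your reasons."
)

# Hypothetical persona prefixes used to position the question for different respondents.
PERSONAS = {
    "none": "",
    "doctor": "Answer as a doctor would. ",
}

def build_prompt(persona_key: str, group_a: str, group_b: str) -> str:
    """Compose the persona framing with a filled-in scenario."""
    return PERSONAS[persona_key] + SCENARIO_EN.format(group_a=group_a, group_b=group_b)

if __name__ == "__main__":
    prompt = build_prompt("doctor", "five elderly people", "one child")
    print(prompt)                      # inspect the constructed prompt
    # response = query_llm(prompt)     # would return the model's stated preference and reasons
```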
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 9413