MoralGPT: Measuring Moral Dimensions via Large Language Models
Abstract: We propose MoralGPT, a lightweight tool for scoring moral relevance in text using large language models (LLMs). We conduct an extensive evaluation of MoralGPT to assess its generalizability to new datasets, concepts, moral frameworks, and applications. First, we show that MoralGPT significantly outperforms fine-tuned neural networks on four new datasets. Moreover, MoralGPT can score new moral concepts with high AUC, including a new moral foundation, Liberty, as well as concepts in the Common Morality framework, with the exception of duty. Additionally, our results on scoring the Moral Foundations Questionnaire (MFQ) and the Morality-as-Cooperation (MAC) questionnaire can serve as a reflection tool for moral psychologists when designing questions and scoring rules. Our work also yields two methodological insights: 1) it is important to prompt LLMs to label a set of related moral concepts simultaneously, and 2) learning a combination of pre-curated lexicons and LLM outputs can be helpful. Our results suggest that similar prompting strategies could be adopted for other subjective text analysis tasks.
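To make the two methodological insights concrete, the sketch below is a hypothetical Python illustration, not the authors' released code. It scores all Moral Foundations concepts in a single prompt (insight 1) and combines the resulting LLM scores with lexicon-count features in a simple logistic-regression layer (insight 2). The OpenAI-style client, the prompt wording, the model name, and the `MFD_LEXICON` placeholder are all assumptions for illustration.

```python
import json
from openai import OpenAI
from sklearn.linear_model import LogisticRegression

FOUNDATIONS = ["care", "fairness", "loyalty", "authority", "sanctity", "liberty"]

# Placeholder for a pre-curated lexicon (e.g., a Moral Foundations
# Dictionary); in practice this maps each foundation to its word list.
MFD_LEXICON = {f: set() for f in FOUNDATIONS}

def llm_scores(text, model="gpt-4"):
    """Insight 1: label all related moral concepts in one prompt, so the
    model can contrast them instead of judging each in isolation."""
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    prompt = (
        "Rate the relevance of each moral foundation to the text below on a "
        f"0-1 scale: {', '.join(FOUNDATIONS)}. Respond with a JSON object "
        "mapping each foundation to a number.\n\n"
        f"Text: {text}"
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic scoring
    )
    # Assumes the model follows the JSON instruction; a robust version
    # would validate and retry on malformed output.
    scores = json.loads(resp.choices[0].message.content)
    return [scores[f] for f in FOUNDATIONS]

def lexicon_counts(text):
    """Normalized hit counts against the pre-curated lexicon."""
    tokens = text.lower().split()
    return [
        sum(t in MFD_LEXICON[f] for t in tokens) / max(len(tokens), 1)
        for f in FOUNDATIONS
    ]

def fit_combined(train_texts, labels):
    """Insight 2: learn weights over concatenated lexicon + LLM features,
    given binary relevance labels for one target moral concept."""
    features = [lexicon_counts(t) + llm_scores(t) for t in train_texts]
    clf = LogisticRegression()
    clf.fit(features, labels)
    return clf
```

The joint prompt and the learned combination are two independent design choices: the first can be used on its own for zero-shot scoring, while the second requires a small amount of labeled data to weight the lexicon and LLM signals.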