Rethinking Pragmatics in Large Language Models: Towards Open-Ended Evaluation and Preference Tuning

ACL ARR 2024 June Submission5939 Authors

16 Jun 2024 (modified: 02 Jul 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: This study addresses the challenges of assessing and enhancing social-pragmatic inference in large language models (LLMs). We first highlight the inadequacy of current accuracy-based multiple-choice question answering (MCQA) formats for assessing social-pragmatic reasoning, and propose the direct evaluation of models' free-form responses as a measure, which, as our results show, correlates better with human judgement. Further, we explore enhancing the pragmatic abilities of LLMs, proposing preference optimization (PO) over supervised fine-tuning (SFT), since there is no "gold" answer when responding to a social situation. Our results indicate that preference tuning significantly outperforms SFT and proves more robust across pragmatic phenomena, offering a near-free lunch for enhancing models' pragmatic ability without compromising generic abilities. Lastly, we probe LLMs' internal representation space and demonstrate that the substantial boost in the model's pragmatic reasoning capabilities is linked to deeper-layer representations, mirroring humans' high-level thinking. Our experiments span multiple pragmatic and social reasoning data sources covering diverse phenomena, as well as an image referential game requiring multimodal theory of mind (ToM). With our refined paradigms for evaluating and enhancing pragmatic inference, this paper offers key insights for developing more socially aware language models.
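The abstract does not specify which preference-optimization objective the authors use. As a minimal sketch of the general idea, assuming a DPO-style pairwise loss (Rafailov et al., 2023) over (preferred, dispreferred) free-form responses to the same social situation, the core computation could look like this; all function and variable names here are illustrative, not from the paper:

```python
# Hedged sketch: a DPO-style pairwise preference loss over response pairs.
# Inputs are summed per-token log-probabilities of each response under the
# trainable policy and under a frozen reference model.
import torch
import torch.nn.functional as F

def preference_loss(policy_chosen_logps, policy_rejected_logps,
                    ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Log-odds of the chosen over the rejected response, for each model.
    policy_logratios = policy_chosen_logps - policy_rejected_logps
    ref_logratios = ref_chosen_logps - ref_rejected_logps
    # Push the policy to prefer the chosen response more strongly than the
    # reference model does; beta scales the strength of the preference margin.
    return -F.logsigmoid(beta * (policy_logratios - ref_logratios)).mean()

# Toy usage with random log-probabilities for a batch of 4 response pairs.
torch.manual_seed(0)
loss = preference_loss(torch.randn(4), torch.randn(4),
                       torch.randn(4), torch.randn(4))
print(loss.item())
```

Such a pairwise objective fits the paper's motivation: it only requires a relative preference between two responses, not a single "gold" answer.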
Paper Type: Long
Research Area: Discourse and Pragmatics
Research Area Keywords: conversation, communication
Contribution Types: NLP engineering experiment
Languages Studied: English
Submission Number: 5939