Do LLMs understand Pragmatics? An Extensive Benchmark for Evaluating Pragmatic Understanding of LLMs

Settaluri Lakshmi Sravanthi; Meet Doshi; Pavan Kalyan Tankala; Rudra Murthy; Pushpak Bhattacharyya

24 Sept 2023 (modified: 25 Mar 2024), ICLR 2024 Conference Withdrawn Submission

Keywords: LLMs, Pragmatics, Benchmark, NLP, Evaluation

Abstract: Large language models (LLMs) are typically evaluated on semantic understanding and are believed to be capable of general language processing. While LLMs can mimic human-like responses, they still fall short in their pragmatic, or contextual, understanding of language. To test this hypothesis, we subject LLMs to the complex task of pragmatics. We conduct evaluations across fourteen tasks spanning four domains of pragmatics, namely Implicature, Presupposition, Reference, and Deixis. For each task, we curate a high-quality test set of Multiple Choice Question Answers (MCQA). We evaluate a wide range of LLMs of different types and sizes. Our findings reveal that LLMs without instruction fine-tuning have near-random accuracy on many tasks, and that performance increases gradually with model capacity. Additionally, we create a unified benchmark that enables the research community to better assess the underlying pragmatic understanding of language models.

Supplementary Material: zip

Primary Area: datasets and benchmarks

Submission Number: 9107
