Simple Linguistic Inferences of Large Language Models (LLMs): Blind Spots and Blinds

Anonymous

16 Oct 2023 · ACL ARR 2023 October Blind Submission
Readers: Everyone
Abstract: We evaluate LLMs' language understanding capacities on simple inference tasks that most humans find trivial. Specifically, we target (i) grammatically-specified entailments, (ii) premises with evidential adverbs of uncertainty, and (iii) monotonicity entailments. We design evaluation sets for these tasks and conduct experiments in both zero-shot and chain-of-thought setups, and with multiple prompts. The models exhibit moderate to low performance on these evaluation sets in all settings. Subsequent experiments show that embedding the premise under presupposition triggers or non-factives, which should exhibit opposite linguistic behavior, causes ChatGPT to predict entailment more frequently in the zero-shot setup and less frequently in the chain-of-thought setup, in both cases regardless of the correct label. Similar experiments with LLaMA 2 exhibit different yet equally flawed tendencies. Overall, these results suggest that, despite LLMs' celebrated language understanding capacity, they have blind spots with respect to certain types of entailments, and that certain information-packaging structures act as "blinds" that overshadow the semantics of the embedded premise.
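To make the evaluation protocol described in the abstract concrete, the following is a minimal sketch of how a zero-shot versus chain-of-thought entailment query might be constructed. It is illustrative only: the prompt wording, the example item, and the `query_model` wrapper are assumptions, not the paper's actual prompts, data, or API code.

```python
# Illustrative sketch: building zero-shot and chain-of-thought entailment prompts.
# `query_model` is a hypothetical stand-in for a call to ChatGPT or LLaMA 2;
# the example premise/hypothesis is invented, not drawn from the paper's evaluation sets.

def build_prompt(premise: str, hypothesis: str, chain_of_thought: bool = False) -> str:
    """Construct an entailment query for a single premise-hypothesis pair."""
    prompt = (
        f'Premise: "{premise}"\n'
        f'Hypothesis: "{hypothesis}"\n'
        "Does the premise entail the hypothesis? Answer 'yes' or 'no'."
    )
    if chain_of_thought:
        # Chain-of-thought variant asks the model to reason before answering.
        prompt += " Think step by step before giving your final answer."
    return prompt


def query_model(prompt: str) -> str:
    """Hypothetical wrapper around an LLM API; replace with a real client call."""
    raise NotImplementedError


if __name__ == "__main__":
    # Example in the spirit of a premise with an evidential adverb of uncertainty.
    premise = "Apparently, the senator resigned yesterday."
    hypothesis = "The senator resigned yesterday."
    for cot in (False, True):
        prompt = build_prompt(premise, hypothesis, chain_of_thought=cot)
        print(prompt, end="\n\n")
        # answer = query_model(prompt)  # would return 'yes' / 'no' in a real run
```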
Paper Type: long
Research Area: Semantics: Sentence-level Semantics, Textual Inference and Other areas
Contribution Types: Model analysis & interpretability
Languages Studied: English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.