Commonsense Frame Completion and its Probabilistic Evaluation

Anonymous

16 Jul 2022 (modified: 05 May 2023), ACL ARR 2022 July Blind Submission, Readers: Everyone
Abstract: Commonsense knowledge is critical to achieving artificial general intelligence. Large language models have demonstrated impressive performance on commonsense tasks; however, these tasks are often posed as multiple-choice questions, allowing models to exploit systematic biases. Common sense is also inherently probabilistic: a plumber could repair a sink in a kitchen or a bathroom, or even a basement, although the first two answers are more probable. Existing tasks do not capture this probabilistic nature. To this end, we present commonsense frame completion (CFC), a new generative task that evaluates common sense via multiple open-ended generations. We also propose a method of probabilistic evaluation that correlates strongly with human judgements. Humans drastically outperform strong language model baselines on our dataset, indicating that this approach is both a challenging and useful evaluation of machine common sense.
Paper Type: long