SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems

Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman

06 Sept 2019 (modified: 05 May 2023) · NeurIPS 2019
Abstract: In the last year, new models and methods for pretraining and transfer learning have driven striking performance improvements across a range of language understanding tasks. The GLUE benchmark, introduced one year ago, offers a single-number metric that summarizes progress on a diverse set of such tasks, but performance on the benchmark has recently come close to the level of non-expert humans, suggesting limited headroom for further research. In this paper we present a new benchmark styled after GLUE with a new set of more difficult language understanding tasks, improved resources, and a public leaderboard.
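The abstract mentions that GLUE-style benchmarks report a single-number metric summarizing progress across a diverse set of tasks. As a minimal sketch of how such an overall score is typically computed (tasks with multiple metrics are averaged first, then scores are macro-averaged across tasks; the task names and scores below are illustrative, not actual results):

```python
# Hedged sketch: compute a GLUE/SuperGLUE-style overall score by
# averaging each task's metric(s), then macro-averaging across tasks.
# Task names and numbers here are illustrative placeholders.

def overall_score(task_scores: dict) -> float:
    """Average within each task's metrics, then across tasks."""
    per_task = [sum(metrics) / len(metrics) for metrics in task_scores.values()]
    return sum(per_task) / len(per_task)

example = {
    "BoolQ": [76.0],        # accuracy only
    "CB":    [83.0, 90.0],  # F1 and accuracy, averaged first
    "COPA":  [70.0],        # accuracy only
}
print(overall_score(example))  # macro-average of 76.0, 86.5, 70.0 -> 77.5
```

Macro-averaging gives each task equal weight regardless of dataset size, which is why low-resource tasks can move the headline number noticeably.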
Benchmark website and data: https://super.gluebenchmark.com/
Baselines: https://github.com/nyu-mll/jiant/blob/master/scripts/superglue-baselines.sh