MOROCCO: Model Resource Comparison Framework

Anonymous

16 Nov 2021 (modified: 05 May 2023) · ACL ARR 2021 November Blind Submission · Readers: Everyone
Abstract: A new generation of pre-trained transformer language models has established new state-of-the-art results on many tasks, even exceeding the human level on standard NLU benchmarks. Despite this rapid progress, benchmark-based evaluation has generally relied on downstream performance as the primary metric, which limits the scope of model comparison with respect to practical use. This paper presents MOdel ResOurCe COmparison (MOROCCO), a framework for assessing models by their downstream quality combined with two computational efficiency metrics: memory consumption and throughput during inference. The framework integrates flexibly with popular leaderboards compatible with the jiant environment, which supports over 50 downstream tasks. We demonstrate the applicability of MOROCCO by evaluating 10 transformer models on two multi-task GLUE-style benchmarks in English and Russian and provide an analysis of the models.
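
As a minimal sketch of the two efficiency metrics the abstract names, the snippet below times a batched forward pass of a Hugging Face transformer and records peak GPU memory during inference. The model name, batch size, and repetition count are illustrative assumptions, not details of the MOROCCO implementation itself.

```python
# Hedged sketch: measure inference throughput and peak GPU memory
# for a transformer model. Not the MOROCCO codebase; the model,
# batch size, and loop count below are hypothetical choices.
import time

import torch
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # hypothetical model choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval().cuda()

texts = ["An example input sentence."] * 32  # hypothetical batch
batch = tokenizer(texts, padding=True, return_tensors="pt").to("cuda")

torch.cuda.reset_peak_memory_stats()  # start tracking peak memory
n_repeats = 10  # repeat the pass to smooth out timing noise
with torch.no_grad():
    start = time.perf_counter()
    for _ in range(n_repeats):
        model(**batch)
    torch.cuda.synchronize()  # wait for queued GPU work to finish
    elapsed = time.perf_counter() - start

throughput = n_repeats * len(texts) / elapsed     # examples per second
peak_memory = torch.cuda.max_memory_allocated()   # bytes

print(f"throughput: {throughput:.1f} samples/s")
print(f"peak GPU memory: {peak_memory / 2**20:.0f} MiB")
```

Measuring both metrics in the same loop keeps them comparable across models: a larger model would typically show lower throughput and higher peak memory under the same batch.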