ElitePLM: An Empirical Study on General Language Ability Evaluation of Pretrained Language Models

Anonymous

17 Sept 2021 (modified: 05 May 2023) · ACL ARR 2021 September Blind Submission · Readers: Everyone
Abstract: Pretrained language models (PLMs), such as BERT and GPT-3, have come to dominate the majority of NLP tasks. However, relatively little work has systematically evaluated the language abilities of PLMs. In this paper, we present a large-scale empirical study on genEral Language abIliTy Evaluation of PLMs (ElitePLM). We first design four evaluation dimensions in ElitePLM, namely memory, comprehension, reasoning, and composition, and then measure ten widely used PLMs within five categories. Our empirical results demonstrate that: (1) pretraining objectives and strategies have significant impacts on PLMs' performance on downstream tasks; (2) fine-tuning PLMs on downstream tasks is usually sensitive to data size and distribution; (3) PLMs have excellent transferability between similar tasks. Our experimental results yield several important findings that can guide future work in choosing, applying, and designing PLMs for specific tasks. We have made all experimental details publicly available at https://anonymous.4open.science/r/Paper-for-ACL-4FD1.