ChatLog: Carefully Evaluating the Evolution of ChatGPT Across Time

ACL ARR 2024 April Submission 2 Authors

05 Apr 2024 (modified: 05 Jun 2024), ACL ARR 2024 April Submission, CC BY 4.0

Abstract: ChatGPT has achieved great success and can be considered to have acquired infrastructural status. Many works evaluate ChatGPT on benchmarks. However, existing benchmarks face two challenges: (1) disregard for periodic evaluation and (2) lack of fine-grained features. In this paper, we construct ChatLog, an ever-updating dataset with large-scale records of diverse long-form ChatGPT responses for 21 NLP benchmarks from March 2023 to now. Our comprehensive performance evaluation finds that most capabilities of ChatGPT improve over time, with some exceptions, and that ChatGPT evolves in a step-wise pattern. We further analyze the inherent characteristics of ChatGPT by extracting its knowledge and linguistic features. We find some stable features that stay unchanged across versions and apply them to the detection of ChatGPT-generated text to improve the robustness of cross-version detection. We will continuously maintain our project on GitHub to facilitate future research.

Paper Type: Long

Research Area: Resources and Evaluation

Research Area Keywords: automatic creation and evaluation of language resources, statistical testing for evaluation

Contribution Types: NLP engineering experiment, Data resources, Data analysis

Languages Studied: English

Section 2 Permission To Publish Peer Reviewers Content Agreement: Authors grant permission for ACL to publish peer reviewers' content

Submission Number: 2
