LLMs as NLP Researchers: Paper (Meta-)Reviewing as a Testbed

ACL ARR 2024 June Submission 4137 Authors

16 Jun 2024 (modified: 04 Aug 2024) · ACL ARR 2024 June Submission · CC BY 4.0
Abstract: This work is motivated by two key trends. On the one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as they have to spend more time reading, writing, and reviewing papers. This raises the question: how can LLMs potentially assist researchers in alleviating their heavy workload? This study focuses on the topic of LLMs as NLP Researchers, particularly examining how effectively LLMs can perform paper (meta-)reviewing. To address this, we constructed the ReviewCritique dataset, which includes two types of information: (i) NLP papers (initial submissions rather than camera-ready versions) with both human-written and LLM-generated reviews, and (ii) segment-level "deficiency" labels with corresponding explanations for each review, annotated by experts. Using ReviewCritique, this study explores two threads of research questions: (i) "LLMs as Reviewers": how do reviews generated by LLMs compare with those written by humans in terms of quality and distinguishability? (ii) "LLMs as Metareviewers": how effectively can LLMs identify potential issues, such as Deficient or unprofessional review segments, within individual paper reviews? To our knowledge, this is the first work to provide such a comprehensive analysis.
Paper Type: Long
Research Area: Resources and Evaluation
Research Area Keywords: benchmarking, NLP datasets, evaluation, metrics, AI for Research, AI for Peer Review, LLM
Contribution Types: Model analysis & interpretability, Data resources, Data analysis
Languages Studied: English
Submission Number: 4137