Keywords: ChatGPT, reviewing research papers, LLM watermarking, long context windows, fine-tuning, randomized human blind evaluation
TL;DR: Using LLMs to review research papers has benefits but also challenges, which we address with LLM watermarking, relevant context, error and shortcoming detection, blind human evaluation, and improving the quality of papers and reviews with human feedback.
Abstract: Human reviews of research papers are slow and of variable quality. Hence, there is increasing interest in using large language models (LLMs) such as GPT to review research papers. This paper develops a proof-of-concept LLM review process showing that LLMs can offer consistently high-quality reviews almost instantly. However, many challenges and limitations remain: risk of misuse, inflated review scores, overconfident ratings, skewed score distributions, and limited prompt length. We mitigate these issues without prompt engineering by using LLM watermarking to mark LLM-generated reviews; classifying and detecting errors and shortcomings of papers; using long-context windows that include the review form, entire paper, reviewer guidelines, code of ethics and conduct, area chair guidelines, and previous years' statistics; and conducting a blind human evaluation of reviews. We aim to use OpenReviewer to review and revise research papers, improving their quality. This work identifies and addresses drawbacks associated with GPT as a reviewer and enhances the quality of the reviewing process based on a randomized human blind evaluation. Making OpenReviewer available as an open online service that generates reviews will allow the use of scalable human feedback to learn and improve.
Primary Area: datasets and benchmarks
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 8844