Are the confidence scores of reviewers consistent with the review content? Evidence from top conference proceedings in AI
Abstract: Peer review is a critical process used in academia to assess the quality and validity of research articles. Top-tier conferences in the field of artificial intelligence (e.g. ICLR and ACL et al.) require reviewers to provide confidence scores to ensure the reliability of their review reports. However, existing studies on confidence scores have neglected to measure the consistency between the comment text and the confidence score in a more refined way, which may overlook more detailed details (such as aspects) in the text, leading to incomplete understanding of the results and insufficient objective analysis of the results. In this work, we propose assessing the consistency between the textual content of the review reports and the assigned scores at a fine-grained level, including word, sentence and aspect levels. The data used in this paper is derived from the peer review comments of conferences in the fields of deep learning and natural language processing. We employed deep learning models to detect hedge sentences and their corresponding aspects. Furthermore, we conducted statistical analyses of the length of review reports, frequency of hedge word usage, number of hedge sentences, frequency of aspect mentions, and their associated sentiment to assess the consistency between the textual content and confidence scores. Finally, we performed correlation analysis, significance tests and regression analysis on the data to examine the impact of confidence scores on the outcomes of the papers. The results indicate that textual content of the review reports and their confidence scores have high level of consistency at the word, sentence, and aspect levels. The regression results reveal a negative correlation between confidence scores and paper outcomes, indicating that higher confidence scores given by reviewers were associated with paper rejection. This indicates that current overall assessment of the paper’s content and quality by the experts is reliable, making the transparency and fairness of the peer review process convincing. We release our data and associated codes at https://github.com/njust-winchy/confidence_score.
Loading