Abstract: Double-blind conferences have debated whether to allow authors to post their papers online, on arXiv or elsewhere, during the review process. Independently, authors of research papers face the dilemma of whether to post their papers on arXiv, given its pros and cons. We conducted a study to ground this debate and dilemma in quantitative measurements. Specifically, we surveyed reviewers in two top-tier double-blind computer science conferences: ICML 2021 (5361 submissions and 4699 reviewers) and EC 2021 (498 submissions and 190 reviewers). Our three main findings are as follows. First, more than a third of the reviewers self-report searching online for a paper they are assigned to review. Second, conference policies restricting authors from publicising their work on social media or posting preprints before the review process may have only limited effectiveness in maintaining anonymity. Third, outside the review process, we find that preprints from better-ranked institutions enjoy only a very small increase in visibility compared to preprints from other institutions.
Submission Length: Long submission (more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=ywiegsPRSF, https://openreview.net/forum?id=tfPLmwS0jL
Changes Since Last Submission: We thank the previous reviewers and action editor (Laurent Charlin) for their careful feedback. The new submission is a resubmission with considerable modifications, including those addressing previous reviewers’ comments. We first enumerate the major modifications and additions we have made to the manuscript. We then discuss how these specifically address the two major concerns raised by the reviewers in our previous submission of this manuscript.
**New RQ**: We added a research question, “What are the trends in posting preprints online and the visibility enjoyed by these preprints?”, along with the corresponding analysis. This analysis serves three purposes:
* It is an important piece of this research direction and provides useful insight on its own.
* We frame some of this analysis as a way to understand the potential impact of conference policies concerning the posting of submissions online before or during the review process.
* The findings from this analysis provide reasoning for the assumptions made in the causal diagram proposed in this work.
The modifications related to the new RQ are in the text as follows:
* The experiment design is described in the latter half of Section 3.2.
* The analysis and the results are in Section 4.2.
**Updated ATE**: In our analysis addressing the causal question of the effect of paper rank on preprint visibility, we now provide a clear formulation of the setup and a definition of the average treatment effect (ATE) being estimated. Further, we provide analysis showing that, under certain assumptions, the proposed estimator measures the ATE of interest.
The modifications related to the updated ATE are in the text as follows:
* The formal definition of the ATE is in Section 4.3.1.
* The proposed estimator and related guarantees are in Section 4.3.2.
* The outcome of the proposed estimator on the observed data is in Section 4.3.3.
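For readers unfamiliar with the estimand, the standard potential-outcomes form of an ATE and its difference-in-means estimator can be sketched as below. The notation here (treatment indicator $T$, visibility outcome $V$, group sizes $n_1$, $n_0$) is generic and is not taken from the manuscript, whose exact formulation and assumptions appear in Sections 4.3.1 and 4.3.2:

```latex
% Generic potential-outcomes sketch (not the manuscript's exact notation).
% T_i \in \{0,1\}: whether preprint i is "treated" (e.g., from a
% better-ranked institution); V_i(1), V_i(0): its potential visibility
% under treatment and control.
\[
\mathrm{ATE} \;=\; \mathbb{E}\bigl[ V(1) - V(0) \bigr]
\]
% A difference-in-means estimator over n_1 treated and n_0 control
% preprints, with V_i the observed visibility:
\[
\widehat{\mathrm{ATE}}
\;=\; \frac{1}{n_1} \sum_{i :\, T_i = 1} V_i
\;-\; \frac{1}{n_0} \sum_{i :\, T_i = 0} V_i
\]
% Under suitable unconfoundedness assumptions (such as the absence of a
% causal edge from quality Q to visibility V, as argued in the
% manuscript), this estimator is unbiased for the ATE.
```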
**Issues raised by reviewers in previous submission and our responses**
* The first issue is that there is no (causal) edge between quality and visibility in the causal model. This seems unnatural as high-quality papers will likely enjoy more visibility (e.g., they will be shared online more) all else being equal. Hence, it is difficult to know if the conclusions from this model hold.
**Response:**
While the reviewer's point is plausible, we conducted an additional analysis (Section 4.2.3) and find that the effect of quality on visibility is in fact negligible.
More specifically, the updated manuscript finds: “The numbers obtained suggest that a majority of the reviewers learned about the papers from preprint servers such as arXiv. On the other hand, only a small proportion of responses mentioned social media as the source of their information about the paper. This suggests that social media might only be a second order contributor to preprint visibility.”
Based on this finding, in the causal model proposed in our work, we assume that there is no causal edge between quality of the preprint and its visibility. Reproduced text from Section 4.3.1:
“In the model, we assume that there is no causal link between Q and V, this assumes that the initial viewership achieved by a paper does not get affected by its quality. This assumption follows from the observations made in Section 4.2.3. The responses tallied in Table 3 and Table 4 indicate that a majority of the reviewers learnt about the preprints they were queried about from first hand sources such as arXiv and talk announcements. This finding suggests that on most occasions people view a paper (and its authors) before knowing the paper’s quality. Thus, in our model we assume that the role of the quality of the paper in its visibility is absent.”
* The second issue is in the words of the reviewer (with light editing) "that the paper states its causal model but does not formalize what the causal estimand is (e.g., ATE, ATT). Furthermore, there is no statement (or proof) that any causal estimand is identified by the observed data, and no statement (or proof) that the proposed estimation procedure is a consistent estimate of any estimand. In short, the reader does not know the estimand, and does not know if it is being estimated."
**Response:**
In the resubmission, Section 4.3.1 provides a formal definition of the causal estimand (the average treatment effect, ATE). Section 4.3.2 then provides a derivation (Equations 2a-2f) showing that, under a set of assumptions stated in the text, the proposed estimator is an unbiased estimate of the ATE.
Assigned Action Editor: ~Laurent_Charlin1
Submission Number: 4565