Abstract: Tracing the source of research papers is a fundamental yet challenging task for researchers. The billion-scale citation relations between papers can hinder researchers from understanding the evolution of science. To date, there is still no accurate dataset, constructed by professional researchers, that identifies the direct sources of the papers they study and on which automatic algorithms could be developed to expand our knowledge of how science evolves. In this paper, we study the problem of paper source tracing (PST) and construct PST-Bench, a high-quality and continually growing benchmark dataset in computer science. Based on PST-Bench, we also reveal several intriguing findings, such as differences in the life force of papers across areas (e.g., AI and HPC). An exploration of various methods confirms the difficulty of PST-Bench and points to potential directions on this topic. The dataset and code are available.
Paper Type: long
Research Area: Information Retrieval and Text Mining
Contribution Types: Data resources, Data analysis
Languages Studied: English
Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.