Modeling Multi-Scale Scientific Impact via Heterogeneous Networks and LLMs

ICLR 2026 Conference Submission 17870 Authors

19 Sept 2025 (modified: 08 Oct 2025) · ICLR 2026 Conference Submission · CC BY 4.0
Keywords: Impact Prediction; AI for Science; LLMs
Abstract: Accurately modeling scientific impact is essential for understanding research dynamics and supporting evidence-based decisions in funding, hiring, and policy. However, despite substantial interest, three core challenges remain unresolved: (i) the heterogeneous and multi-scale nature of scientific impact, ranging from short-term citations to long-term disciplinary influence; (ii) the underexplored potential of large language models (LLMs) to capture the rich semantics embedded in scientific texts; and (iii) the absence of standardized benchmarks, which impedes rigorous comparison and evaluation of predictive methods. In this work, motivated by the need to capture both paper content and research trends, we propose a unified framework that integrates heterogeneous graph neural networks with pretrained LLMs to model scientific impact across temporal and structural scales. To balance effectiveness and efficiency, we freeze the backbone LLM parameters and train only a small set of task-specific parameters using prefix-tuning, which enables scalable training while preserving strong semantic representations. To enable systematic training and evaluation, we construct a large-scale, multi-grained benchmark dataset that combines diverse metadata and multiple impact indicators from real-world scientific corpora. Extensive experiments demonstrate that our approach substantially outperforms both traditional baselines and recent LLM-based methods. All datasets and code used in this work will be released on GitHub.
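
Since the abstract describes the training recipe only at a high level, the following is a minimal sketch of the frozen-backbone-plus-prefix-tuning idea, assuming PyTorch and Hugging Face transformers. It implements a simplified input-layer variant (trainable prefix vectors prepended to the token embeddings of a frozen encoder), not necessarily the paper's exact per-layer formulation; the model name `bert-base-uncased` and the prefix length are placeholder assumptions, and the heterogeneous-GNN head is omitted.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class PrefixTunedEncoder(nn.Module):
    """Frozen LLM backbone with a small set of trainable prefix vectors.

    Illustrative sketch only; hyperparameters are assumptions, not values
    from the paper.
    """

    def __init__(self, model_name: str = "bert-base-uncased", prefix_len: int = 20):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(model_name)
        for p in self.backbone.parameters():  # freeze every backbone weight
            p.requires_grad = False
        hidden = self.backbone.config.hidden_size
        # The only trainable parameters: `prefix_len` virtual-token embeddings.
        self.prefix = nn.Parameter(torch.randn(prefix_len, hidden) * 0.02)

    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
        # Raw word embeddings only; the backbone adds positional embeddings itself.
        tok_emb = self.backbone.embeddings.word_embeddings(input_ids)  # (B, T, H)
        bsz = input_ids.size(0)
        prefix = self.prefix.unsqueeze(0).expand(bsz, -1, -1)          # (B, P, H)
        inputs_embeds = torch.cat([prefix, tok_emb], dim=1)            # (B, P+T, H)
        prefix_mask = torch.ones(
            bsz, self.prefix.size(0),
            dtype=attention_mask.dtype, device=attention_mask.device,
        )
        mask = torch.cat([prefix_mask, attention_mask], dim=1)
        out = self.backbone(inputs_embeds=inputs_embeds, attention_mask=mask)
        # Simple mean pooling; the pooled vector could feed a heterogeneous
        # GNN head (not shown) alongside citation-graph features.
        return out.last_hidden_state.mean(dim=1)


tok = AutoTokenizer.from_pretrained("bert-base-uncased")
enc = PrefixTunedEncoder()
batch = tok(["An example paper abstract."], return_tensors="pt", padding=True)
paper_emb = enc(batch["input_ids"], batch["attention_mask"])  # shape: (1, hidden_size)
```

Because only `enc.prefix` requires gradients, the optimizer can be built over trainable parameters alone, e.g. `torch.optim.AdamW(p for p in enc.parameters() if p.requires_grad)`, which is what lets training scale to large corpora while the backbone's semantic representations stay intact.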
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 17870