Keywords: Large Language Models, Interdisciplinary Research, NLP for Science, Benchmark, Dataset
Abstract: This work introduces IDRBench, a pioneering benchmark featuring an expert-annotated dataset and a suite of tasks tailored to evaluate LLMs' capabilities in proposing valuable research ideas for Interdisciplinary Research (IDR). To ensure a reliable evaluation, our dataset consists of scientific publications sourced from the ArXiv platform, covering six distinct disciplines and annotated by domain experts with diverse academic backgrounds. The design of evaluation tasks in IDRBench follows a progressive, real-world perspective, reflecting the natural stages of interdisciplinary research development: 1) IDR Paper Identification, 2) IDR Idea Integration, and 3) IDR Idea Recommendation. Using IDRBench, we construct baselines across 10 LLMs and observe that, despite showing some level of IDR awareness, LLMs still struggle to produce high-quality IDR ideas. These findings may not only spark new research directions but also help develop next-generation LLMs that excel in interdisciplinary research.
Submission Number: 25