Augmenting Research Ideation with Data: An Empirical Investigation in Social Science

Xiao Liu; Xinyi Dong; Xinyang Gao; Yansong Feng; Xun Pang

Published: 24 Sept 2025, Last Modified: 15 Oct 2025

Venue: NeurIPS 2025 AI4Science, Spotlight

License: CC BY 4.0

Track: Track 1: Original Research/Position/Education/Attention Track

Keywords: ideation, research idea generation, data-driven, social science

TL;DR: We show that augmenting LLMs with data and preliminary validation improves the feasibility and expected effectiveness of research ideas, inspiring higher-quality human ideation in social science.

Abstract: Recent advances in large language models (LLMs) demonstrate strong potential for generating novel research ideas, yet such ideas often fall short in feasibility and effectiveness. In this paper, we investigate whether augmenting LLMs with relevant data during the ideation process can improve idea quality. Our framework integrates data at two stages: (1) incorporating metadata during idea generation to guide models toward more feasible concepts, and (2) introducing an automated preliminary validation step during idea selection to assess the empirical plausibility of the hypotheses within ideas. We evaluate our approach in the social science domain, with a specific focus on climate negotiation topics. Expert evaluation shows that metadata improves the feasibility of generated ideas by 20%, while automated validation improves the overall quality of selected ideas by 7%. Beyond assessing the quality of LLM-generated ideas, we conduct a human study to examine whether these ideas, augmented with related data and preliminary validation, can inspire researchers in their own ideation. Participants report that the LLM-generated ideas and validation are highly useful, and the ideas they propose with such support are shown to be of higher quality than those they propose without assistance. Our findings highlight the potential of data-augmented research ideation and underscore the practical value of LLM-assisted ideation in real-world academic settings.

Submission Number: 235