Empowering AI as Autonomous Researchers: Evaluating LLMs in Generating Novel Research Ideas through Automated Metrics

Published: 20 Dec 2024, Last Modified: 30 Dec 2024
Venue: AI4Research @ AAAI 2025 Oral
License: CC BY 4.0
Keywords: AI-generated research ideas, large language models in research, direct preference optimization, AI for scientific creativity, user satisfaction score metrics, automated creativity index, AI-assisted research ideation, novel research idea generation, human-AI collaboration in research, evaluating AI creativity metrics
Abstract: This study explores the potential of large language models (LLMs) as independent research generators, leveraging a dataset of over 1.2 million DBLP papers (2019-2023) across diverse domains. We applied supervised fine-tuning and direct preference optimization (DPO), using an automated preference dataset, to recent LLMs including Llama-3, Mistral, Mixtral, and Gemma. Our experiments show that the DPO-tuned models surpass models that were only supervised fine-tuned, such as GPT-3.5 Turbo, Davinci-002, and Gemini-1.0, by 27% on the novel creativity index, which evaluates originality, feasibility, impact, and reliability. These models also achieved a 42% improvement in automated user satisfaction scores, and domain experts validated 89% of the generated research ideas as highly relevant and promising. This research demonstrates the significant potential of LLMs as autonomous researchers, setting a new standard for efficiency and creativity in ideation.
Archival Option: The authors of this submission want it to appear in the archival proceedings.
Submission Number: 22