Learning to Predict Future-Aligned Research Proposals with Language Models
Keywords: AI Scientist, LLM for Research, Hypothesis Generation
Abstract: Large language models (LLMs) are increasingly used as research assistants, but evaluating the quality of LLM-generated research proposals remains difficult: novelty, soundness, and feasibility are hard to measure automatically and typically require costly human judgment, making it unclear how to define a scalable learning objective.
We propose a verifiable alternative by reframing proposal generation as a time-sliced scientific forecasting problem.
Given a research question and inspiring papers available before a cutoff time $t_C$, the model generates a structured proposal and is evaluated by whether it anticipates research directions that appear in papers published after $t_C$.
We operationalize this objective with the Future Alignment Score (FAS), computed via retrieval and LLM-based semantic scoring against a held-out future corpus.
To train models under this objective, we construct a time-consistent dataset of 17,771 papers by converting published papers and their pre-cutoff citations into proposal targets, and synthesize reasoning traces that explicitly perform gap analysis and inspiration borrowing; we further introduce a stepwise variant that decomposes generation into problem identification, method design, and experimental planning.
Across Llama-3.1 and Qwen2.5 models, tuning on this objective improves future alignment over unaligned baselines (up to +10.6\% overall FAS), and domain-expert human evaluation corroborates the improved proposal quality.
Finally, we demonstrate practical impact by implementing two model-generated proposals with a code agent, obtaining a 4.17\% accuracy gain on MATH from a new prompting strategy and consistent improvements from a novel model-merging method.
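To make the evaluation objective concrete, the following is a minimal sketch of how a retrieve-then-judge Future Alignment Score could be computed. The helper callables (`embed_fn`, `llm_align_score`), the top-k retrieval, and the mean aggregation are illustrative assumptions, not the authors' exact implementation.

```python
# Minimal sketch of a Future Alignment Score (FAS) computation, assuming a
# retrieve-then-judge design: retrieve papers published after the cutoff t_C,
# then ask an LLM judge how well the proposal anticipates each of them.
# embed_fn, llm_align_score, and top_k are illustrative, not the paper's code.
from dataclasses import dataclass
from typing import Callable, List
import numpy as np


@dataclass
class Paper:
    title: str
    abstract: str
    date: str  # ISO date, e.g. "2024-06-01"


def future_alignment_score(
    proposal: str,
    corpus: List[Paper],
    cutoff: str,                                    # t_C as an ISO date string
    embed_fn: Callable[[List[str]], np.ndarray],    # texts -> unit-norm vectors
    llm_align_score: Callable[[str, str], float],   # (proposal, paper) -> [0, 1]
    top_k: int = 5,
) -> float:
    """Score a proposal by its alignment with papers published after the cutoff."""
    # 1) Keep only the held-out future slice of the corpus.
    future = [p for p in corpus if p.date > cutoff]
    if not future:
        return 0.0
    # 2) Retrieve the k future papers most similar to the proposal.
    prop_vec = embed_fn([proposal])[0]
    paper_vecs = embed_fn([f"{p.title}\n{p.abstract}" for p in future])
    sims = paper_vecs @ prop_vec
    top_idx = np.argsort(-sims)[:top_k]
    # 3) LLM-based semantic scoring of each retrieved paper, aggregated here by
    #    a simple mean over the top-k (the paper may aggregate differently).
    scores = [
        llm_align_score(proposal, f"{future[i].title}\n{future[i].abstract}")
        for i in top_idx
    ]
    return float(np.mean(scores))
```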
Email Sharing: We authorize the sharing of all author emails with Program Chairs.
Data Release: We authorize the release of our submission and author names to the public in the event of acceptance.
Submission Number: 167