ReasonIE: Better LLMs for Scientific Information Extraction with Reinforcement Learning and Data Augmentation

18 Sept 2025 (modified: 04 Dec 2025) · ICLR 2026 Conference Withdrawn Submission · CC BY 4.0
Keywords: Information extraction; reinforcement learning; relation extraction
Abstract: Large Language Models (LLMs) reason well in math and coding but underperform smaller, supervised models on structured Scientific Information Extraction (SciIE) tasks. This gap arises from limited domain data and from the fact that SciIE requires a combination of knowledge memorization and complex reasoning. To bridge this gap, we propose ReasonIE, a novel two-stage training framework. First, we use LLM-driven data augmentation to generate additional domain-specific training data, mitigating the data limitation. We then introduce MimicSFT, a supervised fine-tuning method that uses structured reasoning templates to teach logical patterns without human-annotated chains-of-thought, followed by R\textsuperscript{2}GRPO, an RLVR algorithm optimized with a composite reward function that jointly scores factual relevance and logical consistency. Evaluated on SciIE benchmarks, our approach enables a general-purpose Qwen2.5-7B model to become competitive with specialized supervised baselines while using less training data, demonstrating that RLVR and LLM-based data augmentation can successfully enhance both the knowledge retention and structured reasoning capacities of LLMs. The implementation is available at: \url{https://anonymous.4open.science/r/R2GRPO-48B5}
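The abstract states only that the composite reward jointly scores factual relevance and logical consistency; the scoring functions, tag format, and weight below are illustrative assumptions, not the authors' implementation. A minimal sketch of such a reward for extracted relation triples might look like:

```python
# Hypothetical composite reward sketch for an RLVR setup on relation
# extraction. The F1-based relevance term, the tag-based consistency
# check, and the weight alpha are all assumptions for illustration.

def relevance_reward(predicted, gold):
    """F1 overlap between predicted and gold (subject, relation, object) triples."""
    pred, ref = set(predicted), set(gold)
    if not pred or not ref:
        return 0.0
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(ref)
    return 2 * precision * recall / (precision + recall)

def consistency_reward(output):
    """1.0 if the model output follows an assumed structured template, else 0.0."""
    required = ("<think>", "</think>", "<answer>", "</answer>")
    return 1.0 if all(tag in output for tag in required) else 0.0

def composite_reward(output, predicted, gold, alpha=0.7):
    """Weighted sum of relevance and consistency; alpha is an assumed hyperparameter."""
    return alpha * relevance_reward(predicted, gold) + (1 - alpha) * consistency_reward(output)
```

In a GRPO-style loop, this scalar would be computed per sampled completion and used to form group-relative advantages.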
Primary Area: generative models
Submission Number: 11037