AutoIE-LLM: An Automated Information Extraction Framework from Scientific Literature Based on the LLM
Abstract: Specialized research literature in PDF format contains abundant domain-specific knowledge, yet extracting key information from these documents remains challenging. To address this, we propose AutoIE-LLM, an information extraction framework that integrates Large Language Models (LLMs) with a human in the loop for domain-specific knowledge processing. The framework comprises three modules: layout analysis, key information extraction, and continuous learning. We introduce a novel dataset of 1,122 chemical molecular sieve documents to validate our approach. Experimental results show that AutoIE-LLM achieves 79% accuracy on named entity recognition and relation extraction tasks, a 10% improvement over the baseline AutoIE model. The framework handles complex terminology and non-standard document structures, demonstrating its effectiveness in specialized domains. This study enhances the capabilities of LLMs in expert fields and provides a valuable resource for future studies on molecular sieve information extraction.
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: Information Extraction; Layout Analysis; Scientific Document Analysis; Large Language Models
Contribution Types: NLP engineering experiment, Data resources, Theory
Languages Studied: English
Submission Number: 67