Unlocking the Power of LLMs for Efficiently Automatic Extract Information from Hybrid Long Documents

Anonymous

16 Oct 2023 | ACL ARR 2023 October Blind Submission | Readers: Everyone

Abstract: Information extraction is a vital task in natural language processing. It involves extracting information of interest to users from natural language text and serves many downstream tasks, including knowledge graph construction, information retrieval, and question answering. Given LLMs' robust comprehension and reasoning across diverse tasks, their potential for this task is substantial. However, applying LLMs directly to complex documents poses challenges, including handling lengthy documents, understanding tables, adapting to representation ambiguity, and ensuring numerical precision. Because no comprehensive dataset covers these challenges, we introduce the Financial Reports Numerical Extraction (FINE) dataset to facilitate further investigation. We present the Split-Recombination Framework (SiReF), which addresses these challenges with table serialization, embedding retrieval, and precision prompts. Extensive experimental results demonstrate its adaptability across various domains and across LLMs with different capabilities. The dataset and code are provided in the attachments.

Paper Type: long

Research Area: Information Extraction

Contribution Types: NLP engineering experiment, Data resources

Languages Studied: English

Consent To Share Submission Details: On behalf of all authors, we agree to the terms above to share our submission details.
