TKGT: Redefinition and A New Way of Text-to-Table Tasks Based on Real World Demands and Knowledge Graphs Augmented LLMs
Abstract: The text-to-table task receives widespread attention, but its importance and difficulty are underestimated. Existing works use simple datasets, such as those borrowed from table-to-text tasks, and employ methods that ignore domain structures. As a bridge between raw text and statistical analysis, the text-to-table task faces challenges from more complex semi-structured texts that concern specific real-world domain topics with clearly identifiable entities and events, especially texts from the social sciences. In this paper, we analyse the limitations of previous datasets and methods and redefine the text-to-table task, based on which we propose a new dataset called CPL (Chinese Private Lending), consisting of case judgments from a real-world legal academic project. We further propose TKGT (Text-KG-Table), a two-stage domain-aware pipeline, which first generates domain knowledge graph (KG) classes semi-automatically from raw text with the mixed information extraction (Mixed-IE) method, then adopts the hybrid retrieval-augmented generation (Hybrid-RAG) method to transform the text into tables for downstream needs under the guidance of the KG classes. Experimental results show that TKGT achieves state-of-the-art (SOTA) performance on both traditional datasets and CPL. Our code and data are available at https://anonymous.4open.science/r/TKGT-4755.
Paper Type: Long
Research Area: Information Extraction
Research Area Keywords: named entity recognition and relation extraction, event extraction, document-level extraction, zero/few-shot extraction
Contribution Types: NLP engineering experiment, Data resources
Languages Studied: English, Chinese
Submission Number: 4350