Semantic Structure Extraction for Spreadsheet Tables with a Multi-task Learning ArchitectureDownload PDF

Published: 01 Nov 2019, Last Modified: 05 May 2023DI 2019Readers: Everyone
Keywords: spreadsheet, table, deep learning, semantic
TL;DR: We propose a novel multi-task framework that learns table detection, semantic component recognition and cell type classification for spreadsheet tables with promising results.
Abstract: Semantic structure extraction for spreadsheets includes detecting table regions, recognizing structural components and classifying cell types. Automatic semantic structure extraction is key to automatic data transformation from various table structures into canonical schema so as to enable data analysis and knowledge discovery. However, they are challenged by the diverse table structures and the spatial-correlated semantics on cell grids. To learn spatial correlations and capture semantics on spreadsheets, we have developed a novel learning-based framework for spreadsheet semantic structure extraction. First, we propose a multi-task framework that learns table region, structural components and cell types jointly; second, we leverage the advances of the recent language model to capture semantics in each cell value; third, we build a large human-labeled dataset with broad coverage of table structures. Our evaluation shows that our proposed multi-task framework is highly effective that outperforms the results of training each task separately.
1 Reply

Loading