Abstract: Charts are important non-textual elements in documents, providing a visual representation of numerical data. Among the different chart types, pie charts are commonly used in digital documents because of their perceptual advantages for displaying numerical data and inter-relationship information. Chart data extraction is a multi-stage pipeline in which each stage plays a crucial role in recovering the raw data correctly. Prior work mostly focuses on improving the performance of a single sub-stage or a combination of a few. In this work, we propose PieExtract, a novel end-to-end algorithm for extracting data from pie charts. For the chart classification stage, PieExtract uses a novel Robust Fusion Attention Network (RobFA-Net), which introduces a robust fusion attention strategy to learn discriminative global and local information, thereby improving model performance. In addition, a novel rule-based sector data extraction method further improves performance when extracting data from pie charts. We conduct extensive experiments on three datasets (Revision, Chagas, and FigureQA) for chart classification, and on the FigureQA dataset for data extraction from pie charts. Our findings show that the proposed pipeline outperforms previous works.
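The abstract does not specify how the rule-based sector data extraction works. The sketch below is therefore not PieExtract's method. It only illustrates the general geometric principle that such a step typically relies on: a sector's value is proportional to its central angle, so `value = arc / 360 * 100` percent. It assumes that the pie centre and the pixel positions of the sector boundaries have already been detected. The function names `boundary_angle` and `sector_percentages` are hypothetical.

```python
import math


def boundary_angle(cx, cy, x, y):
    """Angle in degrees, in [0, 360), of boundary point (x, y) around the pie centre (cx, cy).

    Image y coordinates grow downward, so dy is negated to obtain the usual
    counter-clockwise mathematical angle.
    """
    return math.degrees(math.atan2(-(y - cy), x - cx)) % 360.0


def sector_percentages(angles_deg):
    """Convert sector boundary angles (in any order) into per-sector percentages.

    Hypothetical helper: assumes each angle marks one boundary between adjacent
    sectors. Each sector's share is its arc length divided by 360 degrees.
    """
    angles = sorted(a % 360.0 for a in angles_deg)
    if len(angles) < 2:
        # With zero boundaries nothing is detected; with one boundary the
        # whole pie is a single sector.
        return [100.0] if angles else []
    n = len(angles)
    # Arc from each boundary to the next; the last arc wraps around through 0 degrees.
    arcs = [(angles[(i + 1) % n] - angles[i]) % 360.0 for i in range(n)]
    return [arc / 360.0 * 100.0 for arc in arcs]


if __name__ == "__main__":
    # Boundaries at 0, 90 and 180 degrees give arcs of 90, 90 and 180 degrees.
    print(sector_percentages([0, 90, 180]))  # [25.0, 25.0, 50.0]
```

In a real pipeline, the boundary points would come from an upstream detection stage. Mapping each percentage to its legend label or text annotation would be a separate step that this sketch leaves out.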