FieldSwap: Data Augmentation for Form-Like Documents

Anonymous

17 Jun 2023, ACL ARR 2023 June Blind Submission, Readers: Everyone
Abstract: Extracting structured data from visually rich documents such as invoices, receipts, financial statements, and tax forms is key to automating many business workflows. Building extraction models in this space typically requires a large number of high-quality training examples. We propose FieldSwap, a novel data-augmentation technique for such extraction problems. FieldSwap converts a candidate for a source field into a candidate for a target field by replacing a key phrase indicative of the source field with a key phrase indicative of the target field. In experiments on five different datasets, we show that training on data augmented with FieldSwap improves performance by 1 to 11 F1 points in low-data settings (10 to 100 documents). We demonstrate that FieldSwap is effective whether key phrases are manually specified or inferred automatically from the training data.
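The key-phrase replacement described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how candidates are represented, so everything here is an assumption made for illustration: the `Candidate` class, the `field_swap` function, the plain-text `context` string, and the field names `invoice_date` and `due_date` are all hypothetical, and the authors' method may well operate on tokens and layout rather than raw strings.

```python
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Candidate:
    """Hypothetical labeled candidate: its field, its value, and nearby text."""
    field: str
    value: str
    context: str


def field_swap(candidate, source_phrases, target_field, target_phrases):
    """Create synthetic target-field candidates from a source-field candidate.

    For every source key phrase found in the candidate's context, emit one
    copy per target key phrase with the phrase swapped and the label changed.
    This is an illustrative simplification, not the authors' implementation.
    """
    synthetic = []
    for src in source_phrases:
        pattern = re.compile(re.escape(src), flags=re.IGNORECASE)
        if not pattern.search(candidate.context):
            continue  # this key phrase does not occur near the candidate
        for tgt in target_phrases:
            # A function replacement avoids backslash handling in tgt.
            new_context = pattern.sub(lambda _m, t=tgt: t, candidate.context)
            synthetic.append(
                replace(candidate, field=target_field, context=new_context)
            )
    return synthetic


example = Candidate(
    field="invoice_date",
    value="2023-06-17",
    context="Invoice Date: 2023-06-17",
)
augmented = field_swap(
    example,
    source_phrases=["Invoice Date"],
    target_field="due_date",
    target_phrases=["Due Date", "Payment Due"],
)
for c in augmented:
    print(c.field, "|", c.context)
```

In this sketch a single labeled `invoice_date` example yields two synthetic `due_date` examples, one per target key phrase. The key-phrase lists are passed in explicitly, so they could come either from a manual specification or from an automatic inference step, the two settings the abstract mentions.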
Paper Type: long
Research Area: Efficient/Low-Resource Methods for NLP