Exploring Data Augmentation in Neural DRS-to-Text Generation

Anonymous

16 Aug 2023 | ACL ARR 2023 August Blind Submission | Readers: Everyone
Abstract: Neural networks are notoriously data-hungry. This is a problem when data are scarce, as in low-resource languages. Data augmentation is a technique commonly used in computer vision to provide neural networks with more data and to increase their generalization power. For natural language, however, simple augmentation techniques like those used in computer vision, such as rotation and cropping, cannot be employed because they would generate ungrammatical text. Data augmentation therefore needs a specific design for neural data-to-text systems, especially when the input format is structurally rich, as meaning representations are. This is the case for neural natural language generation from Discourse Representation Structures (DRS-to-Text), where the logical nature of DRS calls for a dedicated augmentation design. In this paper, we adopt a novel approach in DRS-to-Text that selectively augments a training set with new data by adding and varying two specific lexical categories, i.e. proper and common nouns. In particular, we propose to use WordNet supersenses to produce new training sentences using both in-context and out-of-context nouns. We present a number of experiments evaluating the role played by the augmented lexical information. The experimental results show the effectiveness of our approach to data augmentation in DRS-to-Text generation.
Paper Type: long
Research Area: Generation
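
As a rough illustration of the kind of noun substitution the abstract describes, the sketch below swaps a common noun for another noun carrying the same supersense label, applying the change to both the meaning representation and its reference sentence so the pair stays aligned. It is not the authors' method: the tiny lexicon, the clause notation, and the names `SUPERSENSE_NOUNS` and `augment_pair` are all assumptions made for illustration, and a real system would presumably draw its nouns and supersense labels from WordNet.

```python
import random
import re

# Hypothetical hand-made lexicon: supersense label -> common nouns.
# Assumption for illustration only; not taken from the paper.
SUPERSENSE_NOUNS = {
    "noun.animal": ["dog", "cat", "horse"],
    "noun.food": ["bread", "soup", "rice"],
}

# Reverse index: noun -> supersense label.
NOUN_TO_SUPERSENSE = {
    noun: label
    for label, nouns in SUPERSENSE_NOUNS.items()
    for noun in nouns
}


def swap_word(text, old, new):
    """Replace whole-word occurrences of `old` with `new`."""
    return re.sub(rf"\b{re.escape(old)}\b", new, text)


def augment_pair(drs, sentence, noun, rng):
    """Build a new (drs, sentence) pair by replacing `noun` with a
    different noun that shares its supersense label."""
    label = NOUN_TO_SUPERSENSE[noun]
    candidates = [n for n in SUPERSENSE_NOUNS[label] if n != noun]
    replacement = rng.choice(candidates)
    # The same replacement is applied on both sides of the pair.
    return swap_word(drs, noun, replacement), swap_word(sentence, noun, replacement)


if __name__ == "__main__":
    rng = random.Random(0)
    # Simplified, made-up clause notation standing in for a DRS.
    drs = 'b1 REF x1 ; b1 dog "n.01" x1 ; b1 sleep "v.01" e1'
    sentence = "A dog is sleeping."
    print(augment_pair(drs, sentence, "dog", rng))
```

Restricting candidates to one supersense keeps the substituted noun in the same coarse semantic class. Note that the sketch copies the sense tag unchanged and ignores morphology (plurals, articles), which a working pipeline would need to handle.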