Enhancing Data Augmentation with Knowledge-enriched Data Generation via Dynamic Prompt-tuning Method

Published: 01 Jan 2024, Last Modified: 26 Mar 2025 · IJCNN 2024 · CC BY-SA 4.0
Abstract: Data augmentation is a popular technique for addressing the limited availability of training data for machine learning models. However, existing approaches based on pretrained language models (PLMs) often suffer from limited diversity at the word or sub-word level and from the high costs of manual data collection and labeling. In this paper, we introduce a novel approach called DPTAK, which leverages the rich prior knowledge pre-learned by transformer-based PLMs to generate diverse, high-quality augmented data for text-to-data and data-to-text tasks. Unlike other methods, DPTAK retrieves knowledge associated with a given dataset and requires no manual data collection or labeling. Our experiments on the E2E, WebNLG, and DART datasets demonstrate that DPTAK outperforms existing baseline models on the data-to-text task by 0.37, 0.44, and 0.87 BLEU, respectively, when applied with GPT-2. On the text-to-data task, DPTAK improves BLEU by more than 0.44 on E2E compared to other baseline methods. Moreover, DPTAK-augmented datasets exhibit the highest diversity scores among all existing data augmentation methods on the data-to-text task, providing evidence of the effectiveness of our approach.
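The abstract does not include implementation details, so the snippet below is only a minimal sketch of the general idea it describes: tuning a soft prompt on a frozen GPT-2 to generate text from structured records, as in the E2E data-to-text setting. The prompt length, the `linearize_record` helper, and the training loop are illustrative assumptions; the paper's actual DPTAK pipeline, including knowledge retrieval and dynamic prompt construction, is not reproduced here.

```python
# Minimal sketch: soft prompt-tuning a frozen GPT-2 for data-to-text generation.
# All hyperparameters and helper names here are assumptions, not DPTAK itself.
import torch
from torch import nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.requires_grad_(False)  # freeze the PLM; only the soft prompt is trained

PROMPT_LEN = 20  # assumed number of learnable prompt vectors
embed_dim = model.config.n_embd
soft_prompt = nn.Parameter(torch.randn(PROMPT_LEN, embed_dim) * 0.02)
optimizer = torch.optim.AdamW([soft_prompt], lr=5e-4)

def linearize_record(record: dict) -> str:
    # Flatten a structured record into a text sequence (assumed format).
    return " | ".join(f"{k} : {v}" for k, v in record.items())

def training_step(record: dict, target_text: str) -> float:
    source = linearize_record(record) + " -> " + target_text
    ids = tokenizer(source, return_tensors="pt").input_ids
    tok_embeds = model.transformer.wte(ids)                 # (1, T, D)
    prompt = soft_prompt.unsqueeze(0)                       # (1, P, D)
    inputs_embeds = torch.cat([prompt, tok_embeds], dim=1)  # (1, P+T, D)
    # Mask the soft-prompt positions out of the language-modeling loss.
    labels = torch.cat(
        [torch.full((1, PROMPT_LEN), -100, dtype=torch.long), ids], dim=1
    )
    loss = model(inputs_embeds=inputs_embeds, labels=labels).loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example: one E2E-style record paired with a reference sentence.
record = {"name": "The Eagle", "food": "French", "area": "city centre"}
reference = "The Eagle serves French food in the city centre."
print(training_step(record, reference))
```

Freezing the PLM and training only the prompt embeddings keeps the augmentation cheap relative to full fine-tuning, which is consistent with the paper's goal of avoiding costly manual data collection, though the exact training objective used by DPTAK may differ.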