EDA: Easy Data Augmentation Techniques for Boosting Performance on Text Classification Tasks

Jason Wei; Kai Zou

Published: 17 Apr 2019 | Last Modified: 22 Jun 2025 | LLD 2019 | Readers: Everyone

Keywords: Data Augmentation, Text Classification, Natural Language Processing

TL;DR: Simple text augmentation techniques can significantly boost performance on text classification tasks, especially for small datasets.

Abstract: We present EDA: easy data augmentation techniques for boosting performance on text classification tasks. EDA consists of four simple but powerful operations: synonym replacement, random insertion, random swap, and random deletion. On five text classification tasks, we show that EDA improves performance for both convolutional and recurrent neural networks. EDA demonstrates particularly strong results for smaller datasets; on average, across five datasets, training with EDA while using only 50% of the available training set achieved the same accuracy as normal training with all available data. We also performed extensive ablation studies and suggest parameters for practical use.
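As a rough illustration of the four operations named in the abstract, here is a minimal Python sketch. The abstract does not specify how each operation works, so the details below are assumptions: the function names, the parameters `n` (number of edits) and `p` (per-word deletion probability), and the choice to insert a synonym during random insertion are all illustrative. The tiny `SYNONYMS` table is hypothetical and stands in for a real thesaurus so that the example stays self-contained.

```python
import random

# Hypothetical toy synonym table (an assumption for this sketch);
# a practical setup would draw synonyms from a real thesaurus.
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "cheerful"],
    "movie": ["film"],
}


def synonym_replacement(words, n, rng):
    """Replace up to n words that have a known synonym with a random synonym."""
    out = list(words)
    candidates = [i for i, w in enumerate(out) if w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out


def random_insertion(words, n, rng):
    """Up to n times, insert a synonym of a random word at a random position."""
    out = list(words)
    for _ in range(n):
        candidates = [w for w in out if w in SYNONYMS]
        if not candidates:
            break  # nothing in the sentence has a known synonym
        synonym = rng.choice(SYNONYMS[rng.choice(candidates)])
        out.insert(rng.randint(0, len(out)), synonym)
    return out


def random_swap(words, n, rng):
    """Swap two randomly chosen positions, n times."""
    out = list(words)
    if len(out) < 2:
        return out
    for _ in range(n):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(words, p, rng):
    """Drop each word independently with probability p, keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() >= p]
    return kept if kept else [rng.choice(words)]
```

Each function takes a list of tokens and a `random.Random` instance and returns a new list, so a call such as `random_swap("the quick movie made me happy".split(), 1, random.Random(0))` leaves the input untouched. Passing the generator explicitly keeps augmentation reproducible when a seed is fixed.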

Community Implementations: [11 code implementations](https://www.catalyzex.com/paper/eda-easy-data-augmentation-techniques-for/code)
