Counterfactual Generative Smoothing for Imbalanced Natural Language Classification

Hojae Han, Seungtaek Choi, Myeongho Jeong, Jin-Woo Park, Seung-won Hwang

CIKM 2021 (modified: 16 Oct 2022)

Abstract: Classification datasets are often biased in observations, leaving only a few observations for minority classes. Our key contribution is detecting and reducing Under-represented (U-) and Over-represented (O-) artifacts from dataset imbalance, by proposing a Counterfactual Generative Smoothing approach in both feature space and data space, namely CGS_f and CGS_d. Our technical contribution is smoothing majority and minority observations, by sampling a majority seed and transferring it to the minority. Our proposed approaches not only outperform state-of-the-art methods on both synthetic and real-life datasets, but also effectively reduce both artifact types.
