Understanding and Reducing the Class-Dependent Effects of Data Augmentation with A Two-Player Game Approach

Published: 28 Jun 2025, Last Modified: 28 Jun 2025. Accepted by TMLR. License: CC BY 4.0
Abstract: Data augmentation is widely used and has proven beneficial across machine learning tasks. However, as recently observed, its effect in multi-class classification can be unfair: while data augmentation generally improves overall performance (and therefore benefits many classes), it can be detrimental to other classes, which can be problematic in some application domains. In this paper, to counteract this phenomenon, we propose CLAM, a CLAss-dependent Multiplicative-weights method. To derive it, we first formulate the training of a classifier as a non-linear optimization problem that aims to simultaneously maximize the individual class performances and balance them. By rewriting this optimization problem as an adversarial two-player game, we obtain a novel multiplicative-weights algorithm, for which we prove convergence. Interestingly, our formulation also reveals that these class-dependent effects are not specific to data augmentation but are in fact a general phenomenon. Our empirical results on five datasets demonstrate that the performance of the learned classifiers is indeed more fairly distributed over classes, with only limited impact on average accuracy.
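To make the multiplicative-weights idea in the abstract concrete, here is a minimal sketch of a generic Hedge-style update over class weights. It is not the CLAM update rule, which the abstract does not spell out. The function names, the step size `eta`, and the use of per-class losses as the adversary's payoff are all assumptions made for illustration: an adversary raises the weight of classes that currently have higher loss, and the classifier would then be trained on the resulting weighted loss.

```python
import math


def update_class_weights(weights, class_losses, eta=0.5):
    """One generic multiplicative-weights (Hedge) step over classes.

    Hypothetical helper, not taken from the CLAM code base.
    Classes with a larger loss get their weight multiplied by a larger
    factor, then the weights are renormalized to sum to 1.
    """
    if len(weights) != len(class_losses):
        raise ValueError("weights and class_losses must have the same length")
    scaled = [w * math.exp(eta * loss) for w, loss in zip(weights, class_losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def weighted_loss(weights, class_losses):
    """Class-weighted loss that the classifier player would minimize."""
    return sum(w * loss for w, loss in zip(weights, class_losses))


# Usage: three classes, starting from uniform weights.
weights = [1.0 / 3] * 3
class_losses = [0.1, 0.5, 0.9]  # illustrative values, not results from the paper
weights = update_class_weights(weights, class_losses)
```

After one step the worst-performing class carries the largest weight, so the next round of training puts more emphasis on it. Repeating the two steps alternately is the usual way such a two-player game is played out.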
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=y9p76VfJsL
Changes Since Last Submission: We changed the font from Times New Roman to Computer Modern Bright. After this correction, the manuscript exceeded the page limit, so we moved the last paragraph of the experimental results to the appendix.
Code: https://github.com/jyp9961/CLAM
Supplementary Material: zip
Assigned Action Editor: ~Tongzheng_Ren1
Submission Number: 4571