TransformMix: Learning Transformation and Mixing Strategies from Data

TMLR Paper 2398 Authors

20 Mar 2024 (modified: 27 Apr 2024) · Under review for TMLR · CC BY-SA 4.0
Abstract: Data augmentation improves the generalization power of deep learning models by synthesizing more training samples. Sample-mixing is a popular data augmentation approach that creates additional data by combining existing samples. Recent sample-mixing methods, such as Mixup and CutMix, adopt simple mixing operations to blend multiple inputs. Although such heuristic approaches yield performance gains in some computer vision tasks, they mix images blindly and do not adapt automatically to different datasets. A mixing strategy that is effective for one dataset often does not generalize well to others, and if not properly configured, these methods may create misleading mixed images that undermine the effectiveness of sample-mixing augmentation. In this work, we propose an automated approach, TransformMix, that learns better transformation and mixing augmentation strategies from data. In particular, TransformMix applies learned transformations and mixing masks to create compelling mixed images that contain correct and important information for the target tasks. We demonstrate the effectiveness of TransformMix on multiple datasets in transfer learning, classification, object detection, and knowledge distillation settings. Experimental results show that our method achieves better performance and efficiency than strong sample-mixing baselines.
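For readers unfamiliar with the simple mixing operations the abstract contrasts against, below is a minimal NumPy sketch of Mixup-style convex blending and CutMix-style patch pasting. This is illustrative background only, not the authors' TransformMix module; the image shapes, one-hot label format, and Beta prior `alpha` are assumptions for illustration.

```python
# Illustrative sketch of the "simple mixing operations" the abstract refers to
# (Mixup and CutMix). NOT the paper's learned TransformMix module.
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0):
    """Blend two images (H, W, C) and one-hot labels with a single lambda."""
    lam = np.random.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2   # pixel-wise blend, no spatial awareness
    y = lam * y1 + (1.0 - lam) * y2   # labels mixed in the same proportion
    return x, y

def cutmix(x1, y1, x2, y2, alpha=1.0):
    """Paste a random rectangle of x2 onto x1; labels weighted by area."""
    h, w = x1.shape[:2]
    lam = np.random.beta(alpha, alpha)
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = np.random.randint(h), np.random.randint(w)
    by1, by2 = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    bx1, bx2 = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    x = x1.copy()
    x[by1:by2, bx1:bx2] = x2[by1:by2, bx1:bx2]   # blind rectangular paste
    lam_adj = 1 - (by2 - by1) * (bx2 - bx1) / (h * w)  # actual area kept from x1
    return x, lam_adj * y1 + (1 - lam_adj) * y2
```

Both operations combine inputs without regard to image content, which is exactly the limitation the abstract argues learned transformations and mixing masks address.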
Submission Length: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission: We thank all the reviewers for their valuable and thorough reviews. In the current version, we have made changes to clarify our method and added new content to support our claims. The changes are colored violet in the new revision; clarifications are marked with 'FIX' tags and new content with 'NEW' tags. The major changes are as follows:

[FIX]
1. We clarified the six affine parameters of the spatial transformation network in Section 3.1 (an illustrative sketch follows this list).
2. We explained how to use pre-trained teacher networks to obtain CAMs for unseen datasets in Section 4.2.
3. We stated the number of experimental trials and the use of the sample standard deviation when reporting the transfer classification and direct classification results in Sections 4.2 and 4.3.
4. We acknowledged the extra time required to train the teacher network and the mixing module in Section 4.5.

[NEW]
1. We added references to five Mixup variants, as suggested by one of our reviewers.
2. We ran more trials (from 3 to 5) for our major results in the transfer classification experiments.
3. We added an experiment on the MS-COCO dataset in Section 4.4.
4. We added ablation studies in Section 4.6 showing the effectiveness of our method compared to simple stacking and pre-defined transformations.
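For context on the first [FIX] item: in a standard spatial transformer (Jaderberg et al., 2015), the six affine parameters form a 2×3 matrix that warps the image sampling grid. The PyTorch sketch below illustrates that generic mechanism under that assumption; the paper's own parameterization is defined in its Section 3.1, which is not reproduced here.

```python
# Generic spatial-transformer-style affine warp; illustrative only.
import torch
import torch.nn.functional as F

def apply_affine(img, theta):
    """img: (N, C, H, W); theta: (N, 2, 3) -- six parameters per image."""
    grid = F.affine_grid(theta, img.size(), align_corners=False)
    return F.grid_sample(img, grid, align_corners=False)

# Identity transform: unit scale, no rotation/shear, zero translation.
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])
out = apply_affine(torch.rand(1, 3, 32, 32), theta)
```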
Assigned Action Editor: ~Yu-Xiong_Wang1
Submission Number: 2398