Training-free Design of Augmentations with Data-centric Principles

Published: 17 Jun 2024, Last Modified: 17 Jul 2024, Venue: ICML2024-AI4Science Poster, License: CC BY 4.0
Keywords: data augmentation, medical image analysis, deep learning theory, topological data analysis, computer vision
TL;DR: We propose effective, training-free metrics to design augmentations based on deep learning theory, which accurately estimate optimal augmentation strategies and result in significant improvements on real-world medical imaging datasets.
Abstract: The remarkable advances in Artificial Intelligence (AI) and Deep Learning owe much to the evolution of informative datasets. With the emerging concept of "Data-centric AI", the focus has shifted from developing deep neural networks (DNNs) to crafting high-quality training datasets. However, current data-centric approaches predominantly rely on empirical heuristics or costly DNN training, and lack established design principles. Our work concentrates on data augmentation, a key technique for enhancing data quality. Grounded in recent developments in deep learning theory, we identify principled metrics that effectively gauge both data quality and its interaction with DNNs. Crucially, these metrics can be computed without extensive DNN training, enabling training-free augmentation design at minimal computational cost. Comprehensive experiments confirm that our principles align closely with the optimal augmentation choices used in practice. Our method is particularly beneficial in domain-specific fields like medical image analysis, where the optimal augmentation strategy and the data's inductive bias are often unclear. Our results demonstrate consistent improvements over existing state-of-the-art segmentation methods across various medical imaging datasets.
Submission Number: 101