Tracing Training Progress: Dynamic Influence Based Selection for Active Learning

Published: 20 Jul 2024, Last Modified: 21 Jul 2024 · MM 2024 Poster · CC BY 4.0
Abstract: Active learning (AL) aims to select highly informative data points from an unlabeled dataset for annotation, mitigating the need for extensive human labeling effort. However, classical AL methods rely heavily on human expertise to design the sampling strategy, limiting their scalability and generalizability. Many efforts have sought to address this limitation by directly connecting sample selection to model performance improvement, typically through influence functions. Nevertheless, these approaches often ignore the dynamic nature of model behavior during training optimization, despite empirical evidence highlighting the importance of dynamic influence for tracking sample contributions. This oversight can lead to suboptimal selection, hindering the generalizability of the model. In this study, we explore a dynamic influence based data selection strategy that traces the impact of unlabeled instances on model performance throughout the training process. Our theoretical analyses suggest that selecting samples whose gradients have larger projections onto the accumulated optimization direction at each checkpoint leads to improved performance. Furthermore, to capture a wider range of training dynamics without incurring excessive computational or memory costs, we introduce an additional dynamic loss term designed to encapsulate more generalized training progress information. These insights are integrated into a universal and task-agnostic AL framework termed Dynamic Influence Scoring for Active Learning (DISAL). Comprehensive experiments across various tasks show that DISAL significantly surpasses existing state-of-the-art AL methods, facilitating more efficient and effective learning in different domains.
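The core selection criterion in the abstract, scoring unlabeled samples by the projection of their gradients onto the accumulated optimization direction across checkpoints, can be illustrated with a toy numpy sketch. This is not the authors' DISAL implementation (which also adds a dynamic loss term); the synthetic gradients, the normalization, and the function name are assumptions for illustration only. In practice, per-sample gradients would come from backpropagation through the current model at each saved checkpoint.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: gradient estimates for 100 unlabeled samples at 3 training
# checkpoints, with gradient dimension 10 (synthetic stand-ins here).
n_checkpoints, n_samples, dim = 3, 100, 10
grads = rng.normal(size=(n_checkpoints, n_samples, dim))

# Accumulated optimization direction at each checkpoint, e.g. the running
# sum of parameter updates up to that point (synthetic here).
accum_dirs = np.cumsum(rng.normal(size=(n_checkpoints, dim)), axis=0)

def dynamic_influence_scores(grads, accum_dirs):
    """Score each sample by projecting its gradient onto the accumulated
    optimization direction at every checkpoint, then summing over time."""
    # Unit-normalize directions so each checkpoint contributes comparably
    # (a modeling choice in this sketch, not taken from the paper).
    dirs = accum_dirs / np.linalg.norm(accum_dirs, axis=1, keepdims=True)
    # Projected gradient per checkpoint: shape (n_checkpoints, n_samples).
    proj = np.einsum("tnd,td->tn", grads, dirs)
    return proj.sum(axis=0)

scores = dynamic_influence_scores(grads, accum_dirs)
budget = 10
# Select the budget samples with the highest accumulated projections.
selected = np.argsort(scores)[-budget:]
```

A selection round would then send `selected` to annotators, retrain, and repeat with fresh checkpoints.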
Primary Subject Area: [Content] Media Interpretation
Relevance To Conference: In multimedia processing, model training relies on extensive labeled data, which incurs significant manual labeling costs and computational overhead. This study proposes an efficient active learning strategy for identifying informative data. Leveraging dynamic influence estimation theory, our active learning method comprehensively considers the model training process to select data that enhances generalization. The approach is task-agnostic and applicable to a wide range of media data and tasks.
Supplementary Material: zip
Submission Number: 4329