Accelerated Deep Active Learning with Graph-based Sub-Sampling

Published: 17 Aug 2024, Last Modified: 17 Sept 2024. Accepted by TMLR. License: CC BY 4.0
Abstract: Recent years have witnessed the rapid development of active learning, a human-in-the-loop approach to semi-supervised learning that reduces the burden of expensive data annotation. Diverse techniques have been proposed to improve the efficiency of label acquisition, yet most existing techniques are intractable at scale on massive unlabeled pools. In particular, the query time and model retraining time of large-scale image models are usually linear, or even quadratic, in the size of the unlabeled pool and its dimension. The main reason for this intractability is the need to scan the entire pool at each iteration in order to select the best samples for annotation. To alleviate this computational burden we propose efficient Diffusion Graph Active Learning (DGAL). DGAL operates on a pre-computed Variational Autoencoder (VAE) latent space to restrict the pool to a much smaller candidate set. The candidate set is then passed to the deep architecture, where an additional standard active learning criterion selects the queries, further reducing query time. DGAL demonstrates a query-time versus accuracy trade-off that is two or more orders of magnitude faster than state-of-the-art methods. Moreover, we demonstrate the exploration-exploitation trade-off in DGAL that allows the restricted set to capture the most impactful samples for active learning at each iteration.
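To make the two-stage query concrete, the following is a minimal, hypothetical Python sketch of a DGAL-style step (not the authors' implementation): a kNN graph is built on the pre-computed VAE latent space, a truncated diffusion from the labeled points scores how well each pool point is already covered, the least-covered points form the restricted candidate set, and a standard margin criterion on the deep net's softmax outputs picks the final queries. The function names, the parameters (n_neighbors, steps, alpha, candidate_size), and the specific diffusion update are our assumptions for illustration only.

```python
# Hypothetical sketch of a DGAL-style query step (not the authors' implementation).
import numpy as np
from sklearn.neighbors import kneighbors_graph


def diffusion_scores(latent, labeled_idx, n_neighbors=10, steps=5, alpha=0.9):
    """Truncated diffusion from labeled points over a kNN graph of the VAE latent space."""
    W = kneighbors_graph(latent, n_neighbors, mode="connectivity", include_self=False)
    W = 0.5 * (W + W.T)                               # symmetrize the kNN graph
    d_inv = 1.0 / np.maximum(np.asarray(W.sum(axis=1)).ravel(), 1e-12)
    seed = np.zeros(latent.shape[0])
    seed[labeled_idx] = 1.0
    f = seed.copy()
    for _ in range(steps):                            # propagate mass from the labeled seeds
        f = alpha * d_inv * (W @ f) + (1.0 - alpha) * seed
    return f                                          # low score = far from the labeled set


def dgal_style_query(latent, labeled_idx, softmax_probs, candidate_size=500, budget=50):
    """Stage 1: graph-diffusion restriction. Stage 2: margin criterion on the candidates."""
    scores = diffusion_scores(latent, labeled_idx)
    scores[labeled_idx] = np.inf                      # never re-query labeled points
    candidates = np.argsort(scores)[:candidate_size]  # least-covered pool points (exploration)
    top2 = np.sort(softmax_probs[candidates], axis=1)[:, -2:]
    margin = top2[:, 1] - top2[:, 0]                  # small margin = uncertain prediction (exploitation)
    return candidates[np.argsort(margin)[:budget]]
```

The point of this structure, as described in the abstract, is that only the cheap graph-diffusion step touches the full pool; the deep-net criterion runs only on the small candidate set, which is where the acceleration comes from.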
Submission Length: Regular submission (no more than 12 pages of main content)
Previous TMLR Submission Url: https://openreview.net/forum?id=ENHSYYas3e&noteId=W5GSrdEMcc
Changes Since Last Submission: Following the editor's suggestion, we have resubmitted our previous version with the suggested corrections. We addressed all the questions and comments raised by the reviewers and the Action Editor in our revision. We elaborate on the changes below.

1. Notation, organization, and clarity. We took this remark into account in our revision and made the changes the reviewer suggested: we revised Section 4.2, added definitions, rigor, and additional equations, and moved the discussion of VAEs to the background. We also edited the text to connect the objective function in Section 3 to Section 4.2.

2. We addressed in our revision every remark and question raised by reviewer JQe2, without exception.

3. Effects of hyperparameters. Following reviewer JxW6's question, we added Section B to the appendix (due to space limitations) on parameter selection in our algorithmic setting. Our analysis shows that, under weak assumptions, the selection of the key parameters does not depend on the data set itself. The remaining parameters are set based on prior experiments and references.

4. Sampling. Following the reviewer's comment, our revision provides new results in Appendix E.0.3 on using additional AL criteria in the deep-net architecture to test the acceleration obtained from the VAE and graph-diffusion-based restriction (an illustrative sketch of applying such a criterion on the restricted set appears after this section). We combined the following methods with our VAE-based graph-diffusion restriction: DGBADGE, DGCoreSet, and DGEntropy, and compared them with our DGMG. BADGE and CoreSet are well-known, sophisticated active learning methods that require additional computation at each query step but may yield higher accuracy. In our experiments we indeed see that DGBADGE is much faster than BADGE, and similarly for DGCoreSet and DGEntropy. This is clearly a result of applying our restriction method on the VAE graph with our diffusion-based AL. DGBADGE is comparable in its accuracy-to-time trade-off to our DGMG and to DGEntropy, whose criteria are faster to compute. For DGCoreSet the situation is worse: we consistently observe a slower query-time versus accuracy trade-off across all data sets. The CoreSet computation is so expensive that even on the small restricted set it lags behind BADGE and the simpler Margin and Entropy criteria. Still, DGCoreSet is faster than CoreSet.

5. Other representation learning methods. We agree that the reviewer's question of whether the VAE yields a better representation than other methods is an interesting one. In our paper we do not claim that the VAE is necessarily the best choice of representation, and in fact we do not expect it to be so for every data set. The main contribution of our paper is an architecture in which a representation space can be constructed in an unsupervised manner and used efficiently to build a graph representation that restricts a large data set from being wholly fed into the deep net. Our other main contribution is the diffusion-based method in the restriction step, which leverages an optimal exploration-exploitation framework for query sampling. The main intuition behind VAEs is that optimizing the reconstruction loss yields good class representations if the structure of the data corresponds to the class assignment; we believe this assumption is typical for image data. Other unsupervised representations may capture class structure as well, even though their loss is not a reconstruction loss. We address this point in Section 4 and in the introduction.
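As a minimal illustration of swapping an alternative criterion into the second stage, in the spirit of the DGEntropy variant mentioned in point 4 above, an entropy-based selection on the diffusion-restricted candidate set could look as follows. The function name and interface are ours, not the paper's.

```python
# Illustrative only: an entropy criterion on the diffusion-restricted candidate set,
# in the spirit of the DGEntropy variant described above. Names are ours, not the paper's.
import numpy as np


def entropy_on_candidates(softmax_probs, candidates, budget=50):
    """Select the highest-entropy candidates for labeling."""
    p = np.clip(softmax_probs[candidates], 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)
    return candidates[np.argsort(-entropy)[:budget]]
```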
Supplementary Material: pdf
Assigned Action Editor: ~Jake_Snell1
Submission Number: 2068