Language-assisted Feature Representation and Lightweight Active Learning for On-the-Fly Category Discovery

Published: 26 Sept 2025, Last Modified: 26 Sept 2025. Accepted by TMLR. License: CC BY 4.0
Abstract: Contemporary deep learning models are very successful at recognizing predetermined categories, but often struggle when confronted with novel ones, constraining their utility in the real world. Addressing this research gap, On-the-fly Category Discovery aims to enable machine learning systems trained on closed labeled datasets to promptly discern between novel and familiar categories among test images encountered in an online manner (one image at a time), while also clustering the different new classes as they are encountered. To address this challenging task, we propose SynC, a pragmatic yet robust framework that capitalizes on the presence of category names within the labeled datasets and the powerful knowledge base of Large Language Models to obtain unique feature representations for each class. It also dynamically updates the classifiers of both the seen and novel classes for improved class discriminability. An extended variant, SynC-AL, incorporates a lightweight active learning module to mitigate errors during inference, supporting long-term model deployment. Extensive evaluations show that SynC and SynC-AL achieve state-of-the-art performance across a spectrum of classification datasets.
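To make the on-the-fly setting concrete, the following is a minimal, generic sketch of nearest-prototype classification with a novelty threshold: seen classes are represented by per-class prototype vectors (stand-ins for the language-derived class representations the abstract describes), and a test feature that matches no prototype spawns a new class online. The class `OnlineDiscoverer`, the threshold value, and the use of random/identity prototypes are all illustrative assumptions, not the paper's actual SynC method.

```python
import numpy as np

def l2norm(x):
    # Normalize rows to unit length so dot products are cosine similarities.
    n = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(n, 1e-12, None)

class OnlineDiscoverer:
    """Toy on-the-fly discovery (illustrative, not the SynC algorithm):
    match each incoming feature to the nearest class prototype; if the
    best cosine similarity falls below a threshold, register the sample
    as the prototype of a newly discovered class."""

    def __init__(self, seen_prototypes, threshold=0.8):
        # seen_prototypes: (num_seen_classes, dim) array, e.g. text
        # embeddings of the labeled class names (hypothetical stand-in
        # for LLM-derived class representations).
        self.prototypes = l2norm(np.asarray(seen_prototypes, dtype=float))
        self.num_seen = len(self.prototypes)
        self.threshold = threshold

    def predict(self, feature):
        # Returns (class_index, is_novel). Indices >= num_seen denote
        # classes discovered during inference.
        f = l2norm(np.asarray(feature, dtype=float))
        sims = self.prototypes @ f
        best = int(np.argmax(sims))
        if sims[best] >= self.threshold:
            return best, best >= self.num_seen
        # Below threshold: treat the sample as a new class and keep its
        # feature as that class's prototype for future test images.
        self.prototypes = np.vstack([self.prototypes, f])
        return len(self.prototypes) - 1, True
```

One image is processed at a time and the prototype bank grows as novel classes appear, which is the essential constraint of the online setting; SynC additionally updates the classifiers of both seen and novel classes dynamically.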
Submission Length: Long submission (more than 12 pages of main content)
Changes Since Last Submission: We have added new Tables 8 and 9, which report the performance of our framework with varying text encoders and varying numbers of text prompts. We have edited Table 10 (previously Table 8) to include the ablation for the Stanford Cars dataset. We have added Figure 5, showing the growth of the confusion buffer during inference. We have added two new headings in the related work (textual supervision and zero-shot learning), citing relevant works. We have clearly stated the distinction between ZSL and OCD in Section 3 (Proposed Framework). Lastly, we have added two new sections to the appendix: one presents evidence for the claim that "semantically similar classes exhibit similar variations in their feature distributions", while the other provides ample evidence that the improvement from our methodology stems from the effective use of distributional semantics rather than from implicit visual priors within different language models. We also report the performance of our model against the current SOTA, PHE, on the 'Arachnida' subset of the iNaturalist dataset. The class names in 'Arachnida', namely 'Loxosceles reclusa', 'Mastigoproctus giganteus', 'Menemerus bivittatus', etc., are unlikely to appear in the pretraining corpus of SentenceBert. Hence, the performance improvement is not attributable to any implicit visual information encoded in the language model.
Video: https://youtu.be/q-A2isj0x0o
Code: https://github.com/missBanerjee/SynC/tree/main
Assigned Action Editor: ~Brian_Kulis1
Submission Number: 4846