Proxy-Anchor and EVT-Driven Continual Learning Method for Generalized Category Discovery

Published: 28 Feb 2026, Last Modified: 28 Feb 2026. Accepted by TMLR. License: CC BY 4.0
Abstract: Continual generalized category discovery aims to continuously discover and learn novel categories in incoming data batches while avoiding catastrophic forgetting of previously learned categories. A key component in addressing this challenge is the model’s ability to separate novel samples, for which Extreme Value Theory (EVT) has been effectively employed. In this work, we propose a novel method that integrates EVT with proxy anchors to define boundaries around proxies using a probability-of-inclusion function, enabling the rejection of unknown samples. Additionally, we introduce a novel EVT-based loss function to enhance the learned representation, achieving superior performance compared to other deep metric learning methods in similar settings. Using the derived probability functions, novel samples are effectively separated from previously known categories. However, category discovery within these novel samples can sometimes overestimate the number of new categories. To mitigate this issue, we propose a novel EVT-based approach to reduce the model size and discard redundant proxies. We also incorporate novel experience replay and knowledge distillation mechanisms during the continual learning stage to prevent catastrophic forgetting. Experimental results demonstrate that our proposed approach outperforms state-of-the-art methods in continual generalized category discovery scenarios.
Submission Type: Long submission (more than 12 pages of main content)
Changes Since Last Submission: In the continual learning section, the feature replay part was changed based on the revision and reviews, and a new and novel method was used, more speciefially this paragraph was added: Furthermore, to improve robustness against catastrophic forgetting, we sample cosine distances from each proxy’s Weibull distribution and use spherical interpolation to place unit-norm exemplars on the hypersphere. This preserves the intrinsic geometry of cosine space and aligns more closely with the feature space learned by the EVT loss, compared to alternative distributions such as the Gaussian. Based on these synthesized features, we define the feature replay loss as follows: This new approach was also included in the contribution section of the introduction. The effect of using this new approach was also included in the ablation study in Table 8. An appendix was added to demonstrate that over-clustering of new categories happens in different clustering methods (not just Affinity Propagation) The font size of the caption of Figure 2 was increased to improve its readability.
Code: https://github.com/NumOne01/CATEGORIZER
Assigned Action Editor: ~Piyush_Rai1
Submission Number: 5937