Transitioning Heads Conundrum: The Hidden Bottleneck in Long-Tailed Class-Incremental Learning

Published: 24 Apr 2026, Last Modified: 24 Apr 2026. Accepted by TMLR. License: CC BY 4.0
Abstract: Long-Tailed Class-Incremental Learning (LTCIL) faces a fundamental tension: models must sequentially learn new classes while contending with extreme class imbalance, which amplifies catastrophic forgetting. A particularly overlooked phenomenon is the Transitioning Heads Conundrum: as replay buffers constrain memory, initially well-represented head classes shrink over time and effectively become tail classes, undermining knowledge retention. Existing approaches fail to address this because they apply knowledge distillation too late, after these transitions have already eroded head-class representations. To overcome this, we introduce DEcoupling Representations for Early Knowledge distillation (DEREK), which strategically employs Early Knowledge Distillation to safeguard head-class knowledge before data constraints manifest. Comprehensive evaluation across 2 LTCIL benchmarks, 12 experimental settings, and 24 baselines, including Long-Tail, Class-Incremental, Few-Shot CIL, and LTCIL methods, shows that DEREK maintains competitive performance across categories, establishing new state-of-the-art results.
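The head-to-tail transition described in the abstract can be illustrated with simple buffer arithmetic. The sketch below is a minimal illustration, not the paper's replay strategy: it assumes a fixed memory budget split equally across all classes seen so far, and the budget, class sizes, and function names are all hypothetical.

```python
def per_class_quota(memory_budget: int, num_seen_classes: int) -> int:
    """Exemplars each class may keep under an equal split of a fixed buffer (assumed policy)."""
    return memory_budget // num_seen_classes


def stored_exemplars(original_count: int, memory_budget: int, num_seen_classes: int) -> int:
    """A class keeps at most its quota, and never more samples than it originally had."""
    return min(original_count, per_class_quota(memory_budget, num_seen_classes))


# Hypothetical numbers chosen for illustration; none are taken from the paper.
MEMORY_BUDGET = 2000
HEAD_CLASS_SAMPLES = 500  # a well-represented head class
TAIL_CLASS_SAMPLES = 10   # a tail class

history = []
for seen in (10, 20, 50, 100):
    head = stored_exemplars(HEAD_CLASS_SAMPLES, MEMORY_BUDGET, seen)
    tail = stored_exemplars(TAIL_CLASS_SAMPLES, MEMORY_BUDGET, seen)
    history.append((seen, head, tail))
    print(f"classes seen={seen:3d}  head exemplars={head:3d}  tail exemplars={tail:2d}")
```

Under these assumptions the head class drops from 500 training samples to 20 stored exemplars once 100 classes have been seen, while the tail class keeps all 10 of its samples. The head class ends up in the same data regime as the tail, which is the kind of transition the abstract refers to.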
Submission Type: Regular submission (no more than 12 pages of main content)
Changes Since Last Submission:
- **Table Q1-A** (per-group accuracy and forgetting across ablations) added to **Section 5, page 11**.
- **Table Q2-A** (Early vs. Late KD comparison) added to **Appendix A.2, page 4**.
- **Tables Q3-A and Q3-B** (specialization and ensemble analysis) added to **Appendix A.2, pages 4-5**.
- **Table Q4-A** (LCL vs. KL comparison) added to **Section 5, page 11**.
- LCL motivation paragraph added to **Section 3.2, page 6**.
- Elaboration on why prior methods cannot exploit early distillation added to **Section 3 (paragraph 3, page 5)**.
- **Lamp walkthrough** added to **Section 3.1 (page 6)** and **Section 3.2 (page 6)**.
- **Replay strategy discussion** added to **Appendix A.4.2, pages 8-9**.
- Discussion of Dai et al. (NeurIPS 2025) added to **Section 7, page 13**.
- Typo corrected in **Section 4.1, page 7** ("fequency" → "frequency").
Supplementary Material: pdf
Assigned Action Editor: ~Hanwang_Zhang3
Submission Number: 6621