Abstract: Incrementally learning new semantic concepts while retaining existing information is fundamental for several real-world applications. Although the impact of backbone size and architectural choices has been extensively studied in non-incremental computer vision tasks for efficiency reasons, class-incremental semantic segmentation models have so far focused primarily on large backbones, without offering a fair comparison in terms of model size. In this work, we propose a fairer study across existing class-incremental semantic segmentation methods, focusing on the models' efficiency with respect to their memory footprint. Moreover, we propose TILES (Transformer-based Incremental Learning for Expanding Segmenter), a novel approach that exploits the efficiency of small-size ViT backbones to offer an alternative solution where severe memory constraints apply. It is based on expanding the architecture with each increment, allowing the model to learn new tasks while retaining old knowledge within a limited memory footprint. In addition, to tackle the background semantic shift, we apply adaptive losses specific to the incremental branches while balancing old and new knowledge. Furthermore, we exploit the confidence of each incremental task to propose an efficient branch merging strategy. TILES outperforms several previous methods on various challenging benchmarks while using up to $14$ times fewer parameters.
Submission Length: Regular submission (no more than 12 pages of main content)
Assigned Action Editor: ~Massimiliano_Mancini1
Submission Number: 4903