Abstract: Scene understanding is a computer vision task that involves grasping the pixel-level distribution of objects. Whereas most research focuses on single-scene models, we consider a more versatile proposal: domain-incremental learning for scene understanding. This allows us to adapt well-studied single-scene models into multi-scene models, reducing data requirements and preserving model flexibility. However, domain-incremental learning that leverages correlations between scene domains has yet to be explored. To address this gap, we propose a Domain-Incremental Learning Paradigm (D-ILP) for scene understanding, along with a new Pseudo-Replay Generation (PRG) strategy that does not require manual labeling. Specifically, D-ILP uses pre-trained single-scene models and incremental images in supervised training to acquire new knowledge from other scenes. PRG, a pre-trained generative model, controllably generates pseudo-replays that resemble source images from incremental images and text prompts. These pseudo-replays are used to minimize catastrophic forgetting of the original scene. We perform experiments with three publicly accessible models: Mask2Former, SegFormer, and DeepLabv3+. By successfully transforming these single-scene models into multi-scene models, we achieve high-quality parsing results for the original and new scenes simultaneously. An analysis of D-ILP further supports the validity and rationality of our method.
External IDs:dblp:journals/cvgip/XieQHLT25