Multi-Teacher Knowledge Distillation for Efficient Object Segmentation

Published: 14 Sept 2025, Last Modified: 28 Jan 2026 · ICIP · CC BY-NC-ND 4.0
Abstract: Segment Anything Model 2 (SAM2) has demonstrated state-of-the-art performance in image and video object segmentation across many domains, but its large encoder makes it difficult to deploy on resource-constrained devices or in real-time applications. One solution is to distill knowledge from the bulky encoder into a lightweight encoder, but this can degrade performance. In this work, we investigate multi-teacher distillation to mitigate the performance degradation of distilled segmentation models. Using several foundation teacher models, our multi-teacher distilled models achieve a 3.2-fold speedup in end-to-end inference compared to SAM2 while reaching 74.4 and 71.1 mIoU on the COCO and LVIS image segmentation datasets, respectively (versus 72.1 and 69.6 for single-teacher distillation), as well as competitive results on video segmentation. Our results show that multi-teacher distillation offers a powerful solution for efficient image and video segmentation while maintaining compelling performance.
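The abstract does not specify the exact distillation objective, teacher models, or weighting scheme, so the following is only a minimal PyTorch sketch of one plausible multi-teacher feature-distillation loss: a lightweight student encoder's features are matched to several frozen teacher encoders through per-teacher projection heads, and the per-teacher losses are combined with assumed fixed weights. The class name, dimensions, and cosine/MSE matching are all illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTeacherDistillLoss(nn.Module):
    """Hypothetical multi-teacher feature-matching loss (not the paper's exact objective).

    The student's token features are projected to each teacher's embedding
    dimension and compared against that teacher's (frozen) features; the
    per-teacher losses are averaged with fixed weights.
    """

    def __init__(self, student_dim, teacher_dims, weights=None):
        super().__init__()
        # One linear projection per teacher so embedding dimensions may differ.
        self.projections = nn.ModuleList(
            [nn.Linear(student_dim, d) for d in teacher_dims]
        )
        self.weights = weights or [1.0 / len(teacher_dims)] * len(teacher_dims)

    def forward(self, student_feats, teacher_feats_list):
        # student_feats: (B, N, student_dim) tokens from the lightweight encoder
        # teacher_feats_list: one (B, N, teacher_dim_i) tensor per teacher
        loss = 0.0
        for proj, w, t_feats in zip(self.projections, self.weights, teacher_feats_list):
            s_norm = F.normalize(proj(student_feats), dim=-1)
            t_norm = F.normalize(t_feats.detach(), dim=-1)  # teachers stay frozen
            loss = loss + w * F.mse_loss(s_norm, t_norm)
        return loss


if __name__ == "__main__":
    # Toy usage: a 256-dim student distilled from two teachers (768- and 1024-dim).
    criterion = MultiTeacherDistillLoss(student_dim=256, teacher_dims=[768, 1024])
    student = torch.randn(2, 196, 256)
    teachers = [torch.randn(2, 196, 768), torch.randn(2, 196, 1024)]
    print(criterion(student, teachers).item())
```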