TTT-KD: Test-Time Training for 3D Semantic Segmentation Through Knowledge Distillation From Foundation Models

Published: 2025, Last Modified: 17 Oct 2025, 3DV 2025, CC BY-SA 4.0
Abstract: Test-Time Training (TTT) proposes to adapt a pretrained network to changing data distributions on-the-fly. In this work, we propose the first TTT method for 3D semantic segmentation, TTT-KD, which models Knowledge Distillation (KD) from foundation models (e.g. DINOv2) as a self-supervised objective for adaptation to distribution shifts at test-time. Given access to paired image-pointcloud (2D-3D) data, we first optimize a 3D segmentation backbone jointly for the main task of semantic segmentation, using the pointclouds, and for the task of $2D \rightarrow 3D$ KD, using an off-the-shelf 2D pre-trained foundation model. At test-time, our TTT-KD updates the 3D segmentation backbone for each test sample using the self-supervised task of knowledge distillation before performing the final prediction. Extensive evaluations on multiple indoor and outdoor 3D segmentation benchmarks show the utility of TTT-KD, as it improves performance for both in-distribution (ID) and out-of-distribution (OOD) test datasets. We achieve a gain of up to 13% mIoU (7% on average) when the train and test distributions are similar and up to 45% (20% on average) when adapting to OOD test samples. The code is available in the following repository.
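The test-time procedure described in the abstract can be illustrated with a minimal NumPy sketch. This is a toy under stated assumptions, not the authors' implementation: the "backbone" is a single linear map, the KD objective is a mean-squared error against teacher features (the abstract does not specify the loss), the teacher features are random stand-ins for per-point 2D foundation-model features, and the adaptation runs on a per-sample copy of the weights. All names (`kd_loss_and_grad`, `ttt_kd_predict`, `W_backbone`, `W_kd`, `W_seg`) are hypothetical.

```python
import numpy as np


def kd_loss_and_grad(W_backbone, W_kd, points, teacher_feats):
    """Self-supervised KD loss (assumed MSE) and its gradient w.r.t. the backbone."""
    feats = points @ W_backbone            # (N, D) per-point 3D features
    pred = feats @ W_kd                    # (N, T) predicted teacher features
    diff = pred - teacher_feats
    loss = float(np.mean(diff ** 2))
    grad_pred = 2.0 * diff / diff.size     # d(loss)/d(pred)
    grad_backbone = points.T @ (grad_pred @ W_kd.T)  # chain rule back to W_backbone
    return loss, grad_backbone


def ttt_kd_predict(W_backbone, W_kd, W_seg, points, teacher_feats, steps=10, lr=0.1):
    """Adapt the backbone on one test sample with the KD loss, then segment it."""
    W = W_backbone.copy()                  # assumption: adapt a per-sample copy
    losses = []
    for _ in range(steps):
        loss, grad = kd_loss_and_grad(W, W_kd, points, teacher_feats)
        losses.append(loss)
        W -= lr * grad                     # only the backbone is updated
    final_loss, _ = kd_loss_and_grad(W, W_kd, points, teacher_feats)
    losses.append(final_loss)
    logits = (points @ W) @ W_seg          # final prediction with adapted backbone
    return logits.argmax(axis=1), losses


# Toy data: 64 points with xyz coordinates, 8-dim features,
# 4-dim teacher features, 5 semantic classes (all sizes are arbitrary).
rng = np.random.default_rng(0)
points = rng.normal(size=(64, 3))
teacher_feats = rng.normal(size=(64, 4))   # stand-in for 2D foundation-model features
W_backbone = rng.normal(size=(3, 8)) * 0.5
W_kd = rng.normal(size=(8, 4)) * 0.5       # KD projection head, frozen at test time here
W_seg = rng.normal(size=(8, 5)) * 0.5      # segmentation head, frozen at test time here

labels, losses = ttt_kd_predict(W_backbone, W_kd, W_seg, points, teacher_feats)
print("KD loss before/after adaptation:", losses[0], losses[-1])
```

The key design point the sketch tries to capture is that no labels are needed at test time: the only training signal is agreement with the 2D teacher features, and the segmentation head simply reads from the adapted backbone.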