TRUST: Trajectory-guided State-Space Temporal Test-Time Adaptation

Published: 28 Feb 2026, Last Modified: 04 Apr 2026, CAO Poster, CC BY 4.0
Keywords: Video Object Detection, Domain Shift, Test-Time Adaptation, Bayesian Inference, Object Tracking
TL;DR: We propose TRUST, a backpropagation-free Bayesian framework for video object detection that treats adaptation as temporal smoothness.
Abstract: Vision-language models (VLMs) enable text-conditioned object detection, but their performance degrades under temporally evolving distribution shifts. We propose TRUST (TRajectory-gUided State-space Temporal test-time adaptation), a backpropagation-free Bayesian framework for video object detection that treats adaptation as temporal smoothness at two levels: a global cache that captures gradual distribution shift, and instance-level state-space filtering guided by object trajectory tracking. The global cache state contains prototype vision embeddings and scale statistics. The instance-level state captures object dynamics through Kalman-style trajectory tracking, with embeddings smoothed along each track. The resulting algorithm requires no backpropagation or online gradient computation. We evaluate on the SHIFT dataset, which provides videos with continuous, gradual intra-sequence shifts. The implementation is available at https://github.com/FardadDadboud/vlm.git.
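The two levels of state described in the abstract can be illustrated with a minimal, gradient-free sketch. This is not the authors' implementation: the class names `GlobalCache` and `Track`, the exponential-moving-average updates, the constant-velocity motion model on the box centre, and every hyperparameter (`momentum`, `alpha`, `q`, `r`) are assumptions chosen for illustration. The scale statistics mentioned in the abstract are left out because the abstract does not define them.

```python
import numpy as np


def l2_normalize(v):
    """Return v scaled to unit length (small epsilon avoids division by zero)."""
    return v / (np.linalg.norm(v) + 1e-12)


class GlobalCache:
    """Hypothetical global state: one prototype embedding per class, updated by EMA."""

    def __init__(self, num_classes, dim, momentum=0.9):
        self.prototypes = np.zeros((num_classes, dim))
        self.seen = np.zeros(num_classes, dtype=bool)
        self.momentum = momentum  # assumed value, not taken from the paper

    def update(self, cls, embedding):
        e = l2_normalize(np.asarray(embedding, dtype=float))
        if not self.seen[cls]:
            # First observation of this class initialises its prototype.
            self.prototypes[cls] = e
            self.seen[cls] = True
        else:
            m = self.momentum
            self.prototypes[cls] = l2_normalize(m * self.prototypes[cls] + (1 - m) * e)


class Track:
    """Hypothetical instance state: constant-velocity Kalman filter on the box
    centre, plus EMA smoothing of the object's embedding along the track."""

    def __init__(self, center, embedding, q=1e-2, r=1e-1, alpha=0.8):
        self.x = np.array([center[0], center[1], 0.0, 0.0])  # [cx, cy, vx, vy]
        self.P = np.eye(4)                                   # state covariance
        self.F = np.eye(4)                                   # transition matrix
        self.F[0, 2] = self.F[1, 3] = 1.0                    # position += velocity
        self.H = np.eye(2, 4)                                # observe position only
        self.Q = q * np.eye(4)                               # process noise (assumed)
        self.R = r * np.eye(2)                               # measurement noise (assumed)
        self.alpha = alpha
        self.embedding = l2_normalize(np.asarray(embedding, dtype=float))

    def predict(self):
        """Kalman prediction step; returns the predicted centre."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, center, embedding):
        """Kalman correction step followed by embedding smoothing."""
        z = np.asarray(center, dtype=float)
        y = z - self.H @ self.x                        # innovation
        S = self.H @ self.P @ self.H.T + self.R        # innovation covariance
        K = self.P @ self.H.T @ np.linalg.inv(S)       # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        e = l2_normalize(np.asarray(embedding, dtype=float))
        self.embedding = l2_normalize(self.alpha * self.embedding + (1 - self.alpha) * e)
```

In a per-frame loop, one would call `predict()` on each track, associate detections to tracks, call `update()` on the matched tracks, and feed the smoothed embeddings into the global cache. Every step is a closed-form update, so no gradients are computed at test time.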
Submission Number: 70