CALIBFREE: Self-Supervised Feature Disentanglement for Calibration-Free Multi-Camera Multi-Object Tracking

ICLR 2026 Conference Submission20056 Authors

19 Sept 2025 (modified: 08 Oct 2025) · CC BY 4.0
Keywords: Multi-camera multi-object tracking; Self-supervised representation learning; Feature disentanglement; Surveillance video analytics
Abstract: Multi-camera multi-object tracking (MCMOT) faces significant challenges in maintaining consistent object identities across varying camera perspectives, particularly when precise calibration and extensive annotations are required. In this paper, we present CALIBFREE, a self-supervised representation learning framework that requires neither camera calibration nor manual labeling for the MCMOT task. By disentangling view-agnostic and view-specific features through single-view distillation and cross-view reconstruction, our method adapts to complex, dynamic scenarios with minimal overhead. Experiments on the MMP-MvMHAT dataset show a 3% improvement in overall accuracy and a 7.5% increase in average F1 score over state-of-the-art approaches, confirming the effectiveness of our calibration-free design. Moreover, on the more diverse MvMHAT dataset, our approach demonstrates superior over-time tracking and strong cross-view performance, highlighting its adaptability to a wide range of camera configurations.
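The abstract's core idea of disentangling view-agnostic from view-specific features via cross-view reconstruction can be illustrated with a toy sketch. The partitioning scheme, loss terms, and all names below are assumptions for illustration only, not the authors' actual implementation:

```python
import numpy as np

def split_features(f, d_agnostic):
    # Hypothetical partition of an embedding into a view-agnostic
    # part (identity content shared across cameras) and a
    # view-specific part (appearance tied to one viewpoint).
    return f[:d_agnostic], f[d_agnostic:]

def cross_view_losses(f_a, f_b, d_agnostic):
    """Toy objectives in the spirit of the paper's description:
    - alignment: the view-agnostic halves of two views of the same
      object should agree (stand-in for single-view distillation);
    - reconstruction: view B's embedding should be recoverable from
      A's view-agnostic part combined with B's view-specific part
      (stand-in for cross-view reconstruction)."""
    ag_a, sp_a = split_features(f_a, d_agnostic)
    ag_b, sp_b = split_features(f_b, d_agnostic)
    align = np.mean((ag_a - ag_b) ** 2)
    recon_b = np.concatenate([ag_a, sp_b])
    recon = np.mean((recon_b - f_b) ** 2)
    return align, recon

# Two embeddings of the same object seen from two cameras: a shared
# identity component plus view-dependent noise (synthetic data).
rng = np.random.default_rng(0)
shared = rng.normal(size=4)
f_a = np.concatenate([shared, rng.normal(size=4)])
f_b = np.concatenate([shared + 0.01, rng.normal(size=4)])
align, recon = cross_view_losses(f_a, f_b, d_agnostic=4)
print(align, recon)  # both small: the views share identity content
```

In a real system the two losses would be summed (with weights) and backpropagated through the shared encoder, driving identity information into the agnostic block so that cross-camera association can match on it without calibration.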
Primary Area: applications to computer vision, audio, language, and other modalities
Submission Number: 20056