MMOT: The First Challenging Benchmark for Drone-based Multispectral Multi-Object Tracking

Published: 18 Sept 2025 · Last Modified: 30 Oct 2025 · NeurIPS 2025 Datasets and Benchmarks Track (poster) · License: CC BY 4.0
Keywords: Multi-object Tracking, Multispectral Imagery, Drone-based Vision, Rotation-aware Tracking
Abstract: Drone-based multi-object tracking is essential yet highly challenging due to small targets, severe occlusions, and cluttered backgrounds. Existing RGB-based multi-object tracking algorithms depend heavily on spatial appearance cues such as color and texture, which often degrade in aerial views, compromising tracking reliability. Multispectral imagery, which captures pixel-level spectral reflectance, provides crucial spectral cues that significantly enhance object discriminability under degraded spatial conditions. However, the lack of dedicated multispectral UAV datasets has hindered progress in this domain. To bridge this gap, we introduce **MMOT**, the first challenging benchmark dataset for drone-based multispectral multi-object tracking. It features three key characteristics: (i) **Large Scale** — 125 video sequences with over 488.8K annotations across eight object categories; (ii) **Comprehensive Challenges** — covering diverse real-world challenges such as extremely small targets, high-density scenarios, severe occlusions, and complex platform motion; and (iii) **Precise Oriented Annotations** — enabling accurate localization and reduced object ambiguity under aerial perspectives. To better extract spectral features and leverage oriented annotations, we further present a multispectral and orientation-aware MOT scheme that adapts existing MOT methods, featuring: (i) a lightweight Spectral 3D-Stem that integrates spectral features while preserving compatibility with RGB pretraining; (ii) an orientation-aware Kalman filter for precise state estimation; and (iii) an end-to-end orientation-adaptive transformer architecture. Extensive experiments across representative trackers consistently show that multispectral input markedly improves tracking performance over RGB baselines, particularly for small and densely packed objects. We believe our work will benefit the community in advancing drone-based multispectral multi-object tracking research. Our MMOT dataset, code, and benchmarks are publicly available at https://github.com/Annzstbl/MMOT.
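As a concrete illustration of the lightweight Spectral 3D-Stem described in the abstract, the PyTorch sketch below shows one plausible way to fuse spectral bands into a 3-channel map that an RGB-pretrained backbone can consume unchanged. The class name `Spectral3DStem`, the band count, and all layer sizes are illustrative assumptions rather than the released implementation; see the linked repository for the official code.

```python
import torch
import torch.nn as nn


class Spectral3DStem(nn.Module):
    """Hypothetical sketch of a lightweight spectral stem.

    Treats the spectral bands of a multispectral cube as the depth axis of a
    3D convolution, then collapses the result to a 3-channel map so that an
    RGB-pretrained backbone can be reused unchanged. All sizes are
    illustrative, not the authors' configuration.
    """

    def __init__(self, num_bands: int = 8, mid_channels: int = 16):
        super().__init__()
        # Mix information across neighbouring bands and spatial positions.
        self.spectral_conv = nn.Conv3d(
            in_channels=1, out_channels=mid_channels,
            kernel_size=(3, 3, 3), padding=(1, 1, 1))
        self.act = nn.ReLU(inplace=True)
        # Collapse the (channels x bands) axes down to a 3-channel pseudo-RGB map.
        self.project = nn.Conv2d(mid_channels * num_bands, 3, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, num_bands, H, W) multispectral frame.
        b, n, h, w = x.shape
        x = x.unsqueeze(1)                   # (B, 1, N, H, W)
        x = self.act(self.spectral_conv(x))  # (B, C, N, H, W)
        x = x.reshape(b, -1, h, w)           # flatten channel and band axes
        return self.project(x)               # (B, 3, H, W), backbone-ready


# Usage: feed the 3-channel output into any RGB-pretrained detector backbone.
stem = Spectral3DStem(num_bands=8)
frames = torch.randn(2, 8, 256, 256)         # dummy 8-band frames
features = stem(frames)                      # torch.Size([2, 3, 256, 256])
```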
Croissant File: json
Dataset URL: https://huggingface.co/datasets/Annzstbl/MMOT
Code URL: https://github.com/Annzstbl/MMOT
Primary Area: Datasets & Benchmarks for applications in computer vision
Submission Number: 181