RGB-Event MOT: A Cross-Modal Benchmark for Multi-Object Tracking

19 Sept 2023 (modified: 25 Mar 2024) · ICLR 2024 Conference Withdrawn Submission
Keywords: event data, object tracking, cross-modal
TL;DR: The first RGB-Event benchmark dataset for MOT in challenging environments, together with new RGB-Event MOT baselines; the dataset opens up a new research track.
Abstract: Leveraging contemporary deep learning techniques, recognizing, detecting, and tracking objects in real-world scenarios has become increasingly tractable. Nonetheless, challenges persist, particularly regarding the robustness of these models to small objects, low-illumination conditions, and occlusions. Event-based vision, with its high temporal resolution, wide dynamic range, and low latency, offers unique advantages in precisely these settings and is rapidly gaining traction among computer vision researchers. To support foundational research in object detection and tracking, we present the first cross-modal RGB-Event multi-object tracking benchmark dataset. It comprises nearly one million carefully annotated ground-truth bounding boxes, providing an extensive resource for future research. Designed to advance the practical deployment of Event-based vision, the dataset is particularly valuable in challenging environments such as low-light scenes, heavy occlusion, and small objects. Our experiments validate the utility and effectiveness of cross-modal detection and tracking models on this benchmark, and the encouraging results underscore its potential to drive progress in Event-based vision. The code is included in the supplementary material, and the dataset will be made publicly available.
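For readers unfamiliar with cross-modal MOT data, the following is a minimal sketch (not taken from the paper) of how RGB frames, per-frame event representations, and MOT-style ground-truth boxes could be paired for evaluation. The directory layout, file names, and gt.txt format below are assumptions modeled on the MOTChallenge convention; the abstract does not specify the dataset's actual format.

```python
# Hypothetical loader sketch for an RGB-Event MOT sequence.
# Assumed layout (not confirmed by the paper):
#   <seq>/img1/000001.jpg   - RGB frames
#   <seq>/events/000001.npy - per-frame event representation (e.g. voxel grid)
#   <seq>/gt/gt.txt         - MOTChallenge-style annotations: frame,id,x,y,w,h,...
from dataclasses import dataclass
from pathlib import Path
import csv

@dataclass
class Box:
    frame: int      # frame index the box belongs to
    track_id: int   # identity preserved across frames
    x: float        # top-left x (pixels)
    y: float        # top-left y (pixels)
    w: float        # width (pixels)
    h: float        # height (pixels)

def load_mot_gt(gt_file: Path) -> dict[int, list[Box]]:
    """Parse a MOTChallenge-style gt.txt into per-frame box lists."""
    boxes: dict[int, list[Box]] = {}
    with gt_file.open() as f:
        for row in csv.reader(f):
            frame, tid = int(row[0]), int(row[1])
            x, y, w, h = map(float, row[2:6])
            boxes.setdefault(frame, []).append(Box(frame, tid, x, y, w, h))
    return boxes

def frame_pairs(seq_dir: Path):
    """Yield (rgb_image_path, event_tensor_path, gt_boxes) for each frame."""
    gt = load_mot_gt(seq_dir / "gt" / "gt.txt")
    for rgb_path in sorted((seq_dir / "img1").glob("*.jpg")):
        idx = int(rgb_path.stem)
        event_path = seq_dir / "events" / f"{rgb_path.stem}.npy"
        yield rgb_path, event_path, gt.get(idx, [])
```

A cross-modal tracker would consume the RGB frame and the aligned event representation jointly; the sketch only illustrates the pairing and annotation parsing under the assumed layout.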
Supplementary Material: zip
Primary Area: representation learning for computer vision, audio, language, and other modalities
Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.
Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2024/AuthorGuide.
Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors' identity.
No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.
Submission Number: 1927