IndianRoad: A Video Dataset of Diverse Atomic Visual Elements in Dense and Unpredictable Environments

Xijun Wang; Pedro Sandoval-Segura; Tianrui Guan; Ruiqi Xian; Fuxiao Liu; Rohan Chandra; Boqing Gong; Dinesh Manocha

IndianRoad: A Video Dataset of Diverse Atomic Visual Elements in Dense and Unpredictable Environments

Xijun Wang, Pedro Sandoval-Segura, Tianrui Guan, Ruiqi Xian, Fuxiao Liu, Rohan Chandra, Boqing Gong, Dinesh Manocha

26 Sept 2024 (modified: 13 Nov 2024)ICLR 2025 Conference Withdrawn SubmissionEveryoneRevisionsBibTeXCC BY 4.0

Keywords: Dataset, Vulnerable Road Users, Dense and Unpredictable Environment, Video Understanding, Behaviour Understanding

Abstract: Most existing traffic video datasets including Waymo are structured, focusing predominantly on Western traffic, which hinders global applicability. Specifically, most Asian scenarios are far more complex, involving numerous objects with distinct motions and behaviors. Addressing this gap, we present a new dataset, IndianRoad, designed for evaluating perception methods with high representation of Vulnerable Road Users (VRUs: e.g. pedestrians, animals, motorbikes, and bicycles) in complex and unpredictable environments. IndianRoad is a manually annotated dataset encompassing 16 diverse actor categories (spanning animals, humans, vehicles, etc.) and 16 action types (complex and rare cases like cut-ins, zigzag movement, U-turn, etc.), which require high reasoning ability. IndianRoad densely annotates over 13 million bounding boxes (bboxes) actors with identification, and more than 1.6 million boxes are annotated with both actor identification and action/behavior details. The videos within IndianRoad are collected based on a broad spectrum of factors, such as weather conditions, the time of day, road scenarios, and traffic density. IndianRoad can benchmark video tasks like Tracking, Detection, Spatiotemporal Action Localization, Language-Visual Moment retrieval, and Multi-label Video Action Recognition. Given the critical importance of accurately identifying VRUs to prevent accidents and ensure road safety, in IndianRoad, vulnerable road users constitute 41.13% of instances, compared to 23.71% in Waymo. IndianRoad provides an invaluable resource for the development of more sensitive and accurate visual perception algorithms in the complex real world. Our experiments show that existing methods suffer degradation in performance when evaluated on IndianRoad, highlighting its benefit for future video recognition research.

Supplementary Material: zip

Primary Area: datasets and benchmarks

Code Of Ethics: I acknowledge that I and all co-authors of this work have read and commit to adhering to the ICLR Code of Ethics.

Submission Guidelines: I certify that this submission complies with the submission instructions as described on https://iclr.cc/Conferences/2025/AuthorGuide.

Reciprocal Reviewing: I understand the reciprocal reviewing requirement as described on https://iclr.cc/Conferences/2025/CallForPapers. If none of the authors are registered as a reviewer, it may result in a desk rejection at the discretion of the program chairs. To request an exception, please complete this form at https://forms.gle/Huojr6VjkFxiQsUp6.

Anonymous Url: I certify that there is no URL (e.g., github page) that could be used to find authors’ identity.

No Acknowledgement Section: I certify that there is no acknowledgement section in this submission for double blind review.

Submission Number: 7719

Loading