D2SO: Detecting Distant and Small Objects for Vision-Based Vehicle Autonomous Systems

Hanzhi Zhang, Harold Lucero, Kewei Sha, Heng Fan, Song Fu, Yunhe Feng

Published: 2025, Last Modified: 01 Jun 2026, MOST 2025, CC BY-SA 4.0

Abstract: Detecting distant and small objects is a critical capability for vision-based vehicle autonomous systems, particularly in safety-critical scenarios such as assisted driving, where constant alertness, early reaction, and safe operation are essential. However, accurately recognizing tiny objects at a distance presents significant challenges due to limited pixel information, preprocessing-induced downscaling, and restricted detector resolution. To address these issues, this paper introduces D2SO, a Vision Transformer (ViT)-based framework specifically designed to enhance the detection of distant and small objects for autonomous systems. D2SO integrates multiple fine-tuned AI models, derived from the open-source Segment Anything Model (SAM), to detect distant objects such as static structures, humans, and vehicles, thereby improving situational awareness. The system conveys essential information through color mask overlays, keeping users well-informed about detection outcomes. D2SO is explicitly optimized to detect distant entities that occupy fewer than $24 \times 24$ pixels on the display, a scale often imperceptible to humans. Experimental results demonstrate that D2SO significantly outperforms baseline models, including SegFormer, YOLO v11 Segmentation, and UNet, on a real-world street scene dataset spanning 50 cities, establishing its effectiveness in enhancing autonomous system performance.

External IDs: dblp:conf/most/ZhangLS0FF25