Multi-sensor data fusion across dimensions: A novel approach to synopsis generation using sensory data
Abstract: Unmanned aerial vehicles (UAVs) and autonomous ground vehicles are increasingly outfitted with advanced sensors such as LiDAR, cameras, and GPS, enabling real-time object detection, tracking, localization, and navigation. These platforms generate high volumes of sensory data, such as video streams and point clouds, that require efficient processing to support timely, informed decision-making. Although video synopsis techniques are widely used to summarize visual data, they face significant challenges in multi-sensor environments because of disparities between sensor modalities. To address these limitations, we propose a novel sensory data synopsis framework designed for both UAV and autonomous vehicle applications. The proposed system integrates a dual-task learning model with a real-time sensor fusion module to jointly perform abnormal object segmentation and depth estimation from combined LiDAR and camera data. The framework comprises a sensor fusion algorithm, a 3D-to-2D projection mechanism, and a Metropolis-Hastings-based trajectory optimization strategy that refines object tubes and constructs concise, temporally shifted synopses. This design selectively preserves and repositions salient information across space and time, improving synopsis clarity while reducing computational overhead. Experimental evaluations on the standard KITTI, Cityscapes, and DVS datasets demonstrate that our framework achieves a favorable balance between segmentation accuracy and inference speed, and it outperforms existing studies in frame reduction, recall, and F1 score. The results highlight the robustness, real-time capability, and broad applicability of the proposed approach to intelligent surveillance, smart infrastructure, and autonomous mobility systems.
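To give a concrete picture of two of the building blocks named in the abstract, the Python sketch below illustrates (1) a standard pinhole projection of LiDAR points into the camera image plane, as used in LiDAR-camera fusion, and (2) a simple Metropolis-Hastings search over temporal shifts of object tubes when assembling a synopsis. This is an illustrative sketch under assumed conventions, not the paper's implementation: the calibration matrices T_cam_lidar and K, the tube representation, the cost weights, and the random-walk proposal are all assumptions made for the example.

"""
Illustrative sketch (not the authors' implementation) of LiDAR-to-image
projection and Metropolis-Hastings temporal-shift optimization for object
tubes. All names and parameters are assumptions for illustration only.
"""
import numpy as np


def project_lidar_to_image(points_xyz, T_cam_lidar, K):
    """Project Nx3 LiDAR points (LiDAR frame) onto the image plane.

    points_xyz  : (N, 3) LiDAR points.
    T_cam_lidar : (4, 4) homogeneous LiDAR-to-camera transform (assumed
                  known from extrinsic calibration).
    K           : (3, 3) camera intrinsic matrix.
    Returns pixel coordinates and depths for points in front of the camera.
    """
    n = points_xyz.shape[0]
    homog = np.hstack([points_xyz, np.ones((n, 1))])          # (N, 4)
    cam = (T_cam_lidar @ homog.T).T[:, :3]                    # points in camera frame
    in_front = cam[:, 2] > 0.1                                # drop points behind the camera
    cam = cam[in_front]
    pix = (K @ cam.T).T                                       # perspective projection
    pix = pix[:, :2] / pix[:, 2:3]
    return pix, cam[:, 2]


def synopsis_cost(shifts, tubes, weight_collision=1.0, weight_length=0.1):
    """Toy energy: penalize temporal overlap between shifted tubes plus the
    total synopsis length. Each tube is a dict with 'start' and 'end' frames."""
    spans = [(t['start'] + s, t['end'] + s) for t, s in zip(tubes, shifts)]
    cost = 0.0
    for i in range(len(spans)):
        for j in range(i + 1, len(spans)):
            overlap = min(spans[i][1], spans[j][1]) - max(spans[i][0], spans[j][0])
            cost += weight_collision * max(0, overlap)
    cost += weight_length * max(end for _, end in spans)
    return cost


def metropolis_hastings_shifts(tubes, n_iters=5000, temperature=5.0, seed=0):
    """Search for per-tube temporal shifts that minimize the synopsis cost."""
    rng = np.random.default_rng(seed)
    shifts = np.zeros(len(tubes), dtype=int)
    best, cur_cost = shifts.copy(), synopsis_cost(shifts, tubes)
    best_cost = cur_cost
    for _ in range(n_iters):
        proposal = shifts.copy()
        k = rng.integers(len(tubes))
        proposal[k] = max(0, proposal[k] + rng.integers(-10, 11))  # symmetric random-walk proposal
        new_cost = synopsis_cost(proposal, tubes)
        # Accept better moves always; worse moves with Boltzmann probability.
        if new_cost < cur_cost or rng.random() < np.exp((cur_cost - new_cost) / temperature):
            shifts, cur_cost = proposal, new_cost
            if cur_cost < best_cost:
                best, best_cost = shifts.copy(), cur_cost
    return best, best_cost


if __name__ == "__main__":
    # Hypothetical tubes extracted from a segmented video stream.
    tubes = [{'start': 0, 'end': 120}, {'start': 30, 'end': 200}, {'start': 150, 'end': 260}]
    shifts, cost = metropolis_hastings_shifts(tubes)
    print("temporal shifts:", shifts, "energy:", round(cost, 2))

In this toy setting the sampler simply trades off tube collisions against synopsis length; the paper's actual energy terms and proposal scheme may differ.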
DOI: 10.1016/j.jii.2025.100876