Abstract: In recent years, demand has grown for monocular 6D pose estimation of vehicles in traffic video, driven by its applicability to emerging domains such as smart mobility and intelligent transportation systems. However, most existing approaches are image-based: they process each frame independently, which produces undesirable temporal inconsistencies in vehicle pose between consecutive frames when applied to video sequences. In this work, we present MonoKalman, a Kalman filter-based post-processing method that improves the temporal consistency of 6D vehicle pose estimates in traffic video. We compare our method with a state-of-the-art 6D pose estimation method on synthetic video data of traffic scenes. The experimental results indicate that MonoKalman significantly outperforms the image-based baseline and effectively reduces temporal pose artifacts, yielding a more coherent and stable representation of 6D vehicle poses over time. To demonstrate MonoKalman’s improvements over the baseline model more effectively, we design a graphical user interface. The interface provides detailed quantitative metrics and dynamic visualizations, and allows users to conduct and customize their own experiments.
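The general idea of Kalman filter-based post-processing can be illustrated with a minimal sketch. This is not the MonoKalman implementation, whose state model and parameters the abstract does not specify. It assumes a constant-velocity model applied to one scalar pose component (for example, a translation coordinate) of a single tracked vehicle. The function names `kalman_smooth` and `jitter` and the noise parameters `q` and `r` are hypothetical choices made for illustration.

```python
def kalman_smooth(measurements, dt=1.0, q=1e-4, r=0.25):
    """Filter scalar per-frame estimates with a constant-velocity Kalman filter.

    Assumptions (illustrative only): state = [position, velocity],
    diagonal process noise q, measurement noise variance r.
    Angles that wrap around (e.g. yaw) would need extra handling.
    """
    if not measurements:
        return []
    p, v = measurements[0], 0.0              # initial state from first frame
    P00, P01, P10, P11 = 1.0, 0.0, 0.0, 1.0  # initial covariance (identity)
    out = [p]
    for z in measurements[1:]:
        # Predict: x' = F x, P' = F P F^T + Q with F = [[1, dt], [0, 1]].
        p = p + v * dt
        n00 = P00 + dt * (P01 + P10) + dt * dt * P11 + q
        n01 = P01 + dt * P11
        n10 = P10 + dt * P11
        n11 = P11 + q
        # Update with measurement z, observation matrix H = [1, 0].
        S = n00 + r                  # innovation variance
        k0, k1 = n00 / S, n10 / S    # Kalman gain
        y = z - p                    # innovation
        p += k0 * y
        v += k1 * y
        # Covariance update: P = (I - K H) P'.
        P00 = (1 - k0) * n00
        P01 = (1 - k0) * n01
        P10 = n10 - k1 * n00
        P11 = n11 - k1 * n01
        out.append(p)
    return out


def jitter(seq):
    """Sum of absolute second differences, a simple frame-to-frame jitter proxy."""
    return sum(abs(seq[i + 1] - 2 * seq[i] + seq[i - 1])
               for i in range(1, len(seq) - 1))
```

In a setup like this, `kalman_smooth` would be run once per pose component and per vehicle track on the output of the image-based estimator. Comparing `jitter` on the raw and filtered sequences gives a rough indication of how much frame-to-frame inconsistency has been reduced.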