Abstract: Automatic traffic surveillance usually relies on estimating traffic flow parameters, either through dedicated sensors or by processing video from road surveillance cameras. However, dedicated sensors are expensive to deploy and maintain, and available video processing algorithms usually require a complex multi-step pipeline that is ill-suited to large-scale deployment. Herein, we address the problem of automatically estimating the flow rate (number of vehicles per unit of time) from surveillance cameras at low computational cost. To do so, we apply end-to-end deep architectures to compressed MPEG4 part-2 video streams from road surveillance cameras. By leveraging the approximate flow representation induced by the compression, we substantially reduce the computation and memory requirements. We propose three end-to-end deep architectures that take this coarse pixel flow representation as input. We also release two datasets, one based on synthetic videos and one collected from industrial tunnel cameras. By training the deep models on the newly introduced datasets, we demonstrate the effectiveness of predicting the flow rate directly from MPEG4 part-2 compressed video streams. We show improved accuracy in comparison with a more classical RGB-based architecture and a speedup of $\times 2065$ at prediction time.