A Decade Review of Video Compressive Sensing: A Roadmap to Practical Applications

Zhihong Zhang, Siming Zheng

Published: 30 Aug 2024, Last Modified: 13 Nov 2024. OpenReview Archive Direct Upload. License: CC BY 4.0

Abstract: It has been over a decade since the first coded aperture video compressive sensing (CS) system was reported. The underlying principle of this technology is to employ a high-frequency modulator in the optical path to modulate a high-speed scene within one integration time of the camera. Because multiple modulation patterns are imposed during that interval, the captured image is a superposition of modulated frames, that is, a modulated and compressed measurement. Reconstruction algorithms are then used to recover the desired high-speed scene. One leading advantage of video CS is that a single captured measurement can be used to reconstruct a multi-frame video, thereby enabling a low-speed camera to capture high-speed scenes. Inspired by this, a number of variants of video CS systems have been built, mainly using different modulation devices. Meanwhile, in order to obtain high-quality reconstructed videos, many algorithms have been developed, from optimization-based iterative algorithms to deep-learning-based ones. Recently, deep learning methods have become dominant due to their high-speed inference and high-quality reconstruction, highlighting the possibility of deploying video CS in practical applications. Toward this end, this paper reviews the progress that has been achieved in video CS during the past decade. We further analyze the efforts that need to be made, in terms of both hardware and algorithms, to enable real applications. Research gaps are put forward and future directions are summarized to help researchers and engineers working on this topic.
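
The sensing principle described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes a noise-free sensor, per-pixel binary masks, and one mask per high-speed frame, so that the snapshot is the element-wise product of each frame with its mask, summed over the integration time. The function names `simulate_measurement` and `random_binary_masks`, and the toy sizes, are hypothetical and chosen only for illustration.

```python
import random


def random_binary_masks(num_frames, height, width, seed=0):
    """Hypothetical helper: one random 0/1 modulation pattern per frame."""
    rng = random.Random(seed)
    return [
        [[rng.randint(0, 1) for _ in range(width)] for _ in range(height)]
        for _ in range(num_frames)
    ]


def simulate_measurement(frames, masks):
    """Sum mask-modulated frames into a single snapshot.

    Assumed forward model: Y[i][j] = sum over t of masks[t][i][j] * frames[t][i][j].
    """
    if len(frames) != len(masks):
        raise ValueError("need exactly one mask per frame")
    height, width = len(frames[0]), len(frames[0][0])
    measurement = [[0.0] * width for _ in range(height)]
    for frame, mask in zip(frames, masks):
        for i in range(height):
            for j in range(width):
                # Each pixel is either blocked (0) or passed (1) for this frame.
                measurement[i][j] += mask[i][j] * frame[i][j]
    return measurement


# Toy scene: T frames of size H x W, where frame t has constant intensity t + 1.
T, H, W = 8, 4, 4
frames = [[[float(t + 1)] * W for _ in range(H)] for t in range(T)]
masks = random_binary_masks(T, H, W)

# A single H x W snapshot now encodes all T frames.
snapshot = simulate_measurement(frames, masks)
```

Here `T` frames are compressed into one `H` by `W` image. Recovering the frames from `snapshot` and the known `masks` is the reconstruction problem that the algorithms surveyed in the paper address; this sketch covers only the forward (capture) step.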