Abstract: With the rapid advances of deep learning-based computer vision (CV) technology, digital images are increasingly consumed, not by humans, but by downstream CV algorithms. However, capturing high-fidelity and high-resolution images is energy-intensive. It not only dominates the energy consumption of the sensor itself (i.e. in low-power edge devices), but also contributes to significant memory burdens and performance bottlenecks in the later storage, processing, and communication stages. In this paper, we systematically explore a new paradigm of in-sensor processing, termed "learned compressive acquisition" (LeCA). Targeting machine vision applications on the edge, the LeCA framework exploits the joint learning of a sensor autoencoder structure with the downstream CV algorithms to effectively compress the original image into low-dimensional features with adaptive bit depth. We employ column-parallel analog-domain processing directly inside the image sensor to perform the compressive encoding of the raw image, resulting in meaningful hardware savings, and energy efficiency improvements. Evaluated within a modern machine vision processing pipeline, LeCA achieves 4×, 6×, and 8× compression ratios prior to any digital compression, with minimal accuracy loss of 0.97%, 0.98%, and 2.01% on ImageNet, outperforming existing methods. Compared with the conventional full-resolution image sensor and the state-of-the-art compressive sensing sensor, our LeCA sensor is 6.3× and 2.2× more energy-efficient while reaching a 2× higher compression ratio.
Loading