Abstract: Deep-learning-based artificial intelligence algorithms are widely deployed in video analytics pipelines because they drastically reduce the need for manual analysis and achieve human-like accuracy. However, they have high compute and memory/storage requirements due to ever-increasing model sizes and large volumes of data. Processing-in-memory architectures are gaining prominence for efficient execution of deep learning workloads because they reduce data movement bottlenecks by moving compute closer to the data. In this work, we present a system-level analysis of processing-in-memory architectures across the memory hierarchy for the execution of deep learning algorithms in video analytics workloads, using the proposed SysPIM methodology. We compare processing-in-memory architectures at the cache, main memory, and non-volatile memory levels in terms of execution latency, energy consumption, and overall data movement for representative video analytics workloads.