Presto: Hybrid CPU-GPU Preprocessing Framework for Video-based AI Inference System

Jihyuk Lee, Dongsu Han, Jaehong Kim

Published: 23 Jun 2025, Last Modified: 09 Nov 2025, Crossref, CC BY-SA 4.0
Abstract: The growing adoption of video-based AI models has created a pressing demand for high-throughput, low-latency inference systems. However, existing preprocessing frameworks, whether CPU-based or GPU-based, struggle to keep up with the computational burden of video decoding and data augmentation, resulting in suboptimal GPU utilization and degraded inference performance. In this paper, we present Presto, a high-performance hybrid CPU-GPU preprocessing framework tailored for video-based AI inference systems. Presto integrates a hybrid preprocessing scheduler that dynamically balances CPU and GPU workloads, leverages selective decoding to eliminate unnecessary frame processing, and introduces a custom GPU Memory Manager that enables pipelined preprocessing and efficient GPU memory reuse. In an evaluation on the video captioning task, Presto achieves up to 4.37× higher throughput and 2.72× lower latency than the de facto baselines, while reducing cloud costs by up to 75%.
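To make the idea of selective decoding concrete, the following is a minimal sketch and not Presto's actual implementation, which the abstract does not specify. It assumes a simplified codec model in which a frame can be reconstructed by decoding forward from the nearest preceding keyframe (B-frame reordering is ignored). The function name `frames_to_decode` and the example keyframe positions are hypothetical.

```python
from bisect import bisect_right


def frames_to_decode(targets, keyframes):
    """Return the sorted frame indices that must be decoded to obtain `targets`.

    Assumption (simplified model): each target frame depends only on the
    frames between its nearest preceding keyframe and itself.
    `keyframes` must be sorted in ascending order.
    """
    needed = set()
    for t in targets:
        # Index of the last keyframe at or before frame t.
        i = bisect_right(keyframes, t) - 1
        if i < 0:
            raise ValueError(f"no keyframe at or before frame {t}")
        # Decode from that keyframe up to and including the target.
        needed.update(range(keyframes[i], t + 1))
    return sorted(needed)


# Hypothetical 120-frame clip with a keyframe every 30 frames.
keyframes = [0, 30, 60, 90]
targets = [5, 62]  # frames the model actually samples
plan = frames_to_decode(targets, keyframes)
print(f"{len(plan)} of 120 frames decoded")  # prints "9 of 120 frames decoded"
```

Under these assumptions, only 9 of the 120 frames are decoded for the two sampled frames, which illustrates why skipping unneeded frames can reduce preprocessing work.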