Data Storage and Management for Image AI Pipelines

Published: 22 Jun 2025, Last Modified: 25 Mar 2026, SIGMOD, Everyone, CC BY 4.0
Abstract: Image AI is essential for applications such as self-driving cars, medical imaging, and smart farming. Data management is key to efficient image AI, from how images are stored to how data is managed while the images are processed. This tutorial surveys the emerging area of image AI pipelines, combining approaches from data management, digital signal processing, computer vision, and machine learning, with a specific focus on image storage and data management. The tutorial first walks through image AI pipelines step by step, covering how they work and the main challenges. We then describe the main approaches to making image AI pipelines more efficient. First, we cover how image AI pipelines store images, based on standard storage formats, learned formats, task-specific learned formats, and self-designed formats. Second, we cover how state-of-the-art approaches manage data within image AI pipelines. We identify and describe three main approaches to managing data efficiently within the pipeline: (i) compressing intermediate data, (ii) materializing and re-using data objects, and (iii) parallelism for better hardware utilization. Lastly, the tutorial covers open data management and systems problems and future directions in making image AI pipelines more efficient.