Progressive Compressed Records: Taking a Byte Out of Deep Learning Data

Michael Kuchnik; George Amvrosiadis; Virginia Smith

Progressive Compressed Records: Taking a Byte Out of Deep Learning Data

Michael Kuchnik, George Amvrosiadis, Virginia Smith

25 Sept 2019 (modified: 22 Jun 2025)ICLR 2020 Conference Blind SubmissionReaders: Everyone

TL;DR: We propose a simple, general, and space-efficient data format to accelerate deep learning training by allowing sample fidelity to be dynamically selected at training time

Abstract: Deep learning training accesses vast amounts of data at high velocity, posing challenges for datasets retrieved over commodity networks and storage devices. We introduce a way to dynamically reduce the overhead of fetching and transporting training data with a method we term Progressive Compressed Records (PCRs). PCRs deviate from previous formats by leveraging progressive compression to split each training example into multiple examples of increasingly higher fidelity, without adding to the total data size. Training examples of similar fidelity are grouped together, which reduces both the system overhead and data bandwidth needed to train a model. We show that models can be trained on aggressively compressed representations of the training data and still retain high accuracy, and that PCRs can enable a 2x speedup on average over baseline formats using JPEG compression. Our results hold across deep learning architectures for a wide range of datasets: ImageNet, HAM10000, Stanford Cars, and CelebA-HQ.

Keywords: Deep Learning, Storage, Bandwidth, Compression

Community Implementations: [![CatalyzeX](/images/catalyzex_icon.svg) 2 code implementations](https://www.catalyzex.com/paper/progressive-compressed-records-taking-a-byte/code)

Original Pdf: pdf

10 Replies

Loading