Optimizing Cloud-to-GPU Throughput for Deep Learning With Earth Observation Data

Published: 10 Jun 2025, Last Modified: 17 Jul 2025 · TerraBytes 2025 (with proceedings) · CC BY 4.0
Keywords: Geospatial Deep Learning, Cloud-Native Training, Data Loader Optimization
TL;DR: We optimize PyTorch data loading for GeoTIFF files from cloud storage, achieving 20x faster remote throughput that keeps GPUs busy during Earth observation model training.
Abstract: Training deep learning models on petabyte-scale Earth observation (EO) data requires separating compute resources from data storage. However, standard PyTorch data loaders cannot keep modern GPUs fully utilized when streaming GeoTIFF files directly from cloud storage. In this work, we benchmark GeoTIFF loading throughput from both cloud object storage and local SSD, systematically testing different loader configurations and data parameters. We focus on tile-aligned reads and worker thread pools, using Bayesian optimization to find the best settings for each storage type. Our optimized configurations increase remote data-loading throughput by 20$\times$ and local throughput by 4$\times$ over default settings. On three public EO benchmarks, models trained with optimized remote loading match the accuracy of local training within identical time budgets. We improve validation IoU by $6$--$15$\% and sustain $85$--$95$\% GPU utilization, versus $0$--$30$\% with standard configurations. Code and reproducible pipelines will be released publicly.
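To make the two levers named in the abstract concrete, the sketch below shows what tile-aligned reads plus a multi-worker loader look like with rasterio/GDAL and a PyTorch DataLoader. It is an illustrative example under stated assumptions, not the authors' released pipeline: the GDAL options, the example URL, and the batch/worker values are assumptions, and the paper tunes such knobs per storage type with Bayesian optimization.

```python
# Illustrative sketch only -- not the authors' released pipeline.
# Streams tile-aligned windows from a (hypothetical) cloud-hosted
# GeoTIFF into PyTorch using rasterio; all tunables are assumptions.
import numpy as np
import rasterio
import torch
from rasterio.windows import Window
from torch.utils.data import DataLoader, Dataset

# GDAL options that commonly matter for remote reads: skip directory
# listings on open, merge adjacent HTTP range requests, cache blocks.
GDAL_ENV = {
    "GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR",
    "GDAL_HTTP_MERGE_CONSECUTIVE_RANGES": "YES",
    "VSI_CACHE": "TRUE",
    "VSI_CACHE_SIZE": str(64 * 1024 * 1024),  # 64 MiB cache per process
}


class TileAlignedCOG(Dataset):
    """Yields windows aligned to the GeoTIFF's internal block grid, so
    each read maps to whole compressed tiles (roughly one range request
    per tile) rather than partial, overlapping ones."""

    def __init__(self, url: str):
        self.url = url
        with rasterio.Env(**GDAL_ENV), rasterio.open(url) as src:
            self.bh, self.bw = src.block_shapes[0]  # internal tile size
            self.offsets = [
                (col, row)
                for row in range(0, src.height - self.bh + 1, self.bh)
                for col in range(0, src.width - self.bw + 1, self.bw)
            ]

    def __len__(self):
        return len(self.offsets)

    def __getitem__(self, idx):
        col, row = self.offsets[idx]
        with rasterio.Env(**GDAL_ENV), rasterio.open(self.url) as src:
            patch = src.read(window=Window(col, row, self.bw, self.bh))
        return torch.from_numpy(patch.astype(np.float32))


# Hypothetical URL; the worker count is exactly the kind of knob the
# paper tunes per storage backend.
loader = DataLoader(
    TileAlignedCOG("https://example.com/scene.tif"),
    batch_size=8,
    num_workers=8,            # parallel range requests hide network latency
    persistent_workers=True,  # keep GDAL caches/connections warm
    pin_memory=True,          # faster host-to-GPU copies
)
```

Reopening the file in `__getitem__` keeps workers independent; a production loader would typically cache one dataset handle per worker to avoid repeated remote opens.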
Supplementary Material: zip
Submission Number: 26