Abstract: High-performance computing (HPC) systems generate massive scientific datasets, which are often stored in remote data repositories. Limited bandwidth and resource-constrained endpoints make it challenging to transfer such large datasets efficiently. Error-bounded lossy compression addresses this by reducing data sizes (lower bit-rates) while controlling distortion. However, different compressors exhibit distinct rate-distortion behaviors even under the same error bound. Selecting an optimal compressor before transfer is therefore essential to meet endpoint-specific requirements, e.g., maximizing data reduction at a fixed distortion. Existing trial-and-error approaches require multiple costly full-scale compression runs to reach the target requirements, making them impractical for online use. To address this, we propose OptRD, a compressor-agnostic framework that efficiently models rate-distortion trade-offs across multiple lossy compressors by analyzing spatial data traits at reduced resolutions. Evaluated using 3 state-of-the-art lossy compressors on 30 scientific datasets from 4 HPC applications, OptRD incurs only $\sim 5\%$ average estimation error and achieves over $100\times$ runtime speedup compared to trial-and-error methods, significantly improving optimal compressor selection for such data transfer use cases.