Abstract: With advances in satellite remote sensing techniques, we are receiving huge amounts of satellite observation data of the Earth. While these data greatly help Earth scientists in their research, processing and analyzing them is becoming increasingly time-consuming and complicated. One common data processing task is to aggregate satellite observation data from the original pixel level to the latitude-longitude grid level, which makes it easier to obtain global information and to work with global climate models. This paper focuses on how best to aggregate NASA MODIS satellite data products from pixel level to grid level in a distributed environment, and how to provision the aggregation capability as a service that Earth scientists can use easily. We propose three different approaches to parallel data aggregation and implement them on three parallel platforms (Spark, Dask and MPI). We run extensive experiments with these approaches and platforms on a local cluster to benchmark their differences in execution performance, and we discuss the key factors for achieving good speedup. We also study how to make the provisioned service adaptable to different service libraries and protocols via a unified framework.
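To make the pixel-to-grid aggregation task concrete, here is a minimal Python sketch of one way it can be parallelized. It is not one of the three approaches from the paper. It assumes each pixel is a hypothetical `(latitude, longitude, value)` tuple, a regular 1-degree grid, NaN as the fill value, and the mean as the aggregate statistic. Real MODIS products carry more fields and quality flags. The helper names `cell_index`, `partial_aggregate`, `merge` and `finalize` are illustrative only. Each partition emits per-cell `(sum, count)` pairs. Because these pairs merge associatively, partitions can be combined in any order, which is the property a distributed reduce step relies on.

```python
import math
from functools import reduce


def cell_index(lat, lon, res=1.0):
    """Map pixel coordinates (degrees) to the (row, col) of a res-degree grid cell.

    Rows count north from -90 and columns count east from -180; points exactly
    on the +90 / +180 edges are clamped into the last row / column.
    """
    row = min(int(math.floor((lat + 90.0) / res)), int(180 / res) - 1)
    col = min(int(math.floor((lon + 180.0) / res)), int(360 / res) - 1)
    return row, col


def partial_aggregate(pixels, res=1.0):
    """Per-partition step: accumulate a (sum, count) pair for every grid cell."""
    acc = {}
    for lat, lon, value in pixels:
        if math.isnan(value):  # assumed fill value for missing observations
            continue
        key = cell_index(lat, lon, res)
        s, n = acc.get(key, (0.0, 0))
        acc[key] = (s + value, n + 1)
    return acc


def merge(a, b):
    """Combine two partial results by adding sums and counts cell by cell."""
    out = dict(a)
    for key, (s, n) in b.items():
        s0, n0 = out.get(key, (0.0, 0))
        out[key] = (s0 + s, n0 + n)
    return out


def finalize(acc):
    """Turn accumulated (sum, count) pairs into per-cell means."""
    return {key: s / n for key, (s, n) in acc.items()}


# Usage: two hypothetical partitions, e.g. two granules handled by different workers.
partitions = [
    [(10.2, 20.7, 1.0), (10.8, 20.1, 3.0)],
    [(10.5, 20.5, 5.0), (-45.3, 100.9, 2.0), (0.0, 0.0, float("nan"))],
]
partials = [partial_aggregate(p) for p in partitions]
grid = finalize(reduce(merge, partials, {}))
print(grid)  # {(100, 200): 3.0, (44, 280): 2.0}
```

Carrying `(sum, count)` instead of per-partition means is the key design choice. If partition means were averaged directly, a partition that contributes one pixel to a cell would carry the same weight as one that contributes thousands.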