A Predictive Profiling and Performance Modeling Approach for Distributed Stream Processing in Edge

Hasan Geren; Nasrin Sohrabi; Zahir Tari; Nour Moustafa

Published: 01 Jan 2024, Last Modified: 13 May 2025. Venue: ICDE 2024. License: CC BY-SA 4.0

Abstract: The advent of edge computing has allowed continuously generated data to be processed closer to its sources instead of being sent to the cloud. Given the heterogeneous and limited computational resources and the dynamic nature of edge computing, stream processing systems need accurate and easily accessible performance modeling and measurement to run efficiently in edge environments. This paper proposes a predictive profiling model that measures the performance of a system by predicting the operators' processing times on heterogeneous devices, without having to test on each individual device. The profiling model uses a quadratic function to generate a CPU clock speed versus processing time curve for each operator. Using these curves, the model predicts the processing times of operators without requiring any extra profiling runs. Moreover, a performance model is proposed to deal with the performance degradation of stream processing applications by modeling their topologies as systems of M/M/1 queues. The model uses the performance expectations of queueing models to define the data transfer rates inside topologies, and uses Integer Linear Programming to determine the maximum input rate and an operator placement plan that can process that input rate. Experimental results show that the profiling approach predicts the processing times of 17 operators with an average error rate of 5%. The performance model finds the maximum input rate accurately, while the operator placement plan achieves up to 84% higher throughput and 70% lower latency on AWS EC2 instances, and 257% higher throughput and 66% lower latency on real hardware, compared to the default resource-aware scheduler of Apache Storm.
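The two building blocks named in the abstract, a quadratic clock speed versus processing time curve and the M/M/1 queue, can be illustrated with a minimal sketch. The abstract does not give the paper's fitting procedure, coefficients, or Integer Linear Programming formulation, so everything below is an assumption for illustration only: the function names, the three-point exact fit, and the sample numbers are hypothetical, and the M/M/1 part uses only the textbook mean time in system, W = 1 / (mu - lambda).

```python
from typing import Sequence, Tuple

Coeffs = Tuple[float, float, float]


def fit_quadratic(samples: Sequence[Tuple[float, float]]) -> Coeffs:
    """Return (a, b, c) of t(f) = a*f**2 + b*f + c passing through three samples.

    Each sample is (clock_speed, processing_time). This exact three-point fit is
    an illustrative stand-in, not the paper's profiling procedure.
    """
    (x0, y0), (x1, y1), (x2, y2) = samples
    # Newton divided differences; the three clock speeds must be distinct.
    d01 = (y1 - y0) / (x1 - x0)
    d12 = (y2 - y1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)
    b = d01 - a * (x0 + x1)
    c = y0 - a * x0 * x0 - b * x0
    return a, b, c


def predict_time(coeffs: Coeffs, clock_speed: float) -> float:
    """Evaluate the fitted curve at an unprofiled clock speed."""
    a, b, c = coeffs
    return a * clock_speed ** 2 + b * clock_speed + c


def mm1_mean_latency(arrival_rate: float, service_time: float) -> float:
    """Mean time in an M/M/1 queue: W = 1 / (mu - lambda), with mu = 1 / service_time."""
    mu = 1.0 / service_time
    if arrival_rate >= mu:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (mu - arrival_rate)


# Hypothetical profile of one operator: (clock speed in GHz, processing time in ms).
profile = [(1.0, 8.0), (2.0, 4.0), (3.0, 2.0)]
coeffs = fit_quadratic(profile)
predicted_ms = predict_time(coeffs, 2.5)

# Mean latency in seconds at 100 tuples/s for the predicted service time.
latency_s = mm1_mean_latency(arrival_rate=100.0, service_time=predicted_ms / 1000.0)
```

In this sketch an operator becomes unstable once the arrival rate reaches its service rate, which is the kind of constraint a placement model has to respect when choosing a maximum input rate; how the paper encodes that in its Integer Linear Program is not stated in the abstract.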
