Abstract: Sliding-window aggregation is one of the core operations in processing and analyzing data streams, but its performance degrades badly when stream events arrive out of order. An out-of-order data stream contains events whose order by timestamp (called event time) differs from their order of arrival at the system (called ingestion time). Out-of-order data streams typically occur in distributed environments due to factors such as network disruptions and delays. They drastically slow down processing, and existing approaches that can handle out-of-order streams do not address this problem well. The time complexity of existing approaches depends on $n$, the number of slides in the window, which makes them inefficient. In addition, they ignore the past windows affected by late-arriving records, even though many applications need the results of those past windows to be updated and reported in real time. This paper proposes two solutions: (1) a Maximum-allowed-lateness-based IndeXing algorithm with Constant time complexity (CMiX) for computing the current window, and (2) a Past Window IndeXing algorithm (PWiX) for efficiently updating the past windows. Experimental results show that CMiX and PWiX handle out-of-order data streams significantly better than existing approaches. CMiX is about 3.21 times faster than the state-of-the-art approach while using significantly less memory. All approaches discussed in the paper share the following limitations: (1) the aggregation may be distributive or algebraic, but it must be commutative because the stream is out of order, and (2) the window and slide sizes are assumed to be fixed; if they change, the indices must be reconstructed.
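
To make the setting concrete, the following is a minimal sketch of slide-based aggregation over an out-of-order stream. It is not CMiX or PWiX: it assumes integer event times, a sum aggregation, and fixed window and slide sizes, and all names (`SLIDE`, `WINDOW`, `slide_index`, `affected_windows`, `aggregate`, `window_sum`) are hypothetical. It only illustrates why a commutative aggregation tolerates late arrivals and why one late record can touch several windows.

```python
from collections import defaultdict

SLIDE = 5    # slide size in event-time units (assumed value)
WINDOW = 15  # window size, a multiple of SLIDE (assumed value)


def slide_index(event_time):
    """Index of the slide that contains this event time."""
    return event_time // SLIDE


def affected_windows(event_time):
    """Start times of all windows [start, start + WINDOW) containing event_time."""
    s = slide_index(event_time)
    n = WINDOW // SLIDE  # number of slides per window
    return [k * SLIDE for k in range(s - n + 1, s + 1) if k >= 0]


def aggregate(events):
    """Sum values per slide, keyed by event time rather than arrival order.

    Because addition is commutative, the arrival order does not change
    the per-slide partial results.
    """
    partial = defaultdict(int)
    for event_time, value in events:
        partial[slide_index(event_time)] += value
    return dict(partial)


def window_sum(partial, start):
    """Combine the per-slide partial sums of the window starting at `start`."""
    first = start // SLIDE
    return sum(partial.get(i, 0) for i in range(first, first + WINDOW // SLIDE))


# Events as (event_time, value) in arrival order; the last one is late.
arrivals = [(1, 10), (7, 20), (12, 30), (3, 5)]
partial = aggregate(arrivals)
```

In this sketch, combining a window costs one step per slide, so the work grows with the number of slides in the window, which is the dependence on $n$ that the abstract identifies as the inefficiency of existing approaches.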